• HDU OJ Problem 4018: Parsing URL


      HDU OJ Problem 4018, Parsing URL (problem link).

    Parsing URL

    Problem Description

    In computing, a Uniform Resource Locator or Universal Resource Locator (URL) is a character string that specifies where a known resource is available on the Internet and the mechanism for retrieving it.
    The syntax of a typical URL is:
    scheme://domain:port/path?query_string#fragment_id
    In this problem, the scheme, domain is required by all URL and other components are optional. That is, for example, the following are all correct urls:
    http://dict.bing.com.cn/#%E5%B0%8F%E6%95%B0%E7%82%B9
    http://www.mariowiki.com/Mushroom
    https://mail.google.com/mail/?shva=1#inbox
    http://en.wikipedia.org/wiki/Bowser_(character)
    ftp://fs.fudan.edu.cn/
    telnet://bbs.fudan.edu.cn/
    http://mail.bashu.cn:8080/BsOnline/
    Your task is to find the domain for all given URLs.

    Input

    There are multiple test cases in this problem. The first line of input contains a single integer denoting the number of test cases. For each of test case, there is only one line contains a valid URL.

    Output

    For each test case, you should output the domain of the given URL.

    Sample Input

    3
    http://dict.bing.com.cn/#%E5%B0%8F%E6%95%B0%E7%82%B9
    http://www.mariowiki.com/Mushroom
    https://mail.google.com/mail/?shva=1#inbox

    Sample Output

    Case #1: dict.bing.com.cn
    Case #2: www.mariowiki.com
    Case #3: mail.google.com

    Source

    The 36th ACM/ICPC Asia Regional Shanghai Site —— Warmup

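      Approach: this is plain string parsing with no real difficulty. The one thing to watch is that the port number must not be printed. Java's regular expressions handle it easily.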

    import java.io.*;
    import java.util.*;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class Main
    {
        public static void main(String args[])
        {
            Scanner cin = new Scanner(System.in);
            int n;
            String URL;
            Matcher matcher;
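            // Group 2 is the domain. The trailing part is optional so that a URL
            // with no port or path (e.g. "http://example.com") still matches.
            Pattern pattern = Pattern.compile("([A-Za-z]+://)([^:/]+)([:/].*)?");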
    
            n = cin.nextInt();
            URL = cin.nextLine();
            for ( int i = 1 ; i <= n ; i ++ )
            {
                URL = cin.nextLine();
                matcher = pattern.matcher(URL);
                if ( matcher.matches() )
                    System.out.println("Case #" + i + ": " + matcher.group(2) );
            }
        }
    }

      If you prefer C, that works too. In principle, C can use GNU regular expressions.

    C + GNU regular expressions
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <regex.h>
    
    typedef int COUNT;
    
    #define MAX_LENGTH 1000
    
    int main (void)
    {
        COUNT i;
        int n;
        char url[MAX_LENGTH];
        regmatch_t pmatch[4];
        regex_t match_regex;
    
        /* Group 2 is the domain; the trailing port/path part is optional. */
        regcomp( &match_regex, "([A-Za-z]+://)([^:/]+)([:/].*)?", REG_EXTENDED );
    
        scanf( "%d", &n );
        for ( i = 1 ; i <= n ; i ++ )
        {
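            scanf( "%999s", url );  /* bounded read: MAX_LENGTH - 1 characters */
            /* Skip the case if the URL does not match (not expected for valid input). */
            if ( regexec( &match_regex, url, 4, pmatch, 0 ) != 0 )
                continue;
            url[pmatch[2].rm_eo] = '\0';
            /* The expected output carries a "Case #i: " prefix. */
            printf( "Case #%d: %s\n", i, &(url[pmatch[2].rm_so]) );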
        }
    
        regfree( &match_regex );
        return EXIT_SUCCESS;
    }

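    However, HDU OJ runs on a Windows server and its gcc compiler is MinGW gcc, which does not support GNU regular expressions. A C solution therefore has to parse the string by hand. The C code is as follows: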

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdbool.h>
    
    typedef int COUNT;
    
    #define MAX_LENGTH 1000
    
    int main (void)
    {
        COUNT i, j;
        int n;
        bool starturl;
        char url[MAX_LENGTH];
        char outputurl[MAX_LENGTH];
        int len;
        scanf( "%d", &n );
        for ( i = 1 ; i <= n ; i ++ )
        {
            starturl = false;
            scanf( "%s", url );
            sprintf (outputurl, "Case #%d: ", i );
            len = strlen( outputurl );
            for ( j = 0 ; url[j] != '\0' ; j ++ )
            {
                if ( !starturl )
                {
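                    /* First '/' of "://": step onto the second '/' here; the
                       loop's own j ++ then lands on the first domain character. */
                    if ( url[j] == '/' )
                    {
                        j ++;
                        starturl = true;
                    }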
                }
                else
                {
                    if ( url[j] == ':' 
                            || url[j] == '/'
                            || url[j] == '\0' )
                        break;
                    outputurl[len++] = url[j];
                }
            }
            outputurl[len] = '\0';
            puts( outputurl );
        }
        return EXIT_SUCCESS;
    }
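
      The hand-written scan above can also be expressed with standard library calls. The following is only a sketch of that idea, not the code from the original post: `extract_domain` is a hypothetical helper name, and it assumes (as the solutions above do) that the domain ends at the first ':' or '/' after "://", or at the end of the string.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the original post).
 * Copies the domain of `url` into `out` (capacity `out_size` bytes) and
 * returns 0. Returns -1 if "://" is missing or the buffer is too small.
 */
int extract_domain(const char *url, char *out, size_t out_size)
{
    /* The domain starts right after the "://" that ends the scheme. */
    const char *start = strstr(url, "://");
    if (start == NULL)
        return -1;
    start += 3;

    /* strcspn returns the length of the prefix containing neither ':'
       (port) nor '/' (path); with no delimiter it is the whole rest. */
    size_t len = strcspn(start, ":/");
    if (len + 1 > out_size)
        return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

      Inside the per-case loop, a call such as `extract_domain(url, domain, sizeof domain)` followed by `printf("Case #%d: %s\n", i, domain)` would produce the required output line. This relies only on `string.h`, so it does not need `regex.h`.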
  • Original post: https://www.cnblogs.com/yejianfei/p/2697706.html