• What does a character class with only a lone caret do?



    [^][…] is not two character classes but just one: a negated class that matches any character except ], [, and … (see Special Characters Inside a Bracketed Character Class in perlrecharclass):

    However, if the ] is the first (or the second if the first character is a caret) character of a bracketed character class, it does not denote the end of the class (as you cannot have an empty class) and is considered part of the set of characters that can be matched without escaping.

    Examples:

    "+"   =~ /[+?*]/     #  Match, "+" in a character class is not special.
    "\cH" =~ /[\b]/      #  Match, \b inside a character class
                         #  is equivalent to a backspace.
    "]"   =~ /[][]/      #  Match, as the character class contains
                         #  both [ and ].
    "[]"  =~ /[[]]/      #  Match, the pattern contains a character class
                         #  containing just [, and the character class is
                         #  followed by a ].
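
The rule quoted above can also be checked outside Perl. The following is a minimal sketch in Python, whose re module likewise treats a ] placed first in a class (or right after the caret) as a literal. The sample strings and the names pattern, samples and results are made up for illustration.

```python
import re

# One negated class: any character except "]", "[" and "…".
pattern = re.compile(r"[^][…]")

samples = ["1…]3", "12[5", "1][…4", "][…"]

# Collect every single-character match for each sample string.
results = {s: pattern.findall(s) for s in samples}

for s, found in results.items():
    print(repr(s), "->", found)
```

The last sample consists only of excluded characters, so it yields no match at all, which is what a single negated class predicts.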
    

    Character Classes or Character Sets

    With a “character class”, also called “character set”, you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets. If you want to match an a or an e, use [ae]. You could use this in gr[ae]y to match either gray or grey. Very useful if you do not know whether the document you are searching through is written in American or British English.

    A character class matches only a single character. gr[ae]y does not match graay, graey or any such thing. The order of the characters inside a character class does not matter. The results are identical.

    You can use a hyphen inside a character class to specify a range of characters. [0-9] matches a single digit between 0 and 9. You can use more than one range. [0-9a-fA-F] matches a single hexadecimal digit, case insensitively. You can combine ranges and single characters. [0-9a-fxA-FX] matches a hexadecimal digit or the letter X. Again, the order of the characters and the ranges does not matter.

    Character classes are one of the most commonly used features of regular expressions. You can find a word, even if it is misspelled, such as sep[ae]r[ae]te or li[cs]en[cs]e. You can find an identifier in a programming language with [A-Za-z_][A-Za-z_0-9]*. You can find a C-style hexadecimal number with 0[xX][A-Fa-f0-9]+.
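
As a rough illustration of the patterns mentioned above, here is a small Python sketch. The input strings and variable names are invented for the example; only the regular expressions come from the text.

```python
import re

# gr[ae]y: exactly one character from the set between "gr" and "y".
grey = re.compile(r"gr[ae]y")
words = ["gray", "grey", "graay", "graey"]
spellings = [w for w in words if grey.fullmatch(w)]

# A C-style hexadecimal number: 0x or 0X followed by hex digits.
hex_number = re.compile(r"0[xX][A-Fa-f0-9]+")
hex_found = hex_number.findall("mask = 0xFF & 0X1a; count = 10")

# An identifier: a letter or underscore, then letters, digits or underscores.
identifier = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
ids_found = identifier.findall("_tmp1 = x2 + 42")

print(spellings)
print(hex_found)
print(ids_found)
```

Note that graay and graey are rejected because the class consumes exactly one character, and 42 is not reported as an identifier because the first class excludes digits.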

    Negated Character Classes

    Typing a caret after the opening square bracket negates the character class. The result is that the character class matches any character that is not in the character class. Unlike the dot, negated character classes also match (invisible) line break characters. If you don’t want a negated character class to match line breaks, you need to include the line break characters in the class. [^0-9\r\n] matches any character that is not a digit or a line break.

    It is important to remember that a negated character class still must match a character. q[^u] does not mean: “a q not followed by a u”. It means: “a q followed by a character that is not a u”. It does not match the q in the string Iraq. It does match the q and the space after the q in Iraq is a country. Indeed: the space becomes part of the overall match, because it is the “character that is not a u” that is matched by the negated character class in the above regexp. If you want the regex to match the q, and only the q, in both strings, you need to use negative lookahead: q(?!u). But we will get to that later.
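
Both points can be tried out with a short Python sketch. The test strings are the ones from the paragraph above plus a made-up two-line string; the variable names are illustrative only.

```python
import re

with_class = re.compile(r"q[^u]")       # q plus one character that is not u
with_lookahead = re.compile(r"q(?!u)")  # q, provided that no u follows

texts = ["Iraq", "Iraq is a country"]
class_hits = [with_class.findall(t) for t in texts]
lookahead_hits = [with_lookahead.findall(t) for t in texts]

# A negated class matches line breaks; the dot (by default) does not.
digits_text = "1\n2"
negated_all = re.findall(r"[^0-9]", digits_text)
dot_all = re.findall(r".", digits_text)
negated_no_breaks = re.findall(r"[^0-9\r\n]", digits_text)

print(class_hits, lookahead_hits)
print(negated_all, dot_all, negated_no_breaks)
```

The negated class finds nothing in Iraq and consumes the following space in the longer string, while the lookahead version matches only the q in both.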

    https://webchat.freenode.net/#regex

    Yes. Javascript and ruby, maybe some others.

    Well, you can check the documentation for your given engine or look at https://www.regular-expressions.info/charclass.html
     
    I was wrong about ruby. It's just javascript.
     
    Ruby treats it as an error, whereas other engines treat it as an incomplete character class.
     
    "You can include an unescaped closing bracket by placing it right after the opening bracket, or right after the negating caret. []x] matches a closing bracket or an x. [^]x] matches any character that is not a closing bracket or an x. This does not work in JavaScript, which treats [] as an empty character class that always fails to match, and [^] as a negated empty character class that matches any single character. Ruby treats empty character classes as an error. So both JavaScript and Ruby require closing brackets to be escaped with a backslash to include them as literals in a character class."
    There is a standard rule. The standard rule is, if ] is the first non-meta character in a character class, it's taken literally. Javascript and ruby break that rule.
     
    You may even say that they're the exceptions that prove the rule.
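
A small Python sketch of the rule discussed in this exchange: Python's re module follows the "] first is literal" convention, so the unescaped and escaped spellings should behave the same there. The escaped form is the one the quoted passage says JavaScript and Ruby require. The sample text and variable names are made up.

```python
import re

unescaped = re.compile(r"[]x]")          # "]" first in the class: literal
escaped = re.compile(r"[\]x]")           # same class with "]" escaped
negated = re.compile(r"[^]x]")           # "]" right after the caret: literal
negated_escaped = re.compile(r"[^\]x]")  # same negated class, escaped

text = "a]xb"
unescaped_hits = unescaped.findall(text)
escaped_hits = escaped.findall(text)
negated_hits = negated.findall(text)
negated_escaped_hits = negated_escaped.findall(text)

print(unescaped_hits, escaped_hits)
print(negated_hits, negated_escaped_hits)
```

Writing the backslash costs one character and avoids depending on how a particular engine reads [] and [^].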
     
     

    Negate characters in Regular Expression [closed]

    The caret inside of a character class [^ ] is the negation operator common to most regular expression implementations (Perl, .NET, Ruby, Javascript, etc). So I'd do it like this:

    [^\W\s\d]
    
    • ^ - Matches anything NOT in the character class
    • \W - matches non-word characters (a word character would be defined as a-z, A-Z, 0-9, and underscore).
    • \s - matches whitespace (space, tab, carriage return, line feed)
    • \d - matches 0-9

    Or you can take another approach by simply including only what you want:

    [A-Za-z]
    

    The main difference here is that the first one will include underscores. That, and it demonstrates a way of writing the expression in the same terms that you're thinking. But if you reverse your thinking to include characters instead of excluding them, then that can sometimes result in an easier-to-read regular expression.

    It's not completely clear to me which special characters you don't want. But I wrote out both solutions just in case one works better for you than the other.
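
The difference between the two classes can be seen in a short Python sketch. The negated class is written with its backslashes, [^\W\s\d], and the re.ASCII flag is an assumption added here so that \W, \s and \d follow the ASCII definitions given in the answer. The sample text is made up.

```python
import re

# re.ASCII restricts \W, \s and \d to their ASCII meanings; without it,
# Python 3 would also count letters such as "é" as word characters.
negated = re.compile(r"[^\W\s\d]", re.ASCII)
listed = re.compile(r"[A-Za-z]")

text = "ab_1 2-Z"
negated_hits = negated.findall(text)
listed_hits = listed.findall(text)

print(negated_hits)
print(listed_hits)
```

The only character the two classes disagree on in this sample is the underscore.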

    Test in .NET

    The .NET implementation parses [^][…] as a single character class.

     
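    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            List<string> list = new List<string>()
            {
                "1…]3",
                "12[5",
                "1][…4"
            };
            // A single negated class: any character except ], [ and ….
            Regex regex = new Regex("[^][…]");
            foreach (var item in list)
            {
                // True when the item has at least one character outside the class.
                Console.WriteLine(regex.IsMatch(item));
                // Print each matched character on its own line.
                foreach (Match m in regex.Matches(item))
                {
                    Console.WriteLine(m.Value);
                }
            }
        }
    }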
     
  • Original article: https://www.cnblogs.com/chucklu/p/14206336.html