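Concept:
A regular expression (English: Regular Expression, often abbreviated in code as regex, regexp, or RE), sometimes also called a rule expression, is a concept from computer science.
Regular expressions are typically used to search for and replace text that matches a given pattern (rule).
Uses:
They usually appear in conditional checks that test whether a string conforms to some format (matching), and in string search and replace operations.
A regular expression is a string that contains characters with special meaning; these special characters are called the metacharacters of the regular expression.
Classes involved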
java.lang.String
java.util.regex.Pattern: the compiled pattern
java.util.regex.Matcher: the match result
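Example: "." stands for any single character, so "abc" is matched by "...".

    public class RegExp {
        public static void main(String[] args) {
            // A quick introduction to regular expressions
            System.out.println("abc".matches("...")); // each "." matches one character
        }
    }

"\d" matches any digit 0-9. In a Java string literal the backslash is itself the escape character, so the regex \d has to be written as "\\d".

    public class RegExp {
        public static void main(String[] args) {
            // A quick introduction to regular expressions
            p("abc".matches("..."));              // match
            // "\\d" matches a digit
            p("d1234w".replaceAll("\\d", "-"));   // replace; note the doubled backslash
        }

        public static void p(Object o) {
            System.out.println(o);
        }
    }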
About the classes:
Pattern
Definition:
A compiled representation of a regular expression.
A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
A typical invocation sequence is thus
    Pattern p = Pattern.compile("a*b");
    Matcher m = p.matcher("aaaaab");
    boolean b = m.matches();
A matches method is defined by this class as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement
boolean b = Pattern.matches("a*b", "aaaaab");
is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused.
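The form below is more efficient when the same pattern is used repeatedly, because the compiled Pattern can be reused. Pattern and Matcher also provide many more methods than String.matches.

    Pattern p = Pattern.compile("a*b");
    Matcher m = p.matcher("aaaaab");
    boolean b = m.matches();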
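To make the reuse point concrete, here is a minimal sketch that compiles "a*b" once and creates a fresh Matcher for each input. The class name PatternReuseDemo and the helper countMatches are made up for illustration; only Pattern.compile, Pattern.matcher, and Matcher.matches come from the API quoted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once and shared by every call below.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Hypothetical helper: counts how many inputs match the whole pattern.
    static int countMatches(String... inputs) {
        int count = 0;
        for (String input : inputs) {
            Matcher m = A_STAR_B.matcher(input); // a new Matcher per input, same Pattern
            if (m.matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "aaaaab", "b", and "ab" match; "ba" and "abc" do not.
        System.out.println(countMatches("aaaaab", "b", "ab", "ba", "abc")); // prints 3
    }
}
```

Calling Pattern.matches("a*b", input) inside the loop would give the same count, but it would recompile the expression on every iteration.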
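[a-z] matches one letter in the range a-z.
[] denotes a character class (a set or range of characters).
Quantifiers
?: zero or one time
*: zero or more times
+: one or more times
{n}: exactly n times
{n,}: at least n times
{n,m}: between n and m times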
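The quantifiers listed above can be checked one by one with String.matches. This is only a small sketch; the class QuantifierDemo and its check helper are hypothetical names used for illustration.

```java
class QuantifierDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean check(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(check("", "a?"));            // true:  zero occurrences allowed
        System.out.println(check("aa", "a?"));          // false: at most one occurrence
        System.out.println(check("", "a*"));            // true:  zero or more
        System.out.println(check("", "a+"));            // false: at least one required
        System.out.println(check("aaa", "a{3}"));       // true:  exactly three
        System.out.println(check("aaaa", "a{3,}"));     // true:  three or more
        System.out.println(check("aaaaaa", "a{3,5}"));  // false: six is more than five
    }
}
```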
// Ranges

    public class RegExp {
        public static void main(String[] args) {
            // Ranges
            p("a".matches("[abc]"));
            p("a".matches("[^abc]"));        // anything except a, b, or c
            p("A".matches("[a-zA-Z]"));      // any letter
            p("A".matches("[a-z]|[A-Z]"));   // a-z or A-Z: any letter
            p("A".matches("[a-z[A-Z]]"));    // union, same as above
            p("A".matches("[A-Z&&[REG]]"));  // intersection: in A-Z and also one of R, E, G (false for "A")
        }

        public static void p(Object o) {
            System.out.println(o);
        }
    }
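// Predefined character classes
"\\".matches("\\\\"): matching a single literal backslash takes four backslashes in the regex literal. The Java compiler first turns each \\ in a string literal into one backslash, so "\\\\" becomes the two-character string \\, which the regex engine then reads as one escaped, literal backslash. With one or three backslashes the last backslash escapes the closing quote and the code does not compile; with two, the regex is a lone backslash and Pattern rejects it at run time.

    public class RegExp {
        public static void main(String[] args) {
            // Getting to know \s, \w, \d
            p(" \n\r\t".matches("\\s{4}"));   // four whitespace characters
            p(" ".matches("\\S"));            // false: a space is whitespace
            p("a_8".matches("\\w{3}"));
            p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+"));
            p("\\".matches("\\\\"));          // one literal backslash
        }

        public static void p(Object o) {
            System.out.println(o);
        }
    }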
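The two layers of escaping are easier to see by looking at the string lengths. The sketch below is illustrative only; the class BackslashDemo and its constants are made-up names.

```java
class BackslashDemo {
    // Java source literal "\\" is a string holding ONE character: a backslash.
    static final String INPUT = "\\";
    // Java source literal "\\\\" is a string holding TWO characters: \\
    // The regex engine reads those two characters as one escaped, literal backslash.
    static final String REGEX = "\\\\";

    public static void main(String[] args) {
        System.out.println(INPUT.length());        // prints 1
        System.out.println(REGEX.length());        // prints 2
        System.out.println(INPUT.matches(REGEX));  // prints true
    }
}
```

So the four backslashes in the source are halved once by the Java compiler and once more by the regex engine.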
Predefined character classes | |
---|---|
. | Any character (may or may not match line terminators) |
`\d` | A digit: `[0-9]` |
`\D` | A non-digit: `[^0-9]` |
`\h` | A horizontal whitespace character: `[ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]` |
`\H` | A non-horizontal whitespace character: `[^\h]` |
`\s` | A whitespace character: `[ \t\n\x0B\f\r]` |
`\S` | A non-whitespace character: `[^\s]` |
`\v` | A vertical whitespace character: `[\n\x0B\f\r\x85\u2028\u2029]` |
`\V` | A non-vertical whitespace character: `[^\v]` |
`\w` | A word character: `[a-zA-Z_0-9]` |
`\W` | A non-word character: `[^\w]` |
find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
reset()
Resetting a matcher discards all of its explicit state information and sets its append position to zero.
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegExp {
        public static void main(String[] args) {
            // matches, find, lookingAt
            Pattern pattern = Pattern.compile("\\d{3,5}");
            String s = "123-45623-789-00";
            Matcher m = pattern.matcher(s);
            p(m.matches());                  // the whole input must match
            m.reset();                       // matches() changes the matcher's state, so reset before find()
            p(m.find());                     // each find() continues after the previous match
            p(m.start() + "-" + m.end());
            p(m.find());
            p(m.start() + "-" + m.end());
            p(m.find());
            p(m.start() + "-" + m.end());
            p(m.lookingAt());                // lookingAt() always starts at the beginning of the input
            p(m.lookingAt());
            p(m.lookingAt());
            p(m.lookingAt());
        }

        public static void p(Object o) {
            System.out.println(o);
        }
    }
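In practice find() is usually called in a loop until it returns false, rather than a fixed number of times. A minimal sketch on the same input follows; FindAllDemo and findAll are hypothetical names, and the "text@start-end" format is just one way to show the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Hypothetical helper: collects every match as "text@start-end".
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) { // each call resumes after the previous match
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        // "00" is not reported because it has fewer than three digits.
        System.out.println(findAll("\\d{3,5}", "123-45623-789-00"));
        // prints [123@0-3, 45623@4-9, 789@10-13]
    }
}
```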
Find and replace

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegExp {
        public static void main(String[] args) {
            // Replacement; see the description of appendReplacement() in the API documentation
            Pattern pattern = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
            Matcher m = pattern.matcher("java Java Java I love Java u hate JAVA sfarwwfr");
            // p(m.replaceAll("JAVA")); // would replace every match with JAVA
            StringBuffer buf = new StringBuffer();
            int i = 0;
            while (m.find()) {           // look for the next match
                i++;
                if (i % 2 == 0) {
                    // even-numbered matches become "java", odd-numbered ones "JAVA"
                    m.appendReplacement(buf, "java");
                } else {
                    m.appendReplacement(buf, "JAVA");
                }
            }
            m.appendTail(buf);           // after the appendReplacement() calls, append the remaining tail
            p(buf);
        }

        public static void p(Object o) {
            System.out.println(o);
        }
    }
Groups
Matcher.group(): Returns the input subsequence matched by the previous match.
1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)
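Parentheses define capturing groups, which are numbered by counting opening parentheses from left to right; a specific group is retrieved with, e.g., group(1) or group(2).

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegExp {
        public static void main(String[] args) {
            // Groups
            Pattern pattern = Pattern.compile("(\\d{3,5})|([a-z]{2})");
            String s = "123aa-34345bb-234cc-00";
            Matcher m = pattern.matcher(s);
            while (m.find()) {
                // group(2) is the letter group; it is null when the digit alternative matched
                p(m.group(2));
            }
        }

        public static void p(Object o) {
            System.out.println(o);
        }
    }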
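The group numbering for ((A)(B(C))) listed earlier can be verified directly. The following is a small sketch; GroupNumberingDemo and its groups helper are made-up names, while groupCount() and group(int) are standard Matcher methods.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Hypothetical helper: returns group 0 (the whole match) followed by
    // each capturing group, or an empty array if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parenthesis, left to right:
        // 1 = ((A)(B(C))), 2 = (A), 3 = (B(C)), 4 = (C)
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC")));
        // prints [ABC, ABC, A, BC, C]
    }
}
```

Group 0 always denotes the entire match and is not counted by groupCount().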
A summary of several key points: