sed -e 's/\(^\|[^0-9]\)\(13[0-9][0-9]\{8\}\|14[579][0-9]\{8\}\|15[0-35-9][0-9]\{8\}\|16[6][0-9]\{8\}\|17[0135678][0-9]\{8\}\|18[0-9][0-9]\{8\}\|19[89][0-9]\{8\}\)\($\|[^0-9]\)/ find_phone:\2 /g' test2.html \
  | sed -e 's/\(^\|[^0-9]\)\([0-9]\{6\}[1-2][0-9]\{3\}\(\(0[1-9]\)\|\(10\|11\|12\)\)\(\([0-2][1-9]\)\|10\|20\|30\|31\)[0-9]\{3\}[0-9Xx]\)\($\|[^0-9]\)/ find_idcard:\2 /g' \
  | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^find_/) printf "%s ", $i}'
Contents of the test file test2.html:
dddd
bbb131102198910084421ccc eee13611112222fff13133334444
h15855556666j
aaaa
13177778888
13199990000
18611112222
370785199507319527
Test result:
find_idcard:131102198910084421 find_phone:13611112222 find_phone:13133334444 find_phone:15855556666 find_phone:13177778888 find_phone:13199990000 find_phone:18611112222 find_idcard:370785199507319527
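Below is a sketch of a variant of the pipeline above. It assumes GNU sed and uses the hypothetical names `extract_ids`, `PHONE` and `IDCARD`. The two substitutions are written with `sed -E` (extended regex), so `( )`, `|` and `{n}` need no backslashes. The ID date check is shortened to `0[1-9]|1[0-2]` for the month and `0[1-9]|[12][0-9]|3[01]` for the day, which accept the same values as the original alternations. The awk step loops over all fields, so a line with several matches (like the second line of test2.html) has every match printed.

```shell
# Sketch only: assumes GNU sed (-E, and \2 back-references in the replacement).
# extract_ids, PHONE and IDCARD are hypothetical names.

# Prefixes as in the phone regex: 13x, 14[579], 15[0-35-9], 166, 17[0135678], 18x, 19[89].
PHONE='1(3[0-9]|4[579]|5[0-35-9]|66|7[0135678]|8[0-9]|9[89])[0-9]{8}'
# 6-digit area code, year 1xxx/2xxx, month 01-12, day 01-31, 3 digits, check digit or X.
IDCARD='[0-9]{6}[12][0-9]{3}(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])[0-9]{3}[0-9Xx]'

# Read text on stdin, tag each phone/ID number, then print only the tagged words.
# (^|[^0-9]) and ($|[^0-9]) stop a match from starting or ending inside a longer digit run.
# Inside double quotes, \$ becomes a literal $ for sed and \2 is passed through unchanged.
extract_ids() {
  sed -E "s/(^|[^0-9])($PHONE)(\$|[^0-9])/ find_phone:\2 /g" |
    sed -E "s/(^|[^0-9])($IDCARD)(\$|[^0-9])/ find_idcard:\2 /g" |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^find_/) printf "%s ", $i }'
}
```

Usage: `extract_ids < test2.html`. Like the original, each substitution consumes the non-digit on either side of a match. Two numbers separated by a single non-digit character will therefore only have the first one tagged.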
Regex for Chinese ID card numbers: https://www.jb51.net/article/109384.htm
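It is only a reference and cannot be used as-is in sed. In sed's default basic regular expressions (GNU sed), the alternation | must be written as \|, the parentheses as \( and \), and "exactly 8 digits" as [0-9]\{8\} (https://zhidao.baidu.com/question/1115861792946350259.html). The anchors ^ (start) and $ (end) need no backslash.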
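A small sketch of the escaping rule. It assumes GNU sed, because `\|` in a basic regex is a GNU extension. The helper names and sample strings are made up. The same pattern is run in basic syntax, in extended syntax (`sed -E`), and copied unescaped into plain sed.

```shell
# Hypothetical helpers; assumes GNU sed.

# Basic regex (sed's default): ( ) | { } are special only when backslash-escaped.
bre_demo() { printf '%s\n' "$1" | sed 's/\(ab\|cd\)[0-9]\{3\}/<&>/'; }

# Extended regex (sed -E): the same pattern written without backslashes.
ere_demo() { printf '%s\n' "$1" | sed -E 's/(ab|cd)[0-9]{3}/<&>/'; }

# The unescaped pattern in plain sed matches literal "(ab|cd)" and "{3}" instead.
unescaped_demo() { printf '%s\n' "$1" | sed 's/(ab|cd)[0-9]{3}/<&>/'; }

bre_demo 'xxcd123yy'          # xx<cd123>yy
ere_demo 'xxcd123yy'          # xx<cd123>yy
unescaped_demo 'xxcd123yy'    # xxcd123yy (no match)
unescaped_demo '(ab|cd)1{3}'  # <(ab|cd)1{3}>
```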
Regex for mobile phone numbers: https://blog.csdn.net/voidmain_123/article/details/78962164 (likewise only a reference; it cannot be used as-is).
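To check a single string rather than search text, the same prefix list can be anchored with `^...$` and passed to `grep -E`. This is a sketch with a hypothetical helper `is_mobile`. The prefixes are the ones from the regex above and may be out of date, since carriers add new number ranges over time.

```shell
# Hypothetical helper: succeeds only if the whole argument is one 11-digit
# mobile number whose prefix is in the list used by the phone regex.
is_mobile() {
  printf '%s\n' "$1" |
    grep -Eq '^1(3[0-9]|4[579]|5[0-35-9]|66|7[0135678]|8[0-9]|9[89])[0-9]{8}$'
}

for n in 15855556666 15455556666 1361111222; do
  if is_mobile "$n"; then echo "$n: mobile"; else echo "$n: not mobile"; fi
done
```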
awk: reads its input line by line; lines that do not match the pattern are not kept. https://www.cnblogs.com/xudong-bupt/p/3721210.html
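A sketch of the difference between keeping whole lines and keeping only the matching words. The helper names are hypothetical. A pattern with no action prints each matching line in full. Pulling out just the `find_` words needs a loop over `$1` to `$NF`, which is what lets one input line yield several results.

```shell
# Hypothetical helpers; both read stdin.

# Pattern with no action: print each whole matching line, drop the others.
whole_lines() { awk '/find_/'; }

# Loop over every field and print only the ones starting with find_.
matching_fields() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^find_/) print $i }'
}

printf 'aa find_x bb find_y\nno match\n' | whole_lines      # aa find_x bb find_y
printf 'aa find_x bb find_y\nno match\n' | matching_fields  # find_x, then find_y
```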
sed: I worked this part out myself by trial and error.
References on awk, sed, grep, fgrep and egrep:
https://www.cnblogs.com/EasonJim/p/8282511.html
https://blog.csdn.net/qq504196282/article/details/52995198
https://www.cnblogs.com/moveofgod/p/3540575.html
Match lines containing both ABC and 123:
  sed -n '/ABC/{/123/p}'
  awk '/ABC/&&/123/{ print $0 }'
  grep -E '(ABC.*123|123.*ABC)'
Match lines containing ABC or 123:
  sed -n '/\(ABC\|123\)/p'
  awk '/ABC/||/123/{ print $0 }'
  grep -E '(ABC|123)'   (or egrep 'ABC|123', which is equivalent to grep -E)
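A sketch that runs the "both" and "either" one-liners on one made-up sample to check that the three tools agree. It assumes GNU sed, for `\|` and for `}` directly after `p` without a `;`. The function names are hypothetical.

```shell
# Made-up sample input; function names are hypothetical.
sample='ABC only
123 only
ABC and 123
123 then ABC
neither'

# Lines containing both ABC and 123, in either order.
both_sed()  { printf '%s\n' "$sample" | sed -n '/ABC/{/123/p}'; }
both_awk()  { printf '%s\n' "$sample" | awk '/ABC/ && /123/'; }
both_grep() { printf '%s\n' "$sample" | grep -E 'ABC.*123|123.*ABC'; }

# Lines containing ABC or 123.
either_sed()  { printf '%s\n' "$sample" | sed -n '/\(ABC\|123\)/p'; }
either_awk()  { printf '%s\n' "$sample" | awk '/ABC/ || /123/'; }
either_grep() { printf '%s\n' "$sample" | grep -E 'ABC|123'; }

both_sed     # ABC and 123 / 123 then ABC
either_sed   # the first four lines
```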
In awk, print ends its output with a newline and printf does not; separate consecutive statements with semicolons.
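A short sketch of `print` versus `printf` in awk, with hypothetical helper names. `print` appends the output record separator (a newline by default), while `printf` writes exactly what its format string says. Passing the data through a `"%s"` format, rather than using it as the format itself (as in `printf $1`), keeps a `%` inside the data from being read as a format directive.

```shell
# Hypothetical helpers.
print_demo()  { awk 'BEGIN { print "a"; print "b" }'; }
printf_demo() { awk 'BEGIN { printf "a"; printf "b"; printf "\n" }'; }

# Data goes through "%s", so the % characters below are printed literally.
first_fields() { awk '{ printf "%s ", $1 }'; }

print_demo                                        # a and b on separate lines
printf_demo                                       # ab
printf '50%% off\n100%% sure\n' | first_fields    # 50% 100%
```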