sed and Regular Expressions

Start of line (^)

^ matches the beginning of each line. The following command prints the lines that start with 103:

[root@sishen ~]# sed -n '/^103/ p ' employee.txt

103,Raj Reddy,Sysadmin

^ matches the start of a line only when it appears at the beginning of the regular expression, so ^N matches every line that begins with N.
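
As a quick check of that rule, the sketch below feeds three made-up lines (they are not part of employee.txt) to sed. It assumes GNU sed in its default basic-regex mode, where a ^ that is not at the start of the expression is treated as an ordinary character.

```shell
# Hypothetical sample input, not taken from employee.txt.
input=$(printf '%s\n' 'Nina' 'a^N' 'aN')

# ^ at the start of the expression anchors the match to the start of the line.
anchored=$(printf '%s\n' "$input" | sed -n '/^N/p')

# ^ in the middle of the expression is a literal caret character.
literal=$(printf '%s\n' "$input" | sed -n '/a^N/p')

printf 'anchored: %s\nliteral: %s\n' "$anchored" "$literal"
```

Only "Nina" is selected by the anchored form, and only the line that really contains a caret is selected by the second one.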

End of line ($)

$ matches the end of a line.

Print the lines that end with the character r:

[root@sishen ~]# sed -n '/r$/ p' employee.txt

102,Jason Smith,IT Manager

104,Anand Ram,Developer

105,Jane Miller,Sales Manager

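Single character (.)

The metacharacter dot (.) matches any single character except a newline:

- . matches one character
- .. matches two characters
- ... matches three characters
- and so on

In the following example, the pattern "J followed by any three characters" is replaced with "Jason". The pattern J... matches "John" and "Jane" in employee.txt, and also the first four letters of "Jason", which is why line 102 becomes "Jasonn":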

[root@sishen ~]# sed -n 's/J.../Jason/ p' employee.txt

101,Jason Doe,CEO

102,Jasonn Smith,IT Manager

105,Jason Miller,Sales Manager
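
If the intent is to rewrite only the four-letter names, one option is to include the following space in both the pattern and the replacement. This is a sketch that recreates three lines of employee.txt inline rather than reading the file; the variable names are only for illustration.

```shell
# Lines 101, 102 and 105 of employee.txt, as shown in the article.
employees=$(printf '%s\n' \
  '101,John Doe,CEO' \
  '102,Jason Smith,IT Manager' \
  '105,Jane Miller,Sales Manager')

# "J... " needs exactly three characters and then a space after the J,
# so the five-letter name "Jason" no longer matches.
result=$(printf '%s\n' "$employees" | sed -n 's/J... /Jason /p')

printf '%s\n' "$result"
```

With the trailing space, line 102 is left alone and is therefore not printed by the p flag.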

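Zero or more occurrences (*)

The asterisk * matches zero or more occurrences of the preceding character; for example, 1* matches zero or more 1s.

First create a test file: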

[root@sishen ~]# vim log.txt

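log: input.txt

log:

log: testing resumed

log:

log:output created

For example, show only the lines that contain log: followed by some message:

[root@sishen ~]# sed -n '/log: *./p' log.txt # note the space after the colon in the pattern; the trailing dot is required too

log: input.txt

log: testing resumed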

log:output created
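
To see why the dot is required, the sketch below counts matches with and without it. The log content is recreated inline, with the spacing after the colon taken from the sample outputs in this article.

```shell
# Same five lines as log.txt.
log=$(printf '%s\n' 'log: input.txt' 'log:' 'log: testing resumed' 'log:' 'log:output created')

# " *" may match zero spaces, so without the dot every line matches.
without_dot=$(printf '%s\n' "$log" | sed -n '/log: */p' | wc -l | tr -d ' ')

# The trailing dot demands one more character after the optional spaces.
with_dot=$(printf '%s\n' "$log" | sed -n '/log: *./p' | wc -l | tr -d ' ')

echo "without dot: $without_dot, with dot: $with_dot"
```

The first count covers all five lines, while the second covers only the three lines that carry a message.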

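One or more occurrences (\+)

\+ matches one or more occurrences of the preceding character; for example, a space followed by \+ matches one or more spaces. In sed's default basic regular expressions a bare + is a literal plus sign, so the operator has to be written as \+.

Print only the lines that contain log: followed by one or more spaces:

[root@sishen ~]# sed -n '/log: \+/ p' log.txt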

log: input.txt

log: testing resumed

Note that a <tab> is not the same as a space, so a line with a tab after the colon would not match.
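
The following sketch compares the three ways a plus sign can be read. It assumes GNU sed (for \+ and the -E option), and the last sample line, which contains a real plus sign, is made up for the demonstration.

```shell
# Three lines modelled on log.txt plus one hypothetical line with a literal "+".
log=$(printf '%s\n' 'log: input.txt' 'log:' 'log:output created' 'log: +')

# Basic regular expressions (the default): \+ is the "one or more" operator.
bre=$(printf '%s\n' "$log" | sed -n '/log: \+/p')

# Extended regular expressions (-E): a bare + is the operator.
ere=$(printf '%s\n' "$log" | sed -E -n '/log: +/p')

# In basic mode a bare + only matches a literal plus sign.
literal=$(printf '%s\n' "$log" | sed -n '/log: +/p')

printf 'BRE:\n%s\nERE:\n%s\nliteral:\n%s\n' "$bre" "$ere" "$literal"
```

The first two commands select the same lines; the third selects only the line that contains "log: +".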

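Zero or one occurrence (\?)

\? matches zero or one occurrence of the preceding character. Because the space is optional here, every line that contains log: matches:

[root@sishen ~]# sed -n '/log: \?/ p' log.txt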

log: input.txt

log:

log: testing resumed

log:

log:output created
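
Since the log example matches every line, a case where the optional character actually filters something may be easier to read. This is a small sketch with made-up words, assuming GNU sed for the \? operator.

```shell
# Hypothetical word list; only the first two are valid spellings.
words=$(printf '%s\n' 'color' 'colour' 'colouur')

# u\? allows zero or one "u"; the anchors keep "colouur" out.
matched=$(printf '%s\n' "$words" | sed -n '/^colou\?r$/p')

printf '%s\n' "$matched"
```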

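Escape character (\)

To search for a character that has a special meaning in regular expressions (such as * or .), escape it with a backslash (\):

[root@sishen ~]# sed -n '/127\.0\.0\.1/ p' /etc/hosts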

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
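
The difference between an escaped and an unescaped dot shows up as soon as the input contains something that merely looks similar. The sketch below uses two made-up lines instead of the real /etc/hosts.

```shell
# Hypothetical input: one real loopback entry and one look-alike string.
hosts=$(printf '%s\n' '127.0.0.1 localhost' '127a0b0c1 not-an-address')

# Unescaped dots match any character, so both lines are selected.
unescaped=$(printf '%s\n' "$hosts" | sed -n '/127.0.0.1/p' | wc -l | tr -d ' ')

# Escaped dots match only a literal ".".
escaped=$(printf '%s\n' "$hosts" | sed -n '/127\.0\.0\.1/p')

echo "unescaped matches: $unescaped"
echo "escaped match: $escaped"
```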

Character sets ([0-9])

Print the lines that contain 2, 3, or 4:

[root@sishen ~]# sed -n '/[234]/p' employee.txt

102,Jason Smith,IT Manager

103,Raj Reddy,Sysadmin

104,Anand Ram,Developer

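Note: inside square brackets, a hyphen specifies a range of characters. For example, [0123456789] can be written as [0-9], and letters can be written as [a-z] or [A-Z], and so on.

Another way to print the lines that contain 2, 3, or 4: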

[root@sishen ~]# sed -n '/[2-4]/p' employee.txt

102,Jason Smith,IT Manager

103,Raj Reddy,Sysadmin

104,Anand Ram,Developer

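Other regular expressions

The OR operator (\|)

The pipe symbol matches either of the subexpressions on its two sides: subexpression1\|subexpression2 matches subexpression1 or subexpression2. In basic regular expressions it must be written as \|.

Print the lines that contain 101 or 102:

[root@sishen ~]# sed -n '/101\|102/p' employee.txt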

101,John Doe,CEO

102,Jason Smith,IT Manager

Print the lines that contain the digit 2 or 3, or that contain 105:

[root@sishen ~]# sed -n '/[2-3]\|105/p' employee.txt

102,Jason Smith,IT Manager

103,Raj Reddy,Sysadmin

105,Jane Miller,Sales Manager
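
Alternation is often combined with grouping so that only part of the expression varies. The sketch below, which assumes GNU sed, selects employees 101 and 105 by ID and shows the same expression in extended syntax (-E), where the backslashes are dropped.

```shell
# The five data lines of employee.txt, recreated inline.
employees=$(printf '%s\n' \
  '101,John Doe,CEO' \
  '102,Jason Smith,IT Manager' \
  '103,Raj Reddy,Sysadmin' \
  '104,Anand Ram,Developer' \
  '105,Jane Miller,Sales Manager')

# Basic syntax: grouping is \( \) and alternation is \| (a GNU extension).
bre=$(printf '%s\n' "$employees" | sed -n '/^10\(1\|5\),/p')

# Extended syntax (-E): the same expression without backslashes.
ere=$(printf '%s\n' "$employees" | sed -E -n '/^10(1|5),/p')

printf '%s\n' "$bre"
```

Anchoring the ID with ^ and the comma keeps digits elsewhere on the line from causing a match.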

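Exactly m occurrences (\{m\})

A regular expression followed by \{m\} matches exactly m occurrences of that expression.

First create a test file: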

[root@sishen ~]# vim number.txt

1

12

123

1234

12345

123456

Print the lines that contain any digit (for this file, that means every line):

[root@sishen ~]# sed -n '/[0-9]/p' number.txt

1

12

123

1234

12345

123456

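Print the lines that consist of exactly 5 digits:

[root@sishen ~]# sed -n '/^[0-9]\{5\}$/p' number.txt

12345

Note that the start (^) and end ($) anchors are required here; without them, any line that contains at least 5 consecutive digits would match.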
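
The effect of the anchors can be checked directly. This sketch recreates number.txt inline and runs the expression with and without ^ and $.

```shell
# Same content as number.txt.
numbers=$(printf '%s\n' 1 12 123 1234 12345 123456)

# Without anchors: any line containing 5 consecutive digits matches.
unanchored=$(printf '%s\n' "$numbers" | sed -n '/[0-9]\{5\}/p')

# With anchors: the whole line must be exactly 5 digits.
anchored=$(printf '%s\n' "$numbers" | sed -n '/^[0-9]\{5\}$/p')

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored form also selects 123456, because five of its six digits are enough to satisfy the expression.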

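Between m and n occurrences (\{m,n\})

A regular expression followed by \{m,n\} matches at least m and at most n occurrences of that expression. m and n cannot be negative and should not exceed 255, the upper bound that POSIX guarantees.

Print the lines that consist of 3 to 5 digits:

[root@sishen ~]# sed -n '/^[0-9]\{3,5\}$/ p' number.txt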

123

1234

12345

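A regular expression followed by \{m,\} matches at least m occurrences, with no upper limit. Similarly, GNU sed accepts \{,n\} for at most n occurrences; the minimum in that case is zero, not one.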
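
A short sketch of the open-ended forms, again with number.txt recreated inline. The upper-bound case is written as \{0,2\} rather than \{,2\} so that it does not rely on the GNU-specific shorthand.

```shell
# Same content as number.txt.
numbers=$(printf '%s\n' 1 12 123 1234 12345 123456)

# \{4,\}: at least 4 digits, no upper limit.
at_least_4=$(printf '%s\n' "$numbers" | sed -n '/^[0-9]\{4,\}$/p')

# \{0,2\}: at most 2 digits (zero digits would also be allowed).
at_most_2=$(printf '%s\n' "$numbers" | sed -n '/^[0-9]\{0,2\}$/p')

printf 'at least 4:\n%s\nat most 2:\n%s\n' "$at_least_4" "$at_most_2"
```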

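Word boundaries (\b)

\b matches the boundary at the beginning (\bxx) or end (xx\b) of a word. So \bthe\b matches the but not they, while \bthe matches both the and they.

First create a test file: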

[root@sishen ~]# vim words.txt

word matching using:the

word matching using:thethe

word matching using:they

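Print the lines that contain the as a whole word:

[root@sishen ~]# sed -n '/\bthe\b/ p' words.txt

word matching using:the

Note that without the trailing \b, the expression matches every line that contains a word starting with the:

[root@sishen ~]# sed -n '/\bthe/ p' words.txt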

word matching using:the

word matching using:thethe

word matching using:they
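
Word boundaries are just as useful in substitutions. The sketch below, which assumes GNU sed (\b is a GNU extension), upper-cases the standalone word only and leaves thethe and they untouched.

```shell
# Same content as words.txt.
words=$(printf '%s\n' \
  'word matching using:the' \
  'word matching using:thethe' \
  'word matching using:they')

# Only a "the" with a boundary on both sides is replaced.
result=$(printf '%s\n' "$words" | sed 's/\bthe\b/THE/g')

printf '%s\n' "$result"
```

The colon counts as a non-word character, so there is a boundary between it and the following "the".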

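Back references (\n)

Wrapping part of a regular expression in \( and \) turns it into a group, which can then be referred to later as \1, \2, and so on.

Print only the lines in which the appears twice in a row:

[root@sishen ~]# sed -n '/\(the\)\1/ p' words.txt

word matching using:thethe

Likewise, "\([0-9]\)\1" matches two consecutive identical digits, such as 11, 22, 33, and so on.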
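
Groups can also be referenced from the replacement part of an s command. The following sketch rearranges the fields of one employee record and then checks the doubled-digit expression on two made-up numbers; the output layout is only an illustration.

```shell
# Capture ID, name and title, then emit them in a different order.
reordered=$(printf '%s\n' '101,John Doe,CEO' |
  sed 's/^\([0-9]*\),\([^,]*\),\(.*\)$/\2 (\3) #\1/')

# \([0-9]\)\1 needs the same digit twice in a row: 1223 has "22", 1234 has none.
doubled=$(printf '%s\n' '1223' '1234' | sed -n '/\([0-9]\)\1/p')

printf '%s\n%s\n' "$reordered" "$doubled"
```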

Using regular expressions in sed substitutions

Replace the last two characters of every line in employee.txt with ",Not Defined":

[root@sishen ~]# sed -n 's/..$/,Not Defined/ p' employee.txt

101,John Doe,C,Not Defined

102,Jason Smith,IT Manag,Not Defined

103,Raj Reddy,Sysadm,Not Defined

104,Anand Ram,Develop,Not Defined

105,Jane Miller,Sales Manag,Not Defined

Delete Manager and everything after it on the line:

[root@sishen ~]# sed 's/Manager.*//' employee.txt | cat -A

101,John Doe,CEO$

102,Jason Smith,IT $

103,Raj Reddy,Sysadmin$

104,Anand Ram,Developer$

105,Jane Miller,Sales $

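Note: the original text does not include "| cat -A"; it was added here to make the trailing space on lines 102 and 105 visible.

Delete all lines that start with #: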

[root@sishen ~]# cat employee.txt

101,John Doe,CEO

102,Jason Smith,IT Manager

103,Raj Reddy,Sysadmin

104,Anand Ram,Developer

105,Jane Miller,Sales Manager

#106,Jane Miller,Sales Manager

#107,Jane Miller,Sales Manager

[root@sishen ~]# sed -e 's/^#.*// ; /^$/d' employee.txt

101,John Doe,CEO

102,Jason Smith,IT Manager

103,Raj Reddy,Sysadmin

104,Anand Ram,Developer

105,Jane Miller,Sales Manager

[root@sishen ~]# sed '/^#/d' employee.txt

101,John Doe,CEO

102,Jason Smith,IT Manager

103,Raj Reddy,Sysadmin

104,Anand Ram,Developer

105,Jane Miller,Sales Manager

First create the file test.html:

[root@sishen ~]# vim test.html

<html><body><h1>Hello word!</h1></body></html>

Strip all HTML tags from test.html:

[root@sishen ~]# sed 's/<[^>]*>//g' test.html

Hello word!
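
The bracket expression [^>]* matters here because sed's * is greedy. The sketch below compares it with the tempting alternative <.*> on the same line of HTML.

```shell
# Same content as test.html.
html='<html><body><h1>Hello word!</h1></body></html>'

# <.*> stretches from the first "<" to the last ">", swallowing the text as well.
greedy=$(printf '%s\n' "$html" | sed 's/<.*>//g')

# <[^>]*> stops at the first ">", so each tag is removed on its own.
bounded=$(printf '%s\n' "$html" | sed 's/<[^>]*>//g')

printf 'greedy: [%s]\nbounded: [%s]\n' "$greedy" "$bounded"
```

The greedy version leaves an empty line, while the bounded version keeps the text between the tags.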

Delete all comments and blank lines:

[root@sishen ~]# sed -e 's/#.*//;/^$/ d' /etc/profile

Delete only the comment lines, keeping blank lines:

[root@sishen ~]# sed '/^#.*/d' /etc/profile

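sed can also convert DOS line endings (CR/LF) to the Unix format. When a DOS-format file is copied to a Unix system, every line ends with an extra carriage return (CR).

Use sed to convert a DOS-format file to the Unix format: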

[root@sishen ~]# sed 's/.$//' filename
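
One caveat with 's/.$//' is that it removes the last character of every line, whatever that character is. A more defensive variant, sketched below under the assumption that GNU sed is in use (it understands \r in a regular expression), removes only a trailing carriage return. The sample input mixes one DOS-style line and one Unix-style line.

```shell
# 's/.$//' also damages lines that have no carriage return.
any_char=$(printf 'one\r\nthree\n' | sed 's/.$//')

# 's/\r$//' strips a trailing CR and leaves other lines alone.
cr_only=$(printf 'one\r\nthree\n' | sed 's/\r$//')

printf 'any char:\n%s\nCR only:\n%s\n' "$any_char" "$cr_only"
```

With the first command the Unix-style line "three" loses its final letter; with the second it is preserved.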
Original article: https://www.cnblogs.com/zd520pyx1314/p/6061339.html