Basic Regular Expressions

(I) Introduction to regular expressions

Regular expressions work on the contents of files, that is, on characters.

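REGEXP: a pattern made up of a class of special characters plus ordinary text characters. Some of these characters (metacharacters) do not stand for their literal meaning; instead they provide control or wildcard functions.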

Programs that support them: grep, sed, awk, vim, less, nginx, varnish, and others.

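There are two kinds:

Basic regular expressions: BRE

Extended regular expressions: ERE, used by grep -E and egrep

Regular expression engine:

A software module that checks and processes regular expressions using one of several algorithms, for example PCRE (Perl Compatible Regular Expressions)

Metacharacter categories: character matching, repetition counts, position anchoring, grouping

File names can be matched with wildcards

* is a wildcard and belongs to file management. Wildcards do fuzzy matching of file names.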

[root@centos72 ~]# ls  a*
aaa  aa.txt  access_log  anaconda-ks.cfg  anan.diff
[root@centos72 ~]# ls  a*  -l
-rw-r--r--. 1 root root        9 May  7 13:28 aaa
-rw-r--r--. 1 root root       27 May  7 19:11 aa.txt
-rw-r--r--. 1 root root 14372536 May  7 22:31 access_log
-rw-------. 1 root root     1592 Jan 13 00:22 anaconda-ks.cfg
-rw-r--r--. 1 root root      359 May  7 23:07 anan.diff

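Regular expressions match strings, not file names. They are a general-purpose technique that developers use as well, which is why they are so important.

Once you have learned basic regular expressions, you have essentially learned extended regular expressions too.

Regular expressions involve algorithms. A regex engine is like a car's engine; in practice it is simply a piece of software.

Perl is extremely powerful and flexible, which is exactly what makes it hard for many people to master.

Most work on code is maintenance, and code is modified frequently, so code that is easy to understand is best.

The diff tool helped open-source software grow, because it makes comparing code easy.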

[root@centos72 ~]# rpm  -q  pcre
pcre-8.32-17.el7.x86_64
[root@centos72 ~]# which  pcre
/usr/bin/which: no pcre in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
[root@centos72 ~]# yum  whatprovides  pcre
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
pcre-8.32-17.el7.x86_64 : Perl-compatible regular expression library
Repo        : base



pcre-8.32-17.el7.x86_64 : Perl-compatible regular expression library
Repo        : @anaconda

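Viewing the help

The regex(7) man page begins with "Regular expressions ("RE"s), as defined in POSIX.2". This means they follow POSIX, the international standard for developing software. The page describes two forms, "extended" REs and "basic" REs.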

[root@centos72 ~]# man  7 regex
No manual entry for regex in section 7
[root@centos72 ~]# yum install man-pages
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Resolving Dependencies
--> Running transaction check
---> Package man-pages.noarch 0:3.53-5.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================
 Package                   Arch                   Version                    Repository            Size
========================================================================================================
Installing:
 man-pages                 noarch                 3.53-5.el7                 base                 5.0 M

Transaction Summary
========================================================================================================
Install  1 Package

Total download size: 5.0 M
Installed size: 4.6 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : man-pages-3.53-5.el7.noarch                                                          1/1 
  Verifying  : man-pages-3.53-5.el7.noarch                                                          1/1 

Installed:
  man-pages.noarch 0:3.53-5.el7                                                                         

Complete!
[root@centos72 ~]#  man 7 regex

(II) BRE metacharacters: character matching

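. matches any single character

[ ] matches any single character in the specified set

[^ ] matches any single character outside the specified set

[:alnum:] letters and digits

[:alpha:] any English upper- or lowercase letter, i.e. A-Z, a-z

[:lower:] lowercase letters   [:upper:] uppercase letters

[:blank:] blank characters (space and tab)

[:space:] horizontal and vertical whitespace (a wider range than [:blank:])

[:cntrl:] non-printable control characters (backspace, delete, bell...)

[:digit:] decimal digits   [:xdigit:] hexadecimal digits

[:graph:] printable non-whitespace characters

[:print:] printable characters

[:punct:] punctuation characters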
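Here is a small sketch of some of these classes in use, assuming GNU grep. The four sample lines are made up and printed with printf. sample is a hypothetical helper name. Each comment gives the output expected from that command.

```shell
#!/bin/sh
# sample is a hypothetical helper that prints four made-up lines:
# "abc", "A1", "_x" and a line holding a single tab.
sample() { printf 'abc\nA1\n_x\n\t\n'; }

# The class name sits inside an outer bracket expression: [[:digit:]].
sample | grep '[[:digit:]]'      # prints: A1
sample | grep '[[:upper:]]'      # prints: A1
# The underscore is punctuation, not part of [:alnum:].
sample | grep '[[:punct:]]'      # prints: _x
# Only the tab line contains a space or tab, so the count is 1.
sample | grep -c '[[:blank:]]'   # prints: 1
```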

(1) . matches any single character

[root@centos72 ~]# grep  r..t  /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
[root@centos72 ~]# grep  root  /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

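(2) [ ] matches any single character in the specified set

This works like the bracket wildcard for file names; only the place where it is used differs.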

[root@centos72 ~]# grep  [abco][abo]  /etc/passwd
root:x:0:0:root:/root:/bin/bash
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
wang:x:1000:1000:wang:/home/wang:/bin/bash
[root@centos72 ~]# grep  r[abco][abo]t  /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

[root@centos72 ~]# echo   raat   |  grep  r[abco][abo]t 
raat
[root@centos72 ~]# echo   ract   |  grep  r[abco][abo]t 
[root@centos72 ~]# echo   rabt   |  grep  r[abco][abo]t 
rabt
[root@centos72 ~]# echo   raot   |  grep  r[abco][abo]t 
raot

(3) [^] matches any single character outside the specified set

[root@centos72 ~]# echo   ract   |  grep  r[abco][^abo]t 
ract
[root@centos72 ~]# echo   ract   |  grep  r[^abco][^abo]t

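(4) [:digit:] decimal digits, [:xdigit:] hexadecimal digits

Extracting the major version number

Scripts often need to act differently depending on the version.

[[:digit:]] needs two pairs of brackets. [:digit:] is the class name (the digits 0-9). The outer brackets form the bracket expression that matches one character from that class.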

[root@centos72 ~]# cat  /etc/centos-release  |  grep   [[:digit:]]
CentOS Linux release 7.5.1804 (Core)

We only want the major version number, 6 or 7.

[root@centos72 ~]# cat  /etc/centos-release  |  grep   -o  [[:digit:]]  
7
5
1
8
0
4

head keeps only the first one:

[root@centos72 ~]# cat  /etc/centos-release  |  grep   -o  [[:digit:]]  | head -n1
7

On CentOS 6:

[root@centos65 ~]# cat  /etc/centos-release  |  grep   -o  [[:digit:]]  
6
8

[root@centos65 ~]# cat  /etc/centos-release  |  grep   -o  [[:digit:]]  | head -n1
6

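A two-digit major version (10 or later) must also be handled. Below, a test copy, /app/centos-release, was edited to say 17. Because -o [[:digit:]] prints one digit per line, head -n1 would return 1 instead of 17.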

[root@centos72 ~]# cat  /app/centos-release 
CentOS Linux release 17.5.1804 (Core) 
[root@centos72 ~]# cat  /app/centos-release  |  grep   -o  [[:digit:]]  
1
7
5
1
8
0
4
[root@centos72 ~]# cat  /app/centos-release  |  grep   -w  [[:digit:]]  
CentOS Linux release 17.5.1804 (Core)

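(III) Repetition counts

A repetition operator goes right after the character it applies to and specifies how many times that character must occur.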

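* matches the preceding character any number of times, including 0

Greedy mode: match as much as possible

.* any number of any characters

\? matches the preceding character 0 or 1 times

\+ matches the preceding character at least once

\{n\} matches the preceding character exactly n times

\{m,n\} matches the preceding character at least m and at most n times

\{,n\} matches the preceding character at most n times

\{n,\} matches the preceding character at least n times

In BRE, ?, +, { and } are ordinary characters unless they are preceded by a backslash; \? and \+ are GNU extensions. In ERE (grep -E) the same operators are written without the backslash.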

ab* matches a, ab, abb, abbb, abbbb, ...

So the * applies only to b; the a is just along for the ride.
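To compare the repetition operators side by side, here is a sketch assuming GNU grep, which understands \? and \+ in BRE. The four lines are invented sample data. The -x option forces the pattern to match the whole line, so each result shows exactly which counts of b are accepted. The comments list the printed lines.

```shell
#!/bin/sh
# sample prints four made-up lines: a, ab, abb, abbb.
sample() { printf 'a\nab\nabb\nabbb\n'; }

# -x: the pattern must match the whole line.
sample | grep -x 'ab*'         # a, ab, abb, abbb  (b zero or more times)
sample | grep -x 'ab\?'        # a, ab             (b zero or one time)
sample | grep -x 'ab\+'        # ab, abb, abbb     (b at least once)
sample | grep -x 'ab\{2\}'     # abb               (b exactly twice)
sample | grep -x 'ab\{1,2\}'   # ab, abb           (b once or twice)
sample | grep -x 'ab\{2,\}'    # abb, abbb         (b at least twice)
# ERE (grep -E) writes the same interval without backslashes.
sample | grep -xE 'ab{2,}'     # abb, abbb
```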

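The * of file wildcards is not the same as the * of regular expressions.

Wildcards work on file names. For example, the commands below match the files whose names start with a.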

[root@centos72 ~]# ls  a*
aaa  aa.txt  access_log  anaconda-ks.cfg  anan.diff
[root@centos72 ~]# ll  a*  
-rw-r--r--. 1 root root        9 May  7 13:28 aaa
-rw-r--r--. 1 root root       27 May  7 19:11 aa.txt
-rw-r--r--. 1 root root 14372536 May  7 22:31 access_log
-rw-------. 1 root root     1592 Jan 13 00:22 anaconda-ks.cfg
-rw-r--r--. 1 root root      359 May  7 23:07 anan.diff

[root@centos72 ~]# grep  root  /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@centos72 ~]# grep  ro*t  /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

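(2) .* any number of any characters

This works because . matches any single character and * repeats the preceding item any number of times.

Greedy mode: match as much as possible. This combination is used very often.

Mnemonic from the original Chinese: only after eating 点心 ("dim sum", whose first character 点 means "dot") do you have the energy to do anything. Likewise, dot-star matches anything.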

[root@centos72 ~]# grep   r.*t  /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

[root@centos72 ~]# ls  |  grep  a.*
aaa
aa.txt
access_log
anaconda-ks.cfg

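Note: when the pattern could also be read as a file-name wildcard, put it in double quotes; otherwise the shell treats it as a wildcard.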

[root@centos72 ~]# grep  "a*"  anaconda-ks.cfg 
#version=DEVEL
# System authorization information
auth --enableshadow --passalgo=sha512
# Use CDROM installation media
cdrom
# Use graphical install
graphical
# Run the Setup Agent on first boot
firstboot --enable
ignoredisk --only-use=sda
# Keyboard layouts
keyboard --vckeymap=us --xlayouts='us'
# System language
lang en_US.UTF-8

# Network information
network  --bootproto=dhcp --device=ens33 --onboot=off --ipv6=auto --no-activate
network  --hostname=centos72.huawei.com

# Root password
rootpw --iscrypted $6$3ZpKJEd3ctkruWkF$ACv/Y4HSNb4lTqk4Gbol157B2lHw0AVcKM1rjEshEOrMcIIXw1DvoPPCZy3y3i.SijcTdTAfvFs/uFPwLxKd51
# System services
services --disabled="chronyd"
# System timezone
timezone Asia/Shanghai --isUtc --nontp
user --name=wang --password=$6$PqqaCIq7qipkXclF$5idE9A8TzG/yLzqHbmlSg9cVaNUmxPG85y/K81a0KSrosFH/srLzY0HQxeTUMZKs.KVoyJOphaA8Xz.nidUF// --iscrypted --gecos="wang"
# System bootloader configuration
bootloader --location=mbr --boot-drive=sda
# Partition clearing information
clearpart --none --initlabel
# Disk partitioning information
part swap --fstype="swap" --ondisk=sda --size=2048
part /app --fstype="xfs" --ondisk=sda --size=20480
part / --fstype="xfs" --ondisk=sda --size=51200
part /boot --fstype="xfs" --ondisk=sda --size=1024

%packages
@^minimal
@core

%end

%addon com_redhat_kdump --disable --reserve-mb='auto'

%end

%anaconda
pwpolicy root --minlen=6 --minquality=1 --notstrict --nochanges --notempty
pwpolicy user --minlen=6 --minquality=1 --notstrict --nochanges --emptyok
pwpolicy luks --minlen=6 --minquality=1 --notstrict --nochanges --notempty
%end

[root@centos65 ~]# cat  anaconda-ks.cfg  |  grep  a*

[root@centos72 ~]# grep  a* anaconda-ks.cfg 
[root@centos72 ~]# ls  anaconda-ks.cfg   |  grep  a*
[root@centos72 ~]# ls  anaconda-ks.cfg   |  grep  "a*"
anaconda-ks.cfg

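Without quotes, the shell expands a* into the names of the matching files before grep runs. grep then receives those names as its pattern and file arguments, which is why the unquoted commands above print nothing. With quotes, a* reaches grep unchanged and is used as a regular expression.

The names piped into grep are just text, so quote the regular expression. Note that f1 and the other names without an a are printed too. a* also matches zero a's, so it matches every line.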
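A minimal sketch of what the shell does with an unquoted a*. It runs in a throw-away directory created with mktemp, so the file names are predictable. The names aaa, anaconda-ks.cfg and f1 are chosen only for illustration.

```shell
#!/bin/sh
# Use a throw-away directory so the glob result is predictable.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch aaa anaconda-ks.cfg f1

# Unquoted: the shell replaces a* with the matching file names first.
echo grep a*        # prints: grep aaa anaconda-ks.cfg
# Quoted: grep receives the two characters a* as its regex.
echo grep "a*"      # prints: grep a*
# As a regex, a* also matches zero a's, so every name matches, f1 included.
ls | grep "a*"      # prints: aaa, anaconda-ks.cfg, f1 (one per line)

cd / && rm -rf "$dir"
```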

[root@centos72 ~]# ls  |  grep  "a*"
aaa
aa.txt
access_log
anaconda-ks.cfg
f1
f2
f3
f4
f5
grep

Single quotes work as well:

[root@centos72 ~]# ls  |  grep  'a*'
aaa
aa.txt
access_log
anaconda-ks.cfg
f1
f2
f3
f4
f5
grep

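\? matches the preceding character 0 or 1 times, i.e. the preceding character is optional

The regular expression is quoted so the shell neither treats ? as a wildcard nor removes the backslash.

The backslash is required because in BRE a bare ? is an ordinary character; GNU grep treats \? as the 0-or-1 operator.

[root@centos72 ~]# cat  /etc/passwd  |  grep  "ba\?"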
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
wang:x:1000:1000:wang:/home/wang:/bin/bash
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin

[root@centos65 ~]# cat  anaconda-ks.cfg  |  grep  a\?
[root@centos65 ~]# cat  anaconda-ks.cfg  |  grep  "a\?"
# Kickstart file automatically generated by anaconda.

#version=DEVEL
install
cdrom
lang en_US.UTF-8
keyboard us
network --onboot no --device eth0 --bootproto dhcp --noipv6
rootpw  --iscrypted $6$PnGoqdV0v.gilohW$pJsYiUbd8ZRFVyVXnZzJQutfCR.WGGsJGREUV4r6IguF9mBPXog/UJVw7RBdnF4m76RuGaQHHBZiAv46LcugO1
firewall --service=ssh
authconfig --enableshadow --passalgo=sha512
selinux --enforcing
timezone Asia/Shanghai
bootloader --location=mbr --driveorder=sda --append="crashkernel=auto rhgb quiet"
# The following is the partition information you requested
# Note that any partitions you deleted are not expressed
# here so unless you clear all partitions first, this is
# not guaranteed to work
#clearpart --none

#part /boot --fstype=ext4 --size=1024
#part / --fstype=ext4 --size=50000
#part /app --fstype=ext4 --size=20000

#part swap --size=2048


repo --name="CentOS"  --baseurl=cdrom:sr0 --cost=100

%packages
@core
@server-policy
@workstation-policy
%end

Every line matches, because a\? is satisfied by zero or one a, and zero a's is always possible.

[root@centos72 ~]# cat  1.txt  |  grep  a\?
[root@centos72 ~]# cat  1.txt  |  grep  "a\?"
a
ab
1
2
3
b
c

[root@centos72 ~]# cat  1.txt 
a
ab
1
2
3
b
c

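Note that a line is printed as soon as the pattern matches somewhere in it, even if the match is empty.

With color highlighting, grep keeps searching after each match. In aa or aaa, every a is therefore highlighted as a separate one-character match. a\? itself never matches more than one a at a time.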
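The difference between "a line matches" and "what was matched" shows up with grep -o, which prints only the non-empty matches, one per line. Here is a sketch with made-up sample lines, assuming GNU grep:

```shell
#!/bin/sh
# sample prints three made-up lines: "1234", "aa" and "12a".
sample() { printf '1234\naa\n12a\n'; }

sample | grep -c 'a\?'   # prints: 3  (every line matches, "1234" via an empty match)
sample | grep -c 'a\+'   # prints: 2  (at least one a is required)
# -o skips empty matches and prints each non-empty one on its own line:
# a, a (both from "aa") and a (from "12a").
sample | grep -o 'a\?'
```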

[root@centos72 ~]# cat  2.txt  |  grep  "a\?"
a
aa
aaa
aaaa
aaaaa
1234
123456
12a
sdfg
dfgaa
dgbaaafv
dgvhaaaaa
[root@centos72 ~]# cat  2.txt  |  grep  "a\?" | wc
     12      12      66
[root@centos72 ~]# cat  2.txt  
a
aa
aaa
aaaa
aaaaa
1234
123456
12a
sdfg
dfgaa
dgbaaafv
dgvhaaaaa
[root@centos72 ~]# cat  2.txt  | wc
     12      12      66

With color output enabled, grep highlights the matched text in red.

[root@centos72 ~]# cat  3.txt  |  grep  "ba\?"
ba
b
bbb
badgkcg
fkfbljoajf
baakkfj
baaajko
[root@centos72 ~]# cat  3.txt  |  grep  "ba\?" | wc
      7       7      44

[root@centos72 ~]# cat  3.txt 
ba
b
bbb
badgkcg
aaajlnhl
fkfbljoajf
baakkfj
baaajko
[root@centos72 ~]# cat  3.txt |  wc
      8       8      53

\+ matches the preceding character at least once

[root@centos72 ~]# cat  /etc/passwd  |  grep  "ba\+"
root:x:0:0:root:/root:/bin/bash
wang:x:1000:1000:wang:/home/wang:/bin/bash

[root@centos72 ~]# cat  3.txt  |  grep  "ba\+"
ba
badgkcg
baakkfj
baaajko
[root@centos72 ~]# cat  3.txt 
ba
b
bbb
badgkcg
aaajlnhl
fkfbljoajf
baakkfj
baaajko

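\{n\} matches the preceding character exactly n times

Write the braces with the count between them, and escape each brace with a backslash. The character to be repeated goes in front.

The example below matches three consecutive b's.

[root@centos72 ~]#  cat  3.txt  |  grep  "b\{3\}"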
bbb
[root@centos72 ~]# cat  3.txt 
ba
b
bbb
badgkcg
aaajlnhl
fkfbljoajf
baakkfj
baaajko

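Greedy matching

A line containing 2 or more consecutive b's also matches b\{2\}, because a longer run always contains a run of exactly 2.

[root@centos72 ~]#  cat  3.txt  |  grep  "b\{2\}"
bbb
[root@centos72 ~]#  cat  3.txt  |  grep  "b\{1\}"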
ba
b
bbb
badgkcg
fkfbljoajf
baakkfj
baaajko

It is best to always quote regular expressions. Without quotes, the shell removes the backslashes from b\{3\}. grep then receives the literal string b{3} and finds nothing.

[root@centos72 ~]# echo  "bbb"  |   grep  "b\{1\}"
bbb
[root@centos72 ~]# echo  'bbb'  |   grep  "b\{1\}"
bbb
[root@centos72 ~]# echo  'bbb'  |   grep  "b\{2\}"
bbb
[root@centos72 ~]# echo  'bbb'  |   grep  "b\{3\}"
bbb
[root@centos72 ~]# echo  'bbb'  |   grep  b\{3\}
[root@centos72 ~]# echo  'bbb'  |   grep  b\{2\}
[root@centos72 ~]# echo  'bbb'  |   grep  b\{1\}
[root@centos72 ~]# echo  "bbb"  |   grep  b\{1\}
[root@centos72 ~]# echo  "bbb"  |   grep  b\{2\}

Single quotes work as well:

[root@centos72 ~]# cat  /etc/passwd  | grep  'o\{1\}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
wang:x:1000:1000:wang:/home/wang:/bin/bash

\{m,n\} matches the preceding character at least m and at most n times

[root@centos72 ~]# echo  "bbb"  |   grep  "b\{1,3\}"
bbb
[root@centos72 ~]# echo  "bbb"  |   grep  "b\{1,2\}"
bbb

[root@centos72 ~]# cat  /etc/fstab    |   grep  "u\{2,3\}"
[root@centos72 ~]# cat  /etc/fstab    |   grep  -i  "u\{2,3\}"
UUID=5998ead0-b370-4859-9153-ecf4e2b9dd84 /                       xfs     defaults        0 0
UUID=ac6bb7e3-fa78-4eb2-b00d-e85c421c1bb0 /app                    xfs     defaults        0 0
UUID=92886c3f-42a3-40f4-8cf7-c6890ca3a52e /boot                   xfs     defaults        0 0
UUID=104520e1-0e97-4248-8fd0-a21e7d88a881 swap                    swap    defaults        0 0

[root@centos72 ~]# cat  /etc/services | grep  "u\{2,3\}"
uucp-path       117/tcp
uucp-path       117/udp
uucp            540/tcp         uucpd           # uucp daemon
uucp            540/udp                 # uucpd
uucp-rlogin     541/tcp                 # uucp-rlogin
uucp-rlogin     541/udp                 # uucp-rlogin
uuidgen         697/tcp                 # UUIDGEN
uuidgen         697/udp                 # UUIDGEN
opequus-server  2400/tcp                # OpEquus Server
opequus-server  2400/udp                # OpEquus Server
suucp           4031/tcp                # UUCP over SSL
suucp           4031/udp                # UUCP over SSL
continuus       5412/tcp                # Continuus
continuus       5412/udp                # Continuus
aequus          23456/tcp               # Aequus Service
aequus-alt      23457/tcp               # Aequus Service Mgmt
[root@centos72 ~]#

[root@centos72 ~]# echo  "bbbbbbbbbb"  |   grep  "b\{5,9\}"
bbbbbbbbbb
[root@centos72 ~]# echo  "bbbbbbbbb"  |   grep  "b\{5,9\}"
bbbbbbbbb
[root@centos72 ~]# echo  "bbbbbbbb"  |   grep  "b\{5,9\}"
bbbbbbbb
[root@centos72 ~]# echo  "bbbbbbb"  |   grep  "b\{5,9\}"
bbbbbbb
[root@centos72 ~]# echo  "bbbbbb"  |   grep  "b\{5,9\}"
bbbbbb
[root@centos72 ~]# echo  "bbbbbbbbbb"  |  wc
      1       1      11

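A line is printed as soon as it contains at least m consecutive b's. Because the pattern is not anchored, the upper limit n does not exclude longer runs. The limit only shows up in the color highlighting: grep takes the longest match it can (at most n characters) and then continues matching after it.

The line below has exactly 30 b's. They are split into 6 matches of 5, so the whole line is highlighted.

[root@centos72 ~]# echo  "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"  |   grep  "b\{4,5\}"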
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
[root@centos72 ~]#  echo  "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"  |  wc
      1       1      31

The 14 b's below are split into matches of 4 + 4 + 4 + 2, so again the whole line is highlighted.

[root@centos72 ~]#  echo  "bbbbbbbbbbbbbb" | grep "b\{2,4\}"
bbbbbbbbbbbbbb
[root@centos72 ~]#  echo  "bbbbbbbbbbbbbb" | wc
      1       1      15

With \{3,7\}, the 10 b's are split into 7 + 3, so the whole line is highlighted as well.

[root@centos72 ~]#  echo  "bbbbbbbbbb" | wc
      1       1      11
[root@centos72 ~]#  echo  "bbbbbbbbbb" |  grep "b\{3,7\}"
bbbbbbbbbb

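A leftover shorter than m, however, is not highlighted. With \{3,5\}, a run of 7 b's is split into 5 + 2. The last 2 b's stay unhighlighted, although the line is still printed.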
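grep -o prints each match on its own line, which makes the splitting of a long run visible without relying on colors. This is a sketch assuming GNU grep. The last two commands add -x to require the whole line to be 3 to 5 b's.

```shell
#!/bin/sh
# -o prints every match on its own line, showing how a run of b's is split.
echo bbbbbbb | grep -o 'b\{3,5\}'          # prints: bbbbb (the leftover bb is too short)
echo bbbbbbbbbbbbbb | grep -o 'b\{2,4\}'   # prints: bbbb, bbbb, bbbb, bb
echo bbbbbbbbbb | grep -o 'b\{3,7\}'       # prints: bbbbbbb, bbb
# -x anchors the pattern to the whole line, so 7 b's no longer qualify.
echo bbbbbbb | grep -x 'b\{3,5\}' || echo 'no match'   # prints: no match
echo bbbb | grep -x 'b\{3,5\}'                         # prints: bbbb
```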

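\{,n\} matches the preceding character at most n times

The line has 7 b's, more than 5, yet it still matches, because the pattern only needs to find some run of at most 5 b's. Because the minimum is 0, \{,5\} would match even a line containing no b at all.

[root@centos72 ~]# echo  "bbbbbbb" |   wc
      1       1       8
[root@centos72 ~]# echo  "bbbbbbb" |    grep "b\{3,5\}"
bbbbbbb
[root@centos72 ~]# echo  "bbbbbbb" |    grep "b\{,5\}"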
bbbbbbb

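\{n,\} matches the preceding character at least n times

[root@centos72 ~]# echo  "bbbbbbb" |   wc
      1       1       8
[root@centos72 ~]# echo  "bbbbbbb" |    grep "b\{3,5\}"
bbbbbbb
[root@centos72 ~]# echo  "bbbbbbb" |    grep "b\{,5\}"
bbbbbbb
[root@centos72 ~]# echo  "bbbbbbb" |    grep "b\{5,\}"
bbbbbbb

wc reports 8 characters although there are only 7 b's. The trailing newline (0a in the hexdump below) also counts as a character.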

[root@centos72 ~]# cat  4.txt 
bbbbbbb
[root@centos72 ~]# cat  4.txt  | wc
      1       1       8
[root@centos72 ~]# hexdump   -C  4.txt 
00000000  62 62 62 62 62 62 62 0a                           |bbbbbbb.|
00000008

[root@centos72 ~]# cat  5.txt 
aaa
aaaa
aaaaa
aaaaaa
bcdaaaa
hdkfaaaa
hkflieaaaajdlhw
aa
a
fgbaa
[root@centos72 ~]# cat  5.txt  |  grep   a\{3,\}
[root@centos72 ~]# cat  5.txt  |  grep   "a\{3,\}"
aaa
aaaa
aaaaa
aaaaaa
bcdaaaa
hdkfaaaa
hkflieaaaajdlhw

The first three a's are required; any further a's are optional.

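Example 1: extracting the major version number

The method must work for both one- and two-digit versions.

CentOS 6 is currently the mainstream version, although CentOS 7 has been released. Once 8 comes out, 7 will become the mainstream; adoption usually lags one version behind.

Method 1

-o: print only the matched strings

\+ matches the preceding character at least once

Below, the major version is assumed to be 17.

[root@centos72 ~]# grep  -o  "[0-9]\+"   /app/centos-release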
17
5
1804
[root@centos72 ~]# grep  -o  "[0-9]\+"   /app/centos-release |  head  -n1
17

[root@centos65 ~]# echo  111  |   grep  -o  "[0-9]\+"
111
[root@centos65 ~]# echo  "111"  |   grep  -o  "[0-9]\+"
111
[root@centos65 ~]# echo  "123456"  |   grep  -o  "[0-9]\+"
123456
[root@centos65 ~]# echo  123456  |   grep  -o  "[0-9]\+"
123456

[root@centos65 ~]# cp  /etc/centos-release   /app/
[root@centos65 ~]# vim  /app/centos-release 
[root@centos65 ~]# cat  /app/centos-release 
CentOS release 16.8 (Final)
[root@centos65 ~]# grep  -o  "[0-9]\+"   /app/centos-release |  head  -n1
16

head 1 treats 1 as a file name. The number of lines must be given as -n1, or in the older short form -1.

[root@centos72 ~]# grep  -o  "[0-9]\+"  /etc/centos-release  | head  1
head: cannot open ‘1’ for reading: No such file or directory
[root@centos72 ~]# grep  -o  "[0-9]\+"  /etc/centos-release  | head  -n1
7
[root@centos72 ~]# grep  -o  "[0-9]\+"  /etc/centos-release  | head  -1
7

The current version:

[root@centos65 ~]# grep  -o  "[0-9]\+"  /etc/centos-release  | head  -n1
6
[root@centos65 ~]# grep  -o  "[0-9]\+"  /etc/centos-release  | head  -1
6
[root@centos65 ~]# grep  -o  "[0-9]\+"  /etc/centos-release  | head  1
head: cannot open `1' for reading: No such file or directory

[root@centos65 ~]# cat   /etc/centos-release 
CentOS release 6.8 (Final)
[root@centos72 ~]# cat   /etc/centos-release 
CentOS Linux release 7.5.1804 (Core)

Method 2:

Squeeze runs of spaces into a single space

[root@centos72 ~]# cat  /app/centos-release  | tr -s ' '
CentOS Linux release 17.5.1804 (Core)

Take field 4, using the space as the delimiter

[root@centos72 ~]# cat  /app/centos-release  | tr -s ' ' | cut -d" " -f4
17.5.1804

Then use the dot as the delimiter and keep field 1

[root@centos72 ~]# cat  /app/centos-release  | tr -s ' ' | cut -d" " -f4 |  grep   "[[:digit:]]\{2\}"
17.5.1804
[root@centos72 ~]# cat  /app/centos-release  | tr -s ' ' | cut -d" " -f4 |  grep   "[[:digit:]]\{2\}" |  cut -d. -f1
17

[root@centos65 ~]# cat  /app/centos-release |tr -s ' ' | cut -d" " -f3 |   grep   "[[:digit:]]\{1\}" |cut -d. -f1
16
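Method 1 can be wrapped in a small function, which makes it easy to test against the three release strings used above. major_version is a hypothetical name. The function takes the release string as an argument instead of reading /etc/centos-release.

```shell
#!/bin/sh
# major_version is a hypothetical helper: it applies Method 1 to a release
# string given as $1 instead of reading /etc/centos-release.
major_version() {
    # -o prints each run of digits on its own line; the first run is the major version.
    printf '%s\n' "$1" | grep -o '[0-9]\+' | head -n1
}

major_version 'CentOS Linux release 7.5.1804 (Core)'    # prints: 7
major_version 'CentOS release 6.8 (Final)'              # prints: 6
major_version 'CentOS Linux release 17.5.1804 (Core)'   # prints: 17
```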

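(IV) Position anchors

Position anchors pin down where a match occurs.
^ anchors to the start of the line; used at the far left of the pattern
$ anchors to the end of the line; used at the far right of the pattern
^PATTERN$ the pattern must match the whole line
^$ empty line
^[[:space:]]*$ blank line (empty or whitespace only)
\< or \b anchors to the start of a word; used at the left of the word pattern
\> or \b anchors to the end of a word; used at the right of the word pattern
\<PATTERN\> matches the whole word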
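This sketch shows how ^ and $ narrow a plain search. It uses three invented lines modelled on /etc/passwd and should work with any POSIX grep.

```shell
#!/bin/sh
# sample prints three made-up lines modelled on /etc/passwd.
sample() { printf 'root:x:0:0\noperator:/root\nroot\n'; }

sample | grep 'root'     # prints all three lines
sample | grep '^root'    # prints: root:x:0:0, root
sample | grep 'root$'    # prints: operator:/root, root
sample | grep '^root$'   # prints: root
```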

A plain search shows every line containing the keyword, wherever it occurs:

[root@centos72 ~]# grep  root  /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

(1) ^ anchors to the start of the line; used at the far left of the pattern

Show the lines that start with a given keyword

[root@centos72 ~]# grep  ^root  /etc/passwd
root:x:0:0:root:/root:/bin/bash
[root@centos72 ~]# grep  "^root"  /etc/passwd
root:x:0:0:root:/root:/bin/bash

[root@centos72 ~]# grep  '^root'  /etc/passwd
root:x:0:0:root:/root:/bin/bash

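(2) $ anchors to the end of the line; used at the far right of the pattern

Combining both anchors as ^PATTERN$ makes the pattern match the whole line.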

[root@centos72 ~]# grep  'bash$'  /etc/passwd
root:x:0:0:root:/root:/bin/bash
wang:x:1000:1000:wang:/home/wang:/bin/bash

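(3) Showing non-empty lines

An empty line has nothing between its start and its end. A blank line is a broader notion: it is either empty or contains only spaces or tabs.

^$ empty line

^[[:space:]]*$ blank line

[root@centos72 ~]# cat  /etc/issue 
\S
Kernel \r on an \m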

[root@centos72 ~]# cat  /etc/issue | grep  ^$

[root@centos72 ~]# cat  /etc/issue | grep  "^$"

[root@centos72 ~]# cat  /etc/issue | grep  "^$"  | wc
      1       0       1

In this file the last line contains no spaces, so the empty-line and blank-line patterns give the same result:

[root@centos72 ~]# cat /etc/issue  |  grep  ^[[:space:]]*$

[root@centos72 ~]# cat /etc/issue  |  grep  ^[[:space:]]*$ | wc
      1       0       1

cat -A marks each line end with $. It shows that the last line is empty, with no spaces:

[root@centos72 ~]# cat -A  /etc/issue
\S$
Kernel \r on an \m$
$

[root@centos72 ~]# cat  /etc/issue | grep  -v   "^$"  
\S
Kernel \r on an \m
[root@centos72 ~]# cat  /etc/issue | grep  -v   "^$"   | wc
      2       6      22

[root@centos72 ~]# cat /etc/issue  |  grep  -v ^[[:space:]]*$
\S
Kernel \r on an \m
[root@centos72 ~]# cat /etc/issue  |  grep  -v ^[[:space:]]*$  | wc
      2       6      22

Next, the last line was changed to contain spaces:

[root@centos72 ~]# cat -A  /etc/issue
\S$
Kernel \r on an \m$
       $

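(4) Showing blank lines

"^$" matches only empty lines. ^[[:space:]]*$ has a wider reach and matches both empty lines and blank lines.

The * means zero or more whitespace characters.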

[root@centos72 ~]# cat  /etc/issue | grep  "^$" 
[root@centos72 ~]# cat  /etc/issue | grep  "^$"  | wc
      0       0       0
[root@centos72 ~]# cat  /etc/issue | grep ^[[:space:]]*$  
       
[root@centos72 ~]# cat  /etc/issue | grep ^[[:space:]]*$  | wc
      1       0       8

[root@centos72 ~]# cat /etc/issue  |  grep  -v ^$  | wc
      3       6      30
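[root@centos72 ~]# cat /etc/issue  |  grep  -v ^$  
\S
Kernel \r on an \m
       
[root@centos72 ~]# cat /etc/issue  |  grep  -v ^[[:space:]]*$  
\S
Kernel \r on an \m
[root@centos72 ~]# cat /etc/issue  |  grep  -v ^[[:space:]]*$  | wc
      2       6      22

"^[[:space:]]$" and "^$" give the same result here: neither matches any line.

"^[[:space:]]$" matches a line consisting of exactly one space or tab, so the line of seven spaces does not match it.

"^$" matches only an empty line.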

[root@centos72 ~]# cat /etc/issue  |  grep  "^[[:space:]]$"  
[root@centos72 ~]# cat /etc/issue  |  grep  "^[[:space:]]$"   | wc
      0       0       0
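The same checks can be reproduced without editing /etc/issue. In this sketch, three sample lines imitate the modified file: text, an empty line, and seven spaces. The last command applies the blank-line pattern to a made-up config snippet to drop comments and blank lines, a common use of these anchors.

```shell
#!/bin/sh
# sample prints three made-up lines: "Kernel", an empty line, and seven spaces.
sample() { printf 'Kernel\n\n       \n'; }

sample | grep -c '^$'               # prints: 1  (only the empty line)
sample | grep -c '^[[:space:]]*$'   # prints: 2  (the empty line and the line of spaces)
sample | grep -c '^[[:space:]]$'    # prints: 0  (no line is exactly one space or tab)
sample | grep -v '^[[:space:]]*$'   # prints: Kernel

# Hypothetical config snippet: drop comment lines (optionally indented) and blank lines.
printf '# comment\n\nPort 22\n   # indented comment\n' |
    grep -v '^[[:space:]]*#' | grep -v '^[[:space:]]*$'   # prints: Port 22
```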

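(5) \< or \b anchors to the start of a word; used at the left of the word pattern

\< can be remembered as a caret ^ that has tipped over.

The left angle bracket marks the start of a word, since we read from left to right.

When deciding what counts as a word, letters, digits and underscores make up words. Every other character acts as a word separator.

[root@centos72 ~]# cat  /etc/passwd  | grep    "\<r" 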
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

[root@centos72 ~]# cat  /etc/issue  | grep    "\<K" 
Kernel \r on an \m
[root@centos72 ~]# cat  /etc/issue  | grep    "\<k" 
[root@centos72 ~]# cat  /etc/issue  | grep  -i   "\<k" 
Kernel \r on an \m

[root@centos72 ~]# echo  "aa_root"   | grep    "\<r"
[root@centos72 ~]# echo  "aa1root"   | grep    "\<r"
[root@centos72 ~]# echo  "aacroot"   | grep    "\<r"
[root@centos72 ~]# echo  "aa-root"   | grep    "\<r"
aa-root
[root@centos72 ~]# echo  "aa root"   | grep    "\<r"
aa root
[root@centos72 ~]# echo  "aa+ root"   | grep    "\<r"
aa+ root
[root@centos72 ~]# echo  "aa=root"   | grep    "\<r"
aa=root
[root@centos72 ~]# echo  "aa@root"   | grep    "\<r"
aa@root
[root@centos72 ~]# echo  "aa！root"   | grep    "\<r"
aa！root
[root@centos72 ~]# echo  "aa……root"   | grep    "\<r"
aa……root

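\> or \b anchors to the end of a word; used at the right of the word pattern

[root@centos72 ~]# cat  /etc/passwd  | grep    "h\>" 
root:x:0:0:root:/root:/bin/bash
wang:x:1000:1000:wang:/home/wang:/bin/bash

Avoid using \b as the word anchor. The same symbol serves as both the start and the end anchor, which makes it easy to mix up.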

[root@centos72 ~]# echo  "aa……root"   | grep    "r"
aa……root
[root@centos72 ~]# echo  "aa……rootr"   | grep    "r"
aa……rootr
[root@centos72 ~]# echo  "aa……rootr"   | grep    "r"

(6) \<PATTERN\> matches a whole word, i.e. an exact word match

[root@centos72 ~]# cat  /etc/passwd | grep  "\<root\>"
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

Note that cat /etc/passwd | grep root is a fuzzy (substring) match. So far both commands give the same result, because no line contains root inside a longer word.

[root@centos72 ~]# cat  /etc/passwd | grep  "\<root\>"
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@centos72 ~]# cat  /etc/passwd | grep  root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@centos72 ~]# cat  /etc/passwd | grep  "\<root\>"  | wc
      2       2      77
[root@centos72 ~]# cat  /etc/passwd | grep  root | wc
      2       2      77

Create a user whose name contains root

cat /etc/passwd | grep root matches substrings, so its reach is wider:

[root@centos72 ~]# useradd  rooter
[root@centos72 ~]# cat  /etc/passwd | grep  root 
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
rooter:x:1001:1001::/home/rooter:/bin/bash
[root@centos72 ~]# cat  /etc/passwd | grep  "\<root\>"  
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@centos72 ~]# cat  /etc/passwd | grep  root | wc
      3       3     120
[root@centos72 ~]# cat  /etc/passwd | grep  "\<root\>"  | wc
      2       2      77
[root@centos72 ~]#

The whole-word match gives the same result as grep's -w option:

[root@centos72 ~]# cat  /etc/passwd | grep -w   root 
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@centos72 ~]# cat  /etc/passwd | grep -w   root  | wc
      2       2      77
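This sketch contrasts \<, \> and -w on four invented lines, assuming GNU grep. It shows that the underscore counts as part of a word while the hyphen does not.

```shell
#!/bin/sh
# sample prints four made-up lines.
sample() { printf 'aa_root\naa-root\nroot\nrooter\n'; }

sample | grep '\<root\>'   # prints: aa-root, root
sample | grep -w 'root'    # prints: aa-root, root  (same as \<root\>)
sample | grep '\<root'     # prints: aa-root, root, rooter
sample | grep 'root\>'     # prints: aa_root, aa-root, root
```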

The file below contains many function names with underscores followed by (), for example apply_sysctl().

[root@centos72 ~]# cat /etc/init.d/functions

Method 1: function definitions end with {. Show the lines ending with {, then delete the brace with tr.

[root@centos72 ~]# cat /etc/init.d/functions  | grep  ".*{$"  
systemctl_redirect () {
checkpid() {
__kill_pids_term_kill_checkpids() {
__kill_pids_term_kill() {
__pids_var_run() {
__pids_pidof() {
daemon() {
killproc() {
pidfileofproc() {
pidofproc() {
status() {
echo_success() {
echo_failure() {
echo_passed() {
echo_warning() {
update_boot_stage() {
success() {
failure() {
passed() {
warning() {
action() {
strstr() {
is_ignored_file() {
convert2sec() {
is_true() {
is_false() {
apply_sysctl() {
[root@centos72 ~]# cat /etc/init.d/functions  | grep  ".*{$"  | tr -d   {
systemctl_redirect () 
checkpid() 
__kill_pids_term_kill_checkpids() 
__kill_pids_term_kill() 
__pids_var_run() 
__pids_pidof() 
daemon() 
killproc() 
pidfileofproc() 
pidofproc() 
status() 
echo_success() 
echo_failure() 
echo_passed() 
echo_warning() 
update_boot_stage() 
success() 
failure() 
passed() 
warning() 
action() 
strstr() 
is_ignored_file() 
convert2sec() 
is_true() 
is_false() 
apply_sysctl()

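Method 2

Inside the brackets, _ and the range a-Z are alternatives; any one of them matches one character. The range a-Z relies on the locale's collation order (in the C locale it is rejected as an invalid range), so [_[:alpha:]] is a safer spelling. This pattern misses two names: systemctl_redirect (), which has a space before the parentheses, and convert2sec(), which contains a digit.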

[root@centos72 ~]# grep  -o   "^[_a-Z]*()"   /etc/init.d/functions 
checkpid()
__kill_pids_term_kill_checkpids()
__kill_pids_term_kill()
__pids_var_run()
__pids_pidof()
daemon()
killproc()
pidfileofproc()
pidofproc()
status()
echo_success()
echo_failure()
echo_passed()
echo_warning()
update_boot_stage()
success()
failure()
passed()
warning()
action()
strstr()
is_ignored_file()
is_true()
is_false()
apply_sysctl()
[root@centos72 ~]#

Method 3

At least one letter, digit or underscore is required; whitespace before the parentheses is optional.

[root@centos72 ~]# grep  -o   "^[[:alnum:]_]\+[[:space:]]*()"   /etc/init.d/functions   
systemctl_redirect ()
checkpid()
__kill_pids_term_kill_checkpids()
__kill_pids_term_kill()
__pids_var_run()
__pids_pidof()
daemon()
killproc()
pidfileofproc()
pidofproc()
status()
echo_success()
echo_failure()
echo_passed()
echo_warning()
update_boot_stage()
success()
failure()
passed()
warning()
action()
strstr()
is_ignored_file()
convert2sec()
is_true()
is_false()
apply_sysctl()
[root@centos72 ~]# grep  -o   "^[[:alnum:]_]\+[[:space:]]*()"   /etc/init.d/functions   | wc
     27      28     384
[root@centos72 ~]#
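Methods 2 and 3 can be compared on a few hypothetical lines shaped like /etc/init.d/functions. In this sketch, [_[:alpha:]] stands in for [_a-Z] so the result does not depend on the locale. That substitution is an assumption, not the original command.

```shell
#!/bin/sh
# Hypothetical excerpt shaped like /etc/init.d/functions.
sample() {
    printf '%s\n' 'systemctl_redirect () {' 'checkpid() {' 'convert2sec() {' '    local pid' '}'
}

# Method 3: one or more letters/digits/underscores, optional spaces, then ().
# prints: systemctl_redirect (), checkpid(), convert2sec()
sample | grep -o '^[[:alnum:]_]\+[[:space:]]*()'

# Method 2 with [_[:alpha:]]: a digit or a space before () stops the match.
# prints: checkpid()
sample | grep -o '^[_[:alpha:]]*()'
```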

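(7) Grouping: \(\) binds one or more characters together so they are handled as a single unit.

The regex engine records the text matched by the pattern inside each group in internal variables,

named \1, \2, \3, ...

\1 refers to the characters matched by the pattern between the first left parenthesis from the left and its matching right parenthesis

Example: \(string1\+\(string2\)*\)
\1 : string1\+\(string2\)*
\2 : string2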

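(8) Backreferences: refer to the characters matched by the pattern in an earlier group, not to the pattern itself
Alternation: \|
Example: a\|b: a or b

C\|cat: C or cat

\(C\|c\)at: Cat or cat

Note that (), and | must be escaped with a backslash in BRE.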
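This sketch shows how grouping changes what \| applies to. It uses -x so that each pattern must match a whole line. It assumes GNU grep, since \| is a GNU extension in BRE. The sample lines are invented.

```shell
#!/bin/sh
# sample prints four made-up lines.
sample() { printf 'Cat\ncat\nC\ndog\n'; }

sample | grep -x 'C\|cat'       # prints: cat, C    (C, or cat)
sample | grep -x '\(C\|c\)at'   # prints: Cat, cat  (C or c, followed by at)
sample | grep -xE '(C|c)at'     # prints: Cat, cat  (the same pattern in ERE syntax)
```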

Below, \(a\|b\|c\) is a single unit whose alternatives are joined by OR.

[root@centos72 ~]# echo  ax bx cx  | grep  "\(a\|b\|c\)x"
ax bx cx

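Without the parentheses there is no grouping: a\|b\|cx means a, or b, or cx. The line still matches because it contains an a.

[root@centos72 ~]# echo  ax bx cx  | grep  "a\|b\|cx"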
ax bx cx

Show t repeated 2 or more times

[root@centos72 ~]# echo  rootrootroottt  | grep  "root\{2,\}"
rootrootroottt

Show root repeated 2 or more times

In BRE the parentheses and braces must be escaped with backslashes.

[root@centos72 ~]# echo  rootrootroot  | grep  "\(root\)\{2,\}"
rootrootroot
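grep -o shows where the interval attaches, with and without a group. This is a sketch assuming GNU grep:

```shell
#!/bin/sh
# Without a group, \{2,\} applies only to the t before it: "roo" plus two or more t's.
echo rootttt | grep -o 'root\{2,\}'                     # prints: rootttt
# With \( \), the interval applies to the whole word root.
echo rootrootroot | grep -o '\(root\)\{2,\}'            # prints: rootrootroot
echo root | grep '\(root\)\{2,\}' || echo 'no match'    # prints: no match
```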

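Any two characters may appear between a and b, including a space. Without quotes, however, echo  a  b prints a b with a single space: the shell passes a and b as separate arguments and echo joins them with one space. Only one character then sits between a and b, so there is no match.

[root@centos72 ~]# echo  axyb  | grep  "\(a..b\)"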
axyb

[root@centos72 ~]# echo  ab  | grep  "\(a..b\)"
[root@centos72 ~]# echo  a  b  | grep  "\(a..b\)"

To match axyb xx a12b yyy, use two groups:

[root@centos72 ~]# echo  axyb  xx  a12b  yyy  | grep  "\(a..b\).*\(a..b\).*"
axyb xx a12b yyy

Here are 3 different cases:

axyb xx a12b yyy

axyb xx axyb yyy

a12b xx a12b yyy

The same regular expression matches all three:

[root@centos72 ~]# echo  axyb  xx  a12b  yyy  | grep  "\(a..b\).*\(a..b\)*"
axyb xx a12b yyy
[root@centos72 ~]# echo  axyb  xx  axyb  yyy  | grep  "\(a..b\).*\(a..b\)*"
axyb xx axyb yyy
[root@centos72 ~]# echo  a12b  xx  a12b  yyy  | grep  "\(a..b\).*\(a..b\)*"
a12b xx a12b yyy

[root@centos72 ~]# echo  axyb  xx  a12b  yyy  | grep  "\(a..b\).*\(a..b\).*"
axyb xx a12b yyy
[root@centos72 ~]# echo  axyb  xx  axyb  yyy  | grep  "\(a..b\).*\(a..b\).*"
axyb xx axyb yyy
[root@centos72 ~]# echo  a12b  xx  a12b  yyy  | grep  "\(a..b\).*\(a..b\).*"
a12b xx a12b yyy

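In the last two cases the same string appears twice. The pattern can therefore avoid writing the group a second time.

\1 stands for the string that \(a..b\) actually matched.

[root@centos72 ~]# echo  a12b  xx  a12b  yyy  | grep  "\(a..b\).*\1.*"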
a12b xx a12b yyy

If there are two different groups:

[root@centos72 ~]# echo  a12b  xx  n12m  yyy  | grep  "\(a..b\).*\(n..m\).*"
a12b xx n12m yyy

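\1 recalls the string matched by the first group, \(a..b\). .* stands for any number of any characters. \2 recalls the string matched by the second group, \(x..y\).

Backreference: it refers to the characters matched by an earlier group, not to the pattern itself. In this example it means the matched strings, not the patterns \(a..b\) and \(x..y\).

\1 corresponds to the first group and \2 to the second.

So the text matched by \(a..b\) must appear twice, identically. The text matched by \(x..y\) must likewise appear twice, identically.

Case 1

Two completely identical halves:

[root@centos72 ~]# echo  a12bdggxxxxery a12bdggxxxxery |  grep   "\(a..b\).*\(x..y\).*\1.*\2"
a12bdggxxxxery a12bdggxxxxery
[root@centos72 ~]# echo  a12bdggxxxxerya12bdggxxxxery |  grep   "\(a..b\).*\(x..y\).*\1.*\2"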
a12bdggxxxxerya12bdggxxxxery

Case 2

The text matched by the groups repeats, but everything else differs:

a12b and xery each appear twice.

[root@centos72 ~]# echo  a12bdggxxxgdfdhfdsgxerya12bdggxxxfgdsgvntexery |  grep   "\(a..b\).*\(x..y\).*\1.*\2"
a12bdggxxxgdfdhfdsgxerya12bdggxxxfgdsgvntexery

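Case 3

a12b does not appear a second time (the second one is a34b), even though xery does appear twice. The line therefore does not match:

[root@centos72 ~]# echo  a12bdggxxxgdfdhfdsgxerya34bdggxxxfgdsexery |  grep   "\(a..b\).*\(x..y\).*\1.*\2"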

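\1 refers to the characters matched by the pattern between the first left parenthesis from the left and its matching right parenthesis.

\2 refers to the characters matched by the pattern between the second left parenthesis from the left and its matching right parenthesis.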

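Here \1 stands for the second occurrence of 1234 and of xyz respectively. It must repeat exactly the text the group matched.

This technique is particularly handy for search and replace.

[root@centos72 ~]# echo  12341234  | grep  "\(1..4\).*\1"
12341234
[root@centos72 ~]# echo  xyzxyz  | grep  "\(x.z\).*\1"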
xyzxyz
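Here is a sketch of both uses of \1, with made-up sample strings. In grep, \1 must repeat exactly the text the group captured. In sed's s command, it reuses that text in the replacement, which is the search-and-replace case mentioned above.

```shell
#!/bin/sh
# \1 must repeat the exact text captured by \(a..b\): a12b ... a12b matches, a12b ... a34b does not.
printf 'a12b xx a12b yyy\na12b xx a34b yyy\n' | grep '\(a..b\).*\1'   # prints: a12b xx a12b yyy
# In sed's s command, \1 and \2 insert the captured text into the replacement.
echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/'             # prints: world hello
```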

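Author: wang618
Source: https://www.cnblogs.com/wang618/
The copyright of this article belongs to the author and cnblogs. Reposting is welcome. Unless the author agrees otherwise, however, this notice must be kept and a link to the original article must be given in a prominent place on the page; otherwise the right to pursue legal liability is reserved.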

Original article: https://www.cnblogs.com/wang618/p/11078720.html