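Introduction
A while ago I dealt with a production issue: a server pulling a multicast stream lost the stream every 3-4 minutes, which caused service failures. After ruling out the switch and the multicast network,
we confirmed that the problem was on the server side.
Why did the multicast stream drop?
Field engineers confirmed with a packet capture that the switch sent IGMP general query packets but the server did not answer with IGMP membership reports. The IGMP entry on the switch timed out and was removed, so the stream stopped.
[Figure: packet capture analysis]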
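The 3-4 minute interval is in the range one would expect when a querier ages out a group that no longer sends reports. The following is only a rough plausibility check. It assumes the switch uses the IGMPv2 default timers from RFC 2236 (robustness variable 2, query interval 125 s, query response interval 10 s); the real switch settings are not given in this article and may differ.

```python
# Group membership interval as defined in RFC 2236:
#   robustness * query_interval + query_response_interval
# The default arguments are the RFC 2236 defaults. They are an assumption
# about the switch, whose actual configuration is unknown here.

def group_membership_interval(robustness=2, query_interval=125.0,
                              query_response_interval=10.0):
    """Seconds a querier waits without reports before dropping a group."""
    return robustness * query_interval + query_response_interval


if __name__ == "__main__":
    seconds = group_membership_interval()
    print(f"{seconds:.0f} s = {seconds / 60:.1f} min")
```

With the RFC defaults this gives 260 seconds, a little over four minutes, which is roughly in line with the observed interval.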
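How the rp_filter setting affects inbound packets
I will not go through the whole troubleshooting process; here is only the conclusion: the rp_filter setting prevented the system from answering IGMP general queries. After rp_filter was set to 0,
the system answered the switch's IGMP general queries normally and the multicast stream no longer dropped.
Even though policy routing was configured on the system, it did not have the expected effect: when the rp_filter logic performed its reverse path check, it still decided that source address validation had failed. This was
puzzling, so the answer had to come from the rp_filter mechanism and the kernel code.
A brief description of rp_filter
For a detailed description of rp_filter, see Documentation/networking/ip-sysctl.txt in the Linux kernel source, and the blog posts linked at the end of this article.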
rp_filter - INTEGER
0 - No source validation.
1 - Strict mode as defined in RFC3704 Strict Reverse Path
Each incoming packet is tested against the FIB and if the interface is not the best reverse path the packet check will fail.
By default failed packets are discarded.
2 - Loose mode as defined in RFC3704 Loose Reverse Path
Each incoming packet's source address is also tested against the FIB and if the source address is not reachable via any interface
the packet check will fail.
Current recommended practice in RFC3704 is to enable strict mode to prevent IP spoofing from DDos attacks.
If using asymmetric routing or other complicated routing, then loose mode is recommended.
The max value from conf/{all,interface}/rp_filter is used when doing source validation on the {interface}.
Default value is 0. Note that some distributions enable it in startup scripts.
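As this shows, rp_filter takes three values:
0: no source address validation;
1: strict mode, the strict reverse path defined in RFC3704. Every inbound packet is checked against the FIB; if the receiving interface is not the best reverse path to the packet's source, the check fails. By default, packets that fail the check are dropped;
2: loose mode, the loose reverse path defined in RFC3704. The source address of every inbound packet is checked against the FIB; if the source address is not reachable through any interface, the check fails.
RFC3704 currently recommends enabling strict mode to guard against IP spoofing in DDoS attacks. With asymmetric routing or other complicated routing, loose mode is recommended.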
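The kernel documentation quoted above says that the larger of conf/all/rp_filter and conf/{interface}/rp_filter is the value actually used on an interface. A minimal sketch of that rule follows. The helper names and the interface name eth1 are illustrative choices, and the procfs path is the usual location of these sysctls.

```python
from pathlib import Path


def effective_rp_filter(all_value, iface_value):
    """Source validation uses the larger of conf/all and conf/<iface>."""
    return max(all_value, iface_value)


def read_rp_filter(name, root="/proc/sys/net/ipv4/conf"):
    """Read rp_filter for 'all' or for an interface name from procfs."""
    return int(Path(root, name, "rp_filter").read_text().strip())


if __name__ == "__main__":
    iface = "eth1"  # illustrative interface name; adjust to the system
    try:
        value = effective_rp_filter(read_rp_filter("all"), read_rp_filter(iface))
        print(f"effective rp_filter on {iface}: {value}")
    except OSError as exc:
        print(f"cannot read rp_filter for {iface}: {exc}")
```

This is why setting only the per-interface value to 0 is not enough when conf/all/rp_filter is still 1.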
[Figure: diagram borrowed from another author illustrating what rp_filter does]
System routing configuration:
Policy routing rules
#ip rule
0: from all lookup local
200: from 176.100.1.74 lookup 4
200: from all to 176.100.1.74 lookup 4
200: from 176.100.1.71 lookup 2
200: from all to 176.100.1.71 lookup 2
200: from 176.100.1.73 lookup 3
200: from all to 176.100.1.73 lookup 3
32766: from all lookup main
32767: from all lookup default
Policy routing tables:
# ip r s t 2
default via 176.100.1.65 dev eth1 proto static src 176.100.1.71
176.100.1.64/27 dev eth1 proto static src 176.100.1.71
# ip r s t 3
default via 176.100.1.66 dev eth0 proto static src 176.100.1.73
176.100.1.64/27 dev eth0 proto static src 176.100.1.73
# ip r s t 4
default via 176.100.1.67 dev eth1 proto static src 176.100.1.74
176.100.1.64/27 dev eth1 proto static src 176.100.1.74
Main routing table
# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 176.100.1.65 0.0.0.0 UG 0 0 0 eth1
176.100.1.64 0.0.0.0 255.255.255.224 U 0 0 0 eth0
176.100.1.64 0.0.0.0 255.255.255.224 U 0 0 0 eth1
link-local 0.0.0.0 255.255.0.0 U 1005 0 0 eth0
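Analysis of the rp_filter code
The kernel performs the reverse path check in fib_validate_source, which is called from three places. The call chain is shown below.
The top-level entry point leading to fib_validate_source is ip_rcv, the main entry function for receiving inbound IP packets.
ip_rcv
--> ip_rcv_finish
    --> ip_route_input_noref ## if the skb has no dst entry yet (routing related), initialise the virtual path cache
        ## destination address is a multicast address: this is the branch analyzed here
        --> ip_route_input_mc --> fib_validate_source --> __fib_validate_source --> fib_lookup
        ## else: destination address is not a multicast address
        --> ip_route_input_slow
            --> fib_validate_source ## fib_lookup found an RTN_LOCAL route: do the reverse check, then take the local_input path
            ## an RTN_BROADCAST route was found and the source address is not all zeros: also do the reverse check
            ## neither RTN_LOCAL nor RTN_BROADCAST: call ip_mkroute_input to create a route cache entry
            --> ip_mkroute_input --> __mkroute_input --> fib_validate_source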
static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr, u8 tos, struct net_device *dev, int our)
When ip_route_input_mc is called, the daddr, saddr and tos arguments come from the destination address, source address and TOS fields of the IP packet.
int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
u8 tos, int oif, struct net_device *dev,
struct in_device *idev, u32 *itag)
fib_validate_source(skb, saddr, 0, tos, 0, dev, in_dev, &itag);
static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
                                 u8 tos, int oif, struct net_device *dev,
                                 int rpf, struct in_device *idev, u32 *itag)
{
    struct fib_result res;
    struct flowi4 fl4;

    ......

    /* Build the flow key fl4 used for the reverse route lookup */
    fl4.flowi4_oif = 0;
    fl4.flowi4_iif = oif;    /* the flow's inbound interface index is set from the oif argument */
    fl4.daddr = src;         /* for the reverse lookup, the packet's source IP becomes fl4's destination address */
    fl4.saddr = dst;         /* for the reverse lookup, the dst argument becomes fl4's source address; it is 0 here */
    fl4.flowi4_tos = tos;    /* this value is 0 in the IP packet here */
    fl4.flowi4_scope = RT_SCOPE_UNIVERSE;

    ......
    /* skb->mark is copied to flowi4_mark only if src_valid_mark is set on the interface */
    fl4.flowi4_mark = IN_DEV_SRC_VMARK(idev) ? skb->mark : 0;
    net = dev_net(dev);

    /* Perform the reverse route lookup */
    if (fib_lookup(net, &fl4, &res))
        goto last_resort;
    if (res.type != RTN_UNICAST) {
        if (res.type != RTN_LOCAL || !accept_local)
            goto e_inval;
    }

    ......

last_resort:
    if (rpf)
        goto e_rpf;
    *itag = 0;
    return 0;

e_inval:
    return -EINVAL;
e_rpf:
    return -EXDEV;    /* Cross-device link */
}
/* During the reverse path check, saddr is the source address from the IP header, and the destination address is passed as 0 */
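The assignments in __fib_validate_source can be mirrored in a few lines of user-space code. This is only a model of the flow key construction shown above, not kernel code; the FlowKey class and the function name are made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class FlowKey:
    daddr: int
    saddr: int
    iif: int
    oif: int
    mark: int


def reverse_flow_key(pkt_src, dst_arg, oif_arg, skb_mark, src_valid_mark):
    """Mirror the fl4 assignments made before the reverse route lookup."""
    return FlowKey(
        daddr=int(IPv4Address(pkt_src)),  # packet source becomes the lookup destination
        saddr=int(IPv4Address(dst_arg)),  # dst argument, 0 on the multicast path
        iif=oif_arg,                      # flowi4_iif takes the oif argument
        oif=0,
        mark=skb_mark if src_valid_mark else 0,  # mark is used only with src_valid_mark
    )


if __name__ == "__main__":
    # Multicast path: fib_validate_source(skb, saddr, 0, tos, 0, dev, in_dev, &itag)
    key = reverse_flow_key("176.100.1.66", "0.0.0.0", 0,
                           skb_mark=0, src_valid_mark=False)
    print(key)
```

The point to notice is that the lookup key carries no source address at all, so any rule of the form "from X" has nothing to match against.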
Because routing rules are configured on the system, the code to analyze is the __fib_lookup function.
static inline int fib_lookup(struct net *net, struct flowi4 *flp,
                             struct fib_result *res)
{
    struct fib_table *tb;
    int err = -ENETUNREACH;

    /* Routing rules are configured on this system, so this path is taken.
     * Once any rule has been added the flag is 1, and it stays 1 even if
     * the added rule is later deleted. */
    if (net->ipv4.fib_has_custom_rules)
        return __fib_lookup(net, flp, res);

    rcu_read_lock();
    /* Without policy routing rules, the local/main/default tables are searched directly */
    res->tclassid = 0;
    /* Search the local table */
    tb = rcu_dereference_rtnl(net->ipv4.fib_local);
    if (tb)
        err = fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF);

    if (!err)
        goto out;
    /* Search the main table */
    tb = rcu_dereference_rtnl(net->ipv4.fib_main);
    if (tb)
        err = fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF);

    if (!err)
        goto out;
    /* Search the default table */
    tb = rcu_dereference_rtnl(net->ipv4.fib_default);
    if (tb)
        err = fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF);

out:
    if (err == -EAGAIN)
        err = -ENETUNREACH;

    rcu_read_unlock();

    return err;
}
In __fib_lookup, fib_rules_lookup is used to find the corresponding routing table.
int __fib_lookup(struct net *net, struct flowi4 *flp, struct fib_result *res)
{
    struct fib_lookup_arg arg = {
        .result = res,
        .flags = FIB_LOOKUP_NOREF,
    };
    int err;

    /* Use the routing rules to find the routing table to search */
    err = fib_rules_lookup(net->ipv4.rules_ops, flowi4_to_flowi(flp), 0, &arg);
#ifdef CONFIG_IP_ROUTE_CLASSID
    if (arg.rule)
        res->tclassid = ((struct fib4_rule *)arg.rule)->tclassid;
    else
        res->tclassid = 0;
#endif

    if (err == -ESRCH)
        err = -ENETUNREACH;

    return err;
}
From the code we can see that the rule matching function is fib4_rule_match and the generic action function is fib4_rule_action. fib4_rule_match is analyzed in more detail below.
/* fib_rules.c */
static const struct fib_rules_ops __net_initconst fib4_rules_ops_template = {
    .family = AF_INET,
    .rule_size = sizeof(struct fib4_rule),    /* struct fib4_rule embeds struct fib_rule */
    .addr_size = sizeof(u32),
    .action = fib4_rule_action,               /* generic action function for policy routing rules */
    .match = fib4_rule_match,                 /* matching function for policy routing rules */
    .configure = fib4_rule_configure,
    .delete = fib4_rule_delete,
    .compare = fib4_rule_compare,
    .fill = fib4_rule_fill,
    .default_pref = fib_default_rule_pref,
    .nlmsg_payload = fib4_rule_nlmsg_payload,
    .flush_cache = fib4_rule_flush_cache,
    .nlgroup = RTNLGRP_IPV4_RULE,
    .policy = fib4_rule_policy,
    .owner = THIS_MODULE,
};
How are the rule operation functions tied to the configured routing rules?
/* During initialization, ip_fib_net_init calls fib4_rules_init to register the rule operation functions */
int __net_init fib4_rules_init(struct net *net)
{
    int err;
    struct fib_rules_ops *ops;

    ops = fib_rules_register(&fib4_rules_ops_template, net);
    if (IS_ERR(ops))
        return PTR_ERR(ops);

    err = fib_default_rules_init(ops);
    if (err < 0)
        goto fail;
    net->ipv4.rules_ops = ops;
    net->ipv4.fib_has_custom_rules = false;
    return 0;

fail:
    /* also cleans all rules already added */
    fib_rules_unregister(ops);
    return err;
}
The fib_rules_lookup function in fib_rules.c
int fib_rules_lookup(struct fib_rules_ops *ops, struct flowi *fl,
                     int flags, struct fib_lookup_arg *arg)
{
    struct fib_rule *rule;
    int err;

    rcu_read_lock();
    /* Walk the rule list and try to match each rule */
    list_for_each_entry_rcu(rule, &ops->rules_list, list) {
jumped:
        /* Rule matching; this ends up calling the match op from
         * fib4_rules_ops_template, i.e. fib4_rule_match */
        if (!fib_rule_match(rule, ops, fl, flags))
            continue;

        if (rule->action == FR_ACT_GOTO) {
            struct fib_rule *target;

            target = rcu_dereference(rule->ctarget);
            if (target == NULL) {
                continue;
            } else {
                rule = target;
                goto jumped;
            }
        } else if (rule->action == FR_ACT_NOP)
            continue;
        else
            err = ops->action(rule, fl, flags, arg);

        if (err != -EAGAIN) {
            if ((arg->flags & FIB_LOOKUP_NOREF) ||
                likely(atomic_inc_not_zero(&rule->refcnt))) {
                arg->rule = rule;
                goto out;
            }
            break;
        }
    }

    err = -ESRCH;
out:
    rcu_read_unlock();

    return err;
}

static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
                          struct flowi *fl, int flags)
{
    int ret = 0;

    /* In the policy routing rules configured here, mark_mask, iifindex and
     * oifindex are all 0, so none of the first three checks rejects the rule */
    if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
        goto out;

    if (rule->oifindex && (rule->oifindex != fl->flowi_oif))
        goto out;

    if ((rule->mark ^ fl->flowi_mark) & rule->mark_mask)
        goto out;

    /* This calls fib4_rule_match */
    ret = ops->match(rule, fl, flags);
out:
    return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
}
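The control flow of fib_rules_lookup can be summarized with a small user-space model: rules are visited in priority order, a rule that does not match is skipped, and a matching rule whose table has no route (the -EAGAIN case) lets the walk continue with the next rule. The sketch below is a simplification that ignores FR_ACT_GOTO, FR_ACT_NOP and FIB_RULE_INVERT. All names in it are illustrative, and the route string returned by the main table is a placeholder.

```python
def rules_lookup(rules, tables, flow):
    """Return (priority, table, route) for the first rule whose table has a route.

    rules:  iterable of (priority, match_fn, table_name), sorted by priority
    tables: dict mapping table_name to a function(flow) returning a route or None
    """
    for priority, match_fn, table_name in rules:
        if not match_fn(flow):
            continue                  # rule does not match this flow
        route = tables[table_name](flow)
        if route is None:
            continue                  # like -EAGAIN: try the next rule
        return priority, table_name, route
    return None                       # like -ESRCH: no rule produced a route


if __name__ == "__main__":
    rules = [
        (0,     lambda flow: True, "local"),
        (32766, lambda flow: True, "main"),
    ]
    tables = {
        "local": lambda flow: None,        # lookup address is not a local address
        "main":  lambda flow: "dev eth0",  # placeholder route found in main
    }
    print(rules_lookup(rules, tables, {"daddr": "176.100.1.66"}))
```

In this model the local rule matches but yields no route, so the result comes from the main table, which is the same fall-through behaviour the kernel loop implements.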
First, a recap of the routing rules configured on the system:
# ip rule
0: from all lookup local # rule->src = 0, rule->dst = 0, srcmask and dstmask are both 0
200: from 176.100.1.74 lookup 4 # matches on src = 176.100.1.74, srcmask = 255.255.255.255, dst = 0, dstmask = 0
200: from all to 176.100.1.74 lookup 4 # matches on dst = 176.100.1.74, dstmask = 255.255.255.255, src = 0, srcmask = 0
200: from 176.100.1.71 lookup 2
200: from all to 176.100.1.71 lookup 2
200: from 176.100.1.73 lookup 3
200: from all to 176.100.1.73 lookup 3
32766: from all lookup main # src = dst = 0, srcmask = dstmask = 0, matches every address
32767: from all lookup default # src = dst = 0, srcmask = dstmask = 0, matches every address
static int fib4_rule_match(struct fib_rule *rule, struct flowi *fl, int flags)
{
    struct fib4_rule *r = (struct fib4_rule *) rule;
    struct flowi4 *fl4 = &fl->u.ip4;
    __be32 daddr = fl4->daddr;    /* per the fib_validate_source call path, this is the packet's source IP: 176.100.1.66 */
    __be32 saddr = fl4->saddr;    /* per the fib_validate_source call path, this is 0 */

    /* If the masked XOR of address and rule is nonzero, return 0 (no match).
     * A rule without a source or destination address has mask 0 and src or dst 0. */
    if (((saddr ^ r->src) & r->srcmask) ||
        ((daddr ^ r->dst) & r->dstmask))
        return 0;

    /* The configured rules have tos 0, so this branch can be ignored */
    if (r->tos && (r->tos != fl4->flowi4_tos))
        return 0;

    return 1;
}
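The masked comparison in fib4_rule_match is easy to replay by hand. The sketch below transcribes the rules from the ip rule listing into tuples and applies the same XOR-and-mask test to the reverse lookup key (daddr set to the packet source 176.100.1.66, saddr set to 0). It is a user-space model with made-up helper names; the tos check is left out because every rule here has tos 0.

```python
from ipaddress import IPv4Address


def ip(text):
    """Dotted-quad string to integer."""
    return int(IPv4Address(text))


HOST = 0xFFFFFFFF  # /32 mask

# (priority, src, srcmask, dst, dstmask, table)
RULES = [
    (0,     0, 0, 0, 0, "local"),
    (200,   ip("176.100.1.74"), HOST, 0, 0, "4"),
    (200,   0, 0, ip("176.100.1.74"), HOST, "4"),
    (200,   ip("176.100.1.71"), HOST, 0, 0, "2"),
    (200,   0, 0, ip("176.100.1.71"), HOST, "2"),
    (200,   ip("176.100.1.73"), HOST, 0, 0, "3"),
    (200,   0, 0, ip("176.100.1.73"), HOST, "3"),
    (32766, 0, 0, 0, 0, "main"),
    (32767, 0, 0, 0, 0, "default"),
]


def rule_matches(rule, saddr, daddr):
    """Same address test as fib4_rule_match."""
    _, src, srcmask, dst, dstmask, _ = rule
    return not (((saddr ^ src) & srcmask) or ((daddr ^ dst) & dstmask))


def matching_priorities(saddr, daddr):
    return [rule[0] for rule in RULES if rule_matches(rule, saddr, daddr)]


if __name__ == "__main__":
    # Reverse lookup key: daddr = packet source, saddr = 0
    print(matching_priorities(saddr=0, daddr=ip("176.100.1.66")))
    # prints [0, 32766, 32767]
```

None of the priority 200 rules match, because the key's saddr is 0 and its daddr is the querier's address rather than one of the server's own addresses.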
In our case the source address is 176.100.1.66. During the reverse route check it is assigned to fl4->daddr, so it matches only the following three routing rules:
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
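In Linux, local is the system's local routing table, main is the main routing table (the routes shown by the ip route or route command), and the default table is empty.
During the reverse path check, an outbound route can therefore only be found in the main routing table.
In the main routing table above, the following two routes match; following the route selection rules, the first entry (via eth0) is chosen.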
176.100.1.64 0.0.0.0 255.255.255.224 U 0 0 0 eth0
176.100.1.64 0.0.0.0 255.255.255.224 U 0 0 0 eth1
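Because the address 176.100.1.74 is configured on eth1 while the route found by the lookup leaves through eth0, the reverse path check decides that the receiving interface is not the best reverse path,
treats the source as unreachable through it, and drops the inbound packet. The system therefore never answers the IGMP query, and the inbound multicast stream eventually stops.
In summary, when ip_route_input_mc calls fib_validate_source, the src argument is the source address field of the IP packet and the dst argument is passed as 0.
Inside __fib_validate_source, fl4.daddr therefore holds src (the packet's source address) and fl4.saddr is 0. With this flow key none of the configured
policy routing rules match, and only the main routing table is consulted.
The outbound interface of the route found in the main table differs from the interface on which the IGMP packet arrived, so the reverse path check fails and the IGMP query is never answered.
Once the IGMP entry ages out, the switch deletes the corresponding multicast entry and the inbound multicast stream stops.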
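The failure can be reproduced with a toy model of the main-table lookup and the strict-mode decision. The route list is taken from the main routing table shown earlier (the link-local route is omitted). Two things are assumptions of this sketch rather than kernel facts: among routes with the same prefix length the first listed one wins, which matches the eth0 result reported here, and the strict/loose decision is reduced to a single comparison.

```python
from ipaddress import ip_address, ip_network

# Main routing table in listed order: (prefix, device)
MAIN_TABLE = [
    ("0.0.0.0/0",       "eth1"),
    ("176.100.1.64/27", "eth0"),
    ("176.100.1.64/27", "eth1"),
]


def lookup(table, addr):
    """Longest-prefix match; on equal prefix length the first entry wins
    (an assumption of this model)."""
    best = None
    for prefix, dev in table:
        net = ip_network(prefix)
        if ip_address(addr) in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, dev)
    return None if best is None else best[1]


def rp_check(mode, table, src, in_dev):
    """Simplified rp_filter decision; True means the packet is accepted."""
    if mode == 0:
        return True                 # no source validation
    out_dev = lookup(table, src)
    if out_dev is None:
        return False                # source not reachable at all
    if mode == 1:
        return out_dev == in_dev    # strict: reverse path must use the receiving interface
    return True                     # loose: reachable via any interface is enough


if __name__ == "__main__":
    for mode in (0, 1, 2):
        accepted = rp_check(mode, MAIN_TABLE, "176.100.1.66", "eth1")
        print(f"rp_filter={mode}: accepted={accepted}")
```

In this model only rp_filter=1 rejects the query arriving on eth1, which agrees with the observation that setting rp_filter to 0 made the problem disappear.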
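Approaches to a fix
The first fix that comes to mind is to set rp_filter to 0 or 2, that is, no reverse path check or loose mode. The drawback, as noted earlier,
is that this gives up the protection strict mode offers against IP spoofing in DDoS attacks.
The second fix builds on routing rules and the reverse route check. From the __fib_validate_source code it follows that iptables, routing rules and policy routing can be combined:
mark the IGMP query packets coming from the gateway (matching the source/destination IP address and the inbound interface), then add a routing rule tied to that mark so that a matching packet is looked up
in our policy routing table. Because the outbound interface in that table is the same as the interface on which the marked IGMP query arrived, the reverse route check passes.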
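Why the mark helps can be seen from two lines of the kernel code quoted earlier: __fib_validate_source copies skb->mark into the flow key only when src_valid_mark is enabled, and fib_rule_match rejects a rule when (rule->mark ^ fl->flowi_mark) & rule->mark_mask is nonzero. The sketch below models just those two steps. The mark value 9527 is the one used in the experiment that follows, and the full 32-bit mask is assumed to be what applies when a rule is added without an explicit mask.

```python
def flow_mark(skb_mark, src_valid_mark):
    """flowi4_mark as set in __fib_validate_source."""
    return skb_mark if src_valid_mark else 0


def fwmark_rule_matches(rule_mark, rule_mask, fl_mark):
    """Mark test from fib_rule_match; a nonzero masked XOR means no match."""
    return ((rule_mark ^ fl_mark) & rule_mask) == 0


if __name__ == "__main__":
    MARK = 9527          # mark set on the IGMP query by the mangle rule
    MASK = 0xFFFFFFFF    # assumed mask for a rule added without an explicit mask
    for enabled in (False, True):
        ok = fwmark_rule_matches(MARK, MASK, flow_mark(MARK, enabled))
        print(f"src_valid_mark={int(enabled)}: fwmark rule matches = {ok}")
```

Without src_valid_mark the packet's mark never reaches the lookup key, so the fwmark rule cannot match during source validation even though the packet itself is marked.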
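Experimental verification:
On a system with two IP addresses in the same subnet, I ran an experiment that confirmed the second fix:
System configuration:
Interface IP configuration: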
4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 14:18:77:5d:49:f1 brd ff:ff:ff:ff:ff:ff
inet 10.47.242.116/24 scope global em3
valid_lft forever preferred_lft forever
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 14:18:77:5d:49:f2 brd ff:ff:ff:ff:ff:ff
inet 10.47.242.117/24 brd 10.47.242.255 scope global noprefixroute em4
valid_lft forever preferred_lft forever
inet6 fe80::2512:169:6dca:72b1/64 scope link noprefixroute
valid_lft forever preferred_lft forever
Routing configuration:
# ip r s t 6
default via 10.47.242.1 dev em3
10.47.242.0/24 dev em3 scope link
# ip r s t 7
default via 10.47.242.1 dev em4
10.47.242.0/24 dev em4 scope link
# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 100 0 0 em4
10.47.242.0 0.0.0.0 255.255.255.0 U 0 0 0 em3
10.47.242.0 0.0.0.0 255.255.255.0 U 100 0 0 em4
192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
Note: if the multicast group is joined on em4 and rp_filter is enabled, the system normally does not answer IGMP queries.
rp_filter configuration:
# sysctl -a |grep -E "em[0-9]+.rp_filter"
net.ipv4.conf.em1.rp_filter = 1
net.ipv4.conf.em2.rp_filter = 1
net.ipv4.conf.em3.rp_filter = 1
net.ipv4.conf.em4.rp_filter = 1
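Experiment steps:
The following three settings were applied:
1) Enable src_valid_mark on the interface so that the mark on the IGMP query packet is used in the reverse path lookup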
# sysctl -w net.ipv4.conf.em4.src_valid_mark=1
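2) Add a rule to the PREROUTING chain of the mangle table that matches the inbound interface and the source/destination IP addresses and sets the mark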
# iptables -t mangle -A PREROUTING -i em4 -s 10.47.242.1 -d 224.0.0.1 -j MARK --set-mark 9527
3) Add a routing rule tied to the mark so that routing table 7 is looked up
# ip rule add fwmark 9527 table 7
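For repeating the experiment on another interface, the three commands can be generated from a few parameters. The helper below only builds the command strings and executes nothing; its name and parameters are illustrative. It also prints the mark in hexadecimal, since ip rule and iptables display 9527 as 0x2537.

```python
def mark_rule_commands(iface, querier, mark, table, group="224.0.0.1"):
    """Return the three configuration commands as strings (nothing is executed)."""
    return [
        f"sysctl -w net.ipv4.conf.{iface}.src_valid_mark=1",
        f"iptables -t mangle -A PREROUTING -i {iface} -s {querier} -d {group} "
        f"-j MARK --set-mark {mark}",
        f"ip rule add fwmark {mark} table {table}",
    ]


if __name__ == "__main__":
    for cmd in mark_rule_commands("em4", "10.47.242.1", 9527, 7):
        print(cmd)
    print(hex(9527))  # prints 0x2537
```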
The resulting routing rules, routing table and mangle table are listed below:
# ip rule
0: from all lookup local
32760: from all fwmark 0x2537 lookup 7
32761: from all to 10.47.242.116 lookup 6
32762: from 10.47.242.116 lookup 6
32763: from all to 10.47.242.117 lookup 7
32764: from 10.47.242.117 lookup 7
32766: from all lookup main
32767: from all lookup default
# ip r s t 7
default via 10.47.242.1 dev em4
10.47.242.0/24 dev em4 scope link
# iptables --list -t mangle
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
MARK all -- gateway 224.0.0.1 MARK set 0x2537
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
# sysctl -a |grep _mark
net.ipv4.conf.all.src_valid_mark = 0
net.ipv4.conf.default.src_valid_mark = 0
net.ipv4.conf.em1.src_valid_mark = 0
net.ipv4.conf.em2.src_valid_mark = 0
net.ipv4.conf.em3.src_valid_mark = 0
net.ipv4.conf.em4.src_valid_mark = 1
Joining the multicast group on em4:
# ip maddr show dev em4
5: em4
link 01:00:5e:00:00:01
link 33:33:00:00:00:01
link 33:33:ff:ca:72:b1
link 01:00:5e:01:01:01
inet 230.1.1.1
inet 224.0.0.1
inet6 ff02::1:ffca:72b1
inet6 ff02::1
inet6 ff01::1
Capture packets to confirm whether em4 answers with an IGMP report:
tcpdump -i em4 igmp -n
22:23:37.390054 IP 10.47.242.1 > 224.0.0.1: igmp query v2
22:23:46.213070 IP 10.47.242.117 > 230.1.1.1: igmp v2 report 230.1.1.1
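The gap between the query and the report can be computed from the two tcpdump timestamps. IGMPv2 hosts delay their report by a random time of up to the Max Response Time carried in the query, which RFC 2236 sets to 10 seconds by default. The capture above does not show that field, so the comparison is only a plausibility check. The helper name is illustrative.

```python
from datetime import datetime


def seconds_between(t_query, t_report, fmt="%H:%M:%S.%f"):
    """Elapsed seconds between two tcpdump timestamps taken on the same day."""
    delta = datetime.strptime(t_report, fmt) - datetime.strptime(t_query, fmt)
    return delta.total_seconds()


if __name__ == "__main__":
    delay = seconds_between("22:23:37.390054", "22:23:46.213070")
    print(f"report sent {delay:.3f} s after the query")
```

The report follows the query by about 8.8 seconds, which is within the default 10 second response window.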
References:
https://www.linuxidc.com/Linux/2012-06/64065.html
https://www.cnblogs.com/lipengxiang2009/p/7446388.html