Towards Deep Learning Models Resistant to Adversarial Attacks

Madry A, Makelov A, Schmidt L, et al. Towards Deep Learning Models Resistant to Adversarial Attacks[J]. arXiv: Machine Learning, 2017.

@article{madry2017towards,
title={Towards Deep Learning Models Resistant to Adversarial Attacks.},
author={Madry, Aleksander and Makelov, Aleksandar and Schmidt, Ludwig and Tsipras, Dimitris and Vladu, Adrian},
journal={arXiv: Machine Learning},
year={2017}}

Overview

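Using specific methods to generate "bad" samples (adversarial samples) in order to make networks more robust is currently a hot topic. This paper is mainly experimental: it compares how PGD (projected gradient descent) and FGSM (fast gradient sign method) behave on different datasets, and reports several phenomena that appear when adversarial samples are generated from natural samples.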

Main Content

Adversarial attacks mainly center on the following problem:

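\[ \tag{2.1} \min_{\theta} \rho(\theta) \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in S} L(\theta, x+\delta, y)\right]. \]
where \(S\) is the perturbation set we specify; put simply, something like \(\|\delta\| < \text{constant}\).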
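
To make the min-max objective concrete, here is a minimal sketch for one special case where the inner maximum has a closed form. Everything in it is an illustrative assumption and not taken from the paper: a binary linear classifier with labels in {-1, +1}, the logistic loss, and \(S\) chosen as an \(\ell_\infty\) ball of radius `eps`. In that setting the worst perturbation is \(\delta = -y \cdot \text{eps} \cdot \mathrm{sgn}(w)\), which lowers the margin by \(\text{eps} \cdot \|w\|_1\).

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def logistic_loss(margin):
    # log(1 + exp(-margin)); decreasing in the margin.
    return math.log1p(math.exp(-margin))


def worst_case_loss(w, b, x, y, eps):
    """Inner max of (2.1) for a linear model and an l_inf ball (assumed setup)."""
    clean_margin = y * (dot(w, x) + b)
    # delta = -y * eps * sign(w) lowers the margin by eps * ||w||_1.
    return logistic_loss(clean_margin - eps * sum(abs(wi) for wi in w))


def robust_objective(w, b, data, eps):
    """Empirical stand-in for rho(theta): mean worst-case loss over the data."""
    return sum(worst_case_loss(w, b, x, y, eps) for x, y in data) / len(data)
```

With `eps = 0` this reduces to the ordinary average loss; for deep networks no such closed form is available, which is why the inner maximum is approximated with attacks such as FGSM and PGD.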

Generating "bad" samples with FGSM:

\[ x + \epsilon \: \mathrm{sgn}(\nabla_x L(\theta, x, y)). \]
The idea is quite direct (the motivation starts from the linear perceptron; see here for details).
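
The FGSM step can be sketched in a few lines. This is only an illustration under assumptions that are not in the paper: a linear logistic model with labels in {-1, +1}, so that the input gradient can be written by hand, and hypothetical helper names (`grad_x`, `fgsm`). With a deep network the gradient would come from automatic differentiation instead.

```python
import math


def sign(v):
    # Returns -1, 0 or +1.
    return (v > 0) - (v < 0)


def logistic_loss(w, b, x, y):
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return math.log1p(math.exp(-margin))


def grad_x(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))  # equals -y * sigmoid(-margin)
    return [coeff * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Single step: x + eps * sgn(grad_x L(theta, x, y))."""
    g = grad_x(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]
```

Each coordinate moves by exactly `eps` in the direction that increases the loss, so the result always lies on the boundary of the \(\ell_\infty\) ball around `x` (unless a gradient entry is zero).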

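The idea of PGD is: given a perturbation set \(S\), e.g. perturbations bounded by some constant (so that \(x + S = \{\tilde{x} : \|x - \tilde{x}\|_{\infty} \le c\}\)), iterate several times to search for a suitable adversarial sample: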

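\[ x^{t+1} = \prod_{x+S} \left( x^t + \alpha \: \mathrm{sgn}(\nabla_x L(\theta, x, y)) \right), \]
where \(\prod\) denotes the projection operator and the gradient is evaluated at the current iterate \(x^t\). Assuming \(x + S = \{\tilde{x} : \|x - \tilde{x}\|_{\infty} \le c\}\), we have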

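\[ x^{t+1} = \arg\min_{z \in x+S} \frac{1}{2} \left\| z - \left( x^t + \alpha \: \mathrm{sgn}(\nabla_x L(\theta, x, y)) \right) \right\|_2^2, \]
In fact, each element \((i,j)\) can be handled separately. Writing \(y := x^t + \alpha \: \mathrm{sgn}(\nabla_x L(\theta, x, y))\) for the point before projection (on the left-hand side \(y\) no longer denotes the label), we only need to find \(z_{ij} \in [x_{ij} - c, x_{ij} + c]\) such that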

\[ \|z_{ij} - y_{ij}\|_2 \]
is minimized. The explicit solution is:

\[ z_{ij} = \left\{ \begin{array}{ll} x_{ij} + c & y_{ij} > x_{ij} + c \\ x_{ij} - c & y_{ij} < x_{ij} - c \\ y_{ij} & \text{else}. \end{array} \right. \]
In short, it is just a truncation (clipping).
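
The element-wise truncation is a one-liner in code. The sketch below uses flat Python lists in place of image tensors, and the name `project_linf` is a hypothetical helper, not something from the paper.

```python
def project_linf(y, x, c):
    """Clip each entry of y into [x_i - c, x_i + c]."""
    return [min(max(yi, xi - c), xi + c) for yi, xi in zip(y, x)]


x = [0.0, 1.0, 2.0]
y = [0.5, 0.9, 2.05]
print(project_linf(y, x, 0.1))  # -> [0.1, 0.9, 2.05]
```

The first entry exceeds the upper bound and is cut back to `x + c`, the second sits exactly on the lower bound, and the third is already inside the box and stays unchanged.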

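Repeat this several times, until \(x^t\) is assigned a class different from that of the initial \(x\) or the maximum number of iterations is reached.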
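
Putting the step, the projection, and the stopping rule together gives the loop sketched below. As before, this is a toy illustration under stated assumptions rather than the paper's implementation: the model is a linear logistic classifier with labels in {-1, +1}, and `pgd`, `grad_x`, and `predict` are hypothetical names. The paper's experiments also start from random points inside the ball, which is omitted here.

```python
import math


def sign(v):
    return (v > 0) - (v < 0)


def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else -1


def grad_x(w, b, x, y):
    """Gradient of log(1 + exp(-y * score)) with respect to x."""
    coeff = -y / (1.0 + math.exp(y * score(w, b, x)))
    return [coeff * wi for wi in w]


def pgd(w, b, x, y, c, alpha, max_iter):
    """Iterate x^{t+1} = Proj_{x+S}(x^t + alpha * sgn(grad)) on an l_inf ball."""
    original_class = predict(w, b, x)
    x_t = list(x)
    for _ in range(max_iter):
        g = grad_x(w, b, x_t, y)  # gradient at the current iterate
        stepped = [xi + alpha * sign(gi) for xi, gi in zip(x_t, g)]
        # Projection: clip every coordinate back into [x_i - c, x_i + c].
        x_t = [min(max(si, x0 - c), x0 + c) for si, x0 in zip(stepped, x)]
        if predict(w, b, x_t) != original_class:
            break  # the predicted class has changed
    return x_t
```

For example, with `w = [1.0, -2.0]`, `b = 0.2`, `x = [1.0, 0.5]`, `y = 1`, `c = 0.1`, and `alpha = 0.04`, the prediction flips after two steps; with `c = 0.01` the iterates stay pinned at the boundary of the ball and the class never changes.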

Note
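- If we train a network to be immune to PGD attacks, it will also resist a large share of other attacks.
- Adversarial training with FGSM does not improve the network's robustness (when the perturbation is large).
- Weak (low-capacity) models may fail to learn non-trivial classifiers.
- The stronger the network (in terms of capacity, e.g. number of parameters), the more robust the trained model; at the same time, transferability (adversarial samples being misclassified by several networks) gets worse.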
Original post: https://www.cnblogs.com/MTandHJ/p/12411847.html
