Summary of topics around Adversarial Training
For brevity, the following abbreviations are used.
- AE: Adversarial Examples
- AA: Adversarial Attack
- clean: natural images that have not been subjected to an AA
- AT: Adversarial Training
- AR: Adversarial Robustness
- BN: Batch Normalization
- EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES [Goodfellow+, ICLR15]
- Improving back-propagation by adding an adversarial gradient [Nøkland, arXiv15]
- Learning with a strong adversary [Huang+, arXiv15]
- Adversarial machine learning at scale [Kurakin+, ICLR17]
- Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization [Shaham+, Neurocomputing18]
- Ensemble Adversarial Training: Attacks and Defenses [Tramèr+, ICLR18]
- Towards Deep Learning Models Resistant to Adversarial Attacks [Madry+, ICLR18]
- Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples [Athalye+, ICML18]
- Adversarial logit pairing [Kannan+, arXiv18]
- Adversarial Training and Robustness for Multiple Perturbations [Tramèr+, NeurIPS19]
- Adversarial Training for Free! [Shafahi+, NeurIPS19]
- You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle [Zhang+, NeurIPS19]
- Towards the first adversarially robust neural network model on MNIST [Schott+, ICLR19]
- Interpreting Adversarially Trained Convolutional Neural Networks [Zhang+, ICML19]
- On the Connection Between Adversarial Robustness and Saliency Map Interpretability [Etmann+, ICML19]
- On the Convergence and Robustness of Adversarial Training [Wang+, ICML19]
- Fast is better than free: Revisiting adversarial training [Wong+, ICLR20]
- Improving adversarial robustness requires revisiting misclassified examples [Wang+, ICLR20]
- INTRIGUING PROPERTIES OF ADVERSARIAL TRAINING AT SCALE [Xie+, ICLR20]
- Max-margin adversarial (mma) training: Direct input space margin maximization through adversarial training [Ding+, ICLR20]
- Concise Explanations of Neural Networks using Adversarial Training [Chalasani+, ICML20]
- Smooth Adversarial Training [Xie+, arXiv20]
- Identifying Layers Susceptible to Adversarial Attacks [Siddiqui+, arXiv21]
- Boosting Fast Adversarial Training with Learnable Adversarial Initialization [Jia+, arXiv21]
- Fixing Data Augmentation to Improve Adversarial Robustness [Rebuffi+, arXiv21]
EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES [Goodfellow+, ICLR15]
- First to propose AT that trains on a mixture of clean examples and AEs.
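The mixed clean/AE objective can be sketched on a toy model. The following is a minimal illustration, assuming a binary logistic classifier and a one-step sign-gradient attack (FGSM); the helper names and the weighting `alpha = 0.5` are illustrative choices, not taken from these notes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    # Returns -1, 0 or 1.
    return (v > 0) - (v < 0)

def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]
    return loss, grad_x

def fgsm(w, b, x, y, eps):
    """One-step attack: shift every coordinate by eps in the loss-increasing direction."""
    _, g = loss_and_input_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def mixed_loss(w, b, x, y, eps, alpha=0.5):
    """alpha * loss on the clean input + (1 - alpha) * loss on its AE."""
    clean, _ = loss_and_input_grad(w, b, x, y)
    adv, _ = loss_and_input_grad(w, b, fgsm(w, b, x, y, eps), y)
    return alpha * clean + (1 - alpha) * adv
```

Minimizing `mixed_loss` over `w` and `b` instead of the clean loss alone is what makes the training adversarial.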
Improving back-propagation by adding an adversarial gradient [Nøkland, arXiv15]
Learning with a strong adversary [Huang+, arXiv15]
Adversarial machine learning at scale [Kurakin+, ICLR17]
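- label leaking: the phenomenon in which an adversarially trained model performs better on AEs than on clean images. It is not observed when a method such as PGD is used.
- Given label leaking, AT is best done with one of the following two strategies.
  - When using a one-step method such as FGSM, use the label predicted by the model rather than the true label.
  - Use a multi-step method such as PGD.
- Increasing model capacity raises AR.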
Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization [Shaham+, Neurocomputing18]
Ensemble Adversarial Training: Attacks and Defenses [Tramèr+, ICLR18]
- AT that uses AEs generated with several other models.
Towards Deep Learning Models Resistant to Adversarial Attacks [Madry+, ICLR18]
- Proposes AT that trains on AEs only, without clean examples.
- AR requires far more capacity than classifying clean examples does.
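For reference, a multi-step attack of the PGD kind, used to produce the AEs for this style of AT, can be sketched as follows. This is a rough illustration on a toy logistic model under an L-infinity budget; the function names, the random start, and the parameter values in the usage note are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_grad(w, b, x, y):
    """Gradient of the loss w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def pgd(w, b, x, y, eps, step, n_steps, rng):
    """Iterated sign-gradient ascent, projected back into the eps-ball around x."""
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]  # random start inside the ball
    for _ in range(n_steps):
        g = input_grad(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: clip each coordinate to [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv
```

For example, `pgd([1.0, -2.0], 0.0, [0.5, 0.5], 1, eps=0.1, step=0.05, n_steps=10, rng=random.Random(0))` returns a point inside the eps-ball whose loss is higher than that of the clean input; AT in this style then updates the model on such points only.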
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples [Athalye+, ICML18]
- Many defenses rely heavily on obfuscated gradients, but these can be broken with an attack called BPDA.
- Backward Pass Differentiable Approximation (BPDA): in the backward pass, replace a layer whose gradient cannot be computed with another layer whose gradient is 1 (the identity). For defenses that rely on randomness, the gradient is estimated from an expectation.
- The following defenses rely on obfuscated gradients.
  - Thermometer encoding: One hot way to resist adversarial examples [Buckman+, ICLR18]
  - Characterizing adversarial subspaces using local intrinsic dimensionality [Ma+, ICLR18]
  - Countering adversarial images using input transformations [Guo+, ICLR18]
  - Stochastic activation pruning for robust adversarial defense [Dhillon+, ICLR18]
  - Mitigating adversarial effects through randomization [Xie+, ICLR18]
  - Pixeldefend: Leveraging generative models to understand and defend against adversarial examples [Song+, ICLR18]
  - Defensegan: Protecting classifiers against adversarial attacks using generative models [Samangouei+, ICLR18]
- Defenses that hold up
  - Towards deep learning models resistant to adversarial attacks [Madry+, ICLR18]
  - Cascade adversarial machine learning regularized with a unified embedding [Na+, ICLR18]
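The BPDA idea can be illustrated with a toy defense. The sketch below assumes an input-quantization step placed in front of a logistic classifier; quantization has zero gradient almost everywhere, so the backward pass treats it as the identity. The defense, the classifier, and all names are hypothetical stand-ins chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def quantize(x, levels=4):
    """Toy non-differentiable defense: snap each coordinate to a grid of step 1/levels."""
    return [round(xi * levels) / levels for xi in x]

def bpda_input_grad(w, b, x, y):
    """Gradient of the loss w.r.t. x, approximating the defense by the identity."""
    z = quantize(x)  # forward pass uses the real defense
    p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    # Backward pass: pretend d quantize(x) / d x = 1, so the classifier's
    # gradient at the quantized point is used as the gradient at x.
    return [(p - y) * wi for wi in w]
```

The returned gradient is non-zero even though the true gradient through `quantize` vanishes, so it can be plugged into a sign-gradient attack as usual.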
Adversarial logit pairing [Kannan+, arXiv18]
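- The following two methods raise AR.
  - adversarial logit pairing: train so that the logits of a clean example and of its AE become similar.
  - clean logit pairing: train so that the logits of two clean examples (not even from the same class) become similar.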
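As a rough sketch, both variants add a pairing penalty between two logit vectors to the training loss. The squared-L2 penalty, the weight `lam`, and the function names below are assumptions made for illustration.

```python
def squared_l2_pairing(logits_a, logits_b):
    """Mean squared difference between two logit vectors."""
    return sum((a - b) ** 2 for a, b in zip(logits_a, logits_b)) / len(logits_a)

def alp_loss(adv_ce_loss, clean_logits, adv_logits, lam=0.5):
    """Adversarial logit pairing: AT loss plus a penalty tying clean and AE logits."""
    return adv_ce_loss + lam * squared_l2_pairing(clean_logits, adv_logits)

def clp_loss(clean_ce_loss, logits_1, logits_2, lam=0.5):
    """Clean logit pairing: clean loss plus a penalty on two arbitrary clean examples."""
    return clean_ce_loss + lam * squared_l2_pairing(logits_1, logits_2)
```

The only difference between the two is which pair of logit vectors is passed to the penalty.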
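Adversarial Training and Robustness for Multiple Perturbations [Tramèr+, NeurIPS19]
- Shows that a model adversarially trained against several attack types cannot match the robustness of models adversarially trained against each attack individually.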
Adversarial Training for Free! [Shafahi+, NeurIPS19]
- Eliminates the cost of generating AEs by reusing the gradient information computed when updating the model parameters.
You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle [Zhang+, NeurIPS19]
- Reduces AT time by using only the input and output of the network's first layer.
Towards the first adversarially robust neural network model on MNIST [Schott+, ICLR19]
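- AT is not robust against attacks other than the one used during training.
- Adversarially trained models classify OOD inputs into some class with high confidence.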
Interpreting Adversarially Trained Convolutional Neural Networks [Zhang+, ICML19]
- Adversarially trained models have less texture bias and learn shape more.
- Experiments with patch shuffling and similar transformations are also reported.
On the Connection Between Adversarial Robustness and Saliency Map Interpretability [Etmann+, ICML19]
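- Models with high AR have more interpretable saliency maps.
- Shows, using models trained with Lipschitz regularization, that the nonlinearity of NNs weakens this relationship.
- The larger the distance to the decision boundary, the larger the alignment between an input and its saliency map.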
On the Convergence and Robustness of Adversarial Training [Wang+, ICML19]
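- In the early stage of AT, AEs with good convergence quality (roughly, how strong the AE is) are not needed and may even lower AR.
- In the late stage of AT, AEs with good convergence quality are needed.
- Proposes an AT scheme that gradually moves to AEs with better convergence quality.
- Suggests that including very hard AEs in the early stage can hinder the DNN's feature learning.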
Fast is better than free: Revisiting adversarial training [Wong+, ICLR20]
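- Adding random initialization to FGSM-based AT gives an effect comparable to PGD-based AT, which cuts the time AT takes.
- FGSM-based AT can be accelerated dramatically with cyclic learning rates, mixed-precision training, and similar techniques.
- FGSM-based AT used to fail against PGD because of catastrophic overfitting (robustness to PGD on the training data suddenly drops to 0%). It can be avoided with early stopping and the like.
- FGSM-based AT can fail for many reasons.
  - The initialization is zero
  - The step size is too large
  - Particular learning rate schedules or numbers of epochs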
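The random-initialization variant of FGSM can be sketched as below, again on a toy logistic model. The function names are illustrative, and the step size `alpha` is left as a parameter; the value 1.25 * eps used in the usage note is an assumption for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def input_grad(w, b, x, y):
    """Gradient of the binary cross-entropy of a logistic model w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def fgsm_random_init(w, b, x, y, eps, alpha, rng):
    """FGSM from a uniformly random start in the eps-ball, clipped back to the ball."""
    delta = [rng.uniform(-eps, eps) for _ in x]  # non-zero initialization
    g = input_grad(w, b, [xi + di for xi, di in zip(x, delta)], y)
    delta = [min(max(di + alpha * sign(gi), -eps), eps) for di, gi in zip(delta, g)]
    return [xi + di for xi, di in zip(x, delta)]
```

A call such as `fgsm_random_init(w, b, x, y, eps=0.1, alpha=0.125, rng=random.Random(0))` costs a single gradient computation per example, which is where the speed-up over multi-step attacks comes from.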
Improving adversarial robustness requires revisiting misclassified examples [Wang+, ICLR20]
INTRIGUING PROPERTIES OF ADVERSARIAL TRAINING AT SCALE [Xie+, ICLR20]
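- BN can get in the way of learning during AT. For a model adversarially trained with BN, simply removing the clean images from the training data improves AR substantially, by 18.3%.
- Applying separate BNs to clean examples and AEs achieves stronger AR.
- Making the network deeper and deeper does not change accuracy on natural images, but with AT, the deeper the network, the higher the AR.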
Max-margin adversarial (mma) training: Direct input space margin maximization through adversarial training [Ding+, ICLR20]
- Maximizes the network's margin in input space.
Concise Explanations of Neural Networks using Adversarial Training [Chalasani+, ICML20]
Smooth Adversarial Training [Xie+, arXiv20]
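- The non-smoothness of ReLU significantly weakens AT.
- Smooth Adversarial Training (SAT): replace ReLU with a smooth approximation, for example parametric softplus.
- Smooth activation functions help find harder adversarial examples.
- Multi-step attacks might avoid the ReLU problem.
- In the end, the best AR was obtained with SiLU(x) = x · sigmoid(x).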
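The difference between ReLU and its smooth replacements shows up in the derivative. The sketch below assumes the parametric softplus form (1/alpha) * log(1 + exp(alpha * x)); that form and the function names are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu_grad(x):
    """ReLU derivative: jumps from 0 to 1 at x = 0."""
    return 1.0 if x > 0 else 0.0

def silu(x):
    return x * sigmoid(x)

def silu_grad(x):
    """SiLU derivative: continuous everywhere."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

def parametric_softplus(x, alpha=1.0):
    return math.log1p(math.exp(alpha * x)) / alpha

def parametric_softplus_grad(x, alpha=1.0):
    """Parametric softplus derivative: a sigmoid, so also continuous."""
    return sigmoid(alpha * x)
```

Just left and right of zero, `relu_grad` returns 0.0 and 1.0, while `silu_grad` and `parametric_softplus_grad` both stay close to 0.5, which is the smoothness the notes refer to.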
Identifying Layers Susceptible to Adversarial Attacks [Siddiqui+, arXiv21]
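- Splits the model into several parts and retrains them. The layers most susceptible to AEs turn out to be the early layers, which act as the feature extractor. Adversarially training only the later layers is therefore not enough; conversely, adversarially training only the early layers is both necessary and sufficient.
- The features of AEs generated against an ordinary model are fundamentally different from those of clean examples.
- AT changes the weights of the early layers so that features extracted from AEs follow the same distribution as features extracted from clean examples.
- AEs generated against an adversarially trained model imitate clean examples in the early layers.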
Boosting Fast Adversarial Training with Learnable Adversarial Initialization [Jia+, arXiv21]
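- Improves the initialization used in Fast AT.
Fixing Data Augmentation to Improve Adversarial Robustness [Rebuffi+, arXiv21]
- Combining model weight averaging with data augmentation (cutout, cutmix, mixup) greatly improves AR. cutmix is the most effective.
- Using state-of-the-art generative models improves it further.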
How to resolve "LaTeX Warning: You have requested document class X"
If the tex file has
\documentclass[...]{AAA}
while the file AAA has
\ProvidesClass{BBB}[...]
so that AAA ≠ BBB, the warning is issued. Making the two names match removes the warning.
The JupyterLab terminal behaves oddly on Docker
Normally the JupyterLab terminal prompt looks like this:
username@hostname:~/$
For some reason, when the terminal is started on Docker it looks like the following, even with exactly the same configuration file.
#
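This makes it hard to tell which directory you are in, and it is inconvenient in other ways too, such as the arrow keys not recalling past commands.
To fix it, take the following line in .jupyter/jupyter_notebook_config.py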
c.NotebookApp.terminado_settings = {}
and change it as follows.
c.NotebookApp.terminado_settings = {'shell_command': ['/bin/bash']}
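Connecting GAS and Slack with minimal steps
Goal
Send a message to Slack from GAS, and have a message sent automatically every day.
Sending a message to Slack from GAS
- Go to https://api.slack.com/
- Enter the App Name and Workspace
- Select Add features and functionality
- Turn Incoming Webhooks on
- Under Install your app, choose the channel to send messages to
- Select Incoming Webhooks in the left tab
- Copy the Webhook URL
- Enter the following in GAS, using the Webhook URL above as the URL
- Run it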
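// GAS runs a named function; the name sendToSlack is arbitrary.
function sendToSlack() {
  let text = "test";
  let url = "https://...";  // paste the Webhook URL here
  let options = {
    "method": "post",
    "contentType": "application/json",
    "payload": JSON.stringify({
      "text": text,
      link_names: 1
    })
  };
  // POST the JSON payload to the Incoming Webhook.
  UrlFetchApp.fetch(url, options);
}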
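Sending a message automatically every day
- Select Triggers in the left tab of GAS
- Add a trigger
- "Select event source" -> "Time-driven"
- "Select type of time based trigger" -> "Day timer"
- Pick whatever time of day you like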
Links to adversarial-examples-related papers from IEEE Symposium on Security & Privacy 16~20
I picked these by eye, so there may be mistakes or omissions; please bear with me.
20
Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning*
[1911.02142] Intriguing Properties of Adversarial ML Attacks in the Problem Space
19
On the Feasibility of Rerouting-Based DDoS Defenses
[1802.03471] Certified Robustness to Adversarial Examples with Differential Privacy
[1902.01350] F-BLEAU: Fast Black-box Leakage Estimation
18
17
[1608.04644] Towards Evaluating the Robustness of Neural Networks
16
[1511.04508] Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Links to adversarial-examples-related papers from AAAI 16~20
I picked these by eye, so there may be mistakes or omissions; please bear with me.
20
Adversarially Robust Distillation
Universal Adversarial Training
19
Distributionally Adversarial Attack
18
Distributionally Adversarial Attack
17
[1705.08378] Detecting Adversarial Image Examples in Deep Networks with Adaptive Noise Reduction
[1709.04114] EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples
[1801.04693] Towards Imperceptible and Robust Adversarial Example Attacks against Neural Networks
Learning to Attack: Adversarial Transformation Networks
16
Multi-Defender Strategic Filtering Against Spear-Phishing Attacks
Data Poisoning Attacks against Autoregressive Models
Links to adversarial-examples-related papers from ICML 18~20
I picked these by eye, so there may be mistakes or omissions; please bear with me.
20
[1909.13806] Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML
[2004.13617] Adversarial Learning Guarantees for Linear Hypotheses and Neural Networks
[2003.03778] Adversarial Attacks on Probabilistic Autoregressive Forecasting Models
[2002.11569] Overfitting in adversarially robust deep learning
[2002.04694] Adversarial Robustness for Code
[2008.02883] Stronger and Faster Wasserstein Adversarial Attacks
Attacks Which Do Not Kill Training Make Adversarial Learning Stronger
Adversarial Risk via Optimal Transport and Optimal Couplings
[2002.04599] Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations
[2006.14748] Proper Network Interpretability Helps Adversarial Robustness in Classification
[2006.16520] Black-box Certification and Learning under Adversarial Perturbations
[1909.04068] Adversarial Robustness Against the Union of Multiple Perturbation Models
[2007.11826] Hierarchical Verification for Adversarial Robustness
[2006.16384] Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification
[1906.07153] Adversarial attacks on Copyright Detection Systems
[2011.07478] Towards Understanding the Regularization of Adversarial Robustness on Neural Networks
[1810.06583] Concise Explanations of Neural Networks using Adversarial Training
[2002.11821] Improving Robustness of Deep-Learning-Based Image Reconstruction
[2003.10602] Defense Through Diverse Directions
[2002.11565] Randomization matters. How to defend against strong adversarial attacks
[1907.02044] Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack
19
First-Order Adversarial Vulnerability of Neural Networks and Input Dimension
[1809.01093] Adversarial Attacks on Node Embeddings via Graph Poisoning
[1907.13220] Multi-Agent Adversarial Inverse Reinforcement Learning
[1901.08846] Improving Adversarial Robustness via Promoting Ensemble Diversity
[1903.06603] On Certifying Non-uniform Bound against Adversarial Attacks
[1904.00759] Adversarial camera stickers: A physical camera-based attack on deep learning systems
[1805.10204] Adversarial examples from computational constraints
[1905.07387] POPQORN: Quantifying Robustness of Recurrent Neural Networks
[1810.04065] Generalized No Free Lunch Theorem for Adversarial Robustness
[1902.10660] Robust Decision Trees Against Adversarial Examples
[1802.06552] Are Generative Classifiers More Robust to Adversarial Attacks?
[1902.02918] Certified Adversarial Robustness via Randomized Smoothing
[1905.06635] Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization
[1902.07906] Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
[1905.05897] Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
[1905.07121] Simple Black-box Adversarial Attacks
[1905.06494] Data Poisoning Attacks on Stochastic Bandits
Data Poisoning Attacks in Multi-Party Learning
Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers
[1901.10513] Adversarial Examples Are a Natural Consequence of Test Error in Noise
[1905.09797] Interpreting Adversarially Trained Convolutional Neural Networks
[1806.02977] Monge blunts Bayes: Hardness Results for Adversarial Training
[1905.04172] On the Connection Between Adversarial Robustness and Saliency Map Interpretability
[1906.11897] On Physical Adversarial Patches for Object Detection
18
Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope
Synthesizing Robust Adversarial Examples
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
Selecting Representative Examples for Program Synthesis
Adversarial Risk and the Dangers of Evaluating Against Weak Attacks
Black-box Adversarial Attacks with Limited Queries and Information
Analyzing the Robustness of Nearest Neighbors to Adversarial Examples