1 Title
5 Plan for today
6 Why do we need interpretability?
7 Why do we need interpretability?
8 Why do we need interpretability?
9 Why do we need interpretability?
10 Why do we need interpretability?
11 Why do we need interpretability?
12 Why do we need interpretability?
13 Why do we need interpretability?
14 Why do we need interpretability?
15 Why do we need interpretability?
16 [Screenshot: NOS news website]
17 Title
19 Title
20 Why do we need interpretability?
21 Why do we need interpretability?
22 [Screenshot: “Can we ever truly understand a large-scale AI model’s internal reasoning?”]
23 Why do we need interpretability?
24 How do we explain a model?
25 How do we explain a model?
26 How do we explain a model?
27 How do we explain a model?
28 How do we explain a model?
29 How do we explain a model?
30 How do we explain a model?
31 Explanation Faithfulness
32 Explanation Faithfulness
33 Explanation Faithfulness
34 Explanation Methods
35 Explanation Methods
36 Explanation Methods
37 Explanation Methods
38 Behavioural Interpretability
39 Behavioural Interpretability
40 BLIMP
41 BLIMP
42 BLIMP
43 BLIMP
44 BLIMP
45 BLIMP
46 BLIMP
47 BLIMP
48 Behavioural Tests for Uncovering Biases
49 Behavioural Tests for Uncovering Biases
50 Limitations of Behavioural Tests
51 Limitations of Behavioural Tests
52 Feature Attribution Methods
53 Pronoun Resolution
54 Pronoun Resolution
55 Pronoun Resolution
56 Pronoun Resolution
57 Pronoun Resolution
58 Pronoun Resolution
59 Average contributions
60 Average contributions
61 Average contributions
62 Average contributions
63 Default Reasoning?
64 Feature Attribution Methods
65 Feature Attribution Methods
66 Attribution Dimensions
67 Feature Removal
68 Feature Removal
69 Feature Removal
70 Feature Removal
71 Feature Removal
72 Feature Removal
73 Feature Removal
74 Feature Removal: conditioned on present features
75 Feature Removal: conditioned on present features
76 Feature Influence
77 Feature Influence
78 Shapley Values
79 Shapley Values
80 Shapley Values
81 Shapley Values
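As a rough illustration of the Shapley value idea, the sketch below computes exact Shapley values by enumerating every subset of features. The toy model `f`, the input `x`, and the all-zeros baseline used to "remove" a feature are assumptions made for this example and do not come from the slides.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n):
    """Exact Shapley values for a set function value(frozenset) over n features."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            # Probability that exactly this coalition precedes feature i
            # in a uniformly random ordering of the features.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: one additive term and one interaction term.
def f(x):
    return 2.0 * x[0] + x[1] * x[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumption: a "removed" feature is set to zero


def value(subset):
    """Model output when only the features in `subset` are present."""
    masked = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return f(masked)


phi = shapley_values(value, len(x))
print(phi)
```

In this toy case the interaction term `x[1] * x[2]` is split equally between features 1 and 2, and the attributions sum to `f(x) - f(baseline)`. Exact enumeration needs on the order of 2^n model evaluations, so it is only feasible for a handful of features.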
82 Feature Influence
83 Feature Influence
84 Highlighting via Input Gradients
• Estimate the importance of a feature using the derivative of the output w.r.t. that feature
85 Example of highlighting: Image classification
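The input-gradient idea can be sketched on a toy model as follows. This is a minimal illustration, not an image classifier: the logistic model, its weights, and the input are made up, and a central finite difference stands in for the automatic differentiation a real framework would provide.

```python
import math


def model(x, w, b):
    """Toy differentiable classifier: sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(f, x, eps=1e-5):
    """Central finite-difference estimate of d f / d x_i for every feature i."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


# Hypothetical weights: feature 2 has weight 0, so the model ignores it.
w = [2.0, -1.0, 0.0]
b = 0.1
x = [0.5, 1.0, 3.0]


def f(v):
    return model(v, w, b)


# Saliency = magnitude of the gradient of the output w.r.t. each input feature.
saliency = [abs(g) for g in input_gradient(f, x)]
print(saliency)
```

Feature 2 has the largest input value but receives zero saliency, because the output does not change when it is perturbed. The gradient measures local sensitivity, not the size of the input.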
86 Gradient-based Highlightings for NLP
For NLP, derivative of output w.r.t. a feature
87 Gradient-based Highlightings for NLP
For NLP, derivative of output w.r.t. a feature
88 Problems with Using Gradient for Highlighting
• too “local” and thus sensitive to slight perturbations
89 Problems with Using Gradient for Highlighting
90 Problems with Using Gradient for Highlighting
91 Extensions of Vanilla Gradient
• too “local” and thus sensitive to slight perturbations
92 Extensions of Vanilla Gradient
SmoothGrad: add Gaussian noise to the input and average the gradient
93 Extensions of Vanilla Gradient
Integrated Gradients: average gradients along a path from zero to the input
94 Summary of Gradient-based Highlighting
Positives:
95 Summary of Gradient-based Highlighting
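The two extensions of the vanilla gradient named in this section, SmoothGrad and Integrated Gradients, can be sketched in a few lines. The toy model, its weights, the noise level, and the step counts are assumptions chosen for illustration, and finite differences again stand in for automatic differentiation. In its usual formulation, Integrated Gradients also multiplies the averaged gradient by the difference between the input and the baseline, which the sketch does.

```python
import math
import random


def f(x):
    """Toy model (made-up weights): sigmoid of a linear score."""
    w = [2.0, -1.0, 0.0]
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def numerical_grad(fn, x, eps=1e-5):
    """Central finite-difference gradient of fn at x."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((fn(hi) - fn(lo)) / (2 * eps))
    return grads


def smoothgrad(fn, x, sigma=0.1, n_samples=50, seed=0):
    """Average the gradient over Gaussian-perturbed copies of the input."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total = [t + g for t, g in zip(total, numerical_grad(fn, noisy))]
    return [t / n_samples for t in total]


def integrated_gradients(fn, x, baseline=None, steps=100):
    """Average gradients on the straight path baseline -> x, scaled by (x - baseline)."""
    if baseline is None:
        baseline = [0.0] * len(x)  # the "zero" starting point
    avg = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule along the path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        avg = [a + g / steps for a, g in zip(avg, numerical_grad(fn, point))]
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg)]


x = [1.0, 0.5, 3.0]
sg = smoothgrad(f, x)
ig = integrated_gradients(f, x)
print(sg)
print(ig, sum(ig), f(x) - f([0.0, 0.0, 0.0]))
```

A useful sanity check for Integrated Gradients is that the attributions sum, up to discretisation error, to the difference between the model output at the input and at the baseline.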
96 Probing
97 Probing
98 Probing | Linguistic
99 Probing | POS-tags, NER, etc.
100 Representations
101 What does probed info imply?
102 Why linear?
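A linear probe can be sketched as a simple linear classifier trained on frozen representations to predict a property. Everything below is a stand-in: the two-dimensional "representations" are synthetic, the binary label plays the role of a linguistic property such as a POS tag, and a perceptron is used only because it is the shortest linear classifier to write down.

```python
import random


def make_data(n, seed=0):
    """Synthetic 2-d 'representations' whose label is linearly encoded (made up)."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        h = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        score = h[0] + h[1]
        if abs(score) < 0.2:  # keep a margin so the two classes are separable
            continue
        data.append((h, 1 if score > 0 else -1))
    return data


def train_linear_probe(data, max_epochs=1000):
    """Perceptron: a linear classifier fitted on top of fixed representations."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for h, y in data:
            if y * (w[0] * h[0] + w[1] * h[1] + b) <= 0:
                w = [w[0] + y * h[0], w[1] + y * h[1]]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w, b


def accuracy(w, b, data):
    correct = sum(1 for h, y in data if y * (w[0] * h[0] + w[1] * h[1] + b) > 0)
    return correct / len(data)


train = make_data(200, seed=0)
test = make_data(200, seed=1)
w, b = train_linear_probe(train)
train_acc = accuracy(w, b, train)
test_acc = accuracy(w, b, test)
print(train_acc, test_acc)
```

High held-out accuracy from such a restricted classifier suggests the property is linearly readable from the representations. A more expressive probe could reach the same accuracy by learning the task itself, which is one common argument for keeping probes linear.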
103 K(A) = 1.60, K(s) = 0.19
Probing | POS-tags | K(s) = 0.83
Recap