Interpretability of NLP models

Description of this Post
Author
Published

December 3, 2023

1 Title

Slide 1

2 hE lwoann

Slide 2

3 hE lwoann

Slide 3

4 hE lwoann

Slide 4

5 Plan for today

Slide 5

6 Why do we need interpretability?

Slide 6

7 Why do we need interpretability?

Slide 7

8 Why do we need interpretability?

Slide 8

9 Why do we need interpretability?

Slide 9

10 Why do we need interpretability?

Slide 10

11 Why do we need interpretability?

Slide 11

12 Why do we need interpretability?

Slide 12

13 Why do we need interpretability?

Slide 13

14 Why do we need interpretability?

Slide 14

15 Why do we need interpretability?

Slide 15

16 NOS Nieuws

Slide 16

17 Title

Slide 17

18 Title

Slide 18

19 Title

Slide 19

20 Why do we need interpretability?

Slide 20

21 Why do we need interpretability?

Slide 21

22 Can we ever truly understand a large-scale AI model’s internal reasoning?

Slide 22

23 Why do we need interpretability?

Slide 23

24 How do we explain a model?

Slide 24

25 How do we explain a model?

Slide 25

26 How do we explain a model?

Slide 26

27 How do we explain a model?

Slide 27

28 How do we explain a model?

Slide 28

29 How do we explain a model?

Slide 29

30 How do we explain a model?

Slide 30

31 Explanation Faithfulness

Slide 31

32 Explanation Faithfulness

Slide 32

33 Explanation Faithfulness

Slide 33

34 Explanation Methods

Slide 34

35 Explanation Methods

Slide 35

36 Explanation Methods

Slide 36

37 Explanation Methods

Slide 37

38 Behavioural Interpretability

Slide 38

39 Behavioural Interpretability

Slide 39

40 BLiMP

Slide 40

41 BLiMP

Slide 41

42 BLiMP

Slide 42

43 BLiMP

Slide 43

44 BLiMP

Slide 44

45 BLiMP

Slide 45

46 BLiMP

Slide 46

47 BLiMP

Slide 47

48 Behavioural Tests for Uncovering Biases

Slide 48

49 Behavioural Tests for Uncovering Biases

Slide 49

50 Limitations of Behavioural Tests

Slide 50

51 Limitations of Behavioural Tests

Slide 51

52 Feature Attribution Methods

Slide 52

53 Pronoun Resolution

Slide 53

54 Pronoun Resolution

Slide 54

55 Pronoun Resolution

Slide 55

56 Pronoun Resolution

Slide 56

57 Pronoun Resolution

Slide 57

58 Pronoun Resolution

Slide 58

59 Average contributions

Slide 59

60 Average contributions

Slide 60

61 Average contributions

Slide 61

62 Average contributions

Slide 62

63 Default Reasoning?

Slide 63

64 Feature Attribution Methods

Slide 64

65 Feature Attribution Methods

Slide 65

66 Attribution Dimensions

Slide 66

67 Feature Removal

Slide 67

68 Feature Removal

Slide 68

69 Feature Removal

Slide 69

70 Feature Removal

Slide 70

71 Feature Removal

Slide 71

72 Feature Removal

Slide 72

73 Feature Removal

Slide 73

74 Feature Removal: Conditioned on present features

Slide 74

75 Feature Removal: Conditioned on present features

Slide 75

76 Feature Influence

Slide 76

77 Feature Influence

Slide 77

78 Shapley Values

Slide 78

79 Shapley Values

Slide 79

80 Shapley Values

Slide 80

81 Shapley Values

Slide 81

82 Feature Influence

Slide 82

83 Feature Influence

Slide 83

84 Highlighting via Input Gradients

• Estimate the importance of a feature using the derivative of the output w.r.t. that feature

Slide 84
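The idea on this slide can be illustrated with a minimal sketch. It assumes a toy logistic model `sigmoid(w . x + b)` whose input gradient has a closed form; the weights, bias, and input values are hypothetical and chosen only for illustration. Real models would use automatic differentiation instead of the hand-derived gradient used here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy classifier (an assumption): probability of the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(x, w, b):
    """Analytic d(output)/d(x_i) for the toy model: p * (1 - p) * w_i."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

w = [2.0, -1.0, 0.0]   # hypothetical weights
b = 0.0                # hypothetical bias
x = [1.0, 1.0, 1.0]    # hypothetical input features

grads = input_gradient(x, w, b)
# Rank features by the magnitude of their gradient (largest first).
ranking = sorted(range(len(x)), key=lambda i: abs(grads[i]), reverse=True)
print("gradients:", [round(g, 4) for g in grads])
print("ranking:", ranking)
```

The sign of each gradient indicates whether increasing the feature pushes the output up or down; the magnitude is what gets visualised as a highlight.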

85 Example of highlighting: Image classification

Slide 85

86 Gradient-based Highlightings for NLP

For NLP, derivative of output w.r.t. a feature

Slide 86

87 Gradient-based Highlightings for NLP

For NLP, derivative of output w.r.t. a feature

Slide 87
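In NLP models the input features are usually token embeddings, so the derivative for one token is a vector and has to be reduced to a single score before it can be shown as a highlight. The sketch below shows two common reductions, the L2 norm of the gradient and gradient times input. The toy model `sigmoid(sum_t tanh(w . e_t))`, the embeddings, and the weights are all assumptions made up for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def forward(embeddings, w):
    """Toy model (an assumption): p = sigmoid(sum_t tanh(w . e_t))."""
    return sigmoid(sum(math.tanh(dot(w, e)) for e in embeddings))

def token_gradients(embeddings, w):
    """Gradient of the toy output w.r.t. each token's embedding vector."""
    p = forward(embeddings, w)
    grads = []
    for e in embeddings:
        a = math.tanh(dot(w, e))
        grads.append([p * (1.0 - p) * (1.0 - a * a) * wi for wi in w])
    return grads

def l2_scores(grads):
    """One non-negative score per token: the norm of its gradient vector."""
    return [math.sqrt(dot(g, g)) for g in grads]

def grad_times_input_scores(grads, embeddings):
    """One signed score per token: dot product of gradient and embedding."""
    return [dot(g, e) for g, e in zip(grads, embeddings)]

tokens = ["the", "movie", "was", "great"]
embeddings = [[0.0, 0.1], [0.2, 0.1], [0.0, -0.1], [1.0, 0.5]]  # made-up values
w = [1.0, 1.0]                                                   # made-up weights

grads = token_gradients(embeddings, w)
norms = l2_scores(grads)
gxi = grad_times_input_scores(grads, embeddings)
for tok, n, s in zip(tokens, norms, gxi):
    print(f"{tok:>6}  L2={n:.3f}  grad*input={s:+.3f}")
```

The two reductions need not rank tokens the same way: the norm discards the sign and ignores the embedding's own magnitude, while gradient times input keeps both.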

88 Problems with Using Gradient for Highlighting

• too “local” and thus sensitive to slight perturbations

Slide 88

89 Problems with Using Gradient for Highlighting

Slide 89

90 Problems with Using Gradient for Highlighting

Slide 90

91 Extensions of Vanilla Gradient

• too “local” and thus sensitive to slight perturbations

Slide 91

92 Extensions of Vanilla Gradient

SmoothGrad: add Gaussian noise to the input and average the resulting gradients

Slide 92
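A minimal sketch of SmoothGrad as described on this slide, assuming the same kind of toy logistic model with a closed-form gradient. The noise scale `sigma`, the number of samples, the seed, and all numeric values are hypothetical choices for illustration, not recommended settings.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x, w, b):
    """Vanilla input gradient of the toy model sigmoid(w . x + b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [p * (1.0 - p) * wi for wi in w]

def smoothgrad(x, w, b, sigma=0.1, n_samples=50, seed=0):
    """Average the input gradient over Gaussian-perturbed copies of x."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        for i, g in enumerate(gradient(noisy, w, b)):
            total[i] += g
    return [t / n_samples for t in total]

w, b = [2.0, -1.0, 0.0], 0.0   # hypothetical parameters
x = [1.0, 1.0, 1.0]            # hypothetical input

smooth = smoothgrad(x, w, b)
print("vanilla:   ", [round(g, 4) for g in gradient(x, w, b)])
print("smoothgrad:", [round(g, 4) for g in smooth])
```

With `sigma=0.0` every perturbed copy equals the input, so the result reduces to the vanilla gradient; larger values trade locality for smoother attributions.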

93 Extensions of Vanilla Gradient

Integrated Gradients: average the gradients along a path from a zero baseline to the input

Slide 93
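A minimal sketch of Integrated Gradients with a zero baseline, as on this slide. It assumes a toy logistic model with a closed-form gradient and approximates the path integral with a midpoint Riemann sum; the number of steps and all numeric values are hypothetical. The averaged gradient is scaled by (input minus baseline), which is the standard form of the method.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy classifier (an assumption): sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def gradient(x, w, b):
    """Vanilla input gradient of the toy model."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, w, b, steps=50):
    """Midpoint-rule approximation of Integrated Gradients, zero baseline."""
    baseline = [0.0] * len(x)
    avg = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th path segment
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, b)):
            avg[i] += g / steps
    # Scale the averaged gradient by the distance travelled per feature.
    return [(xi - bi) * a for xi, bi, a in zip(x, baseline, avg)]

w, b = [2.0, -1.0, 0.0], 0.0   # hypothetical parameters
x = [1.0, 1.0, 1.0]            # hypothetical input

ig = integrated_gradients(x, w, b)
gap = model(x, w, b) - model([0.0] * len(x), w, b)
print("attributions:", [round(a, 4) for a in ig])
print("sum of attributions:", round(sum(ig), 4), " output gap:", round(gap, 4))
```

The attributions should sum approximately to the difference between the model output at the input and at the baseline (the completeness property), which gives a quick sanity check on the number of steps.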

94 Summary of Gradient-based Highlighting

Positives:

Slide 94

95 Summary of Gradient-based Highlighting

Slide 95

96 Probing

Slide 96

97 Probing

Slide 97

98 Probing | Linguistic

Slide 98

99 Probing | POS-tags, NER, etc.

Slide 99

100 Representations

Slide 100

101 What does probed info imply?

Slide 101

102 Why linear?

Slide 102

103 Probing | POS-tags

K(A) = 1.60, K(s) = 0.19; K(s) = 0.83

Slide 103

104 Recap

Slide 104

105 References

Slide 105