Text Processing

Author
Published

February 8, 2024

Slide 1

1 Text preprocessing

Slide 2

2 Outline

Slide 3

3 Outline

Slide 4

4 Outline

Slide 5

5 Zipf’s law

Slide 6

6 High-frequency words

Slide 7

7 Low-frequency words

Slide 8

8 Zipf’s law vs. real data

Slide 9

9 Outline

Slide 10

10 Heaps’ law

Slide 11

11 Outline

Slide 12

12 Text preprocessing pipeline

Slide 13

13 Example

Slide 14

14 Stop-word removal

Slide 15

15 Outline

Slide 16

16 Stemming

Slide 17

17 Algorithmic stemming (Porter stemmer)

Slide 18

18 Algorithmic stemming (Porter stemmer)

Slide 19

19 Dictionary-based stemming

Slide 20

20 Hybrid stemming (Krovetz stemmer)

Slide 21

21 Stemming example

Slide 22

22 Outline

Slide 23

23 Example

Slide 24

24 Dealing with phrases

Slide 25

25 Summary

Slide 26

26 Additional References

Slide 27