Word Frequency Lookup



Suppose you have a list of popular English words, popular-words.txt, and a list of irregular English verbs, irregular-verbs.txt. You might want to rearrange the latter into popular-irregular-verbs.txt, a list of irregular English verbs sorted by popularity. That wish is what got me started writing this small Perl script, wflu, while I was teaching an English class.

Suppose you use GNU/Linux or *BSD as I do. Then you can open a terminal and generate the sorted list with the following command:

  ./wflu -f popular-words.txt irregular-verbs.txt | sort -n | expand > popular-irregular-verbs.txt

Data file format:

  1. The frequency file (specified by the -f option; popular-words.txt in our example) should contain a list of words, one per line, sorted by popularity (decreasing frequency). Optionally, each word can be followed by spaces and then its frequency count (in some corpus).
  2. The data file (irregular-verbs.txt in our example) should contain a list of words, one per line. Each word can optionally be followed by spaces and then an arbitrary string.
  3. Blank lines and lines beginning with # are ignored in both files.
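
The format rules above can be illustrated with a small Python sketch. This is not the wflu source (which is Perl); the function name parse_entries is hypothetical, and tolerating leading whitespace before a word or a # is an assumption, not documented behavior.

```python
def parse_entries(text):
    """Return (word, rest) pairs from text in the format described above.

    Hypothetical helper for illustration; not taken from wflu itself.
    """
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and lines beginning with '#' are ignored.
        # (Assumption: leading whitespace before '#' is tolerated.)
        if not stripped or stripped.startswith("#"):
            continue
        # The word is the first whitespace-delimited field; whatever follows
        # the separating whitespace is kept as one string (a frequency count
        # in the frequency file, arbitrary data in the data file).
        parts = stripped.split(None, 1)
        word = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        entries.append((word, rest))
    return entries


sample = """\
# irregular verbs
go      went gone

be      was/were been
"""
print(parse_entries(sample))
# prints [('go', 'went gone'), ('be', 'was/were been')]
```

The same helper works for both files, because both share the "word, spaces, optional trailing text" layout.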

wflu then looks up each word from the data file in the frequency file and outputs the word (with any arbitrary data following it), preceded by its popularity and, if the frequency file provides one, its frequency count.
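
A minimal Python sketch of this lookup step follows. It is only an illustration, not the wflu implementation, and it rests on several assumptions: popularity is taken to be the word's 1-based position in the frequency file, output fields are tab-separated, matching is exact and case-sensitive, and words missing from the frequency file are skipped. The function name lookup and the sample data are hypothetical.

```python
def lookup(freq_entries, data_entries):
    """Return output lines of the form: rank [count] word [rest].

    freq_entries: list of (word, count) pairs in decreasing popularity;
                  count is "" when the frequency file gives none.
    data_entries: list of (word, rest) pairs from the data file.
    """
    # Assumption: popularity = 1-based position in the frequency list.
    ranks = {}
    for position, (word, count) in enumerate(freq_entries, start=1):
        ranks.setdefault(word, (position, count))  # first occurrence wins

    lines = []
    for word, rest in data_entries:
        if word not in ranks:
            continue  # assumption: unknown words are silently skipped
        rank, count = ranks[word]
        fields = [str(rank)]
        if count:
            fields.append(count)
        fields.append(word)
        if rest:
            fields.append(rest)
        # Assumption: tab-separated output.
        lines.append("\t".join(fields))
    return lines


freq = [("the", "1000"), ("be", "800"), ("go", "300")]
data = [("go", "went gone"), ("be", "was/were been"), ("xyzzy", "")]
for line in lookup(freq, data):
    print(line)
```

With the rank as the leading field, a numeric sort of the output puts the most popular words first.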