Word Frequency Lookup

Suppose you have a list of popular English words, popular-words.txt, and a list of irregular English verbs, irregular-verbs.txt. You might want to rearrange the latter into popular-irregular-verbs.txt, a list of irregular English verbs sorted by popularity. That wish is what got me started writing this small Perl script, wflu, while I was teaching an English class.

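Suppose you use GNU/Linux or *BSD like I do. Then you can open a terminal and generate the sorted list with the following command:

./wflu -f popular-words.txt irregular-verbs.txt | sort -n | expand > popular-irregular-verbs.txt

Here sort -n orders the output lines numerically by the popularity that wflu puts at the start of each line, and expand converts tabs to spaces.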

Data file format:

The frequency file (specified by the -f option; popular-words.txt in our example) should contain a list of words, one on each line, sorted by popularity (decreasing frequency). Optionally, each word can be followed by spaces and then its frequency count (in some corpus).
The data file (irregular-verbs.txt in our example) should contain a list of words, one on each line. Each word can optionally be followed by spaces and then some arbitrary string.
Blank lines and lines beginning with # are ignored in both files.
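
The format rules above can be sketched in a few lines of Python. This is only an illustration of the file format, not the Perl code in wflu; the function name parse_word_lines and the sample counts are made up for the example, and splitting on any whitespace (tabs as well as spaces) is an assumption.

```python
def parse_word_lines(lines):
    """Return (word, rest) pairs from lines in the format described above.

    Hypothetical helper for illustration; not part of wflu itself.
    """
    entries = []
    for line in lines:
        # Blank lines and lines beginning with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        # The first whitespace-delimited token is the word. Anything after it
        # (a frequency count or an arbitrary string) is kept as text.
        # Assumption: any run of whitespace, not only spaces, separates them.
        parts = line.strip().split(None, 1)
        word = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        entries.append((word, rest))
    return entries


if __name__ == "__main__":
    # Made-up sample lines; the counts are not from any real corpus.
    sample = ["# a comment\n", "\n", "the 100\n", "be 90\n", "go\n"]
    for word, rest in parse_word_lines(sample):
        print(repr(word), repr(rest))
```

The same function works for both files, since the two formats differ only in what the text after the word means.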

wflu then looks up each word from the data file in the frequency file and outputs the word (along with any arbitrary data following it), preceded by its popularity and, if the frequency file provides one, its frequency count.
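
A minimal Python sketch of this lookup step follows. It is not the actual wflu source, and it rests on several assumptions that this page does not confirm: popularity is taken to be the 1-based position of the word in the frequency file, output fields are separated by tabs, matching is case-sensitive, and words missing from the frequency file are skipped. The function name lookup and the sample data are hypothetical.

```python
def lookup(freq_entries, data_entries):
    """Return output lines for data words found in the frequency list.

    Both arguments are lists of (word, rest) pairs. Hypothetical sketch;
    the field order and tab separator are assumptions, not wflu's spec.
    """
    # Assumption: popularity = 1-based position in the frequency file,
    # which is sorted by decreasing frequency.
    ranks = {}
    for position, (word, count) in enumerate(freq_entries, start=1):
        ranks.setdefault(word, (position, count))  # keep first occurrence

    lines = []
    for word, rest in data_entries:
        if word not in ranks:
            continue  # assumption: unknown words are skipped
        rank, count = ranks[word]
        fields = [str(rank)]
        if count:  # the frequency count is optional
            fields.append(count)
        fields.append(word)
        if rest:  # arbitrary data following the word, if any
            fields.append(rest)
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Made-up sample data; the counts are not from any real corpus.
    freq = [("the", "100"), ("be", "90"), ("go", "40"), ("sing", "5")]
    data = [("sing", "sang sung"), ("go", "went gone"), ("be", "was/were been")]
    # Sorting by the leading number plays the role of `sort -n`.
    for line in sorted(lookup(freq, data), key=lambda s: int(s.split("\t")[0])):
        print(line)
```

Putting the popularity first is what lets a plain numeric sort order the result from most to least popular.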

Most recent version of this page: http://frdm.cyut.edu.tw/~ckhung/p/toy/wflu.php; the version you are reading: July 27 2011 10:14:48.
Author: Chao-Kuei Hung at Chaoyang University Information Management Department
Save our Earth; please reduce printing, make use of the unprinted side, and recycle.
You are welcome to distribute this document in accordance with the Creative Commons Attribution-ShareAlike License or the GNU Free Documentation License.