"We can only see a short distance ahead, but we can see plenty there that needs to be done" (Alan Turing)

Software and tools

Simple Text Frequency

Download application
Language: Romanian/English
Supported systems: Windows XP/Vista/7

The Idea...

You might be surprised by how much you can learn just by counting how often each word occurs in a text. Word counts can hint at the domain or nature of the text and at the style of the author (word frequency is just one of the methods used in stylometry).

For instance, if the pronoun 'I' occurs very often, one can conclude that the text is probably written in the first person. Likewise, the presence of technical terms may indicate a text written in a 'frozen language', belonging to a scientific field.
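
As a rough illustration of the pronoun example, the sketch below computes the share of a text's words taken up by one target word. It is not the application's code: the function name relative_frequency and the tokenization rule (runs of letters, lowercased) are assumptions made for this example, and no particular threshold for "very high" is implied.

```python
import re

def relative_frequency(text, target):
    """Return the fraction of words in `text` equal to `target` (case-insensitive)."""
    # Assumed tokenization: a word is any run of Unicode letters.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return words.count(target.lower()) / len(words)

sample = "I went home and I slept."
print(relative_frequency(sample, "I"))
```

A ratio is used instead of a raw count so that texts of different lengths can be compared.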

No lemmatization or any other kind of processing is applied to the text being analyzed; this is a very simple text frequency tool. There are many ways to improve the software: word lemmatization, detection of numerals and proper names, punctuation detection, and sentence splitting.
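
A minimal sketch of this kind of plain counting is shown below. It is an illustration only, not the application's actual implementation: the function name word_frequencies and the tokenization rule (runs of letters, optionally joined by apostrophes, lowercased) are assumptions. Because nothing is lemmatized, "cat" and "cats" are counted as two different words.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count raw word forms; lowercasing is the only normalization applied."""
    # Assumed tokenization: runs of Unicode letters, with optional internal
    # apostrophes (so "don't" stays one word). Digits and punctuation are skipped.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    return Counter(words)

sample = "The cat sat. The cats sat on the mat."
freqs = word_frequencies(sample)

# Print words from most to least frequent.
for word, count in freqs.most_common():
    print(word, count)
```

The improvements listed above would slot in before the counting step, for example by mapping each word to its lemma before it is added to the counter.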

