Last week, I read quite an interesting article about the work of Mark Mayzner, who worked at Bell Telephone Labs, Barnard College, USC, NYU, and Loyola. It was a very detailed study of the frequency of letters in English. In 1965, he analyzed 20,000 words from books to show these frequencies. Now let’s learn more about his Google Research visit and its contribution to his work almost 50 years later! 🙂
Note: The following part is quoted from the Expect Labs blog. You can find the link at the bottom.
In 1965, Mark Mayzner meticulously analyzed over 20,000 words from books, magazines, and newspapers using an IBM card-sorting machine, in order to paint a more complete picture of the various word and letter frequencies that characterize the English language. Mayzner recently contacted Peter Norvig, Google’s head of research, to see if he could update his experiment by leveraging the enormity of data in the Google Books Ngram Corpus. Norvig agreed to the challenge, and updated Mayzner’s study by analyzing the 97,565 distinct words which were mentioned over 743 billion times in the Google data collection. In fact, Norvig’s sample had 37 million times more word occurrences than the 20,000-word sample that Mayzner used.
Norvig’s chart below visualizes letter counts by word position, with the frequencies proportional to the length of the bars. The results show that the most common first letter in English is T, while the most common second letter is O.
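For the curious, here is a minimal Python sketch of how letter counts by word position could be tallied from a list of words and their occurrence counts. It is only an illustration, not Norvig's actual code: the function name `letter_counts_by_position` and the tiny `sample` dictionary are made up for this example, and the numbers in it are not taken from the Google data.

```python
from collections import Counter, defaultdict


def letter_counts_by_position(word_counts):
    """Map each 0-based letter position to a Counter of weighted letter counts."""
    by_position = defaultdict(Counter)
    for word, count in word_counts.items():
        for position, letter in enumerate(word.lower()):
            if letter.isalpha():
                # Each occurrence of the word contributes once to this letter/position.
                by_position[position][letter] += count
    return by_position


# Hypothetical toy data: word -> number of occurrences (not real corpus figures).
sample = {"the": 3, "to": 4, "of": 2, "on": 1}

counts = letter_counts_by_position(sample)
first_letter, first_total = counts[0].most_common(1)[0]
second_letter, second_total = counts[1].most_common(1)[0]
print("Most common first letter:", first_letter, first_total)
print("Most common second letter:", second_letter, second_total)
```

Weighting each word by how often it occurs matters here: a very frequent word contributes its letters many times, which is why a handful of short, common words can dominate the counts for the first positions.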
Click here to find out more interesting facts.