I love wordle

January 31, 2009

Here’s a wordle of this blog. Looks like ‘information’ is the winner!

[Image: Wordle of metadatamonkey]

For those who haven’t met it yet, wordle counts the words in any document and lays them out graphically, with each word’s text size showing how often it occurs. It’s a little like a tag cloud that way.
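For the curious, here is a rough sketch of the counting idea in Python. I don't know how wordle actually does it, so treat this as my own guess: the function name `word_sizes`, the point sizes and the simple linear scaling are all assumptions, and the layout step is left out entirely.

```python
import re
from collections import Counter


def word_sizes(text, min_pt=10, max_pt=72):
    """Map each word to a font size that grows with how often it occurs.

    Illustrative only: the linear scaling and the default point sizes
    are assumptions, not wordle's real behaviour.
    """
    # Lowercase the text and pull out runs of letters (and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}

    low = min(counts.values())
    top = max(counts.values())
    spread = (top - low) or 1  # avoid dividing by zero when all counts match

    # Rarest word gets min_pt, most frequent word gets max_pt.
    return {
        word: min_pt + (max_pt - min_pt) * (n - low) / spread
        for word, n in counts.items()
    }


sizes = word_sizes("information library information journal information library")
```

In that tiny example 'information' appears three times and ends up largest, 'journal' appears once and ends up smallest, and 'library' sits in between.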

The good people at www.wordle.net describe it as a ‘toy’, but I think it’s better than that. It gives you an immediate insight into where the bulk of the text lies. It’s not perfect, because individual words don’t always represent semantic categories – ‘Library of Congress’ would come out ‘library’ and ‘congress’ in wordle, giving completely the wrong idea. Also note that ‘journal’ and ‘journals’ are divided in the above wordle, even though they are a semantic match. (If they WERE united as a semantic match, the relative importance of the term in the context of the whole would be much clearer.)

But these are quibbles. This is an excellent technology, and I for one am hopeful that means will be found to improve it semantically. A user-generated controlled vocabulary of semantically unified terms (‘controlled vocabulary’ itself springs to mind) could make this extremely powerful. After all, the search engines are beginning to get the hang of meaningful phrases that contain whitespace – why not the rest of us?
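To make the controlled vocabulary idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `VOCABULARY` table is hand-made for this post, `unified_counts` is a name I invented, and joining phrases with underscores is just one convenient way to keep them in one piece. It is not something wordle offers.

```python
import re
from collections import Counter

# Hypothetical, hand-made vocabulary: variant or phrase -> preferred term.
VOCABULARY = {
    "library of congress": "library_of_congress",
    "controlled vocabulary": "controlled_vocabulary",
    "journals": "journal",
}


def unified_counts(text, vocabulary=VOCABULARY):
    """Count words after folding known variants and phrases into one term."""
    text = text.lower()
    # Handle the longest entries first so whole phrases are replaced
    # before any shorter entry that might overlap with them.
    for variant in sorted(vocabulary, key=len, reverse=True):
        preferred = vocabulary[variant]
        pattern = r"\b" + re.escape(variant) + r"\b"
        text = re.sub(pattern, lambda match, p=preferred: p, text)
    # Underscores count as word characters, so unified phrases stay whole.
    return Counter(re.findall(r"[a-z_']+", text))


counts = unified_counts(
    "The Library of Congress holds journals. A journal in the library."
)
```

Here 'journals' and 'journal' are counted together, and 'Library of Congress' is counted as a single term separate from the lone 'library' at the end of the sentence.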

UPDATE: It’s been pointed out to me that I am far from the only person who thinks this is more than a toy. Courtesy of the New York Times, visualised word counts from all presidential inaugural speeches since 1789: http://www.nytimes.com/interactive/2009/01/17/washington/20090117_ADDRESSES.html

And yes, the results are very telling.

