If you remember, a while ago I put up a simple test of a link graph that was hooked up to CNN and other news sources. That was for an article that I wrote for IBM developerWorks. At the time I thought to myself, "What would happen if I threw this algorithm at the same data over time?" So I set up my machine to record the RSS feeds from around 20 different sources and set out to write some code to analyze the whole mess.
I became a little more pressed when it was clear that the invitation to Foo Camp entailed actually coming up with something cool. So, I set out to write a serious News/Blog Analysis system based on some of the work from the Link Graph article. Tonight I have the first version of the result.
Download the zipped tar file and unpack it into any directory. Then use Firefox to open the index.html file. I'm serious here about the Firefox thing. I've only tested it on one other browser and that was Safari. And Safari just cannot launch this bad boy.
If you want to try out this thing without downloading it to your machine, click here. Once again though, I have only tried this with Firefox and you will wait about a minute for the data to download.
The first thing you should see is the new dynamic link graph:

Click the image if you want to see it larger. So what is this telling us? This is showing us that the big story on the selected day was Hurricane Dennis. To change the date use the date control on the right hand side of the window:

Clicking (not dragging) on the slider will change the date. So will the play, stop and pause buttons. While it is playing, the date advances once a second, giving you an animation of the graph.
Currently the graph is only showing the major network news sources. You can change that by clicking the buttons in the Source portion of the panel.

Just click on an item to either add or remove it from the list of sources being used for the display. There is no way to add an arbitrary source of information to the system.
The final section on the right controls the words that are available.

Each word or term is assigned to one or more classes. For example, the term karlrove is both a person and political. You can enable or disable whole classes by clicking on the class you wish to enable or disable.
If you select a word then an option will appear in this panel to remove the word from the display. Removed words appear in the section below the 'bad words' line. You can add them back into the display by clicking on them again. I would use drag and drop, but frankly I'm too tired for that.
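For the curious, the filtering described above can be sketched in a few lines. This is only a rough illustration, not the code that ships in the download: the table layout, the function name `visible_terms`, and the class names `topic` and `place` are all made up for the example. Only the karlrove entry comes from the description above.

```python
# Hypothetical term-to-class table. The real data format is not shown in this post.
TERM_CLASSES = {
    "karlrove": {"person", "political"},  # from the example in the text
    "marijuana": {"topic"},               # assumed class name
    "california": {"place"},              # assumed class name
}

def visible_terms(term_classes, enabled_classes, removed_terms):
    """Return the terms that belong to at least one enabled class
    and have not been moved below the 'bad words' line."""
    return sorted(
        term
        for term, classes in term_classes.items()
        if classes & enabled_classes and term not in removed_terms
    )

# Enable only the 'person' class, with nothing removed.
print(visible_terms(TERM_CLASSES, {"person"}, set()))
```

A term with several classes stays visible as long as any one of its classes is enabled, which is one plausible reading of how the class buttons behave.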
Ok. That concludes the right hand portion of the display.
The real fun starts when you click on a word.

Here I have clicked on the term marijuana. The first thing to notice is that other terms have been highlighted as well. Those are terms that are related by being in the same articles as the selected term. In this case it pretty clearly points out that there is something to do with marijuana in California that has something to do with the Supreme Court. If you will remember, around 7/9/05 the Supreme Court had just upheld some challenges to states' rights concerning medical marijuana.
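The "related by being in the same articles" idea is plain co-occurrence. Here is a minimal sketch of it, assuming each article has already been reduced to a set of terms. The function name `related_terms` and the sample articles are invented for illustration, and the post does not say how (or whether) the real system weights the relationships, so simple counts are an assumption.

```python
from collections import defaultdict

def related_terms(articles, selected):
    """Count how many articles each other term shares with the selected term."""
    counts = defaultdict(int)
    for terms in articles:
        if selected in terms:
            for term in terms:
                if term != selected:
                    counts[term] += 1
    return dict(counts)

# Made-up sample: each article is the set of terms found in it.
ARTICLES = [
    {"marijuana", "california", "supremecourt"},
    {"marijuana", "supremecourt"},
    {"hurricanedennis", "florida"},
]

print(related_terms(ARTICLES, "marijuana"))
```

Any term with a non-zero count would be highlighted when the selected term is clicked.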
Ok, so let's jump ahead a little and click on karlrove.

He shows up a lot, so that's not surprising to see him related to a lot of terms. I'm not sure why he is related to Harry Potter. It could have been a reference in an article that covered various topics of the day.
It's also worth noting that on the right hand side of the window things have changed as well. The sources that referenced the selected term on the given day are shown in bold. In this case a whole bunch of blogs covered the topic, but only NBC and the Washington Post actually had stories on it. So much for liberal media bias.
But don't we have this data over time? Where is that? Click on the Timeline tab.

That shows us the articles that mentioned Karl Rove, over time by source. The brown bars are each one article. Clicking on the brown bar will open up a new window and take you to the article.
If you want to read the titles of the articles click on the Articles tab.

Not much to talk about there. This is more of a "show me the money" report.
Now for the really, really cool stuff. Here is Karl Rove escaping from the media eye, in graph form:

This is the Prevalence tab. It shows the sources that referenced the selected word and the number of times each one mentioned it as a function of time. As you can see, Al Franken was the most vocal about Karl, but that spiked about mid-July and has been dropping since. Other media outlets spiked in early July and then fell off. Only the Huffington Post seems to be holding the torch, but even that light is fading. Curiously, Michelle Malkin, noted right wing blogger and Fox regular, also seems to be vocal about the story.
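The numbers behind a chart like this boil down to counting mentions per source per day. A rough sketch, assuming the recorded feeds have been flattened into (source, date, term) tuples. That layout, the function name `prevalence`, and the placeholder source names and dates are all made up for the example and are not real data from the system.

```python
from collections import Counter

def prevalence(mentions, term):
    """Count mentions of a term for each (source, date) pair."""
    return Counter(
        (source, date) for source, date, t in mentions if t == term
    )

# Invented sample rows: (source, date, term).
MENTIONS = [
    ("source-a", "2005-07-11", "karlrove"),
    ("source-a", "2005-07-11", "karlrove"),
    ("source-b", "2005-07-11", "karlrove"),
    ("source-a", "2005-07-12", "marijuana"),
]

counts = prevalence(MENTIONS, "karlrove")
print(counts[("source-a", "2005-07-11")])
```

Plotting one line per source, with the date on the horizontal axis and the count on the vertical axis, gives the same shape of graph as the Prevalence tab.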
The Related tab shows the terms related to the selected term by source:

Only the sources selected in the Sources panel are shown in the Related tab.
So what next for this system? I really don't know. I'll probably keep it rolling. Maybe add a feature or two. Probably try to get it compatible with IE and maybe Safari. The problem is really the maintenance of the data. The data requires hand pruning which is very tiresome. Most days bring about 200 new terms and a host of new names that need to be turned into terms. It's not something I can keep up with myself.
In addition, while I think this was an interesting experiment in how far you can push DHTML, it's obviously not practical for multiple months' worth of data. So something a little less interactive but more scalable, with a server as part of the solution, is probably the way to go.
Let me know if this interests you and if you have any suggestions for how it can be improved beyond just browser compatibility and bug fixes. By the way, this is pre-alpha software and is completely without warranty. If it eats your machine or your browser, that is notable, but unintended.
Posted by jherr at August 7, 2005 09:01 PM