Technical traders will tell you that the market already distills every possible news source, and that the resulting signal (the market itself) is the most efficient summary of that news there can be. On this view, my own intuition, that by analyzing the news in some way I can produce better models of future market behavior, would be a waste of energy.
And yet people do analyze the news and predict future market behavior, and this process is itself the action of the market distilling the news. So it should be possible to model that process and predict it more quickly than sloppy meat-based processors can. We shall see. In the meantime, I expect to do some pretty interesting work.
As my news source, I'm going to go with the Reuters RSS feeds, because they're easy. I'm going to set up an RSS scraper that writes into a database, so I can build a history of news stories for later analysis. Part of the reason for this is that tick data from IB is a week delayed (unless you have money on the barrelhead, which I don't), so the news I line up against it will also have to be a week old. I can get free market closing prices, and those will be interesting to work with too, but really I'm going to want to have at least a few weeks' news built up. (Not to mention that it will probably be necessary to go back and re-analyze news in light of later revelations - so a history of news will really be essential.)
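For the curious, here's roughly what I have in mind for the scraper: a minimal sketch in Python using only the standard library, not a finished tool. The feed URL is a made-up placeholder, the table layout and the function names (open_db, parse_rss, store, scrape) are my own inventions for illustration, and I'm assuming a plain RSS 2.0 feed with ordinary item, title, link, guid, pubDate and description elements.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical placeholder: substitute whichever feed URL you actually use.
FEED_URL = "https://example.com/reuters/businessNews.rss"


def open_db(path="news.db"):
    """Open (or create) the story archive. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS stories (
               guid       TEXT PRIMARY KEY,
               title      TEXT,
               link       TEXT,
               published  TEXT,
               summary    TEXT,
               fetched_at TEXT
           )"""
    )
    return conn


def parse_rss(xml_text):
    """Yield one dict per <item> in an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        yield {
            # Fall back to the link when the feed supplies no guid.
            "guid": item.findtext("guid") or link,
            "title": item.findtext("title", default=""),
            "link": link,
            "published": item.findtext("pubDate", default=""),
            "summary": item.findtext("description", default=""),
        }


def store(conn, stories):
    """Insert stories, skipping guids already archived; return the count added."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    added = 0
    for s in stories:
        cur = conn.execute(
            "INSERT OR IGNORE INTO stories VALUES (?, ?, ?, ?, ?, ?)",
            (s["guid"], s["title"], s["link"], s["published"],
             s["summary"], fetched_at),
        )
        added += cur.rowcount  # 1 if inserted, 0 if the guid was already there
    conn.commit()
    return added


def scrape(conn, url=FEED_URL):
    """Fetch the feed once and archive any stories not seen before."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return store(conn, parse_rss(resp.read()))
```

Calling scrape(open_db()) from a cron job every few minutes would be enough to start piling up a history. Keying on guid means repeated polls don't duplicate stories, and recording fetched_at alongside the feed's own pubDate should make it possible to line stories up against week-old tick data later.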
So that's task #1 in my quantsem project.