Can Social Media Data Predict Stock Price Changes?

In the previous part of this series on Sentiment Analysis, I ranted a bit about how correlation is often confused with causality: the trap of assuming “A” caused “B” simply because “B” was observed to occur after “A”. Strategy consulting firm Altman Vilandrie & Company conducted a proof-of-concept study to test whether measuring consumer sentiment expressed in social media really has predictive value for its customers.

Granted, this was only preliminary research. But the results warrant further exploration of social media as a proxy for actual sentiment. And the study exposed what might be a skew in news media sentiment.

Dan Stone of Altman Vilandrie & Company presented this study at the recent Lexalytics User Group meeting. The firm, which focuses on the communications, media, smart grid, and related technology and investor sectors, helps its clients maximize shareholder value through business growth and performance optimization. The event studied was the iPhone 4 “Antennagate” problem, during which Apple lost over $10 billion in market capitalization.

Along with examining sentiment from over 200,000 tweets, the study collected several thousand online news articles and scored them by circulation and sentiment. Some additional details are on the chart that follows.

Based on the preliminary results, social media may serve as a good proxy for actual consumer sentiment, as well as a leading indicator for changes to stock price and news sentiment.

An unexpected result was the exposure of a skew in news media sentiment, which shifted during Antennagate from more positive than social media to more negative than social media. Social media was both a leading indicator of the stock drop and a more reliable proxy for sentiment than online news sources, which responded slowly and then overcompensated.
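To make the “leading indicator” idea concrete, here is a minimal Python sketch of one common way to check whether one daily series tends to move ahead of another: correlate the first series with the second shifted by 0, 1, 2, ... days and see which shift lines up best. This is not the method used in the study, which was not described at that level of detail. The function names and the sample numbers are made up for illustration, and a strong lagged correlation still says nothing about causality.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlations(leader, follower, max_lag):
    """Correlate leader[t] with follower[t + lag] for each lag in 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        xs = leader[: len(leader) - lag]  # drop the last `lag` days of the leader
        ys = follower[lag:]               # drop the first `lag` days of the follower
        results[lag] = pearson(xs, ys)
    return results


def best_lag(leader, follower, max_lag):
    """Return the lag (in days) at which the two series are most strongly correlated."""
    corrs = lagged_correlations(leader, follower, max_lag)
    return max(corrs, key=lambda lag: corrs[lag])


if __name__ == "__main__":
    # Invented daily sentiment scores, NOT data from the study.
    tweet_sentiment = [0.5, 0.4, -0.2, -0.6, -0.3, 0.1, 0.3, 0.2]
    # Invented news sentiment that simply echoes the tweets two days later.
    news_sentiment = [0.0, 0.0] + tweet_sentiment[:-2]
    print(lagged_correlations(tweet_sentiment, news_sentiment, max_lag=3))
    print("best lag in days:", best_lag(tweet_sentiment, news_sentiment, max_lag=3))
```

With real data you would want far more than a handful of days, and you would compare the result against the reverse direction (news leading tweets) before reading anything into it.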

Even with further validation, Stone feels this analysis technique may have limited application. However, if the methodology can be used to improve performance by just a percent or two for something on a large scale, it’s well worth the effort.

Stone shared some of the limiting factors met during this analysis and suggested there were others:

Availability of historical data — It was very difficult to obtain focused, historical Twitter data from the full firehose. A way was found to programmatically extract the data after several vendors were unable to deliver.

Off-the-shelf tools found insufficient — Many tools currently focus on monitoring as opposed to managing data.

Noise and bias inherent in large bodies of un/semi-structured data — The garbage-in-garbage-out principle, which applies to all analyses, is particularly salient when attempting to extract insight from social media. Especially as we move beyond monitoring activity/sentiment to actually making business decisions, the rigor of our analysis depends not only on obtaining high quality data, but also on applying the right filters.

Attribution of observed behaviors to causes — For example, how does one treat the statement “I hate apple #iphone4”? The data is insufficient to attribute this tweet to Antennagate.
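As a rough illustration of the attribution problem, the Python sketch below separates tweets that mention the product and an issue-related term from tweets that only mention the product. The keyword lists, the category names, and the `attribute` function are all hypothetical. They are not the filters used in the study, and a real filter would need far more care than simple substring matching.

```python
# Hypothetical keyword lists, chosen for illustration only.
PRODUCT_TERMS = ("iphone4", "iphone 4")
ISSUE_TERMS = ("antenna", "reception", "signal", "dropped call", "death grip")


def attribute(tweet):
    """Classify a tweet as 'issue', 'product_only', or 'unrelated'.

    'issue'        -> mentions the product AND an issue-related term
    'product_only' -> mentions the product but gives no hint of the cause
    'unrelated'    -> does not mention the product at all
    """
    text = tweet.lower()
    mentions_product = any(term in text for term in PRODUCT_TERMS)
    mentions_issue = any(term in text for term in ISSUE_TERMS)
    if mentions_product and mentions_issue:
        return "issue"
    if mentions_product:
        return "product_only"
    return "unrelated"


if __name__ == "__main__":
    for tweet in (
        "I hate apple #iphone4",
        "My iPhone 4 drops signal when I hold it",
    ):
        print(attribute(tweet), "<-", tweet)
```

Under these assumed keywords, the example tweet from the talk lands in the `product_only` bucket: negative about the product, but with nothing tying it to Antennagate. Whether such tweets are kept, dropped, or weighted differently is exactly the kind of filtering decision Stone was pointing at.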

The combination of business, technical and analytical knowledge required to determine how to act upon the data:

  • Business: Understand the question, design the approach, identify the right filters
  • Technical: Data extraction, automation, etc.
  • Analytical: Bridging the gap between the business “what I want” and the technical “what I have”