By Colton Smith
===
Exploring alternative datasets to augment financial trading models is currently the hot trend among the quantitative community. With so much social media data out there, its place in financial models has become a popular research discussion. Surely the stock market's performance influences the reactions from the public, but if the converse is true, that social media sentiment can be used to predict movements in the stock market, then this would be a very valuable dataset for a variety of financial firms and institutions.
When I began this project as a consultant for QTS Capital Management, I did an extensive literature review of the social media sentiment providers and academic research. The main approach is to take the social media firehose, filter it down by source credibility, apply natural language processing (NLP), and create a variety of metrics that capture sentiment, volume, dispersion, etc. The best results have come from using Twitter or StockTwits as the source. A feature of StockTwits that distinguishes it from Twitter is that in late 2012 the option to label your tweet as bullish or bearish was added. If these labels accurately capture sentiment and are used frequently enough, then it would be possible to avoid using NLP. Most tweets are not labeled, as seen in Figure 1 below, but the percentage is increasing.
Figure 1: Percentage of Labeled StockTwits Tweets by Year
This blog post will compare the use of just the labeled tweets versus the use of all tweets with NLP. To begin, I did some basic data analysis to better understand the nature of the data. In Figure 2 below, the number of labeled tweets per hour is shown. As expected, there are spikes around market open and close.
Figure 2: Number of Tweets Per Hour of the Day
The overall market sentiment can be estimated by aggregating the number of bullish and bearish labeled tweets each day. Based on the previous literature, I expected a significant bullish bias. This is confirmed in Figure 3 below, with the daily mean percentage of bullish tweets being 79%.
Figure 3: Percentage of Bullish Tweets Each Day
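The daily aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the figure: the `(date, label)` tuple layout and the label strings are assumptions made for the example.

```python
from collections import defaultdict

def daily_bullish_share(tweets):
    """Return {date: fraction of labeled tweets that are bullish}.

    `tweets` is an iterable of (date, label) pairs, where label is
    "Bullish", "Bearish", or None for an unlabeled tweet.
    This schema is hypothetical, chosen only for the sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # date -> [bullish, bearish]
    for date, label in tweets:
        if label == "Bullish":
            counts[date][0] += 1
        elif label == "Bearish":
            counts[date][1] += 1
        # Unlabeled tweets are ignored entirely.
    return {d: bull / (bull + bear) for d, (bull, bear) in counts.items()}

sample = [
    ("2016-03-01", "Bullish"), ("2016-03-01", "Bullish"),
    ("2016-03-01", "Bearish"), ("2016-03-01", None),
    ("2016-03-02", "Bullish"), ("2016-03-02", "Bearish"),
]
shares = daily_bullish_share(sample)
```

Averaging the values of `shares` across all days would give a figure comparable to the daily mean quoted above.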
When writing a StockTwits tweet, users can tag multiple symbols, so it is possible that the sentiment label could apply to more than one symbol. Tagging more than one symbol would likely indicate less specific sentiment and predictive potential, so I hoped to find that most tweets only tag a single symbol. Looking at Figure 4 below, over 90% of the tweets tag a single symbol and a very small percentage tag 5+.
Figure 4: Relative Frequency Histogram of the Number of Symbols Mentioned Per Tweet
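A relative frequency count like the one in Figure 4 could be computed roughly as follows. This is a sketch under assumptions: each tweet is represented as a list of tagged symbols, and counts of five or more are pooled into one bucket to mirror the "5+" category.

```python
from collections import Counter

def symbol_count_frequencies(tweet_symbols, cap=5):
    """Relative frequency of the number of distinct symbols per tweet.

    `tweet_symbols` is a list of symbol lists, one per tweet (hypothetical
    layout). Counts at or above `cap` are pooled into a single bucket.
    """
    counts = Counter(min(len(set(symbols)), cap) for symbols in tweet_symbols)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

freqs = symbol_count_frequencies(
    [["AAPL"], ["AAPL", "MSFT"], ["TSLA"], ["SPY"]]
)
```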
The time period of data used in my analysis is from 2012-11-01 to 2016-12-31. In Figure 5 below, the top symbols, industries, and sectors by total labeled tweet count are shown. By far the most tweeted-about industries were biotechnology and ETFs. This makes sense because of how volatile these industries are, which hopefully means that they would be the best to trade based on social media sentiment data.
Figure 5: Top Symbols, Industries, and Sectors by Total Tweet Count
Now I needed to determine how I would create the sentiment score to best capture the predictive potential of the data. Though there are obstacles to trading an open-to-close strategy, including slippage, liquidity, and transaction costs, analyzing how well the sentiment score right before market open predicts open-to-close returns is a valuable sanity check to see if it would be useful in a larger factor model. The sentiment score for each day was calculated using the tweets from the previous market day's open until the current day's open:
S-Score = (#Bullish-#Bearish)/(#Bullish+#Bearish)
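The formula above can be sketched as a small Python function. The function and argument names are illustrative; the zero-tweet fallback follows the rule stated in the next paragraph, that days without tweets receive a score of 0.

```python
def s_score(n_bullish, n_bearish):
    """Raw sentiment score in [-1, 1] from labeled tweet counts.

    Counts are assumed to cover the window from the previous market
    day's open until the current day's open.
    """
    total = n_bullish + n_bearish
    if total == 0:
        return 0.0  # no labeled tweets in the window: neutral score
    return (n_bullish - n_bearish) / total

example = s_score(8, 2)  # 8 bullish, 2 bearish -> (8 - 2) / 10
```

A symbol with only bullish tweets scores 1, one with only bearish tweets scores -1, and an even split scores 0.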
This S-Score then needs to be normalized to find the significance of a specific day's sentiment with respect to the symbol's historic sentiment trend. To do this, a rolling z-score is applied to the series. By changing the length of the lookback window, the sensitivity can be adjusted. Additionally, since the data is quite sparse, days without any tweets for a symbol are given an S-Score of 0. At the market open each day, symbols with an S-Score above the positive threshold are entered long and symbols with an S-Score below the negative threshold are entered short. Equal dollar weight is applied to the long and short legs. These positions are assumed to be liquidated at the day's market close. The first test is on the universe of equities with previous day closing prices > $5. With a relatively small long-short portfolio of 250 stocks, its performance can be seen in Figure 6 below.
Figure 6: Price > $5 Universe Open to Close Cumulative Returns
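The normalization and entry rules described above might look like the following in plain Python. This is a sketch, not the backtest code: the post does not say whether the lookback window includes the current day or which standard deviation estimator is used, so the choices here (window excludes the current day, population standard deviation, equal weight within each leg) and the threshold values are all assumptions.

```python
from statistics import mean, pstdev

def rolling_zscore(series, window):
    """Z-score of each value against the preceding `window` values."""
    out = []
    for i, x in enumerate(series):
        hist = series[max(0, i - window):i]
        if len(hist) < window:
            out.append(None)  # not enough history yet
            continue
        sd = pstdev(hist)
        # A flat history (e.g. a run of zero-tweet days) has no spread.
        out.append(0.0 if sd == 0 else (x - mean(hist)) / sd)
    return out

def to_signal(z, long_threshold, short_threshold):
    """+1 = enter long, -1 = enter short, 0 = neutral basket."""
    if z is None:
        return 0
    if z > long_threshold:
        return 1
    if z < short_threshold:
        return -1
    return 0

def equal_dollar_weights(signals):
    """Give each leg half of gross exposure, split evenly within the leg."""
    longs = [s for s, v in signals.items() if v > 0]
    shorts = [s for s, v in signals.items() if v < 0]
    weights = {s: 0.5 / len(longs) for s in longs}
    weights.update({s: -0.5 / len(shorts) for s in shorts})
    return weights

z = rolling_zscore([0.0, 0.2, -0.2, 0.0, 1.0], window=4)
# Hypothetical, deliberately asymmetric thresholds.
today = to_signal(z[-1], long_threshold=2.0, short_threshold=-1.5)
```

A shorter `window` makes the z-score react faster to a change in a symbol's sentiment, at the cost of noisier signals.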
The thresholds were cherry-picked to show the potential of a 2.11 Sharpe ratio, but the results vary depending on the thresholds used. This sensitivity is likely due to the lack of tweet volume on most symbols. Also, the long and short thresholds are not equal, in an attempt to keep a roughly equal number of stocks in each leg. The neutral basket contains all of the stocks in the universe that do not have an S-Score extreme enough to generate a long or short signal. Using the same thresholds as above, the test was run on a liquidity universe, which is defined as the top quartile of stocks by 50-day Average Dollar Volume. As seen in Figure 7 below, the Sharpe drops to 1.24 but is still very encouraging.
Figure 7: Liquidity Universe Open to Close Cumulative Returns
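One way to build a liquidity universe like the one described above is sketched below. The data layout (per-symbol lists of daily closes and volumes) and the rounding of the quartile cutoff are assumptions made for illustration.

```python
def average_dollar_volume(closes, volumes, window=50):
    """Mean of price * volume over the most recent `window` days."""
    recent = list(zip(closes, volumes))[-window:]
    return sum(price * vol for price, vol in recent) / len(recent)

def liquidity_universe(adv_by_symbol, top_fraction=0.25):
    """Symbols in the top `top_fraction` by average dollar volume."""
    ranked = sorted(adv_by_symbol, key=adv_by_symbol.get, reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:n_keep])

adv = {"A": 100, "B": 400, "C": 50, "D": 300,
       "E": 10, "F": 20, "G": 30, "H": 40}
universe = liquidity_universe(adv)
```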
The sensitivity of these results needs to be further inspected by performing analysis on separate train and test sets, but I was very pleased with the returns that could potentially be generated from just labeled StockTwits data.
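For readers who want to reproduce this kind of comparison, a Sharpe ratio on daily open-to-close returns can be computed as sketched below. The post does not state its annualization convention or risk-free rate, so the 252 trading days and zero risk-free rate used here are assumptions.

```python
import math
from statistics import mean, stdev

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = stdev(daily_returns)  # sample standard deviation
    if sd == 0:
        return float("nan")  # undefined when returns never vary
    return mean(daily_returns) / sd * math.sqrt(periods_per_year)

sharpe = annualized_sharpe([0.01, -0.005, 0.007, 0.002])
```

Computing this separately on a train period and a held-out test period is one simple way to check how much of a result comes from threshold selection.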
In July, I began working for Social Market Analytics, the leading social media sentiment provider. Here at SMA, we run all the StockTwits tweets through our proprietary NLP engine to determine their sentiment scores. Using sentiment data from 9:10 EST, which looks at an exponentially weighted sentiment aggregation over the last 24 hours, the open-to-close simulation can be run on the price > $5 universe. Each stock is separated into its respective quintile based on its S-Score in relation to the universe's percentiles that day. A long-short portfolio is constructed in a similar fashion as previously, with long positions in the top quintile stocks and short positions in the bottom quintile stocks. In Figure 8 below you can see that the results are much better than when only using sentiment labeled data.
Figure 8: SMA Open to Close Cumulative Returns Using StockTwits Data
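The quintile construction described above can be illustrated with a rank-based sketch. It says nothing about SMA's proprietary scoring: the input is simply assumed to be one score per symbol for the day, and ties are broken by input order.

```python
def quintiles(scores):
    """Map each symbol to a quintile, 1 (lowest score) to 5 (highest)."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    n = len(ranked)
    return {sym: i * 5 // n + 1 for i, sym in enumerate(ranked)}

def long_short_legs(scores):
    """Long the top quintile, short the bottom quintile."""
    q = quintiles(scores)
    longs = sorted(s for s, k in q.items() if k == 5)
    shorts = sorted(s for s, k in q.items() if k == 1)
    return longs, shorts

day_scores = {f"S{i}": i / 10 for i in range(10)}  # made-up scores
longs, shorts = long_short_legs(day_scores)
```

Because every stock in the universe lands in some quintile, both legs are populated each day as long as the universe holds at least five names, which is consistent with the lower sensitivity to portfolio construction noted for this approach.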
The predictive power is there, as the long-short boasts an impressive 4.5 Sharpe ratio. Due to having more data, the results are much less sensitive to long-short portfolio construction. To avoid the high turnover of an open-to-close strategy, we have been exploring possible long-term strategies. Deutsche Bank's Quantitative Research Team recently released a paper about strategies that solely use our SMA data, which includes a longer-term strategy. Additionally, I've recently developed a strong weekly rebalance strategy that attempts to capture weekly sentiment momentum.
Though it is just the beginning, my dive into social media sentiment data and its application in finance over the course of my time consulting for QTS has been very insightful. It is arguable that by just using the labeled StockTwits tweets, we may be able to generate predictive signals, but by including all the tweets for sentiment analysis, a much stronger signal is found. If you have questions please contact me at coltonsmith321@gmail.com.
Colton Smith is a recent graduate of the University of Washington, where he majored in Industrial and Systems Engineering and minored in Applied Math. He now lives in Chicago and works for Social Market Analytics. He has a passion for data science and is excited about his developing quantitative finance career. LinkedIn: https://www.
===
Upcoming Workshops by Dr. Ernie Chan
September 11-15: City of London workshops
November 18 and Dec 2: Cryptocurrency Trading with Python
I will be moderating this online workshop for Nick Kirk, a noted cryptocurrency trader and fund manager, who taught this widely acclaimed course here and at CQF in London.