How Clade will use doctoral theses to build the platform
Doctoral theses are often the starting point for turning an idea into reality, and while building out our platform Clade has been studying many such sources to understand which types of algorithms will work best.
Going back to 2007, L. Fang and J. Peress wrote a thesis titled “Media Coverage and the Cross-Section of Stock Returns”.
Like many theses, it builds on earlier research, in this instance work from 1987. The idea that the media can influence how a company's performance is reflected in its stock price is therefore not new, and more than likely goes back to the very first financial publications, such as the Wall Street Journal.
At Clade we’re not relying on a single doctoral paper to shape our algorithms; we’re using as many as we can find. Whenever we find useful information in a thesis, we add it to our refinement model.
For example, in “USING MACHINE LEARNING AND ALTERNATIVE DATA TO PREDICT MOVEMENTS IN MARKET RISK”* we can see three findings, all related to Twitter:
- Twitter sentiment can be used to predict the price trend of GOOGL, AAPL and FB. In addition, tweet volume has a strong impact on both price and trend. (2016)
- Daily bullish percentage extracted from Twitter helps explain excess returns even when the traditional factors used in asset pricing models are considered. (2017)
- Volatility sentiment on social media contains information regarding future stock volatility. (2017)
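To make the first two findings more concrete, the sketch below turns labelled tweets into the two daily features they describe: tweet volume and bullish percentage. It assumes each tweet has already been tagged "bullish", "bearish" or "neutral" (for instance by a sentiment model). The function name and the definition of bullish percentage as bullish / (bullish + bearish) are our own assumptions, not taken from the cited papers.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import date


def daily_twitter_features(tweets: list[tuple[date, str]]) -> dict[date, tuple[int, float | None]]:
    """Map each day to (tweet volume, bullish percentage).

    Hypothetical helper. Volume counts every tweet that day; bullish
    percentage is bullish / (bullish + bearish), or None when the day has
    no bullish or bearish tweets at all.
    """
    per_day: dict[date, Counter] = defaultdict(Counter)
    for day, label in tweets:
        per_day[day][label] += 1

    features = {}
    for day in sorted(per_day):
        counts = per_day[day]
        volume = sum(counts.values())
        opinionated = counts["bullish"] + counts["bearish"]  # neutral tweets excluded
        bullish_pct = counts["bullish"] / opinionated if opinionated else None
        features[day] = (volume, bullish_pct)
    return features


# Hypothetical example data.
tweets = [
    (date(2017, 3, 1), "bullish"),
    (date(2017, 3, 1), "bullish"),
    (date(2017, 3, 1), "bearish"),
    (date(2017, 3, 1), "neutral"),
    (date(2017, 3, 2), "neutral"),
]
features = daily_twitter_features(tweets)
```

Keeping neutral tweets in the volume but out of the bullish percentage is one design choice among several; a model could just as easily use all three counts as separate inputs.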
These findings clearly suggest that social media such as Twitter, much like finance journals and newspapers before it, can still affect how a company is viewed and hence its stock price.
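The same thesis also uses the formula for the CBOE Volatility Index (VIX), which measures the market's expectation of 30-day volatility of the S&P 500 and is derived from S&P 500 index option prices.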
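For reference, CBOE's published VIX methodology computes the index in two steps. This is CBOE's own formulation; the notation used in the thesis may differ. First, a variance is calculated for each of two option expirations (near-term and next-term); then the two are interpolated to a constant 30-day horizon:

```latex
\sigma^2 = \frac{2}{T} \sum_i \frac{\Delta K_i}{K_i^2}\, e^{RT}\, Q(K_i) - \frac{1}{T}\left[\frac{F}{K_0} - 1\right]^2

\mathrm{VIX} = 100 \times \sqrt{\left\{ T_1 \sigma_1^2 \left[\frac{N_{T_2} - N_{30}}{N_{T_2} - N_{T_1}}\right] + T_2 \sigma_2^2 \left[\frac{N_{30} - N_{T_1}}{N_{T_2} - N_{T_1}}\right] \right\} \times \frac{N_{365}}{N_{30}}}
```

Here T is the time to expiration in years, F the forward index level implied by option prices, K_0 the first strike at or below F, K_i the strike of the i-th out-of-the-money option, ΔK_i half the distance between the strikes on either side of K_i, R the risk-free rate to expiration, and Q(K_i) the midpoint of the bid-ask spread for the option with strike K_i. σ₁² and σ₂² are the near-term and next-term variances, N_{T_1} and N_{T_2} the minutes to their expirations, and N_30 and N_365 the number of minutes in 30 and 365 days.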
The research paper applied the VIX formula to one year of pricing data for Apple (AAPL) and mapped the outputs against related publications. The team observed that the quality, relevance and timing of the data fed into the equation all affected how the prediction could be evaluated.
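Timing matters because an article published after the market closes can only influence the next session's price. As a minimal sketch, assuming naive timestamps in New York exchange time, a 16:00 close and no holiday calendar (weekends only), each article can be assigned to the first trading day whose close could reflect it, and then paired with that day's return. All function names and the example returns are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Assumption: timestamps are naive datetimes in exchange local (New York) time
# and the regular session closes at 16:00. Market holidays are ignored.
MARKET_CLOSE = time(16, 0)


def next_weekday(d: date) -> date:
    """Return the next Monday-to-Friday date after d."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


def trading_day_for(published: datetime) -> date:
    """First trading day whose closing price could reflect this article."""
    d = published.date()
    if d.weekday() >= 5 or published.time() >= MARKET_CLOSE:
        return next_weekday(d)
    return d


def align_news_to_returns(articles, daily_returns):
    """Pair each trading day's return with the headlines assigned to it.

    articles: iterable of (published datetime, headline)
    daily_returns: dict mapping trading date -> close-to-close return
    """
    grouped = {}
    for published, headline in articles:
        grouped.setdefault(trading_day_for(published), []).append(headline)
    return {
        day: (daily_returns[day], headlines)
        for day, headlines in sorted(grouped.items())
        if day in daily_returns
    }


# Hypothetical example: 1 March 2017 was a Wednesday, 4 March a Saturday.
articles = [
    (datetime(2017, 3, 1, 10, 30), "A"),  # during the session -> same day
    (datetime(2017, 3, 1, 16, 5), "B"),   # after the close -> Thursday
    (datetime(2017, 3, 3, 18, 0), "C"),   # Friday evening -> Monday
    (datetime(2017, 3, 4, 9, 0), "D"),    # Saturday -> Monday
]
daily_returns = {date(2017, 3, 1): 0.01, date(2017, 3, 2): -0.005, date(2017, 3, 6): 0.002}
aligned = align_news_to_returns(articles, daily_returns)
```

A production version would swap the weekday rule for a real exchange calendar, but the principle of never letting a headline "see" a price it could not have influenced stays the same.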
So how will Clade approach a similar set of issues? Clade’s algorithms will not be one super-algorithm but many.
In the example above, Clade will be able to refine the datasets first, before we tackle the VIX vs. machine-learned prediction model. For Clade’s data points, we have deliberately widened our sources to include multiple highly accurate news APIs that are searchable by keyword. We’re also using social media, including Twitter as referenced in the paper, but here too we filter the data using supervised learning techniques.
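The post does not describe Clade's actual filtering models, so the following is only a minimal sketch of the general idea, assuming a small hand-labelled set of headlines. A two-stage filter first applies a keyword match (similar to a keyword search on a news API) and then a supervised classifier decides whether each remaining item is relevant. We use multinomial Naive Bayes, written from scratch so the example has no dependencies; the training headlines, labels, keyword set and class names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; "$AAPL" becomes "aapl"."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_match(text, keywords):
    """First pass: keep only items that mention at least one keyword."""
    return any(tok in keywords for tok in tokenize(text))


class NaiveBayesFilter:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of log likelihoods of the known tokens.
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (self.totals[c] + len(self.vocab)))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical hand-labelled training headlines.
TRAIN_TEXTS = [
    "Apple beats earnings expectations, shares jump",
    "AAPL guidance raised after strong iPhone sales",
    "Analysts cut Apple price target on weak demand",
    "Win a free apple pie recipe giveaway",
    "My apple tree finally has fruit this year",
    "Best apple crumble recipe for autumn",
]
TRAIN_LABELS = ["relevant"] * 3 + ["irrelevant"] * 3
KEYWORDS = {"apple", "aapl"}

clf = NaiveBayesFilter().fit(TRAIN_TEXTS, TRAIN_LABELS)
incoming = [
    "Apple shares fall after earnings miss",
    "Easy apple pie recipe",
    "Oil prices rise on supply fears",
]
kept = [h for h in incoming if keyword_match(h, KEYWORDS) and clf.predict(h) == "relevant"]
```

In this toy run only the first headline survives: the second passes the keyword search but is classified as irrelevant, and the third never matches a keyword. The keyword stage keeps the classifier's workload small, while the classifier removes the false positives that a plain keyword search lets through.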
In a later post, we will go into much more detail on the Python code and the APIs we’re building.
*by T. Dierckx, J. Davis & W. Schoutens (https://arxiv.org/pdf/2009.07947v1.pdf)