There are no shortcuts to constructing thematic indices
Chapter 1: Index construction
of least resistance by curve-fitting the index rules to make the process easier and more operationally efficient. This can be achieved through the use of relatively standardised sector and industry classifications, the results of a natural language processing (NLP) process, or even a combination of both. While expeditious, these processes alone will not help investors gain exposures that truly align with their investment theses.
Alas, there are no shortcuts: a rigorous research layer in the index-development process is essential, and standard classifications and NLP are simply tools at the index providers’ disposal.
Taking a closer look at one relatively straightforward index, the BlueStar Digital Advertising index, could be useful. This index is meant to help investors achieve targeted exposure to companies operating in the digital advertising and marketing industry by including only those companies that derive at least 50% of their revenue from activities such as digital advertising exchanges, digital advertising-related data and analytics, and digital advertising content production and distribution.
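The 50% revenue rule described above can be sketched in a few lines of Python. This is only an illustration, not the index provider's methodology: the segment labels, the `is_pure_play` function and the sample revenue figures are all hypothetical, and in practice the revenue split would have to be established through manual research.

```python
from typing import Mapping

# Hypothetical segment labels, loosely based on the activities named in the text.
QUALIFYING_SEGMENTS = {
    "digital_ad_exchanges",
    "ad_data_and_analytics",
    "ad_content_production_and_distribution",
}


def is_pure_play(revenue_by_segment: Mapping[str, float],
                 threshold: float = 0.5) -> bool:
    """Return True if qualifying segments make up at least `threshold` of revenue."""
    total = sum(revenue_by_segment.values())
    if total <= 0:
        # No usable revenue data: treat the company as not qualifying.
        return False
    qualifying = sum(
        amount
        for segment, amount in revenue_by_segment.items()
        if segment in QUALIFYING_SEGMENTS
    )
    return qualifying / total >= threshold


# Made-up example: 60 of 100 revenue units come from a qualifying activity.
print(is_pure_play({"digital_ad_exchanges": 60.0, "hardware": 40.0}))
```

The threshold is kept as a parameter so that the same check could be reused for a rule with a different cut-off.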
Deep research is key
From a starting point of nearly 34,000 unique companies with a market cap of more than $50m, only 58 (0.17%) meet the pure-play rules of the index. Even by casting a wide net, we can quickly eliminate 98.6% of the starting universe: only 466 companies are part of related standard industry classifications or produce a keyword hit from an NLP process. With a significantly smaller set of companies, a thorough research process is more feasible. Using information and data from publicly available sources such as annual and quarterly filings, investor presentations and company websites, we can manually verify that 58 companies meet the rules of the index and 408 do not.
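The funnel arithmetic quoted above can be checked with a short Python sketch. It assumes the "nearly 34,000" starting universe is exactly 34,000; the `Company` record, the `broad_screen` helper and the sample classification codes are hypothetical and only illustrate the "classification or keyword hit" wide net.

```python
from dataclasses import dataclass

UNIVERSE = 34_000   # "nearly 34,000" in the text; assumed exact for this check
CANDIDATES = 466    # classification match or NLP keyword hit
VERIFIED = 58       # confirmed pure plays after manual research


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


eliminated_pct = share(UNIVERSE - CANDIDATES, UNIVERSE)
pure_play_pct = share(VERIFIED, UNIVERSE)
rejected = CANDIDATES - VERIFIED

print(round(eliminated_pct, 1))  # 98.6
print(round(pure_play_pct, 2))   # 0.17
print(rejected)                  # 408


@dataclass
class Company:
    name: str
    industry_code: str   # hypothetical classification code
    keyword_hit: bool    # hypothetical flag set by an NLP keyword search


def broad_screen(companies: list[Company], related_codes: set[str]) -> list[Company]:
    """Wide net: keep a company if either test flags it, for later manual review."""
    return [
        c for c in companies
        if c.industry_code in related_codes or c.keyword_hit
    ]
```

The wide net only shortlists candidates. In the example from the text, roughly seven out of eight shortlisted companies (408 of 466) still fail the index rules once they are researched manually.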
While this process appears straightforward, it is daunting and labour-intensive to do correctly for one index, let alone