A document is a representation of thought that is written, drawn, presented, or memorialized. Over the years, documents have evolved as a means to structure a finite set of information to be shared or stored. In asset and investment management, documents are the code of life: they are central to every transaction.
The proliferation of documents started with snail mail, then faxes and emails, and more recently virtual data rooms and cloud-based drives. Each step made sharing easier and, as a result, accelerated the velocity of document flow. At the same time, documents increasingly took the form of PDFs to prevent tampering. The result is that investment professionals have, in theory, the information they need, but they lack the bandwidth to extract, synthesize and act on it. Today a seemingly infinite amount of information is shared across a seemingly infinite number of documents throughout the industry. More often than not, because the information cannot be readily extracted, a document becomes a source of information loss. ‘Water, water everywhere, but not a drop to drink!’
So much time is spent on items 1-3 above that little time remains to distill the message inside a document. Making this information available for broader use becomes a copy-and-paste marathon.
Most legacy industry solutions have addressed centralization and metadata management. Newer entrants are taking it a notch further by focusing on improved user experience and on SaaS offerings priced to enable collaboration.
The next stage of transformation that the industry desperately needs is a focus on extracting relevant data from these documents.
Whether documents are transactional or informational in nature, the basic construction of most of them has not changed in a long time. Furthermore, many of them were not designed with end users in mind. Now it is time to disrupt these documents!
1. Address the source itself by eliminating the dependency on documents for areas that can be de-containerized. This is quite difficult to achieve, however, as it requires disrupting an existing ecosystem and changing processes and behaviors that come naturally to many people. This scenario requires retracing a few steps in the flowchart at the beginning of this blog and adopting a platform approach. One successful example of such a disruption is the way DocuSign created a new digital signing experience while maintaining the integrity of contract execution.
2. Leverage natural language processing (NLP) at the destination. This is technically complex but still easier to implement than #1. An added advantage is that AI and ML carry a buzz that makes them easier to sell. We would argue that once the user base is exposed to a digital data repository, it will be easier to convince users to change the way documents are used (#1).
Let's discuss the top three NLP applications for the industry:
Topic modeling: What topic is being addressed? How should the document be categorized? A few vendors already offer this.
Human-aided data extraction: Only a small fraction of a document's content is relevant data. For example, a typical one-page tear sheet comes with two pages of disclosures and contains five charts, a table, and investment commentary. Even within the charts and tables, often only a single data point is relevant for building a time series that would be meaningful to the investor, and there are techniques to extract it.
Clustering and classifying information: identifying areas of similarity, creating relationships between data points, and separating the important from the mundane.
Most firms are focused on streamlining document storage, centralization and collaboration. A few are beginning to think about creating a data repository. They seek to make use of data that:
Technology is key to this evolution, as its capacity to analyze large quantities of data exceeds that of humans. Leveraging it promises a momentous improvement in investors' ability to make investment judgments… much like what we saw in the recent Google Duplex teaser.
Recent transactions such as Nasdaq’s acquisition of eVestment and Markit’s proposed acquisition of Ipreo indicate that the market’s perceived value of firms that centralize and house data is increasing. We’d venture that DocuSign is poised to disrupt the segment for contract and legal documents, as it facilitates a large volume of document flow across a broad network. Microsoft is another potential disruptor, as it has a large installed user base and the necessary technology. Within the industry, virtual data rooms (VDRs) have meaningful mindshare and are likely to partner with data-centric firms to build dominant business models. Or will an underdog rise, defying all expectations? We’d certainly root for one :)