Holistic Detection – Fantasy or Reality?
The gains from applying various forms of artificial intelligence in the financial services sector are already making a difference. The challenge practitioners are now putting to technologists is the next dimension: using these advances to achieve holistic analytics that combine a mixture of data types, both structured and unstructured.
Machine learning is delivering huge benefits in false positive reduction compared with traditional approaches. Natural Language Processing (NLP) is helping to track down the more obvious forms of market abuse and misconduct found in most capital-markets-focused institutions. Truly holistic monitoring is surveillance nirvana, but who is actually doing it?
Radar took time out to look at a case study of a highly reputable asset manager that has solved the problem by working closely with two data scientists and its account manager at Behavox. This client is extremely conservative and careful: it holds itself to the highest standards of regulatory compliance globally and challenges itself to meet them. It has an impeccable regulatory record.
The client's primary focus was to identify potential insider trading by the manager's employees, the passing and use of material nonpublic information (MNPI), and cyber risk arising from sensitive proprietary information being sent outside the institution. In particular, it needed to monitor the use of, and relationships with, external expert networks, which are crucial to position management and trading decisions within the core portfolios.
The original approach to compliance was relatively simple: lexicons, many manual hours from the compliance team, and some spreadsheets. The manager processed ecommunications (ecomms) alerts generated by its automated solution from a legacy provider, then cross-referenced these against records in three other systems of record/control covering trades, personal account transactions, and greylists combined with Chinese wall data. The process demanded an extensive commitment of the compliance team's hours.
The proof of concept (PoC) started in the first quarter of 2018 and took three months to complete. The client specified the alerts it wanted to see and worked with two data scientists from Behavox to build the logic behind them and test it thoroughly on the PoC data set. Nine out-of-the-box scenarios were calibrated to the client, and three new custom scenarios were developed. Value was seen immediately in the quality of the alerts and the marked reduction in false positives.
The scenario analysts then worked with the compliance team to fine-tune the scenarios and improve their accuracy. Three months later, the client approved the move to production for all 12 scenarios. The goal of reducing three sets of siloed alerts to just one was close to being achieved.
Challenges for this project
Without doubt, the biggest challenge was creating and populating the data fields for ingestion that could capture trade IDs and transaction information. Significant data mapping was also required for the personal transaction data, the greylist data and the Chinese wall status. Related to all this was how frequently the underlying data was updated, and how to connect it to the data used in the scenarios. All of this made the ingestion schema more complex. The expert network data also took time to standardise and understand. Solving the challenge was made significantly easier by the client's cooperative approach: it explained its own process and learned about the technology's limitations, producing a highly collaborative outcome.
Key facts in this project
- 127 pages of documentation were created for the client, explaining each scenario and every script it contains in detail. The analysts walked through each line of code with the client to make sure it understood the code and agreed with its aim.
- The client's compliance and cyber teams were conversant with the NLP capability of the platform.
- The cyber team were especially good to work with, as they already understood how to build models, and this combined well with the behavioral science team at Behavox.
- The client enabled the project's success by making suggestions, testing the new approach, explaining its process, reviewing results, labelling data, and giving the vendor the time and opportunity to investigate and re-run the logic to achieve the best results.
Three custom scenarios
1. Greylist + Chinese wall list + comms.
- Flags communication between persons possessing MNPI about a company and public-side persons with no restrictions on that company, where the communication mentions the restricted company or its tickers.
- Identifies inappropriate communication, using a relationship map, participant count and a list of financial entities to narrow down the context.
- Identifies tickers and companies in the communication. NLP does this; it is generally very accurate with tickers, but some company names are more complicated.
- Identifies company aliases and short names using a catalog.
2. Personal trades + Chinese wall list + greylist + comms.
- Flags personal trades executed by persons who are allowed to trade an asset, if the trades follow a communication that mentions the asset or a linked company and was sent by a person who possesses MNPI about it.
- Requires the same identification steps as above, plus linking transactions to securities.
- For production, the client provided unique company IDs by which transactions and greylist entries can be linked.
3. Expert networks: comms + expert network meeting logs + personal trades.
- Flags personal trades executed shortly after a meeting with an expert representing a linked company, or where the ticker was given as the meeting topic.
- Flags communication sent by staff who took part in a meeting that mentions the expert, a company the expert represents, or a linked ticker.
- Flags direct communication with the expert, or with a contact that closely resembles the expert.
- Computationally heavy, because a large volume of unfiltered communication must be analysed.
- Identifying inappropriate communication requires the relationship map and participant count, plus something to narrow down the context if the results are not manageable.
- Identifies the company (recorded in free form in the logs) and linked tickers using a catalog.
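At their core, the three custom scenarios are joins across communications, reference lists and trade records. The Python sketch below illustrates that logic under stated assumptions; it is not Behavox's implementation. The catalog contents, the `walled` mapping (person to the set of company IDs on which they hold MNPI, standing in for the combined greylist and Chinese wall data), the dataclasses, and the 30-day and 5-day windows are all hypothetical, and the whole-word catalog lookup is a simple stand-in for the platform's NLP entity recognition. Only the trade-related part of scenario 3 is shown.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical catalog: lower-cased company names, aliases and tickers -> company ID.
CATALOG = {"acme corporation": "C1", "acme": "C1", "acm": "C1", "globex": "C2"}


@dataclass
class Message:
    sender: str
    recipients: list[str]
    sent_at: datetime
    text: str


@dataclass(frozen=True)
class Trade:
    trader: str
    company_id: str
    executed_at: datetime


@dataclass(frozen=True)
class Meeting:
    attendee: str
    topic: str  # free-form expert network log entry
    held_at: datetime


def companies_mentioned(text: str, catalog: dict[str, str]) -> set[str]:
    """Resolve company IDs mentioned in free text by whole-word catalog lookup."""
    lowered = text.lower()
    return {cid for alias, cid in catalog.items()
            if re.search(r"\b" + re.escape(alias) + r"\b", lowered)}


def scenario1(msg: Message, walled: dict[str, set[str]],
              catalog: dict[str, str]) -> set[tuple[str, str]]:
    """Scenario 1: (recipient, company ID) pairs where the sender holds MNPI on a
    mentioned company and the recipient has no restriction on it (public side)."""
    hits = walled.get(msg.sender, set()) & companies_mentioned(msg.text, catalog)
    return {(r, cid) for cid in hits for r in msg.recipients
            if cid not in walled.get(r, set())}


def scenario2(msgs: list[Message], trades: list[Trade],
              walled: dict[str, set[str]], catalog: dict[str, str],
              window: timedelta = timedelta(days=30)) -> set[Trade]:
    """Scenario 2: personal trades by an unrestricted recipient of a scenario 1
    message, in the mentioned company, within `window` after the message."""
    flagged: set[Trade] = set()
    for msg in msgs:
        for recipient, cid in scenario1(msg, walled, catalog):
            flagged |= {t for t in trades
                        if t.trader == recipient and t.company_id == cid
                        and msg.sent_at <= t.executed_at <= msg.sent_at + window}
    return flagged


def scenario3(meetings: list[Meeting], trades: list[Trade],
              catalog: dict[str, str],
              window: timedelta = timedelta(days=5)) -> set[Trade]:
    """Scenario 3 (trade part only): personal trades by a meeting attendee in a
    company resolved from the free-form topic, within `window` after the meeting."""
    flagged: set[Trade] = set()
    for m in meetings:
        linked = companies_mentioned(m.topic, catalog)
        flagged |= {t for t in trades
                    if t.trader == m.attendee and t.company_id in linked
                    and m.held_at <= t.executed_at <= m.held_at + window}
    return flagged


if __name__ == "__main__":
    day = datetime(2018, 3, 1)
    walled = {"alice": {"C1"}}  # hypothetical: alice is over the wall on C1
    msg = Message("alice", ["bob"], day, "Keep an eye on ACM this week")
    trade = Trade("bob", "C1", day + timedelta(days=2))
    print(scenario1(msg, walled, CATALOG))  # {('bob', 'C1')}
    print(scenario2([msg], [trade], walled, CATALOG) == {trade})  # True
```

Keying everything on a single company ID reflects the unique company IDs the client supplied for production, which let transactions and greylist entries be linked; any real time windows would need calibrating against the client's own data.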
Behavioral models employed
The client wanted a custom set of models tuned to the individual behaviors of more than 2,600 employees. Not a quick fix, but possible! The scenario analyst faced the challenge of keeping the models in production up to date. Groovy scripts were used, and the analyst managed to get the daily script execution time down to about one hour, which was impressive. The main model features measured: the total size of files attached to all emails sent externally by monitored employees; any abnormal weekly increase in the amount of content sent with attachments by each individual; and any abnormal daily increase in the sum of attachment sizes for each individual.
The analyst had to exclude large signature files from the sizing metric to make the models more precise. Another technical problem concerned processing and the real-time capability of the scenarios, whose timing had to be set precisely to push alerts through as soon as possible while allowing for processing delays.
Once the models were tested and refined, the client started to see immediate results in the form of true positives, and a huge reduction in false positives. Deterministic logic makes the outcome very precise: if the rules are strictly specified, the machine applies them exactly as specified, every time.
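The attachment features described above can be sketched as per-employee aggregates compared against each employee's own history. The client's models were written as Groovy scripts; this Python version is only an illustration. The signature-file heuristic (inline images named like `image001.png`), the field names, the use of "external emails carrying a non-signature attachment" as a proxy for weekly content with attachments, and the three-standard-deviation rule are all hypothetical choices, not the client's calibration.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev

# Hypothetical signature heuristic: inline images named like image001.png,
# excluded regardless of size because signature files can be large.
SIGNATURE_NAME = re.compile(r"^image\d{3}\.(png|jpe?g|gif)$", re.IGNORECASE)


@dataclass
class Attachment:
    filename: str
    size_bytes: int


@dataclass
class Email:
    sender: str
    sent_on: date
    external: bool
    attachments: list[Attachment]


def is_signature(a: Attachment) -> bool:
    return bool(SIGNATURE_NAME.match(a.filename))


def daily_attachment_bytes(emails: list[Email]) -> dict[tuple[str, date], int]:
    """Sum of non-signature attachment sizes per sender per day (external mail only)."""
    totals: dict[tuple[str, date], int] = defaultdict(int)
    for e in emails:
        if e.external:
            totals[(e.sender, e.sent_on)] += sum(
                a.size_bytes for a in e.attachments if not is_signature(a))
    return dict(totals)


def weekly_attachment_emails(emails: list[Email]) -> dict[tuple[str, tuple[int, int]], int]:
    """Count of external emails per sender per ISO week carrying a real attachment."""
    counts: dict[tuple[str, tuple[int, int]], int] = defaultdict(int)
    for e in emails:
        if e.external and any(not is_signature(a) for a in e.attachments):
            year, week, _ = e.sent_on.isocalendar()
            counts[(e.sender, (year, week))] += 1
    return dict(counts)


def is_abnormal(history: list[float], current: float, z: float = 3.0) -> bool:
    """True if `current` exceeds the person's own baseline mean by more than
    `z` population standard deviations (a hypothetical z-score rule)."""
    if len(history) < 2:
        return False  # too little history to form a baseline
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return current > mu  # flat history: any increase stands out
    return (current - mu) / sigma > z
```

Comparing each person against their own baseline, rather than a firm-wide threshold, is what tuning "to the individual behaviors" of each employee would mean in this sketch; excluding signature files first keeps their size from inflating every baseline.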
Boxes ticked in this project
- Speed to production – just six months from a seemingly unsolvable problem to a big success.
- Solution combined out-of-the-box and custom scenarios for a comprehensive outcome.
- Savings – one better quality set of alerts from three previously monitored in silo.
- No data scientists required/employed at the client (only compliance/cyber teams involved).
- Holistic surveillance achieved combining structured and unstructured data.
- Not a black box – easy to explain the logic and approach to the third line, auditors and regulators.