ACAMS Today, March-May 2025

AFC CHALLENGES

Popper’s swans

To begin our discussion of false positives, let us start with where they originate. Automated transaction monitoring tools designed to identify suspicious activity will always create two types of errors. The first is called a Type I error or, more familiarly, a false positive. Calling a false positive a Type I error is terminology from a branch of statistics called hypothesis testing. The underlying premise of hypothesis testing is that observation is not proof. To prove something with observation alone requires observing such phenomena everywhere in all cases, which is impossible for humans or computers. With a single observation, however, humans can disprove a thing. As philosopher Karl Popper explained, observing white swans will not prove that all swans are white, but we can disprove that all swans are white with a single black swan.1

Hypothesis testing starts with a hypothesis, a theory, about the nature of a population. Like Popper’s white swans, this theory is one we are trying to disprove. This disprovable hypothesis is known as the “null hypothesis.” A successful hypothesis test is one in which we have correctly rejected the null hypothesis, as Popper’s black swan did.

We can think of transaction monitoring rules as a form of hypothesis testing. If I want to identify the crime of structuring cash deposits to avoid currency transaction reporting, my null hypothesis is that a client’s cash transactions are not suspicious. My sample is based on the typical behavior of clients attempting to structure. Such behavior might include clients who make multiple cash transactions, each under $10,000, aggregating to over $10,000 in a single day. When my transaction monitoring system generates an alert, that alert rejects the null hypothesis by saying, “This transaction is suspicious.” When my investigator works the alert and dispositions it as “not suspicious,” we conclude that the system has generated a Type I error, a false positive, because it incorrectly rejected the null hypothesis.
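The structuring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `CashDeposit` record, the `structuring_alerts` function and the sample data are hypothetical names introduced here, and the only figure taken from the text is the $10,000 threshold.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

CTR_THRESHOLD = 10_000  # dollar threshold described in the text


@dataclass(frozen=True)
class CashDeposit:  # hypothetical record layout, for illustration only
    client_id: str
    day: date
    amount: float


def structuring_alerts(deposits):
    """Return sorted (client_id, day) pairs where two or more cash deposits,
    each under the threshold, add up to more than the threshold on one day."""
    by_client_day = defaultdict(list)
    for d in deposits:
        by_client_day[(d.client_id, d.day)].append(d.amount)

    alerts = []
    for key, amounts in by_client_day.items():
        below = [a for a in amounts if a < CTR_THRESHOLD]
        if len(below) >= 2 and sum(below) > CTR_THRESHOLD:
            alerts.append(key)  # the rule rejects the null hypothesis here
    return sorted(alerts)


# Invented sample data: client A splits $11,500 across one day,
# client B deposits on two different days.
deposits = [
    CashDeposit("A", date(2025, 3, 3), 6_000),
    CashDeposit("A", date(2025, 3, 3), 5_500),
    CashDeposit("B", date(2025, 3, 3), 4_000),
    CashDeposit("B", date(2025, 3, 4), 7_000),
]
print(structuring_alerts(deposits))
```

Only client A alerts in this example. Whether that alert is a true positive or a Type I error is not something the rule can know; it is decided later, when the investigator dispositions the alert.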

Inverse errors

A benefit of hypothesis testing is that it allows conclusions to be drawn about a population without the entire population being observed. This process is called sampling and, while useful, has risk. How can we be sure that the sample adequately represents the population as a whole? Hypothesis testing accounts for this risk by assessing the likelihood that the sample is wrong. Part of the test’s design is determining how often it incorrectly rejects the null hypothesis, that is, how often it generates Type I errors. The rate of Type I errors is referred to as a test’s alpha (α). The relationship between the expected alpha and the actual alpha tells the researcher whether the sample is reliable. If the rate of false positives is no higher than the test’s expected alpha, the researcher can conclude that the sample used for the test is sound and its conclusions can be reasonably applied to the whole. The default alpha for a generic hypothesis test is 5%.

Most AFC professionals would be delighted to have this generic alpha for their transaction monitoring tests, as most anti-money laundering models routinely generate a 95% false positive rate.2 Why is this? The industry’s high tolerance for false positives is due to the existence of the other type of error made by these tests: Type II errors, also known as “false negatives.” A false negative is a transaction that the system should have alerted on but did not; the test failed to reject the null hypothesis when it should have.

The rate of Type II errors is known as a test’s beta (β). For a given test, there is an inverse relationship between its alpha and its beta. The more false positives a test generates, the less likely it will be to generate false negatives. So financial institutions (FIs) tolerate high alphas in practice because we, as an industry, have an extremely low tolerance for high betas. We do not want to miss suspicious activity.
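The trade-off between alpha and beta can be seen in a toy example. The sketch below assumes a rule that alerts whenever a risk score reaches a threshold, and it assumes we somehow know the ground truth for every item. The scores, labels and the `error_rates` function are all invented for illustration. Alpha is computed in the textbook sense, as the share of truly normal items that alert.

```python
def error_rates(scores, is_suspicious, threshold):
    """Return (alpha, beta) for a rule that alerts when score >= threshold.

    alpha: share of truly normal items that alert (Type I error rate).
    beta:  share of truly suspicious items that do not alert (Type II error rate).
    """
    pairs = list(zip(scores, is_suspicious))
    normal = [s for s, y in pairs if not y]
    suspicious = [s for s, y in pairs if y]
    alpha = sum(s >= threshold for s in normal) / len(normal)
    beta = sum(s < threshold for s in suspicious) / len(suspicious)
    return alpha, beta


# Made-up risk scores (0 to 100) with made-up ground truth.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95]
is_suspicious = [False, False, False, False, True, False, False, True, False, True]

for threshold in (25, 55, 85):
    alpha, beta = error_rates(scores, is_suspicious, threshold)
    print(f"threshold={threshold}: alpha={alpha:.2f}, beta={beta:.2f}")
```

As the threshold rises, alpha falls and beta rises: the rule bothers fewer normal customers but misses more of the suspicious ones. A low threshold does the opposite, which is the choice most institutions make in practice.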

Unfortunately for us, it is very easy to miss suspicious activity. The truth is that most bank transactions are normal; the null hypothesis is usually right. When the probability of something occurring is low, as is the case with one customer out of hundreds of thousands being a criminal, finding a low-probability event requires a sufficient sample size, which inevitably includes a lot of false positives.
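A short piece of arithmetic shows why rare events drag so many false positives along with them. Every input below is an assumption chosen for illustration (one million transactions, one in a thousand truly suspicious, a rule that alerts on 2% of normal activity and misses 10% of suspicious activity); none of it is an industry figure, and `alert_mix` is a hypothetical helper.

```python
def alert_mix(total, prevalence, alpha, beta):
    """Split expected alerts into true and false ones.

    prevalence: share of all transactions that are truly suspicious.
    alpha:      share of normal transactions that alert anyway.
    beta:       share of suspicious transactions that fail to alert.
    """
    suspicious = total * prevalence
    normal = total - suspicious
    true_alerts = suspicious * (1 - beta)
    false_alerts = normal * alpha
    false_share = false_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, false_share


# All four inputs are invented for illustration.
true_alerts, false_alerts, false_share = alert_mix(
    total=1_000_000, prevalence=0.001, alpha=0.02, beta=0.10
)
print(f"true alerts:  {true_alerts:,.0f}")
print(f"false alerts: {false_alerts:,.0f}")
print(f"share of alerts that are false positives: {false_share:.1%}")
```

Under these assumed inputs the rule produces about 900 true alerts and about 19,980 false ones, so roughly 96 of every 100 alerts are false positives even though only 2% of normal transactions alert. Note that this share of alerts is a different quantity from the textbook alpha, which is measured against all normal transactions.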

That said, banks are neither required nor expected to capture 100% of suspicious activity. This is good because our automated transaction monitoring systems never will. Indeed, the regulatory guidance on model risk management states explicitly that every model is flawed.3 “All models have some degree of uncertainty and inaccuracy because they are by definition imperfect representations of reality,” says the Federal Reserve in its model risk management guidance.4

Instead of perfection, your regulators will want to see that you know the ways in which your transaction monitoring systems are generating false positives and false negatives. Therefore, your goal as an end user should not be to eliminate false positives or false negatives because you never will. Instead, your goal must be to know how many false positives and false negatives your tools create and to ensure that your bank operates within the established tolerances for each type of error.
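One way to make "operating within tolerances" concrete is a simple periodic check. The sketch below is hypothetical throughout: the `ErrorTolerance` structure, the `check_tolerances` function and every number are invented. It also assumes the bank estimates false negatives by reviewing a sample of activity that did not alert, which is one possible approach rather than something the text above prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorTolerance:  # hypothetical; the limits are set by the institution
    max_false_positive_share: float  # share of reviewed alerts
    max_false_negative_share: float  # share of reviewed non-alerted items


def check_tolerances(alerts_reviewed, alerts_not_suspicious,
                     non_alerts_reviewed, non_alerts_suspicious, tolerance):
    """Compare observed error shares against the established tolerances."""
    fp_share = alerts_not_suspicious / alerts_reviewed
    fn_share = non_alerts_suspicious / non_alerts_reviewed
    return {
        "false_positive_share": fp_share,
        "false_negative_share": fn_share,
        "false_positives_within_tolerance": fp_share <= tolerance.max_false_positive_share,
        "false_negatives_within_tolerance": fn_share <= tolerance.max_false_negative_share,
    }


# Invented figures: 1,000 alerts worked, 950 closed as not suspicious;
# 2,000 non-alerted items sampled, 10 found to be suspicious.
tolerance = ErrorTolerance(max_false_positive_share=0.96, max_false_negative_share=0.01)
report = check_tolerances(1_000, 950, 2_000, 10, tolerance)
for name, value in report.items():
    print(f"{name}: {value}")
```

The point of such a report is not the particular limits, which each institution sets for itself, but that both error types are measured and compared against a stated tolerance rather than left unknown.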

The reality of AI in financial crimes: A human touch still matters

Are you familiar with the artificial intelligence (AI) hype cycle? Gartner refers to it as the progression and excitement
