Benford’s law – how to detect fraud through numbers

How to detect problems pertaining to randomness of numbers using Benford’s law. The theory can be used to detect fraud, evasion of rules etc.

Benford's law

Suppose you have some financial data, say all the vouchers paid by the company in a given month, and you want to run some tests to find anomalies. For example, are employees beating the approval process by entering $24.99 vouchers when the limit is $25? Is any fraud being committed? One way to check is to use Benford’s law.
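Before getting to Benford's law, the approval-limit question can be checked directly. The sketch below is only an illustration: the function name, the $1 band and the sample amounts are all made up here, not taken from any real data set.

```python
def share_just_below(amounts, limit=25.00, band=1.00):
    """Return the fraction of amounts that fall in [limit - band, limit)."""
    if not amounts:
        return 0.0
    near = [a for a in amounts if limit - band <= a < limit]
    return len(near) / len(amounts)

# Hypothetical voucher amounts, invented for illustration only.
vouchers = [24.99, 24.50, 12.10, 24.95, 8.75, 31.00, 24.99, 19.20]
print(f"{share_just_below(vouchers):.0%} of vouchers sit just under the limit")
```

A large share in that narrow band is not proof of anything, but it is a reasonable place to start asking questions.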

Benford’s law states that in a list of naturally generated numbers (for example stock prices or census figures), the probability of a number starting with 1 is 30.1%. The probability of a number starting with 2 is 17.6% and so on: it keeps decreasing as the leading digit increases, down to 4.6% for 9. The rationale is usually explained like this: it takes a 100% increase to take a number from 100 to 200, but only a 50% increase to go from 200 to 300. A 100% increase is harder to achieve (and thus less likely to happen) than a 50% increase, so numbers spend longer starting with 1 than starting with 2.

In this way, the probability of a number starting with digit d is given by log(1+1/d), with log to base 10. More information is available here. In real-world analysis it is usually extended to the first two digits.
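As a quick check on the formula, here is a minimal Python sketch that prints the expected share for each leading digit. The function names are my own, and the two-digit version assumes the same formula is applied to the leading pair 10 to 99.

```python
import math

def benford_first_digit(d):
    """Expected probability that the leading digit is d (1..9)."""
    return math.log10(1 + 1 / d)

def benford_first_two_digits(n):
    """Expected probability that the leading two digits are n (10..99)."""
    return math.log10(1 + 1 / n)

for d in range(1, 10):
    print(d, f"{benford_first_digit(d):.1%}")
```

The nine first-digit probabilities add up to 1, and so do the ninety two-digit ones, because the logarithms telescope.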

Download from here a spreadsheet (called Numeric Truth) to carry out this analysis for you. All you have to do is to paste your data into the green cells. After that, on the first sheet it will show the results of first digit analysis, and on the second sheet, two digit analysis. Have a look at the graph, the variances for the individual digits, and the total variance. That should give you a starting point for your analysis/audit.
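If you would rather script the first-digit test than use a spreadsheet, something along these lines should work. It is a sketch, not a copy of what the spreadsheet does: the function names are mine, the "difference" column is simply observed minus expected, and the powers of 2 are stand-in data known to follow Benford's law closely.

```python
import math
from collections import Counter

def leading_digit(x):
    """Return the first significant digit of a non-zero number."""
    # Scientific notation puts the first significant digit up front.
    return int(f"{abs(x):.15e}"[0])

def first_digit_report(values):
    """Return rows of (digit, observed share, expected share, difference)."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    rows = []
    for d in range(1, 10):
        observed = counts[d] / total
        expected = math.log10(1 + 1 / d)
        rows.append((d, observed, expected, observed - expected))
    return rows

# Stand-in data: the first 100 powers of 2.
sample = [2 ** k for k in range(1, 101)]
for digit, observed, expected, diff in first_digit_report(sample):
    print(digit, f"{observed:.3f}", f"{expected:.3f}", f"{diff:+.3f}")
```

Replace `sample` with your own list of amounts. Digits whose observed share is far from the expected one are the ones to look at first.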


Value of Pi


Wikipedia defines Pi or π as “a mathematical constant which represents the ratio of any circle’s circumference to its diameter in Euclidean geometry”. As a kid I used to be interested in the calculation of Pi. The value of Pi, to 100 decimal places, is:

3.1415 9265 3589 7932 3846 2643 3832 7950 2884 1971 6939 9375 1058 2097 4944 5923 0781 6406 2862 0899 8628 0348 2534 2117 0679

Value to 100 places

I calculated this using the Unix command bc. The command is based on an identity (a Machin-like formula, usually credited to Carl Størmer rather than Ramanujan) and is:

24*a(1/8)+8*a(1/57)+4*a(1/239)

where a stands for the arctangent function.

The formula is:

pi = 24*arctan(1/8)+8*arctan(1/57)+4*arctan(1/239)

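First, start bc with its math library using “bc -l” (the library is what provides the arctangent function a). Then set the number of decimal places using “scale=100”. After that, run the identity given above. The last couple of digits can be off because of truncation, so set the scale a little higher than the number of places you need.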
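For anyone without bc to hand, the same identity can be evaluated in Python with plain integers. This is a sketch under my own assumptions: the function names are invented, arctan(1/n) is summed from its Taylor series, and ten guard digits are carried so that truncation does not disturb the final places.

```python
def arctan_inverse(n, unity):
    """Return arctan(1/n) scaled by `unity`, using the Taylor series."""
    power = unity // n          # (1/n)^1, scaled
    total = power
    n_squared = n * n
    k = 3
    sign = -1
    while power:
        power //= n_squared     # next odd power of 1/n
        total += sign * (power // k)
        sign = -sign
        k += 2
    return total

def pi_digits(places, guard=10):
    """Return Pi as a string, truncated to `places` decimal places."""
    unity = 10 ** (places + guard)
    pi_scaled = (24 * arctan_inverse(8, unity)
                 + 8 * arctan_inverse(57, unity)
                 + 4 * arctan_inverse(239, unity))
    digits = str(pi_scaled // 10 ** guard)   # drop the guard digits
    return digits[0] + "." + digits[1:]

print(pi_digits(100))
```

Integer arithmetic is used because Python floats only carry about 16 significant digits, which is nowhere near enough here.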

To calculate Pi experimentally, there are two ways: one using a random number generator, and the other by physically measuring the circumference of a given circle.

Consider a square with sides of length one, centred on the origin, so that x and y each range from -1/2 to 1/2. Within it, draw a circle of diameter one with the same centre. If a point is taken at random within the square, the probability of that point also lying within the circle is the ratio of the two areas: Pi*(1/2)*(1/2) divided by 1, which is Pi/4. Now start taking points (x,y) at random and check whether x*x+y*y<=1/4 (if it is, the point lies within the circle). Keep a count of the total number of points taken (t) and the number that fell within the circle (c). Since c/t approaches Pi/4, Pi can be estimated as:

Pi = 4*c/t
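Here is a minimal Python sketch of the random-point method. The function name and the seed parameter are my own additions (the seed only makes a run repeatable), and the square is taken to be centred on the origin so that the x*x+y*y<=1/4 test applies. The estimate is four times the fraction of points that land inside the circle.

```python
import random

def estimate_pi(total_points, seed=None):
    """Estimate Pi from the share of random points that land in the circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(total_points):
        # Pick a point uniformly in the unit square centred on the origin.
        x = rng.uniform(-0.5, 0.5)
        y = rng.uniform(-0.5, 0.5)
        if x * x + y * y <= 0.25:
            inside += 1
    return 4 * inside / total_points

print(estimate_pi(1_000_000, seed=1))
```

Do not expect many correct digits: the error shrinks only with the square root of the number of points, so a hundred times more points buys roughly one extra digit.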


The other method is to take a cylindrical bottle or tin, measure its radius (r), and tie a thread around it to measure the circumference (C). Now Pi is C/(2r).

The following two sentences contain the value of Pi: the number of letters in each word gives the corresponding digit of Pi.

1. “May I have a large container of coffee.”

2. “How I want a drink, alcoholic of course, after the heavy chapters involving quantum mechanics.”

This makes the value of Pi easy to remember.
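It is easy to check a mnemonic like this by counting letters. A small sketch (the function name is mine; it ignores punctuation and assumes no word has more than nine letters):

```python
import re

def digits_from_sentence(sentence):
    """Map each word to its letter count and join the counts into a string."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return "".join(str(len(w)) for w in words)

print(digits_from_sentence("May I have a large container of coffee."))
print(digits_from_sentence(
    "How I want a drink, alcoholic of course, after the heavy "
    "chapters involving quantum mechanics."
))
```

The first sentence gives 31415926 and the second 314159265358979, matching the opening digits of Pi.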

Lastly, how useful are the digits of Pi as a source of random numbers? Not bad, according to a study: “while sequences of digits from pi are indeed an acceptable source of randomness – often an important factor in data encryption and in solving certain physics problems – pi’s digit string does not always produce randomness as effectively as manufactured generators do”.

Always wanting to do my own thing, I downloaded the value of Pi to one million places from a website, split it into (x,y,z) coordinates, each having 5 digit precision. Here is the graph that got generated:


A bell curve, but could have been better – seems to mimic the results from the study.

