Monday, July 28, 2014

BenfordsLaw

Problem:

TopCoder Problem Statement - BenfordsLaw

Overview:

Determine if any numbers in a set of data do not conform to Benford's law. That is the number of itmes that start with a 1 should fall within an expected range. The same goes for numbers starting with 2, 3, ... 9.

Java Source:
01: /*
02: TopCoder
03: Single Round Match: 155
04: Division: 2
05: Level: 2
06: Points: 500
07: Description: http://community.topcoder.com/stat?c=problem_statement&pm=1348
08:  */
09:
10: public class BenfordsLaw {
11:
12:     /*
13:      * Given the number of transactions, returns an array of doubles that
14:      * represents the expected number of items for each starting digit.  The
15:      * index of the array corresponds to the starting digit.
16:      */
17:     private static double[] calculateExpectedArray(int numTransactions) {
18:
19:         /*
20:          * Make the array of length 10, and set the 0 position to 0.0.  0
21:          * will never be used, but this avoids having to adjust everything by
22:          * 1.
23:          */
24:         double[] exp = new double[10];
25:         exp[0] = 0.0;
26:
27:         /*
28:          * The probabibility for each position is given by the equation
29:          * log(base 10) of (1 + 1/D)
30:          * Multiply this by the number of tranactions to get the expected
31:          * number.
32:          */
33:         for (int i = 1; i < 10; i++) {
34:             exp[i] = (double) numTransactions *
35:                     Math.log10(1.0 + ((1.0 / (double) i)));
36:         }
37:
38:         return exp;
39:
40:     }
41:
42:     public int questionableDigit(int[] transactions, int threshold) {
43:
44:         int[] actual = new int[10];
45:         double[] expected = calculateExpectedArray(transactions.length);
46:
47:         // count the number of items that begin with each digit.
48:         for (int i : transactions) {
49:             actual[getFirstDigit(i)]++;
50:         }
51:
52:         // Look for digits that fall out of their expected range.
53:         for (int i = 1; i < 10; i++) {
54:             if (outOfRange((double) actual[i], expected[i], threshold)) {
55:                 return i;
56:             }
57:         }
58:
59:         return -1;
60:     }
61:
62:     // Returns the first digit of an integer.
63:     private int getFirstDigit(int i) {
64:
65:         while (i >= 10) {
66:             i /= 10;
67:         }
68:
69:         return i;
70:     }
71:
72:     // Returns true if the count is outside the expected range.
73:     private boolean outOfRange(double i, double e, int threshold) {
74:
75:         double lowerBound = e / (double) threshold;
76:         double upperBound = e * (double) threshold;
77:
78:         return ((i < lowerBound) || (i > upperBound));
79:     }
80: }
Notes:

There's just two parts to this solution, neigher of which is very challenging. First we need to determine the number of items we expect to begin with each digit using Benford's law. calculateExpectedArray() handles this part. Using the number of transactions, it calculates the number of items we should expect to see for each starting digit. These expected amounts are returned in an array where position 1 is the number we expect to see starting with the digit 1. position 2 is the number we expect to see starting with the digit 2, and so on up to 9. We should not get any items starting with 0, but we'll fill that in anyway just as a place holder to avoid having to shift all of our indexes by 1.

After determining the expected amounts, we count the number items that actually start with each digit. This is done by the loop starting at line 48. The method getFirstDigit() just divides by 10 until the number is less than 10, and then returns it. Another approach to getting the first digit might be converting the number to a String, and grabbing the character at position 0. It helps that all numbers are positive, but this wouldn't be too difficult to deal with were it not the case.

Finally, we check the amounts and determine if they fall between ((expected amount) / threshold) and ((expected amount) * threshold). The first number that falls outside this range is returned. If they all fall within the range, -1 is returned.

Just two things to be careful of. First, make sure that your equation for Benford's Law is correct - Line 34 (use log10 instead just log); and second make sure that you don't lose any precision when mixing doubles and integers.

No comments:

Post a Comment