Lab 1: Benford’s Law
Bard College – Computer Science – Data Structures

In our first lab we will explore Java, the UNIX command line, and begin employing an empirical approach to algorithms that we will adopt throughout the course. This week, we will use these tools to explore something known as Benford’s law.  You will write a lab report and create two Java classes:

  1. Stats — A program that reports descriptive statistics of a set of numbers.
  1. Benford — A program that displays only the first digit of a set of numbers.

These programs should be useful in analyzing and presenting the performance of various data structures throughout the semester.

Benford’s Law

Benford’s law says that the first digits of a collection of measurements are not uniform across 1,2,3,4,5,6,7,8,91,2,3,4,5,6,7,8,9 as one might expect. In fact, we can expect to see ‘1’ not 1/9 of the time but almost 1/3 of the time as the most significant digit.  Ideally, the distribution looks something like:
digit
frequency
1
30.1%
2
17.6%
3
12.5%
4
9.7%
5
7.9%
6
6.7%
7
5.8%
8
5.1%
9
4.6%
With the the mean digit being 3.443.44 and standard deviation being 2.462.46.

Create a program Benford that reads numbers from standard input and prints out only the first digit of those numbers. The String class has a method to access individual characters.

java -cp algs4.jar:. Benford < heights.txt
8
6
6
6
5
4
<SNIP>
7
7

Alternatively, you can also enter in a series of numbers, one at a time, from the terminal rather than using a file (typing control-d sends the end-of-file character in the terminal). 

You can direct the output of a program to another text file using the greater-than sign.

 java -cp algs4.jar:. Benford < heights.txt > firstdigits.txt

Histograms

You are provided class called Hist.java  that creates a histogram. For example, the following example displays the 700,000700,000 random numbers in r5.txt as a histogram with 1111 bins.

java -cp algs4.jar:. Hist 11 2 13 < r5.txt

The histogram has 1111 bins starting at 2.03.02.0-3.0 and ending with 12.013.012.0-13.0

2.000         -         3.000:     19525 *
3.000         -         4.000:     39090 ***