Bard College – Computer Science – Data Structures
In our first lab we will explore Java, the UNIX command line, and begin employing an empirical approach to algorithms that we will adopt throughout the course. This week, we will use these tools to explore something known as Benford’s law. You will write a lab report and create two Java classes:
Stats — A program that reports descriptive statistics of a set of numbers.
Benford — A program that displays only the first digit of a set of numbers.
These programs should be useful in analyzing and presenting the performance of various data structures throughout the semester.
Benford’s Law
Benford’s law says that the first digits of a collection of measurements are not uniform across 1,2,3,4,5,6,7,8,9 as one might expect. In fact, we can expect to see‘1’ not 1/9 of the time but almost 1/3 of the time as the most significant digit. Ideally, the distribution looks something like:
digit
frequency
1
30.1%
2
17.6%
3
12.5%
4
9.7%
5
7.9%
6
6.7%
7
5.8%
8
5.1%
9
4.6%
With the the mean digit being 3.44 and standard deviation being 2.46.
Create a program Benford that reads numbers from standard input and prints out only the first digit of those numbers. The String class has a method to access individual characters.
java -cp algs4.jar:. Benford < heights.txt
8
6
6
6
5
4
<SNIP>
7
7
Alternatively, you can also enter in a series of numbers, one at a time, from the terminal rather than using a file(typing control-d sends the end-of-file character in the terminal).
Histograms
You are provided class called Hist.java that creates a histogram. For example, the following example displays the 700,000 random numbres in r5.txt as a histogram with 11 bins.
java -cp algs4.jar:. Hist 11 2 13 < r5.txt
The histogram has 11 bins starting at 2.0−3.0 and ending with 12.0−13.0.
Benford’s Law
java -cp algs4.jar:. Benford < heights.txt
8
6
6
6
5
4
<SNIP>
7
7
Histograms
java -cp algs4.jar:. Hist 11 2 13 < r5.txt
2.000 - 3.000: 19525 *
3.000 - 4.000: 39090 ***
4.000 - 5.000: 58363 *****
5.000 - 6.000: 77738 ******
6.000 - 7.000: 96386 ********
7.000 - 8.000: 116633 *********