Lab 1: Subtle Shuffling and Statistics
Bard College – Computer Science – Data Structures

In our first lab we will explore Java, the UNIX command line, the textbook’s standard library, and begin employing an empirical approach to algorithms that we will adopt throughout the course. Specifically, we will be looking at two common computations: shuffling an array and calculating the standard deviation; both are subtler than you might think.  

You will create two Java classes:

  1. Stats — A program that will report descriptive statistics of a set of numbers. 
  1. Hist — A program that displays a histogram of a set of numbers.

Both of these programs should be useful in analyzing and presenting the performance of various data structures over the semester. 

Shuffling

Shuffling a set of objects is a common task, for example, employed in online poker. The textbook provides a shuffle method (often referred to as the Knuth of Fisher-Yates shuffle) on page 32: 

    public static void shuffle(double[] a) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            int r = i + uniform(n-i);     // between i and n-1
            double temp = a[i];
            a[i] = a[r];
            a[r] = temp;
        }
    }

Let’s compare that method with this simpler version:

    public static void shuffle(double[] a) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            int r = uniform(n);     // between 0 and n-1
            double temp = a[i];
            a[i] = a[r];
            a[r] = temp;
        }
    }

The ShuffleTest class you are provided runs an experiment you can use to compare those two methods.  Read, understand and comment ShuffleTest .

Statistics

Given a set of data [x1,x2,...,xn][x_1, x_2, ..., x_n], the first measure we often calculate is the mean or average; the mean summarizes the data with a single number. Mathematically, we can write it as:

  • x¯=1ni=1nxi\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

The Average class described on page 39 turns this summation into a loop (a trick we will often see). Implemented as on online, streaming, calculation, i.e., the program doesn’t know the NN ahead of time, it works on a variable sized input set.

public class Average
    public static void main(String[] args)
        int count = 0;       // number input values
        double sum = 0.0;    // sum of input values

        // read data and compute statistics