Lab 1: Shuffling & Histograms
Bard College – Computer Science – Data Structures

In our first lab we will explore Java, the UNIX command line, and begin employing an empirical approach to algorithms that we will adopt throughout the course. Specifically, we will be building histograms. Histograms provide a way to look at the shape and frequency of a large data set, a first step in many analyses. This week, we’ll use histograms to uncover the underlying distribution of random numbers, and to determine the best way to shuffle.

Shuffling

Shuffling a set of objects is a common task, for example, in online poker, and as a first step in many randomized algorithms (e.g. quicksort). Our textbook covers a shuffle method (often referred to as the Knuth of Fisher-Yates shuffle) on page 32: 

    public static void shuffle(double[] a) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            int r = i + uniform(n-i);     // between i and n-1
            double temp = a[i];
            a[i] = a[r];
            a[r] = temp;
        }
    }

Let’s compare that method with this slightly simpler version:
    public static void shuffle(double[] a) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            int r = uniform(n);     // between 0 and n-1
            double temp = a[i];
            a[i] = a[r];
            a[r] = temp;
        }
    }

ShuffleTest  (on Moodle) runs an experiment you can use to compare these two approaches. 

With a partner, take some time to understand  ShuffleTest.  Using the Histogram program you create in this lab, can you figure out which method is a better shuffler?

Random Distributions

There are seven different text files of random numbers on moodle (r1.txt, r2.txt, r3.txt, r4.txt, r5.txt, r6.txt, r7.txt).  Each files consists of 700,000 numbers generated using one of the following methods from StdRandom .  

  1. uniform() 
  1. uniform(2, 12)  
  1. uniform(2.0, 12.0) 
  1. uniform(1000000.0, 1000001.0) 
  1. gaussian(7, 2.5) 
  1. die — uniform(1, 7) 
  1. two dice — uniform(1, 7) + uniform(1, 7)  (see Experiment 1.1.35 on page 61)

Can you match those seven distributions to the seven generated files?

The first method, uniform(), on that list was used in the following manner:

import edu.princeton.cs.algs4.*;

public class Random{
    public static void main(String args[]){