Column Sort

Note: Projects are to be completed by each student individually (not by groups of students).

Introduction

Column Sort is a sorting algorithm that was designed for use in parallel processing environments where multiple computers (or CPUs) are available to help sort the data. The task of sorting the data can be divided among the available CPUs. Because the CPUs can work on their individual parts simultaneously, the data can be sorted much faster than with a single CPU. For this project you will implement Column Sort in a single-processor environment (i.e., on a regular computer with only one CPU). While you would never actually use Column Sort in a single-processor environment, working with this algorithm will provide the following benefits:
  1. You will gain valuable experience working with sorting algorithms. As you examine the Column Sort algorithm, you will see that Column Sort spends a lot of time sorting sections of the data (the columns). You will need to supply one or more auxiliary sorting algorithms for Column Sort to use to sort the columns. You may use any sorting algorithms that you wish as long as you write the code yourself (with help from the textbook, of course).
  2. You will gain valuable experience with algorithm implementation. There are many different ways to implement Column Sort, and you will need to choose among the many possibilities (including your choice of auxiliary sorting algorithms).
  3. You will gain valuable experience with algorithm analysis. After implementing Column Sort, you will be asked to derive a Big-Oh bound for your implementation and to experimentally verify your result.
A description of the Column Sort algorithm is provided at the following link:
Column Sort Algorithm Description

Project Files

The following files are required to complete the project. You should download these files with a web browser. You are not allowed to modify these files in any way except for implementing methods that were interface definitions may not be modified, and all code must remain in the cs235.columnsort package.

Requirements

Requirement #1 - Implement the ColumnSort interface

Write a Java class that implements the ColumnSort interface defined in ColumnSort.java. This interface has one method which has the following signature:
void run(short[] data, int rows, int columns);
The run method will sort an array of integers. Your implementation should use the Column Sort algorithm to do the sorting. The first parameter, data, is the array to be sorted. Although data is a one-dimensional array, Column Sort treats the input as a two-dimensional matrix of values. The number of rows and the number of columns included in the input array are specified by the second and third parameters, rows and columns. (The number of columns would normally correspond to the number of CPUs that are available to help do the sorting.) You may assume that the inputs to the method already satisfy the constraints of the Column Sort algorithm: data.length == rows * columns, rows % columns == 0, and rows >= 2(columns-1)*(columns-1).

Requirement #2 - Modify the ColumnSortFactory

Modify the ColumnSortFactory class in ColumnSortFactory.java to return an object of your ColumnSort implementation class. This will allow the test and passoff drivers to run your code.

Requirement #3 - Test and debug your code

The test driver in TestDriver.java is provided to help you test and debug your code. The test driver performs the following steps:
  1. Generate an array of random numbers to be sorted
  2. Sort the data using Column Sort, and print out how long the sort took
  3. Sort the data using Java's built-in sorting algorithm, and print out how long the sort took
The test driver accepts two command-line arguments, both of which are integers:
  1. count - The number of elements to be sorted. (This number might be adjusted upward by the test driver in order to satisfy Column Sort's constraints on the number of rows and columns.)
  2. columns - The number of columns to be included in the generated data. (The number of rows is inferred from the number of elements and the number of columns).
If no command-line arguments are specified, instead of generating random test data the test driver uses the example 9x3 matrix from the Column Sort algorithm description. This may be useful to you as you test and debug your Column Sort implementation.

For example, the following command would run the test driver with an array of 10 million random values and 10 columns:
java -cp \projects\columnsort cs235.columnsort.TestDriver 10000000 10
The output from this command would look something like this:

java -cp \projects\columnsort cs235.columnsort.TestDriver 10000000 10
padded count: 10000000
rows: 1000000
cols: 10
column sort: 7541 ms
built-in sort: 3735 ms
padded count: is the actual size of the data array after being adjusted to meet Column Sort's constraints. rows: and cols: are the numbers of rows and columns used. column sort: displays the running time of Column Sort in milliseconds. built-in sort: displays the running time of Java's built-in sorting algorithm in milliseconds. This is for comparison with Column Sort's running time.

NOTE: By default, Java programs are limited to using 64MB of memory. In order to sort large arrays of data, when running the test driver you will need to raise this limit by using the -Xmx command-line option on the Java virtual machine. For example, the following command would allow the test driver to consume up to 350MB of memory:
java -Xmx350m -cp \projects\columnsort cs235.columnsort.TestDriver 10000000 10 1 47
If you don't use the -Xmx option to increase the memory limit, you will get an "out of memory" error whenever the test driver exceeds the default 64MB limit.

Requirement #4 - Analyze your algorithm implementation

Provide written answers to the following questions:
  1. In terms of the input size, N, what is the Big-Oh bound on the running time of your Column Sort implementation? (The answer will depend on how you implemented the algorithm.) Explain how you reached your conclusion.
  2. In terms of the input size, N, what is the approximate peak memory usage of your Column Sort implementation? Again, explain how you reached your conclusion.

Requirement #5 - Experimentally verify your algorithm analysis

Use the technique described in section 5.7 of the textbook to verify the Big-Oh bound on your Column Sort implementation. This requires that you write a program that runs your Column Sort implementation on inputs of different sizes and prints out a table similar to Figure 5.13 in the textbook. For example:
N           T            T/N              T/N^2             T/NlogN
100000      10           .0001            .000000001        .00002
1000000     200          .0002            0                 .00002
etc.
You will need to select the number of table rows and appropriate values for N. Aim for 5-6 rows and have a reasonable variety of N values. The goal is to demonstrate that the table values are converging, thus validating your Big-Oh analysis.

Requirement #6

Go through the submission process and then meet with a TA for a pass off interview. During the interview you will be required to do the following:
  1. Walk the TA through your code and explain how your implementation of Column Sort works
  2. Turn in a printout of your answers to the questions given in Requirement #4
  3. Walk the TA through the program you wrote to verify your algorithm analysis and explain why its output supports your conclusions