Hashing

Note: Projects are to be completed by each student individually (not by groups of students).

Be sure to check out the Tips at the bottom of the page.

Introduction

This project requires you to implement parts of the java.util.Set and java.util.Map interfaces using hashing. Your implementations will be compared against the java.util.TreeSet and java.util.TreeMap classes in a number of test cases. Your hash table will also be tested to determine if items are hashed to the 'correct' location in the table.

Requirement #1: Implement the java.util.Set Interface

The on-line documentation at java.sun.com defines the java.util.Set interface. You must implement the interface using hashing. You must fully implement the following methods:
boolean add(Object o)
void clear()
boolean contains(Object o)
boolean isEmpty()
Iterator iterator()
boolean remove(Object o)
int size()
Object[] toArray()
Your implementation is allowed to throw UnsupportedOperationException for the following methods:
boolean addAll(Collection c)
boolean containsAll(Collection c)
boolean removeAll(Collection c)
boolean retainAll(Collection c)
Object[] toArray(Object[] array)
For example, you could implement addAll as follows:
boolean addAll(Collection c) {
     throw new UnsupportedOperationException();
}
Your implementation must conform to the required behavior of each method as defined in the on-line documentation. In addition to the requirements given in the on-line documentation, you must also do the following:
  1. You must resolve collisions using chaining.
  2. When you add an item to a bucket in the hash table, you must add the new item to the end of the chain of items that previously hashed to that bucket.
  3. When adding item x to your hash table, you must use the following hash function:
    Math.abs(x.hashCode() % hash_table_size)
  4. The initial size of your hash table must be 101.
  5. When you add a new item to the hash table that would cause the load factor to exceed 1.0 you must rehash into a hash table of size:
    2 * hash_table_size + 1
  6. The load factor of the hash table is the fraction N/S, where N is the number of objects stored in the hash table and S is the size of the table (the number of buckets). It doesn't matter how many objects are in each individual bucket.
  7. When rehashing, iterate through the old table in the same order as the iterator described in the next section.
  8. When removing items, don't do any rehashing.
Put your implementation class in the cs235.hash package.

Requirement #2: Implement the java.util.Iterator Interface

The java.util.Iterator interface is defined on-line at java.sun.com. You must implement an Iterator that will iterate over the items stored in your hash table. You could implement your Iterator using a class that is nested inside of the class you use to implement java.util.Set. You must fully implement the following methods:
boolean hasNext()
Object next()
Your implementation is allowed to throw UnsupportedOperationException for the following method:
void remove()
Your implementation must conform to the required behavior of each method as defined in the on-line documentation.

The definition of java.util.Set allows an Iterator to return the items in the set in any order. You must return the items in the following specific order. Iterate sequentially through the buckets in your hash table from index 0 through index hash_table_size - 1. For each bucket in the table, iterate through the items in the bucket in the order they were added to the hash table.

The iterator method in the class that implements java.util.Set should return an Iterator object as described in this section.

The toArray method in the class that implements java.util.Set must fill its array in the same order as described here for the Iterator. In fact you could use an Iterator to help implement the toArray method.

Requirement #3: Implement the method in the cs235.hash.SetFactory class.

The file SetFactory.java contains a class named SetFactory. SetFactory has one method named createSet. You must implement this method. The method creates a new instance of your hash table class and returns it.

Requirement #4: Implement the java.util.Map Interface

The on-line documentation at java.sun.com defines the java.util.Map interface. You must implement java.util.Map using your implementation of java.util.Set. You must fully implement the following methods:
void clear()
boolean isEmpty()
Object get(Object key)
Object put(Object key, Object value)
Object remove(Object key)
int size()
Your implementation is allowed to throw UnsupportedOperationException for the following methods:
boolean containsKey(Object key)
boolean containsValue(Object value)
Set entrySet()
Set keySet()
void putAll(Map t)
Collection values()
Your implementation must conform to the required behavior of each method as defined in the on-line documentation. Put your implementation class in the cs235.hash package.

Implementation Notes

  1. Write one class that implements java.util.Set and a second separate class that implements java.util.Map. The class that implements Map contains an object instantiated from the class that implements Set.
  2. Write a class that contains key/value pairs. Implement the equals and hashCode methods for this class such that they operate only on the key. The hash table used to implement the Map contains these key/value pairs.
  3. Include an extra method in your class that implements Set. The method is similar to contains except that it returns an Object from the hash table rather than only returning a boolean. You can pass a key/value pair to this method with only the key defined and the method will return the complete matching key/value pair from the hash table.

Requirement #5: Implement the method in the cs235.hash.MapFactory class.

The file MapFactory.java contains a class named MapFactory. MapFactory has one method named createMap. You must implement this method. The method creates a new instance of your class that implements java.util.Map and returns it.

Requirement #6: Test Your Code

The file TestDriver.java contains a test program you can use to help you test your code. A number of files must be in the current directory when you run the test driver. These files are packaged together in a zip archive hashtests.zip.

The test driver should not be viewed as a replacement for doing your own testing. If your code passes the test driver, it is still possible that it will fail at pass off because a different test program is used for pass off (i.e., the pass off driver). It is your responsibility to make sure that your code works.

Requirement #7: Use Your Hash Table in Your Spelling Checker

Modify your spelling checker to use your Hash Table to store the Dictionary (set of correct words). You will need to import the cs235.hash package to access your Hash Table implementation.

Requirement #8: Pass Off Your Project

Pass off your code according to the course pass off policy. You will be required to answer questions about your implementation and to show your modified spelling checker.

Tips

  1. The load factor of the hash table is the fraction N/S, where N is the number of objects stored in the hash table and S is the size of the table (the number of buckets). It doesn't matter how many objects are in each individual bucket.
  2. Make sure you ONLY re-hash when you need to add an item that will cause the load factor to exceed 1.0. For example, if you had a hash table with 5 buckets and you already had 4 items added, when you add the fifth item, you DON'T need to re-hash. When you want to add the sixth item, however, you will first see that the hash table is at its maximum load capacity. You want to rehash the table BEFORE you add the sixth item.
  3. Be careful with your size variable when you rehash!