Computer Science 235 :: Data Structures and Algorithms

CS 235 Programming Project Tips - Spring 2008


This page contains a few helpful hints to help you to avoid common problems encountered every semester by many students. This page will be updated often so, as you run into trouble implementing the various labs, make sure to consult this page.


Spelling Checker

  1. Remember that Java is case-sensitive. If your dictionary is all in lower-case, and your file has some words in upper-case, the upper-case words won't be in the dictionary!

  2. When using the Scanner, you want to tokenize by non-letter characters. You tell the Scanner to delimit by non-letter characters by using the following pattern: "[^A-Za-z]+". See the API for how to use patterns with the Scanner class.

  3. Remember, it should be possible to use the same instantiated SpellingChecker object to check several documents, make sure that misspelled words found in one run of checkDocument() do not show up in the list of misspelled words for subsequent runs unless they really should be there (ie. make sure that you clear your list of misspelled words every time you scan a new document)

  4. Make sure that you never remove words from your dictionary of correct words when a new dictionary file is loaded, all words from all previously loaded dictionary files should still be in your dictionary after the file finishes loading

  5. Think carefully about how you implement the project. Think about the different abstract data types available to you and which ones are best suited to the task at hand. Just a hint, finding an element in an ArrayList or a Vector takes linear time, while finding them with the contains() method on a set or a map takes O(1) or O(log n) time on average.

Boggle

  1. The dictionary is in lower case and the Boggle board is in upper-case. Make sure you use the same case for everything so your binary searches will work. (Uppercase letters are considered "less than" lowercase letters.)


Sorting Lab

Counting the compares can be the most difficult part of this lab. They are particularly tricky because sometimes they are made inside of for or while loop tests. Take this case for example : for (int i = 0; i < items.length && items[a].compareTo(items[b]) > 0; i++)

  1. Here, there are two parts to the for loop condition. If you count inside the loop, you could miss the case where the loop didn't run because items[a].compareTo(items[b]) <= 0. However, if you just add an extra double increment to your compare count after the loop, you won't be accounting for the case where i < items.length (because java boolean expression evaluation is short circuited).
  2. A way around this problem would be the creation of a compare() function that does the incrementing. Make sure this method keeps the same meaning as the original call to compareTo() and then put it into the body of the sort to replace the compareTo()s.
  3. If you are having the problem of:
    TESTING: insertion sort with sorted data
        checking compares
        expected: 199
        actual: 298
    FAILED
    
    Then you need to re-initialize your variable that keeps track of the # of compares to 0 each time you call the sort() function.


XML Parser

  1. One of the trickiest parts of this lab is getting your node manipulation methods to work right. The test/passoff drivers call each of these methods several times on your parse tree in such a way that, if all of these methods are working correctly, the tree should come out identical to how it was before the manipulations. If any of these methods (removeChild(), appendChild(), insertChildBefore(), getLastChild(), getFirstChild(), etc.) does not work perfectly, whole sections of your tree will be obliterated or repeated in the post-manipulations tree.

  2. You'll need your Node classes before you can actually get anywhere with your parser. You can write the parser first, you know the functionality of the nodes based on the interfaces they implement, but you won't be able to compile/run it until you have working Node classes. I recommend writing/testing your Node classes thoroughly before beginning to write the parser.

  3. Make plenty of use of the TestNode class that is provided for download on the Project Description page. It can be a great tool in helping you trace problems in your tree. It has an excellent toString() method that prints out a node and it's children (if you pass it in the node at the root of the tree, it will give a String representation of the whole tree). It also prints out white space characters as their corresponding escape sequences. To create a TestNode, just pass it's constructor any object that is an instance of a class that implements the Node interface. Here's an example of how to use the TestNode class, in this example, doc is an instance of a class that implements the Node and Document interfaces (i.e. it is the root of a document parse tree): TestNode myTest = new TestNode(doc);
    System.out.println(myTest.toString());

  4. If you write your Node classes correctly, you should never have to mess with children or sibling pointers in the parser. To add a node to the tree, you should always use one of the methods on the Node classes (like append child).

  5. The basic idea of creating the parse tree is this: Somehow a node is kept track of which is the node that all tokens encountered are being added onto as children. When a start tag token is encountered, a new Element Node is created and that becomes the current node. When an end tag token is encountered, the parent of the current node becomes the current node.

  6. The Tokenizer will throw an IllegalStateException if you try to call a method out of place. For example calling getText() on on a StartTag is not needed and impossible. These errors can often occur if you forget to add breaks in your switch statements.


AVL Tree

  1. There are many bad and inefficient ways to do AVL balancing, your code should be able to run even the largest tests in the TestDriver in under 10 seconds. If your code is running slow, here are some things to check.
  2. Is your tree really balanced? Once you get 90,000 items in a tree, it's hard to check sometimes if the tree is really AVL balanced. An unbalanced tree will start to exhibit O(N) behavior in it's contains() and insert() methods. Usually, the culprit is in the remove() method.
  3. Are you re-calculating information that might not be changing? Store the heights of nodes inside the node object and only update them when they might have changed. Only a small fraction of the nodes in a tree will need to have their heights checked during each insertion and deletion and this fraction gets smaller and smaller as the tree gets bigger and bigger
  4. Are you using too much recursion? If you check the heights and the balancing of your tree by calling some recursive method on the root of your tree every time you insert or delete, you are doing too much work. Any time that you insert or delete, the only possible nodes that could need to have their height updated or that might become unbalanced are those along the path from the root node to the inserted note. The insert() and remove() methods should be recursive, and if they are, this path is "stored" in the recursive calls to insert() and delete(). Do your height checking, balance checking and balancing all as you come out of these recursive calls (after the recursive call in your recursive methods, put the code to do the work for these operations)

  5. One thing that isn't mentioned explicitly in the project description for this page, but should be a consideration as you design your AVL tree is what to do when you the left and right subtrees have the same height when it comes time to balance. You will recall that the decision as to whether a single or double rotation is needed is made by testing to see if the unbalance is left-left or right-right (both of which result in a single rotation) or left-right or right-left (both of which result in a double rotation). These cases are the only possibilities for insertion.

    When it comes time to delete, there is a 5th case that is possible. This is the case where node X is the unbalanced node (it's left child and right child have heights that are off by two) and the child with the larger height has two children that now, because of a delete operation, have equal heights. In this case, your code should "default" to a single rotation!. This means that, when you test to see which side is larger, you should make judicious use of the <= and the >= operators.

Debugging Tool

We have provided AvlGUI.java, TreeFactory.java, Tree.java, and BinaryTreeNode.java to help you see what your avl tree looks like as you add and remove nodes from it. In order for the GUI to draw your tree, it needs to have access to the root node of your avl tree. TreeFactory.java allows us to create an instance of your AVL tree. Tree.java is the interface that allows the GUI to access your root node. BinaryTreeNode.java is the interface that allows the GUI to know what kind of node the root node is. This interface also allows the GUI to get information from each node such as the data stored in the node, each child of the node, and the height of the node in the tree.

You are not required to implement TreeFactory.java, Tree.java or BinaryTreeNode.java but the AvlGUI will not draw your avl tree unless you do implement them.

Notes for using the GUI

The purpose for the GUI is to help you visualize your avl tree after adding or removing a node from the tree. The GUI has been programmed to allow you to add/remove nodes from your avl tree in two different ways. The clear tree button will clear the tree and the queue containing all the words loaded from a file. The clear queue button will remove all the left over words from the queue that were loaded from a file without modifying the tree.

You will notice that the tree will quickly grow out of the visible area in the main panel of the gui. All you need to do is Maximize or Restore Down the window so the Scroll Pane will adjust the visible content. If the tree is larger then the dimensions of the window then you will be able to use the scroll bars to see the area of the tree that you desire to look at.


Hash Lab

  1. The load factor of the hash table is the fraction N/S, where N is the number of objects stored in the hash table and S is the size of the table (the number of buckets). It doesn't matter how many objects are in each individual bucket.

  2. Make sure you ONLY re-hash when you need to add an item that will cause the load factor to exceed 1.0. For example, if you had a hash table with 5 buckets and you already had 4 items added, when you add the fifth item, you DON'T need to re-hash. When you want to add the sixth item, however, you will first see that the hash table is at its maximum load capacity. You want to rehash the table BEFORE you add the sixth item.

  3. Be careful with your size variable when you rehash!