Sing, Pray Announcements =============== * Test adjusted to be out of 160 rather than 175 (email was sent out to everyone) * Homework #3 due today Tips ======= Josh's Tip on TokenizerImp: * jar -cvf xml.jar cs235/xml/TokenizerImp.class * Also, in Eclipse, you will need to add an External JAR in the project properties. * Furthermore, in order to be able to use the file without having to use "import" statements, then it needs to be in the same package that your are working in Outline & Reading Notes =========================== Binary Search Trees (BST) -------------------------- For any node in the BST, all smaller keyed nodes are in the left subtree and all larger keyed nodes are in the right subtree. Duplicates are not allowed. BST Order Property: In a binary search tree, for every node X, all keys in X's left subtree have smaller values than the key in X, and all keys in X's right subtree have larger values tha the key X. In a tree of N nodes there are N+1 null links. BST Operations: * A find operation performed by repeatedly branching either left or right, depending on the result of a comparison. * The findMin operation is performed by following left nodes as long as there is a left child. The findMax operation is similar. * The remove operation is difficult because nonleaf nodes hold the tree together and we do not want to disconnect the tree. * If a node has one child, it can be removed by having its parent bypass it. The root is a special case because it does not have a parent link. * A node with two children is replaced using the smallest item in the right subtree. Then another node is removed. >>> The cost of an operation is proportional to the depth of the last accessed node. The cost is logarithmic for a well-balanced tree, but it could be as bad as linear for a degenerate tree. The internal path length of a binary tree is the sum of the depths of its nodes. (It is used to measure the cost of a successful search.) >>> Theorem 19.1: The internal path length of a binary search tree is approximately 1.38 N log N on average, under tha assumption that all permutations are equally likely. The external path length of a binary tree is the sum of the depths of the N+1 null links. The terminating null node is considered a node for these purposes. (It is used to measure the cost of an unsuccessful search.) >>> Theorem 19.2: For any tree T, let IPL(T) be the internal path length of T and EPL(T) be its external path length. Then, if T has N nodes, EPL(T) = IPL(T) + 2N. Adelson-Velskii and Landis (AVL) Trees ---------------------------------------- * The first balanced binary search tree (but, not the last) * It was named after its discoverers. An AVL tree is a BST with the additional balance property that, for any node in the tree, the height of the left and right subtrees can differ by at most 1. As usual, the height of an empty subtree is -1. Every node in an AVL tree has subtrees whose heights differ by at most 1. Any empty subtree has height -1. The AVL tree has height at most roughly 44 percent greater than the minimum The depth of a typical node in an AVL tree is close to the optimal log N. An update in an AVL tree could destroy the balance. It must then be rebalanced before the operation can be considered complete. Only nodes on the path from the root to the insertion point can have their balances altered. If we fix the balance at the deepest unbalanced node, we rebalance the entire tree. There are four cases that we might have to fix; two are mirror images of the other two. Balance is restored by tree rotations. A single rotation switches the roles of the parent and child while maintaining the search order.