Sing, Pray Announcements =============== * How were the Help sessions? * Go see the TAs if you are struggling * Homework #4 Due Today Outline =========================== 19.7 Implementing TreeSet and TreeMap 20.1 Hash Tables 20.2 Hash Function 20.5 Chaining 19.7 Implementing TreeSet and TreeMap --------------- Uses underlying balanced BST similar to what we are doing with AVL tree (however, AA-tree) 20.1 Hash Tables and 20.2 Hash Function Go through notes... Extra Notes: --------------- Quadratic Probing examines cells 1,4,9 ... Theorem 20.4: If quadratic probing is used and the table size is prime, then a new element can element can always be inserted if the table is a at least half empty. Furthermore, in the course of the insertion, no cell is probed twice. 20.5 Chaining --------------- Separate chaining hashing is a space-efficient alternative to quadratic probing in which an array of linked lists is maintained. It is less sensitive to high load factors See example images in 'hashing' folder. Rehashing (from http://users.informatik.uni-halle.de/~jopsi/dinf204/notes_full.shtml) --------------- As we cannot always know beforehand for how many elements the data structure is intended, we must occasionally rebuild the whole structure into a larger one. This is simple: choose a new size N' and a new hash function. Then traverse the array and insert all elements encountered in the larger array. If N' is sufficiently large and we have no bad luck, this will cost O(N). Of course we can continue inserting until the array is entirely full, but this is not clever: the last insertions will cost linear time. The simplest criterion for rebuilding is the load factor. For example, one can rebuild as soon as the load factor exceeds 80%. This is quite static though and does not tak into account that sometimes we are running into bad hashing behavior because of an unlucky choice of the hash function. Therefore, it is better to rebuild as soon as performance becomes too poor. For example, one can rebuild as soon as the latest 1000 insertions took more than c * 1000 steps, for some small constant c. If it is important to keep the maximum time per operation bounded, then the rebuilding can be started somewhat earlier and run in the background until the new structure is ready. In this way the amortized cost of the rebuilding can be reduced to O(1).