Comments on: Hash Tables (Part 1)
https://webkit.org/blog/6/hashtables-part-1/

By: maciej (Fri, 15 Jul 2005)
https://webkit.org/blog/6/hashtables-part-1/comment-page-1/#comment-40

As you will soon discover in future parts of this series, the new WebCore hash table does extremely well in terms of collision rate, through the combination of good algorithms and good hash functions.

By: iSee (Wed, 13 Jul 2005)
https://webkit.org/blog/6/hashtables-part-1/comment-page-1/#comment-39

On hash tables.
Let me clear up some common misconceptions about hash tables:
While it is feasible to get very fast lookups using hash functions, the commonly stated O(1) access is, strictly speaking, only true if your key length and your address space (pointer length) are considered constant.
The latter can usually be ignored (though it can cause slowdowns when moving to a 64-bit architecture), but the former poses a real problem.
The best-case lookup complexity cannot be better than O(l), where l is the length of the key, because you cannot possibly do a key-value lookup without reading the key. And in the best case, where the key space is fully utilized, distinguishing n entries requires keys of length at least log(n), so even then the lookup costs log(n), where n is the number of entries, which happens to be the complexity of balanced search trees.
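
To make the jump from key length to log(n) explicit, here is the arithmetic as a small worked example (the numbers are mine, not the commenter's):

    l >= log2(n)          (a key needs at least log2(n) bits to tell n entries apart)
    n = 2^20 entries  =>  l >= 20 bits

So merely reading the key during a lookup already costs on the order of log(n) bit operations, the same order as the comparisons performed by a balanced search tree.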
Having said that, as with quicksort, hash tables can be much faster in practice than balanced search trees, and there is hope that they will help make Safari a neat, fast browser in some places.
However, as with quicksort, there are some pitfalls you should be aware of that can adversely affect complexity. For quicksort, these are sorted and reversed arrays; for hash tables, you need to care about clashes and deletes, which make hash tables a questionable choice for caches unless you choose to simply overwrite the old value in case of a clash.
Let me explain: a hash function maps the key space to an array position where the value is stored, typically producing something much shorter than the key. (If you did not make it shorter, you could just use the key itself as the index into an array big enough to hold an entry for every possible key; and in fact, if your key is suitable, that is exactly what you should do.) Because the hashed key is much smaller than the key space, there will be clashes, i.e. different keys that are mapped onto the same slot. I do not know the exact figures, but think of the birthday paradox: even under ideal conditions (i.e. complete randomness), pick 23 people and it is more likely than not that two of them share a birthday. Of course hash table algorithms can cope with clashes (otherwise they would be basically useless), but there is some hassle involved, especially if you choose to store the clashing value inside the same array as the other values. In that case it becomes hard to ever really delete values after a clash has occurred, and over time you end up doing a linear search through the hash table, which is very slow.
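
To put a number on the birthday analogy, here is a small C++ sketch (my own illustration, every name in it is made up; it is not WebCore code) computing the probability that k uniformly hashed keys clash somewhere among m slots:

    #include <cstdio>

    // Probability that among k random keys hashed uniformly into m slots,
    // at least two land in the same slot (the birthday-paradox bound).
    double collisionProbability(int k, int m)
    {
        double pNoCollision = 1.0;
        for (int i = 1; i < k; ++i)
            pNoCollision *= 1.0 - static_cast<double>(i) / m;
        return 1.0 - pNoCollision;
    }

    int main()
    {
        // 23 keys into 365 slots: already more likely than not to clash.
        std::printf("%.3f\n", collisionProbability(23, 365)); // prints ~0.507
    }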
So if you (the reader) ever use hash tables for long-lived read-write data structures, be sure to have a fallback data structure for clashes, or some smart and efficient mechanism to delete entries without leaving delete markers around or causing local contention.
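
As a sketch of the delete-marker problem just described (all names are hypothetical, chosen for illustration; this is not WebCore's hash table), here is a minimal open-addressing table in C++ with linear probing and tombstones. Note how find() must probe past Deleted slots, so tombstones accumulate and push unsuccessful lookups toward a full linear scan:

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <string>
    #include <vector>

    struct Slot {
        enum State { Empty, Full, Deleted } state = Empty;
        std::string key;
        int value = 0;
    };

    class ProbingTable {
    public:
        explicit ProbingTable(std::size_t capacity) : slots_(capacity) {}

        void insert(const std::string& key, int value)
        {
            std::size_t firstFree = slots_.size();
            for (std::size_t i = index(key), n = 0; n < slots_.size(); i = (i + 1) % slots_.size(), ++n) {
                if (slots_[i].state == Slot::Full && slots_[i].key == key) {
                    slots_[i].value = value; // key already present: overwrite
                    return;
                }
                if (slots_[i].state != Slot::Full && firstFree == slots_.size())
                    firstFree = i; // remember the first reusable slot
                if (slots_[i].state == Slot::Empty)
                    break; // the key cannot live beyond an Empty slot
            }
            if (firstFree != slots_.size())
                slots_[firstFree] = { Slot::Full, key, value };
        }

        std::optional<int> find(const std::string& key) const
        {
            for (std::size_t i = index(key), n = 0; n < slots_.size(); i = (i + 1) % slots_.size(), ++n) {
                if (slots_[i].state == Slot::Empty)
                    return std::nullopt; // a truly empty slot ends the probe
                // A Deleted slot must NOT end the probe: the sought key may
                // have been inserted past it, so we keep scanning.
                if (slots_[i].state == Slot::Full && slots_[i].key == key)
                    return slots_[i].value;
            }
            return std::nullopt;
        }

        void erase(const std::string& key)
        {
            for (std::size_t i = index(key), n = 0; n < slots_.size(); i = (i + 1) % slots_.size(), ++n) {
                if (slots_[i].state == Slot::Empty)
                    return;
                if (slots_[i].state == Slot::Full && slots_[i].key == key) {
                    slots_[i].state = Slot::Deleted; // leave a tombstone behind
                    return;
                }
            }
        }

    private:
        std::size_t index(const std::string& key) const
        {
            return std::hash<std::string>{}(key) % slots_.size();
        }

        std::vector<Slot> slots_;
    };

Real implementations typically escape this degradation by rehashing everything into a fresh array once tombstones plus live entries pass a load threshold; the sketch omits that step to keep the failure mode visible.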
If you are looking for alternatives, there is at least one. Needing a very dynamic cache data structure in an earlier project, I created (or perhaps reinvented) a data structure I call an “algebraic search tree” that is fast and scalable; it works basically by chopping the key into parts and applying each part to a little lookup table. If you want further information, feel free to contact me (i underscore see at macnews.de)
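
The comment does not describe the “algebraic search tree” in detail, but the “chop the key into parts and index a small table with each part” idea is essentially a radix trie; here is a minimal C++ sketch under that assumption (all names hypothetical), using 4-bit chunks of a 32-bit key:

    #include <array>
    #include <cstdint>
    #include <memory>
    #include <optional>

    class RadixTrie {
        struct Node {
            std::array<std::unique_ptr<Node>, 16> children; // one slot per nibble value
            std::optional<int> value;
        };
        Node root_;

    public:
        void insert(std::uint32_t key, int value)
        {
            Node* node = &root_;
            for (int shift = 28; shift >= 0; shift -= 4) {
                unsigned nibble = (key >> shift) & 0xF; // next 4-bit chunk of the key
                if (!node->children[nibble])
                    node->children[nibble] = std::make_unique<Node>();
                node = node->children[nibble].get();
            }
            node->value = value;
        }

        std::optional<int> find(std::uint32_t key) const
        {
            const Node* node = &root_;
            for (int shift = 28; shift >= 0; shift -= 4) {
                unsigned nibble = (key >> shift) & 0xF;
                if (!node->children[nibble])
                    return std::nullopt; // path absent: key was never inserted
                node = node->children[nibble].get();
            }
            return node->value;
        }
    };

Lookup touches each key part exactly once, so the cost is O(l) in the key length, matching the lower bound discussed above, and deletion simply clears a leaf, with no tombstones left behind.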
