VisuAlgo.net /hashtable

A Hash Table is a data structure that maps keys to values (also called the Table or Map Abstract Data Type/ADT). It uses a hash function to map large or even non-Integer keys into a small range of Integer indices (typically [0..hash_table_size-1]).


The probability of two distinct keys colliding into the same index is relatively high, and each potential collision needs to be resolved to maintain data integrity.


There are several collision resolution strategies that will be highlighted in this visualization: Open Addressing (Linear Probing, Quadratic Probing, and Double Hashing) and Closed Addressing (Separate Chaining). Try clicking Search(8) for a sample animation of searching for a value in a Hash Table using the Separate Chaining technique.



Hashing is an algorithm (via a hash function) that maps large data sets of variable length, called keys, not necessarily Integers, into smaller Integer data sets of a fixed length.


A Hash Table is a data structure that uses a hash function to efficiently map keys to values (Table or Map ADT), for efficient search/retrieval, insertion, and/or removals.


Hash Tables are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.


In this e-Lecture, we will first cover the Table ADT, the basic ideas of Hashing, and Hash Functions before going into the details of the Hash Table itself.



A Table ADT must support at least the following three operations as efficiently as possible:

  1. Search(v) — determine if v exists in the ADT or not,
  2. Insert(v) — insert v into the ADT,
  3. Remove(v) — remove v from the ADT.

A Hash Table is one possible implementation of this Table ADT; it is not the only one.


PS: For two weaker implementations of Table ADT, you can click the respective link: unsorted array or a sorted array to read the detailed discussions.



When the range of the Integer keys is small, e.g. [0..M-1], we can use an initially empty (Boolean) array A of size M and implement the following Table ADT operations directly:

  1. Search(v): Check if A[v] is true (filled) or false (empty),
  2. Insert(v): Set A[v] to be true (filled),
  3. Remove(v): Set A[v] to be false (empty).

That's it: we use the small Integer key itself to determine the address in array A, hence the name Direct Addressing Table (DAT). It is clear that all three major Table ADT operations are O(1).
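
The three operations above can be sketched as follows. This is only a minimal illustration; the class name DirectAddressingTable and its method names are hypothetical and do not come from the visualization.

```python
class DirectAddressingTable:
    """Table ADT for Integer keys in [0..m-1], backed by a Boolean array."""

    def __init__(self, m):
        # present[v] is True if and only if key v is currently stored
        self.present = [False] * m

    def search(self, v):
        return self.present[v]      # O(1): one array access

    def insert(self, v):
        self.present[v] = True      # O(1)

    def remove(self, v):
        self.present[v] = False     # O(1)


dat = DirectAddressingTable(10)
dat.insert(3)
print(dat.search(3))  # True
print(dat.search(4))  # False
dat.remove(3)
print(dat.search(3))  # False
```

Note that the key is used directly as the array index, so the sketch only works for keys that are Integers in [0..m-1].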


In Singapore (as of May 2017), bus routes are numbered in the range [2..990].


Not all integers in [2..990] are currently used, e.g. there is no bus route 989, so Search(989) should return false. A new bus route x may be introduced, i.e. Insert(x), or an existing bus route y may be discontinued, i.e. Remove(y).


As the range of possible bus routes is small, we can use a DAT with a Boolean array of size 1 000 to record whether a bus route number exists.


Discussion: In a real-life class, we may discuss why we use 1 000 instead of 990 (or 991).


Notice that we can always add satellite data instead of just using a Boolean array to record the existence of the keys.


For example, we can use an associative String array A instead to map a bus route number to its operator name, e.g.

A[2] = "Go-Ahead Singapore",
A[10] = "SBS Transit",
A[183] = "Tower Transit Singapore",
A[188] = "SMRT Buses", etc.

Discussion: Can you think of a few other real-life DAT examples?


e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).


A DAT has the following limitations:

  1. The keys must be (or can be easily mapped to) non-negative Integer values. The basic DAT has a problem with the full version of the example in the previous two slides, as there are actually variations of bus route numbers in Singapore, e.g. 96B, 151A, NR10, etc.
  2. The range of keys must be small. The memory usage will be (insanely) large if we have an (insanely) large range.
  3. The keys must be dense, i.e. not many gaps in the key values. The DAT will contain too many empty cells otherwise.

We will overcome these restrictions with hashing.


Using hashing, we can:

  1. Map (some) non-Integer keys to Integer keys,
  2. Map large Integers to smaller Integers,
  3. Influence the density, or load factor α = N/M, of the Hash Table where N is the number of keys and M is the size of the Hash Table.

For example, suppose we have N = 400 Singapore phone numbers (a Singapore phone number has 8 digits, so there are up to 10^8 = 100M possible phone numbers in Singapore).


Instead of using a DAT with a gigantic array of size up to M = 100 Million, we can use the following simple hash function: h(v) = v%997.


This way, we map the 8-digit phone numbers 6675 2378 and 6874 4483 into at most 3 digits: h(6675 2378) = 237 and h(6874 4483) = 336, respectively. Therefore, we only need to prepare an array of size M = 997 instead of M = 100 Million.
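
As a quick check of the arithmetic, the hash function from this example can be written in one line of Python. Only the table size 997 and the two phone numbers come from the text; the rest is an illustrative sketch.

```python
M = 997  # table size from the phone number example


def h(v):
    """Map an 8-digit phone number into an index in [0..M-1]."""
    return v % M


print(h(66752378))  # 237
print(h(68744483))  # 336
```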


With hashing, we can now implement the following Table ADT operations using an Integer array (instead of a Boolean array) as follows:

  1. Search(v): Check if A[h(v)] != -1 (we use -1 for an empty cell assuming v ≥ 0),
  2. Insert(v): Set A[h(v)] = v (we hash v into h(v) so we need to somehow record key v),
  3. Remove(v): Set A[h(v)] = -1 — to be elaborated further.

If we have keys that map to satellite data and we want to record the original keys too, we can implement the Hash Table using an array of (Integer, satellite-data-type) pairs as follows:

  1. Search(v): Return A[h(v)], which is a pair (v, satellite-data), possibly empty,
  2. Insert(v, satellite-data): Set A[h(v)] = pair(v, satellite-data),
  3. Remove(v): Set A[h(v)] = (empty pair) — to be elaborated further.

However, by now you should notice that something is incomplete...


A hash function may, and quite likely will, map different keys (Integer or not) into the same Integer slot, i.e. it is a many-to-one mapping instead of a one-to-one mapping.


For example, h(6675 2378) = 237 from three slides earlier and if we want to insert another phone number 6675 4372, we will have a problem as h(6675 4372) = 237 too.


This situation is called a collision, i.e. two (or more) keys have the same hash value.
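
To see why a collision matters, here is a small sketch of the incomplete Insert(v) that stores a key at A[h(v)] with no collision handling. The helper name naive_insert is hypothetical; the phone numbers and M = 997 are taken from the example.

```python
M = 997
EMPTY = -1            # marker for an empty cell, assuming all keys are >= 0
table = [EMPTY] * M


def h(v):
    return v % M


def naive_insert(v):
    """Store v at A[h(v)] without collision handling.

    Returns the key that was overwritten, or None if the cell was empty.
    """
    i = h(v)
    evicted = table[i]
    table[i] = v
    return None if evicted == EMPTY else evicted


naive_insert(66752378)
lost = naive_insert(66754372)
print(h(66752378), h(66754372))  # 237 237
print(lost)                      # 66752378
```

The second insertion silently overwrites the first key, which is exactly the loss of data integrity that a collision resolution technique has to prevent.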


The Birthday (von Mises) Paradox asks: 'How many people (keys) must be in a room (Hash Table) before the probability that some share a birthday (collision), ignoring the year and leap days (i.e. all years have 365 days), becomes at least 50 percent (very likely)?'


The answer, which may be surprising for some of us, is 23.


Discussion: Why?
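
One way to explore this question is to compute the probability directly. The sketch below assumes birthdays are uniformly distributed over 365 days, as stated in the question; the function name collision_probability is hypothetical.

```python
def collision_probability(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        # person i must avoid the i birthdays already taken
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


n = 1
while collision_probability(n) < 0.5:
    n += 1
print(n)  # 23
```

In Hash Table terms, a table with 365 slots is already more likely than not to see a collision once 23 keys have been inserted.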


Issue 1: We have seen a simple hash function, h(v) = v%997, used in the Phone Numbers example, which maps a large range of Integer keys into a smaller range of Integer keys. But what about non-Integer keys? How do we hash them efficiently?


Issue 2: We have seen that by hashing, or mapping, a large range into a smaller range, there will very likely be collisions. How do we deal with them?


How do we create a good hash function with these desirable properties?

  1. Fast to compute, i.e. in O(1),
  2. Uses as few slots (i.e. as small a Hash Table size M) as possible,
  3. Scatters the keys into different base addresses ∈ [0..M-1] as uniformly as possible,
  4. Experiences as few collisions as possible.

Suppose we have a hash table of size M where keys are used to identify the satellite-data and a specific hash function is used to compute a hash value.


A hash value/hash code of key v is computed from the key v with the use of a hash function to get an Integer in the range 0 to M-1. This hash value is used as the base/home index/address of the Hash Table entry for the satellite-data.


Using the Phone Numbers example, suppose we define h(v) = floor(v/1 000 000),
i.e. we select the first two digits of a phone number.

h(66 75 2378) = 66
h(68 74 4483) = 68

Discuss: What happens when you use that hash function?


Before discussing the reality, let's discuss the ideal case: perfect hash functions.


A perfect hash function is a one-to-one mapping between keys and hash values, i.e. no collision at all. It is possible if all keys are known beforehand, for example when a compiler/interpreter searches for reserved keywords. However, such cases are rare.


A minimal perfect hash function is achieved when the table size is the same as the number of keywords supplied. This case is even rarer.


If you are interested, you can explore GNU gperf, a freely available perfect hash function generator written in C++ that automatically constructs perfect hash functions (a C++ program) from a user-supplied list of keywords.


People have tried various ways to hash a large range of Integers into a smaller range of Integers as uniformly as possible. In this e-Lecture, we jump directly to one of the best and most popular versions: h(v) = v%M, i.e. map v into a Hash Table of size M slots. The (%) is the modulo operator that gives the remainder after division. This is clearly fast, i.e. O(1), assuming that v does not exceed the natural Integer data type limit.


The Hash Table size M is set to be a reasonably large prime not near a power of 2, about 2+ times larger than the expected number of keys N that will ever be used in the Hash Table. This way, the load factor α = N/M < 0.5. We shall see later that having a low load factor, thereby sacrificing empty spaces, helps improve Hash Table performance.


Discuss: What if we set M to be a power of 10 (decimal) or power of 2 (binary)?
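
As one starting point for this discussion, the sketch below counts how many distinct slots are used when the keys share a pattern. The key set (100 multiples of 10) is a made-up assumption chosen for illustration, not data from the lecture.

```python
# Hypothetical keys: 0, 10, 20, ..., 990 (all end in the digit 0)
keys = range(0, 1000, 10)


def distinct_slots(m):
    """Number of different indices produced by h(v) = v % m over the keys."""
    return len({v % m for v in keys})


print(distinct_slots(100))  # 10
print(distinct_slots(97))   # 97
```

With M = 100, only the last two decimal digits of each key matter, so the 100 keys crowd into 10 slots. With the prime M = 97 the same keys spread over 97 slots.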


People have also tried various ways to hash Strings into a small range of Integers as uniformly as possible. In this e-Lecture, we jump directly to one of the best and most popular versions, shown below:

function h(v) // assumption 1: v uses UPPERCASE chars ['A'..'Z'] only
  sum = 0
  for each character c in v // assumption 2: v is a short string
    sum = sum*26 + (ASCII-value-of(c)-'A')
  return sum%M

Discussion: In a real-life class, discuss the components of the hash function above, e.g. why loop through all characters? Will that be slower than O(1)? Why multiply by 26? What if the string v uses more than just UPPERCASE chars?
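
The pseudocode above translates almost line by line into Python. This is a sketch under the same two assumptions (UPPERCASE letters only, short strings); passing the table size m as a parameter is a choice made here for illustration.

```python
def h(v, m):
    """Hash an UPPERCASE string v into [0..m-1] by reading it as a base-26 number."""
    total = 0
    for c in v:
        # 'A' maps to 0, 'B' to 1, ..., 'Z' to 25
        total = total * 26 + (ord(c) - ord('A'))
    return total % m


print(h("AB", 13))  # 1
print(h("BA", 13))  # 0
```

Because each character is weighted by its position, "AB" and "BA" hash differently, which a plain sum of character values would not achieve. Python integers do not overflow, so the "short string" assumption matters more in languages with fixed-width Integer types.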


There are two major ideas for collision resolution: Open Addressing versus Closed Addressing.


In Open Addressing, all hashed keys are located in a single array. The hash code of a key gives its base address. Collision is resolved by checking/probing multiple alternative addresses (hence the name open) in the table based on a certain rule.


In Closed Addressing, the Hash Table looks like an Adjacency List (a graph data structure). The hash code of a key gives its fixed/closed base address. Collision is resolved by appending the collided keys inside a (Doubly) Linked List identified by the base address.


There are three Open Addressing collision resolution techniques discussed in this visualization: Linear Probing (LP), Quadratic Probing (QP), and Double Hashing (DH).


To switch between the three modes, please click on the respective header.


Let:
M = HT.length = the current hash table size,
base = (key%HT.length),
step = the current probing step,
secondary = smaller_prime - key%smaller_prime (to avoid zero — elaborated soon)

We will soon see that the probing sequences of the three modes are:
Linear Probing: i=(base+step*1) % M,
Quadratic Probing: i=(base+step*step) % M, and
Double Hashing: i=(base+step*secondary) % M.
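
The three formulas above can be compared side by side with a short sketch. The function name probe_sequence and the mode strings are hypothetical; step 0 yields the base address itself, and smaller_prime is only needed for Double Hashing.

```python
def probe_sequence(key, m, mode, smaller_prime=None, steps=None):
    """Return the first `steps` probed indices for `key` in a table of size m."""
    base = key % m
    if steps is None:
        steps = m
    seq = []
    for step in range(steps):
        if mode == "LP":
            offset = step * 1
        elif mode == "QP":
            offset = step * step
        elif mode == "DH":
            secondary = smaller_prime - key % smaller_prime  # always >= 1
            offset = step * secondary
        else:
            raise ValueError("mode must be 'LP', 'QP', or 'DH'")
        seq.append((base + offset) % m)
    return seq


print(probe_sequence(38, 7, "LP", steps=5))                   # [3, 4, 5, 6, 0]
print(probe_sequence(38, 7, "QP", steps=5))                   # [3, 4, 0, 5, 5]
print(probe_sequence(38, 7, "DH", smaller_prime=5, steps=5))  # [3, 5, 0, 2, 4]
```

The example values (key 38, M = 7, smaller prime 5) are chosen only for illustration.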


Separate Chaining collision resolution technique is simple. We use M copies of auxiliary data structures, usually Doubly Linked Lists. If two keys a and b both have the same hash value i, both will be appended to the (front/back) of Doubly Linked List i (in this visualization, we append to the back in O(1) with help of tail pointer).


If we use Separate Chaining, the load factor α = N/M describes the average length of the M lists, and it determines the performance of Search(v) as we may have to explore α elements on average. As Remove(v) also requires Search(v), its performance will be similar to that of Search(v). Insert(v) is clearly O(1).


If we can bound α to be a small constant, all Search(v), Insert(v), and Remove(v) operations using Separate Chaining will be O(1).
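
A minimal sketch of Separate Chaining is shown below. It uses a Python list per slot instead of a Doubly Linked List, and it assumes the caller never inserts a key that is already present; both are simplifications made here, and the class name is hypothetical.

```python
class SeparateChainingTable:
    def __init__(self, m):
        self.m = m
        self.chains = [[] for _ in range(m)]  # one chain per base address
        self.n = 0                            # number of stored keys

    def insert(self, v):
        # Append to the back of chain h(v); assumes v is not already stored
        self.chains[v % self.m].append(v)
        self.n += 1

    def search(self, v):
        # Only the chain at the base address needs to be scanned
        return v in self.chains[v % self.m]

    def remove(self, v):
        chain = self.chains[v % self.m]
        if v in chain:
            chain.remove(v)
            self.n -= 1
            return True
        return False

    def load_factor(self):
        return self.n / self.m


t = SeparateChainingTable(7)
for key in [9, 16, 23, 30, 37, 44]:   # all are 2 (modulo 7)
    t.insert(key)
print(t.chains[2])      # [9, 16, 23, 30, 37, 44]
print(t.search(23))     # True
print(t.load_factor())  # 6/7, the average chain length
```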


View the visualisation of Hash Table above.


In this visualization, we prevent insertion of duplicate keys.


Due to limited screen space, we limit the maximum Hash Table size to be M = 19.


The Hash Table is visualized horizontally like an array where index 0 is placed leftmost and index M-1 is placed rightmost but the details are different when we are visualizing Open Addressing versus Separate Chaining collision resolution techniques.


There are three Open Addressing collision resolution techniques discussed in this visualization: Linear Probing (LP), Quadratic Probing (QP), and Double Hashing (DH).


For all three techniques, each Hash Table cell is displayed as a vertex with a cell value in [0..99] displayed as the vertex label. Without loss of generality, we do not show any satellite data in this visualization as we concentrate only on the arrangement of the keys. We reserve the value -1 to indicate an 'EMPTY cell' (visualized as a blank vertex) and -2 to indicate a 'DELETED cell' (visualized as a vertex with the abbreviated label "DEL"). The cell indices are shown as red labels below each vertex.


For Separate Chaining (SC) collision resolution technique, the first row contains the M "H" (Head) pointers of M Doubly Linked Lists.


Then, each Doubly Linked List i contains all keys that are hashed into i in arbitrary order. Again, we do not store any satellite data in this visualization.


In Linear Probing collision resolution technique, we scan forwards one index at a time for the next empty/deleted slot (wrapping around when we have reached the last slot) whenever there is a collision.


For example, let's assume we start with an empty Hash Table HT with table size M = HT.length = 7 as shown above that uses index 0 to M-1 = 7-1 = 6. Notice that 7 is a prime number. The (primary) hash function is simple, h(v) = v%M.


This walk-through will show you the steps taken by Insert(v), Search(v), and Remove(v) operations when using linear probing as collision resolution technique.


Now click Insert([18,14,21]) — three individual insertions in one command.


Recap (to be shown after you click the button above).


Formally, we describe Linear Probing index i as i = (base+step*1) % M where base is the (primary) hash value of key v, i.e. h(v) and step is the linear probing step starting from 1.


Now click Insert([1,35]) (on top of the first three values inserted in the previous slide).


Recap (to be shown after you click the button above)


Now we illustrate the Search(v) operation while using Linear Probing as the collision resolution technique. The steps taken are very similar to those of the Insert(v) operation, i.e. we start from the (primary) hash value and check if we have found v; otherwise we move one index forward at a time (wrapping around if necessary) and recheck whether we have found v. We stop when we encounter an empty cell, which implies that v is not in the Hash Table at all (as an earlier Insert(v) operation would have placed v there otherwise).


Now click Search(35) — you should see probing sequence [0,1,2,3 (key 35 found)].


Now click Search(8) — [1,2,3,4, 5 (empty cell, so key 8 is not found in the hash table)].


Now let's discuss Remove(v) operation.


If we just set HT[i] = EMPTY cell straightaway where i is the index that contains v (after linear probing if necessary), do you realize that we will cause a problem? Why?


Hint: Review the past three slides on how Insert(v) and Search(v) behave.


Now let's see the complete Remove(v). If we find v at index i (after linear probing if necessary), we have to set HT[i] = DELETED (abbreviated as DEL in this visualization), where DEL is a special symbol (generally you should only use a symbol that is not used in your application) indicating that the cell can be bypassed if necessary by a future Search(v), but can be overwritten by a future Insert(w). This strategy is called Lazy Deletion.


Now click Remove(21): [0,1 (key 21 found and we set HT[1] = DEL)].


Afterwards, please continue the discussion in the next slide.


Now click Search(35) — [0,1 (bypassing that DELETED cell), 2,3 (found key 35)].


Imagine what would have happened if we had wrongly set HT[1] = EMPTY.


Now click Insert(28): you should see the probing sequence [0,1 (found a cell with the DEL symbol)]. That cell can actually be overwritten with a new value without affecting the correctness of future Search(v) operations. Therefore, we put 28 in index 1.
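
The whole Linear Probing walk-through (Insert, Search, and Remove with Lazy Deletion) can be replayed with the sketch below. The class and method names are hypothetical, and insert assumes the key is not already in the table (the visualization itself prevents duplicates).

```python
EMPTY, DEL = -1, -2   # same reserved values as in the visualization


class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.ht = [EMPTY] * m

    def _probe(self, v):
        base = v % self.m
        for step in range(self.m):
            yield (base + step) % self.m

    def insert(self, v):
        """Place v in the first EMPTY or DEL cell; return its index."""
        for i in self._probe(v):
            if self.ht[i] in (EMPTY, DEL):
                self.ht[i] = v
                return i
        raise OverflowError("hash table is full")

    def search(self, v):
        """Return (list of probed indices, found flag)."""
        probed = []
        for i in self._probe(v):
            probed.append(i)
            if self.ht[i] == v:
                return probed, True
            if self.ht[i] == EMPTY:   # DEL cells are bypassed, EMPTY stops the scan
                break
        return probed, False

    def remove(self, v):
        probed, found = self.search(v)
        if found:
            self.ht[probed[-1]] = DEL  # Lazy Deletion
        return found


t = LinearProbingTable(7)
for key in [18, 14, 21, 1, 35]:
    t.insert(key)
print(t.search(35))  # ([0, 1, 2, 3], True)
print(t.search(8))   # ([1, 2, 3, 4, 5], False)
t.remove(21)
print(t.search(35))  # ([0, 1, 2, 3], True), bypassing the DEL cell at index 1
print(t.insert(28))  # 1
```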


Although we can resolve collision with Linear Probing, it is not the most effective way.


We define a cluster to be a collection of consecutive occupied slots. A cluster that covers the base address of a key is called the primary cluster of the key.


Now notice that Linear Probing can create large primary clusters that will increase the running time of Search(v)/Insert(v)/Remove(v) operations beyond the advertised O(1).


See an example above with M = 11 and we have inserted keys that are all 6 (modulo 11). Now see how 'slow' Insert(94) is.


The probe sequence of linear probing can be formally described as follows:

 h(v) // base address
(h(v) + 1*1) % M // 1st probing step if there is a collision
(h(v) + 2*1) % M // 2nd probing step if there is still a collision
(h(v) + 3*1) % M // 3rd probing step if there is still a collision
...
(h(v) + k*1) % M // k-th probing step, etc...

During Insert(v), if there is a collision but an empty (or DEL) slot remains in the Hash Table, we are sure to find it after at most M linear probing steps. When we do, the collision is resolved, but the primary cluster of the key v is expanded as a result, and future Hash Table operations will get slower too.


The primary cluster size can be very big due to the annexation of neighbouring clusters.


To reduce primary clustering, we can modify the probe sequence to:

 h(v) // base address
(h(v) + 1*1) % M // 1st probing step if there is a collision
(h(v) + 2*2) % M // 2nd probing step if there is still a collision
(h(v) + 3*3) % M // 3rd probing step if there is still a collision
...
(h(v) + k*k) % M // k-th probing step, etc...

That's it, the probe jumps quadratically, wrapping around the Hash Table as necessary.


Common mistake (this is a different kind of Quadratic Probing): Doing h(v), (h(v)+1) % M, (h(v)+1+4) % M, (h(v)+1+4+9) % M, ...


Assume that we have called Insert(18) and Insert(10) into an initially empty Hash Table of size M = HT.length = 7. As 18%7 = 4 and 10%7 = 3, 18 and 10 do not collide and they reside in index 4 and 3 respectively, as shown above.


Now, let's click Insert(38).


Recap (to be shown after you click the button above).


Remove(x) and Search(y) operations are defined similarly. Just that this time we use Quadratic Probing instead of Linear Probing.


For example, assume that we have called Remove(18) after the previous slide and we mark HT[4] = DEL. If we then call Search(38), we will use the same Quadratic Probing sequence as in the previous slide, but passing through HT[4], which is marked as DELETED.


At a glance, Quadratic Probing, which jumps +1, +4, +9, +16, ... quadratically, seems able to solve the primary clustering issue that we had with Linear Probing earlier. But is it the perfect collision resolution technique?


Try Insert([12,17]).


Discuss: What happened?
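
The behaviour can be reproduced with a short simulation. The sketch assumes the table contains 18, 10, and 38 from the earlier slides (with no removal in between) before 12 and 17 are inserted; the helper name qp_insert and the cap max_steps are hypothetical.

```python
def qp_insert(ht, key, max_steps=1000):
    """Insert key using Quadratic Probing; return its index, or None if we give up."""
    m = len(ht)
    base = key % m
    for step in range(max_steps):
        i = (base + step * step) % m
        if ht[i] is None:
            ht[i] = key
            return i
    return None  # no empty slot was reached within max_steps probes


ht = [None] * 7
for key in [18, 10, 38, 12]:
    qp_insert(ht, key)
result = qp_insert(ht, 17)
print(ht)      # [38, None, None, 10, 18, 12, None]
print(result)  # None
```

Key 17 has base address 3, and the squares modulo 7 only take the values 0, 1, 2, and 4, so the probe only ever visits indices 3, 4, 5, and 0. All four are occupied, so the insertion never succeeds even though indices 1, 2, and 6 are empty.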


If α < 0.5 and M is a prime, then we can always find an empty slot using Quadratic Probing. Recall: α is the load factor and M is the Hash Table size (HT.length).


Discuss: Why is that so?


In Quadratic Probing, clusters are formed along the path of probing, instead of around the base address like in Linear Probing. These clusters are called Secondary Clusters.


Secondary clusters are formed as a result of using the same pattern in probing by all keys. Notice that if two distinct keys have the same base address, their Quadratic Probing sequences are going to be the same.


Secondary clustering in Quadratic Probing is not as bad as primary clustering in Linear Probing as a good hash function should theoretically disperse the keys into different base addresses ∈ [0..M-1] in the first place.


To reduce primary and secondary clustering, we can modify the probe sequence to:

 h(v) // base address
(h(v) + 1*h2(v)) % M // 1st probing step if there is a collision
(h(v) + 2*h2(v)) % M // 2nd probing step if there is still a collision
(h(v) + 3*h2(v)) % M // 3rd probing step if there is still a collision
...
(h(v) + k*h2(v)) % M // k-th probing step, etc...

That's it, the probe jumps according to the value of the second hash function h2(v), wrapping around the Hash Table as necessary.


If h2(v) = 1, then Double Hashing works exactly the same as Linear Probing.
So we generally want h2(v) > 1 to avoid primary clustering.


If h2(v) = 0, then Double Hashing does not work for an obvious reason: any probing step multiplied by 0 remains 0, i.e. we stay at the base address forever during a collision.
We need to avoid this.


Usually (for Integer keys), h2(v) = M' - v%M' where M' is a smaller prime than M.
This makes h2(v) ∈ [1..M'], which is diverse enough to avoid secondary clustering.


The usage of the secondary hash function makes it theoretically hard to have either primary or secondary clustering issue.


Discussion: In a real-life class, we may discuss why h2(v) is designed that way.


Click Insert([35,42]) to insert 35 and then 42 to the current Hash Table above.


Recap (to be shown after you click the button above).


Remove(x) and Search(y) operations are defined similarly. Just that this time we use Double Hashing instead of Linear Probing or Quadratic Probing.


For example, assume that we have called Remove(17) after the previous slide and we mark HT[3] = DEL. If we then call Search(35), we will use the same Double Hashing sequence as in the previous slide, but passing through HT[3], which is marked as DELETED.


In summary, a good Open Addressing collision resolution technique needs to:

  1. Always find an empty slot if it exists,
  2. Minimize clustering,
  3. Give different probe sequences when 2 different keys collide,
  4. Be fast, i.e. O(1).

Try Insert([9,16,23,30,37,44]) to see how Insert(v) operation works if we use Separate Chaining as collision resolution technique. Note that all Integers {9,16,23,30,37,44} are 2 (modulo 7) so all of them will be appended into the (back of) Doubly Linked List 2 and each insertion is clearly O(1).


Due to the screen limitation, we limit the length of each Doubly Linked List to be 6.


Try Search(35) to see that Search(v) can be made to run in O(1+α).


Try Remove(7) to see that Remove(v) can be made to run in O(1+α) too.

Discussion: After all these explanations, which of the two collision resolution techniques is the better one?

You have reached the end of the basics of the Hash Table data structure, and we encourage you to explore further in the Exploration Mode.


However, we still have a few more interesting Hash Table challenges for you that are outlined in this section.


The performance of a Hash Table degrades as the load factor α gets higher. For the (standard) Quadratic Probing collision resolution technique, insertions might fail when the load factor α > 0.5.


If that happens, we can rehash. We build another Hash Table about twice as big with a new hash function. We go through all keys in the original Hash Table, recompute the new hash values, and re-insert the keys (with their satellite-data) into the new, bigger Hash Table, before finally deleting the older, smaller Hash Table.


A rule of thumb is to rehash when α ≥ 0.5 if using Open Addressing and when α > a small constant close to 1.0 if using Separate Chaining.


If we know the maximum number of total possible keys, we can always influence α to be a low number.
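
The rehashing procedure described above can be sketched for a Separate Chaining table stored as a list of Python lists. Picking the next prime at or above 2M as the new size is an assumption made here, in line with the earlier advice to use a prime M; the helper names are hypothetical.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


def rehash(chains):
    """Build a table about twice as big and re-insert every key with h(v) = v % new_m."""
    new_m = next_prime(2 * len(chains))
    new_chains = [[] for _ in range(new_m)]
    for chain in chains:
        for v in chain:
            new_chains[v % new_m].append(v)
    return new_chains


old = [[] for _ in range(7)]
old[2] = [9, 16, 23, 30, 37, 44]         # six keys colliding at index 2
new = rehash(old)
print(len(new))                          # 17
print(max(len(chain) for chain in new))  # 1
```

In this example, the six keys that shared one chain in the table of size 7 end up in six different chains after rehashing into size 17.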


However, if you ever need to implement a Hash Table in C++ or Java and your keys are either Integers or Strings, you can use the built-in C++ STL or Java API. They already have good built-in implementations of default hash functions for Integers and Strings.


See C++ STL unordered_map, unordered_set or Java HashMap, HashSet.


Notice that multimap/multiset implementations also exist (duplicate keys are allowed).


Hash Table is an extremely good data structure to implement Table ADT if the (Integer or String) keys only need to be mapped to satellite-data, with O(1) performance for Search(v), Insert(v), and Remove(v) operations if the hash table is set up properly.


However, if we need to do much more with the keys, we may need to use an alternative data structure.


Try to solve two basic programming problems that somewhat require the usage of a Hash Table: Kattis - cd (the inputs are already sorted, so an alternative, non Hash Table solution exists) and Kattis - whatdoesthefoxsay.



Note that if you notice any bug in this visualization or if you want to request a new visualization feature, do not hesitate to drop an email to the project leader, Dr Steven Halim, at stevenhalim at gmail dot com.


About

VisuAlgo was conceptualised in 2011 by Dr Steven Halim as a tool to help his students better understand data structures and algorithms, by allowing them to learn the basics on their own and at their own pace.

VisuAlgo contains many advanced algorithms that are discussed in Dr Steven Halim's book ('Competitive Programming', co-authored with his brother Dr Felix Halim) and beyond. Today, visualizations/animations of some of these advanced algorithms can only be found in VisuAlgo.

Though specifically designed for National University of Singapore (NUS) students taking various data structure and algorithm classes (e.g. CS1010, CS1020, CS2010, CS2020, and CS3230), as advocates of online learning, we hope that curious minds around the world will find these visualisations useful too.

VisuAlgo is not designed to work well on small touch screens (e.g. smartphones) from the outset due to the need to cater for many complex algorithm visualizations that require lots of pixels and click-and-drag gestures for interaction. The minimum screen resolution for a respectable user experience is 1024x768 and only the landing page is relatively mobile-friendly.

VisuAlgo is an ongoing project and more complex visualisations are still being developed.

The most exciting development is the automated question generator and verifier (the online quiz system) that allows students to test their knowledge of basic data structures and algorithms. The questions are randomly generated via some rules, and students' answers are instantly and automatically graded upon submission to our grading server. This online quiz system, once it is adopted by more CS instructors worldwide, should technically eliminate manually set basic data structure and algorithm questions from typical Computer Science examinations in many universities. By setting a small (but non-zero) weightage on passing the online quiz, a CS instructor can (significantly) increase his/her students' mastery of these basic questions, as the students have a virtually infinite number of training questions that can be verified instantly before they take the online quiz. The training mode currently contains questions for 12 visualization modules. We will soon add the remaining 8 visualization modules so that every visualization module in VisuAlgo has an online quiz component.

Another active branch of development is the internationalization sub-project of VisuAlgo. We want to prepare a database of CS terminology for all English text that ever appears in the VisuAlgo system. This is a big task and requires crowdsourcing. Once the system is ready, we will invite VisuAlgo visitors to contribute, especially those who are not native English speakers. Currently, we have also written public notes about VisuAlgo in various languages: zh, id, kr, vn, th.

Team

Project Leader & Advisor (Jul 2011-present)
Dr Steven Halim, Senior Lecturer, School of Computing (SoC), National University of Singapore (NUS)
Dr Felix Halim, Software Engineer, Google (Mountain View)

Undergraduate Student Researchers 1 (Jul 2011-Apr 2012)
Koh Zi Chun, Victor Loh Bo Huai

Final Year Project/UROP students 1 (Jul 2012-Dec 2013)
Phan Thi Quynh Trang, Peter Phandi, Albert Millardo Tjindradinata, Nguyen Hoang Duy

Final Year Project/UROP students 2 (Jun 2013-Apr 2014)
Rose Marie Tan Zhao Yun, Ivan Reinaldo

Undergraduate Student Researchers 2 (May 2014-Jul 2014)
Jonathan Irvin Gunawan, Nathan Azaria, Ian Leow Tze Wei, Nguyen Viet Dung, Nguyen Khac Tung, Steven Kester Yuwono, Cao Shengze, Mohan Jishnu

Final Year Project/UROP students 3 (Jun 2014-Apr 2015)
Erin Teo Yi Ling, Wang Zi

Final Year Project/UROP students 4 (Jun 2016-Dec 2017)
Truong Ngoc Khanh, John Kevin Tjahjadi, Gabriella Michelle, Muhammad Rais Fathin Mudzakir

The list of translators who have contributed ≥100 translations can be found on the statistics page.

Acknowledgements
This project is made possible by the generous Teaching Enhancement Grant from the NUS Centre for Development of Teaching and Learning (CDTL).

Terms of use

VisuAlgo is free of charge for the Computer Science community on earth. If you like VisuAlgo, the only payment that we ask of you is to tell other Computer Science students/instructors that you know about VisuAlgo =) via Facebook, Twitter, course webpage, blog review, email, etc.

If you are a data structure and algorithm student/instructor, you are allowed to use this website directly for your classes. If you take screenshots (or videos) from this website, you can use them elsewhere as long as you cite the URL of this website (http://visualgo.net) and/or the list of publications below as reference. However, you are NOT allowed to download VisuAlgo (client-side) files and host them on your own website, as that is plagiarism. As of now, we do NOT allow other people to fork this project and create variants of VisuAlgo. Using an offline copy of (client-side) VisuAlgo for your personal use is fine.

Note that VisuAlgo's online quiz component by nature has a heavy server-side component, and there is no easy way to save the server-side scripts and databases locally. Currently, the general public can only use the 'training mode' to access this online quiz system. The 'test mode' is currently a more controlled environment for using these randomly generated questions and automatic verification in real examinations at NUS. Other interested CS instructors should contact Steven if they want to try the 'test mode'.

List of Publications

This work has been presented briefly at the CLI Workshop at the ACM ICPC World Finals 2012 (Warsaw, Poland) and at the IOI Conference at IOI 2012 (Sirmione-Montichiari, Italy). You can click this link to read our 2012 paper about this system (it was not yet called VisuAlgo back in 2012).

This work is done mostly by my past students. The most recent final reports are here: Erin, Wang Zi, Rose, Ivan.

Bug Reports or Request for New Features

VisuAlgo is not a finished project. Dr Steven Halim is still actively improving VisuAlgo. If you are using VisuAlgo and spot a bug in any of our visualization pages or the online quiz tool, or if you want to request new features, please contact Dr Steven Halim. His email address is the concatenation of his name, followed by at gmail dot com.