Hashing and Hash Tables, Trees & Binary Tree
Hashing
Suppose we want to design a system for storing
employee records keyed using phone numbers. And we want following queries to be
performed efficiently:
- Insert a phone number and corresponding information.
- Search a phone number and fetch
the information.
- Delete a phone number and related information.
We can think of using the following data
structures to maintain information about different phone numbers.
- Array of phone numbers and
records.
- Linked List of phone numbers and
records.
- Balanced binary search tree with
phone numbers as keys.
- Direct Access Table.
Hashing is an improvement over Direct Access
Table. The idea is to use hash function that converts a given phone number or
any other key to a smaller number and uses the small number as index in a table
called hash table.
Hash Function: A function that converts a
given big phone number to a small practical integer value. The mapped integer
value is used as an index in hash table. In simple terms, a hash function maps
a big number or string to a small integer that can be used as index in hash
table.
A good hash function should have following
properties
1) Efficiently computable.
2) Should uniformly distribute the keys (Each
table position equally likely for each key)
For example for phone numbers a bad hash
function is to take first three digits. A better function is consider last
three digits. Please note that this may not be the best hash function. There
may be better ways.
There are many ways to
hash a string into a key. The following are some of the important methods for
constructing hash functions.
Folding
A folding hash code is produced by dividing
the input into n sections of m bits, where 2^m is the table size, and using a
parity-preserving bitwise operation like ADD or XOR, to combine the sections.
The final operation is a mask or shift to trim off any excess bits at the high
or low end. For example, for a table size of 15 bits and key value of
0x0123456789ABCDEF, there are 5 sections 0x4DEF, 0x1357, 0x159E, 0x091A and
0x8. Adding, we obtain 0x7AA4, a 15-bit value.
Mid-squares
A mid-squares hash code is produced by
squaring the input and extracting an appropriate number of middle digits or
bits. For example, if the input is 123,456,789 and the hash table size 10,000,
squaring the key produces 1.524157875019e16, so the hash code is taken as the
middle 4 digits of the 17-digit number (ignoring the high digit) 8750. The
mid-squares method produces a reasonable hash code if there are not a lot of
leading or trailing zeros in the key. This is a variant of multiplicative
hashing, but not as good, because an arbitrary key is not a good multiplier.
Division hashing
A standard technique is to use a modulo
function on the key, by selecting a divisor {\displaystyle M}M which is a prime
number close to the table size, so {\displaystyle h(K)=K\mod M}{\displaystyle
h(K)=K\mod M}. The table size is usually a power of 2. This gives a
distribution from {\displaystyle \{0,M-1\}}{\displaystyle \{0,M-1\}}. This
gives good results over a large number of key sets. A significant drawback of
division hashing is that division is microprogrammed on most modern
architectures including x86, and can be 10 times slower than multiply. A second
drawback is that it won't break up clustered keys. For example, the keys
123000, 456000, 789000, etc. modulo 1000 all map to the same address. This
technique works well in practice because many key sets are sufficiently random
already, and the probability that a key set will be cyclical by a large prime
number is small.
Hash Table: An array that stores pointers to
records corresponding to a given phone number. An entry in hash table is NIL if
no existing phone number has hash function value equal to the index for the
entry.
Collision Handling: Since a hash function gets
us a small number for a big key, there is possibility that two keys result in
same value. The situation where a newly inserted key maps to an already
occupied slot in hash table is called collision and must be handled using some
collision handling technique. Following are the ways to handle collisions:
- Chaining: The idea is to make each cell of hash table point to a linked list of records that have same hash function value. Chaining is simple, but requires additional memory outside the table.
- Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we one by one examine table slots until the desired element is found or it is clear that the element is not in the table.
Binary Tree
A
tree whose elements have at most 2 children is called a binary tree. Since each
element in a binary tree can have only 2 children, we typically name them the
left and right child. Trees: Unlike Arrays,
Linked Lists, Stack and queues, which are linear data structures, trees are
hierarchical data structures.
Tree Vocabulary: The topmost node is called root
of the tree. The elements that are directly under an element are called its
children. The element directly above something is called its parent. For
example, ‘a’ is a child of ‘f’, and ‘f’ is the parent of ‘a’. Finally, elements
with no children are called leaves.
Why using trees?
1. One reason to use trees might be because you want to store information that naturally forms a hierarchy. For example, the file system on a computer:
file system ----------- / <-- root / \ ... home / \ ugrad course / / | \ ... cs101 cs112 cs113
2. Trees (with some ordering e.g., BST) provide moderate access/search (quicker than Linked List and slower than arrays).
3. Trees provide moderate insertion/deletion (quicker than Arrays and slower than Unordered Linked Lists).
4. Like Linked Lists and unlike Arrays, Trees don’t have an upper limit on number of nodes as nodes are linked using pointers.
Main applications of
trees include:
- Manipulate hierarchical data.
- Make information easy to search (see tree traversal).
- Manipulate sorted lists of data.
- As a workflow for compositing digital images for visual effects.
- Router algorithms
- Form of a multi-stage decision-making (see business chess).
Binary Tree: A tree
whose elements have at most 2 children is called a binary tree. Since each
element in a binary tree can have only 2 children, we typically name them the
left and right child. Binary Tree
Representation in C: A tree is represented by a pointer to the topmost node in
tree. If the tree is empty, then value of root is NULL.
A Tree node contains
following parts.
- Data
- Pointer to left child
- Pointer to right child
Properties of Binary Trees
1) The maximum number of nodes at level ‘l’ of a binary tree is 2l-1.
Here level is number of nodes on path from root to the node (including root and node). Level of root is 1. This can be proved by induction.
For root, l = 1, number of nodes = 21-1 = 1
Assume that maximum number of nodes on level l is 2l-1
Since in Binary tree every node has at most 2 children, next level would have twice nodes, i.e. 2 * 2l-1
Here level is number of nodes on path from root to the node (including root and node). Level of root is 1. This can be proved by induction.
For root, l = 1, number of nodes = 21-1 = 1
Assume that maximum number of nodes on level l is 2l-1
Since in Binary tree every node has at most 2 children, next level would have twice nodes, i.e. 2 * 2l-1
2) Maximum number of nodes in a binary tree of height ‘h’ is 2h – 1.
Here height of a tree is maximum number of nodes on root to leaf path. Height of a tree with single node is considered as 1. This result can be derived from point 2 above. A tree has maximum nodes if all levels have maximum nodes. So maximum number of nodes in a binary tree of height h is 1 + 2 + 4 + .. + 2h-1. This is a simple geometric series with h terms and sum of this series is 2h – 1.
In some books, height of the root is considered as 0. In this convention, the above formula becomes 2h+1 – 1
Here height of a tree is maximum number of nodes on root to leaf path. Height of a tree with single node is considered as 1. This result can be derived from point 2 above. A tree has maximum nodes if all levels have maximum nodes. So maximum number of nodes in a binary tree of height h is 1 + 2 + 4 + .. + 2h-1. This is a simple geometric series with h terms and sum of this series is 2h – 1.
In some books, height of the root is considered as 0. In this convention, the above formula becomes 2h+1 – 1
3) In a Binary Tree with N nodes, minimum possible height or minimum number of levels is ? Log2(N+1) ?
This can be directly derived from point 2 above. If we consider the convention where height of a leaf node is considered as 0, then above formula for minimum possible height becomes ? Log2(N+1) ? – 1
This can be directly derived from point 2 above. If we consider the convention where height of a leaf node is considered as 0, then above formula for minimum possible height becomes ? Log2(N+1) ? – 1
4) A Binary Tree with L leaves has at least ? Log2L ? + 1 levels
A Binary tree has maximum number of leaves (and minimum number of levels) when all levels are fully filled. Let all leaves be at level l, then below is true for number of leaves L.
A Binary tree has maximum number of leaves (and minimum number of levels) when all levels are fully filled. Let all leaves be at level l, then below is true for number of leaves L.
L <= 2l-1 [From Point 1] l = ? Log2L ? + 1 where l is the minimum number of levels.
5) In Binary tree where every node has 0 or 2 children, number of leaf nodes is always one more than nodes with two children.
L = T + 1 Where L = Number of leaf nodes T = Number of internal nodes with two children
Common types of Binary Trees
Full Binary Tree A Binary Tree is full if every node has 0 or 2 children. Following are examples of a full binary tree. We can also say a full binary tree is a binary tree in which all nodes except leaves have two children.
18 / \ 15 30 / \ / \ 40 50 100 40 18 / \ 15 20 / \ 40 50 / \ 30 50 18 / \ 40 30 / \ 100 40
In a Full Binary, number of leaf nodes is number of internal nodes plus 1
L = I + 1
Where L = Number of leaf nodes, I = Number of internal nodes
L = I + 1
Where L = Number of leaf nodes, I = Number of internal nodes
Complete Binary Tree: A Binary Tree is complete Binary Tree if all levels are completely filled except possibly the last level and the last level has all keys as left as possible
Following are examples of Complete Binary Trees
18 / \ 15 30 / \ / \ 40 50 100 40 18 / \ 15 30 / \ / \ 40 50 100 40 / \ / 8 7 9
Perfect Binary Tree A Binary tree is Perfect Binary Tree in which all internal nodes have two children and all leaves are at the same level.
Following are examples of Perfect Binary Trees.
Following are examples of Perfect Binary Trees.
18 / \ 15 30 / \ / \ 40 50 100 40 18 / \ 15 30
A Perfect Binary Tree of height h (where height is the number of nodes on the path from the root to leaf) has 2h – 1 node. Example of a Perfect binary tree is ancestors in the family. Keep a person at root, parents as children, parents of parents as their children.
Balanced Binary Tree
A binary tree is balanced if the height of the tree is O(Log n) where n is the number of nodes. For Example, AVL tree maintains O(Log n) height by making sure that the difference between heights of left and right subtrees is atmost 1. Red-Black trees maintain O(Log n) height by making sure that the number of Black nodes on every root to leaf paths are same and there are no adjacent red nodes. Balanced Binary Search trees are performance wise good as they provide O(log n) time for search, insert and delete.
A degenerate (or pathological) tree A Tree where every internal node has one child. Such trees are performance-wise same as linked list.
10 / 20 \ 30 \ 40
Source:
https://www.geeksforgeeks.org/binary-tree-set-3-types-of-binary-tree/https://en.wikipedia.org/wiki/Binary_tree#Types_of_binary_trees
Comments
Post a Comment