Data Structures
A data structure organizes related data in a computer so that it can be used efficiently.
Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, graphs are used to represent connections on social websites (such as Twitter and Facebook), trees are used to represent hierarchical data such as files and folders in a filesystem, queues can be used to create task schedulers, compiler implementations usually use hash tables to look up identifiers, and binary trees are used in searching algorithms.
Usually, choosing the correct data structure is key to designing an efficient algorithm.
Some of the most common data structures are:
- Array
- Linked List
- Stack, Queue
- Graph, Tree
- Hash Table
Some data structures can be used to create others; for example, linked lists can be used to implement stacks and queues.
Linked List
The linked list data structure is often used to implement other data structures.
A linked list is a sequence of nodes where each node stores its own data and a pointer (address) to the location of the next node.
One node links to another forming what can be thought of as a linked chain:
Advantages:
Although a linked list is similar to an array, it is not restricted to a declared number of elements. Additionally, unlike an array which stores data contiguously in memory or on disk, a linked list can easily insert or remove elements without reallocation or reorganization of the entire structure because the data items need not be stored contiguously.
Linked List Drawbacks
1) Random access is not allowed. We must access nodes sequentially starting from the first one. Therefore, we cannot do a binary search on a linked list.
2) Extra memory space for a link is required for each element of the list.
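Here is a minimal Python sketch of a singly linked list (the class and method names are illustrative): each node stores its data and a pointer to the next node, and searching must walk the chain sequentially.

class Node:
    """A single node holding data and a pointer to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, data):
        """Insert at the front without moving any existing elements."""
        node = Node(data)
        node.next = self.head
        self.head = node

    def find(self, target):
        """Sequential search: we must walk node by node from the head."""
        current = self.head
        while current is not None:
            if current.data == target:
                return current
            current = current.next
        return None

# Usage: build the chain 3 -> 2 -> 1 and search it sequentially.
lst = LinkedList()
for value in (1, 2, 3):
    lst.push_front(value)
print(lst.find(2) is not None)  # True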
Stack
A stack is a simple data structure that adds and removes elements in a particular order.
Every time an element is added, it goes on the "top" of the stack. Only an element at the top of the stack can be removed, just like a stack of plates. This behavior is called LIFO (Last In, First Out).
Adding a new element onto the stack is called push.
Removing an element from the stack is called pop.
Applications
Stacks can be used to create undo-redo functionalities, parsing expressions (infix to postfix/prefix conversion), and much more.
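As a minimal sketch, a Python list already behaves as a stack if we only append to and pop from its end:

# append() pushes onto the top of the stack, pop() removes the top element.
stack = []
stack.append('a')   # push
stack.append('b')   # push
stack.append('c')   # push
print(stack.pop())  # 'c' -- the last element pushed is the first one popped (LIFO)
print(stack.pop())  # 'b'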
Queue
A queue is a simple data structure that allows elements to be inserted at one end, called the rear (also called the tail), and removed from the other end, called the front (also called the head).
Terminology
The process of adding new elements into the queue is called enqueue.
The process of removal of an element from the queue is called dequeue.
Applications
Queues are used whenever we need to manage objects in order starting with the first one in.
Scenarios include printing documents on a printer, call center systems answering callers in the order they were put on hold, and so on.
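A minimal Python sketch using collections.deque, which supports efficient insertion at the rear and removal at the front:

from collections import deque

queue = deque()
queue.append('job1')    # enqueue at the rear
queue.append('job2')
queue.append('job3')
print(queue.popleft())  # 'job1' -- the first element in is the first one out (FIFO)
print(queue.popleft())  # 'job2'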
Graph
Graphs are used to represent many real-life applications like networks, transportation paths of a city, and social networks such as Facebook where each person represents a single node. Graphs are useful in many applications of computer science as well.
A graph is a set of connected nodes, where each node is called a Vertex and a connection between two vertices is called an Edge. Formally, a graph is a data structure consisting of a finite set of vertices and a finite set of edges.
There are two types of graphs, undirected (A) and directed (B):
Undirected Graphs
Image A is an example of an undirected graph. Here, the edges do not have directions, meaning that if the vertex u is connected to v, then v is connected to u.
Undirected graphs are usually drawn with straight lines between the vertices (no arrows).
Example: On Facebook, if user A is a friend of user B, that automatically means that user B is a friend of user A.
Directed Graphs
Image B is an example of a directed graph. Here, each edge has a defined direction, which is usually represented with arrows. Having u connected to v does not necessarily mean that v is connected to u.
Example: On Twitter, if user A follows user B, then user B does not necessarily follow user A back.
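One common way to store a graph is an adjacency list: a mapping from each vertex to the vertices it is connected to. A minimal Python sketch of the Facebook and Twitter examples above (the vertex names are illustrative):

# Undirected graph: an edge u-v is stored in both directions (friendship).
undirected = {
    'A': ['B', 'C'],
    'B': ['A'],
    'C': ['A'],
}

# Directed graph: an edge u -> v is stored only on u (following).
directed = {
    'A': ['B'],   # A follows B
    'B': [],      # B does not follow A back
}

print('B' in undirected['A'] and 'A' in undirected['B'])  # True
print('B' in directed['A'], 'A' in directed['B'])         # True False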
Binary Tree
Unlike linear data structures (arrays, linked lists, stacks, and queues), binary trees are hierarchical.
They consist of interconnected nodes. The topmost node is called the root. For a given node, the nodes directly below it are called its children.
The tree is called binary because each node can have at most two children.
The node directly above another node is called its parent.
A tree whose root is a child of a node is called a subtree of that node.
Nodes with no children are called leaves.
The number of nodes on the longest path from the root to a leaf is called the height of the tree.
Check out the image below for a visual representation of these terms:
Types
There are various types of binary trees, each having characteristics suitable for its purpose.
Full Binary Tree
A binary tree is considered full if every node has 0 or 2 children.
Complete Binary Tree
A binary tree is considered complete if all levels are completely filled except possibly the last level and the last level has all nodes as far left as possible.
Perfect Binary Tree
A binary tree is considered perfect if all internal nodes have two children and all leaves are at the same level.
Advantages
- Trees reflect structural relationships in the data, and are used to represent hierarchies.
- Trees provide efficient insertion and searching.
- Trees are very flexible, allowing subtrees to be moved around with minimum effort.
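Here is a minimal Python sketch of a binary tree node and a height calculation using the definition above (the example tree is illustrative):

class TreeNode:
    """A binary tree node: at most two children (left and right)."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

#       1          <- root
#      / \
#     2   3        <- children of 1
#    / \
#   4   5          <- leaves
root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(height(root))  # 3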
Hash Table
A hash table is a data structure that maps keys to values.
Let's first understand the term hash.
Hashing
Hashing is a technique that is used to uniquely identify a specific object from a group of similar objects.
Some examples of how hashing is used in our lives include:
• University students are assigned a unique ID that is used to store and retrieve information about them.
• Libraries assign each book a unique number that is used to determine its shelf location.
In both these examples the students and books were hashed to a unique number.
Hash function
A hash function is any function that maps a data set of an arbitrary size to a data set of a fixed size that falls into the hash table. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.
Ideally, the hash function produces a unique hash for each element in the data set, to prevent collisions.
For Example:
Consider 4 books: "Harry Potter", "War and Peace", "Twilight", and "Jane Eyre". We have 13 positions on the shelf and want to assign each book a number that can be used to determine its exact position.
Our hash function will take the book title, sum its letter values (a is 1, b is 2, ..., z is 26), and take the result modulo 13 to get the book's position.
For our books we get:
Harry Potter: (8 + 1 + ... + 18) % 13 = 8
War and Peace: (23 + 1 + ... + 5) % 13 = 0
Twilight: (20 + 23 + ... + 20) % 13 = 4
Jane Eyre: (10 + 1 + ... + 5) % 13 = 5
Hash table
A hash table is a data structure that is used to store the hashes and their associated data.
The books in our example will be stored in the following way:
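A minimal Python sketch of the book example, assuming letter values a = 1 ... z = 26, ignoring spaces and case, and a shelf with 13 positions:

def book_hash(title, table_size=13):
    """Sum of letter values (a=1 ... z=26), ignoring non-letters, modulo the table size."""
    return sum(ord(c) - ord('a') + 1 for c in title.lower() if c.isalpha()) % table_size

shelf = [None] * 13  # the hash table: 13 positions on the shelf
for book in ["Harry Potter", "War and Peace", "Twilight", "Jane Eyre"]:
    shelf[book_hash(book)] = book

print(book_hash("Harry Potter"))             # 8
print(shelf[0], shelf[4], shelf[5], shelf[8])
# War and Peace  Twilight  Jane Eyre  Harry Potter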
Advantages
Each element is assigned a key (hash). The key can be used to find the desired element directly.
Disadvantages
Hash tables don't remember the order in which items are inserted. Thus, hash tables are not efficient at ordering or sorting data.
Matrix
A matrix is a data structure with rows and columns. It could be thought of as a two-dimensional array.
Matrices are expressed by their number of rows and columns usually denoted by the letters m and n respectively. For example, a matrix with 6 rows and 3 columns is denoted as a 6 by 3 matrix.
Usage
Matrices have many applications. Consider graphics, where a digital image is basically a matrix to begin with. Rows and columns of the matrix correspond to rows and columns of pixels, and the numerical entries correspond to the pixels' color values.
Other applications include decoding digital video, processing digital sound, and so on.
As another example, graphs are often represented using matrices, where the values show which vertices of the graph are connected.
Operations
Matrices support many operations, such as addition and multiplication, as well as some more specific ones, like transposition, row operations, and taking submatrices.
Matrix Operations
Addition and scalar multiplication are the most basic matrix operations.
These are heavily used in calculations that solve problems in algebra, physics, and many other fields.
Scalar Multiplication
As the name suggests, a matrix is multiplied by a scalar (a constant), which results in a matrix where each element of the original matrix has been multiplied by the scalar.
For example:
Addition
In order to add two matrices, they must have the same dimensions. For example, you cannot add a 3 by 4 matrix to a 4 by 2 matrix, but you can add a 3 by 4 matrix to another 3 by 4 matrix.
Check out the following example:
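A minimal Python sketch of both operations, using small illustrative matrices:

def scalar_multiply(matrix, k):
    """Multiply every element of the matrix by the scalar k."""
    return [[k * x for x in row] for row in matrix]

def add(a, b):
    """Element-wise addition; both matrices must have the same dimensions."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

print(scalar_multiply(A, 3))  # [[3, 6], [9, 12]]
print(add(A, B))              # [[6, 8], [10, 12]]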
Matrix Multiplication
Multiplication is one of the most important operations for matrices and has many applications in science and mathematics.
To understand how matrices are multiplied, we first need to understand the Dot Product concept.
Dot Product is an algebraic operation that takes two equal-length vectors (arrays) and returns a single number. Each element of the first vector is multiplied with its corresponding element from the second vector, then the results are added up to a final number, which is called the dot product of the two vectors.
In relation to matrices, the dot product is used to multiply two matrices. Two matrices can be multiplied only when the first one's number of columns is equal to the second one's number of rows.
The result of matrix multiplication is another matrix. Given an m by n matrix named A and n by k matrix named B, the resulting matrix is an m by k matrix.
Each element of the result is the dot product of the corresponding row of the first matrix and the corresponding column of the second matrix.
As an example, multiplying a 3x2 matrix by a 2x1 matrix results in a 3x1 matrix. Let's break down the calculation.
To get the first element of the result, we need to calculate the dot product of the first row of the first matrix, with the column of the second matrix: 1*1+2*0 = 1
For the second element: 0*1+1*0 = 0
The third element: 1*1 + 0*0 = 1
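A minimal Python sketch of this calculation; the matrices below are reconstructed from the dot products above:

def multiply(a, b):
    """Multiply an m-by-n matrix a with an n-by-k matrix b.
    Each result element is the dot product of a row of a and a column of b."""
    m, n, k = len(a), len(b), len(b[0])
    result = [[0] * k for _ in range(m)]
    for i in range(m):
        for j in range(k):
            result[i][j] = sum(a[i][p] * b[p][j] for p in range(n))
    return result

A = [[1, 2],
     [0, 1],
     [1, 0]]   # 3x2
B = [[1],
     [0]]      # 2x1

print(multiply(A, B))  # [[1], [0], [1]] -- a 3x1 matrix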
Transposition
Transposing a matrix swaps its rows and columns; the resulting matrix is called the transpose of the original matrix.
It is used in a wide range of operations, such as image processing, signal modulation, statistical programming, and so on.
The transpose of an m by n matrix is an n by m matrix. Transposition swaps the rows and the columns of a matrix.
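A minimal Python sketch, using an illustrative 2 by 3 matrix:

def transpose(matrix):
    """Swap rows and columns: the element at (i, j) becomes the element at (j, i)."""
    return [list(row) for row in zip(*matrix)]

M = [[1, 2, 3],
     [4, 5, 6]]      # 2x3
print(transpose(M))  # [[1, 4], [2, 5], [3, 6]] -- 3x2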
Gap Buffer
A gap buffer is a dynamic array that allows efficient insertion and deletion operations clustered near the same location. Gap buffers are especially common in text editors, where most changes to the text occur at or near the current location of the cursor.
Let's say we want to create a text editor, which should support the following operations:
- insert or delete a character at column x, row y,
- insert or delete the column x.
Using an array of strings may seem a reasonable solution. However, this approach has a very slow performance because if we want to insert or delete a character, we have to shift characters to free up or fill the required space.
A gap buffer divides the array into three sections:
- pre holds the characters before the cursor (the text editor's cursor)
- gap is the cursor itself, a sort of scratch area we can use to perform common text editor operations like inserting or deleting text
- post holds the characters after the cursor
These sections are not physical parts of the array; they're virtual: in a gap buffer implementation, we have to write code to keep track of which parts of the array belong to which section.
Here's how the gap buffer will look if we type "hllo" in the text editor:
Insert
To insert the missing 'e', we first move the gap to the position just after 'h',
then insert the letter 'e', which decreases the size of the gap.
Delete
In order to delete a character, we just move the start of the gap to the left. Note that we don’t have to change the value of the "deleted" character. Simply increasing the size of the gap is enough.
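A minimal Python sketch of a gap buffer supporting these operations (the class and method names are illustrative; a real implementation would also grow the buffer when the gap fills up):

class GapBuffer:
    """A character array split into pre | gap | post, tracked by two indices."""
    def __init__(self, size=10):
        self.buffer = [''] * size
        self.gap_start = 0      # end of the 'pre' section
        self.gap_end = size     # start of the 'post' section

    def insert(self, char):
        """Write into the gap; the gap shrinks from the left."""
        self.buffer[self.gap_start] = char
        self.gap_start += 1

    def delete(self):
        """Delete the character before the cursor by growing the gap leftwards."""
        if self.gap_start > 0:
            self.gap_start -= 1  # the old value is simply left inside the gap

    def move_cursor_left(self):
        """Shift one character from 'pre' to 'post' across the gap."""
        if self.gap_start > 0:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buffer[self.gap_end] = self.buffer[self.gap_start]

    def text(self):
        return ''.join(self.buffer[:self.gap_start] + self.buffer[self.gap_end:])

# Usage: type "hllo", move the cursor back to just after 'h', then insert 'e'.
buf = GapBuffer()
for c in "hllo":
    buf.insert(c)
for _ in range(3):
    buf.move_cursor_left()
buf.insert('e')
print(buf.text())  # hello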
Heap
A heap is a special case of a binary tree data structure in which each node's key is compared with its children's keys and the nodes are arranged accordingly. In a max heap, the key at each node is greater than or equal to the keys of its children; in a min heap, it is less than or equal to them.
The following steps describe the algorithm for inserting an element into a min heap:
1. Create a new node at the end of the heap.
2. Assign the new value to the node.
3. Compare the value of this new node with its parent. If the value of the parent is greater than the new value, swap them.
4. Repeat step 3 until the heap property holds.
Deleting the minimum element (the root) of a min heap is done as follows (both operations are sketched in code after these steps):
1. Remove the root node.
2. Move the last element of the heap to the root.
3. Compare the value of this moved node with its children. If it is greater than a child, swap it with the smaller of the two children.
4. Repeat step 3 until the heap property holds.
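A minimal Python sketch of a min heap stored in a list, following the steps above (the class name is illustrative):

class MinHeap:
    """Min heap stored in a list; each parent is <= its children."""
    def __init__(self):
        self.items = []

    def insert(self, value):
        # Steps 1-2: create a new node at the end and assign the value.
        self.items.append(value)
        i = len(self.items) - 1
        # Steps 3-4: swap with the parent while the parent is greater.
        while i > 0:
            parent = (i - 1) // 2
            if self.items[parent] > self.items[i]:
                self.items[parent], self.items[i] = self.items[i], self.items[parent]
                i = parent
            else:
                break

    def extract_min(self):
        # Steps 1-2: remove the root and move the last element to the root.
        root = self.items[0]
        last = self.items.pop()
        if self.items:
            self.items[0] = last
            i = 0
            # Steps 3-4: swap with the smaller child while the heap property is violated.
            while True:
                left, right, smallest = 2 * i + 1, 2 * i + 2, i
                if left < len(self.items) and self.items[left] < self.items[smallest]:
                    smallest = left
                if right < len(self.items) and self.items[right] < self.items[smallest]:
                    smallest = right
                if smallest == i:
                    break
                self.items[i], self.items[smallest] = self.items[smallest], self.items[i]
                i = smallest
        return root

heap = MinHeap()
for v in (5, 3, 8, 1):
    heap.insert(v)
print(heap.extract_min(), heap.extract_min())  # 1 3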
AVL Tree
An AVL tree (named after its inventors, Adelson-Velsky and Landis) is a self-balancing binary search tree. A binary tree is said to be balanced if, for every node in the tree, the difference between the heights of its left and right subtrees is -1, 0, or +1. In an AVL tree, every node maintains an extra piece of information known as the balance factor.
The balance factor of a node is the difference between the heights of its right and left subtrees:
BalanceFactor(N) = Height(RightSubtree(N)) - Height(LeftSubtree(N))
A node N:
- with BalanceFactor(N) < 0 is called "left-heavy"
- with BalanceFactor(N) > 0 is called "right-heavy"
- with BalanceFactor(N) = 0 is sometimes simply called "balanced"
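A minimal Python sketch that computes the balance factor using the convention above (height of the right subtree minus height of the left subtree); the node class and example tree are illustrative:

class AVLNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    """Height in nodes; an empty subtree has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    """BalanceFactor(N) = Height(right subtree) - Height(left subtree)."""
    return height(node.right) - height(node.left)

#     2
#    / \
#   1   3
#        \
#         4
root = AVLNode(2, AVLNode(1), AVLNode(3, right=AVLNode(4)))
print(balance_factor(root))        # 1 -> right-heavy, but still balanced
print(balance_factor(root.right))  # 1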
AVL Tree Rotations
In an AVL tree, after every insertion or deletion we need to check the balance factor of the affected nodes. If every node satisfies the balance factor condition, the operation is complete; otherwise, the tree must be rebalanced.
Rotation operations are used to rebalance the tree: nodes are moved to the left or to the right until every balance factor is again -1, 0, or +1.
There are four rotations, classified into two types: single rotations (LL and RR) and double rotations (LR and RL).
Single Left Rotation (LL Rotation)
In an LL rotation, every node moves one position to the left from its current position.
Single Right Rotation (RR Rotation)
In an RR rotation, every node moves one position to the right from its current position.
Left Right Rotation (LR Rotation)
The LR Rotation is a combination of single left rotation followed by a single right rotation. In LR Rotation, first, every node moves one position to the left then one position to the right from the current position.
Right Left Rotation (RL Rotation)
The RL Rotation is a combination of single right rotation followed by a single left rotation. In RL Rotation, first, every node moves one position to the right then one position to the left from the current position.
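A minimal Python sketch of the two single rotations (the node class and example tree are illustrative); rotate_left corresponds to the LL rotation above and rotate_right to the RR rotation:

class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_left(z):
    """Single left rotation: z's right child y becomes the new subtree root."""
    y = z.right
    z.right = y.left   # y's left subtree is re-attached as z's right subtree
    y.left = z
    return y           # the caller re-links y where z used to be

def rotate_right(z):
    """Single right rotation: the mirror image of rotate_left."""
    y = z.left
    z.left = y.right
    y.right = z
    return y

# Inserting 1, 2, 3 in order makes node 1 right-heavy; one left rotation rebalances it.
root = Node(1, right=Node(2, right=Node(3)))
root = rotate_left(root)
print(root.key, root.left.key, root.right.key)  # 2 1 3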
Decision Tree
A decision tree is a flowchart-like model in which each branch represents a possible outcome. It is commonly used to display an algorithm in computer science and machine learning.
The rules, cases, and scenarios in the tree vary with the use case. A common use is to build a conceptual model of a product or piece of software.
A decision tree typically consists of three types of nodes: decision nodes, chance nodes and end nodes. There are no limitations on the size and structure (balanced or not balanced) of decision trees.
The following image demonstrates a simple decision tree.
Red-Black Tree
A red-black tree is similar to the AVL tree in its self-balancing nature. Each node of the binary tree has an extra bit, and that bit is often interpreted as the color (red or black) of the node. These color bits are used to keep the tree balanced during insertions and deletions.
The recoloring of nodes is done efficiently and doesn't slow down operations on the tree.
Like other self-balancing search trees, it guarantees O(log n) time for search, insertion, and deletion. The reordering and recoloring are also done in O(log n) time.
The red-black tree is defined by the following rules:
1. Every node is either red or black.
2. The root of the tree is always black.
3. There can't be two adjacent red nodes (a red node can't have a red parent or a red child).
4. Every path from the root to a null leaf node contains the same number of black nodes.
5. All null leaves are black.
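To make the color bit concrete, here is a minimal Python sketch of a red-black tree node (the field and constant names are illustrative):

RED, BLACK = True, False   # the extra bit interpreted as the node's color

class RBNode:
    def __init__(self, key, color=RED, left=None, right=None, parent=None):
        self.key = key
        self.color = color     # rule 1: every node is either red or black
        self.left = left       # None stands for a black null leaf (rule 5)
        self.right = right
        self.parent = parent

# Rule 2: the root is always black.
root = RBNode(10, color=BLACK)
# A newly inserted node starts out red (see Insertion below).
root.left = RBNode(5, color=RED, parent=root)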
Insertion
It is very similar to adding a node in a standard binary search tree, except that the new node is colored red. In a binary search tree a new node is added as a leaf, whereas in a red-black tree the leaves (null nodes) contain no information, so the new node replaces an existing null leaf and then gets two black null leaves of its own.
Removal
It is similar to node removal in a standard binary search tree. If the node to be deleted is red, we just delete it as in a BST. If the node to be deleted is black, we erase its value and associate its color with whatever node takes its place (which could be a null leaf). If we allow this empty but colored node to be counted when measuring the number of black nodes on a simple path, the original black height of the tree is retained. The goal of the rebalancing steps is then to remove this empty black node from the tree.





