Algorithm
Sorting Algorithms
All sorting algorithms share the goal of outputting a sorted list, but the way each algorithm goes about this task can vary. When working with any kind of algorithm, it is important to know how fast it runs and how much space it needs to operate. These factors are referred to as the algorithm's time complexity and space complexity.
Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) which require input data in sorted lists.
Here are some known sort algorithms:
- Bubble Sort
- Selection Sort
- Insertion Sort
- Merge Sort
- QuickSort
When choosing a sorting algorithm, you need to consider the amount of data you are sorting and how much time you have to implement the algorithm.
For example, QuickSort is very efficient, but can be pretty tricky to implement. Bubble Sort is simple to implement, but slow. To sort small sets of data, Bubble Sort may be the better option, since it can be implemented quickly; for larger datasets, the speed of QuickSort is usually worth the extra work of implementing the algorithm.
Bubble Sort
Bubble Sort is an algorithm which is used to sort a list of elements, for example elements in an array.
The algorithm compares two adjacent elements and then swaps them if they are not in order.
The process is repeated until no more swapping is needed.
For example:
Let's take the following array: [3, 1, 5, 2]
Step 1: [1, 3, 5, 2] - the first two elements are compared and swapped.
Step 2: [1, 3, 5, 2] - the next pair is compared and not swapped, as they are in order.
Step 3: [1, 3, 2, 5] - the last two elements are swapped.
This was the first iteration over the array. Now we need to start the second iteration:
Step 1: [1, 3, 2, 5] - 1 and 3 are in order, so no swap is made.
Step 2: [1, 2, 3, 5] - 3 and 2 are compared and swapped.
Step 3: [1, 2, 3, 5] - 3 and 5 are in order, so no swap is made.
The third iteration will not swap any elements, meaning that the list is sorted!
The main advantage of Bubble Sort is the simplicity of the algorithm. Also, it does not require any additional storage space, as it operates in-place.
In terms of complexity, Bubble Sort is considered suboptimal, as it requires multiple iterations over the array. In the worst case, where the array is sorted in reverse order, it requires (n-1)+(n-2)+(n-3)+...+3+2+1 = n(n-1)/2 swaps (where n is the number of elements).
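Here is a minimal Python sketch of the algorithm described above; the swapped flag implements the "repeat until no more swapping is needed" rule, and the function name is illustrative:

def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                # swap adjacent elements that are out of order
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass: the list is sorted
    return arr

print(bubble_sort([3, 1, 5, 2]))  # [1, 2, 3, 5]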
Selection Sort
Selection Sort is a simple sorting algorithm that finds the smallest element in the array and swaps it with the element in the first position, then finds the second smallest element and swaps it with the element in the second position, and continues in this way until the entire array is sorted.
For example:
Let's take the following array: [3, 1, 5, 2]
Step 1: The smallest element is 1. We swap it with the first element.
Result: [1, 3, 5, 2]
Step 2: The second smallest element (2) is swapped with the element in the second position.
Result: [1, 2, 5, 3]
Step 3: The third smallest element (3) is swapped with the element in the third position.
Result: [1, 2, 3, 5]
The array is now sorted.
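A minimal Python sketch of Selection Sort, following the steps above (the function name is illustrative):

def selection_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        # find the smallest element in the unsorted part
        min_index = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_index]:
                min_index = j
        # swap it with the element in position i
        arr[i], arr[min_index] = arr[min_index], arr[i]
    return arr

print(selection_sort([3, 1, 5, 2]))  # [1, 2, 3, 5]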
Insertion Sort
Insertion Sort is a simple sorting algorithm that works the way we sort playing cards in our hands. We sort the first two cards and then place the third card in the appropriate position within the first two, and then the fourth is positioned within the first three, and so on until the whole hand is sorted.
During an iteration, an element of the list is inserted into the sorted portion of the array to its left. So, basically, for each iteration, we have an array of sorted elements to the left, and an array of other elements still to be sorted to the right.
Sounds confusing? Let's look at an example to better understand the algorithm.
Take the following array: [3, 1, 5, 2]
Step 1: We start with the second element (1) and properly position it in the "array" of the first two elements.
Result: [1, 3, 5, 2] - now we have a sorted array to the left ([1, 3]), and the other elements to the right.
Step 2: The next element is 5. Inserting it into the array to the left results in [1, 3, 5, 2].
Step 3: The last element (2) is inserted into the corresponding position, resulting in: [1, 2, 3, 5]
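A minimal Python sketch: each element is inserted into the sorted portion on its left by shifting larger elements one position to the right:

def insertion_sort(arr):
    for i in range(1, len(arr)):
        key = arr[i]  # the element to insert into the sorted part
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]  # shift a larger element to the right
            j -= 1
        arr[j + 1] = key
    return arr

print(insertion_sort([3, 1, 5, 2]))  # [1, 2, 3, 5]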
Merge Sort
Merge Sort belongs to the category of Divide and Conquer algorithms.
These algorithms operate by breaking down large problems into smaller, more easily solvable problems.
The idea of the merge sort algorithm is to divide the array in half over and over again until each piece is only one item long. Then those items are put back together (merged) in sort-order.
Let's have a look at an example:
We start by dividing the array:
[31, 4, 88, 1, 4, 2, 42]
[31, 4, 88, 1] [4, 2, 42] //divided into 2 parts
[31, 4] [88, 1] [4, 2] [42] //divided into 4 parts
[31] [4] [88] [1] [4] [2] [42] //single items
Now we need to merge them back together in sort-order:
First we merge single items into pairs. Each pair is merged in sort-order:
[4, 31] [1, 88] [2, 4] [42]
Next we merge the pairs, again in sort order:
[1, 4, 31, 88] [2, 4, 42]
And then we merge the final two groups:
[1, 2, 4, 4, 31, 42, 88]
Now the array is sorted! The idea behind the algorithm is that smaller parts are easier to sort.
The merge operation is the most important part of the algorithm.
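A minimal Python sketch: merge_sort divides the array in half over and over, and the merge helper puts the pieces back together in sort-order:

def merge_sort(arr):
    if len(arr) <= 1:
        return arr  # a single item is already sorted
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))

def merge(left, right):
    result = []
    i = j = 0
    # repeatedly take the smaller of the two front elements
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])  # append whatever remains
    result.extend(right[j:])
    return result

print(merge_sort([31, 4, 88, 1, 4, 2, 42]))  # [1, 2, 4, 4, 31, 42, 88]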
QuickSort
QuickSort is another algorithm from the Divide and Conquer category.
It operates by breaking down large problems into smaller, more easily solvable problems.
The idea of QuickSort is the following: a pivot element is picked. Implementations of QuickSort differ in how they pick the pivot: the first, last, or median element, or even a randomly selected element, is a candidate to be picked as the pivot.
The primary part of the sorting process is partitioning. Once the pivot is picked, the array is partitioned into two parts - one containing all the elements less than the pivot, and the other containing the elements greater than the pivot. Elements equal to the pivot can remain in either part. Then the same process is applied recursively to the two parts, eventually resulting in a sorted array.
Consider the example:
Suppose we have the following sequence:
[ 2, 0, 7, 4, 3 ]
We choose 3 (last element) as the pivot.
After doing 4 comparisons we get the two parts:
[ 2, 0 ] (3) [ 7, 4 ]
Now, the same process repeats for the two parts. We choose (0) as the pivot for the lesser part, and (4) for the greater part.
After a comparison for each half, we get:
[ (0) [2] ] (3) [ (4) [7] ]
Which is a sorted sequence.
Advantages / Disadvantages
QuickSort operates in-place, so it doesn't require extra storage.
The choice of the pivot makes a big difference, as an unsuccessful pivot selection can decrease the algorithm's speed significantly.
A variation of QuickSort is the 3-way QuickSort - it divides the sequence into three pieces: smaller, greater, and equal. This makes it more suitable for data with high redundancy (many repeated values).
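A minimal Python sketch that picks the last element as the pivot, as in the example above. For clarity it builds new lists instead of partitioning in place, so unlike an in-place QuickSort it does use extra storage:

def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[-1]  # pick the last element as the pivot
    less = [x for x in arr[:-1] if x <= pivot]     # elements not greater than the pivot
    greater = [x for x in arr[:-1] if x > pivot]   # elements greater than the pivot
    return quick_sort(less) + [pivot] + quick_sort(greater)

print(quick_sort([2, 0, 7, 4, 3]))  # [0, 2, 3, 4, 7]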
Tower of Hanoi
The Tower of Hanoi is a mathematical puzzle. It consists of three rods and a number of disks of different sizes, which can slide onto any rod.
The objective of the puzzle is to move the entire stack to another rod, obeying the following simple rules:
- Only one disk can be moved at a time.
- Each move consists of taking the upper disk from one of the stacks and placing it on top of another stack.
- No disk may be placed on top of a smaller disk.
To create a solution algorithm, let's start with two disks:
- First, we move the smaller (top) disk to the middle rod.
- Then, we move the larger (bottom) disk to the destination rod.
- And finally, we move the smaller disk from the middle to the destination rod.
For a general algorithm, we need to divide the stack into two parts - the largest disk (nth disk) is in one part and all other (n-1) disks are in the second part.
Our ultimate goal is to move disk n from source to destination and then put all other (n-1) disks onto it.
So, we can break up the problem into smaller problems and come up with a general algorithm:
Step 1: Move n-1 disks from source to the middle.
Step 2: Move nth disk from source to destination.
Step 3: Move n-1 disks from the middle to destination.
With 3 disks, the puzzle can be solved in 7 moves.
The minimal number of moves required to solve a Tower of Hanoi puzzle is 2ⁿ - 1, where n is the number of disks.
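The three-step algorithm translates directly into a short recursive Python function (the rod labels are illustrative):

def hanoi(n, source, middle, destination):
    if n == 0:
        return
    hanoi(n - 1, source, destination, middle)  # Step 1: move n-1 disks to the middle
    print("Move disk", n, "from", source, "to", destination)  # Step 2: move the nth disk
    hanoi(n - 1, middle, source, destination)  # Step 3: move n-1 disks onto it

hanoi(3, "A", "B", "C")  # prints the 7 (= 2³ - 1) moves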
Linear Search
Linear search is a very simple search algorithm: every item is checked, and if a match is found, that particular item is returned; otherwise, the search continues until the end of the list. It does not require a sorted list.
Consider that we are trying to find the value x in the list.
The algorithm consists of the following steps:
1. Start from the leftmost element of the list and one by one compare x with each element of the list.
2. If x matches with an element, return the index.
3. If x doesn't match with any of the elements, return -1.
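A minimal Python sketch of these three steps:

def linear_search(items, x):
    for index, value in enumerate(items):
        if value == x:
            return index  # match found: return its index
    return -1             # x is not in the list

print(linear_search([5, 1, 9, 3], 9))  # 2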
Binary Search
Binary search is a fast search algorithm that works on sorted lists.
It belongs to the Divide and Conquer category, meaning that it breaks down large problems into smaller, more easily solvable problems.
The algorithm looks for a match by comparing the target with the middle item of the collection.
If they match, the index of the middle item is returned.
If the middle item is greater than the target, the search continues with the middle item of the subarray to the left.
Otherwise, the search continues with the middle item of the subarray to the right.
This process repeats on smaller and smaller subarrays until a match is found or the size of the subarray reduces to zero (meaning the target is not in the list).
Basically, the list is divided into two halves and the search continues in the half where the element has a possibility of being located. This is why the algorithm requires a sorted list.
Let's take a sample sorted array [2, 5, 16, 18, 24, 42] and search for the element 24.
Step 1: We take the middle item (18) and compare it to 24. We could also have taken 16 as the middle item, so it's up to you to decide how to choose it. Usually you divide the number of elements in the list by 2 and take the result as the index of the middle element.
Step 2: As 18 < 24, we divide the array into two parts and continue working with the subarray to the right: [24, 42]
Step 3: The same process is repeated, and as 42 > 24, we consider the left subarray: [24], which is the match!
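A minimal Python sketch. It takes the lower of the two possible middle indices, while the walkthrough above takes the upper one; either choice works:

def binary_search(items, x):
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2  # index of the middle item
        if items[mid] == x:
            return mid
        if items[mid] < x:
            low = mid + 1        # continue in the right half
        else:
            high = mid - 1       # continue in the left half
    return -1                    # the subarray shrank to zero: no match

print(binary_search([2, 5, 16, 18, 24, 42], 24))  # 4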
Depth-First Search
This search algorithm is specially designed for graphs and trees.
As you may recall, a graph is a set of connected nodes, where each node is called a vertex and the connection between two of them is called an edge. A tree is basically the same, with the restriction that it must be an undirected graph in which any two vertices are connected by exactly one path. So, any acyclic connected graph is a tree.
DFS - Depth-First Search (or Traversal) is a recursive search algorithm that uses backtracking. It starts from the root of the tree (or an arbitrary node in the case of a graph) and explores as far as possible along each path until all nodes have been visited.
The word backtrack means that when there are no more unvisited nodes along the current path, you move backwards on the same path to find nodes left to traverse.
Example
Let's run the algorithm over a graph with the nodes A through G, where A is connected to B, C, and E (in that order), B is connected to D and F, C is connected to G, and F is connected to E, and see in which order the nodes will be visited. Assume the algorithm explores a node's neighbors in the order listed, i.e. left edges before right ones.
Step 1: Visiting node A. Continue to the left node.
Step 2: Visiting node B. Continue to the left node.
Step 3: Visiting node D. No more nodes available in the path. Move back to node B.
Step 4: Visiting node F. Continue to the node available in the path.
Step 5: Visiting node E. Its neighbor A is already visited, so we move back.
Step 6: Going back to A. Visiting node C. Continue to the node available in the path.
Step 7: Visiting node G. All nodes are visited.
So, by remembering the visited nodes, the visiting order will be: A, B, D, F, E, C, G
Without remembering the visited nodes, the traversal would repeat the path A, B, D, F, E forever and never reach the nodes C and G.
If we remove the edge between nodes F and E, we will get a tree, and the order of visited nodes will be: A, B, D, F, C, G, E
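A minimal Python sketch of DFS over the example graph, with each adjacency list ordered so that left edges come first; the visited list is what prevents the endless loop described above:

graph = {
    "A": ["B", "C", "E"],
    "B": ["A", "D", "F"],
    "C": ["A", "G"],
    "D": ["B"],
    "E": ["A", "F"],
    "F": ["B", "E"],
    "G": ["C"],
}

def dfs(node, visited=None):
    if visited is None:
        visited = []
    visited.append(node)  # remember the visited nodes
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs(neighbor, visited)  # explore as far as possible, then backtrack
    return visited

print(dfs("A"))  # ['A', 'B', 'D', 'F', 'E', 'C', 'G']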
Binary Search Tree
Binary Search Tree (BST) is a special binary tree where the key in each node must be greater than or equal to any key stored in the left subtree, and less than or equal to any key stored in the right subtree.
BSTs allow fast lookup, addition, and removal of items, and can be used to implement either dynamic sets of items or lookup tables that allow finding an item by its key (e.g., finding the phone number of a person by last name).
Search, insert, and delete are the main operations of BST.
As the keys are stored in a particular order, the principle of binary search can be used for lookup.
We start from the root and compare the key. If the root holds the key, it is returned as the result.
If the key is greater than the root's key, the right subtree is recursively checked.
If the key is less than the root's key, the left subtree is recursively checked.
The recursion continues until the key is found or an empty subtree is reached (meaning the key is not in the tree).
Insertion and deletion operations are quite similar, as both use the same search logic to locate the needed node.
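A minimal Python sketch of a BST node with the recursive search, plus an insert that reuses the same logic to locate the position for a new key:

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def search(node, key):
    if node is None:
        return None  # empty subtree reached: the key is not in the tree
    if key == node.key:
        return node
    if key > node.key:
        return search(node.right, key)  # recursively check the right subtree
    return search(node.left, key)       # recursively check the left subtree

def insert(node, key):
    if node is None:
        return Node(key)  # the search found the empty spot for the new key
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

root = None
for key in [8, 3, 10, 1, 6]:
    root = insert(root, key)
print(search(root, 6) is not None)  # True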
Expression Evaluation
Consider an arithmetic expression: A+B
This is called an infix notation, when the operator (+) is between the operands (A and B).
We humans know how to interpret infix notation, as that's what we are taught in school. But this form is not very suitable for a computer to read and evaluate, because each operator has a precedence, which determines the order in which the expression needs to be evaluated.
For example, take A+B*C. We know that the multiplication needs to be done first, because it has higher precedence than the + operator. Parentheses also make a difference, for example: (A+B)*C. Now the addition needs to be done first, as the parentheses dictate the precedence.
Now, we need an algorithm to "understand" the expression and calculate the result. This can be achieved using two Stacks - one for the operators, and another for the operands.
Assuming that each operator is a single character and we use only binary operators, the algorithm is:
Until the end of the expression is reached, get one character and perform only one of the steps (a) through (f):
(a) If it's an operand, push it onto the operand stack.
(b) If it's an operator, and the operator stack is empty then push it onto the operator stack.
(c) If it's an operator and the operator stack is not empty, and the character's precedence is greater than the precedence of the stack top operator, then push the character onto the operator stack.
(d) If it's "(", then push it onto operator stack.
(e) If it's ")", then, until "(" is encountered, do the following:
1) pop operand stack once (value1)
2) pop operator stack once (operator)
3) pop operand stack again (value2)
4) compute value2 operator value1 (value2 was pushed first, so it is the left-hand operand)
5) push the obtained value onto the operand stack.
6) pop the operator stack to discard the "("
(f) If it's an operator and its precedence is less than or equal to the precedence of the stack-top operator, then pop and evaluate (sub-steps 1 through 5 above) until that is no longer the case, and then push the character onto the operator stack.
When there are no more input characters, keep performing the pop-and-evaluate sub-steps of (e) until the operator stack becomes empty. The value left on the operand stack is the final result of the expression.
Let’s consider the expression (5+3)*6.
1) "(" is pushed (=>) onto the operator stack
2) 5 => operand stack
3) + => operator stack
4) 3 => operand stack
5) ")" is encountered. The top operator and operands are popped from the stacks, 5+3 is evaluated, and the result is pushed onto the operand stack. Now we have 8 on the operand stack.
6) * => operator stack
7) 6 => operand stack
8) The top operator and operands are popped from the stacks, and 8*6 is evaluated, resulting in 48. This is the final value, as the operator stack is empty.
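A minimal Python sketch of the two-stack algorithm. For brevity it assumes single-digit operands, in addition to the article's assumption of single-character binary operators:

def precedence(op):
    return {"+": 1, "-": 1, "*": 2, "/": 2}[op]

def apply_top(operators, operands):
    # pop one operator and two operands, compute, push the result back
    operator = operators.pop()
    value1 = operands.pop()  # popped first: the right-hand operand
    value2 = operands.pop()  # pushed first: the left-hand operand
    if operator == "+":
        operands.append(value2 + value1)
    elif operator == "-":
        operands.append(value2 - value1)
    elif operator == "*":
        operands.append(value2 * value1)
    elif operator == "/":
        operands.append(value2 / value1)

def evaluate(expression):
    operators, operands = [], []
    for ch in expression:
        if ch.isdigit():                     # (a) operand
            operands.append(int(ch))
        elif ch == "(":                      # (d) opening parenthesis
            operators.append(ch)
        elif ch == ")":                      # (e) evaluate back to the "("
            while operators[-1] != "(":
                apply_top(operators, operands)
            operators.pop()                  # discard the "("
        else:                                # an operator: steps (b), (c), (f)
            while (operators and operators[-1] != "("
                   and precedence(operators[-1]) >= precedence(ch)):
                apply_top(operators, operands)
            operators.append(ch)
    while operators:                         # no more input: empty the operator stack
        apply_top(operators, operands)
    return operands[0]

print(evaluate("(5+3)*6"))  # 48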
Time Complexity
In computer science, time complexity is the computational complexity that measures or estimates the time taken to run an algorithm.
Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that an elementary operation takes a fixed amount of time to perform.
Since an algorithm's running time may vary with different inputs of the same size, one commonly considers the worst-case time complexity expressed using Big-O notation, which is the maximum amount of time taken on inputs of a given size.
For example, an algorithm with time complexity O(n) is a linear time algorithm.
Common Time Complexities
O(1) - Constant time: Given an input of size n, it only takes a single step for the algorithm to accomplish the task.
Pseudocode:
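(a minimal Python sketch; the names are illustrative)

def get_first(items):
    return items[0]  # a single step, no matter how large items is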
O(log n) - Logarithmic time: the size of the remaining problem is reduced by some factor with each step, so the number of steps grows logarithmically with the input. The binary search algorithm is an example.
Pseudocode:
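(a minimal Python sketch: the problem size is halved with each step, so about log₂(n) steps are needed)

def count_halvings(n):
    steps = 0
    while n > 1:
        n = n // 2  # the remaining problem is halved each step
        steps += 1
    return steps

print(count_halvings(16))  # 4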
O(n log n) - Quasilinear time: in many cases, the n log n running time is simply the result of performing an O(log n) operation n times.
For example, binary tree sort creates a binary tree by inserting each element of the n-sized array one by one; each insertion into a balanced tree takes O(log n) time, giving O(n log n) overall.
Also, quicksort (in the average case), heapsort, and merge sort run in O(n log n) time.
Pseudocode:
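(a minimal, purely illustrative Python sketch of an O(log n) halving performed once per element; a real example would be merge sort, shown earlier)

def quasilinear(items):
    for item in items:   # n iterations...
        size = len(items)
        while size > 1:  # ...each doing about log₂(n) halving steps
            size = size // 2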
O(n) - Linear time: the running time increases at most linearly with the size of the input.
Pseudocode:
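(a minimal Python sketch: every one of the n items is touched exactly once)

def find_max(items):
    best = items[0]
    for value in items:
        if value > best:
            best = value
    return best

print(find_max([3, 1, 5, 2]))  # 5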
O(2ⁿ) - Exponential time: the running time doubles with each unit increase in the input size. This is common in recursive algorithms that branch in two, for example traversing all the nodes of a complete binary tree of height n.
Pseudocode example:
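(a minimal Python sketch: each call branches into two more calls, so the work doubles with every level of depth)

def visit(depth):
    if depth == 0:
        return 1  # reached a leaf
    # two recursive calls per node, like visiting both subtrees
    return visit(depth - 1) + visit(depth - 1)

print(visit(4))  # 16 leaves visited, i.e. 2⁴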
O(n!) - Factorial time: This is common in generating permutations.
Pseudocode:
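(a minimal Python sketch: there are n choices for the first element, n-1 for the second, and so on, giving n! permutations)

def permutations(items):
    if len(items) <= 1:
        return [items]
    result = []
    for i in range(len(items)):
        rest = items[:i] + items[i + 1:]  # everything except the chosen element
        for perm in permutations(rest):
            result.append([items[i]] + perm)
    return result

print(len(permutations([1, 2, 3])))  # 3! = 6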