Algorithms & Data structures Handbook
Last updated
If you are looking for detailed references and the best ways to practice
Resources (WIP)
Some techniques I noticed to be useful when solving problems
Techniques for Solving Data Structures Problems with Examples
Data Structures Implementation
Data Structures ADTs Implementation (In Go)
Sorting
Sorting Algorithms
🚧 **Graphs (Work in progress)**

ℹ️ All solutions are also here on my GitHub
An algorithm is any sequence of steps (programming statements) you write, in whatever programming language, to solve a problem; every function you create is one. And all programming problems are about processing data. So Computer Science has a field of Algorithms, which focuses on processing data at scale and on analyzing the performance of algorithms for problems such as sorting, searching, removing, and so on.
💡 Reasoning about an algorithm is impossible without a careful description of the sequence of steps to be performed. The three most common forms of algorithmic notation are (1) English, (2) pseudocode, or (3) a real programming language. - The Algorithm Design Manual

ℹ️ So before we talk about algorithms, let's talk about the data that a program might have
Why do we need to say this is a Boolean variable, or this is an integer?
First of all, any data in a program has a type. A data type helps us define 2 things:
A domain of allowed values
A set of operations on these values
And the compiler will raise a compilation error if we misuse a data type. For example, performing multiplication on chars, `z = x*y`, is not allowed; neither is arithmetic on `true`/`false` values.
You start with data you want to reference, let's say a name
Then you use a variable to do that: var name = “Abdulrhaman”
Now you have multiple names/pieces of data you want to group in order to manage them (read, delete from, insert into)
You need a structured (defined) way to access and manage your data; this is the whole motivation behind a data structure
Atomic types have no component parts, e.g. int, char, float, etc.
Composite types can be broken into parts. For example, a Person object consists of two properties, a name and an age, both of which are atomic data types (string and int respectively)
Object ⇒ broken into properties
Array ⇒ broken into indices
So a data structure is a data type whose values:
Can be decomposed into a set of atomic data or into another data structure (e.g. a 2D array)
Include a structure relating the component parts (a fixed technique for how to search, store, remove, and so on)
Possible Structures: Set, Linear, Tree, Graph
Next, you store your data in an array and you want to:
search through it
sort it using quicksort/Timsort
search it in O(log n) using binary search
Provide a level of abstraction: I want to use a data structure without knowing how it's implemented. For example, in Golang you can append to a slice without caring about the implementation while you use it.
While designing ADTs, a designer has to deal with two types of questions:
What values are in the domain? What operations can be performed on the values of a particular data type?
How is the data type represented?
How are the operations implemented?
Generally, the performance factors of a program:
Code + algorithms
Languages + compilers
Processor (CPU) speed + memory ⇒ how fast instructions will be executed
I/O and number of CPU cores
How can we determine the performance of an algorithm,
or compare algorithms and determine which one is better?
First, we need to determine what we mean by "better"
Uses a high-level description of the algorithm instead of an implementation (no benchmarking of running time needed)
Characterize running time as function of the input size, n
takes into account all possible inputs (int, string, float...)
Allows us to evaluate the speed of an algorithm independently of the hardware/software environment
An algorithm's running time depends on its input of size n, so the time complexity will vary from input to input. For example, sorting an array of 10 numbers is much less time-consuming than sorting an array of 100,000 numbers! Maybe in the best-case scenario your algorithm only gets 100 array elements to sort, BUT you always count the worst-case scenario, which might be 10,000,000 elements. Remember: when your input grows, your algorithm's time and space costs will increase, possibly in a huge way.
There are 3 types of possible scenarios:
best case: you estimate it, maybe through some sort of statistics
average case: you estimate it, maybe through some sort of statistics
worst case: you definitely have to imagine the worst-case scenario (what if the inputs scaled up?)
We focus on the worst case running time
easier to analyze
critical to games, finance and robotics
Time complexity ⇒ the time it takes to execute
Space complexity ⇒ the memory it needs to execute
We will focus on time complexity right now
Experimental analysis ⇒ compare the running time (benchmarking)
inputs ⇒ array or objects or any data
Theoretical analysis ⇒ analyze the algorithms independently of the implementation (software or hardware)
An algorithm is said in-place if it does not require more than O(log n) space in addition to the input.
Let's see how we calculate the time complexity using Big-O notation
Low-level: sequential (contiguous) and fixed in memory
Resizing: you need to resize your array when the capacity is full, which is not a cheap operation
Fast access O(1); worst-case delete/insert O(n) because of shifting
If you can, push to the end of the array instead of inserting at a specific index: you get constant time complexity since no shifting is needed
Arrays are a contiguous data structure, which is great for spatial locality
Deep copy vs Shallow copy of arrays
Most brute-force algorithms are linear O(n) in time; you can reduce the space complexity to O(1) by using the array itself instead of creating a new one (for example, using 2 pointers `i`, `j`)
Arrays are not sorted by default; you need to sort an array before you can search it logarithmically.
Trees are better for searching: a balanced search tree (AVL, B-tree) lets you search without a separate sorting step.
Trees are also better when you search based on a condition (a query); an index into an array is not that useful for huge amounts of data.
💡 Once the array is sorted, you can search through it using binary search with time complexity O(log n)
Useful array routines (these names come from C++'s `<algorithm>`; Go has counterparts in the `sort` and `slices` packages):
binary_Search
lower_bound
upper_bound
min_element
reverse
rotate
sort
Allow for fast insert/delete , but slow read
Low-level: a linked list is scattered in memory; its nodes are not required to be sequential
💡 Arrays and linked lists are memory-access techniques rather than just data structures, so you have contiguous data structures built on arrays, such as heaps
Low-level : you need to choose between arrays or linked-list
Arrays:
fast read, poor insert/delete
Sequential: you need a fixed segment of memory to store your values; if your items exceed the size, you must do an expensive resize
memory is a concern. Filled arrays take up less memory than linked lists
you need indexed/random access to elements
you know the number of elements in advance
Linked-lists
fast insert/delete , poor read
Random allocation, scattered through your memory
situations where we don't know the exact size of the data structure but anticipate that the list could potentially grow to large sizes
Also when you anticipate that the data structure will grow beyond the available memory (half of your collection may reside on disk?)
High-level
You're always going to work with arrays, since most external data sources will return results as arrays
But why?
The DB knows the size in advance before sending it to you
Mostly you are going to build read-heavy apps, and arrays are better for reads
You can sort it! A linked list can be sorted (e.g. with merge sort) but not binary-searched efficiently, since it has no random access; with a sorted array you search in O(log n)
Implemented as linked list or array based on the access method you prefer
For recent-history records or backtracking: for example, you want to finish the current frame of a call stack and then resume the next most recent thing
Parsing
Accessing things in reverse order
2 common errors:
Underflow: popping from an empty stack
Overflow: pushing onto a full, fixed-capacity stack
Implemented as linked list or array
keep track of jobs to be processed
Constant time for everything (enqueue and dequeue)!
Any map implementation, like a hash map, tree map, etc.
Keep track of arbitrary data with constant-time access by key
Hierarchical data
one-to-two or one-to-many relationships (many in the case of a graph)
Range queries over data: B-tree
Loading a subset of on-disk data into memory: B-tree
Partially ordered tree that always exposes the largest or smallest element: heap
Unsorted collection with fast retrieval by a known key, at the expense of space and of poor linear range/traversal search: hash table
BSTs allow fast in-order traversal of the keys and O(log n) ordered lookups; hash tables don't
It can be hard to come up with a good hash function, versus the simple key comparison (integer or string) a BST needs
Hash tables are not particularly efficient for some set operations, like taking the intersection of two sets
When you need to access the nth item efficiently (for example, to pick a random member of the collection) or to run range queries: array
FIFO, LIFO: queue, stack
Scattering data across disk: linked list
A sequence of data in memory with fast reads: array
A set is a data structure that holds elements without any particular order. An element only appears once in a set.
Linear data structures have a 1-1 relationship between their items (arrays, stacks, linked lists…); what if we want to model our data as a graph or a hierarchy (1-2, 1-3, 1-4… 1-n)?
Also, if you want to access data in a certain order for backtracking-style problems (history, call stack, prefix calculation), use a stack
For buffering tasks/jobs/data to be processed based on a priority (using a priority queue) or based on arrival time (FIFO), use a queue
If you already know the keys that identify the data you need (rather than searching for members of a collection based on a condition), use a hash table
A B-tree is low-level: a balanced search tree whose nodes you can load into memory one segment at a time and search within that segment. Useful for databases (no need to load all data from disk into memory when a range or condition filters the results)
A binary heap is used mostly as a priority queue: it constantly maintains the smallest/largest element, a concept used extensively in many applications. Why maintain a fully sorted array when you only care about the biggest/smallest element?