Algorithms & Data structures Handbook

Overview

If you are looking for detailed references and best way to practice
Resources (WIP)
Some techniques I notices how useful they are when solving problems
Techniques for Solving Data Structures Problems with Examples
Data Structures Implementation
Data Structures ADTs Implementation (In Go)
Sorting
Sorting Algorithms

🚧 **Graphs (Work in progress)**ℹ️ All Solutions also here on my Github

What is an algorithm ?

An algorithm is every function you created in any programming language you are programming in. Any steps (programming statements) you wrote to solve a problem. And all programming problems is about processing data. So, in Computer Science there is field of Algorithms. which focuses on dealing with processing data at scale, and analyzing the performance of an algorithm to solve programming problems such as sorting, searching, removing and so on.

💡 Reasoning about an algorithm is impossible without a careful description of the sequence of steps to be performed. The three most common forms of algorithmic notation are (1) English, (2) pseudocode, or (3) a real programming language. - The Algorithms Manual bookℹ️ So before we talk about Algorithms, let's talk about a data that a program might have

Data types and why do they exist

Why we need to say this is a Boolean variable ? or this is an integer ?

First of all any data in a program has a type. A data type help us define 2 things
- A domain of allowed values
- A set of operations on these values
And a program compiler will go through a compilation error if we misuse a data type, for example performing multiplication on chars
- z = x*y is not allowed
- true/false is not allowed

A journey from an atomic data to a data structure

You start with a data you want to reference, let’s say a name
Then you use a variable to do that var name = “Abdulrhaman”
Now, you have multiple names/data you want to group to manage it (read it, delete from it, insert into it)
You need a structured (defined) way to access and manage your data, this is the whole motivation behind a data structure

Atomic / Simple Data

Have no component parts. E.g int, char , float, etc

Data structure types

Types that can be broken into parts. for example a Person object consist of two properties : name and an age, both of them are atomic data types (string, int respectively )
- Object ⇒ broken into properties
- Array ⇒ broken into indices
So a data structure is a data type whose values
- Can be decomposed into a set of atomic data or another data structure (2D Array)
- Include a structure involving the component parts (a fixed technique on how to search, store, remove and so on)
- Possible Structures: Set, Linear, Tree, Graph
Next, you store your data in array, you want to search through it
- sort it using quick/tem sort
- apply BST algorithm with log n

Abstract Data Types (ADTs)

Provide a level of Abstraction : I want to use a data structure but I don't want to know how it's implemented, for example in Golang you can append to a slice but you don't care of the implementation at the time of using the slice data structure

How to create your own data structure

While designing ADTs, a designer has to deal with two types of questions:

What values are in the domain? What operations can be performed on the values of a particular data type?
How is the data type represented?
How are the operations implemented?

Performance Analysis

Generally, the performance factors of a program
- Code Algorithims
- Languages + compilers
- Processer (CPU) speed + Memory ⇒ how fast instruction will be executed
- I/O and number of CPU cores
How we can determine the performance of an algorithm.
Or compare algorithms and determine which one is better ?
first, we need to determine what we mean of "better"
High-level description of the algorithm instead of an implementation (benchmarking/running-time)
Characterize running time as function of the input size, n
takes into account all possible inputs (int, string, float...)
Allows us to evaluate the speed of an algorithm independently of the hardware/software environment

Scenarios

So an algorithm depends on the inputs of size n, so the time complexity or running time will vary from input to input. for example sorting an array of 10 numbers is much less time consuming then sorting an array of 100000 numbers! maybe your algorithm in the best vase scenario will take 100 array elements to be sorted. BUT you always count the worst case scenario. which is maybe 10000000 elements. remember, when your input grow your algorithm time and space complexity will increase. and may increase in a huge way
- There is 3 types of possible scenarios
  - best case : you count it maybe through some sort of statistics
  - average case : you count it maybe through some sort of statistics
  - worst case : you defiantly to imagine the worst case scenario (what if the inputs scaled)
- We focus on the worst case running time
  - easier to analyze
  - critical to games, finance and robotics

Algorithms performance

Time complexity ⇒ the time it takes to execute
Space complexity ⇒ the memory it needs to execute
We will focus on time complexity right now

Ways of measuring the time complexity of algorithms

Experimental analysis ⇒ compare the running time (benchmarking)
- inputs ⇒ array or objects or any data
Theoretical analysis ⇒ analyze the algorithms independently of the implementation (software or hardware)
An algorithm is said in-place if it does not require more than O(log n) space in addition to the input.
An algorithm is said in-place if it does not require more than O(log n) space in addition to the input.

Let's see how we calculate the time-complexity using Big oh notation

Array

Low-level: Sequential or fixed sequence on the memory
Re-sizing: you need to re-size your array if the capacity is full which is not a cheap operation
Fast access O(1), worst delete/insert O(n) because of shifting
If you can, try to push to last of array instead of adding the item in a specific index, you will have a constant time-complexity since you will not do shifting
Arrays are contagious data structure , which is great for spatial locality
Locality of reference - Wikipedia
Deep copy vs Shallow copy of arrays
Most brute-force algorithms would be linear O(n) time complexity, you could reduce the space complexity to 1 by using the array itself not creating a new one (for example using 2 pointers i,j)

Searching with arrays with logarithmic time-complexity

Arrays is not sorted by default, you need to sort it before you can search it logarithmically.
Trees is better for searching, where you can search (without a sorting step) which is binary trees (AVL, B-Tree)
- Trees are better for searching if you are searching based on a condition ( query), index is not that useful for huge amount of data(in arrays)

💡 Once the array is sorted, you could search through it using binary search tree with time complexity O(log n)

What will be time complexity if I am searching a sorted array and I found the element on first index? And what if array is unsorted?

Key methods 🔨

binary_Search
lower_bound
upper_bound
min_element
reverse
rotate
sort

Linked List

Allow for fast insert/delete , but slow read
Low-level : linked list is scattered in memory and not required to sequential,

💡 Arrays and linked lists are memory access techniques rather than just data structure, so you have contagious data structures built on array such heaps

Array Vs Linked list

Low-level : you need to choose between arrays or linked-list
- Arrays:
  - fast read, poor insert/delete
  - Sequential : which means you need fixed segment of memory to store your values, if you items exceed the size , you will do expensive task which is re-sizing
  - memory is a concern. Filled arrays take up less memory than linked lists
  - you need indexed/random access to elements
  - you know the number of elements in advance
- Linked-lists
  - fast insert/delete , poor read
  - Random allocation, scattered through your memory
  - situations where we don't know the exact size of the data structure but anticipate that the list could potentially grow to large sizes
  - Also when you anticipate that the data structure will grow more than the available memory, (half of your collection may resides on disk?)
High-level
- You always going to work with arrays, since most external data sources will return it as arrays
- But why ?
  - The DB know in advance the size before sending it to you
  - Mostly you are going to do read heavy apps, so arrays is better
  - You can sort it ! linked list can’t be sorted, you sort and search with log n

Stack

Implemented as linked list or array based on the access method you prefer

Usage

For recent history records or backtracking , for example you want to finish with a call-stack and then starts the next most recent thing
Parsing
Accessing things in reverse order
2 common errors
- Underflow
- Overflow

Queue

Implemented as linked list or array

Usage:

keep track of jobs to be processed

Hash table

Constant time of everything!
Any map implementation like hash map, tree map etc
Keep track of random data with constant time

Trees

Hierarchical of data
- one to 2 or many (in case of graph)
Range queries of data : b-tree
Sub-set of data in hard-disk to memory : b-tree
Ordered tree with finding largest or smallest second : heap
Hash table: not sorted collection with fast retrieval of random key search in the expense of space complexity + poor linear range / traversal search: hash table
- BSTs allow fast in-order traversal of the keys (log n) hash tables don't
- hard to come up with hash function against a simple key comparison (integer or string) in BST
- Hash tables are not particularly efficient data structures to use for some set operations like taking the intersection of two sets.
- When you need to be able to access the nth item efficiently. For example, if you want to be able to pick a random member of the collection efficiently. Range query
- FIFO , LIFO
scattering data across disk : linked-list
Sequence of data in memory with fist read : array
A set is a data structure that holds elements without any particular order. An element only appears once in a set.