# Data Structures

Data structures are a mandatory part of becoming a software engineer. Data structures are how we solve problems in programming with a lot of information. There are multiple different structures to use when storing information or solving a problem. We need to determine which data structure is the best fit for the scenario we are in. We can do this by checking the efficiency and speed of doing certain operations such as** accessing**, **searching**, **inserting**, and **deleting**. The way to measure the efficiency and speed of these operations is to use Big O Notation. See my blog on Big O notation below if you’d like to learn or review that topic.

# The Types of Data Structures

## Arrays

It is likely that if you are reading this you know what an array is. While I thought I knew arrays in practice of coding problems for technical interviews I realized my knowledge was only the bare minimum of how to work with an array. Here is a look into the methods and speeds:

**Accessing — **O(1) This is incredibly fast and efficient. you preform this by accessing the index of an array to find the element you need. It is able to do this because we state the size of an array and it is stored in memory of the computer never being split up. We just find the array in memory then provide an index. Not many other data structures have this benefit of speed for accessing.

**Searching — **0(n) In most cases your array will not be sorted. Because of this to search through in array the speed or operations is completely tied to how many elements are in the array. You will have to look through every element and assume worst case scenario is it is the last one in the array. This is also called a linear search.

**Inserting and Deleting — **O(n) To insert or delete anything from an array you need to move every element over 1 space. This was surprising to me at first but if you imagine a bookshelf and you want to put a new book first in line you would have to physically touch every book and move it to the right one space. If there are 200,000 books (or **n**) you are going to have to move them all. you could technically add to the end of the array and not touch every element but big O notation *assumes* the worst case scenario. Deletion works similarly if we want to delete the first element we need to move everything over to take its place.

**Pros**

- Good for storing similar data.
- O(1) accessing.

**Cons**

- Cannot change size of array after initializing.
- Slow insertion and deletion.
- Wasted storage space if there are empty spaces in array

I will leave off Array Lists for now as they are language dependent.

# A sequential access data structure.

Unlike arrays each element is dependent on other elements. A common example of a real life instance of this would be a tape measure. In order to get to any length you would need to go through many others to access it.

## The Stack

The stack is a sequential data structure. It operates using the **LIFO **principle (Last In First Out). An example of this would be a stack of books. If you added a book to the stack it would be on top. If you wanted to remove one you would remove the latest addition from the top. You cannot take from the middle you would have to remove every book on top of that. An example of this in computer science would be recursion, undo-redo and back paging on a site.

You can:

**Push** onto the top of the stack.

**Pop** to remove the top element of the stack.

**Peek** at the top element leaving it unchanged.

**Contains** to see if it is true or false that the stack holds the element anywhere.

Methods and speeds:

**Accessing — **O(n) you need to pop off every single element in relation to the one you are looking for. This will scale based on how many elements are in **n**

**Searching — **O(n) much the same as above.

**Inserting and Deleting — **O(1) This speed is the benefit of using stacks. No matter how much data is being represented it will only ever take 1 operation to add something to the top or remove something from the top.

**Pros**

- Quick insertion and deletion.

**Cons**

- Slow for searching and accessing.

## The Queue

Similar to the stack the queue is a sequential access data structure but is flipped. It follows **FIFO **(first in first out). A real life example of this is waiting in a line. The first person in line will be served first. When a new person shows up they get added to the line. I also like to think of a music playlist queue on Spotify. If you add a bunch of songs the earliest ones will play to the latest added. In queues we add to the end and remove from the beginning.

**Enqueue** onto the tail of the queue.

**Dequeue** to remove the head element of the queue.

**Peek** at the head element leaving it unchanged.

**Contains** to see if it is true or false that the queue holds the element anywhere.

Methods and speeds:

**Accessing **— O(n) you need to dequeue every single element in relation to the one you are looking for. This will scale based on how many elements are in **n**

**Searching **— O(n) much the same as above.

**Inserting and Deleting — **O(1) this is the same as stacks. No matter how much data is being represented it will only ever take 1 operation to add something to the head or remove something from the tail.

**Pros**

- Quick insertion and deletion.

**Cons**

- Slow for searching and accessing.

## Linked List

Linear data structure in which every element is an **object** called a **node**. Nodes have** **2 parts: the **data** that holds the actual content of the node, and the **reference (or pointer)** which points to the next node in the list. The data can hold strings, integers, and Booleans. The pointers reference where the next node resides in the computers memory. Every linked list begins with a **head node**. The** tail node** is last and will always point towards a** null value.**

You can:

**Add to the head of a linked list -** Make your new node point to the current head node.

**Remove the head of a linked list - **Set the head nodes pointer to a null value to cut off the flow of information.

**Add to the middle of a linked list - **Make the new node point to the node you want to follow it. Then change the pointer of the node located before your new node to point to said new node. If you have nodes of 1, 3, and 4 and you want to insert 2, you would have **2 point to the 3 node** and **change the 1 node to point to the newly added 2 node**.

**Remove from the middle of a linked list- **Change the pointer of the previous node of your targeted node to the node that follows the targeted node. Then your targeted node should point to a null value. This will cut off your targeted node from the flow of information.

**Add to the tail of a linked list - **Make the current tail node point to the newly added node.

**Remove the tail of a linked list- **Make the node previous to the tail point to a null value.

Methods and speeds:

**Accessing —** O(n) same as other sequential access data structures. You need to start from the beginning and follow the pointers of every node until the target is found.

**Searching —** O(n) same as above.

**Insertion and deletion —** This one is different it actually has 2 different time complexities. If you are inserting or deleting from the head or tail, it is instant as stated above. This process is O(1). If you plan to insert or delete from anywhere besides the head or tail, the list will need to be traversed leaving it with a O(n) time complexity.

**Pros**

- Backs other data structures.

**Cons**

- You can only go forward with pointers.

## Doubly-Linked List

Almost the same as a Linked list but uses a** next pointer** and a **previous pointer** to remove the drawback of linked lists only being able to go forward. The Previous pointer is almost an “undo”.

Adding or removing is the same as above. Just keep in mind tying the nodes pointers accordingly considering adding the previous pointer to the equation.

## Dictionaries

Sometimes called hashes, maps, associative arrays and objects depending on the language. This is an abstract data type which stores data in the form of **key/value pairs. **Just like in a real life dictionary that holds words with a matching definition tied to it, The data structure has **keys** that match up with a **value** they hold. Keys can be any primitave data type but *must* be unique and only hold one value. The values do *not* have to be unique, you could have multiple keys hold the same value. The values can be anything you can think of including other data structures.

Methods and speeds:

If using a hash table or hash function (a blog for another time) the speed of accessing, searching, insertion, and deletion is best case 0(1)(if the function is good) or worst case O(n) (if the hash function puts everything under the same key which is unlikely). Without using the hash table or hash function everything is O(n).

**Pros**

- Don’t use numerical indexes.
- Keys are flexible in which they can be any data type.
- If implementing a hash table or hash function it is the most efficient data structure.

**Cons**

- More advanced terms to learn to use efficiently.

## Trees

Please refer to another blog I have done for an intro to trees:

Here I will mention some structures I left out of the trees blog.

## Tries

Tries are a tree that store characters to traverse down a branch and make a word. This is used for things like autocomplete and spell check.

## Heaps

Heaps have a rule where a parent can only have 2 children maximum and the parent compares to the children in some way. These are sometimes used for sorting algorithms called “Heap sorts”.

**Min-Heaps-** The parent should be less than both of its children as well as any descendant below it.

**Max-Heaps-** The opposite of Min-Heaps. The parent should be greater than both of its children as well as any descendant below it.

## Graphs

Graphs are pieces of information and the paths that run between them. Like a map, it keeps track of point A and point B and how to get there. Information is kept in nodes and connected by **edges **(pointers again). Unlike trees that always start at the root node, graphs can start anywhere.

There are **directed** and **undirected** graphs. Undirected means that you can go back and forth to any node ignoring directions as long as there is an edge connecting them and establishing a relationship. Directed is like a one way street, if you go to that node you cannot go backwards. There are also **cyclical **and **acyclical** graphs. Cyclical means that there is at least one node that can travel back to itself somehow. All undirected graphs are cyclical. Acyclical graph has no nodes that can get back to itself.

Weighted Graphs are graphs that hold a numeric value (**cost**) in the edges. an example of this could be distance between 2 nodes in navigation.

Uses for graphs are plentiful and there are many graphs to use. Some common examples are navigation and social media friend networks.