Introduction to data structures: Tree

Learn about the tree data structure in an intuitive way

November 24th, 2024 4 minute read

The tree data structure is the first non-linear data structure that we will cover. It's a data structure in which elements form a hierarchy.
Let's learn what the tree data structure is and how it works.

What are non-linear data structures?
To answer this question, let's first describe linear data structures. If we traverse the elements in a linear data structure, starting from the first element and going to the last element, we would get a line. The reason is that each element has a single predecessor and a single successor. The first and last element are an exception of course (the first one has no predecessor and the last one has no successor). Examples of linear data structures that we have covered in previous articles are: array, linked list, stack and queue.
If we traverse non-linear data structures, we will not get a line because each element can have multiple predecessors and multiple successors. That means that when traversing an element there are multiple routes to take, not a single one as in linear data structures.

The tree data structure is a non-linear data structure in which each element has a single parent (i.e. predecessor) and multiple children (i.e. successors).
It gets its name from real trees. A real tree starts with a root and grows upwards. The root is followed by a trunk. The trunk has branches. Those branches have their own branches, that have even smaller branches... Finally, there are leafs.
The tree data structure is visualized with its root element at the top and it grows downwards.

Image taken from: https://tree-visualizer.netlify.app/

The elements of the tree data structure are also called nodes and the links between them are called edges.
Let's go over some terminology:

Parent - A node's predecessor.
Child - A node's successor.
Root - The first node in the tree. This node has no parent.
Leaf - A node that has no children.
Tree height - the number of edges from the root node to the furthest leaf node.
Tree size - the number of nodes in the tree.

In the example above, parent nodes are: 0, 1 and 2. Child nodes are: 1, 2, 3, 4, 5, and 6. The root node is 0. Leaf nodes are: 3, 4, 5 and 6. The tree height is 2. The tree size is 7.

The tree data structure usually provides the following operations:

add - used to insert a node into the tree.
remove - used to remove a node from the tree.
find - used to find a node in the tree.
traverse - used to traverse every node in the tree.

A tree data structure can be traversed in multiple ways. The reason is that each node can have multiple child nodes (i.e. the data structure is not linear) and we need to decide the order in which to visit the nodes. There are two main traversal approaches:

BFS - breadth-first search.
DFS - depth-first search.

Breadth-first search is a traversing approach in which the tree is traversed one level at a time. It starts at the root and it visits every node on each level before going to the next level. In the example above, the nodes will be visited in this order: 0, 1, 2, 3, 4, 5, 6.

Depth-first search is a traversing approach in which the the tree is traversed in depth, visiting the leaf nodes before coming back up to traverse the other child nodes. In the example above, the nodes will be visited in this order: 0, 1, 3, 4, 2, 5, 6.

There are different types of trees. They differ by the number of children each node can have and the way that they operate. Here are a few examples:

Binary trees - trees in which each node can have at most two children.
Binary search trees - binary trees in which the value of the left child must be smaller than the value of the parent and the value of the right child must be larger than the value of the parent.
AVL trees - trees that "balance" their height when add/remove operations are performed. This allows efficient search operations in the tree.

Let's give a few practical uses of the tree data structure:

The HTML DOM - the HTML Document Object Model is represented with a tree data structure. Each HTML element has only one parent (except the root element that has no parent) and each HTML element can have multiple children (e.g. "div" containing "a" and "img" elements).
File system - the file system can be represented with a tree data structure. Each folder has a single parent folder (except the root folder) and each folder can have multiple children (i.e. other folder or files).
Class hierarchy - single class inheritance can be represented with a tree data structure. Each class has a single parent class (except the base class) and each class can have multiple child classes.

How do trees work internally?
Depending on the type, the tree data structure can use arrays or linked lists internally. In the most general case, the tree data structure can be created using nodes that have a link to their parent node and a linked list that consists of their child nodes. The nodes have to "hold" the data as well.
In the case of a binary tree, we don't need a linked list for the child nodes. Binary trees can have only two children, so we can use two links instead. Binary trees can also be represented using arrays.

In future articles we will cover some of the different types of trees.

If you enjoy my articles, please consider supporting me.