"The unreal is more powerful than the real, because nothing is as perfect as you can imagine it. because it’s only intangible ideas, concepts, beliefs, fantasies that last. Stone crumbles. Wood rots. People, well, they die. But things as fragile as a thought, a dream, a legend, they can go on and on.” – Chuck Palahniuk
Machine Learning – Introduction
Like a giant nervous system, computers lie at the centre of human civilization, driving our very lives.
No single technology has so affected the modern world as the digital computer. No invention conjured such endless possibilities.
If you go to the buzzing business district of Rio de Janeiro, the crowded terminals at Heathrow or the immense port at Rotterdam, you will have a vague idea of the scale to which computers influence our world. Cars, planes, trains, subway systems, border controls, banks, stock markets and even the records of our very existence, are controlled by computers.
If you were born after 1990, then chances are that most things you ever experienced, used or owned are a mere fabrication of computational omnipotence.
The production, extraction, and distribution of oil, gas, coal, spices, sugar or wheat all depend, in one way or the other, on computational power. The silicon chip migrated into nearly every home, monitoring our time, entertaining our children, recording our lives, storing personal information, allowing us to keep in touch with loved ones, monitoring our home and alarming the police in the event of unwanted intruders.
And recently, computers have begun taking over the most fundamental tasks of our brains: pattern recognition, learning and decision making. Theories, techniques and concepts from the field of artificial intelligence have resulted in scientists and engineers building ever more complex and “smarter” systems, that, at times, outperform even the brightest humans.
In recent times, one subfield of artificial intelligence in particular has contributed to these massive advances: machine learning.
Figure 1: Abstract image depicting Machine Learning
” It’s ridiculous to live 100 years and only be able to remember 30 million bytes. You know, less than a compact disc. The human condition is really becoming more obsolete every minute.” – Marvin Minsky
Each subfield of artificial intelligence is concerned with representing the world in a certain way, and then solving the problem using the techniques that this model supports. Some subfields, for example, might model problems in graph form so that the computer can easily “search” for a solution by finding the shortest path from one point in a graph to another.
Other techniques might represent problems, such as solving a sudoku puzzle, using a matrix of numbers and imposing restrictions or “constraints” on some points in that matrix which the computer can not violate (this subfield of AI is known as “constraint programming”).
A solution to a problem expressed in such a way would be a combination of numbers so that none of these numbers violate the given constraints. Similarly, the field of machine learning looks at the world through its special lens: that of using large amounts of data to solve problems by means of classification.
That is, machine learning is all about getting computers to learn from historical data, and classifying new, unseen data, by using the “knowledge” gathered from having looked at this past data. Whilst this may sound complex, the essence of it is achieved by using techniques borrowed from statistics to write programs that are good at pattern recognition. This, in turn, is the fundamental philosophy behind machine learning, one eloquently expressed by the fictional character Maximillian Cohen in the cult classic “Pi” when he describes his three assumptions about the world:
1. Mathematics is the language of nature.
2. Everything around us can be represented and understood through numbers.
3. If you graph the numbers of any system, patterns emerge.
Therefore: There are patterns everywhere in nature.
Evidence: The cycling of disease epidemics; the wax and wane of caribou populations; sun spot cycles; the rise and fall of the Nile.
Although from a philosophical point of few, looking at the world in terms of numbers and patterns may seem rather cold-hearted, this way of thinking works well for solving the types of problems for which i) large amounts of data are available, ii) the problem itself is best understood through patterns, and iii) the information needed to solve a new problem in the domain constantly varies.
The first requirement – that of data – is pretty much self-explanatory. Given that machine learning is concerned with pattern recognition, then the more data we have, the easier it becomes to identify certain patterns in the data.
Take, for example, the problem of object recognition in images. Or more specifically, the problem of differentiating cats from dogs in digital images: The problem itself is well suited to pattern recognition, since i) it is relatively easy to collect a large number of digital photos of both cats and dogs, and ii) the images themselves tend to be similar enough in order for patterns to emerge across differences.
If we only ever had a single image of each animal – one of a dog, and one of a cat – then recognizing patterns between the two would be impossible. If, however, we had a thousand such images, the similarities and differences across these images will become apparent, allowing them to be categorized and labelled.
Whilst the need for data is easy to understand, the statement with the second requirement that “solving problems using pattern recognition requires that the problem itself is best understood through patterns” may seem silly at first. Nevertheless, it is an important one!
Unlike the fictional movie character Maximillian Cohen likes us to think, not everything in the world can be best understood through patterns.
Take for example, the problem of multiplying two numbers. The multiplication follows a predefined set of mathematical and accounting rules. Creating software that allows users to calculate the result, based on user input, will be easier if we simply write down the rules for multiplication (for example, 2×2) in computer code, instead of collecting thousands of multiplications and building up a statistical model to try and find patterns across these formulas.
Similarly, the problem of finding the shortest path from one point on a map to another might be better suited by modelling the map as a graph, instead of collecting the paths that thousands of citizens take to work every day, and trying to find some sort of pattern for each of the different points in the graph.
The third requirement – that is, the variation of information – refers to the inconsistencies in the input data when presenting the machine with a new problem to solve.
Returning to our example of distinguishing cats from dogs, it is easy to see what is meant by this statement: every cat and every dog is slightly different.
Although the general characteristics of each animal are the same, minor variations exist both in body shape, colour and size of each animal. Therefore, any new image that we ask the machine to classify, will differ slightly to the previous images that it has seen. The information needed to solve a new problem therefore, constantly varies.
This stands in stark contrast to the problem of finding the shortest path between two points on a map: whilst people might try to find the shortest paths between different points on the map, neither the points nor the paths themselves are likely to change (only in exceptional circumstances, such as when a new road or new building is being constructed). In this case, the information provided by the user to the machine will never change drastically. The user might ask to find the path between A and B, or B and C, or A and C, but the points themselves (A, B and C) will never change.
What to expect from this new Machine Learning Tutorial Series?
We, therefore, see that machine learning, far from being a magic pill, is a powerful problem-solving technique that works well for certain types of problems. That is, problems that are defined by strong variations in input, patterns and lots of data. What these problems look like, and exactly the different techniques that machine learning relies on to solve these problems, is the sole purpose of this Machine Learning tutorial series.
Your knowledge of the field will be built from the ground up, using practical examples and easy to understand explanations – without mathematics or complex technical terms.
We will start our journey by first examining what computer science and artificial intelligence are, and what they try to achieve. Then we will move on to focus specifically on machine learning, exploring the different concepts, how they work, and what their advantages and limitations are. By the end of this series, you will have a solid understanding of the topic itself, will be able to understand advanced technical concepts and differentiate fact from fiction, marketing hype from reality.
Here’s a list of topics that you’ll see in the future tutorials:
- What is Computer Science? (Covered in this tutorial)
- What is Artificial Intelligence?
- What is Machine Learning?
- Decision Trees Explained
- Neural Networks Explained
- CBC, Logic, Programming, Instance-based learning
- Deep Learning
- The Bigger Picture
What is Computer Science?
”If people do not believe that mathematics is simple, it is only because they do not realize how complicated life is” – von Neumann
Before we can start discussing the various concepts and techniques used in machine learning, we must understand what artificial intelligence actually is. And before we can understand artificial intelligence, we must first understand how computers work and what computer science actually is. We therefore begin this series about Machine Learning by introducing Computer Science.
In its very essence, Computer Science lies at the cross-roads between mathematics and engineering, and it draws upon both fields to teach you how to analyse and solve problems and represent them in such a way that digital computers can process them.
Figure 2: Computer Science Mind map (Image Credit: Harrio)
The advent of Computer Science
Ever since their domestication, the workings of the computer have often been rendered inexplicable.
It is often difficult to comprehend that the computer is just a big calculator, a progression of the dry sciences of electronics and mathematics, performing addition, subtraction, multiplication and division.
While pure mathematics and computer science have transgressed over the past few decades, mathematics plays a fundamental role in the development of computational systems. It was really mathematics that provided the advent for our digital evolution, allowing us to represent our world in an abstract and logical fashion.
To truly appreciate just how profound a difference mathematics made to our existence, we need to look back 12,000 years to the first prehistoric settlements along the banks of the River Jordan.
Among the round houses and plastered storage pits, we find the origins of mathematical and scientific thought. Prior to their settling, hunter-gatherer communities led a very monolithic existence. Life was relatively simple and therefore little need arose to introduce abstract thought that could simplify problems. Anthropologists suggest that hunter-gatherer communities maintained a limited mathematical ability that did not exceed a vocabulary foregoing the number “ten”. Archaeological discoveries indicate that such limited knowledge was used by cavemen as early as 70,000 years ago for time measurement or quantifying belongings.
As communities settled, their needs evolved, as did their problems. With an increasingly sophisticated lifestyle, and therefore respectively sophisticated volumes of information, people were in need of a way to reduce chunks of information to essential characteristics. We call this system of summarizing the main points about something without creating an instance of it, abstract thought.
In addition, people began using symbols or sounds to represent these characteristics, without having to consider the characteristic itself. This realization was the birth of mathematics. The idea of using one “thing” to represent many “things”, without considering any “thing” in specific.
Figure 3: The Ishango bone, a tally stick, was used to construct a number system. This bone is dated to the Upper Paleolithic era, around 18000 to 20000 BC (Source: Wikipedia)
Take the number “eight” as an example. “Eight” or ”8” is merely a symbol or sound. It is not a concrete thing or object in the physical world. Instead it is an abstract thought that we can use to represent an amount of “things” – be it trees, animals, cars or planes.
Mathematics became a projection of a tribe’s sophistication and a magnification of their own thoughts and capabilities. As the first civilizations developed, they used such skills to solve many practical problems such as the measuring of areas of land, calculating the annual flooding of rivers or accounting for government income. The Babylonians were such a civilization and their work on mathematics is used to this day. Clay tabs found by archaeologists, in what is modern-day Iraq, show squares, cubes, and even quadratic equations.
Figure 4: The Babylonian mathematical tablet Plimpton 322, dated to 1800 BC (Source: Wikipedia)
To deal with complex arithmetic, the Babylonians developed primitive machines to help them with their calculations; the counting board. This great-grandfather of the modern computer consisted of a wooden board into which grooves were cut. These grooves allowed stones to be moved along them, representing different numbers. Such counting boards proved to be building blocks for modern mathematics and were arguably a considerable step towards academic sophistication.
With the eventual decline of the Mesopotamian civilization, much of the old wisdom was abandoned and forgotten, and it wasn’t until 500-600 BC that mathematics began to thrive again, this time in Greece.
Unlike the Babylonians, the Greek had little interest in preserving their ancestor’s wills. Their trade and battles with foreign cultures brought about swift change, as the Greek noblemen embraced the unknown. Frequent contact with foreign tribes opened their eyes and they soon came to appreciate new knowledge and wisdom. As they built the first Universities, the history of the world began to change at a great speed. Many of the world’s most brilliant minds began to gather at these centers of knowledge, exchanging theories and schooling the young.
One of these disciples was Aristotle, a student of Plato’s. He was to become the teacher of Alexander the Great and profoundly shaped Western thought by publishing many works on mathematics, physics and philosophy. One of his greatest feats was the incorporation of logic which, as we will see later on, led to the development of the first digital computer. The notion of logic will be discussed in one of the upcoming tutorials in this series; however, the basic idea is that, given two statements, one can draw a conclusion based on these statements. For example:
1. All Greeks are mortal
2. Aristotle is Greek
3. Therefore, Aristotle is mortal
This may seem rather obvious now, but at the time, such formalization of thought was an exceptional breakthrough in the way people viewed their surroundings. In fact, it was so significant to western culture that, nearly two thousand years later, a mathematician at Queens University Cork (now University College, Cork) devised an algebraic system of logic that expressed Aristotle’s logic in the form of algebraic equations. Known as Boolean Logic, the system turned out to provide the basis for digital electronics, and, without it, computers as we know them today would probably have never come into existence. In his memory, the University installed a stained-glass window in its Aula Maxima, showing Boole at a table with Aristotle and Plato in the background. In later years, the University named its new library after him and in addition launched an annual mathematics competition for students, allowing them to obtain the “Boole Award”.
Electrifying abstract thought
With an appreciation of the power of abstract thought and boolean logic, we now know that, given a number, we can manipulate it to illustrate anything: words, thoughts, sounds, images and even self-contained intelligence.
What we do not yet know is how to overcome the barrier between the metaphysical number and merge it with the physical world around us. That is, how can we use wires and electrical currents to bring complex mathematical equations to life?
As computers consist of switches (imagine those switches to be similar to the light switch in your room – the switch can either be on or off – in other words, it can either allow the flow of electricity or prevent it from flowing), the key lies in representing mathematics in the form of electric flows. That is, reducing numbers to two states; on or off. Or, as Aristotle would have put it; true or false.
As our numbering system consists of 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9 – in mathematical terms: base 10), we need a notation that allows us to represent a sequence of numbers using only two digits. The fabrication of such a scheme is accredited to Gottfried Leibniz, a German 17th century mathematician, who, while constructing a mechanical calculator, realized that using only two digits instead of 10 simplifies not only the construction of his machine but also requires fewer parts. The concept behind his idea was that any decimal number can be represented in patterns of 1’s and 0’s. For example, the decimal number 421, can be converted to read 101010. (we will examine the mathematics behind these conversions in the next section). The number 10011110 can be converted to read 158 and so on. This numbering system of base two (1’s and 0’s) is referred to as the binary.
Leibniz, in addition to being a mathematician and multilingual philosopher, was also among the first Europeans to study Chinese civilization. It was this passion with Chinese society that caused Leibniz to realize that the Chinese had preceded him. Having availed of the binary code for centuries, the Chinese used a different notation: broken and unbroken lines instead of Leibniz’s 0 and 1. Staggered by this discovery, Leibniz outlined his newly found knowledge in ”Explication de l’ arithmetique binaire” in which he stated that ”this arithmetic by 0 and 1 happens to contain the secret of the lines of an ancient Chinese king and philosopher named Fohy (Fu Xi), who is believed to have lived more than 4,000 years ago, and whom the Chinese regard as the founder of their empire and their sciences”.
What Leibniz was ultimately referring to in his writings was the yin-yang – a complex system based on the principles of what we call binary arithmetic; a two-symbol representation of the world around us. The broken (yin) and unbroken (yang) lines represent opposites: male and female, day and night, fire and water, negatives and positives. Coherently, the principle of the yin-yang is the building block of Chinese civilization and is at the heart of its society. Despite its significance, Leibniz’s (re)discovery was soon to be forgotten. It wasn’t until the 19th century that the logic upon which we have become so reliant was once again revived and elaborated upon with the emergence of George Boole’s boolean algebra.
The astute reader will have noticed a similarity between the ”on” and ”off” of the switch and the 1’s and 0’s in boolean logic; if we combine boolean logic and the idea of the switch, then we can see that any decimal numerical combination can be stored and processed in patterns of 1’s and 0’s using silicon, some switches and transistors. If then we would have a “controlling chip” that could perform simple arithmetic on these binary numbers, then we can represent any type of mathematical formula, functions or logical operations. If we can represent mathematics in the form of electric flows, then we can create entire models of actions and representations of different states. This, in essence, is how the digital computer works: translating man-made mathematical models that represent the real world into electric signals using boolean algebra.
From the “switch” to Computer Science
Now that we have a rough idea as to how computers work, it’s time to answer a fundamental question: What is Computer Science? The Oxford Dictionary defines computer science as “the study of the principles and use of computers.” But what exactly does that mean? And what can one expect to do when signing up to study computer science?
Most people think that computer science is all about learning how to program. Whilst programming certainly is an integral part of computer science, the field itself is about much, much more than just that. Instead of just teaching you how to write computer code, computer
science lies at the cross-roads between mathematics and engineering, and it draws upon both fields to teach you how to analyse and solve problems, using technology. It teaches you how to break down complex problems and how to describe solutions in a formal way, so that a solution’s correctness can be validated and proven. And, in a way that a computer can understand and execute these solutions. To allow you to do this, Computer Science will give you a fundamental understanding of how information technology works, from bottom to top.
Figure 5: Computer science in one picture.
And as a result of this, you will be learning how to use certain technologies – although learning how to use a technology is not the focal point of computer science.
That is, students of computer science will learn how to solve complex problems in such a way that they can be solved by a computer. To do so, the problem itself is first abstracted, and a general solution to this abstracted problem is created. This solution is then implemented, using a specific technology. For example, using the Java programming language whose instructions are executed using a virtual machine installed on a laptop running Microsoft Windows or Linux. The focus here is to become capable of abstracting and solving problems, and technologically competent enough to implement solutions to these problems. The aim is not to become an expert user of a specific technology (although this is usually the side-effect of repetitively solving and implementing solutions to problems).
…so what do Computer Scientists do?
We previously talked about “what Computer Science is” and discovered that computer science is, in essence, concerned about solving complex problems, using technology.
But, if you were a computer scientist or software engineer, how exactly would you go about doing that?
In essence, there really are only 4 things that computer scientists do:
1. When you encounter a problem, the first thing you do is analyse the problem, and see whether you can decompose it into smaller problems that are easier to solve. That is, a computer scientist would spend a large amount of time thinking about how to break down a problem down into smaller problems, and how to determine just how difficult a problem can be to solve. Abstraction forms an important part of this process (in their book “Artificial Intelligence: A modern approach”, Stuart Russell and Peter Norvig, give a concise definition of abstraction as the “process of removing detail from a representation”).
2. Once computer scientists have thought about, and analysed a problem, they can begin to develop solutions to the analysed problem. Each solution, they will need to describe formally. That is, they typically write down an abstract set of instructions (called algorithms) that solve the said problem. Such instructions must be universally understandable, concise and free of ambiguity. Above all, these solutions must be free of implementation details.
3. Once they have come up with a solution, computer scientists must prove the correctness of their solutions to all the other computer scientists out there. That is, they must demonstrate that their solution correctly solves the given problem, and any other instances of the problem. For example, consider the problem of sorting an unordered list of numbers: 3, 1, 2. Your solution must correctly sort this list of numbers in ascending order: 1, 2, 3. However, not only must you demonstrate that your solution is capable of sorting the specific numbers 3, 1, and 2; you must also demonstrate that your solution can sort any other unordered sequence of numbers. For example, 192,384, 928, 48, 3, 1,294,857.
4. Last but not least, computer scientists must implement the given solution. This is the part where programming comes into play, and where you really get your hands dirty. As such, when people think about computer science, this is the part that they tend to think about. Programming is all about learning how to write instructions that a computer can understand and execute, and is a huge field in itself.
In order to become competent at analysing problems and developing solutions to these problems, you will need to understand the fundamental principles behind information and computer systems. As such, a computer science course will typically cover topics such as:
1. Logic and boolean algebra. These are huge, fundamental aspects of computer science. We already touched upon this in earlier sections of this book. Understanding boolean algebra will help you understand how silicon chips combined with some electrical wiring, can work together to perform incredible complex tasks such as connecting to a massive network of other computers and playing YouTube videos.
2. Theoretical foundations behind information, computer architectures, algorithms and data structures. The latter are, in essence, constructs that hold and organize data and give rise to efficient ways of processing and, searching and retrieving data. A very simple type of data structure which every reader should be familiar with is a list which represent data, such as numbers, in a linear sequence (for example: 1, 2, 3). There exist however many more complex data structures, such as trees which organize data not in a sequence but in the form of nodes, branches and leaves (imagine this just like a christmas tree).
3. Theories behind data organisation, storage and the processing of different types of data. These theories give rise to things like databases (which are pieces of software that focus on storing data in the most effective form so that the data can be recovered quickly from disk) and multimedia (which is concerned with things like image file formats and recording, storing and reproducing sound and video).
4. Networking and communication. This, in essence, is the study of “how computers talk to each other”.
5. Theories governing secure systems. That is, how to prevent attacks or unauthorized access on computer systems, how to communicate securely and how to encode data in a secure fashion (cryptography).
6. Programming and software engineering. This teaches you how to actually turn abstract solutions to problems (i.e. algorithms) into actual programs that can be executed by a computer. It is also concerned with how to best design and implement said programs, as well as with the development of new programming languages.
7. Advanced, domain-specific topics such as how to create programs that learn by themselves or exhibit intelligent behaviour, a subfield of which (machine learning) is the focal point of this book.
Machine Learning Example
At this point, we know:
i) how computers work (electric switches that can perform logical operations using boolean algebra),
ii) what computer science is (the study of translating real world problems and solutions into abstractions that can be solved using boolean algebra), and
iii) what computer scientists do (analyse, solve and translate problems and solutions).
Let’s now look at how we would actually go about studying a real-world problem, simplifying and abstracting it, producing an algorithm to solve it. We won’t dive into writing an actual
implementation of our algorithm – after all, learning to program is far beyond the scope of this series!
Furthermore, the process and fundamentals outlined in this section might seem complex at first and are designed to give you an appreciation and understanding of the entire field of computer science. A thorough understanding of the problem itself won’t be necessary to understand the future articles of this series, but will of course help you appreciate the complexity behind artificial intelligence and machine learning in general.
The sample example problem that we will be focusing on in this section is that of finding the shortest route between two points on a map.
It is a problem that has been well-studied in computer science and is one that we have all faced in real-life. Its solution transformed logistics and travel worldwide, and has been applied to a wide range of other problems that are unrelated to maps or transportation. Many excellent manifestations and implementations of the solution to the shortest-path problem exist in products such as Google Maps, Waze or Uber. As such, the problem itself shouldn’t require much explanation: given a map, and two points on this map, we want to find the shortest path between these two points. For example, given that we are at the town hall, we might want to find the shortest route to the local library.
The first thing we should do is find a way to abstract the problem. That is, we should forget everything about physical maps, buildings, cars and the city. Instead, we should think of a way in which we can represent the map itself in the simplest form possible, capturing its essence and removing everything else that is not needed. One way to do this is to represent the map as a graph.
We have all seen a graph at one point or another in life, and we know that it basically is just an abstract diagram. The nodes in the graph represent the different buildings on our map, whilst the edges (lines connecting the nodes) represent the streets between these different buildings. For the sake of simplicity, we will ignore the length of streets or the amount of traffic on them and just consider the number of points (or nodes) that a driver would need to traverse in order to reach his or her destination.
Using this graph representation, we have simplified or abstracted our map, since we now:
1. Removed the unnecessary pieces of information, such as colours, gradients, trees and shapes of the streets.
2. Maintained only the essence – that is, the locations and paths between the different locations – of the map.
3. Represent the map in such a way that we can reason about it more easily and represent the map digitally (as we will see shortly).
Now that we have represented our problem in a more abstract form, we must think about solving it. How exactly would we determine the shortest path from one node to another?
Well, to provide an answer to this question, we must first look at what exactly the shortest path is. Given our representation of the world (or map) in graph form, the definition of the shortest path between two points is simply the path on the graph that traverses the least number of nodes. Looking at figure [TODO], we see that path B is shorter than path A, since path B involves only travelling across 2 nodes, whilst path A involves travelling across 3 nodes.
Starting at a given node M, we therefore know the shortest path to T to be the path that passes through R and S. This is easy to figure out by just looking at the graph and then counting the
number nodes in our path. But how would we write a formal set of instructions (an algorithm) to describe a solution?
Well, let’s think about simplifying the problem a bit: by definition, starting at node M, the shortest path to node T would be the shortest path to node T from either node N or node R. Likewise, from node R, the shortest path to node T would be the shortest from node S to node T; from node N, the shortest path to node T would be the shortest path from node O to node
T. From node O, the shortest path to node T would be the shortest path from node P to node T. And so on, so forth. Therefore, if we were starting at node M and wanted to find the shortest path to node T, then we would only need to know the length of the path from node R to node T and the length of the path from node N to node T. We would then need to choose the shortest of the two.
We could therefore write a piece of code that calculates the length of the path between two nodes. We would then invoke this piece of code for all the immediate neighbours of our starting node, and then choose the smallest number (i.e. shortest path length) returned.
Since we need the actual path, and not just the length of the path, we would somehow keep track of the nodes with the minimum path length, and then use the node with the minimum path length as our new starting point.
Let’s call the piece of code that calculates the length of the path from one node to another pathLength(a, b). Don’t worry about how this length is being calculated for now. Just assume that some dark magic returns the length of the path, if we want to get from node a to node b, where node a and b can vary to refer to any node that we like.
When starting off at node M, we therefore use the pathLength code twice. Once to get the length from R to T, and once to get the length from N to T:
length from R to T = pathLength(R, T)
length from N to T = pathLength(N, T)
By looking at the graph, we know that pathLength(R, T) will return 1, whilst
pathLength(N, T) will return 2 (since it takes two nodes to get to T).
Since 1 is less than 2, we know that the shortest path from M to T will involve us travelling through node R. We therefore record R by adding it to our list of nodes that we use to keep track of our shortest path (let’s call this list path).
Our new starting point is now R, so we will need to execute pathLength for all of R’s immediate neighbours: U and S.
length from U to T = pathLength(U, T)
length from S to T = pathLength(S, T)
Since S is directly connected to T (and hence passes through no intermediate node), the length of the path from S to T is 0. The length of the path from U to T is 1, since we need to pass through P to arrive at T.
We therefore choose S and add it to our path list. Our path list now contains the nodes R and S, and hence our shortest path from M to T, which passes through R and S. And that’s it! We have discovered a simple way of calculating the shortest path from any node in a network or graph to any other. Don’t worry if this seems a bit complex. You won’t need to memorize or fully understand every fact of this solution in order to understand future chapters.
We now want to put this solution down on paper in a way that it is easy to understand and can later be converted into code by a programmer. Let’s do this now:
shortestPath(from, to): minPathLength = infinity bestNode = nothing
for neighbour n in getNeighbours(from): if n is to:
length = pathLength(n, to) if length < minPathLength:
minPathLength = length bestNode = n
return [bestNode] + shortestPath(bestNode, to)
Does this look overwhelmingly complex?
Don’t worry, let’s break things down. shortestPath is the name of the piece of code that will calculate the shortest path from one node in the graph, to the other. It will return the actual shortest path in the form of a list. For example, shortestPath(M, T) will return [R, S, T].
The first two lines in the shortestPath algorithm simply say that the value for minPathLength is set to infinity (or just some really really large value); and that bestNode is empty (or nothing). Next, we will go through all of the neighbours from our starting node. If a given neighbour is our destination node (to), then we terminate, returning an empty list
containing only the destination node. Otherwise, we perform our path length calculation by executing the pathLength code which returns the length from n to our destination node.
Once we know the length, we check it against the minimum length known to date: If it is less,
then we set the minPathLength to the new minimum length, and bestNode to the node containing the new minimum length.
We then terminate by merging the list containing our bestNode with that of the list returned by a new invocation of shortestPath from bestNode to the destination node.
If this explanation and the algorithm itself is confusing, try to spend some time thinking about it. Don’t worry if you didn’t fully understand it: as long as you understand the overall idea behind:
1) abstracting a problem
2) formalizing a solution to an abstract problem
…then you understand the very essence of computer science, and are well equipped to move onto the next chapter. To the reader more familiar with abstract reasoning: We greatly simplified the problem and solution to the shortest path problem in this section. Many different, more complex and more efficient solutions to this problem exist. For example, we could the same approach (modelling the map as a graph and traversing the nodes and edges of this graph to find the shortest path) to also factor in things like traffic or the length of the roads by applying weights to the different edges, and then finding the path with the minimum sum. But this is far beyond the scope of this article.
In this article, we explained the very essence of computer science. We first bridged the gap between real world problems and silicon chips, giving an indication as to how, using logic and abstract thought, we can model real problems and have them solved by a computer. The take home message here is that we can use abstract thought to summarize or represent our world in such a way that we can reason about it and produce general solutions to concrete problems. That is, using abstract thought, logic and mathematics, we can take a problem and formulate a solution to the problem in such a way that the solution can be applied to other instances of the same problem.
By formulating solutions in a certain way, we can use computers to solve them for us by expressing the solutions using a precise set of instructions. Under the hood, the computer translates these instructions, using boolean algebra, in such a way that we can use electrical currents, silicon chips and wiring, to calculate a result for us.
In the forthcoming articles of this series, we will see:
This article has been editorially reviewed by Suprotim Agarwal.
C# and .NET have been around for a very long time, but their constant growth means there’s always more to learn.
We at DotNetCurry are very excited to announce The Absolutely Awesome Book on C# and .NET. This is a 500 pages concise technical eBook available in PDF, ePub (iPad), and Mobi (Kindle).
Organized around concepts, this Book aims to provide a concise, yet solid foundation in C# and .NET, covering C# 6.0, C# 7.0 and .NET Core, with chapters on the latest .NET Core 3.0, .NET Standard and C# 8.0 (final release) too. Use these concepts to deepen your existing knowledge of C# and .NET, to have a solid grasp of the latest in C# and .NET OR to crack your next .NET Interview.
Click here to Explore the Table of Contents or Download Sample Chapters!
Was this article worth reading? Share it with fellow developers too. Thanks!
Benjamin Jakobus is a senior software engineer based in Rio de Janeiro. He graduated with a BSc in Computer Science from University College Cork and obtained an MSc in Advanced Computing from Imperial College London. For over 10 years he has worked on a wide range of products across Europe, the United States and Brazil. You can connect with him on LinkedIn