"It is not even clear that intelligence has any long-term survival value. Bacteria, and other single cell organisms, will live on, if all other life on Earth is wiped out by our actions." – Stephen Hawking
In the previous tutorial of our Machine Learning for Everybody series, we explained the fundamental concepts behind computer science, and hopefully conveyed an appreciation of the complexity of the field itself.
Through the sample problem of finding the shortest route between two points on a map, we tried to demonstrate the process of abstracting problems. It should now be clear to the reader that the first step in the construction of intelligent systems involves the translation of real-world problems into abstract forms.
What is still a mystery, however, is what lies beyond this. That is, at what point a system can be considered intelligent, what it means to use AI techniques to solve problems, and what the boundaries of these systems and techniques really are.
These questions are the focal point of this tutorial.
What is intelligence? What is Artificial Intelligence?
Before we can explain artificial intelligence, we must understand what intelligence actually is.
Unfortunately, this is an extremely difficult question, one for which we don’t actually have a globally accepted answer. Entire books across multiple disciplines – from psychology to philosophy, biochemistry and neuroscience – have been dedicated to discussing possible answers and definitions.
At its core, one of the main issues with defining intelligence is that we do not understand its purpose. Some objects or terms, such as money, have a clearly defined purpose. For example, the purpose of money is the storage and transfer of wealth. This makes defining the term very straightforward: “A current medium of exchange.”
On the other hand, not knowing the objective or actual purpose of something makes it difficult to come up with a universally accepted definition.
One popular argument has always been that intelligence aids long-term survival of a species.
But as Stephen Hawking, Ernst Mayr and others eloquently pointed out: there is no clear evidence for this. Less complex organisms – such as cockroaches or bacteria – tend to be much more robust than the intelligent Homo sapiens, and have existed for hundreds of millions of years. Mammalian species, on the other hand – much more sophisticated and with varying degrees of intelligence – survive for an average of 100,000 to 150,000 years before going extinct.
Indeed, Enrico Fermi – one of the main physicists responsible for the development of the atom bomb at the Manhattan Project – raised the question of why, given the massive size of the observable universe and the probability of many other Earth-like planets capable of sustaining intelligent life, we have not yet made contact with other intelligent life forms. The question became known as the “Fermi Paradox”, and one of the proposed answers is the “Great Filter” theory, which in essence proposes that intelligence acts as a “filter” for any sufficiently advanced civilization. That is, any sufficiently intelligent life form will eventually evolve the means for self-destruction (for example by developing nuclear weapons, or by manipulating its environment to such a drastic extent that it can no longer sustain it). According to this “Great Filter” theory, few intelligent life forms ever surpass this threshold of self-destruction, and it therefore “is the nature of intelligent life to destroy itself”.
Not only do we not know the fundamental purpose of intelligence, we also have many different angles from which we can look at the term itself. For example, psychologists tend to look at intelligence from the perspective of behaviour or culture, linguists and philosophers from the perspective of understanding, and computer scientists in terms of the ability to solve problems.
Consider for example the following four definitions of intelligence:
“The ability to acquire and apply knowledge and skills.” – Oxford English Dictionary (Lexico.com)
“Intelligence is not a single, unitary ability, but rather a composite of several functions. The term denotes that combination of abilities required for survival and advancement within a particular culture.” – A. Anastasi, 1992.
“[…] the ability to solve hard problems.” – M. Minsky
“Achieving complex goals in complex environments” – B. Goertzel
The first definition – the one given by the Oxford English Dictionary – approaches intelligence from the perspective of acquisition; that is, being able to gather, retain and apply knowledge.
The second definition, on the other hand, coming from the perspective of psychology, ties intelligence to survival and advancement within an established group or species. This makes the definition of intelligence both location and context dependent. That is, placing an astrophysicist amongst an isolated tribe in the Amazonian rainforest would, by definition, make them less intelligent, as their knowledge, skills and brain power would be unlikely to help them survive or advance in the jungle. Likewise, a tribesman would not be considered intelligent if placed in a modern metropolis.
Similarly, Minsky’s and Goertzel’s definitions are problematic if we were to accept them universally, as in order to determine whether someone or something is intelligent, we would need objective standards that define “hard problems”, “complex goals” and “complex environments”. This poses problems in and of itself, as the difficulty of solving certain problems or achieving certain goals tends to decrease as society becomes…more intelligent.
Furthermore, their definitions are difficult to reconcile with equally sensible definitions such as the one presented in the Oxford English Dictionary, which defines intelligence in terms of the ability to acquire knowledge rather than the ability to solve problems.
So, if we do not understand the purpose of intelligence and cannot agree on a universally accepted definition, then how can we even begin to define what it means to be “artificially intelligent”?
The truth is, that we can’t. Not completely at least.
The boundaries between what people consider to be intelligent, artificially intelligent and not intelligent, will (most likely) always be blurry, and subject to disagreement. A lot comes down to perspective, context and intuition.
However, given that, as part of this tutorial, i) we are not trying to discuss the purpose (or lack thereof) of our existence; and ii) the context of our discussion (i.e. that of solving abstract problems) is clear, we can view intelligence in terms of the ability to learn and use concepts to solve problems.
We first came across this definition during a lecture by Maja Pantic, a Professor of Affective and Behavioral Computing at Imperial College London, leader of the iBUG group and the Research Director of the Samsung AI Centre in Cambridge. The definition presented by Pantic makes sense as it incorporates skills, learning and the ability to solve problems, whilst leaving out the more problematic areas such as culture, behaviour and complexity.
Although the definition may not satisfy everybody, it provides a general basis for accepting or rejecting whether an organism or system behaves intelligently. It therefore allows us to move on to defining what it means to be artificially intelligent.
The beginnings of artificial intelligence
Having discussed what we mean by intelligence, it is now time to direct our attention back to the silicon and the switch, and look at how the idea of using machines to perform human cognitive tasks came about.
Following the invention of Boolean algebra, the next major epoch in the history of the computer would arrive at a pivotal turning point in one of the most devastating wars ever fought: World War II.
At the time, Germany commanded the largest submarine fleet in the world, sinking approximately one British vessel every 4 hours. To coordinate their attacks, the Nazis developed a complex code known as the “Enigma Code”.
Figure 1: The Enigma Machine
Facing annihilation, the British Government established a top-secret group of intellectuals, dedicated to breaking the Nazis’ “unbreakable” code. Mathematicians, chess players, codebreakers, astrologers, secret service officers, technicians, crossword experts and even actresses worked shifts around the clock in an old estate a few kilometers outside of London.
Among them was Alan Turing, a British mathematician who, prior to the war, had come up with the concept of the first computer. His machine, called the Turing Machine, was an imaginary device intended to process equations without human direction. The idea was a conceptual breakthrough: a machine resembling a typewriter that could use logic to compute any imaginable mathematical function.
Figure 2: The Turing Machine
To this day, computer scientists use the concept of “Turing Completeness” to measure computational systems. A machine is said to be Turing-Complete if it can compute any function that a Turing Machine can compute – that is, any computable function. Building abstractly on his previous work, Turing, with the help of Polish intelligence, developed an electromechanical machine capable of cracking the Nazi cipher. The “Bombe”, as the ton-heavy machine was called, was to be the predecessor of the first digital computer. By the end of the war, approximately 200 Bombes were employed by the allied forces, and Alan Turing was awarded a medal for his services to the British Government.
However, the Bombe did not prove to be the universal solution the allies had hoped for, as the Germans soon realized that their encryption mechanism had been broken and set about constructing an even more complex code, which the allies promptly referred to as “The Fish”.
Once again it was the British, together with allied intelligence, who came to the rescue. At their code-breaking center at Bletchley Park they constructed the world’s first programmable digital computer, called the “Colossus”, which they used to decode the German cipher.
Figure 3: The Colossus Computer
This machine was vastly inferior to today’s computers, but what a sight it must have been: the size of a room and consisting of 1,800 vacuum tubes, yet incapable of performing simple decimal multiplication. Its construction was top secret, and it wasn’t until the 1970s that the British Government acknowledged its existence.
It was due to this secrecy that the great minds behind the birth of the computer were never credited. Instead it was the ENIAC (Electronic Numerical Integrator and Computer, 1946-1955) that laid claim to being the world’s first digital, high-speed computer. Developed for the US Government to calculate ballistic firing tables, the ENIAC was much larger than the Colossus: it required 548 square meters of floor space, weighed more than 27 tonnes and consumed approximately 140,000 watts of electricity.
Figure 4: ENIAC, the First Digital General Purpose Computer
For each new program, the ENIAC had to be re-wired, as, unlike today, computer programs could not be stored by the computer. A lot of the work was therefore hard manual labour, and it took a team of 6 technicians to maintain the machine. The vacuum tubes that made up the actual processor had to be replaced on a daily basis, as the machine consumed approximately 2,000 tubes per month. The machine could predict the effect of weather and gravity on the firing of shells – calculations that could take a human up to one day to solve. Not being able to store data proved laborious, and whilst Turing foresaw the development of stored-program computers, it was a mathematician by the name of John von Neumann who first authored the concept.
In 1945, von Neumann published a paper that outlined the architecture for such computers, dividing their inner workings into four sections: the central control unit, the central arithmetical unit, memory, and input/output devices. Von Neumann was accused of plagiarism by his team of researchers and consequently parted ways with them. However, the architecture he described remains at the core of every computer to this date.
Although the calculations performed by the Colossus, the ENIAC or the Bombe were complex, and solved real, cognitive puzzles and problems that could not be easily computed by a human brain, they were not considered “smart” or “intelligent”. The machines were complex, and built by the brightest minds, but at the end of the day, they could not acquire knowledge or learn by themselves.
To use the definition that we set out in the previous section, these machines did not have “the ability to learn and use concepts”. Instead, they followed fairly simple mathematical rules. What gave them an edge over the human brain, was that they could do so very quickly and reliably.
In 1943, in parallel to all wartime advances in information technology, Warren McCulloch and Walter Pitts started using their knowledge of biology and the human brain to construct a computational model of the neuron, built around the switch’s notions of “on” and “off”.
They showed that, using this model, they could mimic the functioning of the human brain whilst computing any computable function. Their work gave rise to the first “neural network” (which we will discuss in detail in an upcoming tutorial in this series on Neural Networks) and became the first officially accepted work on artificial intelligence. Their demonstrated way of computing brought information technology one step closer to the functioning of the human brain.
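The core of the McCulloch-Pitts model can be sketched in a few lines of code: a neuron sums its binary inputs and “fires” (outputs 1) only when an activation threshold is reached. The sketch below is a minimal illustration of the idea in Python, not the authors’ original formulation (which, among other things, also included inhibitory inputs):

```python
def mcculloch_pitts_neuron(inputs, threshold):
    """Fire (output 1) if enough binary inputs are active, otherwise stay off (0)."""
    return 1 if sum(inputs) >= threshold else 0

# With the right threshold, a single neuron behaves like a logic gate.
AND = lambda a, b: mcculloch_pitts_neuron([a, b], threshold=2)  # both inputs must be on
OR = lambda a, b: mcculloch_pitts_neuron([a, b], threshold=1)   # one active input suffices

print(AND(1, 1), AND(1, 0))  # 1 0
print(OR(0, 1), OR(0, 0))    # 1 0
```

Because such neurons can implement basic logic, networks of them can, in principle, compute any computable function – which is exactly the claim McCulloch and Pitts made.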
In 1951, Dean Edmonds and Marvin Minsky built the first actual physical computer based on this model whilst at Princeton University. At around the same time, Alan Turing published his famous article “Computing Machinery and Intelligence”, which laid the foundations of research into artificial intelligence by, amongst other things, introducing a test for determining whether or not a machine is capable of exhibiting characteristics of human thought. The test, known as the “Turing Test”, is described concisely by Stuart J. Russell and Peter Norvig in their book “Artificial Intelligence: A Modern Approach”:
“Rather than proposing a long and perhaps controversial list of qualifications required for intelligence, he [Turing] suggested a test based on indistinguishability from undeniably intelligent entities – human beings. The computer passes the test if a human interrogator, after posing some written questions, cannot tell whether the written responses come from a person or not. […] Turing’s test deliberately avoided direct physical interaction between the interrogator and the computer, because physical simulation of a person is unnecessary for intelligence.”
Although the test is a hallmark in artificial intelligence, and the paper in which it was presented gave birth to many concepts in the field, the test itself is of no major relevance to modern AI. As many observers correctly note: imitation is not the same as implementation.
Russell and Norvig, a mere half page further down from their description of the test, state: “The quest for ‘artificial flight’ succeeded when the Wright brothers and others stopped imitating birds and learned about aerodynamics. Aeronautical engineering texts do not define the goal of their field as making ‘machines that fly exactly like pigeons that they can fool even other pigeons’”.
As such, AI researchers concern themselves with studying the human brain and thought processes, and determining how to best build systems that can acquire knowledge and solve very specific problems (such as playing chess or recognizing cancerous growths in medical imagery).
Using our definition of intelligence, AI researchers concern themselves with creating computer programs that have the ability to learn and use concepts to solve problems. Given how difficult it already is to merely define intelligence, studying it is an even greater undertaking. Therefore, building systems that behave in an intelligent manner requires not just computer science, but knowledge drawn from a wide variety of fields, such as economics, linguistics, behavioural and cognitive psychology, biology and neuroscience, mathematics and philosophy.
Each of these fields tries to answer unique questions, from whose answers computer scientists working on intelligent systems draw conclusions and solutions, and in many cases try to implement them.
For example, economics tries to answer questions on how to best act under uncertainty, how to make decisions that produce optimal outcomes or how self-interested parties should best interact with each other.
Linguistics explains the structure and meaning of words.
Neuroscience on the other hand studies the physical functioning of the brain, whilst by looking at psychology we can draw general conclusions about how humans and other intelligent animals behave. Similarly, by studying philosophy we can gain insights into logic, meaning and how we arrive at certain conclusions.
By combining theories and insights from these different fields with mathematics and engineering we can, to some extent, develop software that can use historic data to learn and solve difficult problems within a specific domain.
Advancements in engineering
Implementing and using theories from fields outside of engineering and computer science to solve domain-specific problems still required actual advances in engineering.
Scientists and engineers needed machines physically capable of calculating hundreds of thousands of instructions per second in order to actually realize their implementations. Advances in artificial intelligence – which, as we saw in the previous section, is married to a variety of different fields and sciences – therefore relied heavily on actual advances in engineering.
Therefore, as the idea of using machines to help us solve real problems – whether by means of simple mathematics or with the aim of producing intelligent systems – began to gain a foothold among government officials, British, Australian and American universities began investing in more research projects on engineering and computer science. Researchers at different locations around the world – Cambridge, Sydney and Manchester – began work on the first stored-program computers. Their work soon bore fruit, triggering worldwide academic interest in the development of computers.
The technological revolution whose foundations were laid by warfare was soon to be embraced by enterprising businessmen. In 1952, an American company, IBM, which up to then had produced calculators, decided to enter the computer market by releasing the first commercially successful general-purpose computer.
Figure 5: The IBM 701 EDPM Computer
The IBM 701 EDPM Computer could execute 17,000 instructions per second and was available for rent at $15,000 per month. The nineteen units sold were used primarily by government bodies such as atomic research laboratories, aircraft companies, weather bureaus and even the United States Department of Defense. With such demand for these super-computers, IBM quickly realized that its future lay in computing technology. In 1954 another model was released, this time for use in universities and businesses, and by 1958 IBM’s computers could execute up to 229,000 instructions per second. IBM’s products were in high demand, some costing up to $3 million, with others available for rent at up to $60,000 per month.
Viewing computers and information warfare as a means of staying ahead in the Cold War arms race, the US government invested large sums of money to the benefit of corporations and universities. With such a bright future ahead, interest among academic communities kept growing as more and more universities began offering Computer Science degrees. From then on, America became the primary breeding ground for information technology, concentrating many of its resources on technological advancement.
The 1960s were years of technological revolution. Computers could solve all forms of mathematical equations, and given enough money and time, mathematicians began mirroring more and more aspects of our lives in mathematics. Some of this knowledge began manifesting itself amongst large corporations, which saved a lot of money by employing machines that could not only solve problems faster than the human mind, but also store massive amounts of information using a minimal amount of space.
One example of a commercial application that survives to this day is the SABRE system employed by American Airlines. SABRE was developed in the early 1960s and was the world’s first automated transaction processing system; it now connects over 30,000 travel agents with approximately 3 million online users. Other examples include UNIMATE, the first industrial robot, employed by General Motors, and the first Computer-Aided Design program, “DAC-1”.
By the mid-seventies, computers had established themselves inside most large US corporations, and the diminishing cost of hardware, together with increasing computer miniaturization, created new markets, opportunities and frontiers. Medium to small sized businesses could, for the first time in history, gain access to low-cost computers. Hobbyists and students took advantage of this and began purchasing computer kits. This proved to be the beginning of the personal computer era, and would eventually lead to the widespread adoption of computers, and hence of artificial intelligence.
Solving problems using Artificial Intelligence
So far in this tutorial, we have spoken about how we could go about defining intelligence (as the ability to learn and use concepts to solve problems) and how creating systems that behave in an intelligent way requires the combination of a wide range of fields and theories.
We saw that advances in computing are closely linked to wartime efforts, and how complex calculations performed by computers at the time – such as calculating missile trajectories – were difficult for humans to perform, but easy for computers to solve quickly and correctly. However, we also learned that, whilst performing these complex calculations might make the machine appear smart, appearing intelligent is not the same thing as behaving intelligently.
If performing complex mathematical calculations – such as missile trajectories – that are difficult even for the smartest human to solve (and to do so correctly and consistently) is in itself not considered an application of artificial intelligence, then what is?
What type of problems does a computer need to solve in order to be considered intelligent?
Broadly speaking, artificial intelligence concerns itself with problems that are difficult to define precisely. That is, problems that we can’t formulate abstractly in a straightforward way such that the computer can simply follow a single set of instructions. Mathematical equations, such as those for addition, subtraction or the calculation of a trajectory, might be difficult for our human brain to solve quickly, but their solution can be precisely defined using a simple set of rules.
The problem of multiplying 2 by 2, for example, is easily defined by the rule for multiplication. Once this rule has been defined, the computer just needs to follow it every time we want to multiply two numbers together. The computer follows a precise definition of the problem and arrives at a consistent solution. There is neither variation in the problem definition itself, nor potential scope for ambiguity in the solution. Returning to our mathematical example: multiplication will always be multiplication. It will never mean something else if the input data itself changes. If we were to multiply 2 by 2, the computer would follow the same rules for multiplication as if we were to multiply 250 by 12. Likewise, the solution itself is always clear: 2×2 will always produce 4, and 4 will always just be a number. It has a precise meaning that, within the context of the multiplication, won’t change.
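To make this concrete, here is a toy Python sketch of multiplication of non-negative integers expressed as repeated addition – one precise, mechanical rule. The same rule works for any pair of inputs, and the same inputs always produce the same output:

```python
def multiply(a, b):
    """Multiply two non-negative integers by repeated addition --
    a single, precise rule the computer follows mechanically."""
    result = 0
    for _ in range(b):
        result += a
    return result

print(multiply(2, 2))     # 4, every single time
print(multiply(250, 12))  # 3000 -- the same rule, just different inputs
```

There is no ambiguity anywhere in this program: the problem definition, the steps to follow and the meaning of the result are all fixed in advance.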
On the other hand, problems that can’t be defined precisely require a certain amount of knowledge, experience and feeling to solve.
Consider for example the problem of identifying emotions in facial expressions.
If you see a friend or family member, you will most likely be able to read their emotional state – whether they are happy, sad or angry – by looking at their facial expression. But trying to define a precise set of abstract instructions for determining the emotional state given a facial expression would be very difficult, if not impossible.
Even if you were to succeed in creating a formula that defines, for example, happiness on your friend’s face, chances are that this abstract formula will not apply to other faces. This is because each face is unique – the dimensions, colours and shapes vary, and so do the facial expressions associated with a given emotional state. For example, when smiling, some people might show their teeth more than others. Others might smile more with their eyes and less with their mouth. And so on.
Unlike in the case of multiplication, following one precise set of instructions here will not work for detecting emotional states. When you identify an emotional state, you do not follow one or two simple, abstract rules. Instead, you build on a rich history of having encountered different faces and emotional states as a child. You learned through reactions and experience, and subconsciously built up a model in your mind that now allows you to quickly recognize the physical patterns in someone’s appearance that convey an emotional state.
Furthermore, unlike with multiplication, there is always a certain level of uncertainty when it comes to problems that can’t be clearly defined. When multiplying 2×2, the answer will always be 4. But when trying to detect an emotional state, we may not always make the correct judgement call. We have all been in a situation in which a person’s reaction or mood surprised us: maybe we thought they were angry at us, whilst in fact they were merely stressed or worried. This means that every time our mind performs the subconscious series of “calculations” for detecting an emotional state, there is a chance for error. Or in other words, there is a numerical certainty (or probability) associated with each emotional state that we classify.
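In code, this uncertainty is often represented by assigning a probability to each candidate state rather than returning a single hard answer. The emotion labels and numbers below are purely illustrative, not the output of any real classifier:

```python
# A hypothetical emotion classifier assigns a probability to each candidate
# emotional state, rather than a single definitive answer.
prediction = {"angry": 0.15, "stressed": 0.55, "worried": 0.25, "happy": 0.05}

# The most likely state, together with how (un)certain we are about it:
best_guess = max(prediction, key=prediction.get)
print(best_guess, prediction[best_guess])  # stressed 0.55
```

Even the “best guess” here carries only 55% certainty, which mirrors how our own judgement of someone’s mood can turn out to be wrong.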
Last but not least, if we were to ask a computer to determine an emotional state based on a picture of a face, then a certain amount of variation is introduced with each photograph that we pass into the computer. When multiplying two numbers, the value of the two numbers might vary, but we are sure to always multiply actual numbers. With photographs, however, there will always be small (or large) differences that we need to account for. In some photographs, the face might be closer to the lens and hence appear larger. In another photograph, the person might appear slightly further away. Noise, contrast and lens quality might introduce additional variations. The problem itself is therefore never quite the same.
Artificial intelligence is exactly about solving such complex problems that at times are ambiguous and always difficult to formulate abstractly.
Instead of following a simple set of abstract instructions, artificial intelligence tries to use a model of the world to arrive at a solution.
A model is in essence a simplified or abstract representation of the world – or parts of it – and can be constructed in a variety of shapes and forms. Some are constructed by detecting patterns in knowledge or data that we have collected in the past, and try to represent these patterns in such a way that they can easily be used to make a prediction or classification on new, unseen data (this is the essence of machine learning). Others are complex mathematical models that express the world using logic and rules that determine what we can or can’t do within the model (difficult scheduling or timetabling problems are often solved in this way). Yet other models are generated on the fly as needed.
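The first kind of model – detecting patterns in past data and applying them to new, unseen data – can be illustrated with a deliberately tiny sketch: a 1-nearest-neighbour “model” built from a handful of made-up measurements. The feature values and labels below are entirely hypothetical, chosen only to show the principle:

```python
import math

# Hypothetical past observations: (feature vector, label) pairs.
# The two features might, say, encode "mouth curvature" and "eye openness".
examples = [
    ((0.9, 0.7), "happy"),
    ((0.8, 0.6), "happy"),
    ((0.1, 0.2), "sad"),
    ((0.2, 0.1), "sad"),
]

def classify(point):
    """A minimal 1-nearest-neighbour model: predict the label of the
    closest previously seen example."""
    nearest = min(examples, key=lambda ex: math.dist(ex[0], point))
    return nearest[1]

print(classify((0.85, 0.65)))  # happy -- close to the first two examples
print(classify((0.15, 0.15)))  # sad   -- close to the last two
```

Note that no explicit rule for “happy” or “sad” was ever written down; the prediction emerges from the patterns in the collected data, which is the essence of the machine-learning style of model.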
Coming up with and implementing these models is the main challenge faced by AI researchers. That is, the challenge lies in creating representations of the world that are accurate enough to help us solve complex problems, yet are sufficiently simplified so that they exclude any irrelevant data or assumptions that could negatively impact the result.
Not only do the models need to be an accurate representation of the world or domain in which they are used, they also need to be general enough to account for variations in input data or unexpected changes.
Last but not least, the data used to build the model needs to be organized in such a way that a computer can easily and quickly use it to determine the solution to our problem. Just like a phonebook organizes entries in such a way that we can easily look up a person’s number, a model needs to organize its data in such a way that a computer can process it within a reasonable amount of time. As we will see in the upcoming tutorials of this series, these are no easy challenges.
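The phonebook analogy translates directly into code. The same data can be stored as an unorganized list, which must be scanned entry by entry, or as a dictionary (a hash table) keyed by name, which finds the number in essentially a single step. The names and numbers below are made up:

```python
# The same phonebook data, organized in two different ways.
entries = [("Alice", "555-0100"), ("Bob", "555-0111"), ("Carol", "555-0199")]

# Unorganized: to find a number we may have to scan every entry.
def lookup_scan(name):
    for entry_name, number in entries:
        if entry_name == name:
            return number
    return None

# Organized as a dictionary (a hash table): lookup is a single step.
phonebook = dict(entries)

print(lookup_scan("Carol"))    # 555-0199
print(phonebook.get("Carol"))  # 555-0199 -- same answer, found far faster
```

For three entries the difference is invisible, but for millions of entries the organization of the data decides whether an answer arrives in microseconds or in minutes.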
Self-consciousness, subjective experiences and emotions
We often watch films, read science fiction novels or listen to pundits on podcasts that depict or predict almighty, all-powerful sentient machines. The popular media has so thoroughly promoted and intertwined the concept of consciousness with advances in artificial intelligence that we felt we must briefly speak about the importance (or lack thereof) of self-consciousness, subjective experiences and emotions.
In the previous section, we described how intelligent systems use models of the world to learn about, and solve, problems. Self-consciousness is the ability to understand one’s own position in this model – or, the world. Similar to intelligence, we know very little about consciousness and subjective experiences, how they come about and how they aid long-term survival. But in terms of short-term survival, several arguments exist that aim to explain its immediate existence.
Of these arguments, two are most prominent.
The first suggests that self-consciousness is a mere by-product of our brain. That is, it argues that consciousness is an emergent property resulting from a wide range of complex processes – such as emotion and memory – working together. It attributes no actual value to subjective experiences and consciousness itself, and simply accepts them as they are.
The argument makes sense if we look at the myriad of tasks and actions that we need to accomplish each day in order to stay alive: none of them really requires us to be aware of them. Determining whether a fruit is edible, for example, requires that our sensory abilities – such as taste, smell, touch and sight – are sufficiently adjusted for the task. Understanding our position within this process, or “experiencing” the process in its full richness, is not really required to achieve this objective. Similarly, finding water, defending ourselves against predators, resting or mating can be defined and accomplished in terms of rather “mechanical” processes, and do not really require our full awareness.
The second line of thought argues that self-consciousness allows us to understand our position in the world better, and hence permits us to simulate the outcomes of different decisions more effectively. In essence, it argues that self-consciousness provides us with a deeper understanding of the world, and of the impact of our actions on it. By being aware of the effect we have on our environment, we can make more complex decisions, plan, and think further ahead than if we were merely a collection of cells reacting to biochemical signals.
Furthermore, the more we understand ourselves and are aware of ourselves, the easier we can interact and communicate with other beings around us.
Some authors and scientists, such as Yuval Noah Harari, point out that it is really our social ability that separates us from other species – specifically, our ability to organize ourselves in large numbers. Social organization is something that insects – such as bees or ants – are very good at too. And undoubtedly, their level of self-awareness and consciousness is much lower than ours. However, so is the upper limit on the size of their social structures.
Colony sizes vary greatly: most number in the thousands, whilst some reach up to 300 million individuals. The level of human organization, however, stretches into the billions.
Therefore, is self-consciousness a strict requirement for complex social interactions? And do these interactions indeed aid long-term survival?
The short answer is that we don’t know.
There exist opponents and proponents in both camps, many with sensible arguments. However, at least for now, the arguments on both sides are largely irrelevant to AI researchers. That is, regardless of whether one believes consciousness to be an emergent property, or whether one looks at it in terms of a complex simulator, to the best of our knowledge consciousness in and of itself is irrelevant when it comes to the development of intelligent systems. The domain-specific types of problems solved by computer scientists working in the field of AI don’t require the computer to actually be aware of its role in the process of determining a solution.
As we will see in the upcoming tutorials in our Machine Learning for Everybody series, the process of, for example, differentiating between images of cats and dogs does not require the machine to be aware of the fact that it is identifying the differences. Similarly, as we saw in the first part, finding the shortest path between two points on a map requires a model of the map and a technique for finding the shortest path using that model. At no point were we required to simulate some form of self-awareness on behalf of the computer. The same can be said about emotions.
Whilst AI researchers have an entire field – called “Emotion AI” – dedicated to the study, analysis and implementation of emotions, producing a machine that can actually “feel” (whatever that actually means) is not strictly necessary for the development of intelligent systems. Whilst having a machine that can detect, “read” or simulate emotional states certainly has a wide range of benefits (from improving the user experience of interacting with a machine to threat detection) and can help solve domain-specific problems involving emotions, it is not a prerequisite when developing intelligent systems.
In animals, emotions tend to serve as a quick-reaction mechanism that circumvents the slower thought processes of the brain. For example, pain acts as a protection mechanism by quickly signaling to our body that “something is wrong”. If we were to accidentally touch a hot stove, nerves in our skin send a signal (pain) to our spinal cord that causes us to quickly withdraw our hand. The withdrawal is automatic and does not actually require us to think about the situation or process. We just do it. Similarly, emotions such as anger, or the feeling that something is wrong, help us function in certain situations without requiring us to think deeply about the situation itself. Emotions and intuition have been shaped by evolution to help protect organisms from adverse circumstances or to promote certain behavioural patterns. They help us learn more effectively from others without having to experience certain situations ourselves, and allow us to communicate, organize and maintain beneficial relationships.
Whilst there are clear evolutionary benefits to having emotions, just like self-consciousness, there is no clear advantage in building emotional machines for most of the narrow, domain-specific problems that are solved using AI (unless, of course, detecting, interpreting or displaying certain emotions is part of the actual problem definition). The fast reactions that emotions allow us to make, or the ability to “record” memories using certain emotions (such as intense fright), are implemented using other means when operating within the context of the digital computer.
Problems solved using AI
By now, it should be clear that when we talk about developing intelligent systems, we don’t imply the development of all-knowing, emotional, sentient robot overlords. Instead, AI researchers are concerned with developing solutions to specific problems – problems that are not easily expressed and solved using a set of abstract formulas, but problems that are often “fuzzy” and difficult to define.
Intelligent systems that attempt to solve these problems tend to act within a flexible environment, where input data is subject to a large degree of change and variation and where solutions often are not 100% clear cut but instead involve a certain level of uncertainty. So far, these definitions and explanations all sound very abstract, so let’s take a look at some of the problems that are solved using artificial intelligence.
Sample problem #1: Search
Search problems, as their name indicates, revolve around the process of searching for a solution given a large amount of data. The route planning problem discussed in the first part is a perfect example of such a problem. As part of the route planning problem, we modelled the map as a graph on which we used a search algorithm to find the shortest path. Our algorithm used the path length (i.e. number of nodes to traverse) as a performance measure, selecting the path which requires us to traverse the minimum number of nodes.
Search problems are well studied, and a wide range of general-purpose algorithms exist to help solve such problems. Route-planning is an intuitive example of a search problem, but far from the only one. Many real-world problems outside the space of navigation can be formulated as search problems. For example, robots that perform a certain series of actions, such as hoovering different tiles in a room, often “decide” what action to perform next by searching for a sequence of decisions or actions that might lead to a goal state (such as having a clean room).
Search problems are formulated by
i) defining a range of possible states and their relations (these form the graph nodes),
ii) defining a goal state and
iii) defining a performance measure (such as minimizing the number of nodes to traverse or assigning weights to the edges between nodes and trying to minimize the overall cost of traversal).
Whilst search problems are a whole field of study in themselves, the models, algorithms and techniques are often shared, used and cross-referenced across other sub-fields of artificial intelligence.
Search problems fit the definition of the types of problems being solved by intelligent systems, as they deal with large amounts of data and a search space that involves a large degree of change and variation (just think about how the search space of a route planner might change as the vehicle itself moves, traffic increases or decreases, and the destination is changed).
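To make the formulation above concrete, here is a minimal sketch of the route-planning problem from the first part. The graph, node names and the breadth-first search strategy are illustrative choices – one simple way of minimizing the number of nodes to traverse, not the only one:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: returns a path with the fewest
    nodes to traverse, or None if the goal is unreachable."""
    frontier = deque([[start]])   # paths still to expand
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:          # goal state reached
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

# A toy map: nodes are junctions, edges are one-way roads.
city = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["E"],
    "E": [],
}
print(shortest_path(city, "A", "E"))  # → ['A', 'C', 'E']
```

Note how the three ingredients of a search problem appear directly in the code: the states and their relations (the `city` graph), the goal state (`goal`), and the performance measure (fewest nodes traversed, which breadth-first search minimizes by construction).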
Sample problem #2: Medical diagnosis
Despite decades of research, we do not yet fully understand exactly how the human brain processes images.
Nevertheless, computer vision has been an active topic of research, and machine learning techniques have been applied to detect patterns in images, and to recognize and classify objects. One concrete example of the utility of using artificial intelligence to identify certain shapes in images is medical diagnosis. Here, computer software is used to diagnose medical conditions based on MRI scans. For example, the computer can be used to detect tumours, sometimes more accurately than human beings. As part of this, machine learning techniques are used to “train” the software using medical images labelled by experts (how exactly this works, we will discover in an upcoming tutorial). When fed large enough amounts of data, patterns emerge. The AI uses these patterns to classify the images or detect certain objects.
Given how image quality, body and organ shapes and sizes, and tumour size, shape and position all vary, we can see how medical diagnosis – or image recognition in general – fits the criterion of “uncertainty”, which makes the problem a tough and suitable one for artificial intelligence.
Furthermore, describing a medical diagnosis using an image and a concise formula or series of steps is difficult. Whilst the problem itself is very domain-specific (again, a characteristic of problem-solving using artificial intelligence), the problem of detecting a cancerous growth itself is difficult to define to begin with, and, from a human perspective, relies a lot on past experience and “intuition”.
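To illustrate the underlying idea in a deliberately simplified form, the sketch below classifies a “scan” by comparing it to the average of expert-labelled examples (a nearest-centroid classifier). Real diagnostic systems learn from raw pixel data using far more sophisticated models; the two-number feature vectors and labels here are purely hypothetical:

```python
import math

def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Toy "training" data: each scan is reduced to two hypothetical
# features, e.g. (mean brightness, lesion area), labelled by experts.
labelled_scans = {
    "healthy": [[0.2, 0.1], [0.3, 0.15], [0.25, 0.05]],
    "tumour":  [[0.7, 0.6], [0.8, 0.55], [0.75, 0.65]],
}

# "Training" here is just averaging the labelled examples per class.
centroids = {label: centroid(vs) for label, vs in labelled_scans.items()}

def classify(scan):
    """Assign the label whose training centroid is closest."""
    return min(centroids, key=lambda label: math.dist(scan, centroids[label]))

print(classify([0.72, 0.58]))  # → tumour
```

The essential pattern is the same as in real systems: expert-labelled data defines what each class “looks like”, and a new image is assigned to the class it most resembles.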
Sample problem #3: Scheduling, resource allocation and timetables
“Constraint programming” is a subfield of artificial intelligence that was pioneered by Eugene Freuder. The technique is used to solve “Constraint Satisfaction Problems” (CSPs). These are models created by describing the world (or problem) in terms of variables, the possible restrictions (the “constraints”) on those variables, and the set of values that each variable can assume (the “domain”).
For many types of problems, this is a very natural way of expressing them, which led Eugene Freuder to say that “Constraint programming represents one of the closest approaches computer science has yet made to the Holy Grail of programming: the user states the problem, the computer solves it.”
A classic example of a constraint satisfaction problem is the Sudoku puzzle, often found in the back of magazines or newspapers. This puzzle – usually a 9×9 square – requires players to fill numbers into the grid in such a way that each box, each row and each column contains the numbers 1–9. The numbers cannot repeat themselves (for example, a single row cannot contain the number 5 twice) and a few cells come with preset numbers which cannot be changed. Modelled as a constraint satisfaction problem, the variables refer to the individual cells that need to be completed (see figure 3.1); the domain is the set of numbers 1 to 9; and the constraints are the following rules:
1. Each box must contain only the numbers 1-9. Each row must contain the numbers 1-9. Each column must contain the numbers 1-9.
2. The numbers 1-9 cannot repeat themselves within each box, row and column.
3. Certain cells must have the assignment of certain preset numbers.
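These rules map directly onto a CSP formulation. The sketch below solves a scaled-down 4×4 Sudoku using a generic backtracking search; the clue positions are invented for illustration, and production constraint solvers add far more sophisticated pruning and propagation techniques:

```python
def solve(assignment, variables, domains, consistent):
    """Generic backtracking search over a CSP: pick the next
    unassigned variable, try each value in its domain, and
    backtrack whenever a constraint is violated."""
    unassigned = [v for v in variables if v not in assignment]
    if not unassigned:
        return assignment            # every variable assigned: solved
    var = unassigned[0]
    for value in domains[var]:
        assignment[var] = value
        if consistent(var, assignment):
            result = solve(assignment, variables, domains, consistent)
            if result is not None:
                return result
        del assignment[var]          # undo and try the next value
    return None

# A 4x4 mini-Sudoku: variables are (row, col) cells, the domain is 1-4.
N = 4
variables = [(r, c) for r in range(N) for c in range(N)]
domains = {v: [1, 2, 3, 4] for v in variables}

# Preset clues (rule 3 above): these cells are fixed in advance.
for cell, clue in {(0, 0): 1, (1, 2): 1, (2, 1): 1, (3, 3): 1}.items():
    domains[cell] = [clue]

def consistent(var, assignment):
    """Rules 1 and 2: no repeated number in var's row, column or 2x2 box."""
    r, c = var
    for (r2, c2), value in assignment.items():
        if (r2, c2) == var:
            continue
        same_box = (r2 // 2, c2 // 2) == (r // 2, c // 2)
        if (r2 == r or c2 == c or same_box) and value == assignment[var]:
            return False
    return True

solution = solve({}, variables, domains, consistent)
```

Notice that `solve` knows nothing about Sudoku: the puzzle lives entirely in the variables, domains and the `consistent` function, which is exactly the “state the problem, let the computer solve it” style Freuder describes.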
Although a Sudoku puzzle is just a simple example of how we can apply the logic behind constraint programming, constraint satisfaction problems themselves can become extremely complex. They are a powerful method for solving real-world problems that include a large number of restrictions on their possible outcomes. Concrete examples of such problems include:
● Scheduling: The automatic creation of timetables or schedules, whereby entities (such as rooms and participants) have certain limitations (capacity restrictions, availability etc).
● Procurement: Finding the cheapest suppliers based on capacity limits and other criteria. For example, given a large (think hundreds or thousands of items) tender for sourcing different types of vaccines, different bidders might place bids at different prices for each vaccine type. The purchaser would want to identify the cheapest bidders, using not only the prices submitted, but maybe additional criteria, such as an upper limit on the total number of different suppliers.
● Resource allocation: Factories often produce different types of products. Some of the raw materials used to produce these products are abundant, others are scarce. Factory owners might wish to determine how to allocate these scarce resources most efficiently, so that profit can be maximized without violating other constraints.
By formulating problems using variables, constraints and domains, computer scientists can use other techniques from artificial intelligence – specifically, search algorithms – to find solutions to these problems.
It is also important to note that CSPs tend to be very dynamic, in that the constraints are added on the fly and therefore don’t require a static problem formulation. This makes them ideal for solving real world situations where restrictions arise, change or disappear quickly. Depending on the problem, the change in even a single, simple restriction can produce thousands, hundreds of thousands or millions of new possible states or solutions.
Sample problem #4: E-mail spam filtering
In 2004, Bill Gates famously predicted that “Spam will be a thing of the past in two years’ time”. His prediction was off by almost 20 years, but at the time of writing it finally seems to be coming true.
Companies like Google have become fairly effective at eliminating spam from their products. Gmail – Google’s email service – does an impressive job at classifying spam messages correctly and hiding them from view. This is largely thanks to the massive amounts of data available to the company, which allow its machine learning and natural language processing algorithms to function very effectively (just how important data is to machine learning, we will see in an upcoming tutorial).
Classifying emails as spam involves reasoning with uncertainty, and draws upon a range of different fields, such as natural language processing, probability theory and machine learning. As the contents of spam emails and techniques used by the spammers change, the spam filter needs to continuously update and “learn” from the new data.
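One classic technique for this kind of probabilistic reasoning is the naive Bayes classifier. The sketch below is a toy version with a hand-made training set; real filters such as Gmail’s are trained on vastly larger corpora and combine many more signals than word frequencies alone:

```python
import math
from collections import Counter

# Tiny hand-made training set; real filters learn from millions
# of user-labelled messages.
spam = ["win money now", "free money offer", "claim your free prize"]
ham = ["meeting notes attached", "lunch tomorrow", "project status update"]

def word_counts(messages):
    return Counter(w for m in messages for w in m.split())

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())
vocab = set(spam_counts) | set(ham_counts)

def log_likelihood(message, counts, total):
    """Sum of log P(word | class), with add-one smoothing so
    unseen words don't zero out the whole score."""
    return sum(
        math.log((counts[w] + 1) / (total + len(vocab)))
        for w in message.split()
    )

def is_spam(message):
    # Equal class priors assumed; compare class likelihoods.
    return log_likelihood(message, spam_counts, spam_total) > \
           log_likelihood(message, ham_counts, ham_total)

print(is_spam("free money"))       # → True
print(is_spam("project meeting"))  # → False
```

Crucially, nothing here is hard-coded about what spam “looks like”: retraining on new labelled messages automatically updates the word probabilities, which is how such filters keep up as spammers change their techniques.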
Sample problem #5: Product recommendations
Recommender systems are becoming ever more ubiquitous. A very profitable example is Amazon’s product recommender. Based on the types of products that you have looked at, purchased or rated in the past, Amazon will suggest products that “you might also like”. These suggestions appear under titles such as “Customers who bought this item also bought…” when browsing their website.
How does Amazon do this?
Using the data available to them, Amazon creates a complex web of product interactions, purchases, ratings and search terms to predict your likes and interests as accurately as possible.
In more general terms, what recommender systems such as Amazon’s do is build up profiles of thousands, hundreds of thousands or millions of users, based on their individual behaviours and interactions with a system, application or website. Oftentimes, companies also use slightly unethical approaches to factor in third-party data that was collected by other websites or services.
As you browse a website or set of websites, these recommender systems continuously adapt and update your profile, factoring in any new behavioural patterns that you might exhibit. Using machine learning techniques, different methods of organising data (such as ontologies, which organise concepts and identify the relationships between them), and techniques such as “cluster analysis”, these recommender systems can make shockingly accurate predictions as to your likes, dislikes and intentions.
Limitations of AI
We have just seen some concrete examples of the types of problems that can be solved using artificial intelligence. By observing these examples carefully, you will have hopefully seen that these examples are not easily expressed and solved using a set of abstract formulas. The problem definitions themselves can be rather vague and the contexts in which they are framed involve uncertainty and constantly changing data.
Instead of being able to define a set of abstract formulas, these types of problems are solved by building an abstract model of a very specific problem domain, which is used to “infuse” the computer with “knowledge”.
In just four words of the preceding paragraph lies the key to defining the limitations of artificial intelligence: “very specific problem domain”. That is, artificial intelligence techniques are used for solving specific problems within very specific contexts.
These techniques are often very effective, and depending on the problem domain, can produce more accurate predictions, classifications or solutions than the human brain could. However, they are not “magic bullets” that can be applied to any problem, regardless of its definition.
Models are developed under specific contexts only. They therefore require a precise understanding of all factors influencing a possible solution. Given our limited understanding of the world, there may be factors that we simply cannot (yet) express abstractly. Consider human intuition for example: it might be possible to formulate intuition abstractly for a very narrow set of circumstances (such as a situation involving a very specific threat), but the broader the circumstance becomes, the more difficult it becomes to abstract and formulate it. In other words: AI fails for problems that are too difficult to formalize and abstract.
In a sense, this seems to contradict the beginning of this tutorial, in which we discussed how the application of AI is suited to areas where formulating a precise set of rules for a problem is difficult. The difficulty in abstraction is largely due to our lack of understanding of many problems (such as the replication of intuition). We actually still only have a very limited understanding of how the human mind works. Emotions, the subconscious, human psychology and our decision-making processes are still being studied, and we are far away from arriving at definitive answers to fundamental questions. Therefore, implementing models that mimic or reproduce these processes is difficult. How can we implement something that we do not fully understand? And that is precisely why we are very far away from developing an “all-powerful singularity”.
Applying AI to solve problems may also fail when we have a domain-specific problem that we can formulate, but either lack the data to make an implementation work, or have the data but in a form which makes it difficult to detect patterns in the data.
Take for example the case of a procurement department at a large organisation: the act of procuring certain goods at certain rates can be precisely defined, and the actions involved could easily be automated. However, the data required for a procurement AI to learn how to make decisions may not exist in a structured form. This data (which arises from decisions and discussions made over the years) could be scattered across emails, Word documents, notes from phone calls, recordings, different tender platforms, and so on and so forth.
Collecting, categorizing, and structuring (i.e. organizing the information in such a way that it can be used by the AI) these hundreds of thousands of items so that they can be used by the AI might simply not be possible or financially feasible. Artificial intelligence (and especially machine learning) needs to be backed by data (how much data, depends on the problem and the way the problem is to be solved), and a lack of it can hugely limit the application of AI.
Last but not least, even if we can model the problem, and have sufficient data available to support the model and decision-making process, the model itself might fail. Bruce Bueno de Mesquita describes the three main ways in which models fail very concisely in his book “The Predictioneer’s Game”:
“Models fail for 3 main reasons: The logic fails to capture what actually goes on in people’s heads when they make choices; the information going into models is wrong – garbage in, garbage out; or something outside the frame of reference of the models occurs to alter the situation, throwing it off course.”
Conclusion
In this tutorial, we covered a lot of ground. We began by discussing the difficulties around actually defining what intelligence is, arriving at an acceptable middle ground and defining intelligence as: “the ability to learn and use concepts to solve problems”. We then used this definition to explain what exactly we mean by artificial intelligence, dispelling some of the myths around it and essentially reducing artificial intelligence to a wide range of techniques (drawing from a series of different disciplines) that are used to get computers to solve complex, domain-specific problems that tend to be ambiguous and difficult to formulate abstractly.
We discussed the emergence of artificial intelligence and its history, and learned that these problems are solved by creating abstract representations (called “models”) of the problem domain.
We then listed some examples of real-world problems that are currently being solved by artificial intelligence, and saw how the problems solved by intelligent systems tend to involve i) large amounts of data, ii) uncertainty and changing environments, as well as iii) significant variations in the input data received. Last but not least, we tried to dispel some of the myths surrounding artificial intelligence by discussing its limitations, illustrating that AI is really a problem-solving technique that tends to be applied to domain-specific problems. The development of intelligent systems really depends on our ability to:
1. Understand, formulate and abstract the problem and domain at hand.
2. Gather and structure the available data in such a way that it can support our model.
3. Incorporate the unexpected into our model.
With an understanding of what artificial intelligence is, we can now move on to the core topic of this tutorial series: Machine Learning. Stay tuned!
Note: Dear Readers, If you are interested in understanding the fundamentals of Machine Learning and Artificial Intelligence, make sure to read our Machine Learning for Everybody series.
This article was technically and editorially reviewed by Suprotim Agarwal.
Benjamin Jakobus is a senior software engineer based in Rio de Janeiro. He graduated with a BSc in Computer Science from University College Cork and obtained an MSc in Advanced Computing from Imperial College London. For over 10 years he has worked on a wide range of products across Europe, the United States and Brazil. You can connect with him on LinkedIn.