How We Form Political Opinions:
An AI Viewpoint
Almost two years after the 2016 election in the U.S., I did something I had always carefully avoided: I posted a few political commentaries on my Facebook page. My friends were doing the same thing. Something in U.S. political life was obviously irritating us, regardless of whether our views were liberal or conservative. The opinions I read, and the ones I produced, were very passionate and were probably meant to change the minds of those who disagreed. But these attempts always failed, and the question is why. So it is worth looking at how we form our political opinions, what predisposes us to adopt one stance or the other, and what makes us accept or reject the pieces of evidence that others propose.
The foundations of AI seem to offer some insight. What follows is one possible explanation, and you should think it over yourself. Psychologists use two principles to describe how people behave around political posts: confirmation bias (one of the cognitive biases) and cognitive dissonance. Confirmation bias says that people look for information that confirms their existing beliefs; cognitive dissonance says that they are uncomfortable with information that does not conform to those beliefs. Our interest, though, lies in the deeper question of why bias and dissonance happen. The "why" is a question for neuroscience, and you will see many times on our website that neuroscience and AI cross-fertilize each other very well. In a nutshell, we will look at opinion forming as a process of inference, more specifically, Bayesian inference.
Bayesian inference is based on a small formula due to Thomas Bayes. When the formula is explained to students, the examples usually come from medical testing. People tend to relate to these tests because they have most likely encountered them in their own lives, often at difficult times, and so assign them more importance. Moreover, unaware of Bayes' formula, many physicians misinterpret the results sent back by medical labs, sometimes to the chagrin and worry of their patients. We'll also take our example from medical testing, and we recommend that you consult other sources that use medical testing as examples to further deepen your understanding. Bayes' formula is one of the foundations of statistical analysis, which in turn is the theoretical basis on which much of AI rests. This little formula has amazing power: the more you think about it and apply it to different situations, the more "aha!" moments await you. It may not feel natural at first, so it takes a bit of practice.
Say a woman over the age of 50 undergoes a mammogram, the standard screening test for breast cancer. In general, 90% of women who have breast cancer test positive; this is the so-called True Positive Rate. And 8% of women who do not have breast cancer also test positive; this is the so-called False Positive Rate. Now, say a test comes back positive: is it time to panic? Patients, and sometimes physicians, look at the True Positive Rate and give it far too much weight, which leads to alarming and incorrect conclusions. Bayes' formula brings out the right answer, as we shall see. (The conventional wisdom is that any time you put a mathematical equation in your writing, you lose half of your readers. I don't see how we can sensibly talk about AI without Bayes' formula, so we'll take that risk. The hope is that if some equations bother you, you will read the words around them and skip the bothersome part.) Here is Bayes' formula:

P(H|E) = P(E|H) * P(H) / P(E)
We have a hypothesis H = "Patient has breast cancer" and the evidence E = "Patient tested positive". Our belief about H before we look at the evidence E is captured by the prior probability P(H). In the absence of any other evidence, P(H) is usually taken from general past knowledge; in our case, 1% of all women over 50 have breast cancer, so we can take P(H) to be 1%. Bayes' formula lets us refine our opinion about H after we factor in the evidence E. The vertical bar | in P(H|E) is read "given that", and P(H|E) is called the posterior probability; the word posterior just means that we are looking at the probability of H after factoring in the evidence E. With this cleared up, the answer we are looking for is the left-hand side of Bayes' formula, P(H|E): what is the probability that the patient has breast cancer, given that she tested positive?
Since the probability that someone who has breast cancer tests positive is 90%, that is our P(E|H), the likelihood of the evidence given the hypothesis. So we have almost everything we need to apply Bayes' formula, except the quantity P(E), the probability of testing positive in general. We have P(E|H) but not P(E). Now, either H is true or its negation ~H (~ just means not) is true; there are no other possibilities, so P(E) = P(E|H)*P(H) + P(E|~H)*P(~H). If this splitting of P(E) looks contrived, translate it into English: a positive test comes either from a woman who has cancer or from one who does not, and we add up the two ways it can happen.
So P(H) = 0.01, and therefore P(~H) = 1 - P(H) = 0.99. We are also given P(E|H) = 0.9 and P(E|~H) = 0.08. So P(E) = 0.9 * 0.01 + 0.08 * 0.99 = 0.0882, and finally P(H|E) = (0.01 * 0.9)/0.0882 = 0.102. The probability that a woman over 50 has breast cancer given that she tested positive is still fairly small: 10.2%.
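The same arithmetic in a few lines of Python, as a minimal sketch. `posterior` is a helper name we made up; it expands P(E) with the splitting of P(E) described above (the law of total probability).

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) using Bayes' formula.

    P(E) is expanded with the law of total probability:
    P(E) = P(E|H)*P(H) + P(E|~H)*P(~H).
    """
    p_not_h = 1.0 - prior
    p_e = p_e_given_h * prior + p_e_given_not_h * p_not_h
    return p_e_given_h * prior / p_e


# Numbers from the mammogram example.
prior = 0.01  # P(H): 1% of women over 50 have breast cancer
tpr = 0.90    # P(E|H): True Positive Rate
fpr = 0.08    # P(E|~H): False Positive Rate

p_e = tpr * prior + fpr * (1 - prior)
print(f"P(E)   = {p_e:.4f}")                         # P(E)   = 0.0882
print(f"P(H|E) = {posterior(prior, tpr, fpr):.3f}")  # P(H|E) = 0.102
```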
But let's say that the patient repeats the test at a different lab and the result comes back positive again. This time our prior probability P(H), based on the evidence gathered before the new test, is P(H) = 10.2%, so P(~H) = 1 - P(H) = 0.898. Plugging the new numbers into Bayes' formula (and assuming that the two labs' results are independent given the patient's actual condition), we get P(H|E) = (0.9 * 0.102)/(0.9 * 0.102 + 0.08 * 0.898) = 56.1%. Unsurprisingly, the probability of having breast cancer went up significantly. Now let's say that DNA tests come back showing that the probability of breast cancer for people with this patient's DNA profile is extremely low. You plug that number into Bayes' formula again and calculate the new P(H|E) with the new evidence. This process is typical of how we reach conclusions: we continuously refine P(H|E) based on new pieces of evidence E and the old prior P(H), and the posteriors then become the priors for the next piece of evidence.
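The "posteriors become priors" loop can be sketched like this. The helper names are ours, and feeding the tests in one at a time with the same rates assumes that the two labs' results are independent given the patient's actual condition.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' formula with P(E) expanded by the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e


def update_sequentially(prior, evidence):
    """Feed evidence one piece at a time; each posterior becomes the next prior.

    `evidence` is a list of (P(E|H), P(E|~H)) pairs. Chaining updates like this
    assumes the pieces of evidence are independent given H (and given ~H).
    """
    beliefs = [prior]
    for p_e_given_h, p_e_given_not_h in evidence:
        prior = posterior(prior, p_e_given_h, p_e_given_not_h)
        beliefs.append(prior)
    return beliefs


# Two positive mammograms from different labs, same rates as before.
history = update_sequentially(0.01, [(0.90, 0.08), (0.90, 0.08)])
for step, belief in enumerate(history):
    # prints 0.010, then 0.102, then 0.561
    print(f"after {step} positive test(s): P(H) = {belief:.3f}")
```

A piece of evidence with P(E|H) smaller than P(E|~H), such as the reassuring DNA result, would simply be one more pair in the list and would pull the belief back down.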
We should use this opportunity to make an important connection with AI systems, since this is where our interest lies. The False Positive (FP) Rate and the False Negative (FN) Rate (the FN Rate is 1 minus the True Positive Rate, so 10% for our mammogram) are also computed when evaluating the effectiveness of a frequently used type of AI algorithm, the so-called classification systems: for example, systems that classify your incoming email messages as spam or non-spam, or logins to a bank as valid or fraudulent. Medical labs and AI engineers actually calculate a more comprehensive set of numbers than just the FP and FN rates: they calculate what in statistics is called a confusion matrix, a 2 x 2 matrix like the following one:

| | Predicted: no horse | Predicted: horse |
|---|---|---|
| Actual: no horse | 50 (True Negatives) | 10 (False Positives) |
| Actual: horse | 5 (False Negatives) | 100 (True Positives) |

Let's make some sense of this confusion matrix. Say we design an AI system that looks at photos and tells us whether a photo has a horse in it. We train our system on thousands of photos, some with horses and some without. Most of the progress in AI so far has been made in exactly this kind of situation, in which we train an AI system on labeled data: we tell the system which photos have horses in them and which do not, so that it can learn from that data. The system develops a model of the problem that allows it to make predictions when it sees new photos. Naturally, we want to know how good our system is, so we collect 165 photos that the system has not seen yet and ask it to classify them. The results of this test are captured in the confusion matrix above. In the matrix, Predicted means that the AI system predicted that the photo has a horse in it, and Actual means that the photo actually has a horse in it.
The system correctly identified 50 photos without horses as non-horse photos (True Negatives) and 100 photos with horses as horse photos (True Positives). But it incorrectly identified 10 non-horse photos as horse photos (False Positives) and 5 horse photos as non-horse photos (False Negatives). Since the 165 photos consist of 60 non-horse photos (50 + 10) and 105 horse photos (100 + 5), the system has a False Negative Rate of 5/105 ≈ 5% (horse photos it missed) and a False Positive Rate of 10/60 ≈ 17% (non-horse photos it mistook for horses). Each rate is computed relative to the photos that actually belong to the corresponding class. You can replace Predicted with "Tested Positive" and Actual with "Has Breast Cancer" to see what this confusion matrix would look like for the breast cancer screening lab in the example we used to apply Bayes' formula.
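The rates can be checked with a few lines of Python. The variable names are ours; the point to notice is that each rate divides by the number of photos that actually do (or do not) contain a horse, not by the number of predictions.

```python
# Counts from the horse-classifier test set (165 photos).
tp, fn = 100, 5   # photos that actually contain a horse: 105
fp, tn = 10, 50   # photos that actually contain no horse: 60

actual_pos = tp + fn
actual_neg = fp + tn

tpr = tp / actual_pos  # True Positive Rate (sensitivity, recall)
fnr = fn / actual_pos  # False Negative Rate = 1 - TPR
fpr = fp / actual_neg  # False Positive Rate
tnr = tn / actual_neg  # True Negative Rate (specificity) = 1 - FPR
accuracy = (tp + tn) / (actual_pos + actual_neg)

# TPR=0.952 FNR=0.048 FPR=0.167 TNR=0.833 accuracy=0.909
print(f"TPR={tpr:.3f} FNR={fnr:.3f} FPR={fpr:.3f} TNR={tnr:.3f} accuracy={accuracy:.3f}")
```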
We may think of the human brain as a Bayesian inference machine. The idea, which goes back to the 19th century and the work of Hermann von Helmholtz, is that brains compute and perceive in a probabilistic manner, constantly making predictions and adjusting beliefs based on what the senses contribute as evidence.
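Here is a minimal sketch of "making a prediction and adjusting it with sensory evidence", under an assumption we add for illustration: both the brain's prior belief and the sensory signal are Gaussian, so the update is a precision-weighted average (precision = 1/variance). `gaussian_update` is our own helper and the numbers are hypothetical; it is a textbook Bayesian update, not a claim about how neurons implement it.

```python
def gaussian_update(prior_mean, prior_precision, observation, sensory_precision):
    """Combine a Gaussian prior belief with one observation corrupted by Gaussian noise.

    Precision = 1 / variance. The posterior mean is a precision-weighted
    average of what the brain predicted (prior) and what the senses report.
    """
    post_precision = prior_precision + sensory_precision
    post_mean = (prior_precision * prior_mean
                 + sensory_precision * observation) / post_precision
    return post_mean, post_precision


# Hypothetical numbers: the brain expects 0.0, the senses report 10.0.
for sensory_precision in (4.0, 1.0, 0.25):
    mean, _ = gaussian_update(0.0, 1.0, 10.0, sensory_precision)
    print(f"sensory precision {sensory_precision}: belief moves to {mean:.1f}")
```

With these numbers the belief lands at 8.0, 5.0 and 2.0: the less weight the senses get, the closer the belief stays to the brain's own prediction.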
With this extension of Bayesian principles to the human brain, let's return to the original goal of the article and assume that H is a political opinion we hold, such as H = "Climate Change Has Definite Human Causes". In a 2019 poll on climate change, 25% of respondents considered it a crisis, a further 36% considered it a major problem, 20% considered it minor and 18% thought it was not a problem at all. Adding the crisis and major-problem percentages, and treating concern about climate change as a rough proxy for H, we can ballpark that 61% of the U.S. population accepts H. So we start with the prior P(H) = 0.61 and calculate the posterior P(H|E) for pieces of evidence E presented to us. For example, E could be "New analysis of atmospheric CO2 shows conclusively that the increase is coming from burning fossil fuels". The calculations would proceed as above; you understand the idea by now. The totality of these political opinions forms a model of the political environment in which a person lives, a model of his or her political reality, if you wish.
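The climate calculation is left as an exercise, and we have no real likelihoods for the CO2 evidence, so the sketch below uses made-up values P(E|H) = 0.8 and P(E|~H) = 0.2, labeled as such. It also switches to the odds form of Bayes' formula, which is equivalent: posterior odds = prior odds × P(E|H)/P(E|~H). Besides the poll-based prior of 0.61, it tries two hypothetical people with much weaker and much stronger prior belief in H.

```python
def posterior_from_odds(prior, likelihood_ratio):
    """Odds form of Bayes' formula: posterior odds = prior odds * P(E|H)/P(E|~H)."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical likelihoods for the CO2 evidence (assumed, not from any study):
p_e_given_h = 0.8      # chance of seeing such an analysis if H is true
p_e_given_not_h = 0.2  # chance of seeing it if H is false
lr = p_e_given_h / p_e_given_not_h  # likelihood ratio = 4

# The poll-based prior, plus two hypothetical individuals with other priors.
for prior in (0.61, 0.10, 0.95):
    print(f"prior {prior:.2f} -> posterior {posterior_from_odds(prior, lr):.3f}")
```

With these assumed numbers the 0.61 prior rises to about 0.86, while priors of 0.10 and 0.95 end up near 0.31 and 0.99: the same evidence moves everyone in the same direction, but where each person lands depends heavily on where they started.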
From the above, you saw that the evidence we gather pushes our beliefs up or down. But you notice something odd among your Facebook friends: their beliefs do not seem to change with the evidence that accumulates daily and that media outlets bring to our attention. Are we treating the evidence differently?
Explaining this oddity requires extending the concept of Bayesian inference in the brain. The extension comes from a profound (but still difficult and not easily accessible) recent discovery by the neuroscientist Karl Friston, whose work is an attempt to provide a unified framework for several theories of the brain. (This framework has fertile connections with Gerald Edelman's neuronal group selection theory, which will appear in our article on Artificial Consciousness, and it also connects to reinforcement learning, the AI technique at the center of our following Main AI Concepts article.)
Before Friston's discovery of the Free Energy Principle, the brain was seen as a passive inference engine, building its model from data obtained through the senses and minimizing the modeling error, i.e., minimizing surprise. Think of surprise as the gap between the states the brain predicts it is in and the states the sensory organs report. More formally, the principle works with a quantity called free energy, which is an upper bound on surprise; hence the name of the principle. It was Friston who pointed out that the brain is more than a passive Bayesian engine that only considers beliefs and perception. Friston adds action: the brain actively directs the sensory organs and the motor organs to look for data that also minimizes surprise. (This form of active cognition, in which the brain needs a body in order to gather evidence, is known as embodied cognition. In robotics, one of the biggest challenges is not creating the inference engine but creating the "body" that allows the robot to exercise this embodied cognition.)
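For readers who want to see why free energy bounds surprise, here is the standard definition of variational free energy. The notation is our assumption (it is the usual one in this literature): o for sensory observations, s for hidden states of the environment, p(o, s) for the brain's generative model and q(s) for its current beliefs about those states.

```latex
\begin{aligned}
F(o, q) &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \\
        &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(s \mid o)\big] - \ln p(o) \\
        &= D_{\mathrm{KL}}\big[\,q(s) \,\|\, p(s \mid o)\,\big] - \ln p(o) \\
        &\ge -\ln p(o) \quad \text{(surprise)}
\end{aligned}
```

Because the KL divergence is never negative, pushing F down either brings the beliefs q(s) closer to the true posterior (perception) or, through action that changes which observations o arrive, lowers the surprise term.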
I chose the following diagram of the Free Energy Principle with a few goals in mind. Please ignore the math equations, but not the words surrounding them, especially environment, brain and Markov blanket, the set of all states at the boundary between the system (in this case the human brain) and its environment. The equations do not concern us for now, only the fact that they exist: what matters is that the principle rests on sound mathematics. The diagram also shows how much the interests of neuroscientists and AI researchers are converging, a topic we will elaborate on at length; both groups use the universal and precise language of mathematics and are developing a common set of concepts. A few times, we will accept a theory on the grounds that the math behind it is sound without examining that math ourselves, because most of the math itself is not our interest. Luckily, the main principles behind AI rely on fairly easy math, and those will be of interest to us.
Just as the principle is used to build more powerful AI agents, it is also used to explain mental illness. Disorders like schizophrenia or depression can be interpreted this way: the brain gives inadequate weight to the information coming in from the senses, forms a faulty model of reality, and acts so as to minimize surprise and deviation from this faulty model. It would not be surprising if, in the near future, the psychiatrist who shows up in a white coat at a schizophrenia patient's bedside turned out to be a computer scientist (some neuroscientists, like Friston, started out as psychiatrists), or if the researcher tasked with producing customers' psychological profiles at a purely technological AI startup turned out to be an MD.
This new kind of inference is referred to as active inference. Notice that the Markov blanket (the states at the boundary between the brain and the environment) consists of both sensory states and active states; this fact will be essential to some of our arguments.
Now, returning to our question: the active inference that the brain uses leads to the conclusion that our brains do not necessarily look at evidence differently; rather, they look for different kinds of evidence. "We sample the world," Friston writes, "to ensure our predictions become a self-fulfilling prophecy." When people whose brains have already built models supporting the conclusion that the U.S. needs a wall on its southern border switch their TV remote to Fox News, while those who reject that conclusion tune in to CNN, this principle is at work. This, again, is why confirmation bias and cognitive dissonance happen.
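To make "we sample the world" concrete, here is a toy sketch. It is our own caricature, not Friston's actual active-inference machinery (which works with free energy over generative models and action policies). An agent with fixed beliefs about a claim chooses between two hypothetical news sources and scores each by its expected surprise, measured here as the cross-entropy of the source's reports under the agent's beliefs. All names and numbers are made up for illustration.

```python
import math


def expected_surprise(belief, source):
    """Average surprise (in bits) of a source's reports under our beliefs.

    belief[x] is how probable we think claim x is; source[x] is how often the
    source reports claim x. This is the cross-entropy of `source` relative to `belief`.
    """
    return -sum(p * math.log2(belief[x]) for x, p in source.items() if p > 0)


# Hypothetical numbers: our agent already leans strongly toward claim "H".
belief = {"H": 0.8, "not H": 0.2}
sources = {
    "agreeing outlet": {"H": 0.9, "not H": 0.1},
    "disagreeing outlet": {"H": 0.2, "not H": 0.8},
}

for name, reports in sources.items():
    print(f"{name}: {expected_surprise(belief, reports):.2f} bits")

# A surprise-minimizing agent tunes in to the source it expects to surprise it least.
choice = min(sources, key=lambda name: expected_surprise(belief, sources[name]))
print("agent samples:", choice)
```

The agreeing outlet scores about 0.52 bits against about 1.92 for the disagreeing one, so the agent picks it; from then on it mostly sees evidence that confirms what it already believes, even though no Bayesian update was ever biased.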
This model of the brain as a Bayesian inference engine that actively seeks to minimize surprise is a very useful one, and it explains well, in a mechanistic way, how we form opinions. But it leaves out human curiosity and will. These are more subtle aspects of our makeup, and there is as yet no formal way to incorporate them into a theory of neuroscience.
Surprise can be defined precisely for the information produced by a data source: the less probable an event, the more surprising it is. (The fancy name for the average surprise of a source is entropy; you can already see it making its appearance in the diagram above, and we will treat it in more detail later.) Suffice it for now to say that low-probability events (more surprise) that our brain encounters through the senses carry more information than high-probability events (less surprise). Assume that it is desirable to accumulate more information from our environment, since more information presumably leads to a more faithful model of reality in our brains. Because the brain's mechanistic tendency is to minimize surprise, seeking out such high-information events goes against that tendency, so it requires a willful override.
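A minimal sketch of the two quantities, assuming the standard Shannon definitions with base-2 logarithms (so the unit is bits): the surprise of an event with probability p is -log2(p), and entropy is the average surprise over all outcomes. The function names are ours.

```python
import math


def surprisal(p):
    """Surprise, in bits, of observing an event that had probability p."""
    return -math.log2(p)


def entropy(dist):
    """Average surprise (Shannon entropy, in bits) of a probability distribution."""
    return sum(p * surprisal(p) for p in dist if p > 0)


print(f"{surprisal(0.99):.3f} bits")   # a near-certain event
print(f"{surprisal(0.01):.3f} bits")   # a rare event
print(entropy([0.5, 0.5]))             # 1.0 bit: a fair coin
print(f"{entropy([0.99, 0.01]):.3f}")  # a heavily loaded coin: much less
```

A near-certain event (p = 0.99) carries only about 0.014 bits, while a rare one (p = 0.01) carries about 6.6 bits, which is the sense in which low-probability events are more informative.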
Using will and curiosity, we can consciously fight the more mechanistic tendency of our brain to minimize surprise. We can purposely look for disconfirming evidence, the kind of evidence that goes against our beliefs. This requires effort, and it does not come as naturally as the evidence that the more mechanical part of our brain seeks in order to minimize surprise. Here is an appeal from former President Barack Obama, given in his farewell speech, to be curious and get outside the bubble of our beliefs:
So far, we have looked at the brain as a Bayesian machine that actively processes and searches for information. That Bayesian machine resides mostly in the cortex, and you'd be right to think that most AI systems (news, social media, etc.) target the cortex. But AI systems do not target only the cortex. Our brain also houses the limbic system, which sits on top of the brain stem and is heavily involved in our basic emotions. You may be surprised that an AI system could target the components of this limbic system, but it can … and it already does. When you read a Facebook post showing the conditions of children separated from their parents at the southern border, or what happens to an embryo during an abortion, your limbic system reacts, and it does so much more powerfully than the prefrontal cortex. You are being deliberately manipulated into forming opinions based on how strongly your amygdala (part of your limbic system) reacts.
Understanding Opposing Arguments
The search for disconfirming evidence can be pushed further. We can actually train and improve our capacity for understanding positions that do not conform to our beliefs, and in the process we may revise those beliefs, possibly even abandoning them and forming new ones. The following test, the ideological Turing test, asks us to support the arguments we disagree with, not halfheartedly but to the best of our abilities. By doing that, you will gain some understanding of the opposing views and realize they are not as crazy as you thought. Try it on a few of the political issues that interest you most, to see whether it works for you. (The classical Turing test is a test for determining whether a machine exhibits intelligence, by checking whether it can respond to questions in a manner indistinguishable from that of a human. Turing's many ideas are foundational for AI, and his name will pop up many times in our articles.)
This idea of restating opposing arguments to the best of your ability also appears as the first of Rapoport's Rules, which were popularized mostly by Daniel Dennett, as described in How to Criticize with Kindness: Philosopher Daniel Dennett on the Four Steps to Arguing Intelligently. (Daniel Dennett's name will also pop up a few times in our articles.) Here are the four rules:
- You should attempt to re-express your target’s position so clearly, vividly, and fairly that your target says, "Thanks, I wish I’d thought of putting it that way".
- You should list any points of agreement (especially if they are not matters of general or widespread agreement).
- You should mention anything you have learned from your target.
- Only then are you permitted to say so much as a word of rebuttal or criticism.
Here is a take-home exercise for the ideological Turing test; it may not be comfortable in the U.S. but that's the whole point: