Graphs of Data
Since data is at the center of our attention, we should clarify what we mean by data. The data we will be most interested in can be represented as graphs, a special kind of graph, as we shall see. Graphs come with a bit of technical terminology; we will try to make these terms accessible to people without a technical background, without watering them down to the point that they lose their meaning. You can also think of graphs as networks: networks of people, networks of concepts, or networks of both. But the term graph is more precise than the term network, and it reminds us of exactly which data structure we are talking about, without misunderstandings.
The graphs of data we are interested in are technically known as property graphs. They are composed of nodes and arrows (also known as relationships) between these nodes; each node and each arrow has a type and a collection of properties (named values such as a name or a date). On the right is a simple example of a property graph. It has three nodes: two of type person and one of type article. It has three relationships, of types knows, read, and wrote. The person nodes have properties such as name and age, and the relationships can have properties too; for example, the wrote and read relationships have a date property. According to this graph, marko is 31 years old and knows aaron, who is 30. In 2009, marko wrote an article named rvm, which aaron read in 2010. Notice how concisely and precisely this graph summarizes a fair amount of interesting knowledge.
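To make the structure concrete, here is a minimal Python sketch of the marko/aaron/rvm graph as plain data structures. The class and function names (Node, Relationship, outgoing) are our own illustrative choices, not the API of any graph database; real graph DBs store and index this data very differently.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    type: str
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    source: str      # node_id of the node the arrow starts from
    target: str      # node_id of the node the arrow points to
    type: str
    properties: dict = field(default_factory=dict)

nodes = {
    "marko": Node("marko", "person", {"name": "marko", "age": 31}),
    "aaron": Node("aaron", "person", {"name": "aaron", "age": 30}),
    "rvm": Node("rvm", "article", {"name": "rvm"}),
}

relationships = [
    Relationship("marko", "aaron", "knows"),
    Relationship("marko", "rvm", "wrote", {"date": 2009}),
    Relationship("aaron", "rvm", "read", {"date": 2010}),
]

def outgoing(node_id, rel_type=None):
    """Arrows leaving node_id, optionally filtered by relationship type."""
    return [r for r in relationships
            if r.source == node_id and (rel_type is None or r.type == rel_type)]

# Who wrote rvm, and when?
writers = [(r.source, r.properties["date"])
           for r in relationships if r.type == "wrote" and r.target == "rvm"]
print(writers)                                   # [('marko', 2009)]
print([r.target for r in outgoing("marko")])     # ['aaron', 'rvm']
```

Questions about the graph become walks along typed arrows, with properties read off the nodes and relationships along the way.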
Graphs of People, Graphs of Knowledge
People who have an online presence are noticing that something puzzling is taking place. When they log in to Facebook or search with Google, they enter a world that seems to have information about them, although they do not recall having entered that information explicitly anywhere. Take the Facebook AI system as an example. In Facebook, you are represented as a node within a very large social graph. The main relationship that the Facebook graph stores is of type friend_of: the graph stores nodes of type person and, for each pair of nodes \(X\) and \(Y\), an arrow \(X \leftrightarrow Y\) if and only if \(X\) and \(Y\) are friends. Notice that these friendship arrows are bidirectional, which is not always the case with other relationships. These nodes have many properties, like your age, the places you worked before, the schools you attended, etc. But the Facebook graph does not store only the properties that you entered; it also gathers information about you from your relationships, and increasingly does so with AI techniques. So you should look at the Facebook AI system as a massive system that continuously grows and refines its graph. An example of such inferred information is a list of things you seem to be interested in: politics, art, travel, etc. The Facebook node that represents you is your Facebook digital twin. As you may have noticed, your Facebook digital twin continuously gains intelligence: your actions (the posts you read, the messages you write to other people in the network, the links you click on, etc.) appear to create an ever more truthful picture of you. The same thing happens with your Google, LinkedIn, and Twitter digital twins. The question of the conditions under which governmental intelligence agencies would be allowed to merge all these twins into one complete picture of you is a hot one, and we will discuss it at length later.
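A bidirectional relationship like friend_of can be sketched by recording every friendship in both directions. This minimal Python sketch uses hypothetical names (add_friendship, are_friends, alice, bob, carol) and says nothing about how Facebook actually stores its graph.

```python
# Hypothetical people; friend_of is symmetric, so each friendship is stored both ways.
friends = {}

def add_friendship(x, y):
    """Record X <-> Y by adding each person to the other's friend set."""
    friends.setdefault(x, set()).add(y)
    friends.setdefault(y, set()).add(x)

def are_friends(x, y):
    return y in friends.get(x, set())

add_friendship("alice", "bob")
add_friendship("bob", "carol")

print(are_friends("carol", "bob"))    # True: the arrow works in both directions
print(are_friends("alice", "carol"))  # False: a friend of a friend is not a friend
print(sorted(friends["bob"]))         # ['alice', 'carol']
```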
You are also a node in the IRS graph, in the graphs of the credit bureaus, and in the medical software system that your doctor uses. We will refer to the collection of all your digital twins (Facebook, Google, Amazon, Netflix, Twitter, 23andMe, medical records, credit bureaus, and government agencies such as the IRS and Social Security) as your (integrated) digital twin. Just think for a moment of the power that such an integrated digital twin might have. Right now these graphs are not being merged, so your integrated digital twin is just a concept. Merging them would pose some minor technical difficulties, but the question would be mainly political, legislative, and judicial. Under special conditions, for certain individuals only, and under a warrant, such merging would make sense; think of the 19 hijackers who carried out the 9/11 attacks.
The second type of graph we are interested in is a graph of concepts. The nodes of such a graph may be any of the nouns that you type into a search box. The best known graph of concepts is the Google Knowledge Graph, which is behind many of the search results you are given, including the knowledge panel on the right-hand side of the results page. The difference between an unstructured list of results and that panel is that the information in the panel is structured and given within context, and is therefore much more useful. (More advanced knowledge graphs are now in development, based on RDF (Resource Description Framework); they capture knowledge more rigorously and will form the basis of the next step in the evolution of the World Wide Web, the so-called semantic web. But few sites structure the information in their web pages in semantic form, so those graphs are not yet in wide use, and we will not include them in our discussions.)
These property graphs are stored in graph databases (graph DBs). Most of the information used by businesses today is stored in relational DBs. Relational data is rigidly structured into tables, and the relations between tables are captured through special columns in those tables, called key columns. Because of this rigid structure, relational engines are very efficient, and queries against them can be highly optimized. But business problems often involve data with many relationships, and then the relational model is strained: every hop from one record to a related one requires another join. The important difference is that graph DBs model their domain naturally, without introducing artificial constructs like tables and joins. The best known general-purpose graph DB is Neo4j. It has been used in the analysis of the Panama Papers and of the Russian trolling in the 2016 U.S. presidential election. (The large tech companies, such as Facebook, Google, and LinkedIn, have developed their own much larger, distributed graph DBs, customized to their needs. Building such custom distributed DBs is a very difficult engineering problem.) Here is Emil Eifrem, the CEO of Neo4j, giving a tour of graph databases and their growing use in AI applications; you will also see some of the merging of social graphs and knowledge graphs, which will be important to us:
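The contrast between the two models shows up even in a tiny example. The sketch below, assuming a hypothetical knows relationship among made-up people, finds friends of friends twice: once relationally, with Python's built-in sqlite3 module, where the relationship lives in a key-column table and each hop is a self-join, and once by following adjacency lists, the way a graph DB lets you walk from node to node. It illustrates the idea only; it is not Neo4j's query language or storage engine.

```python
import sqlite3

# Hypothetical "knows" arrows between made-up people.
edges = [("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("cat", "eve")]

# Relational style: the relationship lives in a table of key columns,
# and each hop along it is another self-join.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
db.executemany("INSERT INTO knows VALUES (?, ?)", edges)
rows = db.execute(
    """SELECT DISTINCT k2.dst FROM knows k1
       JOIN knows k2 ON k1.dst = k2.src
       WHERE k1.src = ?""", ("ann",)).fetchall()
fof_sql = sorted(r[0] for r in rows)

# Graph style: follow each node's adjacency list directly, no joins.
adjacency = {}
for src, dst in edges:
    adjacency.setdefault(src, []).append(dst)

def friends_of_friends(person):
    return sorted({f2 for f1 in adjacency.get(person, [])
                       for f2 in adjacency.get(f1, [])})

print(fof_sql)                    # ['cat', 'dan']
print(friends_of_friends("ann"))  # ['cat', 'dan']
```

Both give the same answer, but the relational query grows by one join per additional hop, while the traversal simply takes one more step.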
The processing on these graphs is done with some of the most important algorithms of computer science. Since we are interested in AI done on large graphs, our point of view is that the two kinds of algorithms, graph algorithms and AI algorithms, are usually used side by side, and together they produce additional information, which can then be stored back into properties of the nodes and of the relationships between them. So you can think of AI as enhancing the graph with properties it learns through statistical analysis, while graph algorithms enhance the graph with properties that reflect its topology.
What are some of these graph algorithms? Some of the most important ones deal with ways to traverse the graph, like least-cost routing: what is the least expensive way to get from node A to node B? "Expensive" refers to the cost of traversing the graph, which assumes that a cost has been assigned to each individual relationship along the way. You may think of the U.S. freeway system as a graph, where the cost of traveling from city A to city B is simply the distance along the freeways: the farther apart two cities are, the more expensive it is to travel between them. Now let's look at an example in a social graph. If we can quantify the strength of the friendship between two people, we can store that strength as a property of the friendship relationship. The cost of traversing a friendship can then be defined as the inverse of its strength: the stronger the relationship, the easier it is to go between the two nodes. So you may ask who A's best friend is (which node has the strongest relationship to A), or, if you hear about somebody B whom you would like to befriend, what the cheapest path to B is (the overall strongest chain of relationships leading to B, which may not be the topologically shortest path; it may pass through more nodes, but with stronger friendships).
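The least-cost question can be answered with Dijkstra's algorithm, a standard choice (the text does not name a specific algorithm). In this sketch the friendship strengths are made-up numbers in (0, 1], and the cost of each friendship is taken to be 1/strength, as described above. The cheapest path from A to B goes through more people, but over stronger friendships.

```python
import heapq

# Hypothetical friendship strengths in (0, 1]; higher means a stronger tie.
strength = {
    ("A", "C"): 0.2,
    ("C", "B"): 0.3,
    ("A", "D"): 0.9,
    ("D", "E"): 0.9,
    ("E", "B"): 0.9,
}

graph = {}
for (x, y), s in strength.items():
    cost = 1.0 / s                                # stronger friendship -> cheaper hop
    graph.setdefault(x, []).append((y, cost))
    graph.setdefault(y, []).append((x, cost))     # friendships are bidirectional

def cheapest_path(start, goal):
    """Dijkstra's algorithm; returns (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue                              # already reached more cheaply
        settled.add(node)
        for neighbor, edge_cost in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return float("inf"), []

cost, path = cheapest_path("A", "B")
print(path)         # ['A', 'D', 'E', 'B']  (cost about 3.33, versus 8.33 via C)

# A's "best friend" is the neighbor reachable at the lowest cost.
best_friend = min(graph["A"], key=lambda pair: pair[1])[0]
print(best_friend)  # D
```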
Another set of questions that graph algorithms answer concerns centrality and influence. In social graphs, some people are better positioned than others: they have more friends and stronger relationships, or more business relationships. There is a large variety of algorithms that try to capture this notion of centrality or influence, which matters in social graphs and in graphs of knowledge alike. For example, one of the best known centrality algorithms is Google's PageRank, an iterative algorithm that assigns more weight to a page that is linked to by many other pages, where the weight of those linking pages counts too. It is a very natural way of ranking pages: the more a page is linked to by other interesting pages, the more influential and interesting that page is. Think of Google's view of the World Wide Web as a graph processing system in which the nodes of the graph are all the pages of the Web. That page ranking, together with an index from sets of words to pages, has been central to deciding which pages are presented to you, and in what order, when you search Google for an item of interest.
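The textbook form of PageRank can be sketched with power iteration: every page starts with equal rank, and in each round it passes a share of its rank along its outgoing links. The damping factor 0.85 is the value commonly cited for the original algorithm; the four-page web is invented, and Google's production ranking uses many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no links: spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "post"],
    "post": ["home"],
}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # home: every other page links to it
```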
Graph algorithms are among the most demanding applications in terms of computing power; they need HPC (High Performance Computing) devices to work efficiently. (By the way, when you see HPC below, think graph algorithms; that is all we need for our purposes.) AI algorithms, and especially the neural networks that are the focus of our attention, are also very demanding computationally. Interestingly enough, both graph processing and AI algorithms need a different kind of processor to do their job. The CPU (Central Processing Unit) architecture that powers your laptop is not well suited to these jobs. The GPU (Graphics Processing Unit), which draws the images on your computer screen, turned out to be better suited; among other things, GPUs are much better at linear algebra, because they perform many multiplications in parallel. One of the most widely used AI libraries today is Google's TensorFlow. What are these tensors? You can think of them as the higher-dimensional equivalents of the two-dimensional matrices you encountered in high-school math; there are subtle differences, but we will ignore them for now. So there is a lot of linear algebra that needs to be crunched for these neural networks to do their work. (All our math needs, including matrix multiplication, are presented in Math.)
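A minimal sketch of the linear algebra involved, in plain Python with nested lists standing in for matrices and tensors. The numbers are arbitrary; libraries such as TensorFlow perform the same kind of multiplication, in heavily optimized form, on GPUs.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as nested lists."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(len(b[0]))]
            for i in range(len(a))]

# The core of one neural-network layer is essentially inputs x weights.
inputs = [[1.0, 2.0]]              # 1 x 2: one example with two features
weights = [[0.5, -1.0, 0.0],
           [0.25, 1.0, 2.0]]       # 2 x 3: two inputs -> three outputs
print(matmul(inputs, weights))     # [[1.0, 1.0, 4.0]]

# Stacking several such matrices gives a 3-dimensional tensor (shape 2 x 1 x 2).
batch = [[[1.0, 2.0]], [[0.0, 1.0]]]
outputs = [matmul(x, weights) for x in batch]
print(outputs)                     # [[[1.0, 1.0, 4.0]], [[0.25, 1.0, 2.0]]]
```

Every entry of the result is an independent sum of products, which is exactly the kind of work a GPU can spread across thousands of cores at once.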
Today's GPUs have the computing power that supercomputers had not long ago. NVIDIA has been the primary supplier of these chips, although competition is heating up. Among all GPUs, the NVIDIA Tesla V100 perhaps represented the clearest leap forward for both AI and graph algorithms. It is a marvel of engineering, not yet extensively used, but it shows where the future lies: it will power the most sophisticated AI applications and graph analytics at the same time. It supports tensor operations, which we have seen are the basic operations behind neural networks, as native operations. Here NVIDIA founder and CEO Jensen Huang introduces the chip:
With the GPU in mind, let's keep building the picture and see where AI systems and the graphs they work on (and especially the AWI systems that are the focus of our attention) physically live. We have seen that, to a certain extent, the NVIDIA Tesla V100 is already a supercomputer on a chip. CPUs and GPUs are usually bundled inside blade servers, enclosures that can be mounted on tall racks, either vertically or horizontally depending on their form factor. The picture on the right shows a technician mounting such a blade inside a rack in a data center. NVIDIA bundles several Tesla V100 chips into larger systems, with 8 and 16 GPUs respectively, which can also be mounted on racks. These servers are of course much larger, and inside them special high-speed interconnects move data between the GPUs, instead of the regular interconnects of a normal rack. This hardware is very expensive at present, but regardless of the structure and configuration of individual blades, whether CPU or GPU based, they are all mounted on standard racks.
So, what is the overall picture we should keep in mind when we think about AI systems built on large graphs? The complete image is on the left: blades mounted on racks inside a data center, rows and rows of such racks. Many AI developers choose to use cloud service providers like Amazon, Google, or Microsoft, which operate some of the largest data centers in the world. These providers offer businesses a variety of blade configurations, depending on their needs. When you need compute power, you specify what kind of blade you need: a simple CPU blade, a Tesla V100 blade, and so on. The choice depends on how you structure your computation and how large your data is. If all the data fits on one blade, you will process it much faster and will not need an external Big Data platform. So for small prototype AI projects, this is all you need; but we will look mostly at very large AWI systems, which are distributed over thousands of blade servers.
Large data centers face many operational challenges. Let's look at one built to support online action games. Many AI breakthroughs first appear in game playing; you have hopefully seen, in the previous article Main AI Concepts, the progress made in playing Go and chess. DeepMind's next project is precisely a player for action games, as opposed to static games like chess and Go. Action games using AI are among the most demanding AI applications, not just in the kind of AI they need but also in their extraordinary appetite for computing power, which is available only in large data centers. These data centers face a number of challenges that tie into some of our concerns about AI, among them power consumption and security. A data center aims to stay up at all costs, even when failures occur. One threat, though, that requires a completely different approach is a deliberate cyber attack on any element of a data center, including its power supply, its cooling, and its high-speed connectivity to the Internet. Here is a good example of such a data center, this time in Luxembourg. Most other data centers operate on similar principles.
(Just as an aside: the operational challenge of 24/7 availability in a data center is increasingly met with AI, in the form of robots. The advantages are multiple: robots are much cheaper than specialized humans; they do not get tired or hungry, need no sleep, and do not complain; but most importantly, they continuously learn, and that accumulated knowledge is not lost, as it is with humans when they are sick or when they quit. Since cooling and energy consumption are such decisive factors for the economic health of a data center, data centers tend to be built in cooler regions. For example, China is developing a massive hub of data centers in Hohhot, in its Inner Mongolia region, and many of the largest data centers in the world will be built there.)
In the previous article, we looked at the 3+1 different types of AI, and we promised to define AWI once we had introduced graphs. An AWI system is a system whose graph has the digital twins of people as its nodes, merged with a graph of concepts related to those people. The term wide refers both to the very large number of nodes and to the number of AI tasks that the AWI system uses to enhance the intelligence of the graph. The nodes may represent all the people of a nation, or even the people of many participating nations. Think of Google, Facebook, the NSA, Social Security, and the Social Credit System (in China) as AWI systems. We will idealize all these systems and assume that they perform many AI tasks (image recognition, speech recognition, language translation, portfolio management, personal assistants, oracles, expert systems, etc.) which they may not perform today, but which, depending on politics, they may, and most likely will, perform soon.
The AWI system focuses on a particular relationship and builds strong profiles around it. It combines graph analysis and AI, continuously feeding the findings of one side of the analysis into the other, and it performs large-scale geometric deep learning on the graph. Its software is organized into discoverable services, each with a formal specification: for example, an image recognition service, a personal assistant (PA) service, a language translation service, and a captioning service. The system is self-improving, drawing on a library of interchangeable programs, each with its own formal specification. Some of the questions it can answer: What am I holding in my hand? (Answer: a book; answering depends on the PA service, the image recognition service, and the captioning service.) What is the title of that book? (Answer:...) Whose picture is this? Who is the best person to talk about subject X?
There is another form of intelligence, distinct from individual human intelligence, which occurs in communities of people rather than in individuals: collective intelligence. The term usually appears in social science and political science, and it refers to an intelligence that emerges from the collaboration of many individuals for the purpose of better decision making. But we will be mostly interested in a more specific form of this intelligence, which we will call the "collective intelligence of the graph". While the blue arrows in the diagram of the 3+1 different types of AI refer to the evolution of the intelligence of individual AI systems, the red arrows refer to the evolution of this collective intelligence of the graph. At the moment our collective intelligence is created exclusively by us, through dialog, but it does not take a great leap to imagine that AI algorithms designed to mine the intelligence of our twins in the graph will eventually enhance (maybe replace) this collective intelligence on their own.
An AWI system might have visual and auditory sensors, like cameras and microphones. In China, these sensors are, or soon will be, practically everywhere (banks, railway stations, airports, etc.), and people may have to log in with stronger credentials (like biometrics). We idealize these systems for now, but this idealization will clearly be short-lived, as many of their stronger features are easy to envisage right now. The existence of such AWI systems will depend not on technical factors but on political ones. They represent an approach to AI based strictly on advances in computer science, not on progress in understanding human intelligence.
In that respect, the question of whether an individual's digital twins from various AWI systems can be merged is a technology question, and the answer is an easy yes; the question of whether these twins should be merged is a political one, and there are no easy answers: there are deep potential benefits and pitfalls. There is already some legislation in the U.S. to prevent such merging. And regardless of whether that merging happens in the U.S., in China, or anywhere else, we will refer to the totality of all those partial twins as your integrated digital twin.
We will now define distinct categories of AWI systems, starting with the China Social Credit System (CSCS). We start with the CSCS for a good reason: AI favors authoritarianism, not liberal democracy. We will see why in a dedicated article, AI and liberal democracy; in a nutshell, AI is all about data, and centralized, tightly controlled data works better for AI. Moreover, the CSCS is arguably the best known, most controversial, and most misunderstood AWI system. It is without doubt the most ambitious AI project developed by a government. China has been developing it since 2014 through a number of smaller pilot AI projects, which are meant to culminate in a formidable AI system that will control much of life in mainland China. This culmination is supposed to happen no later than 2020. We will idealize the current CSCS and assume that it is the foundation for what we will call the Chinese National Graph, the idealized graph that combines the CSCS with the social graphs of all the Chinese social media companies, like Weibo. Such a merging of graphs is far harder to envision in a liberal democracy like the U.S.
The CSCS may, for good reasons, be viewed as a surveillance system too intrusive for a liberal democracy to consider. Here is a problem with this generic argument against the CSCS, though. The main assumption right now is that humans are better at judging nuanced situations, like guilt or innocence in a criminal court case. But the balance between humans and AI systems is continuously tilting toward AI systems in practically all domains of interest. What China is doing is speeding up that process, and a blanket criticism of the entire CSCS does not seem constructive enough for our purposes.
Could there be such a surveillance system in the U.S.? One can argue that such a system already exists in the U.S., the NSA graph, although it is far less conspicuous and more benign. And it is only a matter of policy and politics whether the government will be given (or should be given) back door access allowing it to merge all the social media graphs and the NSA graph into a U.S. National Graph, and to use it for the same kinds of decisions for which the Chinese National Graph will be used.
We will idealize most of these AWI systems and endow them with characteristics and capabilities that they may not possess at this time, but which no technical difficulty would prevent them from acquiring.
The categories of AWI systems we are interested in are the Private Social Graphs (Facebook, Google, Twitter, Credit Bureaus, etc.), the Governmental Graphs (Social Security, NSA, etc.), and the National Graphs (the U.S. and China in particular), which we touched on above.
We will not view the Knowledge Graphs as AWI systems, because they do not have people as their main nodes; for them, the questions are easier to solve, and cooperation between nations toward enhancing them may be easier to achieve. We will assume that the semantic web will happen in the near future and that all human knowledge will be written and accessible in semantic form; we will refer to this idealized graph, the totality of human knowledge, as the Universal Knowledge Graph, and we will consider the current knowledge graphs to be subgraphs of it. The National Graphs will generate the most intense discussions and will certainly lead to the thorniest questions of politics and law, but we will assume that knowledge graphs are somehow merged into these national graphs and that their concepts are accessible.
We also need a way to talk about combining all these graphs, in other words, all the digital twins from all existing graphs together with the Universal Knowledge Graph. There are many possible names for this omniscient graph: the World Graph, the Ultimate Graph; but the most expressive is God's Graph, since it will allow us to tie it to some AI concerns of Type 2, which we will do in a topic article, Superintelligence and God.
AI on Graphs
What ties graphs and AI together is not just that the data in these AWI systems is organized as graphs. There is a much stronger synergy between graphs and AI: graphs are being used to make predictions of a very powerful and consequential kind. It has long been known that a node's relationships are often much more predictive of its properties than its static characteristics (age, sex, education, income, etc.). You know a person less by their personal data than by the people around them; the neighborhood of their digital twin in the graph is the most representative context of their individual existence. This contextual positioning of a node is a very powerful idea, and it is being exploited in many ways:
Graphs are the natural structure to support social network analysis. Whereas social science in general emphasizes the properties (age, sex, education, income, etc.) of an individual (a node in a graph), social network analysis focuses on the properties of the relationships between individuals. The techniques used in this analysis are often related to those of systems theory and complexity theory. They revolve around questions of centrality and influence, similarity and shared values, strength of support, social positioning and leadership, conflict, mutuality and reciprocity, the existence of cliques, etc. The sociograms used in classrooms are simply small subgraphs of the larger social graphs that are our focus.
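Two of the measures listed above, reciprocity and cliques, are easy to sketch on a tiny sociogram. Here the arrows are made-up directed "likes" relationships between hypothetical classmates; reciprocity is computed as the fraction of arrows that are returned, and the triangles are 3-cliques, groups of three people who are all connected to each other in at least one direction.

```python
from itertools import combinations

# Hypothetical directed "likes" arrows between classmates (a tiny sociogram).
arrows = {("ann", "ben"), ("ben", "ann"), ("ben", "cal"),
          ("cal", "ben"), ("ann", "cal"), ("dee", "ann")}

def reciprocity(arrows):
    """Fraction of arrows X->Y for which Y->X also exists."""
    mutual = sum(1 for (x, y) in arrows if (y, x) in arrows)
    return mutual / len(arrows)

def triangles(arrows):
    """Triples of people who are pairwise connected, in either direction."""
    undirected = {frozenset(a) for a in arrows}
    people = sorted({p for a in arrows for p in a})
    return [t for t in combinations(people, 3)
            if all(frozenset(pair) in undirected for pair in combinations(t, 2))]

print(reciprocity(arrows))  # 0.6666666666666666 (4 of the 6 arrows are returned)
print(triangles(arrows))    # [('ann', 'ben', 'cal')]
```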
This idea of context is even more pervasive: the context of a word in a text is essential for AI applications to text. In the article Main AI Concepts we aimed in a straight line at one of the most important results in AI, the AlphaGo program. This is a good time to get a bit more mileage out of that article. Neural networks have also been very successful in Natural Language Processing (NLP), and that success is very visible all around us. OK, but what does this have to do with graphs? It turns out that some of the strongest results in NLP are due to word-embedding algorithms, which use neural networks to analyze text: they analyze the neighborhood of each word (the context in which it is used) and efficiently embed the words into a vector space. That way, questions about words become questions of linear algebra; in the best-known example, the vector for king minus the vector for man plus the vector for woman lands close to the vector for queen. Since this coverage of NLP is needed in a few other spots on our website, we consolidate it here:
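The core idea, that a word is characterized by its neighborhood, can be sketched without a neural network. The code below is a deliberately simpler, count-based stand-in for word-embedding algorithms such as word2vec, which learn dense vectors instead: each word is represented by counts of the words around it (a window of two words on each side, an arbitrary choice), and similarity is the cosine of the angle between two such vectors. The four-sentence corpus is invented.

```python
import math
from collections import Counter, defaultdict

corpus = [
    "the king rules the kingdom",
    "the queen rules the kingdom",
    "the dog chases the ball",
    "the cat chases the ball",
]

def context_vectors(sentences, window=2):
    """Represent each word by counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[w][words[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

vecs = context_vectors(corpus)
# king and queen occur in identical contexts; king and dog share only "the".
print(cosine(vecs["king"], vecs["queen"]) > cosine(vecs["king"], vecs["dog"]))  # True
```

Words used in the same contexts end up with similar vectors, so "which words are alike?" becomes a geometric question about angles between vectors.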
The concept of a neighborhood also exists in a graph, so we can embed the nodes of a graph in a vector space just as we embed the words of a text. This analogy is deeper than it appears at first sight. You can translate walks on the graph into English sentences, and they will make some sense, though often not literally; you may need to think a bit. For example, the walk marko → wrote → rvm ← read ← aaron in the property graph above reads as "marko wrote rvm, which aaron read". Conversely, the analogy suggests that textual documents have a geometry (a topology, to be more precise), that text is a geometric space. In fact, these algorithms project that geometry into linear spaces, flattening it. This is what the DeepWalk algorithm does:
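The first half of DeepWalk, turning a graph into "sentences", can be sketched with truncated random walks. The graph below extends the marko/aaron/rvm example with a hypothetical graphs node, treats the arrows as undirected, and uses arbitrary parameters (10 walks per node, length 5, a fixed seed). In DeepWalk proper, these walks are then fed to a skip-gram word-embedding model exactly as if they were sentences of text; that training step is not shown here.

```python
import random

# Undirected version of the example graph, plus a hypothetical "graphs" node.
# Every node must have at least one neighbor, or rng.choice would fail.
graph = {
    "marko": ["aaron", "rvm"],
    "aaron": ["marko", "rvm"],
    "rvm": ["marko", "aaron", "graphs"],
    "graphs": ["rvm"],
}

def random_walks(graph, walks_per_node=10, walk_length=5, seed=0):
    """DeepWalk-style truncated random walks; each walk is a 'sentence' of node names."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)                 # visit start nodes in a random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(graph[walk[-1]]))  # step to a random neighbor
            walks.append(" ".join(walk))
    return walks

walks = random_walks(graph)
print(len(walks))   # 40
print(walks[0])     # a space-separated walk of 5 node names (depends on the seed)
```

Nodes that keep appearing near each other in these walks play the role of words that keep appearing in the same contexts.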
Geometric Deep Learning is a new and promising area, still mainly a research topic; it brings together the main deep neural network algorithms we covered in the Main AI Concepts article with datasets that are structured as graphs. Graphs have a non-flat geometry, as opposed to the grid-like data (images, sequences of words) of most current applications. This non-Euclidean geometry of graphs demands a set of new algorithms, and the area is still evolving. The video is technical, but it nevertheless rounds off our article on graphs and offers a glimpse into the future of doing AI on such graphs.
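A single layer of a graph neural network can be sketched as message passing: each node averages its own feature vector with those of its neighbors, multiplies the result by a weight matrix, and applies a nonlinearity (ReLU). This is a simplified variant of the graph convolutional network (GCN) layer, using mean aggregation rather than the symmetric normalization of the original GCN; the graph, features, and weights are invented, and in practice the weights would be learned.

```python
def gcn_layer(adjacency, features, weights):
    """One message-passing layer: mean over each node and its neighbors,
    then multiply by a weight matrix, then apply ReLU."""
    out = {}
    for node, feats in features.items():
        group = [node] + adjacency[node]          # the node's own neighborhood
        dim = len(feats)
        mean = [sum(features[n][d] for n in group) / len(group) for d in range(dim)]
        hidden = [sum(mean[d] * weights[d][j] for d in range(dim))
                  for j in range(len(weights[0]))]
        out[node] = [max(0.0, h) for h in hidden]  # ReLU nonlinearity
    return out

# Hypothetical path graph a - b - c with two features per node.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
weights = [[1.0, -1.0],
           [0.5, 1.0]]                            # 2 input features -> 2 outputs

new_features = gcn_layer(adjacency, features, weights)
print(new_features)
```

Stacking such layers lets information flow several hops across the graph, so each node's new features summarize an ever wider neighborhood.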
Because of their power, AWI systems trigger many concerns about the use of AI; some of these concerns are of Type 1 and others of Type 2, in other words, some are immediate and practical, while others are futuristic and speculative. Type 1 concerns spring from the fact that the graphs at the core of these AWI systems hold sensitive information about people in their nodes and relationships. Many Type 1 concerns are about the integrity of a node's identity, and about access to and use of the information it holds by parties other than the person the node represents. These parties could be the company controlling the graph, the company controlling the data center (if it is not the same company), third parties doing business with the company, the government (especially its intelligence agencies), unlawful intruders, and so on. The issues around strong identity and access will be taken up in a topic article, Identity and Trust.
At present, login access to a node is mainly through passwords. That mode of access will not be sufficient once the sensitivity of the information in the node passes a certain threshold. The camera on the device used to initiate the login may have to be turned on, and the AWI system may have to recognize your face or your iris. You may have to use another biometric device, like a fingerprint reader. You may have to say a phrase, and the AWI system would detect whether it is your voice. But one of the most secure methods would be to challenge you with graph questions, which is why we bring up the issue in this article. It is far more difficult to get hold of this graph structure (the context of your existence) than of static data like passwords, date of birth, address, or Social Security number. The AWI system could randomly present you with people or facts connected to your digital twin in the graph that only you could reasonably recognize. It would be almost impossible for an attacking system to figure out all the relationships going into and out of your digital twin in the graph. The questions around login access are less a matter of technology than of political will. We will see in the Identity and Trust article that some countries are already implementing stronger forms of access.
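A graph challenge could work roughly as follows: mix a few of your real connections with decoys, and ask you to pick out the real ones. The sketch below is purely hypothetical, with an invented neighborhood, invented decoys, and illustrative function names (make_challenge, verify); a real scheme would need careful security design, for example to ensure that decoys cannot be told apart from true connections by an attacker.

```python
import random

# Hypothetical neighborhood of one digital twin: relationship type -> connected nodes.
twin_neighborhood = {
    "friend_of": {"bob", "carol", "dmitri"},
    "worked_at": {"acme"},
    "attended": {"state_college"},
}
# Hypothetical decoy nodes that are NOT connected to this twin.
decoys = {"erin", "frank", "globex", "tech_institute"}

def make_challenge(rng, size=4):
    """Mix two true neighbors with decoys; the user must pick out the true ones."""
    true_nodes = sorted(set().union(*twin_neighborhood.values()))
    answer = set(rng.sample(true_nodes, 2))
    options = sorted(answer | set(rng.sample(sorted(decoys), size - 2)))
    return options, answer

def verify(selected, answer):
    """The login succeeds only if exactly the true connections were selected."""
    return set(selected) == answer

rng = random.Random(42)
options, answer = make_challenge(rng)
print(options)                  # four names; two are real connections
print(verify(answer, answer))   # True: the legitimate user knows their own graph
```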
AWI systems also trigger concerns of Type 2. Let's stress again that Type 2 concerns are about the future and are speculative. You can look at a large data center housing an AWI system the way you look at a nuclear reactor, with data as the fuel of the AWI system; the AI algorithms are then the equivalent of the reactor's controlled fission, and, just as in a nuclear reactor, we would like to keep the reaction within controlled parameters. The equivalent of a nuclear accident would be an uncontrolled AI. Since data centers are connected to each other through networks, the possibility of an accidental, uncontrolled AI in any data center is very worrisome: that AI could move into other data centers, and a world ignition would ensue; the larger the data set, the more powerful the ignition. We will examine such a possibility in a larger context in the background article on AI Singularities.