Why are Knowledge Graphs transforming the industry?

The concept and technology underlying Knowledge Graphs (KGs) are not new, but they have gained renewed traction since Google first announced its application of the technology (Sullivan, 2012). Since then, many industries have been expanding their analytics arsenals with new, knowledge graph-based solutions in order to solve problems and foster data-driven innovation. In parallel, the tech landscape is adjusting rapidly to this increased demand. One example is the recent VC investment of $325M received by Neo4j, a leading provider of graph-oriented database systems (Neo4j, 2021). But why are KG technologies becoming popular? And what actually is a knowledge graph in the first place?

KGs address the business analytics needs that arise from diverse, multilayered data by structuring the data to be insightful and actionable while also capturing meaningful relationships. Two reasons explain their growing popularity. First, graph-oriented databases can deal with unstructured data: they are not bound to a restrictive schema, and relationships can be modified far more easily than in more common database technologies. Second, graph theory and algorithms greatly empower data-science and machine-learning techniques, which can make them smarter, more predictive and more reproducible than traditional approaches.

According to the Neo4j definition, a knowledge graph is an interconnected dataset enriched with meaning, so we can reason about the underlying data and use it confidently for complex decision making. Very often, a data science problem can be transformed into a graph representation. When this happens, building a knowledge graph can be advantageous for visualization, insight derivation and anomaly detection, as well as for computational efficiency: in many applications, graph algorithms can be orders of magnitude faster than conventional database algorithms.

To build a knowledge graph (Figure 1), the first step is to connect the data, generating a so-called property graph. At this stage, the relationships between data points are created, providing the first level of context, which is also referred to as “dynamic context”. The terminology is appropriate: when a new piece of data is inserted or created, it inherits the context provided by its neighbors, the neighbors’ neighbors and so on. This makes the information more valuable and the overall graph more interconnected, augmenting the usable knowledge required for business decision making. In other words, context is dynamically applied. After the property graph is created, the complementary step is to give meaning to its elements. This is the so-called semantic layer, or organizing principle, that allows users and machines to extract knowledge from the data. It is also referred to as the “deep dynamic context”, because adding semantics makes the data not only interconnected but also smarter, allowing for inference, analytics and learning.

 

 

Figure 1: The 3 steps for creating a knowledge graph. 
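The two building blocks described above, the property graph and the semantic layer, can be illustrated with a minimal sketch in plain Python. It is not how Neo4j stores or queries data; the node names, labels, relationship types and the helper functions (add_relationship, context, is_a) are all hypothetical and chosen only for illustration. The sketch shows how a newly inserted node inherits context from its neighbors, and how a small label hierarchy allows a simple inference.

```python
from collections import defaultdict, deque

# Step 1: connect the data into a property graph (toy data, hypothetical names).
# Each node carries a label plus arbitrary key/value properties.
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "acme": {"label": "Company", "name": "Acme"},
    "berlin": {"label": "City", "name": "Berlin"},
}
relationships = []            # (start, type, end, properties)
adjacency = defaultdict(set)  # undirected view used to explore context


def add_relationship(start, rel_type, end, **props):
    """Store a typed relationship and update the neighborhood index."""
    relationships.append((start, rel_type, end, props))
    adjacency[start].add(end)
    adjacency[end].add(start)


def context(node_id, max_hops):
    """Return all nodes within max_hops of node_id (breadth-first search)."""
    distance = {node_id: 0}
    queue = deque([node_id])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue
        for neighbor in adjacency[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return {n for n in distance if n != node_id}


add_relationship("alice", "WORKS_AT", "acme", since=2019)
add_relationship("acme", "LOCATED_IN", "berlin")

# "Dynamic context": a new node inherits context as soon as it is connected.
nodes["bob"] = {"label": "Person", "name": "Bob"}
add_relationship("bob", "WORKS_AT", "acme", since=2021)

# Step 2: a tiny semantic layer, expressed as an "is a kind of" label hierarchy.
subclass_of = {"Company": "Organization", "Organization": "Entity",
               "Person": "Entity", "City": "Entity"}


def is_a(node_id, label):
    """True if the node's label is `label` or a descendant of it."""
    current = nodes[node_id]["label"]
    while current is not None:
        if current == label:
            return True
        current = subclass_of.get(current)
    return False


print(sorted(context("bob", 2)))    # nodes Bob is connected to within two hops
print(is_a("acme", "Organization"))  # inferred through the label hierarchy
```

In this toy graph, connecting Bob to Acme with a single relationship is enough to place him in the same context as Alice and Berlin, and the label hierarchy lets a query for organizations find Acme even though it is only labeled as a company.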

 

Having introduced the basic concepts of KG technologies, our goal here is to illustrate how KG solutions provide added value to solve business challenges in typical client-vendor or interdepartmental (data scientist / domain expert) settings. More specifically, we will focus on how knowledge graphs can: 

1 - Facilitate expectation alignment with clients: Simplifying communication with minimal-cost prototypes 

2 - Add flexibility and cope with agile methodologies: If you are agile, you don’t need to strive for unreachable perfection from the start 

3 - Enable knowledge sharing between domain experts and data experts: Create business value by tailoring the technology to your specific domain 

4 - Pave the way for growing a non-data-driven company into a scalable business model with state-of-the-art AI and machine-learning platforms 

 

Simplifying communication with minimal-cost prototypes 

At the beginning of every new project, one major challenge is communicating the requirements and the feasibility of execution. Business and tech leaders often share an initial high-level understanding of the five Ws: Why, What, Where, Who, and When. Yet, as data-driven solutions are developed, clients and tech vendors frequently experience a growing gap between expectations and project progress. When a data project is initiated, many assumptions are made. Once the data gets processed and new information is added, some of these assumptions will no longer be valid. Even if the communication is clear and the project is well designed, it is often difficult and sometimes impossible to fully predict the evolution of a data project, because the generated insights may affect the initial design. At best, one can make careful, risk-controlling assumptions.

A practical way to mitigate this gap between what is planned and the actual outcome is to guide the development via prototypes. However, depending on the chosen technology, the time and cost of providing such additional deliverables can have a material impact on the economics of a business solution. What if the prototype is actually embedded into the main solution? When modelling a knowledge graph, the conceptual model used for the main solution is extremely explicit and human-readable. Even in the earliest stages of a project, producing a graph visualization of only a few data points can be a relatively easy task for an experienced development team. Such visualizations can reveal with high confidence what the final solution will look like, creating trust among all parties involved. KG development is agile in its essence: because the core of the model is captured at an early stage, the domain experts can immediately validate the idea and propose changes, potentially saving significant costs.

 

If you are agile, you don’t need to strive for unreachable perfection from the start 

As real data comes into the KG, new requirements may appear. Especially when the data comes from different sources, we expect some anomalies to be detected on the fly. Moreover, as the domain experts become more comfortable with the KG technology, new ideas for improvements will emerge and will need to be incorporated. Flexibility and adaptability then become pivotal, and agile tech teams must cope with changing requirements. In this context, KGs bring advantages compared to traditional database systems. Safe experiments can be performed and their business outcomes assessed without having to re-engineer a whole new solution every time. For example, suppose that in the middle of a deliverable you realize that certain objects or relationships are not as simple as originally assessed, and that the business value of adding extra complexity is not (yet) validated. An expert KG engineer will still be able to provide an enhanced version of a specific part of the graph without compromising the rest of the structure. In fact, using techniques like virtual relationships or graph projections in Neo4j, an alternative model can be visualized without applying any changes at all to production. In other words, KGs support development decisions in a very practical way that can be tested and validated by the business experts “on the fly”. Agility at work!
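The idea behind a projection with virtual relationships can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the Neo4j API: the data, the relationship types WORKED_ON and COLLABORATED_WITH, and the function project_collaborations are hypothetical. The point is that the alternative model is derived from the stored relationships and returned as a separate structure, so the stored data is never modified.

```python
from collections import defaultdict
from itertools import combinations

# Stored ("production") relationships, kept in an immutable tuple:
# (person, relationship type, project). Toy data, hypothetical names.
stored = (
    ("ana", "WORKED_ON", "trial_a"),
    ("ben", "WORKED_ON", "trial_a"),
    ("ben", "WORKED_ON", "trial_b"),
    ("cara", "WORKED_ON", "trial_b"),
)


def project_collaborations(relationships):
    """Derive virtual COLLABORATED_WITH edges between people who share a project.

    The input relationships are only read, never changed.
    """
    members = defaultdict(set)
    for person, rel_type, project in relationships:
        if rel_type == "WORKED_ON":
            members[project].add(person)

    virtual = set()
    for people in members.values():
        # Every pair of people on the same project gets one virtual edge.
        for first, second in combinations(sorted(people), 2):
            virtual.add((first, "COLLABORATED_WITH", second))
    return sorted(virtual)


projection = project_collaborations(stored)
for edge in projection:
    print(edge)
```

Domain experts can review the person-to-person view produced this way and decide whether it is worth materializing, while the original person-to-project model stays exactly as it was.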

 

Create business value by tailoring the technology to your specific domain 

Obviously, effective project management and tech development go beyond aligning business expectations and tech requirements. After several iterations, the engineering experts start to grasp more of the domain knowledge and the client's needs, and the conversation is expected to reach a higher, more articulate level, including long-term strategies and scalability options. This is the stage at which you can go the extra mile, but only if your tech team masters, on top of the technology, the domain your business operates in. The scientific literature on Complex Network Analysis has shown that concepts, algorithms and metrics developed for graph structures can be applied to solve problems in a variety of industries. There are many success stories worth mentioning; for the sake of brevity, a few examples are: NASA’s Mars program, which saved two million dollars by capturing 50 years of Lessons Learned in a knowledge graph and connecting that knowledge with the expertise across the enterprise; Airbnb’s use of KGs in master data management to remove inefficiencies due to siloed data domains and create a common, dynamic database of company content; and eBay’s KG for Google Assistant, coupled with natural language understanding and artificial intelligence to store, remember and learn from past interactions with shoppers.

The main challenge in applying these solutions to a new business problem is the knowledge sharing process between domain experts and data experts. Because the graph model is usually transparent and explicit, it is easier for the data team to work through examples with real data that show the probable outcome of several graph algorithms. In addition, many well-known methods are already implemented in the Neo4j Graph Data Science library, so proofs of concept are very feasible at this stage. When the data team succeeds in clarifying the benefits of graph analysis and algorithms, the project is close to becoming a robust data-driven solution.
Join Analytics is your partner to translate pharmaceutical, healthcare and biotech R&D challenges into KG solutions powered by Neo4j technology. 

 

Pave the way for growing a non-data-driven company into a scalable business model with state-of-the-art AI and machine-learning platforms 

Running AI techniques on knowledge graphs is more realistic than you might think, especially for legacy companies that are not yet data-driven but seek such a transition. Modern graph databases, like Neo4j, provide libraries with many implemented and ready-to-use algorithms. For example, the famous PageRank centrality, developed by Google’s founders, can identify the most important objects in a big graph in a matter of seconds. Community detection algorithms can summarize important information in your data by revealing groups of objects that interact heavily with each other. Link prediction can help you predict missing information and understand how your graph is expected to evolve over time. More recently, advances in graph neural networks can help you classify nodes and even transform your graph into vector embeddings to be processed and analyzed using any machine learning tool. It is highly advisable to conduct this process together with data experts, to avoid drawing premature conclusions about the outcome of each method. But as long as you have structured your data in a knowledge graph, all these methods are accessible and easy to use.
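To give a feeling for what two of these algorithms compute, here is a minimal, self-contained Python sketch of PageRank (by power iteration) and of a very simple link-prediction score based on common neighbors. It is a teaching sketch under stated assumptions, not the implementation shipped in the Neo4j Graph Data Science library: the toy edge list, the function names, the damping factor of 0.85 and the fixed number of iterations are all choices made for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def pagerank(edges, damping=0.85, iterations=50):
    """Approximate PageRank scores for a directed edge list by power iteration."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share ("teleportation").
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = out_links[source] or nodes
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def common_neighbor_scores(edges):
    """Score each unconnected pair by its number of shared neighbors.

    Edges are treated as undirected; a higher score suggests a more likely link.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b not in neighbors[a]:
            shared = len(neighbors[a] & neighbors[b])
            if shared:
                scores[(a, b)] = shared
    return scores


# Toy graph (hypothetical): three nodes point to "c", and "c" points back to "a".
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]

ranks = pagerank(edges)
most_important = max(ranks, key=ranks.get)
candidates = common_neighbor_scores(edges)

print(most_important)
print(candidates)
```

On this toy graph the node that receives the most incoming links ends up with the highest score, and the pairs that share a neighbor without being directly connected are proposed as candidate links. On real data, the interpretation of such scores should be validated with domain and data experts.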

In this scenario, adopting AI methods becomes an easier and more gradual process than with non-graph solutions. In the graph universe, the more you use and become familiar with each method, the easier it is to implement new solutions and the greater the business impact.

For more in-depth information on knowledge graphs, the reader can refer to the comprehensive text by Kejriwal et al. (2021), which covers the fundamentals, examples of real-world applications and technical details for building knowledge graphs and using graph algorithms in business solutions.

Join Analytics was recently selected by the renowned Neo4j Startup program for its competence in providing specialized knowledge-graph solutions for the pharmaceutical, healthcare, and biotech industries. Schedule a meeting with our team to learn more about our credentials and services. Our solutions are innovative and customized to suit your needs and to bring your company to the next level of data-driven solutions.

 

Bibliography 

Kejriwal, M., Knoblock, C., & Szekely, P. (2021). Knowledge Graphs: Fundamentals, Techniques, and Applications (Adaptive Computation and Machine Learning series). The MIT Press. 

Neo4j. (2021). Neo4j Announces $325 Million Series F Investment, the Largest in Database History. Neo4j. Retrieved August 19, 2021, from https://neo4j.com/press-releases/neo4j-announces-seriesf-funding/ 

Sullivan, D. (2012, May 16). Google Launches Knowledge Graph To Provide Answers, Not Just Links. Search Engine Land. Retrieved August 19, 2021, from https://searchengineland.com/google-launches-knowledge-graph-121585