LLM-friendly Knowledge Representation for Augmented Generation and Reasoning
Any RAG pipeline is limited and error-prone // advanced knowledge pre-processing is required
The recently developed retrieval-augmented generation (RAG) technology enables the efficient construction of domain-specific applications. However, it also has limitations, including the gap between vector similarity and the relevance of knowledge reasoning, as well as insensitivity to knowledge logic, such as numerical values, temporal relations, and expert rules, which hinder the effectiveness of professional knowledge services. In this work, we introduce a professional domain knowledge service framework called Knowledge Augmented Generation (KAG). KAG is designed to address the aforementioned challenges by making full use of the complementary strengths of knowledge graphs (KGs) and vector retrieval, and to improve generation and reasoning performance by bidirectionally enhancing large language models (LLMs) and KGs through five key aspects: (1) LLM-friendly knowledge representation, (2) mutual-indexing between knowledge graphs and original chunks, (3) a logical-form-guided hybrid reasoning engine, (4) knowledge alignment with semantic reasoning, and (5) model capability enhancement for KAG. We compared KAG with existing RAG methods on multi-hop question answering and found that it significantly outperforms state-of-the-art methods, achieving relative F1 improvements of 19.6% on HotpotQA and 33.5% on 2WikiMultiHopQA. We have successfully applied KAG to two professional knowledge Q&A tasks at Ant Group, E-Government Q&A and E-Health Q&A, achieving significant improvements in professionalism over RAG methods. Furthermore, we will soon natively support KAG on the open-source KG engine OpenSPG, allowing developers to more easily build rigorous knowledge decision-making or convenient information retrieval services. This will facilitate the localized development of KAG, enabling developers to build domain knowledge services with higher accuracy and efficiency.
Here’s a summary of the paper “KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation”:
Core Idea: This paper introduces Knowledge Augmented Generation (KAG), a novel framework designed to improve the performance of Large Language Models (LLMs) in professional domains. KAG aims to overcome limitations of traditional Retrieval-Augmented Generation (RAG) methods by more effectively integrating knowledge graphs (KGs) and leveraging their logical reasoning capabilities. It seeks to build domain knowledge services with higher accuracy and efficiency.
Problems with Existing RAG: The paper argues that traditional RAG suffers from several limitations: Gap between Vector Similarity and Relevance: Relying solely on text or vector similarity for information retrieval often misses relevant information that requires logical reasoning or inferential steps.
Insensitivity to Knowledge Logic: RAG struggles with complex knowledge such as numerical values, temporal relations, expert rules, and nuanced relationships between concepts.
Lack of Coherence and Logic: Generated text can be incoherent and lacking in logic, especially in specialized domains where analytical reasoning is crucial.
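The vector-similarity gap above can be illustrated with a toy sketch: a bag-of-words cosine (a crude stand-in for embedding similarity; real RAG systems use dense embeddings) ranks a superficially overlapping passage above the passage holding the bridging fact a multi-hop question actually needs. The question and chunk texts are invented for illustration.

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a crude stand-in for embedding similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

question = "Who is the spouse of the director of Inception?"
# Chunk A shares many surface words with the question but contains no answer.
chunk_a = "Inception is a film. Who is in Inception? The director made it."
# Chunk B holds the bridging fact needed for the second hop, with little word overlap.
chunk_b = "Christopher Nolan married Emma Thomas in 1997."

# Surface overlap wins, even though chunk B is the one reasoning actually needs.
print(cosine(question, chunk_a) > cosine(question, chunk_b))
```

This is exactly the failure mode KAG targets: relevance for multi-hop questions is a property of reasoning chains, not of surface similarity.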
Key Contributions of KAG: KAG addresses these problems by integrating KG and RAG approaches through five key modules: LLM-Friendly Knowledge Representation (LLMFriSPG): Upgrades the SPG knowledge representation to better align with LLMs. It enables schema-free information extraction and supports mutual-indexing between graph structure and text chunks, making it easier to represent, reason about, and retrieve relevant information via logical forms.
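As a rough sketch of the mutual-indexing idea (this is not the paper's LLMFriSPG schema; the class and field names below are invented for illustration), extracted triples can keep back-pointers to their source chunks so retrieval can move in both directions between graph and text:

```python
from dataclasses import dataclass, field

@dataclass
class MutualIndex:
    """Toy bidirectional index between extracted triples and their source chunks."""
    chunks: dict = field(default_factory=dict)            # chunk_id -> raw text
    triples: list = field(default_factory=list)           # (head, relation, tail, chunk_id)
    entity_to_chunks: dict = field(default_factory=dict)  # entity -> {chunk_id, ...}

    def add_chunk(self, chunk_id, text, extracted_triples):
        """Store a chunk and the triples extracted from it, linking both ways."""
        self.chunks[chunk_id] = text
        for h, r, t in extracted_triples:
            self.triples.append((h, r, t, chunk_id))
            for entity in (h, t):
                self.entity_to_chunks.setdefault(entity, set()).add(chunk_id)

    def chunks_for(self, entity):
        """Graph -> text: source passages that mention an entity."""
        return [self.chunks[c] for c in sorted(self.entity_to_chunks.get(entity, ()))]

    def neighbors(self, entity):
        """Text -> graph: one-hop facts extracted around an entity."""
        return [(h, r, t) for h, r, t, _ in self.triples if entity in (h, t)]

idx = MutualIndex()
idx.add_chunk("c1", "Christopher Nolan directed Inception.",
              [("Christopher Nolan", "directed", "Inception")])
print(idx.neighbors("Inception"))
```

The point of the mutual index is that a reasoner can traverse graph structure for logic while still recovering the original text for grounding and citation.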
Logical-Form-Guided Hybrid Solving and Reasoning Engine: Transforms questions into a problem-solving process that combines four types of reasoning: retrieval, KG reasoning, language reasoning, and numerical computation. A planning step first decomposes the question into logical-form steps; a hybrid reasoning method then executes them for retrieval and processing, combining logical expressions and symbolic reasoning with other operators to answer the question.
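A minimal sketch of the plan-then-solve pattern, under heavy assumptions: the plan is hard-coded (in KAG an LLM produces it), the KG is a toy dictionary, and `kg_lookup` stands in for KAG's actual operator set. It shows how a multi-hop question becomes a sequence of typed steps where later steps consume earlier results:

```python
def plan(question):
    """Stand-in planner: returns typed logical-form steps (normally LLM-generated)."""
    return [
        ("kg_lookup", ("Inception", "directed_by")),  # hop 1: follow a graph edge
        ("kg_lookup", ("{step0}", "spouse")),         # hop 2: uses hop-1's result
    ]

# Toy knowledge graph: (subject, relation) -> object.
KG = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "spouse"): "Emma Thomas",
}

def solve(question):
    """Execute each planned step, threading intermediate results forward."""
    results = []
    for op, (subject, relation) in plan(question):
        if "{step0}" in subject:
            subject = subject.format(step0=results[0])
        if op == "kg_lookup":
            results.append(KG[(subject, relation)])
    return results[-1]

print(solve("Who is the spouse of the director of Inception?"))  # "Emma Thomas"
```

A pure vector retriever can miss such questions because no single chunk answers them; the planner makes the two-hop dependency explicit and executable.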
Knowledge Alignment with Semantic Reasoning: Defines domain knowledge through semantic relations such as synonymy, hypernymy, and inclusion. It performs semantic reasoning both during KG indexing and during online question answering to align questions with accurate knowledge.
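A hedged sketch of what such alignment can look like at query time: a user term is normalized to a canonical KG label via a synonym table, then expanded along hypernym edges so retrieval also matches more general entries. The relation tables and terms below are toy data, not KAG's semantic relation set.

```python
# Toy semantic relations: user phrasing -> canonical KG label, and label -> broader concept.
SYNONYMS = {"heart attack": "myocardial infarction"}
HYPERNYMS = {"myocardial infarction": "cardiovascular disease"}

def align(term):
    """Normalize a user term to its canonical KG label, recording the mapping path."""
    path = [term]
    canonical = SYNONYMS.get(term, term)
    if canonical != term:
        path.append(canonical)
    return canonical, path

def expand(term):
    """Align a term, then walk hypernym edges so retrieval matches broader KG entries."""
    canonical, _ = align(term)
    expanded = [canonical]
    while canonical in HYPERNYMS:
        canonical = HYPERNYMS[canonical]
        expanded.append(canonical)
    return expanded

print(expand("heart attack"))  # ['myocardial infarction', 'cardiovascular disease']
```

The same normalization applied at indexing time keeps extracted entities consistent, so the online and offline sides of the system agree on vocabulary.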
Model Capability Enhancement: Fine-tunes LLMs for improved performance on tasks needed for the KAG framework, such as Natural Language Understanding (NLU), Natural Language Inference (NLI), and Natural Language Generation (NLG). It also uses specialized fine-tuning methods like K-Lora and AKGF.
One-Pass Inference: KAG simplifies inference with an efficient one-pass mechanism, called OneGen, that integrates retrieval and generation into a single model. This avoids complex multi-module pipelines, which lose information as it passes from module to module.
Evaluation & Results: The authors evaluated KAG on three multi-hop question answering datasets (HotpotQA, 2WikiMultiHopQA, and MuSiQue) and found significant improvements over existing RAG methods. Specifically, KAG achieved relative F1 improvements of 19.6% on HotpotQA and 33.5% on 2WikiMultiHopQA.
The paper also reports successful applications of KAG in two professional scenarios within Ant Group: E-Government and E-Health Q&A. These applications showed a substantial improvement in accuracy and professionalism compared to traditional RAG methods.
The authors plan to natively support KAG on the open-source KG engine OpenSPG.
Significance: KAG presents a robust technical framework that integrates symbolic reasoning with LLMs. It improves performance in specialized domain question answering.
It effectively bridges the gap between vector-based retrieval and logic-based reasoning by combining knowledge graphs with retrieval augmented generation.
It provides a framework for building knowledge-intensive applications that require both precision and understanding of complex logic.
By providing an open-source implementation on OpenSPG, KAG will enable other researchers and practitioners in the AI field to build domain knowledge services with higher accuracy and efficiency.
In essence, KAG is a significant step towards making LLMs more reliable, logical, and accurate when dealing with the complexity of specialized knowledge. It highlights the importance of combining the strengths of both symbolic AI (KG) and neural AI (LLM) for professional applications.