Neuro-Symbolic Learning and Reasoning in the Era of LLMs // papers AAAI 2024

There are many attempts to integrate neural (System 1) and symbolic (System 2) methods.

sbagency
11 min read · Feb 28, 2024
https://nuclear-workshop.github.io/
https://openreview.net/pdf?id=ohixFcMzEr

Answering questions over domain-specific graphs requires a tailored approach due to the limited number of relations and the specific nature of the domain. Our approach integrates classic logical programming languages into large language models (LLMs), enabling the utilization of logical reasoning capabilities to tackle the KGQA task. By representing the questions as Prolog queries, which are readable and close to natural language in representation, we facilitate the generation of programmatically derived answers. To validate the effectiveness of our approach, we evaluate it using a well-known benchmark dataset, MetaQA. Our experimental results demonstrate that our method achieves accurate identification of correct answer entities for all test questions, given only a very small fraction of the training data. Overall, our work presents a promising approach to addressing question answering over domain-specific graphs, offering an explainable and robust solution by incorporating logical programming languages. Code and models are publicly available on GitHub.

In this work, we have presented a framework that leverages logical programming languages as a powerful tool for large language models (LLMs) for domain-specific question answering over knowledge graphs. By utilizing logical programming languages such as Prolog, which benefit from the inherent similarity between meaning representations in logical programming and natural language, we have showcased the ability to bridge the gap between natural language understanding and logical reasoning. We evaluated our model on a relatively small dataset and showed that it is able to fully answer questions given only a small subset of annotated representations, thanks to the pre-trained knowledge encoded even in relatively small LLMs.
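To make the pipeline concrete, here is a minimal sketch of the idea, assuming the pyswip bridge to SWI-Prolog; the facts, predicates, and query are invented for illustration and are not taken from MetaQA or the paper's code.

```python
from pyswip import Prolog

# Toy knowledge graph in the spirit of MetaQA (illustrative facts only).
prolog = Prolog()
prolog.assertz("directed_by(inception, nolan)")
prolog.assertz("directed_by(memento, nolan)")
prolog.assertz("starred_in(dicaprio, inception)")

# An LLM would translate the question "Which movies did Nolan direct?"
# into a Prolog query string such as:
query = "directed_by(Movie, nolan)"

# The Prolog engine then derives the answers programmatically.
answers = [result["Movie"] for result in prolog.query(query)]
print(answers)  # ['inception', 'memento']
```

Because the query is both machine-executable and close to the surface form of the question, the derivation of each answer stays inspectable, which is where the explainability claim comes from.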

The problem is that the Prolog engine is limited, as is any symbolic system.

https://openreview.net/pdf?id=1wlEOltnRP

In this paper, we investigate the effectiveness of integrating a hierarchical taxonomy of labels as prior knowledge into the learning algorithm of a flat classifier. We introduce two methods to integrate the hierarchical taxonomy as an explicit regularizer into the loss function of learning algorithms. By reasoning on a hierarchical taxonomy, a neural network adjusts its output distribution over the classes, allowing conditioning on upper concepts for a minority class. We limit ourselves to the flat classification task and provide our experimental results on two industrial in-house datasets and two public benchmarks, RCV1 and Amazon product reviews. Our results show the significant effect of a taxonomy in increasing the performance of a learner in semi-supervised multi-class classification, along with considerable gains in a fully supervised setting.

Symbolic Machine Learning

Let us take a step back and explore another approach to training a machine: symbolic machine learning. A symbolic machine combines a sophisticated reasoner with a large-scale knowledge base.

Here is a summary of the key points from the paper:

- The paper investigates integrating hierarchical taxonomy of labels as prior knowledge into the learning algorithm of a flat classifier.

- Two methods are proposed to represent and incorporate the taxonomy:

1) Symbolic-based Approach: Represent the taxonomy as logical constraints and integrate it into the loss function as a differentiable semantic loss (a minimal sketch of a taxonomy-aware loss follows this summary).

2) GCN-based Approach: Represent the taxonomy as a graph and use graph convolutional networks (GCN) to encode the nodes. The GCN encodings are used to regularize the model predictions.

- Experiments are conducted on 4 datasets — 2 in-house Chinese datasets and 2 public benchmarks RCV1 and Amazon reviews.

- The proposed methods are evaluated in both supervised and semi-supervised settings and compared to baselines without taxonomy.

- Results show incorporating taxonomy consistently improves performance over baselines, especially for minority classes. The symbolic method gives good accuracy while GCN gives good macro F1 score.

- Taxonomy has more impact in semi-supervised learning by providing learning signal for unlabeled data. The effect reduces as labeled data increases.

- Key conclusions are:
  - A well-designed taxonomy helps guide the learner and alleviate imbalanced distributions.
  - The symbolic method scales to large taxonomies.
  - The GCN method is effective for imbalanced data.

The paper demonstrates incorporating hierarchical taxonomy as prior knowledge can enhance model quality for flat classification problems.
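To give a flavor of the symbolic route, here is a minimal sketch of a taxonomy-aware loss in PyTorch. It is not the paper's semantic loss; it is a simpler consistency regularizer in the same spirit, and the two-level taxonomy, tensor shapes, and alpha weight are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical two-level taxonomy: 5 leaf classes under 3 parent concepts.
# PARENT[i] is the parent index of leaf class i (illustrative only).
PARENT = torch.tensor([0, 0, 1, 1, 2])

def taxonomy_regularized_loss(leaf_logits, parent_logits, leaf_targets, alpha=0.5):
    """Flat cross-entropy plus a penalty keeping the leaf-level probability
    mass consistent with the parent-level distribution."""
    leaf_probs = leaf_logits.softmax(dim=-1)        # (batch, 5)
    parent_probs = parent_logits.softmax(dim=-1)    # (batch, 3)
    # Aggregate each parent's probability from its leaves.
    aggregated = torch.zeros_like(parent_probs).index_add_(1, PARENT, leaf_probs)
    consistency = F.mse_loss(aggregated, parent_probs)
    return F.cross_entropy(leaf_logits, leaf_targets) + alpha * consistency
```

The consistency term needs no labels, which fits the summary's point that the taxonomy can provide a learning signal for unlabeled data in the semi-supervised setting.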

https://openreview.net/pdf?id=AOAP8sLYdt

While deep learning has enjoyed significant success in computer vision tasks over the past decade, many shortcomings still exist from a Cognitive Science (CogSci) perspective. In particular, the ability to subitize, i.e., quickly and accurately identify the small (≤ 6) count of items, is not well learned by current Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) when using a standard cross-entropy (CE) loss. In this paper, we demonstrate that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs by developing an alternative loss function using Holographic Reduced Representations (HRRs). We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs, and so we focus on specially crafted problems that isolate generalization to specific aspects of subitizing. Via saliency maps and out-of-distribution performance, we are able to empirically observe that the proposed HRR loss improves subitizing generalization though it does not completely solve the problem. In addition, we find that ViTs perform considerably worse compared to CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement. Code is available on GitHub.

Here are the key points from the paper:

- The paper proposes using a neuro-symbolic loss function based on Holographic Reduced Representations (HRRs) to improve the subitizing ability of convolutional neural networks (CNNs) and vision transformers (ViTs).

- Subitizing refers to the ability to quickly and accurately identify the number of items (typically 1–4) without counting. Current CNNs and ViTs fail to subitize well on simple datasets.

- The proposed HRR loss represents each class as a unique key-value pair and trains the network to predict the bound term. This is compared to standard cross-entropy loss (see the sketch after this summary).

- Experiments on MNIST-like datasets for subitizing show the HRR loss improves generalization for shape, color, and size changes over cross-entropy loss for CNNs. For ViTs, HRR also shows better generalization.

- Analysis of saliency maps shows the HRR loss focuses attention on object boundaries while cross-entropy has more diffuse attention.

- Boundary representation experiments further demonstrate some improved generalization with the HRR loss, but limitations remain in scaling and counting biases.

- Overall, the HRR loss shows promise in improving subitizing in CNNs and ViTs, supporting the use of tools from cognitive science, but more work is needed to reach human-level abilities.
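As a rough illustration of the key-value binding idea, here is a minimal sketch of an HRR-style loss in PyTorch, assuming random unit-norm key and value vectors per count class and a cosine-distance objective; the paper's exact construction may differ.

```python
import torch
import torch.nn.functional as F

def hrr_bind(a, b):
    # HRR binding = circular convolution, computed efficiently via FFT.
    return torch.fft.irfft(torch.fft.rfft(a) * torch.fft.rfft(b), n=a.shape[-1])

def hrr_loss(network_output, class_idx, keys, values):
    """Train the network to emit the bound key*value vector of the true
    class, instead of logits scored with cross-entropy."""
    target = hrr_bind(keys[class_idx], values[class_idx])
    return 1 - F.cosine_similarity(network_output, target, dim=-1).mean()

# Usage sketch: one random key/value pair per count class (counts 1-6).
dim, n_classes = 256, 6
keys = F.normalize(torch.randn(n_classes, dim), dim=-1)
values = F.normalize(torch.randn(n_classes, dim), dim=-1)
```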

https://openreview.net/pdf?id=dxUi16pvub

The alignment of autonomous agents with human values is a pivotal challenge when deploying these agents within physical environments, where safety is an important concern. However, defining the agent’s objective as a reward and/or cost function is inherently complex and prone to human errors. In response to this challenge, we present a novel approach that leverages one-class decision trees to facilitate learning from expert demonstrations. These decision trees provide a foundation for representing a set of constraints pertinent to the given environment as a logical formula in disjunctive normal form. The learned constraints are subsequently employed within an oracle constrained reinforcement learning framework, enabling the acquisition of a safe policy. In contrast to other methods, our approach offers an interpretable representation of the constraints, a vital feature in safety-critical environments. To validate the effectiveness of our proposed method, we conduct experiments in synthetic benchmark domains and a realistic driving environment.

Here are the key points from the paper:

- The paper introduces a new approach to learn safety constraints from expert demonstrations using one-class decision trees. The learned constraints are represented as a logical formula in disjunctive normal form (see the sketch after this summary).

- The constraints are then used within a constrained reinforcement learning framework to obtain a safe policy. The approach allows learning constraints without needing to manually specify them, which is prone to errors.

- The decision tree representation provides interpretability of the learned constraints. The constraints can also be pruned after training to improve interpretability.

- The method is evaluated on synthetic benchmark environments and a realistic driving scenario. It is shown to effectively learn constraints that improve agent safety and performance compared to having no constraints or using hand-engineered constraints.

- A key advantage is that learned constraints can be transferred and reused for different agents and tasks, removing the need to relearn constraints separately. The approach is also robust to limited negative examples in the expert demonstrations.

- Overall, the paper demonstrates a novel interpretable approach to learn safety constraints from demonstrations that can then be used to improve safety in reinforcement learning agents. The ability to learn constraints reduces the burden of manual specification.
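A minimal sketch of the tree-to-DNF step, using an ordinary scikit-learn decision tree as a stand-in for the paper's one-class trees; the feature names and synthetic demonstrations are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, _tree

def tree_to_dnf(fitted_tree, feature_names):
    """Read off, as a DNF formula (a list of conjunctions of threshold
    literals), every root-to-leaf path that ends in the 'unsafe' class."""
    t = fitted_tree.tree_
    clauses = []

    def walk(node, conjunction):
        if t.feature[node] == _tree.TREE_UNDEFINED:      # reached a leaf
            if t.value[node].argmax() == 1:               # class 1 = unsafe
                clauses.append(list(conjunction))
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node], conjunction + [f"{name} <= {thr:.2f}"])
        walk(t.children_right[node], conjunction + [f"{name} > {thr:.2f}"])

    walk(0, [])
    return clauses  # a state is flagged unsafe if any clause is fully satisfied

# Usage sketch on synthetic (speed, distance) demonstrations; label 1 = unsafe.
X = np.array([[10.0, 5.0], [30.0, 2.0], [50.0, 1.0], [20.0, 8.0]])
y = np.array([0, 1, 1, 0])
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(tree_to_dnf(tree, ["speed", "distance"]))
```

In the constrained-RL step, any clause that evaluates to true on the current state would be treated as a constraint violation by the oracle, which is what makes the learned formula both usable and inspectable.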

https://openreview.net/pdf?id=ORAhay0H4x

The notion of Artificial Intelligence (AI) has garnered significant attention in recent years and AI-based tools have increasingly become integrated into our daily lives. As this strand of research is gaining traction, one of the central debates is whether end-to-end Machine Learning or symbolic AI approaches alone can lead to an effective AI model, or if these techniques need to be integrated into a synergistic system. We believe the integration route to be the most promising. To this end, we introduce a specialization of a neurosymbolic architecture, known as SOFAI (Slow and Fast AI), inspired by the cognitive framework popularized by D. Kahneman’s book “Thinking, Fast and Slow”. Our system, referred to as Plan-SOFAI, aims to tackle planning problems across a large spectrum of scenarios, with a specific focus on the classical setting. Plan-SOFAI leverages multiple planning approaches, each possessing distinct characteristics and categorized as either fast or slow, while incorporating a metacognitive process for governance. Finally, we evaluated the performance of this system against state-of-the-art planners, demonstrating that ours exhibits a solid balance between solving speed and plan optimality.

Here is a summary of the key points from the document:

- The paper introduces Plan-SOFAI, a neuro-symbolic planning architecture inspired by the cognitive framework “Thinking Fast and Slow”.

- Plan-SOFAI incorporates both classical AI planning techniques (“thinking slow” like System 2) and experience-based techniques (“thinking fast” like System 1).

- It has a metacognitive module that decides when to use the fast, experience-based techniques versus the slower but more thorough classical planning (see the sketch after this summary).

- The fast techniques include a case-based planner that retrieves past plans and a transformer model fine-tuned on planning called Plansformer.

- The slow techniques use existing classical planners like Fast Downward and LPG.

- Experiments show Plan-SOFAI balances solving speed and plan optimality better than using the classical planners alone.

- The architecture is flexible and different planners can be plugged in as the fast and slow solvers.

- Future work includes continually training the Plansformer model and exploring ways for the fast and slow solvers to better collaborate.
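The governance loop can be pictured with a short sketch; the solver interfaces (fast solvers returning a plan plus a confidence score, a validator, a classical fallback planner) are assumptions for illustration, not Plan-SOFAI's actual API.

```python
# Hypothetical SOFAI-style metacognitive loop.
def metacognitive_solve(problem, fast_solvers, slow_planner, validate,
                        confidence_threshold=0.8):
    # Try the fast (System 1) solvers first, e.g. case-based retrieval
    # or a Plansformer-like model.
    for name, solver in fast_solvers:
        plan, confidence = solver(problem)
        # Metacognition: accept a fast plan only if the solver is confident
        # enough and the plan validates against the problem.
        if plan is not None and confidence >= confidence_threshold \
                and validate(problem, plan):
            return plan, name
    # Otherwise escalate to a slow (System 2) classical planner,
    # e.g. Fast Downward or LPG.
    return slow_planner(problem), "classical"
```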

https://openreview.net/pdf?id=fWkTKHWfie

Although pre-trained language models encode generic knowledge that is beneficial for planning and control, they may fail to generate appropriate control policies for domain-specific tasks. Existing fine-tuning methods use human feedback to address this limitation. However, sourcing human feedback is labor-intensive and costly. We present a fully automated approach to fine-tune pre-trained language models for domain-specific applications, bridging the gap between generic knowledge and domain-specific requirements while reducing cost. The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions. These controllers are verifiable against independently provided specifications within a world model, which can be abstract or obtained from a high-fidelity simulator. Controllers with high compliance with the desired specifications receive higher ranks, guiding the iterative fine-tuning process. We provide quantitative evidence, primarily in autonomous driving, to demonstrate the method’s effectiveness across multiple tasks. The results indicate an improvement in the percentage of specifications satisfied by the controller from 60% to 90%.

Here is a summary of the key points from the paper:

- The paper presents a method to fine-tune pre-trained language models for domain-specific control tasks using automated feedback from formal methods rather than human feedback.

- The method converts natural language instructions from the language model into automaton-based controllers. These controllers are verified against independently provided specifications in a world model (abstract or simulator-based).

- Controllers satisfying more specifications receive higher rankings, which guides an iterative direct preference optimization process to fine-tune the language model (see the sketch after this summary).

- Experiments in autonomous driving show the percentage of satisfied specifications improves from 60% to over 90% after fine-tuning, indicating the method’s effectiveness.

- Automated feedback from formal methods reduces the labor intensity and costs compared to human feedback. The verification results also provide stronger guarantees than empirical evaluation alone.

- Overall, the proposed method bridges the gap between the generic knowledge of pre-trained models and domain-specific requirements while reducing human effort. It’s a promising approach for reliable control of autonomous systems like self-driving cars.
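A minimal sketch of the ranking step that drives the fine-tuning loop; the controller/world-model interfaces and the spec(trace) callables returning booleans are assumptions for illustration, not the paper's code.

```python
def rank_controllers(controllers, specs, world_model, horizon=50):
    """Roll out each candidate controller in the world model and score it
    by how many specifications its trace satisfies."""
    scored = []
    for controller in controllers:
        state = world_model.reset()
        trace = [state]
        for _ in range(horizon):
            state = world_model.step(state, controller.act(state))
            trace.append(state)
        satisfied = sum(spec(trace) for spec in specs)  # each spec: trace -> bool
        scored.append((satisfied, controller))
    # Higher-scoring controllers form the "preferred" side of the pairs
    # fed to direct preference optimization in the next iteration.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```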

https://openreview.net/pdf?id=wxfqhp9bNR

Recent large language models (LLMs) have enabled tremendous progress in natural language understanding. However, they are prone to generating confident but nonsensical reasoning chains, a significant obstacle to establishing trust with users. In this work, we aim to incorporate rich human feedback on such incorrect model-generated reasoning chains for multi-hop reasoning to improve performance on these tasks. To do so, we collect two such datasets of human feedback in the form of (correction, explanation, error type) for the StrategyQA and Sports Understanding datasets, and evaluate several algorithms to learn from such feedback. We show that fine-tuning on such small datasets of rich human feedback can improve the model’s performance in generating correct final answers, and also improves the model’s ability to judge the correctness of its own answers.

Here are the key points from the paper:

- The goal is to improve multi-hop reasoning in large language models (LLMs) by learning from rich human feedback on incorrect reasoning chains generated by the models.

- They collect two datasets totaling 2.2k examples with rich feedback on incorrect reasoning chains from StrategyQA and Sports Understanding datasets. The feedback includes corrections, explanations, and categorization of error types.

- They propose several algorithms to learn from this feedback, including multitask learning, weighted self-consistency, and refinement (a sketch of weighted self-consistency follows this summary).

- Experiments on Llama show the feedback can improve performance on answering multi-hop questions and also improve the model’s ability to judge whether its own reasoning chain is correct.

- The best method performs comparably to in-context learning on StrategyQA and significantly better on Sports Understanding. Removing parts of the feedback like error types doesn’t hurt much.

- The model adapted with feedback is better at identifying errors in its own reasoning chains compared to the base Llama model.

- Key limitations are the small number of examples collected and limited error type categorization. Main contributions are the human feedback dataset and showing its utility for improving reasoning and calibration.
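Of the three algorithms, weighted self-consistency is the easiest to sketch: sample several chains, weight each final answer by a judge score, and return the heaviest answer. The sample_chain and judge callables below are assumed placeholders (e.g. an LLM sampler and a feedback-tuned correctness judge), not the paper's code.

```python
from collections import defaultdict

def weighted_self_consistency(question, sample_chain, judge, n_samples=10):
    """Weighted majority vote over sampled reasoning chains."""
    weights = defaultdict(float)
    for _ in range(n_samples):
        chain, answer = sample_chain(question)      # reasoning chain + final answer
        weights[answer] += judge(question, chain)   # correctness score in [0, 1]
    return max(weights, key=weights.get)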
