Autonomy is a key component of intelligence // towards autonomous systems
Current AI systems can already emulate autonomy: an LLM placed in a loop can carry on an open-ended conversation indefinitely. This is not particularly useful yet, but such systems have the potential to become far more capable.
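A minimal sketch of such an "LLM in a loop", assuming the `openai` Python client; the model name, prompts, and two-agent setup are illustrative, not any specific system:

```python
from openai import OpenAI

client = OpenAI()

def next_turn(transcript, speaker):
    # Each agent sees the transcript from its own perspective:
    # its own lines as "assistant", the other agent's as "user".
    messages = [{"role": "system",
                 "content": f"You are agent {speaker}, chatting with another AI."}]
    for who, text in transcript:
        role = "assistant" if who == speaker else "user"
        messages.append({"role": role, "content": text})
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content

transcript = [("A", "Hello! What should we talk about?")]
for _ in range(10):  # bounded here; drop the bound for an endless conversation
    speaker = "B" if transcript[-1][0] == "A" else "A"
    transcript.append((speaker, next_turn(transcript, speaker)))
```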
The talk discusses the development of autonomous systems called “droids” at Factory, aimed at automating various stages of the software development lifecycle. These droids have distinct cognitive architectures tailored to tasks like code review, documentation, testing, and coding. The speaker highlights three core characteristics of “agentic” systems: planning, decision-making, and environmental grounding (a combined sketch follows the list below).
1. Planning: Inspired by control systems and robotics, planning involves techniques like the pseudo Kalman filter, which refines intermediate reasoning to ensure consistency, though it can propagate errors. Subtask decomposition helps break down tasks into finer details, increasing control but adding complexity. Model predictive control adapts plans in real-time based on feedback, while explicit plan criteria help maintain focus and reduce errors.
2. Decision-Making: The talk emphasizes various methods to enhance decision-making. These include consensus mechanisms (like prompt ensembles), explicit reasoning through techniques like checklists or chain of thought, fine-tuning models for specific tasks, and simulating decision paths to evaluate outcomes before executing them.
3. Environmental Grounding: This involves integrating AI with external tools and systems. The speaker discusses building custom AI interfaces for specific workflows, emphasizing the importance of processing feedback, both from external sources (like logs) and from the system itself. Bounded exploration allows the agent to gather context before starting a task, while human guidance is crucial for refining the system’s reliability.
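A hedged sketch tying the three characteristics together in one agent step; `llm`, `run_tool`, the prompts, and the vote count are illustrative placeholders, not Factory's droid implementation:

```python
import collections

def plan(llm, task, feedback=""):
    # Planning: request a step list against explicit criteria, and replan
    # (model-predictive-control style) whenever new feedback arrives.
    prompt = (f"Task: {task}\nCriteria: every step must be concrete and verifiable.\n"
              f"Feedback so far: {feedback}\nReturn one step per line.")
    return llm(prompt).splitlines()

def decide(llm, step, n_votes=5):
    # Decision-making: prompt-ensemble consensus; sample several candidate
    # actions and keep the majority answer.
    votes = [llm(f"Name the single best action for this step: {step}")
             for _ in range(n_votes)]
    return collections.Counter(votes).most_common(1)[0][0]

def agent(llm, run_tool, task, max_steps=10):
    # Environmental grounding: run_tool executes each action against the real
    # environment and returns raw feedback (logs, errors) for the next replan.
    feedback = ""
    for _ in range(max_steps):
        steps = plan(llm, task, feedback)
        if not steps:
            break  # plan criteria satisfied; nothing left to do
        action = decide(llm, steps[0])
        feedback = run_tool(action)
```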
Large Language Models (LLMs) have shown remarkable capabilities in natural language tasks requiring complex reasoning, yet their application to agentic, multi-step reasoning in interactive environments remains a difficult challenge. Traditional supervised pre-training on static datasets falls short of enabling the autonomous decision-making needed in dynamic settings like web navigation. Previous attempts to bridge this gap through supervised fine-tuning on curated expert demonstrations often suffer from compounding errors and limited exploration data, resulting in sub-optimal policies. To overcome these challenges, we propose a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iterative fine-tuning on agent interactions using an off-policy variant of the Direct Preference Optimization (DPO) algorithm. Our method allows LLM agents to learn effectively from both successful and unsuccessful trajectories, thereby improving their generalization in complex, multi-step reasoning tasks. We validate our approach in the WebShop environment, a simulated e-commerce platform, where it consistently outperforms behavior cloning and reinforced fine-tuning baselines, and beats average human performance when equipped with the capability to do online search. In real-world booking scenarios, our methodology boosts the Llama-3 70B model’s zero-shot success rate from 18.6% to 81.7% (a 340% relative increase) after a single day of data collection, and further to 95.4% with online search. We believe this represents a substantial leap forward in the capabilities of autonomous agents, paving the way for more sophisticated and reliable decision-making in real-world settings.
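One piece of this pipeline lends itself to a short sketch: turning scored trajectories (e.g., MCTS value estimates combined with self-critique) into DPO preference pairs. The field names, grouping, and margin are assumptions for illustration, not the paper's exact implementation:

```python
from itertools import combinations

def build_dpo_pairs(trajectories, margin=0.1):
    """trajectories: list of dicts with 'prompt', 'actions', 'score' keys,
    where trajectories branching from the same state share a prompt."""
    by_prompt = {}
    for t in trajectories:
        by_prompt.setdefault(t["prompt"], []).append(t)
    pairs = []
    for prompt, group in by_prompt.items():
        for a, b in combinations(group, 2):
            if abs(a["score"] - b["score"]) < margin:
                continue  # skip near-ties; they give a weak preference signal
            chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
            pairs.append({"prompt": prompt,
                          "chosen": chosen["actions"],
                          "rejected": rejected["actions"]})
    return pairs
```

Pairing only siblings of the same node keeps each comparison on-prompt, which is what lets the agent learn from failed branches as well as successful ones.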
Due to emergent capabilities, large language models (LLMs) have been utilized as language-based agents to perform a variety of tasks and make decisions with an increasing degree of autonomy. These autonomous agents can understand high-level instructions, interact with their environments, and execute complex tasks using a selection of tools available to them. As the capabilities of the agents expand, ensuring their safety and trustworthiness becomes more imperative. In this study, we introduce the ATHENA framework, which leverages the concept of verbal contrastive learning, where past safe and unsafe trajectories are used as in-context (contrastive) examples to guide the agent toward safety while fulfilling a given task. The framework also incorporates a critiquing mechanism to prevent risky actions at every step. Furthermore, owing to the lack of existing benchmarks on the safety reasoning ability of LLM-based agents, we curate a set of 80 toolkits across 8 categories with 180 scenarios to provide a safety evaluation benchmark. Our experimental evaluation, with both closed- and open-source LLMs, indicates that verbal contrastive learning and interaction-level critiquing improve the safety rate significantly.
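A minimal sketch of the verbal contrastive idea, assuming a generic `llm` callable; the prompt wording, the pairing of safe/unsafe examples, and the YES/NO critic are illustrative, not ATHENA's actual code:

```python
def contrastive_prompt(task, safe_examples, unsafe_examples, k=2):
    """Build a prompt whose in-context examples contrast safe vs. unsafe behavior."""
    blocks = []
    for safe, unsafe in zip(safe_examples[:k], unsafe_examples[:k]):
        blocks.append(f"SAFE trajectory:\n{safe}\nUNSAFE trajectory (avoid):\n{unsafe}")
    examples = "\n\n".join(blocks)
    return (f"{examples}\n\nTask: {task}\n"
            "Act safely; imitate the SAFE trajectories and avoid the UNSAFE ones.")

def critic_approves(llm, action):
    # Interaction-level critiquing: a second pass vets each action before execution.
    verdict = llm(f"Is this action risky? Answer YES or NO.\nAction: {action}")
    return verdict.strip().upper().startswith("NO")
```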
We propose a novel in-context learning algorithm for building autonomous decision-making language agents. The language agent repeatedly attempts to solve the same task, self-correcting each time the task fails. Our selected language agent demonstrates the ability to solve tasks in a text-based game environment. Our results show that the gemma-2-9b-it language model, using our proposed method, can successfully complete two of six tasks that it had failed on the first attempt. This highlights the effectiveness of our approach in enhancing the problem-solving capabilities of a single language model through self-correction, paving the way for more advanced autonomous agents. The code is publicly available at https://github.com/YenCheHsiao/AutonomousLLMAgentwithAdaptingPlanning.
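A hedged sketch of this self-correction loop; `llm`, `run_episode`, and the prompts are hypothetical stand-ins for the repository's actual interfaces:

```python
def solve_with_self_correction(task, llm, run_episode, max_attempts=3):
    context = f"Task: {task}"
    for attempt in range(max_attempts):
        actions = llm(f"{context}\nPropose a plan of actions, one per line.").splitlines()
        success, trace = run_episode(actions)  # play the text-game episode
        if success:
            return actions
        # Self-correction: fold a critique of the failed trajectory into the context.
        reflection = llm(f"The following attempt failed:\n{trace}\n"
                         "Explain the mistake in one sentence and how to avoid it.")
        context += f"\nAttempt {attempt + 1} failed. Lesson: {reflection}"
    return None  # task not solved within the attempt budget
```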
A major, ongoing social transition is the inclusion of autonomous agents into human organizations. For example, in defence and security applications, robots may be used alongside human operatives to reduce risk or add capability. But a key barrier to the transition to successful human-autonomous agent collectives is the need for sufficient trust between team members. A critical enabling factor for this trust will be a suitably designed dynamic allocation of function (AoF). We consider AoF in terms of a ‘ladder of trust’ (from low to high), with individual team members adjusting trust in their teammates based on variation in ‘score’ over time. The score is derived from a team member’s ability to perceive and understand its situation based on the gathered information and to act to achieve team or self goals. Combining these trust scores gives a system-level perspective on how AoF might be adjusted during a mission. That is, the most suitable teammate for a function might have a low trust rating from its fellow teammates, so it might be preferable to choose the next most suitable teammate for the function at that point in time. Of course, this holds only where the next most suitable teammate is also likely to perform within the set framework of moral, ethical, and legal constraints. The trade-offs between trust in an individual agent’s capability and its predictability need to be considered within the broader context of the agent’s integrity and accountability. From this perspective, the Allocation Space is defined by more than the ability of each agent to perform a function. The models that we are developing also require cooperation (and communication) between agents. This allows the proposed AoF to be negotiated between agents and leads to the proposal that AoF could, in effect, represent a ‘contract’ between the agent performing the function and the agents that would be affected by this performance. We argue that this new approach to trust-sensitive AoF could be an important enabler for organizations seeking to embrace the opportunities arising from integrating autonomous agents into their teams.
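The allocation idea can be made concrete with a small sketch; the score scales, trust threshold, and constraint check are all assumptions for illustration, not the authors' model:

```python
def within_constraints(candidate, function):
    """Placeholder for the moral/ethical/legal constraint check."""
    return True

def allocate_function(function, capability, trust, min_trust=0.5):
    """capability: {agent: suitability score for this function};
    trust: {agent: combined trust score assigned by teammates}."""
    ranked = sorted(capability, key=capability.get, reverse=True)  # most capable first
    for candidate in ranked:
        # Fall back to the next-best candidate when peer trust is too low,
        # provided the candidate also passes the constraint check.
        if trust[candidate] >= min_trust and within_constraints(candidate, function):
            return candidate  # the 'contract' is offered to this agent
    return None  # no teammate is both trusted and compliant
```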
With the emergence of large language models (LLMs), LLM-powered multi-agent systems (LLM-MA systems) have been proposed to tackle real-world tasks. However, their agents mostly follow predefined Standard Operating Procedures (SOPs) that remain unchanged across the whole interaction, lacking autonomy and scalability. Additionally, current solutions often overlook the necessity for effective agent cooperation. To address these limitations, we propose MegaAgent, a practical framework designed for autonomous cooperation in large-scale LLM agent systems. MegaAgent leverages the autonomy of agents to dynamically generate new agents based on task requirements, incorporating features such as automatic task division, systematic planning and monitoring of agent activities, and management of concurrent operations. In addition, MegaAgent is designed with a hierarchical structure and employs system-level parallelism to enhance performance and communication. We demonstrate the effectiveness of MegaAgent through Gobang game development, where it outperforms popular LLM-MA systems, and through a national policy simulation, demonstrating its high autonomy and its potential to scale rapidly to 590 agents while ensuring effective cooperation among them. Our results indicate that MegaAgent is the first autonomous large-scale LLM-MA system with no predefined SOPs and with high effectiveness and scalability, paving the way for further research in this field. Our code is at https://anonymous.4open.science/r/MegaAgent-81F3.
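A rough sketch of the dynamic, hierarchical agent generation described here, using a generic `llm` callable and thread-based parallelism; this illustrates the pattern, not MegaAgent's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def admin_agent(task, llm):
    # Autonomy: the admin decides the division of labor itself rather than
    # following a predefined SOP.
    subtasks = llm(f"Split into independent subtasks, one per line:\n{task}").splitlines()
    with ThreadPoolExecutor() as pool:  # system-level parallelism across workers
        results = list(pool.map(lambda s: worker_agent(s, llm), subtasks))
    return llm("Merge these subtask results into one deliverable:\n" + "\n".join(results))

def worker_agent(subtask, llm):
    # Each dynamically spawned worker handles one subtask; a worker could
    # itself act as an admin, giving the hierarchy further depth.
    return llm(f"Complete this subtask and report the result:\n{subtask}")
```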