Why AI agents? // compound & extensible, not monolithic and limited
Agents are an abstraction that greatly extends what a model can do on real-world tasks // plan, act, analyze (reflect, memorize, etc.), and iterate.
What are AI agents?
AI agents are advanced systems that integrate large language models (LLMs) with various tools and data sources to perform complex tasks more efficiently than standalone models. Let’s break down the key concepts:
Shift from Monolithic Models to Compound AI Systems:
- Standalone Models: Limited by their training data, difficult to adapt, and unable to access external data directly.
- Compound AI Systems: Combine models with other components to enhance functionality. For example, to plan a vacation, a model needs access to personal vacation data. By integrating the model with a database, the system can fetch that data and give an accurate answer (a minimal sketch follows the benefits list below).
Benefits of Compound AI Systems:
- Modular Design: Systems can include various models (e.g., LLMs, image generation) and programmatic components (e.g., output verifiers, database search).
- Flexibility and Adaptability: Easier to solve complex problems by selecting appropriate components for each task.
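To make the compound-system idea concrete, here is a minimal sketch of the vacation example: a plain model cannot know how many vacation days are left, so a programmatic component looks the number up and injects it into the prompt. The names `vacation_days_db`, `call_llm`, and `plan_vacation` are illustrative placeholders, not any particular product's API.

```python
# Minimal compound-AI-system sketch (illustrative names, not a real API):
# a programmatic component (a database lookup) supplies the fact the model
# cannot know on its own, and the model reasons over it.

vacation_days_db = {"alice": 12, "bob": 4}  # stand-in for a real HR database

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (swap in whatever client you use)."""
    return f"[model answer based on: {prompt!r}]"

def plan_vacation(user: str, question: str) -> str:
    days = vacation_days_db.get(user, 0)          # programmatic component: DB lookup
    context = f"{user} has {days} vacation days remaining."
    return call_llm(f"{context}\nQuestion: {question}")

print(plan_vacation("alice", "Can I take two weeks off in August?"))
```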
Introduction of AI Agents:
- Reasoning and Control Logic: AI agents use LLMs to handle the control logic, allowing them to plan and reason through tasks rather than following rigid instructions.
- Autonomous Operation: AI agents can plan, execute tasks, and adjust their strategies based on the outcomes, making them suitable for complex problem-solving.
Capabilities of AI Agents:
1. Reasoning: LLMs break down tasks, plan, and reason through steps.
2. Action: Use external tools (e.g., web search, calculators) to execute tasks.
3. Memory: Store and retrieve conversation history and internal logs to enhance personalization and problem-solving.
ReAct (Reason + Act) Framework:
- Combines reasoning and action, prompting the model to plan, act, observe outcomes, and iterate as needed.
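The loop below is a hedged, minimal sketch of a ReAct-style controller: the model is prompted to reason, choose an action, the system executes the matching tool, and the observation is fed back before the next step. The tool set, the `Action: tool[arg]` format, and `call_llm` are assumptions for illustration, not the original ReAct prompt.

```python
import re

def web_search(query: str) -> str:
    return f"[search results for {query!r}]"            # toy tool

def calculator(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))        # toy tool; never eval untrusted input

TOOLS = {"web_search": web_search, "calculator": calculator}

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; a real model would emit Thought/Action lines."""
    return "Final Answer: (model output goes here)"

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = call_llm(transcript)                     # reason: plan the next step
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", reply)
        if match:                                        # act, then observe the outcome
            tool, arg = match.groups()
            transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return "No answer within the step budget."

print(react_loop("What is 2 + 2?"))
```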
Application Example:
Planning a vacation involves multiple steps, such as checking vacation days, weather forecasts, and sunscreen requirements. AI agents can handle such complex tasks by integrating various tools and data sources, demonstrating their modularity and efficiency.
Future of AI Agents:
- Autonomy Spectrum: AI systems can range from narrowly focused, programmatic approaches to more autonomous, agent-based approaches for complex tasks.
- Human in the Loop: While AI agents are becoming more capable, human oversight remains crucial for ensuring accuracy and handling intricate tasks.
AI agents represent a significant advancement in AI, combining system design with autonomous behavior to tackle increasingly complex tasks effectively.
AI agents are an exciting new research direction, and agent development is driven by benchmarks. Our analysis of current agent benchmarks and evaluation practices reveals several shortcomings that hinder their usefulness in real-world applications. First, there is a narrow focus on accuracy without attention to other metrics. As a result, SOTA agents are needlessly complex and costly, and the community has reached mistaken conclusions about the sources of accuracy gains. Our focus on cost in addition to accuracy motivates the new goal of jointly optimizing the two metrics. We design and implement one such optimization, showing its potential to greatly reduce cost while maintaining accuracy. Second, the benchmarking needs of model and downstream developers have been conflated, making it hard to identify which agent would be best suited for a particular application. Third, many agent benchmarks have inadequate holdout sets, and sometimes none at all. This has led to agents that are fragile because they take shortcuts and overfit to the benchmark in various ways. We prescribe a principled framework for avoiding overfitting. Finally, there is a lack of standardization in evaluation practices, leading to a pervasive lack of reproducibility. We hope that the steps we introduce for addressing these shortcomings will spur the development of agents that are useful in the real world and not just accurate on benchmarks.
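One way to read the "jointly optimizing" goal above: evaluate candidate agent designs on both accuracy and average cost per task, then keep only the Pareto-optimal ones. The sketch below illustrates that selection rule with made-up configurations and numbers; it is not the paper's actual optimization.

```python
# Illustrative only: configurations and numbers are made up.
candidates = [
    {"name": "gpt4_react",        "accuracy": 0.71, "cost_usd": 0.90},
    {"name": "gpt4_react_retry5", "accuracy": 0.70, "cost_usd": 4.10},  # costlier and no better
    {"name": "small_model_react", "accuracy": 0.65, "cost_usd": 0.08},
]

def pareto_front(agents):
    """Keep agents that no other agent beats on both accuracy and cost."""
    front = []
    for a in agents:
        dominated = any(
            b["accuracy"] >= a["accuracy"] and b["cost_usd"] <= a["cost_usd"]
            and (b["accuracy"] > a["accuracy"] or b["cost_usd"] < a["cost_usd"])
            for b in agents
        )
        if not dominated:
            front.append(a)
    return front

print([a["name"] for a in pareto_front(candidates)])  # -> ['gpt4_react', 'small_model_react']
```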
Recent advancements in Large Language Models (LLMs) have led to significant breakthroughs in various natural language processing tasks. However, generating factually consistent responses in knowledge-intensive scenarios remains a challenge due to issues such as hallucination, difficulty in acquiring long-tailed knowledge, and limited memory expansion. This paper introduces SMART, a novel multiagent framework that leverages external knowledge to enhance the interpretability and factual consistency of LLM-generated responses. SMART comprises four specialized agents, each performing a specific sub-trajectory action to navigate complex knowledge-intensive tasks. We propose a multi-agent co-training paradigm, Long- and Short-Trajectory Learning, which ensures synergistic collaboration among agents while maintaining fine-grained execution by each agent. Extensive experiments on 5 tasks demonstrate SMART’s superior performance compared to previous widely adopted methods. Our code is available at https://github.com/yueshengbin/SMART.
Existing agents based on large language models (LLMs) demonstrate robust problem-solving capabilities by integrating LLMs' inherent knowledge, strong in-context learning and zero-shot capabilities, and the use of tools combined with intricately designed LLM invocation workflows by humans. However, these agents still exhibit shortcomings in long-term reasoning and underuse the potential of existing tools, leading to noticeable deficiencies in complex real-world reasoning scenarios. To address these limitations, we introduce Sibyl, a simple yet powerful LLM-based agent framework designed to tackle complex reasoning tasks by efficiently leveraging a minimal set of tools. Drawing inspiration from Global Workspace Theory, Sibyl incorporates a global workspace to enhance the management and sharing of knowledge and conversation history throughout the system. Furthermore, guided by Society of Mind Theory, Sibyl implements a multi-agent debate-based jury to self-refine the final answers, ensuring a comprehensive and balanced approach. This approach aims to reduce system complexity while expanding the scope of problems solvable, from matters typically resolved by humans in minutes to those requiring hours or even days, thus facilitating a shift from System-1 to System-2 thinking. Sibyl has been designed with a focus on scalability and ease of debugging by incorporating the concept of reentrancy from functional programming from its inception, with the aim of seamless, low-effort integration into other LLM applications to improve their capabilities. Our experimental results on the GAIA benchmark test set reveal that the Sibyl agent instantiated with GPT-4 achieves state-of-the-art performance with an average score of 34.55%, compared to other agents based on GPT-4. We hope that Sibyl can inspire more reliable and reusable LLM-based agent solutions to address complex real-world reasoning tasks.
• We propose Sibyl, a simple and powerful LLM-based agent framework that embodies a design philosophy centered on simplicity, modularity, and reusability by promoting stateless interactions during inference time, which facilitates ease of debugging and enhancement.
• We develop an external information acquisition channel accompanied by a representation language specifically tailored to efficiently gather and selectively compress external information. Drawing inspiration from Global Workspace Theory and Society of Mind Theory, we also introduce a global workspace that facilitates effective information sharing across modules, and a multi-agent debate-based jury that promotes self-reflection.
• The experimental results on the GAIA benchmark test set demonstrate that the Sibyl agent achieves new state-of-the-art performance, particularly in the challenging Level 2 and Level 3 scenarios, which underscores the improvement of Sibyl in solving complex reasoning tasks.
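As a rough illustration of the debate-based jury idea (an assumption-laden sketch, not Sibyl's code), several juror prompts with different personas review a candidate answer and a majority verdict decides whether it is accepted or sent back for revision. `call_llm` and the persona list are placeholders.

```python
from collections import Counter

JUROR_PERSONAS = ["careful fact-checker", "devil's advocate", "domain expert"]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; a juror would reply ACCEPT or REVISE plus a critique."""
    return "ACCEPT"

def jury_verdict(question: str, candidate_answer: str) -> str:
    votes = []
    for persona in JUROR_PERSONAS:
        prompt = (
            f"You are a {persona}. Question: {question}\n"
            f"Candidate answer: {candidate_answer}\n"
            "Reply ACCEPT or REVISE, with a short justification."
        )
        votes.append(call_llm(prompt).split()[0])        # keep only the verdict token
    return Counter(votes).most_common(1)[0][0]           # majority vote decides

print(jury_verdict("What year did the Berlin Wall fall?", "1989"))
```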
The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing “language agents”, which are complex large language model (LLM) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agent research is that it is model-centric, or engineering-centric. That is to say, progress on the prompts, tools, and pipelines of language agents requires substantial manual engineering effort from human experts rather than automatic learning from data. We believe the transition from model-centric, or engineering-centric, to data-centric, i.e., the ability of language agents to autonomously learn and evolve in environments, is the key for them to possibly achieve AGI. In this work, we introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own in a data-centric way using symbolic optimizers. Specifically, we consider agents as symbolic networks where learnable weights are defined by prompts, tools, and the way they are stacked together. Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning: back-propagation and gradient descent. Instead of dealing with numeric weights, agent symbolic learning works with natural language simulacrums of weights, loss, and gradients. We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks and show that agent symbolic learning enables language agents to update themselves after being created and deployed in the wild, resulting in “self-evolving agents”. We demonstrate the potential of the agent symbolic learning framework and open-source the entire framework to facilitate future research on data-centric agent learning.
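A hedged sketch of what "natural language simulacrums of weights, loss, and gradients" could look like in practice: one LLM call critiques the output (the language "loss"), another turns the critique into suggested prompt edits (the language "gradient"), and a third applies the edits (the "update"). Function names and prompt wording are illustrative assumptions, not the released framework.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    return "(model output)"

def language_loss(task: str, output: str) -> str:
    # textual analogue of a loss: a critique of the agent's output
    return call_llm(f"Critique this output for the task.\nTask: {task}\nOutput: {output}")

def language_gradient(agent_prompt: str, loss: str) -> str:
    # textual analogue of a gradient: suggested edits to the prompt
    return call_llm(f"Given this critique:\n{loss}\nSuggest edits to this prompt:\n{agent_prompt}")

def apply_update(agent_prompt: str, gradient: str) -> str:
    # "gradient descent" step: rewrite the prompt according to the suggestions
    return call_llm(f"Rewrite the prompt applying these edits.\nPrompt: {agent_prompt}\nEdits: {gradient}")

def symbolic_training_step(agent_prompt: str, task: str) -> str:
    output = call_llm(f"{agent_prompt}\n{task}")
    loss = language_loss(task, output)
    grad = language_gradient(agent_prompt, loss)
    return apply_update(agent_prompt, grad)

new_prompt = symbolic_training_step("You are a careful QA agent.", "Who wrote 'Dune'?")
```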
Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously. Upon receiving a question, the agent first undertakes a step-by-step analysis and devises a rational plan. It then invokes a set of predefined functions to read node content and neighbors, facilitating a coarse-to-fine exploration of the graph. Throughout the exploration, the agent continuously records new insights and reflects on current circumstances to optimize the process until it has gathered sufficient information to generate an answer. Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin. Additionally, our approach demonstrates superior performance on four challenging single-hop and multi-hop benchmarks.
Graph Construction
1. Document Chunking: Split document D into chunks of maximum length L, preserving paragraph structure.
2. Summarization: For each chunk, the LLM summarizes it into atomic facts — simplified, indivisible pieces of information.
3. Key Element Extraction: Extract essential nouns, verbs, and adjectives from each atomic fact.
4. Normalization: Normalize key elements to handle lexical noise and granularity issues, creating a final set of key elements.
5. Node Creation: Construct nodes v_i = (k_i, A_i), where k_i is a key element and A_i is the set of atomic facts corresponding to k_i.
6. Node Linking: Link two nodes v_i and v_j if k_i appears in A_j, or vice versa (a minimal construction sketch follows this list).
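The snippet below sketches steps 5 and 6 under the assumption that steps 1-4 have already produced normalized key elements and their atomic facts; the data structures and example facts are illustrative, not GraphReader's actual code.

```python
from collections import defaultdict

# Output of steps 1-4 (illustrative): key element -> atomic facts mentioning it.
atomic_facts_by_key = {
    "einstein":   {"Einstein was born in Ulm.", "Einstein developed relativity."},
    "ulm":        {"Einstein was born in Ulm."},
    "relativity": {"Einstein developed relativity."},
}

# Step 5: one node per key element, v_i = (k_i, A_i).
nodes = dict(atomic_facts_by_key)

# Step 6: link v_i and v_j if k_i appears in a fact of A_j, or vice versa.
edges = defaultdict(set)
for k_i in nodes:
    for k_j, facts_j in nodes.items():
        if k_i != k_j and any(k_i in fact.lower() for fact in facts_j):
            edges[k_i].add(k_j)
            edges[k_j].add(k_i)

print(dict(edges))  # einstein <-> ulm, einstein <-> relativity
```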
Graph Exploration
1. Agent Initialization:
— Notebook: The agent maintains a notebook to record supporting facts.
— Rational Plan: The agent pre-plans the solution, breaking down the question into key steps and information needs.
— Initial Node Selection: The agent selects N initial nodes based on the question and rational plan.
2. Exploration Process:
— Atomic Facts: Start with atomic facts, grouped by chunks and labeled with chunk IDs.
— Chunk Reading: If needed, the agent reads specific chunks for detailed information.
— Notebook Update: Continuously update the notebook with relevant information.
3. Exploring Atomic Facts:
— Overview: The agent captures an overview of each chunk by reading groups of atomic facts.
— Function Utilization:
— read_chunk: Read valuable chunks identified by the agent.
— stop_and_read_neighbor: Proceed to neighboring nodes if no chunks are deemed valuable.
4. Exploring Chunks:
— Queue Traversal: Traverse and read chunks of interest.
— Supporting Facts: Record any supporting facts in the notebook.
— Function Selection:
— search_more: Continue exploring chunks if information is insufficient.
— read_previous_chunk / read_subsequent_chunk: Insert adjacent chunks into the queue if relevant.
— termination: Finish exploration if sufficient information is gathered.
5. Exploring Neighbors:
— Neighbor Node Check: Check all neighboring nodes based on the notebook, question, and plan.
— Function Utilization:
— read_neighbor_node: Re-enter exploration with a neighboring node.
— termination: Finish exploration if no neighboring nodes are useful.
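Putting the exploration steps together, a hedged sketch of the control loop might look like the following, where `decide` stands in for the LLM choosing among the predefined functions and the graph is represented with plain dictionaries. This illustrates the flow only; it is not the authors' implementation, and the neighbor/chunk functions are simplified.

```python
def decide(state: dict) -> str:
    """Placeholder: an LLM would pick one of the predefined functions here."""
    return "termination"

def explore(start_node, neighbors, node_facts, chunk_text, question, plan, max_steps=20):
    notebook = []                                   # supporting facts recorded so far
    queue, visited = [start_node], set()
    for _ in range(max_steps):
        if not queue:
            break
        node = queue.pop(0)
        if node in visited:
            continue
        visited.add(node)
        notebook.extend(node_facts.get(node, []))   # overview via grouped atomic facts
        action = decide({"question": question, "plan": plan,
                         "notebook": notebook, "node": node})
        if action == "read_chunk":
            notebook.append(chunk_text[node])       # detailed read of the source chunk
        elif action in ("stop_and_read_neighbor", "search_more", "read_neighbor_node"):
            queue.extend(neighbors.get(node, []))   # widen the search to neighbors
        elif action == "termination":
            break                                   # enough information gathered
    return notebook

notes = explore("einstein", {"einstein": ["ulm"]},
                {"einstein": ["Einstein was born in Ulm."]},
                {"einstein": "...full chunk text..."},
                "Where was Einstein born?", "Find the birthplace.")
```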
Answer Reasoning
1. Information Compilation: Gather all notes from each agent after independent exploration.
2. Chain-of-Thought: Analyze each note, consider complementary information, and resolve inconsistencies using a majority voting strategy.
3. Final Answer: Generate the final answer based on all available information.
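A minimal sketch of this compile-and-vote step, assuming each agent returns a notebook of notes and a placeholder `extract_answer` stands in for the chain-of-thought reasoning over those notes:

```python
from collections import Counter

def extract_answer(question: str, notebook: list) -> str:
    """Placeholder: an LLM would reason (chain-of-thought) over the notes here."""
    return notebook[0] if notebook else "unknown"

def final_answer(question: str, notebooks: list) -> str:
    candidates = [extract_answer(question, nb) for nb in notebooks]  # one answer per agent
    return Counter(candidates).most_common(1)[0][0]                  # majority vote resolves conflicts

print(final_answer("Where was Einstein born?", [["Ulm"], ["Ulm"], ["Munich"]]))  # -> Ulm
```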
Autonomous agents powered by language models (LMs) have demonstrated promise in their ability to perform decision-making tasks such as web automation. However, a key limitation remains: LMs, primarily optimized for natural language understanding and generation, struggle with multi-step reasoning, planning, and using environmental feedback when attempting to solve realistic computer tasks. Towards addressing this, we propose an inference-time search algorithm for LM agents to explicitly perform exploration and multi-step planning in interactive web environments. Our approach is a form of best-first tree search that operates within the actual environment space, and is complementary to most existing state-of-the-art agents. It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks. On the challenging VisualWebArena benchmark, applying our search algorithm on top of a GPT-4o agent yields a 39.7% relative increase in success rate compared to the same baseline without search, setting a state-of-the-art success rate of 26.4%. On WebArena, search also yields a 28.0% relative improvement over a baseline agent, setting a competitive success rate of 19.2%. Our experiments highlight the effectiveness of search for web agents, and we demonstrate that performance scales with increased test-time compute. We conduct a thorough analysis of our results to highlight improvements from search, limitations, and promising directions for future work. Our code and models are publicly released at jykoh.com/search-agents.
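To make the search idea concrete, here is a hedged sketch of best-first search over environment states: a value function scores each state, the most promising frontier state is expanded, and its children are pushed back onto the frontier. `score`, `propose_actions`, `is_goal`, and `step` are placeholders for the value model, the agent's action proposals, and the web environment; this is not the paper's released code.

```python
import heapq
import itertools

def score(state) -> float:
    """Placeholder: a value model would estimate how promising this state is."""
    return 0.0

def propose_actions(state):
    """Placeholder: the LM agent would propose candidate next actions."""
    return []

def is_goal(state) -> bool:
    """Placeholder: check whether the task is complete in this state."""
    return False

def best_first_search(initial_state, step, budget: int = 30):
    tie = itertools.count()                               # tie-breaker for the heap
    frontier = [(-score(initial_state), next(tie), initial_state)]
    for _ in range(budget):
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)             # expand the most promising state
        if is_goal(state):
            return state
        for action in propose_actions(state):
            child = step(state, action)                   # execute the action in the environment
            heapq.heappush(frontier, (-score(child), next(tie), child))
    return None                                           # budget exhausted without success
```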
The design of new alloys is a multi-scale problem that requires a holistic approach involving retrieving relevant knowledge, applying advanced computational methods, conducting experimental validations, and analyzing the results, a process that is typically slow and reserved for human experts. Machine learning (ML) can help accelerate this process, for instance, through the use of deep surrogate models that connect structural and chemical features to material properties, or vice versa. However, existing data-driven models often target specific material objectives, offering limited flexibility to integrate out-of-domain knowledge, and cannot adapt to new, unforeseen challenges. Here, we overcome these limitations by leveraging the distinct capabilities of multiple AI agents that collaborate autonomously within a dynamic environment to solve complex materials design tasks. The proposed physics-aware generative AI platform, AtomAgents, synergizes the intelligence of large language models (LLMs) with the dynamic collaboration among AI agents that have expertise in various domains, including knowledge retrieval, multi-modal data integration, physics-based simulations, and comprehensive results analysis across modalities that include numerical data and images of physical simulation results. The concerted effort of the multi-agent system allows for addressing complex materials design problems, as demonstrated by examples that include autonomously designing metallic alloys with enhanced properties compared to their pure counterparts. Our results enable accurate prediction of key characteristics across alloys and highlight the crucial role of solid solution alloying in steering the development of advanced metallic alloys. Our framework enhances the efficiency of complex multi-objective design tasks and opens new avenues in fields such as biomedical materials engineering, renewable energy, and environmental sustainability.