Autonomous AI agents again // so close now to simulating hallucinating digital creatures
Autonomous agents: infinite reasoning loop, tool abstractions, memories, code generation/execution, web access
Autonomous agents are systems that handle defined tasks without regular human input, instead leveraging AI technologies to make decisions and act. Moreover, they can learn from their environment and improve, allowing them to take on more complex tasks over time.
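To make that concrete, here is a minimal Python sketch of the loop these components imply: an LLM repeatedly picks an action, the chosen tool runs, and the result is appended to memory until the model decides it is done. The tool names, the `llm_decide` helper, and the stop condition are hypothetical illustrations, not any specific framework's API.

```python
# Minimal autonomous-agent loop: reason -> act -> observe -> remember.
# `llm_decide` is a hypothetical stand-in for any chat-completion call.
from dataclasses import dataclass, field

@dataclass
class Agent:
    goal: str
    memory: list = field(default_factory=list)  # stores past observations
    tools: dict = field(default_factory=dict)   # name -> callable tool abstraction

    def run(self, max_steps: int = 10) -> str:
        for _ in range(max_steps):                       # bounded "infinite" reasoning loop
            action = llm_decide(self.goal, self.memory)  # e.g. {"tool": "web_search", "input": "..."}
            if action["tool"] == "finish":
                return action["input"]                   # model signals completion
            observation = self.tools[action["tool"]](action["input"])
            self.memory.append({"action": action, "observation": observation})
        return "stopped: step budget exhausted"

# Toy tools standing in for web access and code execution.
def web_search(query: str) -> str:
    return f"results for {query!r}"

def run_python(code: str) -> str:
    return "exec output (sandboxed)"

def llm_decide(goal, memory):
    # Placeholder policy; a real agent would prompt an LLM with goal + memory here.
    return {"tool": "finish", "input": "demo answer"}

agent = Agent(goal="summarize today's AI news",
              tools={"web_search": web_search, "run_python": run_python})
print(agent.run())
```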
- Gartner Warns that GenAI “Will Directly Lead to the Death of a Customer” by 2027
- By 2028, the EU Will Mandate “the Right to Talk to a Human” In Customer Service, Predicts Gartner
- Gartner: Customer Service Leaders Have Three Priorities to Improve Customer Experience In 2024
Autonomous agents have long been a research focus in academic and industry communities. Previous research often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes and makes it hard for the agents to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have shown potential for human-level intelligence, leading to a surge in research on LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of LLM-based autonomous agents from a holistic perspective. We first discuss the construction of LLM-based autonomous agents, proposing a unified framework that encompasses much of the previous work. Then, we present an overview of the diverse applications of LLM-based autonomous agents in social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field.
This paper provides a comprehensive survey on large language model (LLM) based autonomous agents. The key points are:
1. It discusses the construction of LLM-based agents, proposing a unified framework with four modules: profiling, memory, planning, and action. The profiling module defines agent roles, the memory module stores experiences, the planning module guides behaviors, and the action module produces outcomes (a rough code skeleton of these modules follows this summary).
2. It summarizes strategies for enhancing agent capabilities, categorized into fine-tuning methods (using human-annotated, LLM-generated, or real-world datasets) and methods that require no fine-tuning, such as prompt engineering and mechanism engineering.
3. It overviews applications of LLM-based agents across social sciences (psychology, political science, social simulation, jurisprudence), natural sciences (documentation, experiment assistance, education), and engineering domains (civil, computer science, industrial automation, robotics).
4. It examines evaluation strategies for LLM-based agents, including subjective methods like human annotations and Turing tests, as well as objective metrics, protocols and benchmarks.
5. It identifies challenges and future directions in developing more advanced LLM-based autonomous agents with comprehensive capabilities.
In summary, this survey provides a holistic perspective on the architecture, capability acquisition, applications, evaluation, and future potential of LLM-based autonomous agents.
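As a rough illustration of how the survey's four modules might fit together, here is a hedged Python skeleton; the class and method names are invented for this note and do not come from the paper.

```python
# Skeleton of the unified agent framework: profiling, memory, planning, action.
# Names and interfaces are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Profile:
    role: str                                             # profiling module: who the agent is
    traits: list[str] = field(default_factory=list)

@dataclass
class Memory:
    experiences: list[str] = field(default_factory=list)  # memory module: stores experiences

    def recall(self, k: int = 3) -> list[str]:
        return self.experiences[-k:]                       # naive recency-based retrieval

class Planner:
    def plan(self, goal: str, context: list[str]) -> list[str]:
        # planning module: decompose the goal into steps (stubbed here)
        return [f"step 1 for {goal}", f"step 2 for {goal}"]

class Actor:
    def act(self, step: str) -> str:
        # action module: execute a step via tools/APIs and return the outcome
        return f"executed {step!r}"

@dataclass
class AgentFramework:
    profile: Profile
    memory: Memory = field(default_factory=Memory)
    planner: Planner = field(default_factory=Planner)
    actor: Actor = field(default_factory=Actor)

    def solve(self, goal: str) -> list[str]:
        outcomes = []
        for step in self.planner.plan(goal, self.memory.recall()):
            outcome = self.actor.act(step)
            self.memory.experiences.append(outcome)        # feed results back into memory
            outcomes.append(outcome)
        return outcomes

agent = AgentFramework(profile=Profile(role="research assistant"))
print(agent.solve("survey recent agent papers"))
```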
Language models have shown effectiveness in a variety of software applications, particularly in tasks related to workflow automation. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency, and to decrease the context length by 95%. When compared to Llama-7B with a RAG-based function calling mechanism, our method improves latency by a factor of 35. This method reduces the latency to levels deemed suitable for deployment across a variety of edge devices in production environments, aligning with the performance requisites for real-world applications.
This paper presents a new method called Octopus v2 that enables an on-device language model with 2 billion parameters to outperform GPT-4 in both accuracy and latency for function calling tasks. The key ideas are:
1. Introducing special “functional tokens” like <nexa_0> to represent different functions in the model’s vocabulary, allowing function name prediction as a single-token classification problem.
2. Fine-tuning the model to understand the meanings of these functional tokens by incorporating function descriptions into the training data.
3. Using the special <nexa_end> token as an early stopping criterion during inference, avoiding the need to process full function descriptions and reducing context length by 95%.
The Octopus model achieves 99.5% accuracy on an Android function calling benchmark while reducing latency by 35x compared to a retrieval-augmented 7B model like Llama. It can complete function calls within 1–2 seconds on a smartphone after quantization. The method is scalable to other domains like vehicle controls, Yelp, and DoorDash. The paper discusses techniques like dataset generation, model training configurations, and weighted loss functions for special tokens. Overall, Octopus v2 enables highly accurate and low-latency on-device function calling suitable for deploying AI agents on edge devices.
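As a hedged sketch of what the functional-token idea looks like in practice, using the Hugging Face transformers API: the token list, the token-to-function mapping, and the base checkpoint name are assumptions for illustration, not details taken from the paper.

```python
# Sketch: register functional tokens and stop generation at <nexa_end>.
# Model name and the token-to-function mapping are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")   # placeholder ~2B base model
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

# One functional token per API, plus an explicit end-of-call token.
functional_tokens = [f"<nexa_{i}>" for i in range(5)] + ["<nexa_end>"]
tokenizer.add_special_tokens({"additional_special_tokens": functional_tokens})
model.resize_token_embeddings(len(tokenizer))   # new embeddings are learned during fine-tuning

token_to_function = {"<nexa_0>": "take_a_photo", "<nexa_1>": "send_message"}  # hypothetical APIs

prompt = "Query: take a selfie with the front camera\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt")
end_id = tokenizer.convert_tokens_to_ids("<nexa_end>")

# <nexa_end> acts as the early-stopping criterion, so no function descriptions
# need to be packed into the context at inference time.
output_ids = model.generate(**inputs, max_new_tokens=64, eos_token_id=end_id)
completion = tokenizer.decode(output_ids[0], skip_special_tokens=False)
print(completion)  # expected shape after fine-tuning: "<nexa_0>(camera='front') <nexa_end>"
```

The function name is thus predicted as a single token rather than spelled out, which is what lets the context shrink so dramatically.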
Our current training initiative proves that any specific function can be encapsulated into a newly coined term, functional token, a novel token type seamlessly integrated into both the tokenizer and the model. This model, through a cost-effective training process amounting to merely two cents, facilitates the deployment of AI agents characterized by their remarkably low latency and high accuracy. The potential impacts of our research are extensive. For application developers, including those at DoorDash and Yelp, our model paves the way for training on application-specific scenarios. Developers can pinpoint the APIs most utilized by their audience, transform these into functional tokens for the Octopus model, and proceed with deployment. This strategy has the capacity to fully automate app workflows, emulating functionalities akin to Apple's Siri, albeit with significantly enhanced response speeds and accuracy. Furthermore, the model's application within operating systems of PCs, smartphones, and wearable technology presents another exciting avenue. Software developers could train small LoRAs specific to the operating system. By accumulating multiple LoRAs, the model facilitates efficient function calling across diverse system components. For instance, incorporating this model into the Android ecosystem would enable Yelp and DoorDash developers to train distinct LoRAs, thus rendering the model operational on mobile platforms as well.

Looking ahead, we aim to develop a model dedicated to on-device reasoning. Our ambitions are dual-pronged: firstly, to achieve notable speed enhancements for cloud deployments, vastly outpacing GPT-4 in speed metrics. Secondly, to support local deployment, offering a valuable solution for users mindful of privacy or operational costs. This dual deployment strategy not only extends the model's utility across cloud and local environments, but also caters to user preferences for either speed and efficiency or privacy and cost savings.
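A hedged sketch of the "many LoRAs on one base model" deployment idea, using the Hugging Face peft API; the adapter repositories and app names are hypothetical.

```python
# Sketch: one on-device base model, per-app LoRA adapters switched at call time.
# Adapter paths and app names are hypothetical.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("google/gemma-2b")   # placeholder base model

# Load the first adapter, then attach more under distinct names.
model = PeftModel.from_pretrained(base, "acme/yelp-functions-lora", adapter_name="yelp")
model.load_adapter("acme/doordash-functions-lora", adapter_name="doordash")

def call_app(app: str, prompt: str) -> str:
    model.set_adapter(app)   # route the request to that app's functional tokens
    # ... tokenize, generate, and decode as in the previous sketch ...
    return f"[{app}] would handle: {prompt}"

print(call_app("doordash", "reorder my usual burrito"))
```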
Technical experts and policy-makers have increasingly emphasized the need to address extinction risk from artificial intelligence (AI) systems that might circumvent safeguards and thwart attempts to control them (1). Reinforcement learning (RL) agents that plan over a long time horizon far more effectively than humans present particular risks. Giving an advanced AI system the objective to maximize its reward and, at some point, withholding reward from it, strongly incentivizes the AI system to take humans out of the loop, if it has the opportunity. The incentive to deceive humans and thwart human control arises not only for RL agents but for long-term planning agents (LTPAs) more generally. Because empirical testing of sufficiently capable LTPAs is unlikely to uncover these dangerous tendencies, our core regulatory proposal is simple: Developers should not be permitted to build sufficiently capable LTPAs, and the resources required to build them should be subject to stringent controls.
Most of the conversation I see about AI agents is technical, but the real power of agents might be that they solve the organizational problem of how to integrate AI into existing workflows. For better or worse, they act much more like people that can independently execute tasks.
The rise of algorithmic pricing raises concerns of algorithmic collusion. We conduct experiments with algorithmic pricing agents based on Large Language Models (LLMs), and specifically GPT-4. We find that (1) LLM-based agents are adept at pricing tasks, (2) LLM-based pricing agents autonomously collude in oligopoly settings to the detriment of consumers, and (3) variation in seemingly innocuous phrases in LLM instructions (“prompts”) may increase collusion. These results extend to auction settings. Our findings underscore the need for antitrust regulation regarding algorithmic pricing, and uncover regulatory challenges unique to LLM-based pricing agents.
The advent of LLMs heralds both great opportunities and grave concerns. In this paper, we identify the opportunity of incorporating LLMs into pricing algorithms by constructing LLM-based pricing agents and showing that they are powerful enough to optimally price in a simple economic environment. And yet, we also establish that concerns of autonomous algorithmic collusion that have been voiced regarding various pricing algorithms in the past apply equally, if not more so, to pricing algorithms based on LLMs. In particular, we show that even when given seemingly innocuous instructions in broad lay terms, LLM-based pricing algorithms can quickly and robustly arrive at supracompetitive price levels, to the detriment of consumers.

Klein's (2020) policy paper discusses four types of algorithmic collusion, and warns that autonomous algorithmic collusion is the one for which existing enforcement frameworks are least suitable: "The biggest concern may arise, however, when algorithms can learn to optimally form cartels all by themselves — not through instructions from their human masters (or some irrational behaviour), but through optimal autonomous learning (i.e. 'self-learning' algorithms). Such an outcome, were it to occur, may be very difficult to prosecute, as businesses deploying such algorithms may not even be aware of what strategy the algorithm has learned." Klein (2020) adds that, although Calvano et al. (2020b) and Klein (2021) establish that autonomous algorithmic collusion may emerge in principle, "many practical limitations for such autonomous algorithmic collusion remain — such as the need for a long learning period," but that "advances in artificial intelligence may be able to deal with these practical limitations sooner than we might expect." In line with this prediction, we show that autonomous algorithmic collusion in fact has the potential to quickly and robustly arise in what is slated to possibly become the most common consumer-available AI in the world.

That being said, our economic environment is simple and does not capture many real-world complexities, and we focus on one fixed time horizon. We leave exploring these frontiers to future research. As we show, using certain seemingly innocuous terms and phrases in LLM prompts has the potential to greatly facilitate, or alternatively reduce, seemingly collusive behavior among LLM-based pricing algorithms. Coupled with the opaqueness of how the input to LLMs influences their output, this introduces an array of new challenges for antitrust regulators.
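To make the experimental setup easier to picture, here is a rough, hypothetical sketch of a repeated duopoly pricing loop in which each seller's price is proposed by an LLM given the game history; the demand model, the prompt wording, and the `llm_set_price` helper are illustrative assumptions, not the paper's actual protocol.

```python
# Hypothetical sketch of an LLM-based duopoly pricing experiment.
# The linear demand model and the llm_set_price stub are illustrative, not the paper's setup.
import random

COST = 1.0          # marginal cost per unit
N_ROUNDS = 50

def demand(own_price: float, rival_price: float) -> float:
    # Simple linear demand: buyers favor the cheaper seller.
    return max(0.0, 10.0 - 2.0 * own_price + 1.0 * rival_price)

def llm_set_price(history: list[dict], instructions: str) -> float:
    # Stand-in for prompting an LLM with `instructions` plus the price/profit history.
    # A real run would send that context to a chat model and parse a price from the reply.
    return round(random.uniform(1.0, 5.0), 2)

prompts = {
    "A": "Maximize long-run profit.",                    # seemingly innocuous phrasing...
    "B": "Maximize long-run profit; avoid price wars.",  # ...small wording changes can matter
}

history = []
for t in range(N_ROUNDS):
    prices = {firm: llm_set_price(history, prompts[firm]) for firm in prompts}
    profits = {
        firm: (prices[firm] - COST) * demand(prices[firm], prices[other])
        for firm, other in (("A", "B"), ("B", "A"))
    }
    history.append({"round": t, "prices": prices, "profits": profits})

print(history[-1])   # inspect whether prices drift toward supracompetitive levels
```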