More AI-Agents // new projects every day

More and more new projects are emerging, but is there real innovation?

sbagency
8 min read · Aug 13, 2024

New LLM-based AI agents are essentially abstractions over multi-hop LLM reasoning. Most agentic frameworks consist of multiple agents, each fulfilling a different role. The common workflow involves task classification and decomposition, planning, and finally execution. Execution may include tool usage, code generation and execution, retrieval-augmented generation (RAG), memory/context management, and related tasks. Dozens of new agents mix and match these principles, but they still feel more like hacks than elegant solutions. Sooner or later, though, AI agents will be as commonplace as today's apps and websites.
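The workflow above can be sketched in a few lines. This is a minimal illustration of the classify → decompose → plan → execute loop, not any particular framework; `llm`, the `TOOL:` convention, and the tool names are all hypothetical stand-ins.

```python
# Minimal sketch of the common agentic workflow: classification,
# decomposition, planning, then execution (tool use or direct answer).
# `llm` is any callable prompt -> text; tool names are illustrative.

def run_agent(task: str, llm, tools: dict) -> str:
    category = llm(f"Classify this task: {task}")                # classification
    subtasks = llm(f"Decompose into steps: {task}").split("\n")  # decomposition
    results = []
    for step in subtasks:
        plan = llm(f"Plan for step: {step}")                     # planning
        if plan.startswith("TOOL:"):                             # tool usage
            name, arg = plan[5:].split(" ", 1)
            results.append(tools[name](arg))
        else:
            results.append(llm(f"Answer directly: {step}"))      # direct answer
    return llm("Summarize: " + " | ".join(results))
```

Swapping in a real LLM client and real tools turns this skeleton into the kind of agent most of the frameworks below provide off the shelf.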

https://github.com/SakanaAI/AI-Scientist
https://arxiv.org/pdf/2408.06292

One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aids to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models (LLMs) to perform research independently and communicate their findings. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation. In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion and add them to a growing archive of knowledge, acting like the human scientific community. We demonstrate the versatility of this approach by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics. Each idea is implemented and developed into a full paper at a meager cost of less than $15 per paper, illustrating the potential for our framework to democratize research and significantly accelerate scientific progress. To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer.
This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world’s most challenging problems. Our code is open-sourced at https://github.com/SakanaAI/AI-Scientist.
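The open-ended loop the abstract describes can be caricatured as follows. Every stage function here is a placeholder for a whole subsystem in the real AI Scientist (idea generation, experiment execution, paper writing, automated review); this is not its actual API, just the control flow.

```python
# Hedged sketch of the discovery loop: idea -> code -> experiments ->
# paper -> automated review, with accepted papers feeding back into a
# growing archive. Stage functions are placeholders, not the real API.

def discovery_loop(llm, archive: list, rounds: int = 3,
                   threshold: float = 6.0) -> list:
    for _ in range(rounds):
        idea = llm("Propose a novel idea given: " + "; ".join(archive))
        code = llm(f"Write experiment code for: {idea}")
        results = llm(f"Run and visualize: {code}")          # experiments
        paper = llm(f"Write a full paper on {idea}: {results}")
        score = float(llm(f"Review, score 1-10: {paper}"))   # automated reviewer
        if score >= threshold:                               # acceptance bar
            archive.append(paper)                            # archive of knowledge
    return archive
```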

The video provides a tutorial on building a self-improving AI application using Flowise, a tool that allows for the creation of sequential agents within a chatbot. The purpose of this AI application is to reduce inaccurate responses and AI hallucinations by using multiple agents to process user queries and generate more accurate answers.

The example used in the tutorial involves a fictitious restaurant called Oak & Barrel, which sells steaks and sushi. The chatbot is designed to handle queries about the restaurant’s menu, such as “What are your current specials?” by retrieving relevant information from a knowledge base. If the retrieved information does not seem relevant to the query, the AI uses a conditional agent to determine whether to rewrite the question or generate a response directly.

The tutorial goes into detail on how to set up the agent flow, including creating nodes for retrieving information, assessing the relevance of the retrieved documents, and generating or rewriting responses based on the relevance check. The video also explains how to connect these nodes to ensure that the AI application can loop back and reprocess the user’s query if necessary.

The video concludes with a demonstration of the chatbot correctly identifying irrelevant information and rewriting the question to produce a more accurate answer, showcasing the effectiveness of the self-improving RAG (Retrieval-Augmented Generation) pipeline.
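Stripped of the Flowise node graph, the self-correcting RAG loop is simple. In this sketch, `retrieve`, `grade`, `rewrite`, and `generate` stand in for the tutorial's retrieval, relevance-check, question-rewriting, and answer nodes; they are not Flowise APIs.

```python
# The self-correcting RAG loop: retrieve, grade relevance, then either
# answer from the docs or rewrite the question and loop back.
# The four callables stand in for the Flowise nodes in the tutorial.

def rag_answer(query: str, retrieve, grade, rewrite, generate,
               max_loops: int = 3) -> str:
    for _ in range(max_loops):
        docs = retrieve(query)               # knowledge-base lookup
        if grade(query, docs):               # conditional agent: relevant?
            return generate(query, docs)     # answer grounded in the docs
        query = rewrite(query)               # loop back with a better question
    return generate(query, retrieve(query))  # best effort after retries
```

The `max_loops` cap matters in practice: without it, a query the knowledge base can never answer would loop forever between rewrite and retrieve.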

The video explains the differences between two powerful methods in Flowise for building complex agent workflows: multi-agents and sequential agents.

1. Multi-Agents: This approach involves a supervisor node that delegates tasks to different worker nodes (e.g., a software developer and a code reviewer). It’s simple to set up and allows for easy scalability, but it offers limited control over the workflow, relying heavily on the supervisor’s decision-making.

2. Sequential Agents: This method provides more control but requires more setup. It starts with a sequential flow where nodes are connected in a sequence, and a state node is used to manage the workflow. The supervisor node determines which worker should act next, based on predefined conditions and state management. This method allows for more precise control and flexibility.

The video demonstrates creating the same project using both methods and highlights that while multi-agents are easier to set up, sequential agents offer more control over the application behavior.
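The structural difference between the two patterns fits in a dozen lines. This is a hypothetical sketch, not Flowise code: in the multi-agent version a supervisor callable freely decides which worker acts next; in the sequential version the node order is fixed and a shared state dict plays the role of the state node.

```python
# Multi-agent: a supervisor delegates until it decides the task is done.
# Sequential: nodes run in a fixed order over shared state.
# Supervisor, workers, and nodes are hypothetical callables.

def multi_agent(task: str, supervisor, workers: dict) -> str:
    result = task
    while (choice := supervisor(result)) != "DONE":  # supervisor decides
        result = workers[choice](result)             # chosen worker acts
    return result

def sequential_agents(task: str, pipeline: list, state: dict) -> str:
    state["input"] = task                # state node seeds the workflow
    for node in pipeline:                # fixed, predefined order
        state = node(state)
    return state["output"]
```

The trade-off is visible in the code: `multi_agent` is shorter but everything hinges on the supervisor's judgment, while `sequential_agents` makes the control flow explicit at the cost of wiring it up yourself.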

https://github.com/mikekelly/AgentK

Agent K is a modular, self-evolving AGI system that gradually builds its own mind as you challenge it to complete tasks.

The “K” stands for kernel, meaning a small core. The aim is for AgentK to be the minimum set of agents and tools necessary for it to bootstrap itself and then grow its own mind.

AgentK’s mind is made up of:

Agents who collaborate to solve problems, and;

Tools which those agents are able to use to interact with the outside world.

It develops both of these as regular Python files (in the agents and tools directories), so it's easy to track its progress and even contribute yourself if you want.

The agents that make up the kernel

Hermes: The orchestrator that interacts with humans to understand goals, manage the creation and assignment of tasks, and coordinate the activities of other agents.

AgentSmith: The architect responsible for creating and maintaining other agents. AgentSmith ensures agents are equipped with the necessary tools and tests their functionality.

ToolMaker: The developer of tools within the system, ToolMaker creates and refines the tools that agents need to perform their tasks, ensuring that the system remains flexible and well-equipped.

WebResearcher: The knowledge gatherer, WebResearcher performs in-depth online research to provide the system with up-to-date information, allowing agents to make informed decisions and execute tasks effectively.
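The bootstrap idea behind the kernel can be sketched as a tiny registry: agents and tools are added at runtime, and an orchestrator routes tasks to them. The agent names mirror the README (Hermes, AgentSmith, ToolMaker, WebResearcher), but the code itself is purely illustrative, not AgentK's implementation.

```python
# Hedged sketch of AgentK's kernel: a registry of agents and tools that
# can grow at runtime. Roles in comments map to the kernel agents above.

class Kernel:
    def __init__(self):
        self.agents, self.tools = {}, {}

    def add_agent(self, name, fn):    # AgentSmith's job: create agents
        self.agents[name] = fn

    def add_tool(self, name, fn):     # ToolMaker's job: create tools
        self.tools[name] = fn

    def dispatch(self, agent, task):  # Hermes's job: route tasks to agents
        return self.agents[agent](task, self.tools)

kernel = Kernel()
kernel.add_tool("search", lambda q: f"results for {q}")      # WebResearcher's tool
kernel.add_agent("web_researcher",
                 lambda task, tools: tools["search"](task))
```

In AgentK the registry is the filesystem itself (new agents and tools land as Python files), which is what makes its growth inspectable.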

https://github.com/frdel/agent-zero

Personal and organic AI framework

Agent Zero is not a predefined agentic framework. It is designed to be dynamic, organically growing, and learning as you use it.

Agent Zero is fully transparent, readable, comprehensible, customizable and interactive.

Agent Zero uses the computer as a tool to accomplish its (your) tasks.

Key concepts

General-purpose assistant

Agent Zero is not pre-programmed for specific tasks (but can be). It is meant to be a general-purpose personal assistant. Give it a task, and it will gather information, execute commands and code, cooperate with other agent instances, and do its best to accomplish it.

It has a persistent memory, allowing it to memorize previous solutions, code, facts, instructions, etc., to solve tasks faster and more reliably in the future.

Computer as a tool

Agent Zero uses the operating system as a tool to accomplish its tasks. It has no single-purpose tools pre-programmed. Instead, it can write its own code and use the terminal to create and use its own tools as needed.

The only default tools in its arsenal are online search, memory features, communication (with the user and other agents), and code/terminal execution. Everything else is created by the agent itself or can be extended by the user.

Tool usage functionality has been developed from scratch to be the most compatible and reliable, even with very small models.
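The "computer as a tool" idea reduces to one generic code-execution tool in place of many fixed ones. The sketch below is not Agent Zero's code: a real framework would sandbox execution (Agent Zero runs in a container), whereas this just uses a subprocess with a timeout.

```python
# One generic code-execution tool instead of many single-purpose tools:
# the agent writes Python, and this runs it. Illustrative only; a real
# framework would sandbox this rather than trust a bare subprocess.
import subprocess
import sys

def execute_code(code: str, timeout: int = 10) -> str:
    """Run agent-written Python in a subprocess; return stdout or the error."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=timeout)
    return proc.stdout if proc.returncode == 0 else proc.stderr
```

Returning the error text (not just failing) is the important design choice: the agent can read the traceback and revise its own code on the next attempt.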

Multi-agent cooperation

Every agent has a superior agent giving it tasks and instructions. Every agent then reports back to its superior.

In the case of the first agent, the superior is the human user; the agent sees no difference.

Every agent can create its subordinate agent to help break down and solve subtasks. This helps all agents keep their context clean and focused.
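The superior/subordinate pattern can be sketched as simple recursion: an agent that sees a compound task spawns a subordinate per subtask and collects the reports. This is an illustration of the idea, not Agent Zero's implementation; the " and " split and the depth cap are arbitrary stand-ins for real task decomposition.

```python
# Superior/subordinate sketch: each agent may spawn a subordinate for a
# subtask and receives its report back, keeping every context focused.
# Splitting on " and " is a toy stand-in for LLM task decomposition.

class Agent:
    def __init__(self, name, superior=None):
        self.name, self.superior = name, superior  # root's superior: the human

    def solve(self, task: str, depth: int = 0) -> str:
        if " and " in task and depth < 3:          # break down compound tasks
            sub = Agent(f"{self.name}.sub", superior=self)
            parts = [sub.solve(p, depth + 1) for p in task.split(" and ")]
            return "; ".join(parts)                # subordinate reports back
        return f"{self.name} did: {task}"          # leaf task handled directly

root = Agent("agent0")  # the first agent; its superior is the human user
```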

Completely customizable and extensible

Almost nothing in this framework is hard-coded. Nothing is hidden. Everything can be extended or changed by the user.

The agent's whole behavior is defined by a system prompt in the prompts/agent.system.md file. Change this prompt, and you change the framework dramatically.

The framework does not guide or limit the agent in any way. There are no hard-coded rails that agents have to follow.

Every prompt, every small message template sent to the agent in its communication loop, can be found in the prompts/ folder and changed.

Every default tool can be found in the python/tools/ folder and changed or copied to create new predefined tools.

Of course, it is open-source (except for some tools like Perplexity, but that will be replaced with an open-source alternative as well in the future).

Communication is key

Give your agent a proper system prompt and instructions, and it can work miracles.

Agents can communicate with their superiors and subordinates, asking questions, giving instructions, and providing guidance. Instruct your agents in the system prompt on how to communicate effectively.

The terminal interface is real-time streamed and interactive. You can stop and intervene at any point. If you see your agent heading in the wrong direction, just stop and tell it right away.

There is a lot of freedom in this framework. You can instruct your agents to report back to their superiors regularly, asking for permission to continue. You can instruct them to use point-scoring systems when deciding whether to delegate subtasks. Superiors can double-check subordinates’ results and dispute them. The possibilities are endless.

https://www.openinterpreter.com/

Open Interpreter lets LLMs run code (Python, JavaScript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing.

This provides a natural-language interface to your computer’s general-purpose capabilities:

Create and edit photos, videos, PDFs, etc.

Control a Chrome browser to perform research

Plot, clean, and analyze large datasets

…etc.

https://www.taskade.com/
