Action models // when will AI agents be useful?

AI agents are merely abstractions built on top of LLMs and other useful technologies

sbagency
3 min read · Jul 18, 2024
https://www.rabbit.tech/rabbit-os

The Rabbit Research Team has developed the Large Action Model (LAM), a system that can infer and replicate human actions on computer applications reliably and quickly. This model leverages recent advances in neuro-symbolic programming to directly model the structure of applications and user actions without converting them into text, achieving competitive results in accuracy, interpretability, and speed. LAM’s architecture overcomes challenges related to real-time communication and virtual network computing, aiming to enhance AI assistants and operating systems by understanding and performing user actions.

Recent advancements in neural language models, speech recognition, and speech synthesis have enabled machines to comprehend natural language with greater depth and context. This has led to devices that use spoken language as the primary interface, exemplified by smart speakers, AI chatbots like ChatGPT, and operating systems such as Rabbit OS. However, many applications expose no API, which is a major obstacle and motivates the use of neuro-symbolic programming to learn user interactions directly.

The LAM system combines neural and symbolic components to address the unique structure of human-computer interactions, which are more structured than images and noisier than text. Language models struggle with raw text representations of applications because those representations are verbose and noisy, hence the neuro-symbolic approach, which preserves structural information while keeping the model explainable and efficient.

LAM learns actions by observing human interactions and replicating them reliably, even when the interface changes. This approach provides transparency and allows technically trained users to inspect and understand the model’s operations. Over time, LAM accumulates knowledge and creates a conceptual blueprint of application services, potentially generalizing to a wide range of applications.
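To make the idea of replaying a demonstration after the interface changes more concrete, here is a minimal sketch in Python. It is not Rabbit's implementation: the `Element`, `Step`, `record`, and `replay` names are hypothetical, and the assumption that elements can be identified by a role plus a visible label is mine. The point is only that a recorded step which refers to what an element is, rather than where it sits, survives a layout change and stays readable to a human inspector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    role: str      # e.g. "button" or "textbox" (hypothetical vocabulary)
    label: str     # name the user sees
    position: int  # index in the layout; deliberately ignored on replay


@dataclass(frozen=True)
class Step:
    action: str    # "click" or "type"
    role: str
    label: str
    text: str = ""


def record(demo, ui):
    """Convert raw (action, position, text) events into position-independent steps."""
    by_position = {e.position: e for e in ui}
    steps = []
    for action, position, text in demo:
        element = by_position[position]
        steps.append(Step(action, element.role, element.label, text))
    return steps


def replay(steps, ui):
    """Resolve each step against a possibly rearranged UI and return a readable log."""
    log = []
    for step in steps:
        matches = [e for e in ui if e.role == step.role and e.label == step.label]
        if len(matches) != 1:
            # Fail loudly instead of guessing when the target is missing or ambiguous.
            raise LookupError(f"cannot resolve {step.role} {step.label}")
        target = matches[0]
        entry = f"{step.action} {target.role} '{target.label}' at {target.position}"
        if step.text:
            entry += f" text={step.text!r}"
        log.append(entry)
    return log


# Demonstration recorded on the original layout.
ui_v1 = [Element("textbox", "Search", 0), Element("button", "Go", 1)]
steps = record([("type", 0, "pizza"), ("click", 1, "")], ui_v1)

# The same steps replayed after the layout was rearranged.
ui_v2 = [Element("button", "Go", 0), Element("link", "Home", 1), Element("textbox", "Search", 2)]
for line in replay(steps, ui_v2):
    print(line)
```

Because the log is plain text built from the symbolic steps, a technically trained user can read exactly what would be executed, which is the kind of transparency the paragraph above describes.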

LAM’s technical stack includes a data collection platform, a network architecture combining transformer-style attention and graph-based message passing, and program synthesizers guided by demonstrations. This infrastructure supports LAM in efficiently performing user actions on consumer applications.
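For readers unfamiliar with graph-based message passing, the following is a generic sketch of a single round over a tiny UI tree. It is an illustration only and makes several assumptions: the function name `message_passing_round`, the mean aggregation, and the hand-written feature vectors are all mine, and there are no learned weights or attention here. Rabbit has not published its architecture at this level of detail.

```python
def message_passing_round(features, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping node name -> list of floats
    edges:    list of (node, node) pairs
    Each node's new vector is the average of its own vector and the mean
    of its neighbours' vectors. Isolated nodes keep their vector.
    """
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = {}
    for node, own in features.items():
        messages = [features[m] for m in neighbors[node]]
        if not messages:
            updated[node] = list(own)
            continue
        mean = [sum(column) / len(messages) for column in zip(*messages)]
        updated[node] = [(a + b) / 2 for a, b in zip(own, mean)]
    return updated


# A form containing one input and one button; features are toy one-hot-like vectors.
features = {"form": [0.0, 0.0], "input": [1.0, 0.0], "button": [0.0, 1.0]}
edges = [("form", "input"), ("form", "button")]
print(message_passing_round(features, edges))
```

After one round the `form` node carries a blend of its children's features, which is how structural context spreads through the tree without flattening the application into text.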

LAM has shown competitive performance on web navigation tasks, and combining neural and symbolic methods significantly improved both accuracy and latency. The system has been evaluated on real-world websites, demonstrating its potential in practical applications.

LAM is designed to execute tasks responsibly and reliably, with platforms for scheduling and managing routines, and ensuring human-like, respectful interactions. The infrastructure includes secure cloud environments and optimized protocols for multimedia interactions and virtual network computing.

The Rabbit Research Team envisions that LAM will continue to scale and improve, transforming human-machine interactions and making advanced AI experiences affordable and accessible. The goal is to enable intuitive natural language-powered systems that benefit various stakeholders, from service providers to consumers and developers.

https://www.lavague.ai/
https://docs.lavague.ai/en/latest/docs/get-started/quick-tour/
https://agents.vrsen.ai/

Agency Swarm started as a desire and effort of Arsenii Shatokhin (aka VRSEN) to fully automate his AI Agency with AI. By building this framework, we aim to simplify the agent creation process and enable anyone to create collaborative swarms of agents (Agencies), each with distinct roles and capabilities. By thinking about automation in terms of real-world entities, such as agencies and specialized agent roles, we make it a lot more intuitive for both the agents and the users. [github]
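The idea of an agency made of agents with distinct roles can be sketched in a few lines of plain Python. This deliberately does not use the Agency Swarm library, and none of the names below (`Agent`, `Agency`, `allow`, `send`) are taken from its API; they are hypothetical, and the handlers are simple functions standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    name: str
    role: str
    handler: Callable[[str], str]  # stand-in for a model call


@dataclass
class Agency:
    agents: dict = field(default_factory=dict)
    allowed: set = field(default_factory=set)  # directed (sender, receiver) pairs

    def add(self, agent):
        self.agents[agent.name] = agent

    def allow(self, sender, receiver):
        """Permit `sender` to delegate work to `receiver`."""
        self.allowed.add((sender, receiver))

    def send(self, sender, receiver, message):
        """Deliver a message only along an explicitly allowed channel."""
        if (sender, receiver) not in self.allowed:
            raise PermissionError(f"{sender} may not message {receiver}")
        return self.agents[receiver].handler(message)


agency = Agency()
agency.add(Agent("ceo", "coordinates work", lambda m: f"ceo: {m}"))
agency.add(Agent("developer", "writes code", lambda m: f"developer: {m}"))
agency.allow("ceo", "developer")

print(agency.send("ceo", "developer", "implement login"))
```

Restricting who may talk to whom mirrors how a real agency is organized, which is the intuition the quoted description appeals to.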

sbagency

Tech/biz consulting, analytics, research for founders, startups, corps and govs.