"Agents are easy to imagine but difficult to make work… yet the right time is **now**." (Andrej Karpathy)
Videos from the agents hackathon.
But current agents are still unreliable, which is why first place went to AgentEval, an agent debugger.
Mind2Web is a dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Mind2Web contains 2,350 tasks from 137 websites spanning 31 domains that:
- Reflect diverse and practical use cases on the web.
- Provide challenging yet realistic environments with real-world websites.
- Test generalization ability across tasks and environments.
What are the challenges with agents?
Agents are expected to do almost everything humans do: think (reason), remember (long- and short-term memory), recognize, use tools, write code, call APIs and services, and so on. Each of these capabilities is a challenge in its own right.
Frameworks and tools for agents/bots
LLMs for agents
Agents: the right time is now
Thanks to recent advances in AI, ML, and LLMs, for the first time in human history programs can "think" (not literally, but they can simulate thinking or reasoning processes) and "act" (use tools).
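To make "think" and "act" concrete, here is a minimal sketch of an agent loop. It assumes the LLM is replaced by a scripted policy so the example runs offline. All names (`TOOLS`, `Step`, `Agent`, `scripted_policy`) are hypothetical and not taken from any agent framework. On each step the policy "thinks" by choosing the next action from the task and its short-term memory. The agent then "acts" by calling a tool and records the observation in memory.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
    "upper": lambda arg: arg.upper(),
}


@dataclass
class Step:
    thought: str   # the policy's "reasoning" for this step
    action: str    # a tool name from TOOLS, or "finish"
    argument: str  # tool input, or the final answer when action == "finish"


@dataclass
class Agent:
    # Stand-in for an LLM call (assumption): maps (task, memory) to the next Step.
    policy: Callable[[str, list[str]], Step]
    # Short-term memory: a transcript of actions and their observations.
    memory: list[str] = field(default_factory=list)
    max_steps: int = 5

    def run(self, task: str) -> str:
        for _ in range(self.max_steps):
            step = self.policy(task, self.memory)  # "think": decide the next action
            if step.action == "finish":
                return step.argument
            tool = TOOLS.get(step.action)
            if tool is None:
                observation = f"error: unknown tool {step.action!r}"
            else:
                observation = tool(step.argument)  # "act": call the chosen tool
            self.memory.append(f"{step.action}({step.argument}) -> {observation}")
        return "gave up: step limit reached"


def scripted_policy(task: str, memory: list[str]) -> Step:
    # Hypothetical hard-coded policy standing in for an LLM.
    if not memory:
        return Step("I need to add the numbers.", "add", "2,3,4")
    last_result = memory[-1].split(" -> ")[-1]
    return Step("I have the sum.", "finish", last_result)


agent = Agent(policy=scripted_policy)
print(agent.run("What is 2 + 3 + 4?"))  # prints 9
```

The step limit is a deliberate design choice. An agent that never chooses "finish" stops after `max_steps` instead of looping forever. Swapping `scripted_policy` for a real LLM call is where the hard part, reliable reasoning, actually begins.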
What can agents do for business?
Almost everything you can imagine.