Next-gen AI must excel in math, reasoning, planning, etc. // System 2 cognitive functions

Current AI systems can reproduce (approximate) known sequences, but they are unable to synthesize entirely new knowledge.

sbagency
6 min read · Aug 28, 2024

Competition is very high: many other projects focus on “System 2” step-by-step, multi-hop reasoning, so-called agents or multi-agent pipelines that combine advanced retrieval-augmented generation (RAG) with up-to-date data.

https://www.theinformation.com/articles/openai-shows-strawberry-ai-to-the-feds-and-uses-it-to-develop-orion

OpenAI is on the verge of releasing a new AI model, and there’s significant buzz around it. The model, code-named “Orion,” could represent a major leap in AI capabilities, particularly in solving complex tasks like math problems and logical reasoning, areas where AI has traditionally struggled. The technology behind Orion, referred to as “Strawberry” or “Q*,” was reportedly demonstrated to U.S. national security officials, highlighting its potential importance.

Strawberry is a slower, more deliberate AI that uses “System 2 thinking,” allowing it to reason through problems rather than simply predicting the next word in a sequence. This approach could lead to fewer errors, or “hallucinations,” and improve AI’s performance in complex and subjective tasks. OpenAI is also reportedly generating synthetic training data, which is necessary due to the scarcity of high-quality, accessible data.
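To make the “System 2” idea concrete, here is a minimal sketch contrasting one-shot answering with a deliberate reason-then-answer pipeline. The `llm` callable and the prompts are illustrative assumptions; nothing here is OpenAI’s actual method.

```python
from typing import Callable

def direct_answer(llm: Callable[[str], str], question: str) -> str:
    # "System 1" style: ask for the answer in one shot.
    return llm(f"Question: {question}\nAnswer:")

def deliberate_answer(llm: Callable[[str], str], question: str) -> str:
    # "System 2" style: first elicit step-by-step reasoning, then ask
    # for a final answer conditioned on that reasoning.
    reasoning = llm(
        f"Question: {question}\n"
        "Think through this step by step before answering:"
    )
    return llm(
        f"Question: {question}\n"
        f"Reasoning: {reasoning}\n"
        "Based only on the reasoning above, state the final answer:"
    )
```

The extra inference pass is exactly the speed-for-accuracy trade-off described below: more “thinking” time per query, fewer hallucinated answers.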

However, the model’s slower processing might make it less suitable for time-sensitive applications like search engines, while remaining well suited to tasks such as coding, where precision matters more than speed. There’s speculation that this model could be a game-changer, especially in fields that require advanced reasoning capabilities.

Overall, OpenAI is positioning Orion as a response to increasing competition from other AI models and open-source alternatives, which have caught up with or even surpassed existing models like GPT-4 in some areas. The AI community is eagerly anticipating the release, and there’s widespread speculation and excitement about its potential impact.

https://x.com/theinformation/status/1828442108425798038

Many other agentic projects target the same problem. But don’t forget the fundamental limits of artificial-intelligence systems.

https://x.com/tom_keldenich/status/1828473666427826313

“Strawberry” and “Orion”: here’s what you need to know about them.

Strawberry:
1. Solves complex tasks (mathematics and programming)
2. Improves language understanding (with more “thinking” time)
3. Has shown success in internal tests (such as the NY Times puzzle)

Orion:
1. Successor to GPT-4, aims to outperform it
2. Trained with data generated by Strawberry

Both models aim to improve AI reasoning.

A chatbot version of Strawberry could launch this autumn.

https://arxiv.org/pdf/2203.14465

Generating step-by-step “chain-of-thought” rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the “Self-Taught Reasoner” (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; finetune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to finetuning a 30× larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.
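The loop the abstract describes is simple enough to sketch. Assuming placeholder `generate` and `finetune` callables (the paper’s actual training setup, answer checking, and hyperparameters are omitted), one STaR iteration looks roughly like this:

```python
from typing import Callable, List, Tuple

def star_iteration(
    generate: Callable[[str], str],                     # current model: prompt -> completion
    finetune: Callable[[List[Tuple[str, str]]], None],  # trains on (question, rationale) pairs
    dataset: List[Tuple[str, str]],                     # (question, gold_answer) pairs
    few_shot_prefix: str,                               # a handful of worked rationale examples
) -> None:
    kept: List[Tuple[str, str]] = []
    for question, gold in dataset:
        # 1. Generate a rationale and answer from the question alone.
        attempt = generate(f"{few_shot_prefix}\nQ: {question}\nA:")
        if gold in attempt:  # crude correctness check, a placeholder
            kept.append((question, attempt))
            continue
        # 2. "Rationalization": retry with the correct answer as a hint;
        #    the hint is dropped from the stored training example.
        hinted = generate(
            f"{few_shot_prefix}\nQ: {question} (hint: the answer is {gold})\nA:"
        )
        if gold in hinted:
            kept.append((question, hinted))
    # 3. Finetune on all rationales that led to correct answers; the
    #    caller then repeats the loop with the improved model.
    finetune(kept)
```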

https://www.techtarget.com/searchenterpriseai/news/366609277/Google-leader-dives-deep-into-AI-agents

Generative AI is driving the rise of AI agents, tools capable of performing complex tasks that traditionally required human intervention. Companies like Google are developing products to help businesses create their own AI agents, which are more advanced than traditional automation bots. Google’s Jason Gelman explains that AI agents act on behalf of users, capable of planning and executing tasks, but the technology is still in its early stages. Current use cases include industries like call centers and healthcare, though misconceptions exist about the maturity of the technology. Gelman emphasizes the importance of starting with simple, reliable tasks and gradually expanding capabilities as trust in the technology grows. He also contrasts Google’s AI agent approach with Microsoft’s Copilot, noting that Google’s solution focuses on API-driven, enterprise-level integration rather than individual user tasks.

You don’t need to be OpenAI to build reasoning AI systems

There are many AI-agent projects with advanced pipelines. But the key success factor, of course, is high-quality data.

https://github.com/brainqub3/meta_expert

This project leverages three core concepts:

Meta prompting: For more information, refer to the paper on Meta-Prompting (source). Read our notes on Meta-Prompting Overview for a more concise summary.

Chain of Reasoning: For Jar3d, we also leverage an adaptation of Chain-of-Reasoning.

Jar3d uses retrieval-augmented generation, which isn’t used within the Basic Meta Agent. Read our notes on Overview of Agentic RAG; a minimal sketch of the idea follows below.
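As a rough illustration of the retrieval-augmented generation step referenced above: retrieve the documents most relevant to the query, then condition the model’s answer on them. The toy lexical retriever, the prompt format, and the `llm` callable are illustrative assumptions, not Jar3d’s actual implementation.

```python
from typing import Callable, List

def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    # Toy lexical retriever: rank documents by word overlap with the query.
    words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def rag_answer(llm: Callable[[str], str], query: str, corpus: List[str]) -> str:
    # Ground the generation step in the retrieved context.
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return llm(
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above:"
    )
```

A real pipeline would swap the word-overlap scoring for embedding similarity and let the agent decide when to retrieve; the grounding pattern stays the same.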

https://arxiv.org/pdf/2401.12954

We introduce meta-prompting, an effective scaffolding technique designed to enhance the functionality of language models (LMs). This approach transforms a single LM into a multi-faceted conductor, adept at managing and integrating multiple independent LM queries. By employing high-level instructions, meta-prompting guides the LM to break down complex tasks into smaller, more manageable subtasks. These subtasks are then handled by distinct “expert” instances of the same LM, each operating under specific, tailored instructions. Central to this process is the LM itself, in its role as the conductor, which ensures seamless communication and effective integration of the outputs from these expert models. It additionally employs its inherent critical thinking and robust verification processes to refine and authenticate the end result. This collaborative prompting approach empowers a single LM to simultaneously act as a comprehensive orchestrator and a panel of diverse experts, significantly enhancing its performance across a wide array of tasks. The zero-shot, task-agnostic nature of meta-prompting greatly simplifies user interaction by obviating the need for detailed, task-specific instructions. Furthermore, our research demonstrates the seamless integration of external tools, such as a Python interpreter, into the meta-prompting framework, thereby broadening its applicability and utility. Through rigorous experimentation with GPT-4, we establish the superiority of meta-prompting over conventional scaffolding methods: When averaged across all tasks, including the Game of 24, Checkmate-in-One, and Python Programming Puzzles, meta-prompting — augmented with a Python interpreter functionality — surpasses standard prompting by 17.1%, expert (dynamic) prompting by 17.3%, and multipersona prompting by 15.2%.
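The abstract describes a conductor/expert loop: one call decomposes the task, fresh “expert” calls against the same LM solve each piece, and the conductor verifies and integrates the results. Below is a rough sketch of that pattern; the prompts and the `llm` callable are placeholder assumptions, not the paper’s code.

```python
from typing import Callable, List

def meta_prompt(llm: Callable[[str], str], task: str) -> str:
    # Conductor: decompose the task into independent subtasks.
    plan = llm(
        f"Task: {task}\n"
        "Break this task into a short numbered list of independent subtasks:"
    )
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # Experts: each subtask goes to a fresh query against the same LM,
    # with its own tailored instruction and no shared history.
    expert_outputs: List[str] = [
        llm(f"You are an expert assigned one subtask. Solve it:\n{sub}")
        for sub in subtasks
    ]
    # Conductor again: verify and integrate the expert outputs.
    combined = "\n\n".join(expert_outputs)
    return llm(
        f"Task: {task}\n"
        f"Expert outputs:\n{combined}\n"
        "Check these outputs for consistency and produce the final answer:"
    )
```

The paper additionally routes some subtasks to external tools such as a Python interpreter, which this sketch omits.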
