AI in scientific discovery // Is it just a helpful tool or much more valuable?)

sbagency
9 min read · Jan 9, 2024


For now, we can really only brute-force scientific problems using HPC/AI.

https://www.nature.com/immersive/d41586-023-03919-1/index.html

Since its inception more than a decade ago, Nature’s 10 has highlighted the influence of 10 people within the world of science. We are continuing with that tradition in 2023 and are adding to it by including a non-person — an acknowledgement of the role that artificial intelligence designed to mimic human language is having in the development and progress of science.

https://www.nature.com/articles/d41586-024-00020-z
https://www.linkedin.com/feed/update/urn:li:activity:7150454248579686401/

💡 🤖 Can AI Make Genuine Theoretical Discoveries?

Recently, Nature magazine sparked a thought-provoking discussion by including ChatGPT in its list of ten people who shaped science in 2023 (Nature 624, 509; 2023). This raises a crucial question about the role of AI in fields like theoretical physics and pure mathematics. There’s a growing notion, sometimes humorously, that AI might soon replace mathematicians. But is that really feasible?

Together with Yang-Hui He, we highlight a ‘Birch Test’ for AI in a correspondence to Nature.

Bryan Birch, known for the Birch and Swinnerton-Dyer conjecture, suggests a test to determine whether an AI can make a genuine theoretical discovery. Much like the Turing Test, it has three criteria:

1️⃣ The discovery must be made autonomously by the AI, without human intervention.
2️⃣ It should reveal a concrete mathematical structure.
3️⃣ The discovery should be significant enough to inspire new research.

The third criterion is hard enough even for humans, but AI can still be useful here.
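
For concreteness, here is a toy rendering of the test as a referee’s checklist; the class and field names are mine, not Birch’s, and the “pass” is of course a human judgment, not something code can decide.

```python
from dataclasses import dataclass

@dataclass
class BirchTestVerdict:
    """Referee's verdict on one AI-produced result (illustrative names, not Birch's)."""
    autonomous: bool          # 1. found by the AI without human steering of the key step
    concrete_structure: bool  # 2. reveals a concrete mathematical structure
    inspires_research: bool   # 3. significant enough to inspire new research

    def passes(self) -> bool:
        # The test is conjunctive: all three criteria must hold.
        return self.autonomous and self.concrete_structure and self.inspires_research

# Example: autonomous and concrete, but (so far) no follow-up work builds on it.
print(BirchTestVerdict(True, True, False).passes())  # False
```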

https://claude.ai/chat/c7e1b35b-4c2b-408b-932c-ee2ac95ceb0b

No, I do not know the specific 3 criteria proposed by Bryan Birch for testing if an AI system can make real theoretical discoveries. However, based on the information you have provided, I can speculate what the 3 criteria might plausibly be:

1. Novelty — The AI must demonstrate the ability to generate new concepts, hypotheses, or theoretical insights that are original and not just derivations of existing knowledge.

2. Mathematical Validity — The theoretical discoveries proposed by the AI must be mathematically valid and logically consistent. They should stand up to rigorous mathematical proof and analysis.

3. Scientific Value — The discoveries should potentially add non-trivial scientific value by providing new explanatory power, making verifiable predictions, inspiring further research, etc. They should not be trivial or meaningless.

However, these are just my best guesses without knowing the full details of Bryan Birch’s proposed test. To conclusively know the 3 criteria he suggested, I would need to find and read the specific paper, interview, or source in which he outlined this test and its criteria. I do not currently have enough information to confirm the exact 3 criteria he proposed. Please let me know if you are able to provide more specifics on the test details from Bryan Birch. I’m very interested to learn more about evaluating if AIs can make meaningful theoretical contributions.

https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/

Discovering new mathematical knowledge and algorithms in different domains is a notoriously difficult task, and largely beyond the power of the most advanced AI systems. To tackle such challenging problems with FunSearch, we introduced multiple key components. Instead of starting from scratch, we start the evolutionary process with common knowledge about the problem, and let FunSearch focus on finding the most critical ideas to achieve new discoveries. In addition, our evolutionary process uses a strategy to improve the diversity of ideas in order to avoid stagnation. Finally, we run the evolutionary process in parallel to improve the system efficiency.
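
A minimal sketch of the loop described above, assuming placeholder callables `llm_propose` and `evaluate` rather than DeepMind’s actual implementation: seed the search with a known program, let the LLM propose variants, keep only those the evaluator can score, and maintain several islands for diversity.

```python
import random

def funsearch_sketch(seed_program: str,
                     llm_propose,   # callable: list of parent programs -> new program text (placeholder)
                     evaluate,      # callable: program text -> numeric score, or None if it fails (placeholder)
                     iterations: int = 1000,
                     islands: int = 4):
    """Toy FunSearch-style loop: evolve programs with an LLM proposer and a systematic evaluator."""
    # Start from common knowledge about the problem rather than from scratch.
    seed_score = evaluate(seed_program)
    # Several independent "islands" keep the population of ideas diverse and avoid stagnation.
    populations = [[(seed_program, seed_score)] for _ in range(islands)]
    for _ in range(iterations):
        island = random.choice(populations)
        # Prompt the LLM with a couple of high-scoring parents from this island.
        parents = [prog for prog, _ in sorted(island, key=lambda x: x[1], reverse=True)[:2]]
        child = llm_propose(parents)
        score = evaluate(child)        # the evaluator filters out plausible-but-broken programs
        if score is not None:
            island.append((child, score))
    return max((ind for pop in populations for ind in pop), key=lambda x: x[1])  # (best program, score)
```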

https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/Mathematical-discoveries-from-program-search-with-large-language-models.pdf

Large Language Models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from confabulations (or hallucinations) which can result in them making plausible but incorrect statements (Bang et al., 2023; Borji, 2023). This hinders the use of current large models in scientific discovery. Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pre-trained LLM with a systematic evaluator. We demonstrate the effectiveness of this approach to surpass the best known results in important problems, pushing the boundary of existing LLM-based approaches (Lehman et al., 2022). Applying FunSearch to a central problem in extremal combinatorics — the cap set problem — we discover new constructions of large cap sets going beyond the best known ones, both in finite dimensional and asymptotic cases. This represents the first discoveries made for established open problems using LLMs. We showcase the generality of FunSearch by applying it to an algorithmic problem, online bin packing, finding new heuristics that improve upon widely used baselines. In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is. Beyond being an effective and scalable strategy, discovered programs tend to be more interpretable than raw solutions, enabling feedback loops between domain experts and FunSearch, and the deployment of such programs in real-world applications.
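
To make “searching for programs rather than solutions” concrete for the online bin-packing case the abstract mentions, here is a hand-written best-fit-style priority function of roughly the shape FunSearch evolves, plugged into a toy online packer. The function body below is a baseline heuristic of my own, not one of the discovered ones.

```python
import numpy as np

def priority(item: float, bins: np.ndarray) -> np.ndarray:
    """Best-fit baseline: prefer the bin that leaves the least leftover space."""
    leftover = bins - item            # remaining capacity if the item is placed in each bin
    scores = -leftover                # tighter fit -> higher priority
    scores[leftover < 0] = -np.inf    # bins that cannot hold the item are excluded
    return scores

def online_pack(items, bin_capacity=1.0):
    """Place items one at a time into the highest-priority feasible bin, opening new bins as needed."""
    bins = np.array([bin_capacity])
    assignment = []
    for item in items:
        scores = priority(item, bins)
        if np.all(np.isneginf(scores)):        # no existing bin fits: open a new one
            bins = np.append(bins, bin_capacity)
            scores = priority(item, bins)
        chosen = int(np.argmax(scores))
        bins[chosen] -= item
        assignment.append(chosen)
    return assignment, len(bins)

print(online_pack([0.4, 0.7, 0.3, 0.6])[1])  # bins used by the baseline heuristic: 2
```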

https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/
https://github.com/google-deepmind/alphageometry
https://www.nature.com/articles/d41586-023-03930-6
https://arxiv.org/pdf/2310.17064.pdf

As artificial intelligence (AI) gains greater adoption in a wide variety of applications, it has immense potential to contribute to mathematical discovery, by guiding conjecture generation, constructing counterexamples, assisting in formalizing mathematics, and discovering connections between different mathematical areas, to name a few. While prior work has leveraged computers for exhaustive mathematical proof search, recent efforts based on large language models (LLMs) aspire to position computing platforms as co-contributors in the mathematical research process. Despite their current limitations in logic and mathematical tasks, there is growing interest in melding theorem proving systems with foundation models. This work investigates the applicability of LLMs in formalizing advanced mathematical concepts and proposes a framework that can critically review and check mathematical reasoning in research papers. Given the noted reasoning shortcomings of LLMs, our approach synergizes the capabilities of proof assistants, specifically PVS, with LLMs, enabling a bridge between textual descriptions in academic papers and formal specifications in PVS. By harnessing the PVS environment, coupled with data ingestion and conversion mechanisms, we envision an automated process, called math-PVS, to extract and formalize mathematical theorems from research papers, offering an innovative tool for academic review and discovery.
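
A rough sketch of the pipeline the abstract envisions; `extract_theorem_statements`, `llm_translate_to_pvs`, and `check_with_pvs` are placeholders for the paper’s components, not an existing API.

```python
def review_paper(pdf_path: str,
                 extract_theorem_statements,  # PDF -> list of informal theorem statements (placeholder)
                 llm_translate_to_pvs,        # informal statement -> candidate PVS theory text (placeholder)
                 check_with_pvs):             # PVS theory text -> (typechecks: bool, diagnostics: str) (placeholder)
    """Toy math-PVS-style loop: formalize each extracted theorem and report which ones PVS accepts."""
    report = []
    for statement in extract_theorem_statements(pdf_path):
        pvs_theory = llm_translate_to_pvs(statement)
        ok, diagnostics = check_with_pvs(pvs_theory)
        report.append({
            "statement": statement,
            "formalization": pvs_theory,
            "status": "typechecked" if ok else "needs review",
            "diagnostics": diagnostics,
        })
    return report
```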

- Hey ChatGPT, can you make a novel scientific discovery?
- No, can you?)

AI-reasoning systems can // in the future

https://twitter.com/ylecun/status/1738934781692309946
https://arxiv.org/pdf/2312.04684.pdf

Recent advances in Large Language Models (LLMs) have led to an emergent ability of chain-of-thought (CoT) prompting, a prompt reasoning strategy that adds intermediate rationale steps between questions and answers to construct prompts. Conditioned on these prompts, LLMs can effectively learn in context to generate rationales that lead to more accurate answers than when answering the same question directly. To design LLM prompts, one important setting, called demonstration selection, considers selecting demonstrations from an example bank. Existing methods use various heuristics for this selection, but for CoT prompting, which involves unique rationales, it is essential to base the selection upon the intrinsic skills that CoT rationales need, for instance, the skills of addition or subtraction for math word problems. To address this requirement, we introduce a novel approach named Reasoning Skill Discovery (RSD) that uses unsupervised learning to create a latent space representation of rationales, called a reasoning skill. Simultaneously, RSD learns a reasoning policy to determine the required reasoning skill for a given question. This can then guide the selection of examples that demonstrate the required reasoning skills. Our approach offers several desirable properties: it is (1) theoretically grounded, (2) sample-efficient, requiring no LLM inference or manual prompt design, and (3) LLM-agnostic. Empirically, RSD outperforms existing methods by up to 6% in terms of the answer accuracy across multiple reasoning tasks.
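
A deliberately crude approximation of the idea, assuming a generic `embed` function: cluster rationale embeddings into “skills” and pick demonstrations from the cluster closest to the new question. The actual RSD method learns the latent space and the reasoning policy; this sketch just substitutes k-means and a nearest-centroid lookup.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_demonstrations(question: str,
                          example_bank: list,  # list of (question, rationale, answer) triples
                          embed,               # callable: text -> 1-D numpy vector (placeholder)
                          n_skills: int = 8,
                          k: int = 4):
    """Crude RSD-style selection: cluster rationale embeddings into 'skills',
    then pick demonstrations from the skill cluster closest to the new question."""
    rationale_vecs = np.stack([embed(rationale) for _, rationale, _ in example_bank])
    skills = KMeans(n_clusters=n_skills, n_init=10).fit(rationale_vecs)
    # Stand-in for the learned reasoning policy: nearest skill centroid to the question embedding.
    q_vec = embed(question)
    target_skill = int(np.argmin(np.linalg.norm(skills.cluster_centers_ - q_vec, axis=1)))
    in_skill = [ex for ex, label in zip(example_bank, skills.labels_) if label == target_skill]
    return in_skill[:k]  # these (question, rationale, answer) triples become the CoT prompt
```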

https://arxiv.org/pdf/2307.02502.pdf

The innovation in generative AI could be further accelerated with more-readily usable and evaluable mathematics as part of the computational infrastructure. Beyond human-AI chat interaction, LLM (large language model)-based mathematical analysis tools are emerging in software programming, algorithm discovery, and automated theorem proving, but have not yet been widely applied to genomics. Towards disease-solving, this work introduces Math Agents and the mathematical embedding (vector-space representation of an equation as a data string) as new “Moore’s Law of Mathematics” entries. The project consists of a GPT-based workflow to extract equations from published literature PDFs with Mathpix OCR and process them into LaTeX and Python embeddings. There are many ways to represent equations digitally, but few automated means for evaluating large bodies of equations (mathematical ecologies/mathscapes). The important result of LLMs is that they are a linguistic user interface, a language-based access tool, via natural language for human-AI chat, but more extensively, via formal languages for at-scale AI-aided build-out of the computational infrastructure. AI tools are suggested as although the possibility space of natural language is relatively finite, formal possibility spaces are infinite (e.g. the programmatic space of algorithms, the mathematics space of theorems, and the computational complexity space of quantum-classical-relativistic classes). Whereas humans interact with natural language, Math Agents interact with math, the implication of which could be a shift from “big data” to “big math” as a higher-order lever for interacting with reality. Natural language as a language is flexible and open to contextual interpretation; mathematics as a language has well-formedness properties subject to proof. Hence, mathematical use cases beyond math-as-math could include high-validation math-certified icons (by analogy to green seals) towards AI alignment aims of serving humanity in the broadest possible ways. The current project develops a theoretical model for the deployment of Math Agents and mathematical embeddings to the information systems biology problem of aging, applying multiscalar physics mathematics (elucidating near-far entropic correlations in systems) to disease model mathematics and whole-human genomic data. Generative AI with episodic memory (per file dating/time-stamping) could assess causal relations in longitudinal personal health dossiers, deployed via SIR (sustaining, intervening, recovering) compartmental Precision Health models. In the short term, genomic variant and expression data is indicated for practical application to the unresolved challenge of Alzheimer’s disease as the top-five human pathology with no survivors.
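
The abstract defines a “mathematical embedding” only loosely, as a vector-space representation of an equation as a data string. One of the simplest ways to realize that idea (not the paper’s method) is to hash LaTeX token bigrams into a fixed-size vector:

```python
import re
import numpy as np

def latex_tokens(equation: str):
    """Split a LaTeX equation string into coarse tokens: commands, letters, digits, symbols."""
    return re.findall(r"\\[A-Za-z]+|[A-Za-z]|\d+|[^\sA-Za-z\d]", equation)

def embed_equation(equation: str, dim: int = 256) -> np.ndarray:
    """Hash token bigrams into a fixed-size, unit-normalized vector (a toy 'mathematical embedding')."""
    vec = np.zeros(dim)
    toks = latex_tokens(equation)
    for a, b in zip(toks, toks[1:]):
        vec[hash((a, b)) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Structurally similar equations land closer together under cosine similarity.
eq1 = r"\frac{dS}{dt} = -\beta S I"
eq2 = r"\frac{dI}{dt} = \beta S I - \gamma I"
eq3 = r"y = m x + b"
print(float(embed_equation(eq1) @ embed_equation(eq2)))  # relatively high
print(float(embed_equation(eq1) @ embed_equation(eq3)))  # relatively low
```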

https://arxiv.org/abs/2312.08926

Large language models (LLMs) face challenges in solving complex mathematical problems that require comprehensive capacities to parse the statements, associate domain knowledge, perform compound logical reasoning, and integrate the intermediate rationales. Tackling all these problems once could be arduous for LLMs, thus leading to confusion in generation. In this work, we explore the potential of enhancing LLMs with agents by meticulous decomposition and modeling of mathematical reasoning process. Specifically, we propose a formal description of the mathematical solving and extend LLMs with an agent-based zero-shot framework named Planner-Reasoner-Executor-Reflector (PRER). We further provide and implement two MathAgents that define the logical forms and inherent relations via a pool of actions in different grains and orientations: MathAgent-M adapts its actions to LLMs, while MathAgent-H aligns with humankind. Experiments on miniF2F and MATH have demonstrated the effectiveness of PRER and proposed MathAgents, achieving an increase of 12.3% (53.9%→66.2%) on miniF2F, 9.2% (49.8%→59.0%) on MATH, and 13.2% (23.2%→35.4%) for level-5 problems of MATH against GPT-4. Further analytical results provide more insightful perspectives on exploiting the behaviors of LLMs as agents.
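
A toy version of the Planner-Reasoner-Executor-Reflector loop, with `llm(role, prompt)` standing in for whatever chat-completion call you use; the role prompts and control flow here are my own guess at the decomposition the abstract names, not the paper’s implementation.

```python
def prer_solve(problem: str, llm, max_rounds: int = 3):
    """Toy PRER loop: plan, reason and execute each sub-goal, then reflect and possibly re-plan."""
    plan = llm("planner", f"Decompose into ordered sub-goals, one per line: {problem}")
    steps = []
    for _ in range(max_rounds):
        steps = []
        for subgoal in plan.splitlines():
            rationale = llm("reasoner", f"Reason about how to achieve: {subgoal}")
            result = llm("executor", f"Carry out this step and state the intermediate result: {rationale}")
            steps.append(result)
        verdict = llm("reflector", "Check this chain for errors, answer 'OK' or a critique:\n" + "\n".join(steps))
        if "ok" in verdict.lower():      # crude acceptance check; a real reflector would return structured feedback
            return steps[-1]             # accept the final intermediate result as the answer
        plan = llm("planner", f"Revise the plan given this critique: {verdict}")
    return steps[-1]                     # best effort after max_rounds
```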

https://news.microsoft.com/source/features/sustainability/how-ai-and-hpc-are-speeding-up-scientific-discovery/
