Do we need reasoning intelligence? // AI in scientific research
In pure mathematics, very occasionally, breakthroughs arrive like bolts from the blue — the result of such inspired feats of reasoning and creativity that they seem to push the very bounds of intelligence. In 2016, for instance, mathematician Timothy Gowers marvelled at a solution to the cap set problem, which asks for the largest set of points in space in which no three points form a straight line. The proof “has a magic quality that leaves one wondering how on Earth anybody thought of it”, he wrote.
Leveraging generative Artificial Intelligence (AI), we have transformed a dataset comprising 1,000 scientific papers focused on biological materials into a comprehensive ontological knowledge graph. Through an in-depth structural analysis of this graph, we have calculated node degrees, identified communities along with their connectivities, and evaluated clustering coefficients and betweenness centrality of pivotal nodes, uncovering fascinating knowledge architectures. We find that the graph has an inherently scale-free nature and a high level of connectedness, and that it can serve as a rich source for downstream graph reasoning: by taking advantage of transitive and isomorphic properties, it reveals unprecedented interdisciplinary relationships that can be used to answer queries, identify gaps in knowledge, propose never-before-seen material designs, and predict material behaviors. Using a large language embedding model we compute deep node representations and use combinatorial node similarity ranking to develop a path sampling strategy that allows us to link dissimilar concepts that have previously not been related. One comparison revealed detailed structural parallels between biological materials and Beethoven’s 9th Symphony, highlighting shared patterns of complexity through isomorphic mapping. In another example, the algorithm proposed an innovative hierarchical mycelium-based composite, integrating path sampling with principles extracted from Kandinsky’s ‘Composition VII’ painting. The resulting material integrates an innovative set of concepts that include a balance of chaos and order, adjustable porosity, mechanical strength, and complex patterned chemical functionalization. We uncover other isomorphisms across science, technology and art, revealing a nuanced ontology of immanence that reveals a context-dependent heterarchical interplay of constituents. Because our method transcends established disciplinary boundaries through diverse data modalities (graphs, images, text, numerical data, etc.), graph-based generative AI achieves a far higher degree of novelty, explorative capacity, and technical detail than conventional approaches and establishes a widely useful framework for innovation by revealing hidden connections.
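The graph analysis described here maps onto standard network metrics. Below is a minimal sketch using networkx, assuming the knowledge graph has already been extracted to an edge list; the file name and node labels are hypothetical placeholders, and the embedding-based similarity ranking used to choose path endpoints is omitted.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# "ontology_edges.csv" and the node labels below are hypothetical placeholders.
G = nx.read_edgelist("ontology_edges.csv", delimiter=",")

degrees = dict(G.degree())                      # node degrees
betweenness = nx.betweenness_centrality(G)      # centrality of pivotal nodes
avg_clustering = nx.average_clustering(G)       # clustering coefficient
communities = greedy_modularity_communities(G)  # community detection

# Rough check of the scale-free claim: a power-law degree distribution
# is approximately linear on log-log axes.
degree_histogram = Counter(degrees.values())

# Path sampling between two previously unrelated concepts; the embedding-based
# ranking used to pick dissimilar endpoints is not shown here.
path = nx.shortest_path(G, source="mycelium", target="Composition VII")
```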
We introduce five collaborative schemes for multi-agent systems:
💡 Brainstorming agents
💡 Expert consultation agents
💡 Research debate agents
💡 Roundtable discussion agents
💡 Self-driving lab agents

We provide a roadmap for building biomedical AI agents with four key modules (a minimal code sketch follows the list):
1️⃣ Perception modules to receive multimodal inputs across data modalities and techniques
2️⃣ Interaction modules to interact with other agents, humans, and tools
3️⃣ Reasoning modules to enable direct reasoning (e.g., chain of thought, leap of thought) and reasoning with feedback
4️⃣ Memory modules for short-term memory (in-context learning, prompt learning and knowledge graph retrieval) and long-term memory (model fine-tuning, model editing, RAG)
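To make the four modules concrete, here is a minimal sketch of how they might compose in plain Python. The class and method names are illustrative assumptions, not an API from the paper, and the reasoning step is a stub where an LLM call would go.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    short_term: list = field(default_factory=list)  # in-context examples, retrieved facts
    long_term: dict = field(default_factory=dict)   # stand-in for fine-tuned weights / RAG index

@dataclass
class BiomedicalAgent:
    name: str
    memory: Memory = field(default_factory=Memory)

    def perceive(self, inputs: dict) -> dict:
        """Perception module: normalize multimodal inputs (text, images, omics)."""
        return {key: str(value) for key, value in inputs.items()}

    def reason(self, observation: dict) -> str:
        """Reasoning module: stub for chain-of-thought over observation + memory."""
        context = self.memory.short_term[-5:]
        return f"{self.name} hypothesis from {observation} given {context}"

    def interact(self, message: str, peer: "BiomedicalAgent") -> str:
        """Interaction module: pass a message to another agent (or a tool wrapper)."""
        peer.memory.short_term.append(message)
        return message

    def step(self, inputs: dict, peer: "BiomedicalAgent") -> str:
        observation = self.perceive(inputs)
        thought = self.reason(observation)
        self.memory.short_term.append(thought)  # short-term memory update
        return self.interact(thought, peer)

# Usage: two agents in a debate-style exchange
alice, bob = BiomedicalAgent("planner"), BiomedicalAgent("critic")
reply = alice.step({"assay": "scRNA-seq counts"}, peer=bob)
```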
Publications focused on scientific discoveries derived from analyzing large biological datasets typically follow the cycle of hypothesis generation, experimentation, and data interpretation. The reproduction of findings from such papers is crucial for confirming the validity of the scientific, statistical, and computational methods employed in the study, and it also provides a foundation for new research. By employing a multi-agent system composed of Large Language Models (LLMs), including both text and code generation agents built on OpenAI’s platform, our study attempts to reproduce the methodology and findings of a high-impact publication that investigated the expression of viral-entry-associated genes using single-cell RNA sequencing (scRNA-seq). The LLM system was critically evaluated against the analysis results from the original study, highlighting the system’s ability to perform simple statistical analysis tasks and literature reviews to establish the purpose of the analyses. However, we also identified significant challenges in the system, such as nondeterminism in code generation, difficulties in data procurement, and the limitations presented by context length and bias from the model’s inherent training data. By addressing these challenges and expanding the system’s capabilities, we intend to contribute to the goal of automating scientific research for efficiency, reproducibility, and transparency, and to drive the discussion on the role of AI in scientific discovery.
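As one concrete reading of the setup, the sketch below pairs a text (planning) agent with a code-generation agent through the OpenAI chat completions API. The prompts, model choice, and division of labor are illustrative assumptions, not the study’s actual configuration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(system_prompt: str, user_prompt: str) -> str:
    """One agent turn: a system persona plus a task prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        temperature=0,   # reduces, but does not eliminate, nondeterministic code generation
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

# Text agent plans the analysis; code agent turns the plan into a script.
plan = ask("You are a computational biologist planning an scRNA-seq analysis.",
           "Outline steps to quantify expression of viral-entry-associated genes.")
code = ask("You write executable Python for single-cell analysis (e.g., scanpy).",
           f"Write code implementing this plan:\n{plan}")
```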
Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our statistical estimation operates on the corpus level and is more robust than inference on individual instances. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers (up to 17.5%). In comparison, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3%). Moreover, at an aggregate level, our analysis reveals that higher levels of LLM modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.
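The population-level framework can be read as fitting a mixture: each document is scored under a human-written and an LLM-modified reference distribution, and the corpus-wide fraction is estimated by maximum likelihood. A minimal sketch under that assumption follows; the paper’s actual estimator operates on word-frequency statistics and may differ in detail.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_llm_fraction(lik_human: np.ndarray, lik_llm: np.ndarray) -> float:
    """MLE of the corpus-level fraction alpha of LLM-modified documents,
    given each document's likelihood under the two reference distributions."""
    def neg_log_likelihood(alpha: float) -> float:
        mixture = (1 - alpha) * lik_human + alpha * lik_llm
        return -np.sum(np.log(mixture + 1e-300))  # guard against log(0)
    result = minimize_scalar(neg_log_likelihood, bounds=(0.0, 1.0), method="bounded")
    return float(result.x)

# Usage with toy likelihoods for a four-document corpus
alpha_hat = estimate_llm_fraction(np.array([0.9, 0.8, 0.2, 0.7]),
                                  np.array([0.1, 0.3, 0.9, 0.2]))
```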
We present an approach for automatically generating and testing, in silico, social scientific hypotheses. This automation is made possible by recent advances in large language models (LLMs), but the key feature of the approach is the use of structural causal models. Structural causal models provide a language to state hypotheses, a blueprint for constructing LLM-based agents, an experimental design, and a plan for data analysis. The fitted structural causal model becomes an object available for prediction or the planning of follow-on experiments. We demonstrate the approach with several scenarios: a negotiation, a bail hearing, a job interview, and an auction. In each case, causal relationships are both proposed and tested by the system, finding evidence for some and not others. We provide evidence that the insights from these simulations of social interactions are not available to the LLM purely through direct elicitation. When given its proposed structural causal model for each scenario, the LLM is good at predicting the signs of estimated effects, but it cannot reliably predict the magnitudes of those estimates. In the auction experiment, the in silico simulation results closely match the predictions of auction theory, but elicited predictions of the clearing prices from the LLM are inaccurate. However, the LLM’s predictions are dramatically improved if the model can condition on the fitted structural causal model. In short, the LLM knows more than it can (immediately) tell.
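As a toy illustration of the loop (state an SCM, randomize its causes, simulate, fit), here is a minimal sketch for the auction scenario. The causal graph, the simulator standing in for LLM agents, and the resulting effect sizes are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothesized SCM for the auction: n_bidders -> clearing_price, with
# reserve_price as a second cause. The simulator stands in for LLM agents
# bidding in silico.
def simulate_auction(n_bidders: int, reserve_price: float) -> float:
    valuations = rng.uniform(reserve_price, 100.0, size=n_bidders)
    # Second-price clearing: the price equals the second-highest valuation.
    return float(np.sort(valuations)[-2]) if n_bidders > 1 else reserve_price

# Experimental design implied by the SCM: randomize the causes.
n_bidders = rng.integers(2, 10, size=500)
reserve = rng.uniform(0.0, 50.0, size=500)
price = np.array([simulate_auction(int(k), float(r))
                  for k, r in zip(n_bidders, reserve)])

# Fit the structural equations (linear approximation) to estimate effects.
X = np.column_stack([np.ones_like(reserve), n_bidders, reserve])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
print(f"effect of an extra bidder: {beta[1]:.2f}, of reserve price: {beta[2]:.2f}")
```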