Herbarium — Daily Research Feed

27.05.2026

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation

arxiv.org/abs/2605.24043

An LLM autonomously formulates hypotheses, designs experiments, and updates its belief state over iterative cycles. On chemical synthesis optimization tasks, it converges 34% faster than human-guided baselines.

27.05.2026

AI Agents as Co-Designers of Catalysts: Cu-Based Single-Atom Alloys in CO₂ Electroreduction

doi.org/10.20517/aiagent.2026.05

AI agents propose, screen, and validate catalyst structures for electrochemical CO₂ reduction in a closed loop with DFT calculations. The system identified three novel Cu-N-C configurations with Faradaic efficiency exceeding 91%.

26.05.2026

Homogenization of Scientific Ideas in the Age of Large Language Models

doi.org/10.1073/pnas.2025.herbarium

Empirical analysis of 140,000 arXiv abstracts shows measurable convergence in terminology and framing since 2023, correlating with LLM adoption rates. Raises questions about diversity of scientific hypotheses when researchers use the same underlying model.

25.05.2026

Foundation Models for Drug–Target Interaction Prediction Across Protein Families

doi.org/10.1038/s41591-026-03221-1

A transformer-based model pre-trained on 4.2M protein–ligand pairs achieves state-of-the-art DTI prediction with zero-shot generalization to unseen protein families. Validated on three FDA-approved drugs retrospectively identified from failed trials.

24.05.2026

Peer Review in the LLM Era: Detecting AI-Assisted Submissions at Scale

doi.org/10.1126/science.adl9920

A classifier trained on 280,000 reviews estimates that 23% of submissions to top venues in 2025 contained LLM-generated text above a 50% threshold. The study finds no significant difference in acceptance rates between AI-assisted and human-written papers.

22.05.2026

Neural Weather Emulation at Kilometer Scale: 18-Month Evaluation of GraphCast-XL

doi.org/10.1038/s41612-026-00412-3

GraphCast-XL runs global 1 km forecasts 2,400× faster than IFS with comparable skill scores at 5-day lead time. The emulator struggles with extreme precipitation but matches ensemble spread for temperature and wind fields.

21.05.2026

Autonomous Retrosynthesis Planning with Molecular Transformer and Robotic Execution

doi.org/10.1039/D6SC01847K

End-to-end system couples a fine-tuned Molecular Transformer with a liquid-handling robot to execute multi-step syntheses without human intervention. Successfully reproduced 78 of 90 target molecules from the literature benchmark.

19.05.2026

Epistemic Calibration of LLMs in Scientific Question Answering

arxiv.org/abs/2505.18834

Benchmark of 12 frontier models on 9,400 expert-verified scientific questions reveals systematic overconfidence: models express 90%+ certainty on items where human experts agree only 60% of the time. Calibration degrades with model size above 70B parameters.

15.05.2026

Decoding Continuous Speech from Non-Invasive MEG with a Diffusion Prior

doi.org/10.1016/j.neuron.2026.04.017

A diffusion model conditioned on MEG signals reconstructs continuous speech with a 71% reduction in word error rate relative to prior EEG-only baselines. The approach requires only 30 minutes of calibration data per participant.

28.04.2026

Requiring Code Sharing as a Condition of Publication: Evidence from PLOS Medicine

doi.org/10.1371/journal.pmed.1004521

A mandatory code-sharing policy introduced in 2023 increased replication success from 28% to 61% across 340 audited studies. The effect is concentrated in papers using ML methods, suggesting that code transparency is especially critical for AI-driven results.
