An LLM autonomously formulates hypotheses, designs experiments, and updates its belief state over iterative cycles. Tested on chemical synthesis optimization tasks, it converges 34% faster than human-guided baselines.
AI agents propose, screen, and validate catalyst structures for electrochemical CO₂ reduction in a closed loop with DFT calculations. The system identified three novel Cu-N-C configurations with Faradaic efficiency exceeding 91%.
Empirical analysis of 140,000 arXiv abstracts shows measurable convergence in terminology and framing since 2023, correlating with LLM adoption rates. Raises questions about diversity of scientific hypotheses when researchers use the same underlying model.
A transformer-based model pre-trained on 4.2M protein–ligand pairs achieves state-of-the-art DTI prediction with zero-shot generalization to unseen protein families. Validated on three FDA-approved drugs retrospectively identified from failed trials.
A classifier trained on 280,000 reviews estimates that 23% of submissions to top venues in 2025 contained LLM-generated text above a 50% threshold. The study finds no significant difference in acceptance rates between AI-assisted and human-written papers.
GraphCast-XL runs global 1 km forecasts 2,400× faster than IFS with comparable skill scores at 5-day lead time. The emulator struggles with extreme precipitation but matches ensemble spread for temperature and wind fields.
End-to-end system couples a fine-tuned Molecular Transformer with a liquid-handling robot to execute multi-step syntheses without human intervention. Successfully reproduced 78 of 90 target molecules from the literature benchmark.
Benchmark of 12 frontier models on 9,400 expert-verified scientific questions reveals systematic overconfidence: models express 90%+ certainty on items where human experts agree only 60% of the time. Calibration degrades with model size above 70B parameters.
A diffusion model conditioned on MEG signals reconstructs continuous speech with a 71% reduction in word error rate relative to prior EEG-only baselines. The approach requires only 30 minutes of calibration data per participant.
A mandatory code-sharing policy introduced in 2023 increased replication success from 28% to 61% across 340 audited studies. The effect is concentrated in papers using ML methods, suggesting code transparency is especially critical for AI-driven results.