We present spectroxide, a code package for computing cosmic microwave background spectral distortions in which all ${\sim}14{,}500$ lines of Rust code, Python interface, and ${\sim}400$ automated tests were written by an AI assistant (Claude Code) under human physicist supervision. The solver evolves the photon Boltzmann equation under Compton scattering, double Compton emission, and Bremsstrahlung from $z \sim 5 \times 10^6$ to the present, computing spectral distortions from arbitrary heat and photon injection within this redshift range. No fully open-source code of this kind is publicly available; we validate against analytic limits, published spectra, and publicly available precomputed Green's function tables. We document the development as a case study in AI-assisted scientific computing, highlighting how domain expertise caught physics bugs (incorrect dimensional prefactors, near-cancellation errors) that evaded the full automated test suite, and provide recommendations for best practices in human--AI collaborative development of scientific software. We make spectroxide publicly available on GitHub.
Vision-language models (VLMs) are increasingly proposed as general-purpose tools for scientific data interpretation, yet their reliability on real astronomical observations across diverse modalities remains untested. We present AstroVLBench, a comprehensive benchmark comprising over 4,100 expert-verified instances across five tasks spanning optical imaging, radio interferometry, multi-wavelength photometry, time-domain light curves, and optical spectroscopy. Evaluating six frontier models, we find that performance is strongly modality-dependent: while one model (Gemini 3 Pro) emerges as the most consistently capable across tasks, task-specific strengths vary, and all models substantially underperform domain-specialized methods. Mechanistic ablations reveal that performance depends not only on directing attention to salient visual features but also on grounding those features in physical knowledge. Phenomenological prompts describing what to look for improve accuracy by sharpening model focus, but physical prompts explaining why those features matter perform better overall and yield more balanced classifications with reduced class-specific bias. Consistent with this picture, presenting the underlying one-dimensional measurements directly as numerical tables instead of rendered plots yields up to 13 percentage points improvement. Reasoning quality analysis further demonstrates that, without explicit physical grounding, models may reach correct predictions from phenomenologically plausible cues while providing physically imprecise justifications, establishing that accuracy alone is insufficient for trustworthy scientific deployment. These findings provide the first systematic, multi-modal baselines for VLMs in observational astronomy and identify the specific representation, grounding, and reasoning bottlenecks where current models fail.
Scattered light is one of the most common sources of non-stationary noise at low frequencies in Advanced LIGO detectors. It appears as arch-like features in time-frequency spectrograms, produced when stray light reflects from moving surfaces and recombines with the main interferometer beam. In this study, we present ArchGEM, an automated framework for identifying and characterizing these arches and recovering the physical properties of the scattering surfaces. ArchGEM combines a prominence-based peak-finding method with a Gaussian Mixture Model clustering approach to capture a range of scattered-light morphologies across different detector conditions. We apply ArchGEM to scattered light glitches across Advanced LIGO observing runs O3 (2019--2020) and O4 (2023--2024). We find that the average frequency distributions of this noise span 15--25 Hz in O3a and O4, but increase to 20--40 Hz during O3b. Typical inferred surface velocities are 0.2--0.5 $\mu$m/s, and inferred surface displacements are 0.1--0.3 $\mu$m. The Gaussian Mixture Model performs most consistently for complex or overlapping features, with mean frequency offsets within 5 Hz of the Gravity Spy baseline. Our results show that ArchGEM provides a practical tool for detector characterization by linking observed spectrogram features to the motion of scattering surfaces and helping guide future mitigation of scattered light noise in current and next-generation interferometers. By quantifying the temporal and spectral behavior of scattered light, ArchGEM provides a robust framework for diagnosing noise sources and guiding targeted mitigation strategies in future detector upgrades.
Agentic AI systems are increasingly being integrated into scientific workflows, yet their behavior under realistic conditions remains insufficiently understood. We evaluate CMBAgent across two workflow paradigms and eighteen astrophysical tasks. In the One-Shot setting, access to domain-specific context yields an approximately 6x performance improvement (0.85 vs. ~0 without context), with the primary failure mode being silent incorrect computation: syntactically valid code that produces plausible but inaccurate results. In the Deep Research setting, the system frequently exhibits silent failures across stress tests, producing physically inconsistent posteriors without self-diagnosis. Overall, performance is strong on well-specified tasks but degrades on problems designed to probe reasoning limits, often without visible error signals. These findings highlight that the most concerning failure mode in agentic scientific workflows is not overt failure, but confident generation of incorrect results. We release our evaluation framework to facilitate systematic reliability analysis of scientific AI agents.
General-relativistic magnetohydrodynamic (GR-MHD) simulations are essential for studying black hole accretion, relativistic jets, and magnetic reconnection, yet their computational cost severely limits systematic parameter exploration. We investigate neural operator surrogates for two astrophysically relevant simulation scenarios produced by the Black Hole Accretion Code (\texttt{BHAC}). First, a Physics-Informed Fourier Neural Operator (PINO) is trained on the special-relativistic resistive MHD (SRRMHD) evolution of the Orszag-Tang vortex over a range of resistivities spanning the Sweet-Parker and fast reconnection regimes. By embedding the governing equations as an additional loss term evaluated at finer temporal resolution than the available data supervision, the model learns dynamics at time steps where no simulation data is provided, enabling recovery of plasmoid formation that a data-only baseline trained on the same sparse snapshots fails to reproduce. To our knowledge, the present work is the first application of a physics-informed neural operator to special-relativistic resistive MHD, and the first to investigate the capability of such models to resolve plasmoid formation in SRRMHD. In a second line of investigation, an OFormer-style Transformer Neural Operator is trained on the evolution of spine-sheath relativistic jets created with \texttt{BHAC} in special-relativistic MHD (SRMHD). The model is applied directly to the adaptive mesh, highlighting the need for linear attention due to long sequences. The neural surrogate model captures most of the major details, especially in early predictions. To our knowledge, this constitutes the first application of a neural operator directly on a high-resolution adaptive mesh refinement grid in the context of MHD simulations.
The quiet-Sun coronal electron-temperature ratio $R \equiv T_\mathrm{EUV}/T_B \approx 2.4$, stable across an eight-year solar cycle, is read here as a measurement of relative entropy between two diagnostic projections of the coronal electron distribution onto the one-parameter Maxwellian family. The EUV ionization temperature is a moment-matching projection against a Bethe-type ionization kernel; the radio brightness temperature is the Rayleigh-Jeans source function of thermal bremsstrahlung. For a kappa distribution in the mean-energy convention, Fleishman & Kuznetsov (2014) give the radio-side projection in closed form as $T_B = T_\mathrm{core}$; the EUV side returns $T_\mathrm{eff}$ up to a shape-dependent correction within the Dudík et al. (2014) intensity-ratio envelope. At $\kappa = 2.5$ the Kullback-Leibler divergences between the true distribution and its two Maxwellian projections evaluate to $0.32$ and $1.20$ nats, and their difference satisfies $\Delta D_\mathrm{KL} = (3/2)[R_0 - \ln R_0 - 1] = (3/2)\, d_\mathrm{IS}(T_\mathrm{eff}, T_\mathrm{core})$, where $R_0 \equiv \kappa/(\kappa - 3/2)$ is the ideal closed-form ratio and $d_\mathrm{IS}$ is the Itakura-Saito distance. The identity is offered as an analytical reference for observational systems in which two diagnostics project different moments of a common non-equilibrium distribution; the eight-year stability of $R$ expresses a stability of that projection structure.
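As a quick arithmetic check of the identity quoted in the abstract above, the following minimal Python sketch evaluates $R_0$ and $(3/2)[R_0 - \ln R_0 - 1]$ at $\kappa = 2.5$ and compares the result with the difference of the two quoted divergences. The function names (`ideal_ratio`, `itakura_saito`, `delta_dkl`) are hypothetical and chosen here for illustration; they do not come from the paper. The sketch assumes the standard definition $d_\mathrm{IS}(x, y) = x/y - \ln(x/y) - 1$ and uses only the formulas stated in the abstract.

```python
import math


def ideal_ratio(kappa: float) -> float:
    """Ideal closed-form ratio R_0 = kappa / (kappa - 3/2), as defined in the abstract."""
    if kappa <= 1.5:
        raise ValueError("kappa must exceed 3/2 for R_0 to be finite and positive")
    return kappa / (kappa - 1.5)


def itakura_saito(x: float, y: float) -> float:
    """Itakura-Saito distance d_IS(x, y) = x/y - ln(x/y) - 1 (standard definition, assumed here)."""
    ratio = x / y
    return ratio - math.log(ratio) - 1.0


def delta_dkl(kappa: float) -> float:
    """Right-hand side of the quoted identity: (3/2) * [R_0 - ln R_0 - 1], in nats."""
    r0 = ideal_ratio(kappa)
    # Only the ratio T_eff / T_core = R_0 matters, so T_core is set to 1 for convenience.
    return 1.5 * itakura_saito(r0, 1.0)


kappa = 2.5
predicted = delta_dkl(kappa)   # closed-form value from the identity
quoted = 1.20 - 0.32           # difference of the two divergences quoted in the abstract
print(f"R_0 = {ideal_ratio(kappa):.3f}")
print(f"(3/2) d_IS = {predicted:.4f} nats; quoted difference = {quoted:.2f} nats")
```

At $\kappa = 2.5$ the closed form gives $R_0 = 2.5$ and a divergence difference of about $0.876$ nats, which agrees with $1.20 - 0.32 = 0.88$ nats to the two-decimal precision of the quoted values.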