AI Picks: high-value content automatically selected by AI

ArXiv CS.AI

PhyDrawGen: Physically Grounded Diagram Generation from Natural Language

arXiv:2605.30512v1 Announce Type: new Abstract: Generating physics diagrams from text requires strict adherence to physical laws. While current generative models produce visually plausible outputs, they systematically hallucinate force vectors, ignore conservation laws, and violate geometric constr…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Physically Viable World Models: A Case for Query-Conditioned Embodied AI

arXiv:2605.30542v1 Announce Type: new Abstract: World models for embodied AI must be physically viable: constructed to answer intervention queries by representing the physical structure governing action outcomes, rather than merely predicting future observations.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Transforming and Encoding FTS for SAT Solving: What Helps, What Hurts (Extended Version)

arXiv:2605.30563v1 Announce Type: new Abstract: Factored tasks are a classical planning representation that extends SAS+ with limited forms of disjunctive preconditions, conditional effects, and angelic nondeterminism.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Procedural Generation of First Person Shooter Maps using Map-Elites

arXiv:2605.30570v1 Announce Type: new Abstract: We investigate the application of MAP-Elites (a well-known quality diversity algorithm) to design levels for First-Person Shooter (FPS) games.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving

arXiv:2605.30576v1 Announce Type: new Abstract: Exploration in reinforcement learning for autonomous driving is inherently unsafe: agents must experience novel behaviors to learn, yet exploration can lead to collisions or off-road driving.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

arXiv:2605.30621v1 Announce Type: new Abstract: LLM agents are increasingly deployed as systems built around editable external harnesses, including prompts, skills, memories and tools, that shape task execution without changing model parameters.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

arXiv:2605.30637v1 Announce Type: new Abstract: Clinical decision-making (CDM) is central to real-world clinical workflows, where clinicians infer diagnoses, select treatments, or anticipate future health outcomes under incomplete evidence.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Structure-Induced Information for Rerooting Levin Tree Search

arXiv:2605.30664v1 Announce Type: new Abstract: Subgoal-based policy tree search, which uses a policy to guide search, is effective for complex single-agent deterministic problems but often relies on explicit subgoal generation that can incur substantial overhead and hinders scalability.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Healthcare Mechanisms from Policy-as-Code Search under Strategic Provider Response

arXiv:2605.30680v1 Announce Type: new Abstract: Healthcare mechanisms are inseparable from the strategic provider response they induce: existing healthcare AI benchmarks hold this response fixed and so cannot evaluate mechanisms by the equilibrium they produce.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

MAVEN: Improving Generalization in Agentic Tool Calling

arXiv:2605.30738v1 Announce Type: new Abstract: Generalization across agentic tool-calling environments remains a central challenge for reliable agentic reasoning systems.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Generating Graph-like Rules for Knowledge Graph Reasoning via Diffusion Models

arXiv:2605.30747v1 Announce Type: new Abstract: Logical rules constitute a cornerstone of knowledge graph (KG) reasoning, valued for their interpretability and ability to model relational patterns.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Learning Agent-Compatible Context Management for Long-Horizon Tasks

arXiv:2605.30785v1 Announce Type: new Abstract: LLM agents increasingly face long-horizon tasks such as web search and deep research in real-world applications, where accumulated context can cause long-context degradation and reasoning failures.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges

arXiv:2605.30803v1 Announce Type: new Abstract: LLM judges are increasingly used to evaluate open-ended responses, but their scores depend strongly on the rubrics that condition them.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Planner-Centric Reinforcement Learning for Deep Research with Structure-Aware Reward

arXiv:2605.30824v1 Announce Type: new Abstract: Deep research tasks require LLMs to plan what to investigate, retrieve evidence, and synthesize long-form answers across multiple branches of inquiry.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

SLAT: Segment-Level Adaptive Trimming for Efficient CoT Reasoning

arXiv:2605.30832v1 Announce Type: new Abstract: Recent advances in Large Reasoning Models have significantly improved chain-of-thought (CoT) capabilities via reinforcement learning (RL).

💬 No discussion yet

12 hours ago

ArXiv CS.AI

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

arXiv:2605.30838v1 Announce Type: new Abstract: LLM-powered search agents enable multi-step reasoning and tool use. However, these capabilities introduce retrieval-induced safety degradation, as harmful intents may decompose into seemingly innocuous sub-queries that lead to unsafe outcomes.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Distilling LLM Feedback for Lean Theorem Proving

arXiv:2605.30861v1 Announce Type: new Abstract: Post-training for reasoning models typically combines supervised fine-tuning with reinforcement learning from verifiable rewards, most commonly with GRPO. However, this algorithm suffers from sparse rewards, limited exploration, and mode collapse.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling

arXiv:2605.30898v1 Announce Type: new Abstract: In real-world deployments of large language models (LLMs), balancing inference quality and computational cost has become a central challenge.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

arXiv:2605.30900v1 Announce Type: new Abstract: Current multimodal models handle static image recognition well, but intuitive physical reasoning remains a weakness. Predicting how objects will move and interact from a single image is still difficult for these systems.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI

arXiv:2605.31021v1 Announce Type: new Abstract: Current alignment paradigms for generative artificial intelligence rely predominantly on monolithic benchmarking frameworks that reduce the plurality of human judgment to aggregated statistical baselines, thereby obscuring cultural, demographic, and c…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

HADT: A Heterogeneous Multi-Agent Differential Transformer for Autonomous Earth Observation Satellite Cluster

arXiv:2605.31023v1 Announce Type: new Abstract: This work addresses the problem of autonomous resource management in a heterogeneous satellite cluster conducting Earth Observation (EO) missions, including optical and Synthetic Aperture Radar (SAR) satellites.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning

arXiv:2605.31031v1 Announce Type: new Abstract: Relational reasoning lies at the heart of intelligence, but existing benchmarks are typically confined to formats such as grids or text. We introduce GraphARC, a benchmark for abstract reasoning on graph-structured data.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Vector Linking via Cross-Model Local Isometric Consistency

arXiv:2605.31100v1 Announce Type: new Abstract: We study Vector Linking: given two embedding clouds produced by different black-box encoders over partially overlapping datasets, recover cross-model object correspondences using only vectors.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

arXiv:2605.31167v1 Announce Type: new Abstract: Assessing whether Large Language Model outputs are factually grounded, epistemically calibrated, and methodologically reproducible is a prerequisite for responsible AI deployment.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Formalizing and falsifying causal pathways of rare events

arXiv:2605.31254v1 Announce Type: new Abstract: Building on recent formalizations of root cause analysis for rare events ("outliers") in structural equation models, we propose a formal definition of a causal pathway and discuss its testable implications.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

arXiv:2605.31264v1 Announce Type: new Abstract: LLM agents are increasingly expected not only to complete isolated tasks, but also to carry bounded representations of human expertise, judgment, and interaction style.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation

arXiv:2605.31278v1 Announce Type: new Abstract: Reliable evaluation of agentic systems requires unbiased estimates with valid uncertainty, but standard practice navigates between costly human annotation and biased LLM-as-judge proxies.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories

arXiv:2605.31308v1 Announce Type: new Abstract: Agent benchmarks increasingly record rich interaction trajectories, yet evaluation often reduces each rollout to a pass rate or reward score.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents

arXiv:2605.31354v1 Announce Type: new Abstract: Modular visual reasoning systems increasingly rely on shared working memory for multi-step collaboration, yet the failure dynamics of intermediate state evolution in low-capacity regimes remain underexplored.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

arXiv:2605.31365v1 Announce Type: new Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have led to promising progress in web agents.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

HypoAgent: An Agentic Framework for Interactive Abductive Hypothesis Generation over Knowledge Graphs

arXiv:2605.31370v1 Announce Type: new Abstract: Abductive reasoning over knowledge graphs aims to generate logical hypotheses that explain observed entities or facts.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

FAM-Bench: A Multimodal Benchmark for Condition-Aware Food-as-Medicine Reasoning

arXiv:2605.31410v1 Announce Type: new Abstract: Food-as-Medicine requires models to reason beyond what a dish is or what nutrition it contains: they must decide whether a concrete food choice is appropriate for a specific health condition.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Answer-Set-Programming-based Abstractions for Reinforcement Learning

arXiv:2605.31444v1 Announce Type: new Abstract: Reinforcement Learning (RL) enables autonomous agents to learn policies from experience, but realistic problems often involve enormous state spaces, making learning and generalisation challenging.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

AutoSci: A Memory-Centric Agentic System for the Full Scientific Research Lifecycle

arXiv:2605.31468v1 Announce Type: new Abstract: Scientific research has traditionally been human-intensive, requiring researchers to coordinate literature, ideas, experiments, manuscripts, and review responses across long project cycles.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

arXiv:2605.31492v1 Announce Type: new Abstract: Large language models (LLMs) often solve reasoning problems by generating intermediate traces that explore and revise partial solutions.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Choosing the Lens: Strategic Perspective Activation in Context-Dependent Argumentation

arXiv:2605.31581v1 Announce Type: new Abstract: The same arguments often need to be evaluated under different external regimes. An agent with influence over the regime has a strategic lever that standard formalisms do not directly capture.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

TRINE: A Token-Aware, Runtime-Adaptive FPGA Inference Engine for Multimodal AI

arXiv:2603.22867v1 Announce Type: cross Abstract: Multimodal stacks that mix ViTs, CNNs, GNNs, and transformer NLP strain embedded platforms because their compute/memory patterns diverge and hard real-time targets leave little slack.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

arXiv:2605.28918v1 Announce Type: cross Abstract: For sparse, structured reinforcement-learning tasks with semantic reward-function interfaces, LLM-generated reward shaping is better framed as debugging than one-shot generation.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Gradient-Free Training of Spiking Neural Networks via Low-Rank Evolution Strategies

arXiv:2605.30361v1 Announce Type: cross Abstract: Spiking Neural Networks (SNNs) offer compelling energy efficiency on neuromorphic hardware, yet their training remains challenging because the discrete spike threshold is non-differentiable.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

XOResNet: Exclusive-OR Meta-Residuals Facilitate Deep Spiking Neural Networks Learning

arXiv:2605.30362v1 Announce Type: cross Abstract: Spiking neural networks (SNNs) hold promise for demonstrating superior learning and representation capabilities in deep models.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Enhancing Regime Shift Detection Using Unstructured Data: A Study on the Treasury Market

arXiv:2605.30363v1 Announce Type: cross Abstract: Regime shifts in financial markets reorganise the joint dynamics of asset prices and macro variables, breaking any single-regime calibration.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Hamiltonian-Inspired Attention Mechanism for Scalable RF Transmitter Fingerprinting

arXiv:2605.30364v1 Announce Type: cross Abstract: Radio-frequency (RF) fingerprinting identifies wireless transmitters using hardware-induced imperfections present in baseband I/Q signals.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Mental Damage: Caption Poisoning Attacks on Retrieval-Augmented Text-to-Music Generation

arXiv:2605.30365v1 Announce Type: cross Abstract: Retrieval-augmented text-to-music (TTM) systems augment underspecified user prompts using captions retrieved from a music caption dataset. This design introduces an integrity dependency on the music knowledge database.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Reinterpreting Safety Thresholds as Neuron Spiking Thresholds

arXiv:2605.30368v1 Announce Type: cross Abstract: Surrogate Safety Measures (SSMs) are extensively utilised in the evaluation of traffic risk in automated driving contexts.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Updating the standard neuron model in artificial neural networks

arXiv:2605.30370v1 Announce Type: cross Abstract: From their inception in the 1950s, artificial neural networks (ANNs) started using the so-called point neuron model then prevalent in neuroscience, hoping that this analogy would allow for a better emulation of brain function.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Evolutionary Algorithm for Reservoir Learning and Yielding

arXiv:2605.30372v1 Announce Type: cross Abstract: Reservoir computing, a type of recurrent neural network, is a promising approach for temporal learning as it separates dynamic processing from the trained readout layer.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Full-field prediction for engineering-scale three-dimensional aircraft with multigrid-hierarchical learning

arXiv:2605.30375v1 Announce Type: cross Abstract: High-fidelity computational fluid dynamics is essential for aerospace design, but engineering-scale simulations of practical three-dimensional aircraft remain computationally expensive.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling

arXiv:2605.30376v1 Announce Type: cross Abstract: Modern time series architectures face a fundamental trade-off: channel-independent models scale well with increasing data volume but ignore critical inter-channel dependencies, while channel-dependent models are expressive but remain "dimension-bou…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception

arXiv:2605.30381v1 Announce Type: cross Abstract: Deceptive alignment, in which models maintain accurate internal representations while deliberately producing false outputs, remains a central challenge in AI safety.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Structured interactions improve distributed coordination beyond model scaling in a real-world multi-robot system

arXiv:2605.30383v1 Announce Type: cross Abstract: Scaling individual robot capabilities is common but costly. Here we investigate a system-level design question in real-world multi-robot coordination: given matched hardware budgets, does restructuring communication among robots yield larger gains t…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

LLMs Without Deep Neural Networks: New Architecture, Benefits and Case Study

arXiv:2605.30385v1 Announce Type: cross Abstract: The purpose of this article is to provide validation to my deep neural network alternative in the context of LLMs.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification

arXiv:2605.30387v1 Announce Type: cross Abstract: Functional Magnetic Resonance Imaging (fMRI) provides non-invasive access to dynamic brain activity by measuring blood oxygen level-dependent (BOLD) signals over time.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Social Reasoning in Machines: Investigating Collective Truth-Seeking Dynamics in Large Language Model Debate

arXiv:2605.30391v1 Announce Type: cross Abstract: Human reasoning has long been theorised to operate socially, not through isolated individual cognition, but through collective adversarial discourse, a framework known as the Argumentative Theory of Reasoning (ATR).

💬 No discussion yet

12 hours ago

ArXiv CS.AI

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

arXiv:2605.30393v1 Announce Type: cross Abstract: Public numeric benchmarks appear in pretraining, so an evaluation that conditions on a date may be measuring memorized recall rather than out-of-sample skill.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models

arXiv:2605.30394v1 Announce Type: cross Abstract: This paper introduces Code Bench, a benchmark capable of evaluating the concise code generation abilities of Large Language Models (LLMs) in 60 programming languages.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

AI Loss of Control Incident Management: Response & Resilience

arXiv:2605.30406v1 Announce Type: cross Abstract: Recent research demonstrating AI systems exhibiting deception and shutdown resistance suggests that AI loss of control (LOC) is an urgent policy concern, yet current literature focuses almost exclusively on alignment and prevention.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Exploring Autonomous Agentic Data Engineering for Model Specialization

arXiv:2605.30407v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated strong performance on general tasks, while often struggling to adapt to specialized domains without high-quality domain-specific data.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

arXiv:2605.30409v1 Announce Type: cross Abstract: Real-time streaming video-to-video editing (V2V) is critical for interactive applications such as live broadcasting and gaming, yet it remains a formidable challenge due to the stringent requirements for temporal consistency and inference throughput…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Domain Adaptation and Reasoning Frameworks in Language Models: A Controlled Experiment with Historical Cosmology

arXiv:2605.30415v1 Announce Type: cross Abstract: We investigate how domain adaptation reshapes explanatory behavior in language models using historical cosmology as a controlled setting.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

arXiv:2605.30434v1 Announce Type: cross Abstract: Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long horizons untested.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Calibrated Preference Learning: The Case of Label Ranking

arXiv:2605.30447v1 Announce Type: cross Abstract: Calibration, the alignment of predicted probabilities with true outcome frequencies, is essential for reliable decision-making.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

A Unified Framework for Gradient Aggregation in Multi-Objective Optimization

arXiv:2605.30452v1 Announce Type: cross Abstract: Many machine learning problems involve multiple inherent trade-offs that are best addressed by gradient-based multi-objective optimization (MOO) algorithms.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

The Surface You Test Is Not the Surface That Breaks

arXiv:2605.30454v1 Announce Type: cross Abstract: Tool-augmented LLM agents are vulnerable to prompt injection: a third party who controls part of the agent's context can plant instructions that the agent then executes as if they came from the user.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Scalable Constrained Multi-Agent Reinforcement Learning via State Augmentation and Consensus for Separable Dynamics

arXiv:2605.30461v1 Announce Type: cross Abstract: We present a distributed approach for constrained Multi-Agent Reinforcement Learning (MARL) that combines state-augmented policy learning with distributed consensus over dual variables.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

arXiv:2605.30462v1 Announce Type: cross Abstract: Can a dataset be recognized from the spurious correlations it induces during training? We argue that datasets leave dataset-specific traces in a model's learned semantic correlation structure: incidental regularities that are predictive within a dat…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Graph-Conditioned Mixture of Graph Neural Network Experts for Traffic Forecasting

arXiv:2605.30486v1 Announce Type: cross Abstract: Spatio-temporal forecasting on sensor graphs is commonly tackled with a single backbone architecture applied uniformly across all nodes, although graph regions can exhibit different dynamics.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Improved Distribution Estimation in $\ell_\infty$

arXiv:2605.30509v1 Announce Type: cross Abstract: We present improved bounds for estimating discrete probability distributions under the $\ell_\infty$ norm. These include minimax bounds in expectation and high-probability tail bounds.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

A Novel Global Context-aware Deep Neural Network for Enhanced Brain Tumor Segmentation using Magnetic Resonance Images

arXiv:2605.30510v1 Announce Type: cross Abstract: Brain cancer's severity necessitates precise brain tumor segmentation, which is crucial for effective brain tumor diagnosis. Manual identification, burdened by high costs, labor, and error risks, highlights the need for automated methods.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

arXiv:2605.30523v1 Announce Type: cross Abstract: Recent work describes what transformers can and cannot compute through connections to boolean circuits, but existing results lack exact characterizations and are sensitive to modeling choices.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages

arXiv:2605.30529v1 Announce Type: cross Abstract: Sentence-embedding models for semantic search are overwhelmingly developed and evaluated on English corpora.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

arXiv:2605.30557v1 Announce Type: cross Abstract: Spatial reasoning is a fundamental capability for vision-language models (VLMs) deployed in real-world environments.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

VLM3: Vision Language Models Are Native 3D Learners

arXiv:2605.30561v1 Announce Type: cross Abstract: Vision Language Models (VLMs) enable a unified model to solve various vision tasks through prompting. They have shown promising performance in semantic understanding.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

arXiv:2605.30571v1 Announce Type: cross Abstract: Physical AI systems, including robots, autonomous vehicles, embodied agents and edge copilots, often run a different inference workload from cloud LLM serving: single-stream, batch-1 autoregressive decode, where one robot, camera feed or user sessio…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Prior Availability in Industrial Visual Sim-to-Real: A Review of CAD-Guided and CAD-Unavailable Regimes

arXiv:2605.30581v1 Announce Type: cross Abstract: Industrial visual sim-to-real is often described as transferring from synthetic images to real images, but industrial deployment usually involves a broader mismatch between available evidence and required decisions.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Benchmarking Machine Learning Uncertainty Quantification Methodologies for Predicting Turbine Gas Temperature Degradation

arXiv:2605.30585v1 Announce Type: cross Abstract: Effective prognostics and health management of modern engines relies on accurate turbine gas temperature predictions and robust uncertainty quantification to ensure reliability and safety.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

ImmigrationQA: A Source-Grounded Dataset and Small-Model Adaptation for U.S. Immigration Law

arXiv:2605.30589v1 Announce Type: cross Abstract: U.S. immigration law spans thousands of pages of official policy, federal regulations, and procedural guidance that change frequently and carry high stakes for petitioners who lack legal representation.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

arXiv:2605.30590v1 Announce Type: cross Abstract: Two clinical AI systems can score nearly identically on coverage-based rubrics yet behave radically differently when their patient inputs change: one updates its recommendations to match the new clinical signal, while the other produces the same out…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Scientific Machine Learning for Engine Health Management and Remaining Useful Life Prediction

arXiv:2605.30593v1 Announce Type: cross Abstract: Engine Health Management (EHM) depends on reliable forecasting of Remaining Useful Life (RUL) and on tracking thermal indicators such as turbine gas temperature (TGT).

💬 No discussion yet

12 hours ago

ArXiv CS.AI

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

arXiv:2605.30604v1 Announce Type: cross Abstract: Regulated cybersecurity workflows lack a runtime substrate that enforces organization-level scope across retrieval, tool calls, memory, findings, reports, and audit while remaining model-agnostic and locally deployable.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

arXiv:2605.30611v1 Announce Type: cross Abstract: Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper preparation.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Reward Learning from Best-of-$N$ Preference Data: Targets, Tradeoffs, and Design Principles

arXiv:2605.30619v1 Announce Type: cross Abstract: Best-of-$N$ sampling is widely used to construct pairwise preference data: $N$ candidates are drawn from a base distribution, and the best is paired with a rejected response.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Active Timepoint Selection for Learning Measure-Valued Trajectories

arXiv:2605.30625v1 Announce Type: cross Abstract: Inferring continuous probability paths from sparse snapshots is a fundamental challenge in domains like single-cell biology, where high-fidelity data acquisition is often destructive and constrained by prohibitive sequencing costs.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

The Architecture of Errors: From Universal Impossibility to Patch-Local LLM Reliability

arXiv:2605.30628v1 Announce Type: cross Abstract: Universal LLM reliability is not a finite-library problem: across all possible tasks, tools, schemas, knowledge sources, and evaluator expectations, new intervention-distinguishable failure modes can appear without bound, so no finite intervention d…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Controllable Lung Nodule Synthesis via Histogram-Regularized Latent Diffusion Models

arXiv:2605.30631v1 Announce Type: cross Abstract: While automated diagnosis systems have achieved remarkable success in computed tomography (CT)-based lung cancer screening, their development remains limited by the scarcity of diverse, annotated pulmonary nodule datasets.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Rationalize: Shared Semantic Reasoning for Human-AI Alignment

arXiv:2605.30632v1 Announce Type: cross Abstract: We introduce Rationalize, a role-pair framework for shared semantic reasoning between humans and AI models in data-driven sensemaking.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Score Broadcast and Decorrelation: A General Framework for Broadcast-Based Credit Assignment

arXiv:2605.30638v1 Announce Type: cross Abstract: We introduce Score Broadcast and Decorrelation (SBD), a principled framework for broadcast-based credit assignment for general families of differentiable losses.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

PInVerify: An Offline Embodied Benchmark for Active Instance Verification

arXiv:2605.30639v1 Announce Type: cross Abstract: Embodied agents have made strong progress in navigating to target objects, but reaching the goal vicinity does not guarantee that the agent has found the correct instance: subtle attribute differences (e.g., "white floral" vs.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models

arXiv:2605.30641v1 Announce Type: cross Abstract: Large language models (LLMs) can reveal and amplify societal biases during chain-of-thought (CoT) generation.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Same Patient, Different Words, Different Diagnosis? Evaluating Semantic Stability in Clinical LLMs

arXiv:2605.30646v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used in clinical applications. However, their behavior remains highly sensitive to subtle linguistic variations, such as rephrasing or syntactic variation.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning Distillation

arXiv:2605.30651v1 Announce Type: cross Abstract: We study trajectory selection for reasoning distillation, where teacher-generated reasoning trajectories are selectively used as supervision for a student model.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

EUDAIMONIA: Evaluating Undesirable Dynamics in AI

arXiv:2605.30654v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as conversational partners for companionship, emotional disclosure, and interpersonal advice, but the social dynamics of these interactions can create harms that are not captured by capability-orien…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Automatically Attacking Software Reverse Engineering AI Agents

arXiv:2605.30667v1 Announce Type: cross Abstract: Software tools for reverse engineering executable binary files, such as Ghidra, enable malware analysts to safely conduct robust static analysis without having access to original source code.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

CobSeg: Coherence Boundary Modeling for Dialogue Topic Segmentation

arXiv:2605.30668v1 Announce Type: cross Abstract: Dialogue topic segmentation is critical in many human-AI collaborative applications, and it requires identifying heterogeneous boundary cues, including lexical transitions near utterance edges and semantic discontinuities across utterances.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Human-Alignment, Calibration, and Activation Patterns in Large Language Model Uncertainty

arXiv:2605.30675v1 Announce Type: cross Abstract: Uncertainty Quantification is a large and growing subfield of large language model behavioral analysis.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Investigating Detection and Obfuscation of Prompt Injection Attacks Against Software Reverse Engineering AI Agents

arXiv:2605.30677v1 Announce Type: cross Abstract: Agentic software reverse engineering systems are vulnerable to prompt injection attacks placed into the source code of executable binary files.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

How Early Adopters Used Generative AI Worldwide: Variation by Country Income and Language

arXiv:2605.30685v1 Announce Type: cross Abstract: AI is being used by people globally, but not everyone is using it in the same ways. Using a large-scale dataset of anonymized, de-identified, and privacy-scrubbed interactions with a widely available and free AI chatbot, we empirically characterize …

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

arXiv:2605.30686v1 Announce Type: cross Abstract: ReAct agents that interleave chain-of-thought reasoning with tool calls are increasingly deployed for real tasks such as scheduling, file retrieval, and data access.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

ConTrans: Learning Text-enhanced Local-global Temporal Representations for Zero-shot Temporal Action Localization

arXiv:2605.30689v1 Announce Type: cross Abstract: Zero-shot Temporal Action Localization (ZS-TAL) aims to detect and locate previously unseen actions in untrimmed videos.

💬 No discussion yet

12 hours ago

ArXiv CS.AI

Seeing Before Agreeing: Aligning Multi-Agent Consensus with Visual Evidence

arXiv:2605.30698v1 Announce Type: cross Abstract: Vision-language models (VLMs) have achieved strong performance on visual question answering (VQA). To mitigate individual hallucinations and blind spots, aggregating diverse perspectives via multi-agent collaboration has emerged as a promising parad…

💬 No discussion yet

12 hours ago

ArXiv CS.AI

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs

arXiv:2605.30711v1 Announce Type: cross Abstract: Agentic LLMs must continuously decide whether newly extracted facts should be added, merged with existing memories, or ignored, yet prior work has focused more on retrieval and storage than on principled write-side control.

💬 No discussion yet

12 hours ago