Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge Paper • 2605.08518 • Published 15 days ago • 10
MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments Paper • 2605.09131 • Published 14 days ago • 55
DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules Paper • 2605.08614 • Published 14 days ago • 7
Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs Paper • 2603.16932 • Published Mar 14 • 89
Article IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST ibm-research • Feb 18 • 19
Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality ibm-research • Jan 21 • 33
A Large Encoder-Decoder Family of Foundation Models For Chemical Language Paper • 2407.20267 • Published Jul 24, 2024 • 32
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance Paper • 2506.03828 • Published Jun 4, 2025 • 20
Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series Paper • 2401.03955 • Published Jan 8, 2024 • 13
FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes Paper • 2506.03278 • Published Jun 3, 2025 • 7