OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data
Abstract
OpenTSLM integrates time series into pretrained LLMs using soft prompting and cross-attention, outperforming text-only models on clinical reasoning tasks.
LLMs have emerged as powerful tools for interpreting multimodal data. In medicine, they hold particular promise for synthesizing large volumes of clinical information into actionable insights and digital health applications. Yet a major limitation remains their inability to handle time series. To close this gap, we present OpenTSLM, a family of Time Series Language Models (TSLMs) created by integrating time series as a native modality into pretrained LLMs, enabling reasoning over multiple time series of any length. We investigate two architectures for OpenTSLM. The first, OpenTSLM-SoftPrompt, models time series implicitly by concatenating learnable time series tokens with text tokens via soft prompting. Although this approach is parameter-efficient, we hypothesize that explicit time series modeling scales better and outperforms implicit approaches. We thus introduce OpenTSLM-Flamingo, which integrates time series with text via cross-attention. We benchmark both variants against baselines that treat time series as text tokens or plots, across a suite of text-time-series Chain-of-Thought (CoT) reasoning tasks. We introduce three datasets: HAR-CoT, Sleep-CoT, and ECG-QA-CoT. Across all three, OpenTSLM models outperform the baselines, reaching F1 scores of 69.9 in sleep staging and 65.4 in HAR, compared to 9.05 and 52.2 for fine-tuned text-only models. Notably, even 1B-parameter OpenTSLM models surpass GPT-4o, which scores 15.47 and 2.95, respectively. OpenTSLM-Flamingo matches OpenTSLM-SoftPrompt in performance and outperforms it on longer sequences, while maintaining stable memory requirements. By contrast, SoftPrompt's memory footprint grows exponentially with sequence length, requiring around 110 GB of VRAM compared to 40 GB when training on ECG-QA with LLaMA-3B. Expert reviews by clinicians find that OpenTSLM models exhibit strong reasoning capabilities on ECG-QA. To facilitate further research, we release all code, datasets, and models open-source.
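To make the difference between the two fusion strategies concrete, the sketch below contrasts implicit soft-prompt fusion (time-series tokens concatenated with text embeddings) with explicit Flamingo-style cross-attention (text hidden states attending to time-series features). This is a minimal, hypothetical PyTorch illustration, not the OpenTSLM implementation: the GRU encoder, module names, and token counts are assumptions made for exposition only.

```python
# Illustrative sketch (not the OpenTSLM code) of the two fusion strategies
# described in the abstract, assuming a decoder-only LLM with hidden size d_model.
import torch
import torch.nn as nn


class SoftPromptFusion(nn.Module):
    """Implicit fusion: encode a raw series into a few time-series tokens and
    prepend them to the text token embeddings (soft prompting)."""

    def __init__(self, d_model: int, n_ts_tokens: int = 8):
        super().__init__()
        self.encoder = nn.GRU(input_size=1, hidden_size=d_model, batch_first=True)
        self.to_tokens = nn.Linear(d_model, n_ts_tokens * d_model)
        self.n_ts_tokens, self.d_model = n_ts_tokens, d_model

    def forward(self, series: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # series: (batch, length, 1); text_emb: (batch, n_text, d_model)
        _, h = self.encoder(series)                            # (1, batch, d_model)
        ts_tokens = self.to_tokens(h[-1])                      # (batch, n_ts_tokens * d_model)
        ts_tokens = ts_tokens.view(-1, self.n_ts_tokens, self.d_model)
        # The LLM self-attends over the concatenation, so its context length
        # (and attention memory) grows with the number and length of series.
        return torch.cat([ts_tokens, text_emb], dim=1)


class CrossAttentionFusion(nn.Module):
    """Explicit (Flamingo-style) fusion: text hidden states cross-attend to
    time-series features, leaving the LLM's text context length unchanged."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.ts_proj = nn.Linear(1, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, series: torch.Tensor, text_hidden: torch.Tensor) -> torch.Tensor:
        # series: (batch, length, 1); text_hidden: (batch, n_text, d_model)
        ts_feats = self.ts_proj(series)                        # (batch, length, d_model)
        attended, _ = self.cross_attn(query=text_hidden, key=ts_feats, value=ts_feats)
        return self.norm(text_hidden + attended)               # residual connection


if __name__ == "__main__":
    d_model, batch, length, n_text = 256, 2, 500, 32
    series = torch.randn(batch, length, 1)
    text = torch.randn(batch, n_text, d_model)
    print(SoftPromptFusion(d_model)(series, text).shape)       # (2, 8 + 32, 256)
    print(CrossAttentionFusion(d_model)(series, text).shape)   # (2, 32, 256)
```

The sketch also makes the memory argument visible: the soft-prompt variant lengthens the sequence the LLM must self-attend over, whereas the cross-attention variant keeps the text sequence length fixed and only attends from text to time-series features, which is consistent with the stable memory behavior reported for OpenTSLM-Flamingo.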
Community
💡 Highlights
- **OpenTSLMs** can caption and reason about multiple raw time series of varying length.
- **Preserved LLM capabilities**: add additional context, background, or follow-up questions.
- **State-of-the-art reasoning results**: demonstrated across diverse clinical reasoning tasks, including ECG question answering, with rationales verified by medical doctors.
- **Open-source**: code, datasets, and pretrained models are available (MIT license).
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Augmenting LLMs for General Time Series Understanding and Prediction (2025)
- DiagECG: An LLM-Driven Framework for Diagnostic Reasoning via Discretized ECG Tokenization (2025)
- BEDTime: A Unified Benchmark for Automatically Describing Time Series (2025)
- Integrating Text and Time-Series into (Large) Language Models to Predict Medical Outcomes (2025)
- CaTS-Bench: Can Language Models Describe Numeric Time Series? (2025)
- BALM-TSF: Balanced Multimodal Alignment for LLM-Based Time Series Forecasting (2025)
- UniCast: A Unified Multimodal Prompting Framework for Time Series Forecasting (2025)