OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data
Abstract
OpenTSLM integrates time series into pretrained LLMs using soft prompting and cross-attention, outperforming text-only models on clinical reasoning tasks.
LLMs have emerged as powerful tools for interpreting multimodal data. In medicine, they hold particular promise for synthesizing large volumes of clinical information into actionable insights and digital health applications. Yet a major limitation remains their inability to handle time series. To close this gap, we present OpenTSLM, a family of Time Series Language Models (TSLMs) created by integrating time series as a native modality into pretrained LLMs, enabling reasoning over multiple time series of any length. We investigate two architectures for OpenTSLM. The first, OpenTSLM-SoftPrompt, models time series implicitly by concatenating learnable time series tokens with text tokens via soft prompting. Although this approach is parameter-efficient, we hypothesize that explicit time series modeling scales better and outperforms implicit approaches. We thus introduce OpenTSLM-Flamingo, which integrates time series with text via cross-attention. We benchmark both variants against baselines that treat time series as text tokens or plots, across a suite of text-time-series Chain-of-Thought (CoT) reasoning tasks. We introduce three datasets: HAR-CoT, Sleep-CoT, and ECG-QA-CoT. Across all three, OpenTSLM models outperform the baselines, reaching F1 scores of 69.9 in sleep staging and 65.4 in HAR, compared to 9.05 and 52.2 for fine-tuned text-only models. Notably, even 1B-parameter OpenTSLM models surpass GPT-4o, which scores 15.47 and 2.95, respectively. OpenTSLM-Flamingo matches OpenTSLM-SoftPrompt in performance and outperforms it on longer sequences, while maintaining stable memory requirements. By contrast, SoftPrompt's memory footprint grows exponentially with sequence length, requiring around 110 GB of VRAM compared to 40 GB when training on ECG-QA with LLaMA-3B. Expert reviews by clinicians find that OpenTSLM models exhibit strong reasoning capabilities on ECG-QA. To facilitate further research, we release all code, datasets, and models open-source.
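To make the difference between the two fusion strategies concrete, the sketch below contrasts implicit soft-prompt fusion (time-series tokens concatenated with text embeddings) with explicit Flamingo-style cross-attention (text hidden states attending to time-series features). This is a minimal, hypothetical PyTorch illustration, not the OpenTSLM implementation: the GRU encoder, module names, and token counts are assumptions made for exposition only.

```python
# Illustrative sketch (not the OpenTSLM code) of the two fusion strategies
# described in the abstract, assuming a decoder-only LLM with hidden size d_model.
import torch
import torch.nn as nn


class SoftPromptFusion(nn.Module):
    """Implicit fusion: encode a raw series into a few time-series tokens and
    prepend them to the text token embeddings (soft prompting)."""

    def __init__(self, d_model: int, n_ts_tokens: int = 8):
        super().__init__()
        self.encoder = nn.GRU(input_size=1, hidden_size=d_model, batch_first=True)
        self.to_tokens = nn.Linear(d_model, n_ts_tokens * d_model)
        self.n_ts_tokens, self.d_model = n_ts_tokens, d_model

    def forward(self, series: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # series: (batch, length, 1); text_emb: (batch, n_text, d_model)
        _, h = self.encoder(series)                            # (1, batch, d_model)
        ts_tokens = self.to_tokens(h[-1])                      # (batch, n_ts_tokens * d_model)
        ts_tokens = ts_tokens.view(-1, self.n_ts_tokens, self.d_model)
        # The LLM self-attends over the concatenation, so its context length
        # (and attention memory) grows with the number and length of series.
        return torch.cat([ts_tokens, text_emb], dim=1)


class CrossAttentionFusion(nn.Module):
    """Explicit (Flamingo-style) fusion: text hidden states cross-attend to
    time-series features, leaving the LLM's text context length unchanged."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.ts_proj = nn.Linear(1, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, series: torch.Tensor, text_hidden: torch.Tensor) -> torch.Tensor:
        # series: (batch, length, 1); text_hidden: (batch, n_text, d_model)
        ts_feats = self.ts_proj(series)                        # (batch, length, d_model)
        attended, _ = self.cross_attn(query=text_hidden, key=ts_feats, value=ts_feats)
        return self.norm(text_hidden + attended)               # residual connection


if __name__ == "__main__":
    d_model, batch, length, n_text = 256, 2, 500, 32
    series = torch.randn(batch, length, 1)
    text = torch.randn(batch, n_text, d_model)
    print(SoftPromptFusion(d_model)(series, text).shape)       # (2, 8 + 32, 256)
    print(CrossAttentionFusion(d_model)(series, text).shape)   # (2, 32, 256)
```

The sketch also makes the memory argument visible: the soft-prompt variant lengthens the sequence the LLM must self-attend over, whereas the cross-attention variant keeps the text sequence length fixed and only attends from text to time-series features, which is consistent with the stable memory behavior reported for OpenTSLM-Flamingo.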
Community
💡 Highlights
- **OpenTSLMs** can caption and reason about multiple raw time series of varying length.
- **Preserved LLM capabilities**: add additional context, background, or follow-up questions.
- **State-of-the-art reasoning results**: demonstrated across diverse clinical reasoning tasks, including ECG question answering, with rationales verified by medical doctors.
- **Open-source**: code, datasets, and pretrained models are available (MIT license).
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Augmenting LLMs for General Time Series Understanding and Prediction (2025)
- DiagECG: An LLM-Driven Framework for Diagnostic Reasoning via Discretized ECG Tokenization (2025)
- BEDTime: A Unified Benchmark for Automatically Describing Time Series (2025)
- Integrating Text and Time-Series into (Large) Language Models to Predict Medical Outcomes (2025)
- CaTS-Bench: Can Language Models Describe Numeric Time Series? (2025)
- BALM-TSF: Balanced Multimodal Alignment for LLM-Based Time Series Forecasting (2025)
- UniCast: A Unified Multimodal Prompting Framework for Time Series Forecasting (2025)