TarDiff: Target-Oriented Diffusion Guidance for Synthetic Electronic Health Record Time Series Generation

Source

Status And Credibility

TarDiff was posted to arXiv on 2025-04-24. The ACM Digital Library page lists it as a KDD 2025 paper, and the official code is in the Microsoft TimeCraft repository.

Core Claim

TarDiff argues that synthetic EHR time series should be optimized for downstream task utility, not only for distributional fidelity. It estimates how each generated sample would change a downstream model's loss on a task-specific guidance set (the sample's influence) and injects the gradient of that influence into diffusion sampling.
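
To make the influence idea concrete, here is a minimal sketch that assumes a toy logistic-regression downstream model on flattened feature vectors. It uses the classic influence-function form with the inverse Hessian replaced by the identity, a common first-order simplification. TarDiff's actual estimator and downstream model may differ, and the function names (`bce_grad`, `first_order_influence`) are hypothetical, not taken from the paper's code.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (flattened feature vector, binary label); assumed layout


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_grad(w: Sequence[float], x: Sequence[float], y: int) -> List[float]:
    """Gradient of binary cross-entropy w.r.t. the weights w for one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def first_order_influence(
    w: Sequence[float], synthetic: Example, guidance: Sequence[Example]
) -> float:
    """Approximate change in mean guidance-set loss from upweighting one synthetic sample.

    Computes -grad(guidance loss) . grad(sample loss), i.e. the influence
    function with the inverse Hessian replaced by the identity. Negative values
    mean the sample is expected to lower the guidance loss (a helpful sample).
    """
    g_syn = bce_grad(w, *synthetic)
    g_guide = [0.0] * len(w)
    for x, y in guidance:
        g = bce_grad(w, x, y)
        # Accumulate the mean gradient over the guidance set.
        g_guide = [a + b / len(guidance) for a, b in zip(g_guide, g)]
    return -sum(a * b for a, b in zip(g_guide, g_syn))
```

With `w = [0.0, 0.0]` and guidance set `[([1.0, 1.0], 1), ([1.0, -1.0], 0)]`, the sample `([1.0, 2.0], 1)` scores -0.5 (helpful), while the same features with label 0 score 0.5 (harmful).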

Key Contributions

  • Frames target-oriented synthetic EHR generation as reducing a task-specific downstream loss.
  • Uses influence functions to estimate how synthetic samples would affect downstream performance.
  • Adds influence guidance to the reverse diffusion process.
  • Evaluates on six EHR or physiological-signal datasets, including MIMIC-III and eICU.
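
One plausible way to add influence guidance to the reverse diffusion process, modeled on classifier guidance, is to shift the DDPM posterior mean against the gradient of the influence score before adding noise. The sketch below assumes a standard DDPM schedule with variance sigma_t^2 = beta_t and an influence gradient computed elsewhere; the function name and the variance-scaled shift are hypothetical, and TarDiff's exact update may differ.

```python
import math
import random
from typing import List, Optional, Sequence


def guided_reverse_step(
    x_t: Sequence[float],
    eps_pred: Sequence[float],
    beta_t: float,
    alpha_bar_t: float,
    influence_grad: Sequence[float],
    scale: float,
    noise: Optional[Sequence[float]] = None,
) -> List[float]:
    """One DDPM ancestral step x_t -> x_{t-1} with a hypothetical influence-guidance shift.

    eps_pred:       the denoiser's noise prediction eps_theta(x_t, t).
    influence_grad: gradient of the influence score w.r.t. x_t; lower influence
                    means a more helpful sample, so the mean moves against it.
    scale:          guidance strength s (0 recovers plain DDPM sampling).
    """
    alpha_t = 1.0 - beta_t
    coef = beta_t / math.sqrt(1.0 - alpha_bar_t)
    # Standard DDPM mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps_pred)]
    sigma2 = beta_t  # one common variance choice: sigma_t^2 = beta_t
    # Classifier-guidance-style shift of the mean, scaled by the step variance.
    mean = [m - scale * sigma2 * g for m, g in zip(mean, influence_grad)]
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x_t]
    sigma = math.sqrt(sigma2)
    return [m + sigma * z for m, z in zip(mean, noise)]
```

Setting `scale=0` recovers an unguided DDPM step; larger values push samples harder toward lower guidance loss, at some risk to fidelity, so the strength would typically need tuning.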

Evidence And Results

The paper reports both train-on-synthetic, test-on-real and train-on-synthetic-and-real, test-on-real evaluations. Across clinical prediction tasks, it reports improvements of up to 20.4% in AUPRC and 18.4% in AUROC over baselines, and it analyzes minority-class behavior under class imbalance.
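
The two evaluation protocols can be sketched as follows. This is a minimal pure-Python illustration that assumes binary labels and a logistic-regression downstream model; the paper's actual classifiers and feature pipeline are not described in this note. The hypothetical `tstr` trains on synthetic data only, `tsrtr` trains on synthetic plus real data, and both score a held-out real test set with AUROC and AUPRC (computed here as average precision).

```python
import math
from typing import List, Sequence, Tuple

Dataset = List[Tuple[List[float], int]]  # (features, binary label); assumed layout


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(data: Dataset, lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Plain SGD logistic regression; a stand-in for the downstream clinical model."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision: mean of precision@k over the ranks k of the positives."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / sum(labels)


def evaluate(train: Dataset, real_test: Dataset) -> Tuple[float, float]:
    """Fit on `train`, then return (AUROC, AUPRC) on the real test set."""
    w = train_logreg(train)
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x, _ in real_test]
    labels = [y for _, y in real_test]
    return auroc(scores, labels), auprc(scores, labels)


def tstr(synthetic: Dataset, real_test: Dataset) -> Tuple[float, float]:
    """Train on synthetic, test on real."""
    return evaluate(synthetic, real_test)


def tsrtr(synthetic: Dataset, real_train: Dataset, real_test: Dataset) -> Tuple[float, float]:
    """Train on synthetic plus real, test on real."""
    return evaluate(synthetic + real_train, real_test)
```

AUPRC is often the more informative of the two metrics under class imbalance, because it depends on how highly the rare positives are ranked rather than on the many easy negatives.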

Limitations

  • Utility guidance depends on the downstream model and the guidance set; a poorly chosen guidance set, or one that leaks test data, can steer the generator toward benchmark artifacts.
  • The guidance set must match the downstream task distribution, which is a strong operational assumption.
  • The paper improves supervised clinical prediction utility, not causal treatment-response or intervention planning.
  • EHR data are logged under clinicians' treatment decisions and are therefore confounded; generated observations should not be interpreted as valid counterfactual patient trajectories without explicit causal assumptions.
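
One basic leakage control for the guidance set is a patient-level split, so that no patient contributes stays to both the guidance set and the real test set. The hash-based scheme, the `(patient_id, stay_id)` record layout, the split fractions, and the function names below are all hypothetical; this note does not describe how the paper constructs its splits.

```python
import hashlib
from typing import Dict, List, Set, Tuple

Stay = Tuple[str, str]  # (patient_id, stay_id); hypothetical record layout


def assign_split(patient_id: str, guidance_frac: float = 0.15, test_frac: float = 0.15) -> str:
    """Deterministically map a patient to 'test', 'guidance', or 'train'.

    Hashing the patient ID (not the stay ID) keeps every stay of a patient in
    one split, so the guidance set never contains test patients.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    u = (int(digest, 16) % 10_000) / 10_000.0  # pseudo-uniform value in [0, 1)
    if u < test_frac:
        return "test"
    if u < test_frac + guidance_frac:
        return "guidance"
    return "train"


def split_stays(stays: List[Stay]) -> Dict[str, List[str]]:
    """Group stay IDs by the split assigned to their patient."""
    splits: Dict[str, List[str]] = {"train": [], "guidance": [], "test": []}
    for patient_id, stay_id in stays:
        splits[assign_split(patient_id)].append(stay_id)
    return splits


def leaked_patients(stays: List[Stay], splits: Dict[str, List[str]]) -> Set[str]:
    """Patients with stays in both the guidance set and the test set."""
    owner = {stay_id: patient_id for patient_id, stay_id in stays}
    guidance = {owner[s] for s in splits["guidance"]}
    test = {owner[s] for s in splits["test"]}
    return guidance & test
```

A stay-level random split can place two stays of the same patient on opposite sides of the guidance/test boundary, which `leaked_patients` would flag.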

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Synthetic data and rare-regime coverage | partially closes | Influence-guided generation targets clinical utility and class imbalance rather than average fidelity alone. | Needs strict leakage controls and deployment-grade privacy audits. |
| Time-series generation and editing | partially closes | Diffusion samples are steered by task-specific gradients. | No edit interface and no action-conditioned patient-state rollout. |
| Control and counterfactuals | insufficient evidence | Clinical tasks involve outcomes, but treatments/interventions are not modeled as controllable inputs. | Needs treatment/action history, confounding controls, and counterfactual evaluation. |