A Contextual-Bandit Approach to Personalized News Article Recommendation

Source

Core Claim

The Yahoo! Front Page news recommendation work uses randomized logged traffic to evaluate contextual-bandit policies offline, with candidate articles as the actions.
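The idea of evaluating a policy on randomized logs can be illustrated with a replay-style estimator. This is a minimal sketch, not the paper's code: it assumes actions were logged uniformly at random, it evaluates a fixed (non-learning) policy, and the record layout, function names, and synthetic click rates are all hypothetical.

```python
import random
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical record layout: (context, logged_action, reward).
LoggedEvent = Tuple[Sequence[float], int, float]


def replay_evaluate(
    events: Iterable[LoggedEvent],
    policy: Callable[[Sequence[float]], int],
) -> Tuple[float, int]:
    """Estimate a policy's average reward from uniformly randomized logs.

    Only events where the policy picks the logged action are kept; the
    others are skipped because the reward of the policy's choice was
    never observed.
    """
    total_reward = 0.0
    matched = 0
    for context, logged_action, reward in events:
        if policy(context) == logged_action:
            total_reward += reward
            matched += 1
    estimate = total_reward / matched if matched else float("nan")
    return estimate, matched


def make_synthetic_log(n: int, n_actions: int, seed: int = 0) -> List[LoggedEvent]:
    """Toy log (assumption): uniform-random actions, action 1 clicks more often."""
    rng = random.Random(seed)
    log: List[LoggedEvent] = []
    for _ in range(n):
        context = [rng.random()]
        action = rng.randrange(n_actions)
        click_prob = 0.10 if action == 1 else 0.02
        reward = 1.0 if rng.random() < click_prob else 0.0
        log.append((context, action, reward))
    return log


log = make_synthetic_log(n=20000, n_actions=4)
ctr_always_1, used_1 = replay_evaluate(log, lambda ctx: 1)
ctr_always_0, used_0 = replay_evaluate(log, lambda ctx: 0)
```

With K uniformly logged actions, roughly 1/K of the events match any given policy, so the usable sample is much smaller than the raw log. A learning policy would additionally be updated on each matched event.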

Action-Time-Series Notes

  • The sequence is a temporal log of recommendation decisions, contexts, actions, and click rewards.
  • It is valuable for action-response modeling but often lacks a rich next-state observation.
  • It belongs in a weak-time-series / bandit category for world-model comparison.
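The notes above can be made concrete with a small sketch of one log record and a per-action click-rate series derived from it. The field names, the bucket size, and the toy events are assumptions for illustration, not the dataset's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BanditEvent:
    """One logged decision. Field names are hypothetical."""

    timestamp: int              # e.g. seconds since the start of the log
    context: Tuple[float, ...]  # features available at decision time
    action: int                 # id of the article that was shown
    reward: int                 # 1 if clicked, else 0
    # Deliberately no next_state field: the log records only the one-step
    # click outcome, not a rich observation of what happened afterwards.


def action_response_series(
    events: Iterable[BanditEvent], bucket_seconds: int
) -> Dict[Tuple[int, int], float]:
    """Click-through rate keyed by (time bucket, action)."""
    clicks: Dict[Tuple[int, int], int] = defaultdict(int)
    shows: Dict[Tuple[int, int], int] = defaultdict(int)
    for e in events:
        key = (e.timestamp // bucket_seconds, e.action)
        shows[key] += 1
        clicks[key] += e.reward
    return {key: clicks[key] / shows[key] for key in shows}


events = [
    BanditEvent(0, (0.2,), action=7, reward=1),
    BanditEvent(30, (0.9,), action=7, reward=0),
    BanditEvent(45, (0.4,), action=3, reward=0),
    BanditEvent(70, (0.1,), action=7, reward=1),
]
series = action_response_series(events, bucket_seconds=60)
```

The resulting series tracks how each action's response changes over time, which is the action-response signal; nothing in the record supports predicting a next state.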

Foundation TSFM Relevance

| Agenda slot | Verdict | Evidence | Missing pieces |
| --- | --- | --- | --- |
| Causal structure, counterfactuals, and control | partially closes | The paper formalizes sequential contextual-bandit actions, clicked rewards, and unbiased offline policy evaluation from randomized traffic. | Feedback is one-step reward only; unchosen-arm outcomes and next-state trajectories are absent. |
| Dynamic compute allocation | adjacent | LinUCB updates fixed-size matrices incrementally and is designed for fast online personalization under large traffic. | This is algorithmic efficiency, not adaptive neural inference compute. |
| Benchmark level | warning | The 36M-event Yahoo random bucket is strong for off-policy bandit evaluation. | It does not test latent-state maintenance, multivariate dynamics, or action-conditioned rollouts. |
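The incremental, fixed-size updates noted under "Dynamic compute allocation" can be sketched as a per-arm linear UCB model. This is a sketch under stated assumptions rather than the paper's implementation: it keeps the inverse matrix directly and refreshes it with a Sherman-Morrison rank-one update (a substitution made here to avoid repeated inversion), and the class name, helper names, and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


class LinUCBArm:
    """Per-arm state for a disjoint linear UCB model with d features."""

    def __init__(self, d: int) -> None:
        # A starts as the d x d identity, so its inverse is also the identity.
        self.A_inv: List[List[float]] = [
            [1.0 if i == j else 0.0 for j in range(d)] for i in range(d)
        ]
        self.b: List[float] = [0.0] * d

    def _A_inv_times(self, v: Sequence[float]) -> List[float]:
        return [sum(a * vj for a, vj in zip(row, v)) for row in self.A_inv]

    def ucb(self, x: Sequence[float], alpha: float) -> float:
        """Predicted reward plus alpha times the confidence width."""
        theta = self._A_inv_times(self.b)  # ridge estimate A^-1 b
        mean = sum(t * xi for t, xi in zip(theta, x))
        A_inv_x = self._A_inv_times(x)
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, A_inv_x)), 0.0))
        return mean + alpha * width

    def update(self, x: Sequence[float], reward: float) -> None:
        """Apply A <- A + x x^T and b <- b + reward * x in O(d^2)."""
        A_inv_x = self._A_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        d = len(x)
        # Sherman-Morrison; valid because A (and so A_inv) is symmetric.
        for i in range(d):
            for j in range(d):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        for i in range(d):
            self.b[i] += reward * x[i]


def choose(arms: Dict[int, LinUCBArm], x: Sequence[float], alpha: float) -> int:
    """Pick the arm with the highest upper confidence bound."""
    return max(arms, key=lambda a: arms[a].ucb(x, alpha))


# Toy usage (assumption): arm 1 always pays off, arm 0 never does.
arms = {a: LinUCBArm(d=2) for a in (0, 1)}
x = [1.0, 0.5]
for _ in range(50):
    arms[0].update(x, reward=0.0)
    arms[1].update(x, reward=1.0)
best = choose(arms, x, alpha=0.1)
```

Memory per arm stays at one d x d matrix and one length-d vector regardless of how many events have been seen, which is the sense in which the cost is fixed and unrelated to adaptive neural inference compute.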