A Contextual-Bandit Approach to Personalized News Article Recommendation
Source
- Raw Markdown: paper_yahoo-contextual-bandit-2010.md
- PDF: paper_yahoo-contextual-bandit-2010.pdf
Core Claim
The paper treats Yahoo! Front Page news recommendation as a contextual-bandit problem over article actions and evaluates candidate policies offline on logged traffic in which articles were served uniformly at random.
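
The offline evaluation idea can be illustrated with a small replay loop. This is a minimal sketch, assuming the log was collected under uniformly random article selection and that each event is a `(context, logged_action, reward)` tuple. The names `replay_evaluate`, `segment_policy`, and `toy_log` are hypothetical, and the toy data is made up for illustration rather than taken from the paper.

```python
def replay_evaluate(events, choose, update=None):
    """Estimate a policy's per-trial reward from uniformly random logged traffic.

    events: iterable of (context, logged_action, reward) tuples.
    choose: callable mapping a context to the action the policy would take.
    update: optional callable(context, action, reward) that lets a learning
            policy train on the events it is allowed to see.
    """
    total_reward = 0.0
    matched = 0
    for context, logged_action, reward in events:
        # Keep the event only when the policy agrees with the logged action;
        # otherwise the reward for the policy's choice was never observed.
        if choose(context) != logged_action:
            continue
        if update is not None:
            update(context, logged_action, reward)
        total_reward += reward
        matched += 1
    estimate = total_reward / matched if matched else float("nan")
    return estimate, matched


# Toy log (made-up data): contexts are coarse user segments.
toy_log = [
    ("sports_fan", "article_a", 1),
    ("sports_fan", "article_b", 0),
    ("news_reader", "article_a", 0),
    ("news_reader", "article_b", 1),
    ("sports_fan", "article_a", 0),
]


def segment_policy(context):
    # Hypothetical fixed policy: one article per segment.
    return "article_a" if context == "sports_fan" else "article_b"


ctr_estimate, n_matched = replay_evaluate(toy_log, segment_policy)
print(f"matched events: {n_matched}, estimated CTR: {ctr_estimate:.3f}")
```

Only the events where the policy's choice matches the logged action contribute to the estimate, which is why a large randomized log is needed: with K articles, roughly 1/K of the events are retained.
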
Action-Time-Series Notes
- The sequence is a temporal log of recommendation decisions, contexts, actions, and click rewards.
- It is valuable for action-response modeling but often lacks a rich next-state observation.
- It belongs in a weak-time-series / bandit category for world-model comparison.
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | partially closes | The paper formalizes sequential contextual-bandit actions, click rewards, and unbiased offline policy evaluation from randomized traffic. | Feedback is a one-step reward only; outcomes for unchosen arms and next-state trajectories are absent. |
| Dynamic compute allocation | adjacent | LinUCB updates fixed-size matrices incrementally and is designed for fast online personalization under large traffic. | This is algorithmic efficiency, not adaptive neural inference compute. |
| Benchmark level | warning | The 36M-event Yahoo random bucket is strong for off-policy bandit evaluation. | It does not test latent-state maintenance, multivariate dynamics, or action-conditioned rollouts. |
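
The incremental, fixed-size update mentioned in the dynamic-compute row can be sketched for a single arm of a disjoint linear UCB model. This is a rough illustration under stated assumptions, not the paper's implementation: the class `LinUCBArm` and its method names are hypothetical, and instead of storing the design matrix A and inverting it, the sketch keeps the inverse of A directly and refreshes it with a Sherman-Morrison rank-one update so that it needs only the standard library.

```python
import math


class LinUCBArm:
    """Per-arm state for a disjoint linear UCB model (hypothetical helper)."""

    def __init__(self, dim):
        # A starts as the d x d identity, so its inverse is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x, alpha):
        """Upper confidence bound: theta^T x + alpha * sqrt(x^T A^{-1} x)."""
        a_inv_x = self._a_inv_times(x)
        # theta = A^{-1} b and A^{-1} is symmetric, so theta^T x = b^T (A^{-1} x).
        mean = sum(bi * vi for bi, vi in zip(self.b, a_inv_x))
        width = math.sqrt(sum(xi * vi for xi, vi in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Apply A <- A + x x^T and b <- b + reward * x in O(d^2) time."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, a_inv_x))
        dim = len(x)
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1}x)(A^{-1}x)^T / denom
        for i in range(dim):
            for j in range(dim):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(dim):
            self.b[i] += reward * x[i]


arm = LinUCBArm(dim=2)
score_before = arm.ucb([1.0, 0.0], alpha=1.0)
arm.update([1.0, 0.0], reward=1.0)
score_after = arm.ucb([1.0, 0.0], alpha=1.0)
score_other = arm.ucb([0.0, 1.0], alpha=1.0)
print(score_before, score_after, score_other)
```

The per-arm state stays at one d x d matrix and one length-d vector regardless of how many events have been seen, which is the sense in which the cost is fixed. A policy would compute `ucb` for every candidate article and serve the one with the highest score; `alpha` controls how much the uncertainty term drives exploration.
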