Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Source
- Raw Markdown: paper_open-bandit-dataset-2020.md
- PDF: paper_open-bandit-dataset-2020.pdf
Core Claim
The Open Bandit Dataset provides logged bandit feedback collected on ZOZOTOWN, including actions, rewards, and propensities (the logging policy's action choice probabilities), for off-policy evaluation.
Action-Time-Series Notes
- It has explicit actions and propensities, but its temporal dynamics are weaker than those of full trajectory datasets.
- It is best viewed as contextual action-response data rather than a rich world-model dataset.
- It is useful for testing causal/off-policy pieces of an action-conditioned modeling stack.
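
To make the "causal/off-policy pieces" concrete, the following is a minimal sketch of the standard inverse propensity weighting (IPW) estimator applied to logged actions, rewards, and propensities. The function name, argument names, and toy numbers are illustrative assumptions; they are not the dataset's schema or the accompanying pipeline's API.

```python
from typing import Sequence


def ipw_estimate(
    rewards: Sequence[float],
    logging_propensities: Sequence[float],
    target_probs: Sequence[float],
) -> float:
    """Estimate a target policy's value from logged bandit feedback.

    Hypothetical inputs (one entry per logged round):
      rewards[i]              observed reward for the logged action
      logging_propensities[i] probability the logging policy gave that action
      target_probs[i]         probability the target policy gives that action
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one logged round")
    if not (n == len(logging_propensities) == len(target_probs)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for r, p_log, p_tgt in zip(rewards, logging_propensities, target_probs):
        if p_log <= 0.0:
            # IPW requires the logging policy to have explored the action.
            raise ValueError("logging propensities must be positive")
        # Reweight each reward by how much more (or less) likely the
        # target policy is to take the logged action.
        total += (p_tgt / p_log) * r
    return total / n


# Toy data, invented for illustration only.
value = ipw_estimate(
    rewards=[1.0, 0.0, 1.0, 0.0],
    logging_propensities=[0.5, 0.5, 0.25, 0.25],
    target_probs=[1.0, 0.0, 0.5, 0.5],
)
print(value)
```

Each round is treated independently, which matches the one-step, contextual framing of the data: nothing in the estimate depends on a state carried over between rounds.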
Foundation TSFM Relevance
| Agenda slot | Verdict | Evidence | Missing pieces |
|---|---|---|---|
| Causal structure, counterfactuals, and control | partially closes | The readable raw abstract describes large-scale bandit feedback data for evaluating off-policy estimators and bandit algorithms. | The converted raw Markdown is not fully expanded, and the benchmark is one-step recommendation rather than state rollout. |
| Benchmark level | warning | Logged rewards and known policies test action selection and off-policy evaluation. | Needs temporal state transitions, richer context history, and next-state targets. |