---
abstract: |
  Generative models aim to simulate realistic effects of various actions across different contexts, from text generation to visual effects. Despite significant efforts to build real-world simulators, the application of generative models to virtual worlds, like financial markets, remains under-explored. In financial markets, generative models can simulate complex market effects of participants with various behaviors, enabling interaction under different market conditions and the training of strategies without financial risk. This simulation relies on the finest-grained structured data in financial markets, such as orders, and thereby achieves the most fine-grained realistic simulation. We propose Large Market Model (LMM), an order-level generative foundation model for financial market simulation, akin to language modeling in the digital world. Our financial Market Simulation engine (MarS), powered by LMM, addresses the domain-specific need for realistic, interactive and controllable order generation. Key observations include LMM's strong scalability across data size and model complexity, and MarS's robust and practicable realism in controlled generation with market impact. We showcase MarS as a forecast tool, detection system, analysis platform, and agent training environment, thus demonstrating MarS's "paradigm shift" potential for a variety of financial applications. We release the code of MarS at <https://github.com/microsoft/MarS/>.
author:
- |
  Junjie Li[^1], Yang Liu[]{.footnote-mark note-num="1"}, Weiqing Liu[^2], Shikai Fang, Lewen Wang, Chang Xu & Jiang Bian\
  Microsoft Research Asia\
  `{junli,yangliu2,weiqing.liu,fangshikai,lewen.wang,`\
  `chanx,jiang.bian}@microsoft.com`\
bibliography:
- reference_text/ref.bib
title: "MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model"
---


# Introduction

The primary aim of generative models is to simulate realistic effects of various actions across different contexts, such as text generation [@achiam2023gpt] and visual effects [@videoworldsimulators2024]. Real-world simulators enable human interaction with diverse scenes and objects [@mialon2023augmented], allow robots to learn from simulated experiences without physical risk [@NEURIPS2023_1d5b9233], and generate vast amounts of realistic data for training other machine intelligence [@li2023synthetic].

While research on real-world simulators is extensive [@zhu2024sora; @yang2024video], the application of generative models for virtual world simulation remains under-explored. The financial market exemplifies such a virtual world where each action, from trade execution to strategy deployment, can have ripple effects across a complex network of market participants. The ability to model and predict these effects in real time is crucial for traders, analysts, and regulators alike. Yet, current market simulation models -- largely focused on statistical or agent-based approaches -- lack the resolution, interactivity, and realism needed to reflect the full complexity of order-level behaviors.

To address these gaps, it is crucial to integrate the vast amounts of structured financial data, such as Limit Order Book (LOB) [@gould2013limit], that are essential for capturing market microstructures. We therefore propose the Large Market Model (LMM), a generative foundation model specifically designed for order-level financial market simulation. LMM builds on the successes of generative models in other domains but uniquely adapts them to the financial context, where the generation of orders, order batches, and LOBs plays a critical role in understanding market dynamics. By leveraging structured market data, LMM scales effectively with increasing data and model size, as we will demonstrate through scaling law evaluation, revealing its potential for handling large-scale financial markets. LMM's design ensures that it can generate high-resolution market simulations, capturing both fine-grained individual order actions and broader market trends.

Powered by LMM, we introduce MarS, a financial [Mar]{.underline}ket [S]{.underline}imulation engine that unlocks new potential in financial market forecasting, risk detection, and strategy analysis. MarS is designed to ensure realism, producing simulated market trajectories that are robust enough for practical financial tasks such as predictive modeling, risk management, and agent training. It is capable of providing controlled generation, blending users' interactively injected orders into the generation of realistic market behaviors and assessing the market impact of these actions. This feature ensures that MarS delivers not only high-fidelity simulations but also controllable environments where financial strategies can be safely tested and evaluated.

Amid the broad adoption of AI techniques in finance [@rl_zhang2024reinforcement; @nlp_liu2023fingpt; @gat_kim2019hats; @cl_hou2021stock], MarS is the first to fully leverage the core elements of financial markets, making it a powerful tool for many downstream applications. We posit that MarS has the potential to bring paradigm shifts to a wide range of tasks related to the financial market. In this work, we demonstrate its transformative potential in four specific use cases:

1.  **Forecast Tool**: MarS generates subsequent orders based on recent orders and LOB, simulating future market trajectories. This enables precise forecasting by analyzing multiple simulated trajectories.

2.  **Detection System**: By generating multiple future market trajectories, MarS identifies potential risks not apparent from current observations. For example, a sudden drop in trajectory variance could indicate an impending significant event, providing early warnings and enhancing risk management.

3.  **Analysis Platform**: MarS answers a wide range of "what if" questions by providing a realistic simulation environment. For instance, it evaluates the market impact of large orders by comparing existing market impact formulas to simulated results, which helps identify potential improvements to those formulas and yields deeper insights into market dynamics.

4.  **Agent Training Environment**: The realistic and responsive nature of MarS makes it ideal for training reinforcement learning agents. This is demonstrated with an order execution scenario, showcasing MarS's potential for developing and refining trading strategies without real-world financial risks.

The main contributions of this paper are as follows:

- We introduce the Large Market Model (LMM), a generative foundation model designed specifically for financial market simulations, and demonstrate its scalability across data size and model complexity. This establishes a new direction for domain-specific foundation models in finance.

- We develop MarS, a high-fidelity financial market simulation engine powered by LMM, capable of generating realistic market scenarios and modeling the intricate impacts of order-level dynamics. This unlocks new possibilities for applying generative models in financial markets.

- We demonstrate the versatility of MarS through four key downstream applications: precise market forecasting, risk detection, market impact analysis, and agent training for trading strategies. These applications highlight the significant potential of MarS for transforming financial industry practices.

# MarS Design {#MarS Design}

<figure id="fig:overview highlevel" data-latex-placement="ht!">
<img src="Figures/Overview/High-level_Overview_Final_v3.png" />
<figcaption>High-Level Overview of MarS. MarS is powered by a generative foundation model (LMM) trained on order-level historical financial market data. During real-time simulation, LMM dynamically generates order series in response to various conditions, including user-injected interactive orders, vague target scenario descriptions, and current/recent market data. These generated order series, combined with user interactive orders, are matched in a simulated clearing house in real-time, producing fine-grained simulated market trajectories. The flexibility of LMM’s order generation enables MarS to support various downstream applications, such as forecasting, detection systems, analysis platforms, and agent training environments.</figcaption>
</figure>

To create a truly realistic simulation system, MarS must excel in three key dimensions: high resolution, controllability, and interactivity.

High resolution refers to the ability of MarS to faithfully replicate the intricate dynamics of financial markets. This is why we use trading orders and order batches as the foundational elements of the simulation system, since they encapsulate the investment behaviors of market participants. These fine-grained data points are essential for accurately reproducing historical market trajectories, ensuring that the simulation reflects real market conditions and behaviors with precision.

Controllability offers users the flexibility to simulate a wide range of market scenarios and circumstances. Whether users are assessing market trends, monitoring potential risks, or optimizing trading strategies, MarS provides the tools needed to explore the market conditions of interest. This capability is particularly valuable for stress testing and strategy optimization, where diverse and even rare extreme cases must be modeled accurately.

Interactivity is crucial for enabling real-time user interaction with the simulated market. By allowing users to inject their own orders into the system, MarS enables them to evaluate market impacts, including both first-order and second-order effects. This feature is vital for analyzing trading strategies, managing systemic risks, and developing regulatory policies in a controlled, risk-free environment.

## Large Market Model for Financial Market Simulation

**Problem Formulation.** To address the need for high-resolution, controllable, and interactive simulations, we propose the Large Market Model (LMM), a generative foundation model specifically designed for order-level financial market simulation. The problem is formulated as a conditional generation task, where the generation of trading orders is conditioned on historical data, user-injected orders, and market matching rules. LMM incorporates key features of the market microstructure such as Limit Order Books (LOB), enabling it to capture both individual trading behaviors and systemic market dynamics.

**Tokenization of Order and Order-Batch.** LMM models the generation of trading orders as a conditional generation process, leveraging sequential modeling techniques to predict the evolution of market states over time. This is achieved through a novel representation learning approach tailored to the financial industry's structured data, particularly order flows at two distinct scales: individual orders and aggregated order-batches. The **Order Model**, a causal transformer, tokenizes historical order sequences and Limit Order Book (LOB) information to ensure the realistic generation of individual trading orders. The embedding of the $i$-th order is computed as follows:

$$
Emb_i = \text{emb}(order_i) + \text{linear\_proj}(LOB_i^{\text{volumes}}) + \text{emb}(LOB_i^{\text{mid\_price}}),
$$

where $order_i$ denotes the index assigned to the $i$-th order's tuple (type, price, volume, interval), with type being one of \["Ask", "Bid", "Cancel"\], $LOB_i^{\text{volumes}}$ represents the 10-level volumes for asks and bids in the LOB, and $LOB_i^{\text{mid\_price}}$ is the mid-price of the LOB, expressed as the number of price tick changes since market opening.
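
The sum of three terms in the embedding equation can be illustrated with a minimal sketch. This is not the released implementation: the table sizes, the embedding width, the random initialization, and the way mid-price tick offsets are shifted and clipped into a lookup table are all assumptions made for illustration.

```python
import random

D_MODEL = 8          # embedding width (hypothetical)
N_ORDER_TOKENS = 32  # size of the order-tuple vocabulary (hypothetical)
N_MID_TOKENS = 64    # number of mid-price tick offsets kept (hypothetical)
N_LEVELS = 20        # 10 ask levels + 10 bid levels, as in the text

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Small random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


order_table = rand_matrix(N_ORDER_TOKENS, D_MODEL)  # emb(order_i)
mid_table = rand_matrix(N_MID_TOKENS, D_MODEL)      # emb(LOB_i^mid_price)
proj = rand_matrix(N_LEVELS, D_MODEL)               # linear_proj weights


def embed(order_idx, lob_volumes, mid_price_ticks):
    """Return Emb_i as the sum of the three terms in the equation."""
    if len(lob_volumes) != N_LEVELS:
        raise ValueError("expected 10 ask + 10 bid volumes")
    # Assumed convention: shift tick offsets so that negative moves map to
    # valid table rows, and clip offsets that fall outside the table.
    mid_idx = min(max(mid_price_ticks + N_MID_TOKENS // 2, 0), N_MID_TOKENS - 1)
    projected = [
        sum(v * proj[k][d] for k, v in enumerate(lob_volumes))
        for d in range(D_MODEL)
    ]
    return [
        order_table[order_idx][d] + projected[d] + mid_table[mid_idx][d]
        for d in range(D_MODEL)
    ]
```

Because the three terms are added rather than concatenated, each order and its preceding LOB state occupy a single token position in the sequence.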

In parallel, the **Order-Batch Model** converts order batches into an image-like format and employs a VQ-VAE to represent and generate aggregated trading behaviors over discrete time intervals. In practice, we convert each order-batch into an RGB image. We refer to such images as "order images", illustrated in Fig. `\ref{fig:order_image_converter-rebuttal}`{=latex}.

<figure id="fig:order_image_converter-rebuttal" data-latex-placement="ht!">
<img src="Figures/Order-batch/Framework_order-image-converter_cropped.png" style="width:80.0%" />
<figcaption>The order image converter transforms order data into a visual representation. Each order has three attributes: type (Bid, Ask, Cancel), price slot (relative to the mid-price), and volume slot (binned volume). The pixel values in the image represent the number of orders with the same attributes, with higher pixel values indicating more orders. More details can be found in the appendix.</figcaption>
</figure>
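
The counting scheme described in the caption can be sketched as follows. The mapping of one order type to one color channel, the number of price slots, and the volume bin edges are assumptions chosen for illustration; the paper's actual resolution and binning are given in its appendix.

```python
ORDER_TYPES = ("Ask", "Bid", "Cancel")  # one channel per type (assumed mapping)
N_PRICE_SLOTS = 8                       # hypothetical image height
VOLUME_EDGES = (100, 500, 1000)         # hypothetical bin edges -> 4 volume slots


def price_slot(ticks_from_mid):
    """Map a price offset from the mid-price to a row, clipping at the edges."""
    centered = ticks_from_mid + N_PRICE_SLOTS // 2
    return min(max(centered, 0), N_PRICE_SLOTS - 1)


def volume_slot(volume):
    """Map a raw volume to a column by counting the bin edges it reaches."""
    return sum(volume >= edge for edge in VOLUME_EDGES)


def to_order_image(orders):
    """Count orders per (type, price slot, volume slot) cell.

    `orders` is an iterable of (type, ticks_from_mid, volume) tuples.
    The result is indexed as image[channel][price_slot][volume_slot].
    """
    width = len(VOLUME_EDGES) + 1
    image = [[[0] * width for _ in range(N_PRICE_SLOTS)] for _ in ORDER_TYPES]
    for order_type, ticks_from_mid, volume in orders:
        channel = ORDER_TYPES.index(order_type)
        image[channel][price_slot(ticks_from_mid)][volume_slot(volume)] += 1
    return image


batch = [("Bid", -1, 200), ("Bid", -1, 300), ("Ask", 2, 50), ("Cancel", 0, 1500)]
image = to_order_image(batch)
```

In this toy batch the two bids share a cell, so that pixel holds the value 2 while the ask and the cancel each contribute a pixel of value 1.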

These components combine in an ensemble framework, where LMM uses auto-regressive modeling to build a foundational generative model. This framework integrates micro-level behaviors with macro-level market trends. LMM captures complex dependencies within historical data and temporal patterns through high-dimensional embeddings, providing robust market dynamics representation. For further details on the tokenization strategy and the architectural design of Order and Order-Batch Models, we refer the reader to Appendix `\ref{sec:order_model}`{=latex} and `\ref{sec:order_batch_model}`{=latex}.

### Conditional Trading Order Generation

In LMM, the generation of trading orders is modeled as a conditional generation process that adapts to real-time market dynamics. An order clip is a sequence of trading orders $\mathbf{x} = (x_0, \ldots, x_n)$, generated based on the following four key conditions:

- **DES_TEXT**: A general description of the desired market scenario (e.g., "price bump" or "volatility crush"), ensuring controllability.

- **Interactive Orders**: $(\dot{x}_{i+1}, \ldots, \dot{x}_{i+j})$ are user-injected orders after the $i$-th generated order. If $j=0$, there are no interactive orders between $x_i$ and $x_{i+1}$.

- **Starting Sequence**: $(x_0, \ldots, x_{m-1})$ are the initial $m$ orders, often taken from recent real orders to forecast subsequent ones, enabling realistic simulations.

- **MTCH_R**: Matching rules for trading orders, defining the feasible space for each order and reflecting the specific financial market's characteristics.

The conditional generation process, $p( x_{i+j+1} | \{ \textit{DES\_TEXT},  (\dot{x}_{i+1}, \ldots, \dot{x}_{i+j}), (x_0, \ldots, x_{m-1}), \textit{MTCH\_R} \} )$, ensures that generated orders are realistic and aligned with both the user-defined scenario and the underlying market structure. These conditions can be adjusted for various MarS scenarios, with different applications showcased in Sec. `\ref{application}`{=latex}. We provide a summary of the input conditions and configurations for various applications, along with a detailed introduction to **MTCH_R** and **DES_TEXT**, in Appendix `\ref{sec:config-apps}`{=latex}.

### Framework Design of Large Market Model

The LMM integrates two complementary approaches: Order Sequence Modeling and Order-Batch Sequence Modeling, combined into an ensemble model to address financial market complexities.

**Order Sequence Modeling.** We use a causal transformer to encode each order and its preceding Limit Order Book (LOB) information as a single token. This method captures the sequential nature of orders, ensuring realistic order sequences that reflect market dynamics.

**Order-Batch Sequence Modeling.** To model structured patterns of dynamic market behavior over time intervals, we apply an auto-regressive transformer to order-batch sequences. Orders within a time step are grouped into batches, converted into a structured representation of market behavior for this time step, and modeled to maintain coherence and continuity.

**Ensemble Model.** Combining order sequence and order-batch modeling, the ensemble model balances fine-grained control of individual orders with broader market dynamics. This integration ensures detailed and contextually accurate market simulations.

**Fine-grained Signal Generation Interface.** We introduce an interface that maps vague descriptions to fine-grained control signals using LLM-based historical market record retrieval. This guides the ensemble model, ensuring simulations follow realistic market patterns and user-defined scenarios.

The bottom-left of Fig. `\ref{fig:overview highlevel}`{=latex} shows the framework of the Large Market Model. The detailed design of its four parts can be found in Appendix `\ref{sec:order_model}`{=latex}, `\ref{sec:order_batch_model}`{=latex}, `\ref{sec:ensemble_model}`{=latex}, `\ref{app-sec:nlp-control}`{=latex}.

### Scaling Law in Large Market Model

LMM's scalability is a key criterion for assessing its effectiveness in handling increasingly large-scale financial markets. In our four-part foundation model design, we employ an auto-regressive transformer for order-batch sequences and a causal transformer for order sequences. These components utilize standard pre-training techniques commonly applied in foundation models, including those used in language modeling [@kaplan2020scaling] and vision modeling [@zhai2022scaling].

To assess the scalability of the LMM, we evaluated its performance across varying data scales and model sizes. The scaling curves are shown in Fig. `\ref{fig:scaling-curves}`{=latex}. Our findings indicate that as the size of the data and the model increases, LMM's performance improves significantly, consistent with the scaling laws observed in other foundation models. This suggests that the potential of LMM can be further unlocked by leveraging larger datasets and more extensive computational resources.
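
Scaling behavior of this kind is commonly summarized by fitting a power law of the form $L(N) = a N^{-b}$ to loss versus model size, as in the language-model scaling literature cited above. The sketch below shows such a fit by least squares in log-log space. The functional form is an assumption here, and the data points are synthetic values generated from a known power law, not measurements from this paper.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic points generated from loss = 5 * size^-0.1 (illustrative only,
# NOT results reported in the paper).
sizes = [2e6, 2e7, 2e8, 1e9]
losses = [5.0 * s ** -0.1 for s in sizes]
a, b = fit_power_law(sizes, losses)
```

On real training runs the points would scatter around the fitted line, and the exponent $b$ indicates how quickly loss falls as the model grows.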

While the current implementation only taps into a fraction of the available order-level financial market data due to resource constraints, the vast amount of data accessible within financial markets holds tremendous promise for future enhancements. MarS, in this context, serves as the tool to unearth this "gold mine" of data, indicating substantial opportunities for more comprehensive and powerful market simulations.

<figure id="fig:scaling-curves" data-latex-placement="ht!">
<table>
<tbody>
<tr>
<td style="text-align: center;"><figure id="fig:order-model-scaling-curve">
<img src="Figures/Order-model/order-model-scaling-curve.png" style="width:95.0%" />
<figcaption>Order Model</figcaption>
</figure></td>
<td style="text-align: center;"><figure id="fig:order-batch-model-scaling-curve">
<img src="Figures/Order-batch/order-batch-model-scaling-curve.png" style="width:95.0%" />
<figcaption>Order-Batch Model</figcaption>
</figure></td>
</tr>
</tbody>
</table>
<figcaption>Scaling curves of Order Model and Order-Batch Model. (<strong>a</strong>) Order Model: Trained on 32 billion tokens, with model sizes ranging from 2 million to 1.02 billion parameters. (<strong>b</strong>) Order-Batch Model: Trained on 10 billion tokens, with model sizes ranging from 150 million to 3 billion parameters. The results demonstrate enhanced performance with increased data and model sizes.</figcaption>
</figure>

## MarS --- Order Generation Combined with Simulated Clearing House

Powered by LMM, the MarS engine is designed to generate highly realistic market trajectories that are robust enough for practical financial tasks such as predictive modeling, risk management, and agent training.

At the core of MarS is the simulated clearing house, which matches both generated and interactive orders in real-time, providing extensive information (e.g., LOB) for subsequent order generation. For each generated order $x_i$, the clearing house processes it against any $j$ interactive orders ($j \geq 0$) injected by the user. The results of this matching process, including the recent LOB, are then used to generate the next order $x_{i+j+1}$, creating a continuous and dynamic simulation.
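
The interplay between order generation and the clearing house can be sketched with a toy matching engine. This is a simplified illustration rather than the MarS implementation: `ToyClearingHouse`, `simulate`, and `stub_generator` are hypothetical names, the book supports only limit orders with price-time priority (no cancels), and the generator is a placeholder for LMM.

```python
from collections import deque


class ToyClearingHouse:
    """Price-time priority matching for limit orders (simplified; no cancels)."""

    def __init__(self):
        self.bids = {}    # price -> deque of resting volumes, oldest first
        self.asks = {}
        self.trades = []  # executed (price, volume) pairs

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None

    def submit(self, side, price, volume):
        book, opposite = (self.bids, self.asks) if side == "Bid" else (self.asks, self.bids)
        while volume > 0 and opposite:
            best = min(opposite) if side == "Bid" else max(opposite)
            crosses = price >= best if side == "Bid" else price <= best
            if not crosses:
                break
            queue = opposite[best]
            traded = min(volume, queue[0])
            self.trades.append((best, traded))
            volume -= traded
            if traded == queue[0]:
                queue.popleft()
            else:
                queue[0] -= traded
            if not queue:
                del opposite[best]
        if volume > 0:  # any unfilled remainder rests in the book
            book.setdefault(price, deque()).append(volume)


def simulate(generate_order, interactive, n_generated):
    """`interactive` maps i to the user orders injected after generated order x_i."""
    house = ToyClearingHouse()
    for i in range(n_generated):
        house.submit(*generate_order(house))       # x_i, conditioned on the current LOB
        for user_order in interactive.get(i, []):  # the j >= 0 injected orders
            house.submit(*user_order)
    return house


def stub_generator(house):
    # Placeholder for LMM: post an ask first, then keep bidding below it.
    return ("Ask", 101, 10) if house.best_ask() is None else ("Bid", 99, 10)


house = simulate(stub_generator, {1: [("Bid", 101, 4)]}, n_generated=3)
```

Here the injected bid crosses the resting ask and trades 4 units at 101, so the generator sees a book with only 6 units left at the best ask when it produces the next order. This is the feedback path through which injected orders affect subsequent generation.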

MarS excels at providing controlled generation, blending users' interactively injected orders into the generation of realistic market behaviors. Users can inject their own orders into the system and observe how these actions impact market dynamics in real-time. This capability allows users to simulate various trading strategies, assess market impacts, and evaluate the performance of their strategies under different conditions. The blending process is carefully managed in MarS by adhering to two guiding principles.

- **"Shaping the Future Based on Realized Realities."** At each time step, the order-batch model generates the next order-batch based on recent orders and the corresponding matching results from the simulated clearing house. This information includes the immediate market impact of users' injected orders and determines the generated market behaviors in the next order-batch.

- **"Electing the Best from Every Possible Future."** Multiple candidate order-batches are generated at each time step, and the one that best matches the fine-grained control signal is selected, ensuring the simulation remains realistic while allowing for user control.
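
The second principle amounts to a best-of-$N$ selection step, sketched below. The summary features (`price_change`, `n_orders`) and the squared-error distance are assumptions for illustration; the paper's actual control signal and matching criterion are described in its appendix.

```python
def distance(batch_summary, control_signal):
    """Squared error over the features that the control signal specifies."""
    return sum((batch_summary[k] - control_signal[k]) ** 2 for k in control_signal)


def elect_best(candidates, control_signal):
    """Pick the candidate order-batch summary closest to the control signal."""
    return min(candidates, key=lambda c: distance(c, control_signal))


# Hypothetical summaries of N = 3 sampled order-batches for the next minute.
candidates = [
    {"price_change": -2.0, "n_orders": 50.0},
    {"price_change": 1.0, "n_orders": 40.0},
    {"price_change": 3.0, "n_orders": 45.0},
]
control_signal = {"price_change": 3.0}  # e.g. derived from a "price bump" scenario
best = elect_best(candidates, control_signal)
```

Because every candidate is sampled from the model given the realized market state, the selected batch stays within the range of behaviors the model considers plausible while still following the requested scenario.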

The order-level transformer, trained on historical orders, naturally learns immediate market impact for subsequent order generation. Concurrently, the ensemble model influences order generation, aligning with the generated next order-batch. Fig. `\ref{fig:mars generation}`{=latex} illustrates the generation process, balancing injected orders' market impact and control signals to form a realistic simulation.

<figure id="fig:mars generation" data-latex-placement="ht!">
<img src="Figures/Control/Framework_mars_generation_v2.png" />
<figcaption>The process of MarS generation employs a two-level order generation mechanism. At the order-batch level, following the two guiding principles, the Order-Batch Model (OB) processes existing orders from <span class="math inline"><em>minute</em><sub><em>t</em></sub></span> and generates <span class="math inline"><em>N</em></span> possible distributions for <span class="math inline"><em>minute</em><sub><em>t</em> + 1</sub></span>. Through a filter process based on control signals, the target distribution (<span class="math inline">⋆</span>) is selected and serves as a condition for the Ensemble Model (E). At the order level, the Order Model (O) generates immediate responses for recent and user-submitted orders, while the Ensemble Model refines these generations conditioned on the target distribution. The generated orders in <span class="math inline"><em>minute</em><sub><em>t</em> + 1</sub></span> are fed back to the Order-Batch Model for <span class="math inline"><em>minute</em><sub><em>t</em> + 2</sub></span> prediction, creating a dynamic feedback loop that balances market impact and controlled generation.</figcaption>
</figure>

# Experiments {#Experiments}

This section evaluates the capabilities of MarS in providing realistic, interactive, and controllable simulations. Note that throughout our experiments, the term \`\`**replay**" refers to replaying real historical market data within MarS to validate the simulation against real-world events.

## Realistic Simulations

To assess the realism of MarS's market simulations, we compare simulated data against key stylized facts derived from historical market data [@sherkar2023studystylizedfactsstock]. These stylized facts serve as robust benchmarks, ensuring market simulations accurately reflect real-world market behaviors [@vyetrenko2020get; @coletta2022learning; @stillman2023deepcalibrationmarketsimulations]. Fig. `\ref{fig:stylized-facts}`{=latex} presents three prevalent stylized facts, all of which MarS replicates, demonstrating its capability to produce realistic market simulations suitable for practical applications. Beyond these three, we provide a detailed evaluation of **eleven** other stylized facts in Appendix `\ref{set:cont-stylized-facts}`{=latex} and a quantitative analysis in Appendix `\ref{sec:quantitative-stylized-facts}`{=latex}.

<figure id="fig:stylized-facts" data-latex-placement="ht!">
<table>
<tbody>
<tr>
<td style="text-align: center;"><figure id="fig:stylized-facts-aggregational-gaussianity">
<img src="Figures/Order-model/stylized-facts-1-Aggregational-Gaussianity.png" />
<figcaption>Aggregational Gaussianity</figcaption>
</figure></td>
<td style="text-align: center;"><figure id="fig:stylized-facts-absence-of-autocorrelations">
<img src="Figures/Order-model/stylized-facts-2-Absence-of-Autocorrelations.png" />
<figcaption>Absence of Autocorrelations</figcaption>
</figure></td>
<td style="text-align: center;"><figure id="fig:stylized-facts-volatility-clustering">
<img src="Figures/Order-model/stylized-facts-3-Volatility-clustering.png" />
<figcaption>Volatility Clustering</figcaption>
</figure></td>
</tr>
</tbody>
</table>
<figcaption>Illustration of Stylized Facts in MarS. (<strong>a</strong>) Aggregational Gaussianity: as the interval increases from 1 to 5 minutes, the distribution of log returns becomes more similar to a normal distribution. (<strong>b</strong>) Absence of Autocorrelations: the auto-correlation of log returns rapidly decreases with increasing intervals. (<strong>c</strong>) Volatility Clustering: the auto-correlation of volatility stays high across successive periods.</figcaption>
</figure>
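The three stylized facts above can each be checked with a few lines of code. The sketch below uses standard textbook estimators (sample autocorrelation and excess kurtosis) on a mid-price series. It is an illustration under those assumptions and not the evaluation code used for the figures.

```python
import math
from typing import List, Sequence


def log_returns(prices: Sequence[float], interval: int = 1) -> List[float]:
    """Log returns over non-overlapping windows of `interval` samples."""
    sampled = prices[::interval]
    return [math.log(b / a) for a, b in zip(sampled, sampled[1:])]


def autocorrelation(xs: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of `xs` at a positive lag (non-constant input)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


def excess_kurtosis(xs: Sequence[float]) -> float:
    """Excess kurtosis; zero for a normal distribution."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0
```

With `r = log_returns(prices, k)`, aggregational Gaussianity corresponds to `excess_kurtosis(r)` moving toward zero as `k` grows, absence of autocorrelation to `autocorrelation(r, lag)` staying near zero, and volatility clustering to `autocorrelation([abs(x) for x in r], lag)` remaining positive.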

## Interactive Simulations {#sec:Interactive-Simulations}

Understanding market impacts, i.e., changes in financial markets caused by trading activity, is crucial. MarS simulates these impacts by generating orders from detailed order-level data. Fig. `\ref{fig:interaction-example}`{=latex} illustrates MarS interacting with a trading agent executing a TWAP (Time-Weighted Average Price) strategy, which caused observable changes in the subsequent price trajectory. The gap between the two curves represents the synthetic market impact generated by the agent's trading actions. A detailed exploration of market impact can be found in Sec.`\ref{sub-sec:market-impact}`{=latex}.

We validated these simulations by collecting market impacts from agents with various configurations, confirming that the synthetic data adheres to the $\textit{Square-Root-Law}$, as depicted in Fig. `\ref{fig:Verify-Square-Root-Law}`{=latex}. The $\textit{Square-Root-Law}$, $\Delta \propto \sigma \sqrt{{Q}/{V}}$, is a widely used model for market impact [@Moro_2009; @Lillo2003; @almgren2005direct], where $\Delta$ is the price change, $\sigma$ is the volatility, $Q$ is the trading volume, and $V$ is the total market volume. These results illustrate that MarS can effectively model the impact of trading strategies on market prices, providing valuable insights for market participants and aiding in the development of more robust trading strategies. Additional details and results about the TWAP agent and market impact can be found in Appendix `\ref{sec:twap-algo}`{=latex} and `\ref{app-sec:market-impact}`{=latex}.
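One way to check agreement with the Square-Root-Law is to regress the logarithm of the volatility-normalized impact on the logarithm of the participation ratio $Q/V$ and inspect the fitted exponent. The sketch below does this with ordinary least squares. The prefactor `c` and the function names are illustrative assumptions, and the fitting procedure is not necessarily the one behind the figure.

```python
import math
from typing import Sequence, Tuple


def sqrt_law_impact(sigma: float, q: float, v: float, c: float = 1.0) -> float:
    """Predicted impact c * sigma * sqrt(Q / V); `c` is a hypothetical constant."""
    return c * sigma * math.sqrt(q / v)


def fit_impact_exponent(
    participation: Sequence[float],
    normalized_impact: Sequence[float],
) -> Tuple[float, float]:
    """Least-squares fit of log(impact / sigma) = log(c) + beta * log(Q / V).

    Returns (beta, c); beta close to 0.5 is consistent with the law.
    """
    xs = [math.log(p) for p in participation]
    ys = [math.log(d) for d in normalized_impact]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return beta, math.exp(mean_y - beta * mean_x)
```

Here `participation` holds the $Q/V$ values of the agents and `normalized_impact` holds the measured $\Delta/\sigma$ for the same runs.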

<figure id="fig:market_manipulation_exp" data-latex-placement="ht!">
<table>
<tbody>
<tr>
<td style="text-align: center;"><figure id="fig:interaction-example">
<img src="Figures/Order-model/rollouts.png" style="width:100.0%" />
<figcaption>Synthetic market interaction</figcaption>
</figure></td>
<td style="text-align: center;"><figure id="fig:Verify-Square-Root-Law">
<img src="Figures/market_impact/twap-impacts/market-impact-square-root.png" style="width:100.0%" />
<figcaption>Square-Root-Law Validation</figcaption>
</figure></td>
<td style="text-align: center;"><figure id="fig:interaction-example-corr">
<img src="Figures/Control/control-interaction-corr.png" style="width:100.0%" />
<figcaption>Effects of control signals</figcaption>
</figure></td>
</tr>
</tbody>
</table>
<figcaption>Results of interactive and controllable simulations in MarS.</figcaption>
</figure>

## Controllable Simulations

We demonstrate the controllability of MarS by replicating historical events. Specifically, MarS allows two types of control signals: $\{ \textbf{replay curve}, \textbf{prompt}\}$. For control with $\textbf{replay curve}$, we simulate a price change between 0.3% and 0.5% over 5 minutes. With control enabled, an order batch is generated using minute-level guiding signals from the replay curve and is integrated with the order model within an ensemble model to produce trading orders. Fig. `\ref{fig:interaction-example-corr}`{=latex} depicts the correlation between simulated and replay price trajectories. Introducing control signals roughly doubles the correlation score ($0.23 \rightarrow 0.47$), showcasing MarS's effectiveness in generating controllable market simulations. The same figure also shows the balance between control and interaction: configurations with control but no interaction achieve the highest correlation scores, while introducing interaction reduces control precision ($0.47 \rightarrow 0.33$). This inherent balance allows for more realistic interactions in diverse applications. For control with $\textbf{prompt}$, MarS allows users to describe specific historical scenarios in natural language, then utilizes Large Language Models (LLMs) to guide the generation through the **fine-grained signal generation interface**. The detailed results are provided in Appendix `\ref{app-sec:nlp-control}`{=latex}.
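The correlation score between a simulated and a replay price trajectory can be computed as below. The text does not state which correlation coefficient is used, so the Pearson coefficient here is an assumption made for illustration.

```python
import math
from typing import Sequence


def pearson_correlation(simulated: Sequence[float], replay: Sequence[float]) -> float:
    """Pearson correlation of two equally long, non-constant price series."""
    n = len(simulated)
    mean_s = sum(simulated) / n
    mean_r = sum(replay) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(simulated, replay))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_r = sum((r - mean_r) ** 2 for r in replay)
    return cov / math.sqrt(var_s * var_r)
```

Averaging this score over many simulated trajectories per configuration gives one number per setting (for example, with or without control, with or without interaction).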

# Applications {#application}

In Sec.`\ref{MarS Design}`{=latex} and `\ref{Experiments}`{=latex}, we demonstrated the formulation of diverse financial tasks as a conditional trading order generation problem. Our experiments showed that MarS is Realistic, Controllable, and Interactive, establishing it as a robust financial market simulator. This section explores potential downstream applications of MarS, further validating its foundational role in financial market simulation. We present practical financial tasks to illustrate: a) MarS's capability to solve financial problems independently, and b) its utility as a simulation platform for other tasks. For a), we showcase Forecast and Detection tasks, and for b), we provide examples of \`\`What if" Analysis, and Reinforcement Learning Environment.

Here, we highlight that, analogous to text generation vs. language modeling [@achiam2023gpt; @abdin2024phi; @dubey2024llama], and video generation vs. physical world decision making [@liu2024sora; @yang2024video; @yang2023learning], we have constructed a unified task interface through conditional trading order generation for diverse financial downstream tasks with MarS. This interface can transfer complex and diverse financial information into specific tasks. We compare current methodologies with the new paradigm introduced by MarS to illustrate the \`\`paradigm shift" across various types of financial tasks, as shown in Table `\ref{table:app-compare}`{=latex}. Detailed introductions are provided in the subsequent sections.

     **Applications**                              **Current Methods**                                                            **MarS**
  ----------------------- ---------------------------------------------------------------------- --------------------------------------------------------------------------
        Forecasting                               sequence extrapolation                                                   conditional generation
         Detection         $\operatorname{Diff}(\textit{market}_{now}, \textit{market}_{past})$   $\operatorname{Diff}(\textit{market}_{now}, \textit{simu-market}_{now})$
   \`\`What if" Analysis                  online experiments, empirical formula                                         offline data-driven pipeline
      RL Environment                     finite data, fake $P(s_{t+1}|s_{t},a_t)$                                infinite data, real $P(s_{t+1}|s_{t},a_t)$

  : Summary of how MarS reshapes mainstream financial applications. $\operatorname{Diff}(\cdot,\cdot)$ represents the difference between two market states for anomaly detection. $P(s_{t+1}|s_{t},a_t)$ denotes the state transition given the current state and action. Without an interactive environment, most existing financial RL works cannot model the realistic impact of agent actions on the market state. Further details of the RL-Environment are in Sec.`\ref{sub-sec:RL-env}`{=latex}. {#table:app-compare}

## Forecasting {#sec:forecasting}

Forecasting is crucial in many financial applications, with market trend forecasting being a prime example. This task demands models that accurately capture and reflect market dynamics. Traditionally, direct forecasting models are used. In this section, we assess the effectiveness of our market simulation in predicting trends.

Following [@Ntakaris_2018], we define the price change from minute $t$ to minute $t+k$ as $l = \left(\left(\frac{1}{n} \sum_{i=1}^{n} m_{i}\right) - m_{0}\right) / m_{0}$, where $m_{0}$ is the mid-price at time $t$, $n$ is the number of orders between $t$ and $t+k$ minutes, and $m_{i}$ is the mid-price after the $i$th order event. The price change is categorized into three classes (up, down, and flat) based on the value of $l$, with thresholds chosen so that the classes have similar probabilities over the training period. We compare our model with DeepLOB [@Zhang_2019], a well-known baseline. Fig. `\ref{fig:trend-prediction}`{=latex} illustrates that LMM-based simulations significantly outperform DeepLOB, highlighting their superior understanding of market dynamics. Additionally, the 1.02 billion-parameter model outperforms the 0.22 billion-parameter model, indicating that the improved validation loss in the scaling curve (Fig. `\ref{fig:scaling-curves}`{=latex}) correlates with enhanced forecasting performance.
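The label construction can be sketched as follows. The formula for $l$ is taken from the definition above, whereas the use of empirical terciles as class thresholds is only one assumed way to obtain similar class probabilities over the training period.

```python
from typing import Sequence, Tuple


def price_change(m0: float, mid_prices: Sequence[float]) -> float:
    """l = (mean(m_1..m_n) - m_0) / m_0 for the n events in the window."""
    return (sum(mid_prices) / len(mid_prices) - m0) / m0


def tercile_thresholds(train_changes: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper cut points that split the training values into thirds."""
    ordered = sorted(train_changes)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def trend_label(change: float, lower: float, upper: float) -> str:
    """Map a price change to one of the three classes."""
    if change < lower:
        return "down"
    if change >= upper:
        return "up"
    return "flat"
```

For instance, `price_change(100.0, [101.0, 102.0, 103.0])` evaluates the mean mid-price 102 against the initial mid-price 100.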

It is noteworthy that all forecasting targets can be calculated using simulated trajectories from MarS, whereas traditional direct forecasting models require separate training for each target. This underscores the significant advantage of simulation-based forecasting by MarS. For more discussion about the comparison between DeepLOB and MarS/LMM, please refer to Appendix `\ref{sec:forecasting-comparison}`{=latex}.
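To illustrate how several forecasting targets can be read off the same set of simulated trajectories, the sketch below derives a class forecast for multiple horizons from one batch of rollouts. The majority vote, the symmetric threshold, and the horizons measured in order events rather than minutes are all simplifying assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Sequence


def classify(change: float, threshold: float) -> str:
    """Three-way class with a symmetric, hypothetical threshold."""
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "flat"


def forecast_from_rollouts(
    m0: float,
    rollouts: Sequence[Sequence[float]],
    horizons: Sequence[int],
    threshold: float,
) -> Dict[int, str]:
    """Majority-vote the predicted class over rollouts, once per horizon."""
    forecasts: Dict[int, str] = {}
    for horizon in horizons:
        votes: Counter = Counter()
        for path in rollouts:
            window = path[:horizon]  # simulated mid-prices up to the horizon
            change = (sum(window) / len(window) - m0) / m0
            votes[classify(change, threshold)] += 1
        forecasts[horizon] = votes.most_common(1)[0][0]
    return forecasts
```

No retraining is involved when a new horizon is added: only the post-processing of the existing rollouts changes.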

<figure id="fig:rl-training-curves" data-latex-placement="ht!">
<table>
<tbody>
<tr>
<td style="text-align: center;"><figure id="fig:trend-prediction">
<img src="Figures/Order-model/2024-05-13-order-model-trend-prediction-TRAIN-SPLIT.png" style="width:80.0%" />
<figcaption>Trend Prediction Accuracy</figcaption>
</figure></td>
<td style="text-align: center;"><figure id="fig:rl-training-curves">
<img src="Figures/Order-model/RL-training-curves-more-actions.png" style="width:80.0%" />
<figcaption>Performance of the trading agent</figcaption>
</figure></td>
</tr>
</tbody>
</table>
<figcaption>Results of forecasting and RL-agent training tasks. For the forecasting task, MarS executes 128 simulations at each initial time point and aggregates the outcomes to determine the final predicted class. The ground truth is obtained from historical replay. For RL-agent training, the x-axis represents the number of update batches, and the y-axis is the price advantage over our best-configured TWAP agent (<span class="math inline">L1-P0.9</span>), in basis points (BP).</figcaption>
</figure>

## Detection

Detecting changes in the state of the market is crucial in financial tasks, especially in the regulation of market abuse, e.g., insider trading [@meulbroek1992empirical] and market manipulation [@putnicnvs2012market]. We demonstrate how MarS could bring a new simulation-based paradigm to detection tasks by monitoring the similarity between simulated and real market patterns. Using real market manipulation cases from CSRC[^3], we compare the distributions of the spread, a key indicator of market liquidity, through Distribution Similarity[^4]. While MarS maintains high distribution similarity ($>0.87$) in normal periods, its simulation realism drops significantly during manipulation periods, with a heavier tail and a peak around $\delta=1000$ emerging (Fig. `\ref{fig:market_manipulation}`{=latex}). These anomalies can be viewed as signals likely corresponding to market manipulation, where manipulators significantly impact liquidity. This suggests a promising direction for automated anomaly detection, though comprehensive evaluation combining multiple metrics is necessary for robust conclusions. Detailed analysis and experimental settings are provided in Appendix `\ref{appendix:detection}`{=latex}.

<figure id="fig:market_manipulation" data-latex-placement="ht!">
<table>
<tbody>
<tr>
<td style="text-align: center;"><figure>
<img src="Figures/Application/Detection/detection-part-0.png" />
<figcaption>Pre-manipulation</figcaption>
</figure></td>
<td style="text-align: center;"><figure>
<img src="Figures/Application/Detection/detection-part-1.png" />
<figcaption>Manipulation period</figcaption>
</figure></td>
<td style="text-align: center;"><figure>
<img src="Figures/Application/Detection/detection-part-2.png" />
<figcaption>Post-manipulation</figcaption>
</figure></td>
</tr>
</tbody>
</table>
<figcaption>Spread distribution in different periods of market manipulation. The distribution similarity between replay and simulation drops during the manipulation period (b), where a heavier tail and a noticeable peak around <span class="math inline"><em>δ</em> = 1000</span> emerge, in contrast to the pre-manipulation (a) and post-manipulation (c) periods.</figcaption>
</figure>
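The simulation-based detection idea can be sketched as a comparison of two empirical spread distributions. The exact definition of Distribution Similarity is given in a footnote that is not reproduced here, so the histogram intersection below is a stand-in metric. Using 0.87 as an alert threshold is likewise an assumption borrowed from the similarity level reported for normal periods.

```python
from collections import Counter
from typing import Sequence


def distribution_similarity(
    real_spreads: Sequence[int],
    simulated_spreads: Sequence[int],
) -> float:
    """Histogram intersection of two empirical distributions, in [0, 1]."""
    real_hist = Counter(real_spreads)
    sim_hist = Counter(simulated_spreads)
    n_real, n_sim = len(real_spreads), len(simulated_spreads)
    support = real_hist.keys() | sim_hist.keys()
    # Counter returns 0 for values that never occur in one of the samples.
    return sum(min(real_hist[s] / n_real, sim_hist[s] / n_sim) for s in support)


def flag_anomaly(similarity: float, threshold: float = 0.87) -> bool:
    """Flag a period whose similarity falls below the (assumed) threshold."""
    return similarity < threshold
```

Identical samples give a similarity of 1 and samples with disjoint support give 0, so a period in which the real spread develops a heavy tail that the simulation lacks scores lower.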

## \`\`What If" Analysis on Market Impact {#sub-sec:market-impact}

One of the most important \`\`What if" topics in finance is the analysis of market impact, the change in asset prices caused by trading activity. Because the underlying mechanisms are complex, most existing research in this area relies heavily on strong assumptions and empirical formulas [@zarinelli2015beyond; @almgren2005direct; @gatheral2010no; @gatheral2012transient; @gatheral2011exponential], and is limited to costly and risky online experiments. In this section, we take market impact as an example to show how MarS can act as a reliable and powerful platform for \`\`what if" analysis. Having validated the reliability of synthetic market impact in Sec.`\ref{sec:Interactive-Simulations}`{=latex}, we move to a more ambitious goal: leveraging the synthetic data to build a data-driven pipeline that discovers new laws explaining market impact and its long-term dynamics. Due to limited space, details of the experiment settings, clarifications, and further results for this section are provided in Appendix `\ref{app-sec:market-impact}`{=latex}.

**New factors beyond Square-Root-Law:** To uncover new factors beyond Square-Root-Law influencing market impact, we first employed symbolic regression [@de2020pysindy], using classic volume and price factors before trading as the base dictionary. By applying genetic algorithms, we sought to identify the most informative factors on synthetic market impact. The preliminary results were reviewed and refined by domain experts, leading to the discovery of three new factors that partially explain market impact: $\{$*resiliency*, *LOB_pressure*, *LOB_depth*$\}$. We show the relationship between market impact and factor $\textit{resiliency}$ in Fig. `\ref{fig:Resiliency}`{=latex}.

<figure id="fig:long-term-ode-weight" data-latex-placement="ht!">
<table>
<tbody>
<tr>
<td style="text-align: center;"><figure id="fig:Resiliency">
<img src="Figures/market_impact/resiliency.png" style="width:80.0%" />
<figcaption>New factor: Resiliency</figcaption>
</figure></td>
<td style="text-align: center;"><figure id="fig:long-term-ode-weight">
<img src="Figures/market_impact/learn_weights.png" style="width:90.0%" />
<figcaption>Interaction weights of learned ODE</figcaption>
</figure></td>
</tr>
</tbody>
</table>
<figcaption>Analysis of new market impact factor and long-term market impact.</figcaption>
</figure>

**Dynamics of Long-Term Market Impact:** The long-term market impact, also known as the price impact trajectory, typically manifests as a gradually decaying sequence of price fluctuations after a trade. Traditional research relies on empirical formulas to model these dynamics [@gatheral2011exponential; @donier2015fully; @bacry2015market], but may struggle to capture the full complexity of real-world scenarios. To address this, we leverage generated market impact to develop a more accurate, data-driven approach. Our method models the decay dynamics using an ordinary differential equation (ODE), which integrates both potential influencing factors and decay functions:

$$\begin{align}
    \frac{dY(t)}{dt} =  \text{sum}( W \circ (X \otimes F^{\text{decay}}(t))) = \sum_{i=1}^{m}\sum_{j=1}^{n} W_{i,j} X_i F_j^{\text{decay}}(t) \label{eq:long-term}
\end{align}$$ where $Y(t)$ is the long-term market impact, $X \in \mathbb{R}^m$ is the factor group, such as volume, price, etc., and $F^{\text{decay}}(t): t \rightarrow \mathbb{R}^n$ collects candidate decay functions, e.g., $[1/t, \ldots, 1/\sqrt{t}]$. $X$ and $F^{\text{decay}}(t)$ can be customized based on domain knowledge. $\otimes$ is the outer product, $\circ$ is the Hadamard product, $X \otimes F^{\text{decay}}(t)$ is a matrix in $\mathbb{R}^{m \times n}$ representing interactions among factors and decay patterns, and $W \in \mathbb{R}^{m \times n}$ is the learnable interaction weight, so that $W_{i,j}$ weights the pairing of factor $X_i$ with decay function $F_j^{\text{decay}}$. Fig. `\ref{fig:long-term-ode-weight}`{=latex} shows the learned weights $W$, indicating the importance of each interaction pair between the two decay functions and seven factors, which can help to deepen our understanding of the long-term market impact.
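The right-hand side of Eq. `\ref{eq:long-term}`{=latex} and its integration can be sketched numerically as follows. The forward-Euler scheme, the two example decay functions, and all names are illustrative assumptions. The sketch covers only the evaluation of the ODE for given weights and does not show how $W$ is learned.

```python
import math
from typing import Callable, Sequence

# Example decay dictionary F(t) = [1/t, 1/sqrt(t)] (an illustrative choice).
DECAY_FNS = (lambda t: 1.0 / t, lambda t: 1.0 / math.sqrt(t))


def impact_rate(
    w: Sequence[Sequence[float]],
    x: Sequence[float],
    decay_fns: Sequence[Callable[[float], float]],
    t: float,
) -> float:
    """dY/dt = sum_i sum_j W[i][j] * X[i] * F_j(t), with W of shape m x n."""
    f = [fn(t) for fn in decay_fns]
    return sum(
        w[i][j] * x[i] * f[j] for i in range(len(x)) for j in range(len(f))
    )


def integrate_impact(
    w: Sequence[Sequence[float]],
    x: Sequence[float],
    decay_fns: Sequence[Callable[[float], float]],
    t0: float,
    t1: float,
    steps: int = 1000,
    y0: float = 0.0,
) -> float:
    """Forward-Euler integration of Y from t0 to t1 (t0 > 0 avoids t = 0)."""
    dt = (t1 - t0) / steps
    y, t = y0, t0
    for _ in range(steps):
        y += dt * impact_rate(w, x, decay_fns, t)
        t += dt
    return y
```

As a sanity check, a single factor $X = [1]$ with weight only on the $1/t$ decay gives $Y(t) = \ln(t/t_0)$, which the integrator reproduces approximately.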

## Reinforcement Learning Environment {#sub-sec:RL-env}

The MarS environment, being both realistic and interactive, is ideal for training reinforcement learning (RL) agents. This environment accurately reflects an agent's impact, provides realistic rewards, and facilitates training robust agents for the financial market. In this experiment, we aim to train a trading agent from scratch using MarS. The trading agent's goal is to purchase a large volume within 5 minutes, optimizing both fulfillment rate and price advantage.

The trading agent's state includes features such as remaining time, remaining volume, LOB imbalance, and the period's stage (passive or aggressive). The agent's actions are based on a configurable TWAP strategy and the reward function is defined as follows:

$$\begin{equation}
    \text{Reward} = \alpha \times \text{FulfillmentRate} + \text{PriceAdvantage},
\end{equation}$$ where $\alpha=1$ when FulfillmentRate $\leq 0.95$ and decreases to 0 as FulfillmentRate approaches 1. Detailed settings of agent training can be found in Appendix `\ref{sec:twap-algo}`{=latex}.
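The reward can be written out directly. The text specifies only that $\alpha$ equals 1 up to a fulfillment rate of 0.95 and falls to 0 as the rate approaches 1, so the linear ramp below is an assumed shape. The unit of the price advantage relative to the fulfillment term is also left as given by the caller.

```python
def alpha(fulfillment_rate: float) -> float:
    """Weight on the fulfillment term: 1 up to 0.95, then ramps down to 0 at 1."""
    if fulfillment_rate <= 0.95:
        return 1.0
    # Assumed linear decay between 0.95 and 1.0; the exact shape is not
    # specified in the text.
    return max(0.0, (1.0 - fulfillment_rate) / 0.05)


def reward(fulfillment_rate: float, price_advantage: float) -> float:
    """Reward = alpha * FulfillmentRate + PriceAdvantage."""
    return alpha(fulfillment_rate) * fulfillment_rate + price_advantage
```

Under this shape, the fulfillment term stops contributing once the order is completely filled, so price advantage becomes the only remaining source of reward.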

Fig. `\ref{fig:rl-training-curves}`{=latex} shows the training performance of the trading agent. The agent's performance improves from -6 BP to 2\~6 BP during training. The observed fluctuations between 2\~6 BP are attributed to the agent exploring various strategies between high and low fulfillment rates, resulting in corresponding variations in price advantage based on the current reward setting. This demonstration highlights that MarS is capable of training trading agents from scratch by leveraging its realistic and interactive simulation capabilities.

# Related Work

We give a detailed and comprehensive discussion of related work on financial market simulation and generative foundation models in Appendix `\ref{related}`{=latex}.

# Conclusion

We introduce MarS, an order-level, fine-grained realistic financial market simulation engine, powered by the generative foundation model, LMM. Our evaluation of LMM's scaling law demonstrates the potential for continuous improvement in future financial world models. We identify three essential characteristics of impactful market simulation: realism, controllability, and interactivity. We present four representative tasks developed using MarS, underscoring its potential to catalyze a paradigm shift across various financial applications.

# Acknowledgements {#acknowledgements .unnumbered}

We would like to thank our colleagues Xiao Yang and Xu Yang for contributing to our early prototype and their invaluable feedback and suggestions during our regular discussions. We also express our sincere gratitude to Chengqi Dong for his meticulous analysis and exploration on market impact data, which have significantly enhanced our understanding of market impact studies.

# Disclaimer {#disclaimer .unnumbered}

Users of the market simulation engine and the code should prepare their own agents, which may include trained models built with users' own data, independently assess and test the risks of the model in a specific use scenario, ensure the responsible use of AI technology, including but not limited to developing and integrating risk mitigation measures, and comply with all applicable laws and regulations. The market simulation engine does not provide financial opinions, nor is it designed to replace the role of qualified financial professionals in formulating, assessing, and approving finance products. The outputs of the market simulation engine do not reflect the opinions of Microsoft. `\clearpage`{=latex}

\bibliographystyle{iclr2025_conference}
\clearpage
\appendix

# Related Works {#related}

**Financial Market Simulation.** Before the recent surge in generative foundation models, researchers in the finance domain had already recognized the immense potential of powerful market simulations. Early approaches often utilized agent-based modeling, particularly multi-agent systems, to simulate order-driven markets [@chiarella2009impact; @byrd2020abides; @amrouni2021abides].

With the advancements in deep learning technologies, several works have emerged that adopt the world model paradigm to simulate Limit Order Book (LOB) markets [@takahashi2019modeling; @li2020generating; @coletta2021towards; @coletta2022learning; @coletta2023conditional]. These studies primarily leveraged Generative Adversarial Networks (GANs) [@goodfellow2020generative] to model the distribution of LOB time series.

Recently, some generators have begun incorporating market micro-structure data, such as those presented in [@hultin2023generative; @nagy2023generative]. Among these, [@nagy2023generative] is most related to our work, particularly regarding the order model. They employ an auto-regressive model based on a Deep State Space Network [@rangapuram2018deep] to generate LOB and message flows. However, their focus is primarily on LOB modeling. While they demonstrate some realistic stylized facts of the generated sequences, they do not evaluate the model's capability to address downstream financial tasks.

Our work aims to push the boundaries of financial market simulation by introducing an innovative approach that goes beyond generating realistic order flows. We introduce MarS, a pioneering financial market simulation engine driven by the Large Market Model (LMM). Designed to meet the specific demands of the financial sector, MarS excels in modeling the market impact of orders and achieving high levels of controllability and realism. By framing various financial market tasks as conditional trading order generation problems, we demonstrate MarS's transformative potential and practical applications in real-world financial markets.

**Foundation Models.** Foundation models are trained on broad datasets and can be adapted to a wide range of downstream tasks. The term was popularized by the Stanford Institute for Human-Centered Artificial Intelligence [@bommasani2021opportunities]. The release of GPT-3 [@brown2020language] showcased the powerful benefits of training large auto-regressive language models (LLMs) on extensive corpora [@abdin2024phi; @achiam2023gpt; @dubey2024llama].

In addition, numerous foundation models have emerged in the fields of computer vision (CV) and multimodal areas [@rombach2021highresolution; @videoworldsimulators2024; @liu2023llava]. Recently, real-world simulators and industry-specific large models have become popular research topics in this field. Real-world simulators aim to achieve real-world simulation through the unified goal of video generation, addressing various tasks in fields such as autonomous driving, robotics, and gaming [@liu2024sora; @zhu2024sora; @yang2024video; @yang2023learning]. However, they primarily focus on simulating the physical world. The order-driven financial market is an exemplary virtual world with different operating principles. To the best of our knowledge, we are the first to build a financial world simulator.

Industry-specific large models primarily focus on fields such as biomedicine [@moor2023foundation], law [@huang2023lawyer], and finance. In the financial domain, most large models are Financial LLMs, which either pre-train LLMs on financial corpora [@wu2023bloomberggpt; @zhang2023xuanyuan] or fine-tune them [@xie2024pixiu; @zhang2023instruct; @Fin-LLAMA; @yang2023investlm] to tackle financial NLP tasks or multimodal tasks [@bhatia2024fintral; @xie2024open], including sentiment analysis, text classification, and question answering.

Beyond text, there is an even larger and more information-rich corpus in the financial world: trading orders. We propose a Large Market Model (LMM), which, for the first time, reveals the scaling law on trading orders. We take the first step toward building a generative foundation model as a world model for the financial market. We believe that, with MarS as the shovel, the extensive order-level data undoubtedly represent a significant gold mine.

# Order Sequence Modeling {#sec:order_model}

## Introduction

The order model for financial markets shares similarities with the Language Model (LM) for text in several respects. Both models strive to predict the subsequent event, whether it be a token in a text corpus or a trade order in financial markets. Additionally, the datasets for both are typically extensive, facilitating the training of robust models. Furthermore, data in both domains can be generated autoregressively.

Nevertheless, substantial differences also exist between the two fields. Each order in the financial market is associated with a complex set of market dynamics, including the Limit Order Book (LOB), transactions, and potentially market news in natural language. Consequently, each order may be influenced by a broader array of information beyond the order stream itself. It is therefore imperative to encode this rich information compactly while preserving the autoregressive generation paradigm. Moreover, the financial market operates on a rule-based order matching system, which processes orders and generates new states, such as transactions and the updated LOB. This necessitates an additional order matching step to obtain accurate market states.

<figure id="fig:order-model-framework" data-latex-placement="ht!">
<img src="Figures/Order-model/Framework_order-model_cropped.png" style="width:70.0%" />
<figcaption>The framework of the order model. The model is trained on the order stream and the corresponding LOB information. It is autoregressive, generating the next order based on the preceding order and LOB information. The order matching step is employed to produce the new LOB state.</figcaption>
</figure>

## Approach

### Tokenization

The objective of tokenization is to produce a representation that is compact and efficient to encode and decode while retaining most of the useful information. To this end, we encode each order and its preceding LOB as a single token. The LOB information functions analogously to an image embedded in text, offering additional context for the order. The tokenization procedure for the $i^{th}$ order is as follows: $$\begin{equation}
    Emb_i = \text{emb}(order_i) + \text{linear\_proj} (LOB_i^{\text{volumes}}) + \text{emb}(LOB_i^{\text{mid\_price}})
\end{equation}$$

Here, $order_i$ is an index that identifies the order's (type, price, volume, interval) tuple, with type being one of \[\`\`Ask", \`\`Bid", \`\`Cancel"\]. Both price and volume are discretized into the range \[0, 32), and interval into \[0, 16). Since $3 \times 32 \times 32 \times 16 = 49152$, an index within the range \[0, 49152) uniquely identifies each (type, price, volume, interval) tuple. $LOB_i^{\text{volumes}}$ represents the 10-level volumes for asks and bids in the LOB, also discretized into \[0, 32). The $LOB_i^{\text{mid\_price}}$ is the mid-price of the LOB, expressed as the number of price tick changes since market opening.

This formula computes the embedding for the $i^{th}$ token as the sum of three terms: the order embedding, the linear projection of the LOB volumes, and the embedding of the LOB mid-price.
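The mapping between an order tuple and its index can be illustrated with a mixed-radix encoding. The bin counts and the vocabulary size come from the description above. The particular ordering of the fields inside the index is an assumption, since any fixed bijection onto $[0, 49152)$ would serve.

```python
from typing import Tuple

ORDER_TYPES = ("Ask", "Bid", "Cancel")
N_PRICE, N_VOLUME, N_INTERVAL = 32, 32, 16
VOCAB_SIZE = len(ORDER_TYPES) * N_PRICE * N_VOLUME * N_INTERVAL  # 49152


def encode_order(order_type: str, price_bin: int, volume_bin: int, interval_bin: int) -> int:
    """Map a discretized (type, price, volume, interval) tuple to one index."""
    type_id = ORDER_TYPES.index(order_type)
    return ((type_id * N_PRICE + price_bin) * N_VOLUME + volume_bin) * N_INTERVAL + interval_bin


def decode_order(index: int) -> Tuple[str, int, int, int]:
    """Invert `encode_order`."""
    index, interval_bin = divmod(index, N_INTERVAL)
    index, volume_bin = divmod(index, N_VOLUME)
    type_id, price_bin = divmod(index, N_PRICE)
    return ORDER_TYPES[type_id], price_bin, volume_bin, interval_bin
```

The smallest tuple `("Ask", 0, 0, 0)` maps to index 0 and the largest tuple `("Cancel", 31, 31, 15)` maps to index 49151.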

While the input token includes LOB information, it is impractical and unnecessary to predict the resultant LOB during the decoding process. Instead, the new LOB information can be derived using a standard order matching algorithm, based on the preceding LOB and the newly generated order. Given this consideration, we only output the order index and conduct an order matching during simulation to obtain the subsequent accurate LOB state, as depicted in Fig. `\ref{fig:order-model-framework}`{=latex}.
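A minimal order matching step for limit orders might look as follows. This is a simplified sketch rather than the matching engine used in MarS: volume is aggregated per price level, so only price priority is modelled, and cancel orders and time priority within a level are omitted.

```python
from typing import Dict, List, Tuple

Book = Dict[str, Dict[int, int]]  # {"bid": {price: volume}, "ask": {price: volume}}


def match_limit_order(book: Book, side: str, price: int, volume: int) -> List[Tuple[int, int]]:
    """Match an incoming limit order against `book` in place.

    Returns the executed trades as (price, volume) pairs; any unfilled
    remainder is added to the book at the order's limit price.
    """
    opposite = book["ask" if side == "bid" else "bid"]
    trades: List[Tuple[int, int]] = []
    while volume > 0 and opposite:
        # Best opposite price: lowest ask for a buy, highest bid for a sell.
        best = min(opposite) if side == "bid" else max(opposite)
        crosses = price >= best if side == "bid" else price <= best
        if not crosses:
            break
        traded = min(volume, opposite[best])
        trades.append((best, traded))
        volume -= traded
        opposite[best] -= traded
        if opposite[best] == 0:
            del opposite[best]
    if volume > 0:
        own = book[side]
        own[price] = own.get(price, 0) + volume
    return trades
```

After each call, the updated `book` plays the role of the new LOB state that accompanies the next order token.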

## Data and Model Training {#sec:data_model_training}

Our dataset encompasses the 500 most liquid stocks in the Chinese stock market, covering the period from 2017 to 2023 and comprising 16 billion order tokens. Our model architecture is based on LLaMA2 [@touvron2023llama], and the AdamW optimizer [@loshchilov2017decoupled] is employed in all experiments. We utilize fp16 precision with DeepSpeed ZeRO stage 2 [@rajbhandari2020zero] to optimize memory usage. The sequence length is set to 1024 with a batch size of 4096, equating to about 4 million tokens ($1024 \times 4096 = 4{,}194{,}304$) per optimization step.

We compare tokenization with and without LOB information to determine its impact on training performance. Integrating the LOB information yields an improved training curve, as shown in Fig. `\ref{fig:order-tokenization}`{=latex}.

<figure id="fig:order-tokenization" data-latex-placement="ht!">
<img src="Figures/Order-model/2024-05-13-order-model-tokenization.png" style="width:50.0%" />
<figcaption>Tokenization of the Order Model. A comparative analysis of the tokenization process with and without the Limit Order Book (LOB) information. Incorporating precise LOB information leads to an improved training curve.</figcaption>
</figure>

Furthermore, we examine the effects of varying data and model sizes on training performance. The data suggest that augmenting both data and model sizes correlates with improved outcomes, as shown in Fig. `\ref{fig:order-model-scaling-curve}`{=latex}.

# Order-Batch Sequence Modeling {#sec:order_batch_model}

## Introduction

In this section, we introduce the order-batch model. Different from the order model, which focuses on individual orders, the order-batch model concentrates on batches of orders to model structured patterns of dynamic market behavior over time intervals. We organize each batch of orders into an RGB image, which is then discretized into tokens for autoregressive training, with the aim of generating order-batch sequences.

Financial markets comprise diverse participants, each with a unique set of information and trading frequency. Even in the domain of high-frequency trading, there are nuances: some traders pay close attention to each order, while others may focus on signals in fixed time intervals to guide their trading decisions. Through data analysis, we can easily discern the traces left by the latter type of high-frequency traders. We counted the number of orders per minute for each stock in our dataset introduced in Sec. `\ref{sec:data_model_training}`{=latex} to create the chart shown in Fig. `\ref{fig:num_intraday_distl}`{=latex}. From this chart, we can observe the following patterns: 1. The intraday order distribution is U-shaped. 2. There is a significant increase in order numbers at the market open in the morning and after the lunch break. 3. There are spikes in order numbers nearly every 10 minutes, suggesting a periodic pattern. These observations indicate that the distribution of orders within fixed intervals adheres to consistent patterns, which a model can capture. We therefore model these structured patterns of dynamic market behavior.

<figure id="fig:num_intraday_distl" data-latex-placement="t">
<img src="Figures/Order-batch/num_intraday_dist.png" style="width:70.0%" />
<figcaption>The intraday distribution of the average number of orders per minute.</figcaption>
</figure>

Besides, modeling batches of orders facilitates the generation of specific financial scenarios. When a specified market scenario is generated from prompts, there is significant information asymmetry between the brief text of the prompt and the thousands of orders in an order flow. Imposing such a signal directly onto each order through an order model is clearly intractable. Therefore, we need an order-batch model to act as a bridge between the prompt and the order model. The order-batch model responds to prompts by first generating minute-level order batches, which are then decoded into an order flow in conjunction with the order model.

## Approach

As observed in Fig. `\ref{fig:num_intraday_distl}`{=latex}, the number of orders within a fixed time interval varies, and these variations are significant at different times throughout the day. In light of this, learning representations from padded sequences is clearly not a sensible approach. To better represent a variable number of orders, we convert the orders into an RGB image format. This approach allows us not only to \`\`visualize" the changes in orders over a period of time but also to draw on the experience of the image generation field, transforming the problem of order-batch generation into one of image generation. We present the framework of the order-batch model in Fig. `\ref{fig:order_batch_model}`{=latex}.

## Order Image Converter {#sec:order_image_converter}

<figure id="fig:order_batch_model" data-latex-placement="ht!">
<img src="Figures/Order-batch/Framework_order_batch_model.png" />
<figcaption>The framework of the order-batch model. We employ a two-stage training approach: in Stage 1, we leverage a fine-tuned image encoder to transform “order images” from minute-level orders into tokens; in Stage 2, we train an autoregressive transformer model to learn the distribution of the tokens. Order images are decoded from tokens via the fine-tuned image decoder.</figcaption>
</figure>

Learning representations directly from order sequences at fixed time intervals is not an effective and practical approach. On the one hand, stocks with different levels of liquidity have significantly different order numbers. On the other hand, for the same stock, the distribution of order numbers throughout the day can be extremely uneven (with a higher concentration during the opening and closing periods, and sparser distribution during the mid-day). Within fixed time intervals (e.g., minute-level), we care more about the aggregate characteristics of the order sequence rather than the details of individual orders. Under the assumption that the distribution of orders remains relatively stable over short periods, we can disregard the precise arrival times of individual orders and structure the order sequences in a cross-sectional view.

In practice, we convert one order-batch into an RGB image format. We refer to such images as \`\`order images" with shape $[C,H,W]$, as illustrated in Fig. `\ref{fig:order_image_converter-rebuttal}`{=latex}. $C$ denotes the categories of orders, i.e., the channels of an order image. $W$ and $H$ represent the width and height of the order image, indicating the number of price and volume slots, respectively. The pixel value $V$ of an order image is the count of identical orders. In our work, we set $C=3, H=W=32, V \in [0,100]$.

The order image converter allows us not only to \`\`depict" the changes in orders over a past period but also to leverage experience from image generation. We can utilize a pre-trained visual encoder to obtain an order-batch embedding.
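
The conversion described above can be sketched as a simple counting procedure. This is a minimal illustration under stated assumptions: mapping the three channels to the order types Ask, Bid and Cancel, clipping counts at 100 to enforce $V \in [0,100]$, and the name `orders_to_image` are all choices made for this example rather than details confirmed by the text.

```python
# Assumed channel assignment: one channel per order category.
CHANNELS = {"Ask": 0, "Bid": 1, "Cancel": 2}
H = W = 32      # H: volume slots, W: price slots
V_MAX = 100     # upper bound of a pixel value


def orders_to_image(orders):
    """Build a [C, H, W] order image from one batch of discretized orders.

    `orders` is an iterable of (order_type, price_slot, volume_slot) tuples.
    Each pixel counts how many identical orders fall into that slot;
    arrival times inside the batch are ignored.
    """
    image = [[[0] * W for _ in range(H)] for _ in range(len(CHANNELS))]
    for order_type, price_slot, volume_slot in orders:
        c = CHANNELS[order_type]
        # Clipping at V_MAX is an assumption about how the range is enforced.
        image[c][volume_slot][price_slot] = min(image[c][volume_slot][price_slot] + 1, V_MAX)
    return image
```

Because the image has a fixed shape regardless of how many orders arrive in the interval, no padding is needed for sparse or dense minutes.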

### Stage 1: Order Image Tokenizer

After converting the order-batch into an order image, we transform the problem of modelling order-batches into an image generation problem. In this way, we can follow the successful path of Large Vision Models [@bai2024sequential], adopting a two-stage approach to generate intraday order-batch sequences. The first stage of the image generation task typically involves using a pre-trained image tokenizer to discretize individual images into a series of tokens.

Specifically, we leverage VQGAN [@esser2021taming] to convert order images into discrete tokens. VQGAN learns a convolutional model consisting of an encoder and a decoder, which represents images using codes from a learned, discrete codebook. In particular, VQGAN incorporates a discriminator and a perceptual loss to ensure high quality during the compression process. In our implementation, both the encoder and decoder use the original structure. **Technical Details**: We use a pre-trained VQGAN from LDM [@rombach2022high], which was trained on the LAION-400M database [@schuhmann2021laion]. We adopt the configuration and weights from one of the models in the LDM model zoo, with a down-sampling factor $f=4$, vocabulary size $Z=8192$, and codebook dimension $d=3$. This means that an RGB order image of size $32 \times 32$ with 3 channels is discretized into $8 \times 8 = 64$ tokens at this stage, each with a dimension of $3$. In practice, we find that the off-the-shelf model parameters do not represent order images well, so we fine-tune the model on order images to achieve a transition from natural images to order images.

### Stage 2: Order-batch Sequence Modelling

After the order image tokenizer converts individual order images into a sequence of discrete tokens, we concatenate these tokens to form an order-batch sequence. In Stage 2, we train an autoregressive transformer to learn the distribution of these tokens. It learns not only the distribution of tokens that make up an order image but also the distribution of tokens between order images. Consequently, we can generate intraday order-batch sequences.

Specifically, we employ a language model for next-token prediction training. **Technical Details**: We use LLaMA2 [@touvron2023llama] as the implementation framework for our autoregressive transformer. We calculate the cross-entropy loss between prediction logits and labels. **Implementation Details**: The context length for LLaMA2 is 4096, and we concatenate 16 order-batches to form an order-batch sequence, with a total length of $16 \times 64 = 1024$, which is well below the length limit.
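
The token bookkeeping of the two stages can be checked with a short sketch. The helper names `tokens_per_image` and `build_sequence` are hypothetical, and the token values themselves would come from the fine-tuned image tokenizer, which is not modeled here.

```python
IMAGE_SIZE = 32        # order image height and width
DOWNSAMPLE = 4         # down-sampling factor f
BATCHES_PER_SEQ = 16   # order-batches concatenated per sequence
CONTEXT_LENGTH = 4096  # LLaMA2 context length


def tokens_per_image(size=IMAGE_SIZE, f=DOWNSAMPLE):
    """Number of discrete tokens for one order image: (size / f) ** 2."""
    side = size // f
    return side * side


def build_sequence(batch_tokens):
    """Concatenate per-image token lists into one order-batch sequence."""
    sequence = [tok for tokens in batch_tokens for tok in tokens]
    if len(sequence) > CONTEXT_LENGTH:
        raise ValueError("sequence exceeds the context length")
    return sequence
```

With the settings above, each image yields 64 tokens and a sequence of 16 images yields 1024 tokens, one quarter of the available context.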

# Ensemble Model {#sec:ensemble_model}

## Introduction

In the sections above, we introduced the order model and the order-batch model, each with its own strengths and limitations:

- **Order model**: This model generates orders individually and is designed to reflect short-term market impacts rapidly. However, it lacks the ability to generate target scenarios over the long run.

- **Order-batch model**: This model generates order channels (we do not distinguish between 'order channels' and 'order images' in this paper), representing the macro behavior of the market, and can be used to follow control signals. However, it lacks the ability for interactive market simulation.

In this section, we introduce the ensemble model, which aims to balance interaction and controllability in market simulation.

## Approach

The order channels output by the order-batch model contain rich information about macro trends in the financial market. It would be advantageous if the order model could utilize this information to generate orders.

We propose using an ensemble model that takes the order logits and order channels as input and generates the next order, as illustrated in Fig. `\ref{fig:mars generation}`{=latex}.

In our experiment, we found it challenging to train the ensemble model directly from order channels predicted by the order-batch model. The reason is that the order channels predicted by the order-batch model still exhibit high variance and may not accurately reflect replay order data. Realizing this, during training, we use the order channels directly from replay data, which provides an accurate description of the market trend. In this way, our ensemble model learns how to condition on the order channels to generate the next order. During simulation, we use order channels predicted by the order-batch model to generate orders, which provide more flexibility for controllable simulation.

The ensemble model is a simple cross-attention model that takes the order logits and real order channels as input and generates the next order. The loss advantage over the order model is used as the training metric. Fig. `\ref{fig:ensemble-training}`{=latex} shows the training process of the ensemble model. With this design, the ensemble model improves its performance on order generation, demonstrating that it conditions on the order-batch data.

<figure id="fig:ensemble-training" data-latex-placement="ht!">
<img src="Figures/Control/ensemble-training.png" style="width:50.0%" />
<figcaption>Training process of the ensemble model. The x-axis represents the number of training samples, and the y-axis represents the loss advantage over the order model.</figcaption>
</figure>

# Fine-Grained Signal Generation Interface {#app-sec:nlp-control}

We introduce an interface that maps vague descriptions to fine-grained control signals using LLM-based historical market record retrieval. This guides our order-batch model, ensuring simulations reflect realistic market patterns and user-defined scenarios. The process involves three main steps:

- **Example Provision and Code Generation**: Provide a sample of minute-level return history to GPT-4o mini and prompt it to generate code that retrieves historical periods matching specified scenarios.

- **Scenario Filtering**: Apply the generated code on the entire dataset to identify more minute-level trajectories for each scenario.

- **Scenario Generation**: Use the identified minute-level trajectories to guide the generation of order batches according to principles outlined in Sec. `\ref{item:priciples}`{=latex}, alongside the ensemble model for scenario generation.

The minute-level return history is stored in a CSV file, formatted as shown in Table `\ref{tab:raw_data_sample}`{=latex}:

    **date**    **minute**   **SZ000001**   **SZ000002**   **\...**   **SZ003043**   **SZ003816**
  ------------ ------------ -------------- -------------- ---------- -------------- --------------
   2023-01-03    09:31:00     -0.001520       0.001664       \...      -0.005541       0.000000
   2023-01-03    09:32:00      0.000761       0.000000       \...      -0.004261       0.000000
      \...         \...          \...           \...         \...         \...           \...
   2023-03-31    14:55:00      0.000797       0.000657       \...       0.000164       0.000000
   2023-03-31    14:56:00      0.000000      -0.000656       \...       0.000164       0.000000

  : Format of minute-level return history {#tab:raw_data_sample}

We demonstrate market simulations for scenarios including \`\`Sharp Drop", \`\`Sharp Rise", and \`\`Trend Reversal". ```\AddMethod{Below, we detail the process when DES\_TEXT is ``Sharp Drop''}```{=latex}. First, a prompt is provided to GPT-4o mini, which generates code to filter typical cases for the \`\`Sharp Drop" scenario. The prompt is shown in Table `\ref{tab:prompt}`{=latex}.

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| **Scenario: Sharp Drop**                                                                                                                                                           |
+:===================================================================================================================================================================================+
| **Data Description**: The input data is in CSV format with the following information.                                                                                              |
|                                                                                                                                                                                    |
| - The first column \`\`date" represents the trading date.                                                                                                                          |
|                                                                                                                                                                                    |
| - The second column \`\`minute" represents the time.                                                                                                                               |
|                                                                                                                                                                                    |
| - Each subsequent column corresponds to an instrument, with the value in each cell representing the return of the instrument for the given minute compared to the previous minute. |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| **Output Description**: Please identify and provide 30 samples where a stock drops sharply within a 25-minute window. For each sample, include the following details:              |
|                                                                                                                                                                                    |
| 1.  Date.                                                                                                                                                                          |
|                                                                                                                                                                                    |
| 2.  Start and end minute of the 25-minute window.                                                                                                                                  |
|                                                                                                                                                                                    |
| 3.  Stock code.                                                                                                                                                                    |
|                                                                                                                                                                                    |
| 4.  The return of the 25-minute interval.                                                                                                                                          |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| **Constraints on Output:**                                                                                                                                                         |
|                                                                                                                                                                                    |
| 1.  Ensure that the 25-minute cases do not contain duplicate stock codes and datetimes. Each sample should be selected from different trading days.                                |
|                                                                                                                                                                                    |
| 2.  Ensure that each 25-minute interval is within the same trading day.                                                                                                            |
|                                                                                                                                                                                    |
| 3.  You can use `groupby('datetime').rolling(25).sum()` to convert 1-minute-level returns to 25-minute-level returns.                                                              |
|                                                                                                                                                                                    |
| 4.  The begin and end times of the 25-minute interval should be within trading hours, e.g., 9:30 AM - 11:30 AM and 1 PM - 3 PM.                                                    |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

: Prompt used for generating code in the \`\`Sharp Drop" scenario {#tab:prompt}

The code generated by GPT-4o mini, shown in Fig. `\ref{fig:code-example}`{=latex}, is then used to filter the \`\`Sharp Drop" scenario and applied to the entire dataset to identify additional cases.

<figure id="fig:code-example">

<figcaption>Generated code to filter out the “Sharp Drop” case</figcaption>
</figure>
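
As a rough illustration of what such filtering code might look like, the following sketch implements the constraints stated in the prompt. It is not the code GPT-4o mini actually generated. The function name `find_sharp_drops`, the in-memory row layout, and the reading of the first constraint as "distinct trading days and distinct stock codes" are assumptions made for this example. It also uses only the standard library rather than the rolling-sum call suggested in the prompt.

```python
from collections import defaultdict


def find_sharp_drops(rows, window=25, n_samples=30):
    """Pick the sharpest `window`-minute drops, one per trading day and stock.

    `rows` is a list of dicts with keys "date", "minute", and one key per
    instrument holding its 1-minute return (hypothetical in-memory layout
    of the CSV described in the prompt).
    """
    # Group rows by trading day so that no window crosses a day boundary.
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["date"]].append(row)

    candidates = []
    for date, day_rows in by_date.items():
        day_rows.sort(key=lambda r: r["minute"])  # zero-padded HH:MM:SS sorts correctly
        if len(day_rows) < window:
            continue
        stocks = [key for key in day_rows[0] if key not in ("date", "minute")]
        for stock in stocks:
            returns = [float(r[stock]) for r in day_rows]
            # A rolling sum of 1-minute returns approximates the window return.
            running = sum(returns[:window])
            best_sum, best_end = running, window - 1
            for end in range(window, len(returns)):
                running += returns[end] - returns[end - window]
                if running < best_sum:
                    best_sum, best_end = running, end
            start = best_end - window + 1
            candidates.append((best_sum, date, stock,
                               day_rows[start]["minute"], day_rows[best_end]["minute"]))

    # Most negative windows first; keep distinct days and distinct stocks.
    candidates.sort()
    samples, used_dates, used_stocks = [], set(), set()
    for total, date, stock, start_minute, end_minute in candidates:
        if date in used_dates or stock in used_stocks:
            continue
        samples.append({"date": date, "start_minute": start_minute,
                        "end_minute": end_minute, "stock": stock, "return": total})
        used_dates.add(date)
        used_stocks.add(stock)
        if len(samples) == n_samples:
            break
    return samples
```

Since the rows only cover trading hours, every selected window starts and ends within a session, although a window may still span the lunch break in this simplified version.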

Once the minute-level return trajectory is retrieved, it is used to guide the generation of order batches along with the ensemble model for scenario generation. Detailed descriptions and visualizations of the three scenarios are provided:

- **Sharp Drops**: Simulating sharp declines to understand market reactions to negative events, assess risk management strategies, and evaluate market liquidity.

- **Sharp Rises**: Simulating sharp increases to capture market behavior during positive events, allowing traders to test profit-taking strategies and analyze upward trends.

- **Trend Reversals**: Simulating trend reversals to identify signals for entry or exit points and understand market reactions to trend shifts.

Fig. `\ref{fig:control_cases}`{=latex} displays real stock trends over the first 15 minutes and the stock trends generated by MarS for the last 10 minutes of a 25-minute period for these scenarios. Each row represents a scenario with three cases. The x-axis denotes time, and the y-axis indicates price. The blue line shows the replay price trajectory, and the orange line depicts the simulated price trajectory with confidence intervals. The results demonstrate MarS's capability to effectively generate diverse market scenarios, providing valuable insights for market participants.

<figure id="fig:control_cases" data-latex-placement="ht!">
<img src="Figures/Control/control-examples.png" style="width:95.0%" />
<figcaption>Case study for different scenario generation.</figcaption>
</figure>

\AddMethod{
    \section{Configurations of input over different applications }\label{sec:config-apps}

    Since Sec.~\ref{MarS Design} abstracts the mechanism of MarS as a conditional generation process, we summarize the input conditions of each application in Table \ref{table:app-condition} and clarify them in more detail below.

    \textbf{DES\_TEXT} is a key component in the Conditional Trading Order Generation task, acting as a control mechanism for the ``Conditional'' aspect. It is designed to describe different market states under which we aim to generate trading orders. Examples of such market states include ``sharp price decline'' or ``high market volatility''. By incorporating \textbf{DES\_TEXT}, we enable the generation process to adapt to varying market conditions, making the generated trading orders contextually relevant. More details on \textbf{DES\_TEXT} are provided in Sec \ref{app-sec:nlp-control}.

    As for \textbf{MTCH\_R}, it represents a comprehensive set of order-matching rules, for example, the widely used double auction mechanism. In real-world financial markets, the rules are specified and periodically adjusted by exchanges. In our simulation, these rules are governed by the Simulated Clearing House. We formulate \textbf{MTCH\_R} as a hyperparameter to make the MarS framework adaptable to different markets and conditions. In this paper, we set it to the standard settings of the default double auction. Expanding \textbf{MTCH\_R} would reveal the full extent of an exchange’s trading rules, encompassing many details that we have implemented in our code for the Simulated Clearing House.

    Moreover, while the double auction mechanism is a common paradigm for the majority of global financial markets, there are variations in trading rules that differ across markets and periods. These include aspects such as price fluctuation limits, circuit breakers, and the distinction between call and continuous auction sessions. Our goal was to encapsulate these variations within the conditional trading order generation framework, ensuring the approach remains broadly applicable and flexible for different market scenarios.
}

         **Applications**                                                      **Input Conditions**
  ----------------------- ------------------------------------------------------------------------------------------------------------------------------
              Forecasting                                        $\left(x_0, \ldots, x_m\right),\textit{MTCH\_R}$
                Detection                                        $\left(x_0, \ldots, x_m\right),\textit{MTCH\_R}$
    \`\`What if" Analysis  \[$\textit{DES\_TEXT}],\left(x_0, \ldots, x_m\right)^*,[\left(\dot{x}_{i+1}, \ldots, \dot{x}_{i+j}\right)],\textit{MTCH\_R}$
           RL Environment   $\textit{DES\_TEXT}^*,\left(x_0, \ldots, x_m\right)^*,\left(\dot{x}_{i+1}, \ldots, \dot{x}_{i+j}\right),\textit{MTCH\_R}$

  : `\AddMethod{The summary of input conditions for order generation of different applications. $*$ means the condition is optional and $[ \ ]$ indicates that either of the specified conditions should be chosen. }`{=latex} {#table:app-condition}

# Data and Technical Details of Detection {#appendix:detection}

::: center
   **Instrument**   **Start time**   **End time**   **Case Number**
  ---------------- ---------------- -------------- -----------------
       300475         2017-03-07      2017-04-25     \[2020\]No.92
       002321         2017-04-17      2018-01-30     \[2024\]No.44
       300263         2017-05-17      2017-09-25     \[2023\]No.36
       300658         2019-02-13      2019-05-10     \[2023\]No.25
       300378         2019-03-14      2019-04-15    \[2021\]No.116
       300119         2019-04-01      2019-05-22    \[2021\]No.116
       002718         2020-06-04      2020-07-15     \[2022\]No.64
       300313         2020-08-19      2020-08-24     \[2021\]No.76
       002730         2020-12-15      2021-11-17     \[2024\]No.23
       002713         2022-05-05      2022-05-18     \[2024\]No.47

  : Market manipulation samples collected from CSRC. {#tab:detection sample}
:::

Traditional methods for detecting market abuse are time-consuming and challenging, and abnormal market states are often defined and detected based on the differences between current and historical market patterns. In this section, we take market manipulation as an example and demonstrate how MarS could bring a new simulation-based paradigm to the detection task.

Table `\ref{tab:detection sample}`{=latex} shows the market manipulation samples collected from the China Securities Regulatory Commission (CSRC). The data encompass a total of 10 stocks, none of which were included in the datasets used for our model training. For each stock, we gathered samples from an equal number of trading days before and after the manipulation occurred for comparison. There are 522 trading days for each period. For each trading day, we conducted simulations every 25 minutes and then calculated a series of stylized facts of the simulated and replay trajectories.

The spread is a key indicator of market liquidity, with a larger spread indicating poorer market liquidity. At time $t$, the spread $\delta$ is defined as: $\delta_t = a_t - b_t$, where $a_t$ is the best ask price and $b_t$ is the best bid price. The spread distribution is widely used in detection tasks in finance [@affleck2000detecting; @vyetrenko2020get].
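
The spread definition above can be turned into an empirical distribution with a few lines of code. This is a minimal sketch: the function names and the use of integer price units are assumptions, and the Distribution Similarity metric used in the experiment is not reproduced here.

```python
from collections import Counter


def spreads(quotes):
    """Compute delta_t = a_t - b_t for each (best_ask, best_bid) pair.

    Prices are assumed to be integers (e.g. in minimum price units) so that
    identical spreads compare equal exactly.
    """
    return [ask - bid for ask, bid in quotes]


def empirical_distribution(values):
    """Relative frequency of each observed value, sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in sorted(counts.items())}
```

Applying `empirical_distribution(spreads(...))` to a replay trajectory and to a simulated trajectory gives the two distributions that a similarity metric would then compare.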

As we evaluated MarS's realism in a normal market in Sec. `\ref{Experiments}`{=latex}, a straightforward principle for anomaly detection is that a sharp drop in simulation realism metrics can serve as an initial indicator of potential anomalies. To verify this, we collected several market manipulation cases from CSRC[^5]. For each stock, we collected replay samples before, during and after the manipulation, and conducted the corresponding MarS simulations. Using Distribution Similarity[^6], we compare the spread distributions of the replay and simulated trajectories.

Fig. `\ref{fig:market_manipulation}`{=latex} shows the varying spread distributions in different periods around manipulation. While MarS generally simulates the normal market well, its performance drops during the manipulation, where the spread distribution shows a heavier tail and a peak around $\delta=1000$. These anomalies can be viewed as signals likely corresponding to market manipulation, where manipulators significantly impact liquidity and widen the spread. Because such anomalies are less frequent in normal markets, they lead to a performance drop in MarS, which suggests a new detection approach based on monitoring such similarity drops. Consequently, MarS can help investors avoid anomalies and assist financial institutions in maintaining market stability.

It is important to note that a single anomaly does not conclusively indicate market manipulation. Instead, it serves as an initial signal that requires further holistic assessment, combining multiple metrics to ensure robust conclusions. The example provided serves as a representative illustration of our approach. Our primary objective in this experiment was to demonstrate the paradigm shift MarS offers in market manipulation detection.

# Configurable TWAP Strategy and Trading Agent  {#sec:twap-algo}

## Introduction of TWAP Strategy

The Time-Weighted Average Price (TWAP) algorithm executes large trade volumes while minimizing market impact over a specified time frame. The TWAP strategy divides the total volume to be traded into equal parts that are executed at regular intervals. This strategy consists of two distinct phases within each interval: the passive period and the aggressive period. Key configurations include:

- **Maximum Passive Volume Ratio (PVR)**: During the passive period, the strategy places orders at the current bid price (bid1) with a volume determined by the PVR, aiming to fill orders without significantly altering the market price. A PVR of 0 indicates no passive volume during the passive period.

- **Aggressive Price (AP)**: If passive trading does not achieve the expected volume, the strategy enters an aggressive phase, placing additional orders at a more aggressive price (AP) to ensure the desired volume is executed. An AP of 0 means no aggressive order during the aggressive period.

By balancing passive and aggressive trading, the TWAP strategy aims to execute large orders efficiently while controlling market impact.

Taking the buying task as an example, our configurable TWAP strategy is shown below:

\begin{algorithm}[H]\SetAlgoLined
    \KwIn{Total Volume $V$, Execution Time $T = 5$ minutes, Split Interval $\Delta t = 30$ seconds, Maximum Passive Volume Ratio $PVR$, Aggressive Price $AP$ (ask1, ask2, ..., ask5)}
    \KwOut{Executed Orders}
    \BlankLine
    \textbf{Initialization:}
    \begin{enumerate}
        \item Split the total volume $V$ into 10 equal parts. Each part $K = V/10$ is expected to be executed in $\Delta t = 30$ seconds.
    \end{enumerate}
    \BlankLine
    \textbf{For each interval $i$ from $1$ to $10$:}
    \begin{enumerate}
        \item \textbf{Passive Period:} (First 25 seconds of each interval)
              \begin{enumerate}
                  \item Cancel all non-bid1 volumes.
                  \item Submit a passive order with max volume $PVR \times V$ and price bid1.
                  \item Wait for 25 seconds.
              \end{enumerate}
        \item \textbf{Aggressive Period:} (Last 5 seconds of each interval)
              \begin{enumerate}
                  \item If the current executed volume lags behind the expected volume:
                        \begin{itemize}
                            \item Calculate the extra volume $E$ to be executed.
                            \item If the available volume is insufficient, cancel existing passive orders as needed.
                            \item Submit an aggressive order with volume $E$ and price $AP$.
                        \end{itemize}
                  \item Wait for 5 seconds.
              \end{enumerate}
    \end{enumerate}
    \caption{Configurable Time-Weighted Average Price (TWAP) Strategy for Buying.}
    \label{algo:twap}
\end{algorithm}
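The volume bookkeeping in the algorithm above can be sketched in Python as follows. This is a minimal illustration, not the MarS implementation: `TwapConfig` and the helper functions are hypothetical names, the catch-up rule (expected volume $i \cdot K$ minus executed volume) is one reading of "lags behind the expected volume", and capping the passive order at the remaining volume is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class TwapConfig:
    total_volume: float          # V
    n_intervals: int = 10        # T / delta_t = 300 s / 30 s
    passive_seconds: int = 25    # passive period of each interval
    aggressive_seconds: int = 5  # aggressive period of each interval
    pvr: float = 0.1             # maximum passive volume ratio
    ap_level: int = 1            # 0 = no aggressive order, k = price at ask-k


def slice_volume(cfg):
    """Volume K expected to execute in each interval (K = V / 10)."""
    return cfg.total_volume / cfg.n_intervals


def passive_order_volume(cfg, executed):
    """Passive order size at bid1: at most PVR * V.

    Capping at the remaining volume is an assumption of this sketch.
    """
    remaining = max(cfg.total_volume - executed, 0.0)
    return min(cfg.pvr * cfg.total_volume, remaining)


def aggressive_order_volume(cfg, interval, executed):
    """Extra volume E at the end of interval i (1-based).

    Returns 0 when the agent is on schedule or when AP is 0.
    """
    if cfg.ap_level == 0:
        return 0.0
    expected = interval * slice_volume(cfg)
    return max(expected - executed, 0.0)


cfg = TwapConfig(total_volume=1000, pvr=0.1, ap_level=1)
print(slice_volume(cfg), aggressive_order_volume(cfg, interval=3, executed=250))
```

With $V = 1000$, the agent expects 300 shares to be filled by the end of the third interval, so an executed volume of 250 leaves $E = 50$ for the aggressive order.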

## Training of TWAP Trading Agent with RL

To train the trading agent with RL, we let the agent choose the maximum passive volume ratio (PVR) from {0, 0.1, ..., 1} and the aggressive price (AP) from {0, 1, 2, 3, 4, 5} of the TWAP strategy. We use a batch size of 8192 and a learning rate of $4 \times 10^{-5}$. The trading model is updated with a simple policy gradient algorithm [@sutton2018reinforcement]. The performance metric is the price advantage over our best-configured TWAP agent ($\text{L1-P0.9}$), measured in basis points (BP, 1/10000).
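As a small illustration of the setup above, the sketch below enumerates the discrete action grid and computes a price advantage in basis points. The function name `price_advantage_bp` and its sign convention (positive means a better price than the baseline) are assumptions; the text does not spell out the exact formula.

```python
from itertools import product

# Action grid described in the text: 11 PVR values x 6 AP levels.
PVR_GRID = [round(0.1 * k, 1) for k in range(11)]  # 0.0, 0.1, ..., 1.0
AP_GRID = [0, 1, 2, 3, 4, 5]                       # 0 = no aggressive order
ACTIONS = list(product(PVR_GRID, AP_GRID))


def price_advantage_bp(agent_price, baseline_price, side="buy"):
    """Price advantage over the baseline in basis points (1 BP = 1/10000).

    Assumed convention: positive means the agent bought cheaper
    (or sold higher) than the baseline agent.
    """
    sign = 1.0 if side == "buy" else -1.0
    return sign * (baseline_price - agent_price) / baseline_price * 1e4


print(len(ACTIONS))
print(price_advantage_bp(agent_price=99.99, baseline_price=100.0))
```

Under this convention, buying at 99.99 when the baseline buys at 100.00 corresponds to an advantage of about 1 BP.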

#  `\AddMethod{Evaluation of Cont's 11 Stylized Facts}`{=latex} {#sec:cont-stylized-facts}

`\label{set:cont-stylized-facts}`{=latex}

## Summary

\AddMethod{Stylized facts are high-level summaries of empirical characteristics in financial markets, essential for assessing the realism of market simulations. In this section, we evaluate the 11 stylized facts identified by \cite{cont2001empirical} using historical and simulated order sequences.}
\AddMethod{To rigorously test these facts, we simulated 11,591 trajectories for the top 500 liquid stocks in the Chinese market, from March 9, 2023, to July 12, 2023. Table \ref{table:font-11-facts} compares the presence of these facts in both historical and simulated data. The \textbf{Historical} column indicates observation in real data, while the \textbf{Simulated} column assesses their presence in simulated data. Key findings include:}

- \AddMethod{Nine out of the 11 stylized facts are observed in both historical and simulated data. However, \textit{Gain/loss asymmetry} and \textit{Leverage effect} are not present, possibly reflecting modern market shifts. Studies such as \cite{10386957} note similar absences in the modern U.S. Dow 30 stocks.}

- \AddMethod{All 11 facts show similar patterns between simulated and historical sequences, showcasing the model's strong capability in generating realistic order sequences.}

\AddMethod{Note that merely evaluating stylized facts does not fully assess financial market simulation quality. Further evaluations for \textbf{in-context} generation, such as forecasting (Section \ref{sec:forecasting}) and quantitative analysis of stylized facts (Section \ref{sec:quantitative-stylized-facts}), are crucial.

    \begin{table}[h!]
        
        \begin{tabular}{|c|l|c|c|}
            \hline
            \textbf{Fact \#} & \textbf{Fact Name}                                & \textbf{Historical} & \textbf{Simulated} \\ \hline
            1                & Absence of autocorrelations                       & $\times$            & $\times$           \\
            2                & Heavy tails                                       & $\times$            & $\times$           \\
            3                & Gain/loss asymmetry                               &                     &                    \\
            4                & Aggregational Gaussianity                         & $\times$            & $\times$           \\
            5                & Intermittency                                     & $\times$            & $\times$           \\
            6                & Volatility clustering                             & $\times$            & $\times$           \\
            7                & Conditional heavy tails                           & $\times$            & $\times$           \\
            8                & Slow decay of autocorrelation in absolute returns & $\times$            & $\times$           \\
            9                & Leverage effect                                   &                     &                    \\
            10               & Volume/volatility correlation                     & $\times$            & $\times$           \\
            11               & Asymmetry in timescales                           & $\times$            & $\times$           \\ \hline
        \end{tabular}
        \caption{\AddMethod{Presence of Stylized Facts in Historical and Simulated Order Sequences. All facts are present in both historical and simulated data, except for \textit{Gain/loss asymmetry} and \textit{Leverage effect}.}}
        \label{table:font-11-facts}
    \end{table}
}

## `\AddMethod{Definitions of Stylized Facts}`{=latex} {#section}

\AddMethod{
    The 11 stylized facts from \cite{cont2001empirical} are:

    \begin{enumerate}
        \item \textbf{Absence of autocorrelations}: ``(linear) autocorrelations of asset returns are often insignificant, except for very small intraday time scales ($\simeq$ 20 minutes) for which microstructure effects come into play.''
        \item \textbf{Heavy tails}: ``the (unconditional) distribution of returns seems to display a power-law or Pareto-like tail, with a tail index which is finite, higher than two and less than five for most data sets studied. In particular this excludes stable laws with infinite variance and the normal distribution. However the precise form of the tails is difficult to determine.''
        \item \textbf{Gain/loss asymmetry}: ``one observes large drawdowns in stock prices and stock index values but not equally large upward movements.''
        \item \textbf{Aggregational Gaussianity}: ``as one increases the time scale $t$ over which returns are calculated, their distribution looks more and more like a normal distribution. In particular, the shape of the distribution is not the same at different time scales.''
        \item \textbf{Intermittency}: ``returns display, at any time scale, a high degree of variability. This is quantified by the presence of irregular bursts in time series of a wide variety of volatility estimators.''
        \item \textbf{Volatility clustering}: ``different measures of volatility display a positive autocorrelation over several days, which quantifies the fact that high-volatility events tend to cluster in time.''
        \item \textbf{Conditional heavy tails}: ``even after correcting returns for volatility clustering (e.g. via GARCH-type models), the residual time series still exhibit heavy tails. However, the tails are less heavy than in the unconditional distribution of returns.''
        \item \textbf{Slow decay of autocorrelation in absolute returns}: ``the autocorrelation function of absolute returns decays slowly as a function of the time lag, roughly as a power law with an exponent $\beta \in [0.2, 0.4]$. This is sometimes interpreted as a sign of long-range dependence.''
        \item \textbf{Leverage effect}: ``most measures of volatility of an asset are negatively correlated with the returns of that asset.''
        \item \textbf{Volume/volatility correlation}: ``trading volume is correlated with all measures of volatility.''
        \item \textbf{Asymmetry in time scales}: ``coarse-grained measures of volatility predict fine-scale volatility better than the other way round.''
    \end{enumerate}

    \subsection{Evaluation of Stylized Facts}

    This subsection summarizes the evaluation results for each stylized fact. Initially, each instrument is assessed individually, and the results are then aggregated across all instruments to obtain an average. A 95\% confidence interval is shown for line plots, and quantiles are displayed for the box plot.

    \textbf{Absence of autocorrelations}: We computed the autocorrelation of returns using both the last and the mean trade price per minute. Figs. \ref{fig:cont1-last-absence-of-auto-correlations} and \ref{fig:cont1-mean-absence-of-auto-correlations} illustrate that autocorrelations decay quickly after one minute. With the last trade price, the lag-1 autocorrelation is negative because of the ``bid-ask bounce'', as noted in \cite{10386957}. Conversely, with the mean trade price, the lag-1 autocorrelation is positive, indicating short-term momentum. For consistency with \cite{10386957}, we use the last trade price for subsequent evaluations.

    \begin{figure}[ht!]
        
        \begin{tabular}[c]{c c}
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont1-last-absence-of-auto-correlations.pdf}
                
                \caption{\small Absence of autocorrelations (Last Price)}
                \label{fig:cont1-last-absence-of-auto-correlations}
            \end{subfigure} &
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont1-mean-absence-of-auto-correlations.pdf}
                
                \caption{\small Absence of autocorrelations (Mean Price)}
                \label{fig:cont1-mean-absence-of-auto-correlations}
            \end{subfigure}
        \end{tabular}
        
        \caption{\small \AddMethod{ Absence of autocorrelations. (\textbf{a}) Using \textit{last} trade price. (\textbf{b}) Using \textit{mean} trade price. Both show rapid decline after 1 minute.}}
        
    \end{figure}

    \textbf{Heavy tails} and \textbf{Aggregational Gaussianity}: We calculated the kurtosis of returns over various intervals. Positive (excess) kurtosis indicates a sharper peak and heavier tails than a normal distribution. Fig. \ref{fig:cont2-last-heavy-tails-aggregational-gaussianity} shows that return distributions exhibit heavy tails. The distributions move towards normality as the interval extends from 1 to 20 minutes, in line with Aggregational Gaussianity.

    \textbf{Conditional heavy tails}: Volatility varies throughout the trading day, peaking at open and close. After normalizing returns by minute-specific volatility and computing kurtosis, Fig. \ref{fig:cont7-conditional-heavy-tails} shows that normalized returns still exhibit heavy tails, though less pronounced than unconditional returns in Fig. \ref{fig:cont2-last-heavy-tails-aggregational-gaussianity}, consistent with Conditional heavy tails.

    \begin{figure}[ht!]
        
        \begin{tabular}[c]{c c}
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont2-last-heavy-tails-aggregational-gaussianity.pdf}
                
                \caption{\small Heavy tails and Aggregational Gaussianity}
                \label{fig:cont2-last-heavy-tails-aggregational-gaussianity}
            \end{subfigure} &
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont7-conditional-heavy-tails.pdf}
                
                \caption{\small Conditional heavy tails}
                \label{fig:cont7-conditional-heavy-tails}
            \end{subfigure}
        \end{tabular}
        
        \caption{\small (\textbf{a}) \AddMethod{ Heavy tails and Aggregational Gaussianity. (\textbf{b}) Conditional heavy tails.}}
        
    \end{figure}

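    \textbf{Gain/loss asymmetry}: Returns are positively skewed (Fig. \ref{fig:cont3-gain-loss-asymmetry}), whereas Cont's description of larger drawdowns than upward movements implies negative skewness. This fact is therefore not observed in its original form.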

    \textbf{Volatility clustering} and \textbf{Slow decay of autocorrelation in absolute returns}: Autocorrelation of absolute returns for different intervals shows slow decay in Fig. \ref{fig:cont6-volatility-clustering-slow-decay}. Considering absolute returns as volatility \cite{MULLER1997213}, this also illustrates volatility clustering.

    \begin{figure}[ht!]
        
        \begin{tabular}[c]{c c}
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont3-gain-loss-asymmetry.pdf}
                
                \caption{\small Gain/loss asymmetry}
                \label{fig:cont3-gain-loss-asymmetry}
            \end{subfigure} &
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont6-volatility-clustering-slow-decay.pdf}
                
                \caption{\small Volatility Clustering}
                \label{fig:cont6-volatility-clustering-slow-decay}
            \end{subfigure}
        \end{tabular}
        
        \caption{\small (\textbf{a}) \AddMethod{Gain/loss asymmetry: right-skewed distribution. (\textbf{b}) Volatility Clustering: slow decay of absolute return autocorrelation.}}
        
    \end{figure}

    \textbf{Intermittency}: Following \cite{10386957}, extreme returns are defined as returns whose absolute value exceeds the 99\% quantile of absolute returns. The Fano factor, i.e., the variance-to-mean ratio of event counts, equals 1 for a Poisson process. Here it exceeds 1, indicating that extreme returns are more variable than a Poisson process would imply (Fig. \ref{fig:cont5-intermittency}). This, along with heavy tails and volatility clustering, confirms Intermittency.

    \textbf{Leverage effect}: Return and lagged volatility correlation is slightly positive (Fig. \ref{fig:cont9-leverage-effect}), contrary to Cont's description.

    \begin{figure}[ht!]
        
        \begin{tabular}[c]{c c}
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont5-intermittency.pdf}
                
                \caption{\small Intermittency}
                \label{fig:cont5-intermittency}
            \end{subfigure} &
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont9-leverage-effect.pdf}
                
                \caption{\small Leverage effect}
                \label{fig:cont9-leverage-effect}
            \end{subfigure}
        \end{tabular}
        
        \caption{\small (\textbf{a}) \AddMethod{Intermittency: Fano factor exceeds 1, indicating high variability. (\textbf{b}) Leverage effect: slightly positive correlation between return and lagged volatility.}}
        
    \end{figure}

    \textbf{Volume/volatility correlation}: Positive correlation between volume and lagged volatility is evident (Fig. \ref{fig:cont10-volume-volatility-corr}).

    \textbf{Asymmetry in timescales}: Following \cite{TAKAHASHI2019121261}, we assessed correlation between fine- and coarse-grained volatility across lags from -10 to 10 minutes. Fig. \ref{fig:cont11-asymmetry-in-timescales} shows significant negative asymmetry, consistent with \cite{TAKAHASHI2019121261} and \cite{MULLER1997213}.

    \begin{figure}[ht!]
        
        \begin{tabular}[c]{c c}
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont10-volume-volatility-corr.pdf}
                
                \caption{\small Volume/volatility correlation}
                \label{fig:cont10-volume-volatility-corr}
            \end{subfigure} &
            \begin{subfigure}[b]{0.45\textwidth}
                
                \includegraphics[width=0.99\linewidth]{Figures/Cont-11-stylized-facts/cont11-asymmetry-in-timescales.pdf}
                
                \caption{\small Asymmetry in timescales}
                \label{fig:cont11-asymmetry-in-timescales}
            \end{subfigure}
        \end{tabular}
        
        \caption{\small (\textbf{a}) \AddMethod{Volume/volatility correlation: positive correlation. (\textbf{b}) Asymmetry in timescales: significant negative asymmetry observed.}}
        
    \end{figure}

}
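The statistics used in the evaluation above (autocorrelation, kurtosis, skewness, and the Fano factor) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than the evaluation code of the paper: the function names are hypothetical, the Fano factor is computed on counts of extreme returns in non-overlapping windows, and the window length is a free parameter that the text does not specify.

```python
import numpy as np


def autocorrelation(x, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)


def skewness(x):
    """Third standardized moment (negative when losses dominate the tail)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


def fano_factor(returns, window, quantile=0.99):
    """Variance-to-mean ratio of extreme-return counts per window.

    A return is flagged as extreme when its absolute value exceeds the
    given quantile of absolute returns. A Poisson process gives 1.
    """
    a = np.abs(np.asarray(returns, dtype=float))
    extreme = a > np.quantile(a, quantile)
    n = len(extreme) // window
    counts = extreme[: n * window].reshape(n, window).sum(axis=1)
    return float(counts.var() / counts.mean())


# Synthetic heavy-tailed returns, used only to exercise the functions.
rng = np.random.default_rng(0)
r = rng.standard_t(df=3, size=20_000) * 1e-3
print(autocorrelation(r, 1), excess_kurtosis(r), skewness(r), fano_factor(r, 100))
```

Applying `autocorrelation` to absolute returns instead of raw returns gives the quantity plotted for volatility clustering.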

# Quantitative Analysis of Stylized Facts {#sec:quantitative-stylized-facts}

To ensure experiments are comparable across runs, we quantify the stylized facts with two metrics:

- **Distribution Similarity**: We calculate the overlap coefficient between the empirical distribution of the stylized fact and the simulated distribution. A higher score indicates a higher similarity in the overall distribution.

- **Accuracy (3-Class)**: We classify each stylized fact value into three classes (low, medium, and high) using thresholds derived from replay data, chosen so that the three classes have similar probabilities over the simulation period. We then compare the class of the simulated value with the class of the replay value and report the classification accuracy. This metric measures our capability for in-context prediction.
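The two metrics above can be sketched as follows. This is a minimal sketch under assumptions: the overlap is estimated from histograms over shared bins as the sum of bin-wise minima (the text does not state the estimator or the number of bins), and the class thresholds are taken as the terciles of the replay values. Both function names are hypothetical.

```python
import numpy as np


def overlap_coefficient(replay, simulated, bins=50):
    """Histogram estimate of the overlap of two samples, in [0, 1]."""
    replay = np.asarray(replay, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    lo = min(replay.min(), simulated.min())
    hi = max(replay.max(), simulated.max())
    edges = np.linspace(lo, hi, bins + 1)  # shared bin edges
    p, _ = np.histogram(replay, bins=edges)
    q, _ = np.histogram(simulated, bins=edges)
    return float(np.minimum(p / p.sum(), q / q.sum()).sum())


def three_class_accuracy(replay, simulated):
    """Share of points whose simulated value falls in the same class
    (low / medium / high) as the paired replay value.

    Class boundaries are the terciles of the replay values.
    """
    replay = np.asarray(replay, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    cuts = np.quantile(replay, [1 / 3, 2 / 3])
    same = np.digitize(replay, cuts) == np.digitize(simulated, cuts)
    return float(np.mean(same))


# Synthetic per-minute values, used only to exercise the functions.
rng = np.random.default_rng(0)
replay = rng.normal(0.5, 0.1, size=5_000)
simulated = replay + rng.normal(0.0, 0.05, size=5_000)
print(overlap_coefficient(replay, simulated), three_class_accuracy(replay, simulated))
```

Identical samples give a score of 1 on both metrics, and a random classifier would score about 1/3 on the 3-class accuracy, which gives a reference point for the values in the table.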

<figure id="fig:buy-ratio" data-latex-placement="ht!">
<img src="Figures/Order-model/stylized-facts-buy-ratio.png" style="width:65.0%" />
<figcaption>Stylized Fact Analysis: Buy Order Ratio. This metric assesses the proportion of buy to buy+sell orders, capturing market dynamics that may influence the market trend.</figcaption>
</figure>

We show an example for the Buy Order Ratio in Fig. `\ref{fig:buy-ratio}`{=latex}: we calculate the buy order ratio for each minute and then compare the distribution of the ratio between simulation and replay data. In summary, we achieve a high score for the overall distribution similarity and an acceptable 3-class classification considering the nuances of market dynamics. We list the full quantitative results in Table `\ref{tab:stylized_fact}`{=latex}.

::: center
                  **Name**  **Distribution Similarity**   **Accuracy (3-Class)**
  ------------------------ ----------------------------- ------------------------
                Volatility             0.872                      0.516
                    Spread             0.970                      0.729
         Mean Order Volume             0.957                      0.776
    Aggressive Order Ratio             0.920                      0.525
           Buy Order Ratio             0.933                      0.570
              1-Min Return             0.956                      0.684
              2-Min Return             0.936                      0.625
              3-Min Return             0.924                      0.583
              4-Min Return             0.914                      0.548
              5-Min Return             0.908                      0.531

  : Summary of stylized facts. The prediction for 1 to 5-Min Return is aggregated from 128 rollouts for each initial time point. {#tab:stylized_fact}
:::

#  Market Impact {#app-sec:market-impact}

This section provides further details on interactive simulation and market impact analysis.

**Market Impact Generation:** We generate market impact data using the TWAP strategy with four different configurations: $\text{L1-P0.1}$, $\text{L1-P0.9}$, $\text{L5-P0.1}$, and $\text{L5-P0.9}$. The configuration name $\text{LX-PY}$ indicates that the aggressive price ($\text{AP}$) is $\text{askX}$ and the maximum passive volume ratio ($\text{PVR}$) is $\text{Y}$. These agents are assigned to buy varying volumes over 5 minutes with different instructions and starting times. We explored the market impact generated by these trading agents from 624k simulated trading trajectories.

**Further analysis of synthetic market impact:** Beyond verifying the Square-Root-Law, we further analyze the synthetic market impact data. The key findings are summarized as follows:

- Agents with more aggressive configurations ($\text{L5-P0.1}$ and $\text{L5-P0.9}$) are expected to exhibit a larger market impact and achieve a higher fulfillment rate. Our simulations quantify their differences and confirm these assumptions, as illustrated in Fig. `\ref{fig:market-impact-analysis-fullfill}`{=latex}.

- The agents generate both short-term and long-term market impacts in MarS, as shown in Fig. `\ref{fig:market-impact-analysis-short-long}`{=latex}, similar to observations studied in previous empirical work [@bacry2014marketimpactslifecycle; @donier2015fullyconsistentminimalmodel]. We also observe that agents with a larger passive volume ratio generate less momentum after trading ends.

<figure id="fig:market-impact-analysis-short-long" data-latex-placement="ht!">
<table>
<tbody>
<tr>
<td style="text-align: center;"><figure id="fig:market-impact-analysis-fullfill">
<img src="Figures/market_impact/twap-impacts/market-impact-fulfillment-rate.png" style="width:85.0%" />
<figcaption>Fulfillment rate of different agents</figcaption>
</figure></td>
<td style="text-align: center;"><figure id="fig:market-impact-analysis-short-long">
<img src="Figures/market_impact/twap-impacts/market-impact-long-short-impact.png" style="width:85.0%" />
<figcaption>Short-term and long-term market impact</figcaption>
</figure></td>
</tr>
</tbody>
</table>
<figcaption>Further investigation of synthetic market impact</figcaption>
</figure>

These findings confirm the reliability and convenience of using synthetic data from MarS, allowing for in-depth exploration of market dynamics without the cost, risk, and time constraints associated with real-world experiments.

**New factors of Market Impact:** The three new factors $\{$*resiliency*, *LOB_pressure*, *LOB_depth*$\}$ are defined as follows: $$\begin{align}
    \textit{resiliency}    & = 1-\log (|\textit{pre\_trading\_moment}|)                                                                                                                 \\
    \textit{LOB\_pressure} & = (\alpha \cdot \textit{agent\_trans\_ask} + (1-\alpha) \cdot \textit{agent\_trans\_bid} ) \cdot \textit{LOB\_imb}_{\textit{last-pre-min}}                 \\
    \textit{LOB\_depth}    & = \log(\beta \cdot \textit{LOB\_ask\_volume}_{\textit{last-pre-min}} + (1-\beta) \cdot \textit{LOB\_bid\_volume}_{\textit{last-pre-min}}),
\end{align}$$ where: $$\begin{align}
    \textit{pre\_trading\_moment}             & = \frac{\sum_{t=t_0}^{\textit{last-pre-min}-1}\gamma_t \cdot \textit{mid\_price}_{t}}{\textit{mid\_price}_{\textit{last-pre-min}}} -1                                                                                                           \\
    \textit{agent\_trans\_ask}                & = \frac{\sum_{t=\textit{trade\_start}}^{\textit{trade\_end}}\textit{agent\_trans\_volume}_{t}}{\sum_{t=\textit{trade\_start}}^{\textit{trade\_end}}\textit{agent\_trans\_volume}_{t} + \textit{LOB\_ask\_volume}_{\textit{last-pre-min}}} \\
    \textit{agent\_trans\_bid}                & = \frac{\sum_{t=\textit{trade\_start}}^{\textit{trade\_end}}\textit{agent\_trans\_volume}_{t}}{\sum_{t=\textit{trade\_start}}^{\textit{trade\_end}}\textit{agent\_trans\_volume}_{t} + \textit{LOB\_bid\_volume}_{\textit{last-pre-min}}} \\
    \textit{LOB\_imb}_{\textit{last-pre-min}} & = \frac{|\textit{LOB\_ask\_volume}_{\textit{last-pre-min}}-\textit{LOB\_bid\_volume}_{\textit{last-pre-min}}|}{\textit{LOB\_ask\_volume}_{\textit{last-pre-min}}+\textit{LOB\_bid\_volume}_{\textit{last-pre-min}}},
\end{align}$$ and $\alpha$, $\beta$, and $\{ \gamma_t\}$ are hyper-parameters subject to the constraints $\alpha\in( 0,1)$, $\beta\in( 0,1)$, $\gamma_t\in( 0,1)$ for any $t$, and $\sum_{t=t_0}^{\textit{last-pre-min}-1}\gamma_t = 1$. $\textit{last-pre-min}$ denotes the last minute before the agent starts to trade. $\textit{LOB\_ask\_volume}$ and $\textit{LOB\_bid\_volume}$ are the ask and bid volumes of the LOB. $\textit{agent\_trans\_volume}_{t}$ is the transaction volume of the agent at time $t$. $\textit{mid\_price}_t$ is the mid-price at time $t$.
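The factor definitions above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the default values $\alpha = \beta = 0.5$ are placeholders rather than the values used in the paper, and the logarithm is assumed to be natural. Note that *resiliency* is undefined when the weighted pre-trading mid-price equals the last pre-trading mid-price, since the logarithm of zero diverges.

```python
import math


def lob_imbalance(ask_vol, bid_vol):
    """LOB_imb: absolute ask/bid volume imbalance at the last pre-trading minute."""
    return abs(ask_vol - bid_vol) / (ask_vol + bid_vol)


def participation(agent_volume, lob_volume):
    """agent_trans_ask or agent_trans_bid: agent volume relative to one LOB side."""
    return agent_volume / (agent_volume + lob_volume)


def lob_pressure(agent_volume, ask_vol, bid_vol, alpha=0.5):
    """LOB_pressure: weighted participation scaled by the LOB imbalance."""
    mix = (alpha * participation(agent_volume, ask_vol)
           + (1 - alpha) * participation(agent_volume, bid_vol))
    return mix * lob_imbalance(ask_vol, bid_vol)


def lob_depth(ask_vol, bid_vol, beta=0.5):
    """LOB_depth: log of the weighted ask and bid volume."""
    return math.log(beta * ask_vol + (1 - beta) * bid_vol)


def resiliency(mid_prices, last_mid, gammas):
    """resiliency = 1 - log(|pre_trading_moment|).

    mid_prices: mid-prices for t0 .. last-pre-min - 1.
    gammas: weights in (0, 1) that sum to 1.
    """
    moment = sum(g * p for g, p in zip(gammas, mid_prices)) / last_mid - 1
    return 1 - math.log(abs(moment))


# Toy numbers: the agent trades 100 shares against 300 on the ask and 100 on the bid.
print(lob_pressure(100, 300, 100), lob_depth(300, 100))
print(resiliency([101.0, 101.0], last_mid=100.0, gammas=[0.5, 0.5]))
```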

The relationship between market impact and the factors $\textit{LOB\_pressure}$ and $\textit{LOB\_depth}$ is shown in Fig. `\ref{fig:market-impact-factors}`{=latex}.

<figure id="fig:market-impact-factors" data-latex-placement="ht!">
<table>
<tbody>
<tr>
<td style="text-align: center;"><figure>
<img src="Figures/market_impact/LOB_pressure.png" />
<figcaption>LOB_pressure</figcaption>
</figure></td>
<td style="text-align: center;"><figure>
<img src="Figures/market_impact/LOB_depth.png" />
<figcaption>LOB_depth</figcaption>
</figure></td>
</tr>
</tbody>
</table>
<figcaption>Effects of new factors on market impact.</figcaption>
</figure>

We also investigate the correlation between the three new factors and the Square-Root-Law factors, $\sqrt{Q/V}$ and volatility $\sigma$, in Fig. `\ref{fig:correlation}`{=latex}. The correlations between the new factors and the Square-Root-Law factors are relatively low.

<figure id="fig:correlation" data-latex-placement="ht!">
<img src="Figures/market_impact/correlation_matrix.png" style="width:55.0%" />
<figcaption>Correlation matrix of Square-Root-Law factors and three new factors.</figcaption>
</figure>

**Dynamics of long-term Market Impact:** To model the long-term market impact with `\eqref{eq:long-term-appendix}`{=latex}, we use two decay functions, $F^{decay}(t) = [ \frac{1}{t}, \frac{1}{\sqrt{t}} ]$, and seven factors: $\{ \sqrt{\frac{Q}{V}}, \textit{mid-price}, \textit{agent\_replay}, \textit{agent\_rollout},\textit{LOB\_depth},\textit{LOB\_pressure},\textit{resiliency} \}$. Here $\textit{mid-price}$ is the mid-price before trading, and $\textit{agent\_rollout}$ and $\textit{agent\_replay}$ are defined as follows: $$\begin{align}
    \textit{agent\_rollout} & = \frac{\sum_\textit{trade\_start}^\textit{trade\_end}\textit{agent\_trans\_volume}_{t}}{\textit{total\_transaction\_volume\_of\_rollout}} \\
    \textit{agent\_replay}  & = \frac{\sum_\textit{trade\_start}^\textit{trade\_end}\textit{agent\_trans\_volume}_{t}}{\textit{total\_transaction\_volume\_of\_replay}}
\end{align}$$

The training process is based on the synthetic long-term market impact generated by the TWAP agent ($\text{L1-P0.1}$). We use torchdiffeq [@torchdiffeq] to optimize $W$, with an objective that combines the MSE reconstruction loss and L1 regularization.

After training, we illustrate the auto-correlation of the synthetic market impact decay, the trajectories predicted by the learned ODE, and the base ODE from empirical formulas [@gatheral2011exponential; @curato2017optimal] in Fig. `\ref{fig:long-term-ode-auto}`{=latex}.

<figure id="fig:long-term-ode-auto" data-latex-placement="ht!">
<img src="Figures/market_impact/auto_correlation.png" style="width:45.0%" />
<figcaption>Auto-correlation of long-term market impact with learned ODE and base-ODE. </figcaption>
</figure>

For the base-ODE used as a baseline in Fig. `\ref{fig:long-term-ode-auto}`{=latex}, we use the basic form of the Square-Root Process [@gatheral2010no], which is defined as: $$\begin{align}
    \frac{dY(t)}{dt} =  \sigma \sqrt{\frac{Q}{V}} \frac{1}{\sqrt{t}} \label{eq:long-term-appendix}
\end{align}$$ where $\sigma$ is the volatility, $Q$ is the trading volume, and $V$ is the total market volume.
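For intuition, the base ODE can be integrated in closed form: assuming $Y(0) = 0$ (an assumption of this sketch, since the initial condition is not stated here), $Y(t) = 2\sigma\sqrt{Q/V}\,\sqrt{t}$. The short check below compares this expression with a numerical integration of the right-hand side. The function names and the toy parameter values are hypothetical.

```python
import math


def base_ode_rhs(t, sigma, q, v):
    """dY/dt = sigma * sqrt(Q / V) / sqrt(t), defined for t > 0."""
    return sigma * math.sqrt(q / v) / math.sqrt(t)


def base_ode_closed_form(t, sigma, q, v, y0=0.0):
    """Y(t) = y0 + 2 * sigma * sqrt(Q / V) * sqrt(t)."""
    return y0 + 2.0 * sigma * math.sqrt(q / v) * math.sqrt(t)


def integrate_midpoint(t_end, sigma, q, v, steps=100_000):
    """Midpoint-rule integral of the right-hand side from 0 to t_end.

    Midpoints avoid evaluating the integrand at the singular point t = 0.
    """
    h = t_end / steps
    return h * sum(base_ode_rhs((k + 0.5) * h, sigma, q, v) for k in range(steps))


# Toy values: 2% volatility, agent volume equal to 1% of market volume.
sigma, q, v = 0.02, 1_000.0, 100_000.0
print(base_ode_closed_form(1.0, sigma, q, v), integrate_midpoint(1.0, sigma, q, v))
```

The two values agree closely but not exactly, because the integrand is singular at $t = 0$ and the midpoint rule converges slowly there.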

\AddMethod{
    \section{Comparison of DeepLOB and MarS/LMM in Forecasting Tasks}
    \label{sec:forecasting-comparison}

    \begin{table}[h!]
        
        \begin{tabular}{r c c}
            \toprule
            \textbf{Aspect}           & \textbf{DeepLOB}                      & \textbf{MarS/LMM}                       \\ \midrule
            \textbf{Applicable Tasks} & Task-specific forecasting.            & General forecasting through simulation. \\
            \textbf{Input Features}   & Limit order book (LOB) data.          & High-frequency order-level data.        \\
            \textbf{Model}            & Small, handcrafted, and not scalable. & Large-scale foundation model.           \\
            \textbf{Prediction}       & Single-step or fixed-length.          & Multi-step, sequence generation.        \\ \bottomrule
        \end{tabular}
        \caption{Comparison of DeepLOB and MarS/LMM in forecasting tasks.}
        \label{tab:comparison}
    \end{table}


    Table \ref{tab:comparison} compares DeepLOB and MarS/LMM in forecasting tasks, emphasizing their distinct approaches and capabilities. DeepLOB is designed for specific forecasting tasks, is trained on fixed-step forecasting, and uses Limit Order Book (LOB) data as input. It features a relatively small, handcrafted model for LOB forecasting, which is hard to scale up, and provides single-step predictions for fixed-length horizons, such as the price change after 100 orders or 1 minute. In contrast, MarS is designed for market simulation, performs general forecasting through simulation, and uses fine-grained order sequence data as input. It is powered by large foundation models trained on large-scale order sequence data and offers simulation with multi-step generation.
}

[^1]: Equal Contribution

[^2]: Corresponding Author

[^3]: <http://www.csrc.gov.cn>

[^4]: <https://en.wikipedia.org/wiki/Overlap_coefficient>

[^5]: <http://www.csrc.gov.cn>

[^6]: <https://en.wikipedia.org/wiki/Overlap_coefficient>