Exploration: Fine-Tuning With Parameter Decomposition

Source

Raw Markdown: paper_exploration-fine-tuning-parameter-decomposition-2026.md
Primary write-up: LessWrong post by Lucius Bushnaq
Official X thread: Goodfire German-removal thread
Correction: Goodfire off-target plot correction
VPD announcement thread: Lee Sharkey thread
Official code: goodfire-ai/param-decomp
VPD paper / interactive report: Interpreting Language Model Parameters
VPD summary: Goodfire VPD explainer
SPD paper: arXiv:2506.20790

Local X/API/HTML/code snapshots are stored under papers/exploration-fine-tuning-parameter-decomposition-2026/, including x_provided_posts_2026-06-26.json, x_thread_goodfireai_2070181051801235463.json, x_thread_leedsharkey_2051717264286609516.json, and source_github_readme.md.

Status And Credibility

This is a 2026-06-25 author-reported hackathon exploration by Goodfire/Lucius Bushnaq, not a peer-reviewed paper. It is credible enough to track as an important source because the LessWrong write-up, Goodfire X thread, Lee Sharkey VPD thread, Goodfire research pages, and official MIT-licensed goodfire-ai/param-decomp repository all cross-reference the same VPD parameter-decomposition method family and 67M language-model decomposition.

Credibility caveats are substantial: the target is a single 67M-parameter four-layer language model, the result depends on an expensive prior decomposition and autointerpretation pass, and the experiment is designed as a sanity check / product hackathon demo rather than a benchmarked scaling study.

Core Claim

A VPD-derived weight decomposition can turn some model edits into scalar rescaling of existing causal parameter subcomponents. In the reported experiment, Goodfire removes a small language model’s ability to predict German text by tuning one scalar prefactor on a German-related rank-1 subcomponent, instead of adding LoRA capacity or doing ordinary dense fine-tuning.

The useful formula-level picture is:

$$W^{l} \approx \sum_{c} U_{c}^{l} (V_{c}^{l})^{\top} + \Delta^{l},$$

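where $U_{c}^{l} (V_{c}^{l})^{\top}$ is rank-1 subcomponent $c$ of layer $l$ and $\Delta^{l}$ is the residual the decomposition leaves unexplained. An edit then trains or selects a mask/prefactor $m_{c}$ that rescales an existing subcomponent, rather than learning a new adapter matrix. The source frames $m_{c} > 1$ as amplification, $0 \leq m_{c} < 1$ as suppression, and $m_{c} < 0$ as inversion.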
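A minimal sketch of the algebra, assuming plain Python lists for tiny matrices and made-up component vectors (none of these names come from `param-decomp`). Rescaling subcomponent $c$ by $m_{c}$ changes the weight by exactly $(m_{c} - 1) U_{c}^{l} (V_{c}^{l})^{\top}$, so a one-scalar edit is always an update of rank at most 1 along a direction the decomposition already found, and the sign regimes follow from the value of $m_{c}$.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    """Rank-1 matrix u v^T."""
    return [[ui * vj for vj in v] for ui in u]


def compose(components: Sequence[Tuple[Vector, Vector]],
            prefactors: Sequence[float], delta: Matrix) -> Matrix:
    """Return sum_c m_c * u_c v_c^T + delta; m_c = 1 leaves component c unchanged."""
    w = [row[:] for row in delta]
    for (u, v), m_c in zip(components, prefactors):
        for i, row in enumerate(outer(u, v)):
            for j, x in enumerate(row):
                w[i][j] += m_c * x
    return w


def regime(m_c: float) -> str:
    """Name a prefactor using the amplify / suppress / invert convention."""
    if m_c > 1:
        return "amplify"
    if m_c == 1:
        return "unchanged"
    if m_c >= 0:
        return "suppress"  # m_c = 0 ablates the component entirely
    return "invert"


# Hypothetical 2x2 layer: two rank-1 subcomponents plus a small residual delta.
components = [([1.0, 0.0], [0.0, 2.0]),   # component 0: left untouched
              ([0.5, 1.0], [1.0, 0.0])]   # component 1: the edit target
delta = [[0.01, 0.0], [0.0, -0.01]]

m = -1.0  # the single trained scalar, applied to component 1 only
base = compose(components, [1.0, 1.0], delta)
edited = compose(components, [1.0, m], delta)

# The edit's total effect on W is (m - 1) * u_1 v_1^T, a matrix of rank at most 1.
update = [[e - b for e, b in zip(e_row, b_row)] for e_row, b_row in zip(edited, base)]
print(regime(m), update)
```

Setting every prefactor to 1 recovers the decomposed weight, and `update` is the only change the edit makes; a LoRA adapter, by contrast, learns its own low-rank directions.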

Evidence

Evidence thread	Source report	Wiki interpretation
Goodfire X thread	The Goodfire thread says a 67M language model’s German ability was removed by tuning one scalar on one decomposed weight subcomponent, using only a few German tokens, and that a corrected off-target plot reduces one displayed effect by 0.01 nats.	Useful as official launch context and provenance, but the technical evidence is the LessWrong write-up and VPD/code artifacts.
LessWrong experiment	The post reports that the inverting one-scalar edit ($m_{c} < 0$) brings German prediction to chance level with about 4 German training tokens while adding under 0.10 nats of English cross-entropy (CE); LoRA baselines need about 32 German tokens to meet the same English-damage target.	Strong mechanism clue for this small model, but not a fair total-cost comparison unless decomposition and autointerpretation costs are included.
Off-target language behavior	The top-16 component edit and rank-1 LoRA often damage French, Spanish, and Italian; after autointerp labels narrow the edit to `h.3.attn.v_proj:513`, French and Spanish are mostly preserved, while Italian is still damaged.	Interpretability is operationally useful: labels changed the edit target and reduced collateral damage. The Italian failure is a warning that labels can be incomplete.
VPD lineage	Lee Sharkey’s thread and Goodfire’s VPD report say VPD decomposes weights rather than activations, handles attention computations distributed across heads, builds attribution graphs, and supports hand-written edits such as emoticon completion.	The German-removal demo is a post-hoc edit on the same decomposed 67M model, not an independent proof of scalable decomposition.
Code repository	The official `param-decomp` repo contains the core library, `nano_param_decomp/`, `param-decomp-lab`, VPD/SPD code releases, and language-model experiment entrypoints.	Reproducibility is better than a thread-only claim, but large-model practicality remains unproven.
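To make the "one trainable scalar plus a preservation term" shape concrete, here is a toy sketch, not the post's setup: a linear map with one rank-1 component, a squared-error forget loss on target inputs, and a squared-error retain loss on protected inputs. The objective, the function names `train_scalar` and `drift`, and all vectors are hypothetical; the post trains a real LM and its exact objective is not reproduced here. The toy also shows why an unprotected neighbor can be hurt: any input with nonzero $v^{\top}x$ moves with $m$ unless the loss protects it.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    return [dot(row, x) for row in w]


def train_scalar(w_rest: Matrix, u: Vector, v: Vector,
                 forget_xs: List[Vector], retain_xs: List[Vector],
                 lam: float = 1.0, lr: float = 0.05, steps: int = 500) -> float:
    """Gradient descent on one prefactor m for W(m) = w_rest + m * u v^T.

    Toy objective (an assumption, not the post's loss):
      forget: sum ||W(m) x||^2 over forget_xs  (push target outputs toward zero)
      retain: lam * sum ||W(m) x - W(1) x||^2 over retain_xs  (stay near the base)
    """
    m = 1.0  # start from the unedited model
    uu = dot(u, u)
    for _ in range(steps):
        grad = 0.0
        for x in forget_xs:
            s = dot(v, x)  # how strongly x routes through the component
            out = [a + m * ui * s for a, ui in zip(matvec(w_rest, x), u)]
            grad += 2.0 * s * dot(u, out)
        for x in retain_xs:
            s = dot(v, x)
            # W(m) x - W(1) x = (m - 1) * s * u
            grad += lam * 2.0 * (m - 1.0) * uu * s * s
        m -= lr * grad
    return m


def drift(u: Vector, v: Vector, m: float, x: Vector) -> float:
    """Squared output change ||W(m) x - W(1) x||^2 caused by the edit on input x."""
    s = dot(v, x)
    return (m - 1.0) ** 2 * dot(u, u) * s * s


# Hypothetical 2-d toy: the rest of the layer already responds to the target input,
# so cancelling that response needs a negative (inverting) prefactor.
w_rest = [[0.5, 0.0], [0.0, 1.0]]
u, v = [1.0, 0.0], [1.0, 0.0]
target = [[1.0, 0.0]]      # routes fully through the component
protected = [[0.0, 1.0]]   # never touches the component
neighbor = [0.6, 0.8]      # partially routes through it, not protected by default

m = train_scalar(w_rest, u, v, target, protected)
m_protected = train_scalar(w_rest, u, v, target, protected + [neighbor])
```

In this toy the first call converges to about $m = -0.5$, an inversion that cancels the rest of the layer's response to the target input. Adding the neighbor to the retain set shrinks its drift but leaves the target only partly cancelled; that trade-off is a toy analogue of the post's Italian result, not a reproduction of it.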

Why It Matters

The source sharpens the wiki’s weight-update lens. It separates at least three model-adaptation regimes:

```mermaid
flowchart LR
  Base[Base model weights] --> Decomp[VPD parameter decomposition]
  Decomp --> Components[Interpretable rank-1 subcomponents]
  Components --> ScalarEdit[Scalar mask / prefactor edit]
  Base --> LoRA[LoRA or adapter update]
  Base --> Dense[Dense or sparse full-parameter update]
  ScalarEdit --> Behavior[Target behavior change]
  LoRA --> Behavior
  Dense --> Behavior
```

The point is not that scalar edits are always better than LoRA. The point is that a decomposed component basis can make the edit target legible before fine-tuning. In this case, autointerp labels showed that most top-ranked components were foreign-language-general rather than German-specific, so the authors narrowed the edit to a single component and reduced off-target damage.
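A minimal sketch of that narrowing step, assuming each candidate carries an attribution score and an autointerp text label. The `Component` type, the keyword filter (a crude stand-in for a person reading labels), and every name, label, and score below are hypothetical, not the post's data.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Component:
    name: str           # hypothetical id such as "<module>:<index>"
    attribution: float  # hypothetical importance score on target-language text
    label: str          # autointerp-style description


def top_k(components: Iterable[Component], k: int) -> List[Component]:
    """Rank candidates by attribution, highest first, and keep k."""
    return sorted(components, key=lambda c: c.attribution, reverse=True)[:k]


def narrow_by_label(candidates: Iterable[Component], must_mention: str,
                    reject_if_mentions: Iterable[str]) -> List[Component]:
    """Keep candidates whose label names the target and none of the listed neighbors."""
    rejects = [r.lower() for r in reject_if_mentions]
    kept = []
    for c in candidates:
        text = c.label.lower()
        if must_mention.lower() in text and not any(r in text for r in rejects):
            kept.append(c)
    return kept


# Hypothetical candidates: most high scorers are multilingual, one is target-specific.
pool = [
    Component("a:1", 0.92, "non-English European text"),
    Component("b:7", 0.88, "German, French and Spanish word pieces"),
    Component("c:3", 0.75, "German-specific morphology"),
    Component("d:2", 0.10, "Python code indentation"),
]
shortlist = top_k(pool, 3)
edit_targets = narrow_by_label(shortlist, "german",
                               ["french", "spanish", "italian", "non-english"])
```

A label filter can only reject what a label mentions: a component whose label reads as target-specific but which also carries an unlabeled neighbor (the post's Italian case) still passes, which is why retention tests remain necessary.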

Relationship To Nearby Sources

Reinforcement Learning Finetunes Small Subnetworks shows that some RL post-training updates are sparse but full-rank and broadly distributed. This source is different: it edits an interpreted decomposition basis rather than discovering a sparse final update mask after training.
The Universal Weight Subspace Hypothesis studies reusable low-rank update subspaces and adapters. This source is a counterpoint because the useful basis is learned through mechanistic decomposition of the target model, not by PCA/HOSVD over many task adapters.
Synthetic Data for any Differentiable Target treats data as a hidden control channel into model weights. This source treats a decomposed weight component as an explicit edit handle. Both warn that visible token count is not the whole training-effect story.
LLM Post-Training should now track decomposition-basis editing as a separate adaptation interface from SFT, LoRA, RL, ES, fast context, and synthetic-data metagradients.

Foundation TSFM Relevance

This is language-model mechanistic-interpretability evidence, not direct progress on numeric time-series foundation models. Its value to the foundation TSFM agenda is as an adjacent update-geometry and controllable-editing pattern.

Agenda slot	Verdict	Evidence	Missing pieces
Dynamic adaptation and update geometry	adjacent	Shows a scalar edit in a decomposed weight basis can outperform LoRA on a narrow target/off-target trade-off for a small LM.	Needs TSFM or world-model checkpoints decomposed into meaningful components, with matched decomposition-plus-edit cost.
Representation and mechanism interpretability	adjacent	VPD aims to find causally used parameter subcomponents rather than activation features.	Needs evidence that such subcomponents correspond to numeric features, regimes, channels, event streams, or latent dynamics in time-series models.
Data diversity and long tail	warning	The edit targets German but also affects Italian because the component is not purely German.	Capability/component labels need rare-regime and neighboring-domain retention tests before deployment.
Control and counterfactuals	insufficient evidence	The experiment edits a language capability but does not model actions, control inputs, interventions, or next-state dynamics.	Needs action-conditioned world-model experiments where component edits change controllable rollout behavior without collateral damage.
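A minimal retention check in the spirit of those tests, assuming per-domain mean cross-entropy (in nats) has already been measured before and after an edit. The function name, the domain keys, and the default 0.10-nat budget (borrowed from the post's English-damage threshold) are illustrative choices; for a TSFM the domains would be regimes, channel families, or event types rather than languages.

```python
from typing import Dict


def collateral_damage(base_ce: Dict[str, float], edited_ce: Dict[str, float],
                      target: str, budget_nats: float = 0.10) -> Dict[str, float]:
    """Return off-target domains whose cross-entropy rose by more than the budget.

    base_ce / edited_ce map domain name -> mean cross-entropy in nats.
    The target domain is skipped because its loss is supposed to rise.
    """
    missing = set(base_ce) - set(edited_ce)
    if missing:
        raise KeyError(f"edited model not evaluated on: {sorted(missing)}")
    return {
        domain: edited_ce[domain] - base
        for domain, base in base_ce.items()
        if domain != target and edited_ce[domain] - base > budget_nats
    }
```

Domains absent from `base_ce` are never checked, which is the practical form of the open question about protecting neighbors without enumerating them.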

Limitations And Gotchas

This is an exploratory hackathon case study on one small language model and one decomposition.
The apparent four-token edit is not the total cost: VPD decomposition, causal-importance training, and autointerpretation consumed prior data and compute.
LoRA can catch up or overtake at larger German-token budgets, especially if the preservation objective protects the right off-target domains.
The final single-component edit still damages Italian, exposing the gap between an autointerp label and the full causal role of a component.
Lee Sharkey’s compute estimate for VPD is explicitly low-confidence and not a measured scaling law.
The VPD report and the German-removal post are current Goodfire-authored reports, not independent replications or venue-reviewed results.
Nothing in the source shows that parameter decomposition scales cleanly to billion-parameter or frontier-scale models.
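One way to make the total-cost caveat explicit is a simple amortization inequality. The cost symbols are hypothetical bookkeeping, not quantities the source reports: with $K$ edits sharing one decomposition and autointerp pass, the per-edit cost and the break-even condition against a LoRA baseline are

$$
\bar{C}_{\text{VPD}}(K) = \frac{C_{\text{decomp}} + C_{\text{autointerp}}}{K} + \bar{C}_{\text{edit}},
\qquad
\bar{C}_{\text{VPD}}(K) < \bar{C}_{\text{LoRA}}
\iff
K > \frac{C_{\text{decomp}} + C_{\text{autointerp}}}{\bar{C}_{\text{LoRA}} - \bar{C}_{\text{edit}}}
\quad \text{(when } \bar{C}_{\text{LoRA}} > \bar{C}_{\text{edit}}\text{)}.
$$

The four-token figure only bounds $\bar{C}_{\text{edit}}$; a matched-compute comparison also needs the numerator, and the inequality assumes both methods reach comparable target and off-target quality.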

Links Into The Wiki

Open Questions

How often do decomposed weight subcomponents isolate a target capability cleanly enough to beat LoRA under matched total compute?
Can component-basis editing preserve a broad protected set of neighboring languages, rare regimes, and safety behaviors without enumerating them in the loss?
Are VPD subcomponents stable across model scales, seeds, checkpoints, and training corpora?
Can the decomposition be made cheap enough that the amortization story dominates the upfront cost?
What is the time-series analogue of a German-specific component: a channel family, regime, event-stream pattern, anomaly type, intervention response, or latent-state transition?

Alex Open Research Wiki
