Company-Local Block-Wise Fine-Tuning

Status: draft research direction.

Collaboration

If this direction resonates with you, I would be happy to talk with like-minded people, collaborate on research, and work on use-cases together.

Ideas are not the bottleneck. Hands are. Time-series modeling should be moving at least as fast as vision, audio, and robotics.

Motivation

Many companies will not send proprietary data outside their own boundary. For foundation models, this creates a practical research target: adapt a shared model to company-specific data while keeping raw data on-premise.

The interesting version is not ordinary centralized fine-tuning. The target is a training contract where the data-touching part runs inside the company, while the outside model owner or coordinator receives only bounded training signals: gradients, low-rank updates, adapter deltas, secure aggregates, or other update summaries.

DiffusionBlocks makes this worth tracking because it turns a deep residual network into independently trainable blocks. That does not make the method private. It does suggest a new boundary: if blocks can optimize local denoising objectives without full end-to-end activation storage, maybe some blocks or adapters can be trained where the data lives.

The Universal Weight Subspace Hypothesis adds another possible update boundary: export coefficients or residuals in a learned adapter basis instead of full deltas. This is not privacy evidence; coefficient, residual, shared-mean, and basis-selection leakage still need explicit tests.
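
To make the coefficient-versus-residual boundary concrete, here is a minimal sketch of what a tenant could compute locally before deciding what to export. It assumes the adapter delta has been flattened to a vector and that a shared orthonormal basis already exists; `project_onto_basis`, the toy basis, and the toy delta are all hypothetical names and values for illustration, not anything taken from the cited work.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_basis(delta, basis):
    """Split a flattened adapter delta into basis coefficients and a residual.

    Assumption: `basis` is a list of orthonormal vectors (a hypothetical
    shared adapter basis), so each coefficient is a simple dot product.
    """
    coeffs = [dot(b, delta) for b in basis]
    recon = [
        sum(c * b[i] for c, b in zip(coeffs, basis))
        for i in range(len(delta))
    ]
    residual = [d - r for d, r in zip(delta, recon)]
    return coeffs, residual

# Toy example: a 4-dimensional delta and a 2-vector basis.
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
delta = [0.5, -0.25, 0.1, 0.0]

coeffs, residual = project_onto_basis(delta, basis)
residual_norm = math.sqrt(dot(residual, residual))
print(coeffs, residual_norm)
```

Exporting only `coeffs` shrinks the signal, but the residual norm shows how much tenant-specific structure the basis fails to capture, and both quantities remain candidates for the leakage tests named above.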

Core Hypothesis

If a pretrained model can be converted into a block-wise trainable system, then enterprise adaptation might split along block boundaries:

private company data
  -> local block or local adapter training
  -> bounded update signal
  -> global model coordination
  -> returned improved checkpoint or adapter

The research question is whether this split can preserve utility without leaking sensitive data through the update signal.

Dynamic Curriculum Learning For JEPA adds a complementary mechanism. If the external coordinator cannot inspect or label the company corpus, the local worker needs a way to decide which unlabeled windows should produce gradients or adapter deltas. Surprise-band filtering can make that decision inside the company boundary before any update signal leaves.
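
A minimal sketch of local surprise-band selection follows. It assumes "surprise" is a per-window prediction loss computed inside the company boundary and that the band is set by two quantiles of those losses; the function names, the nearest-rank quantile rule, and the default quantiles are illustrative assumptions, not the definition used in the cited work.

```python
def nearest_rank(sorted_vals, q):
    """Nearest-rank quantile on an already sorted, non-empty list."""
    idx = round(q * (len(sorted_vals) - 1))
    return sorted_vals[idx]

def surprise_band_select(losses, low_q=0.2, high_q=0.8):
    """Return indices of windows whose loss falls inside the quantile band.

    Windows below the band are treated as already learned; windows above
    it are treated as likely noise or outliers. Both cutoffs are
    hypothetical defaults.
    """
    if not losses:
        return []
    ordered = sorted(losses)
    lo = nearest_rank(ordered, low_q)
    hi = nearest_rank(ordered, high_q)
    return [i for i, s in enumerate(losses) if lo <= s <= hi]

# Toy per-window losses from a local forward pass.
losses = [0.40, 0.01, 9.50, 0.55, 0.02, 0.70]
selected = surprise_band_select(losses)
print(selected)
```

Only the selected windows would go on to produce gradients or adapter deltas. The selection indices and band thresholds are themselves statistics of the private distribution, so in this sketch they are meant to stay local.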

Possible System Shape

flowchart LR
  Data[Company data stays on-premise]
  Local[Local data-touching block or adapter]
  Guard[Privacy guard and update filter]
  Update[Gradient or adapter delta]
  Coord[External coordinator]
  Global[Shared base model]
  Eval[Local evaluation and approval]
  Data --> Local
  Local --> Guard
  Guard --> Update
  Update --> Coord
  Coord --> Global
  Global --> Eval
  Eval --> Local

The boundary can be placed at different levels:

early blocks that see raw inputs;
late blocks or task heads that see more abstract representations;
LoRA-style adapters inside blocks;
a private local denoising head attached to a shared frozen backbone;
a secure aggregation layer over multiple company-local updates.
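
One possible shape for the Guard step in the flowchart above is norm clipping followed by additive Gaussian noise, as used in DP-SGD-style training. The sketch below is an assumption about what such a guard could look like, not a claim that it makes the update private: `guard_update`, `max_norm`, and `noise_std` are hypothetical, and calibrating the noise to a formal privacy budget is out of scope here.

```python
import math
import random

def guard_update(delta, max_norm, noise_std, rng):
    """Clip a flattened update to an L2 norm bound, then add Gaussian noise.

    Clipping bounds how much any single export can move the shared model;
    the noise scale is a free parameter in this sketch and carries no
    formal guarantee on its own.
    """
    norm = math.sqrt(sum(x * x for x in delta))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale + rng.gauss(0.0, noise_std) for x in delta]

# Toy update with L2 norm 5.0, clipped to norm 1.0 before export.
rng = random.Random(0)
guarded = guard_update([3.0, 4.0], max_norm=1.0, noise_std=0.01, rng=rng)
print(guarded)
```

The same function applies whether the exported object is a raw gradient, a LoRA-style delta, or a coefficient vector, which keeps the guard independent of where the block boundary is placed.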

Why DiffusionBlocks Is Only A Starting Point

The DiffusionBlocks paper’s evidence is about memory-efficient training, not enterprise privacy. Its strongest current relevance is structural:

independent block objectives reduce the need to backpropagate through the full model;
the active block can be trained without computing gradients for every other block;
blocks can be trained in parallel when compute exists;
pretrained conversion is explicitly named as future work.

The private-training version still needs its own evidence:

gradients and update deltas can leak training data;
block boundaries may not align with privacy boundaries;
local objectives may hurt cross-block coordination;
company-specific updates may overfit private distributions;
local filtering may reveal private distribution statistics if curriculum metadata is exported;
evaluation must prove both utility and leakage resistance.

Relation To Foundation TSFM Agenda

This is an idea page, so the verdicts below describe the intended contribution if the proposed system works. Evidence status is recorded separately in the Evidence and Missing pieces columns.

Agenda slot	Verdict	Evidence	Missing pieces
Context interface	partially closes	Proposes keeping company-specific schemas, metric names, topology, and business context local while exposing bounded update signals.	Need concrete local-context adapter and leakage tests.
Streaming state, long context, and constant updates	partially closes	If validated, local blocks could adapt to changing telemetry or event-stream distributions without centralizing raw streams.	Need online update protocol, rollback, drift detection, and approval workflow.
Control and counterfactuals	adjacent	Enterprise systems need action-conditioned models that can adapt to local interventions and policies.	Need typed action logs and counterfactual evaluation; block-wise training alone does not solve this.
Benchmarks for action-conditioned models	adjacent	Could motivate private/on-prem benchmark protocols for observability and business-process data.	Need benchmark design where raw data cannot leave the tenant boundary.

Minimal Experiments

Convert a small pretrained residual model into a DiffusionBlocks-style model through fine-tuning, not training from scratch.
Compare local blocks and adapters against ordinary LoRA, full fine-tuning, and frozen-feature heads at a matched update budget.
Run membership-inference and gradient-inversion probes on the exported update signals.
Test secure aggregation across simulated tenants with different data distributions.
Compare local uniform sampling against local surprise-band filtering before exporting gradients or adapter deltas.
For time-series use cases, evaluate rare-regime retention, channel-specific deviations, event streams, and intervention histories rather than only average forecast error.
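
For the secure-aggregation experiment, a simulation can start from pairwise additive masking: each pair of tenants shares a random mask that one adds and the other subtracts, so the masks cancel in the sum. The sketch below is a deliberately simplified simulation under stated assumptions: a single seeded generator stands in for pairwise shared secrets, floats replace finite-field arithmetic, and tenant dropout is ignored. `masked_updates` is a hypothetical helper, not a production protocol.

```python
import random

def masked_updates(updates, seed=0):
    """Add pairwise masks that cancel when all tenants' vectors are summed.

    `updates` is a list of equal-length float vectors, one per tenant.
    The seeded RNG is a stand-in for secrets agreed between each pair.
    """
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    rng = random.Random(seed)
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] += mask[k]  # tenant i adds the shared mask
                masked[j][k] -= mask[k]  # tenant j subtracts the same mask
    return masked

# Three simulated tenants with different update vectors.
updates = [[1.0, 2.0], [0.5, -1.0], [-0.5, 0.0]]
masked = masked_updates(updates)

# The coordinator only sums the masked vectors.
aggregate = [sum(col) for col in zip(*masked)]
print(aggregate)
```

The coordinator sees individual vectors that differ from the true updates but recovers their sum up to floating-point error. Whether that sum itself leaks tenant data is a separate question that the membership-inference and gradient-inversion probes still need to answer.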

Open Questions

Which update signal is safe enough to leave the company: raw gradients, low-rank deltas, quantized deltas, secure aggregates, or distilled synthetic examples?
Can block-wise objectives preserve cross-block coordination when only some blocks see private data?
Does a local block learn company-specific state, or does it merely memorize private identifiers?
Which private examples should be allowed to produce exported updates when the coordinator cannot inspect the corpus?
If tenant adapters are projected into a shared weight-update basis, what leaks through the shared mean, basis coefficients, reconstruction residuals, or basis-selection process?
How should a company approve, reject, or roll back returned updates?
Can the same protocol support multiple tenants without creating a shared-model privacy leak?

Alex Open Research Wiki
