PE Fine-Tuning Complete

Qwen3-Next-80B-A3B-Instruct — LoRA adapter trained on energy & infrastructure PE data

March 3, 2026 — RunPod 2x A100 80GB
Final Loss: 1.003
Token Accuracy: 81.4%
Adapter Size: 175.9 MB
Total Cost: ~$43

We fine-tuned Qwen3-Next-80B-A3B-Instruct — an 80B-parameter Mixture-of-Experts model with only 3B parameters active per token — on 78 curated PE/energy examples. The training used LoRA (Low-Rank Adaptation) to teach the model energy PE domain knowledge without modifying the base weights. Only the attention layers and the shared expert MLP were trained; the 512 routed experts (the model's core knowledge) were kept frozen.

model architecture

Qwen3-Next-80B has a hybrid attention design: 36 DeltaNet layers (linear attention, fast) and 12 GQA layers (full attention, precise). Every layer has a shared expert MLP plus 512 routed experts. We targeted LoRA adapters precisely to avoid corrupting the routing.

Component            | Layers   | Modules Targeted                   | Status
DeltaNet Attention   | 36 of 48 | in_proj_qkvz, in_proj_ba, out_proj | Trained
GQA Attention        | 12 of 48 | q_proj, k_proj, v_proj, o_proj     | Trained
Shared Expert MLP    | 48 of 48 | gate_proj, up_proj, down_proj      | Trained
Routed Experts (512) | 48 of 48 | switch_mlp.experts.*               | Frozen
Router Gate          | 48 of 48 | mlp.gate                           | Frozen
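The trained-vs-frozen split above can be sketched as a small predicate. This is an illustrative sketch, not the training script: the module suffixes come from the table, but the full checkpoint paths (e.g. `linear_attn`, `shared_expert`) are assumptions about the model's naming.

```python
# Sketch: decide which modules get LoRA adapters, mirroring the table above.
# Suffixes come from the table; full module paths are illustrative assumptions.

TRAINED_SUFFIXES = {
    # DeltaNet linear-attention layers (36 of 48)
    "in_proj_qkvz", "in_proj_ba", "out_proj",
    # GQA full-attention layers (12 of 48)
    "q_proj", "k_proj", "v_proj", "o_proj",
    # Shared expert MLP (all 48 layers)
    "gate_proj", "up_proj", "down_proj",
}

def gets_lora(module_name: str) -> bool:
    """True if this module should receive a LoRA adapter."""
    # Routed experts and the router gate stay frozen: they hold the MoE's
    # core knowledge, and adapting them risks corrupting the routing.
    if "switch_mlp.experts" in module_name or module_name.endswith("mlp.gate"):
        return False
    return module_name.rsplit(".", 1)[-1] in TRAINED_SUFFIXES
```

Note the ordering: the expert exclusion runs first, because the routed experts contain `gate_proj`/`up_proj`/`down_proj` modules with the same suffixes as the shared expert MLP.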
training pipeline
Input: 78 examples in ChatML format (system + user + assistant)
Tokenize: chat template, 2048 max sequence length, right-padded
Train: SFTTrainer, 5 epochs, cosine LR schedule, bf16 + gradient checkpointing
Output: LoRA adapter, 175.9 MB, rank-32 with rsLoRA
Hyperparameters
LR: 1e-4 · Batch: 1 × 4 grad accum · Warmup: 5 steps · Optimizer: adamw_8bit · LoRA rank: 32 · rsLoRA: on
Infrastructure
2× A100 80GB PCIe · device_map="auto" · bf16 (not QLoRA) · gradient checkpointing · use_reentrant=True for DeltaNet
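The hyperparameters and infrastructure settings above translate roughly into the following peft/trl configuration. This is a hedged sketch, not the session's actual script: field names follow recent peft/trl releases and vary by version, and `lora_alpha` was not recorded in this report.

```python
# Configuration sketch matching the hyperparameters listed above.
# Treat as an illustration, not a verified training script.
from peft import LoraConfig
from trl import SFTConfig

lora_cfg = LoraConfig(
    r=32,                  # LoRA rank
    use_rslora=True,       # rank-stabilized LoRA scaling
    target_modules=[
        "in_proj_qkvz", "in_proj_ba", "out_proj",  # DeltaNet attention
        "q_proj", "k_proj", "v_proj", "o_proj",    # GQA attention
        "gate_proj", "up_proj", "down_proj",       # shared expert MLP
    ],
    # Caution: peft matches target_modules by suffix, so in a real run you
    # must ensure these names do not also match the routed experts.
    task_type="CAUSAL_LM",
)

train_cfg = SFTConfig(
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,      # effective batch size 4
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_steps=5,
    optim="adamw_8bit",
    bf16=True,
    max_seq_length=2048,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": True},  # DeltaNet requirement
)
```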
training results
Loss Progression
Epoch 1: ~2.0 → 1.5
Epoch 2: 1.5 → 1.2
Epoch 3: 1.2 → 1.0
Epoch 4: 1.0 → 0.8
Epoch 5: 0.8 → 0.66
Interpretation
Loss dropped steadily across all 5 epochs with no signs of overfitting (no loss spike or plateau-then-rise). The curve shows healthy learning. With only 78 examples and 5 epochs, the model saw each example exactly 5 times — enough to absorb PE patterns without memorizing exact text.
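The run length implied by these settings can be sanity-checked with quick arithmetic (assuming the trainer rounds a partial final batch up to a full optimizer step):

```python
import math

EXAMPLES = 78
MICRO_BATCH = 1   # per-device batch size
GRAD_ACCUM = 4    # gradient accumulation steps
EPOCHS = 5

effective_batch = MICRO_BATCH * GRAD_ACCUM               # 4 examples per optimizer step
steps_per_epoch = math.ceil(EXAMPLES / effective_batch)  # 20 optimizer steps
total_steps = steps_per_epoch * EPOCHS                   # 100 steps total
# The 5 warmup steps are therefore ~5% of training.
print(effective_batch, steps_per_epoch, total_steps)     # 4 20 100
```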
engineering challenges

This was the first known LoRA fine-tune of Qwen3-Next-80B. There's no existing guide for it. Every issue below was discovered and fixed during the session.

cost
~$30 GPU compute (training + model loading)
~$9 debugging & iteration (5 attempts)
~$4 idle time between attempts

Total: ~$43 of $100 budget — RunPod balance: $56.84. Pod stopped and no longer billing.

what this means

You now have a 175.9 MB LoRA adapter that, when loaded on top of the base Qwen3-Next-80B model, gives it specialized knowledge in PE/energy investing. The adapter modifies how the model processes and generates text about CCGT portfolios, FERC regulations, capacity markets, PPA structures, IRR analysis, and other PE-specific concepts — while preserving all of its general intelligence.

Think of it like this: the base model is a brilliant generalist analyst. The LoRA adapter is 3 months of PE deal training — it doesn't make the analyst dumber at everything else, it just makes them significantly sharper on energy infrastructure deals.
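Loading the adapter on top of the base model looks roughly like the sketch below. The paths and dtype handling are illustrative assumptions (the adapter path is the one reported on the pod), and actually materializing the 80B base model requires multi-GPU memory, so the heavy imports sit inside the function.

```python
# Sketch: applying the LoRA adapter to the base model with peft.
BASE_MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct"
ADAPTER_PATH = "/workspace/pe-lora-adapter"  # path reported on the pod

def load_pe_model():
    # Imports are deferred: loading an 80B model needs multi-GPU memory.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, device_map="auto", torch_dtype="bfloat16"
    )
    # The ~176 MB of low-rank deltas are applied on top of the frozen base
    # weights, so general capability is preserved.
    model = PeftModel.from_pretrained(base, ADAPTER_PATH)
    return tokenizer, model
```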

next steps
1. Download Adapter: Pull the 175.9 MB adapter from the stopped RunPod pod to the Mac Mini. The pod is stopped, but its data persists until the pod is terminated.
2. Convert to MLX: Convert the PyTorch LoRA adapter to MLX format so it can run on the Mac Studio with the base 80B 5-bit model.
3. Test & Evaluate: Run the fine-tuned model through PE-specific prompts, compare against the base model, and run the v5 eval harness for quality metrics.
Pod status: The RunPod pod d1dyb9zs17myga is stopped (not billing). The adapter is saved at /workspace/pe-lora-adapter on the pod's container disk. The pod must be restarted briefly to download the files, then terminated. Don't forget to terminate (not just stop) once the adapter is downloaded.