Qwen3-Next-80B-A3B-Instruct — LoRA adapter trained on energy & infrastructure PE data
March 3, 2026 — RunPod 2x A100 80GB

We fine-tuned Qwen3-Next-80B-A3B-Instruct — an 80B-parameter Mixture-of-Experts model with only 3B parameters active per token — on 78 curated PE/energy examples. Training used LoRA (Low-Rank Adaptation) to teach the model energy PE domain knowledge without modifying the base weights. Only the attention layers and the shared expert MLP were trained; the 512 routed experts (the model's core knowledge) were kept frozen.
Qwen3-Next-80B has a hybrid attention design: 36 DeltaNet layers (linear attention, fast) and 12 GQA layers (full attention, precise). Every layer has a shared expert MLP plus 512 routed experts. We targeted LoRA adapters precisely to avoid corrupting the routing.
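The 36/12 split follows from the interleave pattern. As a sketch — assuming (consistent with the published architecture) that every fourth layer is full GQA attention and the remaining three are DeltaNet — the layer types can be enumerated directly:

```python
# Reconstruct the per-layer attention types from the 3:1 interleave.
# Assumption: every 4th layer is full GQA attention, the rest DeltaNet,
# which reproduces the 36/12 split quoted above.
NUM_LAYERS = 48
GQA_INTERVAL = 4  # assumed interleave period

layer_types = [
    "gqa" if (i + 1) % GQA_INTERVAL == 0 else "deltanet"
    for i in range(NUM_LAYERS)
]

print(layer_types.count("deltanet"), layer_types.count("gqa"))  # 36 12
```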
| Component | Layers | Modules Targeted | Status |
|---|---|---|---|
| DeltaNet Attention | 36 of 48 | `in_proj_qkvz`, `in_proj_ba`, `out_proj` | Trained |
| GQA Attention | 12 of 48 | `q_proj`, `k_proj`, `v_proj`, `o_proj` | Trained |
| Shared Expert MLP | 48 of 48 | `gate_proj`, `up_proj`, `down_proj` | Trained |
| Routed Experts (512) | 48 of 48 | `switch_mlp.experts.*` | Frozen |
| Router Gate | 48 of 48 | `mlp.gate` | Frozen |
Learning rate: 1e-4 · Batch: 1 × 4 grad accum · Warmup: 5 steps · Optimizer: adamw_8bit · LoRA rank: 32 · rsLoRA: on · `device_map="auto"` · bf16 (not QLoRA) · gradient checkpointing with `use_reentrant=True` for DeltaNet
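The numbers above map onto trainer keyword arguments roughly as follows. This is a sketch, not the run's exact config: the names follow the transformers/TRL convention, and the 1e-4 is read as the learning rate.

```python
# Illustrative trainer-style kwargs matching the hyperparameters listed
# above. This dict would be unpacked into the trainer config; it is a
# sketch, not the exact config file from the run.
train_kwargs = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "warmup_steps": 5,
    "optim": "adamw_8bit",
    "bf16": True,                      # full-precision LoRA, not QLoRA
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs": {"use_reentrant": True},  # DeltaNet
}

# Effective batch size seen by the optimizer:
effective_batch = (
    train_kwargs["per_device_train_batch_size"]
    * train_kwargs["gradient_accumulation_steps"]
)
print(effective_batch)  # 4
```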
This was the first known LoRA fine-tune of Qwen3-Next-80B. There's no existing guide for it. Every issue below was discovered and fixed during the session.
- `max_memory={0: "76GiB", 1: "76GiB", "cpu": "30GiB"}` prevents OOM during weight conversion across the 2 GPUs.
- `init_empty_weights` — monkey-patched to None for a pure PyTorch fallback.
- PEFT's `target_modules` uses `re.fullmatch`, not substring match. A precise regex was required to hit the shared expert while avoiding the routed experts.
- Gradient checkpointing on the DeltaNet layers requires `use_reentrant=True`.
- TRL API renames: `tokenizer` → `processing_class`, `max_seq_length` → `max_length`.
- `gate_up_proj` failed conversion and was left on the meta device; it was manually fused from the safetensor shards.

Total: ~$43 of the $100 budget — RunPod balance: $56.84. Pod stopped and no longer billing.
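A back-of-the-envelope check shows why that `max_memory` budget works, assuming the 80B parameters are held in bf16 (2 bytes each) during conversion:

```python
# Sanity check on the max_memory budget above.
# Assumption: ~80B parameters in bf16 (2 bytes/param) during conversion.
GiB = 2**30

weights_gib = 80e9 * 2 / GiB   # ~149 GiB of bf16 weights
budget_gib = 76 + 76 + 30      # two GPU caps + CPU offload = 182 GiB

print(round(weights_gib), budget_gib)  # 149 182
assert weights_gib < budget_gib        # fits, with headroom for buffers
```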
You now have a 175.9 MB LoRA adapter that, when loaded on top of the base Qwen3-Next-80B model, gives it specialized knowledge in PE/energy investing. The adapter modifies how the model processes and generates text about CCGT portfolios, FERC regulations, capacity markets, PPA structures, IRR analysis, and other PE-specific concepts — while preserving all of its general intelligence.
Think of it like this: the base model is a brilliant generalist analyst. The LoRA adapter is 3 months of PE deal training — it doesn't make the analyst dumber at everything else, it just makes them significantly sharper on energy infrastructure deals.
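Mechanically, "sharper on energy deals" means each adapted weight carries a low-rank delta: W' = W + s·(B·A), where B and A are the trained factors and, with rsLoRA (enabled in this run), the scale is s = α/√r rather than the classic α/r. A toy sketch with tiny dimensions — α and the scale value here are hypothetical; the run only fixed r = 32:

```python
import math

# Toy sketch of what the adapter does to each targeted weight:
#     W_adapted = W + s * (B @ A)
# With rsLoRA the scale is s = alpha / sqrt(r) instead of alpha / r,
# which keeps update magnitude stable as rank grows. alpha is
# hypothetical; the run only fixes r = 32 and rsLoRA = on.
r, alpha = 32, 64
assert alpha / math.sqrt(r) > alpha / r  # rsLoRA scales updates up at high rank

# Tiny numeric example: a 2x2 frozen W plus a rank-1 delta, scale s.
s = 0.5
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x r, with r = 1 in this toy
A = [[0.3, 0.4]]     # r x d_in

W_adapted = [
    [W[i][j] + s * B[i][0] * A[0][j] for j in range(2)]
    for i in range(2)
]
print([[round(x, 2) for x in row] for row in W_adapted])  # [[1.15, 0.2], [0.3, 1.4]]
```

The base W stays frozen on disk; only B and A (plus bookkeeping) live in the 175.9 MB adapter file, which is why the adapter is tiny relative to the 80B base model.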
Pod `d1dyb9zs17myga` is stopped (not billing). The adapter is saved at `/workspace/pe-lora-adapter` on the pod's container disk. Restart the pod briefly to download the files, then terminate it. Don't forget to terminate (not just stop) once the adapter is downloaded.