paper · 2026

Lossless Mechanistic Compression and Surgical Correction of Medical Imaging Models

DenseNet121 chest X-ray classifier compressed 51.43% (6.97M → 3.38M parameters) at constant accuracy (AUROC +0.0003). Compression makes channel attribution atomic enough for surgical correction: zeroing 5 channel weights reduces a target false positive by Δprob −0.13 with zero true-positive loss and exactly zero AUROC change on the other 13 pathologies. Polarized channels are reframed as bipolar discriminative axes that exploit label mutual exclusivity (Jaccard < 0.1 in 89 of 100 polarized channels; zero architectural conflicts).

What this paper does

CheXNet (DenseNet121) is widely deployed for multi-label thoracic pathology classification but has 6.97M parameters and opaque debugging. This paper presents a unified framework that addresses three orthogonal challenges — footprint, opacity, and expensive retraining — with four components in a single mechanistic system:

  1. Lossless compression — 51.43% parameter reduction (6.97M → 3.38M) via channel-wise sparsity-constrained weight reconstruction on NIH ChestX-ray14. Output equivalence to numerical precision (max |Δlogit| < 5×10⁻⁶). Latency reduced from 16.07 ms to 15.54 ms.

  2. Surgical correction — classifier-channel attribution + selective weight zeroing. A 5-channel correction reduces a target false positive by Δprob −0.13 with zero true-positive loss and exactly zero AUROC change on the other 13 pathologies (by construction — only one classifier row is modified).

  3. Cost-aware Treatment Decision System — routes each pathology issue to its cheapest effective intervention: threshold calibration, surgical correction, partial retraining, or augmentation. Empirical cost matrix revises initial estimates (notably: retrain_part cost 6 → 2, just 85 s).

  4. Clinical report auto-generation — channel-level evidence, Grad-CAM region mapping, mutual-exclusivity-based exclusion.
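The zero-collateral guarantee in component 2 follows directly from the weight layout: only the target pathology's row of the final classifier matrix is modified, so every other pathology's logits are bitwise unchanged. A minimal numpy sketch (shapes and channel indices are illustrative, not from the released code):

```python
import numpy as np

def surgical_correction(W, target_row, channels):
    """Zero selected channel weights in a single classifier row.

    W: (num_pathologies, num_channels) final linear-layer weight matrix.
    Only row `target_row` is touched, so logits for every other
    pathology are bitwise unchanged — the exactly-zero-AUROC-change
    claim for the other 13 pathologies holds by construction.
    """
    W_fixed = W.copy()
    W_fixed[target_row, channels] = 0.0
    return W_fixed

# Toy check: 14 pathologies, 1024 DenseNet feature channels.
rng = np.random.default_rng(0)
W = rng.normal(size=(14, 1024))
W_fixed = surgical_correction(W, target_row=3, channels=[10, 42, 99, 512, 700])

other = [r for r in range(14) if r != 3]
assert np.array_equal(W[other], W_fixed[other])        # untouched rows identical
assert np.all(W_fixed[3, [10, 42, 99, 512, 700]] == 0) # target channels zeroed
```

Because the edit never touches the backbone, it composes freely with the compression step and costs only a weight copy at deployment time.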

Reframing polysemanticity — mutual exclusivity exploitation

A classical concern about polysemantic channels (Olah et al.) is that a single channel responds to unrelated concepts. In the multi-label medical setting we find something qualitatively different: polarized channels (with both strong positive and strong negative weights across different pathologies) are not architectural conflicts. They are bipolar discriminative axes between mutually exclusive labels.

Jaccard-based legitimacy classification of 100 polarized classifier channels:

Category     Count   Threshold
Perfect      48      J = 0
Legitimate   41      J < 0.1
Mixed        11      0.1 ≤ J < 0.3
Conflict     0       J ≥ 0.3

The model is efficient, not confused. Hernia and Cardiomegaly almost never co-occur (J = 0); encoding them on one bipolar channel saves capacity.
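The classification above reduces to one Jaccard computation per polarized channel. A minimal sketch, assuming binary per-sample label vectors for the positively and negatively weighted pathologies (function name and toy labels are illustrative):

```python
import numpy as np

def classify_polarized_channel(labels_pos, labels_neg):
    """Jaccard-based legitimacy check for one polarized channel.

    labels_pos / labels_neg: binary (n_samples,) label vectors for the
    pathologies the channel weights positively vs. negatively.
    Thresholds follow the table above: J = 0 perfect, J < 0.1
    legitimate, 0.1 ≤ J < 0.3 mixed, J ≥ 0.3 conflict.
    """
    inter = np.logical_and(labels_pos, labels_neg).sum()
    union = np.logical_or(labels_pos, labels_neg).sum()
    J = inter / union if union else 0.0
    if J == 0:
        return "perfect", J
    if J < 0.1:
        return "legitimate", J
    if J < 0.3:
        return "mixed", J
    return "conflict", J

# Hernia vs. Cardiomegaly almost never co-occur → a perfect bipolar axis.
hernia       = np.array([1, 0, 0, 0, 1, 0])
cardiomegaly = np.array([0, 1, 1, 0, 0, 0])
print(classify_polarized_channel(hernia, cardiomegaly))  # ('perfect', 0.0)
```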

Minimal retraining — when to use which tool

The empirical cost matrix decomposition:

Treatment               Empirical cost   Time      Trainable params
No action               0                —         0
Threshold calibration   1                < 1 s     —
Surgical correction     2                < 5 s     5–10 (zeroed)
Partial retraining      2                85 s      18K (0.55%)
F re-optimization       3                minutes   layer-dependent
Data augmentation       8                hours     —
Full retraining         10               hours     3.38M (100%)

Classifier-only fine-tuning (18K params, 0.55%, 85 s) plus threshold calibration via Youden's J improves F1 by 1.6× and recall by 7× over the default threshold-0.5 evaluation. Adding deeper layers to the fine-tune set yields no measurable additional benefit on our test split.
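The threshold-calibration step picks, per pathology, the operating point maximizing Youden's J = TPR − FPR. A minimal sketch over candidate thresholds (variable names are illustrative, not from the released code):

```python
import numpy as np

def youden_threshold(y_true, y_score):
    """Return the decision threshold maximizing Youden's J = TPR - FPR.

    y_true:  binary int labels, shape (n_samples,)
    y_score: predicted probabilities, shape (n_samples,)
    Scans every distinct score as a candidate threshold; O(n^2) here,
    which is fine for per-pathology calibration on a test split.
    """
    thresholds = np.unique(y_score)
    best_t, best_j = 0.5, -1.0
    P, N = y_true.sum(), (1 - y_true).sum()
    for t in thresholds:
        pred = (y_score >= t).astype(int)
        tpr = (pred & y_true).sum() / P
        fpr = (pred & (1 - y_true)).sum() / N
        j = tpr - fpr
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

y = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8])
print(youden_threshold(y, s))  # (0.35, 0.5)
```

Calibrating the threshold per pathology is what recovers the large recall gain: the default 0.5 cutoff sits far from the optimal operating point for rare labels.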

Verify

  • Zenodo — paper PDF, permanent DOI for citation
  • Model weights for the compressed backbone and the fine-tuned classifier are released alongside the paper

Status

Korean patent application pending; PCT international filing planned within the priority year. The author is actively seeking collaborators in medical imaging AI, model compression for edge deployment, and integrated diagnostic systems.