GA-LoRA: Geometry-Aware Correction for LoRA-Induced Alignment Distortions
Abstract
Aligning large language models (LLMs) with multiple behavioral objectives, such as helpfulness, harmlessness, honesty, and reference preservation, is increasingly performed using parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA). Although LoRA reduces computational and memory cost, it also constrains updates to a low-dimensional adaptation subspace. This thesis studies the hypothesis that this constraint is not merely an efficiency mechanism, but can reshape the effective optimization geometry of multi-objective alignment. In particular, we characterize two forms of LoRA-induced distortion: directional distortion, where the accessible update deviates from the intended full-space update, and trade-off distortion, where different objective gradients are attenuated unequally, thereby changing their realized influence during training.
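The two distortions can be illustrated on a single linear layer. The sketch below is not the thesis's own definition. It assumes a LoRA parameterization W = W0 + B @ A and uses the standard first-order result that one gradient step on A and B moves W along G @ A.T @ A + B @ B.T @ G, where G is the full-space gradient. The function names, the attenuation ratio, and the cosine measure are illustrative assumptions.

```python
import numpy as np


def lora_effective_update(G, A, B):
    """First-order direction in W-space reachable by one step on A and B.

    Assumes W = W0 + B @ A, so dL/dB = G @ A.T and dL/dA = B.T @ G.
    Learning rate and sign are dropped; only the direction matters here.
    """
    return G @ A.T @ A + B @ B.T @ G


def attenuation(G, A, B):
    """Norm of the reachable update relative to the full gradient (hypothetical metric)."""
    return float(np.linalg.norm(lora_effective_update(G, A, B)) / np.linalg.norm(G))


def direction_cosine(G, A, B):
    """Cosine between the reachable update and the full gradient (hypothetical metric)."""
    U = lora_effective_update(G, A, B)
    return float(np.sum(U * G) / (np.linalg.norm(U) * np.linalg.norm(G)))


if __name__ == "__main__":
    # Toy layer: d_out = 2, d_in = 4, rank = 2, with B initialised to zero.
    A = np.eye(4)[:2]
    B = np.zeros((2, 2))
    # Objective 1: gradient lies entirely inside the row space of A.
    G1 = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
    # Objective 2: half of the gradient lies outside the row space of A.
    G2 = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
    for name, G in [("objective 1", G1), ("objective 2", G2)]:
        print(name, attenuation(G, A, B), direction_cosine(G, A, B))
```

In this toy setting the second objective loses part of its gradient (a cosine below 1, i.e. directional distortion) and is attenuated more than the first, so the same nominal objective weights give it less realized influence (trade-off distortion).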
To mitigate these effects, we propose GA-LoRA, a geometry-aware framework for LoRA-based alignment. GA-LoRA combines two complementary mechanisms. The first is objective-level reweighting: GA-LoRA-Norm balances gradient magnitudes within the LoRA trainable space, while GA-LoRA-Geo estimates per-objective projection attenuation and rescales objective weights using inverse attenuation factors. The second is adaptive subspace refinement, which periodically refactors accumulated LoRA updates using randomized singular value decomposition (SVD) to better align the adapter subspace with realized update directions.
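A minimal sketch of the two mechanisms follows, under stated assumptions. The abstract does not give GA-LoRA's exact formulas, so the details are illustrative: `inverse_attenuation_weights` divides each objective weight by an attenuation estimate and renormalizes to preserve the total weight, and `refactor_adapter` re-factorizes the product B @ A with a basic randomized SVD (random range finder plus power iterations). All function names, the `eps` guard, the renormalization, and the choice to refactor B @ A rather than some other accumulated quantity are hypothetical.

```python
import numpy as np


def inverse_attenuation_weights(base_weights, attenuations, eps=1e-8):
    """Rescale objective weights by inverse attenuation (assumed form).

    The result is renormalised so the weights keep their original sum.
    """
    base = np.asarray(base_weights, dtype=float)
    att = np.asarray(attenuations, dtype=float)
    raw = base / (att + eps)
    return raw / raw.sum() * base.sum()


def randomized_svd(M, rank, oversample=5, n_iter=2, seed=0):
    """Basic randomized SVD: random range finder with power iterations."""
    rng = np.random.default_rng(seed)
    k = min(rank + oversample, min(M.shape))
    Y = M @ rng.standard_normal((M.shape[1], k))
    for _ in range(n_iter):
        Y, _ = np.linalg.qr(Y)      # re-orthonormalise for numerical stability
        Y = M @ (M.T @ Y)           # power iteration sharpens the spectrum
    Q, _ = np.linalg.qr(Y)          # orthonormal basis for the sampled range
    U_small, S, Vt = np.linalg.svd(Q.T @ M, full_matrices=False)
    return (Q @ U_small)[:, :rank], S[:rank], Vt[:rank]


def refactor_adapter(B, A, rank):
    """Re-factorise the update B @ A into new rank-`rank` factors.

    Singular values are split evenly between the two factors.
    """
    U, S, Vt = randomized_svd(B @ A, rank)
    root = np.sqrt(S)
    return U * root, root[:, None] * Vt


if __name__ == "__main__":
    # Three objectives with equal nominal weight but unequal attenuation.
    print(inverse_attenuation_weights([1.0, 1.0, 1.0], [0.5, 0.25, 0.25]))

    rng = np.random.default_rng(1)
    B, A = rng.standard_normal((6, 2)), rng.standard_normal((2, 8))
    B_new, A_new = refactor_adapter(B, A, rank=2)
    print(np.allclose(B_new @ A_new, B @ A))
```

When the accumulated update already has rank at most `rank`, the refactoring changes only the basis and leaves the product unchanged. Truncation matters only when the accumulated update has higher rank than the adapter.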
We evaluate GA-LoRA on four instruction-tuned models, Llama 3.2 1B/3B and Qwen 3 4B/8B, fine-tuned with Direct Preference Optimization (DPO) on preference data targeting helpfulness, honesty, and harmlessness. Across preference-internal evaluations, GA-LoRA-Geo generally improves preference accuracy and often reduces reference drift relative to standard LoRA, with particularly strong gains on harmlessness-labeled validation pairs and larger models. Ablation studies indicate that objective reweighting and subspace refinement provide complementary benefits. Downstream alignment benchmarks show more heterogeneous behavior, highlighting that geometry-aware correction improves the fidelity of the optimization process but does not by itself guarantee uniform behavioral gains. Overall, the results show that LoRA-induced geometry is an important and measurable factor in multi-objective alignment, and that projection-aware correction is a promising direction for safer and more faithful parameter-efficient fine-tuning.