HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning (NeurIPS 2024)

Tian, Chunlin, et al. "HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning." NeurIPS (2024)

( https://arxiv.org/pdf/2404.19245 )


Contents

  • (1) Abstract
  • (2) Limitation of LoRA
  • (3) HydraLoRA
    • LoRA
    • HydraLoRA
  • (4) Workflow of HydraLoRA
    • Fine-tuning
    • Inference


1. Abstract

(1) LoRA: a widely used Parameter-Efficient Fine-Tuning (PEFT) technique

(2) Limitation of LoRA: often underperforms full fine-tuning

  • ( especially on complex datasets )

(3) Proposal: HydraLoRA

  • LoRA framework with an asymmetric structure that eliminates the need for domain expertise


2. Limitation of LoRA

LoRA underperforms full fine-tuning, especially on heterogeneous datasets

figure2


3. HydraLoRA

figure2


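(1) LoRA

$y' = y + \Delta y = W_0 x + BAx$

  • $y \in \mathbb{R}^d$: output

  • $x \in \mathbb{R}^k$: input

  • $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$ with $r \ll \min(d, k)$

    • $B$ is initialized with zeros

    • $A$ is initialized with Kaiming Uniform [14]

      Since $B = 0$, this forces $\Delta y = BAx = 0$ at the beginning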
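The initialization above can be checked with a minimal sketch in plain Python (no dependencies). The toy sizes, the helper names `matvec`, `kaiming_uniform`, and `lora_forward`, and the exact Kaiming bound are assumptions for illustration, not the paper's code.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def kaiming_uniform(rows, cols, rng):
    """Uniform init in [-bound, bound] with bound = sqrt(6 / fan_in).
    Assumption: fan_in = cols and the ReLU-gain variant of the bound."""
    bound = math.sqrt(6.0 / cols)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

def lora_forward(W0, A, B, x):
    """y' = W0 x + B (A x): frozen weight plus the low-rank update."""
    base = matvec(W0, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]

rng = random.Random(0)
d, k, r = 4, 6, 2                       # toy sizes with r smaller than min(d, k)
W0 = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
A = kaiming_uniform(r, k, rng)          # A has shape r x k
B = [[0.0] * r for _ in range(d)]       # B has shape d x r, all zeros
x = [rng.gauss(0, 1) for _ in range(k)]

y_base = matvec(W0, x)
y_lora = lora_forward(W0, A, B, x)
print(y_lora == y_base)                 # True: the update is zero at initialization
```

Because only $B$ is zero, $A$ still starts from a non-degenerate random state, so gradients can flow into $B$ from the first step.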


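(2) HydraLoRA

$W = W_0 + \Delta W = W_0 + \sum_{i=1}^{N} \omega_i \cdot B_i A$

  • $B_i \in \mathbb{R}^{d \times r}$: $N$ matrices (one per head)
  • $A \in \mathbb{R}^{r \times k}$: a single matrix (shared)
  • $\omega_i$: modulates the contribution weight of head $B_i$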
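One consequence of sharing a single A across N heads is a smaller trainable-parameter count than N independent LoRA adapters. The sketch below is plain arithmetic on the shapes given above. The layer size, rank, and head count are illustrative assumptions, and the router's own parameters are left out.

```python
def lora_params(d, k, r):
    """Trainable parameters of one LoRA pair: B (d x r) plus A (r x k)."""
    return d * r + r * k

def multi_lora_params(d, k, r, n):
    """n independent LoRA adapters, each with its own A and B."""
    return n * lora_params(d, k, r)

def hydralora_params(d, k, r, n):
    """One shared A (r x k) plus n heads B_i (d x r each)."""
    return r * k + n * d * r

# Hypothetical layer: d = k = 4096, rank 8, 3 heads.
d, k, r, n = 4096, 4096, 8, 3
print(multi_lora_params(d, k, r, n))   # 196608
print(hydralora_params(d, k, r, n))    # 131072
```

The saving is $(N - 1) \cdot r \cdot k$ parameters per adapted layer, which grows with the number of heads.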


4. Workflow of HydraLoRA

figure2


(1) Fine-tuning

MoE (Mixture-of-Experts) = Experts are selectively activated by a gating mechanism (router)


a) Set of experts

To handle the multiple $B$ matrices in a unified way…

Define a set of experts: $(E_1, \ldots, E_N)$


Interpretation

  • (1) Shared matrix A : inherently captures collaborative knowledge to augment intra-gains
  • (2) Different matrices B : foster knowledge modularity to mitigate fine-tuning inter-offsets


b) Router

$\omega_i = \text{softmax}(W_g^T x)$

  • trainable weights (transformation matrix) $W_g \in \mathbb{R}^{r \times N}$

The softmax output gives the gating scores $(\omega_1, \ldots, \omega_N)$
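As a rough illustration of the router, the sketch below computes softmax gating scores from a weight matrix and an input vector. The function names `softmax` and `gate` and the numeric values are hypothetical. The code only assumes that the first dimension of the router matrix matches the length of the vector fed to it.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def gate(Wg, x):
    """omega = softmax(Wg^T x); Wg is stored as (len(x) x N)."""
    n = len(Wg[0])
    logits = [sum(Wg[j][i] * x[j] for j in range(len(x))) for i in range(n)]
    return softmax(logits)

# Hypothetical 2 x 3 router weights, i.e. N = 3 experts.
Wg = [[0.5, -0.5, 0.0],
      [0.0, 1.0, -1.0]]
x = [1.0, 2.0]
omega = gate(Wg, x)     # logits are [0.5, 1.5, -2.0], so expert 2 gets the largest score
print(omega)
```

The scores are positive and sum to one, so every expert contributes to the output, weighted by how relevant the router judges it for the given input.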


c) HydraLoRA

$y = W_0 x + \sum_{i=1}^{N} \omega_i E_i A x$ (MoE)

  • where $N$ denotes the number of experts, i.e., $B$ matrices.
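Putting the pieces together, the forward pass can be sketched as follows, assuming each expert $E_i$ is simply the head matrix $B_i$ and the router is applied to the layer input $x$. All names and toy values are illustrative, not taken from the paper's implementation.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(s - m) for s in z]
    total = sum(exps)
    return [e / total for e in exps]

def hydralora_forward(W0, A, Bs, Wg, x):
    """y = W0 x + sum_i omega_i * B_i (A x), with one shared A and N heads B_i."""
    n = len(Bs)
    # Router: omega = softmax(Wg^T x); Wg stored as (len(x) x N) is an assumption.
    logits = [sum(Wg[j][i] * x[j] for j in range(len(x))) for i in range(n)]
    omega = softmax(logits)
    h = matvec(A, x)                  # shared down-projection, computed once
    y = matvec(W0, x)                 # frozen base path
    for w, B in zip(omega, Bs):
        up = matvec(B, h)             # head-specific up-projection
        y = [yi + w * ui for yi, ui in zip(y, up)]
    return y, omega

# Toy sizes: d = 2, k = 3, r = 1, N = 2 (all values hypothetical).
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
Bs = [[[1.0], [0.0]], [[0.0], [1.0]]]
Wg = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # zero router gives uniform gates
x = [1.0, 2.0, 3.0]
y, omega = hydralora_forward(W0, A, Bs, Wg, x)
print(omega)   # [0.5, 0.5]
print(y)       # [4.0, 5.0]
```

The shared projection `h = A x` is computed once and reused by every head, which is where the asymmetric structure saves computation compared with running N full LoRA adapters.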


(2) Inference

HydraLoRA merges the adapters at inference time, with the router computing the gating scores from each input.
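One way to read the merge: once the gating scores for an input are known, the weighted heads can be folded into a single matrix $W_0 + \sum_i \omega_i B_i A$, which gives the same output as evaluating each head separately. The sketch below checks that equivalence on toy values. The helper `merge_for_input` and the fixed `omega` are assumptions for illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in P]

def merge_for_input(W0, A, Bs, omega):
    """Return W0 + sum_i omega_i * B_i A for one set of gating scores."""
    W = [row[:] for row in W0]
    for w, B in zip(omega, Bs):
        BA = matmul(B, A)
        W = [[wij + w * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(W, BA)]
    return W

# Toy sizes: d = 2, k = 3, r = 1, N = 2 (all values hypothetical).
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
Bs = [[[1.0], [0.0]], [[0.0], [1.0]]]
omega = [0.25, 0.75]          # gating scores assumed to come from the router
x = [1.0, 2.0, 3.0]

W = merge_for_input(W0, A, Bs, omega)
y_merged = matvec(W, x)

# Unmerged path for comparison: W0 x + sum_i omega_i * B_i (A x)
h = matvec(A, x)
ups = [matvec(B, h) for B in Bs]
y_unmerged = [b + sum(w * up[i] for w, up in zip(omega, ups))
              for i, b in enumerate(matvec(W0, x))]
print(y_merged, y_unmerged)   # [2.5, 6.5] [2.5, 6.5]
```

Since the gating scores depend on the input, a merge of this kind is input-specific rather than a one-time static merge as in plain LoRA.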
