Large Language Models

Offload

Offload, DeepSpeed

Quantization

Float32 vs Float16 vs BFloat16

VQ-VAE

Neural Discrete Representation Learning (NeurIPS 2017)