On the Duality between Gradient Transformations and Adapters
Paper
by Lucas Torroba-Hennigen, Hunter Lang, Han Guo, Yoon Kim
preprint on arXiv, February 2025
We explore a duality between training with transformed gradients and training with one-sided adapters, and use this connection to derive more memory-efficient single-node and distributed pretraining methods.
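As a rough illustration of the kind of duality the paper studies (this sketch is not the paper's code; the shapes, toy loss, and plain-SGD setting are my own assumptions), the snippet below checks numerically that one SGD step on a one-sided adapter W0 + A·B with B frozen moves the effective weights exactly as one full-matrix SGD step whose gradient has been linearly transformed by G ↦ G·Bᵀ·B:

```python
# Minimal sketch of the SGD case of the duality (illustrative assumptions only).
import torch

torch.manual_seed(0)
m, n, r = 8, 6, 3
W0 = torch.randn(m, n)      # frozen pretrained weight matrix
B = torch.randn(r, n)       # frozen factor; "one-sided" means only A is trained
x = torch.randn(n)
target = torch.randn(m)
lr = 0.1

def loss_fn(W):
    # Toy quadratic loss, chosen only for the demonstration.
    return 0.5 * ((W @ x - target) ** 2).sum()

# Path 1: one SGD step on the adapter parameter A (starting from A = 0).
A = torch.zeros(m, r, requires_grad=True)
loss_fn(W0 + A @ B).backward()
with torch.no_grad():
    A_new = A - lr * A.grad
W_adapter = W0 + A_new @ B

# Path 2: one SGD step on the full matrix with a transformed gradient.
W = W0.clone().requires_grad_(True)
loss_fn(W).backward()
G = W.grad
W_transformed = W0 - lr * (G @ B.T @ B)   # gradient transformation G -> G B^T B

print(torch.allclose(W_adapter, W_transformed, atol=1e-6))  # True
```

The agreement follows because the adapter gradient at A = 0 is G·Bᵀ, so its update contributes ΔW = -η·G·Bᵀ·B, i.e., exactly a linearly transformed full-matrix gradient step; the paper develops this correspondence well beyond the plain-SGD case assumed here.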