Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization.
Jinhao Li, Shiyao Li, Jiaming Xu, Shan Huang, Yaoxiu Lian, Jun Liu, Yu Wang, Guohao Dai
Published in: CoRR (2023)
Keyphrases
- shift register
- random access memory
- computational power
- post processing
- outlier detection
- memory requirements
- high dimensional
- sparse representation
- memory usage
- compressive sensing
- memory space
- memory size
- parallel processing
- random number generator
- power consumption
- main memory
- preprocessing
- image alignment
- random access