Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference.
Hao Mark Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan
Published in: CoRR (2024)
Keyphrases
- memory efficient
- parallel hardware
- multithreading
- massively parallel
- low cost
- multi core processors
- computer architecture
- high end
- external memory
- iterative deepening
- real time
- parallel computation
- hardware and software
- shared memory
- processing units
- parallel architectures
- parallel computing
- computing power
- multiple sequence alignment
- decoding process
- computer systems
- parallel execution
- hardware architecture
- parallel processing
- belief networks
- depth first search
- hardware implementation
- bayesian inference
- parallel programming
- inference process
- integral image
- data acquisition
- multi dimensional
- parallel implementation
- bayesian networks