Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference.

Benjamin Hawks, Javier M. Duarte, Nicholas J. Fraser, Alessandro Pappalardo, Nhan Tran, Yaman Umuroglu
Published in: Frontiers Artif. Intell. (2021)
Keyphrases
  • low latency
  • neural network
  • high throughput
  • highly efficient
  • high bandwidth
  • cost effective
  • massive scale
  • data sets
  • databases
  • search space
  • web services
  • sensor networks