Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge.
Nick John Eliopoulos, Purvish Jajal, James C. Davis, Gaowen Liu, George K. Thiravathukal, Yung-Hsiang Lu
Published in: CoRR (2024)
Keyphrases
- response time
- search space
- vision system
- edge information
- edge detection
- real time
- low overhead
- low latency
- computer vision
- pruning method
- image processing
- weighted graph
- disjoint paths
- visual perception
- pruning algorithm
- visual field
- high throughput
- edge weights
- prefetching
- level set
- data warehouse
- genetic algorithm
- neural network