"latency optimization" Papers
5 papers found
CONGO: Compressive Online Gradient Optimization
Jeremy Carleton, Prathik Vijaykumar, Divyanshu Saxena et al.
ICLR 2025, arXiv:2407.06325
iFormer: Integrating ConvNet and Transformer for Mobile Application
Chuanyang Zheng
ICLR 2025, arXiv:2501.15369
5 citations
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models
Yonggan Fu, Xin Dong, Shizhe Diao et al.
NeurIPS 2025, arXiv:2511.18890
2 citations
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Vikranth Srivatsa, Zijian He, Reyna Abhyankar et al.
ICLR 2025, arXiv:2407.00023
46 citations
Till the Layers Collapse: Compressing a Deep Neural Network Through the Lenses of Batch Normalization Layers
Zhu Liao, Nour Hezbri, Victor Quétu et al.
AAAI 2025, arXiv:2412.15077