Sarcouncil Journal of Engineering and Computer Sciences

An open-access, peer-reviewed international journal
Publication Frequency: Monthly
Publisher: SARC Publisher

ISSN (Online): 2945-3585
Country of Origin: Philippines
Impact Factor: 3.7
Language: English

Post-Training Optimization Techniques for AI Models: A Comprehensive Framework

Keywords: Post-Training Optimization, Model Quantization, Parameter Efficiency, Deployment Frameworks, Inference Acceleration.

Abstract: Post-training optimization techniques play a crucial role in transforming trained AI models into practical systems that can be deployed in production. This article introduces a layered framework that combines model-, runtime-, and system-level strategies to help AI practitioners build high-performance systems under tight resource constraints. Model-level methods that improve parameter efficiency include post-training quantization (PTQ), sparsity pruning, low-rank adaptation (LoRA), and knowledge distillation. Compile-time optimizations, including operator fusion, memory-layout restructuring, constant folding, kernel auto-tuning, and compiler re-architecting, improve computational efficiency. System-level strategies such as dynamic batching, KV-cache reuse, paged attention, request coalescing, and model routing tailor models to a particular deployment environment, enabling efficient resource utilization and a responsive user experience. The article also presents a systematic approach to analyzing trade-offs among quality, latency, throughput, memory, energy consumption, and cost, enabling practitioners to make informed optimization decisions across diverse applications and serving infrastructures.
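To make the first model-level technique named in the abstract concrete, the following is a minimal sketch of symmetric, per-tensor int8 post-training quantization in plain Python. The function names, per-tensor scaling, and rounding scheme are illustrative assumptions, not the article's implementation.

```python
# Symmetric per-tensor int8 PTQ sketch: map float weights onto the
# int8 range [-128, 127] via a single scale factor, then recover
# approximate floats by multiplying back.

def quantize_int8(weights):
    """Quantize a list of floats to int8 values plus a per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.95]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Rounding error per weight is bounded by scale / 2.
```

Real deployments typically use per-channel scales and calibration data to pick clipping ranges; the per-tensor version above only shows the core quantize/dequantize round trip.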
