Large language models (LLMs) like GPT-3 offer impressive text generation capabilities. But with API pricing tied to token usage, heavy inference costs limit wider adoption of LLMs. How can we maximize the value extracted from these models under a budget constraint?
A new paper from Microsoft Research tackles this challenge. Titled "Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference", it presents EcoOptiGen, a framework that tunes inference hyperparameters such as the maximum number of tokens, temperature, and number of responses to improve utility per query.
EcoOptiGen frames hyperparameter tuning for LLM inference as a constrained black-box optimization problem: maximize a utility metric (such as accuracy on a target task) subject to a budget on inference cost.
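Stated loosely, the problem looks like the following (the symbols here are my own shorthand, not necessarily the paper's exact notation):

```latex
\max_{x \in \mathcal{X}} \; \mathbb{E}_{d \sim \mathcal{D}}\left[ U(x, d) \right]
\quad \text{subject to} \quad \mathrm{Cost}(x) \le B
```

where $x$ ranges over hyperparameter configurations, $U(x, d)$ is the utility of configuration $x$ on a data example $d$, and $B$ is the inference budget. Both the objective and the cost can only be observed by actually querying the LLM, which is what makes the problem black-box and expensive.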
It employs two key techniques:
1. Economical Search with BlendSearch
Since querying an LLM is expensive, the search algorithm must be sample efficient. EcoOptiGen uses a method called BlendSearch that combines:
Bayesian optimization: builds a probabilistic surrogate model of utility and chooses informative configurations to evaluate next.
Local search: cheaply probes configurations near the current best, refining it step by step without requiring gradients.
By blending global modeling with localized refinement, BlendSearch can quickly home in on promising solutions.
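The blending idea can be illustrated with a toy sketch. This is not FLAML's actual BlendSearch implementation: for simplicity it substitutes random sampling for the Bayesian surrogate model, and all names and parameters are illustrative.

```python
import random

def blend_search(objective, bounds, budget=90, seed=0):
    """Toy sketch: alternate global proposals with local
    perturbations around the incumbent (best-so-far) point."""
    rng = random.Random(seed)

    def global_propose():
        # Stand-in for a model-based global proposal (e.g. Bayesian optimization).
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def local_propose(center, step=0.1):
        # Gradient-free local move: perturb each coordinate, clipped to bounds.
        return [
            min(hi, max(lo, c + rng.gauss(0, step * (hi - lo))))
            for c, (lo, hi) in zip(center, bounds)
        ]

    best_x, best_y = None, float("-inf")
    for t in range(budget):
        # Blend: explore globally on some iterations, refine locally on others.
        if best_x is None or t % 3 == 0:
            x = global_propose()
        else:
            x = local_propose(best_x)
        y = objective(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Example: maximize a concave function whose optimum is at (0.7, 0.2).
obj = lambda x: -((x[0] - 0.7) ** 2 + (x[1] - 0.2) ** 2)
x, y = blend_search(obj, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

The key design point survives the simplification: global proposals prevent getting stuck in a poor region, while cheap local moves exploit the best configuration found so far, keeping the total number of expensive evaluations small.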
2. Pruning
EcoOptiGen's configuration evaluator aggressively prunes weak candidates to cut costs, focusing the budget on more promising configurations. It relies on three techniques:
Initial check: eliminate clearly inferior configurations based on results from earlier evaluations.
Gradual increase: slowly grow the number of examples and responses per config, stopping early if a constraint is violated.
Statistical bounds: use confidence intervals to terminate unpromising configs early.
Through these checks, bad configurations are discarded without being fully evaluated on all the data, leaving more of the budget for good ones.
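The gradual-increase and statistical-bound ideas above can be sketched together in a few lines. This is an illustration of the general technique, not EcoOptiGen's exact evaluator; the function names, batch size, and bound are my own assumptions.

```python
import statistics

def prune_configs(candidates, evaluate, data, best_metric, batch=4, z=1.0):
    """Toy sketch: grow the number of evaluated examples per config
    gradually, and stop early when an optimistic confidence bound on a
    config's mean score falls below the best metric seen so far."""
    survivors = []
    for config in candidates:
        scores = []
        pruned = False
        for start in range(0, len(data), batch):
            scores += [evaluate(config, ex) for ex in data[start:start + batch]]
            if len(scores) >= 2:
                mean = statistics.mean(scores)
                sem = statistics.stdev(scores) / len(scores) ** 0.5
                # Optimistic upper bound; prune if even that can't beat the best.
                if mean + z * sem < best_metric:
                    pruned = True
                    break
        if not pruned:
            survivors.append((config, statistics.mean(scores)))
    return survivors

# Demo: scores are deterministic here, so weak configs are cut after one batch.
survivors = prune_configs(
    candidates=[0.2, 0.6, 0.9],
    evaluate=lambda config, example: config,  # stand-in for a real utility metric
    data=list(range(12)),
    best_metric=0.5,
)
```

In the demo, the 0.2 configuration is pruned after the first batch because even its upper bound cannot reach 0.5, while the stronger configurations are evaluated in full; that is the budget-saving behavior the pruning techniques aim for.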
Experiments show EcoOptiGen substantially improves utility within inference budget limits. The code is open source in the FLAML library.
Proper hyperparameter tuning unlocks more value from large language models. EcoOptiGen offers an automated approach to maximize utility per query cost.
Created 2023-10-18T21:35:28-07:00, updated 2023-11-16T19:08:14-08:00