Large language models (LLMs) like GPT-3 offer impressive text generation capabilities. But with API pricing tied to token usage, heavy inference costs limit wider adoption of LLMs. How can we maximize the value extracted from these models under a budget constraint?
A new paper from Microsoft Research tackles this challenge. Titled "Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference", it presents EcoOptiGen, a framework that tunes inference hyperparameters such as the maximum number of tokens, temperature, and number of responses to improve utility per query.
EcoOptiGen frames hyperparameter tuning for LLM inference as a constrained black-box optimization problem: maximize a utility metric (such as accuracy on a target task) subject to a budget on inference cost.
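Stated loosely, the problem looks like the following (the symbols here are my own shorthand, not necessarily the paper's exact notation):

```latex
\max_{x \in \mathcal{X}} \; \mathbb{E}_{d \sim \mathcal{D}}\left[ U(x, d) \right]
\quad \text{subject to} \quad \mathrm{Cost}(x) \le B
```

where $x$ ranges over hyperparameter configurations, $U(x, d)$ is the utility of configuration $x$ on a data example $d$, and $B$ is the inference budget. Both the objective and the cost can only be observed by actually querying the LLM, which is what makes the problem black-box and expensive.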
It employs two key techniques:
1. Economical Search with BlendSearch
Since querying an LLM is expensive, the search algorithm must be sample efficient. EcoOptiGen uses a method called BlendSearch that combines:
Bayesian optimization: builds a probabilistic surrogate model of utility and chooses informative configurations to evaluate next.
Local search: cheaply probes configurations near the current best, refining it step by step without requiring gradients.
By blending global modeling with localized refinement, BlendSearch can quickly home in on promising solutions.
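The blending idea can be illustrated with a toy sketch. This is not FLAML's actual BlendSearch implementation: for simplicity it substitutes random sampling for the Bayesian surrogate model, and all names and parameters are illustrative.

```python
import random

def blend_search(objective, bounds, budget=90, seed=0):
    """Toy sketch: alternate global proposals with local
    perturbations around the incumbent (best-so-far) point."""
    rng = random.Random(seed)

    def global_propose():
        # Stand-in for a model-based global proposal (e.g. Bayesian optimization).
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def local_propose(center, step=0.1):
        # Gradient-free local move: perturb each coordinate, clipped to bounds.
        return [
            min(hi, max(lo, c + rng.gauss(0, step * (hi - lo))))
            for c, (lo, hi) in zip(center, bounds)
        ]

    best_x, best_y = None, float("-inf")
    for t in range(budget):
        # Blend: explore globally on some iterations, refine locally on others.
        if best_x is None or t % 3 == 0:
            x = global_propose()
        else:
            x = local_propose(best_x)
        y = objective(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Example: maximize a concave function whose optimum is at (0.7, 0.2).
obj = lambda x: -((x[0] - 0.7) ** 2 + (x[1] - 0.2) ** 2)
x, y = blend_search(obj, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

The key design point survives the simplification: global proposals prevent getting stuck in a poor region, while cheap local moves exploit the best configuration found so far, keeping the total number of expensive evaluations small.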
2. Pruning
EcoOptiGen's configuration evaluator aggressively prunes weak candidates to cut costs, focusing the budget on more promising configurations. It relies on three techniques:
Initial check: eliminate clearly inferior configurations based on results from earlier evaluations.
Gradual increase: slowly grow the number of examples and responses per config, stopping early if a constraint is violated.
Statistical bounds: use confidence intervals to terminate unpromising configs early.
Through these checks, bad configurations are discarded without being fully evaluated on all the data, leaving more of the budget for good ones.
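The gradual-increase and statistical-bound ideas above can be sketched together in a few lines. This is an illustration of the general technique, not EcoOptiGen's exact evaluator; the function names, batch size, and bound are my own assumptions.

```python
import statistics

def prune_configs(candidates, evaluate, data, best_metric, batch=4, z=1.0):
    """Toy sketch: grow the number of evaluated examples per config
    gradually, and stop early when an optimistic confidence bound on a
    config's mean score falls below the best metric seen so far."""
    survivors = []
    for config in candidates:
        scores = []
        pruned = False
        for start in range(0, len(data), batch):
            scores += [evaluate(config, ex) for ex in data[start:start + batch]]
            if len(scores) >= 2:
                mean = statistics.mean(scores)
                sem = statistics.stdev(scores) / len(scores) ** 0.5
                # Optimistic upper bound; prune if even that can't beat the best.
                if mean + z * sem < best_metric:
                    pruned = True
                    break
        if not pruned:
            survivors.append((config, statistics.mean(scores)))
    return survivors

# Demo: scores are deterministic here, so weak configs are cut after one batch.
survivors = prune_configs(
    candidates=[0.2, 0.6, 0.9],
    evaluate=lambda config, example: config,  # stand-in for a real utility metric
    data=list(range(12)),
    best_metric=0.5,
)
```

In the demo, the 0.2 configuration is pruned after the first batch because even its upper bound cannot reach 0.5, while the stronger configurations are evaluated in full; that is the budget-saving behavior the pruning techniques aim for.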
Experiments show EcoOptiGen substantially improves utility within inference budget limits. The code is open source in the FLAML library.
Proper hyperparameter tuning unlocks more value from large language models. EcoOptiGen offers an automated approach to maximize utility per query cost.
Created 2023-10-18T21:35:28-07:00, updated 2023-11-16T19:08:14-08:00