Cost-Effective Hyperparameter Tuning for LLMs on a Budget

Large language models (LLMs) like GPT-3 offer impressive text generation capabilities. But API pricing is tied to usage, and heavy inference costs limit wider adoption of LLMs. How can we maximize the value extracted from these models under budget constraints?

A new paper from Microsoft Research tackles this challenge. Titled "Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference", it presents EcoOptiGen, a framework that tunes inference hyperparameters such as the maximum number of tokens, the sampling temperature, and the number of responses per prompt to improve the utility obtained per query.
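To make this concrete, here is a toy search space over those three hyperparameters. The key names follow the OpenAI completion API; the candidate values are made up for illustration and are not taken from the paper:

```python
from itertools import product

# Illustrative search space over the hyperparameters named above.
search_space = {
    "max_tokens": [50, 100, 200, 400],  # cap on generated tokens
    "temperature": [0.0, 0.5, 1.0],     # sampling randomness
    "n": [1, 4, 8],                     # responses generated per prompt
}

# Number of candidate configurations a naive grid search would face:
n_configs = len(list(product(*search_space.values())))  # 4 * 3 * 3 = 36
```

Even this tiny space has 36 configurations, and each one must be evaluated by paid API calls, which is why sample-efficient search matters.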

Optimizing the Black Box

EcoOptiGen poses hyperparameter tuning of LLMs as a constrained blackbox optimization problem: maximize the average utility of the generated responses (for example, the success rate on a task) over a tuning dataset, subject to a budget on the average inference cost per example. Utility can only be observed by querying the model, so the optimizer has no gradients or analytic structure to exploit.

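In symbols (the notation here is assumed for illustration, not taken verbatim from the paper), with x a hyperparameter configuration, D the tuning data, U the utility, and B the per-example inference budget:

```latex
\max_{x \in \mathcal{X}} \; \mathbb{E}_{d \sim D}\,[\,U(x, d)\,]
\quad \text{subject to} \quad
\mathbb{E}_{d \sim D}\,[\,\mathrm{cost}(x, d)\,] \le B
```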
It employs two key techniques:

1. Economical Search with BlendSearch

Since querying an LLM is expensive, the search algorithm must be sample efficient. EcoOptiGen uses a method called BlendSearch that combines:

- a global search strategy, Bayesian optimization, which models performance across the whole hyperparameter space, and
- a cost-frugal local search that refines the most promising configurations found so far.

By blending global modeling with localized hill climbing, BlendSearch can quickly home in on promising configurations.
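The idea can be sketched in a few lines. This is a toy stand-in, not the actual BlendSearch implementation: uniform sampling plays the role of the global model, and random perturbation around the incumbent plays the role of local search.

```python
import random

def blend_search(objective, bounds, n_trials=60, seed=0):
    """Toy sketch of a BlendSearch-style strategy: alternate a global
    proposal (uniform sampling stands in for Bayesian optimization)
    with local perturbations around the best point found so far."""
    rng = random.Random(seed)
    lo, hi = bounds
    best_x = rng.uniform(lo, hi)
    best_y = objective(best_x)
    step = (hi - lo) / 10  # local search radius
    for trial in range(n_trials):
        if trial % 2 == 0:
            x = rng.uniform(lo, hi)  # global move: explore the range
        else:
            # local move: climb around the incumbent, clipped to bounds
            x = min(hi, max(lo, best_x + rng.uniform(-step, step)))
        y = objective(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Maximize a simple peaked function whose optimum sits at x = 0.7.
x_star, y_star = blend_search(lambda x: -(x - 0.7) ** 2, (0.0, 1.0))
```

The alternation is the key design choice: global moves keep the search from getting stuck, while local moves cheaply polish the incumbent.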

2. Pruning

EcoOptiGen's configuration evaluator aggressively eliminates invalid candidates using:

- validity checks, such as discarding configurations whose prompt length plus maximum-token setting exceeds the model's context window, and
- early termination of configurations that clearly cannot outperform the best one found so far.

This focuses budget on more promising configurations.
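A minimal sketch of the validity check described above, assuming a hypothetical context limit; the function name and numbers are illustrative:

```python
CONTEXT_LIMIT = 4096  # assumed model context window, for illustration

def is_valid(prompt_tokens, max_tokens, context_limit=CONTEXT_LIMIT):
    """Reject configurations whose prompt plus requested completion
    could not fit inside the model's context window."""
    return prompt_tokens + max_tokens <= context_limit

ok = is_valid(prompt_tokens=3000, max_tokens=1000)       # 4000 <= 4096
too_big = is_valid(prompt_tokens=3500, max_tokens=1000)  # 4500 > 4096
```

Such checks cost nothing, since they require no model queries at all.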

Fighting Wasted Queries

EcoOptiGen's configuration evaluator uses the following pruning techniques to cut costs:

- evaluating each configuration on a small subset of the data first, enlarging the subset only while the configuration still looks promising,
- maintaining an optimistic upper bound on each configuration's utility and discarding the configuration as soon as that bound drops below the best utility found so far, and
- starting with a small number of responses per prompt and increasing it only when needed.

Through these tricks, weak configurations are discarded without being fully evaluated on all the data, leaving more of the budget for good ones.
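The incremental-evaluation idea above can be sketched as follows. This is a toy illustration in the spirit of EcoOptiGen's evaluator; the names and details are mine, not the paper's API:

```python
def tune_with_pruning(configs, data, evaluate, chunk=4):
    """Score each config on growing slices of the data, and prune it as
    soon as even a perfect score on the remaining examples could not
    beat the best average score seen so far."""
    best_cfg, best_score = None, float("-inf")
    for cfg in configs:
        correct, seen, pruned = 0, 0, False
        for start in range(0, len(data), chunk):
            batch = data[start:start + chunk]
            correct += sum(evaluate(cfg, d) for d in batch)
            seen += len(batch)
            # Optimistic bound: assume every unseen example scores 1.
            optimistic = (correct + (len(data) - seen)) / len(data)
            if optimistic < best_score:
                pruned = True
                break
        if not pruned:
            score = correct / len(data)
            if score > best_score:
                best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy setup: each "config" is a number, and it gets an example right
# when (d % 10) / 10 < cfg, so config 0.9 scores 0.9, 0.5 scores 0.5.
configs = [0.2, 0.9, 0.5]
data = list(range(20))
evaluate = lambda cfg, d: int((d % 10) / 10 < cfg)
best_cfg, best_score = tune_with_pruning(configs, data, evaluate)
```

In this run the last configuration (0.5) is abandoned after eight examples, because even perfect answers on the remaining twelve could no longer beat the incumbent's 0.9 average.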

Better Tuning, Lower Costs

Experiments show EcoOptiGen substantially improves utility within inference budget limits. The code is open source in the FLAML library.

Proper hyperparameter tuning unlocks more value from large language models. EcoOptiGen offers an automated approach to maximize utility per query cost.

Created 2023-10-18T21:35:28-07:00, updated 2023-11-16T19:08:14-08:00