Amazon Bedrock introduces new advanced prompt optimization and migration tool
Amazon Web Services (AWS) released Amazon Bedrock Advanced Prompt Optimization on May 14, 2026. This new tool allows developers to optimize prompts for any model available on Amazon Bedrock and compare the performance of original prompts against optimized versions across up to five models simultaneously.
The tool is designed to assist users who are migrating to a new model or seeking to improve the performance of their current model. By testing these prompts, developers can ensure there are no regressions in known use cases while improving tasks that were previously underperforming.
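As a rough illustration of what checking for regressions can look like, the sketch below compares per-example scores for an original prompt and an optimized one and lists the examples that got worse. The function name, the score dictionaries, and the example IDs are hypothetical and are not part of the Bedrock output format.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return IDs of examples whose score dropped by more than `tolerance`."""
    regressions = []
    for example_id, base_score in baseline.items():
        new_score = candidate.get(example_id)
        # A missing score is treated as a regression because it cannot be verified.
        if new_score is None or new_score < base_score - tolerance:
            regressions.append(example_id)
    return sorted(regressions)


# Hypothetical per-example scores for the original and the optimized prompt.
baseline_scores = {"invoice-01": 0.9, "invoice-02": 0.4, "faq-07": 1.0}
optimized_scores = {"invoice-01": 0.9, "invoice-02": 0.8, "faq-07": 0.7}

print(find_regressions(baseline_scores, optimized_scores))  # ['faq-07']
```

Here the optimized prompt fixes the underperforming `invoice-02` case but regresses on `faq-07`, which is the kind of trade-off a side-by-side comparison is meant to surface.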
To function, the prompt optimizer requires a prompt template, example user inputs for variable values, ground truth answers, and a specific evaluation metric to serve as a guide. The tool also supports multimodal user inputs, including JPG, PNG, and PDF files, enabling the optimization of prompts for document and image analysis tasks.
The optimization process operates through a metric-driven feedback loop. Amazon Bedrock automatically sends prompt templates and example data to inference models, evaluates the responses using the provided metric, and rewrites the prompt to optimize the resulting model responses. The final output provides the original and final prompt templates, evaluation scores, latency, and cost estimates.
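The loop described above can be sketched in a few lines. This is a simplified stand-in, not the service's actual algorithm: the model call, the scoring metric, and the rewrite step are passed in as plain functions, and every name here (`optimize_prompt`, `toy_model`, `toy_rewrite`, the `{input}` placeholder) is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (user input, ground-truth answer)


def optimize_prompt(
    template: str,
    examples: List[Example],
    run_model: Callable[[str], str],
    score: Callable[[str, str], float],
    rewrite: Callable[[str, float], str],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Keep the best-scoring template seen over a fixed number of rewrite rounds.

    Assumes `examples` is non-empty and the template uses an `{input}` placeholder.
    """

    def evaluate(candidate: str) -> float:
        outputs = [run_model(candidate.format(input=text)) for text, _ in examples]
        scores = [score(out, truth) for out, (_, truth) in zip(outputs, examples)]
        return sum(scores) / len(scores)

    best_template, best_score = template, evaluate(template)
    for _ in range(rounds):
        candidate = rewrite(best_template, best_score)
        candidate_score = evaluate(candidate)
        # Only accept a rewrite when the metric actually improves.
        if candidate_score > best_score:
            best_template, best_score = candidate, candidate_score
    return best_template, best_score


def toy_model(prompt: str) -> str:
    # Stand-in for an inference call: echoes the last word, upper-cased only when asked.
    word = prompt.split()[-1]
    return word.upper() if "uppercase" in prompt.lower() else word


def exact_match(output: str, truth: str) -> float:
    return 1.0 if output == truth else 0.0


def toy_rewrite(template: str, current_score: float) -> str:
    # Stand-in for the rewrite step: adds one instruction if it is missing.
    if "uppercase" in template.lower():
        return template
    return "Answer in uppercase. " + template


best, best_score = optimize_prompt(
    "Repeat: {input}",
    [("hello", "HELLO"), ("world", "WORLD")],
    toy_model,
    exact_match,
    toy_rewrite,
)
print(best, best_score)
```

With these toy stand-ins, the original template scores 0.0 and the first rewrite scores 1.0, so the rewritten template is returned. A real run would replace the three callables with inference calls and the chosen evaluation method.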
Developers can guide the optimization process using three different evaluation methods:
- Lambda functions: For concrete metrics such as accuracy, execution accuracy, F1, or structured-JSON match, users can deploy a Lambda function containing custom Python scoring logic that programmatically compares model outputs against reference responses.
- LLM-as-a-Judge: For open-ended tasks like reasoning explanations, generation, or summarization, users can define named metrics with a rating scale and structured instructions in a rubric. A Bedrock judge model evaluates each prompt-response pair and provides a score with reasoning. Users can select their own judge model; the default is Anthropic Claude Sonnet 4.6.
- Steering criteria: For requirements such as safety constraints, format, or brand voice, users can provide free-form natural language criteria. A default LLM-as-a-judge prompt, using Anthropic Claude Sonnet 4.6, evaluates the responses holistically against these criteria.
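For the Lambda option, the scoring logic might look something like the token-level F1 sketch below. The handler signature is the standard Python Lambda entry point, but the event keys (`modelOutput`, `referenceResponse`) and the returned `score` field are assumptions; the actual payload contract is defined by the Bedrock documentation, not by this example.

```python
from collections import Counter
from typing import Any, Dict


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a model output and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty strings count as a match; one empty side counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, float]:
    # ASSUMPTION: these event keys and the response shape are hypothetical.
    return {"score": token_f1(event["modelOutput"], event["referenceResponse"])}
```

Calling `lambda_handler({"modelOutput": "the cat", "referenceResponse": "the cat sat"}, None)` yields a score of about 0.8, since precision is 1.0 and recall is 2/3.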
Users can initiate the process through the Amazon Bedrock console by selecting Create prompt optimization or by using the CreateAdvancedPromptOptimizationJob API. Prompt templates must be prepared in JSONL format, with each JSON object on a single line.
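The one-object-per-line requirement is easy to get wrong when a prompt contains line breaks. The sketch below shows one way to serialize records as JSONL and read them back. The field names (`promptTemplate`, `inputs`, `groundTruth`) are placeholders chosen for illustration, not the schema the service expects.

```python
import json
from typing import Any, Dict, Iterable, List


def to_jsonl(records: Iterable[Dict[str, Any]]) -> str:
    # json.dumps escapes embedded newlines as \n, so each record stays on one physical line.
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def parse_jsonl(text: str) -> List[Dict[str, Any]]:
    # Split on "\n" only; blank lines are skipped.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# ASSUMPTION: hypothetical field names, shown only to demonstrate the line format.
records = [
    {
        "promptTemplate": "Summarize the text below.\n\n{{document}}",
        "inputs": {"document": "Quarterly revenue rose 4%."},
        "groundTruth": "Revenue grew 4% in the quarter.",
    },
    {
        "promptTemplate": "Summarize the text below.\n\n{{document}}",
        "inputs": {"document": "The outage lasted two hours."},
        "groundTruth": "A two-hour outage occurred.",
    },
]

jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

Even though each template contains blank lines, the serialized file has exactly one line per record, and parsing it returns the original objects.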
The system allows for the direct upload of files or the import of prompt templates from Amazon Simple Storage Service (Amazon S3). Users must also set an S3 output location where the evaluation data and prompt optimization results will be stored.
Amazon Bedrock Advanced Prompt Optimization is available in several AWS Regions, including US East (N. Virginia, Ohio), US West (Oregon), Canada (Central), Europe (Frankfurt, Ireland, London, Zurich), South America (São Paulo), and Asia Pacific (Mumbai, Seoul, Singapore, Sydney, Tokyo).
Pricing for the tool is based on the Bedrock model-inference tokens consumed during the optimization process, which are charged at the same per-token rates as standard Bedrock inference.
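Because billing follows token consumption, a back-of-the-envelope estimate is simple arithmetic: tokens divided by 1,000, multiplied by the per-1,000-token rate, summed over input and output. The sketch below assumes separate input and output rates; the rates and token counts are made-up placeholders, not published Bedrock prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    input_per_1k: float   # price per 1,000 input tokens
    output_per_1k: float  # price per 1,000 output tokens


def estimate_cost(input_tokens: int, output_tokens: int, rates: TokenRates) -> float:
    """Estimate inference cost from token counts and per-1,000-token rates."""
    input_cost = (input_tokens / 1000) * rates.input_per_1k
    output_cost = (output_tokens / 1000) * rates.output_per_1k
    return input_cost + output_cost


# ASSUMPTION: placeholder rates and token counts, for illustration only.
placeholder_rates = TokenRates(input_per_1k=0.003, output_per_1k=0.015)
job_cost = estimate_cost(input_tokens=200_000, output_tokens=50_000, rates=placeholder_rates)
print(f"Estimated cost: ${job_cost:.2f}")
```

An optimization job makes many inference calls per round, including any judge-model calls, so the token counts to plug in are totals across the whole job rather than for a single prompt.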
