Programmatically test the quality of LLM-generated code
The generate_code() function here accepts a desired output JSON schema and sample CSV data. These are inputs we'll provide in our test cases.
For simplicity here, we’ve combined the prompt with the model. In a real-world situation, you may want to create a way to easily swap in different models and prompts.
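A minimal sketch of what that might look like, assuming the Anthropic Python SDK and a PROMPT template of our own (the model name, prompt wording, and function signature are illustrative, not the original code):

```python
import anthropic

# Hypothetical prompt template; the real wording is whatever you land on after experimenting.
PROMPT = """Write a Python function named `execute` that takes a single dict argument
`input` containing raw CSV text under the key "csv". Convert that data into JSON that
conforms to this JSON schema, and return it as a Python object:

{schema}

Sample CSV data:

{csv}

Respond with only the Python code, no explanation or markdown."""

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def generate_code(schema: str, csv_data: str) -> str:
    """Ask the model for code that transforms the CSV into the desired JSON shape."""
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumption: swap in whatever model you're testing
        max_tokens=2048,
        messages=[{"role": "user", "content": PROMPT.format(schema=schema, csv=csv_data)}],
    )
    return message.content[0].text
```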
The main() function shows the outline of the test logic:
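Roughly, and assuming the test cases are simple dicts pairing a target JSON schema with a CSV sample (their exact shape is an assumption here), that outline could look like:

```python
# Hypothetical test cases: each pairs a desired JSON schema with sample CSV data.
TEST_CASES = [
    {
        "name": "contact list",
        "schema": """{"type": "array", "items": {"type": "object",
            "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
            "required": ["name", "email"]}}""",
        "csv": "name,email\nAda Lovelace,ada@example.com\nAlan Turing,alan@example.com",
    },
    # ...more cases covering other schemas and messier CSV data
]


def main():
    # Pass the code generator in as an argument so it's easy to swap in
    # different models or prompts later.
    results = evaluate_llm_code(generate_code, TEST_CASES)
    print(f"passed: {results['passed']}, failed: {results['failed']}")
    for detail in results["details"]:
        print(f"  {detail['name']}: {detail['status']}")


if __name__ == "__main__":
    main()
```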
The real work happens in the evaluate_llm_code() function. First, we define the format of the results we want:
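For example, a tally of passes and failures plus a per-test-case record (the exact fields are an assumption), wrapped around a loop we'll fill in next:

```python
def evaluate_llm_code(code_generator, test_cases):
    # Overall results: pass/fail counts plus a detail record for each test case.
    results = {
        "passed": 0,
        "failed": 0,
        "details": [],
    }

    for test_case in test_cases:
        # Do something
        pass

    return results
```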
Note the # Do something placeholder: that's the logic we apply to each test case. Let's build out this logic.
First, we call the code_generator() function with inputs from the test case.
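Inside the loop, that step is just a call like this (a fragment of the loop body, using the assumed test-case fields from above):

```python
    for test_case in test_cases:
        # Ask the LLM for candidate code for this test case's schema and CSV sample.
        generated_code = code_generator(test_case["schema"], test_case["csv"])
```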
Next, we write a helper function, _run_code(), that calls the Riza Execute Function API:
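Here is a sketch using the rizaio Python SDK. The method name and response fields shown for the Execute Function endpoint are assumptions, so check them against the Riza documentation:

```python
from rizaio import Riza

riza = Riza()  # reads RIZA_API_KEY from the environment


def _run_code(code: str, csv_data: str):
    """Run the generated code remotely and return whatever its execute() function produced.

    Assumes the generated code defines execute(input) and that the SDK exposes the
    Execute Function endpoint as command.exec_func -- verify against the Riza docs.
    """
    response = riza.command.exec_func(
        language="python",
        code=code,
        input={"csv": csv_data},
    )
    return response.output
```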
Then we add a _validate_result() helper that scores the output. Note that this is just an example of the types of checks you can do. In a real-world example, you'll likely customize this logic:
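One possible version, using the jsonschema package to check the output against the test case's schema (the checks you need will likely differ):

```python
import json

from jsonschema import ValidationError, validate


def _validate_result(output, schema: str) -> str:
    """Score the output: "pass" if it's valid JSON that satisfies the schema, else "fail"."""
    # The generated code might return a JSON string or an already-decoded object.
    if isinstance(output, str):
        try:
            output = json.loads(output)
        except json.JSONDecodeError:
            return "fail"

    try:
        validate(instance=output, schema=json.loads(schema))
    except ValidationError:
        return "fail"
    return "pass"
```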
Finally, back in our evaluate_llm_code() function, we call _validate_result() and add the output to our overall results.
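Putting the pieces together, the finished loop might look like this sketch (error handling around generation and execution is left out for brevity):

```python
def evaluate_llm_code(code_generator, test_cases):
    results = {"passed": 0, "failed": 0, "details": []}

    for test_case in test_cases:
        # Generate candidate code, run it on Riza, then score what came back.
        generated_code = code_generator(test_case["schema"], test_case["csv"])
        output = _run_code(generated_code, test_case["csv"])
        status = _validate_result(output, test_case["schema"])

        # Fold this test case's outcome into the overall results.
        results["passed" if status == "pass" else "failed"] += 1
        results["details"].append({"name": test_case["name"], "status": status})

    return results
```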
From here, try modifying the PROMPT template in the code and see how changes to the wording affect performance.
You could also make _validate_result() output a numeric score that reflects a more nuanced result than “pass” versus “fail”.