Overview

In this guide we’ll build an agent that can write and run code safely using PydanticAI and Riza.

PydanticAI is a Python agent framework that simplifies building LLM-powered applications. Among other conveniences, PydanticAI’s abstractions greatly simplify tool use (also known as function calling).

A common use case for function calling is to run code written by an LLM. However, code that has not been reviewed should be treated as untrusted and only be run in a safe, isolated environment.

Riza’s Code Interpreter provides a safe environment for executing untrusted code, which makes it a great fit for tool calling. With Riza you can execute arbitrary code in a sandboxed environment via a simple API call. For example, here’s Hello World with Riza:

result = riza.command.exec(language="python", code="print('Hello, world!')")

Let’s dive in.

The complete script

Below is the full script. In the rest of this guide we’ll explain each component.

code_agent.py
from pydantic_ai import Agent, ModelRetry
import rizaio
import json


code_agent = Agent("openai:gpt-4o", system_prompt="You are a helpful assistant.")


@code_agent.tool_plain
def execute_code(code: str) -> str:
    """Execute Python code

    Use print() to write the output of your code to stdout.

    Use only the Python standard library and built-in modules. For example, do not use pandas, but you can use csv.

    Use httpx to make http requests.
    """

    print(f"Agent wanted to execute this code:\n```\n{code}\n```")

    riza = rizaio.Riza()
    result = riza.command.exec(
        language="python", code=code, http={"allow": [{"host": "*"}]}
    )

    if result.exit_code != 0:
        raise ModelRetry(result.stderr)
    if result.stdout == "":
        raise ModelRetry(
            "Code executed successfully but produced no output. "
            "Ensure your code includes print statements to get output."
        )

    print(f"Execution output:\n```\n{result.stdout}\n```")
    return result.stdout


def log_messages(messages):
    """Convert agent messages to JSON-serializable format and save to file."""
    serialized = [
        {
            "role": m.role if hasattr(m, "role") else "unknown",
            "content": m.content if hasattr(m, "content") else str(m),
        }
        for m in messages
    ]
    with open("all_messages.json", "w") as f:
        json.dump(serialized, f, indent=2)


if __name__ == "__main__":
    usr_msg = "Please introduce yourself."
    result = code_agent.run_sync(usr_msg)
    while True:
        print(result.data)
        usr_msg = input("> ")
        if usr_msg == "quit":
            break
        result = code_agent.run_sync(usr_msg, message_history=result.all_messages())
        log_messages(result.all_messages())

Set up your environment and run the script

To run this script, first create and activate a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

Install the pydantic-ai and rizaio Python libraries:

pip install pydantic-ai rizaio

Get API keys from both OpenAI and Riza.

Set these API keys as environment variables in your terminal:

export OPENAI_API_KEY="your_openai_api_key"
export RIZA_API_KEY="your_riza_api_key"

Copy and paste the above script into a file named code_agent.py and run it:

python code_agent.py

How the Script Works

There are four main components to this script:

  1. The PydanticAI agent
  2. The execute_code tool
  3. An optional logging function
  4. A loop to ask for user input, run the agent, and log messages

Define a PydanticAI Agent

You can define a PydanticAI agent with just a model and a system prompt:

code_agent = Agent("openai:gpt-4o", system_prompt="You are a helpful assistant.")

Agents can be much more complex, but a simple agent works for this guide. You can find the full PydanticAI Agent documentation here.

Add an execute_code() tool to the PydanticAI Agent

We can add function tools to our agent using decorators.

There are two decorators you can use to add a tool to an agent:

  • @agent.tool is used for tools that need access to the agent context, such as the message history.

  • @agent.tool_plain is used for tools that do not need access to the agent context.

Our execute_code tool only requires a single parameter, the code to execute, so we’ll use the @code_agent.tool_plain decorator:

@code_agent.tool_plain
def execute_code(code: str) -> str:

We’ll add a docstring to help the LLM understand how and when to use the tool.

    """Execute Python code

    Use print() to write the output of your code to stdout.

    Use only the Python standard library and built-in modules. For example, do not use pandas, but you can use csv.

    Use httpx to make http requests.
    """

For logging, we’ll print the code that the agent wants to execute.

To run code, we create a Riza client, then call the exec() method with the code to execute. Optionally, we can pass in http={"allow": [{"host": "*"}]} to allow code to make HTTP requests.

    print(f"Agent wanted to execute this code:\n```\n{code}\n```")
    riza = rizaio.Riza()
    result = riza.command.exec(
        language="python", code=code, http={"allow": [{"host": "*"}]}
    )
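The wildcard host above permits outbound requests to any host. If you want tighter control, you can list specific hosts instead. (We assume here that the same {"host": ...} shape accepts exact hostnames; api.example.com is a hypothetical placeholder, so check Riza’s documentation for the exact allow-list syntax.)

```python
# A hypothetical, more restrictive HTTP allow-list: only requests to
# api.example.com would be permitted by the sandbox.
http_config = {"allow": [{"host": "api.example.com"}]}

# It would be passed to Riza the same way as the wildcard version:
# riza.command.exec(language="python", code=code, http=http_config)
print(http_config)
```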

When code is executed on Riza, it returns a result object with three properties: exit_code, stdout, and stderr.

If exit_code is not 0, that means an error occurred. We raise a ModelRetry and pass along the error. PydanticAI will tell the model to try to fix the error. (More on ModelRetry)

If exit_code is 0, the execution did not throw an error. If stdout is empty, this often means the LLM generated code without any print() calls, a common cause of silent failures. We raise a ModelRetry with a message encouraging the LLM to include print statements in its code.

If exit_code is 0 and stdout is not empty, the code executed successfully and we return stdout.

    if result.exit_code != 0:
        raise ModelRetry(result.stderr)
    if result.stdout == "":
        raise ModelRetry(
            "Code executed successfully but produced no output. "
            "Ensure your code includes print statements to get output."
        )

    print(f"Execution output:\n```\n{result.stdout}\n```")
    return result.stdout
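The three branches above are easy to sanity-check in isolation. Here is a sketch using a stand-in result object and a hypothetical check_result helper (neither is part of the script or the Riza SDK) that mirrors the same decision logic:

```python
from dataclasses import dataclass


class ModelRetry(Exception):
    """Stand-in for pydantic_ai.ModelRetry, just for this sketch."""


@dataclass
class FakeResult:
    """Stand-in for the object returned by riza.command.exec."""
    exit_code: int
    stdout: str
    stderr: str


def check_result(result) -> str:
    # Mirrors the tool's branches: nonzero exit -> retry with stderr,
    # empty stdout -> retry with a hint, otherwise return stdout.
    if result.exit_code != 0:
        raise ModelRetry(result.stderr)
    if result.stdout == "":
        raise ModelRetry(
            "Code executed successfully but produced no output. "
            "Ensure your code includes print statements to get output."
        )
    return result.stdout


print(check_result(FakeResult(0, "hello", "")))  # prints "hello"
```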

Log Messages

We use a utility function to log the messages to an all_messages.json file. This isn’t necessary, but it’s helpful for debugging and for understanding the agent’s behavior, which can otherwise be hidden behind PydanticAI’s abstractions.

def log_messages(messages):
    """Convert agent messages to JSON-serializable format and save to file."""
    serialized = [
        {
            "role": m.role if hasattr(m, "role") else "unknown", 
            "content": m.content if hasattr(m, "content") else str(m),
        }
        for m in messages
    ]
    with open("all_messages.json", "w") as f:
        json.dump(serialized, f, indent=2)
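To see what this serialization produces without running the agent, you can feed the same list comprehension some stand-in objects (SimpleNamespace here is purely illustrative; real PydanticAI message objects carry richer types):

```python
import json
from types import SimpleNamespace

# Stand-in messages: one with role/content attributes, one without,
# to exercise both branches of the hasattr checks.
messages = [
    SimpleNamespace(role="user", content="What time is it?"),
    object(),  # no role/content -> falls back to "unknown" / str()
]

serialized = [
    {
        "role": m.role if hasattr(m, "role") else "unknown",
        "content": m.content if hasattr(m, "content") else str(m),
    }
    for m in messages
]

print(json.dumps(serialized, indent=2))
```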

Run the PydanticAI Agent

An agent’s run_sync method runs the agent and returns a RunResult object. Two useful RunResult members are:

  • result.data: The result of the run.
  • result.all_messages(): The full message history.

We start our interaction by asking the agent to introduce itself, then enter a loop.

if __name__ == '__main__':
    usr_msg = "Please introduce yourself."
    result = code_agent.run_sync(usr_msg)

    while True:
        print(result.data)
        usr_msg = input("> ")
        if usr_msg == "quit":
            break
        result = code_agent.run_sync(usr_msg, message_history=result.all_messages())
        log_messages(result.all_messages())

In the loop we print result.data and ask the user for a new message. Typing quit exits the loop.

On subsequent runs, we call run_sync again with message_history=result.all_messages() to track a multi-step conversation between the agent and the user.

After each run we log the messages to all_messages.json.

Example Run: What time is it?

A simple way to exercise the agent’s ability to write and execute code is to ask “what time is it?” Some LLM providers inject the current date into the system prompt, but not the time. To accurately report the current time, the agent needs to run a Python script.
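The code the model generates for this question is typically a one-liner around datetime, along these lines:

```python
from datetime import datetime

# Format the current time; this is the kind of snippet the model
# sends to the execute_code tool for this question.
current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(current_time)
```

The logged conversation below shows the agent doing exactly this.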

[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Please introduce yourself."
  },
  {
    "role": "model-text-response",
    "content": "Hello! I'm a helpful assistant designed to assist you with a wide range of questions and tasks. Whether you need information, assistance with problem-solving, or help with coding, I'm here to help. If there's anything specific you need, feel free to ask!"
  },
  {
    "role": "user",
    "content": "What time is it? "
  },
  {
    "role": "model-structured-response",
    "content": "ModelStructuredResponse(calls=[ToolCall(tool_name='execute_code', args=ArgsJson(args_json='{\"code\":\"from datetime import datetime\\\\n\\\\n# Get the current time\\\\ncurrent_time = datetime.now().strftime(\\'%Y-%m-%d %H:%M:%S\\')\\\\nprint(current_time)\"}'), tool_id='call_x74nxHU2LdSla0DjpLjDxQd4')], timestamp=datetime.datetime(2024, 12, 10, 4, 19, 7, tzinfo=datetime.timezone.utc), role='model-structured-response')"
  },
  {
    "role": "tool-return",
    "content": "2024-12-10 04:19:08\n"
  },
  {
    "role": "model-text-response",
    "content": "The current time is 04:19:08 on December 10, 2024."
  }
]

Notice the model-structured-response and tool-return messages. These are interactions between the model and its tools that PydanticAI handles automatically. You can see in the model-structured-response that the tool call is execute_code and the argument is code that uses the datetime module to get the current time.

The tool-return message contains the output of the tool call which is passed back to the model, which uses the result to generate a final response. Notice the difference between the format of the tool-return message and the final model-text-response message.

Example Run: What is 100 factorial?

While LLMs have gotten better at math, they struggle with large numbers. To calculate 100!, the agent needs to run a Python script.

[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "Please introduce yourself."
  },
  {
    "role": "model-text-response",
    "content": "I am a virtual assistant designed to assist you with information, answer questions, and perform tasks using various tools and capabilities. If you have any questions or need help with something specific, feel free to ask!"
  },
  {
    "role": "user",
    "content": "What is 100 factorial? "
  },
  {
    "role": "model-structured-response",
    "content": "ModelStructuredResponse(calls=[ToolCall(tool_name='execute_code', args=ArgsJson(args_json='{\"code\":\"from math import factorial\\\\n\\\\n# Calculate 100 factorial\\\\nresult = factorial(100)\\\\n\\\\n# Print the result\\\\nprint(result)\"}'), tool_id='call_lvNO3iqFwk6ukMJzMhat9xKJ')], timestamp=datetime.datetime(2024, 12, 10, 4, 21, 30, tzinfo=datetime.timezone.utc), role='model-structured-response')"
  },
  {
    "role": "tool-return",
    "content": "93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000\n"
  },
  {
    "role": "model-text-response",
    "content": "100 factorial (100!) is a very large number: \n\n93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000."
  }
]

Here you can see the tool call to Riza with Python code to evaluate 100!, the result from Riza, and the final response from the LLM.
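You can verify the tool’s answer locally with the standard library: 100! has 158 digits and exactly 24 trailing zeros, one for each factor of 5 up to 100 (⌊100/5⌋ + ⌊100/25⌋ = 24).

```python
from math import factorial

result = factorial(100)
print(len(str(result)))                # number of digits -> 158
print(str(result).endswith("0" * 24))  # trailing zeros from factors of 5 -> True
```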

Conclusion

The script in this guide demonstrates how to use PydanticAI’s agents with Riza’s Code Interpreter API to safely execute Python code generated by an LLM. By leveraging these tools you can create powerful applications that write and run code dynamically, while maintaining a secure execution environment.

Remember to always treat LLM-generated code as untrusted and use appropriate safeguards, such as Riza’s sandboxed environment, when executing it.