In this guide, we’ll show you how to use function calling and Riza’s Code Interpreter API to safely execute Python code generated by an OpenAI model, such as gpt-4o.

Background

OpenAI’s function calling feature allows you to connect gpt-4o and other OpenAI models with external tools. A common function calling use case is to execute Python code generated by an LLM. However, unreviewed code generated by an LLM should be treated as untrusted and should only be run in a safe environment.

Riza’s Code Interpreter provides a safe environment for executing untrusted code via a simple API call.

The complete script

This script uses OpenAI’s gpt-4o and function calling to write and execute Python code. Below is the full script, in case you want to copy and paste it into your own project.

In the rest of this guide we’ll explain each of its components.

import json
import openai
import rizaio

client = openai.OpenAI()  # Set OPENAI_API_KEY as env variable
riza = rizaio.Riza()  # Set RIZA_API_KEY as env variable

def exec_python(code: str) -> str:
    print("Executing Python code:")
    print(code)
    result = riza.command.exec(language="python", code=code)
    print("Execution result:")
    print(result)

    # An exit code above 0 means the code failed: return the error text.
    if result.exit_code > 0:
        return result.stderr
    if result.stdout:
        return result.stdout
    # Success with empty stdout: fail loudly instead of returning nothing.
    raise RuntimeError("Code executed successfully but produced no output.")

tools = [{
    "type": "function",
    "function": {
        "name": "exec_python",
        "description": "Execute Python to solve problems. The Python environment is not a notebook. Always print output to stdout.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "The Python code to execute. Always print output to stdout.",
                }
            },
            "required": ["code"],
            "additionalProperties": False,
        },
    }
}]

sys_msg = "You are a helpful assistant. Use a tool if you need to solve a problem."
user_msg = "Calculate how many diamonds I need to make 12 full suits of diamond armor in Minecraft."

messages = [
    {"role": "system", "content": sys_msg},
    {"role": "user", "content": user_msg},
]

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
)

if completion.choices[0].message.tool_calls:
    messages.append(completion.choices[0].message)

    for tool_call in completion.choices[0].message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        function_response = ""
        match tool_call.function.name:
            case "exec_python":
                function_response = exec_python(args["code"])
            case _:
                raise ValueError("unknown function")

        messages.append({
            "role": "tool",
            "content": function_response,
            "tool_call_id": tool_call.id,
        })

    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )

print(completion.choices[0].message.content)

Running this script will output something like this:

Executing Python code:
diamonds_per_suit = 24
suits = 12
total_diamonds = diamonds_per_suit * suits
print(total_diamonds)
Execution result:
CommandExecResponse(exit_code=0, stderr='', stdout='288\n')
You will need a total of 288 diamonds to make 12 full suits of diamond armor in Minecraft.

Let’s take a look at how to set up the script, and how it works.

Set up your environment and run the script

Create and activate a virtual environment. The script uses a match statement, so it needs Python 3.10 or later:

python3 -m venv venv
source venv/bin/activate

Install the openai and rizaio Python libraries.

pip install openai rizaio

We’ll need API keys from both OpenAI and Riza.

Set these API keys as environment variables in your terminal:

export OPENAI_API_KEY="your_openai_api_key"
export RIZA_API_KEY="your_riza_api_key"

Copy and paste the above script into a file named coder.py and run it:

python coder.py
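
If the script fails with an authentication error, it can help to confirm that both keys are actually visible to Python. The following is a minimal sketch using only the standard library; missing_keys is a hypothetical helper, not part of either SDK.

```python
import os

# The two environment variables this guide asks you to export.
REQUIRED_KEYS = ("OPENAI_API_KEY", "RIZA_API_KEY")

def missing_keys(env=None):
    """Return the names of required API keys that are unset or empty.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

Calling missing_keys() at the top of coder.py and exiting when the list is non-empty gives a clearer error than a failed API request.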

How the script works

Let’s look at the key components of our script:

Import libraries and initialize clients

First, we import our helper libraries and initialize the OpenAI and Riza clients.

import json
import openai
import rizaio

client = openai.OpenAI()  # defaults to env var OPENAI_API_KEY
riza = rizaio.Riza()      # defaults to env var RIZA_API_KEY

Use Riza to execute the code and return the result

Our exec_python function uses Riza to safely execute any untrusted code in an isolated environment.

def exec_python(code: str) -> str:
    print("Executing Python code:")
    print(code)
    result = riza.command.exec(language="python", code=code)
    print("Execution result:")
    print(result)

    # An exit code above 0 means the code failed: return the error text.
    if result.exit_code > 0:
        return result.stderr
    if result.stdout:
        return result.stdout
    # Success with empty stdout: fail loudly instead of returning nothing.
    raise RuntimeError("Code executed successfully but produced no output.")

This function accepts a single parameter: a string of code to execute. We assume the code is Python since we’re prompting the LLM to produce Python.

We use the Riza client to execute the code. Riza returns an object with three properties: exit_code, stdout, and stderr. If execution completes successfully, exit_code will be 0 and stdout will contain the output. If execution results in an error, exit_code will be greater than 0 and stderr will contain the error message.

We return the result from execution (either stdout or stderr) which we’ll pass back to the LLM.

Occasionally, the code will run successfully (exit_code == 0) with no output (stdout == ""). This happens when the LLM generates code that does not print anything to stdout. In this case we raise an exception to avoid silent failure. Explicit prompting, like the “Always print output to stdout” instruction in our tool description, usually prevents this.
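
The branching described above can be exercised without calling Riza. The sketch below uses a hypothetical ExecResult dataclass as a stand-in for the response object (it assumes only the three fields named above). It also shows one alternative design: instead of raising when there is no output, it returns a hint that can be passed back to the model. The names ExecResult, NO_OUTPUT_HINT, and tool_output are illustrative and are not part of the Riza SDK.

```python
from dataclasses import dataclass

@dataclass
class ExecResult:
    """Hypothetical stand-in for the execution response (same three fields)."""
    exit_code: int
    stdout: str = ""
    stderr: str = ""

NO_OUTPUT_HINT = (
    "The code ran successfully but printed nothing. "
    "Print the result to stdout."
)

def tool_output(result: ExecResult) -> str:
    """Choose the text to send back to the model for a finished execution."""
    if result.exit_code > 0:
        # Failed run: pass the error along so the model can see what went wrong.
        return result.stderr
    if result.stdout:
        # Successful run with output: this is the answer the model asked for.
        return result.stdout
    # Successful run with empty stdout: return a hint instead of raising.
    return NO_OUTPUT_HINT
```

Returning a hint keeps the script running, but the model only gets a chance to retry if the API is called in a loop, which the script in this guide does not do.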

Describe the function for the LLM

OpenAI’s function calling syntax requires a list of dictionaries describing your functions and their parameters.

A list of tools containing only our exec_python description would look like this:

tools = [{
    "type": "function",
    "function": {
        "name": "exec_python",
        "description": "Execute Python to solve problems. The Python environment is not a notebook. Always print output to stdout.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "The Python code to execute. Always print output to stdout.",
                }
            },
            "required": ["code"],
            "additionalProperties": False,
        },
    }
}]

Notice that we tell the LLM it must write code that prints output to stdout.
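
The model returns its arguments as a JSON string that should match this schema, but it is still model output and may occasionally be malformed. Below is a minimal sketch of defensive parsing with the standard library; parse_code_argument is a hypothetical helper, not part of the OpenAI SDK.

```python
import json

def parse_code_argument(raw_arguments: str) -> str:
    """Extract the "code" string from a tool call's JSON arguments.

    Raises ValueError if the payload is not valid JSON, is not a JSON
    object, or does not contain a string "code" field.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    code = args.get("code")
    if not isinstance(code, str):
        raise ValueError('arguments must include a string "code" field')
    return code
```

In the script, this could replace the bare json.loads call followed by args["code"], so that a malformed response produces one clear error type.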

Make the call to the Chat Completions API

The general pattern for using function calling with OpenAI is to:

  1. Make an initial call to the Chat Completions API, including a list of tools the model can use to generate a response.

  2. Check the initial response for a tool_calls field to see if the model decided to use a tool. If so, extract the function arguments from the response, run the function, and append the result as a message to the conversation. In our case, the tool will be exec_python, and the function arguments will be a JSON string whose code field holds the Python code to run.

  3. Make another call to the Chat Completions API with the updated messages, including the function result.

  4. Return the final completion to the user.
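
The four steps can be tried offline by passing the model call in as a plain function. This is a sketch under simplifying assumptions: call_model is a hypothetical stand-in for client.chat.completions.create, messages and tool calls are plain dictionaries rather than SDK objects, and fake_model is a scripted fake used only to drive the example.

```python
import json
from typing import Callable

def run_with_tools(call_model: Callable[[list], dict],
                   tools: dict[str, Callable[..., str]],
                   messages: list) -> str:
    # Step 1: initial call to the model.
    reply = call_model(messages)
    tool_calls = reply.get("tool_calls")

    # Step 2: if the model asked for tools, run each one and record the result.
    if tool_calls:
        messages.append(reply)
        for call in tool_calls:
            func = tools[call["name"]]  # KeyError for an unknown tool name
            args = json.loads(call["arguments"])
            messages.append({
                "role": "tool",
                "content": func(**args),
                "tool_call_id": call["id"],
            })
        # Step 3: second call, now including the tool results.
        reply = call_model(messages)

    # Step 4: return the final completion text.
    return reply["content"]

def fake_model(messages: list) -> dict:
    """Scripted fake: request a tool first, then answer from its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": "Answer: " + last["content"]}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "name": "exec_python",
            "arguments": json.dumps({"code": "print(24 * 12)"}),
        }],
    }

conversation = [{"role": "user", "content": "How many diamonds?"}]
# The lambda pretends to execute the code; the real script would call Riza here.
answer = run_with_tools(fake_model, {"exec_python": lambda code: "288\n"}, conversation)
```

Keeping the model call injectable makes the control flow testable without API keys; the real script inlines the same flow around the OpenAI client.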

For the sake of this guide we’ve hardcoded a simple system message and user message.

sys_msg = "You are a helpful assistant. Use a tool if you need to solve a problem."
user_msg = "Calculate how many diamonds I need to make 12 full suits of diamond armor in Minecraft."

messages = [
    {"role": "system", "content": sys_msg},
    {"role": "user", "content": user_msg}
]

We make our first API call to OpenAI, providing the model, list of messages, and available tools.

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
)

Handle function calls

If the model decides to use a function, completion.choices[0].message.tool_calls will contain one or more tool calls.

In this example, we have only equipped the model with a single function, exec_python, but to support multiple tools in the future we iterate over each function call in the tool_calls list.

When we find the exec_python function name in the tool_calls list we extract the arguments returned from the model, execute the code using Riza, and append the result to the list of messages.

if completion.choices[0].message.tool_calls:
    messages.append(completion.choices[0].message)

    for tool_call in completion.choices[0].message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        function_response = ""
        match tool_call.function.name:
            case "exec_python":
                function_response = exec_python(args["code"])
            case _:
                raise ValueError("unknown function")

        messages.append({
            "role": "tool",
            "content": function_response,
            "tool_call_id": tool_call.id,
        })

We make a final call to the Chat Completions API with the updated list of messages. In the full script this call sits inside the if block, so it only runs when the model used a tool.

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)

Finally, we print the completion.

print(completion.choices[0].message.content)

Next steps

If you modify the user prompt and ask a wide variety of questions, you’ll likely notice that the model sometimes tries to make HTTP requests in the Python code it writes. By default, Riza’s isolated Python runtime environment doesn’t allow network access. Read about how to allow network access here.

Conclusion

The script in this guide demonstrates how to use OpenAI’s function calling feature with Riza’s Code Interpreter API to safely execute Python code generated by an LLM. By combining these tools, you can build applications that write and run code dynamically while maintaining a secure execution environment.

Remember to always treat LLM-generated code as untrusted and use appropriate safeguards, such as Riza’s sandboxed environment, when executing it.