Data Transformation
Run LLM-generated code to transform data to another format
In this guide, we’ll show you how to use Riza to transform a given dataset into another format. We’ll prompt an LLM to write the code to transform the data, and execute that code using Riza.
Why use Riza?
In general, LLMs are good at writing code, but they can’t execute the code they write.
A common use case for Riza is to safely execute code written by LLMs.
For example, you can ask an LLM to write code to analyze specific data, to generate graphs, or to extract data from a website or document. The code written by the LLM is “untrusted” and might contain harmful side-effects. You can protect your systems by executing that code on Riza instead of in your production environment.
Scenario: Serve a dataset in custom formats
Many government websites provide datasets with commerically-useful information. For example, the California Bureau of Real Estate Appraisers provides a list of all current and recently-licensed appraisers via their site.
In the Data Extraction guide, we showed how to automatically extract the appraisers from this site. We got the data out as a CSV with nine columns.
Now, what if we want to let people get this data in whatever format they want? For example, one person might want just three of the nine fields, in a specific JSON schema.
Solution: Generate & run custom transformation code
We’ll build a script that automatically transforms our dataset to a desired final JSON format. In this script, we’ll prompt an LLM to write code that does the transformation, and we’ll safely execute that code using Riza.
Example input
Example output
Benefits of code generation
Compared to asking an LLM directly to transform your data, using code generation as shown in this example can be more reliable, faster, and more cost-effective, especially for larger datasets. You only have to give the LLM a small part of your data and ask it to write code once, as opposed to calling the LLM with your entire dataset.
Example code
Get the full code and data for this example in our GitHub.
The data we’ve prepared is a subset of the full California Bureau of Real Estate Appraisers dataset. We created this dataset by running the code in our Data Extraction guide.
Step 1: Read in data from CSV
First, we’ll read in the data from our CSV.
Step 2: Generate data transformation code with LLM
In this step, we’ll pass a few lines of the CSV we just read to Anthropic, and ask it to generate custom code to transform our CSV data into a specified JSON format.
First, install and initialize the Anthropic SDK:
Import and initialize the Anthropic client.
We’ll now add a generate_code()
function, along with a prompt for the LLM:
Finally, we’ll call generate_code(sample_data)
in main()
. We’ll only send a few rows of our CSV data to the LLM, because that’s all it needs to understand the shape of the data:
Key components of the prompt
Note that in our prompt above, we explicitly ask the LLM to do a few things:
- Write Python code. We plan to execute this code in a Python runtime on Riza.
- Write code to transform data to a specific JSON format. In this example, we’ve provided a formal JSON schema, but models may be able to understand less formal definitions too.
- Write a function that reads data from an object and returns an object. We plan to use Riza’s Execute Function API to run this code. The Execute Function API lets us pass in an input object and receive an output object.
- Use the Python standard library. By default, Riza provides access to standard libraries. If you want to execute code with additional libraries, you can create a custom runtime. You can see an example of using custom runtimes in our Data Analysis Guide.
Step 3: Execute the code on Riza
Now that we have LLM-generated code, we’re ready to run it on Riza and finish our script.
First, install and initialize the Riza API client library:
Import and initialize the Riza client. Note that there are multiple ways to set your API key:
Let’s add a function, run_code()
, that calls the Riza Execute Function API:
Finally, we’ll update our main()
function to run the generated code, and print the resulting JSON:
This script is now complete. You can now run it to produce the desired JSON-formatted data.
Next steps
- Get the full code for this example in our GitHub.
- Try out the API.
- Learn how to use the Riza API with tool use APIs from OpenAI, Anthropic and Google.
- Check out the roadmap to see what we’re working on next.