# Data Extraction

> Run LLM-generated code to extract data from a website

In this guide, we'll show you a simple, real-world use case for Riza: automatically extracting data from a website. We'll prompt an LLM to write the scraping code, and execute that code using Riza.

### Why use Riza?

In general, LLMs are good at writing code, but they can't execute the code they write.

A common use case for Riza is to safely execute code written by LLMs. For example, you can ask an LLM to write code to analyze specific data, to generate graphs, or to extract data from a website or document. The code written by the LLM is "untrusted" and might contain harmful side effects. You can protect your systems by executing that code on Riza instead of in your production environment.

To see data extraction integrated into an AI agent, see our guide on building [a data analyst agent with LangGraph, Browserbase, and Riza](/guides/frameworks/langgraph-gas-price-agent).

## Scenario: Download a large dataset

Many government websites provide datasets with commercially useful information. For example, the California Bureau of Real Estate Appraisers provides a list of all current and recently licensed appraisers via their site.

However, it's hard to get the data out. There are over 13,000 appraisers, presented in pages of 300 at a time, with no bulk download option. If you want to download all the data, you'll want to automate it.

## Solution: What we'll build

We'll write a script that automatically extracts each appraiser from the Real Estate Appraisers site, and prints out the results as a CSV.

To keep this guide simple, we'll hand-write the code to download the HTML, and only ask the LLM to write the code to extract data from the HTML.

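To make the shape of the job concrete, here is a minimal sketch of two pieces of the plan above: estimating how many result pages need to be fetched, and printing extracted rows as CSV. The helper names (`pages_needed`, `rows_to_csv`), the column names, and the sample records are all hypothetical placeholders; the real site's fields may differ.

```python
import csv
import io


def pages_needed(total_records, page_size):
    """Number of result pages required to cover every record (ceiling division)."""
    return -(-total_records // page_size)


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Roughly 13,000 appraisers at 300 per page (figures from the scenario above).
page_count = pages_needed(13_000, 300)

# Hypothetical placeholder records, not real data from the site.
sample_rows = [
    {"name": "Example Appraiser A", "license_number": "0000001"},
    {"name": "Example Appraiser B", "license_number": "0000002"},
]

csv_text = rows_to_csv(sample_rows, ["name", "license_number"])
print(f"Pages to fetch: {page_count}")
print(csv_text, end="")
```

Since the site lists "over 13,000" appraisers, the computed page count is a lower bound, and a real script would keep requesting pages until one comes back empty.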
## Example code

Get the full code for this example [in our GitHub](https://github.com/riza-io/examples/blob/main/use-cases/data_extraction_intro.py).

Before you begin, sign up for [Riza](https://dashboard.riza.io) and [OpenAI](https://openai.com/) API access.

You can adapt this guide to use any other LLM. We chose OpenAI for no special reason, and it's straightforward to adjust the implementation to use another LLM provider.

## Step 1: Download one page of HTML

First, we'll fetch one page of HTML from the Real Estate Appraisers site.

We'll use the `httpx` library to make the web request, and the `beautifulsoup4` library to further process the HTML. Let's install them:

```sh
pip install httpx beautifulsoup4
```

Next, we'll write a function, `download_html_body()`, to download a page of results:

```py
from bs4 import BeautifulSoup
import httpx


def extract_body_html(full_html):
    """Returns just the <body> of an HTML page, without any <script> tags."""
    soup = BeautifulSoup(full_html, "html.parser")
    body = soup.find("body")
    if body:
        # Drop <script> elements in place to shrink the HTML
        for script in body.find_all("script"):
            script.decompose()
        return str(body)
    else:
        print("No <body> tag found in the HTML.")
        return None


def download_html_body(website_url):
    response = httpx.get(website_url)
    # Only parse the page if the request succeeded
    if response.status_code == 200:
        return extract_body_html(response.text)
    return None


def main():
    URL = 'https://www2.brea.ca.gov/breasearch/faces/party/search.xhtml'
    html = download_html_body(URL)
    if html is None:
        print('Could not download HTML')
        return None

    # print(html) # optional: Print out the HTML you've extracted
```

At this point, you can uncomment the print statement above and run the script to see the extracted HTML.

In our code above, we include an optimization to reduce the overall size of the HTML. After we download the HTML, we extract the `<body>` and remove all `