Build a data analyst agent with LangGraph, Browserbase, and Riza
Create a .env file with the following variables:
uv init
uv add browserbase langchain langchain-anthropic langgraph playwright python-dotenv rizaio
Create a state.py file that defines a TrackerState class:
Two fields are supplied at the start of the workflow: url, the page to scrape; and storage_folder_path, the local folder where we want to store our scraped data and saved charts. The other fields will be populated by the steps in the workflow.
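The fields described above can be sketched as a TypedDict. This is a minimal sketch, not the original state.py: the names url, storage_folder_path, current_html, previous_csv, current_csv, summary, and chart_path all appear in this guide, while the types, the Optional markers, the changed flag, and the placeholder values are assumptions made for illustration.

```python
from typing import Optional, TypedDict


class TrackerState(TypedDict, total=False):
    # Provided by the caller when the workflow starts.
    url: str
    storage_folder_path: str

    # Populated by the workflow steps (types are assumed).
    current_html: Optional[str]
    previous_csv: Optional[str]
    current_csv: Optional[str]
    changed: bool  # hypothetical flag, not named in this guide
    summary: Optional[str]
    chart_path: Optional[str]


# Only the two starting fields are needed to build the initial state.
initial_state: TrackerState = {
    "url": "https://example.com/gas-prices",  # placeholder URL
    "storage_folder_path": "./data",  # placeholder folder
}
```

Declaring the class with total=False lets the initial state omit every field that a later step fills in.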
Create graph.py, where we will define our LangGraph workflow. Here, we’ll define the steps (aka “nodes”) we want in the workflow, and the order of those steps.
Note, we have not implemented these nodes yet—we will do that next.
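To make the intended order concrete, here is a library-free sketch of the control flow. It is not the LangGraph code itself: in graph.py the same structure would be declared with LangGraph's StateGraph, using one node per step, plain edges for the linear parts, and a conditional edge after the change check. The node names follow the file names used in this guide, and the changed flag is a hypothetical state field.

```python
from typing import Optional

# Linear transitions; node names follow the file names used in this guide.
LINEAR_NEXT = {
    "scrape": "extract_price_data",
    "extract_price_data": "check_if_changed",
    "summarize_change": "create_chart",
    "create_chart": "store_and_notify",
}


def next_node(current: str, state: dict) -> Optional[str]:
    """Return the node to run after `current`, or None when the workflow ends."""
    if current == "check_if_changed":
        # Conditional step: continue only if the price data changed.
        # `changed` is a hypothetical state flag.
        return "summarize_change" if state.get("changed") else None
    return LINEAR_NEXT.get(current)


def run_order(state: dict) -> list:
    """List the nodes that would run, in order, for a given state."""
    order, node = [], "scrape"
    while node is not None:
        order.append(node)
        node = next_node(node, state)
    return order
```

With changed set to False, run_order stops after check_if_changed. With it set to True, all six nodes run.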
Create nodes/scrape.py. This node will scrape HTML from our gas price website (this page from AAA) using Browserbase. Browserbase provides a managed browser that can navigate to the website and extract the table of gas prices.
The node sets the current_html field in our state to the newly extracted HTML.
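The state update itself is small: a LangGraph node receives the current state and returns a dict containing only the fields it changes. The sketch below shows that shape under one loud assumption: fetch_html is a hypothetical callable standing in for the Browserbase-managed browser session, which is not reproduced here.

```python
from typing import Callable


def make_scrape_node(fetch_html: Callable[[str], str]):
    """Build a scrape node around `fetch_html`, a hypothetical callable that
    stands in for the Browserbase-managed browser session."""

    def scrape(state: dict) -> dict:
        html = fetch_html(state["url"])
        # Return only the field this node updates.
        return {"current_html": html}

    return scrape


# Usage with a fake fetcher, so the sketch runs without a browser.
fake_node = make_scrape_node(lambda url: f"<table data-src='{url}'></table>")
update = fake_node({"url": "https://example.com/gas-prices"})
```

Injecting the fetcher keeps the node easy to exercise locally. The real node would call Browserbase where the fake lambda is used here.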
Create nodes/extract_price_data.py. This node will extract the data from our newly-scraped HTML, and transform it into a CSV. The CSV format is more compact and easier to manipulate for data analysis.
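To show what the HTML-to-CSV transformation amounts to, here is a hand-written sketch. It differs from the demo in three ways: it uses Python's built-in html.parser rather than beautifulsoup4, it runs locally rather than on Riza, and it is not written by an LLM. It assumes a single, non-nested table, and the function and class names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None  # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Convert the rows of an HTML table into CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

For example, a table with a header row and one data row produces two CSV lines, with surrounding whitespace stripped from each cell.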
Step 5: Create a custom runtime with beautifulsoup4 & plotly
The extraction code will use beautifulsoup4, a popular Python library for parsing HTML. To make beautifulsoup4 available on Riza, we’ll create a custom runtime.
Later on (in Step 8), we’ll also want to run LLM-written code to generate a chart. So let’s also add plotly, a popular Python charting library, to our custom runtime.
Follow these steps:
| Field | Value |
|---|---|
| Language | Python |
| requirements.txt | beautifulsoup4 and plotly (one per line) |
Save the custom runtime's revision ID as RIZA_RUNTIME_REVISION_ID in the .env file you created in Step 1.
Note how we:
- define a helper function, _run_code(), that calls the Riza Execute Function API and uses our custom runtime.
- pass the current_html stored in our LangGraph state to both the LLM and to the Riza function call.
- store the result in our state (current_csv).
Create nodes/check_if_changed.py. This node will determine if the price data has changed. (Recall that in our LangGraph workflow, we specify that if the price data has not changed, the workflow should immediately end.)
Since this diffing logic is not the focus of our demo, we provide just a basic implementation that uses the built-in difflib Python library.
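A basic difflib-based check might look like the sketch below. It is an assumption-laden illustration rather than the original node: previous_csv and current_csv are the state fields named in this guide, while the changed flag and the csv_diff helper are hypothetical.

```python
import difflib
from typing import List


def csv_diff(previous_csv: str, current_csv: str) -> List[str]:
    """Return unified-diff lines; the list is empty when the inputs match."""
    return list(
        difflib.unified_diff(
            previous_csv.splitlines(),
            current_csv.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


def check_if_changed(state: dict) -> dict:
    # Treat a missing previous snapshot (first run) as an empty file.
    previous = state.get("previous_csv") or ""
    current = state.get("current_csv") or ""
    # `changed` is a hypothetical flag that the workflow can branch on.
    return {"changed": bool(csv_diff(previous, current))}
```

Because unified_diff yields nothing for identical inputs, an empty diff is enough to decide that the workflow can end early.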
Create nodes/summarize_change.py. As we’ve specified in our LangGraph workflow, this node (and the following nodes) will only run if the previous step detected a change in gas prices. The Summarize Change node will prompt an LLM to act as a data analyst and highlight notable changes in the data.
In our demo, we’ll use a single LLM call to do this analysis, because the analysis for this data will be fairly simple. (For more complex data analyses, we’d likely want to turn this node into an agent that can write code and run the code on Riza, too.)
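The single LLM call needs a prompt that contains both snapshots. A minimal sketch of assembling one is shown below. The wording of the prompt and the function name are hypothetical, and the call to the chat model (for example through langchain-anthropic) is left out so that the sketch stays self-contained.

```python
def build_summary_prompt(previous_csv: str, current_csv: str) -> str:
    """Assemble a data-analyst prompt from the two CSV snapshots.

    The prompt wording is illustrative, not the text used in the demo.
    """
    return (
        "You are a data analyst. Compare the two CSV snapshots of gas prices "
        "below and highlight the most notable changes.\n\n"
        f"Previous data:\n{previous_csv}\n\n"
        f"Current data:\n{current_csv}\n"
    )
```

The node would send this string to the model and store the reply in the summary field of the state.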
Create nodes/create_chart.py. This node will generate a chart that’s relevant to the analysis that the LLM produced in the previous step.
The chart code will use plotly, a popular Python charting library. Recall that in Step 5, we already created a Riza custom runtime that includes plotly. We’ll reuse that custom runtime here.
The code is below. Note how we:
- define a helper function, _run_code(), that calls the Riza Execute Function API and uses our custom runtime.
- pass the summary, previous_csv, and current_csv stored in our LangGraph state to the LLM, and the previous_csv and current_csv to the Riza function call.
- store the result in our state (chart_path).
Create utils/storage.py:
Create nodes/store_and_notify.py. This node will store the new CSV data (replacing the previous CSV data), and “notify” the user. Since this logic is not the focus of our demo, we provide just a basic implementation that saves the data to a local file, and prints the output to the console.
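A basic version of the storage helpers and the node could look like the sketch below. It is a sketch under stated assumptions, not the original code: the file name latest.csv, the helper names, and the printed messages are hypothetical, while storage_folder_path, current_csv, previous_csv, summary, and chart_path are the state fields named in this guide.

```python
from pathlib import Path
from typing import Optional

CSV_FILENAME = "latest.csv"  # hypothetical file name


def load_previous_csv(storage_folder_path: str) -> Optional[str]:
    """Return the stored CSV text, or None if nothing has been saved yet."""
    path = Path(storage_folder_path) / CSV_FILENAME
    return path.read_text(encoding="utf-8") if path.exists() else None


def save_csv(storage_folder_path: str, csv_text: str) -> Path:
    """Write the CSV text, replacing any previous snapshot."""
    folder = Path(storage_folder_path)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / CSV_FILENAME
    path.write_text(csv_text, encoding="utf-8")
    return path


def store_and_notify(state: dict) -> dict:
    path = save_csv(state["storage_folder_path"], state["current_csv"])
    # "Notify" the user by printing to the console.
    print(f"Gas prices changed. Saved new data to {path}")
    if state.get("summary"):
        print(state["summary"])
    if state.get("chart_path"):
        print(f"Chart saved to {state['chart_path']}")
    # The stored CSV becomes the baseline for the next run.
    return {"previous_csv": state["current_csv"]}
```

Keeping the file handling in small helpers means the node itself only decides what to save and what to print.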
Create a main.py file that will kick off this workflow.
We import the LangGraph graph, and kick it off with the two pieces of state required at the start of the workflow: the URL of the gas price site, and the path to a local folder that you want to use to store the output files.
Run the workflow with uv run main.py.