Overview
This demo shows how to use Trigger.dev with Python to build a web crawler that uses a headless browser to navigate websites and extract content.
Prerequisites
- A project with Trigger.dev initialized
- Python installed on your local machine
Features
- Trigger.dev for background task orchestration
- Our Python build extension to install the dependencies and run the Python script
- Crawl4AI, an open-source, LLM-friendly web crawler
- A custom Playwright extension to create a headless Chromium browser
- Proxy support
Using Proxies
WEB SCRAPING: When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner’s permission using Trigger.dev Cloud is prohibited and will result in account suspension. See this example which uses a proxy.
- PROXY_URL: The URL of your proxy server (e.g., http://proxy.example.com:8080)
- PROXY_USERNAME: Username for authenticated proxies (optional)
- PROXY_PASSWORD: Password for authenticated proxies (optional)
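One way a Python script might consume these variables is sketched below. The function name `build_proxy_settings` is hypothetical, and the `server`/`username`/`password` keys follow the shape Playwright uses for proxy settings; whether your crawler accepts that exact dictionary is an assumption to check against its documentation.

```python
import os
from typing import Mapping, Optional


def build_proxy_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return Playwright-style proxy settings, or None when no proxy is set.

    Hypothetical helper: reads PROXY_URL, PROXY_USERNAME and PROXY_PASSWORD
    from the given mapping (the process environment by default).
    """
    proxy_url = env.get("PROXY_URL")
    if not proxy_url:
        # No proxy configured; the caller decides whether to proceed.
        return None

    settings = {"server": proxy_url}

    username = env.get("PROXY_USERNAME")
    password = env.get("PROXY_PASSWORD")
    # Credentials are optional; attach them only when both are present.
    if username and password:
        settings["username"] = username
        settings["password"] = password
    return settings


# Example with an unauthenticated proxy.
print(build_proxy_settings({"PROXY_URL": "http://proxy.example.com:8080"}))
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dictionary, without touching the real process environment.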
GitHub repo
View the project on GitHub
View the full code for this project in our examples repository on GitHub. You can fork it and use it as a starting point for your own project.
The code
Build configuration
After you’ve initialized your project with Trigger.dev, add these build settings to your trigger.config.ts file:
trigger.config.ts
Learn more about executing scripts in your Trigger.dev project in the Python build extension documentation.
Task code
This task uses the python.runScript method to run the crawl-url.py script with the given URL as an argument. You can see the original task in our examples repository.
src/trigger/pythonTasks.ts
Add a requirements.txt file
Add the following to your requirements.txt file. This is required in Python projects to install the dependencies.
requirements.txt
The Python script
The Python script uses Crawl4AI to take a URL and return the markdown content of the page. You can see the original script in our examples repository.
src/python/crawl-url.py
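As a rough illustration of the shape such a script might take, the sketch below validates the URL argument and then fetches the page. It is not the repository's script: the helper names `parse_url_arg`, `crawl_to_markdown`, and `main` are hypothetical, and it assumes Crawl4AI's `AsyncWebCrawler` with its `arun` method and a `markdown` attribute on the result. It also does not configure a proxy, which the warning above requires when scraping third-party sites.

```python
import asyncio
import sys
from typing import Sequence
from urllib.parse import urlparse


def parse_url_arg(argv: Sequence[str]) -> str:
    """Return the URL passed as the first argument, or raise ValueError."""
    if len(argv) < 2:
        raise ValueError("usage: python crawl-url.py <url>")
    url = argv[1]
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return url


async def crawl_to_markdown(url: str) -> str:
    """Fetch one page with Crawl4AI and return its markdown."""
    # Imported lazily so argument handling works without Crawl4AI installed.
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return str(result.markdown)


def main(argv: Sequence[str]) -> int:
    """Return a process exit code: 0 on success, 1 on bad arguments."""
    try:
        url = parse_url_arg(argv)
    except ValueError as err:
        print(err, file=sys.stderr)
        return 1
    # Assumption: the caller reads the result from stdout.
    print(asyncio.run(crawl_to_markdown(url)))
    return 0
```

To run this as a script, call `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard.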
Testing your task
- Create a virtual environment: python -m venv venv
- Activate the virtual environment, depending on your OS. On Mac/Linux: source venv/bin/activate. On Windows: venv\Scripts\activate
- Install the Python dependencies: pip install -r requirements.txt
- If you haven’t already, copy your project ref from your Trigger.dev dashboard and add it to the trigger.config.ts file.
- Run the Trigger.dev CLI dev command (it may ask you to authorize the CLI if you haven’t already).
- Test the task in the dashboard, using a URL of your choice.
Deploying your task
Deploy the task to production using the Trigger.dev CLI deploy command.
Learn more about using Python with Trigger.dev
Python build extension
Learn how to use our built-in Python build extension to install dependencies and run your Python code.

