Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper.
Crawl4AI is another popular crawler. The .markdown attribute on its crawl result is very useful for LLMs, AI agents, and data pipelines.
mkdir test_dir
cd test_dir/
python3 -m venv venv
source venv/bin/activate
pip install crawl4ai
crawl4ai-setup
crawl4ai-doctor
Create crawl.py, the script that runs the crawl.
vi crawl.py
Sample script based on the example from crawl4ai.com.
import argparse
import asyncio

from crawl4ai import AsyncWebCrawler


async def main(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        # https://docs.crawl4ai.com/core/crawler-result/
        # .html or .cleaned_html is useful for beautifulsoup
        print(result.markdown)


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument("--url", default="https://crawl4ai.com")
    args = argparser.parse_args()
    asyncio.run(main(args.url))
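The comment in the script notes that .html and .cleaned_html are handy for HTML parsing. Here is a minimal sketch of that idea: pulling links out of an HTML string. To stay dependency-free it uses Python's standard html.parser rather than BeautifulSoup, and the names LinkCollector and extract_links are hypothetical helpers, not part of Crawl4AI. The sample HTML is a stand-in for whatever result.html or result.cleaned_html would hold.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets from <a> tags (hypothetical helper, not a Crawl4AI API)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every link found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Stand-in for result.html or result.cleaned_html from the script above.
    sample_html = '<p><a href="/about">About</a> <a href="https://crawl4ai.com">Docs</a></p>'
    print(extract_links(sample_html, "https://brianchan.us"))
```

Inside crawl.py, the same call would presumably look like extract_links(result.cleaned_html, url), assuming the crawl succeeded and the attribute is populated.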
To crawl, run the script with a URL.
python crawl.py --url https://brianchan.us
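The command above prints the Markdown to the terminal. For a data pipeline it may be more convenient to write it to a file. The sketch below shows one way to do that. The helpers url_to_filename and save_markdown, the naming scheme, and the "output" directory are all assumptions made for illustration, not anything Crawl4AI provides.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url):
    """Build a filesystem-safe .md filename from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Replace anything outside letters, digits, dot, and hyphen with "_".
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    return f"{safe or 'page'}.md"


def save_markdown(markdown, url, out_dir="output"):
    """Write crawled Markdown into out_dir and return the path written."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / url_to_filename(url)
    # str() is a precaution in case the value is string-like rather than a plain str.
    path.write_text(str(markdown), encoding="utf-8")
    return path
```

In crawl.py, print(result.markdown) could then be swapped for save_markdown(result.markdown, url), which would write the page for https://brianchan.us to output/brianchan.us.md.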