Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper.

Crawl4AI is another popular crawler. Each crawl result exposes a .markdown attribute holding the page as Markdown, which is convenient input for LLMs, AI agents, and data pipelines.

mkdir test_dir
cd test_dir/
python3 -m venv venv
source venv/bin/activate
pip install crawl4ai
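# Post-install setup (installs the browser that Crawl4AI drives)
crawl4ai-setup
# Optional: check that the installation works
crawl4ai-doctor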

Create crawl.py (for example, with vi crawl.py) and paste in the sample script below, which is based on the example from crawl4ai.com.

import argparse
import asyncio

from crawl4ai import AsyncWebCrawler


async def main(url):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        # Result fields: https://docs.crawl4ai.com/core/crawler-result/
        # result.html or result.cleaned_html is the better fit for parsing with BeautifulSoup
        print(result.markdown)


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument("--url", default="https://crawl4ai.com")
    args = argparser.parse_args()
    asyncio.run(main(args.url))
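
The script above prints result.markdown unconditionally. A crawl can fail, so it may be worth checking the result first. The sketch below is a hypothetical helper, describe_result, that is not part of Crawl4AI; it assumes the result carries the success and error_message fields described on the crawler-result docs page linked in the script. It is duck-typed, so the demo uses stand-in objects and runs without a network connection.

```python
from types import SimpleNamespace


def describe_result(result):
    """Return the page markdown, or a short error line if the crawl failed."""
    # Assumption: `success` and `error_message` match the CrawlResult fields
    # documented at https://docs.crawl4ai.com/core/crawler-result/
    if not getattr(result, "success", True):
        message = getattr(result, "error_message", None) or "unknown error"
        return f"crawl failed: {message}"
    # str() covers both a plain string and a string-like markdown object.
    return str(result.markdown)


if __name__ == "__main__":
    # Stand-in objects (not real crawl results) so the sketch runs offline.
    ok = SimpleNamespace(success=True, markdown="# Hello", error_message=None)
    bad = SimpleNamespace(success=False, markdown=None, error_message="timeout")
    print(describe_result(ok))
    print(describe_result(bad))
```

In crawl.py, print(result.markdown) could then become print(describe_result(result)).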

Run the script against a URL:

python crawl.py --url https://brianchan.us
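
The printed Markdown is plain text, so it can be post-processed with the standard library before it goes to an LLM or a data pipeline. As one possible illustration, the sketch below splits Markdown into (heading, body) pairs. split_by_heading is a hypothetical helper, not a Crawl4AI function, and the heading detection is deliberately naive: a line starting with # inside a code block would also be treated as a heading.

```python
import re


def split_by_heading(markdown):
    """Split markdown text into (heading, body) pairs at lines like '# Title'."""
    sections = []
    heading, body = "", []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line):
            # Close the previous section if it has a heading or any text.
            if heading or any(part.strip() for part in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    # Flush the last section.
    if heading or any(part.strip() for part in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


if __name__ == "__main__":
    sample = "intro\n# A\ntext a\n## B\ntext b\n"
    for title, text in split_by_heading(sample):
        print(repr(title), repr(text))
```

Text that appears before the first heading is returned with an empty heading, so nothing from the page is dropped.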