Async Scraping
Scrape multiple URLs concurrently for maximum speed.
Basic Async
import asyncio
import easyscrape as es
async def main():
result = await es.async_scrape("https://example.com")
print(result.title())
asyncio.run(main())
Concurrent Requests
Scrape multiple URLs in parallel:
import asyncio
import easyscrape as es
async def scrape_all(urls: list[str]):
tasks = [es.async_scrape(url) for url in urls]
results = await asyncio.gather(*tasks)
return results
urls = [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3",
]
results = asyncio.run(scrape_all(urls))
for result in results:
print(f"{result.url}: {result.title()}")
Rate-Limited Concurrency
Control the number of simultaneous requests:
import asyncio
import easyscrape as es
async def scrape_with_limit(urls: list[str], max_concurrent: int = 5):
semaphore = asyncio.Semaphore(max_concurrent)
async def limited_scrape(url: str):
async with semaphore:
return await es.async_scrape(url)
tasks = [limited_scrape(url) for url in urls]
return await asyncio.gather(*tasks)
Error Handling
Handle failures gracefully:
import asyncio
import easyscrape as es
async def safe_scrape(url: str):
try:
return await es.async_scrape(url)
except es.ScrapeError as e:
print(f"Failed: {url} - {e}")
return None
async def main():
urls = ["https://example.com", "https://invalid.example"]
tasks = [safe_scrape(url) for url in urls]
results = await asyncio.gather(*tasks)
successful = [r for r in results if r is not None]
print(f"Succeeded: {len(successful)}/{len(urls)}")
With Async Config
from easyscrape import Config
config = Config(timeout=10, retries=2)
async def main():
result = await es.async_scrape(
"https://example.com",
config=config
)
Best Practices
Limit concurrency - Don’t overwhelm servers. Use semaphores.
Handle errors - Network requests fail. Plan for it.
Respect robots.txt - Check before bulk scraping.
Add delays - Use
rate_limitin config for politeness.