Advanced Website Scraper

Introduction: Advanced mode has been introduced for scraping websites with JavaScript-enabled content and single-page application (SPA) websites. Advanced scraping is performed using Firecrawl services. A scraper type dropdown (Basic and Advanced) is available in the Website Links section of the Content tab.


Value Delivered

  • Increased runtime AI accuracy in response generation due to higher content coverage
  • Faster delivery of use cases by reducing the manual effort otherwise required to scrape JavaScript-enabled content and SPA websites

Basic vs. Advanced Usage Guidelines:

  • Basic: Recommended for simple websites. Use it for first-time training. Move to Advanced only if training fails repeatedly or accuracy is low.
  • Advanced: Recommended for websites with JavaScript-enabled content and single-page application (SPA) websites.
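
The guidelines above can be read as a simple decision rule. The following is a minimal sketch of one such reading, not part of the product: the function name `choose_scraper_type` and its parameters are hypothetical and exist only to illustrate the rule.

```python
from enum import Enum


class ScraperType(Enum):
    """The two options shown in the scraper type dropdown."""
    BASIC = "Basic"
    ADVANCED = "Advanced"


def choose_scraper_type(
    is_javascript_or_spa: bool,
    basic_training_failed_repeatedly: bool = False,
    basic_accuracy_low: bool = False,
) -> ScraperType:
    """Hypothetical helper: pick a scraper type from the usage guidelines.

    Assumption: a site already known to rely on JavaScript rendering or to
    be an SPA goes straight to Advanced; everything else starts with Basic
    and moves to Advanced only after repeated failures or low accuracy.
    """
    if is_javascript_or_spa:
        return ScraperType.ADVANCED
    if basic_training_failed_repeatedly or basic_accuracy_low:
        return ScraperType.ADVANCED
    return ScraperType.BASIC
```

For example, a simple static site on its first training run maps to Basic, while the same site maps to Advanced once Basic training has failed repeatedly.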

Key Limitations of Advanced Scraper:

  • Maximum number of website pages in a single training: 3000
  • Longer training time than Basic: approximately 30-40 minutes for 3000 pages
  • Cannot perform OCR on images
  • Cannot scrape files hosted on the website, such as PDFs, documents, and videos
  • Cannot scrape dynamic content that depends on operations such as Click, Scroll, or Wait during extraction
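
As a rough planning aid, the page limit and training time above can be turned into a quick pre-check. This is a minimal sketch under a stated assumption: training time is assumed to scale linearly with page count, which the figures above do not confirm. The names `MAX_PAGES_PER_TRAINING` and `estimate_training_minutes` are hypothetical.

```python
# Figures taken from the limitations listed above.
MAX_PAGES_PER_TRAINING = 3000
MINUTES_AT_MAX_PAGES = (30, 40)  # approximate range for 3000 pages


def estimate_training_minutes(page_count: int) -> tuple[float, float]:
    """Hypothetical helper: estimate an Advanced training time range.

    Assumption: time scales linearly with the number of pages. Treat the
    result as a rough guide, not a guarantee.
    """
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    if page_count > MAX_PAGES_PER_TRAINING:
        raise ValueError(
            f"{page_count} pages exceeds the limit of "
            f"{MAX_PAGES_PER_TRAINING} pages per training"
        )
    fraction = page_count / MAX_PAGES_PER_TRAINING
    low, high = MINUTES_AT_MAX_PAGES
    return (low * fraction, high * fraction)
```

Under the linear assumption, a 1500-page site would land at roughly half the quoted range. A site above 3000 pages is rejected by the check and would need to be split across more than one training.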

What’s Next