Product Scraper

Python OpenAI HTML CSS JavaScript

Project Summary

This Python program searches multiple websites for the user's search term and extracts the relevant information. The results are displayed on a webpage alongside an AI-generated summary of the collected data, and the data is also exported to an Excel file for further analysis.


Architecture and Design

The frontend is built with HTML, CSS, and JavaScript, and an underlying Python script handles all the logic. The script relies on several Python libraries, including the official OpenAI API client and Undetected-Chromedriver, which is used to avoid bot detection.
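The "search several sites, then merge the results" step can be sketched roughly as follows. This is only an illustration under assumptions: the Product fields, the run_search function, and the stub scrapers (fake_shop_a, broken_shop) are hypothetical names, not taken from the project, and the real scrapers would drive a browser instead of returning canned data.

```python
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class Product:
    """One scraped result (hypothetical fields, chosen for illustration)."""
    site: str
    name: str
    price: float


# A scraper takes the search term and returns the products found on one site.
Scraper = Callable[[str], list[Product]]


def run_search(term: str, scrapers: dict[str, Scraper]) -> list[dict]:
    """Run every site scraper for the search term and merge the results."""
    results: list[Product] = []
    for site, scrape in scrapers.items():
        try:
            results.extend(scrape(term))
        except Exception as exc:
            # One failing site should not abort the whole search.
            print(f"{site}: skipped ({exc})")
    # Plain dicts are easy to hand to a web page or a spreadsheet export.
    return [asdict(product) for product in results]


# Stub scrapers standing in for real sites (assumptions for the example).
def fake_shop_a(term: str) -> list[Product]:
    return [Product("shop-a", f"{term} basic", 10.0)]


def broken_shop(term: str) -> list[Product]:
    raise RuntimeError("blocked")


if __name__ == "__main__":
    print(run_search("lamp", {"shop-a": fake_shop_a, "shop-b": broken_shop}))
```

Keeping each site behind the same small function signature means a site that blocks the request or changes its layout only removes its own rows from the result.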

Technical Implementation

  • Frontend: Built with HTML, CSS, and JavaScript
  • Python Libraries:
    • OpenAI API prompts ChatGPT with tasks involving the dataset, such as generating the summary.
    • Undetected-Chromedriver is a patched version of the Selenium ChromeDriver focused on evading bot detection.
    • Beautiful Soup parses the raw HTML and extracts the relevant information.
    • pandas exports the dataset to an Excel file for manual evaluation later.
    • DotEnv loads secrets from a .env file, which is listed in .gitignore so they stay out of version control.
    • UserAgent rotates the current user-agent string to further avoid bot detection.
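
The parsing and export steps from the list above might look roughly like the sketch below. The sample HTML, the CSS class names, and the function names parse_products and export_to_excel are assumptions made for the example; real sites use their own markup, and the project's actual selectors are not shown here. Writing .xlsx files with pandas also needs an Excel engine such as openpyxl to be installed.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: real sites use their own class names and structure.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">$19.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">$5.00</span></li>
</ul>
"""


def parse_products(html: str) -> list[dict]:
    """Extract the name and price from each product element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("li.product"):
        name = item.select_one(".name").get_text(strip=True)
        price_text = item.select_one(".price").get_text(strip=True)
        # Strip the currency symbol so the price can be analysed as a number.
        rows.append({"name": name, "price": float(price_text.lstrip("$"))})
    return rows


def export_to_excel(rows: list[dict], path: str) -> pd.DataFrame:
    """Write the rows to an Excel file and return the DataFrame that was saved."""
    df = pd.DataFrame(rows)
    df.to_excel(path, index=False)  # index=False omits the row-number column
    return df


if __name__ == "__main__":
    rows = parse_products(SAMPLE_HTML)
    print(rows)
    export_to_excel(rows, "products.xlsx")
```

Storing prices as numbers rather than strings keeps the spreadsheet sortable and ready for later analysis.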