Product Scraper
Python
OpenAI
HTML
CSS
Javascript
Project Summary
This python program searches multiple websites for the user's search term and extracts relevant
information. The resulting information is then displayed in a webpage along with an AI generated
summary of the data collection. The data is also output to an Excel file for further data
analysis
Architecture and Design
The program is built using HTML, CSS, and JavaScript for the frontend, with
an underlying python script that handles all the logic. Several python libraries are used
such as the official OpenAI API and Undetected-Chromedriver to avoid bot detection.
Technical Implementation
- Frontend: Built with HTML, CSS, and
JavaScript
- Python Libraries:
-
OpenAI API to prompt ChatGPT with tasks involving the dataset.
-
Undetected-Chromedriver which is a fork of Selenium focused on evading bot detection
-
Beautiful Soup is used to parse the raw HTML and extract the information we want
-
Panda Exports the dataset to an excel file for manual evaluation later.
-
DotEnv helps contain secrets to a .env file that is added to the .gitignore
-
UserAgent rotates the current user-agent in order to further avoid bot detection.