How Developers Are Automating ChatGPT and Gemini Web UIs Without API Keys
Developers who want to automate AI tasks such as batch OCR or image generation usually have to choose between free but manual browser use and paid API access. One developer has documented a way to script the ChatGPT and Gemini web interfaces directly with Selenium and undetected-chromedriver, with no API keys involved. The write-up covers the hurdles that complicate this kind of browser automation: prompt boxes that are contenteditable divs rather than standard input fields, emoji encoding issues, and file upload elements that are hidden from view. In practice that means typing newlines with Shift+Enter so that a bare Enter does not submit the prompt early, and triggering file uploads without opening the native file dialog. The technique is aimed at hobby projects, throwaway scripts, and research use cases where production-grade reliability is not required.
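The hurdles listed above can be illustrated with a minimal Python sketch. It is not the original author's code, and several details are assumptions made for illustration: the CSS selectors `div[contenteditable='true']` and `input[type='file']`, the helper names, and the guess that the emoji problem is ChromeDriver's refusal to type characters outside the Basic Multilingual Plane through `send_keys`. The JavaScript fallback for such characters is likewise an assumption and may not work with every editor widget.

```python
from pathlib import Path


def has_non_bmp(text: str) -> bool:
    """True if text has characters above U+FFFF (most emoji)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def split_prompt(text: str) -> list:
    """Split a prompt into lines; each break is later typed as Shift+Enter."""
    return text.replace("\r\n", "\n").split("\n")


def type_prompt(driver, box, text: str) -> None:
    """Type a multi-line prompt into a contenteditable element, then submit."""
    # Imported lazily so the pure helpers above work without Selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.keys import Keys

    box.click()
    for i, line in enumerate(split_prompt(text)):
        if i:
            # Shift+Enter inserts a newline; a bare Enter would submit early.
            ActionChains(driver).key_down(Keys.SHIFT).send_keys(
                Keys.ENTER
            ).key_up(Keys.SHIFT).perform()
        if has_non_bmp(line):
            # Assumption: insert emoji-bearing text through the DOM instead,
            # since send_keys may reject non-BMP characters.
            driver.execute_script(
                "arguments[0].focus();"
                "document.execCommand('insertText', false, arguments[1]);",
                box,
                line,
            )
        else:
            box.send_keys(line)
    box.send_keys(Keys.ENTER)  # submit the finished prompt


def upload_file(driver, path: str) -> None:
    """Attach a file by sending its path to the hidden file input."""
    from selenium.webdriver.common.by import By

    # Hypothetical selector; sending a path avoids the native file dialog.
    file_input = driver.find_element(By.CSS_SELECTOR, "input[type='file']")
    file_input.send_keys(str(Path(path).resolve()))


def run_demo(url: str, prompt: str, image_path: str) -> None:
    """Open the page, attach a file, and send a prompt (assumes a logged-in profile)."""
    import undetected_chromedriver as uc
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = uc.Chrome()
    try:
        driver.get(url)
        # Hypothetical selector for the prompt box.
        box = WebDriverWait(driver, 30).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "div[contenteditable='true']")
            )
        )
        upload_file(driver, image_path)
        type_prompt(driver, box, prompt)
    finally:
        driver.quit()
```

Calling `run_demo` with a page URL, a prompt such as `"Transcribe this image.\nReturn plain text only."`, and a local image path would exercise all three steps. Selectors on these sites change often, so they are the first thing to verify if the script stops working.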
This is an AI-generated summary. ShortSingh links to the original source for the complete article.