I have an automation script that opens Google Chrome and performs some tasks. It also visits some sites that require me to be logged in. On my local machine, I logged in to those sites once, and the session was stored in the browser profile. I pass that profile data to the webdriver so that I don't have to log in every time. Now I'm thinking of deploying the script to Heroku. Everything else is fine: Heroku supports Google Chrome and ChromeDriver. But how do I import my own profile into that Chrome? That's what I'm stuck on. Any help would be appreciated. Thank you!
Reference to an older, unanswered thread on this forum: How do I access my google user profile on Heroku using Ruby?
You may know that Heroku will stop offering free dynos, free Postgres, etc. from November, so I have been looking for an alternative to run my Python web apps. I have almost 10 web apps that I use regularly, such as a URL shortener, a keyword research tool, a Google Drive direct link generator, and more. All of these are hosted on Heroku, but I'm moving to Vercel now. I have set up all the projects on Vercel except the last one, which is complicated. It is a Python Selenium bot that powers my keyword research web app. On Heroku I used buildpacks, namely Headless Chrome (https://github.com/heroku/heroku-buildpack-google-chrome) and Chromedriver (https://github.com/heroku/heroku-buildpack-chromedriver), to make this project run properly. The problem is that I could not find anything like a buildpack on Vercel to add Chrome and ChromeDriver.
Does anyone know about this?
Edit:
That was a bit of a story, and many people didn't understand what I was asking.
So, my project uses Selenium (Python). Selenium needs the Google Chrome browser installed and a ChromeDriver to run. Another option, which avoids a system-wide Chrome install, is to set the Chrome binary location in webdriver.ChromeOptions(). I want to host this Selenium project on vercel.com, which is Linux-based.
So my question is: how can I install the Chrome browser and ChromeDriver on Vercel?
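The "set the Chrome binary location" option mentioned above can be sketched roughly as follows. This is only an illustration: the `CHROME_BIN` environment variable and the list of candidate executable names are assumptions, not something Vercel or Selenium defines, and the sketch does not answer whether the host allows a browser to be installed at all.

```python
import os
import shutil

# Assumptions for illustration: CHROME_BIN is a hypothetical environment
# variable, and the names below are only common Linux executable names.
CANDIDATE_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_chrome_binary(env=None, which=shutil.which):
    """Return a path to a Chrome/Chromium executable, or None if none is found."""
    env = os.environ if env is None else env
    explicit = env.get("CHROME_BIN")
    if explicit:
        return explicit  # an explicitly configured path takes priority
    for name in CANDIDATE_NAMES:
        path = which(name)
        if path:
            return path
    return None


def make_driver():
    """Start Chrome with an explicit binary location (requires selenium)."""
    # Imported lazily so find_chrome_binary() can be used without selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    binary = find_chrome_binary()
    if binary:
        options.binary_location = binary
    return webdriver.Chrome(options=options)
```

If `find_chrome_binary()` returns None, no browser was found on the machine, and the driver would fall back to whatever Selenium locates by default.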
I have a program that tweets content I scrape from a website, so I use Selenium and a bunch of Python libraries. I have found a lot of YouTube videos about Selenium Grid, AWS Lambda, etc., and I have deployed a Twitter bot on EC2 before. But I find it hard to understand this time because Selenium is involved: I need the web driver and the Chrome browser set up.
My program has a bot.py, an Image folder where images are saved and deleted, and a func.py where the functions live.
The code is ready but needs deployment. I need clarity and a worked use case on how to do it. Everything I have watched and read online so far is stressing me out. I'm a newbie, FYI.
I want to build a simple web scraper using Python Selenium and deploy it on Heroku. I've already done the deployment with the chromedriver and Chrome buildpacks, and everything is working fine. But I still need to implement one thing: I want to use my local Chrome profile so that I don't have to sign in to Google. This works fine locally using:
options = webdriver.ChromeOptions()
options.add_argument(r"user-data-dir=C:\Users\user\AppData\Local\Google\Chrome\User Data")
To access my Chrome profile on Heroku, I uploaded the whole folder into the same directory as the code. After the deployment I can access the folder under /app/User Data and can see all the files. However, if I pass
options.add_argument(r"user-data-dir=/app/User Data")
to the Driver, it doesn't load the profile and the login process fails. I tried to get more information by printing the source code of "chrome://version", but that's just an empty page.
Do you have any suggestion what I can try instead to get it working? Thank you!
If you just need to log in, you can use https://pypi.org/project/selenium-stealth/. It is still Selenium, but it doesn't get detected when signing in.
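A different approach from copying the whole profile folder is to carry over only the session cookies. One possible reason the copied profile fails is that Chrome encrypts stored cookies with OS-specific keys, so a Windows profile may not be readable on a Linux dyno. The sketch below is an alternative technique, not what either poster did: it saves the cookies from a logged-in local session to JSON and replays them on the server. The file name and function names are hypothetical, and some sites (Google in particular) may still reject a replayed session.

```python
import json

# Keys that Selenium's add_cookie() accepts; anything else is dropped.
ALLOWED_KEYS = {
    "name", "value", "path", "domain",
    "secure", "httpOnly", "expiry", "sameSite",
}


def save_cookies(cookies, path):
    """Write a list of cookie dicts (e.g. from driver.get_cookies()) to JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh)


def load_cookies(path):
    """Read cookies back, keeping only the keys add_cookie() understands."""
    with open(path, encoding="utf-8") as fh:
        raw = json.load(fh)
    return [{k: v for k, v in c.items() if k in ALLOWED_KEYS} for c in raw]


def restore_session(driver, url, path):
    """Load the site first, then add the saved cookies and reload the page."""
    driver.get(url)  # cookies can only be set for the domain currently loaded
    for cookie in load_cookies(path):
        driver.add_cookie(cookie)
    driver.refresh()
```

Locally, after logging in by hand, one would call `save_cookies(driver.get_cookies(), "cookies.json")`; on the server, `restore_session(driver, url, "cookies.json")` before scraping. The cookie file holds live credentials, so it should be treated like a password.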
I have created a website that scrapes multiple hockey websites for game scores. It runs perfectly on my local server, and I am now trying to deploy it. I have tried pythonanywhere.com, but Selenium does not seem to work there. For anyone who has deployed a website that uses Selenium/WebDriver: what is the easiest or best platform for a site like this? It does not have to be free like PythonAnywhere, as long as it is not too expensive, lol! Thanks in advance.
Selenium does work on PythonAnywhere. If you use a free account, you'd have restricted internet access though. It is also recommended to scrape outside of the web app, since scraping inside a request would slow the views down -- you should use a Scheduled/Always-on task for that instead. You can also refer to these PythonAnywhere help pages:
Using Selenium
Async work in web apps
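The "scrape outside of the web app" advice can be sketched as a small cache: a scheduled task runs the slow scrape and writes the result to disk, and the web view only reads that file. This is a minimal illustration; the file name, the function names, and the sample data returned by `scrape_scores` are all placeholders, not part of PythonAnywhere's API.

```python
import json
import time
from pathlib import Path

CACHE_FILE = Path("scores_cache.json")  # hypothetical location


def scrape_scores():
    """Placeholder for the slow Selenium scrape; returns made-up sample data."""
    return [{"home": "Team A", "away": "Team B", "score": "3-2"}]


def refresh_cache(cache_file=CACHE_FILE, scraper=scrape_scores):
    """Run from a scheduled task: scrape once and store the result on disk."""
    payload = {"fetched_at": time.time(), "scores": scraper()}
    tmp = cache_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    tmp.replace(cache_file)  # swap the finished file in with a single rename
    return payload


def read_cache(cache_file=CACHE_FILE):
    """Called from the web view: a cheap file read instead of a browser session."""
    if not cache_file.exists():
        return {"fetched_at": None, "scores": []}
    return json.loads(cache_file.read_text(encoding="utf-8"))
```

The scheduled task would call `refresh_cache()` every few minutes, while the view calls `read_cache()` and renders whatever is there, so a page load never waits on a browser.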
You can use Linux servers on AWS, GCP, or DigitalOcean. In this case, you first have to install Chrome on the server and then put the matching version of ChromeDriver in your project directory. Make sure to check the installed Chrome version first, then download the ChromeDriver release that corresponds to it.
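The version check described above could be automated along these lines. It is a sketch that assumes the executables are named `google-chrome` and `chromedriver` and are on the PATH, and it compares only the major version, which is the usual rule of thumb rather than a guarantee of compatibility.

```python
import re
import subprocess


def major_version(version_output):
    """Extract the major version from text such as 'Google Chrome 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def check_installed(chrome_cmd="google-chrome", driver_cmd="chromedriver"):
    """Run both binaries with --version and compare their major versions."""
    chrome = subprocess.run(
        [chrome_cmd, "--version"], capture_output=True, text=True, check=True
    ).stdout
    driver = subprocess.run(
        [driver_cmd, "--version"], capture_output=True, text=True, check=True
    ).stdout
    return versions_match(chrome, driver)
```

Calling `check_installed()` at startup and failing loudly on a mismatch tends to be easier to debug than the session errors Selenium raises later.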
I am scraping some websites that seem to have pretty good protection against it. The only way I can get it to work is to use Selenium to load the page and then scrape stuff from that.
Currently this works on my local computer (a Firefox window opens and closes when I access my page, and its HTML is processed further in my script). However, I need my scraper to be accessible on the web. The scraper is embedded within a Flask app on Heroku. Is there a way to make the Selenium browser work on Heroku servers? Or are there any hosting providers where it can work?
Heroku, wonderful as it is, has a major limitation in that one cannot use custom software or in many cases, libraries. In providing an easy to use, centrally-controlled, managed stack, Heroku strips their servers down to prevent other usage.
What this boils down to is there is no Xorg on a Heroku dyno. Lack of Xorg and lack of ability to install custom software means no xvfb either, and no ability to run the browser that selenium expects to exist. Further, the browser is not generally available.
You'll have better luck with a cloud offering like AWS, where you can install custom software, including firefox, xvfb (to keep from needing all the Xorg overhead), and of course the rest of your scraping stack. This answer explains how to do it properly.
There are buildpacks that make Selenium work on Heroku.
Add the buildpacks below.
1) heroku buildpacks:add https://github.com/kevinsawicki/heroku-buildpack-xvfb-google-chrome/
2) heroku buildpacks:add https://github.com/heroku/heroku-buildpack-chromedriver
Then set the Heroku stack to cedar-14 as shown below, since the xvfb buildpack works only with cedar-14.
heroku stack:set cedar-14 -a stocksdata
Then point Selenium at the Google Chrome location as below:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.binary_location = "/app/.apt/usr/bin/google-chrome-stable"
# "options=" replaces the deprecated "chrome_options=" keyword
driver = webdriver.Chrome(options=options)
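As an alternative to the xvfb setup above, Chrome can also be started in headless mode, which does not need an X server. The sketch below is a swapped-in technique, not part of the original answer: the flag list is a commonly used set for container-like hosts, the default binary path is the one quoted above, and the helper names are hypothetical.

```python
def chrome_arguments(headless=True):
    """Command-line flags often used when Chrome runs on a server without a display."""
    args = [
        "--no-sandbox",             # the sandbox is often unavailable in containers
        "--disable-dev-shm-usage",  # avoid relying on a small /dev/shm
    ]
    if headless:
        args.insert(0, "--headless")
    return args


def build_driver(binary_location="/app/.apt/usr/bin/google-chrome-stable"):
    """Create a headless Chrome driver (requires selenium and the buildpacks)."""
    # Imported lazily so chrome_arguments() can be inspected without selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.binary_location = binary_location
    for arg in chrome_arguments():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Keeping the flags in a plain list makes it easy to log exactly what the browser was started with when a deploy misbehaves.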