I am developing a Python script to take screenshots of many websites. For this I am using the following tools:
PhantomJS with Selenium
Python
Windows PC
I previously used PySide (instead of PhantomJS) for this job, but I ran into many issues with PySide.
Now I have found PhantomJS through Google, and I have used it with Selenium for Python on a Windows machine. It works flawlessly, with only one issue: PhantomJS doesn't support Flash Player, so I am not able to process YouTube and some other Flash websites. Please suggest a quick fix for this.
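For reference, the screenshot loop described in the question can be sketched independently of which browser is used. This is a minimal sketch, assuming `driver` is any Selenium WebDriver object (for example `webdriver.PhantomJS()` in the older Selenium releases that still shipped it). It relies only on `driver.get()` and `driver.save_screenshot()`. The function names and the filename scheme are hypothetical.

```python
import os
import re
from urllib.parse import urlparse


def safe_filename(url):
    """Turn a URL into a filesystem-safe PNG name (hypothetical naming scheme)."""
    parsed = urlparse(url)
    stem = parsed.netloc + parsed.path
    if parsed.query:
        stem += "?" + parsed.query
    stem = stem.strip("/") or "page"
    # Collapse every run of characters other than letters, digits, "." and "-" into "_".
    return re.sub(r"[^A-Za-z0-9.-]+", "_", stem) + ".png"


def screenshot_all(driver, urls, out_dir="shots"):
    """Visit each URL with a Selenium-style driver and save one PNG per page.

    Returns a dict mapping URL -> saved path, or None where loading/saving failed.
    """
    os.makedirs(out_dir, exist_ok=True)
    results = {}
    for url in urls:
        path = os.path.join(out_dir, safe_filename(url))
        try:
            driver.get(url)
            # Selenium's save_screenshot() returns False if the file could not be written.
            ok = driver.save_screenshot(path)
        except Exception:
            # Keep going with the remaining sites if one page fails to load.
            ok = False
        results[url] = path if ok else None
    return results
```

Pages that need Flash (such as YouTube in the question) will still load under PhantomJS, just without the plugin content, so the screenshot for those pages may be blank where the player should be.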
PhantomJS does not and probably will not support Flash and other plugins (see here).
However, you can use SlimerJS, a headless browser based on the Gecko engine, in your Selenium tests. It supports the WebDriver protocol, so you can drive it from Selenium.
There is also a fork of PhantomJS with Flash support, but it has not merged later PhantomJS changes back in, so it is stuck at version 1.9.0.
Since version 1.5, PhantomJS no longer relies on an X Window environment, and plugin support was removed at the same time. So there is no official support for running Flash Player in the current PhantomJS version.
However, there are many forks of the old PhantomJS that have Flash Player enabled and are still being updated. You can try r3b phantomjs. I recently built a service on top of this project on Ubuntu, and it works well.
Related
You may know that Heroku will stop offering free dynos, free Postgres, etc. from November, so I was looking for an alternative to run my Python web apps. I have almost 10 web apps that I use regularly, such as a URL shortener, a keyword research tool, a Google Drive direct link generator and many more. All of these are hosted on Heroku, but I'm moving to Vercel now. I have set up all the projects on Vercel except the last one, which is complicated: it is a Python Selenium bot that powers my keyword research web app. On Heroku I used buildpacks, e.g. Headless Chrome (https://github.com/heroku/heroku-buildpack-google-chrome) and Chromedriver (https://github.com/heroku/heroku-buildpack-chromedriver), to make this project run properly. The problem is that I could not find anything like a buildpack on Vercel to add Chrome and Chromedriver.
Does anyone know how to do this?
Edit:
That was a bit of a story, and many people didn't understand what I was asking.
My project uses Selenium (Python). Selenium needs the Google Chrome browser and ChromeDriver installed in order to run. Another option, without installing Chrome system-wide, is to set the Chrome binary location in webdriver.ChromeOptions(). I want to host this Selenium project on vercel.com, which is Linux-based.
So my question is: how can I install the Chrome browser and ChromeDriver on Vercel?
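The `webdriver.ChromeOptions()` route mentioned in the question can be sketched as follows. This is a minimal sketch assuming Selenium 4 (where `webdriver.Chrome` takes a `Service` object) and assuming a Chrome/Chromium binary and a matching ChromeDriver already exist somewhere on the Linux filesystem. The environment variable name `CHROME_BIN` and both helper names are hypothetical. The three Chrome flags are ones commonly used for headless Chrome in containers. This sketch does not show whether a Chrome binary can actually be shipped within Vercel's deployment limits.

```python
import os
import shutil

# Hypothetical environment variable naming the Chrome binary.
CHROME_ENV_VAR = "CHROME_BIN"
# Executable names commonly used for Chrome/Chromium on Linux.
CANDIDATE_NAMES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def find_chrome_binary(env_var=CHROME_ENV_VAR):
    """Return a path to a Chrome/Chromium binary, or None if none is found.

    An explicit path in the environment wins; otherwise PATH is searched.
    """
    explicit = os.environ.get(env_var)
    if explicit and os.path.isfile(explicit):
        return explicit
    for name in CANDIDATE_NAMES:
        found = shutil.which(name)
        if found:
            return found
    return None


def make_headless_chrome(chrome_binary, chromedriver_path):
    """Start headless Chrome from an explicit binary location (Selenium 4 API)."""
    # Imported here so find_chrome_binary() works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = chrome_binary  # the setting the question refers to
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    service = Service(executable_path=chromedriver_path)
    return webdriver.Chrome(service=service, options=options)
```

Usage would look like `make_headless_chrome(find_chrome_binary(), "/path/to/chromedriver")`, where the ChromeDriver path is a placeholder for wherever the driver ends up in the deployment.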
I have a Python script that uses Chromedriver and Selenium to scrape a handful of websites. What resources are currently used to run this type of Python script in the cloud?
For reference, I am using version 84.0.4147.30.
One possible solution would be to set up an EC2 instance and install Chrome, Chromedriver and Python...
Apologies if this is not the right place to post this.
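On a fresh cloud machine such as an EC2 instance, a common stumbling block is a version mismatch. For ChromeDriver releases like the 84.0.4147.30 mentioned above, the ChromeDriver major version has to match the installed Chrome major version. The following is a minimal sketch of that check. It assumes the version strings come from `google-chrome --version` and `chromedriver --version`. Because the exact output format is an assumption, the parser just takes the first four-part dotted version number it finds. The helper names are hypothetical.

```python
import re
import subprocess

# Chrome/ChromeDriver versions look like 84.0.4147.30 (major.minor.build.patch).
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")


def major_version(version_output):
    """Extract the major version from text like 'ChromeDriver 84.0.4147.30 (...)'."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        raise ValueError("no version number found in %r" % version_output)
    return int(match.group(1))


def versions_match(chrome_output, chromedriver_output):
    """True if Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(chromedriver_output)


def command_output(cmd):
    """Run e.g. ['google-chrome', '--version'] and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

On the server this would be called as `versions_match(command_output(["google-chrome", "--version"]), command_output(["chromedriver", "--version"]))` before starting the scraper.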
I am using a script to scrape asynchronously loaded content from news sites. PhantomJS (with Selenium on Python) has a much higher success rate on my dev machine (a Mac running El Capitan) than on an Ubuntu 14.04 LTS server. This happens even though both machines use exactly the same versions of Python (3.6.0), the Selenium library for Python (3.4.0), GhostDriver (1.2.0, though I believe this is just what comes with Selenium) and PhantomJS (2.1.1).
I am running PhantomJS in both places with the following options:
'--ignore-ssl-errors=true', '--debug=true', '--load-images=false'
I generally had an issue where the driver was fetching the pages but could not find the target content, probably because it was not giving the page long enough to load all of it. If I add a basic sleep between driver.get() and the point where I parse the page with Beautiful Soup (also the same version, 4.5.3), the problem goes away in the Mac environment. On Ubuntu, however, no matter how long it sleeps, it appears not to load all of the asynchronous content, and it then tends to fail to scrape those pages.
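Rather than the fixed sleep described above, a more robust pattern is to poll until the target content appears, with a timeout. Selenium already provides this as `WebDriverWait(driver, timeout).until(...)` together with `expected_conditions.presence_of_element_located`. The sketch below is a stdlib-only equivalent of that loop so the logic is visible. `wait_for` and `WaitTimeout` are hypothetical names. `condition` is assumed to be any zero-argument callable, for example `lambda: driver.find_elements_by_css_selector(".article-body")` in Selenium 3.x, where the selector is a placeholder. This will not make missing content load, but it distinguishes "not loaded yet" from "never loaded within N seconds".

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy within the timeout."""


def wait_for(condition, timeout=30.0, poll_interval=0.5,
             clock=time.monotonic, sleep=time.sleep):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        sleep(poll_interval)
```

With Selenium itself, the equivalent is `WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".article-body")))`, which raises `TimeoutException` instead of `WaitTimeout`.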
I'd say that on the Mac it gets nearly 100% of the content it is looking for, whereas on Ubuntu it consistently gets only about 50%. So it is working, just not as reliably.
Is there a reason why the exact same version of PhantomJS would behave much less consistently when the only real difference is the operating system? Is there a good way to figure out why PhantomJS is failing? The GhostDriver logs in debug mode don't seem to tell me anything useful about this issue.
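One practical way to see what PhantomJS actually rendered on each machine is to save the page source and a screenshot whenever a scrape fails, then compare the Mac and Ubuntu artifacts side by side. This is a minimal sketch, assuming `driver` is the Selenium PhantomJS driver. It uses only the standard `page_source` property and `save_screenshot()` method. `dump_page_state` and the file layout are hypothetical.

```python
import os
import time


def dump_page_state(driver, label, out_dir="debug_dumps"):
    """Save the current DOM and a screenshot so two machines' results can be diffed.

    Returns the (html_path, png_path) pair that was written.
    """
    os.makedirs(out_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = os.path.join(out_dir, "%s-%s" % (label, stamp))
    html_path = base + ".html"
    png_path = base + ".png"
    with open(html_path, "w", encoding="utf-8") as f:
        # DOM as serialized by the browser at this moment, after any scripts ran.
        f.write(driver.page_source)
    driver.save_screenshot(png_path)
    return html_path, png_path
```

Calling it from the branch where Beautiful Soup fails to find the target content, with a label such as the machine's hostname, gives a pair of files per failure that can be diffed between the two environments.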
I have a Python program that works with Selenium and PhantomJS, and I’d like to distribute it. The functionality is quite simple; it goes onto a website, fills certain forms and returns the outcome, without any visible browser action.
The problem is that I can’t expect an arbitrary user to have PhantomJS installed on their computers. How should I approach the distribution process?
I already checked Setuptools and PythonAnywhere, but I don’t think they work for what I want.
Edit: This may be too hopeful, but I'd like to be able to distribute it for Windows, OS X and Ubuntu.
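One possible approach is to ship a PhantomJS executable for each platform inside the distribution and point Selenium at it explicitly, so users need nothing pre-installed. This is not the approach taken in the answer below. Selenium 3.x's `webdriver.PhantomJS` accepted an `executable_path` argument for this. The following is a minimal sketch of choosing the right bundled binary. It assumes a hypothetical layout `bin/<platform>/phantomjs[.exe]` next to the program. `sys._MEIPASS` is the extraction directory PyInstaller sets for one-file builds and is used here only if present.

```python
import os
import sys

# Hypothetical layout: bin/windows/phantomjs.exe, bin/macos/phantomjs, bin/linux/phantomjs
_PLATFORM_DIRS = {"win32": "windows", "darwin": "macos", "linux": "linux"}


def bundled_phantomjs_path(base_dir=None, platform=None):
    """Return the path of the PhantomJS binary bundled for this platform."""
    platform = platform or sys.platform
    if base_dir is None:
        # PyInstaller one-file builds unpack to sys._MEIPASS; otherwise use this file's dir.
        base_dir = getattr(sys, "_MEIPASS", os.path.dirname(os.path.abspath(__file__)))
    key = "linux" if platform.startswith("linux") else platform
    try:
        subdir = _PLATFORM_DIRS[key]
    except KeyError:
        raise RuntimeError("no bundled PhantomJS for platform %r" % platform) from None
    exe = "phantomjs.exe" if subdir == "windows" else "phantomjs"
    return os.path.join(base_dir, "bin", subdir, exe)
```

The driver would then be created with `webdriver.PhantomJS(executable_path=bundled_phantomjs_path())`. This works only on Selenium 3.x, since PhantomJS support was removed in Selenium 4. On OS X and Ubuntu the bundled file also needs its executable bit set.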
The way I do it is through a web application built on Flask (one of many great python web frameworks) and hosted on PythonAnywhere.
To use PhantomJS and Selenium in PythonAnywhere you have to ask for Docker Consoles. Instructions here: https://www.pythonanywhere.com/forums/topic/1320/
I have tried Splinter for browser automation, using the Firefox WebDriver. The problem is high CPU usage when Firefox loads, and sometimes it hangs the GUI. Please suggest an alternative. I'm on a Linux box (Ubuntu) and building an app using PyGTK.
Selenium with PhantomJS should be a good replacement for Splinter.
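Whichever browser is used, the GUI hang can be avoided by keeping the slow browser work off the GTK main loop. The following is a minimal Python 3 sketch of that general threading pattern; it is not something Splinter or Selenium provide. It assumes `job` is a hypothetical zero-argument function that does the Selenium/PhantomJS work. In a GTK app, `deliver` would typically be `GLib.idle_add` (PyGObject) or `gobject.idle_add` (legacy PyGTK), so the callback runs back on the GUI thread. That callback should return a false value so it runs only once.

```python
import threading


def run_in_background(job, on_done, on_error=None,
                      deliver=lambda fn, *args: fn(*args)):
    """Run `job()` in a daemon thread and hand its result to `on_done`.

    `deliver` schedules the callback. The default calls it directly on the
    worker thread; a GUI app would pass something that hops to the main loop.
    """
    def worker():
        try:
            result = job()
        except Exception as exc:
            # Report failures instead of letting the thread die silently.
            if on_error is not None:
                deliver(on_error, exc)
            return
        deliver(on_done, result)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The GUI thread only starts the worker and returns immediately, so the window keeps redrawing while the headless browser loads pages.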