Apologies if this is not the right place to post this.
I am using a script to scrape asynchronously loaded content from news sites, and PhantomJS (driven by Selenium from Python) succeeds far more often on my dev machine (a Mac running El Capitan) than on an Ubuntu 14.04 LTS server. This is despite both machines running exactly the same versions of Python (3.6.0), the Selenium library for Python (3.4.0), ghostdriver (1.2.0, but I believe this is just what comes with Selenium) and PhantomJS (2.1.1).
I am running PhantomJS in both places with the following options:
'--ignore-ssl-errors=true', '--debug=true', '--load-images=false'
In general, the driver was fetching the pages but could not find the target content, probably because it was not giving the page long enough to load everything. Adding a basic sleep between driver.get() and parsing the page with Beautiful Soup (also the same version, 4.5.3) solves the problem on the Mac. On Ubuntu, however, no matter how long it sleeps, it appears not to load all of the asynchronous content and then tends to fail to scrape those pages.
I'd say that on the Mac it gets nearly 100% of the content it is looking for, whereas on Ubuntu it consistently gets only about 50%. So it is working, just not as reliably.
Is there a reason why the exact same version of PhantomJS would perform so much more inconsistently when really the only difference is the operating system it is running on? Is there a good way to figure out why PhantomJS is failing? The ghostdriver logs in debug mode don't seem to tell me anything useful about this issue.
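One way to make the failures easier to diagnose is to replace the fixed sleep with an explicit poll that either returns the loaded HTML or fails loudly with a timeout. The following is only a minimal sketch: `wait_for_page_source` and the `is_ready` predicate are hypothetical names, not part of Selenium, and the function relies only on the driver exposing a `page_source` attribute. Selenium also ships `WebDriverWait` and `expected_conditions` in `selenium.webdriver.support`, which serve the same purpose. This will not by itself explain the Mac/Ubuntu difference, but it shows which pages never produce the content at all.

```python
import time


def wait_for_page_source(driver, is_ready, timeout=30.0, poll_interval=0.5):
    """Poll driver.page_source until is_ready(html) is true.

    Returns the HTML that satisfied is_ready, or raises TimeoutError
    if the content has not appeared once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source  # re-read the current DOM on every attempt
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("content did not appear within %.1f s" % timeout)
        time.sleep(poll_interval)
```

A call might look like `html = wait_for_page_source(driver, lambda h: 'class="article-body"' in h)` right after `driver.get(url)`, where the marker string is a placeholder for whatever identifies the target content; the returned HTML can then be handed to Beautiful Soup.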
Related
My team and I have set up an account with Hostinger and have a VPS with its own domain. Our operating system is 64-bit CentOS 7 with Webmin/Virtualmin/LAMP, and we use Webmin as our control panel. Right now our HTML pages are showing, but our Python code is not working.
We used SSH to install Python 3, MongoDB, pymongo, and Flask, but are still having trouble getting our Python code to work in our web application. We are unsure what to do from here and need guidance on what our next steps should be. Thank you in advance for any help given.
It sounds like what you've gone for on your VPS is a web hosting setup rather than a bare metal VPS setup. I can see why you think you'd want web hosting, but in reality Flask works differently in that it is its own application which needs to run rather than being served like an HTML page.
There is an excellent tutorial on how to do this here. It is designed for Ubuntu (which is a good setup if you are starting fresh), but there are also versions for other Linux flavours.
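To illustrate what "its own application which needs to run" means, here is a minimal sketch that uses only the standard library's `wsgiref` rather than Flask itself. Flask applications are WSGI callables of the same shape, so the idea carries over: a Python process has to be started and kept running to answer requests, unlike static HTML that the web server reads from disk. The host, port, and function names are arbitrary choices for this sketch.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A WSGI callable: the server process invokes it once per request."""
    body = b"Hello from a running Python process\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Run the app until interrupted; nothing is served unless this is running."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts the process. In a production setup, a dedicated WSGI server placed behind the existing web server would normally take the role that `wsgiref` plays here.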
I have a Python program that works with Selenium and PhantomJS, and I’d like to distribute it. The functionality is quite simple; it goes onto a website, fills certain forms and returns the outcome, without any visible browser action.
The problem is that I can’t expect an arbitrary user to have PhantomJS installed on their computers. How should I approach the distribution process?
I already checked Setuptools and PythonAnywhere, but I don’t think they work for what I want.
Edit: May be too hopeful, but I'd like to be able to distribute it for Windows, OSX and Ubuntu.
The way I do it is through a web application built on Flask (one of many great python web frameworks) and hosted on PythonAnywhere.
To use PhantomJS and Selenium in PythonAnywhere you have to ask for Docker Consoles. Instructions here: https://www.pythonanywhere.com/forums/topic/1320/
I am developing a Python script to take screenshots of many websites. For this I am using the tools below:
phantomjs with selenium
python
windows PC
I previously used PySide (instead of PhantomJS) for this job, but I ran into many issues with it.
I then found PhantomJS through Google and have used it with Selenium for Python on a Windows machine, where it works flawlessly. It has only one issue: PhantomJS doesn't support Flash Player, so I am not able to process YouTube and some Flash websites. Is there a quick fix for this?
PhantomJS does not and probably will not support Flash and other plugins (see here).
But you can use SlimerJS in your Selenium tests, which is a headless browser based on the Gecko engine. It does support the WebDriver protocol, so use it.
There is also a fork of PhantomJS with Flash support, but it didn't merge changes in PhantomJS back into it, so it is standing still at version 1.9.0.
PhantomJS has not relied on an X Window environment since version 1.5, and plugin support was removed at the same time. So there is no official support for running Flash Player in the current PhantomJS version.
However, there are many projects forked from the old PhantomJS that have Flash Player enabled and are kept updated. You can try r3b phantomjs. I recently built a service on top of this project under Ubuntu, and it works perfectly.
I am writing a piece of code that uses the Box.com Python SDK. The SDK uses the requests module to communicate with Box.com as per the API documentation. For my purposes, I need to make several GET and POST requests in a row, some of which could be used to transfer files. The issue that I'm running into is this:
On Linux (Ubuntu 13.10), each request takes a relatively long time (5 to 15 seconds) to get through, though transfer speeds for file transfers are as expected in the context of my network connection.
On Windows 8.1, running the exact same code, the requests go through really fast (sub-second fast).
On both platforms I am using the same version of iPython (1.1.0) and of the requests module (1.2.3) under Python 2.7. This is particularly problematic for me because the code I'm working on will eventually be implemented on Linux machines.
Is this a problem someone has encountered before? I would love to hear from anyone with ideas on what the issue might be. I have yet to try it on a different Linux installation to see whether it is a problem with my specific setup.
Thanks.
EDIT 1
So, I decided to check this using virtual machines. Using the same Debian virtual machine, all the responses were fast under Windows, but under Ubuntu they were slow. I then made an Ubuntu 12.04 live USB and ran the code on that, and the responses were fast there as well.
So, it's not Python or Linux in general, it's my particular installation and I have no idea how to diagnose the problem :(
Use a tool such as wireshark (which needs to be run with sudo on most distributions) to log the individual network packets when your code makes the API requests, to determine what is taking so long.
My guess is the following possibilities are most likely:
For some reason your Ubuntu installation is picking up the wrong DNS server list, and DNS lookups are timing out.
IPv6 issue (which may appear to be a DNS issue, too). Disable IPv6.
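Before reaching for a packet capture, a quick check of the two guesses above is to time name resolution separately for IPv4 and IPv6. This is only a sketch under assumptions: `time_lookups` is a hypothetical helper name, port 443 assumes an HTTPS API, and the `resolve` parameter exists just so the resolver can be swapped out.

```python
import socket
import time


def time_lookups(host, port=443, resolve=socket.getaddrinfo):
    """Time name resolution for `host` separately for IPv4 and IPv6.

    Returns a dict mapping "IPv4"/"IPv6" to (seconds, outcome), where
    outcome is the number of addresses found or a failure message.
    """
    results = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        start = time.monotonic()
        try:
            outcome = len(resolve(host, port, family, socket.SOCK_STREAM))
        except socket.gaierror as exc:
            outcome = "failed: %s" % exc
        results[label] = (time.monotonic() - start, outcome)
    return results
```

Run it with the host name the SDK connects to. If either lookup takes several seconds on the slow installation but not on the fast ones, that points at DNS configuration or IPv6 rather than at Python or the requests module.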
I have several Django projects and they work well on my desktop. But when I run them on my laptop, they run OK for some time. Then, at random, opening a page stops working: the browser keeps trying to load the page (the tab title keeps spinning, the URL changes to the page it's trying to open, and the page turns blank), while the development server (Django in a Windows shell) says it has successfully served the page (200 status).
This behavior is consistent across Firefox, IE and Chrome. I tried changing ports, using the machine IP instead of localhost, and loading static files from an external server, but nothing works. I tried opening the site (using the laptop's computer name) from browsers on the desktop and it behaves the same. Another interesting thing is that even if I shut down and restart the Django server, I won't be able to open a page that has failed previously unless I close the page that is still loading.
My laptop is running basic Windows 8, while the desktop runs Windows 8 Pro. I think the Windows version has something to do with it.
Does anyone know how to solve this? I hope I made myself clear. Thanks.
It is hard to tell whether the issue is related to Windows specifically, rather than compatibility issues with images/CSS/Javascript/plugins such as Flash. Are you running the latest versions of those browsers (or at least the same versions as on your desktop)? Do you have different security software/firewalls? Do other sites load inconsistently? Seems unlikely to be a Django issue (although you can try loading sites like djangoproject.com).
Thanks, everyone, for your comments and answer. I uninstalled from the laptop every application that is not present on the desktop and found the one causing the problem. An app called NetWorx has a network filtering feature that I had enabled, and for some reason it was blocking the Django response. I disabled network filtering, which is good enough for my needs.