How to "add" geckodriver to PATH on ScrapingHub? - python

I am using Python 2 for web scraping. I have written a spider that uses headless Firefox (no GUI) to visit a website, log in with my account, and interact with the site by pressing buttons and filling in forms, calendars, etc. It works as expected on my personal computer, but once I deploy it to Scrapinghub I get an error saying that geckodriver needs to be on PATH. Its directory is already on PATH on my computer, just not on Scrapinghub.
I tried copying geckodriver itself into a folder within the project, adding its subdirectory to the executable_path parameter for webdriver as shown in this small guide, and finally deploying to Scrapinghub again, but I still get the same error.
I would like to know how to add geckodriver to "PATH" on Scrapinghub (if possible) and whether there are other ways I can achieve this or not. I have read something about Python eggs but I am not sure that is something that would help me with this.
I use Windows 10 and python 2.7.
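For reference, the approach I tried (bundling geckodriver inside the project and pointing executable_path at it) looks roughly like the sketch below. The folder name "drivers" and the helper names are hypothetical, executable_path is the parameter accepted by older Selenium releases, and I do not know whether Scrapinghub allows running a bundled binary at all.

```python
import os


def bundled_driver_path(project_file, subdir="drivers", name="geckodriver"):
    """Absolute path to a driver binary stored next to the project code.

    `subdir` and `name` are hypothetical; use whatever layout your project has.
    """
    base = os.path.dirname(os.path.abspath(project_file))
    return os.path.join(base, subdir, name)


def prepend_to_path(directory, environ=os.environ):
    """Put `directory` at the front of PATH for this process and its children."""
    current = environ.get("PATH", "")
    environ["PATH"] = directory + os.pathsep + current if current else directory
    return environ["PATH"]


def make_driver(project_file):
    # Imported here so the helpers above work even without Selenium installed.
    from selenium import webdriver

    driver_path = bundled_driver_path(project_file)
    prepend_to_path(os.path.dirname(driver_path))
    # executable_path is the older Selenium parameter mentioned in the question.
    return webdriver.Firefox(executable_path=driver_path)
```

Calling make_driver(__file__) from the spider module would resolve the driver relative to that file rather than the current working directory, which may differ after deployment.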

Related

How to use headless chrome in python on vercel?

As you may know, Heroku will stop offering free dynos, free Postgres, etc. from November, so I have been looking for an alternative to run my Python web apps. I have almost 10 web apps that I use regularly, such as a URL shortener, a keyword research tool, a Google Drive direct link generator, and many more. All of them are hosted on Heroku, but I'm moving to Vercel now. I have set up all the projects on Vercel except the last one, which is more complicated: a Python Selenium bot that powers my keyword research web app. On Heroku I used buildpacks, namely Headless Chrome (https://github.com/heroku/heroku-buildpack-google-chrome) and Chromedriver (https://github.com/heroku/heroku-buildpack-chromedriver), to make this project run properly. The problem is that I could not find anything like a buildpack on Vercel to add Chrome and Chromedriver.
Does anyone know about this?
Edit:
That was a bit of a story, and many people didn't understand what I was asking.
My project uses Selenium (Python). Selenium needs the Google Chrome browser installed and a chromedriver to run. As an alternative to installing Chrome system-wide, you can set the Chrome binary location in webdriver.ChromeOptions(). I want to host this Selenium project on vercel.com, which is Linux-based.
So my question is: how can I install the Chrome browser and ChromeDriver on Vercel?

Python Selenium: Is there a way to install webdriver inside the virtual environment?

I'm planning to build my first web automation project using selenium.
But before that, I'd like to know whether there is a way to install the web driver inside the virtual environment. The documentation says to place the web driver inside the Python bin folder, but I would like it to be inside a virtual environment. If there is a way to do that, please show the steps.
If you bundle a WebDriver into the virtual environment, you should also include the browser itself, since the driver is tied to the browser.
Therefore, it is not recommended to bundle WebDriver with your app that way; instead, you should run WebDriver separately from your client.
If you really want to bundle WebDriver with your app, you should use Docker instead, since it allows you to install a browser properly. For example, there are ready-made images on Docker Hub that include Python, Chrome, and Chromedriver, and some of them also include Xvfb.
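If you still want to keep the driver binary inside the virtual environment, a minimal sketch is to copy it into the environment's bin folder (Scripts on Windows) and build the path from sys.prefix, which points at the environment root while it is active. The helper name venv_driver_path is hypothetical, and the browser itself still has to be installed separately.

```python
import os
import sys


def venv_driver_path(name="chromedriver", prefix=None):
    """Path where a driver binary would live inside the active environment.

    `prefix` defaults to sys.prefix, the root of the active virtual environment.
    """
    prefix = prefix or sys.prefix
    if os.name == "nt":
        # Windows virtual environments keep executables in Scripts\
        return os.path.join(prefix, "Scripts", name + ".exe")
    # POSIX virtual environments keep executables in bin/
    return os.path.join(prefix, "bin", name)


if __name__ == "__main__":
    print(venv_driver_path())
```

Because the bin (or Scripts) folder is put on PATH when the environment is activated, a driver placed there should also be found without passing an explicit path.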

Is it possible to use Selenium with Python for Electron apps?

I am pretty new to Selenium testing with Electron apps. I know how to use Python to drive Chrome via the webdriver, and how to use Selenium IDE on Firefox, but I am having trouble finding a good source of information.
So far I have an app made with Electron, and I would like to use Selenium to drive it and automate the basics. I did some research, and most of the results use Node.js, which I do not know at all. I would like to use Python, so before moving to a whole different language, I would like to ask a bigger audience whether there is already a way to do Selenium testing of Electron apps with Python.
In particular, how do you assign the variable that will contain the Electron app? With the browser I would write
from selenium import webdriver
driver = webdriver.Chrome('/chromedriver')
but this won't make sense for an electron app.
I did find a way to catch the application.
You need to download Chromedriver and run it on a port of your choice (for example, 8765).
Then you can access the application written via Electron, in Python using
from selenium import webdriver
remote_app = webdriver.remote.webdriver.WebDriver(
    command_executor='http://localhost:8765',
    desired_capabilities={'chromeOptions': {'binary': '/myapp'}},
    browser_profile=None,
    proxy=None,
    keep_alive=False)
Then you can access the DOM elements on the app as usual. Not sure if it will work on Windows, OSX and Linux, will have to try.
Yes, you can do it with driver options and capabilities.
You need to set the binary path and add an argument to the options.
The binary path is the path to the electron executable under '.bin' in your project directory.
The argument is the path to your project's main directory.
For example:
Let's say your project is in your home directory and is named 'ElectronProject'.
The binary path is '/Users/Home/ElectronProject/node_modules/.bin/electron'
The argument path is '/Users/Home/ElectronProject'
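A rough sketch of how those two values could be passed from Python is shown below. It assumes a recent Selenium release, a Chromedriver already listening on a local port (8765 is just an example), and that the project directory is passed with an "app=" prefix; that prefix is an assumption on my part, so check it against your setup. The helper names are hypothetical.

```python
import os


def electron_launch_settings(project_dir):
    """Return (binary_path, arguments) for an Electron project directory."""
    binary = os.path.join(project_dir, "node_modules", ".bin", "electron")
    # Assumption: the project directory is handed over as an "app=" argument.
    arguments = ["app=" + project_dir]
    return binary, arguments


def connect(project_dir, chromedriver_url="http://localhost:8765"):
    # Imported here so the helper above works even without Selenium installed.
    from selenium import webdriver

    binary, arguments = electron_launch_settings(project_dir)
    options = webdriver.ChromeOptions()
    options.binary_location = binary      # the electron executable
    for argument in arguments:
        options.add_argument(argument)    # the project's main directory
    return webdriver.Remote(command_executor=chromedriver_url, options=options)
```

With the example above, connect('/Users/Home/ElectronProject') would try to start the app through the running Chromedriver and return a driver you can query like a normal browser session.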
Yes, it is possible. You can refer to the documentation at https://electronjs.org/docs/tutorial/using-selenium-and-webdriver

Distributing py program involving PhantomJS

I have a Python program that works with Selenium and PhantomJS, and I’d like to distribute it. The functionality is quite simple; it goes onto a website, fills certain forms and returns the outcome, without any visible browser action.
The problem is that I can’t expect an arbitrary user to have PhantomJS installed on their computers. How should I approach the distribution process?
I already checked Setuptools and PythonAnywhere, but I don’t think they work for what I want.
Edit: May be too hopeful, but I'd like to be able to distribute it for Windows, OSX and Ubuntu.
The way I do it is through a web application built on Flask (one of many great python web frameworks) and hosted on PythonAnywhere.
To use PhantomJS and Selenium in PythonAnywhere you have to ask for Docker Consoles. Instructions here: https://www.pythonanywhere.com/forums/topic/1320/

IIS Not Linking to Django with PyISAPIe

I'm trying to run a site with Django on an IIS-based server. I followed all the instructions on the main site (http://code.djangoproject.com/wiki/DjangoOnWindowsWithIISAndSQLServer), and double checked it with a very good article (http://www.messwithsilverlight.com/2009/11/django-on-windows-server-2003-and-iis6/).
I successfully got as far as setting up IIS to read .py files. Following the main instructions, I can get the server to render Info.py. However, I can't seem to get IIS and Django to play nice. If, for instance, my Virtual directory is "abc", then if I go to "localhost/abc/", the browser simply shows me the content directory for that folder. Furthermore, if I have my urls set up so that "/dashboard/1" should bring me to a certain page, entering "localhost/abc/dashboard/1" gives me a "page cannot be displayed" error.
I'm fairly certain IIS simply isn't referencing or interacting with Django at all. Does anyone have any ideas how to fix this?
Thanks
Here are the original instructions I followed,
basics instructions: https://code.djangoproject.com/wiki/DjangoOnWindowsWithIISAndSQLServer
additional tips: http://whelkaholism.blogspot.ca/
The first thing you should do is install Python 2.5 or 2.6. For 2.7 you need to recompile PyISAPIe, which I have not done. http://www.python.org/ftp/python/2.6/python-2.6.msi
You need to install the version of PyISAPIe that matches your Python interpreter version; if they do not match, it will fail. Get it here: http://sourceforge.net/projects/pyisapie/files/pyisapie/
Move the folder extracted in the last step to a convenient location (e.g. C:)
You need to change the security settings of PyISAPIe.dll. They suggest read access for Network Service, but I granted it to Everyone to be sure there are no problems with this
You then have to CUT AND PASTE (important) the Http folder of PyISAPIe into Lib\Site-Packages of your Python installation directory
Next, set up IIS (open the manager by typing inetmgr in Run, winkey+r):
Add a new virtual directory and allow executing ISAPI extensions when prompted by the wizard
Add a new wildcard extension in the property of your virtual directory, untick file exist setting
Add Web Service Extension to IIS Manager pointing to the dll, ensure it is allowed
From the PyISAPIe folder, copy examples\django\Isapi.py and paste it in Lib\Site-Packages\Http
In Isapi.py, set the path (e.g. c:\inetpub\wwwroot\web_site\django_project) and DJANGO_SETTINGS_MODULE (e.g. django_app.settings)
When any change is done to your files, use iisreset in your command prompt to apply the changes
Here are some other things you might do
Ensure the path of your db file (if sqlite used) is okay
Do the same with template location settings
In your urls and html files, ensure the paths start with the name you gave to your virtual directory alias (i.e. web_site in our example)
Finally, you may encounter difficulties with serving your CSS. If you have any troubles, tell me and I will update my post.
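To illustrate the Isapi.py step, setting the path and DJANGO_SETTINGS_MODULE generally comes down to something like the sketch below. This is not the actual content of the Isapi.py example shipped with PyISAPIe, which may look different; the function name configure_django is hypothetical and the values are the example ones from the steps above.

```python
import os
import sys


def configure_django(project_path, settings_module, environ=os.environ):
    """Make the project importable and tell Django which settings to load."""
    if project_path not in sys.path:
        sys.path.append(project_path)
    environ["DJANGO_SETTINGS_MODULE"] = settings_module


# Example values from the steps above; adjust them to your own layout.
configure_django(r"c:\inetpub\wwwroot\web_site\django_project",
                 "django_app.settings")
```

If Django still cannot find your settings, check that the folder you add to the path is the one that contains the package named in DJANGO_SETTINGS_MODULE.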
Serving Django with any webserver basically involves three key details:
Telling the webserver, "I want you to serve content that is provided by this module that invokes python"
Telling the python module, "I want you to execute python code using the details in this file"
Telling the file, "I want you to use Django"
If you're getting a directory listing back for your Virtual Directory then it would seem that you should investigate the VD settings to make sure PyISAPIe is configured for that directory (key details #1).
From the article you mentioned:
Open the IIS Management Console, and create a new virtual directory, and allow executing ISAPI extensions when prompted by the wizard.
View the properties of the new folder and click on the "configuration" button (if it's greyed out, click 'create' first), then add a new wildcard extension (the lower box), locate the pyisapie.dll file and untick the "check that file exists" box.
In the IIS Manager, go to the "Web Service Extensions" section, and right click -> add new web service extension.
Give it a name (it doesn't matter what), add the pyisapie.dll file as a required file and check the box to set the extension status to allowed.
