I am using selenium to crawl a javascript website, the issue is that, a Firefox browser opens up, but the call for the URL is not done. however, when I close the browser, it is then that call for URL is done and of course I get the missing driver exception. what do you think the issue comes from.
knowing that:
all programs are up-to-date
my solution works fine, in local, but when I try to deploy it on the server, I start having issues
Example: at my local machine, I run this script and everything goes smooth, however when I run it a server (Linux), only the browser opens up and no get URL is called
from selenium import webdriver
import time
geckodriver_path = r'.../geckodriver'
driver = webdriver.Firefox(executable_path= geckodriver_path)
time.sleep(3)
driver.get("http://www.stackoverflow.com")
I end up finding the solution :
from selenium import webdriver
import time
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
geckodriver_path = r'/path_to/geckodriver'
binary = FirefoxBinary(r'/usr/bin/firefox')
capabilities = webdriver.DesiredCapabilities().FIREFOX
capabilities["marionette"] = False
driver = webdriver.Firefox(firefox_binary=binary,
executable_path= geckodriver_path,
capabilities=capabilities)
time.sleep(3)
driver.get("https://stackoverflow.com/")
time.sleep(6)
driver.close()
# solution from:
# https://github.com/SeleniumHQ/selenium/issues/3884
# https://stackoverflow.com/questions/25713824/setting-path-to-firefox-binary-on-windows-with-selenium-webdriver
Related
How to open a URL without webbrowser using Python, but with an address of the application with which I want to open it.
There are three ways to do this
Case 1: You just need to open a page. and that is it,
import webbrowser
webbrowser.open("https://yoursite.com")
The above will open https://yoursite.com on your default browser
Case 2: You need to open, as well as do a task eg refresh
from selenium import webdriver
import time
driver = webdriver.Edge("path to your edge driver")
driver.get("https://yoursite.com")
# The command below will refresh your webbrowser after 3 seconds of opening it
time.sleep(3)
driver.refresh()
If you want to use chrome... use driver = webdriver.Chrome("path to your chrome driver")
Case 3:You have already opened your webbrowser (via selenium) and you are in another program and you want to see it without opening it manually
from selenium import webdriver
import time
driver = webdriver.Edge("path to your edge driver")
driver.get("https://yoursite.com")
time.sleep(10)
# this command stores your webbrowser window in a variable
main_page = driver.current_window_handle
# this command moves the webbrowser to the front
driver.switch_to.window(main_page)
Also you have to install a driver for your browser if you are using the selenium thing...
For edge go to https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/#:~:text=Microsoft%20Edge%20WebDriver%20for%20Microsoft%20Edge%20will%20work,find%20your%20correct%20build%20number%3A%20Launch%20Microsoft%20Edge. and choose your edge version...
And for chrome go to https://chromedriver.chromium.org/downloads
You will also have to do pip install selenium in the terminal...
Hope this helps ;)
Try selenium, although you'll have to first download Chromedriver:
from selenium import webdriver from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox() driver.get("http://stackoverflow.com/")
body = driver.find_element_by_tag_name("body") body.send_keys(Keys.CONTROL + 't')
driver.close()
I am trying to acces whatsapp web with python without having to scan the QR code everytime I restart the program (because in my normal browser I also dont have to do that). But how can I do that? Where is the data stored that tells whatsapp web to connect to my phone? And how do I save this data and send it to the browser when I rerun the code?
I already tried this because someone told me I should save the cookies:
from selenium import webdriver
import time
browser = None
cookies = None
def init():
browser = webdriver.Firefox(executable_path=r"C:/Users/Pascal/Desktop/geckodriver.exe")
browser.get("https://web.whatsapp.com/")
time.sleep(5) # in this time I scanned the QR to see if there are cookies
cookies = browser.get_cookies()
print(len(cookies))
print(cookies)
init()
Unfortunately there were no cookies..
The output was 0 and [].
How do I fix this probblem?
As mentioned in the answer to this question, pass your Chrome profile to the Chromedriver in order to avoid this problem. You can do it like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
options.add_argument("user-data-dir=C:\\Path") #Path to your chrome profile
driver = webdriver.Chrome(executable_path="C:\\Users\\chromedriver.exe", options=options)
This one works for me, I just created a folder, on the home directory of the script and a little modifications and it works perfectly.
###########
E_PROFILE_PATH = "user-data-dir=C:\Users\Denoh\Documents\Project\WhatBOts\SessionSaver"
##################
This is the Config File that I will import later
##################
The main script starts here
##################
from selenium import webdriver
from config import E_PROFILE_PATH
options = webdriver.ChromeOptions()
options.add_argument(E_PROFILE_PATH)
driver = webdriver.Chrome(executable_path='chromedriver_win32_86.0.4240.22\chromedriver.exe', options=options)
driver.get('https://web.whatsapp.com/')
I’m working to make web crawler with python by using selenium
Here, I successfully got contents by using chromedriver, but problem occurred when I tried to make
Headless access crawling through PhantomJS. find_element_by_id, or find_element_by_name did not work
Is there any difference between these? Actually I am trying to make this as headless because I want to run this
Code in ubuntu server as a batch job without GUI support.
My script is like as below.
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import re
#driver = webdriver.PhantomJS('/Users/user/Downloads/phantomjs-2.1.1-macosx/bin/phantomjs')
#driver = webdriver.Chrome('/Users/user/Downloads/chromedriver')
driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get(url)
driver.implicitly_wait(3)
#here I tried two different find_tag things but both didn’t work
user = driver.find_element(by=By.NAME,value="user:email")
password = driver.find_element_by_id('user_password')
I am using Python Selenium to open a Firefox browser and go to a URL. The function I am using to do this is...
def openurl_function():
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://www.example.com')
When I run the function it always opens a new instance of FireFox, is there a way to have it to just open using the same browser instance?
Currently if I run the function 10 times then I get 10 FireFox browsers open.
Just keep reusing the same driver. You are creating a new browser, every time you call
driver = webdriver.Firefox()
Also, because you never quit() on your driver, you will probably have all the browsers stay open as orphans because you deleted the handle to them when you created a new browser.
I am having a strange issue with PhantomJS or may be I am newbie. I am trying to login on NewEgg.com via Selenium by using PhantomJS. I am using Python for it. Issue is, when I use Firefox as a driver it works well but as soon as I set PhantomJS as a driver it does not go to next page hence give message:
Exception Message: u'{"errorMessage":"Unable to find element with id \'UserName\'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"89","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:55372","User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\\"using\\": \\"id\\", \\"sessionId\\": \\"aaff4c40-6aaa-11e4-9cb1-7b8841e74090\\", \\"value\\": \\"UserName\\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/aaff4c40-6aaa-11e4-9cb1-7b8841e74090/element"}}' ; Screenshot: available via screen
The reason I found after taking screenshot that phantom could not navigate the page and script got finished. How do I sort this out? Code Snippet I tried given below:
import requests
from bs4 import BeautifulSoup
from time import sleep
from selenium import webdriver
import datetime
my_username = "user#mail.com"
my_password = "password"
driver = webdriver.PhantomJS('/Setups/phantomjs-1.9.7-macosx/bin/phantomjs')
firefox_profile = webdriver.FirefoxProfile()
#firefox_profile.set_preference('permissions.default.stylesheet', 2)
firefox_profile.set_preference('permissions.default.image', 2)
firefox_profile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', 'false')
#driver = webdriver.Firefox(firefox_profile)
driver.set_window_size(1120, 550)
driver.get('http://newegg.com')
driver.find_element_by_link_text('Log in or Register').click()
driver.save_screenshot('screen.png')
I even put sleep but it is not making any difference.
I experienced this with PhantomJS when the content type of the second page is not correct. A normal browser would just interpret the content dynamically, but Phantom just dies, silently.