I have written a small Python script with Selenium to search Google and open the first link, but whenever I run it, it opens a console and a new Chrome window and runs the script in that window.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pyautogui

def main():
    setup()

# open Chrome and open Google
def setup():
    driver = webdriver.Chrome(r'C:\\python_programs'+
                              '(Starting_out_python)'+
                              '\\chromedriver.exe')
    driver.get('https://www.google.com')
    assert 'Google' in driver.title
    mySearch(driver)

# search keyword
def mySearch(driver):
    search = driver.find_element_by_id("lst-ib")
    search.clear()
    search.send_keys("Beautiful Islam")
    search.send_keys(Keys.RETURN)
    first_link(driver)

# click first link
def first_link(driver):
    link = driver.find_elements_by_class_name("r")
    link1 = link[0]
    link1.click()

main()
How can I open this in the same browser I am using?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

def main():
    setup()

# open Chrome and open Google
def setup():
    driver = webdriver.Chrome()
    driver.get('https://www.google.com')
    assert 'Google' in driver.title
    mySearch(driver)

# search keyword
def mySearch(driver):
    search = driver.find_element_by_id("lst-ib")
    search.clear()
    search.send_keys("test")
    search.send_keys(Keys.RETURN)
    first_link(driver)

# open first link in a new tab
def first_link(driver):
    link = driver.find_elements_by_xpath("//a[@href]")
    # uncomment to see each href of the found links
    # for i in link:
    #     print(i.get_attribute("href"))
    first_link = link[0]
    url = first_link.get_attribute("href")
    driver.execute_script("window.open('about:blank', 'tab2');")
    driver.switch_to.window("tab2")
    driver.get(url)
    # Do something else with this new tab now

main()
A few observations: the first link you get might not be the first link you want. In my case, the first link is the login to the Google account. So you might want to do some more validation on it before you open it, such as checking its href property or checking whether its text matches something.
Another observation is that there are easier ways of crawling Google search results, such as using Google's API directly or a third-party implementation like these: https://pypi.python.org/pypi/google or https://pypi.python.org/pypi/google-search
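The validation step mentioned in the first observation could be sketched roughly like this. It is only an illustration: `pick_first_result` and `SKIPPED_HOST_SUFFIXES` are hypothetical names, and the choice of which hosts to skip is an assumption you would adapt to what your own result page actually returns.

```python
from urllib.parse import urlparse

# Assumption: links that stay on Google's own hosts (login, settings, ...)
# are not the search results we are after.
SKIPPED_HOST_SUFFIXES = ("google.com",)


def pick_first_result(hrefs, skipped=SKIPPED_HOST_SUFFIXES):
    """Return the first http(s) link whose host is not in `skipped`, or None."""
    for href in hrefs:
        if not href:
            continue  # get_attribute("href") can return None
        parts = urlparse(href)
        if parts.scheme not in ("http", "https"):
            continue  # skips javascript: links and similar
        host = parts.hostname or ""
        if any(host == s or host.endswith("." + s) for s in skipped):
            continue
        return href
    return None
```

With the code above it would be called as `url = pick_first_result(a.get_attribute("href") for a in link)`, and the new tab is only opened when the result is not None.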
To my knowledge, there's no way to attach Selenium to an already-running browser.
More to the point, why do you want to do that? The only thing I can think of is if you're trying to set up something with the browser manually, and then having Selenium do things to it from that manually-set-up state. If you want your tests to run as consistently as possible, you shouldn't be relying on a human setting up the browser in a particular way; the script should do this itself.
I was going to use Selenium to crawl the web
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome('./chromedriver', options=options)
driver.get('https://steamdb.info/tag/1742/?all')
driver.implicitly_wait(3)
li = []
games = driver.find_elements_by_xpath('//*[@class="table-products.text-center.dataTable"]')
for i in games:
    time.sleep(5)
    li.append(i.get_attribute("href"))
print(li)
After accessing the Steam URL I was looking for, I tried to find something called an appid.
The picture below is the HTML I'm looking for.
I'm trying to find the number next to "data-appid=".
But when I run my code, nothing is saved in "games".
Correct me if I'm wrong, but from what I can see this Steam page requires you to log in. Are you sure the same data is available to you when webdriver opens the page?
Additionally, when using By, the strategy and the selector are passed as two separate arguments. A dotted class list such as .table-products.text-center.dataTable is CSS selector syntax, not XPath, so the call would be games = driver.find_elements(By.CSS_SELECTOR, '.table-products.text-center.dataTable')
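For the value next to "data-appid=", reading the attribute itself is probably closer to what is wanted than reading "href". A minimal sketch, assuming the attribute sits directly on the elements of interest (the screenshot is not available, so the default selector here is a guess) and with `collect_app_ids` as a hypothetical helper name:

```python
def collect_app_ids(driver, selector="[data-appid]"):
    """Return the data-appid value of every element matching `selector`."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR constant,
    # so this is the same call as driver.find_elements(By.CSS_SELECTOR, selector).
    elements = driver.find_elements("css selector", selector)
    return [element.get_attribute("data-appid") for element in elements]
```

Called as `li = collect_app_ids(driver)` after `driver.get(...)`. If the list comes back empty, that would point back to the login question above rather than to the selector.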
I'm new to selenium and I wrote this code that gets user input and searches in ebay but I want to save the new link of the search so I can pass it on to BeautifulSoup.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
search_ = input()
browser = webdriver.Chrome(r'C:\Users\Leila\Downloads\chromedriver_win32')
browser.get("https://www.ebay.com.au/sch/i.html?_from=R40&_trksid=p2499334.m570.l1311.R1.TR12.TRC2.A0.H0.Xphones.TRS0&_nkw=phones&_sacat=0")
Search = browser.find_element_by_id('kw')
Search.send_keys(search_)
Search.send_keys(Keys.ENTER)
#how do you write a code that gets the link of the new page it loads
To extract a link from a webpage, you need to make use of the HREF attribute and use the get_attribute() method.
This example from here illustrates how it would work.
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(options=options)
driver.get('https://www.w3.org/')
for a in driver.find_elements_by_xpath('.//a'):
    print(a.get_attribute('href'))
In your case, the search box is an input element and has no href of its own, so after submitting the search read the address of the page that loaded from the browser:
page_link = browser.current_url
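If the only goal is a results URL to hand to BeautifulSoup, it may be possible to build that URL directly instead of typing into the search box. This is a sketch under an assumption: that the `_nkw` query parameter carries the keyword, which is inferred from `_nkw=phones` in the URL in the question and not checked against eBay. `build_search_url` and `EBAY_SEARCH_BASE` are hypothetical names.

```python
from urllib.parse import urlencode

# Host and path copied from the URL in the question.
EBAY_SEARCH_BASE = "https://www.ebay.com.au/sch/i.html"


def build_search_url(keyword, category="0"):
    """Build a search URL from a keyword.

    `_nkw` and `_sacat` are the parameter names that appear in the
    question's URL; their meaning is assumed, not documented here.
    """
    query = urlencode({"_nkw": keyword, "_sacat": category})
    return EBAY_SEARCH_BASE + "?" + query
```

The result of `build_search_url(search_)` could then be passed to `browser.get(...)`, and `browser.current_url` still gives the final address if the site redirects.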
I would like to scrape job listings from a Dutch job listings website. However, when I try to open the page with selenium I run into a cookiewall (new GDPR rules). How do I bypass the cookiewall?
from selenium import webdriver

# launch url
url = "https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO"
# create a new Firefox session
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)
Edit: I tried something:
from selenium import webdriver
import pickle

url = "https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO"
driver = webdriver.Firefox()
driver.set_page_load_timeout(20)
driver.get(url)
pickle.dump(driver.get_cookies(), open("NVBCookies.pkl", "wb"))
After that, loading the cookies did not work:
for cookie in pickle.load(open("NVBCookies.pkl", "rb")):
    driver.add_cookie(cookie)
InvalidCookieDomainException: Message: Cookies may only be set for the current domain (cookiewall.vnumediaonline.nl)
It looks like I don't get the cookies from the cookiewall, correct?
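That exception is raised when a cookie's domain does not match the page the driver is currently on, so one option is to filter the saved cookies before adding them. A minimal sketch, assuming each saved cookie is a dict with a "domain" key, as `driver.get_cookies()` normally returns; `cookies_for_host` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def cookies_for_host(cookies, current_url):
    """Keep only the cookies whose domain matches the host of `current_url`."""
    host = urlparse(current_url).hostname or ""
    kept = []
    for cookie in cookies:
        # Cookie domains are often stored with a leading dot, e.g. ".example.org".
        domain = cookie.get("domain", "").lstrip(".")
        if domain and (host == domain or host.endswith("." + domain)):
            kept.append(cookie)
    return kept
```

It would be used as `for cookie in cookies_for_host(saved, driver.current_url): driver.add_cookie(cookie)`, where `saved` is the unpickled list. If nothing is kept while the driver sits on the cookiewall page, the saved cookies belong to a different domain than the one being shown.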
Instead of bypassing it, why don't you write code that checks whether the cookiewall is present, accepts it if so, and otherwise continues with the next operation? Please see the code below for more details:
import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

class PythonOrgSearch(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Chrome(executable_path="C:\\Users\\USER\\Downloads\\New folder (2)\\chromedriver_win32\\chromedriver.exe")

    def test_search_in_python_org(self):
        driver = self.driver
        driver.get("https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO")
        # click the accept button on the cookiewall
        elem = driver.find_element_by_xpath("//div[@class='article__button']//button[@id='form_save']")
        elem.click()

    def tearDown(self):
        self.driver.close()

if __name__ == "__main__":
    unittest.main()
driver.find_element_by_xpath('//*[@id="form_save"]').click()
OK, I made Selenium click the accept button. That's fine by me too. Not sure if I'll run into cookiewalls later.
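For later pages where the cookiewall may or may not appear, the "check if it's present, otherwise continue" idea could look roughly like this. It is a sketch: `accept_cookiewall` is a hypothetical helper name, and the default id "form_save" is taken from the XPath used above.

```python
def accept_cookiewall(driver, button_id="form_save"):
    """Click the consent button if it is on the page.

    Returns True if a button was found and clicked, False otherwise.
    """
    # find_elements returns an empty list instead of raising when nothing matches.
    # "id" is the string value of selenium's By.ID constant.
    buttons = driver.find_elements("id", button_id)
    if not buttons:
        return False
    buttons[0].click()
    return True
```

Calling `accept_cookiewall(driver)` right after each `driver.get(...)` keeps the script running whether or not the wall is shown.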
I want to iteratively search for 30+ items through a search button in webpage and scrape the related data.
My search items are stored in a list: vol_list
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome("driver path")
driver.get("web url")
for item in vol_list:
    # use one name for the search box throughout the loop
    search_box = driver.find_element_by_name("search_str")
    search_box.clear()
    search_box.send_keys(item)
    search_box.send_keys(Keys.RETURN)
After search is complete I will proceed to scrape the data for each item and store in array/list.
Is it possible to repeat this process without opening browser for every item in the loop?
You can't drive Chrome or other browsers without opening them.
In your case, a headless browser should do the job. A headless browser behaves like a regular browser but has no GUI.
Try GhostDriver, HtmlUnitDriver, or NodeJS. You will then have to modify at least this line to use the driver you chose:
driver = webdriver.Chrome("driver path")
Good luck!
If you're using Firefox, you can apply the headless option:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)
driver.get('your url')
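As for the loop itself, a single driver instance can serve every search term, so the browser is started once and reused. A minimal sketch, assuming the search box sits inside a form (so that `submit()` works) and with `search_each` and the `scrape` callback as hypothetical names:

```python
def search_each(driver, items, scrape, field_name="search_str"):
    """Run one search per item on an already-open driver and collect the results."""
    results = {}
    for item in items:
        # Look the box up again on every pass: the previous element
        # goes stale once the page has reloaded.
        # "name" is the string value of selenium's By.NAME constant.
        box = driver.find_element("name", field_name)
        box.clear()
        box.send_keys(item)
        box.submit()  # submits the form that contains the box
        results[item] = scrape(driver)
    return results
```

Here `scrape` is whatever function pulls the data out of the results page, for example `lambda d: d.page_source`, and the call would be `data = search_each(driver, vol_list, scrape)`.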
I just want to refresh an already opened web page with Selenium.
It always opens a new browser window.
What am I doing wrong?
from selenium import webdriver
import urllib
import urllib2
driver = webdriver.Firefox()
driver.refresh()
I would suggest binding the driver element search to the tag body and use the refresh command of the browser.
In OSX for example
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 'r')
Documentation on keys here: http://selenium-python.readthedocs.org/en/latest/api.html
Update:
The following code, very similar to yours, works fine for me.
driver = webdriver.Firefox()
driver.get(response.url) #tested in combination with scrapy
time.sleep(3)
driver.refresh()
Are you sure you correctly load the web page with the driver before refreshing it?
The problem is you are opening the webdriver and then trying to refresh when you have not specified a URL.
All you need to do is get your desired URL before refreshing:
from selenium import webdriver
import urllib
import urllib2
driver = webdriver.Firefox()
driver.get("Your desired URL goes here...")
#now you can refresh the page!
driver.refresh()
The following code works for me:
from time import sleep

driver.get(driver.current_url)
sleep(2)
driver.refresh()
I use Python 3.7.6 and Selenium 3.141.0.
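You are trying to refresh the page before it loads, so you can use a sleep function:
from time import sleep
sleep(1)
or you can wait for an element located by XPath to be present:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "xpath goes here")))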
This helped for me:
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get("URL")
time.sleep(5)
driver.refresh()
I got mine fixed by adding "browser.refresh()" inside the for loop or while loop.
You can try any one of the methods below (shown with the Java bindings).
Method 1:
driver.findElement(By.name("s")).sendKeys(Keys.F5);
Method 2:
driver.get(driver.getCurrentUrl());
Method 3:
driver.navigate().to(driver.getCurrentUrl());
Method 4:
driver.findElement(By.name("s")).sendKeys("\uE035");
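Since the question uses Python and the four methods above are written in Java, here is a rough Python counterpart. It is a sketch only: `reload_page` is a hypothetical helper, and the key press is sent to the body element rather than to an element named "s", which is an assumption about the page.

```python
F5_KEY = "\ue035"  # same code point as the "\uE035" string in Method 4 (selenium's Keys.F5)


def reload_page(driver, method="refresh"):
    """Reload the current page using one of three approaches."""
    if method == "refresh":
        driver.refresh()
    elif method == "get":
        # Navigate to the address the driver is already on.
        driver.get(driver.current_url)
    elif method == "f5":
        # "tag name" is the string value of selenium's By.TAG_NAME constant.
        driver.find_element("tag name", "body").send_keys(F5_KEY)
    else:
        raise ValueError("unknown method: " + method)
```

In every case the driver must already have loaded a page with `driver.get(...)`; on a freshly started browser there is nothing to reload.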