I have to get the longitude and latitude of all "كشري" restaurants in "Cairo", "Egypt". The problem is that the Google Maps API requires a key, which is not free. So, I decided to go with Selenium.
I tried using this code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
# create a new Edge session
browser = webdriver.Edge()
browser.maximize_window()
# navigate to Google Maps website
browser.get('https://www.google.com/maps')
# Search for 'كشري'
search_box = browser.find_element(By.NAME,'q') # find search box element by name attribute
search_box.send_keys('كشري') # enter 'كشري' in the search box
search_box.send_keys(Keys.ENTER) # hit enter key
time.sleep(5)
# iterate through each restaurant and get its longitude and latitude
restaurants = browser.find_elements(By.XPATH, "//div[@class='section-result-content']")
for result in restaurants:
    name = result.find_element(By.XPATH, '//div[@class="section-result-title"]/span').text
    lon = result.find_element(By.XPATH, '//div[@class="section-result-location"]/span[@class="section-result-location-longitude"]').text
    lat = result.find_element(By.XPATH, '//div[@class="section-result-location"]/span[@class="section-result-location-latitude"]').text
    print(name + ': ' + lon + ' ' + lat)
At first the result went as expected: it searched for the restaurants and hit Enter so the list would appear. But when it was time for the iteration, it gave back a big fat nothing!
There were no errors. Just nothing.
Your help would be much appreciated.
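For context on why it fails silently: find_elements returns an empty list when nothing matches, so the loop body simply never runs, and the section-result-* classes come from an older Maps layout. A rough, non-authoritative sketch of an alternative: it assumes current result cards link to place URLs whose href embeds !3d<lat>!4d<lon>, which you should confirm by inspecting a result link in your browser first.

import re

# assumption: result anchors point at /maps/place/ URLs containing !3d<lat>!4d<lon>
links = browser.find_elements(By.XPATH, "//a[contains(@href, '/maps/place/')]")
for link in links:
    match = re.search(r'!3d(-?\d+\.\d+)!4d(-?\d+\.\d+)', link.get_attribute('href'))
    if match:
        # aria-label usually carries the place name on these anchors (assumption)
        print(link.get_attribute('aria-label'), match.group(1), match.group(2))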
Related
This code goes to a website, launches it, extracts two web elements (email and ticket #), and prints them successfully.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support import expected_conditions as EC
#Firefox Driver
driver = webdriver.Firefox(service=Service(r'C:\geckodriver-v0.32.0-win-aarch64\geckodriver.exe'))
#Launches Ticketing website
driver.get('WebsiteURl')
wait = WebDriverWait(driver, 10)
#Switches to iFrame
iframe = driver.find_element(By.XPATH,'//*[@id="gsft_main"]')
driver.switch_to.frame(iframe)
#Calls for value in row (Email and Ticket #)
Email = driver.find_element(By.XPATH,"//table/tbody/tr[1]/td[8]")
Ticket = driver.find_element(By.XPATH,"//table/tbody/tr[1]/td[3]")
print(Ticket.text + " : " + Email.text)
This is the output:
TicketNumber001 : useremail@domain.com
The output works just as intended, but now I am looking to do this for the column below, utilizing the next 10 consecutive XPaths:
tr[1]/td[3]
...
tr[10]/td[3]
Which should look like this, and which I should be able to export into a CSV to interact with a PowerShell script I have:
TicketNumber001 useremail@domain.com
...
TicketNumber010 useremail10@domain.com
I would appreciate your input; I'm a total newb with Python and this is my first time using Selenium.
Thank you,
If there is no other extra data in your table, you can use find_elements to fetch just the table data directly.
Tickets = driver.find_elements(By.XPATH,"//table//td[3]")
This gives you an array of elements that you can manipulate. You can iterate through the array to get the texts:
for ticket in Tickets:
    print("Ticket:" + ticket.text)
Assuming your email and ticket counts are always equal:
Emails = driver.find_elements(By.XPATH,"//table//td[8]")
Tickets = driver.find_elements(By.XPATH,"//table//td[3]")
# range(len(Tickets)) covers every row; range(0, len(Tickets)-1) would skip the last one
for i in range(len(Tickets)):
    print(Tickets[i].text + " : " + Emails[i].text)
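Once the two lists line up, writing them out for the PowerShell script takes only a few lines with the standard csv module. A minimal sketch; the output path and header names are illustrative:

import csv

with open('tickets.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Ticket', 'Email'])  # header row for the PowerShell side
    for ticket, email in zip(Tickets, Emails):
        writer.writerow([ticket.text, email.text])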
I'm working on a project to make reservations, and I'm very rusty. I am able to navigate dynamically to the page for a reservation two weeks out, but I am unable to locate and click on the time slots.
My final line throws an error, but my ultimate goal is to develop a code block that will iterate through the available times with some ranking system. For example, I set a ranked order of 8pm, 7:45pm, 7:30pm, 7:15pm, 8:15pm, etc. These time slots go fast, so I'll have to handle the possibility of the reservation being gone, or even taken while I'm completing the checkout process.
I know this is a lot, so any help or guidance is appreciated!
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import datetime
ResDate = datetime.date.fromordinal(datetime.date.today().toordinal()+14).strftime("%Y-%m-%d")
print(ResDate)
URL = "https://resy.com/cities/ny/lartusi-ny?date={}&seats=2".format(ResDate)
timeout = 30
driver = webdriver.Chrome()
driver.get(URL)
TimeBlock = WebDriverWait(driver, timeout).until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, '10:00')))
TimeBlock.click()
wait = WebDriverWait(driver, 3)
ranking_list = ['8:00PM', '7:45PM', '10:00PM']
for rank in ranking_list:
    try:
        wait.until(EC.element_to_be_clickable((By.XPATH, f"//div[@class='ReservationButton__time' and text()='{rank}']"))).click()
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@aria-label='Book with Resy']")))
        wait.until(EC.element_to_be_clickable((By.XPATH, "//button[./span[.='Reserve Now']]"))).click()
        break
    except:
        print('No availability: ', rank)
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Basically, access each element of your ranking_list and then try to click the reservation with that text. The break lets you exit the loop once a click succeeds (it's optional).
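For the asker's worry about a slot disappearing mid-checkout, the same try/except pattern extends to a retry loop. A rough sketch building on the code above, under the assumption that the selectors stay the same and that a failed attempt may leave you inside the booking iframe:

from selenium.common.exceptions import TimeoutException

booked = False
while not booked:
    for rank in ranking_list:
        try:
            # leave the booking iframe if a previous attempt switched into it
            driver.switch_to.default_content()
            wait.until(EC.element_to_be_clickable((By.XPATH, f"//div[@class='ReservationButton__time' and text()='{rank}']"))).click()
            wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@aria-label='Book with Resy']")))
            wait.until(EC.element_to_be_clickable((By.XPATH, "//button[./span[.='Reserve Now']]"))).click()
            booked = True
            break
        except TimeoutException:
            print('No availability:', rank)
    if not booked:
        driver.refresh()  # nothing in the ranked list was open; reload and retry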
I didn't fully get your question about the ranking system, but for the button-clicking issue, try this code:
time_to_book = "10:15PM"
time_in_a_day = driver.find_elements(By.XPATH, "//*[@class='ReservationButton__time']")
# print(len(time_in_a_day))
time_text = []
for i in range(len(time_in_a_day)):
    time_text.append(time_in_a_day[i].text)
for i in range(len(time_text)):
    if time_text[i] == time_to_book:
        element = driver.find_element(By.XPATH, "(//*[@class='ReservationButton__time'])[" + str(i + 1) + "]//parent::button")
        driver.execute_script("arguments[0].click();", element)
        break
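If you want to combine this text-matching approach with the ranking idea from the other answer, here is a minimal sketch, under the same assumptions about the ReservationButton__time markup:

ranking_list = ['8:00PM', '7:45PM', '7:30PM']
buttons = driver.find_elements(By.XPATH, "//*[@class='ReservationButton__time']")
# map the visible time text to its element for quick lookup
times = {b.text: b for b in buttons}
for rank in ranking_list:
    if rank in times:
        # ".." climbs to the parent <button>, like the parent::button XPath above
        element = times[rank].find_element(By.XPATH, "..")
        driver.execute_script("arguments[0].click();", element)
        break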
I'm trying to web-scrape this page: https://mlapshin.com/index.php/scrum-quizzes/sm-learning-mode/
I want to scrape the questions and answers.
However, I'm having trouble clicking on the next button so I can scrape all the information. I've tried doing this:
driver = webdriver.Chrome('C:/Users/Ihnhn/Documents/WebScrap/Selenium/chromedriver.exe')
driver.get("https://mlapshin.com/index.php/scrum-quizzes/sm-learning-mode/")
driver.maximize_window()
start_quizz = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='startQuiz']"))).click()
driver.execute_script("window.scrollTo(0,400);")
all_questions = driver.find_elements_by_class_name("wpProQuiz_listItem")
for i in all_questions:
    nom_question = i.find_element_by_class_name("wpProQuiz_question_text").text
    print(nom_question)
    check_answer = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='check']"))).click()
    next_answer = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@name='next']"))).click()
So I wanted to try first to get just the names of the questions, but it gives me a TimeoutException.
It only scrapes the first 2 questions and that's all; on the second question it just doesn't click the "check" button so that I can continue.
(There are 87 questions, so I imagined that with this code it would get all 87.)
I am a beginner in web scraping, so I'm a little lost... if anyone could help me.
Thanks
You use the absolute XPath "//input[@name='check']", so it always searches for the first Check input on the page. But if you check the HTML in the browser, you will see that every question has its own Check input (and Next input); when the second question is displayed, your XPath still waits for the Check input in the first question, but that input is hidden and can never become clickable.
You should use a relative XPath (with a dot), ".//input[@name='check']", and you should use i instead of driver when you use the relative XPath.
all_questions = driver.find_elements_by_class_name("wpProQuiz_listItem")
for i in all_questions:
    # relative to `i`
    nom_question = i.find_element_by_class_name("wpProQuiz_question_text").text
    print(nom_question)
    # relative to `i`
    check_answer = WebDriverWait(i, 20).until(EC.element_to_be_clickable((By.XPATH, ".//input[@name='check']"))).click()
    # relative to `i`
    next_answer = WebDriverWait(i, 20).until(EC.element_to_be_clickable((By.XPATH, ".//input[@name='next']"))).click()
Full working code which I used to test it:
from selenium import webdriver
#from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, TimeoutException
#from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.firefox import GeckoDriverManager
#import time

#driver = webdriver.Chrome(executable_path=ChromeDriverManager().install())
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())

driver.get("https://mlapshin.com/index.php/scrum-quizzes/sm-learning-mode/")
driver.maximize_window()

start_quizz = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='startQuiz']"))).click()
driver.execute_script("window.scrollTo(0,400);")

all_questions = driver.find_elements_by_class_name("wpProQuiz_listItem")
for item in all_questions:
    # relative to `item`
    nom_question = item.find_element_by_class_name("wpProQuiz_question_text").text
    print(nom_question)
    # relative to `item`
    check_answer = WebDriverWait(item, 20).until(EC.element_to_be_clickable((By.XPATH, ".//input[@name='check']"))).click()
    #time.sleep(0.5)
    # relative to `item` (note the leading dot here too)
    next_answer = WebDriverWait(item, 20).until(EC.element_to_be_clickable((By.XPATH, ".//input[@name='next']"))).click()
    #time.sleep(0.5)
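Since the original goal was the questions and the answers, the loop extends naturally. In this sketch, "wpProQuiz_questionListItem" is my assumption for the answer-option class, so verify it in the browser's inspector first:

for item in all_questions:
    nom_question = item.find_element_by_class_name("wpProQuiz_question_text").text
    print(nom_question)
    # assumed class name for the answer options; confirm in the browser
    options = item.find_elements_by_class_name("wpProQuiz_questionListItem")
    for option in options:
        print(' -', option.text)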
I am working on a script to gather information off Newegg to look at changes over time in graphics card prices. Currently, my script opens a Newegg search for RTX 3080s through Chromedriver and then clicks on the link for Desktop Graphics Cards to narrow down the search. The part I am struggling with is developing a for-loop over a range that lets me iterate through all 8 search result pages. I know that I could do this by simply changing the page number in the URL, but as this is an exercise I'm using to learn relative XPath better, I want to do it using the pagination buttons at the bottom of the page. I know that each button should contain inner text of "1, 2, 3, 4, etc.", but whenever I use text() = {item} in my for loop, it doesn't click the button. The script runs and doesn't return any exceptions, but it doesn't do what I want it to. Below I have attached the HTML for the page as well as my current script. Any suggestions or hints are appreciated.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
import time
options = Options()
PATH = 'C://Program Files (x86)//chromedriver.exe'
driver = webdriver.Chrome(PATH)
url = 'https://www.newegg.com/p/pl?d=RTX+3080'
driver.maximize_window()
driver.get(url)
card_path = '/html/body/div[8]/div[3]/section/div/div/div[1]/div/dl[1]/dd/ul[2]/li/a'
desktop_graphics_cards = driver.find_element(By.XPATH, card_path)
desktop_graphics_cards.click()
time.sleep(5)
graphics_card = []
shipping_cost = []
price = []
total_cost = []
for item in range(9):
    try:
        #next_page_click = driver.find_element(By.XPATH("//button[text() = '{item + 1}']"))
        print(next_page_click)
        next_page_click.click()
    except:
        pass
The pagination buttons are outside the initially visible area, so in order to click them you will have to scroll the page until each button appears.
Also, you need to click the next-page buttons starting from 2 up to 9 (inclusive), while you are trying to do this with numbers from 1 up to 9.
I think this should work better:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
import time

options = Options()
PATH = 'C://Program Files (x86)//chromedriver.exe'
driver = webdriver.Chrome(PATH)
url = 'https://www.newegg.com/p/pl?d=RTX+3080'
actions = ActionChains(driver)
driver.maximize_window()
driver.get(url)
card_path = '/html/body/div[8]/div[3]/section/div/div/div[1]/div/dl[1]/dd/ul[2]/li/a'
desktop_graphics_cards = driver.find_element(By.XPATH, card_path)
desktop_graphics_cards.click()
time.sleep(5)
graphics_card = []
shipping_cost = []
price = []
total_cost = []
for item in range(2, 10):
    try:
        # By.XPATH is a string, not a callable, so it goes in as a separate argument
        next_page_click = driver.find_element(By.XPATH, f"//button[text() = '{item}']")
        actions.move_to_element(next_page_click).perform()
        time.sleep(2)
        #print(next_page_click) - printing a web element itself will not give you usable information
        next_page_click.click()
        #let the next page load, it takes some time
        time.sleep(5)
    except:
        pass
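As an aside, the scroll can also be done without ActionChains. A small sketch using execute_script with the same button XPath:

next_page_click = driver.find_element(By.XPATH, f"//button[text() = '{item}']")
# scrollIntoView brings the button into the viewport before the click
driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", next_page_click)
next_page_click.click()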
I am trying to scrape a table found inside a div on a page.
Basically here's my attempt so far:
# NOTE: Download the chromedriver driver
# Then move exe file on C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys
driver = webdriver.Chrome()
driver.implicitly_wait(10)
URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location
driver.get(URL)
table = driver.find_element_by_xpath('//div[@class="line-chart"]/table/tbody')
print table.text
If I run the script with an argument like "stackoverflow", I should be able to scrape this site: https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow
Apparently the XPath I have there is not working; the program is not printing anything, it's just plain blank.
I basically need the values of the chart that appears on that website. Those values (and dates) are inside a table; here is a screenshot:
Could you help me locate the correct xpath of the table to retrieve those values using selenium on python?
Thanks in advance!
You can use the following XPath:
//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr
Here I will refine my answer and make some changes to your code; now it works.
# NOTE: Download the chromedriver driver
# Then move exe file on C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys
driver = webdriver.Chrome()
driver.implicitly_wait(20)
'''
URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location
'''
driver.get("https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow")
table_trs = driver.find_elements_by_xpath('//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr')
for tr in table_trs:
    #print tr.get_attribute("innerHTML").encode("UTF-8")
    td = tr.find_elements_by_xpath(".//td")
    if len(td) == 2:
        print td[0].get_attribute("innerHTML").encode("UTF-8") + "\t" + td[1].get_attribute("innerHTML").encode("UTF-8")
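One more hedged suggestion: since the chart renders asynchronously, an explicit wait on the rows tends to be more reliable than a long implicit wait. A minimal sketch with the same XPath (WebDriverWait, EC, and By are already imported above):

wait = WebDriverWait(driver, 20)
# block until at least one chart row is present before reading the table
table_trs = wait.until(EC.presence_of_all_elements_located(
    (By.XPATH, '//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr')))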