I am absolutely stuck on this one. I am scraping restaurant URLs from a webpage, and there is a button at the bottom to reveal more restaurants. The button's code on the website is below (I believe):
<div id="restsPages">
<a class="next" data-url="https://hungryhouse.co.uk/takeaways/aberdeen-bridge-of-dee-ab10">Show more</a>
<a class="back">Back to top</a>
</div>
It is the "Show more" button i am trying to activate. The url within the "data-url" does not reveal more of the page.
It all seems a bit odd on what do do to activate the button from within the python spider?
The code i am trying to use to make this work is:
import scrapy
from hungryhouse.items import HungryhouseItem
from selenium import webdriver

class HungryhouseSpider(scrapy.Spider):
    name = "hungryhouse"
    allowed_domains = ["hungryhouse.co.uk"]
    start_urls = ["https://hungryhouse.co.uk/takeaways/westhill-ab10",
                  ]

    def __init__(self):
        self.driver = webdriver.Chrome()

    def parse(self, response):
        self.driver.get(response.url)
        while True:
            try:
                # the XPath attribute prefix is @, not #; the lookup sits inside
                # the try so a missing button breaks the loop instead of crashing
                next = self.driver.find_element_by_xpath('//*[@id="restsPages"]/a[@class="next"]')
                next.click()
            except:
                break
        self.driver.close()
.... rest of the code follows
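For what it's worth, the click loop tends to be more reliable with an explicit wait than with a blind click; a minimal sketch, assuming the same XPath as above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

while True:
    try:
        next_btn = WebDriverWait(self.driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, '//*[@id="restsPages"]/a[@class="next"]'))
        )
        next_btn.click()
    except TimeoutException:
        break  # the "Show more" button is gone: everything is revealed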
The error I get is: 'chromedriver' executable needs to be in PATH.
This was resolved at "Pressing a button within python code", with reference to the answer at "Error message: 'chromedriver' executable needs to be available in the path".
But specifically
self.driver = webdriver.Chrome()
needed to change to
self.driver = webdriver.Chrome("C:/Users/andrew/Downloads/chromedriver_win32/chromedriver.exe")
in my case.
i.e. I needed to add the path to chromedriver.exe.
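As a side note, on Selenium 4 and newer the executable_path argument is deprecated, and the driver path is passed through a Service object instead; a minimal sketch with the same path:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service("C:/Users/andrew/Downloads/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(service=service)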
I am trying to automate a program that logs in to a Google account I created, goes to this canvas website, and draws a circle. (The circle is just a placeholder; I'm trying to make it draw some cool stuff like a car, so I just want to test whether it works first.) The main issue is that when you first go to the website, a pop-up displays with two options, "Learn more" and "Get started". I tried to make Selenium click on "Get started" using driver.find_element_by_id('get-started').click(), but it does not seem to work. I also tried to use a CSS selector, but that does not seem to work either. So I'm stuck on this. Any advice or help on clicking the "Get started" button? (Feel free to also give me advice on how to draw on the canvas as well!)
Here's the HTML:
<paper-button id="get-started" dialog-confirm="" class="primary" aria-label="Get started" role="button" tabindex="0" animated="" elevation="0" aria-disabled="false">
Get started
</paper-button>
Here's the code:
from selenium import webdriver
import time
from PrivateInfo import *

driver = webdriver.Chrome(r"D:\chromeDriver\chromedriver.exe")

link = "https://www.google.com/"
driver.get(link)

def linkText(element):
    button = driver.find_element_by_link_text(element)
    button.click()

def byClass(className):
    button2 = driver.find_element_by_class_name(className)
    button2.click()

def type(text, elements):
    input = driver.find_elements_by_name(elements)
    for element in input:
        # note: implicitly_wait sets a session-wide timeout, not a one-off pause
        driver.implicitly_wait(2)
        element.send_keys(text)

linkText("Sign in")
type(email, "identifier")
byClass("VfPpkd-vQzf8d")
type(pw, "password")
driver.find_element_by_id("passwordNext").click()
time.sleep(1)
driver.get("https://canvas.apps.chrome/")
driver.implicitly_wait(3)
driver.find_element_by_css_selector('paper-button[id="get-started"]').click()
Edit: I also tried this:
getStart = driver.find_elements_by_css_selector('paper-button[id="get-started"]')
for start in getStart:
    start.click()
It doesn't give me any errors, but it does not do anything.
Ah yeah, forgot to mention, but I'm new to using Selenium.
The content of the popup is nested inside shadowRoots.
More on ShadowRoot here:
The ShadowRoot interface of the Shadow DOM API is the root node of a
DOM subtree that is rendered separately from a document's main DOM
tree.
To be able to control the element, you will need to switch between DOMs. Here's how:
drawing_app_el = driver.execute_script("return arguments[0].shadowRoot", driver.find_element(By.CSS_SELECTOR, 'drawing-app'))
This code will retrieve the drawing_app first and then return the content of the shadowRoot.
To have access to the button getStarted, here's how:
# Don't forget the By import:
from selenium.webdriver.common.by import By

# Get the shadowRoot of drawing-app
drawing_app_el = driver.execute_script("return arguments[0].shadowRoot", driver.find_element(By.CSS_SELECTOR, 'drawing-app'))
# Get the shadowRoot of welcome-dialog within drawing_app
welcome_dialog_el = driver.execute_script("return arguments[0].shadowRoot", drawing_app_el.find_element(By.CSS_SELECTOR, 'welcome-dialog'))
# Get the paper-dialog from welcome_dialog
paper_dialog_el = welcome_dialog_el.find_element(By.CSS_SELECTOR, 'paper-dialog')
# Finally, retrieve the getStarted button and click it
get_started_button = paper_dialog_el.find_element(By.CSS_SELECTOR, '#get-started')
get_started_button.click()
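As an alternative, recent Selenium versions (4.1+, with a Chromium-based browser) expose a shadow_root property directly on the element, which avoids the execute_script calls; a minimal sketch of the same traversal, under that assumption:

from selenium.webdriver.common.by import By

drawing_app = driver.find_element(By.CSS_SELECTOR, 'drawing-app').shadow_root
welcome_dialog = drawing_app.find_element(By.CSS_SELECTOR, 'welcome-dialog').shadow_root
# ShadowRoot lookups only support CSS selectors
welcome_dialog.find_element(By.CSS_SELECTOR, 'paper-dialog #get-started').click()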
I'm trying to automate a login on a website. I have the following code:
def __init__(self):
    self.driver = webdriver.Chrome(executable_path='/usr/bin/chromedriver')

def parse(self, response):
    self.driver.get(response.url)
    self.driver.switch_to.frame(self.driver.find_element_by_id('J_loginIframe'))
    self.driver.find_element_by_name('fm-login-id').send_keys('iamgooglepenn')
    self.driver.find_element_by_id('fm-login-password').send_keys('mypassword')
    # find_element_by_class_name does not accept compound class names, so use a CSS selector
    self.driver.find_element_by_css_selector('.fm-button.fm-submit.password-login').click()
Right now, this code successfully puts in the login information and clicks the login button; however, the site asks my spider to slide a bar to the right before logging me in.
The HTML of the slider is as follows:
<span id="nc_1_n1z" class="nc_iconfont btn_slide" data-spm-anchor-id="0.0.0.i3.6a38teDwteDwKs" style="left: -2px;"> ▫ </span>
Is there a way to automate this with Python?
I think this is because you've visited the website multiple times on that device, so if you log in from a new device, the slider will most probably turn up. To get the slider to appear, delete your caches and cookies and try again (or use Incognito mode). When you see the slider during the test, right-click and inspect it, so that the next time you run the script you can click the slider using an XPath.
Hope this works!!
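If you still need to drive the slider itself, ActionChains can press, drag, and release the handle; a rough sketch, assuming the span id from the question and a guessed drag distance (these sliders are usually bot checks, so a scripted drag may still be rejected):

from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

slider = self.driver.find_element(By.ID, 'nc_1_n1z')
# 300 px is a guess; the real distance depends on the track width
ActionChains(self.driver).click_and_hold(slider).move_by_offset(300, 0).release().perform()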
I would like to scrape an arbitrary offer from AliExpress. I'm trying to use Scrapy and Selenium. The issue I face is that when I use Chrome and do right click > inspect on an element, I see the real HTML, but when I do right click > view source, I see something different: a mess of HTML, CSS, and JS all around.
As far as I understand, the content is pulled asynchronously. I guess this is the reason why I can't find the element I am looking for on the page.
I was trying to use Selenium to load the page first and then get the content I want, but it failed. I'm trying to scroll down to get to the reviews section and get its content.
Is this some advanced anti-bot solution that they have, or is my approach wrong?
The code that I currently have:
import scrapy
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import logging
import time

logging.getLogger('scrapy').setLevel(logging.WARNING)

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://pl.aliexpress.com/item/32998115046.html']

    def __init__(self):
        self.driver = webdriver.Chrome()

    def parse(self, response):
        self.driver.get(response.url)
        scroll_retries = 20
        data = ''
        while scroll_retries > 0:
            try:
                data = self.driver.find_element_by_class_name('feedback-list-wrap')
                scroll_retries = 0
            except NoSuchElementException:
                self.scroll_down(500)
                scroll_retries -= 1
        print("----------")
        print(data)
        print("----------")
        self.driver.close()

    def scroll_down(self, pixels):
        # note: window.scrollTo is absolute, so repeated calls with the same value
        # do not scroll any further; window.scrollBy would scroll incrementally
        self.driver.execute_script("window.scrollTo(0, {});".format(pixels))
        time.sleep(2)
By watching the requests in the Network tab of your browser's inspect tool, you will find that the comments are coming from here, so you can crawl that page instead.
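For example, once you have copied the XHR address from the Network tab, you can request it directly with plain Scrapy and skip Selenium entirely; a minimal sketch with a placeholder URL and selector (the real ones come from your own Network tab and the response it returns):

import scrapy

class FeedbackSpider(scrapy.Spider):
    name = 'feedback'
    # placeholder: paste the XHR URL from the Network tab here
    start_urls = ['https://example.com/feedback-endpoint']

    def parse(self, response):
        # '.feedback-item' is hypothetical; adjust to the actual markup
        for review in response.css('.feedback-item'):
            yield {'text': review.css('::text').get()}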
I just started, and I've been on this for a week or two, just using the internet to help, but now I've reached the point where I can't understand the problem and it can't be found anywhere else. In case you didn't understand my program: I want to scrape data, then click on a button, then scrape data until I hit data I've already collected, then go to the next page in the list.
I've reached the point where I scrape the first 8 items, but I can't find a way to click on the "see more!" button. I know I should use Selenium and the button's XPath. Anyway, here is my code:
class KickstarterSpider(scrapy.Spider):
    name = 'kickstarter'
    allowed_domains = ['kickstarter.com']
    start_urls = ["https://www.kickstarter.com/projects/zwim/zwim-smart-swimming-goggles/community", "https://www.kickstarter.com/projects/zunik/oriboard-the-amazing-origami-multifunctional-cutti/community"]

    def __init__(self):
        # __init__ needs double underscores; chromedriver holds the path to the executable
        self.driver = webdriver.Chrome(chromedriver)

    def parse(self, response):
        self.driver.get('https://www.kickstarter.com/projects/zwim/zwim-smart-swimming-goggles/community')
        backers = response.css('.founding-backer.community-block-content')
        b = backers[0]
        while True:
            try:
                # self.driver, not selfdriver; the XPath attribute prefix is @, not #
                seemore = self.driver.find_element_by_xpath('//*[@id="content-wrap"]').click()
            except:
                break
        self.driver.close()

    def parse2(self, response):
        print('you are here!')
        for b in backers:
            name = b.css('.name.js-founding-backer-name::text').extract_first()
            backed = b.css('.backing-count.js-founding-backer-backings::text').extract_first()
            print(name, backed)
Be sure the web driver used in Scrapy loads and interprets JS (I don't know... it could be a solution).
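Concretely, the spider above clicks in Selenium but then reads backers from the original Scrapy response, which never saw the JS-rendered content; one fix worth trying is to parse the driver's rendered DOM instead. A minimal sketch:

from scrapy import Selector

# parse the JS-rendered DOM, not the raw response downloaded by Scrapy
sel = Selector(text=self.driver.page_source)
backers = sel.css('.founding-backer.community-block-content')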
I'm scraping this site using Selenium. First, I clicked on the clear button beside "Attraction Type". Then I clicked on the more link at the bottom of the category list. Now, for each category, I find the element by id and click on the link. The problem is that as I click on the first category, Outdoor Activities, the website goes back to its initial state, and I get the following error when I try to click the next link:
StaleElementReferenceException: Message: Element is no longer attached to the DOM
My code is:
class TripSpider(CrawlSpider):
    name = "tspider"
    allowed_domains = ["tripadvisor.ca"]
    start_urls = ['http://www.tripadvisor.ca/Attractions-g147288-Activities-c42-Dominican_Republic.html']

    def __init__(self):
        self.driver = webdriver.Firefox()
        self.driver.maximize_window()

    def parse(self, response):
        self.driver.get(response.url)
        self.driver.find_element_by_class_name('filter_clear').click()
        time.sleep(3)
        self.driver.find_element_by_class_name('show').click()
        time.sleep(3)
        # to handle popups: switch to the new window
        self.driver.switch_to.window(self.driver.window_handles[-1])
        # Close the new window
        self.driver.close()
        # Switch back to the original browser (first window)
        self.driver.switch_to.window(self.driver.window_handles[0])
        divs = self.driver.find_elements_by_xpath('//div[contains(@id,"ATTR_CATEGORY")]')
        for d in divs:
            d.find_element_by_tag_name('a').click()
            time.sleep(3)
The problem with this website in particular is that each time you click on an element the DOM changes, so you can't loop through elements that have gone stale.
I had the same problem a short time ago, and I solved it by using a different window for each link.
You could change this part of the code:
divs = self.driver.find_elements_by_xpath('//div[contains(@id,"ATTR_CATEGORY")]')
for d in divs:
    d.find_element_by_tag_name('a').click()
    time.sleep(3)
For:
from selenium.webdriver.common.keys import Keys

mainWindow = self.driver.current_window_handle
divs = self.driver.find_elements_by_xpath('//div[contains(@id,"ATTR_CATEGORY")]')
for d in divs:
    # Open the element in a new window (Shift+Enter)
    d.find_element_by_tag_name('a').send_keys(Keys.SHIFT + Keys.ENTER)
    self.driver.switch_to.window(self.driver.window_handles[1])
    # Here you do whatever you want in the new window
    # Close the window and continue
    self.driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 'w')
    self.driver.switch_to.window(mainWindow)
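If you would rather stay in a single window, another common fix is to re-locate the elements after every click instead of reusing stale references; a minimal sketch, assuming the category divs reappear in the same order after each click:

count = len(self.driver.find_elements_by_xpath('//div[contains(@id,"ATTR_CATEGORY")]'))
for i in range(count):
    # re-find the list each time, because the click rebuilds the DOM
    divs = self.driver.find_elements_by_xpath('//div[contains(@id,"ATTR_CATEGORY")]')
    divs[i].find_element_by_tag_name('a').click()
    time.sleep(3)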