web element not detecting in selenium in a FOR LOOP - python

I'm trying to fetch some information from specific web elements. The problem is that when I fetch the information without a for loop, the program works like a charm. But when I put the same code inside a for loop, it does not detect the web elements. Here's the code I have been trying:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
from lxml import html
import requests
import xlwt
browser = webdriver.Firefox() # Get local session of firefox
# 0 wait until the pages are loaded
browser.implicitly_wait(3) # 3 secs should be enough. if not, increase it
browser.get("http://ae.bizdirlib.com/taxonomy/term/1493") # Load page
links = browser.find_elements_by_css_selector("h2 > a")
def test():  # test function
    elems = browser.find_elements_by_css_selector("div.content.clearfix > div > fieldset > div > ul > li > span")
    print(elems)
    for elem in elems:
        print(elem.text)
    elem1 = browser.find_elements_by_css_selector("div.content.clearfix > div > fieldset > div > ul > li > a")
    for elems21 in elem1:
        print(elems21.text)
    return 0

for link in links:
    link.send_keys(Keys.CONTROL + Keys.RETURN)
    link.send_keys(Keys.CONTROL + Keys.PAGE_UP)
    time.sleep(5)
    test()  # Want to call test function
    link.send_keys(Keys.CONTROL + 'w')
The output I get when I print the object is an empty list: []. Can somebody help me fix this? I'm new to Selenium.
In a previous question I had asked about printing. The problem here is that the element itself is not being detected, so this question is different.

I couldn't open the page, but as I understand it you want to open the links sequentially and do something on each one. Since you close the newly opened tab with link.send_keys(Keys.CONTROL + 'w'), your links are opening in a new tab. In that case you must switch to the new window so that you can reach the elements in it. You can query the open windows with driver.window_handles, switch to the last window with driver.switch_to_window(driver.window_handles[-1]), and after you close the window you must switch back to the first window with driver.switch_to_window(driver.window_handles[0]).
for link in links:
    link.send_keys(Keys.CONTROL + Keys.RETURN)
    # switch to the new window
    driver.switch_to_window(driver.window_handles[-1])
    link.send_keys(Keys.CONTROL + Keys.PAGE_UP)  # dont know why
    time.sleep(5)
    test()  # Want to call test function
    link.send_keys(Keys.CONTROL + 'w')
    # switch back to the first window
    driver.switch_to_window(driver.window_handles[0])
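As a side note, driver.switch_to_window() has since been deprecated in favour of driver.switch_to.window(). A minimal sketch of the same open/scrape/close cycle with that call, reusing the question's browser object and test() function (whether Ctrl+Return actually opens a new tab depends on the browser/driver combination, so treat this as a sketch rather than a drop-in fix):
for link in links:
    link.send_keys(Keys.CONTROL + Keys.RETURN)             # open the link in a new tab
    browser.switch_to.window(browser.window_handles[-1])   # focus the new tab
    time.sleep(5)
    test()                                                  # scrape inside the new tab
    browser.close()                                         # close the new tab
    browser.switch_to.window(browser.window_handles[0])    # back to the listing tab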

Related

Getting an "Stale element reference" When trying to loop through pages with the intention of scraping multiple pages

I'm having an issue with my Python code. The intention is to use Selenium to open the website (Craigslist), search for a text ("Honda"), and then scrape three pages of results. I keep getting the
"StaleElementReferenceException: stale element reference: element is not attached to the page document" exception
when the iteration reaches the second page. I can't tell exactly why it stops at the second page and does not click the "next" button once more to reach the third page and then finally scrape the data and print it.
This is my code:
import time
from selenium import webdriver
from bs4 import BeautifulSoup
DRIVER_PATH = "/Users/mouradsal/Downloads/DataSets Python/chromedriver"
URL = "https://vancouver.craigslist.org/"
browser = webdriver.Chrome(DRIVER_PATH)
browser.get(URL)
browser.maximize_window()
time.sleep(4)
search = browser.find_element_by_css_selector("#query")
search.send_keys("Honda")
search.send_keys(u'\ue007')
content = browser.find_elements_by_css_selector(".hdrlnk")
button = browser.find_element_by_css_selector(".next")
for i in range(0, 3):
    button.click()
    print("Count: " + str(i))
    time.sleep(10)
print("done loop ")
for e in content:
    start = e.get_attribute("innerHTML")
    soup = BeautifulSoup(start, features=("lxml"))
    print(soup.get_text())
    print("***************************")
Any suggestions would be greatly appreciated!
Thanks
for i in range(0, 3):
    button = browser.find_element_by_css_selector(".next")
    button.click()
    print("Count: " + str(i))
    time.sleep(10)
You need to nest your finding of elements because the web elements change every time you get to a new page.
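For completeness, a minimal sketch of the loop with both lookups nested, reusing the question's browser object and selectors (the fixed sleep is kept from the question; an explicit WebDriverWait would be more robust):
for i in range(0, 3):
    # re-locate the results and the "next" button on every page,
    # because the old references go stale after navigation
    content = browser.find_elements_by_css_selector(".hdrlnk")
    for e in content:
        soup = BeautifulSoup(e.get_attribute("innerHTML"), features="lxml")
        print(soup.get_text())
    button = browser.find_element_by_css_selector(".next")
    button.click()
    print("Count: " + str(i))
    time.sleep(10)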

Python > Selenium: Web-scraping in a "logged-in" environment based on links from a text file

Compatible with ChromeDriver.
This program seeks to accomplish the following:
Automatically sign in to a website;
Visit a link / link(s) from a text file;
Scrape data from each page visited this way; and
Output all scraped data by print().
Kindly skip to Part 2 for the problem area, as Part 1 is already tested to work for step 1. :)
The code:
Part 1
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
driver.get("https://www.website1.com/home")
main_page = driver.current_window_handle
time.sleep(5)
##cookies
driver.find_element_by_xpath('//*[@id="CybotCookiebotDialogBodyButtonAccept"]').click()
time.sleep(5)
driver.find_element_by_xpath('//*[@id="google-login"]/span').click()
for handle in driver.window_handles:
    if handle != main_page:
        login_page = handle
driver.switch_to.window(login_page)
with open('logindetails.txt', 'r') as file:
    for details in file:
        email, password = details.split(':')
        driver.find_element_by_xpath('//*[@id="identifierId"]').send_keys(email)
        driver.find_element_by_xpath('//span[text()="Next"]').click()
        time.sleep(5)
        driver.find_element_by_xpath('//input[@type="password"]').send_keys(password)
        driver.find_element_by_xpath('//span[text()="Next"]').click()
        driver.switch_to.window(main_page)
        time.sleep(5)
Part 2
In alllinks.txt, we have the following websites:
• website1.com/otherpage/page1
• website1.com/otherpage/page2
• website1.com/otherpage/page3
with open('alllinks.txt', 'r') as directory:
    for items in directory:
        driver.get(items)
        time.sleep(2)
        elements = driver.find_elements_by_class_name('data-xl')
        for element in elements:
            print([element])
        time.sleep(5)
driver.quit()
The outcome:
[Done] exited with code=0 in 53.463 seconds
... and zero output
The problem:
The location of the element has been verified; I suspect that the windows have something to do with why the driver is not scraping.
All inputs are welcome and greatly appreciated. :)
URLs passed to driver.get() must include the protocol (i.e. https://).
driver.get('website1.com/otherpage/page1') will just raise an exception.
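A minimal sketch of the link-reading loop with the protocol prepended and the trailing newline stripped (the https:// prefix and the .strip() call are assumptions about how the file is formatted):
with open('alllinks.txt', 'r') as directory:
    for items in directory:
        url = items.strip()                 # drop the trailing newline
        if not url.startswith('http'):
            url = 'https://' + url          # driver.get() needs the protocol
        driver.get(url)
        time.sleep(2)
        elements = driver.find_elements_by_class_name('data-xl')
        for element in elements:
            print(element.text)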
It turns out that I had missed switching into the "iframe", which is required for elements that are not directly reachable from the top-level window.
iframe = driver.find_element_by_xpath("//iframe[@class='LegacyFeature__Iframe-tasygr-1 bFBhBT']")
driver.switch_to.frame(iframe)
After switching to the target iframe, we then run the code to find and print the elements we're looking for.
time.sleep(1)
elements = driver.find_elements_by_class_name('data-xl')
for element in elements:
    print(element.text)
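If you later need the top-level document again without navigating away, Selenium provides driver.switch_to.default_content() to leave the frame (not strictly needed below, because each driver.get() resets the frame context anyway):
driver.switch_to.default_content()   # return from the iframe to the top-level document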
Once logged-in, you can pretty much direct the webdriver to other pages on the site, even based on a text file that has all the links of interest:
Suppose that the text file (shown below as "LINKS.txt") had the following links:
• https://www.website.com/home/item1
• https://www.website.com/home/item2
• https://www.website.com/home/item3
with open('LINKS.txt', 'r') as directory:
    for items in directory:
        driver.get(items)
        iframe = driver.find_element_by_xpath("//iframe[@class='LegacyFeature__Iframe-tasygr-1 bFBhBT']")
        driver.switch_to.frame(iframe)
        time.sleep(10)
        elements = driver.find_elements_by_class_name('data-xl')
        for element in elements:
            print(element.text)
        time.sleep(10)
The code above should allow you to visit pages ...item1, ...item2, and ...item3 (as per the ".txt" file), scrape the elements, and print output.

selenium, webdriver.page_source not refreshing after click

I am trying to copy a web page's list of addresses for a given community service to a new document so I can geocode all of the locations on a map. Instead of being able to get a list of all the parcels, I can only download one at a time, and there are 25 parcel numbers per page. As such, this would be extremely time consuming.
I want to develop a script that will look at the page source (everything, including the 25 addresses contained in a table tag), click the next page button, copy the next page, and so on until the max page is reached. Afterwards, I can format the text to be geocoding compatible.
The code below does all of this, except it only copies the first page over and over again, even though I can clearly see that the program has successfully navigated to the next page:
# Open chrome
br = webdriver.Chrome()
raw_input("Navigate to web page. Press enter when done: ")
pg_src = br.page_source.encode("utf")
soup = BeautifulSoup(pg_src)
max_page = 122 #int(max_page)
#open a text doc to write the results to
f = open(r'C:\Geocoding\results.txt', 'w')
# write results page by page until max page number is reached
pg_cnt = 1 # start on 1 as we should already have the first page
while pg_cnt < max_page:
    tble_elems = soup.findAll('table')
    soup = BeautifulSoup(str(tble_elems))
    f.write(str(soup))
    time.sleep(5)
    pg_cnt += 1
    # clicks the next button
    br.find_element_by_xpath("//div[@class='next button']").click()
    # give some time for the page to load
    time.sleep(5)
    # get the new page source (THIS IS THE PART THAT DOESN'T SEEM TO BE WORKING)
    page_src = br.page_source.encode("utf")
    soup = BeautifulSoup(pg_src)
f.close()
I faced the same problem.
The problem, I think, is that some JavaScript is not completely loaded.
All you need is to wait until the object is loaded. The code below worked for me:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

delay = 10  # seconds
try:
    myElem = WebDriverWait(br, delay).until(EC.presence_of_element_located((By.CLASS_NAME, 'legal-attribute-row')))
except TimeoutException:
    print("Loading took too much time!")

Selenium in Python - open every link within a drop down menu

I'm new to Python, but I've been searching for the past hour for how to do this, and this code almost works. I need to open up every category on a collapsing (dropdown) menu, and then Ctrl+t every link within the now-.active class. The browser opens and all the categories open as well, but none of the .active links are being opened in new tabs. I would appreciate any help.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("pioneerdoctor.com/productpage.cfm")
cat = driver.find_elements_by_css_selector("a[href*='Product_home']")
for i in cat:
    i.click()
    child = driver.find_elements_by_css_selector("li.active > a[href*='ProductPage']")
    for m in child:
        m.send_keys(Keys.CONTROL + 't')
EDIT:
Here's the current workaround I got going by writing to a text file and using webbrowser. The only issue I'm seeing is that it writes duplicates of the results multiple times. I'll be looking through the comments later to see if I can get it working in a better way (which I'm sure exists).
from selenium import webdriver
import webbrowser
print("Opening Google Chrome..")
driver = webdriver.Chrome()
driver.get("http://pioneerdoctor.com/productpage.cfm")
driver.implicitly_wait(.5)
driver.maximize_window()
cat = driver.find_elements_by_css_selector("a[href*='Product_home']")
print("Writing URLS to file..")
for i in cat:
    i.click()
    child = driver.find_elements_by_css_selector("a[href*='ProductPage']")
    for i in child:
        child = i.get_attribute("href")
        file = open("Output.txt", "a")
        file.write(str(child) + '\n')
        file.close()
driver.quit()
file = open("Output.txt", "r")
Loop = input("Loop Number, Enter 0 to quit: ")
Loop = int(Loop)
x = 0
if Loop == 0:
print("Quitting..")
else:
for z in file:
if x == Loop:
break
print("Done.\n")
else:
webbrowser.open_new_tab(z)
x += 1
None of the links in those categories are found because the CSS selector for the links is incorrect. Remove the > in li.active > a[href*='ProductPage']. Why? p > q gives you only the immediate children of p. A space, as in "p q", gives you all the q elements inside p. The links you are interested in are NOT the immediate children of the li; they are inside a ul which is inside the li.
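So the corrected lookup is the same selector from the question with the child combinator dropped:
child = driver.find_elements_by_css_selector("li.active a[href*='ProductPage']")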
The other problem is the way you open links in new tabs. Use this code instead:
combo = Keys.chord(Keys.CONTROL, Keys.RETURN)
m.sendKeys(combo)
That's how I do it in Java; I think Python should have Keys.chord as well. If I were you, I would open the links in another browser instance. I have seen that switching between tabs and windows is not well supported by Selenium itself; bad things can happen.
Before you try any tabbing, make a simple example that opens a new tab and switches back to the previous tab. Do the back and forth 3-4 times. Does it work smoothly? Good. Then do the same with 3-5 tabs and tell me how it went.
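For what it's worth, the Python bindings do not ship a Keys.chord helper; the usual Python equivalent is to concatenate the modifier and the key in a single send_keys call:
m.send_keys(Keys.CONTROL + Keys.RETURN)   # Ctrl+Enter on the link element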

extracting more information from webdriver

I have written a code to extract the mobile models from the following website
"http://www.kart123.com/mobiles/pr?p%5B%5D=sort%3Dfeatured&sid=tyy%2C4io&ref=659eb948-c365-492c-99ef-59bd9f0427c6"
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.kart123.com/mobiles/pr?p%5B%5D=sort%3Dfeatured&sid=tyy%2C4io&ref=659eb948-c365-492c-99ef-59bd9f0427c6")
elem=[]
elem=driver.find_elements_by_xpath('.//div[#class="pu-title fk-font-13"]')
for e in elem:
    print(e.text)
Everything is working fine, but the problem arises at the end of the page: it shows the contents of the first page only. Could you please help me with what I can do in order to get all the models?
This will get you on your way. I would use while loops with sleeps to get the whole page loaded before getting the information from it.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Firefox()
driver.get("http://www.flipkart.com/mobiles/pr? p%5B%5D=sort%3Dfeatured&sid=tyy%2C4io&ref=659eb948-c365-492c-99ef-59bd9f0427c6")
time.sleep(3)
for i in range(5):
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") # scroll to bottom of page
time.sleep(2)
driver.find_element_by_xpath('//*[#id="show-more-results"]').click() # click load more button, needs to be done until you reach the end.
elem=[]
elem=driver.find_elements_by_xpath('.//div[#class="pu-title fk-font-13"]')
for e in elem:
print e.text
OK, this is going to be a major hack, but here goes... The site gets more phones as you scroll down by hitting an AJAX endpoint that gives you 20 more each time. The endpoint it's hitting is this:
http://www.flipkart.com/mobiles/pr?p[]=sort%3Dpopularity&sid=tyy%2C4io&start=1&ref=8aef4a5f-3429-45c9-8b0e-41b05a9e7d28&ajax=true
Notice the start parameter; you can hack this into what you want with:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
num = 1
while num <= 2450:
    """
    This condition will need to be updated to the maximum number
    of models you're interested in (or, if you're feeling brave, try to extract
    this from the top of the page).
    """
    driver.get("http://www.flipkart.com/mobiles/pr?p[]=sort%3Dpopularity&sid=tyy%2C4io&start=%d&ref=8aef4a5f-3429-45c9-8b0e-41b05a9e7d28&ajax=true" % num)
    elem = []
    elem = driver.find_elements_by_xpath('.//div[@class="pu-title fk-font-13"]')
    for e in elem:
        print(e.text)
    num += 20
You'll be making well over a hundred GET requests, so this will be quite slow...
You can get the full source of the page and do all the analysis based on it:
page_text = driver.page_source
The page source will contain the current content, including whatever was generated by JavaScript. Be careful to grab this content only once all the rendering is completed (you may, e.g., wait for the presence of some element which gets rendered at the end).
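For example, a short sketch of waiting for an element that is rendered last before grabbing the source ("late-rendered-id" is a placeholder for whatever shows up at the end of rendering on your page):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "late-rendered-id")))  # placeholder id
page_text = driver.page_source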
