Can't access some content on page using Selenium and Python

I am using Selenium in a Python script to log in to a website where I can get an authorization key to access their API. I am able to log in and navigate to the page where the authorization key is provided, and I am using ChromeDriver for testing so I can see what's going on. When I get to the final page where the key is displayed, I can't find a way to access it. I can't see it in the page source, and when I try to access it via the page element's outer HTML, it doesn't print the value shown on the page. Here is a screenshot of what I see in the browser (I'm interested in accessing the content shown in the response body):
This is the code snippet I am using to try to access the content:
auth_key = WebDriverWait(sel_browser, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="responseBodyContent"]')))
print auth_key.get_attribute("outerHTML")
and this is what the print statement returns:
<pre id="responseBodyContent"></pre>
I've also tried:
print auth_key.text
which returns nothing. Is there a way I can extract this key from the page?

It looks like you need a custom wait that first waits for the element and then waits for its text.
Define a condition class that finds the element, reads its innerHTML, and checks that the string is non-empty.
See my example below.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

class element_text_not_empty(object):
    def __init__(self, locator):
        self.locator = locator

    def __call__(self, driver):
        try:
            element = driver.find_element(*self.locator)
            # Return the innerHTML only once it is non-empty
            if len(element.get_attribute('innerHTML').strip()) > 0:
                return element.get_attribute('innerHTML')
            else:
                return False
        except Exception as ex:
            print("Error while waiting: " + str(ex))
            return False
driver = webdriver.Chrome(chrome_path)
...
...
try:
    print("Start wait")
    result = WebDriverWait(driver, 20).until(element_text_not_empty((By.XPATH, '//*[@id="responseBodyContent"]')))
    print(result)
except Exception as ex:
    print("Error: " + str(ex))

Since the value in responseBodyContent is in JSON format, parse it after reading it from the element:
import json

authkey_text = json.loads(auth_key.get_attribute('innerHTML'))
print(str(authkey_text))
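Putting the two answers together, you would wait for the text and then parse it (a sketch reusing the element_text_not_empty condition defined above; what keys the parsed object contains depends on the actual response body):
import json

body = WebDriverWait(driver, 20).until(
    element_text_not_empty((By.XPATH, '//*[@id="responseBodyContent"]'))
)
data = json.loads(body)  # body is the raw JSON string read from the element
print(data)              # inspect the parsed structure to find the auth key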

Related

Selenium.click() is not working, instead it is calling an unwanted function

I am using Selenium to interact with a website. I'm using Twitter as an example.
Here is my code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

r = 0

def loadPage():
    driver = webdriver.Firefox()
    driver.set_window_size(800, 800)
    #url = "about:blank"
    url = "http://www.twitter.com/login"
    driver.get(url)
    login(driver)

def login(driver):
    print("login was called")
    name = "session[username_or_email]"
    global r
    try:
        elem = driver.find_element_by_name(name)
        elem.clear()
        elem.send_keys("#someaccount")
        elem.send_keys(Keys.TAB)
        actions = ActionChains(driver)
        actions.send_keys('password')
        actions.send_keys(Keys.RETURN)
        actions.perform()
        r = 0
        retweet(driver)
    except:
        driver.implicitly_wait(3)
        r += 1
        if r <= 5:  # only try this 5 times
            print(r)
            login(driver)
        else:
            print("Could not find element " + name)
            #driver.close()

def retweet(driver):
    g = 'g'
    print(driver.current_url)
    icon = driver.find_elements_by_tag_name(g)
    icon.click()

loadPage()
When the function retweet() is called, icon.click() ends up calling the function login() again. (The intended behavior is to perform a click, not to call login().)
Using icon.send_keys(Keys.RETURN) instead of icon.click() exhibits the same behavior.
Program output:
login was called
1
login was called
https://twitter.com/login
1
login was called
https://twitter.com/login
1
login was called
2
login was called
3
login was called
4
login was called
5
login was called
Could not find element session[username_or_email]
The reason your login function is called again and again is that retweet() raises an exception: find_elements_by_tag_name returns a list, so icon.click() fails, and find_element_by_tag_name('g') would raise NoSuchElementException. Because retweet() is called inside the try block of login(), the exception is caught by the except block, which simply calls login() again.
Now, why does NoSuchElementException occur even though there is a plethora of <g> tags on the page? If you view the page in the inspector, all <g> tags are inside an <svg> tag. SVG elements live in their own namespace, so to identify them in XPath we need the name() function. The following will not throw an exception:
def retweet(driver):
    xpathLink = "//*[name()='svg']//*[name()='g']"
    print(driver.current_url)
    icon = driver.find_element_by_xpath(xpathLink)
    icon.click()
But you will still not be clicking the retweet link, because the XPath above matches any such icon on the Twitter page. If you want to click the retweet link specifically, use the XPath below:
xpathRetweet = "//div[@data-testid='retweet']//*[name()='svg']//*[name()='g']"
Note: the above will always click the first retweet link on the page. If you want to click all of them, use find_elements to get a list of all retweet links and click them one by one, as sketched below.
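A minimal sketch of that loop, assuming the same data-testid locator (the one-second pause is only illustrative):
import time

retweet_xpath = "//div[@data-testid='retweet']//*[name()='svg']//*[name()='g']"
icons = driver.find_elements_by_xpath(retweet_xpath)  # all retweet icons on the page
for icon in icons:
    icon.click()   # click each retweet icon in turn
    time.sleep(1)  # brief pause so the UI can keep up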

How can I check if an element exists on a page using Selenium XPath?

I'm writing a script to do some web scraping on my Firebase for a few select users. After accessing the events page for a user, I want to first check that no events have been logged by that user.
For this, I am using Selenium and Python. XPath seems to work fine for locating links and navigation in all other parts of the script, except for accessing elements in a table. At first I thought I might have been using the wrong XPath expression, so I copied the path directly from Chrome's inspection window, but still no luck.
As an alternative, I have tried to copy the page source and pass it into Beautiful Soup, then parse it there to check for the element. No luck there either.
Here's some of the code, and some of the HTML I'm trying to parse. Where am I going wrong?
# Using WebDriver - always triggers an exception
def check_if_user_has_any_data():
    try:
        time.sleep(10)
        element = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="event-table"]/div/div/div[2]/mobile-table/md-whiteframe/div[1]/ga-no-data-table/div')))
        print(type(element))
        if element == True:
            print("Found empty state by copying XPath expression directly. It is a bit risky, but it seems to have worked")
        else:
            print("didn't find empty state")
    except:
        print("could not find the empty state element", EC)
# Using Beautiful Soup
def check_if_user_has_any_data_2():
    time.sleep(10)
    html = driver.execute_script("return document.documentElement.outerHTML")
    soup = BeautifulSoup(html, 'html.parser')
    print(soup.text[:500])
    print(len(soup.findAll('div', {"class": "table-row-no-data ng-scope"})))
HTML
<div class="table-row-no-data ng-scope" ng-if="::config" ng-class="{overlay: config.isBuilderOpen()}">
<div class="no-data-content layout-align-center-center layout-row" layout="row" layout-align="center center">
<!-- ... -->
</div>
The first version is expected to evaluate 'element' as True, but it triggers the exception instead: the element is not found.
The second version prints the first 500 characters (correctly, as far as I can tell), but the count it returns is 0, where 1 is expected from inspecting the page source.
Use the following code:
elements = driver.find_elements_by_xpath("//*[@id='event-table']/div/div/div[2]/mobile-table/md-whiteframe/div[1]/ga-no-data-table/div")
size = len(elements)
if size > 0:
    pass  # Element is present. Do your action
else:
    pass  # Element is not present. Do alternative action
Note: find_elements does not throw an exception when nothing matches; it simply returns an empty list.
Here is the method I generally use.
Imports
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
Method
def is_element_present(self, how, what):
    try:
        self.driver.find_element(by=how, value=what)
    except NoSuchElementException:
        return False
    return True
Some things load dynamically, so it is better to set a timeout on an explicit wait and catch the wait exception.
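For example, with an explicit wait that times out cleanly (a sketch; By.ID "event-table" comes from the XPath in the question):
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "event-table"))
    )
    # Element appeared within 10 seconds; carry on with it
except TimeoutException:
    # Element never appeared; handle the "not present" case here
    pass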
If you're using Python and Selenium, you can use this:
from selenium.common.exceptions import NoSuchElementException

try:
    driver.find_element_by_xpath("<Full XPath expression>")  # Test whether the element exists
    # <Other code>
except NoSuchElementException:
    # <Run this if the element doesn't exist>
    pass
I've solved it. The page had a bunch of different iframe elements, and I didn't know that one had to switch between frames in Selenium to access those elements.
There was nothing wrong with the initial code, or the suggested solutions which also worked fine when I tested them.
Here's the code I used to test it:
# Time for the page to load
time.sleep(20)
# Find all iframes
iframes = driver.find_elements_by_tag_name("iframe")
# From inspecting page source, it looks like the index for the relevant iframe is [0]
x = len(iframes)
print("Found ", x, " iFrames")  # Should return 5
driver.switch_to.frame(iframes[0])
print("switched to frame [0]")
if WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@class="no-data-title ng-binding"]'))):
    print("Found it in this frame!")
Check the length of the list of elements you retrieve with an if statement. Example:
elements = driver.find_elements_by_xpath('<XPath expression>')
if len(elements) > 0:
    pass  # Do something.

Continue when element is not found in selenium python

The following script follows a page in Instagram:
from time import sleep
from selenium import webdriver

browser = webdriver.Chrome('./chromedriver')
# GO INSTAGRAM PAGE FOR LOGIN
browser.get('https://www.instagram.com/accounts/login/?hl=it')
sleep(2)
# ID AND PASSWORD
elem = browser.find_element_by_name("username").send_keys('test')
elem = browser.find_element_by_name("password").send_keys('passw')
# CLICK BUTTON AND OPEN INSTAGRAM
sleep(5)
good_elem = browser.find_element_by_xpath('//*[#id="react-root"]/section/main/div/article/div/div[1]/div/form/span/button').click()
sleep(5)
browser.get("https://www.instagram.com")
# GO TO PAGE FOR FOLLOW
browser.get("https://www.instagram.com/iam.ai4/")
sleep(28)
segui = browser.find_element_by_class_name('BY3EC').click()
If an element with class BY3EC isn't found, I want the script to keep working.
When an element is not found, Selenium throws NoSuchElementException, so you can use try/except to handle it, for example:
from selenium.common.exceptions import NoSuchElementException

try:
    segui = browser.find_element_by_class_name('BY3EC').click()
except NoSuchElementException:
    print('Element BY3EC not found')  # or do something else here
You can take a look at selenium exceptions to get an idea of what each one of them is for.
Surround it with try/except; then you can build a happy path and handle failures as well, so your test case will always run.
Best practice is to not use exceptions to control flow. Exceptions should be exceptional: rare and unexpected. The simple way to do this is to get a collection using the locator and then check whether the collection is empty. If it is, you know the element doesn't exist.
In the example below we search the page for the element you wanted and check to see that the collection contains an element, if it does... click it.
segui = browser.find_elements_by_class_name('BY3EC')
if segui:
    segui[0].click()

Scraping image, url, description from page

I am trying to get the image and video URLs from https://www.google.com/trends/home/all/IN
Here is the code:
driver = webdriver.PhantomJS('/usr/local/bin/phantomjs')
driver.set_window_size(1124, 850)
driver.get("https://www.google.com/trends/home/all/IN")
trend = {}

def getGoogleTrends():
    try:
        # Does this line make any sense?
        #element = WebDriverWait(driver, 20).until(lambda driver: driver.find_elements_by_class_name('md-list-block ng-scope'))
        for s in driver.find_elements_by_class_name('md-list-block ng-scope'):
            print s.find_element_by_tag_name('img').get_attribute('src')
            print s.find_element_by_tag_name('img').get_attribute('alt')
            print s.find_elements_by_class_name('image-wrapper ng-scope').get_attribute('href')
    except:
        getNDTVTrends()

getGoogleTrends()
which gives
WebDriverException: Message: {"errorMessage":"Compound class names not permitted","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"111","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:57213","User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"class name\", \"sessionId\": \"648251c0-1cc7-11e5-bf1c-4ff79ddbdce4\", \"value\": \"md-list-block ng-scope\"}","url":"/elements","urlParsed":{"anchor":"","query":"","file":"elements","directory":"/","path":"/elements","relative":"/elements","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/elements","queryKey":{},"chunks":["elements"]},"urlOriginal":"/session/648251c0-1cc7-11e5-bf1c-4ff79ddbdce4/elements"}}
Any suggestions for this error?
Compound class names not permitted
It basically means that you cannot have spaces in a class-name locator. You need to switch to another selector strategy: CSS, XPath, or similar.
I'm not entirely sure what you are trying to select, but, for example, the following XPath selects a list of items carrying that class:
//div[@class="homepage-trending-stories generic-container ng-scope"]/md-list[@class="md-list-block ng-scope"]
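Equivalently, a CSS selector handles compound classes by chaining them with dots (a sketch using the same class names as the XPath above):
# Dots chain multiple classes on one element, so the space problem disappears
items = driver.find_elements_by_css_selector("md-list.md-list-block.ng-scope")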

Selenium not able to get some data on the web page

I am using Selenium with Python to get some data about Chrome extensions. I am trying to get the number of users of a particular extension on this page. I am using the code below:
from selenium import webdriver
from selenium.common.exceptions import ElementNotVisibleException, NoSuchElementException
import time

def create_browser(first_page=None):
    print "Starting"
    browser = webdriver.Chrome('/home/user/ChromeDriver/chromedriver')
    if first_page:
        browser.get(first_page)
    print "Done."
    return browser

def wait_find_element_by_xpath(driver, path):
    counter = 0
    while counter < 7:
        try:
            elem = driver.find_element_by_xpath(path)
            break
        except NoSuchElementException:
            time.sleep(1)
            counter += 1
            elem = None
    return elem

URL = 'https://chrome.google.com/webstore/detail/id-vault/jlljbiieciifehccmokcpnmlklpaimpa/details'
browser = create_browser()
browser.get(URL)
time.sleep(7)

# Get number of users
userStr = wait_find_element_by_xpath(browser, './/span[@class="webstore-f-g-He"]')
#print "\n\n\n No. of Users: "
#print userStr
#print userStr.text
#print "\n\n\n-----"
noOfUserStr = userStr.text.replace(" users", "")
noOfUsers = noOfUserStr.replace(",", "")
users = int(noOfUsers)
My problem is that I am not able to get the number of users on that particular page; instead I get the error: ValueError: invalid literal for int() with base 10: ''
I find this strange because the code works well with other extensions. Also, when you inspect the source (right click -> Inspect element), you can see the number of users (just after the "from" field), but I am still not able to get the value. Can anyone help me fix the problem?
The problem is that for this particular extension, the number of users is not visible due to the length of the "from" URL, and Selenium generally only works on visible elements in the document.
I would recommend getting this value via JavaScript execution:
userStr = browser.execute_script("return document.getElementsByClassName('webstore-f-g-He')[0].textContent")
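execute_script returns the text even for hidden elements, so the question's original parsing steps should work on the result (a sketch reusing that cleanup):
raw = browser.execute_script(
    "return document.getElementsByClassName('webstore-f-g-He')[0].textContent")
users = int(raw.replace(" users", "").replace(",", ""))  # e.g. "1,234 users" -> 1234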
