Selenium RC Python client: not able to iterate through the XPath? - python

I am a newbie to Selenium and am implementing Selenium RC with the Python client library. I tried traversing my page's divs by XPath using the command "sel.get_xpath_count(xpath)".
It gives a count of 20, but when I iterate through every div with a for statement and the command "sel.get_text('%s[%d]' % (xpath, i))", it only finds the first element and gives an error on the remaining 19, saying the divs are not found.

Your second XPath expression is wrong. Programmers trained in C-style languages frequently make this mistake, because they see [...] and think "index into an array", but that's not what brackets do in XPath.
If you use sel.get_xpath_count(something), then you need to use sel.get_text("xpath=(something)[item_number]"). Note the use of parentheses around the original XPath expression in the second use.
The reason behind this is that something[item_count] is shorthand for something[position() = item_count] - thus you wind up adding another predicate to the "something" expression at each step, instead of selecting one of the nodes selected by the whole expression. (something)[item_count] works because the value of (something) is the full list of nodes, and adding a position() = item_count predicate selects the node from that list with the specified position. That's much more like a C-style array index.
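In the Selenium RC Python client, the whole loop would therefore look roughly like this (a sketch; xpath stands for whatever expression you passed to get_xpath_count, and XPath positions are 1-based):
count = int(sel.get_xpath_count(xpath))
for i in range(1, count + 1):
    # Parenthesize the original expression, then index into the result list.
    text = sel.get_text("xpath=(%s)[%d]" % (xpath, i))
    print(text)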

Related

Problems with XPath in Selenium

I want to find an element based on its attributes.
I have already tried searching through all divs and narrowing down by attributes, and even searching by *. None of these was a solution.
Whole element looks like this:
<div class="charc" data-lvl="66" data-world="walios" data-nick="mirek">
This is my search expression:
driver.find_element_by_xpath('//div[@data-world="walios"] and [@data-nick="mirek"]')
I would like to find this element using python with selenium, and be able to click on it.
Actually I am getting the error
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//div[@data-world="walios"] and [@data-nick="mirek"]' is not a valid XPath expression.
What am I doing wrong?
The error message is correct: your predicates are malformed.
Try putting both conditions in one [...] predicate:
driver.find_element_by_xpath('//div[@data-world="walios" and @data-nick="mirek"]')
driver.find_elements_by_xpath('//div[@data-world="walios" and @data-nick="mirek"]')
or
driver.find_elements_by_xpath('//div[@data-world="walios"][@data-nick="mirek"]')
The multiple conditions for selecting the tag can't be within nested []s. Either specify them all within one [] or chain multiple []s, as above.
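Once the locator is valid, clicking the element is the easy part (a minimal sketch, assuming driver is an ordinary WebDriver instance):
element = driver.find_element_by_xpath('//div[@data-world="walios" and @data-nick="mirek"]')
element.click()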

Looking for 2 things in 2 different Xpaths - Python Selenium

I am working on a bot for a website, and it requires a color and a keyword to find the item. I am using Selenium to look for the item by keyword and then pick a color option (the website offers some items in multiple colors). I am having trouble matching both the keyword and the color at the same time. Once the correctly colored version of the item is found from the user's keyword and color input, I want to select that option.
Formula I am trying to make in Python:
If the first XPath (keyword) is found and the second XPath (color) is found,
then select the item that contains those two properties.
This is the current code I have:
Item = driver.find_element_by_xpath('//*[contains(text(), "MLK")]' and contains ("Black")]')
if (item != None):
    actions.moveToElement(item).click()
I've tried the code above and it doesn't work.
Here are the 2 pieces of code that I want to merge to find the item:
driver.find_element_by_xpath('//a[contains(text(), "MLK")]')
driver.find_element_by_xpath('//a[contains(text(), "Black")]')
The keyword is called MLK.
The color is called Black.
After merging, I want to find that exact element (called MLK, color version = Black).
This combined item should be clicked on; I only know to use .click().
If there is a better way, please let me know.
The website I am using to make a bot for: supremenewyork.com
The item I am using as an example, to pick a certain color (It's the Sweatshirt with MLK on it): http://www.supremenewyork.com/shop/all/sweatshirts
It took me a second to realize that there are 3 A tags for each shirt... one for the image, one for the name of the shirt, and one for the color. Since the last two A tags are the ones you want to text search, you can't look for both strings in the same A tag. I've tested the XPath below and it works.
//article[.//a[contains(.,'MLK')]][.//a[.='Black']]//a
ARTICLE is the container for the shirt. This XPath is looking for an ARTICLE tag that contains an A tag that contains 'MLK' and then another A tag that contains 'Black' then finds the A tags that are descendants of the ARTICLE tag. You can click on any of them, they are all the same link.
BTW, your code has a problem. The first line below will throw an exception if there is no match, so the next line will never be reached to test for None.
Item = driver.find_element_by_xpath('//*[contains(text(), "MLK")]' and contains ("Black")]')
if (Item != None):
    actions.moveToElement(item).click()
A better practice is to use .find_elements() (plural) and check for an empty list. If the list is empty, that means there was no element that matched the locator.
Putting the pieces together:
items = driver.find_elements_by_xpath("//article[.//a[contains(.,'MLK')]][.//a[.='Black']]//a")
if items:
    items[0].click()
I'm assuming you will be calling this code repeatedly so I would suggest that you put this in a function and pass the two strings to be searched for. I'll let you take it from here...
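As a starting point, such a function might look like this (click_item is a hypothetical name; it simply parameterizes the XPath above):
def click_item(driver, keyword, color):
    # keyword and color are the two strings to search for, e.g. 'MLK' and 'Black'.
    xpath = "//article[.//a[contains(.,'%s')]][.//a[.='%s']]//a" % (keyword, color)
    items = driver.find_elements_by_xpath(xpath)
    if items:
        items[0].click()
        return True
    return False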
Try the union operator "|" to combine the two XPaths.
Example:
//p[@id='para1'] | //p[@id='para2']
//a[contains(text(), "MLK")] | //a[contains(text(), "Black")]
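In Selenium that is one locator string; note, though, that the union selects elements matching either expression, not only elements matching both (a quick sketch):
links = driver.find_elements_by_xpath('//a[contains(text(), "MLK")] | //a[contains(text(), "Black")]')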
You can use a full XPath to select the item you want based on two conditions, you just need to start from a parent node and then apply the conditions on the child nodes:
//div[contains(./h1/a/text(), "MLK") and contains(./p/a/text(), "Black")]/a/@href
You first need to select the element itself; after that you need to get the attribute @href from the element, something like this:
Item = driver.find_element_by_xpath('//div[contains(./h1/a/text(), "MLK") and contains(./p/a/text(), "Black")]/a')
href = Item.get_attribute("href")

How to locate an element by class name and its text in python selenium

Hi, I am trying to locate an element by its class name and the text that it contains:
<div class="fc-day-number">15</div>
There are a bunch of fc-day-number elements on the page with different values; I need the one with, for example, 15.
I do
driver.find_element_by_class_name("fc-day-content")
but I also need its text to equal 15, and I am stuck here. Please help.
You can use xpath:
driver.find_element_by_xpath("//div[#class='fc-day-content' and text()='15']")
fc_day_contents = driver.find_elements_by_class_name("fc-day-content")
the_one_you_want = [x for x in fc_day_contents if "15" == x.text][0]
The first line puts all elements with class name "fc-day-content" into a list (also notice that it is elementS with an S: this returns a list of all matching elements, whether you search by_class_name, by_name, by_id or whatever).
The second line goes through each element, checks whether it has "15" as its text, and returns the matches as a (probably smaller) list.
The [0] at the end returns the first item in the list (you can remove it if you want a list of all the ones that are "15").
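One caveat: the [0] will raise an IndexError if nothing matches, so if there is any chance the element is absent, guard the lookup (a small sketch building on the lines above):
matches = [x for x in fc_day_contents if x.text == "15"]
if matches:
    the_one_you_want = matches[0]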
For things like this, prefer to use JavaScript:
els = driver.execute_script("""
return Array.prototype.slice.call(document.getElementsByClassName("fc-day-content"))
.filter(function (x) { return x.textContent === "15"; });
""")
assert len(els) == 1
el = els[0]
What this does is get all elements that have the class fc-day-content. (By the way, your question uses fc-day-content and fc-day-number. It's unclear which one you're really looking for, but it does not matter in the grand scheme of things.) The call to Array.prototype.slice creates an array from the return value of getElementsByClassName, because this method returns an HTMLCollection and thus filter is not available on it. Once we have the array, we run a filter to narrow it to the elements that have 15 for text. This array of elements is returned by the JavaScript code. The assert is there to make sure we don't unwittingly get more than one element, which would be a surprise. And then the element is extracted from the list.
(If you care about IE compatibility, textContent is not available before IE 9. If you need support for IE 8 or earlier, you might be able to get by with innerText or innerHTML or you could check that the element holds a single text node with value 15.)
I prefer not to do it like TehTris does (find_elements_by_class_name plus a Python loop to find the one with the text) because that method takes 1 round-trip between Selenium client and Selenium server to get all the elements of class fc-day-content plus 1 round-trip per element that was found. So if you have 15 elements on your page with the class fc-day-content, that's 16 round-trips. If you run a test through Browser Stack or Sauce Labs, that's going to slow things down considerably.
And I prefer to avoid an XPath expression with @class='fc-day-content' because as soon as you add a new class to your element, this expression breaks. Maybe the element you care about has just one CSS class now, but applications change. You could use XPath's contains() function, but then you run into other complications, and once you take care of everything, it becomes a bit unwieldy. See this answer for how to use it robustly.
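For reference, the usual robust form pads the class attribute with spaces so contains() cannot match a substring of some other class name (a common pattern, offered here as a sketch rather than a quote from that answer):
//div[contains(concat(' ', normalize-space(@class), ' '), ' fc-day-content ')]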

Choosing next relative in Python BeautifulSoup with automation

First of all - I'm creating an XML document with Python BeautifulSoup.
Currently, what I'm trying to create is very similar to this example:
<options>
<opt name='string'>ContentString</opt>
<opt name='string'>ContentString</opt>
<opt name='string'>ContentString</opt>
</options>
Notice that each opt tag should have only one attribute, called name.
As the options can be far more numerous, and differently named as well, I decided to create a little Python function to help me create such a result.
array = ['FirstName','SecondName','ThirdName']
# This list is the guideline for the function, letting it know how many options will be in the result and what the option tags' name attributes will be.
def create_options(array):
    soup.append(soup.new_tag('options'))
    if len(array) > 0:  # Small sanity check, so you can tell if the given array is empty for any reason. Optional.
        for i in range(len(array)):
            soup.options.append(soup.new_tag('opt'))
            # With BeautifulSoup methods, we create opt tags inside the options tag - exactly as many as there are items in the array.
        counter = 0
        # range() could be used here instead, but for testing purposes the current approach is sufficient.
        for tag in soup.options.find_all():
            soup.options.find('opt')['name'] = str(array[counter])
            # Notice that here the name is assigned only to the first opt element. We'll discuss this next.
            counter += 1
        print len(array), 'options were created.'
    else:
        print 'No options were created.'
You will notice that in the function, the name assignment is handled by a for loop which, unfortunately, assigns all the different names to the first opt in the options element.
BeautifulSoup has .next_sibling and .previous_sibling, which can help me in this task.
As their names suggest, they access the next or previous sibling of an element. So, by this example:
soup.options.find('opt').next_sibling['name'] = str(array[counter])
We can access the second child of the options element. So, if we add .next_sibling to each soup.options.find('opt'), we can move from the first element to the next.
The problem is that by finding the opt element in options with:
soup.options.find('opt')
we access the first opt each time. But my function needs to access, for each item in the list, the next opt as well. So the more items there are in the list, the more .next_sibling calls must be chained onto the first opt.
As a result, with the logic I constructed, accessing the relevant opt to assign its appropriate name for the 4th or later item in the list would look like this:
soup.options.find('opt').next_sibling.next_sibling.next_sibling.next_sibling['name'] = str(array[counter])
And now we are ready for my questions:
1st: As I didn't find any other way to do this with Python BeautifulSoup methods, I'm not sure my approach is the only one. Is there any other method?
2nd: How could I achieve the result with this approach if, as my experiments show, I can't put a variable inside the method chain? (So that I could multiply the methods.)
# Like this
thirdoption = .next_sibling.next_sibling.next_sibling
# This isn't actually possible, of course - it's just an example.
soup.options.find('opt').next_sibling.next_sibling.next_sibling['name'] = str(array[counter])
3rd: Maybe I read the BeautifulSoup documentation badly and just didn't find the method that could help me with this task?
I managed to achieve the result by ignoring BeautifulSoup methods.
Python has ElementTree methods, which were sufficient to work with.
So, let me show the example code and explain what it does. The comments provide the more precise explanation.
"""
Before this code comes the soup XML document generation. Apart from the part I mentioned in the topic, we just create empty opt tags in the document, thus creating an almost finished document.
Right after that, with this little script, we use the basic element tree methods that Python provides.
"""
import xml.etree.ElementTree as ET
ET_tree = ET.parse("exported_file.xml")
# Here we import exactly the same file we created with soup. Exporting can be done to a different file if you wish.
ET_root = ET_tree.getroot()
for position, opt in enumerate(ET_root.find('options')):
    # position is important, as it replaces the 'counter' from the for loop in the first example. It is used to pull the matching item out of the array, which serves as the template for our option tag names.
    opt.set('name', str(array[position]))
    opt.text = 'text'
    # In the same way, position can pull data from any related array, provided the items are ordered consistently.
ET.ElementTree(ET_root).write('exported_file.xml', encoding="UTF-8", xml_declaration=True)
# This part took quite a lot of prior research. It saves the XML document with UTF-8 encoding, which is very important.
This approach is pretty inefficient, as I could have used ElementTree for everything to achieve the same result.
Still, BeautifulSoup produces a nicely formatted document, which is very neat, whereas ElementTree writes files with a software-friendly look only.
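For completeness: the sibling chaining can also be avoided in BeautifulSoup itself, since find_all() returns a plain list and enumerate() supplies the position directly (a sketch, assuming the same soup and array as above; not part of the original answer):
for position, opt in enumerate(soup.options.find_all('opt')):
    opt['name'] = str(array[position])  # set the name attribute on each opt in turn
    opt.string = 'ContentString'        # and give it text content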

Nested Selectors in Scrapy

I have trouble getting nested Selectors to work as described in the documentation of Scrapy (http://doc.scrapy.org/en/latest/topics/selectors.html)
Here's what I got:
sel = Selector(response)
level3fields = sel.xpath('//ul/something/*')
for element in level3fields:
    site = element.xpath('/span').extract()
When I print out "element" in the loop I get <Selector xpath='stuff seen above' data=u'<span class="something">text</span>'>
Now I have two problems:
Firstly, within the element there should also be an "a" node (as in <a href>), but it doesn't show up in the printout; it only shows up if I extract it directly. Is that just a printing error, or doesn't the element Selector hold the a-node (without extraction)?
Secondly, when I print out "site" above, it should show a list with the span nodes. However, it doesn't; it only prints an empty list.
I tried various combinations of changes (several slashes to none, and stars (*) in different places), but none of them brought me any closer.
Essentially, I just want a nested Selector which gives me the span node in the second step (the loop).
Anyone got any tips?
Regarding your first question, it's just a print "error". __repr__ and __str__ methods on Selectors only print the first 40 characters of the data (element represented as HTML/XML or text content). See https://github.com/scrapy/scrapy/blob/master/scrapy/selector/unified.py#L143
In your loop on level3fields you should use relative XPath expressions. Using /span will look for span elements directly under the root node, which is not what you want, I guess.
Try this:
sel = Selector(response)
level3fields = sel.xpath('//ul/something')
for element in level3fields:
    site = element.xpath('.//span').extract()
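And if you only want the text inside the span rather than the whole node, add a text() step (the same relative-XPath rule applies):
site_text = element.xpath('.//span/text()').extract()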
