Python XPath contains %s in Selenium

I know this must be really simple, but I can't quite figure out the syntax. I thought I understood it, but apparently I don't.
The XPath itself works: if I use 'Wednesday' instead of '%s' it works fine. The problem appears when I try to feed the XPath a variable string like this:
texttolookfor = "Wednesday"
buttonelement = webDriver.find_element_by_xpath("//tr[contains(., '%s')]/td[@class='button-cell']" % texttolookfor)
Which gives me:
<class 'selenium.common.exceptions.NoSuchElementException'>
But this works:
buttonelement = webDriver.find_element_by_xpath("//tr[contains(., 'Wednesday')]/td[@class='button-cell']")
So I've clearly misunderstood something or made a mistake feeding the string in. I've been trying variations and looking at sample code, but as far as I understand it this should have worked. I'd appreciate it if anyone could shed some light.
As asked in the comments I ran:
print "//tr[contains(., '%s')]/td[@class='button-cell']" % texttolookfor
which produced:
//tr[contains(., ' Wednesday')]/td[@class='button-cell']
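The printed XPath contains ' Wednesday' with a leading space, which suggests that the value actually held in texttolookfor carries stray whitespace. That is an assumption based only on the output shown. A minimal sketch of building the expression with the whitespace stripped first follows; build_button_xpath is a hypothetical helper name, not part of Selenium:

```python
def build_button_xpath(text_to_look_for):
    """Return the XPath for the button cell in the row containing the text.

    Hypothetical helper: strips surrounding whitespace before the text is
    substituted into the XPath, so ' Wednesday' and 'Wednesday' match alike.
    """
    cleaned = text_to_look_for.strip()
    return "//tr[contains(., '%s')]/td[@class='button-cell']" % cleaned


xpath = build_button_xpath(" Wednesday")
print(xpath)
```

The resulting string can be passed to the same find_element_by_xpath call used in the question.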

Related

changing value in python

I'm scraping some info from a site using Python, and one of the values has to go into the name of the file.
For this specific part I can't seem to get it to print right.
In the API there is a line like this:
BroadcastDate = 20100401
Now I want to print this value like this:
01.04.2010
I know a lot is possible in Python with text, but I can't figure it out or find anything on Google.
You smart guys probably know whether it's possible and how.
If something is unclear or you have a question, let me know!
EDIT
So I wrote the following piece of code, which in my head should work, but I don't get a response:
b = "20100104"
print((b[7:8]).(b[5:6]).(b[1:4]))
Thanks @Barmar, your answer put me in the right direction.
This worked:
b = "20100103"
year = b[0:4]
month = b[4:6]
day = b[6:8]
print(day + "." + month + "." + year)
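As an alternative to slicing, the standard library's datetime module can parse and reformat the value, and it also rejects strings that are not valid dates. A minimal sketch, assuming the value arrives as either an integer or a string in YYYYMMDD form; format_broadcast_date is a hypothetical name:

```python
from datetime import datetime


def format_broadcast_date(raw):
    """Convert a YYYYMMDD value (int or str) to DD.MM.YYYY."""
    # str() lets the function accept both 20100401 and "20100401".
    parsed = datetime.strptime(str(raw), "%Y%m%d")
    return parsed.strftime("%d.%m.%Y")


print(format_broadcast_date(20100401))
```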

Why do I need to put the condition passed to `where` inside quotes in this PySpark case?

I am learning Spark and had the following code:
shakeWordsDF = (shakespeareDF
                .select(explode(split(shakespeareDF.word, '[\s]+'))
                        .alias('word'))
                .where('word' != '')
               )
This code did not work; it failed with "condition should be string or column".
However, this code from someone else worked:
shakeWordsDF = (shakespeareDF
                .select(explode(split(shakespeareDF.word, '[\s]+'))
                        .alias('word'))
                .where("word!=''")
               )
I understand what the error says: it needs a string. But why doesn't the documentation say anything about strings? All the tutorials I have seen say that it should be
.where(x>2)
or (if I'm not wrong)
.where('columnName'>2)
and not
.where("x>2").
I'm really confused.
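One way to see what happens in the failing version: Python evaluates 'word' != '' itself, as a comparison between two ordinary strings, before where() is ever called. The sketch below uses plain Python only and needs no Spark installation. The closing comment about col() refers to the pyspark.sql.functions API and is an aside that the snippet does not execute.

```python
# Two plain Python strings are compared; Spark is not involved at all.
condition = 'word' != ''
print(type(condition).__name__, condition)

# The quoted form is a single string that Spark can parse as an expression.
string_condition = "word != ''"
print(type(string_condition).__name__)

# With PySpark available, a Column-based condition would be written as
# col('word') != '' (col imported from pyspark.sql.functions), because the
# comparison is then applied to a Column object instead of a str.
```

So the first version hands where() a bool, which is neither a string nor a column, while tutorials that write .where(x > 2) rely on x being a Column.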

Tor API example does not work correctly

I'm trying to run the example named "Using PycURL" from https://stem.torproject.org/tutorials/to_russia_with_love.html
Everything works fine, but at the end I get this error:
TypeError : String argument expected, got 'bytes'
Unable to reach http://google.com ((23, 'Failed writing body (0 != 144)'))
The question is, how can I fix this?
I've tried using PycURL as is, without any proxy, and it works fine.
But this example does not work.
I'm running Python 3.4 under Windows; here is my source code: http://pastebin.com/zFWrXU5E
Thanks.
P.S. I need this to work with PycURL specifically, because it is the most useful for my tasks.
P.S. #2: I added a small workaround and it seems to work: http://pastebin.com/x8PtL9i3
Heh.
P.S. #3: Hey! I found where the error comes from: it's in PycURL's WRITEFUNCTION; somehow the io.StringIO().write function does not work ...
Solved.
The problem was Python 3.4: the StringIO object behaves differently there.
All you need to do is change the type of the output variable from StringIO to BytesIO and then convert the bytes to a string before printing the result.
Here is the working source code: http://pastebin.com/Ad8ENTGe
Thanks.
P.S. Who placed -1 ???
haters...
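The StringIO versus BytesIO difference described above can be reproduced without PycURL or Tor. In the sketch below, the bytes literal is a made-up stand-in for a chunk of response body that a write callback would receive under Python 3, and the UTF-8 decoding is an assumption about the page's encoding:

```python
import io

# Hypothetical stand-in for one chunk of the HTTP response body.
chunk = b"<html>hello</html>"

# io.StringIO only accepts str, so writing bytes raises TypeError.
error_message = None
text_buffer = io.StringIO()
try:
    text_buffer.write(chunk)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# io.BytesIO accepts the bytes; decode once at the end for printing.
byte_buffer = io.BytesIO()
byte_buffer.write(chunk)
body = byte_buffer.getvalue().decode("utf-8")
print(body)
```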

python lxml xpath AttributeError (NoneType) with correct xpath and usually working

I am trying to migrate a forum to phpbb3 with python/xpath. Although I am pretty new to python and xpath, it is going well. However, I need help with an error.
(The source file has been downloaded and processed with tagsoup.)
Firefox/Firebug show xpath: /html/body/table[5]/tbody/tr[position()>1]/td/a[3]/b
(in my script without tbody)
Here is an abbreviated version of my code:
forumfile="morethread-alte-korken-fruchtweinkeller-89069-6046822-0.html"
XPOSTS = "/html/body/table[5]/tr[position()>1]"
t = etree.parse(forumfile)
allposts = t.xpath(XPOSTS)
XUSER = "td[1]/a[3]/b"
XREG = "td/span"
XTIME = "td[2]/table/tr/td[1]/span"
XTEXT = "td[2]/p"
XSIG = "td[2]/i"
XAVAT = "td/img[last()]"
XPOSTITEL = "/html/body/table[3]/tr/td/table/tr/td/div/h3"
XSUBF = "/html/body/table[3]/tr/td/table/tr/td/div/strong[position()=1]"
for p in allposts:
    unreg = 0
    username = None
    username = p.find(XUSER).text  # this is where it goes haywire
When the loop hits user "tompson" / position()=11 at the end of the file, I get
AttributeError: 'NoneType' object has no attribute 'text'
I've tried a lot of try except else finallys, but they weren't helpful.
I am getting much more information later in the script such as date of post, date of user registry, the url and attributes of the avatar, the content of the post...
The script works for hundreds of other files/sites of this forum.
This is no encode/decode problem. And it is not "limited" to the XUSER part. I tried to "hardcode" the username, then the date of registry will fail. If I skip those, the text of the post (code see below) will fail...
#text of getpost
text = etree.tostring(p.find(XTEXT),pretty_print=True)
Now, this whole error would make sense if my XPath were wrong. However, all the other files work, and so do the first users in this file. It is only this one at position()=11.
Is position() incapable of going above 10? I don't think so.
Am I missing something?
Question answered!
I have found the answer...
I must have been very tired when I tried to fix it and came here to ask for help. I did not see something quite obvious...
It was not visible from the way I posted my problem, either.
The HTML I downloaded and processed with tagsoup had an additional tag at position 11. This was not visible on the website and broke my XPath.
(It is probably bad HTML generated by the forum, combined with tagsoup's attempt to make it parseable.)
Out of more than 20000 files, fewer than 20 are affected; this one just happened to be the first.
Additionally, the information is sometimes in table[4] and other times in table[5]. I did account for this and wrote a function to determine the correct table. Although I tested the function a lot and thought it was working correctly (hence I did not include it above), it was not.
So I made a better xpath:
'/html/body/table[tr/td[@width="20%"]]/tr[position()>1]'
And, although this is not related, I ran into another problem with an unexpected encoding in the HTML file (not UTF-8), which was fixed by adding:
parser = etree.XMLParser(encoding='ISO-8859-15')
t = etree.parse(forumfile, parser)
I am now confident that after adjusting for strange additional and multiple , and tags my code will work on all files...
Still, I will be looking into lxml.html. As I mentioned in the comment, I have never used it before, but if it is more robust and lets me use the files without tagsoup, it might be a better fit and save me the extensive try/except statements and loops needed to fix the few files that break my current script.
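Independently of the parser used, the AttributeError can be avoided by checking the result of find() before reading .text, since find() returns None when a row lacks the expected structure. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in so that it runs without lxml. The sample markup and the collect_usernames helper are invented for illustration and are not taken from the forum files:

```python
import xml.etree.ElementTree as ET

XUSER = "td[1]/a[3]/b"

# Invented sample: the second row has stray markup instead of the user link.
SAMPLE = """
<table>
  <tr><td><a/><a/><a><b>alice</b></a></td></tr>
  <tr><td><span>stray markup</span></td></tr>
  <tr><td><a/><a/><a><b>bob</b></a></td></tr>
</table>
""".strip()


def collect_usernames(table):
    """Return (usernames, skipped_positions) for the rows of a table."""
    usernames, skipped = [], []
    for position, row in enumerate(table.findall("tr"), start=1):
        node = row.find(XUSER)
        if node is None:
            # Record the row instead of crashing on node.text.
            skipped.append(position)
            continue
        usernames.append(node.text)
    return usernames, skipped


usernames, skipped = collect_usernames(ET.fromstring(SAMPLE))
print(usernames, skipped)
```

Logging the skipped positions makes it easier to find the handful of affected files than wrapping every lookup in try/except.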

Getting started with json

I have never worked with JSON before. I am trying to use http://api.worldbank.org//topics?format=JSON and do things with it, but I don't even know how to get started.
Following some manuals, I did this:
import urllib
import urllib2
import simplejson
urlb = 'http://api.worldbank.org/topics'
param = urllib.urlencode({'format': 'JSON'})  # query string taken from the URL above
datab = urllib2.urlopen(urlb + '?' + param)
resultb = simplejson.load(datab)
But I have no clue how to parse and work with it now. How do I list the individual items, count them, or filter them? I checked diveintopython, json's website and most of the obvious sources, but I am still struggling with it. Is there a simple step-by-step tutorial that somebody could point me to?
Thanks
Try printing resultb. It's just a Python list with dictionaries inside it. Treat it like you would any list.
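To make "treat it like any list" concrete, here is a minimal sketch of listing, counting and filtering. The JSON text is made-up sample data with hypothetical "id" and "value" keys, not the actual World Bank response, whose structure should be checked by printing it first:

```python
import json

# Made-up sample data; the real API response may be shaped differently.
raw = """
[
  {"id": "1", "value": "Agriculture"},
  {"id": "2", "value": "Education"},
  {"id": "3", "value": "Energy"}
]
"""

topics = json.loads(raw)

# List the individual items.
for topic in topics:
    print(topic["id"], topic["value"])

# Count them.
count = len(topics)
print(count)

# Filter them with a list comprehension.
starting_with_e = [t["value"] for t in topics if t["value"].startswith("E")]
print(starting_with_e)
```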
