Strange urllib2.urlopen() error with variable vs string - python

I am having some strange behavior while using urllib2 to open a URL and download a video.
I am trying to open a video resource and here is an example link:
https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361
I have the following code:
mp4_url = ''
# response_body is a JSON response that I get the mp4_url from
if response_body['outputs'][0]['label'] == 'mp4':
    mp4_url = response_body['outputs'][0]['url']

if mp4_url:
    logging.info('this is the mp4_url')
    logging.info(mp4_url)
    # if I add the line directly below this then it works just fine
    mp4_url = 'https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361'
    mp4_video = urllib2.urlopen(mp4_url)
    logging.info('successfully opened the url')
The code works when I add the designated line, but without it I get an HTTP Error 403: Forbidden message, which makes me think something is mangling mp4_url. The confusing part is that when I check the logging line for mp4_url, it is exactly what I hardcoded in there. What could the difference be? Are there some characters in there that may be disrupting it? I have tried converting it to a string with:
mp4_video = urllib2.urlopen(str(mp4_url))
But that didn't do anything. Any ideas?
UPDATE:
Following the suggestion to use print repr(mp4_url), I get:
u'https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361'
I suppose that difference is what is causing the error, but what would be the best way to handle it?
UPDATE II:
It turned out that I did need to cast it to a string, but the source I was getting the link from (an encoded video) also needed nearly a 60-second delay before it could serve that URL. That is why it kept working when I hardcoded the link: by the time I had pasted it in, the delay had already passed. Thanks for the help!
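Given that resolution, one way to absorb such a delay is a small retry loop around urlopen. This is only a sketch of my own: the function name, attempt schedule, and the decision to retry on 403 are illustrative assumptions, not part of the original post.
import time
import urllib2

def open_with_retry(url, attempts=6, delay=10):
    # The freshly encoded video may not be servable until roughly
    # a minute after the URL is issued, so retry on 403.
    for attempt in range(attempts):
        try:
            return urllib2.urlopen(str(url))  # also casts a unicode URL to str
        except urllib2.HTTPError as e:
            if e.code == 403 and attempt < attempts - 1:
                time.sleep(delay)
            else:
                raise

mp4_video = open_with_retry(mp4_url)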

It would be better to simply dump the response you obtain. That way you can check what response_body['outputs'][0]['label'] actually evaluates to. In your case, you are initializing mp4_url to ''. An empty string is falsy in Python, so if the label check never matches, the if mp4_url: block is silently skipped.
You may want to verify that the initial if statement, where you compare response_body['outputs'][0]['label'], is correct.

Related

Can't replace spaces in a Python variable

I tried to replace spaces in a variable in Python, but it returns this error:
AttributeError: 'HTTPHeaders' object has no attribute 'replace'
This is my code:
for req in driver.requests:
    print(req.headers)
    d = req.headers
    x = d.replace("""
""", "")
So, if you check out the class HTTPHeaders, you'll see it has a __repr__ function and that it's an HTTPMessage object.
Depending on what exactly you want to achieve (which is still not clear to me, i.e., for which header do you want to replace spaces?), you can go about this in two ways: use the methods on the HTTPMessage object (documented here), or use the string version of it by calling repr on the response. I recommend the first approach, as it is much cleaner.
I'll give an example in which I remove spaces from all canary values in all of the requests:
for req in driver.requests:
    canary = req.headers.get("canary")
    if canary is not None:  # the header may be absent on some requests
        canary = canary.replace(" ", "")
P.S. Your question is nowhere near clear enough as it stands. Only after asking multiple times and linking your other question did it become clear that you are using seleniumwire, for example. Ideally, the code you provide can be run by anyone with the required packages installed and reproduces the issue you have. But all right, the comments made it clearer.
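For the second approach, a rough sketch of working on the string form of the headers (my own illustration; which whitespace to strip depends on what you are after):
for req in driver.requests:
    header_text = str(req.headers)  # render the HTTPMessage as plain text
    collapsed = header_text.replace("\n", "")  # e.g. drop the line breaks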

In Python pyngrok error for .replace method

I get an error on this line:
link = ngrok.connect(4040,"http").replace("http","https")
Error:
Instance of 'NgrokTunnel' has no 'replace' member
I've tested it: your link is not a string. You have to convert it into a string in order to replace text.
This works with the function str().
link = str(ngrok.connect()).replace("http", "https")
The accepted answer is not quite correct, as the string you'll end up with is [<NgrokTunnel: "https://<public_sub>.ngrok.io" -> "http://localhost:80">] when the string you want is just the https://<public_sub>.ngrok.io part.
The NgrokTunnel object has a public_url attribute, which is what you want, so do this:
link = ngrok.connect(4040, "http").public_url.replace("http","https")
Moreover, if you don't even need the http port opened, this will just give you the https link by opening only a single tunnel, with no need to manipulate the string:
link = ngrok.connect(4040, bind_tls=True).public_url
It's worth noting the accepted answer will work if you are using an older version of pyngrok (pre-5.0.0 release).
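Putting it together, a minimal self-contained sketch (the port number is the example's value, and bind_tls applies to the pyngrok versions discussed above):
from pyngrok import ngrok

# Open one HTTPS-only tunnel and read the URL off the NgrokTunnel object.
tunnel = ngrok.connect(4040, bind_tls=True)
link = tunnel.public_url  # e.g. "https://<public_sub>.ngrok.io"
print(link)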

Simple split string not working in zapier

I'm using Zapier to put different apps together. I need to split a string custom_id that has 6 parts separated by underscores. For example, sk000_i093_14.50_5_MNE_2017-07-25
Here's my code:
split_str = input_data['custom_id'].split("_")
output = [{'sk':split_str[0], 'buy_invoice':split_str[1], 'sales_amt':split_str[2], 'UPI':split_str[3], 'buyer':split_str[4], 'date_buy':split_str[5]}]
I also tried it this way:
sk, buy_invoice, sales_amt, upi, buyer, date_buy = input_data['custom_id'].split("_")
output = [{'sk':sk, 'buy_invoice':buy_invoice, 'sales_amt':sales_amt, 'upi':upi, 'buyer':buyer, 'date_buy':date_buy}]
I've searched and searched and haven't found anything specific to Zapier on why my simple string split isn't working. When I test the code, Zapier doesn't give a useful error message, just:
"Bargle. We hit an error creating a run python. Error: Your code had an error!"
I've tried running it multiple ways, but whenever I try to retrieve the data from the split I get the very unhelpful error message.
Any help is very much appreciated! Thanks!
UPDATE:
When you go to test the code, Zapier shows test data for input_data. Even though this data is showing up correctly, during the actual test run input_data is empty! So there was nothing wrong with the split. Phew!
Thanks!
The split was correct. The problem was that input_data wasn't being populated: even though Zapier showed the correct data was going to populate it, input_data was empty during the run. I added some more key:value pairs to input_data because I needed them, refreshed the webpage, refreshed the fields, and re-tested the code; input_data finally got populated and the code ran perfectly.
Thanks to PRMoureu and E. Ducateme for giving me the idea to check my input_data (Duh!).
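A defensive version of the split makes this failure mode visible instead of silent. This is my own sketch: the .get() guard and the error message are illustrative, while the field names come from the question.
custom_id = input_data.get('custom_id')
if not custom_id:
    # Surfaces the real problem: Zapier handed the step an empty input_data.
    raise ValueError('input_data is empty or missing custom_id')

sk, buy_invoice, sales_amt, upi, buyer, date_buy = custom_id.split("_")
output = [{'sk': sk, 'buy_invoice': buy_invoice, 'sales_amt': sales_amt,
           'upi': upi, 'buyer': buyer, 'date_buy': date_buy}]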

Python lxml XPath AttributeError (NoneType) with a correct and usually working XPath

I am trying to migrate a forum to phpBB3 with Python/XPath. Although I am pretty new to Python and XPath, it is going well. However, I need help with an error.
(The source file has been downloaded and processed with tagsoup.)
Firefox/Firebug show xpath: /html/body/table[5]/tbody/tr[position()>1]/td/a[3]/b
(in my script without tbody)
Here is an abbreviated version of my code:
forumfile="morethread-alte-korken-fruchtweinkeller-89069-6046822-0.html"
XPOSTS = "/html/body/table[5]/tr[position()>1]"
t = etree.parse(forumfile)
allposts = t.xpath(XPOSTS)
XUSER = "td[1]/a[3]/b"
XREG = "td/span"
XTIME = "td[2]/table/tr/td[1]/span"
XTEXT = "td[2]/p"
XSIG = "td[2]/i"
XAVAT = "td/img[last()]"
XPOSTITEL = "/html/body/table[3]/tr/td/table/tr/td/div/h3"
XSUBF = "/html/body/table[3]/tr/td/table/tr/td/div/strong[position()=1]"
for p in allposts:
unreg=0
username = None
username = p.find(XUSER).text #this is where it goes haywire
When the loop hits user "tompson" / position()=11 at the end of the file, I get
AttributeError: 'NoneType' object has no attribute 'text'
I've tried a lot of try/except/else/finally blocks, but they weren't helpful.
I am getting much more information later in the script such as date of post, date of user registry, the url and attributes of the avatar, the content of the post...
The script works for hundreds of other files/sites of this forum.
This is not an encode/decode problem, and it is not "limited" to the XUSER part. If I "hardcode" the username, then the date of registry fails; if I skip those, the text of the post (see the code below) fails...
#text of getpost
text = etree.tostring(p.find(XTEXT), pretty_print=True)
Now, this whole error would make sense if my XPath were wrong. However, all the other files, and the first users in this file, work. It is only this one at position()=11.
Is position() incapable of going beyond 10? I don't think so.
Am I missing something?
Question answered!
I have found the answer...
I must have been very tired when I tried to fix it and came here to ask for help. I did not see something quite obvious...
The way I posted my problem, it was not visible either.
The HTML I downloaded and processed with tagsoup had an additional tag at position 11... it was not visible on the website and screwed with my XPath.
(It is probably crappy HTML generated by the forum, in combination with tagsoup's attempt to make it parseable.)
Out of >20,000 files, fewer than 20 are affected; this one just happened to be the first...
Additionally, sometimes the information is in table[4] and other times in table[5]. I did account for this and wrote a function to determine the correct table. Although I tested that function a lot and thought it was working correctly (hence did not include it above), it was not.
So I made a better xpath:
'/html/body/table[tr/td[@width="20%"]]/tr[position()>1]'
And, although this is not related, I ran into another problem with unexpected encoding in the HTML file (not UTF-8), which was fixed by adding:
parser = etree.XMLParser(encoding='ISO-8859-15')
t = etree.parse(forumfile, parser)
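For reference, the two fixes combine like this (my own consolidation of the lines above, nothing new added):
parser = etree.XMLParser(encoding='ISO-8859-15')
t = etree.parse(forumfile, parser)
allposts = t.xpath('/html/body/table[tr/td[@width="20%"]]/tr[position()>1]')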
I am now confident that, after adjusting for strange additional and duplicated tags, my code will work on all files...
Still, I will be looking into lxml.html. As I mentioned in the comment, I have never used it before, but if it is more robust and allows using the files without tagsoup, it might be a better fit and save me extensive try/except statements and loops to fix the few files breaking my current script...
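In the meantime, a defensive lookup avoids the AttributeError when a node is missing from one of the malformed files; a minimal sketch of my own (the skip message is illustrative):
for p in allposts:
    user_node = p.find(XUSER)
    if user_node is None:
        print 'skipping post without a user node'  # one of the ~20 afflicted files
        continue
    username = user_node.text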

Figuring out the possible attributes of an object

Regarding this code from python-blogger:
def listposts(service, blogid):
    feed = service.Get('/feeds/' + blogid + '/posts/default')
    for post in feed.entry:
        print post.GetEditLink().href.split('/')[-1], post.title.text, "[DRAFT]" if is_draft(post) else ""
I want to know what fields exist in feed.entry, but I'm not sure where to look in these docs to find out.
So I don't just want an answer; I want to know how I should have navigated the docs to find out for myself.
Try dir(feed.entry).
It may be useful in your case.
It's a case of working through it, step by step.
The first thing I did was click on service in the link you sent... based on feed = service.Get(...)
Which leads here: http://gdata-python-client.googlecode.com/hg/pydocs/gdata.service.html
Then, looking at .Get(), it states:
Returns:
If there is no ResultsTransformer specified in the call, a GDataFeed
or GDataEntry depending on which is sent from the server. If the
response is neither a feed nor entry and there is no ResultsTransformer,
return a string. If there is a ResultsTransformer, the returned value
will be that of the ResultsTransformer function.
So I'm guessing you've got a GDataFeed, as you're iterating over it, and a quick Google for "google GDataFeed" leads to: https://developers.google.com/gdata/jsdoc/1.10/google/gdata/Feed
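When the docs run dry, runtime introspection (as the first answer suggests) is the complementary route; a small sketch, where filtering out private names is my own choice:
entry = feed.entry[0]
# List the public attributes actually present on an entry object.
print [name for name in dir(entry) if not name.startswith('_')]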
