How to extract a long URL from an email with Python?

I need to extract a very long URL (example below) from an email message that I grab using Gmail's IMAP.
https://example.com/account/resetpassword?code=e8EkT%2B48uMCHr3Sq4QZVr0%2FVHrTBwQvhYwubjeaKozn29I7VGvWSYNO6VNRLXCK230P%2FklDrFC6BpPI7OF%2F5yawHlux80jqTBhTq2QRS4r7sEnSM9qKV1mIXkTzx%2B5tjakgElg%3D%3D&returnUrl=example.com
However, when I print the message I fetched, I notice that the long URL contains extra characters such as =\r\n and 3D (see the examples below), or it is split across several lines ending in =.
https://example.com/account/resetpa=\r\nssword?code=3De8EkT%2B48uMCHr3Sq4QZVr0%2FVHrTBwQvhYwubjeaKozn29I7VGvWSYNO6V=\r\nNRLXCK230P%2FklDrFC6BpPI7OF%2F5yawHlux80jqTBhTq2QRS4r7sEnSM9qKV1mIXkTzx%2B5=\r\ntjakgElg%3D%3D&returnUrl=3Dexample.com
https://example.com/account/resetpa=
ssword?code=3De8EkT%2B48uMCHr3Sq4QZVr0%2FVHrTBwQvhYwubjeaKozn29I7VGvWSYNO6V=
NRLXCK230P%2FklDrFC6BpPI7OF%2F5yawHlux80jqTBhTq2QRS4r7sEnSM9qKV1mIXkTzx%2B5=
tjakgElg%3D%3D&returnUrl=3Dexample.com
How can I make sure that nothing is added to the long URL, so that I can use it later to open the page?

I believe the format with = and 3D is called quoted-printable: https://en.wikipedia.org/wiki/Quoted-printable
You could try using quopri.decodestring(string). https://docs.python.org/2/library/quopri.html
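A minimal sketch of that suggestion in Python 3, using the second example from the question as input (written as bytes, since quopri works on bytes). quopri.decodestring removes the =\r\n soft line breaks and turns each =3D back into =, while the %2B, %2F and %3D percent-escapes, which genuinely belong to the URL, are left alone:

```python
import quopri

# The broken URL from the question, exactly as it appears in the raw message.
encoded = (
    b"https://example.com/account/resetpa=\r\n"
    b"ssword?code=3De8EkT%2B48uMCHr3Sq4QZVr0%2FVHrTBwQvhYwubjeaKozn29I7VGvWSYNO6V=\r\n"
    b"NRLXCK230P%2FklDrFC6BpPI7OF%2F5yawHlux80jqTBhTq2QRS4r7sEnSM9qKV1mIXkTzx%2B5=\r\n"
    b"tjakgElg%3D%3D&returnUrl=3Dexample.com"
)

# Undo the quoted-printable encoding: "=\r\n" disappears and "=3D" becomes "=".
url = quopri.decodestring(encoded).decode("ascii")
print(url)
```

The printed value should match the original URL from the top of the question character for character.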

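"\r\n" is a carriage return followed by a line feed (CRLF), which you can get rid of with urlstring.replace("\r\n", ""). %3D is the URL percent-encoding of =, and it legitimately belongs in the URL, so it should not be an issue for you. The line breaks are what print your URL on different lines. Note, however, that removing "\r\n" alone still leaves the = at the end of each broken line and the 3D after each =; those come from the quoted-printable encoding and have to be removed as well.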
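To pull the URL out of the whole message rather than a pre-cut string, the standard email package can undo the quoted-printable transfer encoding for you. A minimal sketch, assuming `raw` holds the full message bytes as returned by an IMAP fetch (for example imaplib with "(RFC822)"). The sample message and the extract_urls helper are hypothetical, and the https?://\S+ pattern is a simple assumption that works because the decoded URL contains no whitespace:

```python
import email
import re

# Hypothetical raw message, shaped like what Gmail returns over IMAP.
raw = (
    b"From: noreply@example.com\r\n"
    b"Subject: Reset your password\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Click here: https://example.com/account/resetpa=\r\n"
    b"ssword?code=3Dabc%2Bdef&returnUrl=3Dexample.com\r\n"
)

def extract_urls(raw_message):
    """Return every http(s) URL found in the text/plain parts of a message."""
    msg = email.message_from_bytes(raw_message)
    urls = []
    for part in msg.walk():  # a non-multipart message yields only itself
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes the Content-Transfer-Encoding (quoted-printable here)
        payload = part.get_payload(decode=True)
        text = payload.decode(part.get_content_charset() or "utf-8")
        urls.extend(re.findall(r"https?://\S+", text))
    return urls

print(extract_urls(raw))
```

Because decoding happens before the regex runs, the soft line breaks are already gone and the URL comes out in one piece.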

Related

Can't replace spaces in a Python variable

I tried to replace spaces in a variable in Python, but it gives me this error:
AttributeError: 'HTTPHeaders' object has no attribute 'replace'
This is my code:
for req in driver.requests:
    print(req.headers)
    d = req.headers
    x = d.replace("""
""", "")
If you look at the HTTPHeaders class, you'll see that it has a __repr__ method and that it is an HTTPMessage object.
Depending on what exactly you want to achieve (which is still not clear to me: for which header do you want to replace spaces?), you can go about this in two ways: use the methods on the HTTPMessage object, which come from the standard library's email.message.Message class, or use the string version of it by calling repr() on the headers. I recommend the first approach, as it is much cleaner.
Here is an example in which I remove the spaces from the canary header value in every request:
for req in driver.requests:
    canary = req.headers.get("canary")
    # .get() returns None when the header is missing, so guard before replacing
    if canary is not None:
        canary = canary.replace(" ", "")
P.S. Your question is nowhere near clear enough as it stands. For example, only after I asked multiple times and you linked your other question did it become clear that you are using seleniumwire. Ideally, the code you provide can be run by anyone with the required packages installed and reproduces the issue you have. But alright, the comments made it clearer.
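To try the "methods on the HTTPMessage object" approach without a browser, the standard library's http.client.parse_headers can build an HTTPMessage from raw header bytes. A minimal sketch, assuming seleniumwire's HTTPHeaders behaves like its HTTPMessage parent for .get() and .items(); the RAW header block and the strip_header_spaces helper are hypothetical:

```python
import io
from http.client import HTTPMessage, parse_headers

# Hypothetical raw header block; in seleniumwire you would use req.headers instead.
RAW = b"Host: example.com\r\nCanary: a b c\r\nAccept: text/html\r\n\r\n"

def strip_header_spaces(headers, name):
    """Return header `name` with all spaces removed, or None if it is absent."""
    value = headers.get(name)  # header lookup is case-insensitive
    return None if value is None else value.replace(" ", "")

headers = parse_headers(io.BytesIO(RAW))
print(isinstance(headers, HTTPMessage))         # True
print(strip_header_spaces(headers, "canary"))   # abc
print(strip_header_spaces(headers, "missing"))  # None

# Cleaning every header at once, without going through repr() or str():
cleaned = {name: value.replace(" ", "") for name, value in headers.items()}
print(cleaned)
```

Working with .get() and .items() keeps header names and values separate, which is what makes this cleaner than string-replacing the repr().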

Python pyngrok error with the .replace method

I get an error on this line:
link = ngrok.connect(4040,"http").replace("http","https")
Error:
Instance of 'NgrokTunnel' has no 'replace' member
I've tested it.
Your link is not a string. You have to convert it into a string before you can replace text in it.
You can do that with the str() function:
link = str(ngrok.connect()).replace("http", "https")
The accepted answer is not quite correct, as the string you'll end up with is [<NgrokTunnel: "https://<public_sub>.ngrok.io" -> "http://localhost:80">] when the string you want is just the https://<public_sub>.ngrok.io part.
The NgrokTunnel object has a public_url attribute, which is what you want, so do this:
link = ngrok.connect(4040, "http").public_url.replace("http","https")
Moreover, if you don't need the http tunnel at all, the following opens only a single tunnel and gives you the https link directly, with no need to manipulate the string:
link = ngrok.connect(4040, bind_tls=True).public_url
It's worth noting the accepted answer will work if you are using an older version of pyngrok (pre-5.0.0 release).
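One caveat with .replace("http", "https") is that it rewrites every "http" substring, so a URL that already starts with https becomes "httpss://". A minimal sketch that changes only the scheme using the standard library's urllib.parse, assuming public_url is a plain URL string; the to_https helper and the sample tunnel URL are hypothetical:

```python
from urllib.parse import urlsplit

def to_https(url):
    """Switch only the scheme of `url` to https, leaving the rest untouched."""
    return urlsplit(url)._replace(scheme="https").geturl()

public_url = "http://abc123.ngrok.io"  # hypothetical tunnel URL
print(to_https(public_url))                                # https://abc123.ngrok.io
print(to_https("https://abc123.ngrok.io"))                 # https://abc123.ngrok.io
print("https://abc123.ngrok.io".replace("http", "https"))  # httpss://abc123.ngrok.io
```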

Raw HTTP response in Go

I'm making a request to an endpoint, but for some reason the response body only contains the last line of the response (the whole response is captured in Fiddler). The same thing happens if I recreate the request in Python using the requests module. However, I've noticed that if I take the entire raw response in Python, I can see all the lines (separated by multiple \r). Is it possible to view the whole raw response in Go, like with response.raw.data in Python? In other words, is there a way to view the whole text response instead of having everything but the last line cut off? If anyone knows why everything but the last line is being cut off, that would be greatly appreciated as well.
To clarify, this only happens with this one endpoint, and I suspect the \r characters in the response body may be the culprit, but I am unsure. I haven't seen this behaviour with any other HTTP response.
Edit: this is the code I'm using to view the response:
bodyB, _ := ioutil.ReadAll(resp.Body)
bodyStr := string(bodyB)
\r is a carriage return, not a newline, so when you print the body you do get all of the lines, but each one is overwritten by the next because the cursor returns to the start of the line.
You probably want to do:
bodyB, _ := ioutil.ReadAll(resp.Body)
bodyStr := string(bytes.Replace(bodyB, []byte("\r"), []byte("\r\n"), -1))
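The same effect can be reproduced and fixed in Python, which the asker also tried with requests (there you would pass response.content). A minimal sketch; normalize_newlines and the sample body are hypothetical. It replaces \r\n before bare \r, so existing CRLF pairs do not turn into double line breaks, which is one caveat with replacing every \r by \r\n as in the Go snippet above:

```python
def normalize_newlines(body: bytes) -> bytes:
    """Turn \\r\\n and bare \\r into \\n so every line survives printing."""
    # Replace \r\n first so it is not doubled into two line breaks.
    return body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Hypothetical body: printed raw in a terminal, later lines overwrite earlier
# ones, because each bare \r moves the cursor back to the start of the line.
body = b"first line\rsecond line\r\nlast line"
print(normalize_newlines(body).decode())
print(body.splitlines())  # bytes.splitlines() also treats \r, \n and \r\n as breaks
```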

Improper output when appending the result of an AJAX request

The AJAX output is:
\u001b[1mGetting NS records for yahoo.com\u001b[0m\n\n\n\nIp Address\tServer Name\n\n----------\t-----------\n\n68.180.131.16\tns1.yahoo.com\n\n98.138.11.157\tns4.yahoo.com\n\n203.84.221.53\tns3.yahoo.com\n\n68.142.255.16\tns2.yahoo.com\n\n119.160.247.124\tns5.yahoo.com\n\n202.43.223.170\tns6.yahoo.com\n\n\n\nZone Transfer not enabled\n\n
When I append it to the HTML, it looks like this:
[1mGetting NS records for yahoo.com[0m Ip Address Server Name ---------- ----------- 68.180.131.16 ns1.yahoo.com 98.138.11.157 ns4.yahoo.com 203.84.221.53 ns3.yahoo.com 68.142.255.16 ns2.yahoo.com 119.160.247.124 ns5.yahoo.com 202.43.223.170 ns6.yahoo.com Zone Transfer not enabled
The "\t" and "\n" characters don't seem to be working.
Please help.
HTML does not render tabs and line breaks in text; the browser collapses them into single spaces. For a line break in HTML, use <br>. There are no tabs in HTML, but if you just want to insert some spaces, you can use &nbsp; for each blank space (of course, you can always insert a single ordinary space, but multiple spaces will get collapsed unless you explicitly use &nbsp;).
Another option is to wrap your text in a <pre></pre> element to display the text exactly as you have it formatted in the HTML source (you may need to play with the CSS if you don't like the default formatting of <pre> content). web2py also includes a CODE() helper, which uses <pre> but also enables line numbers and syntax highlighting.
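A server-side sketch of the <pre> approach in Python (web2py is a Python framework), assuming the command output reaches you as a str before it is sent to the browser. The \u001b[1m and \u001b[0m sequences in the question are ANSI terminal codes for bold and reset, which a browser shows literally, so the sketch strips them first and escapes the rest so characters like < cannot break the page. The terminal_text_to_html helper is hypothetical:

```python
import html
import re

# ANSI SGR escape sequences such as \x1b[1m (bold) and \x1b[0m (reset).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

def terminal_text_to_html(text: str) -> str:
    """Strip ANSI codes, escape HTML, and keep tabs/newlines by wrapping in <pre>."""
    cleaned = ANSI_SGR.sub("", text)
    return "<pre>" + html.escape(cleaned) + "</pre>"

sample = "\x1b[1mGetting NS records for yahoo.com\x1b[0m\n\nIp Address\tServer Name\n"
print(terminal_text_to_html(sample))
```

Inside <pre>, the browser keeps the \n and \t characters as they are, so no <br> or &nbsp; substitution is needed.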

Strange urllib2.urlopen() error with variable vs string

I am having some strange behavior while using urllib2 to open a URL and download a video.
I am trying to open a video resource and here is an example link:
https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361
I have the following code:
mp4_url = ''
# response_body is a JSON response that I get the mp4_url from
if response_body['outputs'][0]['label'] == 'mp4':
    mp4_url = response_body['outputs'][0]['url']
if mp4_url:
    logging.info('this is the mp4_url')
    logging.info(mp4_url)
    # if I add the line directly below this, then it works just fine
    mp4_url = 'https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361'
    mp4_video = urllib2.urlopen(mp4_url)
    logging.info('successfully opened the url')
The code works when I add the designated line, but without it I get an HTTP Error 403: Forbidden message, which makes me think something is messing up mp4_url. The confusing part is that when I check the logged mp4_url, it is exactly what I hardcoded. What could the difference be? Are there some characters in there that may be disrupting it? I have tried converting it to a string by doing:
mp4_video = urllib2.urlopen(str(mp4_url))
But that didn't do anything. Any ideas?
UPDATE:
Following the suggestion to use print repr(mp4_url), I get:
u'https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361'
I suppose this difference (the u prefix, meaning it is a unicode string) is what causes the error, but what would be the best way to deal with it?
UPDATE II:
It turned out that I did need to cast it to a string, but also that the source I was getting the link from (an encoded video) needed a delay of nearly 60 seconds before it could serve that URL. That is why it kept working when I hardcoded it: by then the delay had already passed. Thanks for the help!
It would be better to simply dump the response you obtained, so you can check what response_body['outputs'][0]['label'] evaluates to. In your case, you are initializing mp4_url to ''. This is not the same as None, but both are falsy, so the body of if mp4_url: only runs when the label check succeeded and a non-empty URL was assigned.
You may want to check that the initial if statement, which tests response_body['outputs'][0]['label'], behaves as you expect.
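Since the resolution in UPDATE II was that the URL only became available after roughly 60 seconds, one option is to retry instead of hardcoding a wait. A minimal sketch in Python 3's urllib.request (the successor of urllib2), assuming that a 403 from this particular source means "not ready yet", which is not true of 403 responses in general. The open_when_ready helper and its attempts/delay defaults are hypothetical guesses, and the open_url/sleep parameters exist only so it can be tested without network access:

```python
import time
import urllib.error
import urllib.request

def open_when_ready(url, attempts=6, delay=10.0,
                    open_url=urllib.request.urlopen, sleep=time.sleep):
    """Open `url`, retrying on HTTP 403 while the resource is not yet available.

    Any other HTTP error, or a 403 on the last attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return open_url(url)
        except urllib.error.HTTPError as err:
            if err.code != 403 or attempt == attempts - 1:
                raise
            sleep(delay)  # give the encoder time to publish the file
```

Usage would be along the lines of mp4_video = open_when_ready(mp4_url); with the defaults it waits up to about 50 seconds in total before giving up.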
