How do I change my headers and request so that I appear as Firefox when making requests to some servers?
import urllib
f = urllib.urlopen("rss feed")
They deny my request: I do get a reply, but it only says "your client doesn't have permission". So how do I get around this and fetch the data?
http://vsbabu.org/mt/archives/2003/05/27/urllib2_setting_http_headers.html
If you want to use good old urllib instead of newer, fancier urllib2, then as urllib's docs say, and I quote,
For example, applications may want to specify a different User-Agent header than URLopener defines. This can be accomplished with the following code:
import urllib

class AppURLopener(urllib.FancyURLopener):
    version = "App/1.7"

urllib._urlopener = AppURLopener()
Of course, you'll want a version (aka User-Agent header) suitable for whatever version of Firefox (or w/ever else;-) you want to pretend you are;-).
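For example, a minimal sketch; the User-Agent string below is only an illustration, so substitute the string for whatever Firefox version you want to impersonate:
import urllib

# Pose as Firefox; this UA string is just an example.
class FirefoxURLopener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0'

urllib._urlopener = FirefoxURLopener()
f = urllib.urlopen("http://example.com/rss")  # now sent with the Firefox UA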
Related
Basically I need a program that, given a URL, downloads a file and saves it. I know this should be easy, but there are a couple of drawbacks here...
First, it is part of a tool I'm building at work. I have everything else working; the URL is HTTPS, and it's one of those you would paste into your browser and get a pop-up asking whether you want to open or save the file (.txt).
Second, I'm a beginner at this, so if there's info I'm not providing please ask me. :)
I'm using Python 3.3 by the way.
I tried this:
import urllib.request
response = urllib.request.urlopen('https://websitewithfile.com')
txt = response.read()
print(txt)
And I get:
urllib.error.HTTPError: HTTP Error 401: Authorization Required
Any ideas? Thanks!!
You can do this easily with the requests library.
import requests
response = requests.get('https://websitewithfile.com/text.txt', verify=False, auth=('user', 'pass'))
print(response.text)
To save the file you would write:
with open('filename.txt', 'w') as fout:
    fout.write(response.text)
(I would suggest you always set verify=True in the requests.get() call, so the server's SSL certificate actually gets checked.)
Here is the documentation: https://requests.readthedocs.io/
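If the file isn't plain text, write the raw bytes (response.content) instead, and open the output file in binary mode:
# For binary downloads, use the undecoded bytes from requests.
with open('filename.bin', 'wb') as fout:
    fout.write(response.content)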
Doesn't the browser also ask you to sign in? Then you need to repeat the request with the authentication added, as in these answers:
Python urllib2, basic HTTP authentication, and tr.im
Equally good: Python, HTTPS GET with basic authentication
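If you'd rather stay with the standard library on Python 3, here is a minimal sketch of HTTP Basic authentication; the URL and credentials are placeholders:
import urllib.request

url = 'https://websitewithfile.com/text.txt'
# Register the credentials for this URL; realm=None matches any realm.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, 'user', 'pass')
opener = urllib.request.build_opener(
    urllib.request.HTTPBasicAuthHandler(password_mgr))

response = opener.open(url)
with open('filename.txt', 'wb') as fout:
    fout.write(response.read())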
If you don't have the Requests module, the code below works for Python 2.6 or later. Not sure about 3.x.
import urllib
testfile = urllib.URLopener()
testfile.retrieve("https://randomsite.com/file.gz", "/local/path/to/download/file")
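For the record, the URLopener class is deprecated in Python 3; the equivalent one-liner there is urllib.request.urlretrieve:
# Python 3 version of the same download.
import urllib.request
urllib.request.urlretrieve("https://randomsite.com/file.gz", "/local/path/to/download/file")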
You can try this solution: https://github.qualcomm.com/graphics-infra/urllib-siteminder
import siteminder
import getpass

url = 'https://XYZ.dns.com'
r = siteminder.urlopen(url, getpass.getuser(), getpass.getpass(), "dns.com")
# getpass.getpass() prompts on the terminal: Password: <enter your password>
data = r.read()  # or pd.read_html(r.read()) -- requires "import pandas as pd"
This is a newbie problem with python, advice is much appreciated.
no-ip.com provides an easy way to update a computer's changing IP address: simply open the URL
http://user:password@dynupdate.no-ip.com/nic/update?hostname=my.host.name
...both http and https work when entered in Firefox. I tried to implement that in a script residing in /etc/NetworkManager/dispatcher.d, to be used by NetworkManager on a recent version of Ubuntu.
What works is the python script:
from urllib import urlopen
urlopen("http://user:password@dynupdate.no-ip.com/nic/update?hostname=my.host.name")
What I want to have is the same with "https", which does not work as easily. Could anyone, please,
(1) show me what the script should look like for https,
(2) give me some keywords, which I can use to learn about this.
(3) perhaps even explain why it does not work any more when the script is changed to use "urllib2":
from urllib2 import urlopen
urlopen("http://user:password@dynupdate.no-ip.com/nic/update?hostname=my.host.name")
Thank you!
The user:password part isn't part of the actual URL; it's a shortcut for HTTP authentication, and the browser's URL parsing filters it out before sending the request. With urllib2 you want something like this:
import base64, urllib2

user, password = 'john_smith', '123456'
request = urllib2.Request('https://dynupdate.no-ip.com/nic/update?hostname=my.host.name')
# HTTP Basic auth: base64-encode "user:password" and send the header ourselves.
auth = base64.b64encode(user + ':' + password)
request.add_header('Authorization', 'Basic ' + auth)
urllib2.urlopen(request)
Is there any Python HTTP library that helps to imitate one of the popular web browsers and has HTTPS support? I would like to define the order of HTTP headers, the presence of each exact header, the order of cookie values -- everything that relates to the "fingerprint" of a browser. We need this to test a specific web server.
httplib's HTTPConnection.request will take an OrderedDict for headers. Some headers are added automatically for protocol compliance, but they are left out if you specify them yourself in the supplied headers.
Take a look at the putheader and _send_request methods, which you could override if their behaviour didn't suit your purposes.
>>> import httplib
>>> from collections import OrderedDict
>>> h = OrderedDict([('X-A', 'a'), ('X-B', 'b'), ('X-C', 'c')])
>>> c = httplib.HTTPConnection('localhost')
>>> c.set_debuglevel(1)
>>> r = c.request('GET','/','',h)
send: 'GET / HTTP/1.1\r\nHost: localhost\r\nAccept-Encoding: identity\r\nX-A: a\r\nX-B: b\r\nX-C: c\r\n\r\n'
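For example, a rough sketch of overriding putheader to suppress one of the automatically added headers (this matches the Python 2 httplib signature; treat it as a starting point rather than a drop-in solution):
import httplib

class PickyHTTPConnection(httplib.HTTPConnection):
    # Drop Accept-Encoding entirely, e.g. to mimic a browser that
    # doesn't send it; everything else goes through unchanged.
    def putheader(self, header, *values):
        if header.lower() != 'accept-encoding':
            httplib.HTTPConnection.putheader(self, header, *values)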
Check out Requests, which is very easy to work with and has all you need.
Alternatively, you can drive a web browser itself from Python using Selenium.
Can someone explain to me why things are implemented the following way in urllib2?
When I pass an already-encoded URL over http, it encodes the parameters again, whereas over https it does not urlencode them again.
So let's say the (http) call is http://example.com?email=amit%40sethi.com; the request becomes
http://example.com?email=amit%2540sethi.com
whereas in the https case it stays
https://example.com?email=amit%40sethi.com
Thanks
Edit : Adding more details
The basic request I am making is
SF_EXTEND_RESOURCE = "https://www.superfax.in/api/voice/planchange/?"
params_dict = {'username': USERNAME,
               'password': PASSWORD,
               'email': str(user.email)}
_url = SF_EXTEND_RESOURCE + urlencode(params_dict)
response = urllib2.urlopen(_url).read()
Now my problem is that when I am using http, the email string is encoded twice, whereas that was not the case for https. I am using Python 2.6.5 on Ubuntu Lucid. I am not able to understand why this is not reproducible.
I just tried it, and for me the behaviour is not what you observe: for me, http and https URLs work the same.
import urllib2

out = urllib2.urlopen("https://www.google.com/?q=foo%40bar")
print out.geturl()
open('out1', 'w').write(out.read())

out = urllib2.urlopen("http://www.google.com/?q=foo%40bar")
print out.geturl()
open('out2', 'w').write(out.read())
Compare out1 and out2 and you'll find that both contain the correct foo@bar in the "value" attribute of the search box, so there doesn't seem to be any double-encoding going on.
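Note also that urlencode will happily re-encode a value that is already percent-encoded, which produces exactly the %2540 pattern regardless of the scheme -- worth ruling out on the calling side:
>>> from urllib import urlencode
>>> urlencode({'email': 'amit@sethi.com'})
'email=amit%40sethi.com'
>>> urlencode({'email': 'amit%40sethi.com'})  # already encoded -> doubly encoded
'email=amit%2540sethi.com'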
I am using urllib.urlretrieve in Python to download websites. Some websites seem not to want me to download them unless the request carries a proper referrer from their own site. Does anybody know of a way to set a referrer in one of Python's libraries, or in an external one?
import urllib2
req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)
adapted from http://docs.python.org/library/urllib2.html
urllib makes it hard to send arbitrary headers with the request; you could use urllib2, which lets you build and send a Request object with arbitrary headers (including of course the -- alas sadly spelled;-) -- Referer). It doesn't offer urlretrieve, but it's easy to just urlopen as you wish and copy the resulting file-like object to disk (directly, or e.g. via shutil functions).
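A minimal sketch of that approach (the file name and URLs here are placeholders):
import shutil
import urllib2

req = urllib2.Request('http://www.example.com/somefile.pdf')
req.add_header('Referer', 'http://www.python.org/')
resp = urllib2.urlopen(req)
# Stream the response to disk, like urlretrieve would.
with open('somefile.pdf', 'wb') as out:
    shutil.copyfileobj(resp, out)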
Also, using urllib2 with build_opener you can do this:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('Referer', 'http://www.python.org/')]
opener.open('http://www.example.com/')
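Headers set via opener.addheaders are sent on every request made through that opener; if you also call urllib2.install_opener(opener), plain urllib2.urlopen() calls will pick them up as well.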