This question already has answers here:
How do I add a header to urllib2 opener?
Closed 10 years ago.
import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
How do I add a User-Agent header to this?
Not a direct answer, as Mateusz has your direct question covered, but if you're going to be doing a lot of this, I strongly suggest you consider the requests library at http://docs.python-requests.org/en/latest/index.html
That way it's as simple as:
import requests
r = requests.get('http://whatever.com/', headers={'User-Agent': 'xxxx'})
You also get cookies handled for you, basic auth is easier, and it's easy to plug in OAuth bits; check out the docs, you may find it useful.
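For instance, a minimal sketch (the URL and credentials are placeholders) of how cookies and basic auth come for free:

import requests

# Placeholder URL and credentials, purely for illustration.
r = requests.get('http://whatever.com/protected',
                 auth=('user', 'passwd'),           # HTTP basic auth
                 headers={'User-Agent': 'xxxx'})
print(r.cookies)  # any cookies the server set, already parsed for you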
request = urllib2.Request(your_webpage)
request.add_header('User-Agent', your_user_agent)
data = opener.open(request).read()
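Alternatively, if you want the header sent on every request the opener makes rather than set per Request, urllib2 openers have an addheaders attribute; a minimal sketch (the User-Agent string is a placeholder):

import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# addheaders is a list of (name, value) pairs sent with every request
opener.addheaders = [('User-Agent', 'my-agent/1.0')]
data = opener.open(your_webpage).read()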
This question already has answers here:
Python Requests and persistent sessions
(8 answers)
Closed 4 years ago.
I am logging into a website using the code below:
import requests

payload = {'user': 'apple', 'password': 'banana'}
loginurl = 'http://12.345.678.910:8000/login'
r = requests.post(loginurl, data=payload)
data = r.json()
print(data)
As a response to the above code I am getting the output below:
{u'message': u'Logged in'}
Now I am trying to get some data from that website using the GET request below:
DataURL = "http://12.345.678.910:8000/api/datasources/proxy/1/query?db=AB_CDE&q=SELECT%20sum(%22count%22)%20FROM%20%22gatling%22%20WHERE%20%22status%22%20%3D%20%27ok%27%20AND%20%22simulation%22%20%3D~%20%2Fabdokd-live*%2F%20AND%20time%20%3E%201544491800000ms%20and%20time%20%3C%201544495400000ms%20GROUP%20BY%20%22script%22&epoch=ms"
Datar = requests.get(url=DataURL)
response = Datar.json()
print(response)
As a response to the above code I am getting the following:
{u'message': u'Unauthorized'}
This is not expected, as I already logged into the website in the previous step. Can someone help me correct my code?
You will probably need to look into how the authentication mechanism works in HTTP. Most likely your server is returning either a cookie or some other identifying header. Cookies are the easiest case because a browser will (to a first approximation) automatically send back the cookies it gets from a server when making further requests. Your existing code isn't doing that.
Since you are using the requests library, you should look at the answer to this question, which might shed some light on the problem.
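Concretely, a minimal sketch using a requests Session, assuming the server sets a session cookie on login:

import requests

# A Session keeps cookies from earlier responses and sends them
# with subsequent requests.
session = requests.Session()
payload = {'user': 'apple', 'password': 'banana'}
session.post('http://12.345.678.910:8000/login', data=payload)

# The login cookie is now sent automatically.
Datar = session.get(DataURL)
print(Datar.json())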
This question already has answers here:
Is there an easy way to request a URL in python and NOT follow redirects?
(7 answers)
Closed 4 years ago.
I know that it is possible to check if a URL redirects, as mentioned in the following question and its answer.
How to check if the url redirect to another url using Python
using the following code:
req = urllib2.Request(url=url, headers=headers)
resp = urllib2.urlopen(req, timeout=3)
redirected = resp.geturl() != url # redirected will be a boolean True/False
However, I have a list of millions of URLs, and it is currently under discussion whether any of them is a harmful URL or redirects to a harmful URL.
I want to know if it is possible to check for a redirect without opening a connection to the site being redirected to, so as to avoid contacting a harmful website.
You can make a HEAD request and check the status code. If you are using the third-party requests library, you can do that like this:
import requests

original_url = '...'  # your original url here
response = requests.head(original_url)
if response.is_redirect:
    print("Redirecting")
else:
    print("Not redirecting")
This question already has answers here:
"SSL: certificate_verify_failed" error when scraping https://www.thenewboston.com/
(7 answers)
Closed 4 years ago.
I am trying to download this URL, which is a frame on this page.
I have tried like this:
import urllib.request
url = 'https://tips.danskespil.dk/tips13/?#/tps/poolid/2954'
response = urllib.request.urlopen(url)
html = response.read()
and also this way:
import requests
page = requests.get(url)
but both ways give me the error SSL: CERTIFICATE_VERIFY_FAILED.
Any help would be much appreciated.
If you're not worried about safety (which you should be), your best bet is to use verify=False in the request function:
page = requests.get(url, verify=False)
You can also set verify to the path of a CA bundle file or a directory of certificates of trusted CAs, like so:
page = requests.get(url, verify='/path/to/certfile')
You can refer to the requests documentation on SSL certificate verification for all the ways to work around it.
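One caveat: with verify=False, urllib3 emits an InsecureRequestWarning on every request. A minimal sketch of silencing it, only sensible once you've consciously accepted the risk:

import requests
import urllib3

# Suppress the InsecureRequestWarning triggered by verify=False.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

url = 'https://tips.danskespil.dk/tips13/?#/tps/poolid/2954'
page = requests.get(url, verify=False)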
This question already has answers here:
Using headers with the Python requests library's get method
(4 answers)
Closed 7 years ago.
I have a curl request that works:
curl "https://api.propublica.org/campaign-finance/v1/2016/candidates/search.json?query=Wilson"
-H "X-API-Key: PROPUBLICA_API_KEY"
How can I translate this into Python? I tried the following:
import requests

payload = {'X-API-Key': 'myapikey'}
r = requests.get("https://api.propublica.org/campaign-finance/v1/2016/candidates/search.json?query=Wilson", params=payload)
Then, I got:
>>> print(r.url)
https://api.propublica.org/campaign-finance/v1/2016/candidates/search.json?query=Wilson&X-API-Key=myapikey
>>> r.text
u'{"message": "Forbidden"}'
The simplest way to translate your curl invocation to Python would be to use pycurl instead of requests.
Your Forbidden issue, however, does not depend on using requests or pycurl. It comes from sending the X-API-Key as a query parameter instead of sending it as a header (as you did in the curl call).
Try this out:
import urllib2

url = your_url
# The API key must go in the headers, not in the request body.
headers = {'X-API-Key': 'myapikey', 'Content-Type': 'application/json'}
req = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(req)
After this you can read and print the response however you like. Hope this works for you.
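Since your attempt already used requests, the equivalent fix there is simply to pass the key via headers= instead of params=; a minimal sketch (the key value is a placeholder):

import requests

headers = {'X-API-Key': 'myapikey'}
r = requests.get(
    'https://api.propublica.org/campaign-finance/v1/2016/candidates/search.json',
    params={'query': 'Wilson'},  # query-string parameters
    headers=headers,             # the API key travels as a header
)
print(r.json())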
This question already has answers here:
How to disable cookie handling with the Python requests library?
(5 answers)
Closed 9 years ago.
I need to use a requests Session object to set an HTTPAdapter on the connection. I don't, however, want to actually track a session. That is, I don't wish to have cookies, or any other persistent data, stored and potentially sent with further requests.
Is there an easy way to disable this session tracking, or perhaps is there a way to use an HTTPAdapter without a session?
See this answer.
The tl;dr is:
import requests
from http import cookiejar  # Python 2: import cookielib as cookiejar

class BlockAll(cookiejar.CookiePolicy):
    # Refuse to store or return any cookie.
    return_ok = set_ok = domain_return_ok = path_return_ok = lambda self, *args, **kwargs: False
    netscape = True
    rfc2965 = hide_cookie2 = False

s = requests.Session()
s.cookies.set_policy(BlockAll())
s.get("https://httpbin.org/cookies/set?foo=bar")
assert not s.cookies
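And since the original goal was to set an HTTPAdapter on the connection, a minimal sketch combining the two (max_retries is just an illustrative setting):

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
s.cookies.set_policy(BlockAll())  # the policy class from above

# Mount a custom adapter for all https:// URLs on this session.
s.mount('https://', HTTPAdapter(max_retries=3))

r = s.get('https://httpbin.org/cookies/set?foo=bar')
assert not s.cookies  # still nothing stored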