Python urllib2: How to eliminate urllib2 add it's own headers - python

I am testing some application where I send some POST requests, want to test the behavior of the application when some headers are missing in the request to verify that it generates the correct error codes.
For doing this, my code is as follows.
header = {'Content-type': 'application/json'}
data = "hello world"
request = urllib2.Request(url, data, header)
f = urllib2.urlopen(request)
response = f.read()
The problem is urllib2 adds it's own headers like Content-Length, Accept-Encoding when it sends the POST request, but I don't want urllib2 to add any more headers than the one I specified in the headers dict above, is there a way to do that, I tried setting the other headers I don't want to None, but they still go with those empty values as part of the request which I don't want.

The header takes a dictionary type, example below using a chrome user-agent. For all standard and some non-stranded header fields take a look here. You also need to encode your data with urllib not urllib2. This is all mention in the python documentation here
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()

Related

Delete default header fields in Python Requests [duplicate]

I want to send a value for "User-agent" while requesting a webpage using Python Requests. I am not sure is if it is okay to send this as a part of the header, as in the code below:
debug = {'verbose': sys.stderr}
user_agent = {'User-agent': 'Mozilla/5.0'}
response = requests.get(url, headers = user_agent, config=debug)
The debug information isn't showing the headers being sent during the request.
Is it acceptable to send this information in the header? If not, how can I send it?
The user-agent should be specified as a field in the header.
Here is a list of HTTP header fields, and you'd probably be interested in request-specific fields, which includes User-Agent.
If you're using requests v2.13 and newer
The simplest way to do what you want is to create a dictionary and specify your headers directly, like so:
import requests
url = 'SOME URL'
headers = {
'User-Agent': 'My User Agent 1.0',
'From': 'youremail#domain.example' # This is another valid field
}
response = requests.get(url, headers=headers)
If you're using requests v2.12.x and older
Older versions of requests clobbered default headers, so you'd want to do the following to preserve default headers and then add your own to them.
import requests
url = 'SOME URL'
# Get a copy of the default headers that requests would use
headers = requests.utils.default_headers()
# Update the headers with your custom ones
# You don't have to worry about case-sensitivity with
# the dictionary keys, because default_headers uses a custom
# CaseInsensitiveDict implementation within requests' source code.
headers.update(
{
'User-Agent': 'My User Agent 1.0',
}
)
response = requests.get(url, headers=headers)
It's more convenient to use a session, this way you don't have to remember to set headers each time:
session = requests.Session()
session.headers.update({'User-Agent': 'Custom user agent'})
session.get('https://httpbin.org/headers')
By default, session also manages cookies for you. In case you want to disable that, see this question.
It will send the request like browser
import requests
url = 'https://Your-url'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
response= requests.get(url.strip(), headers=headers, timeout=10)

TypeError: POST data should be bytes [duplicate]

Using the following code I received an error:
TypeError: POST data should be bytes or an iterable of bytes. It cannot be str
Second concern, I am not sure if I specified my user-agent correctly, here's my user-agent in whole: Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4. I gave my best shot as I defined the user-agent in the script.
import urllib.parse
import urllib.request
url = 'http://getliberty.org/contact-us/'
user_agent = 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94; Windows NT)'
values = {'Your Name' : 'Horatio',
'Your Email' : '6765Minus4181#gmail.com',
'Subject' : 'Hello',
'Your Message' : 'Cheers'}
headers = {'User-Agent': user_agent }
data = urllib.parse.urlencode(values)
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
the_page = response.read()
I am aware of this similar question, TypeError: POST data should be bytes or an iterable of bytes. It cannot be str, but am too new for the answer to be much help.
data = urllib.parse.urlencode(values)
type(data) #this returns <class 'str'>. it's a string
The urllib docs say for urllib.request.Request(url, data ...):
The urllib.parse.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format. It should be encoded to bytes before being used as the data parameter. etc etc
(emphasis mine)
So you have a string that looks right, what you need is that string encoded into bytes. And you choose the encoding.
binary_data = data.encode(encoding)
in the above line: encoding can be 'utf-8' or 'ascii' or a bunch of other things. Pick whichever one the server expects.
So you end up with something that looks like:
data = urllib.parse.urlencode(values)
binary_data = data.encode(encoding)
req = urllib.request.Request(url, binary_data)
You can try with requests module as an alternative solution
import json
import requests
url = 'http://getliberty.org/contact-us/'
user_agent = 'Mozilla/5.0 (compatible; Chrome/22.0.1229.94; Windows NT)'
values = {
'Your Name' : 'Horatio',
'Your Email' : '6765Minus4181#gmail.com',
'Subject' : 'Hello',
'Your Message' : 'Cheers'
}
headers = {'User-Agent': user_agent, 'Content-Type':'application/json' }
data = json.dumps(values)
request = requests.post(url, data=data, headers=headers)
response = request.json()

proper way of implementing a user agent into a urllib.request.build_opener

I'm trying to set the user agent for my urllib request:
opener = urllib.request.build_opener(
urllib.request.HTTPCookieProcessor(cj),
urllib.request.HTTPRedirectHandler(),
urllib.request.ProxyHandler({'http': proxy})
)
and finally:
response3 = opener.open("https://www.google.com:443/search?q=test", timeout=timeout_value).read().decode("utf-8")
What would be the best way to set the user-agent header to
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36
With urllib we have two options, as far as I know.
build_opener returns a OpenerDirector object, which has an addheaders attribute. We can change the user-agent and other headers with that attribute.
opener.addheaders = [('User-Agent', 'My User-Agent')]
url = 'http://httpbin.org/user-agent'
r = opener.open(url, timeout=5)
text = r.read().decode("utf-8")
Alternatively, we can install the OpenerDirector object to the global opener with install_opener and use urlopen to submit the request. Now can use Request to set the headers.
urllib.request.install_opener(opener)
url = 'http://httpbin.org/user-agent'
headers = {'user-agent': "My User-Agent"}
req = urllib.request.Request(url, headers=headers)
r = urllib.request.urlopen(req, timeout=5)
text = r.read().decode("utf-8")
Personally, I prefer the second method because it is more consistent. Once we install the opener all requests will have the same handlers, and we can continue using urllib the same way. However, if you don't want to use those handlers for all requests you should choose the first method and use addheaders to set headers for a specific OpenerDirector object.
With requests things are simpler.
We can use the session.heders attribute if we want to change the user-agent or other headers for all requests,
s = requests.session()
s.headers['user-agent'] = "My User-Agent"
r = s.get(url, timeout=5)
or use the headers parameter if we want to set headers for a specific request only.
headers = {'user-agent': "My User-Agent"}
r = requests.get(url, headers=headers, timeout=5)

requests payload encoding

I use requests to pretend firefox and from the fiddler, I saw header is same, but the SystaxView not same
payload = {'searchType':'U'}
s.post(url,data=payload)
but I got error 500, From the syntax view, I saw in requests it will change to searchType=U
But Real browser will output searchType='U'.
I tried payload = {'searchType':'\'U\''} it will becomesearchType=%27U%27 in Syntax view.
any idea? I only find 1 difference, so I suspect it will trigger 500 error.
import requests
s=requests.Session()
s.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0'})
s.get('http://gls.fehd.gov.hk/fehd_lgs/jsp/search/searchMainPage.jsp?lang=zh_TW')
s.headers.update({'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'X-Requested-With': 'XMLHttpRequest'})
s.headers.update({'Referer': 'http://gls.fehd.gov.hk/fehd_lgs/jsp/search/searchMainPage.jsp?lang=zh_TW', 'HOST':'gls.fehd.gov.hk'})
s.headers.update({'Accept': 'application/xml, text/xml, */*; q=0.01'})
payload={'searchType':'U','deceased_surName':'','deceased_firstName':'','deceased_age':'','deceased_gender':'M','deceased_nationality':'','deathYear':'','deathMonth':'default','deathDay':'default','burialYear':'','burialMonth':'default','burialDay':'default','sectionNo':'','graveNo':''}
url='http://gls.fehd.gov.hk/FEHD_LGS/util/getSearchResult.jsp'
s.post(url,data=payload)
If the value you want to send is 'U' this might help you send it correctly.
payload = {'searchType': "'U'"}
s.post(url,data=payload)
Edit:
I don't think you need to make a post request. Try making a get request:
url='http://gls.fehd.gov.hk/FEHD_LGS/util/getSearchResult.jsp'
response = requests.get("%s?%s" % (url, "searchType='U'&deceased_surname=%E6%A5%8A&deceased_firstname=&deceased_age=&deceased_gender='M'&deceased_nationality=&deathYear=&deathMonth=default&deathDay=default&burialYear=&burialMonth=default&burialDay=default&sectionNo=&graveNo=–"))
print(response.content.decode())
select * from cccs_dece_info where SITE_ID in (12,13) and GRAVE_TYPE in ('U') and ( DECEASED_CNAME like '楊%' or upper(DECEASED_ENAME) like '楊 %' or DECEASED_ALIAS = '楊' or DECEASED_ALIAS = '楊') and ( DECEASED_SEX_CODE in ('M', 'U')) and ( GRAVE_NO='–' )
java.sql.SQLSyntaxErrorException: ORA-01722: invalid number
If your server handle post payload in json format, format your payload to json first.
import requests
import json
url = "http://someurl.com/"
# format for json payload
def post(url, param):
payload = json.dumps(param)
payload = payload.replace(", ", ",")
payload = payload.replace("{", "{\n\t")
payload = payload.replace("\",", "\",\n\t")
payload = payload.replace("}", "\n}")
return response = requests.request("POST", url, data=payload)
payloads = dict(searchType ='U')
response = post(url, payloads)
print(response.response.text)
There is nothing wrong with the code, look like there are something wrong with your url/server,.. I checked with Postman look like this picture
Have you tried another method to do POST payload? (ex:Postman or PHP POST Method)

Why isn't my Python program working? It uses HTTP Headers

EDIT: I changed the code and it still doesn't work! I used the links from the answer to do it but it didn't work!
Why does this not work? When I run it takes a long time to run and never finishes!
import urllib
import urllib2
url = 'https://www.locationary.com/index.jsp?ACTION_TOKEN=tile_loginBar_jsp$JspView$LoginAction'
values = {'inUserName' : 'USER',
'inUserPass' : 'PASSWORD'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
req.add_header('Host', 'www.locationary.com')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
req.add_header('Accept-Language', 'en-us,en;q=0.5')
req.add_header('Accept-Encoding','gzip, deflate')
req.add_header('Accept-Charset','ISO-8859-1,utf-8;q=0.7,*;q=0.7')
req.add_header('Connection','keep-alive')
req.add_header('Referer','http://www.locationary.com/')
req.add_header('Cookie','site_version=REGULAR; __utma=47547066.1079503560.1321924193.1322707232.1324693472.36; __utmz=47547066.1321924193.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); nickname=jacob501; locaCountry=1033; locaState=1795; locaCity=Montreal; jforumUserId=1; PMS=1; TurnOFfTips=true; Locacookie=enable; __utma=47547066.1079503560.1321924193.1322707232.1324693472.36; __utmz=47547066.1321924193.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); nickname=jacob501; PMS=1; __utmb=47547066.15.10.1324693472; __utmc=47547066; JSESSIONID=DC7F5AB08264A51FBCDB836393CB16E7; PSESSIONID=28b334905ab6305f7a7fe051e83857bc280af1a9; __utmc=47547066; __utmb=47547066.15.10.1324693472; ACTION_RESULT_CODE=ACTION_RESULT_FAIL; ACTION_ERROR_TEXT=java.lang.NullPointerException')
req.add_header('Content-Type','application/x-www-form-urlencoded')
#user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
#headers = { 'User-Agent' : user_agent }
response = urllib2.urlopen(req)
page = response.read()
print page
The remote server (the one at www.locationary.com) is waiting for the content of your HTTP post request, based on the Content-Type and Content-Length headers. Since you're never actually sending said awaited data, the remote server waits — and so does read() — until you do so.
I need to know how to send the content of my http post request.
Well, you need to actually send some data in the request. See:
urllib2 - The Missing Manual
How do I send a HTTP POST value to a (PHP) page using Python?
Final, "working" version:
import urllib
import urllib2
url = 'https://www.locationary.com/index.jsp?ACTION_TOKEN=tile_loginBar_jsp$JspView$LoginAction'
values = {'inUserName' : 'USER',
'inUserPass' : 'PASSWORD'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
req.add_header('Host', 'www.locationary.com')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8')
req.add_header('Accept-Language', 'en-us,en;q=0.5')
req.add_header('Accept-Charset','ISO-8859-1,utf-8;q=0.7,*;q=0.7')
req.add_header('Connection','keep-alive')
req.add_header('Referer','http://www.locationary.com/')
req.add_header('Cookie','site_version=REGULAR; __utma=47547066.1079503560.1321924193.1322707232.1324693472.36; __utmz=47547066.1321924193.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); nickname=jacob501; locaCountry=1033; locaState=1795; locaCity=Montreal; jforumUserId=1; PMS=1; TurnOFfTips=true; Locacookie=enable; __utma=47547066.1079503560.1321924193.1322707232.1324693472.36; __utmz=47547066.1321924193.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); nickname=jacob501; PMS=1; __utmb=47547066.15.10.1324693472; __utmc=47547066; JSESSIONID=DC7F5AB08264A51FBCDB836393CB16E7; PSESSIONID=28b334905ab6305f7a7fe051e83857bc280af1a9; __utmc=47547066; __utmb=47547066.15.10.1324693472; ACTION_RESULT_CODE=ACTION_RESULT_FAIL; ACTION_ERROR_TEXT=java.lang.NullPointerException')
req.add_header('Content-Type','application/x-www-form-urlencoded')
response = urllib2.urlopen(req)
page = response.read()
print page
Don't explicitly set the Content-Length header
Remove the req.add_header('Accept-Encoding','gzip, deflate') line, so that the response doesn't have to be decompressed (or — exercise left to the reader — ungzip it yourself)

Categories

Resources