Differences between Python's urllib, urllib2 and requests libraries - python

I have two scripts which submit a POST request with Ajax parameters.
One script uses the requests library (the script that works).
The other one uses urllib & urllib2.
At the moment I have no idea why the urllib script does not work.
Can anybody help?
Script using requests:
import requests
s = requests.Session()
url = "https://www.shirtinator.de/?cT=search/motives&sq=junggesellenabschied"
data1 = {
    'xajax': 'searchBrowse',
    'xajaxr': '1455134430801',
    'xajaxargs[]': ['1', 'true', 'true', 'motives', '100'],
}
r = s.post(url, data=data1, headers={'X-Requested-With': 'XMLHttpRequest'}, verify=False)
#r = requests.post(url, data=data1, headers={'X-Requested-With': 'XMLHttpRequest'}, verify=False)
#r = requests.post(url, verify=False)
result = r.text
print result
print result.count("motiveImageBox")
Script using urllib:
import urllib2
import urllib
#
url = "https://www.shirtinator.de/?cT=search/motives&sq=junggesellenabschied"
data = {
    'xajax': 'searchBrowse',
    'xajaxr': '1455134430801',
    'xajaxargs[]': ['1', 'true', 'true', 'motives', '100'],
}
encode_data = urllib.urlencode(data)
print encode_data
req = urllib2.Request(url, encode_data)
response = urllib2.urlopen(req)
d = response.read()
print d
print d.count("motiveImageBox")
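A likely culprit, assuming the server expects the same form encoding that requests produces: `urlencode()` without `doseq` stringifies the list value into a single parameter, and the urllib2 version also never sends the `X-Requested-With: XMLHttpRequest` header that the requests version sets. A minimal sketch of the encoding difference (shown with Python 3's `urllib.parse.urlencode`; Python 2's `urllib.urlencode` behaves the same way):

```python
from urllib.parse import urlencode  # in Python 2: urllib.urlencode

data = {
    'xajax': 'searchBrowse',
    'xajaxr': '1455134430801',
    'xajaxargs[]': ['1', 'true', 'true', 'motives', '100'],
}

# Without doseq, the list value is stringified into one (useless) parameter:
print(urlencode(data))
# With doseq=True each list element becomes its own key=value pair,
# which matches what requests sends for 'xajaxargs[]': [...]:
print(urlencode(data, doseq=True))
```

Passing `doseq=True` (plus the `X-Requested-With` header on the `urllib2.Request`) should bring the urllib version in line with the requests one.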

scraping data from a web page with python requests

I am trying to scrape a domain search page (where you can enter a keyword and get some random results). I found this API URL in the network tab: https://api.leandomainsearch.com/search?query=computer&count=all (for the keyword "computer"), but I am getting this error:
{'error': True, 'message': 'Invalid API Credentials'}
Here is the code:
import requests
r = requests.get("https://api.leandomainsearch.com/search?query=cmputer&count=all")
print(r.json())
The site requires that you set the Authorization and Referer HTTP headers.
For example:
import re
import json
import requests
kw = 'computer'
url = 'https://leandomainsearch.com/search/'
api_url = 'https://api.leandomainsearch.com/search'
api_key = re.search(r'"apiKey":"(.*?)"', requests.get(url, params={'q': kw}).text)[1]
headers = {'Authorization': 'Key ' + api_key, 'Referer': 'https://leandomainsearch.com/search/?q={}'.format(kw)}
data = requests.get(api_url, params={'query': kw, 'count': 'all'}, headers=headers).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for d in data['domains']:
    print(d['name'])
print()
print('Total:', data['_meta']['total_records'])
Prints:
...
blackopscomputer.com
allegiancecomputer.com
northpolecomputer.com
monumentalcomputer.com
fissioncomputer.com
hedgehogcomputer.com
blackwellcomputer.com
reflectionscomputer.com
towerscomputer.com
offgridcomputer.com
redefinecomputer.com
quantumleapcomputer.com
Total: 1727
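The regex step above pulls the API key out of the page's inline JavaScript before calling the API. A self-contained sketch of just that extraction step, using a made-up page snippet in place of the real page source:

```python
import re

# Hypothetical page source containing inline config of the shape
# the answer's regex targets:
html = '<script>window.config = {"apiKey":"abc123","q":"computer"};</script>'

# Non-greedy capture of whatever sits between the quotes after "apiKey":
api_key = re.search(r'"apiKey":"(.*?)"', html)[1]
print(api_key)  # prints abc123
```

Indexing a match object with `[1]` works from Python 3.6 on; on older versions use `.group(1)` instead.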

Python - Print specific cookie value from request

Code:
from bs4 import BeautifulSoup
import requests
#GET SESSION
s = requests.Session()
s.get('https://www.clos19.com/en-gb/')
#GET CSRFToken
r = requests.get('https://www.clos19.com/en-gb/login')
soup = BeautifulSoup(r.text, 'html.parser')
CSRFToken = soup.find('input', type='hidden')["value"]
print(soup.find('input', type='hidden')["value"])
#ADD TO CART
payload = {
    'productCode': '100559',
    'qty': '1',
    'CSRFToken': CSRFToken
}
r = s.post('https://www.clos19.com/en-gb/ajax/addToCart', data=payload, allow_redirects=False)
print(r.status_code)
print(s.cookies.get_dict())
Terminal output:
{'JSESSIONID': '448858956754C2F8F2DCF1CC4B803833', 'ROUTEID': '.app1', 'mhPreferredCountry': 'GB', 'mhPreferredLanguage': 'en_GB'}
How do I print only the JSESSIONID? The desired printed value would therefore be 448858956754C2F8F2DCF1CC4B803833.
Change the last line to:
print(s.cookies['JSESSIONID'])
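Since `get_dict()` returns a plain dict, the usual dict access patterns also apply. A small sketch using the dict from the terminal output above:

```python
# Cookie dict as returned by s.cookies.get_dict() (values copied from above):
cookies = {'JSESSIONID': '448858956754C2F8F2DCF1CC4B803833', 'ROUTEID': '.app1',
           'mhPreferredCountry': 'GB', 'mhPreferredLanguage': 'en_GB'}

# Dict-style access raises KeyError when the cookie is missing;
# .get() returns a default instead, which is safer if the login failed:
print(cookies['JSESSIONID'])
print(cookies.get('JSESSIONID', 'no-session'))
```

The same `.get()` call works directly on `s.cookies`, since requests' cookie jar supports dict-style lookup as well.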

How do I unpack a Python requests.response object and extract string representations of its data? [duplicate]

x = requests.post(url, data=data)
print x.cookies
I used the requests library to get some cookies from a website, but I can only get the cookies
from the Response. How do I get the cookies from the Request? Thanks!
Alternatively, you can use requests.Session and observe cookies before and after a request:
>>> import requests
>>> session = requests.Session()
>>> print(session.cookies.get_dict())
{}
>>> response = session.get('http://google.com')
>>> print(session.cookies.get_dict())
{'PREF': 'ID=5514c728c9215a9a:FF=0:TM=1406958091:LM=1406958091:S=KfAG0U9jYhrB0XNf', 'NID': '67=TVMYiq2wLMNvJi5SiaONeIQVNqxSc2RAwVrCnuYgTQYAHIZAGESHHPL0xsyM9EMpluLDQgaj3db_V37NjvshV-eoQdA8u43M8UwHMqZdL-S2gjho8j0-Fe1XuH5wYr9v'}
If you need the path and the domain for each cookie, which get_dict() does not expose, you can iterate over the cookies manually, for instance:
[
    {'name': c.name, 'value': c.value, 'domain': c.domain, 'path': c.path}
    for c in session.cookies
]
url = "http://localhost:8070/web/session/authenticate"
data = {}
header = {"Content-Type": "application/json"}
x = requests.post(url, json=data, headers=header)
print(x.cookies.get_dict())

Python: how to send HTML code in a POST request

I am trying to send HTML code from a Python script to a Joomla site.
description = "<h1>Header</h1><p>text</p>"
values = {'description': description.encode(self.encoding),
          'id': 5,
          }
data = urlencode(values)
binData = data.encode(self.encoding)
headers = {'User-Agent': self.userAgent,
           'X-Requested-With': 'XMLHttpRequest'}
req = urllib2.Request(self.addr, binData, headers)
response = urllib2.urlopen(req)
rawreply = response.read()
On the Joomla server I get the same string, but without the HTML:
$desc = JRequest::getVar('description', '', 'POST');
What's wrong?
You should use requests
pip install requests
then
import requests
description = "<h1>Header</h1><p>text</p>"
values = dict(description=description, id=5)
response = requests.post(self.addr, data=values)
if response.ok:
    print response.json()
or, if Joomla doesn't return JSON:
print response.content
JRequest::getVar and JRequest::getString filter out HTML code, but this can be turned off:
$desc = JRequest::getVar('description', '', 'POST', 'string', JREQUEST_ALLOWHTML);

Python 'requests' module and encoding?

I am doing a simple POST request using the requests module, and testing it against httpbin
import requests
url = 'http://httpbin.org/post'
params = {'apikey':'666666'}
sample = {'sample': open('test.bin', 'r')}
response = requests.post( url, files=sample, params=params, verify=False)
report_info = response.json()
print report_info
I have an issue with the encoding: it is not using application/octet-stream, so the encoding is not correct. From the headers, I see:
{
u'origin': u'xxxx, xxxxxx',
u'files': {
u'sample': u'data:None;base64,qANQR1DBw..........
So I get data:None instead of the data:application/octet-stream I get when I try with curl. The file size and encoding are incorrect.
How can I force or check that it uses application/octet-stream?
Sample taken from http://www.python-requests.org/en/latest/user/quickstart/#custom-headers
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> headers = {'content-type': 'application/json'}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)
You might want to change the headers to:
headers = {'content-type': 'application/octet-stream'}
response = requests.post(url, files=sample, params=params, verify=False,
                         headers=headers)
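Alternatively, and without overriding the multipart Content-Type header wholesale, requests accepts a (filename, file object, content type) tuple per file, which sets the Content-Type of just that multipart part. A sketch using an in-memory file in place of test.bin, preparing the request locally instead of sending it:

```python
import io
import requests

url = 'http://httpbin.org/post'
payload = io.BytesIO(b'\x00\x01 binary payload')

# The third tuple element sets the Content-Type of this multipart part:
sample = {'sample': ('test.bin', payload, 'application/octet-stream')}

# Prepare the request (without sending it) to inspect the encoded body:
req = requests.Request('POST', url, files=sample,
                       params={'apikey': '666666'}).prepare()
print(b'Content-Type: application/octet-stream' in req.body)
```

In the real script, passing the same `files=sample` tuple form to `requests.post` should make httpbin report data:application/octet-stream.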
