How to convert a curl request to the Python requests module

How can I convert this curl POST request into a call that is compatible with the Python requests module?
curl 'http://sss.com' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36' --data 'aq=%40syssource%3DProFind%20AND%20NOT%20%40title%3DCoveo%20AND%20NOT%20%40title%3Derror&searchHub=ProFind&xxx=yyy&xxx=yyy&xxx=yyy=10&xxx=yyy' --compressed
I am looking at the requests module documentation here:
http://docs.python-requests.org/en/master/user/quickstart/
but it only shows the data being passed as key/value pairs:
r = requests.post('http://httpbin.org/post', data={'key': 'value'})
So how can I convert the curl POST request above into a POST made with the Python requests module?

The documentation you linked says:
There are many times that you want to send data that is not form-encoded. If you pass in a string instead of a dict, that data will be posted directly.
So just use
r = requests.post('http://sss.com', data = 'aq=%40syssource%3DProFind%20AND%20NOT%20%40title%3DCoveo%20AND%20NOT%20%40title%3Derror&searchHub=ProFind&xxx=yyy&xxx=yyy&xxx=yyy=10&xxx=yyy')
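If you prefer key/value pairs for readability, you can also decode the percent-encoded payload with the standard library and let requests re-encode it. A minimal sketch (the payload is shortened here for illustration):

```python
from urllib.parse import parse_qsl

# The raw form body from the curl command (shortened for illustration)
raw = ('aq=%40syssource%3DProFind%20AND%20NOT%20%40title%3DCoveo'
       '%20AND%20NOT%20%40title%3Derror&searchHub=ProFind&xxx=yyy&xxx=yyy')

# parse_qsl keeps duplicate keys (xxx appears twice), which a dict would drop
pairs = parse_qsl(raw)
print(pairs)
# A list of tuples can be passed straight to requests, which re-encodes it:
# r = requests.post('http://sss.com', data=pairs)
```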

Related

How to consume this API in Python? I just can't

So I'm trying to consume this API. I have this URL: http://www.ventamovil.com.mx:9092/service.asmx?op=Check_Balance
There you can enter {"User":"6144135400","Password":"Prueba$$"} in the input field and you get a response.
(screenshot of the response omitted)
But when I try to consume this API from Python I just can't; I don't know how to call it correctly.
(screenshot of my code omitted)
As you can see, I get a different response with my code; I should be getting the same response as in the "Response" screenshot.
To save yourself some time, you can use the site's own request to build the Python code automatically. All you have to do is:
Just as you did at first, enter the JSON in the input field and invoke the method.
Open the browser's network tab and copy the POST request the page made as curl:
curl 'http://www.ventamovil.com.mx:9092/service.asmx/Check_Balance' -H 'Connection: keep-alive' -H 'Cache-Control: max-age=0' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36' -H 'Origin: http://www.ventamovil.com.mx:9092' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' -H 'Referer: http://www.ventamovil.com.mx:9092/service.asmx?op=Check_Balance' -H 'Accept-Language: en-US,en;q=0.9,ar;q=0.8,pt;q=0.7' --data 'jrquest=%7B%22User%22%3A6144135400%2C+%22Password%22%3A+%22Prueba%24%24%22%7D' --compressed --insecure
Go to Postman and import the curl command, then click Code and select Python, and there you go: you have all the right headers needed.
import requests

url = "http://www.ventamovil.com.mx:9092/service.asmx/Check_Balance"
payload = 'jrquest=%7B%22User%22%3A6144135400%2C+%22Password%22%3A+%22Prueba%24%24%22%7D'
headers = {
    'Upgrade-Insecure-Requests': '1',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
}

response = requests.post(url, headers=headers, data=payload)
print(response.text.encode('utf8'))
As you can see, the service accepts its input as a form-encoded payload.
You will need to parameterize this request with the user/password you want each time you use it.
By the way, the output of this Python code is:
b'<?xml version="1.0" encoding="utf-8"?>\r\n<string xmlns="http://www.ventamovil.com.mx/ws/">{"Confirmation":"00","Saldo_Inicial":"10000","Compras":"9360","Ventas":"8416","Comision":"469","Balance":"10345.92"}</string>'
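The service wraps its JSON answer in an XML <string> element, so you may want to unwrap it before using it. A sketch with the standard library, using the response body shown above:

```python
import json
import xml.etree.ElementTree as ET

body = ('<?xml version="1.0" encoding="utf-8"?>\r\n'
        '<string xmlns="http://www.ventamovil.com.mx/ws/">'
        '{"Confirmation":"00","Saldo_Inicial":"10000","Compras":"9360",'
        '"Ventas":"8416","Comision":"469","Balance":"10345.92"}</string>')

# The JSON payload is simply the text content of the root <string> element
root = ET.fromstring(body)
data = json.loads(root.text)
print(data['Balance'])  # '10345.92'
```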

Python requests GET on a REST URL always returns 404

I have a REST URL like:
url = 'http://xx.xx.xx.xx/server/rest/line/125'
I can get the correct JSON back with the curl command, like this:
curl -i http://xx.xx.xx.xx/server/rest/line/125
But when I use Python 3 requests, it always returns 404. Python code:
import requests
resp = requests.get(r'http://xx.xx.xx.xx/server/rest/line/125')
print(resp.status_code)
Can anyone tell me what the problem is? Is the server blocking the request?
A 404 is displayed when:
1. The URL is incorrect, and the 404 response is actually accurate.
2. The URL contains trailing spaces.
3. The website does not like HTTP(S) requests coming from Python code.
For the third case, try adding "www." to the host:
resp = requests.get(r'http://www.xx.xx.xx.xx/server/rest/line/125')
or send a browser-like User-Agent header:
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}
result = requests.get('https://www.transfermarkt.co.uk', headers=headers)
In your case, try option 3.
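To see exactly what requests sends (and compare it with what curl -i sends), you can switch on the standard library's HTTP wire logging before making the call. This is just a debugging sketch, not part of the fix itself:

```python
import http.client
import logging

# Make http.client print the raw request line and headers to stdout;
# requests ultimately goes through http.client, so its traffic shows up too.
http.client.HTTPConnection.debuglevel = 1
logging.basicConfig(level=logging.DEBUG)

# resp = requests.get('http://xx.xx.xx.xx/server/rest/line/125')
# The exact request sent is now printed, ready to diff against curl -v output.
```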

Website blocking out curl even with real browser's headers

I noticed that http://www.momondo.com.cn/ is using some magic technology:
curl doesn't work on it. The URL displays fine in a web browser, but curl always returns a timeout, even when I add all of the headers like a web browser would.
I also tried Python requests and urllib2, but they didn't work either.
C:\Users\Administrator>curl -v -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36" -H "Connection: Keep-Alive" -H "Accept-Encoding:gzip, deflate, sdch" -H "Cache-Control:no-cache" -H "Upgrade-Insecure-Requests:1" -H "Accept-Language:zh-CN,zh;q=0.8" -H "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
http://www.momondo.com.cn/
* About to connect() to www.momondo.com.cn port 80 (#0)
* Trying 184.50.91.106...
* connected
* Connected to www.momondo.com.cn (184.50.91.106) port 80 (#0)
> GET / HTTP/1.1
> Host: www.momondo.com.cn
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36
> Connection: Keep-Alive
> Accept-Encoding:gzip, deflate, sdch
> Cache-Control:no-cache
> Upgrade-Insecure-Requests:1
> Accept-Language:zh-CN,zh;q=0.8
> Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
>
Why and how does this happen? How does Momondo escape curl?
How are you setting up the request? If you are using requests, use a Session object and set the headers there so they can be easily reused. The site does not appear to be doing anything special: connecting to it directly with telnet (i.e. telnet www.momondo.com.cn 80) and sending the headers generated by the browser (captured via tcpdump, just to be sure) returned content rather than hanging until the timeout. It also pays to look at which CDN (content delivery network) the site sits behind; in this case the address resolves to a subdomain of akamaiedge.net, so it may be worth checking why they might have blocked you.
Anyway, using the headers you have supplied with a requests.Session object, a response was generated.
>>> from requests import Session
>>> session = Session()
>>> session.headers # check the default headers
{'User-Agent': 'python-requests/2.12.5', 'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*'}
>>> session.headers['Accept'] = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
>>> session.headers['Accept-Language'] = 'en-GB,en-US;q=0.8,en;q=0.6,zh-TW;q=0.4'
>>> session.headers['Cache-Control'] = 'max-age=0'
>>> session.headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36'
>>> response = session.get('http://www.momondo.com.cn/')
>>> response
<Response [200]>
Doesn't seem to be anything magic at all.
I figured out the reason. Momondo uses the following methods to block non-browser clients:
1. It checks the User-Agent; it cannot be curl's default UA.
2. It checks the "Connection" header; in my initial test it had to be "keep-alive" rather than "Keep-Alive".
3. It checks the "Accept-Encoding" header; it cannot be empty, but any value works.
Finally, I can use curl to get the content now:
curl -v -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36" -H "Connection: keep-alive" -H "Accept-Encoding: nothing" http://www.momondo.com.cn/
By the way, I have been doing web scraping for about seven years, and this is the first time I have seen a website use this anti-scraping method. Worth noting.
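The three checks above amount to literal string matching on the server side. A hypothetical sketch of that logic (the function name and exact rules are assumptions based on the tests described above, not momondo's actual code):

```python
def looks_like_real_browser(headers):
    """Approximation of the checks momondo appears to perform."""
    ua = headers.get('User-Agent', '')
    return (not ua.startswith('curl/')                     # 1. not curl's default UA
            and headers.get('Connection') == 'keep-alive'  # 2. exact lowercase match
            and bool(headers.get('Accept-Encoding')))      # 3. non-empty, any value

print(looks_like_real_browser({'User-Agent': 'curl/7.64.1'}))  # False
print(looks_like_real_browser({
    'User-Agent': 'Mozilla/5.0 (Macintosh) Chrome/44.0.2403.89',
    'Connection': 'keep-alive',
    'Accept-Encoding': 'nothing',
}))  # True
```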

Convert a curl command to Python [pycurl, requests, or anything else!]

I'm trying to post data to a specific URL, get the response, and save it to a file; with curl this works without any problem.
The page posts the data; if the data is correct it shows a response page, and if not it redirects to a URL like:
http://example.com/url/foo/bar/error
My curl command is:
curl --fail --user-agent "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36" --data "mydataexample" --referer "http://example.com/url/foo/bar" http://example.com/url/foo/bar --output test.html
But in Python with requests the status_code is always 200, even with wrong data, and there is no correct response to save!
Here is my python code:
import requests

data = 'myexampledata'
headers = {
    'Referer': 'http://example.com/url/foo/bar',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36',
}
url = 'http://example.com/url/foo/bar'
r = requests.post(url, params=data, headers=headers)
# check headers and status_code to test whether the data is wrong...
print(r.headers)
print(r.status_code)
Now, how do I write Python code that solves this problem and works exactly like the curl command? Any advice on fixing it with requests or pycurl?
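Two things likely differ from the curl command: params= puts the string in the query string, whereas curl --data sends it in the POST body, and requests follows redirects by default, whereas curl without -L does not. A sketch of the probable fix; the network call is left commented out since example.com is a placeholder:

```python
from urllib.parse import quote

url = 'http://example.com/url/foo/bar'
data = 'myexampledata'

# What params=data actually requests -- the payload ends up in the URL:
print(url + '?' + quote(data))  # http://example.com/url/foo/bar?myexampledata

# The curl-equivalent call would instead put the data in the body and
# stop at the redirect so bad data is detectable, mirroring curl --fail:
# r = requests.post(url, data=data, headers=headers, allow_redirects=False)
# if r.is_redirect and r.headers.get('Location', '').endswith('/error'):
#     print('wrong data')
```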

Web Scraping Header issue

I am playing about with scraping data from websites as an educational exercise. I am using python and beautiful soup.
I am basically looking at products on a page e.g.
http://www.asos.com/Women/Dresses/Cat/pgecategory.aspx?cid=8799#parentID=-1&pge=0&pgeSize=5&sort=-1
I noticed the URL has the parameters pge and pgeSize, which I can change in the browser to get the results I expect, but when using Python requests it always returns the same 36 products (36 being the default).
I thought this was a header issue, so I tried reproducing the request with curl using headers copied from Chrome's developer tools to work out which headers I needed, but with curl I can't get past the following response:
curl -c ~/cookie -H "Accept: application/xml" -H "Accept-Language: en-GB,en-US;q=0.8,en;q=0.6" -H "Content-Type: application/xml" -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36" -X GET 'http://www.asos.com/Women/Dresses/Cat/pgecategory.aspx?cid=8799#parentID=-1&pge=0&pgeSize=5&sort=-1'
Object moved
Object moved to here.
How or what is the correct way to debug and try to work this out?
The default dresses are always returned for the URL /Women/Dresses/Cat/pgecategory.aspx?cid=8799&r=2.
Notice that parentID=-1&pge=7&pgeSize=5&sort=-1 comes after the # sign: it is a URL fragment, which is never sent to the server.
The page runs an additional browser-side query that fetches the right dresses and swaps them in for you.
You need to provide an asos cookie, e.g. using this curl flag:
curl --cookie "asos=currencyid=19" 'http://www.asos.com/Women/Dresses/Cat/pgecategory.aspx?cid=8799#parentID=-1&pge=0&pgeSize=5&sort=-1'
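You can confirm that the paging parameters never reach the server: everything after # is a URL fragment, which clients keep to themselves. A quick standard-library check:

```python
from urllib.parse import urlsplit

url = ('http://www.asos.com/Women/Dresses/Cat/pgecategory.aspx'
       '?cid=8799#parentID=-1&pge=0&pgeSize=5&sort=-1')
parts = urlsplit(url)
print(parts.query)     # cid=8799 -- the only part the server ever sees
print(parts.fragment)  # parentID=-1&pge=0&pgeSize=5&sort=-1 -- browser-side only
```

The requests equivalent of the curl flag would be something like requests.get(url, cookies={'asos': 'currencyid=19'}), with the paging parameters passed as real query parameters rather than a fragment.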
