I'm trying to mimic this web request mechanism. I have all the data. What I need to know is: how can I make the POST request using Python Selenium?
If you really just need to make a POST request from Python, there is actually no need to use Selenium at all. As Tim already pointed out, Selenium is for simulating user interaction with the web page.
If you really just want to make a POST request, you can just use http.client:
>>> import http.client, urllib.parse
>>> params = urllib.parse.urlencode({'#number': 12524, '#type': 'issue', '#action': 'show'})
>>> headers = {"Content-type": "application/x-www-form-urlencoded",
... "Accept": "text/plain"}
>>> conn = http.client.HTTPConnection("bugs.python.org")
>>> conn.request("POST", "", params, headers)
>>> response = conn.getresponse()
>>> print(response.status, response.reason)
302 Found
>>> data = response.read()
>>> data
b'Redirecting to http://bugs.python.org/issue12524'
>>> conn.close()
Hello, I am trying to retrieve the JSON from soraredata via this link, but it returns me page source without the JSON.
When I put this link into a piece of software called Insomnia, I do get the JSON, so I think it must be possible with requests?
Sorry for my English, I am using a translator.
Edit: the link seems to work without the "my_username" part, so url = "https://www.soraredata.com/api/stats/newFullRankings/all/false/all/7/0/sr_football"
I get a status code 403; I don't know what is missing to get 200?
Thank you
import requests
import json

headers = {
    "Host": "www.soraredata.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0",
    "Referer": "https://www.soraredata.com/rankings",
}

#url = "https://www.soraredata.com/api/stats/newFullRankings/all/false/all/7/{my_username}/0/sr_football"
url = "https://www.soraredata.com/api/stats/newFullRankings/all/false/all/7/0/sr_football"

res = requests.get(url, headers=headers)
html = res.text
#html = json.loads(html)
print(html)
Here is a solution I got to work.
import http.client
import json
import socket
import ssl

hostname = "www.soraredata.com"
path = "/api/stats/newFullRankings/all/false/all/7/0/sr_football"

# Raw HTTP/1.1 request, byte-for-byte what urllib3 would send.
http_msg = "GET {path} HTTP/1.1\r\nHost: {host}\r\nAccept-Encoding: identity\r\nUser-Agent: python-urllib3/1.26.7\r\n\r\n".format(
    host=hostname,
    path=path,
).encode("utf-8")

sock = socket.create_connection((hostname, 443), timeout=3.1)
context = ssl.create_default_context()

with sock:
    with context.wrap_socket(sock, server_hostname=hostname) as ssock:
        ssock.sendall(http_msg)
        # Reuse http.client's response parser on the raw TLS socket.
        response = http.client.HTTPResponse(ssock, method="GET")
        response.begin()
        print(response.status, response.reason)
        data = response.read()

resp_data = json.loads(data.decode("utf-8"))
What was perplexing is that the HTTP message I used was the exact same one used by urllib3, as indicated when debugging the following code. (See this answer for how to set up logging to debug requests, which also works for urllib3.)
Yet, this code gave a 403 HTTP status code.
import urllib3
http = urllib3.PoolManager()
r = http.request(
"GET",
"https://www.soraredata.com/api/stats/newFullRankings/all/false/all/7/0/sr_football",
)
assert r.status == 403
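For reference, the logging setup mentioned above is the standard http.client debug hook; nothing in it is specific to this site:
import logging
import http.client

# Echo the request/response lines that http.client (and therefore
# requests/urllib3) sends and receives.
http.client.HTTPConnection.debuglevel = 1

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
urllib3_log = logging.getLogger("urllib3")
urllib3_log.setLevel(logging.DEBUG)
urllib3_log.propagate = True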
Moreover, http.client also gave a 403 status code, and it seems to be doing pretty much what I did above: wrap a socket in an SSL context and send the request.
conn = http.client.HTTPSConnection(hostname)
conn.request("GET", path)
res = conn.getresponse()
assert res.status == 403
Thank you ogdenkev!
I also found this, but it doesn't always work:
import cloudscraper
import json

url = "https://www.soraredata.com/api/stats/newFullRankings/all/false/all/7/0/sr_football"

scraper = cloudscraper.create_scraper()
r = scraper.get(url).text
y = json.loads(r)
print(y)
I am trying to send a POST request to an HTTPS website, but urllib only works on HTTP, so can you tell me how to use urllib for HTTPS?
Thanks in advance.
It's simply not true that urllib only works on HTTP, not HTTPS. It fully supports HTTPS.
In any case though, you probably want to be using the third-party library requests.
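For example, a minimal POST over HTTPS with requests (the URL and form fields here are placeholders, not a real endpoint):
import requests

# Hypothetical endpoint and form fields, just to show the shape of the call.
response = requests.post(
    "https://your_domain.com/form/handler/test",
    data={"user": "pew", "age": 52},
)
print(response.status_code, response.reason)
print(response.text)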
I'd rather use httplib:
import httplib
import urllib
params = urllib.urlencode({'user': 'pew', 'age': 52})
headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
conn = httplib.HTTPSConnection("your_domain.com")
conn.request("POST", "/form/handler/test", params, headers)
response = conn.getresponse()
print response.status
print response.reason
reply = response.read()
print reply
I want to send a POST request to a page after opening it in Python (using urllib2.urlopen). The webpage is http://wireless.walmart.com/content/shop-plans/?r=wm
The code I am using right now is:
import urllib
import urllib2

url = 'http://wireless.walmart.com/content/shop-plans/?r=wm'
user_agent = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1;Trident/5.0)'
values = {'carrierID':'68',
          'conditionToType':'1',
          'cssPrepend':'wm20',
          'partnerID':'36575'}
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
page = response.read()
walmart = open('Walmart_ContractPlans_ATT.html','wb')
walmart.write(page)
walmart.close()
This is giving me the page that opens by default. After inspecting the page using Firebug, I found that carrierID:68 is sent when I click the button that triggers this POST request.
I want to simulate this browser behaviour.
Please help me in resolving this.
For web scraping I prefer to use requests and pyquery. First you download the data:
import requests
from pyquery import PyQuery as pq
url = 'http://wireless.walmart.com/content/getRatePlanInfo'
payload = {'carrierID':68, 'conditionToType':1, 'cssPrepend':'wm20'}
r = requests.post(url, data=payload)
d = pq(r.text)
After this you proceed to parse the elements, for example to extract all plans:
plans = []
plans_selector = '.wm20_planspage_planDetails_sub_detailsDiv_ul_li'
d(plans_selector).each(lambda i, n: plans.append(pq(n).text()))
Result:
['Basic 200',
'Simply Everything',
'Everything Data 900',
'Everything Data 450',
'Talk 450',
...
I recommend looking at a browser emulator like mechanize, rather than trying to do this with raw HTTP requests.
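A rough sketch of what that can look like with mechanize (the form index and field name below are assumptions; inspect the page to get the real ones):
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # assumption: the site's robots.txt may block scripted access
br.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1;Trident/5.0)')]
br.open('http://wireless.walmart.com/content/shop-plans/?r=wm')
br.select_form(nr=0)    # assumption: the plan form is the first form on the page
br['carrierID'] = '68'  # assumption: carrierID is a named control in that form
response = br.submit()  # mechanize builds and sends the POST for you
page = response.read()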
When I try to post data from an HTTP page to an HTTPS page, urllib2 does not return the desired HTTPS page; instead the website asks me to enable cookies.
To get the first HTTP page:
proxyHandler = urllib2.ProxyHandler({'http': "http://proxy:port" })
opener = urllib2.build_opener(proxyHandler)
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')]
urllib2.install_opener(opener)
resp = urllib2.urlopen(url)
content = resp.read()
When I extract data from the above page and post it to the second HTTPS page, urllib2 returns a success status of 200, but the page asks me to enable cookies.
I've checked the POST data and it's fine. I'm getting cookies from the website, but I'm not sure whether they are being sent with the next request, as I read in the Python docs that urllib2 automatically handles cookies.
To get the second HTTPS page:
resp = urllib2.urlopen(url, data=postData)
content = resp.read()
I also tried setting the proxy handler to the following, as suggested in a reply to a similar problem on Stack Overflow, but got the same result:
proxyHandler = urllib2.ProxyHandler({'https': "http://proxy:port" })
urllib2 "handles" cookies in responses but it doesn't not automatically store them and resend them with later requests. You'll need to use the cooklib module for that.
There are some examples in the documentation that show how it works with urllib2.
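A minimal sketch of that, reusing the proxy and the url/postData variables from the question:
import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(
    urllib2.ProxyHandler({'http': "http://proxy:port"}),
    urllib2.HTTPCookieProcessor(cj),  # stores cookies and resends them on later requests
)
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')]
urllib2.install_opener(opener)

resp = urllib2.urlopen(url)            # cookies set by this response land in cj
resp = urllib2.urlopen(url, postData)  # ...and are sent back automatically here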
I'm trying to send a POST request to a RESTful web service. I need to pass some JSON in the request. It works with the curl command below:
curl --basic -i --data '<json data string here>' -H Content-type:"text/plain" -X POST http://www.test.com/api
I need some help making the above request from Python. To send this POST request from Python, I have the following code so far:
import urllib
url='http://www.test.com/api'
params = urllib.urlencode... #What should be here ?
data = urllib.urlopen(url, params).read()
I have the following three questions:
Is this the correct way to send the request?
How should I specify the params value?
Does the Content-Type need to be specified?
Please help.
Thank you.
The documentation for httplib has an example of sending a POST request:
>>> import httplib, urllib
>>> params = urllib.urlencode({'#number': 12524, '#type': 'issue', '#action': 'show'})
>>> headers = {"Content-type": "application/x-www-form-urlencoded",
... "Accept": "text/plain"}
>>> conn = httplib.HTTPConnection("bugs.python.org")
>>> conn.request("POST", "", params, headers)
>>> response = conn.getresponse()
>>> print response.status, response.reason
302 Found
>>> data = response.read()
>>> data
'Redirecting to http://bugs.python.org/issue12524'
>>> conn.close()
Construct a dict of the data you want to send as a POST request.
urlencode the dict to get a string.
urlopen the URL you want, passing in the optional data parameter as your encoded POST data; supplying data is what makes the request a POST. A sketch of these steps follows.
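A minimal sketch of those three steps (the URL and field names are placeholders):
import urllib
import urllib2

values = {'user': 'pew', 'age': 52}  # 1. dict of the POST fields (placeholder names)
data = urllib.urlencode(values)      # 2. urlencode the dict to get a string
response = urllib2.urlopen('http://www.test.com/api', data)  # 3. passing data makes this a POST
print response.read()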
The question deals with sending the parameters as JSON. You need to set the Content-Type to application/json in the headers and then send the parameters without urlencoding them. For example:
url = "someUrl"
data = { "data":"ur data"}
header = {"Content-Type":"application/json","User-Agent":"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"}
#lets use httplib2
import httplib2
http = httplib2.Http()
response, send = http.request(url,"POST",headers=header,body=data)
You don't need urllib.urlencode() if Content-Type is not application/x-www-form-urlencoded:
import json, urllib2
data = {"some": "json", "d": ["a", "ta"]}
req = urllib2.Request("http://www.test.com/api", data=json.dumps(data),
headers={"Content-Type": "application/json"})
print urllib2.urlopen(req).read()
import requests
endpoint = 'https://xxxxxxxxxxxxxxxxxxx.com'
headers = {'Content-Type': 'text/plain'}
data = '{ id: 1 }'
result = requests.post(endpoint, headers=headers, data=data)
print(result)
Here's a sample snippet for making a POST request with JSON. The result will be printed in your terminal.
import json
import urllib, urllib2

url = 'http://www.test.com/api'
values = dict(data=json.dumps({"jsonkey1": "jsonvalue1", "jsonkey2": "jsonvalue2"}))
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
rsp = urllib2.urlopen(req)
content = rsp.read()
print content