I am trying to query a website in Python. I need to use a POST method (according to what is happening in my browser when I monitor it with the developer tools).
If I query the website with cURL, it works well:
curl -i --data "param1=var1&param2=var2" http://www.test.com
I get this header:
HTTP/1.1 200 OK
Date: Tue, 26 Sep 2017 08:46:18 GMT
Server: Apache/1.3.33 (Unix) mod_gzip/1.3.26.1a mod_fastcgi/2.4.2 PHP/4.3.11
Transfer-Encoding: chunked
Content-Type: text/html
But when I do it in Python 3, I get an error 104.
Here is what I have tried so far. First, with urllib (taking inspiration from this thread to use a POST method instead of GET):
import re
from urllib import request as ur
# URL to handle request
url = "http://www.test.com"
data = "param1=var1¶m2=var2"
# Build a request dictionary
preq = [re.findall(r"[^=]+", i) for i in re.findall(r"[^&]+", data)]
dreq = {i[0]: i[1] if len(i) == 2 else "" for i in preq}
# Initiate request & add method
ureq = ur.Request(url)
ureq.get_method = lambda: "POST"
# Send request
req = ur.urlopen(ureq, data=str(dreq).encode())
I did basically the same with requests:
import re
import requests
# URL to handle request
url = "http://www.test.com"
data = "param1=var1¶m2=var2"
# Build a request dictionary
preq = [re.findall(r"[^=]+", i) for i in re.findall(r"[^&]+", data)]
dreq = {i[0]: i[1] if len(i) == 2 else "" for i in preq}
# Initiate request & add method
req = requests.post(url, data=dreq)
In both cases, I get an Errno 104 (an OS-level connection error, not an HTTP status):
ConnectionResetError: [Errno 104] Connection reset by peer
That I don't understand since the same request is working with cURL. I guess I misunderstood something with Python request but so far I'm stuck. Any hint would be appreciated!
I've just figured out that I did not pass the data in the right format. I thought it needed to be stored in a dict; that is not the case, and the solution is therefore much simpler than what I tried previously.
With urllib:
req = ur.urlopen(ureq, data=str(data).encode())
With requests:
req = requests.post(url, data=data)
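For reference, here is a minimal end-to-end sketch of the corrected requests version (assuming, as in the cURL call, that http://www.test.com accepts this form body):
import requests

# Pass the urlencoded string directly as the body, exactly as cURL does;
# requests sends the string as-is and computes Content-Length for you.
url = "http://www.test.com"
data = "param1=var1&param2=var2"
req = requests.post(url, data=data)
print(req.status_code)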
Related
I am trying to create a bot that reads cookies, but I'm failing to do so. What the hell am I doing wrong?
import urllib.request
import http.cookiejar

URL = 'https://roblox.com'

def extract_cookies():
    cookie_jar = http.cookiejar.CookieJar()
    url_opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))
    url_opener.open(URL)
    print(URL)
    for cookie in cookie_jar:
        print("[Cookie Name = %s] [Cookie Value = %s]" % (cookie.name, cookie.value))

if __name__ == '__main__':
    extract_cookies()
It's a permanent redirect: https://roblox.com redirects to https://www.roblox.com. This is why you get an HTTP 308 status code. Note the difference in www.
The server tells you where to go in the HTTP response:
HTTP GET https://roblox.com/
--
HTTP/2 308 Permanent Redirect
location: https://www.roblox.com/
So update your URL to https://www.roblox.com
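If you want to watch the redirect happen, note that urllib only began following 308 redirects automatically in recent Python versions (3.11, as far as I know); on older versions the 308 surfaces as an HTTPError you can inspect. A minimal sketch:
import urllib.error
import urllib.request

# On a Python whose urllib does not auto-follow 308s, the redirect
# surfaces as an HTTPError carrying the Location header.
try:
    urllib.request.urlopen('https://roblox.com')
except urllib.error.HTTPError as e:
    print(e.code, e.headers.get('Location'))  # 308 https://www.roblox.com/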
I have a device which is sending the following HTTP message to my Raspberry Pi:
POST /sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData HTTP/1.1
Host: www.automation.siemens.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 349

xmlData=<rd><m>xxxxx</m><s>yyyyyy</s><d t="1483019400" l="600"><p i="1">460380AE</p><p i="2">43655DE7</p><p i="3">4212C986</p><p i="4">424805BC</p><p i="5">4604E3D1</p><p i="6">441F616A</p><p i="7">4155E7F5</p><p i="8">E1</p><p i="9">112</p><p i="C">153</p><p i="D">4</p><p i="E">11ABAC</p><p i="F">22A48C</p><p i="10">0</p></d></rd>
I cannot change anything on the device.
On the Raspberry Pi I'm running a script that listens on a socket and receives the message.
This works so far and the received message is the one above.
Now I would like to create an HTTP object from this message and then comfortably extract the headers, content and so on.
Similar to the following example:
r = requests.get('https://www.google.com')
r.status_code
However, I don't want to actually "get" a URL; I just want to read the string I already have.
Pseudo-example:
r = requests.read(hereComesTheString)
r.status_code
I hope the problem is clear.
Would be glad to get some hints.
Thanks and best regards,
Christoph
You use the status_code property in your example, but what you are receiving is a request, not a response. However, you can still create a simple object for accessing the data in the request.
It is probably easiest to create your own custom class (Python 2 here; a Python 3 port follows below):
import mimetools
from StringIO import StringIO
request = """POST /sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData HTTP/1.1
Host: www.automation.siemens.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 349

xmlData=<rd><m>xxxxx</m><s>yyyyyy</s><d t="1483019400" l="600"><p i="1">460380AE</p><p i="2">43655DE7</p><p i="3">4212C986</p><p i="4">424805BC</p><p i="5">4604E3D1</p><p i="6">441F616A</p><p i="7">4155E7F5</p><p i="8">E1</p><p i="9">112</p><p i="C">153</p><p i="D">4</p><p i="E">11ABAC</p><p i="F">22A48C</p><p i="10">0</p></d></rd>"""
class Request:
    def __init__(self, request):
        stream = StringIO(request)
        request = stream.readline()
        words = request.split()
        [self.command, self.path, self.version] = words
        self.headers = mimetools.Message(stream, 0)
        self.content = stream.read()

    def __getitem__(self, key):
        return self.headers.get(key, '')
r = Request(request)
print(r.command)
print(r.path)
print(r.version)
for header in r.headers:
    print(header, r[header])
print(r.content)
This outputs:
POST
/sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData
HTTP/1.1
('host', 'www.automation.siemens.com')
('content-type', 'application/x-www-form-urlencoded')
('content-length', '349')
xmlData=<rd><m>xxxxx</m><s>yyyyyy</s><d t="1483019400" l="600"><p i="1">460380AE</p><p i="2">43655DE7</p><p i="3">4212C986</p><p i="4">424805BC</p><p i="5">4604E3D1</p><p i="6">441F616A</p><p i="7">4155E7F5</p><p i="8">E1</p><p i="9">112</p><p i="C">153</p><p i="D">4</p><p i="E">11ABAC</p><p i="F">22A48C</p><p i="10">0</p></d></rd>
If you're using a plain socket server, then you need to implement enough of an HTTP server to split the request and respond according to the protocol.
It's probably easier just to use an existing HTTP server and app server. Flask is ideal for this:
from flask import Flask
from flask import request

app = Flask(__name__)

@app.route("/sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData", methods=['POST'])
def dataCollector():
    data = request.form['xmlData']
    print(data)
    # parse the data; take a look at ElementTree
    return "OK"  # a Flask view must return a response

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)
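For a quick local test of the route above (a sketch; the localhost URL and the sample xmlData value are my assumptions, not the device's real payload):
import requests

# Post a form-encoded body the same way the device does; the Flask view
# reads it back out of request.form['xmlData'].
r = requests.post(
    'http://localhost/sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData',
    data={'xmlData': '<rd><m>test</m></rd>'})
print(r.status_code, r.text)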
Thanks Alden. Below is your code with a few changes so it works with Python 3.
import email
from io import StringIO
request = """POST /sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData HTTP/1.1
Host: www.automation.siemens.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 349

xmlData=<rd><m>xxxxx</m><s>yyyyyy</s><d t="1483019400" l="600"><p i="1">460380AE</p><p i="2">43655DE7</p><p i="3">4212C986</p><p i="4">424805BC</p><p i="5">4604E3D1</p><p i="6">441F616A</p><p i="7">4155E7F5</p><p i="8">E1</p><p i="9">112</p><p i="C">153</p><p i="D">4</p><p i="E">11ABAC</p><p i="F">22A48C</p><p i="10">0</p></d></rd>"""
class Request:
    def __init__(self, request):
        stream = StringIO(request)
        request_line = stream.readline()
        words = request_line.split()
        [self.command, self.path, self.version] = words
        # Parse the rest of the stream: headers first, then the body as payload.
        self.headers = email.message_from_file(stream)
        self.content = self.headers.get_payload()

    def __getitem__(self, key):
        return self.headers.get(key, '')
r = Request(request)
print(r.command)
print(r.path)
print(r.version)
for header in r.headers.keys():
    print(header, r[header])
print(r.content)
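An alternative to hand-rolling the parsing is to lean on the standard library's own HTTP machinery. This is only a sketch (it assumes the raw bytes use \r\n line endings, as real traffic from the device would), but it reuses BaseHTTPRequestHandler's parser:
from http.server import BaseHTTPRequestHandler
from io import BytesIO

class ParsedRequest(BaseHTTPRequestHandler):
    def __init__(self, raw_request):
        self.rfile = BytesIO(raw_request)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()  # fills in command, path, request_version, headers

    def send_error(self, code, message=None, explain=None):
        # parse_request() calls this on malformed input; just record it
        self.error_code = code
        self.error_message = message

raw = (b"POST /sinvertwebmonitor/InverterService/InverterService.asmx/CollectInverterData HTTP/1.1\r\n"
       b"Host: www.automation.siemens.com\r\n"
       b"Content-Type: application/x-www-form-urlencoded\r\n"
       b"\r\n"
       b"xmlData=...")
r = ParsedRequest(raw)
print(r.command, r.path, r.request_version)
print(r.headers['Content-Type'])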
I am trying to upload a file to the server in a simple way and I am getting the following error:
HTTP/1.1 411 Length Required
Content-Type: text/html
Date: Wed, 01 Jul 2015 03:05:33 GMT
Connection: close
Content-Length: 24
Length Required
I tried to insert the length in different places and it doesn't seem to work. Any suggestions?
import socket
import httplib
import os.path
target_host = "192.168.1.1"
target_port = 80
total_size = os.path.getsize('/root/test.html')
client = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
client.connect((target_host,target_port))
client.send("PUT /root/test.html HTTP/1.1\r\nHost:192.168.1.1\r\n\r\n" )
response = client.recv(4096)
print response
Have you tried implementing this with requests? It takes care of the missing Content-Length header that is causing your error.
Check out this answer: Is there any way to do HTTP PUT in python
Here's an example using the requests library:
>>> payload = {'username': 'me', 'email': 'me@me.com'}
>>> r = requests.put("http://somedomain.org/endpoint", data=payload)
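Applied to your upload, a sketch (the host and path are taken from your raw request; requests computes Content-Length from the body for you):
import requests

# PUT the file; requests adds the Content-Length header automatically.
with open('/root/test.html', 'rb') as f:
    r = requests.put('http://192.168.1.1/root/test.html', data=f.read())
print(r.status_code, r.reason)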
I am writing some code to interface with redmine and I need to upload some files as part of the process, but I am not sure how to do a POST request from python containing a binary file.
I am trying to mimic the commands here:
curl --data-binary "@image.png" -H "Content-Type: application/octet-stream" -X POST -u login:password http://redmine/uploads.xml
I am trying to do the same in Python (below), but it does not seem to work. I am not sure if the problem is somehow related to encoding the file or if something is wrong with the headers.
import urllib2, os
FilePath = "C:\somefolder\somefile.7z"
FileData = open(FilePath, "rb")
length = os.path.getsize(FilePath)
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, 'http://redmine/', 'admin', 'admin')
auth_handler = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
request = urllib2.Request( r'http://redmine/uploads.xml', FileData)
request.add_header('Content-Length', '%d' % length)
request.add_header('Content-Type', 'application/octet-stream')
try:
response = urllib2.urlopen( request)
print response.read()
except urllib2.HTTPError as e:
error_message = e.read()
print error_message
I have access to the server, and it looks like an encoding error:
...
invalid byte sequence in UTF-8
Line: 1
Position: 624
Last 80 unconsumed characters:
7z¼¯'ÅÐз2^Ôøë4g¸R<süðí6kĤª¶!»=}jcdjSPúá-º#»ÄAtD»H7Ê!æ½]j):
(further down)
Started POST "/uploads.xml" for 192.168.0.117 at 2013-01-16 09:57:49 -0800
Processing by AttachmentsController#upload as XML
WARNING: Can't verify CSRF token authenticity
Current user: anonymous
Filter chain halted as :authorize_global rendered or redirected
Completed 401 Unauthorized in 13ms (ActiveRecord: 3.1ms)
Basically what you do is correct. Looking at the redmine docs you linked to, it seems that the suffix after the dot in the URL denotes the type of posted data (.json for JSON, .xml for XML), which agrees with the response you get - Processing by AttachmentsController#upload as XML. I guess there may be a bug in the docs, and to post binary data you should try the http://redmine/uploads URL instead of http://redmine/uploads.xml.
Btw, I highly recommend the very good and very popular Requests library for HTTP in Python. It's much better than what's in the standard lib (urllib2). It supports authentication as well, but I skipped it for brevity here.
import requests
with open('./x.png', 'rb') as f:
    data = f.read()
res = requests.post(url='http://httpbin.org/post',
                    data=data,
                    headers={'Content-Type': 'application/octet-stream'})
# let's check if what we sent is what we intended to send...
import json
import base64
assert base64.b64decode(res.json()['data'][len('data:application/octet-stream;base64,'):]) == data
UPDATE
To find out why this works with Requests but not with urllib2, we have to examine the difference in what's being sent. To see this, I'm sending the traffic through an HTTP proxy (Fiddler) running on port 8888:
Using Requests
import requests
data = 'test data'
res = requests.post(url='http://localhost:8888',
                    data=data,
                    headers={'Content-Type': 'application/octet-stream'})
we see
POST http://localhost:8888/ HTTP/1.1
Host: localhost:8888
Content-Length: 9
Content-Type: application/octet-stream
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/1.0.4 CPython/2.7.3 Windows/Vista
test data
and using urllib2
import urllib2
data = 'test data'
req = urllib2.Request('http://localhost:8888', data)
req.add_header('Content-Length', '%d' % len(data))
req.add_header('Content-Type', 'application/octet-stream')
res = urllib2.urlopen(req)
we get
POST http://localhost:8888/ HTTP/1.1
Accept-Encoding: identity
Content-Length: 9
Host: localhost:8888
Content-Type: application/octet-stream
Connection: close
User-Agent: Python-urllib/2.7
test data
I don't see any differences which would warrant the different behavior you observe. Having said that, it's not uncommon for HTTP servers to inspect the User-Agent header and vary behavior based on its value. Try changing the headers sent by Requests one by one, making them the same as those sent by urllib2, and see when it stops working.
This has nothing to do with a malformed upload. The HTTP error clearly specifies 401 Unauthorized, and the log warns that the CSRF token can't be verified. Try sending a valid CSRF token with the upload.
More about CSRF tokens here:
What is a CSRF token? What is its importance and how does it work?
You need to add a Content-Disposition header, something like this (although I used mod_python here, the principle should be the same):
request.headers_out['Content-Disposition'] = 'attachment; filename=%s' % myfname
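A rough equivalent with requests, in case you don't want mod_python. This is a sketch: the URL and admin/admin credentials come from the question, and whether Redmine honors the header sent this way is untested:
import requests

# Same upload with an explicit Content-Disposition header.
with open('somefile.7z', 'rb') as f:
    r = requests.post('http://redmine/uploads.xml',
                      auth=('admin', 'admin'),
                      data=f,
                      headers={'Content-Type': 'application/octet-stream',
                               'Content-Disposition': 'attachment; filename=somefile.7z'})
print(r.status_code)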
You can use unirest; it provides an easy way to fire off a POST request.
import unirest

def callback(response):
    print "code:" + str(response.code)
    print "******************"
    print "headers:" + str(response.headers)
    print "******************"
    print "body:" + str(response.body)
    print "******************"
    print "raw_body:" + str(response.raw_body)

# consume async post request
def consumePOSTRequestASync():
    params = {'test1': 'param1', 'test2': 'param2'}
    # unirest does not provide a flag to switch between
    # application/x-www-form-urlencoded and multipart/form-data,
    # so we pass a dummy file parameter to force multipart/form-data
    params['dummy'] = open('dummy.txt', 'r')
    url = 'http://httpbin.org/post'
    headers = {"Accept": "application/json"}
    # call the post service with headers and params
    unirest.post(url, headers=headers, params=params, callback=callback)

# post an async multipart/form-data request
consumePOSTRequestASync()
I am using the Python Requests module to datamine a website. As part of the datamining, I have to HTTP POST a form and check whether it succeeded by checking the resulting URL. My question is: after the POST, is it possible to ask the server not to send the entire page? I only need to check the URL, yet my program downloads the entire page and consumes unnecessary bandwidth. The code is very simple:
import requests

r = requests.post(URL, payload)
if 'keyword' in r.url:
    ...  # success
else:
    ...  # fail
An easy solution, if it's implementable for you, is to go low-level and use the socket library. For example, say you need to send a POST with some data in its body; I used this in my crawler for one site.
import socket
from urllib import urlencode  # Python 2; urlencode escapes the POST body properly

req_header = "POST /{0} HTTP/1.1\r\nHost: www.yourtarget.com\r\nUser-Agent: For the lulz..\r\nContent-Type: application/x-www-form-urlencoded; charset=UTF-8\r\nContent-Length: {1}"
req_body = urlencode({'data1': 'yourtestdata', 'data2': 'foo', 'data3': 'bar='})
req_url = "test.php"
# plug req_url in as {0} and the length of req_body in as Content-Length
header = req_header.format(req_url, str(len(req_body)))

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create a socket
s.connect(("www.yourtarget.com", 80))  # connect it
# send header + two CRLFs + body + two CRLFs to complete the request
s.send(header + "\r\n\r\n" + req_body + "\r\n\r\n")

page = ""
while True:
    buf = s.recv(1024)  # receive up to 1024 bytes; this should be enough to get the header in one try
    if not buf:
        break
    page += buf
    if "\r\n\r\n" in page:  # we received the whole header (ending with 2x CRLF)
        break
s.close()  # close the socket, which closes the TCP connection even if data is still flowing in
# this should leave you with a header in which you should find a 302 redirect
# and your target URL in the "Location:" header
There's a chance the site uses the Post/Redirect/Get (PRG) pattern. If so, it's enough not to follow the redirect and to read the Location header from the response.
Example
>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/1', allow_redirects=False)
>>> response.status_code
302
>>> response.headers['location']
'http://httpbin.org/get'
If you need more information on what you would get if you had followed the redirect, you can use HEAD on the URL given in the Location header.
Example
>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/1', allow_redirects=False)
>>> response.status_code
302
>>> response.headers['location']
'http://httpbin.org/get'
>>> response2 = requests.head(response.headers['location'])
>>> response2.status_code
200
>>> response2.headers
{'date': 'Wed, 07 Nov 2012 20:04:16 GMT', 'content-length': '352', 'content-type':
'application/json', 'connection': 'keep-alive', 'server': 'gunicorn/0.13.4'}
It would help if you gave some more data, for example a sample URL that you're trying to request. That being said, it seems to me that you're generally checking whether you had the correct URL after your POST request, using the following algorithm relying on redirection or HTTP 404 errors:
if original_url == returned request url:
    correct url to a correctly made request
else:
    wrong url and a wrongly made request
If this is the case, you can use an HTTP HEAD request (another type of HTTP request, like GET, POST, etc.) via Python's requests library to get only the headers and not the page body. Then you'd check the response code and the redirect URL (if present) to see if you made a request to a valid URL.
For example:
import requests

def attempt_url(url):
    '''Checks the url to see if it is valid, or returns a redirect or error.
    Returns True if valid, False otherwise.'''
    r = requests.head(url)
    if r.status_code == 200:
        return True
    elif r.status_code in (301, 302):
        if r.headers['location'] == url:
            return True
        else:
            return False
    elif r.status_code == 404:
        return False
    else:
        raise Exception("A status code we haven't prepared for has arisen!")
If this isn't quite what you're looking for, additional detail on your requirements would help. At the very least, this gets you the status code and headers without pulling all of the page data.
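For example, a quick sanity check against httpbin (my choice of target, assuming it's reachable):
# HEAD /get returns 200; HEAD /status/404 returns 404.
print(attempt_url('http://httpbin.org/get'))         # True
print(attempt_url('http://httpbin.org/status/404'))  # False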