I'm trying to write a Python script that sends a POST request to store data in a database automatically. Since my web application (made in Django) is still in development (on localhost), I tried to send this POST request from another computer on the same network via IPv4:port.
While doing this I get HTTP Error 500: Internal server error, with the traceback pointing to the line "response = urllib2.urlopen(req)".
My script looks like this:
import urllib, urllib2, cookielib
import re
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
urllib2.install_opener(opener)
url_1 = 'http://192.168.1.104:8000/legacy'
req = urllib2.Request(url_1)
rsp = urllib2.urlopen(req)
url_2 = 'http://192.168.1.104:8000/legacy'
params = {
'temperature_max' : '186',
'temperature_min' : '88',
'submit': 'Submit',
}
data = urllib.urlencode(params)
req = urllib2.Request(url_2, data)
response = urllib2.urlopen(req)
the_page = response.read()
pat = re.compile('Title:.*')
print pat.search(the_page).group()
On the other computer that is hosting the server I get the following error:
Exception happened during processing of request from ('192.168.1.65', 56996)
Traceback (most recent call last):
File "c:\python27\Lib\SocketServer.py", line 599, in process_request_thread
self.finish_request(request, client_address)
File "c:\python27\Lib\SocketServer.py", line 334, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "C:\Users\Alex\Envs\rango\lib\site-packages\django\core\servers\basehttp.
py", line 129, in __init__
super(WSGIRequestHandler, self).__init__(*args, **kwargs)
File "c:\python27\Lib\SocketServer.py", line 657, in __init__
self.finish()
File "c:\python27\Lib\SocketServer.py", line 716, in finish
self.wfile.close()
File "c:\python27\Lib\socket.py", line 283, in close
self.flush()
File "c:\python27\Lib\socket.py", line 307, in flush
self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 10054] An existing connection was forcibly closed by the remote host
EDIT: I'd like to let you know that I can post data from the other computer to my database if I connect to my app with my browser.
You will need to get a session cookie and then read the generated cookie for that session.
The best way to go about that is to fetch the page with the form first, saving all cookies. You can use cookielib for that as demonstrated here. If you want to make life easier for yourself, use requests as well instead of urllib. The code then becomes a lot simpler.
To get the CSRF token, you can scrape the form page with BeautifulSoup. There are lots of tutorials for this which you can easily find on Google.
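As a rough sketch of the token-scraping step: Django's CSRF middleware renders the token as a hidden input named csrfmiddlewaretoken, so it can be pulled out of the form HTML and sent back with the POST. This uses Python 3's standard html.parser instead of BeautifulSoup to stay dependency-free. SAMPLE_FORM, CsrfTokenParser and extract_csrf_token are made-up names for illustration, and the form markup is only an assumption about what /legacy returns.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class CsrfTokenParser(HTMLParser):
    """Remembers the value of the hidden csrfmiddlewaretoken input."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == "csrfmiddlewaretoken":
            self.token = attrs.get("value")


def extract_csrf_token(form_html):
    """Return the CSRF token found in form_html, or None if there is none."""
    parser = CsrfTokenParser()
    parser.feed(form_html)
    return parser.token


# Assumed stand-in for the HTML that a GET of the form page would return.
SAMPLE_FORM = """
<form method="post" action="/legacy">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123">
  <input type="text" name="temperature_max">
  <input type="text" name="temperature_min">
  <input type="submit" name="submit" value="Submit">
</form>
"""

token = extract_csrf_token(SAMPLE_FORM)

# Body for the follow-up POST: the token plus the fields from the question.
post_body = urlencode({
    "csrfmiddlewaretoken": token,
    "temperature_max": "186",
    "temperature_min": "88",
    "submit": "Submit",
})
print(post_body)
```

In the real script you would fetch the form page through the cookie-aware opener first, so that the csrftoken cookie lands in the cookie jar, and then POST this body with the same opener.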
Try:
import urllib.request
from urllib.error import HTTPError
url = "<enter_URL_here>"
try:
    res = urllib.request.urlopen(url)
except HTTPError as e:
    content = e.read()
    print("HTTP error encountered while reading URL")
I have a daemon which uploads json to http://localhost:9000, I can read json easily using hapi with this code:
'use strict';
const Hapi = require('hapi');
const server = new Hapi.Server();
server.connection({
    host: 'localhost',
    port: 9000
});
server.route({
    method: 'POST',
    path:'/push/',
    handler: function (request, reply) {
        console.log('Server running at:', request.payload);
        return reply('hello world');
    }
});
server.start((err) => {
    if (err) {
        throw err;
    }
    console.log('Server running at:', server.info.uri);
});
but I'm trying to handle this in Python. I found an answer here, so here is my code:
import urllib, json
url = 'http://localhost:9000/push/'
response = urllib.urlopen(url)
data = json.load(response.read())
print data
but when I run it I get an exception:
Traceback (most recent call last):
File "pythonServer.py", line 3, in <module>
response = urllib.urlopen(url)
File "/usr/lib/python2.7/urllib.py", line 87, in urlopen
return opener.open(url)
File "/usr/lib/python2.7/urllib.py", line 213, in open
return getattr(self, name)(url)
File "/usr/lib/python2.7/urllib.py", line 350, in open_http
h.endheaders(data)
File "/usr/lib/python2.7/httplib.py", line 997, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 850, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 812, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 793, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
IOError: [Errno socket error] [Errno 111] Connection refused
How can I solve this?
Your server route is expecting a POST request.
Your Python code is sending a GET request.
Notice that your JavaScript code is using the following:
method: 'POST'
Your Python code, by contrast, will default to a GET request, since you didn't explicitly specify the method. You can look here for how to make a POST request using urllib, but I would recommend you look into using the requests library instead.
It's as simple as
import requests
r = requests.post("http://localhost:9000/push/", data = {"key":"value"})
Alternatively, just change your server (JavaScript/Node.js code) to use method: 'GET'.
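If you would rather stay with urllib, here is a minimal sketch of the difference, written for Python 3's urllib.request (the successor of urllib2). A Request without a body is a GET; supplying data turns it into a POST. The {"key": "value"} payload and the JSON content type are assumptions for illustration, not something the hapi route requires.

```python
import json
from urllib.request import Request

url = "http://localhost:9000/push/"

# No body: urllib treats this as a GET.
get_req = Request(url)

# With a body (bytes): urllib treats this as a POST.
body = json.dumps({"key": "value"}).encode("utf-8")
post_req = Request(url, data=body, headers={"Content-Type": "application/json"})

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```

Passing post_req to urllib.request.urlopen would then send the POST, provided the server is actually listening on that port.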
Your server expects a POST request, but you are sending a GET request.
response = urllib.urlopen(url)
urllib.urlopen opens the network URL for reading.
Instead of sending a GET request, you can send a POST request to the server.
Here is the complete code:
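import json, urllib2
url = 'http://localhost:9000/push/'
# Passing a body as the second argument makes urllib2 send a POST instead of a GET
payload = json.dumps({"key": "value"})
# This packages the request (it doesn't make it)
request = urllib2.Request(url, payload, {'Content-Type': 'application/json'})
# Sends the request and catches the response
response = urllib2.urlopen(request)
# The hapi handler above replies with plain text, so print the body as-is
print response.read()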
I am working with a local html file in python, and I am trying to use lxml to parse the file. For some reason I can't get the file to load properly, and I'm not sure if this has to do with not having an http server set up on my local machine, etree usage, or something else.
My reference for this code was this: http://docs.python-guide.org/en/latest/scenarios/scrape/
This could be a related problem: Requests : No connection adapters were found for, error in Python3
Here is my code:
from lxml import html
import requests
page = requests.get('C:\Users\...\sites\site_1.html')
tree = html.fromstring(page.text)
test = tree.xpath('//html/body/form/div[3]/div[3]/div[2]/div[2]/div/div[2]/div[2]/p[1]/strong/text()')
print test
The traceback that I'm getting reads:
C:\Python27\python.exe "C:/Users/.../extract_html/extract.py"
Traceback (most recent call last):
File "C:/Users/.../extract_html/extract.py", line 4, in <module>
page = requests.get('C:\Users\...\sites\site_1.html')
File "C:\Python27\lib\site-packages\requests\api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 567, in send
adapter = self.get_adapter(url=request.url)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 641, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'C:\Users\...\sites\site_1.html'
Process finished with exit code 1
You can see that it has something to do with a "connection adapter" but I'm not sure what that means.
If the file is local, you shouldn't be using requests -- just open the file and read it in. requests expects to be talking to a web server.
with open(r'C:\Users\...site_1.html', "r") as f:
    page = f.read()
tree = html.fromstring(page)
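If the same script sometimes has to handle URLs and sometimes local files, one option is to branch on the scheme. This is only a sketch in Python 3 using the standard library; load_html is a hypothetical helper name, and the temporary file just stands in for site_1.html so the example can run anywhere.

```python
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def load_html(source):
    """Return HTML text from an http(s) URL or from a local file path."""
    scheme = urlparse(source).scheme
    if scheme in ("http", "https"):
        with urlopen(source) as response:
            return response.read().decode("utf-8", errors="replace")
    # Anything else, including a Windows drive letter such as "C:",
    # is treated as a path on disk.
    with open(source, "r", encoding="utf-8") as f:
        return f.read()


# Demonstration with a throwaway local file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("<html><body><p><strong>hello</strong></p></body></html>")
    sample_path = tmp.name

page = load_html(sample_path)
os.remove(sample_path)
print(page)
```

The returned string can be handed to html.fromstring exactly as in the snippet above.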
A better way is to use the parse function instead of fromstring (note the raw string, so the backslashes in the path are not treated as escapes):
tree = html.parse(r"C:\Users\...site_1.html")
print(html.tostring(tree))
You can also try using Beautiful Soup
from bs4 import BeautifulSoup
f = open("filepath", encoding="utf8")
soup = BeautifulSoup(f, "html.parser")
f.close()
I can download things from my controlled server in one way - by passing the document ID into a link like so :
https://website/deployLink/442/document/download/$NUMBER
If I navigate to this in my browser, it downloads the file with ID $NUMBER.
The problem is, I have 9,000 files on my server, which is SSL encrypted and usually requires signing in with a username/password on a dialog box popup which appears on the web-page.
I posted a similar thread to this already, where I downloaded the files via WGET. Now I would like to try and use Python, and I'd like to provide the username/password and get through the SSL encryption.
Here is my attempt to grab one file, which results in a 401 error. Full stacktrace below.
import urllib2
import ctypes
from HTMLParser import HTMLParser
# create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# create "opener" (OpenerDirector instance)
opener = urllib2.build_opener(handler)
# Install the opener.
# Now all calls to urllib2.urlopen use our opener.
urllib2.install_opener(opener)
# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
html = response.read()
class MyHTMLParser(HTMLParser):
url = 'https://website/deployLink/442/document/download/1'
# Save the file
webpage = urllib2.urlopen(url)
with open('Test.doc','wb') as localFile:
    localFile.write(webpage.read())
What have I done incorrectly here? Is what I am attempting possible?
C:\Python27\python.exe C:/Users/ADMIN/PycharmProjects/GetFile.py
Traceback (most recent call last):
File "C:/Users/ADMIN/PycharmProjects/GetFile.py", line 22, in <module>
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
File "C:\Python27\lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 437, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 475, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Processed
Process finished with exit code 1
Here's my authent page with some info removed for privacy :
Authent url ends in :443.
Assuming your code above is accurate, then I think your problem is related to the URIs in your add_password method. You have this when setting up the username/password:
# Add the username and password.
top_level_url = "https://website.com/home.html"
password_mgr.add_password(None, top_level_url, "admin", "password")
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
And then your subsequent request goes to this URI:
# Grab website
response = urllib2.urlopen('https://website/deployLink/442/document/download/1')
(I'm assuming the host names were "scrubbed" inconsistently and should be the same, so I'll move on. See: "website" vs. "website.com".)
The second URI is not a child of the first URI based on their respective path portions. The URI path /deployLink/442/document/download/1 is not a child of /home.html. From the perspective of the library, you'd have no auth data for the second URI.
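You can check the matching rule without touching the network. The sketch below uses Python 3's urllib.request, where the password manager behaves the same way as in urllib2; lookup is a hypothetical helper, and I've assumed website.com as the host for the download URL.

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm


def lookup(registered_uri, request_uri):
    """Return the (user, password) pair urllib would find for request_uri."""
    mgr = HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, registered_uri, "admin", "password")
    return mgr.find_user_password(None, request_uri)


download_url = "https://website.com/deployLink/442/document/download/1"

# Registered under /home.html: the download path is not below it.
print(lookup("https://website.com/home.html", download_url))  # (None, None)

# Registered under the site root: every path on that host matches.
print(lookup("https://website.com/", download_url))  # ('admin', 'password')

# A different host ("website" vs. "website.com") never matches.
print(lookup("https://website/", download_url))  # (None, None)
```

So registering the credentials against the site root (or against the /deployLink/ prefix) is what lets the handler answer the 401 for the download URL.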
If I use urllib to load this url( https://www.fundingcircle.com/my-account/sell-my-loans/ ) I get a 400 status error.
e.g. The following returns a 400 error
>>> import urllib
>>> f = urllib.urlopen("https://www.fundingcircle.com/my-account/sell-my-loans/")
>>> print f.read()
However, if I copy and paste the url into my browser, I see a web page with the information that I want to see.
I have tried using a try, except, and then reading the error. But the returned data just tells me that the page does not exist. e.g.
import urllib
try:
    f = urllib.urlopen("https://www.fundingcircle.com/my-account/sell-my-loans/")
except Exception as e:
    eString = e.read()
    print eString
Why can't Python load the page?
If Python is given a 404 status then that'd be because the server refuses to give you the page.
Why that is is difficult to know, because servers are black boxes. But your browser gives the server more than just the URL, it also gives it a set of HTTP headers. Most likely the server alters behaviour based on the contents of one or more of those headers.
You need to look in your browser development tools and see what your browser sends, then try and replicate some of those headers from Python. Obvious candidates are the User-Agent header, followed by Accept and Cookie headers.
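A minimal sketch of replicating headers, using Python 3's urllib.request. The header values here are placeholders I made up; copy the real ones from your browser's network tab.

```python
from urllib.request import Request

url = "https://www.fundingcircle.com/my-account/sell-my-loans/"

# Placeholder values: replace them with what your browser actually sends.
browser_like_headers = {
    "User-Agent": "Mozilla/5.0 (placeholder; copy the real value from your browser)",
    "Accept": "text/html,application/xhtml+xml",
}

req = Request(url, headers=browser_like_headers)

# urllib normalises header names, e.g. "User-Agent" is stored as "User-agent".
print(req.header_items())
```

Passing req to urllib.request.urlopen sends the request with those headers attached.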
However, in this specific case, the server is responding with a 401 Unauthorized; you are given a login page. It does this both for the browser and Python:
>>> import urllib
>>> urllib.urlopen('https://www.fundingcircle.com/my-account/sell-my-loans/')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib.py", line 87, in urlopen
return opener.open(url)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib.py", line 208, in open
return getattr(self, name)(url)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib.py", line 451, in open_https
return self.http_error(url, fp, errcode, errmsg, headers)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib.py", line 372, in http_error
result = method(url, fp, errcode, errmsg, headers)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib.py", line 683, in http_error_401
errcode, errmsg, headers)
File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib.py", line 381, in http_error_default
raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 401, 'Unauthorized', <httplib.HTTPMessage instance at 0x1066f9a28>)
but Python's urllib doesn't have a handler for the 401 status code and turns that into an exception.
The response body contains a login form; you'll have to write code to log in here, and presumably track cookies.
That task would be a lot easier with more specialised tools. You could use robobrowser to load the page, parse the form and give you the tools to fill it out, then post the form for you and track the cookies required to keep you logged in. It is built on top of the excellent requests and BeautifulSoup libraries.
I'm getting a getaddrinfo error and, after doing some sleuthing, it looks like it might be my corporate intranet not allowing the connection (I'm assuming due to security, although it is strange that IE works while Python is not allowed to open a URL). Is there a safe way to get around this?
Here's the exact error:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
b = urllib.urlopen('http://www.google.com')
File "C:\Python26\lib\urllib.py", line 87, in urlopen
return opener.open(url)
File "C:\Python26\lib\urllib.py", line 203, in open
return getattr(self, name)(url)
File "C:\Python26\lib\urllib.py", line 342, in open_http
h.endheaders()
File "C:\Python26\lib\httplib.py", line 868, in endheaders
self._send_output()
File "C:\Python26\lib\httplib.py", line 740, in _send_output
self.send(msg)
File "C:\Python26\lib\httplib.py", line 699, in send
self.connect()
File "C:\Python26\lib\httplib.py", line 683, in connect
self.timeout)
File "C:\Python26\lib\socket.py", line 498, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11001] getaddrinfo failed
More info: I also get this error with urllib2.urlopen
You probably need to fill in proxy information.
import urllib2
proxy_handler = urllib2.ProxyHandler({'http': 'http://yourcorporateproxy:12345/'})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
opener.open('http://www.stackoverflow.com')
Check you are using the correct proxy.
You can get the proxy information by using urllib.getproxies (note: getproxies does not work with dynamic proxy configuration, like when using PAC).
Update: Given that the proxy list comes back empty, I would suggest building an opener with the proxy name and information set explicitly.
Some good information about how to use proxies with openers:
Urllib manual
Michael Foord's introduction to urllib
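Roughly, the two steps look like this in Python 3, where getproxies and the handler classes live in urllib.request. PROXY_URL reuses the placeholder address from the answer above and is not a real proxy.

```python
from urllib.request import ProxyHandler, build_opener, getproxies

# Whatever the OS or environment variables advertise (often empty with PAC).
detected = getproxies()
print("Detected proxies:", detected)

# Placeholder address: replace it with your real corporate proxy.
PROXY_URL = "http://yourcorporateproxy:12345/"

# Fall back to the explicit proxy when nothing was detected.
proxies = detected or {"http": PROXY_URL, "https": PROXY_URL}
opener = build_opener(ProxyHandler(proxies))

# On the real network you would then call, for example:
# opener.open("http://www.google.com")
```

Nothing is sent until opener.open is called, so this is safe to run while you work out the right proxy address.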
Possibly this is a DNS issue, try urlopen with the IP address of the web server you're accessing, i.e.
import urllib
URL="http://66.102.11.99" # www.google.com
f = urllib.urlopen(URL)
f.read()
If this succeeds, then it's probably a DNS issue rather than a proxy issue (but you should also check your proxy setup).
Looks like a DNS problem.
Since you are using Windows, you can run this command to check whether the web address resolves successfully:
nslookup www.google.com
If it does not, it is a network settings issue.
If it does, then we have to look at other possible causes.
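The same check can be done from inside Python with the call that fails in the traceback. A small sketch for Python 3; can_resolve is a hypothetical helper name.

```python
import socket


def can_resolve(host, port=80):
    """Return True if getaddrinfo can resolve host, False if it fails."""
    try:
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        return False


# A numeric address needs no DNS lookup, so this succeeds even offline.
print(can_resolve("127.0.0.1"))
```

On the affected machine, can_resolve("www.google.com") returning False while the numeric address works would point at name resolution (or a proxy that normally does the resolving) rather than at urllib itself.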
I was facing the same issue.
In my system the proxy configuration is through a .PAC file.
So I opened that file and took the default proxy URL from it; for me it was http://168.219.61.250:8080/
The following test code worked for me:
import urllib2
proxy_support = urllib2.ProxyHandler({'http': 'http://168.219.61.250:8080/'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://python.org/')
html = response.read()
print html
You might need to add some more code if your proxy requires authentication.
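For the authentication case, a sketch along these lines might work, written for Python 3's urllib.request. The "username" and "password" values are placeholders, and whether your proxy accepts Basic authentication at all is an assumption.

```python
from urllib.request import (
    HTTPPasswordMgrWithDefaultRealm,
    ProxyBasicAuthHandler,
    ProxyHandler,
    build_opener,
)

PROXY = "http://168.219.61.250:8080/"

# Placeholder credentials, registered for any realm on the proxy host.
password_mgr = HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, PROXY, "username", "password")

opener = build_opener(
    ProxyHandler({"http": PROXY}),
    ProxyBasicAuthHandler(password_mgr),  # answers 407 challenges from the proxy
)

# On the real network you would then call, for example:
# opener.open("http://python.org/")
```

ProxyBasicAuthHandler is the proxy counterpart of HTTPBasicAuthHandler: it responds to 407 Proxy Authentication Required rather than 401.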
Hope this helps!!