Using Census Bulk Geocoder with python requests library - python

I am experimenting with the Census bulk geocoder API, following its documentation.
The following curl command works:
curl --form addressFile=@Addresses.csv --form benchmark=9 http://geocoding.geo.census.gov/geocoder/locations/addressbatch --output geocoderesult.csv
But when I attempt to port this to python requests:
url = 'http://geocoding.geo.census.gov/geocoder/geographies/addressbatch'
payload = {'benchmark':9}
files = {'addressFile': ('Addresses.csv', open('Addresses.csv', 'rb'), 'text/csv')}
r = requests.post(url, files=files, data = payload)
print(r.text)
I am apparently not sending a well-formed request: the only response I receive is "There was an internal error". Any idea what I am doing wrong in forming this request?

Got it! It turns out that the geographies request type requires some parameters (here, vintage) that the locations type does not. Working solution:
import requests

url = 'http://geocoding.geo.census.gov/geocoder/geographies/addressbatch'
payload = {'benchmark': 'Public_AR_Current', 'vintage': 'ACS2013_Current'}
# Open the CSV in binary mode; the with block closes it after the upload
with open('Addresses.csv', 'rb') as f:
    files = {'addressFile': ('Addresses.csv', f, 'text/csv')}
    r = requests.post(url, files=files, data=payload)
print(r.text)
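To see what the working call actually puts on the wire, the request can be prepared without being sent. This is only a sketch: the helper name build_batch_request and the one-line sample address are made up for illustration, and the benchmark and vintage values are simply the ones from the answer above, which may have been superseded on the Census side.

```python
import io

import requests

GEOGRAPHIES_URL = "https://geocoding.geo.census.gov/geocoder/geographies/addressbatch"


def build_batch_request(csv_file, benchmark="Public_AR_Current", vintage="ACS2013_Current"):
    """Prepare (but do not send) a multipart batch request (hypothetical helper)."""
    files = {"addressFile": ("Addresses.csv", csv_file, "text/csv")}
    data = {"benchmark": benchmark, "vintage": vintage}
    return requests.Request("POST", GEOGRAPHIES_URL, files=files, data=data).prepare()


# In-memory stand-in for Addresses.csv: id, street, city, state, zip (sample row).
sample = io.BytesIO(b"1,4600 Silver Hill Rd,Washington,DC,20233\n")
prepared = build_batch_request(sample)

print(prepared.headers["Content-Type"])              # multipart/form-data with a boundary
print(b'name="benchmark"' in prepared.body)          # form field coming from data=
print(b'name="vintage"' in prepared.body)            # the extra field for geographies
print(b'filename="Addresses.csv"' in prepared.body)  # the uploaded file part
```

Once the prepared request looks right, it can be sent with requests.Session().send(prepared).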

Maybe this is a simpler way to do the same thing.
You will get clean output in a pandas DataFrame :)
# pip install censusgeocode
import censusgeocode
import pandas as pd
cg = censusgeocode.CensusGeocode(benchmark='Public_AR_Census2010',vintage='Census2010_Census2010')
k = cg.addressbatch(r'D:\WORK\Addresses.csv')  # raw string keeps the backslashes literal
# Bonus
# Get clean output in a DataFrame
df = pd.DataFrame(k, columns=k[0].keys())
# PS: I tried with 9990 records in a single batch
Reference:
https://pypi.org/project/censusgeocode/
https://geocoding.geo.census.gov/geocoder/benchmarks
https://geocoding.geo.census.gov/geocoder/vintages?form
https://geocoding.geo.census.gov/geocoder/geographies/addressbatch?form

Works great. Today I just used the code shown below.
url = 'https://geocoding.geo.census.gov/geocoder/locations/addressbatch'
payload = {'benchmark':'Public_AR_Current','vintage':'ACS2013_Current'}
files = {'addressFile': ('19067.csv', open('19067.csv', 'rb'), 'text/csv')}
r = requests.post(url, files=files, data = payload)

Related

Python using Requests to cURL a file upload

I am trying to translate a specific curl method into Python's requests module to upload a file to an API. My standard method that works for non-file requests looks like this:
import requests
requestObject = requests.Session()
standard_headers = {header1:headerValue1,header2:headerValue2}
payload = {key1:value1,key2:value2}
url = 'https://myUrl.com/apiCall'
requestObject.post(url,headers=standard_headers, json=payload)
This works for non-file requests that I need to make to the API. However for file uploads, the API documentation shows a method using curl:
curl -XPOST -H 'header1' -H 'header2' 'https://myUrl.com/apiCall' \
-F 'attachment=@/path/to/my/file' \
-F '_json=<-;type=application/json' << _EOF_
{
"key1":"keyValue1",
"key2":"keyValue2"
}
_EOF_
I tested the curl command and it works successfully.
My question is how do I translate that curl method using the << _EOF_ method in Python requests. One idea I had was simply to use the 'files' option in the requests module:
requestObject = requests.Session()
standard_headers = {header1:headerValue1,header2:headerValue2}
payload = {key1:keyValue1,key2:keyValue2}
url = 'https://myUrl.com/apiCall'
file_to_upload = {'filename': open('/path/to/my/file', 'rb')}
requestObject.post(url,headers=standard_headers, files=file_to_upload, json=payload)
But that does not seem to work: the necessary JSON parameters (the values in payload) do not appear to be passed along with the file upload.
I also tried specifying the JSON parameters directly in the file_to_upload variable:
requestObject = requests.Session()
standard_headers = {header1:headerValue1,header2:headerValue2}
url = 'https://myUrl.com/apiCall'
file_to_upload = {'attachment': open('/path/to/my/file', 'rb'),'{"key1":"keyValue1","key2":"keyValue2"}'}
requestObject.post(url,headers=standard_headers, files=file_to_upload)
Similar result, it seems as though I am not passing the necessary json values correctly. I tried a few other ways but I am overlooking something. Any insight into how I should structure my request is appreciated.
OK, I managed to get it to work and am posting for anyone who might need help in the future.
The trick was to include the _json key in the data field. My code ended up looking like this:
import json
import requests
requestObject = requests.Session()
standard_headers = {header1: headerValue1, header2: headerValue2}
# Serialize the JSON fields to a string so they travel as one form field
json_fields = json.dumps({
    "key1": "key1Value",
    "key2": "key2Value"
})
payload = {"_json": json_fields}
file = {"attachment": open('/path/to/my/file', 'rb')}
url = 'https://myUrl.com/apiCall'
requestObject.post(url, headers=standard_headers, files=file, data=payload)
Hope that helps some future person.
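The curl command sends _json as its own form part typed application/json, while the answer above got by with a plain form field. If a server did insist on that part's content type, one option is to pass the JSON through files with no filename. This is a sketch under that assumption, not something the API in question is known to need; the URL and values are the placeholders from the question, and an in-memory buffer with a made-up name stands in for the real file.

```python
import io
import json

import requests

json_fields = json.dumps({"key1": "keyValue1", "key2": "keyValue2"})
files = {
    # (filename, content, content_type); a filename of None yields a plain form part
    "_json": (None, json_fields, "application/json"),
    # Stand-in for open('/path/to/my/file', 'rb')
    "attachment": ("file.bin", io.BytesIO(b"example bytes"), "application/octet-stream"),
}

prepared = requests.Request("POST", "https://myUrl.com/apiCall", files=files).prepare()
body = prepared.body

# Inspect the encoded multipart body instead of sending it.
print(body.decode("utf-8"))
```

Printing the body shows both parts, each with its own Content-Disposition and Content-Type lines, which makes it easy to compare against what curl sends.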

Passing file and data to requests, error at every turn

Python 3.6.7, Requests 2.21.0
I have an issue that gives me a new error at every solution.
What I want: To send a file with data in a POST command using the requests library.
url_upload = BASE_URL + "upload?action=save"
data = {'data':{'name':'test.txt','contenttype':'text/plain', 'size':37}}
files = {'file': open('/home/user/test.txt', 'rb')}
req = session.post(url=url_upload, files=files, data=data)
The end server is using Spring (I assume) and the response text contains this error:
"net.sf.json.JSONException: A JSONObject text must begin with \'{\' at character 1 of name"
So, I try
data = json.dumps(data)
But, of course requests doesn't want that:
ValueError: Data must not be a string.
If I add the header:
headers = {'Content-type': 'multipart/form-data'}
the server rejects the request with:
org.apache.commons.fileupload.FileUploadException: the request was rejected because no multipart boundary was found
Help would be appreciated.
What I needed to do was:
req = session.post(url=url_upload, files=files, data={'data': json.dumps(data)})
That way I'm giving the function variable 'data' the form-data variable name 'data' which contains the variable that has the key 'data'...
http://www.trekmate.org.uk/wp-content/uploads/2015/02/Data-star-trek-the-next-generation-31159191-1024-768.png
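The "no multipart boundary" error above follows from setting Content-Type by hand: when files= is given, requests generates the boundary itself, and a hand-written header replaces it. A small sketch that prepares both variants without sending anything; the URL is a placeholder and the helper prepare_upload is hypothetical.

```python
import io
import json

import requests

data = {"data": {"name": "test.txt", "contenttype": "text/plain", "size": 37}}


def prepare_upload(headers=None):
    """Build the upload request without sending it (hypothetical helper)."""
    files = {"file": ("test.txt", io.BytesIO(b"hello"), "text/plain")}
    form = {"data": json.dumps(data)}  # the nested dict travels as one JSON string
    req = requests.Request("POST", "https://example.com/upload",
                           files=files, data=form, headers=headers)
    return req.prepare()


auto = prepare_upload()
manual = prepare_upload({"Content-Type": "multipart/form-data"})

print(auto.headers["Content-Type"])    # generated by requests, includes boundary=...
print(manual.headers["Content-Type"])  # the hand-set value, with no boundary
```

Leaving the Content-Type header out lets the server find the boundary it complained about.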

How to use an auth token and submit data using Python requests.POST?

Using WheniWork's api, I need to use a token for authentication, and I also need to send data to create a new user. Does the order or name of arguments I send with requests.post() matter?
If I'm just using GET to pull information, I can have the url contain the thing I'm looking for, and then send a payload that is the token. For example:
url = 'https://api.wheniwork.com/2/users/2450964'
payload = {"W-Token": "ilovemyboss"}
r = requests.get(url, params=payload)
print r.text
When I try to add a new user however, I'm either not able to authenticate or not passing the data correctly. The api reference shows this format for using cURL:
curl https://api.wheniwork.com/2/users --data '{"first_name":"FirstName", "last_name": "LastName", "email": "user@email.com"}' -H "W-Token: ilovemyboss"
Here's what I've written out in Python (2.7.10) using Requests:
url = 'https://api.wheniwork.com/2/users'
data = {'first_name': 'TestFirst', 'last_name': 'TestLast', 'email': 'test@aol.com'}
params = {"W-Token": "ilovemyboss"}
r = requests.post(url, data=data, params=params)
print(r.text)
Can someone explain if/how the data (the user) gets sent separately from the authentication (the token)?
I found the issue!
The data (the user dict) needs to be in quotes, i.e. sent as a string. I'm not sure whether their API expects a string or whether that's how requests works. But here's the solution:
url = 'https://api.wheniwork.com/2/users'
data = "{'first_name':'TestFirst', 'last_name': 'TestLast','email':'test@aol.com'}"
params = {"W-Token": "ilovemyboss"}
r = requests.post(url, data=data, params=params)
print(r.text)
We can solve the above problem by converting the data dictionary to a JSON string with json.dumps.
import json
data = {'first_name': 'TestFirst', 'last_name': 'TestLast', 'email': 'test@aol.com'}
r = requests.post(url, data=json.dumps(data), params=params)
print(r.text)
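On the question of how the user data and the token travel separately: in the curl example the token is an HTTP header and the user is the request body, whereas params= puts the token in the query string. The sketch below mirrors the curl form more literally and only prepares the request so the pieces can be inspected. Whether the API accepts the token in either place is for its documentation to say; note also that json= sets a Content-Type of application/json, which curl --data does not.

```python
import json

import requests

url = "https://api.wheniwork.com/2/users"
user = {"first_name": "TestFirst", "last_name": "TestLast", "email": "test@aol.com"}
headers = {"W-Token": "ilovemyboss"}  # placeholder token from the question

# json= serializes the dict into the body; headers= carries the token separately.
prepared = requests.Request("POST", url, json=user, headers=headers).prepare()

print(prepared.url)                      # no token in the query string
print(prepared.headers["W-Token"])       # token sent as a header
print(prepared.headers["Content-Type"])  # set by json=
print(json.loads(prepared.body))         # the user, decoded back from the body
```

The equivalent live call would be requests.post(url, json=user, headers=headers).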

Converting urllib2 POST to Requests

I have an existing Http POST using urllib2:
data = 'client_id=%s&client_secret=%s&grant_type=authorization_code&code=%s&redirect_uri=%s' % (settings.GOOGLE_CLIENT_ID, settings.GOOGLE_CLIENT_SECRET, code, redirect_uri)
req = urllib2.Request(access_token_url, data=data)
response = urllib2.urlopen(req)
response_content = response.read()
json_response = json.loads(response_content)
I'm trying to convert this to the Requests library instead (http://docs.python-requests.org/) but I'm getting a 400 Invalid Request.
Here's my attempt:
params = {'redirect_uri' : redirect_uri}
params['client_id'] = settings.GOOGLE_CLIENT_ID
params['client_secret'] = settings.GOOGLE_CLIENT_SECRET
params['grant_type'] = 'authorization_code'
params['code'] = code
req = requests.post(access_token_url, data=params)
json_response = req.json()
I tried tweaking it to use params instead of data but I got the same error.
Anything I'm missing?
Make sure the values in the data dict are not already URL-escaped, since requests will escape them for you. Note that your original example does not do any escaping.
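A quick way to check this is to prepare the request and look at the encoded body. The sketch below uses a placeholder token URL, a made-up authorization code, and a made-up redirect URI; it shows requests form-encoding a dict once, and what happens when a value was already escaped before being handed over.

```python
from urllib.parse import quote

import requests

token_url = "https://example.com/token"  # placeholder, not the real endpoint
form = {
    "grant_type": "authorization_code",
    "code": "4/abc",  # made-up code
    "redirect_uri": "https://example.com/oauth2callback?next=/home",
}

prepared = requests.Request("POST", token_url, data=form).prepare()
print(prepared.headers["Content-Type"])
print(prepared.body)  # each value is percent-encoded exactly once

# Escaping a value first means requests escapes the percent signs again.
double = requests.Request("POST", token_url,
                          data={"code": quote("4/abc", safe="")}).prepare()
print(double.body)  # code=4%252Fabc
```

A doubly encoded code or redirect_uri no longer matches what the server issued, which is one plausible cause of a 400 response.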

Python equivalent of Curl HTTP post

I am posting to a Hudson server using curl from the command line, as follows:
curl -X POST -d '<run><log encoding="hexBinary">4142430A</log><result>0</result><duration>2000</duration></run>' \
http://user:pass@myhost/hudson/job/_jobName_/postBuildResult
This is as shown in the Hudson documentation. Can I emulate the same thing using Python? I don't want to use pyCurl or send this line through os.system(). Is there any way to do it in plain Python?
import urllib2
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
result = response.read()
where data is the encoded data you want to POST.
You can encode a dict using urllib like this:
import urllib
values = { 'foo': 'bar' }
data = urllib.urlencode(values)
The modern day solution to this is much simpler with the requests module (tagline: HTTP for humans! :)
import requests
r = requests.post('http://httpbin.org/post', data = {'key':'value'}, auth=('user', 'passwd'))
r.text # response as a string
r.content # response as a byte string
# gzip and deflate transfer-encodings automatically decoded
r.json() # return python object from json! this is what you probably want!
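Applied to the original question, the requests version might look like the sketch below. The host, job name, and credentials are the placeholders from the curl command, and the request is only prepared, not sent. One difference worth knowing: curl -d labels the body as application/x-www-form-urlencoded by default, while requests sets no Content-Type for a plain string body, so a header may need to be added if the server expects one.

```python
import requests

# Placeholder host and job name taken from the curl command in the question.
url = "http://myhost/hudson/job/_jobName_/postBuildResult"
xml = ('<run><log encoding="hexBinary">4142430A</log>'
       '<result>0</result><duration>2000</duration></run>')

# A string passed as data= is sent as the raw body; auth= adds HTTP Basic credentials.
prepared = requests.Request("POST", url, data=xml, auth=("user", "pass")).prepare()

print(prepared.body == xml)
print(prepared.headers["Authorization"])
print("Content-Type" in prepared.headers)
```

The live call would be requests.post(url, data=xml, auth=("user", "pass")).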
