I've got a valid JSON object, with a number of bike accidents listed:
{
    "city":"San Francisco",
    "accidents":[
        {
            "lat":37.7726483,
            "severity":"u'INJURY",
            "street1":"11th St",
            "street2":"Kissling St",
            "image_id":0,
            "year":"2012",
            "date":"u'20120409",
            "lng":-122.4150145
        }
    ],
    "source":"http://sf-police.org/"
}
I'm trying to use the json library in python to load the data and then add fields to the objects in the "accidents" array. I've loaded my json like so:
with open('sanfrancisco_crashes_cp.json', 'rw') as json_data:
    json_data = json.load(json_data)
    accidents = json_data['accidents']
When I try to write to the file like so:
for accident in accidents:
    turn = randTurn()
    accidents.write(accident['Turn'] = 'right')
I get the following error: SyntaxError: keyword can't be an expression
I've tried a number of different ways. How can you add data to a JSON object using Python?
First, accidents is a list of dictionaries, and you don't write to a dictionary; you just set values in it.
So, what you want is:
for accident in accidents:
    accident['Turn'] = 'right'
The thing you want to write out is the new JSON—after you've finished modifying the data, you can dump it back to a file.
Ideally you do this by writing to a new file, then moving it over the original:
import json
import os
import tempfile

with open('sanfrancisco_crashes_cp.json') as json_file:
    json_data = json.load(json_file)

accidents = json_data['accidents']
for accident in accidents:
    accident['Turn'] = 'right'

with tempfile.NamedTemporaryFile('w', dir='.', delete=False) as temp_file:
    json.dump(json_data, temp_file)
os.replace(temp_file.name, 'sanfrancisco_crashes_cp.json')
But you can do it in-place if you really want to:
# notice r+, not rw, and notice that we have to keep the file open
# by moving everything into the with statement
with open('sanfrancisco_crashes_cp.json', 'r+') as json_file:
    json_data = json.load(json_file)
    accidents = json_data['accidents']
    for accident in accidents:
        accident['Turn'] = 'right'
    # And we also have to move back to the start of the file to overwrite
    json_file.seek(0, 0)
    json.dump(json_data, json_file)
    json_file.truncate()
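To see what that final truncate() is protecting against, here's a minimal sketch (demo.json is just a throwaway filename for the illustration): when you overwrite a longer file in place with shorter content, the old tail bytes stay behind unless you truncate.

```python
import json

# Write longer content first, then overwrite in place with shorter content
with open('demo.json', 'w') as f:
    json.dump({'accidents': [1, 2, 3]}, f)

with open('demo.json', 'r+') as f:
    json.load(f)      # reading moves the file position to the end
    f.seek(0, 0)      # move back to the start before overwriting
    json.dump({}, f)  # shorter than the original content
    f.truncate()      # without this, stale bytes from the old JSON remain

with open('demo.json') as f:
    print(f.read())   # {}
```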
If you're wondering why you got the specific error you did:
In Python—unlike many other languages—assignments aren't expressions, they're statements, which have to go on a line all by themselves.
But keyword arguments inside a function call have a very similar syntax. For example, see that tempfile.NamedTemporaryFile(dir='.', delete=False) in my example code above.
So, Python is trying to interpret your accident['Turn'] = 'right' as if it were a keyword argument, with the keyword accident['Turn']. But keywords can only be actual words (well, identifiers), not arbitrary expressions. So its attempt to interpret your code fails, and you get an error saying keyword can't be an expression.
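You can watch the parser reject exactly this shape; a small sketch (the exact message text varies across Python versions, with older ones saying keyword can't be an expression):

```python
# Inside a call, `name = value` is keyword-argument syntax, and the
# keyword must be a plain identifier, so a subscription like
# accident['Turn'] on the left-hand side is a SyntaxError.
try:
    compile("f(accident['Turn'] = 'right')", '<example>', 'eval')
    err = None
except SyntaxError as e:
    err = e

print(err)
```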
I solved it with this:
with open('sanfrancisco_crashes_cp.json') as json_file:
    json_data = json.load(json_file)

accidents = json_data['accidents']
for accident in accidents:
    accident['Turn'] = 'right'

with open('sanfrancisco_crashes_cp.json', "w") as f:
    json.dump(json_data, f)
I have a script that works fine in Python2 but I can't get it to work in Python3. I want to base64 encode each item in a list and then write it to a json file. I know I can't use map the same way in Python3 but when I make it a list I get a different error.
import base64
import json
list_of_numbers = ['123456', '234567', '345678']
file = open("orig.json", "r")
json_object = json.load(file)
list = ["[{\"number\":\"" + str(s) + "\"}]" for s in list_of_numbers]
base64_bytes = map(base64.b64encode, list)
json_object["conditions"][1]["value"] = base64_bytes
rule = open("new.json", "w")
json.dump(json_object, rule, indent=2, sort_keys=True)
rule.close()
I'm not sure if your error is related to this, but here's what I think might be the problem. In Python 3, map returns a lazy map object rather than a list, and json can't serialize a map object. To get the results as a list again, you need to cast them back to a list after you map your function. In other words:

base64_bytes = list(map(base64.b64encode, list))

P.S. It's better to avoid list as a variable name, since it shadows the built-in list type.
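Since the question doesn't show the "different error", one more wrinkle worth hedging on: in Python 3, base64.b64encode takes bytes and returns bytes, and json can't serialize bytes either. A sketch that converts to str on both ends:

```python
import base64
import json

list_of_numbers = ['123456', '234567', '345678']
wrapped = ['[{"number":"' + s + '"}]' for s in list_of_numbers]

# encode() turns each str into bytes for b64encode; decode() turns the
# resulting base64 bytes back into str so json can serialize the list
encoded = [base64.b64encode(s.encode('ascii')).decode('ascii') for s in wrapped]
print(json.dumps(encoded))
```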
I am using the KEGG API to download genomic data and writing it to a file. There are 26 files total and some of them contain the dictionary 'COMPOUND'. I would like to assign these to CompData and write them to the output file. I tried writing it as an if True statement but this does not work.
# Read in hsa links
hsa = []
with open('/users/skylake/desktop/pathway-HSAs.txt', 'r') as file:
    for line in file:
        line = line.strip()
        hsa.append(line)

# Import KEGG API Bioservices | Create KEGG Variable
from bioservices.kegg import KEGG
k = KEGG()

# Data Parsing | Writing to File
# for i in range(len(hsa)):
data = k.get(hsa[2])
dict_data = k.parse(data)
if dict_data['COMPOUND'] == True:
    compData = str(dict_data['COMPOUND'])
nameData = str(dict_data['NAME'])
geneData = str(dict_data['GENE'])
f = open('/Users/Skylake/Desktop/pathway-info/' + nameData + '.txt', 'w')
f.write("Genes\n")
f.write(geneData)
f.write("\nCompounds\n")
f.write(compData)
f.close()
I guess that with

if dict_data['COMPOUND'] == True:

you are (wrongly) testing for the existence of the key 'COMPOUND' in dict_data. In that case, what you want is

if 'COMPOUND' in dict_data:
Furthermore, note that the variable compData won't be defined at all if the key is not present, which will raise an error when you try to write its value. This means you should always define it, whatever happens, e.g. via

compData = str(dict_data.get('COMPOUND', 'undefined'))

This line means that if the key exists, you get its value, and if it does not, you get 'undefined' instead. Note that you can choose whatever fallback value you want, or even give none at all, which results in None by default.
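A tiny illustration with a made-up dictionary (the key names just mirror the question):

```python
# Hypothetical stand-in for dict_data; only 'NAME' is present
dict_data = {'NAME': 'Glycolysis'}

present = 'COMPOUND' in dict_data                   # membership test: False
compData = dict_data.get('COMPOUND', 'undefined')   # explicit fallback
missing = dict_data.get('COMPOUND')                 # default fallback: None
print(present, compData, missing)
```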
I am converting an XML file to a JSON file. I do this by opening the xml, using the xmltodict module, and then using the .get method to traverse the tree to the level I want. This level is the parents of the leaves. I then check a certain condition on some of the leaves for each of these tasks, and if it is true I use json.dumps() and write it to the file. The issue is (I think this is where it is stemming from) that when I only append one JSON object to the file, it doesn't append a comma to the end of the object because it thinks it is the only object. I tried combating this by appending a ',' at the end of each JSON object, but then when I try to use the json.loads() method it gives me an error saying "No JSON object could be decoded". However, when I manually append the '[' and ']' to the file it doesn't give me an error. My code is below and I'd appreciate any help/suggestions you have.
def getTasks(filename):
    f = open(filename, 'r')
    a = open('tasksJSON', 'w')
    a.write('[')
    d = xmltodict.parse(f)
    l = d.get('Project').get('Tasks').get('Task')
    for task in l:
        if (task['Name'] == 'dinner'): #criteria for desirable tasks
            j = json.dumps(task)
            a.write(str(j))
            a.write(',')
    a.write(']')
    f.close()
    a.close()
This works and puts everything in tasksJSON but like I said, when I call
my_file = open('tasksJSON', 'r')
data = json.load(my_file) # LINE THAT GIVES ME ERROR
I get an error saying
ValueError: No JSON object could be decoded
and the output file contains:
[{"UID": "4", "ID": "14", "Name": "Design"},{"UID": "5", "ID": "15", "Name": "Basic Skeleton"}]
^
this is the comma I manually inserted
Do it this way:
def getTasks(filename):
    f = open(filename, 'r')
    a = open('tasksJSON', 'w')
    x = []
    d = xmltodict.parse(f)
    l = d.get('Project').get('Tasks').get('Task')
    for task in l:
        if (task['Name'] == 'dinner'): #criteria for desirable tasks
            #j = json.dumps(task)
            x.append(task)
            #a.write (str(j))
            #a.write(',')
    a.write(json.dumps(x))
    f.close()
    a.close()
JSON doesn't allow extra commas at the end of an array or object, but your code adds such an extra comma. If you look at the official JSON grammar, you can only have a , before another value. And Python's json library conforms to that grammar, so:

>>> json.loads('[1, 2, 3, ]')
ValueError: Expecting value: line 1 column 11 (char 10)
To fix this, you could do something like this:
first = True
for task in l:
    if (task['Name'] == 'dinner'): #criteria for desirable tasks
        if first:
            first = False
        else:
            a.write(',')
        j = json.dumps(task)
        a.write(str(j))
On the other hand, if memory isn't an issue, it might be simpler—and certainly cleaner—to just add all of the objects to a list and then json.dumps that list:
output = []
for task in l:
    if (task['Name'] == 'dinner'): #criteria for desirable tasks
        output.append(task)
a.write(json.dumps(output))
Or, more simply:
json.dump([task for task in l if task['Name'] == 'dinner'], a)
(In fact, even if memory is an issue, you can extend JSONEncoder, as shown in the docs, to handle iterators by converting them lazily into JSON arrays, but this is a bit tricky, so I won't show the details unless someone needs them.)
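As a quick sanity check of the collect-then-dump approach, here's a sketch with two inlined tasks (using io.StringIO in place of the real output file):

```python
import io
import json

# Hypothetical tasks standing in for the xmltodict output
tasks = [{"UID": "4", "Name": "dinner"}, {"UID": "5", "Name": "lunch"}]

a = io.StringIO()
json.dump([task for task in tasks if task['Name'] == 'dinner'], a)

# The written text now parses back as a single JSON array
data = json.loads(a.getvalue())
print(data)
```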
It seems that you put several JSON objects into the file and added your own square brackets; hence it cannot be loaded as a single object.
I am trying to write a code in python and deploy on google app engine. I am new to both these things. I have json which contains the following
[
    {
        "sentiment":-0.113568,
        "id":455908588913827840,
        "user":"ANI",
        "text":"Posters put up against Arvind Kejriwal in Varanasi http://t.co/ZDrzjm84je",
        "created_at":1.397532052E9,
        "location":"India",
        "time_zone":"New Delhi"
    },
    {
        "sentiment":-0.467335,
        "id":456034840106643456,
        "user":"Kumar Amit",
        "text":"Arvind Kejriwal's interactive session with Varansi Supporter and Opponent will start in short while ..Join at http://t.co/f6xI0l2dWc",
        "created_at":1.397562153E9,
        "location":"New Delhi, Patna.",
        "time_zone":"New Delhi"
    },
I am trying to load this data in python. I have the following code for it
data = simplejson.load(open('data/convertcsv.json'))
# print data
for row in data:
    print data['sentiment']
I am getting the following error - TypeError: list indices must be integers, not str
If I uncomment the print data line and remove the last 2 lines I can see all the data in console. I want to be able to do some computations on the sentiment and also search for some words in the text. But for that I need to know how to get it line by line.
If you'd like to clean it up a bit:

import json

with open('data/convertcsv.json') as f:
    data = json.loads(f.read())

for row in data:
    print row['sentiment']
The 'with' only leaves the file open while it's used, then closes it automatically once the indented block under it has executed.
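A minimal sketch of that behavior (scratch.txt is just a throwaway file for the demo):

```python
with open('scratch.txt', 'w') as f:
    f.write('hi')

# Once the with block exits, the file object has been closed for us
print(f.closed)  # True
```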
Try this:
import json

f = open('data/convertcsv.json')
data = json.loads(f.read())
f.close()

for row in data:
    print row['sentiment']
The issue is that you use data['sentiment'] instead of row['sentiment']; otherwise your code is fine:

with open('data/convertcsv.json', 'rb') as file:
    data = simplejson.load(file)

# print data
for row in data:
    print row['sentiment']  # <-- data is a list, use `row` here
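For the computations the question mentions at the end (averaging the sentiment, searching the text), here's a sketch with two rows inlined in place of data/convertcsv.json:

```python
# Two rows standing in for the loaded JSON list
rows = [
    {"sentiment": -0.113568, "text": "Posters put up against Arvind Kejriwal in Varanasi"},
    {"sentiment": -0.467335, "text": "Interactive session with supporters"},
]

# Average sentiment across all rows
avg = sum(row['sentiment'] for row in rows) / len(rows)

# Rows whose text mentions a particular word
mentions = [row for row in rows if 'Varanasi' in row['text']]
print(avg, len(mentions))
```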
Using Python I wanted to extract data rows shown below to a csv file from a bunch of javascript files which contain hardcoded data as shown below:
....html code....
hotels[0] = new hotelData();
hotels[0].hotelName = "MANHATTAN";
hotels[0].hotelPhone = "";
hotels[0].hotelSalesPhone = "";
hotels[0].hotelPhone = 'Phone: 888-350-6432';
hotels[0].hotelStreet = "787 11TH AVENUE";
hotels[0].hotelCity = "NEW YORK";
hotels[0].hotelState = "NY";
hotels[0].hotelZip = "10019";
hotels[0].hotelId = "51543";
hotels[0].hotelLat = "40.7686";;
hotels[0].hotelLong = "-73.992645";;
hotels[1] = new hotelData();
hotels[1].hotelName = "KOEPPEL";
hotels[1].hotelPhone = "";
hotels[1].hotelSalesPhone = "";
hotels[1].hotelPhone = 'Phone: 718-721-9100';
hotels[1].hotelStreet = "57-01 NORTHERN BLVD.";
hotels[1].hotelCity = "WOODSIDE";
hotels[1].hotelState = "NY";
hotels[1].hotelZip = "11377";
hotels[1].hotelId = "51582";
hotels[1].hotelLat = "40.75362";;
hotels[1].hotelLong = "-73.90366";;
var mykey = "AlvQ9gNhp7oNuvjhkalD4OWVs_9LvGHg0ZLG9cWwRdAUbsy-ZIW1N9uVSU0V4X-8";
var map = null;
var pins = null;
var i = null;
var boxes = new Array();
var currentBox = null;
var mapOptions = {
    credentials: mykey,
    enableSearchLogo: false,
    showMapTypeSelector: false,
    enableClickableLogo: false
}
.....html code .....
Hence the required csv output would be like rows of the above data:
MANHATTAN,,,Phone: 888-350-6432 ...
KOEPPEL,,,Phone: 718-721-9100 ...
Should I use a code generation tool to directly parse the above statements to get the data? What is the most efficient Python method to transform such data, contained in thousands of JavaScript files, into csv tabular format?
Update:
Ideally I would like the solution to parse the JavaScript statements as Python objects and then store it to CSV to gain maximum independence from ordering and formatting of the input script code
I'd recommend using a regular expression to pick out all "hotels[#]. ..." lines, and then add all of the results to a dictionary. Then, with the dictionary, output to a CSV file. The following should work:
import re
import csv

src_text = your_javascript_text

p = re.compile(r'hotels\[(?P<hotelid>\d+)\]\.(?P<attr>\w+) = ("|\')(?P<attr_val>.*?)("|\');', re.DOTALL)

hotels = {}
fieldnames = []
for result in [m.groupdict() for m in p.finditer(src_text)]:
    if int(result['hotelid']) not in hotels:
        hotels[int(result['hotelid'])] = {}
    if result['attr'] not in fieldnames:
        fieldnames.append(result['attr'])
    hotels[int(result['hotelid'])][result['attr']] = result['attr_val']

output = open('hotels.csv', 'wb')
csv_writer = csv.DictWriter(output, delimiter=',', fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
csv_writer.writerow(dict((f, f) for f in fieldnames))
for hotel in hotels.items():
    csv_writer.writerow(hotel[1])
You now have a dictionary of hotels with attributes, grouped by the ID in the Javascript, as well as the output file "hotels.csv" (with header row and proper escaping). I did use things like named groups, which really aren't necessary, but I find them more self-documenting.

It should be noted that if the same attribute is provided twice in the Javascript, like hotelPhone, only the last value is stored.

When dealing with this type of problem, it falls to you and your judgment how much tolerance and sanitation you need. You may need to modify the regular expression to handle examples not in the small sample provided (e.g. change the capture groups, restrict matches to those at the start of a line, etc.); or escape newline characters; or strip out certain text (e.g. "Phone: " in the phone numbers). There's no real way for us to know this, so keep that in mind.
Cheers!
If this is something you will have to do routinely and you want to make the process fully automatic I think the easiest would be just to parse the files using Python and then write to csv using the csv Python module.
Your code could look somewhat like this:
import csv

def write_to_csv(hotel_data):
    # Append so each hotel's row is added to the file in turn
    with open('hotels.csv', 'ab') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',',
                                quotechar='"', quoting=csv.QUOTE_MINIMAL)
        spamwriter.writerow(hotel_data)

with open("datafile.txt") as f:
    hotel_data = []
    for line in f:
        # Let's make sure the line is not empty
        if line.strip():
            if "new hotelData();" in line:
                if hotel_data:
                    write_to_csv(hotel_data)
                hotel_data = []
            else:
                # Data, still has ending quote and semicolon
                data = line.split("= ")[1]
                # Remove ending quote and semicolon
                data = data[:-2]
                hotel_data.append(data)
    # Don't forget the last hotel collected
    if hotel_data:
        write_to_csv(hotel_data)
Beware that I have not tested this code; it is just meant to help you and point you in the right direction, not to be the complete solution.
If each hotel has every field declared in your files (i.e. if all of the hotels have the same number of lines, even if some of them are empty), you may try to use a simple regular expression to extract every value surrounded by quotes ("xxx"), and then group them by number (for example, group every 5 fields into a single line and then add a line break).

A simple regex that would work is ["'][^"']*["'] (EDIT: this is because I see that some fields, like hotelPhone, use single quotes and the rest use double quotes).
To make the search, compile the pattern and call findall on the file's text:

pattern = r'["\'][^"\']*["\']'
compPattern = re.compile(pattern)
results = compPattern.findall(src_text)  # src_text is the file's contents
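Trying that pattern on two lines lifted from the sample data (note the matches keep their surrounding quotes, which you would strip afterwards):

```python
import re

# Two lines from the question's JavaScript, one double- and one single-quoted
src_text = 'hotels[0].hotelName = "MANHATTAN";\n' \
           "hotels[0].hotelPhone = 'Phone: 888-350-6432';"

compPattern = re.compile(r'["\'][^"\']*["\']')
results = compPattern.findall(src_text)
print(results)
```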