Right way to load data from json file in python

I am trying to write code in Python and deploy it on Google App Engine. I am new to both of these things. I have JSON which contains the following:
[
    {
        "sentiment": -0.113568,
        "id": 455908588913827840,
        "user": "ANI",
        "text": "Posters put up against Arvind Kejriwal in Varanasi http://t.co/ZDrzjm84je",
        "created_at": 1.397532052E9,
        "location": "India",
        "time_zone": "New Delhi"
    },
    {
        "sentiment": -0.467335,
        "id": 456034840106643456,
        "user": "Kumar Amit",
        "text": "Arvind Kejriwal's interactive session with Varansi Supporter and Opponent will start in short while ..Join at http://t.co/f6xI0l2dWc",
        "created_at": 1.397562153E9,
        "location": "New Delhi, Patna.",
        "time_zone": "New Delhi"
    },
I am trying to load this data in Python. I have the following code for it:
data = simplejson.load(open('data/convertcsv.json'))
# print data
for row in data:
    print data['sentiment']
I am getting the following error - TypeError: list indices must be integers, not str
If I uncomment the print data line and remove the last two lines, I can see all the data in the console. I want to be able to do some computations on the sentiment and also search for some words in the text, but for that I need to know how to access the data row by row.

If you'd like to clean it up a bit:
import json
with open('data/convertcsv.json') as f:
    data = json.loads(f.read())

for row in data:
    print row['sentiment']
The 'with' statement only keeps the file open while it is used, then closes it automatically once the indented block beneath it has executed.

Try this:
import json
f = open('data/convertcsv.json')
data = json.loads(f.read())
f.close()

for row in data:
    print row['sentiment']

The issue is that you use data['sentiment'] instead of row['sentiment']; otherwise your code is fine:
with open('data/convertcsv.json', 'rb') as file:
    data = simplejson.load(file)
    # print data
    for row in data:
        print row['sentiment']  # <-- data is a list, use `row` here
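Once each row is a dict, the computations the asker describes follow directly. Here is a minimal sketch (Python 3 syntax; the averaging and keyword search are illustrative examples, not from the original post):
import json

with open('data/convertcsv.json') as f:
    data = json.load(f)  # json.load reads straight from the file object

# Compute the average sentiment across all entries
average = sum(row['sentiment'] for row in data) / len(data)
print('average sentiment:', average)

# Search the text field for a word and print matching users
for row in data:
    if 'Varanasi' in row['text']:
        print(row['user'], row['sentiment'])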


"list indices must be integers or slices, not str" while manipulating data from JSON

I am trying to extract some data from JSON files, which all have the same structure, and then write the chosen data into a new JSON file. My goal is to create a new JSON file which is more or less a list with an entry for each JSON file in my folder, containing the data:
Filename, triggerdata, velocity {imgVel, trigVel}, coordinates.
In a further step of my programme, I will need this new splitTest1 for analysing the data of the different files.
I have the following code:
base_dir = 'mypath'

def createJsonFile():
    splitTest1 = {}
    splitTest1['20mm PSL'] = []
    for file in os.listdir(base_dir):
        # If file is a json, construct its full path and open it, append all json data to list
        if 'json' in file:
            json_path = os.path.join(base_dir, file)
            json_data = pd.read_json(json_path, lines=True)
            if splitTest1[file]['20mm PSL'] == to_find:
                splitTest1['20mm PSL'].append({
                    'filename': os.path.basename(base_dir),
                    'triggerdata': ['rawData']['adcDump']['0B'],
                    'velocity': {
                        'imgVel': ['computedData']['particleProperties']['imgVelocity'],
                        'trigVel': ['computedData']['img0Properties']['coordinates']},
                    'coordinates': ['computedData']['img1Properties']['coordinates']})
    print(len(splitTest1))
When I run the code, I get this error:
'triggerdata': ['rawData']['adcDump']['0B'], TypeError: list indices must be integers or slices, not str
What is wrong with the code? How do I fix this?
This is my previous code, showing how I accessed that data without saving it in another JSON file:
with open('myJsonFile.json') as f0:
    d0 = json.load(f0)

y00B = d0['rawData']['adcDump']['0B']
x = np.arange(0, (2048 * 0.004), 0.004)  # in ms, 2048 Samples, 4us

def getData():
    return y00B, x

def getVel():
    imgV = d0['computedData']['particleProperties']['imgVelocity']
    trigV = d0['computedData']['trigger']['trigVelocity']
    return imgV, trigV
Basically, I am trying to put this last code snippet into a loop which reads all the JSON files in my folder and makes a new JSON file with a list of the names of these files and some other chosen data (like ['rawData']['adcDump']['0B'], etc.).
Hope this helps to understand my problem better.
I assume what you want to do is take some data from several JSON files, compile it into a list, and write that into a new JSON file.
In order to get the data from your current JSON file, you'll need to add a "reference" in front of the indices (otherwise the code has no idea where it should take that data from). Like so:
base_dir = 'mypath'

def createJsonFile():
    splitTest1 = {}
    splitTest1['20mm PSL'] = []
    for file in os.listdir(base_dir):
        # If file is a json, construct its full path and open it, append all json data to list
        if 'json' in file:
            json_path = os.path.join(base_dir, file)
            json_data = pd.read_json(json_path, lines=True)
            if splitTest1[file]['20mm PSL'] == to_find:
                splitTest1['20mm PSL'].append({
                    'filename': os.path.basename(base_dir),
                    'triggerdata': json_data['rawData']['adcDump']['0B'],
                    'velocity': {
                        'imgVel': json_data['computedData']['particleProperties']['imgVelocity'],
                        'trigVel': json_data['computedData']['img0Properties']['coordinates']},
                    'coordinates': json_data['computedData']['img1Properties']['coordinates']})
    print(len(splitTest1))
So basically what you need to do is add "json_data" in front of the indices.
Also, I suggest writing the variable "json_path" rather than "base_dir" into the 'filename' field.
I found the solution with the help of the post from Mattu475.
I had to add the reference in front of the indices and also change how I open the files found in my folder, using the following code:
with open(json_path) as f0:
    json_data = json.load(f0)
instead of pd.read_json(...)
Here is the full code:
def createJsonFile():
    splitTest1 = {}
    splitTest1['20mm PSL'] = []
    for file in os.listdir(base_dir):
        # If file is a json, construct its full path and open it, append all json data to list
        if 'json' in file:
            print("filename: ", file)  # file is only the file name, the path not included
            json_path = os.path.join(base_dir, file)
            print("path : ", json_path)
            with open(json_path) as f0:
                json_data = json.load(f0)
            splitTest1['20mm PSL'].append({
                'filename': os.path.basename(json_path),
                'triggerdata': json_data['rawData']['adcDump']['0B'],
                #'imgVel': json_data['computedData']['particleProperties']['imgVelocity'],
                'trigVel': json_data['computedData']['trigger']['trigVelocity'],
                #'coordinatesImg0': json_data['computedData']['img0Properties']['coordinates'],
                #'coordinatesImg1': json_data['computedData']['img1Properties']['coordinates']
            })
    return splitTest1
A few lines (the ones commented out) do not function 100% yet, but the rest works.
Thank you for your help!
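To complete the original goal of writing the chosen data into a new JSON file, the returned dictionary can be dumped with json.dump; a minimal sketch, assuming the output name splitTest1.json (not specified in the original post):
import json

result = createJsonFile()

# Serialize the collected entries into a new JSON file
with open('splitTest1.json', 'w') as f_out:
    json.dump(result, f_out, indent=4)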
The issue is with this line:
'imgVel': ['computedData']['particleProperties']['imgVelocity'],
And the two lines that come after it. What's happening there is that you're creating a list with the string 'computedData' as its only element, and then trying to look up the index 'particleProperties', which doesn't make sense: you can only index a list with integers. I can't really give you a "solution", but if you want imgVel to just be a list of those strings, then you would write:
'imgVel': ['computedData', 'particularProperties', 'imgVelocity']
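If you do store the key path as a list of strings like that, you can resolve it against the loaded JSON later. A small sketch of that idea; the helper name get_by_path is hypothetical, not from the original post:
from functools import reduce

def get_by_path(data, path):
    # Follow each key in the path list down into the nested dicts
    return reduce(lambda d, key: d[key], path, data)

# e.g. get_by_path(json_data, ['computedData', 'particleProperties', 'imgVelocity'])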
Your dict value isn't legal Python.
'triggerdata': ['rawData']['adcDump']['0B']
The value doesn't make any sense: you make a list of a single string, then you try to index it with another string. You asked for element "adcDump" of the list ['rawData'], and there isn't any such syntax.
You cannot store arbitrary source code (your partial expression) as if it were a data value.
If you want help constructing a particular reference, then please post a focused question. Please review "How to Ask" from the intro tour.

Getting empty CSV in python

I have written a piece of code that compares data from two CSVs and writes the final output to a new CSV. The problem is that, except for the header, nothing else is being written into the CSV. Below is my code:
import csv

data_3B = open('3B_processed.csv', 'r')
reader_3B = csv.DictReader(data_3B)
data_2A = open('2A_processed.csv', 'r')
reader_2A = csv.DictReader(data_2A)

l_3B_2A = [["taxable_entity_id", "return_period", "3B", "2A"]]

for row_3B in reader_3B:
    for row_2A in reader_2A:
        if row_3B["taxable_entity_id"] == row_2A["taxable_entity_id"] and row_3B["return_period"] == row_2A["return_period"]:
            l_3B_2A.append([row_3B["taxable_entity_id"], row_3B["return_period"], row_3B["total"], row_2A["total"]])

with open("3Bvs2A_new.csv", "w") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(l_3B_2A)
csv_file.close()
How do I solve this?
Edit:
2A_processed.csv sample:
taxable_entity_id,return_period,total
2d9cc638-5ed0-410f-9a76-422e32f34779,072019,0
2d9cc638-5ed0-410f-9a76-422e32f34779,062019,0
2d9cc638-5ed0-410f-9a76-422e32f34779,082019,0
e5091f99-e725-44bc-b018-0843953a8771,082019,0
e5091f99-e725-44bc-b018-0843953a8771,052019,41711.5
920da7ba-19c7-45ce-ba59-3aa19a6cb7f0,032019,2862.94
410ecd0f-ea0f-4a36-8fa6-9488ba3c095b,082018,48253.9
3B_processed sample:
taxable_entity_id,return_period,total
1e5ccfbc-a03e-429e-b79a-68041b69dfb0,072017,0.0
1e5ccfbc-a03e-429e-b79a-68041b69dfb0,082017,0.0
1e5ccfbc-a03e-429e-b79a-68041b69dfb0,092017,0.0
f7d52d1f-00a5-440d-9e76-cb7fbf1afde3,122017,0.0
1b9afebb-495d-4516-96bd-1e21138268b7,072017,146500.0
1b9afebb-495d-4516-96bd-1e21138268b7,082017,251710.0
The csv.DictReader objects in your code can only read through the file once, because they are reading from file objects (created with open). Therefore, the second and subsequent times through the outer loop, the inner loop does not run, because there are no more row_2A values in reader_2A - the reader is at the end of the file after the first time.
The simplest fix is to read each file into a list first. We can make a helper function to handle this, and also ensure the files are closed properly:
def lines_of_csv(filename):
    with open(filename) as source:
        return list(csv.DictReader(source))

reader_3B = lines_of_csv('3B_processed.csv')
reader_2A = lines_of_csv('2A_processed.csv')
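With both files materialized as lists, the original nested loop works as intended, because iterating over a list does not exhaust it the way iterating over a file does:
l_3B_2A = [["taxable_entity_id", "return_period", "3B", "2A"]]
for row_3B in reader_3B:
    for row_2A in reader_2A:  # restarts from the first row on every outer pass
        if (row_3B["taxable_entity_id"] == row_2A["taxable_entity_id"]
                and row_3B["return_period"] == row_2A["return_period"]):
            l_3B_2A.append([row_3B["taxable_entity_id"], row_3B["return_period"],
                            row_3B["total"], row_2A["total"]])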
I put your code into a file test.py and created test files to simulate your csvs.
$ python3 ./test.py
$ cat ./3Bvs2A_new.csv
taxable_entity_id,return_period,3B,2A
1,2,3,2
$ cat ./3B_processed.csv
total,taxable_entity_id,return_period,3B,2A
3,1,2,3,4
3,4,3,2,1
$ cat ./2A_processed.csv
taxable_entity_id,return_period,2A,3B,total
1,2,3,4,2
4,3,2,1,2
So as you can see, the order of the columns doesn't matter, as they are accessed correctly using the DictReader. If the first row is a match, your code works, but there are no rows left in the second CSV file after processing the first row from the first file. I suggest making a dictionary keyed by (taxable_entity_id, return_period) tuples: process the first CSV file by adding totals into the dict, then run through the second one and look them up.
row_lookup = {}
for row in first_csv:
    row_lookup[(row['taxable_entity_id'], row['return_period'])] = row['total']

for row in second_csv:
    if (row['taxable_entity_id'], row['return_period']) in row_lookup:
        new_row = [row['taxable_entity_id'], row['return_period'], row['total'],
                   row_lookup[(row['taxable_entity_id'], row['return_period'])]]
Of course that only works if pairs of taxable_entity_ids and return_periods are always unique... Hard to say exactly what you should do without knowing the exact nature of your task and full format of your csvs.
You can do this with pandas if the data frames are equal-sized, like this:
reader_3B = pd.read_csv('3B_processed.csv')
reader_2A = pd.read_csv('2A_processed.csv')

l_3B_2A = reader_3B[(reader_3B["taxable_entity_id"] == reader_2A["taxable_entity_id"]) & (reader_3B["return_period"] == reader_2A["return_period"])]
l_3B_2A.to_csv('3Bvs2A_new.csv')
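If the frames are not equal-sized, a pandas merge on the two key columns is the more robust route; a hedged sketch, with column names taken from the samples above (the str dtype is there so return_period values like 072019 keep their leading zero):
import pandas as pd

df_3B = pd.read_csv('3B_processed.csv', dtype={'return_period': str})
df_2A = pd.read_csv('2A_processed.csv', dtype={'return_period': str})

# Inner join on the key columns; suffixes distinguish the two totals
merged = df_3B.merge(df_2A, on=['taxable_entity_id', 'return_period'],
                     suffixes=('_3B', '_2A'))
merged = merged.rename(columns={'total_3B': '3B', 'total_2A': '2A'})
merged.to_csv('3Bvs2A_new.csv', index=False)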

Python: Loop through .csv of urls and save it as another column

New to Python; I've read a bunch and watched a lot of videos. I can't get it to work and I'm getting frustrated.
I have a list of links like below:
"KGS ID","Latitude","Longitude","Location","Operator","Lease","API","Elevation","Elev_Ref","Depth_start","Depth_stop","URL"
"1002880800","37.2354869","-100.4607509","T32S R29W, Sec. 27, SW SW NE","Stanolind Oil and Gas Co.","William L. Rickers 1","15-119-00164","2705"," KB","2790","7652","http://www.kgs.ku.edu/WellLogs/32S29W/1043696830.zip"
"1002880821","37.1234622","-100.1158111","T34S R26W, Sec. 2, NW NW NE","SKELLY OIL CO","GRACE MCKINNEY 'A' 1","15-119-00181","2290"," KB","4000","5900","http://www.kgs.ku.edu/WellLogs/34S26W/1043696831.zip"
I'm trying to get Python to go to the "URL" and save it in a folder named after "Location", with the filename "API".las.
ex) ......"location"/Section/"API".las
C://.../T32S R29W/Sec.27/15-119-00164.las
The file has hundreds of rows and links to download. I also wanted to implement a sleep function so as not to bombard the servers.
What are some of the different ways to do this? I've tried pandas and a few other methods... any ideas?
You will have to do something like this:
for link, file_name in zip(links, file_names):
    u = urllib.urlopen(link)
    udata = u.read()
    f = open(file_name + ".las", "w")
    f.write(udata)
    f.close()
    u.close()
If the contents of your file are not what you wanted, you might want to look at a scraping library like BeautifulSoup for parsing.
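Tying this to the layout the asker describes (Location/Section/API.las, with a pause between requests), here is a hedged Python 3 sketch; the column names come from the sample CSV above, while the input filename links.csv and the output root are placeholders:
import csv
import os
import time
import urllib.request

ROOT = 'C:/logs'  # assumed output root

with open('links.csv', newline='') as f:
    for row in csv.DictReader(f):
        # "T32S R29W, Sec. 27, SW SW NE" -> "T32S R29W" and "Sec. 27"
        township, section = [s.strip() for s in row['Location'].split(',')][:2]
        out_dir = os.path.join(ROOT, township, section)
        os.makedirs(out_dir, exist_ok=True)
        urllib.request.urlretrieve(row['URL'], os.path.join(out_dir, row['API'] + '.las'))
        time.sleep(1)  # be polite to the server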
This might be a little dirty, but it's a first pass at solving the problem. This is all contingent on each value in the CSV being encompassed in double quotes. If this is not true, this solution will need heavy tweaking.
Code:
import os

csv = """
"KGS ID","Latitude","Longitude","Location","Operator","Lease","API","Elevation","Elev_Ref","Depth_start","Depth_stop","URL"
"1002880800","37.2354869","-100.4607509","T32S R29W, Sec. 27, SW SW NE","Stanolind Oil and Gas Co.","William L. Rickers 1","15-119-00164","2705"," KB","2790","7652","http://www.kgs.ku.edu/WellLogs/32S29W/1043696830.zip"
"1002880821","37.1234622","-100.1158111","T34S R26W, Sec. 2, NW NW NE","SKELLY OIL CO","GRACE MCKINNEY 'A' 1","15-119-00181","2290"," KB","4000","5900","http://www.kgs.ku.edu/WellLogs/34S26W/1043696831.zip"
""".strip()  # trim excess space at top and bottom

root_dir = '/tmp/so_test'

lines = csv.split('\n')  # break CSV on newlines
header = lines[0].strip('"').split('","')  # grab first line and consider it the header

lines_d = []  # we're about to perform the core actions, and we're going to store the results in this variable
for l in lines[1:]:  # we want all lines except the top line, which is the header
    line_broken = l.strip('"').split('","')  # strip off leading and trailing double-quotes
    line_assoc = zip(header, line_broken)  # pair each header with the value at the matching position
    line_dict = dict(line_assoc)  # turn this into a dict
    lines_d.append(line_dict)

    section_parts = [s.strip() for s in line_dict['Location'].split(',')]  # break the Location value into the pieces we need
    file_out = os.path.join(root_dir, '%s%s%s%sAPI.las' % (section_parts[0], os.path.sep, section_parts[1], os.path.sep))  # format the output filename the way I think is requested

    # stuff to show what's actually put in the files
    print file_out, ':'
    print ' ', '"%s"' % ('","'.join(header),)
    print ' ', '"%s"' % ('","'.join(line_dict[h] for h in header))
output:
~/so_test $ python so_test.py
/tmp/so_test/T32S R29W/Sec. 27/API.las :
"KGS ID","Latitude","Longitude","Location","Operator","Lease","API","Elevation","Elev_Ref","Depth_start","Depth_stop","URL"
"1002880800","37.2354869","-100.4607509","T32S R29W, Sec. 27, SW SW NE","Stanolind Oil and Gas Co.","William L. Rickers 1","15-119-00164","2705"," KB","2790","7652","http://www.kgs.ku.edu/WellLogs/32S29W/1043696830.zip"
/tmp/so_test/T34S R26W/Sec. 2/API.las :
"KGS ID","Latitude","Longitude","Location","Operator","Lease","API","Elevation","Elev_Ref","Depth_start","Depth_stop","URL"
"1002880821","37.1234622","-100.1158111","T34S R26W, Sec. 2, NW NW NE","SKELLY OIL CO","GRACE MCKINNEY 'A' 1","15-119-00181","2290"," KB","4000","5900","http://www.kgs.ku.edu/WellLogs/34S26W/1043696831.zip"
~/so_test $
Approach 1:
Suppose your file has 1000 rows. Create a master list which stores the data in this form:
[row1, row2, row3, and so on]
Once done, loop through this master list. In every iteration you will get a row in string format. Split it to make a list, slice off the last column (the URL, i.e. row[-1]), and append it to an empty list named result_url. Once it has run for all rows, save the result to a file; you can easily create a directory using the os module and move your file there.
Approach 2:
If the file is too huge, read lines one by one in a try block and process your data (using the csv module you can get each row as a list, slice off the URL, and write it to the file API.las each time). Once your program moves past the last line, it will fall into the except block, where you can just 'pass' or print something to get notified.
In approach 2, you are not saving all the data in any data structure; you only hold a single row while processing it, so it is faster.
import csv, os

directory_creater = os.mkdir('Locations')
fme = open('./Locations/API.las', 'w+')

with open('data.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    print spamreader.next()  # skip (and show) the header row
    while True:
        try:
            row = spamreader.next()
            get_url = row[-1]
            to_write = get_url + '\n'
            fme.write(to_write)
        except:
            print "Program has run. Check output."
            exit(1)
This code can do everything you mentioned efficiently, in less time.

Add values to JSON object in Python

I've got a valid JSON object, with a number of bike accidents listed:
{
    "city": "San Francisco",
    "accidents": [
        {
            "lat": 37.7726483,
            "severity": "u'INJURY",
            "street1": "11th St",
            "street2": "Kissling St",
            "image_id": 0,
            "year": "2012",
            "date": "u'20120409",
            "lng": -122.4150145
        },
    ],
    "source": "http://sf-police.org/"
}
I'm trying to use the json library in Python to load the data and then add fields to the objects in the "accidents" array. I've loaded my JSON like so:
with open('sanfrancisco_crashes_cp.json', 'rw') as json_data:
    json_data = json.load(json_data)
    accidents = json_data['accidents']
When I try to write to the file like so:
for accident in accidents:
    turn = randTurn()
    accidents.write(accident['Turn'] = 'right')
I get the following error: SyntaxError: keyword can't be an expression
I've tried a number of different ways. How can you add data to a JSON object using Python?
First, accidents is a plain Python data structure (a list of dictionaries); you can't write to it, you just set values in it.
So, what you want is:
for accident in accidents:
    accident['Turn'] = 'right'
The thing you want to write out is the new JSON: after you've finished modifying the data, you can dump it back to a file.
Ideally you do this by writing to a new file, then moving it over the original:
import json
import os
import tempfile

with open('sanfrancisco_crashes_cp.json') as json_file:
    json_data = json.load(json_file)

accidents = json_data['accidents']
for accident in accidents:
    accident['Turn'] = 'right'

# Open the temp file in text mode and pass the data first, then the file
with tempfile.NamedTemporaryFile('w', dir='.', delete=False) as temp_file:
    json.dump(json_data, temp_file)
os.replace(temp_file.name, 'sanfrancisco_crashes_cp.json')
But you can do it in-place if you really want to:
# notice r+, not rw, and notice that we have to keep the file open
# by moving everything into the with statement
with open('sanfrancisco_crashes_cp.json', 'r+') as json_file:
    json_data = json.load(json_file)
    accidents = json_data['accidents']
    for accident in accidents:
        accident['Turn'] = 'right'
    # And we also have to move back to the start of the file to overwrite
    json_file.seek(0, 0)
    json.dump(json_data, json_file)
    json_file.truncate()
If you're wondering why you got the specific error you did:
In Python, unlike many other languages, assignments aren't expressions; they're statements, which have to go on a line all by themselves.
But keyword arguments inside a function call have a very similar syntax. For example, see the tempfile.NamedTemporaryFile('w', dir='.', delete=False) call in my example code above.
So, Python is trying to interpret your accident['Turn'] = 'right' as if it were a keyword argument, with the keyword accident['Turn']. But keywords can only be actual words (well, identifiers), not arbitrary expressions. So its attempt to interpret your code fails, and you get an error saying keyword can't be an expression.
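A minimal illustration of the same error, stripped of the JSON code (the function f here is hypothetical, and newer Python versions word the message slightly differently):
def f(**kwargs):
    print(kwargs)

f(Turn='right')          # fine: 'Turn' is a plain identifier
# f(d['Turn']='right')   # SyntaxError: keyword can't be an expression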
I solved it with this:
with open('sanfrancisco_crashes_cp.json') as json_file:
    json_data = json.load(json_file)

accidents = json_data['accidents']
for accident in accidents:
    accident['Turn'] = 'right'

with open('sanfrancisco_crashes_cp.json', "w") as f:
    json.dump(json_data, f)

Python 3.3 error loading a JSON file with multiple entries

I have a JSON file named debug.json that I created in Python 3.3 that looks like this:
{"TIME": 55.55, "ID":155,"DATA": [17,22,33,44,55]}{"TIME": 56.55, "ID":195,"DATA": [17,22,ff,44,55]}
I'm trying to load it with the following code:
import json
with open("debug.json", 'r', encoding='utf-8') as f:
    testing = json.loads(f.read())
However when I try this I get the following error:
ValueError: Extra data line 1 column 92
This is where the second JSON object starts in the text file... I'm guessing that I am missing something pretty trivial here, but I haven't found any examples that relate to my problem. Any help is appreciated!
Use json.JSONDecoder.raw_decode, which accepts JSON with extra data at the end, such as another JSON object, and returns a tuple with the first object decoded and the position of the next object.
Example with your JSON:
import json
js = """{"TIME": 55.55, "ID":155,"DATA": [17,22,33,44,55]}{"TIME": 56.55, "ID":195,"DATA": [17,22,ff,44,55]}"""
json.JSONDecoder().raw_decode(js) # ({'TIME': 55.55, 'DATA': [17, 22, 33, 44, 55], 'ID': 155}, 50)
js[50:] # '{"TIME": 56.55, "ID":195,"DATA": [17,22,ff,44,55]}'
As you can see, it successfully decoded the first object and told us where the next object starts (in this case at the 50th character).
Here is a function that I made that can decode multiple JSON objects and return a list with all of them:
def multipleJSONDecode(js):
    result = []
    while js:
        obj, pos = json.JSONDecoder().raw_decode(js)
        result.append(obj)
        js = js[pos:]
    return result
When you create the file, make sure that you have at most one valid JSON string per line. Then, when you need to read them back out, you can loop over the lines in the file one at a time:
import json

testing = []
with open("debug.json", 'r', encoding='utf-8') as f:
    for line in f:
        testing.append(json.loads(line))
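For the writing side, the same one-object-per-line convention (often called JSON Lines) is easy to produce; a small sketch, assuming records is a list of dicts shaped like the ones above:
import json

records = [
    {"TIME": 55.55, "ID": 155, "DATA": [17, 22, 33, 44, 55]},
    {"TIME": 56.55, "ID": 195, "DATA": [17, 22, 33, 44, 55]},
]

with open("debug.json", 'w', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')  # one JSON object per line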
