I have a file 'test.json' which contains an array "rows"; each row has a sub-array "allowed" containing some letters like "A", "B", etc. I want to modify the contents of that sub-array. How can I do it?
The test.json file is the following:
{"rows": [
{
"Company": "google",
"allowed": ["A","B","C"]},#array containg 3 variables
{
"Company": "Yahoo",
"allowed": ["D","E","F"]#array contanig 3 variables
}
]}
But I want to modify the "allowed" array and update its third element to "LOOK" instead of "C", so that the resulting file looks like:
{"rows": [
{
"Company": "google",
"allowed": ["A","B","LOOK"]#array containg 3 variables
},
{
"Company": "Yahoo", #array containing 3 variables
"allowed": ["D","E","F"] #array containing 3 variables
}
]}
My program:
import json

with open('test.json') as f:
    data = json.load(f)

for row in data['rows']:
    a_dict = {row['allowed'][1]: "L"}

with open('test.json') as f:
    data = json.load(f)

data.update(a_dict)

with open('test.json', 'w') as f:
    json.dump(data, f, indent=2)
There are a couple of problems with your program as it is.
The first issue is you're not looking up the last element of your 'allowed' arrays:
a_dict = {row['allowed'][1]: "L"}
Remember, array indices start at zero, e.g.:
['Index 0', 'Index 1', 'Index 2']
But the main problem is when you walk over each row, you fetch the contents of
that row, but then don't do anything with it.
import json

with open('test.json') as f:
    data = json.load(f)

for row in data['rows']:
    a_dict = {row['allowed'][1]: "L"}
    # a_dict is twiddling its thumbs here...
    # until it's replaced by the next row's contents
    ...
It just gets replaced by the next row of the for loop, until you're left with the final row all by itself in a_dict, since the last one of course isn't overwritten by anything. Which, in your sample, would be:
{'E': 'L'}
Next you load the original json data again (though, you don't need to -- it's
still in your data variable, unmodified), and add a_dict to it:
with open('test.json') as f:
    data = json.load(f)

data.update(a_dict)
This leaves you with this:
{
  "rows": [
    {
      "Company": "google",
      "allowed": ["A", "B", "C"]
    },
    {
      "Company": "Yahoo",
      "allowed": ["D", "E", "F"]
    }
  ],
  "E": "L"
}
So, to fix this, you need to:
1. Point at the correct 'allowed' index (in your case, that'll be [2]), and
2. Modify the rows in place, instead of copying them out and merging them back into data.
In your for loop, each row in data['rows'] is a reference to the corresponding value inside data, so you can update the contents of row directly, and your work is done.
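To see that aliasing in action, here's a minimal sketch (the literal data is just an inline copy of your first row):
data = {"rows": [{"Company": "google", "allowed": ["A", "B", "C"]}]}
row = data["rows"][0]        # row is a reference into data, not a copy
row["allowed"][2] = "LOOK"   # so this mutates data itself
print(data)                  # {'rows': [{'Company': 'google', 'allowed': ['A', 'B', 'LOOK']}]}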
One thing I wasn't clear on was whether you meant to update all rows (implied by your looping over all rows), or just update the first row (as shown in your example desired output).
So here's a sample fix which works in either case:
import json

modify_first_row_only = True

with open('test.json', 'r') as f:
    data = json.load(f)

rows = data['rows']
if modify_first_row_only:
    rows[0]['allowed'][2] = 'LOOK'
else:
    for row in rows:
        row['allowed'][2] = 'LOOK'

with open('test.json', 'w') as f:
    json.dump(data, f, indent=2)
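As an optional sanity check, you can re-read the file afterwards and confirm the change:
import json

with open('test.json') as f:
    print(json.load(f)['rows'][0]['allowed'])   # expect: ['A', 'B', 'LOOK']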
Related
I am trying to convert a very long JSON file to CSV. I'm currently trying to use the code below to accomplish this.
import json
import csv

with open('G:\user\jsondata.json') as json_file:
    jsondata = json.load(json_file)

data_file = open('G:\user\jsonoutput.csv', 'w', newline='')
csv_writer = csv.writer(data_file)

count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())

data_file.close()
This code accomplishes writing all the data to a CSV; however, it only takes the keys from the first JSON object to use as the headers in the CSV. That would be fine, but further into the JSON there are more keys in use, which causes the values to be disorganized. I was wondering if anyone could help me find a way to get all the possible headers, and possibly insert NA when a line doesn't contain that key or a value for it.
The JSON file is similar to this:
[
{"time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "action": "TAKEN", "Country": "Germany", "Purchased": "YES", ...},
{"time": "1984-10-04:4:00", "dateOfevent": "1984-10-04", "action": "NOTTAKEN", "Country": "Germany", "Purchased": "NO", ...},
{"type": "A4", "time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "Country": "Germany", "typeOfevent": "H7", ...},
{...},
{...},
]
I've searched for possible solutions all over, but was unable to find anyone having a similar issue.
If you want to use the csv and json modules to do this, then you can do it in two passes. The first pass collects the keys for the CSV file, and the second pass writes the rows to the CSV file. Also, you must use a DictWriter, since the keys differ across the different records.
import json
import csv

with open('jsondata.json') as json_file:
    jsondata = json.load(json_file)

# stage 1 - populate column names from JSON
keys = []
for data in jsondata:
    for k in data.keys():
        if k not in keys:
            keys.append(k)

# stage 2 - write rows to CSV file
with open('jsonoutput.csv', 'w', newline='') as fout:
    csv_writer = csv.DictWriter(fout, fieldnames=keys)
    csv_writer.writeheader()
    for data in jsondata:
        csv_writer.writerow(data)
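One note on the NA part of your question: DictWriter writes an empty string for any missing key by default, but its restval parameter lets you substitute whatever placeholder you like, so stage 2 could create the writer as:
csv_writer = csv.DictWriter(fout, fieldnames=keys, restval='NA')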
Could you try to read it normally, and then convert it to CSV using .to_csv, like this:
import pandas as pd

df = pd.read_json(r'G:\user\jsondata.json')
#df = pd.json_normalize(df['Column Name'])  # if you want to normalize it
df.to_csv('example.csv')
I am not able to generate a proper CSV file using the code below, but when I query things individually I get the desired result. Below are my JSON file and code:
{
    "quiz": {
        "maths": {
            "q2": {
                "question": "12 - 8 = ?",
                "options": [
                    "1",
                    "2",
                    "3",
                    "4"
                ],
                "answer": "4"
            },
            "q1": {
                "question": "5 + 7 = ?",
                "options": [
                    "10",
                    "11",
                    "12",
                    "13"
                ],
                "answer": "12"
            }
        },
        "sport": {
            "q1": {
                "question": "Which one is correct team name in NBA?",
                "options": [
                    "New York Bulls",
                    "Los Angeles Kings",
                    "Golden State Warriros",
                    "Huston Rocket"
                ],
                "answer": "Huston Rocket"
            }
        }
    }
}
import json
import csv
from flatten_json import flatten  # flatten() is used below (assuming the flatten_json package)

# Opening JSON file and loading the data
# into the variable data
with open('tempjson.json', 'r') as jsonFile:
    data = json.load(jsonFile)

flattenData = flatten(data)
employee_data = flattenData

# now we will open a file for writing
data_file = open('data_files.csv', 'w')

# create the csv writer object
csv_writer = csv.writer(data_file)

# Counter variable used for writing
# headers to the CSV file
count = 0
for emp in employee_data:
    if count == 0:
        # Writing headers of CSV file
        header = emp
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    #csv_writer.writerow(employee_data.get(emp))

data_file.close()
Once the above code executes, the CSV does not come out the way I expect. I don't get what I am doing wrong: I am flattening my JSON file and then trying to convert it to CSV.
You can manipulate the JSON easily with Pandas DataFrames and save it to a CSV.
I'm not sure what your desired CSV should look like, but the following code generates a CSV with columns question, options, and answer. It generates an index column with the name of the quiz and the question number, in an alphabetically ordered list (your JSON was unordered). The code below will also work when more quizzes and questions are added.
Maybe converting it natively in Python is better performance-wise, but manipulating it with Pandas is easier.
import pandas as pd

# create Pandas dataframe from JSON for easy manipulation
df = pd.read_json("tempjson.json")

# create result dataframe
df_result = pd.DataFrame()

# Get nested dict from each dataframe row
for index, row in df.iterrows():
    # Convert it into a new dataframe
    df_temp = pd.DataFrame.from_dict(df.loc[index]['quiz'], orient='index')
    # Add name of quiz to index
    df_temp.index = index + ' ' + df_temp.index
    # Append row result to final dataframe
    # (DataFrame.append was removed in pandas 2.0; pd.concat is the current idiom)
    df_result = pd.concat([df_result, df_temp])

# Optionally sort alphabetically so questions are in order
df_result.sort_index(inplace=True)

# convert dataframe to CSV
df_result.to_csv('quiz.csv')
Update on request: Export to CSV using flattened JSON:
import json
from flatten_json import flatten
import pandas as pd

# Opening JSON file and loading the data
# into the variable data
with open("tempjson.json", 'r') as jsonFile:
    data = json.load(jsonFile)

flattenData = flatten(data)
df = pd.DataFrame.from_dict(flattenData, orient='index')

# convert dataframe to CSV
df.to_csv('quiz.csv', header=False)
This results in the following CSV (not sure whether this is your desired outcome, since you did not provide the desired result in your question).
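For a rough idea of the shape of that file: flatten() joins nested keys with underscores by default, so each row of quiz.csv pairs a compound key with its value, along these lines (reconstructed from your JSON, not from an actual run):
quiz_maths_q1_question,5 + 7 = ?
quiz_maths_q1_options_0,10
quiz_maths_q1_answer,12
quiz_sport_q1_question,Which one is correct team name in NBA?
quiz_sport_q1_answer,Huston Rocket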
Basically I'll have a bunch of small dictionaries, like such:
dictionary_list = [
    {"eight": "yes", "queen": "yes", "we": "yes", "eighteen": "yes"},
    {"nine": "yes", "king": "yes", "we": "yes", "nineteen": "yes"}
]
Then I have a csv file with a whole bunch of columns with words in the header as well, like this:
There could be 500 columns each with 1 word, and I don't know the order of which a column appears. I do, however, know that any word in my small dictionary should match to the word in a column.
I want to iterate through the headers of the file (skipping first to the 5 column headers) and each time see if the header name can be found in the dictionary, and if so, add the value into that row, if not, add a "no". This will be done row by row, where each row is for one of the small dictionaries. Results using the above dictionary for this file would be:
So far I've been able to try the following that doesn't really work:
f = open("file.csv", "r")
writer = csv.DictWriter(f)
for dict in dictionary_list: # this is the collection of little dictionaries
    # do some other stuff
    for r in writer:
        # not sure how to skip 10 columns here. next() seems to work on rows
        for col in r:
            if col in dict.keys():
                writer.writerow(dict.values())
            else:
                writer.writerow("no")
Given an input file headers.csv:
row1,row2,row3,row4,row5,bad,good,eight,nine,queen,three,eighteen,nineteen,king,jack,ace,we,them,you,two
The following code generates your output:
import csv

dictionary_list = [{"eight": "yes", "queen": "yes", "we": "yes", "eighteen": "yes"},
                   {"nine": "yes", "king": "yes", "we": "yes", "nineteen": "yes"}]

# Read the input header line as a list
with open('headers.csv', newline='') as f:
    reader = csv.reader(f)
    headers = next(reader)

# Generate the fixed values for the first 5 columns.
rowvals = dict(zip(headers[:5], ['x'] * 5))

with open('file.csv', 'w', newline='') as f:
    # When writing a row, restval is the default value when it isn't in the dict row.
    # extrasaction='ignore' prevents complaining if all columns are not present in dict row.
    writer = csv.DictWriter(f, headers, restval='no', extrasaction='ignore')
    writer.writeheader()
    for dictionary in dictionary_list:
        D = dictionary.copy() # needed if the original shouldn't be modified.
        D.update(rowvals)
        writer.writerow(D)
Output:
row1,row2,row3,row4,row5,bad,good,eight,nine,queen,three,eighteen,nineteen,king,jack,ace,we,them,you,two
x,x,x,x,x,no,no,yes,no,yes,no,yes,no,no,no,no,yes,no,no,no
x,x,x,x,x,no,no,no,yes,no,no,no,yes,yes,no,no,yes,no,no,no
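To see restval in isolation, a tiny demo (writing to sys.stdout purely for illustration):
import csv
import sys

w = csv.DictWriter(sys.stdout, fieldnames=['a', 'b'], restval='no')
w.writeheader()
w.writerow({'a': 'yes'})   # prints: yes,no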
‘Pandas’ may help you.
Here is the website: http://pandas.pydata.org/pandas-docs/stable/.
You can process the CSV file using the pandas.read_csv() method and add the data you want using the DataFrame.append() method (in recent pandas, pd.concat() replaces the removed append()).
Hope this helps.
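In case it helps, a rough, untested sketch of that idea (the file name file.csv and the first-5-columns assumption are taken from the question):
import pandas as pd

dictionary_list = [{"eight": "yes", "queen": "yes", "we": "yes", "eighteen": "yes"},
                   {"nine": "yes", "king": "yes", "we": "yes", "nineteen": "yes"}]

# read the existing CSV; the headers carry the words to match
df = pd.read_csv("file.csv")
word_cols = df.columns[5:]                  # skip the first 5 fixed columns
for i, d in enumerate(dictionary_list):     # one output row per small dictionary
    for col in word_cols:
        df.loc[i, col] = d.get(col, "no")   # "yes" if the word is in the dict, else "no"
df.to_csv("out.csv", index=False)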
Your question appears to be asking how to ensure that the fields from your dictionary_list exist in the record: if a field already exists in the record, set its value to yes; otherwise add the field to the record and set its value to no.
#!/usr/bin/env python3
import csv

dictionary_list = [
    {"eight": "yes", "queen": "yes", "we": "yes", "eighteen": "yes"},
    {"nine": "yes", "king": "yes", "them": "yes", "nineteen": "yes"}
]

"""
flatten all the dictionary keys into a unique list as the
key names will be used for field names and can't be duplicated
"""
field_check = set([k for d in dictionary_list for k in d.keys()])

if __name__ == "__main__":
    with open("file.csv", "r") as f:
        reader = csv.DictReader(f)
        # do not consider the first 10 columns
        field_tail = set(reader.fieldnames[10:])
        """
        initialize yes and no fields as they
        should be the same for every row in the file
        """
        yes_fields = set(field_check & field_tail)
        no_fields = field_check.difference(yes_fields)
        yes_dict = {k: "yes" for k in yes_fields}
        no_dict = {k: "no" for k in no_fields}
        for row in reader:
            row.update(yes_dict)
            row.update(no_dict)
            print(row)
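If you'd rather write the updated rows to a new CSV than print them, the final loop can be swapped for a sketch like this (out.csv is a made-up name; it reuses reader, yes_dict, and no_dict from above and appends the new "no" fields to the header):
fieldnames = reader.fieldnames + sorted(no_fields)
with open("out.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row.update(yes_dict)
        row.update(no_dict)
        writer.writerow(row)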
I have the following csv file (1.csv):
"STUB_1","current_week","previous_week","weekly_diff"
"Crude Oil",1184.951,1191.649,-6.698
I need to convert it to the following JSON:
json_body = [
    {
        "measurement": "Crude Oil",
        "fields": {
            "weekly_diff": -6.698,
            "current_week": 1184.951,
            "previous_week": 1191.649
        }
    }
]
import pandas as pd

df = pd.read_csv("1.csv")
df = df.rename(columns={'STUB_1': 'measurement'})
j = (df.groupby(['measurement'], as_index=True)
       .apply(lambda x: x[['current_week', 'previous_week', 'weekly_diff']].to_dict('r'))
       .reset_index()
       .rename(columns={0: 'fields'})
       .to_json(orient='records'))
print j
output:
[
    {
        "measurement": "Crude Oil",
        "fields":
        [ # extra bracket
            {
                "weekly_diff": -6.698,
                "current_week": 1184.951,
                "previous_week": 1191.649
            }
        ] # extra bracket
    }
]
which is almost what I need, but with extra [ ].
Can anyone see what I did wrong? Thank you!
Don't use pandas for this - you would have to do a lot of manual unraveling to turn your table data into a hierarchical structure so why not just skip the middle man and use the built-in csv and json modules to do the task for you, e.g.
import csv
import json
with open("1.csv", "rU") as f: # open your CSV file for reading
reader = csv.DictReader(f, quoting=csv.QUOTE_NONNUMERIC) # DictReader for convenience
data = [{"measurement": r.pop("STUB_1", None), "fields": r} for r in reader] # convert!
data_json = json.dumps(data, indent=4) # finally, serialize the data to JSON
print(data_json)
and you get:
[
    {
        "measurement": "Crude Oil",
        "fields": {
            "current_week": 1184.951,
            "previous_week": 1191.649,
            "weekly_diff": -6.698
        }
    }
]
However, keep in mind that if you have multiple entries with the same STUB_1 value, each becomes its own element in the resulting list rather than being merged; merging them would mean storing the fields as a list, which brings you back to your original problem with the data.
A quick note on how it does what it does - first we create a csv.DictReader - it's a convenience reader that will map each row's entry with the header fields. It also uses quoting=csv.QUOTE_NONNUMERIC to ensure automatic conversion to floats for all non-quoted fields in your CSV. Then, in the list comprehension, it essentially reads row by row from the reader and creates a new dict for each row - the measurement key contains the STUB_1 entry (which gets immediately removed with dict.pop()) and fields contains the remaining entries in the row. Finally, the json module is used to serialize this list into a JSON that you want.
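To see what QUOTE_NONNUMERIC does in isolation, a tiny sketch (using io.StringIO to fake the file):
import csv
import io

f = io.StringIO('"STUB_1","current_week"\n"Crude Oil",1184.951\n')
row = next(csv.DictReader(f, quoting=csv.QUOTE_NONNUMERIC))
print(row)   # {'STUB_1': 'Crude Oil', 'current_week': 1184.951} -- note the float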
Also, keep in mind that JSON (and Python dicts before 3.7) doesn't guarantee the order of elements, so your measurement entry might appear after the fields entry, and the same goes for the sub-entries of fields. Order shouldn't matter anyway (except for a few very specific cases), but if you want to control it you can use collections.OrderedDict to build your inner dictionaries in the order you prefer to see once serialized to JSON.
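If you do want to pin the order down, the same list comprehension can build OrderedDicts instead (identical logic, just an order-preserving container):
from collections import OrderedDict

data = [OrderedDict([("measurement", r.pop("STUB_1", None)),
                     ("fields", r)]) for r in reader]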
I am writing a little script which loops through a .csv, stores each row in the file as a dictionary, and fires off that dictionary to an API in a 1-dimensional list.
import csv
import requests
with open('csv.csv', 'rU') as f:
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)
    for row in reader:
        request = [dict(zip(header, map(str, row)))]
        r = requests.post(url, headers=i_headers, json=request)
        print str(reader.line_num) + "-" + str(r)
The request list looks something like this:
[
    {
        "id": "1",
        "col_1": "A",
        "col_2": "B",
        "col_3": "C"
    }
]
This script works, but I'm looping through an 8 million row .csv, and this method is simply too slow. I would like to speed up this process by sending more than one row per API call. The API I'm working with allows me to send up to 100 rows per call.
How can I change this script to incrementally build lists containing 100 dictionaries, post each one to the API, and then repeat? A sample of what I'd be sending to this API would look like this:
[
    {
        "id": "1",
        "col_1": "A",
        "col_2": "B",
        "col_3": "C"
    },
    {
        "id": "2",
        "col_1": "A",
        "col_2": "B",
        "col_3": "C"
    },
    ...
    {
        "id": "100",
        "col_1": "A",
        "col_2": "B",
        "col_3": "C"
    }
]
One thing that won't work is to build a massive list and then partition it into n lists of size 100. The reason being that my machine cannot hold all of that data in memory at any given time.
It is possible to do this by using range(100) and except StopIteration:, but it's not very pretty. Instead, a generator is perfect for getting chunks of 100 rows at a time from your CSV file. As it doesn't clutter up your actual iteration and request logic, it makes for fairly elegant code. Check it:
import csv
import requests
from itertools import islice
def chunks(iterator, size):
    iterator = iter(iterator)
    chunk = tuple(islice(iterator, size))
    while chunk:
        yield chunk
        chunk = tuple(islice(iterator, size))

with open('csv.csv', 'rU') as f:
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)
    for rows in chunks(reader, 100):
        rows = [dict(zip(header, map(str, row))) for row in rows]
        r = requests.post(url, headers=i_headers, json=rows)
        print str(reader.line_num) + "-" + str(r)
I'm not entirely sure where you're getting i_headers from, but I assume you've got that figured out in your actual code.
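As a quick illustration of how chunks() behaves on its own (a toy example):
>>> list(chunks(range(10), 4))
[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]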
You can create a list of requests, and whenever its size is big enough, send it to the API:
import csv
import requests
with open('csv.csv', 'rU') as f:
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)
    requestList = []
    for row in reader:
        requestList.append(dict(zip(header, map(str, row))))
        if len(requestList) >= 100:
            r = requests.post(url, headers=i_headers, json=requestList)
            print str(reader.line_num) + "-" + str(r)
            requestList = []
Then you just need to take care that you also call the API for the last, non-full list. This can be done either by calling the API with the remaining list after the loop, or by having the CSV reader tell you whether you're on the last row.
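For the first option, the flush is just a few lines at the end of the loop, still inside the with block (reusing the names from the snippet above):
if requestList:   # fewer than 100 rows left over after the loop
    r = requests.post(url, headers=i_headers, json=requestList)
    print(str(reader.line_num) + "-" + str(r))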