How to convert CSV file to multiline JSON? - python

Here's my code, really simple stuff...
import csv
import json
csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("FirstName","LastName","IDNumber","Message")
reader = csv.DictReader( csvfile, fieldnames)
out = json.dumps( [ row for row in reader ] )
jsonfile.write(out)
I declare some field names; the reader uses the csv module to read the file, and the field names are used to dump the file to JSON format. Here's the problem...
Each record in the CSV file is on a different row. I want the JSON output laid out the same way. The problem is that it dumps everything on one giant, long line.
I've tried something like for line in csvfile: and then running my code below that with reader = csv.DictReader( line, fieldnames), which loops through each line, but it writes the entire file on one line, then the entire file again on another line... and continues until it runs out of lines.
Any suggestions for correcting this?
Edit: To clarify, currently I have: (every record on line 1)
[{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"},{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}]
What I'm looking for: (2 records on 2 lines)
{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"}
{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}
Not each individual field indented/on a separate line, but each record on its own line.
Some sample input:
"John","Doe","001","Message1"
"George","Washington","002","Message2"

The problem with your desired output is that it is not a valid JSON document; it's a stream of JSON documents!
That's okay if it's what you need, but it means that for each document you want in your output, you'll have to call json.dumps (or json.dump) separately.
Since the newline you want separating your documents is not contained in those documents, you're on the hook for supplying it yourself. So we just need to pull the loop out of the call to json.dumps and write a newline after each document.
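import csv
import json
fieldnames = ("FirstName","LastName","IDNumber","Message")
# newline='' is what the csv docs recommend for files handed to csv readers/writers
with open('file.csv', 'r', newline='') as csvfile, open('file.json', 'w') as jsonfile:
    reader = csv.DictReader(csvfile, fieldnames)
    for row in reader:
        json.dump(row, jsonfile)   # one JSON document per record...
        jsonfile.write('\n')       # ...followed by the newline that separates them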
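Because the output is a stream of JSON documents rather than one document, it also has to be read back one line at a time. Below is a minimal sketch, assuming a file written by the loop above. The helper name read_json_lines is hypothetical. The sketch also shows that calling json.loads on the whole stream fails with "Extra data":

```python
import io
import json

# Two records in the one-document-per-line format written above (sample values from the question).
jsonl_text = (
    '{"FirstName": "John", "LastName": "Doe", "IDNumber": "001", "Message": "Message1"}\n'
    '{"FirstName": "George", "LastName": "Washington", "IDNumber": "002", "Message": "Message2"}\n'
)

def read_json_lines(stream):
    """Parse one JSON document per non-blank line and return them as a list."""
    records = []
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing one
            records.append(json.loads(line))
    return records

records = read_json_lines(io.StringIO(jsonl_text))
print(records[1]["LastName"])  # Washington

# Parsing the whole stream as a single document fails after the first record.
error_message = None
try:
    json.loads(jsonl_text)
except json.JSONDecodeError as exc:
    error_message = exc.msg
print(error_message)  # Extra data
```

With a real file you would pass the open file object (open('file.json')) instead of the StringIO.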

You can use a pandas DataFrame to achieve this, as in the following example:
import pandas as pd
# read_csv already returns a DataFrame, so no extra pd.DataFrame(...) wrapper is needed
csv_file = pd.read_csv("path/to/file.csv", sep=",", header=0, index_col=False)
# lines=True (only valid together with orient="records") writes one JSON record per line
csv_file.to_json("/path/to/new/file.json", orient="records", lines=True, date_format="epoch", double_precision=10, force_ascii=True, date_unit="ms", default_handler=None)

import csv
import json
file = 'csv_file_name.csv'
json_file = 'output_file_name.json'
# Read the CSV file into a list of dicts, one per row, keyed by the header
def read_CSV(file, json_file):
    with open(file, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        csv_rows = list(reader)
    convert_write_json(csv_rows, json_file)
# Convert the csv data into JSON and write it out
def convert_write_json(data, json_file):
    with open(json_file, "w") as f:
        # pretty-printed; use f.write(json.dumps(data)) instead for compact output.
        # Writing both would put two JSON documents in one file, which is not valid JSON.
        f.write(json.dumps(data, sort_keys=False, indent=4, separators=(',', ': ')))
read_CSV(file, json_file)
Documentation of json.dumps()
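You may want a single valid JSON document that still puts one record per line, which is the layout the question asks for. One option is to write the brackets and commas yourself around compact json.dumps output. This is a sketch; the function name write_array_one_record_per_line and the sample CSV are hypothetical:

```python
import csv
import io
import json

def write_array_one_record_per_line(rows, out):
    """Write rows as one valid JSON array, with each record on its own line."""
    out.write("[\n")
    out.write(",\n".join(json.dumps(row) for row in rows))
    out.write("\n]\n")

sample_csv = "FirstName,LastName\nJohn,Doe\nGeorge,Washington\n"
reader = csv.DictReader(io.StringIO(sample_csv))
buf = io.StringIO()
write_array_one_record_per_line(reader, buf)
print(buf.getvalue())
```

Unlike the one-document-per-line stream, this file can be loaded in one go with json.load.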

I took @SingleNegationElimination's response and simplified it into a three-liner that can be used in a pipeline:
import csv
import json
import sys
for row in csv.DictReader(sys.stdin):
    json.dump(row, sys.stdout)
    sys.stdout.write('\n')
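To test the same filter without a shell, you can wrap it in a function that takes the input and output streams as parameters. In this sketch, csv_to_json_lines is a hypothetical name, and io.StringIO stands in for the real streams:

```python
import csv
import io
import json

def csv_to_json_lines(instream, outstream):
    """Write each CSV row from instream to outstream as one JSON object per line."""
    count = 0
    for row in csv.DictReader(instream):
        json.dump(row, outstream)
        outstream.write("\n")
        count += 1
    return count

# io.StringIO stands in for sys.stdin / sys.stdout here.
out = io.StringIO()
n = csv_to_json_lines(io.StringIO("FirstName,LastName\nJohn,Doe\n"), out)
print(n, out.getvalue(), end="")
```

In a script, csv_to_json_lines(sys.stdin, sys.stdout) behaves like the three-liner. You would run it as, for example, python csv2jsonl.py < file.csv > file.json (the script name is hypothetical).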

You can try the third-party csvmapper package:
import csvmapper
# how does the object look
mapper = csvmapper.DictMapper([
    [
        { 'name' : 'FirstName'},
        { 'name' : 'LastName' },
        { 'name' : 'IDNumber', 'type':'int' },
        { 'name' : 'Messages' }
    ]
])
# parser instance
parser = csvmapper.CSVParser('sample.csv', mapper)
# conversion service
converter = csvmapper.JSONConverter(parser)
print(converter.doConvert(pretty=True))
Edit:
Simpler approach
import csvmapper
fields = ('FirstName', 'LastName', 'IDNumber', 'Messages')
parser = csvmapper.CSVParser('sample.csv', csvmapper.FieldMapper(fields))
converter = csvmapper.JSONConverter(parser)
print(converter.doConvert(pretty=True))

I see this is old, but I needed the code from SingleNegationElimination, and I had issues with data containing non-UTF-8 characters. These appeared in fields I was not overly concerned with, so I chose to ignore them. That took some effort, though. I am new to Python, so I got it working through trial and error. The code is a copy of SingleNegationElimination's, with extra handling of UTF-8. I tried to do it with https://docs.python.org/2.7/library/csv.html but in the end gave up. The code below worked.
import csv, json
# Python 3: let open() drop undecodable bytes (errors='ignore') instead of calling
# .decode()/.encode() on every field, which only works on Python 2 byte strings.
csvfile = open('file.csv', 'r', encoding='utf-8', errors='ignore', newline='')
jsonfile = open('file.json', 'w', encoding='utf-8')
fieldnames = ("Scope","Comment","OOS Code","In RMF","Code","Status","Name","Sub Code","CAT","LOB","Description","Owner","Manager","Platform Owner")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    try:
        print('+' + row['Code'])
        json.dump(row, jsonfile)
        jsonfile.write('\n')
    except Exception:
        print('-' + row['Code'])  # report the failing record, then re-raise
        raise
csvfile.close()
jsonfile.close()
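On Python 3 the decoding happens when the file is opened. So the errors argument of open() (or io.TextIOWrapper) does the work that .decode('utf-8', 'ignore') did on Python 2. Here is a small sketch with hypothetical in-memory bytes, where 0xE9 on its own is not valid UTF-8:

```python
import csv
import io
import json

# Hypothetical input: the lone byte 0xE9 (Latin-1 "é") is invalid UTF-8.
raw = b"Code,Name\nA1,caf\xe9\nB2,plain\n"

# errors="ignore" silently drops undecodable bytes, like open(..., errors="ignore") would.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors="ignore", newline="")

out = io.StringIO()
for row in csv.DictReader(text_stream):
    json.dump(row, out)
    out.write("\n")
print(out.getvalue(), end="")

# errors="replace" keeps a visible marker (U+FFFD) instead of dropping the byte.
replaced = b"caf\xe9".decode("utf-8", errors="replace")
print(replaced)
```

"replace" makes lost characters easier to spot later, while "ignore" matches the behaviour of the answer above.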

Add the indent parameter to json.dumps:
import json
data = {'this': ['has', 'some', 'things'],
        'in': {'it': 'with', 'some': 'more'}}
print(json.dumps(data, indent=4))
Also note that you can simply use json.dump with the open jsonfile:
json.dump(data, jsonfile)

Use pandas and the json library:
import pandas as pd
import json
filepath = "inputfile.csv"
output_path = "outputfile.json"
df = pd.read_csv(filepath)
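# Create multiline JSON: one record per line
json_list = json.loads(df.to_json(orient="records"))
with open(output_path, 'w') as f:
    for item in json_list:
        # json.dumps, not "%s" % item: %s writes Python's repr, which is not valid JSON
        f.write(json.dumps(item) + "\n")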
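This is why json.dumps matters rather than "%s" % item. Formatting a dict with %s produces Python's repr, which uses single quotes and None, so the result is not JSON. A quick comparison on a hypothetical record:

```python
import json

record = {"FirstName": "John", "Message": None}

repr_line = "%s" % (record,)    # what "%s\n" % item writes: Python's repr
json_line = json.dumps(record)  # what a JSON reader expects

print(repr_line)  # {'FirstName': 'John', 'Message': None}
print(json_line)  # {"FirstName": "John", "Message": null}
```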

How about using pandas to read the CSV file into a DataFrame (pd.read_csv), then manipulating the columns if you want (dropping them or updating values), and finally converting the DataFrame back to JSON (pd.DataFrame.to_json)?
Note: I haven't checked how efficient this is, but it is definitely one of the easiest ways to manipulate a large CSV and convert it to JSON.

As a slight improvement to @MONTYHS's answer, iterating through a tuple of fieldnames:
import csv
import json
import os
csvfilename = 'filename.csv'
# splitext strips only the final extension, so names like 'my.data.csv' still work
jsonfilename = os.path.splitext(csvfilename)[0] + '.json'
fieldnames = ('FirstName', 'LastName', 'IDNumber', 'Message')
output = []
with open(csvfilename, 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for each in reader:
        # keep only the fields listed in fieldnames
        row = {}
        for field in fieldnames:
            row[field] = each[field]
        output.append(row)
with open(jsonfilename, 'w') as jsonfile:
    json.dump(output, jsonfile, indent=2, sort_keys=True)

import csv
import json
def read():
    noOfElem = 200  # number of rows you want to import
    csv_file_name = "hashtag_donaldtrump.csv"  # CSV file name
    json_file_name = "hashtag_donaldtrump.json"  # JSON file name
    with open(csv_file_name, mode='r', newline='') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        with open(json_file_name, 'w') as json_file:
            i = 0
            json_file.write("[")
            for row in csv_reader:
                if i == noOfElem:
                    break  # stop after exactly noOfElem rows
                if i > 0:
                    json_file.write(",")  # separator before every row except the first
                json_file.write(json.dumps(row))
                i = i + 1
            json_file.write("]")  # also reached when the file has fewer rows
Change the three parameters above (noOfElem, csv_file_name, json_file_name) and you're done.
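A variant can take the row limit and the streams as parameters instead of hard-coded values, using itertools.islice to stop after limit rows. In this sketch, write_first_n_as_array is a hypothetical name and the CSV text is made up for illustration:

```python
import csv
import io
import itertools
import json

def write_first_n_as_array(rows, out, limit):
    """Stream at most `limit` rows into `out` as one valid JSON array."""
    out.write("[")
    for i, row in enumerate(itertools.islice(rows, limit)):
        if i:  # a comma goes before every record except the first
            out.write(",")
        json.dump(row, out)
    out.write("]")  # always written, even if there are fewer than `limit` rows

csv_text = "id,tag\n1,a\n2,b\n3,c\n"
buf = io.StringIO()
write_first_n_as_array(csv.DictReader(io.StringIO(csv_text)), buf, limit=2)
print(buf.getvalue())  # [{"id": "1", "tag": "a"},{"id": "2", "tag": "b"}]
```

Rows are written as they are read, so memory use stays flat even for large files.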

import csv
import json
output = []
with open('filename.csv', 'r', newline='') as f:
    csvfile = csv.DictReader(f)  # DictReader needs a file object, not a file name
    for each in csvfile:
        row = {}
        row['FirstName'] = each['FirstName']
        row['LastName'] = each['LastName']
        row['IDNumber'] = each['IDNumber']
        row['Message'] = each['Message']
        output.append(row)
with open('filename.json', 'w') as jsonfile:
    json.dump(output, jsonfile, indent=4, sort_keys=False)

Related

python nested json to csv Incomplete conversion

This is my JSON file:
{"_index":"core-bvd-locations","_type":"_doc","_id":"75b82cba4a80784f4fa36d14c86f6d85","_score":1,"_source":{"a_id":"FR518077177","a_id_type":"BVD ID","a_name":"Moisan Patrick Roger","a_name_normal":"MOISAN PATRICK ROGER","a_country_code":"FR","a_country":"France","a_in_compliance_db":false,"a_nationality":"FR","a_street_address":"Les Carmes","a_city":"Decize","a_postcode":"58300","a_region":"Bourgogne-Franche-Comte|Nievre","a_phone":"+33 603740000","a_latitude":46.79402777777778,"a_longitude":3.496277777777778,"a_national_ids":{"European VAT number":["FR58 518077177"],"SIREN number":["518077177"],"TIN":["518077177"],"SIRET number":["518077177-00013"]},"relationship":"Location info","file_name":"/media/hedwig/iforce/data/BvD/s3-transfer/SuperTable_v3_json/locations/part-00021-1f62c713-17a0-410d-9b18-32328d9836d6-c000.json","a_geo_point":{"lat":46.79402777777778,"lon":3.496277777777778}}}
This is my code:
import csv
import json
import sys
import codecs
def trans(path):
    jsonData = codecs.open('F:\\1.json', 'r', 'utf-8')
    # csvfile = open(path+'.csv', 'w')
    # csvfile = open(path+'.csv', 'wb')  # python2
    csvfile = open('F:\\1.csv', 'w', encoding='utf-8', newline='')
    writer = csv.writer(csvfile, delimiter=',')
    flag = True
    for line in jsonData:
        dic = json.loads(line)
        if flag:
            keys = list(dic.keys())
            print(keys)
            writer.writerow(keys)
            flag = False
        writer.writerow(list(dic.values()))
    jsonData.close()
    csvfile.close()
if __name__ == '__main__':
    path = str(sys.argv[0])
    print(path)
    trans(path)
C:\Users\jeri\PycharmProjects\pythonProject9\venv\Scripts\python.exe C:\Users\jeri\PycharmProjects\pythonProject9\zwc_count_file.py
C:\Users\jeri\PycharmProjects\pythonProject9\zwc_count_file.py
['_index', '_type', '_id', '_score', '_source']
Process finished with exit code 0
The resulting CSV only has the top-level keys printed above as columns; the information inside the nested JSON (under _source) is not parsed. How can I modify the code?
import json
import pandas as pd
with open("json_filname.json", 'r') as f:
    data = json.load(f)
# json_normalize flattens nested objects into dotted columns such as "_source.a_id"
df = pd.json_normalize(data)
print(df)
json.load(): json.load() accepts a file object, parses the JSON data, populates a Python dictionary with the data and returns it to you.
import json
# Opening JSON file
f = open('data.json')
# returns JSON object as
# a dictionary
data = json.load(f)
writer.writerow writes an entire row; the right syntax is:
writer.writerow(iterable_object)  # e.g. a list of the row's values
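If pandas isn't available, you can flatten nested objects with a small recursive helper before writing the CSV. This sketch assumes dotted column names like the ones json_normalize produces are acceptable. flatten is a hypothetical helper, lists (such as the a_national_ids values) are left as-is, and the record is trimmed from the question's data:

```python
import csv
import io
import json

def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`; lists stay as-is."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

# Trimmed-down record from the question's file (one JSON object per line).
line = ('{"_index": "core-bvd-locations", "_source": {"a_id": "FR518077177", '
        '"a_geo_point": {"lat": 46.79402777777778, "lon": 3.496277777777778}}}')
row = flatten(json.loads(line))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(out.getvalue())
```

For the real file, you would call flatten(json.loads(line)) inside the existing for line in jsonData: loop. You would then write the header from the first flattened record.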

Converting JSON to CSV, CSV is empty

I'm attempting to convert Yelp's dataset, which is in JSON, to CSV format. The new CSV file that is created is empty.
I've tried different ways to iterate through the JSON, but they all give me a zero-byte file.
The JSON file looks like this:
{"business_id":"1SWheh84yJXfytovILXOAQ","name":"Arizona Biltmore Golf Club","address":"2818 E Camino Acequia Drive","city":"Phoenix","state":"AZ","postal_code":"85016","latitude":33.5221425,"longitude":-112.0184807,"stars":3.0,"review_count":5,"is_open":0,"attributes":{"GoodForKids":"False"},"categories":"Golf, Active Life","hours":null}
import json
import csv
infile = open("business.json","r")
outfile = open("business2.csv","w")
data = json.load(infile)
infile.close()
out = csv.writer(outfile)
out.writerow(data[0].keys())
for row in data:
    out.writerow(row.values())
I get an "Extra data" error when the code runs. The new business2.csv file is empty and its size is zero bytes.
If your JSON has only one row, try this:
import csv
import json
infile = open("business.json","r")
outfile = open("business2.csv","w", newline='')
data = json.load(infile)
infile.close()
out = csv.writer(outfile)
#print(data.keys())
out.writerow(data.keys())
out.writerow(data.values())
outfile.close()  # flush the buffered rows to disk
Hi, please try the code below. With the with statement, the file is closed automatically when control leaves the with block.
import csv
import json
infile = open("business.json","r")
data = json.load(infile)
infile.close()
headers = list(data.keys())
values = list(data.values())
with open("business2.csv","w", newline='') as outfile:
    out = csv.writer(outfile)
    out.writerow(headers)
    out.writerow(values)
You need to use with to close the file.
import json
import csv
with open("business.json","r") as infile:
    data = json.load(infile)
with open("business2.csv","w", newline='') as outfile:
    out = csv.writer(outfile)
    out.writerow(list(data.keys()))
    out.writerow(list(data.values()))
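The "Extra data" error suggests the Yelp file holds one JSON object per line (JSON Lines) rather than a single JSON document, so json.load cannot parse the whole file. The sketch below works under that assumption. It parses line by line, takes the header from the first record, and stores nested values such as attributes as JSON text in a single cell. jsonl_to_csv is a hypothetical name:

```python
import csv
import io
import json

def jsonl_to_csv(instream, outstream):
    """Convert one-JSON-object-per-line input to CSV; the header comes from the first record."""
    writer = None
    for line in instream:
        if not line.strip():
            continue
        record = json.loads(line)
        # Nested values (e.g. "attributes") are written as JSON text in a single cell.
        row = {k: json.dumps(v) if isinstance(v, (dict, list)) else v
               for k, v in record.items()}
        if writer is None:
            # keys missing from later records become "", unexpected extra keys are dropped
            writer = csv.DictWriter(outstream, fieldnames=list(row), extrasaction="ignore")
            writer.writeheader()
        writer.writerow(row)

# A trimmed record from the question; a real file would be opened with open(..., newline="").
sample = ('{"business_id": "1SWheh84yJXfytovILXOAQ", "name": "Arizona Biltmore Golf Club", '
          '"stars": 3.0, "attributes": {"GoodForKids": "False"}, "hours": null}\n')
out = io.StringIO()
jsonl_to_csv(io.StringIO(sample), out)
print(out.getvalue())
```

JSON null becomes an empty cell, because the csv writer writes None as an empty string.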

Missing headers in csv output file (python)

I am trying to output a CSV file, but the headers are missing. I've gone through my code line by line, but I can't see what's wrong with it.
My sample data is :
ABC.csv (it contains duplicate rows, so I also added code to remove them)
KeyID,GeneralID
145258,KL456
145259,BG486
145260,HJ789
145261,KL456
145259,BG486
145259,BG486
My code:
import csv
import fileinput
from collections import Counter
file_path_1 = "ABC.csv"
key_id = []
general_id = []
total_general_id = []
with open(file_path_1, 'rU') as f:
    reader = csv.reader(f)
    header = next(reader)
    lines = [line for line in reader]
    counts = Counter([l[1] for l in lines])
new_lines = [l + [str(counts[l[1]])] for l in lines]
with open(file_path_1, 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(header + ['Total_GeneralID'])
    writer.writerows(new_lines)
with open(file_path_1, 'rU') as f:
    reader = csv.DictReader(f)
    for row in reader:
        key_id.append(row['KeyID'])
        general_id.append(row['GeneralID'])
        total_general_id.append(row['Total_GeneralID'])
New_List = [[] for _ in range(len(key_id))]
for attr in range(len(key_id)):
    New_List[attr].append(key_id[attr])
    New_List[attr].append(general_id[attr])
    New_List[attr].append(total_general_id[attr])
with open('result_id_with_total.csv', 'wb+') as newfile:
    header = ['KEY ID', 'GENERAL ID' , 'TOTAL GENERAL ID']
    wr = csv.writer(newfile, delimiter=',', quoting = csv.QUOTE_MINIMAL)
    wr.writerow(header) #I already add the headers but it won't work.
    for item in New_List:
        if item not in newfile:
            wr.writerow(item)
Unfortunately, my output (result_id_with_total.csv) looks like this:
145258,KL456,2
145259,BG486,1
145260,HJ789,1
145261,KL456,2
What I am trying to achieve:
KEY ID,GENERAL ID,TOTAL GENERAL ID
145258,KL456,2
145259,BG486,1
145260,HJ789,1
145261,KL456,2
My main problem in this code:
wr.writerow(header)
won't work.
This is because you open the file with wb+ (binary write mode). A file opened in binary mode expects bytes, not strings, but csv.writer writes strings.
I get this error in the console when I run it under Python 3:
TypeError: a bytes-like object is required, not 'str'
Try changing wb+ to just w (plus newline='', as the csv module documentation recommends); this does the trick.
with open('result_id_with_total.csv', 'w', newline='') as newfile:
    header = ['KEY ID', 'GENERAL ID' , 'TOTAL GENERAL ID']
    wr = csv.writer(newfile, delimiter=',', quoting = csv.QUOTE_MINIMAL)
    wr.writerow(header)
    for item in New_List:
        wr.writerow(item)  # drop the `item not in newfile` test: a 'w' file can't be read
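The desired output counts each GeneralID after duplicates are removed: BG486 appears three times in ABC.csv but has a total of 1. Below is a stdlib sketch of the whole task under that reading, assuming exactly two input columns. dedupe_and_count is a hypothetical name:

```python
import csv
import io
from collections import Counter

def dedupe_and_count(instream, outstream):
    """Drop duplicate rows, then add how often each GeneralID occurs among the unique rows."""
    reader = csv.reader(instream)
    next(reader)  # skip the input header (KeyID,GeneralID)
    # dict.fromkeys keeps the first occurrence of each row, in file order
    unique_rows = list(dict.fromkeys(tuple(row) for row in reader if row))
    counts = Counter(general_id for _, general_id in unique_rows)
    writer = csv.writer(outstream)
    writer.writerow(["KEY ID", "GENERAL ID", "TOTAL GENERAL ID"])  # header written first
    for key_id, general_id in unique_rows:
        writer.writerow([key_id, general_id, counts[general_id]])

abc_csv = ("KeyID,GeneralID\n145258,KL456\n145259,BG486\n145260,HJ789\n"
           "145261,KL456\n145259,BG486\n145259,BG486\n")
out = io.StringIO()
dedupe_and_count(io.StringIO(abc_csv), out)
print(out.getvalue())
```

For real files, pass open('ABC.csv', newline='') and open('result_id_with_total.csv', 'w', newline='') as the two streams.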

JSON like data to CSV file in python - not showing headers correctly

I am transforming JSON like data to CSV and having a few issues.
The code is here:
import json
import csv
def parse_file(inputed_file):
    with open(input_file, 'r') as inputed_file:
        content = inputed_file.readlines()
    split_file = open('test.csv', 'w')
    for line in content:
        lines = line.split('\t')
        data = json.loads(lines[0])
        writer = csv.DictWriter(split_file, fieldnames = ["title", "firstname"], delimiter = ',')
        writer.writeheader()
The problem is that this adds a header for every row of data, whereas I want the header displayed only once. Then I want to add this so the data goes below the header:
writer.writerow(data)
I have looked at How can I convert JSON to CSV? and tried it, but failed.
Create the DictWriter outside the loop, and just call writer.writeheader() there. Then call writer.writerow() inside the loop.
def parse_file(input_file):
    with open(input_file, 'r') as inputed_file:
        content = inputed_file.readlines()
    with open('test.csv', 'w', newline='') as split_file:
        # create the writer and write the header once, before the loop
        writer = csv.DictWriter(split_file, fieldnames = ["title", "firstname"], delimiter = ',')
        writer.writeheader()
        for line in content:
            lines = line.split('\t')
            data = json.loads(lines[0])
            writer.writerow(data)

Saving a dictionary as a file with columns which i can continously add to

I have a dictionary of currencies here in JSON.
I will give a sample for simplicity:
{
"AED": "united Arab Emirates Dirham",
"AFN": "Afghan Afghani",
...
"ZWL": "Zimbabwean Dollar"
}
And I wish to add them to a file to which I can keep adding different sets of currencies at different times. The file should have a column for the currency code (e.g. "AED") and another column for the name.
I really don't know where to start. Any help pointing me in the right direction would be much appreciated.
My code for the dictionary is as follows:
import json
import urllib.request
def _fetch_currencies():
    f = urllib.request.urlopen(
        'http://openexchangerates.org/api/currencies.json')
    charset = f.info().get_param('charset', 'utf8')
    data = f.read()
    decoded = json.loads(data.decode(charset))
    print(json.dumps(decoded, indent=4))
You could simply save your data in csv with one line per currency:
AED, united Arab Emirates Dirham
AFN, Afghan Afghani
ZWL, Zimbabwean Dollar
To do that, you might want to transform your dictionary into rows, but in this case, it's trivial, since it's just the pair (key, value):
rows = decoded.items()
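Note, however, that the items are not sorted: they come out in dictionary order, which in Python 3.7+ is the order in which they appear in the JSON. If you want the entries sorted, sort them before writing to the file (dict_items has no .sort() method in Python 3):
rows = sorted(rows)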
In the end, using the csv module:
import csv
with open('local_file.csv', 'w', newline='') as my_csv:
    csv_writer = csv.writer(my_csv, delimiter=',')
    csv_writer.writerows(sorted(decoded.items()))
Putting it all together:
import json
import urllib.request
import csv
def fetch_currencies():
    f = urllib.request.urlopen('http://openexchangerates.org/api/currencies.json')
    charset = f.info().get_param('charset', 'utf8')
    data = f.read()
    decoded = json.loads(data.decode(charset))
    return decoded
def save_currencies(currencies, filename):
    sorted_currencies = sorted(currencies.items())
    with open(filename, 'w', newline='') as my_csv:
        csv_writer = csv.writer(my_csv, delimiter=',')
        csv_writer.writerows(sorted_currencies)
save_currencies(fetch_currencies(), 'currencies.csv')
You can use csv.DictWriter to handle saving the file easily. DictWriter works with dictionaries, and the result of json.loads is a dict, so DictWriter makes the job simpler.
import csv
import json
import urllib.request
def _fetch_currencies():
    f = urllib.request.urlopen('http://openexchangerates.org/api/currencies.json')
    charset = f.info().get_param('charset', 'utf8')
    data = f.read()
    decoded = json.loads(data.decode(charset))
    with open('names.csv', 'w', newline='') as csvfile:
        fieldnames = ['code', 'country']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for code, country in decoded.items():
            writer.writerow({'code': code, 'country': country})
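The question also asks how to keep adding currencies over time. Opening the file in append mode ('a') does that, with the header written only when the file is new or empty. In this sketch, append_currencies is a hypothetical helper, and the currency entries are small samples from the question:

```python
import csv
import os
import tempfile

def append_currencies(currencies, filename):
    """Append (code, name) rows; write the header only if the file is new or empty."""
    write_header = not os.path.exists(filename) or os.path.getsize(filename) == 0
    with open(filename, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["code", "name"])
        if write_header:
            writer.writeheader()
        for code, name in sorted(currencies.items()):
            writer.writerow({"code": code, "name": name})

# Two separate runs appending to the same file (a temporary one for this demo).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "currencies.csv")
    append_currencies({"AFN": "Afghan Afghani"}, path)
    append_currencies({"ZWL": "Zimbabwean Dollar"}, path)
    with open(path, newline="", encoding="utf-8") as f:
        saved_rows = list(csv.reader(f))
print(saved_rows)
```

This does not check for codes that are already in the file. If the same set might be appended twice, you would need to read the existing codes first.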
