Generate a json file - python

I need to generate a .json file which has data in the following format:
{"cnt":[1950,1600,400,1250,995],
"dt":["2020-01","2020-02","2020-03","2020-04","2020-05"]}
I would prefer it getting generated by querying a table or using a CSV to JSON conversion. The format data I will have after querying or in my CSV file will be:
How to do this?

import csv
import json
with open('csv_file_path') as f:
    dict_reader = csv.DictReader(f)
    dicts = [dict(i) for i in dict_reader]
    field_names = dict_reader.fieldnames  # get the column headings: CNT, DT, etc.
output_dict = {}
for item in field_names:
    output_dict.setdefault(item, [])
for d in dicts:
    for key in d:
        output_dict[key].append(d[key])
with open('json_file_path', 'w+') as f:
    f.write(json.dumps(output_dict, indent=4))
Tested and works fine.

This example will turn your row-based data into a dict-based version.
Please keep in mind that I didn't test this - but it should work fine.
In essence this is what's happening:
Read the source data
Determine the headings you need for the dict
Fill the new data-format from your source_data
Dump this new format to a json file.
Code:
import csv
import json
# Read the source data
with open('path_to_csv_file') as f:
    source_data = [i for i in csv.DictReader(f)]
# Discover headers, and prep dict framework.
target_data = {key: [] for key in source_data[0].keys()}
# Iterate over the source_data and append the values to the right key in target_data
for row in source_data:
    for k, v in row.items():
        target_data[k].append(v)
# Write target data to json file
with open('path_to_json_file', 'w') as f:
    json.dump(target_data, f)
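Both answers keep every value as a string, because that is what the csv module returns. For the exact format in the question (lowercase keys, numeric counts), a minimal sketch could look like the following. It assumes the CSV headers are CNT and DT, which is only a guess based on the comment in the first answer; the function name rows_to_columns and the sample data are hypothetical.

```python
import csv
import io
import json

def rows_to_columns(csv_file):
    """Turn row-based CSV data into a dict of column lists."""
    reader = csv.DictReader(csv_file)
    # Lowercase the headers to match the keys in the desired output.
    columns = {name.lower(): [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            # csv yields strings; convert values made only of digits to int.
            columns[name.lower()].append(int(value) if value.isdigit() else value)
    return columns

# Hypothetical sample standing in for the real CSV file (assumed headers).
sample = io.StringIO("CNT,DT\n1950,2020-01\n1600,2020-02\n400,2020-03\n")
result = rows_to_columns(sample)
print(json.dumps(result))
# {"cnt": [1950, 1600, 400], "dt": ["2020-01", "2020-02", "2020-03"]}
```

To write a file instead of printing, pass an open file handle and the result to json.dump, as in the answers above.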

Related

multiple json to csv using pandas python

I am trying to convert multiple JSON files into one CSV file.
I tried two ways: the first using pandas, the second using json and csv.writer.
About my JSON: the keys are unordered, and some keys differ from file to file.
Code using csv.writer:
file_list=os.listdir('output')
count = 0
for file in file_list:
    dict={}
    file_path = "output/" + file
    with open(file_path,'r') as f:
        jsonData=json.load(f)
        datafile=open('data.csv','a')
        csv_writer = csv.writer(datafile)
        if count == 0:
            header = jsonData.keys()
            csv_writer.writerow(header)
            count += 1
            csv_writer.writerow(jsonData.values())
        if count == 1:
            csv_writer.writerow(jsonData.values())
        datafile.close()
Problem:
Because my data is unordered and the keys differ between files, values end up under the wrong headers in the CSV file.
Code using pandas:
for file in file_list:
    dict={}
    file_path = "output/" + file
    with open(file_path,'r') as f:
        jsonData=json.load(f)
        for j in jsonData:
            dict.update({j:[jsonData[j]]})
        df=pd.DataFrame(dict)
        df.to_csv("hello.csv")
Problem:
I don't know how to append in pandas, so the output shows only 2 rows, which I guess come from my last JSON file.
inside my json
Try this code:
import pandas as pd
import json
import pathlib
data_path = pathlib.Path('.')
keys = ['Solutions', 'account_number', 'actual_reading_current', 'actual_reading_previous', 'address', 'amount_due']
dat = {k: [] for k in keys}
for jfile in data_path.glob('*.json'):
    with jfile.open('r') as ifile:
        json_data = json.load(ifile)
    for key in keys:
        dat[key].append(json_data[key][0] if key in json_data else None)
result = pd.DataFrame.from_dict(dat)
result.to_csv('result.csv')
I first define a dictionary containing the columns that I want.
Then I read in the JSON files and append their values as rows to the dictionary.
Note that I had to edit your JSON files: one was missing a closing quote, and I had to replace the single quotes with double quotes.
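If the full set of keys is not known up front, the same idea can be sketched with the standard library alone: collect every record first, take the union of their keys, and let csv.DictWriter align each value under its header. This is only an illustration under assumptions. The function name json_files_to_csv and the demo files are hypothetical, and it assumes each file holds one flat JSON object with scalar values (the answer above reads the first element of a list instead).

```python
import csv
import json
import pathlib
import tempfile

def json_files_to_csv(json_dir, csv_path):
    """Write one CSV row per JSON file, aligning values by key."""
    records = []
    for jfile in sorted(pathlib.Path(json_dir).glob('*.json')):
        with jfile.open() as ifile:
            records.append(json.load(ifile))
    # Union of all keys, kept in first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, 'w', newline='') as ofile:
        # restval is written wherever a record lacks a key.
        writer = csv.DictWriter(ofile, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(records)
    return fieldnames

# Hypothetical demo data: two files with different, unordered keys.
with tempfile.TemporaryDirectory() as tmp_name:
    tmp_dir = pathlib.Path(tmp_name)
    (tmp_dir / 'a.json').write_text(json.dumps({'address': 'x', 'amount_due': 10}))
    (tmp_dir / 'b.json').write_text(json.dumps({'amount_due': 20, 'account_number': 7}))
    columns = json_files_to_csv(tmp_dir, tmp_dir / 'result.csv')
    csv_text = (tmp_dir / 'result.csv').read_text()
print(csv_text)
```

Because all rows are written in one pass after the header is known, nothing has to be appended to an existing file and no value can land under the wrong header.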

Using readlines and somehow skip the third column from comparison in two csv files

Old.csv:
name,department
leona,IT
New.csv:
name,department
leona,IT
lewis,Tax
With the same two columns in both files, finding the new values in New.csv and updating Old.csv with them works fine using the code below:
feed = []
headers = []
with open("Old.csv", 'r') as t1, open("New.csv", 'r') as t2:
    for header in t1.readline().split(','):
        headers.append(header.rstrip())
    fileone = t1.readlines()
    filetwo = t2.readlines()[1:] # Skip csv fieldnames
    for line in filetwo:
        if line not in fileone:
            lineItems = {}
            feed.append(line.strip()) # For old file update
New problem:
1/ Add a 3rd column to store timestamp values
2/ Skip the 3rd column (timestamp) in both files and still need to compare two files for differences based on the 1st and 2nd columns
3/ Old file will be updated with the new values on all 3 columns
I tried the slicing method split(',')[0:2], but it didn't seem to work at all. I feel that only some small updates to the existing code are needed, but I'm not sure how to achieve that.
Expected outcome:
Old.csv:
name,department,timestamp
leona,IT,07/20/2020 <--- Existing value
lewis,Tax,08/25/2020 <--- New value from New.csv
New.csv:
name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020
You can do it all yourself, but why not use the tools built in to Python?
from csv import reader
feed = []
with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)
    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]
    for rec in new:
        if rec[:2] not in old_data:
            feed.append(rec)
print(headers)
print(feed)
Result:
['name', 'department']
[['lewis', 'Tax']]
Note that you'll get this result with the data you provided, but if you add a third column, the code still works as expected and will add that data to the feed result.
To get feed to be a list of dictionaries, which you can easily turn into JSON, you could do something like:
feed.append(dict(zip(headers, rec)))
Turning feed into json is as simple as:
import json
print(json.dumps(feed))
The whole solution:
import json
from csv import reader
feed = []
with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)
    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]
    for rec in new:
        if rec[:2] not in old_data:
            feed.append(dict(zip(headers, rec)))
print(json.dumps(feed))
With outputs like:
[{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
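For larger files, the membership test against a list becomes slow. A variation on the same idea, sketched here with the three-column data from the question, stores the key columns as tuples in a set. The function name new_records and its key_columns parameter are hypothetical, and io.StringIO stands in for the real files.

```python
import csv
import io

def new_records(old_file, new_file, key_columns=2):
    """Return the header and the rows of new_file whose key is not in old_file."""
    old = csv.reader(old_file)
    new = csv.reader(new_file)
    headers = next(old)
    next(new)  # skip header in new
    # Tuples are hashable, so the keys can live in a set for fast lookups.
    seen = {tuple(rec[:key_columns]) for rec in old}
    feed = []
    for rec in new:
        key = tuple(rec[:key_columns])
        if key not in seen:
            seen.add(key)  # also skips later duplicates within new_file
            feed.append(rec)
    return headers, feed

# In-memory stand-ins for Old.csv and New.csv, using the data from the question.
old_csv = io.StringIO("name,department,timestamp\nleona,IT,07/20/2020\n")
new_csv = io.StringIO(
    "name,department,timestamp\n"
    "leona,IT,07/20/2020\n"
    "leona,IT,07/25/2020\n"
    "lewis,Tax,08/25/2020\n"
)
headers, feed = new_records(old_csv, new_csv)
print(feed)
# [['lewis', 'Tax', '08/25/2020']]
```

To update Old.csv with the result, one option is to open it in append mode with newline='' and pass feed to csv.writer(...).writerows().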

How would I write to a CSV file from a python dictionary of dictionaries while some values are empty

I have a Python dictionary of dictionaries with data stored in it that I need to write to a CSV file.
The problem I'm having is that some of the dictionaries from the file I have read don't contain any information for a particular field, so my CSV file columns are not lined up properly.
example
d["first1"]["title"] = founder
d["first1"]["started"] = 2005
d["second1"]["title"] = CEO
d["second1"]["favcolour"] = blue
and so when i use the following code:
for key, value in d.iteritems():
    ln = [key]
    for ikey, ivalue in value.iteritems():
        ln.append(ikey)
        ln.extend([v for v in ivalue])
    writer.writerow(ln)
my CSV file has all the information, but "started" and "favcolour" end up in the same column. I want each column to contain only one field.
Thanks all in advance
Here's a suggestion:
import csv
# Python 3 version; dicts keep insertion order on Python 3.7+
d = {"first1": {"title": 'founder', "started": 2005}, "second1": {"title": 'CEO', "favcolour": 'blue'}}
columns = []
output = []
for key, value in d.items():
    for ikey, ivalue in value.items():
        if ikey not in columns:
            columns.append(ikey)
    ln = []
    for col in columns:
        if col not in value:
            ln.append('')
        else:
            ln.append(value[col])
    output.append(ln)
with open('file', 'w', newline='') as fl:
    csv_writer = csv.writer(fl)
    csv_writer.writerow(columns)
    for ln in output:
        print(ln)
        csv_writer.writerow(ln)
file:
title,started,favcolour
founder,2005
CEO,,blue
If it doesn't need to be human-readable, you can alternatively use pickle:
import pickle
# Write:
with open('filename.pickle', 'wb') as handle:
    pickle.dump(d, handle)
# Read:
with open('filename.pickle', 'rb') as handle:
    d = pickle.load(handle)
You can use the DictWriter class in csv to easily append what would be a sparse dictionary into a CSV. The only caveat is you need to know all the possible fields at the beginning.
import csv
data = { "first": {}, "second": {} }
data["first"]["title"] = "founder"
data["first"]["started"] = 2005
data["second"]["title"] = "CEO"
data["second"]["favcolour"] = "blue"
fieldNames = set()
for d in data:
    for key in data[d].keys():
        # Add all possible keys to fieldNames; because fieldNames is
        # a set, you can't have duplicate values
        fieldNames.add(key)
with open('csvFile.csv', 'w', newline='') as csvfile:
    # Initialize DictWriter with the list of fieldNames
    # You can sort fieldNames to whatever order you wish the CSV
    # headers to be in.
    writer = csv.DictWriter(csvfile, fieldnames=list(fieldNames))
    # Add Header to the CSV file
    writer.writeheader()
    # Iterate through all sub-dictionaries
    for d in data:
        # Add the sub-dictionary to the csv file
        writer.writerow(data[d])
Pandas works really well for things like this, so if it's an option, I would recommend it.
import pandas as pd
# my_dict is your dictionary of dictionaries.
# Not necessary, but for me it's usually easier to work with a list of dicts than a dict of dicts.
my_list = [my_dict[key] for key in my_dict]
# When you pass a list of dictionaries to the pandas DataFrame class, it takes care of
# alignment issues for you, but if you want to do something specific
# with missing values, you will need to manipulate the frame further.
df = pd.DataFrame(my_list)
df.to_csv('file_path_to_save_to')
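None of the answers above writes the outer key ("first1", "second1") to the file. A minimal standard-library sketch that keeps it as its own column is shown below. The function name write_nested_dict and the column name id are hypothetical choices, and io.StringIO stands in for a real file.

```python
import csv
import io

def write_nested_dict(data, out_file, id_column='id'):
    """Write a dict of dicts as CSV, one row per outer key."""
    # Collect inner keys in first-seen order so the column order is stable.
    fieldnames = [id_column]
    for inner in data.values():
        for key in inner:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval is written for any field a sub-dictionary does not have.
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval='')
    writer.writeheader()
    for outer_key, inner in data.items():
        writer.writerow({id_column: outer_key, **inner})
    return fieldnames

d = {"first1": {"title": "founder", "started": 2005},
     "second1": {"title": "CEO", "favcolour": "blue"}}
buffer = io.StringIO()
columns = write_nested_dict(d, buffer)
print(buffer.getvalue())
```

With a real file, open it with newline='' as the csv module documentation recommends, and pass the handle in place of the buffer.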

Parsing .DAT file with Python

I need to convert a .dat file that's in a specific format into a .csv file.
The .dat file has multiple rows with a repeating structure. The data is held in brackets and have tags. Below is the sample data; it repeats throughout the data file:
{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}
Can anyone provide a starting point for the script?
This will create a CSV, assuming each line in your .DAT file is JSON. Just order the header list to your liking.
import csv, json
header = ['ID', 'name', 'type', 'area', 'HAC', 'verticalAccuracy', 'course', 'lat', 'lng']
with open('file.DAT') as datfile:
    with open('output.csv', 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=header)
        writer.writeheader()
        for line in datfile:
            writer.writerow(json.loads(line))
Your row is in JSON format, so you can use:
import json
data = json.loads('{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}')
print(data.get('name'))
print(data.get('ID'))
This is only a starting point. You have to iterate over the whole .dat file and, at the end, write an exporter to save the data into the CSV file.
Use a regex to find all of the data items. Use ast.literal_eval to convert each data item into a dictionary. Collect the items in a list.
import re, ast
result = []
s = '''{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}'''
item = re.compile(r'{[^}]*?}')
for match in item.finditer(s):
    d = ast.literal_eval(match.group())
    result.append(d)
If each data item is on a separate line in the file, you don't need the regex; you can just iterate over the file.
with open('file.dat') as f:
    for line in f:
        line = line.strip()
        line = ast.literal_eval(line)
        result.append(line)
Use json.load:
import json
with open(filename) as fh:
    data = json.load(fh)
...
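Note that a file with one object per line is not a single JSON document, so json.load on the whole file only works when the file holds exactly one JSON value. A hedged sketch that parses line by line with json.loads, skips blank lines, and writes only the chosen columns is shown below. The function name dat_to_csv, the second sample record, and the reduced header are made up for illustration.

```python
import csv
import io
import json

def dat_to_csv(dat_file, csv_file, header):
    """Convert a file with one JSON object per line into CSV rows."""
    # extrasaction='ignore' drops keys that are not listed in header.
    writer = csv.DictWriter(csv_file, fieldnames=header, extrasaction='ignore')
    writer.writeheader()
    count = 0
    for line in dat_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        writer.writerow(json.loads(line))
        count += 1
    return count

# First record is the sample from the question; the second is hypothetical.
dat = io.StringIO(
    '{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,'
    '"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}\n'
    '\n'
    '{"name":"EXAMPLE","ID":"XYZ","lat":40,"lng":-74,"area":"new_york"}\n'
)
out = io.StringIO()
rows_written = dat_to_csv(dat, out, header=['ID', 'name', 'lat', 'lng', 'area'])
print(out.getvalue())
```

Unlike ast.literal_eval, json.loads also accepts the JSON literals true, false, and null, which matters if the real data contains them.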

Convert a csv to a dictionary with multiple values?

I have a csv file like this:
pos,place
6696,266835
6698,266835
938,176299
940,176299
941,176299
947,176299
948,176299
949,176299
950,176299
951,176299
770,272944
2751,190650
2752,190650
2753,190650
I want to convert it to a dictionary like the following:
{266835:[6696,6698],176299:[938,940,941,947,948,949,950,951],190650:[2751,2752,2753]}
And then, fill the missing numbers in the range in the values:
{266835:[6696,6697,6698],176299:[938,939,940,941,942,943,944,945,946,947,948,949,950,951],190650:[2751,2752,2753]}
So far I have tried to build the dictionary using the solution suggested here, but it overwrites the old value with the new one.
Any help would be great.
Here is a function that I wrote for converting a CSV to a dict:
def csv2dict(filename):
    """
    reads in a two column csv file, and then converts it into dictionary
    """
    import csv
    with open(filename) as f:
        f.readline()#ignore first line
        reader=csv.reader(f,delimiter=',')
        mydict=dict((rows[1],rows[0]) for rows in reader)
    return mydict
Easiest is to use collections.defaultdict() with a list:
import csv
from collections import defaultdict
data = defaultdict(list)
with open(inputfilename, newline='') as infh:
    reader = csv.reader(infh)
    next(reader, None) # skip the header
    for col1, col2 in reader:
        data[col2].append(int(col1))
        if len(data[col2]) > 1:
            # list() keeps the value appendable on Python 3, where range() is lazy
            data[col2] = list(range(min(data[col2]), max(data[col2]) + 1))
This also expands the ranges on the fly as you read the data.
Based on what you have tried -
import csv
from collections import defaultdict
# open archive reader
with open("myfile.csv", newline='') as myFile:
    archive = csv.reader(myFile, delimiter=',')
    next(archive, None) # skip the header row
    arch_dict = defaultdict(list)
    for row in archive:
        arch_dict[row[1]].append(row[0])
print(arch_dict)
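The two steps (group, then fill the gaps) can also be kept separate, which avoids rebuilding the range on every row. The following is a sketch under assumptions: the function name csv_to_ranges is hypothetical, a shortened sample stands in for the real file, and keys with a single value are kept rather than dropped.

```python
import csv
import io
from collections import defaultdict

def csv_to_ranges(csv_file):
    """Group pos values by place, then fill in the missing numbers."""
    groups = defaultdict(list)
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header
    for pos, place in reader:
        groups[int(place)].append(int(pos))
    # Expand each group once, after all rows have been read.
    return {place: list(range(min(values), max(values) + 1))
            for place, values in groups.items()}

# Shortened sample of the data from the question.
sample = io.StringIO(
    "pos,place\n6696,266835\n6698,266835\n938,176299\n941,176299\n770,272944\n"
)
filled = csv_to_ranges(sample)
print(filled)
# {266835: [6696, 6697, 6698], 176299: [938, 939, 940, 941], 272944: [770]}
```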
