I have to read a csv file and create a JSON list out of it. Currently , I first read each row, add to list and use JSON dumps to create the JSON output. There are 2 issues I am facing
JSON dumps adds single quotes to the attributes which is not what I want . I wanted each key-value pair to be enclosed in its own double quotes.
It is using the CSV file headers for the keys which are not in camel case but I need them in camel case
This is what my program produces
{
"Reqlist":[
{
'FieldName1' : 'val1'
},
{
'Fieldname2' : 'val2'
}
],
'metaData' : 'metaVal'
}
This is the output I expect
{
"Reqlist":[
{
"fieldName1" : "val1"
},
{
"fieldName2" : "val2"
}
],
"metaData" : "metaVal"
}
Sample code :
reader = csv.DictReader(open(file_data), restkey='INVALID', restval='INVALID')
headers = reader.fieldnames
error_count = 0
success_count =0
dict=[]
header_count = set(headers)
json_error_data = json.dumps({})
csv_list =[]
error_list={}
myDict =[]
print(headers)
if(len(header_count)!=constants.EXPECTED_HEADER_COUNT or (set(headers)!=set(constants.FIELD_NAMES))):
print('error for record')
else:
for row in reader:
if('INVALID' in row.values()):
error_count +=1
else:
success_count +=1
csv_list.append(row)
except Exception as e:
logging.error('error')
if(error_count>0 and success_count == 0 ):
print('save the errors')
elif(success_count>0):
jsonlist = json.dumps({'Reqlist': csv_list })
new = json.loads(jsonlist)
a_dict = {'metaData': 'metaVal'}
new.update(a_dict)
def convert_to_camel(dict1):
new_dict = {}
for key, value in dict1.items():
key = key[0].lower() + key[1:]
value = value if type(value) is not dict else convert_to_camel(value)
new_dict[key] = value
return new_dict
csv_list = [convert_to_camel(i) for i in csv_list]
This should work for camel case
For single quote thing, i have written a json-like library for python. It's not perfect, but here it is.
Related
im trying to print a json in python
"rules":[
{
"table":"Forest",
"format":"List",
"header":{"en":"Forest","fr":"Forêt"},
"fields":[
{
"name":"Name",
"displayName":{"en":"Forest","fr":"Forêt"}
},
{
"name":"ForestMode",
"displayName":{"en":"Forest Mode","fr":"Mode forêt"},
"ok":"re.search('Windows(2019|2016)Forest',x) != None",
"warn":"re.search('Windows(2012R2|2012)Forest',x) != None",
"nok":"re.search('Windows(2008R2|2008|2003|2003Interim|2000)Forest',x) != None",
"comment":{"en":"Increase the functional level of the forest","fr":"Augmenter le niveau fonctionnel de la forêt"}
},
{
"name":"RootDomain",
"displayName":{"en":"Root Domain","fr":"Domaine racine"}
},
{
"name":"Domains",
"displayName":{"en":"Domains","fr":"Domaines"}
},
{
"name":"Sites",
"displayName":{"en":"Sites","fr":"Sites"}
},
{
but i've run into an issue
some of the json data doent have the key while some do
i have written this thus far
with open('./rules-adds.json', 'r') as ad_file:
ad_data = json.load(ad_file)
# print(ad_data)
data = ad_data["rules"]
# print(data)
# print(json.dumps(ad_data, indent=4))
for x in data:
print(x['table'], x['fields'])
for y in x['fields']:
print(y['name'])
But i get an error since the first element of the json file doesn't have the "ok" key
print(y['ok'])
KeyError: 'ok'
Answer:
You can use the get function of a dictionary:
my_value = my_dict.get('some_key_name', 'in_case_not_found')
So my_value will contain the existing value, or a default value you define in case there key doesn't exist in the dictionary
You can also check if a key exists with an if:
if 'some_key_name' in my_dict:
print(my_dict['some_key_name'])
else:
print('Well, key is not there')
Extra tip:
Make sure you anem your variables as descriptible as possible.
So for field in fields ..., for attribute in my_dictionary ...
I have a nested json file which I got from json.
I am trying to convert it in to csv through python code.
I tried all the possible way to convert it to csv but couldn't succeed.
I also followed previous question and solution but didn't work for me.
My json format is
{
"d1" : ("value1"),
"d2" : (value2-int),
"d3" : [
{
"sub-d1" : sub-value1(int),
"sub-d2" : sub-value2(int),
"sub-d3" : sub-value3(int),
"sub-d4" : [
{
"sub-sub-d1" : "sub-sub-value3",
"sub-sub-d2" : sub-value3(int)
},
{
"sub-sub-d1" : sub-sub-value3(int),
"sub-sub-d2" : "sub-sub-value3"}
]
],
"sub-d5" : "sub-value4",
"sub-d6" : "sub-value5"
}
],
"d4" : "value3",
"d5" : "value4",
"d6" : "value5,
"d7" : "value6"
}
{ another entry with same pattern..and so on}
Some of the value and sub value has integers and str + int.
What I tried
import json
import csv
import requests
with open('./data/inverter.json', 'r') as myfile:
json_data = myfile.read()
def get_leaves(item, key=None):
if isinstance(item, dict):
leaves = {}
for i in item.keys():
leaves.update(get_leaves(item[i], i))
return leaves
elif isinstance(item, list):
leaves = {}
for i in item:
leaves.update(get_leaves(i, key))
return leaves
else:
return {key : item}
# First parse all entries to get the complete fieldname list
fieldnames = set()
for entry in json_data:
fieldnames.update(get_leaves(entry).keys())
with open('output.csv', 'w', newline='') as f_output:
csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
csv_output.writeheader()
csv_output.writerows(get_leaves(entry) for entry in json_data)
This one saves all my data in single column with split values.
I tried to use :
https://github.com/vinay20045/json-to-csv.git
but this also didn't work.
I also tried to parse and do simple trick with following code:
with open("./data/inverter.json") as data_file:
data = data_file.read()
#print(data)
data_content = json.loads(data)
print(data_content)
but it throws an error : 'JSONDecodeError: Expecting value: line 2 column 13 (char 15)'
Can any one help me to convert my nested json to csv ?
It would be appreciated.
Thank you
It looks like the NumberInt(234234) issue you describe was a bug in MongoDB: how to export mongodb without any wrapping with NumberInt(...)?
If you cannot fix it by upgrading MongoDB, I can recommend preprocessing the data with regular expressions and parsing it as regular JSON after that.
For the sake of example, let's say you've got "test.json" that looks like this, which is valid except for the NumberInt(...) stuff:
{
"d1" : "value1",
"d2" : NumberInt(1234),
"d3" : [
{
"sub-d1" : 123,
"sub-d2" : 123,
"sub-d3" : 123,
"sub-d4" : [
{
"sub-sub-d1" : "sub-sub-value3",
"sub-sub-d2" : NumberInt(123)
},
{
"sub-sub-d1" : 43242,
"sub-sub-d2" : "sub-sub-value3"
}
]
}
],
"d4" : "value3",
"d5" : "value4",
"d6" : "value5",
"d7" : "value6"
}
You could import this into Python as follows:
import re
import json
with open("test.json") as f:
data = f.read()
# This regular expression finds/replaces the NumberInt bits with just the contents
fixed_data = re.sub(r"NumberInt\((\d+)\)", r"\1", data)
loaded_data = json.loads(fixed_data)
print(json.dumps(loaded_data, indent=4))
i've got a dot delimited string which I need to convert to Json. This is an example with different types of strings:
my.dictionary.value -> value
my.dictionary.list[0].value -> value
my.dictionary.list[1].value.list[0].value -> value
I have no problems converting the first type of string using a recursive approach:
def put(d, keys, item):
if "." in keys:
key, rest = keys.split(".", 1)
if key not in d:
d[key] = {}
put(d[key], rest, item)
else:
d[keys] = item
But i'm struggling to find a solution for the lists. Is there a library that provides out of the box string to json conversion? Thank you for your time.
AFAIK, there isn't any modules that would do this
Here is a sample code to converted a series of dotted strings into json format. You just have create a new list when you see the pattern [n] in the string that would be used as a key.
import re
import json
def parse_dotted_strlines(strlines):
res= {}
for line in strlines.splitlines():
parse_dotted_str(line, res)
return res
def parse_dotted_str(s, res):
if '.' in s:
key, rest = s.split('.', 1)
# Check if key represents a list
match = re.search(r'(.*)\[(\d)\]$', key)
if match:
# List
key, index = match.groups()
val = res.get(key, {}) or []
assert type(val) == list, f'Cannot set key {key} as of type list as i
t was earlier marked as {type(val)}'
while len(val) <= int(index):
val.append({})
val[index] = parse_dotted_str(rest, {})
res[key] = val
else:
# Dict
res[key] = parse_dotted_str(rest, res.get(key, {}))
elif '->' in s:
key, val = s.split('->')
res[key.strip()] = val.strip()
return res
Sample input and output
lines = """
my.dictionary.value -> value
my.dictionary.list[0].value -> value
my.dictionary.list[1].value.list[0].value -> value
"""
res = parse_dotted_strlines(lines)
print (json.dumps(res, indent=4))
{
"my": {
"dictionary": {
"value": "value",
"list": [
{
"value": "value"
},
{
"value": {
"list": [
{
"value": "value"
}
]
}
}
]
}
}
}
the json package is what you need
import json
mydict = """{
"str1": "str",
"list1": ["list1_str1", "list1_str2"],
"list2": ["list2_str1", "list2_str2", ["list2_str11", "list_str12"]]
}"""
json.loads(mydict)
>> {'str1': 'str',
'list1': ['list1_str1', 'list1_str2'],
'list2': ['list2_str1', 'list2_str2', ['list2_str11', 'list_str12']]}
I am new to pyspark . My requirement is to get/extract the attribute names from a nested json file . I tried using json_normalize imported from pandas package. It works for direct attributes but never fetches the attributes within json array attributes. My json doesn't have a static structure. It varies for each document that we receive. Could someone please help me with explanation for the small example provided below,
{
"id":"1",
"name":"a",
"salaries":[
{
"salary":"1000"
},
{
"salary":"5000"
}
],
"states":{
"state":"Karnataka",
"cities":[
{
"city":"Bangalore"
},
{
"city":"Mysore"
}
],
"state":"Tamil Nadu",
"cities":[
{
"city":"Chennai"
},
{
"city":"Coimbatore"
}
]
}
}
Especially for the json array elements..
Expected output :
id
name
salaries.salary
states.state
states.cities.city``
Here is the another solution for extracting all nested attributes from json
import json
result_set = set([])
def parse_json_array(json_obj, parent_path):
array_obj = list(json_obj)
for i in range(0, len(array_obj)):
json_ob = array_obj[i]
if type(json_obj) == type(json_obj):
parse_json(json_ob, parent_path)
return None
def parse_json(json_obj, parent_path):
for key in json_obj.keys():
key_value = json_obj.get(key)
# if isinstance(a, dict):
if type(key_value) == type(json_obj):
parse_json(key_value, str(key) if parent_path == "" else parent_path + "." + str(key))
elif type(key_value) == type(list(json_obj)):
parse_json_array(key_value, str(key) if parent_path == "" else parent_path + "." + str(key))
result_set.add((parent_path + "." + key).encode('ascii', 'ignore'))
return None
file_name = "C:/input/sample.json"
file_data = open(file_name, "r")
json_data = json.load(file_data)
print json_data
parse_json(json_data, "")
print list(result_set)
Output:
{u'states': {u'state': u'Tamil Nadu', u'cities': [{u'city': u'Chennai'}, {u'city': u'Coimbatore'}]}, u'id': u'1', u'salaries': [{u'salary': u'1000'}, {u'salary': u'5000'}], u'name': u'a'}
['states.cities.city', 'states.cities', '.id', 'states.state', 'salaries.salary', '.salaries', '.states', '.name']
Note:
My Python version: 2.7
you can do in this way also.
data = { "id":"1", "name":"a", "salaries":[ { "salary":"1000" }, { "salary":"5000" } ], "states":{ "state":"Karnataka", "cities":[ { "city":"Bangalore" }, { "city":"Mysore" } ], "state":"Tamil Nadu", "cities":[ { "city":"Chennai" }, { "city":"Coimbatore" } ] } }
def dict_ittr(lin,data):
for k, v in data.items():
if type(v)is list:
for l in v:
dict_ittr(lin+"."+k,l)
elif type(v)is dict:
dict_ittr(lin+"."+k,v)
pass
else:
print lin+"."+k
dict_ittr("",data)
output
.states.state
.states.cities.city
.states.cities.city
.id
.salaries.salary
.salaries.salary
.name
If you treat the json like a python dictionary, this should work.
I just wrote a simple recursive program.
Script
import json
def js_r(filename):
with open(filename) as f_in:
return(json.load(f_in))
g = js_r("city.json")
answer_d = {}
def base_line(g, answer_d):
for key in g.keys():
answer_d[key] = {}
return answer_d
answer_d = base_line(g, answer_d)
def recurser_func(g, answer_d):
for k in g.keys():
if type(g[k]) == type([]): #If the value is a list
answer_d[k] = {list(g[k][0].keys())[0]:{}}
if type(g[k]) == type({}): #If the value is a dictionary
answer_d[k] = {list(g[k].keys())[0]: {}} #set key equal to
answer_d[k] = recurser_func(g[k], answer_d[k])
return answer_d
recurser_func(g,answer_d)
def printer_func(answer_d, list_to_print, parent):
for k in answer_d.keys():
if len(answer_d[k].keys()) == 1:
list_to_print.append(parent)
list_to_print[-1] += k
list_to_print[-1] += "." + str(list(answer_d[k].keys())[0])
if len(answer_d[k].keys()) == 0:
list_to_print.append(parent)
list_to_print[-1] += k
if len(answer_d[k].keys()) > 1:
printer_func(answer_d[k], list_to_print, k + ".")
return list_to_print
l = printer_func(answer_d, [], "")
final = " ".join(l)
print(final)
Explanation
base_line makes a dictionary of all your base keys.
recursur_func checks if the key's value is a list or dict then adds to the answer dictionary as is necessary until answer_d looks like: {'id': {}, 'name': {}, 'salaries': {'salary': {}}, 'states': {'state': {}, 'cities': {'city': {}}}}
After these 2 functions are called you have a dictionary of keys in a sense. Then printer_func is a recursive function to print it as you desired.
NOTE:
Your question is similar to this one: Get all keys of a nested dictionary but since you have a nested list/dictionary instead of just a nested dictionary, their answers won't work for you, but there is more discussion on the topic on that question if you like more info
EDIT 1
my python version is 3.7.1
I have added a json file opener to the top. I assume that the json is named city.json and is in the same directory
EDIT 2: More thorough explanation
The main difficulty that I found with dealing with your data is the fact that you can have infinitely nested lists and dictionaries. This makes it complicated. Since it was infinite possible nesting, I new this was a recursion problem.
So, I build a dictionary of dictionaries representing the key structure that you are looking for. Firstly I start with the baseline.
base_line makes {'id': {}, 'name': {}, 'salaries': {}, 'states': {}} This is a dictionary of empty dictionaries. I know that when you print. Every key structure (like states.state) starts with one of these words.
recursion
Then I add all the child keys using recursur_func.
When given a dictionary g this function for loop through all the keys in that dictionary and (assuming answer_d has each key that g has) for each key will add that keys child to answer_d.
If the child is a dictionary. Then I recurse with the given dictionary g now being the sub-part of the dictionary that pertains to the children, and answer_d being the sub_part of answer_d that pertains to the child.
I am trying to parse a big json file (hundreds of gigs) to extract information from its keys. For simplicity, consider the following example:
import random, string
# To create a random key
def random_string(length):
return "".join(random.choice(string.lowercase) for i in range(length))
# Create the dicitonary
dummy = {random_string(10): random.sample(range(1, 1000), 10) for times in range(15)}
# Dump the dictionary into a json file
with open("dummy.json", "w") as fp:
json.dump(dummy, fp)
Then, I use ijson in python 2.7 to parse the file:
file_name = "dummy.json"
with open(file_name, "r") as fp:
for key in dummy.keys():
print "key: ", key
parser = ijson.items(fp, str(key) + ".item")
for number in parser:
print number,
I was expecting to retrieve all the numbers in the lists corresponding to the keys of the dic. However, I got
IncompleteJSONError: Incomplete JSON data
I am aware of this post: Using python ijson to read a large json file with multiple json objects, but in my case I have a single json file, that is well formed, with a relative simple schema. Any ideas on how can I parse it? Thank you.
ijson has an iterator interface to deal with large JSON files allowing to read the file lazily. You can process the file in small chunks and save results somewhere else.
Calling ijson.parse() yields three values prefix, event, value
Some JSON:
{
"europe": [
{"name": "Paris", "type": "city"},
{"name": "Rhein", "type": "river"}
]
}
Code:
import ijson
data = ijson.parse(open(FILE_PATH, 'r'))
for prefix, event, value in data:
if event == 'string':
print(value)
Output:
Paris
city
Rhein
river
Reference: https://pypi.python.org/pypi/ijson
The sample json content file is given below: it has records of two people. It might as well have 2 million records.
[
{
"Name" : "Joy",
"Address" : "123 Main St",
"Schools" : [
"University of Chicago",
"Purdue University"
],
"Hobbies" : [
{
"Instrument" : "Guitar",
"Level" : "Expert"
},
{
"percussion" : "Drum",
"Level" : "Professional"
}
],
"Status" : "Student",
"id" : 111,
"AltID" : "J111"
},
{
"Name" : "Mary",
"Address" : "452 Jubal St",
"Schools" : [
"University of Pensylvania",
"Washington University"
],
"Hobbies" : [
{
"Instrument" : "Violin",
"Level" : "Expert"
},
{
"percussion" : "Piano",
"Level" : "Professional"
}
],
"Status" : "Employed",
"id" : 112,
"AltID" : "M112"
}
}
]
I created a generator which would return each person's record as a json object. The code would look like below. This is not the generator code. Changing couple of lines would make it a generator.
import json
curly_idx = []
jstr = ""
first_curly_found = False
with open("C:\\Users\\Rajeshs\\PycharmProjects\\Project1\\data\\test.json", 'r') as fp:
#Reading file line by line
line = fp.readline()
lnum = 0
while line:
for a in line:
if a == '{':
curly_idx.append(lnum)
first_curly_found = True
elif a == '}':
curly_idx.pop()
# when the right curly for every left curly is found,
# it would mean that one complete data element was read
if len(curly_idx) == 0 and first_curly_found:
jstr = f'{jstr}{line}'
jstr = jstr.rstrip()
jstr = jstr.rstrip(',')
jstr[:-1]
print("------------")
if len(jstr) > 10:
print("making json")
j = json.loads(jstr)
print(jstr)
jstr = ""
line = fp.readline()
lnum += 1
continue
if first_curly_found:
jstr = f'{jstr}{line}'
line = fp.readline()
lnum += 1
if lnum > 100:
break
You are starting more than one parsing iterations with the same file object without resetting it. The first call to ijson will work, but will move the file object to the end of the file; then the second time you pass the same.object to ijson it will complain because there is nothing to read from the file anymore.
Try opening the file each time you call ijson; alternatively you can seek to the beginning of the file after calling ijson so the file object can read your file data again.
if you are working with json with the following format you can use ijson.item()
sample json:
[
{"id":2,"cost":0,"test":0,"testid2":255909890011279,"test_id_3":0,"meeting":"daily","video":"paused"}
{"id":2,"cost":0,"test":0,"testid2":255909890011279,"test_id_3":0,"meeting":"daily","video":"paused"}
]
input = 'file.txt'
res=[]
if Path(input).suffix[1:].lower() == 'gz':
input_file_handle = gzip.open(input, mode='rb')
else:
input_file_handle = open(input, 'rb')
for json_row in ijson.items(input_file_handle,
'item'):
res.append(json_row)