JSONDecodeError: Expecting value: line 2 column 13 (char 15) - python

I have a nested JSON file that I am trying to convert to CSV with Python code.
I have tried every approach I could find, and I also followed previous questions and their solutions, but nothing worked for me.
My JSON format is:
{
    "d1" : ("value1"),
    "d2" : (value2-int),
    "d3" : [
        {
            "sub-d1" : sub-value1(int),
            "sub-d2" : sub-value2(int),
            "sub-d3" : sub-value3(int),
            "sub-d4" : [
                {
                    "sub-sub-d1" : "sub-sub-value3",
                    "sub-sub-d2" : sub-value3(int)
                },
                {
                    "sub-sub-d1" : sub-sub-value3(int),
                    "sub-sub-d2" : "sub-sub-value3"
                }
            ],
            "sub-d5" : "sub-value4",
            "sub-d6" : "sub-value5"
        }
    ],
    "d4" : "value3",
    "d5" : "value4",
    "d6" : "value5",
    "d7" : "value6"
}
{ another entry with the same pattern... and so on }
Some of the values and sub-values are integers, and some mix strings and integers.
What I tried:
import json
import csv
import requests

with open('./data/inverter.json', 'r') as myfile:
    json_data = myfile.read()

def get_leaves(item, key=None):
    if isinstance(item, dict):
        leaves = {}
        for i in item.keys():
            leaves.update(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        leaves = {}
        for i in item:
            leaves.update(get_leaves(i, key))
        return leaves
    else:
        return {key : item}

# First parse all entries to get the complete fieldname list
fieldnames = set()
for entry in json_data:
    fieldnames.update(get_leaves(entry).keys())

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
    csv_output.writeheader()
    csv_output.writerows(get_leaves(entry) for entry in json_data)
This one saves all my data in a single column with split values.
I also tried to use https://github.com/vinay20045/json-to-csv.git, but that didn't work either.
I also tried to parse it and do a simple check with the following code:
with open("./data/inverter.json") as data_file:
data = data_file.read()
#print(data)
data_content = json.loads(data)
print(data_content)
but it throws an error: 'JSONDecodeError: Expecting value: line 2 column 13 (char 15)'
Can anyone help me convert my nested JSON to CSV?
It would be appreciated. Thank you.

It looks like the NumberInt(234234) issue you describe was a bug in MongoDB: how to export mongodb without any wrapping with NumberInt(...)?
If you cannot fix it by upgrading MongoDB, I can recommend preprocessing the data with regular expressions and parsing it as regular JSON after that.
For the sake of example, let's say you've got "test.json" that looks like this, which is valid except for the NumberInt(...) stuff:
{
    "d1" : "value1",
    "d2" : NumberInt(1234),
    "d3" : [
        {
            "sub-d1" : 123,
            "sub-d2" : 123,
            "sub-d3" : 123,
            "sub-d4" : [
                {
                    "sub-sub-d1" : "sub-sub-value3",
                    "sub-sub-d2" : NumberInt(123)
                },
                {
                    "sub-sub-d1" : 43242,
                    "sub-sub-d2" : "sub-sub-value3"
                }
            ]
        }
    ],
    "d4" : "value3",
    "d5" : "value4",
    "d6" : "value5",
    "d7" : "value6"
}
You could import this into Python as follows:
import re
import json

with open("test.json") as f:
    data = f.read()

# This regular expression finds/replaces the NumberInt bits with just the contents
fixed_data = re.sub(r"NumberInt\((\d+)\)", r"\1", data)
loaded_data = json.loads(fixed_data)
print(json.dumps(loaded_data, indent=4))
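From there, if the goal is still the CSV from the original question, here is a minimal sketch of the last step, reusing the get_leaves() flattener from the question on the parsed loaded_data (assuming a single top-level object; with a list of entries you would build one row per entry):
import csv

# Flatten the parsed JSON into one CSV row using the question's get_leaves();
# for a list of entries, build one get_leaves() row per entry instead.
rows = [get_leaves(loaded_data)]
fieldnames = sorted({key for row in rows for key in row})

with open("output.csv", "w", newline="") as f_output:
    writer = csv.DictWriter(f_output, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)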

Related

How to add to python dictionary without needing a key

I am scraping some information off the web and want to write the information into a JSON file with this format:
[
    {
        "name" : "name1",
        "value" : 1
    },
    {
        "name" : "name2",
        "value" : 2
    },
    {
        "name" : "name3",
        "value" : 3
    },
    {
        "name" : "name4",
        "value" : 4
    },
    {
        "name" : "name5",
        "value" : 5
    }
]
I am looping through everything I am scraping, but I don't know how to convert that information into this format. I tried to create a dictionary and add to it on every iteration, but it does not give me the output I want.
dictionary = None
name = None
value = None
for item in someList:
    name = item.name
    value = item.value
    dictionary[""] = {"name": name, "value": value}
with open("data.json", "w") as file:
    json.dump(dictionary, file, indent=4)
Try this:
import json

myList = [{"name": item.name, "value": item.value} for item in someList]
with open("data.json", "w") as file:
    json.dump(myList, file, indent=4)
The answer was simpler than I thought. I just needed to make a list of dictionaries and pass that list to the json.dump() function, like this:
myList = list()
name = None
value = None
for item in someList:
    name = item.name
    value = item.value
    myList.append({"name": name, "value": value})
with open("data.json", "w") as file:
    json.dump(myList, file, indent=4)
The format you show is a list, not a dictionary. So you can make a list and append the individual dictionaries to it.
arr = []
for item in someList:
    arr.append({"name": item.name, "value": item.value})
with open("data.json", "w") as file:
    json.dump(arr, file, indent=4)

(Python Crash Course 16.8): json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I'm trying to convert this json file into a more readable one.
However, I'm getting an error message:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
My code is:
import json

file = 'data/1_month.geojson'
with open(file) as f:
    all_eq_data = json.load(f)

readable_file = 'data/readable_eq_data.json'
with open(readable_file, 'w') as f:
    json.dump(all_eq_data, f, indent=4)
What might be a solution for this?
There is something wrong in the 1_month.geojson file, probably an invalid character at the beginning of the file. Instead of saving the data locally, I suggest you download it directly from the website using requests:
import json
import requests

all_eq_data = requests.get('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_month.geojson').json()
with open('readable_file.json', 'w') as f:
    json.dump(all_eq_data, f, indent=4)
Your code works for me, apart from the encoding.
Please re-check your json file (for example, by re-downloading it).
import json

path = r'your\path\Downloads\1.0_month.geojson'
with open(path, encoding='utf-8') as f:
    all_eq_data = json.load(f)

readable_file = path + '.readable.json'
with open(readable_file, 'w') as f:
    json.dump(all_eq_data, f, indent=4)
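One common cause of "Expecting value: line 1 column 1 (char 0)" is a UTF-8 byte order mark (BOM) at the start of the file. If that is what your file contains, the utf-8-sig codec strips it transparently:
import json

# 'utf-8-sig' skips a leading BOM if one is present
with open('data/1_month.geojson', encoding='utf-8-sig') as f:
    all_eq_data = json.load(f)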
While the OP asks for a Python solution, I also wanted to share a one-liner that does this easily. Depending on your OS, you may already have, or can install, a UNIX command called json_pp. It pretty-prints JSON to stdout, which you can easily redirect to a file. All you need is curl and this UNIX command:
curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_month.geojson | \
json_pp > readable_file.json
You'll get something like:
{
    "metadata" : {
        "status" : 200,
        "api" : "1.10.3",
        "title" : "USGS Magnitude 1.0+ Earthquakes, Past Month",
        "url" : "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_month.geojson",
        "generated" : 1626877469000,
        "count" : 8488
    },
    "type" : "FeatureCollection",
    "features" : [
        {
            "type" : "Feature",
            "geometry" : {
                "type" : "Point",
                "coordinates" : [
                    -150.9063,
                    61.6748,
                    45.8
                ]
            },
            "properties" : {
                "url" : "https://earthquake.usgs.gov/earthquakes/eventpage/ak0219aazwl5",
                "code" : "0219aazwl5",
                "net" : "ak",
                "nst" : null,
                "types" : ",origin,",
                ...
...
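If json_pp is not available on your system, Python's built-in json.tool module does the same job from the command line:
curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_month.geojson | \
python -m json.tool > readable_file.json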

Python json issue - camel case and double quotes

I have to read a csv file and create a JSON list out of it. Currently, I first read each row, add it to a list, and use json.dumps to create the JSON output. There are two issues I am facing:
1. json.dumps adds single quotes to the attributes, which is not what I want; I want each key-value pair enclosed in its own double quotes.
2. It uses the CSV file headers for the keys, which are not in camel case, but I need them in camel case.
This is what my program produces
{
    "Reqlist":[
        {
            'FieldName1' : 'val1'
        },
        {
            'Fieldname2' : 'val2'
        }
    ],
    'metaData' : 'metaVal'
}
This is the output I expect
{
    "Reqlist":[
        {
            "fieldName1" : "val1"
        },
        {
            "fieldName2" : "val2"
        }
    ],
    "metaData" : "metaVal"
}
Sample code:
reader = csv.DictReader(open(file_data), restkey='INVALID', restval='INVALID')
headers = reader.fieldnames
error_count = 0
success_count = 0
dict = []
header_count = set(headers)
json_error_data = json.dumps({})
csv_list = []
error_list = {}
myDict = []
print(headers)
try:
    if (len(header_count) != constants.EXPECTED_HEADER_COUNT or (set(headers) != set(constants.FIELD_NAMES))):
        print('error for record')
    else:
        for row in reader:
            if ('INVALID' in row.values()):
                error_count += 1
            else:
                success_count += 1
                csv_list.append(row)
except Exception as e:
    logging.error('error')
if (error_count > 0 and success_count == 0):
    print('save the errors')
elif (success_count > 0):
    jsonlist = json.dumps({'Reqlist': csv_list})
    new = json.loads(jsonlist)
    a_dict = {'metaData': 'metaVal'}
    new.update(a_dict)
def convert_to_camel(dict1):
    new_dict = {}
    for key, value in dict1.items():
        key = key[0].lower() + key[1:]
        value = value if type(value) is not dict else convert_to_camel(value)
        new_dict[key] = value
    return new_dict

csv_list = [convert_to_camel(i) for i in csv_list]
This should work for the camel case.
As for the single-quote thing, I have written a JSON-like library for Python. It's not perfect, but here it is.
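One thing worth checking on the single-quote issue: json.dumps always emits double quotes (single quotes are not valid JSON), so if you are seeing single quotes you are most likely printing the Python dict itself rather than the dumped string. A quick check:
import json

row = {'FieldName1': 'val1'}
print(row)              # {'FieldName1': 'val1'}   <- Python repr, single quotes
print(json.dumps(row))  # {"FieldName1": "val1"}   <- valid JSON, double quotes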

How to get an array of first elements from a json array

I have a config.json file, which contains an array of organisations:
config.json
{
    "organisations": [
        { "displayName" : "org1", "bucketName" : "org1_bucket" },
        { "displayName" : "org2", "bucketName" : "org2_bucket" },
        { "displayName" : "org3", "bucketName" : "org3_bucket" }
    ]
}
How can I get an array of all organisation names?
This is what I have tried:
from python_json_config import ConfigBuilder

def read_config():
    builder = ConfigBuilder()
    org_array = builder.parse_config('config.json')
    # return all firstNames in org_array
import json

def read_config():
    display_names = []
    with open('yourfilename.json', 'r', encoding="utf-8") as file:
        orgs = json.load(file)
        display_names = [o["displayName"] for o in orgs["organisations"]]
    return display_names
Also, we have no way of knowing what ConfigBuilder or builder.parse_config does, since we don't have access to that code, so apologies for not building on your example.
a = {
    "organisations": [
        { "displayName" : "org1", "bucketName" : "org1_bucket" },
        { "displayName" : "org2", "bucketName" : "org2_bucket" },
        { "displayName" : "org3", "bucketName" : "org3_bucket" }
    ]
}
print([i["displayName"] for i in a["organisations"]])
Output:
['org1', 'org2', 'org3']
Use a list comprehension; it's very easy. To read the json file:
import json

data = json.load(open("config.json"))
Then use lambda with map to get an array of only the organisation names:
>>> list(map(lambda i: i['displayName'], data['organisations']))
['org1', 'org2', 'org3']
If you want to read the json data from a file into a dictionary, you can achieve it as follows:
import json

with open('config.json') as json_file:
    data = json.load(json_file)
org_array = list(map(lambda i: i['displayName'], data['organisations']))

How to read a large JSON file using Python ijson?

I am trying to parse a big json file (hundreds of gigs) to extract information from its keys. For simplicity, consider the following example:
import json
import random, string

# To create a random key
def random_string(length):
    return "".join(random.choice(string.lowercase) for i in range(length))

# Create the dictionary
dummy = {random_string(10): random.sample(range(1, 1000), 10) for times in range(15)}

# Dump the dictionary into a json file
with open("dummy.json", "w") as fp:
    json.dump(dummy, fp)
Then, I use ijson in Python 2.7 to parse the file:
import ijson

file_name = "dummy.json"
with open(file_name, "r") as fp:
    for key in dummy.keys():
        print "key: ", key
        parser = ijson.items(fp, str(key) + ".item")
        for number in parser:
            print number,
I was expecting to retrieve all the numbers in the lists corresponding to the keys of the dict. However, I got
IncompleteJSONError: Incomplete JSON data
I am aware of this post: Using python ijson to read a large json file with multiple json objects, but in my case I have a single json file that is well formed, with a relatively simple schema. Any ideas on how I can parse it? Thank you.
ijson has an iterator interface for dealing with large JSON files, which lets you read the file lazily. You can process the file in small chunks and save the results somewhere else.
Calling ijson.parse() yields tuples of (prefix, event, value).
Some JSON:
{
    "europe": [
        {"name": "Paris", "type": "city"},
        {"name": "Rhein", "type": "river"}
    ]
}
Code:
import ijson

data = ijson.parse(open(FILE_PATH, 'r'))
for prefix, event, value in data:
    if event == 'string':
        print(value)
Output:
Paris
city
Rhein
river
Reference: https://pypi.python.org/pypi/ijson
The sample json file content is given below: it has records for two people. It might as well have 2 million records.
[
    {
        "Name" : "Joy",
        "Address" : "123 Main St",
        "Schools" : [
            "University of Chicago",
            "Purdue University"
        ],
        "Hobbies" : [
            {
                "Instrument" : "Guitar",
                "Level" : "Expert"
            },
            {
                "percussion" : "Drum",
                "Level" : "Professional"
            }
        ],
        "Status" : "Student",
        "id" : 111,
        "AltID" : "J111"
    },
    {
        "Name" : "Mary",
        "Address" : "452 Jubal St",
        "Schools" : [
            "University of Pensylvania",
            "Washington University"
        ],
        "Hobbies" : [
            {
                "Instrument" : "Violin",
                "Level" : "Expert"
            },
            {
                "percussion" : "Piano",
                "Level" : "Professional"
            }
        ],
        "Status" : "Employed",
        "id" : 112,
        "AltID" : "M112"
    }
]
I created an approach that returns each person's record as a json object. The code looks like the below. This is not the generator code itself; changing a couple of lines would make it a generator (see the sketch after the code).
import json

curly_idx = []
jstr = ""
first_curly_found = False
with open("C:\\Users\\Rajeshs\\PycharmProjects\\Project1\\data\\test.json", 'r') as fp:
    # Reading file line by line
    line = fp.readline()
    lnum = 0
    while line:
        for a in line:
            if a == '{':
                curly_idx.append(lnum)
                first_curly_found = True
            elif a == '}':
                curly_idx.pop()
        # when the right curly for every left curly is found,
        # it would mean that one complete data element was read
        if len(curly_idx) == 0 and first_curly_found:
            jstr = f'{jstr}{line}'
            jstr = jstr.rstrip()
            jstr = jstr.rstrip(',')
            print("------------")
            if len(jstr) > 10:
                print("making json")
                j = json.loads(jstr)
                print(jstr)
            jstr = ""
            line = fp.readline()
            lnum += 1
            continue
        if first_curly_found:
            jstr = f'{jstr}{line}'
        line = fp.readline()
        lnum += 1
        if lnum > 100:
            break
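And here is a sketch of the generator variant mentioned above, assuming the same brace-counting idea; the hypothetical records() helper simply replaces the print/json.loads block with a yield:
import json

def records(path):
    """Yield each top-level JSON object found in the file at `path`."""
    depth = 0
    jstr = ""
    first_curly_found = False
    with open(path, 'r') as fp:
        for line in fp:
            for a in line:
                if a == '{':
                    depth += 1
                    first_curly_found = True
                elif a == '}':
                    depth -= 1
            if depth == 0 and first_curly_found:
                # a complete object has been read; strip the trailing comma
                jstr = (jstr + line).rstrip().rstrip(',')
                if len(jstr) > 10:
                    yield json.loads(jstr)
                jstr = ""
            elif first_curly_found:
                jstr = jstr + line

# Usage:
# for person in records("test.json"):
#     print(person["Name"])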
You are starting more than one parsing iteration with the same file object without resetting it. The first call to ijson will work, but it moves the file object to the end of the file; the second time you pass the same object to ijson, it complains because there is nothing left to read.
Try opening the file each time you call ijson; alternatively, you can seek to the beginning of the file after calling ijson so the file object can read your file data again.
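A minimal sketch of the second option, reusing the dummy dict and dummy.json from the question:
import ijson

with open("dummy.json", "r") as fp:
    for key in dummy.keys():
        fp.seek(0)  # rewind so ijson can read the file from the start again
        for number in ijson.items(fp, str(key) + ".item"):
            print(number)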
If you are working with json in the following format, you can use ijson.items().
Sample json:
[
    {"id":2,"cost":0,"test":0,"testid2":255909890011279,"test_id_3":0,"meeting":"daily","video":"paused"},
    {"id":2,"cost":0,"test":0,"testid2":255909890011279,"test_id_3":0,"meeting":"daily","video":"paused"}
]
import gzip
from pathlib import Path

import ijson

input = 'file.txt'
res = []
if Path(input).suffix[1:].lower() == 'gz':
    input_file_handle = gzip.open(input, mode='rb')
else:
    input_file_handle = open(input, 'rb')
for json_row in ijson.items(input_file_handle, 'item'):
    res.append(json_row)
