I have a JSON file named debug.json that I created with Python 3.3 and that looks like this:
{"TIME": 55.55, "ID":155,"DATA": [17,22,33,44,55]}{"TIME": 56.55, "ID":195,"DATA": [17,22,ff,44,55]}
I'm trying to load it with the following code:
import json
with open("debug.json",'r',encoding='utf-8') as f:
testing = json.loads(f.read())
However when I try this I get the following error:
ValueError: Extra data line 1 column 92
This is where the second JSON object starts in the text file... I'm guessing I am missing something pretty trivial here, but I haven't found any examples that relate to my problem. Any help is appreciated!
Use json.JSONDecoder.raw_decode, which accepts JSON with extra data at the end, such as another JSON object, and returns a tuple with the first object decoded and the position of the next object.
Example with your JSON:
import json
js = """{"TIME": 55.55, "ID":155,"DATA": [17,22,33,44,55]}{"TIME": 56.55, "ID":195,"DATA": [17,22,ff,44,55]}"""
json.JSONDecoder().raw_decode(js) # ({'TIME': 55.55, 'DATA': [17, 22, 33, 44, 55], 'ID': 155}, 50)
js[50:] # '{"TIME": 56.55, "ID":195,"DATA": [17,22,ff,44,55]}'
As you can see, it successfully decoded the first object and told us where the next object starts (in this case at index 50).
Here is a function I made that decodes multiple concatenated JSON objects and returns a list with all of them:
def multipleJSONDecode(js):
    result = []
    decoder = json.JSONDecoder()
    while js:
        obj, pos = decoder.raw_decode(js)
        result.append(obj)
        js = js[pos:]
    return result
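A quick usage check, as a sketch with a small, fully valid input (note that the second object in the question contains a bare ff, which is not valid JSON, so any decoder will fail on it):
print(multipleJSONDecode('{"a": 1}{"b": 2}{"c": 3}'))
# [{'a': 1}, {'b': 2}, {'c': 3}]
# If the objects are separated by whitespace or newlines, strip it before the
# next raw_decode call, e.g. js = js[pos:].lstrip() inside the loop.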
When you create the file, make sure that you have at most one valid JSON string per line. Then, when you need to read them back out, you can loop over the lines in the file one at a time:
import json
testing = []
with open("debug.json",'r',encoding='utf-8') as f:
for line in f:
testing.append(json.loads(line))
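For completeness, a minimal sketch of the writing side under that convention (the records list is just example data; the ff from the question is replaced with a plain number because it is not valid JSON):
import json
records = [
    {"TIME": 55.55, "ID": 155, "DATA": [17, 22, 33, 44, 55]},
    {"TIME": 56.55, "ID": 195, "DATA": [17, 22, 255, 44, 55]},
]
with open("debug.json", 'w', encoding='utf-8') as f:
    for record in records:
        f.write(json.dumps(record) + "\n")  # one JSON document per line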
So I have a string that contains the data below:
https://myanimelist.net/animelist/domis1/load.json?status=2&offset=0.
I want to find all 'anime_id' values and put them into a list (only the numbers).
I tried find('anime_id'), but that doesn't work for multiple occurrences in the string.
Here is an example of how to extract anime_id from a JSON file called test.json, using the built-in json module:
import json
with open('test.json') as f:
    data = json.load(f)

# Create generator and search for anime_id
gen = (i['anime_id'] for i in data)
# If needed, iterate over generator and create a list
gen_list = list(gen)
# Print list on console
print(gen_list)
Your string is in JSON format, so you can parse it with the built-in json module.
import json
data = json.loads(your_string)
for d in data:
    print(d["anime_id"])
I want to read a dictionary from a text file. The dictionary looks like {'key': [1, ord('#')]}. I read about eval() and literal_eval(), but neither of those will work because of ord().
I also tried json.loads and json.dumps, but without success.
Which other way could I use to do it?
So, assuming you read the text file in with open() as a string rather than with json.loads, you could do a simple regex search for what is between the parentheses of ord, e.g. ord('#') -> #.
This is a minimal solution that reads everything from the file as a single string, then finds all instances of ord and places the integer representation in an output list called ord_. For testing this example, myfile.txt was a text file with the following in it:
{"key": [1, "ord('#')"],
"key2": [1, "ord('K')"]}
import json
import re
with open(r"myfile.txt") as f:
json_ = "".join([line.rstrip("\n") for line in f])
rgx = re.compile(r"ord\(([^\)]+)\)")
rgd = rgx.findall(json_)
ord_ = [ord(str_.replace(r"'", "")) for str_ in rgd]
json.dump() and json.load() will not work because ord() is not JSON serializable (meaning that a function call cannot be stored as a JSON value).
Yes, eval is really bad practice; I would never recommend it to anyone for any use.
The best way I can think of to solve this is to use conditions and an extra list.
import json

# data.json contains: {"key": [1, ["ord", "#"]]}  (the first element is the function name, the second is its argument)
with open("data.json") as f:
    data = json.load(f)

# data['key'][1][0] is "ord"
if data['key'][1][0] == "ord":
    res = ord(data['key'][1][1])
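If more functions than ord() can appear in the file, one way to generalize the condition above is a small dispatch table. This is only a sketch of the idea; the set of allowed functions is chosen arbitrarily here:
import json

# Map the function names allowed in the file to real callables;
# anything not listed here is rejected instead of being executed.
ALLOWED = {"ord": ord, "len": len}

with open("data.json") as f:
    data = json.load(f)

name, arg = data["key"][1]  # e.g. ["ord", "#"]
if name in ALLOWED:
    res = ALLOWED[name](arg)
else:
    raise ValueError("unsupported function: " + name)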
I am trying to extract some data from JSON files, which all have the same structure, and then write the chosen data into a new JSON file. My goal is to create a new JSON file which is more or less a list of each JSON file in my folder with the data:
Filename, triggerdata, velocity {imgVel, trigVel}, coordinates.
In a further step of my programme, I will need this new splitTest1 for analysing the data of the different files.
I have the following code:
base_dir = 'mypath'

def createJsonFile():
    splitTest1 = {}
    splitTest1['20mm PSL'] = []
    for file in os.listdir(base_dir):
        # If file is a json, construct its full path and open it, append all json data to list
        if 'json' in file:
            json_path = os.path.join(base_dir, file)
            json_data = pd.read_json(json_path, lines=True)
            if splitTest1[file]['20mm PSL'] == to_find:
                splitTest1['20mm PSL'].append({
                    'filename': os.path.basename(base_dir),
                    'triggerdata': ['rawData']['adcDump']['0B'],
                    'velocity': {
                        'imgVel': ['computedData']['particleProperties']['imgVelocity'],
                        'trigVel': ['computedData']['img0Properties']['coordinates']},
                    'coordinates': ['computedData']['img1Properties']['coordinates']})
    print(len(splitTest1))
When I run the code, I get this error:
'triggerdata': ['rawData']['adcDump']['0B'],
TypeError: list indices must be integers or slices, not str
What is wrong with the code? How do I fix this?
This is my previous code, showing how I accessed that data without saving it in another JSON file:
with open('myJsonFile.json') as f0:
    d0 = json.load(f0)

y00B = d0['rawData']['adcDump']['0B']
x = np.arange(0, (2048 * 0.004), 0.004)  # in ms, 2048 samples, 4 us

def getData():
    return y00B, x

def getVel():
    imgV = d0['computedData']['particleProperties']['imgVelocity']
    trigV = d0['computedData']['trigger']['trigVelocity']
    return imgV, trigV
Basically, I am trying to put this last code snippet into a loop that reads all the JSON files in my folder and makes a new JSON file with a list of the names of these files and some other chosen data (like ['rawData']['adcDump']['0B'], etc.).
I hope this helps to clarify my problem.
I assume what you want to do is take some data from several JSON files, compile it into a list, and write that into a new JSON file.
In order to get the data from your current JSON file, you'll need to add a "reference" to it in front of the indices (otherwise the code has no idea where it should take that data from). Like so:
base_dir = 'mypath'

def createJsonFile():
    splitTest1 = {}
    splitTest1['20mm PSL'] = []
    for file in os.listdir(base_dir):
        # If file is a json, construct its full path and open it, append all json data to list
        if 'json' in file:
            json_path = os.path.join(base_dir, file)
            json_data = pd.read_json(json_path, lines=True)
            if splitTest1[file]['20mm PSL'] == to_find:
                splitTest1['20mm PSL'].append({
                    'filename': os.path.basename(base_dir),
                    'triggerdata': json_data['rawData']['adcDump']['0B'],
                    'velocity': {
                        'imgVel': json_data['computedData']['particleProperties']['imgVelocity'],
                        'trigVel': json_data['computedData']['img0Properties']['coordinates']},
                    'coordinates': json_data['computedData']['img1Properties']['coordinates']})
    print(len(splitTest1))
So basically what you need to do is to add "json_data" in front of the indices.
Also, I suggest writing the variable "json_path" rather than "base_dir" into the 'filename' field.
I found the solution with the help of the post from Mattu475.
I had to add the reference in front of the indices and also change how the files found in my folder are opened, using the following code:
with open(json_path) as f0:
    json_data = json.load(f0)
instead of pd.read_json(...)
Here is the full code:
def createJsonFile():
    splitTest1 = {}
    splitTest1['20mm PSL'] = []
    for file in os.listdir(base_dir):
        # If file is a json, construct its full path and open it, append all json data to list
        if 'json' in file:
            print("filename: ", file)  # file is only the file name, the path is not included
            json_path = os.path.join(base_dir, file)
            print("path : ", json_path)
            with open(json_path) as f0:
                json_data = json.load(f0)
            splitTest1['20mm PSL'].append({
                'filename': os.path.basename(json_path),
                'triggerdata': json_data['rawData']['adcDump']['0B'],
                # 'imgVel': json_data['computedData']['particleProperties']['imgVelocity'],
                'trigVel': json_data['computedData']['trigger']['trigVelocity'],
                # 'coordinatesImg0': json_data['computedData']['img0Properties']['coordinates'],
                # 'coordinatesImg1': json_data['computedData']['img1Properties']['coordinates']
            })
    return splitTest1
A few lines (the ones commented out) do not work 100% yet, but the rest works.
Thank you for your help!
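Since the original goal was to write the chosen data into a new JSON file, the last step can look roughly like this sketch (the output filename is only an example):
import json

splitTest1 = createJsonFile()
with open("splitTest1.json", 'w', encoding='utf-8') as f_out:
    json.dump(splitTest1, f_out, indent=4)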
The issue is with this line
'imgVel': ['computedData']['particleProperties']['imgVelocity'],
and the two lines that come after it. What's happening there is that you're creating a list with the string 'computedData' as the only element and then trying to index it with 'particleProperties', which doesn't make sense: you can only index a list with integers. I can't really give you a "solution", but if you want imgVel to just be a list of those strings, then you would do
'imgVel': ['computedData', 'particleProperties', 'imgVelocity']
Your dict value isn't legal Python.
'triggerdata': ['rawData']['adcDump']['0B']
The value doesn't make any sense: you make a list of a single string, then you try to index it with another string. You asked for element "adcDump" of the list ['rawData'], and there isn't any such syntax.
You cannot store arbitrary source code (your partial expression) as if it were a data value.
If you want help to construct a particular reference, then please post a focused question. Please review how to ask from the intro tour.
The incoming data resembles the following:
[{
"foo": "bar"
}]
[{
"bar": "baz"
}]
[{
"baz": "foo"
}]
As you see, these are arrays of objects strung together: JSON-ish.
ijson is able to handle the first array, and then I get:
ijson.common.JSONError: Additional data
when it hits the subsequent arrays. How do I get around this?
Here's a first cut at the problem that at least has a working regex substitution to turn the full string into valid JSON. It only works if you're OK with reading the full input stream before parsing it as JSON.
import json
import re

# inputStream is whatever file-like object the data arrives on
input = ''
for line in inputStream:
    input = input + line
# input == '[{"foo": "bar"}][{"bar": "baz"}][{"baz": "foo"}]'
# wrap in [] and put commas between each ][
sanitizedInput = re.sub(r"\]\[", "],[", "[%s]" % input)
# sanitizedInput == '[[{"foo": "bar"}],[{"bar": "baz"}],[{"baz": "foo"}]]'
# then parse sanitizedInput
parsed = json.loads(sanitizedInput)
print parsed  # => [[{u'foo': u'bar'}], [{u'bar': u'baz'}], [{u'baz': u'foo'}]]
Note: since you've read the whole thing as a string, you can use json instead of ijson.
You can use json.JSONDecoder.raw_decode to walk through the string. Its documentation indeed says:
This can be used to decode a JSON document from a string that may have extraneous data at the end.
The following code sample assumes all the JSON values are in one big string:
import json

def json_elements(string):
    decoder = json.JSONDecoder()
    while True:
        try:
            (element, position) = decoder.raw_decode(string)
            yield element
            string = string[position:].lstrip()  # skip any whitespace between values
        except ValueError:
            break
To avoid dealing with raw_decode yourself and to be able to parse a stream chunk by chunk, I would recommend a library I made for this exact purpose: streamcat.
import json
import streamcat

def json_elements(stream):
    decoder = json.JSONDecoder()
    yield from streamcat.stream_to_iterator(stream, decoder)
This works for any concatenation of JSON values regardless of how many white-space characters are used within them or between them.
If you have control over how your input stream is encoded, you may want to consider using line-delimited JSON, which makes parsing easier.
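A minimal sketch of that line-delimited approach, assuming you control the producer (the filename is arbitrary): each value goes on its own line, and the reader parses one line at a time.
import json

values = [[{"foo": "bar"}], [{"bar": "baz"}], [{"baz": "foo"}]]

# Producer: one JSON document per line
with open("values.jsonl", "w") as f:
    for value in values:
        f.write(json.dumps(value) + "\n")

# Consumer: parse line by line, no raw_decode needed
with open("values.jsonl") as f:
    parsed = [json.loads(line) for line in f]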
I am trying to write code in Python and deploy it on Google App Engine. I am new to both of these things. I have JSON which contains the following:
[
{
"sentiment":-0.113568,
"id":455908588913827840,
"user":"ANI",
"text":"Posters put up against Arvind Kejriwal in Varanasi http://t.co/ZDrzjm84je",
"created_at":1.397532052E9,
"location":"India",
"time_zone":"New Delhi"
},
{
"sentiment":-0.467335,
"id":456034840106643456,
"user":"Kumar Amit",
"text":"Arvind Kejriwal's interactive session with Varansi Supporter and Opponent will start in short while ..Join at http://t.co/f6xI0l2dWc",
"created_at":1.397562153E9,
"location":"New Delhi, Patna.",
"time_zone":"New Delhi"
},
I am trying to load this data in Python. I have the following code for it:
data = simplejson.load(open('data/convertcsv.json'))
# print data
for row in data:
    print data['sentiment']
I am getting the following error - TypeError: list indices must be integers, not str
If I uncomment the print data line and remove the last two lines, I can see all the data in the console. I want to be able to do some computations on the sentiment and also search for some words in the text, but for that I need to know how to get it line by line.
If you'd like to clean it up a bit:
import json
with open('data/convertcsv.json') as f:
    data = json.loads(f.read())
for row in data:
    print row['sentiment']
The 'with' only keeps the file open while it is used, then closes it automatically once the indented block under it has executed.
Try this:
import json
f = open('data/convertcsv.json')
data = json.loads(f.read())
f.close()
for row in data:
    print row['sentiment']
The issue is that you use data['sentiment'] instead of row['sentiment']; otherwise your code is fine:
with open('data/convertcsv.json', 'rb') as file:
    data = simplejson.load(file)

# print data
for row in data:
    print row['sentiment']  # <-- data is a list, use `row` here
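Once each row is accessible, the computations mentioned in the question (searching the text and working with the sentiment) can look roughly like this sketch; the search word is only an example taken from the sample data:
import json

with open('data/convertcsv.json') as f:
    data = json.load(f)

# Average sentiment of the tweets whose text mentions the search word
matching = [row for row in data if "Kejriwal" in row["text"]]
if matching:
    average = sum(row["sentiment"] for row in matching) / len(matching)
    print("%d matching tweets, average sentiment %f" % (len(matching), average))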