I am new to json data processing and stuck with this issue. Data in my input file looks like this -
[{"key1":"value1"},{"key2":"value2"}] [{"key3":"value3"},{"key4":"value4"}]
I tried to read using
json.load(file)
or by
with open(file) as f:
    json.loads(f)
I tried pandas.read_json(file, orient="records") as well.
Each of these attempts failed with an "Extra data: line 1 column n (char n)" error.
Can someone advise how best to parse this file? I am not in favor of writing a manual parser, which may fail to scale later.
P.S. There is no , between two arrays
TIA
Your JSON file content is not valid JSON.
1. If there is a , between the arrays:
Code:
import json
with open("my.json") as fp:
    data = json.load(fp)  # or: data = json.loads(fp.read())
print(data)
Your file content can be either of these.
Option1:
Wrap your JSON content in one outermost pair of square brackets.
[[ {"key1":"value1"}, {"key2":"value2"}], [{"key3":"value3"},
{"key4":"value4"}]]
Option2:
Use only one pair of square brackets.
[ {"key1":"value1"}, {"key2":"value2"}, {"key3":"value3"},
{"key4":"value4"}]
2. If there is no , between the arrays:
Code:
This is written for exactly the given format.
import json

def valid_json_creator(given):
    # insert the missing comma between arrays, then wrap everything in one list
    replaced = given.replace("}] [{", "}],[{")
    return "[" + replaced + "]"

def read_json():
    with open("data.txt") as fp:
        data = fp.read()
    valid_json = valid_json_creator(data)
    jobj = json.loads(valid_json)
    print(jobj)

if __name__ == '__main__':
    read_json()
This code works if the input is in the following format.
Note that there is no , between the arrays, but a single space separates them.
[{"key0":"value0"},{"key1":"value41"}]
[{"key1":"value1"},{"key2":"value42"}]
[{"key2":"value2"},{"key3":"value43"}]
[{"key3":"value3"},{"key4":"value44"}]
[{"key4":"value4"},{"key5":"value45"}]
[{"key5":"value5"},{"key6":"value46"}]
[{"key6":"value6"},{"key7":"value47"}]
[{"key7":"value7"},{"key8":"value48"}]
[{"key8":"value8"},{"key9":"value49"}]
[{"key9":"value9"},{"key10":"value410"}]
[{"key10":"value10"},{"key11":"value411"}]
[{"key11":"value11"},{"key12":"value412"}]
[{"key12":"value12"},{"key13":"value413"}]
[{"key13":"value13"},{"key14":"value414"}]
[{"key14":"value14"},{"key15":"value415"}]
[{"key15":"value15"},{"key16":"value416"}]
[{"key16":"value16"},{"key17":"value417"}]
[{"key17":"value17"},{"key18":"value418"}]
[{"key18":"value18"},{"key19":"value419"}]
[{"key19":"value19"},{"key20":"value420"}]
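If the separator between the arrays is not always exactly one space (newlines, several spaces, or nothing at all), the string replacement will miss it. A minimal sketch that does not depend on the separator, using json.JSONDecoder.raw_decode from the standard library. The helper name iter_json_values is made up for this example, and it assumes there are no commas between the top-level values:

```python
import json


def iter_json_values(text):
    """Yield each top-level JSON value found back to back in `text`.

    Hypothetical helper: values may be separated by any amount of
    whitespace (or none); commas between them are not handled.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        # returns the parsed value and the index just past it
        value, pos = decoder.raw_decode(text, pos)
        yield value


sample = '[{"key1":"value1"},{"key2":"value2"}] [{"key3":"value3"},{"key4":"value4"}]'
arrays = list(iter_json_values(sample))
print(arrays)
```

For a file, read its contents with fp.read() and pass the resulting string to the helper.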
What you have tested is reading from a structure that corresponds to the JSON file (which by definition is text, not a Python data structure).
Test:
file = '[{"key1":"value1"},{"key2":"value2"}],[{"key3":"value3"},{"key4":"value4"}]'
This should work better. But wait... you do not seem to provide a list or dict at the top level of your to-be JSON! Hence the error:
ValueError: Extra data: line 1 column 38 - line 1 column 76 (char 37 - 75)
Change it then to (note the additional list opening and closing brackets at the beginning and end):
file = '[[{"key1":"value1"},{"key2":"value2"}],[{"key3":"value3"},{"key4":"value4"}]]'
This will work with:
json.loads(file)
but not with:
with open(file) as f:
    json.loads(f)
as your text variable is not a file (and json.loads expects a string, not a file object)! You would want to store the contents of the variable named file in a file and pass the path to that file:
with open(r'C:\temp\myfile.json') as f:
    data = json.load(f)
for the code to work properly.
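To make the difference concrete, here is a small sketch: json.loads parses a string, while json.load parses an open file object. The temporary file is only there to keep the example self-contained; the variable names are made up.

```python
import json
import os
import tempfile

text = '[[{"key1":"value1"},{"key2":"value2"}],[{"key3":"value3"},{"key4":"value4"}]]'

# json.loads takes the JSON text itself
from_string = json.loads(text)

# json.load takes a file object, so the text has to live in a file first
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    f.write(text)
with open(path) as f:
    from_file = json.load(f)
os.remove(path)

print(from_string == from_file)
```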
I am trying to save high scores for a game and also load it in high score section, but the way I am saving adds more than one record to the JSON file. The problem is while loading, I get the error json.decoder.JSONDecodeError: Extra data only when there is more than one record.
I am pretty sure this is my problem but me being a starter I cannot make sense out of it.
what I am saving
score = {
    "score" : round_counter,
    "name" : player["name"],
    "hp left" : player["hitpoints"]
}
how I am saving it
if os.path.isfile('score.json'):
    print("your score has been added")
    json_dump = json.dumps(score)
    f = open("score.json","a")
    f.write(json_dump)
    f.close()
else :
    print ("database doesn't exist so it was created!")
    json_dump = json.dumps(score)
    f = open("score.json","x")
    f.write(json_dump)
    f.close()
how I am reading it
with open ("score.json") as json_data:
    data = json.load(json_data)
    print(data)
It works for the first run but when there are 2 records in .json file I cannot read it. I don't know if I need more complete reading code or the way I am saving multiple dictionaries in .json is in its root wrong.
In order to store more than one JSON record, use an array: when loading, the file may contain only a single top-level JSON value, and an array is the ideal JSON type for that use case.
In that case, to read all scores:
scores = []
with open("score.json") as json_data:
    scores = json.load(json_data)
But most important, to write them to a file:
scores.append(score)
json_dump = json.dumps(scores)
f = open("score.json","w")
f.write(json_dump)
f.close()
Update
The last code can also be written using json.dump:
scores.append(score)
f = open("score.json","w")
json.dump(scores, f)
f.close()
The way your code is written, it will append score dictionaries instead of adding an object in an array of scores.
If you check the output file score.json it will look like {...}{...}{...} whereas it should be like [{...},{...},{...}]
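Putting the read and write steps together, a minimal sketch of a helper that keeps the file as a single JSON array. The name add_score and the temporary demo path are assumptions for illustration; in the game you would pass "score.json" and the real score dictionary.

```python
import json
import os
import tempfile


def add_score(path, score):
    """Append `score` to the JSON array stored in `path` (hypothetical helper)."""
    scores = []
    if os.path.isfile(path):
        with open(path) as f:
            scores = json.load(f)
    scores.append(score)
    # rewrite the whole file so it always holds exactly one JSON array
    with open(path, "w") as f:
        json.dump(scores, f)
    return scores


# demo in a temporary directory so no real file is touched
demo_path = os.path.join(tempfile.mkdtemp(), "score.json")
add_score(demo_path, {"score": 3, "name": "Ann", "hp left": 10})
all_scores = add_score(demo_path, {"score": 5, "name": "Bob", "hp left": 2})
print(all_scores)
```

Rewriting the file on every save is fine for a small high-score list; it would not be a good fit for very large files.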
You can read the file line by line, provided every record was written on its own line (for example by writing json_dump + "\n"); every line will then contain a valid JSON object:
with open('score.json') as fp:
    for line in fp:
        data = json.loads(line)
        # do something with data
Or if you need everything in one object:
with open('score.json') as fp:
    data = []
    for line in fp:
        data.append(json.loads(line))
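A sketch of a writer that matches this line-by-line reader, assuming you are free to change how the file is written. The function names and the temporary demo path are made up for the example:

```python
import json
import os
import tempfile


def append_record(path, record):
    # with default settings json.dumps emits no raw newlines,
    # so each record occupies exactly one line
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


def read_records(path):
    with open(path) as f:
        # skip blank lines, parse every other line as one JSON value
        return [json.loads(line) for line in f if line.strip()]


# demo in a temporary directory
lines_path = os.path.join(tempfile.mkdtemp(), "score.jsonl")
append_record(lines_path, {"score": 3, "name": "Ann"})
append_record(lines_path, {"score": 5, "name": "line\nbreak"})
records = read_records(lines_path)
print(records)
```

Appending one line per record avoids rewriting the whole file, at the cost of the file no longer being a single valid JSON document.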
I need to load data from a .txt file, but I cannot figure out how to refer to the rows and columns that I want.
I have normally used code such as follows:
a = []
b = []
for line in file:
    if line[0] != 'x':
        False
    else:
        fields = (line.strip()).split('\t')
        a.append(fields[0])
        b.append(fields[1])
My issue is that the lines with the data I want do not all start with the same character like other files I have opened. The first line of data I want begins with a float (0.0) and goes up to 5300.0. This is column a. It is separated by a tab from the second column I need, b.
I'm unable to comment, so I apologize; can you post the contents of the file and explain further what you need?
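In order to load the data from a .txt file, you can use the file handling functions:
f = open('file.txt','r')
data1 = f.read()       # the whole file as one string
f.seek(0)              # rewind, otherwise the next call returns nothing
data2 = f.readlines()  # all lines as a list
f.seek(0)
data3 = f.readline()   # only the first line
f.close()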
Explanation
data1 would have all the data as it is in the txt file, as a str
data2 would have all the lines in a list ['line1','line2','line3'...]
data3 would hold just the first line, as a str. Note that read(2) reads the first 2 characters, not lines; call readline() twice to get the first 2 lines.
If you're looking for a more complex output, please post an expected output with the contents of the file, and I'll assist you with writing the code
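Without seeing the file, here is one possible sketch for the situation described in the question: keep a line only when its first tab-separated field parses as a float, and skip everything else. The function name and the sample lines are assumptions, not the real data.

```python
def read_two_columns(lines):
    """Collect the first two tab-separated fields of every line whose
    first field parses as a float; all other lines are skipped."""
    a, b = [], []
    for line in lines:
        fields = line.strip().split('\t')
        if len(fields) < 2:
            continue  # not enough columns
        try:
            x = float(fields[0])
        except ValueError:
            continue  # header or comment line
        a.append(x)
        b.append(fields[1])
    return a, b


# made-up sample; with a real file, pass the open file object instead
sample = ["# header\n", "time\tvalue\n", "0.0\t1.5\n", "5300.0\t2.5\n"]
a, b = read_two_columns(sample)
print(a, b)
```

The second column is kept as a string here; convert it with float() as well if it is numeric.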
I'm trying to parse JSON-formatted data with the json.load() method, but it gives me an error. I tried different methods such as reading line by line, converting to a dictionary or list, and so on, but none of them work. I also tried the solution mentioned at the following URL, loading-and-parsing-a-json, but it gives me the same error.
import json
data = []
with open('output.txt','r') as f:
    for line in f:
        data.append(json.loads(line))
Error:
ValueError: Extra data: line 1 column 71221 - line 1 column 6783824 (char 71220 - 6783823)
Please find the output.txt in the below URL
Content- output.txt
I wrote up the following which will break up your file into one JSON string per line and then go back through it and do what you originally intended. There's certainly room for optimization here, but at least it works as you expected now.
import json
import re

PATTERN = '{"statuses"'
file_as_str = ''
with open('output.txt', 'r+') as f:
    file_as_str = f.read()
    m = re.finditer(PATTERN, file_as_str)
    f.seek(0)
    for pos in m:
        if pos.start() == 0:
            pass
        else:
            f.seek(pos.start())
            # caution: write() overwrites the characters at this position
            # rather than inserting new ones
            f.write('\n{"')

data = []
with open('output.txt','r') as f:
    for line in f:
        data.append(json.loads(line))
Your alleged JSON file is not a properly formatted JSON file. JSON files must contain exactly one object (a list, a mapping, a number, a string, etc). Your file appears to contain a number of JSON objects in sequence, but not in the correct format for a list.
Your program's JSON parser correctly returns an error condition when presented with this non-JSON data.
Here is a program that will interpret your file:
import json

# Idea and some code stolen from https://gist.github.com/sampsyo/920215
data = []
with open('output.txt') as f:
    s = f.read()

decoder = json.JSONDecoder()
s = s.lstrip()
while s:
    datum, index = decoder.raw_decode(s)
    data.append(datum)
    # raw_decode does not skip leading whitespace, so strip it before the next round
    s = s[index:].lstrip()
print(len(data))
I am a beginner in the programming world and I would like some tips on how to solve a challenge.
Right now I have ~10 000 .dat files each with a single line following this structure:
Attribute1=Value&Attribute2=Value&Attribute3=Value...AttributeN=Value
I have been trying to use python and the CSV library to convert these .dat files into a single .csv file.
So far I was able to write something that would read all files, store the contents of each file in a new line and substitute the "&" to "," but since the Attribute1,Attribute2...AttributeN are exactly the same for every file, I would like to make them into column headers and remove them from every other line.
Any tips on how to go about that?
Thank you!
Since you are a beginner, I prepared some code that works, and is at the same time very easy to understand.
I assume that you have all the files in the folder called 'input'. The code beneath should be in a script file next to the folder.
Keep in mind that this code should be used to understand how a problem like this can be solved. Optimisations and sanity checks have been left out intentionally.
You might want to check additionally what happens when a value is missing in some line, what happens when an attribute is missing, what happens with a corrupted input etc.. :)
Good luck!
import os

# this function splits the attribute=value pairs into two lists
# the first list holds all the attributes
# the second list holds all the values
def getAttributesAndValues(line):
    attributes = []
    values = []
    # first we split the input over the & (strip removes the trailing newline)
    attributeValues = line.strip().split('&')
    for attrVal in attributeValues:
        # we split the attribute=value over the '=' sign
        # the left part goes to split[0], the value goes to split[1]
        split = attrVal.split('=')
        attributes.append(split[0])
        values.append(split[1])
    # return the attributes list and values list
    return attributes, values

# test the function using the lines beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttributeN=Value"
# print(getAttributesAndValues(line))

# this function appends the contents of a single file to an output file
def writeToCsv(inFile='', wfile="outFile.csv", delim=","):
    # we write the header only if the output file is missing or empty
    needHeader = not os.path.isfile(wfile) or os.path.getsize(wfile) == 0
    with open(inFile, 'r') as f_in, open(wfile, 'a') as f_out:
        # loop through every line in the file and write its values
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            header, values = getAttributesAndValues(line)
            if needHeader:
                f_out.write(delim.join(header) + "\n")
                needHeader = False
            # we write the values
            f_out.write(delim.join(values) + "\n")

# Read all the .dat files in the path (os.listdir does not return '.' or '..')
allInputFiles = [name for name in os.listdir('input/') if name.endswith('.dat')]
# loop through all the files and write values to the csv file
for singleFile in allInputFiles:
    writeToCsv('input/' + singleFile)
but since the Attribute1,Attribute2...AttributeN are exactly the same
for every file, I would like to make them into column headers and
remove them from every other line.
input = 'Attribute1=Value1&Attribute2=Value2&Attribute3=Value3'
Once, for the first file:
','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))
for each file's content:
','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))
Maybe you need to trim the strings additionally; don't know how clean your input is.
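The two one-liners can be combined with the csv module, which takes care of writing the header once. This is only a sketch under assumptions: every line has the form name=value&name=value, values contain no & or =, and the function names are made up. An in-memory buffer stands in for the output file.

```python
import csv
import io


def parse_record(line):
    """Turn 'A=1&B=2' into {'A': '1', 'B': '2'} (values kept as strings)."""
    pairs = (item.split('=', 1) for item in line.strip().split('&') if item)
    return {key: value for key, value in pairs}


def write_csv(lines, out):
    """Write one CSV row per input line; the header comes from the first line."""
    records = [parse_record(line) for line in lines if line.strip()]
    if not records:
        return
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


# made-up contents of two .dat files
lines = ['Attribute1=a&Attribute2=b\n', 'Attribute1=c&Attribute2=d\n']
buf = io.StringIO()
write_csv(lines, buf)
print(buf.getvalue())
```

With real files, collect the single line of each .dat file into the list and pass a file opened with open('output.csv', 'w', newline='') instead of the buffer. By default DictWriter raises ValueError if a later record has an attribute that the first one lacks, which is a useful check when the files are supposed to share the same attributes.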
Put the dat files in a folder called myDats. Put this script next to the myDats folder along with a file called temp.txt. You will also need your output.csv. [That is, you will have output.csv, myDats, and mergeDats.py in the same folder]
mergeDats.py
import csv
import os

# collect every attribute=value pair, one per line, in temp.txt
g = open("temp.txt", "w")
for file in os.listdir('myDats'):
    f = open("myDats/" + file, "r")
    tempData = f.readlines()[0].strip()
    tempData = tempData.replace("&", "\n")
    g.write(tempData + "\n")
    f.close()
g.close()

h = open("temp.txt", "r")
arr = h.read().split("\n")
h.close()
my_dict = {}
for x in arr:
    if "=" not in x:
        continue  # skip empty lines
    temp2 = x.split("=")
    # note: a repeated attribute overwrites the earlier value
    my_dict[temp2[0]] = temp2[1]

# use 'wb' in python 2.x
with open('output.csv', 'w', newline='') as output:
    w = csv.DictWriter(output, my_dict.keys())
    w.writeheader()
    w.writerow(my_dict)
I have a txt file which contains some 'Excel formulas'. I have converted it to a csv file using the Python csv reader/writer. Now I want to read the values from the csv file and do some calculations, but when I access a particular column of the .csv file, it still returns the 'Excel formula' instead of the actual value, although when I open the csv file in Excel the formulas are shown as values.
Any ideas?
Here is the code
Code to convert txt to csv
def parseFile(filepath):
    file = open(filepath,'r')
    content = file.read()
    file.close()
    lines = content.split('\n')
    csv_filepath = filepath[:(len(filepath)-4)]+'_Results.csv'
    csv_out = csv.writer(open(csv_filepath, 'a'), delimiter=',' , lineterminator='\n')
    for line in lines:
        data = line.split('\t')
        csv_out.writerow(data)
    return csv_filepath
Code to do some calculation in csv file
def csv_cal (csv_filepath):
    r = csv.reader(open(csv_filepath))
    lines = [l for l in r]
    counter =[0]*(len(lines[4])+6)
    if lines[4][4] == 'Last Test Pass?' :
        print ' i am here'
    for i in range(0,3):
        print lines[6] [4] ### RETURNS FORMULA ??
    return 0
I am new to python, any help would be appreciated!
Thanks,
You can use Paste Special in Excel with the Values only option selected. You could select all, paste into another sheet and save. This would save you from having to implement some kind of parser in Python. Or, you could evaluate some simple arithmetic with eval.
edit:
I've heard of xlrd which can be downloaded from pypi. It loads .xls files.
It sounded like you just wanted the final data, which Paste Special can give you.
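As for why the formula comes back in the first place: a CSV file only stores text. Excel evaluates cells that start with = when it displays the file, but csv.reader hands back exactly what is stored. A small sketch showing this, with made-up cell contents and an in-memory buffer in place of a real file:

```python
import csv
import io

# write a row containing a formula, the way the conversion script would
buf = io.StringIO()
csv.writer(buf, lineterminator='\n').writerow(['1', '2', '=A1+B1'])

# read it back: the formula is returned as plain text, not as a number
buf.seek(0)
row = next(csv.reader(buf))
print(row[2])
```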