How to preserve trailing zeros with Python CSV Writer - python
I am trying to convert a JSON file with one JSON object per line to CSV. The JSON data has some elements with trailing zeros that I need to maintain (e.g. 1.000000). When writing to CSV the value is changed to 1.0, removing all trailing zeros except the first zero following the decimal point. How can I keep all trailing zeros? The number of trailing zeros may not always be static.
Here is a sample of the json input:
{"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":2.0000000000,"RETIRED":0.0000000000,"INVOICEDAYOFWEEK":5.0000000000,"ID":1234567.0000000000,"BEANVERSION":69.0000000000,"ACCOUNTTYPE":1.0000000000,"ORGANIZATIONTYPEDENORM":null,"HIDDENTACCOUNTCONTAINERID":4321987.0000000000,"NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00","PAYMENTMETHOD":12345.0000000000,"INVOICEDELIVERYTYPE":98765.0000000000,"DISTRIBUTIONLIMITTYPE":3.0000000000,"CLOSEDATE":null,"FIRSTTWICEPERMTHINVOICEDOM":1.0000000000,"HELDFORINVOICESENDING":"0","FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00","CHARGEHELD":"0","PUBLICID":"xx:1234346"}
Here is a sample of the output:
ACCOUNTNAMEDENORM,DELINQUENCYSTATUS,RETIRED,INVOICEDAYOFWEEK,ID,BEANVERSION,ACCOUNTTYPE,ORGANIZATIONTYPEDENORM,HIDDENTACCOUNTCONTAINERID,NEWPOLICYPAYMENTDISTRIBUTABLE,ACCOUNTNUMBER,PAYMENTMETHOD,INVOICEDELIVERYTYPE,DISTRIBUTIONLIMITTYPE,CLOSEDATE,FIRSTTWICEPERMTHINVOICEDOM,HELDFORINVOICESENDING,FEINDENORM,COLLECTING,ACCOUNTNUMBERDENORM,CHARGEHELD,PUBLICID
John Smith,2.0,0.0,5.0,1234567.0,69.0,1.0,,4321987.0,1,000-000-000-00,10012.0,10002.0,3.0,,1.0,0,,0,000-000-000-00,0,bc:1234346
Here is the code:
import json
import csv

f = open('test2.json')  # open input file
outputFile = open('output.csv', 'w', newline='')  # open output csv file
output = csv.writer(outputFile)  # create a csv.writer
i = 1
for line in f:
    try:
        data = json.loads(line)  # reads current line into a dict
    except ValueError:
        print("Can't load line {}".format(i))
    if i == 1:
        header = data.keys()
        output.writerow(header)  # writes header row
    i += 1
    output.writerow(data.values())  # writes values row
f.close()  # close input file
outputFile.close()
The desired output would look like:
ACCOUNTNAMEDENORM,DELINQUENCYSTATUS,RETIRED,INVOICEDAYOFWEEK,ID,BEANVERSION,ACCOUNTTYPE,ORGANIZATIONTYPEDENORM,HIDDENTACCOUNTCONTAINERID,NEWPOLICYPAYMENTDISTRIBUTABLE,ACCOUNTNUMBER,PAYMENTMETHOD,INVOICEDELIVERYTYPE,DISTRIBUTIONLIMITTYPE,CLOSEDATE,FIRSTTWICEPERMTHINVOICEDOM,HELDFORINVOICESENDING,FEINDENORM,COLLECTING,ACCOUNTNUMBERDENORM,CHARGEHELD,PUBLICID
John Smith,2.0000000000,0.0000000000,5.0000000000,1234567.0000000000,69.0000000000,1.0000000000,,4321987.0000000000,1,000-000-000-00,10012.0000000000,10002.0000000000,3.0000000000,,1.0000000000,0,,0,000-000-000-00,0,bc:1234346
I've been trying and I think this may solve your problem:
Pass the str function to the parse_float argument in json.loads :)
data = json.loads(line, parse_float=str)
This way, when json.loads() tries to parse a float it will use str instead, so the value is kept as a string and the zeros are maintained. Tried doing that and it worked:
i = 1
for line in f:
    try:
        data = json.loads(line, parse_float=str)  # reads current line into a dict
    except ValueError:
        print("Can't load line {}".format(i))
    if i == 1:
        header = data.keys()
        print(header)  # prints header row
    i += 1
    print(data.values())  # prints values row
More information here: the json module documentation.
PS: You could use a boolean instead of i += 1 to get the same behaviour.
The json module's decoder parses real numbers with float by default, so trailing zeros are not preserved, since a Python float does not store them. You can use the parse_float parameter of json.loads to override the constructor used for real numbers with str instead:
data = json.loads(line, parse_float=str)
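As a quick sanity check (using a made-up one-line sample in the same shape as the question's data), the difference is visible immediately:

```python
import json

# hypothetical one-line sample in the same shape as the question's data
line = '{"ID": 1234567.0000000000, "NAME": "John Smith"}'

# default parsing goes through float, collapsing the trailing zeros
print(json.loads(line)["ID"])                   # 1234567.0

# parse_float=str keeps the literal text of every real number
print(json.loads(line, parse_float=str)["ID"])  # 1234567.0000000000
```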
You can also use str.format, but note that this requires a fixed decimal precision:

>>> '{:.10f}'.format(10.0)
'10.0000000000'
Related
ValueError: could not convert string to float: '.' Python
So I am trying to make sure that all the values that I have in the csv file are converted into float. The values in each cell inside the csv file are just numbers, for example "0.089" or "23". For some reason when I try to run the code it gives the following error: "ValueError: could not convert string to float: '.'". I cannot really understand why the program is not reading the numbers from the csv file properly.

def loadCsv(filename):
    with open('BreastCancerTumor.csv','r') as f:
        lines = f.readlines()[1:]
        dataset = list(lines)
        for i in range(len(dataset)):
            dataset[i] = [float(x) for x in dataset[i]]
        return dataset
You never split the line into comma-separated fields. So you're looping over the characters in the line, not the fields, and trying to parse each character as a float. You get an error when you reach the . character. Use the csv library to read the file; it will split each line into a list of fields.

import csv

def loadCsv(filename):
    with open('BreastCancerTumor.csv','r') as f:
        f.readline()  # skip header
        csvf = csv.reader(f)
        dataset = [[float(x) for x in row] for row in csvf]
    return dataset
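To make the failure mode concrete, here is a minimal sketch with an invented two-field row (not the asker's real data), showing characters versus fields:

```python
import csv
import io

row_text = "0.089,23\n"

# Iterating over the raw line walks it character by character,
# so float() eventually chokes on the '.' character.
chars = [c for c in row_text.strip()]  # ['0', '.', '0', '8', '9', ',', '2', '3']

# csv.reader splits the line into whole fields instead.
fields = next(csv.reader(io.StringIO(row_text)))  # ['0.089', '23']
values = [float(x) for x in fields]               # [0.089, 23.0]
```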
Extract data from text file to "String only" csv with python
I am trying to extract data from a text file into a csv sheet. I am using this:

fileOutput = open(outputFolder + '/' + outputfile, mode='w+', newline='', encoding='utf-8')
file_writer = csv.writer(fileOutput, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

However, when I have a value like "0009", it's written as "9" in the csv. My question is: is there a way I can force all values to be written as strings so the data is kept as it is? Thank you
The CSV writer includes the leading 0s if you pass the value as a string, but not if you've already converted the value to an integer.

writer.writerow([int('0009')])  # writes "9" to the file
writer.writerow(['0009'])       # writes "0009" to the file

When you pass it an integer, it has no way of knowing how many leading zeros there were in the original text - that information has already been discarded. You need to look at the code that extracts your data from the original text file, and keep that code from converting to integer.
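A small self-contained check (writing to an in-memory buffer rather than a real file) shows both behaviours side by side:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

writer.writerow([int('0009')])  # the int() conversion already dropped the zeros
writer.writerow(['0009'])       # the string is written verbatim

print(buf.getvalue())  # '9\r\n0009\r\n'
```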
python error when writing data in csv file
I wrote a Python program to write data to a .csv file, but every item in the .csv has a "b'" prefix before the content, and there are blank lines that I do not know how to remove. Some items in the .csv file are unrecognizable characters, such as "b'\xe7\xbe\x85\xe5\xb0\x91\xe5\x90\x9b'"; some of the data is in Chinese and Japanese, so I think something goes wrong when writing that data to the .csv file. Please help me solve the problem. The program is:

#write data in .csv file
def data_save_csv(type,data,id_name,header,since = None):
    #get the date when storing data
    date_storage()
    #create the data storage directory
    csv_parent_directory = os.path.join("dataset","csv",type,glovar.date)
    directory_create(csv_parent_directory)
    #write data in .csv
    if type == "group_members":
        csv_file_prefix = "gm"
    if since:
        csv_file_name = csv_file_prefix + "_" + since.strftime("%Y%m%d-%H%M%S") + "_" + time_storage() + id_name + ".csv"
    else:
        csv_file_name = csv_file_prefix + "_" + time_storage() + "_" + id_name + ".csv"
    csv_file_directory = os.path.join(csv_parent_directory,csv_file_name)
    with open(csv_file_directory,'w') as csvfile:
        writer = csv.writer(csvfile,delimiter=',',quotechar='"',quoting=csv.QUOTE_MINIMAL)
        #csv header
        writer.writerow(header)
        row = []
        for i in range(len(data)):
            for k in data[i].keys():
                row.append(str(data[i][k]).encode("utf-8"))
            writer.writerow(row)
            row = []

[screenshot of the resulting .csv file]
You have a couple of problems. The funky "b" thing happens because csv will cast data to a string before adding it to a column. When you did str(data[i][k]).encode("utf-8"), you got a bytes object, and its string representation is b"..." filled with utf-8 encoded data. You should handle encoding when you open the file. In Python 3, open uses the encoding from locale.getpreferredencoding(False) by default, but it's a good idea to be explicit about what you want to write.

Next, there's nothing that says that two dicts will enumerate their keys in the same order. The csv.DictWriter class is built to pull data from dictionaries, so use it instead. In my example I assumed that header has the names of the keys you want. It could be that header is different, and in that case you'll also need to pass in the actual dict key names you want. Finally, you can just strip out empty dicts while you are writing the rows.

with open(csv_file_directory, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header, delimiter=',',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(d for d in data if d)
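A minimal sketch of the DictWriter approach, using invented sample rows (including an empty dict) and an in-memory buffer instead of the question's file path:

```python
import csv
import io

header = ["name", "city"]
data = [{"name": "羅少君", "city": "東京"}, {}, {"name": "Ana", "city": "Lisboa"}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=header, delimiter=',',
                        quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writeheader()
writer.writerows(d for d in data if d)  # empty dicts are skipped

# No b'...' prefixes: the values stay str all the way through.
print(buf.getvalue())
```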
It sounds like at least some of your issues have to do with incorrect unicode handling. Try working the snippet below into your existing code. As the comments say, the first part reads your input as utf-8; the second bit converts each line to ASCII.

import codecs
import unicodedata

f = codecs.open('path/to/textfile.txt', mode='r', encoding='utf-8')  # take input and turn into unicode
for line in f.readlines():
    line = unicodedata.normalize('NFKD', line).encode('ascii', 'ignore')  # ensure output is in ASCII
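For reference, here is what that normalize-then-encode step actually does to accented text (the sample string is invented for illustration):

```python
import unicodedata

line = "café résumé"
# NFKD decomposes 'é' into 'e' + a combining accent; encoding to
# ASCII with errors='ignore' then drops the combining characters.
ascii_line = unicodedata.normalize('NFKD', line).encode('ascii', 'ignore')
print(ascii_line)  # b'cafe resume'
```

Note that this drops any character with no ASCII equivalent, so it would erase the Chinese and Japanese text from the question entirely rather than preserve it.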
Using Python and Json and trying to replace sections
I'm using Python to try to locate and change different parts of a JSON file. I have a list with 2 columns, and what I want to do is look for the string in the first column, find it in the JSON file, and then replace it with the second column in the list. Does anyone have any idea how to do this? It's been driving me mad.

for row in new_list:
    if json_str == new_list[row][0]:
        json_str.replace(new_list[row][0], new_list[row][1])

I tried using .replace() above, but it says that list indices must be integers or slices, not list. The way that I've managed to print off all the data works, but it is not referencing anything, so if anyone has any ideas, feel free to lend a hand, thanks.

import json

# I import a json file and a text file...
# with open('json file', 'r', encoding="utf8") as jsonData:
#     data = json.load(jsonData)
# jsonData.close()
jsonData = {"employees":[
    {"firstName":"a"},
    {"firstName":"b"},
    {"firstName":"c"}
]}

# text_file = open('text file', 'r', encoding="utf8")
# list = text_file.readlines()
# jsonString = str(data)

# The text file contains lots of data like 'a|A', 'b|B', 'c|C'
# so column 1 is lower and column 2 is upper
list = 'a|A, b|B, c|C'

def print_all():
    for value in list:
        new_list = value.split("|")
        print("%s" % value.split("|"))  # This prints column 1 and 2
        if new_list[0:] == 'some value':
            print(new_list[1])  # This prints off the 'replaced' value

print_all()

edit for the comment: this should be able to run... I think
Without more context it's hard to say for sure, but it sounds as though what you want is to use for row in range(len(new_list)): instead of for row in new_list:
If your JSON file is small enough, just read it into memory and then .replace() in a loop.

#UNTESTED
with open('json.txt') as json_file:
    json_str = json_file.read()
for was, will_be in new_list:
    json_str = json_str.replace(was, will_be)
with open('new-json.txt', 'w') as json_file:
    json_file.write(json_str)
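A runnable sketch of that loop, with the file I/O swapped for an in-memory string and an invented replacement list in the lower|upper shape the question describes (quoting each value so only whole JSON strings match):

```python
json_str = '{"employees": [{"firstName": "a"}, {"firstName": "b"}, {"firstName": "c"}]}'
new_list = [('"a"', '"A"'), ('"b"', '"B"'), ('"c"', '"C"')]  # (was, will_be) pairs

for was, will_be in new_list:
    json_str = json_str.replace(was, will_be)

print(json_str)  # every lower-case value is now upper-case
```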
Error occurs while parsing the json file
I'm trying to parse JSON-format data with json.load(), but it's giving me an error. I tried different methods, like reading line by line and converting into a dictionary or a list, but it isn't working. I also tried the solution mentioned in loading-and-parsing-a-json, but it gives me the same error.

import json
data = []
with open('output.txt','r') as f:
    for line in f:
        data.append(json.loads(line))

Error:

ValueError: Extra data: line 1 column 71221 - line 1 column 6783824 (char 71220 - 6783823)

Please find the output.txt at the link: output.txt
I wrote up the following, which will break your file up into one JSON string per line and then go back through it and do what you originally intended. There's certainly room for optimization here, but at least it works as you expected now.

import json
import re

PATTERN = '{"statuses"'
file_as_str = ''
with open('output.txt', 'r+') as f:
    file_as_str = f.read()
    m = re.finditer(PATTERN, file_as_str)
    f.seek(0)
    for pos in m:
        if pos.start() == 0:
            pass
        else:
            f.seek(pos.start())
            f.write('\n{"')

data = []
with open('output.txt','r') as f:
    for line in f:
        data.append(json.loads(line))
Your alleged JSON file is not a properly formatted JSON file. A JSON file must contain exactly one object (a list, a mapping, a number, a string, etc.). Your file appears to contain a number of JSON objects in sequence, but not in the correct format for a list. Your program's JSON parser correctly returns an error condition when presented with this non-JSON data. Here is a program that will interpret your file:

import json
# Idea and some code stolen from https://gist.github.com/sampsyo/920215
data = []
with open('output.txt') as f:
    s = f.read()
decoder = json.JSONDecoder()
while s.strip():
    s = s.lstrip()  # raw_decode does not skip leading whitespace
    datum, index = decoder.raw_decode(s)
    data.append(datum)
    s = s[index:]
print(len(data))
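The raw_decode approach can be sanity-checked on a small string of concatenated objects (invented here; the real file would be read from disk as in the answer above):

```python
import json

s = '{"a": 1}\n{"b": 2} {"c": 3}'
decoder = json.JSONDecoder()

data = []
while s.strip():
    s = s.lstrip()  # raw_decode won't skip leading whitespace itself
    datum, index = decoder.raw_decode(s)  # parse one object, get its end offset
    data.append(datum)
    s = s[index:]

print(data)  # [{'a': 1}, {'b': 2}, {'c': 3}]
```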