ValueError: could not convert string to float: '.' in Python

I am trying to convert all the values in a CSV file to float. Each cell in the file holds a plain number, for example "0.089" or "23". When I run the code, it raises the following error: "ValueError: could not convert string to float: '.'"
I cannot understand why the program is not reading the numbers from the CSV file properly.
def loadCsv(filename):
    with open('BreastCancerTumor.csv','r') as f:
        lines = f.readlines()[1:]
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

You never split each line into comma-separated fields, so you are looping over the characters in the line rather than the fields, and trying to parse each character as a float. The error is raised when the loop reaches the . character.
Use the csv module to read the file; it splits each line into a list of fields.
import csv
def loadCsv(filename):
    with open('BreastCancerTumor.csv','r') as f:
        f.readline() # skip header
        csvf = csv.reader(f)
        dataset = [[float(x) for x in row] for row in csvf]
    return dataset
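To see the difference concretely, here is a small sketch that uses an in-memory file (io.StringIO) with made-up rows instead of BreastCancerTumor.csv. The column names and values are assumptions for illustration only.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
sample = "radius,texture\n0.089,23\n1.5,7.25\n"

# Iterating over a line yields single characters, which is what triggers the error.
first_data_line = sample.splitlines()[1]
chars = list(first_data_line)

# csv.reader splits each line into fields, which float() can parse.
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
dataset = [[float(x) for x in row] for row in reader]

print(chars)
print(dataset)
```

The second element of chars is the '.' that float() rejects, while dataset holds one list of floats per row.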

Related

How to convert a list of strings and numbers to float in python?

I need to get data from a csv file which I have done and have appended the data to a list but don't know how to make the entire list into a float.
I tried the following code and it did not work:
import csv
with open('csv1.csv', 'r') as f:
    lines = f.readlines()[6]
    new_list = []
    for i in lines:
        print(lines)
        new_list.append(float(i))
print(new_list)
I got a ValueError: could not convert string to float: '"', which is weird since I don't understand where it is getting the " from.
The CSV file I am using is from the Bureau of Economic Analysis; here's the link to the exact file I am using: https://apps.bea.gov/iTable/iTable.cfm?reqid=19&step=2#reqid=19&step=2&isuri=1&1921=survey
When you execute lines = f.readlines()[6], you grab only the seventh line of the file. That is a single str containing the whole line. When you then iterate with for i in lines, you go through that line character by character and attempt the float() conversion on each character. The first character of the line is ", which causes the error.
To go through all of the lines starting with the seventh, change the indexing to a slice: lines = f.readlines()[6:]. That lets you process the lines one by one.
But that only explains what was going on. As @Razzle Shazl points out, you're not using the CSV reader.
That said, I think this is what you're actually trying to accomplish:
import csv
with open('csv1.csv', 'r') as f:
    csv_data = csv.reader(f, delimiter=',', quotechar='"')
    for i in range(6): # this is to skip over header rows in the data (I saw 6 in the file I downloaded)
        next(csv_data)
    for row in csv_data: # going through the rest of the rows of data
        new_list = []
        for i in range(2, len(row)): # go through each column after the line number and description
            new_list.append(float(row[i])) # convert element to float and append to array
        print(f"CSV Input: {row}")
        print(f"Converted: {new_list}")
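As a rough illustration of why the quote character showed up, the sketch below parses one made-up line in the style of a quoted CSV export. The line content is hypothetical and not copied from the BEA file.

```python
import csv
import io

# Hypothetical quoted line: line number, description, then numeric columns.
line = '"1","Gross domestic product","100.5","200.25"\n'

# This is the first thing float() saw when the line was iterated character by character.
first_char = line[0]

# csv.reader removes the quotes and splits the line into fields.
row = next(csv.reader(io.StringIO(line), delimiter=',', quotechar='"'))
values = [float(x) for x in row[2:]]  # skip the line number and description

print(repr(first_char))
print(row)
print(values)
```

Slicing with row[2:] does the same job as the range(2, len(row)) loop above.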
You cannot convert empty string to float. Probably somewhere the data is empty.
In [1]: float("")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-603e80e58999> in <module>
----> 1 float("")
ValueError: could not convert string to float:
If you want to ignore the exception and continue processing, use a try/except block.
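A minimal sketch of that idea, assuming you want None in place of any cell that cannot be converted. The helper name to_float_or_none and the sample row are made up for illustration.

```python
def to_float_or_none(text):
    """Return float(text), or None when text is not a valid number (e.g. an empty cell)."""
    try:
        return float(text)
    except ValueError:
        return None

# Hypothetical row with an empty cell and a non-numeric cell.
row = ["1.5", "", "23", "n/a"]
converted = [to_float_or_none(cell) for cell in row]
print(converted)  # [1.5, None, 23.0, None]
```

Catching only ValueError keeps unrelated errors visible instead of silently swallowing them.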

How to preserve trailing zeros with Python CSV Writer

I am trying to convert a JSON file with individual JSON lines to CSV. The JSON data has some elements with trailing zeros that I need to maintain (e.g. 1.000000). When writing to CSV, the value is changed to 1.0, removing all trailing zeros except the first zero following the decimal point. How can I keep all trailing zeros? The number of trailing zeros may not always be static.
Updated the formatting of the sample data.
Here is a sample of the json input:
{"ACCOUNTNAMEDENORM":"John Smith","DELINQUENCYSTATUS":2.0000000000,"RETIRED":0.0000000000,"INVOICEDAYOFWEEK":5.0000000000,"ID":1234567.0000000000,"BEANVERSION":69.0000000000,"ACCOUNTTYPE":1.0000000000,"ORGANIZATIONTYPEDENORM":null,"HIDDENTACCOUNTCONTAINERID":4321987.0000000000,"NEWPOLICYPAYMENTDISTRIBUTABLE":"1","ACCOUNTNUMBER":"000-000-000-00","PAYMENTMETHOD":12345.0000000000,"INVOICEDELIVERYTYPE":98765.0000000000,"DISTRIBUTIONLIMITTYPE":3.0000000000,"CLOSEDATE":null,"FIRSTTWICEPERMTHINVOICEDOM":1.0000000000,"HELDFORINVOICESENDING":"0","FEINDENORM":null,"COLLECTING":"0","ACCOUNTNUMBERDENORM":"000-000-000-00","CHARGEHELD":"0","PUBLICID":"xx:1234346"}
Here is a sample of the output:
ACCOUNTNAMEDENORM,DELINQUENCYSTATUS,RETIRED,INVOICEDAYOFWEEK,ID,BEANVERSION,ACCOUNTTYPE,ORGANIZATIONTYPEDENORM,HIDDENTACCOUNTCONTAINERID,NEWPOLICYPAYMENTDISTRIBUTABLE,ACCOUNTNUMBER,PAYMENTMETHOD,INVOICEDELIVERYTYPE,DISTRIBUTIONLIMITTYPE,CLOSEDATE,FIRSTTWICEPERMTHINVOICEDOM,HELDFORINVOICESENDING,FEINDENORM,COLLECTING,ACCOUNTNUMBERDENORM,CHARGEHELD,PUBLICID
John Smith,2.0,0.0,5.0,1234567.0,69.0,1.0,,4321987.0,1,000-000-000-00,10012.0,10002.0,3.0,,1.0,0,,0,000-000-000-00,0,bc:1234346
Here is the code:
import json
import csv
f=open('test2.json') #open input file
outputFile = open('output.csv', 'w', newline='') #open csv output file
output = csv.writer(outputFile) #create a csv.writer
i=1
for line in f:
    try:
        data = json.loads(line) #reads current line into a dict
    except:
        print("Can't load line {}".format(i))
    if i == 1:
        header = data.keys()
        output.writerow(header) #Writes header row
    i += 1
    output.writerow(data.values()) #writes values row
f.close() #close input file
The desired output would look like:
ACCOUNTNAMEDENORM,DELINQUENCYSTATUS,RETIRED,INVOICEDAYOFWEEK,ID,BEANVERSION,ACCOUNTTYPE,ORGANIZATIONTYPEDENORM,HIDDENTACCOUNTCONTAINERID,NEWPOLICYPAYMENTDISTRIBUTABLE,ACCOUNTNUMBER,PAYMENTMETHOD,INVOICEDELIVERYTYPE,DISTRIBUTIONLIMITTYPE,CLOSEDATE,FIRSTTWICEPERMTHINVOICEDOM,HELDFORINVOICESENDING,FEINDENORM,COLLECTING,ACCOUNTNUMBERDENORM,CHARGEHELD,PUBLICID
John Smith,2.0000000000,0.0000000000,5.0000000000,1234567.0000000000,69.0000000000,1.0000000000,,4321987.0000000000,1,000-000-000-00,10012.0000000000,10002.0000000000,3.0000000000,,1.0000000000,0,,0,000-000-000-00,0,bc:1234346
I've been trying this out and I think it may solve your problem:
Pass the str function as the parse_float argument to json.loads:
data = json.loads(line, parse_float=str)
This way, when json.loads() parses a float it calls str instead, so the value is kept as a string and the zeros are maintained. I tried it and it worked:
i=1
for line in f:
    try:
        data = json.loads(line, parse_float=str) #reads current line into a dict
    except:
        print("Can't load line {}".format(i))
    if i == 1:
        header = data.keys()
        print(header) #Writes header row
    i += 1
    print(data.values()) #writes values row
More information is in the json module documentation.
PS: You could use a boolean instead of i += 1 to get the same behaviour.
The decoder of the json module parses real numbers with float by default, so trailing zeros are not preserved, because a Python float does not record them. You can use the parse_float parameter of json.loads to make the decoder construct real numbers with str instead:
data = json.loads(line, parse_float=str)
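Here is a self-contained sketch of that approach writing to CSV. The JSON line is a shortened, hypothetical version of the sample input, and an in-memory buffer stands in for output.csv.

```python
import csv
import io
import json

# Shortened, hypothetical JSON line modeled on the sample input.
line = '{"ACCOUNTNAMEDENORM": "John Smith", "RETIRED": 0.0000000000, "ID": 1234567.0000000000, "CLOSEDATE": null}'

# parse_float=str keeps every JSON float as its original text.
data = json.loads(line, parse_float=str)

buffer = io.StringIO()  # stands in for the output.csv file
writer = csv.writer(buffer)
writer.writerow(data.keys())    # header row
writer.writerow(data.values())  # values row; None is written as an empty field

print(buffer.getvalue())
```

Note that parse_float only affects JSON numbers written with a decimal point or exponent; integers and strings are untouched.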
Alternatively, use format, but this requires a fixed decimal precision:
>>> '{:.10f}'.format(10.0)
'10.0000000000'
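If a fixed precision is acceptable, one way to apply it to a whole row might look like the sketch below. The helper format_cell and the sample row are hypothetical; only floats are reformatted.

```python
def format_cell(value, places=10):
    """Format floats with a fixed number of decimals; leave other values unchanged."""
    if isinstance(value, float):
        return f"{value:.{places}f}"
    return value

# Hypothetical values row as json.loads would produce it by default.
row = ["John Smith", 2.0, None, 1234567.0, "0"]
formatted = [format_cell(v) for v in row]
print(formatted)
```

This cannot recover how many zeros the input had, which is why parse_float=str is the better fit when the number of trailing zeros varies.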

Error occurs while parsing the json file

I'm trying to parse JSON-formatted data with the json.loads() method, but it gives me an error. I tried different methods like reading line by line, converting into a dictionary, a list, and so on, but it isn't working. I also tried the solution mentioned in the following URL, loading-and-parsing-a-json, but it gives me the same error.
import json
data = []
with open('output.txt','r') as f:
    for line in f:
        data.append(json.loads(line))
Error:
ValueError: Extra data: line 1 column 71221 - line 1 column 6783824 (char 71220 - 6783823)
The output.txt file is available at the URL below:
Content- output.txt
I wrote up the following which will break up your file into one JSON string per line and then go back through it and do what you originally intended. There's certainly room for optimization here, but at least it works as you expected now.
import json
import re
PATTERN = '{"statuses"'
file_as_str = ''
with open('output.txt', 'r+') as f:
    file_as_str = f.read()
    m = re.finditer(PATTERN, file_as_str)
    f.seek(0)
    for pos in m:
        if pos.start() == 0:
            pass
        else:
            f.seek(pos.start())
            f.write('\n{"')
data = []
with open('output.txt','r') as f:
    for line in f:
        data.append(json.loads(line))
Your alleged JSON file is not a properly formatted JSON file. A JSON file must contain exactly one top-level value (a list, a mapping, a number, a string, etc.). Your file appears to contain a number of JSON objects in sequence, but not in the correct format for a list.
The JSON parser correctly reports an error when presented with this non-JSON data.
Here is a program that will interpret your file:
import json
# Idea and some code stolen from https://gist.github.com/sampsyo/920215
data = []
with open('output.txt') as f:
    s = f.read().lstrip()
decoder = json.JSONDecoder()
while s.strip():
    datum, index = decoder.raw_decode(s)
    data.append(datum)
    s = s[index:].lstrip() # raw_decode does not accept leading whitespace
print(len(data))
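For a very large file, repeatedly slicing the string copies it many times. A possible variation, sketched here on a small made-up string rather than the real output.txt, passes a start index to raw_decode instead. The generator name iter_json_values is hypothetical.

```python
import json

def iter_json_values(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Hypothetical input: three objects in sequence, with and without separators.
sample = '{"statuses": [1, 2]}{"statuses": [3]}\n{"statuses": []}'
values = list(iter_json_values(sample))
print(len(values))
```

Each call to raw_decode returns the parsed value and the index where it stopped, so the loop simply resumes from there.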

Parse a string containing a large integer in Python

I am having trouble parsing a data set from a .txt file into an Excel file (.csv) in Python.
The source code looks like:
fin = open(filename,'r')
reader = csv.reader(fin)
for line in reader:
    list = str(line).split()
    print list3
    print str(list3[1])
My data sample looks like:
10134.5 -123 9.9527
And Python screen output looks like this
["['10134.5", '-123', '9.9527,"']"
-131.7000
So I'm assuming list3[1] is a float or a number at this point, which causes some overflow because 100,000 is larger than it can hold...
Do you know how to make Python treat it as a string, not an integer?
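You do not need to split or cast to string: csv.reader already yields each row as a list of strings. Your sample is separated by spaces rather than commas, so pass delimiter=' ' to the reader.
fin = open(filename,'r')
reader = csv.reader(fin, delimiter=' ') # split fields on spaces
for line in reader:
    print(line)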
output
['10134.5', '-123', '9.9527']
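A quick way to check this without a file is to feed the sample line through io.StringIO. This is only a sketch, assuming the fields are separated by single spaces.

```python
import csv
import io

sample = "10134.5 -123 9.9527\n"  # the data sample from the question

# Each row comes back as a list of strings; nothing is converted to a number.
rows = list(csv.reader(io.StringIO(sample), delimiter=" "))
second = rows[0][1]

print(rows)
print(type(second))
```

The values stay strings until you convert them yourself, and Python integers have arbitrary precision, so a value like 100000 would not overflow even after int().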

coordinates (str) to list

I have a very long txt file containing geographic coordinates..the format for each row looks like this:
501418.209 5314160.484 512.216
501418.215 5314160.471 512.186
501418.188 5314160.513 512.216
So the values are separated by a blank (" ") with a line break (\n) at the end.
I need to import that file into a list. So far I have only managed to import it as a string and then tried to convert it into a list. Unfortunately, I have no idea how to keep the formatting of the txt file, as I need to perform calculations on each row.
My solution so far to import the txt file to a string variable:
fileobj = file(source,'r')
data = ""
for line in fileobj.readlines():
    linevals = line.strip().split(" ")
    data += "%s %s %s\n" % (linevals[0], linevals[1], linevals[2])
print type(data)
And my solution for importing as list that didn't work:
fileobj = file(source,'r')
data = []
for line in fileobj.readlines():
    linevals = line.strip().split(" ")
    data.append(linevals)
On stackoverflow I found lots of solutions that suggested the eval function - but that didn't work as I need the whole row as one list element. Hope that was clear. Any solutions for this problem? I'm pretty newish to python, but that bothers me for quite some time now. Thank you!
You don't need eval or anything other than simply splitting each row and casting to float (in Python 3, wrap map in list to get an actual list):
with open(source) as f:
    for row in f:
        print(list(map(float, row.split())))
[501418.209, 5314160.484, 512.216]
[501418.215, 5314160.471, 512.186]
[501418.188, 5314160.513, 512.216]
If you want all rows in a single list:
with open(source) as f:
    data = [list(map(float, row.split())) for row in f]
print(data)
[[501418.209, 5314160.484, 512.216], [501418.215, 5314160.471, 512.186], [501418.188, 5314160.513, 512.216]]
Or using the csv module:
import csv
with open(source) as f:
    data = [list(map(float, row)) for row in csv.reader(f, delimiter=" ")]
print(data)
If you want a flat list of all data:
with open(source) as f:
    data = []
    for row in f:
        data.extend(map(float, row.split()))
If you are doing a lot of work on the data you may find numpy useful:
import numpy as np
data = np.genfromtxt(source,delimiter=" ").flatten()
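Putting the plain-Python version together, the sketch below reads the three sample rows from an in-memory file and runs one example calculation per row. Treating the third column as a height is an assumption made only for the example.

```python
import io

# The three sample rows from the question, standing in for the real txt file.
sample = (
    "501418.209 5314160.484 512.216\n"
    "501418.215 5314160.471 512.186\n"
    "501418.188 5314160.513 512.216\n"
)

with io.StringIO(sample) as f:
    # One list of floats per row; blank lines are skipped.
    data = [[float(v) for v in row.split()] for row in f if row.strip()]

# Example per-row calculation: take the third column (assumed to be a height).
heights = [row[2] for row in data]
max_height = max(heights)

print(data)
print(max_height)
```

Because each row is its own list, row[0], row[1] and row[2] stay aligned with the columns of the file.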
