ValueError: could not convert string to float: TF_008_2s - python

I'm very confused here. I have some code that reads a CSV file with 2 columns. Here is a piece of the data:
1,TF_001_2s
2,TF_002_2s
3,TF_003_2s
4,TF_004_2s
5,TF_005_2s
6,TF_006_2s
7,TF_007_1s
8,TF_008_2s
9,TF_009_2s
10,TF_010_1s
What I want to do is replace the number from the first column with the name from the second column in a vector. Here is my function:
def rename_flights(self):
    index_flight = self.Names.index("Flight ID")
    with open('Flight_names.csv','rb') as typical_flights:
        typical_flights_read = csv.reader(typical_flights,delimiter=',')
        for i in range(len(self.matrix[index_flight])):
            flight_id = int(self.matrix[index_flight][i])
            for new_typical_flights in typical_flights_read:
                if flight_id == int(new_typical_flights[0]):
                    self.matrix[index_flight][i] = new_typical_flights[1]
This function takes the index of a vector (Names) and uses it in a matrix. This matrix may contain int numbers, and those numbers are the ones I want to replace with the strings from the CSV file. I don't know if I have been clear enough. The problem is in the last line, when I assign the string to the corresponding position in the matrix. It throws:
ValueError: could not convert string to float: TF_008_2s
I don't know what the failure could be, because I'm not converting any string to float. I hope you can help me.
Thanks in advance.
Edit #1:
Traceback (most recent call last):
  File "script_server.py", line 566, in <module>
    reading(name)
  File "script_server.py", line 552, in reading
    info.rename_flights()
  File "script_server.py", line 272, in rename_flights
    self.results_matrix_rounded[index_flight][i] = new_typical_flights[1]
ValueError: could not convert string to float: TF_008_2s
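From the traceback, self.results_matrix_rounded appears to be a NumPy array with a float dtype, so assigning a string into one of its cells forces exactly this string-to-float conversion. A minimal sketch that reproduces the error (the array below is only a stand-in for the real matrix, not code from the script):
import numpy as np

matrix = np.zeros((3, 5))            # float64 array, like a rounded results matrix
try:
    matrix[1][2] = 'TF_008_2s'       # same ValueError: could not convert string to float
except ValueError as e:
    print(e)

# One possible workaround (an assumption, not taken from the original script) is to
# switch to dtype=object so a cell can hold a string instead of a float:
matrix = matrix.astype(object)
matrix[1][2] = 'TF_008_2s'           # now works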

Related

problem related to h5py and create_dataset

Maybe the question is dumb, but so far I have not been able to find a solution.
I have been handed code from another person who was probably working with a different setup than mine (e.g. Python 2 instead of 3, etc.).
So I have made some small changes to get things working, but I am stuck on a probably simple problem related to h5py.
The part of the code where it crashes looks like:
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']
for i in range(len(Labels)):
    labels_ALL.append(Labels[i])
    units_ALL.append('(mol/L)')
base.create_dataset('Labels', data=labels_ALL)
base.create_dataset('Units', data=units_ALL)
The problem seems to be in base.create_dataset:
Traceback (most recent call last):
  File "C:\Users\DaniJ\Documents\PostDoc_Jena\Trips, Conf, etc\Sinfonia Workshop\Exercise_1\exercise_1_SINFONIA_for_One\NR_chem_SINGLE_NoEu.py", line 252, in <module>
    base.create_dataset('Labels', data=labels_ALL)
  File "C:\Users\DaniJ\anaconda3\lib\site-packages\h5py\_hl\group.py", line 136, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "C:\Users\DaniJ\anaconda3\lib\site-packages\h5py\_hl\dataset.py", line 118, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5py\h5t.pyx", line 1634, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1656, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1717, in h5py.h5t.py_create
TypeError: No conversion path for dtype: dtype('<U10')
The variable base seems to be an h5py._hl.files.File object.
Does somebody know how I can solve this problem?
Thanks
Best regards,
Dani
Did you solve your problem? I'm 99.9% sure it's related to your Labels data -- likely it's in a NumPy array instead of a List. I wrote 3 short examples to demonstrate the difference.
The first code segment uses a List and successfully creates the datasets in file SO_69900543_1.h5.
The second code segment reproduces your error. It converts the List to a NumPy array, then fails when attempting to create the datasets in file SO_69900543_2.h5. Notice that it gives the same error message you encountered: TypeError: No conversion path for dtype: dtype('<U10').
The third code segment shows how to convert numpy.str_ elements to str (this solves the problem in segment #2). Note that each Labels value is converted with str() before it is added to labels_ALL.
Maybe this will help you find (and fix) your problem with Unicode data.
Code segment 1 (works):
Labels = ['H+','Na+','Cl-','OH-','>SOH_x','>SO-_x','>SONa_x','>SOH2+_x','>SOH2Cl_x','>SOH_y','>SO-_y','>SONa_y']
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']
for i in range(len(Labels)):
    labels_ALL.append(Labels[i])
    units_ALL.append('(mol/L)')
with h5py.File('SO_69900543_1.h5','w') as base:
    base.create_dataset('Labels', data=labels_ALL)
    base.create_dataset('Units', data=units_ALL)
Code segment 2 (returns TypeError):
Labels = ['H+','Na+','Cl-','OH-','>SOH_x','>SO-_x','>SONa_x','>SOH2+_x','>SOH2Cl_x','>SOH_y','>SO-_y','>SONa_y']
# Convert Labels List to NumPy array
# This will trigger the error when creating the dataset
Labels = np.array(Labels)
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']
for i in range(len(Labels)):
    labels_ALL.append(Labels[i])
    units_ALL.append('(mol/L)')
for i in range(len(labels_ALL)):
    print(i, type(labels_ALL[i]), type(units_ALL[i]))
with h5py.File('SO_69900543_2.h5','w') as base:
    base.create_dataset('Labels', data=labels_ALL)
    base.create_dataset('Units', data=units_ALL)
Code segment 3 (works):
Labels = ['H+','Na+','Cl-','OH-','>SOH_x','>SO-_x','>SONa_x','>SOH2+_x','>SOH2Cl_x','>SOH_y','>SO-_y','>SONa_y']
# Convert Labels List to NumPy array
# This will trigger the error when creating the dataset if not modified
Labels = np.array(Labels)
labels_ALL = ['ionic_str','psi0','psi1','psi2','psid','zeta','sig0','sig1','sig2','sigd','sig0_eq','sig1_eq','sig2_eq','sigd_eq','ch_bal_EDL','ch_bal_aq', 'sum_resid']
units_ALL = ['(mol/L)','(V)','(V)','(V)','(V)','(V)','(C/m**2)','(C/m**2)','(C/m**2)','(C/m**2)','(mol(eq))','(mol(eq))','(mol(eq))','(mol(eq))','(C/m**2)','(mol(eq)/L)',' ']
for i in range(len(Labels)):
    # use str() to convert from 'numpy.str_' to 'str'
    labels_ALL.append(str(Labels[i]))
    units_ALL.append('(mol/L)')
for i in range(len(labels_ALL)):
    print(i, type(labels_ALL[i]), type(units_ALL[i]))
with h5py.File('SO_69900543_2.h5','w') as base:
    base.create_dataset('Labels', data=labels_ALL)
    base.create_dataset('Units', data=units_ALL)
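If the lists have already been filled with numpy.str_ elements, the same fix as segment 3 can also be applied in a single pass just before writing; a minimal sketch (the file name here is only a placeholder):
import numpy as np
import h5py

Labels = np.array(['H+', 'Na+', 'Cl-'])              # NumPy '<U...' array, as in segment 2
labels_ALL = ['ionic_str', 'psi0'] + list(Labels)    # list still holding numpy.str_ elements

# convert every element back to a plain Python str so h5py can map it to an HDF5 string
labels_ALL = [str(s) for s in labels_ALL]

with h5py.File('SO_69900543_fixed.h5', 'w') as base:  # placeholder file name
    base.create_dataset('Labels', data=labels_ALL)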

Reading a single number from another file using python

I am reading a number from a file p2.txt. This file contains only one number, which is an integer, let's say 10.
test_file = open('p2.txt', 'r')
test_lines = test_file.readlines()
test_file.close()
ferNum= test_lines[0]
print int(ferNum)
However, when I run it I get an error:
print int(ferNum)
ValueError: invalid literal for int() with base 10: '1.100000000000000000e+01\n'
I can see that it is treating it just as a string. How can I parse that number into a variable? Any suggestions? Regards
The problem is that even though the value of the number is an integer (11), it is represented in scientific notation, so you have to read it as a float first.
>>> float('1.100000000000000000e+01\n')
11.0
>>> int('1.100000000000000000e+01\n')
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
int('1.100000000000000000e+01\n')
ValueError: invalid literal for int() with base 10: '1.100000000000000000e+01\n'
You can of course convert first to a float then to an int after that.
>>> int(float('1.100000000000000000e+01\n'))
11
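Putting the two conversions together with the file handling from the question, a minimal sketch (assuming p2.txt really holds a single number on its first line):
with open('p2.txt', 'r') as test_file:
    ferNum = int(float(test_file.readline()))  # float() accepts '1.1...e+01', int() then gives 11
print(ferNum)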

ValueError string to float when retrieving float32 from Netcdf file using Netcdf4 in python

I am using netcdf4 in Python 2.7 on a Windows 7 machine. I have loaded numpy recarrays into a netcdf file I created and have subsequently retrieved the data several times. Then, for some unknown reason, when I try to retrieve the data I get ValueError: could not convert string to float.
The code that is being used to retrieve the data is:
def getNetCDFGroupVarData(NCfilename, GroupPath, Variable):
    """ ==============================================================
    TITLE: getNetCDFGroupVarData
    DESCR: for a valid variable on the specified path in a NetCDF file
           returns a data vector
    ARGS:  NCfilename : netcdf4 file path and name
           GroupPath : group path
           Variable : variable name
    RETURN: VarData: vector of variable data
    DEPEND: netCDF4.Dataset
    =======================================================================
    """
    # get rootgroup and group from which to return attributes
    if os.path.isfile(NCfilename):
        RG = Dataset(NCfilename, 'a')
        G = giveListEndGroup(RG,GroupPath)

    # retrieve variable data from group
    keyVar = G.variables.keys()
    print(keyVar)
    kvlen = len(keyVar)
    var = unicode(Variable)
    if kvlen > 0:
        print('variable name: ',var)
        V = G.variables[var]
        print V.dtype
        print V.shape
        print V.dimensions
        VarData = V[:] #====== Error raised here ==============
    else:
        print('no keys found')
        VarData = None
    RG.close()
    return VarData
The print outputs and error stack I get when calling this function are:
[u'time', u'SECONDS', u'NANOSECONDS', u'Rg', u'Ts1', u'Ts2', u'Ts3', u'V_log', u'T_log']
('variable name: ', u'time')
float64
(88872,)
(u'time',)
variable: time does not exist
Unexpected error: <type 'exceptions.ValueError'>
Traceback (most recent call last):
  File "C:\Users\rclement\Documents\My Dropbox\Code\python\NCTSutil\Panel_NCTS_structure.py", line 69, in tree_path_changed
    pub.sendMessage('NetcdfTS.group.specified', arg1=pathlist )
  File "C:\Python27\lib\site-packages\pubsub\core\kwargs\publisher.py", line 27, in sendMessage
    topicObj.publish(**kwargs)
  File "C:\Python27\lib\site-packages\pubsub\core\kwargs\publishermixin.py", line 24, in publish
    self._publish(msgKwargs)
  File "C:\Python27\lib\site-packages\pubsub\core\topicobj.py", line 376, in _publish
    self.__sendMessage(data, self, iterState)
  File "C:\Python27\lib\site-packages\pubsub\core\topicobj.py", line 397, in __sendMessage
    self._mix_callListener(listener, data, iterState)
  File "C:\Python27\lib\site-packages\pubsub\core\kwargs\publishermixin.py", line 64, in _mix_callListener
    listener(iterState.filteredArgs, self, msgKwargs)
  File "C:\Python27\lib\site-packages\pubsub\core\kwargs\listenerimpl.py", line 43, in __call__
    cb(**kwargs)
  File "C:\Users\rclement\Documents\My Dropbox\Code\python\NCTSutil\NetcdfTimeSeries.py", line 70, in listner_group
    atime = self.GetSelectedVariableData(pathlist, u'time')
  File "C:\Users\rclement\Documents\My Dropbox\Code\python\NCTSutil\NetcdfTimeSeries.py", line 307, in GetSelectedVariableData
    VarData = MNU.getNetCDFGroupVarData(self.filename, GroupPathList, variable )
  File "C:\Users\rclement\Documents\My Dropbox\Code\python\NCTSutil\MyNetcdfUtil.py", line 304, in getNetCDFGroupVarData
    VarData = V[:]
  File "netCDF4.pyx", line 2949, in netCDF4.Variable.__getitem__ (netCDF4.c:36472)
  File "netCDF4.pyx", line 2969, in netCDF4.Variable._toma (netCDF4.c:36814)
ValueError: could not convert string to float:
When I use other netcdf utilities (e.g. Panoply) I can access the data.
Does anyone have a clue why netcdf4 would be throwing this exception, or worse, how it could have inserted a string into my float32 field in the netcdf file?
From the traceback, the problem was occurring in the "_toma" netCDF4 function, which converts the data to a masked array. When reading the file with other utilities (e.g. NCDUMP) I had no problem accessing the data.
At the moment I believe the problem occurred because I had an unassigned 'missing_value' attribute for the variable. Apparently, if there is no 'missing_value' attribute, netCDF4 defaults to a missing value appropriate for the dtype. In my implementation the 'missing_value' attribute was being exposed for editing via a wxPython grid control. When the edited attributes in the grid were written back to the netcdf file, the empty grid cell was returning either a None object or wx.EmptyString, which netCDF4 attempted to insert into the float type in the netcdf file.
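A minimal sketch (Python 2, matching the question) of one way to check for, and clean up, an empty missing_value attribute before reading the variable; the file, group and variable names are placeholders, and the replacement fill value is an assumption rather than something from the original code:
import numpy as np
from netCDF4 import Dataset

rg = Dataset('timeseries.nc', 'a')            # placeholder file name, opened for editing
v = rg.groups['site1'].variables['time']      # placeholder group/variable path
if 'missing_value' in v.ncattrs():
    mv = v.getncattr('missing_value')
    # an empty string here is what makes _toma fail when it builds the masked array
    if isinstance(mv, basestring) and mv.strip() == '':
        v.delncattr('missing_value')
        # or assign a sensible fill value instead, e.g.:
        # v.setncattr('missing_value', np.float32(-9999.0))
rg.close()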

Index out of bound while reading a dataframe

I have a tab-separated file that I am trying to parse, and for that I am doing this:
Header of my file:
chrom coord ref_base var_base A C G T
17 26695663 G A 1 0 1934 0
17 26695664 T A 1 0 1 1935
My code is:
counts = pd.read_csv(args.counts_file, sep='\t')
toto = counts[(counts['chrom'].astype(str) == "17") & (counts['coord'].astype(str) == "26695663")]
print toto["G"].values[0]
This returns the wanted number, which is 1934.
Now, when I try to create a function that takes the dataframe read from the file as an argument, I wrote this:
def get_foreground_counts(chrom, coord, counts, ref_base, var_base):
    foreground_counts = counts[(counts['chrom'] == chrom) & (counts['coord'] == coord)]
    foreground_ref_counts = foreground_counts[ref_base].values[0]
    foreground_var_counts = foreground_counts[var_base].values[0]
    return foreground_ref_counts, foreground_var_counts
I get this error that I am trying to figure out, but I still can't see why:
Traceback (most recent call last):
  File "test.py", line 203, in <module>
    main(args)
  File "test.py", line 71, in main
    foreground_ref_counts, foreground_var_counts = get_foreground_counts(chrom, coord, counts, ref_base, var_base)
  File "test.py", line 137, in get_foreground_counts
    foreground_ref_counts = foreground_counts[ref_base].values[0]
IndexError: index out of bounds
Any idea why?
Thanks
UPDATE
When I try to print foreground_counts[ref_base].values I get [].
What I am passing to the function is chrom (string), coord (string), counts (pandas dataframe), ref_base (string), var_base (string).
In your function, your filter returns zero rows; that's why you get the error. It seems you forgot the .astype(str) in your function's first line.
You could either cast the column types before calling the function or modify that line. The former would be the better approach if you really need to use a string type; otherwise, why not use integer values for the comparison?
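A minimal sketch of the function with the .astype(str) comparison restored (mirroring the one-off filter that worked above), plus a guard so an empty result no longer raises IndexError; this assumes chrom and coord are passed in as strings, as described in the update:
def get_foreground_counts(chrom, coord, counts, ref_base, var_base):
    # cast the columns to str so the comparison matches the string arguments
    mask = (counts['chrom'].astype(str) == chrom) & (counts['coord'].astype(str) == coord)
    foreground_counts = counts[mask]
    if foreground_counts.empty:
        return None, None  # no matching row instead of an IndexError
    foreground_ref_counts = foreground_counts[ref_base].values[0]
    foreground_var_counts = foreground_counts[var_base].values[0]
    return foreground_ref_counts, foreground_var_counts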

Return the average mark given the respective lecture

Each line represents a single student and consists of a student number, a name, a section code and a midterm grade, all separated by whitespace.
The first parameter is already done: the file is open.
The second parameter is a section code.
This is the link: http://www.cdf.toronto.edu/~csc108h/fall/exercises/e3/grade_file.txt
My code:
def average_by_section(the_file, section_code):
    '''(io.TextIOWrapper, str) -> float
    Return the average midterm mark for all students in that section
    '''
    score = 0
    n = 0
    for element in the_file:
        line = element.split()
        if section_code == line[-2]:
            mark = mark + float(line[-1])
            n += 1
    lecture_avg = mark / n
    return lecture_avg
I'm getting an index out of range error. Is this correct? Or am I just opening the wrong file?
Can someone download that file and test this code? I'm pretty sure it should work, but it doesn't for me.
Well, you can troubleshoot the index out of range error with a print statement such as print(line) to explore the number of items in "line" (i.e. the effect of split()). I'd suggest looking closer at your split() statement...
It looks like you are omitting the parts of your code where you define some of those variables (section_code, mark, etc.), but after adjusting for those things the code seems to work properly. Assuming that the error you got was IndexError: list index out of range, that happens when you try to access an element of a list by an index that doesn't exist. For instance:
>>> l = ['one']
>>> l[0]
'one'
>>> l[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> l[-1]
'one'
>>> l[-2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
Therefore, in your code you will get that error whenever line has fewer than two items (for example, a blank line splits into an empty list). I would check what you are actually getting for line to make sure it is what you expect.
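A minimal sketch of one way to apply that check to the function from the question, skipping lines that split into fewer than two fields (the guard and the mark initialisation are assumptions about what the omitted parts of the code look like):
def average_by_section(the_file, section_code):
    '''(io.TextIOWrapper, str) -> float
    Return the average midterm mark for all students in that section.
    '''
    mark = 0.0
    n = 0
    for element in the_file:
        line = element.split()
        if len(line) < 2:          # skip blank or malformed lines instead of raising IndexError
            continue
        if section_code == line[-2]:
            mark += float(line[-1])
            n += 1
    return mark / n if n else 0.0  # avoid dividing by zero when no student matches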
