Data conversion error with numpy - python

I am in the process of making my code nicer and I saw that numpy has some very nifty functions already built in. However, the following code throws an error that I cannot
explain:
import numpy as np
from matplotlib.pyplot import plot

data = np.genfromtxt('table.oout', unpack=True, names=True, dtype=None)
real_ov_data = np.float32(data['real_overlap'])
ana_ov_data = np.float32(data['Analyt_overlap'])
length_data = np.float32(data['Residues'])
plot(length_data, real_ov_data, label="overlapped Peaks, exponential function", marker="x", markeredgecolor="blue", markersize=3.0, linestyle=" ", color="blue")
plot(length_data, ana_ov_data, label="expected overlapped Peaks", marker="o", markeredgecolor="green", markersize=3.0, linestyle=" ", color="green")
throws the error
Traceback (most recent call last):
File "length_vs_overlap.py", line 52, in <module>
real_ov_data=np.float32(data['real_overlap'])
ValueError: invalid literal for float(): real_overlap
>Exit code: 1
when I try to read the following file:
'Residues' 'Analyt_overlap' 'anz_analyt_overlap' 'real_overlap'
21 1.2502 29 0.0000
13 1.0306 25 0.0000
56 5.8513 84 2.8741
190 68.0940 329 28.4706
54 5.4271 83 2.4999
What am I doing wrong? My piece of code should be simple enough.

You've either repeated the header line, or you're specifying the names as a list.
That's causing each column to be read as a string type starting with the column title.
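If the file really looks like the sample shown in the question (header line once, data rows after it), the read should succeed: genfromtxt removes the quote characters when it sanitizes the field names. A minimal in-memory sketch, with io.StringIO standing in for 'table.oout':

```python
import io
import numpy as np

# In-memory copy of the posted table; io.StringIO stands in for 'table.oout'
table = """'Residues' 'Analyt_overlap' 'anz_analyt_overlap' 'real_overlap'
21 1.2502 29 0.0000
13 1.0306 25 0.0000
56 5.8513 84 2.8741
190 68.0940 329 28.4706
54 5.4271 83 2.4999
"""

# names=True takes the field names from the first row; the quotes are
# stripped during name sanitization, so data['real_overlap'] works.
data = np.genfromtxt(io.StringIO(table), names=True, dtype=None)
real_ov_data = np.float32(data['real_overlap'])
print(data.dtype.names)
print(real_ov_data)
```

If this sketch runs but the real file fails, the header line is most likely duplicated in the file, or names= was passed a list while the header row was left in place, so the header was parsed as a data row.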

Data analysis - MD analysis Python

I am seeing this error and need help with it:
warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
Traceback (most recent call last):
File "traj_residue-z.py", line 48, in
protein_z=protein.centroid()[2]
IndexError: index 2 is out of bounds for axis 0 with size 0
The problem was solved through a discussion in the mailing list thread https://groups.google.com/g/mdnalysis-discussion/c/J8oJ0M9Rjb4/m/kSD2jURODQAJ
In brief: The traj_residue-z.py script contained the line
protein=u.select_atoms('resid 1-%d' % (nprotein_res))
It turned out that the selection 'resid 1-%d' % (nprotein_res) would not select anything because the input GRO file started with resid 1327
1327LEU N 1 2.013 3.349 8.848 0.4933 -0.2510 0.2982
1327LEU H1 2 1.953 3.277 8.893 0.0174 0.1791 0.3637
1327LEU H2 3 1.960 3.377 8.762 0.6275 -0.5669 0.1094
...
and hence the selection of resids starting at 1 did not match anything. This produced an empty AtomGroup protein.
The subsequent centroid calculation
protein_z=protein.centroid()[2]
failed because for an empty AtomGroup, protein.centroid() returns an empty array and so trying to get the element at index 2 raises IndexError.
The solution (thanks to @IAlibay) was to
either change the selection string 'resid 1-%d' to accommodate the actual start and stop resids, or
to just select the first nprotein_res residues, protein = u.residues[:nprotein_res].atoms, by slicing the ResidueGroup.
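The first option amounts to building the selection string from the first resid actually present in the file rather than assuming it is 1. A sketch with hypothetical values (in MDAnalysis, first_resid would come from something like u.residues.resids[0], and nprotein_res = 252 is an assumed count):

```python
# Hypothetical stand-ins for what the GRO file and script provide:
first_resid = 1327    # e.g. u.residues.resids[0] in MDAnalysis
nprotein_res = 252    # assumed number of protein residues

# Build the selection from the real start/stop instead of assuming resids
# begin at 1; this matches the numbering the GRO file actually uses.
selection = 'resid %d-%d' % (first_resid, first_resid + nprotein_res - 1)
print(selection)
```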

How to fix the error of this code: "dirac[:N / 2] = 1"?

I got this Python code from the internet; it calculates the modulation transfer function (MTF) from an input image. Here is the
full code.
The problem is that the code is not functioning on my PC due to an error in this line:
TypeError Traceback (most recent call last)
<ipython-input-1-035feef9e484> in <module>
54 N = 250
55 dirac = np.zeros(N)
---> 56 dirac[:N / 2] = 1
57
58 # Filter edge
TypeError: slice indices must be integers or None or have an __index__ method
Simply make N/2 an integer again:
dirac[:int(N/2)] = 1
or use floor division, which keeps the result an integer:
dirac[:N // 2] = 1
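The underlying reason: in Python 3 the / operator always returns a float, even for two ints, and floats are not valid slice indices; // performs floor division and returns an int. A minimal sketch of the fixed line:

```python
import numpy as np

N = 250
dirac = np.zeros(N)

# N / 2 would be the float 125.0 in Python 3 and is rejected as a slice
# index; N // 2 is the integer 125 and works.
dirac[:N // 2] = 1

print(dirac.sum())  # 125.0
```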

df_data not defined, unsure of the cause

I'm working through a tutorial that is supposed to help students do the assignment, but I'm encountering a problem. I'm using Python in a notebook project on IBM. Right now the section is simply data exploration, but this error keeps occurring and I'm not sure how to fix it; no one else in the class seemed to have this problem, and the teacher is rather slow to help, so I came here!
I tried just defining the variable before it's called, but no dice either way.
All the code prior to this just imports libraries and then parses the data:
# Infer the data type of each column and convert the data to the inferred data type
from ingest import *
eu = ExtensionUtils(sqlContext)
df_data_1 = eu.convertTypes(df_data_1)
df_data_1.printSchema()
the error I'm getting is
TypeError Traceback (most recent call last)
<ipython-input-14-33250ae79106> in <module>()
2 from ingest import *
3 eu = ExtensionUtils(sqlContext)
----> 4 df_data_1 = eu.convertTypes(df_data_1)
5 df_data_1.printSchema()
/opt/ibm/third-party/libs/python3/ingest/extension_utils.py in convertTypes(self, input_obj, dictVal)
304 """
305
--> 306 checkEnrichType_or_DataFrame("input_obj",input_obj)
307 self.logger = self._jLogger.getLogger(__name__)
308 methodname = str(inspect.stack()[0][3])
/opt/ibm/third-party/libs/python3/ingest/extension_utils.py in checkEnrichType_or_DataFrame(param, paramval)
81 if not isinstance(paramval,(EnrichType ,DataFrame)):
82 raise TypeError("%s should be a EnrichType class object or DataFrame, got type %s"
---> 83 % (str(param), type(paramval)))
84
85
TypeError: input_obj should be a EnrichType class object or DataFrame, got type <class 'NoneType'>
The solution was not with the code itself but with the notebook: a code snippet from a built-in function needed to be inserted before this cell, so that df_data_1 is actually defined.
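The TypeError at the bottom of the traceback just reports that convertTypes received None instead of a DataFrame. The library's guard behaves roughly like the following sketch; the class and function names mirror the traceback, but the bodies are stand-ins, not the real implementation:

```python
class EnrichType:        # stand-in for the library's EnrichType class
    pass

class DataFrame:         # stand-in for pyspark.sql.DataFrame
    pass

def checkEnrichType_or_DataFrame(param, paramval):
    # Mirrors the check in the traceback: reject anything that is neither
    # an EnrichType nor a DataFrame -- which includes None.
    if not isinstance(paramval, (EnrichType, DataFrame)):
        raise TypeError("%s should be a EnrichType class object or DataFrame,"
                        " got type %s" % (str(param), type(paramval)))

df_data_1 = None         # what the notebook cell effectively passed
err_msg = ''
try:
    checkEnrichType_or_DataFrame("input_obj", df_data_1)
except TypeError as e:
    err_msg = str(e)
print(err_msg)
```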

numpy.where produces inconsistent results

I have a piece of code where I need to look for an index of a value in a numpy array.
For this task, I use numpy.where.
The problem is that numpy.where produces a wrong result, i.e. returns an empty array, in situations where I am certain that the searched value is in the array.
To make things worse, I tested with a for loop that the element really is in the array and, in case it is found, also looked for it with numpy.where.
Oddly enough, it then finds a result, while literally a line later, it doesn't.
Here is what the code looks like:
# progenitors, descendants and progenitor_outputnrs are 2D-arrays that are filled from reading in files.
# outputnrs is a 1d-array.
ozi = 0
for i in range(descendants[ozi].shape[0]):
    if descendants[ozi][i] > 0:
        if progenitors[ozi][i] < 0:
            oind = outputnrs[0] - progenitor_outputnrs[ozi][i] - 1
            print "looking for prog", progenitors[ozi][i], "with outputnr", progenitor_outputnrs[ozi][i], "in", outputnrs[oind]
            for p in progenitors[oind]:
                if p == -progenitors[ozi][i]:
                    # the following line works...
                    print "found", p, np.where(progenitors[oind]==-progenitors[ozi][i])[0][0]
                    # the following line doesn't!
                    iind = np.where(progenitors[oind]==-progenitors[ozi][i])[0][0]
I get the output:
looking for prog -76 with outputnr 65 in 66
found 76 79
looking for prog -2781 with outputnr 65 in 66
found 2781 161
looking for prog -3797 with outputnr 63 in 64
found 3797 163
looking for prog -3046 with outputnr 65 in 66
found 3046 163
looking for prog -6488 with outputnr 65 in 66
found 6488 306
Traceback (most recent call last):
File "script.py", line 1243, in <module>
main()
File "script.py", line 974, in main
iind = np.where(progenitors[oind]==-progenitors[out][i])[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0
I use python 2.7.12 and numpy 1.14.2.
Does anyone have an idea why this is happening?
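The IndexError in the traceback is the generic symptom of np.where finding no match: it then returns an empty index array, and taking element [0] of that empty array raises exactly this error. A minimal sketch of the failure mode:

```python
import numpy as np

arr = np.array([10, 20, 30])

hits = np.where(arr == 20)[0]
print(hits)         # [1] -- the value is present

misses = np.where(arr == 99)[0]  # no match: empty index array
print(misses.size)  # 0 -- so misses[0] would raise IndexError
```

So when the two seemingly identical np.where calls disagree, the condition being evaluated is not actually identical on both lines (note the traceback's failing line reads progenitors[out], not progenitors[ozi] as in the posted code).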

Error while using sum() in Python SFrame

I'm new to Python and I'm performing a basic EDA analysis on two similar SFrames. Two of my columns contain dictionaries, and I'm trying to find out whether the max values of each dictionary are the same or not. In the end I want to sum up the Value_Match column so that I can know how many values match, but I'm getting a nasty error and I haven't been able to find its source. The weird thing is that I have used the same methodology for both SFrames and only one of them is giving me this error.
I have tried calculating max_func in different ways as given here, but the same error has persisted: getting-key-with-maximum-value-in-dictionary
I have checked for any possible NaN values in the column but didn't find any of them.
I have been stuck on this for a while and any help will be much appreciated. Thanks!
Code:
def max_func(d):
    v = list(d.values())
    k = list(d.keys())
    return k[v.index(max(v))]

sf['Max_Dic_1'] = sf['Dic1'].apply(max_func)
sf['Max_Dic_2'] = sf['Dic2'].apply(max_func)
sf['Value_Match'] = sf['Max_Dic_1'] == sf['Max_Dic_2']
sf['Value_Match'].sum()
Error :
RuntimeError Traceback (most recent call last)
<ipython-input-70-f406eb8286b3> in <module>()
----> 1 x = sf['Value_Match'].sum()
2 y = sf.num_rows()
3
4 print x
5 print y
C:\Users\rakesh\Anaconda2\lib\site-
packages\graphlab\data_structures\sarray.pyc in sum(self)
2216 """
2217 with cython_context():
-> 2218 return self.__proxy__.sum()
2219
2220 def mean(self):
C:\Users\rakesh\Anaconda2\lib\site-packages\graphlab\cython\context.pyc in
__exit__(self, exc_type, exc_value, traceback)
47 if not self.show_cython_trace:
48 # To hide cython trace, we re-raise from here
---> 49 raise exc_type(exc_value)
50 else:
51 # To show the full trace, we do nothing and let
exception propagate
RuntimeError: Runtime Exception. Exception in python callback function
evaluation:
ValueError('max() arg is an empty sequence',):
Traceback (most recent call last):
File "graphlab\cython\cy_pylambda_workers.pyx", line 426, in
graphlab.cython.cy_pylambda_workers._eval_lambda
File "graphlab\cython\cy_pylambda_workers.pyx", line 169, in
graphlab.cython.cy_pylambda_workers.lambda_evaluator.eval_simple
File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence
In order to debug this problem, you have to look at the stack trace. On the last line we see:
File "<ipython-input-63-b4e3c0e28725>", line 4, in max_func
ValueError: max() arg is an empty sequence
Python thus says that you aim to calculate the maximum of a list with no elements. This is the case if the dictionary is empty, so one of your SFrames probably contains an empty dictionary {}.
The question is what to do in case the dictionary is empty. You might decide to return None in that case.
Nevertheless, the code is more complicated than necessary. A simpler and more efficient version would be:
def max_func(d):
    if d:
        return max(d, key=d.get)
    else:
        # or return something else if there is no element in the dictionary
        return None
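The diagnosis can be confirmed without SFrame at all: with plain dicts standing in for the column values, the original max_func reproduces the error on an empty dict, while the guarded version from the answer does not.

```python
def max_func_original(d):
    # the version from the question: max() on an empty list raises ValueError
    v = list(d.values())
    k = list(d.keys())
    return k[v.index(max(v))]

def max_func_guarded(d):
    # the version from the answer: empty dicts map to None
    return max(d, key=d.get) if d else None

print(max_func_guarded({'a': 1, 'b': 3, 'c': 2}))  # b
print(max_func_guarded({}))                        # None

try:
    max_func_original({})
except ValueError as e:
    print(e)  # max() arg is an empty sequence
```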
