numpy.where produces inconsistent results

I have a piece of code where I need to find the index of a value in a NumPy array.
For this task, I use numpy.where.
The problem is that numpy.where produces a wrong result, i.e. it returns an empty array, in situations where I am certain that the searched value is in the array.
To make things worse, I checked that the element really is in the array with a for loop, and when it is found, I also look for it with numpy.where.
Oddly enough, that call then finds a result, while literally one line later, the same call doesn't.
Here is what the code looks like:
# progenitors, descendants and progenitor_outputnrs are 2D arrays that are filled by reading in files.
# outputnrs is a 1D array.
ozi = 0
for i in range(descendants[ozi].shape[0]):
    if descendants[ozi][i] > 0:
        if progenitors[ozi][i] < 0:
            oind = outputnrs[0] - progenitor_outputnrs[ozi][i] - 1
            print "looking for prog", progenitors[ozi][i], "with outputnr", progenitor_outputnrs[ozi][i], "in", outputnrs[oind]
            for p in progenitors[oind]:
                if p == -progenitors[ozi][i]:
                    # the following line works...
                    print "found", p, np.where(progenitors[oind]==-progenitors[ozi][i])[0][0]
                    # the following line doesn't!
                    iind = np.where(progenitors[oind]==-progenitors[ozi][i])[0][0]
I get the output:
looking for prog -76 with outputnr 65 in 66
found 76 79
looking for prog -2781 with outputnr 65 in 66
found 2781 161
looking for prog -3797 with outputnr 63 in 64
found 3797 163
looking for prog -3046 with outputnr 65 in 66
found 3046 163
looking for prog -6488 with outputnr 65 in 66
found 6488 306
Traceback (most recent call last):
  File "script.py", line 1243, in <module>
    main()
  File "script.py", line 974, in main
    iind = np.where(progenitors[oind]==-progenitors[out][i])[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0
I am using Python 2.7.12 and NumPy 1.14.2.
Does anyone have an idea why this is happening?
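The IndexError itself only says that np.where found no match: for a 1-D array, np.where(mask) returns a tuple holding one index array, and when no element matches, that array is empty, so taking [0] from it fails. A minimal sketch of a lookup that checks for this case instead of crashing is below; first_index is a hypothetical helper name, not a NumPy function. One detail in the question may also be worth checking: the traceback shows progenitors[out][i], while the posted snippet uses progenitors[ozi][i], so if the script really contains out on that line and out differs from ozi, the failing call searches for a different value than the one just printed as found.

```python
import numpy as np


def first_index(arr, value):
    """Return the first index i with arr[i] == value, or None if absent (hypothetical helper)."""
    # For a 1-D array, np.where(mask) returns a 1-tuple holding one index array.
    matches = np.where(arr == value)[0]
    if matches.size == 0:
        # No match: matches[0] would raise
        # "IndexError: index 0 is out of bounds for axis 0 with size 0".
        return None
    return int(matches[0])


row = np.array([5, -76, 12, 76])
print(first_index(row, 76))    # 3
print(first_index(row, -999))  # None
```

Returning None (or raising a custom error that includes the searched value) makes it easy to print exactly which value was missing when the lookup fails.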

Related

Data analysis - MD analysis Python

I am seeing this error and need help with it:
warnings.warn("Failed to guess the mass for the following atom types: {}".format(atom_type))
Traceback (most recent call last):
File "traj_residue-z.py", line 48, in
protein_z=protein.centroid()[2]
IndexError: index 2 is out of bounds for axis 0 with size 0

The problem was solved through a discussion in the mailing list thread https://groups.google.com/g/mdnalysis-discussion/c/J8oJ0M9Rjb4/m/kSD2jURODQAJ
In brief: The traj_residue-z.py script contained the line
protein=u.select_atoms('resid 1-%d' % (nprotein_res))
It turned out that the selection 'resid 1-%d' % (nprotein_res) would not select anything because the input GRO file started with resid 1327:
1327LEU N 1 2.013 3.349 8.848 0.4933 -0.2510 0.2982
1327LEU H1 2 1.953 3.277 8.893 0.0174 0.1791 0.3637
1327LEU H2 3 1.960 3.377 8.762 0.6275 -0.5669 0.1094
...
and hence the selection of resids starting at 1 did not match anything. This produced an empty AtomGroup protein.
The subsequent centroid calculation
protein_z=protein.centroid()[2]
failed because for an empty AtomGroup, protein.centroid() returns an empty array and so trying to get the element at index 2 raises IndexError.
The solution (thanks to #IAlibay) was to either
change the selection string 'resid 1-%d' to accommodate the actual start and stop resids, or
just select the first nprotein_res residues by slicing the ResidueGroup: protein = u.residues[:nprotein_res].atoms.
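The first option can be sketched as follows, assuming the protein residues are numbered consecutively starting from the first resid in the file. resid_range_selection is a hypothetical helper; in a real script, first_resid would come from the loaded Universe (for example u.residues.resids[0]) and the resulting string would be passed to u.select_atoms().

```python
def resid_range_selection(first_resid, n_residues):
    """Build a 'resid start-stop' selection string for n_residues
    consecutive residues, starting at first_resid (hypothetical helper)."""
    last_resid = first_resid + n_residues - 1
    return 'resid %d-%d' % (first_resid, last_resid)


# The GRO file above starts at resid 1327, not 1.
print(resid_range_selection(1327, 10))  # resid 1327-1336
```

Checking len(protein) == 0 right after the selection would also turn the confusing IndexError from centroid() into an explicit error message.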

How to fix the error of this code: "dirac[:N / 2] = 1"?

I got this Python code from the internet; it calculates the modulation transfer function (MTF) from an input image. Here is the full code.
The problem is that the code does not run on my PC because of an error on this line:
TypeError Traceback (most recent call last)
<ipython-input-1-035feef9e484> in <module>
54 N = 250
55 dirac = np.zeros(N)
---> 56 dirac[:N / 2] = 1
57
58 # Filter edge
TypeError: slice indices must be integers or None or have an __index__ method

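In Python 3, / is true division, so N / 2 gives the float 125.0, and floats cannot be used as slice indices. Simply make N/2 an integer again:
dirac[:int(N/2)] = 1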
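An alternative to int(N/2) is floor division, N // 2, which returns an int when both operands are ints. A minimal sketch with the values from the question:

```python
import numpy as np

N = 250
dirac = np.zeros(N)

# In Python 3, N / 2 == 125.0 (a float), which cannot be used as a slice index.
# Floor division N // 2 == 125 stays an int and works directly.
dirac[:N // 2] = 1
print(dirac[:3], dirac[-3:])
```

For a non-negative integer N like this one, int(N/2) and N // 2 give the same result; // simply avoids the round trip through a float.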

How to set a condition statement on a loop process

I am using Python 3. I have a problem setting a condition statement over some groups in a loop (to consider a pixel only when there are more than 5 available data points), and I expect to get a blank pixel when the condition isn't satisfied.
I tried an 'if' statement, but I keep getting a KeyError, apparently when the condition isn't satisfied.
Here is the code:
Xpix = 78
Ypix = 30
row = []
mean_val = []
for i in range(0, Ypix):
    for j in range(0, Xpix):
        if(len(data_pixel.groupby(['lin','col']).get_group((i,j))[['gamma']])>=5):
            means = data_pixel.groupby(['lin','col']).get_group((i,j))[['gamma']].mean()
        else:
            means = 0
        row.append(means)
mean_val = np.array(row).reshape(Ypix, Xpix)
I expect a 78 x 30 array to plot with blank pixels and mean pixels.
Here is the error I got:
Traceback (most recent call last):
File "map.py", line 415, in <module>
proc.process()
File "map.py", line 215, in process
if (len(data_pixel.groupby(['lin', 'col']).get_group((i,j))[['gamma']])>=5):
File "/xxx/yyy/anaconda3/envs/gnss/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 680, in get_group
raise KeyError(name)
KeyError: (10,41)
data_pixel is a big DataFrame with a lot of data. I would really appreciate it if anyone could help with this.
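The KeyError comes from get_group() itself: it raises KeyError when the requested (lin, col) combination does not exist in the DataFrame, as for pixel (10, 41) here, so the length check never gets a chance to run. One way around this, sketched below under the assumption that lin and col hold integer pixel coordinates starting at 0, is to compute the count and mean of gamma for all pixels in a single groupby and then place them on a full grid, leaving NaN where a pixel has no data or too few values. pixel_means is a hypothetical helper name, and the small DataFrame is made-up test data.

```python
import numpy as np
import pandas as pd


def pixel_means(data_pixel, ypix, xpix, min_count=5):
    """Mean of 'gamma' per (lin, col) pixel on a (ypix, xpix) grid.

    Pixels with fewer than min_count values, or with no rows at all,
    are left as NaN instead of raising KeyError.
    """
    # One groupby for the whole DataFrame instead of one get_group() per pixel.
    stats = data_pixel.groupby(['lin', 'col'])['gamma'].agg(['count', 'mean'])
    # Keep the mean only where the pixel has enough data points.
    means = stats['mean'].where(stats['count'] >= min_count)
    # Turn the (lin, col) index into a 2-D table; missing pixels become NaN.
    grid = means.unstack('col').reindex(index=range(ypix), columns=range(xpix))
    return grid.to_numpy()


# Made-up data: pixel (0, 0) has 5 values, pixel (1, 1) only 2.
df = pd.DataFrame({
    'lin':   [0, 0, 0, 0, 0, 1, 1],
    'col':   [0, 0, 0, 0, 0, 1, 1],
    'gamma': [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0],
})
print(pixel_means(df, ypix=2, xpix=3))
```

With the question's variables, mean_val = pixel_means(data_pixel, Ypix, Xpix) would give the same (Ypix, Xpix) shape as the reshape in the loop. NaN cells are typically drawn as blank by matplotlib's imshow, which matches the "blank pixel" requirement.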

Unable to index into a list

I am writing a program that reads the output of another program line by line and puts it in a list.
#!/usr/bin/python
import subprocess
def RECEIVE(COMMAND):
    PROCESS = subprocess.Popen(COMMAND, stdout=subprocess.PIPE)
    LINES = iter(PROCESS.stdout.readline, "")
    for LINE in LINES:
        RECARR = LINE.split()
        print RECARR[14]
RECEIVE(["receivetest","-f=/dev/pcan32"])
The output from the receivetest program is:
19327481.401 receivetest: m s 0x0000000663 8 2f 00 42 02 00 e4 8a 8a
19327481.860 receivetest: m s 0x000000069e 8 00 1f 5e 28 34 83 59 1a
It is a constant stream of messages. After splitting, the list has 14 elements; to make sure, I used:
print len(RECARR)
This gave me an output of 14.
But whenever I try to print the last element:
print RECARR[14]
I get the following error:
file "./cancheck.py", line 10, in RECEIVE
print RECARR[14]
IndexError: list index out of range
This is caused by some erroneous text printed at the top of the list, so I need a way to make sure that the program only reads lines that start with something like
1234567.123
/^(.......\.\d{1,3}) (.*)$/
Any ideas?

Based on the sample data you provided, the length of RECARR is always 14.
14 is the size of the list, not the maximum index. To get the final element of the array, you can try RECARR[13] for this list, or RECARR[-1] in general.
The reason for this is that in Python, as in most programming languages, array indices are zero-based: the first element is accessed with RECARR[0], the second with RECARR[1], and so on. So, the 14th element (or the last one, in your case) would be accessed with RECARR[13].
So, your for loop would look something like this:
for LINE in LINES:
    RECARR = LINE.split()
    print RECARR[13]  # or RECARR[-1]

Right everyone, it's a terrible workaround, but I worked out that the only lines with exactly 14 elements are the lines I need, so I fixed the issue with the following:
for LINE in LINES:
    RECARR = LINE.split()
    if len(RECARR) == 14:
        pass  # do stuff with RECARR here
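A more targeted alternative to counting fields is to keep only lines that start with a timestamp, as the question originally asked. A minimal sketch using Python's re module; the pattern ^\d+\.\d{1,3}\s is an assumption based on the sample output (digits, a dot, up to three decimals, then whitespace), message_lines is a hypothetical helper name, and the first sample line is invented to stand in for the unwanted header text.

```python
import re

# Assumed pattern: digits, a dot, 1-3 digits, then whitespace (e.g. "19327481.401 ").
TIMESTAMP_LINE = re.compile(r'^\d+\.\d{1,3}\s')


def message_lines(lines):
    """Yield only the lines that begin with a timestamp."""
    for line in lines:
        if TIMESTAMP_LINE.match(line):
            yield line


sample = [
    'receivetest: some status text at the top',  # invented header line
    '19327481.401 receivetest: m s 0x0000000663 8 2f 00 42 02 00 e4 8a 8a',
    '19327481.860 receivetest: m s 0x000000069e 8 00 1f 5e 28 34 83 59 1a',
]
for line in message_lines(sample):
    fields = line.split()
    print(fields[-1])
```

Under Python 3, passing universal_newlines=True to subprocess.Popen makes stdout yield str lines, so the same filter can be applied to iter(PROCESS.stdout.readline, "").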

List indexes start from 0 and not 1. So
print RECARR[1]
prints the 2nd element and not the first. Thus to print the last element you have to use print RECARR[13] or negative index print RECARR[-1].
As a result, the last element can be accessed using either -1 or the length of the list minus 1.
An easier way to gauge the ranges is to put each index at the left edge of its cell (courtesy of Aristide):
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
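The diagram can be checked directly in Python. A short sketch using the string from the diagram and one of the sample lines from the question:

```python
word = 'Python'

# Index i refers to the cell to the right of edge i in the diagram.
print(word[0], word[5], word[-1], word[-6])  # P n n P

# A slice word[i:j] takes the cells between edges i and j.
print(word[1:4])  # yth
print(word[-2:])  # on

recarr = '19327481.401 receivetest: m s 0x0000000663 8 2f 00 42 02 00 e4 8a 8a'.split()
print(len(recarr))               # 14
print(recarr[13] == recarr[-1])  # True
# recarr[14] would raise IndexError: the largest valid index is len(recarr) - 1.
```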

You could also have done something like this:
try:
    print RECARR[13]
except IndexError:
    pass
This way you can easily handle the lines that are not long enough as well.

Data conversion error with numpy

I am in the process of making my code nicer, and I saw that NumPy already has some very nifty built-in functions. However, the following code throws an error that I cannot explain:
import numpy as np
from matplotlib.pyplot import plot
data = np.genfromtxt('table.oout',unpack=True,names=True,dtype=None)
real_ov_data=np.float32(data['real_overlap'])
ana_ov_data= np.float32(data['Analyt_overlap'])
length_data =np.float32(data['Residues'])
plot(length_data,real_ov_data,label="overlapped Peaks, exponential function",marker="x", markeredgecolor="blue", markersize=3.0, linestyle=" ",color="blue")
plot(length_data,ana_ov_data,label="expected overlapped Peaks",marker="o", markeredgecolor="green", markersize=3.0, linestyle=" ",color="green")
throws the error:
Traceback (most recent call last):
File "length_vs_overlap.py", line 52, in <module>
real_ov_data=np.float32(data['real_overlap'])
ValueError: invalid literal for float(): real_overlap
>Exit code: 1
when I try to read the following file:
'Residues' 'Analyt_overlap' 'anz_analyt_overlap' 'real_overlap'
21 1.2502 29 0.0000
13 1.0306 25 0.0000
56 5.8513 84 2.8741
190 68.0940 329 28.4706
54 5.4271 83 2.4999
What am I doing wrong? My code should be simple enough.

You've either repeated the header line, or you're specifying the names as a list.
That's causing each column to be read as a string type starting with the column title.
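For comparison, a minimal sketch of reading the sample table when it has exactly one header line, with the data inlined via io.StringIO instead of read from table.oout. unpack=True is left out so that data stays one structured array that can be indexed by column name; whether this matches the original file exactly is an assumption.

```python
import io
import numpy as np

# The sample table from the question, with a single header line.
TABLE = """'Residues' 'Analyt_overlap' 'anz_analyt_overlap' 'real_overlap'
21 1.2502 29 0.0000
13 1.0306 25 0.0000
56 5.8513 84 2.8741
190 68.0940 329 28.4706
54 5.4271 83 2.4999
"""

# names=True reads the field names from the first line (genfromtxt drops the
# quote characters); dtype=None lets it infer a numeric type for each column.
data = np.genfromtxt(io.StringIO(TABLE), names=True, dtype=None, encoding=None)

print(data.dtype.names)
real_ov_data = data['real_overlap'].astype(np.float32)
length_data = data['Residues'].astype(np.float32)
print(length_data, real_ov_data)
```

If the header line appeared a second time among the data rows, dtype=None would infer a string type for every column, which reproduces the "invalid literal for float()" error from the question.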
