Deleting values from array with np.diff - python

I need to edit an array. The array has two columns: one for X-values, the other for Y-values. The X-values go in 0.0025 steps (0, 0.0025, 0.005, etc.), but sometimes there are wrong steps and I need to delete those rows. Others recommended that I use the following:
data = data[~np.r_[True, (np.diff(data[:,0])>0)&(np.diff(data[:, 0])<0.0024)]]
The problem is that the first value always gets deleted, and the second problem is that it doesn't just delete the wrong step but also the one after it.

The first element is always deleted because you invert the output of np.r_, which prepends a True to the output of np.diff. Under ~, that True becomes a False, and thus the first element gets dropped.
My guess as to why the step after the wrong one also gets deleted: np.diff takes the difference between consecutive elements. Consider:
0.0025, 0.005, 0.008, 0.01, 0.0125
               ~~~~~
# The diff here is going to look like:
0.0025, 0.003, 0.002, 0.0025
Note how the wrong element results in a wrong diff both before AND after that element.
If that is unexpected behavior, then you should not use np.diff; instead, compare with the expected steps directly using np.arange:
import numpy as np
# Solution:
data = data[np.isclose(data[:, 0], np.arange(start, stop, 0.0025))]
# where I'm guessing start=0 and stop=data.shape[0]*0.0025
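Put together, a minimal runnable sketch of that idea with made-up data containing one wrong step; the expected grid is built as np.arange(len(data)) * 0.0025 here so its length is guaranteed to match the data:

import numpy as np

# hypothetical data: X should step by 0.0025, but the third row is a wrong step
data = np.array([[0.0000, 1.0],
                 [0.0025, 2.0],
                 [0.0080, 3.0],   # wrong: 0.0050 expected here
                 [0.0075, 4.0],
                 [0.0100, 5.0]])

expected = np.arange(len(data)) * 0.0025          # 0, 0.0025, 0.005, ...
data = data[np.isclose(data[:, 0], expected)]     # keeps every row except the wrong one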

Backchange enumeration mistakes

I have to change an array with an enumeration. It's made of 0.0025 steps, but because of a method I use, the values change slightly. So it looks kind of like this:
[0, 0]
[0.002499989, 1]
[0.0049989, 2]
[0.00749989, 3]
[0.0103, 4]
I can't just round them to the fourth decimal, because towards the end of the array they get significantly bigger than they should be; e.g. the last value is 21.1892 instead of 21.1875.
So I tried the following:
def enumeration(data):
    data = np.round_(data[:, 0], 4) - (np.round_(data[:, 0], 4) % 0.0025)
    return data
This works fine for all values except those that are divisible by 0.0075, i.e. 0.0075, 0.015, 0.0225, etc.
Those values get changed to the previous step, so 0.0075 -> 0.005, 0.015 -> 0.0125, 0.0225 -> 0.02.
I have no idea why that's the case; if anybody could explain it to me, that would be great.
One solution is to build the list as multiples of 0.0025 directly:
data = np.array([[0.0025*i, i] for i in range(n)])
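The reason the 0.0075 multiples misbehave is binary floating point: 0.0075 and 0.0025 are not exactly representable, so the modulo comes out as (almost) a full step instead of 0, and the subtraction then drops you to the previous step. A small sketch of both the failure and the rebuild above (the n below is just the row count of your array):

import numpy as np

print(0.0075 % 0.0025)            # ~0.0025 (not 0) with IEEE-754 doubles
print(0.0075 - 0.0075 % 0.0025)   # ~0.005, i.e. the "previous" step

# rebuilding the first column from the enumeration index avoids the issue entirely
n = data.shape[0]
data = np.array([[0.0025 * i, i] for i in range(n)])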

inverse elements except for zero in a numpy vector

In Python, I'm trying to invert a numpy vector, except for the elements with zero values.
I used the vectorize function, but I always get a wrong answer when the first element is zero (the code works well when there are no zeros in the first position).
active_N=np.array([0,1,3,5])
f=np.vectorize(lambda x:x if x==0 else 1./x)
active_N_inverse=f(active_N)
Running the code, I get
array([0, 0, 0, 0])
What is wrong with the code above?
Is there any other method to solve this problem with high efficiency?
Use np.divide with a where clause:
np.divide(1, active_N, where=active_N!=0)
Optionally combined with round:
np.divide(1, active_N, where=active_N!=0).round(100)
Output:
array([0. , 1. , 0.33333333, 0.2 ])
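As for why the vectorize version fails: without otypes, np.vectorize infers the output dtype from the first computed element, which here is the integer 0, so the float reciprocals get truncated to integers. Also worth noting for the np.divide approach: with where= and no out=, positions where the condition is False are left uninitialized, so it is safer to pass an explicit zero-filled output array. A small sketch:

import numpy as np

active_N = np.array([0, 1, 3, 5])
out = np.zeros(active_N.shape, dtype=float)   # zeros stay in place at the skipped slots
active_N_inverse = np.divide(1, active_N, out=out, where=active_N != 0)
# array([0.        , 1.        , 0.33333333, 0.2       ])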

including a negative number in the log sum of exponents, in python

I want to use numpy's logsumexp() in python 2.7.
The formula I need to solve looks like this:
log(1 + e^a1 + e^a2 + e^a3 + ... + e^an - e^ax)
The last term, which is a negative number, just has to be appended on.
Excluding this last term, I would do the following:
myarray = numpy.array([0, a1, a2, a3, ..., an])
That way, with the first element being 0, e^0 = 1 gives me my first term of 1. Then I would just use
result = numpy.logsumexp(myarray)
and I would get the correct result.
But now I have to append a -e^ax, and because it's negative, I can't simply append ax to the end of myarray. I also can't append -ax, because that would just be wrong: it would mean I'm adding 1/e^ax instead of -e^ax.
Is there any direct way to handle this so that I can still use logsumexp()? The only reason I'm insisting on logsumexp() rather than separately using numpy.exp(), numpy.sum() and numpy.log() is that I have the impression logsumexp also incorporates numerical stability to prevent underflows (correct me if I'm wrong). However, if there's no way around it, then I guess I have no choice.
According to the scipy.misc.logsumexp documentation:
scipy.misc.logsumexp(a, axis=None, b=None)
Parameters:
b: array-like, optional
Scaling factor for exp(a).
Must be of the same shape as a or broadcastable to a.
New in version 0.12.0.
So, you could pass a list of scaling factors like this:
In [2]: a = [0, 1, 3, 2]
In [3]: logsumexp(a, b=[1] * (len(a) - 1) + [-1])
Out[3]: 2.7981810916785101
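In more recent SciPy versions the same function lives at scipy.special.logsumexp with the same b parameter, so a rough sketch for the exact formula in the question (the values of a and ax below are made-up placeholders) would be:

import numpy as np
from scipy.special import logsumexp    # scipy.misc.logsumexp in older SciPy

a = np.array([0.0, 1.2, 0.5, 2.0])     # 0, a1, a2, ..., an (the leading 0 supplies the "+ 1")
ax = 1.0
vals = np.append(a, ax)
b = np.append(np.ones_like(a), -1.0)   # weight +1 for each term in a, -1 for e^ax
result = logsumexp(vals, b=b)          # log(1 + e^a1 + ... + e^an - e^ax)

If the subtraction could make the total negative, logsumexp also accepts return_sign=True, which returns the log of the absolute value together with its sign instead of a NaN.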

Calculate a discrete mean in python

I have a set of data points and a program that goes through the data set, takes every n points, sums them, and puts the sums in a new list. With that I can make simple bar plots.
Now I'd like to calculate a discrete mean for my new list.
The formula I'm using is this: t_av=(1/nsmp) Sum[N_i*t_i,{i,n_l,n_u}]
Basically I have nsmp bins, each containing a count N_i; t_i is the time of bin i, n_l is the first bin, and n_u is the last bin.
So if my list is this: [373, 156, 73, 27, 16],
I have 5 bins, so: t_av = 1/5 * (373*1 + 156*2 + 73*3 + 27*4 + 16*5) = 218.4
Now I have run into a problem. I tried with this:
for i in range(0, len(L)):
    sr_vr = L[i] * i
tsr = sr_vr / nsmp
Here nsmp is the number of bins I can set, and L is already calculated. Since range goes 0, 1, 2, 3, 4, I won't get the correct answer, because my first bin is multiplied by 0. If I write range(1, len(L)+1) I get IndexError: list index out of range, and it also messes up the L[i]*i part, since it would still multiply the second element of the list by 1 and then come up one entry short at the end.
How do I correct this?
You can just use L[i]*(i+1) (assuming you stick with zero-based indexing).
However you can also use enumerate() to loop over indices and values together, and you can even provide 1 as the second argument so that the indexing starts at 1 instead of 0.
Here is how I would write this:
tsr = sum(x * i for i, x in enumerate(L, 1)) / len(L)
Note that if you are on Python 2.x and L contains entirely integers this will perform integer division. To get a float just convert one of the arguments to a float (for example float(len(L))). You can also use from __future__ import division.
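A quick check against the numbers from the question, combining the (i+1) fix with an accumulated sum (the float() cast is only there for Python 2):

L = [373, 156, 73, 27, 16]

# loop version: bin i (0-based) carries time i + 1
tsr = 0
for i in range(len(L)):
    tsr += L[i] * (i + 1)
tsr = tsr / float(len(L))

# or the enumerate one-liner from above
tsr = sum(x * i for i, x in enumerate(L, 1)) / float(len(L))
print(tsr)   # 218.4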

Choosing a random sample from each row of Numpy array, excluding negative numbers

I have a Numpy array that looks like
>>> a
array([[ 3. ,  2. , -1. ],
       [-1. ,  0.1,  3. ],
       [-1. ,  2. ,  3.5]])
I would like to select a value from each row at random, but I would like to exclude the -1 values from the random sampling.
What I do currently is:
x = []
for i in range(a.shape[0]):
    idx = numpy.where(a[i, :] > 0)[0]
    idxr = random.sample(idx, 1)[0]
    xi = a[i, idxr]
    x.append(xi)
and get
>>> x
[3.0, 3.0, 2.0]
This is becoming a bit slow for large arrays and I would like to know if there is a way to conditionally select random values from the original a matrix without dealing with each row individually.
I really don't think you will find anything in Numpy that does exactly what you are asking out of the box, so I've decided to offer what optimizations I could think up.
There are several things that could make this slow here. First off, numpy.where() is rather slow because it has to check every value in the sliced array (the slice is generated for each row as well) and then generate an array of values. The best thing that you could do if you plan on doing this process over and over again on the same matrix would be to sort each row. Then you would just use a binary search to find where the positive values start and just use a random number to select a value from them. Of course, you could also just store the indices where the positive values start after finding them once with binary searches.
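A rough sketch of that sort-then-binary-search idea, assuming every row has at least one positive entry and that you reuse the sorted matrix for many draws (a is the array from the question; the other names are mine):

import numpy as np

rng = np.random.default_rng()
a_sorted = np.sort(a, axis=1)                     # sort each row once, up front
starts = [np.searchsorted(row, 0.0, side='right') for row in a_sorted]  # first positive index per row
# each draw is then just one random column index per row, no where() needed
cols = [rng.integers(s, a_sorted.shape[1]) for s in starts]
x = [a_sorted[i, c] for i, c in enumerate(cols)]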
If you don't plan on doing this process many times over, then I would recommend using Cython to speed up the numpy.where line. Cython would allow you to not need to slice the rows out and speed up the process overall.
My last suggestion is to use random.choice rather than random.sample unless you really do plan on choosing sample sizes that are larger than 1.
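Applying just that last suggestion to the loop from the question looks like this (a minimal tweak, not a rewrite):

import numpy as np
import random

x = []
for i in range(a.shape[0]):
    idx = np.where(a[i, :] > 0)[0]
    x.append(a[i, random.choice(idx)])   # choice picks a single index directly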
