I have to change an array with an enumeration. It's made of 0.0025 steps, but because of a method I use, the values change slightly. So it looks something like this:
[0, 0]
[0.002499989, 1]
[0.0049989, 2]
[0.00749989, 3]
[0.0103, 4]
I can't just round them to the fourth decimal, because towards the end of the array they get significantly bigger than they should be; e.g. the last value is 21.1892 instead of 21.1875.
So I tried the following:
import numpy as np

def enumeration(data):
    rounded = np.round(data[:, 0], 4)
    return rounded - (rounded % 0.0025)
Which works fine for all values except those that are divisible by 0.0075, so 0.0075, 0.015, 0.0225, etc.
Those values get changed to the previous step, so 0.0075 -> 0.005, 0.015 -> 0.0125, 0.0225 -> 0.02.
I have no idea why that's the case; if anybody could explain it to me, that would be great.
One solution is to build the list as multiples of 0.0025 directly:
data = np.array([[0.0025*i, i] for i in range(n)])  # n = number of rows
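As for why exactly the multiples of 0.0075 misbehave: neither 0.0075 nor 0.0025 is exactly representable as a binary float, and the stored 0.0075 happens to land just below three times the stored 0.0025, so the modulo comes out just under a full step instead of 0, and subtracting it knocks the value down by one whole step. A quick check (IEEE 754 doubles; the exact trailing digits may vary):

>>> 0.0075 % 0.0025   # mathematically 0, but...
0.0024999999999999996

If you would rather repair the noisy values in place than rebuild the array, a more robust approach (my suggestion, not part of the answer above) is to snap each value to the nearest multiple of the step by rounding the quotient to an integer, which sidesteps the modulo edge case entirely:

import numpy as np

def snap_to_grid(data, step=0.0025):
    # round the ratio to the nearest integer, then scale back up
    out = data.copy()
    out[:, 0] = np.round(out[:, 0] / step) * step
    return out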
I need to edit an array. The array has two columns, one for X values, the other for Y values. The X values are 0.0025 steps (0, 0.0025, 0.005, etc.), but sometimes there are wrong steps and I need to delete those. Others recommended that I use the following:
data = data[~np.r_[True, (np.diff(data[:,0])>0)&(np.diff(data[:, 0])<0.0024)]]
The problem is that the first value always gets deleted, and the second problem is that it doesn't just delete the wrong step but the one after it too.
The reason the first element is always deleted is that you invert the output of np.r_, which prepends True to the output of np.diff. Under ~, that True becomes False, and so the first element gets dropped.
My guess is that the step after gets deleted too because np.diff takes the difference between consecutive elements. Consider:
0.0025, 0.005, 0.008, 0.01, 0.0125
               ^^^^^ wrong step

# The diff here is going to look like:
0.0025, 0.003, 0.002, 0.0025
Note how the wrong element results in a wrong diff both before AND after that element.
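You can check this directly (numpy rounds the display; the stored diffs carry the usual float noise):

import numpy as np

print(np.diff([0.0025, 0.005, 0.008, 0.01, 0.0125]))
# [0.0025 0.003  0.002  0.0025]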
If that is unexpected behavior, then you should not use np.diff; instead, compare with the expected steps directly using np.arange:
import numpy as np

# Solution: keep only the rows whose X value matches the expected grid
data[np.isclose(data[:, 0], np.arange(start, stop, 0.0025))]
# with, I'm guessing, start=0 and stop=data.shape[0]*0.0025
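As a self-contained sketch of that idea, using the five example values from above (building the expected grid from an integer arange is my tweak, to avoid float-step drift inside arange itself):

import numpy as np

data = np.array([[0.0025, 0], [0.005, 1], [0.008, 2], [0.01, 3], [0.0125, 4]])
expected = (np.arange(data.shape[0]) + 1) * 0.0025   # 0.0025, 0.005, 0.0075, 0.01, 0.0125
cleaned = data[np.isclose(data[:, 0], expected)]     # drops only the 0.008 row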
I am trying to write a sigma clipping program that calculates the differences between each point in an array and its neighbor, and if the difference is greater than x times the standard deviation of the array, it sets the neighbor equal to the average of the two points closest to it. For example, if I had an array, testarray = np.array([1.01, 2.0, 1.22, 1.005, .996, 0.95]), and wanted to change any points that were more than 2 times deviant from their neighbor, then this function would search through the array and set the 2.0 in the testarray equal to 1.115, the average of 1.01 and 1.22.
import numpy as np

def sigmaclip2(array, stand):
    originalDeviation = np.std(array)
    differences = np.abs(np.diff(array))
    for i in range(len(differences)):
        if differences[i] > stand * originalDeviation:
            if array[i+1] != array[-1]:
                array[i+1] = (array[i] + array[i+2]) / 2.0
            else:
                array[i+1] = (array[i] + array[i-1]) / 2.0
        else:
            pass
    return array
This code works for this small testarray, but I am working with a larger data set (~12000 elements), and when I try to run it on the larger data set, I get the same array back that I plugged in.
Does anyone know what might be going wrong?
I should note that I have tried some of Python's built-in sigma clipping routines, such as the one from Astropy, but it appears that those cut off any values that are greater than x times the standard deviation of the array. This is not what I want to do. I want to find any large, sudden jumps (often caused by one bad value) and set that bad value equal to the average of the two points around it, if the bad value is more than x times the standard deviation away from its neighbor.
In line 6 of your function, array[-1] may be a typo, as it always refers to the last element of the array. Are you missing an i? In that case you might need to shift by one, since differences[0] is the diff between array[0] and array[1].
PS: I think I would use np.where with slice notation on the array to find just the indexes to alter, rather than using a normal Python loop. With numpy, a loop is almost always a bad idea.
EDIT
I understand about the edges, but I don't think your code does what you expect: when I run it, it averages array[2] to 1.06 as well as array[1] to 1.115.
If I change line 6 to if array[i+1] != array[i-1]: (array[-1] is the last entry, always 0.95), it still doesn't work properly.
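For reference, running the unmodified function on the testarray reproduces that (numpy's printed formatting may differ slightly):

import numpy as np

testarray = np.array([1.01, 2.0, 1.22, 1.005, 0.996, 0.95])
print(sigmaclip2(testarray, 2))
# [1.01  1.115 1.06  1.005 0.996 0.95 ]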
You also have to think about what you want your code to do when you get more than one outlier, e.g. 1.01, 2.0, 2.25, 1.99, 1.22, 1.005, 0.996, 0.95. To cope with single outliers I would use something like:
import numpy as np

def sigmaclip3(array, stand):
    cutoff = stand * np.std(array)
    diffs = np.abs(np.diff(array))
    # a single bad value has a large jump on BOTH sides,
    # so require the diffs before and after it to exceed the cutoff
    ix = np.where((diffs[:-1] > cutoff) &
                  (diffs[1:] > cutoff))[0] + 1
    array[ix] = (array[ix - 1] + array[ix + 1]) / 2.0
    return array
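A quick check with the testarray from the question (1.115 is the value the question itself expects):

testarray = np.array([1.01, 2.0, 1.22, 1.005, 0.996, 0.95])
print(sigmaclip3(testarray, 2))
# [1.01  1.115 1.22  1.005 0.996 0.95 ] -- the 2.0 becomes 1.115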
I want to use numpy's logsumexp() in Python 2.7.
The formula I need to solve looks like this:
log(1 + e^a1 + e^a2 + e^a3 + ... + e^an - e^ax)
The last term, which is a negative quantity, just has to be appended on.
Excluding this last term, I would do the following:
myarray = numpy.array([0, a1, a2, a3, ..., an])
That way, with the first element being 0, e^0 = 1 and so I have my first term, which is 1. Then I would just use
result = numpy.logsumexp(myarray)
and I would get the correct result.
But now I have to append a -e^ax, and because it's negative, I can't simply append ax to the end of myarray. I also can't append -ax, because that's just wrong; it would mean that I'm adding 1/e^ax instead of -e^ax.
Is there any direct way to append this so that I can still use logsumexp()? The only reason I'm insisting on using logsumexp() rather than separately using numpy.exp(), numpy.sum(), and numpy.log() is that I have the impression logsumexp incorporates stability measures to prevent underflows (correct me if I'm wrong). However, if there's no other way around it, then I guess I have no choice.
According to the scipy.misc.logsumexp documentation:

scipy.misc.logsumexp(a, axis=None, b=None)

Parameters:
    b : array-like, optional
        Scaling factor for exp(a).
        Must be of the same shape as a or broadcastable to a.
        New in version 0.12.0.
So you could pass a list of scaling factors like this:

In [1]: from scipy.misc import logsumexp

In [2]: a = [0, 1, 3, 2]

In [3]: logsumexp(a, b=[1] * (len(a) - 1) + [-1])
Out[3]: 2.7981810916785101
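That call computes log(e^0 + e^1 + e^3 - e^2), which is exactly the desired form with ax as the last element. And logsumexp does what you hope: it shifts everything by the array maximum before exponentiating, so the stability you wanted is preserved. As a script-style sketch with the same numbers (note that in current SciPy versions logsumexp lives in scipy.special rather than scipy.misc):

import numpy as np
from scipy.special import logsumexp  # scipy.misc.logsumexp in older SciPy

a = np.array([0.0, 1.0, 3.0, 2.0])  # the leading 0 supplies the "+ 1" term
b = np.ones_like(a)
b[-1] = -1.0                        # flip the sign of the e^ax term
result = logsumexp(a, b=b)          # log(1 + e^1 + e^3 - e^2) ~= 2.798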
I'm still relatively new to Python, and I've been attempting to iterate through a solution array obtained from odeint, to no avail. I've tried many different things and get a slew of errors no matter which way I go about it. The odeint result is a waveform, and I'm attempting to find all the maximum and minimum voltages to calculate the midpoint, etc. I have this working in MATLAB and posted the code so you can see what my goal is:
for i = 1:length(t)-2
    if Y(i+1,1) > Y(i,1) && Y(i+1,1) > Y(i+2,1)
        max = [max, [t(i+1); Y(i+1,1)]];
    end
    if Y(i+1,1) < Y(i,1) && Y(i+1,1) < Y(i+2,1)
        min = [min, [t(i+1); Y(i+1,1)]];
    end
end
% remove any max not followed by a min & mins not following a max for midpoint calc
if max(1,1) > min(1,1)
    min(:,1) = [];
elseif min(1,end) < max(1,end)
    max(:,end) = [];
end
midpt = [(max(1,:) + min(1,:))/2; (max(2,:) + min(2,:))/2];
I apologize if this code is bad; I'm still new to programming and don't often approach things the right way. Here is a piece of the Python code so you can see what I need to loop:
from numpy import linspace
from scipy.integrate import odeint

t = linspace(0, 3500, 350000)
y_init = [-50, -50, 0.027, 0.891, 0.033, 0.051, 0.499,
          0.019, 0.043, 0.031, 0.000, 0.062, 0.22,
          0.008069, 0.560552, 0.045224, 1.060]
sol = odeint(dy_dt, y_init, t)  # dy_dt is defined elsewhere
S0 = sol[:, 0]
I need to loop through S0 here like in the MATLAB code. I think my main problem is indexing the array so that I can get at the values of S0. I tend to get a "not callable" or float64 error and was hoping for some advice.
To iterate through the values in S0 you just need to do this:
for val in S0:
    ...  # do something with val
You don't need to worry about the indexes like you do in MATLAB, because you can iterate over objects directly in Python. The MATLAB-like way implemented in Python would look something like this:
for i in range(len(S0)):
    ...  # do something with S0[i]
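Since the actual goal is the max/min detection from the MATLAB snippet, here is a sketch of how that loop vectorizes in numpy (t and S0 are from your code; the names maxima/minima and the (time, value) row layout are my own choices):

import numpy as np

# interior points strictly larger/smaller than both neighbours
is_max = (S0[1:-1] > S0[:-2]) & (S0[1:-1] > S0[2:])
is_min = (S0[1:-1] < S0[:-2]) & (S0[1:-1] < S0[2:])

max_idx = np.where(is_max)[0] + 1   # +1 shifts back to S0's indexing
min_idx = np.where(is_min)[0] + 1

maxima = np.column_stack((t[max_idx], S0[max_idx]))  # rows of (time, value)
minima = np.column_stack((t[min_idx], S0[min_idx]))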
Hope this helps.
I have a set of data points, for which I have made a program that will look into the data set, take every n points, sum them, and put them in a new list. With that I can make simple bar plots.
Now I'd like to calculate a discrete mean for my new list.
The formula I'm using is this: t_av=(1/nsmp) Sum[N_i*t_i,{i,n_l,n_u}]
Basically I have nsmp bins with N_i entries in them, t_i is the time of a bin, n_l is the first bin, and n_u is the last bin.
So if my list is this: [373, 156, 73, 27, 16],
I have 5 bins, and I have: t_av=1/5 (373*1+156*2+73*3+27*4+16*5)=218.4
Now I have run into a problem. I tried with this:
for i in range(0, len(L)):
    sr_vr = L[i]*i
tsr = sr_vr/nsmp
Where nsmp is the number of bins I can set, and I have L calculated. Since range will go 0, 1, 2, 3, 4, I won't get the correct answer, because my first bin is multiplied by 0. If I say range(1, len(L)+1) I'll get IndexError: list index out of range, since that messes up the L[i]*i part: it will still multiply the second element of the list by 1, and then it will be one entry short for the last element.
How do I correct this?
You can just use L[i]*(i+1) (assuming you stick with zero-based indexing).
However, you can also use enumerate() to loop over indices and values together, and you can even provide 1 as the second argument so that the indexing starts at 1 instead of 0.
Here is how I would write this:
tsr = sum(x * i for i, x in enumerate(L, 1)) / len(L)
Note that if you are on Python 2.x and L contains only integers, this will perform integer division. To get a float, just convert one of the arguments to a float (for example float(len(L))). You can also use from __future__ import division.
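A quick check with the list from the question (the expected 218.4 is worked out in the question itself):

L = [373, 156, 73, 27, 16]
tsr = sum(x * i for i, x in enumerate(L, 1)) / float(len(L))
print(tsr)  # 218.4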