I'm struggling a little with stacking two matrices on top of each other. I'm using the pyKalman package, whose update step returns a tuple of two arrays: the updated estimate (new_pred, a 1 x 2 vector) and the corresponding covariance matrix (new_cov, a 2 x 2 matrix).
After the update, I want to append the returned values to their corresponding output arrays, so that the data is smoothed recursively through these estimates.
The following is how it is currently implemented.
for meas in onlineObservations:
    (new_pred, new_cov) = kf.filter_update(states_pred[-1], cov_pred[-1], meas)
    states_pred = np.vstack((states_pred, new_pred))
    cov_pred = np.stack((cov_pred, new_cov), axis=0)
This works really well for the updated estimate (the 1x2 vector), but fails when I try to add new_cov to the array called cov_pred. For good measure:
states_pred.shape = (900,2)
cov_pred.shape = (900, 2, 2)
I've tried changing the axis of "stack" to no avail. It's probably something elementary, but I've been struggling with it for the past hour and cannot seem to find a "simple" solution.
Thanks in advance.
This should work -
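cov_pred = list(cov_pred)  # list of (2, 2) arrays; keeps the existing entries so cov_pred[-1] is valid
for meas in onlineObservations:
    (new_pred, new_cov) = kf.filter_update(states_pred[-1], cov_pred[-1], meas)
    states_pred = np.vstack((states_pred, new_pred))
    cov_pred.append(new_cov)
cov_pred = np.stack(cov_pred, axis=0)  # back to an array of shape (n, 2, 2)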
But since you want to keep updating an array that the loop itself reads from, you could use np.concatenate instead:
for meas in onlineObservations:
    (new_pred, new_cov) = kf.filter_update(states_pred[-1], cov_pred[-1], meas)
    states_pred = np.vstack((states_pred, new_pred))
    cov_pred = np.concatenate((cov_pred, np.reshape(new_cov, (1,2,2))), axis=0)
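To see why the np.stack call from the question fails while np.concatenate works, here is a small sketch with placeholder arrays (the zeros and the identity matrix are stand-ins chosen for illustration, not pyKalman output):

```python
import numpy as np

cov_pred = np.zeros((900, 2, 2))   # stand-in for the existing covariance history
new_cov = np.eye(2)                # stand-in for one covariance from filter_update

# np.stack requires all inputs to have the same shape,
# so combining (900, 2, 2) with (2, 2) raises ValueError.
try:
    np.stack((cov_pred, new_cov), axis=0)
    stack_failed = False
except ValueError:
    stack_failed = True

# Giving new_cov a leading axis makes it (1, 2, 2), which
# concatenates cleanly along axis 0.
appended = np.concatenate((cov_pred, new_cov[np.newaxis]), axis=0)
print(stack_failed, appended.shape)
```

np.stack creates a new axis and joins equally shaped arrays along it, whereas np.concatenate joins along an existing axis, which is what appending one more 2 x 2 matrix needs.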
I've been able to make it work by converting cov_pred to a list, and then use:
cov_pred.append(new_cov)
And then re-convert it back again after the for loop. But it seems tedious - at least if there's an even better way!
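If the number of observations is known before the loop starts, another option is to preallocate both arrays once and write into them by index, which avoids both the list conversion and the repeated copying. A minimal sketch, assuming a hypothetical fake_filter_update that only mimics the shapes returned by kf.filter_update:

```python
import numpy as np

def fake_filter_update(prev_state, prev_cov, meas):
    """Hypothetical stand-in for kf.filter_update: returns a (2,) state and a (2, 2) covariance."""
    return 0.5 * (prev_state + meas), 0.9 * prev_cov

observations = np.ones((5, 2))     # placeholder measurements
n_old = 3                          # number of estimates that already exist

# Allocate room for the old and the new estimates in one go.
states = np.empty((n_old + len(observations), 2))
covs = np.empty((n_old + len(observations), 2, 2))
states[:n_old] = 0.0               # placeholder history
covs[:n_old] = np.eye(2)

# Each update reads row k - 1 and writes row k; nothing is reallocated.
for k, meas in enumerate(observations, start=n_old):
    states[k], covs[k] = fake_filter_update(states[k - 1], covs[k - 1], meas)
```

Growing an array with vstack or concatenate copies the whole array on every iteration, so for long runs the preallocated version should scale better.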
You can keep your code inside a For Loop (a While Loop will also do) and use 'Auto-index Enabled', and that's it.
At the output of the loop, LabVIEW will create a 3D array, exactly as you require.
I'm starting to use numpy. I get the slice notations and element-wise computations, but I can't understand this:
for i, (I,J) in enumerate(zip(data_list[0], data_list[1])):
    joint_hist[int(np.floor(I/self.bin_size))][int(np.floor(J/self.bin_size))] += 1
Variables:
data_list contains two np.array().flatten() images (eventually more)
joint_hist[] is the joint histogram of those two images, it's displayed later with plt.imshow()
bin_size is the number of slots in the histogram
I can't understand why the coordinate in the final histogram is I,J. So it's not just that the value at a position in joint_hist[] is the result of some slicing/element-wise computation. I need to take the result of that computation and use THAT as the indices in joint_hist...
EDIT:
I indeed do not use the i in the loop actually - it's a leftover from previous iterations and I simply hadn't noticed I didn't need it anymore
I do want to remain in control of the bin sizes and the details of how this is done, so I'm not particularly looking to use histogram2d. I will later be using this for further image processing, so I'd rather have the flexibility to adapt my approach than have to figure out if and how to do particular things with built-in functions.
You can indeed gussy up that for loop using some numpy notation. Assuming you don't actually need i (since it isn't used anywhere):
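# np.asarray turns a list of two flattened images into a 2 x N array, so that .T is available
for I, J in (np.asarray(data_list).T // self.bin_size).astype(int):
    joint_hist[I, J] += 1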
Explanation
data_list.T flips data_list on its side. Each row of data_list.T will contain the data for the pixels at a particular coordinate.
data_list.T // self.bin_size will produce the same result as np.floor(I/self.bin_size), only it will operate on all of the pixels at once, instead of one at a time.
.astype(int) does the same thing as int(...), but again operates on the entire array instead of a single element.
When you iterate over a 2D array with a for loop, the rows are returned one at a time. Thus, the for I,J in arr syntax will give you back one pair of pixels at a time, just like your zip statement did originally.
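The remaining Python loop can also be removed entirely. The following is a sketch of one way to do it, using np.add.at to accumulate counts at repeated index pairs. It assumes bin_size is the width of a bin in intensity units (as the division in the loop implies); the function name and the n_bins parameter are introduced here for illustration only:

```python
import numpy as np

def joint_histogram(img_a, img_b, bin_size, n_bins):
    """Count how often each pair of bin indices occurs (hypothetical helper)."""
    a_idx = (np.asarray(img_a).ravel() // bin_size).astype(int)
    b_idx = (np.asarray(img_b).ravel() // bin_size).astype(int)
    hist = np.zeros((n_bins, n_bins), dtype=int)
    # np.add.at adds 1 for every (a, b) pair, including repeated pairs;
    # plain hist[a_idx, b_idx] += 1 would count each repeated pair only once.
    np.add.at(hist, (a_idx, b_idx), 1)
    return hist

a = np.array([0, 10, 20, 250])
b = np.array([0, 12, 100, 255])
hist = joint_histogram(a, b, bin_size=16, n_bins=16)
print(hist[0, 0], hist[1, 6], hist[15, 15])
```

This keeps the binning rule fully under your control, since the index computation is the same floor division as in the loop.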
Alternative
You could also just use histogramdd to calculate joint_hist, in place of your for loop. For your application it would look like:
import numpy as np
joint_hist, edges = np.histogramdd(np.asarray(data_list).T)  # asarray: data_list may be a plain list of arrays
This would have different bins than the ones you specified above, though (numpy would determine them automatically).
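The built-in functions also accept explicit bin edges, so the bins do not have to be chosen automatically. A small sketch with np.histogram2d, assuming 8-bit intensities in the range 0 to 255 and a bin width of 16 (both values are assumptions made for the example):

```python
import numpy as np

a = np.array([0, 10, 20, 250])     # stand-in for data_list[0]
b = np.array([0, 12, 100, 255])    # stand-in for data_list[1]

bin_size = 16                                      # bin width in intensity units
edges = np.arange(0, 256 + bin_size, bin_size)     # 0, 16, ..., 256 -> 16 bins

# Passing the same edges for both axes fixes the bins explicitly.
hist, x_edges, y_edges = np.histogram2d(a, b, bins=[edges, edges])
print(hist.shape, hist[0, 0], hist[1, 6], hist[15, 15])
```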
If I understand correctly, your goal is to build a histogram of correlated values in your images? To get the right bin index, the computation you used is not valid. Instead of np.floor(I/self.bin_size), use np.floor(I/(I_max/bin_size)).astype(int): you want to divide I and J by their respective resolutions. If data_list[0] and data_list[1] are the same flattened image, joint_hist will come out as a diagonal matrix.
So all put together:
bin_size = 256  # number of bins per axis
I_max = int(data_list[0].max()) + 1  # int() avoids overflow for small integer dtypes such as uint8
J_max = int(data_list[1].max()) + 1
joint_hist = np.zeros((bin_size, bin_size))  # one cell per pair of bin indices
for I, J in zip(data_list[0], data_list[1]):
    joint_hist[int(np.floor(I / (I_max / bin_size))), int(np.floor(J / (J_max / bin_size)))] += 1
I've recently "taught" myself Python in order to analyze data for my experiments. As such I'm pretty clueless on many aspects. I've managed to make my analysis work for certain files, but in some cases it breaks down, and I imagine that is a result of faulty programming.
Currently I export a file containing 3 numpy arrays. One of these arrays is my signal (float values from -10 to 10). What I wish to do is normalize every datum in this array to a range of values that precede it, i.e. the 30001st value must have the average of the preceding 3000 values subtracted from it, and the difference must then be divided by this very same average (of the preceding 3000 values). My data is collected at a rate of 100 Hz, so to normalize to the last 30 s I must use the preceding 3000 values.
As it stand this is how I've managed to make it work:
this stores the signal into the variable photosignal
photosignal = np.array(seg.analogsignals[0], ndmin=1)
now this is the part I use to get the delta F/F over a moving window of 30 s
normalizedphotosignal = [(uu-(np.mean(photosignal[uu-3000:uu])))/abs(np.mean(photosignal[uu-3000:uu])) for uu in photosignal[3000:]]
The following adds 3000 values to the beginning to keep the array the same length, since later on I must time-lock it to another list of the same length
holder =list(range(3000))
normalizedphotosignal = holder + normalizedphotosignal
What I have noticed is that for certain files this code gives me an error, because it says that the "slice" is empty and therefore it cannot compute a mean.
I think maybe there is a better way to program this that could avoid this problem altogether. Or is this a correct way to approach this problem?
So I tried the solution, but it is quite slow and it still gives me the "empty slice" error.
I went over the moving average post and found this method:
def running_mean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N
However, I'm having trouble adapting it to my desired output, namely (x - running average) / running average.
All right, so I finally figured it out thanks to your help and the posts you referred me to.
The calculation for my entire data (300 000 +) takes about a second!
I used the following code:
def runningmean(x, N):
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / N
photosignal = np.array(seg.analogsignals[0], ndmin=1)
photosignalaverage = runningmean(photosignal, 3000)
holder = np.zeros(2999)
photosignalaverage = np.append(holder, photosignalaverage)
detalfsignal = (photosignal - photosignalaverage) / abs(photosignalaverage)
Photosignal stores my raw signal in a numpy array.
Photosignalaverage uses cumsum to calculate the running average for every data point in photosignal. I then add the first 2999 values as 0 to maintain the same size as my photosignal.
I then use basic numpy calculations to get my delta F/F signal.
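Note that in the code above the window at each index includes the current sample, and the zero padding leads to a division by zero for the first 2999 entries. As a hedged variation (not the code used above), the sketch below takes the mean of the n samples strictly before each point, as described in the question, and pads the start with NaN instead. The function name delta_f_over_f is introduced here for illustration:

```python
import numpy as np

def delta_f_over_f(x, n):
    """(x[i] - mean(x[i-n:i])) / |mean(x[i-n:i])| for i >= n, NaN before that."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))      # csum[j] == x[:j].sum()
    # baseline[k] is the mean of x[k:k+n], i.e. the n samples preceding index k + n
    baseline = (csum[n:-1] - csum[:-n - 1]) / n
    out = np.full(x.shape, np.nan)                    # first n entries have no full window
    out[n:] = (x[n:] - baseline) / np.abs(baseline)
    return out

signal = np.array([1.0, 1.0, 1.0, 2.0, 4.0])
dff = delta_f_over_f(signal, 3)
print(dff)
```

With the real data the call would be delta_f_over_f(photosignal, 3000); the NaN entries make it explicit which samples have no valid 30 s baseline.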
Thank you once more for the feedback, was truly helpful!
Your approach goes in the right direction. However, you made a mistake in your list comprehension: you are using uu as your index whereas uu are the elements of your input data photosignal.
You want something like this:
normalizedphotosignal2 = np.zeros(photosignal.shape[0] - 3000)
for i, uu in enumerate(photosignal[3000:], start=3000):
    baseline = np.mean(photosignal[i-3000:i])  # mean of the 3000 samples preceding index i
    normalizedphotosignal2[i-3000] = (uu - baseline) / abs(baseline)
Keep in mind that for loops are relatively slow in Python. If performance is an issue here, you could try avoiding the for loop and using numpy methods instead (e.g. have a look at Moving average or running mean).
Hope this helps.
My timelines are stored in simple numpy arrays, and they are long (>10 million entries).
I have to detect machine shutdowns, which show up as jumps in the time vector. After each shutdown I want to delete the next 10 values (the sensors give bad results for a while after being switched on) and continue.
I came up with the following code:
Keep_data = np.empty_like(Timestamp_new, dtype=bool)  # plain bool works across NumPy versions
Keep_data[0] = False
# False marks the first sample after a gap longer than min_shutdown_length
Keep_data[1:] = Timestamp_new[1:] <= (Timestamp_new[:-1] + min_shutdown_length)
for item in np.nonzero(np.logical_not(Keep_data))[0]:
    Keep_data[item:min(item + 10, len(Keep_data))] = False
Timestamp_new = Timestamp_new[Keep_data]
Can anyone suggest more efficient code, without a pure Python loop?
Thank you.
Basically, you are trying to grow, or in image-processing terms dilate, the False regions. SciPy has a built-in for this: binary_dilation. Here the growth should start at each False element of Keep_data and extend only towards higher indices, so we need a non-default offset (SciPy calls it origin); with the default origin of 0, the dilation would spread in both directions from each element.
To sum up, an implementation that gets rid of the loopy portion of the code would look like this -
from scipy.ndimage import binary_dilation
N = 10 # Interval length
dilated_mask = binary_dilation(~Keep_data, structure=np.ones(N), origin=-int(N/2))
Keep_data[dilated_mask] = False
An alternative approach that would be closer to the one posted as the loopy code in the question, but vectorized with NumPy's broadcasting feature, would look something like this -
N = 10 # Interval length
Keep_datac = Keep_data.copy()  # assumed: Keep_datac is a working copy of Keep_data
idx = np.nonzero(np.logical_not(Keep_data[:-N]))[0]
Keep_datac[(idx + np.arange(N)[:,None]).ravel()] = False
rest = np.nonzero(np.logical_not(Keep_data[-N:]))[0]
if len(rest) > 0:
    Keep_datac[-N+rest[0]:] = False
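For very long arrays, the broadcasting version builds an index array of N times the number of False entries. A different technique that stays linear in memory is to track, for every position, the index of the most recent False entry with np.maximum.accumulate. This is only a sketch of that alternative, and the function name drop_after_gaps is made up for the example:

```python
import numpy as np

def drop_after_gaps(keep, n):
    """Set keep to False at every False entry and at the n - 1 entries after it."""
    keep = np.asarray(keep, dtype=bool)
    idx = np.arange(keep.size)
    # Index of the latest False entry at or before each position (-1 if none yet).
    last_false = np.maximum.accumulate(np.where(keep, -1, idx))
    within = (last_false >= 0) & (idx - last_false < n)
    return keep & ~within

keep = np.array([True, True, False, True, True, True, True, True])
result = drop_after_gaps(keep, 3)
print(result)
```

Here n plays the role of N above: each False entry switches off itself and the n - 1 entries that follow, the same as Keep_data[item:item+N] = False in the loop.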
I know that this question has partly been answered, but I am looking specifically at numpy and scipy. Say I have a grid
lGrid = linspace(0.1, 8, 50)
and I want to find the index that corresponds best to 2, I do
index = abs(lGrid-2).argmin()
lGrid[index]
2.034
However, what if I have a whole matrix of values instead of 2 here. I guess iteration is pretty slow. abs(lGrid-[2,4]) however will fail due to shape issues. I will need a solution that is easily extendable to N-dim matrices. What is the best course of action in this environment?
You can use broadcasting:
from numpy import arange, linspace, argmin, newaxis
vals = arange(30).reshape(2,5,3) #your N-dimensional input, like array([2,4])
lGrid = linspace(0.1, 8, 50)
result = argmin(abs(lGrid-vals[...,newaxis]),axis=-1)
for example, with input vals = array([2,4]), you obtain result = array([12, 24]) and lGrid[result]=array([ 2.03469388, 3.96938776])
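It can help to look at the intermediate shapes to see what the broadcasting does. The following sketch repeats the computation step by step, using the np. prefix instead of the from-import:

```python
import numpy as np

lGrid = np.linspace(0.1, 8, 50)            # shape (50,)
vals = np.arange(30).reshape(2, 5, 3)      # shape (2, 5, 3)

# The trailing axis of length 1 broadcasts against the 50 grid points.
diff = np.abs(lGrid - vals[..., np.newaxis])   # shape (2, 5, 3, 50)
result = diff.argmin(axis=-1)                  # shape (2, 5, 3): one grid index per value
nearest = lGrid[result]                        # grid values closest to each entry of vals
print(diff.shape, result.shape)
```

The temporary diff array holds vals.size * lGrid.size elements, which is worth keeping in mind for large inputs.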
You "guess that iteration is pretty slow", but I guess it isn't. So I would just iterate over the "whole matrix of values instead of 2". Perhaps:
def nearest_grid_values(BigArray, lGrid):  # wrapper (name chosen here) so that yield is valid
    for val in BigArray.flatten():
        index = abs(lGrid-val).argmin()
        yield lGrid[index]
If lGrid is fairly large, then the overhead of iterating in a Python for loop is probably not big in comparison to the vectorised operation happening inside it.
There might be a way to use broadcasting and reshaping to do the whole thing in one giant operation, but it would be complicated, and you might accidentally allocate such a huge array that your machine slows to a crawl.
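If the grid is sorted, as a linspace grid is, the large temporary array can be avoided altogether with np.searchsorted, which finds the neighbouring grid points by binary search. A sketch under that assumption (the helper name nearest_index_sorted is hypothetical):

```python
import numpy as np

def nearest_index_sorted(grid, vals):
    """Index of the grid point closest to each value; grid must be sorted ascending."""
    grid = np.asarray(grid)
    vals = np.asarray(vals)
    # right is the first grid index with grid[right] >= val, clipped so that
    # both right and left = right - 1 are valid indices.
    right = np.clip(np.searchsorted(grid, vals), 1, len(grid) - 1)
    left = right - 1
    choose_left = (vals - grid[left]) <= (grid[right] - vals)
    return np.where(choose_left, left, right)

lGrid = np.linspace(0.1, 8, 50)
idx = nearest_index_sorted(lGrid, np.array([2, 4, -1, 100]))
idx_2d = nearest_index_sorted(lGrid, np.array([[2, 4], [4, 2]]))
print(idx)
```

The result has the same shape as vals, so it extends to N-dimensional inputs without any per-element Python loop.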
I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me an array of zeros that will be 5X5. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        .........
        "do function and fill the array corresponding to (i,j)"
however, right now what I would like to be a[2,10] is a function of 2 and 10 but instead the index for a function of 2 and 10 would be a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not in Cython. Further I used Joblib to Parallelize the operation. I stored the results in a list because an array was not filling right when running in Parallel. I then used Itertools to split the list into individual results and Pandas to organize the results.
Thank you for all the help
Some tips to get things done while keeping good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys):
    return xs**2 + ys**2 + xs*ys
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding of how array operations work. This will help you design a function that handles arrays properly.
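Applied to the 5 x 5 grid from the question, broadcasting lets the coordinate values themselves (0, 2, 4, 6, 8) enter the function, while their positions become the array indices. A minimal sketch using the example function f from above:

```python
import numpy as np

def f(xs, ys):
    return xs**2 + ys**2 + xs*ys

x = np.arange(0, 10, 2)    # coordinate values 0, 2, 4, 6, 8
y = np.arange(0, 10, 2)

# A column of x values against a row of y values broadcasts to a (5, 5) grid,
# so a[i, j] == f(x[i], y[j]).
a = f(x[:, np.newaxis], y[np.newaxis, :])
print(a.shape, a[1, 4])
```

Here a[1, 4] holds the value for x = 2 and y = 8, so there is no need to translate coordinates into indices by hand.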
First, you are missing some parentheses in the call to zeros; the first argument should be a tuple:
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i//2 and j//2 (integer division):
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        # do function and fill the array corresponding to (i,j)
        a[i//2, j//2] = 1  # // is integer division in both Python 2 and 3
But I second Saullo Castro, you should try to vectorize your computations.
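When the function really cannot be vectorized, enumerate gives the array index and the coordinate value at the same time, which also works for steps that do not divide evenly. A small sketch, where complicated is a hypothetical stand-in for the real function:

```python
import math
import numpy as np

def complicated(xv, yv):
    """Hypothetical scalar-only function standing in for the real computation."""
    return math.hypot(xv, yv)

x = range(0, 10, 2)
y = range(0, 10, 2)
a = np.zeros((len(x), len(y)))

# i, j are positions in the array; xv, yv are the coordinate values.
for i, xv in enumerate(x):
    for j, yv in enumerate(y):
        a[i, j] = complicated(xv, yv)

print(a[3, 4])
```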