I have to work with sensor data (from ros, specifically, but it should not be relevant). To this end, I have several 2-D numpy arrays with one row storing the timestamps and the following others the corresponding sensors data. Problem is, such arrays do not have the same dimensions (different sampling times). I need to merge all of these arrays into a single big one. How can I do so based on the timestamp and, say, replace the missing numbers with 0 or NaN?
Example of my situation:
import numpy as np
time1=np.arange(1,10)
data1=np.random.randint(200, size=time1.shape)
a=np.array((time1,data1))
print(a)
time2=np.arange(1,10,2)
data2=np.random.randint(200, size=time2.shape)
b=np.array((time2,data2))
print(b)
Which returns output
[[ 1 2 3 4 5 6 7 8 9]
[ 51 9 117 174 164 60 95 197 30]]
[[ 1 3 5 7 9]
[ 35 188 114 153 36]]
What I am looking for is
[[ 1 2 3 4 5 6 7 8 9]
[ 51 9 117 174 164 60 95 197 30]
[ 35 0 188 0 114 0 153 0 36]]
Is there any way to achieve this in an efficient way? This is an example but I am working with thousands of samples. Thanks!
For simple case of one b-matrix
With first row of a storing all possible timestamps and both of those first rows in a and b being sorted, we can use np.searchsorted -
idx = np.searchsorted(a[0],b[0])
out_dtype = np.result_type((a.dtype,b.dtype))
b0 = np.zeros(a.shape[1],dtype=out_dtype)
b0[idx] = b[1]
out = np.vstack((a,b0))
For several b-matrices
Approach #1
To extend to multiple b-matrices, we can follow a similar method with np.searchsorted within a loop, like so -
def merge_arrays(a, B):
# a : Array with first row holding all possible timestamps
# B : list or tuple of all b-matrices
lens = np.array([len(i) for i in B])
L = (lens-1).sum() + len(a)
out_dtype = np.result_type(*[i.dtype for i in B])
out = np.zeros((L, a.shape[1]), dtype=out_dtype)
out[:len(a)] = a
s = len(a)
for b_i in B:
idx = np.searchsorted(a[0],b_i[0])
out[s:s+len(b_i)-1,idx] = b_i[1:]
s += len(b_i)-1
return out
Sample run -
In [175]: a
Out[175]:
array([[ 4, 11, 16, 22, 34, 56, 67, 87, 91, 99],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
In [176]: b0
Out[176]:
array([[16, 22, 34, 56, 67, 91],
[20, 80, 69, 79, 47, 64],
[82, 88, 49, 29, 19, 19]])
In [177]: b1
Out[177]:
array([[ 4, 16, 34, 99],
[28, 34, 0, 0],
[36, 53, 5, 38],
[17, 79, 4, 42]])
In [178]: merge_arrays(a, [b0,b1])
Out[178]:
array([[ 4, 11, 16, 22, 34, 56, 67, 87, 91, 99],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[ 0, 0, 20, 80, 69, 79, 47, 0, 64, 0],
[ 0, 0, 82, 88, 49, 29, 19, 0, 19, 0],
[28, 0, 34, 0, 0, 0, 0, 0, 0, 0],
[36, 0, 53, 0, 5, 0, 0, 0, 0, 38],
[17, 0, 79, 0, 4, 0, 0, 0, 0, 42]])
Approach #2
If looping with np.searchsorted seems to be the bottleneck, we can vectorize that part -
def merge_arrays_v2(a, B):
# a : Array with first row holding all possible timestamps
# B : list or tuple of all b-matrices
lens = np.array([len(i) for i in B])
L = (lens-1).sum() + len(a)
out_dtype = np.result_type(*[i.dtype for i in B])
out = np.zeros((L, a.shape[1]), dtype=out_dtype)
out[:len(a)] = a
s = len(a)
r0 = [i[0] for i in B]
r0s = np.concatenate((r0))
idxs = np.searchsorted(a[0],r0s)
cols = np.array([i.shape[1] for i in B])
sp = np.r_[0,cols.cumsum()]
start,stop = sp[:-1],sp[1:]
for (b_i,s0,s1) in zip(B,start,stop):
idx = idxs[s0:s1]
out[s:s+len(b_i)-1,idx] = b_i[1:]
s += len(b_i)-1
return out
Here's an approach using np.searchsorted:
time1=np.arange(1,10)
data1=np.random.randint(200, size=time1.shape)
a=np.array((time1,data1))
# array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [118, 105, 86, 94, 69, 17, 142, 46, 54]])
time2=np.arange(1,10,2)
data2=np.random.randint(200, size=time2.shape)
b=np.array((time2,data2))
# array([[ 1, 3, 5, 7, 9],
# [70, 15, 4, 97, 57]])
out = np.vstack([a, np.zeros(a.shape[1])])
out[out.shape[0]-1, np.searchsorted(a[0], b[0])] = b[1]
array([[ 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[118., 105., 86., 94., 69., 17., 142., 46., 54.],
[ 70., 0., 15., 0., 4., 0., 97., 0., 57.]])
Update - Merging many matrices
Here's a almost fully vectorised approach for a scenario with multiple b matrices. This approach does not require a priori knowledge of which is the largest list:
def merge_timestamps(*x):
# infer which is the list with maximum length
# as well as individual lengths
concat = np.concatenate(*x, axis=1)[0]
lens = np.r_[np.flatnonzero(np.diff(concat) < 0), len(concat)]
max_len_list = np.r_[lens[0], np.diff(lens)].argmax()
# define the output matrix
A = x[0][max_len_list]
out = np.vstack([A[1], np.zeros((len(*x)-1, len(A[0])))])
others = np.flatnonzero(~np.in1d(np.arange(len(*x)), max_len_list))
# Update the output matrix with the values of the smaller
# arrays according to their index. This is of course assuming
# all values are contained in the largest
for ix, i in enumerate(others):
out[-(ix+1), x[0][i][0]-A[0].min()] = x[0][i][1]
return out
Lets check with the following example:
time1=np.arange(1,10)
data1=np.random.randint(200, size=time1.shape)
a=np.array((time1,data1))
# array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9],
# [107, 13, 123, 119, 137, 135, 65, 157, 83]])
time2=np.arange(1,10,2)
data2=np.random.randint(200, size=time2.shape)
b = np.array((time2,data2))
# array([[ 1, 3, 5, 7, 9],
# [ 81, 49, 83, 32, 179]])
time3=np.arange(1,4,2)
data3=np.random.randint(200, size=time3.shape)
c=np.array((time3,data3))
# array([[ 1, 3],
# [185, 117]])
merge_timestamps([a,b,c])
array([[ 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[107., 13., 123., 119., 137., 135., 65., 157., 83.],
[185., 0., 117., 0., 0., 0., 0., 0., 0.],
[ 81., 0., 49., 0., 83., 0., 32., 0., 179.]])
As mentioned this approach does not require a priori knowledge of which is the largest list, i.e. it would also work with:
merge_timestamps([b, c, a])
array([[ 1., 2., 3., 4., 5., 6., 7., 8., 9.],
[107., 13., 123., 119., 137., 135., 65., 157., 83.],
[185., 0., 117., 0., 0., 0., 0., 0., 0.],
[ 81., 0., 49., 0., 83., 0., 32., 0., 179.]])
Applicable only if sensor is capturing data at fixed interval.
First we will need to create a dataframe with fixed interval (15 min interval in this case), then use concat function to this dataframe with sensor's data.
Code to generate dataframe with 15 min interval (copied)
l = (pd.DataFrame(columns=['NULL'],
index=pd.date_range('2016-09-02T17:30:00Z', '2016-09-02T21:00:00Z',
freq='15T'))
.between_time('07:00','21:00')
.index.strftime('%Y-%m-%dT%H:%M:%SZ')
.tolist()
)
l = pd.DataFrame(l)
Assuming below data comes from sensor
m = (pd.DataFrame(columns=['NULL'],
index=pd.date_range('2016-09-02T17:30:00Z', '2016-09-02T21:00:00Z',
freq='30T'))
.between_time('07:00','21:00')
.index.strftime('%Y-%m-%dT%H:%M:%SZ')
.tolist()
)
m = pd.DataFrame(m)
m['SensorData'] = np.arange(8)
merge above two dataframes
df = l.merge(m, left_on = 0, right_on= 0,how='left')
df.loc[df['SensorData'].isna() == True,'SensorData'] = 0
Output
0 SensorData
0 2016-09-02T17:30:00Z 0.0
1 2016-09-02T17:45:00Z 0.0
2 2016-09-02T18:00:00Z 1.0
3 2016-09-02T18:15:00Z 0.0
4 2016-09-02T18:30:00Z 2.0
5 2016-09-02T18:45:00Z 0.0
6 2016-09-02T19:00:00Z 3.0
7 2016-09-02T19:15:00Z 0.0
8 2016-09-02T19:30:00Z 4.0
9 2016-09-02T19:45:00Z 0.0
10 2016-09-02T20:00:00Z 5.0
11 2016-09-02T20:15:00Z 0.0
12 2016-09-02T20:30:00Z 6.0
13 2016-09-02T20:45:00Z 0.0
14 2016-09-02T21:00:00Z 7.0
I'm looking for a solution to sum per column in a 2D array ("a" in the example below) and starting from a cell position as defined in a different 1D array ("ref" in the example below).
I have tried the following:
import numpy as np
a = np.arange(20).reshape(5, 4)
print(a) # representing an original large 2D array
ref = np.array([0, 2, 4, 1]) # reference array for defining start of sum
s = a.sum(axis=0)
print(s) # Works: sums all elements per column
s = a[2:].sum(axis=0)
print(s) # Works as well: sum from the third element till end per column
# This is what I look for: sum per column starting at element defined by ref[]
s = np.zeros(4).astype(int) # makes an empty 1D array
for i in np.arange(4): # for each column
for j in np.arange(ref[i], 5):
s[i] += a[j, i] # sums all elements from ref till end (i.e. 5)
print(s) # This is the desired outcome
for i in np.arange(4):
s = a[ref[i]:].sum(axis=0)
print(s) # No good; same as a[ref[4]:].sum(axis=0) and here ref[4] = 1
s = np.zeros(4).astype(int) # makes an empty 1D array
for i in np.arange(4):
s[i] = np.sum(a[ref[i]:, i])
print(s) # Yes; this is also the desired outcome
Is it possible to realize this without using a for loop?
Does numpy have functions for doing this in a single step?
s = a[ref:].sum(axis=0)
This would be nice, but is not working.
Thank you for your time!
A basic solution based on np.cumsum:
In [1]: a = np.arange(15).reshape(5, 3)
In [2]: res = np.array([0, 2, 3])
In [3]: b = np.cumsum(a, axis=0)
In [4]: b
Out[4]:
array([[ 0, 1, 2],
[ 3, 5, 7],
[ 9, 12, 15],
[18, 22, 26],
[30, 35, 40]])
In [5]: a
Out[5]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [6]: b[res, np.arange(a.shape[1])]
Out[6]: array([ 0, 12, 26])
In [7]: b[-1, :] - b[res, np.arange(a.shape[1])]
Out[7]: array([30, 23, 14])
so it does not give us the result we want: we need to add a first line of zeros to b:
In [13]: b = np.vstack([np.zeros((1, a.shape[1])), b])
In [14]: b
Out[14]:
array([[ 0., 0., 0.],
[ 0., 1., 2.],
[ 3., 5., 7.],
[ 9., 12., 15.],
[ 18., 22., 26.],
[ 30., 35., 40.]])
In [17]: b[-1, :] - b[res, np.arange(a.shape[1])]
Out[17]: array([ 30., 30., 25.])
which is, I believe, the desired output.
Assume that I have two arrays
>>> import numpy as np
>>> a = np.random.randint(0, 10, size=(5, 4))
>>> a
array([[1, 6, 7, 4],
[2, 7, 4, 2],
[9, 3, 6, 4],
[9, 6, 8, 2],
[7, 2, 9, 5]])
>>> b = np.random.randint(0, 10, size=(5, 4))
>>> b
array([[ 5., 8., 6., 5.],
[ 1., 8., 4., 8.],
[ 1., 4., 6., 3.],
[ 4., 8., 6., 4.],
[ 8., 7., 7., 5.]], dtype=float32)
Now I have a situation where I need to compare elements of each arrays and replace with known values. For example my conditions are
if a == 0 then replace with 0 (or) if b == 0 then replace with 0
if a > 4 and < 11 then replace with 1 (or) if b > 1 and < 3 then replace with 1
if a > 10 and < 18 then replace with 2 (or) if b > 2 and < 5 then replace with 2
.
.
.
and finally
if a > 40 replace with 9 (or) if b > 9 then replace with 9.
Those replaced values can be stored in a new arrary which I need to use it for other function.
The simplest form of element wise comparison like a[ a > 2 ] = 1 works. But I am not aware of multiple comparison (multiple times) with same method.
I am sure that there is a easy way exist in numpy which I am unable to find. Any help is appreciated.
if
np.digitize should do what you want. The first arguments are the values you want to replace and the second are the thresholds.
a_replace = np.digitize(a, [0, 4, 10, ..., 40], right=True)
b_replace = np.digitize(b, [0, 1, 2, ..., 9], right=True)
I have a 1-D function that takes so much time to compute over a big 2-D array of 'x' values, so it is much easy to create an interpolate function using SciPy facility and then compute y using it, which will be much faster. However, I cannot use the interpolation function on arrays that have more than 1-D.
Example:
# First, I create the interpolation function in the domain I want to work
x = np.arange(1, 100, 0.1)
f = exp(x) # a complicated function
f_int = sp.interpolate.InterpolatedUnivariateSpline(x, f, k=2)
# Now, in the code I do that
x = [[13, ..., 1], [99, ..., 45], [33, ..., 98] ..., [15, ..., 65]]
y = f_int(x)
# Which I want that it returns y = [[f_int(13), ..., f_int(1)], ..., [f_int(15), ..., f_int(65)]]
But returns:
ValueError: object too deep for desired array
I know I could loop over all x members, but I don't know if it is a better option...
Thanks!
EDIT:
A function like that also would do the job:
def vector_op(function, values):
orig_shape = values.shape
values = np.reshape(values, values.size)
return np.reshape(function(values), orig_shape)
I've tried the np.vectorize but it is too slow...
If f_int wants single dimensional data, you should flatten your input, feed it to the interpolator, then reconstruct your original shape:
>>> x = np.arange(1, 100, 0.1)
>>> f = 2 * x # a simple function to see the results are good
>>> f_int = scipy.interpolate.InterpolatedUnivariateSpline(x, f, k=2)
>>> x = np.arange(25).reshape(5, 5) + 1
>>> x
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
>>> x_int = f_int(x.reshape(-1)).reshape(x.shape)
>>> x_int
array([[ 2., 4., 6., 8., 10.],
[ 12., 14., 16., 18., 20.],
[ 22., 24., 26., 28., 30.],
[ 32., 34., 36., 38., 40.],
[ 42., 44., 46., 48., 50.]])
x.reshape(-1) does the flattening, and the .reshape(x.shape) returns it to its original form.
I think you want to do a vectorized function in numpy:
#create some random test data
test = numpy.random.random((100,100))
#a normal python function that you want to apply
def myFunc(i):
return np.exp(i)
#now vectorize the function so that it will work on numpy arrays
myVecFunc = np.vectorize(myFunc)
result = myVecFunc(test)
I would use a combination of a list comprehension and map (there might be a way to use two nested maps that I'm missing)
In [24]: x
Out[24]: [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
In [25]: [map(lambda a: a*0.1, x_val) for x_val in x]
Out[25]:
[[0.1, 0.2, 0.30000000000000004],
[0.1, 0.2, 0.30000000000000004],
[0.1, 0.2, 0.30000000000000004]]
this is just for illustration purposes.... replace lambda a: a*0.1 with your function, f_int
...better to directly show the code. Here it is:
import numpy as np
a = np.zeros([3, 3])
a
array([[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
b = np.random.random_integers(0, 100, size = (1, 3))
b
array([[ 10, 3, 8]])
c = np.random.random_integers(0, 100, size = (4, 3))
c
array([[ 22, 21, 14],
[ 55, 64, 12],
[ 33, 85, 98],
[ 37, 44, 45]])
a = b will change dimensions of a
a = c will change dimensions of a
for a = b, I want:
array([[ 10., 3., 8.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
and for a = c, I want:
array([[ 22, 21, 14],
[ 55, 64, 12],
[ 33, 85, 98]])
So I want to lock the shape of 'a' so that values being assigned to it get "cropped" if necessary. Of course without if statements.
The problem is that the equal operator is making a shallow copy of the array, and what you want is a deep copy of part of the array.
So for this, if you know that b only has one outer array, then you can do:
a[0] = b
And if know that a is a 3x3, then you could also do:
a = c[0:3]
Furthermore, if you want them to be actual deep copies, you'll want:
a[0] = b.copy()
and
a = c[0:3].copy()
To make them independent.
If you don't already know the lengths of the matrices, you can use the len() function to find out at runtime.
You can do this easily by using Numpy slice notation. Here is a SO question with good answers explaining it clearly. Essentially, you need to ensure that the shape of the left hand array and the right had array match, and you can achieve this by slicing the corresponding arrays appropriately.
import numpy as np
a = np.zeros([3, 3])
b = np.array([[ 10, 3, 8]])
c = np.array([[ 22, 21, 14],
[ 55, 64, 12],
[ 33, 85, 98],
[ 37, 44, 45]])
a[0] = b
print a
a = c[0:3]
print a
Output:
[[ 10. 3. 8.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
[[22 21 14]
[55 64 12]
[33 85 98]]
It seems you want to replace elements in the top left of a 2D array with elements from a second 2D array without worrying about the sizes of the arrays. Here is a method:
def replacer(orig, repl):
new = np.copy(orig)
w2, h1 = new.shape
w1, h2 = repl.shape
new[0:min(w1,w2), 0:min(h1,h2)] = repl[0:min(w1,w2), 0:min(h1,h2)]
return new
print replacer(a,b)
print replacer(a,c)