I have accelerometer data (x,y,z) which is being updated every 50ms. I need to store 80 values of the data into a 3D numpy array (1, 80, 3). For example:
[[[x,y,z] (at 0ms)
[x,y,z] (at 50ms)
...
[x,y,z]]] (at 3950ms)
After getting the first 80 values, I need to update the array with upcoming values, for example:
[[[x,y,z] (at 50ms)
[x,y,z] (at 100ms)
...
[x,y,z]]] (at 4000ms)
I'm sure there is a way to update the array without needing to manually write 80 variables to store the data into, but I can't think of it. Would really appreciate some help here.
It sounds like you want your array to always be 80 entries long, so what I would suggest is to roll the array and then update the last value.
import numpy as np
data = np.arange(80*3).reshape(80, 3)
data
>>> array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
...,
[231, 232, 233],
[234, 235, 236],
[237, 238, 239]])
data = np.roll(data, -1, axis=0)
data
>>> array([[ 3, 4, 5], # this is second row (index 1) in above array
[ 6, 7, 8], # third row
[ 9, 10, 11], # etc.
...,
[234, 235, 236],
[237, 238, 239],
[ 0, 1, 2]]) # the first row has been rolled to the last position
# now update last position with new data
data[-1] = [x, y, z] # new xyz data
data
>>> array([[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
...,
[234, 235, 236],
[237, 238, 239],
[ 76, 76, 76]]) # new data updates in correct position in array
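A minimal sketch of how this could sit in the 50 ms update loop (on_sample is a hypothetical hook for wherever the readings arrive) -
import numpy as np

window = np.zeros((80, 3), dtype=np.float32)   # rolling window of the last 80 samples

def on_sample(x, y, z):
    global window
    window = np.roll(window, -1, axis=0)   # oldest row rolls to the end...
    window[-1] = (x, y, z)                 # ...and is overwritten with the newest reading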
You can use vstack (initializing the array for the first iteration):
data = [x,y,z] # first iteration
data = np.vstack([data,[x,y,z]]) # for the rest
print(data) # you would have an Nx3 array
For the update every N seconds it is easier if you use a FIFO or a ring buffer:
https://pypi.org/project/numpy_ringbuffer/
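For example, a minimal sketch of a fixed-length FIFO with the standard library's collections.deque (an alternative to that package; on_sample and as_array are hypothetical names) -
from collections import deque
import numpy as np

buf = deque(maxlen=80)             # keeps only the 80 most recent samples

def on_sample(x, y, z):
    buf.append((x, y, z))          # once full, the oldest sample is dropped automatically

def as_array():
    # shape (len(buf), 3); the leading axis gives the requested (1, 80, 3) once full
    return np.asarray(buf, dtype=np.float32)[None, ...]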
I have a numpy array (p) that represents the evolution of some prices. Additionally I have another numpy array (bins) of the same length that assigns each entry of p to a specific bin starting from 0. The goal is to calculate the mean of prices within each bin and return them in an array p_mean. This can be done by using a for loop like this:
p = np.array([100, 100, 101, 102, 103, 104, 106, 103, 102, 103, 100, 105])
bins = np.array([3, 3, 3, 5, 6, 6, 6, 7, 7, 7, 8, 9])
lengthBins = 10
p_mean = np.empty(lengthBins)
p_mean.fill(np.nan)
for i in range(lengthBins):
    p_mean[i] = p[bins == i].mean()
p_mean  # Output: array([nan, nan, nan, 100.33, nan, 102., 104.33, 102.67, 100., 105.])
This gives the desired output (including the nans), however I feel there must be a faster way of doing this using numpy without using the for loop.
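One possible vectorized route (a sketch on my part, not necessarily the fastest) is np.bincount, which produces the per-bin sums and counts in one pass each -
import numpy as np

p = np.array([100, 100, 101, 102, 103, 104, 106, 103, 102, 103, 100, 105])
bins = np.array([3, 3, 3, 5, 6, 6, 6, 7, 7, 7, 8, 9])
lengthBins = 10

sums = np.bincount(bins, weights=p, minlength=lengthBins)   # per-bin sums
counts = np.bincount(bins, minlength=lengthBins)            # per-bin sample counts
# empty bins get NaN, matching the loop version above
p_mean = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)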
The first element of the first row should be 0, incrementing by 1 across the row, then continuing to increment by 1 into the next row, and so on.
This is an example of what I am looking for
array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11],
...,
[5231, 5232, 5233, 5234],
[5235, 5236, 5237, 5238]], dtype=int32)
The solution should be able to apply for any specified 2D dimension, for example
array([[0, 1, 2, ..., 78, 79, 80],
[81, 82, 83, ..., 158, 159, 160],
...,
[2253, 2254, 2255, ..., 2453, 2454, 2455]], dtype=int32)
The examples aren't numerically accurate; I just wanted to demonstrate that it starts at 0, increments by 1 across the rows, and continues into the next row.
I was thinking of using a for loop to fill each value individually, but I am not sure if that is the fastest solution, nor the most pythonic and programmatically elegant solution.
You can use
np.arange(nrows*ncols).reshape(nrows,ncols)
Incidentally, this is how 90% of example 2D arrays are created in SO numpy posts.
Create a 1D array, initialize it with the desired values, then use NumPy's reshape to convert it to a 2D array.
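For instance, a minimal sketch of that recipe (the shape and int32 dtype are taken from the question's example) -
import numpy as np

nrows, ncols = 5, 4
flat = np.arange(nrows * ncols, dtype=np.int32)   # 0, 1, ..., nrows*ncols - 1
arr = flat.reshape(nrows, ncols)                  # row-major, so values run across each row first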
I have two two dimensional arrays a and b (#columns of a <= #columns in b). I would like to find an efficient way of matching a row in array a to a contiguous part of a row in array b.
a = np.array([[ 25, 28],
[ 84, 97],
[105, 24],
[ 28, 900]])
b = np.array([[ 25, 28, 84, 97],
[ 22, 25, 28, 900],
[ 11, 12, 105, 24]])
The output should be np.array([[0,0], [0,1], [1,0], [2,2], [3,1]]). Row 0 in array a matches Row 0 in array b (first two positions). Row 1 in array a matches row 0 in array b (third and fourth positions).
We can leverage the np.lib.stride_tricks.as_strided-based view_as_windows from scikit-image for efficient patch extraction, and then compare those patches against each row of a, all of it in a vectorized manner. Then, get the matching indices with np.argwhere -
# a and b from posted question
In [325]: from skimage.util.shape import view_as_windows
In [428]: w = view_as_windows(b,(1,a.shape[1]))
In [429]: np.argwhere((w == a).all(-1).any(-2))[:,::-1]
Out[429]:
array([[0, 0],
[1, 0],
[0, 1],
[3, 1],
[2, 2]])
Alternatively, we could get the indices by the order of rows in a by pushing forward the first axis of a while performing broadcasted comparisons -
In [444]: np.argwhere((w[:,:,0] == a[:,None,None,:]).all(-1).any(-1))
Out[444]:
array([[0, 0],
[0, 1],
[1, 0],
[2, 2],
[3, 1]])
Another way I can think of is to loop over each row in a and perform a 2D correlation between b, which you can consider as a 2D signal, and that row of a.
We would look for results that are equal to the sum of squares of the values in that row of a. If we subtract this sum of squares from our correlation result, matches show up as zeros. Any row of b that gives a 0 result means the subarray was found in that row. If you are using floating-point numbers, you may want to compare against some small threshold that is just above 0 instead of exactly 0.
If you can use SciPy, the scipy.signal.correlate2d method is what I had in mind.
import numpy as np
from scipy.signal import correlate2d
a = np.array([[ 25, 28],
[ 84, 97],
[105, 24]])
b = np.array([[ 25, 28, 84, 97],
[ 22, 25, 28, 900],
[ 11, 12, 105, 24]])
EPS = 1e-8
result = []
for (i, row) in enumerate(a):
    out = correlate2d(b, row[None,:], mode='valid') - np.square(row).sum()
    locs = np.where(np.abs(out) <= EPS)[0]
    unique_rows = np.unique(locs)
    for res in unique_rows:
        result.append((i, res))
We get:
In [32]: result
Out[32]: [(0, 0), (0, 1), (1, 0), (2, 2)]
The time complexity of this could be better, especially since we're looping over each row of a to find any subarrays in b.
Alright, here is the given data;
There are three numpy arrays of the shapes:
(i, 4, 2), (i, 4, 3), (i, 4, 2)
the i is shared among them but is variable.
The dtype is float32 for everything.
The goal is to interweave them in a particular order. Let's look at the data at index 0 for these arrays:
[[-208. -16.]
[-192. -16.]
[-192. 0.]
[-208. 0.]]
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
[[ 0.49609375 0.984375 ]
[ 0.25390625 0.984375 ]
[ 0.25390625 0.015625 ]
[ 0.49609375 0.015625 ]]
In this case, the concatenated target array would look something like this:
[-208, -16, 1, 1, 1, 0.496, 0.984, -192, -16, 1, 1, 1, ...]
And then continue on with index 1.
I don't know how to achieve this, as the concatenate function just keeps telling me that the shapes don't match. The shape of the target array does not matter much, just that the memoryview of it must be in the given order for upload to a gpu shader.
Edit: I could achieve this with a few python for loops, but the performance impact would be a problem in this program.
Use np.dstack and flatten with np.ravel() -
np.dstack((a,b,c)).ravel()
Now, np.dstack is basically stacking along the third axis. So, alternatively we can use np.concatenate too along that axis, like so -
np.concatenate((a,b,c),axis=2).ravel()
Sample run -
1) Setup Input arrays :
In [613]: np.random.seed(1234)
...: n = 3
...: m = 2
...: a = np.random.randint(0,9,(n,m,2))
...: b = np.random.randint(11,99,(n,m,2))
...: c = np.random.randint(101,999,(n,m,2))
...:
2) Check input values :
In [614]: a
Out[614]:
array([[[3, 6],
[5, 4]],
[[8, 1],
[7, 6]],
[[8, 0],
[5, 0]]])
In [615]: b
Out[615]:
array([[[84, 58],
[61, 87]],
[[48, 45],
[49, 78]],
[[22, 11],
[86, 91]]])
In [616]: c
Out[616]:
array([[[104, 359],
[376, 560]],
[[472, 720],
[566, 115]],
[[344, 556],
[929, 591]]])
3) Output :
In [617]: np.dstack((a,b,c)).ravel()
Out[617]:
array([ 3, 6, 84, 58, 104, 359, 5, 4, 61, 87, 376, 560, 8,
1, 48, 45, 472, 720, 7, 6, 49, 78, 566, 115, 8, 0,
22, 11, 344, 556, 5, 0, 86, 91, 929, 591])
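As a quick sanity check with the shapes from the question (i = 2 chosen arbitrarily), the flattened result has i*4*(2+3+2) elements in the interleaved order -
import numpy as np

i = 2
a = np.random.rand(i, 4, 2).astype(np.float32)
b = np.ones((i, 4, 3), dtype=np.float32)
c = np.random.rand(i, 4, 2).astype(np.float32)

out = np.dstack((a, b, c)).ravel()
print(out.shape)   # (56,) == i * 4 * (2 + 3 + 2)
print(out[:7])     # a[0,0], then b[0,0], then c[0,0]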
What I would do is:
np.dstack([a, b, c]).flatten()
assuming a, b, c are the three arrays. (np.hstack would join along axis 1, which fails here because the last dimensions differ, and would not give the per-element interleaving anyway.)
I have a numpy array with shape (3, 600219), which is a list of indices.
i.e.
array([[ 0, 0, 0, ..., 2879, 2879, 2879],
[ 40, 40, 40, ..., 162, 165, 168],
[ 249, 250, 251, ..., 195, 196, 198]])
The first row are time indices, the second and third rows are indices of the coordinates. I am trying to figure out which pair of coordinates most frequently occurred, disregarding the time.
e.g. Was it (40,249) or (40,250)...etc.?
I just used a small sample of your data, but I think you'll get the point:
import numpy as np
array = np.array([[ 0, 0, 0, 2879, 2879, 2879],
[ 40, 40, 40, 162, 165, 168],
[ 249, 250, 251, 195, 196, 198]])
# Zip together only the second and third rows
only_coords = zip(array[1,:], array[2,:])
from collections import Counter
Counter(only_coords).most_common()
Produces:
Out[11]:
[((40, 249), 1),
((165, 196), 1),
((162, 195), 1),
((168, 198), 1),
((40, 251), 1),
((40, 250), 1)]
Here's one vectorized approach -
IDs = (a[1].max()+1)*a[2] + a[1]
unq, idx, count = np.unique(IDs, return_index=1,return_counts=1)
out = a[1:,idx[count.argmax()]]
If there could be negative coordinates, use (a[1].max()-a[1].min()+1)*a[2] + a[1] to compute IDs.
Sample run -
In [44]: a
Out[44]:
array([[8, 3, 6, 6, 8, 5, 1, 6, 6, 5],
[5, 2, 1, 1, 5, 1, 5, 1, 1, 4],
[8, 2, 3, 3, 8, 1, 7, 3, 3, 3]])
In [47]: IDs = (a[1].max()+1)*a[2] + a[1]
In [48]: unq, idx, count = np.unique(IDs, return_index=1,return_counts=1)
In [49]: a[1:,idx[count.argmax()]]
Out[49]: array([1, 3])
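An equivalent way to build the pair IDs (my own variation, not part of the answer above) is np.ravel_multi_index, which handles the scaling internally (assuming non-negative coordinates) -
import numpy as np

a = np.array([[8, 3, 6, 6, 8, 5, 1, 6, 6, 5],
              [5, 2, 1, 1, 5, 1, 5, 1, 1, 4],
              [8, 2, 3, 3, 8, 1, 7, 3, 3, 3]])

# encode each (row-1, row-2) pair as a single integer
IDs = np.ravel_multi_index((a[1], a[2]), (a[1].max()+1, a[2].max()+1))
unq, idx, count = np.unique(IDs, return_index=True, return_counts=True)
print(a[1:, idx[count.argmax()]])   # -> [1 3]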
This might seem a little abstract, but you could try saving each co-ordinate as a number, e.g. [2,1] = 2.1. And put your data into a list of these co-ordinates. For example, a 2nd row of [1,1,2] and 3rd row of [2,2,1] would be [1.2, 1.2, 2.1] You could then use the code:
from collections import Counter
list1=[1.2,1.2,2.1]
data = Counter(list1)
print (data.most_common(1)) # Returns the highest occurring item
which prints the most common number, and how many times it occurs, then you can simply convert the number back to a co-ordinate if you need to use it in your code.
Here is a sample code that does the count:
import numpy as np
import collections
a = np.array([[0, 1, 2, 3], [10, 10, 30 ,40], [25, 25, 10, 50]])
# You don't care about time
b = np.transpose(a[1:])
# convert list items to tuples
c = map(lambda v:tuple(v), b)
collections.Counter(c)
The output:
Counter({(10, 25): 2, (30, 10): 1, (40, 50): 1})