I am trying to combine many numpy files into one big numpy file. I tried to follow these two links: Append multiple numpy files to one big numpy file in python and Python append multiple files in given order to one big file. This is what I did:
import matplotlib.pyplot as plt
import numpy as np
import glob
import os, sys

fpath = "/home/user/Desktop/OutFileTraces.npy"
npyfilespath = "/home/user/Desktop/test"
os.chdir(npyfilespath)
with open(fpath, 'wb') as f_handle:
    for npfile in glob.glob("*.npy"):
        # Find the path of the file
        filepath = os.path.join(npyfilespath, npfile)
        print filepath
        # Load file
        dataArray = np.load(filepath)
        print dataArray
        np.save(f_handle, dataArray)
dataArray = np.load(fpath)
print dataArray
An example of the result that I have:
/home/user/Desktop/Trace=96
[[ 0.01518007 0.01499514 0.01479736 ..., -0.00392216 -0.0039761
-0.00402747]]
[[-0.00824758 -0.0081808 -0.00811402 ..., -0.0077236 -0.00765425
-0.00762086]]
/home/user/Desktop/Trace=97
[[ 0.00614908 0.00581004 0.00549154 ..., -0.00814741 -0.00813457
-0.00809347]]
[[-0.00824758 -0.0081808 -0.00811402 ..., -0.0077236 -0.00765425
-0.00762086]]
/home/user/Desktop/Trace=98
[[-0.00291786 -0.00309509 -0.00329287 ..., -0.00809861 -0.00797789
-0.00784175]]
[[-0.00824758 -0.0081808 -0.00811402 ..., -0.0077236 -0.00765425
-0.00762086]]
/home/user/Desktop/Trace=99
[[-0.00379887 -0.00410453 -0.00438963 ..., -0.03497837 -0.0353842
-0.03575151]]
[[-0.00824758 -0.0081808 -0.00811402 ..., -0.0077236 -0.00765425
-0.00762086]]
This line represents the first trace:
[[-0.00824758 -0.0081808 -0.00811402 ..., -0.0077236 -0.00765425
-0.00762086]]
It is repeated all the time.
I asked the second question two days ago. At first I thought I had the best answer, but after trying to print and plot the final file 'OutFileTraces.npy' I found that my code:
1/ doesn't print the numpy files from the folder 'test' in their order (trace0, trace1, trace2, ...);
2/ saves only one trace in the file; I mean by that, when I print or plot OutFileTraces.npy, I find just one trace, and it is the first one.
So I need to correct my code because I am really stuck. I would be very grateful if you could help me.
Thanks in advance.
glob produces an unordered list. You need to sort explicitly with an extra line, as the sort is in place and does not return the list:
npfiles = glob.glob("*.npy")
npfiles.sort()
for npfile in npfiles:
    ...
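Putting the sort together with the asker's loop, a corrected sketch might look like this (the Desktop paths are replaced by a temporary folder so the snippet is self-contained; the trace0.npy-style file names and the (1, 4) trace shape are assumptions for illustration):

```python
import glob
import os
import tempfile

import numpy as np

# Build a stand-in for the 'test' folder with three single-trace files.
folder = tempfile.mkdtemp()
for k in range(3):
    np.save(os.path.join(folder, 'trace%d.npy' % k),
            np.full((1, 4), float(k)))

npfiles = glob.glob(os.path.join(folder, '*.npy'))
npfiles.sort()                                           # glob order is arbitrary
traces = np.concatenate([np.load(p) for p in npfiles])   # shape (3, 4)

outpath = os.path.join(folder, 'OutFileTraces.npy')
np.save(outpath, traces)                                 # one array, one np.save call

data = np.load(outpath)
print(data.shape)                                        # (3, 4)
```

Collecting everything into one array and calling np.save once avoids the "only one trace comes back" problem, because np.load reads exactly one array per call.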
A .npy file contains a single array. If you want to store several arrays in a single file, have a look at .npz files with np.savez https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy.savez I have not seen this used widely, so you may want to seriously consider alternatives.
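A minimal np.savez round trip, writing to an in-memory buffer for illustration (the array names first and second are arbitrary):

```python
import io

import numpy as np

buf = io.BytesIO()                 # stands in for a .npz file on disk
np.savez(buf, first=np.arange(3), second=np.ones((2, 2)))
buf.seek(0)

archive = np.load(buf)             # an NpzFile, keyed by the names above
print(sorted(archive.files))       # ['first', 'second']
print(archive['first'])            # [0 1 2]
```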
If your arrays all have the same shape and store related data, you can make one larger array. Say that the current shape is (N_1, N_2) and that you have N_0 such arrays. A loop with

all_arrays = []
for npfile in npfiles:
    all_arrays.append(np.load(os.path.join(npyfilespath, npfile)))
all_arrays = np.array(all_arrays)
np.save(f_handle, all_arrays)

will produce a file with a single array of shape (N_0, N_1, N_2).
If you need per-name access to the arrays, HDF5 files are a good match. See http://www.h5py.org/ (a full intro is too much for a SO reply, see the quick start guide http://docs.h5py.org/en/latest/quick.html)
As discussed in
loading arrays saved using numpy.save in append mode
it is possible to save multiple times to an open file, and it is possible to load multiple times. That's not documented, and probably not preferred, but it works. A savez archive is the preferred method for saving multiple arrays.
Here's a toy example:
In [777]: with open('multisave.npy','wb') as f:
...: arr = np.arange(10)
...: np.save(f, arr)
...: arr = np.arange(20)
...: np.save(f, arr)
...: arr = np.ones((3,4))
...: np.save(f, arr)
...:
In [778]: ll multisave.npy
-rw-rw-r-- 1 paul 456 Feb 13 08:38 multisave.npy
In [779]: with open('multisave.npy','rb') as f:
...: arr = np.load(f)
...: print(arr)
...: print(np.load(f))
...: print(np.load(f))
...:
[0 1 2 3 4 5 6 7 8 9]
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
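Since the number of appended arrays is usually not known up front, one way to read them all back is to keep calling np.load until the file is exhausted (a sketch; catching both EOFError and ValueError is an assumption about how np.load signals end of data across versions):

```python
import io

import numpy as np

buf = io.BytesIO()                  # stands in for 'multisave.npy'
for arr in (np.arange(10), np.arange(20), np.ones((3, 4))):
    np.save(buf, arr)
buf.seek(0)

arrays = []
while True:
    try:
        arrays.append(np.load(buf))
    except (EOFError, ValueError):  # raised once no data is left
        break

print(len(arrays))                  # 3
```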
Here's a simple example of saving a list of arrays of the same shape
In [780]: traces = [np.arange(10),np.arange(10,20),np.arange(100,110)]
In [781]: traces
Out[781]:
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])]
In [782]: arr = np.array(traces)
In [783]: arr
Out[783]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109]])
In [785]: np.save('mult1.npy', arr)
In [786]: data = np.load('mult1.npy')
In [787]: data
Out[787]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[100, 101, 102, 103, 104, 105, 106, 107, 108, 109]])
In [788]: list(data)
Out[788]:
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])]
Related
I need to calculate the sum of all the elements directly surrounding a given element in a matrix:
[ [1, 2, 3],
[4, 5, 6],
[7, 8, 9] ]
so that sum_neighbours(matrix[0][0]) == 11 and sum_neighbours(matrix[1][1]) == 40.
The problem is just that I'm a beginner and I don't know how to make sum_neighbours work out how many neighbours a given element has.
I figured that I could write an if-elif-else statement and hard-code the number of neighbours for each position in the matrix, but surely there must be a more efficient way to do this?
Otherwise it'll only be able to calculate the sum of the neighbours for matrices that are 3 x 3.
A nice approach is to use numpy and a convolution:
import numpy as np
from scipy.signal import convolve2d
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
convolve2d(a, [[1,1,1],[1,0,1],[1,1,1]], mode='same')  # 3x3 kernel of ones, centre zeroed
output:
array([[11, 19, 13],
[23, 40, 27],
[17, 31, 19]])
Alternatively:
convolve2d(a, np.ones((3,3)), mode='same')-a
# this sums the neighbours + the center
# so we need to subtract the initial array
Example on a larger array, dropping one neighbour by zeroing a kernel entry:
This is just to show you how easy it is to perform similar operations with convolutions. (Note that convolve2d flips the kernel, so the zero placed at the kernel's top-left corner actually excludes each element's bottom-right neighbour.)
a = np.arange(5*6).reshape((5,6))
# array([[ 0, 1, 2, 3, 4, 5],
# [ 6, 7, 8, 9, 10, 11],
# [12, 13, 14, 15, 16, 17],
# [18, 19, 20, 21, 22, 23],
# [24, 25, 26, 27, 28, 29]])
convolve2d(a, [[0,1,1],[1,0,1],[1,1,1]], mode='same')
array([[ 7, 15, 19, 23, 27, 25],
[ 20, 42, 49, 56, 63, 52],
[ 44, 84, 91, 98, 105, 82],
[ 68, 126, 133, 140, 147, 112],
[ 62, 107, 112, 117, 122, 73]])
If you would like to achieve this without any imports (assuming you have already checked that you have a well-formed list of lists, i.e. all the rows have the same length):
# you pass the matrix and the (i,j) coordinates of the element of interest
# This select the "matrix" around i,j (flooring to 0 and capping to
# the number of elements in the list - this is for the elements on the edge
# of the matrix)
def select(m, i, j):
    def s(x, y): return x[max(0, y-1):min(len(x), y+1) + 1]
    return [s(x, j) for x in s(m, i)]

def sum_around(m, i, j, excluded=True):
    # this sums all the elements within each list and computes the
    # grand total. It then subtracts the element at (i,j) if
    # excluded is True (the default behaviour and what you want here)
    return sum([sum(x) for x in select(m, i, j)]) - (m[i][j] if excluded else 0)
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(sum_around(m, 0, 0)) # prints 11
print(sum_around(m, 1, 1)) # prints 40
I guess you can add an extra row and column of zeros around the boundary.
Then you can sum the neighbouring elements without any boundary conditions.
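That padding idea can be written with np.pad and eight shifted views, which keeps it size-independent (a sketch, assuming a zero border is acceptable):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

p = np.pad(a, 1)                   # zero border removes all edge cases
# sum of the eight shifted copies = sum of each element's neighbours
neigh = (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:] +
         p[1:-1, :-2]              + p[1:-1, 2:] +
         p[2:, :-2]  + p[2:, 1:-1] + p[2:, 2:])

print(neigh[0, 0], neigh[1, 1])    # 11 40
```

This reproduces the convolution result above without importing scipy.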
I have accelerometer data (x,y,z) which is being updated every 50ms. I need to store 80 values of the data into a 3D numpy array (1, 80, 3). For example:
[[[x,y,z] (at 0ms)
[x,y,z] (at 50ms)
...
[x,y,z]]] (at 4000ms)
After getting the first 80 values, I need to update the array with upcoming values, for example:
[[[x,y,z] (at 50ms)
[x,y,z] (at 100ms)
...
[x,y,z]]] (at 4050ms)
I'm sure there is a way to update the array without needing to manually write 80 variables to store the data into, but I can't think of it. Would really appreciate some help here.
It sounds like you want your array to always be 80 long, so I would suggest rolling the array and then updating the last value.
import numpy as np
data = np.arange(80*3).reshape(80, 3)
data
>>> array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
...,
[231, 232, 233],
[234, 235, 236],
[237, 238, 239]])
data = np.roll(data, -1, axis=0)
data
>>> array([[ 3, 4, 5], # this is second row (index 1) in above array
[ 6, 7, 8], # third row
[ 9, 10, 11], # etc.
...,
[234, 235, 236],
[237, 238, 239],
[ 0, 1, 2]]) # the first row has been rolled to the last position
# now update last position with new data
data[-1] = [x, y, z] # new xyz data
data
>>> array([[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
...,
[234, 235, 236],
[237, 238, 239],
[ 76, 76, 76]]) # new data updates in correct position in array
You can use vstack (initializing the array for the first iteration):
data = [x, y, z]                      # first iteration
data = np.vstack([data, [x, y, z]])   # for the rest
print(data)                           # you get an Nx3 array

For the update every N seconds it is easier to use a FIFO or a ring buffer:
https://pypi.org/project/numpy_ringbuffer/
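A deque with maxlen gives the same sliding window without manual rolling (a sketch; the (t, t, t) samples stand in for real accelerometer readings arriving every 50ms):

```python
from collections import deque

import numpy as np

window = deque(maxlen=80)           # the oldest sample is dropped automatically

for t in range(100):                # pretend we received 100 samples
    window.append((float(t),) * 3)  # stand-in for one (x, y, z) reading

data = np.array(window)             # snapshot as an (80, 3) array
print(data.shape)                   # (80, 3)
print(data[0])                      # [20. 20. 20.]  -> samples 20..99 remain
```

Appending to a deque is O(1), whereas np.roll copies the whole array on every sample, so the deque is the better fit for a 50ms update loop.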
Alright, here is the given data:
There are three numpy arrays of the shapes:
(i, 4, 2), (i, 4, 3), (i, 4, 2)
the i is shared among them but is variable.
The dtype is float32 for everything.
The goal is to interweave them in a particular order. Let's look at the data at index 0 for these arrays:
[[-208. -16.]
[-192. -16.]
[-192. 0.]
[-208. 0.]]
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
[[ 0.49609375 0.984375 ]
[ 0.25390625 0.984375 ]
[ 0.25390625 0.015625 ]
[ 0.49609375 0.015625 ]]
In this case, the concatenated target array would look something like this:
[-208, -16, 1, 1, 1, 0.496, 0.984, -192, -16, 1, 1, 1, ...]
And then continue on with index 1.
I don't know how to achieve this, as the concatenate function just keeps telling me that the shapes don't match. The shape of the target array does not matter much, just that the memoryview of it must be in the given order for upload to a gpu shader.
Edit: I could achieve this with a few python for loops, but the performance impact would be a problem in this program.
Use np.dstack and flatten with np.ravel() -
np.dstack((a,b,c)).ravel()
Now, np.dstack is basically stacking along the third axis. So, alternatively we can use np.concatenate too along that axis, like so -
np.concatenate((a,b,c),axis=2).ravel()
Sample run -
1) Setup Input arrays :
In [613]: np.random.seed(1234)
...: n = 3
...: m = 2
...: a = np.random.randint(0,9,(n,m,2))
...: b = np.random.randint(11,99,(n,m,2))
...: c = np.random.randint(101,999,(n,m,2))
...:
2) Check input values :
In [614]: a
Out[614]:
array([[[3, 6],
[5, 4]],
[[8, 1],
[7, 6]],
[[8, 0],
[5, 0]]])
In [615]: b
Out[615]:
array([[[84, 58],
[61, 87]],
[[48, 45],
[49, 78]],
[[22, 11],
[86, 91]]])
In [616]: c
Out[616]:
array([[[104, 359],
[376, 560]],
[[472, 720],
[566, 115]],
[[344, 556],
[929, 591]]])
3) Output :
In [617]: np.dstack((a,b,c)).ravel()
Out[617]:
array([ 3, 6, 84, 58, 104, 359, 5, 4, 61, 87, 376, 560, 8,
1, 48, 45, 472, 720, 7, 6, 49, 78, 566, 115, 8, 0,
22, 11, 344, 556, 5, 0, 86, 91, 929, 591])
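To double-check that the flattened order matches the question's target, here is the same idea on toy float32 slices shaped like the question's index-0 data (two vertices instead of four; the values are made up to mirror the example):

```python
import numpy as np

a = np.array([[[-208., -16.], [-192., -16.]]], dtype=np.float32)  # (1, 2, 2)
b = np.ones((1, 2, 3), dtype=np.float32)                          # (1, 2, 3)
c = np.array([[[0.496, 0.984], [0.254, 0.984]]], dtype=np.float32)

out = np.concatenate((a, b, c), axis=2).ravel()
print(out)
# [-208. -16. 1. 1. 1. 0.496 0.984 -192. -16. 1. 1. 1. 0.254 0.984]
```

Because the arrays are joined along the last axis before flattening, each vertex's values stay grouped, which is exactly the per-vertex layout a GPU shader expects.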
What I would do is:

np.concatenate([a, b, c], axis=-1).ravel()

assuming a, b, c are the three arrays. (np.hstack joins along axis 1, where these shapes differ, so np.hstack([a, b, c]).flatten() raises an error here; the join has to be along the last axis.)
I want to create a numpy array from two different numpy arrays. For example:
Say I have 2 arrays a and b.
a = np.array([1,3,4])
b = np.array([[1,5,51,52],[2,6,61,62],[3,7,71,72],[4,8,81,82],[5,9,91,92]])
I want to loop through each value in array a, find it in the first column of array b, and then save that row of b into c, like below:
c = np.array([[1,5,51,52],
[3,7,71,72],
[4,8,81,82]])
I have tried doing:
c = np.zeros(shape=(len(b), 4))
for i in b:
    c[i] = a[b[i][:]]

but I get this error: "arrays used as indices must be of integer (or boolean) type".
Approach #1
If a is sorted, we can use np.searchsorted, like so -
idx = np.searchsorted(a,b[:,0])
idx[idx==a.size] = 0
out = b[a[idx] == b[:,0]]
Sample run -
In [160]: a
Out[160]: array([1, 3, 4])
In [161]: b
Out[161]:
array([[ 1, 5, 51, 52],
[ 2, 6, 61, 62],
[ 3, 7, 71, 72],
[ 4, 8, 81, 82],
[ 5, 9, 91, 92]])
In [162]: out
Out[162]:
array([[ 1, 5, 51, 52],
[ 3, 7, 71, 72],
[ 4, 8, 81, 82]])
If a is not sorted, we need to use sorter argument with searchsorted.
Approach #2
We can also use np.in1d -
b[np.in1d(b[:,0],a)]
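In newer NumPy versions, np.isin is the recommended spelling of the same membership test:

```python
import numpy as np

a = np.array([1, 3, 4])
b = np.array([[1, 5, 51, 52],
              [2, 6, 61, 62],
              [3, 7, 71, 72],
              [4, 8, 81, 82],
              [5, 9, 91, 92]])

c = b[np.isin(b[:, 0], a)]          # keep rows whose first column appears in a
print(c)
# [[ 1  5 51 52]
#  [ 3  7 71 72]
#  [ 4  8 81 82]]
```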
My goal is to vectorize the following operation in numpy,
y[n] = c1*x[n] + c2*x[n-1] + c3*y[n-1]
If n is time, I essentially need the outputs depending on previous inputs as well as previous outputs. I'm given the values of x[-1] and y[-1]. Also, this is a generalized version of my actual problem where c1 = 1.001, c2 = -1 and c3 = 1.
I could figure out the procedure to add the first two operands, simply by adding c1*x and c2*np.concatenate([x[-1], x[0:-1]]), but I can't seem to figure out the best way to deal with y[n-1].
One may use an IIR filter to do this. scipy.signal.lfilter is the correct choice in this case.
For my specific constants, the following code snippet would do -
from scipy import signal
initial = signal.lfiltic([1.001, -1], [1, -1], [y_0], [x_0])
output, _ = signal.lfilter([1.001, -1], [1, -1], input, zi=initial)

Here, signal.lfiltic is used to specify the initial conditions.
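A quick check of the coefficients against the naive loop, for the general y[n] = c1*x[n] + c2*x[n-1] + c3*y[n-1] (random input; x_0 and y_0 play the role of the given x[-1] and y[-1]):

```python
import numpy as np
from scipy import signal

c1, c2, c3 = 1.001, -1.0, 1.0
x = np.random.default_rng(0).normal(size=50)
x_0, y_0 = 0.5, 0.2                 # the given x[-1] and y[-1]

# naive recurrence: y[n] = c1*x[n] + c2*x[n-1] + c3*y[n-1]
y = np.empty_like(x)
y[0] = c1 * x[0] + c2 * x_0 + c3 * y_0
for n in range(1, len(x)):
    y[n] = c1 * x[n] + c2 * x[n - 1] + c3 * y[n - 1]

# same recurrence as an IIR filter: b = [c1, c2], a = [1, -c3]
zi = signal.lfiltic([c1, c2], [1, -c3], [y_0], [x_0])
out, _ = signal.lfilter([c1, c2], [1, -c3], x, zi=zi)

print(np.allclose(out, y))          # True
```

The sign flip on c3 comes from lfilter's convention a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] - a[1]*y[n-1].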
Just by playing around with cumsum:
First a little function to produce your expression iteratively:
def foo1(x, C):
    x = x.copy()
    for i in range(1, x.shape[0]-1):
        x[i] = np.dot(x[i-1:i+2], C)
    return x[1:-1]
Make a small test array (I first worked with np.arange(10))
In [227]: y=np.arange(1,11); np.random.shuffle(y)
# array([ 4, 9, 7, 8, 2, 6, 1, 5, 10, 3])
In [229]: foo1(y,[1,2,1])
Out[229]: array([ 29, 51, 69, 79, 92, 99, 119, 142])
In [230]: y[0] + np.cumsum(2*y[1:-1] + 1*y[2:])
Out[230]: array([ 29, 51, 69, 79, 92, 99, 119, 142], dtype=int32)
and with a different C:
In [231]: foo1(y,[1,3,2])
Out[231]: array([ 45, 82, 110, 128, 148, 161, 196, 232])
In [232]: y[0]+np.cumsum(3*y[1:-1]+2*y[2:])
Out[232]: array([ 45, 82, 110, 128, 148, 161, 196, 232], dtype=int32)
I first tried:
In [238]: x=np.arange(10)
In [239]: foo1(x,[1,2,1])
Out[239]: array([ 4, 11, 21, 34, 50, 69, 91, 116])
In [240]: np.cumsum(x[:-2]+2*x[1:-1]+x[2:])
Out[240]: array([ 4, 12, 24, 40, 60, 84, 112, 144], dtype=int32)
and then realized that the x[:-2] term wasn't needed:
In [241]: np.cumsum(2*x[1:-1]+x[2:])
Out[241]: array([ 4, 11, 21, 34, 50, 69, 91, 116], dtype=int32)
If I was back in school I probably would have discovered this sort of pattern with algebra, rather than a numpy trial-n-error. It may not be general enough, but hopefully it's a start.