For example, I want to make two datasets, one called Input and the other called Output.
The data in Input and Output are multi-dimensional, and the rows can have different lengths.
But I notice that in h5py, input_node and output_node are fixed:
Input = f.create_dataset('Input', (3,input_node ),dtype='float', chunks=True)
Output = f.create_dataset('Output', (3,output_node),dtype='float', chunks=True)
But HDF5 can't handle rows of varying length, as this code demonstrates:
import h5py
X = [[1,2,3,4],[1,2],[1,2,3,4,5,6]]
with h5py.File('myfile.hdf5', "w") as ofile:
ofile.create_dataset("X", data=X)
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
So how can I make a multi-dimensional dataset like this in h5py?
I don't quite follow what your {...} denote. In Python those are used for dictionaries and sets. [] are used for lists, () for tuples. Array shape is expressed as a tuple.
Anyways, your code produces
In [68]: X
Out[68]:
array([ list([0.6503719194043309, 0.8703218883225239, -1.4139639093161405, 2.3288987644271835, -1.7957516518177206]),
list([-0.1781710442823114, 0.9591992379396287, -0.6319292685053243]),
list([0.7104492662861611, -0.8951817329357393, -0.8925882332063567, 1.5587934871464815]),
list([-1.2384976614455354, 0.9044140291496179, 1.1277220227448401]),
list([1.1386910680393805, -0.1775792543137636, 1.0567836199711476]),
list([2.7535019220459707, 0.29518918092088386, -0.32166742909305196, 1.5269788560083497, 0.29633276686886767]),
list([1.6397535315116918, -0.8839570613086122, -0.4491121599234047, -2.4461439611764333, -0.6884616200199412, -1.1920165045444608]),
list([1.3240629024597295, 1.170019287452736, 0.5999977019629572, -0.38338543090263366, 0.6030856099472732]),
list([-0.013529997305716175, -0.7093551284624415, -1.8611980839518099, 0.9165791506693297]),
list([2.384081118320432, -0.6158201308053464, 0.8802896893269192, -0.7636283160361232])], dtype=object)
In [69]: y
Out[69]: array([1, 1, 0, 0, 0, 1, 1, 0, 1, 0])
y is a simple array. h5py should have no problem saving that.
X is an object dtype array, containing lists of varying size
In [72]: [len(l) for l in X]
Out[72]: [5, 3, 4, 3, 3, 5, 6, 5, 4, 4]
h5py cannot save that kind of array. At best you can write each element to a different dataset; each will then be saved as its own array.
....
for i, item in enumerate(X):
    ofile.create_dataset('name%s'%i, data=item)
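Putting it together, a minimal runnable sketch of that one-dataset-per-row approach (the file name and the 'name%s' naming scheme are just placeholders):
import h5py
import numpy as np

X = [[1, 2, 3, 4], [1, 2], [1, 2, 3, 4, 5, 6]]   # ragged rows of different lengths

with h5py.File('myfile.hdf5', 'w') as ofile:
    for i, item in enumerate(X):
        # each row becomes its own fixed-length dataset
        ofile.create_dataset('name%s' % i, data=np.asarray(item, dtype='float64'))
Reading the data back then means iterating over those dataset names rather than loading a single array.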
I want to delete a particular row or column from a NumPy array without actually creating a new copy.
Right now I'm doing arr = np.delete(arr, row_or_column_number, axis), but it returns a copy and I have to assign it back to arr every time.
I was wondering whether a more ingenious approach could be used where the change is made to the array itself instead of creating a new copy every time?
In [114]: x = np.arange(12).reshape(3,4)
In [115]: x.shape
Out[115]: (3, 4)
In [116]: x.ravel()
Out[116]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Do you understand how arrays are stored? Basically there's a flat storage of the elements, much like this ravel, and shape and strides. If not, you need to spend some time reading a numpy tutorial.
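If that's unfamiliar, here is a small sketch of the flat-storage-plus-strides picture (forcing int64 so the byte counts are predictable):
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4)
print(x.shape)    # (3, 4)
print(x.strides)  # (32, 8): 32 bytes to step to the next row, 8 bytes to the next element
print(x.ravel())  # the elements in their flat storage order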
Delete makes a new array:
In [117]: y = np.delete(x, 1, 0)
In [118]: y
Out[118]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
In [119]: y.shape
Out[119]: (2, 4)
In [120]: y.ravel()
Out[120]: array([ 0, 1, 2, 3, 8, 9, 10, 11])
This delete is the same as selecting 2 rows from x, x[[0,2],:].
Its data elements are different; it has to copy values from x. Whether you assign that back to x doesn't matter. Variable assignment is a trivial python operation. What matters is how the new array is created.
Now in this particular case it is possible to create a view. This is still a new array, but it shares memory with x. That's possible because I am selecting a regular pattern, not an arbitrary subset of the rows or columns.
In [121]: x[0::2,:]
Out[121]:
array([[ 0, 1, 2, 3],
[ 8, 9, 10, 11]])
Again, if view doesn't make sense, you need to read more numpy basics. And don't skip the python basics either.
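As a quick sanity check of the copy-versus-view distinction, np.shares_memory tells you whether two arrays use the same buffer; a small sketch:
import numpy as np

x = np.arange(12).reshape(3, 4)
y = np.delete(x, 1, 0)   # copy: new storage, independent of x
v = x[0::2, :]           # view: regular slicing pattern, shares x's buffer

print(np.shares_memory(x, y))   # False
print(np.shares_memory(x, v))   # True
v[0, 0] = 99
print(x[0, 0])                  # 99 -- modifying the view also modifies x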
Unfortunately, you can't do this using numpy: once an array is created its size is fixed, so elements can't be deleted in place. See the documentation.
Link to a related question: How to remove specific elements in a numpy array
Once a numpy array is created, its size is fixed. To delete (or add) a column or row, a new copy needs to be created.
(Even if numpy had an option to drop columns without reassignment, it's likely that another copy would still be created. Another library, Pandas, has an option called "inplace" to delete a column from an object without doing any reassignment, but its use is discouraged, and it doesn't literally prevent a copy from being created. For these reasons, it may be deprecated in the future.)
I need to load each numpy file from a folder as a 1-dimensional array. This is my code:
path ='E:\\t'
traces= os.listdir(path)
print("tempTracesHW_Creation=", tempTracesHW)
for i in range(len(traces)):
    HW = tempHW[i]
    for trace in os.listdir(path):
        file_array = np.load(os.path.join(path, trace))
        print file_array
        tempTracesHW[HW].append(file_array)
The result of file_array is:
file_array= [[-0.0006447 -0.00094265 -0.0012406 ..., -0.02096185 -0.0210646
-0.02114679]]
But what I want is:
file_array= [-0.0006447 -0.00094265 -0.0012406 ..., -0.02096185 -0.0210646
-0.02114679]
I would be very grateful if you could help me, please.
The numpy load function loads the file and returns the array.
file_array is two-dimensional because the data saved in the file you load is two-dimensional.
Check the trace file; you need to save it as a one-dimensional array.
For example:
numpy.save("example", numpy.array([1, 2, 3]))
result = numpy.load("example.npy")
print result
[1 2 3]
See if this helps.
Sharing more of the code snippet (in particular how the trace files are created) would have helped in understanding your problem.
You can use flatten to turn this (1, x) array into an (x,) array. flatten has other uses as well, but in this case it will do what you're looking for.
>>> import numpy as np
>>> a = np.array([[1, 2, 3, 4, 5]])
>>> a
array([[1, 2, 3, 4, 5]])
>>> a.shape
(1, 5)
>>> a.flatten()
array([1, 2, 3, 4, 5])
>>> a.flatten().shape
(5,)
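Applied to the file-loading question above, a minimal sketch, assuming each .npy file holds a (1, N) array as in the question (the path is just a placeholder):
import os
import numpy as np

path = 'E:\\t'   # placeholder folder containing the .npy trace files
for trace in os.listdir(path):
    # np.load returns the stored (1, N) array; flatten turns it into shape (N,)
    file_array = np.load(os.path.join(path, trace)).flatten()
    print(file_array.shape)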
I am trying to append a new row to an existing numpy array in a loop. I have tried the methods involving append, concatenate and also vstack, but none of them gives me the result I want.
I have tried the following:
for _ in col_change:
    if (item + 2 < len(col_change)):
        arr = [col_change[item], col_change[item + 1], col_change[item + 2]]
        array = np.concatenate((array, arr), axis=0)
    item += 1
I have also tried it in the most basic format and it still gives me an empty array.
array=np.array([])
newrow = [1, 2, 3]
newrow1 = [4, 5, 6]
np.concatenate((array,newrow), axis=0)
np.concatenate((array,newrow1), axis=0)
print(array)
I want the output to be [[1,2,3],[4,5,6],...]
The correct way to build an array incrementally is to not start with an array:
alist = []
alist.append([1, 2, 3])
alist.append([4, 5, 6])
arr = np.array(alist)
This is essentially the same as
arr = np.array([ [1,2,3], [4,5,6] ])
the most common way of making a small (or large) sample array.
Even if you have good reason to use some version of concatenate (hstack, vstack, etc), it is better to collect the components in a list and perform the concatenation once.
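A minimal sketch of that collect-then-concatenate pattern (cheap list appends inside the loop, one np.vstack at the end):
import numpy as np

rows = []
for newrow in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
    rows.append(newrow)      # list append inside the loop
arr = np.vstack(rows)        # a single concatenation once the loop is done
print(arr)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]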
If you want [[1,2,3],[4,5,6]], I can offer an alternative without append: np.arange and then reshape it:
>>> import numpy as np
>>> np.arange(1,7).reshape(2, 3)
array([[1, 2, 3],
[4, 5, 6]])
Or create a big array and fill it manually (or in a loop):
>>> array = np.empty((2, 3), int)
>>> array[0] = [1,2,3]
>>> array[1] = [4,5,6]
>>> array
array([[1, 2, 3],
[4, 5, 6]])
A note on your examples:
In the second one you forgot to save the result; make it array = np.concatenate((array,newrow1), axis=0) and it works (not exactly like you want it, but the array is not empty anymore). The first example seems badly indented, and without knowing the variables and/or the problem there it's hard to debug.
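For reference, a corrected sketch of that second example: the result is assigned back each time, and the starting array is given a compatible (0, 3) shape so the new rows stack as rows (this is still the slow repeated-concatenation pattern; the list approach above is preferable):
import numpy as np

array = np.empty((0, 3))             # zero rows, three columns
for newrow in ([1, 2, 3], [4, 5, 6]):
    # wrap the row in a list so it is 2-D, and keep the returned array
    array = np.concatenate((array, [newrow]), axis=0)
print(array)
# [[1. 2. 3.]
#  [4. 5. 6.]]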
I am trying to find the NumPy matrix operations that give the same result as the following for-loop code. I believe it will be much faster, but I am missing some Python skills to do it.
It works row by row: each value from a row of x is multiplied by each value of the same row in e, and everything is summed.
The first item of result would be (2*0+2*1+2*4+2*2+2*3)+(0*0+...)+...+(1*0+1*1+1*4+1*2+1*3)=30
Any idea would be much appreciated :).
e = np.array([[0,1,4,2,3],[2,0,2,3,0,1]])
x = np.array([[2,0,0,0,1],[0,3,0,0,4,0]])
result = np.zeros(len(x))
for key, j in enumerate(x):
    for jj in j:
        for i in e[key]:
            result[key] += jj*i
>>> result
Out[1]: array([ 30., 56.])
Those are ragged arrays, as they have lists of different lengths. So a fully vectorized approach, even if possible, won't be straightforward. Here's one using np.einsum in a list comprehension -
[np.einsum('i,j->',x[n],e[n]) for n in range(len(x))]
Sample run -
In [381]: x
Out[381]: array([[2, 0, 0, 0, 1], [0, 3, 0, 0, 4, 0]], dtype=object)
In [382]: e
Out[382]: array([[0, 1, 4, 2, 3], [2, 0, 2, 3, 0, 1]], dtype=object)
In [383]: [np.einsum('i,j->',x[n],e[n]) for n in range(len(x))]
Out[383]: [30, 56]
If you still feel persistent about a fully vectorized approach, you could make a regular array, with the shorter lists padded with zeros; there are posts that list NumPy-based approaches to do that filling.
Once we have regular-shaped arrays x and e, the final result would simply be -
np.einsum('ik,il->i',x,e)
Is this close to what you are looking for?
https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
It seems like you are trying to get the dot product of matrices.
What is the most efficient way to remove the last element from a numpy 1 dimensional array? (like pop for list)
NumPy arrays have a fixed size, so you cannot remove an element in-place. For example using del doesn't work:
>>> import numpy as np
>>> arr = np.arange(5)
>>> del arr[-1]
ValueError: cannot delete array elements
Note that the index -1 represents the last element. That's because negative indices in Python (and NumPy) are counted from the end, so -1 is the last, -2 is the one before last and -len is actually the first element. That's just for your information in case you didn't know.
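In case that's new, a quick sketch of negative indexing:
import numpy as np

arr = np.arange(5)       # array([0, 1, 2, 3, 4])
print(arr[-1])           # 4 -> the last element
print(arr[-2])           # 3 -> the one before last
print(arr[-len(arr)])    # 0 -> the first element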
Python lists are variable sized so it's easy to add or remove elements.
So if you want to remove an element you need to create a new array or view.
Creating a new view
You can create a new view containing all elements except the last one using the slice notation:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> arr[:-1] # all but the last element
array([0, 1, 2, 3])
>>> arr[:-2] # all but the last two elements
array([0, 1, 2])
>>> arr[1:] # all but the first element
array([1, 2, 3, 4])
>>> arr[1:-1] # all but the first and last element
array([1, 2, 3])
However a view shares the data with the original array, so if one is modified so is the other:
>>> sub = arr[:-1]
>>> sub
array([0, 1, 2, 3])
>>> sub[0] = 100
>>> sub
array([100, 1, 2, 3])
>>> arr
array([100, 1, 2, 3, 4])
Creating a new array
1. Copy the view
If you don't like this memory sharing you have to create a new array, in this case it's probably simplest to create a view and then copy (for example using the copy() method of arrays) it:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> sub_arr = arr[:-1].copy()
>>> sub_arr
array([0, 1, 2, 3])
>>> sub_arr[0] = 100
>>> sub_arr
array([100, 1, 2, 3])
>>> arr
array([0, 1, 2, 3, 4])
2. Using integer array indexing [docs]
However, you can also use integer array indexing to remove the last element and get a new array. This integer array indexing will always (not 100% sure there) create a copy and not a view:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> indices_to_keep = [0, 1, 2, 3]
>>> sub_arr = arr[indices_to_keep]
>>> sub_arr
array([0, 1, 2, 3])
>>> sub_arr[0] = 100
>>> sub_arr
array([100, 1, 2, 3])
>>> arr
array([0, 1, 2, 3, 4])
This integer array indexing can be useful to remove arbitrary elements from an array (which can be tricky or impossible when you want a view):
>>> arr = np.arange(5, 10)
>>> arr
array([5, 6, 7, 8, 9])
>>> arr[[0, 1, 3, 4]] # keep first, second, fourth and fifth element
array([5, 6, 8, 9])
If you want a generalized function that removes the last element using integer array indexing:
def remove_last_element(arr):
    return arr[np.arange(arr.size - 1)]
3. Using boolean array indexing [docs]
There is also boolean indexing that could be used, for example:
>>> arr = np.arange(5, 10)
>>> arr
array([5, 6, 7, 8, 9])
>>> keep = [True, True, True, True, False]
>>> arr[keep]
array([5, 6, 7, 8])
This also creates a copy! And a generalized approach could look like this:
def remove_last_element(arr):
    if not arr.size:
        raise IndexError('cannot remove last element of empty array')
    keep = np.ones(arr.shape, dtype=bool)
    keep[-1] = False
    return arr[keep]
If you would like more information on NumPy's indexing, the documentation on "Indexing" is quite good and covers a lot of cases.
4. Using np.delete()
Normally I wouldn't recommend the NumPy functions that "seem" like they are modifying the array in-place (like np.append and np.insert) but do return copies because these are generally needlessly slow and misleading. You should avoid them whenever possible, that's why it's the last point in my answer. However in this case it's actually a perfect fit so I have to mention it:
>>> arr = np.arange(10, 20)
>>> arr
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> np.delete(arr, -1)
array([10, 11, 12, 13, 14, 15, 16, 17, 18])
5. Using np.resize()
NumPy has another method that sounds like it does an in-place operation but it really returns a new array:
>>> arr = np.arange(5)
>>> arr
array([0, 1, 2, 3, 4])
>>> np.resize(arr, arr.size - 1)
array([0, 1, 2, 3])
To remove the last element I simply provided a new shape that is 1 smaller than before, which effectively removes the last element.
Modifying the array in place
Yes, I wrote previously that you cannot modify an array in place. But I said that because in most cases it's not possible, or only possible by disabling some (genuinely useful) safety checks. Also, I'm not sure about the internals, but depending on the old and new size the in-place resize could involve an (internal-only) copy operation, so it might even be slower than creating a view.
Using np.ndarray.resize()
If the array doesn't share its memory with any other array, then it's possible to resize the array in place:
>>> arr = np.arange(5, 10)
>>> arr.resize(4)
>>> arr
array([5, 6, 7, 8])
However, that will throw a ValueError if the array references, or is referenced by, another array:
>>> arr = np.arange(5)
>>> view = arr[1:]
>>> arr.resize(4)
ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function
You can disable that safety check by setting refcheck=False, but that shouldn't be done lightly, because you make yourself vulnerable to segmentation faults and memory corruption in case the other reference tries to access the removed elements! This refcheck argument should be treated as an expert-only option!
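For completeness, a sketch of what that expert-only escape hatch looks like; it only stays harmless here because the existing view is never used after the resize:
import numpy as np

arr = np.arange(5)
view = arr[1:]                   # another array now references arr's buffer
arr.resize(4, refcheck=False)    # force the in-place resize despite the reference
print(arr)                       # [0 1 2 3]
# Do NOT touch `view` from here on: it may point at freed or reused memory.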
Summary
Creating a view is really fast and doesn't take much additional memory, so whenever possible you should try to work with views. However, depending on the use case it's not so easy to remove arbitrary elements using basic slicing. While it's easy to remove the first n elements and/or the last n elements, or to remove every x-th element (the step argument for slicing), that is all you can do with it.
But in your case of removing the last element of a one-dimensional array I would recommend:
arr[:-1] # if you want a view
arr[:-1].copy() # if you want a new array
because these most clearly express the intent and everyone with Python/NumPy experience will recognize that.
Timings
Based on the timing framework from this answer:
# Setup
import numpy as np

def view(arr):
    return arr[:-1]

def array_copy_view(arr):
    return arr[:-1].copy()

def array_int_index(arr):
    return arr[np.arange(arr.size - 1)]

def array_bool_index(arr):
    if not arr.size:
        raise IndexError('cannot remove last element of empty array')
    keep = np.ones(arr.shape, dtype=bool)
    keep[-1] = False
    return arr[keep]

def array_delete(arr):
    return np.delete(arr, -1)

def array_resize(arr):
    return np.resize(arr, arr.size - 1)

# Timing setup
timings = {view: [],
           array_copy_view: [], array_int_index: [], array_bool_index: [],
           array_delete: [], array_resize: []}
sizes = [2**i for i in range(1, 20, 2)]

# Timing
for size in sizes:
    print(size)
    func_input = np.random.random(size=size)
    for func in timings:
        print(func.__name__.ljust(20), ' ', end='')
        res = %timeit -o func(func_input)  # if you use IPython, otherwise use the "timeit" module
        timings[func].append(res)

# Plotting
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

for func in timings:
    ax.plot(sizes,
            [time.best for time in timings[func]],
            label=func.__name__)

ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()
I get the following timings as a log-log plot to cover all the details; lower time still means faster, but the range between two ticks represents one order of magnitude instead of a fixed amount. In case you're interested in the specific values, I copied them into this gist:
According to these timings the view and the copied-view approaches are also the fastest. (Python 3.6 and NumPy 1.14.0)
If you just want to quickly get the array without its last element (without explicitly removing it), use slicing:
array[:-1]
To delete the last element from a 1-dimensional NumPy array, use the numpy.delete method, like so:
import numpy as np
# Create a 1-dimensional NumPy array that holds 5 values
values = np.array([1, 2, 3, 4, 5])
# Remove the last element of the array using the numpy.delete method
values = np.delete(values, -1)
print(values)
Output:
[1 2 3 4]
The last value of the NumPy array, which was 5, is now removed.