Create a dynamic 2D numpy array on the fly - python

I am having a hard time creating a numpy 2D array on the fly.
So basically I have a for loop something like this.
for ele in huge_list_of_lists:
    instance = np.array(ele)
This creates a 1D numpy array from each list, and now I want to append it to a numpy array, basically converting a list of lists into an array of arrays.
I have checked the manual, but np.append() doesn't seem to fit, because it needs two arguments to append together.
Any clues?

Create the 2D array up front, and fill the rows while looping:
my_array = numpy.empty((len(huge_list_of_lists), row_length))
for i, x in enumerate(huge_list_of_lists):
    my_array[i] = create_row(x)
where create_row() returns a list or 1D NumPy array of length row_length.
Depending on what create_row() does, there might be even better approaches that avoid the Python loop altogether.
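As a minimal sketch of both options, assume a hypothetical create_row() that simply doubles each value (the answer does not say what it really does). The sample data and row_length below are also made up for illustration:

```python
import numpy as np

# Hypothetical input: three rows of equal length.
huge_list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
row_length = 3

def create_row(x):
    # Placeholder transformation, assumed for illustration only.
    return [2 * value for value in x]

# Preallocate once, then fill one row per iteration.
my_array = np.empty((len(huge_list_of_lists), row_length))
for i, x in enumerate(huge_list_of_lists):
    my_array[i] = create_row(x)

# Because this create_row() is elementwise, the Python loop can be dropped.
vectorized = 2 * np.array(huge_list_of_lists, dtype=float)
```

Whether the loop can be removed depends entirely on what the real create_row() computes.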

Just pass the list of lists to numpy.array. Keep in mind that numpy arrays are ndarrays, so a list of lists doesn't translate to an array of arrays; it translates to a 2D array.
>>> import numpy as np
>>> a = [[1., 2., 3.], [4., 5., 6.]]
>>> b = np.array(a)
>>> b
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
>>> b.shape
(2, 3)
Also ndarrays have nd-indexing so [1][1] becomes [1, 1] in numpy:
>>> a[1][1]
5.0
>>> b[1, 1]
5.0
Did I misunderstand your question?
You definitely don't want to use numpy.append for something like this. Keep in mind that numpy.append has O(n) run time, so if you call it n times, once for each row of your array, you end up with an O(n^2) algorithm. If you need to create the array before you know what all the content is going to be, but you know the final size, it's best to create an array using numpy.zeros(shape, dtype) and fill it in later, similar to Sven's answer.
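A small sketch of the two approaches side by side may make the difference concrete. The data here is illustrative only:

```python
import numpy as np

rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # illustrative data

# Quadratic overall: every np.append call allocates a new array
# and copies everything accumulated so far.
grown = np.empty((0, 3))
for row in rows:
    grown = np.append(grown, [row], axis=0)

# Linear overall: allocate once, then write each row in place.
filled = np.zeros((len(rows), 3), dtype=float)
for i, row in enumerate(rows):
    filled[i] = row
```

Both arrays end up identical; only the amount of copying differs.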

import numpy as np
ss = np.ndarray(shape=(3,3), dtype=int)
ss
array([[ 0, 139911262763080, 139911320845424],
[ 10771584, 10771584, 139911271110728],
[139911320994680, 139911206874808, 80]]) # arbitrary values from uninitialized memory
The numpy.ndarray constructor achieves this. Note that the contents are uninitialized, not zeros.

Related

Place output of numpy function into diagonal of array

I want to take the row sums of one array and place the output into the diagonals of another array. For performance reasons, I want to use the out argument of the np.sum function.
mat1 = np.array([[0.5, 0.5],[0.6, 0.4]])
mat2 = np.zeros([2,2])
mat3 = np.zeros([2,2])
If I want to place the row sums of mat1 into the first row of mat2, I can do it like this:
np.sum(mat1, axis=1, out = mat2[0])
mat2
#array([[ 1., 1.],
# [ 0., 0.]])
However, if I want to place the sums into the diagonal indices of mat3, I can't seem to do so.
np.sum(mat1, axis=1, out = mat3[np.diag_indices(2)])
mat3
#array([[ 0., 0.],
# [ 0., 0.]])
Of course, the following works, but I would like to use the out argument of np.sum
mat3[np.diag_indices(2)] = np.sum(mat1, axis=1)
mat3
#array([[ 1., 0.],
# [ 0., 1.]])
Can someone explain this behavior of the out argument not accepting the diagonal indices of an array as a valid output?
NumPy has two types of indexing: basic indexing and advanced indexing.
Basic indexing is what happens when your index expression uses only integers, slices, ..., and None (a.k.a. np.newaxis). This can be implemented entirely through simple manipulation of offsets and strides, so when basic indexing returns an array, the resulting array is always a view of the original data. Writing to the view writes to the original array.
When you index with an array, as in mat3[np.diag_indices(2)], you get advanced indexing. Advanced indexing cannot be done in a way that returns a view of the original data; it always copies data from the original array. That means that when you try to use the copy as an out parameter:
np.sum(mat1, axis=1, out = mat3[np.diag_indices(2)])
The data is placed into the copy, but the original array is unaffected.
We were supposed to have the ability to use np.diagonal for this by now, but even though the documentation says np.diagonal's output is writeable in NumPy 1.10, the relevant feature for making it writable is still in limbo. It's probably best to just not use the out parameter for this:
mat3[np.diag_indices(2)] = np.sum(mat1, axis=1)
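The view-versus-copy distinction can be checked with np.shares_memory. The sketch below also shows one way to get a writable diagonal view using only basic indexing. This strided-slice trick is not from the answer above, and it assumes mat3 is C-contiguous so that reshape(-1) returns a view:

```python
import numpy as np

mat1 = np.array([[0.5, 0.5], [0.6, 0.4]])
mat3 = np.zeros((2, 2))

# Advanced indexing returns a copy, so it shares no memory with mat3.
copy_of_diag = mat3[np.diag_indices(2)]
is_copy = not np.shares_memory(copy_of_diag, mat3)

# Basic indexing: every (n + 1)-th element of the flattened array
# is a diagonal entry, and the slice is a view.
n = mat3.shape[0]
diag_view = mat3.reshape(-1)[:: n + 1]
is_view = np.shares_memory(diag_view, mat3)

# Writing through the view updates mat3 itself.
np.sum(mat1, axis=1, out=diag_view)
```

After this runs, mat3 holds the row sums on its diagonal, because out received a view rather than a copy.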

How to assign a 1D numpy array of length x to an element of length y of a 2D Numpy Array?

I'm looking for a way to assign a 1D numpy-array consisting of x elements to a 2D numpy Array of shape (y,z).
Example:
A=np.array([[0],[0],[0]])
A[2]=np.array([0,2])
Which should result in
A=[[0],[0],[0,2]]
This works perfectly fine using a python list, but has been causing me huge trouble when trying to do it in numpy, usually resulting in the error message:
could not broadcast input array from shape (z) into shape (x)
This seems to occur as a result of the fact that numpy copies everything instead of modifying the array in place. I have only recently begun using numpy and would really be grateful if someone could help find a way to do this efficiently.
Actually the issue is that Numpy refuses to perform implicit copies or reshapes. For instance:
>>> A=np.array([[0],[0],[0]])
>>> A[2]=np.array([0,2])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (2) into shape (1)
Here A[2] is a subarray of A with shape (1,). Two cells can't fit into one, so we get a shape error. The reverse situation is possible and is known as broadcasting:
>>> A[0:2]=5
>>> A
array([[5],
[5],
[0]])
Here a single scalar has been broadcast to update the entire subarray. We can resize A to be able to fit the shape 2 entry:
>>> A.shape
(3, 1)
>>> A.resize((3,2))
>>> A.shape
(3, 2)
>>> A[2]=np.array([0,2])
>>> A
array([[5, 5],
[0, 0],
[0, 2]])
We can see that the resizing actually reorganized our cells. It still starts with 5 5 0 but the cells are no longer along a single column. This is because numpy doesn't copy unless asked to, either; all of our multicell slices in fact refer into the same original array. We can make a second matrix and copy the original into a single column there:
>>> B=np.zeros((A.shape[0]+1,A.shape[1]))
>>> B[:,0]=A.transpose()
>>> B
array([[ 5., 0.],
[ 5., 0.],
[ 0., 0.]])
The transpose is because the slice of B is a 1-dimensional shape (3 long) rather than a 2-dimensional shape like A (which is 1 wide and 3 high). Numpy considers the 1-dimensional array to be a horizontal shape, so a 3 wide and 1 high matrix will fit. You could think of it like copying a range of cells in a spreadsheet.
Notably, the numbers thus placed in B are copies of what was in A. This is because we did a modification of B. Views can be used to manipulate sections of a matrix (including seeing it in another shape, like transpose() does), for instance:
>>> C=B[::-1,1]
>>> C
array([ 0., 0., 0.])
>>> C[:]=[1,2,3]
>>> B
array([[ 5., 3.],
[ 5., 2.],
[ 0., 1.]])
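The broadcasting rule described in this answer can be checked directly. The last line of this sketch is an assumption about what the asker might want: for genuinely ragged rows, a plain Python list of arrays is one option, since a regular 2D array needs rows of equal length:

```python
import numpy as np

A = np.zeros((3, 1), dtype=int)

# A scalar broadcasts to any target shape.
A[0:2] = 5

# A length-2 array cannot be placed into a length-1 row.
try:
    A[2] = np.array([0, 2])
    failed = False
except ValueError:
    failed = True

# One possible alternative for ragged data: keep a list of 1D arrays.
ragged = [np.array([0]), np.array([0]), np.array([0, 2])]
```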

Numpy convert scalars to arrays

I am evaluating arbitrary expressions in terms of an x array, such as 3*x**2 + 4. This normally results in an array with x's shape. However if the expression is just a constant, it returns a scalar. What is the best way to ensure it has x's shape without explicitly checking the shape? Multiplying by numpy.ones(x.shape) works, but I think that uses unnecessary computations.
Edit:
To be clear, I don't just want it to be an array with size one, I want it to be the same shape and size as X.
I'm evaluating a string using NumExpr which can contain an arbitrary function of x:
x = numpy.linspace(min, max, num)
y = numexpr.evaluate(expr, {'x': x}, {})
I want to get an array of y-values that could be plotted against x through matplotlib. Currently I am doing this, which works fine:
y = numpy.ones(x.size) * y
But I'm worried that this is wasteful for large sizes. Is there a better way?
See atleast_1d:
Convert inputs to arrays with at least one dimension.
>>> import numpy as np
>>> x = 42 # x is a scalar
>>> np.atleast_1d(x)
array([42])
>>> x_is_array = np.array(42) # A zero dim array
>>> np.atleast_1d(x_is_array)
array([42])
>>> x_is_another_array = np.array([42]) # A 1d array
>>> np.atleast_1d(x_is_another_array)
array([42])
>>> np.atleast_1d(np.ones((3, 3))) # Any other numpy array
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
When I'm unsure whether x will be a scalar, list/tuple or array, I've been using:
x = np.asarray(x).reshape(1, -1)[0,:]
Alternatively by (ab)using the broadcasting rules, you could equally write:
x = np.asarray(x) * np.ones(1)
Perhaps a slightly more streamlined syntax is to make use of the extra arguments on the array constructor:
x = np.array(x, ndmin=1, copy=False)
Which will ensure that the array has at least one dimension.
But this is one of those things that seems a bit clumsy in numpy
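Since the question asks for a result with the same shape as x, not just one dimension, a sketch using np.broadcast_to and np.full may be closer to what is needed. Neither function appears in the answers above, and the values of x and y here are stand-ins:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)
y = 4.0  # stand-in for the result of a constant expression

# Read-only view with x's shape; the scalar is not copied per element.
y_view = np.broadcast_to(y, x.shape)

# Writable array with x's shape, filled directly without a multiplication.
y_full = np.full(x.shape, y)
```

np.broadcast_to also accepts a y that already has x's shape, so it can be applied without checking first. Use np.full, or copy the view, if the result has to be modified afterwards.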
You can use reshape: np.reshape(x, (1,1))
Here's a demonstration:
>>> x = 4
>>> a = np.reshape(x, (1,1))
>>> a[0]
array([4])
>>> a[0][0]
4
The same reshape is useful when a function expects 2D input, for example:
lin_reg.predict(np.array(6.5).reshape(1,-1))

How to convert a Numpy 2D array with object dtype to a regular 2D array of floats

As part of a broader program I am working on, I ended up with object arrays with strings, 3D coordinates, etc. all mixed together. I know object arrays are not favored compared to structured arrays, but I am hoping to get around this without changing a lot of code.
Let's assume every row of my array obj_array (with N rows) has the format
Single entry/object of obj_array: ['NAME',[10.0,20.0,30.0],....]
Now, I am trying to load this object array and slice out the 3D coordinate chunk. Up to here, everything works fine by simply asking for, let's say:
obj_array[:,[1,2,3]]
However, the result is also an object array, which is a problem because I want to form a 2D array of floats with:
size [N,3] of N rows and 3 entries of X,Y,Z coordinates
For now, I am looping over the rows and assigning every row to a row of a destination 2D float array to get around the problem. I am wondering if there is a better way using numpy's array conversion tools? I tried a few things and could not get around it.
Centers = np.zeros([N,3])
for row in range(obj_array.shape[0]):
    Centers[row,:] = obj_array[row,1]
Thanks
Nasty little problem... I have been fooling around with this toy example:
>>> arr = np.array([['one', [1, 2, 3]],['two', [4, 5, 6]]], dtype=object)
>>> arr
array([['one', [1, 2, 3]],
['two', [4, 5, 6]]], dtype=object)
My first guess was:
>>> np.array(arr[:, 1])
array([[1, 2, 3], [4, 5, 6]], dtype=object)
But that keeps the object dtype, so perhaps then:
>>> np.array(arr[:, 1], dtype=float)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.
You can normally work around this doing the following:
>>> np.array(arr[:, 1], dtype=[('', float)]*3).view(float).reshape(-1, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected a readable buffer object
Not here though, which was kind of puzzling. Apparently it is the fact that the objects in your array are lists that throws this off, as replacing the lists with tuples works:
>>> np.array([tuple(j) for j in arr[:, 1]],
... dtype=[('', float)]*3).view(float).reshape(-1, 3)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
Since there doesn't seem to be any entirely satisfactory solution, the easiest is probably to go with:
>>> np.array(list(arr[:, 1]), dtype=float)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
Although that will not be very efficient, probably better to go with something like:
>>> np.fromiter((tuple(j) for j in arr[:, 1]), dtype=[('', float)]*3,
... count=len(arr)).view(float).reshape(-1, 3)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
Based on Jaime's toy example I think you can do this very simply using np.vstack():
arr = np.array([['one', [1, 2, 3]],['two', [4, 5, 6]]], dtype=object)
float_arr = np.vstack(arr[:, 1]).astype(float)
This will work regardless of whether the 'numeric' elements in your object array are 1D numpy arrays, lists or tuples.
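A self-contained sketch of the np.vstack approach follows. The object array is built element by element here, which is an illustrative choice so that the example mixes a list and a tuple in the coordinate column:

```python
import numpy as np

# Build a small object array: a name plus a coordinate sequence per row.
arr = np.empty((2, 2), dtype=object)
arr[0, 0], arr[0, 1] = 'one', [1, 2, 3]   # coordinates as a list
arr[1, 0], arr[1, 1] = 'two', (4, 5, 6)   # coordinates as a tuple

# Stack the coordinate column into a regular 2D array, then cast to float.
float_arr = np.vstack(arr[:, 1]).astype(float)
```

The result has shape (N, 3) and a float dtype, so it can be used in ordinary numeric code.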
This works great working on your array arr to convert from an object to an array of floats. Number processing is extremely easy after. Thanks for that last post!!!! I just modified it to include any DataFrame size:
float_arr = np.vstack(arr[:, :]).astype(float)
This is a much faster way to convert your object array to a NumPy float array:
arr=np.array(arr, dtype=[('O', float)]).astype(float) - from there no looping is needed; index it just like you normally would a NumPy array. You'd have to do it in chunks, though, for your different datatypes: arr[:, 1], arr[:,2], etc. I had the same issue with a NumPy tuple object returned from a C++ DLL function; conversion for 17M elements takes <2s.
You may want to use a structured array, so that when you need to access the names and the values independently you can easily do so. In this example, there are two data points:
x = np.zeros(2, dtype=[('name','S10'), ('value','f4',(3,))])
x[0][0]='item1'
x[1][0]='item2'
y1=x['name']
y2=x['value']
the result:
>>> y1
array(['item1', 'item2'],
dtype='|S10')
>>> y2
array([[ 0., 0., 0.],
[ 0., 0., 0.]], dtype=float32)
See more details: http://docs.scipy.org/doc/numpy/user/basics.rec.html
This problem usually happens when you have a dataset with different types, usually, dates in the first column or so.
What I usually do is store the date column in a separate variable and put the rest of the "X matrix of features" into X. So I have dates and X, for instance.
Then I apply the conversion to the X matrix as:
X = np.array(list(X[:,:]), dtype=float)
Hope this helps!
For structured arrays use
structured_to_unstructured(arr).astype(float)
See: https://numpy.org/doc/stable/user/basics.rec.html#numpy.lib.recfunctions.structured_to_unstructured
np.array(list(arr), dtype=float) would work to convert all the elements in the array to float at once.

initialize a numpy array

Is there a way to initialize a numpy array of a given shape and add to it? I will explain what I need with a list example. If I want to create a list of objects generated in a loop, I can do:
a = []
for i in range(5):
    a.append(i)
I want to do something similar with a numpy array. I know about vstack, concatenate etc. However, it seems these require two numpy arrays as inputs. What I need is:
big_array # Initially empty. This is where I don't know what to specify
for i in range(5):
    array i of shape = (2,4) created.
    add to big_array
The big_array should have a shape (10,4). How to do this?
EDIT:
I want to add the following clarification. I am aware that I can define big_array = numpy.zeros((10,4)) and then fill it up. However, this requires specifying the size of big_array in advance. I know the size in this case, but what if I do not? When we use the .append function for extending the list in python, we don't need to know its final size in advance. I am wondering if something similar exists for creating a bigger array from smaller arrays, starting with an empty array.
numpy.zeros
Return a new array of given shape and type, filled with zeros.
or
numpy.ones
Return a new array of given shape and type, filled with ones.
or
numpy.empty
Return a new array of given shape and type, without initializing entries.
However, the mentality in which we construct an array by appending elements to a list is not much used in numpy, because it's less efficient (numpy datatypes are much closer to the underlying C arrays). Instead, you should preallocate the array to the size that you need it to be, and then fill in the rows. You can use numpy.append if you must, though.
The way I usually do that is by creating a regular list, appending my items to it, and finally converting the list to a numpy array as follows:
import numpy as np
big_array = [] # empty regular list
for i in range(5):
    arr = i*np.ones((2,4)) # for instance
    big_array.append(arr)
big_np_array = np.array(big_array) # converted to a numpy array of shape (5, 2, 4)
Of course, your final object takes twice the space in memory at the creation step, but appending to a Python list is very fast, and so is creation using np.array().
Introduced in numpy 1.8:
numpy.full
Return a new array of given shape and type, filled with fill_value.
Examples:
>>> import numpy as np
>>> np.full((2, 2), np.inf)
array([[ inf, inf],
[ inf, inf]])
>>> np.full((2, 2), 10)
array([[10, 10],
[10, 10]])
The array analogue of Python's
a = []
for i in range(5):
    a.append(i)
is:
import numpy as np
a = np.empty((0))
for i in range(5):
    a = np.append(a, i)
You do want to avoid explicit loops as much as possible when doing array computing, as that reduces the speed gain from that form of computing. There are multiple ways to initialize a numpy array. If you want it filled with zeros, do as katrielalex said:
big_array = numpy.zeros((10,4))
EDIT: What sort of sequence is it you're making? You should check out the different numpy functions that create arrays, like numpy.linspace(start, stop, size) (equally spaced number), or numpy.arange(start, stop, inc). Where possible, these functions will make arrays substantially faster than doing the same work in explicit loops
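As a small sketch of what replacing a loop with an array-creation function looks like, the example below builds squares of the first five integers both ways. The choice of sequence is purely illustrative:

```python
import numpy as np

# Loop version: one element at a time.
looped = np.empty(5, dtype=int)
for i in range(5):
    looped[i] = i ** 2

# Vectorized version: one expression over the whole index array.
vectorized = np.arange(5) ** 2

# Five equally spaced points from 0 to 1, endpoints included.
grid = np.linspace(0.0, 1.0, 5)
```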
To initialize a numpy array with a specific matrix:
import numpy as np
mat = np.array([[1, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 0, 1, 0, 1]])
print(mat.shape)
print(mat)
output:
(5, 5)
[[1 1 0 0 0]
[0 1 0 0 1]
[1 0 0 1 1]
[0 0 0 0 0]
[1 0 1 0 1]]
For your first array example use,
a = numpy.arange(5)
To initialize big_array, use
big_array = numpy.zeros((10,4))
This assumes you want to initialize with zeros, which is pretty typical, but there are many other ways to initialize an array in numpy.
Edit:
If you don't know the size of big_array in advance, it's generally best to first build a Python list using append, and when you have everything collected in the list, convert this list to a numpy array using numpy.array(mylist). The reason for this is that lists are meant to grow very efficiently and quickly, whereas numpy.concatenate would be very inefficient since numpy arrays don't change size easily. But once everything is collected in a list, and you know the final array size, a numpy array can be efficiently constructed.
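A sketch of this collect-then-convert pattern for the (2, 4) chunks in the question is shown here. It uses np.concatenate along axis 0 to get the (10, 4) result, because np.array on a list of 2D chunks stacks them along a new axis instead:

```python
import numpy as np

chunks = []
for i in range(5):
    chunks.append(i * np.ones((2, 4)))  # example chunk, as in the question

stacked = np.array(chunks)                   # new leading axis: shape (5, 2, 4)
big_array = np.concatenate(chunks, axis=0)   # joined row-wise: shape (10, 4)
```

The number of chunks does not need to be known before the loop starts, which is the case the edit to the question asks about.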
numpy.fromiter() is what you are looking for:
big_array = numpy.fromiter(range(5), dtype="int")
It also works with generator expressions, e.g.:
big_array = numpy.fromiter( (i*(i+1)//2 for i in range(5)), dtype="int" )
If you know the length of the array in advance, you can specify it with an optional 'count' argument.
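A minimal sketch of the count argument follows; the generator and the length n are illustrative. Passing count lets fromiter allocate the output once instead of growing it while it consumes the iterator:

```python
import numpy as np

n = 5
# Triangular numbers from a generator; integer division keeps the values ints.
big_array = np.fromiter((i * (i + 1) // 2 for i in range(n)), dtype=int, count=n)
```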
I realize that this is a bit late, but I did not notice any of the other answers mentioning indexing into the empty array:
big_array = numpy.empty((10, 4))
for i in range(5):
    array_i = numpy.random.random((2, 4))
    big_array[2 * i:2 * (i + 1), :] = array_i
This way, you preallocate the entire result array with numpy.empty and fill in the rows as you go using indexed assignment.
It is perfectly safe to preallocate with empty instead of zeros in the example you gave since you are guaranteeing that the entire array will be filled with the chunks you generate.
I'd suggest defining shape first.
Then iterate over it to insert values.
big_array = np.zeros(shape=(6, 2))
for it in range(6):
    big_array[it] = (it, it) # For example
>>> big_array
array([[ 0., 0.],
[ 1., 1.],
[ 2., 2.],
[ 3., 3.],
[ 4., 4.],
[ 5., 5.]])
Whenever you are in the following situation:
a = []
for i in range(5):
a.append(i)
and you want something similar in numpy, several previous answers have pointed out ways to do it, but as @katrielalex pointed out, these methods are not efficient. The efficient way is to build one long list and then reshape it the way you want once the list is complete. For example, let's say I am reading some lines from a file, each row has a list of numbers, and I want to build a numpy array of shape (number of lines read, length of vector in each row). Here is how I would do it more efficiently:
long_list = []
counter = 0
with open('filename', 'r') as f:
    for row in f:
        row_list = row.split()
        long_list.extend(row_list)
        counter += 1
# now we have a long list and we are ready to reshape
# dtype=float converts the string tokens read from the file to numbers
result = np.array(long_list, dtype=float).reshape(counter, len(row_list)) # desired numpy array
Maybe something like this will fit your needs..
import numpy as np
N = 5
res = []
for i in range(N):
    res.append(np.cumsum(np.ones(shape=(2,4))))
res = np.array(res).reshape((10, 4))
print(res)
Which produces the following output
[[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]
[ 1. 2. 3. 4.]
[ 5. 6. 7. 8.]]
If you want to add your items to a multi-dimensional array, here is the solution.
import numpy as np
big_array = np.ndarray(shape=(0, 2, 4)) # Empty with height and width 2, 4 and length 0
for i in range(5):
    item = np.full((1, 2, 4), i) # example item with a leading axis of length 1
    big_array = np.concatenate((big_array, item))
Here is the numpy official document for referral
# https://thispointer.com/create-an-empty-2d-numpy-array-matrix-and-append-rows-or-columns-in-python/
# Create an empty Numpy array with 4 columns or 0 rows
empty_array = np.empty((0, 4), int)
# Append a row to the 2D numpy array
empty_array = np.append(empty_array, np.array([[11, 21, 31, 41]]), axis=0)
# Append 2nd rows to the 2D Numpy array
empty_array = np.append(empty_array, np.array([[15, 25, 35, 45]]), axis=0)
print('2D Numpy array:')
print(empty_array)
Note that each appended np.array must be 2-dimensional.
