I want to create an empty array and append items to it, one at a time.
xs = []
for item in data:
xs.append(item)
Can I use this list-style notation with NumPy arrays?
That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array needs to be copied to a new block of memory, creating gaps for the new elements to be stored. This is very inefficient if done repeatedly.
Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:
>>> import numpy as np
>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]
>>> a
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.]])
A NumPy array is a very different data structure from a list and is designed to be used in different ways. Your use of hstack is potentially very inefficient... every time you call it, all the data in the existing array is copied into a new one. (The append function will have the same issue.) If you want to build up your matrix one column at a time, you might be best off to keep it in a list until it is finished, and only then convert it into an array.
e.g.
mylist = []
for item in data:
mylist.append(item)
mat = numpy.array(mylist)
item can be a list, an array or any iterable, as long
as each item has the same number of elements.
In this particular case (data is some iterable holding the matrix columns) you can simply use
mat = numpy.array(data)
(Also note that using list as a variable name is probably not good practice since it masks the built-in type by that name, which can lead to bugs.)
EDIT:
If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!
To create an empty multidimensional array in NumPy (e.g. a 2D array m*n to store your matrix), in case you don't know m how many rows you will append and don't care about the computational cost Stephen Simmons mentioned (namely re-buildinging the array at each append), you can squeeze to 0 the dimension to which you want to append to: X = np.empty(shape=[0, n]).
This way you can use for example (here m = 5 which we assume we didn't know when creating the empty matrix, and n = 2):
import numpy as np
n = 2
X = np.empty(shape=[0, n])
for i in range(5):
for j in range(2):
X = np.append(X, [[i, j]], axis=0)
print X
which will give you:
[[ 0. 0.]
[ 0. 1.]
[ 1. 0.]
[ 1. 1.]
[ 2. 0.]
[ 2. 1.]
[ 3. 0.]
[ 3. 1.]
[ 4. 0.]
[ 4. 1.]]
I looked into this a lot because I needed to use a numpy.array as a set in one of my school projects and I needed to be initialized empty... I didn't found any relevant answer here on Stack Overflow, so I started doodling something.
# Initialize your variable as an empty list first
In [32]: x=[]
# and now cast it as a numpy ndarray
In [33]: x=np.array(x)
The result will be:
In [34]: x
Out[34]: array([], dtype=float64)
Therefore you can directly initialize an np array as follows:
In [36]: x= np.array([], dtype=np.float64)
I hope this helps.
For creating an empty NumPy array without defining its shape you can do the following:
arr = np.array([])
The first one is preferred because you know you will be using this as a NumPy array. NumPy converts this to np.ndarray type afterward, without extra [] 'dimension'.
for adding new element to the array us can do:
arr = np.append(arr, 'new element')
Note that in the background for python there's no such thing as an array without
defining its shape. as #hpaulj mentioned this also makes a one-rank
array.
You can use the append function. For rows:
>>> from numpy import *
>>> a = array([10,20,30])
>>> append(a, [[1,2,3]], axis=0)
array([[10, 20, 30],
[1, 2, 3]])
For columns:
>>> append(a, [[15],[15]], axis=1)
array([[10, 20, 30, 15],
[1, 2, 3, 15]])
EDIT
Of course, as mentioned in other answers, unless you're doing some processing (ex. inversion) on the matrix/array EVERY time you append something to it, I would just create a list, append to it then convert it to an array.
Here is some workaround to make numpys look more like Lists
np_arr = np.array([])
np_arr = np.append(np_arr , 2)
np_arr = np.append(np_arr , 24)
print(np_arr)
OUTPUT: array([ 2., 24.])
If you absolutely don't know the final size of the array, you can increment the size of the array like this:
my_arr = numpy.zeros((0,5))
for i in range(3):
my_arr=numpy.concatenate( ( my_arr, numpy.ones((1,5)) ) )
print(my_arr)
[[ 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1.]]
Notice the 0 in the first line.
numpy.append is another option. It calls numpy.concatenate.
You can apply it to build any kind of array, like zeros:
a = range(5)
a = [i*0 for i in a]
print a
[0, 0, 0, 0, 0]
Depending on what you are using this for, you may need to specify the data type (see 'dtype').
For example, to create a 2D array of 8-bit values (suitable for use as a monochrome image):
myarray = numpy.empty(shape=(H,W),dtype='u1')
For an RGB image, include the number of color channels in the shape: shape=(H,W,3)
You may also want to consider zero-initializing with numpy.zeros instead of using numpy.empty. See the note here.
Another simple way to create an empty array that can take array is:
import numpy as np
np.empty((2,3), dtype=object)
I think you want to handle most of the work with lists then use the result as a matrix. Maybe this is a way ;
ur_list = []
for col in columns:
ur_list.append(list(col))
mat = np.matrix(ur_list)
I think you can create empty numpy array like:
>>> import numpy as np
>>> empty_array= np.zeros(0)
>>> empty_array
array([], dtype=float64)
>>> empty_array.shape
(0,)
This format is useful when you want to append numpy array in the loop.
Perhaps what you are looking for is something like this:
x=np.array(0)
In this way you can create an array without any element. It similar than:
x=[]
This way you will be able to append new elements to your array in advance.
The simplest way
Input:
import numpy as np
data = np.zeros((0, 0), dtype=float) # (rows,cols)
data.shape
Output:
(0, 0)
Input:
for i in range(n_files):
data = np.append(data, new_data, axis = 0)
Related
I am using numpy of python to convert multi dimensional data into matrix by using numpy.asmatrix().
data_array = [[] for k in range(K)]
while Append data to data_array:
data_array[k].append(data)
mymatrix = numpy.asmatrix(data_array)
The data_array ends up with K*E. While I try to use mymatrix for further manipulation, it turns out some of them works but others still remains as list. When I print it out, it looks like
[[ list([..., ...]), list(..., ...) ]]
Does anyone knows the potential reason for that?
My guess is that you are doing the equivalent of:
In [5]: np.matrix([[1,2,3],[2,3],[3]])
Out[5]: matrix([[list([1, 2, 3]), list([2, 3]), list([3])]], dtype=object)
You should change your approach to create the entire 2D array first, then fill it in. Something like this:
mymatrix = np.empty((K, N))
for k in range(K):
mymatrix[k] = data
i have a numpy array p like this:
array([[ 0.92691702, 0.07308298],
[ 0.65515095, 0.34484905],
[ 0.32526151, 0.67473849],
...,
[ 0.34171992, 0.65828008],
[ 0.77521514, 0.22478486],
[ 0.96430103, 0.03569897]])
If i do x=p[:,1:2], i would get
array([[ 0.07308298],
[ 0.34484905],
[ 0.67473849],
...,
[ 0.65828008],
[ 0.22478486],
[ 0.03569897]])
and x.shape is (5500,1)
However, if i do x=p[:,1], i would get
array([ 0.07308298, 0.34484905, 0.67473849, ..., 0.65828008,
0.22478486, 0.03569897])
and x.shape is (5500, )
Why there is difference like this? It quite confuses me. Thanks all in advance for your help.
It's the difference between using a slice and a single integer in the ndarray.__getitem__ call. Slicing causes the ndarray to return "views" while integers cause the ndarray values.
I'm being a little loose in my terminology here -- Really, for your case they both return a numpy view -- It's easier to consider just the 1D case first:
>>> import numpy as np
>>> x = np.arange(10)
>>> x[1]
1
>>> x[1:2]
array([1])
This idea extends to multiple dimensions nicely -- If you pass a slice for a particular axis, you'll get "array-like" values along that axis. If you pass a scalar for a particular axis, you'll get scalars along that axis in the result.
Note that the 1D case really isn't any different from how a standard python list behaves:
>>> x = [1, 2, 3, 4]
>>> x[1]
2
>>> x[1:2]
[2]
I'm dealing with arrays in python, and this generated a lot of doubts...
1) I produce a list of list reading 4 columns from N files and I store 4 elements for N times in a list. I then convert this list in a numpy array:
s = np.array(s)
and I ask for the shape of this array. The answer is correct:
print s.shape
#(N,4)
I then produce the mean of this Nx4 array:
s_m = sum(s)/len(s)
print s_m.shape
#(4,)
that I guess it means that this array is a 1D array. Is this correct?
2) If I subtract the mean vector s_m from the rows of the array s, I can proceed in two ways:
residuals_s = s - s_m
or:
residuals_s = []
for i in range(len(s)):
residuals_s.append([])
tmp = s[i] - s_m
residuals_s.append(tmp)
if I now ask for the shape of residuals_s in the two cases I obtain two different answers. In the first case I obtain:
(N,4)
in the second:
(N,1,4)
can someone explain why there is an additional dimension?
You can get the mean using the numpy method (producing the same (4,) shape):
s_m = s.mean(axis=0)
s - s_m works because s_m is 'broadcasted' to the dimensions of s.
If I run your second residuals_s I get a list containing empty lists and arrays:
[[],
array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ]),
[],
array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ]),
...
]
That does not convert to a (N,1,4) array, but rather a (M,) array with dtype=object. Did you copy and paste correctly?
A corrected iteration is:
for i in range(len(s)):
residuals_s.append(s[i]-s_m)
produces a simpler list of arrays:
[array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ]),
array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ]),
...]
which converts to a (N,4) array.
Iteration like this usually is not needed. But if it is, appending to lists like this is one way to go. Another is to pre allocate an array, and assign rows
residuals_s = np.zeros_like(s)
for i in range(s.shape[0]):
residuals_s[i,:] = s[i]-s_m
I get your (N,1,4) with:
In [39]: residuals_s=[]
In [40]: for i in range(len(s)):
....: residuals_s.append([])
....: tmp = s[i] - s_m
....: residuals_s[-1].append(tmp)
In [41]: residuals_s
Out[41]:
[[array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ])],
[array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ])],
...]
In [43]: np.array(residuals_s).shape
Out[43]: (10, 1, 4)
Here the s[i]-s_m array is appended to an empty list, which has been appended to the main list. So it's an array within a list within a list. It's this intermediate list that produces the middle 1 dimension.
You are using NumPy ndarray without using the functions in NumPy, sum() is a python builtin function, you should use numpy.sum() instead.
I suggest you change your code as:
import numpy as np
np.random.seed(0)
s = np.random.randn(10, 4)
s_m = np.mean(a, axis=0, keepdims=True)
residuals_s = s - s_m
print s.shape, s_m.shape, residuals_s.shape
use mean() function with axis and keepdims arguments will give you the correct result.
2.765334406984874427e+00
3.309563282821381680e+00
The file looks like above: 2 rows, 1 col
numpy.loadtxt() returns
[ 2.76533441 3.30956328]
Please don't tell me use array.transpose() in this case, I need a real solution. Thank you in advance!!
You can always use the reshape command. A single column text file loads as a 1D array which in numpy's case is a row vector.
>>> a
array([ 2.76533441, 3.30956328])
>>> a[:,None]
array([[ 2.76533441],
[ 3.30956328]])
>>> b=np.arange(5)[:,None]
>>> b
array([[0],
[1],
[2],
[3],
[4]])
>>> np.savetxt('something.npz',b)
>>> np.loadtxt('something.npz')
array([ 0., 1., 2., 3., 4.])
>>> np.loadtxt('something.npz').reshape(-1,1) #Another way of doing it
array([[ 0.],
[ 1.],
[ 2.],
[ 3.],
[ 4.]])
You can check this using the number of dimensions.
data=np.loadtxt('data.npz')
if data.ndim==1: data=data[:,None]
Or
np.loadtxt('something.npz',ndmin=2) #Always gives at at least a 2D array.
Although its worth pointing out that if you always have a column of data numpy will always load it as a 1D array. This is more of a feature of numpy arrays rather then a bug I believe.
If you like, you can use matrix to read from string. Let test.txt involve the content. Here's a function for your needs:
import numpy as np
def my_loadtxt(filename):
return np.array(np.matrix(open(filename).read().strip().replace('\n', ';')))
a = my_loadtxt('test.txt')
print a
It gives column vectors if the input is a column vector. For the row vectors, it gives row vectors.
You might want to use the csv module:
import csv
import numpy as np
reader = csv.reader( open('file.txt') )
l = list(reader)
a = np.array(l)
a.shape
>>> (2,1)
This way, you will get the correct array dimensions irrespective of the number of rows / columns present in the file.
I've written a wrapper for loadtxt to do this and is similar to answer from #petrichor, but I think matrix can't have a string data format (probably understandably) so and that method doesn't seem to work if you're loading strings (such as column headings).
def my_loadtxt(filename, skiprows=0, usecols=None, dtype=None):
d = np.loadtxt(filename, skiprows=skiprows, usecols=usecols, dtype=dtype, unpack=True)
if len(d.shape) == 0:
d = d.reshape((1, 1))
elif len(d.shape) == 1:
d = d.reshape((d.shape[0], 1))
return d
I want to create an empty array and append items to it, one at a time.
xs = []
for item in data:
xs.append(item)
Can I use this list-style notation with NumPy arrays?
That is the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. To append rows or columns to an existing array, the entire array needs to be copied to a new block of memory, creating gaps for the new elements to be stored. This is very inefficient if done repeatedly.
Instead of appending rows, allocate a suitably sized array, and then assign to it row-by-row:
>>> import numpy as np
>>> a = np.zeros(shape=(3, 2))
>>> a
array([[ 0., 0.],
[ 0., 0.],
[ 0., 0.]])
>>> a[0] = [1, 2]
>>> a[1] = [3, 4]
>>> a[2] = [5, 6]
>>> a
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.]])
A NumPy array is a very different data structure from a list and is designed to be used in different ways. Your use of hstack is potentially very inefficient... every time you call it, all the data in the existing array is copied into a new one. (The append function will have the same issue.) If you want to build up your matrix one column at a time, you might be best off to keep it in a list until it is finished, and only then convert it into an array.
e.g.
mylist = []
for item in data:
mylist.append(item)
mat = numpy.array(mylist)
item can be a list, an array or any iterable, as long
as each item has the same number of elements.
In this particular case (data is some iterable holding the matrix columns) you can simply use
mat = numpy.array(data)
(Also note that using list as a variable name is probably not good practice since it masks the built-in type by that name, which can lead to bugs.)
EDIT:
If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!
To create an empty multidimensional array in NumPy (e.g. a 2D array m*n to store your matrix), in case you don't know m how many rows you will append and don't care about the computational cost Stephen Simmons mentioned (namely re-buildinging the array at each append), you can squeeze to 0 the dimension to which you want to append to: X = np.empty(shape=[0, n]).
This way you can use for example (here m = 5 which we assume we didn't know when creating the empty matrix, and n = 2):
import numpy as np
n = 2
X = np.empty(shape=[0, n])
for i in range(5):
for j in range(2):
X = np.append(X, [[i, j]], axis=0)
print X
which will give you:
[[ 0. 0.]
[ 0. 1.]
[ 1. 0.]
[ 1. 1.]
[ 2. 0.]
[ 2. 1.]
[ 3. 0.]
[ 3. 1.]
[ 4. 0.]
[ 4. 1.]]
I looked into this a lot because I needed to use a numpy.array as a set in one of my school projects and I needed to be initialized empty... I didn't found any relevant answer here on Stack Overflow, so I started doodling something.
# Initialize your variable as an empty list first
In [32]: x=[]
# and now cast it as a numpy ndarray
In [33]: x=np.array(x)
The result will be:
In [34]: x
Out[34]: array([], dtype=float64)
Therefore you can directly initialize an np array as follows:
In [36]: x= np.array([], dtype=np.float64)
I hope this helps.
For creating an empty NumPy array without defining its shape you can do the following:
arr = np.array([])
The first one is preferred because you know you will be using this as a NumPy array. NumPy converts this to np.ndarray type afterward, without extra [] 'dimension'.
for adding new element to the array us can do:
arr = np.append(arr, 'new element')
Note that in the background for python there's no such thing as an array without
defining its shape. as #hpaulj mentioned this also makes a one-rank
array.
You can use the append function. For rows:
>>> from numpy import *
>>> a = array([10,20,30])
>>> append(a, [[1,2,3]], axis=0)
array([[10, 20, 30],
[1, 2, 3]])
For columns:
>>> append(a, [[15],[15]], axis=1)
array([[10, 20, 30, 15],
[1, 2, 3, 15]])
EDIT
Of course, as mentioned in other answers, unless you're doing some processing (ex. inversion) on the matrix/array EVERY time you append something to it, I would just create a list, append to it then convert it to an array.
Here is some workaround to make numpys look more like Lists
np_arr = np.array([])
np_arr = np.append(np_arr , 2)
np_arr = np.append(np_arr , 24)
print(np_arr)
OUTPUT: array([ 2., 24.])
If you absolutely don't know the final size of the array, you can increment the size of the array like this:
my_arr = numpy.zeros((0,5))
for i in range(3):
my_arr=numpy.concatenate( ( my_arr, numpy.ones((1,5)) ) )
print(my_arr)
[[ 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1.]]
Notice the 0 in the first line.
numpy.append is another option. It calls numpy.concatenate.
You can apply it to build any kind of array, like zeros:
a = range(5)
a = [i*0 for i in a]
print a
[0, 0, 0, 0, 0]
Depending on what you are using this for, you may need to specify the data type (see 'dtype').
For example, to create a 2D array of 8-bit values (suitable for use as a monochrome image):
myarray = numpy.empty(shape=(H,W),dtype='u1')
For an RGB image, include the number of color channels in the shape: shape=(H,W,3)
You may also want to consider zero-initializing with numpy.zeros instead of using numpy.empty. See the note here.
Another simple way to create an empty array that can take array is:
import numpy as np
np.empty((2,3), dtype=object)
I think you want to handle most of the work with lists then use the result as a matrix. Maybe this is a way ;
ur_list = []
for col in columns:
ur_list.append(list(col))
mat = np.matrix(ur_list)
I think you can create empty numpy array like:
>>> import numpy as np
>>> empty_array= np.zeros(0)
>>> empty_array
array([], dtype=float64)
>>> empty_array.shape
(0,)
This format is useful when you want to append numpy array in the loop.
Perhaps what you are looking for is something like this:
x=np.array(0)
In this way you can create an array without any element. It similar than:
x=[]
This way you will be able to append new elements to your array in advance.
The simplest way
Input:
import numpy as np
data = np.zeros((0, 0), dtype=float) # (rows,cols)
data.shape
Output:
(0, 0)
Input:
for i in range(n_files):
data = np.append(data, new_data, axis = 0)