array of unknown size - python

I am pulling data from a .CSV into an array, as follows:
my_data = genfromtxt('nice.csv', delimiter='')
a = np.array(my_data)
I then attempt to establish the size and shape of the array, thus:
size_array=np.size(a)
shape_array=np.shape(a)
Now, I want to generate an array of identical shape and size, and then carry out some multiplications. The trouble I am having is generating the correctly sized array. I have tried this:
D = np.empty([shape_array,])
I receive the error:
"'tuple' object cannot be interpreted as an index".
After investigation my array has a shape of (248L,). Please...how do I get this array in a sensible format?
Thanks.

The line shape_array=np.shape(a) creates a shape tuple, which is exactly the input np.empty expects.
The expression [shape_array,] wraps that tuple in a list, so np.empty tries to read the tuple itself as a single dimension size and fails. Use shape_array directly:
d = np.empty(shape_array)
On a related note, you can use the function np.empty_like to get an array of the same shape and dtype as the original more directly:
d = np.empty_like(a)
If you want to use just the shape and size, there is really no need to store them in separate variables after calling np.size and np.shape. It is more idiomatic to use the corresponding properties of np.ndarray directly:
d = np.empty(a.shape)
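The options above can be compared side by side. This is a minimal sketch that assumes a 1-D float array as a stand-in for the CSV data (the values are made up for illustration); it also shows that the list-wrapped shape from the question is rejected.

```python
import numpy as np

# Stand-in for the data loaded from the CSV (hypothetical values).
a = np.arange(248, dtype=float)

d1 = np.empty(a.shape)   # pass the shape tuple directly
d2 = np.empty_like(a)    # same shape and dtype as a, contents uninitialized

# np.empty does not initialize memory, so write into the array before reading it.
np.multiply(a, 2.0, out=d2)

try:
    np.empty([a.shape, ])  # a list containing a tuple is not a valid shape
except TypeError as exc:
    print("TypeError:", exc)

print(d1.shape, d2.shape)
```

If the new array should start from known values rather than uninitialized memory, np.zeros_like(a) or np.ones_like(a) are drop-in alternatives.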

Related

Why Numpy array reshape takes two brackets?

While learning the Python numpy library I saw the array reshape function.
a = [1,2,3]
b = [5,6,7]
c = a + b
d = np.array(c)
e = d.reshape((2,3))
print(e)
In the above code why is it that reshape takes two brackets?
Why I can't write it like this?
e = d.reshape(2,3)
I haven't seen this question anywhere else.
As mentioned in the NumPy docs for numpy.reshape:
numpy.reshape(a, newshape, order='C')
newshape must be an int or tuple of ints.
And in the doc for ndarray.reshape:
ndarray.reshape(shape, order='C')
"Unlike the free function numpy.reshape, this method on ndarray allows the elements of the shape parameter to be passed in as separate arguments. For example, a.reshape(10, 11) is equivalent to a.reshape((10, 11))."
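The reshape method takes a shape parameter, which should be a tuple such as (2, 3). Its stub signature shows that the dimensions can also be passed as separate arguments: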
def reshape(self, shape, *shapes, order='C'): # known case of numpy.core.multiarray.ndarray.reshape
    """
    a.reshape(shape, order='C')
    Returns an array containing the same data with a new shape.
    Refer to `numpy.reshape` for full documentation.
    See Also
    --------
    numpy.reshape : equivalent function
    Notes
    -----
    Unlike the free function `numpy.reshape`, this method on `ndarray` allows
    the elements of the shape parameter to be passed in as separate arguments.
    For example, ``a.reshape(10, 11)`` is equivalent to
    ``a.reshape((10, 11))``.
    """
    pass
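To make the documented equivalence concrete, here is a small sketch using the six-element array from the question. It only illustrates the behaviour described in the docstring; the variable names are illustrative.

```python
import numpy as np

d = np.array([1, 2, 3, 5, 6, 7])

e_tuple = d.reshape((2, 3))      # shape passed as one tuple
e_args = d.reshape(2, 3)         # shape passed as separate arguments (method only)
e_func = np.reshape(d, (2, 3))   # the free function expects the shape as one argument

print(np.array_equal(e_tuple, e_args))
print(np.array_equal(e_tuple, e_func))
```

So the double brackets are only required with the free function np.reshape; with the method, both spellings give the same result.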

how to populate a numpy 2D array with function returning a numpy 1D array

I have a function predictions like this:
def predictions(degree):
    ...  # some magic that returns an np.ndarray([0..100])
I want to call this function for a few values of degree and use it to populate a larger np.ndarray (n=2), filling each row with the outcome of the function predictions. It seems like a simple task but somehow I can't get it working. I tried with
for deg in [1,2,4,8,10]:
    np.append(result, predictions(deg),axis=1)
with result being an np.empty(100). But that failed with Singleton array array(1) cannot be considered a valid collection.
I could not get fromfunction to work either: it only works on a coordinate tuple, and an irregular list of degrees is not covered in the docs.
Don't use np.ndarray until you are older and wiser! I couldn't even use it without rereading the docs.
arr1d = np.array([1,2,3,4,5])
is the correct way to construct a 1d array from a list of numbers.
Also don't use np.append. I won't even add the 'older and wiser' qualification. It doesn't work in-place; and is slow when used in a loop.
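The point about np.append not working in place can be checked with a short sketch (the array contents are arbitrary): the original array is left untouched and a new, larger array is returned, which is also why calling it repeatedly in a loop copies the data every time.

```python
import numpy as np

result = np.empty(0)
returned = np.append(result, [1.0, 2.0])

print(result.size)            # the original array is unchanged
print(returned.size)          # the appended data lives in a new array
print(returned is result)
```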
A good way of building a 2d array from 1d arrays is:
alist = []
for i in ....:
    alist.append(<a list or 1d array>)
arr = np.array(alist)
Provided all the sublists have the same size, arr should be a 2d array.
This is equivalent to building a 2d array from
np.array([[1,2,3], [4,5,6]])
that is a list of lists.
Or a list comprehension:
np.array([predictions(i) for i in range(10)])
Again, predictions must all return the same length arrays or lists.
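As a runnable sketch of the list-comprehension approach, assume a hypothetical predictions function that returns 100 values per degree (the real function from the question is not shown, so the body here is only a placeholder):

```python
import numpy as np

def predictions(degree):
    # Hypothetical stand-in for the real function: 100 values per degree.
    x = np.linspace(0.0, 1.0, 100)
    return x ** degree

degrees = [1, 2, 4, 8, 10]

# One row per degree; np.array stacks the equal-length 1d arrays into a 2d array.
result = np.array([predictions(deg) for deg in degrees])

print(result.shape)
```

np.stack([predictions(deg) for deg in degrees]) gives the same array and raises an error if the rows differ in length.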
np.append is rarely the right tool. Here you know the shape in advance, so you can preallocate the result and fill it row by row:
len_predictions = 100
def predictions(degree):
    return np.ones((len_predictions,))
degrees = [1,2,4,8,10]
result = np.empty((len(degrees), len_predictions))
for i, deg in enumerate(degrees):
    result[i] = predictions(deg)
If you want to store the degree somehow, you can use custom dtypes.
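One possible reading of the custom-dtype suggestion is a structured array with one field for the degree and one sub-array field for the predicted values. The field names and the placeholder predictions function below are assumptions made for illustration.

```python
import numpy as np

len_predictions = 100
degrees = [1, 2, 4, 8, 10]

def predictions(degree):
    # Hypothetical stand-in: a constant row so the result is easy to inspect.
    return np.full(len_predictions, float(degree))

# Field names "degree" and "values" are illustrative choices.
record = np.dtype([("degree", np.int64),
                   ("values", np.float64, (len_predictions,))])

result = np.empty(len(degrees), dtype=record)
for i, deg in enumerate(degrees):
    result["degree"][i] = deg
    result["values"][i] = predictions(deg)

print(result["degree"])
print(result["values"].shape)
```

Accessing result["values"] returns an ordinary 2d float array, so the rest of the computation does not need to know about the structured dtype.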

What is the difference between ndarray and array in NumPy?

What is the difference between ndarray and array in NumPy? Where is their implementation in the NumPy source code?
numpy.array is just a convenience function to create an ndarray; it is not a class itself.
You can also create an array using numpy.ndarray, but it is not the recommended way. From the docstring of numpy.ndarray:
Arrays should be constructed using array, zeros or empty ... The parameters given here refer to a
low-level method (ndarray(...)) for instantiating an array.
Most of the meat of the implementation is in C code, here in multiarray, but you can start looking at the ndarray interfaces here:
https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py
numpy.array is a function that returns a numpy.ndarray object.
There is no object of type numpy.array.
Just a few lines of example code to show the difference between numpy.array and numpy.ndarray
Warm up step: Construct a list
a = [1,2,3]
Check the type
print(type(a))
You will get
<class 'list'>
Construct an array (from a list) using np.array
a = np.array(a)
Or, you can skip the warm up step, directly have
a = np.array([1,2,3])
Check the type
print(type(a))
You will get
<class 'numpy.ndarray'>
which tells you the type of the numpy array is numpy.ndarray
You can also check the type by
isinstance(a, (np.ndarray))
and you will get
True
Either of the following two lines will give you an error message
np.ndarray(a) # should be np.array(a)
isinstance(a, (np.array)) # should be isinstance(a, (np.ndarray))
numpy.ndarray() is a class, while numpy.array() is a method / function to create ndarray.
According to the NumPy docs, there are two ways to create an array, as quoted below:
1- using the array(), zeros() or empty() functions:
Arrays should be constructed using array, zeros or empty (refer to the See Also section below). The parameters given here refer to a low-level method (ndarray(…)) for instantiating an array.
2- from ndarray class directly:
There are two modes of creating an array using __new__:
If buffer is None, then only shape, dtype, and order are used.
If buffer is an object exposing the buffer interface, then all keywords are interpreted.
The example below returns arbitrary values because no buffer was supplied, so the array's memory is left uninitialized:
>>> np.ndarray(shape=(2,2), dtype=float, order='F', buffer=None)
array([[ -1.13698227e+002, 4.25087011e-303],
[ 2.88528414e-306, 3.27025015e-309]]) # arbitrary values
Another example assigns an array object to the buffer:
>>> np.ndarray((2,), buffer=np.array([1,2,3]),
... offset=np.int_().itemsize,
... dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])
From the above example we notice that we can't assign a list to "buffer"; we had to use numpy.array() to return an ndarray object for the buffer.
Conclusion: use numpy.array() if you want to make a numpy.ndarray object.
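The buffer mode can be illustrated with any object that exposes the buffer interface. The sketch below uses a bytearray (an assumption chosen for simplicity) and shows that the resulting array shares memory with it rather than copying.

```python
import numpy as np

raw = bytearray([1, 2, 3, 4])

# Interpret the four bytes as four uint8 elements; no data is copied.
view = np.ndarray(shape=(4,), dtype=np.uint8, buffer=raw)

# Skip the first byte using offset (counted in bytes).
skipped = np.ndarray(shape=(3,), dtype=np.uint8, buffer=raw, offset=1)

raw[0] = 9  # changing the buffer is visible through the array
print(view.tolist())
print(skipped.tolist())
```

This low-level behaviour is the main reason to call np.ndarray directly; for everyday array creation np.array remains the simpler choice.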
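I thought np.array() could only create C-ordered arrays even when order is given, because np.isfortran() returned False. However, np.isfortran() returns False for any 1-D array, since a 1-D array is both C- and Fortran-contiguous. With two or more dimensions, np.array(..., order='F') does create a Fortran-ordered array, just as np.ndarray() does when you specify the order.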
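The behaviour of np.isfortran() is easy to check directly. In this small sketch (array contents are arbitrary), the order argument of np.array() is honored for a 2-D input, while a 1-D array reports False because it is contiguous in both senses.

```python
import numpy as np

c_2d = np.array([[1, 2], [3, 4]])              # default C order
f_2d = np.array([[1, 2], [3, 4]], order="F")   # Fortran order requested
f_1d = np.array([1, 2, 3], order="F")          # 1-D: both C- and F-contiguous

print(np.isfortran(c_2d))   # False
print(np.isfortran(f_2d))   # True
print(np.isfortran(f_1d))   # False
print(f_1d.flags["C_CONTIGUOUS"], f_1d.flags["F_CONTIGUOUS"])   # True True
```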

How to construct an np.array with fromiter

I'm trying to construct an np.array by sampling from a python generator, that yields one row of the array per invocation of next. Here is some sample code:
import numpy as np
from numpy.random import shuffle, randint  # assumed source of shuffle and randint
data = np.eye(9)
labels = np.array([0,0,0,1,1,1,2,2,2])
def extract_one_class(X,labels,y):
    """ Take an array of data X, a column vector array of labels, and one particular label y. Return an array of all instances in X that have label y """
    return X[np.nonzero(labels[:] == y)[0],:]
def generate_points(data, labels, size):
    """ Generate and return 'size' pairs of points drawn from different classes """
    label_alphabet = np.unique(labels)
    assert(label_alphabet.size > 1)
    for useless in range(size):
        shuffle(label_alphabet)
        first_class = extract_one_class(data,labels,label_alphabet[0])
        second_class = extract_one_class(data,labels,label_alphabet[1])
        pair = np.hstack((first_class[randint(0,first_class.shape[0]),:],second_class[randint(0,second_class.shape[0]),:]))
        yield pair
points = np.fromiter(generate_points(data,labels,5),dtype = np.dtype('f8',(2*data.shape[1],1)))
The extract_one_class function returns a subset of data: all data points belonging to one class label. I would like to have points be an np.array with shape = (size,data.shape[1]). Currently the code snippet above returns an error:
ValueError: setting an array element with a sequence.
The documentation of fromiter claims to return a one-dimensional array. Yet others have used fromiter to construct record arrays in numpy before (e.g http://iam.al/post/21116450281/numpy-is-my-homeboy).
Am I off the mark in assuming I can generate an array in this fashion? Or is my numpy just not quite right?
As you've noticed, the documentation of np.fromiter explains that the function creates a 1D array. You won't be able to create a 2D array that way, and @unutbu's method of returning a 1D array that you reshape afterwards is a safe bet.
However, you can indeed create structured arrays using fromiter, as illustrated by:
>>> a = zip((1,2,3),(10,20,30))
>>> r = np.fromiter(a,dtype=[('',int),('',int)])
>>> r
array([(1, 10), (2, 20), (3, 30)],
dtype=[('f0', '<i8'), ('f1', '<i8')])
But note that r.shape is (3,): r is really nothing but a 1D array of records, each record being composed of two integers. Because all the fields have the same dtype, we can take a view of r as a 2D array:
>>> r.view((int,2))
array([[ 1, 10],
[ 2, 20],
[ 3, 30]])
So, yes, you could try to use np.fromiter with a dtype like [('',int)]*data.shape[1]: you'll get a 1D array of length size, which you can then view as ((int, data.shape[1])). You can use floats instead of ints; the important part is that all fields have the same dtype.
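The structured-dtype route can be sketched end to end as follows. The field names and the explicit np.int64 type are assumptions made so the example behaves the same on every platform; the conversion to 2D here uses a plain view plus reshape, which relies on all fields sharing one dtype with no padding.

```python
import numpy as np

# Each item produced by the iterator is one record (a tuple of two integers).
pairs = zip((1, 2, 3), (10, 20, 30))

# Two fields of the same type; the names "f0" and "f1" are illustrative.
record = np.dtype([("f0", np.int64), ("f1", np.int64)])

r = np.fromiter(pairs, dtype=record, count=3)
print(r.shape)       # a 1D array of records

# Reinterpret the same memory as plain integers, then restore the row structure.
as_2d = r.view(np.int64).reshape(len(r), -1)
print(as_2d)
```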
If you really want it, you can use some fairly complex dtype. Consider for example
r = np.fromiter(((_,) for _ in a),dtype=[('',(int,2))])
Here, you get a 1D structured array with 1 field, the field consisting of an array of 2 integers. Note the use of (_,) to make sure that each record is passed as a tuple (else np.fromiter chokes). But do you need that complexity?
Note also that since you know the length of the array beforehand (it is size), you should pass the optional count argument of np.fromiter for better efficiency.
You could modify generate_points to yield single floats instead of np.arrays, use np.fromiter to form a 1D array, and then use .reshape(size, -1) to make it a 2D array.
size = 5
points = np.fromiter(
    generate_points(data,labels,size), dtype=float).reshape(size, -1)
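A self-contained sketch of the flatten-then-reshape idea, using a hypothetical generator in place of the modified generate_points (it simply yields consecutive floats, one scalar at a time):

```python
import numpy as np

def generate_values(n_rows, n_cols):
    # Hypothetical generator: yields the elements of each row one scalar at a time.
    for row in range(n_rows):
        for col in range(n_cols):
            yield float(row * n_cols + col)

size, width = 5, 4

# count lets np.fromiter allocate the output once instead of growing it.
flat = np.fromiter(generate_values(size, width), dtype=float, count=size * width)
points = flat.reshape(size, -1)

print(points.shape)
```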
Following some suggestions here, I came up with a fairly general drop-in replacement for numpy.fromiter() that satisfies the requirements of the OP:
import numpy as np
def fromiter(iterator, dtype, *shape):
    """Generalises `numpy.fromiter()` to multi-dimensional arrays.
    Instead of the number of elements, the parameter `shape` has to be given,
    which contains the shape of the output array. The first dimension may be
    `-1`, in which case it is inferred from the iterator.
    """
    res_shape = shape[1:]
    if not res_shape:  # Fallback to the "normal" fromiter in the 1-D case
        return np.fromiter(iterator, dtype, shape[0])
    # This wrapping of the iterator is necessary because when used with the
    # field trick, np.fromiter does not enforce consistency of the shapes
    # returned with the '_' field and silently cuts additional elements.
    def shape_checker(iterator, res_shape):
        for value in iterator:
            if value.shape != res_shape:
                raise ValueError("shape of returned object %s does not match"
                                 " given shape %s" % (value.shape, res_shape))
            yield value,
    return np.fromiter(shape_checker(iterator, res_shape),
                       [("_", dtype, res_shape)], shape[0])["_"]

Operator + to add a tuple to another tuple stored inside a multidimensional array of tuples

I recently found out how to use tuples thanks to great contributions from SO users(see here). However I encounter the problem that I can't add a tuple to another tuple stored inside an array of tuples. For instance if I define:
arrtup=empty((2,2),dtype=('int,int'))
arrtup[0,1]=(3,4)
Then if I try to add another tuple to the existing tuple to come up with a multidimensional index:
arrtup[0,1]+(4,4)
I obtain this error:
TypeError: unsupported operand type(s) for +: 'numpy.void' and 'tuple'
Instead of the expected (3,4,4,4) tuple, which I can obtain by:
(3,4)+(4,4)
Any ideas? Thanks!
You are mixing different concepts, I'm afraid.
Your arrtup array is not an array of tuples, it's a structured ndarray, that is, an array of elements that look like tuples but in fact are records (numpy.void objects, to be exact). In your case, you defined these records to consist of 2 integers. Internally, NumPy creates your array as a 2x2 array of blocks, each block taking a given space defined by your dtype: here, a block consists of 2 consecutive sub-blocks of size int (that is, each sub-block takes the space an int takes on your machine).
When you retrieve an element with arrtup[0,1], you get the corresponding block. Because this block is structured as two-subblocks, NumPy returns a numpy.void (the generic object representing structured blocks), which has the same dtype as your array.
Because you set the size of those blocks at the creation of the array, you're no longer able to modify it. That means that you cannot transform your 2-int records into 4-int ones as you want.
However, you can transform your structured array into an array of objects:
new_arr = arrtup.astype(object)
Lo and behold, your elements are no longer np.void but tuples, that you can modify as you want:
new_arr[0,1] = (3,4) # That's a tuple
new_arr[0,1] += (4,4) # Adding another tuple to the element
Your new_arr is a different beast from your arrtup: it has the same size, true, but it's no longer a structured array, it's an array of objects, as illustrated by
>>> new_arr.dtype
dtype("object")
In practice, the memory layout is quite different between arrtup and new_arr. new_arr doesn't have the same constraints as arrtup, as the individual elements can have different sizes, but object arrays are not as efficient as structured arrays.
The traceback is pretty clear here. arrtup[0,1] is not a tuple. It's of type numpy.void.
You can convert it to a tuple quite easily however:
tuple(arrtup[0,1])
which can be concatenated with other tuples:
tuple(arrtup[0,1]) + (4,4)
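Putting the conversion together, here is a minimal sketch. It assumes a dtype string of 'i8,i8' in place of the question's 'int,int' and uses np.zeros so the untouched elements have defined values. The .item() call is an alternative that returns a tuple of plain Python integers.

```python
import numpy as np

arrtup = np.zeros((2, 2), dtype="i8,i8")
arrtup[0, 1] = (3, 4)

element = arrtup[0, 1]
print(isinstance(element, np.void))   # a record, not a tuple

# Convert the record to a tuple first, then concatenate as usual.
combined = tuple(element) + (4, 4)

# .item() also yields a tuple, with plain Python ints as members.
combined_py = element.item() + (4, 4)
print(combined_py)
```

The structured array itself is unchanged by this; the concatenated tuple is a separate Python object.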
