Extend numpy array in a way compatible with builtin arrays - python

I am trying to write code that does not depend on whether the user passes an np.array or a builtin array, and I am trying to avoid checking object types. The only problem I have is with extending arrays. For example, given two Python arrays a and b, I can extend the first by the second with either a = a + b or a.extend(b). If a is a NumPy array, on the other hand, I need to use np.append or something else.
Is there a quick way to extend arrays independently of whether they are np arrays or Python arrays?

NumPy's append() works on lists too!
>>> np.append([1,2,3], [1,2,3])
array([1, 2, 3, 1, 2, 3])
If you want the result to automatically have the same type as the input, try this:
mytype = type(a)       # remember the input's container type
arr = np.append(a, b)  # always returns an ndarray
result = mytype(arr)   # convert back: list(...), np.array(...), etc.
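For example, with two plain lists this round-trips back to a list. A minimal sketch (note that the elements come back as NumPy scalars; use np.append(a, b).tolist() instead if you need plain Python ints):
import numpy as np

a, b = [1, 2, 3], [4, 5]
mytype = type(a)                  # <class 'list'>
result = mytype(np.append(a, b))  # [1, 2, 3, 4, 5], as NumPy scalars
print(result)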

Even if your function is flexible about its input, the output should be of a specific type, so I would just convert to the desired output type. For example, if my function works with numpy.array and returns a numpy.array, but I want to allow lists as input as well, the first thing I would do is convert the lists to numpy.arrays.
Like this:
import numpy as np

def my_func(a, b):
    a = np.asarray(a)  # no-op if a is already an ndarray
    b = np.asarray(b)
    # do my stuff here
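A quick check that both input styles are accepted, using a hypothetical body that just concatenates the two inputs:
>>> def my_func(a, b):
...     return np.concatenate([np.asarray(a), np.asarray(b)])
>>> my_func([1, 2], [3])
array([1, 2, 3])
>>> my_func(np.array([1, 2]), np.array([3]))
array([1, 2, 3])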

Numpy broadcasting - using a variable value

EDIT:
As my question was badly formulated, I decided to rewrite it.
Does NumPy allow creating an array from a function, without using Python's standard list comprehension?
With list comprehension I could have:
array = np.array([f(i) for i in range(100)])
where f is a given function.
But if the constructed array is really big, using a Python list would be slow and would eat a lot of memory.
If no such way exists, I suppose I could first create an array of the wanted size:
array = np.arange(100)
and then map a function over it:
array = f(array)
According to results from another post, it seems that this would be a reasonable solution.
Let's say I want to add a simple int value; it would be as follows:
array = np.array([i for i in range(5)])
array + 5
But now, what if I want the value (here 5) to vary according to the index of the array element? For example, the operation:
array + [i for i in range(5)]
What object can I use to define special rules for a variable value within a vectorized operation?
You can add two arrays together like this:
Simple adding two arrays using numpy in python?
This assumes your "variable by index" is just another array.
For your specific example, a jury-rigged solution would be to use numpy.arange() as in:
In [4]: array + np.arange(5)
Out[4]: array([0, 2, 4, 6, 8])
In general, you can find a NumPy ufunc that does the job of your custom function, or you can compose them in a Python function that returns an ndarray, something like:
def custom_func():
    # code for your tasks
    return arr
You can then simply add the returned result to your already defined array as in:
array + custom_func()
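As a concrete sketch of that pattern (the squaring rule here is just a placeholder for your own function):
import numpy as np

def custom_func():
    # build the index-dependent values with vectorized operations, no Python loop
    return np.arange(5) ** 2  # placeholder rule: value at index i is i**2

array = np.arange(5)
print(array + custom_func())  # [ 0  2  6 12 20]
NumPy also has np.fromfunction, which constructs an array by calling a function on grids of indices, provided the function accepts array arguments.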

Pass numpy array of list of integers in Cython method from Python

I would like to pass the following array of lists of integers (i.e., it's not a two-dimensional array) to a Cython method from Python code.
Python Sample Code:
import numpy as np
from result import process_result  # compiled Cython module

a = np.array([[1], [2,3]], dtype=object)  # newer NumPy requires dtype=object for ragged input
process_result(a)
The output of a is array([list([1]), list([2, 3])], dtype=object)
Cython Sample Code:
def process_result(int[:,:] a):
    pass
The above code gives the following error:
ValueError: Buffer has wrong number of dimensions (expected 2, got 1)
I tried to pass a plain list of lists instead of a NumPy array and got the following error:
a = [[1], [2,3]]
process_result(a)
TypeError: a bytes-like object is required, not 'list'
Kindly assist me with how to pass the value of a into the Cython method process_result, and what exact datatype to use to receive this value in the Cython method.
I think you're using the wrong data type. Instead of a NumPy array of lists, you should be using a list of NumPy arrays. There is very little benefit to using NumPy arrays of Python objects (such as lists): unlike numeric types they aren't stored particularly efficiently, they aren't quick to do calculations on, and you can't accelerate them in Cython. Therefore the outermost level may as well be a normal Python list.
However, the inner levels all look to be homogeneous arrays of integers, and so would be ideal candidates for NumPy arrays (especially if you want to process them in Cython).
Therefore, build your list as:
a = [np.array([1], dtype=np.intc), np.array([2,3], dtype=np.intc)]  # np.intc matches the C int of the memoryview (np.int has been removed from NumPy)
(Or use tolist() on an existing NumPy array.)
For your function you can define it like:
def process_result(list a):
    cdef int[:] item
    for item in a:
        # operations on the inner arrays are fast!
        pass
Here I've assumed that you most likely want to iterate over the list. Note that there's relatively little benefit in typing a as list, so you could just leave it untyped (to accept any Python object); then you could pass it other iterables too, like your original NumPy array.
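Called from Python, that might look like this (a sketch, assuming the compiled Cython module is named result; np.intc matches the C int of the int[:] memoryview):
import numpy as np
from result import process_result

a = [np.array([1], dtype=np.intc), np.array([2, 3], dtype=np.intc)]
process_result(a)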
Convert the array of lists of integers to a list of objects (i.e., a list of lists of integers; it's not a two-dimensional array).
Python Code:
import numpy as np
from result import process_result  # compiled Cython module

a = np.array([[1], [2,3]], dtype=object).tolist()
process_result(a)
The output of a is [[1], [2,3]]
Cython Sample Code:
def process_result(list a):
    pass
Change the int[:, :] to list. It works fine.
Note: if anyone knows a more optimal answer, kindly post it; it would be helpful.
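One possible refinement, as an untested sketch: keep the list-typed signature, but convert each inner list to a typed memoryview inside the function so the per-element work can still run at C speed:
import numpy as np

def process_result(list a):
    cdef int[:] item
    for sub in a:
        item = np.asarray(sub, dtype=np.intc)  # typed view over each inner list
        # fast, typed operations on item here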

Numpy: Uniform way of retrieving `dtype`

If I have a numpy array x, I can get its data type by using dtype like this:
t = x.dtype
However, that obviously won't work for things like lists. I wonder if there is a standard way of retrieving types for lists and numpy arrays. In the case of lists, I guess this would mean the largest type which fits all of the data. For instance, if
x = [ 1, 2.2 ]
I would want such a method to return float, or better yet numpy.float64.
Intuitively, I thought that this was the purpose of numpy.dtype. However, that is not the case: it is used to create a dtype, not to extract one.
The only method I know of for getting a type is to wrap whatever object is passed in with numpy.asarray and then read off the dtype:
import numpy

def dtype(x):
    return numpy.asarray(x).dtype
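A couple of quick checks with this helper (the exact integer width is platform-dependent):
>>> dtype([1, 2.2])
dtype('float64')
>>> dtype([1, 2])
dtype('int64')
>>> dtype(numpy.float32([1, 2]))
dtype('float32')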
The issue with this approach, however, is that it will copy the array if it is not already a numpy array. In this circumstance, that is extremely heavy for such a simple operation.
So is there a numpy method that I can use which won't require me to do any list copies?
EDIT
I am designing a library for doing some geometric manipulations... conversions between rotation matrices, rotation vectors, quaternions, Euler angles, etc.
It can easily happen that the user is simply working with a single rotation vector (which has 3 elements). In that case, they might write something like
q = vectorToQuaternion([ .1, 0, 0 ])
In this case, I would want the output quaternion to be a numpy array of type numpy.float64. However, sometimes to speed up the calculations, the user might want to use a numpy array of float32's:
q = vectorToQuaternion(numpy.float32([ .1, 0, 0 ]))
In which case, I think it is natural to expect that the output is the same type.
The issue is that I cannot use the zeros_like function (or empty_like, etc) because a quaternion has 4 components, while a vector has 3. So internally, I have to do something like
def vectorToQuaternion(v):
    q = numpy.empty((4,), dtype=numpy.asarray(v).dtype)
    ...
If there was a way of using empty_like which extracts all of the properties of the input, but lets me specify the shape of the output, then that would be the ideal function for me. However, to my knowledge, you cannot specify the shape in the call to empty_like.
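For what it's worth, NumPy 1.17 and later do accept a shape keyword in empty_like, which covers exactly this case (still at the cost of the asarray conversion for list inputs):
q = numpy.empty_like(numpy.asarray(v), shape=(4,))  # keeps v's dtype, new shape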
EDIT
Here are some gists for the class I am talking about, and a test class (so that you can see how I intend to use it).
Class: https://gist.github.com/mholzel/c3af45562a56f2210270d9d1f292943a
Tests: https://gist.github.com/mholzel/1d59eecf1e77f21be7b8aadb37cc67f2
If you really want to do it that way, you will probably have to use np.asarray, but I'm not sure that's the most solid way of dealing with the problem. If the user forgets the decimal point and passes [1, 0, 0], then you will be creating integer outputs, which most definitely does not make sense for quaternions. I would default to np.float64, use the dtype of the input if it is an array of some float type, and maybe also give the option to explicitly pass a dtype:
import numpy as np

def vectorToQuaternion(v, dtype=None):
    if dtype is None:
        if isinstance(v, np.ndarray) and np.issubdtype(v.dtype, np.floating):
            dtype = v.dtype
        else:
            dtype = np.float64
        # Or, more compactly, with a getattr fallback:
        # dtype = v.dtype if np.issubdtype(getattr(v, 'dtype', np.int64), np.floating) else np.float64
    q = np.empty((4,), dtype=dtype)
    # ... fill q with the actual conversion here
    return q
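With the elided body filled in, the two calls from the question then give the expected dtypes:
print(vectorToQuaternion([.1, 0, 0]).dtype)              # float64 (the default)
print(vectorToQuaternion(np.float32([.1, 0, 0])).dtype)  # float32 (taken from the input)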

Numpy: how to apply vectorized functions to array with dtype

I'm following this tutorial on how to use numpy to manipulate images. When I load the sample image using scipy, I get a 2D array of RGB tuples, with a dtype value appended on the end.
array([[7, 8, 5],
       [3, 5, 7]], dtype=uint8)
I wrote a function and vectorized it:
def myfunc(a):
    return a + 2
vfunc = np.vectorize(myfunc)
but when I apply it to my array, the result doesn't have the dtype
array([[ 9, 10,  7],
       [ 5,  7,  9]])
My guess is that because "dtype + 2" isn't defined, it's just losing that element of the array.
How can I write a function that will not strip the dtype when I vectorize it and apply it to a numpy array?
np.vectorize takes an otypes parameter that you can use to specify the dtype of the return value. Without it, vectorize does a trial calculation on the first element of your array and uses that trial's return dtype to determine the dtype of the whole result.
Look at the third example in its docs.
Usually users run into this when the first value produces an integer (e.g. 0) and they expect the whole result to be float.
So try:
vfunc = np.vectorize(myfunc, otypes=[np.uint8])
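Putting that together as a minimal, self-contained check:
import numpy as np

def myfunc(a):
    return a + 2

arr = np.array([[7, 8, 5], [3, 5, 7]], dtype=np.uint8)
vfunc = np.vectorize(myfunc, otypes=[np.uint8])
print(vfunc(arr).dtype)  # uint8, even though the scalar 2 would otherwise promote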
dtype=uint8 is not an element of the array. It is just a thing that gets printed to let you know that the array is of type np.uint8.
The default types np.float_ and np.int_ do not get a printout like that, which is what you are seeing in the second case. The way you can tell float and int arrays apart is that float arrays will always have decimal points in the numbers.
The reason that this is happening is that you are adding 2 to each element of your array. Since 2 is an integer, the output array gets promoted to np.int_ type and you do not get an explicit dtype printout.
You can try the following experiment: redefine myfunc to add a np.uint8 instead of an integer to the array elements and try to print the result:
def myfunc(a):
    return a + np.uint8(2)
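With that change, the trial calculation returns a uint8 and the dtype shows up in the repr again (assuming arr is the uint8 array from the question; output approximate):
>>> np.vectorize(myfunc)(arr)
array([[ 9, 10,  7],
       [ 5,  7,  9]], dtype=uint8)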
Finally, keep in mind that vectorizing Python code is usually not the best way to get things done. The function itself remains a Python function, and is therefore slow. It is generally better to find a way of performing whatever operation you want with built-in NumPy functions.
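In this particular case that is a one-liner: adding a uint8 scalar to the array directly keeps the dtype without np.vectorize at all (a sketch; note that uint8 arithmetic wraps around on overflow past 255):
result = arr + np.uint8(2)  # stays uint8, no Python-level loop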

What is the difference between ndarray and array in NumPy?

What is the difference between ndarray and array in NumPy? Where is their implementation in the NumPy source code?
numpy.array is just a convenience function to create an ndarray; it is not a class itself.
You can also create an array using numpy.ndarray, but it is not the recommended way. From the docstring of numpy.ndarray:
Arrays should be constructed using array, zeros or empty ... The parameters given here refer to a
low-level method (ndarray(...)) for instantiating an array.
Most of the meat of the implementation is in C code, here in multiarray, but you can start looking at the ndarray interfaces here:
https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py
numpy.array is a function that returns a numpy.ndarray object.
There is no object of type numpy.array.
Just a few lines of example code to show the difference between numpy.array and numpy.ndarray
Warm up step: Construct a list
a = [1,2,3]
Check the type
print(type(a))
You will get
<class 'list'>
Construct an array (from a list) using np.array
a = np.array(a)
Or, you can skip the warm up step, directly have
a = np.array([1,2,3])
Check the type
print(type(a))
You will get
<class 'numpy.ndarray'>
which tells you the type of the numpy array is numpy.ndarray
You can also check the type by
isinstance(a, (np.ndarray))
and you will get
True
Either of the following two lines is a mistake:
np.ndarray(a)               # wrong: a is interpreted as a shape; should be np.array(a)
isinstance(a, (np.array))   # TypeError: np.array is a function, not a type; should be isinstance(a, np.ndarray)
numpy.ndarray is a class, while numpy.array() is a function that creates an ndarray.
The numpy docs give two ways to create an array from the ndarray class, as quoted below:
1- using array(), zeros() or empty() methods:
Arrays should be constructed using array, zeros or empty (refer to the See Also section below). The parameters given here refer to a low-level method (ndarray(…)) for instantiating an array.
2- from ndarray class directly:
There are two modes of creating an array using __new__:
If buffer is None, then only shape, dtype, and order are used.
If buffer is an object exposing the buffer interface, then all keywords are interpreted.
The example below gives an array of random-looking (uninitialized) values because we didn't assign a buffer:
np.ndarray(shape=(2,2), dtype=float, order='F', buffer=None)
array([[ -1.13698227e+002,   4.25087011e-303],
       [  2.88528414e-306,   3.27025015e-309]])  # random
Another example is to assign an array object to the buffer:
>>> np.ndarray((2,), buffer=np.array([1,2,3]),
... offset=np.int_().itemsize,
... dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])
From the example above, note that we can't assign a list to buffer; we had to use numpy.array() to get an ndarray object for the buffer.
Conclusion: use numpy.array() if you want to make a numpy.ndarray object.
I think that with np.array() you can only create C-ordered arrays: even when I pass the order argument, np.isfortran() says False when I check. But with np.ndarray(), when you specify the order, it creates the array based on the order provided.
