Numpy object arrays - python

I've recently run into issues when creating Numpy object arrays using e.g.
a = np.array([c], dtype=np.object)
where c is an instance of some complicated class, and in some cases Numpy tries to access some methods of that class. However, doing:
a = np.empty((1,), dtype=np.object)
a[0] = c
solves the issue. I'm curious as to what the difference is between these two internally. Why in the first case might Numpy try and access some attributes or methods of c?
EDIT: For the record, here is example code that demonstrates the issue:
import numpy as np

class Thing(object):
    def __getitem__(self, item):
        print("in getitem")

    def __len__(self):
        return 1

a = np.array([Thing()], dtype='object')
This prints "in getitem" twice. Basically, if __len__ is present in the class, that is when one can run into unexpected behavior.

In the first case a = np.array([c], dtype=np.object), numpy knows nothing about the shape of the intended array.
For example, when you define
d = range(10)
a = np.array([d])
Then you expect numpy to determine the shape based on the length of d.
So similarly in your case, numpy will attempt to see if len(c) is defined, and if it is, to access the elements of c via c[i].
You can see the effect by defining a class such as
class X(object):
    def __len__(self): return 10
    def __getitem__(self, i): return "x" * i
Then
print(numpy.array([X()], dtype=object))
produces
[[ x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx]]
In contrast, in your second case
a = np.empty((1,), dtype=np.object)
a[0] = c
Then the shape of a has already been determined, so numpy can just assign the object directly.
However, this is true only to an extent, since a is a vector. If it had been defined with a different shape, method accesses would still occur. The following example will still call __getitem__ on the class:
a = numpy.empty((1, 10), dtype=object)
a[0] = X()
print(a)
which prints
[[ x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx]]
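If you need an object array with a multi-dimensional shape anyway, one workaround is to assign each cell individually, since per-element assignment stores the reference directly and never probes the object. A minimal sketch, reusing a Thing-like class:
import numpy as np

class Thing(object):
    def __len__(self):
        return 1

    def __getitem__(self, item):
        raise RuntimeError("should not be called")

a = np.empty((1, 10), dtype=object)
for j in range(a.shape[1]):
    a[0, j] = Thing()  # element assignment, no __len__/__getitem__ probing
print(a.shape)  # (1, 10)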

Why does the *= operator change numpy arrays out of scope?

I just came across this (at least for me) weird behaviour of the *= operator for numpy arrays in Python. If I pass a local variable (an ndarray), let's call it x, to a function and then modify x, for example via x *= 2, the change is propagated to the scope where I called the function. If I do the same using x = x * 2, I do not see this behaviour. Why is that? I was expecting x *= 2 and x = x * 2 to be identical. I observe this only for numpy arrays. Thank you for your help; I have also attached example code.
import numpy as np

def my_func1(x_func):
    x_func *= 2
    return None

def my_func2(x_func):
    x_func = x_func * 2
    return None

def my_func():
    x = np.array([1])  # expect x to keep this value in the scope of my_func
    my_func2(x)
    print(x)  # x still [1]
    my_func1(x)
    print(x)  # x changed to [2]!

my_func()
Out:
[1]
[2]
Some operations, such as += and *=, act in place to modify an existing array rather than create a new one. That is why the array from the calling function is modified when the first function is called.
def my_func1(x_func):
    x_func *= 2
    return None
In the second function, x_func = x_func * 2 is a plain assignment: it creates a new array and rebinds the local name x_func to it, and the new array is never returned.
def my_func2(x_func):
    x_func = x_func * 2
    return None
So the array's value is not modified in the calling function (my_func()).
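You can see the difference by printing the array's identity around each operation; in-place operators keep the same object, while plain assignment rebinds the name. A small demonstration:
import numpy as np

x = np.array([1])
print(id(x))
x *= 2         # in place: same buffer, id unchanged
print(id(x), x)
x = x * 2      # new array created, name x rebound
print(id(x), x)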
Reference: https://numpy.org/doc/stable/user/quickstart.html#basic-operations

Python- np.random.choice

I am using the numpy.random.choice function to generate an 'array' of choices based on an array of functions:
def f(x):
    return np.sin(x)

def g(x):
    return np.cos(x)

base = [f, g]
funcs = np.random.choice(base, size=2)
This code will produce an 'array' of 2 items referencing a function from the base array.
The reason for this post is, I have printed the outcome of funcs and received:
[<function f at 0x00000225AC94F0D0> <function f at 0x00000225AC94F0D0>]
Clearly this returns references to the functions in some form, though I don't understand what that form is or how to manipulate it, and this is where the problem comes in. I want to change the choice of function so that it is no longer random and instead depends on some conditions, so it might be:
for i in range(2):
    if testvar == 'true':
        choice[i] = 0
    if testvar == 'false':
        choice[i] = 1
This would return an array of indices to be put into a later function.
The problem is, the further operations of the code (I think) require the previous form of function references (the [<function ...>] list shown above) as input, instead of a simple array of 0/1 indices, and I don't know how I can get an array of that form by using if statements.
I could be completely wrong about the rest of the code requiring this input, but I don't know how to amend it, hence posting it here. The full code is as follows (it is a slight variation of code provided by @Attack68 on Evolving functions in python). It aims to store a function that is multiplied by a random function on each iteration and integrates accordingly. (I have put a comment in the code above the function that is causing the problem.)
import numpy as np
import scipy.integrate as int

def f(x):
    return np.sin(x)

def g(x):
    return np.cos(x)

base = [f, g]
funcs = np.random.choice(base, size=2)
print(funcs)

# The below function is where I believe the [<function...>] input to be required
def apply(x, funcs):
    y = 1
    for func in funcs:
        y *= func(x)
    return y

print('function value at 1.5 ', apply(1.5, funcs))
answer = int.quad(apply, 1, 2, args=(funcs,))
print('integration over [1,2]: ', answer)
Here is my attempt of implementing a non-random event:
import numpy as np
import scipy.integrate as int
import random

def f(x):
    return np.sin(x)

def g(x):
    return np.cos(x)

base = [f, g]
funcs = list()
for i in range(2):
    testvar = random.randint(0, 100)  # In my actual code this would not be random but dependent on some other situation I have not accounted for here
    if testvar > 50:
        func_idx = 0  # choose a function: 0=f, 1=g
    else:
        func_idx = 1
    funcs.append(func_idx)
# funcs = np.random.choice(base, size=10)
print(funcs)

def apply(x, funcs):
    y = 1
    for func in funcs:
        y *= func(x)
    return y

print('function value at 1.5 ', apply(1.5, funcs))
answer = int.quad(apply, 1, 2, args=(funcs,))
print('integration over [1,2]: ', answer)
This returns the following error:
TypeError: 'int' object is not callable
If you are trying to refactor your original code, which operates on a list of randomly chosen functions, into a version that operates on random indices corresponding to items in a list of functions, refactor apply:
def apply(x, indices, base=base):
    y = 1
    for i in indices:
        f = base[i]
        y *= f(x)
    return y
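For example, with the question's f, g, and base, the refactored apply drops straight into scipy's quad; a quick sketch (aliasing scipy.integrate as integrate to avoid shadowing the builtin int):
import numpy as np
import scipy.integrate as integrate

def f(x):
    return np.sin(x)

def g(x):
    return np.cos(x)

base = [f, g]
indices = [0, 1]  # chosen by whatever condition you like

def apply(x, indices, base=base):
    y = 1
    for i in indices:
        y *= base[i](x)
    return y

print('function value at 1.5 ', apply(1.5, indices))
answer = integrate.quad(apply, 1, 2, args=(indices,))
print('integration over [1,2]: ', answer)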
...this returns a reference to the functions in some form, not that I understand what that form is or how to manipulate it...
Functions are objects, the list contains a reference to the objects themselves. They can be used by either assigning them to a name then calling them or indexing the list and calling the object:
>>> def f():
...     return 'f'
>>> def g():
...     return 'g'
>>> a = [f, g]
>>> q = a[0]
>>> q()
'f'
>>> a[1]()
'g'
>>> for thing in a:
...     print(thing())
...
f
g
Or you can pass them around:
>>> def h(thing):
...     return thing()
>>> h(a[1])
'g'
>>>
If you still want to use your function apply as-is, you need to keep your input a list of functions. Instead of providing a list of indices, you can use those indices to create your list of functions.
Instead of apply(1.5, funcs), try:
apply(1.5, [base[n] for n in funcs])

Set default value as numpy array

I have a class MyClass which stores an integer a. I want to define a function inside it that takes a numpy array x of length a, but I want that if the user does not pass in anything, x is set to a random array of the same length. (If they pass in values of the wrong length, I can raise an error). Basically, I would like x to default to a random array of size a.
Here is my attempt at implementing this
import numpy as np

class MyClass():
    def __init__(self, a):
        self.a = a

    def function(self, x=None):
        if x == None:
            x = np.random.rand(self.a)
        # do some more functiony stuff with x
This works if nothing is passed in, but if x is passed in I get "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", i.e. it seems numpy doesn't like comparing arrays with None.
Defining the default value inline doesn't work because self is not in scope yet.
Is there a nice pythonic way to achieve this? To sum up I would like the parameter x to default to a random array of a specific, class-defined length.
As a rule of thumb, comparisons of anything and None should be done with is and not ==.
Changing if x == None to if x is None solves this issue.
class MyClass():
    def __init__(self, a):
        self.a = a

    def function(self, x=None):
        if x is None:
            x = np.random.rand(self.a)
        print(x)

MyClass(2).function(np.array([1, 2]))
MyClass(2).function()
# [1 2]
# [ 0.92032119 0.71054885]
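Since you also mention raising an error for inputs of the wrong length, a minimal sketch of that check could look like this (the message wording is my own):
import numpy as np

class MyClass():
    def __init__(self, a):
        self.a = a

    def function(self, x=None):
        if x is None:
            x = np.random.rand(self.a)  # default: random array of length a
        elif len(x) != self.a:
            raise ValueError("expected an array of length %d, got %d" % (self.a, len(x)))
        # do some more functiony stuff with x
        return x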

SymPy lambdify with dot()

Take an undefined function that happens to be named dot, and make it part of lambdify:
import numpy
import sympy

class dot(sympy.Function):
    pass

x = sympy.Symbol('x')
a = sympy.Matrix([1, 0, 0])
f = sympy.lambdify(x, dot(a.T, x))

x = numpy.array([3, 2, 1])
print(f(x))
Surprise: This actually works!
Apparently, the string "dot" is somehow extracted and replaced by an implementation of the dot-product. Does anyone know which?
The result of the above is [3]. I would, however, like to get the scalar 3. (How) can I modify f() to achieve that?
I'm not a sympy user; however, quoting the documentation for lambdify:
If not specified differently by the user, SymPy functions are replaced
as far as possible by either python-math, numpy (if available) or
mpmath functions - exactly in this order. To change this behavior, the
“modules” argument can be used. It accepts:
the strings “math”, “mpmath”, “numpy”, “numexpr”, “sympy”
any modules (e.g. math)
dictionaries that map names of sympy functions to arbitrary functions
lists that contain a mix of the arguments above, with higher priority given to entries appearing first.
So it seems that if you have python-math installed it will use that; if not, but you have numpy installed, it will use numpy's version; otherwise mpmath. The documentation then describes how to modify this behaviour.
In your case, just provide a modules value that is a dictionary mapping the name dot to a function that returns a scalar, as you want.
An example of what I mean:
>>> import numpy as np
>>> import sympy
>>> class dot(sympy.Function): pass
...
>>> x = sympy.Symbol('x')
>>> a = sympy.Matrix([1,0,0])
>>> f = sympy.lambdify(x, dot(a.T, x), modules=[{'dot': lambda x, y: np.dot(x, y)[0]}, 'numpy'])
>>> y = np.array([3,2,1])
>>> print(f(y))
3
>>> print(type(f(y)))
<class 'numpy.int64'>
As you can see, by manipulating the modules argument you can achieve what you want. My implementation here is absolutely naive, but you can generalize it like:
>>> def my_dot(x, y):
...     res = np.dot(x, y)
...     if res.ndim == 1 and res.size == 1:
...         return res[0]
...     return res
This function checks whether the result of the normal dot is a scalar; if so, it returns the plain scalar, and otherwise it returns the same result as np.dot.
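Plugging my_dot into the modules argument then gives the scalar directly (reusing the x, a, and dot defined above):
>>> f = sympy.lambdify(x, dot(a.T, x), modules=[{'dot': my_dot}, 'numpy'])
>>> f(np.array([3, 2, 1]))
3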

How to allow both floats and arrays in a numpy array

Using numpy and matplotlib, it seems quite common that functions allow both a number (float or int) and a numpy array as argument, like this:
import numpy as np

print(np.sin(0))
# 0
x = np.arange(0, 4, 0.1)
y = np.sin(x)
In this example I call np.sin once with an integer argument, and once with a numpy array x. I now want to write a function that allows similar treatment, but I don't know how. For example:
def fun(foo, n):
    a = np.zeros(n)
    for i in range(n):
        a[i] = foo
    return a
would allow me to call fun like fun(1, 5) but not like fun(x, 5). My actual calculation is much more complicated, of course.
How can I initialize a so that it can hold both simple numbers and whole arrays of numbers as elements?
Thanks a lot for your help!
Builtin numpy functions often start with a
def foo(a, ...):
    a = np.asarray(a)
    ...
That is, they transform the input argument(s) to an array (making no copy if it is already an array). This allows them to work with scalars and lists.
Once the argument is an array it has a shape and can be broadcasted against other arguments.
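A minimal sketch of that pattern, using a hypothetical double function:
import numpy as np

def double(a):
    a = np.asarray(a)  # scalars and lists become arrays; arrays pass through
    return 2 * a       # broadcasting then handles every shape uniformly

print(double(3))           # 6
print(double([1, 2, 3]))   # [2 4 6]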
In your example, it's unclear what is supposed to happen when foo is an array
def fun(foo, n):
    a = np.zeros(n)
    for i in range(n):
        a[i] = foo
    return a
a is initialized as a float dtype array. That means a[i] = foo works only if foo is a single-element number (a scalar, or possibly a single-element array). If foo is an array with more than one value, you will get an error about setting an array element with a sequence.
a[i] is short for a[i, ...]; that is, it indexes on the first dimension. So if a were initialized with the right shape, it could accept arrays as inputs (subject to broadcasting rules).
If a were initialized as np.zeros(n, dtype=object), then a[i] = foo would work with anything, since a would just contain references to Python objects.
np.frompyfunc is a way of generating an array from a function, but it returns an array of dtype=object. np.vectorize uses it while giving you more control over the output type. Both expect the function to work on scalars: an array, if given as an argument, is passed to the function element by element.
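For example, np.vectorize can wrap a scalar-only function so that it accepts arrays (a small sketch; the step function is just an illustration):
import numpy as np

def step(v):
    return 1.0 if v > 0 else 0.0  # defined only for scalars

vstep = np.vectorize(step, otypes=[float])  # applies step element by element
print(vstep([-1.0, 2.0, 0.3]))  # [0. 1. 1.]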
You need a to inherit the dimensions of foo:
def fun(foo, n):
    a = np.zeros((n,) + np.shape(foo))
    for i in range(n):
        a[i] = foo
    return a
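With that change, a picks up foo's shape along the extra axes, so both scalars and arrays work; for instance:
x = np.arange(0, 4, 0.1)
print(fun(1, 5).shape)  # (5,)
print(fun(x, 5).shape)  # (5, 40)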
You can use type checking:
import numpy as np

def double(a):
    if type(a) == int:
        return 2 * a
    elif type(a) == float:
        return 2.0 * a
    elif type(a) == list:
        return [double(x) for x in a]
    elif type(a) == np.ndarray:
        return 2 * a
    else:
        print("bad type")

print(double(7))
print(double(7.2))
print(double([2, 9, 7]))
print(double(np.array([[9, 8], [2, 3]])))
result:
14
14.4
[4, 18, 14]
[[18 16]
 [ 4  6]]
with recursive treatment where needed, as I did for lists.
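For instance, nested lists recurse through the list branch:
print(double([[1, 2], [3.5]]))  # [[2, 4], [7.0]]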
