numpy.equal with string values

The numpy.equal function does not work if a list or array contains strings:
>>> import numpy
>>> index = numpy.equal([1,2,'a'],None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: function not supported for these types, and can't coerce safely to supported types
What is the easiest way to work around this without looping through each element? In the end, I need index to contain a boolean array indicating which elements are None.

If you really need to use numpy, be more careful about what you pass in and it can work:
>>> import numpy
>>> a = numpy.array([1, 2, 'a'], dtype=object) # makes type of array what you need
>>> numpy.equal(a, None)
array([False, False, False], dtype=bool)
Since you start with a list, there's a chance what you really want is just a list comprehension like [item is None for item in [1, 2, 'a']] or the similar generator expression.
Having a heterogeneous list like this is odd. Lists (and numpy arrays) are typically used for homogeneous data.

What's wrong with a stock list comprehension?
index = [x is None for x in L]
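As a rough sketch of how the suggestions above fit together, the list comprehension can feed a boolean NumPy array directly. The sample list values and the name none_mask are made up for illustration and do not come from the question.

```python
import numpy as np

# Hypothetical mixed-type input, including an actual None to detect.
values = [1, 2, 'a', None]

# Identity test per element, then convert the result to a boolean array.
none_mask = np.array([item is None for item in values], dtype=bool)

print(none_mask)
```

This still iterates in Python, but it avoids any coercion of the mixed values and always yields a plain boolean array.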

Related

Creating a numpy array from a set

I noticed the following behaviour exhibited by numpy arrays:
>>> import numpy as np
>>> s = {1,2,3}
>>> l = [1,2,3]
>>> np.array(l)
array([1, 2, 3])
>>> np.array(s)
array({1, 2, 3}, dtype=object)
>>> np.array(l, dtype='int')
array([1, 2, 3])
>>> np.array(l, dtype='int').dtype
dtype('int64')
>>> np.array(s, dtype='int')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string, a bytes-like object or a number, not 'set'
There are 2 things to notice:
1. Creating an array from a set results in the array dtype being object.
2. Trying to specify dtype results in an error, which suggests that the set is being treated as a single element rather than an iterable.
What am I missing - I don't fully understand which bit of python I'm overlooking. Set is a mutable object much like a list is.
EDIT: tuples work fine:
>>> t = (1,2,3)
>>> np.array(t)
array([1, 2, 3])
>>> np.array(t).dtype
dtype('int64')

The array factory works best with sequence objects, which a set is not. If you do not care about the order of elements and know they are all ints or convertible to int, then you can use np.fromiter:
np.fromiter({1,2,3},int,3)
# array([1, 2, 3])
The second (dtype) argument is mandatory. The last (count) argument is optional, but providing it can improve performance.
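A minimal sketch of the np.fromiter approach with keyword arguments, assuming every element of the set is an int. The final sort is an addition for illustration only, since a set gives no ordering guarantee.

```python
import numpy as np

s = {1, 2, 3}

# dtype is required; count lets NumPy allocate the result in one step.
arr = np.fromiter(s, dtype=int, count=len(s))

# Sets are unordered, so sort if a predictable order matters.
arr.sort()

print(arr, arr.dtype)
```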

As you can see from the curly-bracket syntax, a set is more closely related to a dict than to a list. You can solve it very simply by turning the set into a list or tuple before converting to an array:
>>> import numpy as np
>>> s = {1,2,3}
>>> np.array(s)
array({1, 2, 3}, dtype=object)
>>> np.array(list(s))
array([1, 2, 3])
>>> np.array(tuple(s))
array([1, 2, 3])
However, this might be too inefficient for large sets, because list() or tuple() has to run through the whole set before the creation of the array even starts. A better method would be to pass the set to np.fromiter as an iterable:
>>> np.fromiter(s, int)
array([1, 2, 3])

The np.array documentation says that the object argument must be "an array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence" (emphasis added).
A set is not a sequence. Specifically, sets are unordered and do not support the __getitem__ method. Hence you cannot create an array from a set the way you can from a list.
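The sequence distinction can be checked with the abstract base classes in the standard library. This is only an illustrative check; it is not how np.array itself decides what to do.

```python
from collections.abc import Iterable, Sequence

candidates = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "set": {1, 2, 3},
}

# For each container, record whether it is a sequence and whether it is iterable.
report = {
    name: (isinstance(obj, Sequence), isinstance(obj, Iterable))
    for name, obj in candidates.items()
}

for name, (is_sequence, is_iterable) in report.items():
    print(f"{name}: sequence={is_sequence}, iterable={is_iterable}")
```

A set is iterable but not a sequence, which matches the behaviour described above.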

NumPy expects the argument to be a sequence such as a list. It doesn't understand the set type, so it creates an object array (the same would happen if you passed any other non-sequence object). You can create a NumPy array from a set by first converting the set to a list: numpy.array(list(my_set)). Hope this helps.

Two similar array-in-array containment tests. One passes, the other raises a ValueError. Why?

Moar noob Python questions
I have a list of NumPy arrays and want to test if two arrays are inside. Console log:
>>> theArray
[array([[[213, 742]]], dtype=int32), array([[[127, 740]],
[[127, 741]],
[[128, 742]],
[[127, 741]]], dtype=int32)]
>>> pair[0]
array([[[213, 742]]], dtype=int32)
>>> pair[1]
array([[[124, 736]]], dtype=int32)
>>> pair[0] in theArray
True
>>> pair[1] in theArray
Traceback (most recent call last):
File "...\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
pair[0] and pair[1] seem to have absolutely similar characteristics according to the debugger (except the contents). So how are these two cases different? Why could the second one fail while the first does not?

Using in at all here is a mistake.
theArray isn't an array. It's a list. in for lists assumes that == is an equivalence relation on its elements, but == isn't an equivalence relation on NumPy arrays; it doesn't even return a boolean. Using in here is essentially meaningless.
Making theArray an array wouldn't help, because in for arrays makes basically no sense.
pair[0] in theArray happens to not raise an exception because of an optimization lists perform. Lists try an is comparison before == for in, and pair[0] happens to be the exact same object as the first element of theArray, so the list never gets around to trying == and being confused by its return value.
If you want to check whether a specific object obj is one of the elements of a list l (not just ==-equivalent to one of the elements, but actually that object), use any(obj is element for element in l).
If you want to check whether a NumPy array is "equal" to an array in a list of arrays in the sense of having the same shape and equal elements, use any(numpy.array_equal(obj, element) for element in l).
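The two checks described above can be sketched as small helpers. The function names and the sample arrays are hypothetical and only loosely modelled on the console log in the question.

```python
import numpy as np

def contains_same_object(items, obj):
    """True if obj itself (by identity) is one of the items."""
    return any(obj is element for element in items)

def contains_equal_array(arrays, target):
    """True if some array has the same shape and equal elements as target."""
    return any(np.array_equal(target, element) for element in arrays)

first = np.array([[[213, 742]]], dtype=np.int32)
second = np.array([[[127, 740]], [[127, 741]]], dtype=np.int32)
arrays = [first, second]

same_values = np.array([[[213, 742]]], dtype=np.int32)  # equal to first, but a new object
missing = np.array([[[124, 736]]], dtype=np.int32)      # not present at all

print(contains_same_object(arrays, first))
print(contains_same_object(arrays, same_values))
print(contains_equal_array(arrays, same_values))
print(contains_equal_array(arrays, missing))
```

Neither helper ever asks Python for the truth value of a multi-element array, so the ValueError cannot occur.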

I get the ValueError for both success and failure cases.
As @user2357112 said, the issue is that the elements of the list are numpy arrays, so the == comparison which 'in' depends on doesn't work,
but you can use a construction like:
any(np.all(x == pair[0]) for x in theArray)

What's the Pythonic way to initialize an array?

I often see arrays in Python 3 that are declared in either one of two ways:
foo[2, 2] = [[1, 2], [3, 4]]
or...
foo[2][2] = [[1, 2], [3, 4]]
I've tried using both of these in computationally expensive tasks (e.g. machine learning) for gargantuan arrays, and there doesn't seem to be much of a difference between them.
Is there a difference between the two, in terms of memory allocation and execution times for looping and such, when the lists are big?

In this case it creates the tuple (ti, tj) and passes it to dense.__getitem__(). As to what that accomplishes, you will need to see the documentation and/or source for dense's type.

The code dense[ti, tj] calls dense.__getitem__((ti, tj)). The comma in this case constructs a tuple. This doesn't work with lists, but it could work with a dictionary if the keys are tuples.
>>> [1,2,3][1, 2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
>>> {(1, 2): 1}[1, 2]
1

Appending two arrays together in Python

I've been working in Python with an array which contains a one-dimensional list of values. I have until now been using the array.append(value) function to add values to the array one at a time.
Now, I would like to add all the values from another array to the main array instead. In other words, I don't want to add single values one at a time. The secondary array collects ten values, and when these are collected, they are all transferred to the main array. The problem is, I can't simply use the code 'array.append(other_array)', as I get the following error:
unsupported operand type(s) for +: 'int' and 'list'
Where am I going wrong?

Lists can be added together:
>>> a = [1,2,3,4]
>>> b = [5,6,7,8]
>>> a+b
[1, 2, 3, 4, 5, 6, 7, 8]
and one can be easily added to the end of another:
>>> a += b
>>> a
[1, 2, 3, 4, 5, 6, 7, 8]

You are looking for the array.extend() method. append() only appends a single element to the array.
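A short sketch of the difference, using plain lists with made-up names (main_values, batch):

```python
main_values = [1, 2, 3]
batch = [4, 5]

# append adds the whole list as one nested element.
nested = list(main_values)
nested.append(batch)

# extend adds each element of the batch individually.
flat = list(main_values)
flat.extend(batch)

print(nested)
print(flat)
```

The nested result is what later causes errors such as the one in the question, because summing or adding over the list meets an inner list where a number was expected.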

Array (as in numpy.array or the array module) or list? Because given your error message, it seems to be the latter.
Anyway, you can use the += operator, that should be overridden for most container types, but the operands must be of the same (compound) type.

Usually, if you want to expand a structure to the right (axis=1) or at the bottom (axis=0), you should have a look at the numpy.concatenate() function, see Concatenate a NumPy array to another NumPy array.
np.concatenate((arr1, arr2), axis=0)
is probably what is needed here, adding a new row in a nested array.
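For reference, a minimal sketch of np.concatenate on two-dimensional arrays. The arrays are invented for illustration; note that the arrays to join are passed together as one tuple.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])

# axis=0 stacks rows at the bottom; column counts must match.
rows_added = np.concatenate((arr1, arr2), axis=0)

# axis=1 appends columns on the right; row counts must match.
cols_added = np.concatenate((arr1, arr2.T), axis=1)

print(rows_added.shape, cols_added.shape)
```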

Understanding the behavior of Python's set

The documentation for the built-in type set says:
class set([iterable])
Return a new set or frozenset object whose elements are taken from iterable. The elements of a set must be hashable.
That is all right but why does this work:
>>> l = range(10)
>>> s = set(l)
>>> s
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
And this doesn't:
>>> s.add([10])
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
s.add([10])
TypeError: unhashable type: 'list'
Both are lists. Is some magic happening during the initialization?

When you initialize a set, you provide a list of values that must each be hashable.
s = set()
s.add([10])
is the same as
s = set([[10]])
which throws the same error that you're seeing right now.

In [13]: (2).__hash__
Out[13]: <method-wrapper '__hash__' of int object at 0x9f61d84>
In [14]: ([2]).__hash__ # nothing.
The thing is that set needs its items to be hashable, i.e. to implement the __hash__ magic method (the hash value is used to place and look up items in the set's underlying hash table). list does not provide a usable __hash__, hence a list cannot be added to a set.
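One way to probe hashability is simply to call hash() and catch the TypeError. The helper name is_hashable is made up for this sketch.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable(2))        # int
print(is_hashable([2]))      # list
print(is_hashable((2,)))     # tuple of hashable items
print(is_hashable(([2],)))   # tuple that contains a list
```

A tuple is only hashable when everything inside it is hashable, which is why the last case fails even though tuples are generally usable as set elements.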

In this line:
s.add([10])
You are trying to add a list to the set, rather than the elements of the list. If you want to add the elements of the list, use the update method.
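A brief sketch contrasting the two methods, with sample values chosen for illustration:

```python
s = set(range(10))

# update iterates over its argument and adds each element.
s.update([10])

# add inserts its argument as a single element, so it must be hashable.
s.add((11, 12))  # a tuple of ints is hashable

try:
    s.add([13])  # a list is not hashable
    add_list_failed = False
except TypeError:
    add_list_failed = True

print(sorted(x for x in s if isinstance(x, int)))
print(add_list_failed)
```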

Think of the constructor being something like:
class Set:
    def __init__(self, l):
        for elem in l:
            self.add(elem)
So there is nothing mysterious about why the constructor accepts a list while add(element) does not: the constructor iterates over its argument and adds each element individually.

It behaves according to the documentation: set.add() adds a single element (and since you give it a list, it complains it is unhashable - since lists are no good as hash keys). If you want to add a list of elements, use set.update(). Example:
>>> s = set([1,2,3])
>>> s.add(5)
>>> s
set([1, 2, 3, 5])
>>> s.update([8])
>>> s
set([8, 1, 2, 3, 5])

s.add([10]) works as documented. An exception is raised because [10] is not hashable.
There is no magic happening during initialisation.
set([0,1,2,3,4,5,6,7,8,9]) has the same effect as set(range(10)) and set(xrange(10)) and set(foo()) where
def foo():
    for i in (9,8,7,6,5,4,3,2,1,0):
        yield i
In other words, the arg to set is an iterable, and each of the values obtained from the iterable must be hashable.
