Counting array elements in Python [duplicate] - python

This question already has answers here:
How do I get the number of elements in a list (length of a list) in Python?
(11 answers)
Closed 5 years ago.
How can I count the number of elements in an array? Contrary to what I expected, array.count(string) does not count all the elements in the array; it only returns the number of occurrences of string.

The built-in function len() returns the number of elements in the list.
Syntax:
len(myArray)
Eg:
myArray = [1, 2, 3]
len(myArray)
Output:
3

len is a built-in function that calls the given container object's __len__ member function to get the number of elements in the object.
Functions encased with double underscores are usually "special methods" implementing one of the standard interfaces in Python (container, number, etc). Special methods are used via syntactic sugar (object creation, container indexing and slicing, attribute access, built-in functions, etc.).
Using obj.__len__() wouldn't be the correct way of using the special method, but I don't see why the others were modded down so much.
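To make the relationship between len() and __len__ concrete, here is a minimal sketch. The Playlist class and its _tracks attribute are hypothetical names invented for illustration; the only point is that defining __len__ is what lets the built-in len() work on an object.

```python
class Playlist:
    """Hypothetical container used only to illustrate __len__."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        # len(playlist) ends up calling this special method
        return len(self._tracks)


playlist = Playlist(["intro", "verse", "outro"])
count = len(playlist)           # preferred spelling
same = playlist.__len__()       # works, but not idiomatic

# list.count(x) is different: it counts occurrences of x
occurrences = ["a", "b", "a"].count("a")
```

Calling len(playlist) and playlist.__len__() gives the same number here; the built-in form is simply the conventional one.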

If you have a multi-dimensional array, len() might not give you the value you are looking for. For instance:
import numpy as np
a = np.arange(10).reshape(2, 5)
print(len(a) == 2)
This prints True, telling you that len() reports a size of 2. However, there are in fact 10 elements in this 2D array. For multi-dimensional arrays, len() gives you the length of the first dimension only, i.e.
import numpy as np
len(a) == np.shape(a)[0]
To get the number of elements in a multi-dimensional array of arbitrary shape:
import numpy as np
size = 1
for dim in np.shape(a): size *= dim
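NumPy arrays also expose this product directly through the size attribute, so the loop is mostly useful for understanding what is being computed. A short sketch comparing the approaches, using the same 2x5 array as above:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)

# Manual product of all dimension lengths
size = 1
for dim in a.shape:
    size *= dim

first_dim = len(a)      # length of the first dimension only
total = a.size          # total number of elements
```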

Or,
myArray.__len__()
if you want to be oopy; "len(myArray)" is a lot easier to type! :)

Before I saw this, I thought to myself, "I need to make a way to do this!"
count = 0
for item in arrayName: count += 1
And then I thought, "There must be a simpler way to do this." and I was right.
len(arrayName)

Related

Sum of Boolean List in Python not functioning as expected

I understand that Python can treat True as 1 (as do many languages), so taking the sum() of a list of Booleans should return the number of True values in the list.
(as demonstrated in Counting the number of True Booleans in a Python List)
I'm new to Python and have been going through some of the ISLR application exercises in Python (http://www.springer.com/us/book/9781461471370).
Chapter 2, problem 10 (h) has a pretty simple question asking for the number of observations of a variable ('rm') that are greater than 7. I would expect the following code to work:
test = [Boston['rm'] > 7]
sum(test)
However this returns the entire list "test" with 0's and 1's, not its sum. Can anyone explain why?
(note Boston is from the Boston data set from the MASS package in R)
If I use a tuple or numpy array instead of a list it works just fine; for example:
test2 = (Boston['rm'] > 7)
sum(test2)
test3 = np.array(Boston['rm'] > 7)
sum(test3)
Also "test" seems to be a proper Boolean list because the following code using it to subset "Boston" also works fine:
test4 = Boston[Boston['rm'] > 7]
len(test4)
While I have clearly found several methods that work, I'm confused why the first did not. Thanks in advance.
(Boston['rm'] > 7) uses parentheses for grouping; it isn't a tuple. The tuple equivalent would be (Boston['rm'] > 7,) (note the comma), and it breaks in the same way as the list does. Using np.array on an array doesn't change it; it's like the difference between list([5]), which is still [5], and [[5]], which adds a level of nesting.
As for why it doesn’t work: Boston['rm'] > 7 is an array, so you want to get its sum directly. Wrapping it in another list means you’re taking the sum of a list of arrays and not a list of booleans.
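The difference can be reproduced without the Boston data set. The sketch below uses a small made-up NumPy array named rm as a stand-in for Boston['rm']; the values are invented for illustration only.

```python
import numpy as np

# Stand-in values for Boston['rm'] (hypothetical data)
rm = np.array([6.5, 7.2, 8.1, 5.9])

mask = rm > 7                 # Boolean array, one entry per observation

# Summing a list that contains the array computes 0 + mask,
# which is an array of 0s and 1s rather than a single number.
wrapped_sum = sum([mask])

# Summing the Boolean array itself counts the True values.
direct_sum = mask.sum()
```

The same reasoning applies to a pandas Series, since the comparison there also returns a whole column of Booleans rather than a single value.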

How to represent ":" in numpy [duplicate]

This question already has answers here:
Array Assignment in numpy / : colon equivalent
(2 answers)
Closed 1 year ago.
I want to slice a multidimensional ndarray but don't know in advance which dimension I will slice on. Let's say we have an ndarray A with shape (6,7,8). Sometimes I need to slice along the first dimension, A[:,3,4], and sometimes along the third, A[1,2,:].
Is there a symbol or object that represents the ":"? I want to use it to build an index array.
index=np.zeros(3)
index[0]=np.:
index[1]=3
index[2]=4
A[index]
The : slice can be created explicitly by calling slice(None). Here's a short example:
import numpy as np
A = np.arange(9).reshape(3, -1)
# extract the 2nd column
A[:, 1]
# equivalently we can do
cslice = slice(None) # represents the colon
A[cslice, 1]
You want index to be a tuple, with a mix of numbers, lists and slice objects. A number of the numpy functions that take an axis parameter construct such a tuple.
A[(slice(None, None, None), 3, 4)] # == A[:, 3, 4]
There are various ways of constructing that tuple:
index = (slice(None),) + (3, 4)
index = [slice(None)]*3; index[1] = 3; index[2] = 4; index = tuple(index)
index = np.array([slice(None)]*3); index[1:] = [3, 4]; index = tuple(index)
The index passed to A[...] should be a tuple. Older NumPy versions also accepted a list here, but an array never worked.
Starting with a list (or array) is handy in that you can modify values, but convert it to a tuple before use. A list index can mean something different from a tuple: NumPy generally interprets a list as an array of indices along one axis rather than as one entry per dimension.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
Remember that a slicing tuple can always be constructed as obj and used in the x[obj] notation. Slice objects can be used in the construction in place of the [start:stop:step] notation. For example, x[1:10:5,::-1] can also be implemented as obj = (slice(1,10,5), slice(None,None,-1)); x[obj] . This can be useful for constructing generic code that works on arrays of arbitrary dimension.
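As a sketch of such generic code, the helper below builds the index tuple for an array of any dimension. The function name slice_with_fixed and its dict argument are assumptions made for this example, not part of NumPy.

```python
import numpy as np


def slice_with_fixed(arr, fixed):
    """Index arr with ':' on every axis except those listed in fixed.

    fixed maps axis number -> integer index (hypothetical helper).
    """
    index = [slice(None)] * arr.ndim    # start with ':' everywhere
    for axis, position in fixed.items():
        index[axis] = position
    return arr[tuple(index)]            # convert to a tuple before indexing


A = np.arange(6 * 7 * 8).reshape(6, 7, 8)

along_first = slice_with_fixed(A, {1: 3, 2: 4})   # same as A[:, 3, 4]
along_third = slice_with_fixed(A, {0: 1, 1: 2})   # same as A[1, 2, :]
```

Building the index as a list and converting it at the last moment keeps the construction mutable while still handing NumPy the tuple it expects.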

efficient way for replacing sub-arrays within numpy array - numpy.put or similar?

I have a long list, called "colours", containing tuples of length 4.
I need to substitute some of these tuples by other tuples (or more specifically, all the tuples that I need to replace should be replaced by the tuple (1.,0.,0.,1.), corresponding to the colour 'red' in matplotlib). I know the indices of the tuples I need to substitute - stored in a list called "indices", of length "li".
Of course I could use:
for i in range(li):
    colours[indices[i]] = (1.,0.,0.,1.)
but because the list "colours" is 400*400 elements long, this substitution takes fairly long. I was wondering whether there is a quicker and more elegant way to do this?
I tried to convert "colours" into a numpy.array (which still works fine for matplotlib), and then using the numpy.put method:
n.put(colours, indices, [(1.,0.,0.,1.)]*li)
but this does not work, because instead of replacing the whole tuple, n.put just replaces a sub-element of the tuple (i.e. a single number) within colours by another number from the (1.,0.,0.,1.)-tuple.
Does anyone have any suggestions what to use?
If you convert colours to a NumPy array, then you could use so-called "advanced (integer) indexing" to do the assignment:
colours = np.array(colours)
colours[indices, :] = (1, 0, 0, 1)
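A small self-contained sketch of that assignment follows. The six blue entries and the two indices are made-up stand-ins for the real 400*400 list; only the indexing pattern is the point.

```python
import numpy as np

# Hypothetical small stand-in for the real list of RGBA tuples
colours = [(0., 0., 1., 1.)] * 6
indices = [1, 4]

colours = np.array(colours)                # shape (6, 4)
colours[indices, :] = (1., 0., 0., 1.)     # whole rows replaced at once
```

The right-hand tuple is broadcast across every selected row, so no Python-level loop over indices is needed.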

Replacing Nones in a python array with zeroes

I've just joined two arrays of unequal length together with the command:
allorders = map(None,todayorders, lastyearorders)
where None is filled in wherever todayorders has no value (the todayorders array is shorter).
However, when I try to pass the allorders array into a matplotlib bar chart:
p10= plt.bar(ind, allorders[9], width, color='#0000DD', bottom=allorders[8])
..I get the following error:
TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'
So, is there a way for matplotlib to accept None values? If not, how do I replace the Nones with zeroes in my allorders array?
If you can, as I am a Python newbie (coming over from the R community), please provide detailed code from start to finish that I can use/test.
Use a list comprehension:
allorders = [i if i[0] is not None else (0, i[1]) for i in allorders]
With numpy:
import numpy as np
allorders = np.array(allorders)
This creates an array of objects because of the Nones. We can replace them with zeros:
allorders[allorders == None] = 0
Then convert the array to the proper type (astype returns a new array, so assign the result):
allorders = allorders.astype(int)
Since it sounds like you want this all to be in numpy, the direct answer to your question is really just an aside, and the right answer doesn't begin until the "Of course…" paragraph.
If you think about it, you're using map with a None first parameter as a zip_longest, on the assumption that Python doesn't have one. But it does, in itertools, and it lets you specify a custom fillvalue. So, you can do this all in one step with izip_longest (named zip_longest in Python 3):
>>> import itertools
>>> todayorders = [1, 2]
>>> lastyearorders = [1, 2, 3]
>>> allorders = itertools.izip_longest(todayorders, lastyearorders, fillvalue=0)
>>> list(allorders)
[(1, 1), (2, 2), (0, 3)]
This only fills in 0 for the Nones that show up as extra values for the shorter list; if you want to replace every None with a 0, you have to do it Martijn Pieters's way. But I think this is what you want.
Also, note that list(allorders) at the end: izip_longest, like most things in itertools, returns an iterator, not a list. Or, in terms you might be more familiar with, it returns a "lazy" sequence rather than a "strict" one. If you're just going to iterate over the result, that's actually better, but if you need to use it with some function that requires a list (like printing it out in human-readable form—or accessing allorders[9], as in your example), you need to explicitly convert it first.
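For readers on Python 3, where map(None, ...) is no longer available, a minimal sketch of the same step using the two short lists from the example above:

```python
from itertools import zip_longest

todayorders = [1, 2]
lastyearorders = [1, 2, 3]

# zip_longest is lazy, so wrap it in list() to get something indexable
allorders = list(zip_longest(todayorders, lastyearorders, fillvalue=0))
last_pair = allorders[2]
```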
If you actually want a numpy.array rather than a list, you can get there without going through a list first. (If all you're ever going to do with it is plot it with matplotlib, you probably do want an array.) One option is np.fromiter instead of list(allorders). Note that np.fromiter requires an explicit dtype (for example dtype=int) and builds a one-dimensional array from scalar items, so the pairs would need to be flattened first (for instance with itertools.chain.from_iterable) and the result reshaped to two columns. If you know the number of items in advance (here twice max(len(todayorders), len(lastyearorders)) after flattening), passing an explicit count can be faster.
Of course if any of the numpy stuff sounds appealing, you probably should stay within numpy in the first place, instead of using map or izip_longest:
>>> todayorders.resize(lastyearorders.shape)
>>> allorders = np.vstack((todayorders, lastyearorders)).transpose()
Unfortunately, that mutates todayorders, and as far as I know, the equivalent immutable function numpy.resize doesn't give you any way to "zero-extend", but instead repeats the values. Hopefully I'm wrong and someone will suggest the easy way, but otherwise, you have to do it explicitly:
>>> extrazeros = np.zeros(len(lastyearorders) - len(todayorders), dtype=int)
>>> allorders = np.vstack((np.concatenate((todayorders, extrazeros)), lastyearorders))
>>> allorders = allorders.transpose()
>>> allorders
array([[1, 1],
       [2, 2],
       [0, 3]])
Of course if you do a lot of that, I'd write a zeroextend function that takes a pair of arrays and extends one to match the other (or, if you're not just dealing with 1D, extends the shorter one on each axis to make the other).
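One way such a helper might look for the 1D case is sketched below. It relies on np.pad, which pads with zeros in its "constant" mode and leaves the input untouched; the name zero_extend and the assumption that the first argument is never longer than the second are choices made for this example.

```python
import numpy as np


def zero_extend(short, long):
    """Pair two 1D sequences column-wise, zero-padding the shorter one.

    Hypothetical helper; assumes len(short) <= len(long).
    """
    short = np.asarray(short)
    long = np.asarray(long)
    missing = len(long) - len(short)
    # Append `missing` zeros at the end without mutating the input
    padded = np.pad(short, (0, missing), mode="constant")
    return np.vstack((padded, long)).transpose()


allorders = zero_extend([1, 2], [1, 2, 3])
```

Because the result is built from integer arrays, it keeps an integer dtype rather than falling back to object.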
At any rate, aside from being faster and using less temporary memory than using map, izip_longest, etc., this also means that you end up with a final array with the right dtype (int rather than object)—which means your result also uses less long-term memory, and everything you do from then on will also be faster and use less temporary memory.
For completeness: It is possible to have pyplot handle None values, but I don't think it's what you want. For example, you can pass it a Transform object whose transform method converts None to 0. But this will be effectively the same as Martijn Pieters's answer but much more verbose, and there's no advantage at all unless you need to plot tons of such arrays.

replacing values in a whole array

I would like to ask how I can change the values in a whole NumPy array.
For example I want to change every value which is < 1e-15 to be equal to 1e-15.
Assuming you mean a numpy array, and it's pointed to by a variable a:
np.fmax(a, 1e-15, a)
This finds the maximum of the two values given as the first two arguments (a and 1e-15) on a per-element basis, and writes the result back to the array given as the third argument, a.
I had a hard time finding the official docs for this function, but I found this.
If L is a list:
L[:] = [max(x, 10e-15) for x in L]
Assuming you mean a list instead of an array, I'd recommend using a list comprehension:
new_list = [max(x, 1e-15) for x in my_list]
(I also assume you mean 1e-15 == 10. ** (-15) instead of 10e-15 == 1e-14.)
There also exist "arrays" in Python: The class array.array from the standard library, and NumPy arrays.
I like numpy.fmax (which was new to me), but for a possibly more generic case, I often use:
a[a < 1e-15] = 1e-15
(More generic in the sense that you can vary the condition, or that the replacement value is not equal to the comparison value.)
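A short sketch comparing the two NumPy approaches on made-up values, to show that they produce the same result when the threshold and the replacement value coincide:

```python
import numpy as np

# Hypothetical sample values, two of them below the threshold
a = np.array([1e-20, 0.5, 3e-16, 2.0])
b = a.copy()

# Element-wise maximum, written back into a
np.fmax(a, 1e-15, out=a)

# Boolean-mask assignment; condition and replacement can differ if needed
b[b < 1e-15] = 1e-15
```

Both variants modify the array in place, so copy it first if the original values are still needed.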
