I am looking for a simple pythonic way to get the first element of a numpy array, no matter its dimension. For example:
For [1,2,3,4] that would be 1
For [[3,2,4],[4,5,6]] it would be 3
Is there a simple, pythonic way of doing this?
Using a direct index:
arr[(0,) * arr.ndim]
The commas in a normal index expression make a tuple. You can pass in a manually-constructed tuple as well.
You can get the same result from np.unravel_index:
arr[np.unravel_index(0, arr.shape)]
On the other hand, using the very tempting arr.ravel()[0] is not always safe. ravel will generally return a view, but if your array is non-contiguous, it will make a copy of the entire thing.
A relatively cheap solution is
arr.flat[0]
flat is an indexable iterator. It will not copy your data.
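A minimal sketch putting the direct-index, unravel_index, and flat approaches side by side (the array here is illustrative):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3) + 10   # [[10, 11, 12], [13, 14, 15]]

first_direct = arr[(0,) * arr.ndim]                  # tuple of zeros, one per dimension
first_unravel = arr[np.unravel_index(0, arr.shape)]  # same index, built by numpy
first_flat = arr.flat[0]                             # flatiter: no copy, any ndim

print(first_direct, first_unravel, first_flat)  # 10 10 10
```

All three give the same element for any number of dimensions, including 0-d arrays for the flat variant.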
Consider using .item, for example:
a = np.identity(3)
a.item(0)
# 1.0
But note that unlike regular indexing .item strives to return a native Python object, so for example an np.uint8 will be returned as plain int.
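A small sketch of that type difference, assuming a uint8 array:

```python
import numpy as np

a = np.array([[7, 8], [9, 10]], dtype=np.uint8)

# .item returns a native Python object; .flat keeps the numpy scalar type
print(type(a.item(0)))   # <class 'int'>
print(type(a.flat[0]))   # <class 'numpy.uint8'>
```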
If that's acceptable, this method seems a bit faster than the others:
timeit(lambda:a.flat[0])
# 0.3602013469208032
timeit(lambda:a[a.ndim*(0,)])
# 0.3502263119444251
timeit(lambda:a.item(0))
# 0.2366882530041039
My actual data is huge and quite heavy. But to simplify, say I have a list of numbers
x = [1,3,45,45,56,545,67]
and a function that performs some action on these numbers:
def sumnum(x):
    return np.sqrt(x) + 1
What's the best way to apply this function to the list? I don't want to use a for loop. Would map be the best option, or is there anything faster/more efficient than that?
thanks,
Prasad
In standard Python, the map function is probably the easiest way to apply a function to an array (not sure about efficiency though). However, if your array is huge, as you mentioned, you may want to look into using numpy.vectorize, which is very similar to Python's built-in map function.
Edit: A possible code sample:
vsumnum = np.vectorize(sumnum)
x = vsumnum(x)
The first function call returns a vectorized function, meaning that numpy has prepared it to be mapped over your array, and the second function call actually applies the function to your array and returns the resulting array. As the docs note, this method is provided primarily for convenience, not for performance, and is essentially a for loop.
Edit 2:
As @Ch3steR mentioned, numpy also supports elementwise operations on arrays, so in this case, because you are doing simple operations, you can just write np.sqrt(x) + 1, which adds 1 to the square root of each element. Functions like map and numpy.vectorize are better suited to more complicated operations.
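A short sketch contrasting the two, using the sumnum function from the question (with its return statement fixed):

```python
import numpy as np

def sumnum(x):
    return np.sqrt(x) + 1

x = np.array([1, 3, 45, 45, 56, 545, 67], dtype=float)

# Convenience wrapper: essentially a for loop under the hood
vsumnum = np.vectorize(sumnum)
via_vectorize = vsumnum(x)

# Elementwise ufunc arithmetic: runs in C, usually much faster
via_ufunc = np.sqrt(x) + 1

print(np.allclose(via_vectorize, via_ufunc))  # True
```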
I have a function that I want to have quickly access the first (aka zeroth) element of a given Numpy array, which itself might have any number of dimensions. What's the quickest way to do that?
I'm currently using the following:
a.reshape(-1)[0]
This reshapes the perhaps-multi-dimensional array into a 1D array and grabs the zeroth element, which is short, sweet and often fast. However, I think this would work poorly with some arrays, e.g., an array that is a transposed view of a large array: I worry it would end up needing to create a copy rather than just another view of the original array, in order to get everything in the right order. (Is that right? Or am I worrying needlessly?) Regardless, it feels like this is doing more work than I really need, so I imagine some of you may know a generally faster way of doing this?
Other options I've considered are creating an iterator over the whole array and drawing just one element from it, or creating a vector of zeroes containing one zero for each dimension and using that to fancy-index into the array. But neither of these seems all that great either.
a.flat[0]
This should be pretty fast and never require a copy. (Note that a.flat is an instance of numpy.flatiter, not an array, which is why this operation can be done without a copy.)
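A sketch showing why a.flat is preferable to the question's a.reshape(-1) for a non-contiguous view such as a transpose (the array here is illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4).T      # transposed view: non-contiguous

flattened = a.reshape(-1)              # has to copy to linearize the data
print(np.shares_memory(a, flattened))  # False: a full copy was made

first = a.flat[0]                      # flatiter indexes in place, no copy
print(first)                           # 0
```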
You can use a.item(0); see the documentation at numpy.ndarray.item.
A possible disadvantage of this approach is that the return value is a Python data type, not a numpy object. For example, if a has data type numpy.uint8, a.item(0) will be a Python integer. If that is a problem, a.flat[0] is better; see @user2357112's answer.
np.hsplit(x, 2)[0]
(note that this returns the first half of the array split column-wise, not a single element)
Source:
https://numpy.org/doc/stable/reference/generated/numpy.hsplit.html
## y -- numpy array of shape (1, Ty)
if you want the size of the first dimension:
use y.shape[0]
if you want the size of the second dimension:
use y.shape[1]
(to get the first element itself, index the array, e.g. y[0, 0])
You can also use take for more complicated extraction (to get a few elements):

numpy.take(a, indices, axis=None, out=None, mode='raise')
Take elements from an array along an axis.

Source:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html
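A brief sketch of np.take, which treats the array as flat when axis is None:

```python
import numpy as np

a = np.array([[3, 2, 4], [4, 5, 6]])

# axis=None flattens, so index 0 is the first element
print(np.take(a, 0))          # 3

# Several elements at once, by flat index
print(np.take(a, [0, 2, 4]))  # [3 4 5]
```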
If I want to create a numpy array with dtype = [('index','<u4'),('valid','b1')], and I have separate numpy arrays for the 32-bit index and boolean valid values, how can I do it?
I don't see a way in the numpy.ndarray constructor; I know I can do this:
arr = np.zeros(n, dtype = [('index','<u4'),('valid','b1')])
arr['index'] = indices
arr['valid'] = validity
but somehow calling np.zeros() first seems wrong.
Any suggestions?
An alternative is
arr = np.fromiter(zip(indices, validity), dtype=[('index','<u4'),('valid','b1')])
but I suspect your initial idea is more efficient. (In your approach, you could use np.empty() instead of np.zeros() for a tiny performance benefit.)
Just use empty instead of zeros, and it should feel less 'wrong', since you are just allocating the data without unnecessarily zeroing it.
Or use fromiter, and also pass the optional count argument if you're keen on performance.
This is in any case a matter of taste in more than 99% of the use cases, and won't lead to any noticeable performance improvements IMHO.
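A minimal sketch of both approaches side by side (the field values here are illustrative):

```python
import numpy as np

indices = np.array([10, 20, 30], dtype=np.uint32)
validity = np.array([True, False, True])
dt = [('index', '<u4'), ('valid', 'b1')]

# Pre-allocate (empty skips the zero fill), then assign field by field
arr = np.empty(len(indices), dtype=dt)
arr['index'] = indices
arr['valid'] = validity

# Or build from an iterator; count lets numpy allocate once up front
arr2 = np.fromiter(zip(indices, validity), dtype=dt, count=len(indices))

print((arr == arr2).all())  # True
```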
This is to understand things better; it is not an actual problem I need to fix. A cStringIO object is supposed to emulate a string, a file, and an iterator over the lines. Does it also emulate a buffer? In any case, ideally one should be able to construct a numpy array as follows:
import numpy as np
import cStringIO
c = cStringIO.StringIO('\x01\x00\x00\x00\x01\x00\x00\x00')
# Trying the iterator abstraction
b = np.fromiter(c,int)
# The above fails with: ValueError: setting an array element with a sequence.
#Trying the file abstraction
b = np.fromfile(c,int)
# The above fails with: IOError: first argument must be an open file
#Trying the sequence abstraction
b = np.array(c, int)
# The above fails with: TypeError: long() argument must be a string or a number
#Trying the string abstraction
b = np.fromstring(c)
#The above fails with: TypeError: argument 1 must be string or read-only buffer
b = np.fromstring(c.getvalue(), int) # does work
My question is why does it behave this way.
The practical problem where this came up is the following: I have an iterator which yields a tuple. I am interested in making a numpy array from one of the components of the tuple with as little copying and duplication as possible. My first cut was to keep writing the interesting components of the yielded tuple into a StringIO object and then use its memory buffer for the array. I can of course use getvalue(), but that will create and return a copy. What would be a good way to avoid the extra copying?
The problem seems to be that numpy doesn't like being given characters instead of numbers. Remember, in Python, single characters and strings have the same type — numpy must have some type detection going on under the hood, and takes '\x01' to be a nested sequence.
The other problem is that a cStringIO iterates over its lines, not its characters.
Something like the following iterator should get around both of these problems:
def chariter(filelike):
    octet = filelike.read(1)
    while octet:
        yield ord(octet)
        octet = filelike.read(1)
Use it like so (note the seek!):
c.seek(0)
b = np.fromiter(chariter(c), int)
As cStringIO does not implement the buffer interface, if its getvalue returns a copy of the data, then there is no way to get its data without copying.
If getvalue returns the buffer as a string without making a copy, numpy.frombuffer(x.getvalue(), dtype='S1') will give a (read-only) numpy array referring to the string, without an additional copy.
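As a sketch of the same idea in modern Python (an assumption relative to the cStringIO-era question): io.BytesIO plays the role of cStringIO for bytes, and its getbuffer() method exposes the underlying memory without the copy that getvalue() makes, so np.frombuffer can wrap it directly:

```python
import io
import numpy as np

c = io.BytesIO(b'\x01\x00\x00\x00\x01\x00\x00\x00')

# getbuffer() returns a memoryview over the BytesIO's internal buffer: no copy
b = np.frombuffer(c.getbuffer(), dtype='<u4')
print(b)  # [1 1]
```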
The reason why np.fromiter(c, int) and np.array(c, int) do not work is that cStringIO, when iterated, returns a line at a time, similarly as files:
>>> list(iter(c))
['\x01\x00\x00\x00\x01\x00\x00\x00']
Such a long string cannot be converted to a single integer.
***
It's best not to worry too much about making copies unless it really turns out to be a problem. The reason is that the extra overhead in e.g. using a generator and passing it to numpy.fromiter may actually be larger than what is involved in constructing a list and then passing that to numpy.array; making the copies may be cheap compared to Python runtime overhead.
However, if the issue is memory, then one solution is to put the items directly into the final Numpy array. If you know the size beforehand, you can pre-allocate it. If the size is unknown, you can use the .resize() method on the array to grow it as needed.
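A sketch of that pre-allocate-and-resize pattern (the sizes and values here are arbitrary):

```python
import numpy as np

capacity = 4
out = np.empty(capacity, dtype=np.int64)
n = 0

for value in (10, 20, 30, 40, 50, 60):   # source iterator of unknown length
    if n == capacity:                    # grow geometrically when full
        capacity *= 2
        # refcheck=False since we know no other object aliases the buffer
        out.resize(capacity, refcheck=False)
    out[n] = value
    n += 1

result = out[:n]                         # trim the unused tail
print(result)  # [10 20 30 40 50 60]
```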
How do I create a big array in Python, and how do I create it efficiently?
in C/C++:
byte *data = (byte*)malloc(10000);
or
byte *data = new byte[10000];
in python...?
Have a look at the array module:
import array
array.array('B', [0] * 10000)
Instead of passing a list to initialize it, you can pass a generator, which is more memory efficient.
You can pre-allocate a list with:
l = [0] * 10000
which will be slightly faster than .appending to it (as it avoids intermediate reallocations). However, this will generally allocate space for a list of pointers to integer objects, which will be larger than an array of bytes in C.
If you need memory efficiency, you could use an array object. ie:
import array, itertools
a = array.array('b', itertools.repeat(0, 10000))
Note that these may be slightly slower to use in practice, as there is a boxing step when accessing elements (each raw byte must first be converted to a Python int object).
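A rough sketch of the size difference (exact byte counts vary by platform and Python version):

```python
import array
import sys

n = 10000
byte_arr = array.array('B', (0 for _ in range(n)))  # generator: no temporary list
int_list = [0] * n

print(len(byte_arr), byte_arr.itemsize)  # 10000 1: one byte per element
# The list stores a pointer per slot, so it is several times larger
print(sys.getsizeof(byte_arr) < sys.getsizeof(int_list))  # True
```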
You can efficiently create a big array with the array module, but using it won't be as fast as C. If you intend to do some math, you'd be better off with numpy.array.
Check this question for comparison.
Typically with python, you'd just create a list
mylist = []
and use it as an array. Alternatively, I think you might be looking for the array module. See http://docs.python.org/library/array.html.