Address of last value in 1d NumPy array - python

I have a 1-D array with zeros scattered throughout. I would like to create a second array that tells, for each element, which zero was the last one seen (that is, a running count of the zeros so far), like so:
>>> a = np.array([1, 0, 3, 2, 0, 3, 5, 8, 0, 7, 12])
>>> foo(a)
[0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
Is there a built-in NumPy function or broadcasting trick to do this without using a for loop or other iterator?

>>> (a == 0).cumsum()
array([0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3])
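Why this works: `a == 0` is a boolean mask, and the cumulative sum of a boolean mask is a running count of its True entries. The sketch below spells that out and also shows a variant, not asked for in the question, that returns the actual index of the most recent zero. The names `zero_count` and `last_zero_index`, and the use of -1 for "no zero seen yet", are choices made here for illustration.

```python
import numpy as np

a = np.array([1, 0, 3, 2, 0, 3, 5, 8, 0, 7, 12])

is_zero = (a == 0)                 # boolean mask, True where a is zero
zero_count = is_zero.cumsum()      # running count of zeros seen so far

# Variant: index of the most recent zero, or -1 before the first zero.
candidate = np.where(is_zero, np.arange(a.size), -1)
last_zero_index = np.maximum.accumulate(candidate)

print(zero_count)
print(last_zero_index)
```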

Related

How to sort a numpy array column-wise, consecutively?

I want to sort a 2-D array by its columns in sequence, so that if the values in one column are equal, the sort falls through to the next column.
For example array
[[1, 0, 4, 2, 3]
[0, 1, 5, 7, 4]
[0, 0, 6, 1, 0]]
must be sorted as
[[0, 0, 6, 1, 0]
[0, 1, 5, 7, 4]
[1, 0, 4, 2, 3]]
So rows must not be changed, only their order. How can I do that?
This should work
import numpy as np
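a = np.array([[1, 0, 4, 2, 3],[0, 1, 5, 7, 4],[0, 0, 6, 1, 0]], dtype=np.int64)  # the 'i8' fields below assume 8-byte integers
# np.int was removed in NumPy 1.24, so view the result back as np.int64
np.sort(a.view('i8,i8,i8,i8,i8'), order=['f0'], axis=0).view(np.int64)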
I get
array([[0, 0, 6, 1, 0],
[0, 1, 5, 7, 4],
[1, 0, 4, 2, 3]])
f0 is the field (column) to sort by first. Fields not listed in order are still used, in the order they appear in the dtype, to break ties.
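An alternative that does not depend on the integer width is np.lexsort, which returns the row order directly. This is a minimal sketch using the array from the question; the names `row_order` and `sorted_rows` are illustrative. np.lexsort treats the last key as the primary one, so the columns are passed in reverse order.

```python
import numpy as np

a = np.array([[1, 0, 4, 2, 3],
              [0, 1, 5, 7, 4],
              [0, 0, 6, 1, 0]])

# a.T yields one key per column; reversing puts column 0 last,
# which makes it the primary sort key for np.lexsort.
row_order = np.lexsort(a.T[::-1])
sorted_rows = a[row_order]   # rows are reordered, never modified

print(sorted_rows)
```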

Numpy unique function

I have a quick question about the numpy unique function. I want to return the unique column values for each row
import numpy as np
a = np.array([[3, 2, 3, 2, 1, 3, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3],
[3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
[3, 3, 3, 2, 3, 3, 3, 2, 2, 2, 3, 2, 2, 3, 1, 1]]) # a.shape is (3,16)
np.unique(a)
array([1, 2, 3]) # not what I want
np.unique(a,axis=1)
array([[1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3],
[2, 3, 1, 1, 2, 2, 3, 1, 2, 2, 3],
[2, 3, 2, 3, 2, 3, 2, 1, 1, 2, 3]]) # also not what I want, and I'm not even sure what its doing
np.apply_along_axis(np.unique,1,a)
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]) # this is what I want
The problem is that I also want to use other features of np.unique, like returning index values. Can anyone help me get np.unique to work by itself?
You can loop over rows and collect unique values:
import numpy as np
a = np.array([[3, 2, 3, 2, 1, 3, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3],
[3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
[3, 3, 3, 2, 3, 3, 3, 2, 2, 2, 3, 2, 2, 3, 1, 1]])
arr = np.empty((0,3), int)
for row in a:
    # take the unique values of the current row, not of the whole array
    arr = np.append(arr, np.array([np.unique(row)]), axis=0)
print(arr)
Output:
[[1 2 3]
[1 2 3]
[1 2 3]]
NumPy cannot return a matrix with rows of different sizes. Your example has exactly 3 distinct values in every row, which is why np.apply_along_axis works, but if one of the rows contained a 4, or only 1s and 2s, it would fail.
To obtain what you are looking for you will need to use a normal Python list as the result. You can build it using a list comprehension:
import numpy as np
a = np.array([[1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1],
[3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
[3, 3, 3, 2, 3, 3, 4, 2, 2, 2, 3, 2, 2, 3, 1, 1]])
r = [ np.unique(row) for row in a ]
print(r)
# [array([1, 2]), array([1, 2, 3]), array([1, 2, 3, 4])]
r = [ np.unique(row, return_index=True) for row in a ]
print(r)
# [(array([1, 2]), array([0, 1])),
# (array([1, 2, 3]), array([11, 1, 0])),
# (array([1, 2, 3, 4]), array([14, 3, 0, 6]))]
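To illustrate the ragged-row point, the sketch below runs np.apply_along_axis on the array from this answer, whose rows have 2, 3 and 4 distinct values. On the NumPy versions I am aware of this raises a ValueError, because the per-row results cannot be packed into one rectangular array; the exact message may differ between versions. The names `packed`, `per_row` and `lengths` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1],
              [3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
              [3, 3, 3, 2, 3, 3, 4, 2, 2, 2, 3, 2, 2, 3, 1, 1]])

try:
    np.apply_along_axis(np.unique, 1, a)
    packed = True
except ValueError as exc:
    # rows with different numbers of unique values do not fit one array
    packed = False
    print("apply_along_axis failed:", exc)

# A plain list holds results of different lengths without trouble.
per_row = [np.unique(row) for row in a]
lengths = [len(u) for u in per_row]
print(lengths)
```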
One thing you could do is build a mask of the values that are the first of their kind on each row. This can be done using numpy.
Here's one way to do it (hopefully, numpy experts could suggest something less convoluted):
np.sum(np.cumsum(np.cumsum(a==np.unique(a)[:,None,None],axis=2),axis=2)==1,axis=0)
array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
Such a mask offers many processing options such as finding indices of the first occurrence on each line (using np.argwhere), erasing/assigning first or subsequent occurrences, and more.
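The one-liner above can be unpacked into named steps. The sketch below is a slightly simplified variant rather than the exact expression: it uses a single cumsum combined with the match mask instead of two nested cumsums, and it assumes the 3-row array from this answer. All intermediate names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 2, 2, 1, 1],
              [3, 2, 3, 2, 3, 3, 3, 3, 2, 2, 3, 1, 2, 1, 2, 1],
              [3, 3, 3, 2, 3, 3, 4, 2, 2, 2, 3, 2, 2, 3, 1, 1]])

values = np.unique(a)                    # distinct values in the whole array
matches = a == values[:, None, None]     # shape (n_values, n_rows, n_cols)
seen = np.cumsum(matches, axis=2)        # per value and row: occurrences so far
first = matches & (seen == 1)            # True only at a value's first occurrence
first_mask = first.any(axis=0)           # collapse the value axis

# (row, column) pairs of every first occurrence, in row-major order
positions = np.argwhere(first_mask)
print(first_mask.astype(int))
print(positions)
```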

how to sort in array index to index

Hey everyone, how can I sort an array index to index?
I have this list:
a = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4]
How can I sort it into this?
[0, 4, 1, 3, 2, 2, 3, 1, 4, 0, 4, 0, 3, 1, 2, 2, 1, 3, 0, 4]
this is my idea
I could be wrong, but it sounds like you would like to return a list that is sorted like this:
[first_item, last_item, second_item, second_to_last_item, third_item, third_to_last_item,...]
I don't know of a one-line way to do that, but here's one way you could do it:
import numpy as np
a = [0, 1, 2, 3, 7] # length of list is an odd number
# create indexes that are all positive
index_values = np.repeat(np.arange(0, len(a)//2 + 1), 2) # [0,0,1,1,.....]
# make every other one negative
index_values[::2] *= -1 #[-0, 0, -1, 1, ....]
# return a[i]
[a[i] for i in index_values[1:(len(a)+1)]]
### Output: [0, 7, 1, 3, 2]
It also works for lists with even length:
a = [0, 1, 2, 3, 7, 5] # list length is an even number
index_values = np.repeat(np.arange(0, len(a)//2 + 1), 2) # [0,0,1,1,.....]
index_values[::2] *= -1 #[-0, 0, -1, 1, ....]
[a[i] for i in index_values[1:(len(a)+1)]]
### Output: [0, 5, 1, 7, 2, 3]
Here’s an almost one-liner (based on @Callin’s sort method) for those that want one and that can’t/don’t want to use pandas:
from itertools import zip_longest
def custom_sort(a):
    half = len(a)//2
    return [n for fl in zip_longest(a[:half], a[:half-1:-1]) for n in fl if n is not None]
Examples:
custom_sort([0, 1, 2, 3, 7])
#[0, 7, 1, 3, 2]
custom_sort([0, 1, 2, 3, 7, 5])
#[0, 5, 1, 7, 2, 3]
This can be done in one line, although you’d be repeating the math to find the halfway point
[n for x in zip_longest(a[:len(a)//2], a[:(len(a)//2)-1:-1]) for n in x if n is not None]
Sometimes we want to sort in place, that is without creating a new list. Here is what I came up with
l=[1,2,3,4,5,6,7]
for i in range(1, len(l), 2):
    l.insert(i, l.pop())
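The first/last interleaving described in these answers can also be written with two indices that walk toward each other. This is a minimal sketch; `interleave_ends` is a name made up here, not something from the answers. It builds a new list, needs no imports, and also handles empty and single-element lists.

```python
def interleave_ends(items):
    """Return [first, last, second, second-to-last, ...] as a new list."""
    result = []
    lo, hi = 0, len(items) - 1
    while lo < hi:
        result.append(items[lo])   # next item from the front
        result.append(items[hi])   # next item from the back
        lo += 1
        hi -= 1
    if lo == hi:                   # odd length: one middle item is left
        result.append(items[lo])
    return result

a = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4]
print(interleave_ends(a))
```

On the list from the question this reproduces the requested ordering.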

Python Numpy's argsort duplicate issue [duplicate]

Say you have a numpy vector [0,3,1,1,1] and you run argsort
you will get [0,2,3,4,1] but all the ones are the same!
What I want is an efficient way to shuffle indices of identical values.
Any idea how to do that without a while loop with two indices on the sorted vector?
numpy.array([0,3,1,1,1]).argsort()
Use lexsort:
np.lexsort((b,a)) means Sort by a, then by b
>>> a
array([0, 3, 1, 1, 1])
>>> b=np.random.random(a.size)
>>> b
array([ 0.00673736, 0.90089115, 0.31407214, 0.24299867, 0.7223546 ])
>>> np.lexsort((b,a))
array([0, 3, 2, 4, 1])
>>> a.argsort()
array([0, 2, 3, 4, 1])
>>> a[[0, 3, 2, 4, 1]]
array([0, 1, 1, 1, 3])
>>> a[[0, 2, 3, 4, 1]]
array([0, 1, 1, 1, 3])
This is a bit of a hack, but if your array contains integers only you could add random values and argsort the result. np.random.rand gives you results in [0, 1) so in this case you're guaranteed to maintain the order for non-identical elements.
>>> import numpy as np
>>> arr = np.array([0,3,1,1,1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 3, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 2, 3, 1])
Here we see index 0 is always first in the argsort result and index 1 is last, but the rest of the results are in a random order.
In general you could generate random values bounded by the smallest positive gap between sorted values (for example d = np.diff(np.sort(arr)) followed by d[d > 0].min()), but you might run into precision issues at some point.
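One way to sidestep the precision concern is to avoid adding noise at all: shuffle the positions first, then apply a stable sort, so that ties keep their shuffled order. This is a sketch of that idea, not something proposed in the answers above; `argsort_random_ties` and its `rng` parameter are hypothetical names.

```python
import numpy as np

def argsort_random_ties(arr, rng=None):
    """Argsort in which equal values come out in random relative order."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr)
    perm = rng.permutation(arr.size)               # random visiting order
    order = np.argsort(arr[perm], kind="stable")   # stable: ties keep perm order
    return perm[order]                             # map back to original indices

arr = np.array([0, 3, 1, 1, 1])
idx = argsort_random_ties(arr, np.random.default_rng(0))
print(idx, arr[idx])
```

Because no arithmetic is performed on the values, this works the same way for integers, floats, and strings.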

python range() with duplicates?

Everybody knows that a list of numbers can be obtained with range like this:
>>> list(range(5))
[0, 1, 2, 3, 4]
If you want, say, 3 copies of each number you could use:
>>> list(range(5)) * 3
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
But is there an easy way using range to repeat copies like this instead?
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
Examples:
sorted(list(range(5)) * 3) # has unnecessary n * log(n) complexity
[x//3 for x in range(3*5)] # O(n), but division seems unnecessarily complicated
You can do:
>>> [i for i in range(5) for _ in range(3)]
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
the range(3) part should be replaced with your number of repetitions...
BTW, you should use generators
Just to make it clearer, the _ is a variable name for something you don't care about (any name is allowed).
This list comprehension uses nested for loops and has the same structure as:
for i in range(5):
    for j in range(3):
        pass  # your code here
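Written out in full, the nested loops and the lazy variant hinted at by the generator remark might look like the sketch below. The names `repeats`, `result` and `lazy` are illustrative.

```python
repeats = 3

# Explicit nested loops: the outer loop picks the value,
# the inner loop only controls how many times it is appended.
result = []
for i in range(5):
    for _ in range(repeats):
        result.append(i)

# Same logic as a generator expression: values are produced on demand.
lazy = (i for i in range(5) for _ in range(repeats))

print(result)
```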
Try this:
itertools.chain.from_iterable(itertools.repeat(x, 3) for x in range(5))
from itertools import chain
list(chain(*zip(*[range(5)]*3)))  # Python 3; on Python 2 use izip and xrange instead
Gives
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
Leave off the list and you have a lazy iterator.
EDIT: or even better (leaves out a function call to zip):
list(chain(*([x]*3 for x in range(5))))
There is a very simple way to do this with a help from numpy. Example:
>>> import numpy as np
>>> np.arange(5*3) // 3
array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])
With range you can do the following:
>>> list(map(lambda x: x // 3, range(5*3)))
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
Remember that // performs floor (integer) division.
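If NumPy is available anyway, np.repeat expresses the same thing without any division. A short sketch; the second call, with a different repeat count per element, goes beyond what the question asks and is shown only for illustration.

```python
import numpy as np

# Each element of the input is repeated 3 times, in order.
repeated = np.repeat(np.arange(5), 3)

# A per-element repeat count is also accepted.
uneven = np.repeat(np.arange(3), [1, 2, 3])

print(repeated)
print(uneven)
```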
>>> from itertools import chain, tee
>>> list(chain.from_iterable(zip(*tee(range(5), 3))))
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
A cool iterator using another approach:
>>> from collections import Counter
>>> Counter(list(range(5)) * 3).elements()
I like to Keep It Simple :)
>>> sorted(list(range(5)) * 3)
[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
import itertools
[x for tupl in zip(*itertools.tee(range(0,5),3)) for x in tupl]
Or:
[x for tupl in zip(range(0,5), range(0,5), range(0,5)) for x in tupl]
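The zip version above hard-codes three copies of the range. A sketch of how it might be generalized to any repeat count follows; `repeat_each_zip` is a hypothetical helper name. It relies on zip creating an independent iterator for each argument, so passing the same range object several times is safe.

```python
def repeat_each_zip(n, k):
    """Return 0..n-1 with each value repeated k times, using zip."""
    copies = [range(n)] * k            # k references to the same range
    return [x for tup in zip(*copies) for x in tup]

print(repeat_each_zip(5, 3))
```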
