Repeat ndarray n times [duplicate] - python

This question already has answers here:
Repeating each element of a numpy array 5 times
(2 answers)
Closed 3 years ago.
I have a numpy.ndarray with True/False:
import numpy as np
a = np.array([True, True, False])
I want:
out = np.array([True, True, False, True, True, False, True, True, False])
I tried:
np.repeat(a, 3, axis = 0)
But that duplicates each element; I want to repeat the whole array.
This is the closest I got:
np.array([a for i in range(3)])
However, I want the result to stay 1D.
Edit
It was suggested to be a duplicate of Repeating each element of a numpy array 5 times. However, my question is how to repeat the whole array, not each element.

Use np.tile
>>> a = np.array([True, True, False])
>>> np.tile(a, 3)
array([ True,  True, False,  True,  True, False,  True,  True, False])

Try:
import numpy as np
a = np.array([True, True, False])
print(np.concatenate([a]*3))
[ True True False True True False True True False]

Related

Mask with numpy isin

I want to make a mask with a numpy array. I've found a function, but it doesn't do what I want. Here is the code example:
np.isin([1,2,3,4,5,8,6,1,1],[1,2,3,5,1])
This code returns this:
array([ True, True, True, False, True, False, False, True, True], dtype=bool)
But I want the same output, except with the last value set to False, because I need a mask of the sequence ([1,2,3,5,1]) in exactly this order and no longer than its length.
You can turn elements after a certain number of Trues to False with:
mask[mask.cumsum() > 5] = False
# ^ length of the second array
import numpy as np
mask = np.isin([1,2,3,4,5,8,6,1,1],[1,2,3,5,1])
mask[mask.cumsum() > 5] = False
mask
# array([ True, True, True, False, True, False, False, True, False], dtype=bool)

Partial random selection of items from numpy array from items that meet certain condition [duplicate]

This question already has answers here:
Make numpy matrix more sparse
(2 answers)
Closed 5 years ago.
I have a bool array that was created with respect to a double array:
array1 = ... # the double array initialization
array2 = array1 < threshold # threshold is set somewhere else
Assuming my second array looks like this:
# array2 = [True, False, True, True, True, False]
I want to select a percentage of the True items. For example, if I want to randomly select 75% of the True items, the output would be any of these:
# array3 = [True, False, True, True, False, False]
# array3 = [False, False, True, True, True, False]
# array3 = [True, False, False, True, True, False]
The third array contains 3 out of the 4 True items that were found in the second array. How can I achieve this?
So, that is actually just a job of
getting all the indexes of True in your vector -> true_indices
shuffle true_indices
true_indices = true_indices[0:len(true_indices) * 3 // 4]
array3 = [False]*len(array2)
array3[true_indices] = True
Done. All of these "I need to randomly pick a fixed amount from a set" problems usually convert well to a shuffling approach.
Numpy comes with a shuffle function.
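A runnable sketch of those steps with numpy, assuming array2 is the boolean mask and we keep 75% of its True entries (the names and the fraction are illustrative):
import numpy as np

array2 = np.array([True, False, True, True, True, False])

true_indices = np.flatnonzero(array2)              # indexes of the True items
np.random.shuffle(true_indices)                    # numpy's shuffle function
keep = true_indices[:len(true_indices) * 3 // 4]   # keep 75% of them

array3 = np.zeros_like(array2)                     # all False, same shape and dtype as array2
array3[keep] = True
print(array3)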

Numpy indexing in python [duplicate]

This question already has answers here:
Mask out specific values from an array
(3 answers)
Closed 5 years ago.
Here is the deal,
idx_arr = [0,3,5,7];
tgt_arr = [
[0,3,3,5,5,6,6],
[1,1,3,1,1,3,3],
[2,4,6,8,1,2,9]]
I want to make a new array of bool type that marks where the entries of tgt_arr appear in idx_arr. I also tried with sets, but numpy.ndarrays are unhashable types. The new matrix would look like:
final_arr = [
[t,t,t,t,t,f,f],
[f,f,t,f,f,t,t],
[f,f,f,f,f,f,f]]
Thanks in advance.
Using base Python:
[[True if val in idx_arr else False for val in row] for row in tgt_arr]
Result:
[[True, True, True, True, True, False, False],
[False, False, True, False, False, True, True],
[False, False, False, False, False, False, False]]
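For larger inputs, the same mask can be built in one vectorized call with np.isin (the function from the mask question above); a minimal sketch, assuming the inputs are converted to numpy arrays:
import numpy as np

idx_arr = np.array([0, 3, 5, 7])
tgt_arr = np.array([[0, 3, 3, 5, 5, 6, 6],
                    [1, 1, 3, 1, 1, 3, 3],
                    [2, 4, 6, 8, 1, 2, 9]])

# True wherever an element of tgt_arr occurs anywhere in idx_arr
final_arr = np.isin(tgt_arr, idx_arr)
print(final_arr)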

Boolean list operation in python [duplicate]

This question already has answers here:
Python AND operator on two boolean lists - how?
(10 answers)
Closed 6 years ago.
Shouldn't the results be the same?
I do not understand.
[True,False] and [True, True]
Out[1]: [True, True]
[True, True] and [True,False]
Out[2]: [True, False]
No, because that's not how the and operator works in Python. First, it does not apply and to the list items element-wise. Second, and works between two whole objects: if the first one is falsy (evaluated as False 1), it is returned; otherwise the second one is returned. Here is an example:
>>> [] and [False]
[]
>>>
>>> [False] and []
[]
>>> [False] and [True]
[True]
x and y : if x is false, then x, else y
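That rule can be written out as a small helper (the name and_like is illustrative, not part of Python), just to make the behaviour concrete:
def and_like(x, y):
    # Mirrors `x and y`, except that y is always evaluated here
    # (the real operator short-circuits and skips y when x is falsy).
    return x if not x else y

assert and_like([True, False], [True, True]) == ([True, False] and [True, True])
assert and_like([True, True], [True, False]) == ([True, True] and [True, False])
assert and_like([], [False]) == ([] and [False])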
If you want to apply the logical operation element-wise over the two lists, you can use numpy arrays:
>>> import numpy as np
>>> a = np.array([True, False])
>>> b = np.array([True, True])
>>>
>>> np.logical_and(a,b)
array([ True, False], dtype=bool)
>>> np.logical_and(b,a)
array([ True, False], dtype=bool)
1. Since you are dealing with lists here, an empty list is evaluated as False.

Is there a pythonic way to get the beginning and end indexes of clusters of identical values in an iterable? [duplicate]

This question already has an answer here:
numpy: search of the first and last index in an array
(1 answer)
Closed 9 years ago.
The following question can easily be solved with a loop, but I suspect that there may be a more pythonic way of achieving this.
In essence, I have an iterable of booleans that tend to be clustered into groups. Here's an illustrative example:
[True, True, True, True, False, False, False, True, True, True, True, True]
I'd like to pick out the beginning index and end index for each cluster of Trues. Using a loop, this is easy -- each time I see a True, I simply check whether I'm already in a True cluster. If not, I set an in_true_cluster flag and store the index. Once I find a False, I store index - 1 as the end point.
Is there a more pythonic way of doing this? Note that I'm using PANDAS and NumPy as well, so solutions using logical indexing are acceptable.
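For reference, a rough sketch of the loop-based approach described above (the names are illustrative):
def true_clusters(values):
    """Return inclusive (start, end) index pairs for each run of True values."""
    clusters = []
    start = None
    for i, value in enumerate(values):
        if value and start is None:            # entering a True cluster
            start = i
        elif not value and start is not None:  # leaving a True cluster
            clusters.append((start, i - 1))
            start = None
    if start is not None:                      # a cluster that runs to the end
        clusters.append((start, len(values) - 1))
    return clusters

print(true_clusters([True, True, True, True, False, False, False,
                     True, True, True, True, True]))
# [(0, 3), (7, 11)]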
Actually, here's a numpy way, which should be faster than doing it with itertools or a manual loop:
>>> a = np.array([True, True, True, True, False, False, False, True, True, True, True, True])
>>> np.diff(a)
array([False, False, False, True, False, False, True, False, False,
False, False], dtype=bool)
>>> _.nonzero()
(array([3, 6]),)
As you mention in the comments, pandas's groupby would also work.
Timings to convince #poke this is worthwhile:
>>> %%timeit a = np.random.randint(2, size=1000000)
... np.diff(a).nonzero()
...
100 loops, best of 3: 12.2 ms per loop
>>> def cluster_changes(array):
... changes = []
... last = None
... for i, elt in enumerate(array):
... if elt != last:
... last = elt
... changes.append(i)
... return changes
...
>>> %%timeit a = np.random.randint(2, size=1000000)
... cluster_changes(a)
...
1 loops, best of 3: 348 ms per loop
That's a factor of 30 on this array using the one-liner, as compared to the 7-line manual function. (Of course, the data here has many more cluster changes than OP's data, but that's not going to make up for such a big difference.)
How about:
In [25]: l = [True, True, True, True, False, False, False, True, True, True, True, True]
In [26]: d = np.diff(np.array([False] + l + [False], dtype=int))
In [28]: list(zip(np.where(d == 1)[0], np.where(d == -1)[0] - 1))
Out[28]: [(0, 3), (7, 11)]
Here, the two runs are at indices [0; 3] and [7; 11].
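The same padded-diff idea can be wrapped in a reusable helper (the name is illustrative) that returns the inclusive (start, end) pairs directly:
import numpy as np

def true_runs(mask):
    """Return inclusive (start, end) index pairs for each run of True."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    d = np.diff(padded.astype(int))
    starts = np.where(d == 1)[0]
    ends = np.where(d == -1)[0] - 1
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

print(true_runs([True, True, True, True, False, False, False,
                 True, True, True, True, True]))
# [(0, 3), (7, 11)]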
