np.array index slicing between conditions - python

I need to slice an array by index, from where a first condition is true to where a second condition is true. The two conditions are never true at the same time, but one can be true more than once before the other occurs.
Let me try to explain:
array_filter = np.array([3,4,5,6,4,3,2,3,4,5])
array1 = np.array([2,3,4,6,3,3,1,2,3,4])
array2 = np.array([3,5,6,7,5,4,3,3,5,6])
array1_cond = array1 >= array_filter
array2_cond = array2 <= array_filter
index            0 1 2 3 4 5 6 7 8 9
array_filter     3 4 5 6 4 3 2 3 4 5
array1           2 3 4 6 3 3 1 2 3 4
array1_cond            ^   ^   (^ = True)
array2           3 5 6 7 5 4 3 3 5 6
array2_cond      ^             ^
expected_output  2 3 4 | 7 5 4 3 | 2 3 4
                 array1 | array2 | array1
EXPECTED OUTPUT:
expected_output[(array2_cond) : (array1_cond)] = array1[(array2_cond) : (array1_cond)]
expected_output[(array1_cond) : (array2_cond)] = array2[(array1_cond) : (array2_cond)]
expected_output = [ 2, 3, 4, 7, 5, 4, 3, 2, 3, 4 ]
Sorry if the syntax is a little confusing; I don't know how to make it clearer.
How can I perform this?
Is it possible WITHOUT LOOPS?

This works for your example, with a, b in place of array1, array2:
nz = np.flatnonzero(a_cond | b_cond)
lengths = np.diff(nz, append=len(a))
cond = np.repeat(b_cond[nz], lengths)
result = np.where(cond, a, b)
If at the start of the arrays neither condition holds true then elements from b are selected.
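To make the leading case explicit, one option is to treat index 0 as a segment boundary whenever neither condition holds there, so the repeated mask always covers the full array and the leading elements come from b. This is a minimal sketch under that assumption; the helper name `select_between` is hypothetical, and the inputs are assumed to be non-empty 1-D arrays of equal length.

```python
import numpy as np

def select_between(a, b, a_cond, b_cond):
    """Take values from `a` after each b_cond hit and from `b` after each a_cond hit."""
    # Positions where either condition fires; each one starts a new segment.
    nz = np.flatnonzero(a_cond | b_cond)
    # Assumption: if nothing fires at index 0, treat index 0 as a segment start.
    # b_cond[0] is False there, so the leading stretch is taken from `b`.
    if nz.size == 0 or nz[0] != 0:
        nz = np.concatenate(([0], nz))
    # Segment lengths: distance to the next boundary (or to the end of the array).
    lengths = np.diff(nz, append=len(a))
    # Spread "was this boundary a b_cond hit?" over the whole segment.
    cond = np.repeat(b_cond[nz], lengths)
    return np.where(cond, a, b)

array_filter = np.array([3, 4, 5, 6, 4, 3, 2, 3, 4, 5])
array1 = np.array([2, 3, 4, 6, 3, 3, 1, 2, 3, 4])
array2 = np.array([3, 5, 6, 7, 5, 4, 3, 3, 5, 6])
result = select_between(array1, array2,
                        array1 >= array_filter, array2 <= array_filter)
print(result.tolist())  # [2, 3, 4, 7, 5, 4, 3, 2, 3, 4]
```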

Related

How to remove consecutive pairs of opposite numbers from Pandas Dataframe?

How can I remove consecutive pairs of equal numbers with opposite signs from a Pandas DataFrame?
Assuming I have this input DataFrame:
incremental_changes = [2, -2, 2, 1, 4, 5, -5, 7, -6, 6]
df = pd.DataFrame({
'idx': range(len(incremental_changes)),
'incremental_changes': incremental_changes
})
idx incremental_changes
0 0 2
1 1 -2
2 2 2
3 3 1
4 4 4
5 5 5
6 6 -5
7 7 7
8 8 -6
9 9 6
I would like to get the following
idx incremental_changes
0 0 2
3 3 1
4 4 4
7 7 7
Note that the first 2 could either be idx 0 or 2, it doesn't really matter.
Thanks
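You can group consecutive values of equal magnitude and use transform:
import itertools
def remove_duplicates(s):
    ''' Generates booleans that indicate when a pair of ints with
    opposite signs are found.
    '''
    # Walk the group two items at a time: (s[0], s[1]), (s[2], s[3]), ...
    iter_ = iter(s)
    for (a,b) in itertools.zip_longest(iter_, iter_):
        if b is None:
            # Odd item left over at the end of the group: keep it
            yield False
        else:
            # One flag for each member of the pair
            yield a+b == 0
            yield a+b == 0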
>>> mask = df.groupby(df['incremental_changes'].abs().diff().ne(0).cumsum()) \
['incremental_changes'] \
.transform(remove_duplicates)
Then
>>> df[~mask]
idx incremental_changes
2 2 2
3 3 1
4 4 4
7 7 7
Take a rolling sum over 2 rows, then discard a zero that immediately follows another zero so that overlapping pairs are not counted twice:
s = df.incremental_changes.rolling(2).sum()
s = s.mask(s[s==0].groupby(s.ne(0).cumsum()).cumcount()==1)==0
df[~(s | s.shift(-1))]
Out[640]:
idx incremental_changes
2 2 2
3 3 1
4 4 4
7 7 7
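As a cross-check for either approach, the same rule can be written as a plain Python loop. This is only a sketch, not a replacement for the vectorized answers: the helper name `opposite_pair_mask` is hypothetical, and it assumes pairs are matched greedily from left to right.

```python
import pandas as pd

def opposite_pair_mask(values):
    """Return booleans that are True for both members of each cancelled pair."""
    mask = [False] * len(values)
    i = 0
    while i < len(values) - 1:
        if values[i] + values[i + 1] == 0:
            # Consecutive pair sums to zero: drop both and skip past them.
            mask[i] = mask[i + 1] = True
            i += 2
        else:
            i += 1
    return mask

incremental_changes = [2, -2, 2, 1, 4, 5, -5, 7, -6, 6]
df = pd.DataFrame({
    'idx': range(len(incremental_changes)),
    'incremental_changes': incremental_changes
})
mask = pd.Series(opposite_pair_mask(incremental_changes), index=df.index)
kept = df[~mask]
print(kept['idx'].tolist())  # [2, 3, 4, 7]
```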

Rotate a list clockwise one time

How can I rotate a list clockwise one time? I have a temporary solution, but I'm sure there is a better way to do it.
I want to get from this
Index: 0 1 2 3 4 5 6 7 8 9
Count: 0 2 4 4 5 6 6 7 7 7
to this:
Index: 0 1 2 3 4 5 6 7 8 9
Count: 0 0 2 4 4 5 6 6 7 7
And my temporary "solution" is just:
temporary = [0, 2, 4, 4, 5, 6, 6, 7, 7, 7]
test = [None] * len(temporary)
test[0] = temporary[0]
for index in range(1, len(temporary)):
    test[index] = temporary[index - 1]
You might use temporary.pop() to discard the last item and temporary.insert(0, 0) to add 0 to the front.
Alternatively in one line:
temporary = [0] + temporary[:-1]
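Note that the output asked for is a shift that repeats the first element, not a true rotation. A small sketch contrasting the two, assuming the first element should be duplicated rather than a literal 0 inserted (the names `shifted` and `rotated` are just for illustration):

```python
from collections import deque

temporary = [0, 2, 4, 4, 5, 6, 6, 7, 7, 7]

# Shift right by one, repeating the first element (the output asked for above).
shifted = temporary[:1] + temporary[:-1]

# True rotation, for comparison: the last element wraps around to the front.
rotated = deque(temporary)
rotated.rotate(1)
rotated = list(rotated)

print(shifted)  # [0, 0, 2, 4, 4, 5, 6, 6, 7, 7]
print(rotated)  # [7, 0, 2, 4, 4, 5, 6, 6, 7, 7]
```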

Rolling sum on a dynamic window

I am new to python and the last time I coded was in the mid-80's so I appreciate your patient help.
It seems .rolling(window) requires the window to be a fixed integer. I need a rolling window where the window or lookback period is dynamic and given by another column.
In the table below, I seek the Lookbacksum which is the rolling sum of Data as specified by the Lookback column.
d={'Data':[1,1,1,2,3,2,3,2,1,2],
'Lookback':[0,1,2,2,1,3,3,2,3,1],
'LookbackSum':[1,2,3,4,5,8,10,7,8,3]}
df=pd.DataFrame(data=d)
eg:
Data Lookback LookbackSum
0 1 0 1
1 1 1 2
2 1 2 3
3 2 2 4
4 3 1 5
5 2 3 8
6 3 3 10
7 2 2 7
8 1 3 8
9 2 1 3
You can create a custom function for use with df.apply, eg:
def lookback_window(row, values, lookback, method='sum', *args, **kwargs):
    loc = values.index.get_loc(row.name)
    lb = lookback.loc[row.name]
    # Clamp the start at 0 so a lookback longer than the history does not wrap around
    start = max(loc - lb, 0)
    return getattr(values.iloc[start: loc + 1], method)(*args, **kwargs)
Then use it as:
df['new_col'] = df.apply(lookback_window, values=df['Data'], lookback=df['Lookback'], axis=1)
There may be some corner cases but as long as your indices align and are unique - it should fulfil what you're trying to do.
Here is one with a list comprehension: it enumerates the index and value of df['Lookback'], then reverses the Data values up to that index and sums the first (value + 1) of them:
df['LookbackSum'] = [sum(df.loc[:e,'Data'][::-1].to_numpy()[:i+1])
for e,i in enumerate(df['Lookback'])]
print(df)
Data Lookback LookbackSum
0 1 0 1
1 1 1 2
2 1 2 3
3 2 2 4
4 3 1 5
5 2 3 8
6 3 3 10
7 2 2 7
8 1 3 8
9 2 1 3
An exercise in pain, if you want to try an almost fully vectorized approach. Sidenote: I don't think it's worth it here. At all.
Inspired by Divakar's answer here
Given:
import numpy as np
import pandas as pd
d={'Data':[1,1,1,2,3,2,3,2,1,2],
'Lookback':[0,1,2,2,1,3,3,2,3,1],
'LookbackSum':[1,2,3,4,5,8,10,7,8,3]}
df=pd.DataFrame(data=d)
Using the function from Divakar's answer, but slightly modified
from skimage.util.shape import view_as_windows as viewW
def strided_indexing_roll(a, r, fill_value=np.nan):
    # Concatenate with sliced to cover all rolls
    p = np.full((a.shape[0],a.shape[1]-1),fill_value)
    a_ext = np.concatenate((p,a,p),axis=1)
    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = a.shape[1]
    return viewW(a_ext,(1,n))[np.arange(len(r)), -r + (n-1),0]
Now, we just need to prepare a 2d array for the data and independently shift the rows according to our desired lookback values.
arr = df['Data'].to_numpy().reshape(1, -1).repeat(len(df), axis=0)
shifter = np.arange(len(df) - 1, -1, -1) #+ d['Lookback'] - 1
temp = strided_indexing_roll(arr, shifter, fill_value=0)
out = strided_indexing_roll(temp, (len(df) - 1 - df['Lookback'])*-1, 0).sum(-1)
Output:
array([ 1, 2, 3, 4, 5, 8, 10, 7, 8, 3], dtype=int64)
We can then just assign it back to the dataframe as needed and check.
df['out'] = out
#output:
Data Lookback LookbackSum out
0 1 0 1 1
1 1 1 2 2
2 1 2 3 3
3 2 2 4 4
4 3 1 5 5
5 2 3 8 8
6 3 3 10 10
7 2 2 7 7
8 1 3 8 8
9 2 1 3 3
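For a sum specifically, a different technique avoids both the Python-level loop and the strided windows: take prefix (cumulative) sums once and subtract two of them per row. This is a sketch of that swapped-in approach, not a general replacement for the answers above; it assumes a default integer index, non-negative Lookback values, and the names `prefix`, `pos` and `start` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Data': [1, 1, 1, 2, 3, 2, 3, 2, 1, 2],
                   'Lookback': [0, 1, 2, 2, 1, 3, 3, 2, 3, 1]})

data = df['Data'].to_numpy()
lookback = df['Lookback'].to_numpy()

# prefix[k] holds the sum of the first k values, so prefix[0] == 0.
prefix = np.concatenate(([0], np.cumsum(data)))
pos = np.arange(len(data))
# First row of each window, clamped at 0 when the lookback exceeds the history.
start = np.maximum(pos - lookback, 0)
# Sum of data[start : pos + 1] as a difference of two prefix sums.
df['LookbackSum'] = prefix[pos + 1] - prefix[start]

print(df['LookbackSum'].tolist())  # [1, 2, 3, 4, 5, 8, 10, 7, 8, 3]
```

This only works for reductions that can be expressed as differences of running totals (sum, count, and from those the mean); something like a rolling max still needs one of the windowed approaches.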

Shuffle "coupled" elements in python array

Let's say I have this array:
np.arange(9)
[0 1 2 3 4 5 6 7 8]
I would like to shuffle the elements with np.random.shuffle but certain numbers have to be in the original order.
I want that 0, 1, 2 have the original order.
I want that 3, 4, 5 have the original order.
And I want that 6, 7, 8 have the original order.
The number of elements in the array would be multiple of 3.
For example, some possible outputs would be:
[ 3 4 5 0 1 2 6 7 8]
[ 0 1 2 6 7 8 3 4 5]
But this one:
[2 1 0 3 4 5 6 7 8]
Would not be valid because 0, 1, 2 are not in the original order
I think that maybe zip() could be useful here, but I'm not sure.
Short solution using numpy.random.shuffle and numpy.ndarray.flatten functions:
arr = np.arange(9)
arr_reshaped = arr.reshape((3,3)) # reshaping the input array to size 3x3
np.random.shuffle(arr_reshaped)
result = arr_reshaped.flatten()
print(result)
One possible random result:
[3 4 5 0 1 2 6 7 8]
Naive approach:
num_indices = len(array_to_shuffle) // 3 # use normal / in python 2
indices = np.arange(num_indices)
np.random.shuffle(indices)
shuffled_array = np.empty_like(array_to_shuffle)
cur_idx = 0
for idx in indices:
    shuffled_array[cur_idx:cur_idx+3] = array_to_shuffle[idx*3:(idx+1)*3]
    cur_idx += 3
Faster (and cleaner) option:
num_indices = len(array_to_shuffle) // 3 # use normal / in python 2
indices = np.arange(num_indices)
np.random.shuffle(indices)
tmp = array_to_shuffle.reshape([-1,3])
tmp = tmp[indices,:]
shuffled_array = tmp.reshape([-1])
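The same idea can be sketched with NumPy's newer Generator API, which returns a shuffled copy and leaves the input untouched. This assumes a NumPy version that provides `np.random.default_rng` and an array whose length is a multiple of 3; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
arr = np.arange(9)

# View the array as rows of 3, then permute whole rows.
# permutation() shuffles along the first axis only, so each row keeps its order.
result = rng.permutation(arr.reshape(-1, 3)).reshape(-1)
print(result)
```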

range() function is giving me trouble

If I were to type something like this, I would get these values:
print range(1,10)
[1,2,3,4,5,6,7,8,9]
But if I use the same range in a for loop, the output starts at 0 instead. Here is an example of what I mean:
for r in range(1,10):
    for c in range(r):
        print c,
    print ""
The Output is this:
0
0 1
0 1 2
0 1 2 3
0 1 2 3 4
0 1 2 3 4 5
0 1 2 3 4 5 6
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7 8
Why is 0 here? Shouldn't it start at 1 and end at 9?
You are creating a second range() object in your loop. The default start value is 0.
On each iteration you create a loop over range(r), meaning a range from 0 up to but not including r, to produce the output numbers. For range(1) that means you get a list with just [0] in it, for range(2) you get [0, 1], etc.
If you wanted to produce ranges from 1 to r inclusive, just add 1 to the number you actually print:
for r in range(1,10):
    for c in range(r):
        print c + 1,
    print ""
or range from 1 to r + 1:
for r in range(1,10):
    for c in range(1, r + 1):
        print c,
    print ""
Both produce your expected output:
>>> for r in range(1,10):
...     for c in range(r):
...         print c + 1,
...     print ""
...
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6 7
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9
>>> for r in range(1,10):
...     for c in range(1, r + 1):
...         print c,
...     print ""
...
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6 7
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9
If you pass only one argument to range, it is treated as the end value (exclusive), and counting starts from zero.
If you pass two arguments, the first is the start value and the second is the end value (exclusive).
If you pass three arguments, the first is the start value, the second is the end value (exclusive), and the third is the step.
You can confirm this with a few trial runs like this:
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] # Default start value 0
>>> range(5, 10)
[5, 6, 7, 8, 9] # Starts from 5
>>> range(5, 10, 2)
[5, 7, 9] # Starts from 5 and takes every 2nd element
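The snippets in this thread are Python 2. In Python 3, range returns a lazy sequence rather than a list and print is a function, but the start, stop and step rules are the same. A rough Python 3 sketch of the same checks, with variable names chosen only for illustration:

```python
# Python 3: range() is lazy, so wrap it in list() to see the values.
one_arg = list(range(10))            # start defaults to 0
two_args = list(range(5, 10))        # explicit start, stop excluded
three_args = list(range(5, 10, 2))   # step of 2

print(one_arg)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(two_args)    # [5, 6, 7, 8, 9]
print(three_args)  # [5, 7, 9]

# The nested loop from the question, collecting each row instead of printing it.
rows = [list(range(1, r + 1)) for r in range(1, 10)]
print(rows[2])     # [1, 2, 3]
```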
Nope.
for r in range(1,10):
    for c in range(r):
        print c,
    print ""
range(), when given only one argument, returns the numbers from 0 up to the argument, not including the argument:
>>> range(6)
[0, 1, 2, 3, 4, 5]
And so, on the third iteration of your code, this is what happens:
for r in range(1,10): # r is 3
    for c in range(r): # range(3) is [0,1,2]
        print c, #you then print each of the range(3), giving the output you observe
    print ""
https://docs.python.org/2/library/functions.html#range
From the docs:
The arguments must be plain integers. If the step argument is omitted, it defaults to 1. If the start argument is omitted, it defaults to 0.
