Group a numpy array

I have a one-dimensional array A, such that 0 <= A[i] <= 11, and I want to map A to an array B such that
for i in range(len(A)):
    if 0 <= A[i] <= 2: B[i] = 0
    elif 3 <= A[i] <= 5: B[i] = 1
    elif 6 <= A[i] <= 8: B[i] = 2
    elif 9 <= A[i] <= 11: B[i] = 3
How can I implement this efficiently in numpy?

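Use integer (floor) division by 3. Every group spans exactly three consecutive values, so A // 3 gives the group index directly in a single vectorised operation, which is also likely the fastest option here:
import numpy as np
A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
B = A // 3
print(A) # [ 0  1  2  3  4  5  6  7  8  9 10 11]
print(B) # [0 0 0 1 1 1 2 2 2 3 3 3]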

I would divide each A[i] by 3, because the values are grouped three at a time: 0-2 divided by 3 gives 0, 3-5 gives 1, 6-8 gives 2, and so on.
Here is a small schema:
A[i] in 0-2: divided by 3 gives 0, which is what you want in B[i].
A[i] in 3-5: divided by 3 gives 1, and so on. Just make sure to floor the result (for example with //) so that it does not become a float.

The answers provided by others are valid, but I find np.digitize quite elegant. It avoids a Python for loop, which can be quite inefficient for large arrays, and it still works when the groups have different widths. The bins are the lower edges of groups 1, 2 and 3:
import numpy as np
bins = [3, 6, 9]
B = np.digitize(A, bins)

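Something like this might work, using C as a lookup table indexed by the values of A (np.int was removed in NumPy 1.24, so the built-in int is used as the dtype):
C = np.zeros(12, dtype=int)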
C[3:6] = 1
C[6:9] = 2
C[9:12] = 3
B = C[A]
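As a quick sanity check, the three vectorised approaches discussed so far can be compared against the explicit loop from the question. This is only a sketch: the variable names (B_loop, B_div, B_dig, B_lut, lookup) are made up for illustration, and the digitize bins [3, 6, 9] are the lower edges of groups 1 to 3.

```python
import numpy as np

A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

# Reference result from the explicit loop in the question.
B_loop = np.empty_like(A)
for i in range(len(A)):
    if 0 <= A[i] <= 2:
        B_loop[i] = 0
    elif 3 <= A[i] <= 5:
        B_loop[i] = 1
    elif 6 <= A[i] <= 8:
        B_loop[i] = 2
    else:
        B_loop[i] = 3

# 1) Floor division: works because every group is exactly 3 values wide.
B_div = A // 3

# 2) np.digitize with the lower edges of groups 1, 2 and 3.
B_dig = np.digitize(A, [3, 6, 9])

# 3) Lookup table indexed by the values of A.
lookup = np.zeros(12, dtype=int)
lookup[3:6] = 1
lookup[6:9] = 2
lookup[9:12] = 3
B_lut = lookup[A]

print(B_div)
```

Floor division is the shortest, but digitize and the lookup table are the ones that carry over to groups of unequal width.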

If you hope to expand this to a more complex example you can define a function with all your conditions:
def f(a):
    if 0 <= a <= 2:
        return 0
    elif 3 <= a <= 5:
        return 1
    elif 6 <= a <= 8:
        return 2
    elif 9 <= a <= 11:
        return 3
And call it on your array A:
A = np.array([0,1,5,7,8,9,10,10, 11])
B = np.array(list(map(f, A))) # array([0, 0, 1, 2, 2, 3, 3, 3, 3])
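Mapping a Python function over the array runs at Python speed. If the conditions can be written as boolean array expressions, np.select is one way to keep the work vectorised. The sketch below assumes the same four ranges as above; the names conditions and choices are only illustrative, and the default of -1 is an arbitrary marker for values outside 0..11.

```python
import numpy as np

A = np.array([0, 1, 5, 7, 8, 9, 10, 10, 11])

# np.select picks, for each element, the choice belonging to the FIRST
# condition that is True, so the upper bounds can be listed in order.
conditions = [A <= 2, A <= 5, A <= 8, A <= 11]
choices = [0, 1, 2, 3]

# default is used where no condition matches (here: values above 11).
B = np.select(conditions, choices, default=-1)
print(B)
```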

Related

How can I build matrices in numpy

I want to build a matrix in NumPy in which the items add up to each other. So I have tried to build it with the following code:
StartpointRow = int(input("First number of row?:\n"))
EndpointRow = int(input("Last number of row?:\n"))
StepRow = int(input("Which steps should the row have?:\n"))
StartpointCol = int(input("First number of column?:\n"))
EndpointCol = int(input("Last number of column?:\n"))
StepCol = int(input("Which steps should the column have?:\n"))
x = np.array([[i+j for i in range(StartpointCol, EndpointCol, StepCol)]
              for j in range(StartpointRow, EndpointRow, StepRow)])
print(x)
Let's say that, for instance, I enter 1,4,1 and 1,4,1. I want the result to be a matrix like this:
1 2 3 4
2 4 5 6
3 5 6 7
4 6 7 8
and not like this:
2 3 4
3 4 5
4 5 6
Or, if the user types in 1,4,1 and 2,4,1, I want:
0 1 2 3 4
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
and not this:
3 4
4 5
5 6
Could you help me out?

Use np.add.outer. The first row and the first column hold the row and column values as headers, and the rest of the matrix is filled with their pairwise sums:
import numpy as np
def build(r_start, r_stop, r_step, c_start, c_stop, c_step):
    r = np.arange(r_start, r_stop + 1, r_step)
    c = np.arange(c_start, c_stop + 1, c_step)
    if r_start == c_start:
        ret = np.empty((c.size, r.size), int)
        ret[:, 0] = c
        ret[0, :] = r
    else:
        ret = np.empty((c.size + 1, r.size + 1), int)
        ret[0, 0] = 0
        ret[1:, 0] = c
        ret[0, 1:] = r
    np.add.outer(ret[1:, 0], ret[0, 1:], out=ret[1:, 1:])
    return ret
A little simplification:
def build(r_start, r_stop, r_step, c_start, c_stop, c_step):
    r = np.arange(r_start, r_stop + 1, r_step)
    c = np.arange(c_start, c_stop + 1, c_step)
    ne = int(r_start != c_start)  # 1 if an extra header row/column is needed
    ret = np.empty((c.size + ne, r.size + ne), int)
    ret[0, 0] = 0
    ret[ne:, 0] = c
    ret[0, ne:] = r
    np.add.outer(ret[1:, 0], ret[0, 1:], out=ret[1:, 1:])
    return ret
Test:
>>> build(1, 4, 1, 1, 4, 1)
array([[1, 2, 3, 4],
       [2, 4, 5, 6],
       [3, 5, 6, 7],
       [4, 6, 7, 8]])
>>> build(1, 4, 1, 2, 4, 1)
array([[0, 1, 2, 3, 4],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])

I think your test cases are wrong.
As I understand it, each row and each column has a starting number, and each entry of the matrix is the sum of its row value and its column value, generated according to the steps.
If the user types in 1,4,1 and 1,4,1, the result is:
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
And if the user types in 1,4,1 and 2,4,1, the result is:
3 4 5
4 5 6
5 6 7
6 7 8
And my code is:
import numpy as np
StartpointRow = int(input("First number of row?:\n"))
EndpointRow = int(input("Last number of row?:\n"))
StepRow = int(input("Which steps should the row have?:\n"))
StartpointCol = int(input("First number of column?:\n"))
EndpointCol = int(input("Last number of column?:\n"))
StepCol = int(input("Which steps should the column have?:\n"))
x = np.array([[i+j for i in range(StartpointCol, EndpointCol + 1, StepCol)]
              for j in range(StartpointRow, EndpointRow + 1, StepRow)])
print(x)
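The same plain sum table can be produced without nested Python loops by letting NumPy broadcast a column of row values against a row of column values. This is a sketch under the interpretation used in this answer (inclusive endpoints, every entry is row value plus column value); the function name sum_table and its parameter names are made up for illustration.

```python
import numpy as np

def sum_table(row_start, row_stop, row_step, col_start, col_stop, col_step):
    """Return the matrix whose (i, j) entry is rows[i] + cols[j]."""
    # + 1 makes the stop values inclusive, as in the list comprehension.
    rows = np.arange(row_start, row_stop + 1, row_step)
    cols = np.arange(col_start, col_stop + 1, col_step)
    # rows[:, None] has shape (n_rows, 1), cols[None, :] has shape (1, n_cols);
    # broadcasting adds every row value to every column value.
    return rows[:, None] + cols[None, :]

print(sum_table(1, 4, 1, 1, 4, 1))
print(sum_table(1, 4, 1, 2, 4, 1))
```

np.add.outer(rows, cols) gives the same result; broadcasting is just the more general spelling.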

Transforming an array of integers and computing the sum

Suppose we need to transform an array of integers and then compute the sum.
The transformation is the following:
For each integer in the array, subtract the first subsequent integer that is less than or equal to it. If there is no such integer, the value stays unchanged.
For example, the array:
[6, 1, 3, 4, 6, 2]
becomes
[5, 1, 1, 2, 4, 2]
because
6 > 1 so 6 - 1 = 5
nothing <= to 1 so 1 remains 1
3 > 2 so 3 - 2 = 1
4 > 2 so 4 - 2 = 2
6 > 2 so 6 - 2 = 4
nothing <= to 2 so 2 remains 2
so we sum [5, 1, 1, 2, 4, 2] = 15
I already have the answer below but apparently there is a more optimal method. My answer runs in quadratic time complexity (nested for loop) and I can't figure out how to optimize it.
prices = [6, 1, 3, 4, 6, 2]
results = []
counter = 0
num_prices = len(prices)
for each_item in prices:
    flag = True
    counter += 1
    for each_num in range(counter, num_prices):
        if each_item >= prices[each_num] and flag == True:
            cost = each_item - prices[each_num]
            results.append(cost)
            flag = False
    if flag == True:
        results.append(each_item)
print(sum(results))
Can someone figure out how to answer this question faster than quadratic time complexity? I'm pretty sure this can be done only using 1 for loop but I don't know the data structure to use.
EDIT:
I might be mistaken... I just realized I could have added a break statement after flag = False, which would have saved a few unnecessary iterations. I took this question on a quiz and half of the test cases said there was a more optimal method. They could have been referring to the break statement, so maybe there isn't a faster method than a nested for loop.

You can use a stack (implemented as a Python list). The algorithm is linear because each element is pushed onto the stack once and popped at most once. An element is popped when the first later value that is less than or equal to it is found, and that value is added to the amount to subtract.
def adjusted_total(prices):
    stack = []
    total_subtract = i = 0
    n = len(prices)
    while i < n:
        if not stack or stack[-1] < prices[i]:
            # nothing on the stack can be discounted by prices[i]
            stack.append(prices[i])
            i += 1
        else:
            # prices[i] is the first later value <= the top of the stack
            stack.pop()
            total_subtract += prices[i]
    return sum(prices) - total_subtract
print(adjusted_total([6, 1, 3, 4, 6, 2]))
Output:
15
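If the transformed array itself is needed and not only its sum, the same stack idea can store indices instead of values. The following is a sketch, not the answer's original code; the function name transform is hypothetical.

```python
def transform(prices):
    """Subtract from each value the first later value that is <= it."""
    result = list(prices)
    stack = []  # indices that are still waiting for a later value <= their own
    for i, price in enumerate(prices):
        # price settles every waiting index whose value is >= price.
        while stack and prices[stack[-1]] >= price:
            j = stack.pop()
            result[j] = prices[j] - price
        stack.append(i)
    # Indices left on the stack never found a smaller-or-equal value
    # and keep their original value.
    return result

transformed = transform([6, 1, 3, 4, 6, 2])
print(transformed, sum(transformed))
```

Each index is pushed once and popped at most once, so the running time stays linear in the length of the input.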

A simple way to do it with lists, albeit still quadratic:
p = [6, 1, 3, 4, 6, 2]
out = []
for i, val in enumerate(p):
    try:
        # position of the first later value that is <= val
        out.append(val - p[[x <= val for x in p[i+1:]].index(True) + (i+1)])
    except ValueError: # no later value is <= val
        out.append(val)
sum(out) # equals 15
NumPy approach: the mask pp <= val scans the whole remaining slice on every iteration (and np.append copies the output each time), so this is still quadratic, but it is an interesting alternative.
import numpy as np
p = np.array([6, 1, 3, 4, 6, 2])
out = np.array([])
for i, val in enumerate(p):
    pp = p[i+1:]
    try:
        new = val - pp[pp <= val][0]
        out = np.append(out, new)
    except IndexError: # no later value is <= val
        out = np.append(out, p[i])
out.sum() # equals 15

Generate lexicographic series efficiently in Python

I want to generate a lexicographic series of numbers such that for each number the sum of digits is a given constant. It is somewhat similar to 'subset sum problem'. For example if I wish to generate 4-digit numbers with sum = 3 then I have a series like:
[3 0 0 0]
[2 1 0 0]
[2 0 1 0]
[2 0 0 1]
[1 2 0 0] ... and so on.
I was able to do it successfully in Python with the following code:
import numpy as np
M = 4 # No. of digits
N = 3 # Target sum
a = np.zeros((1,M), int)
b = np.zeros((1,M), int)
a[0][0] = N
jj = 0
while a[jj][M-1] != N:
    ii = M-2
    while a[jj][ii] == 0:
        ii = ii-1
    kk = ii
    if kk > 0:
        b[0][0:kk-1] = a[jj][0:kk-1]
    b[0][kk] = a[jj][kk]-1
    b[0][kk+1] = N - sum(b[0][0:kk+1])
    b[0][kk+2:] = 0
    a = np.concatenate((a,b), axis=0)
    jj += 1
for ii in range(0,len(a)):
    print(a[ii])
print(len(a))
I don't think it is a very efficient way (as I am a Python newbie). It works fine for small values of M and N (<10) but is really slow beyond that. I wish to use it for M ~ 100 and N ~ 6. How can I make my code more efficient, or is there a better way to code it?

A very efficient algorithm, adapted from Jörg Arndt's book "Matters Computational"
(Chapter 7.2, Co-lexicographic order for compositions into exactly k parts). Here n is the number of parts and k is the target sum:
n = 4
k = 3
x = [0] * n
x[0] = k
while True:
    print(x)
    v = x[-1]
    if k == v:
        break
    x[-1] = 0
    j = -2
    while x[j] == 0:
        j -= 1
    x[j] -= 1
    x[j+1] = 1 + v
Output:
[3, 0, 0, 0]
[2, 1, 0, 0]
[2, 0, 1, 0]
[2, 0, 0, 1]
[1, 2, 0, 0]
[1, 1, 1, 0]
[1, 1, 0, 1]
[1, 0, 2, 0]
[1, 0, 1, 1]
[1, 0, 0, 2]
[0, 3, 0, 0]
[0, 2, 1, 0]
[0, 2, 0, 1]
[0, 1, 2, 0]
[0, 1, 1, 1]
[0, 1, 0, 2]
[0, 0, 3, 0]
[0, 0, 2, 1]
[0, 0, 1, 2]
[0, 0, 0, 3]
Number of compositions and time in seconds for plain Python (perhaps numpy arrays are faster) for n=100 and k = 2, 3, 4, 5 (2.8 GHz Cel-1840):
k  count      seconds
2  5050       0.040000200271606445
3  171700     0.9900014400482178
4  4421275    20.02204465866089
5  91962520   372.03577995300293
Extrapolating, I expect about 2 hours for generating n=100, k=6.
The same with numpy arrays (x = np.zeros((n,), dtype=int)) gives worse results, but perhaps that is because I don't know how to use them properly:
k  count      seconds
2  5050       0.07999992370605469
3  171700     2.390003204345703
4  4421275    54.74532389640808
Native code (this is Delphi; C/C++ compilers might optimize better) generates n=100, k=6 in 21 seconds:
k  count        seconds
3  171700       0.012
4  4421275      0.125
5  91962520     1.544
6  1609344100   20.748
Cannot go to sleep until all the measurements are done :)
MSVS VC++: 18 seconds! (O2 optimization)
k  count        seconds
5  91962520     1.466
6  1609344100   18.283
So roughly 100 million variants per second.
A lot of time is wasted on checking empty cells (because the fill ratio is small). The speed described by Arndt is reached at higher k/n ratios and is about 300-500 million variants per second:
n=25, k=15: 25140840660 compositions in 60.981 seconds, about 400 million per second

My recommendations:
Rewrite it as a generator utilizing yield, rather than a loop that concatenates a global variable on each iteration.
Keep a running sum instead of calculating the sum of some subset of the array representation of the number.
Operate on a single instance of your working number representation instead of splicing a copy of it to a temporary variable on each iteration.
Note no particular order is implied.
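A minimal sketch of these recommendations, assuming the in-place update rule from the Arndt-based answer above: a generator that mutates one working list and yields a tuple snapshot of it, so nothing is concatenated or re-summed. The function name compositions is made up for illustration.

```python
import math

def compositions(n, k):
    """Yield every tuple of n non-negative integers that sums to k."""
    x = [0] * n          # single working list, updated in place
    x[0] = k
    while True:
        yield tuple(x)   # snapshot, so callers may keep the result
        v = x[-1]
        if v == k:       # everything has moved to the last slot: done
            return
        x[-1] = 0
        j = n - 2
        while x[j] == 0: # find the rightmost non-zero slot before the last
            j -= 1
        x[j] -= 1
        x[j + 1] = 1 + v

for c in compositions(4, 3):
    print(c)

# The number of results is the binomial coefficient C(n + k - 1, k).
print(math.comb(100 + 6 - 1, 6))
```

Because it is a generator, results can be consumed one at a time; for n=100 and k=6 holding all of them in memory at once would be impractical.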

I have another solution using itertools, as follows:
from itertools import product
n = 4 # number of elements
s = 3 # sum of elements
r = range(s + 1) # each element can be anything from 0 to s
result = [p for p in product(r, repeat=n) if sum(p) == s]
print(len(result))
print(result)
I am saying this is better because it took 0.1 secs on my system, while your code with numpy took 0.2 secs.
But for n=100 and s=6 this brute-force approach has to go through all (s+1)^n candidate tuples, so it will not finish in any reasonable time.

I found a solution using itertools as well (Source: https://bugs.python.org/msg144273). Code follows:
import itertools
import operator
def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)
int_part = lambda n, k: (tuple(map(c.count, range(k))) for c in combinations_with_replacement(range(k), n))
for item in int_part(3,4): print(item)

Counting the number of consecutive values that meets a condition (Pandas Dataframe)

So I created this post regarding my problem 2 days ago and got an answer thankfully.
I have data made of 20 rows and 2500 columns. Each column is a unique product and the rows are a time series of measurement results. Therefore each product is measured 20 times and there are 2500 products.
This time I want to know for how many consecutive rows my measurement result can stay above a specific threshold.
In other words, I want to count the number of consecutive values that are above a value, let's say 5.
A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
The values above 5 form two runs (6, 8, 7 and 6, 10), and according to what I defined above, I should get NumofConsFeature = 3 as the result (taking the maximum if more than one run meets the condition).
I thought of filtering using .gt, then getting the indexes and using a loop afterwards to detect the consecutive index numbers, but couldn't make it work.
In a second phase, I'd like to know the index of the first value of my consecutive series. For the above example, that would be 3 (counting from 1).
But I have no idea how to do this one.
Thanks in advance.

Here's another answer using only Pandas functions:
import pandas as pd
A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
a = pd.DataFrame(A, columns = ['foo'])
a['is_large'] = (a.foo > 5)
# crossing increases by one every time is_large changes, so it labels each run
a['crossing'] = (a.is_large != a.is_large.shift()).cumsum()
# count down within each run: the first row of a run holds the run length
a['count'] = a.groupby(['is_large', 'crossing']).cumcount(ascending=False) + 1
a.loc[a.is_large == False, 'count'] = 0
which gives
    foo  is_large  crossing  count
0     1     False         1      0
1     2     False         1      0
2     6      True         2      3
3     8      True         2      2
4     7      True         2      1
5     3     False         3      0
6     2     False         3      0
7     3     False         3      0
8     6      True         4      2
9    10      True         4      1
10    2     False         5      0
11    1     False         5      0
12    0     False         5      0
13    2     False         5      0
From there on you can easily find the maximum and its index.
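That last step could look roughly like the sketch below, which wraps the run-labelling idea in a helper returning the length of the longest run and the zero-based position where it starts. The function name longest_run_above and the (0, -1) return value for "no value above the threshold" are assumptions made for this example.

```python
import pandas as pd

def longest_run_above(values, threshold):
    """Return (length, start index) of the longest run of values > threshold."""
    s = pd.Series(values)
    is_large = s > threshold
    if not is_large.any():
        return 0, -1  # assumed convention when nothing exceeds the threshold
    # A new run starts wherever is_large differs from the previous row;
    # the first row always starts a run because its diff is NaN.
    run_id = is_large.astype(int).diff().ne(0).cumsum()
    # Length of every run of True values, keyed by its run id.
    run_lengths = is_large[is_large].groupby(run_id[is_large]).size()
    best_run = run_lengths.idxmax()  # first run with the maximum length
    start = run_id[run_id == best_run].index[0]
    return int(run_lengths.max()), int(start)

A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
print(longest_run_above(A, 5))
```

For a DataFrame with one product per column, the helper could be applied column by column, for example with df.apply, at the cost of one Python-level call per column.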

There is a simple way to do that.
Let's say your list is: A = [1, 2, 6, 8, 7, 6, 8, 3, 2, 3, 6, 10,6,7,8, 2, 1, 0, 2]
And you want to find how many runs of 5 consecutive values of at least 6 it contains. Here the answer is 2: there are two runs of length 5 whose values are all 6 or more. In Python and pandas we can do that as below:
df = pd.DataFrame({"wanted_row": A})
condition = (df.wanted_row >= 6) & \
            (df.wanted_row.shift(-1) >= 6) & \
            (df.wanted_row.shift(-2) >= 6) & \
            (df.wanted_row.shift(-3) >= 6) & \
            (df.wanted_row.shift(-4) >= 6)
consecutive_count = condition.sum() # number of positions that start such a run

Here's one with maxisland_start_len_mask, which works on all columns at once -
import numpy as np
# https://stackoverflow.com/a/52718782/ #Divakar
def maxisland_start_len_mask(a, fillna_index = -1, fillna_len = 0):
    # a is a boolean array
    pad = np.zeros(a.shape[1],dtype=bool)
    mask = np.vstack((pad, a, pad))
    mask_step = mask[1:] != mask[:-1]
    idx = np.flatnonzero(mask_step.T)
    island_starts = idx[::2]
    island_lens = idx[1::2] - idx[::2]
    n_islands_percol = mask_step.sum(0)//2
    bins = np.repeat(np.arange(a.shape[1]),n_islands_percol)
    scale = island_lens.max()+1
    scaled_idx = np.argsort(scale*bins + island_lens)
    grp_shift_idx = np.r_[0,n_islands_percol.cumsum()]
    max_island_starts = island_starts[scaled_idx[grp_shift_idx[1:]-1]]
    max_island_percol_start = max_island_starts%(a.shape[0]+1)
    valid = n_islands_percol!=0
    cut_idx = grp_shift_idx[:-1][valid]
    max_island_percol_len = np.maximum.reduceat(island_lens, cut_idx)
    out_len = np.full(a.shape[1], fillna_len, dtype=int)
    out_len[valid] = max_island_percol_len
    out_index = np.where(valid,max_island_percol_start,fillna_index)
    return out_index, out_len
def maxisland_start_len(a, trigger_val, comp_func=np.greater):
    # a is 2D array as the data
    mask = comp_func(a,trigger_val)
    return maxisland_start_len_mask(mask, fillna_index = -1, fillna_len = 0)
Sample run -
In [169]: a
Out[169]:
array([[ 1,  0,  3],
       [ 2,  7,  3],
       [ 6,  8,  4],
       [ 8,  6,  8],
       [ 7,  1,  6],
       [ 3,  7,  8],
       [ 2,  5,  8],
       [ 3,  3,  0],
       [ 6,  5,  0],
       [10,  3,  8],
       [ 2,  3,  3],
       [ 1,  7,  0],
       [ 0,  0,  4],
       [ 2,  3,  2]])
# Per column results
In [170]: row_index, length = maxisland_start_len(a, 5)
In [172]: row_index
Out[172]: array([2, 1, 3])
In [173]: length
Out[173]: array([3, 3, 4])

You can apply diff() on your Series, and then count the consecutive entries where the difference is 1 and the actual value is above your cutoff. Note that this reads "consecutive" as values that increase by exactly 1 from one row to the next, which is why the example data below contains 6, 7, 8. The largest count is the maximum number of consecutive values.
First compute diff():
df = pd.DataFrame({"a":[1, 2, 6, 7, 8, 3, 2, 3, 6, 10, 2, 1, 0, 2]})
df['b'] = df.a.diff()
df
     a    b
0    1  NaN
1    2  1.0
2    6  4.0
3    7  1.0
4    8  1.0
5    3 -5.0
6    2 -1.0
7    3  1.0
8    6  3.0
9   10  4.0
10   2 -8.0
11   1 -1.0
12   0 -1.0
13   2  2.0
Now count consecutive sequences:
above = 5
n_consec = 1
max_n_consec = 1
for a, b in df.values[1:]:
    if a > above and b == 1:
        n_consec += 1
    else: # check for new max, then start again from 1
        max_n_consec = max(n_consec, max_n_consec)
        n_consec = 1
max_n_consec = max(n_consec, max_n_consec) # in case the longest run ends on the last row
max_n_consec
3

Here's how I did it using numpy:
import pandas as pd
import numpy as np
df = pd.DataFrame({"a":[1, 2, 6, 7, 8, 3, 2, 3, 6, 10, 2, 1, 0, 2]})
consecutive_steps = 2
marginal_price = 5
assertions = [(df.loc[:, "a"].shift(-i) < marginal_price) for i in range(consecutive_steps)]
condition = np.all(assertions, axis=0)
consecutive_count = df.loc[condition, :].count()
print(consecutive_count)
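which yields 6: the number of positions that start a run of at least consecutive_steps values below marginal_price. Use > instead of < to count runs above the threshold.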

Decimal expansion based on slot length summation

I'm trying to create an algorithm that produces a decimal number in a certain way.
a) I have an initial number, say i = 2.
b) Then I have an incremental method that produces the following numbers, say f(n) { n * 2 }.
c) Then I have a slot length for the digits, say l = 2, which adds leading zeros to small numbers and limits the length of longer numbers. 2 becomes 02, 64 stays 64, but 512 becomes (5)12, where the 5 is carried back into the previous slot.
d) The maximum number of slots is the fourth parameter, m = 10.
e) Finally, I want to compute the value by summing up the digits in the slots and using the result as the decimal part of 0.
So with the given example:
i=2
f(n)=n*2
l=2
m=10
the outcome should be produced in this manner:
step 1)
02 04 08 16 32 64 128 256 512 1024
step 2)
          02 04 08 16 32 64
                          1 28
                             2 56
                                5 12
                                  10 24
->
slot:      1  2  3  4  5  6  7  8  9 10
computed: 02 04 08 16 32 65 30 61 22 24
step 3)
I have a number: 02040816326530612224, or 0.02040816326530612224 as stated in part e).
Note that if the maximum number of slots is bigger in this example, the numbers in slots 9 and 10 will change. I also want part b) to be a function, so that I can swap in another one such as fib(nx) {n1+n2}.
I prefer Python as the language for the algorithm, but anything that is easy to translate to Python is acceptable.
ADDED
This is a function I have managed to create so far:
# l = slot length, doesn't work with number > 2...
def comp(l = 2):
    a = []
    # how to pass a function, that produces this list?
    b = [[0, 2], [0, 4], [0, 8], [1, 6], [3, 2], [6, 4], [1, 2, 8], [2, 5, 6], [5, 1, 2], [1, 0, 2, 4], [2, 0, 4, 8]]
    r = 0
    # main algo
    for bb in b:
        ll = len(bb)
        for i in range(0, ll):
            x = r + i - ll + l
            # is there a better way to do following try except part?
            try:
                a[x] += bb[i]
            except IndexError:
                a.append(bb[i])
            # moving bits backward, any better way to do this?
            s = a[x] - 9
            d = 0
            while s > 0:
                d += 1
                a[x] -= 10
                a[x-d] += 1
                s = a[x-d] - 9
        r += l
    return '0.' + ''.join(map(str, a))

def doub(n):
    return pow(2, n)
def fibo(n):
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a
def fixed_slot_numbers(f, l, m):
    # digits of f(1), f(2), ..., each left-padded with zeros to length l
    b = []
    for n in range(1, m):
        a = [int(c) for c in str(f(n))]
        while len(a) < l:
            a.insert(0, 0)
        b.append(a)
    return b
def algo(function, fixed_slot_length = 2, max_slots = 12):
    a = []
    slot_numbers = fixed_slot_numbers(function, fixed_slot_length, max_slots)
    for r, b in enumerate(slot_numbers):
        r *= fixed_slot_length
        slot_length = len(b)
        for bidx in range(0, slot_length):
            aidx = r + bidx - slot_length + fixed_slot_length
            try:
                a[aidx] += b[bidx]
            except IndexError:
                a.append(b[bidx])
            # carry any overflow into the digits to the left
            d = 0
            while a[aidx-d] > 9:
                a[aidx-d] -= 10
                d += 1
                a[aidx-d] += 1
    return '0.%s' % ''.join(map(str, a))
# algo(doub, 2, 28) -> 0.020408163265306122448979591836734693877551020405424128, approximately 1/49
# algo(fibo, 1, 28) -> 0.112359550561797752808950848, approximately 10/89
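Because Python integers have arbitrary precision, the digit-by-digit carrying can also be left to ordinary integer arithmetic: shifting the running total left by one slot and adding the next value performs all carries automatically. This is a sketch with a hypothetical function name (slot_expansion); unlike algo above, it treats max_slots as inclusive, and it assumes the first value f(1) fits into one slot.

```python
def slot_expansion(f, slot_length, max_slots):
    """Overlay f(1), ..., f(max_slots) on slots of slot_length digits."""
    total = 0
    for n in range(1, max_slots + 1):
        # Shift everything one slot to the left, then add the next value;
        # digits that overflow the slot spill into the previous slots.
        total = total * 10 ** slot_length + f(n)
    # Restore leading zeros lost by the integer representation.
    digits = str(total).zfill(slot_length * max_slots)
    return "0." + digits

print(slot_expansion(lambda n: 2 ** n, 2, 10))
```

Any generating function can be passed in the same way, for example a Fibonacci function with slot_length=1.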
