I want to build a matrix in NumPy whose entries are sums of a row value and a column value. I tried to build it with the following code:
StartpointRow = int(input("First number of row?:\n"))
EndpointRow = int(input("Last number of row?:\n"))
StepRow = int(input("Which steps should the row have?:\n"))
StartpointCol = int(input("First number of column?:\n"))
EndpointCol = int(input("Last number of column?:\n"))
StepCol = int(input("Which steps should the column have?:\n"))
x = np.array([[i+j for i in range(StartpointCol, EndpointCol , StepCol)]
for j in range(StartpointRow, EndpointRow , StepRow)])
print(x)
Let's say that, for instance, I enter 1, 4, 1 and 1, 4, 1. I want the result to be a matrix like this:
1 2 3 4
2 4 5 6
3 5 6 7
4 6 7 8
Not like that:
2 3 4
3 4 5
4 5 6
Or, if the user types in 1, 4, 1 and 2, 4, 1:
0 1 2 3 4
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
not like that:
3 4
4 5
5 6
Could you help me out?
Use np.add.outer:
def build(r_start, r_stop, r_step, c_start, c_stop, c_step):
    r = np.arange(r_start, r_stop + 1, r_step)
    c = np.arange(c_start, c_stop + 1, c_step)
    if r_start == c_start:
        ret = np.empty((c.size, r.size), int)
        ret[:, 0] = c
        ret[0, :] = r
    else:
        ret = np.empty((c.size + 1, r.size + 1), int)
        ret[0, 0] = 0
        ret[1:, 0] = c
        ret[0, 1:] = r
    np.add.outer(ret[1:, 0], ret[0, 1:], out=ret[1:, 1:])
    return ret
A little simplification:
def build(r_start, r_stop, r_step, c_start, c_stop, c_step):
    r = np.arange(r_start, r_stop + 1, r_step)
    c = np.arange(c_start, c_stop + 1, c_step)
    ne = int(r_start != c_start)
    ret = np.empty((c.size + ne, r.size + ne), int)
    ret[0, 0] = 0
    ret[ne:, 0] = c
    ret[0, ne:] = r
    np.add.outer(ret[1:, 0], ret[0, 1:], out=ret[1:, 1:])
    return ret
Test:
>>> build(1, 4, 1, 1, 4, 1)
array([[1, 2, 3, 4],
       [2, 4, 5, 6],
       [3, 5, 6, 7],
       [4, 6, 7, 8]])
>>> build(1, 4, 1, 2, 4, 1)
array([[0, 1, 2, 3, 4],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])
I think your test cases are wrong.
As I understand it, the rows and the columns each have a start value, an end value and a step, and each matrix entry is the sum of one row value and one column value.
If the user types in 1, 4, 1 and 1, 4, 1, the result is:
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
And if the user types in 1, 4, 1 and 2, 4, 1, the result is:
3 4 5
4 5 6
5 6 7
6 7 8
And my code is:
import numpy as np
StartpointRow = int(input("First number of row?:\n"))
EndpointRow = int(input("Last number of row?:\n"))
StepRow = int(input("Which steps should the row have?:\n"))
StartpointCol = int(input("First number of column?:\n"))
EndpointCol = int(input("Last number of column?:\n"))
StepCol = int(input("Which steps should the column have?:\n"))
x = np.array([[i+j for i in range(StartpointCol, EndpointCol + 1 , StepCol)]
for j in range(StartpointRow, EndpointRow + 1, StepRow)])
print(x)
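The same plain sum table can also be built without a Python-level loop. This is only a minimal sketch, assuming inclusive end points and positive steps as in the corrected range calls above; the function name sum_table is hypothetical and not part of either answer.

```python
import numpy as np

def sum_table(r_start, r_stop, r_step, c_start, c_stop, c_step):
    # Hypothetical helper: end points are inclusive, hence the "+ 1"
    # (this assumes positive steps).
    rows = np.arange(r_start, r_stop + 1, r_step)
    cols = np.arange(c_start, c_stop + 1, c_step)
    # Entry [j, i] is rows[j] + cols[i].
    return np.add.outer(rows, cols)

print(sum_table(1, 4, 1, 2, 4, 1))
```

np.add.outer returns a new array of shape (len(rows), len(cols)), so no header row or column is included here.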
Related
How can I make a multiplication chart with nested lists and for loops? I need to multiply every number in the first list by every number in the second list.
chart = [
[],
[],
]
for i in range(1,len(chart)+1):
    for j in range(i,i*len(chart)+1):
        print(f'{i} * {j} = {i*j}')
In Python, positions of elements in a list start from 0.
The chart list contains 2 lists:
chart[0] = [1, 2, 3, 4, 5]
chart[1] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
What you want to do is access the elements of the first list and multiply by elements of the second list.
for i in range(len(chart[0])):  # range(5) => 0, 1, 2, 3, 4
    for j in range(len(chart[1])):  # range(10) => 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
        print(f'{chart[0][i]} * {chart[1][j]} = {chart[0][i] * chart[1][j]}')
itertools.product will produce the desired output, creating 2-tuples consisting of one element from the first list and one element from the second list:
import itertools
chart = [
[1,2,3,4,5],
[1,2,3,4,5,6,7,8,9,10],
]
for i, j in itertools.product(*chart):
    print(f'{i} * {j} = {i*j}')
Use a cartesian product:
chart = [[1,2,3,4,5],[1,2,3,4,5,6,7,8,9,10]]
>>> print('\n'.join([f'{i} * {j} = {i*j}' for i in chart[0] for j in chart[1]]))
1 * 1 = 1
1 * 2 = 2
1 * 3 = 3
1 * 4 = 4
...
4 * 10 = 40
5 * 1 = 5
5 * 2 = 10
5 * 3 = 15
5 * 4 = 20
5 * 5 = 25
5 * 6 = 30
5 * 7 = 35
5 * 8 = 40
5 * 9 = 45
5 * 10 = 50
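If NumPy is available, the whole chart can also be computed in one step and printed afterwards. This is a sketch only, assuming the two lists from the answers above; the variable name table is made up for illustration.

```python
import numpy as np

chart = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]

# table[i, j] holds chart[0][i] * chart[1][j]
table = np.multiply.outer(chart[0], chart[1])

for i, row in zip(chart[0], table):
    for j, product in zip(chart[1], row):
        print(f'{i} * {j} = {product}')
```

For two short lists the pure-Python versions are just as good; the array form mainly helps when the products are needed for further computation.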
I have a one-dimensional array A, such that 0 <= A[i] <= 11, and I want to map A to an array B such that
for i in range(len(A)):
    if 0 <= A[i] <= 2: B[i] = 0
    elif 3 <= A[i] <= 5: B[i] = 1
    elif 6 <= A[i] <= 8: B[i] = 2
    elif 9 <= A[i] <= 11: B[i] = 3
How can I implement this efficiently in NumPy?
Use floor division, A // 3; this is also the most performant solution:
A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
B = A // 3
print(A) # [0 1 2 3 4 5 6 7 8 9 10 11]
print(B) # [0 0 0 1 1 1 2 2 2 3 3 3]
I would divide each A[i] by 3, because you are grouping the values three at a time: 0-2 divided by 3 gives 0, 3-5 gives 1, 6-8 gives 2, and so on.
I built a little schema here:
A[i] --> 0-2, divided by 3 = 0, and what you want in B[i] is 0, so that works.
A[i] --> 3-5, divided by 3 = 1, and so on. Just floor the result (for example with //) so that it does not become a float.
The answers provided by others are valid; however, I find np.digitize quite elegant, and it lets you avoid a Python for loop, which can be quite inefficient for large arrays:
import numpy as np
bins = [3, 6, 9]  # lower edges of groups 1, 2 and 3
B = np.digitize(A, bins)
Something like this might work:
C = np.zeros(12, dtype=int)  # np.int was removed in NumPy 1.24
C[3:6] = 1
C[6:9] = 2
C[9:12] = 3
B = C[A]
If you hope to expand this to a more complex example you can define a function with all your conditions:
def f(a):
    if 0 <= a and a <= 2:
        return 0
    elif 3 <= a and a <= 5:
        return 1
    elif 6 <= a and a <= 8:
        return 2
    elif 9 <= a and a <= 11:
        return 3
And call it on your array A:
A = np.array([0,1,5,7,8,9,10,10, 11])
B = np.array(list(map(f, A))) # array([0, 0, 1, 2, 2, 3, 3, 3, 3])
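To compare the vectorized approaches from this thread side by side, here is a small sketch that runs them on every allowed value. The bin edges 3, 6, 9 passed to np.digitize are my assumption for reproducing the mapping in the question, and the variable names are made up.

```python
import numpy as np

A = np.arange(12)  # every allowed value, 0..11

by_division = A // 3                      # floor division
by_digitize = np.digitize(A, [3, 6, 9])   # assumed bin edges 3, 6, 9
lookup = np.repeat(np.arange(4), 3)       # [0 0 0 1 1 1 2 2 2 3 3 3]
by_lookup = lookup[A]                     # fancy indexing into a lookup table

print(by_division)
print(by_digitize)
print(by_lookup)
```

Floor division only works because the groups are equally wide; np.digitize and the lookup table also handle groups of unequal width.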
So I created this post regarding my problem 2 days ago and got an answer thankfully.
I have a data made of 20 rows and 2500 columns. Each column is a unique product and rows are time series, results of measurements. Therefore each product is measured 20 times and there are 2500 products.
This time I want to know for how many consecutive rows my measurement result can stay above a specific threshold.
In other words: I want to count the number of consecutive values that are above a given value, let's say 5.
A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
The values above 5 form two runs here (6, 8, 7 and 6, 10), and according to what I defined above, I should get NumofConsFeature = 3 as the result. (Taking the maximum if more than one run meets the condition.)
I thought of filtering using .gt, then getting the indexes and using a loop afterwards in order to detect the consecutive index numbers but couldn't make it work.
In 2nd phase, I'd like to know the index of the first value of my consecutive series. For the above example, that would be 3.
But I have no idea how to do this one.
Thanks in advance.
Here's another answer using only Pandas functions:
A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
a = pd.DataFrame(A, columns = ['foo'])
a['is_large'] = (a.foo > 5)
a['crossing'] = (a.is_large != a.is_large.shift()).cumsum()
a['count'] = a.groupby(['is_large', 'crossing']).cumcount(ascending=False) + 1
a.loc[a.is_large == False, 'count'] = 0
which gives
    foo  is_large  crossing  count
0     1     False         1      0
1     2     False         1      0
2     6      True         2      3
3     8      True         2      2
4     7      True         2      1
5     3     False         3      0
6     2     False         3      0
7     3     False         3      0
8     6      True         4      2
9    10      True         4      1
10    2     False         5      0
11    1     False         5      0
12    0     False         5      0
13    2     False         5      0
From there on you can easily find the maximum and its index.
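The last step could look like the sketch below, which rebuilds the same frame and reads off the longest run and where it starts. The names longest and start are made up for illustration.

```python
import pandas as pd

A = [1, 2, 6, 8, 7, 3, 2, 3, 6, 10, 2, 1, 0, 2]
a = pd.DataFrame(A, columns=['foo'])
a['is_large'] = a.foo > 5
a['crossing'] = (a.is_large != a.is_large.shift()).cumsum()
a['count'] = a.groupby(['is_large', 'crossing']).cumcount(ascending=False) + 1
a.loc[~a.is_large, 'count'] = 0

# The first row of each run carries the run length, so the maximum of
# 'count' is the longest run and idxmax() is the row where it starts.
longest = a['count'].max()
start = a['count'].idxmax()
print(longest, start)
```

Note that start is a 0-based row label (2 for this data), so add 1 if you count positions from 1 as the question does.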
There is a simple way to do that.
Let's say your list is A = [1, 2, 6, 8, 7, 6, 8, 3, 2, 3, 6, 10, 6, 7, 8, 2, 1, 0, 2]
and you want to find how many runs of 5 consecutive values are all at least 6. Here the answer is 2: there are two runs of length 5 whose values are all 6 or more. In Python with pandas this can be done as follows:
df = pd.DataFrame({'wanted_row': A})
# True at each position that starts 5 consecutive values >= 6
condition = (df.wanted_row >= 6) & \
            (df.wanted_row.shift(-1) >= 6) & \
            (df.wanted_row.shift(-2) >= 6) & \
            (df.wanted_row.shift(-3) >= 6) & \
            (df.wanted_row.shift(-4) >= 6)
consecutive_count = int(condition.sum())
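The chain of shifted comparisons can be generalized to any run length with a rolling window. This is a sketch under my own assumptions: values count as qualifying when they are at least 6 (which is what gives 2 for this list), and the names window, threshold and hits are made up.

```python
import pandas as pd

A = [1, 2, 6, 8, 7, 6, 8, 3, 2, 3, 6, 10, 6, 7, 8, 2, 1, 0, 2]
s = pd.Series(A)

window = 5      # required run length (taken from the example)
threshold = 6   # assumption: values must be at least this large

# hits[i] is True when the `window` values ending at position i all qualify.
hits = (s >= threshold).astype(int).rolling(window).sum() == window
print(int(hits.sum()))
```

Like the shifted version, this counts windows rather than maximal runs, so a run of 6 qualifying values would be counted twice.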
Here's one with maxisland_start_len_mask -
# https://stackoverflow.com/a/52718782/ #Divakar
def maxisland_start_len_mask(a, fillna_index = -1, fillna_len = 0):
    # a is a boolean array
    pad = np.zeros(a.shape[1],dtype=bool)
    mask = np.vstack((pad, a, pad))
    mask_step = mask[1:] != mask[:-1]
    idx = np.flatnonzero(mask_step.T)
    island_starts = idx[::2]
    island_lens = idx[1::2] - idx[::2]
    n_islands_percol = mask_step.sum(0)//2
    bins = np.repeat(np.arange(a.shape[1]),n_islands_percol)
    scale = island_lens.max()+1
    scaled_idx = np.argsort(scale*bins + island_lens)
    grp_shift_idx = np.r_[0,n_islands_percol.cumsum()]
    max_island_starts = island_starts[scaled_idx[grp_shift_idx[1:]-1]]
    max_island_percol_start = max_island_starts%(a.shape[0]+1)
    valid = n_islands_percol!=0
    cut_idx = grp_shift_idx[:-1][valid]
    max_island_percol_len = np.maximum.reduceat(island_lens, cut_idx)
    out_len = np.full(a.shape[1], fillna_len, dtype=int)
    out_len[valid] = max_island_percol_len
    out_index = np.where(valid,max_island_percol_start,fillna_index)
    return out_index, out_len

def maxisland_start_len(a, trigger_val, comp_func=np.greater):
    # a is 2D array as the data
    mask = comp_func(a,trigger_val)
    return maxisland_start_len_mask(mask, fillna_index = -1, fillna_len = 0)
Sample run -
In [169]: a
Out[169]:
array([[ 1,  0,  3],
       [ 2,  7,  3],
       [ 6,  8,  4],
       [ 8,  6,  8],
       [ 7,  1,  6],
       [ 3,  7,  8],
       [ 2,  5,  8],
       [ 3,  3,  0],
       [ 6,  5,  0],
       [10,  3,  8],
       [ 2,  3,  3],
       [ 1,  7,  0],
       [ 0,  0,  4],
       [ 2,  3,  2]])
# Per column results
In [170]: row_index, length = maxisland_start_len(a, 5)
In [172]: row_index
Out[172]: array([2, 1, 3])
In [173]: length
Out[173]: array([3, 3, 4])
You can apply diff() on your Series, and then just count the number of consecutive entries where the difference is 1 and the actual value is above your cutoff. The largest count is the maximum number of consecutive values.
First compute diff():
df = pd.DataFrame({"a":[1, 2, 6, 7, 8, 3, 2, 3, 6, 10, 2, 1, 0, 2]})
df['b'] = df.a.diff()
df
     a    b
0    1  NaN
1    2  1.0
2    6  4.0
3    7  1.0
4    8  1.0
5    3 -5.0
6    2 -1.0
7    3  1.0
8    6  3.0
9   10  4.0
10   2 -8.0
11   1 -1.0
12   0 -1.0
13   2  2.0
Now count consecutive sequences:
above = 5
n_consec = 1
max_n_consec = 1
for a, b in df.values[1:]:
    if (a > above) & (b == 1):
        n_consec += 1
    else: # check for new max, then start again from 1
        max_n_consec = max(n_consec, max_n_consec)
        n_consec = 1
max_n_consec = max(n_consec, max_n_consec)  # covers a run that reaches the last row
max_n_consec
3
Here's how I did it using numpy:
import pandas as pd
import numpy as np
df = pd.DataFrame({"a":[1, 2, 6, 7, 8, 3, 2, 3, 6, 10, 2, 1, 0, 2]})
consecutive_steps = 2
marginal_price = 5
assertions = [(df.loc[:, "a"].shift(-i) < marginal_price) for i in range(consecutive_steps)]
condition = np.all(assertions, axis=0)
consecutive_count = df.loc[condition, :].count()
print(consecutive_count)
which yields 6.
import pandas as pd, numpy as np
ltlist = [1, 2]
org = pd.DataFrame({'ID': [1, 3, 4, 5, 6, 7], 'ID2': [3, 4, 5, 6, 7, 2]})
ltlist_set = set(ltlist)
org['LT'] = np.where(org['ID'].isin(ltlist_set), org['ID'], 0)
I also need to check the ID2 column and write that ID into LT, unless the row already has an ID from the ID column.
Desired output:
ID ID2 LT
1 3 1
3 4 0
4 5 0
5 6 0
6 7 0
7 2 2
Thanks!
Option 1
You can nest numpy.where statements:
org['LT'] = np.where(org['ID'].isin(ltlist_set), 1,
np.where(org['ID2'].isin(ltlist_set), 2, 0))
Option 2
Alternatively, you can use pd.DataFrame.loc sequentially:
org['LT'] = 0 # default value
org.loc[org['ID2'].isin(ltlist_set), 'LT'] = 2
org.loc[org['ID'].isin(ltlist_set), 'LT'] = 1
Option 3
A third option is to use numpy.select:
conditions = [org['ID'].isin(ltlist_set), org['ID2'].isin(ltlist_set)]
values = [1, 2]
org['LT'] = np.select(conditions, values, 0) # 0 is default value
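The options above write the constants 1 and 2, which happens to match the sample data. If the goal is to write whichever ID actually matched, one possible variation of Option 3 is to pass the columns themselves as the choices. This is only a sketch of that reading of the question, with org built as a DataFrame.

```python
import numpy as np
import pandas as pd

ltlist_set = {1, 2}
org = pd.DataFrame({'ID': [1, 3, 4, 5, 6, 7], 'ID2': [3, 4, 5, 6, 7, 2]})

# Conditions are checked in order, so a match in ID wins over one in ID2.
conditions = [org['ID'].isin(ltlist_set), org['ID2'].isin(ltlist_set)]
choices = [org['ID'], org['ID2']]  # write the matching ID itself
org['LT'] = np.select(conditions, choices, default=0)
print(org)
```

For the sample data this gives the same LT column as the desired output in the question.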
I have json records in the file json_data. I used pd.DataFrame(json_data) to make a new table, pd_json_data, using these records.
pandas table pd_json_data
I want to manipulate pd_json_data to return a new table with primary key (url,hour), and then a column updated that contains a boolean value.
hour is based on the number of checks. For example, if number of checks contains 378 at row 0, the new table should have the numbers 1 through 378 in hour, with True in updated if the number in hour is a number in positive checks.
Any ideas for how I should approach this?
Updated Answer
Make fake data
df = pd.DataFrame({'number of checks': [5, 10, 300, 8],
'positive checks':[[1,3,10], [10,11], [9,200], [1,8,7]],
'url': ['a', 'b', 'c', 'd']})
Output
number of checks positive checks url
0 5 [1, 3, 10] a
1 10 [10, 11] b
2 300 [9, 200] c
3 8 [1, 8, 7] d
Iterate and create new dataframes, then concatenate
dfs = []
for i, row in df.iterrows():
    hour = np.arange(1, row['number of checks'] + 1)
    df_cur = pd.DataFrame({'hour' : hour,
                           'url': row['url'],
                           # np.isin replaces np.in1d, which is deprecated in NumPy 2.0
                           'updated': np.isin(hour, row['positive checks'])})
    dfs.append(df_cur)
df_final = pd.concat(dfs)
hour updated url
0 1 True a
1 2 False a
2 3 True a
3 4 False a
4 5 False a
0 1 False b
1 2 False b
2 3 False b
3 4 False b
4 5 False b
5 6 False b
6 7 False b
7 8 False b
8 9 False b
9 10 True b
0 1 False c
1 2 False c
Old answer
Now build new dataframe
df1 = df[['url']].copy()
df1['hour'] = df['number of checks'].map(lambda x: list(range(1, x + 1)))
df1['updated'] = df.apply(lambda x: x['number of checks'] in x['positive checks'], axis=1)
Output
url hour updated
0 a [1, 2, 3, 4, 5] False
1 b [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] True
2 c [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,... False
3 d [1, 2, 3, 4, 5, 6, 7, 8] True
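The list-valued hour column can be turned into one row per (url, hour) with DataFrame.explode, which gets close to the table asked for in the question. This is a sketch only, assuming pandas 0.25 or newer and the fake data from this answer; the names expanded and result are made up.

```python
import pandas as pd

df = pd.DataFrame({'number of checks': [5, 10, 300, 8],
                   'positive checks': [[1, 3, 10], [10, 11], [9, 200], [1, 8, 7]],
                   'url': ['a', 'b', 'c', 'd']})

expanded = df[['url', 'positive checks']].copy()
expanded['hour'] = df['number of checks'].map(lambda n: list(range(1, n + 1)))

# One row per (url, hour); the 'positive checks' list is repeated on each row.
expanded = expanded.explode('hour').reset_index(drop=True)
expanded['updated'] = [h in pos for h, pos in
                       zip(expanded['hour'], expanded['positive checks'])]

result = expanded[['url', 'hour', 'updated']]
print(result.head(10))
```

The membership test still runs in a Python loop, so this is about readability more than speed.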