Python: Group output of Column based on a condition

Python: Group output of Column based on a condition - python

#code source
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=50,
n_features=6,
n_informative=3,
n_classes=2,
random_state=10,
shuffle=True)
# Creating a dataFrame
df = pd.DataFrame({'Feature 1':X[:,0],
'Feature 2':X[:,1],
'Feature 3':X[:,2],
'Feature 4':X[:,3],
'Feature 5':X[:,4],
'Feature 6':X[:,5],
'Class':y})
values = [i for i,x in enumerate(df['Class']) if x == 0]
print(values)
The output is
[5, 6, 9, 11, 13, 14, 17, 18, 20, 21, 23, 24, 25, 26, 27, 31, 32, 34,
41, 42, 44, 45, 46, 47, 49]
I am trying to group the above output based on the condition that numbers come in concurrent value . Such as the output should be:
Group 1: 5,6
Group 2: 9
Group 3: 11
Group 4: 13,14
..
..
Group n: 23,24,25,26,27
I am grouping them to have an understanding of the gaps in the column, instead of having a slab of values following each other in a list.

I think need Series, get differences by diff, compare by gt and last create groups by cumsum to new Series which is used as by argument of groupby:
values = [5, 6, 9, 11, 13, 14, 17, 18, 20, 21, 23,
24, 25, 26, 27, 31, 32, 34, 41, 42, 44, 45, 46, 47, 49]
s = pd.Series(values)
s1 = s.groupby(s.diff().gt(1).cumsum() + 1).apply(lambda x: ','.join(x.astype(str)))
print (s1)
1 5,6
2 9
3 11
4 13,14
5 17,18
6 20,21
7 23,24,25,26,27
8 31,32
9 34
10 41,42
11 44,45,46,47
12 49
dtype: object

Related

Matlab to Python - extracting lower subdiagonal triangle, why different order?

I am translating code from MATLAB to Python. I need to extract the lower subdiagonal values of a matrix. My attempt in python seems to extract the same values (sum is equal), but in different order. This is a problem as I need to apply corrcoef after.
The original Matlab code is using an array of indices to subset a matrix.
MATLAB code:
values = 1:100;
matrix = reshape(values,[10,10]);
subdiag = find(tril(ones(10),-1));
matrix_subdiag = matrix(subdiag);
subdiag_sum = sum(matrix_subdiag);
disp(matrix_subdiag(1:10))
disp(subdiag_sum)
Output:
2
3
4
5
6
7
8
9
10
13
1530
My attempt in Python
import numpy as np
matrix = np.arange(1,101).reshape(10,10)
matrix_t = matrix.T #to match MATLAB arrangement
matrix_subdiag = matrix_t[np.tril_indices((10), k = -1)]
subdiag_sum = np.sum(matrix_subdiag)
print(matrix_subdiag[0:10], subdiag_sum))
Output:
[2 3 13 4 14 24 5 15 25 35] 1530
How do I get the same order output? Where is my error?
Thank you!

For the sum use directly numpy.triu on the non-transposed matrix:
S = np.triu(matrix, k=1).sum()
# 1530
For the indices, numpy.triu_indices_from and slicing as a flattened array:
idx = matrix[np.triu_indices_from(matrix, k=1)]
output:
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18, 19, 20,
24, 25, 26, 27, 28, 29, 30, 35, 36, 37, 38, 39, 40, 46, 47, 48, 49,
50, 57, 58, 59, 60, 68, 69, 70, 79, 80, 90])

creating a range of numbers in pandas based on single column

I have a pandas dataframe:
df2 = pd.DataFrame({'ID':['A','B','C','D','E'], 'loc':['Lon','Tok','Ber','Ams','Rom'], 'start':[20,10,30,40,43]})
ID loc start
0 A Lon 20
1 B Tok 10
2 C Ber 30
3 D Ams 40
4 E Rom 43
I'm looking to add in a column called range which takes the value in 'start' and produces a range of values which (including the initial value) are 10 less than the initial value, all in the same row.
The desired output:
ID loc start range
0 A Lon 20 20,19,18,17,16,15,14,13,12,11,10
1 B Tok 10 10,9,8,7,6,5,4,3,2,1,0
2 C Ber 30 30,29,28,27,26,25,24,23,22,21,20
3 D Ams 40 40,39,38,37,36,35,34,33,32,31,30
4 E Rom 43 43,42,41,40,39,38,37,36,35,34,33
I have tried:
df2['range'] = [i for i in range(df2.start, df2.start -10)]
and
def create_range2(row):
return df2['start'].between(df2.start, df2.start - 10)
df2.loc[:, 'range'] = df2.apply(create_range2, axis = 1)
however I can't seem to get the desired output. I intend to apply this solution to multiple dataframes, one of which has > 2,000,000 rows.
thanks

You might prepare range creating function and .apply it to start column following way:
import pandas as pd
df2 = pd.DataFrame({'ID':['A','B','C','D','E'], 'loc':['Lon','Tok','Ber','Ams','Rom'], 'start':[20,10,30,40,43]})
def make_10(x):
return list(range(x, x-10-1, -1))
df2["range"] = df2["start"].apply(make_10)
print(df2)
output
ID loc start range
0 A Lon 20 [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
1 B Tok 10 [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
2 C Ber 30 [30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20]
3 D Ams 40 [40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30]
4 E Rom 43 [43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33]
Explanation: .apply method of pandas.Series (column of pandas.DataFrame) accept function which is applied element-wise. Note that there is -1 in range as it is inclusive-exclusive and -1 as step size as you want to have descending values.

does this work?
df2['range'] = df2.apply(lambda row: list(range(row['start'],row['start']-11,-1)),axis=1)
df2
output
ID loc start range
0 A Lon 20 [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
1 B Tok 10 [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
2 C Ber 30 [30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20]
3 D Ams 40 [40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30]
4 E Rom 43 [43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33]
or if you want comma-separated:
df2['range'] = df2.apply(lambda row: ','.join([str(v) for v in range(row['start'],row['start']-11,-1)]),axis=1)
to get
ID loc start range
0 A Lon 20 20,19,18,17,16,15,14,13,12,11,10
1 B Tok 10 10,9,8,7,6,5,4,3,2,1,0
2 C Ber 30 30,29,28,27,26,25,24,23,22,21,20
3 D Ams 40 40,39,38,37,36,35,34,33,32,31,30
4 E Rom 43 43,42,41,40,39,38,37,36,35,34,33

How to find average, Max and largest(similar to excel function) in a list in python?

I have a list of numbers and from this list, I want to create 3 more lists that contain the maximum, average, and 5th largest number from it. My original list overdraw is the block of lists, which means it has sub-blocks in it and each block has 6 numbers in it and there are a total of 3 blocks or 6x3 matrix or array.
overdraw:
[[16,13,23,14,33,45],[23,11,54,34,23,76],[22,54,34,43,41,11]]
I know how to calculate max, average and 5 largest in this list. But I want a answer in a specific way like I know the max, average, and 5th largest values of each block but I want them to get printed 4 times. I know all the values:
Max = [45, 76, 54]
Average = [24, 37, 34]
Largest(5th) = [14, 23, 22]
my approach:
overdraw = [[16,13,23,14,33,45],[23,11,54,34,23,76],[22,54,34,43,41,11]]
x = [sorted(block, reverse=True) for block in overdraw] # first sort the whole list
max = [x[i][0] for i in range(0, len(x))] # for max
largest = [x[i][4] for i in range(0, len(x))] #5th largest
average = [sum(x[i])/len(x[i]) for i in range(0, len(x))] #average
print("max: ", max)
print("5th largest: ", largest)
print("average: ", average)
You will get the same output after running this code but I want output in this format:
Average = [24, 24, 24, 24, 37, 37, 37, 37, 34, 34, 34, 34]
Max = [45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
Largest(5th) = [14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]
As you can see each average, max, and the largest number is printed 4 times in their respective list. So can anyone help with this answer?

What about using pandas.DataFrame.explode
import pandas as pd
df = pd.DataFrame({
'OvIdx' : 3 * [range(4)],
'Average' : average,
'Max' : max, # should be renamed/assigned as max_ instead
'Largest(5th)': largest
}).explode('OvIdx').set_index('OvIdx').astype(int)
print(df)
which shows
Average Max Largest(5th)
OvIdx
0 24 45 14
1 24 45 14
2 24 45 14
3 24 45 14
0 36 76 23
1 36 76 23
2 36 76 23
3 36 76 23
0 34 54 22
1 34 54 22
2 34 54 22
3 34 54 22
from here, you can still do all the calculations you want and/or getting a NumPy array, doing df.values.
Following your comment, you can also get your column(s) as individual entities, doing, e.g.
>>> df.Average.tolist()
[24, 24, 24, 24, 36, 36, 36, 36, 34, 34, 34, 34]
>>> df.Max.tolist()
[45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
>>> df['Largest(5th)'].tolist() # as string key since the name is a little bit exotic
[14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]
which approach starts to be a little bit overkilled, readable though.

A solution that returns lists like you specified
import itertools
import numpy as np
n_times = 4
overdraw = [[16,13,23,14,33,45],[23,11,54,34,23,76],[22,54,34,43,41,11]]
y = [sorted(block, reverse=True) for block in overdraw]
maximum = list(itertools.chain(*[[max(x)]*n_times for x in y]))
average = list(itertools.chain(*[[int(round(sum(x)/len(x)))]*n_times for x in y]))
fifth_largest = list(itertools.chain(*[[x[4]]*n_times for x in y]))
print(f"Average = {average}")
print(f"Max = {maximum}")
print(f"Largest(5th): {fifth_largest}")
Outputs:
Average = [24, 24, 24, 24, 37, 37, 37, 37, 34, 34, 34, 34]
Max = [45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
Largest(5th): [14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]

Find largest value from multiple colums in each group of row index in Python, arrange those values diagonally in matrix, and find determinant

I am new to Python. I want to find the largest values from all the columns for repetitive row indexes (i.e. 5 to 130), and also show its row and column index label in output.The largest values should be absolute. (Irrespective of + or - sign). There should not be duplicates for row indexes in different groups.
After finding largest from each group,I want to arrange those values diagonally in square matrix. Then fill the remaining array with the corresponding values of indexes for each group from the main dataframe and find its Determinant.
df=pd.DataFrame(
{'0_deg': [43, 50, 45, -17, 5, 19, 11, 32, 36, 41, 19, 11, 32, 36, 1, 19, 7, 1, 36, 10],
'10_deg': [47, 41, 46, -18, 4, 16, 12, 34, -52, 31, 16, 12, 34, -71, 2, 9, 52, 34, -6, 9],
'20_deg': [46, 43, -56, 29, 6, 14, 13, 33, 43, 6, 14, 13, 37, 43, 3, 14, 13, 25, 40, 8],
'30_deg': [-46, 16, -40, -11, 9, 15, 33, -39, -22, 21, 15, 63, -39, -22, 4, 6, 25, -39, -22, 7]
}, index=[5, 10, 12, 101, 130, 5, 10, 12, 101, 130, 5, 10, 12, 101, 130, 5, 10, 12, 101, 130]
)
Data set :
Expected Output:
My code is showing only till output 1.
Actual Output:
Code:
df = pd.read_csv ('Matrixfile.csv')
df = df.set_index('Number')
def f(x):
x1 = x.abs().stack()
x2 = x.stack()
x = x2.iloc[np.argsort(-x1)].head(1)
return x
groups = (df.index == 5).cumsum()
df1 = df.groupby(groups).apply(f).reset_index(level=[1,2])
df1.columns = ['Number','Angle','Value']
print (df1)
df1.to_csv('Matrix_OP.csv', encoding='utf-8', index=True)

I am not sure about #piRSquared output from what I understood from your question. There might be some errors in there, for instance, in group 2, max(abs(values)) = 52 (underline in red in picture) but 41 is displayed on left...
Here is a less elegant way of doing it but maybe easier for you to understand :
import numpy as np
# INPUT
data_dict ={'0_deg': [43, 50, 45, -17, 5, 19, 11, 32, 36, 41, 19, 11, 32, 36, 1, 19, 7, 1, 36, 10],
'10_deg': [47, 41, 46, -18, 4, 16, 12, 34, -52, 31, 16, 12, 34, -71, 2, 9, 52, 34, -6, 9],
'20_deg': [46, 43, -56, 29, 6, 14, 13, 33, 43, 6, 14, 13, 37, 43, 3, 14, 13, 25, 40, 8],
'30_deg': [-46, 16, -40, -11, 9, 15, 33, -39, -22, 21, 15, 63, -39, -22, 4, 6, 25, -39, -22, 7],
}
# Row idx of a group in this list
idx = [5, 10, 12, 101, 130]
# Getting some dimensions and sorting the data
row_idx_length = len(idx)
group_length = len(data_dict['0_deg'])
number_of_groups = len(data_dict.keys())
idx = idx*number_of_groups
data_arr = np.zeros((group_length,number_of_groups),dtype=np.int32)
#
col = 0
keys = []
for key in sorted(data_dict):
data_arr[:,col] = data_dict[key]
keys.append(key)
col+=1
def get_extrema_value_group(arr):
# function to find absolute extrema value of a 2d array
extrema = 0
for i in range(0, len(arr)):
max_value = max(arr[i])
min_value = min(arr[i])
if (abs(min_value) > max_value) and (abs(extrema) < abs(min_value)):
extrema = min_value
elif (abs(min_value) < max_value) and (abs(extrema) < max_value):
extrema = max_value
return extrema
# For output 1
max_values = []
for i in range(0,row_idx_length*number_of_groups,row_idx_length):
# get the max value for the current group
value = get_extrema_value_group(data_arr[i:i+row_idx_length])
# get the row and column idx associated with the max value
idx_angle_number = np.nonzero(abs(data_arr[i:i+row_idx_length,:])==value)
print('Group number : ' + str(i//row_idx_length+1))
print('Number : '+ str(idx[idx_angle_number[0][0]]))
print('Angle : '+ keys[idx_angle_number[1][0]])
print('Absolute extrema value : ' + str(value))
print('------')
max_values.append(value)
# Arrange those values diagonally in square matrix for output 2
A = np.diag(max_values)
print('A = ' + str(A))
# Fill A with desired values
for i in range(0,number_of_groups,1):
A[i,0] = data_arr[i*row_idx_length+2,2] # 20 deg 12
A[i,1:3] = data_arr[i*row_idx_length+3,1] # x2 : 10 deg 101
A[i,3] = data_arr[i*row_idx_length+1,1] # 10 deg 10
# Final output
# replace the diagonal of A with max values
# get the idx of diag
A_di = np.diag_indices(number_of_groups)
# replace with max values
A[A_di] = max_values
print ('A = ' + str(A))
# Compute determinant of A
det_A = np.linalg.det(A)
print ('det(A) = '+str(det_A))
Output 1:
Group number : 1
Number : 12
Angle : 20_deg
Absolute extrema value : -56
------
Group number : 2
Number : 101
Angle : 10_deg
Absolute extrema value : -52
------
Group number : 3
Number : 101
Angle : 10_deg
Absolute extrema value : -71
------
Group number : 4
Number : 10
Angle : 10_deg
Absolute extrema value : 52
------
Output 2 :
A = [[-56 0 0 0]
[ 0 -52 0 0]
[ 0 0 -71 0]
[ 0 0 0 52]]
Output 3 :
A = [[-56 -18 -18 41]
[ 33 -52 -52 12]
[ 37 -71 -71 12]
[ 25 -6 -6 52]]
det(A) = -5.4731330578761246e-11

Print list in specified range Python

I'm new to Python and I have this problem
I have a list of numbers like this:
n = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
I want to print from 11 to 37, that means the output = 11, 13,.... 37.
I tried to print(n[11:37]) but of course it will print [37, 41, 43, 47]
because that is range index.
Any ideas or does Python have any built-in method for this ?

This should do the job...
n = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
n.sort()
mylist = [x for x in n if x in range(11, 38)]
print(mylist)
Want to print that as comma separated string:
print(mylist.strip('[]'))

This will work. (Assuming list is sorted)
print n[n.index(11): n.index(37)+1]
Output:
[11, 13, 17, 19, 23, 29, 31, 37]

Considering your list is ordered and it has no duplicates:
n = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(",".join(map(str,n[n.index(11): n.index(37)+1])))
Here you have a live example

Using numpy:
import numpy as np
narr = np.array(n)
m = (narr >= 11) & (narr <= 37)
for v in narr[m]:
print(v)
# or, to get rid of the loop:
print('\n'.join(map(str, narr[m])))

it pretty simple, since your list is already sorted you can write
my_list = [x for x in n if x in range(11, 38)]
print(*my_list)
what the '*' does is that it unpacks the array into individual elements, a term known as unpacking.This will produce the actual result you wanted and not an array

If your data is sorted, you can use a generator expression with either a range object or chained comparisons:
n = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(*(i for i in n if i in range(11, 38)), sep=', ')
print(*(i for i in n if 11 <= i <= 37), sep=', ')
If your data is unsorted and you can use indices of the first occurrences of each value, you can slice your list:
print(*n[n.index(11): n.index(37)+1], sep=', ')
Result with the data you have provided:
11, 13, 17, 19, 23, 29, 31, 37

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Group output of Column based on a condition - python

Related

Matlab to Python - extracting lower subdiagonal triangle, why different order?

creating a range of numbers in pandas based on single column

How to find average, Max and largest(similar to excel function) in a list in python?

Find largest value from multiple colums in each group of row index in Python, arrange those values diagonally in matrix, and find determinant

Print list in specified range Python

Categories

Resources