MATLAB to Python - extracting lower subdiagonal triangle, why different order?

I am translating code from MATLAB to Python. I need to extract the lower subdiagonal values of a matrix. My attempt in python seems to extract the same values (sum is equal), but in different order. This is a problem as I need to apply corrcoef after.
The original Matlab code is using an array of indices to subset a matrix.
MATLAB code:
values = 1:100;
matrix = reshape(values,[10,10]);
subdiag = find(tril(ones(10),-1));
matrix_subdiag = matrix(subdiag);
subdiag_sum = sum(matrix_subdiag);
disp(matrix_subdiag(1:10))
disp(subdiag_sum)
Output:
2
3
4
5
6
7
8
9
10
13
1530
My attempt in Python
import numpy as np
matrix = np.arange(1,101).reshape(10,10)
matrix_t = matrix.T #to match MATLAB arrangement
matrix_subdiag = matrix_t[np.tril_indices((10), k = -1)]
subdiag_sum = np.sum(matrix_subdiag)
print(matrix_subdiag[0:10], subdiag_sum)
Output:
[2 3 13 4 14 24 5 15 25 35] 1530
How do I get the same order output? Where is my error?
Thank you!

For the sum, use numpy.triu directly on the non-transposed matrix:
S = np.triu(matrix, k=1).sum()
# 1530
For the values in MATLAB's order, index the non-transposed matrix with numpy.triu_indices_from, which returns them as a flat array:
idx = matrix[np.triu_indices_from(matrix, k=1)]
output:
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18, 19, 20,
24, 25, 26, 27, 28, 29, 30, 35, 36, 37, 38, 39, 40, 46, 47, 48, 49,
50, 57, 58, 59, 60, 68, 69, 70, 79, 80, 90])
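The order differs because MATLAB's find walks the matrix column by column (column-major), while np.tril_indices returns its index pairs row by row (row-major). A minimal sketch of that idea, assuming the same 10x10 example; the names matlab_like, by_column and shortcut are only illustrative:

```python
import numpy as np

# MATLAB's reshape(1:100, [10, 10]) fills column by column; order="F" mimics that.
matlab_like = np.arange(1, 101).reshape(10, 10, order="F")

# np.tril_indices yields (row, col) pairs in row-major order.
rows, cols = np.tril_indices(10, k=-1)

# Re-sort the pairs by column first, then by row, to match MATLAB's find().
order = np.lexsort((rows, cols))  # the last key (cols) is the primary sort key
by_column = matlab_like[rows[order], cols[order]]

# Equivalent shortcut: upper triangle of the transposed (row-major) matrix.
shortcut = matlab_like.T[np.triu_indices(10, k=1)]

print(by_column[:10])   # [ 2  3  4  5  6  7  8  9 10 13]
print(by_column.sum())  # 1530
```

Both routes give the same sequence, so corrcoef can be applied afterwards with the elements lined up the way MATLAB orders them.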

Related

How to generate sequential subsets of integers?

I have the following start and end values:
start = 0
end = 54
I need to generate subsets of 4 sequential integers starting from start until end with a space of 20 between each subset. The result should be this one:
0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51
In this example, we obtained 3 subsets:
0, 1, 2, 3
24, 25, 26, 27
48, 49, 50, 51
How can I do it using numpy or pandas?
If I do r = [i for i in range(0,54,4)], I get [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52].
This should get you what you want:
j = 20
k = 4
result = [split for i in range(0,55, j+k) for split in range(i, k+i)]
print (result)
Output:
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
Maybe something like this:
r = [j for i in range(0, 54, 24) for j in range(i, i + 4)]
print(r)
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
You can use numpy.arange, which returns an ndarray containing evenly spaced values within a given range:
import numpy as np
r = np.arange(0, 54, 4)
print(r)
Result
[ 0  4  8 12 16 20 24 28 32 36 40 44 48 52]
Numpy approach
You can use np.arange to generate the starting numbers with a step of 20 + 4, where 20 is the space between subsets and 4 is the length of each sequential subarray.
import numpy as np
start = 0
end = 54
out = np.arange(start, end, 24) # array([ 0, 24, 48]), the starting point of each subarray
step = np.tile(np.arange(4), (len(out), 1))
# [[0 1 2 3]
# [0 1 2 3]
# [0 1 2 3]]
res = out[:, None] + step
# array([[ 0, 1, 2, 3],
# [24, 25, 26, 27],
# [48, 49, 50, 51]])
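To get the flat sequence the question asks for, the 2-D result can be flattened. A small sketch, under the assumption that no value should exceed end; size and gap are names introduced here for illustration only:

```python
import numpy as np

start, end = 0, 54
size, gap = 4, 20  # hypothetical names: subset length and spacing between subsets

# Starting point of each subset: 0, 24, 48
starts = np.arange(start, end, size + gap)

# Broadcasting adds the offsets 0..size-1 to every starting point.
res = starts[:, None] + np.arange(size)

# Flatten row by row and drop anything that would overshoot `end`.
flat = res.ravel()
flat = flat[flat <= end]

print(flat.tolist())  # [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]
```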
This can be done with plain Python:
rangeStart = 0
rangeStop = 54
setLen = 4
step = 20
stepTot = step + setLen
a = list( list(i+s for s in range(setLen)) for i in range(rangeStart,rangeStop,stepTot))
In this case you get the subsets as sublists of the outer list.
I don't think you need numpy or pandas for this. I achieved it with a simple while loop:
num = 0
end = 54
sequence = []
while num <= end:
    sequence.append(num)
    num += 1
    if num % 4 == 0:  # four numbers have been added
        num += 20
print(sequence)
# output: [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]

Creating a list with 3 values every 3 values

I'm having troubles writing this piece of code.
I need to create a list to only have 3 values every 3 values :
The expected output must be something like :
output1 = [1,2,3,7,8,9,13,14,15,....67,68,69]
output2 = [4,5,6,10,11,12...70,71,72]
Any ideas how can I reach that ?
Use two loops -- one for each group of three, and one for each item within that group. For example:
>>> [i*6 + j for i in range(12) for j in range(1, 4)]
[1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20, 21, 25, 26, 27, 31, 32, 33, 37, 38, 39, 43, 44, 45, 49, 50, 51, 55, 56, 57, 61, 62, 63, 67, 68, 69]
>>> [i*6 + j for i in range(12) for j in range(4, 7)]
[4, 5, 6, 10, 11, 12, 16, 17, 18, 22, 23, 24, 28, 29, 30, 34, 35, 36, 40, 41, 42, 46, 47, 48, 52, 53, 54, 58, 59, 60, 64, 65, 66, 70, 71, 72]
Suppose you want total sets of n consecutive values, skipping n values between sets, with the first set beginning at start. Just change start and the number of sets you need. In the example below the list starts at 1, so the first set is [1, 2, 3], and we need 12 sets of 3 consecutive elements each.
Method 1
n = 3
start = 1
total = 12
# 2*n*i + start is first element of every set of n tuples (Arithmetic progression)
print([j for i in range(total) for j in range(2*n*i + start, 2*n*i + start+n)])
# Or
print(sum([list(range(2*n*i + start, 2*n*i + start+n)) for i in range(total)], []))
Method 2 (NumPy runs the operation in compiled C code, so it is usually fast)
import numpy as np
n = 3
start = 1
total = 12
# One liner
print(
(np.arange(start, start + n, step=1)[:, np.newaxis] + np.arange(0, total, 1) * 2*n).transpose().reshape(-1)
)
############## EXPLANATION OF THE ABOVE ONE-LINER ##############
# np.arange start, start+1, ... start + n - 1
first_set = np.arange(start, start + n, step=1)
# [1 2 3]
# np.arange 0, 2*n, 4*n, 6*n, ....
multiple_to_add = np.arange(0, total, 1) * 2*n
print(multiple_to_add)
# broadcast first_set using np.newaxis and add it to each element in multiple_to_add
each_set_as_col = first_set[:, np.newaxis] + multiple_to_add
# [[ 1 7 13 19 25 31 37 43 49 55 61 67]
# [ 2 8 14 20 26 32 38 44 50 56 62 68]
# [ 3 9 15 21 27 33 39 45 51 57 63 69]]
# invert rows and columns
each_set_as_row = each_set_as_col.transpose()
# [[ 1 2 3]
# [ 7 8 9]
# [13 14 15]
# [19 20 21]
# [25 26 27]
# [31 32 33]
# [37 38 39]
# [43 44 45]
# [49 50 51]
# [55 56 57]
# [61 62 63]
# [67 68 69]]
merge_all_set_in_single_row = each_set_as_row.reshape(-1)
# array([ 1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20, 21, 25, 26, 27, 31, 32,
# 33, 37, 38, 39, 43, 44, 45, 49, 50, 51, 55, 56, 57, 61, 62, 63, 67,
# 68, 69])
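The same result can also be reached by reshaping instead of broadcasting. This is only a sketch of an alternative, assuming the values are the consecutive integers 1 to 2*n*total; blocks is a name made up for the example:

```python
import numpy as np

n = 3        # size of each group
total = 12   # number of groups in each output list

# 1..72 arranged as (block index, which half of the block, position within the half)
blocks = np.arange(1, 2 * n * total + 1).reshape(total, 2, n)

output1 = blocks[:, 0].ravel()  # first half of every block of 2*n numbers
output2 = blocks[:, 1].ravel()  # second half of every block

print(output1[:6].tolist())  # [1, 2, 3, 7, 8, 9]
print(output2[:6].tolist())  # [4, 5, 6, 10, 11, 12]
```

Reshaping gives both output lists from one array, which may be convenient when output2 is needed as well.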
Pythonic one-liners can look like 'magic', so here's a naive algorithm that makes the logic explicit:
output1 = []
output2 = []
for i in range(1, 100):  # change the range as you like
    if (i-1) % 6 < 3:
        output1.append(i)
    else:
        output2.append(i)
What's going on here:
Initializing two empty lists.
Iterate through integers in a range.
How to tell if i should go to output1 or output2:
I can see that 3 consecutive numbers go to output1, then 3 consecutive to output2.
This tells me I can use the modulo % operator, (doing % 6)
The rest is simple logic to get the exact result wanted.
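The modulo test described above can also be written as a boolean mask. A minimal sketch, assuming the values run from 1 to 72 as in the question; in_first_half is an illustrative name:

```python
import numpy as np

values = np.arange(1, 73)

# (value - 1) % 6 cycles through 0..5; remainders 0, 1, 2 belong to output1.
in_first_half = (values - 1) % 6 < 3

output1 = values[in_first_half]
output2 = values[~in_first_half]

print(output1[:6].tolist())  # [1, 2, 3, 7, 8, 9]
print(output2[:6].tolist())  # [4, 5, 6, 10, 11, 12]
```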

How to loop over a raster dataset in Python

I have a multiband raster (84 bands). I am reading the raster using GDAL and converting it to numpy array. In numpy when I am checking the array shape it is showing as 84 = bands, 3 = row and 5 = col. I want to compute the ratio between the band(0)/band(n+1) for n in 1 to 84. Thus, I am able to get 83 arrays, each array represents pixel-by-pixel ratio. For example, I have:
Band 1
[[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]]
Band 2
[[21, 22, 23, 24, 25],
[26, 27, 28, 29, 30],
[31, 32, 33, 34, 35]]
Band 3
[[31, 32, 33, 34, 35],
[36, 37, 38, 39, 40],
[41, 42, 43, 44, 45]]
...
...
Band84
I need to loop through all the bands in such a way that I get these: Band2/Band1; Band3/Band1; ... ; Band84/Band1
Band2/Band1
[[1/21, 2/22, 3/23, 4/24, 5/25],
[6/26, 7/27, 8/28, 9/29, 10/30],
[11/31, 12/32, 13/33, 14/34, 15/35]]
And so on...
Is there any way to vectorize this calculation?
I really appreciate your advice.
If I understand correctly, you need
Band2/Band1; Band3/Band1 ... Band84/Band1
Band3/Band2; Band4/Band2 ... Band84/Band2
...
Band84/Band83
It should be something like this:
for a in range(0, len(all_bands)-1):
    for b in range(a+1, len(all_bands)):
        print( all_bands[b]/all_bands[a] )
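If only the ratios against the first band are needed, broadcasting can replace the loop. The following is a sketch under assumptions: all_bands is a float array of shape (bands, rows, cols) with no zeros in the divisor, and a small random array stands in for the real 84-band raster:

```python
import numpy as np

# Synthetic stand-in for the raster: 4 bands, 3 rows, 5 columns.
rng = np.random.default_rng(0)
all_bands = rng.uniform(1.0, 100.0, size=(4, 3, 5))

# Every later band divided by the first band; all_bands[0] with shape (3, 5)
# broadcasts against all_bands[1:] with shape (3, 3, 5).
ratios_to_first = all_bands[1:] / all_bands[0]

# The opposite direction (first band over each later band), in case that is the one wanted.
first_to_others = all_bands[0] / all_bands[1:]

print(ratios_to_first.shape)  # (3, 3, 5)
```

With the real data the result would have shape (83, 3, 5), one pixel-by-pixel ratio array per band. Integer rasters may need a cast to float first, and zero-valued pixels in the divisor would produce inf or nan.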

creating a range of numbers in pandas based on single column

I have a pandas dataframe:
df2 = pd.DataFrame({'ID':['A','B','C','D','E'], 'loc':['Lon','Tok','Ber','Ams','Rom'], 'start':[20,10,30,40,43]})
ID loc start
0 A Lon 20
1 B Tok 10
2 C Ber 30
3 D Ams 40
4 E Rom 43
I'm looking to add in a column called range which takes the value in 'start' and produces a range of values which (including the initial value) are 10 less than the initial value, all in the same row.
The desired output:
ID loc start range
0 A Lon 20 20,19,18,17,16,15,14,13,12,11,10
1 B Tok 10 10,9,8,7,6,5,4,3,2,1,0
2 C Ber 30 30,29,28,27,26,25,24,23,22,21,20
3 D Ams 40 40,39,38,37,36,35,34,33,32,31,30
4 E Rom 43 43,42,41,40,39,38,37,36,35,34,33
I have tried:
df2['range'] = [i for i in range(df2.start, df2.start -10)]
and
def create_range2(row):
    return df2['start'].between(df2.start, df2.start - 10)
df2.loc[:, 'range'] = df2.apply(create_range2, axis = 1)
however I can't seem to get the desired output. I intend to apply this solution to multiple dataframes, one of which has > 2,000,000 rows.
thanks
You could write a function that builds the range and .apply it to the start column, like this:
import pandas as pd
df2 = pd.DataFrame({'ID':['A','B','C','D','E'], 'loc':['Lon','Tok','Ber','Ams','Rom'], 'start':[20,10,30,40,43]})
def make_10(x):
    return list(range(x, x-10-1, -1))
df2["range"] = df2["start"].apply(make_10)
print(df2)
output
ID loc start range
0 A Lon 20 [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
1 B Tok 10 [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
2 C Ber 30 [30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20]
3 D Ams 40 [40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30]
4 E Rom 43 [43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33]
Explanation: the .apply method of pandas.Series (a column of a pandas.DataFrame) accepts a function and applies it element-wise. The stop value is x-10-1 because range excludes its stop value, and the step is -1 because you want descending values.
does this work?
df2['range'] = df2.apply(lambda row: list(range(row['start'],row['start']-11,-1)),axis=1)
df2
output
ID loc start range
0 A Lon 20 [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
1 B Tok 10 [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
2 C Ber 30 [30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20]
3 D Ams 40 [40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30]
4 E Rom 43 [43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33]
or if you want comma-separated:
df2['range'] = df2.apply(lambda row: ','.join([str(v) for v in range(row['start'],row['start']-11,-1)]),axis=1)
to get
ID loc start range
0 A Lon 20 20,19,18,17,16,15,14,13,12,11,10
1 B Tok 10 10,9,8,7,6,5,4,3,2,1,0
2 C Ber 30 30,29,28,27,26,25,24,23,22,21,20
3 D Ams 40 40,39,38,37,36,35,34,33,32,31,30
4 E Rom 43 43,42,41,40,39,38,37,36,35,34,33
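Since the question mentions frames with more than 2,000,000 rows, a broadcast subtraction may be worth trying before a row-wise apply. This is a sketch rather than a benchmarked recommendation; grid is a name introduced here:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'ID': ['A', 'B', 'C', 'D', 'E'],
                    'loc': ['Lon', 'Tok', 'Ber', 'Ams', 'Rom'],
                    'start': [20, 10, 30, 40, 43]})

# One row per start value, one column per offset 0..10 (broadcast subtraction).
grid = df2['start'].to_numpy()[:, None] - np.arange(11)

# Store each row of the grid as a Python list in the new column.
df2['range'] = grid.tolist()

print(df2.loc[0, 'range'])  # [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
```

If the lists are not needed as such, keeping grid as a 2-D array avoids creating millions of Python list objects.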

How to find average, Max and largest(similar to excel function) in a list in python?

I have a list of numbers and from this list, I want to create 3 more lists that contain the maximum, average, and 5th largest number from it. My original list overdraw is the block of lists, which means it has sub-blocks in it and each block has 6 numbers in it and there are a total of 3 blocks or 6x3 matrix or array.
overdraw:
[[16,13,23,14,33,45],[23,11,54,34,23,76],[22,54,34,43,41,11]]
I know how to calculate the max, average and 5th largest in this list, but I want the answer in a specific form: the max, average, and 5th largest value of each block should each be printed 4 times. I know all the values:
Max = [45, 76, 54]
Average = [24, 37, 34]
Largest(5th) = [14, 23, 22]
my approach:
overdraw = [[16,13,23,14,33,45],[23,11,54,34,23,76],[22,54,34,43,41,11]]
x = [sorted(block, reverse=True) for block in overdraw] # first sort the whole list
max = [x[i][0] for i in range(0, len(x))] # for max
largest = [x[i][4] for i in range(0, len(x))] #5th largest
average = [sum(x[i])/len(x[i]) for i in range(0, len(x))] #average
print("max: ", max)
print("5th largest: ", largest)
print("average: ", average)
You will get the same output after running this code but I want output in this format:
Average = [24, 24, 24, 24, 37, 37, 37, 37, 34, 34, 34, 34]
Max = [45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
Largest(5th) = [14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]
As you can see each average, max, and the largest number is printed 4 times in their respective list. So can anyone help with this answer?
What about using pandas.DataFrame.explode
import pandas as pd
df = pd.DataFrame({
    'OvIdx' : 3 * [range(4)],
    'Average' : average,
    'Max' : max, # should be renamed/assigned as max_ instead
    'Largest(5th)': largest
}).explode('OvIdx').set_index('OvIdx').astype(int)
print(df)
which shows
Average Max Largest(5th)
OvIdx
0 24 45 14
1 24 45 14
2 24 45 14
3 24 45 14
0 36 76 23
1 36 76 23
2 36 76 23
3 36 76 23
0 34 54 22
1 34 54 22
2 34 54 22
3 34 54 22
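From here you can still do any further calculations, or get a NumPy array with df.values. Note that astype(int) truncates rather than rounds, which is why the second block's average (about 36.83) shows as 36 instead of 37.
Following your comment, you can also get the columns as individual lists, e.g.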
>>> df.Average.tolist()
[24, 24, 24, 24, 36, 36, 36, 36, 34, 34, 34, 34]
>>> df.Max.tolist()
[45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
>>> df['Largest(5th)'].tolist() # as string key since the name is a little bit exotic
[14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]
This approach is arguably overkill, but it is readable.
A solution that returns lists like you specified
import itertools
n_times = 4
overdraw = [[16,13,23,14,33,45],[23,11,54,34,23,76],[22,54,34,43,41,11]]
y = [sorted(block, reverse=True) for block in overdraw]
maximum = list(itertools.chain(*[[max(x)]*n_times for x in y]))
average = list(itertools.chain(*[[int(round(sum(x)/len(x)))]*n_times for x in y]))
fifth_largest = list(itertools.chain(*[[x[4]]*n_times for x in y]))
print(f"Average = {average}")
print(f"Max = {maximum}")
print(f"Largest(5th): {fifth_largest}")
Outputs:
Average = [24, 24, 24, 24, 37, 37, 37, 37, 34, 34, 34, 34]
Max = [45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
Largest(5th): [14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]
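With NumPy, the repetition itself can be done with np.repeat. A minimal sketch, assuming the average should be rounded to the nearest integer as in the desired output; the *_rep names are illustrative:

```python
import numpy as np

n_times = 4
overdraw = np.array([[16, 13, 23, 14, 33, 45],
                     [23, 11, 54, 34, 23, 76],
                     [22, 54, 34, 43, 41, 11]])

# Per-block statistics, computed along each row.
maximum = overdraw.max(axis=1)
average = np.rint(overdraw.mean(axis=1)).astype(int)  # rounded to the nearest integer
fifth_largest = np.sort(overdraw, axis=1)[:, -5]      # 5th value counted from the largest

# np.repeat duplicates every element n_times, keeping the blocks in order.
max_rep = np.repeat(maximum, n_times).tolist()
avg_rep = np.repeat(average, n_times).tolist()
fifth_rep = np.repeat(fifth_largest, n_times).tolist()

print(avg_rep)    # [24, 24, 24, 24, 37, 37, 37, 37, 34, 34, 34, 34]
print(max_rep)    # [45, 45, 45, 45, 76, 76, 76, 76, 54, 54, 54, 54]
print(fifth_rep)  # [14, 14, 14, 14, 23, 23, 23, 23, 22, 22, 22, 22]
```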
