I am translating code from MATLAB to Python. I need to extract the lower subdiagonal values of a matrix. My attempt in Python seems to extract the same values (the sum is equal), but in a different order. This is a problem as I need to apply corrcoef afterwards.
The original MATLAB code uses an array of indices to subset a matrix.
MATLAB code:
values = 1:100;
matrix = reshape(values,[10,10]);
subdiag = find(tril(ones(10),-1));
matrix_subdiag = matrix(subdiag);
subdiag_sum = sum(matrix_subdiag);
disp(matrix_subdiag(1:10))
disp(subdiag_sum)
Output:
2
3
4
5
6
7
8
9
10
13
1530
My attempt in Python
import numpy as np
matrix = np.arange(1,101).reshape(10,10)
matrix_t = matrix.T #to match MATLAB arrangement
matrix_subdiag = matrix_t[np.tril_indices((10), k = -1)]
subdiag_sum = np.sum(matrix_subdiag)
print(matrix_subdiag[0:10], subdiag_sum)
Output:
[2 3 13 4 14 24 5 15 25 35] 1530
How do I get the same order output? Where is my error?
Thank you!
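MATLAB fills and indexes matrices in column-major order, while NumPy defaults to row-major order, so your NumPy matrix is already the transpose of the MATLAB one. The MATLAB lower triangle therefore corresponds to the upper triangle of the non-transposed NumPy matrix. For the sum, use numpy.triu directly on the non-transposed matrix: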
S = np.triu(matrix, k=1).sum()
# 1530
For the values in MATLAB order, index the non-transposed matrix with numpy.triu_indices_from, which returns them as a flat array:
matrix_subdiag = matrix[np.triu_indices_from(matrix, k=1)]
Output:
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16, 17, 18, 19, 20,
24, 25, 26, 27, 28, 29, 30, 35, 36, 37, 38, 39, 40, 46, 47, 48, 49,
50, 57, 58, 59, 60, 68, 69, 70, 79, 80, 90])
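If you would rather keep a matrix laid out the way MATLAB has it, here is a minimal sketch of an alternative. It assumes you rebuild the matrix with order='F' (column-major fill); the names matlab_like, upper_rows and upper_cols are only illustrative and not from the original code.

```python
import numpy as np

n = 10
# order='F' fills column by column, the way MATLAB's reshape does
matlab_like = np.arange(1, n * n + 1).reshape(n, n, order='F')

# triu_indices yields (row, col) pairs with row < col, sorted by row then col.
# Swapping the two arrays gives strictly lower-triangle positions visited
# column by column, which is the order MATLAB's find returns.
upper_rows, upper_cols = np.triu_indices(n, k=1)
matrix_subdiag = matlab_like[upper_cols, upper_rows]

print(matrix_subdiag[:10])   # [ 2  3  4  5  6  7  8  9 10 13]
print(matrix_subdiag.sum())  # 1530
```

This gives the same 45 values in the same order as the triu approach, so which one to use mostly depends on whether the rest of your code expects the MATLAB layout.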
I am trying to create a list of all possible 6-number lists drawn from the numbers 1 to 49 by looping over every possible subset of 1 to 49.
The issue is that the code stops at number 15 and nothing is printed in PyCharm (the Excel file is being written but stops at record 38759).
import itertools
import pandas as pd
stuff = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
all=[]
for L in range(0, len(stuff)+1):
    for subset in itertools.combinations(stuff, L):
        alist=list(subset)
        if len(subset)==6:
            all.append(alist)
all_tuple=tuple(all)
df = pd.DataFrame(all_tuple,columns=['z1','z2','z3','z4','z5','z6'])
print(df)
df.to_excel('test.xlsx')
If I understand correctly, you are trying to find the possible combinations of 6 numbers sampled from the list [1, 2, 3, ..., 49] without replacement.
But your code calculates the combinations of all lengths and then only saves those of length 6.
To get a clue as to why your code does not terminate quickly, consider the number of combinations of 6 numbers:
>>> print(len(list(itertools.combinations(range(1, 50), 6))))
13983816
So, if there are 14 million possible combinations of 6 numbers, imagine how many combinations there are of 7, 8, 9, ...
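To put numbers on that without enumerating anything, a small sketch using math.comb (available from Python 3.8) can count the combinations directly. The variable names are just for illustration.

```python
import math

# Number of 6-element combinations of 49 numbers
six_number_sets = math.comb(49, 6)

# Number of combinations of every length 0..49, which is what the
# original double loop iterates over (equal to 2**49)
all_subsets = sum(math.comb(49, size) for size in range(0, 50))

print(six_number_sets)  # 13983816
print(all_subsets)      # 562949953421312
```

So the original loop has to walk through roughly 5.6e14 subsets to keep only about 1.4e7 of them.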
Here is some code to calculate only the 14 million combinations of length 6:
combs = list(itertools.combinations(range(1, 50), 6))
Or, if you really want to build the dataframe:
# Warning, this takes about 25 seconds
combs = itertools.combinations(range(1, 50), 6)
df = pd.DataFrame(combs, columns=['z1','z2','z3','z4','z5','z6'])
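Bear in mind that this will take up quite a bit of memory. An Excel worksheet holds at most 1,048,576 rows, so 14 million rows will not fit in a single sheet; I didn't try writing it.
Also, don't shadow built-in names with your variables. all is a built-in Python function (not a reserved keyword), so the assignment works but hides the built-in.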
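If the goal is just to get the combinations onto disk, one option is to stream them to a CSV file so the full list never sits in memory. This is only a sketch: the helper name write_combinations_csv and the file names are hypothetical, and CSV is my substitution for the Excel output in the question.

```python
import csv
import itertools
import os
import tempfile


def write_combinations_csv(path, numbers, k):
    """Write every k-combination of `numbers` to `path`; return the row count."""
    header = [f"z{i}" for i in range(1, k + 1)]
    count = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        # combinations() is lazy, so only one row is held at a time
        for combo in itertools.combinations(numbers, k):
            writer.writerow(combo)
            count += 1
    return count


# Small demonstration: 4 choose 2 = 6 rows, written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    demo_rows = write_combinations_csv(demo_path, range(1, 5), 2)
    with open(demo_path, newline="") as handle:
        demo_lines = handle.read().splitlines()

print(demo_rows)       # 6
print(demo_lines[:2])  # ['z1,z2', '1,2']
```

For the real case you would call something like write_combinations_csv('test.csv', range(1, 50), 6), which produces about 14 million data rows.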
I have a pandas dataframe:
df2 = pd.DataFrame({'ID':['A','B','C','D','E'], 'loc':['Lon','Tok','Ber','Ams','Rom'], 'start':[20,10,30,40,43]})
  ID  loc  start
0  A  Lon     20
1  B  Tok     10
2  C  Ber     30
3  D  Ams     40
4  E  Rom     43
I'm looking to add a column called range that takes the value in 'start' and produces, in the same row, the descending sequence of values from the initial value down to 10 less than it (inclusive).
The desired output:
  ID  loc  start                             range
0  A  Lon     20  20,19,18,17,16,15,14,13,12,11,10
1  B  Tok     10            10,9,8,7,6,5,4,3,2,1,0
2  C  Ber     30  30,29,28,27,26,25,24,23,22,21,20
3  D  Ams     40  40,39,38,37,36,35,34,33,32,31,30
4  E  Rom     43  43,42,41,40,39,38,37,36,35,34,33
I have tried:
df2['range'] = [i for i in range(df2.start, df2.start -10)]
and
def create_range2(row):
    return df2['start'].between(df2.start, df2.start - 10)
df2.loc[:, 'range'] = df2.apply(create_range2, axis = 1)
However, I can't seem to get the desired output. I intend to apply this solution to multiple dataframes, one of which has > 2,000,000 rows.
thanks
You might write a range-creating function and .apply it to the start column in the following way:
import pandas as pd
df2 = pd.DataFrame({'ID':['A','B','C','D','E'], 'loc':['Lon','Tok','Ber','Ams','Rom'], 'start':[20,10,30,40,43]})
def make_10(x):
    return list(range(x, x-10-1, -1))
df2["range"] = df2["start"].apply(make_10)
print(df2)
output
  ID  loc  start                                         range
0  A  Lon     20  [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
1  B  Tok     10            [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
2  C  Ber     30  [30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20]
3  D  Ams     40  [40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30]
4  E  Rom     43  [43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33]
Explanation: the .apply method of pandas.Series (a column of a pandas.DataFrame) accepts a function and applies it element-wise. The stop value is x-10-1 because range includes its start but excludes its stop, and the step is -1 because you want descending values.
Does this work?
df2['range'] = df2.apply(lambda row: list(range(row['start'],row['start']-11,-1)),axis=1)
df2
output
  ID  loc  start                                         range
0  A  Lon     20  [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10]
1  B  Tok     10            [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
2  C  Ber     30  [30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20]
3  D  Ams     40  [40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30]
4  E  Rom     43  [43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33]
or if you want comma-separated:
df2['range'] = df2.apply(lambda row: ','.join([str(v) for v in range(row['start'],row['start']-11,-1)]),axis=1)
to get
  ID  loc  start                             range
0  A  Lon     20  20,19,18,17,16,15,14,13,12,11,10
1  B  Tok     10            10,9,8,7,6,5,4,3,2,1,0
2  C  Ber     30  30,29,28,27,26,25,24,23,22,21,20
3  D  Ams     40  40,39,38,37,36,35,34,33,32,31,30
4  E  Rom     43  43,42,41,40,39,38,37,36,35,34,33
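Since the question mentions a dataframe with more than 2,000,000 rows, it may be worth trying NumPy broadcasting instead of a row-wise apply. The following is a sketch rather than a benchmarked solution; the names offsets, ranges and the extra column range_str are my own and not part of the question.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'ID': ['A', 'B', 'C', 'D', 'E'],
                    'loc': ['Lon', 'Tok', 'Ber', 'Ams', 'Rom'],
                    'start': [20, 10, 30, 40, 43]})

# Offsets 0..10 subtracted from each start value via broadcasting:
# a column of shape (n, 1) minus a row of shape (11,) gives shape (n, 11).
offsets = np.arange(11)
ranges = df2['start'].to_numpy()[:, None] - offsets

# One Python list per row, stored in an object column
df2['range'] = pd.Series(ranges.tolist(), index=df2.index)

# Comma-separated variant, built from the same 2D array
df2['range_str'] = [','.join(map(str, row)) for row in ranges]

print(df2[['start', 'range_str']])
```

The arithmetic is done in one vectorized step; converting to lists or strings still loops in Python, so if you only need the numbers it may be simpler to keep the 2D array itself.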
I am trying to create a NumPy array and fill each position with its index number, counting from 1.
For example, if my array is an ndarray of shape (30,), then:
index 0 = 1
index 1 = 2
.
.
.
index 29 = 30
Is there a NumPy function that does this for me?
If not, I would appreciate help with the code.
thanks
Here you go:
>>> import numpy as np
>>> np.arange(start=1, stop=31)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])
>>>
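You can use the built-in function numpy.arange(your_desired_size). For example, to match the length of an existing array:
import numpy as np
a = np.array([30.3, 20.5, 14.2, 15.3, 81.2, 88.4])
v = np.size(a)
a = np.arange(v)  # gives 0, 1, ..., v-1; use np.arange(1, v + 1) to start at 1 as in the question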
I am trying to cumulatively add each value to the previous total and store the running total in an array each time.
This code is part of a larger project. For simplicity, I am going to define my variables as follows:
ele_ini = [12]
smb = [2, 5, 7, 8, 9, 10]
val = ele_ini
for i in range(len(smb)):
    val += smb[i]
    print(val)
    elevation_smb.append(val)
Problem
Each time, the previous values stored in elevation_smb are replaced by the current value, so the result I obtain is:
elevation_smb = [22, 22, 22, 22, 22, 22]
The result I am expecting, however, is:
elevation_smb = [14, 19, 26, 34, 43, 53]
NOTE:
ele_ini is a vector with n elements. I am only using 1 element just for simplicity.
Don't use a loop here, because it is slow; the vectorized solution below is faster.
I think you need numpy.cumsum, then add ele_ini as a column vector to get a 2D NumPy array with one row per element of ele_ini:
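import numpy as np
ele_ini = [12, 10, 1, 0]
smb = [2, 5, 7, 8, 9, 10]
# cumsum gives the running totals of smb; [:, None] turns ele_ini into a column for broadcasting
elevation_smb = np.cumsum(np.array(smb)) + np.array(ele_ini)[:, None]
print(elevation_smb)
[[14 19 26 34 43 53]
 [12 17 24 32 41 51]
 [ 3  8 15 23 32 42]
 [ 2  7 14 22 31 41]]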
If val is a NumPy array, val += smb[i] modifies it in place, and every append stores a reference to that same array, so all entries in the list show the latest value. Appending a copy stores an independent snapshot each time:
elevation_smb.append(val.copy())
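Here is a small sketch of the difference, assuming ele_ini is a one-element NumPy array as the question's note suggests. The list names aliased and copied are only for illustration.

```python
import numpy as np

ele_ini = np.array([12])
smb = [2, 5, 7, 8, 9, 10]

val = ele_ini.copy()  # work on a copy so ele_ini itself is not modified
aliased, copied = [], []
for s in smb:
    val += s                   # in-place update of the same array object
    aliased.append(val)        # stores a reference to that one array
    copied.append(val.copy())  # stores an independent snapshot

aliased_values = [int(a[0]) for a in aliased]
copied_values = [int(c[0]) for c in copied]

print(aliased_values)  # [53, 53, 53, 53, 53, 53]
print(copied_values)   # [14, 19, 26, 34, 43, 53]
```

Writing val = val + s instead of val += s would also avoid the problem, because it creates a new array on every iteration.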
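You can also do it with reduce (in Python 3, import it first with from functools import reduce). Note that the result includes the initial value as its first element: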
In [6]: reduce(lambda c, x: c + [c[-1] + x], smb, ele_ini)
Out[6]: [12, 14, 19, 26, 34, 43, 53]
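As a variation on the same idea, itertools.accumulate from the standard library computes running totals directly. This is a swapped-in alternative, not the reduce approach above; the sketch assumes a single starting value of 12 and drops it from the result to match the expected output in the question.

```python
from itertools import accumulate

ele_ini = [12]
smb = [2, 5, 7, 8, 9, 10]

# Running totals of 12, 2, 5, ...; the first total is just the
# starting value, so it is sliced off
elevation_smb = list(accumulate(ele_ini + smb))[1:]

print(elevation_smb)  # [14, 19, 26, 34, 43, 53]
```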