How can I iterate over a numpy multidimensional array of dataframes? - python

I'm trying to iterate over a multidimensional array in Python, but I'm having problems because my array is full of dataframes instead of integers.
I have a multidimensional numpy array (12, 11) which contains 12 x 11 different dataframes.
nombres_df = np.array([[df_01_ID4034, df_02_ID4034, df_03_ID4034, df_04_ID4034, df_05_ID4034, df_06_ID4034, df_07_ID4034, df_08_ID4034, df_09_ID4034, df_10_ID4034, df_11_ID4034, df_12_ID4034],
[df_01_ID4035, df_02_ID4035, df_03_ID4035, df_04_ID4035, df_05_ID4035, df_06_ID4035, df_07_ID4035, df_08_ID4035, df_09_ID4035, df_10_ID4035, df_11_ID4035, df_12_ID4035],
[df_01_ID4039, df_02_ID4039, df_03_ID4039, df_04_ID4039, df_05_ID4039, df_06_ID4039, df_07_ID4039, df_08_ID4039, df_09_ID4039, df_10_ID4039, df_11_ID4039, df_12_ID4039],
[df_01_ID4040, df_02_ID4040, df_03_ID4040, df_04_ID4040, df_05_ID4040, df_06_ID4040, df_07_ID4040, df_08_ID4040, df_09_ID4040, df_10_ID4040, df_11_ID4040, df_12_ID4040],
[df_01_ID4041, df_02_ID4041, df_03_ID4041, df_04_ID4041, df_05_ID4041, df_06_ID4041, df_07_ID4041, df_08_ID4041, df_09_ID4041, df_10_ID4041, df_11_ID4041, df_12_ID4041],
[df_01_ID4042, df_02_ID4042, df_03_ID4042, df_04_ID4042, df_05_ID4042, df_06_ID4042, df_07_ID4042, df_08_ID4042, df_09_ID4042, df_10_ID4042, df_11_ID4042, df_12_ID4042],
[df_01_ID4047, df_02_ID4047, df_03_ID4047, df_04_ID4047, df_05_ID4047, df_06_ID4047, df_07_ID4047, df_08_ID4047, df_09_ID4047, df_10_ID4047, df_11_ID4047, df_12_ID4047],
[df_01_ID4049, df_02_ID4049, df_03_ID4049, df_04_ID4049, df_05_ID4049, df_06_ID4049, df_07_ID4049, df_08_ID4049, df_09_ID4049, df_10_ID4049, df_11_ID4049, df_12_ID4049],
[df_01_ID4056, df_02_ID4056, df_03_ID4056, df_04_ID4056, df_05_ID4056, df_06_ID4056, df_07_ID4056, df_08_ID4056, df_09_ID4056, df_10_ID4056, df_11_ID4056, df_12_ID4056],
[df_01_ID4059, df_02_ID4059, df_03_ID4059, df_04_ID4059, df_05_ID4059, df_06_ID4059, df_07_ID4059, df_08_ID4059, df_09_ID4059, df_10_ID4059, df_11_ID4059, df_12_ID4059],
[df_01_ID4075, df_02_ID4075, df_03_ID4075, df_04_ID4075, df_05_ID4075, df_06_ID4075, df_07_ID4075, df_08_ID4075, df_09_ID4075, df_10_ID4075, df_11_ID4075, df_12_ID4075]], dtype="object")
for j in range(len(nombres_df)):
    for i in range(len(nombres_df[i])):
        print (nombres_df[i][j])
I need to iterate over it and make operations with values inside each dataframe.
The problem is that when I try to iterate as usual, I cannot, because I get this error:
5
6 for j in range(len(nombres_df)):
7 for i in range(len(nombres_df[i])): <--------
8 print (nombres_df[i][j], end = " ")
IndexError: arrays used as indices must be of integer (or boolean) type
I know the problem is here: len(nombres_df[i]), but I don't know how to solve it.
Thank you very much

I think the problem is that you are using the wrong index in the second line of your code.
The i inside range(len(nombres_df[i])) should be j. At that point the inner loop has not assigned i yet, so Python uses whatever i was left over from earlier code, which is presumably an array, hence the IndexError.
You also inverted the indexes in nombres_df[i][j]; it should be nombres_df[j][i].
This should do the trick:
for j in range(len(nombres_df)):
    for i in range(len(nombres_df[j])):
        print (nombres_df[j][i])
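As a possible alternative to nested range loops, here is a minimal sketch that walks every cell of a 2-D object array with np.ndindex. The grid size, the "value" column, and the totals dictionary are made up for illustration and are not from the question. The array is filled cell by cell because passing nested lists of equally shaped dataframes to np.array may produce a higher-dimensional array instead of a grid of dataframes.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 grid of small dataframes (stand-ins for df_01_ID4034, ...).
n_rows, n_cols = 2, 3
grid = np.empty((n_rows, n_cols), dtype=object)
for r in range(n_rows):
    for c in range(n_cols):
        grid[r, c] = pd.DataFrame({"value": [r, c, r + c]})

# np.ndindex yields every (row, column) pair for the given shape,
# so the two indices can never be mixed up.
totals = {}
for r, c in np.ndindex(grid.shape):
    df = grid[r, c]                      # a regular pandas DataFrame
    totals[(r, c)] = int(df["value"].sum())

print(totals)
```

Each grid[r, c] is an ordinary DataFrame, so any per-dataframe operation can replace the sum.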

Related

Getting wrong results with np.argpartition, while selecting maximum n values from an array

I was using this answer to the question 'How do I get indices of N maximum values in a NumPy array?'. I applied it to the output of my ML model's LogSoftmax layer to get the top 4 classes for each sample. In most cases it returned the values in sorted order, but in a few cases I see partially unsorted results like this:
arr = np.array([-3.0302, -2.7103, -7.4844, -3.4761, -5.3009, -5.2121, -3.7549, -4.7834,
-5.8870, -3.4839, -5.0104, -3.0992, -4.8823, -0.3319, -6.8084])
ind = np.argpartition(arr, -4)[-4:]
print(arr[ind])
and the output is
[-3.0992 -3.0302 -0.3319 -2.7103]
which is unsorted: I expected the largest value to come last, but that is not the case here. I checked other examples and they all work fine, for example:
arr = np.array([45, 35, 67.345, -34.5555, 66, -0.23655, 11.0001, 0.234444444])
ind = np.argpartition(arr, -4)[-4:]
print(arr[ind])
output
[35. 45. 66. 67.345]
What could be the reason? Did I miss anything?
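np.argpartition only guarantees that the element at position kth ends up where it would be in a sorted array, with smaller-or-equal elements before it and larger-or-equal elements after it. The order within each side is undefined, so sorted-looking output is a coincidence. If you're not planning on actually using the indices, why not just use np.sort?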
>>> arr = np.array([-3.0302, -2.7103, -7.4844, -3.4761, -5.3009, -5.2121, -3.7549,
-4.7834, -5.8870, -3.4839, -5.0104, -3.0992, -4.8823, -0.3319, -6.8084])
>>> np.sort(arr)[-4:]
array([-3.0992, -3.0302, -2.7103, -0.3319])
Alternatively, you can pass a sequence of kth positions to np.argpartition so that each of the last four positions ends up in sorted order, and then index arr with the result:
>>> arr[np.argpartition(arr, range(-4, 0))[-4:]]
array([-3.0992, -3.0302, -2.7103, -0.3319])
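If the indices of the top classes are needed in sorted order (not just the values), one common pattern is to partition first and then sort only the k selected entries. This is a sketch on the array from the question; the names k, part, top_sorted and top_values are illustrative.

```python
import numpy as np

arr = np.array([-3.0302, -2.7103, -7.4844, -3.4761, -5.3009, -5.2121, -3.7549,
                -4.7834, -5.8870, -3.4839, -5.0104, -3.0992, -4.8823, -0.3319,
                -6.8084])
k = 4

# Step 1: indices of the k largest values, in no particular order.
part = np.argpartition(arr, -k)[-k:]

# Step 2: sort just those k indices by their values (ascending).
top_sorted = part[np.argsort(arr[part])]
top_values = arr[top_sorted]

print(top_sorted)   # indices of the 4 largest values, smallest of them first
print(top_values)
```

Sorting only k entries keeps most of the speed advantage of argpartition over a full argsort when k is much smaller than the array length.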

Apply logsumexp to all the first element of array

I have a question. Under a specific variable, which for simplicity we call a, I have the following arrays written in this way.
[-6.396736847188359, -6.154559100742114, -6.211476547612676]
[-8.006589632001111, -7.826171257487284, -7.71335303949824]
[-6.456557174187878, -6.262447971939394, -6.38657184063457]
[-7.487923068341583, -7.189375715312779, -7.252991999097159]
[-7.532980499994895, -7.44329050097094, -7.529773039725542]
[-7.429923219897081, -6.960840780894108, -7.173489030350187]
[-7.194082458487091, -6.909676564074833, -6.944666159195248]
[-7.734357883680035, -7.512036612219159, -7.607808831503251]
[-7.734008421702387, -7.164880777772352, -7.709697714174302]
[-8.3156235828106, -8.486948182913475, -8.612390113851397]
How can I apply SciPy's logsumexp to each column? I tried logsumexp(a[0]) but it doesn't work. I also tried to iterate over a[0], but I got an error about float64.
Thanks to all
Use the axis parameter: logsumexp(a, axis=0) reduces down the rows and returns one value per column, while logsumexp(a, axis=1) returns one value per row.
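A short sketch of the axis argument, assuming the rows from the question are first stacked into a single 2-D array (only the first three rows are used here to keep it short). The names rows, per_column and per_row are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

# First three rows from the question, stacked into a (3, 3) array.
rows = [
    [-6.396736847188359, -6.154559100742114, -6.211476547612676],
    [-8.006589632001111, -7.826171257487284, -7.71335303949824],
    [-6.456557174187878, -6.262447971939394, -6.38657184063457],
]
a = np.vstack(rows)

per_column = logsumexp(a, axis=0)   # one value per column
per_row = logsumexp(a, axis=1)      # one value per row

print(per_column)
print(per_row)
```

If the rows are produced one at a time inside a loop, collecting them in a list and stacking once at the end gives the same 2-D array.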

Remove elements from array of arrays

I have an array of arrays from which I want to remove specific elements according to a logical command.
I have an array of arrays such that galaxies = ([[z1,ra1,dec1,distance1],[z2,ra2,dec2,distance2]...]), and I want to remove all elements whose distance term is greater than 1. I've tried to write "from galaxies[i], remove all galaxies such that galaxies[i][4]>1".
My code right now is:
galaxies_in_cluster = []
for i in range(len(galaxies)):
    galacticcluster = galaxies[~(galaxies[i][4]<=1)]
    galaxies_in_cluster.append(galacticcluster)
where
galaxies = [array([1.75000000e-01, 2.43794800e+02, 5.63820000e+01, 6.80000000e+00,
7.07290131e-02]),
array([1.75000000e-01, 2.40898000e+02, 5.15900000e+01, 7.10000000e+00,
5.60800387e+00]),
array([1.80000000e-01, 2.43792000e+02, 5.63990000e+01, 6.50000000e+00,
5.00059297e+02]),
array([1.75000000e-01, 2.43805000e+02, 5.62190000e+01, 7.80000000e+00,
2.16588562e-01])]
I want it to return
galaxies_in_cluster = [array([1.75000000e-01, 2.43794800e+02, 5.63820000e+01, 6.80000000e+00,
7.07290131e-02]), array([1.75000000e-01, 2.43805000e+02, 5.62190000e+01, 7.80000000e+00,
2.16588562e-01])]
(basically eliminating the second and third entries), but it's returning the first and second entries twice, which doesn't make sense to me, especially since for the second entry the distance is greater than 1.
Any help would be much appreciated.
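One way to get the result described above is to filter on the distance value directly, assuming the distance is the element at index 4 of each array, as in the sample data. The original expression indexes galaxies with a single boolean, which is presumably interpreted as index 0 or 1 and would explain why only the first two entries come back. The name stacked below is illustrative.

```python
import numpy as np

galaxies = [
    np.array([1.75e-01, 2.437948e+02, 5.6382e+01, 6.8, 7.07290131e-02]),
    np.array([1.75e-01, 2.40898e+02, 5.159e+01, 7.1, 5.60800387e+00]),
    np.array([1.80e-01, 2.43792e+02, 5.6399e+01, 6.5, 5.00059297e+02]),
    np.array([1.75e-01, 2.43805e+02, 5.6219e+01, 7.8, 2.16588562e-01]),
]

# Option 1: keep the list-of-arrays layout and filter with a comprehension.
galaxies_in_cluster = [g for g in galaxies if g[4] <= 1]

# Option 2: stack into one 2-D array and use a boolean mask on column 4.
stacked = np.vstack(galaxies)
in_cluster = stacked[stacked[:, 4] <= 1]

print(len(galaxies_in_cluster))
print(in_cluster)
```

Both options keep the first and fourth entries of the sample data.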

Sum of elements of numpy array not same as total

I'm trying to count the number of pairs and save them in two different histograms: one stores the pairs split by parent object, and the other stores just the total. That means I have a loop that looks like this:
for k in range(N_parents):
    pair_hist[k, bin] +=1
    total_pair_hist[bin] +=1
where both pair_hist and total_pair_hist are defined as,
pair_hist = np.zeros((N_parents, bins.shape[0]), dtype = np.uint64)
total_pair_hist = np.zeros(bins.shape[0], dtype = np.uint64)
I'd expect that summing the elements of pair_hist across all parents (axis=0), I'd get the total histogram. The funny thing is, if I take the sum of pair_hist:
onehalo_sum_ind = np.sum(pair_hist, axis = 0)
I don't get exactly total_pair_hist, but something slightly different:
total_pair_hist = [ 287248245 448773033 695820015 1070797576 1634146741 2466680801
3667159080 5334307986 7524739978 10206208064 13237161068 16466436715
19231751113 20949333183 21254336387 19497450101 16459529579 13038604111
9783826702 7006904025 4813946458 3207605915 2097437543 1355158303
869077173 555036759 353732683 225171870 143179912 0]
pair_hist = [ 287267022 448887401 696415932 1073435699 1644677789 2503693266
3784008845 5665555755 8380564635 12201977310 17382403650 23929909625
31103373709 36859534246 38146287402 33454446858 25689430007 18142721164
12224099624 8035266046 5211441720 3353187036 2147027818 1370663213
873519714 556182465 353995293 225224668 143189173 0]
Any idea of what's going on? Thank you in advance :)
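Sorry for the late reply, but I didn't have time to work on it before. The problem was caused by numba: I was using it with the parallel=True flag to parallelise one of the loops, and that caused the error. This is consistent with a race condition, where several threads presumably increment the same total_pair_hist[bin] entry at once and some updates are lost.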
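One way to sidestep the mismatch entirely is to keep only the per-parent histogram and derive the total from it. The sketch below is plain NumPy, not numba, and uses randomly generated parent and bin indices as stand-ins for the real pair data; all names except pair_hist and total_pair_hist are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_parents, n_bins, n_pairs = 4, 5, 1000

# Hypothetical pair data: which parent and which bin each pair falls into.
parent_idx = rng.integers(0, n_parents, size=n_pairs)
bin_idx = rng.integers(0, n_bins, size=n_pairs)

# np.add.at accumulates correctly even when the same cell appears many times.
pair_hist = np.zeros((n_parents, n_bins), dtype=np.uint64)
np.add.at(pair_hist, (parent_idx, bin_idx), np.uint64(1))

# The total is derived from the per-parent counts, so the two cannot disagree.
total_pair_hist = pair_hist.sum(axis=0)

print(total_pair_hist)
```

With a single source of truth there is no second counter that can drift out of sync.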

create a loop inside a loop in python

I am trying to use this program to get a list (phi0ex) of 211 arrays, each containing 251*251 elements.
All I get is a list of arrays of 251 elements. Please help.
data=loadtxt('data.csv',delimiter=',')
data1=data.transpose()
ngrains=loadtxt('nombre_grain.csv',delimiter=',')
phi0ex1=211*[zeros(shape(251*251))]
gr1=zeros(shape=(251,251))
for k in range(0,len(ngrains)):
    for i,j in enumerate(data1):
        for s in range(0,251):
            gr1[i]=where(s==ngrains[k],1,0)
    phi0ex1[k]=gr1
print(phi0ex1)
I found the solution. Thank you all for showing interest. Actually, the function where() operates on the whole array element by element (which I didn't know), so there is no need to put it inside another loop; only the loop over ngrains is needed.
from numpy import loadtxt, where

data = loadtxt('data.csv', delimiter=',')
data1 = data.transpose()
ngrains = loadtxt('nombre_grain.csv', delimiter=',')
# where() compares the whole array at once, so one mask per grain id is enough
phi0ex = []
for k in range(len(ngrains)):
    print(ngrains[k])
    phi0ex.append(where(data1 == ngrains[k], 1, 0))
print(phi0ex)
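The same idea can be tried without the CSV files. This is a small sketch on a made-up 3 x 3 grid of grain ids; the names grid, grain_ids and masks are illustrative and not from the post.

```python
import numpy as np

# Hypothetical grid where each cell holds a grain id.
grid = np.array([[1, 1, 2],
                 [3, 2, 2],
                 [3, 3, 1]])
grain_ids = np.unique(grid)

# One full-size 0/1 mask per grain id; where() handles every cell at once.
masks = [np.where(grid == g, 1, 0) for g in grain_ids]

for g, m in zip(grain_ids, masks):
    print(g)
    print(m)
```

Each mask has the same shape as the grid, which is the behaviour the question was after (one 251 x 251 array per grain).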
