Create a loop inside a loop in Python

I am trying to use this program to get a list (phi0ex) of 211 arrays, where each array contains 251*251 elements.
All I get is a list of arrays of 251 elements. Please help.
data = loadtxt('data.csv', delimiter=',')
data1 = data.transpose()
ngrains = loadtxt('nombre_grain.csv', delimiter=',')
phi0ex1 = 211 * [zeros(shape(251*251))]
gr1 = zeros(shape=(251, 251))
for k in range(0, len(ngrains)):
    for i, j in enumerate(data1):
        for s in range(0, 251):
            gr1[i] = where(s == ngrains[k], 1, 0)
    phi0ex1[k] = gr1
print phi0ex1

I found the solution, thank you guys for showing interest. Actually the function where() does the iteration itself (which I didn't know), so there is no need to put it in another loop; only the loop over ngrains will do the trick.
data = loadtxt('data.csv', delimiter=',')
data1 = data.transpose()
ngrains = loadtxt('nombre_grain.csv', delimiter=',')
phi0ex = len(ngrains) * [zeros(shape(250))]
for k in range(len(ngrains)):
    print ngrains[k]
    phi0ex[k] = where(data1 == ngrains[k], 1, 0)
print phi0ex
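For what it's worth, numpy broadcasting can drop even the remaining loop over ngrains: comparing the grid against all grain ids at once yields the whole stack in one expression. A minimal sketch with tiny stand-in arrays (shapes reduced for illustration):

```python
import numpy as np

# Tiny stand-ins for the asker's data (real shapes would be (251, 251) and (211,))
data1 = np.array([[0, 1],
                  [2, 1]])
ngrains = np.array([1, 2])

# Compare every grain id against the whole grid at once;
# the result has shape (len(ngrains),) + data1.shape
phi0ex = (data1[None, :, :] == ngrains[:, None, None]).astype(int)
```

Each phi0ex[k] is then the 0/1 mask for grain ngrains[k], with no Python-level loop at all.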

Related

Python: Use the "i" counter in while loop as digit for expressions

This seems like it should be very simple, but I am not sure of the proper syntax in Python. To streamline my code I want a while loop (or a for loop if that is better) to cycle through 9 datasets, using the counter to pick the correct file.
I would like to use the "i" variable within the while loop so that, for each sequentially named file, I can get the average of 2 arrays, the max-min of this delta, and the max-min of another array.
Below is example code of what I am trying to do, but avg(i) and temp(i) in the loop do not seem proper. Thank you very much for any help; I will continue to look for solutions, but I am unsure how best to phrase the search.
temp1 = pd.read_excel("/content/113VW.xlsx")
temp2 = pd.read_excel("/content/113W6.xlsx")
..-> temp9
i=1
while i<=9
    avg(i) = np.mean(np.array([temp(i)['CC_H='], temp(i)['CC_V=']]), axis=0)
    Delta(i) = (np.max(avg(i))) - (np.min(avg(i)))
    deltaT(i) = (np.max(temp(i)['temperature=']) - np.min(temp(i)['temperature=']))
    i += 1
E.g. the slow method would be repeating this code for each file:
avg1 =np.mean(np.array([temp1['CC_H='],temp1['CC_V=']]),axis=0)
Delta1=(np.max(avg1))-(np.min(avg1))
deltaT1=(np.max(temp1['temperature='])-np.min(temp1['temperature=']))
avg2 =np.mean(np.array([temp2['CC_H='],temp2['CC_V=']]),axis=0)
Delta2=(np.max(avg2))-(np.min(avg2))
deltaT2=(np.max(temp2['temperature='])-np.min(temp2['temperature=']))
......
Think of things in terms of lists.
temps = []
for name in ('113VW', '113W6', ...):
    temps.append(pd.read_excel(f"/content/{name}.xlsx"))

avg = []
Delta = []
deltaT = []
for data in temps:
    avg.append(np.mean(np.array([data['CC_H='], data['CC_V=']]), axis=0))
    Delta.append(np.max(avg[-1]) - np.min(avg[-1]))
    deltaT.append(np.max(data['temperature=']) - np.min(data['temperature=']))
You could just do your computations inside the first loop, if you don't need the dataframes after that point.
The way that I would tackle this problem would be to create a list of filenames, and then iterate through them to do the necessary calculations as per the following:
import numpy as np
import pandas as pd

# Place the files to read into this list
files_to_read = ["/content/113VW.xlsx", "/content/113W6.xlsx"]
results = []
for i, filename in enumerate(files_to_read):
    temp = pd.read_excel(filename)
    avg_val = np.mean(np.array([temp['CC_H='], temp['CC_V=']]), axis=0)
    Delta = np.max(avg_val) - np.min(avg_val)
    deltaT = np.max(temp['temperature=']) - np.min(temp['temperature='])
    results.append({"avg": avg_val, "Delta": Delta, "deltaT": deltaT})

# Create a dataframe to show the results
df = pd.DataFrame(results)
print(df)
I have included the enumerate feature to grab the index (or i) should you want to access it for anything, or include it in the results. For example, you could change the results.append line to something like this:
results.append({"index":i, "Filename":filename, "avg":avg_val, "Delta":Delta, "deltaT":deltaT})
Not sure if I understood the question correctly. But if you want to read the files inside a loop using indexes (i variable), you can create a list to hold the contents of the excel files instead of using 9 different variables.
something like
files = []
files.append(pd.read_excel("/content/113VW.xlsx"))
files.append(pd.read_excel("/content/113W6.xlsx"))
...
then use the index variable to iterate over the list
avg = []
i = 0
while i <= 8:  # list indices run 0..8 for the nine files
    avg.append(np.mean(np.array([files[i]['CC_H='], files[i]['CC_V=']]), axis=0))
    ...
    i += 1
P.S.: I am not a Pandas/NumPy expert, so you may have to adapt the code to your needs
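Tying the answers above together, here is a minimal runnable version of the list-based pattern. Small synthetic DataFrames stand in for the nine spreadsheets; the column names are taken from the question:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the nine Excel files (real code would use pd.read_excel)
temps = [
    pd.DataFrame({'CC_H=': [1.0, 3.0],
                  'CC_V=': [2.0, 4.0],
                  'temperature=': [20.0, 25.0]})
    for _ in range(9)
]

avg, Delta, deltaT = [], [], []
for data in temps:
    a = np.mean(np.array([data['CC_H='], data['CC_V=']]), axis=0)
    avg.append(a)                                  # per-file average of the two arrays
    Delta.append(np.max(a) - np.min(a))            # max-min of that average
    deltaT.append(np.max(data['temperature=']) - np.min(data['temperature=']))
```

Instead of avg1 … avg9, the results live in avg[0] … avg[8], so the counter is only needed if you want it.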

How can I iterate over a numpy multidimensional array of dataframes?

I'm trying to iterate over a multidimensional array in Python, but I'm having problems because my array is full of dataframes instead of int().
I have a multidimensional numpy array of shape (12, 11) which contains 12 x 11 different dataframes.
nombres_df = np.array([[df_01_ID4034, df_02_ID4034, df_03_ID4034, df_04_ID4034, df_05_ID4034, df_06_ID4034, df_07_ID4034, df_08_ID4034, df_09_ID4034, df_10_ID4034, df_11_ID4034, df_12_ID4034],
[df_01_ID4035, df_02_ID4035, df_03_ID4035, df_04_ID4035, df_05_ID4035, df_06_ID4035, df_07_ID4035, df_08_ID4035, df_09_ID4035, df_10_ID4035, df_11_ID4035, df_12_ID4035],
[df_01_ID4039, df_02_ID4039, df_03_ID4039, df_04_ID4039, df_05_ID4039, df_06_ID4039, df_07_ID4039, df_08_ID4039, df_09_ID4039, df_10_ID4039, df_11_ID4039, df_12_ID4039],
[df_01_ID4040, df_02_ID4040, df_03_ID4040, df_04_ID4040, df_05_ID4040, df_06_ID4040, df_07_ID4040, df_08_ID4040, df_09_ID4040, df_10_ID4040, df_11_ID4040, df_12_ID4040],
[df_01_ID4041, df_02_ID4041, df_03_ID4041, df_04_ID4041, df_05_ID4041, df_06_ID4041, df_07_ID4041, df_08_ID4041, df_09_ID4041, df_10_ID4041, df_11_ID4041, df_12_ID4041],
[df_01_ID4042, df_02_ID4042, df_03_ID4042, df_04_ID4042, df_05_ID4042, df_06_ID4042, df_07_ID4042, df_08_ID4042, df_09_ID4042, df_10_ID4042, df_11_ID4042, df_12_ID4042],
[df_01_ID4047, df_02_ID4047, df_03_ID4047, df_04_ID4047, df_05_ID4047, df_06_ID4047, df_07_ID4047, df_08_ID4047, df_09_ID4047, df_10_ID4047, df_11_ID4047, df_12_ID4047],
[df_01_ID4049, df_02_ID4049, df_03_ID4049, df_04_ID4049, df_05_ID4049, df_06_ID4049, df_07_ID4049, df_08_ID4049, df_09_ID4049, df_10_ID4049, df_11_ID4049, df_12_ID4049],
[df_01_ID4056, df_02_ID4056, df_03_ID4056, df_04_ID4056, df_05_ID4056, df_06_ID4056, df_07_ID4056, df_08_ID4056, df_09_ID4056, df_10_ID4056, df_11_ID4056, df_12_ID4056],
[df_01_ID4059, df_02_ID4059, df_03_ID4059, df_04_ID4059, df_05_ID4059, df_06_ID4059, df_07_ID4059, df_08_ID4059, df_09_ID4059, df_10_ID4059, df_11_ID4059, df_12_ID4059],
[df_01_ID4075, df_02_ID4075, df_03_ID4075, df_04_ID4075, df_05_ID4075, df_06_ID4075, df_07_ID4075, df_08_ID4075, df_09_ID4075, df_10_ID4075, df_11_ID4075, df_12_ID4075]], dtype="object")
for j in range(len(nombres_df)):
    for i in range(len(nombres_df[i])):
        print (nombres_df[i][j])
I need to iterate over it and make operations with values inside each dataframe.
The problem is that when I try to iterate as usually, I cannot do it because I'm getting this error:
5
6 for j in range(len(nombres_df)):
7 for i in range(len(nombres_df[i])): <--------
8 print (nombres_df[i][j], end = " ")
IndexError: arrays used as indices must be of integer (or boolean) type
I know the problem is here: len(nombres_df[i]), but I don't know how to solve it.
Thank you very much
I think the problem is that you are iterating over the wrong index in the second line of your code:
the i inside range(len(nombres_df[i])) should be j.
Also, you inverted the indexes in nombres_df[i][j]; it should be nombres_df[j][i].
This should do the trick:
for j in range(len(nombres_df)):
    for i in range(len(nombres_df[j])):
        print(nombres_df[j][i])
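As an aside, numpy's np.ndenumerate walks every cell of an object array and hands you the index pair along with the stored DataFrame, which avoids getting the nested indices wrong in the first place. A small sketch with stand-in frames (a 2x3 array instead of 11x12):

```python
import numpy as np
import pandas as pd

# A small 2x3 object array of DataFrames standing in for nombres_df
dfs = np.empty((2, 3), dtype=object)
for j in range(2):
    for i in range(3):
        dfs[j, i] = pd.DataFrame({'v': [j, i]})

# np.ndenumerate yields ((row, col), element) for every cell,
# so the DataFrame and its position arrive together
total = 0
for (j, i), df in np.ndenumerate(dfs):
    total += df['v'].sum()
```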

What's the problem in my for loop code to separate and make a new DataFrame from my original data?

I'm a beginner at Python and Pandas.
I have original data, which I defined as F1, with shape (194000, 4).
I want to split it into 97 groups of 2,000 rows each (e.g. index 0~1999 is F1_0, 2000~3999 is F1_1).
I wrote the code below.
n=0
for i in (0, 97):
    num=2000*(i+1)
    globals()['F1_{0}'.format(i)] = F1.loc[n:num]
    n = A
When I call F1_0, there is no problem.
But from F1_1 to F1_96 I get a "not defined" error.
I don't know the reason :(
I'd also appreciate it if you could let me know whether there is a better way.
Thanks for reading.
Use range instead of passing a tuple to the loop. In your code, the for loop iterates over the values 0 and 97 only, not over the range 0, ..., 96.
n = 0
for i in range(97):
    num = 2000 * (i + 1)
    globals()['F1_{0}'.format(i)] = F1.loc[n:num - 1]  # .loc slicing is end-inclusive
    n = num
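As for a better way: rather than creating 97 global variables, you could keep the pieces in a list. numpy's array_split does the splitting in one call; a sketch with a synthetic F1 of the stated shape:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's F1, shape (194000, 4)
F1 = pd.DataFrame(np.zeros((194000, 4)))

# 97 equal chunks of 2000 rows each, kept in a list instead of 97 globals
chunks = np.array_split(F1, 97)
```

chunks[0] then plays the role of F1_0, chunks[1] of F1_1, and so on, and the pieces can be looped over directly.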

optimizing nested for loop wrt time taken

I have a dictionary of dictionaries, say data, and a numpy array, say stats.
I am trying to check whether:
the first and second columns of the numpy array fall within a range of 2 keys each in my dictionary of dictionaries, OR whether those 2 keys fall within a range of columns in my numpy array.
My code is below for reference.
The main issue is that this takes a lot of time; I would really appreciate any help on making it run faster.
Any help will be appreciated, thank you.
final = []
for x, y, w, h, area in stats[:]:
    valid = True
    if any([(x in range(s["hpos_start"]-2, s["hpos_end"] + 2) and y in range(s["vpos_start"]-2, s["vpos_end"] + 2)) or ((int(s['hpos_start']) in range(x, x+w) and int(s['vpos_start']) in range(y, y+h))) for _, s in data.items()]):
        valid = False
    if valid:
        final.append([x, y, w, h])
sample for stats =
[[ 246 1102 1678 2214 172182],
[ 678 1005 1688 2214 3528850],
[ 1031 241 17 23 331]]
sample for data =
{'0': {
'hpos_start': 244,
'hpos_end': 296,
'vpos_start': 1099,
'vpos_end': 3898,
},
'1': {
'hpos_start': 679,
'hpos_end': 952,
'vpos_start': 231,
'vpos_end': 281
},
'2': {'hpos_start': 1077,
'hpos_end': 1174,
'vpos_start': 231,
'vpos_end': 281
}}
stats is about size (352, 5) and data about size 212; both can be larger than that.
My suggestion would be to turn both stats and data into numpy arrays and then figuring out a way to achieve your particular filtering without the need to use explicit for-loops. That's the whole advantage of numpy! Also then you'd use special indexing methods to generate your final array instead of building it piece by piece. Appending to a list can be somewhat slow...
For a small but easy-to-implement speedup: When you use any or all in your code, you should avoid passing it a list when you can pass it a generator expression instead. If you just remove the square brackets from inside the any, you should see a little speedup, because you will avoid always building the full intermediate list! The cool thing about any (and all) is that, when working on iterators, they have what's called short-circuiting: As soon as any finds an item that's True, it knows it can just stop looking at the rest of the items because the answer will be true. Likewise as soon as all finds an item that's False it will stop looking at the rest and just return False.
But really, turn your inputs into numpy arrays (or maybe a numpy array and a pandas dataframe) and then try to figure out a way to avoid for-loops.
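Following that suggestion, here is one way the filter could be vectorized, using the sample stats and data from the question. The limit tests mirror the original range checks (range(a, b) includes a but excludes b); treat this as a sketch to adapt, not a drop-in replacement:

```python
import numpy as np

stats = np.array([[246, 1102, 1678, 2214, 172182],
                  [678, 1005, 1688, 2214, 3528850],
                  [1031, 241, 17, 23, 331]])
data = {'0': {'hpos_start': 244, 'hpos_end': 296, 'vpos_start': 1099, 'vpos_end': 3898},
        '1': {'hpos_start': 679, 'hpos_end': 952, 'vpos_start': 231, 'vpos_end': 281},
        '2': {'hpos_start': 1077, 'hpos_end': 1174, 'vpos_start': 231, 'vpos_end': 281}}

# Pack the dict values into flat arrays, one element per region
hs = np.array([s['hpos_start'] for s in data.values()])
he = np.array([s['hpos_end'] for s in data.values()])
vs = np.array([s['vpos_start'] for s in data.values()])
ve = np.array([s['vpos_end'] for s in data.values()])

x, y, w, h = (stats[:, i] for i in range(4))

# x in range(hs-2, he+2)  <=>  hs-2 <= x <= he+1 (range excludes its upper bound)
cond1 = ((x[:, None] >= hs - 2) & (x[:, None] <= he + 1)
         & (y[:, None] >= vs - 2) & (y[:, None] <= ve + 1))
# hs in range(x, x+w)  <=>  x <= hs <= x+w-1
cond2 = ((hs >= x[:, None]) & (hs <= (x + w - 1)[:, None])
         & (vs >= y[:, None]) & (vs <= (y + h - 1)[:, None]))

# Row i is invalid if it matches ANY region j; keep the rest
invalid = (cond1 | cond2).any(axis=1)
final = stats[~invalid][:, :4]
```

Each comparison builds a (rows, regions) boolean matrix via broadcasting, so all 352 x 212 pairs are checked in a handful of array operations instead of a Python double loop.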
To test and time your code, I ran it in a for loop 100,000 times; it runs in 1.406 seconds. I propose the code below: replace the "in range" tests with limit tests and suppress the int(..) casts, and it runs in 0.609 seconds on my PC. Try it on your PC and see what speedup you gain:
import time

start = time.process_time()
for i in range(100000):  # 100000 iterations: 0.609 s
    final = []
    for x, y, w, h, area in stats:
        if any([
            (s["hpos_start"] - 2 <= x <= s["hpos_end"] + 2
             and s["vpos_start"] - 2 <= y <= s["vpos_end"] + 2)
            or (x <= s['hpos_start'] <= x + w
                and y <= s['vpos_start'] <= y + h)
            for s in data.values()
        ]):
            pass
        else:
            final.append([x, y, w, h])
print(time.process_time() - start)

Access Python Row index

I am trying to access the row index as a variable, not as a list nor anything else.
I tried different methods without success; can anyone help in any way?
Thanks.
ddo = df[(df.iloc[:, 3] == 4) & (df.iloc[:, 5] == 2) & (df.iloc[:, 6] == 2) & (df.iloc[:, 15] >= 0.02)]
starttime = ddo.iloc[0, 1]
starttimerow = ddo.index[ddo.iloc[0, 1] == starttime]
I expect the output to be a list, not an array:
array([[103971, 104031, 104090, 104149, 104209, 104269, 104327, 104385,
104445, 104503, 104562, 104621, 104680, 104737, 104797, 104856,
104914, 104973, 105032, 105091, 105149, 105209, 105267, 105326,
105384, 105443, 105502, 105561, 105620, 105679, 105738, 105796,
105855, 105914, 105972, 106032, 106091, 106150, 106209, 106268,
106326, 106385, 106444, 106502, 106562, 106621, 106680, 106739,
106798, 106856, 106915, 106974, 107032, 107092, 107151, 107210,
107269, 107328, 107386, 107445, 107505, 107565, 107627, 107688,
107751, 107813, 107875, 107935, 107998, 108059, 108120]],
dtype=int64)
I tried these lines
enddwellrow=df.loc[(ddo.iloc[:,1]==end_dwell)&(ddo.iloc[:,3]==4)].reset_index()
enddwell_idx=enddwellrow['index'][0]
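If the goal is simply a plain Python list rather than an Index or ndarray, pandas objects have a .tolist() conversion. A minimal sketch with a hypothetical frame standing in for the asker's df:

```python
import pandas as pd

# Hypothetical frame standing in for the asker's df
df = pd.DataFrame({'a': [10, 20, 30], 'b': [1.0, 2.0, 3.0]})

idx_list = df.index.tolist()            # Index  -> plain Python list
arr_list = df['a'].to_numpy().tolist()  # ndarray -> plain Python list
```

The same .tolist() call would turn the int64 array shown above into a nested list, and applies equally to ddo.index.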
