Python For loop not incrementing - python

clean_offset = len(malware)
tuple_clean = []
tuple_malware = []
for i in malware:
tuple_malware.append([malware.index(i), 0])
print(malware.index(i))
print(tuple_malware)
for j in clean:
tuple_clean.append([(clean_offset + clean.index(j)), 1])
print(clean.index(j))
print(tuple_clean)
import pdb; pdb.set_trace()
training_data_size_mal = 0.8 * len(malware)
training_data_size_clean = 0.8 * len(clean)
i increments as normal and produces correct output however j remains at 0 for three loops and then jumps to 3. I don't understand this.

There is a logical error on clean.index(j).
Array.index will return the first matched index in that array.
So if there are some equal variables there will be some error
You can inspect with below code.
malware = [1,2,3,4,5,6,7,8,8,8,8,8,2]
clean = [1,2,3,4,4,4,4,4,4,2,4,4,4,4]
clean_offset = len(malware)
tuple_clean = []
tuple_malware = []
for i in malware:
tuple_malware.append([malware.index(i), 0])
print(malware.index(i))
print(tuple_malware)
for j in clean:
tuple_clean.append([(clean_offset + clean.index(j)), 1])
print(clean.index(j))
print(tuple_clean)
training_data_size_mal = 0.8 * len(malware)
training_data_size_clean = 0.8 * len(clean)

for a in something
a is what is contained in something, not the index
for example:
for n in [1, 10, 9, 3]:
print(n)
gives
1
10
9
3

You either want
for i in range(len(malware))
or
for i, element in enumerate(malware)
at which point the i is the count and the element in the malware.index(i)
The last one is considered best practice when needing both the index and the element at that index in the loop.

op has already figured the question, but in case anyone is wondering or needs a TL;DR of Barkin's comment, its just a small correction,
replace
for i in malware
for j in clean
with
for i in range(len(malware))
for j in range(len(clean))
and at the end remove the .index() function, and place i and j.

Related

Python ( iteration problem ) with an exercice

The code :
import pandas as pd
import numpy as np
import csv
data = pd.read_csv("/content/NYC_temperature.csv", header=None,names = ['temperatures'])
np.cumsum(data['temperatures'])
printcounter = 0
list_30 = [15.22]#first temperature , i could have also added it by doing : list_30.append(i)[0] since it's every 30 values but doesn't append the first one :)
list_2 = [] #this is for the values of the subtraction (for the second iteration)
for i in data['temperatures']:
if (printcounter == 30):
list_30.append(i)
printcounter = 0
printcounter += 1
**for x in list_30:
substract = list_30[x] - list_30[x+1]**
list_2.append(substraction)
print(max(list_2))
Hey guys ! i'm really having trouble with the black part.
**for x in list_30:
substract = list_30[x] - list_30[x+1]**
I'm trying to iterate over the elements and sub stracting element x with the next element (x+1) but the following error pops out TypeError: 'float' object is not iterable. I have also tried to iterate using x instead of list_30[x] but then when I use next(x) I have another error.
for x in list_30: will iterate on list_30, and affect to x, the value of the item in the list, not the index in the list.
for your case you would prefer to loop on your list with indexes:
index = 0
while index < len(list_30):
substract = list_30[index] - list_30[index + 1]
edit: you will still have a problem when you will reach the last element of list_30 as there will be no element of list_30[laste_index + 1],
so you should probably stop before the end with while index < len(list_30) -1:
in case you want the index and the value, you can do:
for i, v in enumerate(list_30):
substract = v - list_30[i + 1]
but the first one look cleaner i my opinion
if you`re trying to find ifference btw two adjacent elements of an array (like differentiate it), you shoul probably use zip function
inp = [1, 2, 3, 4, 5]
delta = []
for x0,x1 in zip(inp, inp[1:]):
delta.append(x1-x0)
print(delta)
note that list of deltas will be one shorter than the input

Write code that find maximum cuts of an iron bar

I'm currently trying to write a Python programm that can find the maxium cuts for a 5m iron bar.
The allowed cuts are : 0.5m, 1m, 1.2m, and 2m.
Given that I have to find all the combinations that match or get close to 5m.
Combinations exemple:
10*0.5 = 5m
7*0.5 + 1.2 = 4.7m
The second combination is also good because we cant produce a 0.3m cut so we throw it away.
The order inside a combination dosent matter,
7x0.5 + 1.2 = 1.2 + 7x0.5
so we have to count it only once.
So ideally my programm take a file in, and write a file with all the combinations like:
c1 = 10*0.5
c2 = 7*0.5 + 1.2
....
Can anyone share advice from where to start please ? Thank you.
Perhaps this is what you are looking for ?
x = [2, 1.2, 1, 0.5]
max_val = 5
def find_next(state, idx, check=False):
if(idx == len(x)):
return print_it(state, check)
if(sum(state) + x[idx] > max_val):
find_next(state, idx + 1, True)
else:
find_next(state + [x[idx]], idx)
find_next(state, idx + 1)
def print_it(state, check):
if check:
print("")
print(sum(state),state)
return
find_next([],0)
It recursively finds the next value until it runs out of values, and it tracks the current state with a simple list which it prints with print_it function. I will leave the formatting and file storage up to you, which can easily be implemented in that function.

Compute mean and variance on specific columns of 2D array only if a condition on element on each row is satisfied

I have a 2D numpy array with dimension (690L, 15L).
I need to compute a columns wise mean on this dataset only in some particolar columns, but with a condition: I need to include a row if and only if an element in the same row at specific column satisfy a condition. Let's me more cleare with some code.
f = open("data.data")
dataset = np.loadtxt(fname = f, delimiter = ',')
I have array with fullfilled with indexes where I need to perform mean (and variance)
index_catego = [0, 3, 4, 5, 7, 8, 10, 11]
The condition is that the dataset[i, 14] == 1
As output I want an 1D array with length like len(index_catego) where each element of this array is the mean of the previously columns
output = [mean_of_index_0, mean_of_index_3, ..., mean_of_index_11]
I am using Python recently but I am sure there is a cool way of doing this with np.where, mask, np.mean or something else.
I already implement a solution, but it is not elegant and I am not sure if it is correct.
import numpy as np
index_catego = [0, 3, 4, 5, 7, 8, 10, 11]
matrix_mean_positive = []
matrix_variance_positive = []
matrix_mean_negative = []
matrix_variance_negative = []
n_positive = 0
n_negative = 0
sum_positive = np.empty(len(index_catego))
sum_negative = np.empty(len(index_catego))
for i in range(dataset.shape[0]):
if dataset[i, 14] == 0:
n_positive = n_positive + 1
j = 0
for k in index_catego:
sum_positive[j] = sum_positive[j] + dataset[i, k]
j = j + 1
else:
n_negative = n_negative + 1
j = 0
for k in index_catego:
sum_negative[j] = sum_negative[j] + dataset[i, k]
j = j + 1
for item in np.nditer(sum_positive):
matrix_mean_positive.append(item / n_positive)
for item in np.nditer(sum_negative):
matrix_mean_negative.append(item / n_negative)
print(matrix_mean_positive)
print(matrix_mean_negative)
If you wanna try your solution, I put some data example
1,22.08,11.46,2,4,4,1.585,0,0,0,1,2,100,1213,0
0,22.67,7,2,8,4,0.165,0,0,0,0,2,160,1,0
0,29.58,1.75,1,4,4,1.25,0,0,0,1,2,280,1,0
0,21.67,11.5,1,5,3,0,1,1,11,1,2,0,1,1
1,20.17,8.17,2,6,4,1.96,1,1,14,0,2,60,159,1
0,15.83,0.585,2,8,8,1.5,1,1,2,0,2,100,1,1
1,17.42,6.5,2,3,4,0.125,0,0,0,0,2,60,101,0
Thanks for you help.
UPDATE 1:
I tried with this
output_positive = dataset[:, index_catego][dataset[:, 14] == 0]
mean_p = output_positive.mean(axis = 0)
print(mean_p)
output_negative = dataset[:, index_catego][dataset[:, 14] == 1]
mean_n = output_negative.mean(axis = 0)
print(mean_n)
but means computed by the first (solution not cool) and the second solution (one line cool solotion) are all different.
I checked what dataset[:, index_catego][dataset[:, 14] == 0] and dataset[:, index_catego][dataset[:, 14] == 1] select and seems correct (right dimension and right element).
UPDATE 2:
Ok, the first solution is wrong because (for example) the first column have as element only 0 and 1, but as mean return a value > 1. I do not know where I failed. Seems that the positive class is correct (or at least plausible), while negative class are not even plausible.
So, is it second solution correct? Is there a better way of doing it?
UPDATE 3:
I think I found the problem with the first solution: I am using jupyter notebook and sometimes (not all the times) when I rerun the same cell where the first solution is, element in matrix_mean_positive and matrix_mean_negative are doubled. If someone know why, could be point me?
Now both solution return the same means.
Do Kernel->Restart in jupyter notebook to clean the memory before rerun

Issues with using np.linalg.solve in Python

Below, I'm trying to code a Crank-Nicholson numerical solution to the Navier-Stokes equation for momentum (simplified with placeholders for time being), but am having issues with solving for umat[timecount,:], and keep getting the error "ValueError: setting an array element with a sequence". I'm extremely new to Python, does anyone know what I could do differently to avoid this problem?
Thanks!!
def step(timesteps,dn,dt,Numvpts,Cd,g,alpha,Sl,gamma,theta_L,umat):
for timecount in range(0, timesteps+1):
if timecount == 0:
umat[timecount,:] = 0
else:
Km = 1 #placeholder for eddy viscosity
thetaM = 278.15 #placeholder for theta_m for time being
A = Km*dt/(2*(dn**2))
B = (-g*dt/theta_L)*thetaM*np.sin(alpha)
C = -dt*(1/(2*Sl) + Cd)
W_arr = np.zeros(Numvpts+1)
D = np.zeros(Numvpts+1)
for x in (0,Numvpts): #creating the vertical veocity term
if x==0:
W_arr[x] = 0
D[x] = 0
else:
W_arr[x] = W_arr[x-1] - (dn/Sl)*umat[timecount-1,x-1]
D = W_arr/(4*dn)
coef_mat_u = Neumann_mat(Numvpts,D-A,(1+2*A),-(A+D))
b_arr_u = np.zeros(Numvpts+1) #the array of known quantities
umat_forward = umat[timecount-1,2:Numvpts]
umat_center = umat[timecount-1,1:Numvpts-1]
umat_backward = umat[timecount-1,0:Numvpts-2]
b_arr_u = np.zeros(Numvpts+1)
for j in (0,Numvpts):
if j==0:
b_arr_u[j] = 0
elif j==Numvpts:
b_arr_u[j] = 0
else:
b_arr_u[j] = (A+D[j])*umat_backward[j]*(1-2*A)*umat_center[j] + (A-D[j])*umat_forward[j] - C*(umat_center[j]*umat_center[j]) - B
umat[timecount,:] = np.linalg.solve(coef_mat_u,b_arr_u)
return(umat)
Please note that,
for i in (0, 20):
print(i),
will give result 0 20 not 0 1 2 3 4 ... 20
So you have to use the range() function
for i in range(0, 20 + 1):
print(i),
to get 0 1 2 3 4 ... 20
I have not gone through your code rigorously, but I think the problem is in your two inner for loops:
for x in (0,Numvpts): #creating the vertical veocity term
which is setting values only at zero th and (Numvpts-1) th index. I think you must use
for x in range(0,Numvpts):
Similar is the case in (range() must be used):
for j in (0,Numvpts):
Also, here j never becomes == Numvpts, but you are checking the condition? I guess it must be == Numvpts-1
And also the else condition is called for every index other than 0? So in your code the right hand side vector has same numbers from index 1 onwards!
I think the fundamental problem is that you are not using range(). Also it is a good idea to solve the NS eqns for a small grid and manually check the A and b matrix to see whether they are being set correctly.

How to break down an expression into for-loops

I'm not a python expert, and I ran into this snippet of code which actually works and produces the correct answer, but I'm not sure I understand what happens in the second line:
for i in range(len(motifs[0])):
best = ''.join([motifs[j][i] for j in range(len(motifs))])
profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
I was trying to replace it with something like:
for i in range(len(motifs[0])):
for j in range(len(motifs)):
best =[motifs[j][i]]
profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
and also tried to break down the last line like this:
for i in range(len(motifs[0])):
for j in range(len(motifs)):
best =[motifs[j][i]]
for base in 'ACGT':
profile.append(best.count(base)+1)/float(len(best)
I tried some more variations but non of them worked.
My question is: What are those expressions (second and third line of first code) mean and how would you break it down to a few lines?
Thanks :)
''.join([motifs[j][i] for j in range(len(motifs))])
is idiomatically written
''.join(m[i] for m in motifs)
so it concatenates the i'th entry of all motifs, in order. Similarly,
[(best.count(bseq)+1)/float(len(seq)) for base in 'ACGT']
builds a list of (best.count(bseq)+1)/float(len(seq)) values for of ACGT; since the base variable doesn't actually occur, it's a list containing the same value four times and can be simplified to
[(best.count(bseq)+1) / float(len(seq))] * 4
for i in range(len(motifs[0])):
seq = ''.join([motifs[j][i] for j in range(len(motifs))])
profile.append([(best.count(bseq)+1)/float(len(seq)) for base in 'ACGT'])
is equivalent to:
for i in range(len(motifs[0])):
seq = ''
for j in range(len(motifs)):
seq += motifs[j][i]
profile.append([(best.count(bseq)+1)/float(len(seq)) for base in 'ACGT'])
which can be improved in countless ways.
For example:
seqs = [ ''.join(motif) for motif in motifs ]
bc = best.count(bseq)+1
profilte.extend([ map(lambda x: bc / float(len(x)),
seq) for base in 'ACGT' ] for seq in seqs)
correctness of which, I cannot test due to lack of input/output conditions.
Closest I got without being able to test it
for i, _ in enumerate(motifs[0]):
seq = ""
for m in motifs:
seq += m[i]
tmp = []
for base in "ACGT":
tmp.append(best.count(bseq) + 1 / float(len(seq)))
profile.append(tmp)

Categories

Resources