Dumb error in my if/and statement - not seeing it - python

I have a data set with float values:
dog-auto dog-bird dog-cat dog-dog Result
41.9579761457 41.7538647304 36.4196077068 33.4773590373 0
46.0021331807 41.33958925 38.8353268874 32.8458495684 0
42.9462290692 38.6157590853 36.9763410854 35.0397073189 0
41.6866060048 37.0892269954 34.575072914 33.9010327697 0
39.2269664935 38.272288694 34.778824791 37.4849250909 0
40.5845117698 39.1462089236 35.1171578292 34.945165344 0
45.1067352961 40.523040106 40.6095830913 39.0957278345 0
41.3221140974 38.1947918393 39.9036867306 37.7696131032 0
41.8244654995 40.1567131661 38.0674700168 35.1089144603 0
45.4976929401 45.5597962603 42.7258732951 43.2422832585 0
This is an SFrame. I have attempted to write a function that uses an if/and statement to determine whether the value for dog-dog is less than the values for dog-cat AND dog-auto AND dog-bird.
I've gone through this for the better part of 4 hours. Admittedly I'm a newbie to Python - I'm making a silly mistake and just not seeing it.
If statement:
def is_dog_correct(row):
    if (dog_distances[dog_distances['dog-dog']] < dog_distances[dog_distances['dog-cat']]) & (dog_distances[dog_distances['dog-dog']] < dog_distances[dog_distances['dog-bird']]) & (dog_distances[dog_distances['dog-dog']] < dog_distances[dog_distances['dog-auto']]):
        dog_distances['Result'] = 1
    else:
        dog_distances['Result'] = 0
then I call the function with:
dog_distances.apply(is_dog_correct)
If this was working correctly, I would see "0" in every row but the fifth record. What is wrong with my if statement?
Full disclosure - this is coursework, but after spending 4 hours on this, I'm reaching for help!

Change & to and, as indicated in the previous comments. Note also that the function receives a single row, so compare the values in row rather than indexing the whole dog_distances SFrame, and return the result instead of assigning it inside the function. I also recommend breaking such a long if statement across multiple lines so it is clearer and easier to read.
def is_dog_correct(row):
    # row is the dict for one record, as passed by dog_distances.apply
    if (row['dog-dog'] < row['dog-cat'] and
            row['dog-dog'] < row['dog-bird'] and
            row['dog-dog'] < row['dog-auto']):
        return 1
    else:
        return 0
dog_distances['Result'] = dog_distances.apply(is_dog_correct)

Make your if statement cleaner by taking the min (minimum) of the other values. This checks that 'dog-dog' is less than all of the rest:
def is_dog_correct(row):
    # row is the dict for one record, as passed by dog_distances.apply
    if row['dog-dog'] < min([row['dog-' + x] for x in ['cat', 'bird', 'auto']]):
        return 0
    else:
        return 1
EDIT: For debugging purposes, use the following:
def is_dog_correct(row):
    # row is the dict for one record, as passed by dog_distances.apply
    others = [row['dog-' + x] for x in ['cat', 'bird', 'auto']]
    print('dog is {}'.format(row['dog-dog']))
    print('everyone else is {}'.format(others))
    if row['dog-dog'] < min(others):
        print('Yay dog is faster')
        return 0
    else:
        print('Awww, dog is not faster')
        return 1
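As a minimal sketch of the row-wise comparison, assuming each row reaches the function as a plain dict keyed by column name (which is how SFrame.apply passes rows), the check can be tried without an SFrame at all. The two sample rows are the first and fifth records from the question. Returning 1 when dog-dog is the smallest follows the question's code and is an assumption about the intended output.

```python
def is_dog_correct(row):
    # Compare dog-dog against the other three distances in the same row.
    others = [row['dog-cat'], row['dog-bird'], row['dog-auto']]
    return 1 if row['dog-dog'] < min(others) else 0

# First and fifth records from the question, written as plain dicts.
rows = [
    {'dog-auto': 41.9579761457, 'dog-bird': 41.7538647304,
     'dog-cat': 36.4196077068, 'dog-dog': 33.4773590373},
    {'dog-auto': 39.2269664935, 'dog-bird': 38.272288694,
     'dog-cat': 34.778824791, 'dog-dog': 37.4849250909},
]

results = [is_dog_correct(r) for r in rows]
```

In the fifth record dog-cat is smaller than dog-dog, so that row is the one that gets a different value.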

Related

Using only for loop and if statement (no built in functions), group the similar values in a column and add the corresponding values in another column

I have a following dataframe - df (this is a demo one, actual one is very big):
Text             Score
'I love pizza!'  2
'I love pizza!'  1
'I love pizza!'  3
'Python rules!'  0
'Python rules!'  5
I want to group the 'Text' column values and then add the following rows of the 'Score' column.
The output I desire is thus:
Text             Score  Sum
'I love pizza!'  2      6
'I love pizza!'  1      6
'I love pizza!'  3      6
'Python rules!'  0      5
'Python rules!'  5      5
I know how to get the desired output using Python/Pandas groupby and sum() (and aggregate) methods, for instance,
df1 = df.groupby('Text')['Score'].sum().reset_index(name='Sum')
df3 = df.merge(df1, on='Text', how='left')
However, I do not want to use any such in-built functions. I want to only use simple for loop and if statement to accomplish this.
I tried doing this the following way:
def func(df):
    # NOTE, CANNOT USE LIST APPEND (as it is an in-built function).
    sum = 0
    n = len(df['text']) # NEED TO WORK FOR-LOOP USING INTEGERS AND HENCE NEED LENGTH
    for i in range(0,n):
        exists = False #flag to track repeated values
        for j in range(i+1,n):
            if df['text'][i] == df['text'][j]: # IF TRUE, THEN THE 'TEXT' ROWS ARE SIMILAR I.E. GROUPED
                exists = True
                sum = df['score'][i] + df['score'][j]
                break;
        if not exists:
            sum += sum
    return sum
df['Sum'] = func(df)
The output for this script is incorrect:
Text             Score  Sum
'I love pizza!'  2      10
'I love pizza!'  1      10
'I love pizza!'  3      10
'Python rules!'  0      10
'Python rules!'  5      10
I have tried playing around with the above script, I get different results, but never the correct one. Any help with this is greatly appreciated!
Thank you so much in advance!
Here is a script that produces the correct output for the question above:
def func(df):
    result = []
    final_result = []
    n = len(df['Text'])
    # Add a list of zeros the same length as the original list (= n) to flag positions already checked
    flags = [0] * n
    for k in range(0,n):
        sum = df['Score'][k]
        for i in range(0,n):
            # Skip (continue) without doing anything if the position has already been flagged (processed, counted)
            if flags[i]:
                continue
            else:
                if i==k:
                    for j in range(i+1,n):
                        if df['Text'][i]==df['Text'][j]: # If true, then the 'Text' rows are similar, i.e. grouped
                            # Every time there is a match, the position is flagged by setting it to 1
                            flags[j] = 1
                            sum += df['Score'][j]
                    result = sum
                    break
        final_result += [result]
    return final_result
df['Sum'] = func(df)
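The flag-based script reuses the previous group's sum for flagged rows, so it appears to rely on equal texts sitting next to each other. For comparison, here is a simpler sketch that, for each position, adds up the scores of every row with the same text. It works on plain lists rather than a DataFrame. The name group_sums and the list-based interface are illustrative assumptions, and it still uses len and range, as the code above does.

```python
def group_sums(texts, scores):
    n = len(texts)
    sums = [0] * n  # pre-sized list, so no append is needed
    for i in range(n):
        total = 0
        for j in range(n):
            # Add every score whose text matches the text at position i.
            if texts[j] == texts[i]:
                total += scores[j]
        sums[i] = total
    return sums

texts = ['I love pizza!', 'I love pizza!', 'I love pizza!',
         'Python rules!', 'Python rules!']
scores = [2, 1, 3, 0, 5]
sums = group_sums(texts, scores)
```

The nested loop is quadratic in the number of rows, which is the price of avoiding groupby.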

problem calculating minimum amount of coins in change using python

I have a homework assignment in which we have to write a program that outputs the change to be given by a vending machine using the lowest number of coins. E.g. £3.67 can be dispensed as 1x£2 + 1x£1 + 1x50p + 1x10p + 1x5p + 1x2p.
However, my program is outputting the wrong numbers. I know there will probably be rounding issues, but I think the current issue is to do with my method of coding this.
change=float(input("Input change"))
twocount=0
onecount=0
halfcount=0
pttwocount=0
ptonecount=0
while change!=0:
    if change-2>-1:
        change=change-2
        twocount+=1
    else:
        if change-1>-1:
            change=change-1
            onecount+=1
        else:
            if change-0.5>-1:
                change=change-0.5
                halfcount+=1
            else:
                if change-0.2>-1:
                    change=change-0.2
                    pttwocount+=1
                else:
                    if change-0.1>-1:
                        change=change-0.1
                        ptonecount+=1
                    else:
                        break
print(twocount,onecount,halfcount,pttwocount,ptonecount)
RESULTS:
Input: 2.3
Output: 11010
i.e. 3.2
Input: 3.2
Output:20010
i.e. 4.2
Input: 2
Output: 10001
i.e. 2.1
All your comparisons use >-1, so you keep giving out change as long as the remaining balance is more than -1.
This would be correct if you were only dealing with integers, because for integers >-1 is equivalent to >=0.
For floating point numbers, however, we have for example -0.5>-1, so the program gives out change even when the balance goes negative (which we do not want).
So the correct way is to replace all >-1 comparisons with >=0 (greater than or equal to 0) comparisons.
The problem is how your if/else statements calculate the change. If you walk through the first example, change-2>-1 is true and the remaining change becomes 0.3. On the next pass you expect change-1>-1 to be false, but it is not: change-1 is -0.7, which is greater than -1. One of the better ways to do this is with Python's floor division (//) and modulo (%) operators. You have to round some of the intermediate results because of the way Python handles floats:
change=float(input("Input change"))
twocount=0
onecount=0
halfcount=0
pttwocount=0
ptonecount=0
twocount = int(change//2)
change = round(change%2,1)
if change//1 > 0:
    onecount = int(change//1)
    change = round(change%1,1)
if change//0.5 > 0:
    halfcount = int(change//0.5)
    change = round(change%0.5, 1)
if change//0.2 > 0:
    pttwocount = int(change//0.2)
    change = round(change%0.2, 1)
if change//0.1 > 0:
    ptonecount = int(change//0.1)
    change = round(change%0.1,1)
print(twocount,onecount,halfcount,pttwocount,ptonecount)
For the given inputs this produces:
Input: 2.3
Output: 1 0 0 1 1
Input: 3.2
Output:1 1 0 1 0
Input: 2
Output: 1 0 0 0 0
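Another way to sidestep the float rounding is to convert the amount to a whole number of pence once and do all the arithmetic on integers. The sketch below assumes the coin set 200, 100, 50, 20, 10, 5, 2, 1 (the smaller coins are taken from the £3.67 example in the question); the function name make_change is hypothetical.

```python
def make_change(amount_pounds):
    # Round once, up front, so every later step is exact integer arithmetic.
    pence = int(round(amount_pounds * 100))
    counts = {}
    for coin in (200, 100, 50, 20, 10, 5, 2, 1):
        # divmod gives (how many of this coin, what is left over).
        counts[coin], pence = divmod(pence, coin)
    return counts

example = make_change(3.67)
```

Working from the largest coin down uses the fewest coins for this particular coin set; that greedy property is not guaranteed for arbitrary denominations.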

Is there a way to increment the iterator if an 'if' condition is met

I'm solving this HackerRank challenge:
Alice has a binary string. She thinks a binary string is beautiful if and only if it doesn't contain the substring '010'.
In one step, Alice can change a 0 to a 1 or vice versa. Count and print the minimum number of steps needed to make Alice see the string as beautiful.
So basically count the number of '010' occurrences in the string 'b' passed to the function.
I want to increment i by 2 once the if statement is true so that I don't include overlapping '010' strings in my count.
I do realize that I can just use the count method, but I want to know why my code isn't working the way I want it to.
def beautifulBinaryString(b):
    count = 0
    for i in range(len(b)-2):
        if b[i:i+3]=='010':
            count+=1
            i+=2
    return count
Input: 0101010
Expected Output: 2
Output I get w/ this code: 3
You are counting overlapping sequences. For your input 0101010 you find 010 three times, but the middle 010 overlaps with the outer two 010 sequences:
0101010
--- ---
  ---
You can't increment i in a for loop, because the for loop construct sets i at the top. Giving i a different value inside the loop body doesn't change this.
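A small sketch of that behaviour: the loop below tries to skip ahead by changing i, yet every value from range(5) is still visited. The list name visited is only for illustration.

```python
visited = []
for i in range(5):
    visited.append(i)
    i += 2  # rebinds the name i for the rest of this pass only

# On the next pass the for statement assigns the next value from range(5) to i,
# so the increment above is simply overwritten.
```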
Don't use a for loop; you could use a while loop:
def beautifulBinaryString(b):
    count = 0
    i = 0
    while i < len(b) - 2:
        if b[i:i+3]=='010':
            count += 1
            i += 2
        i += 1
    return count
A simpler solution is to just use b.count("010"), as you stated.
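As a quick check, str.count counts non-overlapping occurrences, which is why it matches the expected answer, while testing every start position also picks up the overlapping match in the middle. The variable names here are illustrative.

```python
b = "0101010"

# str.count skips past each match, so overlapping matches are not counted.
non_overlapping = b.count("010")

# Checking every start position counts the overlapping middle match as well.
overlapping = sum(1 for i in range(len(b) - 2) if b[i:i + 3] == "010")
```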
If you want to do it using a for loop, you can add a delta variable to keep track of the number of positions that you have to jump over the current i value.
def beautifulBinaryString(b):
    count = 0
    delta = 0
    for i in range(len(b)-2):
        try:
            if b[i+delta:i+delta+3]=='010':
                count+=1
                delta=delta+2
        except IndexError:
            break
    return count
You don't need to count the occurrences; as soon as you find one occurrence, the string is "ugly". If you never find one, it's beautiful.
def is_beautiful(b):
    for i in range(len(b) - 2):
        if b[i:i+3] == '010':
            return False
    return True
You can also avoid the slicing by simply keeping track of whether you've started to see 010:
def is_beautiful(b):
    seen_0 = False
    seen_01 = False
    for c in b:
        if seen_01 and c == '0':
            return False
        elif seen_0 and c == '1':
            seen_01 = True
            seen_0 = False  # so that a second 1 in a row resets the state
        elif c == '0':
            seen_0 = True
        else:
            # c == '1', but it doesn't follow a 0
            seen_0 = False
            seen_01 = False
    return True

Consensus string & profile matrix

I have written the following python code to solve one of the Rosalind problems (http://rosalind.info/problems/cons/) and for some reason, Rosalind says the answer is wrong but I did some spot-checking and it appears right.
The problem is as follows:
Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
A sample dataset is:
>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT
A sample solution is:
ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6
My attempt to solve this:
from Bio import SeqIO
A,C,G,T = [],[],[],[]
consensus=""
for i in range(0,len(record.seq)):
    countA,countC,countG,countT=0,0,0,0
    for record in SeqIO.parse("fasta.txt", "fasta"):
        if record.seq[i]=="A":
            countA=countA+1
        if record.seq[i]=="C":
            countC=countC+1
        if record.seq[i]=="G":
            countG=countG+1
        if record.seq[i]=="T":
            countT=countT+1
    A.append(countA)
    C.append(countC)
    G.append(countG)
    T.append(countT)
    if countA >= max(countC,countG,countT):
        consensus=consensus+"A"
    elif countC >= max(countA,countG,countT):
        consensus=consensus+"C"
    elif countG >= max(countA,countC,countT):
        consensus=consensus+"G"
    elif countT >= max(countA,countC,countG):
        consensus=consensus+"T"
print("A: "+" ".join([str(i) for i in A]))
print("C: "+" ".join([str(i) for i in C]))
print("G: "+" ".join([str(i) for i in G]))
print("T: "+" ".join([str(i) for i in T]))
print(consensus)
Would be great if someone can take a look and suggest what I am doing wrong? Many thanks!
For your consensus string, your code is not handling the case in which you have a tie, i.e., two nucleotides in a given position are equally frequent. The way your code is written now, this case will result in nothing being printed at that position in the consensus string
in this part
if countA >= max(countC,countG,countT):
    consensus=consensus+"A"
elif countC >= max(countA,countG,countT):
    consensus=consensus+"C"
elif countG >= max(countA,countC,countT):
    consensus=consensus+"G"
elif countT >= max(countA,countC,countG):
    consensus=consensus+"T"
Use this instead and you will get your Consensus sequences correctly
if countA[i] >= max(countC[i],countG[i],countT[i]):
    consensus+="A"
if countC[i] >= max(countA[i],countG[i],countT[i]):
    consensus+="C"
if countG[i] >= max(countA[i],countC[i],countT[i]):
    consensus+="G"
if countT[i] >= max(countA[i],countC[i],countG[i]):
    consensus+="T"
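To check tie handling and the profile counts independently of Biopython, here is a minimal sketch that works on the sample strings directly. The function name profile_and_consensus and the hard-coded list (instead of parsing the FASTA file) are assumptions for illustration. Ties are resolved by taking the first base in A, C, G, T order, so exactly one letter is emitted per position; with independent if statements, a tied position would instead receive one letter per tied base.

```python
def profile_and_consensus(seqs):
    length = len(seqs[0])
    # One list of per-position counts for each base.
    profile = {base: [0] * length for base in "ACGT"}
    for seq in seqs:
        for i, base in enumerate(seq):
            profile[base][i] += 1
    # max() returns the first of several equal maxima, so a tie goes to
    # whichever tied base comes first in "ACGT".
    consensus = "".join(
        max("ACGT", key=lambda b: profile[b][i]) for i in range(length)
    )
    return profile, consensus

sample = ["ATCCAGCT", "GGGCAACT", "ATGGATCT", "AAGCAACC",
          "TTGGAACT", "ATGCCATT", "ATGGCACT"]
profile, consensus = profile_and_consensus(sample)
```

Note also that the sample solution lists the consensus string before the profile matrix, whereas the code in the question prints it last; that ordering may be worth checking too.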

Speeding Up Python Code Time

import csv
import math
import time

# convertCoordinates, main, df and interval are defined elsewhere (not shown)
start = time.time()
f = open('Speed_Test.csv','r+')
coordReader = csv.reader(f, delimiter = ',')
count = -1
successful_trip = 0
trips = 0
for line in coordReader:
    successful_single = 0
    count += 1
    R = interval*0.30
    if count == 0:
        continue
    if 26 < float(line[0]) < 48.7537144 and 26 < float(line[2]) < 48.7537144 and -124.6521017 < float(line[1]) < -68 and -124.6521017 < float(line[3]) < -68:
        y2,x2,y1,x1 = convertCoordinates(float(line[0]),float(line[1]),float(line[2]),float(line[3]))
        coords_line,interval = main(y1,x1,y2,x2)
        for item in coords_line:
            loop_count = 0
            r = 0
            min_dist = 10000
            for i in range(len(df)):
                dist = math.sqrt((item[1]-df.iloc[i,0])**2 + (item[0]-df.iloc[i,1])**2)
                if dist < R:
                    loop_count += 1
                    if dist < min_dist:
                        min_dist = dist
                        r = i
            if loop_count != 0:
                successful_single += 1
                df.iloc[r,2] += 1
        trips += 1
        if successful_single == (len(coords_line)):
            successful_trip += 1
end = time.time()
print('Percent Successful:',successful_trip/trips)
print((end - start))
I have this code, and explaining it fully would be extremely time consuming, but it doesn't run fast enough for me to compute as much as I'd like. Is there anything anyone sees off the bat that I could do to speed the process up? Any suggestions would be greatly appreciated.
In essence, it reads in two latitude/longitude coordinates, converts them to Cartesian coordinates, and then steps through every coordinate along the path from the origin to the destination at interval lengths that depend on the distance. While it does this, it checks each of the trip's interval points against a data frame (df) with 300+ coordinate locations, sees whether one is within radius R, and then stores the closest one.
Take advantage of any opportunity to break out of a for loop once the result is known. For example, at the end of the for line loop you check whether successful_single == len(coords_line). But that check is bound to fail as soon as if loop_count != 0 is False for any item, because successful_single is not incremented for that item and so can never reach len(coords_line). So you could break out of the for item loop right there: you already know it's not a "successful_trip." There may be other situations like this.
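A minimal sketch of that early exit, using plain tuples instead of the data frame. The function name trip_is_successful and its parameters are hypothetical, and the sketch assumes the per-location counters (df.iloc[r,2]) do not need updating for trips that turn out to be unsuccessful.

```python
import math

def trip_is_successful(path_points, locations, radius):
    """Return True only if every path point has a location within radius."""
    for px, py in path_points:
        covered = any(
            math.hypot(px - lx, py - ly) < radius for lx, ly in locations
        )
        if not covered:
            # One uncovered point settles the outcome, so stop right here
            # instead of scanning the rest of the path.
            return False
    return True

near = trip_is_successful([(0, 0), (1, 0)], [(0, 0.1), (1, 0.1)], 0.5)
far = trip_is_successful([(0, 0), (5, 5)], [(0, 0.1)], 0.5)
```

How much time this saves depends on how early in a path the first uncovered point tends to occur.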
Have you considered pooling and running these calculations in parallel?
https://docs.python.org/2/library/multiprocessing.html
Note, however, that your code suggests the variables R and interval carry a dependency from one iteration to the next, which may require the rows to be processed sequentially.
