How to successfully conduct a Unit Test in Python?

I am trying to write a unit test that only the one correct solution passes and that every incorrect solution fails. The test has to cover a broad spectrum of cases, including negative values. How can I write it so that only the correct solution passes? I have heard of people doing this efficiently with hash tables, where the input is the key and the expected output is the value.
What I did below apparently isn't a good enough unit test and was marked incorrect.
Unit_Test/lecture/MainObject.py
def computeShippingCost(input):
    if (0 < input <= 30):
        return 5
    elif (input > 30):
        return ((input - 30) * 0.25) + 5
Unit_Test/tests/Testing.py
from lecture.MainObject import computeShippingCost
class Testing(object):
    def Test(self):
        assert computeShippingCost(20) == 3 #incorrect
        assert computeShippingCost(-30) == -8 #incorrect
        assert computeShippingCost(40) == -20 #incorrect
        assert computeShippingCost(50) == 10 #correct

From your example and description I take it that there might be a very fundamental misconception about how a test should look. Every test somehow stimulates its subject (the system under test, aka SUT), and then verifies that the result meets the expectation.
On a very abstract level, a test looks like this:
def myTest():
    <Prepare the SUT for the test>
    <Stimulate the SUT>
    <Check if the result matches the expectation>
The intention is that a failing test indicates a bug in the SUT. Correctly implemented code shall not lead to a failing test. (*)
In your code example, you have stimulated the SUT and checked the result in the following way:
assert computeShippingCost(20) == 3 #incorrect
From the implementation of computeShippingCost it is clear that the result in this case would be 5 and not 3. There are now two possibilities:
A) computeShippingCost is implemented correctly. Then, the expectation in this case should be 5. An assertion against anything other than 5 will fail. This violates the above goal (*), because you will have a failing test although the code is implemented correctly.
B) computeShippingCost has a bug, and it actually should deliver 3 in this situation. Then, this assertion represents a useful test, and the fact that it fails indicates to you that your function has a bug.
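The "input is the key, expected output is the value" idea from the question could look roughly like the sketch below. It assumes possibility A (the function in the question is the correct one) and that returning None for zero or negative input is intended; the table EXPECTED and the names compute_shipping_cost and ShippingCostTest are made up for illustration. Values on and around the boundaries (0, 30, 31) are what make most incorrect implementations fail.

```python
import unittest


def compute_shipping_cost(weight):
    """Same logic as computeShippingCost in the question."""
    if 0 < weight <= 30:
        return 5
    elif weight > 30:
        return ((weight - 30) * 0.25) + 5
    # Falls through and returns None for zero or negative input.


# Hypothetical expectation table: input -> expected output.
EXPECTED = {
    -30: None,
    0: None,
    1: 5,
    20: 5,
    30: 5,
    31: 5.25,
    40: 7.5,
    50: 10,
}


class ShippingCostTest(unittest.TestCase):
    def test_expected_costs(self):
        for weight, expected in EXPECTED.items():
            # subTest reports every failing input instead of stopping at the first.
            with self.subTest(weight=weight):
                self.assertEqual(compute_shipping_cost(weight), expected)
```

Saved as a test module, this can be run with python -m unittest. Every assertion states what a correct implementation returns, so the test only fails when the implementation is wrong.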

Related

Brute Force Created In Python | Numerical Only | Up to 4 Digits

Thank you very much for taking the time to read this. I am a tadpole when it comes to writing Python, and I have just written the framework for a brute-force algorithm that goes through up to 3 digits' worth of numbers. Am I heading in the right direction?
I am assuming the web app actually reveals the number of digits in the code it sends to your email. I first learned about this from TryHackMe.
Now it can randomly create a code of up to 4 digits, as seen in BookFace, and the code below can crack it, as seen using their example. My question is: am I doing it the right way? I have seen other people's brute-force samples and they use a lot of functions. Am I being too long-winded?
import random
digits = 4 #reveal the number of digits
i=0
z=0
x="" #init password to nothing
#generate random password up to 4 digits
while i < digits:
    z = random.randint(0,9)
    print(z)
    x = str(x) + str(z)
    i+=1
y = 1
print("Code Random Generated: " + x)
if digits == 1:
    while y!=int(x):
        print(y)
        y+=1
elif digits == 2:
    while y!=int(x):
        if len(str(y)) == 1:
            print("0" + format(y))
        elif len(str(y)) == 2:
            print(y)
        y+=1
elif digits == 3:
    while y!=int(x):
        if len(str(y)) ==1:
            print("00" + format(y))
        elif len(str(y)) == 2:
            print("0" + format(y))
        elif len(str(y)) == 3:
            print(y)
        y+=1
elif digits == 4:
    while y!=int(x):
        if len(str(y)) ==1:
            print("000" + format(y))
        elif len(str(y)) == 2:
            print("00" + format(y))
        elif len(str(y)) == 3:
            print("0" + format(y))
        elif len(str(y)) == 4:
            print(y)
        y+=1
if y !=0:
    print("Reset Code Revealed: " + format(y))

You are sort of asking for a code review, not for a problem's solution. This is what Code Review Stack Exchange is for.
Anyway, let's look at my own solution to the problem:
"""
A humble try at cracking simple passwords.
"""
import itertools
import random
from typing import Callable, Iterable, Optional
def generate_random_numerical_password(ndigits: int) -> str:
    return "".join(str(random.randint(0, 9)) for _ in range(ndigits))
def guesses_generator(ndigits: int) -> Iterable[str]:
    yield from ("".join(digits) for digits in itertools.product("0123456789", repeat=ndigits))
PasswordOracle = Callable[[str], bool]
def cracker(ndigits: int, oracle: PasswordOracle) -> Optional[str]:
    for guess in guesses_generator(ndigits):
        if oracle(guess):
            return guess
    else:  # for/else: runs only when the loop finished without finding a match
        return None
if __name__ == "__main__":
    NDIGITS = 4  # difficulty
    print(f"the difficulty (number of digits) is set to {NDIGITS}")
    password = generate_random_numerical_password(NDIGITS)
    print(f"the password to crack is {password!r}")
    password_oracle = lambda guess: guess == password  # do not use `is`, see Python's string interning
    if match := cracker(NDIGITS, password_oracle):
        print("Cracked!")
    else:
        print("Error, not found")
Some differences:
The different parts of the program are clearly delimited using functions (password creation and cracking).
The guess generation is generalized: if the number of digits becomes 5, your version needs another (tedious and error-prone) elif, while mine only needs a different NDIGITS.
The "flow" of the program is made very clear in the "main" part: generate a password and the corresponding oracle, call the cracker and check whether a result was found. There are very few lines to read, and using descriptive names helps too.
The names come from the domain ("cracking"): "guess", "difficulty", "generator" instead of abstract ones like "x" and "y".
Some language knowledge is used: standard type operations (str.join instead of concatenation), generators (yield from), libraries (itertools.product), syntax (the walrus operator in if match := cracker(...)), ...
There is more documentation: comments, a docstring at the top of the module, type annotations, ... all of which help to understand how the program works.
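To make the generalization point concrete, here is a minimal sketch (not the code from either post) of how the chain of elif branches in the question could collapse into a single loop with zero-padded formatting. The function name brute_force and its parameter are made up for illustration.

```python
def brute_force(secret: str) -> str:
    """Try every zero-padded number with as many digits as `secret`."""
    ndigits = len(secret)
    for number in range(10 ** ndigits):
        # The nested format spec pads with zeros up to ndigits characters,
        # e.g. number=7 and ndigits=4 gives "0007".
        guess = f"{number:0{ndigits}d}"
        if guess == secret:
            return guess
    raise ValueError("secret is not a purely numeric code")


found = brute_force("0042")
print("Reset Code Revealed: " + found)
```

Because the width is computed from the input, a 5-digit code needs no new branch. Starting the loop at 0 also covers an all-zero code, which a loop starting at 1 never reaches.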
My question is am I doing it in the right way? Because I seen other people's sample of bruteforce and they use a lot of function. Am I being too long winded?
I did not use functions on purpose to imitate other people, but because I see things in a different way than you, which I will try to explain.
In the end, both your solution and mine solve the initial problem "crack a four-digit password". In this way, they are not much different. But there are other aspects to consider:
Will the problem change? Could the password be 5 digits, or contain alphabetic or special characters? In such cases, how long will it take to adapt the script? This is the malleability of the code.
How clear is the code? If I have a bug to fix, how long will it take me to find where it comes from? What if it is someone new to Python? Or someone who knows Python well but has never seen the code? This is the maintainability of the code.
How easy is it to test the code? What can be tested? This is testability.
How fast is the code? How much memory does it use? This is performance.
Can the program crash? What if the program is given bad inputs? Could it cause an infinite loop? This is program safety.
...
What counts as the right way depends on your objectives regarding malleability, maintainability, testability, performance, safety, ... (and other qualities of a program), but also on the context: who is writing the code? Who will read it later? How experienced are they? How much time do they have to finish writing it? Will it be run only once then thrown away, or will it be deployed on millions of devices?
All of that affects how you write it. If I was in your shoes (beginner to Python, writing a run-once-then-forget script) I would have done the same as you. The difference is the context and the objectives. I wrote my code as an example to show you the difference. But neither is good nor bad.
You are not bad at running just because you are slower than an Olympic athlete. You can only be bad relative to a context and objectives. In Physical Education class, you are graded on your running speed according to the average for your age and the progress you made.
Seeing things in their perspective is a very useful skill, not just for programming.
When you compare your code to others', yours seems less "clever", less "clean", less "elegant". But you are comparing apples to oranges. You are not the others. If your solution was accepted (correct and fast enough), that's a good start at your level.
My years of professional experience working with several other people on codebases of tens of thousands of lines that have to be maintained for 20 years are a different set of contexts and objectives than you learning the language in fun ways (TryHackMe). Neither is objectively bad, both are subjectively good.
TL;DR: your code is fine for a beginner, you still have lots to learn if you want to, and keep having fun!

Python Memoization failing on Leetcode

The Scenario:
I am doing a question on Leetcode called nth Ugly Number. The task is to find the nth number whose only prime factors are 2, 3, and 5 (1 counts as the first ugly number).
I created a solution which was accepted and passed all the tests. Then I wanted to memoize it to practice memoization in Python. However, something has gone wrong with the memoization. It works for my own personal tests, but Leetcode does not accept the answer.
The memoized code is detailed below:
class Solution:
    uglyNumbers = [1, 2, 3, 4, 5]
    latest2index = 2
    latest3index = 1
    latest5index = 1
    def nthUglyNumber(self, n: int) -> int:
        while len(self.uglyNumbers) <= n:
            guess2 = self.uglyNumbers[self.latest2index] * 2
            guess3 = self.uglyNumbers[self.latest3index] * 3
            guess5 = self.uglyNumbers[self.latest5index] * 5
            nextUgly = min(guess2, guess3, guess5)
            if(nextUgly == guess2):
                self.latest2index += 1
            if(nextUgly == guess3):
                self.latest3index += 1
            if(nextUgly == guess5):
                self.latest5index += 1
            self.uglyNumbers.append(nextUgly)
        return self.uglyNumbers[n-1]
The only change I made when memoizing was to make uglyNumbers, latest2index, etc. to be class members instead of local variables.
The Problem:
When I submit to LeetCode, it claims that the solution no longer works. Here is where it breaks:
Input 12 /// Output 6 /// Expected 16
However, when I test the code myself and provide it with input 12, it gives the expected output 16. It does this even if I call nthUglyNumber with a bunch of different inputs before and after 12, so I have no idea why the test case breaks upon being submitted to LeetCode
Here's the testing I performed to confirm that the algorithm appears to work as expected:
# This code goes inside Class Solution
    def nthUglyNumber(self, n: int) -> int:
        print("10th: " + str(self.nthUgliNumber(10)))
        print("11th: " + str(self.nthUgliNumber(11)))
        print("12th: " + str(self.nthUgliNumber(12)))
        print("9th: " + str(self.nthUgliNumber(9)))
        print("14th: " + str(self.nthUgliNumber(14)))
        print("10th: " + str(self.nthUgliNumber(10)))
        print("11th: " + str(self.nthUgliNumber(11)))
        print("12th: " + str(self.nthUgliNumber(12)))
        return self.nthUgliNumber(n)
    def nthUgliNumber(self, n: int) -> int:
        # The regular definition of nthUglyNumber goes here
What I want to know
Is there some edge case in Python memoization that I am not seeing that's causing the code to trip up? Or is it fully Leetcode's fault? I know my algorithm works without memoization, but I want to understand what's going wrong so I gain a better understanding of Python and so that I can avoid similar mistakes in the future.
I appreciate the help!

I believe LeetCode is probably running all tests in parallel on multiple threads, using separate instances of the Solution class. Since you are storing uglyNumbers as a class variable, instances may be updating it (and the three indexes) in a conflicting manner.
From LeetCode's perspective, each test is not expected to have side effects that would impact other tests. So, parallel execution in distinct instances is legitimate. Caching beyond the scope of the test case is likely undesirable, as it would make performance measurements inconsistent and dependent on the order and content of the test cases.
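Separate instances alone are enough to reproduce the reported output locally, even without threads. The sketch below assumes the judge creates a new Solution object per test case, which is an assumption about LeetCode and not something the thread confirms. In Python, self.uglyNumbers.append(...) mutates the one list stored on the class, while self.latest2index += 1 rebinds the attribute on the instance only, so a new instance sees the grown list together with the initial indexes. SharedStateSolution is a hypothetical variant shown as one possible fix.

```python
class Solution:
    # Class attributes, as in the question.
    uglyNumbers = [1, 2, 3, 4, 5]
    latest2index = 2
    latest3index = 1
    latest5index = 1

    def nthUglyNumber(self, n: int) -> int:
        while len(self.uglyNumbers) <= n:
            guess2 = self.uglyNumbers[self.latest2index] * 2
            guess3 = self.uglyNumbers[self.latest3index] * 3
            guess5 = self.uglyNumbers[self.latest5index] * 5
            nextUgly = min(guess2, guess3, guess5)
            # `+=` on an int rebinds the attribute on this instance only;
            # the class-level value keeps its initial value.
            if nextUgly == guess2:
                self.latest2index += 1
            if nextUgly == guess3:
                self.latest3index += 1
            if nextUgly == guess5:
                self.latest5index += 1
            # `.append` mutates the single list shared by every instance.
            self.uglyNumbers.append(nextUgly)
        return self.uglyNumbers[n - 1]


class SharedStateSolution:
    """Hypothetical fix: read and write all cached state on the class."""
    _ugly = [1]
    _i2 = _i3 = _i5 = 0

    def nthUglyNumber(self, n: int) -> int:
        cls = type(self)
        while len(cls._ugly) < n:
            guess2 = cls._ugly[cls._i2] * 2
            guess3 = cls._ugly[cls._i3] * 3
            guess5 = cls._ugly[cls._i5] * 5
            next_ugly = min(guess2, guess3, guess5)
            if next_ugly == guess2:
                cls._i2 += 1
            if next_ugly == guess3:
                cls._i3 += 1
            if next_ugly == guess5:
                cls._i5 += 1
            cls._ugly.append(next_ugly)
        return cls._ugly[n - 1]


first_call = Solution().nthUglyNumber(10)   # fresh cache
second_call = Solution().nthUglyNumber(12)  # new instance: grown list, reset indexes
print(first_call, second_call)  # 12 6

fixed = [SharedStateSolution().nthUglyNumber(n) for n in (10, 12)]
print(fixed)  # [12, 16]
```

The second call returns 6, the same wrong output reported for input 12. The tests in the question reuse a single instance, which keeps the list and the indexes in sync and hides the problem. The variant above is not thread-safe; keeping all state in local variables, as in the original accepted solution, avoids the issue entirely.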

Why do we need to add a "sleep" method to make a constant time attack succeed?

On this website: http://elijahcaine.me/remote-timing-attacks/ the author describes well what a constant time attack is and how to protect against this type of vulnerability.
But in the code that the author has written:
# secret.py
from time import sleep # Used to exaggerate time difference.
from sys import argv # Used to read user input.
def is_equal(a,b):
    """Custom `==` operator"""
    # Fail if the strings aren't the right length
    if len(a) != len(b):
        return False
    for i in range(len(a)):
        # Short-circuit if the strings don't match
        if a[i] != b[i]:
            return False
        sleep(0.15) # This exaggerates it just enough for our purposes
    return True
# Hard-coded secret globals FOR DEMONSTRATIONS ONLY
secret = 'l33t'
# This is python for "If someone uses you as a script, do this"
if __name__ == '__main__':
    try:
        # The user got it right!
        if is_equal(str(argv[1]), secret):
            print('You got the secret!')
        # The user got it wrong
        else:
            print('Try again!')
    # The user forgot to enter a guess.
    except IndexError:
        print('Usage: python secret.py yourguess\n' \
              +'The secret may consist of characters in [a-z0-9] '\
              +'and is {} characters long.'.format(len(secret)))
I don't understand why we have to add this line to make the constant time attack succeed:
sleep(0.15) # This exaggerates it just enough for our purposes
On the website, the author says:
it exaggerates the time it takes to evaluate the is_equal function.
I've tried it, and we do need the "sleep" call to make this attack succeed. Why do we need to exaggerate the time?

Edit 1:
Why do we need to exaggerate the time?
We need to exaggerate the time to showcase the time difference between the case where two characters match and the case where they don't. In this example, if the first characters of a and b match, the function sleeps; if the second characters then don't match, the function returns. This takes 1 comparison time + sleep(0.15) + 1 comparison time.
On the other hand, if the first characters don't match, the function returns after 1 comparison time, so the attacker can tell whether a character matched or not. The example uses this sleep to demonstrate the time difference.
To prevent this, the is_equal function should be implemented so that its response time is constant.
Using the example you provided:
def is_equal(a,b):
    _is_equal = True
    if len(a) != len(b):
        return False
    for i in range(len(a)):
        if a[i] != b[i]:
            _is_equal = False
    return _is_equal
There is a built-in function in the secrets module which solves this problem.
compare_digest()
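A minimal sketch of what using that function could look like, with the demo secret from the question. The wrapper name is_equal and the encoding step are illustrative choices: compare_digest accepts either two ASCII-only strings or two bytes-like objects, so encoding both sides lets the wrapper take any text.

```python
import secrets

SECRET = "l33t"  # demo value taken from the question


def is_equal(guess: str, secret: str = SECRET) -> bool:
    # Encode both sides so non-ASCII input does not raise a TypeError.
    return secrets.compare_digest(guess.encode("utf-8"), secret.encode("utf-8"))


print(is_equal("l33t"))  # True
print(is_equal("l33x"))  # False
```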

There are two possible paths to take in the "match" loop:
for i in range(len(a)):
    # Short-circuit if the strings don't match
    if a[i] != b[i]:
        return False
    sleep(0.15) # This exaggerates it just enough for our purposes
return True
if a[i] != b[i] evaluates as True - no match, exit from the function.
if a[i] != b[i] evaluates as False - match, run sleep(0.15) and move on to the next character.
Calling sleep(0.15) whenever characters match adds a significant time difference between these two paths. This in turn lets you simply take the maximum over all attempts to identify the correct character of the secret. Without this exaggeration you would need to look for statistically significant differences in the matching times.
The author mentions this here:
Most important [for the author] we don't need to use StatisticsTM to
figure the secret, evaluating each input multiple times and
collecting/processing that timing data, it already takes about one
magnitude longer to evaluate a matching letter than it does to
evaluate a non-matching letter.
Use debug lines to see how times are different with and without sleep.
# Uncomment the following line for fun debug output
print('max {} min {}'.format(max(guess_times), min(guess_times)))
# Add this line to see full guess_times list
print(['{:.2f}'.format(elem) for elem in guess_times])
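The "take the maximum" idea can be sketched in a few lines. This is not the article's attack script: the helper names time_guess and recover_first_char are made up, the comparison is called in-process rather than over a network, and the delay is shortened to 0.01 seconds so the demo runs quickly.

```python
import string
import time

SECRET = "l33t"  # same demo secret as in the question


def is_equal(a, b, delay=0.01):
    """Leaky comparison from the question, with a shorter artificial delay."""
    if len(a) != len(b):
        return False
    for i in range(len(a)):
        if a[i] != b[i]:
            return False
        time.sleep(delay)  # only reached when the characters matched
    return True


def time_guess(guess):
    start = time.perf_counter()
    is_equal(guess, SECRET)
    return time.perf_counter() - start


def recover_first_char(alphabet=string.ascii_lowercase + string.digits):
    padding = "0" * (len(SECRET) - 1)
    # One measurement per candidate is enough because the delay dominates.
    guess_times = {char: time_guess(char + padding) for char in alphabet}
    return max(guess_times, key=guess_times.get)


first_char = recover_first_char()
print(first_char)
```

With the sleep removed, the timings of all candidates would be nearly identical, and a single measurement per candidate would no longer single out the right character.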

Using return instead of yield

Is return better than yield? From what I've read it can be. In this case I am having trouble getting iteration from the if statement. Basically, the program takes two points, a begin and an end. If the two points are at least ten miles apart, it takes a random sample. The final if statement shown works for the first 20 miles from the begin point, begMi. nCounter.length = 10 and is a class member. So the question is, how can I adapt the code so that a return statement would work instead of a yield? Or is a yield statement fine in this instance?
def yielderOut(self):
    import math
    import random as r
    for col in self.fileData:
        corridor = str(col['CORRIDOR_CODE'])
        begMi = float(col['BEGIN_MI'])
        endMi = float(col['END_MI'])
        roughDiff = abs(begMi - endMi)
        # if the plain distance between two points is greater than length = 10
        if roughDiff > nCounter.length:
            diff = ((int(math.ceil(roughDiff/10.0))*10)-10)
            if diff > 0 and (diff % 2 == 0 or diff % 3 == 0 or diff % 5 == 0)\
                    and ((diff % roughDiff) >= diff):
                if (nCounter.length+begMi) < endMi:
                    vars1 = round(r.uniform(begMi,\
                        (begMi+nCounter.length)),nCounter.rounder)
                    yield corridor,begMi,endMi,'Output 1',vars1
                if ((2*nCounter.length)+begMi) < endMi:
                    vars2 = round(r.uniform((begMi+nCounter.length),\
                        (begMi+ (nCounter.length*2))),nCounter.rounder)
                    yield corridor,begMi,endMi,'Output 2',vars1,vars2
So roughDiff equals the difference between two points and is rounded down to the nearest ten. Ten is then subtracted so the sample is taken from a full ten-mile section, and that becomes diff. So let's say a roughDiff of 24 is rounded to 20, 20 - 10, diff + begin point = the sample is taken from between mile 60 and 70 instead of between 70 and 80.
The program works, but I think it would be better if I used return instead of yield. I am not a programmer.

return is not better, it's different. return says "I am done. Here is the result". yield says "here is the next value in a series of values"
Use the one that best expresses your intent.

Using yield makes your function a generator function, which means it returns a generator object that produces the next value in a series each time next() is called on it.
This is useful when you want to process things iteratively, because it means you don't have to save all the results in a container and then process them. In addition, any preliminary work that is required before values can be generated only has to be done once, because the generator resumes execution of your code right after the last yield encountered, i.e. it effectively turns it into what is called a coroutine.
Generator functions finish when they return rather than yield. This usually happens when execution "falls off the end", at which point the function returns None by default.
From the looks of your code, I'd say using yield would be advantageous, especially if you can process the results incrementally. The alternative would be to have it store all the values in a container like a list and return that when it was finished.
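The two alternatives can be put side by side in a simplified sketch. It is not the code from the question: the function names, the (begin, end) tuples and the seed parameter are made up for illustration, and the sampling rule is reduced to "one sample from the first ten miles of any segment longer than ten miles".

```python
import random


def sample_points_gen(segments, length=10, seed=0):
    """Yield one sample per qualifying segment, as soon as it is computed."""
    rng = random.Random(seed)
    for begin, end in segments:
        if end - begin > length:
            yield begin, end, round(rng.uniform(begin, begin + length), 2)


def sample_points_list(segments, length=10, seed=0):
    """Collect the same samples in a list and return them all at once."""
    rng = random.Random(seed)
    results = []
    for begin, end in segments:
        if end - begin > length:
            results.append((begin, end, round(rng.uniform(begin, begin + length), 2)))
    return results


segments = [(0.0, 5.0), (60.0, 84.0), (10.0, 40.0)]

# The caller loops over both versions in exactly the same way.
for begin, end, sample in sample_points_gen(segments):
    print(begin, end, sample)
```

The generator version hands each row to the caller while the loop is still running. The list version holds everything in memory until the end, which only matters when the input is large or the caller wants to stop early.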

I use yield in situations where I want to continue iteration on some object. However, if I wanted to make that function recursive, I'd use return.

Serial Key Generation and Validation

I'm toying around with writing a serial code generator/validator, but I can't figure out how to do a proper check.
Here's my generator code:
# Serial generator
# Create sequences from which random.choice can choose
Sequence_A = 'ABCDEF'
Sequence_B = 'UVWQYZ'
Sequence_C = 'NOPQRS'
Sequence_D = 'MARTIN'
import random
# Generate a series of random numbers and Letters to later concatenate into a pass code
First = str(random.randint(1,5))
Second = str(random.choice(Sequence_A))
Third = str(random.randint(6,9))
Fourth = str(random.choice(Sequence_B))
Fifth = str(random.randint(0,2))
Sixth = str(random.choice(Sequence_C))
Seventh = str(random.randint(7,8))
Eighth = str(random.choice(Sequence_D))
Ninth = str(random.randint(3,5))
serial = First+Second+Third+Fourth+Fifth+Sixth+Seventh+Eighth+Ninth
print serial
I'd like to make a universal check so that my validation code will accept any key generated by this.
My intuition was to create checks like this:
serial_check = raw_input("Please enter your serial code: ")
# create a control object for while loop
control = True
# Break up user input into list that can be analyzed individually
serial_list = list(serial_check)
while control:
    if serial_list[0] == range(1,5):
        pass
    elif serial_list[0] != range(1,5):
        control = False
    if serial_list[1] == random.choice('ABCDEF'):
        pass
    elif serial_list[1] != random.choice('ABCDEF'):
        control = False
    # and so on until the final, where, if valid, I would print that the key is valid.
if control == False:
    print "Invalid Serial Code"
I'm well aware that the second type of check won't work at all, but it's a placeholder because I've got no idea how to check that.
I thought the method for checking numbers would work, but it doesn't either.

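The expression range(1, 5) creates a list of numbers from 1 to 4 (in Python 2, which your code uses; in Python 3 it is a range object, but the comparison fails just the same). So in your first test, you're asking whether the first character in your serial number is equal to that list: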
"1" == [1, 2, 3, 4]
Probably not...
What you probably want to know is whether a digit is in the range (i.e. from 1 to 5, I assume, not 1 to 4).
Your other hurdle is that the first character of the serial is a string, not an integer, so you would want to take the int() of the first character. But that will raise an exception if it's not a digit. So you must first test to make sure it's a digit:
if serial_list[0].isdigit() and int(serial_list[0]) in range(1, 6):
Don't worry, if it's not a digit, Python won't even try to evaluate the part after and. This is called short-circuiting.
However, I would not recommend doing it this way. Instead, simply check that the character is at least "1" and no more than "5", like this:
if "1" <= serial_list[0] <= "5":
You can do the same thing with each of your tests, varying only what you're checking.
Also, you don't need to convert the serial number to a list. serial_check is a string and accessing strings by index is perfectly acceptable.
And finally, there's this pattern going on in your code:
if thing == other:
    pass
elif thing != other:
    (do something)
First, because the conditions you are testing are logical opposites, you don't need elif thing != other -- you can just say else, which means "whatever wasn't matched by any if condition."
if thing == other:
    pass
else:
    (do something)
But if you're just going to pass when the condition is met, why not just test the opposite condition to begin with? You clearly know how to write it 'cause you were putting it in the elif. Put it right in the if instead!
if thing != other:
    (do something)
Yes, each of your if statements can easily be cut in half. In the example I gave you for checking the character range, probably the easiest way to do it is using not:
if not ("1" <= serial_list[0] <= "5"):
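Putting these pieces together, a whole-serial check could look like the sketch below. It is written for Python 3, and the names ALLOWED, generate_serial and is_valid_serial are made up for illustration. The per-position character sets are read off the generator in the question (randint bounds are inclusive, so randint(1,5) allows "1" to "5").

```python
import random

# Allowed characters for each of the nine positions, in order.
ALLOWED = ["12345", "ABCDEF", "6789", "UVWQYZ", "012", "NOPQRS", "78", "MARTIN", "345"]


def generate_serial(rng=random):
    """Pick one allowed character per position."""
    return "".join(rng.choice(chars) for chars in ALLOWED)


def is_valid_serial(serial):
    """Accept exactly the strings generate_serial can produce."""
    if len(serial) != len(ALLOWED):
        return False
    return all(char in chars for char, chars in zip(serial, ALLOWED))


serial = generate_serial()
print(serial, is_valid_serial(serial))
```

Keeping the rules in one table means the generator and the validator cannot drift apart, and adding a tenth position is a one-line change.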

Regarding your Python, I'm guessing that when you wrote this:
if serial_list[0] == range(1,5):
You probably meant this:
if '1' <= serial_list[0] <= '5':
And when you wrote this:
if serial_list[1] == random.choice('ABCDEF'):
You probably meant this:
if serial_list[1] in 'ABCDEF':
There are various other problems with your code, but I'm sure you'll improve it as you learn python.
At a higher level, you seem to be trying to build something like a software activation code generator/validator. You should know that just generating a string of pseudo-random characters and later checking that each is in range is an extremely weak form of validation. If you want to prevent forgeries, I would suggest learning about HMAC (if you're validating on a secure server) or public key cryptography (if you're validating on a user's computer) and incorporating that into your design. There are libraries available for python that can handle either approach.
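A rough sketch of the server-side HMAC idea, using only the standard library. Everything specific here is an assumption made for illustration: the key value, the PAYLOAD-TAG layout, the helper names, and the choice to keep only 8 hex characters of the MAC (which keeps the serial short but makes it much weaker than a full tag).

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side key; it must never ship with the client.
SERVER_KEY = b"replace-with-a-real-secret-key"
TAG_LENGTH = 8  # hex characters of the MAC kept in the serial (arbitrary)


def _tag(payload: str) -> str:
    mac = hmac.new(SERVER_KEY, payload.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:TAG_LENGTH].upper()


def make_serial() -> str:
    payload = secrets.token_hex(4).upper()  # 8 random hex characters
    return f"{payload}-{_tag(payload)}"


def is_valid_serial(serial: str) -> bool:
    payload, separator, tag = serial.partition("-")
    if not separator:
        return False
    # Constant-time comparison of the supplied tag with the expected one.
    return hmac.compare_digest(tag.encode("utf-8"), _tag(payload).encode("utf-8"))


serial = make_serial()
print(serial, is_valid_serial(serial))
```

Without the key, producing a payload with a matching tag comes down to guessing, which is what makes forged serials hard; checking that each character is in range offers no such protection.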
