Is there an equivalent of xfail for coverage? - python

Occasionally I will handle some condition that I'm fairly certain would be an edge case, but I can't think of an example where it would come up, so I can't come up with a test case for it. I'm wondering if there's a way to add a pragma to my code such that if, in the future some change in the tests accidentally starts covering this line, I'd would be alerted to this fact (since such accidental coverage will produce the needed test case, but possibly as an implementation detail, leaving the coverage of this line fragile). I've come up with a contrived example of this:
In mysquare.py:
def mysquare(x):
ov = x * x
if abs(ov) != ov or type(abs(ov)) != type(ov):
# Always want this to be positive, though why would this ever fail?!
ov = abs(ov) # pragma: nocover
return ov
Then in my test suite I start with:
from hypothesis import given
from hypothesis.strategies import one_of, integers, floats
from mysquare import mysquare
NUMBERS = one_of(integers(), floats()).filter(lambda x: x == x)
#given(NUMBERS)
def test_mysquare(x):
assert mysquare(x) == abs(x * x)
#given(NUMBERS)
def test_mysquare_positive(x):
assert mysquare(x) == abs(mysquare(x))
The nocover line is never hit, but that's only because I can't think of a way to reach it! However, at some time in the far future, I decide that mysquare should also support complex numbers, so I change NUMBERS:
NUMBERS = one_of(integers(), floats(),
complex_numbers()).filter(lambda x: x == x)
Now I'm suddenly unexpectedly covering the line, but I'm not alerted this fact. Is there something like nocover that works more like pytest.xfail - as a positive assertion that that particular line is covered by no tests? Preferably compatible with pytest.

Coverage.py doesn't yet have this feature, but it's an interesting one: A warning that a line marked as not covered, was actually covered. As a difficult hack, you could do something like disable your nocover pragma, and collect the lines covered, and compare them to the line numbers with the pragma.... Ick.

Related

Python Pandas how to speed up the __init__ class function correctly with numba?

I have a class that does different mathematical calculations on a cycle
I want to speed up its processing with Numba
And now I'm trying to apply Namba to init functions
The class itself and its init function looks like this:
class Generic(object):
##numba.vectorize
def __init__(self, N, N1, S1, S2):
A = [['Time'],['Time','Price'], ["Time", 'Qty'], ['Time', 'Open_interest'], ['Time','Operation','Quantity']]
self.table = pd.DataFrame(pd.read_csv('Datasets\\RobotMath\\table_OI.csv'))
self.Reader()
for a in A:
if 'Time' in a:
self.df = pd.DataFrame(pd.read_csv(Ex2_Csv, usecols=a, parse_dates=[0]))
self.df['Time'] = self.df['Time'].dt.floor("S", 0)
self.df['Time'] = pd.to_datetime(self.df['Time']).dt.time
if a == ['Time']:
self.Tik()
elif a == ['Time','Price']:
self.Poc()
self.Pmm()
self.SredPrice()
self.Delta_Ema(N, N1)
self.Comulative(S1, S2)
self.M()
elif a == ["Time", 'Qty']:
self.Volume()
elif a == ['Time', 'Open_interest']:
self.Open_intrest()
elif a == ['Time','Operation','Quantity']:
self.D()
#self.Dataset()
else:
print('Something went wrong', f"Set Error: {0} ".format(a))
All functions of the class are ordinary column calculations using Pandas.
Here are two of them for example:
def Tik(self):
df2 = self.df.groupby('Time').value_counts(ascending=False)
df2.to_csv('Datasets\\RobotMath\\Tik.csv', )
def Poc(self):
g = self.df.groupby('Time', sort=False)
out = (g.last()-g.first()).reset_index()
out.to_csv('Datasets\\RobotMath\\Poc.csv', index=False)
I tried to use Numba in different ways, but I got an error everywhere.
Is it possible to speed up exactly the init function? Or do I need to look for another way and I can 't do without rewriting the class ?
So there's nothing special or magical about the init function, at run time, it's just another function like any other.
In terms of performance, your class is doing quite a lot here - it might be worth breaking down the timings of each component first to establish where your performance hang-ups lie.
For example, just reading in the files might be responsible for a fair amount of that time, and for Ex2_Csv you repeatedly do that within a loop, which is likely to be sub-optimal depending on the volume of data you're dealing with, but before targeting a resolution (like Numba for example) it'd be prudent to identify which aspect of the code is performing within expectations.
You can gather that information in a number of ways, but the simplest might be to add in some tagged print statements that emit the elapsed time since the last print statement.
e.g.
start = datetime.datetime.now()
print("starting", start)
## some block of code that does XXX
XXX_finish = datetime.datetime.now()
print("XXX_finished at", XXX_finish, "taking", XXX_finish-start)
## and repeat to generate a runtime report showing timings for each block of code
Then you can break the runtime of your program down into feature-aligned chunks, then when tweaking you'll be able to directly see the effect. Profiling the runtime of your code like this can really help when making performance tweaks, and it helps sharpen focus on what tweaks are benefiting/harming specific areas of your code.
For example, in the section that's performing the group-bys, with some timestamp outputs before and after you can compare running it with the numba turned on, and then again with it turned off.
From there, with a little careful debugging, sensible logging of times and a bit of old-fashioned sleuthing (I tend to jot these kinds of things down with paper and pencil) you (and if you share your findings, we) ought to be in a better position to answer your question more precisely.

Having some problems with Project Euler 31

I've been working on solving Problem 31 from Project Euler. I'm almost finished, but my code is giving me what I know is the wrong answer. My dad showed the right answer, and mine is giving me something completely wrong.
coins = (1,2,5,10,20,50,100,200)
amount = 200
def ways(target, avc):
target = amount
if avc <= 1:
return 1
res = 0
while target >= 0:
res = res + ways(target, avc-1)
target = target - coins[avc]
return res
print(ways(amount, 7))
This is the answer I get.
284130
The answer is supposed to 73682.
EDIT: Thank you to everyone who answered my question. Thanks to all of you, I have figured it out!
I see several ways how you can improve the way you're working on problems:
Copy the task description into your source code. This will make you more independent from Internet resources. That allows you to work offline. Even if the page is down or disappears completely, it will enable someone else understanding what problem you're trying to solve.
Use an IDE that does proper syntax highlighting and gives you hints what might be wrong with your code. The IDE will tell you what #aronquemarr mentioned in the comments:
There's also a thing called magic numbers. Many of the numbers in your code can be explained, but why do you start with an avc of 7?. Where does that number come from? 7 does not occur in the problem description.
Use better naming, so you understand better what your code does. What does avc stand for? If you think it's available_coins, that's not correct. There are 8 different coins, not 7.
If it still does not work, reduce the problem to a simpler one. Start with the most simple one you can think of, e.g. make only 1 type of coin available and set the amount to 2 cent. There should only be 1 solution.
To make sure this simple result will never break, introduce an assertion or unit test. They will help you when you change the code to work with larger datasets. They will also change the way you write code. In your case you'll find that accessing the variable coins from an outer scope is probably not a good idea, because it will break your assertion when you switch to the next level. The change that is needed will make your methods more self-contained and robust.
Then, increase the difficulty by having 2 different coins etc.
Examples for the first assertions that I used on this problem:
# Only 1 way of making 1 with only a 1 coin of value 1
assert(find_combinations([1], 1) == 1)
# Still only 1 way of making 1 with two coins of values 1 and 2
assert(find_combinations([1,2], 1) == 1)
# Two ways of making 2 with two coins of values 1 and 2
assert(find_combinations([1,2], 2) == 2)
As soon as the result does no longer match your expectation, use the debugger and step through your code. After every line, check the values in the debugger against the values that you think they should be.
One time, the debugger will be your best friend. And you can never imagine how you did stuff without it. You just need to learn how to use it.
Firstly, read Thomas Weller's answer. They give some excellent suggestions as to how to improve your coding and problem solving.
Secondly, your code works, and gives the correct answer after the following changes:
As suggested, remove the target = amount line. You're resigning the argument of the function to global amount on each call (even on the recursive calls). Having removed that the answer comes up to 3275 - still not the right answer.
The other thing you have to remember is that Python is 0-indexed. Therefore, your simplest case condition ought to read if avc <= 0: ... (not <=1).
Having made these two changes, your code gives the correct answer. Here is your code with these changes:
coins = (1,2,5,10,20,50,100,200)
amount = 200
def ways(target, avc):
if avc <= 0:
return 1
res = 0
while target >= 0:
res = res + ways(target, avc-1)
target = target - coins[avc]
return res
print(ways(amount, 7))
Lastly, there are plenty of Project Euler answers out there. Having solved the yourself, it might be useful to have a look at what others did. For reference, I have not actually solved this Project Euler before, so I had to do that first. Here is my solution. I've added a pile of comments on top of it to explain what it does.
EDIT
I've just noticed something quite worrying: Your code (after fixes) works only if the first element of coins is 1. All the other elements can be shuffled:
# This still works ok
coins = (1,2,200,10,20,50,100,5)
# But this does not
coins = (2,1,5,10,20,50,100,200)
To ensure that this is always the case, you can just do the following:
coins = ... # <- Some not necessarily sorted tuple
coins = tuple(sorted(coins))
In principle there are a few other issues. Your solution breaks for non-unique values coins, and which don't include 1. The former you could fix with the use of sets, and the latter by modifying your if avc <= 0: case (check for the divisibility of the target by the remaining coin). Here is you piece of code with these changes implemented. I've also renamed the variables and the function to be a bit easier to read, and used coins as the argument, rather that avc pointer (which, by the way, I could not stop reading as avec):
unique = lambda x: sorted(set(x)) # Sorted unique list
def find_combinations(target, coins):
''' Find the number of ways coins can make up the target amount '''
coins = unique(coins)
if len(coins)==1: # Only one coin, therefore just check divisibility
return 1 if not target%coins[0] else 0
combinations = 0
while target >= 0:
combinations += find_combinations(target, coins[:-1])
target -= coins[-1]
return combinations
coins = [2,1,5,10,20,50,100,200] # <- Now works
amount = 200
print(find_combinations(amount, coins))

Scipy minimize iterating past bounds

I am trying to minimize a function of 3 input variables using scipy. The function reads like so-
def myfunc(x):
x[0] = a
x[1] = b
x[2] = c
n = f(a,b,c)
return n
bound1 = (80,100)
bound2 = (10,20)
bound3 = (312,740)
guess = [a0,b0,c0]
bds = (bound1,bound2,bound3)
result = minimize(myfunc, guess,method='L-BFGS-B',bounds=bds)
The function I am trying to currently run reaches a minimum at a=100,b=10,c=740, which is at the end of the bounds.
The minimize function keeps trying to iterate past the end of bound 3 (gets to c0 value of 740.0000000149012 on its last iteration.
Is there any way to stop this from happening? i.e. stop the iteration at the actual end of my bound?
This happens due to numerical-differentiation, which itself is not only needed to infer the step-direction and size, but also to reason about termination.
In general you can't do much without being very careful in regards to whatever solver (and there are many backend-solvers) being used. The basic idea is to replace the automatic numerical-differentiation with one provided by you: this one then respects those bounds and must be careful about the solvers-internals, e.g. "how to reason about termination at this end".
Fix A:
Your problem should vanish automatically when using: Pull-request #10673, which touches your configuration: L-BFGS-B.
It seems, this PR is not part of the current release SciPy 1.4.1 (as this was 2 months before the PR).
See also #6026, where a milestone of 1.5.0 is mentioned in regards to some changes including respecting bounds in num-diff.
For above PR, you will need to install scipy from the sources, which is:
quite doable on linux (and maybe os x)
not something you should try on windows!
trust me...
See the documentation if needed.
Fix B:
Apart from that, as you are doing unconstrained-optimization (with variable-bounds) where more solver-backends are available (compared to constrained-optimization), you might try another solver, trust-constr, which has explicit support for this, see #9098.
Be careful to recognize, that you need to signal this explicitly when setting up the bounds!

Search for all the if conditions in a python file and adding a print statement in the next line

I have to edit a python file such that after every if condition, i need to add a line which says
if condition_check:
if self.debug == 1: print "COVERAGE CONDITION #8.3 True (condition_check)"
#some other code
else:
if self.debug == 1: print "COVERAGE CONDITION #8.4 False (condition_check)"
#some other code
The number 8.4(generally y.x) refer to the fact that this if condition is in function number 8(y) (the functions are just sequentially numbers, nothing special about 8) and x is xth if condition in yth function.
and of course, the line that will be added will have to be added with proper indentation. The condition_check is the condition being checked.
For example:
if (self.order_in_cb):
self.ccu_process_crossing_buffer_order()
becomes:
if (self.order_in_cb):
if self.debug == 1: print "COVERAGE CONDITION #8.2 TRUE (self.order_in_cb)"
self.ccu_process_crossing_buffer_order()
How do i achieve this?
EXTRA BACKGROUND:
I have about 1200 lines of python code with about 180 if conditions - i need to see if every if condition is hit during the execution of 47 test cases.
In other words i need to do code coverage. The complication is - i am working with cocotb stimulus for RTL verification. As a result, there is no direct way to drive the stimulus, so i dont see an easy way to use the standard coverage.py way to test coverage.
Is there a way to check the coverage so other way? I feel i am missing something.
If you truly can't use coverage.py, then I would write a helper function that used inspect.stack to find the caller, then linecache to read the line of source, and log that way. Then you only have to change if something: to if condition(something): throughout your file, which should be fairly easy.
Here's a proof of concept:
import inspect
import linecache
import re
debug = True
def condition(label, cond):
if debug:
caller = inspect.stack()[1]
line = linecache.getline(caller.filename, caller.lineno)
condcode = re.search(r"if condition\(.*?,(.*)\):", line).group(1)
print("CONDITION {}: {}".format(label, condcode))
return cond
x = 1
y = 1
if condition(1.1, x + y == 2):
print("it's two!")
This prints:
CONDITION 1.1: x + y == 2
it's two!
I have about 1200 lines of python code with about 180 if conditions - i need to see if every if condition is hit during the execution of 47 test cases. In other words i need to do code coverage. The complication is - i am working with cocotb stimulus for RTL verification.
Cocotb has support for coverage built in (docs)
export COVERAGE=1
# run cocotb however you currently invoke it

python testing code coverage and multiple asserts in one test function

In the following code,
1.If we assert the end result of a function then is it right to say that we have covered all lines of the code while testing or we have to test each line of the code explicitly if so how ?
2.Also Is it fine that we can have the positive ,negative and more assert statements in one single test function.If not please give examples
def get_wcp(book_id):
update_tracking_details(book_id)
c_id = get_new_supreme(book_id)
if book_id == c_id:
return True
return False
class BookingentitntyTest(TestCase):
def setUp(self):
self.booking_id = 123456789# 9 digit number poistive test case
def test_get_wcp(self):
retVal = get_wcp(self.book_id)
self.assertTrue( retVal )
self.booking_id = 1# 1 digit number Negative test case
retVal = get_wcp(self.book_id)
self.assertFalse( retVal )
No, just because you asserted the final result of the statement doesn't mean all paths of your code has been evaluated. You don't need to write "test each line" in the sense that you only need to go through all the code paths possible.
As a general guideline, try to keep the number of assert statements in a test as minimum as possible. When one assert statement fails, the further statements are not executed, and doesn't count as failures which is often not required.
Besides try to write your tests as elegantly as possible, even more so than the code they are testing. We wouldn't want to write tests to test our tests, now would we?
def test_get_wcp_returns_true_when_valid_booking_id(self):
self.assertTrue(get_wcp(self.book_id))
def test_get_wcp_returns_false_if_invalid_booking_id(self):
self.assertFalse(get_wcp(1))
Complete bug-free code is not possible.
"Testing shows the presence, not the absence of bugs" - Edsger Dijkstra.

Categories

Resources