I have a module to test; the module includes a series of functions and simple classes.
I am wondering whether there are any attempts (i.e., a package) to automatically:
1) Generate Python code from an initial Python file containing the function definitions.
2) Have this generated code call the functions with random/parametric data as parameters.
It seems technically feasible using inspect and Python metaclasses, though it would usually be limited to functions over numerical types (numpy arrays), because strings (e.g., a URL input) would be impossible to generate blindly (only parametrically).
EDIT: By random, I obviously mean "parametric random".
Suppose we have:
def f(x1, x2, x3): ...
For each xi of f:
    if type(xi) is a 1-D array -> run these tests: empty array, zeros array,
    random negative array, random positive array, high values, low values,
    integer array, real-number array, ordered array, equally spaced array, ...
    if type(xi) is int -> test zero, 1, 2, 3, 4, random values, negative values
Do people think such a project is possible using inspect and metaclasses (limited to numpy/numerical items)?
Suppose you have a very large library...; things could be done in the background.
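To make the idea concrete, here is a rough sketch of the kind of generator I have in mind, using inspect.signature and type annotations rather than metaclasses (the value pools and the function f are just placeholders):

import inspect
from itertools import product

import numpy as np

# Hypothetical mapping from annotated parameter types to "interesting" test values.
TEST_VALUES = {
    int: [0, 1, 2, 3, 4, -7, 12345],
    np.ndarray: [np.array([]), np.zeros(5), -np.random.rand(5),
                 np.random.rand(5), np.arange(5)],
}

def generate_calls(func):
    """Yield argument tuples for func, driven by its type annotations."""
    params = inspect.signature(func).parameters.values()
    pools = [TEST_VALUES.get(p.annotation, [None]) for p in params]
    # Cartesian product of the per-parameter value pools.
    for args in product(*pools):
        yield args

# Placeholder function under test.
def f(x: np.ndarray, n: int):
    return x * n

for args in generate_calls(f):
    f(*args)   # just check that nothing crashes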
You might be thinking of fuzz testing, where a bunch of garbage data is submitted to a function to see if anything makes it behave badly. It sounds like the Hypothesis library will let you generate different test cases based on some parameters.
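For instance, a minimal sketch (my_module.f is a placeholder for one of your numerical functions; the strategies would have to be chosen per parameter type):

from hypothesis import given, strategies as st

import my_module

@given(st.lists(st.floats(allow_nan=False, allow_infinity=False)),
       st.integers())
def test_f_does_not_crash(xs, n):
    # Hypothesis generates empty lists, zeros, negatives, huge values, etc.,
    # and shrinks any failing example to a minimal one.
    my_module.f(xs, n)

There is also hypothesis.extra.numpy for generating numpy arrays directly.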
I spent some time searching, and it seems this kind of project does not really exist (to my knowledge).
Technically, it would be a mix of existing packages (and the issues with each):
Hypothesis: data generation for the inputs, and running the code to catch crashes/errors
(without the invariant part of Hypothesis).
Jedi: static analysis of the code / inference of the types.
Type inference is a difficult problem in Python (in general), so implementing it is the hard part:
If the type is a number or an array of numbers: boundaries exist and typical usage is clearly defined.
If the type is a string: inference is pretty difficult without human guessing.
The same goes for other types; context guessing is important.
I'm trying to do some physical experiments to find a formulation that optimizes some parameters. By physical experiments I mean I have a chemistry bench, I'm mixing stuff together, then measuring the properties of that formulation. Historically I've used traditional DOEs, but I need to speed up my time to getting to the ideal formulation. I'm aware of simplex optimization, but I'm interested in trying out Bayesian optimization. I found GPyOpt which claims (even in the SO Tag description) to support physical experiments. However, it's not clear how to enable this kind of behavior.
One thing I've tried is to collect user input via input(), and I suppose I could pickle the optimizer and function, but this feels kludgy. In the example code below, I use the function from the GPyOpt example, but I have to type in the actual value.
from GPyOpt.methods import BayesianOptimization
import numpy as np

# --- Define your problem
def f(x):
    return (6*x-2)**2*np.sin(12*x-4)

def g(x):
    print(f(x))
    return float(input("Result?"))

domain = [{'name': 'var_1', 'type': 'continuous', 'domain': (0, 1)}]

myBopt = BayesianOptimization(f=g,
                              domain=domain,
                              X=np.array([[0.745], [0.766], [0], [1], [0.5]]),
                              Y=np.array([[f(0.745)], [f(0.766)], [f(0)], [f(1)], [f(0.5)]]),
                              acquisition_type='LCB')
myBopt.run_optimization(max_iter=15, eps=0.001)
So, my question is: what is the intended way of using GPyOpt for physical experimentation?
A few things.
First, set f=None. Note that this has the side effect of causing the BO object to ignore maximize=True, if you happen to be using that.
Second, rather than use run_optimization, you want suggest_next_locations. The former runs the entire optimization, whereas the latter just runs a single iteration. This method returns a vector with parameter combinations ("locations") to go test in the lab.
Third, you'll need to make some decisions regarding batch size. The number of combinations/locations that you get is controlled by the batch_size parameter that you use to initialize the BayesianOptimization object. Choice of acquisition function is important here, because some are closely tied to a batch_size of 1. If you need larger batches, then you'll need to read the docs for combinations suitable to your situation (e.g. acquisition_type='EI' and evaluator_type='local_penalization').
Fourth, you'll need to explicitly manage the data between iterations. There are at least two ways to approach this. One is to pickle the BO object and add more data to it. An alternative that I think is more elegant is to instead create a completely fresh BO object each time. When instantiating it, you concatenate the new data to the old data, and just run a single iteration on the whole set (again, using suggest_next_locations). This might be kind of insane if you were using BO to optimize a function in silico, but considering how slow the chemistry steps are likely to be, this might be cleanest (and easier for making mid-course corrections).
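To tie the four points together, here is a sketch of a single lab iteration with a fresh BO object (the Y values are computed from the toy f above purely for illustration; in a real run they would be your lab measurements):

import numpy as np
from GPyOpt.methods import BayesianOptimization

def f(x):
    # toy objective from the question, standing in for the lab measurement
    return (6*x - 2)**2 * np.sin(12*x - 4)

domain = [{'name': 'var_1', 'type': 'continuous', 'domain': (0, 1)}]

# Everything measured so far (grows by one batch per lab iteration).
X_observed = np.array([[0.745], [0.766], [0.0], [1.0], [0.5]])
Y_observed = np.array([[f(x[0])] for x in X_observed])

# Fresh BO object each iteration; f=None because evaluation happens in the lab.
bo = BayesianOptimization(f=None,
                          domain=domain,
                          X=X_observed,
                          Y=Y_observed,
                          acquisition_type='LCB',
                          batch_size=1)

x_next = bo.suggest_next_locations()   # parameter combination(s) to test next
print("Next experiment(s):", x_next)

# Run the experiment(s), append the new rows to X_observed / Y_observed,
# then repeat the block above with a brand-new BayesianOptimization object.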
Hope this helps!
For a while now I've been writing analytical Python code that gets run on demand when users interact with a front-end tool, through queue-based batch processing.
Typically the users set some values in the front-end tool that get passed as parameters to the analytical code, and they either supply a dataset or choose a subset of data from an overall data source that their company provides.
Typically each analytical model sits in a larger repo amongst other analytical models, so each model would usually sit in its own module, and that module exports one function which is the entry point into that model. The models range from simple ones that take on the order of minutes to very complex statistical or machine-learning-based models that might use combinations of numpy/Pandas/Numba or Dask dataframes and take on the order of hours.
Now to my question: I've been going back and forth on where I should concentrate my testing efforts for this type of code. Earlier in my career I naively thought that every function should have a unit test, so my code would have a comprehensive set of tests.
I quickly realised that this was counter-productive, as even a small performance refactor could result in ripping apart and possibly even throwing away a lot of the unit tests. So it felt like I should only be writing tests for the main public function of each model. However, that creates the opposite problem: for some of the more complex models, edge cases that sit quite deep in the control flow become hard to test.
My question then is how I should aim to properly test these analytical models. Some people would probably say "Only test public facing functions; if you can't reach an edge case through the public facing functions then it technically doesn't need to be there". But I've found that in reality this doesn't quite work.
To provide a simple example, say the particular model is to calculate a frequency matrix for dropoff/pickup points from a taxi dataset.
import pandas as pd

def _cat(col1, col2):
    cat_col = col1.astype(str).str.cat(col2.astype(str), sep=', ')
    return cat_col

def _make_points_df(taxi_df):
    pickup_points = _cat(taxi_df["pickup_longitude"], taxi_df["pickup_latitude"])
    dropoff_points = _cat(taxi_df["dropoff_longitude"], taxi_df["dropoff_latitude"])
    points_df = pd.DataFrame({"pickup": pickup_points, "dropoff": dropoff_points})
    return points_df

def _points_df_to_freq_mat(points_df):
    mat_df = points_df.groupby(['pickup', 'dropoff']).size().unstack(fill_value=0)
    return mat_df

def _validate_taxi_df(taxi_df):
    if type(taxi_df) is not pd.DataFrame:
        raise TypeError(f"taxi_df param must be a pandas dataframe, got: {type(taxi_df)}")
    expected_cols = {
        "pickup_longitude",
        "pickup_latitude",
        "dropoff_longitude",
        "dropoff_latitude",
    }
    if set(taxi_df) != expected_cols:
        raise RuntimeError(
            f"Expected the following columns for taxi_df param: {expected_cols}. "
            f"Got: {set(taxi_df)}"
        )

def calculate_frequency_matrix(taxi_df, long_round=1, lat_round=1):
    """Calculate a dropoff/pickup frequency matrix which tells you the number of times
    passengers have been picked up and dropped off at a given discrete point. The
    resolution of these points is controlled by the long_round and lat_round params.

    Parameters
    ----------
    taxi_df : pandas.DataFrame
        Dataframe specifying dropoff and pickup long/lat coordinates
    long_round : int
        Number of decimal places to round the dropoff and pickup longitude values to
    lat_round : int
        Number of decimal places to round the dropoff and pickup latitude values to

    Returns
    -------
    pandas.DataFrame
        Dataframe in matrix format of frequency of dropoff/pickup points

    Raises
    ------
    TypeError : If taxi_df is not a pandas DataFrame
    RuntimeError : If taxi_df does not contain correct columns
    """
    _validate_taxi_df(taxi_df)
    taxi_df = taxi_df.copy()
    taxi_df["pickup_longitude"] = taxi_df["pickup_longitude"].round(long_round)
    taxi_df["dropoff_longitude"] = taxi_df["dropoff_longitude"].round(long_round)
    taxi_df["pickup_latitude"] = taxi_df["pickup_latitude"].round(lat_round)
    taxi_df["dropoff_latitude"] = taxi_df["dropoff_latitude"].round(lat_round)
    points_df = _make_points_df(taxi_df)
    mat_df = _points_df_to_freq_mat(points_df)
    return mat_df
Taking in a dataframe like
pickup_longitude pickup_latitude dropoff_longitude dropoff_latitude
0 -73.988129 40.732029 -73.990173 40.756680
1 -73.964203 40.679993 -73.959808 40.655403
2 -73.997437 40.737583 -73.986160 40.729523
3 -73.956070 40.771900 -73.986427 40.730469
4 -73.970215 40.761475 -73.961510 40.755890
5 -73.991302 40.749798 -73.980515 40.786549
6 -73.978310 40.741550 -73.952072 40.717003
7 -74.012711 40.701527 -73.986481 40.719509
Say in terms of a folder structure this code would sit at
analytics/models/taxi_freq/taxi_freq.py
and the
analytics/models/taxi_freq/__init__.py
file would look like
from .taxi_freq import calculate_frequency_matrix
And obviously the private functions in the above code could be split across multiple utility files in analytics/models/taxi_freq/.
Would the consensus be to only test the calculate_frequency_matrix function, or should the "private" helper methods and other utility files/functions within the taxi_freq module also be tested?
As with software development in general, in testing you always have to search for solutions that represent the (ideally optimal) tradeoff between competing goals. One of the primary goals of testing in general, and of unit-testing in particular, is to find bugs (see Myers, Badgett, Sandler: The Art of Software Testing, or Beizer: Software Testing Techniques, among many others).
In your project you may have a more relaxed position on this, but there are many software projects where it would have serious consequences if implementation level bugs escape to later development phases or even to the field. Some say, your goal should rather be to increase confidence in your code - and this is also true, but confidence can only be a consequence of doing testing right. If you don't test to find bugs, then I will simply not have confidence in your code after you have finished testing.
When finding bugs is a primary goal of unit-testing, then attempts to keep unit-test suites completely independent of implementation details is likely to result in inefficient test suites - that is, test suites that are not suited to find all bugs that could be found. Different implementations have different potential bugs. If you don't use unit-testing for finding these bugs, then any other test level (integration, subsystem, system) is definitely less suited for finding them systematically.
For example, think about the different ways to implement a Fibonacci function: as an iterative or recursive function, as a closed form expression (Moivre/Binet), or as a lookup table: The interface is always the same, the possible bugs differ significantly, and so do the unit-testing strategies. There will be a useful set of implementation independent test cases, but these alone will not be sufficient to find all bugs that are likely for the specific implementation.
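To make this concrete, here is a rough sketch of two such implementations; the closed-form version silently loses precision once n gets large (somewhere around n = 70 with double precision), a bug that a small set of implementation-independent test cases would never catch:

import math

def fib_iterative(n):
    # straightforward iteration; exact for any n
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(n):
    # Moivre/Binet formula; correct only while floating-point rounding
    # errors stay below 0.5
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    return round(phi ** n / sqrt5)

# Implementation-independent test: passes for both implementations.
assert fib_iterative(10) == fib_closed_form(10) == 55

# Implementation-specific concern: only relevant for the closed-form variant.
assert fib_iterative(100) != fib_closed_form(100)   # exposes the precision bug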
The goal to have an effective test suite therefore is in competition with another goal, namely to have a maintenance friendly test suite. This goal, however, comes in different forms with different consequences: You could demand that the unit-test suite shall not be affected when implementation details change. This is quite tough and IMO puts the secondary goal of maintenance friendly test code above the primary goal of finding bugs.
Meszaros has a more balanced formulation, namely "The effort for changes to the code base shall be commensurate with the effort to maintain the test suite." (see Meszaros: Principles of Test Automation: Ensure Commensurate Effort). That is, little changes to the SUT shall only require little changes to the test suite, for larger changes to the SUT it is acceptable that the test suite requires equally large changes. (However, for me personally the formulation "the effort for test code maintenance shall be low" is sufficient.)
Conclusion:
For me, as I see finding bugs as the primary goal and test suite maintainability as a secondary goal, this leads to the following consequence: I accept that I also have to test implementation details to find more bugs. But despite this, I nevertheless try to keep the maintenance effort low, mostly by applying the following mechanisms that aim at making it simpler to adjust the test suite when the SUT changes:
First, if the goal of a specific test case can be reached by an implementation agnostic test case and an implementation dependent test case, prefer the implementation agnostic test case. In other words, don't make individual test cases unnecessarily implementation dependent.
Second, hide implementation details behind helper functions. There can be helper functions for specific setups, teardowns, assertions etc. This is an extremely powerful mechanism to limit the effect of implementation details within the test suite.
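Applied to the taxi example from the question, such helpers might look like the following sketch (the names and the import path are made up; adjust to your package layout). The tests only talk to the helpers, so a change to column naming or to the output format touches one place in the test suite:

import pandas as pd
from analytics.models.taxi_freq import calculate_frequency_matrix  # adjust to your layout

def make_taxi_df(rows):
    # rows: iterable of (pickup_long, pickup_lat, dropoff_long, dropoff_lat) tuples;
    # only this helper knows how the input columns are spelled
    return pd.DataFrame(rows, columns=["pickup_longitude", "pickup_latitude",
                                       "dropoff_longitude", "dropoff_latitude"])

def assert_total_trips(mat_df, expected_total):
    # only this helper knows that the result is a matrix of counts
    assert mat_df.to_numpy().sum() == expected_total

def test_every_trip_is_counted_exactly_once():
    taxi_df = make_taxi_df([(-74.0, 40.7, -73.9, 40.8),
                            (-74.0, 40.7, -73.9, 40.8),
                            (-73.9, 40.8, -74.0, 40.7)])
    mat_df = calculate_frequency_matrix(taxi_df)
    assert_total_trips(mat_df, expected_total=3)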
This may sound a bit like a rant, but I would also like your opinion on how to deal with the inconsistencies when using Python scripting in ABAQUS.
Here is my example: in my rootAssembly (ra) I have three instances called a, b, c. In the script below I assign a general seed, then mesh controls and element types, and finally I generate the mesh:
ra.seedPartInstance(regions=(a, b, c), size=1.0)
ra.setMeshControls(elemShape=QUAD,
                   regions=(a.faces + b.faces + c.faces),
                   technique=STRUCTURED)
ra.setElementType(elemTypes=eltyp,
                  regions=(a.faces, b.faces, c.faces))
ra.generateMesh(regions=(a, b, c))
As you can see, ABAQUS requires you to define the same region in several different modes.
Even though the argument is called "regions", ABAQUS either asks for a Set, or a Vertex, or a GeomSequence.
How do you deal with this? Scripting feels a lot like trial and error, as there seems to be no way to know in advance what is expected.
Any suggestions?
Yes, there is clearly "a way to know in advance what is expected" - the docs. These spell out exactly what arguments are allowed.
But seriously - I see no inconsistency in your example. In practice, the reuse of the argument regions makes complete sense when you consider the context for what each of the functions actually do. Consider how the word "region" is a useful conceptual framework that can be adapted to easily allow the user to specify the necessary info for a variety of different tasks.
Now consider the complexity of the underlying system that the Python API exposes, and the variety of tasks that different users want to control and do with that underlying system. I doubt it would be simpler if the args were named something like seq_of_geomCells_geomFaces_or_geomSets. Or even worse, if there were a different argument for each allowable model entity that the function was designed to handle - that would be a nightmare. In this respect, the reuse of the keyword regions as a logical conceptual framework makes complete sense.
OK, I have now read the documentation of the three commands used above:
seedPartInstance(...)
regions: A sequence of PartInstance objects specifying the part instances to seed.
setMeshControls(...)
regions: A sequence of Face or Cell regions specifying the regions for which to set the mesh control parameters.
setElementType(...)
regions: A sequence of Geometry regions or MeshElement objects, or a Set object containing either geometry regions or elements, specifying the regions to which element types are to be assigned.
OK, I get the difference between part instances and faces, but it's still not entirely clear why one is passed as a tuple (using commas) and the other is concatenated (using +), since they both ask for a sequence. And at this point, how does setElementType even work when passing face objects to it?
I will take some more time to learn ABAQUS and think this through; hopefully I can truly understand these differences.
I find myself constantly having to change and adapt old code back and forth repeatedly for different purposes, but occasionally to implement the same purpose it had two versions ago.
One example of this is a function which deals with prime numbers. Sometimes what I need from it is a list of n primes. Sometimes what I need is the nth prime. Maybe I'll come across a third need from the function down the road.
Whichever way I do it, though, I have to run through the same process but just return different values. I thought there must be a better way to do this than constantly changing the same code. The possible alternatives I have come up with are:
Return a tuple or a list, but this seems kind of messy since there will be all kinds of data types within, including lists of thousands of items.
Use input statements to direct the code, though I would rather just have it do everything for me when I click run.
Figure out how to utilize class features to return class properties and access them where I need them. This seems to be the cleanest solution to me, but I am not sure since I am still new to this.
Just make five versions of every reusable function.
I don't want to be a bad programmer, so which choice is the correct choice? Or maybe there is something I could do which I have not thought of.
Modular, reusable code
Your question is indeed important. It's important in a programmer's everyday life. It is the question:
Is my code reusable?
If it's not, you will run into code redundancy, having the same lines of code in more than one place. This is a perfect starting point for bugs. Imagine you want to change the behavior somehow, e.g., because you discovered a potential problem. Then you change it in one place but forget the second location, especially if your code reaches dimensions like 1,000, 10,000 or 100,000 lines of code.
This is summarized in the SRP, the Single Responsibility Principle. It states that every class (and this applies to functions as well) should have only one responsibility, that it "should do just one thing". If a function does more than one thing, you should break it apart into smaller chunks, smaller tasks.
Every time you come across (or write) a function with more than 10 or 20 lines of (real) code, you should be skeptical. Such functions rarely stick to this principle.
For your example, you could identify as individual tasks:
1. Generate prime numbers, one by one (generating implies using yield for me).
2. Collect n prime numbers. Uses 1. and puts them into a list.
3. Get the nth prime number. Uses 1., but does not save every number, just waits for the nth. Does not consume as much memory as 2. does.
4. Find pairs of primes: uses 1., remembers the previous number and, if the difference to the current number is two, yields this pair.
5. Collect all pairs of primes: uses 4. and puts them into a list.
...
...
The list is extensible, and you can reuse it at any level. No function will have more than 10 lines of code, and you will not be reinventing the wheel every time.
Put them all into a module, and use it from every script for a Project Euler problem related to primes.
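A sketch of such a module could look like this (simplest possible implementations; for item 5 an upper bound is added so that "all pairs" terminates):

# primes.py -- sketch of the decomposition above
from itertools import count, islice

def generate_primes():
    """1. Yield prime numbers one by one, indefinitely (naive trial division)."""
    for n in count(2):
        if all(n % p for p in range(2, int(n ** 0.5) + 1)):
            yield n

def first_n_primes(n):
    """2. Collect the first n primes into a list."""
    return list(islice(generate_primes(), n))

def nth_prime(n):
    """3. Return the nth prime (1-based) without storing the earlier ones."""
    return next(islice(generate_primes(), n - 1, None))

def generate_prime_pairs():
    """4. Yield pairs of primes whose difference is two."""
    gen = generate_primes()
    previous = next(gen)
    for prime in gen:
        if prime - previous == 2:
            yield (previous, prime)
        previous = prime

def prime_pairs_below(limit):
    """5. Collect the pairs whose larger member is below limit."""
    pairs = []
    for pair in generate_prime_pairs():
        if pair[1] >= limit:
            break
        pairs.append(pair)
    return pairs

Each function is a handful of lines, reuses the level below it, and can be imported individually from any Euler script.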
In general, I started a small library for my Euler Problem scripts. You really can get used to writing reusable code in "Project Euler".
Keyword arguments
Another option you didn't mention (as far as I understand) is the use of optional keyword arguments. If you regard small, atomic functions as too complicated (though I really insist you should get used to them), you could add a keyword argument to control the return value. E.g., some scipy functions have a parameter full_output that takes a bool. If it's False (the default), only the most important information is returned (e.g., an optimized value); if it's True, some supplementary information is returned as well, e.g., how well the optimization performed and how many iterations it took to converge.
You could define a parameter output_mode, with possible values "list", "last" or whatever.
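A toy illustration of that approach (reusing first_n_primes from the module sketched above; I would still prefer the small dedicated functions):

def primes(n, output_mode="list"):
    result = first_n_primes(n)
    if output_mode == "list":
        return result       # all n primes
    if output_mode == "last":
        return result[-1]   # just the nth prime
    raise ValueError("unknown output_mode: %r" % output_mode)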
Recommendation
Stick to small, reusable chunks of code. Getting used to this is one of the most valuable things you can pick up at "Project Euler".
Remark
If you try to implement the pattern I propose for reusable functions, you might run into a problem immediately at point 1: how do you create a generator-style function for this, e.g., if you use the sieve method? But it's not too bad.
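One way to do it is an incremental sieve that keeps a dictionary mapping each known composite to one of its prime factors (a sketch):

from itertools import count

def sieve_primes():
    composites = {}   # maps a composite number to the prime that "discovered" it
    for n in count(2):
        prime = composites.pop(n, None)
        if prime is None:
            # n was never marked as composite, so it is prime
            yield n
            composites[n * n] = n
        else:
            # reschedule: mark the next multiple of `prime` not yet taken
            m = n + prime
            while m in composites:
                m += prime
            composites[m] = prime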
My suggestion: create a module that contains:
a private core function (for example, one returning a list of the first n primes, or even something more general)
several wrapper/utility functions that use the core one and prepare the output in different ways (for example, the nth prime number).
Try to reduce your functions as much as possible, and reuse them.
For example you might have a function next_prime which is called repeatedly by n_primes and n_th_prime.
This also makes your code more maintainable: if you come up with a more efficient way to find primes, all you have to do is change the code in next_prime.
Furthermore, you should make your output as neutral as possible. If your function returns several values, it should return a list or a generator, not a comma-separated string.
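A sketch of that structure, with next_prime doing plain trial division and the other two reusing it:

def next_prime(n):
    """Smallest prime strictly greater than n."""
    candidate = max(n + 1, 2)
    while any(candidate % d == 0 for d in range(2, int(candidate ** 0.5) + 1)):
        candidate += 1
    return candidate

def n_primes(n):
    """List of the first n primes."""
    primes, current = [], 1
    for _ in range(n):
        current = next_prime(current)
        primes.append(current)
    return primes

def n_th_prime(n):
    """The nth prime (1-based)."""
    current = 1
    for _ in range(n):
        current = next_prime(current)
    return current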
I need to be able to take a formula that uses the OpenDocument formula syntax, parse it into syntax that Python can understand, but without evaluating the variables, and then be able to evaluate the formula many times with changing values for the variables.
Formulas can be user input, so pyparsing allows me to both effectively handle the formula syntax, and clean user input. There are a number of good examples of pyparsing available, but all the mathematical ones seem to assume that one evaluates everything in the current scope immediately.
For context, I am working with a model of the industrial economy (life cycle assessment, or LCA), where these formulas represent the amount of material or energy exchanged between processes. The amount can be a function of several parameters, such as geographical location. The chain of formula and variable references is stored in a directed acyclic graph, so that formulas can always be simply evaluated. Formulas are stored as strings in a database.
My questions are:
Is it possible to parse a formula such that the parsed evaluation can also be stored in the database (as a string to be evaled, or something else)?
Are there alternatives to this approach? Bear in mind that the ideal solution is to parse/write once, and read many times. For example, partially parsing the formula, and then using the ast module, although I don't know how this could work with database storage.
Any examples of a project or library similar to this that I could look over? I am not a programmer, just a student trying to finish his thesis while making an open-source LCA software model in my spare time.
Is this approach too slow? I would like to be able to do substantial Monte Carlo runs, where each run could involve tens of thousands of formula evaluations (it is a big database).
1) Yes, it is possible to pickle the results from parsing your expression, and save that to a database. Then you can just fetch and unpickle the expression, rather than reparse the original again.
2) You can do a quick-and-dirty pass at this just using the compile and eval built-ins, as in the following interactive session:
>>> y = compile("m*x+b","","eval")
>>> m = 100
>>> x = 5
>>> b = 1
>>> eval(y)
501
Of course, this has the security pitfalls of any eval- or exec-based implementation, in that untrusted or malicious source strings can embed harmful system calls. But if this is your thesis and entirely within your scope of control, just don't do anything foolish.
3) You can find an online example of parsing an expression into an "evaluatable" data structure on the pyparsing wiki's Examples page. Check out simpleBool.py and evalArith.py especially. If you're feeling flush, order a back issue of the May 2008 issue of Python magazine, which has my article "Writing a Simple Interpreter/Compiler with Pyparsing", with a more detailed description of the methods used, plus a description of how pickling and unpickling the parsed results works.
4) The slow part will be the parsing, so you are on the right track in preserving these results in some intermediate and repeatably-evaluatable form. The eval part should be fairly snappy. The second slow part will be fetching these pickled structures from your database. During your MC run, I would write a single function that takes the selection parameters for an expression, fetches it from the database, unpickles it, and returns the evaluatable expression. Then once you have this working, use a memoize decorator to cache these query/result pairs, so that any given expression only needs to be fetched/unpickled once.
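A sketch of that caching layer, using the compile()/eval() route from point 2 and functools.lru_cache as the memoize decorator (fetch_formula_source is a placeholder for your real database query; the same pattern works if you unpickle parsed results instead):

from functools import lru_cache

def fetch_formula_source(formula_id):
    # placeholder: look up the stored formula string in your database
    raise NotImplementedError

@lru_cache(maxsize=None)
def get_compiled_formula(formula_id):
    # each formula is fetched and compiled at most once per MC run
    source = fetch_formula_source(formula_id)
    return compile(source, "<formula %r>" % formula_id, "eval")

# inside the Monte Carlo loop:
# value = eval(get_compiled_formula(fid), {}, {"m": 100, "x": 5, "b": 1})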
Good luck with your thesis!