I'm starting with Python Behave as I want to make an API testing thing.
I've stumbled upon something that may not even be valid, but the question is: can I share data between features? I could store some in a DB or in files, but maybe there is something built in?
Or is this invalid and I shouldn't share data like that, or maybe it's possible inside a feature?
In practice it looks like:
I have to make a request to an endpoint; in the response I get a number that is required to make another request, which is the one that needs to be tested.
Yes, you can, and it is trivial. In the same directory where you have your feature files you can create a file named environment.py. In it, you can put:
def before_all(context):
    context.db = whatever
The before_all hook is run before all the tests and whatever you set there is available to all features. I use this, for instance, to create a new Selenium instance that will be used by all tests in a test suite. context.db above could be a database connection. This kind of sharing is fine.
What you share should be read-only or resettable to a known state between tests. It is not good practice to share writable resources between tests: it makes it hard to figure out what went wrong when a test fails, and it makes tests dependent on each other. So if you have a failure in test C but it depends on A and B, you cannot just ask Behave to run test C; you have to ask it to run A, B and then C.
If you decide to go against best practice and do it anyway, you should be aware that new values set on context are scoped by feature and scenario. So if your before_all hook sets context.foo = 1 and then feature A sets context.foo = 2, feature B, running after feature A, will see the value 1 for context.foo, because Behave will have removed the changes made by feature A (those changes are "scoped" to feature A). Now, you have to remember how Python stores values. If the hook sets context.foo = [] and feature A does context.foo.append(1), then feature B will see that context.foo has the value [1], because context.foo holds a reference to the list and calling append changes the list itself. So it is possible to work around the scoping. This is still not advisable, though.
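For illustration, here is a minimal sketch of that scoping behaviour; the shared dict and the step are hypothetical, not part of the original setup:

# environment.py
def before_all(context):
    context.run_mode = "initial"  # reassigning this inside a feature is scoped to that feature
    context.shared = {}           # mutating this dict leaks across features

# steps/shared_steps.py (hypothetical step, for illustration only)
from behave import given

@given("the run id has been stored")
def step_store_run_id(context):
    context.shared["run_id"] = 42  # later features will still see this value
    context.run_mode = "changed"   # Behave reverts this when the feature ends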
Last I checked, features are run in alphabetical order. You can force the order by specifying features on the command line: behave B.feature A.feature
I have a function that has the following code in it:
ses = boto3.Session()
as_role = aws_assume_role_lib.assume_role(session, "some_role")
I'm trying to unit test it and I need to mock those two calls.
What's the best way to do so?
Without more information, there's no way to answer your question. Any direct answer would require a lot more information. What is it that you want to test? It certainly can't be the code you're showing us because if you mock out both of these calls, there's nothing left to test.
Mocking involves injecting stand-ins for specific test cases such that, for those test cases, the mocks return predetermined values without calling the real routines. So what do you want those values to be? The mocks could also have side effects, or change their behavior based on external state, but you want to avoid both of those situations. Mocks are usually functional: they return a specific output for each set of specific explicit inputs, neither modifying nor being affected by implicit (external) state.
I assume that ses should be session so that the result of the first call is passed to the second call. In this case, neither of these calls takes varying data, so you can mock the result of the two calls by just assigning a static value to as_role...whatever value your later code wants to see. Since there is only one set of inputs, there should only be one possible output.
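As a minimal sketch of that simple case, assuming the two calls live in a hypothetical function my_module.get_role_session(), you could patch both names with unittest.mock and hand back predetermined values:

from unittest import mock

import my_module  # hypothetical module containing the code under test


def test_get_role_session():
    fake_session = mock.Mock(name="boto3-session")
    fake_role_session = mock.Mock(name="assumed-role-session")

    with mock.patch("my_module.boto3.Session", return_value=fake_session), \
         mock.patch("my_module.aws_assume_role_lib.assume_role",
                    return_value=fake_role_session) as mock_assume:
        result = my_module.get_role_session()

    # The mocks returned canned values; no real AWS call was made.
    mock_assume.assert_called_once_with(fake_session, "some_role")
    assert result is fake_role_session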
It would be more complicated if you needed the mocks to change their behavior based on external state. How you might mock in that case depends entirely on the external states that are possible, where they come from, and how they affect the value of as_role. Likewise, if you need your mocks to cause a change in external state, like modifying global variables, you'd need to specify that, and that might affect how you'd choose to mock.
In short, you should define a spec describing how you want your mocks to act under a restricted set of particular conditions. Having that spec will let you begin to decide how to implement them.
If you're just asking how to generally build mocks in Python, check out one or more of the existing Python mocking frameworks, like mock, flexmock, mox, Mocker, etc. A quick Google pointed me to this, which might be helpful: https://garybernhardt.github.io/python-mock-comparison/. Asking for library recommendations is against SO policy as it involves mostly personal opinions, so if you're asking for that, you should look elsewhere.
I have a very simple class structure. Classes B and C inherit from A. Some of the functions and properties are similar. Classes B and C process the data differently, but they produce the same output (the function creating the output is inside class A). Example of the structure:
Now I need to extend the functionality. I need to add an option for slightly different processing and outputting of the data (class X). This option is managed by a configuration file, so I want to keep the old way of downloading, processing and outputting the data, e.g.:
option 1 - download the data without threads using the old way processing and output
option 2 - download the data without threads using the new way processing and output
option 3 - download the data with threads using the old way processing and output
option 4 - download the data with threads using the new way processing and output
I am not sure how to implement the combination of the new processing and outputting of the data. I need a combination of class B and X (BX) and of class C and X (CX). I am thinking about these options:
The easiest way but the worst - duplicating some functions in the class B and class A.
Keep the classes A and B and add the combination of BX and AX classes.
Rewrite the classes A and B only for downloading the data. Then add classes for processing the data and then add classes for outputting the data. All classes will share the objects. Looks like the best option but with the most work to do.
Is there any better option (e.g. a design pattern or something like that) to extend the classes in the cleanest way?
It seems to me that you have a Dataflow.
What you have is Get (Download) Data -> Process Data -> Output Data. This basically is a pipeline with three stages.
If you are using objects to model this, you have two types of objects: the ones that perform the operations, and one or more that manage the data flow and the ordering of the operations.
Let's use three interfaces (you may use abstract classes to define them, it doesn't matter) to define the steps of the pipeline: IDataDownloader, IDataProcessor and IDataOutputer.
Let's add another object that will represent and manage the pipeline:
(Note: I don't have experience with Python, so I will write it in C#, sorry.)
public class Pipeline {
    private IDataDownloader mDataDownloader;
    private IDataProcessor mDataProcessor;
    private IDataOutputer mDataOutputer;

    public void Setup() {
        // use a config file to select between the two ways
        if (Config.UseOldWayDownloader) {
            mDataDownloader = new OldWayDownloader();
        }
        else {
            mDataDownloader = new NewWayDownloader();
        }

        // ....
        // or you can use a service locator or dependency injection that will
        // return a specific downloader based on configuration/setup
        mDataDownloader = Services.Get<IDataDownloader>();
        // ....
    }

    public void Execute() {
        var data = mDataDownloader.Download();
        var processedData = mDataProcessor.Process(data);
        mDataOutputer.Output(processedData);
    }
}
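Since the question is about Python, here is a rough sketch of the same idea in Python; the class names and the placeholder downloader are assumptions for illustration, not taken from the original code:

from abc import ABC, abstractmethod


class DataDownloader(ABC):
    @abstractmethod
    def download(self): ...


class DataProcessor(ABC):
    @abstractmethod
    def process(self, data): ...


class DataOutputter(ABC):
    @abstractmethod
    def output(self, processed): ...


class OldWayDownloader(DataDownloader):
    def download(self):
        return ["raw", "data"]  # placeholder for the real download logic


class Pipeline:
    def __init__(self, downloader, processor, outputter):
        # The concrete stages are injected, e.g. chosen from the config file.
        self._downloader = downloader
        self._processor = processor
        self._outputter = outputter

    def execute(self):
        data = self._downloader.download()
        processed = self._processor.process(data)
        self._outputter.output(processed)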
This way you get a nice separation of concerns and a modular system that you can extend. The system is also composable: you can compose its parts in different ways.
It may seem like more work, but that isn't necessarily true. Clean designs are often easier to write and extend.
It's true that you may write a little more code, but this code can be written faster because of the clean design, and you save time on debugging. Debugging consumes most of the time we spend writing software, but it's often overlooked by programmers who think productivity can be measured only by the lines of code written.
One example of this is the case of loosely typed languages. Quite often you write less code, so it's faster, but it's prone to errors due to mistakes. In case of an error you get more (and sometimes harder) debugging, so you waste time.
If you start with a simple proof-of-concept app, they will significantly speed up development. Once you get to the point of actually having robust, tested software, different forces kick in.
To ensure that your software is robust you have to write more tests and more checks/assertions to make sure that everything runs smoothly. So in the end you will probably have the same amount of code, since strong typing does some of the checks for you and you can write fewer tests.
So what's better? Well... it depends.
There are several factors that will influence the speed of development: the taste and experience of the programmers with a language, and the application itself. Some applications are easier to write in loosely typed languages, others in strongly typed languages.
Speed can differ, but not always; sometimes it's just the same.
I think that in your case you would waste more time trying to get the hierarchy right and debugging it, even if in the end you have a few fewer lines of code. In the long run, having a design that is harder to understand will slow you down significantly when you need to extend it.
In the pipeline approach adding new ways of downloading, processing or outputting the data is a matter of adding a single class.
Well, to make it a UML answer I'll add the class diagram:
The Pipeline references those three processing classes. They are all abstract, so you will need different implementations, and the configuration (file) determines which implementation is assigned.
I'm building a test framework using python + pytest + xdist + selenium grid. This framework needs to talk to a pre-existing custom logging system. As part of this logging process, I need to submit API calls to: set up each new test run, set up test cases within those test runs, and log strings and screenshots to those test cases.
The first step is to set up a new test run, and the API call for that returns (among other things) a Test Run ID. I need to keep this ID available for all test cases to read. I'd like to just stick it in a global variable somewhere, but running my tests with xdist causes the framework to lose track of the value.
I've tried:
Using a "globals" class; it forgot the value when using xdist.
Keeping a global variable inside my conftest.py file; same problem, the value gets dropped when using xdist. Also it seems wrong to import my conftest everywhere.
Putting a "globals" class inside the conftest; same thing.
At this point, I'm considering writing it to a temp file, but that seems primitive, and I think I'm overlooking a better solution. What's the most correct, pytest-style way to store and access global data across multiple xdist threads?
Might be worth looking into Proboscis, as it allows specific test dependencies and could be a possible solution.
Can you try config.cache? E.g.:
request.config.cache.set('run_id', run_id)
Refer to the documentation.
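For example, a session-scoped fixture in conftest.py could create the run once and cache its ID. This is only a sketch, and create_test_run() is a hypothetical call to your logging system; the cache is written to disk under .pytest_cache in the rootdir, which is why xdist workers started from the same directory can read the same value:

import pytest


def create_test_run():
    # hypothetical API call to the custom logging system; returns the run ID
    return "run-123"


@pytest.fixture(scope="session")
def test_run_id(request):
    run_id = request.config.cache.get("run_id", None)
    if run_id is None:
        run_id = create_test_run()
        request.config.cache.set("run_id", run_id)
    return run_id

Be aware that parallel workers may race to create the run; creating it up front, before the workers start, and only reading the cached value in tests avoids that.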
I'm teaching myself Python (3.x) and I'm trying to understand the use case for classes. I'm starting to understand what they actually do, but I'm struggling to understand why you would use a class as opposed to creating a module with functions.
For example, how does:
class cls1:
    def func1(arguments...):
        # do some stuff

obj1 = cls1()
obj2 = cls1()
obj1.func1(arg1, arg2...)
obj2.func1(arg1, arg2...)
Differ from:
# module.py contents
def func1(arguments...):
    # do some stuff

import module
x = module.func1(arg1, arg2...)
y = module.func1(arg1, arg2...)
This is probably very simple but I just can't get my head around it.
So far, I've had quite a bit of success writing python programs, but they have all been pretty procedural, and only importing basic module functions. Classes are my next biggest hurdle.
You use a class if you need multiple instances of it and you want those instances not to interfere with each other.
A module behaves like a singleton class, so you can have only one instance of it.
EDIT: for example if you have a module called example.py:
x = 0

def incr():
    global x
    x = x + 1

def getX():
    return x
if you try to import this module twice:
>>> import example as ex1
>>> import example as ex2
>>> ex1.incr()
>>> ex1.getX()
1
>>> ex2.getX()
1
This is because the module is imported only once, so ex1 and ex2 point to the same object.
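For contrast, here is a small illustrative class (not from the question) where each instance keeps its own state:

class Counter:
    def __init__(self):
        self.x = 0

    def incr(self):
        self.x += 1

    def get_x(self):
        return self.x

c1 = Counter()
c2 = Counter()
c1.incr()
print(c1.get_x())  # 1
print(c2.get_x())  # 0 -- c2 has its own, independent state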
As long as you're only using pure functions (functions that only work on their arguments, always return the same result for the same set of arguments, don't depend on any global/shared state and don't change anything, neither their arguments nor any global/shared state; in other words, functions that don't have any side effects), then classes are indeed of rather limited use. But that's functional programming, and while Python can technically be used in a functional style, it's possibly not the best choice here.
As soon as you have to share state between functions, and especially if some of these functions are supposed to change this shared state, you do have a use for OO concepts. There are mainly two ways to share state between functions: passing the state from function to function, or using globals.
The second solution, global state, is known to be troublesome, first because it makes understanding the program flow (hence debugging) harder, but also because it prevents your code from being reentrant, which is a definitive no-no for quite a lot of now-common use cases (multithreaded execution, most server-side web application code, etc.). It makes your code practically unusable or near-unusable for anything except short, simple one-shot scripts.
The first solution most often implies using half-informal complex data structures (dicts with a given set of keys, often holding other dicts, lists, lists of dicts, sets, etc.), correctly initialising them and passing them from function to function, and of course having a set of functions that work on a given data structure. In other words, you are actually defining new complex data types (a data structure and a set of operations on that data structure) using only the lowest-level tools the language provides.
Classes are actually a way to define such a data type at a higher level, grouping together the data and the operations. They also offer a lot more, especially polymorphism, which makes for more generic, extensible code, and for easier unit testing.
Consider that you have a file or a database with products, and each product has a product id, price, availability, discount, published-on-web status, and more values. And you have a second file with thousands of products that contains new prices, availability and discounts. You want to update the values and keep track of how many products will be changed, and other stats. You can do it with procedural or functional programming, but you will find yourself inventing tricks to make it work and most likely getting lost in many different lists and sets.
On the other hand, with object-oriented programming you can create a class Product with instance variables for the product id, the old price, the old availability, the old discount and the old published status, plus some instance variables for the new values (new price, new availability, new discount, new published status). Then all you have to do is read the first file/database and create a new instance of the class Product for every product. Then you can read the second file and find the new values for your product objects. In the end, every product of the first file/database will be an object that is labeled and carries both the old values and the new values. This makes it easier to track the changes, produce statistics and update your database.
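A minimal sketch of that Product class (the attribute names are illustrative, not prescribed by the question):

class Product:
    def __init__(self, product_id, price, availability, discount, published):
        self.product_id = product_id
        self.old_price = price
        self.old_availability = availability
        self.old_discount = discount
        self.old_published = published
        # the new values are filled in while reading the second file
        self.new_price = None
        self.new_availability = None
        self.new_discount = None
        self.new_published = None

    def price_changed(self):
        return self.new_price is not None and self.new_price != self.old_price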
One more example: if you use tkinter, you can create a class for a top-level window, and every time you want to show an information window or an about window (with a custom background color and dimensions) you can simply create a new instance of this class.
For simple things classes are not needed. But for more complex things classes sometimes can make the solution easier.
I think the best answer is that it depends on what your intended object is supposed to be/do. But in general, there are some differences between a class and an imported module which give each of them different features in the current module. The most important thing is that classes define objects, which means they have a lot of options for behaving like objects that modules don't have: for example, special methods like __getattr__, __setattr__, __iter__, etc., and the ability to create many instances and even control the way they are created. For modules, the documentation describes their use case perfectly:
If you quit from the Python interpreter and enter it again, the
definitions you have made (functions and variables) are lost.
Therefore, if you want to write a somewhat longer program, you are
better off using a text editor to prepare the input for the
interpreter and running it with that file as input instead. This is
known as creating a script. As your program gets longer, you may want
to split it into several files for easier maintenance. You may also
want to use a handy function that you’ve written in several programs
without copying its definition into each program.
To support this, Python has a way to put definitions in a file and use
them in a script or in an interactive instance of the interpreter.
Such a file is called a module; definitions from a module can be
imported into other modules or into the main module (the collection of
variables that you have access to in a script executed at the top
level and in calculator mode).
Does anyone know how pydev determines what to use for code completion? I'm trying to define a set of classes specifically to enable code completion. I've tried using __new__ to set __dict__ and also __slots__, but neither seems to get listed in pydev autocomplete.
I've got a set of enums I want to list in autocomplete, but I'd like to set them in a generator, not hardcode them all for each class.
So rather than
class TypeA(object):
    ValOk = 1
    ValSomethingSpecificToThisClassWentWrong = 4

    def __call__(self):
        return 42
I'd like do something like
def TYPE_GEN(name, val, enums={}):
    def call(self):
        return val
    dct = {}
    dct["__call__"] = call
    dct['__slots__'] = enums.keys()
    for k, v in enums.items():
        dct[k] = v
    return type(name, (), dct)
TypeA = TYPE_GEN("TypeA",42,{"ValOk":1,"ValSomethingSpecificToThisClassWentWrong":4})
What can I do to help the processing out?
edit:
The comments seem to be about questioning what I am doing. Again, a big part of what I'm after is code completion. I'm using python binding to a protocol to talk to various microcontrollers. Each parameter I can change (there are hundreds) has a name conceptually, but over the protocol I need to use its ID, which is effectively random. Many of the parameters accept values that are conceptually named, but are again represented by integers. Thus the enum.
I'm trying to autogenerate a python module for the library, so the group can specify what they want to change using the names instead of the error prone numbers. The __call__ property will return the id of the parameter, the enums are the allowable values for the parameter.
Yes, I can generate the verbose version of each class. One line for each type seemed clearer to me, since the point is autocomplete, not viewing these classes.
OK, as pointed out, your code is too dynamic for this... PyDev will only analyze your own code statically (i.e.: code that lives inside your project).
Still, there are some alternatives there:
Option 1:
You can force PyDev to analyze code that's in your library (i.e.: in site-packages) dynamically, in which case it could get that information dynamically through a shell.
To do that, you'd have to create a module in site-packages and in your interpreter configuration you'd need to add it to the 'forced builtins'. See: http://pydev.org/manual_101_interpreter.html for details on that.
Option 2:
Another option would be putting it into your predefined completions (but in this case it also needs to be in the interpreter configuration, not in your code -- and you'd have to make the completions explicit there anyways). See the link above for how to do this too.
Option 3:
Generate the actual code. I believe that Cog (http://nedbatchelder.com/code/cog/) is the best alternative for this, as you can write Python code to output the contents of the file and later change the code/rerun Cog to update what's needed. (If you want proper completions without having to treat your code as if it were a library in PyDev, I believe that'd be the best alternative, and you'd get a better grasp of what you have, as your structure would be explicit there.)
Note that cog also works if you're in other languages such as Java/C++, etc. So, it's something I'd recommend adding to your tool set regardless of this particular issue.
Fully general code completion for Python isn't actually possible in an "offline" editor (as opposed to in an interactive Python shell).
The reason is that Python is too dynamic; basically anything can change at any time. If I type TypeA.Val and ask for completions, the system had to know what object TypeA is bound to, what its class is, and what the attributes of both are. All 3 of those facts can change (and do; TypeA starts undefined and is only bound to an object at some specific point during program execution).
So the system would have to know: at what point in the program run do you want the completions from? And even if there were some unambiguous way of specifying that, there's no general way to know the state of everything in the program at that point without actually running it to that point, which you probably don't want your editor to do!
So what pydev does instead is guess, when it's pretty obvious. If you have a class block in a module foo defining class Bar, then it's a safe bet that the name Bar imported from foo is going to refer to that class. And so you know something about what names are accessible under Bar., or on an object created by obj = Bar(). Sure, the program could be rebinding foo.Bar (or altering its set of attributes) at runtime, or could be run in an environment where import foo is hitting some other file. But that sort of thing happens rarely, and the completions are useful in the common case.
What that means though is that you basically lose completions whenever you use "too much" of Python's dynamic language flexibility. Defining a class by calling a function is one of those cases. It has no way to guess that TypeA has names ValOk and ValSomethingSpecificToThisClassWentWrong; after all, there are presumably lots of other objects that result from calls to TYPE_GEN, but they all have different names.
So if your main goal is to have completions, I think you'll have to make it easy for pydev and write these classes out in full. Of course, you could use similar code to generate the Python files (textually) if you wanted. It looks, though, like there's actually more "syntactic overhead" in defining these with dictionaries than as a class: you're writing "a": b per item rather than a = b. Unless you can generate these more systematically or parse existing definition files or something, I think I'd find the static class definition easier to read and write than the dictionary driving TYPE_GEN.
The simpler your code, the more likely completion is to work. Would it be reasonable to have this as a separate tool that generates Python code files containing the class definitions like you have above? This would essentially be the best of both worlds. You could even put the name/value pairs in a JSON or INI file or what have you, eliminating the clutter of the methods call among the name/value pairs. The only downside is needing to run the tool to regenerate the code files when the codes change, but at least that's an automated, simple process.
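As a rough sketch of that approach, assuming the name/value pairs were kept in a JSON file (the file layout, names and helper function here are hypothetical):

import json

TEMPLATE = """class {name}(object):
{fields}

    def __call__(self):
        return {param_id}
"""

def generate(json_path, out_path):
    with open(json_path) as src:
        spec = json.load(src)  # e.g. {"TypeA": {"id": 42, "enums": {"ValOk": 1}}}
    chunks = []
    for name, info in spec.items():
        fields = "\n".join(
            "    {} = {}".format(key, value)
            for key, value in sorted(info["enums"].items())
        )
        chunks.append(TEMPLATE.format(name=name, fields=fields, param_id=info["id"]))
    with open(out_path, "w") as dst:
        dst.write("\n\n".join(chunks))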
Personally, I would just go with making things more verbose and writing out the classes manually, but that's just my opinion.
On a side note, I don't see much benefit in making the classes callable vs. just having an id class variable. Both require knowing what to type: TypeA() vs TypeA.id. If you want to prevent instantiation, I think throwing an exception in __init__ would be a bit more clear about your intentions.