I have a system, let's say a product firmware that has to be tested on its embedded platform. For this I can access the platform using a C library and I will need to control some instruments (function generators, multi-meters, oscilloscopes) to get some measurements.
To be more specific about my application, imagine I would like to check that the firmware of a parallel robot is working correctly. Apart from the independent unit tests inside the firmware source code, I will need to run more tests that involve physical feedback. I can, for instance, measure the pressure of an arm against a sensor or measure the back-EMF voltage. Some of these tests are critical (meaning the overall testing procedure fails with no second chance). Others are not strictly pass/fail: they can raise a warning and the test can continue.
Because some routines are implemented in low-level C, they are part of the API that I am calling from Matlab/Python. These routines may fail, and I will have to catch the returned error code in a try/catch.
At a different level, the test itself can also break in various ways. If a failure occurs, I would like to log the complete traceback: which test unit, which class instance, which method and which API function.
I have found two technologies that seem well suited for this purpose: Matlab and Python. With both I can access C DLLs, plot graphs and control instruments over a VISA port.
To write the series of tests in Python I can use a combination of unittest, HTMLTestRunner and logging.
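For illustration, here is a minimal sketch of how that combination could be wired together; the module, test and threshold values are invented, and HTMLTestRunner's constructor arguments vary slightly between versions:

import logging
import unittest

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(name)s %(levelname)s %(message)s')
log = logging.getLogger(__name__)

class BackEmfTest(unittest.TestCase):
    """Hypothetical test: measured back-EMF voltage stays within bounds."""
    def test_back_emf_within_limits(self):
        log.info('querying the multimeter over VISA (stubbed out here)')
        voltage = 1.2  # placeholder for a real instrument reading
        self.assertTrue(0.5 <= voltage <= 2.0, 'back-EMF out of range')

if __name__ == '__main__':
    # Plain text runner; an HTMLTestRunner instance could instead be passed
    # via unittest.main(testRunner=...) to produce an HTML report.
    unittest.main(verbosity=2)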
In Matlab the existing packages seem rather poor. I have found log4m, and Matlab 2015 provides native functions for unit testing. Still, my feeling is that Python is a bit better suited for what I am trying to do.
I have to say I already have plenty of Matlab licenses and money is not an issue in this case.
I recently discovered that Matlab and Python both offer interfaces to talk to each other. I have also read that these two technologies are becoming very popular for testing purposes.
I would like to find solid arguments and concrete examples that will help me choose the right technology.
My current feeling is that Matlab will not give me much help automating my test bench, for instance with external hooks into Git/ClearCase repositories. Building an HTML report and getting good traceback information does not look easy to achieve.
Is it possible to get good traceback information, a decent logger across my modules, and test classes that can be triggered by an external script in Matlab?
It's tough to say without gathering more details on your specific use case and learning more about how you are setting up your environment. Just looking at what you are using in Python, we have the unit test framework, a logging framework, and an HTML test runner.
As you say, in MATLAB we have the unit test framework, which itself has logging features (1, 2). Can those be used? There is not (yet) an HTML report, but you can:
Use the XMLPlugin to produce JUnit style xml and perform an XSLT to create html from it.
Plug in to a CI system using either JUnit or TAP and these systems typically have reporting capabilities. An example of doing this with Jenkins is here and here.
Ultimately, the CI system links above show how to run your tests automatically, and I can tell you that there has been, and continues to be, a very large investment in the diagnostic (i.e. traceback) information provided by the unit test framework. The logging features in the framework are geared toward test logging rather than inter-module logging, so I am not sure whether that will work for what you would like, but there is also log4m, as you say.
I've found an example: https://medium.com/velotio-perspectives/a-comprehensive-tutorial-to-implementing-opentracing-with-jaeger-a01752e1a8ce
I have a pretty large codebase and I really don't want to modify every function by adding a line like "with tracer.start_span('booking') as span:" to each of them. Is there any way to avoid that?
Thanks in advance.
Jaeger is a distributed tracer, inspired by Google's Dapper paper, and so it is mainly used for tracing communication between different processes in a microservices / distributed system architecture, not so much for portions of code inside an application.
The way Jaeger is introduced into most applications is to integrate it into the part of the application that is receiving requests from the network. For example, if your Python application is receiving HTTP requests using Django or Flask, or other types of requests (e.g. gRPC) using some other framework, there will probably be a project somewhere on the internet that lets you hook Jaeger into your framework with a couple of lines of code. For the most popular frameworks, the Jaeger docs point to opentracing-contrib as a good source for these "client libraries".
While making extra tracing calls inside an application is not unheard of or discouraged with distributed tracers, it's not something that tends to happen a lot, because they are typically used in microservices environments where the interactions between components matter more than what's happening inside the components.
If you do want to create tracing records inside an application, then it would be very unusual to do tracing of every single function. Instead, tracing inside an application would typically be done at the boundary of components in a modular monolith, i.e. when one component calls into another component.
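If you do go that route, a decorator applied only at component boundaries keeps the instrumentation out of the bulk of the codebase. A rough sketch using the opentracing 2.x API is below; the span name and the wrapped function are made up, and a concrete tracer (e.g. from jaeger-client) still has to be registered elsewhere, otherwise the global tracer is a no-op:

import functools
import opentracing

def traced(operation_name):
    """Wrap a component-boundary function in a span; inner calls stay untouched."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            tracer = opentracing.global_tracer()  # no-op unless a real tracer is set
            with tracer.start_active_span(operation_name):
                return func(*args, **kwargs)
        return wrapper
    return decorator

@traced('booking')   # hypothetical component entry point
def create_booking(order):
    ...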
Lastly, if what you really want is performance analysis of your single Python application at the level of each function, and you don't care about its interaction with other applications in your system (maybe you only have the one?), then Jaeger is probably not the right tool. In that case, you would probably want to look for an Application Performance Monitoring (APM) tool that works with Python and suits your needs.
I maintain several web applications and I'd like to add some "nice" reporting/analytics pages. Building that once is simple enough (e.g. using flot or a similar plotting library), but it seems like there should be a report-generation library out there which "just" generates the necessary graphs without much coding and offers some filtering ability.
There are some tools out there, but for some reason none was ever a good fit:
must work on Linux
open source preferred, though closed source works as well as long as the pricing model also suits small installs
Python API required (or external services using standard web protocols)
I realize that this is not exactly a unique question but I couldn't find other stackoverflow questions with the same scope. Any pointers appreciated.
Update (2012-08-09, 15:10 UTC): I realized I did not state some more requirements/wishes:
web interface to access reports
access control: Each user can only get reports on his own data (simple to do with a library, might be hard with an external server)
filtering: I need interactive filtering of values based on some parameters (e.g. "only events in this time frame", "only in place X").
Windward* is one software company that offers a solution that seems to meet most of your needs. They offer a Python API through either Jython or a RESTful API (their Java Engine and Javelin, respectively), and their main strength is that template design is done in Microsoft Office, so reports can be very flexible and are easy to put together (most people already know how to use Word, so there's also much less learning curve than other solutions out there). You can add dynamic filters that take parameters at runtime or change on-the-fly, you can output to a variety of formats including HTML and PDF, and it works with pretty much every major datasource. For a web interface, you can either build your own and easily integrate reporting into it (Engine) or buy one pre-built and modify it to your specifications (Javelin).
On the downside, they are closed-source and without knowing more about your setup, it would be difficult for me to say whether their pricing would work. Might be worth a look, though--the links above and their documentation wiki are probably good places to start looking to see if you're a fit.
*Disclaimer: I work for Windward. I do believe they are one of the better reporting packages out there, but there are others that may fit your needs too.
I would like to build a "live coding framework".
I should explain what is meant by "live coding framework". I'll do so by comparing live coding to traditional coding.
Generally put, in traditional programming you write code, sometimes compile it, then launch an executable or open a script in some sort of interpreter. If you want to modify your application you must repeat this process. A live coding framework enables code to be updated while the application is running and reloaded on demand. Perhaps this reloading happens each time a file containing code is changed or by some other action. Changes in the code are then reflected in the application as it is running. There is no need to close the program and to recompile and relaunch it.
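As a rough illustration of the idea, not tied to any particular framework, here is a Python sketch that reloads a module whenever its source file changes on disk, so the running loop picks up the new code without a restart; the module name and update() function are hypothetical:

import importlib
import os
import time

import game_logic  # hypothetical module providing update(state)

state = {}
last_mtime = os.path.getmtime(game_logic.__file__)

while True:
    mtime = os.path.getmtime(game_logic.__file__)
    if mtime != last_mtime:           # source file changed on disk
        importlib.reload(game_logic)  # swap in the new code without restarting
        last_mtime = mtime
    game_logic.update(state)          # freshly reloaded code is used immediately
    time.sleep(1.0 / 60)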
In this case, the application is a windowed app that has an update/draw loop, most likely uses OpenGL for graphics, an audio library for sound processing (SuperCollider?) and ideally a networking library.
Of course I have preferred languages, though I'm not certain that any of them would be well suited for this kind of architecture. Ideally I would use Python, Lua, Ruby or another higher level language. However, a friend recently suggested Clojure as a possibility, so I am considering it as well.
I would like to know not only what languages would be suitable for this kind of framework but, generally, what language features would make a framework such as this possible.
I have implemented a live coding feature in Lua as part of the ZeroBrane Studio IDE. It works exactly as you described by reloading the application when a change in the code is made. I'm working on possible improvements to modify values at run-time to avoid full reload of the application. It's a pure Lua-based solution and doesn't require any modifications to the VM.
You can see the demo of the live coding as currently implemented here: http://notebook.kulchenko.com/zerobrane/live-coding-in-lua-bret-victor-style.
In terms of language features used/required, I rely on:
the ability to interrupt/resume a running application (this is based on debug.hook and error() calls),
the ability to interact with the (unmodified) application remotely (this is done based on debug.hook, TCP interactions with select() support to detect whether a new request is being sent from the host machine, as well as on coroutines to switch between the main application and the live-coding module), and
the ability to inject new code into the application (this mechanism also uses coroutines, but I'm sure there are alternatives). There is also the possibility to inject just a modified fragment, but it needs to be at the level of a function, and if this function is local to some other function, you need to include that too, and so on.
Clojure has pretty much everything you are likely to want as a live coding language. Main highlights:
Interactive REPL - so you can interact directly with your running program. Even when I'm doing "traditional programming" I tend to write code interactively and copy the bits I like into a source file later. Clojure is just designed to work this way - pretty much everything in your program is inspectable, modifiable and replaceable at runtime.
Great concurrency support - you can kick off concurrent background tasks trivially with code like (future (some-function)). More importantly, Clojure's STM and emphasis on high-performance immutable data structures will take care of the more subtle concurrency aspects (e.g. what happens if I update a live data structure while it is in the middle of being rendered?)
Library availability - it's a JVM language so you can pull in all the audio, visual, IO or computational tools you require from the Java ecosystem. It's easy to wrap these in a line or two of Clojure so that you get a concise interface to the functions that you need
Macros - as Clojure is a homoiconic language you can take advantage of the Lisp ability to write powerful macros that extend the language. You can effectively build the exact syntax that you want to use in the live environment, and let the compiler do all the hard work of creating the complete code behind the scenes.
Dynamic typing - the benefits of this can be argued both ways, but it's certainly a huge benefit when trying to write code quickly and concisely.
Active community with a lot of cool projects - you're likely to find a lot of people interested in similar live coding techniques in the Clojure community.
A couple of links you might find interesting:
Paul Graham on Lisp - beating the averages
Live Clojure coding example with the Overtone sound synthesizer (a frontend to SuperCollider)
The only thing that’s necessary to make this work is a form of dynamic binding, e.g., message passing in Erlang or eval in many other languages.
If you have dynamic binding, then you can change the target of a message without affecting the message, or a message without affecting the target—provided that a target is defined when you try to send a message to it, and a message is defined for the targets to which you send it, when you send it.
When changing a target, all you have to do is serve messages to the previous version until the new version is in place, then do a small locked update to transition to the new version. Similarly, when changing a message, you just serve the old version till the new one is available.
Readily hot-swappable code must still be designed as such, however—the application must be modular enough that replacing the implementation of a component does not cause an interruption, and that can only come from careful programming.
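To make the idea concrete, here is a small Python sketch (all names invented for illustration) in which messages go through one level of indirection, so a target can be replaced at runtime with a small locked update while messages keep being served:

import threading

_lock = threading.Lock()
_handlers = {}

def register(name, fn):
    with _lock:                       # the small locked update described above
        _handlers[name] = fn

def send(name, *args, **kwargs):
    with _lock:
        fn = _handlers[name]          # old version is served until the swap completes
    return fn(*args, **kwargs)

def greet_v1(who):
    return "hello, %s" % who

def greet_v2(who):
    return "Hello, %s!" % who

register("greet", greet_v1)
print(send("greet", "world"))         # hello, world
register("greet", greet_v2)           # live update: the target is replaced
print(send("greet", "world"))         # Hello, world!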
It's well and good to have 'live coding' on your dev box, but a way to directly interact with a deployed server takes it a lot closer to being 'real'. For this you need a network aware REPL.
Clojure provides this nicely in the form of a socket REPL. This allows you to remotely attach to the running version of your code on your deployed Tomcat server (for instance). You can then attach your favorite swank-enabled development tool and hack away.
Smalltalk is probably the best bet for this: unlike the others, it has a whole IDE for live coding, not just a REPL.
Tcl has such a thing already. For example, you can write a gui program that creates a separate window that has an interactive prompt. From there you can reload your code, type in new code, etc.
You can do this with any gui toolkit, though some will be much harder than others. It should be easy with python, though the indentation thing -- IMHO -- makes interactive use challenging. I'm reasonably certain most other dynamic languages can do this without too much trouble.
Look at it this way: if your toolkit lets you open more than one window, there's no reason why one of those windows can't be an interactive prompt. All you need is the ability to open a window, and some sort of "eval" command that runs code fed to it as a string.
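For example, here is a bare-bones version of that idea in Python with Tkinter; it is only a sketch, and a real tool would capture output and handle multi-line input:

import tkinter as tk

root = tk.Tk()                     # the "real" application window would live here
prompt = tk.Toplevel(root)         # a second window acting as the interactive prompt
prompt.title('interactive prompt')
entry = tk.Entry(prompt, width=80)
entry.pack()

def run_line(event):
    code = entry.get()
    try:
        # exec against module globals so new definitions replace old ones
        exec(compile(code, '<prompt>', 'single'), globals())
    except Exception as exc:
        print('error:', exc)
    entry.delete(0, tk.END)

entry.bind('<Return>', run_line)
root.mainloop()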
Python on Google App Engine has remote_api_shell.py. This is not a full live-coding suite - Clojure on Emacs with swank-clojure has had much more real-life use as far as integrating 'livecoding' into everyday development rhythms - but a lot of people don't realize this is possible in certain Python environments.
$ PYTHONPATH=. remote_api_shell.py -s dustin-getz.appspot.com
App Engine remote_api shell
Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
The db, users, urlfetch, and memcache modules are imported.
dustin-getz> import models
dustin-getz> models.BlogPost(title='a modern take on automated testing', link='https://docs.google.com/document/pub?id=1DUxQogBg45rOTK4c5_SfEHiQcvL5c207Ivcy-gDNx2s', dont_publish_feed=False).put()
dustin-getz> items = models.BlogPost.all().filter('dont_publish_feed =', False).order('-published_date').fetch(100)
dustin-getz> len(items)
58
dustin-getz> for item in items[:5]: print item.title
a modern take on automated testing
Notes: Running a startup on haskell
the [un]necessity of superstar middle management in bigcos
"everything priced above its proper value"
stages of growth as a software engineer
I'm working on a live coding feature for PyDev's Python editor. It was inspired by Bret Victor's Inventing on Principle talk, and I've implemented a program state display as well as turtle graphics. They both update as you type your Python code in Eclipse.
The project is hosted on GitHub, and I've posted a demo video, as well as a tutorial.
The main features of Python that I used were abstract syntax trees and dynamic code execution. I take the user's code, parse it into a tree, then instrument any assignment statements, loop iterations, and function calls. Once I've instrumented the tree, I execute it and display the report or draw the requested turtle graphics.
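The instrumentation step can be sketched roughly as follows; this is a simplified, hypothetical version of the approach that only wraps simple assignments with a report() call (it assumes Python 3.8+ for ast.Constant):

import ast

class ReportAssignments(ast.NodeTransformer):
    """After every simple assignment, insert a call to report(name, value)."""
    def visit_Assign(self, node):
        self.generic_visit(node)
        calls = []
        for target in node.targets:
            if isinstance(target, ast.Name):
                calls.append(ast.Expr(ast.Call(
                    func=ast.Name(id='report', ctx=ast.Load()),
                    args=[ast.Constant(target.id),
                          ast.Name(id=target.id, ctx=ast.Load())],
                    keywords=[])))
        return [node] + calls

def report(name, value):
    print('%s = %r' % (name, value))

source = "x = 6\ny = x * 7\n"   # the user's code
tree = ast.fix_missing_locations(ReportAssignments().visit(ast.parse(source)))
exec(compile(tree, '<live>', 'exec'), {'report': report})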
I haven't implemented the swapping feature that other answers discuss. Instead, I always run the code to completion or a time out. I envision live coding as an enhancement to test-driven development, not as a way to hack on a live application. However, I will think more about what swapping out pieces of a live application would let me do.
I'm writing tests for a Django application that uses an external data source. Obviously, I'm using fake data to test all the inner workings of my class but I'd like to have a couple of tests for the actual fetcher as well. One of these will need to verify that the external source is still sending the data in the format my application expects, which will mean making the request to retrieve that information in the test.
Obviously, I don't want our CI to come down when there is a network problem or the data provider has a spot of downtime. In this case, I would like to throw a warning that skips the rest of that test method and doesn't contribute to an overall failure. This way, if the data arrives successfully I can check it for consistency (failing if there is a problem) but if the data cannot be fetched it logs a warning so I (or another developer) know to quickly check that the data source is ok.
Basically, I would like to test my external source without being dependant on it!
Django's test suite uses Python's unittest module (at least, that's how I'm using it), which looks useful, given that its documentation describes skipping tests and expected failures. This feature is apparently 'new in version 2.7', which explains why I can't get it to work - I've checked the version of unittest I have installed from the console and it appears to be 1.63!
I can't find a later version of unittest in pypi so I'm wondering where I can get hold of the unittest version described in that document and whether it will work with Django (1.2).
I'm obviously open to recommendations / discussion on whether or not this is the best approach to my problem :)
[EDIT - additional information / clarification]
As I said, I'm obviously mocking the dependency and doing my tests on that. However, I would also like to be able to check that the external resource (which typically is going to be an API) still matches my expected format, without bringing down CI if there is a network problem or their server is temporarily down. I basically just want to check the consistency of the resource.
Consider the following case...
If you have written a Twitter application, you will have tests for all your application's methods and behaviours - these will use fake Twitter data. This gives you a complete, self-contained set of tests for your application. The problem is that this doesn't actually check that the application works because your application inherently depends on the consistency of Twitter's API. If Twitter were to change an API call (perhaps change the URL, the parameters or the response) the application would stop working even though the unit tests would still pass. (Or perhaps if they were to completely switch off basic authentication!)
My use case is simpler - I have a single xml resource that is used to import information. I have faked the resource and tested my import code but I would like to have a test that checks the format of that xml resource has not changed.
My question is about skipping tests in Django's test runner so I can throw a warning if the resource is unavailable without the tests failing, specifically getting a version of Python's unittest module that supports this behaviour. I've given this much background information to allow anyone with experience in this area to offer alternative suggestions.
Apologies for the lengthy question, I'm aware most people won't read this now.
I've 'bolded' the important bits to make it easier to read.
I created a separate answer since your edit invalidated my last answer.
I assume you're running on Python version 2.6 - I believe the changes that you're looking for in unittest are available in Python version 2.7. Since unittest is in the standard library, updating to Python 2.7 should make those changes available to you. Is that an option that will work for you?
One other thing that I might suggest is to maybe separate the "external source format verification" test(s) into a separate test suite and run that separately from the rest of your unit tests. That way your core unit tests are still fast and you don't have to worry about the external dependencies breaking your main test suites. If you're using Hudson, it should be fairly easy to create a separate job that will handle those tests for you. Just a suggestion.
The new features in unittest in 2.7 have been backported to 2.6 as unittest2. You can just pip install it and substitute unittest2 for unittest, and your tests will work as they did, plus you get the new features without upgrading to 2.7.
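With that in place, the skip behaviour described in the docs becomes available. A rough sketch of the kind of test you describe (the URL and the format check are invented for illustration):

import urllib2
import unittest2 as unittest

FEED_URL = 'http://example.com/data.xml'  # placeholder for the real resource

class ExternalFormatTest(unittest.TestCase):
    def test_feed_format_unchanged(self):
        try:
            data = urllib2.urlopen(FEED_URL, timeout=10).read()
        except IOError as exc:
            # Network problem or provider downtime: skip instead of failing CI.
            self.skipTest('external source unreachable: %s' % exc)
        self.assertIn('<item>', data)  # consistency check on the expected format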
What are you trying to test? The code in your Django application(s) or the dependency? Can you just Mock whatever that external dependency is? If you're just wanting to test your Django application, then I would say Mock the external dependency, so your tests are not dependent upon that external resource's availability.
If you can post some code of your "actual fetcher" maybe you will get some tips on how you could use mocks.
I manage the testing for a very large financial pricing system. Recently our HQ have insisted that we verify that every single part of our project has a meaningful test in place. At the very least they want a system which guarantees that when we change something we can spot unintentional changes to other sub-systems. Preferably they want something which validates the correctness of every component in our system.
That's obviously going to be quite a lot of work! It could take years, but for this kind of project it's worth it.
I need to find out which parts of our code are not covered by any of our unit tests. If I knew which parts of my system were untested then I could set about developing new tests which would eventually approach my goal of complete test coverage.
So how can I go about running this kind of analysis? What tools are available to me?
I use Python 2.4 on Windows 32bit XP
UPDATE0:
Just to clarify: we have a very comprehensive unit-test suite (plus a separate and very comprehensive regtest suite which is outside the scope of this exercise). We also have a very stable continuous-integration platform (built with Hudson) which is designed to split up and run standard Python unit tests across our test facility: approximately 20 PCs built to the company spec.
The object of this exercise is to plug any gaps in our Python unittest suite (only) so that every component has some degree of unit-test coverage. Other developers will be taking responsibility for non-Python components of the project (which are also outside the scope).
"Component" is intentionally vague: Sometime it will be a class, other time an entire module or assembly of modules. It might even refer to a single financial concept (e.g. a single type of financial option or a financial model used by many types of option). This cake can be cut in many ways.
"Meaningful" tests (to me) are ones which validate that the function does what the developer originally intended. We do not want to simply reproduce the regtests in pure python. Often the developer's intent is not immediatly obvious, hence the need to research and clarify anything which looks vague and then enshrine this knowledge in a unit-test which makes the original intent quite explicit.
For the code coverage alone, you could use coverage.py.
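A minimal way to use it programmatically around an existing unittest run is sketched below; the package name is a placeholder, the API shown is that of recent coverage.py releases (older versions expose coverage.coverage() instead of coverage.Coverage()), and on the Python 2.4 mentioned in the question the command-line interface or nose's coverage plugin (discussed further down) would be the practical route:

import unittest
import coverage

cov = coverage.Coverage(source=['mypackage'])  # 'mypackage' is a placeholder
cov.start()

# run the existing unit-test suite under measurement
suite = unittest.defaultTestLoader.discover('tests')
unittest.TextTestRunner(verbosity=2).run(suite)

cov.stop()
cov.save()
cov.report(show_missing=True)  # lines never executed point at untested code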
As for coverage.py vs figleaf:
figleaf differs from the gold standard of Python coverage tools ('coverage.py') in several ways. First and foremost, figleaf uses the same criterion for "interesting" lines of code as the sys.settrace function, which obviates some of the complexity in coverage.py (but does mean that your "loc" count goes down). Second, figleaf does not record code executed in the Python standard library, which results in a significant speedup. And third, the format in which the coverage information is saved is very simple and easy to work with.

You might want to use figleaf if you're recording coverage from multiple types of tests and need to aggregate the coverage in interesting ways, and/or control when coverage is recorded. coverage.py is a better choice for command-line execution, and its reporting is a fair bit nicer.
I guess both have their pros and cons.
The first step would be writing meaningful tests. If you write tests only meant to reach full coverage, you'll be counter-productive; it will probably mean you focus on the unit's implementation details instead of its expectations.
BTW, I'd use nose as the unittest framework (http://somethingaboutorange.com/mrl/projects/nose/0.11.1/); its plugin system is very good and leaves the coverage option to you (--with-coverage for Ned's coverage, --with-figleaf for Titus' one; support for coverage3 should be coming), and you can write plugins for your own build system, too.
FWIW, this is what we do. Since I don't know about your Unit-Test and Regression-Test setup, you have to decide yourself whether this is helpful.
Every Python package has UnitTests.
We automatically detect unit tests using nose. Nose automagically detects standard Python unit tests (basically everything that looks like a test). Thereby we don't miss unit-tests. Nose also has a plug-in concept so that you can produce, e.g. nice output.
We strive for 100% coverage for unit-testing. To this end, we use Coverage to check, because a nose-plugin provides integration.
We have set up Eclipse (our IDE) to automatically run nose whenever a file changes so that the unit-tests always get executed, which shows code-coverage as a by-product.
"every single part of our project has a meaningful test in place"
"Part" is undefined. "Meaningful" is undefined. That's okay, however, since it gets better further on.
"validates the correctness of every component in our system"
"Component" is undefined. But correctness is defined, and we can assign a number of alternatives to component. You only mention Python, so I'll assume the entire project is pure Python.
Validates the correctness of every module.
Validates the correctness of every class of every module.
Validates the correctness of every method of every class of every module.
You haven't asked about line of code coverage or logic path coverage, which is a good thing. That way lies madness.
"guarantees that when we change something we can spot unintentional changes to other sub-systems"
This is regression testing. That's a logical consequence of any unit testing discipline.
Here's what you can do.
Enumerate every module. Create a unittest for that module that is just a unittest.main(). This should be quick -- a few days at most.
Write a nice top-level unittest script that uses a test loader to find all unit tests in your tests directory and runs them through the text runner (a sketch of such a harness, plus a skeleton TestCase, appears after these steps). At this point, you'll have a lot of files -- one per module -- but no actual test cases. Getting the test loader and the top-level script to work will take a few days. It's important to have this overall harness working.
Prioritize your modules. A good rule is "most heavily reused". Another rule is "highest risk from failure". Another rule is "most bugs reported". This takes a few hours.
Start at the top of the list. Write a TestCase per class with no real methods or anything. Just a framework. This takes a few days at most. Be sure the docstring for each TestCase positively identifies the Module and Class under test and the status of the test code. You can use these docstrings to determine test coverage.
At this point you'll have two parallel tracks. You have to actually design and implement the tests. Depending on the class under test, you may have to build test databases, mock objects, all kinds of supporting material.
Testing Rework. Starting with your highest priority untested module, start filling in the TestCases for each class in each module.
New Development. For every code change, a unittest.TestCase must be created for the class being changed.
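As referenced in the steps above, here is a rough sketch of the top-level harness and of a per-class skeleton TestCase with the identifying docstring; all module and class names are placeholders:

# run_all_tests.py -- top-level harness loading one test module per product module
import unittest

TEST_MODULES = [
    'tests.test_pricing',    # placeholder names, one per enumerated module
    'tests.test_positions',
]

def build_suite():
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    for name in TEST_MODULES:
        suite.addTests(loader.loadTestsFromName(name))
    return suite

if __name__ == '__main__':
    unittest.TextTestRunner(verbosity=2).run(build_suite())

# tests/test_pricing.py -- skeleton TestCase, no real methods yet
import unittest

class TestOptionPricer(unittest.TestCase):
    """Module: pricing.option_pricer  Class: OptionPricer  Status: skeleton only."""
    pass

if __name__ == '__main__':
    unittest.main()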
The test code follows the same rules as any other code. Everything is checked in at the end of the day. It has to run -- even if the tests don't all pass.
Give the test script to the product manager (not the QA manager, the actual product manager who is responsible for shipping product to customers) and make sure they run the script every day and find out why it didn't run or why tests are failing.
The actual running of the master test script is not a QA job -- it's everyone's job. Every manager at every level of the organization has to be part of the daily build script output. All of their jobs have to depend on "all tests passed last night". Otherwise, the product manager will simply pull resources away from testing and you'll have nothing.
Assuming you already have a relatively comprehensive test suite, there are tools for the Python part. The C part is much more problematic, depending on tool availability.
For Python unit tests, the coverage tools discussed above apply.
For C code, it is difficult on many platforms because gprof, the GNU code profiler, cannot handle code built with -fPIC. So you have to build every extension statically in this case, which is not supported by many extensions (see my blog post about numpy, for example). On Windows, there may be better code-coverage tools for compiled code, but that may require you to recompile the extensions with MS compilers.
As for the "right" code coverage, I think a good balance it to avoid writing complicated unit tests as much as possible. If a unit test is more complicated than the thing it tests, then it is a probably not a good test, or a broken test.