Embedding test code or data within doctest strings - python

I'd like to have several of the doctests in a file share test data and/or functions. Is there a way to do this without locating them in an external file or within the code of the file being tested?
update
"""This is the docstring for the module ``fish``.
I've discovered that I can access the module under test
from within the doctest, and make changes to it, eg
>>> import fish
>>> fish.data = {1: 'red', 2: 'blue'}
"""
def jef():
"""
Modifications made to the module will persist across subsequent tests:
>>> import fish
>>> fish.data[1]
'red'
"""
pass
def seuss():
"""
Although the doctest documentation claims that
"each time doctest finds a docstring to test,
it uses a shallow copy of M‘s globals",
modifications made to the module by a doctest
are not imported into the context of subsequent docstrings:
>>> data
Traceback (most recent call last):
...
NameError: name 'data' is not defined
"""
pass
So I guess that doctest copies the module once, and then copies the copy for each docstring?
In any case, importing the module into each docstring seems usable, if awkward.
I'd prefer to use a separate namespace for this, to avoid accidentally trampling on actual module data that will or will not be imported into subsequent tests in an possibly undocumented manner.
It's occurred to me that it's (theoretically) possible to dynamically create a module in order to contain this namespace. However as yet I've not gotten any direction on how to do that from the question I asked about that a while back. Any information is quite welcome! (as a response to the appropriate question)
In any case I'd prefer to have the changes be propagated directly into the namespace of subsequent docstrings. So my original question still stands, with that as a qualifier.

This is the sort of thing that causes people to turn away from doctests: as your tests grow in complexity, you need real programming tools to be able to engineer your tests just as you would engineer your product code.
I don't think there's a way to include shared data or functions in doctests other than defining them in your product code and then using them in the doctests.
You are going to need to use real code to define some of your test infrastructure. If you like doctests, you can use that infrastructure from your doctests.

This is possible, albeit perhaps not advertised as loudly.
To obtain literate modules with tests that all use a shared execution context (i.e. individual tests can share and re-use their results), one has to look at the relevant part of documentation which says:
... each time doctest finds a docstring to test, it uses a shallow copy of M‘s globals, so that running tests doesn’t change the module’s real globals, and so that one test in M can’t leave behind crumbs that accidentally allow another test to work.
...
You can force use of your own dict as the execution context by passing globs=your_dict to testmod() or testfile() instead.
Given this, I managed to reverse-engineer from doctest module that besides using copies (i.e. the dict's copy() method), it also clears the globals dict (using clear()) after each test.
Thus, one can patch their own globals dictionary with something like:
class Context(dict):
def clear(self):
pass
def copy(self):
return self
and then use it as:
import doctest
from importlib import import_module
module = import_module('some.module')
doctest.testmod(module,
# Make a copy of globals so tests in this
# module don't affect the tests in another
glob=Context(module.__dict__.copy()))

Related

How to override or replace an imported function call on a different module

I am leveraging an existing framework for a tool build activity based on python. Let me get into my issue straight :
Let's say the framework I am using is having a module named m1.py where I am having below function
def func_should_not_run(*args,**kwargs):
<doing something >
And I have a another module named m2.py where I am having below class :
from m1 import
class JustAClass:
def __init__(self,*args,*kwargs):
<All kind of initialisation..>
def run_something(self,*args,*kwargs):
<lots of code before>
func_should_not_run(*args,*kwargs)
<lots of code after>
Now my code module my_mod.py is having below class where I am creating an instance of above class from framework and calling m2.JustAClass.run_something inside another method as below
class JustAnotherClass:
def __init__(self,*args,*kwargs):
<All kind of initialisation..>
self.obj1=JustAClass(*some_args,**some_kwargs)
def run(self):
<some code before>
self.obj1.run_something(*some_other_args,**some_other_kwargs)
<some code after>
Now due to some implementation issue with m1.func_should_not_run which is getting called inside m2.JustAClass.run_something , I need to replace it with my own function func_should_run so that when func_should_not_run will be called inside m2.JustAClass.run_something, it should instead execute func_should_run from my module.
How can I achieve this?
Is there any way if I can override the import statement "from m1 import" on m2.py from my_mod.py?
This solution is risky for some aspects and could potentially fail given its side-effects, but worth to be mentioned in my opinion.
The idea is to replace (or better reload) the module that depends on the module you want to change, after some adjustment. I am going to start from the code and then I will show you the problems and limits of this approach:
from m2 import JustAClass
def func_should_run():
print('This is the function you want to call')
class JustAnotherClass:
def __init__(self,*args, **kwargs):
self.obj1 = JustAClass(*args, **kwargs)
def run(self):
self.obj1.run_something()
if __name__ == '__main__':
import importlib
importlib.import_module('m1').func_should_not_run = func_should_run
importlib.reload(importlib.import_module('m2'))
janc = JustAnotherClass()
janc.run()
Output:
This is the function you want to call
After importing importlib:
importlib.import_module('m1').func_should_not_run = func_should_run: I am importing module m1 and changing the func_should_not_run reference to func_should_run. This means that, for all the following calls to func_should_not_run, the code executed is the one of func_should_run. Obviously, this is also not valid for objects referencing the old func_should_not_run, like m2.JustAClass, so
importlib.reload(importlib.import_module('m2')): here I am reloading the module m2, that is going to use the new version of func_should_not_run because the module m1 is already loaded in the cache (i.e. sys.module) and therefore is not going to reload it (for this reason Transitive reloading can't occur, unless you explicitly do that).
From now on, every instance of JustAnotherClass correctly calls func_should_run
Should you use importlib.reload() for this?
Typically, reloading a module is useful when you have applied changes to a certain module and you do not want to restart the whole system to see those changes. In your case, unless you have clear in mind all the risks of this approach, you are kind of abusing the reload().
What are the main side-effects of this solution?
For start, reloading has its costs, especially if inside the module you have some initialization code that you do not want to re-execute. This means:
You are inevitably going to execute the module code twice (at least)
Be sure to comment your code explaining that every occurence of func_should_not_run is actually replace with func_should_run, but this is definitely not a good practice and not maintainable if used in many places.
To conclude, it is a simple as much as risky solution that can be adopted taking all the necessary precautions, with the awareness that it is just a hack and not a reasonable design decision.

How do I mock the hierarchy of non-existing modules?

Let's assume that we have a system of modules that exists only on production stage. At the moment of testing these modules do not exist. But still I would like to write tests for the code that uses those modules. Let's also assume that I know how to mock all the necessary objects from those modules. The question is: how do I conveniently add module stubs into current hierarchy?
Here is a small example. The functionality I want to test is placed in a file called actual.py:
actual.py:
def coolfunc():
from level1.level2.level3_1 import thing1
from level1.level2.level3_2 import thing2
do_something(thing1)
do_something_else(thing2)
In my test suite I already have everything I need: I have thing1_mock and thing2_mock. Also I have a testing function. What I need is to add level1.level2... into current module system. Like this:
tests.py
import sys
import actual
class SomeTestCase(TestCase):
thing1_mock = mock1()
thing2_mock = mock2()
def setUp(self):
sys.modules['level1'] = what should I do here?
#patch('level1.level2.level3_1.thing1', thing1_mock)
#patch('level1.level2.level3_1.thing1', thing2_mock)
def test_some_case(self):
actual.coolfunc()
I know that I can substitute sys.modules['level1'] with an object containing another object and so on. But it seems like a lot of code for me. I assume that there must be much simpler and prettier solution. I just cannot find it.
So, no one helped me with my problem and I decided to solve it by myself. Here is a micro-lib called surrogate which allows one to create stubs for non-existing modules.
Lib can be used with mock like this:
from surrogate import surrogate
from mock import patch
#surrogate('this.module.doesnt.exist')
#patch('this.module.doesnt.exist', whatever)
def test_something():
from this.module.doesnt import exist
do_something()
Firstly #surrogate decorator creates stubs for non-existing modules, then #patch decorator can alter them. Just as #patch, #surrogate decorators can be used "in plural", thus stubbing more than one module path. All stubs exist only at the lifetime of decorated function.
If anyone gets any use of this lib, that would be great :)

Unpickling a function into a different context in Python

I have written a Python interface to a process-centric job distribution system we're developing/using internally at my workplace. While reasonably skilled programmers, the primary people using this interface are research scientists, not software developers, so ease-of-use and keeping the interface out of the way to the greatest degree possible is paramount.
My library unrolls a sequence of inputs into a sequence of pickle files on a shared file server, then spawns jobs that load those inputs, perform the computation, pickle the results, and exit; the client script then picks back up and produces a generator that loads and yields the results (or rethrows any exception the calculation function did.)
This is only useful since the calculation function itself is one of the serialized inputs. cPickle is quite content to pickle function references, but requires the pickled function to be reimportable in the same context. This is problematic. I've already solved the problem of finding the module to reimport it, but the vast majority of the time, it is a top-level function that is pickled and, thus, does not have a module path. The only strategy I've found to be able to unpickle such a function on the computation nodes is this nauseating little approach towards simulating the original environment in which the function was pickled before unpickling it:
...
# At this point, we've identified the source of the target function.
# A string by its name lives in "modname".
# In the real code, there is significant try/except work here.
targetModule = __import__(modname)
globalRef = globals()
for thingie in dir(targetModule):
if thingie not in globalRef:
globalRef[thingie] = targetModule.__dict__[thingie]
# sys.argv[2]: the path to the pickle file common to all jobs, which contains
# any data in common to all invocations of the target function, then the
# target function itself
commonFile = open(sys.argv[2], "rb")
commonUnpickle = cPickle.Unpickler(commonFile)
commonData = commonUnpickle.load()
# the actual function unpack I'm having trouble with:
doIt = commonUnpickle.load()
The final line is the most important one here- it's where my module is picking up the function it should actually be running. This code, as written, works as desired, but directly manipulating the symbol tables like this is unsettling.
How can I do this, or something very much like this that does not force the research scientists to separate their calculation scripts into a proper class structure (they use Python like the most excellent graphing calculator ever and I would like to continue to let them do so) the way Pickle desperately wants, without the unpleasant, unsafe, and just plain scary __dict__-and-globals() manipulation I'm using above? I fervently believe there has to be a better way, but exec "from {0} import *".format("modname") didn't do it, several attempts to inject the pickle load into the targetModule reference didn't do it, and eval("commonUnpickle.load()", targetModule.__dict__, locals()) didn't do it. All of these fail with Unpickle's AttributeError over being unable to find the function in <module>.
What is a better way?
Pickling functions can be rather annoying if trying to move them into a different context. If the function does not reference anything from the module that it is in and references (if anything) modules that are guaranteed to be imported, you might check some code from a Rudimentary Database Engine found on the Python Cookbook.
In order to support views, the academic module grabs the code from the callable when pickling the query. When it comes time to unpickle the view, a LambdaType instance is created with the code object and a reference to a namespace containing all imported modules. The solution has limitations but worked well enough for the exercise.
Example for Views
class _View:
def __init__(self, database, query, *name_changes):
"Initializes _View instance with details of saved query."
self.__database = database
self.__query = query
self.__name_changes = name_changes
def __getstate__(self):
"Returns everything needed to pickle _View instance."
return self.__database, self.__query.__code__, self.__name_changes
def __setstate__(self, state):
"Sets the state of the _View instance when unpickled."
database, query, name_changes = state
self.__database = database
self.__query = types.LambdaType(query, sys.modules)
self.__name_changes = name_changes
Sometimes is appears necessary to make modifications to the registered modules available in the system. If for example you need to make reference to the first module (__main__), you may need to create a new module with your available namespace loaded into a new module object. The same recipe used the following technique.
Example for Modules
def test_northwind():
"Loads and runs some test on the sample Northwind database."
import os, imp
# Patch the module namespace to recognize this file.
name = os.path.splitext(os.path.basename(sys.argv[0]))[0]
module = imp.new_module(name)
vars(module).update(globals())
sys.modules[name] = module
Your question was long, and I was too caffeinated to make it through your very long question… However, I think you are looking to do something that there's a pretty good existing solution for already. There's a fork of the parallel python (i.e. pp) library that takes functions and objects and serializes them, sends them to different servers, and then unpikles and executes them. The fork lives inside the pathos package, but you can download it independently here:
http://danse.cacr.caltech.edu/packages/dev_danse_us
The "other context" in that case is another server… and the objects are transported by converting the objects to source code and then back to objects.
If you are looking to use pickling, much in the way you are doing already, there's an extension to mpi4py that serializes arguments and functions, and returns pickled return values… The package is called pyina, and is commonly used to ship code and objects to cluster nodes in coordination with a cluster scheduler.
Both pathos and pyina provide map abstractions (and pipe), and try to hide all of the details of parallel computing behind the abstractions, so scientists don't need to learn anything except how to program normal serial python. They just use one of the map or pipe functions, and get parallel or distributed computing.
Oh, I almost forgot. The dill serializer includes dump_session and load_session functions that allow the user to easily serialize their entire interpreter session and send it to another computer (or just save it for later use). That's pretty handy for changing contexts, in a different sense.
Get dill, pathos, and pyina here: https://github.com/uqfoundation
For a module to be recognized as loaded I think it must by in sys.modules, not just its content imported into your global/local namespace. Try to exec everything, then get the result out of an artificial environment.
env = {"fn": sys.argv[2]}
code = """\
import %s # maybe more
import cPickle
commonFile = open(fn, "rb")
commonUnpickle = cPickle.Unpickler(commonFile)
commonData = commonUnpickle.load()
doIt = commonUnpickle.load()
"""
exec code in env
return env["doIt"]
While functions are advertised as first-class objects in Python, this is one case where it can be seen that they are really second-class objects. It is the reference to the callable, not the object itself, that is pickled. (You cannot directly pickle a lambda expression.)
There is an alternate usage of __import__ that you might prefer:
def importer(modulename, symbols=None):
u"importer('foo') returns module foo; importer('foo', ['bar']) returns {'bar': object}"
if modulename in sys.modules: module = sys.modules[modulename]
else: module = __import__(modulename, fromlist=['*'])
if symbols == None: return module
else: return dict(zip(symbols, map(partial(getattr, module), symbols)))
So these would all be basically equivalent:
from mymodule.mysubmodule import myfunction
myfunction = importer('mymodule.mysubmodule').myfunction
globals()['myfunction'] = importer('mymodule.mysubmodule', ['myfunction'])['myfunction']

What is monkey patching?

I am trying to understand, what is monkey patching or a monkey patch?
Is that something like methods/operators overloading or delegating?
Does it have anything common with these things?
No, it's not like any of those things. It's simply the dynamic replacement of attributes at runtime.
For instance, consider a class that has a method get_data. This method does an external lookup (on a database or web API, for example), and various other methods in the class call it. However, in a unit test, you don't want to depend on the external data source - so you dynamically replace the get_data method with a stub that returns some fixed data.
Because Python classes are mutable, and methods are just attributes of the class, you can do this as much as you like - and, in fact, you can even replace classes and functions in a module in exactly the same way.
But, as a commenter pointed out, use caution when monkeypatching:
If anything else besides your test logic calls get_data as well, it will also call your monkey-patched replacement rather than the original -- which can be good or bad. Just beware.
If some variable or attribute exists that also points to the get_data function by the time you replace it, this alias will not change its meaning and will continue to point to the original get_data. (Why? Python just rebinds the name get_data in your class to some other function object; other name bindings are not impacted at all.)
A MonkeyPatch is a piece of Python code which extends or modifies
other code at runtime (typically at startup).
A simple example looks like this:
from SomeOtherProduct.SomeModule import SomeClass
def speak(self):
return "ook ook eee eee eee!"
SomeClass.speak = speak
Source: MonkeyPatch page on Zope wiki.
What is a monkey patch?
Simply put, monkey patching is making changes to a module or class while the program is running.
Example in usage
There's an example of monkey-patching in the Pandas documentation:
import pandas as pd
def just_foo_cols(self):
"""Get a list of column names containing the string 'foo'
"""
return [x for x in self.columns if 'foo' in x]
pd.DataFrame.just_foo_cols = just_foo_cols # monkey-patch the DataFrame class
df = pd.DataFrame([list(range(4))], columns=["A","foo","foozball","bar"])
df.just_foo_cols()
del pd.DataFrame.just_foo_cols # you can also remove the new method
To break this down, first we import our module:
import pandas as pd
Next we create a method definition, which exists unbound and free outside the scope of any class definitions (since the distinction is fairly meaningless between a function and an unbound method, Python 3 does away with the unbound method):
def just_foo_cols(self):
"""Get a list of column names containing the string 'foo'
"""
return [x for x in self.columns if 'foo' in x]
Next we simply attach that method to the class we want to use it on:
pd.DataFrame.just_foo_cols = just_foo_cols # monkey-patch the DataFrame class
And then we can use the method on an instance of the class, and delete the method when we're done:
df = pd.DataFrame([list(range(4))], columns=["A","foo","foozball","bar"])
df.just_foo_cols()
del pd.DataFrame.just_foo_cols # you can also remove the new method
Caveat for name-mangling
If you're using name-mangling (prefixing attributes with a double-underscore, which alters the name, and which I don't recommend) you'll have to name-mangle manually if you do this. Since I don't recommend name-mangling, I will not demonstrate it here.
Testing Example
How can we use this knowledge, for example, in testing?
Say we need to simulate a data retrieval call to an outside data source that results in an error, because we want to ensure correct behavior in such a case. We can monkey patch the data structure to ensure this behavior. (So using a similar method name as suggested by Daniel Roseman:)
import datasource
def get_data(self):
'''monkey patch datasource.Structure with this to simulate error'''
raise datasource.DataRetrievalError
datasource.Structure.get_data = get_data
And when we test it for behavior that relies on this method raising an error, if correctly implemented, we'll get that behavior in the test results.
Just doing the above will alter the Structure object for the life of the process, so you'll want to use setups and teardowns in your unittests to avoid doing that, e.g.:
def setUp(self):
# retain a pointer to the actual real method:
self.real_get_data = datasource.Structure.get_data
# monkey patch it:
datasource.Structure.get_data = get_data
def tearDown(self):
# give the real method back to the Structure object:
datasource.Structure.get_data = self.real_get_data
(While the above is fine, it would probably be a better idea to use the mock library to patch the code. mock's patch decorator would be less error prone than doing the above, which would require more lines of code and thus more opportunities to introduce errors. I have yet to review the code in mock but I imagine it uses monkey-patching in a similar way.)
According to Wikipedia:
In Python, the term monkey patch only
refers to dynamic modifications of a
class or module at runtime, motivated
by the intent to patch existing
third-party code as a workaround to a
bug or feature which does not act as
you desire.
First: monkey patching is an evil hack (in my opinion).
It is often used to replace a method on the module or class level with a custom implementation.
The most common usecase is adding a workaround for a bug in a module or class when you can't replace the original code. In this case you replace the "wrong" code through monkey patching with an implementation inside your own module/package.
Monkey patching can only be done in dynamic languages, of which python is a good example. Changing a method at runtime instead of updating the object definition is one example;similarly, adding attributes (whether methods or variables) at runtime is considered monkey patching. These are often done when working with modules you don't have the source for, such that the object definitions can't be easily changed.
This is considered bad because it means that an object's definition does not completely or accurately describe how it actually behaves.
Monkey patching is reopening the existing classes or methods in class at runtime and changing the behavior, which should be used cautiously, or you should use it only when you really need to.
As Python is a dynamic programming language, Classes are mutable so you can reopen them and modify or even replace them.
What is monkey patching? Monkey patching is a technique used to dynamically update the behavior of a piece of code at run-time.
Why use monkey patching? It allows us to modify or extend the behavior of libraries, modules, classes or methods at runtime without
actually modifying the source code
Conclusion Monkey patching is a cool technique and now we have learned how to do that in Python. However, as we discussed, it has its
own drawbacks and should be used carefully.

How is a Python project set up?

I am doing some heavy commandline stuff (not really web based) and am new to Python, so I was wondering how to set up my files/folders/etc. Are there "header" files where I can keep all the DB connection stuff?
How/where do I define classes and objects?
Just to give you an example of a typical Python module's source, here's something with some explanation. This is a file named "Dims.py". This is not the whole file, just some parts to give an idea what's going on.
#!/usr/bin/env python
This is the standard first line telling the shell how to execute this file. Saying /usr/bin/env python instead of /usr/bin/python tells the shell to find Python via the user's PATH; the desired Python may well be in ~/bin or /usr/local/bin.
"""Library for dealing with lengths and locations."""
If the first thing in the file is a string, it is the docstring for the module. A docstring is a string that appears immediately after the start of an item, which can be accessed via its __doc__ property. In this case, since it is the module's docstring, if a user imports this file with import Dims, then Dims.__doc__ will return this string.
# Units
MM_BASIC = 1500000
MILS_BASIC = 38100
IN_BASIC = MILS_BASIC * 1000
There are a lot of good guidelines for formatting and naming conventions in a document known as PEP (Python Enhancement Proposal) 8. These are module-level variables (constants, really) so they are written in all caps with underscores. No, I don't follow all the rules; old habits die hard. Since you're starting fresh, follow PEP 8 unless you can't.
_SCALING = 1
_SCALES = {
mm_basic: MM_BASIC,
"mm": MM_BASIC,
mils_basic: MILS_BASIC,
"mil": MILS_BASIC,
"mils": MILS_BASIC,
"basic": 1,
1: 1
}
These module-level variables have leading underscores in their names. This gives them a limited amount of "privacy", in that import Dims will not let you access Dims._SCALING. However, if you need to mess with it, you can explicitly say something like import Dims._SCALING as scaling.
def UnitsToScale(units=None):
"""Scales the given units to the current scaling."""
if units is None:
return _SCALING
elif units not in _SCALES:
raise ValueError("unrecognized units: '%s'." % units)
return _SCALES[units]
UnitsToScale is a module-level function. Note the docstring and the use of default values and exceptions. No spaces around the = in default value declarations.
class Length(object):
"""A length. Makes unit conversions easier.
The basic, mm, and mils properties can be used to get or set the length
in the desired units.
>>> x = Length(mils=1000)
>>> x.mils
1000.0
>>> x.mm
25.399999999999999
>>> x.basic
38100000L
>>> x.mils = 100
>>> x.mm
2.54
"""
The class declaration. Note the docstring has things in it that look like Python command line commands. These care called doctests, in that they are test code in the docstring. More on this later.
def __init__(self, unscaled=0, basic=None, mm=None, mils=None, units=None):
"""Constructs a Length.
Default contructor creates a length of 0.
>>> Length()
Length(basic=0)
Length(<float>) or Length(<string>) creates a length with the given
value at the current scale factor.
>>> Length(1500)
Length(basic=1500)
>>> Length("1500")
Length(basic=1500)
"""
# Straight copy
if isinstance(unscaled, Length):
self._x = unscaled._x
return
# rest omitted
This is the initializer. Unlike C++, you only get one, but you can use default arguments to make it look like several different constructors are available.
def _GetBasic(self): return self._x
def _SetBasic(self, x): self._x = x
basic = property(_GetBasic, _SetBasic, doc="""
This returns the length in basic units.""")
This is a property. It allows you to have getter/setter functions while using the same syntax as you would for accessing any other data member, in this case, myLength.basic = 10 does the same thing as myLength._SetBasic(10). Because you can do this, you should not write getter/setter functions for your data members by default. Just operate directly on the data members. If you need to have getter/setter functions later, you can convert the data member to a property and your module's users won't need to change their code. Note that the docstring is on the property, not the getter/setter functions.
If you have a property that is read-only, you can use property as a decorator to declare it. For example, if the above property was to be read-only, I would write:
#property
def basic(self):
"""This returns the length in basic units."""
return self._x
Note that the name of the property is the name of the getter method. You can also use decorators to declare setter methods in Python 2.6 or later.
def __mul__(self, other):
"""Multiplies a Length by a scalar.
>>> Length(10)*10
Length(basic=100)
>>> 10*Length(10)
Length(basic=100)
"""
if type(other) not in _NumericTypes:
return NotImplemented
return Length(basic=self._x * other)
This overrides the * operator. Note that you can return the special value NotImplemented to tell Python that this operation isn't implemented (in this case, if you try to multiply by a non-numeric type like a string).
__rmul__ = __mul__
Since code is just a value like anything else, you can assign the code of one method to another. This line tells Python that the something * Length operation uses the same code as Length * something. Don't Repeat Yourself.
Now that the class is declared, I can get back to module code. In this case, I have some code that I want to run only if this file is executed by itself, not if it's imported as a module. So I use the following test:
if __name__ == "__main__":
Then the code in the if is executed only if this is being run directly. In this file, I have the code:
import doctest
doctest.testmod()
This goes through all the docstrings in the module and looks for lines that look like Python prompts with commands after them. The lines following are assumed to be the output of the command. If the commands output something else, the test is considered to have failed and the actual output is printed. Read the doctest module documentation for all the details.
One final note about doctests: They're useful, but they're not the most versatile or thorough tests available. For those, you'll want to read up on unittests (the unittest module).
Each python source file is a module. There are no "header" files. The basic idea is that when you import "foo" it'll load the code from "foo.py" (or a previously compiled version of it). You can then access the stuff from the foo module by saying foo.whatever.
There seem to be two ways for arranging things in Python code. Some projects use a flat layout, where all of the modules are at the top-level. Others use a hierarchy. You can import foo/bar/baz.py by importing "foo.bar.baz". The big gotcha with hierarchical layout is to have __init__.py in the appropriate directories (it can even be empty, but it should exist).
Classes are defined like this:
class MyClass(object):
def __init__(self, x):
self.x = x
def printX(self):
print self.x
To create an instance:
z = MyObject(5)
You can organize it in whatever way makes the most sense for your application. I don't exactly know what you're doing so I can't be certain what the best organization would be for you, but you can pretty much split it up as you see fit and just import what you need.
You can define classes in any file, and you can define as many classes as you would like in a script (unlike Java). There are no official header files (not like C or C++), but you can use config files to store info about connecting to a DB, whatever, and use configparser (a standard library function) to organize them.
It makes sense to keep like things in the same file, so if you have a GUI, you might have one file for the interface, and if you have a CLI, you might keep that in a file by itself. It's less important how your files are organized and more important how the source is organized into classes and functions.
This would be the place to look for that: http://docs.python.org/reference/.
First of all, compile and install pip: http://pypi.python.org/pypi/pip. It is like Ubuntu's apt-get. You run it via a Terminal by typing in pip install package-name. It has a database of packages, so you can install/uninstall stuff quite easily with it.
As for importing and "header" files, from what I can tell, if you run import foo, Python looks for foo.py in the current folder. If it's not there, it looks for eggs (folders unzipped in the Python module directory) and imports those.
As for defining classes and objects, here's a basic example:
class foo(foobar2): # I am extending a class, in this case 'foobar2'. I take no arguments.
__init__(self, the, list, of, args = True): # Instead, the arguments get passed to me. This still lets you define a 'foo()' objects with three arguments, only you let '__init__' take them.
self.var = 'foo'
def bar(self, args):
self.var = 'bar'
def foobar(self): # Even if you don't need arguments, never leave out the self argument. It's required for classes.
print self.var
foobar = foo('the', 'class', 'args') # This is how you initialize me!
Read more on this in the Python Reference, but my only tip is to never forget the self argument in class functions. It will save you a lot of debugging headaches...
Good luck!
There's no some fixed structure for Python programs, but you can take Django project as an example. Django project consists of one settings.py module, where global settings (like your example with DB connection properties) are stored and pluggable applications. Each application has it's own models.py module, which stores database models and, possibly, other domain specific objects. All the rest is up to you.
Note, that these advices are not specific to Python. In C/C++ you probably used similar structure and kept settings in XML. Just forget about headers and put settings in plain in .py file, that's all.

Categories

Resources