How could you swap out a particular database implementation in Python?

If I have a separate class for my DB calls, and I create another implementation of the DB layer but with a different data store,
is there a way for me to completely swap out the implementation without having to change a lot of code?
i.e. I am starting a project, so I can design things properly to achieve this from the get-go.
Note: I will use this pattern for other parts of the site also, not just the DB layer, so it's not really specific to the DB layer only.

As long as two modules implement exactly the same interface (classes with the same names, methods, and other attributes, functions with the same names and signatures, ...) you can pick one or the other at the time your application is starting up, for example on the basis of some configuration file, and import the chosen one under a fixed name. All the rest of your application can then use that fixed name and, net of the startup code, be blissfully unaware of any shenanigans that may have been done at the start.
For example, consider a simplified case:
# english.py
def greet(): return 'Hello!'
# italian.py
def greet(): return 'Ciao!'
# french.py
def greet(): return 'Salut!'
# config.py
langname = 'italian'
# startit.py
import config
import sys
lang = __import__(config.langname)
sys.modules['lang'] = lang
Now, all the rest of the application can just import lang, and it will be getting under that name the italian module, so, when calling lang.greet(), it will get the string 'Ciao!'.
Of course, in real life you'll have multiple modules, each with multiple functions, classes, and whatnot, but the general principles stay very similar. Just take special care about modules with qualified names (such as foo.bar), i.e., modules which must reside in a package (in this case, foo). For those, you can't just use __import__'s return value, but must use a slightly more roundabout approach, such as:
import sys
def importanyasname(actualname, fakename):
    __import__(actualname)
    sys.modules[fakename] = sys.modules[actualname]
that is, ignore __import__'s return value, and reach right for the value that's left (with the actual name as the key) in the sys.modules dictionary -- that is the module object you seek, and that you can set back into sys.modules with the "fake name" by which all the rest of the application will be able to blissfully import it any time.
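For illustration, a hypothetical startup sequence using that helper might look like this (the backends package, config.dbname, and fetch_all are assumptions for illustration, not part of the answer above):
# startit.py -- hypothetical usage of importanyasname
import config
importanyasname('backends.' + config.dbname, 'db')

# anywhere else in the application:
import db
rows = db.fetch_all()  # actually runs backends.<chosen>.fetch_all()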

QA - testing order is not right

How can I be sure of the unittest methods' order? Are alphabetical or numeric prefixes the proper way?
class TestFoo(TestCase):
    def test_1(self):
        ...
    def test_2(self):
        ...
or
class TestFoo(TestCase):
    def test_a(self):
        ...
    def test_b(self):
        ...
You can disable it by setting sortTestMethodsUsing to None:
import unittest
unittest.TestLoader.sortTestMethodsUsing = None
For pure unit tests, you folks are right; but for component tests and integration tests...
I do not agree that you should assume nothing about the state.
What if you are testing the state?
For example, your test validates that a service is auto-started upon installation. If, in your setup, you start the service and then do the assertion, you are no longer testing the state, but the "service start" functionality.
Another example is when your setup takes a long time or requires a lot of space and it just becomes impractical to run the setup frequently.
Many developers tend to use "unit test" frameworks for component testing...so stop and ask yourself, am I doing unit testing or component testing?
There is no reason given that you can't build on what was done in a previous test, or that you must rebuild it all from scratch for the next test. At least, no reason is usually offered; people just confidently say "you shouldn't". That isn't helpful.
In general, I am tired of reading too many answers here that say basically "you shouldn't do that" instead of giving any information on how best to do it if, in the questioner's judgment, there is good reason to do so. If I wanted someone's opinion on whether I should do something, I would have asked for opinions on whether doing it is a good idea.
That out of the way: if you read, say, loadTestsFromTestCase and what it calls, it ultimately scans for methods matching some name pattern in whatever order they are encountered in the class's method dictionary, so basically in key order. It takes this information and makes a test suite by mapping it to the TestCase class. Giving it instead a list ordered as you would like is one way to do this. I am not sure of the most efficient/cleanest way, but this does work.
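For example, here is a minimal sketch of handing the runner an explicitly ordered suite (reusing TestFoo from the question; the particular order is just an illustration):
import unittest

def ordered_suite():
    # Build the suite from an explicit list instead of the loader's
    # sorted scan of the class dictionary.
    return unittest.TestSuite(TestFoo(name) for name in ['test_2', 'test_1'])

if __name__ == '__main__':
    unittest.TextTestRunner().run(ordered_suite())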
If you use 'nose' and you write your test cases as functions (and not as methods of some TestCase derived class), 'nose' doesn't fiddle with the order, but uses the order of the functions as defined in the file.
In order to have the assert_* methods handy without needing to subclass TestCase I usually use the testing module from NumPy. Example:
from numpy.testing import *

def test_aaa():
    assert_equal(1, 1)

def test_zzz():
    assert_equal(1, 1)

def test_bbb():
    assert_equal(1, 1)
Running that with nosetests -vv gives:
test_it.test_aaa ... ok
test_it.test_zzz ... ok
test_it.test_bbb ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.050s
OK
Note to all those who contend that unit tests shouldn't be ordered: while it is true that unit tests should be isolated and can run independently, your functions and classes are usually not independent.
Rather, they build on one another, from simpler/low-level functions to more complex/high-level functions. When you start optimising your low-level functions and mess up (for my part, I do that frequently; if you don't, you probably don't need unit tests anyway ;-), then it's a lot better for diagnosing the cause when the tests for simple functions come first, and the tests for functions that depend on those functions come later.
If the tests are sorted alphabetically, the real cause usually gets drowned among a hundred failed assertions, which are not there because the function under test has a bug, but because the low-level function it relies on has one.
That's why I want to have my unit tests sorted the way I specified them: not to use state that was built up in early tests in later tests, but as a very helpful tool in diagnosing problems.
I half agree with the idea that tests shouldn't be ordered. In some cases it helps (it's easier, damn it!) to have them in order... after all, that's the reason for the 'unit' in UnitTest.
That said, one alternative is to use mock objects to mock out and patch the items that should run before that specific code under test. You can also put a dummy function in there to monkey patch your code. For more information, check out Mock, which is part of the standard library now.
Here are some YouTube videos if you haven't used Mock before.
Video 1
Video 2
Video 3
More to the point, try using class methods to structure your code, and then place all the class methods in one main test method.
import unittest
import sqlite3

class MyOrderedTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.create_db()
        cls.setup_draft()
        cls.draft_one()
        cls.draft_two()
        cls.draft_three()

    @classmethod
    def create_db(cls):
        cls.conn = sqlite3.connect(":memory:")

    @classmethod
    def setup_draft(cls):
        cls.conn.execute("CREATE TABLE players ('draftid' INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 'first', 'last')")

    @classmethod
    def draft_one(cls):
        player = ("Hakeem", "Olajuwon")
        cls.conn.execute("INSERT INTO players (first, last) VALUES (?, ?)", player)

    @classmethod
    def draft_two(cls):
        player = ("Sam", "Bowie")
        cls.conn.execute("INSERT INTO players (first, last) VALUES (?, ?)", player)

    @classmethod
    def draft_three(cls):
        player = ("Michael", "Jordan")
        cls.conn.execute("INSERT INTO players (first, last) VALUES (?, ?)", player)

    def test_unordered_one(self):
        cur = self.conn.execute("SELECT * FROM players")
        draft = [(1, u'Hakeem', u'Olajuwon'), (2, u'Sam', u'Bowie'), (3, u'Michael', u'Jordan')]
        query = cur.fetchall()
        print query
        self.assertListEqual(query, draft)

    def test_unordered_two(self):
        cur = self.conn.execute("SELECT first, last FROM players WHERE draftid=3")
        result = cur.fetchone()
        third = " ".join(result)
        print third
        self.assertEqual(third, "Michael Jordan")
Why do you need a specific test order? The tests should be isolated and therefore it should be possible to run them in any order, or even in parallel.
If you need to test something like user unsubscribing, the test could create a fresh database with a test subscription and then try to unsubscribe. This scenario has its own problems, but in the end it’s better than having tests depend on each other. (Note that you can factor out common test code, so that you don’t have to repeat the database setup code or create testing data ad nauseam.)
There are a number of reasons for prioritizing tests, not the least of which is productivity, which is what JUnit Max is geared for. It's sometimes helpful to keep very slow tests in their own module so that you can get quick feedback from the tests that don't suffer from the same heavy dependencies. Ordering is also helpful in tracking down failures from tests that are not completely self-contained.
Don't rely on the order. If they use some common state, like the filesystem or database, then you should create setUp and tearDown methods that get your environment into a testable state, and then clean up after the tests have run.
Each test should assume that the environment is as defined in setUp, and should make no further assumptions.
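A minimal sketch of that approach (the temporary-directory fixture here is an illustration, not from the answer above):
import shutil
import tempfile
import unittest

class FilesystemTest(unittest.TestCase):
    def setUp(self):
        # Every test gets a fresh scratch directory, so ordering cannot matter.
        self.workdir = tempfile.mkdtemp()

    def tearDown(self):
        # Clean up whatever the test left behind.
        shutil.rmtree(self.workdir)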
You should try the proboscis library. It lets you define test order as well as set up test dependencies. I use it, and this library is truly awesome.
For example, if test case #1 from module A should depend on test case #3 from module B, you CAN set this behaviour using the library.
Here is a simpler method that has the following advantages:
No need to create a custom TestCase class.
No need to decorate every test method.
Uses the standard unittest load_tests protocol. See the Python docs here.
The idea is to go through all the test cases of the test suites given to the test loader protocol and create a new suite but with the tests ordered by their line number.
Here is the code:
import unittest

def load_ordered_tests(loader, standard_tests, pattern):
    """
    Test loader that keeps the tests in the order they were declared in the class.
    """
    ordered_cases = []
    for test_suite in standard_tests:
        ordered = []
        for test_case in test_suite:
            test_case_type = type(test_case)
            method_name = test_case._testMethodName
            testMethod = getattr(test_case, method_name)
            line = testMethod.__code__.co_firstlineno
            ordered.append((line, test_case_type, method_name))
        ordered.sort()
        for line, case_type, name in ordered:
            ordered_cases.append(case_type(name))
    return unittest.TestSuite(ordered_cases)
You can put this in a module named order_tests and then in each unittest Python file, declare the test loader like this:
from order_tests import load_ordered_tests
# This orders the tests to be run in the order they were declared.
# It uses the unittest load_tests protocol.
load_tests = load_ordered_tests
Note: the often suggested technique of setting the test sorter to None no longer works because Python now sorts the output of dir() and unittest uses dir() to find tests. So even though you have no sorting method, they still get sorted by Python itself!
From unittest — Unit testing framework
Note that the order in which the various test cases will be run is determined by sorting the test function names with respect to the built-in ordering for strings.
If you need set the order explicitly, use a monolithic test.
class Monolithic(TestCase):
    def step1(self):
        ...

    def step2(self):
        ...

    def steps(self):
        for name in sorted(dir(self)):
            if name.startswith("step"):
                yield name, getattr(self, name)

    def test_steps(self):
        for name, step in self.steps():
            try:
                step()
            except Exception as e:
                self.fail("{} failed ({}: {})".format(step, type(e), e))
Check out this Stack Overflow question for details.
There are scenarios where the order can be important, and where setUp and tearDown are too limited. There's only one setUp and one tearDown method, which is logical, but you can only put so much information in them before it gets unclear what setUp or tearDown might actually be doing.
Take this integration test as an example:
You are writing tests to see if the registration form and the login form are working correctly. In such a case the order is important, as you can't log in without an existing account.
More importantly, the order of your tests represents some kind of user interaction, where each test might represent a step in the whole process or flow you're testing.
Dividing your code into those logical pieces has several advantages.
It might not be the best solution, but I often use one method that kicks off the actual tests:
def test_registration_login_flow(self):
    self._test_registration_flow()
    self._test_login_flow()
A simple method for ordering "unittest" tests is to follow the init.d mechanism of giving them numeric names:
def test_00_createEmptyObject(self):
    obj = MyObject()
    self.assertEqual(obj.property1, 0)
    self.assertEqual(obj.dict1, {})

def test_01_createObject(self):
    obj = MyObject(property1="hello", dict1={"pizza": "pepperoni"})
    self.assertEqual(obj.property1, "hello")
    self.assertDictEqual(obj.dict1, {"pizza": "pepperoni"})

def test_10_reverseProperty(self):
    obj = MyObject(property1="world")
    obj.reverseProperty1()
    self.assertEqual(obj.property1, "dlrow")
However, in such cases, you might want to consider structuring your tests differently so that you can build on previous construction cases. For instance, in the above, it might make sense to have a "construct and verify" function that constructs the object and validates its assignment of parameters.
def make_myobject(self, property1, dict1):  # Must be specified by caller
    obj = MyObject(property1=property1, dict1=dict1)
    if property1:
        self.assertEqual(obj.property1, property1)
    else:
        self.assertEqual(obj.property1, 0)
    if dict1:
        self.assertDictEqual(obj.dict1, dict1)
    else:
        self.assertEqual(obj.dict1, {})
    return obj

def test_00_createEmptyObject(self):
    obj = self.make_myobject(None, None)

def test_01_createObject(self):
    obj = self.make_myobject("hello", {"pizza": "pepperoni"})

def test_10_reverseProperty(self):
    obj = self.make_myobject("world", None)
    obj.reverseProperty1()
    self.assertEqual(obj.property1, "dlrow")
I agree with the statement that a blanket "don't do that" answer is a bad response.
I have a similar situation, where I have a single data source and one test will wipe the data set, causing other tests to fail.
My solution was to use operating system environment variables on my Bamboo server...
(1) The test for the "data purge" functionality starts with a while loop that checks the state of an environment variable "BLOCK_DATA_PURGE." If the "BLOCK_DATA_PURGE" variable is greater than zero, the loop will write a log entry to the effect that it is sleeping 1 second. Once the "BLOCK_DATA_PURGE" has a zero value, execution proceeds to test the purge functionality.
(2) Any unit test which needs the data in the table simply increments "BLOCK_DATA_PURGE" at the beginning (in setup()) and decrements the same variable in teardown().
The effect of this is to allow various data consumers to block the purge functionality for as long as they need, without fear that the purge could execute in between tests. Effectively the purge operation is pushed to the last step, or at least the last step that requires the original data set.
Today I am going to extend this to add more functionality to allow some tests to REQUIRE_DATA_PURGE. These will effectively invert the above process to ensure that those tests only execute after the data purge to test data restoration.
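A rough sketch of the blocking scheme described in (1) and (2), with all names assumed from the description above (note this only works while the tests share one process environment):
import os
import time

def wait_for_purge_clearance():
    # Step (1): spin until no test is holding the purge lock.
    while int(os.environ.get("BLOCK_DATA_PURGE", "0")) > 0:
        print('purge test sleeping 1 second')
        time.sleep(1)

def block_purge():
    # Step (2): called from setUp() of any test that needs the data.
    os.environ["BLOCK_DATA_PURGE"] = str(int(os.environ.get("BLOCK_DATA_PURGE", "0")) + 1)

def unblock_purge():
    # Step (2): called from tearDown() of the same tests.
    os.environ["BLOCK_DATA_PURGE"] = str(int(os.environ.get("BLOCK_DATA_PURGE", "0")) - 1)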
See the example of WidgetTestCase on Organizing test code. It says that
Class instances will now each run one of the test_*() methods, with self.widget created and destroyed separately for each instance.
So there may be no point in specifying the order of test cases if you do not access global variables.
I have implemented a plugin, nosedep, for Nose which adds support for test dependencies and test prioritization.
As mentioned in the other answers/comments, this is often a bad idea, however there can be exceptions where you would want to do this (in my case it was performance for integration tests - with a huge overhead for getting into a testable state, minutes vs. hours).
A minimal example is:
from nosedep import depends

def test_a():
    pass

@depends(before=test_a)
def test_b():
    pass
To ensure that test_b is always run before test_a.
The philosophy behind unit tests is to make them independent of each other. This means the first step should always be to rethink how you are testing each piece so that it matches that philosophy. This can involve changing how you approach testing and being creative in narrowing your tests to smaller scopes.
However, if you still find that you need tests in a specific order (which is viable), you could try checking out the answer to Python unittest.TestCase execution order .
It seems they are executed in alphabetical order by test name (using the comparison function between strings).
Since tests in a module are also only executed if they begin with "test", I put in a number to order the tests:
class LoginTest(unittest.TestCase):
    def setUp(self):
        driver.get("http://localhost:2200")

    def tearDown(self):
        # self.driver.close()
        pass

    def test1_check_at_right_page(self):
        ...
        assert "Valor" in driver.page_source

    def test2_login_a_manager(self):
        ...
        submit_button.click()
        assert "Home" in driver.title

    def test3_is_manager(self):
        ...
Note that number strings do not necessarily sort numerically: "9" > "10" is True in the Python shell, for instance. Consider using decimal strings with fixed zero padding (this avoids the aforementioned problem), such as "000", "001", ..., "010", ..., "099", "100", ..., "999".
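For instance, a zero-padded naming scheme might look like this (the method names are made up for illustration):
class PaddedOrderTest(unittest.TestCase):
    # "010" < "020" < "100" as strings, so alphabetical sorting
    # matches the intended numeric order.
    def test_010_create_account(self):
        ...

    def test_020_login(self):
        ...

    def test_100_delete_account(self):
        ...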
Contrary to what was said here:
tests have to run in isolation (the order must not matter for correctness)
and yet
ordering them is important, because they describe what the system does and how the developer implemented it.
In other words, each test brings you information about the system and the developer's logic.
So if this information is not ordered, it can make your code difficult to understand.
To randomise the order of test methods you can monkey patch the unittest.TestLoader.sortTestMethodsUsing attribute
if __name__ == '__main__':
    import random
    unittest.TestLoader.sortTestMethodsUsing = lambda self, a, b: random.choice([1, 0, -1])
    unittest.main()
The same approach can be used to enforce whatever order you need.
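For example, the same hook could enforce an explicit hand-written order (a sketch; the ORDER list and test names are assumptions):
import unittest

# Every test method name must appear in ORDER, or index() raises ValueError.
ORDER = ['test_register', 'test_login', 'test_logout']

def explicit_order(self, a, b):
    # cmp-style comparator: a negative result means a runs before b.
    return ORDER.index(a) - ORDER.index(b)

unittest.TestLoader.sortTestMethodsUsing = explicit_order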

How to test complicated functions which use requests?

I want to test my code that is based on an API created by someone else, but I'm not sure how I should do this.
I have created a function to save the JSON into a file so I don't need to send requests each time I run the test, but I don't know how to make it work in the situation where the original (check) function takes an input argument (problem_report) which is an instance of some class provided by the API, and it has this
problem_report.get_correction(corr_link) method. I just wonder if this is a sign of badly written code by me, because I can't write a test for this, or maybe I should rewrite this function in my tests file as I showed at the end of the code provided below.
# I want to test this function
def check(problem_report):
    corrections = {}
    for corr_link, corr_id in problem_report.links.items():
        if re.findall(pattern='detailCorrection', string=corr_link):
            correction = problem_report.get_correction(corr_link)
            corrections.update({corr_id: correction})
    return corrections

# This function serves to load the JSON from a file; normally it is downloaded via the API from some page.
def load_pr(pr_id):
    print('loading')
    with open('{}{}_view_pr.json'.format(saved_prs_path, pr_id)) as view_pr:
        view_pr = json.load(view_pr)
    ...
    pr_info = {'view_pr': view_pr, ...}
    return pr_info

# Create an instance of class MyPR, which takes the JSON in __init__
@pytest.fixture
def setup_pr():
    print('setup')
    pr = load_pr('123')
    my_pr = MyPR(pr['view_pr'])
    return my_pr

# test function
def test_check(setup_pr):
    pr = setup_pr
    checked_pr = pr.check(setup_rft[1]['problem_report_pr'])
    assert checked_pr

# rewritten check function in the test file
@mock.patch('problem_report.get_correction', side_effect=get_corr)
def test_check(problem_report):
    corrections = {}
    for corr_link, corr_id in problem_report.links.items():
        if re.findall(pattern='detailCorrection', string=corr_link):
            correction = problem_report.get_correction(corr_link)
            corrections.update({corr_id: correction})
    return corrections
I'm not sure if I provided enough code and explanation to understand the problem, but I hope so. I wish you could tell me whether it is normal that some functions are just hard to test, and whether it is good practice to rewrite them separately so I can mock functions inside the tested function. I was also thinking that I could write a new class with similar functionality, but the API is very large and it would be a very long process.
I understand your question as follows: You have a function check that you consider hard to test because of its dependency on the problem_report. To make it better testable you have copied the code into the test file. You will test the copied code because you can modify this to be easier testable. And, you want to know if this approach makes sense.
The answer is no, this does not make sense. You are not testing the real function, but completely different code. The code may not start out completely different, but in a short time the copy and the original will deviate, and it will be a maintenance nightmare to ensure that the copy always resembles the original. Improving code for testability is a different story: you can make changes to the check function to improve its testability. But then exactly the same resulting function should be used both in the test and in the production code.
How to better test the function check, then? First, are you sure that the original problem_report objects really cannot be sensibly used in your tests? (Here are some criteria that help you decide: What to mock for python test cases?) Now, let's assume that you come to the conclusion you cannot sensibly use the original problem_report.
In that case, here the interface is simple enough to define a mocked problem_report. Keep in mind that Python uses duck typing, so you only have to create a class that has a links member which has an items() method. Plus, your mocked problem_report class needs a method get_correction(). Beyond that, your mock does not have to produce types that are similar to the types used by problem_report. The items() method can return simply a list of lists, like [["a",2],["xxxxdetailCorrectionxxxx",4]]. The same argument holds for get_correction, which could for example simply return its argument or a derived value, like, its negative.
For the above example (items() returning [["a",2],["xxxxdetailCorrectionxxxx",4]] and get_correction returning the negative of its argument) the expected result would be {4: -4}. No need to simulate real correction objects. And, you can create your mocked versions of problem_report without need to read data from files - the mocks can be setup completely from within the unit-testing code.
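A minimal hand-rolled stub along those lines might look like this (the class and test names are made up; check is the function from the question):
class FakeProblemReport:
    # Just enough duck typing: a links mapping plus a get_correction() method.
    links = {"a": 2, "xxxxdetailCorrectionxxxx": 4}

    def get_correction(self, corr_link):
        # Return a value trivially derived from the input.
        return -self.links[corr_link]

def test_check_with_fake_report():
    assert check(FakeProblemReport()) == {4: -4}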
Try patching the problem_report symbol in the module. You should put your tests in a separate class.
@mock.patch('some.module.path.problem_report')
def test_check(problem_report):
    problem_report.side_effect = get_corr
    corrections = {}
    for corr_link, corr_id in problem_report.links.items():
        if re.findall(pattern='detailCorrection', string=corr_link):
            correction = problem_report.get_correction(corr_link)
            corrections.update({corr_id: correction})
    return corrections

How to design a program with many configuration options?

Let's say I have a program that has a large number of configuration options. The user can specify them in a config file. My program can parse this config file, but how should it internally store and pass around the options?
In my case, the software is used to perform a scientific simulation. There are about 200 options most of which have sane defaults. Typically the user only has to specify a dozen or so. The difficulty I face is how to design my internal code. Many of the objects that need to be constructed depend on many configuration options. For example an object might need several paths (for where data will be stored), some options that need to be passed to algorithms that the object will call, and some options that are used directly by the object itself.
This leads to objects needing a very large number of arguments to be constructed. Additionally, as my codebase is under very active development, it is a big pain to go through the call stack and pass along a new configuration option all the way down to where it is needed.
One way to prevent that pain is to have a global configuration object that can be freely used anywhere in the code. I don't particularly like this approach as it leads to functions and classes that don't take any (or only one) argument and it isn't obvious to the reader what data the function/class deals with. It also prevents code reuse as all of the code depends on a giant config object.
Can anyone give me some advice about how a program like this should be structured?
Here is an example of what I mean for the configuration option passing style:
class A:
    def __init__(self, opt_a, opt_b, ..., opt_z):
        self.opt_a = opt_a
        self.opt_b = opt_b
        ...
        self.opt_z = opt_z

    def foo(self, arg):
        algo(arg, self.opt_a, self.opt_e)
Here is an example of the global config style:
class A:
    def __init__(self, config):
        self.config = config

    def foo(self, arg):
        algo(arg, self.config)
The examples are in Python, but my question stands for any similar programming language.
matplotlib is a large package with many configuration options. It uses an rcParams module to manage all the default parameters. rcParams stores all the default parameters in a dict.
Each function then gets its options from keyword arguments,
for example:
def f(x, y, opt_a=None, opt_b=None):
    if opt_a is None:
        opt_a = rcParams['group1.opt_a']
A few design patterns will help
Prototype
Factory and Abstract Factory
Use these two patterns with configuration objects. Each method will then take a configuration object and use what it needs. Also consider applying a logical grouping to config parameters and think about ways to reduce the number of inputs.
Pseudocode:
// Consider we can run three different kinds of Simulations. sim1, sim2, sim3
ConfigFactory configFactory = new ConfigFactory("/path/to/option/file");
....
Simulation1 sim1;
Simulation2 sim2;
Simulation3 sim3;
sim1.run( configFactory.ConfigForSim1() );
sim2.run( configFactory.ConfigForSim2() );
sim3.run( configFactory.ConfigForSim3() );
Inside of each factory method it might create a configuration from a prototype object (that has all of the "sane" defaults) and the option file becomes just the things that are different from default. This would be paired with clear documentation on what these defaults are and when a person (or other program) might want to change them.
Edit:
Also consider that each config returned by the factory is a subset of the overall config.
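A rough Python sketch of the prototype-plus-factory idea (all names, defaults, and the subsetting details are assumptions for illustration, not a real library):
import copy

# Prototype: the "sane defaults" for the whole simulation suite.
DEFAULT_CONFIG = {'data_path': '/tmp/data', 'tolerance': 1e-6, 'max_iter': 100}

class ConfigFactory:
    def __init__(self, overrides):
        # overrides: the dozen or so options parsed from the user's file.
        self._overrides = overrides

    def config_for_sim1(self):
        # Copy the prototype, apply the user's overrides, and return
        # only the subset that simulation 1 actually needs.
        cfg = copy.deepcopy(DEFAULT_CONFIG)
        cfg.update(self._overrides)
        return {key: cfg[key] for key in ('data_path', 'tolerance')}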
Pass around either the config parsing class, or write a class that wraps it and intelligently pulls out the requested options.
Python's standard library configparser exposes the sections and options of an INI style configuration file using the mapping protocol, and so you can retrieve your options directly from that as though it were a dictionary.
myconf = configparser.ConfigParser()
myconf.read('myconf.ini')
what_to_do = myconf['section']['option']
If you explicitly want to provide the options using the attribute notation, create a class that overrides __getattr__:
class MyConf:
    def __init__(self, path):
        self._parser = configparser.ConfigParser()
        self._parser.read(path)

    def __getattr__(self, option):
        return self._parser[{'what_to_do': 'section'}[option]][option]

myconf = MyConf('myconf.ini')
what_to_do = myconf.what_to_do
Have a module load the params into its namespace, then import it and use it wherever you want.
Also see related question here
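For example, a minimal sketch of that module-as-namespace idea (the file and option names are assumptions):
# settings.py
import configparser

_parser = configparser.ConfigParser()
_parser.read('myconf.ini')

# Hoist the options into module-level names once, at import time.
what_to_do = _parser['section']['option']
Then any other module can simply do from settings import what_to_do.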

the proper method for making a DB connection available across many python modules

I want to make a single database object available across many python modules.
For a related example, I create globl.py:
DOCS_ROOT="c:\docs" ## as an example
SOLR_BASE="http://localhost:8636/solr/"
Any other module which needs it can do a
from globl import DOCS_ROOT
Now this example aside, I want to do the same thing with database connection objects, share them across many modules.
import MySQLdb
conn = MySQLdb.connect (host="localhost"...)
cursor = conn.cursor()
I tried this on the interpreter:
from globl import cursor
and it seems to work. But I suspect that this will cause the same module to be executed each time one imports from it. So is this the proper way?
Even if the import doesn't run the code multiple times, this is definitely not the correct way.
You should instead hide the process of obtaining a connection or cursor behind a function. You can then implement this function using either a Singleton or Object Pool design pattern.
So it would be something like this:
db.py:
_connection = None

def get_connection():
    global _connection
    if not _connection:
        _connection = MySQLdb.connect(host="localhost"...)
    return _connection

# List of stuff accessible to importers of this module, just in case.
__all__ = ['get_connection']
## Edit: actually you can still refer to db._connection
## if you know that's the name of the variable.
## It's just left out from enumeration if you inspect the module
someothermodule.py:
import db
conn = db.get_connection() # This will always return the same object
By the way, depending on what you are doing, it may not be such a good idea to share your connection object rather than create a new one every time you need one.
But that's why you'd want to write a get_connection() function: to abstract away these issues from the rest of your code.
You suspect wrongly. The code will only be executed once - subsequent imports just refer to the module via sys.modules, and don't re-run it.
(Note that this is the case as long as you always use the same path to import the module - if you do from globl import cursor in one place, and from my.fullyqualified.project.global import cursor in another, you probably will find the code is re-executed.)
Edit to add: as S.Lott says in the comment, this is a perfectly good way to handle a global object.
I think Daniel already answered the question, but I'd like to add a few comments about the cursor object you want to share.
It is generally not a good idea to share the cursor object that way. Certainly it depends on what your program is, but as a general solution I'd recommend you hide this cursor object behind a "factory" producing cursors. Basically, you can create a method cursor() or get_cursor() instead of making the cursor a global variable. The major benefit (but not the only one) is that you can hide more complex logic behind this "factory": pooling, automatic re-connection in case the connection is dropped, etc. Even if you don't need any of that right away, it will be very easy to add later if you start using this approach now; for the time being you can keep the function implementation as simple as return _cursor.
And yes, still, the module itself will be imported once only.
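A minimal sketch of such a factory, building on the db.py module from the earlier answer (just a starting point; any pooling or reconnect logic would go inside):
# db.py (continued)
def get_cursor():
    # Hand out cursors from the shared connection for now; pooling or
    # reconnect-on-drop logic can be added here later without touching callers.
    return get_connection().cursor()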

Scope, using functions in current module

I know this must be a trivial question, but I've tried many different ways and searched quite a bit for a solution: how do I create and reference subfunctions in the current module?
For example, I am writing a program to parse through a text file, and for each of the 300 different names in it, I want to assign a category.
There are 300 of these, and I have a list of them structured to create a dict, so of the form lookup[key]=value (bonus question: is there any more efficient or sensible way to do this than a massive dict?).
I would like to keep all of this in the same module, but with the functions (dict initialisation, etc.) at the end of the file, so I don't have to scroll down 300 lines to see the code, i.e. laid out as in the example below.
When I run it as below, I get the error 'initLookups is not defined'. When I structure it so that it is initialisation, then function definition, then function use, there is no problem.
I'm sure there must be an obvious way to initialise the functions and associated dict without keeping the code inline, but I have tried quite a few approaches so far without success. I could put it in an external module and import it, but would prefer not to, for simplicity.
What should I be doing in terms of module structure? Is there any better way than using a dict to store this lookup table (it is 300 unique text keys mapping onto approximately 10 categories)?
Thanks,
Brendan
import .....  # (initialisation code, etc.)

initLookups()  # **Should create the dict - How should this be referenced?**
print getlookup(KEY)  # **How should this be referenced?**

def initLookups():
    global lookup
    lookup = {}
    lookup["A"] = "AA"
    lookup["B"] = "BB"
    # (etc etc etc....)

def getlookup(name):
    if name in lookup.keys():
        getlookup = lookup[name]
    else:
        getlookup = ""
    return getlookup
A function needs to be defined before it can be called. If you want to have the code that needs to be executed at the top of the file, just define a main function and call it from the bottom:
import sys

def main(args):
    pass

# All your other function definitions here

if __name__ == '__main__':
    exit(main(sys.argv[1:]))
This way, whatever you reference in main will have been parsed and is hence known already. The reason for testing __name__ is that in this way the main method will only be run when the script is executed directly, not when it is imported by another file.
Side note: a dict with 300 keys is by no means massive, but you may want to either move the code that fills the dict to a separate module, or (perhaps more fancy) store the key/value pairs in a format like JSON and load it when the program starts.
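For instance, the JSON variant might look like this (the file name and contents are assumptions):
import json

# lookups.json would contain e.g. {"A": "AA", "B": "BB", ...}
with open('lookups.json') as f:
    lookup = json.load(f)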
Here's a more Pythonic way to do this. There aren't a lot of choices, BTW.
A function must be defined before it can be used. Period.
However, you don't have to strictly order all functions for the compiler's benefit. You merely have to put your execution of the functions last.
import ...  # (initialisation code, etc.)

def initLookups():  # Definitions must come before actual use
    lookup = {}
    lookup["A"] = "AA"
    lookup["B"] = "BB"
    # (etc etc etc....)
    return lookup

# Any functions initLookups uses can be defined here,
# as long as they're findable in the same module.

if __name__ == "__main__":  # Use comes last
    lookup = initLookups()
    print lookup.get("Key", "")
Note that you don't need the getlookup function, it's a built-in feature of a dict, named get.
Also, "initialisation code" is suspicious. An import should not "do" anything. It should define functions and classes, but not actually provide any executable code. In the long run, executable code that is processed by an import can become a maintenance nightmare.
The most notable exception is a module-level Singleton object that gets created by default. Even then, be sure that the mystery object which makes a module work is clearly identified in the documentation.
If your lookup dict is unchanging, the simplest way is to just make it a module scope variable. ie:
lookup = {
    'A': 'AA',
    'B': 'BB',
    ...
}
If you may need to make changes, and later re-initialise it, you can do this in an initialisation function:
def initLookups():
    global lookup
    lookup = {
        'A': 'AA',
        'B': 'BB',
        ...
    }
(Alternatively, lookup.update({'A':'AA', ...}) to change the dict in-place, affecting all callers with access to the old binding.)
However, if you've got these lookups in some standard format, it may be simpler simply to load it from a file and create the dictionary from that.
You can arrange your functions as you wish. The only rule about ordering is that the accessed variables must exist at the time the function is called - it's fine if the function has references to variables in the body that don't exist yet, so long as nothing actually tries to use that function. ie:
def foo():
    print greeting, "World"  # Note that greeting is not yet defined when foo() is created

greeting = "Hello"
foo()  # Prints "Hello World"
But:
def foo():
    print greeting, "World"

foo()  # Gives an error - greeting not yet defined.
greeting = "Hello"
One further thing to note: your getlookup function is very inefficient. Using "if name in lookup.keys()" is actually getting a list of the keys from the dict, and then iterating over this list to find the item. This loses all the performance benefit the dict gives. Instead, "if name in lookup" would avoid this, or even better, use the fact that .get can be given a default to return if the key is not in the dictionary:
def getlookup(name):
    return lookup.get(name, "")
I think that keeping the names in a flat text file and loading them at runtime would be a good alternative. I try to stick to the lowest level of complexity possible with my data, starting with plain text and working up to an RDBMS (I lifted this idea from The Pragmatic Programmer).
Dictionaries are very efficient in Python. It's essentially what the whole language is built on. 300 items is well within the bounds of sane dict usage.
names.txt:
A = AAA
B = BBB
C = CCC
getname.py:
import sys

FILENAME = "names.txt"

def main(key):
    pairs = (line.split("=") for line in open(FILENAME))
    names = dict((x.strip(), y.strip()) for x, y in pairs)
    return names.get(key, "Not found")

if __name__ == "__main__":
    print main(sys.argv[-1])
If you really want to keep it all in one module for some reason, you could just stick a string at the top of the module. I think that a big swath of text is less distracting than a huge mess of dict initialization code (and easier to edit later):
import sys

LINES = """
A = AAA
B = BBB
C = CCC
D = DDD
E = EEE""".strip().splitlines()

PAIRS = (line.split("=") for line in LINES)
NAMES = dict((x.strip(), y.strip()) for x, y in PAIRS)

def main(key):
    return NAMES.get(key, "Not found")

if __name__ == "__main__":
    print main(sys.argv[-1])
