Best practice: handle functions with lots of parameters and reserved names - python

I am working on a Python client for the uwsgi.it API, and I found I need to write methods that accept lots of (optional) parameters that will be sent via HTTP requests.
Initially I wanted to declare which parameters the user can pass: since they are so many, I thought it was easier, and safer, for the user to have a fixed parameter list rather than being left free to put anything inside a dict. I ended up with something like this:
def alarms(self, container=None, _class=None, color=None,
           vassal=None, level=None, line=None, filename=None,
           func=None, with_total=None, range=None):
    params = {k: v for k, v in locals().iteritems() if k != 'self' and v}
    if '_class' in params:
        params['class'] = params['_class']
        del params['_class']
    return self.get('alarms', params)
But it is pretty ugly and I really don't like this way of handling the '_class' parameter. So the other possibility that comes to mind is to accept a dictionary that can contain anything (or **kwargs), list the accepted keys in the docstring, and then sanitize the input. A possible way would be to declare a "private" method that accepts only the allowed params, but then the same problem appears again! Any suggestions? Any best practice for methods with so many parameters?

I agree that using **kwargs is a good idea, and you can easily sanitize its keys using a set. I'm using Python 2.6, so I don't have set comprehensions, but my code should be easy to translate to more modern versions.
FWIW, I actually posted a version of this program late last night, but then I decided it ought to do something about bad parameters, so I temporarily deleted it. Here's the revised version.
validate_params.py
#! /usr/bin/env python

''' Validate the keys in kwargs

Test keys against a container (set, tuple, list) of good keys,
supplying a value of None for missing keys.
Also, if a key ends with an underscore, strip it.

Written by PM 2Ring 2014.11.15
From http://stackoverflow.com/questions/26945235/best-practice-handle-functions-with-lots-of-parameters-and-reserved-names
'''

import sys

def test(**kwargs):
    good_keys = ("container", "class_", "color",
                 "vassal", "level", "line", "filename",
                 "func", "with_total", "range")
    new_kwargs = validate_keys(kwargs, good_keys)
    for t in new_kwargs.items():
        print "%-12s : %r" % t

#def alarms(self, **kwargs):
#    good_keys = ("container", "class_", "color",
#                 "vassal", "level", "line", "filename",
#                 "func", "with_total", "range")
#    return self.get('alarms', validate_keys(kwargs, good_keys))

def validate_keys(kwargs, good_keys):
    good_keys = set(good_keys)
    bad_keys = set(kwargs.keys()) - good_keys
    if bad_keys:
        bad_keys = ', '.join(bad_keys)
        print >>sys.stderr, "Unknown parameters: %s\n" % bad_keys
        raise KeyError, bad_keys

    new_kwargs = {}
    for k in good_keys:
        new_kwargs[k.rstrip('_')] = kwargs.get(k, None)
    return new_kwargs

test(color="red", class_="top",
     #bar=1, foo=3,  # some bad keys
     level=2, func="copy", filename="text.txt")
output
container : None
with_total : None
level : 2
color : 'red'
filename : 'text.txt'
vassal : None
range : None
func : 'copy'
line : None
class : 'top'

One thing you could do to tidy up the logic is change your dict comprehension to:
params = {k.strip("_"): v for k, v in locals().iteritems() if k != 'self' and v is not None}
#           ^^^^^^^^^^
Then you don't need to do anything about class. Also, I would probably use class_ in favor of _class, since the latter indicates that the argument is "private", while the former is often a hint that "I need to use a keyword as an identifier".
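Putting both tweaks together, the whole method might collapse to something like this (a sketch in the question's Python 2 style; self.get is the client's own request helper from the question):
def alarms(self, container=None, class_=None, color=None,
           vassal=None, level=None, line=None, filename=None,
           func=None, with_total=None, range=None):
    # locals() here holds exactly self plus the keyword parameters;
    # strip("_") turns class_ into the 'class' key the API expects
    params = {k.strip("_"): v for k, v in locals().iteritems()
              if k != 'self' and v is not None}
    return self.get('alarms', params)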

When a method begins to require many inputs, one software design practice to consider is to declare a special class which contains a property for each of those input values; you can then instantiate and populate it separately from its use. That way you only need to pass a single reference into your method signature (to the encapsulating class) instead of references to each property. As your object model grows you can even add builder and validation methods to help you easily generate your new class and verify its properties if needed. A rough sketch appears after the links below.
How to define a class in Python
Also, consider design patterns and SOLID design principles as ways to improve your code's form, function and maintainability. Get intimately familiar with these patterns and you will have the knowledge you need to truly up your game and move from a software programmer to a lead or engineer.
http://en.wikipedia.org/wiki/SOLID_%28object-oriented_design%29
http://en.wikipedia.org/wiki/Encapsulation_%28object-oriented_programming%29
http://en.wikipedia.org/wiki/Software_design_pattern
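As a rough illustration of the parameter-object idea (AlarmQuery and its methods are hypothetical, not from any library):
class AlarmQuery(object):
    '''Parameter object for an alarms request (illustrative only).'''
    def __init__(self):
        self.container = None
        self.class_ = None
        self.color = None
        self.level = None
        # ...one attribute per accepted parameter

    def validate(self):
        # example rule; the real constraints depend on the API
        if self.level is not None and self.level < 0:
            raise ValueError("level must be non-negative")

    def to_params(self):
        # drop unset fields and strip the class_ underscore
        return {k.rstrip('_'): v for k, v in vars(self).items()
                if v is not None}

query = AlarmQuery()
query.class_ = 'system'
query.level = 2
query.validate()
# then: client.get('alarms', query.to_params())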


Why is it forbidden to override log record attributes?

Reading the documentation of Python's logging library (for version 2.7) I came across the following:
Logger.debug(msg, *args, **kwargs)
[...] The second keyword argument is extra which can be used to pass a dictionary which is used to populate the __dict__ of the LogRecord created for the logging event with user-defined attributes. These custom attributes can then be used as you like. For example, they could be incorporated into logged messages. [...] The keys in the dictionary passed in extra should not clash with the keys used by the logging system. [emph. mine]
So why does this constraint exist? In my opinion this removes flexibility from the library for no good reason (it is up to the developer to check which keys are built-in and which are not).
Imagine you want to write a decorator which logs function entry and exit:
def log_entry_exit(func):
    def wrapper(*args, **kwargs):
        logger.debug('Entry')
        result = func(*args, **kwargs)
        logger.debug('Exit')
        return result
    return wrapper

@log_entry_exit
def foo():
    pass
Suppose you also want to log the name of the enclosing function:
format_string = '%(funcName)s: %(message)s'
Oops! This doesn't work. The output is:
>>> foo()
wrapper: Entry
wrapper: Exit
Of course the function name evaluates to wrapper because that is the enclosing function. However this is not what I want. I want the function name of the decorated function to be printed. Therefore it would be very convenient to just modify my logging calls to:
logger.debug('<msg>', extra={'funcName': func.__name__})
However (as the documentation already points out) this doesn't work:
KeyError: "Attempt to overwrite 'funcName' in LogRecord"
Nevertheless this would be a very straightforward and light solution to the given problem.
So again, why is logging preventing me from setting custom values for built-in attributes?
Not being the author, I can't be sure, but I have a hunch.
Looking at https://hg.python.org/cpython/file/3.5/Lib/logging/__init__.py, this seems to be the code that threw the error you quoted:
rv = _logRecordFactory(name, level, fn, lno, msg, args, exc_info, func, sinfo)
if extra is not None:
    for key in extra:
        if (key in ["message", "asctime"]) or (key in rv.__dict__):
            raise KeyError("Attempt to overwrite %r in LogRecord" % key)
        rv.__dict__[key] = extra[key]
Looking at the __init__() method in that file, we can see that it sets a long list of attributes, at least some of which are used to keep track of object state (to borrow terminology from elsewhere, these serve the purpose of private member variables):
self.args = args
self.levelname = getLevelName(level)
self.levelno = level
self.pathname = pathname
try:
    self.filename = os.path.basename(pathname)
    self.module = os.path.splitext(self.filename)[0]
except (TypeError, ValueError, AttributeError):
    self.filename = pathname
    self.module = "Unknown module"
self.exc_info = exc_info
self.exc_text = None      # used to cache the traceback text
self.stack_info = sinfo
self.lineno = lineno
self.funcName = func
[...]
The code makes assumptions in various places that these attributes contain what they were initialized to contain; rather than defensively checking whether the value is still sensible every time that it's used, it blocks attempts to update any of them, as we've seen above. And, instead of trying to distinguish between "safe-to-overwrite" and "unsafe-to-overwrite" attributes, it simply blocks any overwriting.
In the particular case of funcName, I suspect you won't suffer any ill effects (other than having a different funcName displayed) by overwriting it.
Possible ways forward:
live with the limitation
override Logger.makeRecord() to permit an update of funcName (see the sketch below)
override Logger to add a setFuncName() method
Of course, whatever you do, test your modification carefully to avoid surprises.
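For the second option, here is a minimal sketch of what overriding makeRecord() might look like (a starting point, not a drop-in; based on the 3.x source quoted above):
import logging

class FuncNameLogger(logging.Logger):
    def makeRecord(self, name, level, fn, lno, msg, args, exc_info,
                   func=None, extra=None, sinfo=None):
        # pull funcName out of extra before the base class rejects it
        func_override = None
        if extra is not None and 'funcName' in extra:
            extra = dict(extra)  # don't mutate the caller's dict
            func_override = extra.pop('funcName')
        record = super().makeRecord(name, level, fn, lno, msg, args,
                                    exc_info, func, extra, sinfo)
        if func_override is not None:
            record.funcName = func_override
        return record

logging.setLoggerClass(FuncNameLogger)
logger = logging.getLogger(__name__)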
I know this is a few years old, but there is no chosen answer. If anyone else comes across it I have a workaround that should continue to work while the logging module undergoes changes.
Unfortunately, the author doesn't expose the keys that would conflict in a way that makes them easy to check for. However, he/she does hint at a way to do so. This line (https://hg.python.org/cpython/file/3.5/Lib/logging/__init__.py#l368) returns a shell of a LogRecord object:
rv = _logRecordFactory(None, None, "", 0, "", (), None, None)
...and in this object you can see all the properties and you can make a Set that holds the "conflicting keys".
I created a logging helper module:
import logging

clashing_keywords = {key for key in dir(logging.LogRecord(None, None, "", 0, "", (), None, None))
                     if "__" not in key}
additional_clashing_keywords = {
    "message",
    "asctime",
}
clashing_keywords = clashing_keywords.union(additional_clashing_keywords)

def make_safe_kwargs(kwargs):
    '''
    Makes sure you don't have kwargs that might conflict with
    the logging module
    '''
    assert isinstance(kwargs, dict)
    # iterate over a copy of the keys, since we mutate the dict
    for k in list(kwargs):
        if k in clashing_keywords:
            kwargs['_' + k] = kwargs.pop(k)
    return kwargs
...which just prepends conflicting keys with a _. It can be used like so:
from mymodule.logging_helpers import make_safe_kwargs
logger.info("my message", extra=make_safe_kwargs(kwargs))
It's been working well for me. Hope this helps!
The short answer for me was to identify the name clash, and rename the kwarg:
# broken
log.info('some message', name=name)

# working
log.info('some message', special_name=name)

Too many if statements

I have a topic to discuss: a fragment of code with 24 ifs/elifs. Operation is my own class that provides functionality similar to an Enum. Here is a fragment of the code:
if operation == Operation.START:
    strategy = strategy_objects.StartObject()
elif operation == Operation.STOP:
    strategy = strategy_objects.StopObject()
elif operation == Operation.STATUS:
    strategy = strategy_objects.StatusObject()
(...)
I have concerns from a readability point of view. Is it better to change it into 24 classes and use polymorphism? I am not convinced that it would make my code more maintainable... On the one hand those ifs are pretty clear and shouldn't be hard to follow; on the other hand there are simply too many of them.
My question is rather general; however, I'm writing the code in Python, so I cannot use constructions like switch.
What do you think?
UPDATE:
One important thing is that StartObject(), StopObject() and StatusObject() are constructors, and I wanted to assign an object to the strategy variable.
You could possibly use a dictionary. Dictionaries store references, which means functions are perfectly viable to use, like so:
operationFuncs = {
    Operation.START: strategy_objects.StartObject,
    Operation.STOP: strategy_objects.StopObject,
    Operation.STATUS: strategy_objects.StatusObject,
    (...)
}
It's good to have a default operation just in case, so wrap the lookup in a try/except and handle the exception (i.e. the equivalent of your else clause):
try:
    strategy = operationFuncs[operation]()
except KeyError:
    strategy = strategy_objects.DefaultObject()
Alternatively, use the dictionary's get method, which allows you to specify a default if the key isn't found:
strategy = operationFuncs.get(operation, strategy_objects.DefaultObject)()
Note that you don't include the parentheses when storing the functions in the dictionary; you add them when calling. Also, this requires that Operation.START be hashable, but that should be the case since you described it as a class similar to an Enum.
Python's equivalent to a switch statement is to use a dictionary. Essentially you can store the keys like you would the cases and the values are what would be called for that particular case. Because functions are objects in Python you can store those as the dictionary values:
operation_dispatcher = {
    Operation.START: strategy_objects.StartObject,
    Operation.STOP: strategy_objects.StopObject,
}
Which can then be used as follows:
try:
    strategy = operation_dispatcher[operation]  # fetch the strategy
except KeyError:
    strategy = default  # this deals with the else-case (if you have one)
strategy()  # call if needed
Or more concisely:
strategy = operation_dispatcher.get(operation, default)
strategy()  # call if needed
This can potentially scale a lot better than having a mess of if-else statements. Note that if you don't have an else case to deal with you can just use the dictionary directly with operation_dispatcher[operation].
You could try something like this, for instance:
def chooseStrategy(op):
    return {
        Operation.START: strategy_objects.StartObject,
        Operation.STOP: strategy_objects.StopObject,
    }.get(op, strategy_objects.DefaultValue)
Call it like this:
strategy = chooseStrategy(operation)()
This method has the benefit of providing a default value (like a final else statement). Of course, if you only need to use this decision logic in one place in your code, you can always use strategy = dictionary.get(op, default) without the function.
Starting from Python 3.10:
match i:
    case 1:
        print("First case")
    case 2:
        print("Second case")
    case _:
        print("Didn't match a case")
https://pakstech.com/blog/python-switch-case/
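Applied to the question's dispatch, the same construct might look like this (assuming the Operation members work as value patterns, i.e. dotted names, and reusing the hypothetical DefaultObject fallback from above):
match operation:
    case Operation.START:
        strategy = strategy_objects.StartObject()
    case Operation.STOP:
        strategy = strategy_objects.StopObject()
    case Operation.STATUS:
        strategy = strategy_objects.StatusObject()
    case _:
        strategy = strategy_objects.DefaultObject()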
You can use some introspection with getattr:
strategy = getattr(strategy_objects, "%sObject" % operation.capitalize())()
Let's say the operation is "STATUS": it is capitalized to "Status" and prepended to "Object", giving "StatusObject". That attribute is then looked up on strategy_objects and called, failing catastrophically if it doesn't exist or isn't callable. :) (I.e., add error handling.)
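With that error handling spelled out, the getattr approach might read like this (still assuming operation is the string name, and borrowing the thread's hypothetical DefaultObject as a fallback):
name = '%sObject' % operation.capitalize()  # e.g. 'STATUS' -> 'StatusObject'
factory = getattr(strategy_objects, name, None)
if callable(factory):
    strategy = factory()
else:
    strategy = strategy_objects.DefaultObject()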
The dictionary solution is probably more flexible though.
If Operation.START etc. are hashable, you can use a dictionary with the conditions as keys and the functions to call as values:
d = {Operation.START: strategy_objects.StartObject,
     Operation.STOP: strategy_objects.StopObject,
     Operation.STATUS: strategy_objects.StatusObject}
And then you can do the dictionary lookup and call the function:
d[operation]()
Here is a bastardized switch/case done using dictionaries:
For example:
# define the function blocks; each returns the chosen strategy object
# (assigning to a local variable inside the function would be lost)
def start():
    return strategy_objects.StartObject()

def stop():
    return strategy_objects.StopObject()

def status():
    return strategy_objects.StatusObject()

# map the inputs to the function blocks
options = {"start": start,
           "stop": stop,
           "status": status,
           }
Then the equivalent switch block is invoked:
strategy = options["start"]()

Is there a reason not to send super().__init__() a dictionary instead of **kwds?

I just started building a text based game yesterday as an exercise in learning Python (I'm using 3.3). I say "text based game," but I mean more of a MUD than a choose-your-own adventure. Anyway, I was really excited when I figured out how to handle inheritance and multiple inheritance using super() yesterday, but I found that the argument-passing really cluttered up the code, and required juggling lots of little loose variables. Also, creating save files seemed pretty nightmarish.
So, I thought, "What if certain class hierarchies just took one argument, a dictionary, and just passed the dictionary back?" To give you an example, here are two classes trimmed down to their init methods:
class Actor:
    def __init__(self, in_dict, **kwds):
        super().__init__(**kwds)
        self._everything = in_dict
        self._name = in_dict["name"]
        self._size = in_dict["size"]
        self._location = in_dict["location"]
        self._triggers = in_dict["triggers"]
        self._effects = in_dict["effects"]
        self._goals = in_dict["goals"]
        self._action_list = in_dict["action list"]
        self._last_action = ''
        self._current_action = ''  # both ._last_action and ._current_action get updated by .update_action()

class Item(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._can_contain = in_dict["can contain"]  # boolean entry
        self._inventory = in_dict["inventory"]      # either a list or dict entry

class Player(Actor):
    def __init__(self, in_dict, **kwds):
        super().__init__(in_dict, **kwds)
        self._inventory = in_dict["inventory"]  # entry should be a Container object
        self._stats = in_dict["stats"]
Example dict that would be passed:
playerdict = {'name': '', 'size': '0', 'location': '', 'triggers': None,
              'effects': None, 'goals': None, 'action list': None,
              'inventory': Container(), 'stats': None}
(The None's get replaced by {} once the dictionary has been passed.)
So, in_dict gets passed up to the parent class instead of a huge payload of **kwds.
I like this because:
It makes my code a lot neater and more manageable.
As long as the dicts have at least some entry for the key called, it doesn't break the code. Also, it doesn't matter if a given argument never gets used.
It seems like file IO just got a lot easier (dictionaries of player data stored as dicts, dictionaries of item data stored as dicts, etc.)
I get the point of **kwds (EDIT: apparently I didn't), and it hasn't seemed cumbersome when passing fewer arguments. This just appears to be a comfortable way of dealing with a need for a large number of attributes at the the creation of each instance.
That said, I'm still a major python noob. So, my question is this: Is there an underlying reason why passing the same dict repeatedly through super() to the base class would be a worse idea than just toughing it out with nasty (big and cluttered) **kwds passes? (e.g. issues with the interpreter that someone at my level would be ignorant of.)
EDIT:
Previously, creating a new Player might have looked like this, with an argument passed for each attribute.
bob = Player('bob', Location = 'here', ... etc.)
The number of arguments needed blew up, and I only included the attributes that really needed to be present to not break method calls from the Engine object.
This is the impression I'm getting from the answers and comments thus far:
There's nothing "wrong" with sending the same dictionary along, as long as nothing has the opportunity to modify its contents (Kirk Strauser) and the dictionary always has what it's supposed to have (goncalopp). The real answer is that the question was amiss, and using in_dict instead of **kwds is redundant.
Would this be correct? (Also, thanks for the great and varied feedback!)
I'm not sure I understand your question exactly, because I don't see how the code looked before you made the change to use in_dict. It sounds like you have been listing out dozens of keywords in the call to super (which is understandably not what you want), but this is not necessary. If your child class has a dict with all of this information, it can be turned into kwargs when you make the call with **in_dict. So:
class Actor:
    def __init__(self, **kwds):
        pass  # unpack whatever Actor needs from kwds here

class Item(Actor):
    def __init__(self, **kwds):
        self._everything = kwds
        super().__init__(**kwds)
I don't see a reason to add another dict for this, since you can just manipulate and pass the dict created for kwds anyway
Edit:
As for the question of the efficiency of using the ** expansion of the dict versus listing the arguments explicitly, I did a very unscientific timing test with this code:
import time

def some_func(**kwargs):
    for k, v in kwargs.items():
        pass

def main():
    name = 'felix'
    location = 'here'
    user_type = 'player'
    kwds = {'name': name,
            'location': location,
            'user_type': user_type}

    start = time.time()
    for i in range(10000000):
        some_func(**kwds)
    end = time.time()
    print 'Time using expansion:\t{0}s'.format(end - start)

    start = time.time()
    for i in range(10000000):
        some_func(name=name, location=location, user_type=user_type)
    end = time.time()
    print 'Time without expansion:\t{0}s'.format(end - start)

if __name__ == '__main__':
    main()
Running this 10,000,000 times gives a slight (and probably statistically meaningless) advantage passing around a dict and using **.
Time using expansion:    7.9877269268s
Time without expansion:  8.06108212471s
If we print the IDs of the dict objects (kwds outside and kwargs inside the function), you will see that kwargs is a fresh dict, distinct from kwds, in either case. The ID often repeats from call to call, but that is just CPython recycling the memory of the dict freed at the end of the previous call, not the function holding on to a single dict forever. (See also this enlightening SO question about how mutable default parameters are handled in python, which is somewhat related.)
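You can check this yourself with a quick probe (CPython-specific; a repeated id only means the freed dict's memory was immediately reused, not that the dict survived the call):
def probe(**kwargs):
    return id(kwargs)

# collect the ids seen over several calls; CPython will often report a
# single id because each call's kwargs dict is freed and its memory is
# recycled for the next call
ids = set(probe(a=1) for _ in range(5))
print(ids)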
So from a performance perspective, you can pick whichever makes sense to you. It should not meaningfully impact how python operates behind the scenes.
I've done that myself where in_dict was a dict with lots of keys, or a settings object, or some other "blob" of something with lots of interesting attributes. That's perfectly OK if it makes your code cleaner, particularly if you name it clearly like settings_object or config_dict or similar.
That shouldn't be the usual case, though. Normally it's better to explicitly pass a small set of individual variables; it makes the code much cleaner and easier to reason about. A client could pass in_dict = None by accident, and you wouldn't know until some method tried to access it. Suppose Actor.__init__ didn't peel apart in_dict but just stored it like self.settings = in_dict. Sometime later, Actor.method comes along and tries to access it, then boom! Dead process. If you're calling Actor.__init__(var1, var2, ...), a bad argument will blow up much earlier and give you more context about what actually went wrong.
So yes, by all means: feel free to do that when it's appropriate. Just be aware that it's not appropriate very often, and the desire to do it might be a smell telling you to restructure your code.
This is not python specific, but the greatest problem I can see with passing arguments like this is that it breaks encapsulation. Any class may modify the arguments, and it's much more difficult to tell which arguments are expected in each class - making your code difficult to understand, and harder to debug.
Consider explicitly consuming the arguments in each class, and calling the super's __init__ on the remaining. You don't need to make them explicit:
class ClassA(object):
    def __init__(self, arg1, arg2=""):
        pass

class ClassB(ClassA):
    def __init__(self, arg3, arg4="", *args, **kwargs):
        ClassA.__init__(self, *args, **kwargs)

ClassB(3, 4, 1, 2)
You can also leave the variables uninitialized and use methods to set them. You can then use different methods in the different classes, and all subclasses will have access to the superclass methods.
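A rough sketch of that setter style (names are illustrative; returning self just makes the calls chainable, builder-style):
class Actor(object):
    def set_name(self, name):
        self._name = name
        return self

    def set_location(self, location):
        self._location = location
        return self

class Player(Actor):
    def set_stats(self, stats):
        self._stats = stats
        return self

bob = Player().set_name('bob').set_location('here').set_stats({})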

PyYAML parse into arbitrary object

I have the following Python 2.6 program and YAML definition (using PyYAML):
import yaml

x = yaml.load("""
product:
  name: 'Product X'
  sku: 123
  features:
    - size: '10x30cm'
      weight: '10kg'
""")

print type(x)
print x
Which results in the following output:
<type 'dict'>
{'product': {'sku': 123, 'name': 'Product X', 'features': [{'weight': '10kg', 'size': '10x30cm'}]}}
Is it possible to create an object with fields from x?
I would like to do the following:
print x.features[0].size
I am aware that it is possible to create an instance from an existing class, but that is not what I want for this particular scenario.
Edit:
Updated the confusing part about a 'strongly typed object'.
Changed access to features to an indexer, as suggested by Alex Martelli
So you have a dictionary with string keys and values that can be numbers, nested dictionaries, lists, and you'd like to wrap that into an instance which lets you use attribute access in lieu of dict indexing, and "call with an index" in lieu of list indexing -- not sure what "strongly typed" has to do with this, or why you think .features(0) is better than .features[0] (such a more natural way to index a list!), but, sure, it's feasible. For example, a simple approach might be:
def wrap(datum):
    # don't wrap strings
    if isinstance(datum, basestring):
        return datum
    # don't wrap numbers, either
    try: return datum + 0
    except TypeError: pass
    return Fourie(datum)

class Fourie(object):
    def __init__(self, data):
        self._data = data
    def __getattr__(self, n):
        return wrap(self._data[n])
    def __call__(self, n):
        return wrap(self._data[n])
So x = wrap(x['product']) should give you your wish (why you want to skip that level when your overall logic would obviously require x.product.features(0).size, I have no idea, but clearly that skipping's better applied at the point of call rather than hard-coded in the wrapper class or the wrapper factory function I've just shown).
Edit: as the OP says he does want features[0] rather than features(0), just change the last two lines to
def __getitem__(self, n):
    return wrap(self._data[n])
i.e., define __getitem__ (the magic method underlying indexing) instead of __call__ (the magic method underlying instance-call).
The alternative to "an existing class" (here, Fourie) would be to create a new class on the fly based on introspecting the wrapped dict -- feasible, too, but seriously dark-gray, if not actually black, magic, and without any real operational advantage that I can think of.
If the OP can clarify exactly why he may be hankering after the meta-programming peaks of creating classes on the fly, what advantage he believes he might be getting that way, etc, I'll show how to do it (and, probably, I'll also show why the craved-for advantage will not in fact be there;-). But simplicity is an important quality in any programming endeavor, and using "deep dark magic" when plain, straightforward code like the above works just fine, is generally not the best of ideas!-)

Dictionary or If statements, Jython

I am writing a script at the moment that will grab certain information from HTML using dom4j.
Since Python/Jython does not have a native switch statement, I decided to use a whole bunch of if statements that call the appropriate method, like below:
if type == 'extractTitle':
    extractTitle(dom)
if type == 'extractMetaTags':
    extractMetaTags(dom)
I will be adding more depending on what information I want to extract from the HTML and thought about taking the dictionary approach which I found elsewhere on this site, example below:
{
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}[type](dom)
I know that the dictionary will be built each time the script runs, but on the other hand with if statements the script would have to check through all of them until it hit the correct one. What I am really wondering is which one performs better, and which is generally better practice?
Update: @Brian - Thanks for the great reply. I have a question: what if any of the extract methods require more than one object? E.g.
handle_extractTag(self, dom, anotherObject)
# Do something
How would you make the appropriate changes to the handle method to implement this? Hope you know what I mean :)
Cheers
To avoid specifying the tag and handler in the dict, you could just use a handler class with methods named to match the type, e.g.:
class MyHandler(object):
    def handle_extractTitle(self, dom):
        # do something
        pass

    def handle_extractMetaTags(self, dom):
        # do something
        pass

    def handle(self, type, dom):
        func = getattr(self, 'handle_%s' % type, None)
        if func is None:
            raise Exception("No handler for type %r" % type)
        return func(dom)
Usage:
handler = MyHandler()
handler.handle('extractTitle', dom)
Update:
When you have multiple arguments, just change the handle function to take those arguments and pass them through to the function. If you want to make it more generic (so you don't have to change both the handler functions and the handle method when you change the argument signature), you can use the *args and **kwargs syntax to pass through all received arguments. The handle method then becomes:
def handle(self, type, *args, **kwargs):
    func = getattr(self, 'handle_%s' % type, None)
    if func is None:
        raise Exception("No handler for type %r" % type)
    return func(*args, **kwargs)
With your code, the dictionary literal is rebuilt on every lookup; build it once and reuse it:
handlers = {
    'extractTitle': extractTitle,
    'extractMetaTags': extractMetaTags
}

handlers[type](dom)
This works like your original if code.
It depends on how many if statements we're talking about; if it's a very small number, a chain of ifs will be more efficient than using a dictionary.
However, as always, I strongly advise you to do whatever makes your code look cleaner until experience and profiling tell you that a specific block of code needs to be optimized.
Your use of the dictionary is not quite correct: as written, the dictionary literal is rebuilt on every dispatch. What is usually done is more something like:
switch_dict = {'extractTitle': extractTitle,
               'extractMetaTags': extractMetaTags}

switch_dict[type](dom)
That way it is faster and more extensible if you have a large (or variable) number of items.
The efficiency question is barely relevant. The dictionary lookup is done with a simple hashing technique, while the if statements have to be evaluated one at a time; dictionaries tend to be quicker.
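If you want numbers anyway, a throwaway timeit comparison along these lines shows the gap (tiny in absolute terms; readability should still drive the choice):
import timeit

setup = '''
def handle(x):
    return x

table = {'a': handle, 'b': handle, 'c': handle, 'd': handle, 'e': handle}

def if_chain(key):
    if key == 'a': return handle(key)
    elif key == 'b': return handle(key)
    elif key == 'c': return handle(key)
    elif key == 'd': return handle(key)
    elif key == 'e': return handle(key)

def dict_dispatch(key):
    return table[key](key)
'''

# worst case for the chain: the last key
print(timeit.timeit("if_chain('e')", setup=setup))
print(timeit.timeit("dict_dispatch('e')", setup=setup))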
I suggest that you actually have polymorphic objects that do extractions from the DOM.
It's not clear how type gets set, but it sure looks like it might be a family of related objects, not a simple string.
class ExtractTitle(object):
    def process(self, dom):
        return something

class ExtractMetaTags(object):
    def process(self, dom):
        return something
Instead of setting type="extractTitle", you'd do this.
type = ExtractTitle()  # or ExtractMetaTags() or ExtractWhatever()
type.process(dom)
Then, you wouldn't be building this particular dictionary or if-statement.
