I am trying to optimize a framework built on a stack of modifiers that operate on lists read from CSV files. Each modifier uses the header list so that it can refer to columns by name.
CSV example (including header):
date;place
13/02/2013;New York
15/04/2012;Buenos Aires
29/10/2010;Singapour
I have written some code based on namedtuple so that I can use the lists produced by the csv module without reorganizing the data every time. The generated code is below:
class MyNamedList(object):
    __slots__ = ("__values")
    _fields = ['date', 'ignore', 'place']
    def __init__(self, values):
        self.__values = values
        if len(self.__values) <= 151:
            for i in range(len(self.__values), 151):
                self.__values += [None,]
    @property
    def date(self):
        return self.__values[0]
    @date.setter
    def date(self, val):
        self.__values[0] = val
    @property
    def ignore(self):
        return self.__values[150]
    @ignore.setter
    def ignore(self, val):
        self.__values[150] = val
    @property
    def place(self):
        return self.__values[1]
    @place.setter
    def place(self, val):
        self.__values[1] = val
I must say I am very disappointed with the performance of this class. Calling a simple modifier function (one that sets "ignore" to True 100 times; yes, I know it is useless) for each line of a 70000-line CSV file takes 9 seconds with PyPy and 5.5 seconds with the standard Python interpreter, whereas equivalent code using a plain list named foo takes 1.1 seconds (the same with both interpreters).
Is there anything I could do to get comparable performance from both approaches? It seems to me that record.ignore = True could be inlined (or something close to it) and so translated into record[150] = True. Is there some blocking point I am missing that prevents this?
Note that the record I am modifying is (for now) not created for each line of the CSV file, so padding the list with extra items happens only once, before the iteration.
Update: sample code
--> Using namedlist
import namedlist
MyNamedList=namedlist.namedlist("MyNamedList", {"a":1, "b":2, "ignore":150})
test = MyNamedList([0,1])
def foo(a):
    test.ignore = True # x100 times
import csv
stream = csv.reader(open("66666.csv", "rb"))
for i in stream:
    foo(i)
--> Not using namedlist
import namedlist
import csv
MyNamedList=namedlist.namedlist("MyNamedList", {"a":1, "b":2, "ignore":150})
test = MyNamedList([0,1])
sample_data = []
for i in range(len(sample_data), 151):
    sample_data += [None,]
def foo(a):
    sample_data[150] = True # x100 times
stream = csv.reader(open("66666.csv", "rb"))
for i in stream:
    foo(i)
Update #2: code for namedlist.py (heavily based on namedtuple.py)
# Retrieved from http://code.activestate.com/recipes/500261/
# Licensed under the PSF license
from keyword import iskeyword as _iskeyword
import sys as _sys
def namedlist(typename, field_indices, verbose=False, rename=False):
    # Parse and validate the field names. Validation serves two purposes,
    # generating informative error messages and preventing template injection attacks.
    field_names = field_indices.keys()
    for name in [typename,] + field_names:
        if not min(c.isalnum() or c=='_' for c in name):
            raise ValueError('Type names and field names can only contain alphanumeric characters and underscores: %r' % name)
        if _iskeyword(name):
            raise ValueError('Type names and field names cannot be a keyword: %r' % name)
        if name[0].isdigit():
            raise ValueError('Type names and field names cannot start with a number: %r' % name)
    seen_names = set()
    for name in field_names:
        if name.startswith('_') and not rename:
            raise ValueError('Field names cannot start with an underscore: %r' % name)
        if name in seen_names:
            raise ValueError('Encountered duplicate field name: %r' % name)
        seen_names.add(name)
    # Create and fill-in the class template
    numfields = len(field_names)
    argtxt = repr(field_names).replace("'", "")[1:-1] # tuple repr without parens or quotes
    reprtxt = ', '.join('%s=%%r' % name for name in field_names)
    max_index=-1
    for name in field_names:
        index = field_indices[name]
        if max_index < index:
            max_index = index
    max_index += 1
    template = '''class %(typename)s(object):
        __slots__ = ("__values") \n
        _fields = %(field_names)r \n
        def __init__(self, values):
            self.__values = values
            if len(self.__values) <= %(max_index)s:
                for i in range(len(self.__values), %(max_index)s):
                    self.__values += [None,]'''% locals()
    for name in field_names:
        index = field_indices[name]
        template += ''' \n
        @property
        def %s(self):
            return self.__values[%d]
        @%s.setter
        def %s(self, val):
            self.__values[%d] = val''' % (name, index, name, name, index)
    if verbose:
        print template
    # Execute the template string in a temporary namespace
    namespace = {'__name__':'namedtuple_%s' % typename,
                 '_property':property, '_tuple':tuple}
    try:
        exec template in namespace
    except SyntaxError, e:
        raise SyntaxError(e.message + ':\n' + template)
    result = namespace[typename]
    # For pickling to work, the __module__ variable needs to be set to the frame
    # where the named tuple is created. Bypass this step in environments where
    # sys._getframe is not defined (Jython for example) or sys._getframe is not
    # defined for arguments greater than 0 (IronPython).
    try:
        result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
    except (AttributeError, ValueError):
        pass
    return result
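For comparison, the same kind of list-backed class can be built without exec. This is only a sketch under assumptions: make_named_list and its arguments are hypothetical names that are not part of namedlist.py, the backing attribute is called _values here, and nothing below claims to be faster under PyPy or the standard interpreter.

```python
def make_named_list(typename, field_indices):
    """Build a class whose properties read and write slots of a backing list."""
    size = max(field_indices.values()) + 1

    def __init__(self, values):
        # Pad the caller's list in place so every mapped index exists.
        values.extend([None] * (size - len(values)))
        self._values = values

    namespace = {
        "__slots__": ("_values",),
        "_fields": sorted(field_indices),
        "__init__": __init__,
    }
    for name, index in field_indices.items():
        # Default arguments freeze the index for each generated accessor.
        def getter(self, _i=index):
            return self._values[_i]

        def setter(self, val, _i=index):
            self._values[_i] = val

        namespace[name] = property(getter, setter)
    return type(typename, (object,), namespace)


Record = make_named_list("Record", {"date": 0, "place": 1, "ignore": 150})
row = ["13/02/2013", "New York"]
rec = Record(row)
rec.ignore = True  # writes straight through to row[150]
```

Because the record wraps the list it is given rather than copying it, a change made through a property is visible in the original row as well.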
hasattr(obj, attribute) checks whether an object has the specified attribute, but given only an attribute name, is there a way to find out where (in which classes) it is defined?
Assume that my code receives the name of an attribute (or a classmethod) as a string and I want to invoke classname.attribute, but I don't have the class name.
One solution that comes to mind is this:
def finder(attr):
    for obj in globals():
        try:
            if globals()[obj].__dict__[attr]:
                return(globals()[obj])
        except:
            ...
usage:
class Lime(object):
    @classmethod
    def lfunc(self):
        print('Classic')
getattr(finder('lfunc'),'lfunc')() #Runs lfunc method of Lime class
I am quite sure that this is not the best (or even a proper) way to do it. Can someone please suggest a better way?
It is always "possible". Whether it is desirable is another story.
A quick and dirty way to do it is to iterate linearly over all classes and check if any define the attribute you have. Of course, that is subject to conflicts, and it will yield the first class that has such a named attribute. If it exists in more than one, it is up to you to decide which you want:
def finder(attr):
    for cls in object.__subclasses__():
        if hasattr(cls, attr):
            return cls
    raise ValueError
Instead of searching in "globals" this searches all subclasses of "object" - thus the classes to be found don't need to be in the namespace of the module where the finder function is.
If your methods are unique within the set of classes you are searching, though, maybe you could just assemble a mapping of all methods and use it to call them instead.
Let's suppose all your classes inherit from a class named "Base":
mapper = {attr_name: getattr(cls, attr_name) for cls in Base.__subclasses__() for attr_name, obj in cls.__dict__.items()
          if isinstance(obj, classmethod)}
And you call them with mapper['attrname']()
This avoids a linear search at each method call and thus would be much better.
- EDIT -
__subclasses__ only finds the direct subclasses of a class, not the whole inheritance tree, so it won't be useful in "real life"; maybe it is in the specific case the OP has on their hands.
If one needs to find things across an inheritance tree, one needs to recurse over each subclass as well.
As for old-style classes: of course this won't work; that is one of the reasons they are considered broken and should not be used in new code.
As for non-class attributes: they can only be found inspecting instances anyway - so another method has to be thought of - does not seem to be the concern of the O.P. here.
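Recursing over each subclass, as mentioned in the edit above, could look roughly like this. It is a sketch, not a tested library: all_subclasses and find_definers are names made up for the example, and Base, Fruit and Lime are a toy hierarchy standing in for the real classes.

```python
def all_subclasses(cls):
    """Yield every direct and indirect subclass of cls exactly once."""
    seen = set()
    stack = list(cls.__subclasses__())
    while stack:
        sub = stack.pop()
        if sub in seen:
            continue
        seen.add(sub)
        yield sub
        # Descend one more level of the inheritance tree.
        stack.extend(sub.__subclasses__())


def find_definers(base, attr):
    """Return the subclasses of base whose own __dict__ defines attr."""
    return [c for c in all_subclasses(base) if attr in vars(c)]


class Base(object):
    pass


class Fruit(Base):
    pass


class Lime(Fruit):
    @classmethod
    def lfunc(cls):
        return 'Classic'


definers = find_definers(Base, 'lfunc')
result = getattr(definers[0], 'lfunc')()
```

Checking vars(c) rather than hasattr means a class is only reported when it defines the attribute itself, not when it merely inherits it.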
This might help:
import gc
def checker(checkee, maxdepth = 3):
    def onlyDict(ls):
        return filter(lambda x: isinstance(x, dict), ls)
    collection = []
    toBeInspected = {}
    tBI = toBeInspected
    gc.collect()
    for dic in onlyDict(gc.get_referrers(checkee)):
        for item, value in dic.iteritems():
            if value is checkee:
                collection.append(item)
            elif item != "checker":
                tBI[item] = value
    def _auxChecker(checkee, path, collection, checked, current, depth):
        if current in checked: return
        checked.append(current)
        gc.collect()
        for dic in onlyDict(gc.get_referents(current)):
            for item, value in dic.iteritems():
                currentPath = path + "." + item
                if value is checkee:
                    collection.append(currentPath)
                else:
                    try:
                        # only descend while below the depth limit
                        if depth < maxdepth:
                            _auxChecker(checkee, currentPath, collection,
                                        checked, value, depth + 1)
                    except TypeError:
                        continue
    checked = []
    for item, value in tBI.iteritems():
        _auxChecker(checkee, item, collection, checked, value, 1)
    return collection
How to use:
referrer = []
class Foo:
    pass
noo = Foo()
bar = noo
import xml
import libxml2
import sys
import os
op = os.path
xml.foo = bar
foobar = noo
for x in checker(foobar, 5):
    try:
        y= eval(x)
        referrer.append(x)
    except:
        continue
del x, y
ps: attributes of the checkee will not be further checked, for recursive or nested references to the checkee itself.
This should work in all circumstances, but still needs a lot of testing:
import inspect
import sys
def finder(attr, classes=None):
    result = []
    if classes is None:
        # get all accessible classes
        classes = [obj for name, obj in inspect.getmembers(
            sys.modules[__name__])]
    for a_class in classes:
        if inspect.isclass(a_class):
            if hasattr(a_class, attr):
                result.append(a_class)
            else:
                # we check for instance attributes
                if hasattr(a_class(), attr):
                    result.append(a_class)
            try:
                result += finder(attr, a_class.__subclasses__())
            except:
                # old style classes (that don't inherit from object) do not
                # have __subclasses__; not the best solution though
                pass
    return list(set(result)) # workaround duplicates
def main(attr):
    print finder(attr)
    return 0
if __name__ == "__main__":
    sys.exit(main("some_attr"))
I have created a function that takes a value, does some calculations and returns the different answers as an object. However, when I try to parallelize the code using pp, I get the following error:
  File "trmm.py", line 8, in __getattr__
    return self.header_array[name]
RuntimeError: maximum recursion depth exceeded while calling a Python object
Here is a simple version of what I am trying to do.
class DataObject(object):
    """
    Class to handle data objects with several arrays.
    """
    def __getattr__(self, name):
        try:
            return self.header_array[name]
        except KeyError:
            try:
                return self.line[name]
            except KeyError:
                raise AttributeError("%s instance has no attribute '%s'" %(self.__class__.__name__, name))
    def __setattr__(self, name, value):
        if name in ('header_array', 'line'):
            object.__setattr__(self, name, value)
        elif name in self.line:
            self.line[name] = value
        else:
            self.header_array[name] = value
class TrmmObject(DataObject):
    def __init__(self):
        DataObject.__init__(self)
        self.header_array = {
            'header': None
        }
        self.line = {
            'longitude': None,
            'latitude': None
        }
if __name__ == '__main__':
    import pp
    ppservers = ()
    job_server = pp.Server(2, ppservers=ppservers)
    def get_monthly_values(value):
        tplObj = TrmmObject()
        tplObj.longitude = value
        tplObj.latitude = value * 2
        return tplObj
    job1 = job_server.submit(get_monthly_values, (5,), (DataObject,TrmmObject,),("numpy",))
    result = job1()
If I change return tplObj to return [tplObj.longitude, tplObj.latitude] there is no problem. However, as I said before this is a simple version, in reality this change would complicate the program a lot.
I am very grateful for any help.
You almost never need to use __getattr__ and __setattr__, and it almost always ends with something blowing up; infinite recursion is a typical effect of that. I can't really see any reason for using them here either. Be explicit and use the line and header_array dictionaries directly.
If you want a function that looks up a value over all arrays, create a function for that and call it explicitly. Calling the function __getitem__ and using [] is explicit. :-)
(And please don't call a dictionary "header_array", it's confusing).
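A minimal sketch of that explicit approach follows. The assumptions are mine, not from the question: the two dictionaries are created in __init__, header_array is renamed to header, and copy.copy stands in for the pickling that pp does (as far as I know, both rebuild the object without calling __init__, which is the situation where a __getattr__ that touches self.header_array can recurse).

```python
import copy


class DataObject(object):
    """Holds named values in two plain dictionaries, looked up explicitly."""

    def __init__(self):
        self.header = {'header': None}
        self.line = {'longitude': None, 'latitude': None}

    def __getitem__(self, name):
        # Search each dictionary in turn; unknown names raise KeyError.
        for store in (self.header, self.line):
            if name in store:
                return store[name]
        raise KeyError(name)

    def __setitem__(self, name, value):
        # Known line fields go to self.line, everything else to self.header.
        if name in self.line:
            self.line[name] = value
        else:
            self.header[name] = value


obj = DataObject()
obj['longitude'] = 5
obj['latitude'] = 10
clone = copy.copy(obj)  # rebuilt without __init__, no attribute hooks involved
```

Since ordinary attribute access is left alone, looking up header or line on a half-built instance fails with a plain AttributeError instead of re-entering a custom hook.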
I've been working on a way to get tests produced from a generator in nose to have descriptions that are customized for the specific iteration being tested. I have something that works, as long as my generator target method never tries to access self from my generator class. I'm seeing that all my generator target instances have a common test class instance while nose is generating a one-offed instance of the test class for each test run from the generator. This is resulting in setUp being run on each test instance nose creates, but never running on the instance the generator target is bound to (of course, the real problem is that I can't see how to bind the nose-created instance to the generator target). Here's the code I'm using to try to figure this all out (yes, I know the decorator would probably be better as a callable class, but nose, at least version 1.2.1 that I have, explicitly checks that tests are either functions or methods, so a callable class won't run at all):
import inspect
def labelable_yielded_case(case):
    argspec = inspect.getargspec(case)
    if argspec.defaults is not None:
        # argspec.defaults is a tuple, so convert it before concatenating
        defaults_list = [''] * (len(argspec.args) - len(argspec.defaults)) + list(argspec.defaults)
    else:
        defaults_list = [''] * len(argspec.args)
    argument_defaults_list = zip(argspec.args, defaults_list)
    case_wrappers = []
    def add_description(wrapper_id, argument_dict):
        case_wrappers[wrapper_id].description = case.__doc__.format(**argument_dict)
    def case_factory(*factory_args, **factory_kwargs):
        def case_wrapper_wrapper():
            wrapper_id = len(case_wrappers)
            def case_wrapper(*args, **kwargs):
                args = factory_args + args
                argument_list = []
                for argument in argument_defaults_list:
                    argument_list.append(list(argument))
                for index, value in enumerate(args):
                    argument_list[index][1] = value
                argument_dict = dict(argument_list)
                argument_dict.update(factory_kwargs)
                argument_dict.update(kwargs)
                add_description(wrapper_id, argument_dict)
                return case(*args, **kwargs)
            case_wrappers.append(case_wrapper)
            case_wrapper.__name__ = case.__name__
            return case_wrapper
        return case_wrapper_wrapper()
    return case_factory
class TestTest(object):
    def __init__(self):
        self.data = None
    def setUp(self):
        print 'setup', self
        self.data = (1,2,3)
    def test_all(self):
        for index, value in enumerate((1,2,3)):
            yield self.validate_equality(), index, value
    def test_all_again(self):
        for index, value in enumerate((1,2,3)):
            yield self.validate_equality_again, index, value
    @labelable_yielded_case
    def validate_equality(self, index, value):
        '''element {index} equals {value}'''
        print 'test', self
        assert self.data[index] == value, 'expected %d got %d' % (value, self.data[index])
    def validate_equality_again(self, index, value):
        print 'test', self
        assert self.data[index] == value, 'expected %d got %d' % (value, self.data[index])
    validate_equality_again.description = 'again'
When run through nose, the again tests work just fine, but the set of tests using the decorated generator target all fail because self.data is None (because setUp is never run because the instance of TestTest stored in the closures is not the instances run by nose). I tried making the decorator an instance member of a base class for TestTest, but then nose threw errors about having too few arguments (no self) passed to the unbound labelable_yielded_case. Is there any way I can make this work (short of hacking nose), or am I stuck choosing between either not being able to have the yield target be an instance member or not having per-test labeling for each yielded test?
Fixed it (at least for the case here, though I think I got it for all cases). I had to fiddle with case_wrapper_wrapper and case_wrapper to get the factory to return the wrapped cases attached to the correct class, but not bound to any given instance in any way. I also had another code issue because I was building the argument dict in wrapper wrapper, but then not passing it to the case. Working code:
import inspect
def labelable_yielded_case(case):
    argspec = inspect.getargspec(case)
    if argspec.defaults is not None:
        # argspec.defaults is a tuple, so convert it before concatenating
        defaults_list = [''] * (len(argspec.args) - len(argspec.defaults)) + list(argspec.defaults)
    else:
        defaults_list = [''] * len(argspec.args)
    argument_defaults_list = zip(argspec.args, defaults_list)
    case_wrappers = []
    def add_description(wrapper_id, argument_dict):
        case_wrappers[wrapper_id].description = case.__doc__.format(**argument_dict)
    def case_factory(*factory_args, **factory_kwargs):
        def case_wrapper_wrapper():
            wrapper_id = len(case_wrappers)
            def case_wrapper(*args, **kwargs):
                argument_list = []
                for argument in argument_defaults_list:
                    argument_list.append(list(argument))
                for index, value in enumerate(args):
                    argument_list[index][1] = value
                argument_dict = dict(argument_list)
                argument_dict.update(kwargs)
                add_description(wrapper_id, argument_dict)
                return case(**argument_dict)
            case_wrappers.append(case_wrapper)
            case_name = case.__name__ + str(wrapper_id)
            case_wrapper.__name__ = case_name
            if factory_args:
                # attach the wrapper to the class, not to any one instance
                setattr(factory_args[0].__class__, case_name, case_wrapper)
                return getattr(factory_args[0].__class__, case_name)
            else:
                return case_wrapper
        return case_wrapper_wrapper()
    return case_factory
class TestTest(object):
    def __init__(self):
        self.data = None
    def setUp(self):
        self.data = (1,2,3)
    def test_all(self):
        for index, value in enumerate((1,2,3)):
            yield self.validate_equality(), index, value
    @labelable_yielded_case
    def validate_equality(self, index, value):
        '''element {index} equals {value}'''
        assert self.data[index] == value, 'expected %d got %d' % (value, self.data[index])
This has come up several times recently and I'd like to deal with it better than I have been: I have a series of attributes that I'm cross referencing between an object and a dictionary. If the value is different between them, I want to set the object.attribute to the dictionary['attribute'] value. I also want to keep track of what's getting changed.
Now, my first thought is to just use an if else statement for every attribute, but after writing a few of these it's apparent that I'm re-writing the same code again and again. There has to be a DRY way to do this, where I specify only the parts that are changing every time, and then loop through all the attributes.
In production code, there are 15 different attributes, but my example below will just use 2 for simplicity. I have some idea about how to do this in a clever way, but I'm missing the final step of actually setting the object.attribute equal to the dictionary['attribute'] value.
# Simulated data setup - not under my control IRL
class someClass:
    def __init__(self, name, version):
        self.name = name
        self.version = version
objA = someClass('Test1','1.1')
dictA = {'name':'Test1','revision':'1.2'}
# My code below
# option 1 - a series of if/else statements
def updateAttributesSimple(obj, adict, msg):
    if obj.name == adict['name']:
        msg.append('Name is the same')
    else:
        msg.append('Name was updated from %s to %s' % (obj.name, adict['name']))
        obj.name = adict['name']
    if obj.version == adict['revision']:
        msg.append('Version is the same')
    else:
        msg.append('Version was updated from %s to %s' % (obj.version, adict['revision']))
        obj.version = adict['revision']
# option 2 - trying to be clever about this
def updateAttributesClever(obj, adict, msg):
    attributeList = (('Name', obj.name, adict['name']),
                     ('Version', obj.version, adict['revision']))
    for valTuple in attributeList:
        if valTuple[1] == valTuple[2]:
            msg.append('%s is the same' % (valTuple[0]))
        else:
            msg.append('%s was updated from %s to %s' % (valTuple[0], valTuple[1], valTuple[2]))
            # code to set valTuple[1] = valTuple[2] goes here, but what is it?
            # valTuple[1] = valTuple[2] attempts to set the desired value to a string, rather than the attribute of obj itself
msg = ['Updating Attributes simple way:']
updateAttributesSimple(objA, dictA, msg)
print '\n\t'.join(msg)
#reset data
objA = someClass('Test1','1.1')
dictA = {'name':'Test1','revision':'1.2'}
msg = ['Updating Attributes clever way:']
updateAttributesClever(objA, dictA, msg)
print '\n\t'.join(msg)
The idea being that this way, whenever I need to add another attribute, I can just update the list of attributes being inspected and the rest of the code is already written. What's the Pythonic way to accomplish this?
setattr() is what you're looking for:
attributeList = (('Name', 'name', 'name'),
                 ('Version', 'version', 'revision'))
for title, obj_attribute, dict_key in attributeList:
    obj_value = getattr(obj, obj_attribute)
    adict_value = adict[dict_key]
    if obj_value == adict_value:
        msg.append('%s is the same' % (title))
    else:
        msg.append('%s was updated from %s to %s' % (title, obj_value, adict_value))
        setattr(obj, obj_attribute, adict_value)
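To try the loop above on its own, it can be wrapped in a function. This is just a sketch: SomeClass mirrors the class from the question, while ATTRIBUTES and update_attributes are names invented for the example, and the function returns the messages instead of appending to a list passed in.

```python
class SomeClass(object):
    def __init__(self, name, version):
        self.name = name
        self.version = version


# (title for messages, attribute name on the object, key in the dictionary)
ATTRIBUTES = (('Name', 'name', 'name'),
              ('Version', 'version', 'revision'))


def update_attributes(obj, adict, attributes=ATTRIBUTES):
    """Copy differing values from adict onto obj and return log messages."""
    msg = []
    for title, obj_attribute, dict_key in attributes:
        old, new = getattr(obj, obj_attribute), adict[dict_key]
        if old == new:
            msg.append('%s is the same' % title)
        else:
            msg.append('%s was updated from %s to %s' % (title, old, new))
            setattr(obj, obj_attribute, new)
    return msg


obj = SomeClass('Test1', '1.1')
log = update_attributes(obj, {'name': 'Test1', 'revision': '1.2'})
```

Adding another attribute then only means adding one more tuple to ATTRIBUTES.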
This should work for you:
class X(object):
    def __init__(self):
        self.a = 1
        self.b = 2
x = X()
d = dict()
d['a'] = 1
d['b'] = 3
def updateAttributes(obj,dic):
    def update(name):
        val = dic[name]
        if getattr(obj,name)==val:
            print name,"was equal"
        else:
            print "setting %s to %s" % (name,val)
            setattr(obj,name,val)
    for name in ['a','b']:
        update(name)
updateAttributes(x,d)
print x.a
print x.b
You might want to think about creating a function which can take an arbitrary object and convert the dictionary of name/value pairs into something more meaningful. It's not strictly a "Python" strategy but something that is fairly easy to do in Python because of its support of closures and how it treats objects under the hood:
def checkUpdates( obj ):
    def updated( dictionaryPrevious, msg ):
        for name, value in dictionaryPrevious.items():
            if( obj.__dict__[name] == value ):
                msg.append(name + ' is the same')
            else:
                msg.append(name + ' has been changed!')
                obj.__dict__[name] = value
    return updated
I am making one assumption, the names in the dictionary always correspond to the object variables. If they're not the same you'll need to make a mapping.
edit:
() => [] and object => obj. thanks guys. Sometimes you go from one language to a few others and it all gets muddled.
A couple of answers are close, but they need some way to handle the fact that the names of the keys in the dict don't match the names of the corresponding attributes on the object. This can easily be done by adding yet another dictionary that maps the names of keys in the dict to the names of the object's attributes.
class someClass:
    def __init__(self, name, version):
        self.name = name
        self.version = version
objA = someClass('Test1','1.1')
dictA = {'name':'Test1','revision':'1.2'}
keymap = {'name':'name', 'revision':'version'}
def updateAttributesGeneric(obj, adict, key2attr, msg):
    for key, value in adict.iteritems():
        attrname = key2attr[key]
        if getattr(obj, attrname) == value:
            msg.append('%s is the same' % attrname)
        else:
            msg.append('%s has been changed' % attrname)
            setattr(obj, attrname, adict[key])
msg = ['Updating Attributes:']
updateAttributesGeneric(objA, dictA, keymap, msg)
print '\n\t'.join(msg)
# Updating Attributes:
# name is the same
# version has been changed