SQLAlchemy versioning cares about class import order - python

I was following the guide here:
http://www.sqlalchemy.org/docs/orm/examples.html?highlight=versioning#versioned-objects
and have come across an issue. I have defined my relationships like:
generic_ticker = relation('MyClass', backref=backref("stuffs"))
with strings so it doesn't care about the import order of my model modules. This all works fine normally, but when I use the versioning meta I get the following error:
sqlalchemy.exc.InvalidRequestError: When initializing mapper Mapper|MyClass|stuffs, expression 'MyClass' failed to locate a name ("name 'MyClass' is not defined"). If this is a class name, consider adding this relationship() to the class after both dependent classes have been defined.
I tracked down the error to:
File "/home/nick/workspace/gm3/gm3/lib/history_meta.py", line 90, in __init__
mapper = class_mapper(cls)
File "/home/nick/venv/tg2env/lib/python2.6/site-packages/sqlalchemy/orm/util.py", line 622, in class_mapper
mapper = mapper.compile()
class VersionedMeta(DeclarativeMeta):
    def __init__(cls, classname, bases, dict_):
        DeclarativeMeta.__init__(cls, classname, bases, dict_)

        try:
            mapper = class_mapper(cls)
            _history_mapper(mapper)
        except UnmappedClassError:
            pass
I fixed the problem by putting the try/except stuff in a lambda and running them all after all the imports have happened. This works, but it seems a bit rubbish; any ideas on how to fix this in a better way?
Thanks!
Update
The problem is not actually about import order. The versioning example is designed such that the mapper requires compilation in the constructor of each versioned class, and compilation fails when related classes are not yet defined. In the case of circular relations there is no way to make it work by changing the definition order of the mapped classes.
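For illustration only (hypothetical classes, not the real models), here are two mapped classes that refer to each other by string name; whichever one is defined first has its mapper compiled by the metaclass while the other class is still missing, so no import order can succeed:
# purely illustrative sketch; Base is assumed to be a declarative base built with VersionedMeta
from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.orm import relation, backref

class Stuff(Base):
    __tablename__ = 'stuffs'
    id = Column(Integer, primary_key=True)
    owner_id = Column(Integer, ForeignKey('myclass.id'))
    owner = relation('MyClass', backref=backref('stuffs'))

class MyClass(Base):
    __tablename__ = 'myclass'
    id = Column(Integer, primary_key=True)
    generic_ticker_id = Column(Integer, ForeignKey('stuffs.id'))
    generic_ticker = relation('Stuff')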
Update 2
As the above update states (I didn't know you could edit other people's posts on here :)), this is likely due to circular references. In that case maybe someone will find my hack useful (I'm using it with TurboGears). Replace VersionedMeta and add a create_mappers global in history_meta:
create_mappers = []


class VersionedMeta(DeclarativeMeta):
    def __init__(cls, classname, bases, dict_):
        DeclarativeMeta.__init__(cls, classname, bases, dict_)

        # I added this code in as it was crashing otherwise
        def make_mapper():
            try:
                mapper = class_mapper(cls)
                _history_mapper(mapper)
            except UnmappedClassError:
                pass

        create_mappers.append(lambda: make_mapper())
Then you can do something like the following in your model package's __init__.py:
# Import your model modules here.
from myproj.lib.history_meta import create_mappers
from myproj.model.misc import *
from myproj.model.actor import *
from myproj.model.stuff1 import *
from myproj.model.instrument import *
from myproj.model.stuff import *
#setup the history
[func() for func in create_mappers]
That way it creates the mappers only after all the classes have been defined.
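For completeness, a minimal sketch (assumed module and model names, not from the actual project) of a model declared against a declarative base built with this metaclass; its history mapper is only built once the callables in create_mappers are run from the model package's __init__.py:
# minimal sketch; 'myproj.lib.history_meta' and the MyClass model are assumed names
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from myproj.lib.history_meta import VersionedMeta

Base = declarative_base(metaclass=VersionedMeta)

class MyClass(Base):
    __tablename__ = 'myclass'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))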
Update 3
Slightly unrelated, but I came across a duplicate primary key error in some circumstances (committing 2 changes to the same object in one go). My workaround has been to add a new auto-incrementing primary key. Of course you can't have more than one auto-increment column with MySQL, so I had to de-primary-key the existing columns used to create the history table. Check out my overall code (including a hist_id and getting rid of the foreign key constraint):
"""Stolen from the offical sqlalchemy recpies
"""
from sqlalchemy.ext.declarative import DeclarativeMeta
from sqlalchemy.orm import mapper, class_mapper, attributes, object_mapper
from sqlalchemy.orm.exc import UnmappedClassError, UnmappedColumnError
from sqlalchemy import Table, Column, ForeignKeyConstraint, Integer
from sqlalchemy.orm.interfaces import SessionExtension
from sqlalchemy.orm.properties import RelationshipProperty
from sqlalchemy.types import DateTime
import datetime
from sqlalchemy.orm.session import Session
def col_references_table(col, table):
for fk in col.foreign_keys:
if fk.references(table):
return True
return False
def _history_mapper(local_mapper):
cls = local_mapper.class_
# set the "active_history" flag
# on on column-mapped attributes so that the old version
# of the info is always loaded (currently sets it on all attributes)
for prop in local_mapper.iterate_properties:
getattr(local_mapper.class_, prop.key).impl.active_history = True
super_mapper = local_mapper.inherits
super_history_mapper = getattr(cls, '__history_mapper__', None)
polymorphic_on = None
super_fks = []
if not super_mapper or local_mapper.local_table is not super_mapper.local_table:
cols = []
for column in local_mapper.local_table.c:
if column.name == 'version':
continue
col = column.copy()
col.unique = False
#don't auto increment stuff from the normal db
if col.autoincrement:
col.autoincrement = False
#sqllite falls over with auto incrementing keys if we have a composite key
if col.primary_key:
col.primary_key = False
if super_mapper and col_references_table(column, super_mapper.local_table):
super_fks.append((col.key, list(super_history_mapper.base_mapper.local_table.primary_key)[0]))
cols.append(col)
if column is local_mapper.polymorphic_on:
polymorphic_on = col
#if super_mapper:
# super_fks.append(('version', super_history_mapper.base_mapper.local_table.c.version))
cols.append(Column('hist_id', Integer, primary_key=True, autoincrement=True))
cols.append(Column('version', Integer))
cols.append(Column('changed', DateTime, default=datetime.datetime.now))
if super_fks:
cols.append(ForeignKeyConstraint(*zip(*super_fks)))
table = Table(local_mapper.local_table.name + '_history', local_mapper.local_table.metadata,
*cols, mysql_engine='InnoDB')
else:
# single table inheritance. take any additional columns that may have
# been added and add them to the history table.
for column in local_mapper.local_table.c:
if column.key not in super_history_mapper.local_table.c:
col = column.copy()
super_history_mapper.local_table.append_column(col)
table = None
if super_history_mapper:
bases = (super_history_mapper.class_,)
else:
bases = local_mapper.base_mapper.class_.__bases__
versioned_cls = type.__new__(type, "%sHistory" % cls.__name__, bases, {})
m = mapper(
versioned_cls,
table,
inherits=super_history_mapper,
polymorphic_on=polymorphic_on,
polymorphic_identity=local_mapper.polymorphic_identity
)
cls.__history_mapper__ = m
if not super_history_mapper:
cls.version = Column('version', Integer, default=1, nullable=False)
create_mappers = []
class VersionedMeta(DeclarativeMeta):
def __init__(cls, classname, bases, dict_):
DeclarativeMeta.__init__(cls, classname, bases, dict_)
#I added this code in as it was crashing otherwise
def make_mapper():
try:
mapper = class_mapper(cls)
_history_mapper(mapper)
except UnmappedClassError:
pass
create_mappers.append(lambda: make_mapper())
def versioned_objects(iter):
for obj in iter:
if hasattr(obj, '__history_mapper__'):
yield obj
def create_version(obj, session, deleted = False):
obj_mapper = object_mapper(obj)
history_mapper = obj.__history_mapper__
history_cls = history_mapper.class_
obj_state = attributes.instance_state(obj)
attr = {}
obj_changed = False
for om, hm in zip(obj_mapper.iterate_to_root(), history_mapper.iterate_to_root()):
if hm.single:
continue
for hist_col in hm.local_table.c:
if hist_col.key == 'version' or hist_col.key == 'changed' or hist_col.key == 'hist_id':
continue
obj_col = om.local_table.c[hist_col.key]
# get the value of the
# attribute based on the MapperProperty related to the
# mapped column. this will allow usage of MapperProperties
# that have a different keyname than that of the mapped column.
try:
prop = obj_mapper.get_property_by_column(obj_col)
except UnmappedColumnError:
# in the case of single table inheritance, there may be
# columns on the mapped table intended for the subclass only.
# the "unmapped" status of the subclass column on the
# base class is a feature of the declarative module as of sqla 0.5.2.
continue
# expired object attributes and also deferred cols might not be in the
# dict. force it to load no matter what by using getattr().
if prop.key not in obj_state.dict:
getattr(obj, prop.key)
a, u, d = attributes.get_history(obj, prop.key)
if d:
attr[hist_col.key] = d[0]
obj_changed = True
elif u:
attr[hist_col.key] = u[0]
else:
# if the attribute had no value.
attr[hist_col.key] = a[0]
obj_changed = True
if not obj_changed:
# not changed, but we have relationships. OK
# check those too
for prop in obj_mapper.iterate_properties:
if isinstance(prop, RelationshipProperty) and \
attributes.get_history(obj, prop.key).has_changes():
obj_changed = True
break
if not obj_changed and not deleted:
return
attr['version'] = obj.version
hist = history_cls()
for key, value in attr.iteritems():
setattr(hist, key, value)
obj.version += 1
session.add(hist)
class VersionedListener(SessionExtension):
def before_flush(self, session, flush_context, instances):
for obj in versioned_objects(session.dirty):
create_version(obj, session)
for obj in versioned_objects(session.deleted):
create_version(obj, session, deleted = True)
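For reference, a sketch (assumed engine URL and names, not part of the code above) of how the listener can be attached to a session factory, since SessionExtension hooks are passed to sessionmaker():
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///history_demo.db')  # assumed engine URL
# the extension makes before_flush() write a history row for every dirty/deleted versioned object
Session = sessionmaker(bind=engine, extension=VersionedListener())
session = Session()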

I fixed the problem by putting the try: except stuff in a lambda and
running them all after all the imports have happened.
Great!

Related

ZODB transactions for nested objects not working

I know that there is little development on ZODB these days, but it might be useful for someone still using ZODB in 2022, or there might be some obvious thing I'm missing:
When trying to store changes to persistent objects inside a ZODB.DB.transaction with-block, they are not stored and no error is raised, while doing the same between transaction.begin() and transaction.commit() calls does work.
That is, the only way to currently use a with-block is to change objects directly through conn.root(), which means all persistent objects that want to store changes on themselves must know the full path from root to themselves, which is impractical.
There is also another weird behavior: after storing an object for the first time, retrieving it returns the same object, while the second call and onwards return a different object. This trips up tests that check whether something was stored successfully, as it only happens once.
The following code tries to store attributes in a two-level persistent hierarchy (simplified dev code):
import ZODB
import ZODB.FileStorage
from persistent.mapping import PersistentMapping
import transaction

store = ZODB.FileStorage.FileStorage("temp1.db")
db = ZODB.DB(store)


def get_init(name, obj):
    with db.transaction(f"creating root[{name}]") as conn:
        try:
            return conn.root()[name]
        except KeyError:
            conn.root()[name] = obj()
            return conn.root()[name]


class A:
    def __init__(self):
        self.cfg = PersistentMapping()

    def __setitem__(self, key, value) -> None:
        transaction.begin()
        self.cfg[key + ", inside block"] = value
        transaction.commit()
        with db.transaction():
            self.cfg[key + ", inside with"] = value  # does not work
        # these should be equivalent, no?

    def __iter__(self):
        return iter(self.cfg)


class Manager:
    def __init__(self):
        self.a1 = get_init("testing", PersistentMapping)  # set up the db, should only happen once

    def __setitem__(self, name, obj) -> None:
        """Registers in persistent storage"""
        with db.transaction(f"Adding testing:{name}") as conn:
            if name in conn.root()["testing"]:
                print(f"testing with same name {name} already exists in storage")
                return
            conn.root()["testing"][name] = obj

    def __getitem__(self, name: str):
        return db.open().root()["testing"][name]


dm = Manager()
initial = A()  # only relevant for first run
dm['a'] = initial  # only relevant for first run
fromdb1 = dm['a']
fromdb2 = dm['a']

with db.transaction() as conn:
    fromdb1.cfg['updated from outer txn, directly'] = 1  # does not work
    conn.root()['testing']['a'].cfg['updated from outer txn, through conn'] = 1
    # these should be equivalent but only the second one works

initial['new txn updated on initial'] = 1
fromdb1['new txn updated on retrieved 1'] = 1
fromdb2['new txn updated on retrieved 2'] = 1

print(f"initial obj - {initial.cfg}")
print(f"from db obj 1 - {fromdb1.cfg}")
print(f"from db obj 2 - {fromdb2.cfg}")
print(f"\nnew from db obj - {dm['a'].cfg}")
print(f"\nis the initial obj and the first obj from db the same: {initial is fromdb1}")
print(f"is the initial obj and the second obj from db the same: {initial is fromdb2}")
Unless I'm missing something, the expected result is for all of those methods to work.
Any advice from people using ZODB?

Not able to handle "on_conflict_do_nothing" in SQLAlchemy with MySQL [duplicate]

Is there an elegant way to do an INSERT ... ON DUPLICATE KEY UPDATE in SQLAlchemy? I mean something with a syntax similar to inserter.insert().execute(list_of_dictionaries) ?
ON DUPLICATE KEY UPDATE post version-1.2 for MySQL
This functionality is now built into SQLAlchemy for MySQL only. somada141's answer below has the best solution:
https://stackoverflow.com/a/48373874/319066
ON DUPLICATE KEY UPDATE in the SQL statement
If you want the generated SQL to actually include ON DUPLICATE KEY UPDATE, the simplest way involves using a @compiles decorator.
The code (linked from a good thread on the subject on reddit) for an example can be found on github:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert


@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if 'append_string' in insert.kwargs:
        return s + " " + insert.kwargs['append_string']
    return s


my_connection.execute(my_table.insert(append_string='ON DUPLICATE KEY UPDATE foo=foo'), my_values)
But note that with this approach you have to manually create the append_string. You could probably change the append_string function so that it automatically turns the insert string into an insert with an 'ON DUPLICATE KEY UPDATE' clause, but I'm not going to do that here due to laziness.
ON DUPLICATE KEY UPDATE functionality within the ORM
SQLAlchemy does not provide an interface to ON DUPLICATE KEY UPDATE or MERGE or any other similar functionality in its ORM layer. Nevertheless, it has the session.merge() function that can replicate the functionality only if the key in question is a primary key.
session.merge(ModelObject) first checks if a row with the same primary key value exists by sending a SELECT query (or by looking it up locally). If it does, it sets a flag somewhere indicating that ModelObject is in the database already, and that SQLAlchemy should use an UPDATE query. Note that merge is quite a bit more complicated than this, but it replicates the functionality well with primary keys.
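For example, a minimal sketch (with an assumed User model keyed by id, not taken from any answer here):
# merge() SELECTs by primary key, then either UPDATEs the existing row or
# INSERTs a new one when the session is flushed.
merged = session.merge(User(id=42, name="new name"))
session.commit()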
But what if you want ON DUPLICATE KEY UPDATE functionality with a non-primary key (for example, another unique key)? Unfortunately, SQLAlchemy doesn't have any such function. Instead, you have to create something that resembles Django's get_or_create(). Another StackOverflow answer covers it, and I'll just paste a modified, working version of it here for convenience.
from sqlalchemy.sql.expression import ClauseElement  # needed for the isinstance check below


def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        params = dict((k, v) for k, v in kwargs.iteritems() if not isinstance(v, ClauseElement))
        if defaults:
            params.update(defaults)
        instance = model(**params)
        return instance
I should mention that ever since the v1.2 release, SQLAlchemy Core has had a built-in solution for the above, which can be seen here (copied snippet below):
from sqlalchemy.dialects.mysql import insert

insert_stmt = insert(my_table).values(
    id='some_existing_id',
    data='inserted value')

on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
    data=insert_stmt.inserted.data,
    status='U'
)

conn.execute(on_duplicate_key_stmt)
Based on phsource's answer, and for the specific use-case of using MySQL and completely overriding the data for the same key without performing a DELETE statement, one can use the following @compiles-decorated insert expression:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert


@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if insert.kwargs.get('on_duplicate_key_update'):
        fields = s[s.find("(") + 1:s.find(")")].replace(" ", "").split(",")
        generated_directive = ["{0}=VALUES({0})".format(field) for field in fields]
        return s + " ON DUPLICATE KEY UPDATE " + ",".join(generated_directive)
    return s
It depends on what you want. If you want to replace existing rows, then pass OR REPLACE in prefixes:
def bulk_insert(self, objects, table):
    # table: your table class; objects is a list of dictionaries [{col1: val1, col2: val2}]
    for counter, row in enumerate(objects):
        inserter = table.__table__.insert(prefixes=['OR IGNORE'], values=row)
        try:
            self.db.execute(inserter)
        except Exception as E:
            print E
        if counter % 100 == 0:
            self.db.commit()
    self.db.commit()
The commit interval here can be changed to speed things up or slow them down.
My way
import typing
from datetime import datetime

from sqlalchemy.dialects import mysql

# MySqlAlchemyModel and self.db come from the answer author's own project


class MyRepository:
    def model(self):
        return MySqlAlchemyModel

    def upsert(self, data: typing.List[typing.Dict]):
        if not data:
            return

        model = self.model()
        if hasattr(model, 'created_at'):
            for item in data:
                item['created_at'] = datetime.now()

        stmt = mysql.insert(getattr(model, '__table__')).values(data)

        for_update = []
        for k, v in data[0].items():
            for_update.append(k)

        dup = {k: getattr(stmt.inserted, k) for k in for_update}
        stmt = stmt.on_duplicate_key_update(**dup)

        self.db.session.execute(stmt)
        self.db.session.commit()
Usage:
myrepo.upsert([
    {
        "field11": "value11",
        "field21": "value21",
        "field31": "value31",
    },
    {
        "field12": "value12",
        "field22": "value22",
        "field32": "value32",
    },
])
The other answers have this covered, but I figured I'd reference another good example for MySQL that I found in this gist. This also includes the use of LAST_INSERT_ID, which may be useful depending on your InnoDB auto-increment settings and whether your table has a unique key. Lifting the code here for easy reference, but please give the author a star if you find it useful.
from app import db
from sqlalchemy import func
from sqlalchemy.dialects.mysql import insert


def upsert(model, insert_dict):
    """model can be a db.Model or a table(); insert_dict should contain a primary or unique key."""
    inserted = insert(model).values(**insert_dict)
    upserted = inserted.on_duplicate_key_update(
        id=func.LAST_INSERT_ID(model.id),
        **{k: inserted.inserted[k] for k, v in insert_dict.items()})
    res = db.engine.execute(upserted)
    return res.lastrowid
ORM
Use an upsert function based on on_duplicate_key_update:
from sqlalchemy.orm import Session
from sqlalchemy.dialects.mysql import insert

# note: engine, ORM_Base and idWorker come from the answer author's own project


class Model():
    __input_data__ = dict()

    def __init__(self, **kwargs) -> None:
        self.__input_data__ = kwargs
        self.session = Session(engine)

    def save(self):
        self.session.add(self)
        self.session.commit()

    def upsert(self, *, ignore_keys=[]):
        column_keys = self.__table__.columns.keys()
        update_data = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys:
                continue
            else:
                update_data[key] = self.__input_data__[key]

        insert_stmt = insert(self.__table__).values(**update_data)

        all_ignore_keys = ['id']
        if isinstance(ignore_keys, list):
            all_ignore_keys = [*all_ignore_keys, *ignore_keys]
        else:
            all_ignore_keys.append(ignore_keys)

        update_columns = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys or key in all_ignore_keys:
                continue
            else:
                update_columns[key] = insert_stmt.inserted[key]

        on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
            **update_columns
        )
        # self.session.add(self)
        self.session.execute(on_duplicate_key_stmt)
        self.session.commit()


class ManagerAssoc(ORM_Base, Model):
    def __init__(self, **kwargs):
        self.id = idWorker.get_id()
        column_keys = self.__table__.columns.keys()
        update_data = dict()
        for key in kwargs.keys():
            if key not in column_keys:
                continue
            else:
                update_data[key] = kwargs[key]
        ORM_Base.__init__(self, **update_data)
        Model.__init__(self, **kwargs, id=self.id)

    ....

# you can call it as follows:
manager_assoc.upsert()
manager.upsert(ignore_keys=['manager_id'])
Got a simpler solution:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert


@compiles(Insert)
def replace_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    s = s.replace("INSERT INTO", "REPLACE INTO")
    return s


my_connection.execute(my_table.insert(replace_string=""), my_values)
I just used plain SQL, as in:
insert_stmt = "REPLACE INTO tablename (column1, column2) VALUES (:column_1_bind, :column_2_bind)"
session.execute(insert_stmt, data)
None of these solutions seems all that elegant. A brute-force way is to query to see if the row exists: if it does, delete the row and then insert; otherwise just insert. There is obviously some overhead involved, but it does not rely on modifying the raw SQL and it works on non-ORM stuff.
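A rough sketch of that brute-force approach (with an assumed User model keyed by id, not from the answer itself):
def delete_then_insert(session, data):
    # look up the existing row by its key, remove it, then insert the new row
    existing = session.query(User).filter_by(id=data["id"]).first()
    if existing:
        session.delete(existing)
        session.flush()
    session.add(User(**data))
    session.commit()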

Use ObjectListView in the Model-View-Controller pattern with SQLAlchemy in Python

I am thinking about how to use ObjectListView to fit the Model-View-Controller pattern with wxPython & SQLAlchemy. I am not sure about it, so I created a simple example as a working basis, not as a solution.
The concrete question related to the code below is: What should happen if a new MyData object is generated?
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import wx
import sqlalchemy as sa
import sqlalchemy.ext.declarative as sad
import ObjectListView as olv

_Base = sad.declarative_base()


class MyData(_Base):
    """the database table representing class"""
    __tablename__ = 'MyData'
    __name = sa.Column('name', sa.String, primary_key=True)
    __count = sa.Column('count', sa.Numeric(10, 2))

    def __init__(self, name, count):
        super(MyData, self).__init__()
        self.__name = name
        self.__count = count

    def GetName(self):
        return self.__name

    def GetCount(self):
        return self.__count


def CreateData():
    """
    helper creating data
    imagine this as a SELECT * FROM on the database
    """
    return [
        MyData('Anna', 7),
        MyData('Bana', 6)
    ]


class MyView(olv.ObjectListView):
    def __init__(self, parent):
        super(MyView, self).__init__(parent, wx.ID_ANY, style=wx.LC_REPORT)
        self.SetColumns([
            olv.ColumnDefn('Name', valueGetter='GetName'),
            olv.ColumnDefn('Count', valueGetter='GetCount')
        ])
        data = CreateData()
        self.SetObjects(data)

    def ColDef(self):
        return


class MyApp(wx.App):
    def OnInit(self):
        frame = wx.Frame(None)
        view = MyView(frame)
        frame.Show()
        return True


if __name__ == '__main__':
    app = MyApp()
    app.MainLoop()
What would you think about the following?
Create a controller "MyDataController" that handles all the SQLAlchemy stuff for MyData objects, e.g. GetAllMyDataObjects, AddMyDataObjectToDatabase, QueryMyData, ...
Following the Observer pattern, the ObjectListView observes the controller as the subject.
I am not sure if this is an elegant solution.
The point is the confusion about what the model actually is: the one (and new) MyData instance, or the list of all MyData instances? There is no intelligent list which could act as a model.
With regard to OLV: in my case, when an update to the SA model happens, I use pubsub to inform the world about the change (add/update/delete).
Then my OLV base class subscribes to the 'itemAdded', 'itemModified' and 'itemDeleted' messages, and the following are the methods called:
def pubListItemAdded(self, dbitem):
    """
    Add to list if dbitem instance matches dbScKlass

    :param dbitem: an SA model instance

    If dbitem instance matches the list control's model it is added.
    """
    # protect from PyDeadObjectError
    if self:
        # E.g. Externalimp is a faked class and does not exist in db
        # so we need to protect for that
        if hasattr(db, self._klassName):
            tList = self.getList()
            cInst = getattr(db, self._klassName)
            if isinstance(dbitem, cInst):
                log.debug("olvbase - added: %s", dbitem)
                log.debug("olvbase - added: %s", self)
                # for some reason this creates dups on e.g. rating/tasting/consumption
                # so, lets check if it is there and only add if not
                idx = tList.GetIndexOf(dbitem)
                if idx == -1:
                    tList.AddObject(dbitem)
                else:
                    tList.RefreshObject(dbitem)
                # bring it into view
                self.resetSelection()
                tList.SelectObject(dbitem, deselectOthers=True,
                                   ensureVisible=True)

def pubListItemModified(self, dbitem):
    """
    Update list if dbitem instance matches dbScKlass

    :param dbitem: an SA model instance

    If dbitem instance matches the list control's model it is updated.
    """
    # protect from PyDeadObjectError
    if self:
        # E.g. Externalimp is a faked class and does not exist in db
        # so we need to protect for that
        if hasattr(db, self._klassName):
            cInst = getattr(db, self._klassName)
            if isinstance(dbitem, cInst):
                log.debug("olvbase - modified: %s", dbitem)
                log.debug("olvbase - modified: %s", self)
                tList = self.getList()
                # need to refresh to ensure relations are loaded
                wx.GetApp().ds.refresh(dbitem)
                tList.RefreshObject(dbitem)
                tList.SelectObject(dbitem, deselectOthers=True,
                                   ensureVisible=True)
                # deselect all, so if we select same item again
                # we will get a select event
                tList.DeselectAll()

def pubListItemDeleted(self, dbitem):
    """
    Delete from list if dbitem instance matches dbScKlass

    :param dbitem: an SA model instance

    If dbitem instance matches the list control's model it is removed.
    """
    # protect from PyDeadObjectError
    if self:
        # E.g. Externalimp is a faked class and does not exist in db
        # so we need to protect for that
        if hasattr(db, self._klassName):
            cInst = getattr(db, self._klassName)
            if isinstance(dbitem, cInst):
                log.debug("olvbase - deleted: %s", dbitem)
                log.debug("olvbase - deleted: %s", self)
                tList = self.getList()
                tList.RemoveObject(dbitem)
                self.currentObject = None
                self.currentItemPkey = None
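For context, a minimal sketch of the publishing side (assuming pypubsub; DataController and its session handling are hypothetical, not part of the code above):
from pubsub import pub

class DataController(object):
    def __init__(self, session):
        self.session = session

    def add(self, obj):
        self.session.add(obj)
        self.session.commit()
        pub.sendMessage('itemAdded', dbitem=obj)

    def update(self, obj):
        self.session.commit()
        pub.sendMessage('itemModified', dbitem=obj)

    def delete(self, obj):
        self.session.delete(obj)
        self.session.commit()
        pub.sendMessage('itemDeleted', dbitem=obj)

The OLV base class would then subscribe its handlers in its own __init__, e.g. pub.subscribe(self.pubListItemAdded, 'itemAdded').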

Optimizing modifiable named list based on namedtuple

My goal is to optimize a framework based on a stack of modifiers for CSV-sourced lists. Each modifier uses a header list to work on a named basis.
CSV example (including header):
date;place
13/02/2013;New York
15/04/2012;Buenos Aires
29/10/2010;Singapour
I have written some code based on namedtuple in order to be able to use lists generated by the csv module without reorganizing the data every time. Generated code below:
class MyNamedList(object):
    __slots__ = ("__values")
    _fields = ['date', 'ignore', 'place']

    def __init__(self, values):
        self.__values = values
        if len(self.__values) <= 151:
            for i in range(len(self.__values), 151):
                self.__values += [None, ]

    @property
    def date(self):
        return self.__values[0]

    @date.setter
    def date(self, val):
        self.__values[0] = val

    @property
    def ignore(self):
        return self.__values[150]

    @ignore.setter
    def ignore(self, val):
        self.__values[150] = val

    @property
    def place(self):
        return self.__values[1]

    @place.setter
    def place(self, val):
        self.__values[1] = val
I must say I am very disappointed with the performance of this class. Calling a simple modifier function (which sets "ignore" to True 100 times; yes, I know it is useless) for each line of a 70000-line CSV file takes 9 seconds with PyPy (5.5 using original Python), whereas equivalent code using a plain list named foo takes 1.1 seconds (same with PyPy and original Python).
Is there anything I could do to get comparable performance between both approaches? To me, record.ignore = True could be directly inlined (or so) and therefore translated into record[150] = True. Is there any blocking point I don't see that prevents this from happening?
Note that the record I am modifying is actually (for now) not created for each line in the CSV file, meaning adding more items into the list happens only once, before the iteration.
Update : sample codes
--> Using namedlist
import namedlist

MyNamedList = namedlist.namedlist("MyNamedList", {"a": 1, "b": 2, "ignore": 150})
test = MyNamedList([0, 1])

def foo(a):
    test.ignore = True  # x100 times

import csv
stream = csv.reader(open("66666.csv", "rb"))
for i in stream:
    foo(i)
--> Not using namedlist
import namedlist
import csv

MyNamedList = namedlist.namedlist("MyNamedList", {"a": 1, "b": 2, "ignore": 150})
test = MyNamedList([0, 1])

sample_data = []
for i in range(len(sample_data), 151):
    sample_data += [None, ]

def foo(a):
    sample_data[150] = True  # x100 times

stream = csv.reader(open("66666.csv", "rb"))
for i in stream:
    foo(i)
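As a side note (not part of the original question), the gap between the two access styles can be measured in isolation with timeit, using a hand-written class equivalent to the generated one:
import timeit

setup = """
class Record(object):
    __slots__ = ("__values",)
    def __init__(self, values):
        self.__values = values
    @property
    def ignore(self):
        return self.__values[150]
    @ignore.setter
    def ignore(self, val):
        self.__values[150] = val

record = Record([None] * 151)
plain = [None] * 151
"""

# compares one property-based write against one direct list write
print(timeit.timeit("record.ignore = True", setup=setup, number=10 ** 6))
print(timeit.timeit("plain[150] = True", setup=setup, number=10 ** 6))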
Update #2: code for namedlist.py (heavily based on namedtuple.py)
# Retrieved from http://code.activestate.com/recipes/500261/
# Licensed under the PSF license
from keyword import iskeyword as _iskeyword
import sys as _sys


def namedlist(typename, field_indices, verbose=False, rename=False):
    # Parse and validate the field names. Validation serves two purposes,
    # generating informative error messages and preventing template injection attacks.
    field_names = field_indices.keys()

    for name in [typename, ] + field_names:
        if not min(c.isalnum() or c == '_' for c in name):
            raise ValueError('Type names and field names can only contain alphanumeric characters and underscores: %r' % name)
        if _iskeyword(name):
            raise ValueError('Type names and field names cannot be a keyword: %r' % name)
        if name[0].isdigit():
            raise ValueError('Type names and field names cannot start with a number: %r' % name)

    seen_names = set()
    for name in field_names:
        if name.startswith('_') and not rename:
            raise ValueError('Field names cannot start with an underscore: %r' % name)
        if name in seen_names:
            raise ValueError('Encountered duplicate field name: %r' % name)
        seen_names.add(name)

    # Create and fill-in the class template
    numfields = len(field_names)
    argtxt = repr(field_names).replace("'", "")[1:-1]   # tuple repr without parens or quotes
    reprtxt = ', '.join('%s=%%r' % name for name in field_names)

    max_index = -1
    for name in field_names:
        index = field_indices[name]
        if max_index < index:
            max_index = index
    max_index += 1

    template = '''class %(typename)s(object):
    __slots__ = ("__values") \n
    _fields = %(field_names)r \n
    def __init__(self, values):
        self.__values = values
        if len(self.__values) <= %(max_index)s:
            for i in range(len(self.__values), %(max_index)s):
                self.__values += [None,]''' % locals()

    for name in field_names:
        index = field_indices[name]
        template += ''' \n
    @property
    def %s(self):
        return self.__values[%d]
    @%s.setter
    def %s(self, val):
        self.__values[%d] = val''' % (name, index, name, name, index)

    if verbose:
        print template

    # Execute the template string in a temporary namespace
    namespace = {'__name__': 'namedtuple_%s' % typename,
                 '_property': property, '_tuple': tuple}
    try:
        exec template in namespace
    except SyntaxError, e:
        raise SyntaxError(e.message + ':\n' + template)
    result = namespace[typename]

    # For pickling to work, the __module__ variable needs to be set to the frame
    # where the named tuple is created. Bypass this step in environments where
    # sys._getframe is not defined (Jython for example) or sys._getframe is not
    # defined for arguments greater than 0 (IronPython).
    try:
        result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
    except (AttributeError, ValueError):
        pass

    return result

sqlalchemy with dynamic mapping and complex object querying

I have the following situation:
class MyBaseClass(object):
    def __init__(self, name):
        self.name = name
        self.period = None
        self.foo = None

    def __getitem__(self, item):
        return getattr(self, item)

    def __setitem__(self, item, value):
        return setattr(self, item, value)
If at run time I need to add some additional columns, we could do:
my_base_class_table = Table("MyBaseClass", metadata,
                            Column('name', String, primary_key=True),
                            Column('period', DateTime),
                            Column('foo', Float),
                            )
my_base_class_table = Table("MyBaseClass", metadata, extend_existing=True)

column_list = ["value_one", "other_name", "random_XARS123"]
for col in column_list:
    my_base_class_table.append_column(Column(col, Float))

create_all()
mapper(MyBaseClass, my_base_class_table)
Until here we have a fully functional dynamic table mapping with extended columns.
Now, using SQLAlchemy's ORM, you can easily instantiate a MyBaseClass and modify it to reflect changes in the database:
base_class = MyBaseClass(name="Something")
base_class.period = "2002-10-01"
And using the dynamic columns with unknown column names:
for col in column_list:
    base_class[col] = 10
session.add(base_class)
But you can actually only build a query like the following if you know the column names:
t_query = session.query(func.strftime('%Y-%m-%d', MyBaseClass.period),
                        func.sum(MyBaseClass.foo),
                        func.sum(MyBaseClass.other_name * MyBaseClass.value_one))
Is it possible to repeat the last query (t_query) without knowing the column names? I've already tried different approaches with no luck:
func.sum(MyBaseClass[column_list[0]]*MyBaseClass.[column_list[1]])
The only thing that actually works is using extended text SQL like:
text_query = text("SELECT strftime('%Y-%m-%d', period) as period, sum(foo) as foo, sum({0}*{1}) as bar FROM {2}".format(column_list[0], column_list[1], "MyBaseClass"))
Simple getattr will do the trick:
t_query = session.query(func.strftime('%Y-%m-%d', getattr(MyBaseClass, "period")),
                        func.sum(getattr(MyBaseClass, "foo")),
                        func.sum(getattr(MyBaseClass, "other_name") * getattr(MyBaseClass, "value_one"))
                        )
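The same trick extends to the dynamically generated names from the question's column_list, e.g. (a sketch building on the answer above):
t_query = session.query(
    func.strftime('%Y-%m-%d', getattr(MyBaseClass, "period")),
    func.sum(getattr(MyBaseClass, "foo")),
    func.sum(getattr(MyBaseClass, column_list[1]) * getattr(MyBaseClass, column_list[0]))
)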
