Recreate SQLAlchemy objects - Python

Ok, so I'm having a bit of a problem here. I need to be able to create a sort of import/export functionality for some SQLAlchemy objects. Now these are not objects I'm defining, so to get the columns I'm doing:
for attr, value in res.__class__.__dict__.iteritems():
    if isinstance(value, InstrumentedAttribute):
        data = eval("res." + str(attr))
        param_dict[attr] = data
Now this correctly gets me the attributes of that object. However, I can't be certain that the parameters of __init__ are the same, since I'm not the one handling these objects. So there could be a situation like:
class SomeClass(model.Base):
    my_column = Column(String)
    ....some other stuff...
    def __init__(self, mycolumn, ...):
        self.my_column = mycolumn
So in this case I don't have any correspondence between the name of the field and the name of the parameter as received by __init__. I'm currently constraining those who define these classes to give all the __init__ parameters a default value, so I can just do:
obj = SomeClass()
exec "obj." + attr + " = " + param[attr]
However, I would like to get away even from this constraint. Is there any way I can achieve this?

Serializing can't really be generalized for all possible SQLAlchemy-mapped classes; classes might have properties that aren't stored in the database, or that must be inferred across multiple levels of indirection through relationship properties. In short, only you know how to serialize a particular class for a particular use.
Let's pretend that you only need or care about the column values for the specific instance of the specific class under consideration (in res). Here's a crude function that will return a dict containing only those values.
from sqlalchemy.orm.attributes import manager_of_class
from sqlalchemy.orm.properties import ColumnProperty

def get_state_dict(instance):
    cls = type(instance)
    mgr = manager_of_class(cls)
    return dict((key, getattr(instance, key))
                for key, attr in mgr.iteritems()
                if isinstance(attr.property, ColumnProperty))
and to recreate an instance from the dict*
def create_from_state_dict(cls, state_dict):
    mgr = manager_of_class(cls)
    instance = mgr.new_instance()
    for key, value in state_dict.iteritems():
        setattr(instance, key, value)
    return instance
If you need something more complex, such as handling relationships that aren't represented as columns (as in many-to-many relationships), you can add that case by looking for sqlalchemy.orm.properties.RelationshipProperty and then iterating over the collection.
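As a rough, hedged sketch of that case (my own extension of the helper above, one level deep only, and assuming the related objects can themselves be serialized column-by-column):

from sqlalchemy.orm.properties import RelationshipProperty

def get_state_dict_with_relations(instance):
    # Sketch: column values as before, plus one level of related objects.
    mgr = manager_of_class(type(instance))
    state = {}
    for key, attr in mgr.iteritems():
        prop = attr.property
        if isinstance(prop, ColumnProperty):
            state[key] = getattr(instance, key)
        elif isinstance(prop, RelationshipProperty):
            value = getattr(instance, key)
            if prop.uselist:  # one-to-many / many-to-many collections
                state[key] = [get_state_dict(item) for item in value]
            elif value is not None:  # many-to-one / one-to-one
                state[key] = get_state_dict(value)
    return state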
*Serializing the intermediate dict and class is left as an exercise.
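For what that exercise might look like, here is a minimal sketch using json as the intermediate format, under my own assumption that all column values are JSON-serializable:

import json

# export: state dict -> JSON string
payload = json.dumps(get_state_dict(res))
# import: JSON string -> new instance of the same class
restored = create_from_state_dict(SomeClass, json.loads(payload))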


Relationship to multiple types (polymorphism) in neomodel

Since 2014 there has been an open issue about relationships to multiple object types not being supported:
https://github.com/robinedwards/neomodel/issues/126
It's now 2016, and I'm still not aware of any solution to this critical issue.
Example usage:
class AnimalNode(StructuredNode):
    tail_size = IntegerProperty()
    age = IntegerProperty()
    name = StringProperty()

class DogNode(AnimalNode):
    smell_level = IntegerProperty()

class CatNode(AnimalNode):
    vision_level = IntegerProperty()

class Owner(StructuredNode):
    animals_owned = RelationshipTo("AnimalNode", "OWNED_ANIMAL")

dog_node1 = DogNode(name="Doggy", tail_size=3, age=2, smell_level=8).save()
cat_node1 = CatNode(name="Catty", tail_size=3, age=2, vision_level=8).save()
owner = Owner().save()
owner.animals_owned.connect(dog_node1)
owner.animals_owned.connect(cat_node1)
If I try to access the animals_owned relationship of the owner, as you'd expect, it retrieves only AnimalNode base-class instances and not the subclasses (DogNode or CatNode), so I am not able to access the attributes smell_level or vision_level.
I would want something like this to be permitted in neomodel:
class Owner(StructuredNode):
    animals_owned = RelationshipTo(["DogNode", "CatNode"], "OWNED_ANIMAL")
and then when I access the animals_owned relationship of owner, it will retrieve objects of types DogNode and CatNode, so I can access the subclass attributes as I wish.
But the connect method yields the following error:
TypeError: isinstance() arg 2 must be a type or tuple of types
Is there any way to achieve that in neomodel in an elegant way?
Thanks!
I recently did something like this in order to implement a metadata model with inheritance. The relevant code is here: https://github.com/diging/cidoc-crm-neo4j/blob/master/crm/models.py
Basically the approach I took was to use plain-old multiple inheritance to build the models, which neomodel conveniently turns into corresponding multiple labels on the nodes. Those models were all based on an abstract subclass of neomodel's StructuredNode; I added methods to re-instantiate the node at various levels of the class hierarchy, using the labels() and inherited_labels() instance methods. For example, this method will re-instantiate a node as either its most derived class or a specific class in its hierarchy:
import sys
import neomodel

class HeritableStructuredNode(neomodel.StructuredNode):
    def downcast(self, target_class=None):
        """
        Re-instantiate this node as an instance of its most derived class.
        """
        # TODO: there is probably a far more robust way to do this.
        _get_class = lambda cname: getattr(sys.modules[__name__], cname)

        # inherited_labels() only returns the labels for the current class
        # and any super-classes, whereas labels() will return all labels on
        # the node.
        classes = list(set(self.labels()) - set(self.inherited_labels()))
        if len(classes) == 0:
            return self    # The most derived class is already instantiated.

        cls = None
        if target_class is None:    # Caller has not specified the target.
            if len(classes) == 1:   # Only one option, so this must be it.
                target_class = classes[0]
            else:   # Infer the most derived class by looking for the one
                    # with the longest method resolution order.
                class_objs = map(_get_class, classes)
                _, cls = sorted(zip(map(lambda cls: len(cls.mro()),
                                        class_objs),
                                    class_objs),
                                key=lambda (size, cls): size)[-1]
        else:   # Caller has specified a target class.
            if not isinstance(target_class, basestring):
                # In the spirit of neomodel, we might as well support both
                # class (type) objects and class names as targets.
                target_class = target_class.__name__
            if target_class not in classes:
                raise ValueError('%s is not a sub-class of %s'
                                 % (target_class, self.__class__.__name__))
        if cls is None:
            cls = getattr(sys.modules[__name__], target_class)
        instance = cls.inflate(self.id)
        # TODO: Can we re-instantiate without hitting the database again?
        instance.refresh()
        return instance
Note that this works partly because all of the models are defined in the same namespace; this might get tricky if that were not the case. There are still some kinks here to work out, but it gets the job done.
With this approach, you can define a relation to a superior class, and then connect nodes instantiated with inferior/more derived classes. Then, upon retrieval, "downcast" them to their original class (or some class in the hierarchy). For example:
>>> for target in event.P11_had_participant.all():
...     original_target = target.downcast()
...     print original_target, type(original_target)
{'id': 39, 'value': u'Joe Bloggs'} <class 'neomodel.core.E21Person'>
See this README for usage examples.
Good question.
I guess you could manually check what type of object each element of owner.animals_owned is and "inflate" it to the right type of object.
But it would be really nice to have something automatic.
The following is not a proper solution, but more of a workaround. As the error message notes, isinstance() requires a type or a tuple of types - not a list. So the following will work:
class Owner(StructuredNode):
    animals_owned = RelationshipTo((DogNode, CatNode), "OWNED_ANIMAL")
The limitation is that DogNode and CatNode have to be defined before the relationship; a quoted name will not work. This makes use of the feature of isinstance that allows you to pass a tuple of possible classes.
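The isinstance feature itself is plain Python and easy to verify:

>>> isinstance(3, (int, float))    # True if the object matches any type in the tuple
True
>>> isinstance("3", (int, float))
False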
However, this usage is not officially supported in neomodel (as of now). Trying to list all the nodes will give an error, since neomodel still expects the type to be a class name and not a tuple.
Accessing relationship in reverse (AnimalNode -> Owner)
You can still make use of the relationship if you define it the other way as well, like
class AnimalNode(StructuredNode):
    ...
    owner = RelationshipFrom("Owner", "OWNED_ANIMAL")
and then use AnimalNode.owner.get(), DogNode.owner.get(), etc. to retrieve the owner.
Workaround method to generate animals_owned
To generate animals_owned from the Owner model, I used the following workaround method:
class Owner(StructuredNode):
    ...
    def get_animals_owned(self):
        # Possible classes
        # (uses the animals_owned property and converts to a set of class names)
        allowed_classes = set([i.__name__ for i in self.animals_owned.definition['node_class']])
        # Retrieve all results with an OWNED_ANIMAL relationship to self
        results, columns = self.cypher('MATCH (o) WHERE id(o)={self} MATCH (o)-[:OWNED_ANIMAL]->(a) RETURN a')
        output = []
        for r in results:
            # Select acceptable labels (class names)
            labels = allowed_classes.intersection(r[0].labels)
            # Pick a random label from the selected ones
            selected_label = labels.pop()
            # Retrieve the model class from the label name
            # see http://stackoverflow.com/a/1176179/1196444
            model = globals()[selected_label]
            # Inflate the model to the given class
            output.append(model.inflate(r[0]))
        return output
Testing:
>>> owner.get_animals_owned()
[<CatNode: {'age': 2, 'id': 49, 'vision_level': 8, 'name': 'Catty', 'tail_size': 3}>, <DogNode: {'age': 2, 'id': 46, 'smell_level': 8, 'name': 'Doggy', 'tail_size': 3}>]
Limitations:
If multiple acceptable model types are available, a random one will be picked. (That's probably part of the reason this hasn't been officially implemented: for example, if there's a PuppyModel that inherits from DogModel, and both are possible options, there's no easy way for the function to decide which one you really wanted.)
The set logic assumes multiple models (having a single one will not work).
The Cypher query needs to be manually written for each model and relationship (it should be pretty straightforward to automate, though).
Access is via a method (adding a @property decorator, as sketched below, will help fix this).
Of course, you will probably want to add more fine-tuning and safety-checks, but that should be enough to start off.
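For example, the @property fix could look like this (a sketch; the shorter name animals is my own choice):

class Owner(StructuredNode):
    ...
    @property
    def animals(self):
        # lets callers write owner.animals instead of owner.get_animals_owned()
        return self.get_animals_owned()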

How to duplicate an SQLAlchemy-mapped object the correct way?

I want to duplicate (copy) an object mapped by SQLAlchemy. It should only copy the data created by me, not all the underlying stuff. It shouldn't copy the primary keys or unique values.
This is useful when creating new data entries that differ only a little from the last one, so the user doesn't have to enter all the data again.
An important requirement is that this needs to work when the column name in the table (e.g. name) and the member name (e.g. _name) in the Python class are not the same.
This (simplified) code works for all declarative_base()-derived classes, BUT ONLY when the column name and the member name are the same.
import sqlalchemy as sa

def DuplicateObject(oldObj):
    mapper = sa.inspect(type(oldObj))
    newObj = type(oldObj)()
    for col in mapper.columns:
        # skip primary-key and unique columns
        if not col.primary_key and not col.unique:
            setattr(newObj, col.key, getattr(oldObj, col.key))
    return newObj
col.key is the name of the column in the table. When the member name in the Python class is different, this doesn't work. I don't know how SQLAlchemy connects the column name with the member name. How does SQLAlchemy know this connection? How can I take care of it?
import sqlalchemy as sa

def duplicate_object(old_obj):
    # SQLAlchemy-mapped data class? (_Base is the declarative base class
    # used by your models.)
    if not isinstance(old_obj, _Base):
        raise TypeError('The given parameter with type {} is not '
                        'mapped by SQLAlchemy.'.format(type(old_obj)))
    mapper = sa.inspect(type(old_obj))
    new_obj = type(old_obj)()
    for name, col in mapper.columns.items():
        # skip primary-key and unique columns
        if not col.primary_key and not col.unique:
            setattr(new_obj, name, getattr(old_obj, name))
    return new_obj
It looks like this works, even when the member names begin with double underscores (__name).
But someone on the SQLAlchemy mailing list mentioned:
It’s not a generalized solution for the whole world, though. It doesn’t take into account columns that are part of unique Index objects or columns that are mentioned in standalone UniqueConstraint objects.
But because the SQLAlchemy documentation is (for me!) quite hard to read and understand, I am not really sure what happens in that code, especially in the for-construct. What is behind items(), and why are there two variables (name, col)?
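For what it's worth: mapper.columns is a dictionary-like collection keyed by the mapped attribute name (not the table column name), so items() yields (name, Column) pairs, and the for-loop unpacks each pair into the two variables. A quick illustrative check (MyClass stands in for any mapped class of yours):

import sqlalchemy as sa

mapper = sa.inspect(MyClass)
for name, col in mapper.columns.items():
    # name is the Python attribute name (e.g. '_name'),
    # col.name is the table column name (e.g. 'name')
    print("%s -> %s" % (name, col.name))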

Override instance attribute

In views.py I have:
my_computer = Computer.objects.get(pk=some_value)
The computer object has a field called projects that's a ManyRelatedManager.
Calling
my_projects = my_computer.projects.all()
will set the value of my_projects to a list of three project objects.
What I'm trying to achieve is to set the value of my_computer.projects to the above list of projects instead of the ManyRelatedManager.
I have tried:
my_computer.projects = my_projects
but that doesn't work, although it doesn't raise an error either. The value of my_computer.projects is still the ManyRelatedManager.
Manager objects implement __set__ - they behave as descriptors.
This means you cannot change the object by assigning to it (as long as it is an attribute of another object; __set__ is only called in the context of __setattr__ on the parent object - parent in the composition sense, not the inheritance sense).
You can assign any list-like (actually: iterable) value to a manager if the iterable yields models of the expected type. However, this means:
When you query my_computer.projects, you will again get a manager object, holding the objects you assigned.
When you save the object my_computer, only the specified objects will belong to the relationship - objects previously in the relationship will no longer be related to the current object.
There are three scenarios you could have which led you to this issue:
You need to hold a volatile list - this data is not stored in any way, but used temporarily. You have to create a normal attribute in the class:
class Computer(models.Model):
    # normal database fields here
    def __init__(self, *args, **kwargs):
        super(Computer, self).__init__(*args, **kwargs)
        # ENSURE this attribute name does not collide with any field.
        # I'm assuming the many-manager name is projects.
        self.my_projects = []
You need another representation of the exact same relationship - that is, you want a comfortable way to access the objects instead of calling a strange .all(), e.g. to do [k.foo for k in mycomputer.my_projects]. You have to create a property like this:
class Computer(models.Model):
    # Normal database fields here.
    # I'm assuming the many-manager name is projects.

    @property
    def my_projects(self):
        # remember: my_projects is another name.
        # it CANNOT collide, so I have another
        # name - cannot use projects as name.
        return list(self.projects.all())

    @my_projects.setter
    def my_projects(self, value):
        # this only abstracts the name, to match
        # the getter.
        self.projects = value
You need another relationship (so it's not volatile data): create ANOTHER relationship in your model, pointing to the same model, using the same through if applicable, but with a different related_name= (you must explicitly set related_name for at least one of multiple relationships from the same model to the same model), as sketched below.
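A minimal sketch of that third scenario (the field and model names here are my own assumptions):

class Computer(models.Model):
    projects = models.ManyToManyField('Project', related_name='computers')
    # A second, independent relationship to the same model; at least one of
    # the two needs an explicit related_name to avoid a reverse-accessor clash.
    archived_projects = models.ManyToManyField('Project', related_name='archived_on')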
You can't do that. Your best bet is to simply use another attribute name.
my_computer.related_projects = list(my_computer.projects.all())

How do I define a unique property for a Model in Google App Engine?

I need some properties to be unique. How can I achieve this?
Is there something like unique=True?
I'm using Google App Engine for Python.
Google provides a function to do that:
http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get_or_insert
Model.get_or_insert(key_name, **kwds)
Attempts to get the entity of the model's kind with the given key name. If it exists, get_or_insert() simply returns it. If it doesn't exist, a new entity with the given kind, name, and parameters in kwds is created, stored, and returned.
The get and subsequent (possible) put are wrapped in a transaction to ensure atomicity. This means that get_or_insert() will never overwrite an existing entity, and will insert a new entity if and only if no entity with the given kind and name exists.
In other words, get_or_insert() is equivalent to this Python code:
def txn():
    entity = MyModel.get_by_key_name(key_name, parent=kwds.get('parent'))
    if entity is None:
        entity = MyModel(key_name=key_name, **kwds)
        entity.put()
    return entity
return db.run_in_transaction(txn)
Arguments:
key_name: the name for the key of the entity.
**kwds: keyword arguments to pass to the model class's constructor if an instance with the specified key name doesn't exist. The parent argument is required if the desired entity has a parent.
Note: get_or_insert() does not accept an RPC object.
The method returns an instance of the model class that represents the requested entity, whether it existed or was created by the method. As with all datastore operations, this method can raise a TransactionFailedError if the transaction could not be completed.
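For example, to keep usernames unique you could fold the unique value into the key name (the model and property names below are illustrative, not from the docs):

class UserAccount(db.Model):
    username = db.StringProperty(required=True)

# Atomically fetch-or-create: two concurrent calls with the same key name
# can never both insert.
account = UserAccount.get_or_insert('username:bob', username='bob')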
There's no built-in constraint for making sure a value is unique. You can do this however:
query = MyModel.all(keys_only=True).filter('unique_property =', value_to_be_used)
entity = query.get()
if entity:
    raise Exception('unique_property must have a unique value!')
I use keys_only=True because it'll improve the performance slightly by not fetching the data for the entity.
A more efficient method would be to use a separate model with no fields, whose key name is made up of property name + value. Then you could use get_by_key_name to fetch one or more of these composite key names, and if you get one or more not-None values, you know there are duplicate values (and by checking which values were not None, you'll know which ones were not unique).
As onebyone mentioned in the comments, these approaches - by their get-first, put-later nature - run the risk of concurrency issues. Theoretically, an entity could be created just after the check for an existing value, and then the code after the check would still execute, leading to duplicate values. To prevent this, you will have to use transactions: Transactions - Google App Engine
If you're looking to check for uniqueness across all entities with transactions, you'd have to put all of them in the same group using the first method, which would be very inefficient. For transactions, use the second method like this:
class UniqueConstraint(db.Model):
    @classmethod
    def check(cls, model, **values):
        # Create a pseudo-key for use as an entity group.
        parent = db.Key.from_path(model.kind(), 'unique-values')

        # Build a list of key names to test.
        key_names = []
        for key in values:
            key_names.append('%s:%s' % (key, values[key]))

        def txn():
            result = cls.get_by_key_name(key_names, parent)
            for test in result:
                if test:
                    return False
            for key_name in key_names:
                uc = cls(key_name=key_name, parent=parent)
                uc.put()
            return True

        return db.run_in_transaction(txn)
UniqueConstraint.check(...) will assume that every single key/value pair must be unique to return success. The transaction will use a single entity group for every model kind. This way, the transaction is reliable for several different fields at once (for only one field, this would be much simpler.) Also, even if you've got fields with the same name in one or more models, they will not conflict with each other.
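Usage might then look like this (a sketch only; handle_duplicate() is a placeholder for whatever your application does on a clash):

if UniqueConstraint.check(MyModel, username='bob', email='bob@example.com'):
    MyModel(username='bob', email='bob@example.com').put()
else:
    # at least one of the two values is already taken
    handle_duplicate()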

Converting a database-driven (non-OO) python script into a non-database driven, OO-script

I have some software that is heavily dependent on MySQL, and is written in python without any class definitions. For performance reasons, and because the database is really just being used to store and retrieve large amounts of data, I'd like to convert this to an object-oriented python script that does not use the database at all.
So my plan is to export the database tables to a set of files (not many -- it's a pretty simple database; it's big in that it has a lot of rows, but only a few tables, each of which has just two or three columns).
Then I plan to read the data in, and have a set of functions which provide access to and operations on the data.
My question is this:
is there a preferred way to convert a set of database tables to classes and objects? For example, if I have a table which contains fruit, where each fruit has an id and a name, would I have a "CollectionOfFruit" class which contains a list of "Fruit" objects, or would I just have a "CollectionOfFruit" class which contains a list of tuples? Or would I just have a list of Fruit objects?
I don't want to add any extra frameworks, because I want this code to be easy to transfer to different machines. So I'm really just looking for general advice on how to represent data that might more naturally be stored in database tables, in objects in Python.
Alternatively, is there a good book I should read that would point me in the right direction on this?
If the data is a natural fit for database tables ("rectangular data"), why not convert it to sqlite? It's portable -- just one file to move the db around, and sqlite is available anywhere you have python (2.5 and above anyway).
Generally you want your Objects to absolutely match your "real world entities".
Since you're starting from a database, it's not always the case that the database has any real-world fidelity, either. Some database designs are simply awful.
If your database has reasonable models for Fruit, that's where you start. Get that right first.
A "collection" may -- or may not -- be an artificial construct that's part of the solution algorithm, not really a proper part of the problem. Usually collections are part of the problem, and you should design those classes, also.
Other times, however, the collection is an artifact of having used a database, and a simple Python list is all you need.
Still other times, the collection is actually a proper mapping from some unique key value to an entity, in which case, it's a Python dictionary.
And sometimes, the collection is a proper mapping from some non-unique key value to some collection of entities, in which case it's a Python collections.defaultdict(list).
Start with the fundamental, real-world-like entities. Those get class definitions.
Collections may use built-in Python collections or may require their own classes.
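For instance, the non-unique-key case from above is just this in plain Python (all names invented):

from collections import defaultdict

# non-unique key -> collection of entities
fruit_by_color = defaultdict(list)
for f in fruits:
    fruit_by_color[f.color].append(f)
# fruit_by_color['red'] is now the list of all red fruit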
There's no "one size fits all" answer for this -- it'll depend a lot on the data and how it's used in the application. If the data and usage are simple enough you might want to store your fruit in a dict with id as key and the rest of the data as tuples. Or not. It totally depends. If there's a guiding principle out there then it's to extract the underlying requirements of the app and then write code against those requirements.
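That simple shape might be no more than this (illustrative, assuming rows of (id, name, weight, color)):

rows = [(1, 'apple', 120, 'red'), (2, 'lime', 40, 'green')]
fruits = dict((id_, (name, weight, color))
              for id_, name, weight, color in rows)
name, weight, color = fruits[1]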
You could have a Fruit class with id and name instance variables, a function to read/write the information from a file, and maybe a class variable to keep track of the number of fruits (objects) created.
In the simple case, namedtuples let you get started:
>>> from collections import namedtuple
>>> Fruit = namedtuple("Fruit", "name weight color")
>>> fruits = [Fruit(*row) for row in cursor.execute('select * from fruits')]
Fruit is equivalent to the following class:
>>> Fruit = namedtuple("Fruit", "name weight color", verbose=True)
class Fruit(tuple):
    'Fruit(name, weight, color)'

    __slots__ = ()

    _fields = ('name', 'weight', 'color')

    def __new__(cls, name, weight, color):
        return tuple.__new__(cls, (name, weight, color))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new Fruit object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 3:
            raise TypeError('Expected 3 arguments, got %d' % len(result))
        return result

    def __repr__(self):
        return 'Fruit(name=%r, weight=%r, color=%r)' % self

    def _asdict(t):
        'Return a new dict which maps field names to their values'
        return {'name': t[0], 'weight': t[1], 'color': t[2]}

    def _replace(self, **kwds):
        'Return a new Fruit object replacing specified fields with new values'
        result = self._make(map(kwds.pop, ('name', 'weight', 'color'), self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % kwds.keys())
        return result

    def __getnewargs__(self):
        return tuple(self)

    name = property(itemgetter(0))
    weight = property(itemgetter(1))
    color = property(itemgetter(2))
Another way would be to use the ZODB to store objects persistently. The only thing you have to do is derive your classes from Persistent, and everything reachable from the root object is then automatically stored in that database as an object. The root object comes from the ZODB connection. There are many backends available, and the default is simply a file.
A class could then look like this:
import persistent

class Collection(persistent.Persistent):
    def __init__(self, fruit=None):
        # avoid the mutable-default-argument pitfall
        self.fruit = fruit if fruit is not None else []

class Fruit(persistent.Persistent):
    def __init__(self, name):
        self.name = name
Assuming you have the root object you can then do:
fruit = Fruit("apple")
root.collection = Collection([fruit])
and it's stored in the database automatically. You can find it again by simply accessing 'collection' on the root object:
print root.collection.fruit
You can also derive subclasses from e.g. Fruit as usual.
Useful links with more information:
The new ZODB homepage
a ZODB tutorial
That way you are still able to use the full power of Python objects, and there is no need to serialize anything, e.g. via an ORM, but you still have an easy way to store your data.
Here are a couple of points for you to consider. If your data is large, reading it all into memory may be wasteful. If you need random access and not just sequential access to your data, then you'll either have to scan at most the entire file each time, or read the table into an indexed memory structure like a dictionary. A list will still require some kind of scan (straight iteration, or binary search if sorted). With that said, if you don't require some of the features of a DB, then don't use one; but if you just think MySQL is too heavy, then +1 on the SQLite suggestion from earlier. It gives you most of the features you'd want from a database, without the concurrency overhead.
Abstract persistence from the object class. Put all of the persistence logic in an adapter class, and assign the adapter to the object class. Something like:
class Fruit(object):
    @classmethod
    def get(cls, id):
        return cls.adapter.get(id)

    def put(self):
        self.adapter.put(self)

    def __init__(self, id, name, weight, color):
        self.id = id
        self.name = name
        self.weight = weight
        self.color = color

class FruitAdapter(object):
    def get(self, id):
        # retrieve name, weight, color from persistent storage here
        return Fruit(id, name, weight, color)

    def put(self, fruit):
        # insert/update fruit in persistent storage here
        pass

Fruit.adapter = FruitAdapter()
f = Fruit.get(1)
f.name = "lemon"
f.put()
# and so on...
Now you can build different FruitAdapter objects that interoperate with whatever persistence format you settle on (database, flat file, in-memory collection, whatever) and the basic Fruit class will be completely unaffected.
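For instance, a second adapter with the same interface could persist to a flat file instead. A sketch under the assumption of a simple CSV layout (id, name, weight, color), with a naive append-only put():

import csv

class CsvFruitAdapter(object):
    def __init__(self, path):
        self.path = path

    def get(self, id):
        with open(self.path) as f:
            for row in csv.reader(f):
                if int(row[0]) == id:
                    return Fruit(int(row[0]), row[1], float(row[2]), row[3])
        return None

    def put(self, fruit):
        # naive: appends a new row rather than updating in place
        with open(self.path, 'a') as f:
            csv.writer(f).writerow([fruit.id, fruit.name, fruit.weight, fruit.color])

Fruit.adapter = CsvFruitAdapter('fruits.csv')  # swap storage without touching Fruit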
