Is there a DB/ORM pattern for attributes? - python

I want to create an object with different key-value pairs as attributes, for example:

animal
    id
    name
attribute
    id
    name

and a mapping table:

animal_attribute
    animal_id
    attribute_id

so I can have an entry "duck" which has multiple attributes: "flying", "swimming", etc. Each attribute type would have its own table defining some variables:

attribute_flying
    animal_id
    height
    length
    ...
attribute_swimming
    animal_id
    depth
    ...

Is there a better way to do this? Also, how would the object layout work in code (Python)?

You have several alternatives.
If your object hierarchy is not very deep and you have just a few attributes, you can create one table and add a column for every attribute you need to support.
Another way is to create a table for each object and map each attribute to a different column.
The approach you want to use is not very good, due to the following issues:
It is hard to check that all required attributes exist for an animal.
It is hard to load all attributes for an animal (and if you want to load several animals in one query, you are stuck).
It is hard to use different value types for attributes.
It is hard to make aggregate queries.
This is actually the so-called Entity-Attribute-Value (EAV) antipattern, as described in the book SQL Antipatterns.
To resolve this antipattern you need to rethink how you will store your inheritance in the database. There are several approaches:
table per class hierarchy
table per subclass
table per concrete class
The exact solution depends on your task; right now it is hard to say which is best. You should probably use table per subclass: the common attributes go in one table, and everything specific to a given animal goes into an additional table.
SQLAlchemy supports all three major types of inheritance; read about inheritance configuration in the documentation and choose what best fits your needs.
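As a minimal sketch of the table-per-subclass idea in plain SQL (table and column names follow the question; the sample values are made up), common attributes live in one table and subtype-specific attributes in a second table keyed by the same id, so loading an animal is a single join rather than an EAV pivot:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animal (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- subtype table: one row per flying animal, keyed by animal.id
    CREATE TABLE attribute_flying (
        animal_id INTEGER PRIMARY KEY REFERENCES animal(id),
        height    REAL,
        length    REAL
    );
""")
conn.execute("INSERT INTO animal (id, name) VALUES (1, 'duck')")
conn.execute("INSERT INTO attribute_flying VALUES (1, 0.3, 0.4)")

# Loading an animal with its flying attributes is one join.
row = conn.execute("""
    SELECT a.name, f.height, f.length
    FROM animal a JOIN attribute_flying f ON f.animal_id = a.id
""").fetchone()
print(row)  # ('duck', 0.3, 0.4)
```

With an EAV layout the same read would require one join (or subquery) per attribute, which is exactly the pain point the answer describes.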

Related

How to structure a project into Object Oriented framework?

I have a project and would like ideas/tips on how I could tackle it. This is the project:
Each component in a car has a number. Example:
"Hitch" has num: "43". I want to search and return
every car model in the database (CSV file) that has the number "43".
In the future, I would also like to be able to see information about
the car model. Example: "manufacturer", "HP", then info about the
"manufacturer" etc etc.
I have never done anything like this before, but I have researched and found that maybe OOP is the way to go? In that case, how could one structure it?
Since your main question is how do you break your requirement into an Object Oriented framework, I would structure it as follows:
You have objects called Parts and an object called Car.
Parts will have attributes: PartName (string), PartId (integer), PartManufacturer (string).
Car will have attributes: CarName (string), CarId (int), CarManufacturer (string).
A third object PartsInCar will track the relation between Car and Parts.
This object will have attributes CarPartId (int), CarId (Car), PartId (Part).
Alternatively PartsInCar can also be an attribute of Car as a "vector of class Parts".
Depending on how granular you want to maintain the data, there is a possibility to create another class Manufacturer having ManufacturerId(int), ManufacturerName(string).
Now the question is: how do you load your CSV database into this structure?
This depends on what your input data looks like.
If you are doing the whole thing in memory, you could use a vector or dictionary to store the whole "list" of parts, cars, and parts_in_car.
For each class defined above you will of course need member functions that make these operations easy.
E.g.:
The Car object can have a member function that returns all Parts objects associated with it.
The Parts object can have a member function that searches all Cars and shortlists those that use this part.
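A minimal sketch of the structure described above (class names follow the answer; the CSV layout and all sample data are assumptions):

```python
import csv
import io

class Part:
    def __init__(self, part_id, name, manufacturer):
        self.part_id = part_id
        self.name = name
        self.manufacturer = manufacturer

class Car:
    def __init__(self, car_id, name, manufacturer):
        self.car_id = car_id
        self.name = name
        self.manufacturer = manufacturer
        self.parts = []  # the "vector of class Parts" alternative

    def has_part(self, part_id):
        return any(p.part_id == part_id for p in self.parts)

# Assumed CSV layout: car_id,car_name,manufacturer,part_id,part_name
data = io.StringIO(
    "1,ModelA,Acme,43,Hitch\n"
    "2,ModelB,Bolt,7,Wheel\n"
)
cars = {}
for car_id, car_name, mfr, part_id, part_name in csv.reader(data):
    car = cars.setdefault(car_id, Car(car_id, car_name, mfr))
    car.parts.append(Part(int(part_id), part_name, mfr))

# Search: every car model that has part number 43.
matches = [c.name for c in cars.values() if c.has_part(43)]
print(matches)  # ['ModelA']
```

Here the parts list is held directly on each Car; a separate PartsInCar mapping class becomes useful once the same Part instance is shared between many cars.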
You could use a dictionary with the key being the car name and the value being a list of information about each car. You can assign certain aspects of the car to each index of the list to keep track of the data. You could use OOP for this, but it's a bit unnecessary if I understood your question correctly.

Best Practices to add custom attributes to pandas Dataframe

I am building a Table class to make it easy to retrieve data from a database, manipulate it arbitrarily in memory, then save it back. Ideally, these tables work both in the Python interpreter and in normal code. "Work" means I can use all standard pandas DataFrame features, as well as all custom features from the Table class.
Generally, the tables contain data I use for academic research or personal interest. So, the user-base is currently just me, but for portability I'm trying to write as generically as possible.
I have seen several threads (example 1, example 2) discussing whether to subclass DataFrame, or use composition. After trying to walk through pandas's subclassing guide I decided to go for composition because pandas itself says this is easier.
The problem is, I want to be able to call any DataFrame function, property, or attribute on a Table, but to do so I have to keep track of every attribute I code into the Table class. See below; the points of interest are metadata and __getattr__, everything else is meant to be illustrative.
class Table(object):
    metadata = ['db', 'data', 'name', 'clean', 'refresh', 'save']

    def __getattr__(self, name):
        if name not in Table.metadata:
            return getattr(self.data, name)  # self.data is the DataFrame

    def __init__(self, db, name):
        # set up Table-specific values
        ...

    def refresh(self):
        # undo all changes since last save
        ...

    # etc...
Obviously, having to explicitly specify the Table attributes versus the DataFrame ones is not ideal (though, to my understanding, this is how pandas implements column names as attributes). I could write out tablename.data.foo, but I find that unintuitive and non-Pythonic. Is there a better way to achieve the same functionality?
Here's my understanding of your desired workflow: (1) you have a table in a database, (2) you read part/all(?) of it into memory as a pandas dataframe wrapped in a custom class (3) you make any manipulation you want and then (4) save it back to the database as the new state of that table.
I'm worried that arbitrary changes to the df could break db features
I'm guessing this is a relational db? Do other tables rely on primary keys of this table?
Are you trying to keep a certain schema?
Are you ok with adding/deleting/renaming columns arbitrarily?
If you decide there are an enumerable amount of manipulations, rather than an arbitrary amount, then I'd make a separate class method for each.
If you don't care about your db schema and your db table doesn't have relationships with other tables, then I guess you can do arbitrary manipulations in memory and replace the db table each time.
In this case I feel you are not benefiting from using a database over a CSV file.
I guess one benefit could be that the db is publicly accessible while the CSV wouldn't be (unless you were using S3 or something).
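On the delegation mechanics themselves: one subtle issue in the question's __getattr__ is that it implicitly returns None for names listed in metadata instead of raising AttributeError. A dependency-free sketch of the composition pattern (using a stand-in object so it runs without pandas; with a real DataFrame as self.data it works the same way, and all names here are illustrative):

```python
class Table:
    def __init__(self, db, name, data):
        self.db = db
        self.name = name
        self.data = data  # in the real class, a pandas DataFrame

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so Table's own
        # attributes and methods never reach this point.
        if attr == "data":
            # Guard against infinite recursion if `data` is not set yet
            # (e.g. during unpickling).
            raise AttributeError(attr)
        try:
            return getattr(self.data, attr)
        except AttributeError:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {attr!r}"
            ) from None

# Stand-in for a DataFrame, to keep the sketch self-contained.
class FakeFrame:
    shape = (3, 2)
    def head(self, n=5):
        return f"first {n} rows"

t = Table("mydb", "prices", FakeFrame())
print(t.shape)    # (3, 2) -- delegated to the wrapped object
print(t.head(2))  # 'first 2 rows'
```

Because __getattr__ only fires after normal attribute lookup fails, no explicit metadata list is needed: Table's own attributes shadow the wrapped object's automatically.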

Google app engine: better way to make query

Say I have RootEntity, AEntity(child of RootEntity), BEntity(child of AEntity).
class RootEntity(ndb.Model):
    rtp = ndb.StringProperty()

class AEntity(ndb.Model):
    ap = ndb.IntegerProperty()

class BEntity(ndb.Model):
    bp = ndb.StringProperty()
So in different handlers I need to get instances of BEntity with a specific ancestor (an instance of AEntity).
Here is my query:

BEntity.query(ancestor=ndb.Key("RootEntity", 1, "AEntity",
    AEntity.query(ancestor=ndb.Key("RootEntity", 1))
           .filter(AEntity.ap == int(some_value))
           .get().key.integer_id()))

How can I optimize this query? Make it better, maybe less convoluted?
Upd:
This query is part of a function with the @ndb.transactional decorator.
You should not use Entity Groups to represent entity relationships.
Entity groups have a special purpose: to define the scope of transactions. They give you the ability to update multiple entities transactionally, as long as they are part of the same entity group (this limitation has been somewhat relaxed by the new XG transactions). They also allow you to use queries within transactions (which is not available with XG transactions).
The downside of entity groups is that they have an update limitation of 1 write/second.
In your case my suggestion would be to use separate entities and make references between them. The reference should be a Key of the referenced entity as this is type-safe.
Regarding query simplicity: GAE unfortunately does not support JOINs or reference (multi-entity) queries, so you would still need to combine multiple queries together (as you do now).
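A non-runnable sketch of the separate-entities-with-Key-references suggestion above (it requires the App Engine runtime; the property name a_ref and the variable some_value are assumptions):

```python
class AEntity(ndb.Model):
    ap = ndb.IntegerProperty()

class BEntity(ndb.Model):
    bp = ndb.StringProperty()
    # Reference to the "parent" AEntity, stored as a type-safe Key.
    a_ref = ndb.KeyProperty(kind=AEntity)

# Find the AEntity once, then query BEntity by the reference:
a = AEntity.query(AEntity.ap == int(some_value)).get()
bees = BEntity.query(BEntity.a_ref == a.key).fetch()
```

Note the trade-off: unlike ancestor queries, queries over a KeyProperty are only eventually consistent, but you escape the ~1 write/second entity-group limit.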
There is a give and take with ancestor queries: they are more verbose and messier to deal with, but you get better structure for your data and consistency in your queries.
To simplify this, if your handler knows which BEntity you want to get, just pass around the key.urlsafe()-encoded key; it already has all of your ancestor information encoded.
If this is not possible, try possibly restructuring your data. Since these objects are all of the same ancestor, they belong to the same entity group, thus at most you can insert/update ~1 time per second for objects in that entity group. If you require higher throughput or do not require consistent ancestral queries, then try using ndb.KeyProperty to link entities with a reference to a parent rather than as an ancestor. Then you'd only need to get a single parent to query on rather than the parent and the parent's parent.
You should also try and use IDs whenever possible, so you can avoid having to filter for entities in your datastore by properties and just reference them by ID:
BEntity.query(ancestor = ndb.Key("RootEntity", 1, "AEntity", int(some_value)))
Here, int(some_value) is the integer ID of the AEntity you used when you created that object. Just be sure that you can ensure the IDs you manually create/use will be unique across all instances of that Model that share the same parent.
EDIT:
To clarify, my last example should have made it more clear that I was suggesting restructuring the data so that int(some_value) is used as the integer ID of the AEntity, rather than storing it as a separate property of the entity - if possible, of course. In the example given, a query is performed for the AEntity that has a given integer field value of int(some_value), and it is executed with a get() - implying that you always expect a single result for that integer value. That makes it a good candidate to use as the integer ID of the object's key, eliminating the need for the query.

SQLAlchemy Custom Properties

I've got a table called "Projects" which has a mapped column "project". What I want is to define my own property on my mapped class, called "project", that performs some manipulation of the project value before returning it. This will of course create an infinite loop when I try to reference the row value. So my question is whether there's a way to set up my table mapper to use an alias for the project column, perhaps _project. Is there an easy way of doing this?
I worked it out myself in the end. You can specify an alternative name when calling orm.mapper:
orm.mapper(MappedClass, table, properties={'_project': table.c.project})
Have you checked the synonyms feature of SQLAlchemy?
http://www.sqlalchemy.org/docs/05/reference/ext/declarative.html#defining-synonyms
http://www.sqlalchemy.org/docs/05/mappers.html#synonyms
I use this pretty often to provide a proper setter/getter public API for properties that have a complicated underlying data structure, or where additional functionality/validation or whatever is needed.
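The underlying pattern, independent of SQLAlchemy, is to store the raw value under a private name and expose the manipulated value through a property; the mapper/synonym machinery above then maps the database column onto the private attribute (e.g. _project). A dependency-free sketch of the getter/setter side (the "manipulation" shown is an arbitrary assumption):

```python
class Project:
    """Stores the raw value in _project; the `project` property
    applies some manipulation on access."""

    def __init__(self, project):
        self._project = project  # what the column would be mapped to

    @property
    def project(self):
        # Illustrative manipulation: normalize whitespace and casing.
        return self._project.strip().title()

    @project.setter
    def project(self, value):
        self._project = value

p = Project("  secret lab ")
print(p.project)   # 'Secret Lab'  (manipulated view)
print(p._project)  # '  secret lab '  (raw stored value)
```

In a mapped class, SQLAlchemy reads and writes _project directly, so the property never recurses into itself.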

How do I get the value of a property corresponding to a SQLAlchemy InstrumentedAttribute?

Given a SQLAlchemy mapped class Table and an instance of that class t, how do I get the value of t.colname corresponding to the sqlalchemy.orm.attributes.InstrumentedAttribute instance Table.colname?
What if I need to ask the same question with a Column instead of an InstrumentedAttribute?
Given a list of columns in an ORDER BY clause and a row, I would like to find the first n rows that come before or after that row in the given ordering.
To get an object's attribute value corresponding to an InstrumentedAttribute, it should be enough to get the key of the attribute from its ColumnProperty and fetch it from the object:
t.colname == getattr(t, Table.colname.property.key)
If you have a Column it can get a bit more complicated because the property that corresponds to the Column might have a different key. There currently doesn't seem to be a public API to get from a column to the corresponding property on a mapper. But if you don't need to cover all cases, just fetch the attr using Column.key.
To support descending orderings you'll either need to construct the desc() inside the function or poke a bit at non-public APIs. The class of the descending modifier ClauseElement is sqlalchemy.sql.expression._UnaryExpression. To see if it is descending, check whether its .modifier attribute is sqlalchemy.sql.operators.desc_op; in that case you can get at the column inside it via the .element attribute. But as you can see it is a private class, so watch for changes in that area when upgrading versions.
Checking for descending still doesn't cover all the cases. Fully general support for arbitrary orderings needs to be able to rewrite full SQL expression trees, replacing references to a table with corresponding values from an object. Unfortunately this isn't possible with public APIs at the moment. The traversal and rewriting part is easy with sqlalchemy.sql.visitors.ReplacingCloningVisitor; the complex part is figuring out which column maps to which attribute given inheritance hierarchies, mappings to joins, aliases, and probably some more parts that escape me for now. I'll take a shot at implementing this visitor; maybe I can come up with something robust enough to be worth integrating into SQLAlchemy.
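A small sketch of the InstrumentedAttribute-to-value lookup described above (assuming SQLAlchemy 1.4+; the model, column names, and value are made up, and the attribute key deliberately differs from the database column name to illustrate the Column.key caveat):

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Table(Base):
    __tablename__ = "t"
    id = Column(Integer, primary_key=True)
    # Python attribute `colname` mapped to a differently named column.
    colname = Column("db_col", String)

t = Table(colname="hello")

# InstrumentedAttribute -> attribute key -> value on the instance.
key = Table.colname.property.key  # 'colname'
value = getattr(t, key)
print(value)  # 'hello'

# Starting from a Column instead, Column.key covers the simple cases,
# but here it differs from the mapped attribute key:
col = Table.__table__.c.db_col
print(col.key)  # 'db_col'
```

This is why going from a Column back to the mapped attribute is the harder direction: Column.key and the ColumnProperty key need not match.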
