Why is a pickled SQLAlchemy model object smaller than its pickled `__dict__`? - python

The same attributes stored in __dict__ are needed to restore the object, right?

I think a SQLAlchemy RowProxy uses _row, a tuple, to store the value. It doesn't have a __dict__, so no storage overhead of a _dict__ per row. Its _parent object has fields which store the column names to index pos in tuple lookup. Pretty common thing to do if you are trying to cut on down sql fetching result sizes - the column list is always the same for each row of the same select, so you rely on a common parent to keep track of which index of the tuple holds which column rather than having your own per-row __dict__.
Additional advantage is that, at the db lib connect level, sql cursors return (always?) their values in tuples, so you have little processing overhead. But a straight sql fetch is just that, a cursor descr & a bunch of disconnected rows with tuples in them - SQLALchemy bridges that and allows to use column names.
Now, as to how the unpickling process goes, you'd have to look at the actual implementation.

Related

session.execute returns model objects instead of actual data

I have switched from connection.execute to session.execute. I am not able to get usable data from it. The results seem to contain references to models instead of actual row data.
with Session(engine) as s:
q = select(WarrantyRequest)
res = s.execute(q)
keys = res.keys()
data_list = res.all()
print(keys) # should print list of column names
print(data_list) # should print list of lists with row data
dict_list = s.execute(q).mappings().all()
print(dict_list) # should print list of dicts with column names as keys
It prints
RMKeyView(['WarrantyRequest'])
[(<models.mock.WarrantyRequest object at 0x7f4d065d3df0>,), ...]
[{'WarrantyRequest': <models.mock.WarrantyRequest object at 0x7f002b464df0>}, ... ]
When doing the same with connection.execute, I got the expected results.
What am I missing?
There is this paragraph in the docs which kind of describes the behaviour, but I am not able to tell what I am supposed to do to get data out of it.
It’s important to note that while methods of Query such as Query.all() and Query.one() will return instances of ORM mapped objects directly in the case that only a single complete entity were requested, the Result object returned by Session.execute() will always deliver rows (named tuples) by default; this is so that results against single or multiple ORM objects, columns, tables, etc. may all be handled identically.
If only one ORM entity was queried, the rows returned will have exactly one column, consisting of the ORM-mapped object instance for each row. To convert these rows into object instances without the tuples, the Result.scalars() method is used to first apply a “scalars” filter to the result; then the Result can be iterated or deliver rows via standard methods such as Result.all(), Result.first(), etc.
Querying a model
q = select(WarrantyRequest)
returns rows that consist of individual model instances. To get rows of raw data instead, query the model's __table__ attribute:
q = select(WarrantyRequest.__table__)
SQLAlchemy's ORM layer presents database tables and rows in an object-oriented way, on the assumption that the programmer wants to work with objects and their attributes rather than raw data.

How to implement this in python : Check if a vertex exist, if not, create a new vertex

I want to create a neptune db, and dump data to it. I download historical data from DynamoDB to S3, these files in csv format. The header in these csv like:
~id, someproperties:String, ~label
Then, I need to implement real-time streaming to this neptune db through lambda, in the lambda function, I will check if one vertex(or edges) exist or not, if exist, I will update the vertex(or edges), otherwise I creat a new one.
In python, my implementation like this:
g.V().hasLabel('Event').has(T.id, event['Id']).fold().coalesce(unfold(), addV('Event').property(T.id, event['Id'])).property(Cardinality.single, 'State', event['State']).property('sourceData', event['sourceData']).next()
Here I have some questions:
In real-time streaming, I need to query if vertex with a id already
there, so I need to query the nodes of historical data, so can
has(T.id, event['Id']) do this? or should I just use
has(id, event['Id']) or has("id", event['Id']) ?
I was using g.V().has('Event', T.id, event['Id']) instead of
g.V().hasLabel('Event').has(T.id, event['Id']), but got error
like cannot local NeptuneGraphTraversal.has(). Are these two
queries same thing?
Here's the three bits of Gremlin you had a question about:
g.V().has(T.id, "some-id")
g.V().has(id, "some-id")
g.V().has("id", "some-id")
The first two will return you the same result as id is a member of T (as a point of style, Gremlin users typically statically import id so that it can be referenced that way for brevity). The last traversal is different from the first two because, as a String value it refers to a standard property key named "id". Generally speaking, TinkerPop would recommend that you not use a property key name like "id" or "label" as it can lead to mistakes and confusion with values of T.
As for the second part of your question revolving around:
g.V().has('Event', T.id, event['Id'])
g.V().hasLabel('Event').has(T.id, event['Id'])
You can't pass T.id to the 3-ary form of has() as Kelvin points out as the step signature only allows a String in that second position. It also wouldn't make sense to allow T there because T.label is already accounted for by the first argument and T.id refers to the actual graph element identifier. If you know that value then you wouldn't bother specifying the T.label in the first place, as the T.id already uniquely identifies the element. You would just do g.V(event['Id']).

Storing Python Lists in SQL Database

I have a SQL database that I store python lists in. Currently I convert the list to a string and then insert it into the database (using sqlite3) i.e.
foo = [1,2,3]
foo = str(foo)
#Establish connection with database code here and get cursor 'cur'
cur.execute("INSERT INTO Table VALUES(?, ?)", (uniqueKey, foo,))
It seems strange to convert my list to a string first, is there a better way to do this?
Replace your (key, listdata) table with (key, index, listitem). The unique key for the table becomes (key, index) instead of just key, and you'll want to ensure as a consistency condition that the set of indexes in the table for any given key is contiguous starting from 0.
You may or may not also need to distinguish between a key whose list is empty and a key that doesn't exist at all. One way is to have two tables (one of lists, and one of their elements), so that an empty but existing list is naturally represented as a row in the lists table with no corresponding rows in the elements table. Another way is just to fudge it and say that a row with index=null implies that the list for that key is empty.
Note that this is worthwhile if (and probably only if) you want to act on the elements of the list using SQL (for example writing a query to pull the last element of every list in the table). If you don't need to do that, then it's not completely unreasonable to treat your lists as opaque data in the DB. You're just losing the ability for the DB to "understand" it.
The remaining question then is how best to serialize/deserialize the list. str/eval does the job, but is a little worrying. You might consider json.dumps / json.loads, which for a list of integers is the same string format but with more safety restrictions in the parser. Or you could use a more compact binary representation if space is an issue.
2 ways.
Normalize tables, to you need setup new table for list value. so you get something like "TABLE list(id)" and "TABLE list_values(list_id, value)".
You can serialize the list and put in a column. Ex. Json, XML and so on (its not a very good practice in SQL).

Google App Engine non-indexed list property?

I have this property in GAE:
memberNames = db.StringListProperty(indexed = False)
But for unindexed properties, they usually don't cost me any writes (just the basic write to put the file), but with this property, I'm getting writes for every string in the list. Am I not allowed to have non-indexed ListProperties? Is my only other choice to use a JSON array string or is there a way around this?
StringListProperty creates several index rows (one row per value) and is stored only in the index, so you are correct that you need to implement your own serialization (e.g. JSON) of multi-value-properties rather than using StringListProperty to eliminate the index writes.

Reorder of SQLAlchemy Query results based on external ranking

The results of an ORM query (e.g., MyObject.query()) need to be ordered according to a ranking algorithm that is based on values not within the database (i.e. from a separate search engine). This means 'order_by' will not work, since it only operates on fields within the database.
But, I don't want to convert the query results to a list, then reorder, because I want to maintain the ability to add further constraints to the query. E.g.:
results = MyObject.query()
results = my_reorder(results)
results = results.filter(some_constraint)
Is this possible to accomplish via SQLAlchemy?
I am afraid you will not be able to do it, unless the ordering can be derived from the fields of the object's table(s) and/or related objects' tables which are in the database.
But you could return from your code the tuple (query, order_func/result). In this case the query can be still extended until it is executed, and then resorted. Or you could create a small Proxy-like class, which will contain this tuple, and will delegate the query-extension methods to the query, and query-execution methods (all(), __iter__, ...) to the query and apply ordering when executed.
Also, if you could calculate the value for each MyObject instance beforehand, you could add a literal column to the query with the values and then use order_by to order by it. Alternatively, add temporary table, add rows with the computed ordering value, join on it in the query and add ordering. But I guess these are adding more complexity than the benefit they bring.

Categories

Resources