I'm using Flask with py2neo for my REST service, and I have a user node with the label "User".
How do I auto-increment an id for the "User" label in Neo4j using py2neo?
You don't, and you probably shouldn't. Neo4j already provides an internal id field that is an auto-incrementing integer. It isn't a property of the node, but is accessible via the id() function, like this:
MATCH (n:Person)
RETURN id(n);
So whenever you create any node, this happens automatically, for free, inside Neo4j; py2neo plays no part in it.
If you need a different kind of identifier for your code, I'd recommend something that's plausibly globally unique, such as a UUID, which is very easy to generate in Python, rather than an auto-incrementing integer.
The trouble with auto-incrementing numbers as IDs is that, because they follow a pattern, people come to rely on the value of the identifier, or on expectations about how it will be assigned. This is almost always a bad idea in databases. The sole purpose of an identifier is to be distinct from every other identifier; it doesn't mean anything, and in some cases isn't even guaranteed not to change. Avoid embedding any reliance on a particular value or assignment scheme into your code.
That's why I like UUIDs: their assignment scheme is essentially arbitrary, and they clearly don't mean anything, so they don't tempt designers to do anything clever with them. :)
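To make that concrete, here is a minimal sketch of assigning a UUID property at node-creation time. It assumes a py2neo v3-style API (Graph, Node, graph.create) and a local Neo4j instance; the create_user helper is illustrative only:

import uuid

from py2neo import Graph, Node

graph = Graph()  # assumes a local Neo4j instance; pass a URI/auth if yours differs

def create_user(name):
    # Store an application-level identifier as an ordinary property,
    # rather than leaning on Neo4j's internal id().
    user = Node("User", uuid=str(uuid.uuid4()), name=name)
    graph.create(user)
    return user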
I just jumped into Django for a quick project and noticed there is a UUIDField in the models module.
I am using it for an external-id field that every model will have, to expose the object. Will the default parameter handle uniqueness, or do I have to enforce it in save()? I know there is practically no chance of values colliding, but I'd like to know how it is handled internally.
How does the UUID module guarantee unique values each time?
RFC 4122 (the specification the uuid module implements) defines three algorithms for generating UUIDs:
Using IEEE 802 MAC addresses as a source of uniqueness
Using pseudo-random numbers
Using well-known strings combined with cryptographic hashing
In all cases the seed value is combined with the system clock and a clock-sequence value (to maintain uniqueness in case the clock is set backwards). As a result, UUIDs generated by the mechanisms above will be distinct from all other UUIDs that have been or will be assigned.
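Python's standard uuid module exposes each of these algorithms directly; a quick sketch (standard library only):

import uuid

# MAC address + timestamp + clock sequence (RFC 4122 version 1)
print(uuid.uuid1())

# Pseudo-random numbers (version 4)
print(uuid.uuid4())

# Well-known string in a namespace, hashed with MD5 (v3) or SHA-1 (v5)
print(uuid.uuid3(uuid.NAMESPACE_DNS, 'example.com'))
print(uuid.uuid5(uuid.NAMESPACE_DNS, 'example.com'))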
Taken from RFC 4122 Abstract:
A UUID is 128 bits long, and can guarantee uniqueness across space and time.
Note: Because of this uniqueness property of UUIDs, Django does no internal check (as mentioned by @FlipperPA) for whether another object with the same UUID already exists.
Django doesn't enforce the uniqueness of UUIDs. That's because the main use case for UUIDs is to provide an identifier that can be expected to be unique without having to check with a centralized authority (like a database, which is what unique=True does).
(Note that UUIDs are not guaranteed to be unique, there is just an astronomically small chance of a collision.)
You certainly can use the database to enforce uniqueness on top of the UUIDs if you want (by setting unique=True on your field), but I would say that's an unusual, and hard to justify, configuration.
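If you do want that database-level guarantee, a hedged sketch of the usual field declaration (the model and field names are made up for illustration):

import uuid

from django.db import models

class Widget(models.Model):
    # default=uuid.uuid4 draws a fresh UUID per object. Django itself never
    # checks for collisions; unique=True adds a database constraint that
    # would reject the astronomically unlikely duplicate.
    external_id = models.UUIDField(default=uuid.uuid4, unique=True, editable=False)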
No, it does not. Here is the relevant part of the code from Django:
def get_db_prep_value(self, value, connection, prepared=False):
    if isinstance(value, six.string_types):
        value = uuid.UUID(value.replace('-', ''))
    if isinstance(value, uuid.UUID):
        if connection.features.has_native_uuid_field:
            return value
        return value.hex
    return value
As you can see, when preparing the value for the database, it simply builds a uuid.UUID after stripping hyphens; there is no check for uniqueness against existing rows. That said, UUIDField inherits from Field, which will obey a unique=True definition on the model.
I am fond of using UUIDs as primary keys, and I am fond of not delivering 500 errors to end users for an operation as simple as creating a login, so I have the following classmethod in my model. I siphon off some pre-assigned, reserved GUIDs for synthetic transactions on the production database and don't want those colliding either. Cosmic lightning has struck before: a variant of the code below (instrumented to report collisions) has actually fired the second attempt at GUID assignment. The code shown still risks a concurrent write collision from a different app server, so my views fall back to this method if write/create operations in the view fail.
I do acknowledge that this code is slower by the cost of the extra database lookup, but since the GUID is my primary key, it is not ridiculously expensive when the underlying database uses a b-tree index on the field.
@classmethod
def attempt_to_set_guid(cls, attemptedGuid=None):
    # Assumes `from uuid import uuid4` at module level.
    while True:
        # Draw a fresh GUID if none was supplied, or if the candidate
        # collides with one of the reserved GUIDs.
        if attemptedGuid is None or attemptedGuid in cls.reserved_guids:
            attemptedGuid = uuid4()
            continue
        try:
            # If this lookup succeeds, the GUID is already taken: retry.
            Guid.objects.get(guid=attemptedGuid)
            attemptedGuid = None
        except Guid.DoesNotExist:
            break  # no existing row, so this GUID is free to use
    return attemptedGuid
I'm working with SQLAlchemy for the first time and was wondering: generally speaking, is it enough to rely on Python's default equality semantics when working with SQLAlchemy, versus id (primary key) equality?
In other projects I've worked on in the past using ORM technologies like Java's Hibernate, we'd always override .equals() to check for equality of an object's primary key/id, but when I look back I'm not sure this was always necessary.
In most if not all cases I can think of, you only ever had one reference to a given object with a given id. And that object was always the attached object so technically you'd be able to get away with reference equality.
Short question: should I be overriding __eq__() and __hash__() for my business entities when using SQLAlchemy?
Short answer: No, unless you're working with multiple Session objects.
Longer answer, quoting the awesome documentation:
The ORM concept at work here is known as an identity map and ensures that all operations upon a particular row within a Session operate upon the same set of data. Once an object with a particular primary key is present in the Session, all SQL queries on that Session will always return the same Python object for that particular primary key; it also will raise an error if an attempt is made to place a second, already-persisted object with the same primary key within the session.
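A minimal sketch of the identity map in action, assuming a recent SQLAlchemy (1.4+) and an in-memory SQLite database; the User model is illustrative:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add(User(id=1, name='alice'))
session.commit()

a = session.query(User).filter_by(id=1).one()
b = session.query(User).filter_by(id=1).one()
assert a is b  # identity map: one Python object per primary key per Session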
I had a few situations where my SQLAlchemy application would load multiple instances of the same object (multithreading / different SQLAlchemy sessions...). It was absolutely necessary to override __eq__() for those objects, or I would get various problems. This could be a problem in my application design, but it probably doesn't hurt to override __eq__() just to be sure.
On the database side, I gather that a natural primary key is preferable as long as it's not prohibitively long, which can cause indexing performance problems. But as I'm reading through projects that use sqlalchemy via google code search, I almost always find something like:
class MyClass(Base):
    __tablename__ = 'myclass'
    id = Column(Integer, primary_key=True)
If I have a simple class, like a tag, where I only plan to store one value and require uniqueness anyway, what do I gain through a surrogate primary key when I'm using SQLAlchemy? One of the SQL books I'm reading suggests ORMs are a legitimate use of the 'antipattern,' but the ORMs he envisions sound more like ActiveRecord or Django. This comes up in a few places in my model, but here's one:
class Tag(Base):
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True)  # should I drop this and add primary_key to Tag.tag?
    tag = Column(Unicode(25), unique=True)
    ...
In my broader, relational model, Tag has multiple many-to-many relationships with other objects. So there will be a number of intermediate tables that have to store a longer key. Should I pick tag or id for my primary key?
Although ORMs and programming languages make some usages easier than others, I think that choosing a primary key is a database design problem unrelated to the ORM. It is more important to get the database schema right on its own grounds. Databases tend to outlive the code that accesses them, anyway.
Search SO (and Google) for more general questions on how to choose a primary key, e.g.: https://stackoverflow.com/search?q=primary+key+natural+surrogate+database-design (Surrogate vs. natural/business keys, Relational database design question - Surrogate-key or Natural-key?, When not to use surrogate primary keys?, ...)
I assume that Tag table will not be very large or very dynamic.
In this case I would try to use tag as the primary key (see the sketch after this list), unless there are important reasons to add a primary key that is invisible to the end user, e.g.:
poor performance under real world data (measured, not imagined),
frequent changes of tag names (but then, I'd still use some unique string based on first used tag name as key),
invisible behind-the-scenes merging of tags (but, see previous point),
problems with different collations -- comparing international data -- in your RDBMS (but, ...)
...
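In that spirit, a hedged sketch of the natural-key variant of the Tag class from the question (same Base and imports as the question's snippet):

class Tag(Base):
    __tablename__ = 'tag'
    # The tag string itself is the primary key; no surrogate id column.
    tag = Column(Unicode(25), primary_key=True)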
In general I observed that people tend to err in both directions:
by using complex multi-field "natural" keys (where particular fields are themselves opaque numbers), when table rows have their own identity and would benefit from having their own surrogate IDs,
by introducing random numeric codes for everything, instead of using short meaningful strings.
Meaningful primary key values -- if possible -- will prove themselves useful when browsing database by hand. You won't need multiple joins to figure out your data.
Personally I prefer surrogate keys in most places. The two biggest reasons are: 1) integer keys are generally smaller/faster, and 2) updating data doesn't require cascades. That second point is fairly important for what you are doing: if several many-to-many tables reference the tag table, remember that if someone wants to update a tag (e.g., to fix a spelling/case mistake, or to use a more or less specific word), the update will need to be done across all of those tables at the same time.
I'm not saying that you should never use a natural key -- If I am certain that the natural key will never be changed, I will consider a natural key. Just be certain, otherwise it becomes a pain to maintain.
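To illustrate the cascade concern, here is a sketch of a hypothetical association table that references Tag by its natural key (post_tags and the post table are invented for this example):

from sqlalchemy import Column, ForeignKey, Integer, Table, Unicode

# Every row stores the full tag string, so renaming a tag rewrites all of
# these rows unless the database cascades the update for you.
post_tags = Table(
    'post_tags', Base.metadata,
    Column('post_id', Integer, ForeignKey('post.id'), primary_key=True),
    Column('tag', Unicode(25),
           ForeignKey('tag.tag', onupdate='CASCADE'), primary_key=True),
)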
Whenever I see people (over)using surrogate keys, I remember Roy Hann's blog articles regarding this topic, especially the second and the third article:
http://community.actian.com/forum/blogs/rhann/127-surrogate-keys-part-2-boring-bit.html
http://community.actian.com/forum/blogs/rhann/128-surrogate-keys-part-3-surrogates-composites.html
I strongly suggest reading them, as these articles come from a person who has spent a few decades as a database expert.
Nowadays, overuse of surrogate keys reminds me of the early years of the 21st century, when people used XML for literally everything, both where it belonged and where it did not.
I'm currently working with Google's App Engine and could not find out whether a datastore entity has an ID by default, and if not, how I add such a field and have it increase automatically.
An object has a Key, part of which is either an automatically-generated numeric ID, or an assigned key name. IDs are not guaranteed to be increasing, and they're almost never going to be consecutive because they're allocated to an instance in big chunks, and IDs unused by the instance to which they're allocated will never be used by another instance (at least, not currently). They're also only unique within the same entity group for a kind; they're not unique to the entire kind if you have parent relationships.
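For example, with the Python ndb client library (a hedged sketch; the User model is illustrative):

from google.appengine.ext import ndb

class User(ndb.Model):
    name = ndb.StringProperty()

user = User(name='alice')
key = user.put()     # the datastore assigns a numeric ID when no key name is given
print(key.id())      # the auto-assigned integer ID (or the key name, if one was set)
print(key.kind())    # 'User'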
Yes, they have IDs by default, and it is named ID as you mentioned.
I'd also add that, per the documentation, the id is not guaranteed to increase:
An application should not rely on numeric IDs being assigned in increasing order with the order of entity creation. This is generally the case, but not guaranteed.
Given a SQLAlchemy mapped class Table and an instance t of that class, how do I get the value of t.colname corresponding to the sqlalchemy.orm.attributes.InstrumentedAttribute instance Table.colname?
What if I need to ask the same question with a Column instead of an InstrumentedAttribute?
Given a list of columns in an ORDER BY clause and a row, I would like to find the first n rows that come before or after that row in the given ordering.
To get an object's attribute value corresponding to an InstrumentedAttribute, it should be enough to get the key of the attribute from its ColumnProperty and fetch it from the object:
t.colname == getattr(t, Table.colname.property.key)
If you have a Column, it can get a bit more complicated, because the property that corresponds to the Column might have a different key. There currently doesn't seem to be a public API for getting from a column to the corresponding property on a mapper. But if you don't need to cover all cases, just fetch the attribute using Column.key.
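For that simple case, a sketch (this assumes the declarative __table__ accessor and that the mapped property keeps the column's key):

# Fetch via the Column's key; works as long as the mapped property
# was not renamed away from the column name.
value = getattr(t, Table.__table__.c.colname.key)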
To support descending orderings you'll either need to construct the desc() inside the function or poke a bit at non-public APIs. The class of the descending-modifier ClauseElement is sqlalchemy.sql.expression._UnaryExpression. To see whether it is descending, check whether its .modifier attribute is sqlalchemy.sql.operators.desc_op; in that case you can get at the column inside it via the .element attribute. But as you can see, it is a private class, so watch for changes in that area when upgrading versions.
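Putting that together, a hedged sketch built on the private names mentioned above (so expect it to break across SQLAlchemy versions):

from sqlalchemy.sql import operators
from sqlalchemy.sql.expression import _UnaryExpression

def unwrap_order_by(clause):
    # Return (column, descending) for a single ORDER BY element.
    if isinstance(clause, _UnaryExpression) and clause.modifier is operators.desc_op:
        return clause.element, True   # desc(col): unwrap the inner column
    return clause, False              # plain column: ascending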
Checking for descending still doesn't cover all the cases. Fully general support for arbitrary orderings needs to rewrite full SQL expression trees, replacing references to a table with the corresponding values from an object. Unfortunately that isn't possible with public APIs at the moment. The traversal-and-rewriting part is easy with sqlalchemy.sql.visitors.ReplacingCloningVisitor; the complex part is figuring out which column maps to which attribute, given inheritance hierarchies, mappings to joins, aliases, and probably some more parts that escape me for now. I'll take a shot at implementing this visitor; maybe I can come up with something robust enough to be worth integrating into SQLAlchemy.