I have the following model class.
class Human(db.Model):
    email = db.StringProperty(required=True)
    date = db.DateTimeProperty(auto_now=True)
    checksum = db.IntegerProperty(required=True)
    version = db.IntegerProperty(required=True)
    content = blobstore.BlobReferenceProperty(required=True)
Currently, to ensure email uniqueness at the database level (i.e. no duplicated email anywhere in the datastore), I am using the following method.
h = human.Human(key_name='yccheok@yahoo.com', email='yccheok@yahoo.com', checksum=456, version=1281, content=blob_key)
I am not sure whether this is a good way to do it. Is there a better way?
This is really the only way to do it.
The email property is probably redundant in this case, since you're already storing the data in the key name.
The only other option is to give all of the Human entities the same parent, placing them together in a single entity group. That would let you check for an existing entity with the same email inside a transaction before inserting, but it would also limit you to roughly one write per second across all of your Human entities (and any of their children), which is fine for a small low-traffic site but will kill scalability.
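For what it's worth, db.Model.get_or_insert wraps exactly this key-name pattern in a transaction, so two concurrent requests with the same email cannot both insert. A minimal sketch using the values from the question:
email = 'yccheok@yahoo.com'  # illustrative value
h = human.Human.get_or_insert(
    key_name=email,
    email=email,
    checksum=456,
    version=1281,
    content=blob_key,
)
# Note: if an entity with that key name already exists, get_or_insert
# returns the existing entity instead of raising, so compare h's fields
# if you need to detect that case.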
I have two tables: one is Anonym, the other is Userdatabase. I want my app to work without requiring any login info, so by default it uses only Anonym, keyed by the user's device id, to track account information. If a user wants to access extra features, they need to create an account with a username/password, and from then on I process their data through the Userdatabase table. A user can have multiple devices, so there is a OneToMany relationship there, but a device doesn't have to have a User (users don't need to register), which breaks the relationship. Is there a way to make the Userdatabase side optional while keeping the OneToMany relationship? Perhaps by adding a method or another class within Userdatabase? Please find the code below:
--Models--
class Anonym(models.Model):
    deviceid = models.ForeignKey(Userdatabase, max_length=200, on_delete=models.SET_NULL, null=True)
    accounttype = models.TextField(default='Free')
    numberofattempts = models.IntegerField(default=0)
    created = models.DateField(auto_now_add=True)

class Userdatabase(models.Model):
    username = models.CharField(max_length=20, unique=True)
    password = models.CharField(max_length=20)
    deviceid = models.TextField(default='inputdeviceid')
    accounttype = models.TextField(default='Free')
    numberofattempts = models.IntegerField(default=0)
    created = models.DateField(auto_now_add=True)
--urls--
urlpatterns=[path('deviceregister/<str:id>/',views.deviceregistration)]
--views--
def deviceregistration(request, id):
    import time
    deviceid = id
    newdevice = models.Anonym(created=time.strftime("%Y-%m-%d"), deviceid=deviceid)
    newdevice.save()
    return HttpResponse('Succesful registration')
When I send a request such as '/deviceregister/123456/', Django raises a ValueError saying: Cannot assign "'123456'": "Anonym.deviceid" must be a "Userdatabase" instance.
You should assign through the field name that holds the raw id value; in your case that is deviceid_id:
newdevice = models.Anonym(created=time.strftime("%Y-%m-%d"), deviceid_id=deviceid)
Alternatively, deviceid itself must be a Userdatabase instance, so fetch one first:
deviceid = Userdatabase.objects.get(pk=id)
newdevice = models.Anonym(created=time.strftime("%Y-%m-%d"), deviceid=deviceid)
In my opinion, the field names in this project could really confuse anyone.
If you do not want to change your model, you can just link any newly added device to a dummy user. Later, when a user wants to claim a device, replace the dummy with the real user.
If you can change your model, you can remove the foreign key relationship and add another table that links the ids of both sides: one field for the device id and the other for the user id (see the sketch after this answer).
I know both options kind of smell, but at least they should work :)
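A rough sketch of that second option (the DeviceLink model name is just illustrative, and Anonym here stands in for the device table once its foreign key is removed):
class DeviceLink(models.Model):
    # One row per registered device; unregistered devices simply have no row,
    # so the Userdatabase side stays optional while one user can own many devices.
    user = models.ForeignKey(Userdatabase, on_delete=models.CASCADE)
    device = models.OneToOneField(Anonym, on_delete=models.CASCADE)
The OneToOneField on the device side keeps the relationship one-to-many: many link rows per user, at most one per device.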
I have the following models:
class Company(ndb.Model):
    name = ndb.StringProperty(indexed=False)
    # some other fields

class User(polymodel.PolyModel):
    company = ndb.KeyProperty(kind=Company)
    # some other fields

class Object(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    # some other fields
Now I have a user and I want to query Objects that are associated with other Users in the same company like this:
Object.query(Object.user.company == user.company)
Of course, this doesn't work, since Object.user is a key and I cannot access anything beyond that.
Is there any way to do it? I only need the company key; I was thinking of a ComputedProperty, but I'm not sure it's the best solution. Also, it would be better if I could query on any field of Company.
You need to denormalize and store redundant information, as the datastore doesn't support joins.
For instance, given your models above, a user can only belong to one company. If you really need to find all Objects whose user belongs to a particular company, then store the company key on the Object itself.
Use a computed property if that works best for you (a sketch follows below).
Alternatively, use a factory that always takes the User as an argument and construct Objects that way.
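For illustration, the ComputedProperty route might look something like this (a sketch, not a drop-in: it does one extra get() each time the entity is written):
class Object(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    # Denormalized copy of the owning user's company key, so it is queryable.
    company = ndb.ComputedProperty(
        lambda self: self.user.get().company if self.user else None)

# The query from the question then becomes:
objects = Object.query(Object.company == user.company).fetch()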
I have two classes:
class User(ndb.Model):  # key is user's email
    phone = ndb.IntegerProperty(indexed=False)
    ...

class Question(ndb.Model):
    user = ndb.KeyProperty(kind=User)
    ...
And use the following code to add user's question to the datastore:
q = Question()
...
user = User.get_by_id(email)
if user:
    q.user = ndb.Key(User, email)
(questions could be added by unknown users)
Am I doing it correctly? Should I optimize the code somehow (use keys_only?) to decrease the number of NDB read/write operations?
Would the user already be logged in? If yes, then you could store the user's key in the session and leave out the get_by_id.
Apart from that, get_by_id is pretty efficient, but if the user entity is big you should consider splitting it (because fetching a smaller entity is cheaper). Another cool thing about get_by_id is that it'll use memcache first, then the datastore if the entity wasn't in memcache. If the user is asking lots of questions the entity will probably be in memcache.
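A minimal sketch of that session idea (the session object and the text property here are illustrative assumptions, not part of the question's code):
def add_question(session, email, text):
    q = Question(text=text)  # 'text' is a hypothetical extra property
    cached = session.get('user_key')  # urlsafe user key stored at login time
    if cached:
        q.user = ndb.Key(urlsafe=cached)
    elif User.get_by_id(email):  # one read only on a cache miss
        q.user = ndb.Key(User, email)
        session['user_key'] = q.user.urlsafe()
    q.put()
    return q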
I have a Django application that gathers information about composers (in the musical sense) from various sources - APIs, HTTP POSTs, scraping, and so on.
Once this information is aggregated, it's not very high quality. So you might have "J S Bach" in one place, "J. S. Bach" in another, and various other mistakes. This leads to several rows in my table that represent the same person.
I want to eliminate these duplicates, by making "J. S. Bach" the canonical version, and have it so that if we ever see "J S Bach", we know to correct it. In reality, there are quite a lot of variations, but I'm happy for the process of correction to be a manual one with human input.
So my question is, what's the best way to represent this in code? At the moment, my model is:
class Composer(models.Model):
    name = models.CharField(max_length=100)
Should I:
Have a new ComposerCorrection model, that maps composer_id to canonical_id?
Add an optional canonical_id to the Composer model?
Some other thing I've not considered?
It's also worth mentioning that there are other relationships that involve composer, such as a Work belonging to a Composer. When a correction happens, these IDs would also need to be re-pointed somehow, but I think that's not part of the main problem here.
Let me know if you'd like any more information!
Adding on to VascoP's answer (I'd make this a comment, but there's a little too much code in it): you could store his replace_dic in the database so that you can add corrections through e.g. the Django admin, without having to change any code. This might look like:
class ComposerCorrection(models.Model):
    wrong_name = models.CharField(max_length=100, unique=True)
    canonical_name = models.CharField(max_length=100)

def correct_name(name):
    try:
        return ComposerCorrection.objects.get(wrong_name=name).canonical_name
    except ComposerCorrection.DoesNotExist:
        return name
Then you can put correct_name in the save() method of Composer (or as a pre-save signal), and also add VascoP's correctComposer function as a post-save signal for ComposerCorrection objects, so that adding a new one will fix the database without having to do anything else.
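Wiring that up with signals might look roughly like this (a sketch; it assumes the models and functions defined above and in VascoP's answer below):
from django.db.models.signals import pre_save, post_save
from django.dispatch import receiver

@receiver(pre_save, sender=Composer)
def normalize_composer_name(sender, instance, **kwargs):
    # Rewrite the name before the Composer row is saved.
    instance.name = correct_name(instance.name)

@receiver(post_save, sender=ComposerCorrection)
def apply_new_correction(sender, instance, created, **kwargs):
    # Retroactively fix existing rows when a new correction is added
    # (assumes a Composer with the wrong name actually exists).
    if created:
        correctComposer(instance.canonical_name, instance.wrong_name)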
When you find a wrongly named Composer you should update these relationships and remove the wrongly named Composer:
def correctComposer(canonical_composer_name, wrong_composer_name):
    canonical_composer = Composer.objects.get(name__exact=canonical_composer_name)
    wrong_composer = Composer.objects.get(name__exact=wrong_composer_name)

    # repeat this for each relationship that points at Composer
    work = wrong_composer.work_set.all()
    for entry in work:
        entry.composer = canonical_composer
        entry.save()

    wrong_composer.delete()
EDIT: That works for previously inserted Composers. For auto-correcting on insertion, a different method can be used, since we don't need to create a new Composer if a canonical one that matches already exists.
For this you can keep a dictionary of frequent mistakes (which should live near the model for readability) and a correctNames function:
replace_dic = {
    'motzart': 'Mozart',
    'j s bach': 'J. S. Bach',
}

def correctNames(name, dic):
    return dic.get(name.lower(), name)
By making keys lowercase you get case-insensitive replacement which is kind of a bonus.
And then you might override the Composer save method like this:
def save(self, *args, **kwargs):
    self.name = correctNames(self.name, replace_dic)
    super(Composer, self).save(*args, **kwargs)
If Composer only contains name until data collection is finished, then for simplicity I might choose not to normalize composer names into Composer at first, but store them directly on the Work instance, as in:
class Work(models.Model):
    composer_name = models.CharField(max_length=100)
    ...
Then manually filter by composer name and perform batch updates in the Work admin changelist, with the help of a list filter and an admin action (see the sketch below).
You could then create Composer instances and link Work instances to them, or even use composer_name as the primary key of Composer.
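A sketch of those admin pieces (the action below hard-codes one canonical name purely for illustration):
from django.contrib import admin

def rename_to_bach(modeladmin, request, queryset):
    # Hypothetical batch action: rewrites the selected Work rows in one UPDATE.
    queryset.update(composer_name='J. S. Bach')
rename_to_bach.short_description = 'Set composer_name to "J. S. Bach"'

class WorkAdmin(admin.ModelAdmin):
    list_display = ('composer_name',)
    list_filter = ('composer_name',)
    actions = [rename_to_bach]

admin.site.register(Work, WorkAdmin)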
After building a few applications on the GAE platform, I end up using relationships between different models in the datastore in basically every application, and I often need to find records that share the same parent (e.g. matching all entries with the same parent).
From the beginning I used the db.ReferenceProperty to get my relations going, like:
class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()
    parentFoo = db.ReferenceProperty(Foo)

fooKey = someFooKeyFromSomePlace
bars = Bar.all()
for bar in bars:
    if bar.parentFoo.key() == fooKey:
        # do stuff
But lately I've abandoned this approach, since bar.parentFoo.key() makes a sub-query to fetch Foo each time. The approach I use now is to store each Foo key as a string on Bar.parentFoo, so I can string-compare it with someFooKeyFromSomePlace and get rid of all the sub-query overhead.
Now I've started to look at entity groups and I'm wondering whether they are an even better way to go; I can't really figure out how to use them.
As for the two approaches above, are there any downsides to using them? Could storing the key as a string come back to bite me? And last but not least, is there a faster way to do this?
Tip:
replace...
bar.parentFoo.key() == fooKey
with...
Bar.parentFoo.get_value_for_datastore(bar) == fooKey
To avoid the extra lookup and just fetch the key from the ReferenceProperty
See Property Class
I think you should consider this as well. This will help you fetch all the child entities of a single parent.
bmw = Car(brand="BMW")
bmw.put()
lf = Wheel(parent=bmw,position="left_front")
lf.put()
lb = Wheel(parent=bmw,position="left_back")
lb.put()
bmwWheels = Wheel.all().ancestor(bmw)
For more on data modeling, you can refer to this Appengine Data Modeling article.
I'm not sure what you're trying to do with that example block of code, but I get the feeling it could be accomplished with:
bars = Bar.all().filter("parentFoo =", SomeFoo)
As for entity groups, they are mainly used if you want to alter multiple things in transactions, since appengine restricts that to entities within the same group only; in addition, appengine allows ancestor filters ( http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_ancestor ), which could be useful depending on what it is you need to do. With the code above, you could very easily also use an ancestor query if you set the parent of Bar to be a Foo.
If your purposes still require a lot of "subquerying" as you put it, there is a neat prefetch pattern that Nick Johnson outlines here: http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine which basically fetches all the properties you need in your entity set as one giant get instead of a bunch of tiny ones, which gets rid of a lot of the overhead. However do note his warnings, especially regarding altering the properties of entities while using this prefetch method.
Not very specific, but that's all the info I can give you until you're more specific about exactly what you're trying to do here.
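For reference, the prefetch pattern from that post looks roughly like this (a sketch along the lines of the article, not a verbatim copy; it assumes every entity actually has the reference set):
def prefetch_refprops(entities, *props):
    # Collect the referenced keys without dereferencing them one by one...
    fields = [(entity, prop) for entity in entities for prop in props]
    ref_keys = [prop.get_value_for_datastore(x) for x, prop in fields]
    # ...fetch them all in a single batch get...
    ref_entities = dict((x.key(), x) for x in db.get(list(set(ref_keys))))
    # ...and plug the fetched entities back into the reference properties.
    for (entity, prop), ref_key in zip(fields, ref_keys):
        prop.__set__(entity, ref_entities[ref_key])
    return entities

bars = prefetch_refprops(Bar.all().fetch(100), Bar.parentFoo)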
When you design your models you also need to consider whether you want to be able to save this within a transaction. However, only do this if you actually need transactions.
An alternative approach is to assign the parent like so:
from google.appengine.ext import db

class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()

def _save_entities(foo_name, bar_name):
    """Save the model data and return the parent key."""
    foo_item = Foo(name=foo_name)
    foo_item.put()
    bar_item = Bar(parent=foo_item, name=bar_name)
    bar_item.put()
    return foo_item.key()

def main():
    # Run the save in a transaction; if anything fails, it all rolls back
    foo_key = db.run_in_transaction(_save_entities, "foo name", "bar name")
    # Query the model data using the ancestor relationship
    for item in Bar.gql("WHERE ANCESTOR IS :ancestor", ancestor=foo_key).fetch(1000):
        pass  # do stuff