I'm new to app engine (and SO)...
I'm writing a Twitter bot that replies to mentions. I want it to remember the last tweet it replied to using since_id, but I'm not sure of the best way to store that ID so that the next time the page loads it can check it, reply to any mentions since then, and then overwrite the stored ID with the new one.
Do I use memcache for this?
Cheers!
Unless you're fine with losing the data, don't use memcache for data storage; it should only be used as a cache. The reason is that although you can set how long memcache should keep the data, it's documented that it may evict it at any time.
Create a datastore entity to hold the object.
Example:

class LastTweet(db.Model):
    since_id = db.IntegerProperty()
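A minimal sketch of reading and updating that entity (assuming the db API above; the 'singleton' key name is just an illustrative choice):

```python
from google.appengine.ext import db

class LastTweet(db.Model):
    since_id = db.IntegerProperty()

def get_since_id():
    # A fixed key name means there is only ever one record to look up.
    tweet = LastTweet.get_by_key_name('singleton')
    return tweet.since_id if tweet else None

def set_since_id(new_id):
    # Reusing the same key name overwrites the previous value.
    LastTweet(key_name='singleton', since_id=new_id).put()
```

On each run, call get_since_id() to build the mentions request, then set_since_id() with the newest tweet ID once the replies are sent.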
I want to add 'check username available' functionality on my signup page using AJAX. I have a few doubts about the way I should implement it.
1. With which event should I register my AJAX requests? We can send the requests when the user focuses out of the 'username' input field (blur event) or as he types (keyup event). Which provides a better user experience?
2. On the server side, a simple way of dealing with requests would be to query my main 'Accounts' database. But this could lead to a lot of requests hitting my database (even more if we POST using the keyup event). Should I maintain a separate model for registered usernames only and use that to get better results?
3. Is it possible to use memcache in this case? Initializing the cache with every username as a key, updating it as we register users, and using a random key to check whether the cache is actually initialized or pass the queries directly to the db.
Answers:
Do the check on blur. If you do it on keyup, you will hammer your server with unnecessary queries, annoy the user who is not yet done typing, and likely lag the typing anyway.
If your Account entity is very large, you may want to create a separate AccountName entity, and create a matching such entity whenever you create a real Account (but this is probably an unnecessary optimization). When you create the Account (or AccountName), be sure to assign id=name when you create it. Then you can do an AccountName.get_by_id(name) to quickly see if the AccountName has already been assigned, and it will automatically pull it from memcache if it has been recently dealt with.
By default, GAE NDB will automatically populate memcache for you when you put or get entities. If you follow my advice in step 2, things will be very fast and you won't have to mess around with pre-populating memcache.
If you are concerned about 2 people simultaneously requesting the same user name, put your create method in a transaction:
@classmethod
@ndb.transactional()
def create_account(cls, name, other_params):
    acct = Account.get_by_id(name)
    if not acct:
        acct = Account(id=name)  # plus the other parameter assignments
        acct.put()
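For point 2, the availability check itself could be a small handler along these lines (a sketch assuming webapp2 and an AccountName model keyed by the username; the names are illustrative, not from the original post):

```python
from google.appengine.ext import ndb
import webapp2

class AccountName(ndb.Model):
    pass  # the username itself is stored as the entity id

class CheckUsername(webapp2.RequestHandler):
    def get(self):
        name = self.request.get('username', '').strip().lower()
        # get_by_id consults NDB's memcache layer first, so repeated
        # checks for the same name rarely touch the datastore.
        taken = AccountName.get_by_id(name) is not None
        self.response.headers['Content-Type'] = 'application/json'
        self.response.write('{"available": %s}' % ('false' if taken else 'true'))
```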
I would recommend the blur event of the username field, combined with some sort of inline error/warning display.
I would also suggest maintaining a memcache of registered usernames to reduce DB hits and improve user experience - although probably not populating it with a warm-up, but instead only as requests are made. This is sometimes called a "Repository" pattern.
BUT, you can only populate the cache with USED usernames - you should not store the "available" usernames here (or if you do, use a much lower timeout).
You should always check directly against the DB/Datastore when actually performing the registration. And ideally in some sort of transactional method so that you don't have race conditions with multiple people registering.
BUT, all of this work is dependent on several things, including how busy your app is and what data storage tech you are using!
I'm writing a web app through Google App Engine and I'd like to have a script frequently update user profiles based on live, temporary information I'm getting from an XML feed. I'm doing this with a GAE background_thread so the site can continue to operate while this runs.
Outside this background thread, users can still navigate the website and thereby make changes to their profile.
The background thread does exactly what it should, updating user profiles based on the live XML data and writing the profiles back to the datastore. However, when a user makes a change to their profile, the background thread does not pick up on the change: the list returned by the ndb datastore query does not reflect the changes users make.
The curious detail is that it DOES reflect the changes correctly if a new user is added to the datastore; it just doesn't reflect changes if a preexisting user profile is modified. I should be able to query/put the datastore from a background thread, right?
The meat of the background thread:
def update_accounts():
    while True:
        # Get data from the XML feed.
        info_dict = get_live_data()
        # Get all the users from the GAE datastore.
        gprofiles = mUserStats.query()
        for profile in gprofiles:
            # This isn't the actual condition but there's a condition here.
            if needs_update in profile.m_needsUpdate:
                # Modify the current profile.
                profile.make_change(info_dict)
                # Write it back to the datastore.
                profile.put()
        # Sleep, as this doesn't need to run that frequently.
        time.sleep(20)

class updateAccounts():
    def start_thread(self):
        # Pass the function itself, not the result of calling it.
        t = background_thread.start_new_background_thread(target=update_accounts)
This is where profiles are modified:
def post(self):
    session = get_current_session()
    user_key = mUserStats_key(session['me'].m_email)
    curr_user = mUserStats.get_by_id(session['me'].m_email, user_key)
    curr_user.change_profile()
    curr_user.put()
Just some random thoughts, don't really know which would work best (if any at all):
Instead of doing profile.put() inside the loop, maybe you could store changed entities in a list and make an ndb.put_multi() call after the loop. This would reduce the number of datastore calls by the number of mUserStats entities you have, thus reducing the execution time and leaving fewer chances for a profile to be changed by a user while the background task is running.
If the gprofiles = mUserStats.query() line actually fetches whole entities maybe you could try doing keys_only=True and get each mUserStats entity individually inside the loop. This will increase the execution time and number of datastore calls by the number of mUserStats entities but there will be a lot less chances that an entity was changed by a user during the time it was fetched by the background task.
Are the properties updated by the XML feed the same properties updated by user? If not - maybe they could be stored in different models.
You could also take a look at query's cursors and iterators which might be helpful to automate suggestions 1 & 2.
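Suggestions 1 and 2 together might look something like this (a sketch assuming the mUserStats model and helpers from the question):

```python
from google.appengine.ext import ndb

def update_accounts_once(info_dict):
    # Suggestion 2: fetch keys only, then get each entity individually
    # so it is as fresh as possible at the moment it is modified.
    keys = mUserStats.query().fetch(keys_only=True)
    changed = []
    for key in keys:
        profile = key.get()
        if profile.m_needsUpdate:
            profile.make_change(info_dict)
            changed.append(profile)
    # Suggestion 1: write all modified profiles back in one batch.
    if changed:
        ndb.put_multi(changed)
```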
I have about 1000 user account entities like this:
class UserAccount(ndb.Model):
email = ndb.StringProperty()
Some of these email values contain uppercase letters, like JohnathanDough@email.com. I want to select all the email values from all UserAccount entities and apply Python's email.lower() to each. How can I do this efficiently and, most importantly, without errors?
Note: The email values are important for login, so I cannot afford to mess this up. Is there a way to back up this data in case I do make a mistake?
Thank you.
Yes, of course. Even though Datastore Administration is an experimental feature, we can back up and restore data without coding. Follow these instructions for the backup flow: Backing up data.
To process your data, the most efficient way is to use the MapReduce library.
MapReduce works, but it's an excessive complication if you've never used it before.
Use task queues instead: each task can handle one query result page, store the next page token, and start another task for the next page.
This is slower than MapReduce if you run the tasks sequentially, but 1000 entities is not much; it will probably be done in about a minute.
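For ~1000 entities, a simple cursor-driven loop (run after taking a Datastore Admin backup) is one way to sketch it; the batch size is an arbitrary choice:

```python
from google.appengine.ext import ndb

class UserAccount(ndb.Model):
    email = ndb.StringProperty()

def lowercase_all_emails(batch_size=200):
    cursor = None
    while True:
        accounts, cursor, more = UserAccount.query().fetch_page(
            batch_size, start_cursor=cursor)
        # Only rewrite entities whose email actually changes.
        changed = [a for a in accounts
                   if a.email and a.email != a.email.lower()]
        for a in changed:
            a.email = a.email.lower()
        ndb.put_multi(changed)
        if not more:
            break
```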
I'm not sure if it's just me or everyone, but I have the following code:
http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=barbara_volkwyn
http://api.twitter.com/1/statuses/user_timeline.xml?user_id=248623669
Apparently, according to the Twitter API, the user with screen_name = "barbara_volkwyn" has user id 248623669. However, when I run the first API call I get a totally different result, and what's even weirder is that the user object contained in the result of the second API call is not even the same user.
I wonder if anyone has the same problem; feel free to give it a try.
Regards,
Andy.
Your user ID for barbara_volkwyn isn't valid. It should be: 264882189.
You can fetch user IDs through the API or with https://tweeterid.com/
The user_ids reported by the Search API aren't the same as the user_ids used in the Twitter REST API -- unsure if that's where you found the user_id 248623669 or not though.
A timeline contains tweets, which in turn contain embedded (but cached) user objects, usually reflecting the state of the user at the time the tweet was published. Sometimes users change their screen_names, so a user by the name of @barbara_volkwyn might be user_id 1234 one day and user_id 5678 the next day, while the tweets that belonged to user_id 1234 will always belong to user_id 1234, regardless of the screen_name.
The user_id for @barbara_volkwyn according to the REST API is 264882189. It's entirely possible that someone held the same screen name but a different user_id at another time. The only way to ever be certain about the identity of a Twitter user is to refer to them by their REST API user_id -- screen_names are transitory and can be modified by the end user at any time.
As I mentioned, cached user objects used within statuses can become stale -- the most reliable source for up-to-date information about a single user account is the user/show API method. The most reliable source for up-to-date information on recent Tweets by an account is the statuses/user_timeline method.
The embedded objects work for most scenarios, but if you're looking for maximum accuracy, the distinct resources are best.
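As an illustration, resolving a screen_name to its canonical user_id via users/show might look like this (a sketch against the v1 REST API discussed above, which has since been retired):

```python
import json
import urllib2

def lookup_user_id(screen_name):
    # users/show returns the authoritative user object,
    # not a cached copy embedded in a tweet.
    url = ('http://api.twitter.com/1/users/show.json?screen_name=%s'
           % screen_name)
    user = json.load(urllib2.urlopen(url))
    return user['id']
```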
Thanks, Taylor.
I am struggling to find a good tutorial or best practices document for the use of memcache in app engine.
I'm pretty happy with it at the level presented in the docs (get an object by ID, checking memcache first), but I'm unclear on things like:
If you cache a query, is there an accepted method for ensuring that the cache is cleared/updated when an object stored in that query's results is updated?
What are the effects of using ReferenceProperty? If I cache a Foo object with a Bar reference, is my foo.bar in memcache too, and in need of clearing if it gets updated from some other part of my application?
I don't expect answers to this here (unless you are feeling particularly generous!), but pointers to things I could read would be very gratefully received.
If you cache a query, is there an accepted method for ensuring that the cache is cleared/updated when an object stored in that query is updated?
Typically you wrap your reads with a conditional that retrieves the value from the main DB if it's not in the cache. Just wrap your updates as well, so the cache is filled whenever you write the data. That's if you need the results to be as up to date as possible; if you're not so bothered about staleness, just set an expiry time low enough that the application re-requests the data from the main DB often enough.
An easy way to do this is to find every datastore put() call and set memcache right after it. Then make sure to check memcache before attempting a datastore query.
Set memcache after writing to datastore:
data.put()
memcache.set(user_id, data)
Try getting data from memcache before doing a datastore query:
data = memcache.get(user_id)
if data is None:
    data = Data.get_by_key_name(user_id)
    memcache.set(user_id, data)
Using memcache reduces App Engine costs significantly.
About reference properties: let's say you have MainModel and RefModel with a reference property 'ref' that points to a MainModel instance. Whenever you call ref_model.ref, it does a datastore get operation and retrieves the object from the datastore. It does not interact with memcache in any way.
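If you do want the referenced entity cached, you have to do it explicitly. A sketch, assuming the db-era models just described and an arbitrary five-minute expiry:

```python
from google.appengine.api import memcache
from google.appengine.ext import db

class MainModel(db.Model):
    name = db.StringProperty()

class RefModel(db.Model):
    ref = db.ReferenceProperty(MainModel)

def get_ref(ref_model):
    # Read the referenced key without dereferencing it
    # (this does not perform a datastore get).
    key = RefModel.ref.get_value_for_datastore(ref_model)
    cached = memcache.get(str(key))
    if cached is None:
        cached = ref_model.ref  # this is the datastore get
        memcache.set(str(key), cached, time=300)
    return cached
```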