I was working on a project a few months ago and needed to implement an award system, similar to StackOverflow's badge system.
I might not have implemented it in the best possible way, and I am curious what you would say about it.
What would be a good way to track the user activities needed for badge awarding?
StackOverflow's system needs to know a lot of information, and I also get the impression that there would be a lot of data processing complicating things.
I would assume that SO calculates badges once or twice every 24 hours, and that maybe activity logs are stored, or there is a server dedicated to badge calculation.
Thoughts?
I don't think it is as complicated as you think. I highly doubt that SO calculates badges from some kind of user activity log (although technically the entire database is a user activity log). When I look at the list of badges, I don't see anything that can't be implemented by running a SQL SELECT query.
Some of the queries could be pretty complicated, and there might be some sort of fancy caching mechanism, but I don't see any reason why you would have to calculate badges in batches.
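For example, a hypothetical "popular question" style badge could be found with a single query; the Question model and view-count threshold below are purely illustrative, not SO's actual schema.

# Purely illustrative: assumes a Django-style Question model with view_count and author fields.
from myapp.models import Question  # hypothetical model

def popular_question_badge_candidates(threshold=1000):
    """Users who have at least one question viewed `threshold` or more times."""
    return (Question.objects
            .filter(view_count__gte=threshold)
            .values_list("author_id", flat=True)
            .distinct())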
In general badge/point systems can be based on two things.
An activity log of interesting events. This is effectively the paper register receipt of what has happened, so that you can recompute everything from the ground up if it's ever needed. It can be as simple as (user_id, timestamp, event_id, event_detail).
Most of the time you've pre-designed your scoring/point system, so you know exactly which counters to keep for a user. Then it's as simple as having one big record that contains all of the details: (user_id, reply_points, login_points, last_login, thumbs_up_points, etc., etc.).
Now you can slap some simple methods on that model object and have it manage/store the points as needed.
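A bare-bones sketch of that counter record in plain Python, with the field names taken from the tuple above and made-up point values:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class UserPoints:
    # One record of counters per user; persist it however you like (ORM row, document, etc.).
    user_id: int
    reply_points: int = 0
    login_points: int = 0
    thumbs_up_points: int = 0
    last_login: datetime = field(default_factory=datetime.utcnow)

    def record_reply(self, points=5):   # point values here are made up
        self.reply_points += points

    def record_login(self, points=1):
        self.login_points += points
        self.last_login = datetime.utcnow()

    def total(self):
        return self.reply_points + self.login_points + self.thumbs_up_points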
I am new to database design, so sorry if this is an obvious beginner question. I use Python and SQLAlchemy, though I don't think that is relevant (the sample code below is pseudocode), though I may be wrong. I have looked through some previous questions and didn't see this addressed. Anyway, on to the question. The goal here is to develop a database of NBA information which will have info on all games played, and also box scores for each player, for each game. There are a couple of ways this DB can be designed.
Game(game_id, date, home_name, away_name, score)
Box_Score(game_id, player_name, date, points, rebounds)
In this situation if I want to get all the games the Los Angeles Lakers played I can just do
query(Game).filter(home_name=="lakers" or away_name=="lakers").all()
query(Box_Score).filter(player_name=="kobe bryant")
Here is the second option for how to design this database:
Game(game_id, date, home_name=(foreignkey=Team.team_name), away_name, score)
Box_Score(game_id, player_name=(foreignkey=Player.player_name), date, points, rebounds)
Team(team_name, home_games=relationship("Game"))
Player(player_name, box_scores=relationship("Box_Score"))
Then I can do
query(Team).filter(name=="lakers").first().games
query(Player).filter(name=="kobe bryant").first().box_scores
On the one hand, it seems like the whole point of using a relational database is to set it up like in situation #2. On the other hand, I am not sure what extra functionality it gives me. So I guess my question is: which design do you recommend? Are there some benefits or disadvantages to either design that will become apparent down the line which I cannot see yet? And if you recommend the simpler design #1, which does not use table relationships, why is it that I am storing a decent amount of related information but don't need to use a relational database? Thanks!!
The ideal data model for any database is highly subjective. If you are new to database design, you probably will not find the ideal schema until after you have created your application and tested it for an extended period of time. I would recommend reading up on some design basics, particularly Database Normalization, since you would probably benefit from a highly normalized schema, where data can be referenced in many different ways. Highly-normalized databases can suffer in the performance department if very large (which this does not seem like it would be), but you can always de-normalize data through the use of Materialized Views or other methods of caching.
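As a rough illustration only, design #2 in SQLAlchemy's declarative style might look like the sketch below; the surrogate integer keys and the two foreign keys from Game to Team are my assumptions, not the only way to do it.

from sqlalchemy import Column, Date, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Team(Base):
    __tablename__ = "team"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)
    home_games = relationship("Game", foreign_keys="Game.home_team_id",
                              back_populates="home_team")

class Player(Base):
    __tablename__ = "player"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)
    box_scores = relationship("BoxScore", back_populates="player")

class Game(Base):
    __tablename__ = "game"
    id = Column(Integer, primary_key=True)
    date = Column(Date)
    score = Column(String)
    home_team_id = Column(Integer, ForeignKey("team.id"))
    away_team_id = Column(Integer, ForeignKey("team.id"))
    home_team = relationship("Team", foreign_keys=[home_team_id],
                             back_populates="home_games")
    away_team = relationship("Team", foreign_keys=[away_team_id])

class BoxScore(Base):
    __tablename__ = "box_score"
    id = Column(Integer, primary_key=True)
    game_id = Column(Integer, ForeignKey("game.id"))
    player_id = Column(Integer, ForeignKey("player.id"))
    points = Column(Integer)
    rebounds = Column(Integer)
    game = relationship("Game")
    player = relationship("Player", back_populates="box_scores")

With a schema along these lines, the option #2 queries from the question work essentially as written.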
Right now I think I'm stuck between two main choices for grabbing a user's friends list.
The first is a direct connection with Facebook, then pulling the friends list out and creating a list of friend models from the JSON. (This takes quite a while whenever I try it out, around 2 seconds.)
The other is that whenever a user logs in, the program will store his or her entire friends list inside a big friends model (note that even if two people have exactly the same friends, two sets will still be stored; all friend models will have an FK back to the person who has these friends on their list).
Whenever a user needs his or her friends list, I just use django's filter to grab them.
Right now this is pretty fast but that's because it hasn't been tested with many people yet.
Based on your experience, which of these two approaches would make the most sense long term?
Thank you
It depends a lot on what you plan on doing with the data. However, thinking long term you're going to have much more flexibility with breaking out the friends into distinct units than just storing them all together.
If the friend creation process is taking too long, you should consider off-loading it to a separate process that can finish it in the background, using something like Celery.
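For instance, something along these lines; the fetch_facebook_friends() helper and the Friend model are placeholders for whatever your app already has, not a real API.

# tasks.py: a rough sketch, not production code
from celery import shared_task

from myapp.facebook import fetch_facebook_friends  # hypothetical helper that calls the Graph API
from myapp.models import Friend                    # hypothetical model with an FK to the owning user

@shared_task
def sync_friends(user_id, access_token):
    """Pull the friends list from Facebook in the background and store a fresh copy."""
    friends = fetch_facebook_friends(access_token)
    Friend.objects.filter(owner_id=user_id).delete()  # drop the stale copy
    Friend.objects.bulk_create(
        [Friend(owner_id=user_id, fb_id=f["id"], name=f["name"]) for f in friends]
    )

On login you would just call sync_friends.delay(user.id, token) and render the page without waiting for Facebook.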
I'm thinking of adding a reputation system to my Django web application; the site is already being used so I'm trying to be careful about my choices.
Reputation is generated by all actions that contribute to the site, similar to StackOverflow's system.
I know there are literally millions of ways of implementing this, and this is why I feel quite lost.
Two alternatives I am not sure about are:
Keep track of reasons why reputation was incremented
Ignore reasons in order to reduce complexity of the site and overhead
I would be happy with a few pointers and directions; it would be very much appreciated!
In Django, I'd suggest having a property on the User (or Profile) model that calculates a user's reputation on-demand. Then, cache the reputation with your caching framework and/or store to the database for fast retrieval.
This way, in addition to having the records of what impacts reputation, you can change your reputation criteria at will.
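A minimal sketch of that pattern, assuming a Profile model and a reputation_events relation (the event records themselves are sketched under the next answer); all names are illustrative.

from django.conf import settings
from django.core.cache import cache
from django.db import models
from django.db.models import Sum

class Profile(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)

    @property
    def reputation(self):
        key = "reputation:%d" % self.user_id
        total = cache.get(key)
        if total is None:
            # Recompute from the underlying records, then cache for fast reads.
            total = self.user.reputation_events.aggregate(t=Sum("points"))["t"] or 0
            cache.set(key, total, 60 * 5)
        return total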
Keep track of the reasons, IMHO. It surely wouldn't be that complex, and you don't need to store a huge amount of information: just a datetime, point value, command, target, and originator. If the data gets to be too much after some time, dump the DB to a backup medium and clear the history.
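A rough sketch of that record as a Django model, with the fields lifted from the answer (illustrative, not a fixed schema):

from django.conf import settings
from django.db import models

class ReputationEvent(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE,
                             related_name="reputation_events")
    originator = models.ForeignKey(settings.AUTH_USER_MODEL, null=True, blank=True,
                                   on_delete=models.SET_NULL,
                                   related_name="reputation_events_given")
    created = models.DateTimeField(auto_now_add=True)
    points = models.IntegerField()
    command = models.CharField(max_length=50)   # e.g. "answer_upvoted"
    target = models.CharField(max_length=200)   # e.g. a URL or object reference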
I'm making an app that has a need for reverse searches. By this, I mean that users of the app will enter search parameters and save them; then, when any new objects get entered onto the system, if they match the existing search parameters that a user has saved, a notification will be sent, etc.
I am having a hard time finding solutions for this type of problem.
I am using Django and thinking of building the searches and pickling them using Q objects as outlined here: http://www.djangozen.com/blog/the-power-of-q
The way I see it, when a new object is entered into the database, I will have to load every single saved query from the db and somehow run it against this one new object to see if it would match that search query... This doesn't seem ideal - has anyone tackled such a problem before?
At the database level, many databases offer 'triggers'.
Another approach is to have timed jobs that periodically fetch all items from the database that have a last-modified date since the last run; then these get filtered and alerts issued. You can perhaps put some of the filtering into the query statement in the database. However, this is a bit trickier if notifications need to be sent if items get deleted.
You can also put triggers manually into the code that submits data to the database, which is perhaps more flexible and certainly doesn't rely on specific features of the database.
A nice way for the triggers and the alerts to communicate is through message queues - queues such as RabbitMQ and other AMQP implementations will scale with your site.
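A minimal sketch of the timed-job variant, assuming an Item model with a modified timestamp and a SavedSearch model that stores a pickled Q object (all names are illustrative):

import pickle

from myapp.models import Item, SavedSearch  # hypothetical models

def matches_since(last_run):
    """Yield (saved_search, item) pairs for items changed since the last run."""
    new_items = Item.objects.filter(modified__gte=last_run)
    for search in SavedSearch.objects.all():
        q = pickle.loads(search.pickled_q)   # the Q object saved earlier
        for item in new_items.filter(q):
            yield search, item               # the caller sends the notification or queues a message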
The amount of effort you use to solve this problem is directly related to the number of stored queries you are dealing with.
Over 20 years ago we handled stored queries by treating them as mini-docs and indexing them based on all of their must-have and may-have terms. A new doc's term list was used as a sort of query against this "database of queries", and that built a list of possibly interesting searches to run; then only those searches were run against the new docs. This may sound convoluted, but when there are more than a few stored queries (say anywhere from 10,000 to 1,000,000 or more) and you have a complex query language that supports a hybrid of Boolean and similarity-based searching, it substantially reduced the number we had to execute as full-on queries, often to no more than 10 or 15 queries.
One thing that helped was that we were in control of the horizontal and the vertical of the whole thing. We used our query parser to build a parse tree and that was used to build the list of must/may have terms we indexed the query under. We warned the customer away from using certain types of wildcards in the stored queries because it could cause an explosion in the number of queries selected.
Update for comment:
Short answer: I don't know for sure.
Longer answer: We were dealing with a custom-built text search engine, and part of its query syntax allowed slicing the doc collection in certain ways very efficiently, with special emphasis on date_added. We played a lot of games because we were ingesting 4 to 10 million new docs a day and running them against up to 1,000,000+ stored queries on DEC Alphas with 64MB of main memory. (This was in the late 80's/early 90's.)
I'm guessing that filtering on something equivalent to date_added could be used in combination with the date of the last time you ran your queries, or maybe the highest id at the last query run time. If you need to re-run the queries against a modified record, you could use its id as part of the query.
For me to get any more specific, you're going to have to get a lot more specific about exactly what problem you are trying to solve and the scale of the solution you are trying to accomplish.
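A toy sketch of the "database of queries" idea in Python: build an inverted index from term to stored-query ids, then use a new document's terms to pull out only the candidate queries worth running in full.

from collections import defaultdict

def build_query_index(stored_queries):
    """stored_queries maps query_id -> set of must-have / may-have terms."""
    index = defaultdict(set)
    for qid, terms in stored_queries.items():
        for term in terms:
            index[term].add(qid)
    return index

def candidate_queries(index, doc_terms):
    """Return only the stored queries that share at least one term with the new doc."""
    candidates = set()
    for term in doc_terms:
        candidates |= index.get(term, set())
    return candidates  # only these get executed as full queries against the new doc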
If you stored the type(s) of object(s) involved in each stored search as a generic relation, you could add a post-save signal to all involved objects. When the signal fires, it looks up only the searches that involve its object type and runs those. That probably will still run into scaling issues if you have a ton of writes to the db and a lot of saved searches, but it would be a straightforward Django approach.
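A rough sketch of that approach; the SavedSearch model with a content_type field and a pickled Q object are assumptions, not an existing API.

import pickle

from django.contrib.contenttypes.models import ContentType
from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import SavedSearch  # hypothetical model

@receiver(post_save)
def check_saved_searches(sender, instance, created, **kwargs):
    if not created:
        return
    ct = ContentType.objects.get_for_model(sender)
    # Only consider searches registered against this object's type.
    for search in SavedSearch.objects.filter(content_type=ct):
        q = pickle.loads(search.pickled_q)
        if sender.objects.filter(q, pk=instance.pk).exists():
            search.notify(instance)  # hypothetical hook: email, message queue, etc.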
I'm about to build a site that has about half a dozen fairly similar products. They're all DVDs so they fit into a very "fixed" database very well. I was going to make a DVD model. Tag them up. All very simple. All very easy.
But we need to be able to sell them. The current site outsources the whole purchasing system but that's not going to fly on the new site. We want to integrate everything right up until the payment (for both UX reasons plus we get to customise the process a lot more).
The other problem with the outsourced system is that it doesn't account for people who don't need to pay VAT (sales tax), or for the fact that you get a discount if you buy more than one of the same thing, or more than one SKU at the same time.
So I've been looking around.
Satchmo looks like a whole mini-framework. It has listing options that I just don't need with the quantities of SKUs I'm dealing with.
django-cart has been re-hashed as of March but it looks pretty abandoned since then.
I'm looking for something that will let me:
pass it a model instance, a price and a quantity
apply a quantity formula based on the number of unique SKUs and copies of the same title
list what's in the cart on every page
That's about it (but it's quite fiddly, nevertheless). I can handle the final order processing nonsense.
Or am I just being silly?
Should I just get on and Do It Myself? If that's your vote, I've never built a cart before so are there any considerations that are not obvious to somebody who has only used shopping carts before?
Since you asked: if your needs are that limited, it does sound like a DIY situation to me. I don't see what's so fiddly about it; what complexity there is is all in the pricing formula, and you're planning to supply that either way. Add in Django's built-in session support and you're most of the way there.
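As a bare-bones illustration, a session-backed cart can be a couple of helper functions; the pricing_formula hook is where the per-title / per-SKU discount rules would plug in.

def add_to_cart(request, sku, price, quantity=1):
    """Store the cart as a plain dict in Django's session."""
    cart = request.session.get("cart", {})
    line = cart.get(sku, {"price": price, "quantity": 0})
    line["quantity"] += quantity
    cart[sku] = line
    request.session["cart"] = cart  # reassign so the session is marked as modified

def cart_total(request, pricing_formula):
    """pricing_formula(lines) applies the per-title / per-SKU discount rules."""
    cart = request.session.get("cart", {})
    return pricing_formula(list(cart.values()))

A small context processor can then expose request.session["cart"] so the cart contents show up on every page.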
There is an open source solution available: http://www.getlfs.com
I don't know if you could tweak it to suit you, but it's based on the technologies you mention. The license is very liberal and it is actively maintained.