Most efficient way to store a 5 point scale in Django

I have the following model for a song:
from django.db import models

class Song(models.Model):
    name = models.CharField(max_length=255)
    artist = models.CharField(max_length=255)
    rating = models.SmallIntegerField()  # 1 to 5
I want to know if there is a better way to store the rating for a Song, as it is a scale from 1 to 5?
Better in the sense that it will take less space to store the same data in the database.
I have read through the Django documentation and the SmallIntegerField looks like the smallest value field.
But it can hold values ranging from "-32768 to 32767" which is overkill for my needs, as I just want to store 5 unique values (1 through 5).
Is there a smaller field type? Should I use a CharField(max_length=1)?
Any suggestions?

You can narrow the range with a PositiveSmallIntegerField, for which all integers from 0 to 32,767 (i.e. the non-negative range of a signed short) are safe in every database Django supports. Note that this restricts the allowed values rather than the storage size: on most backends it occupies the same space as a SmallIntegerField.
The ideal answer for storage optimization would be to use something like a bit field where you can store scores in binary, but there is no built-in Django field type like that for numerical values. You'd have to write your own Field that stores a 3-bit unsigned integer, which entails all of the development required to glue it into Django's ORM.
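To make the 3-bit idea concrete: five distinct values fit in 3 bits (2**3 = 8), so several ratings could in principle share one integer. The following is a minimal plain-Python sketch of that packing, not a Django field; the helper names pack_ratings and unpack_ratings are hypothetical.

```python
BITS_PER_RATING = 3                    # 2**3 = 8 slots, enough for ratings 1..5
MASK = (1 << BITS_PER_RATING) - 1      # 0b111


def pack_ratings(ratings):
    """Pack a sequence of ratings (each 1..5) into one integer, 3 bits each."""
    packed = 0
    for position, rating in enumerate(ratings):
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
        packed |= rating << (position * BITS_PER_RATING)
    return packed


def unpack_ratings(packed, count):
    """Recover `count` ratings from an integer produced by pack_ratings."""
    return [(packed >> (i * BITS_PER_RATING)) & MASK for i in range(count)]
```

Five ratings packed this way need 15 bits, so they fit in a single SmallIntegerField value. For one rating per row, though, the saving is probably too small to justify a custom field.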

Related

In python django, would it be possible to extract data from database table and store it in an array?

I tried to extract the data using 'tablename.objects.Fname()' but I am still confused about how to store all the first names from the database in an array.
If yes, could anyone provide an example? Any sort of help would be appreciated.

You can obtain the values stored in a column by using .values(…), or .values_list(…). For example:
tablename.objects.values_list('Fname', flat=True)
This QuerySet is an iterable that contains, for each record, one element with the cleaned value of that field. So if the field is an ArrayField, it will contain a collection of lists.
But using an ArrayField or another composite field is often not a good idea. It makes the items in the array harder to process, filter, JOIN, etc. Therefore it is often better to make an extra table and define a many-to-one relation, for example with a ForeignKey.
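In Django, wrapping the queryset in list(...) evaluates it into a plain Python list. The shape of that result can be sketched without Django using the standard-library sqlite3 module; the table name person and the sample rows below are hypothetical stand-ins for the model's table.

```python
import sqlite3

# Hypothetical table and data, standing in for the Django model's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, Fname TEXT)")
conn.executemany(
    "INSERT INTO person (Fname) VALUES (?)",
    [("Ada",), ("Grace",), ("Linus",)],
)

# Each row comes back as a 1-tuple; taking element 0 gives a flat list,
# similar in shape to list(tablename.objects.values_list('Fname', flat=True)).
first_names = [
    row[0] for row in conn.execute("SELECT Fname FROM person ORDER BY id")
]
print(first_names)
```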

When is it more logical to use Set as compared to Map?

For example I have a bunch of objects, say Cars. I need to keep track of a fleet of cars in the company. Obviously, cars are unique (VIN number), so they can be formed into a Set.
But every day I need to change the state of some cars in the fleet, say to fill them with gas.
Which construct now makes sense? Why?
A dictionary with the VIN number as key and Car instance as value.
A Set of cars and then match on the Car.vin attribute.

Remember that both sets and maps/dicts work using hashes. Hashed elements need to be immutable (their hash must not change while they are stored) and preferably small (faster hash calculation).
In your case, we don't know what is in the car instance and whether it changes. Maybe you can replace some parts or change the owner? If those are stored inside the instance, the car will have to be mutable.
So the obvious choice for the hashed key is the short VIN, which doesn't change. You should use a map (dict).
In Python, class instances are mutable by default, so it's just easier to use some kind of (alpha)numeric id, or a pair (tuple) of such elements, as the key.

From what I can see, keeping the data in a dictionary is the more suitable approach.
If you have to change the state of your Car objects, you first have to find the car by its VIN, and a dict can do that in O(1) time on average. A set gives you no way to look up a Car by its VIN attribute other than scanning every element, which is O(n).
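A minimal sketch of the dict-keyed-by-VIN approach described in the answers above. The Car fields, the refuel helper and the VIN strings are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Car:
    vin: str
    fuel_litres: float = 0.0   # hypothetical mutable state


# Fleet keyed by VIN: the key is immutable, the Car value may change freely.
fleet = {
    "1HGCM82633A004352": Car("1HGCM82633A004352"),
    "JH4KA8260MC000000": Car("JH4KA8260MC000000"),
}


def refuel(fleet, vin, litres):
    """Look the car up by VIN (average O(1)) and update its state in place."""
    car = fleet[vin]           # raises KeyError for an unknown VIN
    car.fuel_litres += litres
    return car


refuel(fleet, "1HGCM82633A004352", 40.0)
```

A plain @dataclass like this one is unhashable by default, so these Car instances could not be placed in a set at all without extra work, which is another point in favour of the dict.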

Django database planning - time series data

I would like some advice on how to best organize my Django models/database tables to hold the data in my webapp.
I'm designing a site that will hold a user's telemetry data from a racing sim game. So there will be a desktop companion app that will sample the game data every 0.1 seconds for a variety of information (car, track, speed, gas, brake, clutch, rpm, etc). For example, in a 2 minute race, each of those variables will hold 1200 data points (10 samples a second * 120 seconds).
The important thing here is that this list of data can be as many as 20 variables, and could potentially grow in the future. So 1200 * the number of variables you have is the amount of data for an individual race session. If a single user submits 100 sessions, and there are 100 users... the amount of data adds up very quickly.
The app will then ship all this data for a race session off to the database for the website. The data MUST be transferred between game and website via a CSV file. So structurally I am limited to what CSV can do. The website will then allow you to choose a race session/lap and plot this information on separate time series graphs (for each variable), and importantly allow you to plot your session against somebody else's to see where the differences lie.
My question here is how do you structure such a database to hold this much information?
The simplest structure I have in my mind is to have a separate table for each race track, then each row/entry will be a race session on that track. Fields in this table will be the variables above.
The problem I have is:
1) most of the variables in the list above are time series data and not individual values (e.g. var speed might look like: 70, 72, 74, 77, 72, 71, 65 where the values are samples spaced 0.1 seconds apart over the course of the entire lap). How do you store this type of information in a table/field?
2) The length of each var in the list above will always be the same for any single race session (if your lap took 1min 35 then all your vars will only capture data for that length of time), but given that I want to be able to compare different laps with each other, session times will be different for each lap. In other words, however I store the time series data for those variables, it must be variable in size.
Any thoughts would be appreciated

One thing that may help you with HUGE tables is partitioning. Judging by the postgresql tag that you set for your question, take a look here: http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
But for a start I would go with one simple table, supported by a reasonable set of indexes. From what I understand, each data entry in the table will be identified by race session id, player id and time indicator. Those columns should be covered by indexes according to your querying requirements.
As for your two questions:
1) You store that information as simple integers. Remember to set proper data types for those columns. E.g. if you are 100% sure that some values will be very small, you can use the smallint data type. More on integer data types here: http://www.postgresql.org/docs/9.3/static/datatype-numeric.html#DATATYPE-INT
2) That won't be a problem if every sample is a separate row in the table. You will be able to insert as many as you'd like.
So, to sum things up, I would start with a VERY simple single-table schema. From Django's perspective this would look something like this:
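from django.db import models

class RaceTelemetryData(models.Model):
    # ForeignKey columns are indexed by default; db_index=True makes that explicit.
    # on_delete is required since Django 2.0; CASCADE here is an assumption.
    user = models.ForeignKey(..., on_delete=models.CASCADE, db_index=True)
    race = models.ForeignKey(YourRaceModel, on_delete=models.CASCADE, db_index=True)
    time = models.IntegerField()
    gas = models.IntegerField()
    speed = models.SmallIntegerField()
    # and so on...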
Additionally, you should (manually) create a composite index on the (user_id, race_id, time) columns, so that looking up data about one race session (and sorting it) is quick.
In the future, if you find the performance of this single table too slow, you'll be able to experiment with additional indexes or partitioning. PostgreSQL is quite flexible in modifying existing database structures, so you shouldn't have many problems with it.
If you decide to add a new variable to the collection, you will simply need to add a new column to the table.
EDIT:
In the end you end up with one table, that has at least these columns:
user_id - To specify which user's data this row is about.
race_id - To specify which race's data this row is about.
time - To identify the correct order in which to represent the data.
This way, when you want to get information on Joe's 5th race, you would look up rows that have user_id = 'Joe_ID' and race_id = 5, then sort all those rows by the time column.
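The lookup-then-sort pattern described above can be sketched with the standard-library sqlite3 module. SQLite stands in for PostgreSQL here, and the table name race_telemetry, the index name and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE race_telemetry (
        user_id INTEGER NOT NULL,
        race_id INTEGER NOT NULL,
        time    INTEGER NOT NULL,   -- sample index, one per 0.1 s
        speed   INTEGER NOT NULL
    )
""")
# Composite index matching the lookup-then-sort access pattern.
conn.execute(
    "CREATE INDEX idx_user_race_time ON race_telemetry (user_id, race_id, time)"
)

# Hypothetical samples: user 1, race 5, inserted out of order on purpose.
samples = [(1, 5, 2, 74), (1, 5, 0, 70), (1, 5, 1, 72), (2, 5, 0, 65)]
conn.executemany("INSERT INTO race_telemetry VALUES (?, ?, ?, ?)", samples)

# One session's speed trace, filtered by user and race, ordered by time.
speeds = [
    row[0]
    for row in conn.execute(
        "SELECT speed FROM race_telemetry "
        "WHERE user_id = ? AND race_id = ? ORDER BY time",
        (1, 5),
    )
]
print(speeds)
```

Because each sample is its own row, laps of different lengths simply produce a different number of rows, and a new variable becomes one more column.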

How to store data pairs in django without an extra model?

I want to create an app that stores bills for me. Since I don't have fixed prices for anything, there is no need to store my services in an extra model. I just want to store data pairs of "action" and "price" to print them out nicely in a table.
Is there something in Django that can help me with that task, or should I just put all data pairs together in a text field and explode it every time I want to use it?
The number of data pairs per bill is not fixed. Data pairs are used only in one bill, so I don't want an extra table.

Instead of a plain TextField you should look at field types that are better suited for storing structured data: In addition to Acorn's suggestion (django-hstore) a JsonField or a PickleField might be suitable (and more portable) solutions for your use case.
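As a rough illustration of the JSON option, here is a plain-Python sketch of the round trip a JSON-backed field would perform on the action/price pairs. The bill lines are hypothetical, and storing prices as strings is an assumption made to avoid float rounding.

```python
import json
from decimal import Decimal

# Hypothetical bill lines; prices kept as strings so no float rounding creeps in.
pairs = [
    {"action": "Logo design", "price": "120.00"},
    {"action": "Hosting setup", "price": "45.50"},
]

stored = json.dumps(pairs)     # what would go into the text/JSON column
loaded = json.loads(stored)    # what comes back when the bill is read

total = sum(Decimal(item["price"]) for item in loaded)

# Print the pairs as a simple two-column table.
for item in loaded:
    print(f'{item["action"]:<20}{item["price"]:>10}')
print(f'{"Total":<20}{total:>10}')
```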

You might be interested in Postgres's hstore: http://www.postgresql.org/docs/9.1/static/hstore.html
https://github.com/jordanm/django-hstore/

How to store a dynamic List into MySQL column efficiently?

I want to store a list of numbers along with some other fields in MySQL. The number of elements in the list is dynamic (sometimes it could hold about 60 elements).
Currently I'm storing the list in a column of varchar type, and the following operations are done.
e.g. aList = [1234122433,1352435632,2346433334,1234122464]
At storing time, aList is converted to a string as below
aListStr = str(aList)
and at reading time the string is converted back to list as below.
aList = eval(aListStr)
There are about 10 million rows, and since I'm storing the lists as strings, they occupy a lot of space. What is the most efficient way to do this?
Also, what would be an efficient way to store a list of strings instead of numbers?

Since you wish to store integers, an effective way would be to store them in an INT/DECIMAL column.
Create an additional table that will hold these numbers and add an ID column to relate the records to other table(s).
"Also what should be the efficient way for storing list of strings instead of numbers?"
Besides what I said, you can convert them to HEX code, which will be very easy & take less space.
Note that a big VARCHAR may hurt performance.
The difference between VARCHAR(2) and VARCHAR(50) does matter when operations like sorting are performed, since MySQL allocates fixed-size memory slices for them, according to the VARCHAR maximum size.
When those slices are too large to store in memory, MySQL will store them on disk.

MySQL also has a SET type. It works like ENUM but can hold multiple items.
Of course you'd have to have a limited list: currently MySQL only supports up to 64 different items.

I'd be less worried about storage space and more worried about record retrieval, i.e. indexability/searching.
For example, I imagine performing a LIKE or REGEXP in a WHERE clause to find a single item in the list will be quite a bit more expensive than if you normalized each list item into a row in a separate table.
However, if you never need to perform such queries against these columns, then it just won't matter.
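The normalized layout mentioned above can be sketched with the standard-library sqlite3 module. SQLite stands in for MySQL here, and the table and index names (record, record_item, idx_item_value) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE record_item (
        record_id INTEGER NOT NULL REFERENCES record(id),
        value     INTEGER NOT NULL
    );
    CREATE INDEX idx_item_value ON record_item (value);
""")

a_list = [1234122433, 1352435632, 2346433334, 1234122464]
conn.execute("INSERT INTO record (id, name) VALUES (1, 'example')")
conn.executemany(
    "INSERT INTO record_item (record_id, value) VALUES (1, ?)",
    [(v,) for v in a_list],
)

# Indexed equality lookup instead of LIKE/REGEXP over a serialized string.
owners = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT record_id FROM record_item WHERE value = ?",
        (1352435632,),
    )
]

# Rebuilding the list for one record (rowid order reflects insertion order here).
restored = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM record_item WHERE record_id = ? ORDER BY rowid",
        (1,),
    )
]
```

In MySQL the value column would need INT UNSIGNED or BIGINT, since 2346433334 exceeds the signed INT maximum of 2147483647.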

Since you are using a relational database, you should know that storing non-atomic values in individual fields breaks even the first normal form. More likely than not you should follow Don's advice and keep those values in a related table. I can't say that for certain because I don't know your problem domain. It may well be that choosing an RDBMS for this data was a bad choice altogether.
