We have the following database schema to store different types of data:
DataDefinition: basic information about the new data.
*FieldDefinition: Every DataDefinition has some fields. Every field has a type, title, etc.; that information is stored here. Every DataDefinition has more than one *FieldDefinition associated with it. I have written '*' because we have many different models, one for each kind of field supported.
DataValue, *FieldValues: we store the definition and the values in different models.
With this setup, retrieving a piece of data from our database requires a lot of queries:
Retrieve the DataDefinition.
Retrieve the DataValue.
Retrieve the *FieldDefinition associated to that DataDefinition.
Retrieve all the *FieldValues associated to those *FieldDefinition.
So, if n is the average number of fields of a DataDefinition, we need to make 2*n+2 queries to the database to retrieve a single value.
We cannot change this setup, but the queries are quite slow. To speed things up I have thought of the following: store a joined version of the tables. I do not know if this is possible, but I cannot think of any other way. Any suggestions?
Update: we are already using prefetch_related and select_related and it's still slow.
Use case right now: get an entire data object starting from one value:
someValue = SomeTypeValue.objects.filter(value=value).select_related('DataValue', 'DataDefinition')[0]
# repeated for each *FieldDefinition/*FieldValue model ('*' stands for the concrete field type)
definition = SomeFieldDefinition.objects.filter(*field_definition__id=someValue.data_value.data_definition.id)
value = SomeFieldValue.objects.filter(*field_definition__id=definition[0].id)
And with that info you can now build the entire data object.
Django: 1.11.20
Python: 2.7
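Since the schema itself cannot change, one way to get the effect of a "joined version of the tables" is to denormalize at read time: assemble the object once with the 2*n+2 queries and keep the result around for later lookups. A minimal sketch using Django's cache framework, assuming it is configured; build_data_object and the cache key format are hypothetical:

from django.core.cache import cache

def get_data_object(value):
    # Return the fully assembled data object for a value, caching the joined result.
    cache_key = 'data-object:%s' % value  # hypothetical key format
    data_object = cache.get(cache_key)
    if data_object is None:
        # Fall back to the expensive 2*n+2 queries described above.
        data_object = build_data_object(value)  # hypothetical helper wrapping those queries
        cache.set(cache_key, data_object, 60 * 60)  # keep the joined result for an hour
    return data_object

The cached copy would have to be invalidated whenever any of the underlying *FieldValues change.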
Related
There are two databases, Db_A and Db_B, each with their own data dictionary. Most of the data in my database, Db_A, will fit in some field or another of the target database Db_B. Many values from Db_A will require reformatting before being inserted into fields in Db_B, and some values to be inserted into Db_B will need to be derived from multiple fields in Db_A. Very few fields in Db_A will be transferable to Db_B without at least some processing. Some fields may require a lot of processing (especially those which are derived). Unfortunately the processing steps are not very consistent. Each field will essentially require its own unique conversion.
In other words, I have a large set of fields. Each field needs to be processed in a specific way. These fields may change and the way they need to be processed may change. What is the best way of implementing this system?
One way I've done this in the past was to have a central function which loops through each field, calling that field's function. I created one function per field and used a CSV file to map fields to functions and to the parameters the functions need. That way, if a new field is created, I can just update the CSV file and write a method to handle its conversion. If the way a field is converted needs to change, I can just change the corresponding method.
Is this a good way of doing it? Any suggestions? I'm using Python.
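For what it's worth, the CSV-driven dispatch described above might look roughly like this; the field names, converter functions, and CSV layout are made up for illustration:

import csv

# One converter function per target field; the names are purely illustrative.
def convert_full_name(row):
    # Db_B wants "Last, First" built from two Db_A columns.
    return '%s, %s' % (row['last_name'], row['first_name'])

def convert_dob(row):
    # Reformat an ISO date into the format Db_B expects.
    return row['date_of_birth'].replace('-', '/')

CONVERTERS = {
    'full_name': convert_full_name,
    'dob': convert_dob,
}

def load_field_map(path):
    # The CSV maps each target field in Db_B to the name of its converter.
    with open(path) as f:
        return {row['target_field']: CONVERTERS[row['converter']]
                for row in csv.DictReader(f)}

def convert_row(row, field_map):
    # Build one Db_B record from one Db_A row.
    return {target: func(row) for target, func in field_map.items()}

Keeping the converters in a dict means adding or changing a field only touches the CSV file and one small function, which is essentially the scheme described above.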
I have little to no experience with databases and I'm wondering how I would go about storing certain parts of an object.
Let's say I have an object like the following, where steps can be of arbitrary length. How would I store these steps, or the list of steps, in an SQL database?
class Error:
    name = ""   # name of the error
    steps = []  # steps to take to attempt to solve the error
For your example you would create a table called Errors with metadata about the error, such as an error_ID as the primary key, a name, date created, etc. Then you'd create another table called Steps with its own id, let's say Step_ID, and any fields related to the step. The important part is that you'd create a field on the Steps table that relates back to the Error the steps are for; we'll call that field error_ID as well. You'd then make that field a foreign key so the database enforces that constraint.
If you want to store your Python objects in a database (or objects from any other language), the place to start is a good ORM (Object-Relational Mapper). For example, Django has a built-in ORM. This link has a comparison of some Python Object-Relational Mappers.
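To make the two-table idea above concrete with Django's ORM, here is a rough sketch (model and field names are illustrative, not prescribed):

from django.db import models

class Error(models.Model):
    name = models.CharField(max_length=200)            # name of the error
    created = models.DateTimeField(auto_now_add=True)  # date created

class Step(models.Model):
    # The foreign key is what ties an arbitrary number of steps back to one error.
    error = models.ForeignKey(Error, on_delete=models.CASCADE, related_name='steps')
    order = models.PositiveIntegerField()               # position of the step in the sequence
    description = models.TextField()                    # what to do in this step

With that in place, error.steps.all() returns all the steps recorded for a given error.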
I have a model, Reading, which has a foreign key, Type. I'm trying to get a reading for each type that I have, using the following code:
for type in Type.objects.all():
    readings = Reading.objects.filter(type=type.pk)
    if readings.exists():
        reading_list.append(readings[0])
The problem with this, of course, is that it hits the database for each type of sensor reading. I've played around with some queries to try to optimize this to a single database call, but none of them seem efficient. .values, for instance, will provide me a list of readings grouped by type, but it will give me EVERY reading for each type, and I'd have to filter them with Python in memory. This is out of the question, as we're dealing with potentially millions of readings.
If you use PostgreSQL as your DB backend you can do this in one line with something like:
Reading.objects.order_by('type__pk', 'any_other_order_field').distinct('type__pk')
Note that the field on which distinct happens must always be the first argument in the order_by method. Feel free to replace type__pk with the actual field you want to order types on (e.g. type__name if the Type model has a name property). You can read more about distinct here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct.
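For example, assuming Reading has some timestamp field to order by (the name here is hypothetical), the whole per-type list comes back in a single query:

# 'timestamp' is a hypothetical field used to pick the latest reading per type
reading_list = list(
    Reading.objects.order_by('type__pk', '-timestamp').distinct('type__pk')
)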
If you do not use PostgreSQL you could use the prefetch_related method for this purpose:
# reading_set could be replaced with whatever your reverse relation name actually is
for type in Type.objects.prefetch_related('reading_set').all():
    readings = type.reading_set.all()
    if len(readings):
        reading_list.append(readings[0])
The above will perform only 2 queries in total. Note that I use len() so that no extra query is performed when counting the objects. You can read more about prefetch_related here: https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related.
The downside of this approach is that you first retrieve all related objects from the DB and then use only the first one.
The above code is not tested, but I hope it will at least point you in the right direction.
I have a set of IDs that I'd like to retrieve all of the objects for. My current solution works; however, it hammers the database with a bunch of get queries inside a loop.
objects = [SomeModel.objects.get(id=id_) for id_ in id_set]
Is there a more efficient way of going about this?
There's an __in field lookup (documentation here) that you can use to get all objects for which a certain field matches one of a list of values:
objects = SomeModel.objects.filter(id__in=id_set)
It works just the same for lots of different field types (e.g. CharFields), not just id fields.
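For instance, the same lookup against a hypothetical CharField:

# 'name' is an illustrative CharField on SomeModel
objects = SomeModel.objects.filter(name__in=['alpha', 'beta', 'gamma'])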
I'm designing a python application which works with a database. I'm planning to use sqlite.
There are 15000 objects, and each object has a few attributes. Every day I need to add some data for each object (maybe create a column with the date as its name).
However, I would like to easily delete data that is too old, and it is very hard to delete columns using sqlite (it might also be slow, because I would need to copy the required columns into a new table and then drop the old one).
Is there a better way to organize this data other than creating a column for every date? Or should I use something other than sqlite?
It'll probably be easiest to separate your data into two tables like so:
CREATE TABLE object(
    id INTEGER PRIMARY KEY,
    ...
);

CREATE TABLE extra_data(
    objectid INTEGER,
    date DATETIME,
    ...
    FOREIGN KEY(objectid) REFERENCES object(id)
);
This way, when you need to delete all of your entries from a given date, it's as easy as:
DELETE FROM extra_data WHERE date = curdate;
I would try to avoid altering tables all the time; needing to do so usually indicates a bad design.
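As a rough usage sketch of that layout from Python's built-in sqlite3 module (the objectid and date values, and the extra value column, are illustrative, since the example schema leaves the remaining columns open):

import sqlite3

conn = sqlite3.connect('objects.db')

# Add today's data point for one object instead of adding a column.
conn.execute(
    "INSERT INTO extra_data (objectid, date, value) VALUES (?, ?, ?)",
    (42, '2015-06-01', 'some measurement'),
)

# Drop everything older than a cutoff date instead of dropping columns.
conn.execute("DELETE FROM extra_data WHERE date < ?", ('2015-01-01',))
conn.commit()
conn.close()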
For that size of a DB, I would use something else. I've used sqlite once for a media library with about 10k objects and it was slow: something like 5 minutes to query it all and display it, and searches were painful. Switching to Postgres made life much easier. This is just on the performance issue.
It also might be better to create a separate table that holds the date, the data/column you want to add, and a pk reference to the object it belongs to, and use that for your deletions instead of altering the table all the time. This can be done in sqlite if you give the pk an int type and save the pk of the object to it, instead of using a foreign key like you would with mysql/postgres.
If your database is pretty much a collection of almost-homogeneous data, you could just as well go for a simpler key-value database. If the main operation you perform on the data is scanning through everything, it would perform significantly better.
The Python standard library has bindings for popular ones via the anydbm module. There is also a dict-imitating proxy over anydbm in shelve. You could pickle your objects and their attributes using any serializer you want (simplejson, yaml, pickle).
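A minimal sketch of the shelve approach (the keys and attribute layout are illustrative):

import shelve

# shelve persists a dict-like mapping to disk; values are pickled automatically.
db = shelve.open('objects.shelf')

# Store one object's attributes under a key derived from its id.
db['object-42'] = {'name': 'example', 'daily': {'2015-06-01': 'some measurement'}}

# Scanning through everything is a plain loop over the mapping.
for key in db:
    attributes = db[key]

db.close()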