Does Django delete db_index? - python

I am using postgresql for my django app.
I managed to delete almost 500,000 rows, but the size of my DB didn't decrease significantly.
I deleted them with something like lots.objects.filter(id__in=[ids]).delete(), in chunks (because it's too hard to delete that many rows in one query).
Some columns have db_index=True, so I think the index entries were not deleted.
Is there a way to also delete the index entries for the deleted objects from Django?
Is there also a way to see unused indexes from Django?

None of this has anything to do with Django. If an item is deleted from a database, it is always automatically deleted from any indexes - otherwise indexing just wouldn't work.
Normally you should let Postgres itself determine the size of the database files. Deleted items are removed when a VACUUM operation is done; again, normally Postgres will do this via a regularly scheduled daemon. If you need to specifically recover space, then you can run VACUUM manually. See the docs.
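If you do want to reclaim space by hand, a minimal sketch of running VACUUM from Django's own connection might look like this (assuming the table behind your model is named app_lots; adjust to your schema):

from django.db import connection

# VACUUM cannot run inside a transaction block, so execute it in autocommit
# mode (Django's default outside of transaction.atomic()).
with connection.cursor() as cursor:
    cursor.execute("VACUUM app_lots")

Note that plain VACUUM marks the freed space as reusable but usually does not return it to the operating system; VACUUM FULL rewrites the table and does shrink the files on disk, at the cost of an exclusive lock while it runs.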

Related

When I delete all the items from my query in Django the item IDs don't reset [duplicate]

I have been working on an offline version of my Django web app and have frequently deleted model instances for a certain ModelX.
I have done this from the admin page and have experienced no issues. The model has only two fields, name and order, and no relationships to other models.
New instances are given the next available pk which makes sense, and when I have deleted all instances, adding a new instance yields a pk=1, which I expect.
Moving the code online to my actual database, I noticed that this is not the case. I needed to change the model instances, so I deleted them all, but to my surprise the primary keys kept on incrementing without resetting back to 1.
Going into the database using the Django API, I have checked that the old instances are gone, but even adding new instances yields a primary key that picks up where the last deleted instance left off, instead of 1.
Wondering if anyone knows what might be the issue here.
I wouldn't call it an issue; this is the default behaviour for many database systems. Basically, the auto-increment counter for a table is persistent, and deleting entries does not affect the counter. The actual value of the primary key does not affect performance or anything; it only has aesthetic value (if you ever reach the 2-billion limit, you'll most likely have other problems to worry about).
If you really want to reset the counter, you can drop and recreate the table:
python manage.py sqlclear <app_name> | python manage.py dbshell
Or, if you need to keep the data from other tables in the app, you can manually reset the counter:
python manage.py dbshell
mysql> ALTER TABLE <table_name> AUTO_INCREMENT = 1;
The most probable reason you see different behaviour in your offline and online apps is that the auto-increment value is only stored in memory, not on disk. It is recalculated as MAX(<column>) + 1 each time the database server is restarted. If the table is empty, the counter is completely reset on a restart. This probably happens very often in your offline environment, and close to never in your online environment.
As others have stated, this is entirely the responsibility of the database.
But you should realize that this is the desirable behaviour. An ID uniquely identifies an entity in your database. As such, it should only ever refer to one row. If that row is subsequently deleted, there's no reason why you should want a new row to re-use that ID: if you did that, you'd create confusion between the now-deleted entity that used to have that ID and the newly-created one that's reused it. There's no point in doing this, and you should not want to.
Did you actually drop them from your database or did you delete them using Django? Django won't change AUTO_INCREMENT for your table just by deleting rows from it, so if you want to reset your primary keys, you might have to go into your db and:
ALTER TABLE <my-table> AUTO_INCREMENT = 1;
(This assumes you're using MySQL or similar).
There is no issue; that's the way databases work. Django doesn't have anything to do with generating IDs; it just tells the database to insert a row and gets the ID back in response from the database. The ID starts at 1 for each table and increments every time you insert a row. Deleting rows doesn't cause the ID to go back. You usually shouldn't be concerned with that; all you need to know is that each row has a unique ID.
You can of course change the counter that generates the id for your table with a database command and that depends on the specific database system you're using.
If you are using SQLite you can reset the primary key with the following SQL commands (in the sqlite3 shell):
DELETE FROM your_table;
DELETE FROM sqlite_sequence WHERE name='your_table';
Another solution for Postgres DBs is from the UI.
Select your table, look for the 'sequences' dropdown, open its settings, and adjust the sequence that way.
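If you prefer to do the same thing from code rather than the UI, a rough equivalent from a Django shell could look like this (the sequence name myapp_modelx_id_seq is a placeholder; Postgres normally names it <table_name>_<column>_seq):

from django.db import connection

# Reset the sequence that backs the table's auto-incrementing primary key.
with connection.cursor() as cursor:
    cursor.execute("ALTER SEQUENCE myapp_modelx_id_seq RESTART WITH 1")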
I'm not sure when this was added, but the following management command will delete all data from all tables and will reset the auto increment counters to 1.
./manage.py sqlflush | psql DATABASE_NAME

Django migration 11 million rows, need to break it down

I have a table which I am working on and it contains about 11 million rows. I need to run a migration on this table, but since Django tries to store it all in cache I run out of RAM or disk space, whichever comes first, and the migration comes to an abrupt halt.
I'm curious to know if anyone has faced this issue and has come up with a solution to essentially "paginate" migrations, maybe into blocks of 10-20k rows at a time?
Just to give a bit of background, I am using Django 1.10 and Postgres 9.4, and I want to keep this automated if possible (which I still think it can be).
Thanks
Sam
The issue comes from PostgreSQL, which rewrites every row when you add a new column (field) with a default value.
What you need to do is write your own data migration in the following way:
Add the new column with null=True. In this case the data will not be rewritten and the migration will finish pretty fast.
Migrate it.
Add the default value.
Migrate it again.
That is basically a simple pattern for how to deal with adding a new column to a huge Postgres database.
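A rough sketch of what that could look like, assuming an app called lots, a model Lot, and a new field status (all names here are placeholders):

# Step 1: add the field as nullable so Postgres does not rewrite the table.
# In models.py:
#     status = models.CharField(max_length=20, null=True)
# Run makemigrations/migrate; this schema change finishes quickly.

# Step 2: backfill the value in batches with a data migration, e.g.
# lots/migrations/0002_backfill_status.py:
from django.db import migrations


def backfill_status(apps, schema_editor):
    Lot = apps.get_model("lots", "Lot")
    batch_size = 10000
    pending = Lot.objects.filter(status__isnull=True).values_list("pk", flat=True)
    batch = list(pending[:batch_size])
    while batch:
        Lot.objects.filter(pk__in=batch).update(status="new")
        batch = list(pending[:batch_size])


class Migration(migrations.Migration):
    dependencies = [("lots", "0001_initial")]
    operations = [
        migrations.RunPython(backfill_status, migrations.RunPython.noop),
    ]

A third migration can then tighten the field to its final default/not-null definition; each step stays small enough that Postgres never rewrites all 11 million rows at once.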

update existing cache data with newer items in django

I want to use caching in Django and I am stuck on how to go about it. I have data in some specific models which are write-intensive; records get added to the model continuously. Each user has some specific data in the model, similar to an orders table.
Since my model is write-intensive I am not sure how effective Django's caching frameworks are going to be. I tried Django's per-view caching, and I am trying to develop a view where it first picks up data from the cache. Then I will have another call which brings in the data that was added to the model after the cache was populated. What I want to do is add the updated data to the original cached data and store it again.
It is like I don't want to expire my cache; I just want to keep adding to my existing cache data. Maybe once every 3 hours I can clear it.
Is what I am doing right? Are there better ways than this? Can I really add to items in an existing cache?
I will be very glad for your help.
You ask about "caching" which is a really broad topic, and the answer is always a mix of opinion, style and the specific app requirements. Here are a few points to consider.
If the data is per user, you can cache it per user:
from django.core.cache import cache
cache.set(request.user.id, "foo")
cache.get(request.user.id)
The common practice is to keep a database flag that tells you whether the user's data has changed since it was cached. So before you fetch the data from the cache, check only this flag in the DB. If the flag says nothing changed, get the data from the cache; if it did change, pull from the DB, replace the cache, and set the flag again.
The flag check should be fast and simple: one table, indexed by user.id, with a boolean flag field. This packs a lot of index rows into a single DB page and enables fast fetching of a single one-field row. Yet you still get persistent, up-to-date main storage that prevents the use of stale cache data. You can check this flag in a middleware.
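A minimal sketch of that flag-check pattern, with hypothetical names (UserCacheFlag, get_orders_for and the order_set relation are all placeholders):

from django.core.cache import cache
from django.db import models


class UserCacheFlag(models.Model):
    # One row per user: has the user's data changed since it was last cached?
    user_id = models.IntegerField(primary_key=True)
    is_stale = models.BooleanField(default=True)


def get_orders_for(user):
    flag, _ = UserCacheFlag.objects.get_or_create(user_id=user.id)
    cache_key = "orders:%s" % user.id
    data = None if flag.is_stale else cache.get(cache_key)
    if data is None:
        # Cache is stale or empty: pull from the DB, refresh the cache, clear the flag.
        data = list(user.order_set.all())
        cache.set(cache_key, data)
        UserCacheFlag.objects.filter(user_id=user.id).update(is_stale=False)
    return data

The write path would then set is_stale=True whenever new records are added for that user, for example in the model's save() or via a post_save signal.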
You can run expiry in many ways: clear the cache when the user logs out, run a cron script that clears items, or let the cache backend expire items. If you use a flag check before you use the cache, there is no issue in keeping items in the cache except space, and caching backends handle that. If you use Django's simple file-based cache (which is easy, simple and zero-config), you will have to clear the cache yourself; a simple cron script will do.
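For the backend-expiry option, a tiny sketch matching the "clear it every 3 hours" idea from the question (the key format is a placeholder):

from django.core.cache import cache


def cache_orders(user, data):
    # Let the cache backend expire the entry itself after 3 hours.
    cache.set("orders:%s" % user.id, data, timeout=60 * 60 * 3)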

Django haystack indexing after new entity added

I'm wondering what operation I should invoke after adding a new entity to the database to make this entity searchable with Haystack:
should I only update the index?
should I rebuild the whole index?
What's problematic is that new entities will be added frequently and there might be a potentially large number of entities in the DB.
If you're adding new rows to your database then update_index should be enough.
From the haystack docs:
The conventional method is to use SearchIndex in combination with cron jobs. Running a ./manage.py update_index every couple hours will keep your data in sync within that timeframe and will handle the updates in a very efficient batch.
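A small sketch of what such a scheduled job could run, assuming django-haystack's update_index options (the 2-hour window is just an example):

from django.core.management import call_command

# Reindex only objects changed in the last 2 hours and drop deleted ones.
call_command("update_index", age=2, remove=True)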
If you added a new field to your search index, then you would need to run rebuild_index:
If you have an existing SearchIndex and you add a new field to it, Haystack will add this new data on any updates it sees after that point. However, this will not populate the existing data you already have.
In order for the data to be picked up, you will need to run ./manage.py rebuild_index. This will cause all backends to rebuild the existing data already present in the quickest and most efficient way.

secure hash as database table key

I have a database table that is populated by a long-running process. This process reads external data and updates the records in the database. Instead of really updating the records, it is easier to cascade-delete them and recreate them; this way all the dependencies will be cleaned up too.
Each record has a unique name. I need to find a way to generate identifiers for these records in such a way that the same names are identified by the same identifiers. So that the identifier stays the same when the record is deleted and recreated. I tried using slugs but they can become very long and Django's SlugField does not always work.
Is it reasonable to use a secure hash as the key? I could create a hash from the slug and use that. Or is it too expensive?
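A rough illustration of the idea, with hypothetical model and helper names (computing the hash itself is cheap compared with the surrounding database work):

import hashlib

from django.db import models


def name_to_key(name):
    # SHA-256 of the unique name, hex-encoded: the same name always maps to
    # the same 64-character key, even after a delete-and-recreate cycle.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


class Record(models.Model):
    # Use the derived hash as the primary key instead of an auto id.
    id = models.CharField(max_length=64, primary_key=True, editable=False)
    name = models.CharField(max_length=500, unique=True)

    def save(self, *args, **kwargs):
        if not self.id:
            self.id = name_to_key(self.name)
        super().save(*args, **kwargs)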
