The only way I can find to add new data to a TinyDB table is the table.insert() method. However, this appends the entry to the end of the table, and I would like to maintain the sequence of entries; sometimes I need to insert at an arbitrary index in the middle of the table. Is there no way to do this?
There is no way to do exactly what you are asking. Normally, the default index tracks insertion order: when you add data, it goes at the end. If you need to maintain a particular order, you could create a new property to handle that case and retrieve with a sort on that property.
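For example, a minimal sketch of the sort-on-a-property idea; the seq field name is just an illustration:

from tinydb import TinyDB

db = TinyDB("db.json")
db.insert({"name": "first", "seq": 10})
db.insert({"name": "third", "seq": 30})
db.insert({"name": "second", "seq": 20})  # inserted last, but sorts into the middle

# Retrieve in logical order regardless of insertion order.
for doc in sorted(db.all(), key=lambda d: d["seq"]):
    print(doc["name"])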
If you truly want to insert at a specific id, you would need to add some logic to cascade the documents down, as sketched below. The logic would flow as:
Insert a new record which is equal to the last record.
Then, go backwards and cascade the records into the newly opened location.
Stop when you get to the location you need, and update the record with what you want to insert by using the ID.
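A minimal sketch of that cascade, assuming doc_ids are contiguous starting at 1 (TinyDB's default when nothing has been deleted) and that all records share the same fields:

from tinydb import TinyDB

def insert_at(db, position, new_record):
    # Assumes doc_ids run 1..len(db) with no gaps and all docs share the same keys.
    last_id = len(db)
    db.insert(dict(db.get(doc_id=last_id)))        # 1. duplicate the last record at the end
    for doc_id in range(last_id, position, -1):    # 2. cascade each record down one slot
        db.update(dict(db.get(doc_id=doc_id - 1)), doc_ids=[doc_id])
    db.update(new_record, doc_ids=[position])      # 3. overwrite the target slot

db = TinyDB("db.json")
insert_at(db, 3, {"name": "inserted in the middle"})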
The performance would drag, since you are having to shift the records down. There are other ways to maintain the list; it would be similar to inserting a record in the middle of an array, and similar methods would apply here. Good luck!
Related
In our DynamoDB database, we have a table where we usually have thousands of items that are junk because of test data, and we clean it up once in a while.
But there is a specific item that we don't want to delete; when we select everything and delete, that one gets deleted as well.
Is there a way, in the table, to define that ID and stop it from getting deleted? Or, if someone comes and wants to delete all, it will delete everything except that one?
I can think of two options:
Add a policy, to anyone (or any role) who might perform this action, that denies permission to delete that item. You can accomplish this as described in Specifying Conditions: Using Condition Keys, using the dynamodb:LeadingKeys condition key; a sketch follows below.
Add a stream handler to your table and any time the record is deleted you can automatically add it back.
The first option is probably best, but you would need to be sure it's always attached to the appropriate users/roles. You also need to be sure you are handling the error you're going to get when you try to delete the record you aren't allowed to delete.
The second option removes the need to worry about that, but it comes with the overhead of a Lambda running every time you create, update, or delete a record in the table (with some batching, so not EVERY change). It also opens up a brief window where the record will have been deleted, so if it's important that the record NEVER be deleted then this isn't a viable option.
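For the first option, a hedged sketch of what such a deny policy might look like when attached with boto3; the region, account, table name, user name, and protected partition key value are all placeholders:

import json
import boto3  # assumes AWS credentials are already configured

PROTECTED_KEY = "do-not-delete-me"  # partition key value of the item to protect

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": "dynamodb:DeleteItem",
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable",
        "Condition": {
            # Deny DeleteItem whenever the leading (partition) key matches.
            "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [PROTECTED_KEY]}
        },
    }],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName="cleanup-user",               # whoever runs the bulk deletes
    PolicyName="DenyDeleteProtectedItem",
    PolicyDocument=json.dumps(policy),
)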
I hope I can make this as clear as possible. I am maintaining data regarding stage movements, i.e. when someone moves into a stage and when someone moves out. I want the BigQuery table to have a single entry for each stage movement (because of the kind of queries I'll be doing on the data), but there are two updates, one for in and one for out, so this is what I am doing:
Normal streaming insert when someone moves into a stage.
While Moving out:
a. Copy the table onto itself with a truncating write (same destination), using a query like
SELECT * FROM my_dataset.my_table WHERE id !="id"
b. Do a streaming insert for the new row.
The problem is, there are random data drops when doing streaming inserts after the copy operation.
I found this link: After recreating BigQuery table streaming inserts are not working?
where it has been mentioned that there should be a delay of more than 2 minutes before doing streaming inserts in this case to avoid data drops. But I want it to be instantaneous, since multiple stage movements can happen within a few seconds. Is there a workaround or a fix for this? Or do I have to rethink my complete process on an append-only basis, which isn't looking likely right now?
do I have to rethink my complete process on an append-only basis?
My suggestion for your particular case would be not to truncate the table on each and every "move out".
Assuming you have a field that identifies the most recent row (a timestamp, an ordering number, etc.), you can easily filter out old rows with something like:
SELECT <your_fields>
FROM (
  SELECT
    <your_fields>,
    ROW_NUMBER() OVER(PARTITION BY id ORDER BY timestamp DESC) AS most_recent_row
  FROM my_dataset.my_table
)
WHERE most_recent_row = 1
If needed you can do daily purging of "old/not latest" rows into truncated table using the very same approach as above
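A hedged sketch of that daily purge using the google-cloud-bigquery client; the project/dataset/table names are placeholders, and overwriting the table still has the streaming-buffer caveats discussed above:

from google.cloud import bigquery

client = bigquery.Client()

# Keep only the most recent row per id, writing the result back over the table.
sql = """
SELECT * EXCEPT(most_recent_row)
FROM (
  SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY timestamp DESC) AS most_recent_row
  FROM my_dataset.my_table
)
WHERE most_recent_row = 1
"""

job_config = bigquery.QueryJobConfig(
    destination=bigquery.TableReference.from_string("my_project.my_dataset.my_table"),
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(sql, job_config=job_config).result()  # blocks until the purge finishes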
where it has been mentioned?
Maybe not explicitly your case, but check the Data availability section.
And in How to change the template table schema, read the third paragraph (I feel it is related).
So I have an application in which a user creates a list. The user also orders the items in the list and can add and remove items from the list. If the user logs out and then logs in from another device, the list needs to be presented in the same order it was in before.
So far I have approached this problem by just adding a field called "order" to the records in the table. Let's say I have a list of 800 items. If the user deletes item 4, I cannot simply remove the record from the table; I also have to update 796 records to reflect the new order of those items. If the user then adds an item, I not only have to add a record to the table, I have to update every item with an order count higher than the position where the new item was added.
My approach seems expensive and naive to me. Is there some clever and efficient way to approach this problem? Any ideas? Thanks.
You should implement the equivalent of a doubly linked list where each node has a pointer to its previous and next node.
Inserting a node is only updating previous/next pointers so no need to update anything else.
Removing a node is only updating previous/next pointers (by having them point to each other) so no need to update anything else.
So instead of one order field you need two fields, previous and next, that indicate the previous and next nodes in the ordered list.
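A minimal sketch of the idea, using an in-memory dict to stand in for the table; the prev_id/next_id names are illustrative, and in a real schema they would be nullable self-referencing foreign key columns:

# Each "row" keeps pointers to its neighbours instead of a numeric position.
rows = {
    1: {"value": "a", "prev_id": None, "next_id": 2},
    2: {"value": "b", "prev_id": 1, "next_id": None},
}

def insert_after(rows, new_id, value, after_id):
    # Only the new row and its two neighbours are touched.
    nxt = rows[after_id]["next_id"]
    rows[new_id] = {"value": value, "prev_id": after_id, "next_id": nxt}
    rows[after_id]["next_id"] = new_id
    if nxt is not None:
        rows[nxt]["prev_id"] = new_id

def remove(rows, row_id):
    # Unlink by pointing the neighbours at each other.
    prev_id, next_id = rows[row_id]["prev_id"], rows[row_id]["next_id"]
    if prev_id is not None:
        rows[prev_id]["next_id"] = next_id
    if next_id is not None:
        rows[next_id]["prev_id"] = prev_id
    del rows[row_id]

insert_after(rows, 3, "between a and b", after_id=1)  # a -> 3 -> b
remove(rows, 2)                                       # a -> 3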
First off, this is my first project using SQLAlchemy, so I'm still fairly new.
I am making a system to work with GTFS data. I have a back end that seems to be able to query the data quite efficiently.
What I am trying to do, though, is allow the GTFS files to update the database with new data. The problem I am hitting is pretty obvious: if the data I'm trying to insert is already in the database, we have a conflict on the uniqueness of the primary keys.
For efficiency reasons, I decided to use the following code for insertions, where model is the model object I would like to insert the data into, and data is a precomputed, cleaned list of dictionaries to insert.
for chunk in [data[i:i + chunk_size] for i in xrange(0, len(data), chunk_size)]:
    engine.execute(model.__table__.insert(), chunk)
There are two solutions that come to mind.
I find a way to do the insert, such that if there is a collision, we don't care, and don't fail. I believe that the code above is using the TableClause, so I checked there first, hoping to find a suitable replacement, or flag, with no luck.
Before we perform the cleaning of the data, we get the list of primary key values, and if a given element matches on the primary keys, we skip cleaning and inserting the value. I found that I was able to get the PrimaryKeyConstraint from Table.primary_key, but I can't seem to get the Columns out, or find a way to query for only specific columns (in my case, the Primary Keys).
Either should be sufficient, if I can find a way to do it.
After looking into both of these for the last few hours, I can't seem to find either. I was hoping that someone might have done this previously, and point me in the right direction.
Thanks in advance for your help!
Update 1: There is a 3rd option I failed to mention above: purge all the data from the database and reinsert it. I would prefer not to do this, as even with small GTFS files there are easily hundreds of thousands of elements to insert, and this seems to take about half an hour to perform, which would mean a lot of downtime for updates if this makes it to production.
With SQLAlchemy, you simply create a new instance of the model class, and merge it into the current session. SQLAlchemy will detect if it already knows about this object (from cache or the database) and will add a new row to the database if needed.
for row in chunk:
    newentry = model(**row)   # build an instance from the dict of column values
    session.merge(newentry)   # inserts, or updates if the primary key already exists
session.commit()
Also see this question for context: Fastest way to insert object if it doesn't exist with SQLAlchemy
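Alternatively, for the second approach mentioned in the question, the primary key columns can be read from the table metadata and used to skip rows that already exist. A hedged sketch, reusing model, data, chunk_size, engine, and a session from above:

# Names of the primary key columns, pulled from the mapped table's metadata.
pk_cols = [col.name for col in model.__table__.primary_key.columns]

# Fetch only the key columns for everything already in the table.
existing_keys = {
    tuple(row) for row in session.query(*[getattr(model, name) for name in pk_cols]).all()
}

# Drop incoming rows whose primary key is already present, then bulk insert the rest.
fresh = [row for row in data if tuple(row[name] for name in pk_cols) not in existing_keys]
for chunk in [fresh[i:i + chunk_size] for i in xrange(0, len(fresh), chunk_size)]:
    engine.execute(model.__table__.insert(), chunk)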
I have a table in a Django app where one of the fields is called order (as in sort order) and is an integer. Every time a new record is entered, the field auto-increments to the next number. My issue is that when a record is deleted, I would like the other records to shift up a number, and I can't find anything that would recalculate all the records in the table and shift them up a number when a record is deleted.
For instance, there are 5 records in the table with order numbers 1, 2, 3, 4, and 5. Someone deletes record number 2, and now I would like numbers 3, 4, and 5 to move up to take the deleted number 2's place, so the order numbers would be 1, 2, 3, and 4. Is this possible with Python, Postgres and Django?
Thanks in Advance!
You are going to have to implement that feature yourself; I doubt very much that a relational DB will do that for you, and for good reason: it means updating a potentially large number of rows when one row is deleted.
Are you sure you need this? It could become expensive.
Here is what I ended up using:
item.delete()
items = table.objects.order_by('order')
count = 0
for element in items:
    element.order = count
    element.save()
    count = count + 1
You're probably better off leaving the values in the table alone and using a query to generate the numbering. You can use window functions to do this if you're up to writing some SQL.
SELECT
    output_column,
    ...,
    row_number() over (order by order_column)
FROM
    TheTable;
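If you'd rather stay in the ORM, Django (2.0+) exposes the same window function. A minimal sketch, assuming a model named Item with an integer order field (both names are illustrative):

from django.db.models import F, Window
from django.db.models.functions import RowNumber

# Annotate each row with its position at query time instead of storing it.
items = Item.objects.annotate(
    position=Window(expression=RowNumber(), order_by=F('order').asc())
)
for item in items:
    print(item.position, item.pk)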
Instead of deleting orders, you should create a boolean field (call it whatever you like, for example deleted) and set this field to 1 for "deleted" orders.
Messing with a serial field (which is what your auto-increment field is called in Postgres) will lead to problems later, especially if you have foreign keys and relationships with other tables.
Not only will it impact your database server's performance; it will also impact your business, as eventually you will have two orders floating around that have the same order number. Even though you have "deleted" one from the database, the order number may already be referenced somewhere else, like in a receipt you printed for your customer.
You could try using the post_save and post_delete signals to query the appropriate objects, sort them, and then look for missing numbers and reassign/save as necessary. This might be pretty heavy for a lot of data, but for only a few items that change rarely, it would be OK.
from django.db.models.signals import post_delete
from django.dispatch import receiver
def fix_order(sorted_objects):
    # ensures that the given objects have sequential order values from 1 upwards
    i = 1
    for item in sorted_objects:
        if item.order != i:
            item.order = i
            item.save()
        i += 1

@receiver(post_delete, sender=YourSortedModel)
def update_order_post_delete(sender, **kwargs):
    # get the sorted items you need
    sort_items = YourSortedModel.objects.filter(....).order_by('order')
    fix_order(sort_items)
I came across this looking for something else and wanted to point something out:
By storing the order in a field in the same table as your data, you lose data integrity, and if you index it, things get very complicated when you hit a conflict. In other words, it's very easy for a bug (or something else) to give you two 3's, a missing 4, and other weird things. I inherited a project with a manual sort order that was critical to the application (there were other issues as well), and this was constantly an issue, with just 200-300 items.
The right way to handle a manual sort order is to have a separate table to manage it and sort with a join. This way your Order table will have exactly 10 entries with just its PK (the order number) and a foreign key relationship to the ID of the items you want to sort. Deleted items just won't have a reference anymore.
You can continue to re-sort on delete similar to how you're doing it now; you'll just be updating the Order model's FKs instead of iterating through and re-writing all your items. Much more efficient.
This will scale up to millions of manually sorted items easily. But rather than using auto-incremented ints, you would want to give each item a random order ID between the two items you want to place it between, and keep plenty of space (a few hundred thousand should do it) so you can arbitrarily re-sort them.
I see you mentioned that you've only got 10 rows here, but designing your architecture to scale well the first time, as a practice, will save you headaches down the road, and once you're in the habit of it, it won't really take you any more time.
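A hedged sketch of that layout as Django models; the model and field names are illustrative only:

from django.db import models

class Item(models.Model):
    name = models.CharField(max_length=100)

class ItemOrder(models.Model):
    # Sparse sort key: leave big gaps so an item can be dropped between two
    # others without renumbering anything else.
    position = models.BigIntegerField(unique=True)
    item = models.OneToOneField(Item, on_delete=models.CASCADE)

# Sorting is a join; removing an ItemOrder row never touches the items themselves.
ordered = Item.objects.filter(itemorder__isnull=False).order_by('itemorder__position')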
Try setting the value with a sequence type in Postgres, using pgAdmin.