How to keep deleted rows from table in variable in Django [duplicate]

This question already has answers here:
Force evaluate a lazy query
(2 answers)
Closed last month.
I want to delete all the rows in a table, but before that I want to store the old data and do some operations comparing the old and new data.
This is what I have tried.
old_data = Rc.objects.all()
Rc.objects.all().delete()
# Now I fetch data from an API and save it to the table
I have noticed that the data in old_data is updated to the new data.
To test this I did something like this:
for rows in old_data:
    check = Rc.objects.filter(Q(id=rows.id) & Q(modified_date__gt=rows.modified_date)).exists()
    print(check)
I found that check is always false, even though rows with that id exist in the table.
But when I print each row in old_data right after old_data = Rc.objects.all(), old_data keeps the old values (check becomes true) and is not updated to the new data.
Is there something I am missing? How can I store all the rows in a variable and then delete the rows? Please help.

For this you need to understand how and when Django executes the query.
In your code below:
old_data = Rc.objects.all()
Rc.objects.all().delete()
# Now I fetch data from an API and save it to the table
the line old_data = Rc.objects.all() only builds a lazy QuerySet; the database query is not executed until you actually use old_data.
For example:
if you insert a line like len(old_data) or print(old_data) before the delete, the QuerySet is evaluated and its results are cached.
QuerySets are evaluated only when you "consume" them.
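A minimal sketch of that idea, using the Rc model from the question: wrapping the QuerySet in list() forces evaluation right away, so old_data becomes an in-memory snapshot that is no longer tied to the live table.
# Force evaluation before deleting: list() runs the query now and caches the rows.
old_data = list(Rc.objects.all())

Rc.objects.all().delete()

# ... fetch the API data and repopulate the table ...

# old_data still holds the pre-delete rows, so the comparison works as expected.
for old_row in old_data:
    newer_exists = Rc.objects.filter(id=old_row.id, modified_date__gt=old_row.modified_date).exists()
    print(old_row.id, newer_exists)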

Related

SQLAlchemy sqlite3 remove value from JSON column on multiple rows with different JSON values

Say I have an id column that is saved as ids JSON NOT NULL using SQLAlchemy, and now I want to delete an id from this column. I'd like to do several things at once:
query only the rows who have this specific ID
delete this ID from all rows it appears in
a bonus, if possible - delete the row if the ID list is now empty.
For the query, something like this:
db.query(models.X).filter(id in list(models.X.ids)) should work.
Now, I'd rather avoid iterating over each result and then sending an update request, since there can be multiple rows. Is there any elegant way to do this?
Thanks!
For the search-and-remove part you can use the json_remove function (one of SQLite's built-in JSON functions):
from sqlalchemy import func
db.query(models.X).update({'ids': func.json_remove(models.X.ids, f'$[{TARGET_ID}]')})
Here replace TARGET_ID with the array index of the targeted id (json_remove takes a JSON path such as $[0], i.e. a position in the array, not the value itself).
Note that this will update the rows 'silently' (whether or not the id is present in the array).
If you want to check first whether the target id is in the column, you can query all rows containing it with a json_extract filter (calling the .all() method) and then remove it from those rows with an .update() call.
But this costs you twice the number of queries (less performant).
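A rough sketch of that two-step variant, keeping the same index-based JSON path as above; TARGET_ID, models.X, and its primary-key column id are assumptions carried over from this answer:
from sqlalchemy import func

target_path = f'$[{TARGET_ID}]'  # same JSON path as in the update above

# Step 1: find the rows that actually have an element at the target position.
matching = (
    db.query(models.X)
    .filter(func.json_extract(models.X.ids, target_path).isnot(None))
    .all()
)

# Step 2: run json_remove only on those rows.
if matching:
    db.query(models.X).filter(
        models.X.id.in_([row.id for row in matching])
    ).update(
        {'ids': func.json_remove(models.X.ids, target_path)},
        synchronize_session=False,
    )
    db.commit()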
For the delete part, you can use the json_array_length built-in function
from sqlalchemy import func
db.query(models.X).filter(func.json_array_length(models.X.ids) == 0).delete()
FYI: I am not sure you can do both in one query, and even if it is possible, I would not, for the sake of clean syntax, logging, and monitoring.

How do I upsert all the rows from one table into another table using Postgres?

I am working in Python, using Pandas to pull data from a TSV, convert it to a data frame, and then sync that data frame to a temp table in Postgres using df.to_sql. That process works great.
However, once that table exists, I want to move all the rows from that table to the permanent table. The two tables will always be identical. The permanent table has a unique index, so if the id already exists it should update the row instead.
Here is my attempt to upsert all rows from one table to another:
WITH moved_rows AS (
DELETE FROM my_table_tmp a
RETURNING a.*
)
INSERT INTO my_table
SELECT * FROM moved_rows
ON CONFLICT ("unique_id") DO
UPDATE SET
Field_A = excluded.Field_A,
Field_B = excluded.Field_B,
Field_C = excluded.Field_C
Unfortunately, when I run this, I get this error:
psycopg2.errors.UndefinedColumn: column excluded.field_a does not exist
LINE 10: Field_A = excluded.Field_A,
^
HINT: Perhaps you meant to reference the column "excluded.Field_A".
But in fact, that column does exist. What am I missing here? I've tried removing Field_A from the set and then I get the same error with Field_B.
Answering my own question here: the issue is that Postgres folds unquoted identifiers to lowercase, so capitalization is only preserved when the identifier is quoted.
This was not clear in the example I posted because I obscured the naming of the fields I was working with. I've updated them now to show the issue.
In order to fix this you need to wrap your field names in double quotes, e.g. "Field_A" = excluded."Field_A".
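For completeness, a sketch of the corrected statement with every mixed-case identifier quoted, run through psycopg2; the conn connection object is an assumption, and the column list is the abbreviated one from the question:
upsert_sql = """
WITH moved_rows AS (
    DELETE FROM my_table_tmp a
    RETURNING a.*
)
INSERT INTO my_table
SELECT * FROM moved_rows
ON CONFLICT ("unique_id") DO
UPDATE SET
    "Field_A" = excluded."Field_A",
    "Field_B" = excluded."Field_B",
    "Field_C" = excluded."Field_C";
"""

with conn.cursor() as cur:  # conn: an existing psycopg2 connection
    cur.execute(upsert_sql)
conn.commit()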

PySpark - Get the inserted row id after writing to PostgreSQL DB

I'm using PySpark to write a DataFrame to a PostgreSQL database via the JDBC command below. How can I get the inserted row ids, which come from an identity column with auto-increment?
I'm using the command below, not a for loop inserting each row separately.
df.write.jdbc(url=url, table="table1", mode=mode, properties=properties)
I know I can use monotonicallyIncreasingId and set the IDs within Spark, but I'm looking for an alternative where the DB handles the assignment; I still want to get the IDs back to use in other DataFrames.
I didn't find this in the documentation.
The easiest way is to query the table you just wrote to and read it back into a new DataFrame.
Alternatively, as you iterate over each row in a for loop or generator, fetch the ID of the record you just created before closing the loop and append each ID to a new column in the DataFrame.
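A minimal sketch of the read-back approach, assuming the same url and properties used for the write and that table1 exposes the identity column as id; natural_key is a hypothetical business key used to join the ids back onto another DataFrame:
# Read the table back after the write; the DB-assigned ids are now present.
written = spark.read.jdbc(url=url, table="table1", properties=properties)

# Attach the generated ids to another DataFrame by joining on a business key.
other_df_with_ids = other_df.join(
    written.select("id", "natural_key"),
    on="natural_key",
    how="left",
)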

Can I use the column name in Python? [duplicate]

This question already has answers here:
How to get fields by name in query Python?
(3 answers)
Closed 3 years ago.
I would like to be able to use my SQL table in Python.
I managed to connect to and access my table using the code below.
I would like to know if it is possible to retrieve the column names in my code.
I already have my columns:
ACTION | MANAGER
123    | XYZ
456    | ABC
PersonnesQuery = "SELECT........"
cursor = mydb.cursor()
cursor.execute(PersonnesQuery)
records_Personnes = cursor.fetchall()
for row in records_Personnes:
    if (row["MANAGER"] == "XYZ"):
        ..................
    elif (row["MANAGER"] == "ABC"):
        ................
The approach quoted above does not seem to work. It does work with an index, for example row[0], but I would like to access values by column name.
In order to access a field by key, in this case "MANAGER", row would have to be a dict. Since you can only access row by index here, row is a plain positional sequence.
That means you can't access values by key when using .fetchall() with the default cursor, because it returns positional rows.
I'm assuming you're using the pymysql library, in which case there is a specific kind of cursor called a DictCursor. This returns each row as a dictionary rather than a positional sequence.
import pymysql

cursor = mydb.cursor(pymysql.cursors.DictCursor)
cursor.execute(PersonnesQuery)
records_Personnes = cursor.fetchall()
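With that cursor in place, the key lookups from the question work as written; a short usage sketch using the sample columns above:
for row in records_Personnes:
    # Each row is now a dict keyed by column name.
    if row["MANAGER"] == "XYZ":
        print(row["ACTION"])  # e.g. 123
    elif row["MANAGER"] == "ABC":
        print(row["ACTION"])  # e.g. 456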

SQLite3 + Python CSV DictReader: Best Method to handle empty values

I'm still new to Python, and I ran into an issue earlier this month where the string '0' was being passed into my integer column (using a SQLite db). More information is in my original thread:
SQL: Can WHERE Statement filter out specific groups for GROUP BY Statement
This caused my SQL Query statements to return invalid data.
I'm having this same problem pop up in other columns in my database when the CSV file does not contain any value for the specific cell.
The source of my data is an external CSV file that I download (Unicode format). I use the following code to insert my data into the DB:
import codecs
import csv
import sqlite3

with sqlite3.connect(db_filename) as conn:
    dbcursor = conn.cursor()
    with codecs.open(csv_filename, "r", "utf-8-sig") as f:
        csv_reader = csv.DictReader(f, delimiter=',')
        # This is a much smaller column example, as the actual data has many columns.
        csv_dict = [(i['col1'], i['col2']) for i in csv_reader]
        dbcursor.executemany(sql_str, csv_dict)
From what I researched, SQLite by design does not enforce column types when inserting values. My solution to my original problem was a manual check that turns an empty value into the integer 0:
def Check_Session_ID(sessionID):
    if sessionID == '':
        sessionID = int(0)
    return sessionID
Each integer/float column will need to be checked when I insert the values into the database. Since each import has many rows (100K+) across 50+ columns, I imagine the imports will take quite a bit of time.
What are better ways to handle this problem instead of checking each value for every int/float column per row?
Thank you so much for the advice and guidance.
