PostgreSQL DROP TABLE doesn't work - Python

I'm trying to drop a few tables with the "DROP TABLE" command, but for an unknown reason the program just "sits" there and doesn't delete the table I want it to in the database.
I have 3 tables in the database:
Product, Bill and Bill_Products, which is used for referencing products in bills.
I managed to delete/drop Product, but I can't do the same for Bill and Bill_Products.
I'm issuing the same "DROP TABLE Bill CASCADE;" command, but the command line just stalls. I've also tried the simple version without the CASCADE option.
Do you have any idea why this is happening?
Update:
I've been thinking that the database might be keeping some references from Products to Bills, and maybe that's why it won't delete the Bill table.
So, for that matter, I issued a simple SELECT * FROM Bill_Products, and after a few (10-15) seconds (strangely, because I don't think it's normal for an empty table to take that long) it printed out the table and its contents, which are none (so apparently there are no references left from Products to Bill).

What is the output of
SELECT *
FROM pg_locks l
JOIN pg_class t ON l.relation = t.oid AND t.relkind = 'r'
WHERE t.relname = 'Bill';
It might be that other sessions are using your table in parallel and you cannot obtain the ACCESS EXCLUSIVE lock needed to drop it.
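If you want to run the same diagnostic from Python, here is a minimal, untested psycopg2 sketch (the connection string is a placeholder; note that relname is case-sensitive, so it is lowercase unless the table was created with quotes):
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder connection string
cur = conn.cursor()
# List sessions holding or waiting for locks on the table.
cur.execute("""
    SELECT l.pid, l.mode, l.granted
    FROM pg_locks l
    JOIN pg_class t ON l.relation = t.oid AND t.relkind = 'r'
    WHERE t.relname = 'Bill'
""")
for pid, mode, granted in cur.fetchall():
    print(pid, mode, granted)
cur.close()
conn.close()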

Just do
SELECT pid, relname
FROM pg_locks l
JOIN pg_class t ON l.relation = t.oid AND t.relkind = 'r'
WHERE t.relname = 'Bill';
And then kill each pid with
kill 1234
where 1234 is the actual pid from the query results.
You can pipe it all together like this (so you don't have to copy-paste every pid manually):
psql -c "SELECT pid FROM pg_locks l
JOIN pg_class t ON l.relation = t.oid AND t.relkind = 'r'
WHERE t.relname = 'Bill';" | tail -n +3 | head -n -2 | xargs kill

So I was hitting my head against the wall for some hours trying to solve the same issue, and here is the solution that worked for me:
Check if PostgreSQL has a pending prepared transaction that's never been committed or rolled back:
SELECT database, gid FROM pg_prepared_xacts;
If you get a result, then for each transaction gid you must execute a ROLLBACK from the database having the problem:
ROLLBACK PREPARED 'the_gid';
For further information, see the PostgreSQL documentation on ROLLBACK PREPARED.
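If you would rather do this from Python, here is an untested psycopg2 sketch of the same steps (the connection string is a placeholder; autocommit is needed because ROLLBACK PREPARED cannot run inside a transaction block):
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder connection string
conn.autocommit = True  # ROLLBACK PREPARED cannot run inside a transaction block
cur = conn.cursor()
cur.execute("SELECT gid FROM pg_prepared_xacts WHERE database = current_database()")
for (gid,) in cur.fetchall():
    cur.execute("ROLLBACK PREPARED %s", (gid,))  # psycopg2 quotes the gid client-side
cur.close()
conn.close()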

If this is for AWS Postgres, run the first statement to get the PID (process ID), and then run the second query to terminate the process (it is roughly the equivalent of kill -9, but since this is in the cloud, it's what AWS recommends).
-- gets you the PID
SELECT pid, relname FROM pg_locks l JOIN pg_class t ON l.relation = t.oid AND t.relkind = 'r' WHERE t.relname = 'YOUR_TABLE_NAME';
-- what actually kills the PID ... it is a SELECT statement, but it kills the job!
SELECT pg_terminate_backend(YOUR_PID_FROM_PREVIOUS_QUERY);

I ran into this today, I was issuing a:
DROP TABLE TableNameHere
and getting ERROR: table "tablenamehere" does not exist. I realized that for case-sensitive table names (as mine was), you need to quote the table name:
DROP TABLE "TableNameHere"

Had the same problem.
There were not any locks on the table.
Reboot helped.

Old question, but I ran into a similar issue. I could not reboot the database, so I tested a few things until this sequence worked:
truncate table foo;
drop index concurrently foo_something; -- repeated 4-5 times
alter table foo drop column whatever_foreign_key; -- repeated 3 times
alter table foo drop column id;
drop table foo;

The same thing happened for me, except that it was because I forgot the semicolon. Face palm.

Related

Query executes in phpMyAdmin, but not in a Python script

The statement is set up so that when a record already exists, it doesn't add a record; otherwise, it does.
I've tried changing the query, even though I don't see anything wrong with it.
I've let the script run in Python and printed the query it executed. Then I pasted that query into phpMyAdmin, where it executed successfully.
I have also double checked all parameters.
Query (blank params):
INSERT INTO users (uname,pass) SELECT * FROM (SELECT '{}','{}') AS tmp WHERE NOT EXISTS(SELECT uname FROM users WHERE uname = '{}') LIMIT 1;
Query (filled in parameters):
INSERT INTO users (uname,pass) SELECT * FROM (SELECT 'john_doe','password') AS tmp WHERE NOT EXISTS(SELECT uname FROM users WHERE uname = 'john_doe') LIMIT 1;
Python script (the important part)
if action == "add_user":
username = form.getvalue('username')
password = form.getvalue('password')
query = """
INSERT INTO users (uname,pass) SELECT * FROM
(SELECT '{}','{}') AS tmp WHERE NOT EXISTS(SELECT uname FROM users WHERE uname = '{}') LIMIT 1;
""".format(username, password, username)
mycursor.execute(query)
I know a couple of things.
There is nothing wrong with the database connection.
The parameters are not empty (ex. username="john_doe" & password="secret")
The query actually executes in that specific table.
The query seems to add a record and delete it directly afterwards (as AUTO_INCREMENT increases each time, even when the Python script executes and doesn't add anything).
A try/except doesn't do anything, as mysql.connector.Error doesn't report any error (which is obvious, since the query actually executes successfully).
phpMyAdmin practical example:
(Removed INSERT INTO part in order to be able to show the resulting tables)
The first time you run the query (the query above as an example), it results in a table where the supplied values appear both as the column names and as the column values.
Screenshot of table output: http://prntscr.com/nkgaka
Once that result has been inserted, the next time you try to insert it, the query simply returns only column names and no rows. This means it will insert nothing, as there are no actual values to insert.
Screenshot of table output: http://prntscr.com/nkgbp3
Help is greatly appreciated.
If you want to ensure a field is unique in a table, make that field a PRIMARY KEY or UNIQUE KEY. In this case you want uname to be unique.
CREATE UNIQUE INDEX uname ON users(uname)
With this in place you only need to INSERT. If a duplicate key error occurs, the uname already exists.
To avoid SQL injection, don't build queries with """SELECT {}""".format(); it is vulnerable to SQL injection. Pass the values as parameters to cursor.execute() instead.
Also don't store plain-text passwords, ever. Salted hashes at least. There are plenty of frameworks that do this well already, so you don't need to invent your own.
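A rough sketch of both points with mysql.connector (the connection parameters and the hash value are placeholders; it assumes the unique index on uname shown above):
import mysql.connector
from mysql.connector import errorcode

conn = mysql.connector.connect(host="localhost", user="me",
                               passwd="secret", database="mydb")  # placeholders
cur = conn.cursor()

username = "john_doe"
password_hash = "<salted hash, not the plain password>"  # e.g. from bcrypt/argon2

try:
    # Parameterized query: the driver escapes the values, no .format() needed.
    cur.execute("INSERT INTO users (uname, pass) VALUES (%s, %s)",
                (username, password_hash))
    conn.commit()
except mysql.connector.IntegrityError as err:
    if err.errno == errorcode.ER_DUP_ENTRY:
        print("username already exists")
    else:
        raise
finally:
    cur.close()
    conn.close()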

How to scale psycopg2 insert and select with single process in python?

It takes an average of about 0.300095081329 seconds for my insert to go through and finish the commit to Postgres.
Here is my table pattern
id_table
latest_update_id (primary index)
product_id (index)
publish_date
product_meta_table
latest_update_id (index)
product_id (index)
meta_related_info1
meta_related_info2
...etc
product_table
latest_update_id (index)
product_id (index)
note_related_info1
note_related_info2
....etc
Here are some of my inserts
db_cursor.execute("INSERT INTO id_table (product_id, publish_date) \
VALUES (%s, %s) RETURNING latest_update_id",
(my_dict["product_id"], my_dict["publish_date"])
)
db_cursor.execute("INSERT INTO product_table ( \
latest_update_id, \
product_id, \
note_related_info1, \
note_related_info2, \
...etc) \
VALUES (%s, %s, %s, %s) RETURNING *",
(my_dict["latest_update_id"],
my_dict["product_id"],
my_dict["note_related_info1"],
my_dict["note_related_info2"])
)
Using the insert time, my throughput is about 1/0.3 ≈ 3 qps.
I know I can scale this horizontally by adding more instances but I want to try to see if I can hit at least 3000qps.
I am thinking of either using async or threading, but I am not sure whether the GIL is going to interfere.
Is there a general good practice and technique on how to scale insert statements using psycopg2?
Thanks
Note: I am using python 2.7
Note: the Python process is communicating with the SQL server over HTTPS.
Note: the inserts to each table are staggered; table2 inserts after table1, and table3 inserts after table2. Technically, table2 and table3 only have to wait for table1 to finish its insert, because they need latest_update_id.
Do a single insert query instead of 3. Notice the triple quotes and the dictionary parameter passing:
insert_query = """
with i as (
insert into id_table (product_id, publish_date)
values (%(product_id)s, %(publish_date)s)
returning latest_update_id
)
insert into product_table (
latest_update_id,
product_id,
note_related_info1,
note_related_info2
) values (
(select latest_update_id from i),
%(product_id)s, %(note_related_info1)s, %(note_related_info2)s
)
returning *
"""
db_cursor.execute(insert_query, my_dict)
Follow-up on my network comment.
Say you have 100ms roundtrip (like the time for SELECT 1).
If you want to chain queries, then you will have no other choice than to do INSERT... with tons of values to amortize the roundtrip time.
This is cumbersome, as you will then have to sort through the returned ids to insert the dependent rows. Also, if your bandwidth is low, you will saturate it, and it won't be that fast anyway.
If your bandwidth is high enough but your ping is slow, you may be tempted to multithread... but this creates another problem...
Instead of having, say, 1-2 server processes churning through queries very fast, you'll have 50 processes sitting there doing nothing except wasting valuable server RAM while they wait for the queries to come over the slow network.
Also, concurrency and lock issues may arise. You won't do just INSERTs... You're going to do some SELECT FOR UPDATE which grabs a lock...
...and then other processes pile up to acquire that lock while your next query crawls over the network...
This feels like using MyISAM in a concurrent write-intensive scenario. Locks should be held for the shortest time possible... fast pings help; putting the whole chain of queries, from lock acquisition to lock release, inside a stored proc is even better, so the lock is held for only a very short time.
So, consider executing your Python script on the DB server, or on a server on the same LAN.
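To make the multi-value idea concrete, here is a rough sketch using psycopg2's execute_values helper (available since psycopg2 2.7, fetch=True since 2.8); the table and column names come from the question, and the rows are made up:
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string
cur = conn.cursor()

rows = [("prod-1", "2016-01-01"), ("prod-2", "2016-01-02")]  # illustrative data

# One round trip inserts the whole batch and returns the generated ids.
ids = execute_values(
    cur,
    "INSERT INTO id_table (product_id, publish_date) VALUES %s RETURNING latest_update_id",
    rows,
    fetch=True,
)
conn.commit()
print(ids)  # e.g. [(1,), (2,)]
cur.close()
conn.close()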

Error (1046, 'No database selected') happened in Python program, but not in MySQL Workbench

I have encountered a weird problem when using Python to update a record in a MySQL database: error 1046 was thrown by the Python program, but the same MySQL statement worked perfectly fine in MySQL Workbench.
Here is the mysql statement,
UPDATE resultant_data d
INNER JOIN
(SELECT
uid,
SUBSTRING_INDEX(GROUP_CONCAT(login_type
ORDER BY device_ct DESC), ',', 1) devices
FROM
(SELECT
uid, login_type, COUNT(*) AS device_ct
FROM
login_record l
WHERE
l.ctime > 1451577600
AND l.ctime < 1454256000
GROUP BY uid , login_type
ORDER BY device_ct DESC) a
GROUP BY uid) ct ON d.uid = ct.uid AND d.month_id = 1
SET
d.device = ct.devices
;
My task is to update each user's most used login device during one month into the table resultant_data, based on the login_record table. So step one (innermost query): create a table that shows uid, login_type, and login count (i.e. device_ct). Step two (the second innermost query): based on device_ct, find the uid and the login_type associated with the highest device_ct. Step three (the update layer): match the uid and update the record into resultant_data.
So does the problem come from Python, or from the MySQL statement? I suspect the problem is due to the "inner join" (although it works fine in MySQL Workbench). I had a similar problem before, which I solved by rewriting "inner join" as "where uid in (select....)". But for this task, is there a way to rewrite or restructure the statement?
Many thanks.
I don't think your problem is your SQL. Make sure that when you are making your database connection in Python, you include the database (aka schema) name that the table resides in:
db_connection = mysql.connector.connect(
    host="192.168.1.101",
    user="myusername",
    passwd="mypassword",
    database="database_table_lives_in"
)

Remove all data from table but last N entries

I'm using psycopg2 with Python.
I'd like to periodically flush data from my db. I've set up a task with Timer for this. I had asked this question before, but using the answer listed there freezes up my machine (the keyboard stops responding and the entire system grinds to a halt). Instead, I would like to delete all entries in my table except the last N (not sure that this is the right approach either).
Basically, there is another Python process running (a separate executable) that is populating the db I wish to interrogate. It seems that if I delete all entries while that other process is running, it can lead to the freeze. I don't know of a safe way to remove entries; it's almost as if the other process is relying on an incrementing ID as it writes to the db.
If anyone could help me work this out it'd be greatly appreciated. Thoughts?
A possible solution is to run a DELETE on all ids except those returned by select ... order by pk desc limit N, given an auto-incrementing pk. If no such pk exists, having a created_date and ordering by it should do the same.
Untested example:
import psycopg2

connection = psycopg2.connect('dbname=test user=postgres')
cursor = connection.cursor()
query = """delete from my_table where id not in (
    select id from my_table order by id desc limit 30)"""
cursor.execute(query)
connection.commit()  # commit is needed; it is a method of the connection, not the cursor
cursor.close()
connection.close()
This is probably much faster:
CREATE TEMP TABLE tbl_tmp AS
SELECT * FROM tbl ORDER BY <undisclosed> LIMIT <N>;
TRUNCATE TABLE tbl;
INSERT INTO tbl SELECT * FROM tbl_tmp;
Do it all in one session. Specifics depend on additional circumstances you did not disclose.
Compare to this related, comprehensive answer (your case is simpler):
Remove duplicates from table based on multiple criteria and persist to other table
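As a rough illustration only (the table name, ordering column, and N below are hypothetical; adapt them to your schema), the same idea wrapped in a single psycopg2 transaction. Note that TRUNCATE takes an ACCESS EXCLUSIVE lock, so the other writing process will block until this commits:
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder connection string
cur = conn.cursor()
# Everything below runs in one transaction; it only becomes visible at commit.
cur.execute("""
    CREATE TEMP TABLE tbl_tmp AS
    SELECT * FROM my_table ORDER BY id DESC LIMIT 30
""")
cur.execute("TRUNCATE TABLE my_table")
cur.execute("INSERT INTO my_table SELECT * FROM tbl_tmp")
cur.execute("DROP TABLE tbl_tmp")
conn.commit()
cur.close()
conn.close()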

Join with Python's SQLite module is slower than doing it manually

I am using Python's built-in sqlite3 module to access a database. My query executes a join between a table of 150000 entries and a table of 40000 entries, and the result contains about 150000 entries again. If I execute the query in the SQLite Manager it takes a few seconds, but if I execute the same query from Python, it has not finished after a minute. Here is the code I use:
cursor = self._connection.cursor()
annotationList = cursor.execute("SELECT PrimaryId, GOId " +
"FROM Proteins, Annotations " +
"WHERE Proteins.Id = Annotations.ProteinId")
annotations = defaultdict(list)
for protein, goterm in annotationList:
annotations[protein].append(goterm)
I did the fetchall just to measure the execution time. Does anyone have an explanation for the huge difference in performance? I am using Python 2.6.1 on Mac OS X 10.6.4.
I implemented the join manually, and this works much faster. The code looks like this:
cursor = self._connection.cursor()
proteinList = cursor.execute("SELECT Id, PrimaryId FROM Proteins ").fetchall()
annotationList = cursor.execute("SELECT ProteinId, GOId FROM Annotations").fetchall()
proteins = dict(proteinList)
annotations = defaultdict(list)
for protein, goterm in annotationList:
annotations[proteins[protein]].append(goterm)
So when I fetch the tables myself and then do the join in Python, it takes about 2 seconds. The code above takes forever. Am I missing something here?
I tried the same with apsw, and it works just fine (the code does not need to be changed at all); the performance is great. I'm still wondering why this is so slow with the sqlite3 module.
There is a discussion about it here: http://www.mail-archive.com/python-list#python.org/msg253067.html
It seems that there is a performance bottleneck in the sqlite3 module. There is some advice on how to make your queries faster:
make sure that you do have indices on the join columns
use pysqlite
You haven't posted the schema of the tables in question, but I think there might be a problem with indexes, specifically not having an index on Proteins.Id or Annotations.ProteinId (or both).
Create the SQLite indexes like this
CREATE INDEX IF NOT EXISTS index_Proteins_Id ON Proteins (Id)
CREATE INDEX IF NOT EXISTS index_Annotations_ProteinId ON Annotations (ProteinId)
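If you would rather create them from the Python side, a small sketch with the same sqlite3 module (the database file name is a placeholder):
import sqlite3

conn = sqlite3.connect("proteins.db")  # placeholder file name
with conn:  # commits on success
    conn.execute("CREATE INDEX IF NOT EXISTS index_Proteins_Id ON Proteins (Id)")
    conn.execute("CREATE INDEX IF NOT EXISTS index_Annotations_ProteinId ON Annotations (ProteinId)")
conn.close()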
I wanted to update this because I am noticing the same issue and it is now 2022...
In my own application I am using python3 and sqlite3 to do some data wrangling on large databases (>100000 rows * >200 columns). In particular, I have noticed that my 3-table inner join clocks in at around ~12 minutes of run time in Python, whereas running the same join query in sqlite3 from the CLI takes ~100 seconds. All the join predicates are properly indexed, and EXPLAIN QUERY PLAN indicates that the added time is most likely because I am using SELECT *, which is a necessary evil in my particular context.
The performance discrepancy caused me to pull my hair out all night until I realized there is a quick fix from here: Running a Sqlite3 Script from Command Line. This is definitely a workaround at best, but I have research due so this is my fix.
Write out the query to an .sql file (I am using f-strings to pass variables in, so the example uses {Foo} here):
fi = open("filename.sql", "w")
fi.write(f"CREATE TABLE {Foo} AS SELECT * FROM Table1 INNER JOIN Table2 ON Table2.KeyColumn = Table1.KeyColumn INNER JOIN Table3 ON Table3.KeyColumn = Table1.KeyColumn;")
fi.close()
Run os.system from inside python and send the .sql file to sqlite3
os.system(f"sqlite3 {database} < filename.sql")
Make sure you close any open connection before running this so you don't end up locked out, and re-instantiate any connection objects afterward if you're going back to working with sqlite within Python.
Hope this helps and if anyone has figured the source of this out, please link to it!
