I have to periodically insert data into table #1, which contains a foreign key reference to table #2, and table #2 is quite big - about 200,000 rows. Before inserting, I check the rows destined for table #1 against the foreign key constraint by simply removing those that definitely can't be inserted, and my query looks like this:
DELETE FROM temp_table1
WHERE temp_table1.fk NOT IN (SELECT id FROM table2) AND
temp_table1.id_d IS NOT NULL;
The problem is that this method is very slow. So is there any "right" method to insert rows in such a situation?
I'm using Python 3, PostgreSQL and psycopg2, if it matters.
You do not need the delete step; insert directly instead:
INSERT INTO table1
SELECT t1.*
FROM temp_table1 t1
INNER JOIN table2 t2 ON t1.fk = t2.id
WHERE t1.id_d IS NOT NULL;
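If you are driving this from Python, a minimal psycopg2 sketch might look like the following (the connection string is a placeholder for your own settings):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder settings
with conn, conn.cursor() as cur:
    # Insert only the rows whose fk actually exists in table2
    cur.execute("""
        INSERT INTO table1
        SELECT t1.*
        FROM temp_table1 t1
        INNER JOIN table2 t2 ON t1.fk = t2.id
        WHERE t1.id_d IS NOT NULL
    """)
# the "with conn" block commits on success and rolls back on error
conn.close()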
I am creating a database from different CSV files. After doing this, I tried to define the primary key table by table, but I got an error.
c.execute("ALTER TABLE patient_data ADD PRIMARY KEY (ID);").fetchall()
OperationalError: near "PRIMARY": syntax error
Maybe the best way to avoid this error is to define the primary key when the table is created, but I don't know how to do that. I have been working with Python for a few years, but today is my first approach to SQL.
This is the code I use to import a CSV to a table
c.execute('''DROP TABLE IF EXISTS patient_data''')
c.execute(''' CREATE TABLE patient_data (ID, NHS_Number,Full_Name,Gender, Birthdate, Ethnicity, Postcode)''')
patients_admitted.to_sql('patient_data', conn, if_exists='append', index = False)
c.execute('''SELECT * FROM patient_data''').fetchall()
This is too long for a comment.
If your table does not have data, just re-create it with the primary key definition.
If your table does have data, you cannot add a primary key in one statement. Why not? The default value for the new key column would be either NULL or a constant, and neither is allowed as a primary key.
And finally, SQLite does not allow you to add a primary key to an existing table. The solution is to copy the data to another table, recreate the table with the structure you want, and then copy the data back in.
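Since the asker creates the table before loading the CSV, the key can simply be declared inside the CREATE TABLE statement. A minimal sketch, reusing the question's c and conn (the column types are assumptions, since the original table declares none):

c.execute('''DROP TABLE IF EXISTS patient_data''')
c.execute('''CREATE TABLE patient_data (
    ID INTEGER PRIMARY KEY,
    NHS_Number TEXT,
    Full_Name TEXT,
    Gender TEXT,
    Birthdate TEXT,
    Ethnicity TEXT,
    Postcode TEXT
)''')
conn.commit()

With the table created this way, the later to_sql(..., if_exists='append') call loads the CSV rows into a table that already has its primary key.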
I have a MySQL server running on a remote host. The connection to the host is fairly slow, and it affects the performance of the Python code I am using. I find that using the executemany() function makes a big improvement over using a loop to insert many rows. My challenge is that for each row I insert into one table, I need to insert several rows into another table. My sample below does not contain much data, but my production data could be thousands of rows.
I know that this subject has been asked about many times in many places, but I don't see any kind of definitive answer, so I'm asking here...
Is there a way to get a list of auto generated keys that were created using an executemany() call?
If not, can I use last_insert_id() and assume that the auto generated keys will be in sequence?
Looking at the sample code below, is there a simpler or better way to accomplish this task?
What if my cars dictionary were empty? No rows would be inserted so what would the last_insert_id() return?
My tables...
Table: makes
    pkey     bigint autoincrement primary_key
    make     varchar(255) not_null

Table: models
    pkey     bigint autoincrement primary_key
    make_key bigint not null
    model    varchar(255) not_null
...and the code...
...
cars = {"Ford": ["F150", "Fusion", "Taurus"],
"Chevrolet": ["Malibu", "Camaro", "Vega"],
"Chrysler": ["300", "200"],
"Toyota": ["Prius", "Corolla"]}
# Fill makes table with car makes
sql_data = list(cars.keys())
sql = "INSERT INTO makes (make) VALUES (%s)"
cursor.executemany(sql, sql_data)
rows_added = len(sqldata)
# Find the primary key for the first row that was just added
sql = "SELECT LAST_INSERT_ID()"
cursor.execute(sql)
rows = cursor.fetchall()
first_key = rows[0][0]
# Fill the models table with the car models, linked to their make
this_key = first_key
sql_data = []
for car in cars:
for model in cars[car]:
sql_data.append((this_key, car))
this_key += 1
sql = "INSERT INTO models (make_key, model) VALUES (%s, %s)"
cursor.executemany(sql, sql_data)
cursor.execute("COMMIT")
...
I have, more than once, measured about 10x speedup when batching inserts.
If you are inserting 1 row in table A, then 100 rows in table B, don't worry about the speed of the 1 row; worry about the speed of the 100.
Yes, it is clumsy to get the ids generated by an insert. I have found no straightforward way; there is LAST_INSERT_ID(), but that works only for a single-row insert.
So, I have developed the following to do a batch of "normalization" inserts. This is where you have a table that maps strings to ids (where the string is likely to show up repeatedly). It takes 2 steps: first a batch insert of the "new" strings, then fetch all the needed ids and copy them into the other table. The details are laid out here: http://mysql.rjweb.org/doc.php/staging_table#normalization
(Sorry, I am not fluent in python or the hundred other ways to talk to MySQL, so I can't give you python code.)
Your use case example is "normalization"; I recommend doing it outside the main transaction. Note that my code takes care of multiple connections, avoiding 'burning' ids, etc.
When you have subcategories ("make" + "model" or "city" + "state" + "country"), I recommend a single normalization table, not one for each.
In your example, pkey could be a 2-byte SMALLINT UNSIGNED (limit 64K) instead of a bulky 8-byte BIGINT.
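For what it's worth, here is a hedged Python sketch of that two-step normalization, reusing the cars dictionary and cursor from the question. It assumes a UNIQUE index on makes.make (so INSERT IGNORE can skip already-known makes) and a DB-API connection in conn:

# Step 1: batch-insert any makes not already present
cursor.executemany("INSERT IGNORE INTO makes (make) VALUES (%s)",
                   [(make,) for make in cars])

# Step 2: read the real generated keys back and map make -> pkey
cursor.execute("SELECT make, pkey FROM makes")
make_ids = {make: pkey for make, pkey in cursor.fetchall()}

# Build the models rows from actual ids, with no sequential-id guess
sql_data = [(make_ids[make], model)
            for make, models in cars.items()
            for model in models]
cursor.executemany("INSERT INTO models (make_key, model) VALUES (%s, %s)",
                   sql_data)
conn.commit()

This also answers the empty-dictionary worry: since the ids are read back explicitly, nothing depends on what LAST_INSERT_ID() returns when no rows were inserted.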
So I am trying to figure out the proper way to use the sqlite database, but I feel like I got it all wrong when it comes to the Key/ID part.
I'm sure the question has been asked before and answered somewhere, but I have yet to find it, so here it goes.
From what I've gathered so far I am supposed to use the Key/ID for reference to entries across tables, correct?
So if table A has an entry with ID 1 and then several columns of data, then table B uses ID 1 in table A to access that data.
I can do that and it works out just fine as long as I already know the Key/ID.
What I fail to understand is how to do this if I don't already know it.
Consider the following code:
import sqlite3
conn = sqlite3.connect("./DB")
conn.execute("""CREATE TABLE IF NOT EXISTS Table_A (
A_id INTEGER NOT NULL PRIMARY KEY UNIQUE,
A_name TEXT
)""")
conn.execute("""CREATE TABLE IF NOT EXISTS Table_B (
B_id INTEGER NOT NULL PRIMARY KEY UNIQUE,
B_name TEXT,
B_A_id INTEGER
)""")
conn.execute("""INSERT INTO Table_A (A_name) VALUES ('Something')""")
conn.commit()
I now want to add an entry to Table_B and have it refer to the entry I just made in the B_A_id column.
How do I do this?
I have no idea what the Key/ID is, and all I do know is that it has 'Something' in the A_name column. Can I find it without making a query for 'Something' or checking the database directly? Cause that feels a bit backwards.
Am I doing it wrong or am I missing something here?
Maybe I am just being stupid.
You don't need to know the A_id from Table_A.
All you need is the value of the A_name column you want to reference in Table_B, say 'Something', and you can do it like this:
INSERT INTO Table_B (B_name, B_A_id)
SELECT 'SomethingInTableB', A_Id
FROM Table_A
WHERE A_name = 'Something'
or:
INSERT INTO Table_B (B_name, B_A_id) VALUES
('SomethingInTableB', (SELECT A_Id FROM Table_A WHERE A_name = 'Something'))
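From Python's sqlite3, the same statement can be parameterized so the names are not embedded as literals in the SQL; a small sketch using the question's conn:

conn.execute(
    """INSERT INTO Table_B (B_name, B_A_id)
       SELECT ?, A_id FROM Table_A WHERE A_name = ?""",
    ("SomethingInTableB", "Something"))
conn.commit()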
You are on the right path, but have run into the problem that the Connection.execute() function is actually a shortcut for creating a cursor and executing the query using that. To retrieve the id of the new row in Table_A, explicitly create the cursor and access its lastrowid attribute, for example:
c = conn.cursor()
c.execute("""INSERT INTO Table_A (A_name) VALUES ('Something')""")
print(c.lastrowid) # primary key (A_id) of the new row
For more information about Connection and Cursor objects, refer to the python sqlite3 documentation.
I have two tables, Table A and Table B. I have added one column to Table A, record_id. Table B has record_id and the primary ID for Table A, table_a_id. I am looking to deprecate Table B.
Relationships exist between Table B's table_a_id and Table A's id, if that helps.
Currently, my solution is:
db.execute("UPDATE table_a t
SET record_id = b.record_id
FROM table_b b
WHERE t.id = b.table_a_id")
This is my first time using this ORM -- I'd like to see if there is a way I can use my Python models and the actual functions SQLAlchemy gives me to be more 'Pythonic' rather than just dumping a Postgres statement that I know works in an execute call.
My solution ended up being as follows:
(db.query(TableA)
 .filter(TableA.id == TableB.table_a_id,
         TableA.record_id.is_(None))
 .update({TableA.record_id: TableB.record_id}, synchronize_session=False))
This leverages PostgreSQL's ability to do updates based on implicit references to other tables, which I did in my .filter() call (this is analogous to a WHERE clause in a JOIN query). The solution was deceptively simple.
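For reference, the same statement can also be written with the update() construct instead of Query; a sketch assuming SQLAlchemy 1.4+ and the same TableA/TableB models:

from sqlalchemy import update

stmt = (update(TableA)
        .where(TableA.id == TableB.table_a_id,
               TableA.record_id.is_(None))
        .values(record_id=TableB.record_id))
db.execute(stmt, execution_options={"synchronize_session": False})
db.commit()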
I would like to have in SQLite a "counter" table that always gives me a new unique ID. I have managed to get what I need in the following way. First, I create the following table:
cursor.execute('''create table second (id integer primary key autoincrement, age integer)''')
Then I perform the following sequence of commands:
cursor.execute('''insert into second (age) values (1)''')
cursor.lastrowid
Each time I execute the above two commands I get a new integer, which is exactly what I need. However, the above solution is not elegant, since it uses a column ("age") that I do not really need. The reason I used it is the following: I can create a table that contains only one column with the IDs:
cursor.execute('''create table first (id integer primary key autoincrement)''')
However, the problem is that I cannot manage to insert into this table. The following does not work:
cursor.execute('''insert into first () values ()''')
I get the following error message:
sqlite3.OperationalError: near ")": syntax error
Does anybody know how to solve the described problem?
This should work:
sqlite> CREATE TABLE first (id integer primary key autoincrement);
sqlite> INSERT INTO first (id) VALUES (null);
sqlite> SELECT * FROM first;
1
sqlite> INSERT INTO first (id) VALUES (null);
sqlite> SELECT * FROM first;
1
2
The documentation says:
If no ROWID is specified on the insert, or if the specified ROWID has a value of NULL, then an appropriate ROWID is created automatically.
So you can either explicitly specify NULL:
INSERT INTO first(id) VALUES(NULL)
or specify no value at all:
INSERT INTO first DEFAULT VALUES
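Back in Python, the asker's counter table then needs no dummy column at all; a minimal sqlite3 sketch:

cursor.execute('''create table first (id integer primary key autoincrement)''')
cursor.execute('''insert into first default values''')
new_id = cursor.lastrowid  # the freshly generated unique ID

Each insert generates the next ID, and lastrowid picks it up exactly as in the original two-command version.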