I have a table with a column ID varchar(255) and a done bit.
I want to fetch the first ID where the bit isn't set and, while fetching, also set the bit, so that no other instance of the script uses the same ID and no race condition is possible.
import _mssql
con = _mssql.connect(server='server', user='user', password='password', database='default')
#these two should be a single command
con.execute_query('SELECT TOP 1 ID FROM tableA WHERE done=0')
con.execute_query('UPDATE tableA SET done=1 WHERE ID=\''+id_from_above+'\'')
for row in con:
    #row['ID'] contains nothing because the last query executed was the UPDATE, not the SELECT
    start_function(row['ID'])
Edit (incorporating the suggestion from wewesthemenace):
[...]
con.execute_query('UPDATE tableA SET done = 1 WHERE ID = (SELECT TOP 1 ID FROM tableA WHERE done = 0)')
for row in con:
    #row['ID'] contains nothing because the last query executed was the UPDATE, not the SELECT
    start_function(row['ID'])
Working on Microsoft SQL Server Enterprise Edition v9.00.3042.00, i.e. SQL Server 2005 Service Pack 2
Edit 2:
The answered question led me to a follow-up question: While mssql query returns an affected ID use it in a while loop
How about this one?
UPDATE tableA SET done = 1 WHERE ID = (SELECT TOP 1 ID FROM tableA WHERE done = 0)
A possible solution, which works in my situation:
con.execute_query('UPDATE tableA SET done=1 OUTPUT INSERTED.ID WHERE ID=(SELECT TOP(1) ID FROM tableA WHERE done=0)')
for row in con:
    #row['ID'] is exactly one ID where the done bit wasn't set, but now is.
    start_function(row['ID'])
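As a sketch of the while-loop from the follow-up question (reusing the connection and start_function above; that the _mssql connection can be materialized with list(con) after each query is an assumption based on the iteration used in the snippets here), you can keep claiming IDs until the UPDATE affects no rows:
while True:
    con.execute_query('UPDATE tableA SET done=1 OUTPUT INSERTED.ID WHERE ID=(SELECT TOP(1) ID FROM tableA WHERE done=0)')
    rows = list(con)  # at most one claimed ID per iteration
    if not rows:
        break         # no unprocessed rows are left
    start_function(rows[0]['ID'])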
So I am trying to figure out the proper way to use the sqlite database, but I feel like I got it all wrong when it comes to the Key/ID part.
I'm sure the question has been asked before and answered somewhere, but I have yet to find it, so here it goes.
From what I've gathered so far I am supposed to use the Key/ID for reference to entries across tables, correct?
So if table A has an entry with ID 1 and then several columns of data, then table B uses ID 1 in table A to access that data.
I can do that and it works out just fine as long as I already know the Key/ID.
What I fail to understand is how to do this if I don't already know it.
Consider the following code:
import sqlite3
conn = sqlite3.connect("./DB")
conn.execute("""CREATE TABLE IF NOT EXISTS Table_A (
A_id INTEGER NOT NULL PRIMARY KEY UNIQUE,
A_name TEXT
)""")
conn.execute("""CREATE TABLE IF NOT EXISTS Table_B (
B_id INTEGER NOT NULL PRIMARY KEY UNIQUE,
B_name TEXT,
B_A_id INTEGER
)""")
conn.execute("""INSERT INTO Table_A (A_name) VALUES ('Something')""")
conn.commit()
I now want to add an entry to Table_B and have it refer to the entry I just made in the B_A_id column.
How do I do this?
I have no idea what the Key/ID is; all I know is that it has 'Something' in the A_name column. Can I find it without making a query for 'Something' or checking the database directly? Because that feels a bit backwards.
Am I doing it wrong or am I missing something here?
Maybe I am just being stupid.
You don't need to know the A_id from Table_A.
All you need is the value of the A_name column, say 'Something', that you want to reference from Table_B, and you can do it like this:
INSERT INTO Table_B (B_name, B_A_id)
SELECT 'SomethingInTableB', A_Id
FROM Table_A
WHERE A_name = 'Something'
or:
INSERT INTO Table_B (B_name, B_A_id) VALUES
('SomethingInTableB', (SELECT A_Id FROM Table_A WHERE A_name = 'Something'))
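For example, a minimal sqlite3 sketch of the second form, passing the literal values as parameters (table and column names are the ones from the question):
import sqlite3

conn = sqlite3.connect("./DB")
# insert into Table_B, looking up the referenced A_id by A_name in a scalar subquery
conn.execute(
    "INSERT INTO Table_B (B_name, B_A_id) VALUES "
    "(?, (SELECT A_id FROM Table_A WHERE A_name = ?))",
    ("SomethingInTableB", "Something"),
)
conn.commit()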
You are on the right path, but have run into the problem that the Connection.execute() function is actually a shortcut for creating a cursor and executing the query using that. To retrieve the id of the new row in Table_A, explicitly create the cursor and access its lastrowid attribute, for example:
c = conn.cursor()
c.execute("""INSERT INTO Table_A (A_name) VALUES ('Something')""")
print(c.lastrowid) # primary key (A_id) of the new row
For more information about Connection and Cursor objects, refer to the python sqlite3 documentation.
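As a possible follow-up (a sketch reusing the cursor above), that value can then be fed straight into the Table_B insert:
a_id = c.lastrowid  # A_id of the row just inserted into Table_A
c.execute("INSERT INTO Table_B (B_name, B_A_id) VALUES (?, ?)",
          ("SomethingInTableB", a_id))
conn.commit()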
I'm trying to find a way to copy data out of SQL Server 2014 and 2017 every time a table has an insert or update performed on it. I want to do this in real time, inserting the values into another table in PostgreSQL. A few options I've explored are batch processing using tools such as:
Talend ETL tool
Foreign data wrappers in PostgreSQL with a cron job that triggers procedures doing hourly inserts and updates to the PostgreSQL table using data from the SQL Server table.
I'm not sure how to get events from SQL Server in real time that I could feed into something like Kafka or Python microservices, or whether there is a better way.
Use triggers
Create SQL Server & Postgresql tables:
-- SQL Server
create table test (id int identity(1,1) not null primary key, name varchar(25), description varchar(1000))
go
-- Postgresql:
CREATE TABLE public.test
(
id integer,
name character varying(25) COLLATE pg_catalog."default",
description character varying(1000) COLLATE pg_catalog."default"
)
Create a linked server in SQL Server to your Postgresql server.
Then create triggers on your SQL Server table:
create trigger iu_trigger_name on test
after insert, update
as
begin
    -- update rows that already exist in the PostgreSQL table
    UPDATE p
    SET name = t.name, description = t.description
    FROM [SQLAuth_PG].[DefaultDB].[public].[test] p
    INNER JOIN inserted t ON p.id = t.id

    -- insert rows that do not exist there yet
    INSERT INTO [SQLAuth_PG].[DefaultDB].[public].[test]
        ([id], [name], [description])
    SELECT t.id, t.name, t.description
    FROM inserted t
    WHERE NOT EXISTS (
        SELECT * FROM [SQLAuth_PG].[DefaultDB].[public].[test]
        WHERE id = t.id
    )
end
go
create trigger d_trigger_name on test
after delete
as
begin
    delete p
    FROM [SQLAuth_PG].[DefaultDB].[public].[test] p
    inner join deleted d on p.id = d.id
end
go
Test:
insert into test (name, description) select 'Name1', 'Name 1 description'
go
select * from [SQLAuth_PG].[DefaultDB].[public].[test]
--output
--id name description
--1 Name1 Name 1 description
update test set description = 'Updated description!' where name = 'Name1'
go
select * from [SQLAuth_PG].[DefaultDB].[public].[test]
-- output
--id name description
--1 Name1 Updated description!
delete from test
go
select * from [SQLAuth_PG].[DefaultDB].[public].[test]
go
-- postgresql table is empty
The trigger in this example handles batch inserts and updates, which covers the only real pitfall with triggers: assuming there's only one record in the "inserted" table. After a bulk insert or update, the inserted table is populated with all the new/modified records.
If you want to go the Kafka route, there are several options for getting data out of SQL Server into Kafka:
For log based CDC:
Debezium
kafka-connect-cdc-microsoft-sql
Plus Attunity, Goldengate, et al
For query based CDC:
kafka-connect-jdbc Source
Once the data's in Kafka you can stream it to Postgres (or any other database) using the kafka-connect-jdbc Sink.
I currently use the cx_Oracle library in Python to work with my Oracle database.
import cx_Oracle as Cx
# Parameters for the server connection
dsn_tns = Cx.makedsn(_ip, _port, service_name=_service_name)
# Connect to the Oracle database
db = Cx.connect(_user, _password, dsn_tns)
# Obtain a cursor to run SQL queries
cursor = db.cursor()
One of my queries performs an INSERT of a Python dataframe into my Oracle target table, subject to some conditions.
query = """INSERT INTO ORA_TABLE (ID1, ID2)
           SELECT :1, :2
           FROM DUAL
           WHERE (:1 != 'NF' AND :1 NOT IN (SELECT ID1 FROM ORA_TABLE))
           OR (:1 = 'NF' AND :2 NOT IN (SELECT ID2 FROM ORA_TABLE))"""
The goal of this query is to insert only the rows that satisfy the conditions in the WHERE clause.
This works well when my Oracle target table has few rows, but once the target table has more than 100,000 rows it becomes very slow, because the WHERE condition reads through the whole table.
Is there a way to improve the performance of this query with a join or something else?
End of code:
# Prepare the SQL query
cursor.prepare(query)
# Execute the query for every row of the Python dataset
cursor.executemany(None, _py_table.values.tolist())
# Commit changes to the Oracle database
db.commit()
# Close the cursor
cursor.close()
# Close the server connection
db.close()
Here is a possible solution that could help. Your SQL has an OR condition, and only one part of that condition can be true for a given value of :1. So I would divide it into two parts, check the value of :1 in the code, and construct two inserts instead of one; at any point in time only one of them would execute:
If :1 != 'NF', use the following insert:
INSERT INTO ORA_TABLE (ID1, ID2)
SELECT :1, :2
FROM DUAL
WHERE (:1 NOT IN (SELECT ID1
FROM ORA_TABLE));
and if :1 = 'NF', use the following insert:
INSERT INTO ORA_TABLE (ID1, ID2)
SELECT :1, :2
FROM DUAL
WHERE (:2 NOT IN (SELECT ID2
FROM ORA_TABLE));
So you check the value of :1 in code and, depending on that, use one of the two simplified inserts. Please check that this is functionally the same as the original query, and verify whether it improves the response time.
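As a hedged Python sketch of that split (reusing cursor, db and _py_table from the question; that ID1 is the first column of the dataframe is an assumption):
# split the rows by the value of ID1 (assumed to be the first column)
rows = _py_table.values.tolist()
nf_rows = [r for r in rows if r[0] == 'NF']
other_rows = [r for r in rows if r[0] != 'NF']

insert_other = """INSERT INTO ORA_TABLE (ID1, ID2)
                  SELECT :1, :2 FROM DUAL
                  WHERE :1 NOT IN (SELECT ID1 FROM ORA_TABLE)"""
insert_nf = """INSERT INTO ORA_TABLE (ID1, ID2)
               SELECT :1, :2 FROM DUAL
               WHERE :2 NOT IN (SELECT ID2 FROM ORA_TABLE)"""

if other_rows:
    cursor.executemany(insert_other, other_rows)
if nf_rows:
    cursor.executemany(insert_nf, nf_rows)
db.commit()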
Assuming Pandas, consider exporting your data as a table to be used as a staging area for the final migration, so the subquery runs only once and not for every row of the data set. In Pandas you would need to interface with SQLAlchemy to run the to_sql export operation. Note: this assumes your connected user has DROP TABLE and CREATE TABLE privileges.
Also, consider using a NOT EXISTS subquery to combine both NOT IN subqueries. The subquery below expresses the opposite of your logic (a matching row) and the NOT EXISTS excludes it.
import sqlalchemy
...
engine = sqlalchemy.create_engine("oracle+cx_oracle://user:password@dsn")
# EXPORT DATA -ALWAYS REPLACING
pandas_df.to_sql('myTempTable', con=engine, if_exists='replace')
# RUN TRANSACTION
with engine.begin() as cn:
    sql = """INSERT INTO ORA_TABLE (ID1, ID2)
             SELECT t.ID1, t.ID2
             FROM myTempTable t
             WHERE NOT EXISTS
             (
                 SELECT 1 FROM ORA_TABLE sub
                 WHERE (t.ID1 != 'NF' AND t.ID1 = sub.ID1)
                 OR (t.ID1 = 'NF' AND t.ID2 = sub.ID2)
             )
          """
    cn.execute(sql)
I want to use sqlite3 in Python. I have a table with four columns (id INT, other_no INT, position TEXT, classification TEXT, PRIMARY KEY is id). In this table, the classification column is left empty and will be updated with information from table 2. I then have a second table with three columns (id INT, class TEXT, type TEXT, PRIMARY KEY is id). The two tables have two columns in common: in both, the primary key is the id column, and the classification and class columns would eventually have to be merged. So the code needs to go through table 2 and, whenever it finds a matching id in table 1, update the classification column (of table 1) from the class column of table 2. The information to build the two tables comes from two separate files.
# function to create Table1...
# function to create Table2...
(the tables are created as expected). The problem occurs when I try to update table1 with information from table2.
def update_table1():
    con = sqlite3.connect('table1.db', 'table2.db') #I know this is wrong, but how do I connect table2 so that I don't get the error that the Table2 global name is not defined?
    cur = con.cursor()
    if id in Table2 == id in Table1:
        new_classification = Table2.class # so now instead of Null it should have the class information from table2
        cur.execute("UPDATE Table1 SET class = ? WHERE id =? ", (new_classification, id))
        con.commit()
But I get an error for line 2: TypeError: a float is required. I know it's because I put two parameters in the connect method. But if I only connect with Table1, I get the error that Table2 is not defined.
I read this post: Updating a column in one table through a column in another table. I understand the logic around it, but I can't translate the SQL code into Python. I have been working on this for some time and can't seem to get it. Would you please help? Thanks
After the comments from a user I got this code, but it still doesn't work:
#connect to the database containing the two tables
cur.execute("SELECT id FROM Table1")
for row in cur.fetchall():
row_table1 = row[0]
cur.execute("SELECT (id, class) FROM Table2")
for row1 in cur.fetchall():
row_table2 = row[0] #catches the id
row_table2_class = row[1] #catches the name
if row_table1 == row_table2:
print "yes" #as a test for me to see the loop worked
new_class = row_table_class
cur.execute("UPDATE Table1 SET classification=? WHERE id=?", (new_class, row_table1))
con.commit()
From this, however, I get an operational error. I know it's my syntax, but like I said, I am new to this, so any guidance is greatly appreciated.
You need a lot more code than what you have there. Your code logic should go something like this:
connect to sqlite db
execute a SELECT query on TABLE2 and fetch rows. Call this rows2.
execute a SELECT query on TABLE1 and fetch rows. Call this rows1.
For every id in rows1, if this id exists in rows2, execute an UPDATE on that particular id in TABLE1.
You are missing SELECT queries in your code:
cur = con.cursor()
if id in Table2 == id in Table1:
new_classification = Table2.class
You can't just directly test like this. You need to first fetch the rows in both tables using SELECT queries before you can test them out the way you want.
Below is modified code based on what you posted above. I have just typed it in here directly, so I have not had a chance to test it, but you can look at it to get an idea. It could probably even run.
Also, this is by no means the most efficient way to do this. It is actually very clunky, especially because for every id in Table1 you fetch all the rows of Table2 again to match against. Instead, you would want to fetch all the rows of Table1 once, then all the rows of Table2 once, and then match them up. I will leave the optimization to make this faster up to you.
import sqlite3
#connect to the database containing the two tables
conn = sqlite3.connect("<PUT DB FILENAME HERE>")

cur = conn.execute("SELECT id FROM Table1")
for row in cur.fetchall():
    row_table1_id = row[0]

    cur2 = conn.execute("SELECT id, class FROM Table2")
    for row1 in cur2.fetchall():
        row_table2_id = row1[0]    # catches the id
        row_table2_class = row1[1] # catches the class
        if row_table1_id == row_table2_id:
            print "yes"            # as a test to see the loop worked
            new_class = row_table2_class
            conn.execute("UPDATE Table1 SET classification=? WHERE id=?", (new_class, row_table1_id))
            conn.commit()
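As a hedged sketch of the optimization mentioned above (assuming both tables live in the same database file), the whole match-and-update can be pushed into a single SQL statement with a correlated subquery:
conn.execute("""
    UPDATE Table1
    SET classification = (SELECT class FROM Table2 WHERE Table2.id = Table1.id)
    WHERE EXISTS (SELECT 1 FROM Table2 WHERE Table2.id = Table1.id)
""")
conn.commit()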
I have a simple table in mysql with the following fields:
id -- Primary key, int, autoincrement
name -- varchar(50)
description -- varchar(256)
Using MySQLdb, a python module, I want to insert a name and description into the table, and get back the id.
In pseudocode:
db = MySQLdb.connection(...)
queryString = "INSERT INTO tablename (name, description) VALUES (%s, %s)"
db.execute(queryString, (a_name, a_desc))
newID = ???
I think it might be
newID = db.insert_id()
Edit by Original Poster
Turns out, in the version of MySQLdb that I am using (1.2.2), you would do the following:
conn = MySQLdb.connect(host...)
c = conn.cursor()
c.execute("INSERT INTO...")
newID = c.lastrowid
I am leaving this as the correct answer, since it got me pointed in the right direction.
I don't know if there's a MySQLdb specific API for this, but in general you can obtain the last inserted id by SELECTing LAST_INSERT_ID()
It is on a per-connection basis, so you don't risk race conditions if some other client performs an insert as well.
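A minimal sketch of that approach, reusing the cursor from the edit above (table and column names are the ones from the question):
c.execute("INSERT INTO tablename (name, description) VALUES (%s, %s)", (a_name, a_desc))
c.execute("SELECT LAST_INSERT_ID()")
newID = c.fetchone()[0]  # id generated by the last insert on this connection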
You could also do a
conn.insert_id()
The easiest way of all is to wrap your insert with a select count query into a single stored procedure and call that in your code. You would pass in the parameters needed to the stored procedure and it would then select your row count.