Creating a blank field and receiving the INTEGER PRIMARY KEY with sqlite, python - python

I am using sqlite with python. When I insert into table A I need to feed it an ID from table B. So what I wanted to do is insert default data into B, grab the id (which is auto increment) and use it in table A. What's the best way to receive the key from the table I just inserted into?

As Christian said, sqlite3_last_insert_rowid() is what you want... but that's the C level API, and you're using the Python DB-API bindings for SQLite.
It looks like the cursor method lastrowid will do what you want (search for 'lastrowid' in the documentation for more information). Insert your row with cursor.execute( ... ), then do something like lastid = cursor.lastrowid to check the last ID inserted.
That you say you need "an" ID worries me, though... it doesn't matter which ID you have? Unless you are using the data just inserted into B for something, in which case you need that row ID, your database structure is seriously screwed up if you just need any old row ID for table B.
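As a minimal sketch of the lastrowid approach described above, assuming Python's built-in sqlite3 module and an in-memory database: the table names "b" and "a" and their columns are hypothetical, since the question does not show its schema.

```python
import sqlite3

# Hypothetical schema: tables "b" and "a" are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY AUTOINCREMENT, note TEXT)")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES b(id))")

cur = conn.cursor()
# Insert a row made up entirely of default values into B.
cur.execute("INSERT INTO b DEFAULT VALUES")
b_id = cur.lastrowid  # rowid of the row this cursor just inserted

# Use that key as the foreign key value in A.
cur.execute("INSERT INTO a (b_id) VALUES (?)", (b_id,))
conn.commit()

row = conn.execute("SELECT b_id FROM a").fetchone()
```

Reading lastrowid immediately after the INSERT, on the same cursor, keeps the value tied to that specific row.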

Check out sqlite3_last_insert_rowid() -- it's probably what you're looking for:
Each entry in an SQLite table has a unique 64-bit signed integer key called the "rowid". The rowid is always available as an undeclared column named ROWID, OID, or _ROWID_ as long as those names are not also used by explicitly declared columns. If the table has a column of type INTEGER PRIMARY KEY then that column is another alias for the rowid.
This routine returns the rowid of the most recent successful INSERT into the database from the database connection in the first argument. If no successful INSERTs have ever occurred on that database connection, zero is returned.
Hope it helps! (More info on ROWID is available here and here.)

Simply use:
SELECT last_insert_rowid();
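The value is tracked per database connection, so inserts made through other connections do not affect it. However, if several threads or pieces of code share the same connection and insert through it, you might not get back the key that you expect.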

Related

Insert record into PostgreSQL table and expire old record

I have a set of data that gets updated periodically by a client. Once a month or so we will download a new set of this data. The dataset is about 50k records with a couple hundred columns of data.
I am trying to create a database that houses all of this data so we can run our own analysis on it. I'm using PostgreSQL and Python (psycopg2).
Occasionally, the client will add columns to the dataset, so there are a number of steps I want to take:
1. Add new records to the database table
2. Compare the old set of data with the new set of data and update the table where necessary
3. Keep the old records, and either add an "expired" flag, or a "db_expire_date" to keep track of whether a record is active or expired
4. Add any new columns of data to the database for all records
I know how to add new records to the database (1) using INSERT INTO, and how to add new columns of data to the database (4) using ALTER TABLE. But I am having issues with (2) and (3). I figured out how to update a record, using the following code:
rows = zip(*[update_records[col] for col in update_records])
cursor = conn.cursor()
cursor.execute("""CREATE TEMP TABLE temptable (""" + schema_list + """) ON COMMIT DROP""")
cursor.executemany("""INSERT INTO temptable (""" + var +""") VALUES ("""+ perc_s + """)""", rows)
cursor.execute("""
UPDATE tracking.test_table
SET mfg = temptable.mfg, db_updt_dt = CURRENT_TIMESTAMP
FROM temptable
WHERE temptable.app_id = tracking.test_table.app_id;
""");
cursor.rowcount
conn.commit()
cursor.close()
conn.close()
However, this just updated the record based on the app_id as the primary key.
What I'd like to figure out is how to keep the original record and set it as "expired" and then create a new, updated record. It seems that "app_id" shouldn't be my primary key, so I've created a new primary key as '"primary_key" INT GENERATED ALWAYS AS IDENTITY not null,'.
I'm just not sure where to go from here. I think that I could probably just use INSERT INTO to send the new records to the database, but I'm not sure how to "expire" the old records that way. Possibly I could use UPDATE to set the older rows to "expired". But I am wondering if there is a more straightforward way to do this.
I hope my question is clear. I'm hoping someone can point me in the right direction. Thanks
A pretty standard data warehousing technique is to define two additional date fields, a from-effective-date and a to-effective-date. You only append rows, never update. You add the candidate record if the source primary key does not exist in your table OR if any column value is different from the most recently added prior record in your table with the same primary key. (Each record supersedes the last).
As you add your record to the table you do 3 things:
The New record's from-effective-date gets the transaction file's date
The New record's to-effective-date gets a date WAY in the future, like 9999-12-31. The important thing here is that it will not expire until you say so.
The most recent prior record (the one you compared values for changes) has its to-effective-date Updated to the transaction file's date minus one day. This has the effect of expiring the old record.
This creates a chain of records with the same source primary key with each one covering a non-overlapping time period. This format is surprisingly easy to select from:
If you want to reproduce the most current transaction file you select Where to-effective-date > Current Date
If you want to reproduce the transaction file at any date for a report, you select Where myreportdate Between from-effective-date And to-effective-date.
If you want the entire update history for a key you select * Where the key = mykeyvalue Order By from-effective-date.
The only thing that is ugly about this scheme is when columns are added, the comparison test also must be altered to include those new columns in case something changes. If you want that to be dynamic, you're going to have to loop through the reflection meta data for each column in the table, but Python will need to know how comparing a text field might be different from comparing a BLOB, for example.
If you actually care about having a primary key (many data warehouses do not have primary keys) you can define a compound key on the source primary key + one of those effective dates, it doesn't really matter which one.
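The append-and-expire scheme above can be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's code: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs without a server, the column names eff_from and eff_to and the helper load_record are hypothetical, and only the single data column mfg from the question is compared.

```python
import sqlite3
from datetime import date, timedelta

FAR_FUTURE = "9999-12-31"  # "never expires until we say so"

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
conn.execute(
    "CREATE TABLE test_table ("
    " app_id INTEGER, mfg TEXT,"
    " eff_from TEXT, eff_to TEXT)"  # hypothetical effective-date columns
)

def load_record(conn, app_id, mfg, file_date):
    """Append a new version of app_id if it is new or its values changed."""
    current = conn.execute(
        "SELECT rowid, mfg FROM test_table WHERE app_id = ? AND eff_to = ?",
        (app_id, FAR_FUTURE),
    ).fetchone()
    if current is not None:
        if current[1] == mfg:
            return False  # unchanged: nothing to append
        # Expire the prior version the day before the new file's date.
        day_before = (file_date - timedelta(days=1)).isoformat()
        conn.execute(
            "UPDATE test_table SET eff_to = ? WHERE rowid = ?",
            (day_before, current[0]),
        )
    conn.execute(
        "INSERT INTO test_table (app_id, mfg, eff_from, eff_to) VALUES (?, ?, ?, ?)",
        (app_id, mfg, file_date.isoformat(), FAR_FUTURE),
    )
    return True

load_record(conn, 1, "Acme", date(2024, 1, 1))
load_record(conn, 1, "Acme", date(2024, 2, 1))  # same values, no new row
load_record(conn, 1, "Apex", date(2024, 3, 1))  # changed, old row expires
conn.commit()

# Full history for one key, oldest first.
history = conn.execute(
    "SELECT mfg, eff_from, eff_to FROM test_table"
    " WHERE app_id = 1 ORDER BY eff_from"
).fetchall()
# State of the record as of an arbitrary report date.
as_of = conn.execute(
    "SELECT mfg FROM test_table WHERE app_id = 1 AND ? BETWEEN eff_from AND eff_to",
    ("2024-02-15",),
).fetchone()
```

Dates are stored as ISO-8601 text here so that string comparison matches date order; in PostgreSQL a DATE column would do the same job directly.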
You're looking for the concept of a "natural key", which is how you would identify a unique row, regardless of what the explicit logical constraints on the table are.
This means that you're spot on that you need to change your primary key to be more inclusive. Your new primary key doesn't actually help you decipher which row you are looking for once you have both in there unless you already know which row you are looking for (that "identity" field).
I can think of two likely candidates to add to your natural key: date, or batch.
Either way, you would look for "App = X, [Date|batch] = Y" in the data to find that one. Batch would be upload 1, upload 2, etc. You just make it up, or derive it from the date, or something along those lines.
If you aren't sure which to add, and you aren't ever going to upload multiple times in one day, I would go with Date. That will give you more visibility over time, as you can see when and how often things change.
Once you have a natural key, you want to make it explicit in your data. You can either keep your identity column (see: Surrogate Key) or you can have a compound primary key. With no other input or constraints, I would go with a compound primary key for your situation.
I'm a MySQL DBA, so I'm cribbing a bit from the docs here: https://www.postgresqltutorial.com/postgresql-primary-key/
You do NOT want this:
CREATE TABLE test_table (
app_id INTEGER PRIMARY KEY,
date DATE,
active BOOLEAN
);
Instead, you want this:
CREATE TABLE test_table (
app_id INTEGER,
date DATE,
active BOOLEAN,
PRIMARY KEY (app_id, date)
);
I've added an active column here as well, since you wanted to deactivate rows. This isn't explicitly necessary from what you've described though - you can always assume the most recent upload is active. Or you can expand the columns to have a "active_start" date and an "active_end" date, which will enable another set of queries. But for what you've stated here so far, just the date column should suffice. :)
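To illustrate how the compound key behaves, here is a small sketch that assumes Python's built-in sqlite3 as a stand-in for PostgreSQL. The mfg column and the sample values are made up for the example, and the "latest row per app_id" query is one possible way to treat the most recent upload as the active one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
conn.execute(
    "CREATE TABLE test_table ("
    " app_id INTEGER, date TEXT, mfg TEXT,"
    " PRIMARY KEY (app_id, date))"
)
rows = [
    (1, "2024-01-01", "Acme"),
    (1, "2024-02-01", "Apex"),  # same app_id, later upload
    (2, "2024-01-01", "Bolt"),
]
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?)", rows)

# The compound key rejects a second row for the same app_id and date.
try:
    conn.execute("INSERT INTO test_table VALUES (1, '2024-02-01', 'Dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Treat the most recent upload of each app_id as the active row.
latest = conn.execute(
    "SELECT t.app_id, t.date, t.mfg FROM test_table t"
    " WHERE t.date = (SELECT MAX(u.date) FROM test_table u"
    "                 WHERE u.app_id = t.app_id)"
    " ORDER BY t.app_id"
).fetchall()
```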
For step 2)
First, you have to identify the records that already hold the same data. To do this, run a SELECT query with a WHERE clause before inserting any record and count the number of rows it returns. If the count is more than 0, don't insert the record; otherwise you can insert it.
For step 3)
For this, you can insert a column as you mention above with the name 'db_expire_date' and insert the expiration value at the time of record insertion only.
You can also use a column like 'is_expire' but for that, you need to add a cron job that can update the DB periodically for the value of this column.

Is there any way to insert data at the bottom of the table?

I created a table importing data from a csv file into a SQL Server table. The table contains about 6000 rows that are all float. I am trying to insert a new row using INSERT (I am using Python/Spyder and SQL Server Management Studio) and it does insert the row but not at the bottom of the table but towards the middle. I have no idea why it does that. This is the code that I am using:
def create(conn):
    print("Create")
    cursor = conn.cursor()
    # Adjacent string literals are joined, so the statement stays on one logical line.
    cursor.execute(
        "insert into PricesTest "
        "(Price1,Price2,Price3,Price4,Price5,Price6,Price7,Price8,Price9,Price10,Price11,Price12) "
        "values (?,?,?,?,?,?,?,?,?,?,?,?);",
        (46,44,44,44,44,44,44,44,44,44,44,44))
    conn.commit()
    read(conn)
Any idea why this is happening? What I should add to my code to "force" that row to be added at the bottom of the table? Many thanks.
I managed to sort it out following different suggestions posted here. Basically I was conceptually wrong to think that tables in MS SQL have an order. I am now working with the data in my table using ORDER BY on the dates (I added dates as my first column) and it works well. Many thanks all for your help!!
By default, new rows are not stored in any particular order, because the server has no rule for ordering them (there is no primary key defined). You should have created an identity column before importing your data (you can still add one now):
Id Int IDENTITY(1,1) primary key
Each new row then receives the highest Id so far. Rows are still returned in no guaranteed order unless the query asks for one, so use ORDER BY Id to see the new row at the bottom.
More info on the data type you could use on w3school : https://www.w3schools.com/sql/sql_datatypes.asp
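A short sketch of the idea, assuming Python's built-in sqlite3 as a stand-in for SQL Server: INTEGER PRIMARY KEY AUTOINCREMENT plays the role of the IDENTITY column, and the table is cut down to two price columns with made-up values. The point is that the "bottom of the table" only exists once the query asks for an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server connection
conn.execute(
    "CREATE TABLE PricesTest ("
    " Id INTEGER PRIMARY KEY AUTOINCREMENT,"  # plays the role of IDENTITY(1,1)
    " Price1 REAL, Price2 REAL)"
)
# Rows that were "imported" earlier (illustrative values).
conn.executemany(
    "INSERT INTO PricesTest (Price1, Price2) VALUES (?, ?)",
    [(10.0, 11.0), (20.0, 21.0)],
)
# The newly inserted row gets the next Id.
conn.execute("INSERT INTO PricesTest (Price1, Price2) VALUES (?, ?)", (46, 44))
conn.commit()

# Only an explicit ORDER BY guarantees the new row comes back last.
rows = conn.execute(
    "SELECT Id, Price1, Price2 FROM PricesTest ORDER BY Id"
).fetchall()
```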

Store a set in SQLite

I am using Python and I would like to have a list of IDs stored on disk, preserving some of the functionality of a set (that is, efficiently checking whether an ID is contained). To this end, I think using the SQLite library is a wise decision (at least that is my impression after some googling and searching). However, I am a beginner in the SQL world and could not find any post explaining what I am looking for.
How can I store IDs (strings) in SQLite and later check if a specific ID appears or not in the database?
import sqlite3
id1 = 'abc'
id2 = 'def'
# Initialization of the database
define_database()
# Update the database by inserting a new ID
insert_in_database(id1)
insert_in_database(id2)
# Check if the specified ID is contained in the database (returns a Boolean)
check_if_exists_in_database(id1)
PS: I am aware of the sqlite3 library.
Thanks!
Just use a table with a single column. This column must be indexed (explicitly, or by making it the primary key) for lookups over large data to be efficient:
db = sqlite3.connect('...filename...')
def define_database():
    db.execute('CREATE TABLE IF NOT EXISTS MyStuff(id PRIMARY KEY)')
(Use a WITHOUT ROWID table if your Python version is recent enough to have a modern version of the SQLite library.)
Inserting is done with standard SQL:
def insert_in_database(value):
    db.execute('INSERT INTO MyStuff(id) VALUES(?)', [value])
    # Commit so the new ID is actually written to disk.
    db.commit()
To check whether a value exists, just try to read its row:
def check_if_exists_in_database(value):
    for row in db.execute('SELECT 1 FROM MyStuff WHERE id = ?', [value]):
        return True
    else:
        return False
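Put together, the three pieces above could be wrapped into a small set-like class. This is only a sketch: the class name DiskSet is hypothetical, it uses an in-memory database so the example is self-contained (pass a file path to persist to disk), and INSERT OR IGNORE is an added choice so that re-adding an existing ID behaves like a set instead of raising an error.

```python
import sqlite3

class DiskSet:
    """Hypothetical set-like wrapper around a single-column SQLite table."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS MyStuff(id TEXT PRIMARY KEY)")

    def add(self, value):
        # "with self.db" commits on success and rolls back on error.
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO MyStuff(id) VALUES (?)", (value,)
            )

    def __contains__(self, value):
        # The primary-key index makes this lookup efficient.
        cur = self.db.execute("SELECT 1 FROM MyStuff WHERE id = ?", (value,))
        return cur.fetchone() is not None

    def __len__(self):
        return self.db.execute("SELECT COUNT(*) FROM MyStuff").fetchone()[0]

ids = DiskSet(":memory:")  # use a filename instead to keep the IDs on disk
ids.add("abc")
ids.add("abc")  # duplicate is silently ignored
ids.add("def")
```

With this in place, membership tests read like ordinary Python: "abc" in ids.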

Sqlite insert not working with python

I'm working with sqlite3 on python 2.7 and I am facing a problem with a many-to-many relationship. I have a table from which I am fetching its primary key like this
current.execute("SELECT ExtensionID FROM tblExtensionLookup where ExtensionName = ?",[ext])
and then I am fetching another primary key from another table
current.execute("SELECT HostID FROM tblHostLookup where HostName = ?",[host])
Now I have a third table with these two keys as foreign keys, and I inserted them like this
current.execute("INSERT INTO tblExtensionHistory VALUES(?,?)",[Hid,Eid])
The problem is that the last insertion is not working and I don't know why; it keeps giving errors. What I have tried so far:
First I thought it was because I have an autoincrement primary id for the last mapping table which I didn't provide, but isn't it supposed to fill itself in, as it's auto incremented? However I went ahead and tried adding Null, None and 0, but nothing works.
Secondly I thought maybe I'm not getting the values from the tables above, so I tried printing them out and they show up, so that part works.
Any suggestions what I am doing wrong here?
EDIT :
When I don't provide the primary key I get this error:
The table has three columns but you provided only two values
and when I do provide it as None, Null or 0 it says
Parameter 0 is not supported probably because of unsupported type
I tried implementing the #abarnet way but it still keeps saying parameter 0 not supported
connection = sqlite3.connect('WebInfrastructureScan.db')
with connection:
    current = connection.cursor()
    current.execute("SELECT ExtensionID FROM tblExtensionLookup where ExtensionName = ?",[ext])
    Eid = current.fetchone()
    print Eid
    current.execute("SELECT HostID FROM tblHostLookup where HostName = ?",[host])
    Hid = current.fetchone()
    print Hid
    current.execute("INSERT INTO tblExtensionHistory(HostID,ExtensionID) VALUES(?,?)",[Hid,Eid])
EDIT 2 :
The database schema is :
table 1:
CREATE TABLE tblHostLookup (
HostID INTEGER PRIMARY KEY AUTOINCREMENT,
HostName TEXT);
table2:
CREATE TABLE tblExtensionLookup (
ExtensionID INTEGER PRIMARY KEY AUTOINCREMENT,
ExtensionName TEXT);
table3:
CREATE TABLE tblExtensionHistory (
ExtensionHistoryID INTEGER PRIMARY KEY AUTOINCREMENT,
HostID INTEGER,
FOREIGN KEY(HostID) REFERENCES tblHostLookup(HostID),
ExtensionID INTEGER,
FOREIGN KEY(ExtensionID) REFERENCES tblExtensionLookup(ExtensionID));
It's hard to be sure without full details, but I think I can guess the problem.
If you use the INSERT statement without column names, the values must exactly match the columns as given in the schema. You can't skip over any of them.*
The right way to fix this is to just use the column names in your INSERT statement. Something like:
current.execute("INSERT INTO tblExtensionHistory (HostID, ExtensionID) VALUES (?,?)",
[Hid, Eid])
Now you can skip any columns you want (as long as they're autoincrement, nullable, or otherwise skippable, of course), or provide them in any order you want.
For your second problem, you're trying to pass in rows as if they were single values. You can't do that. From your code:
Eid = current.fetchone()
This will return a whole row as a one-element tuple, something like:
(3,)
And then you try to bind that tuple to the ExtensionID column, which gives you the "parameter 0" error. Pull the value out of the row first, for example with current.fetchone()[0].
In the future, you may want to try to write and debug the SQL statements in the sqlite3 command-line tool and/or your favorite GUI database manager (there's a simple extension that runs in Firefox if you don't want anything fancy) and get them right, before you try getting the Python right.
* This is not true with all databases. For example, in MSJET/Access, you must skip over autoincrement columns. See the SQLite documentation for how SQLite interprets INSERT with no column names, or similar documentation for other databases.
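Both fixes from this answer (naming the columns in the INSERT and unpacking the row returned by fetchone()) can be sketched together as below. The sample host and extension values are made up, an in-memory database is used so the example is self-contained, and the FOREIGN KEY clauses are moved after the column definitions, which is the order SQLite's CREATE TABLE grammar expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblHostLookup (
    HostID INTEGER PRIMARY KEY AUTOINCREMENT,
    HostName TEXT);
CREATE TABLE tblExtensionLookup (
    ExtensionID INTEGER PRIMARY KEY AUTOINCREMENT,
    ExtensionName TEXT);
CREATE TABLE tblExtensionHistory (
    ExtensionHistoryID INTEGER PRIMARY KEY AUTOINCREMENT,
    HostID INTEGER,
    ExtensionID INTEGER,
    FOREIGN KEY(HostID) REFERENCES tblHostLookup(HostID),
    FOREIGN KEY(ExtensionID) REFERENCES tblExtensionLookup(ExtensionID));
-- Illustrative sample data, not from the question.
INSERT INTO tblHostLookup (HostName) VALUES ('example.com');
INSERT INTO tblExtensionLookup (ExtensionName) VALUES ('php');
""")

cur = conn.cursor()
cur.execute(
    "SELECT ExtensionID FROM tblExtensionLookup WHERE ExtensionName = ?", ["php"]
)
eid = cur.fetchone()[0]  # unpack the single value from the row tuple
cur.execute(
    "SELECT HostID FROM tblHostLookup WHERE HostName = ?", ["example.com"]
)
hid = cur.fetchone()[0]

# Name the columns so the autoincrement key can be left out.
cur.execute(
    "INSERT INTO tblExtensionHistory (HostID, ExtensionID) VALUES (?, ?)",
    [hid, eid],
)
conn.commit()

history = conn.execute("SELECT * FROM tblExtensionHistory").fetchall()
```

In real code, fetchone() returns None when the lookup finds nothing, so that case would need a check before indexing.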

How to get Inserted or selected row id in postgres using python

My postgres query is:
query = """INSERT INTO statustable(value) SELECT '%s'
WHERE NOT EXISTS (SELECT id, value FROM statustable
WHERE value = '%s') RETURNING id""" % (status, status)
cursor_postgres.execute(query)
conn_postgres.commit()
statusId = cursor_postgres.fetchone()[0]
print "statusId" + str(statusId)
I need to get the id of the freshly inserted status value if it doesn't exist yet, or select its id if it already exists. RETURNING id choked this query entirely, so I had to remove it to at least get the selective insertion working.
Any clues how to get the statusId here? In another instance I am doing an upsert (insert if not exists, update otherwise). Here again, I need the inserted or updated row id. (No, I am not using stored procedures, if that was your first question...)
Thanks in advance
I can't say I fully understand your motivation for insisting on a single query. I think your best bet is to have two simple queries:
1. SELECT id FROM statustable WHERE value = '%s'. This gives you the id if the entry exists, in which case skip step 2;
2. INSERT INTO statustable(value) VALUES('%s') RETURNING id. This'll give you the id of the newly created entry.
Lastly -- although I haven't verified whether this is a problem -- fetchone() across a commit looks slightly suspect.
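The two-query approach might look like the sketch below. It is an illustration under assumptions rather than tested psycopg2 code: Python's built-in sqlite3 stands in for PostgreSQL so the example runs without a server, the helper name get_or_create_status_id is hypothetical, and cursor.lastrowid is used where the PostgreSQL version would use RETURNING id and read it with fetchone() before committing. With psycopg2 the placeholder would be %s passed as a query parameter, not interpolated into the string.

```python
import sqlite3

def get_or_create_status_id(conn, value):
    """Return the id for value, inserting a new row only if none exists."""
    # Step 1: look for an existing row (parameterized, no string formatting).
    row = conn.execute(
        "SELECT id FROM statustable WHERE value = ?", (value,)
    ).fetchone()
    if row is not None:
        return row[0]
    # Step 2: insert and read the new id before committing.
    cur = conn.execute("INSERT INTO statustable(value) VALUES (?)", (value,))
    new_id = cur.lastrowid
    conn.commit()
    return new_id

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
conn.execute(
    "CREATE TABLE statustable ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT UNIQUE)"
)

first = get_or_create_status_id(conn, "active")   # inserted
again = get_or_create_status_id(conn, "active")   # found, same id
other = get_or_create_status_id(conn, "retired")  # inserted, new id
```

If several writers can run this at once, two of them may both miss in step 1; the UNIQUE constraint on value (an assumption here) would then make the second insert fail instead of creating a duplicate.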
