I am trying to build a composite primary key for my tables. They should also have an auto-incrementing id. My problem is that when I use a composite primary key, the id becomes NULL (as seen in the pictures).
Here it works as it should, but there is no composite key.
Here the id is NULL no matter what.
I tried different syntaxes and also keywords like NOT NULL and AUTOINCREMENT, but nothing seems to work.
Here is the code without composite key
mystr = "CREATE TABLE IF NOT EXISTS KM%s(id INTEGER PRIMARY KEY, date TEXT, client INTEGER)"%(month.replace('-',''))
print(mystr)
c.execute(mystr) #create a table
conn.commit()
Here is the code with COMPOSITE KEY
mystr = "CREATE TABLE IF NOT EXISTS KM%s(id INTEGER, date TEXT, client INTEGER, primary key (id, client))"%(month.replace('-',''))
print(mystr)
c.execute(mystr) #create a table
conn.commit()
I was sure that I'd used autoincremented integer columns in the past which were not primary keys, but it certainly doesn't work today with SQLite.
I must echo what @forpas has already said in the comment: you just can't do that.
The solution would be to add the UNIQUE constraint to id and generate your ID programmatically as you go. You do not need to track your current maximum ID because you can simply ask SQLite what the max is:
SELECT MAX(id) FROM KM<month>;
Increment that value by 1 and include it in your INSERT INTO statement.
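The approach above can be sketched as follows. This is only a minimal illustration, assuming an in-memory database, a made-up table name KM202001 and a hypothetical helper insert_row; none of these come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
c = conn.cursor()

# Hypothetical table name; the question builds it as "KM" + month.
c.execute(
    "CREATE TABLE IF NOT EXISTS KM202001("
    "id INTEGER NOT NULL UNIQUE, date TEXT, client INTEGER, "
    "PRIMARY KEY (id, client))"
)


def insert_row(cur, date, client):
    """Insert a row whose id is one more than the current maximum."""
    # COALESCE turns the NULL that MAX() returns for an empty table into 0.
    cur.execute("SELECT COALESCE(MAX(id), 0) FROM KM202001")
    next_id = cur.fetchone()[0] + 1
    cur.execute(
        "INSERT INTO KM202001 (id, date, client) VALUES (?, ?, ?)",
        (next_id, date, client),
    )
    return next_id


first = insert_row(c, "2020-01-05", 17)
second = insert_row(c, "2020-01-06", 17)
conn.commit()
```

Note that the SELECT and the INSERT are two separate statements, so with several writers you would want to run them inside one transaction.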
I'd like to offer a couple of tips:
Using two integers as your composite key can be confusing. Take composite key 1315 for example: if the two values are ever written together, is that client 315 with an ID of 1, client 15 with an ID of 13, or client 5 with an ID of 131? The database stores the two columns separately and only requires the combination to be unique, so this is a readability problem rather than a storage one, but it is worth keeping in mind.
The second tip is not to create a new database table for each month. A very good rule is that identically-structured tables should be combined into a single table. In this case you would add a column called month (actually, it would be 'date' then you would search by month) and keep everything in one table, not one table per month.
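As a rough sketch of the single-table idea, assuming the dates are stored as ISO strings (YYYY-MM-DD) and using a made-up table name KM with sample rows that are not from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
c = conn.cursor()

# One table for all months; "KM" is a hypothetical name.
c.execute(
    "CREATE TABLE IF NOT EXISTS KM("
    "id INTEGER PRIMARY KEY, date TEXT, client INTEGER)"
)
c.executemany(
    "INSERT INTO KM (date, client) VALUES (?, ?)",
    [("2020-01-05", 17), ("2020-01-20", 4), ("2020-02-01", 17)],
)
conn.commit()

# strftime('%Y-%m', date) extracts the year and month from an ISO date string.
c.execute(
    "SELECT id, date, client FROM KM "
    "WHERE strftime('%Y-%m', date) = ? ORDER BY id",
    ("2020-01",),
)
january = c.fetchall()
```

Because id is the only primary key column here, SQLite assigns it automatically again.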
I have a set of data that gets updated periodically by a client. Once a month or so we will download a new set of this data. The dataset is about 50k records with a couple hundred columns of data.
I am trying to create a database that houses all of this data so we can run our own analysis on it. I'm using PostgreSQL and Python (psycopg2).
Occasionally, the client will add columns to the dataset, so there are a number of steps I want to take:
1. Add new records to the database table
2. Compare the old set of data with the new set of data and update the table where necessary
3. Keep the old records, and either add an "expired" flag, or an "db_expire_date" to keep track of whether a record is active or expired
4. Add any new columns of data to the database for all records
I know how to add new records to the database (1) using INSERT INTO, and how to add new columns of data to the database (4) using ALTER TABLE. But I'm having issues with (2) and (3). I figured out how to update a record, using the following code:
rows = zip(*[update_records[col] for col in update_records])
cursor = conn.cursor()
cursor.execute("""CREATE TEMP TABLE temptable (""" + schema_list + """) ON COMMIT DROP""")
cursor.executemany("""INSERT INTO temptable (""" + var +""") VALUES ("""+ perc_s + """)""", rows)
cursor.execute("""
UPDATE tracking.test_table
SET mfg = temptable.mfg, db_updt_dt = CURRENT_TIMESTAMP
FROM temptable
WHERE temptable.app_id = tracking.test_table.app_id;
""");
cursor.rowcount
conn.commit()
cursor.close()
conn.close()
However, this just updated the record based on the app_id as the primary key.
What I'd like to figure out is how to keep the original record and set it as "expired" and then create a new, updated record. It seems that "app_id" shouldn't be my primary key, so I've created a new primary key as '"primary_key" INT GENERATED ALWAYS AS IDENTITY not null,'.
I'm just not sure where to go from here. I think that I could probably just use INSERT INTO to send the new records to the database, but I'm not sure how to "expire" the old records that way. Possibly I could use UPDATE to set the older rows to "expired". But I am wondering if there is a more straightforward way to do this.
I hope my question is clear. I'm hoping someone can point me in the right direction. Thanks
A pretty standard data warehousing technique is to define two additional date fields, a from-effective-date and a to-effective-date. You only append rows, never update. You add the candidate record if the source primary key does not exist in your table OR if any column value is different from the most recently added prior record in your table with the same primary key. (Each record supersedes the last).
As you add your record to the table you do 3 things:
The New record's from-effective-date gets the transaction file's date
The New record's to-effective-date gets a date WAY in the future, like 9999-12-31. The important thing here is that it will not expire until you say so.
The most recent prior record (the one you compared values for changes) has its to-effective-date Updated to the transaction file's date minus one day. This has the effect of expiring the old record.
This creates a chain of records with the same source primary key with each one covering a non-overlapping time period. This format is surprisingly easy to select from:
If you want to reproduce the most current transaction file you select Where to-effective-date > Current Date
If you want to reproduce the transaction file at any date for a report, you select Where myreportdate Between from-effective-date And to-effective-date.
If you want the entire update history for a key you select * Where the key = mykeyvalue Order By from-effective-date.
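The scheme above could look roughly like this. It is only a sketch under several assumptions: sqlite3 stands in for PostgreSQL so the example is self-contained, the column names eff_from and eff_to and the helper load_record are made up, and only one data column (mfg) is compared.

```python
import sqlite3
from datetime import date, timedelta

FAR_FUTURE = "9999-12-31"  # "never expires until you say so"

conn = sqlite3.connect(":memory:")  # assumption: sqlite3 as a stand-in
cur = conn.cursor()
cur.execute(
    "CREATE TABLE test_table ("
    "app_id INTEGER, mfg TEXT, eff_from TEXT, eff_to TEXT, "
    "PRIMARY KEY (app_id, eff_from))"
)


def load_record(cur, app_id, mfg, file_date):
    """Append a new version of the record if it is new or has changed."""
    # The open-ended row is the most recent version for this key.
    cur.execute(
        "SELECT mfg, eff_from FROM test_table "
        "WHERE app_id = ? AND eff_to = ?",
        (app_id, FAR_FUTURE),
    )
    current = cur.fetchone()
    if current is not None and current[0] == mfg:
        return False  # unchanged, nothing to append
    if current is not None:
        # Expire the previous version the day before the new file's date.
        day_before = (file_date - timedelta(days=1)).isoformat()
        cur.execute(
            "UPDATE test_table SET eff_to = ? "
            "WHERE app_id = ? AND eff_from = ?",
            (day_before, app_id, current[1]),
        )
    cur.execute(
        "INSERT INTO test_table (app_id, mfg, eff_from, eff_to) "
        "VALUES (?, ?, ?, ?)",
        (app_id, mfg, file_date.isoformat(), FAR_FUTURE),
    )
    return True


load_record(cur, 1, "Acme", date(2020, 1, 1))
load_record(cur, 1, "Acme", date(2020, 2, 1))  # unchanged, skipped
load_record(cur, 1, "Apex", date(2020, 3, 1))  # changed, old row expires
conn.commit()

# Reproduce the data as of a report date.
cur.execute(
    "SELECT mfg FROM test_table "
    "WHERE app_id = ? AND ? BETWEEN eff_from AND eff_to",
    (1, "2020-02-15"),
)
as_of_feb = cur.fetchone()[0]
```

The dates are kept as ISO strings here so that plain string comparison orders them correctly; in PostgreSQL you would use real DATE columns instead.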
The only thing that is ugly about this scheme is that when columns are added, the comparison test must also be altered to include those new columns in case something changes. If you want that to be dynamic, you're going to have to loop through the reflection metadata for each column in the table, but Python will need to know how comparing a text field might be different from comparing a BLOB, for example.
If you actually care about having a primary key (many data warehouses do not have primary keys) you can define a compound key on the source primary key + one of those effective dates, it doesn't really matter which one.
You're looking for the concept of a "natural key", which is how you would identify a unique row, regardless of what the explicit logical constraints on the table are.
This means that you're spot on that you need to change your primary key to be more inclusive. Your new primary key doesn't actually help you decipher which row you are looking for once you have both in there unless you already know which row you are looking for (that "identity" field).
I can think of two likely candidates to add to your natural key: date, or batch.
Either way, you would look for "App = X, [Date|batch] = Y" in the data to find that one. Batch would be upload 1, upload 2, etc. You just make it up, or derive it from the date, or something along those lines.
If you aren't sure which to add, and you aren't ever going to upload multiple times in one day, I would go with Date. That will give you more visibility over time, as you can see when and how often things change.
Once you have a natural key, you want to make it explicit in your data. You can either keep your identity column (see: Surrogate Key) or you can have a compound primary key. With no other input or constraints, I would go with a compound primary key for your situation.
I'm a MySQL DBA, so I'm cribbing a bit from the docs here: https://www.postgresqltutorial.com/postgresql-primary-key/
You do NOT want this:
CREATE TABLE test_table (
app_id INTEGER PRIMARY KEY,
date DATE,
active BOOLEAN
);
Instead, you want this:
CREATE TABLE test_table (
app_id INTEGER,
date DATE,
active BOOLEAN,
PRIMARY KEY (app_id, date)
);
I've added an active column here as well, since you wanted to deactivate rows. This isn't strictly necessary from what you've described though - you can always assume the most recent upload is active. Or you can expand the columns to have an "active_start" date and an "active_end" date, which will enable another set of queries. But for what you've stated here so far, just the date column should suffice. :)
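To illustrate "the most recent upload is active" with the compound key, here is a small sketch. It assumes sqlite3 as a stand-in for PostgreSQL, ISO date strings, and a made-up data column mfg with invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: sqlite3 as a stand-in
cur = conn.cursor()
cur.execute(
    "CREATE TABLE test_table ("
    "app_id INTEGER, date TEXT, mfg TEXT, "
    "PRIMARY KEY (app_id, date))"
)
cur.executemany(
    "INSERT INTO test_table (app_id, date, mfg) VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "Acme"),
        (1, "2020-02-01", "Apex"),
        (2, "2020-01-01", "Bolt"),
    ],
)
conn.commit()

# For each app_id, keep only the row with the latest upload date.
cur.execute(
    "SELECT t.app_id, t.date, t.mfg FROM test_table AS t "
    "WHERE t.date = (SELECT MAX(u.date) FROM test_table AS u "
    "                WHERE u.app_id = t.app_id) "
    "ORDER BY t.app_id"
)
latest = cur.fetchall()
```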
For step 2)
First, you have to identify the records that already have the same data. For this, you can run a SELECT query with a WHERE clause before inserting any record and count the number of rows you receive as output. If the count is more than 0, don't insert the record; otherwise you can insert it.
For step 3)
For this, you can insert a column as you mention above with the name 'db_expire_date' and insert the expiration value at the time of record insertion only.
You can also use a column like 'is_expire' but for that, you need to add a cron job that can update the DB periodically for the value of this column.
I am trying to write some data to a table in a database which I am creating.
However, I am running into an integrity error like:
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) PRIMARY KEY must be unique
My question is how to avoid these errors, as I will run the script a couple of times.
Basically you are creating an object with an already existing primary key, and it's not accepted by SQLite. Verify it by querying the db with something like
select * from airport where id = 6256
If the query returns a result, you need to change the id of the airport you are saving. Since you use the autoincrement, you don't need to specify an id and the DBMS will assign the next free id in that table.
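A minimal sketch of that behaviour, using plain sqlite3 rather than SQLAlchemy and assuming a simplified airport table with a single made-up name column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = conn.cursor()
# Hypothetical, simplified schema; the real airport table is not shown.
cur.execute(
    "CREATE TABLE airport (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)

# Leave the id out and let the database assign the next free one.
cur.execute("INSERT INTO airport (name) VALUES (?)", ("Heathrow",))
first_id = cur.lastrowid
cur.execute("INSERT INTO airport (name) VALUES (?)", ("Schiphol",))
second_id = cur.lastrowid

# Supplying an id that already exists reproduces the integrity error.
try:
    cur.execute(
        "INSERT INTO airport (id, name) VALUES (?, ?)",
        (first_id, "Duplicate"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
conn.commit()
```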
I am a beginner in MySQL, so maybe the fault is mine somewhere, but I am not able to understand how this can be resolved.
This is structure of my table:-
CREATE TABLE `nearest_product_type` (
`id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
`created` datetime NOT NULL,
`modified` datetime NOT NULL,
`name` varchar(15) NOT NULL UNIQUE
)
;
And this is the code i am trying:-
import datetime
import MySQLdb
base = MySQLdb.connect (host="localhost", user = "root", passwd = "sheeshmohsin", db="points")
basecursor = base.cursor()
queryone = """INSERT INTO nearest_product_type (name,created,modified) VALUES (%s,%s,%s) ON DUPLICATE KEY UPDATE name=name """
category = "Indica"
valueone = (category,datetime.datetime.now(),datetime.datetime.now())
basecursor.execute(queryone, valueone)
product_id = basecursor.lastrowid
basecursor.close()
base.commit()
base.close()
print(product_id)
On running this Python script the first time, when the category is new, it works fine. But on running it again with the same category, lastrowid returns 0, and I need the id of the row that was updated.
And when I checked the rows in the table, the auto-increment is also advancing on every run. Suppose I run the script four times: the first time the category is unique and gets id 1, and if another unique category arrives on the fourth run, that row gets id 4, but it should be 2 because it's the second row. How can I solve this?
The ON DUPLICATE KEY UPDATE part here will not work as the key is the auto-increment column, which will never get duplicates.
It is almost certainly this clause that is causing the unexpected counts, particularly given the UNIQUE setting on name.
You can try using something like SELECT MAX(id) FROM nearest_product_type to get the last id added.
Something is wrong in the way you access the database. When you try to insert a new row in your database with a name that already exists, the insert will fail, because the column name is declared to be unique.
If you want to modify an existing row, you must use an UPDATE statement, not an INSERT one. A plain SQL INSERT does not do an insert-or-update by itself.
And nothing in autoincrement guarantees that ids are consecutive. All you know is that the database will assign a different id to each inserted row, but insertion failures can (and do in your case) result in holes in the id sequence.
Furthermore, some drivers may allow for pre-reservation of ids, particularly with network connections, so that a client connection can get a batch of ids in case it inserts more than one row. In that case, if another client asks for ids and both clients insert rows alternately, the ids will not follow the insertion time.
How would I stop sqlite3 from adding a row to a table if a row with exactly the same values already exists, but otherwise add it? I'm totally new to SQLite and don't know how to do this.
When you create the table, specify a unique constraint:
create table foo ( name varchar, id integer, unique ( name, id) );
You should define your table as @Robᵩ answered.
If, however, you don't want to change an existing table definition (in SQLite you are very limited in ALTER TABLE), you can create a unique index:
CREATE UNIQUE INDEX foo_idx ON foo (name, id);
Note you are not allowed to create this index until you remove all duplicates.
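With the unique constraint (or unique index) in place, SQLite's INSERT OR IGNORE skips rows that would violate it instead of raising an error. A small sketch from Python, with made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE foo (name VARCHAR, id INTEGER, UNIQUE (name, id))")

# The second row repeats the first one exactly and is silently skipped.
rows = [("alice", 1), ("alice", 1), ("bob", 2)]
cur.executemany("INSERT OR IGNORE INTO foo (name, id) VALUES (?, ?)", rows)
conn.commit()

cur.execute("SELECT name, id FROM foo ORDER BY id")
stored = cur.fetchall()
```

A plain INSERT would raise sqlite3.IntegrityError on the duplicate instead, which you could also catch if you want to know that a row was rejected.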
I'm working with sqlite3 on Python 2.7 and I am facing a problem with a many-to-many relationship. I have a table from which I am fetching the primary key like this
current.execute("SELECT ExtensionID FROM tblExtensionLookup where ExtensionName = ?",[ext])
and then I am fetching another primary key from another table
current.execute("SELECT HostID FROM tblHostLookup where HostName = ?",[host])
Now I have a third table with these two keys as foreign keys, and I insert them like this
current.execute("INSERT INTO tblExtensionHistory VALUES(?,?)",[Hid,Eid])
The problem is that the last insertion is not working and I don't know why; it keeps giving errors. What I have tried so far:
First I thought it was because the mapping table has an autoincrement primary id which I didn't provide, but isn't that supposed to fill itself in, since it's auto-incremented? However, I went ahead and tried adding Null, None and 0, but nothing works.
Secondly, I thought maybe I'm not getting the values from the tables above, so I tried printing them out, and they show up, so that part works.
Any suggestions as to what I am doing wrong here?
EDIT :
When I don't provide the primary key I get this error:
The table has three columns but you provided only two values
and when I do provide it as None, Null or 0 it says
Parameter 0 is not supported probably because of unsupported type
I tried implementing it the @abarnet way, but it still keeps saying parameter 0 is not supported
connection = sqlite3.connect('WebInfrastructureScan.db')
with connection:
    current = connection.cursor()
    current.execute("SELECT ExtensionID FROM tblExtensionLookup where ExtensionName = ?",[ext])
    Eid = current.fetchone()
    print Eid
    current.execute("SELECT HostID FROM tblHostLookup where HostName = ?",[host])
    Hid = current.fetchone()
    print Hid
    current.execute("INSERT INTO tblExtensionHistory(HostID,ExtensionID) VALUES(?,?)",[Hid,Eid])
EDIT 2 :
The database schema is :
table 1:
CREATE TABLE tblHostLookup (
HostID INTEGER PRIMARY KEY AUTOINCREMENT,
HostName TEXT);
table2:
CREATE TABLE tblExtensionLookup (
ExtensionID INTEGER PRIMARY KEY AUTOINCREMENT,
ExtensionName TEXT);
table3:
CREATE TABLE tblExtensionHistory (
ExtensionHistoryID INTEGER PRIMARY KEY AUTOINCREMENT,
HostID INTEGER,
FOREIGN KEY(HostID) REFERENCES tblHostLookup(HostID),
ExtensionID INTEGER,
FOREIGN KEY(ExtensionID) REFERENCES tblExtensionLookup(ExtensionID));
It's hard to be sure without full details, but I think I can guess the problem.
If you use the INSERT statement without column names, the values must exactly match the columns as given in the schema. You can't skip over any of them.*
The right way to fix this is to just use the column names in your INSERT statement. Something like:
current.execute("INSERT INTO tblExtensionHistory (HostID, ExtensionID) VALUES (?,?)",
[Hid, Eid])
Now you can skip any columns you want (as long as they're autoincrement, nullable, or otherwise skippable, of course), or provide them in any order you want.
For your second problem, you're trying to pass in rows as if they were single values. You can't do that. From your code:
Eid = current.fetchone()
This will return something like:
[3]
And then you try to bind that to the ExtensionID column, which gives you an error.
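A sketch of the fix is to pull the single value out of each fetched row before binding it. This assumes an in-memory database, made-up sample values ("example.com", "php"), and the schema from the question with the FOREIGN KEY clauses moved after the column definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory database
current = conn.cursor()
current.executescript("""
CREATE TABLE tblHostLookup (
    HostID INTEGER PRIMARY KEY AUTOINCREMENT,
    HostName TEXT);
CREATE TABLE tblExtensionLookup (
    ExtensionID INTEGER PRIMARY KEY AUTOINCREMENT,
    ExtensionName TEXT);
CREATE TABLE tblExtensionHistory (
    ExtensionHistoryID INTEGER PRIMARY KEY AUTOINCREMENT,
    HostID INTEGER,
    ExtensionID INTEGER,
    FOREIGN KEY(HostID) REFERENCES tblHostLookup(HostID),
    FOREIGN KEY(ExtensionID) REFERENCES tblExtensionLookup(ExtensionID));
""")
# Made-up sample data so the lookups below find something.
current.execute("INSERT INTO tblHostLookup (HostName) VALUES (?)", ("example.com",))
current.execute("INSERT INTO tblExtensionLookup (ExtensionName) VALUES (?)", ("php",))

current.execute(
    "SELECT ExtensionID FROM tblExtensionLookup WHERE ExtensionName = ?",
    ["php"],
)
row = current.fetchone()  # a one-element row, or None if nothing matched
Eid = row[0]              # the actual integer id

current.execute(
    "SELECT HostID FROM tblHostLookup WHERE HostName = ?",
    ["example.com"],
)
Hid = current.fetchone()[0]

current.execute(
    "INSERT INTO tblExtensionHistory (HostID, ExtensionID) VALUES (?, ?)",
    [Hid, Eid],
)
conn.commit()

current.execute("SELECT HostID, ExtensionID FROM tblExtensionHistory")
history = current.fetchall()
```

In real code you would also check for None before indexing, since fetchone() returns None when the lookup finds no row.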
In the future, you may want to write and debug the SQL statements in the sqlite3 command-line tool and/or your favorite GUI database manager (there's a simple extension that runs in Firefox if you don't want anything fancy) and get them right before you try getting the Python right.
* This is not true with all databases. For example, in MSJET/Access, you must skip over autoincrement columns. See the SQLite documentation for how SQLite interprets INSERT with no column names, or similar documentation for other databases.