Scenario: A quiz program with questions worth different amounts of points.
Sqlite database with a table Table1 with a field RunningTotal of type Int.
I'm looking to update the RunningTotal by the quantity 'updateby' passed to the function. This is a numerical value (but may be a string, so I'm converting it to an integer to be sure).
tableid is used to identify which row to update.
e.g. (non-working code: the error is that updateby is not a column name)
def UpdateRunningTotal(tableid, updateby):
    updateby = int(updateby)
    conn.execute("UPDATE Table1 SET RunningTotal=RunningTotal+updateby WHERE tableid=?", (tableid,))
I know that the following works to increment the field by 1, but as a function I want more flexibility to increment by different amounts.
conn.execute("UPDATE Table1 SET RunningTotal=RunningTotal+1 WHERE tableid=?", (tableid,))
I'm trying to avoid doing a SELECT statement to read the current value of RunningTotal, do the math on that, and then use that result in the UPDATE statement...that seems inefficient to me (but may not be?)
conn.execute("UPDATE Table1 set RunningTotal=RunningTotal+? WHERE tableid=?", (updateby, tableid,))
Use this statement. I have checked it and it works: it updates the quantity already present in the database to RunningTotal + updateby.
Hope that resolves your issue.
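For reference, here is a minimal, self-contained sketch of that approach using an in-memory database. The table layout (tableid as an INTEGER PRIMARY KEY) and the function name update_running_total are assumptions for illustration, not taken from the question.

```python
import sqlite3

def update_running_total(conn, tableid, updateby):
    """Add updateby to RunningTotal for one row, without a prior SELECT."""
    updateby = int(updateby)  # accept "5" as well as 5
    conn.execute(
        "UPDATE Table1 SET RunningTotal = RunningTotal + ? WHERE tableid = ?",
        (updateby, tableid),
    )
    conn.commit()

# Assumed schema, for demonstration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (tableid INTEGER PRIMARY KEY, RunningTotal INT)")
conn.execute("INSERT INTO Table1 (tableid, RunningTotal) VALUES (1, 10)")

update_running_total(conn, 1, "5")  # string input
update_running_total(conn, 1, 3)    # integer input

total = conn.execute(
    "SELECT RunningTotal FROM Table1 WHERE tableid = 1"
).fetchone()[0]
print(total)
```

Both values are bound as parameters, so the database does the arithmetic in a single statement and no read-modify-write round trip is needed.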
I have a set of data that gets updated periodically by a client. Once a month or so we will download a new set of this data. The dataset is about 50k records with a couple hundred columns of data.
I am trying to create a database that houses all of this data so we can run our own analysis on it. I'm using PostgreSQL and Python (psycopg2).
Occasionally, the client will add columns to the dataset, so there are a number of steps I want to take:
1. Add new records to the database table
2. Compare the old set of data with the new set of data and update the table where necessary
3. Keep the old records, and either add an "expired" flag or a "db_expire_date" to keep track of whether a record is active or expired
4. Add any new columns of data to the database for all records
I know how to add new records to the database (1) using INSERT INTO, and how to add new columns of data to the database (4) using ALTER TABLE, but I am having issues with (2) and (3). I figured out how to update a record using the following code:
rows = zip(*[update_records[col] for col in update_records])
cursor = conn.cursor()
cursor.execute("""CREATE TEMP TABLE temptable (""" + schema_list + """) ON COMMIT DROP""")
cursor.executemany("""INSERT INTO temptable (""" + var +""") VALUES ("""+ perc_s + """)""", rows)
cursor.execute("""
UPDATE tracking.test_table
SET mfg = temptable.mfg, db_updt_dt = CURRENT_TIMESTAMP
FROM temptable
WHERE temptable.app_id = tracking.test_table.app_id;
""");
cursor.rowcount
conn.commit()
cursor.close()
conn.close()
However, this just updated the record based on the app_id as the primary key.
What I'd like to figure out is how to keep the original record, set it as "expired", and then create a new, updated record. It seems that "app_id" shouldn't be my primary key, so I've created a new primary key as '"primary_key" INT GENERATED ALWAYS AS IDENTITY not null,'.
I'm just not sure where to go from here. I think that I could probably just use INSERT INTO to send the new records to the database, but I'm not sure how to "expire" the old records that way. Possibly I could use UPDATE to set the older records to "expired", but I am wondering if there is a more straightforward way to do this.
I hope my question is clear. I'm hoping someone can point me in the right direction. Thanks
A pretty standard data warehousing technique is to define two additional date fields, a from-effective-date and a to-effective-date. You only append rows, never update. You add the candidate record if the source primary key does not exist in your table OR if any column value is different from the most recently added prior record in your table with the same primary key. (Each record supersedes the last).
As you add your record to the table you do 3 things:
The New record's from-effective-date gets the transaction file's date
The New record's to-effective-date gets a date WAY in the future, like 9999-12-31. The important thing here is that it will not expire until you say so.
The most recent prior record (the one you compared values for changes) has its to-effective-date Updated to the transaction file's date minus one day. This has the effect of expiring the old record.
This creates a chain of records with the same source primary key with each one covering a non-overlapping time period. This format is surprisingly easy to select from:
If you want to reproduce the most current transaction file you select Where to-effective-date > Current Date
If you want to reproduce the transaction file at any date for a report, you select Where myreportdate Between from-effective-date And to-effective-date.
If you want the entire update history for a key you select * Where the key = mykeyvalue Order By from-effective-date.
The only thing that is ugly about this scheme is when columns are added, the comparison test also must be altered to include those new columns in case something changes. If you want that to be dynamic, you're going to have to loop through the reflection meta data for each column in the table, but Python will need to know how comparing a text field might be different from comparing a BLOB, for example.
If you actually care about having a primary key (many data warehouses do not have primary keys) you can define a compound key on the source primary key + one of those effective dates, it doesn't really matter which one.
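A rough sketch of the append-and-expire scheme described above, under several assumptions: it uses sqlite3 with an in-memory database instead of PostgreSQL so that it runs standalone, the table name history and the columns eff_from and eff_to are made up for illustration, and only one data column (mfg) is compared for changes.

```python
import sqlite3

FAR_FUTURE = "9999-12-31"  # "never expires until we say so"

def load_snapshot(conn, rows, file_date):
    """Append-only load. rows maps app_id -> mfg; file_date is an ISO date string."""
    for app_id, mfg in rows.items():
        # The open-ended row is the most recent version for this key
        current = conn.execute(
            "SELECT rowid, mfg FROM history WHERE app_id = ? AND eff_to = ?",
            (app_id, FAR_FUTURE),
        ).fetchone()
        if current is not None and current[1] == mfg:
            continue  # nothing changed, keep the existing version
        if current is not None:
            # Expire the prior version the day before the new file's date
            conn.execute(
                "UPDATE history SET eff_to = date(?, '-1 day') WHERE rowid = ?",
                (file_date, current[0]),
            )
        conn.execute(
            "INSERT INTO history (app_id, mfg, eff_from, eff_to) VALUES (?, ?, ?, ?)",
            (app_id, mfg, file_date, FAR_FUTURE),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (app_id INTEGER, mfg TEXT, eff_from TEXT, eff_to TEXT, "
    "PRIMARY KEY (app_id, eff_from))"
)
load_snapshot(conn, {1: "Acme", 2: "Globex"}, "2024-01-01")
load_snapshot(conn, {1: "Acme", 2: "Initech"}, "2024-02-01")

# Most current version of every record
current_rows = conn.execute(
    "SELECT app_id, mfg FROM history WHERE eff_to = ? ORDER BY app_id",
    (FAR_FUTURE,),
).fetchall()

# The data as it stood on an arbitrary report date
as_of_rows = conn.execute(
    "SELECT app_id, mfg FROM history WHERE ? BETWEEN eff_from AND eff_to ORDER BY app_id",
    ("2024-01-15",),
).fetchall()
```

In PostgreSQL the same idea should carry over with real DATE columns and psycopg2's %s placeholders; the date arithmetic syntax differs, so treat the SQL here as illustrative rather than portable.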
You're looking for the concept of a "natural key", which is how you would identify a unique row, regardless of what the explicit logical constraints on the table are.
This means that you're spot on that you need to change your primary key to be more inclusive. Your new primary key doesn't actually help you decipher which row you are looking for once you have both in there unless you already know which row you are looking for (that "identity" field).
I can think of two likely candidates to add to your natural key: date, or batch.
Either way, you would look for "App = X, [Date|batch] = Y" in the data to find that one. Batch would be upload 1, upload 2, etc. You just make it up, or derive it from the date, or something along those lines.
If you aren't sure which to add, and you aren't ever going to upload multiple times in one day, I would go with Date. That will give you more visibility over time, as you can see when and how often things change.
Once you have a natural key, you want to make it explicit in your data. You can either keep your identity column (see: Surrogate Key) or you can have a compound primary key. With no other input or constraints, I would go with a compound primary key for your situation.
I'm a MySQL DBA, so I'm cribbing a bit from the docs here: https://www.postgresqltutorial.com/postgresql-primary-key/
You do NOT want this:
CREATE TABLE test_table (
    app_id INTEGER PRIMARY KEY,
    date DATE,
    active BOOLEAN
);
Instead, you want this:
CREATE TABLE test_table (
    app_id INTEGER,
    date DATE,
    active BOOLEAN,
    PRIMARY KEY (app_id, date)
);
I've added an active column here as well, since you wanted to deactivate rows. This isn't explicitly necessary from what you've described though - you can always assume the most recent upload is active. Or you can expand the columns to have a "active_start" date and an "active_end" date, which will enable another set of queries. But for what you've stated here so far, just the date column should suffice. :)
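To make the compound-key idea concrete, here is a small sketch using sqlite3 in memory so it can run on its own (the question uses PostgreSQL). The mfg column and the sample values are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE test_table (
        app_id INTEGER,
        date DATE,
        mfg TEXT,
        PRIMARY KEY (app_id, date)
    )
    """
)

# Two uploads for app 1 on different dates can coexist
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "Acme"),
        (1, "2024-02-01", "Acme Corp"),
        (2, "2024-01-01", "Globex"),
    ],
)

# The same (app_id, date) pair is rejected by the compound key
try:
    conn.execute("INSERT INTO test_table VALUES (1, '2024-02-01', 'dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# "Active" rows are simply the most recent upload per app_id
latest = conn.execute(
    """
    SELECT t.app_id, t.date, t.mfg
    FROM test_table AS t
    WHERE t.date = (SELECT MAX(i.date) FROM test_table AS i WHERE i.app_id = t.app_id)
    ORDER BY t.app_id
    """
).fetchall()
```

With this layout no separate "active" flag has to be maintained; the latest date per app_id identifies the live row.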
For step 2)
First, you have to identify the records that already exist with the same data. To do this, run a SELECT query with a WHERE clause before inserting any record and count the number of rows returned. If the count is greater than 0, don't insert the record; otherwise, insert it.
For step 3)
For this, you can add a column named 'db_expire_date', as you mention above, and set the expiration value at the time the record is inserted.
You can also use a column like 'is_expire', but then you need a cron job that periodically updates the value of this column in the database.
Hello StackEx community.
I am implementing a relational database using SQLite interfaced with Python. My table consists of 5 attributes with around a million tuples.
To avoid large number of database queries, I wish to execute a single query that updates 2 attributes of multiple tuples. These updated values depend on the tuples' Primary Key value and so, are different for each tuple.
I am trying something like the following in Python 2.7:
stmt= 'UPDATE Users SET Userid (?,?), Neighbours (?,?) WHERE Username IN (?,?)'
cursor.execute(stmt, [(_id1, _Ngbr1, _name1), (_id2, _Ngbr2, _name2)])
In other words, I am trying to update the rows that have Primary Keys _name1 and _name2 by substituting the Neighbours and Userid columns with corresponding values. The execution of the two statements returns the following error:
OperationalError: near "(": syntax error
I am reluctant to use executemany() because I want to reduce the number of trips across the database.
I have been struggling with this issue for a couple of hours now but couldn't figure out either the error or an alternative on the web. Please help.
Thanks in advance.
If the column that is used to look up the row to update is properly indexed, then executing multiple UPDATE statements would be likely to be more efficient than a single statement, because in the latter case the database would probably need to scan all rows.
Anyway, if you really want to do this, you can use CASE expressions (and explicitly numbered parameters, to avoid duplicates):
UPDATE Users
SET Userid = CASE Username
                 WHEN ?5 THEN ?1
                 WHEN ?6 THEN ?2
             END,
    Neighbours = CASE Username
                     WHEN ?5 THEN ?3
                     WHEN ?6 THEN ?4
                 END
WHERE Username IN (?5, ?6);
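Here is a sketch of how that statement could be driven from Python's sqlite3 module. The column types and sample values are assumptions; the parameters are passed as a single flat tuple whose positions match the ?1 to ?6 numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (Username TEXT PRIMARY KEY, Userid INTEGER, Neighbours TEXT)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [("alice", 0, ""), ("bob", 0, ""), ("carol", 0, "")],
)

stmt = """
UPDATE Users
SET Userid = CASE Username WHEN ?5 THEN ?1 WHEN ?6 THEN ?2 END,
    Neighbours = CASE Username WHEN ?5 THEN ?3 WHEN ?6 THEN ?4 END
WHERE Username IN (?5, ?6)
"""

# Position in the tuple = parameter number:
# ?1, ?2 = new ids; ?3, ?4 = new neighbours; ?5, ?6 = the names to match
params = (101, 102, "bob", "alice", "alice", "bob")
conn.execute(stmt, params)
conn.commit()

rows = conn.execute(
    "SELECT Username, Userid, Neighbours FROM Users ORDER BY Username"
).fetchall()
```

The WHERE clause matters here: without it, rows that match no WHEN branch would have both columns set to NULL.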
I just had an issue with Django and PostgreSQL that I don't understand.
I have a simple model, defined such as:
class MyModel(models.Model):
    my_field = models.IntegerField()
    my_other_field = models.TextField()
In my view, I have something similar to:
my_object = MyModel(my_field=1, my_other_field='blah')
my_object.save()
Everything was working fine, until this morning. I got this error:
IntegrityError at /my_url/
duplicate key value violates unique constraint "my_model_pkey"
DETAIL: Key (id)=(3) already exists.
CONTEXT: Remote SQL command: INSERT INTO public.my_model(id, my_field, my_other_field) VALUES ($1, $2, $3) RETURNING id
I had this error once before; I know it is related to the way PostgreSQL syncs the sequence associated with my model with the id column. I had to run this function in PostgreSQL until the id returned was greater than the largest existing value of id.
select nextval('my_model_id_seq'::regclass);
My question is: why did this happen in the first place, and how can I prevent it in the future?
By the way, that's the only way I insert data into the table, I've never inserted data manually.
I hope the question is clear enough
I think the question is not "why is my sequence getting messed up" - rather it is "why is Django trying to supply a value for the id column when inserting a row, instead of allowing the database to insert the next value in the sequence".
The Django documentation describes the algorithm it uses to decide whether it should be doing an UPDATE or an INSERT when you call save().
This algorithm involves checking if the 'id' field of the object is already set to some value. If it is not, then it does an INSERT (presumably not specifying a value for the 'id' field). If it is set, then it first tries to do an UPDATE; if that does not result in an updated record, then it will do an INSERT (this time presumably it would specify a value for the 'id' field).
As pointed out in Erwin's answer, the error message you are seeing indicates that it is trying to insert a row while specifying the value for the 'id' field.
I note that it appears this algorithm has changed in version 1.6 of Django. Previously it used a SELECT first to see if a record existed, then an UPDATE if it did or an INSERT if it did not. If your problem has started occurring since upgrading, then that could be a cause. The documentation notes:
There are some rare cases where the database doesn’t report that a row
was updated even if the database contains a row for the object’s
primary key value. An example is the PostgreSQL ON UPDATE trigger
which returns NULL. In such cases it is possible to revert to the old
algorithm by setting the select_on_save option to True.
If this were happening for you, then it would explain your symptoms: the error would actually be occurring when trying to update a value in the database, and django would erroneously think that the row did not exist and then try to create it.
You could check for this by setting 'select_on_save' to true to revert to the old behavior.
Another possible reason for this would be if your code inadvertently set the 'id' attribute on an object to some value, and then called save(). This could cause various problems, depending on whether the value already existed in the database or not. In particular, it might result in creating a row which has an 'id' value which is ahead of the current range of the sequence associated with the column, so that later on you would get errors trying to insert into the table.
Another possible reason could be using the 'force_insert' argument to save() on an object which had previously been loaded from the database (so that it was actually an existing row you should be updating).
The root of the problem lies here (SQL command from your error message):
INSERT INTO public.my_model(id, my_field, my_other_field)
VALUES ($1, $2, $3)
RETURNING id
Since your id column seems to be a serial type, do not insert values manually. Let the default draw from the sequence automatically. Should be:
INSERT INTO public.my_model(my_field, my_other_field)
VALUES ($1, $2)
RETURNING id;
That's the whole point of adding RETURNING id to begin with: to return the newly generated id. If you pass in a value yourself, you wouldn't need to have it returned.
Fix
If the sequence got out of sync somehow, because manual entries conflict with the numbers from nextval(), run this query once:
SELECT setval('my_model_id_seq', max(id)) FROM my_model;
This sets the sequence to the current maximum. Next call is next number, no off-by-one error.
I am having troubles finding out if I can even do this. Basically, I have a csv file that looks like the following:
1111,804442232,1
1112,312908721,1
1113,A*2434,1
1114,A*512343128760987,1
1115,3512748,1
1116,1111,1
1117,1234,1
This is imported into a sqlite database in memory for manipulation. I will be importing multiple files into this database after some manipulation. Sqlite is allowing me to keep constraints on the tables and receive errors where needed without creating additional functions just to check each constraint while using arrays in python. I want to do a few things but the first of which is to prepend field2 where all field2 strings match an entry in field1.
For example, in the above data field2 in entry 6 matches entry 1. In this case I would like to prepend field2 in entry 6 with '555'
If this is not possible I do believe I could make do using a regex and just do this on every row with 4 digits in field2... though... I have yet to successfully get REGEX working using python/sqlite as it always throws me an error.
I am working within Python using Sqlite3 to connect/manipulate my sqlite database.
EDIT: I am looking for a method to manipulate the resultant tables which reside in a sqlite database rather than manipulating just the csv data. The data above is just a simple representation of what is contained in the files I am working with. Would it be better to work with arrays containing the data from the csv files? These files have 10,000+ entries and about 20-30 columns.
If you must do it in SQLite, how about this:
First, get the column names of the table by running the following and parsing the result
import sqlite3

def get_columns(table_name, cursor):
    # PRAGMA table_info returns one row per column; index 1 is the column name
    cursor.execute('pragma table_info(%s)' % table_name)
    return [row[1] for row in cursor]

conn = sqlite3.connect('test.db')
columns = get_columns('test_table', conn.cursor())
For each of those columns, run the following update, that does your prepending
def prepend(table, column, reference, prefix, cursor):
    # Identifiers cannot be bound as parameters, so they are interpolated;
    # the prefix is a value, so it is bound with ?
    query = '''
        UPDATE %s
        SET %s = ? || %s
        WHERE %s IN (SELECT %s FROM %s)
    ''' % (table, column, column, column, reference, table)
    cursor.execute(query, (prefix,))

reference = 'field1'
for column in columns:
    if column != reference:
        prepend('test_table', column, reference, '555', conn.cursor())
conn.commit()
Note that this is expensive: O(n^2) for each column you want to do it for.
As per your edit and Nathan's answer, it might be better to simply work with python's builtin datastructures. You can always insert it into SQLite after.
10,000 entries is not really much so it might not matter in the end. It all depends on your reason for requiring it to be done in SQLite (which we don't have much visibility of).
There is no need to use regex expressions to do this, just throw the contents from the first column into a set and then iterate through the rows and update the second field.
first_col_values = set(row[0] for row in rows)
for row in rows:
    if row[1] in first_col_values:
        row[1] = '555' + row[1]
So... I found the answer to my own question after a ridiculous amount of my own searching and trial and error. My unfamiliarity with SQL had me stumped as I was trying all kinds of crazy things. In the end... this was the simple type of solution I was looking for:
prefix="555"
cur.execute("UPDATE table SET field2 = %s || field2 WHERE field2 IN (SELECT field1 FROM table)"% (prefix))
I kept the small amount of python in there but what I was looking for was the SQL statement. Not sure why nobody else came up with something that simple =/. Unsatisfied with the answers so far, I had been searching far and wide for this simple line >_<.
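For anyone wanting to try this end to end, here is a sketch against the sample rows from the question. Two things differ from the one-liner above and are my assumptions: the table is called entries (TABLE is a reserved word in SQLite, so a table literally named table would have to be quoted), and the prefix is bound with ? rather than formatted into the SQL string.

```python
import csv
import io
import sqlite3

# Sample data from the question
CSV_TEXT = "\n".join([
    "1111,804442232,1",
    "1112,312908721,1",
    "1113,A*2434,1",
    "1114,A*512343128760987,1",
    "1115,3512748,1",
    "1116,1111,1",
    "1117,1234,1",
])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (field1 TEXT, field2 TEXT, field3 TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    csv.reader(io.StringIO(CSV_TEXT)),
)

prefix = "555"
conn.execute(
    "UPDATE entries SET field2 = ? || field2 "
    "WHERE field2 IN (SELECT field1 FROM entries)",
    (prefix,),
)
conn.commit()

# Entry 1116 had field2 = 1111, which matches field1 of the first entry
matched = conn.execute(
    "SELECT field2 FROM entries WHERE field1 = '1116'"
).fetchone()[0]
# Entry 1117 had field2 = 1234, which matches no field1 value
unmatched = conn.execute(
    "SELECT field2 FROM entries WHERE field1 = '1117'"
).fetchone()[0]
```

Binding the prefix also keeps it a string, which matters if the prefix ever has leading zeros.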
I am using sqlite with python. When I insert into table A, I need to feed it an ID from table B. So what I want to do is insert default data into B, grab the id (which is auto-increment) and use it in table A. What's the best way to receive the key from the table I just inserted into?
As Christian said, sqlite3_last_insert_rowid() is what you want... but that's the C level API, and you're using the Python DB-API bindings for SQLite.
It looks like the cursor method lastrowid will do what you want (search for 'lastrowid' in the documentation for more information). Insert your row with cursor.execute( ... ), then do something like lastid = cursor.lastrowid to check the last ID inserted.
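A short sketch of that flow, assuming made-up schemas for tables A and B (the question does not give them):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas for illustration
conn.execute(
    "CREATE TABLE B (id INTEGER PRIMARY KEY AUTOINCREMENT, label TEXT DEFAULT 'default')"
)
conn.execute(
    "CREATE TABLE A (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES B(id), payload TEXT)"
)

cur = conn.cursor()
cur.execute("INSERT INTO B DEFAULT VALUES")  # insert a row of defaults into B
b_id = cur.lastrowid                         # rowid of the row just inserted

cur.execute("INSERT INTO A (b_id, payload) VALUES (?, ?)", (b_id, "example"))
conn.commit()

linked = conn.execute(
    "SELECT A.payload, B.label FROM A JOIN B ON A.b_id = B.id"
).fetchone()
```

Reading lastrowid from the same cursor that ran the INSERT, before executing anything else on it, keeps the value tied to that specific insert.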
That you say you need "an" ID worries me, though... it doesn't matter which ID you have? Unless you are using the data just inserted into B for something, in which case you need that row ID, your database structure is seriously screwed up if you just need any old row ID for table B.
Check out sqlite3_last_insert_rowid() -- it's probably what you're looking for:
Each entry in an SQLite table has a unique 64-bit signed integer key called the "rowid". The rowid is always available as an undeclared column named ROWID, OID, or _ROWID_ as long as those names are not also used by explicitly declared columns. If the table has a column of type INTEGER PRIMARY KEY then that column is another alias for the rowid.
This routine returns the rowid of the most recent successful INSERT into the database from the database connection in the first argument. If no successful INSERTs have ever occurred on that database connection, zero is returned.
Hope it helps! (More info on ROWID is available here and here.)
Simply use:
SELECT last_insert_rowid();
However, if you have multiple connections writing to the database, you might not get back the key that you expect.