I'm inserting data from some CSV files into my SQLite3 database with a Python script I wrote. When I run the script, it inserts the first row into the database, but gives this error when trying to insert the second:
sqlite3.IntegrityError: columns column_name1, column_name2 are not unique.
It is true that the values in column_name1 and column_name2 are the same in the first two rows of the CSV file. But this seems a bit strange to me, because reading about this error indicated that it signifies a uniqueness constraint on one or more of the database's columns. I checked the database details using SQLite Expert Personal, and it does not show any uniqueness constraints on the current table. Also, none of the fields that I am entering specify the primary key; it seems that the database assigns that automatically. Any thoughts on what could be causing this error? Thanks.
import sqlite3
import csv

if __name__ == '__main__':
    conn = sqlite3.connect('ts_database.sqlite3')
    c = conn.cursor()
    fileName = "file_name.csv"
    f = open(fileName)
    csv_f = csv.reader(f)
    for row in csv_f:
        command = "INSERT INTO table_name(column_name1, column_name2, column_name3)"
        command += " VALUES (%s, '%s', %s);" % (row[0], row[1], row[2])
        print command
        c.execute(command)
    conn.commit()
    f.close()
If SQLite is reporting an IntegrityError, it's very likely that there really is a PRIMARY KEY or UNIQUE constraint on those two columns and that you are mistaken when you state there is not. Ensure that you're really looking at the same instance of the database.
Also, do not build your SQL statement with string interpolation. It's dangerous and also difficult to get right (as you probably know, considering you have single quotes around only one of the fields). Using parameterized statements in SQLite is very, very simple.
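For example, a minimal parameterized sketch of the same loop, keeping the placeholder table and column names from the question:

import csv
import sqlite3

conn = sqlite3.connect('ts_database.sqlite3')
c = conn.cursor()

with open("file_name.csv") as f:
    for row in csv.reader(f):
        # The ? placeholders let sqlite3 handle quoting and escaping.
        c.execute(
            "INSERT INTO table_name (column_name1, column_name2, column_name3)"
            " VALUES (?, ?, ?)",
            (row[0], row[1], row[2]),
        )

conn.commit()
conn.close()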
The error may be due to duplicate column names in the INSERT INTO statement. I am guessing it is a typo and you meant column_name3 there.
Related
I am automating a task through Python that will run an SQL statement to insert into an existing table in a DB.
My CSV headers look as such:
ID,ACCOUNTID,CATEGORY,SUBCATEGORY,CREATION_DATE,CREATED_BY,REMARK,ISIMPORTANT,TYPE,ENTITY_TYPE
My values:
seq_addnoteid.nextval,123456,TEST,ADMN_TEST,sysdate,ME,This is a test,Y,1,A
NOTE: Currently, seq_addnoteid works when run from the DB, but in my code I added a small snippet to get the max ID, and the rows will increase it by one for each iteration.
Sysdate could also be passed in the format '19-MAY-22'.
If I were to run this from the DB, this would work:
insert into notes values(seq_addnoteid.nextval,'123456','TEST','ADMN_TEST',sysdate,'ME','This is a test','Y',1,'A');
# Snippet to get the current max ID
cursor.execute("SELECT MAX(ID) from NOTES")
max = cursor.fetchone()
max = int(max[0])

with open('sample.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    query = 'INSERT INTO NOTES({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    cursor = conn.cursor()
    for data in reader:
        cursor.execute(query, data)
    conn.commit()

print("Records inserted successfully")
cursor.close()
conn.close()
Currently, I'm getting Oracle-Error-Message: ORA-01036: illegal variable name/number, and I think it's because of my query.format line. However, I'm looking for help to get this code to handle the data types properly.
Thanks!
Try printing your query before you execute it. I think you'll find that it's printing this:
INSERT INTO NOTES(ID,ACCOUNTID,CATEGORY,SUBCATEGORY,CREATION_DATE,CREATED_BY,REMARK,ISIMPORTANT,TYPE,ENTITY_TYPE)
values(seq_addnoteid.nextval,123456,TEST,ADMN_TEST,sysdate,ME,This is a test,Y,1,A);
Which will also give you an ORA-01036 if you try to run it manually.
The problem is that you want some of your column values to be literal values, and some of them to be strings escaped in single quotes, and your code doesn't do that. I don't think there's an easy way to do it with ','.join(), so you'll either need to modify your CSVs to quote the strings, like:
seq_addnoteid.nextval,"'123456'","'TEST'","'ADMN_TEST'",sysdate,"'ME'","'This is a test'","'Y'",1,"'A'"
Or modify your query.format to add the quotes around the parameters that you want to treat as strings:
query.format(','.join(columns), "?,'?','?','?',?,'?','?','?',?,'?'")
As the commenters mentioned, pandas does handle this all very nicely.
EDIT: I see what you're saying. I'm not sure pandas will help with the literal functions you want to pass to the insert. But yes, you should be able to change your CSV and then do:
query.format(','.join(columns) + ',ID,CREATION_DATE', "'?','?','?','?','?','?',?,'?',seq_addnoteid.nextval,sysdate")
As a side note, a lot of people do this sort of thing on the database side in a BEFORE INSERT trigger, e.g.:
create or replace trigger NOTES_INS_TRG
  before insert on NOTES
  for each row
begin
  :NEW.ID := seq_addnoteid.nextval;
  :NEW.CREATION_DATE := sysdate;
end;
/
Then you could leave those columns out of your insert entirely.
Edit again:
I'm not sure you can use ? for bind/substitution variables in cx_Oracle (see the documentation). So where your raw query is currently:
INSERT INTO NOTES(ACCOUNTID,CATEGORY,SUBCATEGORY,CREATED_BY,REMARK,ISIMPORTANT,TYPE,ENTITY_TYPE,ID,CREATION_DATE)
values (seq_addnoteid.nextval,sysdate,'?','?','?','?','?','?',?,'?')
You'd need something like:
INSERT INTO NOTES(ACCOUNTID,CATEGORY,SUBCATEGORY,CREATED_BY,REMARK,ISIMPORTANT,TYPE,ENTITY_TYPE,ID,CREATION_DATE)
values (seq_addnoteid.nextval,sysdate,:1,:2,:3,:4,:5,:6,:7,:8)
We can probably do that by modifying the format string again to generate some bind variables:
query.format('ID,CREATION_DATE,' + ','.join(columns),
             "seq_addnoteid.nextval,sysdate," + ','.join([':' + c for c in columns]))
Again, try printing the query before executing it to make sure the column names and values are lining up correctly.
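Putting the pieces together, a hedged end-to-end sketch might look like the one below. It assumes the CSV has been trimmed so it no longer carries ID or CREATION_DATE, and the connection details are placeholders; the table, sequence, and column names come from the question.

import csv
import cx_Oracle

conn = cx_Oracle.connect("user", "password", "host/service")  # placeholder credentials
cursor = conn.cursor()

with open('sample.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)  # ACCOUNTID,...,ENTITY_TYPE (ID and CREATION_DATE removed)

    # Numbered bind variables (:1, :2, ...) instead of ?.
    binds = ','.join(':%d' % (i + 1) for i in range(len(columns)))
    query = ('INSERT INTO NOTES (ID, CREATION_DATE, {0}) '
             'VALUES (seq_addnoteid.nextval, sysdate, {1})').format(','.join(columns), binds)
    print(query)  # sanity-check the generated statement before executing it

    for data in reader:
        cursor.execute(query, data)

conn.commit()
cursor.close()
conn.close()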
I want to read a CSV file and insert its data into PostgreSQL with Python, but I get this error:
cursor.execute(passdata)
psycopg2.IntegrityError: duplicate key value violates unique constraint "prk_constraint_project"
DETAIL: Key (project_code)=(%s) already exists.
My code is:
clinicalCSVINSERT = open(clinicalname, 'r')
reader = csv.reader(clinicalCSVINSERT, delimiter='\t')
passdata = "INSERT INTO project (project_code, program_name ) VALUES ('%s', '%s')";
cursor.execute(passdata)
conn.commit()
What does this error mean?
Is it possible to have a working script?
The immediate problem with your code is that you are trying to include the literal %s. Since you probably ran it more than once, you already have a literal %s in that unique column, hence the exception.
It is necessary to pass the values wrapped in an iterable as parameters to the execute method. The %s is just a value placeholder.
passdata = """
INSERT INTO project (project_code, program_name )
VALUES (%s, %s)
"""
cursor.execute(passdata, (the_project_code, the_program_name))
Do not quote the %s. Psycopg will do it if necessary.
As your code does not include a loop, it will only insert one row from the CSV. There are some patterns to insert the whole file; a looped version is sketched below. If the requirements allow, just use copy_from, which is simpler and faster.
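A minimal sketch of the looped, parameterized version, assuming the tab-separated file from the question has the project code in the first column and the program name in the second (the connection string is a placeholder; adjust the indices to the real layout):

import csv
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string
cursor = conn.cursor()

passdata = """
    INSERT INTO project (project_code, program_name)
    VALUES (%s, %s)
"""

with open(clinicalname, 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        cursor.execute(passdata, (row[0], row[1]))

# Alternatively, when no per-row processing is needed, copy_from streams the file directly:
# cursor.copy_from(open(clinicalname), 'project', sep='\t',
#                  columns=('project_code', 'program_name'))

conn.commit()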
I am having troubles finding out if I can even do this. Basically, I have a csv file that looks like the following:
1111,804442232,1
1112,312908721,1
1113,A*2434,1
1114,A*512343128760987,1
1115,3512748,1
1116,1111,1
1117,1234,1
This is imported into an in-memory SQLite database for manipulation, and I will be importing multiple files into this database after some manipulation. SQLite lets me keep constraints on the tables and receive errors where needed, without writing extra functions just to check each constraint as I would if I used plain Python arrays. I want to do a few things, the first of which is to prepend field2 wherever its value matches an entry in field1.
For example, in the above data, field2 in entry 6 matches field1 in entry 1. In this case I would like to prepend field2 in entry 6 with '555'.
If this is not possible, I believe I could make do with a regex and just do this on every row with four digits in field2... though I have yet to successfully get regex working with Python/SQLite, as it always throws me an error.
I am working within Python using Sqlite3 to connect/manipulate my sqlite database.
EDIT: I am looking for a method to manipulate the resultant tables which reside in a sqlite database rather than manipulating just the csv data. The data above is just a simple representation of what is contained in the files I am working with. Would it be better to work with arrays containing the data from the csv files? These files have 10,000+ entries and about 20-30 columns.
If you must do it in SQLite, how about this:
First, get the column names of the table by running the following and parsing the result
def get_columns(table_name, cursor):
    cursor.execute('pragma table_info(%s)' % table_name)
    return [row[1] for row in cursor]

conn = sqlite3.connect('test.db')
columns = get_columns('test_table', conn.cursor())
For each of those columns, run the following update, which does the prepending:
def prepend(table, column, reference, prefix, cursor):
    query = '''
        UPDATE %s
        SET %s = '%s' || %s
        WHERE %s IN (SELECT %s FROM %s)
    ''' % (table, column, prefix, column, column, reference, table)
    cursor.execute(query)

reference = 'field1'
[prepend('test_table', column, reference, '555', conn.cursor())
 for column in columns
 if column != reference]
Note that this is expensive: O(n^2) for each column you want to do it for.
As per your edit and Nathan's answer, it might be better to simply work with Python's built-in data structures; you can always insert the result into SQLite afterwards.
10,000 entries is not really much, so it might not matter in the end. It all depends on your reason for requiring it to be done in SQLite (which we don't have much visibility of).
There is no need to use regular expressions for this: just throw the contents of the first column into a set and then iterate through the rows, updating the second field.
first_col_values = set(row[0] for row in rows)

for row in rows:
    if row[1] in first_col_values:
        row[1] = '555' + row[1]
So... I found the answer to my own question after a ridiculous amount of my own searching and trial and error. My unfamiliarity with SQL had me stumped as I was trying all kinds of crazy things. In the end... this was the simple type of solution I was looking for:
prefix="555"
cur.execute("UPDATE table SET field2 = %s || field2 WHERE field2 IN (SELECT field1 FROM table)"% (prefix))
I kept the small amount of Python in there, but what I was looking for was the SQL statement. Not sure why nobody else came up with something that simple =/. Unsatisfied with the answers so far, I had been searching far and wide for this simple line >_<.
In my Python application I have been using sqlite3.Row as the row factory to index results by name for a while with no issues. Recently I moved my application to a new server (no code changes), and I discovered this method of indexing is now unexpectedly failing on the new server given quite a specific condition. I cannot see any explanation for it.
The problem seems to occur on the new server when I have the DISTINCT keyword in my select query:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row
c = conn.cursor()
c.execute('create table test ([name] text)')
c.execute("insert into test values ('testing')")
conn.commit()
c.execute('select [name] from test')
row = c.fetchone()
print row['name'] # works fine on both machines
c.execute('select distinct [name] from test') # add distinct keyword
row = c.fetchone()
print row['name'] # fails on new server (no item with that key)
As you can see, I am able to sandbox this problem using an in-memory database, so the problem is nothing to do with my existing data. Both machines are Debian-based (old: Ubuntu 8.10, new: Debian 5.0.3) and both are running Python 2.5.2. I believe the sqlite3 module is a core part of the Python install, so I do not know how this subtle breakage can be occurring since the Python versions are identical.
Has anyone got any ideas, or seen anything like this before?
Thanks,
Chris
Try adding the line
print row.keys()
instead of "print row['name']" to see what column 0's actual name is in the second case (it's probably altered by the "DISTINCT" keyword).
Alternatively you can use row[0] in this case, but that's most likely not what you want. :)
I had a different but similar problem, but googling "indexerror no item with that key" led me to this question. In my case the issue was that different SQLite versions appear to handle row key names differently in row_factory = sqlite3.Row mode. In SQLite 3.24.0, a query like:
select table.col
from table
...creates a key in the row dictionary named col, but older versions appear to use the qualified key table.col. Providing an explicit alias, or not qualifying the column, is a workaround, e.g.:
select table.col as "col"
from table
Or:
select col
from table
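Applied to the snippet from the question, a sketch of the alias workaround (keeping the question's Python 2 prints) might look like:

c.execute('select distinct [name] as name from test')
row = c.fetchone()
print row.keys()   # check which key name the driver actually produced
print row['name']  # the explicit alias keeps the key stable across versions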
I want to load a CSV file that looks like this:
Acct. No.,1-15 Days,16-30 Days,31-60 Days,61-90 Days,91-120 Days,Beyond 120 Days
2314134101,898.89,8372.16,5584.23,7744.41,9846.54,2896.25
2414134128,5457.61,7488.26,9594.02,6234.78,273.7,2356.13
2513918869,2059.59,7578.59,9395.51,7159.15,5827.48,3041.62
1687950783,4846.85,8364.22,9892.55,7213.45,8815.33,7603.4
2764856043,5250.11,9946.49,8042.03,6058.64,9194.78,8296.2
2865446086,596.22,7670.04,8564.08,3263.85,9662.46,7027.22
,4725.99,1336.24,9356.03,1572.81,4942.11,6088.94
,8248.47,956.81,8713.06,2589.14,5316.68,1543.67
,538.22,1473.91,3292.09,6843.89,2687.07,9808.05
,9885.85,2730.72,6876,8024.47,1196.87,1655.29
But if you notice, some of the fields are incomplete. I'm thinking MySQL will just skip rows where the first column is missing. When I run the command:
LOAD DATA LOCAL INFILE 'test-long.csv' REPLACE INTO TABLE accounts
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(cf_535, cf_580, cf_568, cf_569, cf_571, cf_572);
And the MySQL output is:
Query OK, 41898 rows affected, 20948 warnings (0.78 sec)
Records: 20949 Deleted: 20949 Skipped: 0 Warnings: 20948
The number of lines is only 20,949, but MySQL reports it as 41,898 rows affected. Why so? Also, nothing really changed in the table, and I couldn't see what the generated warnings were about. I wanted to use LOAD DATA INFILE because it takes Python half a second to update each row, which translates to 2.77 hours for a file with 20,000+ records.
UPDATE: Modified the code to set auto-commit to 'False' and added a db.commit() statement:
# Tell MySQLdb to turn off auto-commit
db.autocommit(False)

# Set count to 1
count = 1
while count < len(contents):
    if contents[count][0] != '':
        cursor.execute("""
            UPDATE accounts SET cf_580 = %s, cf_568 = %s, cf_569 = %s, cf_571 = %s, cf_572 = %s
            WHERE cf_535 = %s""" % (contents[count][1], contents[count][2], contents[count][3],
                                    contents[count][4], contents[count][5], contents[count][0]))
    count += 1

try:
    db.commit()
except:
    db.rollback()
You have basically 3 issues here, in reverse order:
Are you doing your Python updates as individual statements? You probably want to surround them all with a single begin transaction/commit, as in the sketch at the end of this answer. 20,000 commits could easily take hours.
Your import statement defines 6 fields, but the CSV has 7 fields. That would explain the double row count: every line of input results in 2 rows in the database, the 2nd one with fields 2-6 null.
Incomplete rows will be inserted with null or default values for the missing columns. This may not be what you want with those malformed rows.
If your Python program can't perform fast enough even with a single transaction, you should at least have it edit/clean the data file before importing. If Acct. No. is the primary key, as seems reasonable, inserting rows with a blank key will either cause the whole import to fail or, if auto-number is on, cause bogus data to be imported.
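On the first point, a hedged sketch of batching the updates into one transaction with parameterized queries, assuming contents is the parsed CSV from the question's update (header row at index 0):

db.autocommit(False)
cursor = db.cursor()

update_sql = """
    UPDATE accounts
    SET cf_580 = %s, cf_568 = %s, cf_569 = %s, cf_571 = %s, cf_572 = %s
    WHERE cf_535 = %s
"""

# Keep only the rows that actually have an account number.
params = [(row[1], row[2], row[3], row[4], row[5], row[0])
          for row in contents[1:] if row[0] != '']

try:
    cursor.executemany(update_sql, params)
    db.commit()  # one commit for the whole batch
except Exception:
    db.rollback()
    raise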
If you use the REPLACE keyword in LOAD DATA, then the number after "Deleted:" shows how many rows were actually replaced.