Read a CSV to insert data into PostgreSQL with Python

I want to read a CSV file and insert its data into PostgreSQL with Python, but I get this error:
cursor.execute(passdata)
psycopg2.IntegrityError: duplicate key value violates unique constraint "prk_constraint_project"
DETAIL: Key (project_code)=(%s) already exists.
My code is:
clinicalCSVINSERT = open(clinicalname, 'r')
reader = csv.reader(clinicalCSVINSERT, delimiter='\t')
passdata = "INSERT INTO project (project_code, program_name ) VALUES ('%s', '%s')";
cursor.execute(passdata)
conn.commit()
What does this error mean?
Is it possible to have a working script?

The immediate problem with your code is that it sends the literal text %s as the value. Since you probably ran it more than once, that unique column already contains a literal %s, hence the exception.
You need to pass the values, wrapped in an iterable, as the second argument to the execute method; the %s is just a value placeholder.
passdata = """
INSERT INTO project (project_code, program_name )
VALUES (%s, %s)
"""
cursor.execute(passdata, (the_project_code, the_program_name))
Do not quote the %s. Psycopg will do it if necessary.
As your code has no loop, it will insert at most one row from the CSV. There are several patterns for inserting the whole file; if your requirements allow, just use copy_from, which is simpler and faster.
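A runnable sketch of that per-row loop, using an in-memory SQLite database as a stand-in so it works without a Postgres server (with psycopg2 the placeholder is %s rather than ?; the table, column names, and tab delimiter are taken from the question):

```python
import csv
import io
import sqlite3

# Stand-in for the question's tab-delimited clinical file.
sample = "P001\tOncology\nP002\tCardiology\n"

# In-memory SQLite stands in for the Postgres connection here;
# with psycopg2 the placeholder would be %s instead of ?.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE project (project_code TEXT PRIMARY KEY, program_name TEXT)"
)

passdata = "INSERT INTO project (project_code, program_name) VALUES (?, ?)"

reader = csv.reader(io.StringIO(sample), delimiter='\t')
for row in reader:
    # One parameterized execute per CSV row; the driver does the quoting.
    cursor.execute(passdata, (row[0], row[1]))
conn.commit()

cursor.execute("SELECT COUNT(*) FROM project")
print(cursor.fetchone()[0])  # → 2
```

For the whole-file case, psycopg2's cursor.copy_from(file_object, 'project', sep='\t', columns=('project_code', 'program_name')) loads the file in one call.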

Related

Using Python - How can I parse CSV list of both integers and strings and add to SQL table through Insert Statement?

I am automating a task through Python that will run an SQL statement to insert into an existing table in a DB.
My CSV headers look as such:
ID,ACCOUNTID,CATEGORY,SUBCATEGORY,CREATION_DATE,CREATED_BY,REMARK,ISIMPORTANT,TYPE,ENTITY_TYPE
My values:
seq_addnoteid.nextval,123456,TEST,ADMN_TEST,sysdate,ME,This is a test,Y,1,A
NOTE: Currently seq_addnoteid works from the DB, but in my code I added a small snippet to get the max ID, and each iteration increases it by one.
Sysdate could also be passed in the format '19-MAY-22'.
If I were to run this from the DB, it would work:
insert into notes values(seq_addnoteid.nextval,'123456','TEST','ADMN_TEST',sysdate,'ME','This is a test','Y',1,'A');
# Snippet to get the current max ID
cursor.execute("SELECT MAX(ID) from NOTES")
max = cursor.fetchone()
max = int(max[0])

with open('sample.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    query = 'INSERT INTO NOTES({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    cursor = conn.cursor()
    for data in reader:
        cursor.execute(query, data)
    conn.commit()
print("Records inserted successfully")
cursor.close()
conn.close()
Currently I'm getting Oracle-Error-Message: ORA-01036: illegal variable name/number, and I think it's because of my query.format line. I'm looking for help getting this code to handle the data types properly.
Thanks!
Try printing your query before you execute it. I think you'll find that it's printing this:
INSERT INTO NOTES(ID,ACCOUNTID,CATEGORY,SUBCATEGORY,CREATION_DATE,CREATED_BY,REMARK,ISIMPORTANT,TYPE,ENTITY_TYPE)
values(seq_addnoteid.nextval,123456,TEST,ADMN_TEST,sysdate,ME,This is a test,Y,1,A);
Which will also give you a ORA-01036 if you try to run it manually.
The problem is that you want some of your column values to be literal values and some of them to be strings escaped in single quotes, and your code doesn't do that. I don't think there's an easy way to do it with ','.join(), so you'll either need to modify your CSVs to quote the strings, like:
seq_addnoteid.nextval,"'123456'","'TEST'","'ADMN_TEST'",sysdate,"'ME'","'This is a test'","'Y'",1,"'A'"
Or modify your query.format to add the quotes around the parameters that you want to treat as strings:
query.format(','.join(columns), "?,'?','?','?',?,'?','?','?',?,'?'")
As the commenters mentioned, pandas does handle this all very nicely.
EDIT: I see what you're saying. I'm not sure pandas will help with the literal functions you want to pass to the insert. But yes, you should be able to change your CSV and then do:
query.format(','.join(columns) + ',ID,CREATION_DATE', "'?','?','?','?','?','?',?,'?',seq_addnoteid.nextval,sysdate")
As a side note, a lot of people do this sort of thing on the database side in a BEFORE INSERT trigger, e.g.:
create or replace trigger NOTES_INS_TRG
before insert on NOTES
for each row
begin
  :NEW.ID := seq_addnoteid.nextval;
  :NEW.CREATION_DATE := sysdate;
end;
/
Then you could leave those columns out of your insert entirely.
Edit again:
I'm not sure you can use ? for bind/substitution variables in cx_Oracle (see the documentation). So where your raw query is currently:
INSERT INTO NOTES(ID,CREATION_DATE,ACCOUNTID,CATEGORY,SUBCATEGORY,CREATED_BY,REMARK,ISIMPORTANT,TYPE,ENTITY_TYPE)
values (seq_addnoteid.nextval,sysdate,'?','?','?','?','?','?',?,'?')
You'd need something like:
INSERT INTO NOTES(ID,CREATION_DATE,ACCOUNTID,CATEGORY,SUBCATEGORY,CREATED_BY,REMARK,ISIMPORTANT,TYPE,ENTITY_TYPE)
values (seq_addnoteid.nextval,sysdate,:1,:2,:3,:4,:5,:6,:7,:8)
We can probably do that by modifying the format string again to generate some bind variables:
query.format('ID,CREATION_DATE,' + ','.join(columns),
             'seq_addnoteid.nextval,sysdate,' + ','.join([':' + c for c in columns]))
Again, try printing the query before executing it to make sure the column names and values are lining up correctly.
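Since the whole difficulty is getting the generated SQL right, it helps to build and print the string on its own first; this snippet (column names taken from the question's CSV header, minus the two defaulted columns) needs no Oracle connection:

```python
# Build the INSERT the way the answer suggests: the literal
# sequence/sysdate defaults first, then one named bind variable
# per remaining CSV column.
columns = ['ACCOUNTID', 'CATEGORY', 'SUBCATEGORY', 'CREATED_BY',
           'REMARK', 'ISIMPORTANT', 'TYPE', 'ENTITY_TYPE']

query = 'INSERT INTO NOTES({0}) values ({1})'
query = query.format(
    'ID,CREATION_DATE,' + ','.join(columns),
    'seq_addnoteid.nextval,sysdate,' + ','.join(':' + c for c in columns),
)
print(query)
```

The printed statement should have exactly one bind variable per remaining CSV field, lined up with its column name; only once that checks out is it worth running it through cursor.execute.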

“Incorrect string value” when trying to insert String into MySQL via Python and Text file

What is causing this incorrect string value? I have read many questions and answers; here are my results. I am still getting the same error after trying the suggested fixes.
I am getting the following error:
ERROR 1366 (HY000) at line 34373: Incorrect string value: '\xEF\xBB\xBF<?x...' for column 'change' at row 1
When I try to enter the following into SQL:
Line number 34373: INSERT INTO gitlog_changes VALUES ('123456', 'NhincCommonEntity.xsd', '<?xml version=\"1.0\" encoding=\"UTF-8\"?>');
My table looks like this:
DROP TABLE IF EXISTS `gitlog_changes`;
CREATE TABLE `gitlog_changes` (
`hashID` varchar(40) NOT NULL,
`filename` varchar(450) DEFAULT NULL,
`change` mediumtext
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I read many answers that say change the charset to UTF8 [1][2][3][4]. So I execute this:
alter table yourTableName DEFAULT CHARACTER SET utf8;
I continue to get the same error. Then I alter table yourTableName DEFAULT CHARACTER SET utf8mb4_general_ci;
Still same error occurs.
I also attempt to read in a file from python and do direct commits to the database. From this answer[1]. I get a warning instead of an error.
I insert the following code into my python script:
cursor.execute("SET NAMES 'utf8'")
cursor.execute("SET CHARACTER SET utf8")
Python script:
def insert_changes(modList):
    db = MySQLdb.connect("localhost", "user", "password", "table")
    cursor = db.cursor()
    cursor.execute("SET NAMES 'utf8'")
    cursor.execute("SET CHARACTER SET utf8")
    for mod in modList:
        hashID = mod["hashID"]
        fileName = mod["fileName"]
        change = mod["change"]
        cursor.execute("INSERT INTO gitlog_changes VALUES (%s, %s, %s)", (hashID, fileName, change))
    # disconnect from server
    db.commit()
    db.close()
The warning I get here, raised at the following line, is Warning: Invalid utf8 character string: '\xEF\xBB\xBF<?x...'
cursor.execute("INSERT INTO gitlog_changes VALUES (%s, %s, %s)" , (hashID, fileName, change))
The string you're trying to insert into the database has an unusual character at its beginning. I just copied your string:
In [1]: a = '<'
In [2]: a
Out[2]: '\xef\xbb\xbf<'
You need to get rid of those characters. This is a good post explaining what these characters are.
Text you're trying to insert contains UTF-8 BOM in the beginning (that's the \xEF\xBB\xBF in your error).
Please check this answer to see how to convert from UTF-8 with BOM into UTF-8.
As stated in the MySQL docs, "MySQL uses no BOM for UTF-8 values."
So the only solution is to strip it from the string in your Python code.
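A minimal illustration of that fix: Python's utf-8-sig codec strips the BOM that a plain utf-8 decode leaves in place (the byte string below reproduces the value from the error message):

```python
# The three leading bytes are the UTF-8 BOM from the error message.
raw = b'\xef\xbb\xbf<?xml version="1.0" encoding="UTF-8"?>'

with_bom = raw.decode('utf-8')      # keeps the BOM as U+FEFF
clean = raw.decode('utf-8-sig')     # strips it

print(with_bom.startswith('\ufeff'))  # → True
print(clean.startswith('<?xml'))      # → True
```

When reading the file itself rather than individual strings, open(path, encoding='utf-8-sig') has the same effect.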

SQLite3 Columns Are Not Unique

I'm inserting data from some CSV files into my SQLite3 database with a Python script I wrote. When I run the script, it inserts the first row into the database, but gives this error when trying to insert the second:
sqlite3.IntegrityError: columns column_name1, column_name2 are not unique.
It is true that the values in column_name1 and column_name2 are the same in the first two rows of the CSV file. But this seems a bit strange to me, because reading about this error indicates a uniqueness constraint on one or more of the database's columns. I checked the database details using SQLite Expert Personal, and it does not show any uniqueness constraints on the current table. Also, none of the fields I am entering specify the primary key; the database seems to assign those automatically. Any thoughts on what could be causing this error? Thanks.
import sqlite3
import csv

if __name__ == '__main__':
    conn = sqlite3.connect('ts_database.sqlite3')
    c = conn.cursor()
    fileName = "file_name.csv"
    f = open(fileName)
    csv_f = csv.reader(f)
    for row in csv_f:
        command = "INSERT INTO table_name(column_name1, column_name2, column_name3)"
        command += " VALUES (%s, '%s', %s);" % (row[0], row[1], row[2])
        print command
        c.execute(command)
    conn.commit()
    f.close()
If SQLite is reporting an IntegrityError, it's very likely that there really is a PRIMARY KEY or UNIQUE constraint on those two columns and that you are mistaken when you state there is not. Ensure that you're really looking at the same instance of the database.
Also, do not write your SQL statement using string interpolation. It's dangerous and also difficult to get correct (as you probably know, considering you have single quotes on only one of the fields). Using parameterized statements in SQLite is very, very simple.
The error may be due to duplicate column names in the INSERT INTO statement. I am guessing it is a typo and you meant column_name3 in the INSERT INTO statement.
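A runnable sketch of the parameterized version, using an in-memory database and hypothetical table/column names mirroring the question:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
# Hypothetical table mirroring the question's column names.
c.execute("CREATE TABLE table_name (column_name1, column_name2, column_name3)")

# In-memory stand-in for file_name.csv.
csv_f = csv.reader(io.StringIO("1,alpha,10\n2,beta,20\n"))

# One ? placeholder per column; sqlite3 quotes the values itself,
# so no manual quoting or % interpolation is needed.
command = ("INSERT INTO table_name (column_name1, column_name2, column_name3) "
           "VALUES (?, ?, ?)")
for row in csv_f:
    c.execute(command, row)
conn.commit()

c.execute("SELECT COUNT(*) FROM table_name")
print(c.fetchone()[0])  # → 2
```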

python, mysql, inserting string into table, error 1054

I am having this problem:
OperationalError: (1054, "Unknown column 'Ellie' in 'field list'")
With the code below, I'm trying to insert data from JSON into a MySQL database. The problem happens whenever I try to insert a string, in this case "Ellie". I think this is something to do with string interpolation, but I can't get it to work despite trying some other solutions I have seen here.
CREATE TABLE
con = MySQLdb.connect('localhost','root','','tweetsdb01')
cursor = con.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS User(user_id BIGINT NOT NULL PRIMARY KEY, username varchar(25) NOT NULL,user varchar(25) NOT NULL) CHARACTER SET utf8 COLLATE utf8_unicode_ci ENGINE=InnoDB")
con.commit()
INSERT INTO
def populate_user(a, b, c):
    con = MySQLdb.connect('localhost','root','','tweetsdb01')
    cursor = con.cursor()
    cursor.execute("INSERT INTO User(user_id,username,user) VALUES(%s,%s,%s)" % (a, b, c))
    con.commit()
    cursor.close()
READ FILE - this calls the populate function above
def read(file):
    json_data = open(file)
    tweets = []
    for i in range(10):
        tweet = json.loads(json_data.readline())
        populate_user(tweet['from_user_id'], tweet['from_user_name'], tweet['from_user'])
Use parametrized SQL:
cursor.execute("INSERT INTO User(user_id,username,user) VALUES (%s,%s,%s)", (a,b,c))
(Notice the values (a,b,c) are passed to the function execute as a second argument, not as part of the first argument through string interpolation). MySQLdb will properly quote the arguments for you.
PS. As Vajk Hermecz notes, the problem occurs because the string 'Ellie' is not being properly quoted.
When you do the string interpolation with "(%s,)" % (a,) you get
(Ellie,) whereas what you really want is ('Ellie',). But don't bother doing the quoting yourself. It is safer and easier to use parametrized SQL.
Your problem is that you are adding the values into the query without any escaping, so the statement is broken. You could do something like:
cursor.execute("INSERT INTO User(user_id,username,user) VALUES(\"%s\",\"%s\",\"%s\")" % (a, b, c))
But that would just introduce SQL injection into your code.
NEVER construct SQL statements by concatenating query and data. Use parametrized queries instead.
The proper solution here would be:
cursor.execute("INSERT INTO User(user_id,username,user) VALUES(%s,%s,%s)", (a,b,c))
So, the problem with your code was that you used the % operator, which does string formatting, so cursor.execute received only a single argument. The proper solution is that instead of doing the string formatting yourself, you give the query to cursor.execute as the first parameter and the tuple of arguments as the second.

python execute many with "on duplicate key update"?

I am trying to use executemany in Python with ON DUPLICATE KEY UPDATE, with the following script:
# data from a previous query (returns 4 integers in each row)
rows = first_cursor.fetchall()
query="""
INSERT INTO data (a, b, c)
VALUES (%s,%s,%s) ON DUPLICATE KEY UPDATE a=%s
"""
second_cursor.executemany(query,rows)
I'm getting this error:
File "/usr/lib/pymodules/python2.6/MySQLdb/cursors.py", line 212, in executemany
self.errorhandler(self, TypeError, msg)
File "/usr/lib/pymodules/python2.6/MySQLdb/connections.py", line 35, in defaulterrorhandler
raise errorclass, errorvalue
TypeError: not all arguments converted during string formatting
Is this even possible without creating my own loop?
This is a bug in MySQLdb due to the regex that MySQLdb uses to parse INSERT statements:
In /usr/lib/pymodules/python2.7/MySQLdb/cursors.py:
restr = (r"\svalues\s*"
r"(\(((?<!\\)'[^\)]*?\)[^\)]*(?<!\\)?'"
r"|[^\(\)]|"
r"(?:\([^\)]*\))"
r")+\))")
insert_values= re.compile(restr)
Although there have been numerous bug reports about this problem that have been closed as fixed, I was able to reproduce the error in MySQLdb version 1.2.3. (Note the latest version of MySQLdb at the moment is 1.2.4b4.)
Maybe this bug is fixable, I don't really know. But I think it is just the tip of the iceberg -- it points to much more trouble lurking just a little deeper. You could have for instance an INSERT ... SELECT statement with nested SELECT statements with WHERE conditions and parameters sprinkled all about... Making the regex more and more complicated to handle these cases seems to me like a losing battle.
You could use oursql; it does not use regexes or string formatting. It passes parametrized queries and arguments to the server separately.
When you write SQL like the following:
sql = 'insert into A (id, last_date, count) values(%s, %s, %s) on duplicate key update last_date=%s, count=count+%s'
you will get the error TypeError: not all arguments converted during string formatting.
So when you use ON DUPLICATE KEY UPDATE in Python, you need to write the SQL like this:
sql = 'insert into A (id, last_date, count) values(%s, %s, %s) on duplicate key update last_date=values(last_date), count=count+values(count)'
Found at https://hardforum.com/threads/python-mysql-not-all-arguments-converted-during-string-formatting.1367039/:
on duplicate key update col1=VALUES(col1), col2=VALUES(col2)
It is a bug in MySQLdb, as noted above; slightly change the SQL and it works:
insert into tb_name(col1, col2) select 1,2 on duplicate key update col1=1
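For comparison, here is the same one-parameter-set-per-row pattern in a runnable form, using SQLite's upsert spelling (ON CONFLICT ... DO UPDATE with excluded, the analogue of MySQL's ON DUPLICATE KEY UPDATE ... VALUES()) since it can be tried without a MySQL server:

```python
import sqlite3

# SQLite stands in for MySQL here; the key point is the same as in the
# answers above: reference the incoming row's values in the UPDATE clause
# (excluded.b, like MySQL's VALUES(b)) instead of adding an extra
# placeholder, so executemany sees exactly one parameter set per row.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE data (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER)")

query = """
    INSERT INTO data (a, b, c) VALUES (?, ?, ?)
    ON CONFLICT(a) DO UPDATE SET b = excluded.b
"""
rows = [(1, 10, 100), (2, 20, 200), (1, 11, 111)]  # last row hits key 1 again
cur.executemany(query, rows)
conn.commit()

cur.execute("SELECT a, b, c FROM data ORDER BY a")
print(cur.fetchall())  # → [(1, 11, 100), (2, 20, 200)]
```

On the conflicting third row, only b is updated (to 11) while c keeps its original value, matching the update clause.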
