I have done quite a bit of googling on this error and have boiled it down to the fact that the databases I am working with are in different encodings.
The AIX server I am working with is running
psql 8.2.4
server_encoding | LATIN1 | | Client Connection Defaults / Locale and Formatting | Sets the server (database) character set encoding.
The windows 2008 R2 server I am working with is running
psql (9.3.4)
CREATE DATABASE postgres
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'English_Australia.1252'
       LC_CTYPE = 'English_Australia.1252'
       CONNECTION LIMIT = -1;

COMMENT ON DATABASE postgres
  IS 'default administrative connection database';
Now when I try to execute my Python script below, I get this error:
Traceback (most recent call last):
  File "datamain.py", line 39, in <module>
    sys.exit(main())
  File "datamain.py", line 33, in main
    write_file_to_table("cms_jobdef.txt", "cms_jobdef", con_S104838)
  File "datamain.py", line 21, in write_file_to_table
    cur.copy_from(f, table, ",")
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xa0
CONTEXT: COPY cms_jobdef, line 15209
Here is my script
import psycopg2
import StringIO
import sys
import pdb

def connect_db(db, usr, pw, hst, prt):
    conn = psycopg2.connect(database=db, user=usr,
                            password=pw, host=hst, port=prt)
    return conn

def write_table_to_file(file, table, connection):
    f = open(file, "w")
    cur = connection.cursor()
    cur.copy_to(f, table, ",")
    f.close()
    cur.close()

def write_file_to_table(file, table, connection):
    f = open(file, "r")
    cur = connection.cursor()
    cur.copy_from(f, table, ",")
    f.close()
    cur.close()

def main():
    login = open('login.txt', 'r')
    con_tctmsv64 = connect_db("x", "y",
                              login.readline().strip(),
                              "d.domain", "c")
    con_S104838 = connect_db("x", "y", "z", "a", "b")
    try:
        write_table_to_file("cms_jobdef.txt", "cms_jobdef", con_tctmsv64)
        write_file_to_table("cms_jobdef.txt", "cms_jobdef", con_S104838)
    finally:
        con_tctmsv64.close()
        con_S104838.close()

if __name__ == "__main__":
    sys.exit(main())
I have removed some sensitive data.
So I'm not sure how I can proceed. As far as I can tell, the copy_expert method might help by exporting with an explicit UTF8 encoding, but because the server I am pulling the data from is running 8.2.4, I don't think it supports an encoding option on COPY.
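What I had in mind was something like the line below, though since the explicit ENCODING option on COPY only appeared in PostgreSQL 9.1, I expect the 8.2.4 server to reject it (a sketch only):

# Sketch only: COPY ... (ENCODING ...) is 9.1+ syntax, so this would
# likely fail against the 8.2.4 server I am exporting from.
cur.copy_expert("COPY cms_jobdef TO STDOUT (ENCODING 'UTF8')", f)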
I think my best shot is to try to reinstall the Postgres database with an encoding of LATIN1 on the Windows server. When I try to do that I get the below error.
So I'm quite stuck, any help would be greatly appreciated!
Update: I installed the Postgres DB on the Windows server with LATIN1 encoding by changing the default locale to 'C'. This however gave me the below error, and it doesn't seem like a successful/correct approach.
I have also tried encoding the files in BINARY using the PostgreSQL COPY command:
def write_table_to_file(file, table, connection):
    f = open(file, "w")
    cur = connection.cursor()
    #cur.copy_to(f, table, ",")
    cur.copy_expert("COPY cms_jobdef TO STDOUT WITH BINARY", f)
    f.close()
    cur.close()

def write_file_to_table(file, table, connection):
    f = open(file, "r")
    cur = connection.cursor()
    #cur.copy_from(f, table)
    cur.copy_expert("COPY cms_jobdef FROM STDOUT WITH BINARY", f)
    f.close()
    cur.close()
Still no luck; I get the same error:
DataError: invalid byte sequence for encoding "UTF8": 0xa0
CONTEXT: COPY cms_jobdef, line 15209, column descript
In relation to Phil's answer, I have tried this approach, still with no success.
import psycopg2
import StringIO
import sys
import pdb
import codecs

def connect_db(db, usr, pw, hst, prt):
    conn = psycopg2.connect(database=db, user=usr,
                            password=pw, host=hst, port=prt)
    return conn

def write_table_to_file(file, table, connection):
    f = open(file, "w")
    #fx = codecs.EncodedFile(f, "LATIN1", "UTF8")
    cur = connection.cursor()
    cur.execute("SHOW client_encoding;")
    print cur.fetchone()
    cur.copy_to(f, table)
    #cur.copy_expert("COPY cms_jobdef TO STDOUT WITH BINARY", f)
    f.close()
    cur.close()

def write_file_to_table(file, table, connection):
    f = open(file, "r")
    cur = connection.cursor()
    cur.execute("SET CLIENT_ENCODING TO 'LATIN1';")
    cur.execute("SHOW client_encoding;")
    print cur.fetchone()
    cur.copy_from(f, table)
    #cur.copy_expert("COPY cms_jobdef FROM STDOUT WITH BINARY", f)
    f.close()
    cur.close()

def main():
    login = open('login.txt', 'r')
    con_tctmsv64 = connect_db("x", "y",
                              login.readline().strip(),
                              "ctmtest1.int.corp.sun", "5436")
    con_S104838 = connect_db("x", "y", "z", "t", "5432")
    try:
        write_table_to_file("cms_jobdef.txt", "cms_jobdef", con_tctmsv64)
        write_file_to_table("cms_jobdef.txt", "cms_jobdef", con_S104838)
    finally:
        con_tctmsv64.close()
        con_S104838.close()

if __name__ == "__main__":
    sys.exit(main())
output
In [4]: %run datamain.py
('sql_ascii',)
('LATIN1',)
In [5]:
This completes successfully, but when I run a
select * from cms_jobdef;
nothing is in the new database.
I have even tried converting the file from LATIN1 to UTF8. Still no luck.
The weird thing is that when I do this process manually, using only the Postgres COPY function, it works. I have no idea why. Once again, any help would be greatly appreciated.
Turns out there are a few options to solve this problem.
The option to change the client's encoding, suggested by Phil, does work.
cur.execute("SET CLIENT_ENCODING TO 'LATIN1';")
Another option is to convert the data on the fly. I used a Python module called codecs to do this.
f = open(file, "w")
fx = codecs.EncodedFile(f,"LATIN1", "UTF8")
cur = connection.cursor()
cur.execute("SHOW client_encoding;")
print cur.fetchone()
cur.copy_to(fx, table)
The key line being
fx = codecs.EncodedFile(f,"LATIN1", "UTF8")
My main problem was that I was not committing my changes to the database! Silly me :)
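For completeness, here is a sketch of the corrected helper, combining Phil's client_encoding fix with the commit I was missing:

def write_file_to_table(file, table, connection):
    f = open(file, "r")
    cur = connection.cursor()
    cur.execute("SET CLIENT_ENCODING TO 'LATIN1';")
    cur.copy_from(f, table)
    f.close()
    cur.close()
    connection.commit()  # without this the COPY is rolled back when the connection closes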
I'm in the process of migrating from an SQL_ASCII database to a UTF8 database, and ran into the same problem. Based on this answer, I simply added this statement to the start of my import script:
set client_encoding to 'latin1'
and everything appears to have imported correctly.
I'm working on a simple script to load a dump file into an AWS RDS Oracle install, but I'm having trouble getting the FILE_TYPE with connection.gettype.
Here is my code:
from __future__ import print_function
import cx_Oracle

size_limit = 1024
print(cx_Oracle.clientversion())
connection = cx_Oracle.connect("XXX", "XX", "XXXX:XXX/XXX")
cursor = connection.cursor()
try:
    typeObj = connection.gettype("VARCHAR")
    OTYPE = connection.gettype("UTL_FILE.FILE_TYPE")
    NFILE = cursor.callfunc('UTL_FILE.FOPEN', returnType=OTYPE,
                            parameters=['DATA_PUMP_DIR', 'BIOMGRDB_DEV-test.dmp', 'wb', size_limit])
    f = open("BIOMGRDB_DEV.dmp", "rb")
    try:
        byte = f.read(size_limit)
        while byte != "":
            # Do stuff with byte.
            print("Reading")
            print(byte)
            cursor.callfunc('UTL_FILE.PUT_RAW', parameters=[NFILE, byte])
            byte = f.read(size_limit)
    finally:
        cursor.callfunc('UTL_FILE.FCLOSE', parameters=[NFILE])
        f.close()
finally:
    print("FINALLY")
    cursor.close()
    connection.close()
The error is as follows, no matter what type I use:
cx_Oracle.DatabaseError: OCI-22303: type ""."UTL_FILE.FILE_TYPE" not found
It looks like it is trying to find the type inside a package.
I ran the following PL/SQL code just to ensure I have privileges to write files:
declare
    fHandle UTL_FILE.FILE_TYPE;
begin
    fHandle := UTL_FILE.FOPEN('DATA_PUMP_DIR', 'test_file', 'w');
    UTL_FILE.PUT(fHandle, 'This is the first line');
    UTL_FILE.PUT(fHandle, 'This is the second line');
    UTL_FILE.PUT_LINE(fHandle, 'This is the third line');
    UTL_FILE.FCLOSE(fHandle);
EXCEPTION
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Exception: SQLCODE=' || SQLCODE || ' SQLERRM=' || SQLERRM);
        RAISE;
end;
And it works well.
Here is my environment configuration:
Python 2.7.13
Instant client (19, 3, 0, 0, 0)
cx_Oracle 7.1.3
Oracle 12c
I hope someone can help me.
Update
I created a new type as follows:
CREATE OR REPLACE TYPE TYPE1 AS OBJECT
( SOMETHING VARCHAR2(100)
)
And if I use it as follows:
OTYPE = connection.gettype("TYPE1")
The gettype call now works, but the FOPEN call then fails with the following error:
NFILE = cursor.callfunc('UTL_FILE.FOPEN',returnType=OTYPE,parameters=['DATA_PUMP_DIR','BIOMGRDB_DEV-test.dmp','wb',size_limit])
cx_Oracle.DatabaseError: ORA-06550: line 1, column 13:
PLS-00382: expression is of wrong type
ORA-06550: line 1, column 7:
Now, I have a question: How can I use the FILE_TYPE object in cx_Oracle?
I also tried to create a synonym and it does not work. Any suggestion?
This works exactly as expected:
import cx_Oracle

conn = cx_Oracle.connect("user/pw@host/service")
fileType = conn.gettype("UTL_FILE.FILE_TYPE")
cursor = conn.cursor()
sizeLimit = 1024
result = cursor.callfunc("UTL_FILE.FOPEN", fileType,
                         ["DATA_PUMP_DIR", "test_file", "wb", sizeLimit])
print("Result:", result)
Please try that script and advise the error and traceback you get. Also please advise on the exact version of the database you are using. There are limitations for using PL/SQL types like records (which is what UTL_FILE.FILE_TYPE is).
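If that FOPEN call succeeds, the rest of the transfer follows the same pattern; here is a sketch (untested against RDS) that assumes the fileType handle returned above and uses callproc, since PUT_RAW and FCLOSE are procedures rather than functions. "local.dmp" is a placeholder for your dump file:

# Sketch: stream a local file into the server-side file opened above.
with open("local.dmp", "rb") as f:
    chunk = f.read(sizeLimit)
    while chunk:
        cursor.callproc("UTL_FILE.PUT_RAW", [result, chunk])
        chunk = f.read(sizeLimit)
cursor.callproc("UTL_FILE.FCLOSE", [result])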
I have a table with about 200 columns. I need to take a dump of the daily transaction data for ETL purposes. It's a MySQL DB. I tried that with Python, both using a pandas dataframe and the basic write-to-CSV-file method. I even looked for the same functionality using a shell script; I saw one such script for an Oracle database using sqlplus. Below are my Python attempts with the two approaches:
Using Pandas:
import MySQLdb as mdb
import pandas as pd

host = ""
user = ''
pass_ = ''
db = ''
query = 'SELECT * FROM TABLE1'

conn = mdb.connect(host=host,
                   user=user, passwd=pass_,
                   db=db)
df = pd.read_sql(query, con=conn)
df.to_csv('resume_bank.csv', sep=',')
Using a basic Python file write:
import MySQLdb
import csv
import datetime

currentDate = datetime.datetime.now().date()
host = ""
user = ''
pass_ = ''
db = ''
table = ''

con = MySQLdb.connect(user=user, passwd=pass_, host=host, db=db, charset='utf8')
cursor = con.cursor()
query = "SELECT * FROM %s;" % table
cursor.execute(query)

with open('Data_on_%s.csv' % currentDate, 'w') as f:
    writer = csv.writer(f)
    for row in cursor.fetchall():
        writer.writerow(row)

print('Done')
The table has about 300,000 records, and both Python approaches take too much time.
There's also an encoding issue: the DB result set has some Latin-1 characters, for which I get errors like: UnicodeEncodeError: 'ascii' codec can't encode character '\x96' in position 1078: ordinal not in range(128).
I need to save the CSV in a Unicode-safe encoding. Can you please help me with the best approach to perform this task?
A Unix-based or Python-based solution will work for me. This script needs to be run daily to dump daily data.
You can achieve that just by leveraging MySQL. For example:
SELECT * FROM your_table WHERE ...
INTO OUTFILE 'your_file.csv'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n';
If you need to schedule the query, put it into a file (e.g., csv_dump.sql) and create a cron task like this one:
00 00 * * * mysql -h your_host -u user -ppassword < /foo/bar/csv_dump.sql
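If you would rather drive it from Python (the question mentions a daily script), the same statement can be executed over a MySQLdb connection; a sketch, with placeholder credentials and a hypothetical server-side path. Note that INTO OUTFILE writes on the database server, and it also accepts a CHARACTER SET clause, which may help with the encoding issue:

import MySQLdb

# Placeholder credentials; the target file must not already exist
# on the server.
con = MySQLdb.connect(host="h", user="u", passwd="p", db="d")
cur = con.cursor()
cur.execute("""
    SELECT * FROM your_table
    INTO OUTFILE '/tmp/your_file.csv'
    CHARACTER SET utf8
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
""")
cur.close()
con.close()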
For strings this will use the default character encoding which happens to be ASCII, and this fails when you have non-ASCII characters. You want unicode instead of str.
rows = cursor.fetchall()
f = open('Data_on_%s.csv' % currentDate, 'w')
writer = csv.writer(f)
for row in rows:
    writer.writerow([unicode(s).encode("utf-8") for s in row])
f.close()
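If moving to Python 3 is an option, the manual encode step disappears, since open() takes the encoding directly; a minimal sketch:

# Python 3: the csv module writes str, and open() handles the encoding.
with open('Data_on_%s.csv' % currentDate, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerows(cursor.fetchall())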
You can use mysqldump for this task.
mysqldump -u username -p --tab -T/path/to/directory dbname table_name --fields-terminated-by=','
The arguments are as follows:
-u username for the username
-p to indicate that a password should be used
-ppassword to give the password via command line
--tab Produce tab-separated data files
For more command line switches see https://dev.mysql.com/doc/refman/5.5/en/mysqldump.html
To run it on a regular basis, create a cron task like written in the other answers.
I've seen some similar questions about this on StackOverflow but haven't found an answer that works; see http://stackoverflow.com/questions/4408714/execute-sql-file-with-python-mysqldb and http://stackoverflow.com/questions/10593876/execute-sql-file-in-python-with-mysqldb?lq=1
Here is my code:
import pymysql
import sys
import access  # holds credentials
import mysql_connector  # connects to MySQL, is fully functional

class CreateDB(object):
    def __init__(self):
        self.cursor = None
        self.conn = pymysql.connect(host, user, passwd)

    def create_database(self):
        try:
            with self.conn.cursor() as cursor:
                for line in open('file.sql'):
                    cursor.execute(line)
                self.conn.commit()
        except Warning as warn:
            f = open(access.Credentials().error_log, 'a')
            f.write('Warning: %s ' % warn + '\nStop.\n')
            sys.exit()

create = CreateDB()
create.create_database()
When I run my script I get the following error:
pymysql.err.InternalError: (1065, 'Query was empty')
My .sql file is successfully loaded when I import directly through MySQL and there is a single query on each line of the file. Does anybody have a solution for this? I have followed the suggestions on other posts but have not had any success.
Take care of empty lines at the end of the file:
if line.strip(): cursor.execute(line)
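If the file ever contains statements that span several lines, splitting on the semicolon and skipping blanks is a bit more robust; a sketch of that variation:

# Sketch: execute each non-empty statement, tolerating blank lines
# and multi-line statements. Assumes no literal ';' inside string values.
with open('file.sql') as f:
    for statement in f.read().split(';'):
        if statement.strip():
            cursor.execute(statement)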
You can execute all the SQL in the file at once by using the official MySQL Connector/Python and the multi parameter of its cursor.execute method.
Quote from the second link:
If multi is set to True, execute() is able to execute multiple statements specified in the operation string. It returns an iterator that enables processing the result of each statement.
Example code from the link, slightly modified:
import mysql.connector

file = open('script.sql')
sql = file.read()

cnx = mysql.connector.connect(user='u', password='p', host='h', database='d')
cursor = cnx.cursor()

for result in cursor.execute(sql, multi=True):
    if result.with_rows:
        print("Rows produced by statement '{}':".format(
            result.statement))
        print(result.fetchall())
    else:
        print("Number of rows affected by statement '{}': {}".format(
            result.statement, result.rowcount))

cnx.close()
While writing a script to convert raw data for MySQL import, I have so far worked with a temporary text file, which I later imported manually using the LOAD DATA INFILE... command.
Now I have included the import command in the Python script:
db = mysql.connector.connect(user='root', password='root',
                             host='localhost',
                             database='myDB')
cursor = db.cursor()
query = """
    LOAD DATA INFILE 'temp.txt' INTO TABLE myDB.values
    FIELDS TERMINATED BY ',' LINES TERMINATED BY ';';
"""
cursor.execute(query)
cursor.close()
db.commit()
db.close()
This works, but temp.txt has to be in the database directory, which isn't suitable for my needs.
The next approach is to drop the file and commit directly:
db = mysql.connector.connect(user='root', password='root',
                             host='localhost',
                             database='myDB')
sql = "INSERT INTO values(`timestamp`,`id`,`value`,`status`) VALUES(%s,%s,%s,%s)"
cursor = db.cursor()
for line in lines:
    mode, year, julian, time, *values = line.split(",")
    del values[5]
    date = datetime.strptime(year + julian, "%Y%j").strftime("%Y-%m-%d")
    time = datetime.strptime(time.rjust(4, "0"), "%H%M").strftime("%H:%M:%S")
    timestamp = "%s %s" % (date, time)
    for i, value in enumerate(values[:20], 1):
        args = (timestamp, str(i + 28), value, mode)
        cursor.execute(sql, args)
db.commit()
This works as well but takes around four times as long, which is too much. (The same for loop was used in the first version to generate temp.txt.)
My conclusion is that I need a file and the LOAD DATA INFILE command to be faster. To be free to choose where the text file is placed, the LOCAL option seems useful. But with MySQL Connector (1.1.7) there is the known error:
mysql.connector.errors.ProgrammingError: 1148 (42000): The used command is not allowed with this MySQL version
So far I've seen that using MySQLdb instead of MySQL Connector can be a workaround, but activity on MySQLdb seems low and Python 3.3 support will probably never come.
Is LOAD DATA LOCAL INFILE the way to go, and if so, is there a working connector for Python 3.3 available?
EDIT: After development, the database will run on a server and the script on a client.
I may have missed something important, but can't you just specify the full filename in the first chunk of code?
LOAD DATA INFILE '/full/path/to/temp.txt'
Note the path must be a path on the server.
To use LOAD DATA LOCAL INFILE with any file the client can access, you have to set the LOCAL_FILES client flag when creating the connection:
import mysql.connector
from mysql.connector.constants import ClientFlag
db = mysql.connector.connect(client_flags=[ClientFlag.LOCAL_FILES], <other arguments>)
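With that flag set, the LOCAL variant of your original statement should work with a client-side path; a sketch, using the credentials from your question (note that the server's local_infile setting must also permit it):

import mysql.connector
from mysql.connector.constants import ClientFlag

# LOCAL makes 'temp.txt' a client-side path, so the file can live
# anywhere the script can read it.
db = mysql.connector.connect(user='root', password='root',
                             host='localhost', database='myDB',
                             client_flags=[ClientFlag.LOCAL_FILES])
cursor = db.cursor()
cursor.execute("""
    LOAD DATA LOCAL INFILE 'temp.txt' INTO TABLE myDB.values
    FIELDS TERMINATED BY ',' LINES TERMINATED BY ';'
""")
db.commit()
db.close()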
I have to update an existing script so that it writes some data to an Oracle 10g database. The script and the database both run on the same Solaris 10 (Intel) machine. Python is v2.4.4.
I'm using cx_Oracle and can read/write to the database with no problem. But the data I'm writing contains accented characters which are not getting written correctly. The accented character turns into an upside-down question mark.
The value is read from a binary file with this code:
class CustomerHeaderRecord:
    def __init__(self, rec, debug=False):
        self.record = rec
        self.acct = rec[84:104]
The contents of the acct variable display on-screen correctly.
Below is the code that writes to the db (the acct value is passed in as the val_1 variable):
class MQ:
    def __init__(self, rec, debug=False):
        self.customer_record = CustomerHeaderRecord(rec, debug)
        self.add_record(self.customer_record.acct, self.cm_custid)

    def add_record(self, val_1, val_2):
        cur = conn.cursor()
        qry = "select count(*) from table_name where value1 = :val1"
        cur.execute(qry, {'val1': val_1})
        count = cur.fetchone()
        if count[0] == 0:
            cur = conn.cursor()
            qry = "insert into table_name (value1, value2) values(:val1, :val2)"
            cur.execute(qry, {'val1': val_1, 'val2': val_2})
            conn.commit()
The acct value doesn't make it to the database correctly. I've googled a bunch of stuff about unicode and UTF-8 but haven't found anything that helps me yet. In the database, the NLS_LANGUAGE is 'American' and the NLS_CHARACTERSET is 'AL32UTF8'.
Do I need to 'do something' to/with the acct variable before/during the insert?
Your input file appears to be encoded in Latin-1. Decode it to unicode data; cx_Oracle will do the rest for you:
acct = rec[ 84:104 ].decode('latin1')
or use the codecs.open() function to open the file for automatic decoding:
inputfile = codecs.open(filename, 'r', encoding='latin1')
Reading from inputfile will give you unicode data.
On insertion, the cx_Oracle library will encode unicode values to the correct encoding that Oracle expects. You do need to set the NLS_LANG environment variable to AL32UTF8 before connecting, either in the shell or in Python with:
os.environ["NLS_LANG"] = ".AL32UTF8"
You may want to review the Python Unicode HOWTO for more details.
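Putting those pieces together, a minimal sketch of the write path (the table and bind names are the ones from the question; the connection details are placeholders):

import os
os.environ["NLS_LANG"] = ".AL32UTF8"   # must be set before connecting
import cx_Oracle

conn = cx_Oracle.connect("user", "password", "dsn")  # placeholders

acct = rec[84:104].decode('latin1')    # unicode, as described above
cur = conn.cursor()
cur.execute("insert into table_name (value1, value2) values (:val1, :val2)",
            {'val1': acct, 'val2': val_2})  # rec and val_2 as in the question
conn.commit()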