Writing accented characters to Oracle - python

I have to update an existing script so that it writes some data to an Oracle 10g database. The script and the database both run on the same Solaris 10 (Intel) machine. Python is v2.4.4.
I'm using cx_Oracle and can read/write to the database with no problem. But the data I'm writing contains accented characters which are not getting written correctly. The accented character turns into an upside-down question mark.
The value is read from a binary file with this code:
class CustomerHeaderRecord:
    def __init__(self, rec, debug=False):
        self.record = rec
        self.acct = rec[84:104]
And the contents of the acct variable display on-screen correctly.
Below is the code that writes to the db (the acct value is passed in as the val_1 variable):
class MQ:
    def __init__(self, rec, debug=False):
        self.customer_record = CustomerHeaderRecord(rec, debug)
        self.add_record(self.customer_record.acct, self.cm_custid)

    def add_record(self, val_1, val_2):
        cur = conn.cursor()
        qry = "select count(*) from table_name where value1 = :val1"
        cur.execute(qry, {'val1': val_1})
        count = cur.fetchone()
        if count[0] == 0:
            cur = conn.cursor()
            qry = "insert into table_name (value1, value2) values(:val1, :val2)"
            cur.execute(qry, {'val1': val_1, 'val2': val_2})
            conn.commit()
The acct value doesn't make it to the database correctly. I've googled a bunch of stuff about unicode and UTF-8 but haven't found anything that helps me yet. In the database, the NLS_LANGUAGE is 'American' and the NLS_CHARACTERSET is 'AL32UTF8'.
Do I need to 'do something' to/with the acct variable before/during the insert?

Your input file appears to be encoded in Latin-1. Decode it to unicode data; cx_Oracle will do the rest for you:
acct = rec[84:104].decode('latin1')
or use the codecs.open() function to open the file for automatic decoding:
inputfile = codecs.open(filename, 'r', encoding='latin1')
Reading from inputfile will give you unicode data.
On insertion, the cx_Oracle library will encode unicode values to the correct encoding that Oracle expects. You do need to set the NLS_LANG environment variable to AL32UTF8 before connecting, either in the shell or in Python with:
os.environ["NLS_LANG"] = ".AL32UTF8"
You may want to review the Python Unicode HOWTO for more details.
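Putting those pieces together, a minimal sketch of the whole flow might look like the following. The file name, credentials, and the 'some_id' value are placeholders; the table and bind names come from the question, and the with statement is avoided since the question mentions Python 2.4:

import os
import cx_Oracle

# The character-set component of NLS_LANG must be set before connecting,
# so the Oracle client knows the session sends UTF-8.
os.environ["NLS_LANG"] = ".AL32UTF8"

conn = cx_Oracle.connect("user/password@tns_alias")  # placeholder credentials

f = open("customer.dat", "rb")  # placeholder file name
rec = f.read()
f.close()

# Decode the Latin-1 bytes to unicode; cx_Oracle encodes it for Oracle on insert.
acct = rec[84:104].decode('latin1')

cur = conn.cursor()
cur.execute("insert into table_name (value1, value2) values (:val1, :val2)",
            {'val1': acct, 'val2': 'some_id'})
conn.commit()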

Related

Executing Python script from PHP and insert in mysql

I have a PHP script that executes a Python script, gets the data back, and stores it in MySQL. It's working fine, but when the data is stored in the database an additional blank row is inserted. My question is: how can I make it store only the actual data I receive?
This is part of the python script
##ser.write(sync)
ser.write(counters)
a = ser.read(30)
state = binascii.hexlify(a)
asd = re.sub(rb'([0-9, a-z, A-Z])(?!$)', rb'\1,', state)
url = 'http://127.0.0.1/sastest/meters.php'
x = requests.post(url, data = asd)
print(asd)
And this is from the PHP
passthru("meters.py");
$incomingData = file_get_contents("php://input");
$qry1 = "INSERT INTO machtest(data)
values('".$incomingData."')";
mysqli_query($conn,$qry1);
From comments we discover the overall process:
When I call meters.php it activates meters.py. meters.py interrogates a device and sends the data back to meters.php
Because PHP's passthru does not return the command's output, while the similar function exec does (as an array with one element per output line), use exec instead and do not have Python post a response back. Of course, always use parameterization when passing input values to a database.
Python (meters.py)
ser.write(counters)
a = ser.read(30)
state = binascii.hexlify(a)
asd = re.sub(rb'([0-9, a-z, A-Z])(?!$)', rb'\1,', state)
print(asd)
PHP (meters.php)
// USE THE output ARGUMENT TO CAPTURE THE SCRIPT'S OUTPUT
exec("meters.py", $incomingData);
// USE PARAMETERIZATION
$qry = "INSERT INTO machtest (data) VALUES (?)";
$stmt = mysqli_prepare($conn, $qry);
mysqli_stmt_bind_param($stmt, "s", $incomingData[0]);
mysqli_stmt_execute($stmt);
See mysqli prepared statement docs
Alternatively, have Python run all processing including device and database interaction. Then, have PHP call the .py script:
Python (meters.py)
import mysql.connector # USE ANY MySQL DB-API. THIS IS AN EXAMPLE
...
### INTERROGATE DEVICE
ser.write(counters)
a = ser.read(30)
state = binascii.hexlify(a)
asd = re.sub(rb'([0-9, a-z, A-Z])(?!$)', rb'\1,', state)
### APPEND TO DATABASE
# OPEN CONNECTION AND CURSOR
conn = mysql.connector.connect(host='localhost', database='mydatabase',
                               user='root', password='pwd')
cur = conn.cursor()
# USE PARAMETERIZATION
qry = "INSERT INTO machtest (data) VALUES (%s)"
cur.execute(qry, (asd,))
conn.commit()
cur.close()
conn.close()
See MySQL cursor execute docs
PHP (meters.php)
// NO NEED FOR output
passthru("meters.py");

What is the best way to dump MySQL table data to csv and convert character encoding?

I have a table with about 200 columns. I need to take a dump of the daily transaction data for ETL purposes. It's a MySQL DB. I tried that with Python, both using a pandas DataFrame and using a basic write-to-CSV-file approach. I even tried to look for the same functionality using a shell script; I found one for an Oracle database using sqlplus. Following is my Python code for the two approaches:
Using Pandas:
import MySQLdb as mdb
import pandas as pd
host = ""
user = ''
pass_ = ''
db = ''
query = 'SELECT * FROM TABLE1'
conn = mdb.connect(host=host,
                   user=user, passwd=pass_,
                   db=db)
df = pd.read_sql(query, con=conn)
df.to_csv('resume_bank.csv', sep=',')
Using basic python file write:
import MySQLdb
import csv
import datetime
currentDate = datetime.datetime.now().date()
host = ""
user = ''
pass_ = ''
db = ''
table = ''
con = MySQLdb.connect(user=user, passwd=pass_, host=host, db=db, charset='utf8')
cursor = con.cursor()
query = "SELECT * FROM %s;" % table
cursor.execute(query)
with open('Data_on_%s.csv' % currentDate, 'w') as f:
    writer = csv.writer(f)
    for row in cursor.fetchall():
        writer.writerow(row)
print('Done')
The table has about 300,000 records. It's taking too much time with both the python codes.
Also, there's an issue with encoding here. The DB result set has some Latin-1 characters for which I'm getting errors like: UnicodeEncodeError: 'ascii' codec can't encode character '\x96' in position 1078: ordinal not in range(128).
I need to save the CSV in Unicode format. Can you please help me with the best approach to perform this task.
A Unix based or Python based solution will work for me. This script needs to be run daily to dump daily data.
You can achieve that just by leveraging MySQL. For example:
SELECT * FROM your_table WHERE ...
INTO OUTFILE 'your_file.csv'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n';
If you need to schedule your query, put it into a file (e.g., csv_dump.sql) and create a cron task like this one:
00 00 * * * mysql -h your_host -u user -ppassword < /foo/bar/csv_dump.sql
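If the dump should stay inside the existing Python job rather than a separate cron entry, the same statement can be issued through MySQLdb. This is only a sketch with placeholder credentials and names; note that INTO OUTFILE is written by the MySQL server itself, so the path must be writable by mysqld and the connecting user needs the FILE privilege. The optional CHARACTER SET clause covers the encoding part of the question.

import MySQLdb

# Placeholder credentials and names; the CSV is written on the database server.
con = MySQLdb.connect(host='localhost', user='user', passwd='pwd', db='mydb')
cur = con.cursor()
cur.execute("""
    SELECT * FROM your_table
    INTO OUTFILE '/tmp/your_file.csv'
    CHARACTER SET utf8
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
""")
cur.close()
con.close()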
For strings, this will use the default character encoding, which happens to be ASCII, and that fails when you have non-ASCII characters. You want unicode instead of str.
rows = cursor.fetchall()
f = open('Data_on_%s.csv' % currentDate, 'w')
myFile = csv.writer(f)
for row in rows:
    myFile.writerow([unicode(s).encode("utf-8") for s in row])
f.close()
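Under Python 3 the same idea is simpler, because the file object can do the UTF-8 encoding itself. A rough sketch, assuming cursor and currentDate are defined as in the question:

import csv

# The file handles the encoding, so no per-value encode() calls are needed.
with open('Data_on_%s.csv' % currentDate, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(cursor.fetchall())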
You can use mysqldump for this task. (Source for command)
mysqldump -u username -p --tab=/path/to/directory dbname table_name --fields-terminated-by=','
The arguments are as follows:
-u username for the username
-p to indicate that a password should be used
-ppassword to give the password via command line
--tab=/path/to/directory to produce tab-separated data files in the given directory
For more command line switches see https://dev.mysql.com/doc/refman/5.5/en/mysqldump.html
To run it on a regular basis, create a cron task like written in the other answers.

Sqlite3 cannot correctly query a UTF-8 string?

I'm having a lot of trouble using Python's sqlite3 library with UTF-8 strings. I need this encoding because I am working with people's names in my database.
My SQL schema for the desired table is:
CREATE TABLE senators (id integer, name char);
I would like to do the following in Python (ignore the very ugly way I wrote the select statement. I did it this way for debugging purposes):
statement = u"select * from senators where name like '" + '%'+row[0]+'%'+"'"
c.execute(statement)
row[0] is the name of each row in a file that has this type of entry:
Dário Berger,1
Edison Lobão,1
Eduardo Braga,1
While I have a non empty result for names like Eduardo Braga, any time my string has UTF-8 characters, I get a null result.
I have checked that my file has in fact been saved with UTF-8 encoding (Microsoft Notepad). On an Apple Mac, in the terminal, I used the PRAGMA command in the sqlite3 shell to check the encoding:
sqlite> PRAGMA encoding;
UTF-8
Does anybody have an idea what I can do here?
EDIT - Complete example:
Python script that creates the databases, and populates with initial data from senators.csv (file):
# -*- coding: utf-8 -*-
import sqlite3
import csv
conn = sqlite3.connect('senators.db')
c = conn.cursor()
c.execute('''CREATE TABLE senators (id integer, name char)''')
c.execute('''CREATE TABLE polls (id integer, senator char, vote integer, FOREIGN KEY(senator) REFERENCES senators(name))''')
with open('senators.csv', encoding='utf-8') as f:
    f_csv = csv.reader(f)
    for row in f_csv:
        c.execute(u"INSERT INTO senators VALUES(?,?)", (row[1], row[0]))
conn.commit()
conn.close()
Script that populates the polls table, using Q1.txt (file).
import csv
import sqlite3
import re
import glob
conn = sqlite3.connect('senators.db')
c = conn.cursor()
POLLS = {
    'senator': 'votes/senator/Q*.txt',
    'deputee': 'votes/deputee/Q*.txt',
}
s_polls = glob.glob(POLLS['senator'])
d_polls = glob.glob(POLLS['deputee'])
for poll in s_polls:
    m = re.match('.*Q(\d+)\.txt', poll)
    poll_id = m.groups(0)
    with open(poll, encoding='utf-8') as p:
        f_csv = csv.reader(p)
        for row in f_csv:
            c.execute(u'SELECT id FROM senators WHERE name LIKE ?', ('%'+row[0]+'%',))
            data = c.fetchone()
            print(data)  # I should not get None results here, but I do, exactly when the query has UTF-8 characters.
Note the file paths, if you want to test these scripts out.
Ok guys,
After a lot of trouble, I found out that the problem was that the encodings, although both were considered UTF-8, were still different. The difference was that while the database stored decomposed UTF-8 (ã = a + ~), my input was in precomposed form (a single code point for the ã character).
To fix it, I had to convert all my input data to the decomposed form.
from unicodedata import normalize
with open(poll, encoding='utf-8') as p:
    f_csv = csv.reader(p)
    for row in f_csv:
        name = normalize("NFD", row[0])
        c.execute(u'SELECT id FROM senators WHERE name LIKE ?', ('%'+name+'%',))
See this article for some excellent information on the subject.
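The same idea can also be applied when loading the table in the first place, so that stored names and query strings never disagree on the form. A sketch reusing the senators.csv loader from the question; NFC on both sides works just as well as NFD, as long as the same form is used everywhere:

import csv
import sqlite3
from unicodedata import normalize

conn = sqlite3.connect('senators.db')
c = conn.cursor()
with open('senators.csv', encoding='utf-8') as f:
    for row in csv.reader(f):
        # Store one fixed normal form so later LIKE comparisons match.
        c.execute("INSERT INTO senators VALUES(?,?)",
                  (row[1], normalize("NFC", row[0])))
conn.commit()
conn.close()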
From the SQLite docs:
Important Note: SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.
Also, use query parameters. Your query is vulnerable to SQL injection.
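For example, the debugging query from the question can pass the name as a bound parameter instead of concatenating it into the SQL (a sketch, reusing row from the question's loop):

c.execute(u"SELECT * FROM senators WHERE name LIKE ?", ('%' + row[0] + '%',))
print(c.fetchone())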

Get SQLAlchemy to encode correctly strings with cx_Oracle

My problem is that SQLAlchemy does not seem to be writing properly encoded text to my Oracle database.
I include fragments of the code below:
engine = create_engine("oracle://%s:%s@%s:%s/%s?charset=utf8" % (db_username, db_password, db_hostname, db_port, db_database), encoding='utf8')
connection = engine.connect()
session = Session(bind = connection)
class MyClass(DeclarativeBase):
    """
    Model of the data to be persisted
    """
    __tablename__ = "enconding_test"
    id = Column(Integer, Sequence('encoding_test_id_seq'), primary_key=True)
    blabla = Column(String(255, collation='utf-8'), default='')
    autoload = True
content = unicode("äüößqwerty","utf_8")
t = MyClass(blabla=content.encode("utf_8"))
session.add(t)
session.commit()
If I now read the contents of the database, I get something like:
????????qwerty
instead of the original:
äüößqwerty
So basically my question is what do I have to do, to properly store these German characters in the database?
Thanks in advance!
I found a related topic that actually answers my question:
Python 2.7 connection to oracle loosing polish characters
You simply add the following line, before creating the database connection:
os.environ["NLS_LANG"] = "GERMAN_GERMANY.UTF8"
Additional documentation about which strings you need for different languages is found on the Oracle website:
Oracle documentation on Unicode Support
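Putting it together, a minimal sketch reusing the question's engine, session, and model setup. The essential point is that NLS_LANG is set before create_engine() runs; it is usually also cleaner to hand SQLAlchemy unicode objects rather than pre-encoded byte strings:

import os

# Must be set before the Oracle client is initialised, i.e. before create_engine().
os.environ["NLS_LANG"] = "GERMAN_GERMANY.UTF8"

engine = create_engine("oracle://%s:%s@%s:%s/%s" % (db_username, db_password,
                                                    db_hostname, db_port, db_database),
                       encoding='utf8')
session = Session(bind=engine.connect())

content = u"äüößqwerty"
session.add(MyClass(blabla=content))  # pass unicode; no .encode() needed
session.commit()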

Python to Pull Oracle Data in Unicode (Arabic) format

I am using cx_Oracle to fetch some data stored in Arabic characters from an Oracle database. Below is how I try to connect to the database. When I print the results, especially the columns stored in Arabic, I get something like "?????", which suggests the data was not decoded properly.
I tried printing a random Arabic string in Python and it came out fine, which indicates the problem is in the manner in which I am pulling data from the database.
connection = cx_Oracle.connect(username, password, instanceName)
wells = getWells(connection)
def getWells(conn):
    cursor = conn.cursor()
    wells = []
    cursor.execute(sql)
    clmns = len(cursor.description)
    for row in cursor.fetchall():
        print row
        well = {}
        for i in range(0, clmns):
            if type(row[i]) is not datetime.datetime:
                well[cursor.description[i][0]] = row[i]
            else:
                well[cursor.description[i][0]] = row[i].isoformat()
        wells.append(well)
    cursor.close()
    connection.close()
    return wells
In order to force a reset of the default encoding from the environment, you can call the setdefaultencoding method in the sys module.
As this is not recommended, the method is removed at interpreter startup and a reload of the sys module is required to get it back.
It is recommended that you attempt to fix the encoding set in the shell for the user on the host system, rather than modifying it in a script.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
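As with the other cx_Oracle answers on this page, it may also be worth checking the client character set: "?????" frequently means the Oracle client replaced characters it could not represent, and setting NLS_LANG before connecting addresses that. A sketch reusing the question's connection variables:

import os
import cx_Oracle

# Assumption: the Arabic text is stored correctly in the database and only the
# client-side character set needs to be told to use UTF-8 before connecting.
os.environ["NLS_LANG"] = ".AL32UTF8"

connection = cx_Oracle.connect(username, password, instanceName)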
