I have a SQLite database. The database contains records with Portuguese text like "Hiper-radiação simétrica periocular bem delimitada, homogênea.", and characters like ç, ã, é, and ê don't come through correctly in my Python script, while plain English text works perfectly. I'm running this in a terminal window on a Mac.
I know it has something to do with the encoding, but the code still doesn't handle the Portuguese text.
My sample code:
# -*- coding: UTF-8 -*-
import xml.etree.ElementTree as ET
import sqlite3
#open a database connection to the database translateDB.sqlite
conn = sqlite3.connect('translateDB.sqlite')
# prepare a cursor object using the cursor() method
cursor = conn.cursor()
#test input
# this doesn't work
text = ('Hiper-radiação simétrica periocular bem delimitada, homogênea')
# this does work in english
#text = ('Well delimited, homogeneous symmetric periocular hyper- radiation.')
# Execute SQL query using execute() method.
cursor.execute('SELECT * FROM translate WHERE L2_portugese=?', (text,))
# Fetch a single row using fetchone() method and display it.
print cursor.fetchone()
# Disconnect from server
conn.close()
Any tips & tricks are greatly appreciated. Ron
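In Python 2 (which this script is, given the print statement), a plain '...' literal is a byte string in the source encoding, while sqlite3 returns and compares TEXT columns as unicode. A minimal sketch of the likely fix, assuming the same translateDB.sqlite schema as above (the thread itself doesn't confirm this is the cause):
# -*- coding: utf-8 -*-
# Sketch: the u'...' prefix makes the query value a unicode string,
# so sqlite3 can match it against the TEXT stored in the database.
import sqlite3

conn = sqlite3.connect('translateDB.sqlite')
cursor = conn.cursor()
text = u'Hiper-radiação simétrica periocular bem delimitada, homogênea'
cursor.execute('SELECT * FROM translate WHERE L2_portugese=?', (text,))
print cursor.fetchone()
conn.close()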
Related
I'm trying to develop a (really) simple server that an iOS app will query. The Python script has to connect to the MySQL database and return data in JSON format. I cannot get it to work with special characters like è or é. This is a short and simplified version of my code, with a lot of debug printing inside...
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import MySQLdb
import json
print ("Content-Type: application/json; charset=utf-8\n\n")
db = MySQLdb.connect("localhost","root","******","*******" )
cursor = db.cursor()
sql = "SELECT * FROM places WHERE name IN (\"Palazzina Majani\")"
try:
    cursor.execute(sql)
    num_fields = len(cursor.description)
    field_names = [i[0] for i in cursor.description]
    results = cursor.fetchall()
    print ("------------------results:")
    print (results)
    output_json = []
    for row in results:
        output_json.append(dict(zip(field_names,row)))
    print ("------------------output_json:")
    print (output_json)
    output = json.dumps(output_json, ensure_ascii=False)
    print ("------------------output:")
    print (output)
except:
    print ("Error")
db.close()
And this is what I get with terminal and also with browser:
Content-Type: application/json; charset=utf-8
------------------results:
(('Palazzina Majani', 'kasj \xe8.\xe9', 'palazzina_majani'),)
------------------output_json:
[{'imageName': 'palazzina_majani', 'name': 'Palazzina Majani', 'description': 'kasj \xe8.\xe9'}]
------------------output:
[{"imageName": "palazzina_majani", "name": "Palazzina Majani", "description": "kasj ?.?"}]
How can I handle those special characters (mainly the ones from Latin-1)?
What if I simply replace single quotes with double quotes in output_json instead of using json.dumps?
Thank you!
SOLUTION
As Parfait said in the comments, passing charset='utf8' inside connect() solved the problem!
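For reference, a minimal sketch of what that fix looks like (placeholder credentials as in the question):
# charset='utf8' sets the connection character set and makes MySQLdb
# return results as unicode instead of raw latin-1 bytes
db = MySQLdb.connect("localhost", "root", "******", "*******", charset='utf8')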
I'm having a lot of trouble using Python's sqlite3 library with UTF-8 strings. I need this encoding because I am working with people's names in my database.
My SQL schema for the desired table is:
CREATE TABLE senators (id integer, name char);
I would like to do the following in Python (ignore the very ugly way I wrote the SELECT statement; I did it this way for debugging purposes):
statement = u"select * from senators where name like '" + '%'+row[0]+'%'+"'"
c.execute(statement)
row[0] is the name of each row in a file that has this type of entry:
Dário Berger,1
Edison Lobão,1
Eduardo Braga,1
While I get a non-empty result for names like Eduardo Braga, any time my string contains accented characters I get a None result.
I have checked that my file has in fact been saved with UTF-8 encoding (Microsoft Notepad). On an Apple Mac, I used the PRAGMA command in the sqlite3 shell to check the database encoding:
sqlite> PRAGMA encoding;
UTF-8
Does anybody have an idea what I can do here?
EDIT - Complete example:
Python script that creates the databases, and populates with initial data from senators.csv (file):
# -*- coding: utf-8 -*-
import sqlite3
import csv
conn = sqlite3.connect('senators.db')
c = conn.cursor()
c.execute('''CREATE TABLE senators (id integer, name char)''')
c.execute('''CREATE TABLE polls (id integer, senator char, vote integer, FOREIGN KEY(senator) REFERENCES senators(name))''')
with open('senators.csv', encoding='utf-8') as f:
    f_csv = csv.reader(f)
    for row in f_csv:
        c.execute(u"INSERT INTO senators VALUES(?,?)", (row[1], row[0]))
conn.commit()
conn.close()
Script that populates the polls table, using Q1.txt (file).
import csv
import sqlite3
import re
import glob
conn = sqlite3.connect('senators.db')
c = conn.cursor()
POLLS = {
'senator': 'votes/senator/Q*.txt',
'deputee': 'votes/deputee/Q*.txt',
}
s_polls = glob.glob(POLLS['senator'])
d_polls = glob.glob(POLLS['deputee'])
for poll in s_polls:
    m = re.match(r'.*Q(\d+)\.txt', poll)
    poll_id = m.group(1)  # group(1) is the poll number; groups(0) would return a tuple
    with open(poll, encoding='utf-8') as p:
        f_csv = csv.reader(p)
        for row in f_csv:
            c.execute(u'SELECT id FROM senators WHERE name LIKE ?', ('%'+row[0]+'%',))
            data = c.fetchone()
            print(data)  # I should not get None results here, but I do, exactly when the query has UTF-8 characters.
Note the file paths, if you want to test these scripts out.
Ok guys, after a lot of trouble, I found out that the problem was that the encodings, although both considered UTF-8, were still different. While the database text was in decomposed form (ã = a + combining ~), my input was in precomposed form (a single code point for the ã character).
To fix it, I had to convert all my input data to the decomposed form.
from unicodedata import normalize

with open(poll, encoding='utf-8') as p:
    f_csv = csv.reader(p)
    for row in f_csv:
        name = normalize("NFD", row[0])
        c.execute(u'SELECT id FROM senators WHERE name LIKE ?', ('%'+name+'%',))
See this article for some excellent information on the subject.
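To see the difference between the two forms in isolation, here is a small self-contained demonstration:
# -*- coding: utf-8 -*-
from unicodedata import normalize

precomposed = u'\u00e3'                      # 'ã' as a single code point
decomposed = normalize('NFD', precomposed)   # 'a' + combining tilde (U+0303)

print(precomposed == decomposed)                     # False: different code points
print(normalize('NFC', decomposed) == precomposed)   # True: same text, recomposed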
From the SQLite docs:
Important Note: SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.
Also, use query parameters. Your query is vulnerable to SQL injection.
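You can verify the quoted LIKE behavior directly from Python:
import sqlite3

conn = sqlite3.connect(':memory:')
# ASCII case folding works; non-ASCII does not (1 = TRUE, 0 = FALSE)
print(conn.execute(u"SELECT 'a' LIKE 'A', 'æ' LIKE 'Æ'").fetchone())  # (1, 0)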
I'm trying to store Chinese text in a Microsoft SQL database.
My column type is nvarchar, and if I run this query directly in SSMS, the text is saved correctly:
insert into xtemp (ind, desp)
values (1, N'人大花来民北村分社訃実真属葉')
But if I try to do it from Python code, the database just stores it as "????????????".
Can anyone point me in the right direction?
This is my python code:
connectionString1 = 'Driver={SQL Server};Server=x.database.windows.net,1433;Database=x;Uid=x;Pwd=x;Connection Timeout=30;Encrypt=yes;CHARSET=UTF8;'
connection1 = pyodbc.connect(connectionString1)
cursor1 = connection1.cursor()
q1 = '''
insert into xtemp (ind, desp)
values (2, N'人大花来民北村分社訃実真属葉')
'''
cursor1.execute(q1)
connection1.commit()
print("done")
cursor1.close()
connection1.close()
Try adding # -*- coding: utf-8 -*- at the beginning of your file, and also change
q1 = '''
to
q1 = u'''
This should enable Unicode support in your script and properly encode the Chinese characters when sending them to the database engine.
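If that alone doesn't help, a commonly suggested alternative (not from the original answer, so treat it as an assumption) is to send the text as a bound parameter instead of embedding it in the SQL string; pyodbc binds a Python unicode value as an NVARCHAR parameter:
# Hedged variant: let the driver bind the unicode value as NVARCHAR
q1 = "insert into xtemp (ind, desp) values (?, ?)"
cursor1.execute(q1, (2, u'人大花来民北村分社訃実真属葉'))
connection1.commit()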
I am trying to read a file, parse the data using Python 2.7, and import the data into sqlite3. However, I'm running into a problem when inserting the data. After I parse a line from the file, the é in my string is replaced with \xe9. After I split the line from my file, I want a list that contains [73, 'Misérables, Les'], but instead I'm getting [73, 'Mis\xe9rables, Les'], which is screwing up the SQL INSERT statement. How can I fix this?
#!/usr/bin/python
# -*- coding: latin-1 -*-
import sqlite3
line = '73::Misérables, Les'.decode('latin-1')
vals = line.split("::")
con = sqlite3.connect('myDb.db')
cur = con.cursor()
cur.execute("DROP TABLE IF EXISTS movie")
cur.execute('CREATE TABLE movie (id INT, title TEXT)')
sql = 'INSERT INTO movie VALUES (?,?)'
cur.execute(sql,tuple(vals))
cur.execute('SELECT * FROM movie')
for record in cur:
    print record
Your program inserts data into the db perfectly. It subsequently retrieves the correct data. Your problem is when you display the result.
When you print a tuple, the system displays the repr() of each item, not the str() of each item. Thus you see \xe9 instead of é in the output.
To get what you want, try replacing the loop at the end of your program:
for record in cur:
    print record[0], record[1]
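A quick interactive illustration of the difference (Python 2):
>>> record = (73, u'Mis\xe9rables, Les')
>>> print record        # prints the tuple, i.e. the repr() of each item
(73, u'Mis\xe9rables, Les')
>>> print record[1]     # prints the unicode string itself
Misérables, Les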
Greetings,
Using the pymssql library, I want to write data to an MSSQL database; however, I encounter encoding issues. Here is my sample code to write to the DB:
# -*- coding: utf-8 -*-
import _mssql
....
Connection info data here
....
def mssql_connect():
    return _mssql.connect(server=HOST, user=USERNAME, password=PASS, database=DB, charset="utf-8")
con = mssql_connect()
INSERT_EX_SQL = "INSERT INTO myDatabsae (Id, ProgramName, ProgramDetail) VALUES (1, 'Test Characters ÜŞiçÇÖö', 'löşüIIğĞü');"
con.execute_non_query(INSERT_EX_SQL)
con.close()
Sadly, the data that was written to the DB is corrupted.
The collation of my MSSQL DB is Turkish_CI_AS.
How can this be solved?
Here is a possible solution:
The key is INSERT_EX_SQL.encode('your language encoding').
Try this instead:
con.execute_non_query(INSERT_EX_SQL.encode('your language encoding'))
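For a Turkish_CI_AS collation, the matching single-byte code page is typically Windows-1254, so the call might look like this (an assumption; verify against your server's actual code page):
# Assumes the server-side code page for Turkish_CI_AS is cp1254 (Windows-1254)
con.execute_non_query(INSERT_EX_SQL.encode('cp1254'))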