Python stores " ó " as " Ã³ "

Python stores " ó " as " Ã³ " - python

I use scrapy to take information from a website that according to w3 validator is utf-8..
My python project has
# -*- coding: utf-8 -*-
I receive some names like López J and when I print it, it shows fine...
But when I want to store it into the mysql I get some error about ascii not being able to encode blah blah blah...
If I use .encode ('ascii', 'ignore') i get: Lpez J
If I use .encode ('ascii', 'replace') i get: LÃ³pez J
if I use .encode ('utf-8') i get: LÃ³pez J
What should I do?
I'm in a big trouble here :'(

When you connect to the database use charset='utf-8', use_unicode=True with other keywords to connect() method. This should make the dababase accept and return unicode values, so you don't have to (and shouldn't) encode them manually.
Example:
>>> import MySQLdb
>>> conn = MySQLdb.connect(... , use_unicode=True, charset='utf8')
>>> cur = conn.cursor()
>>> cur.execute('CREATE TABLE testing(x VARCHAR(20))')
0L
>>> cur.execute('INSERT INTO testing values(%s)', ('López J',))
1L
>>> cur.execute('SELECT * FROM testing')
1L
>>> print cur.fetchall()[0][0]
López J

Check your server, database, table, column and connection character sets.
As a quick test, try executing
SET NAMES 'utf8';
immediately after connecting.

Related

MySQL to JSON using Python

I'm trying to develop a (really) simple server who an iOS app will interrogate. The Python script has to connect to the MySQL database and return data in JSON format. I can not achieve that it works also with special characters like è or é. This is a short and simplified version of my code with a lot of debugging printing inside...
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import MySQLdb
import json
print ("Content-Type: application/json; charset=utf-8\n\n")
db = MySQLdb.connect("localhost","root","******","*******" )
cursor = db.cursor()
sql = "SELECT * FROM places WHERE name IN (\"Palazzina Majani\")"
try:
cursor.execute(sql)
num_fields = len(cursor.description)
field_names = [i[0] for i in cursor.description]
results = cursor.fetchall()
print ("------------------results:")
print (results)
output_json = []
for row in results:
output_json.append(dict(zip(field_names,row)))
print ("------------------output_json:")
print (output_json)
output = json.dumps(output_json, ensure_ascii=False)
print ("------------------output:")
print (output)
except:
print ("Error")
db.close()
And this is what I get with terminal and also with browser:
Content-Type: application/json; charset=utf-8
------------------results:
(('Palazzina Majani', 'kasj \xe8.\xe9', 'palazzina_majani'),)
------------------output_json:
[{'imageName': 'palazzina_majani', 'name': 'Palazzina Majani', 'description': 'kasj \xe8.\xe9'}]
------------------output:
[{"imageName": "palazzina_majani", "name": "Palazzina Majani", "description": "kasj ?.?"}]
How can I manage those special characters (and the mainly used from latin-1)?
What if I simply replace single quotes with double quotes from "output_json" insted of using json.dumps?
Thank you!
SOLUTION
As Parfait said in the comments, passing charset='utf8' inside connect() solved the problem!

how to avoid the following issue, using python3?

Hello I have the following code:
from __future__ import print_function
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
import pandas as pd
import re
import threading
import pickle
import sqlite3
#from treetagger import TreeTagger
conn = sqlite3.connect('Telcel.db')
cursor = conn.cursor()
cursor.execute('select id_comment from Tweets')
id_comment = [i for i in cursor]
cursor.execute('select id_author from Tweets')
id_author = [i for i in cursor]
cursor.execute('select comment_message from Tweets')
comment_message = [i[0].encode('utf-8').decode('latin-1') for i in cursor]
cursor.execute('select comment_published from Tweets')
comment_published = [i for i in cursor]
That is working well in python 2.7.12, output:
~/data$ python DBtoList.py
8003
8003
8003
8003
However when I run the same code using python3 as follows, I got:
~/data$ python3 DBtoList.py
Traceback (most recent call last):
File "DBtoList.py", line 21, in <module>
comment_message = [i[0].encode('utf-8').decode('latin-1') for i in cursor]
File "DBtoList.py", line 21, in <listcomp>
comment_message = [i[0].encode('utf-8').decode('latin-1') for i in cursor]
sqlite3.OperationalError: Could not decode to UTF-8 column 'comment_message' with text 'dancing music ������'
I searched for this line and I found:
"dancing music 😜"
I am not sure why the code is working in python 2, it seems that python Python 3.5.2 is not able to decode this character at this line:
comment_message = [i[0].encode('utf-8').decode('latin-1') for i in cursor]
so I would like to appreciate suggestions to fix this problem, thanks for the support

Python 3 has no issue with the string itself if you store it using the Python sqlite3 API. I've set utf-8 as my default encoding everywhere.
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('create table Tweets (comment_message text)')
conn.execute('insert into Tweets values ("dancing music 😜")')
[(tweet,) ] = conn.execute('select comment_message from tweets')
tweet
output:
'dancing music 😜'
Now, let's see the type:
>>> type(tweet)
str
So everything is fine if you work with Python str from the start.
Now, as an aside, the thing you are trying to do (encode utf-8, decode latin-1) makes very little sense, especially if you have things like emojis in the string. Look what happens to your tweet:
>>> tweet.encode('utf-8').decode('latin-1')
'dancing music ð\x9f\x98\x9c'
But now to your problem: You have stored strings (byte sequences) in your database using an encoding different from utf-8. The error you are seeing is caused by the sqlite3 library attempting to decode these byte sequences and failing because the bytes are not valid utf-8 sequences. The only way to solve this problem is:
Find out what encoding was used to encode the strings in the database
Use that encoding to decode the strings by setting conn.text_factory = lambda x: str(x, 'latin-1'). This assumes you've stored the strings using latin1.
I would then suggest that you run through the database and update the values so that they now are encoded using utf-8 which is the default behaviour.
See also this question.
I also highly recommend that you read this article about how encodings work.

portugese characters in DBsqlite and python parsing not recognised

I got a database in DBsqlite.in this DBsqlite database I have have a records containing portugese text like "Hiper-radiação simétrica periocular bem delimitada, homogênea."
and the characters like ç ã é ê don't parse right in my python script.
While normal english text is doing it perfectly.
In my terminal window (I use a mac) the
I know it has something do to with the encoding. but the code still doesn't recognise portugese.
my sample code:
# -*- coding: UTF-8 -*-
import xml.etree.ElementTree as ET
import sqlite3
#open a database connection to the database translateDB.sqlite
conn = sqlite3.connect('translateDB.sqlite')
#prepare a cursor object using cursus() method
cursor = conn.cursor()
#test input
# this doesn't work
text = ('Hiper-radiação simétrica periocular bem delimitada, homogênea')
# this does work in english
#text = ('Well delimited, homogeneous symmetric periocular hyper- radiation.')
# Execute SQL query using execute() method.
cursor.execute('SELECT * FROM translate WHERE L2_portugese=?', (text,))
# Fetch a single row using fetchone() method and display it.
print cursor.fetchone()
# Disconnect from server
conn.close()
any tips & tricks are greatly appreciated. Ron

Inserting UNICODE into sqlite3

I am trying to read a file, parse the data using python 2.7, and import the data into sqlite3. However, I'm running into a problem when inserting the data. After I parse a line from the file, the é in my string is replaced with \xe9. After I split the line from my file, I want a list that contains [73,'Misérables, Les'] but instead I'm getting [73,'Mis\xe9rables, Les'] which is screwing up the SQL INSERT statement. How can I fix this?
#!/usr/bin/python
# -*- coding: latin-1 -*-
import sqlite3
line = '73::Misérables, Les'.decode('latin-1')
vals = line.split("::")
con = sqlite3.connect('myDb.db')
cur = con.cursor()
cur.execute("DROP TABLE IF EXISTS movie")
cur.execute('CREATE TABLE movie (id INT, title TEXT)')
sql = 'INSERT INTO movie VALUES (?,?)'
cur.execute(sql,tuple(vals))
cur.execute('SELECT * FROM movie')
for record in cur:
print record

Your program inserts data into the db perfectly. It subsequently retrieves the correct data. Your problem is when you display the result.
When you print a tuple, the system displays the repr() of each item, not the str() of each item. Thus you see \xe9 instead of é in the output.
To get what you want, try replacing the loop at the end of your program:
for record in cur:
print record[0], record[1]

Hebrew appears as gibberish, DB importing with PyPyODBC

I'm trying to work with a Hebrew database, unfortunately the output is gibberish. What am I doing wrong?
# -*- coding: utf-8 -*-
import pypyodbc
conn = pypyodbc.connect('Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\\client.mdb')
cur = conn.cursor()
cur.execute('''SELECT * FROM Client''')
d = cur.fetchone()
for field in d:
print field
If I look at cur.fetchone():
'\xf0\xf1\xe0\xf8', '\xe0\xe9\xe0\xe3'
Output:
αΘαπ
2001
εδßΘ
αΘ°σ

If either of נסאר or איאד is meaningful, then try:
field.decode('cp1255')
Google Translate suggests this might correspond to a person named Iyad Nassar.

try use:
field.encode('utf-8')

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python stores " ó " as " Ã³ " - python

Check your server, database, table, column and connection character sets. As a quick test, try executing SET NAMES 'utf8'; immediately after connecting.

Related

MySQL to JSON using Python

how to avoid the following issue, using python3?

portugese characters in DBsqlite and python parsing not recognised

Inserting UNICODE into sqlite3

Hebrew appears as gibberish, DB importing with PyPyODBC

Categories

Resources