I need a Python script to generate a CSV file from my database XXXX. I wrote this script, but something is wrong:
import mysql.connector
import csv
filename=open('test.csv','wb')
c=csv.writer(filename)
cnx = mysql.connector.connect(user='XXXXXXX', password='XXXXX',
host='localhost',
database='XXXXX')
cursor = cnx.cursor()
query = ("SELECT `Id_Vendeur`, `Nom`, `Prenom`, `email`, `Num_magasin`, `Nom_de_magasin`, `Identifiant_Filiale`, `Groupe_DV`, `drt_Cartes`.`gain` as 'gain', `Date_Distribution`, `Status_Grattage`, `Date_Grattage` FROM `drt_Cartes_Distribuer`,`drt_Agent`,`drt_Magasin`,`drt_Cartes` where `drt_Cartes_Distribuer`.`Id_Vendeur` = `drt_Agent`.`id_agent` AND `Num_magasin` = `drt_Magasin`.`Numero_de_magasin` AND `drt_Cartes_Distribuer`.`Id_Carte` = `drt_Cartes`.`num_carte`")
cursor.execute(query)
for Id_Vendeur, Nom, Prenom, email, Num_magasin, Nom_de_magasin, Identifiant_Filiale, Groupe_DV, gain, Date_Distribution, Status_Grattage, Date_Grattage in cursor:
    c.writerow([Id_Vendeur, Nom, Prenom, email, Num_magasin, Nom_de_magasin, Identifiant_Filiale, Groupe_DV, gain, Date_Distribution, Status_Grattage, Date_Grattage] )
cursor.close()
filename.close()
cnx.close()
When I execute the query in phpMyAdmin it seems to work very well, but from my shell I get this message:
# python test.py
Traceback (most recent call last):
File "test.py", line 18, in <module>
c.writerow([Id_Vendeur, Nom, Prenom, email, Num_magasin, Nom_de_magasin, Identifiant_Filiale, Groupe_DV, gain, Date_Distribution, Status_Grattage, Date_Grattage] )
UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 5: ordinal not in range(128)
It looks like you are using the csv module for Python 2.7. Quoting the docs:
Note This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.
Options, choose one of them:
Follow the docs link, go to the Examples section, and modify your code accordingly.
Use a csv package with Unicode support, like https://pypi.python.org/pypi/unicodecsv
Your data from the database contains more than ASCII characters. I suggest you use the 'unicodecsv' Python module, as suggested in the answer to this question: How to write UTF-8 in a CSV file
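For comparison, here is a minimal sketch of the same export on Python 3, where the csv module accepts Unicode text directly and the encoding is chosen when the file is opened. Moving to Python 3 is an assumption here, not something the question states, and `write_rows` and the sample rows are hypothetical stand-ins for the real cursor results.

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    """Write rows (any iterable of sequences) to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical sample data standing in for the database cursor.
sample_rows = [(1, "No\u00ebl", "Dupont"), (2, "Zo\u00e9", "Martin")]
csv_path = os.path.join(tempfile.mkdtemp(), "test.csv")
write_rows(csv_path, ["Id_Vendeur", "Nom", "Prenom"], sample_rows)

# Read the file back to confirm the accented characters survived.
with open(csv_path, encoding="utf-8", newline="") as handle:
    written_lines = handle.read().splitlines()
```

In the real script the cursor can be passed straight to `write_rows` in place of `sample_rows`, since iterating it yields row tuples. On Python 2.7 the equivalent, per the quoted docs, is to encode each Unicode value to UTF-8 before handing it to writerow.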
Hello I have the following code:
from __future__ import print_function
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
import pandas as pd
import re
import threading
import pickle
import sqlite3
#from treetagger import TreeTagger
conn = sqlite3.connect('Telcel.db')
cursor = conn.cursor()
cursor.execute('select id_comment from Tweets')
id_comment = [i for i in cursor]
cursor.execute('select id_author from Tweets')
id_author = [i for i in cursor]
cursor.execute('select comment_message from Tweets')
comment_message = [i[0].encode('utf-8').decode('latin-1') for i in cursor]
cursor.execute('select comment_published from Tweets')
comment_published = [i for i in cursor]
This works well in Python 2.7.12. Output:
~/data$ python DBtoList.py
8003
8003
8003
8003
However, when I run the same code with python3 as follows, I get:
~/data$ python3 DBtoList.py
Traceback (most recent call last):
File "DBtoList.py", line 21, in <module>
comment_message = [i[0].encode('utf-8').decode('latin-1') for i in cursor]
File "DBtoList.py", line 21, in <listcomp>
comment_message = [i[0].encode('utf-8').decode('latin-1') for i in cursor]
sqlite3.OperationalError: Could not decode to UTF-8 column 'comment_message' with text 'dancing music ������'
I searched for this line and I found:
"dancing music 😜"
I am not sure why the code works in Python 2. It seems that Python 3.5.2 is not able to decode this character at this line:
comment_message = [i[0].encode('utf-8').decode('latin-1') for i in cursor]
I would appreciate suggestions on how to fix this problem. Thanks for the support.
Python 3 has no issue with the string itself if you store it using the Python sqlite3 API. I've set utf-8 as my default encoding everywhere.
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('create table Tweets (comment_message text)')
conn.execute('insert into Tweets values ("dancing music 😜")')
[(tweet,) ] = conn.execute('select comment_message from tweets')
tweet
output:
'dancing music 😜'
Now, let's see the type:
>>> type(tweet)
str
So everything is fine if you work with Python str from the start.
Now, as an aside, the thing you are trying to do (encode utf-8, decode latin-1) makes very little sense, especially if you have things like emojis in the string. Look what happens to your tweet:
>>> tweet.encode('utf-8').decode('latin-1')
'dancing music ð\x9f\x98\x9c'
But now to your problem: You have stored strings (byte sequences) in your database using an encoding different from utf-8. The error you are seeing is caused by the sqlite3 library attempting to decode these byte sequences and failing because the bytes are not valid utf-8 sequences. The only way to solve this problem is:
Find out what encoding was used to encode the strings in the database
Use that encoding to decode the strings by setting conn.text_factory = lambda x: str(x, 'latin-1'). This assumes you've stored the strings using latin1.
I would then suggest that you run through the database and update the values so that they now are encoded using utf-8 which is the default behaviour.
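The decode-then-repair procedure above can be sketched as follows on Python 3. This is only an illustration under stated assumptions: the stored bytes are assumed to be latin-1 (as in the answer), the table name is taken from the question, and the CAST in the setup is just a way to plant non-UTF-8 bytes in a TEXT column for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Tweets (comment_message text)")
# Demo setup only: CAST stores the raw latin-1 bytes as TEXT without re-encoding.
conn.execute("insert into Tweets values (CAST(? AS TEXT))", (b"caf\xe9 ol\xe9",))

# Decode with the encoding the data was really written in (assumed latin-1).
conn.text_factory = lambda raw: raw.decode("latin-1")
rows = conn.execute("select rowid, comment_message from Tweets").fetchall()

# Write the values back; Python str parameters are stored as UTF-8.
conn.executemany(
    "update Tweets set comment_message = ? where rowid = ?",
    [(message, rowid) for rowid, message in rows],
)
conn.commit()

# With the data repaired, the default text_factory works again.
conn.text_factory = str
[(repaired,)] = conn.execute("select comment_message from Tweets")
```

If the guess about the original encoding is wrong, the update makes the damage permanent, so it is worth running this against a copy of the database first.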
See also this question.
I also highly recommend that you read this article about how encodings work.
I have a MySQL table, with XML content stored in a longtext field, encoded as utf8mb4_general_ci
I want to use a Python script to read in the XML data from the transcript field, modify an element, and then write the value back to the database.
When I try to get the XML content into an Element using ElementTree.fromstring, I get the following encoding error:
Traceback (most recent call last):
File "ImageProcessing.py", line 33, in <module>
root = etree.fromstring(row[1])
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
parser.feed(text)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1640, in feed
self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 9568: ordinal not in range(128)
Code:
import datetime
import mysql.connector
import xml.etree.ElementTree as etree
# Creates the config parameters, connects
# to the database and creates a cursor
config = {
    'user': 'username',
    'password': 'password',
    'host': '127.0.0.1',
    'database': 'dbname',
    'raise_on_warnings': True,
    'use_unicode': True,
    'charset': 'utf8',
}
cnx = mysql.connector.connect(**config)
cursor = cnx.cursor()
# Structures the SQL query
query = ("SELECT * FROM transcription")
# Executes the query and fetches the first row
cursor.execute(query)
row = cursor.fetchone()
while row is not None:
    print(row[0])
    #Some of the things I have tried to resolve the encoding issue
    #parser = etree.XMLParser(encoding="utf-8")
    #root = etree.fromstring(row[1], parser=parser)
    #row[1].encode('ascii', 'ignore')
    #Line where the encoding error is being thrown
    root = etree.fromstring(row[1])
    for img in root.iter('img'):
        refno = img.text
        img.attrib['href']='http://www.link.com/images.jsp?doc=' + refno
        print img.tag, img.attrib, img.text
    row = cursor.fetchone()
cursor.close()
cnx.close()
You've got everything set up well, and your database connection is returning Unicode strings, which is a good thing.
Unfortunately, in Python 2 ElementTree's fromstring() requires a byte str, not a Unicode string. This is so ElementTree can decode it using the encoding defined in the XML header.
You need to use this instead:
utf_8_xml = row[1].encode("utf-8")
root = etree.fromstring(utf_8_xml)
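A slightly fuller sketch of the round trip the question describes (parse, modify an element, serialize for writing back), run here on Python 3, where encoding first is harmless. The sample XML is a hypothetical stand-in for the transcript column, and the URL prefix is the one from the question.

```python
import xml.etree.ElementTree as etree

# Hypothetical stand-in for row[1]: Unicode text containing an em dash.
xml_text = u"<doc><img>123</img><p>dash \u2014 here</p></doc>"

# Encode to UTF-8 bytes before parsing, as the answer suggests.
root = etree.fromstring(xml_text.encode("utf-8"))

for img in root.iter("img"):
    img.attrib["href"] = "http://www.link.com/images.jsp?doc=" + img.text

# Serialize back to UTF-8 bytes, then decode to text for the database layer.
updated_xml = etree.tostring(root, encoding="utf-8").decode("utf-8")
```

`updated_xml` is what would be passed as a query parameter in an UPDATE statement; the em dash is preserved rather than escaped or dropped.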
I'm getting this error UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2014'
I'm trying to load lots of news articles into a MySQL database. However, I'm having difficulty handling non-standard characters, and I get hundreds of these errors for all sorts of characters. I can handle them individually using .replace(), but I would like a more complete solution that handles them correctly.
ubuntu@ip-10-0-0-21:~/scripts/work$ python test_db_load_error.py
Traceback (most recent call last):
File "test_db_load_error.py", line 27, in <module>
cursor.execute(sql_load)
File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 157, in execute
query = query.encode(charset)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2014' in position 158: ordinal not in range(256)
My script:
import MySQLdb as mdb
from goose import Goose
import string
import datetime
host = 'rds.amazonaws.com'
user = 'news'
password = 'xxxxxxx'
db_name = 'news_reader'
conn = mdb.connect(host, user, password, db_name)
url = 'http://www.dailymail.co.uk/wires/ap/article-3060183/Andrew-Lesnie-Lord-Rings-cinematographer-dies.html?ITO=1490&ns_mchannel=rss&ns_campaign=1490'
g = Goose()
article = g.extract(url=url)
body = article.cleaned_text
body = body.replace("'","`")
load_date = str(datetime.datetime.now())
summary = article.meta_description
title = article.title
image = article.top_image
sql_load = "insert into articles " \
    " (title,summary,article,image,source,load_date) " \
    " values ('%s','%s','%s','%s','%s','%s');" % \
    (title,summary,body,image,url,load_date)
cursor = conn.cursor()
cursor.execute(sql_load)
#conn.commit()
Any help would be appreciated.
When you create your MySQLdb connection, pass charset='utf8' to it.
conn = mdb.connect(host, user, password, db_name, charset='utf8')
If your database is actually configured for Latin-1, then you cannot store non-Latin-1 characters in it. That includes U+2014, EM DASH.
The ideal solution is to just switch to a database configured for UTF-8. Specify the utf8 character set when initially creating the database, and pass charset='utf8' every time you connect to it. (If you already have existing data, you probably want to use MySQL tools to migrate the old database to a new one, instead of Python code, but the basic idea is the same.)
However, sometimes that isn't possible. Maybe you have other software that can't be updated, requires Latin-1, and needs to share the same database. Or maybe you've mixed Latin-1 text and binary data in ways that can't be programmatically unmixed, or your database is just too huge to migrate, or whatever. In that case, you have two choices:
Destructively convert your strings to Latin-1 before storing and searching. For example, you might want to convert an em dash to -, or to --, or maybe it's not all that important and you can just convert all non-Latin-1 characters to ? (which is faster and simpler).
Come up with an encoding scheme to smuggle non-Latin-1 characters into the database. This means some searches become more complicated, or just can't be done directly in the database.
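Both fallback choices can be sketched in a few lines of Python 3. The function names and the replacement table are hypothetical, and the "smuggling" scheme shown (XML character references) is just one possible scheme, not the one the answer had in mind.

```python
import html

# Hand-picked ASCII fallbacks; extend as needed (em dash, en dash).
LATIN1_FALLBACKS = {"\u2014": "--", "\u2013": "-"}


def to_latin1_lossy(text):
    """Choice 1: destructive conversion. Unknown characters become '?'."""
    for char, fallback in LATIN1_FALLBACKS.items():
        text = text.replace(char, fallback)
    return text.encode("latin-1", errors="replace").decode("latin-1")


def smuggle(text):
    """Choice 2: store non-Latin-1 characters as &#NNNN; references."""
    return text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")


def unsmuggle(stored):
    """Reverse smuggle(); also rewrites any references already in the text."""
    return html.unescape(stored)


headline = "Lesnie \u2014 caf\u00e9 \u4e2d"
lossy = to_latin1_lossy(headline)
stored = smuggle(headline)
restored = unsmuggle(stored)
```

The reversible variant shows the search problem the answer mentions: a database-side search for an em dash has to look for the literal text `&#8212;` instead. It also misbehaves if the original text already contains entity-like sequences such as `&amp;`, so a real scheme would need to escape those first.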
This might be a heavy read, but at least got me started.
http://www.joelonsoftware.com/articles/Unicode.html
I have the following code. I am using Python 2.7.
import csv
import sqlite3
conn = sqlite3.connect('torrents.db')
c = conn.cursor()
# Create table
c.execute('''DROP TABLE torrents''')
c.execute('''CREATE TABLE IF NOT EXISTS torrents
(name text, size long, info_hash text, downloads_count long,
category_id text, seeders long, leechers long)''')
with open('torrents_mini.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter='|')
    for row in spamreader:
        name = unicode(row[0])
        size = row[1]
        info_hash = unicode(row[2])
        downloads_count = row[3]
        category_id = unicode(row[4])
        seeders = row[5]
        leechers = row[6]
        c.execute('INSERT INTO torrents (name, size, info_hash, downloads_count, '
                  'category_id, seeders, leechers) VALUES (?,?,?,?,?,?,?)',
                  (name, size, info_hash, downloads_count, category_id, seeders, leechers))
conn.commit()
conn.close()
The error message I receive is
Traceback (most recent call last):
File "db.py", line 15, in <module>
name = unicode(row[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)
If I don't convert to unicode, then the error I get is
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
Adding name = row[0].decode('UTF-8') gives me another error
Traceback (most recent call last):
File "db.py", line 27, in <module>
for row in spamreader:
_csv.Error: line contains NULL byte
The data contained in the CSV file is in the following format
Tha Twilight New Moon DVDrip 2009 XviD-AMiABLE|694554360|2cae2fc76d110f35917d5d069282afd8335bc306|0|movies|0|1
Edit: I finally gave up on this approach and accomplished the task using the sqlite3 command-line tool (it was quite easy).
I do not yet know what caused the errors, but when sqlite3 was importing the CSV file, it kept showing warnings about an "unescaped character", the character being the double quote (").
Thanks to everyone who tried to help.
Your data is not encoded as ASCII. Use the correct codec for your data.
You can tell Python what codec to use with:
unicode(row[0], correct_codec)
or use the str.decode() method:
row[0].decode(correct_codec)
What that correct codec is, we cannot tell you. You'll have to consult whatever you got the file from.
If you cannot figure out what encoding was used, you could use a package like chardet to make an educated guess, but take into account that such a library is not fail-proof.
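As an illustration of passing the codec explicitly, here is a minimal Python 3 sketch that reads a pipe-delimited file and inserts the rows into sqlite3. The answer itself targets Python 2's unicode() and str.decode(); here the codec is given to open() instead. UTF-8 is only an assumption (the 0xc3 byte in the error is consistent with it but does not prove it), and `load_rows`, the shortened table, and the sample line are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile


def load_rows(path, codec):
    """Decode the file with the given codec and insert each row into sqlite3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE torrents (name text, size integer, info_hash text)")
    # The codec is applied while reading, so csv only ever sees decoded text.
    with open(path, newline="", encoding=codec) as csvfile:
        for row in csv.reader(csvfile, delimiter="|"):
            conn.execute(
                "INSERT INTO torrents VALUES (?,?,?)",
                (row[0], int(row[1]), row[2]),
            )
    conn.commit()
    return conn


# Hypothetical sample file containing one non-ASCII name, written as UTF-8.
sample = "Am\u00e9lie 2001 DVDrip|694554360|2cae2fc7\n"
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))

conn = load_rows(tmp.name, "utf-8")
names = [name for (name,) in conn.execute("SELECT name FROM torrents")]
os.remove(tmp.name)
```

If the codec is wrong, this either fails with UnicodeDecodeError while reading or silently stores mangled names, which is why the answer says to check where the file came from.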
I am still learning Python, and as a little project I wrote a script that takes the values I have in a text file and inserts them into a sqlite3 database. But some of the names have weird letters (I guess you would call them non-ASCII), and they generate an error when they come up. Here is my little script (and please tell me if there is any way it could be more Pythonic):
import sqlite3
f = open('complete', 'r')
fList = f.readlines()
conn = sqlite3.connect('tpb')
cur = conn.cursor()
for i in fList:
    exploaded = i.split('|')
    eList = (
        (exploaded[1], exploaded[5])
    )
    cur.execute('INSERT INTO magnets VALUES(?, ?)', eList)
conn.commit()
cur.close()
And it generates this error:
Traceback (most recent call last):
File "C:\Users\Admin\Desktop\sortinghat.py", line 13, in <module>
cur.execute('INSERT INTO magnets VALUES(?, ?)', eList)
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
To get the file contents into unicode you need to decode from whichever encoding it is in.
It looks like you're on Windows so a good bet is cp1252.
If you got the file from somewhere else all bets are off.
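To see why the choice of encoding matters, here is a small Python 3 sketch decoding the same bytes three ways. The sample line is made up; it contains cp1252 "smart quotes" (bytes 0x93 and 0x94), which are typical of text saved on Windows.

```python
# Hypothetical line from the input file, as raw bytes.
raw_line = b"Caf\xe9 \x93Live\x94|magnet-link\n"

# Correct guess: accents and curly quotes come out as intended.
as_cp1252 = raw_line.decode("cp1252")

# Wrong guess that still "works": latin-1 never fails, but 0x93 and 0x94
# become invisible control characters instead of quotes.
as_latin1 = raw_line.decode("latin-1")

# Wrong guess that fails loudly: these bytes are not valid UTF-8.
try:
    raw_line.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

A decode that raises is the easier case to notice; a codec that accepts every byte can hide a wrong guess until someone looks at the stored names.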
Once you have the encoding sorted, an easy way to decode is to use the codecs module, e.g.:
import codecs
# ...
with codecs.open('complete', encoding='cp1252') as fin: # or utf-8 or whatever
    for line in fin:
        to_insert = (line.split('|')[1], line.split('|')[5])
        cur.execute('INSERT INTO magnets VALUES (?,?)', to_insert)
conn.commit()
# ...