Python Utf-8 writing into CSV


I can't manage to save already-encoded data into a CSV file. I could decode the CSV file afterwards, but I'd rather do all the data cleaning beforehand. I managed to save the text alone, but as soon as I add a timestamp it fails.
What am I doing wrong? I read that if str() and .encode() don't work I should try .join instead, but that doesn't help either.
error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
code:
def on_data(self, data):
    try:
        #print data
        tweet = data.split(',"text":"')[1].split('","source')[0]
        x = tweet.encode('utf-8')
        y = x.decode('unicode-escape')
        print y
        saveThis = y
        #saveThis = str(time.time())+'::' + tweet.decode('ascii', 'ignore')
        #saveThis = u' '.join((time.time()+'::'+tweet)).encode('utf-8')
        saveFile = open('twitDB.csv', 'a')
        saveFile.write(saveThis)
        saveFile.write('\n')
        saveFile.close()
        return True
    except BaseException, e:
        print 'fail on data,', str(e)
        time.sleep(5)

def on_error(self, status):
    print status

First of all, parse your JSON data properly with the json module instead of splitting the raw string by hand.
Next, don't catch BaseException; you have no reason to catch memory errors or keyboard interrupts here. Catch more specific exceptions instead.
Next, encode your data before writing:
def on_data(self, data):
    try:
        tweet = json.loads(data)['text']
    except (ValueError, KeyError), e:
        # Not JSON or no text key
        print 'fail on data {}'.format(data)
        return

    with open('twitDB.csv', 'a') as save_file:
        save_file.write(tweet.encode('utf8') + '\n')
    return True
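For Python 3, where str is already Unicode, a minimal sketch of the same idea might look like the following. It assumes, as the answer does, that each tweet arrives as one JSON string with a "text" key, and it adds the timestamp the question asked about; the function name save_tweet is hypothetical. Instead of encoding by hand, it opens the file with encoding="utf-8" and lets the csv module quote commas or newlines inside tweets.

```python
import csv
import json
import time

def save_tweet(raw_json, path="twitDB.csv"):
    """Hypothetical helper: append one (timestamp, text) row; return False for bad input."""
    try:
        text = json.loads(raw_json)["text"]
    except (ValueError, KeyError):
        # Not JSON (json.JSONDecodeError subclasses ValueError) or no "text" key
        return False
    # The file object encodes str to UTF-8 on write; newline="" is what the csv docs recommend
    with open(path, "a", encoding="utf-8", newline="") as f:
        csv.writer(f).writerow([time.time(), text])
    return True
```

Reading the file back with csv.reader and the same encoding returns the original text, commas and non-ASCII characters included.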

Related

ERROR uploading: 'latin-1' codec can't encode character '\u2019' with JSON data upload

I am using Python to upload some JSON data to the application UI, but I am getting the following error.
ERROR uploading: 'latin-1' codec can't encode character '\u2019' in position 5735: Body ('’') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
The program takes its input from a sample.json file that includes a special character (’), and that character triggers the error.
Value: amex%?'
My code looks like:
def read_from_file(file_path, target_path=None):
    try:
        f = open(file_path, "r")
        data = json.load(f)
        f.close()
        if target_path:
            result_obj = []
            for obj in data:
                if target_path in obj['Key']:
                    result_obj.append(obj)
            data = result_obj
    except Exception as e:
        print ("ERROR reading file:", e, file=sys.stderr)
        exit(1)
    return data

def upload(server, token, data):
    params = {"token": token}
    for obj in data:
        try:
            payload = obj['Value']
            url = server + obj['Key']
            response = requests.put(url, data=payload, params=params)
            if response.status_code != 200:
                raise Exception("HTTP code %s on PUT %s" % (response.status_code, url))
        except Exception as e:
            print ("ERROR uploading:", e, file=sys.stderr)
            exit(1)
Can somebody please advise where I need to change my code so that values containing this character can be uploaded?
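The error message itself names a fix. It comes from Python's http.client, which encodes a str body as Latin-1 before sending, and U+2019 has no Latin-1 encoding; passing bytes skips that implicit step. A minimal sketch, assuming the server expects UTF-8: read the JSON file explicitly as UTF-8 and encode each value before the PUT. The names load_entries and encode_body are hypothetical, and only the parts that do not need a server are shown.

```python
import json

def load_entries(file_path, target_path=None):
    """Read the JSON list explicitly as UTF-8 and optionally filter on 'Key'."""
    with open(file_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if target_path:
        data = [obj for obj in data if target_path in obj["Key"]]
    return data

def encode_body(value):
    """Turn a str value into UTF-8 bytes so no implicit Latin-1 encoding happens."""
    return value.encode("utf-8")
```

In upload() the call would then become requests.put(url, data=encode_body(obj['Value']), params=params); whether the application also needs a charset in the Content-Type header depends on that server.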

Cannot post a zip file in Python. Unicode decoding error

When trying to submit a zip file using urllib2 I am getting a UnicodeDecodeError with the following messages:
Exception during urlopen: 'ascii' codec can't decode byte 0xf1 in position 12: ordinal not in range(128)
Exception: 'ascii' codec can't decode byte 0xf1 in position 12: ordinal not in range(128)
Exception of type: <type 'exceptions.UnicodeDecodeError'>
Exception. Message: "". Doc: "Unicode decoding error.".
Exception during export:
e.__doc__=Unicode decoding error.
The exception is raised on the line response = urllib2.urlopen(request).
def depositZipFile(tempZipFileName, tempZipFilePath, depositUrl, tr):
    print('depositZipFile(). tempZipFileName=%s, tempZipFilePath=%s, depositUrl=%s, tr=%s' % (tempZipFileName, tempZipFilePath, depositUrl, str(tr)))
    with open(tempZipFilePath, 'rb') as f:
        zipData = f.read()
    print('depositZipFile(). type(zipData)=%s' % type(zipData))
    headers = {
        'In-Progress': 'true',
        'Content-Disposition': 'filename=' + tempZipFileName,
        'Content-Type': 'application/zip',
        'Content-Length': os.stat(tempZipFilePath).st_size,
        'Content-Transfer-Encoding': 'binary',
        'Packaging': 'http://purl.org/net/sword/package/METSDSpaceSIP',
    }
    try:
        request = urllib2.Request(depositUrl, data=zipData, headers=headers)
        try:
            response = urllib2.urlopen(request)
        except Exception as e:
            print('Exception during urlopen: ' + str(e))
            raise e
        print('Got response. response=%s' % str(response))
        xmlText = response.read()
        xmlRoot = ET.fromstring(xmlText)
        linkElement = xmlRoot.find('xmlns:link[@rel="alternate"]', namespaces=dict(xmlns='http://www.w3.org/2005/Atom'))
        if linkElement is None:
            raise ValueError('No redirection URL is found in the response.')
        href = linkElement.attrib['href']
        return href
    except urllib2.HTTPError as e:
        print('HTTPError: ' + str(e))
        print('HTTPError: %s' % str(e.code))
        print('HTTPError message: %s' % e.read())
        raise e
    except Exception as e:
        print('Exception: ' + str(e))
        print('Exception of type: %s' % type(e))
        print('Exception. Message: "%s". Doc: "%s".' % (e.message, e.__doc__))
        raise e
Before the aforementioned method is called the user is authenticated using basic authentication. See the following method.
def authenticateUser(tr, url):
    user = getConfigurationProperty(tr, 'user')
    password = getConfigurationProperty(tr, 'password')
    realm = getConfigurationProperty(tr, 'realm')

    pm = urllib2.HTTPPasswordMgr()
    pm.add_password(realm, url, user, password)
    authHandler = urllib2.HTTPBasicAuthHandler(pm)
    opener = urllib2.build_opener(authHandler)
    urllib2.install_opener(opener)
I am very new to Python and maybe I am missing something obvious. Please advise.
I am using Python 2.7, Jython implementation.

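Apparently the problem was that depositUrl was of type unicode instead of str. In Python 2 that makes the request line built by urllib2/httplib unicode as well, and when the binary zip data (a byte str) is appended to it, Python implicitly decodes those bytes as ASCII, which fails on the first non-ASCII byte. When I made the following conversion, everything started working:
depositUrl = str(depositUrl)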
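For comparison, a rough Python 3 sketch of the same request built with urllib.request (not Jython/urllib2 as in the question). In Python 3, str and bytes are never combined implicitly (doing so raises TypeError), so a text URL cannot silently push the binary body through an ASCII decode. The function name build_deposit_request is hypothetical, the headers are copied from the question, and the request is only built here, not sent.

```python
import os
import urllib.request

def build_deposit_request(deposit_url, zip_path):
    """Hypothetical helper: build (but do not send) the SWORD deposit request."""
    with open(zip_path, "rb") as f:
        zip_data = f.read()  # stays bytes; never decoded
    headers = {
        "In-Progress": "true",
        "Content-Disposition": "filename=" + os.path.basename(zip_path),
        "Content-Type": "application/zip",
        "Content-Length": str(len(zip_data)),  # header values as str
        "Content-Transfer-Encoding": "binary",
        "Packaging": "http://purl.org/net/sword/package/METSDSpaceSIP",
    }
    # str(...) mirrors the fix above; in Python 3 a URL is normally already a str
    return urllib.request.Request(str(deposit_url), data=zip_data, headers=headers)
```

Sending it would then be urllib.request.urlopen(req), with authentication configured through an opener built with urllib.request.HTTPBasicAuthHandler, the Python 3 counterpart of the handler used in authenticateUser.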

TypeError: Can't convert 'bytes' object to str implicitly while working with sockets

So I'm trying to convert code from Python 2.7 to Python 3, and it seems as though something has changed. I'm trying to receive binary data over a socket and now it doesn't work. Here's my code.
EDIT: I have added my send code. Also, I don't really like the way it works right now; it's overcomplicated. If possible, it would be nice to have a better way of sending/receiving data.
def recv(self):
    # Receive the length of the incoming message (unpack the binary data)
    dataLength = socket.ntohl(struct.unpack("I", self._recv(4))[0])
    # Receive the actual data
    return self._recv(dataLength)

def _recv(self, length):
    try:
        data = ''
        recvLen = 0
        while recvLen < length:
            newData = self.sock.recv(length-recvLen)
            if newData == '':
                self.isConnected = False
                raise exceptions.NetworkError(errors.CLOSE_CONNECTION, errno=errors.ERR_CLOSED_CONNECTION)
            data = data + newData # TypeError here
            recvLen += len(newData)
        return data
    except socket.error as se:
        raise exceptions.NetworkError(str(se))

def send(self, data):
    if type(data) is not str:
        raise TypeError()
    dataLength = len(data)
    # Send the length of the message (int converted to network byte order and packed as binary data)
    self._send(struct.pack("I", socket.htonl(dataLength)), 4)
    # Send the actual data
    self._send(data, dataLength)

def _send(self, data, length):
    sentLen = 0
    while sentLen < length:
        try:
            amountSent = self.sock.send(data[sentLen:])
        except Exception:
            self.isConnected = False
            raise exceptions.NetworkError(errors.UNEXPECTED_CLOSE_CONNECTION)
        if amountSent == 0:
            self.isConnected = False
            raise exceptions.NetworkError(errors.UNEXPECTED_CLOSE_CONNECTION)
        sentLen += amountSent

In Python 3, sock.recv() returns bytes, so you have to decode them to a string:
data = data + newData.decode('utf-8')
# or
data = data + newData.decode('ascii')
If you need the data as bytes, start with
data = b''
and keep it without .decode():
data = data + newData
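One caveat with decoding each chunk as it arrives: a multi-byte UTF-8 character can be split across two recv() calls, and decoding the first chunk on its own then fails. The sketch below simulates such a split without a socket (the helper names are made up for illustration); accumulating bytes and decoding once at the end, as the EDIT part of this answer does, avoids the problem.

```python
def decode_per_chunk(chunks):
    """Decode every chunk on its own (breaks if a character spans two chunks)."""
    return "".join(chunk.decode("utf-8") for chunk in chunks)

def decode_once(chunks):
    """Join the raw bytes first, then decode the complete message once."""
    return b"".join(chunks).decode("utf-8")

payload = "h\u00e9llo".encode("utf-8")  # b'h\xc3\xa9llo': the accented e takes two bytes
chunks = [payload[:2], payload[2:]]     # simulate recv() splitting inside that character

print(decode_once(chunks))  # héllo
try:
    decode_per_chunk(chunks)
except UnicodeDecodeError as exc:
    print("per-chunk decode failed:", exc)
```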
EDIT: for the new code in the question.
When you send, you have to encode the string to bytes first and only then take its length: a non-ASCII character has length 1 in a str but can take 2 or more bytes once encoded.
When you receive, you have to work with bytes (b'') and only at the end decode the bytes back to a string.
See the comments marked # <-- in the code.
def send(self, data):
    #if type(data) is not str:
    if not isinstance(data, str):  # <-- preferred method
        raise TypeError()

    data = data.encode('utf-8')  # <-- convert to bytes

    # get size of bytes
    dataLength = len(data)

    # Send the length of the message (int converted to network byte order and packed as binary data)
    self._send(struct.pack("I", socket.htonl(dataLength)), 4)

    # Send the actual data
    self._send(data, dataLength)

def recv(self):
    # Receive the length of the incoming message (unpack the binary data)
    dataLength = socket.ntohl(struct.unpack("I", self._recv(4))[0])

    # Receive the actual data
    return self._recv(dataLength).decode('utf-8')  # <-- convert to string again

def _recv(self, length):
    try:
        data = b''  # <-- use bytes
        recvLen = 0
        while recvLen < length:
            newData = self.sock.recv(length-recvLen)

            #if newData == b'':  # <-- use bytes
            if not newData:  # <-- or
                self.isConnected = False
                raise exceptions.NetworkError(errors.CLOSE_CONNECTION, errno=errors.ERR_CLOSED_CONNECTION)

            data = data + newData  # <-- bytes + bytes, so no more TypeError
            recvLen += len(newData)
        return data
    except socket.error as se:
        raise exceptions.NetworkError(str(se))
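Since the question also asked for a simpler way, here is a minimal sketch of the same length-prefixed protocol, assuming both ends run Python 3 and exchange UTF-8 text. The "!I" struct format packs the length directly in network byte order, which should give the same 4-byte big-endian header as htonl plus "I", and sock.sendall() replaces the manual send loop. The function names are hypothetical, and the built-in ConnectionError stands in for the question's exceptions.NetworkError.

```python
import socket
import struct

# 4-byte unsigned length prefix in network (big-endian) byte order
HEADER = struct.Struct("!I")

def send_msg(sock, text):
    """Encode first, then measure: the prefix counts bytes, not characters."""
    payload = text.encode("utf-8")
    # sendall() keeps sending until every byte is out (or raises)
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exact(sock, length):
    """Read exactly `length` bytes, collecting them as bytes."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # b'' means the peer closed the connection
            raise ConnectionError("connection closed before the message was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    """Read one length-prefixed message and decode it once it is complete."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length).decode("utf-8")

if __name__ == "__main__":
    # Usage: a connected pair of local sockets standing in for client and server
    a, b = socket.socketpair()
    send_msg(a, "h\u00e9llo")
    print(recv_msg(b))  # héllo
    a.close()
    b.close()
```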

Use of subprocess.call results in "too many open files"

I have the following code to create thumbnails and save images. However, after about 1000 items it raises an error saying too many open files. Where is this coming from? And how would I fix the code?
def download_file(url, extension='jpg'):
    """ Download a large file. Return path to saved file.
    """
    req = requests.get(url)
    if not req.ok:
        return None

    guid = str(uuid.uuid4())
    tmp_filename = '/tmp/%s.%s' % (guid, extension)
    with open(tmp_filename, 'w') as f:
        for chunk in req.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                f.flush()
    return tmp_filename

def update_artwork_item(item):
    # Download the file
    tmp_filename = util.download_file(item.artwork_url)

    # Create thumbs
    THUMB_SIZES = [(1000, 120), (1000, 30)]
    guid = str(uuid.uuid4())
    S3_BASE_URL = 'https://s3-us-west-1.amazonaws.com/xxx/'

    try:
        for size in THUMB_SIZES:
            outfile = '%s_%s.jpg' % (guid, size[1])
            img = Image.open(tmp_filename).convert('RGB')
            img.thumbnail(size, Image.ANTIALIAS)
            img.save(outfile, "JPEG")
            s3_cmd = '%s %s premiere-avails --norr --public' % (S3_CMD, outfile) ## doesn't work half the time
            x = subprocess.check_call(shlex.split(s3_cmd))
            if x: raise
            subprocess.call(['rm', outfile], stdout=FNULL, stderr=subprocess.STDOUT)
    except Exception, e:
        print '&&&&&&&&&&', Exception, e
    else:
        # Save the artwork icons
        item.artwork_120 = S3_BASE_URL + guid + '_120.jpg'
        item.artwork_30 = S3_BASE_URL + guid + '_30.jpg'

        # hack to fix parallel saving
        while True:
            try:
                item.save()
            except Exception, e:
                print '******************', Exception, e
                time.sleep(random.random()*1e-1)
                continue
            else:
                subprocess.call(['rm', tmp_filename], stdout=FNULL, stderr=subprocess.STDOUT)
                break

Look closely at your use of subprocess.call, though not for the reason you might expect: subprocess.call is synchronous. It waits for the command to finish and returns its exit code (an integer), not a pipe object, so by itself it leaves no handle for you to close (see the documentation). check_call behaves the same way, except that it raises CalledProcessError on a non-zero exit code.
Even so, by far the easiest thing to do is to remove the files from Python by calling os.remove instead of spawning the Unix rm command for each one. To track down the descriptors that do leak, check every file you open per item: for example, if FNULL is opened with open(os.devnull, 'w') on every call and never closed, that alone leaks one descriptor per item.
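Following the os.remove suggestion, a small Python 3 sketch (the helper name temp_path is hypothetical) that creates a temporary file, closes the descriptor tempfile.mkstemp() returns so it cannot pile up, and always deletes the file with os.remove() instead of running rm in a subprocess, even when the upload step raises:

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def temp_path(suffix=".jpg"):
    """Yield a fresh temporary file path; delete the file on exit, even after errors."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # mkstemp returns an open descriptor; close it so it does not leak
    try:
        yield path
    finally:
        try:
            os.remove(path)  # delete from Python instead of spawning `rm`
        except FileNotFoundError:
            pass  # already removed by the caller
```

In update_artwork_item each thumbnail could then be saved and uploaded inside a `with temp_path() as outfile:` block, so the per-file rm calls go away.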

md5 search using exceptions

import httplib
import re
md5 = raw_input('Enter MD5: ')
conn = httplib.HTTPConnection("www.md5.rednoize.com")
conn.request("GET", "?q="+ md5)
try:
    response = conn.getresponse()
    data = response.read()
    result = re.findall('<div id="result" >(.+?)</div', data)
    print result
except:
    print "couldnt find the hash"

raw_input()
I know I'm probably implementing the code wrong, but which exception should I use for this? If it can't find the hash, it should raise an exception and print "couldnt find the hash".

re.findall doesn't raise an exception when nothing matches; it returns an empty list. So an exception is probably not how you want to check for results. Instead, you could write something like
result = re.findall('<div id="result" >(.+?)</div', data)
if result:
    print result
else:
    print 'Could not find the hash'

If you really want an exception there, you can define your own:
class MyError(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)

try:
    response = conn.getresponse()
    data = response.read()
    result = re.findall('<div id="result" >(.+?)</div', data)
    if not result:
        raise MyError("Could not find the hash")
except MyError:
    raise
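A Python 3 sketch of the same idea that also produces the message the asker wanted, kept separate from the HTTP request so it can be tried on a plain string. HashNotFoundError and the helper names are hypothetical. Subclassing LookupError (or plain Exception) is enough, because the base class already stores the message and str(exc) returns it, so no custom __init__ or __str__ is needed.

```python
import re

class HashNotFoundError(LookupError):
    """Raised when the lookup page contains no result div."""

RESULT_RE = re.compile(r'<div id="result" >(.+?)</div')

def extract_result(html):
    matches = RESULT_RE.findall(html)
    if not matches:  # findall returns [] instead of raising
        raise HashNotFoundError("Could not find the hash")
    return matches[0]

def describe(html):
    """Return the result text, or the error message when there is none."""
    try:
        return extract_result(html)
    except HashNotFoundError as exc:
        return str(exc)  # the message passed to the constructor
```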
