UnicodeEncodeError: 'ascii' - python

Sorry guys I'm really new at this.. Here is the full python script.
The purpose of the script is to read two different 1 wire temperature sensors and then use HTTP post to write those values into a mysql database.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests
import hashlib
import time
#Dont forget to fill in PASSWORD and URL TO saveTemp (twice) in this file
sensorids = ["28-00000", "28-000004"]
avgtemperatures = []
for sensor in range(len(sensorids)):
temperatures = []
for polltime in range(0,3):
text = '';
while text.split("\n")[0].find("YES") == -1:
# Open the file that we viewed earlier so that python can see what is in it. Replace the serial number as before.
tfile = open("/sys/bus/w1/devices/"+ sensorids[sensor] +"/w1_slave")
# Read all of the text in the file.
text = tfile.read()
# Close the file now that the text has been read.
tfile.close()
time.sleep(1)
# Split the text with new lines (\n) and select the second line.
secondline = text.split("\n")[1]
# Split the line into words, referring to the spaces, and select the 10th word (counting from 0).
temperaturedata = secondline.split(" ")[9]
# The first two characters are "t=", so get rid of those and convert the temperature from a string to a number.
temperature = float(temperaturedata[2:])
# Put the decimal point in the right place and display it.
temperatures.append(temperature / 1000 * 9.0 / 5.0 + 32.0)
avgtemperatures.append(sum(temperatures) / float(len(temperatures)))
print avgtemperatures[0]
print avgtemperatures[1]
session = requests.Session()
# Getting a fresh nonce which we will use in the authentication step.
nonce = session.get(url='http://127.0.0.1/temp/saveTemp.php?step=nonce').text
# Hashing the nonce, the password and the temperature values (to provide some integrity).
response = hashlib.sha256('{}PASSWORD{}{}'.format(nonce.encode('utf8'), *avgtemperatures).hexdigest())
# Post data of the two temperature values and the authentication response.
post_data = {'response':response, 'temp1':avgtemperatures[0], 'temp2': avgtemperatures[1]}
post_request = session.post(url='http://127.0.0.1/temp/saveTemp.php', data=post_data)
if post_request.status_code == 200 :
print post_request.text
Below is the NEW error that I get.
Traceback (most recent call last):
File "/var/www/pollSensors.py", line 42, in <module>
response = hashlib.sha256('{}PASSWORD{}{}'.format(nonce.encode('utf8'), *avgtemperatures).hexdigest())
AttributeError: 'str' object has no attribute 'hexdigest'

nonce is a unicode value; session.get(..).text is always unicode.
You are trying to force that value into a string without explicitly providing an encoding. As a result Python is trying to encode it for you with the default ASCII codec. That encoding is failing.
Encode your Unicode values to strings explicitly instead. For a SHA 256 hash, UTF-8 is probably fine.
response = hashlib.sha256(nonce.encode('utf8') + 'PASSWORD' +
str(avgtemperatures[0]) +
str(avgtemperatures[1])).hexdigest()
or use string templating:
response = hashlib.sha256('{}PASSWORD{}{}'.format(
nonce.encode('utf8'), *avgtemperatures)).hexdigest()

I got the similar issue except that it was decimal instead of ascci
remove directory: your-profile.spyder2\spyder.lock

Related

Multi line base64 string returns only last line after decoding in Python 3

I have a set of base64 encoded strings. When I try to decode the same, the decoded text just has the last line of the actual string. On any online base64 decoder, the entire text is displayed. I have tried with and without encoding the text to utf-8 prior to decoding.
Here is my code:
import base64
encodedStr = "QXJyaXZhbCBNZXNzYWdlDURhdGUgKFVUQyk6IDI2SlVMMjAN"
#with encoding
encodedstr_bytes = encodedStr.encode('utf-8')
decodedBytes = base64.b64decode(encodedstr_bytes)
decodedStr = decodedBytes.decode('utf8')
print(decodedStr)
#without encoding
decodedBytes = base64.b64decode(encodedStr)
decodedStr = decodedBytes.decode("utf-8")
print(decodedStr)
Output :
Date (UTC): 26JUL20
Date (UTC): 26JUL20
Required Output :
Arrival Message
Date (UTC): 26JUL20
Your string has a carriage return ("\r"), but no linefeed ("\n"). On Windows, this instructs the printer to return to the start of the line and overwrite what is there. The following code behaves the same:
print("foo\rbar") # Prints "bar"
print("fooqux\rbar") # Prints "barqux"

UnicodeEncodeError in Python CSV manipulation script

I have a script that was working earlier but now stops due to UnicodeEncodeError.
I am using Python 3.4.3.
The full error message is the following:
Traceback (most recent call last):
File "R:/A/APIDevelopment/ScivalPubsExternal/Combine/ScivalPubsExt.py", line 58, in <module>
outputFD.writerow(row)
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x8a' in position 413: character maps to <undefined>
How can I address this error?
The Python script is the following below:
import pdb
import csv,sys,os
import glob
import os
import codecs
os.chdir('R:/A/APIDevelopment/ScivalPubsExternal/Combine')
joinedFileOut='ScivalUpdate'
csvSourceDir="R:/A/APIDevelopment/ScivalPubsExternal/Combine/AustralianUniversities"
# create dictionary from Codes file (Institution names and codes)
codes = csv.reader(open('Codes.csv'))
#rows of the file are stored as lists/arrays
InstitutionCodesDict = {}
InstitutionYearsDict = {}
for row in codes:
#keys: instnames, #values: instcodes
InstitutionCodesDict[row[0]] = row[1]
#define year dictionary with empty values field
InstitutionYearsDict[row[0]] = []
#to create a fiel descriptor for the outputfile, wt means text mode (also rt opr r is the same)
with open(joinedFileOut,'wt') as csvWriteFD:
#write the file (it is still empty here)
outputFD=csv.writer(csvWriteFD,delimiter=',')
#with closes the file at the end, if exception occurs then before that
# open each scival file, create file descriptor (encoding needed) and then read it and print the name of the file
if not glob.glob(csvSourceDir+"/*.csv"):
print("CSV source files not found")
sys.exit()
for scivalFile in glob.glob(csvSourceDir+"/*.csv"):
#with open(scivalFile,"rt", encoding="utf8") as csvInFD:
with open(scivalFile,"rt", encoding="ISO-8859-1") as csvInFD:
fileFD = csv.reader(csvInFD)
print(scivalFile)
#create condition for loop
printon=False
#reads all rows in file and creates lists/arrays of each row
for row in fileFD:
if len(row)>1:
#the next printon part is skipped when looping through the rows above the data because it is not set to true
if printon:
#inserts instcode and inst sequentially to each row where there is data and after the header row
row.insert(0, InstitutionCode)
row.insert(0, Institution)
if row[10].strip() == "-":
row[10] = " "
else:
p = row[10].zfill(8)
q = p[0:4] + '-' + p[4:]
row[10] = q
#writes output file
outputFD.writerow(row)
else:
if "Publications at" in row[1]:
#get institution name from cell B1
Institution=row[1].replace('Publications at the ', "").replace('Publications at ',"")
print(Institution)
#lookup institution code from dictionary
InstitutionCode=InstitutionCodesDict[Institution]
#printon gets set to TRUE after the header column
if "Title" in row[0]: printon=True
if "Publication years" in row[0]:
#get the year to print it later to see which years were pulled
year=row[1]
#add year to institution in dictionary
if not year in InstitutionYearsDict[Institution]:
InstitutionYearsDict[Institution].append(year)
# Write a report showing the institution name followed by the years for
# which we have that institution's data.
with open("Instyears.txt","w") as instReportFD:
for inst in (InstitutionYearsDict):
instReportFD.write(inst)
for yr in InstitutionYearsDict[inst]:
instReportFD.write(" "+yr)
instReportFD.write("\n")
Make sure to use the correct encoding of your source and destination files. You open files in three locations:
codes = csv.reader(open('Codes.csv'))
: : :
with open(joinedFileOut,'wt') as csvWriteFD:
outputFD=csv.writer(csvWriteFD,delimiter=',')
: : :
with open(scivalFile,"rt", encoding="ISO-8859-1") as csvInFD:
fileFD = csv.reader(csvInFD)
This should look something like:
# Use the correct encoding. If you made this file on
# Windows it is likely Windows-1252 (also known as cp1252):
with open('Codes.csv', encoding='cp1252') as f:
codes = csv.reader(f)
: : :
# The output encoding can be anything you want. UTF-8
# supports all Unicode characters. Windows apps tend to like
# the files to start with a UTF-8 BOM if the file is UTF-8,
# so 'utf-8-sig' is an option.
with open(joinedFileOut,'w', encoding='utf-8-sig') as csvWriteFD:
outputFD=csv.writer(csvWriteFD)
: : :
# This file is probably the cause of your problem and is not ISO-8859-1.
# Maybe UTF-8 instead? 'utf-8-sig' will safely handle and remove a UTF-8 BOM
# if present.
with open(scivalFile,'r', encoding='utf-8-sig') as csvInFD:
fileFD = csv.reader(csvInFD)
The error is caused by an attempt to write a string containing a U+008A character using the default cp1252 encoding of your system. It is trivial to fix, just declare a latin1 encoding (or iso-8859-1) for your output file (because it just outputs the original byte without conversion):
with open(joinedFileOut,'wt', encoding='latin1') as csvWriteFD:
But this will only hide the real problem: where does this 0x8a character come from? My advice is to intercept the exception and dump the line where it occurs:
try:
outputFD.writerow(row)
except UnicodeEncodeError:
# print row, the name of the file being processed and the line number
It is probably caused by one of the input files not being is-8859-1 encoded but more probably utf8 encoded...

Convertion between ISO-8859-2 and UTF-8 in Python

I'm wondering how can I convert ISO-8859-2 (latin-2) characters (I mean integer or hex values that represents ISO-8859-2 encoded characters) to UTF-8 characters.
What I need to do with my project in python:
Receive hex values from serial port, which are characters encoded in ISO-8859-2.
Decode them, this is - get "standard" python unicode strings from them.
Prepare and write xml file.
Using Python 3.4.3
txt_str = "ąęłóźć"
txt_str.decode('ISO-8859-2')
Traceback (most recent call last): File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
The main problem is still to prepare valid input for the "decode" method (it works in python 2.7.10, and thats the one I'm using in this project). How to prepare valid string from decimal value, which are Latin-2 code numbers?
Note that it would be uber complicated to receive utf-8 characters from serial port, thanks to devices I'm using and communication protocol limitations.
Sample data, on request:
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069
This is some sample data. ISO-8859-2 pushed into uint32, 4 chars per int.
bit of code that manages unboxing:
l = l[7:].replace(",", "").replace(".", "").replace("\n","").replace("\r","") # crop string from uart, only data left
vl = [l[0:2], l[2:4], l[4:6], l[6:8]] # list of bytes
vl = vl[::-1] # reverse them - now in actual order
To get integer value out of hex string I can simply use:
int_vals = [int(hs, 16) for hs in vl]
Your example doesn't work because you've tried to use a str to hold bytes. In Python 3 you must use byte strings.
In reality, if you're using PySerial then you'll be reading byte strings anyway, which you can convert as required:
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
s = ser.read(10)
# Py3: s == bytes
# Py2.x: s == str
my_unicode_string = s.decode('iso-8859-2')
If your iso-8895-2 data is actually then encoded to ASCII hex representation of the bytes, then you have to apply an extra layer of encoding:
with serial.Serial('/dev/ttyS1', 19200, timeout=1) as ser:
hex_repr = ser.read(10)
# Py3: hex_repr == bytes
# Py2.x: hex_repr == str
# Decodes hex representation to bytes
# Eg. b"A3" = b'\xa3'
hex_decoded = codecs.decode(hex_repr, "hex")
my_unicode_string = hex_decoded.decode('iso-8859-2')
Now you can pass my_unicode_string to your favourite XML library.
Interesting sample data. Ideally your sample data should be a direct print of the raw data received from PySerial. If you actually are receiving the raw bytes as 8-digit hexadecimal values, then:
#!python3
from binascii import unhexlify
data = b''.join(unhexlify(x)[::-1] for x in b'''\
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069'''.splitlines())
print(data.decode('iso-8859-2'))
Output:
W chuj bardzo długa nazwa jakiejś zapyziałej pipidówy, brudnej ulicyumer najgorszej rudery we wsi
Google Translate of Polish to English:
The dick very long name some zapyziałej Small Town , dirty ulicyumer worst hovel in the village
This topic is closed. Working code, that handles what need to be done:
x=177
x.to_bytes(1, byteorder='big').decode("ISO-8859-2")

Writing to UTF-16-LE text file with BOM

I've read a few postings regarding Python writing to text files but I could not find a solution to my problem. Here it is in a nutshell.
The requirement: to write values delimited by thorn characters (u00FE; and surronding the text values) and the pilcrow character (u00B6; after each column) to a UTF-16LE text file with BOM (FF FE).
The issue: The written-to text file has whitespace between each column that I did not script for. Also, it's not showing up right in UltraEdit. Only the first value ("mom") shows. I welcome any insight or advice.
The script (simplified to ease troubleshooting; the actual script uses a third-party API to obtain the list of values):
import os
import codecs
import shutil
import sys
import codecs
first = u''
textdel = u'\u00FE'.encode('utf_16_le') #thorn
fielddel = u'\u00B6'.encode('utf_16_le') #pilcrow
list1 = ['mom', 'dad', 'son']
num = len(list1) #pretend this is from the metadata profile
f = codecs.open('c:/myFile.txt', 'w', 'utf_16_le')
f.write(u'\uFEFF')
for item in list1:
mytext2 = u''
i = 0
i = i + 1
mytext2 = mytext2 + item + textdel
if i < (num - 1):
mytext2 = mytext2 + fielddel
f.write(mytext2 + u'\n')
f.close()
You're double-encoding your strings. You've already opened your file as UTF-16-LE, so leave your textdel and fielddel strings unencoded. They will get encoded at write time along with every line written to the file.
Or put another way, textdel = u'\u00FE' sets textdel to the "thorn" character, while textdel = u'\u00FE'.encode('utf-16-le') sets textdel to a particular serialized form of that character, a sequence of bytes according to that codec; it is no longer a sequence of characters:
textdel = u'\u00FE'
len(textdel) # -> 1
type(textdel) # -> unicode
len(textdel.encode('utf-16-le')) # -> 2
type(textdel.encode('utf-16-le')) # -> str

Unable to convert UTF-8 characters - Python

I receive a bunch of data into a variable using a mechanize and urllib in Python 2.7. However, certain characters are not decoded despite using .decode(UTF-8). The full code is as follows:
#!/usr/bin/python
import urllib
import mechanize
from time import time
total_time = 0
count = 0
def send_this(url):
global count
count = count + 1
this_browser=mechanize.Browser()
this_browser.set_handle_robots(False)
this_browser.addheaders=[('User-agent','Chrome')]
translated=this_browser.open(url).read().decode("UTF-8")
return translated
def collect_this(my_ltarget,my_lhome,data):
global total_time
data = data.replace(" ","%20")
get_url="http://mymemory.translated.net/api/ajaxfetch?q="+data+"&langpair="+my_lhome+"|"+my_ltarget+"&mtonly=1"
return send_this(get_url)
ctr = 0
print collect_this("hi-IN","en-GB","This is my first proper computer program.")
The output of the print statement is:
{"responseData":{"translatedText":"\u092f\u0939 \u092e\u0947\u0930\u093e \u092a\u0939
u0932\u093e \u0938\u092e\u0941\u091a\u093f\u0924 \u0915\u0902\u092a\u094d\u092f\u0942\u091f
\u0930 \u092a\u094d\u0930\u094b\u0917\u094d\u0930\u093e\u092e \u0939\u0948
\u0964"},"responseDetails":"","responseStatus":200,"matches":[{"id":0,"segment":"This is my
first proper computer program.","translation":"\u092f\u0939 \u092e\u0947\u0930\u093e \u092a
\u0939\u0932\u093e \u0938\u092e\u0941\u091a\u093f\u0924 \u0915\u0902\u092a\u094d\u092f\u0942
\u091f\u0930 \u092a\u094d\u0930\u094b\u0917\u094d\u0930\u093e\u092e \u0939\u0948
\u0964","quality":"70","reference":"Machine Translation provided by Google, Microsoft,
Worldlingo or MyMemory customized engine.","usage-count":0,"subject":"All","created-
by":"MT!","last-updated-by":"MT!","create-date":"2013-12-20","last-update-
date":"2013-12-20","match":0.85}]}
The characters starting with \u... are supposed to be the characters that were supposed to be converted.
Where have I gone wrong?
You don't have a UTF-8-encoded string. You have JSON with JSON unicode escapes in it. Decode it with a JSON decoder:
import json
json.loads(your_json_string)

Categories

Resources