UnicodeEncodeError Python 2.7 - python

I am using Tweepy for authentication and trying to print tweet text, but printing fails with a UnicodeEncodeError. I tried a few approaches but could not solve it.
# -*- coding: utf-8 -*-
import tweepy
consumer_key = ""
consumer_secret = ""
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
public_tweets = api.home_timeline()
for tweet in public_tweets:
    print tweet.text.decode("utf-8")+'\n'
Error:
(venv) C:\Users\e2sn7cy\Documents\GitHub\Tweepy>python tweepyoauth.py
Throwback to my favourite! Miss this cutie :) #AdityaRoyKapur https://t.co/sxm8g1qhEb/n
Cristiano Ronaldo: 3 hat-tricks in his last 3 matches.
Lionel Messi: 3 trophies in his last 3 matches. http://t.co/For1It4QxF/n
How to Bring the Outdoors in With Indoor Gardens http://t.co/efQjwcszDo http://t.co/1NLxSzHxlI/n
Traceback (most recent call last):
File "tweepyoauth.py", line 17, in <module>
print tweet.text.decode("utf-8")+'/n'
File "C:\myPython\venv\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)

This line, print tweet.text.decode("utf-8")+'/n', is the cause.
In Python 2, tweet.text is most likely already a unicode string; the traceback shows the error being raised inside decode. Calling decode("utf-8") on a unicode string makes Python first encode it implicitly with the ascii codec, and that implicit encode fails as soon as the text contains a non-ASCII character. That is why a decode call ends in a UnicodeEncodeError.
Concatenating with the byte string '/n' is not what raises the error (BTW, I think you really wanted \n), but it is cleaner to concatenate with a unicode string so that no implicit conversion takes place. Drop the decode call:
print tweet.text + u'\n'
If this is not enough, it could be because your environment cannot directly print unicode strings. Then you should explicitly encode the text in the native charset of your system:
print (tweet.text + u'\n').encode('cp850')
[replace 'cp850' (my charset) with the charset of your system]
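As a hedged illustration of that last step, the helper below turns a unicode string into bytes for a given console charset and replaces anything the charset cannot represent instead of raising. encode_for_console is a hypothetical name, not part of Tweepy, and the sample text and the 'ascii' target are assumptions chosen for the example.

```python
# -*- coding: utf-8 -*-
import sys


def encode_for_console(text, encoding=None):
    """Encode a unicode string for a console, replacing unsupported characters.

    Hypothetical helper for illustration; not part of Tweepy.
    """
    # Fall back to the console's own charset, then to ascii if it is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # errors="replace" substitutes "?" for characters the charset lacks.
    return text.encode(encoding, "replace")


sample = u"Miss this cutie \u2019 \u25bb"
safe_bytes = encode_for_console(sample, "ascii")
```

In Python 2, print encode_for_console(tweet.text, 'cp850') would then write the bytes as they are, with no implicit conversion left to fail.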

Related

How to encode a string using zero width steganography

I am running Python 3.7.x and am trying to figure out how to encode a string, {CTF-FLAG1}, using zero width steganography.
I am using zwsp-steg-py to do so, but I do not know how to use this to encode text into other text, see below:
I want to encode {CTF-FLAG1} inside of the text Now you see me, now you don't. using zero width steganography.
I installed zwsp-steg-py and tried:
#coding=utf-8
import zwsp_steg
encoded = zwsp_steg.encode("{CTF-Flag1}", zwsp_steg.MODE_ZWSP)
decoded = zwsp_steg.decode(encoded)
print(decoded)
Yet, the result is:
C:\Users\jerry\Desktop>python decode.py
Traceback (most recent call last):
File "decode.py", line 5, in <module>
decoded = zwsp_steg.decode(encoded)
File "C:\Python367-64\lib\site-packages\zwsp_steg\steganography.py", line 72, in decode
raise TypeError('Unknown encoding detected!')
TypeError: Unknown encoding detected!
I don't think I'm doing it right.
#coding=utf-8
import zwsp_steg
encoded = zwsp_steg.encode("{CTF-Flag1}", zwsp_steg.MODE_ZWSP)
decoded = zwsp_steg.decode(encoded, zwsp_steg.MODE_ZWSP)
print(decoded)
# example with string padding
encoded += "This is a test string"
print(encoded)
decoded_the_string = zwsp_steg.decode(encoded, zwsp_steg.MODE_ZWSP)
print(decoded_the_string)
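To show the idea of hiding one text inside another without relying on the library, here is a minimal hand-rolled sketch. It is not the scheme zwsp-steg-py uses: the two-character alphabet, the hide/reveal names, and the choice to insert the payload after the first cover character are all assumptions made for this example.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE stands for bit 0 (assumed mapping)
ONE = "\u200c"   # ZERO WIDTH NON-JOINER stands for bit 1 (assumed mapping)


def hide(secret, cover):
    """Embed secret in cover as a run of zero-width characters."""
    # Turn the secret into a string of bits, 8 per UTF-8 byte.
    bits = "".join(format(byte, "08b") for byte in secret.encode("utf-8"))
    payload = "".join(ONE if bit == "1" else ZERO for bit in bits)
    # Place the invisible payload right after the first visible character.
    return cover[:1] + payload + cover[1:]


def reveal(text):
    """Recover the secret by reading back only the zero-width characters."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


stego = hide("{CTF-FLAG1}", "Now you see me, now you don't.")
visible = stego.replace(ZERO, "").replace(ONE, "")
```

Printing stego shows only the cover sentence in most terminals, while reveal(stego) returns the hidden flag.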

Convert base64 encoded google service account key to JSON file using Python

Hello, I'm trying to convert a Google service account JSON key (contained in a base64-encoded field named privateKeyData in the file foo.json, more context here) into the actual JSON file. I need that format because Ansible only accepts that.
The foo.json file is obtained using this Google Python API method.
What I'm trying to do (though I am using Python) is also described in this thread, which by the way does not work for me (tried on OS X and Linux).
#!/usr/bin/env python3
import json
import base64
with open('/tmp/foo.json', 'r') as f:
    ymldict = json.load(f)
b64encodedCreds = ymldict['privateKeyData']
b64decodedBytes = base64.b64decode(b64encodedCreds,validate=True)
outputStr = b64decodedBytes
print(outputStr)
#issue
outputStr = b64decodedBytes.decode('UTF-8')
print(outputStr)
yields
./test.py
b'0\x82\t\xab\x02\x01\x030\x82\td\x06\t*\x86H\x86\xf7\r\x01\x07\x01\xa0\x82\tU\x04\x82\tQ0\x82\tM0\x82\x05q\x06\t*\x86H\x86\xf7\r\x01\x07\x01\xa0\x82\x05b\x04\x82\x05^0\x82\x05Z0\x82\x05V\x06\x0b*\x86H\x86\xf7\r\x01\x0c\n\x01\x02\xa0\x82\x#TRUNCATING HERE
Traceback (most recent call last):
File "./test.py", line 17, in <module>
outputStr = b64decodedBytes.decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 1: invalid start byte
I think I have run out of ideas and spent now more than a day on this :(
what am I doing wrong?
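Your base64 decoding logic looks fine to me. The problem is that the decoded bytes are not text in any character encoding, so no choice of charset will turn them into JSON. The output starts with 0\x82, which is how a DER-encoded ASN.1 structure begins, and the object identifiers that follow (*\x86H\x86\xf7\r\x01\x07\x01 and *\x86H\x86\xf7\r\x01\x0c\n\x01\x02) belong to PKCS#7 and PKCS#12. In other words, privateKeyData most likely holds a PKCS#12 (.p12) key file rather than a JSON key.
Check which privateKeyType was requested when the key was created. If it was the PKCS#12 type, either write the decoded bytes to a binary file and use it as a .p12 key, or create a new key with the Google credentials file type. In the latter case the decoded privateKeyData should be UTF-8 JSON and your original call should work:
b64decodedBytes.decode('UTF-8')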
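A quick way to find out what a decoded payload actually contains is to test it before decoding it as text. The sketch below is only a heuristic: describe_payload is a hypothetical helper, the two sample payloads are made up for the example, and a leading 0x30 byte merely suggests a DER structure (such as a PKCS#12 file) without proving it.

```python
import base64
import json


def describe_payload(b64_text):
    """Classify a base64 payload as JSON text, probable DER, or unknown."""
    raw = base64.b64decode(b64_text, validate=True)
    try:
        # JSON credentials are UTF-8 text, so this succeeds for a JSON key.
        return "json", json.loads(raw.decode("utf-8"))
    except ValueError:
        # Covers both UnicodeDecodeError and JSON parse errors.
        pass
    if raw[:1] == b"\x30":
        # 0x30 is the ASN.1 SEQUENCE tag that opens DER files such as PKCS#12.
        return "der", raw
    return "unknown", raw


# Made-up sample payloads, assumed for this example only.
json_b64 = base64.b64encode(b'{"type": "service_account"}').decode("ascii")
der_b64 = base64.b64encode(b"0\x82\t\xab\x02\x01\x03").decode("ascii")
```

If the result is "der", the bytes should be written with open(path, 'wb') rather than decoded to a string.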

Can't determine cause of SyntaxError: invalid character in identifier

I am attempting to use Brickman to use Python coding on a LEGO EV3. When I try to run my code I get the following error
>>robot#ev3dev:~$ python3 CoffeePi_Test/Main.py
File "CoffeePi_Test/Main.py", line 14
tags=[‘coffeepi’]
^
SyntaxError: invalid character in identifier
Here is my code. I am not sure what is triggering this error.
#!/usr/bin/env python3
#Import the necessary methods from tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
import time
import vending
import ev3dev.ev3 as ev3
from ev3dev.auto import *
tags=[legovend]
words=[]
#Variables that contains the user credentials to access Twitter API
#Two versions, redudancy if the first fails...
access_token = # redacted
access_token_secret = # redacted
consumer_key = # redacted
consumer_secret = # redacted
access_token2 = # redacted
access_token_secret2 = # redacted
consumer_key2 = # redacted
consumer_secret2 = # redacted
#This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        global mA,mB,home,cs, lastVend
        try:
            tweet = json.loads(data)
            tw
        except:
            pass
        if tags:
            if all(tag in tweet['text'].lower() for tag in tags):
                print (tweet['user']['screen_name'], ' - ', tweet['text'])
                if int(tweet['timestamp_ms'])>lastVend:
                    ev3.Leds.set_color(ev3.Leds.LEFT, ev3.Leds.RED)
                    ev3.Leds.set_color(ev3.Leds.RIGHT, ev3.Leds.RED)
                    vending.onTweet(mA, mB, cs, home)
                    time.sleep(2)
                    lastVend = int(round(time.time() * 1000))+1000
                    ev3.Leds.set_color(ev3.Leds.LEFT, ev3.Leds.GREEN)
                    ev3.Leds.set_color(ev3.Leds.RIGHT, ev3.Leds.GREEN)
        return True
    def on_error(self, status):
        print( status)
        return False
if __name__ == '__main__':
    #This handles Twitter authetification and the connection to Twitter Streaming API
    print( 'here')
    global mA,mB,home,cs, lastVend
    mA = ev3.MediumMotor('outA')
    mB = ev3.MediumMotor('outB')
    home = mA.position - 40
    cs=ColorSensor()
    lastVend = int(round(time.time() * 1000))
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    auth2 = OAuthHandler(consumer_key2, consumer_secret2)
    auth2.set_access_token(access_token2, access_token_secret2)
    stream2 = Stream(auth2, l)
    try:
        Sound.speak('Ready to go').wait()
        print ('trying 1')
        stream.filter(track=tags)
        print('trying 2')
        stream2.filter(track=tags)
        Sound.speak('Could not connect').wait()
    except Exception as e:
        print( e)
Humor is usually not welcome on Stack Overflow, but there are exceptions to this rule, so let this answer be one of them.
But let's go back to answering the actual question:
Can't determine cause of SyntaxError: invalid character in identifier
I am not sure what is triggering this error.
I have to admit that I write this fully correct answer with a chuckle :) :
The cause of the SyntaxError is an invalid character in an identifier.
The invalid character in the identifier is the typographic quote:
[ ’ ]
U+2019 ’ e2 80 99 RIGHT SINGLE QUOTATION MARK
(The opening quote ‘ in your error message is its counterpart, U+2018 LEFT SINGLE QUOTATION MARK, and it is just as invalid.)
Here is Python code with which you can find out for yourself the details of the "character" that is causing the trouble:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# tags=[’coffeepi’]
coffeepi = "’"
print(ord(coffeepi), hex(ord(coffeepi)), bin(ord(coffeepi)))
which prints:
8217 0x2019 0b10000000011001
Check out this table
http://www.utf8-chartable.de/unicode-utf8-table.pl
where I got the information about the character, and this page
https://docs.python.org/3/reference/lexical_analysis.html#identifiers
for more detailed information about which characters are allowed in an identifier. Not all characters are, and this one does not belong to the allowed ranges.
By the way: this "character" is NOT a one-byte character like the ASCII characters are. How it is encoded in UTF-8 is given in the Unicode table (e2 80 99).
AND ... in the entire code you have provided in the question there is no trace of the line that the error message points at.
How can you fix that?
Just replace these strange quotes with standard quotes ' ' or " ".
ADDENDUM: The strange quotes were probably meant to be standard quotes around a word in the list. When the Python interpreter runs into these strange quotes, it assumes they are part of a variable name (identifier) and not the delimiters of a string. That is the reason for the confusing error message: you don't see at first glance WHY the interpreter thinks it is dealing with a variable name (identifier) and not with quoted text that should become a Python string.
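To locate such characters in a whole file rather than one at a time, a small scanner along these lines may help. It is a sketch under assumptions: find_non_ascii and straighten_quotes are hypothetical names, the sample source is made up, and the scanner flags every non-ASCII character, including legitimate ones inside strings or comments.

```python
import unicodedata


def find_non_ascii(source):
    """Return (line, column, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, "U+%04X" % ord(ch), name))
    return hits


def straighten_quotes(source):
    """Replace typographic single quotes with the ASCII apostrophe."""
    return source.translate({0x2018: "'", 0x2019: "'"})


# Made-up sample resembling the failing line from the error message.
sample = "words=[]\ntags=[\u2018coffeepi\u2019]"
hits = find_non_ascii(sample)
fixed = straighten_quotes(sample)
```

Running find_non_ascii on the contents of Main.py would point at the exact line and column of each typographic quote.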

ascii codec cant encode character u'\u2019' ordinal out of range(128)

Python 2.6, upgrading not an option.
The script is designed to take fields from an ArcGIS database and write Oracle INSERT statements to a text file that can be used at a later date. There are 7500 records; after 3000 records it errors out and says the problem lies at:
fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')
I have tried seemingly every variation of unicode and encode. I am new to Python and really just need someone with experience to look at my code and see where the problem is.
import arcpy
#Where the GDB Table is located
fc = "D:\GIS Data\eMaps\ALDOT\ALDOT_eMaps_SignInventory.gdb/SignLocation"
fields = arcpy.ListFields(fc)
cursor = arcpy.SearchCursor(fc)
#text file output
outFile = open(r"C:\Users\kkieliszek\Desktop\transfer.text", "w")
#insert values into table billboard.sign_inventory
for row in cursor:
    outFile.write("Insert into billboard.sign_inventory() Values (")
    for field in fields:
        fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')
        if row.isNull(field.name) or fieldValue.strip() == "" : #if field is Null or a Empty String print NULL
            value = "NULL"
            outFile.write('"' + value + '",')
        else: #print the value in the field
            value = str(row.getValue(field.name))
            outFile.write('"' + value + '",')
    outFile.write("); \n\n ")
outFile.close() # This closes the text file
Error Code:
Traceback (most recent call last):
File "tablemig.py", line 25, in <module>
fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 76: ordinal not in range(128)
Never call str() on a unicode object:
>>> str(u'\u2019')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)
To write rows that contain Unicode strings in csv format, use UnicodeWriter instead of formatting the fields manually. It should fix several issues at once.
File text wrappers make manual encoding/decoding unnecessary.
Assuming the value from the row is a unicode object, simply use io.open() with the encoding argument set to the required encoding.
For example:
import io
with io.open(r"C:\Users\kkieliszek\Desktop\transfer.text", "w", encoding='utf-8') as my_file:
    my_file.write(my_unicode)
The problem is that you need to decode/encode the byte/unicode string instead of just calling str on it. If you have a byte string object, you need to call decode on it to convert it into a unicode object. If you have a unicode object, you need to call encode on it to convert it into a byte string. The function below does both with the ascii codec and errors set to 'ignore', so be aware that it drops every non-ASCII character (the u'\u2019' apostrophe included) rather than preserving it. If that is acceptable, use this function instead:
import re
def remove_unicode(string_data):
    """ (str|unicode) -> (str|unicode)
    recovers ascii content from string_data
    """
    if string_data is None:
        return string_data
    if isinstance(string_data, str):
        string_data = str(string_data.decode('ascii', 'ignore'))
    else:
        string_data = string_data.encode('ascii', 'ignore')
    remove_ctrl_chars_regex = re.compile(r'[^\x20-\x7e]')
    return remove_ctrl_chars_regex.sub('', string_data)
fieldValue = remove_unicode(row.getValue(field.name))
It should fix the problem.
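If the non-ASCII characters should be kept rather than stripped, one option is to convert every field to text without calling str() and let io.open() do the encoding. This is a sketch under assumptions: to_text is a hypothetical helper, the sample row stands in for values returned by row.getValue(), and the output goes to a temporary file rather than the real transfer file.

```python
import io
import os
import tempfile


def to_text(value, encoding="utf-8"):
    """Convert a field value to a unicode string without the ascii codec."""
    if value is None:
        return u"NULL"
    if isinstance(value, bytes):
        # Byte strings are decoded explicitly instead of implicitly.
        return value.decode(encoding, "replace")
    # Numbers, dates and unicode strings are formatted into a unicode template.
    return u"%s" % (value,)


# Made-up row standing in for the values of one database record.
row = [u"Joe\u2019s Diner", 42, None, b"caf\xc3\xa9"]
line = u",".join(u'"%s"' % to_text(value) for value in row)

path = os.path.join(tempfile.mkdtemp(), "transfer.txt")
with io.open(path, "w", encoding="utf-8") as out_file:
    out_file.write(line + u"\n")
with io.open(path, "r", encoding="utf-8") as in_file:
    round_trip = in_file.read()
```

The apostrophe U+2019 survives the round trip because the only encoding step is the UTF-8 one performed by the file object.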

YouTube API UnicodeEncodeError in Python 3.4

I was exploring the YouTube Data API and finding that improperly encoded results were holding me back. I got good results until I retrieve a set that includes unmapped characters in the titles. My code is NOW (cleaned up a little for you fine folks):
import urllib.request
import urllib.parse
import json
import datetime
# Look for videos published up to THIS MANY hours ago
IntHoursToSub = 2
RightNow = datetime.datetime.utcnow()
StartedAgo = datetime.timedelta(hours=-(IntHoursToSub))
HourAgo = RightNow + StartedAgo
HourAgo = str(HourAgo).replace(" ", "T")
HourAgo = HourAgo[:HourAgo.find(".")] + "Z"
# Get API Key from your safe place and set up the API link
YouTubeAPIKey = open('YouTubeAPIKey.txt', 'r').read()
locuURL = "https://www.googleapis.com/youtube/v3/search"
values = {"key": YouTubeAPIKey,
          "part": "snippet",
          "publishedAfter": HourAgo,
          "relevanceLanguage": "en",
          "regionCode": "us",
          "maxResults": "50",
          "type": "live"}
postData = urllib.parse.urlencode(values)
fullURL = locuURL + "?" + postData
# Set up response holder and handle exceptions
respData = ""
try:
    req = urllib.request.Request(fullURL)
    respData = urllib.request.urlopen(req).read().decode()
except Exception as e:
    print(str(e))
#print(respData)
# Read JSON response and iterate through for video names/URLs
jsonData = json.loads(respData)
for object in jsonData["items"]:
    if object["id"]["kind"] == "youtube#video":
        print(object["snippet"]["title"], "https://www.youtube.com/watch?v=" + object["id"]["videoId"])
The error was:
Traceback (most recent call last):
File "C:/Users/Chad LaFarge/PycharmProjects/APIAccess/YouTubeAPI.py", line 33, in <module>
print(respData)
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25bb' in position 11737: character maps to <undefined>
UPDATE
MJY Called it! Starting from PyCharm menu bar:
File -> Settings... -> Editor -> File Encodings, then set: "IDE Encoding", "Project Encoding" and "Default encoding for properties files" ALL to UTF-8 and she now works like a charm.
Many thanks!
Check sys.stdout.encoding.
If it is not UTF-8, the problem is not in the YouTube API but in how your console or IDE encodes output.
Check the PYTHONIOENCODING environment variable, your terminal, and your locale settings.
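When the console encoding cannot be changed, a fallback is to escape whatever the console cannot encode before printing. The sketch below assumes Python 3; printable is a hypothetical helper and the sample title is made up.

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding lacks shown as escapes."""
    # Use the console encoding unless the caller names one explicitly.
    encoding = encoding or sys.stdout.encoding or "ascii"
    # backslashreplace turns unsupported characters into \uXXXX sequences.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


title = "\u25bb Live stream"
escaped = printable(title, "cp1252")
unchanged = printable(title, "utf-8")
```

print(printable(title)) then succeeds on a cp1252 console, at the cost of showing \u25bb instead of the symbol itself.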
