Hello, I'm trying to convert a Google service account JSON key (contained in a base64-encoded field named privateKeyData in the file foo.json; more context here) into the actual JSON file. I need that format because Ansible only accepts that.
The foo.json file is obtained using this Google Python API method.
What I'm trying to do (though I am using Python) is also described in this thread, which by the way does not work for me (tried on OS X and Linux).
#!/usr/bin/env python3
import json
import base64
with open('/tmp/foo.json', 'r') as f:
    ymldict = json.load(f)
b64encodedCreds = ymldict['privateKeyData']
b64decodedBytes = base64.b64decode(b64encodedCreds,validate=True)
outputStr = b64decodedBytes
print(outputStr)
#issue
outputStr = b64decodedBytes.decode('UTF-8')
print(outputStr)
yields
./test.py
b'0\x82\t\xab\x02\x01\x030\x82\td\x06\t*\x86H\x86\xf7\r\x01\x07\x01\xa0\x82\tU\x04\x82\tQ0\x82\tM0\x82\x05q\x06\t*\x86H\x86\xf7\r\x01\x07\x01\xa0\x82\x05b\x04\x82\x05^0\x82\x05Z0\x82\x05V\x06\x0b*\x86H\x86\xf7\r\x01\x0c\n\x01\x02\xa0\x82\x#TRUNCATING HERE
Traceback (most recent call last):
File "./test.py", line 17, in <module>
outputStr = b64decodedBytes.decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 1: invalid start byte
I think I have run out of ideas, and I have now spent more than a day on this :(
What am I doing wrong?
Your base64 decoding logic looks fine to me. The problem you are facing is probably due to a character encoding mismatch. The response body you received after calling create (your foo.json file) is probably not encoded with UTF-8. Check out the response header's Content-Type field. It should look something like this:
Content-Type: text/javascript; charset=Shift_JIS
Try decoding the base64-decoded bytes with the encoding named in the Content-Type header:
b64decodedBytes.decode('Shift_JIS')
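Before chasing charsets, it may be worth checking whether the decoded bytes are text at all. The output in the question starts with 0\x82, which is how DER-encoded binary data (for example a PKCS#12 .p12 container) typically begins, and no text codec will turn that into JSON. The following is a minimal sketch, not part of the Google client API: decode_key_material is a hypothetical helper name, and the idea that the key was created in a binary format rather than as a JSON credentials file is an assumption to verify against how the key was requested.

```python
import base64
import json


def decode_key_material(b64_text):
    """Classify base64-decoded key data as JSON text or opaque binary.

    Hypothetical helper for illustration only.
    Returns ("json", parsed_object) or ("binary", raw_bytes).
    """
    raw = base64.b64decode(b64_text, validate=True)
    try:
        # A JSON credentials file is plain UTF-8 text.
        return "json", json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Not UTF-8 or not JSON: treat it as binary (e.g. DER/PKCS#12 data).
        return "binary", raw


if __name__ == "__main__":
    with open("/tmp/foo.json", "r") as f:
        kind, value = decode_key_material(json.load(f)["privateKeyData"])
    print(kind)
```

If the result is "binary", writing the bytes to a file opened with 'wb' keeps them intact; a JSON key would presumably have to be requested in that format when the key is created.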
Related
I have a problem with encoding when I call the request.urlopen() method.
The instance of ftplib.FTP() created in the urllib.request.ftpwrapper init() and retrfile() methods works with the default latin-1 encoding, and I need to choose between utf-8 and cp1251.
I see 3 ways:
The way I want, but don't know how to do.
Call request.urlopen() with a parameter that contains the encoding. That encoding must then be written to self.ftp.encoding (ftplib.FTP()).
The way I don't like.
Get the file name encoding from the FTP server (ftplib) and use it in request.urlopen(url.encode(file_name_encoding).decode('latin-1')).
Problem description.
I have a file with Cyrillic (rus) characters in its name.
Steps:
Connecting to FTP
con = ftplib.FTP()
con.connect(host, port)
con.login(username, password)
Getting files list
list_files = [_v for _v in self.con.nlst(_path)]
['Message.xml', 'Message_ÁÏ_TT.xml']
(For files Message.xml, Message_БП_TT.xml)
I fix this by adding the following in the first step:
con.encoding = 'utf-8'
con.sendcmd('OPTS UTF8 ON')
Then I need to use:
from urllib import request
url = 'ftp://login:password@ftpaddr:21/folder//Message_БП_TT.xml'
request.urlopen(url.encode().decode('latin-1'))
And then getting Exception:
{URLError}<urlopen error ftp error: URLError("ftp error: error_perm('550 The system cannot find the file specified. ')")>
In the request library, the FTP connection is initialized in init() and retrfile().
I don't see a way to change the FTP default encoding, "latin-1".
I use this method because I parse heavy XML files with urllib.response.addinfourl.
P.S.
With some FTP servers this method works well and the file can be read successfully. With others I get that exception. The reasons are not clear yet, and there is no way to get and analyze the FTP settings.
Solution I don't like.
As I understand it, a file name on the FTP server can be in utf-8 or in cp1251 (win-1251) encoding.
When ftplib is initialized with the standard encoding (latin-1), the names look like this:
Message_ÐÐ_TT.xml - utf-8
Message_ÁÏ_TT.xml - cp1251
I don't know which encoding the FTP server uses when the request is made, and I normally use utf-8 (encode()). So I don't like this, but it works:
try:
    return request.urlopen(url.encode('utf-8').decode('latin-1'))
except URLError:
    return request.urlopen(url.encode('cp1251').decode('latin-1'))
P.S. utf-8 is spelled out under try for clarity.
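The try/except fallback above can be generalized a little. This is only a sketch of the same trick, under the assumption that the server expects the file name as raw bytes in one of a few known encodings: latin1_view and open_ftp_url are hypothetical names, and the opener parameter exists only so the logic can be exercised without a real FTP server.

```python
from urllib import request
from urllib.error import URLError


def latin1_view(url, wire_encoding):
    """Re-express url so that latin-1 encoding it yields wire_encoding bytes."""
    return url.encode(wire_encoding).decode('latin-1')


def open_ftp_url(url, encodings=('utf-8', 'cp1251'), opener=request.urlopen):
    """Try each candidate file name encoding in turn (hypothetical helper)."""
    last_error = None
    for encoding in encodings:
        try:
            return opener(latin1_view(url, encoding))
        except URLError as exc:
            # Typically a 550 "file not found": try the next encoding.
            last_error = exc
    raise last_error
```

Because ftplib encodes commands as latin-1 by default, latin1_view(url, 'cp1251') turns БП into ÁÏ, which is the same form the directory listing showed.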
I am running Python 3.7.x and am trying to figure out how to encode a string, {CTF-FLAG1}, using zero width steganography.
I am using zwsp-steg-py to do so, but I do not know how to use this to encode text into other text, see below:
I want to encode {CTF-FLAG1} inside the text "Now you see me, now you don't." using zero-width steganography.
I installed zwsp-steg-py and tried:
#coding=utf-8
import zwsp_steg
encoded = zwsp_steg.encode("{CTF-Flag1}", zwsp_steg.MODE_ZWSP)
decoded = zwsp_steg.decode(encoded)
print(decoded)
Yet, the result is:
C:\Users\jerry\Desktop>python decode.py
Traceback (most recent call last):
File "decode.py", line 5, in <module>
decoded = zwsp_steg.decode(encoded)
File "C:\Python367-64\lib\site-packages\zwsp_steg\steganography.py", line 72, in decode
raise TypeError('Unknown encoding detected!')
TypeError: Unknown encoding detected!
I don't think I'm doing it right.
#coding=utf-8
import zwsp_steg
encoded = zwsp_steg.encode("{CTF-Flag1}", zwsp_steg.MODE_ZWSP)
decoded = zwsp_steg.decode(encoded, zwsp_steg.MODE_ZWSP)
print(decoded)
# example with string padding
encoded += "This is a test string"
print(encoded)
decoded_the_string = zwsp_steg.decode(encoded, zwsp_steg.MODE_ZWSP)
print(decoded_the_string)
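To illustrate what "encoding text into other text" means here, the following is a standalone sketch of the general zero-width technique using only the standard library. It is not the scheme zwsp-steg-py uses internally, and its output is not compatible with that library; hide and reveal are hypothetical names, and the choice of two zero-width characters as bit symbols is an assumption made for simplicity.

```python
ZERO = "\u200b"  # zero width space, stands for bit 0
ONE = "\u200c"   # zero width non-joiner, stands for bit 1


def hide(secret, cover):
    """Embed secret as invisible characters after the first cover character."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return cover[:1] + payload + cover[1:]


def reveal(text):
    """Collect the zero-width characters and turn them back into text."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    stego = hide("{CTF-FLAG1}", "Now you see me, now you don't.")
    print(stego)          # looks like the cover text
    print(reveal(stego))  # recovers the hidden string
```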
Please take a look at the archived USA GOV Sample Data.
When I try to read this file in R, I get the error below:
result = fromJSON(textFileName)
Error in fromJSON(textFileName) : unexpected character 'u'
When I try to read it in Python, I get the error below:
import json
records = [json.loads(line) for line in open(path)]
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4088: character maps to <undefined>
Can someone please help me with how to read this kind of file?
I couldn't get the code the OP provided in the question to work on my system either (Windows/RStudio/Jupyter). I dug around and found this for R, adapted to this case:
library(jsonlite)
out <- lapply(readLines("usagov_bitly_data2013-05-17-1368817803"), fromJSON)
df<-data.frame(Reduce(rbind, out))
Although the error I got in R is curiously different from yours.
result = fromJSON("usagov_bitly_data2013-05-17-1368817803")
#Error in parse_con(txt, bigint_as_char) : parse error: trailing garbage
# [ 34.730400, -86.586098 ] } { "a": "Mozilla\/5.0 (Windows N
# (right here) ------^
For Python, as mentioned by juanpa, it seems to be a matter of encoding. The following code works for me.
import json
import os
path=os.path.abspath("usagov_bitly_data2013-05-17-1368817803")
print(path)
with open(path, encoding="utf8") as file:
    records = [json.loads(line) for line in file]
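The 'charmap' error in the question appears when open() falls back to the platform's default codec (cp1252 on many Windows setups), which has no mapping for byte 0x9d. A minimal sketch of the same fix as a reusable function follows; read_json_lines is a hypothetical name, and it assumes the file holds one UTF-8 encoded JSON object per line.

```python
import json


def read_json_lines(path, encoding="utf-8"):
    """Parse a file containing one JSON document per line."""
    records = []
    # An explicit encoding avoids the platform default (e.g. cp1252).
    with open(path, encoding=encoding) as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Passing the file name from the answer above, read_json_lines("usagov_bitly_data2013-05-17-1368817803"), should give the same list of records.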
Solution in R:
library(jsonlite)
# if you have a local file
conn <- gzcon(file("usagov_bitly_data2013-05-17-1368817803.gz", "rb"))
# if you read it from URL
conn <- gzcon(url("http://1usagov.measuredvoice.com/bitly_archive/usagov_bitly_data2013-05-17-1368817803.gz"))
data <- stream_in(conn)
I'm trying to allow users to sign up to my service, and I'm noticing errors whenever somebody signs up with Latin American characters in their name. I tried reading several SO posts/websites, listed below:
Python regex against Latin-1 character encoding?
http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#character-encodings-0
http://docs.python.org/2/library/json.html
https://pypi.python.org/pypi/anyjson
but was still unable to solve it. My code example is as per below:
>>> val = json.dumps({"name":"Déjà"}, encoding="ISO-8859-1")
>>> val
'{"name": "D\\u00c3\\u00a9j\\u00c3\\u00a0"}'
Is there any way to force the encoding to work in this case, for both serializing and deserializing? Any help is appreciated!
EDIT
The client is Android and iPhone applications. I'm using the following libraries to encode the json on the clients:
http://loopj.com/android-async-http/ (android)
https://github.com/AFNetworking/AFNetworking (ios)
EDIT 2
The same text was received by the server from the Android client as per below:
{"NAME":"D\ufffdj\ufffd"}
I was using anyjson to deserialize that and it said:
File "/usr/local/lib/python2.7/dist-packages/anyjson/__init__.py", line 135, in loads
return implementation.loads(value)
File "/usr/local/lib/python2.7/dist-packages/anyjson/__init__.py", line 99, in loads
return self._decode(s)
File "/usr/local/lib/python2.7/dist-packages/simplejson/__init__.py", line 454, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python2.7/dist-packages/simplejson/decoder.py", line 374, in decode
obj, end = self.raw_decode(s)
File "/usr/local/lib/python2.7/dist-packages/simplejson/decoder.py", line 393, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
ValueError: ('utf8', "D\xe9j\xe0", 1, 2, 'invalid continuation byte')
JSON should almost always be in Unicode (when encoded), and if you're writing a webserver, UTF-8. The following, in Python 3, is basically correct:
In [1]: import json
In [2]: val = json.dumps({"name":"Déjà"})
In [3]: val
Out[3]: '{"name": "D\\u00e9j\\u00e0"}'
A closer look:
'{"name": "D\\u00e9j\\u00e0"}'
            ^^^^^^^
            The text \u00e9, which in JSON means "é".
            The slash is doubled because we're looking at a repr of a str.
You can then send val to the client, and in Javascript, JSON.parse should give you the right result.
Because you mentioned, "when somebody signs up": that implies data coming from the client (web browser) to you. How is that data being sent? What library/libraries are you writing a webserver in?
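The two symptoms in the question can be reproduced in Python 3. The sketch below assumes the body arrives as raw bytes; to_wire and from_wire are hypothetical helper names, and the latin-1 fallback is only a guess at what a misconfigured client might send, not something a server should rely on.

```python
import json


def to_wire(obj):
    """Serialize obj as UTF-8 encoded JSON bytes."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


def from_wire(body, fallback="latin-1"):
    """Parse JSON bytes, preferring UTF-8 and falling back if that fails."""
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # Same failure as the 'invalid continuation byte' error in EDIT 2.
        text = body.decode(fallback)
    return json.loads(text)


# UTF-8 bytes misread as ISO-8859-1 give the doubled characters from the question.
mojibake = "Déjà".encode("utf-8").decode("latin-1")

if __name__ == "__main__":
    print(json.dumps({"name": mojibake}))
    print(from_wire(to_wire({"name": "Déjà"})))
    print(from_wire(b'{"NAME": "D\xe9j\xe0"}'))
```

The cleaner fix is the one at the source: have the client send UTF-8 and declare it, so the fallback branch is never needed.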
Turns out this was mainly an issue in how I was doing the encoding from the Android side.
I am now setting the StringEntity this way in Android and it's working now:
StringEntity se = new StringEntity(obj.toString(), "UTF-8");
se.setContentType("application/json;charset=UTF-8");
se.setContentEncoding( new BasicHeader(HTTP.CONTENT_TYPE, "application/json"));
Also, I was using anyjson on the server which was using simplejson. This was creating errors at times as well. I switched to using the json library for Python.
I try to read an email from a file, like this:
import email
with open("xxx.eml") as f:
    msg = email.message_from_file(f)
and I get this error:
Traceback (most recent call last):
File "I:\fakt\real\maildecode.py", line 53, in <module>
main()
File "I:\fakt\real\maildecode.py", line 50, in main
decode_file(infile, outfile)
File "I:\fakt\real\maildecode.py", line 30, in decode_file
msg = email.message_from_file(f) #, policy=mypol
File "C:\Python33\lib\email\__init__.py", line 56, in message_from_file
return Parser(*args, **kws).parse(fp)
File "C:\Python33\lib\email\parser.py", line 55, in parse
data = fp.read(8192)
File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1920: character maps to <undefined>
The file contains a multipart email, where the part is encoded in UTF-8. The file's content or encoding might be broken, but I have to handle it anyway.
How can I read the file, even if it has Unicode errors? I cannot find the policy object compat32, and there seems to be no way to handle an exception and let Python continue right where the exception occurred.
What can I do?
To parse an email message in Python 3 without unicode errors, read the file in binary mode and use the email.message_from_binary_file(f) (or email.message_from_bytes(f.read())) method to parse the content (see the documentation of the email.parser module).
Here is code that parses a message in a way that is compatible with Python 2 and 3:
import email
with open("xxx.eml", "rb") as f:
    try:
        msg = email.message_from_binary_file(f)  # Python 3
    except AttributeError:
        msg = email.message_from_file(f)  # Python 2
(tested with Python 2.7.13 and Python 3.6.0)
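Once the message is parsed from bytes, each text part can be decoded with the charset it declares, which is where a broken file is most likely to cause trouble. The following Python 3 sketch shows one way to do that; load_message and text_parts are hypothetical names, and falling back to UTF-8 when a part declares no charset (or an unknown one) is an assumption.

```python
import email


def load_message(path):
    """Parse an .eml file from bytes so no text codec is applied up front."""
    with open(path, "rb") as f:
        return email.message_from_binary_file(f)


def text_parts(msg):
    """Yield the decoded text of every text/* part in the message."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            yield payload.decode("utf-8", errors="replace")
```

With errors="replace", a stray byte such as 0x81 becomes a replacement character instead of raising UnicodeDecodeError.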
I can't test on your message, so I don't know if this will actually work, but you can do the string decoding yourself:
with open("xxx.eml", encoding='utf-8', errors='replace') as f:
    text = f.read()
    msg = email.message_from_string(text)
That's going to get you a lot of replacement characters if the message isn't actually in UTF-8. But if it's got \x81 in it, UTF-8 is my guess.
with open('email.txt', 'rb') as f:
    # decode, turning every non-ASCII byte into a \xNN escape (Python 3.5+)
    ascii_text = f.read().decode('ascii', 'backslashreplace')
with open('email.txt', 'w') as f:
    f.write(ascii_text)
#now do your processing stuff
I doubt it is the best way to handle this ... but it's at least a way ...
A method that works on Python 3: it finds the encoding and reloads the file with the correct one.
with open('file.eml', errors='replace') as f:
    msg = email.message_from_file(f)
codes = [x for x in msg.get_charsets() if x is not None]
if len(codes) >= 1:
    with open('file.eml', encoding=codes[0]) as f:
        msg = email.message_from_file(f)
I have tried msg.get_charset(), but it sometimes returns None while another encoding is available, hence the slightly involved encoding detection.