Encodings in ConfigParser (Python) - python

Python 3.1.3
What I need is to read dictionary from cp1251-file using ConfigParser.
My example:
config = configparser.ConfigParser()
config.optionxform = str
config.read("file.cfg")
DataStrings = config.items("DATA")
DataBase = dict()
for Dstr in DataStrings:
str1 = Dstr[0]
str2 = Dstr[1]
DataBase[str1] = str2
After that I'm trying to replace some words in some UTF-8 files according dictionary. But sometimes it doesn't works (for example, with symbols of "new line-carriage return").
My file in UTF-8 and configuration file (dictionary) in CP1251. Seems like trouble, I have to decode config into UTF-8.
I've tryed this:
str1 = Dstr[0].encode('cp1251').decode('utf-8-sig')
But error "'utf8' codec can't decode byte 0xcf in position 0" appeared.
If I use .decode('','ignore') - I just lose almost all config file.
What should I do?

Python 3.1 is in the no-mans-land of Python versions. Ideally you'd upgrade to Python 3.5, which would let you do config.read("file.cfg", encoding="cp1251")
If you must stay on 3.1x, you can use the ConfigParser.readfp() method to read from a previously opened file using the correct encoding:
import configparser
config = configparser.ConfigParser()
config.optionxform = str
config_file = open("file.cfg", encoding="cp1251")
config.readfp(config_file)

Related

Convert base64 encoded google service account key to JSON file using Python

hello I'm trying to convert a google service account JSON key (contained in a base64 encoded field named privateKeyData in file foo.json - more context here ) into the actual JSON file (I need that format as ansible only accepts that)
The foo.json file is obtained using this google python api method
what I'm trying to do (though I am using python) is also described this thread which by the way does not work for me (tried on OSx and Linux).
#!/usr/bin/env python3
import json
import base64
with open('/tmp/foo.json', 'r') as f:
ymldict = json.load(f)
b64encodedCreds = ymldict['privateKeyData']
b64decodedBytes = base64.b64decode(b64encodedCreds,validate=True)
outputStr = b64decodedBytes
print(outputStr)
#issue
outputStr = b64decodedBytes.decode('UTF-8')
print(outputStr)
yields
./test.py
b'0\x82\t\xab\x02\x01\x030\x82\td\x06\t*\x86H\x86\xf7\r\x01\x07\x01\xa0\x82\tU\x04\x82\tQ0\x82\tM0\x82\x05q\x06\t*\x86H\x86\xf7\r\x01\x07\x01\xa0\x82\x05b\x04\x82\x05^0\x82\x05Z0\x82\x05V\x06\x0b*\x86H\x86\xf7\r\x01\x0c\n\x01\x02\xa0\x82\x#TRUNCATING HERE
Traceback (most recent call last):
File "./test.py", line 17, in <module>
outputStr = b64decodedBytes.decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 1: invalid start byte
I think I have run out of ideas and spent now more than a day on this :(
what am I doing wrong?
Your base64 decoding logic looks fine to me. The problem you are facing is probably due to a character encoding mismatch. The response body you received after calling create (your foo.json file) is probably not encoded with UTF-8. Check out the response header's Content-Type field. It should look something like this:
Content-Type: text/javascript; charset=Shift_JIS
Try to decode your base64 decoded string with the encoding used in the content type
b64decodedBytes.decode('Shift_JIS')

Python - Parse INI text (not INI file)

I have a case where I get a string version of an INI file and not the file itself. I am getting it this way from an external source and hence cannot be changed.
Example of the string version:
'[DEFAULT]\nScore = 0.1\n\n[dev]\nHost = abc.com\nPort = 0000\n\n[qa]\nHost = xyz.com\nEsPort = 1000\n\n[main]\nHost = pqr.com\nPort = 2000\n'
I tried parsing this string using the configParser library in python:
>>> config = configparser.ConfigParser()
>>> config.read(my_ini_str)
[]
I get back nothing with the read function. Is there any other way to achieve this?
Thanks in advance!
UPDATE - Forgot to add that I am using Python 2.
This is what the read_string() method is for.
Parse configuration data from a string.
Hence:
>>> config = configparser.ConfigParser()
>>> config.read_string(my_ini_str)
you have to use read_string in order to parse a string.
then you have to use getters on "config" object, such as :
>>> config.get('dev', 'Host')
'abc.com'
or
>>> config.getfloat(config.default_section, 'Score')
0.1

Python UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3

I'm reading a config file in python getting sections and creating new config files for each section.
However.. I'm getting a decode error because one of the strings contains Español=spain
self.output_file.write( what.replace( " = ", "=", 1 ) )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
How would I adjust my code to allow for encoded characters such as these? I'm very new to this so please excuse me if this is something simple..
class EqualsSpaceRemover:
output_file = None
def __init__( self, new_output_file ):
self.output_file = new_output_file
def write( self, what ):
self.output_file.write( what.replace( " = ", "=", 1 ) )
def get_sections():
configFilePath = 'C:\\test.ini'
config = ConfigParser.ConfigParser()
config.optionxform = str
config.read(configFilePath)
for section in config.sections():
configdata = {k:v for k,v in config.items(section)}
confignew = ConfigParser.ConfigParser()
cfgfile = open("C:\\" + section + ".ini", 'w')
confignew.add_section(section)
for x in configdata.items():
confignew.set(section,x[0],x[1])
confignew.write( EqualsSpaceRemover( cfgfile ) )
cfgfile.close()
If you use python2 with from __future__ import unicode_literals then every string literal you write is an unicode literal, as if you would prefix every literal with u"...", unless you explicitly write b"...".
This explains why you get an UnicodeDecodeError on this line:
what.replace(" = ", "=", 1)
because what you actually do is
what.replace(u" = ",u"=",1 )
ConfigParser uses plain old str for its items when it reads a file using the parser.read() method, which means what will be a str. If you use unicode as arguments to str.replace(), then the string is converted (decoded) to unicode, the replacement applied and the result returned as unicode. But if what contains characters that can't be decoded to unicode using the default encoding, then you get an UnicodeDecodeError where you wouldn't expect one.
So to make this work you can
use explicit prefixes for byte strings: what.replace(b" = ", b"=", 1)
or remove the unicode_litreals future import.
Generally you shouldn't mix unicode and str (python3 fixes this by making it an error in almost any case). You should be aware that from __future__ import unicode_literals changes every non prefixed literal to unicode and doesn't automatically change your code to work with unicode in all case. Quite the opposite in many cases.

AES key moved in config file generate " AES key must be either 16, 24, or 32 bytes long" error

I'm using python,flask.I'm doing encryption with AES. It works good, I encrypt and decrypt data easly .
To secure the encryption key i moved my encryption key from app form in config file. First I saved a variable in a config file,I declared ENCRYPTION_KEY in config.cfg .
[Encryption]
ENCRYPTION_KEY = b'\xbf\xc0\x85)\x10nc\x94\x02)j\xdf\xcb\xc4\x94\x9d(\x9e[EX\xc8\xd5\xbfI{\xa2$\x05(\xd5\x18'
and then on init file i declared:
app.config['ENCRYPTION_KEY'] = config.get('Encryption', 'ENCRYPTION_KEY')
I tried accessing it from key = flask.config['ENCRYPTION_KEY']. I print key in console just to be sure that the command works:
def encrypt_data(self, form_data):
key = current_app.config['ENCRYPTION_KEY']
print "KEY : " , key
cipher = AES.new(key)
//code...
And in console key is printed:
Now when i try to use this key from config file,i have an error message:
This message appears only because i moved that key in config file, because as I said before i used the same key for the same methods and it works perfectly?
Can anybody help me, why I'm getting this error?
Your use of the ConfigParser module is the cause of the problem. Given the config file shown:
>>> config.get('Encryption', 'ENCRYPTION_KEY')
"b'\\xbf\\xc0\\x85)\\x10nc\\x94\\x02)j\\xdf\\xcb\\xc4\\x94\\x9d(\\x9e[EX\\xc8\\xd5\\xbfI{\\xa2$\\x05(\\xd5\\x18'"
>>> len(config.get('Encryption', 'ENCRYPTION_KEY'))
92
you can see here that ConfigParser simply returns the value associated with the given config variable as text, not as a Python string. Because the config value contains backslash escape sequences, these backslashes are escaped with additional backslashes. This breaks the \x character sequences which then blow out to 4 characters.
The easiest way around this is to use Flask's config files:
config.cfg
ENCRYPTION_KEY = b'\xbf\xc0\x85)\x10nc\x94\x02)j\xdf\xcb\xc4\x94\x9d(\x9e[EX\xc8\xd5\xbfI{\xa2$\x05(\xd5\x18'
>>> import flask
>>> app = flask.Flask('test')
>>> app = flask.Flask('')
>>> app.config.from_pyfile('config.cfg')
True
>>> app.config['ENCRYPTION_KEY']
'\xbf\xc0\x85)\x10nc\x94\x02)j\xdf\xcb\xc4\x94\x9d(\x9e[EX\xc8\xd5\xbfI{\xa2$\x05(\xd5\x18'
>>> len(app.config['ENCRYPTION_KEY'])
32
If you do not want to use Flask's config files your options are (in order of preference):
Use ast.literal_eval() to safely convert the raw string into a Python string:
from ast import literal_eval
app.config['ENCRYPTION_KEY'] = literal_eval(config.get('Encryption', 'ENCRYPTION_KEY'))
Base64 encode the value in the config file, e.g.
[Encryption]
ENCRYPTION_KEY = v8CFKRBuY5QCKWrfy8SUnSieW0VYyNW/SXuiJAUo1Rg=
Then decode it when you access the key:
app.config['ENCRYPTION_KEY'] = config.get('Encryption', 'ENCRYPTION_KEY').decode('base64')
Use eval() to convert the raw string into a Python string:
app.config['ENCRYPTION_KEY'] = eval(config.get('Encryption', 'ENCRYPTION_KEY'))
although this is considered bad/dangerous practice and you would be better off using literal_eval() or base64 encoding.
Store the key as a binary value in the file: [Encryption]
ENCRYPTION_KEY = ¿À<85>)^Pnc<94>^B)jßËÄ<94><9d>(<9e>[EXÈÕ¿I{¢$^E(Õ^X
but that is very difficult to maintain.

How can I understand this python error message?

Hi can you help me decode this message and what to do:
main.py", line 1278, in post
message.body = "%s %s/%s/%s" % (msg, host, ad.key().id(), slugify(ad.title.encode('utf-8')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
Thanks
UPDATE having tried removing the encode call it appears to work:
class Recommend(webapp.RequestHandler):
def post(self, key):
ad= db.get(db.Key(key))
email = self.request.POST['tip_email']
host = os.environ.get("HTTP_HOST", os.environ["SERVER_NAME"])
senderemail = users.get_current_user().email() if users.get_current_user() else 'info#monton.cl' if host.endswith('.cl') else 'info#monton.com.mx' if host.endswith('.mx') else 'info#montao.com.br' if host.endswith('.br') else 'admin#koolbusiness.com'
message = mail.EmailMessage(sender=senderemail, subject="%s recommends %s" % (self.request.POST['tip_name'], ad.title) )
message.to = email
message.body = "%s %s/%s/%s" % (self.request.POST['tip_msg'],host,ad.key().id(),slugify(ad.title))
message.send()
matched_images=ad.matched_images
count = matched_images.count()
if ad.text:
p = re.compile(r'(www[^ ]*|http://[^ ]*)')
text = p.sub(r'\1',ad.text.replace('http://',''))
else:
text = None
self.response.out.write("Message sent<br>")
path = os.path.join(os.path.dirname(__file__), 'market', 'market_ad_detail.html')
self.response.out.write(template.render(path, {'user_url':users.create_logout_url(self.request.uri) if users.get_current_user() else users.create_login_url(self.request.uri),
'user':users.get_current_user(), 'ad.user':ad.user,'count':count, 'ad':ad, 'matched_images': matched_images,}))
The problem here is your underlying model (message.body) only wants ASCII text but you're trying to give it a string encoded in unicode.
But since you've got a normal ascii string here, you can just make python print out the '?' character when you've got a non-ascii-printing string.
"UNICODE STRING".encode('ascii','replace').decode('ascii')
So like from your example above:
message.body = "%s %s/%s/%s" % \
(msgencode('ascii','replace').decode('ascii'),
hostencode('ascii','replace').decode('ascii'),
ad.key().id()encode('ascii','replace').decode('ascii'),
slugify(ad.title)encode('ascii','replace').decode('ascii'))
Or just encode/decode on the variable that has the unicode character.
But this isn't an optimal solution. The best idea is to make message.body a unicode string. Being that doesn't seem feasible (I'm not familiar with GAE), you can use this to at least not have errors.
You've got a Unicode character in a place that you're not supposed to. Most often I find this error is having MS Word-style slanted quotes.
One of these fields has some characters that cannot be encoded. If you switch to python 3 (it has better unicode support), or you change the encoding of the entire script the problem should stop, about the best way to change the encoding in 2.x is using the encoding comment line. If you see http://evanjones.ca/python-utf8.html you will see more of an explanation of using python with utf-8 support the best suggestion is add # -*- coding: utf-8 -*- to the top of your script. And handle scripts like this
s = "hello normal string"
u = unicode( s, "utf-8" )
backToBytes = u.encode( "utf-8" )
I had a similar problem when using Django norel and Google App Engine.
The problem was at the folder containing the application. Probably isn't this the problem described in this question, but, maybe helps someone don't waste time like me.
Try first change you application folder maybe to /home/ and try to run again, if doesn't works, try something more.

Categories

Resources