I am running Python 3.7.x and am trying to figure out how to encode a string, {CTF-FLAG1}, using zero-width steganography.
I am using zwsp-steg-py to do so, but I do not know how to use it to encode text inside other text; see below.
I want to hide {CTF-FLAG1} inside the text Now you see me, now you don't. using zero-width steganography.
I installed zwsp-steg-py and tried:
#coding=utf-8
import zwsp_steg
encoded = zwsp_steg.encode("{CTF-Flag1}", zwsp_steg.MODE_ZWSP)
decoded = zwsp_steg.decode(encoded)
print(decoded)
Yet, the result is:
C:\Users\jerry\Desktop>python decode.py
Traceback (most recent call last):
File "decode.py", line 5, in <module>
decoded = zwsp_steg.decode(encoded)
File "C:\Python367-64\lib\site-packages\zwsp_steg\steganography.py", line 72, in decode
raise TypeError('Unknown encoding detected!')
TypeError: Unknown encoding detected!
I don't think I'm doing it right. I also tried this:
#coding=utf-8
import zwsp_steg
encoded = zwsp_steg.encode("{CTF-Flag1}", zwsp_steg.MODE_ZWSP)
decoded = zwsp_steg.decode(encoded, zwsp_steg.MODE_ZWSP)
print(decoded)
# example with string padding
encoded += "This is a test string"
print(encoded)
decoded_the_string = zwsp_steg.decode(encoded, zwsp_steg.MODE_ZWSP)
print(decoded_the_string)
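For clarity, this is the behaviour I am aiming for. The snippet below is only a sketch: I am assuming that decode() must be given the same MODE constant that encode() used, and that it tolerates the ordinary visible characters of the cover text; I have not verified either assumption against the zwsp-steg-py source.

# Sketch only. Assumptions (unverified): decode() needs the same MODE that
# encode() used, and it skips the visible (non-zero-width) characters.
import zwsp_steg

cover = "Now you see me, now you don't."
hidden = zwsp_steg.encode("{CTF-FLAG1}", zwsp_steg.MODE_ZWSP)

stego_text = cover + hidden   # renders exactly like the cover text
print(stego_text)

flag = zwsp_steg.decode(stego_text, zwsp_steg.MODE_ZWSP)
print(flag)                   # expected: {CTF-FLAG1}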
Hello, I'm trying to convert a Google service account JSON key (contained in a base64-encoded field named privateKeyData in the file foo.json; more context here) into the actual JSON file (I need that format because Ansible only accepts that).
The foo.json file is obtained using this Google Python API method.
What I'm trying to do (though I am using Python) is also described in this thread, which by the way does not work for me (tried on OS X and Linux).
#!/usr/bin/env python3

import json
import base64

with open('/tmp/foo.json', 'r') as f:
    ymldict = json.load(f)

b64encodedCreds = ymldict['privateKeyData']
b64decodedBytes = base64.b64decode(b64encodedCreds, validate=True)

outputStr = b64decodedBytes
print(outputStr)

# issue
outputStr = b64decodedBytes.decode('UTF-8')
print(outputStr)
yields
./test.py
b'0\x82\t\xab\x02\x01\x030\x82\td\x06\t*\x86H\x86\xf7\r\x01\x07\x01\xa0\x82\tU\x04\x82\tQ0\x82\tM0\x82\x05q\x06\t*\x86H\x86\xf7\r\x01\x07\x01\xa0\x82\x05b\x04\x82\x05^0\x82\x05Z0\x82\x05V\x06\x0b*\x86H\x86\xf7\r\x01\x0c\n\x01\x02\xa0\x82\x#TRUNCATING HERE
Traceback (most recent call last):
File "./test.py", line 17, in <module>
outputStr = b64decodedBytes.decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 1: invalid start byte
I think I have run out of ideas and have now spent more than a day on this :(
What am I doing wrong?
Your base64 decoding logic looks fine to me. The problem you are facing is probably due to a character encoding mismatch. The response body you received after calling create (your foo.json file) is probably not encoded with UTF-8. Check out the response header's Content-Type field. It should look something like this:
Content-Type: text/javascript; charset=Shift_JIS
Try to decode your base64-decoded bytes with the encoding given in the Content-Type:
b64decodedBytes.decode('Shift_JIS')
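As a minimal sketch (the charset value here is hypothetical; substitute whatever the Content-Type header actually reports), building on the b64decodedBytes from your script:

charset = 'Shift_JIS'  # hypothetical: take this from the Content-Type header
try:
    outputStr = b64decodedBytes.decode(charset)
    print(outputStr)
except UnicodeDecodeError:
    # if this still fails, the payload may not be text at all;
    # inspect the raw bytes instead
    print(b64decodedBytes[:16])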
I am running the program below in Python 3, but I'm getting an error.
Str = "this is string example....wow!!!";
Str = Str.encode('base64','strict');
print ("Encoded String: " + Str)
print ("Decoded String: " + Str.decode('base64','strict'))
The error I'm getting is:
File "learn.py", line 646, in <module>
Str = Str.encode('base64','strict');
LookupError: 'base64' is not a text encoding; use codecs.encode() to handle arbitrary codecs
Base64 is not a string encoding; it's a method for encoding bytes. Use the base64 module instead.
Encode and decode Base64 (using more variables than strictly required so that everyone can keep track of the operations):
import base64

print("base64 Encoded/Decoded string \n")

text = "this is a test string "
s1 = text.encode('ascii')                   # result: b'this is a test string '
s2 = base64.b64encode(s1)                   # store the value before printing
print("encoded : ", s2)
print("decoded : ", base64.b64decode(s2))   # direct conversion
Python 2.6, upgrading not an option.
The script is designed to take fields from an ArcGIS database and write Oracle INSERT statements to a text file that can be used at a later date. There are 7500 records; after 3000 records it errors out and says the problem lies at:
fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')
I have tried seemingly every variation of unicode and encode. I am new to Python and really just need someone with experience to look at my code and see where the problem is.
import arcpy

# Where the GDB table is located
fc = "D:\GIS Data\eMaps\ALDOT\ALDOT_eMaps_SignInventory.gdb/SignLocation"

fields = arcpy.ListFields(fc)
cursor = arcpy.SearchCursor(fc)

# text file output
outFile = open(r"C:\Users\kkieliszek\Desktop\transfer.text", "w")

# insert values into table billboard.sign_inventory
for row in cursor:
    outFile.write("Insert into billboard.sign_inventory() Values (")
    for field in fields:
        fieldValue = unicode(str(row.getValue(field.name)), 'utf-8', errors='ignore')
        if row.isNull(field.name) or fieldValue.strip() == "":  # if field is Null or an empty string, print NULL
            value = "NULL"
            outFile.write('"' + value + '",')
        else:  # print the value in the field
            value = str(row.getValue(field.name))
            outFile.write('"' + value + '",')
    outFile.write("); \n\n ")

outFile.close()  # This closes the text file
Error Code:
Traceback (most recent call last):
File "tablemig.py", line 25, in <module>
fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 76: ordinal not in range(128)
Never call str() on a unicode object:
>>> str(u'\u2019')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)
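If you need a byte string, encode the unicode object explicitly instead:
>>> u'\u2019'.encode('utf-8')
'\xe2\x80\x99'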
To write rows that contain Unicode strings in csv format, use UnicodeWriter instead of formatting the fields manually. It should fix several issues at once.
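UnicodeWriter is the recipe class from the Python 2 csv module documentation; the core idea is simply to encode every field to UTF-8 bytes before it reaches the writer. A rough sketch of that idea (the rows variable and the output path are hypothetical, for illustration only):

import csv

with open(r"C:\Users\kkieliszek\Desktop\transfer.csv", "wb") as out:
    writer = csv.writer(out)
    for row in rows:  # hypothetical: each row is a list of unicode fields
        writer.writerow([field.encode('utf-8') for field in row])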
File TextWrappers make manual encoding/decoding unnecessary.
Assuming the result from the row is unicode, simply use io.open() with the encoding argument set to the required encoding.
For example:
import io

with io.open(r"C:\Users\kkieliszek\Desktop\transfer.text", "w", encoding='utf-8') as my_file:
    my_file.write(my_unicode)
The problem is that you need to decode/encode the unicode/byte string instead of just calling str() on it. So, if you have a byte string object, you need to call decode() on it to convert it into a unicode object, ignoring non-ASCII content. On the other hand, if you have a unicode object, you need to call encode() on it to convert it into a byte string, again ignoring non-ASCII content. So, just use this function instead:
import re

def remove_unicode(string_data):
    """ (str|unicode) -> (str|unicode)

    recovers ascii content from string_data
    """
    if string_data is None:
        return string_data
    if isinstance(string_data, str):
        string_data = str(string_data.decode('ascii', 'ignore'))
    else:
        string_data = string_data.encode('ascii', 'ignore')
    remove_ctrl_chars_regex = re.compile(r'[^\x20-\x7e]')
    return remove_ctrl_chars_regex.sub('', string_data)

fieldValue = remove_unicode(row.getValue(field.name))
It should fix the problem.
I'm trying to read an email from a file, like this:
import email
with open("xxx.eml") as f:
msg = email.message_from_file(f)
and I get this error:
Traceback (most recent call last):
File "I:\fakt\real\maildecode.py", line 53, in <module>
main()
File "I:\fakt\real\maildecode.py", line 50, in main
decode_file(infile, outfile)
File "I:\fakt\real\maildecode.py", line 30, in decode_file
msg = email.message_from_file(f) #, policy=mypol
File "C:\Python33\lib\email\__init__.py", line 56, in message_from_file
return Parser(*args, **kws).parse(fp)
File "C:\Python33\lib\email\parser.py", line 55, in parse
data = fp.read(8192)
File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1920: character maps to <undefined>
The file contains a multipart email, where the part is encoded in UTF-8. The file's content or encoding might be broken, but I have to handle it anyway.
How can I read the file, even if it has Unicode errors? I cannot find the policy object compat32 and there seems to be no way to handle an exception and let Python continue right where the exception occurred.
What can I do?
To parse an email message in Python 3 without Unicode errors, read the file in binary mode and use email.message_from_binary_file(f) (or email.message_from_bytes(f.read())) to parse the content (see the documentation of the email.parser module).
Here is code that parses a message in a way that is compatible with Python 2 and 3:
import email

with open("xxx.eml", "rb") as f:
    try:
        msg = email.message_from_binary_file(f)  # Python 3
    except AttributeError:
        msg = email.message_from_file(f)  # Python 2
(tested with Python 2.7.13 and Python 3.6.0)
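Once the message is parsed from bytes, you can decode the individual parts yourself, which sidesteps the file-level decoding problem entirely; a sketch, assuming the interesting parts are text:

for part in msg.walk():
    if part.get_content_maintype() == 'text':
        raw = part.get_payload(decode=True)           # transfer-decoded bytes
        charset = part.get_content_charset() or 'utf-8'
        print(raw.decode(charset, errors='replace'))  # tolerate broken bytes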
I can't test on your message, so I don't know if this will actually work, but you can do the string decoding yourself:
with open("xxx.eml", encoding='utf-8', errors='replace') as f:
text = f.read()
msg = email.message_from_string(f)
That's going to get you a lot of replacement characters if the message isn't actually in UTF-8. But if it's got \x81 in it, UTF-8 is my guess.
with open('email.txt', 'rb') as f:
    ascii_text = f.read().decode('ascii', 'backslashreplace')

with open('email.txt', 'w') as f:
    f.write(ascii_text)

# now do your processing stuff
I doubt this is the best way to handle it, but it's at least a way.
A method which works on Python 3: it finds the encoding and reloads the file with the correct one.
msg = email.message_from_file(open('file.eml', errors='replace'))
codes = [x for x in msg.get_charsets() if x is not None]
if len(codes) >= 1:
    msg = email.message_from_file(open('file.eml', encoding=codes[0]))
I have tried msg.get_charset(), but it sometimes returns None while another encoding is available, hence the slightly more involved encoding detection.
I've got this function, which I modified from material in chapter 1 of the online NLTK book. It's been very useful to me but, despite reading the chapter on Unicode, I feel just as lost as before.
def openbookreturnvocab(book):
    fileopen = open(book)
    rawness = fileopen.read()
    tokens = nltk.wordpunct_tokenize(rawness)
    nltktext = nltk.Text(tokens)
    nltkwords = [w.lower() for w in nltktext]
    nltkvocab = sorted(set(nltkwords))
    return nltkvocab
When I tried it the other day on Also Sprach Zarathustra, it clobbered words with umlauts over the o's and u's. I'm sure some of you will know why that happened. I'm also sure that it's quite easy to fix. I know that it just has to do with calling a function that re-encodes the tokens into unicode strings. If so, it seems to me it might not need to happen inside that function definition at all, but here, where I prepare to write to the file:
def jotindex(jotted, filename, readmethod):
    filemydata = open(filename, readmethod)
    jottedf = '\n'.join(jotted)
    filemydata.write(jottedf)
    filemydata.close()
    return 0
I heard that what I had to do was decode the string into unicode after reading it from the file. I tried amending the function like so:
def openbookreturnvocab(book):
    fileopen = open(book)
    rawness = fileopen.read()
    unirawness = rawness.decode('utf-8')
    tokens = nltk.wordpunct_tokenize(unirawness)
    nltktext = nltk.Text(tokens)
    nltkwords = [w.lower() for w in nltktext]
    nltkvocab = sorted(set(nltkwords))
    return nltkvocab
But that brought this error, when I used it on Hungarian. When I used it on German, I had no errors.
>>> import bookroutines
>>> elles1 = bookroutines.openbookreturnvocab("lk1-les1")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bookroutines.py", line 9, in openbookreturnvocab
nltktext = nltk.Text(tokens)
File "/usr/lib/pymodules/python2.6/nltk/text.py", line 285, in __init__
self.name = " ".join(map(str, tokens[:8])) + "..."
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 4: ordinal not in range(128)
I fixed the function that files the data like so:
def jotindex(jotted, filename, readmethod):
    filemydata = open(filename, readmethod)
    jottedf = u'\n'.join(jotted)
    filemydata.write(jottedf)
    filemydata.close()
    return 0
However, that brought this error, when I tried to file the German:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bookroutines.py", line 23, in jotindex
filemydata.write(jottedf)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 414: ordinal not in range(128)
>>>
...which is what you get when you try to write the u'\n'.join'ed data.
>>> jottedf = u'\n'.join(elles1)
>>> filemydata.write(jottedf)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 504: ordinal not in range(128)
For each string that you read from your file, you can convert it to unicode by calling rawness.decode('utf-8'), if you have the text in UTF-8. You will end up with unicode objects. Also, I don't know what "jotted" is, but you may want to make sure it contains unicode objects and use u'\n'.join(jotted) instead.
Update:
It appears that the NLTK library doesn't like unicode objects. Fine, then you have to make sure that you are using str instances with UTF-8 encoded text. Try using this:
tokens = nltk.wordpunct_tokenize(unirawness)
nltktext = nltk.Text([token.encode('utf-8') for token in tokens])
and this:
jottedf = u'\n'.join(jotted)
filemydata.write(jottedf.encode('utf-8'))
but if jotted is really a list of UTF-8-encoded str, then you don't need this and this should be enough:
jottedf = '\n'.join(jotted)
filemydata.write(jottedf)
By the way, it looks as though NLTK isn't very careful with respect to unicode and encoding (at least in the demos), so be careful and check that it has processed your tokens correctly. Also, check your encodings; that may be why you get errors with Hungarian text but not German.
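On that last point, if the Hungarian file turns out not to be UTF-8, opening it with the correct encoding is the fix. A sketch for Python 2, with the encoding value purely a guess (ISO-8859-2 is a common choice for Hungarian text):

import codecs

# Hypothetical encoding; adjust to whatever the file actually uses.
with codecs.open("lk1-les1", encoding="iso-8859-2") as fileopen:
    rawness = fileopen.read()   # already a unicode object, no .decode() needed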