I am trying to write the title of the current window to a file. The problem with the code is that it must use an encoding (utf-8); otherwise, if a window such as Outlook is open with the window name "Inbox - Outlook - Mail", it gives the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u200e' in position 16: character maps to <undefined>

But when using the utf-8 encoded file, its contents cannot be passed to base64, which gives the following error (of course):

ValueError: string argument should contain only ASCII characters

Is there a way to encode or encrypt this file? (I've used rot-13, which worked, and md5, but that didn't work well for reading back and decrypting.) Or is there a way to make the output of q = w.GetWindowText(w.GetForegroundWindow()) not in 'utf-8'?
code:

import win32gui
import time
import psutil
import win32process

i = 0
while i <= 1:
    time.sleep(2)
    w = win32gui
    q = w.GetWindowText(w.GetForegroundWindow())
    q = str(q)
    print(q)
    pid = win32process.GetWindowThreadProcessId(w.GetForegroundWindow())
    print(psutil.Process(pid[-1]).name())
    with open("lolp.txt", 'w', encoding='utf-8') as f:
        f.write(q)
As far as I can tell, you need to remove non-ASCII symbols from q. It can be done in many ways. For example:
import win32gui

q = win32gui.GetWindowText(win32gui.GetForegroundWindow())

def remove_non_ascii(char):
    # keep only printable ASCII characters
    return 32 <= ord(char) <= 126

q = filter(remove_non_ascii, q)
print("".join(list(q)))
Other solutions can be found here: How can I remove non-ASCII characters but leave periods and spaces?
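If the goal is simply to get the title into base64, there is no need to strip anything: base64 operates on bytes, not text, so the title just has to be encoded to UTF-8 bytes first. A minimal Python 3 sketch (the title string here is made up for illustration):

```python
import base64

title = "Inbox \u200e- Outlook - Mail"  # example title containing a non-ASCII mark

# base64 works on bytes, so encode the text to UTF-8 first
encoded = base64.b64encode(title.encode("utf-8")).decode("ascii")

# reverse the steps to get the original title back
decoded = base64.b64decode(encoded).decode("utf-8")
assert decoded == title
```

The intermediate `encoded` string is pure ASCII, so it can be written to a file opened without any special encoding.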
I'm reading a config file in Python, getting its sections, and creating a new config file for each section.
However, I'm getting a decode error because one of the strings contains Español=spain:

self.output_file.write( what.replace( " = ", "=", 1 ) )
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

How would I adjust my code to allow for encoded characters such as these? I'm very new to this, so please excuse me if this is something simple.
class EqualsSpaceRemover:
    output_file = None

    def __init__(self, new_output_file):
        self.output_file = new_output_file

    def write(self, what):
        self.output_file.write(what.replace(" = ", "=", 1))

def get_sections():
    configFilePath = 'C:\\test.ini'
    config = ConfigParser.ConfigParser()
    config.optionxform = str
    config.read(configFilePath)
    for section in config.sections():
        configdata = {k: v for k, v in config.items(section)}
        confignew = ConfigParser.ConfigParser()
        cfgfile = open("C:\\" + section + ".ini", 'w')
        confignew.add_section(section)
        for x in configdata.items():
            confignew.set(section, x[0], x[1])
        confignew.write(EqualsSpaceRemover(cfgfile))
        cfgfile.close()
If you use Python 2 with from __future__ import unicode_literals, then every string literal you write is a unicode literal, as if you had prefixed every literal with u"...", unless you explicitly write b"...".
This explains why you get a UnicodeDecodeError on this line:
what.replace(" = ", "=", 1)
because what you actually do is
what.replace(u" = ", u"=", 1)
ConfigParser uses plain old str for its items when it reads a file using the parser.read() method, which means what will be a str. If you pass unicode arguments to str.replace(), the string is converted (decoded) to unicode, the replacement is applied, and the result is returned as unicode. But if what contains bytes that can't be decoded to unicode using the default encoding, you get a UnicodeDecodeError where you wouldn't expect one.
So to make this work you can either
use explicit prefixes for byte strings: what.replace(b" = ", b"=", 1)
or remove the unicode_literals future import.
Generally you shouldn't mix unicode and str (Python 3 fixes this by making it an error in almost every case). Be aware that from __future__ import unicode_literals changes every unprefixed literal to unicode but does not automatically make your code work with unicode in all cases. Quite the opposite, in many cases.
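To illustrate that last point: in Python 3 the same kind of bytes/text mixing fails immediately with a TypeError, instead of a data-dependent UnicodeDecodeError that only appears once non-ASCII input shows up. A small sketch (not the asker's code):

```python
# Python 3 refuses to mix bytes and str outright, so the bug surfaces
# immediately rather than only when non-ASCII data arrives.
try:
    b"foo = bar".replace(" = ", "=", 1)  # str arguments on a bytes object
except TypeError:
    print("bytes/str mix rejected")  # raised regardless of the data's content

# With matching types the call works as expected:
result = b"foo = bar".replace(b" = ", b"=", 1)  # b"foo=bar"
```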
I'm having problems brute-forcing the key for a string encrypted with RC4/ARC4.
This is the encrypted string:
E7Ev08_MEojYBixHRKTKQnRSC4hkriZ7XPsy3p4xAHUPj41Dlzu9
The string is then base64-encoded, so the complete encoded string is:
RTdFdjA4X01Fb2pZQml4SFJLVEtRblJTQzRoa3JpWjdYUHN5M3A0eEFIVVBqNDFEbHp1OQ==
#-*- coding: utf-8 -*-
import threading
import sys
import time
import re
import itertools
from itertools import product
from Crypto.Cipher import ARC4
import base64

def special_match(strg):
    try:
        strg.decode('utf-8')
    except UnicodeDecodeError:
        pass
    else:
        print('\nkey found at %s, key: %s' % (time.ctime(), rc4_key))
        try:
            f = open('key.txt', 'ab')
            f.write('Key (%s): %s\n' % (time.ctime(), rc4_key))
            f.write('Decrypted string: ' + strg + '\n')
            f.close()
        except Exception as e:
            print('ERROR WRITING KEY TO FILE: ' + str(e))

chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
end_chars = chars[::-1][0:7]
encoded_string = 'RTdFdjA4X01Fb2pZQml4SFJLVEtRblJTQzRoa3JpWjdYUHN5M3A0eEFIVVBqNDFEbHp1OQ=='
spinner = itertools.cycle(['-', '/', '|', '\\'])

while 1:
    try:
        # Iterate over possible keys
        for length in range(7, 8):  # only try keys of length 7
            for attempt in itertools.permutations(chars, length):
                rc4_key = ''.join(attempt)  # this key is unknown; we are looking for it
                Ckey = ARC4.new(rc4_key)
                decoded = Ckey.decrypt(encoded_string.decode('base64'))
                special_match(decoded)
                sys.stdout.write(spinner.next())  # write the next spinner character
                sys.stdout.flush()                # flush stdout buffer (actual character display)
                sys.stdout.write('\b')            # erase the last written char
                # Exit the script when all password-combination iterations are done
                if rc4_key == end_chars:
                    print('iteration of combinations done! No key found.. :(\n' + time.ctime())
                    exit()
    except KeyboardInterrupt:
        print('\nKeyboard interrupt, exiting gracefully on %s at %s' % (rc4_key, time.ctime()))
        sys.exit()
I'm using http://crypo.bz.ms/secure-rc4-online to encrypt the string and https://www.base64encode.org to base64-encode it with UTF-8.
Question
Why doesn't my script work to find the key?
(I'm not receiving any error message; it is more a general question of whether I have missed something in my code or in my approach to the problem.)
plaintext: This is something that I have encrypted, key: ABCFMSG
Alright, it seems that crypo.bz uses a really weird system. Basically they have a really odd output encoding, which causes discrepancies if you simply use their characters.
For example, encrypting 'a' with key 'A' should produce a character with value 163 (hex A3). On crypo.bz we get 'oc' instead.
So you have two possibilities: either do some ciphertext analysis, or use another site. I recommend this one, as they tell you what encoding the result is in:
http://www.fyneworks.com/encryption/RC4-Encryption/index.asp
Take the hex output and convert it to a string; then you should be able to decipher it.
Your code seems to be working, by the way ;)
Tell me if you have additional questions.
EDIT: I did some additional analysis, and it is really, really weird.
On crypo.bz, IF the algorithm is correct, 163 is 'oc',
160 is 'nc',
but 161 is 'mc'??
If anyone figures this out, please tell me!
EDIT 2:
Here is the encrypted, but not encoded, string: '#ÔèïH§¢6pbpÊ]õªœIôŒ>Yœ5îfäGuæxÖa…ë6°'
Your program takes about half a second to find the key ;)
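For anyone who wants to check candidate keys locally without PyCrypto, RC4 is small enough to sketch in pure Python 3. This is the generic textbook algorithm (not crypo.bz's variant), checked against the well-known "Key"/"Plaintext" test vector:

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the cipher is its own inverse)."""
    # Key-scheduling algorithm (KSA): permute S under the key
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA): XOR keystream with data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

# Standard test vector: RC4("Key", "Plaintext") -> bbf316e8d940af0ad3
assert rc4(b"Key", b"Plaintext").hex() == "bbf316e8d940af0ad3"
```

Running a ciphertext from another tool through this function makes it easy to see whether that tool implements plain RC4 or adds its own encoding layer.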
Python 2.6; upgrading is not an option.
The script is designed to take fields from an ArcGIS database and create Oracle INSERT statements in a text file that can be used at a later date. There are 7500 records; after 3000 records it errors out and says the problem lies at:
fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')
I have tried seemingly every variation of unicode and encode. I am new to Python and really just need someone with experience to look at my code and see where the problem is.
import arcpy

# where the GDB table is located
fc = "D:\GIS Data\eMaps\ALDOT\ALDOT_eMaps_SignInventory.gdb/SignLocation"
fields = arcpy.ListFields(fc)
cursor = arcpy.SearchCursor(fc)

# text file output
outFile = open(r"C:\Users\kkieliszek\Desktop\transfer.text", "w")

# insert values into table billboard.sign_inventory
for row in cursor:
    outFile.write("Insert into billboard.sign_inventory() Values (")
    for field in fields:
        fieldValue = unicode(str(row.getValue(field.name)), 'utf-8', errors='ignore')
        if row.isNull(field.name) or fieldValue.strip() == "":  # if field is Null or an empty string, print NULL
            value = "NULL"
            outFile.write('"' + value + '",')
        else:  # print the value in the field
            value = str(row.getValue(field.name))
            outFile.write('"' + value + '",')
    outFile.write("); \n\n ")

outFile.close()  # close the text file
Error Code:
Traceback (most recent call last):
File "tablemig.py", line 25, in <module>
fieldValue = unicode(str(row.getValue(field.name)),'utf-8',errors='ignore')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 76: ordinal not in range(128)
Never call str() on a unicode object:
>>> str(u'\u2019')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)
To write rows that contain Unicode strings in CSV format, use the UnicodeWriter recipe from the csv module documentation instead of formatting the fields manually. It should fix several issues at once.
File text wrappers make manual encoding/decoding unnecessary.
Assuming the value from the row is unicode, simply use io.open() with the encoding argument set to the required encoding.
For example:

import io

with io.open(r"C:\Users\kkieliszek\Desktop\transfer.text", "w", encoding='utf-8') as my_file:
    my_file.write(my_unicode)
The problem is that you need to decode/encode the string instead of just calling str() on it. If you have a byte string, call decode() on it to convert it into a unicode object, ignoring anything that isn't ASCII. Conversely, if you have a unicode object, call encode() on it to convert it into a byte string, again ignoring non-ASCII content. So just use this function instead:
import re

def remove_unicode(string_data):
    """ (str|unicode) -> (str|unicode)

    Recovers ASCII content from string_data.
    """
    if string_data is None:
        return string_data
    if isinstance(string_data, str):
        string_data = str(string_data.decode('ascii', 'ignore'))
    else:
        string_data = string_data.encode('ascii', 'ignore')
    remove_ctrl_chars_regex = re.compile(r'[^\x20-\x7e]')
    return remove_ctrl_chars_regex.sub('', string_data)

fieldValue = remove_unicode(row.getValue(field.name))
It should fix the problem.
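The same idea carries over to Python 3, where there is only one text type and a single round-trip through ASCII does the job. A sketch with a made-up sample string containing the U+2019 character from the traceback:

```python
text = "It\u2019s a sign"  # contains RIGHT SINGLE QUOTATION MARK (U+2019)

# encode to ASCII dropping unrepresentable characters, then decode back to text
ascii_only = text.encode("ascii", "ignore").decode("ascii")
print(ascii_only)  # -> Its a sign
```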
I wish to use the cp437 character map from the utf-8 encoding.
I have all the code points for each of the cp437 characters.
The following code correctly displays a single cp437 character:
import locale
locale.setlocale(locale.LC_ALL, '')
icon = u'\u263A'.encode('utf-8')
print icon
Whereas the following code shows most of the cp437 characters, but not all:
for i in range(0x00, 0x100):
    print chr(i).decode('cp437')
My guess is that the second approach is not referencing the UTF-8 encoding, but a separate, incomplete cp437 character set.
I would like a way to summon any cp437 character via UTF-8 without having to specify each of the 256 individual code points. I have resorted to manually typing the Unicode code point strings into a massive 16x16 table. Is there a better way?
The following code demonstrates this:
from curses import *
import locale
locale.setlocale(locale.LC_ALL, '')

def main(stdscr):
    maxyx = stdscr.getmaxyx()
    text = str(maxyx)
    y_mid = maxyx[0] // 2
    x_mid = maxyx[1] // 2
    next_y, next_x = y_mid, x_mid
    curs_set(1)
    noecho()
    event = 1
    y = 0; x = 0
    icon1 = u'\u2302'.encode('utf-8')
    icon2 = chr(0x7F).decode('cp437')
    while event != ord('q'):
        stdscr.addstr(y_mid, x_mid - 10, icon1)
        stdscr.addstr(y_mid, x_mid + 10, icon2)
        event = stdscr.getch()

wrapper(main)
The icon on the left is from UTF-8 and does print to the screen.
The icon on the right is from decode('cp437') and does not print to the screen correctly (it appears as ^?).
As mentioned by #Martijn in the comments, the stock cp437 decoder converts characters 0-127 straight into their ASCII equivalents. For some applications this would be the right thing, as you wouldn't for example want '\n' to translate to u'\u25d9'. But for full fidelity to the code page, you need a custom decoder and encoder.
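This behavior is easy to confirm with the stock codec (a Python 3 sketch): bytes 0x00-0x7F pass through as their ASCII/control equivalents, while the high half of the code page is translated faithfully:

```python
# Byte 0x01 stays a control character under the stock cp437 codec,
# even though a full-fidelity CP437 table would map it to U+263A.
low = b"\x01".decode("cp437")
assert low == "\x01"

# The high half (0x80-0xFF) is translated faithfully: 0x82 -> 'é' (U+00E9).
high = b"\x82".decode("cp437")
assert high == "\u00e9"
```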
The codecs module makes it easy to add your own codecs, but examples are hard to find. I used the sample at http://pymotw.com/2/codecs/ along with the Wikipedia table for Code page 437 to generate this module; it automatically registers a codec with the name 'cp437ex' when you import it.
import codecs

codec_name = 'cp437ex'

_table = u'\0\u263a\u263b\u2665\u2666\u2663\u2660\u2022\u25d8\u25cb\u25d9\u2642\u2640\u266a\u266b\u263c\u25ba\u25c4\u2195\u203c\xb6\xa7\u25ac\u21a8\u2191\u2193\u2192\u2190\u221f\u2194\u25b2\u25bc !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u2302\xc7\xfc\xe9\xe2\xe4\xe0\xe5\xe7\xea\xeb\xe8\xef\xee\xec\xc4\xc5\xc9\xe6\xc6\xf4\xf6\xf2\xfb\xf9\xff\xd6\xdc\xa2\xa3\xa5\u20a7\u0192\xe1\xed\xf3\xfa\xf1\xd1\xaa\xba\xbf\u2310\xac\xbd\xbc\xa1\xab\xbb\u2591\u2592\u2593\u2502\u2524\u2561\u2562\u2556\u2555\u2563\u2551\u2557\u255d\u255c\u255b\u2510\u2514\u2534\u252c\u251c\u2500\u253c\u255e\u255f\u255a\u2554\u2569\u2566\u2560\u2550\u256c\u2567\u2568\u2564\u2565\u2559\u2558\u2552\u2553\u256b\u256a\u2518\u250c\u2588\u2584\u258c\u2590\u2580\u03b1\xdf\u0393\u03c0\u03a3\u03c3\xb5\u03c4\u03a6\u0398\u03a9\u03b4\u221e\u03c6\u03b5\u2229\u2261\xb1\u2265\u2264\u2320\u2321\xf7\u2248\xb0\u2219\xb7\u221a\u207f\xb2\u25a0\xa0'

decoding_map = {i: ord(ch) for i, ch in enumerate(_table)}
encoding_map = codecs.make_encoding_map(decoding_map)

class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_map)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_map)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_map)[0]

class StreamReader(Codec, codecs.StreamReader):
    pass

class StreamWriter(Codec, codecs.StreamWriter):
    pass

def _register(encoding):
    if encoding == codec_name:
        return codecs.CodecInfo(
            name=codec_name,
            encode=Codec().encode,
            decode=Codec().decode,
            incrementalencoder=IncrementalEncoder,
            incrementaldecoder=IncrementalDecoder,
            streamreader=StreamReader,
            streamwriter=StreamWriter)

codecs.register(_register)
Also note that decode produces Unicode strings, while encode produces byte strings. Printing a Unicode string should always work, but your question indicates you may have an incorrect default encoding. One of these should work:
icon2='\x7f'.decode('cp437ex')
icon2='\x7f'.decode('cp437ex').encode('utf-8')
I have the following function to create cipher text and then save it:
def create_credential(self):
    des = DES.new(CIPHER_N, DES.MODE_ECB)
    text = str(uuid.uuid4()).replace('-', '')[:16]
    cipher_text = des.encrypt(text)
    return cipher_text

def decrypt_credential(self, text):
    des = DES.new(CIPHER_N, DES.MODE_ECB)
    return des.decrypt(text)

def update_access_credentials(self):
    self.access_key = self.create_credential()
    print repr(self.access_key)  # "\xf9\xad\xfbO\xc1lJ'\xb3\xda\x7f\x84\x10\xbbv&"
    self.access_password = self.create_credential()
    self.save()
And I will call:
>>> from main.models import *
>>> u=User.objects.all()[0]
>>> u.update_access_credentials()
And this is the stack trace I get:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 738: invalid start byte
Why is this occurring and how would I get around it?
You are storing a bytestring in a Unicode database field, so it'll be decoded to Unicode on the way in.
Either use a database field that can store opaque binary data, decode explicitly to Unicode (latin-1 maps bytes one-to-one to Unicode codepoints), or wrap your data in a representation that can be stored as text.
For Django 1.6 and up, use a BinaryField, for example. For earlier versions, using a binary-to-text conversion (such as Base64) would be preferable over decoding to Latin-1; the result of the latter would not give you meaningful textual data but Django may try to display it as such (in the admin interface for example).
It's occurring because you're attempting to save non-text data in a text field. Either use a non-text field instead, or encode the data as text via e.g. Base-64 encoding.
Using base64 encoding and decoding here fixed this:
import base64

def create_credential(self):
    des = DES.new(CIPHER_N, DES.MODE_ECB)
    text = str(uuid.uuid4()).replace('-', '')[:16]
    cipher_text = des.encrypt(text)
    base64_encrypted_message = base64.b64encode(cipher_text)
    return base64_encrypted_message

def decrypt_credential(self, text):
    text = base64.b64decode(text)
    des = DES.new(CIPHER_N, DES.MODE_ECB)
    message = des.decrypt(text)
    return message
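The fix is easy to sanity-check without DES at all; any opaque byte string behaves the same way. A Python 3 sketch using the cipher bytes printed by update_access_credentials above:

```python
import base64

cipher_text = b"\xf9\xad\xfbO\xc1lJ'\xb3\xda\x7f\x84\x10\xbbv&"

# Raw cipher bytes are not valid UTF-8 -- this is the original error.
try:
    cipher_text.decode("utf-8")
except UnicodeDecodeError:
    pass  # 0xf9 is an invalid UTF-8 start byte

# Base64 turns the bytes into plain ASCII that any text column can hold,
# and the transformation is fully reversible.
stored = base64.b64encode(cipher_text).decode("ascii")
assert base64.b64decode(stored) == cipher_text
```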