How can I save a string s obtained in this way:
f = open("a.jpg", "rb")
b = f.read()
s = str(b)[2:-1]
as a .jpg file? In another program I only have s as the representation of this image, i.e. "\\xff\\xd8\\xff\\xe0...".
My recommendation would be to change the code that creates the string in this strange way.
If that is not possible:
Where is this string coming from? Is it a trusted source, or does it come from a web server or a person who might want to hack or break your computer?
Is the string really created with str(b)[2:-1], or is this just an approximation of the real problem?
I am asking because this makes things a little more complicated than necessary (it requires a try / except).
The following code should work:
from ast import literal_eval
def stripped_str_b_to_bytes(s):
    try:
        return literal_eval("b'" + s + "'")
    except SyntaxError:
        return literal_eval('b"' + s + '"')
testvalues = [
    b"A'B",
    b'A"B',
    bytes([v for v in range(256)]),
]
for b in testvalues:
    print("testing with ", b)
    s = str(b)[2:-1]  # same transformation as in the question
    print("S =", s)
    c = stripped_str_b_to_bytes(s)
    assert b == c
It tries to prepend b' and append ' to s and evaluate that string as a python expression.
If this doesn't work, then it tries to prepend b" and append " to s and evaluate it.
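To actually get a .jpg file, the recovered bytes still have to be written in binary mode. A minimal sketch of that last step, assuming the string comes from a trusted source; the helper name save_stripped_str and the temporary output path are made up for illustration:

```python
import os
import tempfile
from ast import literal_eval

def stripped_str_b_to_bytes(s):
    """Rebuild bytes from text that was produced by str(b)[2:-1]."""
    try:
        return literal_eval("b'" + s + "'")
    except SyntaxError:
        return literal_eval('b"' + s + '"')

def save_stripped_str(s, path):
    """Write the recovered bytes to path in binary mode ("wb")."""
    data = stripped_str_b_to_bytes(s)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)

# Hypothetical input: only the first four bytes of a JPEG header.
s = str(b"\xff\xd8\xff\xe0")[2:-1]
path = os.path.join(tempfile.gettempdir(), "restored.jpg")
written = save_stripped_str(s, path)
print(written, "bytes written to", path)
```

Opening the file with "wb" matters: text mode would try to encode the data again instead of writing the raw bytes.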
I am currently working on a binary encryption code: [Sender (Msg Input => Binary Conversion)] : [Receiver (Binary Conversion => Msg Output)]
So far I am able to convert plain text messages, e.g. How are you?
print("Enter Msg:")
def Binary_Encryption(message):
    message = ''.join(format(i, 'b') for i in bytearray(message, encoding ='utf-8'))
    print(message)
Binary_Encryption(input("").replace (" ","\\"))
Output: 10010001101111111011110111001100001111001011001011011100111100111011111110101111111
Once the binary string is obtained, copying it into this block of code decrypts it.
def Binary_Decryption(binary):
    string = int(binary, 2)
    return string
bin_data = (input("Enter Binary:\n"))
str_data =''
for i in range(0, len(bin_data), 7):
    temp_data = bin_data[i:i + 7]
    decimal_data = Binary_Decryption(temp_data)
    str_data = str_data + chr(decimal_data)
print("Decrypted Text:\n"+str_data.replace("\\"," "))
Output: How are you?
But I am not able to convert certain inputs, e.g. ??, 8879, Oh! How are You?
Basically, the messages that are not converted are those with multiple numbers or special characters.
The input ?? gives "⌂▼", 8879 gives "qc?☺", and Oh! How are You? gives "OhC9◄_o9CeK93_k▼".
I think the problem is that the special characters (!, ?) take only 6 bits, while the other characters take 7. This messes things up if other characters follow the special one. Something like this should work, though there is probably a better way to solve it.
def Binary_Encryption(message):
    s = ""
    for i in bytearray(message, encoding="utf-8"):
        c = format(i, "b")
        addon = 7 - len(c)
        c = addon * "0" + c  # prepend 0 if len shorter than 7
        s += c  # Add to string
    print(s)
Your problem is that format(i, 'b') in Binary_Encryption truncates leading zeros, so 8, instead of being 00111000, becomes 111000, and the decoder then takes the missing bits from the next character. ASCII characters are normally stored as 8-bit values, so pad each one to 8 bits; the decoding loop then has to read 8 bits at a time instead of 7.
To print the number 8897, use 0011100000111000001110010011011100001010 as input to the decryption code (the final 00001010 is a newline). Look at an ASCII table to see the binary equivalent of each character. Just edit your code like this:
print("Enter Msg:")
def Binary_Encryption(message):
    # pass 08b to format
    message = ''.join(format(i, '08b') for i in bytearray(message, encoding ='utf-8'))
    print(message)
Binary_Encryption(input("").replace (" ","\\"))
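To make the fixed-width idea concrete, here is a small round-trip sketch in which both sides agree on 8 bits per byte. The names binary_encode and binary_decode are made up for this illustration and are not part of the code above:

```python
def binary_encode(message):
    """Encode text as a bit string, always 8 bits per UTF-8 byte."""
    return "".join(format(byte, "08b") for byte in message.encode("utf-8"))

def binary_decode(bits):
    """Decode a bit string produced by binary_encode back into text."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

# The inputs that failed in the question now survive a round trip.
for text in ["??", "8879", "Oh! How are You?"]:
    assert binary_decode(binary_encode(text)) == text
    print(text, "->", binary_encode(text))
```

Because every byte takes exactly 8 characters, the decoder never has to guess where one character ends and the next begins.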
I am trying to find and replace several lines of plain text in multiple files with input() but when I enter '\n' characters to represent where the new line chars would be in the text, it doesn't find it and doesn't replace it.
I tried to use raw_strings but couldn't get them to work.
Is this a job for regular expressions?
python 3.7
import os
import re
import time
start = time.time()
# enter path and check input for standard format
scan_folder = input('Enter the absolute path to scan:\n')
validate_path_regex = re.compile(r'[a-z,A-Z]:\\?(\\?\w*\\?)*')
mo = validate_path_regex.search(scan_folder)
if mo is None:
    print('Path is not valid. Please re-enter path.\n')
    import sys
    sys.exit()
os.chdir(scan_folder)
# get find/replaceStrings, and then confirm that inputs are correct.
find_string = input('Enter the text you wish to find:\n')
replace_string = input('Enter the text to replace:\n')
permission = input('\nPlease confirm you want to replace '
                   + find_string + ' with '
                   + replace_string + ' in ' + scan_folder
                   + ' directory.\n\nType "yes" to continue.\n')
if permission == 'yes':
    change_count = 0
    # Context manager for results file
    with open('find_and_replace.txt', 'w') as results:
        for root, subdirs, files in os.walk(scan_folder):
            for file in files:
                # ignore files that don't endwith '.mpr'
                if os.path.join(root, file).endswith('.mpr'):
                    fullpath = os.path.join(root, file)
                    # context manager for each file opened
                    with open(fullpath, 'r+') as f:
                        text = f.read()
                        # only add to changeCount if find_string is in text
                        if find_string in text:
                            change_count += 1
                            # move cursor back to beginning of the file
                            f.seek(0)
                            f.write(text.replace(find_string, replace_string))
        results.write(str(change_count)
                      + ' files have been modified to replace '
                      + find_string + ' with ' + replace_string + '.\n')
    print('Done with replacement')
else:
    print('Find and replace has not been executed')
end = time.time()
print('Program took ' + str(round((end - start), 4)) + ' secs to complete.\n')
find_string = BM="LS"\nTI="12"\nDU="7"
replace_string = BM="LSL"\nDU="7"
The original file looks like
BM="LS"
TI="12"
DU="7"
and I would like it to change to
BM="LSL"
DU="7"
but the file doesn't change.
So, the misconception you have is about the distinction between source code, which understands escape sequences like "this is a string \n with two lines" and things like "raw strings" (a concept that doesn't make sense in this context), and the data you are providing as user input. The input function basically processes data coming in from the standard input device. When you provide data to standard input, it is read as raw bytes, and the input function then assumes it is meant to be text (decoded using whatever your system settings imply), so a typed backslash followed by n stays two literal characters.
There are two approaches to allow a user to input newlines. The first is to use sys.stdin; however, this will require you to provide an EOF, probably using ctrl + D:
>>> import sys
>>> x = sys.stdin.read()
here is some text and i'm pressing return
to make a new line. now to stop input, press control d>>> x
"here is some text and i'm pressing return\nto make a new line. now to stop input, press control d"
>>> print(x)
here is some text and i'm pressing return
to make a new line. now to stop input, press control d
This is not very user-friendly. You have to either pass a newline and an EOF, i.e. return + ctrl + D or do ctrl + D twice, and this depends on the system, I believe.
A better approach would be to allow the user to input escape sequences, and then decode them yourself:
>>> x = input()
I want this to\nbe on two lines
>>> x
'I want this to\\nbe on two lines'
>>> print(x)
I want this to\nbe on two lines
>>> x.encode('utf8').decode('unicode_escape')
'I want this to\nbe on two lines'
>>> print(x.encode('utf8').decode('unicode_escape'))
I want this to
be on two lines
>>>
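Applied to the find-and-replace script, the second approach could look roughly like this. It is only a sketch: decode_escapes is a made-up helper name, the raw string literals stand in for what input() would return, and the encode/decode trick is only safe for ASCII input (non-ASCII characters get mangled by this simple version):

```python
def decode_escapes(user_text):
    """Turn typed escape sequences such as \\n into the real characters."""
    return user_text.encode("utf-8").decode("unicode_escape")

# Raw strings simulate what input() returns when the user types \n.
typed_find = r'BM="LS"\nTI="12"\nDU="7"'
typed_replace = r'BM="LSL"\nDU="7"'

# Stand-in for the contents of one .mpr file.
file_text = 'BM="LS"\nTI="12"\nDU="7"\n'

new_text = file_text.replace(decode_escapes(typed_find),
                             decode_escapes(typed_replace))
print(new_text)
```

With the escapes decoded, the plain str.replace call is enough, so regular expressions are not needed for this case.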
I am looking to convert a file to binary for a project, preferably using Python as I am most comfortable with it, though if walked-through, I could probably use another language.
Basically, I need this for a project I am working on where we want to store data using a DNA strand and thus need to store files in binary ('A's and 'T's = 0, 'G's and 'C's = 1)
Any idea how I could proceed? I did find that one could encode in base64 and then decode it, but that seems a bit inefficient, and the code that I have doesn't seem to work...
import base64
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
print(file_path)
with open(file_path) as f:
    encoded = base64.b64encode(f.readlines())
print(encoded)
Also, I already have a program to do that simply with text. Any tips on how to improve it would also be appreciated!
import binascii
t = bytearray(str(input("Texte?")), 'utf8')
h = binascii.hexlify(t)
b = bin(int(h, 16)).replace('b','')
#removing the b that appears in the end for some reason
g = b.replace('1','G').replace('0','A')
print(g)
For example, if I input test:
ok so for the text to DNA:
I input 'test' and expect the DNA sequence that comes from the binary
the binary being: 01110100011001010111001101110100 (Also I asked to print every conversion in the example so that it is more comprehensible)
>>>Texte?test #Asks the text
>>>b'74657374' #converts to hex
>>>01110100011001010111001101110100 #converts to binary
>>>AGGGAGAAAGGAAGAGAGGGAAGGAGGGAGAA #converts 0 to A and 1 to G
So, thanks to @jonrshape and Sergey Vturin, I finally was able to achieve what I wanted!
My program asks for a file, turns it into binary, which then gives me its equivalent in "DNA code" using pairs of binary numbers (00 = A, 01 = T, 10 = G, 11 = C)
import binascii
from tkinter import filedialog
file_path = filedialog.askopenfilename()
x = ""
with open(file_path, 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        x += str(binascii.hexlify(chunk)).replace("b","").replace("'","")
b = bin(int(x, 16)).replace('b','')
g = [b[i:i+2] for i in range(0, len(b), 2)]
dna = ""
for i in g:
    if i == "00":
        dna += "A"
    elif i == "01":
        dna += "T"
    elif i == "10":
        dna += "G"
    elif i == "11":
        dna += "C"
print(x) #hexdump
print(b) #converted to binary
print(dna) #converted to "DNA"
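One caveat with going through bin(int(x, 16)) is that leading zero bits are not preserved in general, so a file that starts with zero bytes would not convert back cleanly. A hedged alternative sketch that formats every byte at a fixed 8-bit width and uses the same pair mapping (00 = A, 01 = T, 10 = G, 11 = C); the function names are made up for illustration:

```python
BITS_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bytes_to_dna(data):
    """Map every pair of bits to one base, keeping leading zeros."""
    bits = "".join(format(byte, "08b") for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bytes(dna):
    """Inverse of bytes_to_dna: four bases give back one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

sample = b"\x00test"  # starts with a zero byte on purpose
dna = bytes_to_dna(sample)
print(dna)
assert dna_to_bytes(dna) == sample
```

Since each byte always becomes exactly four bases, the conversion is reversible. For a real file, the bytes would come from open(file_path, 'rb').read().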
Of course it is inefficient!
base64 is designed to store binary data as text, so the block gets bigger after conversion (roughly 4/3 of the original size).
btw: what kind of efficiency do you want? Compactness?
If so, the second sample is much nearer to what you want.
btw: in your task you lose information! Are you aware of this?
Here is a sample of how to store and restore.
It stores the data in an easy to understand hex-in-text format, just for the sake of a demo. If you want compactness, you can easily modify the code to store it in a binary file; if you want a 00011001 view, that modification is easy too.
import math
#"make a long test string"
import numpy as np
s=''.join((str(x) for x in np.random.randint(4,size=33)))\
.replace('0','A').replace('1','T').replace('2','G').replace('3','C')
def store_(s):
    size=len(s) #size will be changed to fit 8*integer so remember true value of it and store with data
    s2=s.replace('A','0').replace('T','0').replace('G','1').replace('C','1')\
        .ljust( int(math.ceil(size/8.)*8),'0') #add '0' to 8xInt to the right
    a=(hex( eval('0b'+s2[i*8:i*8+8]) )[2:].rjust(2,'0') for i in xrange(len(s2)/8))
    return ''.join(a),size
yourDataAsHexInText,sizeToStore=store_(s)
print yourDataAsHexInText,sizeToStore
def restore_(s,size=None):
    if size==None: size=len(s)/2
    a=( bin(eval('0x'+s[i*2:i*2+2]))[2:].rjust(8,'0') for i in xrange(len(s)/2))
    #you lose information, remember?, so it`s only A or G
    return (''.join(a).replace('1','G').replace('0','A') )[:size]
restore_(yourDataAsHexInText,sizeToStore)
print "so check it"
print s ,"(input)"
print store_(s)
print s.replace('C','G').replace('T','A') ,"to compare with information loss"
print restore_(*store_(s)),"restored"
print s.replace('C','G').replace('T','A') == restore_(*store_(s))
result in my test:
63c9308a00 33
so check it
AGCAATGCCGATGTTCATCGTATACTTTGACTA (input)
('63c9308a00', 33)
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA to compare with information loss
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA restored
True
I have narrowed down my problem to the following code. I am trying to convert a string into the equivalent z3 expression. The problem is that when the variable name is long, the 'eval' puts an extra \n in the middle of the expression, but if I use a shorter variable name the extra \n is not there. I need to keep the longer variable name because it is not under my control. Please suggest how I can make the code work correctly with longer variable names.
EXTRA \n PRODUCING CODE
from z3 import BitVec, Solver ##UnresolvedImport
z3sig_dict = {}
z3sig_dict['v__DOT__process_1_reg3'] = {"z3name":BitVec('v__DOT__process_1_reg3', 32), "bits":32}
z3sig_dict['v__DOT__process_1_reg3_1'] = {"z3name":BitVec('v__DOT__process_1_reg3_1', 32), "bits":32}
string = "(z3sig_dict['v__DOT__process_1_reg3']['z3name'] == (8 + (z3sig_dict['v__DOT__process_1_reg3_1']['z3name'] % 0x20000000)))"
s = Solver()
print(string)
clause = eval(string)
print(str(clause))
s.add(clause)
The output of this code is
(z3sig_dict['v__DOT__process_1_reg3']['z3name'] == (8 + (z3sig_dict['v__DOT__process_1_reg3_1']['z3name'] % 0x20000000)))
v__DOT__process_1_reg3 ==
8 + v__DOT__process_1_reg3_1%536870912
CORRECTLY WORKING CODE
from z3 import BitVec, Solver ##UnresolvedImport
z3sig_dict = {}
z3sig_dict['reg3'] = {"z3name":BitVec('reg3', 32), "bits":32}
z3sig_dict['reg3_1'] = {"z3name":BitVec('reg3_1', 32), "bits":32}
string = "(z3sig_dict['reg3']['z3name'] == (8 + (z3sig_dict['reg3_1']['z3name'] % 0x20000000)))"
s = Solver()
print(string)
clause = eval(string)
print(str(clause))
s.add(clause)
The output of this code is
(z3sig_dict['reg3']['z3name'] == (8 + (z3sig_dict['reg3_1']['z3name'] % 0x20000000)))
reg3 == 8 + reg3_1%536870912
SOME OBSERVATIONS
If I reduce the % 0x20000000 to % 0x2000, the code also works correctly, but it breaks again if I add one more zero, i.e. 0x20000.
Z3 adds the \n because it thinks the output is too wide for the shell to print. By default, it assumes that only 80 characters fit into one line, but it's easy to tell Z3 to use more space:
from z3 import *
set_param(max_lines=1, max_width=1000000)
print(str(clause))
I am reading a large amount of data from an excel spreadsheet in which I read (and reformat and rewrite) from the spreadsheet using the following general structure:
book = open_workbook('file.xls')
sheettwo = book.sheet_by_index(1)
out = open('output.file', 'w')
for i in range(sheettwo.nrows):
    z = i + 1
    toprint = """formatting of the data im writing. important stuff is to the right -> """ + str(sheettwo.cell(z,y).value) + """ more formatting! """ + str(sheettwo.cell(z,x).value.encode('utf-8')) + """ and done"""
    out.write(toprint)
    out.write("\n")
where x and y are arbitrary cells in this case, with x being less arbitrary and containing utf-8 characters
So far I have only been using the .encode('utf-8') in cells where I know there will be errors otherwise or foresee an error without using utf-8.
My question is basically this: is there a disadvantage to using .encode('utf-8') on all of the cells even if it is unnecessary? Efficiency is not an issue. the main issue is that it works even if there is a utf-8 character in a place there shouldn't be. If no errors would occur if I just lump the ".encode('utf-8')" onto every cell read, I will probably end up doing that.
The XLRD documentation states it clearly: "From Excel 97 onwards, text in Excel spreadsheets has been stored as Unicode." Since you are likely reading files newer than Excel 97, the cells contain Unicode code points anyway. It is therefore necessary to keep the content of these cells as Unicode within Python and not convert it to ASCII (which is what the str() function does). Use the code below:
book = open_workbook('file.xls')
sheettwo = book.sheet_by_index(1)
# Make sure you're writing Unicode encoded in UTF-8
out = open('output.file', 'w')
for i in range(sheettwo.nrows):
    z = i + 1
    toprint = u"formatting of the data im writing. important stuff is to the right -> " + unicode(sheettwo.cell(z,y).value) + u" more formatting! " + unicode(sheettwo.cell(z,x).value) + u" and done\n"
    out.write(toprint.encode('UTF-8'))
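The code above is Python 2. For readers on Python 3, where str is already Unicode, the same idea is usually expressed by letting open() do the encoding. A minimal sketch under that assumption; the rows list is a made-up stand-in for the cell values read from the sheet, and write_rows is a hypothetical helper name:

```python
import os
import tempfile

def write_rows(rows, path):
    """Write one formatted line per (number, text) pair, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as out:
        for number, text in rows:
            out.write("value -> " + str(number)
                      + " text -> " + str(text) + " and done\n")

# Stand-in data: a float cell and a cell containing a non-ASCII character.
rows = [(1.5, "\u0400"), (2, "plain")]
path = os.path.join(tempfile.gettempdir(), "output_utf8_demo.txt")
write_rows(rows, path)
```

With the encoding set on the file object, no per-cell .encode('utf-8') call is needed, whether or not a given cell contains non-ASCII text.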
This answer is really a few mild comments on the accepted answer, but they need better formatting than the SO comment facility provides.
(1) Avoiding the SO horizontal scrollbar enhances the chance that people will read your code. Try wrapping your lines, for example:
toprint = u"".join([
    u"formatting of the data im writing. "
    u"important stuff is to the right -> ",
    unicode(sheettwo.cell(z,y).value),
    u" more formatting! ",
    unicode(sheettwo.cell(z,x).value),
    u" and done\n"
])
out.write(toprint.encode('UTF-8'))
(2) Presumably you are using unicode() to convert floats and ints to unicode; it does nothing for values that are already unicode. Be aware that unicode(), like str(), gives you only 12 digits of precision for floats:
>>> unicode(123456.78901234567)
u'123456.789012'
If that is a bother, you might like to try something like this:
>>> def full_precision(x):
...     return unicode(repr(x) if isinstance(x, float) else x)
...
>>> full_precision(u'\u0400')
u'\u0400'
>>> full_precision(1234)
u'1234'
>>> full_precision(123456.78901234567)
u'123456.78901234567'
(3) xlrd builds Cell objects on the fly when demanded, so reading a value through cell_value() avoids creating a Cell object just to read its .value:
sheettwo.cell(z,y).value # slower
sheettwo.cell_value(z,y) # faster