Python program problems - python

So I've been working on this python code for a few days now. I'm trying to decode a zero-one code I made previously. Simply put it hides genomic code...
binary = raw_input ('Enter binary code:')
binary = binary.replace('00', 'A')
binary = binary.replace('01', 'C')
binary = binary.replace('10', 'G')
binary = binary.replace('11', 'T')
print binary
My issue is, it will accept something like 0110 = CG. But when I add any characters after that it messes up, like 011011 should be CGT instead it's C1CC1. If anyone could identify this issue, or even solve it that would be great.

Repeatedly take off two characters and decode them
s = "100101001010101010110"
decode = {'00':'A', '01':'C', '10':'G', '11':'T'}
while len(s) >= 2:  # stop before a trailing unpaired bit
    (code, s) = (s[:2], s[2:])
    print decode[code]

An alternative solution to ForceBru's, using the re module:
import re
dna = '100010101001010111000'
base_pairs = {'00': 'A', '01': 'C', '10':'G', '11': 'T'}
# '..' matches whole pairs only, so a trailing unpaired bit is skipped instead of raising KeyError
alpha_dna = ''.join([base_pairs[x] for x in re.findall('..', dna)])
# alpha_dna == 'GAGGGCCCTA'

In the following code error checking is omitted for the sake of clarity
a='011011'
rep={'00': 'A','01': 'C','10':'G','11': 'T'}
res=''
for x in xrange(0,len(a),2):
    res+=rep[a[x]+a[x+1]]
print res
Here you just have to split the string into some blocks of length 2 and then use each of these blocks as a key of a dictionary.
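The error checking omitted above could be sketched roughly as follows. It also shows why chained replace() calls fail: replace() scans the whole string without regard to pair boundaries, so in 011011 the second '01' is matched across the boundary between '10' and '11'. This is a Python 3 sketch; the name decode_pairs and the choice to raise ValueError are assumptions for illustration, not part of the original answers.

```python
# Hypothetical helper: decode a 2-bits-per-base string with basic validation.
BASES = {'00': 'A', '01': 'C', '10': 'G', '11': 'T'}

def decode_pairs(bits):
    """Decode a string of '0'/'1' characters two at a time."""
    if len(bits) % 2 != 0:
        raise ValueError("input length must be even, got %d" % len(bits))
    out = []
    for i in range(0, len(bits), 2):
        pair = bits[i:i + 2]
        if pair not in BASES:
            raise ValueError("invalid pair %r at position %d" % (pair, i))
        out.append(BASES[pair])
    return ''.join(out)

print(decode_pairs('011011'))  # CGT
```

Slicing in fixed steps of two keeps every lookup aligned to a pair boundary, which the replace() approach cannot guarantee.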

One liner! Probably not useful, but just for fun:
"".join([ {'00':'A', '01':'C', '10':'G', '11':'T'}[code] for code in [ binary[i:i+2] for i in range(0,len(binary),2) ]])
Yes, I'm addicted to list comprehensions.

Related

Index strings by letter including diacritics

I'm not sure how to formulate this question, but I'm looking for a magic function that makes this code
for x in magicfunc("H̶e̕l̛l͠o͟ ̨w̡o̷r̀l҉ḑ!͜"):
    print(x)
Behave like this:
H̶
e̕
l̛
l͠
o͟
̨
w̡
o̷
r̀
l҉
ḑ
!͜
Basically, is there a built in unicode function or method that takes a string and outputs an array per glyph with all their respective unicode decorators and diacritical marks and such? The same way that a text editor moves the cursor over to the next letter instead of iterating all of the combining characters.
If not, I'll write the function myself, no help needed. Just wondering if it already exists.
You can use unicodedata.combining to find out if a character is combining:
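import unicodedata
from typing import Iterable

def combine(s: str) -> Iterable[str]:
    buf = ""  # start empty so a leading combining character cannot hit None
    for x in s:
        if unicodedata.combining(x) != 0:
            # combining character: attach it to the current base character
            buf += x
        else:
            if buf:
                yield buf
            buf = x
    if buf:
        yield buf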
Result:
>>> for x in combine("H̶e̕l̛l͠o͟ ̨w̡o̷r̀l҉ḑ!͜"):
... print(x)
...
H̶
e̕
l̛
l͠
o͟
̨
w̡
o̷
r̀
l
ḑ
!͜
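The issue is that COMBINING CYRILLIC MILLIONS SIGN is not recognized as combining. unicodedata.combining returns the canonical combining class, which is 0 for enclosing marks such as this one. You could also test whether COMBINING is in unicodedata.name(x) for the character, which should solve it.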
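Another way to catch marks whose combining class is 0 is to test the general category instead: Mn, Mc and Me all start with "M", and U+0489 COMBINING CYRILLIC MILLIONS SIGN is in Me. The sketch below assumes that grouping "base character plus following marks" is good enough; it is not full grapheme-cluster segmentation, and the name split_marks is made up for this example.

```python
import unicodedata
from typing import Iterator

def split_marks(s: str) -> Iterator[str]:
    """Yield each base character together with the marks that follow it."""
    buf = ""
    for ch in s:
        # Categories Mn, Mc and Me are all marks, including enclosing marks
        if unicodedata.category(ch).startswith("M"):
            buf += ch
        else:
            if buf:
                yield buf
            buf = ch
    if buf:
        yield buf

# 'l' + U+0489 stays together, as does 'e' + U+0301
print(list(split_marks("l\u0489e\u0301!")))
```

Sequences that need real grapheme rules, such as emoji joined with zero-width joiners, are not handled by this approach.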
The 3rd party regex module can search by glyph:
>>> import regex
>>> s="H̶e̕l̛l͠o͟ ̨w̡o̷r̀l҉ḑ!͜"
>>> for x in regex.findall(r'\X',s):
... print(x)
...
H̶
e̕
l̛
l͠
o͟
̨
w̡
o̷
r̀
l҉
ḑ
!͜

How to convert Binary to Strings?

I am currently working on a binary encryption program: [Sender (Msg Input => Binary Conversion)] : [Receiver (Binary Conversion => Msg Output)]
So far I am able to convert text-based messages, e.g. How are you?
print("Enter Msg:")
def Binary_Encryption(message):
    message = ''.join(format(i, 'b') for i in bytearray(message, encoding ='utf-8'))
    print(message)
Binary_Encryption(input("").replace (" ","\\"))
Output: 10010001101111111011110111001100001111001011001011011100111100111011111110101111111
Once the binary string is obtained, copying it into this block of code decrypts it.
def Binary_Decryption(binary):
    string = int(binary, 2)
    return string
bin_data = (input("Enter Binary:\n"))
str_data =''
for i in range(0, len(bin_data), 7):
    temp_data = bin_data[i:i + 7]
    decimal_data = Binary_Decryption(temp_data)
    str_data = str_data + chr(decimal_data)
print("Decrypted Text:\n"+str_data.replace("\\"," "))
Output: How are you?
But I am not able to convert certain inputs, e.g. ??, 8879, Oh! How are You? etc.
Basically, the messages that fail to convert are those with multiple numbers or special characters.
The input ?? gives "⌂▼", 8879 gives "qc?☺", and Oh! How are You? gives "OhC9◄_o9CeK93_k▼".
I think the problem is that the special characters (!, ?) take only 6 bits, while the other characters take 7. This breaks the decoding whenever other characters follow a special one. Something like this should work, although there is probably a better way to solve it.
def Binary_Encryption(message):
    s = ""
    for i in bytearray(message, encoding="utf-8"):
        c = format(i, "b")
        addon = 7 - len(c)
        c = addon * "0" + c # prepend 0 if len shorter than 7
        s += c # Add to string
    print(s)
Your problem is that Binary_Encryption drops leading zeros, so the character 8 becomes 111000 instead of 00111000, and the decoder then borrows 2 bits from the next character. Each ASCII character is stored as one 8-bit byte, so to print 8897 (followed by a newline) you would pass 0011100000111000001110010011011100001010 to the decryption code, with its loop reading 8 bits at a time instead of 7. An ASCII table lists the binary equivalent of each character. Edit your code like this:
print("Enter Msg:")
def Binary_Encryption(message):
    # pass 08b to format
    message = ''.join(format(i, '08b') for i in bytearray(message, encoding ='utf-8'))
    print(message)
Binary_Encryption(input("").replace (" ","\\"))
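For a complete round trip, the decoding side has to use the same fixed width as the encoding side. A minimal sketch, assuming 8 bits per UTF-8 byte; the names to_bits and from_bits are hypothetical and not from the posts above:

```python
def to_bits(message: str) -> str:
    """Encode text as UTF-8 and write each byte as exactly 8 binary digits."""
    return ''.join(format(byte, '08b') for byte in message.encode('utf-8'))

def from_bits(bits: str) -> str:
    """Read the string back 8 digits at a time and decode the bytes as UTF-8."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode('utf-8')

encoded = to_bits("Oh! How are You? 8879")
print(encoded)
print(from_bits(encoded))
```

Because every byte is padded to the same width, spaces, digits and punctuation need no special handling, so the space-to-backslash replacement is no longer necessary.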

How can I use .replace() on a .txt file with accented characters?

So I have a code that takes a .txt file and adds it to a variable as a string.
Then, I try to use .replace() on it to change the character "ó" to "o", but it is not working! The console prints the same thing.
Code:
def normalize(filename):
    #Ignores errors because I get the .txt from my WhatsApp conversations and emojis raise an error.
    #File says: "Es una rubrica de evaluación." (among many emojis)
    txt_raw = open(filename, "r", errors="ignore")
    txt_read = txt_raw.read()
    #Here, only the "o" is replaced. In the real code, I use a for loop to iterate through all chrs.
    rem_accent_txt = txt_read.replace("ó", "o")
    print(rem_accent_txt)
    return
Expected output:
"Es una rubrica de evaluacion."
Current Output:
"Es una rubrica de evaluación."
It does not print an error or anything, it just prints it as it is.
I believe the problem lies on the fact that the string comes from a file because when I just create a string and use the code, it does work, but it does not work when I get the string from a file.
EDIT: SOLUTION!
Thanks to #juanpa.arrivillaga and #das-g I came up with this solution:
from unidecode import unidecode
def get_txt(filename):
    txt_raw = open(filename, "r", encoding="utf8")
    txt_read = txt_raw.read()
    txt_decode = unidecode(txt_read)
    print(txt_decode)
    return txt_decode
Almost certainly, what is occurring is that you have unnormalized Unicode strings. Essentially, there are two ways to create "ó" in Unicode:
>>> combining = 'ó'
>>> composed = 'ó'
>>> len(combining), len(composed)
(2, 1)
>>> list(combining)
['o', '́']
>>> list(composed)
['ó']
>>> import unicodedata
>>> list(map(unicodedata.name, combining))
['LATIN SMALL LETTER O', 'COMBINING ACUTE ACCENT']
>>> list(map(unicodedata.name, composed))
['LATIN SMALL LETTER O WITH ACUTE']
Just normalize your strings:
>>> composed == combining
False
>>> composed == unicodedata.normalize("NFC", combining)
True
Although, taking a step back, do you really want to remove accents? Or do you just want to normalize to composed, like the above?
As an aside, you shouldn't ignore the errors when reading your text file. You should use the correct encoding. I suspect what is happening is that you are writing your text file using an incorrect encoding, because you should be able to handle emojis just fine, they aren't anything special in unicode.
>>> emoji = "😀"
>>> print(emoji)
😀
>>>
>>> unicodedata.name(emoji)
'GRINNING FACE'
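If removing accents really is the goal, a standard-library alternative to unidecode can be sketched by decomposing with NFD and dropping the nonspacing marks. This is only an illustration under that assumption: the name strip_accents is made up here, and unlike unidecode it does not transliterate letters that have no decomposition.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters, then drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

composed = "evaluaci\u00f3n"      # precomposed LATIN SMALL LETTER O WITH ACUTE
combining = "evaluacio\u0301n"    # 'o' followed by COMBINING ACUTE ACCENT
print(strip_accents(composed))
print(strip_accents(combining))
```

Both spellings end up as the same plain string, which is why normalizing first makes the result independent of how the file stored the accent.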

how to convert repr into encoded string [duplicate]

This question already has answers here:
Convert "\x" escaped string into readable string in python
(4 answers)
Closed 7 months ago.
I have this str (coming from a file I can't fix):
In [131]: s
Out[131]: '\\xce\\xb8Oph'
This is close to the repr of a string encoded in utf8:
In [132]: repr('θOph'.encode('utf8'))
Out[132]: "b'\\xce\\xb8Oph'"
I need the original encoded string. I can do it with
In [133]: eval("b'{}'".format(s)).decode('utf8')
Out[133]: 'θOph'
But I would be ... sad? if there were no simpler option to get it. Is there a better way?
Your solution is OK, the only thing is that eval is dangerous when used with arbitrary inputs. The safe alternative is to use ast.literal_eval:
>>> s = '\\xce\\xb8Oph'
>>> from ast import literal_eval
>>> literal_eval("b'{}'".format(s)).decode('utf8')
'\u03b8Oph'
With eval you are subject to:
>>> eval("b'{}'".format("1' and print('rm -rf /') or b'u r owned")).decode('utf8')
rm -rf /
'u r owned'
Since ast.literal_eval is the opposite of repr for literals, I guess it is what you are looking for.
[update]
If you have a file with escaped unicode, you may want to open it with the unicode_escape encoding as suggested in the answer by Ginger++. I will keep my answer because the question was "how to convert repr into encoded string", not "how to decode file with escaped unicode".
Just open your file with unicode_escape encoding, like:
with open('name', encoding="unicode_escape") as f:
    pass # your code here
Original answer:
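>>> '\\xce\\xb8Oph'.encode('utf-8').decode('unicode_escape').encode('latin-1').decode('utf-8')
'θOph'
unicode_escape turns each \xNN into the code point U+00NN, so the result is re-encoded as Latin-1 and then decoded as UTF-8 to recover the original text.
You can get rid of that encoding to UTF-8, if you read your file in binary mode instead of text mode:
>>> b'\\xce\\xb8Oph'.decode('unicode_escape').encode('latin-1').decode('utf-8')
'θOph'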
Unfortunately, this is really problematic. It's \ killing you softly here.
I can only think of:
s = '\\xce\\xb8Oph\\r\\nMore test\\t\\xc5\\xa1'
n = ""
x = 0
while x!=len(s):
    if s[x]=="\\":
        sx = s[x+1:x+4]
        marker = sx[0:1]
        if marker=="x": n += chr(int(sx[1:], 16)); x += 4
        elif marker in ("'", '"', "\\", "n", "r", "v", "t", "0"):
            # Pull this dict out of a loop to speed things up
            n += {"'": "'", '"': '"', "\\": "\\", "n": "\n", "r": "\r", "t": "\t", "v": "\v", "0": "\0"}[marker]
            x += 2
        else: n += s[x]; x += 1
    else: n += s[x]; x += 1
print repr(n), repr(s)
print repr(n.decode("UTF-8"))
There might be some other trick to pull this off, but at the moment this is all I got.
To make a teeny improvement on GingerPlusPlus's answer:
import tempfile
with tempfile.TemporaryFile(mode='rb+') as f:
    f.write(r'\xce\xb8Oph'.encode())
    f.flush()
    f.seek(0)
    print(f.read().decode('unicode_escape').encode('latin1').decode())
If you open the file in binary mode (i.e. rb, since you're reading, I added + since I was also writing to the file) you can skip the first encode call. It's still awkward, because you have to bounce through the decode/encode hop, but you at least do get to avoid that first encoding call.
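The decode/encode hop can be wrapped in a small helper so each step is visible. This is a sketch under the assumption that the escaped text contains only ASCII characters; the name unescape_utf8 is hypothetical.

```python
def unescape_utf8(escaped: str) -> str:
    """Turn text like r'\xce\xb8Oph' into the UTF-8 string it describes."""
    # 1. Interpret the backslash escapes; each \xNN becomes code point U+00NN
    raw = escaped.encode("ascii").decode("unicode_escape")
    # 2. Map code points 0-255 back to single bytes
    as_bytes = raw.encode("latin-1")
    # 3. Decode those bytes as UTF-8
    return as_bytes.decode("utf-8")

print(unescape_utf8('\\xce\\xb8Oph'))
```

Unlike eval, this never executes the input, and unlike ast.literal_eval it does not need the text to be wrapped in a bytes literal first.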

Convert file to binary code in Python

I am looking to convert a file to binary for a project, preferably using Python as I am most comfortable with it, though if walked-through, I could probably use another language.
Basically, I need this for a project I am working on where we want to store data using a DNA strand and thus need to store files in binary ('A's and 'T's = 0, 'G's and 'C's = 1)
Any idea how I could proceed? I did find that one could encode in base64 and then decode it, but that seems a bit inefficient, and the code that I have doesn't seem to work...
import base64
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
print(file_path)
with open(file_path) as f:
    encoded = base64.b64encode(f.readlines())
    print(encoded)
Also, I already have a program to do that simply with text. Any tips on how to improve it would also be appreciated!
import binascii
t = bytearray(str(input("Texte?")), 'utf8')
h = binascii.hexlify(t)
b = bin(int(h, 16)).replace('b','')
#removing the b that appears in the end for some reason
g = b.replace('1','G').replace('0','A')
print(g)
For example, for the text-to-DNA program:
I input 'test' and expect the DNA sequence that comes from its binary form,
the binary being: 01110100011001010111001101110100 (I also printed every conversion step in the example to make it easier to follow)
>>>Texte?test #Asks the text
>>>b'74657374' #converts to hex
>>>01110100011001010111001101110100 #converts to binary
>>>AGGGAGAAAGGAAGAGAGGGAAGGAGGGAGAA #converts 0 to A and 1 to G
So, thanks to #jonrshape and Sergey Vturin, I finally was able to achieve what I wanted!
My program asks for a file, turns it into binary, which then gives me its equivalent in "DNA code" using pairs of binary numbers (00 = A, 01 = T, 10 = G, 11 = C)
import binascii
from tkinter import filedialog
file_path = filedialog.askopenfilename()
x = ""
with open(file_path, 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        # decode the hex bytes to text; stripping "b" with replace() would also delete the hex digit b
        x += binascii.hexlify(chunk).decode('ascii')
# slice off the '0b' prefix and pad to 4 bits per hex digit so leading zero bits are kept
b = bin(int(x, 16))[2:].zfill(len(x) * 4)
g = [b[i:i+2] for i in range(0, len(b), 2)]
dna = ""
for i in g:
    if i == "00":
        dna += "A"
    elif i == "01":
        dna += "T"
    elif i == "10":
        dna += "G"
    elif i == "11":
        dna += "C"
print(x) #hexdump
print(b) #converted to binary
print(dna) #converted to "DNA"
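Since the two-bit mapping (00 = A, 01 = T, 10 = G, 11 = C) is reversible, the original bytes can be recovered from the DNA string. A minimal round-trip sketch; the names bytes_to_dna and dna_to_bytes are assumptions for illustration, and it works on in-memory bytes rather than a file chosen through a dialog:

```python
ENCODE = {'00': 'A', '01': 'T', '10': 'G', '11': 'C'}
DECODE = {base: bits for bits, base in ENCODE.items()}

def bytes_to_dna(data: bytes) -> str:
    """Write each byte as 8 bits, then map every 2 bits to one base."""
    bits = ''.join(format(byte, '08b') for byte in data)
    return ''.join(ENCODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bytes(dna: str) -> bytes:
    """Reverse bytes_to_dna: four bases make one byte."""
    if len(dna) % 4 != 0:
        raise ValueError("DNA length must be a multiple of 4")
    bits = ''.join(DECODE[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

dna = bytes_to_dna(b'test')
print(dna)                 # TCTATGTTTCACTCTA
print(dna_to_bytes(dna))   # b'test'
```

Formatting each byte with a fixed width of 8 keeps leading zero bits, so files that start with zero bytes survive the round trip.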
Of course it is inefficient!
base64 is designed to store binary data as text, and the converted block is larger than the original.
By the way, what kind of efficiency do you want? Compactness?
If so, your second sample is much nearer to what you want.
Also, your task loses information. Are you aware of this?
Here is a sample of how to store and restore.
It stores data in an easy-to-understand hex-in-text format, just for the sake of a demo. If you want compactness, you can easily modify the code to store to a binary file; if you want a 00011001 view, that modification is easy too.
import math
#"make a long test string"
import numpy as np
s=''.join((str(x) for x in np.random.randint(4,size=33)))\
.replace('0','A').replace('1','T').replace('2','G').replace('3','C')
def store_(s):
    size=len(s) #size will be changed to fit 8*integer so remember true value of it and store with data
    s2=s.replace('A','0').replace('T','0').replace('G','1').replace('C','1')\
        .ljust( int(math.ceil(size/8.)*8),'0') #add '0' to 8xInt to the right
    a=(hex( eval('0b'+s2[i*8:i*8+8]) )[2:].rjust(2,'0') for i in xrange(len(s2)/8))
    return ''.join(a),size
yourDataAsHexInText,sizeToStore=store_(s)
print yourDataAsHexInText,sizeToStore
def restore_(s,size=None):
    if size==None: size=len(s)/2
    a=( bin(eval('0x'+s[i*2:i*2+2]))[2:].rjust(8,'0') for i in xrange(len(s)/2))
    #you lose information, remember?, so it`s only A or G
    return (''.join(a).replace('1','G').replace('0','A') )[:size]
restore_(yourDataAsHexInText,sizeToStore)
print "so check it"
print s ,"(input)"
print store_(s)
print s.replace('C','G').replace('T','A') ,"to compare with information loss"
print restore_(*store_(s)),"restored"
print s.replace('C','G').replace('T','A') == restore_(*store_(s))
result in my test:
63c9308a00 33
so check it
AGCAATGCCGATGTTCATCGTATACTTTGACTA (input)
('63c9308a00', 33)
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA to compare with information loss
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA restored
True
