I am looking to convert a file to binary for a project, preferably using Python as I am most comfortable with it, though if walked through it, I could probably use another language.
Basically, I need this for a project where we want to store data in a DNA strand, and thus need to store files in binary ('A's and 'T's = 0, 'G's and 'C's = 1).
Any idea how I could proceed? I did find that you could encode in base64, then decode it, but it seems a bit inefficient, and the code that I have doesn't seem to work...
import base64
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
print(file_path)
with open(file_path) as f:
    encoded = base64.b64encode(f.readlines())
print(encoded)
Also, I already have a program to do that simply with text. Any tips on how to improve it would also be appreciated!
import binascii
t = bytearray(str(input("Texte?")), 'utf8')
h = binascii.hexlify(t)
b = bin(int(h, 16)).replace('b','')
#removing the b that appears in the end for some reason
g = b.replace('1','G').replace('0','A')
print(g)
For example, for the text-to-DNA program:
I input 'test' and expect the DNA sequence that comes from its binary form,
the binary being: 01110100011001010111001101110100 (I also print every conversion step in the example so that it is easier to follow).
>>>Texte?test #Asks the text
>>>b'74657374' #converts to hex
>>>01110100011001010111001101110100 #converts to binary
>>>AGGGAGAAAGGAAGAGAGGGAAGGAGGGAGAA #converts 0 to A and 1 to G
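A possible tidy-up of the text version, as a minimal sketch: formatting each byte with '08b' keeps the leading zeros, so there is no stray 'b' to strip afterwards. The helper names text_to_bits and bits_to_dna are made up for illustration.

```python
def text_to_bits(text, encoding="utf-8"):
    """Return the text's bytes as a string of '0'/'1', eight digits per byte."""
    return "".join(format(byte, "08b") for byte in text.encode(encoding))


def bits_to_dna(bits):
    """Map 0 -> A and 1 -> G, the same one-bit-per-base scheme as the program above."""
    return bits.translate(str.maketrans("01", "AG"))


bits = text_to_bits("test")
print(bits)               # 01110100011001010111001101110100
print(bits_to_dna(bits))  # AGGGAGAAAGGAAGAGAGGGAAGGAGGGAGAA
```

Both printed values match the expected output listed in the example.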
So, thanks to #jonrshape and Sergey Vturin, I finally was able to achieve what I wanted!
My program asks for a file, turns it into binary, which then gives me its equivalent in "DNA code" using pairs of binary numbers (00 = A, 01 = T, 10 = G, 11 = C)
import binascii
from tkinter import filedialog
file_path = filedialog.askopenfilename()
x = ""
with open(file_path, 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        # decode() yields plain hex text; str(...).replace("b","") would also delete the hex digit 'b'
        x += binascii.hexlify(chunk).decode('ascii')
# pad to 4 bits per hex digit so that leading zero bits are kept and the pairs stay aligned
b = bin(int(x, 16))[2:].zfill(len(x) * 4)
g = [b[i:i+2] for i in range(0, len(b), 2)]
dna = ""
for i in g:
    if i == "00":
        dna += "A"
    elif i == "01":
        dna += "T"
    elif i == "10":
        dna += "G"
    elif i == "11":
        dna += "C"
print(x) #hexdump
print(b) #converted to binary
print(dna) #converted to "DNA"
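The same conversion can also be done straight from the bytes, without building a hex string first. This is only a sketch, assuming the mapping 00 = A, 01 = T, 10 = G, 11 = C described above; bytes_to_dna, dna_to_bytes and file_to_dna are hypothetical helper names, and the decoder is included to show that the two-bit scheme is reversible.

```python
BASES = "ATGC"  # index is the 2-bit value: 00 -> A, 01 -> T, 10 -> G, 11 -> C


def bytes_to_dna(data):
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(dna):
    """Inverse of bytes_to_dna; expects a multiple of four bases."""
    if len(dna) % 4:
        raise ValueError("DNA length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def file_to_dna(path):
    """Read a file in binary mode and return its DNA encoding."""
    with open(path, "rb") as f:
        return bytes_to_dna(f.read())


print(bytes_to_dna(b"test"))  # TCTATGTTTCACTCTA
```

With the file dialog from the program above, this would be called as file_to_dna(file_path).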
Of course, it is inefficient!
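base64 is designed to store binary data as text. The encoded block is bigger than the original.
By the way: what kind of efficiency do you want? Compactness?
If so, your second sample is much nearer to what you want.
Also: in your task you lose information! Are you aware of this? (A and T both map to 0, and G and C both map to 1, so the original bases cannot be recovered from the bits.)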
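To put a number on the size increase, here is a small sketch. The payload is arbitrary bytes standing in for a file's contents, and base64_overhead is a made-up helper name. Note that base64.b64encode expects a bytes object, so a file would have to be opened in 'rb' mode and read with f.read() rather than f.readlines().

```python
import base64


def base64_overhead(data):
    """Return (raw size, Base64 size) in bytes for a bytes object."""
    return len(data), len(base64.b64encode(data))


payload = bytes(range(256)) * 3  # 768 arbitrary bytes
raw_size, encoded_size = base64_overhead(payload)
print(raw_size, encoded_size)    # 768 1024

# Decoding gives the original bytes back, so nothing is lost, only space.
assert base64.b64decode(base64.b64encode(payload)) == payload
```

Every 3 input bytes become 4 output characters, which is where the extra third comes from.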
Here is a sample showing how to store and restore.
It stores data in an easy-to-understand hex-in-text format, just for the sake of a demo. If you want compactness, you can easily modify the code to store in a binary file; if you want a 00011001 view, that modification is easy too.
import math
#"make a long test string"
import numpy as np
s=''.join((str(x) for x in np.random.randint(4,size=33)))\
    .replace('0','A').replace('1','T').replace('2','G').replace('3','C')
def store_(s):
    size=len(s) #the length gets padded up to a multiple of 8, so remember the true value and store it with the data
    s2=s.replace('A','0').replace('T','0').replace('G','1').replace('C','1')\
        .ljust( int(math.ceil(size/8.)*8),'0') #pad with '0' on the right up to a multiple of 8
    a=(hex( eval('0b'+s2[i*8:i*8+8]) )[2:].rjust(2,'0') for i in xrange(len(s2)/8))
    return ''.join(a),size
yourDataAsHexInText,sizeToStore=store_(s)
print yourDataAsHexInText,sizeToStore
def restore_(s,size=None):
    if size==None: size=len(s)/2
    a=( bin(eval('0x'+s[i*2:i*2+2]))[2:].rjust(8,'0') for i in xrange(len(s)/2))
    #you lose information, remember? So it's only A or G
    return (''.join(a).replace('1','G').replace('0','A') )[:size]
restore_(yourDataAsHexInText,sizeToStore)
print "so check it"
print s ,"(input)"
print store_(s)
print s.replace('C','G').replace('T','A') ,"to compare with information loss"
print restore_(*store_(s)),"restored"
print s.replace('C','G').replace('T','A') == restore_(*store_(s))
result in my test:
63c9308a00 33
so check it
AGCAATGCCGATGTTCATCGTATACTTTGACTA (input)
('63c9308a00', 33)
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA to compare with information loss
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA restored
True
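The code above targets Python 2 (xrange, print statements, integer division with /). A rough Python 3 sketch of the same idea follows, without numpy or eval. The names store and restore are placeholders, and the mapping is still the lossy one (A/T -> 0, G/C -> 1).

```python
def store(dna):
    """Pack a DNA string into hex text, one bit per base (A/T -> 0, G/C -> 1)."""
    bits = dna.translate(str.maketrans("ATGC", "0011"))
    # Pad on the right with '0' up to a multiple of 8 bits.
    padded = bits.ljust(-(-len(bits) // 8) * 8, "0")
    hex_text = "".join(
        format(int(padded[i:i + 8], 2), "02x") for i in range(0, len(padded), 8)
    )
    return hex_text, len(dna)  # keep the true length so the padding can be dropped


def restore(hex_text, size):
    """Unpack hex text to A/G only, since T and C were collapsed on the way in."""
    bits = "".join(
        format(int(hex_text[i:i + 2], 16), "08b") for i in range(0, len(hex_text), 2)
    )
    return bits[:size].translate(str.maketrans("01", "AG"))


sample = "AGCAATGCCGATGTTCATCGTATACTTTGACTA"
print(store(sample))            # ('63c9308a00', 33)
print(restore(*store(sample)))  # AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA
```

The two printed values agree with the test run shown above for the same input string.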
I've been writing Python code for only about 4 weeks. I'm writing a little text based game to learn and test everything I know. I can easily make this work with a value entered into the console as an integer, but for whatever reason I can't get my code to work with reading this value from a text file.
Earlier in the program, my code saves a value to a text file, just one value, then later it opens the same text file, over-writes the value with a new value based on a very simple calculation. That calculation is the first value, plus 5. I've spent a bunch of time reading on this site and going through my books and at this point I'm pretty sure I'm just missing something obvious.
The first piece of code that creates the doc and sets the value:
def set_hp(self):
    f = open('player_hp.txt', 'w')
    self.hitpoints = str(int(self.hitpoints))
    f.write(self.hitpoints)
    f.close()
This is the trouble section...I have commented the line with the problem.
def camp_fire():
    print
    print "You stop to build a fire and rest..."
    print "Resting will restore your health."
    print "You gather wood from the ground around you. You spark\n\
your flint against some tinder. A flame appears.\n\
You sit, and close your eyes in weariness. A peaceful calm takes you\n\
into sleep."
    f = open('player_hp.txt', 'r')
    orig_hp = f.readlines()
    orig_hp = str(orig_hp)
    f = open('player_hp.txt', 'w')
    new_value = orig_hp + 5 ##this is where my code breaks
    new_value = str(int(new_value))
    f.write(new_value)
    f.close()
    print "You have gained 5 hitpoints from resting. Your new HP are {}.".format(new_value)
This is the error I get:
File "C:\Python27\Awaken03.py", line 300, in camp_fire
new_value = orig_hp + 5
TypeError: cannot concatenate 'str' and 'int' objects
I know you can't concatenate a string and an integer, but I keep trying different methods to convert the string to an integer to do the quick math, but I can't seem to get it right.
The error message is clear: you are trying to concatenate a string with an integer. You should change the line from:
new_value = orig_hp + 5
To:
new_value = str(int(orig_hp) + 5)
Then you can write that value directly into the file as a string:
# new_value = str(int(new_value))  # skip this line
f.write(new_value)
f.readlines() returns a list of lines, in your case something like ['10']. So str(orig_hp) is a textual representation of this list, like '[\'10\']', which you won't be able to interpret as an integer.
You can just use f.read() to read the whole file at once into a string, which will be something like '10', and convert it to an integer:
orig_hp = int(f.read())
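Putting both points together, the read-add-write step might look like the sketch below. It is written for Python 3, and rest_at_camp_fire with its path and gain parameters is a made-up helper, not code from the question.

```python
def rest_at_camp_fire(path="player_hp.txt", gain=5):
    """Add `gain` to the hit points stored in `path` and return the new total."""
    # Read the old value first; int() tolerates surrounding whitespace such as a newline.
    with open(path) as f:
        hp = int(f.read())
    hp += gain
    # Reopen for writing only after reading, since mode 'w' truncates the file.
    with open(path, "w") as f:
        f.write(str(hp))
    return hp
```

If the file holds 10, calling rest_at_camp_fire() returns 15 and leaves 15 in the file. Using with also closes each file handle, which the original code did not do for the read handle.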
I have a file named sample.txt which looks like below
ServiceProfile.SharediFCList[1].DefaultHandling=1
ServiceProfile.SharediFCList[1].ServiceInformation=
ServiceProfile.SharediFCList[1].IncludeRegisterRequest=n
ServiceProfile.SharediFCList[1].IncludeRegisterResponse=n
My requirement is to remove the brackets and the integer inside them, and then run OS commands built from the result:
ServiceProfile.SharediFCList.DefaultHandling=1
ServiceProfile.SharediFCList.ServiceInformation=
ServiceProfile.SharediFCList.IncludeRegisterRequest=n
ServiceProfile.SharediFCList.IncludeRegisterResponse=n
I am quite a newbie in Python. This is my first attempt. I have used this code to remove the brackets:
#!/usr/bin/python
import re
import os
import sys
f = os.open("sample.txt", os.O_RDWR)
ret = os.read(f, 10000)
os.close(f)
print ret
var1 = re.sub("[\(\[].*?[\)\]]", "", ret)
print var1
f = open("removed.cfg", "w+")
f.write(var1)
f.close()
After this, using the file as input, I want to form application-specific commands which look like this:
cmcli INS "DefaultHandling=1 ServiceInformation="
and the next set as
cmcli INS "IncludeRegisterRequest=n IncludeRegisterRequest=y"
So basically, I now want all the output to be grouped into sets of two so that I can execute the commands on the operating system.
Is there any way I could group them into sets of two?
Reading 10,000 bytes of text into a string is really not necessary when your file is line-oriented text, and isn't scalable either. And you need a very good reason to be using os.open() instead of open().
So, treat your data as the lines of text that it is, and every two lines, compose a single line of output.
from __future__ import print_function
import re
command = [None,None]
cmd_id = 1
bracket_re = re.compile(r".+\[\d\]\.(.+)")
# This doesn't just remove the brackets: what you actually seem to want is
# to pick out everything after [1]. and ignore the rest.
with open("removed_cfg","w") as outfile:
    with open("sample.txt") as infile:
        for line in infile:
            m = bracket_re.match(line)
            cmd_id = 1 - cmd_id # gives 0, 1, 0, 1
            command[cmd_id] = m.group(1)
            if cmd_id == 1: # we have a pair
                output_line = """cmcli INS "{0} {1}" """.format(*command)
                print (output_line, file=outfile)
This gives the output
cmcli INS "DefaultHandling=1 ServiceInformation="
cmcli INS "IncludeRegisterRequest=n IncludeRegisterResponse=n"
The second line doesn't correspond to your sample output. I don't know how the input IncludeRegisterResponse=n is supposed to become the output IncludeRegisterRequest=y. I assume that's a mistake.
Note that this code depends on your input data being precisely as you describe it and has no error checking whatsoever. So if the format of the input is in reality more variable than that, then you will need to add some validation.
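As one way to add that validation, here is a sketch that rejects lines not matching the expected shape and refuses an odd number of fields. The function name build_commands and the choice to raise ValueError are assumptions for illustration. The pattern also accepts multi-digit indexes such as [12], which the regex above does not.

```python
import re

# Everything after "[<digits>]." is the part that goes into the command.
FIELD_RE = re.compile(r"^.+\[\d+\]\.(.+)$")


def build_commands(lines):
    """Turn config lines into cmcli commands, two fields per command."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = FIELD_RE.match(line)
        if match is None:
            raise ValueError("unexpected line: %r" % line)
        fields.append(match.group(1))
    if len(fields) % 2:
        raise ValueError("odd number of fields, cannot pair them up")
    # Pair field 0 with 1, 2 with 3, and so on.
    return ['cmcli INS "{0} {1}"'.format(a, b)
            for a, b in zip(fields[0::2], fields[1::2])]


sample = [
    "ServiceProfile.SharediFCList[1].DefaultHandling=1",
    "ServiceProfile.SharediFCList[1].ServiceInformation=",
    "ServiceProfile.SharediFCList[1].IncludeRegisterRequest=n",
    "ServiceProfile.SharediFCList[1].IncludeRegisterResponse=n",
]
for command in build_commands(sample):
    print(command)
```

With a real file, the list can be replaced by the open file object, for example build_commands(open("sample.txt")).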
I'm trying to implement a markdown-like language to do math with. The basic idea is to have a file where you can write down your math, then have a python-script do the calculations and spit out tex.
However, I'm facing the problem that Sympy refuses to spit out values; it only gives me back the equation. Even weirder is the fact that it DOES spit out values in an alternate test script that is essentially the same code.
This is the working code:
import sympy as sp
m = sp.symbols('m')
kg = sp.symbols('kg')
s = sp.symbols('s')
g = sp.sympify(9.80665*m/s**2)
mass = sp.sympify(0.2*kg)
acc = sp.sympify(g)
F = sp.sympify(mass*acc)
print F
Output:
1.96133*kg*m/s**2
This is the non-working code:
import re
import sympy as sp
print 'import sympy as sp'
#read units
mymunits = 'units.mymu'
with open(mymunits) as mymu:
    mymuinput = mymu.readlines()
for lines in mymuinput:
    lines = re.sub('\s+','',lines).split()
    if lines != []:
        if lines[0][0] != '#':
            unit = lines[0].split('#')[0]
            globals()[unit] = sp.symbols(unit)
            print unit+' = sp.symbols(\''+unit+'\')'
#read constants
mymconstants = 'constants.mymc'
with open(mymconstants) as mymc:
    mymcinput = mymc.readlines()
for lines in mymcinput:
    lines = re.sub('\s+','',lines).split()
    if lines != []:
        if lines[0][0] != '#':
            constant = lines[0].split('#')[0].split(':=')
            globals()[constant[0]] = sp.sympify(constant[1])
            print constant[0]+' = sp.sympify('+constant[1]+')'
#read file
mymfile = 'test.mym'
with open(mymfile) as mym:
    myminput = mym.readlines()
#create equations by removing spaces and splitting lines
for line in myminput:
    line = line.replace(' ','').strip().split(';')
    for eqstr in line:
        if eqstr != '':
            eq = re.split(':=',eqstr)
            globals()[eq[0]] = sp.sympify(eq[1])
            print eq[0]+' = sp.sympify('+eq[1]+')'
print 'print F'
print F
It outputs this:
acc*mass
It SHOULD output a value, just like the test-script.
The same script also outputs the code that is used in the test-script. The only difference is, that in the not-working script, I try to generate the code from an input-file, which looks like that:
mass := 0.2*kg ; acc := g
F := mass*acc
as well as files for units:
#SI
m #length
kg #mass
s #time
and constants:
#constants
g:=9.80665*m/s**2 #standard gravity
The whole code is also to be found on github.
What I don't get is why the one version works, while the other doesn't. Any ideas are welcomed.
Thank you.
Based on Everts comment, I came up with this solution:
change:
sp.sympify(eq[1])
to:
sp.sympify(eval(eq[1]))
For my class assignment we need to decrypt a message that used RSA encryption. We were given code that should help us with the decryption, but it's not helping.
def block_decode(x):
    output = ""
    i = BLOCK_SIZE+1
    while i > 0:
        b1 = int(pow(95,i-1))
        y = int(x/b1)
        i = i - 1
        x = x - y*b1
        output = output + chr(y+32)
    return output
I'm not great with Python yet, but it looks like it is doing something one character at a time. What really has me stuck is the data we were given. I can't figure out where or how to store it, or if it is really decrypted data using RSA. Below are just 3 of the 38 lines; some lines contain ' or ", or even several of them.
FWfk ?0oQ!#|eO Wgny 1>a^ 80*^!(l{4! 3lL qj'b!.9#'!/s2_
!BH+V YFKq _#:X &?A8 j_p< 7\[0 la.[ a%}b E`3# d3N? ;%FW
KyYM!"4Tz yuok J;b^!,V4) \JkT .E[i i-y* O~$? o*1u d3N?
How do I get this into a string list?
You are looking for the function ord, which is a built-in function that returns the integer ordinal of a one-character string.
So for instance, you can do:
my_file = open("file_containing_encrypted_message")
data = my_file.read()
to read in the encrypted contents.
Then, you can iterate over each character doing
char_val = ord(each_character)
block_decode(char_val)
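As background on what block_decode appears to do: it treats its argument as one number and peels off BLOCK_SIZE + 1 base-95 digits, turning each digit into a printable character (codes 32 to 126). The Python 3 sketch below illustrates that reading of the code. BLOCK_SIZE = 4 is an assumption, since the real value is not shown in the question, and block_encode is a hypothetical inverse added only to demonstrate the round trip.

```python
BLOCK_SIZE = 4  # assumption: the assignment's actual value is not given in the post


def block_decode(x):
    """Turn an integer into BLOCK_SIZE + 1 characters, one base-95 digit each."""
    output = ""
    for i in range(BLOCK_SIZE, -1, -1):
        # divmod uses integer arithmetic, so large values do not lose precision.
        digit, x = divmod(x, 95 ** i)
        output += chr(digit + 32)
    return output


def block_encode(text):
    """Hypothetical inverse: read the characters as base-95 digits."""
    value = 0
    for ch in text:
        value = value * 95 + (ord(ch) - 32)
    return value


number = block_encode("Hello")
print(number, block_decode(number))
```

Under this reading, the input to block_decode is a whole number per block, typically the result of the RSA step, rather than the ordinal of a single character.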
I'm wondering, how can I count for example all "s" characters and print their number in a text file that I'm importing? Tried few times to do it by my own but I'm still doing something wrong. If someone could give me some tips I would really appreciate that :)
Open the file; the "r" means it is opened in read-only mode.
filetoread = open("./filename.txt", "r")
With this loop, you iterate over all the lines in the file and count the number of times the character chartosearch appears. Finally, the value is printed.
total = 0
chartosearch = 's'
for line in filetoread:
    total += line.count(chartosearch)
print("Number of " + chartosearch + ": " + str(total))  # total is an int, so convert it before concatenating
I am assuming you want to read a file, find the number of 's' characters, and then store the result at the end of the file.
f = open('blah.txt','r+') # 'r+' allows reading and writing; 'r+a' is not a valid mode in Python 3
data_to_read = f.read().strip()
total_s = sum(map(lambda x: x=='s', data_to_read ))
f.write(str(total_s))
f.close()
I did it functionally just to give you another perspective.
You open the file with an open("myscript.txt", "r") with the mode as "r" because you are reading. To remove whitespaces and \n's, we do a .read().split(). Then, using a for loop, we loop over each individual character and check if it is an 'S' or an 's', and each time we find one, we add one to the scount variable (scount is supposed to mean S-count).
filetoread = open("foo.txt").read().split()
scount = 0
for k in ''.join(filetoread):
    if k.lower() == 's':
        scount+=1
print ("There are %d 's' characters" %(scount))
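The line-by-line and case-insensitive approaches above can be combined in one small function. This is a sketch only; count_char and its ignore_case flag are invented names, and the file is assumed to be UTF-8 text.

```python
def count_char(path, char, ignore_case=False):
    """Count occurrences of `char` in the text file at `path`."""
    if ignore_case:
        char = char.lower()
    total = 0
    # The with block closes the file even if an error occurs while reading.
    with open(path, encoding="utf-8") as f:
        for line in f:
            if ignore_case:
                line = line.lower()
            total += line.count(char)
    return total
```

For example, count_char("foo.txt", "s", ignore_case=True) counts both 's' and 'S'.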
Here's a version with a reasonable time performance (~500MB/s on my machine) for ascii letters:
#!/usr/bin/env python3
import sys
from functools import partial
byte = sys.argv[1].encode('ascii') # s
print(sum(chunk.count(byte)
          for chunk in iter(partial(sys.stdin.buffer.read, 1<<14), b'')))
Example:
$ echo baobab | ./count-byte b
3
It could be easily changed to support arbitrary Unicode codepoints:
#!/usr/bin/env python3
import sys
from functools import partial
char = sys.argv[1]
print(sum(chunk.count(char)
          for chunk in iter(partial(sys.stdin.read, 1<<14), '')))
Example:
$ echo ⛄⛇⛄⛇⛄ | ./count-char ⛄
3
To use it with a file, you could use a redirect:
$ ./count-char ⛄ < input_file