How to convert a word in string to binary - python

I was working on a module (the kind you import) that converts words in a string to hex and binary (and octal if possible). I finished the hex part, but now I am struggling with the binary part and don't know where to start. What I want is simple: the function inside the module takes an input string such as 'test' and converts it to binary.
What I have done so far is given below:
import binascii
def string_hex(string): # Converts a word to hex
    keyword = string.encode()
    hexadecimal = str(binascii.hexlify(keyword), 'ascii')
    formatted_hex = ':'.join(hexadecimal[i:i+2] for i in range(0, len(hexadecimal), 2))
    return formatted_hex
def hex_string(hexa):
    # hexa (so named because there is a built-in function hex()) should be passed as a string. For accuracy on words, avoid symbols (, . !)
    string = bytes.fromhex(hexa)
    formatted_string = string.decode()
    return formatted_string
I saved it as experiment.py in the directory where Python is installed. This is how I call it:
>>> from experiment import string_hex
>>> string_hex('test')
'74:65:73:74'
In the same way, I can convert it back:
>>> from experiment import hex_string
>>> hex_string('74657374')
'test'
I want to convert words in strings to binary in the same way. One more thing: I am using Python 3.4.2. Please help me.

You can do it as follows. You don't even have to import binascii.
def string_hex(string):
    return ':'.join(format(ord(c), 'x') for c in string)
def hex_string(hexa):
    hexgen = (hexa[i:i+2] for i in range(0, len(hexa), 2))
    # int(n, 16) parses the hex digits directly, so eval() is not needed
    return ''.join(chr(int(n, 16)) for n in hexgen)
def string_bin(string):
    return ':'.join(format(ord(c), 'b') for c in string)
def bin_string(binary):
    # assumes every character was written as exactly 7 bits
    # (true for ASCII letters, but not for spaces or digits)
    bingen = (binary[i:i+7] for i in range(0, len(binary), 7))
    return ''.join(chr(int(n, 2)) for n in bingen)
And here is the output:
>>> string_hex('test')
'74:65:73:74'
>>> hex_string('74657374')
'test'
>>> string_bin('test')
'1110100:1100101:1110011:1110100'
>>> bin_string('1110100110010111100111110100')
'test'
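The 7-bit grouping above only round-trips characters whose code is exactly 7 bits long. A minimal sketch of a padded variant follows, assuming UTF-8 encoding and a ':' separator between bytes; the names string_bin8 and bin_string8 are made up for this illustration and are not part of the answer above.

```python
def string_bin8(text, encoding="utf-8"):
    """Write every encoded byte as exactly 8 bits, separated by ':'."""
    return ":".join(format(byte, "08b") for byte in text.encode(encoding))


def bin_string8(binary, encoding="utf-8"):
    """Inverse of string_bin8: parse each ':'-separated group as one byte."""
    groups = binary.split(":") if binary else []
    return bytes(int(group, 2) for group in groups).decode(encoding)


print(string_bin8("test"))
print(bin_string8(string_bin8("a b 1")))
```

Because every byte is padded to 8 bits, spaces, digits and non-ASCII characters survive the round trip as well.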

Related

Index strings by letter including diacritics

I'm not sure how to formulate this question, but I'm looking for a magic function that makes this code
for x in magicfunc("H̶e̕l̛l͠o͟ ̨w̡o̷r̀l҉ḑ!͜"):
print(x)
Behave like this:
H̶
e̕
l̛
l͠
o͟
̨
w̡
o̷
r̀
l҉
ḑ
!͜
Basically, is there a built-in Unicode function or method that takes a string and outputs an array with one entry per glyph, including all of its Unicode decorators, diacritical marks and such? The same way that a text editor moves the cursor over to the next letter instead of stepping through all of the combining characters.
If not, I'll write the function myself, no help needed. Just wondering if it already exists.
You can use unicodedata.combining to find out if a character is combining:
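import unicodedata
from typing import Iterable
def combine(s: str) -> Iterable[str]:
    buf = None
    for x in s:
        if unicodedata.combining(x) != 0:
            # combining character: attach it to the current base character
            # (or start a new group if the string begins with one)
            buf = x if buf is None else buf + x
        else:
            if buf is not None:
                yield buf
            buf = x
    if buf is not None:
        yield buf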
Result:
>>> for x in combine("H̶e̕l̛l͠o͟ ̨w̡o̷r̀l҉ḑ!͜"):
... print(x)
...
H̶
e̕
l̛
l͠
o͟
̨
w̡
o̷
r̀
l
ḑ
!͜
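The issue is that COMBINING CYRILLIC MILLIONS SIGN (U+0489) is not recognized as combining: unicodedata.combining() returns the canonical combining class, which is 0 for enclosing marks such as this one. You could also test whether COMBINING is in unicodedata.name(x, '') for the character (the second argument is a default, so unnamed characters do not raise ValueError); that should solve it.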
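Another option is to group by general category instead of combining class, since every mark (categories Mn, Mc and Me) has a category starting with "M". A minimal sketch, where split_marks is a name invented for this example; it is not full grapheme cluster segmentation and will not handle things like emoji joined with ZWJ:

```python
import unicodedata


def split_marks(s):
    """Yield each base character together with the marks that follow it."""
    cluster = ""
    for ch in s:
        # Mn, Mc and Me all start with "M"; this includes enclosing marks
        # such as U+0489, whose canonical combining class is 0.
        if cluster and unicodedata.category(ch).startswith("M"):
            cluster += ch
        else:
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster


for piece in split_marks("e\u0301l\u0489!"):
    print(piece)
```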
The third-party regex module can match whole glyphs (grapheme clusters) with \X:
>>> import regex
>>> s="H̶e̕l̛l͠o͟ ̨w̡o̷r̀l҉ḑ!͜"
>>> for x in regex.findall(r'\X',s):
... print(x)
...
H̶
e̕
l̛
l͠
o͟
̨
w̡
o̷
r̀
l҉
ḑ
!͜

Base64 to string of 1s and 0s convertor

I searched for this question, but I could not find a relevant answer. I am trying to convert a string from the input() function to Base64, and from Base64 to a raw string of 1s and 0s. My converter works, but its output is not the raw bits; it looks like this: b'YWhvag=='.
Unfortunately, I need a string of 1s and 0s, because I want to send this data by "flickering" an LED.
Can you help me figure it out please? Thank you for any kind of help!
import base64
some_text = input()
base64_string = (base64.b64encode(some_text.encode("ascii")))
print(base64_string)
If I have understood correctly, you want the binary equivalent of a string, so that 'hello' becomes ['1101000', '1100101', '1101100', '1101100', '1101111'], one entry each for h, e, l, l, o.
import base64
some_text = input('enter a string to be encoded(8 bit encoding): ')
def encode(text):
    base64_string = base64.b64encode(text.encode("ascii"))
    # iterating over bytes yields ints; write each one as exactly 8 bits
    return ''.join(format(i, '08b') for i in base64_string)
def decode(binary_digit):
    b = str(binary_digit)
    # rebuild the Base64 bytes from 8-bit groups (no eval() needed)
    c = bytes(int(b[i:i+8], 2) for i in range(0, len(b), 8))
    return base64.b64decode(c).decode('ascii')
encoded_value = encode(some_text)
decoded_value = decode(encoded_value)
print(f'encoded value of {some_text} in 8 bit encoding is: {encoded_value}')
print(f'decoded value of {encoded_value} is: {decoded_value}')
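For driving an LED it may be handier to get the bits one at a time as integers rather than as one long string. A rough sketch under that assumption; base64_bits is a hypothetical helper name, and the actual LED switching is left out because it depends on the hardware:

```python
import base64


def base64_bits(text):
    """Yield the bits (0 or 1, most significant first) of the Base64 form of text."""
    encoded = base64.b64encode(text.encode("utf-8"))
    for byte in encoded:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


bits = list(base64_bits("ahoj"))
print("".join(str(bit) for bit in bits))
```

Each yielded value could then be used to switch the LED on or off for one time slot.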

python from hex to shellcode format

I am trying to convert a hex string to shellcode format.
For example, I have a file containing a hex string like aabbccddeeff11223344
and I want to convert it with Python to show this exact format:
"\xaa\xbb\xcc\xdd\xee\xff\x11\x22\x33\x44" including the quotes "".
My code is:
with open("file","r") as f:
    a = f.read()
    b = "\\x".join(a[i:i+2] for i in range(0, len(a), 2))
    print b
so my output is aa\xbb\xcc\xdd\xee\xff\x11\x22\x33\x44\x.
I understand I can do it with the sed command, but I wonder how to accomplish this in Python.
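The binascii standard module will help here (Python 2 syntax). Note that repr() shows printable bytes as characters, so \x22, \x33 and \x44 come out as ", 3 and D rather than as escapes: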
import binascii
print repr(binascii.unhexlify("aabbccddeeff11223344"))
Output:
>>> print repr(binascii.unhexlify("aabbccddeeff11223344"))
'\xaa\xbb\xcc\xdd\xee\xff\x11"3D'
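If every byte has to appear as a \x escape, including the printable ones, the string can be built by hand. A small sketch, assuming the input is plain hex text that may carry surrounding whitespace such as a trailing newline; to_shellcode is a name made up for this example:

```python
def to_shellcode(hex_text):
    """Return hex_text as a double-quoted string of \\xNN escapes."""
    cleaned = "".join(hex_text.split())  # drop newlines and other whitespace
    bytes.fromhex(cleaned)  # raises ValueError on odd length or non-hex input
    pairs = (cleaned[i:i + 2] for i in range(0, len(cleaned), 2))
    return '"' + "".join("\\x" + pair for pair in pairs) + '"'


print(to_shellcode("aabbccddeeff11223344\n"))
```

Stripping the whitespace first avoids the dangling \x that the trailing newline in the file would otherwise produce.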

Convert file to binary code in Python

I am looking to convert a file to binary for a project, preferably using Python as I am most comfortable with it, though if walked-through, I could probably use another language.
Basically, I need this for a project I am working on where we want to store data using a DNA strand and thus need to store files in binary ('A's and 'T's = 0, 'G's and 'C's = 1)
Any idea how I could proceed? I did find that one could encode the file in base64 and then decode it, but that seems a bit inefficient, and the code that I have doesn't seem to work...
import base64
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
print(file_path)
with open(file_path) as f:
    encoded = base64.b64encode(f.readlines())
print(encoded)
Also, I already have a program to do that simply with text. Any tips on how to improve it would also be appreciated!
import binascii
t = bytearray(str(input("Texte?")), 'utf8')
h = binascii.hexlify(t)
b = bin(int(h, 16)).replace('b','')
# bin() returns a string starting with '0b'; removing the 'b' leaves the '0' of that prefix in front of the bits
g = b.replace('1','G').replace('0','A')
print(g)
For example, for text to DNA:
I input 'test' and expect the DNA sequence that comes from its binary form,
the binary being: 01110100011001010111001101110100 (I also print every conversion step in the example so that it is easier to follow)
>>>Texte?test #Asks the text
>>>b'74657374' #converts to hex
>>>01110100011001010111001101110100 #converts to binary
>>>AGGGAGAAAGGAAGAGAGGGAAGGAGGGAGAA #converts 0 to A and 1 to G
So, thanks to #jonrshape and Sergey Vturin, I finally was able to achieve what I wanted!
My program asks for a file, turns it into binary, which then gives me its equivalent in "DNA code" using pairs of binary numbers (00 = A, 01 = T, 10 = G, 11 = C)
import binascii
from tkinter import filedialog
file_path = filedialog.askopenfilename()
x = ""
with open(file_path, 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        # decode() gives plain hex text; str(...).replace("b","") would also delete the hex digit 'b'
        x += binascii.hexlify(chunk).decode('ascii')
# [2:] drops the '0b' prefix; zfill restores leading zero bits (4 bits per hex digit)
b = bin(int(x, 16))[2:].zfill(len(x) * 4)
g = [b[i:i+2] for i in range(0, len(b), 2)]
dna = ""
for i in g:
    if i == "00":
        dna += "A"
    elif i == "01":
        dna += "T"
    elif i == "10":
        dna += "G"
    elif i == "11":
        dna += "C"
print(x) #hexdump
print(b) #converted to binary
print(dna) #converted to "DNA"
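The same two-bit mapping (00 = A, 01 = T, 10 = G, 11 = C) can also be written with lookup tables, which makes the reverse direction easy. A minimal sketch working on bytes already read from the file; the function and table names are invented for this example:

```python
PAIR_TO_BASE = {"00": "A", "01": "T", "10": "G", "11": "C"}
BASE_TO_PAIR = {base: pair for pair, base in PAIR_TO_BASE.items()}


def bytes_to_dna(data):
    """Encode bytes as DNA letters, two bits per letter."""
    bits = "".join(format(byte, "08b") for byte in data)
    return "".join(PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna):
    """Decode a string produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_PAIR[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_dna(b"test"))
print(dna_to_bytes(bytes_to_dna(b"test")))
```

Since every byte is written as exactly 8 bits, leading zeros are kept and the original file contents can be recovered.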
Of course, it is inefficient!
base64 is designed to store binary data as text. It makes the block bigger after conversion.
By the way, what kind of efficiency do you want? Compactness?
If so, the second sample is much nearer to what you want.
Also, in your task you lose information! Are you aware of this?
Here is a sample of how to store and restore.
It stores data in an easy-to-understand hex-in-text format, just for the sake of a demo. If you want compactness, you can easily modify the code to store to a binary file, and if you want a 00011001 view, that modification is easy too.
import math
import numpy as np
# make a long test string
s = ''.join(str(x) for x in np.random.randint(4, size=33))\
    .replace('0','A').replace('1','T').replace('2','G').replace('3','C')
def store_(s):
    size = len(s) # the length gets padded to a multiple of 8, so remember the true value and store it with the data
    s2 = s.replace('A','0').replace('T','0').replace('G','1').replace('C','1')\
        .ljust(int(math.ceil(size/8.)*8), '0') # pad with '0' on the right to a multiple of 8
    a = (hex(int(s2[i*8:i*8+8], 2))[2:].rjust(2,'0') for i in range(len(s2)//8))
    return ''.join(a), size
yourDataAsHexInText, sizeToStore = store_(s)
print(yourDataAsHexInText, sizeToStore)
def restore_(s, size=None):
    if size is None: size = len(s)*4 # 4 letters per hex digit
    a = (bin(int(s[i*2:i*2+2], 16))[2:].rjust(8,'0') for i in range(len(s)//2))
    # you lose information, remember? so it's only A or G
    return (''.join(a).replace('1','G').replace('0','A'))[:size]
restore_(yourDataAsHexInText, sizeToStore)
print("so check it")
print(s, "(input)")
print(store_(s))
print(s.replace('C','G').replace('T','A'), "to compare with information loss")
print(restore_(*store_(s)), "restored")
print(s.replace('C','G').replace('T','A') == restore_(*store_(s)))
result in my test:
63c9308a00 33
so check it
AGCAATGCCGATGTTCATCGTATACTTTGACTA (input)
('63c9308a00', 33)
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA to compare with information loss
AGGAAAGGGGAAGAAGAAGGAAAAGAAAGAGAA restored
True

Python read data from file and convert to double precision

I've been reading an ASCII data file using Python. Then I convert the data into a numpy array.
However, I've noticed that the numbers are being rounded.
E.g. My original value from the file is: 2368999.932089
which python has rounded to: 2368999.93209
here is an example of my code:
import numpy as np
datafil = open("test.txt",'r')
tempvar = []
header = datafil.readline()
for line in datafil:
    word = line.split()
    char = word[0] # take the first element word[0] of the list
    word.pop() # remove the last element from the list "word"
    if char[0:3] >= '224' and char[0:3] < '225':
        tempvar.append(word)
strvar = np.array(tempvar,dtype = np.longdouble) # Here I want to read all data as double
print(strvar.shape)
var = strvar[:,0:23]
print(var[0,22]) # here it prints 2368999.93209 but the actual value is 2368999.932089
Any ideas guys?
Abedin
I think this is not a problem with your code. It is the usual way Python displays floating-point numbers. See
https://docs.python.org/2/tutorial/floatingpoint.html
I think that when you print it, print has already formatted your number with str(), which in Python 2 keeps only 12 significant digits; repr() shows the full value:
In [1]: a=2368999.932089
In [2]: print a
2368999.93209
In [3]: str(a)
Out[3]: '2368999.93209'
In [4]: repr(a)
Out[4]: '2368999.932089'
In [5]: a-2368999.93209
Out[5]: -9.997747838497162e-07
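To see that the stored value is unchanged and only the display differs, you can format it explicitly. A short sketch for Python 3, where str() and repr() of a float both give the shortest string that round-trips; the ".12g" format is used here only to imitate the 12 significant digits that Python 2's print shows:

```python
value = float("2368999.932089")

shortest = repr(value)                 # shortest string that round-trips
six_decimals = format(value, ".6f")    # fixed number of decimal places
twelve_digits = format(value, ".12g")  # 12 significant digits, like Python 2's str()

print(shortest)
print(six_decimals)
print(twelve_digits)
```

The first two lines print the full value, while the third reproduces the rounded-looking output from the question.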
I'm not totally sure what you're trying to do, but simplified with test.txt containing only
asdf
2368999.932089
and then the code:
import numpy as np
datafil = open("test.txt",'r')
tempvar = []
header = datafil.readline()
for line in datafil:
    tempvar.append(line)
print(tempvar)
strvar = np.array(tempvar, dtype=float) # np.float was only an alias for float and has been removed from recent NumPy
print(strvar.shape)
print(strvar)
I get the following output:
$ python3 so.py
['2368999.932089']
(1,)
[ 2368999.932089]
which seems to be working fine.
Edit: Updated with your provided line, so test.txt is
asdf
t JD a e incl lasc aper truean rdnnode RA Dec RArate Decrate metdr1 metddr1 metra1 metdec1 metbeta1 metdv1 metsl1 metarrJD1 beta JDej name 223.187263 2450520.619348 3.12966 0.61835 70.7196 282.97 171.324 -96.2738 1.19968 325.317 35.8075 0.662368 0.364967 0.215336 3.21729 -133.586 46.4884 59.7421 37.7195 282.821 2450681.900221 0 2368999.932089 EH2003
and the code
import numpy as np
datafil = open("test.txt",'r')
tempvar = []
header = datafil.readline()
for line in datafil:
    tempvar.append(line.split(' '))
print(tempvar)
strvar = np.array(tempvar[0][-2], dtype=float) # np.float was only an alias for float and has been removed from recent NumPy
print(strvar)
the last print still outputs 2368999.932089 for me. So I'm guessing this is a platform issue? What happens if you force dtype=np.float64 or dtype=np.float128? Some other sanity checks: have you tried spitting out the text before it is converted to a float? And what do you get from doing something like:
>>> np.array('2368999.932089')
array('2368999.932089',
dtype='<U14')
>>> float('2368999.932089')
2368999.932089
