Converting a string to binary - python

I need some help converting a string to binary. I have to do it using my own code, not built-in functions (except that I can use ord to get the characters into decimal).
The problem is that it only seems to convert the first character into binary, not all of the characters in the string. For instance, if you type "hello", it converts the h to binary but not the whole thing.
Here's what I have so far:
def convertFile():
    myList = []
    myList2 = []
    flag = True
    string = input("input a string: ")
    for x in string:
        x = ord(x)
        myList.append(x)
    print(myList)
    for i in range(len(myList)):
        for x in myList:
            print(x)
            quotient = x / 2
            quotient = int(quotient)
            print(quotient)
            remainder = x % 2
            remainder = int(remainder)
            print(remainder)
            myList2.append(remainder)
            print(myList2)
            if int(quotient) < 1:
                pass
            else:
                x = quotient
    myList2.reverse()
    print ("" .join(map(str, myList2)))
convertFile()

If you just want hex strings, you can use the following snippet:
''.join( '%x' % ord(i) for i in input_string )
E.g., 'hello' => '68656c6c6f', where 'h' => '68' in the ASCII table.

def dec2bin(decimal_value):
    return magic_that_converts_a_decimal_to_binary(decimal_value)
ordinal_generator = (ord(letter) for letter in my_word)  # generators are lazily evaluated
bins = [dec2bin(ordinal_value) for ordinal_value in ordinal_generator]
print(bins)
As an aside, this is bad:
for x in myList:
    ...
    x = whatever
Once the loop returns to the top, whatever you assigned to x is discarded and x is set to the next value in the list.
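A minimal sketch of the repeated-division approach, assuming zero-padded 8-bit output per character. The names to_binary and string_to_binary are illustrative and do not come from the question. The inner while loop keeps dividing until the number reaches zero, and each character gets its own list of bits, so nothing depends on reassigning the loop variable:

```python
def to_binary(number, width=8):
    """Convert a non-negative integer to a binary string by repeated division."""
    bits = []
    while number > 0:
        bits.append(str(number % 2))  # the remainder is the next lowest bit
        number //= 2                  # integer division gives the next quotient
    while len(bits) < width:          # pad with zeros up to the assumed width
        bits.append("0")
    bits.reverse()                    # bits were collected lowest first
    return "".join(bits)


def string_to_binary(text):
    """Convert every character of text, not just the first one."""
    return " ".join(to_binary(ord(ch)) for ch in text)


print(string_to_binary("hello"))
# 01101000 01100101 01101100 01101100 01101111
```

Only ord, str and basic list operations are used for the conversion itself, which seems to match the restriction in the question.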

Related

How Does The Base64 Work and How To Encode/Decode in it?

I have a problem that asks me to encode a string to Base64. I think my code is mostly right: the string "Man" works, and so do other short strings, but "this is a string!!" doesn't. The question also asks for the non-padding version.
Can you please explain the process of encoding "this is a string!!"? As I understand it, I have to turn the letters into ASCII codes, turn those into binary, divide the bits into groups of 6, turn each group into decimal, and look each value up in the Base64 chart. That is all I know. Please don't give me the code, though: I want to try the coding on my own, so just explain the process. There are no good videos explaining this topic. By the way, I am using Python. Thank you.
Here is the code I have:
def decimal(binary):
    binary = str(binary); power = len(binary)-1
    values = []
    for x in binary:
        if x == "1":
            values.append((x, 2**power))
        power -= 1
    return sum([v for b,v in values if b == "1"])
string = "Man"
byte = ""
for x in string:
    byte += bin(ord(x))[0] + bin(ord(x))[2:]
values = []
for x in range(0, len(byte), 6):
    values.append(byte[x:x+6])
abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
table = {x:abc[x] for x in range(len(abc))}
print("".join(table[decimal(x)] for x in values))
I am using python!
Adjusted parts are explained using in-line comments:
import sys # merely for manipulation with supplied arguments
import math
if len(sys.argv) == 1:
    string = "This is a string!!!"
else:
    string = ' '.join([sys.argv[i] for i in range(1,len(sys.argv))])
def decimal(binary):
    binary = str(binary); power = len(binary)-1
    values = []
    for x in binary:
        if x == "1":
            values.append((x, 2**power))
        power -= 1
    return sum([v for b,v in values if b == "1"])
byte = ""
for x in string.encode('utf-8'): # ASCII is a proper subset of UTF-8
    byte += bin(x)[2:].rjust(8,'0') # get binary string of length 8
byte = byte.ljust(math.ceil(len(byte)/6)*6,'0') # length must be divisible by 6
values = []
for x in range(0, len(byte), 6):
    values.append(byte[x:x+6])
abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
table = {x:abc[x] for x in range(len(abc))}
print(string) # input
padding = '=' * (((3 - len(string.encode('utf-8'))) % 3) % 3)
ooutput = "".join(table[decimal(x)] for x in values)
print(ooutput)
print(ooutput + padding) # for the sake of completeness
import base64 # merely for comparison/reference output
# ↓↓↓ output from base64 module ↓↓↓
print(base64.b64encode(string.encode('utf-8')).decode('utf-8'))
Output:
.\SO\66724448.py ěščř ĚŠČŘ & .\SO\66724448.py
ěščř ĚŠČŘ
xJvFocSNxZkgxJrFoMSMxZg
xJvFocSNxZkgxJrFoMSMxZg=
xJvFocSNxZkgxJrFoMSMxZg=
This is a string!!!
VGhpcyBpcyBhIHN0cmluZyEhIQ
VGhpcyBpcyBhIHN0cmluZyEhIQ==
VGhpcyBpcyBhIHN0cmluZyEhIQ==
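For reference, the individual steps can be traced on a short input. This is only a sketch of the unpadded variant: b64_steps and ALPHABET are illustrative names, and int(group, 2) stands in for the hand-written decimal function above.

```python
import base64  # used only to cross-check the result

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_steps(text):
    """Return the 6-bit groups, their table indexes, and the unpadded Base64 text."""
    data = text.encode("utf-8")
    bits = "".join(format(b, "08b") for b in data)   # 8 bits per byte
    bits += "0" * (-len(bits) % 6)                   # zero-fill the last group
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    indexes = [int(g, 2) for g in groups]            # each group is 0..63
    encoded = "".join(ALPHABET[i] for i in indexes)
    return groups, indexes, encoded


groups, indexes, encoded = b64_steps("Man")
print(groups)   # ['010011', '010110', '000101', '101110']
print(indexes)  # [19, 22, 5, 46]
print(encoded)  # TWFu

# Cross-check a longer input against the standard library, minus the padding.
reference = base64.b64encode(b"this is a string!!").decode("ascii").rstrip("=")
print(b64_steps("this is a string!!")[2] == reference)  # True
```

The key detail is that every byte contributes exactly 8 bits, including characters such as the space whose binary form has fewer significant digits.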

Python: Count character in string which are following each other

I have a string in which I want to count the runs of consecutive # characters and replace them with numbers to create an increment.
For example:
rawString = 'MyString1_test##_edit####'
for x in xrange(5):
    output = doConvertMyString(rawString)
    print output
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
Assuming that the number of # is not fixed and that rawString is user input containing only string.ascii_letters + string.digits + '_' + '#', how can I do that?
Here is my test so far:
rawString = 'MyString1_test##_edit####'
incrDatas = {}
key = '#'
counter = 1
for x in xrange(len(rawString)):
    if rawString[x] != key:
        counter = 1
        continue
    else:
        if x > 0:
            if rawString[x - 1] == key:
                counter += 1
            else:
                pass
                # ???
You can use zfill in the re.sub replacement to pad # chunks of any length. The #+ regex pattern matches one or more # symbols. m.group() is the text the regex matched, so each run of #s is replaced with the incremented x, converted to a string and zero-padded to the length of that run.
import re
rawString = 'MyString1_test##_edit####'
for x in xrange(5):
    output = re.sub(r"#+", lambda m: str(x+1).zfill(len(m.group())), rawString)
    print output
Result of the demo:
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
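The same idea can be wrapped in a reusable function. This is a Python 3 sketch of the re.sub approach, not the answerer's exact code, and fill_counter is a hypothetical name:

```python
import re


def fill_counter(template, number):
    """Replace each run of '#' with number, zero-padded to the run's length."""
    return re.sub(
        r"#+",
        lambda m: str(number).zfill(len(m.group())),
        template,
    )


for n in range(1, 4):
    print(fill_counter("MyString1_test##_edit####", n))
# MyString1_test01_edit0001
# MyString1_test02_edit0002
# MyString1_test03_edit0003
```

Note that zfill never truncates, so a number with more digits than the run has # symbols is inserted in full.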
The code below converts the rawString to a format string, using groupby in a list comprehension to find groups of hashes. Each run of hashes is converted into a format directive to print a zero-padded integer of the appropriate width, runs of non-hashes are simply joined back together.
This code works on Python 2.6 and later.
from itertools import groupby
def convert(template):
    return ''.join(['{{x:0{0}d}}'.format(len(list(g))) if k else ''.join(g)
        for k, g in groupby(template, lambda c: c == '#')])
rawString = 'MyString1_test##_edit####'
fmt = convert(rawString)
print(repr(fmt))
for x in range(5):
    print(fmt.format(x=x))
output
'MyString1_test{x:02d}_edit{x:04d}'
MyString1_test00_edit0000
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
How about this:
rawString = 'MyString1_test##_edit####'
splitString = rawString.split('_')
for i in xrange(10): # you may put any count
    print '%s_%s%02d_%s%04d' % (splitString[0], splitString[1][0:4], i, splitString[2][0:4], i, )
You can try this naive (and probably not most efficient) solution. It assumes that the number of '#' is fixed.
rawString = 'MyString1_test##_edit####'
for i in range(1, 6):
    temp = rawString.replace('####', str(i).zfill(4)).replace('##', str(i).zfill(2))
    print(temp)
Output:
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
test_string = 'MyString1_test##_edit####'
def count_hash(raw_string):
    str_list = list(raw_string)
    hash_count = str_list.count("#") + 1
    for num in xrange(1, hash_count):
        new_string = raw_string.replace("####", "000" + str(num))
        new_string = new_string.replace("##", "0" + str(num))
        print new_string
count_hash(test_string)
It's a bit clunky, and only works for # counts of less than 10, but seems to do what you want.
EDIT: By "only works" I mean that you'll get extra characters with the fixed number of # symbols inserted
EDIT2: amended code

Hex Coded Decimal in Python

I need to make a function that accepts an integer and returns a binary string of that integer encoded as Hex Coded Decimal, for later packing into a struct.
For example, I have written this:
def convert_int(x):
    """
    Accepts an integer, outputs a hexadecimal string in HCD format
    Caution! Byte order is ALREADY little endian!
    """
    result = b''
    while x > 0:
        hcd = chr(int(str(divmod(x, 100)[1]), 16))
        result = result + hcd
        x = divmod(x, 100)[0]
    return result
So convert_int(1234) would be 3412h, and so on. What is the most Pythonic and elegant way of writing this?
Update: made the function output little-endian strings ready for packing.
def convert_to_hcd(num):
    chars = []
    while num:
        num, ones = divmod(num, 10)
        num, tens = divmod(num, 10)
        chars.append(chr(tens * 16 + ones))
    chars.reverse()
    return "".join(chars)
convert_to_hcd(1234) # => returns '\x124' (which is correct because '\x34' == '4')
So, the correct code for me is the following, note that the byte order is reversed (little endian):
def convert_int(x):
    """
    Accepts an integer, outputs a hexadecimal string in HCD format
    Caution! Byte order is ALREADY little endian!
    """
    result = b''
    while x > 0:
        hcd = chr(int(str(divmod(x, 100)[1]), 16))
        result = result + hcd
        x = divmod(x, 100)[0]
    return result
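The code above relies on Python 2, where chr returns a byte string. In Python 3 the same little-endian packing could be sketched with a bytearray. The name int_to_bcd_le is hypothetical, and the sketch assumes a non-negative input:

```python
def int_to_bcd_le(n):
    """Pack a non-negative integer two decimal digits per byte, lowest pair first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while n > 0:
        n, pair = divmod(n, 100)       # peel off the two lowest decimal digits
        tens, ones = divmod(pair, 10)
        out.append(tens * 16 + ones)   # high nibble = tens, low nibble = ones
    return bytes(out)                  # like the original, 0 yields b''


print(int_to_bcd_le(1234).hex())  # 3412
```

The result is a bytes object, so it can be handed to struct.pack with an s format or concatenated with other packed fields.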
What about this little snippet... :)
def convert_to_hcd(num): return "".join([ "\\x"+ (lambda i, k: ""+i+k if i !='m' else "0"+k)(i,k) for i,k in zip((num if not len(num) % 2 else 'm' +num)[0::2], (num if not len(num) % 2 else 'm' +num)[1::2])])
def intToHex(num):
    numBin = [int(n) for n in str(num)]
    result = 0
    for n in numBin:
        result = result*16 + n
    return(hex(result))
this is clean
>>> hex(1234)
'0x4d2'
Google isn't your enemy...

Encoding a 128-bit integer in Python?

Inspired by the "encoding scheme" of the answer to this question, I implemented my own encoding algorithm in Python.
Here is what it looks like:
import random
from math import pow
from string import ascii_letters, digits
# RFC 2396 unreserved URI characters
unreserved = '-_.!~*\'()'
characters = ascii_letters + digits + unreserved
size = len(characters)
seq = range(0,size)
# Seed random generator with same randomly generated number
random.seed(914576904)
random.shuffle(seq)
dictionary = dict(zip(seq, characters))
reverse_dictionary = dict((v,k) for k,v in dictionary.iteritems())
def encode(n):
    d = []
    n = n
    while n > 0:
        qr = divmod(n, size)
        n = qr[0]
        d.append(qr[1])
    chars = ''
    for i in d:
        chars += dictionary[i]
    return chars
def decode(str):
    d = []
    for c in str:
        d.append(reverse_dictionary[c])
    value = 0
    for i in range(0, len(d)):
        value += d[i] * pow(size, i)
    return value
The issue I'm running into is encoding and decoding very large integers. For example, this is how a large number is currently encoded and decoded:
s = encode(88291326719355847026813766449910520462)
# print s -> "3_r(AUqqMvPRkf~JXaWj8"
i = decode(s)
# print i -> "8.82913267194e+37"
# print long(i) -> "88291326719355843047833376688611262464"
The highest 16 places match up perfectly, but after those the number deviates from its original.
I assume this is a problem with the precision of extremely large integers when dividing in Python. Is there any way to circumvent this problem? Or is there another issue that I'm not aware of?
The problem lies within this line:
value += d[i] * pow(size, i)
Because of the from math import pow line, this calls math.pow instead of the built-in pow function. math.pow returns a floating-point number, so you lose accuracy for large numbers. You should use the built-in pow or the ** operator or, even better, keep the current power of the base in an integer variable:
def decode(s):
    d = [reverse_dictionary[c] for c in s]
    result, power = 0, 1
    for x in d:
        result += x * power
        power *= size
    return result
It gives me the following result now:
print decode(encode(88291326719355847026813766449910520462))
# => 88291326719355847026813766449910520462
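The precision loss can be reproduced in isolation. The sketch below assumes a base of 71, which is what len(characters) works out to for the alphabet in the question (52 letters, 10 digits, 9 unreserved characters); the exponent 20 is an arbitrary illustrative choice:

```python
import math

base, exponent = 71, 20

exact = base ** exponent            # integer arithmetic, arbitrary precision
approx = math.pow(base, exponent)   # math.pow always returns a float

print(type(exact).__name__, type(approx).__name__)  # int float
print(int(approx) == exact)                         # False
print(pow(base, exponent) == exact)                 # True: built-in pow stays exact
```

A float carries only 53 bits of significand, so an odd integer this large cannot be represented exactly, which is why the decoded value drifts after roughly 16 digits.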

How to split big numbers?

I have a big number, which I need to split into smaller numbers in Python. I wrote the following code to swap between the two:
def split_number (num, part_size):
    string = str(num)
    string_size = len(string)
    arr = []
    pointer = 0
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr
def join_number(arr):
    num = ""
    for x in arr:
        num += str(x)
    return int(num)
But the number comes back different. It's hard to debug because the number is so large so before I go into that I thought I would post it here to see if there is a better way to do it or whether I'm missing something obvious.
Thanks a lot.
Clearly, any leading 0s in the "parts" can't be preserved by this operation. Can't join_number also receive the part_size argument, so that it can reconstruct the string formats with all the leading zeros?
Without some information such as part_size that's known to both the sender and receiver, or the equivalent (such as the base number to use for a similar split and join based on arithmetic, roughly equivalent to 10**part_size given the way you're using part_size), the task becomes quite a bit harder. If the receiver is initially clueless about this, why not just place the part_size (or base, etc) as the very first int in the arr list that's being sent and received? That way, the encoding trivially becomes "self-sufficient", i.e., doesn't need any supplementary parameter known to both sender and receiver.
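One way to read the "self-sufficient" suggestion is sketched below: part_size is stored as the first element of the list, so the receiver needs no extra argument. The function names are hypothetical, the split uses divmod on a base of 10**part_size rather than string slicing, and only non-negative integers are assumed:

```python
def split_with_header(num, part_size):
    """Split num into base-10**part_size parts, prefixed by part_size itself."""
    base = 10 ** part_size
    parts = []
    while num:
        num, part = divmod(num, base)  # lowest part first
        parts.append(part)
    parts.reverse()                    # most significant part first
    return [part_size] + parts


def join_with_header(arr):
    """Rebuild the number using the part_size stored in arr[0]."""
    part_size, parts = arr[0], arr[1:]
    base = 10 ** part_size
    num = 0
    for part in parts:
        num = num * base + part        # leading zeros of a part are implied by base
    return num


encoded = split_with_header(1000005, 3)
print(encoded)                    # [3, 1, 0, 5]
print(join_with_header(encoded))  # 1000005
```

Because the parts are recombined arithmetically, a part of 0 still occupies a full part_size digits, which is exactly what plain string concatenation loses.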
There is no need to convert to and from strings, which can be very time consuming for really large numbers.
>>> def split_number(n, part_size):
...     base = 10**part_size
...     L = []
...     while n:
...         n,part = divmod(n,base)
...         L.append(part)
...     return L[::-1]
...
>>> def join_number(L, part_size):
...     base = 10**part_size
...     n = 0
...     L = L[::-1]
...     while L:
...         n = n*base+L.pop()
...     return n
...
>>> print split_number(1000005,3)
[1, 0, 5]
>>> print join_number([1,0,5],3)
1000005
>>>
Here you can see that just converting the number to a str takes longer than my entire function!
>>> from time import time
>>> t=time();b = split_number(2**100000,3000);print time()-t
0.204252004623
>>> t=time();b = split_number(2**100000,30);print time()-t
0.486856222153
>>> t=time();b = str(2**100000);print time()-t
0.730905056
You should think of the following number split into 3-sized chunks:
1000005 -> 100 000 5
You have two problems. The first is that if you put those integers back together, you'll get:
100 0 5 -> 100005
(i.e., the middle one is 0, not 000) which is not what you started with. Second problem is that you're not sure what size the last part should be.
I would ensure that you're first using a string whose length is an exact multiple of the part size so you know exactly how big each part should be:
def split_number (num, part_size):
    string = str(num)
    string_size = len(string)
    while string_size % part_size != 0:
        string = "0%s"%(string)
        string_size = string_size + 1
    arr = []
    pointer = 0
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr
Secondly, make sure that you put the parts back together with the right length for each part (ensuring you don't put leading zeros on the first part of course):
def join_number(arr, part_size):
    fmt_str = "%%s%%0%dd"%(part_size)
    num = arr[0]
    for x in arr[1:]:
        num = fmt_str%(num,int(x))
    return int(num)
Tying it all together, the following complete program:
#!/usr/bin/python
def split_number (num, part_size):
    string = str(num)
    string_size = len(string)
    while string_size % part_size != 0:
        string = "0%s"%(string)
        string_size = string_size + 1
    arr = []
    pointer = 0
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr
def join_number(arr, part_size):
    fmt_str = "%%s%%0%dd"%(part_size)
    num = arr[0]
    for x in arr[1:]:
        num = fmt_str%(num,int(x))
    return int(num)
x = 1000005
print x
y = split_number(x,3)
print y
z = join_number(y,3)
print z
produces the output:
1000005
[1, 0, 5]
1000005
which shows that it goes back together.
Just keep in mind I haven't done Python for a few years. There's almost certainly a more "Pythonic" way to do it with those new-fangled lambdas and things (or whatever Python calls them) but, since your code was of the basic form, I just answered with the minimal changes required to get it working. Oh yeah, and be wary of negative numbers :-)
Here's some code for Alex Martelli's answer.
def digits(n, base):
    while n:
        yield n % base
        n //= base
def split_number(n, part_size):
    base = 10 ** part_size
    return list(digits(n, base))
def join_number(digits, part_size):
    base = 10 ** part_size
    return sum(d * (base ** i) for i, d in enumerate(digits))
