I want to generate 500 random binary strings of length 32 (i.e., from 00000...0 to 1111...1, where each string has length 32). I'm wondering how I could do this efficiently or Pythonically.
Right now, I randomly choose between 0 and 1 32 times and then append the results to a string. This seems pretty inefficient and not Pythonic. Are there library functions I could use for this?
import random
rnd_32bit = f'{random.getrandbits(32):=032b}'
Since python 3.9 you can do it this way:
from random import randbytes
def rand_32bit_string():
    # Zero-pad to 32 digits; a plain "b" format would drop leading zeros.
    return "{0:032b}".format(int.from_bytes(randbytes(4), "big"))
Easiest way is probably to use the secrets library (part of the python standard library as of python 3.6) and its token_bytes function.
import secrets
s = secrets.token_bytes(nbytes=4)
This question has various solutions for converting from bytes to a str of bit digits. You could do the following:
s = bin(int(secrets.token_hex(nbytes=4), 16))[2:].rjust(32, '0')
Throw this in a for loop and you're done!
In python 3.9 you also have the option of random.randbytes, e.g. random.randbytes(4), which behaves similarly (though secrets will use a cryptographically secure source of randomness, if that matters to you).
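Putting the pieces above together for the original request (500 strings of 32 bits each), a minimal sketch could look like this. The function name `random_bit_strings` and its parameters are made up for illustration; it uses `random.getrandbits`, so swap in `secrets.randbits` if you need a cryptographically secure source.

```python
import random


def random_bit_strings(count=500, width=32):
    """Return `count` random strings of '0'/'1', each `width` characters long."""
    # getrandbits(width) yields an int in [0, 2**width); the nested format
    # spec zero-pads it to exactly `width` binary digits.
    return [f"{random.getrandbits(width):0{width}b}" for _ in range(count)]


strings = random_bit_strings()
print(len(strings), strings[0])
```

A list comprehension avoids both the per-bit loop and repeated string concatenation.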
I’m using Python 2 and am attempting to perform sha256 on binary values using hashlib.
I’ve become a bit stuck as I’m quite new to it all but have cobbled together:
hashlib.sha256('0110100001100101011011000110110001101111'.decode('hex')).hexdigest()
I believe it interprets the string as hex based on substituting the hex value (‘68656c6c6f’) into the above and it returning
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
And comparing to this answer in which ‘hello’ or ‘68656c6c6f’ is used.
I think the answer lies with the decode component, but I can’t find an example for binary, only for ‘hex’ or ‘utf-8’.
Is anyone able to suggest what needs to be changed so that the function interprets as binary values instead of hex?
Here is code that does each of the data conversions you are looking for. These steps can all be combined, but are separated here so you can see each value.
import hashlib
import binascii
binstr = '0110100001100101011011000110110001101111'
hexstr = "{0:0>4X}".format(int(binstr,2)) # '68656C6C6F'
data = binascii.a2b_hex(hexstr) # 'hello'
output = hashlib.sha256(data).hexdigest()
print output
OUTPUT:
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
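The answer above targets Python 2. For readers on Python 3, the same conversion can be sketched with `int.to_bytes`, skipping the intermediate hex string. This assumes the bit string's length is a multiple of 8.

```python
import hashlib

binstr = "0110100001100101011011000110110001101111"

# Parse the bit string as a base-2 integer, then turn it into raw bytes.
# Assumption: len(binstr) is a multiple of 8, so the byte count is exact.
data = int(binstr, 2).to_bytes(len(binstr) // 8, "big")  # b'hello'

digest = hashlib.sha256(data).hexdigest()
print(digest)
```

Passing the byte count explicitly keeps any leading zero bytes, which a round trip through a hex string could lose.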
I am working on an encryption puzzle and need to take the exclusive or of two binary numbers (I'm using the operator package in Python). If I run operator.xor(1001111, 1100001), for instance, I get the very weird output 2068086. Why doesn't it return 0101110 or at least 101110?
Because Python doesn't see that as binary numbers. Instead use:
operator.xor(0b1001111, 0b1100001)
The calculated answer is using the decimal values you provided, not their binary appearance. What you are really asking is...
1001111 ^ 1100001
What you mean is 79 ^ 97. Instead, try using binary literals like so...
0b1001111 ^ 0b1100001
See How do you express binary literals in Python? for more information.
Because 1001111 and 1100001 are not binary numbers. 1001111 is one million, one thousand, one hundred and eleven, while 1100001 is one million, one hundred thousand and one. Python doesn't recognize these as binary numbers. Binary literals have to be prefixed with 0b to be recognized as binary in Python/Python 3. So the correct way is this:
operator.xor(0b1001111, 0b1100001)
But hey! We get 46 as output, which is the decimal form. We should fix that. Thankfully, there IS a built-in in Python/Python 3: the function bin(n). It returns the number as a binary string, prefixed with 0b. So our final code would be:
bin(operator.xor(0b1001111, 0b1100001))
If we want to hide the 0b (mostly in cases where that number is printed to the screen), we should use [2:] like this:
bin(operator.xor(0b1001111, 0b1100001))[2:]
A shorter way (warning: looks like a tutorial for something you *should* already know)
Well, operator.xor() is too big for an operator :)
If that is the case (99.9%), you should use a^b instead. I think you already know this, but why import a whole module just for the xor operator? If you would rather type the word xor, import it like this: from operator import xor. Then use it like this: bin(xor(a, b)). I hope you already know that stuff but I want to make sure you enjoy coding even more :)
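If the operands arrive as strings of binary digits and you want the leading zero kept (the 0101110 from the question), a small sketch is to parse with `int(s, 2)` and format with a fixed width. The width of 7 here is an assumption taken from the question's operands.

```python
a = int("1001111", 2)  # 79
b = int("1100001", 2)  # 97

result = a ^ b  # 46

# "07b" means: binary digits, zero-padded to a width of 7.
padded = format(result, "07b")
print(padded)  # 0101110
```

Unlike `bin(...)[2:]`, `format` lets you control the width, so leading zeros survive.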
I'm writing my own version of SSL, and in order to create a master key I need to create 2 random numbers of 16 bytes each and XOR them.
Can someone help me do so?
i hope you do this for scientific purposes... ssl is huge. and - as always in crypto - a lot can go wrong with an implementation... good luck! but as an effort to study/improve e.g. openssl, that would be a very welcome effort!
generating random bytes:
starting from python 3.6 there is the secrets module in python. secrets.token_bytes(16) will output 16 random bytes.
from secrets import token_bytes
print(token_bytes(16))
for python <= 3.5:
import os
print(os.urandom(16))
xoring bytes
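in order to xor the bytes a and b (which both have length 16), wrapped in a function so that the return is valid (the name xor_bytes is only illustrative):
def xor_bytes(a, b, byteorder="little"):
    # a and b must have the same length
    tmp_int = int.from_bytes(a, byteorder) ^ int.from_bytes(b, byteorder)
    return tmp_int.to_bytes(len(a), byteorder)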
What about this (Python 2 only, since str.encode('hex') does not exist in Python 3):
int(os.urandom(16).encode('hex'),16) ^ int(os.urandom(16).encode('hex'),16)
It is often operating system and computer (i.e. hardware) specific.
On Linux, you could use /dev/random (read 16 bytes from it) but read random(4) first.
Be very careful, it is a very sensitive issue and a lot of things can go silently wrong.
BTW, I don't think that rewriting SSL from scratch is reasonable (except for learning purposes).
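For learning purposes, the two steps discussed in this thread (draw two 16-byte random values, then XOR them) can be sketched in Python 3.6+ as follows. The helper name `xor_pairwise` is hypothetical, and it XORs byte by byte with `zip` rather than converting through integers. This only illustrates the mechanics and is not a key-derivation scheme.

```python
import secrets


def xor_pairwise(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, one byte at a time."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


r1 = secrets.token_bytes(16)  # 16 bytes from the OS's secure random source
r2 = secrets.token_bytes(16)
mixed = xor_pairwise(r1, r2)
print(mixed.hex())
```

XOR is its own inverse, so `xor_pairwise(mixed, r2)` gives back `r1`.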
Haskell and Python don't seem to agree on Murmurhash2 results. Python, Java, and PHP return the same results but Haskell doesn't. Am I doing something wrong regarding Murmurhash2 in Haskell?
Here is my code for Haskell Murmurhash2:
import Data.Digest.Murmur32
main = do
print $ asWord32 $ hash32WithSeed 1 "woohoo"
And here is the code written in Python:
import murmur
if __name__ == "__main__":
print murmur.string_hash("woohoo", 1)
Python returned 3650852671 while Haskell returned 3966683799
From a quick inspection of the sources, it looks like the algorithm operates on 32 bits at a time. The Python version gets these by simply grabbing 4 bytes at a time from the input string, while the Haskell version converts each character to a single 32-bit Unicode index.
It's therefore not surprising that they yield different results.
The murmur-hash package (I am its author) does not promise to compute the same hashes as other languages. If you rely on hashes to be compatible with other software that computes hashes I suggest you create newtype wrappers that compute hashes the way you want them. For text, in particular, you need to at least specify the encoding. In your case you could convert the text to an ASCII string using Data.ByteString.Char8.pack, but that still doesn't give you the same hash since the ByteString instance is more of a placeholder.
BTW, I'm not actively improving that package because MurmurHash2 has been superseded by MurmurHash3, but I keep accepting patches.
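The input-representation difference described in the first answer can be seen without any hashing library. The sketch below does not reproduce either implementation; it only shows that the same text gives different byte sequences depending on whether each character becomes one byte or one 32-bit value. Using UTF-32 little-endian as a stand-in for "one 32-bit Unicode index per character" is an assumption.

```python
text = "woohoo"

# One byte per character (what a byte-oriented hash would consume).
as_bytes = text.encode("utf-8")

# One 32-bit value per character (assumed stand-in for per-character hashing).
as_code_points = text.encode("utf-32-le")

print(len(as_bytes), len(as_code_points))

# The first 32-bit chunk each approach would see:
first_chunk_bytes = as_bytes[:4]        # four characters' worth of data
first_chunk_code_points = as_code_points[:4]  # just the first character
print(first_chunk_bytes, first_chunk_code_points)
```

Because the hash function mixes different 32-bit words in each case, different outputs are expected even with the same seed.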
I have a very big list of 0's and 1's that are represented as integers - by default - by python, I think: [randint(0, 1) for i in range(50*98)]
And I want to optimize the code so that it utilizes much less memory. The obvious way is to use just 1 bit to represent each of these numbers.
Is it possible to build a list of real binary numbers in python?
Regards,
Bruno
EDIT: Thank you all.
From the answers I found out Python doesn't do this by default, so I found this library (that is installed by Macports on OSX so it saves me some trouble) that does bit operations:
python-bitstring
This uses the bitstring module and constructs a BitArray object from your list:
from bitstring import BitArray
b = BitArray([randint(0, 1) for i in range(50*98)])
Internally this is now stored packed as bytes, so it will take considerably less memory. You can slice, index, check and set bits etc. with the usual notation, and there are additional methods such as set, all and any to modify or test the bits.
To get the data back as a binary string just use b.bin and to get out the byte packed data use b.tobytes() which will pad with zero bits up to a byte boundary.
As delnan already said in a comment, you will not be able to use real binary numbers if you mean bit-for-bit equivalent memory usage.
Integers (or longs) are of course real binary numbers in the meaning that you can address individual bits (using bit-wise operators, but that is easily hidden in a class). Also, long objects can become arbitrarily large, i.e. you can use them to simulate arbitrarily large bitsets. It is not going to be very fast if you do it in Python, but not very difficult either and a good start.
Using your binary generation scheme from above, you can do the following:
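# Python 3 version: reduce lives in functools, and tuple parameters
# in lambdas (lambda (a, p), b: ...) are no longer allowed.
from functools import reduce
import random
reduce(
    lambda acc, b: (b << acc[1] | acc[0], acc[1] + 1),
    (random.randint(0, 1) for i in range(50*98)),
    (0, 0)
)[0]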
Of course, random supports arbitrarily large upper boundaries, so you can do just that:
r = random.randint(0, 2**(50*98) - 1)  # randint includes the upper bound
This is not entirely the same, since the individual binary digits are not independent in the same way as when you create each digit separately. Then again, knowing how PRNGs work, they are not really independent in the other case either. If this is a concern for you, you probably should not use the random module at all, but a hardware RNG.
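As a variation on the single-integer idea above, `random.getrandbits` asks directly for a given number of bits, which avoids computing the upper bound by hand. The sketch below also shows reading individual bits back out; the helper `get_bit` is a made-up name.

```python
import random

n_bits = 50 * 98

# One arbitrary-precision int holding n_bits random bits.
r = random.getrandbits(n_bits)


def get_bit(value, i):
    """Return bit i of value (bit 0 is the least significant)."""
    return (value >> i) & 1


# Expanding back to a list is only for demonstration; it undoes the saving.
bits = [get_bit(r, i) for i in range(n_bits)]
print(len(bits), r.bit_length())
```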
It's called a bit vector, or a bitmap. Try e.g. BitVector. If you want to implement it yourself, you need to use a numeric object, not a list, and use bitwise operations to toggle bits, e.g.
bitmap = 0
bit = (1 << 24)
bitmap |= bit # enable bit
bitmap &= ~bit # disable bit
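If you would rather have a mutable, fixed-size container than one big integer, the same bitwise operations work on a `bytearray`, storing 8 bits per byte. This is a minimal standard-library sketch; `pack_bits` and `get_bit` are illustrative names, and the bit order within each byte (least significant first) is an arbitrary choice.

```python
import random


def pack_bits(bits):
    """Pack an iterable of 0/1 ints into a bytearray, 8 bits per byte."""
    bits = list(bits)
    packed = bytearray((len(bits) + 7) // 8)  # round up to whole bytes
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 1 << (i % 8)
    return packed


def get_bit(packed, i):
    """Read bit i back out of a packed bytearray."""
    return (packed[i // 8] >> (i % 8)) & 1


bits = [random.randint(0, 1) for _ in range(50 * 98)]
packed = pack_bits(bits)
print(len(packed))  # 613 bytes for 4900 bits
```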
It seems you need some kind of bit-set. I'm not sure if this example entirely suits your needs, but it is worth a try.
Perhaps you could look into a lossless compression scheme for your data? Presumably there will be a lot of redundancy in such a list.