Many online gambling games use a function that converts a hash into a decimal number in a range from 0 up to (usually) 2^52.
Here's some code I grabbed that works fine, but I don't understand why it works:
import hmac
import hashlib

E = 2 ** 52  # assumed from the 2^52 range mentioned above; not defined in the original snippet

def get_result(hash):
    hm = hmac.new(str.encode(hash), b'', hashlib.sha256)  # hashing object
    h = hm.hexdigest()  # hex digest: 64 hex characters = 32 bytes = 256 bits
    print(h)  # Something like 848ab848c6486d4f64
    c = int(h, 16)
    print(c)  # numbers only, 77 digits long...?
    if c % 33 == 0:
        return 1
    h = int(h[:13], 16)
    return (((100 * E - h) / (E - h)) // 1) / 100.0
The part of the code that I don't understand is the conversion from h to c. h is a hex digest, so it is base-16. The Python documentation says that the int(a, b) function converts the string a into a base-b integer. Here are my questions:
How can an integer number be base-16? Isn't the definition of decimal base-10 (0-9)? Where do the extra 6 digits come from?
As far as I'm aware, a single hex digit can be stored in 4 bits, or 1/2 a byte. So a hex string of length 64 will occupy 32 bytes. Does this mean that any base of this data will also be 32 bytes? (converting the hex string to base-n, n being anything)
What does the fact that the c variable is always 77 digits long mean?
How can an integer number be base-16? Where do the extra 6 digits come from?
This is known as the hexadecimal system: it uses the sixteen digits 0-9 and a-f, so the letters a-f are the extra six.
Isn't the definition of decimal base-10 (0-9)?
Integer and decimal are not synonyms. You can have an integer in base 2 instead of base 10.
As far as I'm aware, a single hex digit can be stored in 4 bits, or 1/2 a byte. So a hex string of length 64 will occupy 32 bytes.
There are two different concepts: a hex string and a hex integer.
When you type in Python, for example, "8ff", you're creating a hex string of length 3. A string is an array of characters. A character is (under the hood) a 1-byte integer. Therefore, you're storing 3 bytes¹ (about your second statement, a hex string of length 64 will actually occupy 64 bytes).
Now, when you type in Python 0x8ff, you're creating a hex integer of 3 digits. If you print it, it'll show 2303, because of the conversion from base-16 (8ff, hex) to base-10 (2303, dec). A single integer stores 4 bytes², so you're storing 4 bytes.
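To make the distinction concrete, here's a short interpreter session (my own, not from the question):

>>> s = "8ff"        # a hex string: three characters
>>> len(s)
3
>>> n = 0x8ff        # a hex integer literal
>>> n
2303
>>> int(s, 16) == n  # parsing the string in base 16 yields the same integer
True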
Does this mean that any base of this data will also be 32 bytes? (converting the hex string to base-n, n being anything)
It depends: what type of data?
A string of length 3 will always occupy 3 bytes (let's ignore Unicode); it doesn't matter if it's "8ff" or "123".
A string of length 10 will always occupy 10 bytes; it doesn't matter if it's "85d8afff00" or "ef08c0e38e".
An integer will always occupy 4 bytes³; it doesn't matter if it's 10 or 1000000.
What does the fact that the c variable is always 77 digits long mean?
As @flakes noted, that's because 2^256 ≈ 1.16e+77 in decimal: a 256-bit number is at most 78 decimal digits long, and most values fall in the 77-digit range.
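A quick check in the interpreter (my addition) shows where that digit count comes from:

>>> len(str(2**255))      # a typical 256-bit value has 77 decimal digits
77
>>> len(str(2**256 - 1))  # the very largest 256-bit values need 78
78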
¹ Actually a string of length 3 stores 4 bytes: three for its characters and one for the null terminator.
² Let's ignore that integers in Python are unbounded.
³ If it's less than 2,147,483,647 (signed) or 4,294,967,295 (unsigned).
Related
I want to convert my int number into a hex one, specifying the number of characters in the final hex representation.
This is my simple code that takes an input and converts it into hex:
my_number = int(1234)
hex_version = hex(my_number)
This code returns a string equal to 0x4d2.
However I would like my output to contain 16 characters, so basically it should be 0x00000000000004d2.
Is there a way to specify the number of output characters to the hex() operator, so that it pads with the needed number of 0s?
From Python's Format Specification Mini-Language:
n = int(1234)
h = format(n, '#018x')
The above will generate the required string. The magic number 18 is obtained as follows: 16 for the width you need, plus 2 for the '0x' hex prefix.
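For completeness, the same format spec also works in an f-string, and dropping the # flag drops the 0x prefix (my examples, same idea as above):

>>> n = 1234
>>> f'{n:#018x}'       # same result via an f-string
'0x00000000000004d2'
>>> format(n, '016x')  # 16 hex digits, no 0x prefix
'00000000000004d2'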
It seems base58 and base56 conversion treat the input data as a single Big Endian number, an unsigned bigint.
When encoding some integers into shorter strings with base58 or base56, some implementations seem to take the integer's native (little endian, in my case) byte representation and convert that to a string, while others convert the number to a big endian representation first. The loose specifications of these encodings don't clarify which approach is right. Is there an explicit specification of which to do, or a more widely used option of the two that I'm not aware of?
I was trying to compare some methods of making a short URL. The source is actually a 10 digit number that's less than 4 billion. In this case I was thinking to make it an unsigned 4 byte integer, possibly Little Endian, and then encode it with a few options (with alphabets):
base64 A…Za…z0…9+/
base64 url-safe A…Za…z0…9-_
Z85 0…9a…zA…Z.-:+=^!/*?&<>()[]{}@%$#
base58 1…9A…HJ…NP…Za…km…z (excluding 0IOl+/ from base64 & reordered)
base56 2…9A…HJ…NP…Za…kmnp…z (excluding 1o from base58)
So like, base16, base32 and base64 make pretty good sense in that they're taking 4, 5 or 6 bits of input data at a time and looking them up in an alphabet index. The latter uses 4 symbols per 3 bytes. Straightforward, and this works for any data.
The other 3 have me finding various implementations that disagree with each other as to the right output. The problem appears to be that no whole number of bytes maps to a fixed number of lookups in these. E.g. taking 2^1 to 2^100 and getting the remainders for 56, 58 and 85 results in no remainders of 0.
Z85 (ascii85, base85, et al.) approaches this by grabbing 4 bytes at a time, encoding them to 5 symbols, and accepting some waste. So there's byte alignment to some degree here (base64 has alignment per 16 symbols, Z85 gets there with 5). But the alphabet is … not great for URLs, command-line, or SGML/XML use.
base58 and base56 seem intent on treating the input bytes like a Big Endian ordered bigint and repeating: % base; lookup; -= % base; /= base on the input bigint. Which… I mean, I think that ends up modifying most of the input for every iteration.
For my input that's not a huge performance concern though.
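For reference, here's a minimal sketch (my own, not taken from any particular library) of that repeated-division approach applied to a plain integer, using the base58 alphabet listed above:

BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

def b58encode_int(n):
    # Repeatedly divide by the base and look up each remainder in the alphabet.
    if n == 0:
        return BASE58_ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 58)
        out.append(BASE58_ALPHABET[rem])
    return ''.join(reversed(out))

b58encode_int(3003295320) gives '5aPg4o', matching the big-endian b58encode output in the examples below; it's integer-only, so it ignores the leading-zero-byte handling a full byte-oriented encoder would need.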
We shouldn't treat the input as string data, or we get output longer than the 10-digit decimal input, which defeats the purpose. Given that, does anyone know of any indication of which kind of processing produces something canonical for base56 or base58?
The two options I see are:
1. Take the little endian 4-byte word of the 10-digit number (< 4*10^9), treat those bytes as a (different) big endian number, and convert that by repeating the steps.
2. Represent the 10-digit number (< 4*10^9) as 4 bytes big endian before converting that by repeating the steps.
I'm leaning towards going the route of the 2nd way.
For example given the number: 3003295320
The little endian representation is 58 a6 02 b3
The big endian representation is b3 02 a6 58, meaning:
base64 gives:
>>> base64.b64encode(int.to_bytes(3003295320,4,'little'))
b'WKYCsw=='
>>> base64.b64encode(int.to_bytes(3003295320,4,'big'))
b'swKmWA=='
>>> base64.b64encode('3003295320'.encode('ascii'))
b'MzAwMzI5NTMyMA==' # Definitely not using this
Z85 gives:
>>> encode(int.to_bytes(3003295320,4,'little'))
b'sF=ea'
>>> encode(int.to_bytes(3003295320,4,'big'))
b'VJv1a'
>>> encode('003003295320'.encode('ascii')) # padding to 4 byte boundary
b'fFCppfF+EAh8v0w' # Definitely not using this
base58 gives:
>>> base58.b58encode(int.to_bytes(3003295320,4,'little'))
b'3GRfwp'
>>> base58.b58encode(int.to_bytes(3003295320,4,'big'))
b'5aPg4o'
>>> base58.b58encode('3003295320')
b'3soMTaEYSLkS4w' # Still not using this
base56 gives:
>>> b56encode(int.to_bytes(3003295320,4,'little'))
b'4HSgyr'
>>> b56encode(int.to_bytes(3003295320,4,'big'))
b'6bQh5q'
>>> b56encode('3003295320')
b'4uqNUbFZTMmT5y' # Longer than 10 digits so...
I need to store and handle huge amounts of very long numbers, which range from 0 up to a 64-hex-digit value of all f's (ffffffff.....ffff).
If I store these numbers in a file, I need 1 byte for each character (digit) + 2 bytes for the \n symbol = up to 66 bytes. However, to represent all possible numbers we need no more than 34 bytes (4 bits represent the digits 0 to f, therefore 4 [bits] * 64 [hex digits] / 8 [bits in a byte] = 32 bytes, + \n of course).
Is there any way to store the number without consuming excess memory?
So far I have created a converter from hex (16 possible values per digit) to a number with a base of 76 (hex digits + all letters and some other symbols), which reduces the size of a number to 41 + 2 bytes.
You are trying to store numbers that are up to 32 bytes long. Why not just store them as binary? That way you need to store only 32 bytes per number instead of 41 or whatever. You can add on all sorts of quasi-compression schemes to take advantage of things like most of your numbers being shorter than 32 bytes.
If your number is a string, convert it to an int first. Python3 ints are basically infinite precision, so you will not lose any information:
>>> num = '113AB87C877AAE3790'
>>> num = int(num, 16)
>>> num
317825918024297625488
Now you can convert the result to a byte array and write it to a file opened for binary writing:
with open('output.bin', 'wb') as file:
    file.write(num.to_bytes(32, byteorder='big'))
The int method to_bytes converts your number to a string of bytes that can be placed in a file. You need to specify the string length and the order. 'big' makes it easier to read a hex dump of the file.
To read the file back, decode it using int.from_bytes in a similar manner:
with open('output.bin', 'rb') as file:
    bytes = file.read(32)
    num = int.from_bytes(bytes, byteorder='big')
Remember to always include the b in the file mode, or you may run into unexpected problems if you try to read or write data with codes for \n in it.
Both the read and write operation can be looped as a matter of course.
If you anticipate storing an even distribution of numbers, then see Mad Physicist's answer. However, if you anticipate storing mostly small numbers but need to be able to store a few large numbers, then these schemes may also be useful.
If you only need to account for integers that are 255 or fewer bytes (2040 or fewer bits) in length, then simply convert the int to a bytes object and store the length in an additional byte, like this:
# This was only tested with non-negative integers!
def encode(num):
    assert isinstance(num, int)
    # Convert the number to a byte array and strip away leading null bytes.
    # You can also use byteorder="little" and rstrip.
    # If the integer does not fit into 255 bytes, an OverflowError will be raised.
    encoded = num.to_bytes(255, byteorder="big").lstrip(b'\0')
    # Return the length of the integer in the first byte, followed by the encoded integer.
    return bytes([len(encoded)]) + encoded

def encode_many(nums):
    return b''.join(encode(num) for num in nums)

def decode_many(byte_array):
    assert isinstance(byte_array, bytes)
    result = []
    start = 0
    while start < len(byte_array):
        # The first byte contains the length of the integer.
        int_length = byte_array[start]
        # Read int_length bytes and decode them as int.
        new_int = int.from_bytes(byte_array[(start+1):(start+int_length+1)], byteorder="big")
        # Add the new integer to the result list.
        result.append(new_int)
        start += int_length + 1
    return result
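A quick round trip with these helpers (my example) looks like this:

>>> payload = encode_many([0, 255, 65536])
>>> payload
b'\x00\x01\xff\x03\x01\x00\x00'
>>> decode_many(payload)
[0, 255, 65536]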
To store integers of (practically) infinite length, you can use this scheme, based on variable-length quantities in the MIDI file format. First, the rules:
A byte has eight bits (for those who don't know).
In each byte except the last, the left-most bit (the highest-order bit) will be 1.
The lower seven bits (i.e. all bits except the left-most bit) in each byte, when concatenated together, form an integer with a variable number of bits.
Here are a few examples:
0 in binary is 00000000. It can be represented in one byte without modification as 00000000.
127 in binary is 01111111. It can be represented in one byte without modification as 01111111.
128 in binary is 10000000. It must be converted to a two-byte representation: 10000001 00000000. Let's break that down:
The left-most bit in the first byte is 1, which means that it is not the last byte.
The left-most bit in the second byte is 0, which means that it is the last byte.
The lower seven bits in the first byte are 0000001, and the lower seven bits in the second byte are 0000000. Concatenate those together, and you get 00000010000000, which is 128.
173249806138790 in binary is 100111011001000111011101001001101111110110100110.
To store it:
First, split the binary number into groups of seven bits: 0100111 0110010 0011101 1101001 0011011 1111011 0100110 (a leading 0 was added)
Then, add a 1 in front of each byte except the last, which gets a 0: 10100111 10110010 10011101 11101001 10011011 11111011 00100110
To retrieve it:
First, drop the first bit of each byte: 0100111 0110010 0011101 1101001 0011011 1111011 0100110
You are left with an array of seven-bit segments. Join them together: 100111011001000111011101001001101111110110100110
When that is converted to decimal, you get 173,249,806,138,790.
Why, you ask, do we make the left-most bit in the last byte of each number a 0? Well, doing that allows you to concatenate multiple numbers together without using line breaks. When writing the numbers to a file, just write them one after another. When reading the numbers from a file, use a loop that builds an array of integers, ending each integer whenever it detects a byte where the left-most bit is 0.
Here are two functions, encode and decode, which convert between int and bytes in Python 3.
# Important! These methods only work with non-negative integers!
def encode(num):
    assert isinstance(num, int)
    # If the number is 0, then just return a single null byte.
    if num <= 0:
        return b'\0'
    # Otherwise...
    result_bytes_reversed = []
    while num > 0:
        # Find the right-most seven bits in the integer.
        current_seven_bit_segment = num & 0b1111111
        # Change the left-most bit to a 1.
        current_seven_bit_segment |= 0b10000000
        # Add that to the result array.
        result_bytes_reversed.append(current_seven_bit_segment)
        # Chop off the right-most seven bits.
        num = num >> 7
    # Change the left-most bit in the lowest-order byte (which is first in the list) back to a 0.
    result_bytes_reversed[0] &= 0b1111111
    # Un-reverse the order of the bytes and convert the list into a byte string.
    return bytes(reversed(result_bytes_reversed))

def decode(byte_array):
    assert isinstance(byte_array, bytes)
    result = 0
    for part in byte_array:
        # Shift the result over by seven bits.
        result = result << 7
        # Add in the right-most seven bits from this part.
        result |= (part & 0b1111111)
    return result
Here are two functions for working with lists of ints:
def encode_many(nums):
    return [encode(num) for num in nums]

def decode_many(byte_array):
    parts = []
    # Split the byte array after each byte where the left-most bit is 0.
    start = 0
    for i, b in enumerate(byte_array):
        # Check whether the left-most bit in this byte is 0.
        if not (b & 0b10000000):
            # Copy everything up to here into a new part.
            parts.append(byte_array[start:(i+1)])
            start = i + 1
    return [decode(part) for part in parts]
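A round trip with these (my example); note that encode_many returns a list of byte strings, so join them before decoding:

>>> chunks = encode_many([0, 127, 128, 173249806138790])
>>> b''.join(chunks)
b'\x00\x7f\x81\x00\xa7\xb2\x9d\xe9\x9b\xfb&'
>>> decode_many(b''.join(chunks))
[0, 127, 128, 173249806138790]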
The densest possible way without knowing more about the numbers would be 256 bits per number (32 bytes).
You can store them right after one another.
A function to write to a file might look like this:
def write_numbers(numbers, file):
    for n in numbers:
        file.write(n.to_bytes(32, 'big'))

with open('file_name', 'wb') as f:
    write_numbers(get_numbers(), f)
And to read the numbers, you can make a function like this:
def read_numbers(file):
    while True:
        read = file.read(32)
        if not read:
            break
        yield int.from_bytes(read, 'big')

with open('file_name', 'rb') as f:
    for n in read_numbers(f):
        do_stuff(n)
I have a function that accepts 'data' as a parameter. Being new to Python, I wasn't really sure that that was even a type.
I noticed when printing something of that type it would be
b'h'
if I encoded the letter h, which doesn't make a ton of sense to me. Is there a way to define bits in Python, such as 1 or 0? I guess b'h' must be in hex? Is there a way for me to simply define an eight-bit string
bits1 = 10100000
You're conflating a number of unrelated things.
First of all, (in Python 3), quoted literals prefixed with b are of type bytes -- that means a string of raw byte values. Example:
x = b'abc'
print(type(x)) # will output `<class 'bytes'>`
This is in contrast to the str type, which is a (Unicode) string.
Integer literals can be expressed in binary using an 0b prefix, e.g.
y = 0b10100000
print(y) # Will output 160
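And to go back the other way (my addition), bin() or a format spec gives you the binary string:

>>> bin(y)            # back to a binary literal string
'0b10100000'
>>> format(y, '08b')  # zero-padded to eight bits, no prefix
'10100000'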
As far as I know, 'data' is not a type. Your function (probably) accepts anything you pass to it, regardless of its type.
Now, b'h' is a bytes literal: a sequence containing one byte whose value is the character code of ´h´. It is not hexadecimal; it is just the printable representation of that byte.
The ASCII code for ´h´ is 104 (decimal), which is 0x68 in hex and 0b01101000 in binary.
So, here is the answer I think you are looking for: if you want to build a byte from its binary representation, write the value as a binary integer literal and wrap it in bytes, e.g. bytes([0b01101000]), which equals b'h'. I would recommend using hex instead, to make it more compact and readable: every four bits make one symbol from 0 to f, so the bit sequence 01101000 is 68 in hex, and the byte can be written directly as b'\x68'. The b before the quote marks tells Python the literal is a bytes object rather than a str, and inside it \x68 is an escape sequence for the byte with hex value 68.
I know that array.tostring gives the array of machine values. But I am trying to figure out how they are represented.
e.g
>>> a = array('l', [2])
>>> a.tostring()
'\x02\x00\x00\x00'
Here, I know that 'l' means each element will be a minimum of 4 bytes, and that's why we have 4 bytes in the tostring representation. But why is the most significant byte populated with \x02? Shouldn't it be '\x00\x00\x00\x02'?
>>> a = array('l', [50,3])
>>> a.tostring()
'2\x00\x00\x00\x03\x00\x00\x00'
Here I am guessing the 2 in the beginning is because 50 is the ASCII value of '2'; then why don't we have the corresponding char for ASCII value 3, which is Ctrl-C?
But why is the most significant byte populated with \x02? Shouldn't it be '\x00\x00\x00\x02'?
The \x02 in '\x02\x00\x00\x00' is not the most significant byte. I guess you are confused by trying to read it as a hexadecimal number, where the most significant digit is on the left. That is not how the string representation returned by array.tostring() works: it is simply the raw bytes of each value in the machine's native byte order, which on your (little-endian) machine runs from least significant to most significant. Just consider the array as a list of bytes, and the first (or, rather, 0th) byte is on the left, as is usual in regular Python lists.
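If you want to see both byte orders side by side, int.to_bytes in Python 3 makes the comparison easy (my example, not from the original answer):

>>> (2).to_bytes(4, 'little')  # least significant byte first, like the array dump above
b'\x02\x00\x00\x00'
>>> (2).to_bytes(4, 'big')     # most significant byte first
b'\x00\x00\x00\x02'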
why don't we have the corresponding char for ASCII value of 3 which is Ctrl-C?
Do you have any example where Python represents the character behind Ctrl-C as Ctrl-C or similar? The ASCII code 3 corresponds to an unprintable character with no dedicated escape sequence, so it is represented through its hex code (\x03). The byte 50, on the other hand, is the code of the printable character '2', which is why it shows up as 2.