Looping through an 8-byte hex code in Python

I have an 8-byte hex code and I'm trying to loop through 00-FF for each byte, starting from the last and working back to the first, but I'm new to Python and I'm stuck.
Here's the code for looping through 00-FF:
for h in range(0, 256):
    print("{:02x}".format(h))
Essentially, what I have is:
hexcode = 'f20bdba6ff29eed7'
EDIT: I'm going to add a little background information on this and remove my previous explanation. I'm writing a padding attack algorithm with DES. Here's what needs to happen:
hexcode = 'f20bdba6ff29eed7'
for i in range(0,8):
(FOR i = 0 ) Loop through 00-FF for the last byte (i.e. 'd7') until you find the value that works
(FOR i = 1 ) Loop through 00-FF for the 7th byte (i.e. 'ee') until you find the value that works
and so on until the 1st byte
MY PROBLEM: I don't know how to loop through all 8 bytes of the hex string. This is easy for the 8th byte, because I just remove the last two characters, loop through 00-FF, and append each value to the remaining string to see if it's the correct code. Essentially, the code does this:
remove last two elements of hex
try 00 as the last two elements
if the test returns true then stop
if it doesn't, move on to 01 (continue through FF, but stop when you find the correct value)
My issue is: when it's one of the bytes in the middle (2-7), how do I loop over a middle value and then put it all back together? Essentially:
For byte 7:
Remove elements 13,14 from hex
try 00 as element 13,14
add hex[0:12] + 00 + hex [15:16]
if 00 returns true then stop, if not then loop through 01-ff until it does
and so on for bytes 6,5,4,3,2,1

Here's a quick way to break a string into chunks. We assume it has even length.
hexcode = 'f20bdba6ff29eed7'
# Step through the string two characters (one byte) at a time.
for n in range(len(hexcode) // 2):
    index = n * 2
    print(hexcode[index:index+2])
If you need to operate on the string representations, you could easily generate a list of two character bytecodes using something similar to this. Personally, I prefer to operate on the bytes, and use the hexcodes only for IO.
[hexcode[n*2:n*2+2] for n in range(len(hexcode) // 2)]
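To connect the chunking idea back to the question, here is a minimal sketch of the "last byte to first byte" search. It works on the bytes rather than on string slices, so replacing a middle byte needs no special handling. The names `replace_byte`, `search_bytes`, and the `works` callback are assumptions made for illustration; `works` stands in for whatever test the asker runs, and a real padding attack would do more work per byte than this loop shows.

```python
def replace_byte(hexcode, position, value):
    """Return hexcode with the byte at `position` (0 = first byte) set to `value`."""
    raw = bytearray.fromhex(hexcode)
    raw[position] = value
    return raw.hex()


def search_bytes(hexcode, works):
    """Walk the bytes from last to first, trying 00-FF at each position.

    `works` is a hypothetical callback: it receives a candidate hex string
    and returns True when that candidate passes the caller's test.
    """
    n_bytes = len(hexcode) // 2
    for position in range(n_bytes - 1, -1, -1):
        for value in range(256):
            candidate = replace_byte(hexcode, position, value)
            if works(candidate):
                # Keep the value that worked and move on to the previous byte.
                hexcode = candidate
                break
    return hexcode
```

For example, `replace_byte('f20bdba6ff29eed7', 6, 0x00)` swaps only the seventh byte and leaves the rest of the string untouched.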

Related

Fill up string of binary digits until certain length is reached

I have a string of ones and zeros, which typically has a length of 8*n, since they come from "n" bytes.
Now I want to arrange them into groups of five and I want to fill up the string with "0" until there are 16 5-bit "bytes" in total.
This is what I came up with but I can't figure out why this is not working.
while(len(binary_i) // (5*16) != 0):
    binary_i = binary_i + "0"
The problem is that you're using // (floor division) instead of % (remainder). For any string shorter than 5*16 characters, len(binary_i) // (5*16) is already 0, so the loop body never runs. If you'd rather use a Python built-in than a while loop, try this:
binary_i = binary_i.ljust(5*16, '0') # Pad `binary_i` with `0` up to 5*16 characters
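As a small sketch of the whole task (pad, then split into 5-bit groups), something like the following should work. The function name `pad_and_group` and its parameters are made up for this example; note that `ljust` only pads, so an input longer than 80 characters would simply produce more than 16 groups.

```python
def pad_and_group(bits, group_size=5, groups=16):
    """Right-pad `bits` with '0' to group_size * groups characters, then split it."""
    padded = bits.ljust(group_size * groups, "0")
    # Slice the padded string into consecutive chunks of `group_size` characters.
    return [padded[i:i + group_size] for i in range(0, len(padded), group_size)]


chunks = pad_and_group("11111111")
print(len(chunks), chunks[:3])
```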

How can I densely store large numbers in a file?

I need to store and handle huge amounts of very long numbers, which are in range from 0 to f 64 times (ffffffffff.....ffff).
If I store these numbers in a file, I need 1 byte for each character (digit) + 2 bytes for the \n symbol = up to 66 bytes. However, to represent all possible numbers we need no more than 34 bytes (4 bits represent digits from 0 to f, therefore 4 [bits] * 64 [number of hex digits] / 8 [bits in a byte] = 32 bytes, plus \n, of course).
Is there any way to store the number without consuming excess memory?
So far I have created a converter from hex (16 possible values per symbol) to a base-76 number (hex digits plus all letters and some other symbols), which reduces the size of a number to 41 + 2 bytes.
You are trying to store numbers that are 32 bytes long. Why not just store them as binary numbers? That way you need to store only 32 bytes per number instead of 41 or whatever. You can add on all sorts of quasi-compression schemes to take advantage of things like most of your numbers being shorter than 32 bytes.
If your number is a string, convert it to an int first. Python3 ints are basically infinite precision, so you will not lose any information:
>>> num = '113AB87C877AAE3790'
>>> num = int(num, 16)
>>> num
317825918024297625488
Now you can convert the result to a byte array and write it to a file opened for binary writing:
with open('output.bin', 'wb') as file:
    file.write(num.to_bytes(32, byteorder='big'))
The int method to_bytes converts your number to a string of bytes that can be placed in a file. You need to specify the string length and the order. 'big' makes it easier to read a hex dump of the file.
To read the file back, decode it using int.from_bytes in a similar manner:
with open('output.bin', 'rb') as file:
    data = file.read(32)
    num = int.from_bytes(data, byteorder='big')
Remember to always include the b in the file mode, or you may run into unexpected problems if you try to read or write data with codes for \n in it.
Both the read and write operation can be looped as a matter of course.
If you anticipate storing an even distribution of numbers, then see Mad Physicist's answer. However, if you anticipate storing mostly small numbers but need to be able to store a few large numbers, then these schemes may also be useful.
If you only need to account for integers that are 255 or fewer bytes (2040 or fewer bits) in length, then simply convert the int to a bytes object and store the length in an additional byte, like this:
# This was only tested with non-negative integers!
def encode(num):
    assert isinstance(num, int)
    # Convert the number to a byte array and strip away leading null bytes.
    # You can also use byteorder="little" and rstrip.
    # If the integer does not fit into 255 bytes, an OverflowError will be raised.
    encoded = num.to_bytes(255, byteorder="big").lstrip(b'\0')
    # Return the length of the integer in the first byte, followed by the encoded integer.
    return bytes([len(encoded)]) + encoded

def encode_many(nums):
    return b''.join(encode(num) for num in nums)

def decode_many(byte_array):
    assert isinstance(byte_array, bytes)
    result = []
    start = 0
    while start < len(byte_array):
        # The first byte contains the length of the integer.
        int_length = byte_array[start]
        # Read int_length bytes and decode them as int.
        new_int = int.from_bytes(byte_array[(start+1):(start+int_length+1)], byteorder="big")
        # Add the new integer to the result list.
        result.append(new_int)
        start += int_length + 1
    return result
To store integers of (practically) infinite length, you can use this scheme, based on variable-length quantities in the MIDI file format. First, the rules:
A byte has eight bits (for those who don't know).
In each byte except the last, the left-most bit (the highest-order bit) will be 1.
The lower seven bits (i.e. all bits except the left-most bit) in each byte, when concatenated together, form an integer with a variable number of bits.
Here are a few examples:
0 in binary is 00000000. It can be represented in one byte without modification as 00000000.
127 in binary is 01111111. It can be represented in one byte without modification as 01111111.
128 in binary is 10000000. It must be converted to a two-byte representation: 10000001 00000000. Let's break that down:
The left-most bit in the first byte is 1, which means that it is not the last byte.
The left-most bit in the second byte is 0, which means that it is the last byte.
The lower seven bits in the first byte are 0000001, and the lower seven bits in the second byte are 0000000. Concatenate those together, and you get 00000010000000, which is 128.
173249806138790 in binary is 100111011001000111011101001001101111110110100110.
To store it:
First, split the binary number into groups of seven bits: 0100111 0110010 0011101 1101001 0011011 1111011 0100110 (a leading 0 was added)
Then, add a 1 in front of each byte except the last, which gets a 0: 10100111 10110010 10011101 11101001 10011011 11111011 00100110
To retrieve it:
First, drop the first bit of each byte: 0100111 0110010 0011101 1101001 0011011 1111011 0100110
You are left with an array of seven-bit segments. Join them together: 100111011001000111011101001001101111110110100110
When that is converted to decimal, you get 173,249,806,138,790.
Why, you ask, do we make the left-most bit in the last byte of each number a 0? Well, doing that allows you to concatenate multiple numbers together without using line breaks. When writing the numbers to a file, just write them one after another. When reading the numbers from a file, use a loop that builds an array of integers, ending each integer whenever it detects a byte where the left-most bit is 0.
Here are two functions, encode and decode, which convert between int and bytes in Python 3.
# Important! These methods only work with non-negative integers!
def encode(num):
    assert isinstance(num, int)
    # If the number is 0, then just return a single null byte.
    if num <= 0:
        return b'\0'
    # Otherwise...
    result_bytes_reversed = []
    while num > 0:
        # Find the right-most seven bits in the integer.
        current_seven_bit_segment = num & 0b1111111
        # Change the left-most bit to a 1.
        current_seven_bit_segment |= 0b10000000
        # Add that to the result array.
        result_bytes_reversed.append(current_seven_bit_segment)
        # Chop off the right-most seven bits.
        num = num >> 7
    # Change the left-most bit in the lowest-order byte (which is first in the list) back to a 0.
    result_bytes_reversed[0] &= 0b1111111
    # Un-reverse the order of the bytes and convert the list into a byte string.
    return bytes(reversed(result_bytes_reversed))

def decode(byte_array):
    assert isinstance(byte_array, bytes)
    result = 0
    for part in byte_array:
        # Shift the result over by seven bits.
        result = result << 7
        # Add in the right-most seven bits from this part.
        result |= (part & 0b1111111)
    return result
Here are two functions for working with lists of ints:
def encode_many(nums):
    # Concatenate the encodings into one byte string, so that the result
    # can be written to a file or passed straight to decode_many.
    return b''.join(encode(num) for num in nums)

def decode_many(byte_array):
    parts = []
    # Split the byte array after each byte where the left-most bit is 0.
    start = 0
    for i, b in enumerate(byte_array):
        # Check whether the left-most bit in this byte is 0.
        if not (b & 0b10000000):
            # Copy everything up to here into a new part.
            parts.append(byte_array[start:(i+1)])
            start = i + 1
    return [decode(part) for part in parts]
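To see the variable-length scheme end to end, here is a compact, self-contained sketch of the same idea: seven payload bits per byte, with the high bit set on every byte except the last. The names `vlq_encode` and `vlq_decode_stream` are chosen for this example only, and the decoder reads a concatenated stream in a single pass instead of splitting it into parts first.

```python
def vlq_encode(num):
    """Encode a non-negative int; every byte except the last has its high bit set."""
    if num < 0:
        raise ValueError("only non-negative integers are supported")
    groups = [num & 0x7F]                   # lowest seven bits become the final byte
    num >>= 7
    while num > 0:
        groups.append((num & 0x7F) | 0x80)  # earlier bytes carry the continuation bit
        num >>= 7
    return bytes(reversed(groups))


def vlq_decode_stream(data):
    """Decode integers that were encoded back to back into one bytes object."""
    numbers = []
    current = 0
    for byte in data:
        current = (current << 7) | (byte & 0x7F)
        if not byte & 0x80:                 # high bit clear: this integer is complete
            numbers.append(current)
            current = 0
    return numbers


values = [0, 127, 128, 173249806138790]
stream = b"".join(vlq_encode(v) for v in values)
print(vlq_decode_stream(stream) == values)
```

Because each integer announces its own end, the stream needs no separators, which is the property the explanation above relies on.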
The densest possible way without knowing more about the numbers would be 256 bits per number (32 bytes).
You can store them right after one another.
A function to write to a file might look like this:
def write_numbers(numbers, file):
    for n in numbers:
        file.write(n.to_bytes(32, 'big'))

with open('file_name', 'wb') as f:
    write_numbers(get_numbers(), f)
And to read the numbers, you can make a function like this:
def read_numbers(file):
    while True:
        read = file.read(32)
        if not read:
            break
        yield int.from_bytes(read, 'big')

with open('file_name', 'rb') as f:
    for n in read_numbers(f):
        do_stuff(n)
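A quick way to check the fixed-width approach is a round trip through an in-memory buffer. This sketch repeats the two functions so that it runs on its own; the `width` parameter and the sample values are additions for illustration, and `io.BytesIO` stands in for a real file opened in binary mode.

```python
import io


def write_numbers(numbers, file, width=32):
    """Write each number as `width` big-endian bytes, one after another."""
    for n in numbers:
        file.write(n.to_bytes(width, "big"))


def read_numbers(file, width=32):
    """Yield numbers back from the file, `width` bytes at a time."""
    while True:
        chunk = file.read(width)
        if not chunk:
            break
        yield int.from_bytes(chunk, "big")


buffer = io.BytesIO()
sample = [0, 0x113AB87C877AAE3790, 16**64 - 1]   # smallest, a mid-sized, and largest value
write_numbers(sample, buffer)
buffer.seek(0)
restored = list(read_numbers(buffer))
print(restored == sample, len(buffer.getvalue()))
```

Every number occupies exactly 32 bytes, so the n-th number always starts at offset 32 * n and can be read directly with `seek`.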

How do I represent a string as a number?

I need to represent a string as a number. However, it is 8928313 characters long, it can contain more than just letters of the alphabet, and I have to be able to convert it back efficiently too. My current (too slow) code looks like this:
alpha = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ,.?!#()+-=[]/*1234567890^*{}\'"$\\&#;|%<>:`~_'
alphaLeng = len(alpha)
def letterNumber(letters):
    letters = str(letters)
    cof = 1
    nr = 0
    for i in range(len(letters)):
        nr += cof*alpha.find(letters[i])
        cof *= alphaLeng
        print(i,' ',len(letters))
    return str(nr)
Ok, since other people are giving awful answers, I'm going to step in.
You shouldn't do this.
You shouldn't do this.
An integer and an array of characters are ultimately the same thing: bytes. You can access the values in the same way.
Most number representations cap out at 8 bytes (64-bits). You're looking at 8 MB, or 1 million times the largest integer representation. You shouldn't do this. Really.
You shouldn't do this. Your number will just be a custom, gigantic number type that would be identical under the hood.
If you really want to do this, despite all the reasons above, here's how...
Code
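def lshift(a, b):
    # Shift a left by b bytes (8 * b bits).
    return a << (8 * b)

def string_to_int(data):
    # data must be a bytes-like object; in Python 3, encode a str first.
    sum_ = 0
    # Byte positions counted from the end: len-1, ..., 1, 0.
    r = range(len(data) - 1, -1, -1)
    for a, b in zip(bytearray(data), r):
        sum_ += lshift(a, b)
    return sum_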
DONT DO THIS
Explanation
Characters are essentially bytes: they can be encoded in different ways, but ultimately you can treat them within a given encoding as a sequence of bytes. To convert them to a number, we shift each byte left by 8 bits for each position it sits from the end of the sequence, creating a unique number. r, the range value, is the position in reverse order: the 4th element from the end needs to go left 24 bits (3*8), etc.
After getting the range and converting our data to 8-bit integers, we can then shift each value and take the sum, giving us our unique identifier. It will be byte-wise identical (or identical in reverse byte order) to the original data, just "as a number". This is entirely futile. Don't do it.
Performance
Any performance is going to be outweighed by the fact that you're creating an identical object for no valid reason, but this solution is decently performant.
1,000 elements takes ~486 microseconds, 10,000 elements takes ~20.5 ms, while 100,000 elements takes about 1.5 seconds. It would work, but you shouldn't do it. This means it's scaled as O(n**2), which is likely due to memory overhead of reallocating the data each time the integer size gets larger. This might take ~4 hours to process all 8e6 elements (14365 seconds, calculated fitting the lower-order data to ax**2+bx+c). Remember, this is all to get the identical byte representation as the original data.
Futility
Remember, there are ~1e78 to 1e82 atoms in the entire universe, on current estimates. This is ~2^275. Your value will be able to represent 2^71426504, or about 260,000 times as many bits as you need to represent every atom in the universe. You don't need such a number. You never will.
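If the conversion is still wanted, the same "bytes as one big-endian number" idea can be sketched with the built-in int.from_bytes and int.to_bytes, which avoid the repeated shift-and-add loop. This is a swapped-in technique, not the code from the answer above. The function names and the leading 0x01 sentinel byte are assumptions for this sketch; the sentinel is there so that an empty string or leading NUL characters survive the round trip.

```python
def string_to_int(text, encoding="utf-8"):
    """Interpret the encoded bytes of `text` as one big-endian integer."""
    data = text.encode(encoding)
    # The 0x01 sentinel (an addition for this sketch) preserves leading zero bytes.
    return int.from_bytes(b"\x01" + data, byteorder="big")


def int_to_string(number, encoding="utf-8"):
    """Reverse string_to_int: rebuild the bytes and drop the sentinel."""
    length = (number.bit_length() + 7) // 8
    data = number.to_bytes(length, byteorder="big")
    return data[1:].decode(encoding)


original = "Hello, world! [42]"
as_number = string_to_int(original)
print(int_to_string(as_number) == original)
```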
If there are only ASCII characters, you can use the built-in functions ord() and chr().
There are several optimizations you can perform. For example, the find method requires searching through your string for the corresponding letter. A dictionary would be faster. Even faster might be (benchmark!) the chr function (if you're not too picky about the letter ordering) and the ord function to reverse the chr. But if you're not picky about ordering, it might be better if you just left-NULL-padded your string and treated it as a big binary number in memory if you don't need to display the value in any particular format.
You might get some speedup by iterating over characters instead of character indices. If you're using Python 2, a large range will be slow since a full list needs to be generated (use xrange instead); Python 3's range is a lazy sequence, so it doesn't have that cost.
Your print function is going to slow down output a fair bit, especially if you're outputting to a tty.
A big number library may also buy you speed-up: Handling big numbers in code
Your alpha.find() function needs to iterate through alpha on each loop.
You can probably speed things up by using a dict, as dictionary lookups are O(1):
alpha = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ,.?!#()+-=[]/*1234567890^*{}\'"$\\&#;|%<>:`~_'
alpha_dict = { letter: index for index, letter in enumerate(alpha)}
print(alpha.find('$'))
# 83
print(alpha_dict['$'])
# 83
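Putting the dictionary lookup and the "iterate over characters" advice together, a conversion in both directions might look like the sketch below. `ALPHABET` here is a tiny hypothetical alphabet, and both function names are invented for the example. One caveat: the mapping is only reversible if every character appears once in the alphabet, and the `alpha` string above contains `*` and `#` twice, so a dict built from it and `alpha.find` will disagree for those two characters.

```python
ALPHABET = "abc "                      # hypothetical, deliberately tiny alphabet
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def letters_to_number(letters):
    """Treat `letters` as digits in base len(ALPHABET), least significant first."""
    base = len(ALPHABET)
    number = 0
    coefficient = 1
    for ch in letters:                 # iterate over characters, not indices
        number += coefficient * INDEX[ch]
        coefficient *= base
    return number


def number_to_letters(number, length):
    """Reverse letters_to_number; `length` is needed to restore trailing 'a's."""
    base = len(ALPHABET)
    chars = []
    for _ in range(length):
        number, digit = divmod(number, base)
        chars.append(ALPHABET[digit])
    return "".join(chars)


text = "cab a"
print(number_to_letters(letters_to_number(text), len(text)) == text)
```

This removes the per-character search, but the arithmetic on an ever-growing integer is still the dominant cost for very long strings.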
Store your strings in an array of distinct values; i.e. a string table. In your dataset, use a reference number. A reference number of n corresponds to the nth element of the string table array.

Python array.tostring - Explanation for the byte representation

I know that array.tostring gives the array of machine values. But I am trying to figure out how they are represented.
e.g
>>> a = array('l', [2])
>>> a.tostring()
'\x02\x00\x00\x00'
Here, I know that 'l' means each index will be min of 4 bytes and that's why we have 4 bytes in the tostring representation. But why is the Most significant bit populated with \x02. Shouldn't it be '\x00\x00\x00\x02'?
>>> a = array('l', [50,3])
>>> a.tostring()
'2\x00\x00\x00\x03\x00\x00\x00'
Here I am guessing the 2 in the beginning is because 50 is the ASCII value of 2, then why don't we have the corresponding char for ASCII value of 3 which is Ctrl-C
But why is the Most significant bit populated with \x02. Shouldn't it be '\x00\x00\x00\x02'?
The \x02 in '\x02\x00\x00\x00' is not the most significant byte. I guess you are confused by trying to read it as a hexadecimal number, where the most significant digit is on the left. That is not how the string returned by array.tostring() works: it holds the machine's own byte representation, and on a little-endian machine (such as x86) the bytes of each value are laid out left to right from least significant to most significant. Just consider the array as a list of bytes, where the first (or, rather, 0th) byte is on the left, as is usual in regular Python lists.
why don't we have the corresponding char for ASCII value of 3 which is Ctrl-C?
Do you have any example where Python represents the character behind Ctrl-C as Ctrl-C or similar? ASCII code 3 corresponds to an unprintable character with no dedicated escape sequence, so it is shown through its hex code, \x03.
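The byte order can be checked directly. The sketch below is written for Python 3, where the method is called tobytes() (tostring() was a deprecated alias and has since been removed). The item size of 'l' depends on the platform, so the raw bytes may be 4 or 8 bytes long; the struct calls use a fixed 4-byte integer to show both byte orders explicitly.

```python
import struct
import sys
from array import array

a = array("l", [2])
raw = a.tobytes()                     # machine representation of the array

# The raw bytes use the machine's native byte order and item size.
print(sys.byteorder, a.itemsize, raw)

# Interpreting the bytes with the same byte order recovers the value.
value = int.from_bytes(raw, byteorder=sys.byteorder)

# struct makes the two byte orders explicit for a 4-byte signed integer.
little = struct.pack("<i", 2)         # least significant byte first
big = struct.pack(">i", 2)            # most significant byte first
print(value, little, big)
```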

I'm trying to figure out simple list encryption in Python

So, I have to simulate some sort of encryption protocol. For example, I have list
hey=['Z','A']
I then transform that list into a list of ASCII codes using ord(). No big deal. The problem is here: to encrypt, I enter a shift value that moves each ASCII value, and then I convert the result back to a letter. Everything is supposed to be capital letters from A to Z, so the ASCII codes range from 65 to 90. I've modified the shift value so that it still works fine even if it's bigger than 26. However, how do I modify the ASCII list itself, so that an element bigger than 90 wraps back around?
I've tried this:
num=[ord(i)+shift for i in hey]
if num[i]>90:
    num[i]=num[i]-26
However, shift will happen only if both (or all) elements of a list are bigger than 90. Is there a way to make that condition affect each element separately? So that if ascii value of one element is bigger than 90 then shift will happen, but another value will be unaffected until it becomes bigger than 90.
I think using the modulo operator % would be better here. This gets the remainder of a division.
Examples:
>>> 10 % 5
0
>>> 10 % 2
0
>>> 10 % 3
1
>>> 10 % 6
4
Using this, you could replace your code with this:
num = [(ord(i) + shift - 65) % 26 + 65 for i in hey]
This also works with large values of shift. Simply subtracting 26 once, by contrast, can still leave you out of range when shift >= 27.
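As a small sketch of the full round trip (letters in, shifted letters out), the modulo expression can be wrapped in a function. The name `shift_letters` is made up for this example, and it assumes the input contains only capital letters A to Z, as described in the question.

```python
def shift_letters(letters, shift):
    """Shift each capital letter by `shift` places, wrapping around after Z."""
    base = ord("A")
    return [chr((ord(ch) - base + shift) % 26 + base) for ch in letters]


hey = ["Z", "A"]
encrypted = shift_letters(hey, 1)
decrypted = shift_letters(encrypted, -1)   # a negative shift undoes the encryption
print(encrypted, decrypted)
```

Each element is handled on its own inside the list comprehension, so one letter wrapping around has no effect on the others.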
num=[(ord(i)+shift) if ord(i) + shift <= 90 else (ord(i)+shift - 26) for i in hey]
although I think there's something wrong with your else ... you should probably wrap around to the beginning of the ASCII set
