Intro
Hi, I'm doing some fun cryptanalysis exercises from cryptopals, and I have now run into an 'issue' that has also happened earlier and that I really don't understand.
Currently, I have a ciphertext that I read from a file in the following way:
from base64 import b64decode

with open("6.txt", 'r') as infile:
    b64_encoded = infile.readlines()
ciphertext = b64decode('\n'.join([x.strip() for x in b64_encoded if x != ""]))
It's now a bytes object, and looks like this when printed (this is just an excerpt):
b'\x1dB\x1fM\x0b\x0f\x02\x1fO\x13N<\x1aie\x1fI\x1c\x0eN\x13\x01\x0b\x07N\x1b\x01\x16E6\x00\x1e\x01Id T\x1d\x1dC3SNeR\x06\x00GT\x1c\rEM\x07\x04\x0cS\x12<\x0c\x1e\x08I\x1a\t\x11O\x14L!\x1aG+\x00\x05\x1dGY\x11\x04\t\x00d&\x07S\x007\x16\x06\x0c\x1a\x17A\x1d\x01RT0_\x00 \x13\n\x05GO\x12H\x08ENe>\x16\t8E\x06\x05\x08\x1aF\x07O\x1fYx~jb6\x0c\x1d\x0fA\rH\x06U\x1a\x1b\x00\x1dBt\x04\x1e\x01I\x1a\t\x11\x02Rz\x7fI\x00H:\x00\x1a\x13I\x1aOEH\x0f\x1d\rS\x04:\x01R\x19\x01\x0bA\x13\x06\x00L1_Sb\x15\x06\x07\t\x07T\x0b\x17A\x14\x16Iy35\x0b\x1b\x01\x05\x0fF\x07O\x1dNxNH\'R\x04\x07\x0cEXH\x08A\x00O T\x08t\x0b\x1d\x19I\x02\x00\x0e\x16\\\x00R0ie\x1fI\x02\x02T\x00\x01\x0b\x07N\x02\x10S\x01&\x10\x15M\x02\x07\x02\x1fO\x1bNx0i6R\n\x01\tT\x06\x07\tSN\x02\x10S\x08;\x10\x06\x05I\x0f\x0f\x10O;\x00:_G+\x1cId3OT\x02\ (...)
Context
I have gotten to a point in the exercise where I need to transpose this ciphertext according to a known key length k, so that I get a collection of strings where string n contains all the ciphertext characters that would have been encrypted with the n'th character of the key.
That means that if I call my function transpose(ciphertext, keyLen) with arguments transpose("123456789", 3), then my output would be:
[['1', '4', '7'], ['2', '5', '8'], ['3', '6', '9']]
Problem
Ok, the transpose function I have made looks like this:
def transpose(string, n):
    buckets = [[] for i in range(n)]
    i = 0
    for c in string:
        buckets[i].append(c)
        i += 1
        if i > (n-1):
            i = 0
    return buckets
When I use it on a string, it works just as expected, and outputs the expected output for "123456789".
But when I pass my 'bytes' ciphertext, the output looks like this:
[[29, 54, 60, 55, 56, 116, 58, 53, 116, 38, 59, 116, 94, 58, 55, 49, 57, 57, 59, 61, 53, 34, 53, 58, 59, 116, 36, 32, 101, 48, 60, 51, 58, 116, 53, 58, 115, 116, 116, 49, 33, 13, 116, 60, 54, 59, 122, 59, 44, 53, 53, 116, 49, 50, 116, 45, 51, 49, 45, 59, 53, 38, 116, 94, 60, 48, 94, 59, 116, 49, 116, 54, 55, 116, 116, 58, 115, 33, 49, 59, 53, 58, 32, 49, 38, 53, 53, 45, 59, 116, 45, 61, 59, 94, 54, 32, 48, 55, 120, 39], [66, 0, 12, 22, 69, 4, 1, 11, 11, 16, 16, 12, 40, 66, 4, 69, 11, 0, 9, 11, 22, 0, 11, 66, 11, 10, 9, 69, 72, 69, 28, 12, 69, 111, 14, 69, 22, 13, 17, 23, 17, 10, 11, 69, 9, 11, 69, 8, 12, 69, 11, 44, 69, 23, 34, 12, 13, 23, 69, 69, 23, 10, 8, 54, 17, 69, 60, 18, 12, 4, 60, 10, 4, 4, 9, 69, 69, 23, 73, 8, 23, 28, 69, 69, 28, 28, 28, 69, 69, 111, 69, 0, 8, 53, 10, 13, 0, 73, 69, 12], [31, 30, 30, 6, 6, 30, 82, 27, 29, 21, 6, 6, 11, 94, 7, 120, 94, 82, 82, 85, 23, 82, 82, 82, 23, 20, 19, 7, 64, 120, 31, 30, 23, 59, 23, 95, 82, 19, 29, 94, 82, 7, 29, 31, 11, 83, 36, 27, 17, 5, 22, 17, 19, 23, 30, 28, 82, 23, 6, 26, 6, 7, 27, 2, 82, 31, 29, 28, 28, 0, 61, 22, 28, 28, 11, 17, 19, 23, 82, 23, 23, 6, 6, 22, 16, 82, 94, 21, 5, 62, 6, 92, 23, 30, 11, 19, 0, 82, 49, 17], [77, 1, 8, 12, 5, 1, 25, 1, 25, 77, 5, 77, 77, 77, 30, 44, 77, 103, 25, 77, 77, 0, 9, 29, 77, 11, 20, 29, 64, 43 (...)
And now my bytes have been converted to their integer representations?
This doesn't really make sense to me, since all I am doing is iterating through the bytes and placing them in buckets.
Why is it that these bytes are turned into integers if all you do is iterate over them?
Ah, I remember this problem in cryptopals, I was facepalming when I understood how this trick works.
As MisterMiyagi said, bytes is not a sequence of "bytes", but a sequence of ints.
If you index into an str, you get another str:
>>> type("abc"[0])
<class 'str'>
But with bytes:
>>> type(b"abc"[0])
<class 'int'>
So it's your for loop that 'converts' them to int: iterating over bytes yields ints, just like indexing does. The same can be done directly with list:
>>> list(b"abc")
[97, 98, 99]
But the reverse is also easily possible:
>>> bytes([97, 98, 99])
b'abc'
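For the transpose in the question, one way to sidestep the ints entirely is to slice instead of iterating, since slicing a bytes object returns bytes. A minimal sketch, where transpose_bytes is a hypothetical name and not something from the question:

```python
def transpose_bytes(data, n):
    """Split data into n blocks; block i holds bytes i, i+n, i+2n, ..."""
    # Slicing bytes yields bytes, whereas indexing or iterating yields ints.
    return [data[i::n] for i in range(n)]


blocks = transpose_bytes(b"123456789", 3)
print(blocks)  # [b'147', b'258', b'369']
```

If the ints are what you want for the XOR step anyway, list(block) or plain iteration over each block gives them back.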
Line 10 of the python code below has an UnboundLocalError. Can anyone please teach me how to fix this?
def answer(data, n):
    new_data = []
    for each_integer in data:
        new_data = [each_integer for each_integer in data if data.count(each_integer) == n]
    if n > 1:
        new_data = data
    print("\n\nNew Data")
    print(new_data)
supplied_data = [53, 85, 29, 23, 29, 26, 88, 78, 5, 75, 74, 44, 33, 62, 98, 50, 89, 93, 24, 14, 74, 49, 83, 45, 41, 14, 68, 76, 68, 8, 77, 85, 17, 3, 9, 30, 71, 48, 18, 25, 86, 55, 55, 20, 74, 76, 99, 87, 59, 87, 36, 29, 29, 8, 22, 65, 1, 18, 23, 5, 13, 60, 7, 5, 98, 61, 78, 64, 36, 60, 49, 57, 31, 32, 41, 86, 52, 90, 9, 55, 35, 35, 2, 44, 8, 19, 96, 81, 68, 7, 8, 51, 9, 76, 12, 96, 61, 99, 74]
answer(supplied_data, 0)
answer(supplied_data, 1)
answer(supplied_data, 6)
The Traceback
>>> def answer(data, n):
... for each_integer in data:
... new_data = [each_integer for each_integer in data if data.count(each_integer) == n]
... if n > 1:
... new_data = data
... print("\n\nNew Data")
... print(new_data)
...
... supplied_data = [53, 85, 29, 23, 29, 26, 88, 78, 5, 75, 74, 44, 33, 62, 98, 50, 89, 93, 24, 14, 74, 49, 83, 45, 41, 14, 68, 76, 68, 8, 77, 85, 17, 3, 9, 30, 71, 48, 18, 25, 86, 55, 55, 20,
74, 76, 99, 87, 59, 87, 36, 29, 29, 8, 22, 65, 1, 18, 23, 5, 13, 60, 7, 5, 98, 61, 78, 64, 36, 60, 49, 57, 31, 32, 41, 86, 52, 90, 9, 55, 35, 35, 2, 44, 8, 19, 96, 81, 68, 7, 8, 51, 9, 76, 12
, 96, 61, 99, 74]
File "<stdin>", line 9
supplied_data = [53, 85, 29, 23, 29, 26, 88, 78, 5, 75, 74, 44, 33, 62, 98, 50, 89, 93, 24, 14, 74, 49, 83, 45, 41, 14, 68, 76, 68, 8, 77, 85, 17, 3, 9, 30, 71, 48, 18, 25, 86, 55, 55, 20,
74, 76, 99, 87, 59, 87, 36, 29, 29, 8, 22, 65, 1, 18, 23, 5, 13, 60, 7, 5, 98, 61, 78, 64, 36, 60, 49, 57, 31, 32, 41, 86, 52, 90, 9, 55, 35, 35, 2, 44, 8, 19, 96, 81, 68, 7, 8, 51, 9, 76, 12
, 96, 61, 99, 74]
^
SyntaxError: invalid syntax
>>> answer(supplied_data, 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'answer' is not defined
>>> answer(supplied_data, 1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'answer' is not defined
>>> answer(supplied_data, 6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'answer' is not defined
>>>
The problem is you have a branch in your logic that uses new_data but it is possible that new_data is never defined. This will happen when data is empty, so you never enter the for-loop body and create new_data, and n <= 1, i.e. you enter the else-block, where you use new_data without it being defined.
See:
>>> answer([],1)
New Data
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 10, in answer
UnboundLocalError: local variable 'new_data' referenced before assignment
You could solve this quickly by putting new_data = data at the top of your function. Honestly, though, this approach is really inefficient because it works in quadratic time: some_list.count iterates over the entire list each time! It is better to make one pass to count the elements and then another pass to filter, which takes linear time:
>>> from collections import Counter
>>> def answer(data, n):
...     counts = Counter(data)
...     return [e for e in data if counts[e] == n]
...
>>> answer(data, 6)
[]
>>> answer(data, 2)
[85, 23, 78, 44, 98, 14, 49, 41, 14, 85, 18, 86, 99, 87, 87, 36, 18, 23, 60, 7, 98, 61, 78, 36, 60, 49, 41, 86, 35, 35, 44, 96, 7, 96, 61, 99]
>>> answer(data, 3)
[5, 68, 76, 68, 9, 55, 55, 76, 5, 5, 9, 55, 68, 9, 76]
>>> answer(data, 4)
[29, 29, 74, 74, 8, 74, 29, 29, 8, 8, 8, 74]
>>> answer(data, 5)
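The underlying rule is that a name assigned only inside a loop body stays unbound when the loop never runs. A minimal sketch of the failure and of the quick fix (binding the name before the loop); the function names here are made up for illustration:

```python
def last_item_broken(data):
    for item in data:
        last = item      # only bound if the loop body runs at least once
    return last          # raises UnboundLocalError for empty input


def last_item_fixed(data):
    last = None          # bound on every path through the function
    for item in data:
        last = item
    return last


try:
    last_item_broken([])
except UnboundLocalError as exc:
    print("broken:", type(exc).__name__)

print("fixed:", last_item_fixed([]))
```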
I have a large file filled with integers separated by white space and comma. I am trying to read in 1KB at a time and convert it into a list of integers.
This code works fine:
with open('test_age.txt', 'r+') as inf:
    with open('test_age_out.txt', 'r+') as outf:
        sorted_list =[]
        a = [x.strip() for x in inf.read(1000).split(',')]
        int_a = map(int, a)
        f = tempfile.TemporaryFile()
        outf_array = sorted(int_a)
        f.write(str(outf_array))
        f.seek(0)
        #etc...
output:
[1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, etc...
But once I add in a while loop to read the next 1KB:
with open('test_age.txt', 'r+') as inf:
    with open('test_age_out.txt', 'r+') as outf:
        sorted_list =[]
        while True:
            a = [x.strip() for x in inf.read(1000).split(',')]
            int_a = map(int, a)
            if not a:
                break
            f = tempfile.TemporaryFile()
            outf_array = sorted(int_a)
            print outf_array
            f.write(str(outf_array))
            f.seek(0)
I get the output and a ValueError:
[1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8,
8, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 12, 12, 12,
12, 12, 12, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 16, 17, 18,
19, 19, 20, 20, 20, 20, 21, 21, 22, 22, 22, 23, 23, 24, 24, 24, 24, 25,
25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 28, 28, 29, 30, 30, 30, 30,
31, 31, 31, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 35, 35,
35, 35, 35, 36, 36, 37, 37, 37, 37, 38, 38, 39, 39, 39, 39, 39, 39, 40,
40, 40, 40, 41, 41, 42, 43, 43, 43, 44, 44, 44, 44, 44, 45, 46, 46, 46,
46, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50,
50, 50, 50, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 53, 53, 54,
54, 54, 55, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 57, 58, 58, 58,
59, 59, 60, 60, 60, 61, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 63, 64,
64, 64, 65, 66, 66, 67, 67, 67, 67, 68, 68, 68, 68, 68, 69, 69, 69, 69,
69, 69, 69, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 74, 75, 76, 76,
76, 76, 77, 77, 77, 77, 78, 78, 79, 79, 79, 79, 81, 81, 81, 81, 82, 82,
82, 82, 82, 83, 83, 83, 83, 84, 85, 85, 85, 85, 86, 86, 86, 87, 87, 87,
87, 87, 87, 88, 88, 88, 88, 88, 88, 88, 89, 89, 89, 89, 90, 90, 90, 91,
91, 91, 91, 91, 91, 91, 92, 92, 93, 93, 93, 94, 94, 94, 94, 95, 95,
96, 96, 96, 97, 97, 98, 99, 100, 100, 100, 100, 100]
[2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 8, 9, 10, 10, 11, 11, 11, 11, 12, 12,12,
13, 14, 15, 17, 17, 17, 17, 17, 17, 18, 18, 18, 20, 21, 22, 22, 22, 22,
23, 23, 24, 24, 24, 26, 27, 27, 27, 27, 28, 28, 29, 29, 29, 29, 30, 32,
32, 32, 32, 33, 33, 34, 34, 36, 37, 37, 37, 37, 38, 39, 41, 41, 42, 43,
44, 44, 46, 46, 47, 48, 49, 49, 49, 49, 51, 51, 52, 52, 52, 52, 53, 54,
54, 54, 55, 55, 56, 60, 60, 61, 61, 61, 62, 63, 63, 64, 65, 65, 65, 65,
66, 66, 67, 68, 68, 68, 70, 70, 73, 73, 73, 74, 74, 75, 75, 75, 77, 77,
77, 77, 78, 78, 78, 78, 79, 80, 81, 81, 82, 82, 83, 83, 83, 83, 84, 84,
85, 85, 85, 85, 86, 87, 88, 90, 91, 91, 91, 92, 93, 93, 93, 94, 95, 97,
98, 98, 99, 100]
int_a = map(int, a)
ValueError: invalid literal for int() with base 10: ''
I am not sure why this is happening. If I call print, it seems as if the lists ARE being created and sorted. However the ValueError exists. What gives?
Look at the output of str.split with a passed delimiter appearing at the head or tail of a string:
>>> ', 3, 5'.split(', ')
['', '3', '5']
That empty string is what your program is trying (and failing) to parse as an integer. Calling strip() on it doesn't help (and isn't necessary for int(), by the way: it automatically ignores leading and trailing whitespace). I recommend reading blocks that are guaranteed to be full and valid, such as lines. If the file is just one big line, you'll have to do some extra work to save the trailing characters of each chunk and carry them into the next chunk's processing. Don't forget to process the remaining characters after the loop.
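new = ''
while True:
    chunk = inf.read(1000)
    if not chunk:
        break
    new += chunk
    # everything before the last delimiter is complete; the tail is carried over
    current, delimiter, new = new.rpartition(', ')
    # process current (it is empty if no delimiter was found yet)
# after the loop: process whatever is left in new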
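The carry-over idea can be sketched as a small generator. This is only an illustration under some assumptions: read_ints_in_chunks is a hypothetical name, the separator is taken to be a comma with optional surrounding whitespace, and io.StringIO stands in for the real file.

```python
import io


def read_ints_in_chunks(stream, chunk_size=1000, sep=","):
    """Yield lists of ints, never splitting a number across two chunks."""
    leftover = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        leftover += chunk
        # Text before the last separator is complete; the tail may be a
        # partial number, so it is carried into the next iteration.
        complete, _, leftover = leftover.rpartition(sep)
        numbers = [int(tok) for tok in complete.split(sep) if tok.strip()]
        if numbers:
            yield numbers
    if leftover.strip():
        # Whatever remains after the last read is the final number.
        yield [int(leftover)]


demo = io.StringIO("12, 345, 6")
result = [n for block in read_ints_in_chunks(demo, chunk_size=5) for n in block]
print(result)  # [12, 345, 6]
```

Skipping blank tokens also covers the empty string that split produces at the head or tail of a chunk.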
If the file can comfortably fit in your system's memory, you could just read the entire file and split it in one go:
numbers = map(int, inf.read().split(', '))
I am a beginner and I would like to know how to remove the multiples of 11 and 4 from this range. I would like to include all other numbers, excluding 4, 11 and their multiples. Is there a way of doing this without writing each case individually?
for i in range(1,101):
    print((2**i)-1)
>>> [i for i in range(1,101) if i%4!=0 and i%11!=0]
[1, 2, 3, 5, 6, 7, 9,
10, 13, 14, 15, 17, 18, 19,
21, 23, 25, 26, 27, 29,
30, 31, 34, 35, 37, 38, 39,
41, 42, 43, 45, 46, 47, 49,
50, 51, 53, 54, 57, 58, 59,
61, 62, 63, 65, 67, 69,
70, 71, 73, 74, 75, 78, 79,
81, 82, 83, 85, 86, 87, 89,
90, 91, 93, 94, 95, 97, 98]
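Applied to the loop from the question, the same condition can go straight into the comprehension that computes 2**i - 1. A short sketch, with filtered_values as a made-up helper name:

```python
def filtered_values(limit=100):
    """Return 2**i - 1 for i in 1..limit, skipping multiples of 4 and 11."""
    return [2**i - 1 for i in range(1, limit + 1) if i % 4 != 0 and i % 11 != 0]


values = filtered_values()
print(len(values))   # 68 exponents survive the filter
print(values[:5])    # [1, 3, 7, 31, 63]
```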
I'm using python 3.2.3 IDLE and this is my code:
originalList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
newList = originalList[0.05:0.95] #<<<<I have no idea what I'm doing here
print (newList)
I have an original list of the numbers 1 - 100 and I want to make a new list from it; however, the new list must only contain the data that belongs to the sub-range 5% - 95% of the original list,
so the new list must be like [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18....95]. How do I do that? I know my newList code is wrong.
originalList.sort()
newList = originalList[int(len(originalList) * .05) : int(len(originalList) * .95)]
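The same idea wrapped in a small helper, as a sketch (slice_by_fraction is a hypothetical name). Slice indices must be integers, so the fractional positions are truncated with int() first. Note that with 1..100 this keeps 6..95; subtract one from the start index if the element at the 5% mark itself should be included.

```python
def slice_by_fraction(values, start_frac, stop_frac):
    """Return the part of values between two fractional positions."""
    n = len(values)
    start = int(n * start_frac)   # truncates toward zero
    stop = int(n * stop_frac)
    return values[start:stop]


original = list(range(1, 101))
trimmed = slice_by_fraction(original, 0.05, 0.95)
print(trimmed[0], trimmed[-1], len(trimmed))  # 6 95 90
```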
sl = slice(4, 95)
print(originalList[sl])
Also see http://docs.python.org/2/library/functions.html#slice
size = len(originalList)
newList = originalList[int(0.05*size) - 1:int(0.95*size)]
If you want to get part of a list, the syntax is
List = [1,2,3,4,5,6,7,8,9,10]
newList = List[start_index:end_index]
So, the first number is the index where the sub-list starts, while the second number is the index at which the sub-list stops (that index is not included).
hope this helps!
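A concrete instance of that syntax, with the variable names chosen just for this example:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Indices start at 0; the stop index is excluded.
sub = numbers[2:5]
print(sub)  # [3, 4, 5]
```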
I'd also use range for creating the original list... less mistake prone.
originalList = list(range(1, 101))
newList = originalList[int(len(originalList)*.05)-1:int(len(originalList)*.95)]
print(newList)
Gives the desired result...
Edit: Changed range to be more concise per comment below.
For lists of arbitrary length, you could do:
>>> l = range(200)
>>> percentage = 5
>>> skip = int(len(l) * (float(percentage) / 100) / 2)
>>> len(l[skip:-skip])
190
You could use the fidx module, which allows percentages as indexes:
import fidx
originalList = fidx([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100])
# or better: originalList = fidx.list(range(1,101))
newList = originalList[0.05:0.95]
print (newList)
which returns
[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]