How to make a decent R function for a rainbow table? - python

I'm trying to develop a rainbow table. The search function works well, but the problem is when I try to generate the table.
There are 54 possible characters and the passwords are 3 characters long (for now). I can generate a table with 1024 rows and 153 columns. In theory, if there were no collisions, there would be a >95% chance that I crack the password (1024*153 ≈ 54^3).
But I'm getting 126,662 collisions. Here's my reduction (R) function:
def reduct(length, hashData, k):
    ### Initializing variables and mapping ###
    while len(pwd) != length:
        pwdChar = (int(hashData[0], 16) + int(hashData[1], 16) + int(hashData[2], 16) + int(hashData[3], 16) - 7 + 3*k) % 54
        hashData = hashData[3:]
        pwd += mapping[pwdChar][1]
    return pwd
How can this function result in so many collisions? The maximum sum of the first 4 nibbles is 60, so the -7 should keep the result between 0 and 53 (an equal chance for all characters), the +3*k makes it different for every column, and the % 54 makes sure it fits in the mapping.
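For what it's worth, one quick way to check whether the reduction output is actually uniform is to tally the character index it produces over a large number of random digests. This is only a rough sketch of that check: it reproduces the pwdChar arithmetic for the first character, with k fixed at 0, and uses random MD5 digests as stand-ins for the real hashes.

import os
import hashlib
from collections import Counter

# Tally the first reduced character index over many random hex digests
# to see how evenly the values 0..53 are hit.
counts = Counter()
for _ in range(100_000):
    hashData = hashlib.md5(os.urandom(16)).hexdigest()
    pwdChar = (int(hashData[0], 16) + int(hashData[1], 16) +
               int(hashData[2], 16) + int(hashData[3], 16) - 7 + 3*0) % 54
    counts[pwdChar] += 1

print(min(counts.values()), max(counts.values()))  # a large gap means the mapping is far from uniform

The sum of four independent nibbles is bell-shaped rather than flat, so a tally like this will show some indices hit far more often than others, which by itself drives collisions up.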

Related

A variable is spontaneously growing when I attempt to assign it to a list index

Essentially, I have a list of hashes. I need to modify each one slightly by inserting a section of the hash in front of it. A bit random, I admit, but all of the main code works fine.
import hashlib

bytes_to_hash = b"this is an interesting bytes object"
cycles = 65
digest_length = 10000
hash_list = [
    hashlib.shake_256(bytes_to_hash).digest(digest_length),
    hashlib.shake_256(bytes_to_hash + b"123").digest(digest_length)
]

for _ in range(cycles):
    number = 0  # will be used to generate the next hash later on
    for x in range(4):
        selection = int(hashlib.sha256(hash_list[-1] + str(x).encode()).hexdigest(), 16) % digest_length
        # ^ used to pseudorandomly select a number
        for count, hash in enumerate(hash_list):
            number += hash[selection]
            prev_hash = hash_list[count - 1]
            prev_hash = prev_hash[:selection - 512] + hash[selection - 512 : selection] + prev_hash[selection:]
            # hash_list[count - 1] = prev_hash
            # ^ this is the issue
    number = number % digest_length
    selection_range = hash_list[-1][number - 128 : number]
    hash_list.append(hashlib.shake_256(selection_range).digest(digest_length))
The problem comes when I attempt to actually modify the list itself. This simple line screws everything up -
hash_list[count - 1] = prev_hash
After adding this line, the script suddenly takes a very long time to execute, up from about 75 ms to over 100 seconds. A sanity check of removing that line and inserting print(len(prev_hash)) showed that the length of prev_hash is always 10,000, exactly what I expected. It is only when I attempt to assign prev_hash to a list index that the length reaches as high as 19,000,000 for no apparent reason. The weirdest part is that after reintroducing that line, it seems that the length of prev_hash itself is what's actually expanding, not the value in the list.
I have no idea what I'm doing wrong, any help is appreciated.
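One thing that may be worth checking (an illustrative sketch of Python's slicing behaviour, not a confirmed diagnosis of this exact script): whenever selection is smaller than 512, the computed bound selection - 512 is negative, and a negative slice bound counts from the end of the bytes object, so the rebuilt value comes out longer than 10,000.

# Demonstration of how a negative computed slice bound changes the length.
prev_hash = bytes(10000)   # stand-in for a 10,000-byte digest
other = bytes(10000)
selection = 100            # smaller than 512, so selection - 512 == -412

rebuilt = prev_hash[:selection - 512] + other[selection - 512:selection] + prev_hash[selection:]
print(len(rebuilt))        # 19488, not 10000

Without the write-back these lengths are recomputed from unmodified 10,000-byte entries on every pass; once oversized values are written back into hash_list they get sliced and re-concatenated on later passes, which could compound into the multi-million lengths (and the long runtime) described above.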

Reindexing error within np.where and for loop

I have a large CSV table with ~1,200 objects. I am narrowing down those to a volume limited sample (VLS) of 326 objects by setting certain parameters (only certain distances, etc.)
Within this VLS I am using a for loop to count the number of specific types of objects. I don't want to just count the entire VLS at once though, instead it'll count in "sections" (think of drawing boxes on a scatter plot and counting what's in each box).
I'm pretty sure my issue comes from the way pandas reads in the columns of my CSV table, and that the "box" array I have can't talk to the columns that are "dtype: object".
I don't expect someone to have a perfect fix for this, but even pointing me to some specific and relevant information on pandas would be helpful and appreciated. I've tried reading the pandas documentation, but I don't understand much of it.
This is how I read in the CSV table and my columns in case it's relevant:
file = pd.read_csv(r'~/Downloads/CSV')
#more columns than this, but they're all defined like this in my code
blend = file["blend"]
dec = file["dec"]
When I define my VLS inside the definition of the section I'm looking at (named 'box') the code does work, and the for loop properly counts the objects.
This is what it looks like when it works:
color = np.array([-1,0,1])
for i in color:
    box1 = np.where((constant box parameters) & (variable par >= i) &
                    (variable par < i+1) & ('Volume-limited parameters I wont list'))[0]
    binaries = np.where(blend[box1].str[:1].eq('Y'))[0]
    candidates = np.where(blend[box1].str[0].eq('?'))[0]
    singles = np.where(blend[box1].str[0].eq('N'))[0]
    print("from", i, "to", i+1, "there are", len(binaries), "binaries,", len(candidates), "candidates,", len(singles), "singles.")
# Correct Output:
"from -1 to 0 there are 7 binaries, 1 candidates, 78 singles."
"from 0 to 1 there are 3 binaries, 1 candidates, 24 singles."
"from 1 to 2 there are 13 binaries, 6 candidates, 69 singles."
The problem is that I don't want to include the parameters for my VLS in the np.where() for "box". This is how I would like my code to look:
vollim = np.where((dec >= -30) & (dec <= 60) & (p_anglemas/err_p_anglemas >= 5) &
                  (dist <= 25) & (err_j_k_mag < 0.2))[0]
j_k_mag_vl = j_k_mag[vollim]
abs_jmag_vl = abs_jmag[vollim]
blend_vl = blend[vollim]
hires_vl = hires[vollim]
#%%
color = np.array([-1,0,1])
for i in color:
    box2 = np.where((abs_jmag_vl >= 13) & (abs_jmag_vl <= 16) &
                    (j_k_mag_vl >= i) & (j_k_mag_vl < i+1))[0]
    binaries = np.where(blend_vl[box2].str[:1].eq('Y'))[0]
    candidates = np.where(blend_vl[box2].str[0].eq('?'))[0]
    singles = np.where(blend_vl[box2].str[0].eq('N'))[0]
    print("from", i, "to", i+1, "there are", len(binaries), "binaries,", len(candidates), "candidates,", len(singles), "singles.")
#Wrong Output:
"from -1 to 0 there are 4 binaries, 1 candidates, 22 singles."
"from 0 to 1 there are 1 binaries, 0 candidates, 5 singles."
"from 1 to 2 there are 4 binaries, 0 candidates, 14 singles."
When I print blend_vl[box2], a lot of the elements of blend_vl have been changed from their regular strings to NaN, which I do not understand.
When I print box1 and box2 they are the same length, but they contain different indices.
I think blend_vl[box2] would work properly if I changed blend_vl into a flat array?
I know this is a lot of information at once, but I appreciate any input, even just some more information about how pandas and arrays interact. TIA!!
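A small illustration of what may be going on (a sketch with made-up data, not the original table): a Series sliced out of a DataFrame keeps its original index labels, so indexing it with the positions returned by np.where does a label-based lookup, and labels dropped by the volume-limited cut show up as NaN (or raise a KeyError on newer pandas). .iloc indexes by position instead:

import numpy as np
import pandas as pd

# Hypothetical stand-in for the blend column: six rows, labels 0..5.
blend = pd.Series(list("YNY?NY"))

vollim = np.where(blend.isin(["Y", "?"]))[0]   # positions 0, 2, 3, 5
blend_vl = blend[vollim]                       # values kept, but the labels stay 0, 2, 3, 5

box = np.array([1, 2, 3])                      # positions *within* blend_vl

print(blend_vl.reindex(box))   # label-based lookup: label 1 no longer exists -> NaN
print(blend_vl.iloc[box])      # position-based lookup: the three intended rows

So using blend_vl.iloc[box2] (or calling .reset_index(drop=True) on the filtered columns so labels and positions line up again) should give the same counts as the box1 version.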

Shortest possible generated unique ID

So we can generate a unique id with str(uuid.uuid4()), which is 36 characters long.
Is there another method to generate a unique ID which is shorter in terms of characters?
EDIT:
If ID is usable as primary key then even better
Granularity should be better than 1ms
This code could be distributed, so we can't assume time independence.
If this is for use as a primary key field in a db, consider just using an auto-incrementing integer instead.
str(uuid.uuid4()) is 36 chars but it has four useless dashes (-) in it, and it's limited to 0-9 a-f.
Better uuid4 in 32 chars:
>>> uuid.uuid4().hex
'b327fc1b6a2343e48af311343fc3f5a8'
Or just b64 encode and slice some urandom bytes (up to you to guarantee uniqueness):
>>> base64.b64encode(os.urandom(32))[:8]
b'iR4hZqs9'
TLDR
Most of the time it's better to work with numbers internally and encode them as short IDs externally. So here's a function for Python 3, PowerShell & VBA that converts an int32 to an alphanumeric ID. Use it like this:
int32_to_id(225204568)
'F2AXP8'
For distributed code use ULIDs: https://github.com/mdipierro/ulid
They are much longer but unique across different machines.
How short are the IDs?
It will encode about half a billion IDs in 6 characters so it's as compact as possible while still using only non-ambiguous digits and letters.
How can I get even shorter IDs?
If you want even more compact IDs/codes/serial numbers, you can easily expand the character set by just changing the chars="..." definition. For example, if you allow all lower and upper case letters you can have 56 billion IDs within the same 6 characters. Adding a few symbols (like ~!@#$%^&*()_+-=) gives you 208 billion IDs.
So why didn't you go for the shortest possible IDs?
The character set I'm using in my code has an advantage: It generates IDs that are easy to copy-paste (no symbols so double clicking selects the whole ID), easy to read without mistakes (no look-alike characters like 2 and Z) and rather easy to communicate verbally (only upper case letters). Sticking to numeric digits only is your best option for verbal communication but they are not compact.
I'm convinced: show me the code
Python 3
def int32_to_id(n):
    if n == 0: return "0"
    chars = "0123456789ACEFHJKLMNPRTUVWXY"
    length = len(chars)
    result = ""
    remain = n
    while remain > 0:
        pos = remain % length
        remain = remain // length
        result = chars[pos] + result
    return result
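A matching Python decoder for the same character set could look roughly like this (a rough sketch; the VBA version of id_to_int32 is further down):

def id_to_int32(s):
    chars = "0123456789ACEFHJKLMNPRTUVWXY"
    result = 0
    for c in s:
        result = result * len(chars) + chars.index(c)
    return result

assert id_to_int32(int32_to_id(225204568)) == 225204568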
PowerShell
function int32_to_id($n){
    $chars = "0123456789ACEFHJKLMNPRTUVWXY"
    $length = $chars.length
    $result = ""; $remain = [int]$n
    do {
        $pos = $remain % $length
        $remain = [int][Math]::Floor($remain / $length)
        $result = $chars[$pos] + $result
    } while ($remain -gt 0)
    $result
}
VBA
Function int32_to_id(n)
    Dim chars$, length, result$, remain, pos
    If n = 0 Then int32_to_id = "0": Exit Function
    chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
    length = Len(chars$)
    result$ = ""
    remain = n
    Do While (remain > 0)
        pos = remain Mod length
        remain = Int(remain / length)
        result$ = Mid(chars$, pos + 1, 1) + result$
    Loop
    int32_to_id = result
End Function

Function id_to_int32(id$)
    Dim chars$, length, result, remain, pos, value, power
    chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
    length = Len(chars$)
    result = 0
    power = 1
    For pos = Len(id$) To 1 Step -1
        result = result + (InStr(chars$, Mid(id$, pos, 1)) - 1) * power
        power = power * length
    Next
    id_to_int32 = result
End Function

Public Sub test_id_to_int32()
    Dim i
    For i = 0 To 28 ^ 3
        If id_to_int32(int32_to_id(i)) <> i Then Debug.Print "Error, i=", i, "int32_to_id(i)", int32_to_id(i), "id_to_int32('" & int32_to_id(i) & "')", id_to_int32(int32_to_id(i))
    Next
    Debug.Print "Done testing"
End Sub
Yes. Just use the current UTC millis. This number never repeats.
const uniqueID = new Date().getTime();
EDIT
If you have the rather rare requirement to produce more than one ID within the same millisecond, this method is of no use, as this number's granularity is 1 ms.

Resolution of the knapsack approach by bruteforce in python

I'm actually trying to resolve the knapsack problem with bruteforce. I know it's not efficient at all, I just want to implement it in python.
The problem is that it takes a long time, too much time in my opinion even for a brute force, so maybe I've made some mistakes in my code...
import time
from datetime import timedelta
from numpy import binary_repr

# Item is a small record class holding index/value/weight, defined elsewhere in the script.

def solve_it(input_data):
    # Start the counting clock
    start_time = time.time()

    # Parse the input
    lines = input_data.split('\n')
    firstLine = lines[0].split()
    item_count = int(firstLine[0])
    capacity = int(firstLine[1])
    items = []
    for i in range(1, item_count+1):
        line = lines[i]
        parts = line.split()
        items.append(Item(i-1, int(parts[0]), int(parts[1])))

    # a trivial greedy algorithm for filling the knapsack
    # it takes items in-order until the knapsack is full
    value = 0
    weight = 0
    best_value = 0
    my_list_combinations = list()
    our_range = 2 ** item_count
    print(our_range)
    output = ""

    for i in range(our_range):
        # for example if item_count is 7 then 2 in binary is
        # 0000010
        binary = binary_repr(i, width=item_count)

        # print the value every 0.25%
        if i % (our_range/400) == 0:
            print("i : " + str(i) + '/' + str(our_range) + ' ' +
                  str((i * 100.0) / our_range) + '%')
            elapsed_time_secs = time.time() - start_time
            print("Execution: %s secs" %
                  timedelta(seconds=round(elapsed_time_secs)))

        my_list_combinations = tuple(binary)

        sum_weight = 0
        sum_value = 0
        for item in items:
            sum_weight += int(my_list_combinations[item.index]) * int(item.weight)

        if sum_weight <= capacity:
            for item in items:
                sum_value += int(my_list_combinations[item.index]) * int(item.value)

            if sum_value > best_value:
                best_value = sum_value
                output = ('The decision variable is : ' + str(my_list_combinations) +
                          ' with a total value of : ' + str(sum_value) +
                          ' for a weight of : ' + str(sum_weight) + '\n')

    return output
Here is the file containing the 30 objects:
30 100000 # 30 objects with a maximum weight of 100000
90000 90001
89750 89751
10001 10002
89500 89501
10252 10254
89250 89251
10503 10506
89000 89001
10754 10758
88750 88751
11005 11010
88500 88501
11256 11262
88250 88251
11507 11514
88000 88001
11758 11766
87750 87751
12009 12018
87500 87501
12260 12270
87250 87251
12511 12522
87000 87001
12762 12774
86750 86751
13013 13026
86500 86501
13264 13278
86250 86251
I don't show the code that reads the file because I think it's pointless... For 19 objects I'm able to solve the problem by brute force in 14 seconds, but for 30 objects I have calculated that it would take roughly 15 hours, so I think there is a problem in my computation...
Any help would be appreciated :)
Astrus
Your issue, that solving the knapsack problem takes too long, is indeed frustrating, and it pops up in other places where algorithms are high-order polynomial or non-polynomial. You're seeing what it means for an algorithm to have exponential runtime ;) In other words, whether your python code is efficient or not, you can very easily construct a version of the knapsack problem that your computer won't be able to solve within your lifetime.
Exponential running time means that every time you add another object to your list, the brute-force solution will take twice as long. If you can solve for 19 objects in 14 seconds, that suggests that 30 objects will take 14 secs x (2**11) = 28672 secs = about 8 hours. To do 31 objects might take about 16 hours. Etc.
There are dynamic programming approaches to the knapsack problem which trade off runtime for memory (see the Wikipedia page), and there are numerical optimisers which are tuned to solve constraint-based problems very quickly (again, Wikipedia), but none of this really alters the fact that finding exact solutions to the knapsack problem is just plain hard.
TL;DR: You're able to solve for 19 objects in a reasonable amount of time, but not for 30 (or 100 objects). This is a property of the problem you're solving, and not a shortcoming with your Python code.
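For reference, the dynamic-programming approach mentioned above handles 30 items of this size comfortably. A minimal sketch (assuming items are available as (value, weight) pairs; adapt the order if your Item class stores them differently):

def knapsack_dp(items, capacity):
    # 0/1 knapsack by dynamic programming over the capacity.
    # items: list of (value, weight) pairs; returns the best achievable total value.
    best = [0] * (capacity + 1)
    for value, weight in items:
        # iterate capacities downwards so each item is used at most once
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]

For the 30-object file above (capacity 100000) this does about 3 million inner iterations, so it finishes in a few seconds rather than hours.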

Matlab Concatenating Row Vectors With A For Loop

Here is what I did:
w = zeros(1,28);
e = zeros(1,63) + 1;
r = zeros(1,90) + 2;
t = zeros(1,100) + 3;
y = zeros(1,90) + 4;
u = zeros(1,63) + 5;
i = zeros(1,28) + 6;
qa = horzcat(w,e,r,t,y,u,i);
hist(qa,25,0.5)
h = findobj(gca,'Type','patch');
set(h,'FaceColor',[.955 0 0],'EdgeColor','w');
I would like to achieve the same effect, but in a more succinct way. This is my attempt:
v = zeros(1,28);
for i=2:8
    v(i) = horzcat(v(i-1) + (i-1));
end
And the error I receive is "Cell contents assignment to a non-cell array object."
Also, would anyone know what the python equivalent would be, if it is not too much to ask?
You can also achieve this without a for loop, albeit in a somewhat less intuitive way. But hey, it's without loops! Besides, it gives you the freedom to pick a different set of values.
v=[0;1;2;3;4;5;6]; %values
r=[28 63 90 100 90 63 28]; %number of repeats
qa=zeros(sum(r),1);
qa(cumsum([1 r(1:end-1)]))=1;
qa=v(cumsum(qa));
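Since the question also asks for a Python equivalent: the same repeat-by-counts idea maps directly onto numpy's repeat (a sketch, using the same values and counts as above):

import numpy as np

v = np.array([0, 1, 2, 3, 4, 5, 6])            # values
r = np.array([28, 63, 90, 100, 90, 63, 28])    # number of repeats
qa = np.repeat(v, r)                           # 462-element vector, same contents as the horzcat version

print(qa.shape)  # (462,)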
You don't need a cell array to concatenate vectors for which one of the dimensions always remains the same (rows or columns; in your case, rows).
You can define the sizes in a separate array and then use a for loop as follows.
szArray = [28 63 90 100 90 63 28];
qa = [];
for i=0:length(szArray)-1
    % multiplying by i replicates the addition of a scalar you have done
    qa = [qa i*ones(1,szArray(i+1))];
end
This is still hardcoding. It will only apply to the exact problem you have mentioned above.
