Convert integer to a random but deterministically repeatable choice - python

How do I convert an unsigned integer (representing a user ID) to a random-looking but deterministically repeatable choice? The choice must be selected with equal probability (irrespective of the distribution of the input integers). For example, if I have 3 choices, i.e. [0, 1, 2], user ID 123 may always be randomly assigned choice 2, whereas user ID 234 may always be assigned choice 1.
Cross-language and cross-platform algorithmic reproducibility is desirable. I'm inclined to use a hash function and modulo unless there is a better way. Here is what I have:
>>> num_choices = 3
>>> id_num = 123
>>> int(hashlib.sha256(str(id_num).encode()).hexdigest(), 16) % num_choices
2
I'm using the latest stable Python 3. Please note that this question is similar but not exactly identical to the related question to convert a string to random but deterministically repeatable uniform probability.

Using hash and modulo
import hashlib

def id_to_choice(id_num, num_choices):
    id_bytes = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')
    id_hash = hashlib.sha512(id_bytes)
    id_hash_int = int.from_bytes(id_hash.digest(), 'big')  # Explicit byteorder for system-agnostic reproducibility
    choice = id_hash_int % num_choices  # Use with small num_choices only
    return choice
>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
1
Notes:
The built-in hash method must not be used because it can preserve the input's distribution, e.g. with hash(123). Moreover, it can return values that differ across Python restarts, e.g. with hash('123').
For converting an int to bytes, bytes(id_num) works but is grossly inefficient as it returns an array of null bytes, so it must not be used. Using int.to_bytes is better. Using str(id_num).encode() works but wastes a few bytes.
Admittedly, using modulo doesn't offer exactly uniform probability,[1][2] but this shouldn't introduce much bias for this application because id_hash_int is expected to be very large and num_choices is assumed to be small. If the residual bias matters, it can be removed with rejection sampling, as in the sketch below.
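A minimal sketch of that rejection-sampling variant (my own addition, not part of the answer above; it assumes IDs fit in 64 bits and reuses SHA-512):
import hashlib

def id_to_choice_unbiased(id_num, num_choices):
    # Re-hash with an increasing counter until the 512-bit value falls below the
    # largest multiple of num_choices; the expected number of iterations is ~1.
    limit = (2 ** 512 // num_choices) * num_choices
    counter = 0
    while True:
        data = id_num.to_bytes(8, 'big') + counter.to_bytes(2, 'big')  # assumes id_num < 2**64
        h = int.from_bytes(hashlib.sha512(data).digest(), 'big')
        if h < limit:
            return h % num_choices
        counter += 1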
Using random
The random module can also be used, with id_num as the seed of a dedicated random.Random instance, which avoids thread-safety issues with the module-level functions. Using randrange in this manner is comparable to and simpler than hashing the seed and taking the modulo.
With this approach, not only is cross-language reproducibility a concern, but reproducibility across future versions of Python could also be a concern. It is therefore not recommended.
import random

def id_to_choice(id_num, num_choices):
    localrandom = random.Random(id_num)
    choice = localrandom.randrange(num_choices)
    return choice
>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
2

An alternative is to encrypt the user ID. If you keep the encryption key the same, then each input number will encrypt to a different output number up to the block size of the cipher you use. DES uses 64-bit blocks, which cover IDs 0 to 18446744073709551615. That gives a random-appearing replacement for the user ID, which is guaranteed not to give two different user IDs the same 'random' number, because encryption is a one-to-one permutation of the block values.
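As a rough illustration only (not from the original answer), here is a minimal sketch using pycryptodome's DES in ECB mode; the key value and the final modulo step are my own assumptions:
from Crypto.Cipher import DES  # pycryptodome

SECRET_KEY = b'8bytekey'  # hypothetical fixed 8-byte key; keep it constant for repeatability
_cipher = DES.new(SECRET_KEY, DES.MODE_ECB)

def permute_id(id_num):
    # Encrypting a 64-bit block is a one-to-one permutation, so distinct IDs
    # always map to distinct random-looking 64-bit values.
    block = id_num.to_bytes(8, 'big')
    return int.from_bytes(_cipher.encrypt(block), 'big')

def id_to_choice(id_num, num_choices):
    return permute_id(id_num) % num_choices  # small residual modulo bias, as discussed above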

I apologize that I don't have a Python implementation, but I do have a very clear, readable and self-evident implementation in Java which should be easy to translate into Python with minimal effort. The following generators produce long, predictable, evenly distributed sequences covering the whole range except zero:
XorShift ( http://www.arklyffe.com/main/2010/08/29/xorshift-pseudorandom-number-generator )
public int nextQuickInt(int number) {
    number ^= number << 11;
    number ^= number >>> 7;
    number ^= number << 16;
    return number;
}

public short nextQuickShort(short number) {
    number ^= number << 11;
    number ^= number >>> 5;
    number ^= number << 3;
    return number;
}

public long nextQuickLong(long number) {
    number ^= number << 21;
    number ^= number >>> 35;
    number ^= number << 4;
    return number;
}
or XorShift128Plus (need to re-seed state0 and state1 to non-zero values before using, http://xoroshiro.di.unimi.it/xorshift128plus.c)
public class XorShift128Plus {
    private long state0, state1; // One of these shouldn't be zero

    public long nextLong() {
        long state1 = this.state0;
        long state0 = this.state0 = this.state1;
        state1 ^= state1 << 23;
        // Unsigned shifts (>>>) match the reference C code, which operates on uint64_t.
        return (this.state1 = state1 ^ state0 ^ (state1 >>> 18) ^ (state0 >>> 5)) + state0;
    }

    public void reseed(...) {
        this.state0 = ...;
        this.state1 = ...;
    }
}
or XorOshiro128Plus (http://xoroshiro.di.unimi.it/)
public class XorOshiro128Plus {
    private long state0, state1; // One of these shouldn't be zero

    public long nextLong() {
        long state0 = this.state0;
        long state1 = this.state1;
        long result = state0 + state1;
        state1 ^= state0;
        this.state0 = Long.rotateLeft(state0, 55) ^ state1 ^ (state1 << 14);
        this.state1 = Long.rotateLeft(state1, 36);
        return result;
    }

    public void reseed() {
    }
}
or SplitMix64 (http://xoroshiro.di.unimi.it/splitmix64.c)
public class SplitMix64 {
    private long state;

    public long nextLong() {
        long result = (state += 0x9E3779B97F4A7C15L);
        // Unsigned shifts (>>>) match the reference C code, which operates on uint64_t.
        result = (result ^ (result >>> 30)) * 0xBF58476D1CE4E5B9L;
        result = (result ^ (result >>> 27)) * 0x94D049BB133111EBL;
        return result ^ (result >>> 31);
    }

    public void reseed() {
        this.state = ...;
    }
}
or XorShift1024Mult (http://xoroshiro.di.unimi.it/xorshift1024star.c) or Pcg64_32 (http://www.pcg-random.org/, http://www.pcg-random.org/download.html)
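For reference, here is a rough Python port of the first 32-bit XorShift step above (my own translation, not part of the answer); the masks emulate Java's 32-bit int overflow:
def xorshift32(number):
    # Same shift constants as nextQuickInt above; Java's >>> becomes a plain right
    # shift here because the value is kept as a non-negative 32-bit int.
    number &= 0xFFFFFFFF
    number ^= (number << 11) & 0xFFFFFFFF
    number ^= number >> 7
    number ^= (number << 16) & 0xFFFFFFFF
    return number

# A non-zero user ID maps to a repeatable scrambled value, e.g. xorshift32(123) % 3.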

The simplest method is to take user_id modulo the number of options:
choice = user_id % number_of_options
It's very easy and fast. However, if you know the user IDs you may be able to guess the algorithm, and the result is only uniform if the IDs themselves are uniformly distributed modulo the number of options.
Also, pseudorandom sequences can be obtained from random seeded with user constants (e.g. user_id):
>>> import random
>>> def generate_random_value(user_id):
... random.seed(user_id)
... return random.randint(1, 10000)
...
>>> [generate_random_value(x) for x in range(20)]
[6312, 2202, 927, 3899, 3868, 4186, 9402, 5306, 3715, 7586, 9362, 7412, 7776, 4244, 1751, 3424, 5924, 8553, 2970, 709]
>>> [generate_random_value(x) for x in range(20)]
[6312, 2202, 927, 3899, 3868, 4186, 9402, 5306, 3715, 7586, 9362, 7412, 7776, 4244, 1751, 3424, 5924, 8553, 2970, 709]
>>>

Related

How to implement this C++ logic in Python?

I want to implement the C++ logic below in Python.
struct hash_string
{
    hash_string() {}

    uint32_t operator ()(const std::string &text) const
    {
        //std::cout << text << std::endl;
        static const uint32_t primes[16] =
        {
            0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
            0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
            0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
            0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B,
        };
        //std::cout << text.size() << std::endl;
        uint32_t sum = 0;
        for (size_t i = 0; i != text.size(); i++) {
            sum += primes[i & 15] * (unsigned char)text[i];
            //std::cout << text[i] << std::endl;
            //std::cout << (unsigned char)text[i] << std::endl;
        }
        return sum;
    }
};
The Python version is below; it is not complete yet, since I haven't found a way to convert the text to unsigned chars. So, please help!
# -*- coding: utf-8 -*-
text = u'连衣裙女韩范'
primes = [0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
          0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
          0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
          0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B]
# text[i] does not work (of course), but how to mimic the logic above?
rand = [primes[i & 15] * text[i] for i in range(len(text))]
print rand
sum_agg = sum(rand)
print sum_agg
Take text = u'连衣裙女韩范' for example: the C++ version returns 18 for text.size() and the sum is 2422173716, while in Python I don't know how to make it 18.
The equality of text size is essential, as a start at least.
Because you are using unicode, for an exact reproduction you will need to turn text into a series of bytes (chars in C++).
bytes_ = text.encode("utf8")
# when iterated over this will yield ints (in python 3)
# or single character strings in python 2
You should use more pythonic idioms for iterating over a pair of sequences
pairs = zip(bytes_, primes)
What if bytes_ is longer than primes? Use itertools.cycle
from itertools import cycle
pairs = zip(bytes_, cycle(primes))
All together:
from itertools import cycle
text = u'连衣裙女韩范'
primes = [0x01EE5DB9, 0x491408C3, 0x0465FB69, 0x421F0141,
0x2E7D036B, 0x2D41C7B9, 0x58C0EF0D, 0x7B15A53B,
0x7C9D3761, 0x5ABB9B0B, 0x24109367, 0x5A5B741F,
0x6B9F12E9, 0x71BA7809, 0x081F69CD, 0x4D9B740B]
# if python 3
rand = [byte * prime for byte, prime in zip(text.encode("utf8"), cycle(primes))]
# else if python 2 (use ord to convert single character string to int)
rand = [ord(byte) * prime for byte, prime in zip(text.encode("utf8"), cycle(primes))]
hash_ = sum(rand)
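One detail worth adding (my note, not part of the answer): the C++ sum is a uint32_t, so it wraps modulo 2**32, whereas the Python sum does not. For long strings, mask the result to reproduce the C++ value exactly:
hash_ = sum(rand) & 0xFFFFFFFF  # emulate the uint32_t overflow of the C++ accumulator
Also, len(text.encode("utf8")) is 18 here (six CJK characters at three UTF-8 bytes each), which matches text.size() in the C++ version.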

Python pack / unpack converts to Objective C

I have a Python script and I want to convert it to Objective-C.
from struct import *
data = [10000,10000,10000,10]
d = [int(i) for i in data]
print d
list = unpack('BBBBBBBBBBBBBBBB',pack('>IIII', d[0], d[1], d[2], d[3]))
print list
Output
[10000, 10000, 10000, 10]
(0, 0, 39, 16, 0, 0, 39, 16, 0, 0, 39, 16, 0, 0, 0, 10)
I have done the first int array conversion in Objective-C, but I'm stuck at pack and unpack.
Objective-C for the first part:
NSArray *timouts = [NSArray arrayWithObjects:[NSString stringWithFormat:@"10000"], [NSString stringWithFormat:@"10000"], [NSString stringWithFormat:@"10000"], [NSString stringWithFormat:@"10"], nil];
NSMutableArray *ary = [NSMutableArray arrayWithCapacity:4];
NSInteger coutner = 0;
for (NSString *string in timouts)
{
    int outVal;
    NSScanner *scanner = [NSScanner scannerWithString:string];
    [scanner scanInt:&outVal];
    ary[coutner] = [NSString stringWithFormat:@"%d", outVal];
    coutner++;
}
I have tried to do so, but I'm not very familiar with Python, and I haven't figured out how pack and unpack work.
First of all, I want to say that you should try to learn Objective-C at a deeper level. It is not too hard (especially coming from Python, because both languages are dynamically typed). However, I will give you some advice on your question.
Let's have a closer view to your code:
A.
You have:
NSArray *timouts = [NSArray arrayWithObjects:[NSString stringWithFormat:@"10000"], [NSString stringWithFormat:@"10000"], [NSString stringWithFormat:@"10000"], [NSString stringWithFormat:@"10"], nil];
I really do not see any benefit from converting all numbers to strings. Simply store numbers:
NSArray *timeOuts = @[@10000, @10000, @10000, @10];
@[] means "array literal", @x means an NSNumber instance object.
B.
You can print out the list itself simply with NSLog():
NSLog( @"%@", timeOuts );
C.
You have to read instances of NSNumber, because you stored such instances:
NSMutableArray * bytes = [NSMutableArray arrayWithCapacity:4];
for (NSNumber *value in timeOuts) // Take out numbers
{
…
}
D.
Now to the hardest part: unpacking
Because you stored instances of NSNumber into the array, it is pretty easy to get the integer value:
NSMutableArray *bytes = [NSMutableArray arrayWithCapacity:4];
for (NSNumber *value in timeOuts) // Take out numbers
{
    int intValue = [value intValue];
    …
}
E.
You can "pack them" into a string with -stringWithFormat:. However, if I understand the log in your Q correct, you want to print out the single bytes of a value, not the whole value.
NSMutableArray *bytes = [NSMutableArray arrayWithCapacity:4];
for (NSNumber *value in timeOuts) // Take out numbers
{
    int intValue = [value intValue];
    for( int x = 0; x < 4; x++ )
    {
        int byte = (intValue >> 24) & 0xFF; // Take the most significant byte (bits 24-31)
        [bytes addObject:@(byte)]; // Store byte
        intValue <<= 8; // Shift bits 0-23 up to 8-31 for the next iteration
    }
}
NSLog( @"%@", bytes );
z.
So we end up with this:
NSArray *timeOuts = @[@10000, @10000, @10000, @10];
NSLog( @"%@", timeOuts );
NSMutableArray *bytes = [NSMutableArray arrayWithCapacity:4];
for (NSNumber *value in timeOuts) // Take out numbers
{
    int intValue = [value intValue];
    for( int x = 0; x < 4; x++ )
    {
        int byte = (intValue >> 24) & 0xFF; // Most significant byte first, matching pack('>IIII')
        [bytes addObject:@(byte)]; // Store byte
        intValue <<= 8; // Move the next byte into the top position
    }
}
NSLog( @"%@", bytes );
If you really need to store the values as strings, let me know. I will edit my answer.
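For reference, the Python side can be checked directly: '>IIII' packs each integer big-endian, so the most significant byte of each value comes first (this snippet is mine, not part of the answer):
from struct import pack, unpack

d = [10000, 10000, 10000, 10]
print(unpack('16B', pack('>IIII', *d)))  # '16B' is shorthand for 'BBBBBBBBBBBBBBBB'
# (0, 0, 39, 16, 0, 0, 39, 16, 0, 0, 39, 16, 0, 0, 0, 10)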

Number of ones in a binary number stored in base 10 [duplicate]

What is an efficient way to count the number of 1s in the binary representation of a number in O(1), if you have enough memory to play with? This is an interview question I found on an online forum, but it had no answer. Can somebody suggest something? I can't think of a way to do it in O(1) time.
That's the Hamming weight problem, a.k.a. population count. The link mentions efficient implementations. Quoting:
With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64 bit integer
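As a rough sketch of the same idea with a more modest table (one entry per byte rather than per 64-bit integer; my own illustration, not part of the quoted answer):
# Precompute the Hamming weight of every byte once.
BYTE_POPCOUNT = [bin(i).count('1') for i in range(256)]

def popcount32(x):
    # Sum the table entries for the four bytes of a 32-bit value.
    return (BYTE_POPCOUNT[x & 0xFF] +
            BYTE_POPCOUNT[(x >> 8) & 0xFF] +
            BYTE_POPCOUNT[(x >> 16) & 0xFF] +
            BYTE_POPCOUNT[(x >> 24) & 0xFF])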
I've got a solution that counts the bits in O(Number of 1's) time:
bitcount(n):
    count = 0
    while n > 0:
        count = count + 1
        n = n & (n-1)
    return count
In worst case (when the number is 2^n - 1, all 1's in binary) it will check every bit.
Edit:
Just found a very nice constant-time, constant memory algorithm for bitcount. Here it is, written in C:
int BitCount(unsigned int u)
{
    unsigned int uCount;
    uCount = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111);
    return ((uCount + (uCount >> 3)) & 030707070707) % 63;
}
You can find proof of its correctness here.
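A quick way to convince yourself is to translate it to Python and compare against bin(n).count('1') (my own sanity check, using Python's 0o octal syntax for the same masks; valid for 32-bit inputs):
def bitcount_swar(u):
    c = u - ((u >> 1) & 0o33333333333) - ((u >> 2) & 0o11111111111)
    return ((c + (c >> 3)) & 0o30707070707) % 63

assert all(bitcount_swar(n) == bin(n).count('1') for n in range(1 << 16))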
Please note the fact that: n&(n-1) always eliminates the least significant 1.
Hence we can write the code for calculating the number of 1's as follows:
count = 0;
while (n != 0) {
    n = n & (n-1);
    count++;
}
cout << "Number of 1's in n is: " << count;
The complexity of the program is proportional to the number of 1's in n (which is always less than 32 for a 32-bit int).
I saw the following solution from another website:
int count_one(int x){
    x = (x & (0x55555555)) + ((x >> 1) & (0x55555555));
    x = (x & (0x33333333)) + ((x >> 2) & (0x33333333));
    x = (x & (0x0f0f0f0f)) + ((x >> 4) & (0x0f0f0f0f));
    x = (x & (0x00ff00ff)) + ((x >> 8) & (0x00ff00ff));
    x = (x & (0x0000ffff)) + ((x >> 16) & (0x0000ffff));
    return x;
}
public static void main(String[] args) {
    int a = 3;
    int orig = a;
    int count = 0;
    while (a > 0)
    {
        a = a >> 1 << 1;
        if (orig - a == 1)
            count++;
        orig = a >> 1;
        a = orig;
    }
    System.out.println("Number of 1s are: " + count);
}
countBits(x){
    y = 0;
    while (x) {
        y += x & 1;
        x = x >> 1;
    }
    return y;
}
That's it?
Below are two simple examples (in C++) among many by which you can do this.
We can simply count set bits (1's) using __builtin_popcount().
int numOfOnes(int x) {
    return __builtin_popcount(x);
}
Loop through all bits in an integer, check if a bit is set and if it is then increment the count variable.
int hammingDistance(int x) {
    int count = 0;
    for (int i = 0; i < 32; i++)
        if (x & (1 << i))
            count++;
    return count;
}
That will be the shortest answer in my SO life: lookup table.
Apparently, I need to explain a bit: "if you have enough memory to play with" means we've got all the memory we need (never mind technical feasibility). You don't need to store a lookup table for more than a byte or two. While it'll technically be Ω(log(n)) rather than O(1), just reading the input number is already Ω(log(n)), so if that's a problem, then the answer is "impossible", which is even shorter.
Which of two answers they expect from you on an interview, no one knows.
There's yet another trick: while engineers can take a number and talk about Ω(log(n)), where n is the number, computer scientists will say that we actually measure running time as a function of the length of the input, so what engineers call Ω(log(n)) is actually Ω(k), where k is the number of bytes. Still, as I said before, just reading the input is Ω(k), so there's no way we can do better than that.
Below will work as well.
int nofone(int x) {
    int a = 0;
    while (x != 0) {
        if (x & 1)   // test the lowest bit before shifting, so bit 0 is not skipped
            a++;
        x >>= 1;
    }
    return a;
}
The following is a C solution using bit operators:
int numberOfOneBitsInInteger(int input) {
    int numOneBits = 0;
    int currNum = input;
    while (currNum != 0) {
        if ((currNum & 1) == 1) {
            numOneBits++;
        }
        currNum = currNum >> 1;
    }
    return numOneBits;
}
The following is a Java solution using powers of 2:
public static int numOnesInBinary(int n) {
    if (n < 0) return -1;
    int j = 0;
    while (n > Math.pow(2, j)) j++;
    int result = 0;
    for (int i = j; i >= 0; i--) {
        if (n >= Math.pow(2, i)) {
            n = (int) (n - Math.pow(2, i));
            result++;
        }
    }
    return result;
}
The function takes an int and returns the number of ones in its binary representation:
public static int findOnes(int number)
{
    if (number == 0)
    {
        return 0;
    }
    // Count the lowest bit, then recurse on the remaining higher bits.
    return (number % 2) + findOnes(number / 2);
}
I came here believing I knew a beautiful solution for this problem. Code in C:
short numberOfOnes(unsigned int d) {
    short count = 0;
    for (; d != 0; d &= (d - 1))
        ++count;
    return count;
}
But after a little research on this topic (reading the other answers) I found 5 more efficient algorithms. Love SO!
There is even a CPU instruction designed specifically for this task: popcnt.
(mentioned in this answer)
Description and benchmarking of many algorithms you can find here.
The best way in JavaScript to do this is:
function getBinaryValue(num) {
    return num.toString(2);
}
function checkOnces(binaryValue) {
    return binaryValue.toString().replace(/0/g, "").length;
}
where binaryValue is the binary string, e.g. 1100
There's only one way I can think of to accomplish this task in O(1)... that is to 'cheat' and use a physical device (with linear or even parallel programming I think the limit is O(log(k)), where k represents the number of bytes of the number).
However, you could very easily imagine a physical device that connects each bit to an output line with a 0/1 voltage. Then you could just electronically read off the total voltage on a 'summation' line in O(1). It would be quite easy to make this basic idea more elegant with some basic circuit elements to produce the output in whatever form you want (e.g. a binary-encoded output), but the essential idea is the same and the electronic circuit would produce the correct output state in fixed time.
I imagine there are also possible quantum computing possibilities, but if we're allowed to do that, I would think a simple electronic circuit is the easier solution.
I have actually done this using a bit of sleight of hand: a single lookup table with 16 entries will suffice and all you have to do is break the binary representation into nibbles (4-bit tuples). The complexity is in fact O(1), and I wrote a C++ template which was specialized on the size of the integer you wanted (in number of bits); it makes the count a constant expression instead of an indeterminate one.
FWIW, you can also use the fact that (i & -i) returns the least significant set bit and simply loop, stripping off the lowest set bit each time, until the integer is zero, but that's an old parity trick.
The below method can count the number of 1s in negative numbers as well.
private static int countBits(int number) {
    int result = 0;
    while (number != 0) {
        result += number & 1;
        number = number >>> 1;
    }
    return result;
}
However, a number like -1 is represented in binary as 11111111111111111111111111111111 and so will require a lot of shifting. If you don't want to do so many shifts for small negative numbers, another way could be as follows:
private static int countBits(int number) {
    boolean negFlag = false;
    if (number < 0) {
        negFlag = true;
        number = ~number;
    }
    int result = 0;
    while (number != 0) {
        result += number & 1;
        number = number >> 1;
    }
    return negFlag ? (32 - result) : result;
}
In Python (or any other language): convert the number to a binary string, split on '0' to discard the zeros, join the pieces, and take the length.
len(''.join(str(bin(122011)).split('0')))-1
By utilizing the string operations of JS, one can do it as follows:
0b1111011.toString(2).split(/0|(?=.)/).length // returns 6
or
0b1111011.toString(2).replace("0","").length // returns 6
I had to golf this in ruby and ended up with
l=->x{x.to_s(2).count ?1}
Usage :
l[2**32-1] # returns 32
Obviously not efficient but does the trick :)
Ruby implementation
def find_consecutive_1(n)
  num = n.to_s(2)
  arr = num.split("")
  counter = 0
  max = 0
  arr.each do |x|
    if x.to_i == 1
      counter += 1
    else
      max = counter if counter > max
      counter = 0
    end
    max = counter if counter > max
  end
  max
end

puts find_consecutive_1(439)
Two ways:
/* Method-1 */
int count1s(long num)
{
    int tempCount = 0;
    while (num)
    {
        tempCount += (num & 1); // increment based on the rightmost bit
        num = num >> 1;         // right shift by 1
    }
    return tempCount;
}

/* Method-2 */
int count1s_(int num)
{
    int tempCount = 0;
    std::string strNum = std::bitset<16>(num).to_string(); // string conversion
    cout << "strNum=" << strNum << endl;
    for (int i = 0; i < strNum.size(); i++)
    {
        if ('1' == strNum[i])
        {
            tempCount++;
        }
    }
    return tempCount;
}
/* Method-3 (algorithmically - boost string split could be used) */
1) split the binary string over '1'.
2) count = vector (containing splits) size - 1
Usage::
int count = 0;
count = count1s(0b00110011);
cout << "count(0b00110011) = " << count << endl; //4
count = count1s(0b01110110);
cout << "count(0b01110110) = " << count << endl; //5
count = count1s(0b00000000);
cout << "count(0b00000000) = " << count << endl; //0
count = count1s(0b11111111);
cout << "count(0b11111111) = " << count << endl; //8
count = count1s_(0b1100);
cout << "count(0b1100) = " << count << endl; //2
count = count1s_(0b11111111);
cout << "count(0b11111111) = " << count << endl; //8
count = count1s_(0b0);
cout << "count(0b0) = " << count << endl; //0
count = count1s_(0b1);
cout << "count(0b1) = " << count << endl; //1
A Python one-liner
def countOnes(num):
    return bin(num).count('1')

Optimising string generate and test

I am trying to run a simulation to test the average Levenshtein distance between random
binary strings.
To speed it up I am using this C extension.
My code is as follows.
from Levenshtein import distance
import random  # needed for random.choice below

for i in xrange(20):
    sum = 0
    for j in xrange(1000):
        str1 = ''.join([random.choice("01") for x in xrange(2**i)])
        str2 = ''.join([random.choice("01") for x in xrange(2**i)])
        sum += distance(str1, str2)
    print sum/(1000*2**i)
I think the slowest part is now the string generation. Can that be sped up somehow or is there some other speed up I could try?
I also have 8 cores but I don't know how hard it would be take advantage of those.
Unfortunately I can't use pypy because of the C extension.
The following solution should be way better in terms of runtime.
It generates a number with 2**i random bits (random.getrandbits), converts it to a string of the number's binary representation (bin), takes everything from the third character to the end (because the result of bin is prefixed with '0b'), and pads the resulting string with leading zeros to get the length you want.
str1 = bin(random.getrandbits(2**i))[2:].zfill(2**i)
Quick timing for your maximum string length of 2**20:
>>> from timeit import Timer
>>> t=Timer("''.join(random.choice('01') for x in xrange(2**20))", "import random")
>>> sorted(t.repeat(10,1))
[0.7849910731831642, 0.787418033587528, 0.7894113893237318, 0.789840397476155, 0.7907980049587877, 0.7908638883536696, 0.7911707057912736, 0.7935838766477445, 0.8014726470912592, 0.8228315074311467]
>>> t=Timer("bin(random.getrandbits(2**20))[2:].zfill(2**20)", "import random")
>>> sorted(t.repeat(10,1))
[0.005115922216191393, 0.005215130351643893, 0.005234282501078269, 0.005451850921190271, 0.005531523863737675, 0.005627284612046424, 0.005746794025981217, 0.006217553864416914, 0.014556016781853032, 0.014710766150983545]
That's a speedup of a factor of 150 on average.
You can create a Python string using the Python/C API, which will be significantly faster than any method that exclusively uses Python, since Python itself is implemented in Python/C. Performance will likely primarily depend on the efficiency of the random number generator. If you are on a system with a reasonable random(3) implementation, such as the one in glibc, an efficient implementation of random string would look like this:
#include <Python.h>

/* gcc -shared -fpic -O2 -I/usr/include/python2.7 -lpython2.7 rnds.c -o rnds.so */

static PyObject *rnd_string(PyObject *ignore, PyObject *args)
{
    const char choices[] = {'0', '1'};
    PyObject *s;
    char *p, *end;
    int size;

    if (!PyArg_ParseTuple(args, "i", &size))
        return NULL;

    // start with a two-char string to avoid the empty string singleton.
    if (!(s = PyString_FromString("xx")))
        return NULL;
    _PyString_Resize(&s, size);
    if (!s)
        return NULL;

    p = PyString_AS_STRING(s);
    end = p + size;
    for (;;) {
        unsigned long rnd = random();
        int i = 31; // random() provides 31 bits of randomness
        while (i-- > 0 && p < end) {
            *p++ = choices[rnd & 1];
            rnd >>= 1;
        }
        if (p == end)
            break;
    }
    return s;
}

static PyMethodDef rnds_methods[] = {
    {"rnd_string", rnd_string, METH_VARARGS},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initrnds(void)
{
    Py_InitModule("rnds", rnds_methods);
}
Testing this code with halex's benchmark shows that it is 280x faster than the original code, and 2.3x faster than halex's code (on my machine):
# the above code
>>> t1 = Timer("rnds.rnd_string(2**20)", "import rnds")
>>> sorted(t1.repeat(10,1))
[0.0029861927032470703, 0.0029909610748291016, ...]
# original generator
>>> t2 = Timer("''.join(random.choice('01') for x in xrange(2**20))", "import random")
>>> sorted(t2.repeat(10,1))
[0.8376679420471191, 0.840252161026001, ...]
# halex's generator
>>> t3 = Timer("bin(random.getrandbits(2**20-1))[2:].zfill(2**20-1)", "import random")
>>> sorted(t3.repeat(10,1))
[0.007007122039794922, 0.007027149200439453, ...]
Adding C code to a project is a complication, but for a 280x speedup of a critical operation, it might well be worth it.
For further efficiency improvement, look into faster RNGs and invoke them from separate threads so that random number generation is parallelized. The latter would benefit from a lock-free synchronization mechanism to make sure that inter-thread communication doesn't bog down the otherwise fast generation process.
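Since the asker mentions having 8 cores, here is a rough Python 3 sketch (mine, not part of the answer) of farming the trials out with multiprocessing; it assumes the Levenshtein extension is importable in every worker process:
import random
from multiprocessing import Pool

from Levenshtein import distance

def one_trial(i):
    n = 2 ** i
    s1 = bin(random.getrandbits(n))[2:].zfill(n)
    s2 = bin(random.getrandbits(n))[2:].zfill(n)
    return distance(s1, s2)

if __name__ == '__main__':
    # Reseed each worker from OS entropy so forked processes don't share RNG state.
    with Pool(8, initializer=random.seed) as pool:
        for i in range(20):
            total = sum(pool.map(one_trial, [i] * 1000))
            print(total / (1000 * 2 ** i))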

rijndael: Encrypt/decrypt the same data in C and Python

I'm trying to figure out how to mirror encryption/decryption from an existing C function over to python. However, in my tests of encrypting with C and decrypting with python, I can't figure out some elements around the key.
These were all code samples found online, so I commented out things like the base64 call in Python, and at this point I'm unsure about:
1) Whether I correctly mapped KEYBITS to the KEY_SIZE/BLOCK_SIZE settings.
2) How to get from password to key in Python to match the C code.
3) Whether I am missing any core conversion steps.
rijndael.h in C:
#define KEYLENGTH(keybits) ((keybits)/8)
#define RKLENGTH(keybits) ((keybits)/8+28)
#define NROUNDS(keybits) ((keybits)/32+6)
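For reference, with KEYBITS = 256 these expand to KEYLENGTH = 32 bytes (matching KEY_SIZE in the Python code below), RKLENGTH = 60, and NROUNDS = 14.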
Encrypting in C:
#define KEYBITS 256

unsigned long rk[RKLENGTH(KEYBITS)];
unsigned char key[KEYLENGTH(KEYBITS)];
char *password = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";

for (i = 0; i < sizeof(key); i++)
    key[i] = *password != 0 ? *password++ : 0;

nrounds = rijndaelSetupEncrypt(rk, key, 256);

count = 0;
while (count < strlen(input)) {
    unsigned char ciphertext[16];
    unsigned char plaintext[16];
    for (i = 0; i < sizeof(plaintext); i++) {
        if (count < strlen(input))
            plaintext[i] = input[count++];
        else
            plaintext[i] = 0;
    }
    rijndaelEncrypt(rk, nrounds, plaintext, ciphertext);
    if (fwrite(ciphertext, sizeof(ciphertext), 1, output) != 1) {
        fclose(file);
        fputs("File write error", stderr);
        return 0;
    }
}
Decrypt in Python
KEY_SIZE = 32
BLOCK_SIZE = 16

def decrypt(password, filename):
    #
    # I KNOW THIS IS WRONG, BUT HOW DO I CONVERT THE PASSWD TO KEY?
    #
    key = password
    padded_key = key.ljust(KEY_SIZE, '\0')

    #ciphertext = base64.b64decode(encoded)
    ciphertext = file_get_contents(filename)

    r = rijndael(padded_key, BLOCK_SIZE)

    padded_text = ''
    for start in range(0, len(ciphertext), BLOCK_SIZE):
        padded_text += r.decrypt(ciphertext[start:start+BLOCK_SIZE])

    plaintext = padded_text.split('\x00', 1)[0]
    return plaintext
Thanks!
The example C code just copies up to 32 bytes from the password string into the key. If the password is shorter than 32 bytes, it pads the key on the right with zeroes.
Translated into python, this would be:
key = password[:32]+b'\x00'*(32-len(password))
Which actually produces the same result as
password.ljust(32, '\0')
You should note however that this method of generating keys is considered extremely unsafe. If the attacker suspects that the key consists of ASCII characters padded with zero bytes, the keyspace (the number of possible keys) is reduced considerably. If the key is made of random bytes, there are 256^32 = 1.15e77 keys. If the key e.g. begins with 8 ASCII characters followed by zeroes, there are only (127-32)^8 = 6.63e15 possible keys. And since there are dictionaries of often-used passwords, the attacker probably wouldn't even have to exhaust this reduced keyspace; he would try the relatively small dictionaries first.
Consider using a cryptographic hash function or another proper key derivation function to convert the passphrase into a key.
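For example, a minimal sketch of such a derivation using hashlib's PBKDF2 (the parameter choices are illustrative, not from the answer; password is the passphrase string from the question):
import hashlib, os

salt = os.urandom(16)  # store the salt alongside the ciphertext so decryption can re-derive the key
key = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100000, dklen=32)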
Try using the pycrypto toolkit. It implements Rijndael/AES and other ciphers.
