How to rebuild the libmem_crc32_direct CRC function in python?

I would like to rebuild the libmem_crc32_direct function in Python.
I have used the crcmod Python package before, so I would like to set up the CRC generator with it.
The C code looks like this:
uint32_t crc_process_chunk(uint8_t* data, uint32_t len) {
    return ~libmem_crc32_direct(data, len, 0xFFFFFFFF);
}
My Python code so far:
import crcmod

def bit_not(n, numbits=8):
    return (1 << numbits) - 1 - n

def getCRC(imageBA):
    crcGen = crcmod.mkCrcFun(0x104C11DB7, initCrc=0xFFFFFFFF)
    val = crcGen(imageBA)
    val = bit_not(val, 32)
    return val
The value returned by the Python code does not equal the one from the C code, so I guess I made an error somewhere.
Any ideas?

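Doesn't (1 << numbits) == 0? If this is two's complement math it should still work, since bit_not could then be written as return 0 - 1 - n. However, this isn't needed, since crcmod has an optional xorOut parameter. I think that because the optional rev parameter (reversed, i.e. reflected, input and output) defaults to True, it needs to be set to False. I think the call to create the CRC generator should be:
crcGen = crcmod.mkCrcFun(0x104C11DB7, initCrc=0xFFFFFFFF, rev=False, xorOut=0xFFFFFFFF)
One caveat: crcmod's documentation describes initCrc as the initial register value XORed with the final XOR value, so with xorOut=0xFFFFFFFF an initCrc of 0 may be what is needed to start the register at 0xFFFFFFFF. It is worth checking the result against a known checksum.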

A bit tricky because of 64-bit arithmetic on the PC vs. 32-bit arithmetic on the ARM STM32F4, but this solution finally works:
import crcmod

def libmem_crc32_direct_with_xor(im, startAddr, l):
    fw = im[startAddr:startAddr+l]
    crcGen = crcmod.Crc(0x104C11DB7, initCrc=0xFFFFFFFF, rev=False)
    crcGen.update(fw)
    return (~crcGen.crcValue) & 0xFFFFFFFF  # bitwise NOT, masked to 32 bits
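
To check the crcmod configuration without the library, the same parameters (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection, final inversion) can be written out bit by bit. This is only a sketch, assuming that libmem_crc32_direct is a plain MSB-first CRC-32 as the working crcmod call above suggests; the name crc32_msb_first is made up for illustration.

```python
def crc32_msb_first(data, init=0xFFFFFFFF, xor_out=0xFFFFFFFF):
    """Bitwise, non-reflected CRC-32 with polynomial 0x04C11DB7 (illustrative sketch)."""
    crc = init
    for byte in data:
        crc ^= byte << 24                  # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x80000000:           # top bit set: shift, then XOR in the polynomial
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ xor_out                   # final inversion, like the ~ in the C wrapper


print(hex(crc32_msb_first(b"123456789")))
```

Passing xor_out=0 gives the value before the final inversion, which can help when comparing against the raw return value of the C function.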

Related

What is a python "cksum" equivalent for very large files and how does it work?

I need to validate huge compressed files after download (usually more than 10-20 GB per file) against reference checksums that have apparently been generated using cksum.
(To be more precise: my Python script needs to download large compressed files from the NCBI FTP server, which was supposed to provide MD5 checksums for validating the downloads but instead only provided some different, unspecified file hash/checksum values. After some trial and error I found that these checksums were identical to the output of the Unix tool cksum, which apparently generates CRC checksums. So to compare/validate these I need to generate cksum-equivalent checksums for the downloaded files.)
It appears that the Unix tool cksum yields totally different checksum values than the supposedly equivalent Unix tool crc32 (or the Python zlib.crc32() function, for that matter). When googling the problem I could not understand the explanations for why this occurs, especially since they appear to be identical on some systems. So maybe this is because I work on a 64-bit system (but then: who doesn't nowadays)?
Using built-in Python modules I can easily generate MD5 and CRC32 checksums, but none of these are equivalent to the cksum output, neither in decimal nor in hexadecimal representation.
I did find a previous post here on Stack Overflow pointing to a snippet that seems to solve this. But while it works for small files, A) I do not understand a word of it, so I have a hard time adapting it, and B) it does not seem to work well with large files.
For completeness' sake, here is the snippet (Python 3 version):
#!/usr/bin/env python
import sys
crctab = [ 0x00000000, 0x04c11db7, 0x09823b6e, 0x0d4326d9, 0x130476dc,
0x17c56b6b, 0x1a864db2, 0x1e475005, 0x2608edb8, 0x22c9f00f,
0x2f8ad6d6, 0x2b4bcb61, 0x350c9b64, 0x31cd86d3, 0x3c8ea00a,
0x384fbdbd, 0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9,
0x5f15adac, 0x5bd4b01b, 0x569796c2, 0x52568b75, 0x6a1936c8,
0x6ed82b7f, 0x639b0da6, 0x675a1011, 0x791d4014, 0x7ddc5da3,
0x709f7b7a, 0x745e66cd, 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e,
0x95609039, 0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5,
0xbe2b5b58, 0xbaea46ef, 0xb7a96036, 0xb3687d81, 0xad2f2d84,
0xa9ee3033, 0xa4ad16ea, 0xa06c0b5d, 0xd4326d90, 0xd0f37027,
0xddb056fe, 0xd9714b49, 0xc7361b4c, 0xc3f706fb, 0xceb42022,
0xca753d95, 0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1,
0xe13ef6f4, 0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d, 0x34867077,
0x30476dc0, 0x3d044b19, 0x39c556ae, 0x278206ab, 0x23431b1c,
0x2e003dc5, 0x2ac12072, 0x128e9dcf, 0x164f8078, 0x1b0ca6a1,
0x1fcdbb16, 0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca,
0x7897ab07, 0x7c56b6b0, 0x71159069, 0x75d48dde, 0x6b93dddb,
0x6f52c06c, 0x6211e6b5, 0x66d0fb02, 0x5e9f46bf, 0x5a5e5b08,
0x571d7dd1, 0x53dc6066, 0x4d9b3063, 0x495a2dd4, 0x44190b0d,
0x40d816ba, 0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e,
0xbfa1b04b, 0xbb60adfc, 0xb6238b25, 0xb2e29692, 0x8aad2b2f,
0x8e6c3698, 0x832f1041, 0x87ee0df6, 0x99a95df3, 0x9d684044,
0x902b669d, 0x94ea7b2a, 0xe0b41de7, 0xe4750050, 0xe9362689,
0xedf73b3e, 0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2,
0xc6bcf05f, 0xc27dede8, 0xcf3ecb31, 0xcbffd686, 0xd5b88683,
0xd1799b34, 0xdc3abded, 0xd8fba05a, 0x690ce0ee, 0x6dcdfd59,
0x608edb80, 0x644fc637, 0x7a089632, 0x7ec98b85, 0x738aad5c,
0x774bb0eb, 0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f,
0x5c007b8a, 0x58c1663d, 0x558240e4, 0x51435d53, 0x251d3b9e,
0x21dc2629, 0x2c9f00f0, 0x285e1d47, 0x36194d42, 0x32d850f5,
0x3f9b762c, 0x3b5a6b9b, 0x0315d626, 0x07d4cb91, 0x0a97ed48,
0x0e56f0ff, 0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623,
0xf12f560e, 0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7, 0xe22b20d2,
0xe6ea3d65, 0xeba91bbc, 0xef68060b, 0xd727bbb6, 0xd3e6a601,
0xdea580d8, 0xda649d6f, 0xc423cd6a, 0xc0e2d0dd, 0xcda1f604,
0xc960ebb3, 0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7,
0xae3afba2, 0xaafbe615, 0xa7b8c0cc, 0xa379dd7b, 0x9b3660c6,
0x9ff77d71, 0x92b45ba8, 0x9675461f, 0x8832161a, 0x8cf30bad,
0x81b02d74, 0x857130c3, 0x5d8a9099, 0x594b8d2e, 0x5408abf7,
0x50c9b640, 0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c,
0x7b827d21, 0x7f436096, 0x7200464f, 0x76c15bf8, 0x68860bfd,
0x6c47164a, 0x61043093, 0x65c52d24, 0x119b4be9, 0x155a565e,
0x18197087, 0x1cd86d30, 0x029f3d35, 0x065e2082, 0x0b1d065b,
0x0fdc1bec, 0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088,
0x2497d08d, 0x2056cd3a, 0x2d15ebe3, 0x29d4f654, 0xc5a92679,
0xc1683bce, 0xcc2b1d17, 0xc8ea00a0, 0xd6ad50a5, 0xd26c4d12,
0xdf2f6bcb, 0xdbee767c, 0xe3a1cbc1, 0xe760d676, 0xea23f0af,
0xeee2ed18, 0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4,
0x89b8fd09, 0x8d79e0be, 0x803ac667, 0x84fbdbd0, 0x9abc8bd5,
0x9e7d9662, 0x933eb0bb, 0x97ffad0c, 0xafb010b1, 0xab710d06,
0xa6322bdf, 0xa2f33668, 0xbcb4666d, 0xb8757bda, 0xb5365d03,
0xb1f740b4 ]
UNSIGNED = lambda n: n & 0xffffffff

def memcrc(b):
    n = len(b)
    i = c = s = 0
    for c in b:
        tabidx = (s >> 24) ^ c
        s = UNSIGNED((s << 8)) ^ crctab[tabidx]
    while n:
        c = n & 0o0377
        n = n >> 8
        s = UNSIGNED(s << 8) ^ crctab[(s >> 24) ^ c]
    return UNSIGNED(~s)

if __name__ == '__main__':
    fname = sys.argv[-1]
    buffer = open(fname, 'rb').read()
    print("%d\t%d\t%s" % (memcrc(buffer), len(buffer), fname))
Could someone please help me understand this?
What exactly is the problem with the difference between cksum and crc32?
Is it simply the fact that one is 32-bit based and the other 64-bit?
Can I simply convert between the values produced by both, and if yes, how?
What is the purpose of the crctab in the above snippet, and how does the conversion work there?

I don't know the why part of your question. All I can say is that the great thing about standards is that you have so many to choose from.
cksum is specified by POSIX to use a different CRC than the more common CRC-32 you find in zlib, Python, used in zip and gzip files, etc. The CRC-32/CKSUM has this specification (from Greg Cook's CRC catalog):
width=32 poly=0x04c11db7 init=0x00000000 refin=false refout=false xorout=0xffffffff check=0x765e7680 residue=0xc704dd7b name="CRC-32/CKSUM"
The more common CRC-32 has this specification:
width=32 poly=0x04c11db7 init=0xffffffff refin=true refout=true xorout=0xffffffff check=0xcbf43926 residue=0xdebb20e3 name="CRC-32/ISO-HDLC"
The cksum utility on my system (macOS) computes the CRC-32/CKSUM, but it also has options to compute the CRC-32/ISO-HDLC, as well as two other actual checksums, the first from the BSD Unix sum command, and the second from the AT&T System V Unix sum command.
There is apparently no shortage of results that cksum might produce.
No, it has nothing to do with 32 vs. 64 bit systems.
No, you cannot convert between the values.
The purpose of the table is to speed up the CRC calculation by precomputing the CRC of every byte value.
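
To make the role of the table concrete, here is a small sketch that generates the 256 entries from the polynomial and then computes a cksum-style value, including the appended length. The function names are illustrative only; the parameters are taken from the CRC-32/CKSUM specification quoted above.

```python
import zlib

POLY = 0x04C11DB7

def make_table():
    """Precompute the CRC of every possible byte value (MSB-first, no reflection)."""
    table = []
    for byte in range(256):
        crc = byte << 24
        for _ in range(8):
            crc = ((crc << 1) ^ POLY) if crc & 0x80000000 else (crc << 1)
            crc &= 0xFFFFFFFF
        table.append(crc)
    return table

TABLE = make_table()

def raw_crc(data, crc=0):
    """Table-driven update: one lookup replaces eight shift/XOR steps per byte."""
    for byte in data:
        crc = TABLE[(crc >> 24) ^ byte] ^ ((crc << 8) & 0xFFFFFFFF)
    return crc

def cksum(data):
    """POSIX cksum: CRC over the data, then over the length (low byte first), inverted."""
    crc = raw_crc(data)
    n = len(data)
    while n:
        crc = raw_crc(bytes([n & 0xFF]), crc)
        n >>= 8
    return crc ^ 0xFFFFFFFF

print(hex(TABLE[1]), hex(TABLE[255]))   # compare with the first and last crctab entries
print(cksum(b"hello world"))            # cksum-style value
print(zlib.crc32(b"hello world"))       # CRC-32/ISO-HDLC: a different algorithm and value
```

Because the length is folded in only at the end, raw_crc can be fed a large file chunk by chunk by passing the previous result back in as the crc argument.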

I've tried refactoring your code into a class following a similar api to the standard Python hashlib:
class crc32:
    def __init__(self):
        self.nchars = 0
        self.crc = 0

    def update(self, buf):
        crc = self.crc
        for c in buf:
            crc = crctab[(crc >> 24) ^ c] ^ ((crc << 8) & 0xffffffff)
        self.crc = crc
        self.nchars += len(buf)

    def digest(self):
        crc = self.crc
        n = self.nchars
        while n:
            c = n & 0xff
            crc = crctab[(crc >> 24) ^ c] ^ ((crc << 8) & 0xffffffff)
            n >>= 8
        return UNSIGNED(~crc)
I've expanded out UNSIGNED to try to make it faster and reordered some of the statements to be more similar to the standard zlib library (as used by Python) while I was trying to understand the differences. zlib's table is generated differently (going by the specifications quoted above, it uses the same polynomial in bit-reflected form with a different initial value), but otherwise the structure is the same.
The above code can be used as:
with open('largefile', 'rb') as fd:
    digest = crc32()
    while buf := fd.read(4096):
        digest.update(buf)
    print(digest.digest())
which prints out the expected 1135714720 for a file created by:
echo -n hello world > test.txt
The above code should work for large files, but given the performance of Python this would take far too long to be useful. A 75 MB file I have takes ~11 seconds, while cksum takes just ~0.2 seconds. You should be able to get somewhere by using Cython to speed it up, but that's a bit more fiddly, and if you're struggling with the existing code it's going to be quite a learning curve!
I've had another play and got performance similar to cksum with Cython; the code looks like:
cdef unsigned int *crctab_c = [
    # copy/paste crctab from above
]

cdef class crc32_c:
    cdef unsigned int crc
    # 64-bit counter so the length does not wrap for files larger than 4 GiB
    cdef unsigned long long nchars

    def __init__(self):
        self.nchars = 0
        self.crc = 0

    cdef _update(self, bytes buf):
        cdef unsigned int crc, i, j
        cdef unsigned char c
        crc = self.crc
        for c in buf:
            i = (crc >> 24) ^ c
            j = crc << 8
            crc = crctab_c[i] ^ j
        self.crc = crc
        self.nchars += len(buf)

    def update(self, buf):
        return self._update(buf)

    def digest(self):
        crc = self.crc
        n = self.nchars
        while n:
            c = n & 0xff
            crc = crctab_c[(crc >> 24) ^ c] ^ ((crc << 8) & 0xffffffff)
            n >>= 8
        return (~crc) & 0xffffffff
After compiling this code with Cython, it can be used in a similar manner to the previous class. Performance is pretty good: Python now takes ~200 ms for a 75 MiB file, which is basically the same as cksum but still much slower than zlib, which only takes ~80 ms.

Int to Float conversion Python to C++

I have a function written in Python that works perfectly for what I need (it wasn't written by me).
I need to convert it to C++ so that it produces the same result. I know that it saves that float into a 16-bit texture, so I am guessing it converts a 32-bit int into a 16-bit float. All I need to do is make it work in C++. Here is the Python function:
from ctypes import POINTER, c_float, c_int, cast, pointer

def packTextureBits(index):
    index = int(index)
    index = index + 1024
    sigh = index & 0x8000
    sigh = sigh << 16
    exptest = index & 0x7fff
    if exptest == 0:
        exp = 0
    else:
        exp = index >> 10
        exp = exp & 0x1f
        exp = exp - 15
        exp = exp + 127
        exp = exp << 23
    mant = index & 0x3ff
    mant = mant << 13
    index = sigh | exp | mant
    # reinterpret the 32-bit integer as a float
    cp = pointer(c_int(index))
    fp = cast(cp, POINTER(c_float))
    return fp.contents.value
This was my approach in C++, but it returns completely screwed-up values:
#include <cstring>

float PackIntToFloat(int value)
{
    value += 1024;
    int sign = (value & 0x8000) << 16;
    int exp = value & 0x7fff;
    if (exp != 0)
    {
        exp = value >> 10;
        exp = exp & 0x1f;
        exp = exp - 15 + 127;
        exp = exp << 23;
    }
    int mant = (value & 0x3fff) << 13;
    value = sign | exp | mant;
    // Also tried return (float)value; but returns other weird values.
    // Copy the bit pattern into a float without a heap allocation or pointer cast.
    float result;
    std::memcpy(&result, &value, sizeof result);
    return result;
}

So I owe you guys an apology. I was being stupid and did not do enough tests before posting here. My C++ solution is 100% working. I tested the separate colors of the texture and, as it turned out, I assigned values to the texture the wrong way. I tried pushing floats into the texture, but it was a 16-bit texture. I needed to convert these floats into half-precision floats after the conversion above, and then it started working. The texture flag called PF_FloatRGBA led me to believe that floats were the right thing to assign there, and they weren't.
I still need to learn a lot. Thanks for all your help!
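
The bit manipulation in packTextureBits is the usual widening of a half-precision bit pattern to single precision. The sketch below restates it with the struct module instead of ctypes and compares it with struct's own half-precision format 'e' (available since Python 3.6). The names pack_texture_bits and half_bits_to_float are chosen for this illustration, and it assumes non-negative indices whose shifted value stays in the normal half-precision range; subnormals, infinities and NaNs are not covered.

```python
import struct

def pack_texture_bits(index):
    """Widen the half-float bit pattern (index + 1024) to a 32-bit float."""
    bits16 = int(index) + 1024
    sign = (bits16 & 0x8000) << 16
    if bits16 & 0x7FFF == 0:
        exp = 0
    else:
        # re-bias the 5-bit exponent (bias 15) to the 8-bit exponent (bias 127)
        exp = (((bits16 >> 10) & 0x1F) - 15 + 127) << 23
    mant = (bits16 & 0x3FF) << 13
    bits32 = sign | exp | mant
    # reinterpret the 32-bit integer as an IEEE 754 single-precision float
    return struct.unpack("<f", struct.pack("<I", bits32))[0]

def half_bits_to_float(bits16):
    """Reference: let struct decode the same 16 bits as a half-precision float."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]

for index in (0, 1, 1000, 20000):
    print(index, pack_texture_bits(index), half_bits_to_float(index + 1024))
```

Going the other way, struct.pack("<e", value) produces the two bytes of a half-precision float, which is the kind of value a 16-bit float texture channel expects.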

Perl Crypt-Eksblowfish Cypher encrypted string and must be decrypted in python

A Perl script uses this module to encrypt strings:
http://search.cpan.org/~zefram/Crypt-Eksblowfish-0.009/lib/Crypt/Eksblowfish.pm
I need to write the decrypt function in Python. I know the key and the salt.
I tried to use py-bcrypt, but it seems that the two equivalent functions
$ciphertext = $cipher->encrypt($plaintext);
$plaintext = $cipher->decrypt($ciphertext);
are not implemented.
What can I do? Is there a Python module anywhere that can help me decrypt my strings?

Update: The complete answer is that the Perl code:
my $cipher = Crypt::Eksblowfish->new($cost, $salt, $key);
is equivalent to this Python code:
bf = Eksblowfish()
bf.expandkey(salt, key)
for i in xrange(1 << cost):  # 2**cost rounds
    bf.expandkey(0, key)
    bf.expandkey(0, salt)
See this repo for example code: https://github.com/erantapaa/python-bcrypt-tests
Original answer:
A partial answer...
I'm assuming you are calling this Perl code like this:
use Crypt::Eksblowfish;
my $cipher = Crypt::Eksblowfish->new($cost, $salt, $key);
$encoded = $cipher->encrypt("some plaintext");
The new method is implemented by the C function setup_eksblowfish_ks() in lib/Crypt/Eksblowfish.xs. This looks like it is the same as the expandkey method in the Python code.
The main difference is the $cost parameter which is not present in the Python method. In the Perl code the $cost parameter controls how many times this loop is executed after the key schedule has been set up:
for(count = 1U << cost; count--; ) {
    for(j = 0; j != 2; j++) {
        merge_key(j == 0 ? expanded_key : expanded_salt, ks);
        munge_subkeys(ks);
    }
}
The Perl ->encrypt() method enciphers a 64-bit word. The equivalent Python code is:
bf.cipher(xl, xr, bf.ENCRYPT)
where xl and xr are integers representing the left 32-bits and right 32-bits respectively.
So the recipe should go something like this:
Create the Python object: bf = EksBlowfish()
Initialize the key schedule: bf.expandkey(salt, key)
Further munge the key schedule using the cost parameter (TBD)
Encrypt with bf.cipher(xl, xr, bf.ENCRYPT)
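
Since bf.cipher works on two 32-bit integers, each 8-byte block has to be split before the call and rejoined afterwards. A minimal sketch with the struct module, assuming big-endian word order (the usual Blowfish convention, but worth verifying against the Perl output); split_block and join_block are hypothetical helper names, not part of any library:

```python
import struct

def split_block(block):
    """Split an 8-byte block into (xl, xr), two unsigned 32-bit integers."""
    if len(block) != 8:
        raise ValueError("expected an 8-byte block")
    return struct.unpack(">II", block)

def join_block(xl, xr):
    """Pack two unsigned 32-bit integers back into an 8-byte block."""
    return struct.pack(">II", xl, xr)

xl, xr = split_block(b"\x00\x00\x00\x01\x00\x00\x00\x02")
print(xl, xr)
print(join_block(xl, xr))
```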

C to Python code conversion(print address-like values)

I am trying to convert the following code from C to Python. The C code looks like:
seed = (time(0) ^ (getpid() << 16));
printf("0x%08x \n", seed);
which outputs values like 0x7d24defb.
And the Python code:
time1 = int(time.time())
seed = (time1 ^ (os.getpid() << 16))
which outputs values like 1492460964.
What do I need to modify in the Python code so that I get address-like values?

It depends on the way the value is displayed. The %x conversion specifier in the printf family of functions displays the given value in hexadecimal. In Python you can use the hex function to convert the value to a hexadecimal representation.

The equivalent Python code for printf("0x%08x \n", seed); is:
>>> '0x{:08x}'.format(1492460964)
'0x58f525a4'
Note that hex() alone won't pad with zeros to width 8 like the C code does.
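
One more difference is worth guarding against: a C unsigned int wraps at 32 bits, while Python integers grow without limit, so a large process ID shifted left by 16 can produce more than eight hex digits. A small sketch, assuming seed is meant to be a 32-bit unsigned value; format_seed is an illustrative name:

```python
import os
import time

def format_seed(now, pid):
    """Mimic the C expression and the 0x%08x format with 32-bit wraparound."""
    seed = (now ^ (pid << 16)) & 0xFFFFFFFF   # keep only the low 32 bits
    return "0x{:08x}".format(seed)

print(format_seed(int(time.time()), os.getpid()))
```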

I suppose this is what you want:
>>> n = hex(int(time.time()) ^ (os.getpid() << 16))
>>> print(n)
0x431c2fd2

Read 32-bit signed value from an "unsigned" bytestream

I want to extract data from a file whose information is stored big-endian and always unsigned. How does the "cast" from unsigned int to int affect the actual decimal value? Am I correct that the leftmost bit decides whether the value is positive or negative?
I want to parse that file format with Python, and reading an unsigned value is easy:
def toU32(bits):
    return ord(bits[0]) << 24 | ord(bits[1]) << 16 | ord(bits[2]) << 8 | ord(bits[3])
But what would the corresponding toS32 function look like?
Thanks for the info about the struct module, but I am still interested in a solution to my actual question.

I would use struct.
import struct
def toU32(bits):
    return struct.unpack_from(">I", bits)[0]

def toS32(bits):
    return struct.unpack_from(">i", bits)[0]
The format string ">I" means: read a big-endian (">") unsigned integer ("I") from the string bits. For signed integers you can use ">i".
EDIT
I had to look at another Stack Overflow answer to remember how to "convert" an unsigned integer to a signed one in Python, though it is less a conversion and more a reinterpretation of the bits.
import struct

def toU32(bits):
    return ord(bits[0]) << 24 | ord(bits[1]) << 16 | ord(bits[2]) << 8 | ord(bits[3])

def toS32(bits):
    candidate = toU32(bits)
    if (candidate >> 31):  # is the sign bit set?
        return (-0x80000000 + (candidate & 0x7fffffff))  # "cast" it to signed
    return candidate

for x in range(-5, 5):
    bits = struct.pack(">i", x)
    print toU32(bits)
    print toS32(bits)

I would use the struct module's pack and unpack methods.
See Endianness of integers in Python for some examples.

The non-conditional version of toS32(bits) could be something like:
def toS32(bits):
    decoded = toU32(bits)
    return -(decoded & 0x80000000) + (decoded & 0x7fffffff)
You can pre-compute the mask for any other bit size too of course.
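
On Python 3, where indexing bytes already yields integers and ord() is no longer needed, the same idea can be sketched with int.from_bytes. The names to_u32 and to_s32 are illustrative, and the round trip through struct is only there to check the result:

```python
import struct

def to_u32(bits):
    """Read the first four bytes as a big-endian unsigned integer."""
    return int.from_bytes(bits[:4], "big")

def to_s32(bits):
    """Reinterpret the same 32 bits as a two's-complement signed integer."""
    value = to_u32(bits)
    # the sign bit carries weight -2**31; the lower 31 bits keep their weight
    return (value & 0x7FFFFFFF) - (value & 0x80000000)

for x in (-5, -1, 0, 4):
    bits = struct.pack(">i", x)
    print(x, to_u32(bits), to_s32(bits))
```

int.from_bytes(bits[:4], "big", signed=True) gives the signed value directly; the explicit mask version is shown because it generalizes to other bit widths.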
