python2.7 memory limit release

I want a one-gigabyte data string, built with this code:
length = 0x20000000
payload = ''.join(random.choice(string.printable) for _ in range(length))
but Python raises a MemoryError. The full error message is:
payload = ''.join(random.choice(string.printable) for _ in range(length))
MemoryError
I found a similar case on Stack Overflow which said that "import sys; sys.setrecursionlimit(10**6)" would fix it, so I added that code, but it did not solve the problem.
I also can't "import resource", because I can't install it.

I think the best option for you is to use a bytearray:
import random
import string

barray = bytearray()
length = 0x20000000
for _ in range(length):
    barray.append(random.choice(string.printable))
This only consumed about 0.5 gigs on my machine.
Note that increasing the recursion limit won't help you here; you aren't using recursion at all. You are just making something that is very large.
Just the array of pointers underlying the list that gets created by ''.join will require about 0x20000000 * 8 * 1e-9 == 4.294967296 gigabytes, and that doesn't count the strings in the list themselves, each of which requires a full Python object, another 40 or so bytes per object. So you were simply running out of memory. Taking into account your individual string objects:
>>> 0x20000000 * 48 * 1e-9
25.769803776000003
So you would need over 20 gigs! Doable on some modern laptops, but 8 gigs sure isn't enough.
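As a rough sanity check of that per-object overhead, sys.getsizeof shows it directly (a sketch; exact sizes vary by Python version and platform):
import sys
# On a 64-bit CPython 2.7 these print roughly 37 and 38:
# almost all of a one-character string's cost is fixed per-object overhead.
print(sys.getsizeof(''))
print(sys.getsizeof('a'))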

Related

How do I pack bits from one byte array to another efficiently in python3?

I have a fairly large byte array in python. In the simplest situation the byte array only contains 0 or 1 values (0x00, 0x01), and the array is always a multiple of 8 in length. How can I pack these "bits" into another byte array (it doesn't need to be mutable) so that source index zero goes to the MSB of the first output byte, and so on?
For example if src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
Desired output would be b'\x89\xe2\xff'.
I could do it with a for loop and bit shifting and or-ing and concatenation, but there surely is a faster/better built-in way to do this.
In a follow up question, I also might want to have the source byte array contain values from the set 0-3 and pack these 4 at a time into the output array. Is there a way of doing that?
In general is there a way of interpreting elements of a list as true or false and packing them 8 at a time into a byte array?
As ridiculous as it may sound, the fastest solution using builtins may be to build a string and pass it to int, much as the fastest way to count 1-bits in an int is bin(n).count('1'). And it's dead simple, too:
def unbitify_byte(src):
s = ''.join(map(str, src))
n = int(s, 2)
return n.to_bytes(len(src)//8, 'big')
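A quick check against the example from the question:
>>> src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
>>> unbitify_byte(src)
b'\x89\xe2\xff'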
Equivalent (but marginally more complex) code using gmpy2 instead of native Python int is a bit faster.
And you can extend it to 2-bit values pretty easily:
def unhalfnybblify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 4)
    return n.to_bytes(len(src)//4, 'big')
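For example, packing four 2-bit values (from the set 0-3) into one byte:
>>> unhalfnybblify_byte(bytearray([3, 1, 0, 2]))
b'\xd2'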
If you want something more flexible, but possibly slower, here's a simple solution using ctypes.
If you know C, you can probably see a struct of 8 single-bit bit-fields would come in handy here. And you can write the equivalent struct type in Python like this:
import ctypes

class Bits(ctypes.Structure):
    _fields_ = [(f'bit{8-i}', ctypes.c_uint, 1) for i in range(8)]
And you can construct one of them from 8 ints that are all 0 or 1:
bits = Bits(*src[:8])
And you can convert that to a single int by using an ugly cast or a simple union:
class UBits(ctypes.Union):
    _fields_ = [('bits', Bits), ('i', ctypes.c_uint8)]
i = UBits(Bits(*src[:8])).i
So now it's just a matter of chunking src into groups of 8 in big-endian order:
chunks = (src[i:i+8][::-1] for i in range(0, len(src), 8))
dst = bytearray(UBits(Bits(*chunk)).i for chunk in chunks)
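On a typical little-endian build, where ctypes allocates bit-fields starting at the least significant bit, this reproduces the expected output for the question's src (note that bit-field layout is compiler- and platform-dependent):
>>> bytes(dst)
b'\x89\xe2\xff'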
And it should be pretty obvious how to extend this to four 2-bit fields, or two 4-bit fields, or even two 3-bit fields and a 2-bit field, per byte.
However, despite looking like low-level C code, it's probably slower. Still, it might be worth testing to see if it's fast enough for your uses.
A custom C extension can probably do better. And there are a number of bit-array-type modules on PyPI to try out. But if you want to go down that road, numpy is the obvious answer. You can't get any simpler than this:
np.packbits(src)
(A bytearray works just fine as an "array-like".)
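For the question's example input:
>>> np.packbits(bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1]))
array([137, 226, 255], dtype=uint8)
(that is, bytes 0x89, 0xe2, 0xff)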
It's also hard to beat for speed.
For comparison, here are some measurements:
60ns/byte + 0.3µs: np.packbits on an array instead of a bytearray
60ns/byte + 1.9µs: np.packbits
440ns/byte + 3.2µs: for and bit-twiddling in PyPy instead of CPython
570ns/byte + 3.8µs: int(…, 2).to_bytes(…) in PyPy instead of CPython
610ns/byte + 9.1µs: bitarray
800ns/byte + 2.9µs: gmpy.mpz(…)…
1.0µs/byte + 2.8µs: int(…, 2).to_bytes(…)
2.9µs/byte + 0.2µs: (UBits(Bits(*chunk)) …)
16µs/byte + 0.9µs: for and bit-twiddling
Using numpy, with test code and comments:
#!/usr/bin/env python3
import numpy as np

def pack_bits(a):
    # big-endian - use '<u8' if you want little-endian
    #0000000A0000000B0000000C0000000D0000000E0000000F0000000G0000000H
    b = np.copy(a.view('>u8'))
    #0000000A000000AB000000BC000000CD000000DE000000EF000000FG000000GH
    b |= b >> 7
    #0000000A000000AB00000ABC0000ABCD0000BCDE0000CDEF0000DEFG0000EFGH
    b |= b >> 14
    #0000000A000000AB00000ABC0000ABCD000ABCDE00ABCDEF0ABCDEFGABCDEFGH
    b |= b >> 28
    return np.array(b, dtype='u1')

def main():
    a = []
    for i in range(256):
        # build 8-bit lists without numpy, then convert
        a.append(np.array([int(b) for b in bin(256 + i)[2+1:]], dtype='u1'))
    a = np.array(a)
    print(a)
    b = pack_bits(a)
    print(b)

if __name__ == '__main__':
    main()
Similar code exists for other deinterleaving, but since the number of bits between inputs is less than the number of bytes in a word, we can avoid the masking here (note that the 0ABCDEFG does not overlap the ABCDEFGH).

Huge memory consumption while trying to run this code

While trying to run this code:
l = 1000000
w = [1, 1]
for i in range(2, l):
    w.append(w[-1] + w[-2])
the computer hangs and a Blue Screen of Death appears. The only info I get is about MEMORY MANAGEMENT. The problem occurs in Python 2.7 and in 3.4 as well.
The code works fine for l = 100000.
Can someone explain to me exactly why? I am using Windows 10 64-bit, Python 2.7.8 64-bit from Active Python.
EDIT:
Here is R code which works well:
len <- 1000000
fibvals <- numeric(len)
fibvals[1] <- 1
fibvals[2] <- 1
for (i in 3:len) {
  fibvals[i] <- fibvals[i-1]+fibvals[i-2]
}
The numbers you're producing are huger than you might realize. For example, here's the size in memory of the last one:
>>> a, b = 1, 1
>>> for i in xrange(2, 1000000):
... a, b = b, a+b
...
>>> sys.getsizeof(b)
92592
That's 92 kilobytes for one integer. All of them put together would be somewhere in the vicinity of 46-ish gigabytes, and you only have 16 gigabytes.
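A back-of-the-envelope way to arrive at that figure, as a sketch (Fibonacci numbers grow by about log2(phi) ≈ 0.694 bits per index):
import math

n = 1000000
phi = (1 + 5 ** 0.5) / 2
# total bits of fib(1..n) is roughly log2(phi) * n * (n + 1) / 2
total_bits = math.log(phi, 2) * n * (n + 1) / 2
print(total_bits / 8 / 1e9)  # ~43 GB of digits alone, before per-object overhead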
Your R code used 64-bit floating-point numbers, which promptly overflow to infinity at around the 1476th number.
The Fibonacci numbers are HUGE. In R and other languages the numbers overflow (or flip to infinity), so not that much memory is required. But in Python, integers simply don't overflow. Storing the whole sequence up to the 1,000,000th Fibonacci number takes tens of gigabytes. Once your OS uses up all the physical RAM, it switches over to hard-disk swap, and when that runs out too, you get the MEMORY MANAGEMENT blue screen you saw.
In Python a list takes a lot of space in memory; try using a tuple instead.
Example code:
l = 1000000
w = (1, 1)
for i in xrange(2, l):
    w = w + (w[-1] + w[-2],)
Execution of the program takes time, depending on your CPU and main memory.

How to create a fixed size (unsigned) integer in python?

I want to create a fixed-size integer in Python, for example 4 bytes. Coming from a C background, I expected that all the primitive types would occupy a constant space in memory; however, when I try the following in Python:
import sys
print sys.getsizeof(1000)
print sys.getsizeof(100000000000000000000000000000000000000000000000000000000)
I get
>>>24
>>>52
respectively.
How can I create a fixed size (unsigned) integer of 4 bytes in python? I need it to be 4 bytes regardless if the binary representation uses 3 or 23 bits, since later on I will have to do byte level memory manipulation with Assembly.
You can use struct.pack with the I modifier (unsigned int). This function will warn when the integer does not fit in four bytes:
>>> from struct import *
>>> pack('I', 1000)
'\xe8\x03\x00\x00'
>>> pack('I', 10000000)
'\x80\x96\x98\x00'
>>> pack('I', 1000000000000000)
sys:1: DeprecationWarning: 'I' format requires 0 <= number <= 4294967295
'\x00\x80\xc6\xa4'
You can also specify endianness.
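For example, using explicit byte-order prefixes:
>>> pack('<I', 1000)  # little-endian
'\xe8\x03\x00\x00'
>>> pack('>I', 1000)  # big-endian
'\x00\x00\x03\xe8'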
The way I do this (and it's usually to ensure a fixed-width integer before sending to some hardware) is via ctypes:
from ctypes import c_ushort

def hex16(data):
    '''16-bit int -> hex converter'''
    return '0x%04x' % c_ushort(data).value

def int16(data):
    '''16-bit hex -> int converter'''
    return c_ushort(int(data, 16)).value
otherwise struct can do it:
from struct import pack, unpack

pack_type = {'signed': '>h', 'unsigned': '>H'}
pack(pack_type[sign_type], data)
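For instance:
>>> pack(pack_type['unsigned'], 1000)
'\x03\xe8'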
You are missing something here, I think. When you send a character you will be sending one byte, so even though
sys.getsizeof('\x05')
reports larger than 8, you are still only sending a single byte when you send it. The extra overhead is the Python machinery that is attached to EVERYTHING in Python; it does not get transmitted.
You complained about getsizeof for the struct.pack answer but accepted the c_ushort answer, so I figured I would show you this:
>>> sys.getsizeof(struct.pack("I",15))
28
>>> sys.getsizeof(c_ushort(15))
80
However, that said, both of the answers should do exactly what you want.
I have no idea if there's a better way to do this, but here's my naive approach (clamping to the largest value that fits in the requested number of bytes):
def intn(n, num_bytes=4):
    return min(2 ** (num_bytes * 8) - 1, n)
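If you want C-style wraparound instead of clamping, masking with an all-ones constant is the usual trick; a minimal sketch:
def uint32(n):
    # keep only the low 32 bits, like C unsigned arithmetic
    return n & 0xFFFFFFFF

print(uint32(2 ** 35 + 5))  # prints 5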

Slow Big Int Output in python

Is there any way to improve the performance of "str(bigint)" and "print bigint" in Python? Printing big integer values takes a lot of time. I tried the following recursive technique:
def p(x, n):
    if n < 10:
        sys.stdout.write(str(x))
        return
    n >>= 1
    l = 10**n
    k = x/l
    p(k, n)
    p(x-k*l, n)
where n = the number of digits and x = the bigint.
But the method fails for certain cases where x in a sub-call has leading zeros. Is there any alternative to it, or any faster method? (Please do not suggest using any external module or library.)
Conversion from a Python integer to a string has a running time of O(n^2), where n is the length of the number. For sufficiently large numbers, it will be slow. For a 1,000,001-digit number, str() takes approximately 24 seconds on my computer.
If you are really needing to convert very large numbers to a string, your recursive algorithm is a good approach.
The following version of your recursive code should work:
def p(x, n=0):
    if n == 0:
        n = int(x.bit_length() * 0.3)
    if n < 100:
        return str(x)
    n >>= 1
    l = 10**n
    a, b = divmod(x, l)
    upper = p(a, n)
    lower = p(b, n).rjust(n, "0")
    return upper + lower
It automatically estimates the number of digits in the output. It is about 4x faster for a 1,000,001 digit number.
If you need to go faster, you'll probably need to use an external library.
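A quick sanity check (a sketch) that the recursive version matches the built-in conversion, using the same value printed below:
>>> x = 2435**356
>>> p(x) == str(x)
True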
For interactive applications, the built-in print and str functions run in the blink of an eye.
>>> print(2435**356)
392312129667763499898262143039114894750417507355276682533585134425186395679473824899297157270033375504856169200419790241076407862555973647354250524748912846623242257527142883035360865888685267386832304026227703002862158054991819517588882346178140891206845776401970463656725623839442836540489638768126315244542314683938913576544051925370624663114138982037489687849052948878188837292688265616405774377520006375994949701519494522395226583576242344239113115827276205685762765108568669292303049637000429363186413856005994770187918867698591851295816517558832718248949393330804685089066399603091911285844172167548214009780037628890526044957760742395926235582458565322761344968885262239207421474370777496310304525709023682281880997037864251638836009263968398622163509788100571164918283951366862838187930843528788482813390723672536414889756154950781741921331767254375186751657589782510334001427152820459605953449036021467737998917512341953008677012880972708316862112445813219301272179609511447382276509319506771439679815804130595523836440825373857906867090741932138749478241373687043584739886123984717258259445661838205364797315487681003613101753488707273055848670365977127506840194115511621930636465549468994140625
>>> str(2435**356)
'392312129667763499898262143039114894750417507355276682533585134425186395679473824899297157270033375504856169200419790241076407862555973647354250524748912846623242257527142883035360865888685267386832304026227703002862158054991819517588882346178140891206845776401970463656725623839442836540489638768126315244542314683938913576544051925370624663114138982037489687849052948878188837292688265616405774377520006375994949701519494522395226583576242344239113115827276205685762765108568669292303049637000429363186413856005994770187918867698591851295816517558832718248949393330804685089066399603091911285844172167548214009780037628890526044957760742395926235582458565322761344968885262239207421474370777496310304525709023682281880997037864251638836009263968398622163509788100571164918283951366862838187930843528788482813390723672536414889756154950781741921331767254375186751657589782510334001427152820459605953449036021467737998917512341953008677012880972708316862112445813219301272179609511447382276509319506771439679815804130595523836440825373857906867090741932138749478241373687043584739886123984717258259445661838205364797315487681003613101753488707273055848670365977127506840194115511621930636465549468994140625'
If, however, you are printing big integers (to standard output, say) so that they can be read (from standard input) by another process, and you are finding the binary-to-decimal operations impacting the overall performance, you can look at Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes? (although the accepted answer suggests numpy, which is an external library, there are other suggestions).

Random strings in Python 2.6 (Is this OK?)

I've been trying to find a more Pythonic way of generating random strings in Python that also scales well. Typically, I see something similar to
''.join(random.choice(string.letters) for i in xrange(len))
It sucks if you want to generate a long string.
I've been thinking about random.getrandbits for a while, figuring out how to convert that to an array of bits and then hex-encode it. Using Python 2.6 I came across the bitarray object, which isn't documented. Somehow I got it to work, and it seems really fast.
It generates a 50-million-character random string on my notebook in about 3 seconds.
import base64
import random

def rand1(leng):
    nbits = leng * 6 + 1
    bits = random.getrandbits(nbits)
    uc = u"%0x" % bits
    newlen = int(len(uc) / 2) * 2  # we have to make the string an even length
    ba = bytearray.fromhex(uc[:newlen])
    return base64.urlsafe_b64encode(str(ba))[:leng]
Edit:
heikogerlach pointed out that it was an odd number of characters causing the issue. New code added to make sure it always sent fromhex an even number of hex digits.
Still curious if there's a better way of doing this that's just as fast.
import os
random_string = os.urandom(string_length)
and if you need a URL-safe string:
import os
random_string = os.urandom(string_length).hex()
(note that random_string's length is greater than string_length in that case: hex encoding doubles it)
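If you need an exact-length URL-safe string, one possible variation combining the two ideas above (a sketch; the helper name is my own, and in Python 3 it returns bytes):
import base64
import os

def urlsafe_random(length):
    # base64 of `length` bytes always yields at least `length` characters,
    # and the '=' padding never survives the slice
    return base64.urlsafe_b64encode(os.urandom(length))[:length]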
Sometimes a uuid is short enough, and if you don't like the dashes you can always .replace('-', '') them:
from uuid import uuid4
random_string = str(uuid4())
If you want it a specific length without dashes
random_string_length = 16
str(uuid4()).replace('-', '')[:random_string_length]
Taken from the 1023290 bug report at Python.org:
junk_len = 1024
junk = (("%%0%dX" % junk_len) % random.getrandbits(junk_len * 8)).decode("hex")
Also, see issues 923643 and 1023290.
It seems the fromhex() method expects an even number of hex digits. Your string is 75 characters long.
Be aware that something[:-1] excludes the last element! Just use something[:].
Regarding the last example, the following fix makes sure the string has an even length, whatever the junk_len value:
junk_len = 1024
junk = (("%%0%dX" % (junk_len * 2)) % random.getrandbits(junk_len * 8)).decode("hex")
