This should be easy, but after looking at what's going on I'm not so sure. Is reading/writing a single binary integer atomic? It is at the hardware level, where 32-bit integers are read and written in one operation. After some research, though, I realized Python does not store integers as a plain collection of bytes; it doesn't even store bytes as a plain collection of bytes. There's overhead involved. Does this overhead break the atomic nature of binary integers?
Here is some code I used trying to figure this out:
import time
import sys
tm=time.time()
int_tm = int(tm * 1000000)
bin_tm = bin(int_tm)
int_bin_tm = int(bin_tm, 2)
print('tm:', tm, ", Size:", sys.getsizeof(tm))
print('int_tm:', int_tm, ", Size:", sys.getsizeof(int_tm))
print('bin_tm:', bin_tm, ", Size:", sys.getsizeof(bin_tm))
print('int_bin_tm:', int_bin_tm, ", Size:", sys.getsizeof(int_bin_tm))
Output:
tm: 1581435513.076924 , Size: 24
int_tm: 1581435513076924 , Size: 32
bin_tm: 0b101100111100100111010100101111111011111110010111100 , Size: 102
int_bin_tm: 1581435513076924 , Size: 32
For a couple side questions, does Python's binary representation of integers really consume so much more memory? Am I using the wrong type for converting decimal integers to bytes?
Python doesn't guarantee any atomic operations other than specific mutex constructs like locks and semaphores. Some operations will seem to be atomic because the GIL prevents bytecode from running on multiple Python threads at once; as the CPython documentation puts it, "This lock is necessary mainly because CPython's memory management is not thread-safe."
Basically, this means Python ensures an entire bytecode instruction is completely evaluated before allowing another thread to continue. It does not mean, however, that an entire line of code is guaranteed to complete without interruption. This is especially true with function calls. For a deeper look at this, take a look at the dis module.
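To see this concretely, here is a quick sketch that disassembles a one-line increment (the exact bytecode varies between CPython versions, so treat the output as indicative):

import dis

counter = 0

def bump():
    global counter
    counter += 1   # a single source line...

dis.dis(bump)      # ...but several instructions: a load, an add, and a store

A thread switch can occur between any two of those instructions, which is why += on a shared integer is not atomic even under the GIL.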
I will also point out that this talk about atomicity means nothing at the hardware level; the whole idea of an interpreted language is to abstract away the hardware. If you want to consider "actual" hardware atomicity, it will generally be a facility provided by the operating system (which is likely how Python implements things like threading.Lock).
Note on data sizes (this is just a quickie, because this is a whole other question):
sizeof(tm): 8 bytes for the 64-bit float value, 8 bytes for a pointer to the type object, and 8 bytes for the reference count
sizeof(int_tm): ints are a bit more complicated. Some small values are cached by the interpreter for efficiency, and the general representation is a flexible one where the number of bytes used to store the int grows to however big it needs to be.
sizeof(bin_tm): This is actually a string, which is why it takes so much more memory than just a number, there is a fairly significant overhead, plus at least one byte per character.
"Am I using the wrong type for converting..?" We need to know what you're trying to do with the result to answer this.
Related
Using Python 3.9.2, I read the beginning (a small piece) of a TB-size binary file as below:
file=open(filename,'rb')
bytes=file.read(8)
print(bytes)
b'\x14\x00\x80?\xb5\x0c\xf81'
I tried reading the file with np.fromfile in a couple of ways:
import numpy as np
float_data1 = np.fromfile(filename, np.float32)
float_data2 = np.fromfile(filename, np.complex64)
Since the binary file is always bigger than 500 GB, sometimes even TB-sized, how can I read the complex data from it quickly while keeping the best accuracy?
This is related to your ham post.
samples = np.fromfile(filename, np.complex128)
and
Those values are equal to -1.9726906072368233e-31, +3.6405886029665884e-23.
No, they don't equal that. That's just your interpretation of bytes as float64. That interpretation is incorrect!
You assume these are 64-bit floating point numbers. They are not; you really need to stop assuming that. It's wrong, and we can't help you if you keep acting as if they were 64-bit floats forming a 128-bit complex value.
Besides the documentation, I compared the byte content in the answer; that is more than just reading the docs.
As I already pointed out, that is wrong. Your computer will read anything as any type you tell it to, even if that's not the type it was originally stored as. You stored complex64 but read complex128. That's why your values are so implausible.
It's 32-bit floats forming a 64-bit complex value. The official block documentation for the file sink also points that out, and even explains the numpy dtype you need to use!
Anyway, you can use numpy's memmap functionality to map the file contents without reading them all into RAM. That works. Again, you need to use the right dtype, which is, to repeat it for the tenth time, not complex128.
It's really easy:
data = numpy.memmap(filename, dtype=numpy.complex64)
done.
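Because the memmap behaves like an ordinary (huge) array, you can also process it slice by slice without ever pulling the whole file into RAM. A minimal sketch, assuming filename is defined as in your snippet and with a placeholder computation:

import numpy as np

data = np.memmap(filename, dtype=np.complex64, mode="r")

chunk = 10_000_000                          # ~80 MB of complex64 per slice
for start in range(0, data.shape[0], chunk):
    block = data[start:start + chunk]       # a view; only this slice gets paged in
    power = np.mean(np.abs(block) ** 2)     # placeholder: replace with your processing
    print(start, power)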
I'm trying to convert a very long string (like:'1,2,3,4,5,6...n' n=60M) to numpy.array.
I have tried to convert the string to a list, then use numpy.array() to convert the list to array.
But there is a problem: a lot of memory is used when the string is converted to a list (please let me know if you know why; sys.getsizeof(list) is much less than the memory actually used).
I also tried numpy.fromstring(), but it seems to take a very long time (I waited a long time with no result).
Is there any method that can reduce the memory used and be more efficient, other than splitting the string into a lot of pieces?
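For context, the two approaches I tried look roughly like this (a small-scale sketch; the real string holds ~60M numbers):

import sys
import numpy as np

s = ",".join(str(i) for i in range(1000))   # stand-in for the huge string

# 1) via an intermediate list -- this is the step that eats memory
arr1 = np.array([int(x) for x in s.split(",")], dtype=np.int64)

# 2) text-mode fromstring -- avoids the list but was very slow for me
arr2 = np.fromstring(s, dtype=np.int64, sep=",")

print(sys.getsizeof(s), arr1.nbytes, arr2.nbytes)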
When you have a string value and change it in your program, the previous value remains in one part of memory and the changed string is placed in a new part of RAM.
As a result, the old values remain in RAM, unused.
The garbage collector is what cleans these old, unused values out of your RAM, but that takes time.
You can also trigger a collection yourself with gc.collect().
You can use the gc module to look at the different generations of your objects.
See this:
import gc
print(gc.get_count())
result:
(596, 2, 1)
In this example we have, roughly speaking, 596 objects tracked in our youngest generation, two objects in the next generation, and one object in the oldest generation.
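If the memory is being held by an intermediate object you no longer need, you can also drop the reference and force a collection immediately instead of waiting; a minimal sketch (big_list is just a placeholder name):

import gc

big_list = list(range(10000000))   # stand-in for the intermediate list
# ... build the array from it ...

del big_list    # drop the last reference once it's no longer needed
gc.collect()    # run a collection right away rather than waiting for the GC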
I am trying to understand why this Python code results in a process that uses 236 MB of memory, given that the list itself is only 76 MB.
import sys
import psutil
initial = psutil.virtual_memory().available / 1024 / 1024
available_memory = psutil.virtual_memory().available
vector_memory = sys.getsizeof([])
vector_position_memory = sys.getsizeof([1]) - vector_memory
positions = 10000000
print "vector with %d positions should use %d MB of memory " % (positions, (vector_memory + positions * vector_position_memory) / 1024 / 1024)
print "it used %d MB of memory " % (sys.getsizeof(range(0, positions)) / 1024 / 1024)
final = psutil.virtual_memory().available / 1024 / 1024
print "however, this process used in total %d MB" % (initial - final)
The output is:
vector with 10000000 positions should use 76 MB of memory
it used 76 MB of memory
however, this process used in total 236 MB
Adding x10 more positions (i.e. positions = 100000000) results in x10 more memory.
vector with 100000000 positions should use 762 MB of memory
it used 762 MB of memory
however, this process used in total 2330 MB
My ultimate goal is to suck as much memory as I can to create a very long list. To do this, I created this code to understand/predict how big my list could be based on available memory. To my surprise, python needs a ton of memory to manage my list, I guess.
Why does python use so much memory?! What is it doing with it? Any idea on how I can predict python's memory requirements to effectively create a list to use pretty much all the available memory while preventing the OS from doing swap?
The getsizeof function only includes the space used by the list itself.
But the list is effectively just an array of pointers to int objects, and you created 10000000 of those, and each one of those takes memory as well—typically 24 bytes.
The first few small integers (from -5 through 256 in CPython) are pre-created and cached by the interpreter, so they're effectively free, but the rest are not. So, you want to add something like this:
int_memory = sys.getsizeof(10000)
print "%d int objects should use another %d MB of memory " % (positions - 256, (positions - 256) * int_memory / 1024 / 1024)
And then the results will make more sense.
But notice that if you aren't creating a range of 10M unique ints but instead, say, 10M random ints from 0-10000, or 10M copies of 0, that calculation will no longer be correct. So if you want to handle those cases, you need to do something like stash the id of every object you've seen so far and skip any additional references to the same id.
The Python 2.x docs used to have a link to an old recursive getsizeof function that does that, and more… but that link went dead, so it was removed.
The 3.x docs have a link to a newer one, which may or may not work in Python 2.7. (I notice from a quick glance that it uses a __future__ statement for print, and falls back from reprlib.repr to repr, so it probably does.)
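As a rough illustration of the idea (this is not the recipe from those docs, just a sketch that counts each distinct object once by stashing ids):

import sys

def total_size(objects):
    # Sum getsizeof over a flat iterable, counting each distinct object once.
    seen = set()
    total = 0
    for obj in objects:
        if id(obj) not in seen:
            seen.add(id(obj))
            total += sys.getsizeof(obj)
    return total

values = [0] * 10000000           # 10M references to the *same* int object
print(total_size(values))         # counts that int once, not 10M times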
If you're wondering why every int is 24 bytes long (in 64-bit CPython; it's different for different platforms and implementations, of course):
CPython represents every builtin type as a C struct that contains, at least, space for a refcount and a pointer to the type. Any actual value the object needs to represent is in addition to that.1 So, the smallest non-singleton type is going to take 24 bytes per instance.
If you're wondering how you can avoid using up 24 bytes per integer, the answer is to use NumPy's ndarray—or, if for some reason you can't, the stdlib's array.array.
Either one lets you specify a "native type", like np.int32 for NumPy or i for array.array, and create an array that holds 100M of those native-type values directly. That will take exactly 4 bytes per value, plus a few dozen constant bytes of header overhead, which is a lot smaller than a list's 8 bytes of pointer, plus a bit of slack at the end that scales with the length, plus an int object wrapping up each value.
Using array.array, you're sacrificing speed for space,2 because every time you want to access one of those values, Python has to pull it out and "box" it as an int object.
Using NumPy, you're gaining both speed and space, because NumPy will let you perform vectorized operations over the whole array in a tightly-optimized C loop.
1. What about non-builtin types, that you create in Python with class? They have a pointer to a dict—which you can see from Python-land as __dict__—that holds all the attributes you add. So they're 24 bytes according to getsizeof, but of course you have to also add the size of that dict.
2. Unless you aren't. Preventing your system from going into swap hell is likely to speed things up a lot more than the boxing and unboxing slows things down. And, even if you aren't avoiding that massive cliff, you may still be avoiding smaller cliffs involving VM paging or cache locality.
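A quick sketch of the list vs. array.array vs. NumPy difference described above (the numbers are approximate and vary by platform and Python version):

import sys
from array import array
import numpy as np

n = 10000000

as_list = list(range(n))                 # pointer array plus one boxed int object each
as_array = array('i', range(n))          # packed native 32-bit ints
as_numpy = np.arange(n, dtype=np.int32)  # packed 32-bit ints, vectorizable

print(sys.getsizeof(as_list) + (n - 256) * 24)  # rough estimate: pointers + boxed ints
print(sys.getsizeof(as_array))                  # roughly 4 bytes per value
print(as_numpy.nbytes)                          # exactly 4 bytes per value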
I am trying to encode some data (a very big string, actually) in a very memory-efficient way on the Redis side. According to the Redis docs, one should "use hashes when possible", and they describe two configuration parameters:
The "hash-max-zipmap-entries", which if I understood well it denotes how many keys at most every hash key must have ( am I right?).
The "hash-max-zipmap-value", which denotes the maximum length for the value. Does it refer to the field or to the value, actually? And the length is in bytes, characters, or what?
My thought is to split the string (which somehow has a fixed length) into chunks that play well with the above parameters and store them as values. The fields would just be sequence numbers, to ensure consistent decoding.
EDIT: I have benchmarked extensively and it seems that encoding the string in a hash yields ~50% better memory consumption.
Here is my benchmarking script:
import redis, random, sys

def new_db():
    db = redis.Redis(host='localhost', port=6666, db=0)
    db.flushall()
    return db

def db_info(db):
    return " used memory %s " % db.info()["used_memory_human"]

def random_string(_len):
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
    return "".join([letters[random.randint(0, len(letters) - 1)] for i in range(_len)])

def chunk(astr, size):
    while len(astr) > size:
        yield astr[:size]
        astr = astr[size:]
    if len(astr):
        yield astr

def encode_as_dict(astr, size):
    dod = {}
    cnt = 0
    for i in chunk(astr, size):
        dod[cnt] = i
        cnt += 1
    return dod

db = new_db()
r = random_string(1000000)
print "size of string in bytes ", sys.getsizeof(r)
print "default Redis memory consumption", db_info(db)
dict_chunk = 10000

print "*" * 100
print "BENCHMARKING \n"

db = new_db()
db.set("akey", r)
print "as string ", db_info(db)
print "*" * 100

db = new_db()
db.hmset("akey", encode_as_dict(r, dict_chunk))
print "as dict and stored at value", db_info(db)
print "*" * 100
and the results on my machine (32-bit Redis instance):
size of string in bytes 1000024
default Redis memory consumption used memory 534.52K
******************************************************************************************
BENCHMARKING
as string used memory 2.98M
******************************************************************************************
as dict and stored at value used memory 1.49M
I am asking whether there is a more efficient way to store the string as a hash, by playing with the parameters I mentioned. So first I need to understand what they mean; then I'll benchmark again and see if there is more gain.
EDIT2: Am I an idiot? The benchmarking is correct, but it only holds for one big string. If I repeat it for many big strings, storing them as plain strings is the definite winner. I think the reason I got those results for a single string lies in the Redis internals.
Actually, the most efficient way to store a large string is as a large string - anything else adds overhead. The optimizations you mention are for dealing with lots of short strings, where empty space between the strings can become an issue.
Performance on storing a large string may not be as good as for small strings due to the need to find more contiguous blocks to store it, but that is unlikely to actually affect anything.
I've tried reading the Redis docs about the settings you mention, and it isn't easy. But it doesn't sound to me like your plan is a good idea. The hashing they describe is designed to save memory for small values. The values are still stored completely in memory. It sounds to me like they are reducing the overhead when they appear many times, for example, when a string is added to many sets. Your string doesn't meet these criteria. I strongly doubt you will save memory using your scheme.
You can of course benchmark it to see.
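If you do want to experiment with those settings before re-benchmarking, they can be changed at runtime; a sketch along the lines of your benchmark script (the hash-max-zipmap-* names are from the older Redis versions your question refers to; newer releases call them hash-max-ziplist-* / hash-max-listpack-*):

import redis

db = redis.Redis(host='localhost', port=6666, db=0)

# Raise the limits so larger hashes still use the compact encoding,
# then rerun the benchmark and compare used_memory_human.
db.config_set('hash-max-zipmap-entries', 1024)
db.config_set('hash-max-zipmap-value', 16384)
print db.config_get('hash-max-zipmap-*')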
Take a look at the Redis Memory Usage article, where you can find a good comparison of various data types and their memory consumption.
When you store data in a hash you skip the ~100 bytes of per-key overhead for each value!
So when your string lengths are comparable, 100-200 bytes for instance, you may see 30-50% memory savings; for integers it's as much as 10 times less memory!
Here are a couple of links:
About 100 bytes overhead
Different memory optimization comparison gist
I need to run Monte Carlo simulations in parallel on different machines. The code is in C++, but the program is set up and launched with a Python script that sets a lot of things, in particular the random seed. The function setseed takes a 4-byte unsigned integer.
Using a simple
import time
setseed(int(time.time()))
is not very good because I submit the jobs to a queue on a cluster; they stay pending for some minutes and then start, but the start time is unpredictable and two jobs can start at the same time (within a second), so I switched to:
setseed(int(time.time()*100))
but I'm not happy with it. What is the best solution? Maybe I can combine information from time, machine id, and process id. Or maybe the best solution is to read from /dev/random (Linux machines)?
How to read 4 bytes from /dev/random?
f = open("/dev/random","rb")
f.read(4)
gives me a string, but I want an integer!
Reading from /dev/random is a good idea. Just convert the 4 byte string into an Integer:
f = open("/dev/random","rb")
rnd_str = f.read(4)
Either using struct:
import struct
rand_int = struct.unpack('I', rnd_str)[0]
(Update: the uppercase 'I' is needed; it is the format code for an unsigned int.)
Or multiply and add:
rand_int = 0
for c in rnd_str:
    rand_int <<= 8
    rand_int += ord(c)
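On Python 3, where f.read(4) returns bytes and iterating them already yields integers, the same conversion can be written directly; just a sketch of the equivalent call:

with open("/dev/random", "rb") as f:
    rnd_bytes = f.read(4)

rand_int = int.from_bytes(rnd_bytes, byteorder="big")   # 0 .. 2**32 - 1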
You could simply copy the four bytes over into an integer; that should be the least of your worries.
But parallel pseudo-random number generation is a rather complex topic and very often not done well. Usually you generate seeds on one machine and distribute them to the others.
Take a look at SPRNG, which handles exactly your problem.
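A minimal sketch of that seed-distribution idea (n_jobs is a placeholder; a dedicated library like SPRNG additionally gives each stream statistical guarantees):

import random

n_jobs = 16

# Draw one 32-bit seed per job on the submitting machine, then hand each
# seed to its job (e.g. on the command line) instead of deriving it from time.
seeds = [random.getrandbits(32) for _ in range(n_jobs)]
for job_id, seed in enumerate(seeds):
    print("%d %d" % (job_id, seed))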
If this is Linux or a similar OS, you want /dev/urandom -- it always produces data immediately.
/dev/random may stall waiting for the system to gather randomness. It does produce cryptographic-grade random numbers, but that is overkill for your problem.
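The same source is exposed portably through os.urandom, so you don't even need to open the device file yourself; a minimal sketch (setseed being your existing wrapper):

import os
import struct

rand_int = struct.unpack('<I', os.urandom(4))[0]   # non-blocking, 0 .. 2**32 - 1
setseed(rand_int)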
You can use a random number as the seed, which has the advantage of being operating-system agnostic (no /dev/random needed), with no conversion from string to int:
Why not simply use
random.randrange(0, 2**32)
as the seed of each process? Slightly different starting times give wildly different seeds, this way…
You could also alternatively use the random.jumpahead method, if you know roughly how many random numbers each process is going to use (the documentation of random.WichmannHill.jumpahead is useful).