I'm working on a password manager application for Linux, and I'm using Python for it.
For security reasons I want to call the mlock system call in order to avoid the password variable being swapped out to the hard drive.
I noticed that Python itself doesn't wrap this function.
So is there any way I can avoid swapping?
Thanks
For CPython, there is no good answer for this that doesn't involve writing a Python C extension, since mlock works on pages, not objects. The internals of the str object differ from version to version (in Py3.3 and higher, a str may actually have several copies of the data in memory in different encodings, some inlined after the object structure, some dynamically allocated separately and linked by pointer), and even if you used ctypes to retrieve the necessary addresses and mlock-ed them all through ctypes mlock calls, you'd have a hell of a time determining when to mlock and when to munlock. Since mlock works on pages, you'd have to carefully track how many strings are currently in any given page (because if you just mlock and munlock blindly, and there is more than one thing to lock in a page, the first munlock would unlock all of them; mlock/munlock is a boolean flag, it doesn't count the number of locks and unlocks).
Even if you manage that, you'd still have a race between password acquisition and the mlock call during which the data could be written to swap, and since those cached alternate encodings are computed lazily, mlocking whatever pointers are non-NULL at a given moment doesn't guarantee that new, unlocked buffers won't be allocated and populated later.
You could partially avoid these problems through careful use of the mmap module and memoryviews (mmap gives you whole pages of memory, a memoryview references that memory without copying it, and ctypes can be used to mlock the page), but you'd have to build it all from scratch (you can't use the getpass module, because it stores the input as a str, at least momentarily).
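A minimal sketch of that mmap + memoryview + ctypes approach (Linux/glibc and CPython assumed; reading the secret from stdin is just for illustration, a real program would also disable terminal echo):

```python
import ctypes
import mmap
import sys

# Allocate one anonymous, page-aligned page and lock it so it can't be swapped.
libc = ctypes.CDLL("libc.so.6", use_errno=True)      # glibc assumed
PAGE = mmap.PAGESIZE
buf = mmap.mmap(-1, PAGE)
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
if libc.mlock(ctypes.c_void_p(addr), ctypes.c_size_t(PAGE)) != 0:
    raise OSError(ctypes.get_errno(), "mlock failed (check RLIMIT_MEMLOCK)")

view = memoryview(buf)
try:
    # Read the secret straight into the locked page so no unlocked str/bytes
    # copy is ever created by Python.
    n = sys.stdin.buffer.raw.readinto(view)
    # Work with view[:n] directly (compare it, feed it to a KDF, ...);
    # converting it to str/bytes would create an unlocked copy on the heap.
finally:
    view[:] = b"\x00" * PAGE                          # wipe before unlocking
    view.release()
    libc.munlock(ctypes.c_void_p(addr), ctypes.c_size_t(PAGE))
    buf.close()
```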
In short, Python doesn't care about swapping or memory protection in the way you want; it trusts the swap file to be configured to your desired security (e.g. disabled or encrypted), neither providing additional protection nor providing the information you'd need to add it in.
In my Python C extension I am performing actions on an iterable of strings. As a first step I call PySequence_Fast to convert it to a list and then iterate over the elements. For each string I use PyUnicode_DATA and then compare the strings using some criteria. So I only read from PyObjects, but never modify them.
Now I would like to process the list in parallel, which would require me to release the GIL. However, I do not know what effects this has on my use case. Here are my current thoughts:
Case 1: I can still use those APIs, since they are only macros that read directly from the PyObjects without modifying them.
Case 2: I have to use the APIs beforehand and store an array of structs holding the kind, length, and data pointer of each string.
Case 3: I have to use the APIs beforehand and store a copy of each string in an array.
Case 1 would be the most performant and memory-efficient. However, it is stated that without holding the GIL it is not allowed to operate on Python objects (does this include read access?) or to use Python/C API functions.
Case 2 would be the next most efficient, since at least I do not have to copy all the strings. However, if I am not allowed to read from Python objects while the GIL is released, I wonder whether I would even be allowed to use a pointer to the data inside a PyObject.
Case 3 would require me to copy all the strings. In my case this might make the multithreaded solution slower than a sequential one.
I hope someone can help me understand what I am allowed to do while the GIL is released.
I think the official answer is that you should not use method 1 and should use method 2 or 3, and that while method 1 might work now, it could change in the future and break. This is especially important if you want to support things like PyPy's C-API wrapper (which may well use a different representation than CPython does internally). There are increasing moves to hide implementation details, so there is a slight risk of getting caught out by this.
Practically I think method 1 would work fine provided you only use the macro forms with no error checking - the GIL is mainly about stopping simultaneous writes putting Python objects in an undefined state, and you aren't doing this. Where I'd be slightly careful is if you ever have (deprecated) "non-canonical" unicode objects - things that look "macro-y" like PyUnicode_READY can cause them to be modified to the canonical state. Again, be especially wary of alternative (non-CPython) implementations of the C-API.
One alternative to consider would be to use the buffer protocol instead. Although I can't find it explicitly stated in the docs, the idea is that PyObject_GetBuffer and PyBuffer_Release require the GIL but reading/writing to the buffer doesn't. Here I have two sub-suggestions:
Can you have a single object, like a NumPy array, that exposes all your strings as one buffer?
You can also get a buffer from a unicode object (as a UTF-8 C string). The thing to do would be to create all the buffers while holding the GIL, do your parallel processing without it, and then free them with the GIL held again. The overhead of this might make it inefficient, though. This is basically an "official" version of method 2.
In short, you'd probably get away with it, but if it ever breaks I doubt that a bug report to Python would be well received (since it's technically wrong).
I have a Python process serving as an Apache WSGI server. I have many copies of this process running on each of several machines. About 200 megabytes of my process is read-only Python data. I would like to place these data in a memory-mapped segment so that the processes could share a single copy of them. Best would be to be able to attach to those data so they could be actual Python 2.7 data objects rather than parsing them out of something like pickle or DBM or SQLite.
Does anyone have sample code or pointers to a project that has done this to share?
This post by @modelnine on StackOverflow provides a really great, comprehensive answer to this question. As he mentioned, using threads rather than process-forking in your webserver can significantly lessen the impact of this. I ran into a similar problem trying to share extremely large NumPy arrays between CLI Python processes using some type of shared memory a couple of years ago, and we ended up using a sharedmem Python extension to share the data between the workers (which proved to leak memory in certain cases, but that's probably fixable). A read-only mmap() technique might work for you, but I'm not sure how to do that in pure Python (NumPy has a memmapping technique explained here). I've never found any clear and simple answers to this question, but hopefully this can point you in some new directions. Let us know what you end up doing!
It's difficult to share actual Python objects because they are bound to the process's address space. However, if you use mmap, you can create very usable shared objects. I'd create one process to pre-load the data, and the rest could use it. I found quite a good blog post that describes how it can be done: http://blog.schmichael.com/2011/05/15/sharing-python-data-between-processes-using-mmap/
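A rough sketch of that approach (Unix only; the path, the 8-byte length prefix, and the assumption that the data can be serialized to a single bytes blob are mine, not the blog post's):

```python
import mmap
import struct

PATH = "/dev/shm/shared_cache.bin"        # tmpfs-backed file; path is illustrative
SIZE = 200 * 1024 * 1024

def load(serialized_blob):
    """Run once in a loader process: write the length-prefixed blob into the file."""
    with open(PATH, "w+b") as f:
        f.truncate(SIZE)
        mm = mmap.mmap(f.fileno(), SIZE)
        mm[:8] = struct.pack("<Q", len(serialized_blob))
        mm[8:8 + len(serialized_blob)] = serialized_blob
        mm.flush()
        mm.close()

def attach():
    """Run in each WSGI worker: map the same file read-only."""
    with open(PATH, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    (length,) = struct.unpack("<Q", mm[:8])
    # Read records out of mm[8:8 + length] in place (struct/memoryview/etc.)
    # rather than deserializing everything into per-process Python objects.
    return mm, length
```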
Since it's read-only data, you won't need to share any updates between processes (there won't be any updates), so I propose you just keep a local copy of it in each process.
If memory constraints are an issue, you can have a look at using multiprocessing.Value or multiprocessing.Array without locks for this: https://docs.python.org/2/library/multiprocessing.html#shared-ctypes-objects
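For example, a quick sketch of the lock-free shared-ctypes idea (the values are placeholders; with the default fork start method on Linux the workers simply inherit the array created before they start):

```python
import multiprocessing

def worker(shared, index):
    # Reads need no lock because the data never changes after creation.
    print(shared[index])

if __name__ == "__main__":
    values = [1.5, 2.5, 3.5]
    shared = multiprocessing.Array('d', values, lock=False)   # raw, unsynchronized
    procs = [multiprocessing.Process(target=worker, args=(shared, i))
             for i in range(len(values))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```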
Other than that, you'll have to rely on an external process and some serialising to get this done; I'd have a look at Redis or Memcached if I were you.
One possibility is to create a C or C++ extension that provides a Pythonic interface to your shared data. You could memory-map 200MB of raw data and then have the extension provide it to the WSGI service. That is, you could have regular (unshared) Python objects implemented in C which fetch data from some kind of binary format in shared memory. I know this isn't exactly what you wanted, but this way the data would at least appear Pythonic to the WSGI app.
However, if your data consists of many, many very small objects, then it becomes important that even the "entry points" are located in the shared memory (otherwise they will waste too much memory). That is, you'd have to make sure that the PyObject* pointers that make up the interface to your data actually point into the shared memory themselves, i.e. the Python objects themselves would have to live in shared memory. As far as I can tell from the official docs, this isn't really supported. However, you could always try "handcrafting" Python objects in shared memory and see if it works. I'm guessing it would work, until the Python interpreter tries to free the memory. But in your case it won't, since the data is long-lived and read-only.
Say I store a password in plain text as a string in a variable called passWd.
How does Python release this variable once I discard it (for instance, with del passWd or passWd = 'new random data')?
Is the string stored as a byte array, meaning it can be overwritten in the memory location where it originally existed, or is it fixed in a memory area that can't be modified, so that when assigning a new value a new memory area is created and the old area is discarded but not overwritten with nulls?
I'm wondering how Python handles the safety of memory areas and would like to know more about it, mainly because I'm curious :)
From what I've gathered so far, using del (or __del__) doesn't cause the interpreter to release the memory areas of that variable automatically, which can cause issues, and I'm also not sure that del is that thorough about deleting the values. But that's just what I've gathered, not something in black and white :)
The main reason I'm asking is that I intend to write a hand-over application that gets a string, does some I/O, and passes it along to another subsystem (a bootloader for a Raspberry Pi, for instance), and the interface is written in Python (however odd that must sound to some people's ears). I'm not worried that the data is compromised during the I/O calculations, but that a memory dump might occur between the two subsystem handovers, or that the system is frozen (say, a hibernation) 20 minutes after boot and the password is somehow still in memory even though I removed the variable as quickly as I could with del passWd :)
(P.S. I asked on Super User and they referred me here; sorry for the poor grammar!)
Unless you use custom-coded input methods to get the password, it will be in many more places than just your immutable string. So don't worry too much.
The OS should take care that any data from your process is cleared before the memory is allocated to another process. This may of course fail if the page is copied to disk (swapped out or hibernated).
Secure password entry is not easy. Maybe you can find a special library or module that handles this.
I finally went with two solutions.
Using LD_PRELOAD to replace the functionality of Python's string handling at a lower level.
The other option, which is a bit easier, was to develop my own C library with more functionality than what Python offers through its standard string handling.
Mainly, the C code has a shread() function that writes over the memory area where the string "was" stored, plus some other error checks.
However, @Ber gave me a good enough answer to start developing my own solution, since (as he pointed out) there is no secure method in Python: Python stores strings in way too many places and relies on the OS (which, on its own, isn't a bad thing, except when you don't trust the OS you are installing your relatively secure application on).
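For comparison, the wipe-in-place idea can also be sketched at the Python level with ctypes, assuming the secret is kept in a mutable bytearray rather than a str (this does nothing about copies Python may already have made elsewhere):

```python
import ctypes

def shred(secret):
    """Overwrite a bytearray's buffer in place, then empty it."""
    length = len(secret)
    if length:
        addr = ctypes.addressof((ctypes.c_char * length).from_buffer(secret))
        ctypes.memset(addr, 0, length)     # zero the actual storage
    del secret[:]                          # shrink to zero length

password = bytearray(b"example secret")    # placeholder value
try:
    pass                                   # ... hand the bytearray to the next subsystem ...
finally:
    shred(password)
```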
I'm working on a Python application which uses a number of open source third-party libraries. One of the libraries is based on ctypes, and I recently found more than 10 separate memory leaks in it. The causes of these leaks ranged from circular references on objects with explicit destructors (which Python can't garbage collect) to using c_char_p as a return type for functions returning non-const character arrays (resulting in the character arrays being converted automatically to Python strings and the original C-allocated arrays never being freed.)
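For reference, the c_char_p issue looks roughly like this (library and function names are made up; it assumes the library expects the caller to free() the returned buffer):

```python
import ctypes

lib = ctypes.CDLL("libexample.so")                 # hypothetical library
libc = ctypes.CDLL("libc.so.6")
libc.free.argtypes = [ctypes.c_void_p]

# Leaky pattern: ctypes converts the returned char* into a Python bytes object
# and throws the pointer away, so the C allocation can never be freed.
lib.make_message.restype = ctypes.c_char_p
message = lib.make_message()                       # leaks the C buffer

# Fixed pattern: keep the raw pointer, copy the data, then free it explicitly.
lib.make_message.restype = ctypes.c_void_p
ptr = lib.make_message()
message = ctypes.string_at(ptr)                    # copies into a Python bytes object
libc.free(ptr)
```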
I fixed the leaks I found and submitted a pull request to the author of the library. I've done some extremely informal testing by creating and deleting objects in a loop and watching Python's memory usage as I do so, and I think I've found all the leaks. However, as I'm planning to use this library in an application that I'd like to open source and hopefully have a few other people use, I'd like to be more sure than that. So my question is: is there a systematic way to find memory leaks in ctypes-based libraries?
During the process of fixing the leaks I've already found, I tried Heapy and objgraph but neither were particularly useful for this purpose. As far as I can tell, both of them will only show objects allocated on the Python heap, so they're of no use in finding leaks caused by improper handling of heap space allocated by C libraries. Is there a tool I can use in Python that can show me allocations on the C heap, and preferably also which Python objects, if any, refer to the allocated addresses?
You could try running the application under Valgrind. Valgrind is a useful tool for profiling memory use in compiled applications. This will at least detect the leaks and report their source.
You will certainly get false positives from Python calls. Check out this site for a nice description of how to use suppressions, which allow you to specifically ignore certain types of errors. See also Python's premade list of suppressions (here), and a description of why they are needed (here).
What would be the best way to handle lightweight crash recovery for my program?
I have a Python program that runs a number of test cases and the results are stored in a dictionary which serves as a cache. If I could save (and then restore) each item that is added to the dictionary, I could simply run the program again and the caching would provide suitable crash recovery.
You may assume that the keys and values in the dictionary are easily convertible to strings, i.e. using either str or the pickle module.
I want this to be completely cross-platform - well, at least as cross-platform as Python is.
I don't want to simply write out each value to a file and load it back in, because my program might crash while I am writing the file.
UPDATE: This is intended to be a lightweight module so a DBMS is out of the question.
UPDATE: Alex is correct in that I don't actually need to protect against crashes while writing out, but there are circumstances where I would like to be able to manually terminate it in a recoverable state.
UPDATE: Added a highly limited solution using standard input below.
There's no good way to guard against "your program crashing while writing a checkpoint to a file", but why should you worry so much about that?! What ELSE is your program doing at that time BESIDES "saving checkpoint to a file", that could easily cause it to crash?!
It's hard to beat pickle (or cPickle) for portability of serialization in Python, but that's just about "turning your keys and values to strings". For saving key-value pairs (once stringified), few approaches are safer than just appending to a file (don't pickle to files if your crashes are far, far more frequent than normal, as you suggest they are).
If your environment is incredibly crash-prone for whatever reason (very cheap HW?-), just make sure you close the file (and fflush, if the OS is also crash-prone;-), then reopen it for append. This way, the worst that can happen is that the very latest append is incomplete (due to a crash in the middle of things) -- then you just catch the exception raised by unpickling that incomplete record and redo only the things that weren't saved (either because they weren't completed due to the crash, or because they were completed but not fully saved -- it comes to much the same thing in the end).
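Concretely, a minimal sketch of that append-and-tolerate-a-truncated-tail approach might look like this (the file name and the key/value types are placeholders):

```python
import pickle

CHECKPOINT = "results.ckpt"

def save_result(key, value):
    with open(CHECKPOINT, "ab") as f:          # reopen for append each time
        pickle.dump((key, value), f)
        f.flush()                              # add os.fsync(f.fileno()) if the OS itself is flaky

def load_results():
    cache = {}
    try:
        with open(CHECKPOINT, "rb") as f:
            while True:
                try:
                    key, value = pickle.load(f)
                except EOFError:
                    break                      # clean end of file
                except pickle.UnpicklingError:
                    break                      # last record was cut short by a crash
                cache[key] = value
    except IOError:
        pass                                   # no checkpoint yet: start fresh
    return cache
```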
If you have the option of checkpointing to a database engine (instead of just doing so to files), consider it seriously! The DB engine will keep transaction logs and ensure ACID properties, making your application-side programming much easier IF you can count on that!-)
The pickle module supports serializing objects to a file (and loading from file):
http://docs.python.org/library/pickle.html
One possibility would be to create a number of smaller files ... each representing a subset of the state that you're trying to preserve, and each with a checksum or tag as the last line/datum of the file (written just before the file is closed) indicating that it's complete.
If the checksum/tag is good then the rest of the data can be considered valid ... though the program would then have to find all of these files, open and read all of them, and use the metadata you've provided (in their headers or their names?) to determine which ones constitute the most recent cohesive state representation (or checkpoint) from which you can continue processing.
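For instance, a tagged checkpoint file could be written and validated roughly like this (just a sketch; the use of JSON and SHA-256 here is an assumption):

```python
import hashlib
import json

def write_checkpoint(path, state):
    payload = json.dumps(state)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    with open(path, "w") as f:
        f.write(payload + "\n")
        f.write(digest + "\n")                 # written last, just before close

def read_checkpoint(path):
    with open(path) as f:
        lines = f.read().splitlines()
    if len(lines) < 2:
        return None                            # cut short before the tag was written
    payload, digest = lines[0], lines[-1]
    if hashlib.sha256(payload.encode("utf-8")).hexdigest() != digest:
        return None                            # incomplete or corrupted checkpoint
    return json.loads(payload)
```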
Without knowing more about the nature of the data that you're working with it's impossible to be more specific.
You can use files, of course, or you could use a DBMS system just about as easily. Any decent DBMS (PostgreSQL, MySQL if you're using the proper storage back-ends) can give you ACID guarantees and transactional support. So the data you read back should always be consistent with the constraints that you put in your schema and/or with the transactions (BEGIN, COMMIT, ROLLBACK) that you processed.
A possible advantage of posting your serialized data to a DBMS is that you can host the DBMS on a separate system (which is unlikely to suffer the same instabilities as your test host at the same times).
Pickle/cPickle have problems.
I use the JSON module to serialize objects out. I like it because not only does it work on any OS, but it will work fine in other programming languages, too; many other languages and platforms have readily-accessible JSON deserialization support, which makes it easy to use the same objects in different programs.
Solution with severe restrictions
If I don't worry about it crashing while writing out and I only want to allow manual termination, I can use standard input to control this. Unfortunately, reading it in the main loop can only terminate the program when a control point is reached. This could be solved by creating a new thread to read standard input. This thread could use a global lock to check whether the main thread is inside a critical section (writing to a file) and terminate the program only if it is not (see the sketch after the list of downsides below).
Downsides:
This is reasonably complex
It adds an extra thread
It stops me using standard input for anything else
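A hedged sketch of that threaded variant (treating any line on standard input as a termination request is my assumption):

```python
import os
import sys
import threading

checkpoint_lock = threading.Lock()

def watch_stdin():
    sys.stdin.readline()          # any line on stdin is treated as "terminate now"
    with checkpoint_lock:         # wait until no checkpoint write is in progress
        os._exit(1)               # hard exit while the on-disk state is consistent

watcher = threading.Thread(target=watch_stdin)
watcher.daemon = True
watcher.start()

def save_result(key, value):
    with checkpoint_lock:         # the watcher cannot terminate us mid-write
        pass                      # ... append the pickled record to the file here ...
```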