In a machine learning project written in Python, I need an efficient circular buffer, like collections.deque, but with constant-time access to any element, like numpy.array. The problem is that deque is apparently a linked list. Is there something efficient, readily implemented in a Python library, that I am not aware of for this use case?
I could simply use a fixed-size numpy.array with a moving zero index in my use case, I guess, but I'm asking for my general Python knowledge, as it is not the first time I've needed something like that.
collections.deque is not exactly a linked list. It's a doubly-linked list of fixed-size blocks of 64 elements each. I'd say it's a pretty decent choice when you want both random access and appending on both ends without constant reallocation.
However, if you've done proper performance profiling and that circular buffer really is your bottleneck, then you can implement the buffer in plain C for performance and add Python bindings.
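For reference, here is a minimal sketch of the fixed-size numpy ring buffer the asker describes; the class name and API are illustrative, not from any library:

import numpy as np

class RingBuffer:
    """Fixed-size circular buffer: O(1) append, O(1) random access."""
    def __init__(self, size):
        self._data = np.zeros(size)
        self._start = 0    # index of the logical first element
        self._count = 0    # number of elements currently stored

    def append(self, value):
        end = (self._start + self._count) % len(self._data)
        self._data[end] = value
        if self._count < len(self._data):
            self._count += 1
        else:
            # buffer is full: we overwrote the oldest element, advance start
            self._start = (self._start + 1) % len(self._data)

    def __getitem__(self, i):
        if not 0 <= i < self._count:
            raise IndexError(i)
        return self._data[(self._start + i) % len(self._data)]

    def __len__(self):
        return self._count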
I have a question: what is the big O or time complexity of Python's built-in function zfill?
I don't know where to find this information.
Python's zfill implementation lives here.
Here it is relatively apparent that the implementation is primarily composed of an allocation and a memset/memcpy, with the rest being very simple addition/subtraction.
The big-O of these operations is platform/implementation/circumstance driven and often not that dependent on the length of the string (they might complete near-instantaneously, or they might allocate a new heap page and request extra memory from an external service, taking a few seconds). But for your purposes I would treat it as O(n), where n is the size of the resulting string, since that is probably what memset/memcpy cost (as discussed here, the allocation is not really measurable).
But in truth, you probably shouldn't worry about this, since there is nothing you can do to change it. A manual implementation would almost certainly be slower.
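If you want to convince yourself empirically, a rough sketch with timeit (the widths are arbitrary; absolute timings will vary by platform):

import timeit

# Time zfill at growing target widths; roughly linear growth
# in the timings is consistent with O(n) in the result size.
for width in (10**3, 10**4, 10**5, 10**6):
    t = timeit.timeit(lambda: "42".zfill(width), number=1000)
    print(width, t)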
I'm trying to write a simple language interpreter for a custom language in C. I want to use C over C++ due to C's simplicity.
The things I'm not sure how to do in C are storing variables and looking them up.
I was planning to store variables in an array, but I think I'd need a variable-sized array.
I also don't know an efficient way to look up variables in an array besides just looping through it.
So I'd like to know: what is an efficient way of creating a variable-sized array? How do Python, Ruby, or Go store and retrieve variables efficiently?
How do Python, Ruby, or Go store and retrieve variables efficiently?
Python and Ruby use hash tables: the name of the variable is hashed to an integer, and that integer is used as an index into an array of buckets. Several names can always collide (hash to the same index), so that needs to be taken into account by allowing several name-to-value bindings at the same slot, but there will only be a few to check for each name.
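A toy Python version of that scheme, just to make the idea concrete (the class and method names are made up for illustration):

class VarTable:
    """Chained hash table: a name hashes to a bucket index; collisions
    are handled by keeping a short list of (name, value) pairs."""
    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def set(self, name, value):
        bucket = self.buckets[hash(name) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == name:
                bucket[i] = (name, value)   # rebind existing variable
                return
        bucket.append((name, value))        # first binding of this name

    def get(self, name):
        bucket = self.buckets[hash(name) % len(self.buckets)]
        for k, v in bucket:
            if k == name:
                return v
        raise KeyError(name)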
Go is compiled, so the variable is translated to an address (either static or an offset with respect to the stack or frame pointer) at compile time.
what is an efficient way of creating a variable-sized array?
If you decided to do that, you would use malloc and realloc.
In the case of resizing the array of buckets of a hash table, realloc is unfortunately not useful, because all the keys in the old array of buckets need to be re-hashed one by one to find where they go in the new array. If you know the maximum size of the programs your interpreter will run, you can allocate the hash table directly at the size that works for the largest programs and avoid writing the hash-table resizing function.
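To see why realloc doesn't help, here is the rehashing step sketched in Python (illustrative only; in C you would re-insert into freshly malloc'd buckets):

def rehash(old_buckets, new_size):
    # Growing the bucket array changes hash(name) % nbuckets,
    # so every key has to be re-inserted one by one.
    new_buckets = [[] for _ in range(new_size)]
    for bucket in old_buckets:
        for name, value in bucket:
            new_buckets[hash(name) % new_size].append((name, value))
    return new_buckets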
I think you can get really carried away trying to implement variable storage yourself. I would recommend you use an existing hashmap like uthash, just to see how it works out for you conceptually, and encapsulate it as well as possible. If it turns out to be a bottleneck, you can come back and optimize later.
I am fairly confident that, by then, you will not pick a dynamically expanding array. Consider that you need a string-based search to find a variable by name: search in a plain array is O(n) if unsorted and O(log n) if sorted, whereas a hashmap has O(1) average search complexity, so you will have a hard time doing better than the hashmap.
My background: new to python, most programming experience is with C and Java.
My understanding is that Python uses lists as the basic 'array' data structure. In C, arrays are guaranteed to have a contiguous memory allocation. What is the underlying memory layout of lists in Python?
For example, what can be known about the memory layout of 'block' (see below)? I'm tempted to think about accessing elements in the lists via pointers, as one could in C, but I think I need a new paradigm for Python.
block = blocksize * [0]
Anyway, my real question: I need to pass a zeroed-out chunk of memory of 'blocksize' length to a function. What is the best way to do this in Python? Currently, this is what I'm doing:
zero_block = blocksize * [0]
z = SHA256.new(array.array('b',zero_block)).hexdigest()
My understanding is that 'zero_block' is a list, and array.array(typecode[, initializer]) will call array.fromlist() to populate the array.
To sum it up:
What is the correct way to think about memory layout for data types such as lists, sets, dicts, etc. in Python?
In the above code, what would be a better way to create an array of zeros of 'blocksize' size?
Where/how does the notion of pointers and memory addressing fit in with Python?
Thanks! P.S. This is my first Stack Overflow question.
In Python 2.x, the best way to handle blocks of contiguous data is as strings. For the more advanced data structures, you shouldn't really think about how they're stored. Rather, if you need to interface with something that wants a specific binary structure, then think about how to convert Python data to a string with that format, and/or vice versa. Look into the "struct" module.
'\0' * blocksize
There's really no place in Python for the notion of pointers or memory addressing, and that's one of its strengths. Memory is allocated and released as you need it, automatically, in the background.
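Putting the two suggestions together, a small sketch; it uses the standard-library hashlib as a stand-in for the PyCrypto SHA256 in the question, and 'blocksize' is an arbitrary example value:

import hashlib
import struct

blocksize = 64
zero_block = b'\0' * blocksize                 # contiguous block of zero bytes
print(hashlib.sha256(zero_block).hexdigest())

# struct converts between Python values and packed binary strings:
packed = struct.pack('<Ih', 1024, -3)          # little-endian uint32 + int16
print(struct.unpack('<Ih', packed))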
In my code I have a for loop that indexes over a multidimensional numpy array and does some operation using the sub-array obtained at each iteration. It looks like this:
for sub in Arr:
    # do stuff using sub
Now the stuff that is done using sub is fully vectorized, so it should be efficient. On the other hand, this loop iterates about ~10^5 times and is the bottleneck. Do you think I will get an improvement by offloading this part to C? I am somewhat reluctant to do so because the "do stuff using sub" part uses broadcasting, slicing, and smart-indexing tricks that would be tedious to write in plain C. I would also welcome thoughts and suggestions about how to deal with broadcasting, slicing, and smart indexing when offloading computation to C.
If you can't 'vectorize' the entire operation and looping is indeed the bottleneck, then I highly recommend using Cython. I've been dabbling with it recently and it is straightforward to work with and has a decent interface with numpy. For something like a Langevin integrator I saw a 115x speedup over a decent implementation in numpy. See the documentation here:
http://docs.cython.org/src/tutorial/numpy.html
and I also recommend looking at the following paper
You may see satisfactory speedups by just typing the input array and the loop counter, but if you want to leverage the full potential of Cython, then you are going to have to hand-code the equivalent broadcasting as explicit loops.
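As a rough illustration (not the asker's actual computation), typing the array and the loop counters looks something like this in Cython, with the broadcast written out as explicit loops:

# cython: boundscheck=False, wraparound=False
import numpy as np
cimport numpy as cnp

def row_sums(cnp.ndarray[cnp.float64_t, ndim=2] arr):
    cdef Py_ssize_t i, j
    cdef cnp.ndarray[cnp.float64_t, ndim=1] out = np.zeros(arr.shape[0])
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            out[i] += arr[i, j]    # explicit loops replace numpy broadcasting
    return out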
You can take a look at scipy.weave. You can use scipy.weave.blitz to transparently translate your expression into C++ code and run it. It will handle slicing automatically and get rid of temporaries, but you claim that the body of your for loop does not create temporaries, so your mileage may vary.
However, if you want to replace your entire for loop with something more efficient, then you could make use of scipy.weave.inline. The drawback is that you have to write C++ code. This should not be too hard, because you can use Blitz++ syntax, which is very close to numpy array expressions. Slicing is directly supported; broadcasting, however, is not.
There are two workarounds.
The first is to use the numpy C API with multi-dimensional iterators. They handle broadcasting transparently; however, you are invoking the numpy runtime, so there might be some overhead. The other, possibly simpler, option is to use the usual matrix notation for broadcasting: broadcast operations can be written as outer products with a vector of all ones. The good thing is that Blitz++ will not actually create these temporary broadcasted arrays in memory; it will figure out how to wrap them into an equivalent loop.
For the second option, take a look at http://www.oonumerics.org/blitz/docs/blitz_3.html#SEC88 for index placeholders. As long as your array has fewer than 11 dimensions, you are fine. This link shows how they can be used to form outer products: http://www.oonumerics.org/blitz/docs/blitz_3.html#SEC99 (search for "outer products" to go to the relevant part of the document).
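In numpy notation, the outer-product trick looks like this (a sketch; Blitz++ expresses the same thing with index placeholders and never materializes the temporaries):

import numpy as np

a = np.arange(3.0)    # shape (3,)
b = np.arange(4.0)    # shape (4,)

# Broadcasting in numpy:
broadcast = a[:, None] + b[None, :]             # shape (3, 4)

# The same result written as outer products with vectors of ones,
# which is the form the Blitz++ index placeholders can express:
explicit = np.outer(a, np.ones(4)) + np.outer(np.ones(3), b)

assert np.allclose(broadcast, explicit)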
Besides using Cython, you can write the bottleneck part(s) in Fortran and then use f2py to compile them into a Python extension module (a .pyd/.so file).
In order to save space and the complexity of having to maintain the consistency of data between different sources, I'm considering storing start/end indices for some substrings instead of storing the substrings themselves. The trick is that if I do so, it's possible I'll be creating slices ALL the time. Is this something to be avoided? Is the slice operator fast enough I don't need to worry? How about the new object creation/destruction overhead?
Okay, I learned my lesson. Don't optimize unless there's a real problem you're trying to fix. (Of course this doesn't mean to write needlessly bad code, but that's beside the point...) Also, test and profile before coming to Stack Overflow. =D Thanks everyone!
Fast enough as opposed to what? How do you do it right now? What exactly are you storing, what exactly are you retrieving? The answer probably highly depends on this. Which brings us to ...
Measure! Don't discuss and analyze theoretically; try and measure what is the more performant way. Then decide whether the possible performance gain justifies refactoring your database.
Edit: I just ran a test measuring string slicing versus lookup in a dict keyed on (start, end) tuples. It suggests that there's not much of a difference. It's a pretty naive test, though, so take it with a pinch of salt.
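For the curious, a sketch of that kind of test (the string size, slice width, and repeat counts are all arbitrary):

import timeit

setup = """
s = 'x' * 10000
indices = [(i, i + 100) for i in range(0, 9000, 90)]
cache = dict(((i, j), s[i:j]) for (i, j) in indices)
"""

slicing = timeit.timeit("[s[i:j] for (i, j) in indices]", setup=setup, number=10000)
lookup = timeit.timeit("[cache[(i, j)] for (i, j) in indices]", setup=setup, number=10000)
print(slicing, lookup)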
In a comment the OP mentions bloat "in the database" -- but no information regarding what database he's talking about; from the scant information in that comment it would seem that Python string slices aren't necessarily what's involved, rather, the "slicing" would be done by the DB engine upon retrieval.
If that's the actual situation, then I would recommend, on general principles, against storing redundant information in the DB: a "normal form" (maybe in a lax sense of the expression;-) whereby information is stored just once and derived information is recomputed (or cached, at the DB engine's charge, etc;-) should be the norm, and "denormalization" by deliberately storing derived information very much the exception, and only when justified by specific, well-measured retrieval-performance needs.
If the reference to "database" was a misdirection;-), or rather used in a lax sense as I did for "normal form" above;-), then another consideration may apply: since Python strings are immutable, it would seem natural to not have to do slices by copying, but rather have each slice reuse part of the memory space of the parent it's being sliced from (much as is done for numpy arrays' slices). However, that's not currently part of the Python core. I did once try a patch to that purpose, but the problem of adding a reference to the big string, and thus making it stay in memory just because a tiny substring thereof is still referenced, loomed large for general-purpose adoption. Still, it would be possible to make a special-purpose subclass of string (and one of unicode) for the case in which the big "parent" string needs to stay in memory anyway. Currently buffer does a tiny bit of that, but you can't call string methods on a buffer object (without explicitly copying it to a string object first), so it's only really useful for output and a few special cases... but there's no real conceptual block against adding string methods (I doubt that would be adopted in the core, but it should be decently easy to maintain as a third-party module anyway;-).
The worth of such an approach can hardly be solidly proven by measurement one way or the other: speed would be very similar to the current implicitly-copying approach; the advantage would come entirely in terms of reduced memory footprint, which wouldn't so much make any given Python code faster, but rather allow a certain program to execute on a machine with a bit less RAM, or multi-task better when several instances are used at the same time in separate processes. See rope for a similar but richer approach once experimented with in the context of C++ (but note it didn't make it into the standard;-).
I haven't done any measurements either, but since it sounds like you're already taking a C approach to a problem in Python, you might want to take a look at Python's built-in mmap library:
Memory-mapped file objects behave like both strings and like file objects. Unlike normal string objects, however, these are mutable. You can use mmap objects in most places where strings are expected; for example, you can use the re module to search through a memory-mapped file. Since they’re mutable, you can change a single character by doing obj[index] = 'a', or change a substring by assigning to a slice: obj[i1:i2] = '...'. You can also read and write data starting at the current file position, and seek() through the file to different positions.
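For instance, a minimal sketch (it assumes an existing file 'data.bin' at least 20 bytes long; the filename is made up):

import mmap

with open('data.bin', 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)    # map the whole file
    chunk = mm[10:20]                # slice it like a string, without reading the rest
    mm[0:4] = b'\x00' * 4            # mutate four bytes in place
    mm.close()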
I'm not sure from your question if that's exactly what you're looking for. And it bears repeating that you need to take some measurements. Python's timeit library is the easy one to use, but there's also cProfile or hotshot, although hotshot is at risk of being removed from the standard library as I understand it.
Would slices be inefficient because they create copies of the source string? This may or may not be an issue. If it turns out to be, wouldn't it be possible to simply implement a "string view": an object that holds a reference to the source string plus a start and end point? Upon access/iteration, it would just read from the source string.
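A minimal sketch of such a view (the class name is made up; it copies only when the text is actually requested):

class StringView:
    def __init__(self, source, start, end):
        self.source, self.start, self.end = source, start, end

    def __len__(self):
        return self.end - self.start

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.source[self.start + i]   # read through to the parent

    def __str__(self):
        return self.source[self.start:self.end]   # copy only on demand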
Premature optimization is the root of all evil.
Prove to yourself that you really have a need to optimize code, then act.