I have a deque object that holds a large amount of data. I want to extract, say, 4096 elements from the front of the queue (I'm using it as a kind of FIFO). It seems like there should be a way of doing this without having to iterate over 4096 pop requests.
Is this correct/efficient/stupid?
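from collections import deque
from numpy import arange

A = arange(100000)
B = deque()
C = []  # List will do
B.extend(A)  # Nice large deque
# extract 4096 elements
for i in xrange(4096):  # use range on Python 3
    C.append(B.popleft())  # pop from the deque B, not from the array A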
There is no multi-pop method for deques. You're welcome to submit a feature request to bugs.python.org and I'll consider adding it.
I don't know the details of your use case, but if your data comes in blocks of 4096, consider storing the blocks in tuples or lists and then adding the blocks to the deque:
block = data[:4096]
d.append(block)
...
someblock = d.popleft()
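The block-based idea can be sketched as follows. This is only an illustration under assumptions: `BLOCK_SIZE` and the sample `data` are made up here (the question uses 4096), and the incoming data is assumed to be sliceable.

```python
from collections import deque

BLOCK_SIZE = 4  # 4096 in the question; kept small here for illustration

data = list(range(10))  # stand-in for the real data (assumption)
d = deque()
for start in range(0, len(data), BLOCK_SIZE):
    # one deque entry per block; the last block may be shorter
    d.append(data[start:start + BLOCK_SIZE])

someblock = d.popleft()  # a whole block comes off the front in a single pop
```

Each pop then hands back a full block, so no per-element loop is needed on the consumer side.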
When you're using a deque, .popleft() is really the best way of getting elements off the front. You can index into it, but index performance degrades toward the middle of the deque (as opposed to a list, which has quick indexed access but slow pops from the front). You could get away with this, though (it saves a few lines of code):
A = arange(100000)
B = deque(A)
C = [B.popleft() for _i in xrange(4096)]
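Since there is no multi-pop method, the loop can be wrapped in a small helper. A minimal sketch: `pop_many` is a hypothetical name, not part of `collections`, and it is written to stop early when the deque holds fewer than `n` items rather than raise `IndexError`.

```python
from collections import deque

def pop_many(d, n):
    """Pop up to n items from the left of deque d and return them as a list.

    Hypothetical helper, not part of collections.deque.
    """
    count = min(n, len(d))  # avoid IndexError on a short deque
    return [d.popleft() for _ in range(count)]

fifo = deque(range(10))
first = pop_many(fifo, 4)    # takes 0, 1, 2, 3 off the front
rest = pop_many(fifo, 100)   # only 6 items remain, so 6 are returned
```

This is still one `popleft()` per element under the hood; it only tidies up the call site.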
If I have a list that is made up of 1MM ids, how would I pull from that list in intervals of 50k?
For example:
[1] cusid = df['customer_id'].unique().tolist()
[1] 1,000,500
If I want to pull in chunks, is the below correct for 50k?
cusid=cusid[:50000] - first 50k ids
cusid=cusid[50000:100001] - the next 50k of ids
cusid=cusid[100001:150001] - the next 50k
are my interval selections correct?
Thanks!
cusid2 = [cusid[a:a+50000] for a in range(0, len(cusid), 50000)]
This is a list comprehension: for each a from 0 up to the length of the list, in steps of 50k, it adds the slice cusid[a:a+50000] to the new list. A slice excludes its end index, so consecutive chunks are cusid[0:50000], cusid[50000:100000], cusid[100000:150000], and so on; the last chunk simply holds whatever is left.
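To double-check the interval boundaries, here is a small sketch that uses a stand-in list of 1,000,500 integers in place of the real IDs (an assumption) and inspects the resulting chunks. Note that a slice such as `cusid[50000:100001]` from the question would hold 50,001 IDs, not 50,000.

```python
CHUNK = 50000
cusid = list(range(1000500))  # stand-in for the real ID list (assumption)

# step through the list by CHUNK; the stop value is the list length,
# so the final partial chunk is included
chunks = [cusid[a:a + CHUNK] for a in range(0, len(cusid), CHUNK)]

n_chunks = len(chunks)        # 21: twenty full chunks plus one partial
last_size = len(chunks[-1])   # 500 leftover IDs
second_start = chunks[1][0]   # 50000: the second chunk starts where the first stopped
```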
A couple of things to mention:
It seems that you're using the "data science" stack for your work, so there's a good chance you have numpy available; please take a look at numpy.array_split. You can calculate the number of chunks once and rely on numpy's view machinery. This is most probably a lot faster than converting numpy arrays into native Python lists.
The idiomatic Python approach (IMO) would be to leverage iterators and islice:
from itertools import islice
# create an iterator from your array/list; this is a cheap operation
iterator = iter(cusid)
# if you want element-wise operations, you can use the chunk in loops or functions that iterate
# this is really memory-efficient, as the whole chunk is never held in memory
chunk = islice(iterator, 50000)
s = sum(chunk)
# in case you really need the whole chunk in memory, just turn the islice into a list
chunk = list(islice(iterator, 50000))
last_in_chunk = chunk[-1]
# and you always use the same code to consume the next chunk from your source
# without maintaining any counters
next_chunk = list(islice(iterator, 50000))
When your iterator is exhausted (there are no values left), you will get empty chunks. When there are not enough elements left to fill a whole chunk, you will get as many as remain.
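The exhaustion behaviour described above makes it easy to loop until the source runs dry. A minimal sketch, where `iter_chunks` is a hypothetical helper name and the empty chunk is used as the stop signal:

```python
from itertools import islice

def iter_chunks(iterable, size):
    """Yield successive lists of up to `size` items (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:  # an empty list means the source is exhausted
            return
        yield chunk

# ten items in chunks of four: two full chunks and one partial chunk
sizes = [len(c) for c in iter_chunks(range(10), 4)]
```

With the real data this would be used as `for chunk in iter_chunks(cusid, 50000): ...`.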
I want to check a condition against the front of a queue before deciding whether or not to pop. How can I achieve this in python with collections.deque?
list(my_deque)[0]
seems ugly and poor for performance.
TL;DR: assuming your deque is called d, just inspect d[0], since the "leftmost" element in a deque is the front (you might want to check first that the deque is not empty). Taking @asongtoruin's suggestion, use if d: to test whether the deque is non-empty (it's equivalent to if len(d) > 0:, but more Pythonic).
### Why not convert to a list?
Because deques are indexable and you're testing the front. While a deque has an interface similar to a list, the implementation is optimized for front- and back- operations. Quoting the documentation:
Deques support thread-safe, memory efficient appends and pops from
either side of the deque with approximately the same O(1) performance
in either direction.
Though list objects support similar operations, they are optimized for
fast fixed-length operations and incur O(n) memory movement costs for
pop(0) and insert(0, v) operations which change both the size and
position of the underlying data representation.
Converting to list might be desirable if you have lots of operations accessing the "middle" of the queue. Again quoting the documentation:
Indexed access is O(1) at both ends but slows to O(n) in the middle.
For fast random access, use lists instead.
Conversion to list is O(n), but every subsequent access is O(1).
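Putting the pieces together, a peek-then-pop can be sketched as below. `pop_if` is a hypothetical helper name, and returning `None` when nothing is popped is an assumption that only works if `None` is never a legitimate queue item.

```python
from collections import deque

def pop_if(d, predicate):
    """Pop and return the front item if predicate(front) is true, else None.

    Hypothetical helper; assumes None is never stored in the deque.
    """
    if d and predicate(d[0]):  # `if d` guards against an empty deque
        return d.popleft()
    return None

q = deque([100, 200, -100])
a = pop_if(q, lambda x: x > 50)  # front is 100, condition holds, so it is popped
b = pop_if(q, lambda x: x < 0)   # front is 200, condition fails, nothing is popped
```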
You can simply read the first element with my_deque[0], and the last element with my_deque[-1] or my_deque[len(my_deque)-1].
Here is a simple implementation that lets me check the front of the queue before popping (using while and q[0]).
Apply your own condition against q[0] before q.popleft(), below:
from collections import deque

testLst = [100, 200, -100, 400, 340]
q = deque(testLst)
while q:
    print(q)
    print('{}{}'.format("length of queue: ", len(q)))
    print('{}{}'.format("head: ", q[0]))
    print()
    q.popleft()
output:
deque([100, 200, -100, 400, 340])
length of queue: 5
head: 100
deque([200, -100, 400, 340])
length of queue: 4
head: 200
deque([-100, 400, 340])
length of queue: 3
head: -100
deque([400, 340])
length of queue: 2
head: 400
deque([340])
length of queue: 1
head: 340
Assuming your deque comes from Python's collections module:
from collections import deque
d = deque()  # create an empty deque
A deque can also be treated like a list when it comes to accessing elements by index.
You can peek at the front element with d[0] and at the last element with d[-1].
This works without popping elements from the left or right, and it is efficient too: indexed access is O(1) at both ends.
I have a very large (say a few thousand) list of partitions, something like:
[[9,0,0,0,0,0,0,0,0],
[8,1,0,0,0,0,0,0,0],
...,
[1,1,1,1,1,1,1,1,1]]
What I want to do is apply to each of them a function (which outputs a small number of partitions), then put all the outputs in a list and remove duplicates.
I am able to do this, but the problem is that my computer gets very slow if I put the above list directly into the Python file (especially when scrolling). What is making it slow? If it is the memory being used to load the whole list, is there a way to put the partitions in another file and have the function read the list term by term?
EDIT: I am adding some code. My code is probably very inefficient because I'm quite an amateur. So what I really have is a list of lists of partitions, that I want to add to:
listofparts3 = [[[3],[2,1],[1,1,1]],
                [[6],[5,1],...,[1,1,1,1,1,1]],...]
def addtolist3(n):
    a=int(n/3)-2
    counter = 0
    added = []
    for i in range(len(listofparts3[a])):
        first = listofparts3[a][i]
        if len(first)<n:
            for i in range(n-len(first)):
                first.append(0)
        answer = lowering1(fock(first),-2)[0]
        for j in range(len(answer)):
            newelement = True
            for k in range(len(added)):
                if (answer[j]==added[k]).all():
                    newelement = False
                    break
            if newelement==True:
                added.append(answer[j])
        print(counter)
        counter = counter+1
    for i in range(len(added)):
        added[i]=partition(added[i]).tolist()
    return(added)
fock, lowering1 and partition are all functions from earlier code; they are pretty simple. The above function, say addtolist3(24), takes all the partitions of 21 that I have and returns the desired list of partitions of 24, which I can then append to the end of listofparts3.
A few thousand partitions uses only a modest amount of memory, so that likely isn't the source of your problem.
One way to speed up function application is to use map() in Python 3 or itertools.imap() in Python 2.
The fastest way to eliminate duplicates is to feed them into a Python set() object. Lists are not hashable, so convert each partition to a tuple first.
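A minimal sketch of the map-plus-set approach. The function `lower` is a toy stand-in invented for this example (the real `fock`, `lowering1` and `partition` are not shown in the question), and each output partition is converted to a tuple so it can go into a set.

```python
def lower(partition):
    """Toy stand-in for the real function (assumption): remove one unit from
    each nonzero part in turn and return the resulting partitions."""
    results = []
    for i, part in enumerate(partition):
        if part > 0:
            new = list(partition)
            new[i] -= 1
            results.append(sorted(new, reverse=True))
    return results

partitions = [[3, 0, 0], [2, 1, 0], [1, 1, 1]]

unique = set()
for outputs in map(lower, partitions):
    # lists are unhashable, so store each partition as a tuple
    unique.update(tuple(p) for p in outputs)

result = sorted(unique, reverse=True)
```

Set membership is a hash lookup, which replaces the inner loop that compares every new element against everything in `added`.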
Is there any way of dynamically adding elements and at the same time removing some of them? Preferably in MATLAB.
For example, let's say I am streaming data from a sensor. Since it will be streaming forever, I would like to keep only the last, say, 100 samples/elements of the vector.
You could try the Queue module in Python (named queue in Python 3):
from Queue import Queue  # Python 3: from queue import Queue
q = Queue()
To enqueue at the back: q.put(x)
To dequeue from the front: q.get()
You can also use deque from collections (in case you have more advanced requirements):
from collections import deque
d = deque([])
To enqueue at the back: d.append(x)
To enqueue at the front: d.appendleft(x)
To dequeue from the back: d.pop()
To dequeue from the front: d.popleft()
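For the "keep only the last 100 samples" case specifically, `collections.deque` accepts a `maxlen` argument that discards the oldest item automatically once the buffer is full. A short sketch, with a plain `range` standing in for the sensor stream (an assumption):

```python
from collections import deque

buffer = deque(maxlen=100)  # holds at most the 100 newest samples

for sample in range(250):   # stand-in for streaming sensor data
    buffer.append(sample)   # once full, the oldest sample is dropped automatically

oldest, newest = buffer[0], buffer[-1]
```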
There's no formal queue data structure for this in Matlab, but the basic case you describe can be implemented quite simply with clever use of indexing and max:
d = []; % Allocate empty array
n = 100; % Max length of buffer/queue
% A loop example for illustration
for i = 1:1e3
    x = rand(1,3); % Data to append
    d = [d(max(end-n+1+length(x),1):end) x]; % Append to end, remove from front if needed
end
The above assumes the appended data, x, is a row vector of length 0 to n. You can easily modify this to append to the front, etc. It could also be turned into a function.
You can also find classes that implement various forms of queues on the MathWorks File Exchange, e.g., this one.
Is there a way to remove elements from the start of a long list of numbers? Right now I am doing del arr[i:i+x] but it is slow since it has to move everything past that point to the left, which is time-consuming for large lists.
I looked into deques but not sure if those apply here. Could use some direction!
Yes, deques do apply here and you should use them. Deletion will be very fast if the slice is very near the front, but slower if the start index is located toward the middle.
Indexed access is O(1) at both ends but slows to O(n) in the middle.
>>> from collections import deque
>>> def delete_slice(d, start, stop):
...     d.rotate(-start)
...     for i in range(stop-start): # use xrange on Python 2
...         d.popleft()
...     d.rotate(start)
...
>>> d = deque(range(15))
>>> delete_slice(d, 5, 10)
>>> d
deque([0, 1, 2, 3, 4, 10, 11, 12, 13, 14])
Note: Rotating past the middle, as previously stated, will be slow. If you want to support fast deletions from the right side, you can extend the code like so:
def delete_slice(d, start, stop):
    stop = min(stop, len(d)) # don't go past the end
    start = min(start, stop) # don't go past stop
    if start < len(d) // 2:
        d.rotate(-start)
        for i in range(stop-start): # use xrange on Python 2
            d.popleft()
        d.rotate(start)
    else:
        n = len(d) - stop
        d.rotate(n)
        for i in range(stop - start):
            d.pop()
        d.rotate(-n)
Of course there is some other error checking you will need to do but I'll leave that out of here for simplicity's sake. Unfortunately these methods are not already provided by deque itself, so you have to implement them like this.
To implement deque slicing, use a similar approach applying rotate() to bring a target element to the left side of the deque. Remove old entries with popleft(), add new entries with extend(), and then reverse the rotation. With minor variations on that approach, it is easy to implement Forth style stack manipulations such as dup, drop, swap, over, pick, rot, and roll.
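The rotate/popleft/extend recipe can be sketched as a slice replacement. `replace_slice` is a hypothetical helper name, and the sketch assumes `0 <= start <= stop <= len(d)` with no further error checking. Because `extend()` adds the new entries on the right, the reverse rotation has to cover them as well.

```python
from collections import deque

def replace_slice(d, start, stop, new_items):
    """Replace d[start:stop] with new_items, in place.

    Hypothetical helper; assumes 0 <= start <= stop <= len(d).
    """
    new_items = list(new_items)
    d.rotate(-start)                   # bring the element at `start` to the left end
    for _ in range(stop - start):
        d.popleft()                    # remove the old entries
    d.extend(new_items)                # add the new entries on the right
    d.rotate(start + len(new_items))   # reverse the rotation, carrying the new entries along

d = deque(range(8))
replace_slice(d, 2, 5, ['a', 'b'])     # replaces 2, 3, 4 with 'a', 'b'
```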
Yes, deque applies here. I have prepared an example that shows how to use it:
import collections
# create deque from list
d = collections.deque([1, 2, 3, 4, 5, 6])
# remove first element
d.popleft()
print(d)
Output:
deque([2, 3, 4, 5, 6])
I'm thinking you probably want a tree or skiplist.
I did a study of Python tree implementations a while back:
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/
You might be better off asking about this in the algorithms section of the site.
If you're doing several deletions in a row, it might be more efficient to create a new list using a list comprehension with a filter:
arr = [e for e in arr if not rejected(e)]
If you need to work with indexes, you can use enumerate:
arr = [e for i, e in enumerate(arr) if not rejected(i)]
Both operations are O(n) (O(2*n) in space), while performing several deletions in a row is O(n*m) (but O(n) in space).
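As an illustration of the index-based variant, several ranges of indices can be dropped in one pass by collecting them in a set first. The index ranges below are made-up example values:

```python
arr = list(range(20))

# indices to drop (example values): 3..7 and 12..14
to_delete = set(range(3, 8)) | set(range(12, 15))

# one O(n) pass instead of one O(n) `del` per range
arr = [e for i, e in enumerate(arr) if i not in to_delete]
```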
deque has this characteristic, which may not be what you want:
Indexed access is O(1) at both ends but slows to O(n) in the middle.
Lists are not optimized for appending or popping at the front. Deques are, though.
Note also that the deque.remove(item) in the CPython implementation will search from the front. If you need to move/remove queue items frequently, you may benefit from considering whether most items to be removed will be near the end and use the queue flipped if that is the case, i.e. appendleft and pop instead of append and popleft.