Removing n multiple items from a list - python

Question
I want to remove items from a list such that I keep the first n items, and remove the next 2n items.
For example
for n=8, I want to keep the first 8, remove the next 16 and repeat this as necessary:
a = range(48)
Which I want to become
[0,1,2,3,4,5,6,7,24,25,26,27,28,29,30,31]
This is to pick out the first 8 hours of a day, and run a function on each hour.
I've found it hard to phrase this in search queries so the answer is probably simple but I've had no luck!

You could just use a list comprehension:
[a[i] for i in range(len(a)) if i % 24 < 8]
The above only creates a new list. If you want to edit the list in place, you must explicitly delete the unwanted elements, starting from the end so that the indexes still to be visited do not change:
# index 0 is always kept, so the loop can stop at 1
for i in range(len(a) - 1, 0, -1):
    if i % 24 >= 8:
        del a[i]
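If the goal is to shrink the existing list object without an index loop, slice assignment is another option. The following is a minimal sketch, assuming Python 3; the helper name keep_block_heads and its parameters keep and period are made up for illustration and do not come from the answers here.

```python
def keep_block_heads(items, keep, period):
    """Keep the first `keep` items of every `period`-sized block, in place."""
    # Slice assignment replaces the contents of the same list object,
    # so other references to `items` see the change too.
    items[:] = [x for i, x in enumerate(items) if i % period < keep]


a = list(range(48))
keep_block_heads(a, keep=8, period=24)
print(a)
```

With keep=8 and period=24 this matches the "first 8 hours of each day" case from the question.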

def hours(n):
    items = list(range(48))
    start = n
    while start < len(items):
        del items[start:start + 2 * n]  # drop the 2n items after each kept block
        start += n  # move past the next n items, which are kept
    print(items)

hours(8)
Depending on how new you are, you might have a hard time understanding this code, so I will try to explain a little:
We start by creating a function that takes a parameter n (8 for test purposes). We then build the list of numbers 0 to 47 and delete the unneeded elements with the del statement: after each block of n kept items, we delete the next 2n items. For example, with n = 9 the first deletion is del items[9:27]; once those items are gone, the next block to keep starts at index 9, so the following deletion is del items[18:36], and so on until the end of the list.
Hope this makes sense.

This should be quite easy to understand:
a = range(48)
n = 8
result = []
while a:
    result += a[:n]
    a = a[n*3:]
print(result)

Related

Dynamic substrings on List. 10 elements before variable

I have a problem with dynamic substrings (slices of a list). I have a list which can have 1000 elements, 100 elements, or even 20. I want to make a copy of that list containing the 10 elements that come before the index held in a variable.
For example(pseudo-code):
L = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
variable = 12
print L[substring:variable]
>>> L = [2,3,4,5,6,7,8,9,10,12]
I can't figure out how to make it correct. The point is that the variable always increases by one.
Here is my piece of code:
def Existing(self, Pages):
    if(self.iter <= 10):
        list = self.other_list[:self.iter]
    else:
        list = self.other_list[self.iter-10:self.iter]
    result = 0
    page = Pages[0]
    list.reverse()
    for blocks in Pages:
        if(list.index(blocks) > result):
            result = list.index(blocks)
            page = blocks
    return page
That method looks for the element which has the farthest index.
This part can be unclear, so assume that we have
list = [1,2,3,4,1,5,2,1,2,3,4]
The method should return 5, because it is the farthest element. The list has duplicates and .index() returns the index of the first match, so I reverse the list. With that code, the program sometimes reports that some element does not exist in the list. The problem (after a deep review with the debugger) is with the slices of self.other_list.
Could you help me with that problem? How can I make it correct? Thanks for any advice.
EDIT: Because my problem was not clear enough, here are more examples.
Okay, so Pages is a list which contains the pages that are currently in use. The second list, "list", contains all the pages which HAVE BEEN used. The method looks at the pages that are currently in use and chooses the one which has not been used for the longest time. By "use" I mean the index of the element. What does the farthest element mean? The one with the smallest index (remember the duplicates: the last duplicate gives the real index).
So we have:
Pages = [1,3,5,9]
and
list = [1,2,5,3,6,3,5,1,2,9,3,2]
Method should return 5.
To sum up:
I'm looking for a slice which gives this result:
With list =[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
For variable 12: [2,3,4,5,6,7,8,9,10,12]
for 13: [3,4,5,6,7,8,9,10,11,13]
etc. :-)
I know that the problem can be complicated, so I would ask you to focus only on the slicing. :-) Thank you very much!
If I understood your problem correctly, you want to find the item from pages whose last occurrence in lst is at the smallest position (taking duplicates into consideration).
So, for this you first reverse the list and then find the index of each item of pages in the reversed lst; if an item is not found, return negative infinity. The item with the largest of those indices is your answer.
from functools import partial
pages = [1,3,5,9]
lst = [1,2,5,3,6,3,5,1,2,9,3,2]
def get_index(seq, i):
    try:
        return seq.index(i)
    except ValueError:
        return float('-inf')
lst.reverse()
print(max(pages, key=partial(get_index, lst)))
#5
Note that the above method takes quadratic time, so it won't perform well for huge lists. If you can spare some additional memory and want linear time, you can use a set and a dict instead:
pages_set = set(pages)
d = {}
# lst is assumed to be in its original (unreversed) order here
for i, k in enumerate(reversed(lst), 1):
    if k not in d and k in pages_set:
        d[k] = len(lst) - i
print(min(d, key=d.get))
#5
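To tie this back to the slicing part of the question, here is a minimal sketch, assuming Python 3, that combines the "last 10 entries before a position" window with the last-occurrence lookup. The function name least_recently_used and the parameters end and window are hypothetical names chosen for illustration. Pages that do not occur in the window are ignored rather than raising an error, which may be the situation behind the "element does not exist" failure described in the question.

```python
def least_recently_used(pages, history, end, window=10):
    """Return the page from `pages` whose last use in the window is oldest."""
    # Clamp the start at 0 so a short history never produces a negative index,
    # which would silently slice from the end of the list.
    recent = history[max(0, end - window):end]
    # Later occurrences overwrite earlier ones, leaving each page's last index.
    last_seen = {page: idx for idx, page in enumerate(recent)}
    candidates = [p for p in pages if p in last_seen]
    if not candidates:
        return None  # none of the pages appears in the window
    return min(candidates, key=last_seen.get)


pages = [1, 3, 5, 9]
history = [1, 2, 5, 3, 6, 3, 5, 1, 2, 9, 3, 2]
print(least_recently_used(pages, history, end=len(history), window=len(history)))
```

Building the dictionary is a single pass over the window, so the lookup stays linear in the window size.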

Loop in a list (Python v3) beginner

I am stuck in making a loop that will eliminate the values(from the alist) that are below average.
Thanks for the help.
a=input("Enter a list of values separated by a comma : ")
alist=eval(a)
print("the list is : ",alist)
average = sum(alist)/len(alist)
print("the average is : ",average)
for i in alist:
    if alist[i]<average:
        alist.remove[i]
You are almost there. Instead of removing elements, select the elements you want to retain:
alist = [a for a in alist if a>=average]
Your mistake here is that for i in alist: is iterating over list elements themselves, not indexes, so alist[i] is throwing an error (or returning nonsense).
For the "loop" you can use a filter and a lambda function.
above_average = list(filter(lambda x: x >= average, alist))
For the rest of your code, I suggest you clean it up into something safer (eval on user input can execute arbitrary code):
import ast
user_string = input('Input a list of numbers separated by commas: ')
alist = list(ast.literal_eval(user_string))
So, in all, I would write your code as something like this:
import ast
user_string = input('Input a list of numbers separated by commas: ')
numbers = list(ast.literal_eval(user_string))
average = sum(numbers)/len(numbers)
print('The numbers: {}'.format(numbers))
print('The average: {}'.format(average))
above_average = list(filter(lambda x: x >= average, numbers))
# now do what you want with the above_average numbers.
Other answers tell you how to do it. I'll tell you why it doesn't work:
You iterate over the list and, at the same time, modify it.
This leads to items being missed during the iteration.
Why?
Internally, the iteration works via an index to the list. So it is the same as doing
idx = 0
while True:
    try:
        i = alist[idx]
    except IndexError:
        break
    idx += 1
    if alist[i] < average:
        alist.remove(i)
What happens if you are at the element #3, go to the next one and then remove #3? Right, the indexes of the remaining ones move down and you are pointing to the one which formerly was #5. The old #4 is skipped at this test.
(BTW, I don't know if you noticed, I have replaced your [] behind .remove with ().)
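The skipping is easy to reproduce on a small list. This is a minimal sketch, assuming Python 3; the function name and the sample values are made up for illustration, and the loop iterates over elements (not indexes) so that only the "modify while iterating" problem remains.

```python
def remove_below_average_buggy(values):
    """Remove below-average items while iterating over the same list (buggy)."""
    average = sum(values) / len(values)
    for v in values:
        if v < average:
            values.remove(v)  # shifts later items left; the next one is skipped
    return values


print(remove_below_average_buggy([1, 2, 3, 10]))  # 2 is below 4.0 but survives
```

Removing 1 moves 2 into the slot that was just visited, so the loop continues with 3 and never tests 2.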
You are mixing two ways of iterating a list: By index, and by element. In your loop, i is not the index, but the element of the list itself, thus alist[i] won't work.
If you use the for x in somelist loop, then x is the element itself, not the index of the element. For iterating over the indices, you can use for i in range(len(somelist)), or you could use for i, x in enumerate(somelist) to loop over tuples of index and element.
Also note that removing elements from a list or other kinds of collections while you are looping over them is generally a bad idea. It is better to loop over a copy of the list:
for x in list(alist):  # creates a copy of alist
    if x < average:  # remember: x is the element itself
        alist.remove(x)  # remove element x from list
But the way you do it (with eval of a comma-separated string of numbers), alist is a tuple, not a list, and thus has no remove method at all. Thus you either have to convert it to a list first (alist = list(eval(a))), or use one of the approaches given in the other answers, creating a new list using a list comprehension or filter and retaining the "good" elements.
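The points in this answer can be put together in one short sketch, assuming Python 3. The input string is a made-up sample, and ast.literal_eval stands in for eval here; the names parsed, numbers and removed are illustrative only.

```python
import ast

parsed = ast.literal_eval("3, 1, 4, 10")  # a comma-separated string parses to a tuple
numbers = list(parsed)                    # convert so that list methods are available
average = sum(numbers) / len(numbers)     # 4.5 for this sample

# enumerate yields (index, element) pairs; the loop runs over a copy,
# so the indexes refer to positions in the original order.
removed = []
for index, value in enumerate(list(numbers)):
    if value < average:
        removed.append((index, value))
        numbers.remove(value)

print(type(parsed).__name__, removed, numbers)
```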
As a general principle for asking StackOverflow questions like this, you should always include example input and output -- show what happens, and what you expect to happen.
In this case, I believe there are three problems with your code:
Edit: Third, but possibly most importantly, look at glglgl's answer. If you implement the two fixes I describe below, you'll still have one problem: your code won't necessarily remove all the items you want to remove, because it'll skip over some items.
First, you say alist[i], which grabs the element of alist at index i. But saying for i in alist makes i be successive elements in the list already. Example:
mylist = [1, 2, 4]
for i in mylist:
    print(i)
Would give you the output:
1
2
4
If you instead said this (which is like what you wrote)
mylist = [1, 2, 4]
for i in mylist:
    print(mylist[i])
It wouldn't work as you'd expect, because you'd get the element at index 1, the element at index 2, and then try to get the element at index 4, but that wouldn't exist. You'll get something like this:
2
4
IndexError: list index out of range
Second, your syntax for removing an element is wrong. You should use alist.remove(i) instead of alist.remove[i]. You want to call a function, so you use parentheses. The square brackets are for indexing and slicing.

Multiple mismatches in DNA search sequence regex

I have written this barbaric script to create permutations of a string of characters that contain n (up to n=4) $'s in all possible combinations of positions within the string. I will eventually .replace('$','(\\w)') to use for mismatches in a DNA search sequence. Because of the way I wrote the script, some of the permutations have fewer than the requested number of $'s. I then wrote a script to remove them, but it doesn't seem to be effective: each time I run the removal script, it removes more of the unwanted permutations.
In the code pasted below, you will see that I test the function with a simple sequence with 4 mismatches. I then run a series of removal scripts that count how many expressions are removed each time. In my experience, it takes about 8 runs to remove all expressions with fewer than 4 wildcard $'s. I have a few questions about this:
1. Is there a built-in function for searches with 'n' mismatches? Maybe even in Biopython? So far, I've seen the Paul_McGuire_regex function:
Search for string allowing for one mismatch in any location of the string,
which seems only to generate 1 mismatch. I must admit, I don't fully understand all of the code in the remaining functions on that page, as I am a very new coder.
2. Since I see this as a good exercise for me, is there a better way to write this entire script? Can I iterate the Paul_McGuire_regex function as many times as I need?
3. Most perplexing to me, why won't the removal script work 100% the first time?
Thanks for any help you can provide!
def Mismatch(Search,n):
    List = []
    SearchL = list(Search)
    if n > 4:
        return("Error: Maximum of 4 mismatches")
    for i in range(0,len(Search)):
        if n == 1:
            SearchL_i = list(Search)
            SearchL_i[i] = '$'
            List.append(''.join(SearchL_i))
        if n > 1:
            for j in range (0,len(Search)):
                if n == 2:
                    SearchL_j = list(Search)
                    SearchL_j[i] = '$'
                    SearchL_j[j] = '$'
                    List.append(''.join(SearchL_j))
                if n > 2:
                    for k in range(0,len(Search)):
                        if n == 3:
                            SearchL_k = list(Search)
                            SearchL_k[i] = '$'
                            SearchL_k[j] = '$'
                            SearchL_k[k] = '$'
                            List.append(''.join(SearchL_k))
                        if n > 3:
                            for l in range(0,len(Search)):
                                if n ==4:
                                    SearchL_l = list(Search)
                                    SearchL_l[i] = '$'
                                    SearchL_l[j] = '$'
                                    SearchL_l[k] = '$'
                                    SearchL_l[l] = '$'
                                    List.append(''.join(SearchL_l))
    counter=0
    for el in List:
        if el.count('$') < n:
            counter+=1
            List.remove(el)
    return(List)
List_RE = Mismatch('abcde',4)
counter = 0
for el in List_RE:
    if el.count('$') < 4:
        List_RE.remove(el)
        counter+=1
print("Filter2="+str(counter))
We can do away with questions 2 and 3 by answering question 1, but understanding question 3 is important so I'll do that first and then show how you can avoid it entirely:
Question 3
As to question 3, it's because when you loop over a list in python and make changes to it within the loop, the list that you loop over changes.
From the python docs on control flow (for statement section):
It is not safe to modify the sequence being iterated over in the loop
(this can only happen for mutable sequence types, such as lists).
Say your list is [a,b,c,d] and you loop through it with for el in List.
Say el is currently a and you do List.remove(el).
Now, your list is [b,c,d]. However, the iterator points to the second element in the list (since it's done the first), which is now c.
In essence, you've skipped b. So the problem is that you are modifying the list you are iterating over.
There are a few ways to fix this: if your List is not expensive to duplicate, you could make a copy. So iterate over List[:] but remove from List.
But suppose it's expensive to make copies of List all the time.
Then what you do is iterate over it backwards. Note the reversed below:
    for el in reversed(List):
        if el.count('$') < n:
            counter+=1
            List.remove(el)
    return(List)
In the example above, suppose we iterate backwards over List.
The iterator starts at d, and then goes to c.
Suppose we remove c, so that List=[a,b,d].
Since the iterator is going backwards, it now points to element b, so we haven't skipped anything.
Basically, this avoids modifying bits of the list you have yet to iterate over.
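The difference between removing during forward iteration and iterating over a copy can be checked on a small input. This is a minimal sketch, assuming Python 3; the function names and the sample patterns are invented for illustration and are not part of the original script.

```python
def drop_short_forward(patterns, n):
    """Buggy: removes from the list it is iterating over."""
    for p in patterns:
        if p.count('$') < n:
            patterns.remove(p)
    return patterns


def drop_short_copy(patterns, n):
    """Iterates over a copy (patterns[:]) and removes from the original."""
    for p in patterns[:]:
        if p.count('$') < n:
            patterns.remove(p)
    return patterns


sample = ['$$c', 'a$c', 'ab$', '$b$', 'abc', 'a$$']
print(drop_short_forward(list(sample), 2))  # 'ab$' slips through
print(drop_short_copy(list(sample), 2))
```

In the forward version, removing 'a$c' moves 'ab$' into the slot that was just visited, so it is never tested. That is why repeated runs of the removal loop keep finding more items.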
Questions 1 & 2
If I understand your question correctly, you basically want to choose n out of m positions, where m is the length of the string (abcde), and place a '$' in each of these n positions.
In that case, you can use the itertools module to do that.
import itertools
def Mismatch(Search,n):
    SearchL = list(Search)
    List = []  # hold output
    # all combinations of n indices to replace with '$'
    idxs = itertools.combinations(range(len(SearchL)),n)
    # for each combination `idx` in idxs, replace chars[i] with '$':
    for idx in idxs:
        chars = SearchL[:]  # make a copy
        for i in idx:
            chars[i]='$'
        List.append( ''.join(chars) )  # convert back to string
    return List
Let's look at how this works:
1. Turn the Search string into a list so its characters can be replaced, and create an empty List to hold the results.
2. idxs = itertools.combinations(range(len(SearchL)),n) says "find all subsets of length n in the set [0,1,2,3,...,length-of-search-string - 1]".
Try
idxs = itertools.combinations(range(5),4)
for idx in idxs:
    print(idx)
to see what I mean.
3. Each element of idxs is a tuple of n indices from 0 to len(SearchL)-1 (e.g. (0,1,2,4)). Replace the i'th character of SearchL with a '$' for each i in the tuple.
4. Convert the result back into a string and add it to List.
As an example:
Mismatch('abcde',3)
['$$$de', '$$c$e', '$$cd$', '$b$$e', '$b$d$', '$bc$$', 'a$$$e', 'a$$d$', 'a$c$$', 'ab$$$']
Mismatch('abcde',4) # note, the code you had made lots of duplicates.
['$$$$e', '$$$d$', '$$c$$', '$b$$$', 'a$$$$']
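For the final step mentioned in the question (turning the '$' placeholders into regex wildcards), the patterns can be joined into one alternation. The following is a sketch only, assuming Python 3 and a search string made of plain letters, so that no regex escaping is needed; the name mismatch_regex is hypothetical. It writes the \w wildcard directly instead of going through '$'. Since \w also matches the original character, the compiled pattern accepts up to n mismatches.

```python
import itertools
import re


def mismatch_regex(search, n):
    """Compile one regex matching `search` with n wildcard positions."""
    alternatives = []
    for idxs in itertools.combinations(range(len(search)), n):
        chars = list(search)
        for i in idxs:
            chars[i] = r'\w'  # wildcard: any single word character
        alternatives.append(''.join(chars))
    return re.compile('|'.join(alternatives))


pattern = mismatch_regex('abcde', 2)
match = pattern.search('xxabXdYxx')
print(match.group() if match else None)
```

The number of alternatives grows with the number of combinations, so this is only practical for short search strings and small n.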

How do I handle the following situation in Python?

I want to say
a[current] = value
rather than saying
a.append(value)
because I want to show that the current value is value. The former listing shows this better. I come from C, so I am a bit confused with python lists. In C I preallocate space, so a[current] would exist and contain junk before I assign it value. Can I do something similar in Python?
You can do something like
[0] * 10
which will result in
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
But your approach will probably not be very "pythonic". If switching to Python, you should also think about thinking in python. ;-)
When I first started using Python I ran into this problem all the time. You can do things like '[0]*10', but it is inelegant, and it will cause you problems if you try to do the same thing for lists of lists.
I finally solved the problem by realizing most of the time I just needed to reformulate the problem into something more pythonic. List comprehensions (as noted above) will almost always be the correct answer. If you are preallocating space, that means you are probably going to iterate over some other list, do some operation, then set an element in the new list to that value:
newList = [op(e) for e in oldList]
You can get fancier. If for example, you don't want all the elements, you can set up a filter:
newList = [op(e) for e in oldList if e < 5]
which will only put items that are less than 5 into the newList.
However, sometimes a list comprehension isn't what you want. For example, if you're doing math-oriented coding, then numpy can come to the rescue:
myVector = numpy.zeros(10)
will create an array with 10 elements.
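The "lists of lists" problem mentioned in this answer comes from the outer multiplication copying references to a single inner list. A minimal sketch, assuming Python 3, with illustrative names rows, cols, shared and independent:

```python
rows, cols = 2, 3

# Multiplying the outer list copies references to ONE inner list.
shared = [[0] * cols] * rows
shared[0][0] = 1

# A comprehension builds a separate inner list for every row.
independent = [[0] * cols for _ in range(rows)]
independent[0][0] = 1

print(shared)       # both rows changed
print(independent)  # only the first row changed
```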
You can allocate a list of length n by using
my_list = [None] * n
Obviously, the list will be initialised rather than containing junk.
That said, note that often a list comprehension is a good replacement for a loop containing calls to list.append().
If you want to create a list with n elements initialized to zero (read "preallocate") in Python, use this:
somelist = [0] * n
Is this what you want?
If you don't like append, you can do things like
a = [None]*10
for i in range(10):
    a[i] = something()
You might also be interested in Python arrays.
I think that the closest syntax would be:
a.insert(current, value)
but if current isn't the last position in the list, insert will allocate some extra space and shift everything from current onward by one position. I don't know if this is the desired behavior. The following code is just like an append:
a.insert(len(a), value)
If you want to show that the current value is 'value', why don't you just use a variable for it?
a.append(value)
current_value = value
If you are maintaining a separate current variable to indicate where the next item will be inserted (that is, your line a[current] = value is followed immediately by current += 1 and you wish you could just write a[current++] = value), then you're writing C in Python and should probably stop. :-)
Otherwise you probably want to preallocate the list with the number of items you want it to contain, as others have shown. If you want a list that will automatically extend to fill in missing values, such that a[100] = value will work even if the list only has 50 items, this can be done with a custom __setitem__() method on a list subclass:
class expandinglist(list):
    def __setitem__(self, index, value):
        length = len(self)
        if index < length:
            list.__setitem__(self, index, value)
        elif index == length:  # you don't actually need this case, it's just a bit
            self.append(value)  # faster than the below for adding a single item
        else:
            self.extend(([0] * (index - length)) + [value])
lyst = expandinglist()
lyst[5] = 5
print(lyst)
>> [0, 0, 0, 0, 0, 5]

Why does Python skip elements when I modify a list while iterating over it?

I'm currently developing a program in python and I just noticed that something was wrong with the foreach loop in the language, or maybe the list structure. I'll just give a generic example of my problem to simplify, since I get the same erroneous behavior on both my program and my generic example:
x = [1,2,2,2,2]
for i in x:
    x.remove(i)
print(x)
Well, the problem here is simple: I thought that this code was supposed to remove all elements from the list. However, after its execution, I always get 2 remaining elements in the list.
What am I doing wrong? Thanks for all the help in advance.
Edit: I don't want to empty a list, this is just an example...
This is a well-documented behaviour in Python, that you aren't supposed to modify the list being iterated through. Try this instead:
for i in x[:]:
    x.remove(i)
The [:] returns a "slice" of x, which happens to contain all its elements, and is thus effectively a copy of x.
When you delete an element and the for loop then increments to the next index, you skip an element.
Do it backwards. Or please state your real problem.
I think, broadly speaking, that when you write:
for x in lst:
    # loop body goes here
under the hood, python is doing something like this:
i = 0
while i < len(lst):
    x = lst[i]
    # loop body goes here
    i += 1
If you insert lst.remove(x) for the loop body, perhaps then you'll be able to see why you get the result you do?
Essentially, python uses a moving pointer to traverse the list. The pointer starts by pointing at the first element. Then you remove the first element, thus making the second element the new first element. Then the pointer moves to the new second (previously third) element. And so on. (It might be clearer if you use [1,2,3,4,5] instead of [1,2,2,2,2] as your sample list.)
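The moving-pointer description can be tried out directly. This is a rough sketch, assuming Python 3, that emulates the for loop with an explicit index as outlined above; it is an illustration of the behaviour, not the interpreter's actual implementation, and the function name is made up.

```python
def remove_while_iterating(lst):
    """Emulate `for x in lst: lst.remove(x)` with an explicit index."""
    i = 0
    while i < len(lst):
        x = lst[i]
        lst.remove(x)  # everything after the removed item shifts left by one
        i += 1         # the index still advances, so one item is skipped
    return lst


print(remove_while_iterating([1, 2, 2, 2, 2]))
print(remove_while_iterating([1, 2, 3, 4, 5]))
```

The first call leaves two elements, as in the question; the second shows more clearly that every other element is skipped.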
Why don't you just use:
x = []
It's probably because you're changing the same array that you're iterating over.
Try Chris Jester-Young's answer if you want to clear the array your way.
I know this is an old post with an accepted answer but for those that may still come along...
A few previous answers have indicated it's a bad idea to change an iterable during iteration. But as a way to highlight what is happening...
>>> x=[1,2,3,4,5]
>>> for i in x:
...     print i, x.index(i)
...     x.remove(i)
...     print x
...
1 0
[2, 3, 4, 5]
3 1
[2, 4, 5]
5 2
[2, 4]
Hopefully the visual helps clarify.
I agree with John Fouhy regarding the break condition. Traversing a copy of the list works for the remove() method, as Chris Jester-Young suggested. But if one needs to pop() specific items, then iterating in reverse works, as Erik mentioned, in which case the operation can be done in place. For example:
def r_enumerate(iterable):
    """enumerator for reverse iteration of an iterable"""
    enum = enumerate(reversed(iterable))
    last = len(iterable)-1
    return ((last - i, x) for i,x in enum)
x = [1,2,3,4,5]
y = []
for i,v in r_enumerate(x):
    if v != 3:
        y.append(x.pop(i))
    print('i=%d, v=%d, x=%s, y=%s' %(i,v,x,y))
or with xrange:
x = [1,2,3,4,5]
y = []
for i in xrange(len(x)-1,-1,-1):
    if x[i] != 3:
        y.append(x.pop(i))
    print('i=%d, x=%s, y=%s' %(i,x,y))
If you need to filter stuff out of a list it may be a better idea to use list comprehension:
newlist = [x for x in oldlist if x%2]
for instance would filter all even numbers out of an integer list
The list is stored in the computer's memory, and the loop keeps a pointer (an index) into it. When you remove an element in a by-element loop, the remaining elements shift down while the pointer still advances to the next position.
You are modifying the list and iterating through it at the same time.
The pointer moves on to the next spot in the list, skipping the element that slid into the current one.
So in the case of the size being 5:
[**0**,1,2,3,4]
remove 0 ---> [1,**2**,3,4] pointer moves to the second position.
remove 2 ---> [1,3,**4**] pointer moves to the third position.
remove 4 ---> [1,3]
I was just explaining this to my students when they used pop(1). It is another very interesting side-effect error.
x=[1,2,3,4,5]
for i in x:
    x.pop(1)
    print(x,i)
[1, **3**, 4, 5] 1 at index 0 it removed the index 1 (2)
[1, **4**, 5] 3 at index 1 it removed the index 1 (3)
[1, 5] 5 at index 2 it removed the index 1 (4)
heh.
They were like, "why isn't this working?" I mean... it did exactly what you told it to do. Python is not a mind reader. :)
