Remove erroneous elements in sequential list - python

Below is a simplified version of an issue I have.
Let's say I have a list of elements as follows:
my_list = [0, 0, 1, 0, 1, 2, 2, 0, 0, 3, 3, 3, 2, 3]
I want it to be sequential from my_list[0] to my_list[-1], so smallest value first, largest is last.
If any element does not follow the sequence, I want to remove it.
So for the above example the output I want is:
my_list = [0, 0, 1, 1, 2, 2, 3, 3, 3, 3]
How can I do this? I know I could just enumerate and check whether the previous element is <= the current one, but with more than one outlier in a row this approach falls apart.
E.g.
new_list = []
for idx, el in enumerate(my_list):
    if idx > 0:
        if my_list[idx-1] <= el:
            new_list.append(el)  # only these values count
output of new_list is:
[0, 1, 1, 2, 2, 0, 3, 3, 3, 3]
So I still get that outlier (0) at index 5.
Note - I know I could sort() the list, but I want to actively remove the outliers, not sort.

Since you want every element in new_list to be greater than or equal to its previous element, you should compare each element with the last element appended to new_list.
Also, new_list should start with the first element rather than empty, or new_list[-1] will fail. The loop then has to skip that first element so it is not appended twice.
new_list = [my_list[0]]
for el in my_list[1:]:
    if el >= new_list[-1]:
        new_list.append(el)

You could do this by comparing each number with the cumulative maximum at its position. The cumulative maximum can be computed using the accumulate() function from itertools. Combining the numbers with their respective cumulative maximum can be achieved using the zip() function:
from itertools import accumulate
my_list = [0, 0, 1, 0, 1, 2, 2, 0, 0, 3, 3, 3, 2, 3]
my_list = [a for a,m in zip(my_list,accumulate(my_list,max)) if a==m]
print(my_list)
[0, 0, 1, 1, 2, 2, 3, 3, 3, 3]
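To see what the accumulate() version is doing, the same filter can be written as an explicit loop that tracks the running maximum. This is only a sketch: the function name keep_non_decreasing is made up for illustration, and it assumes the elements can be compared with each other.

```python
def keep_non_decreasing(values):
    """Return the elements that are >= every element kept before them."""
    kept = []
    for value in values:
        # kept is non-decreasing, so its last element is the running maximum
        if not kept or value >= kept[-1]:
            kept.append(value)
    return kept


my_list = [0, 0, 1, 0, 1, 2, 2, 0, 0, 3, 3, 3, 2, 3]
print(keep_non_decreasing(my_list))  # [0, 0, 1, 1, 2, 2, 3, 3, 3, 3]
```

Unlike indexing my_list[0] up front, this variant also returns an empty list for empty input.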

Related

If B is a subset of A, remove B from A, with elements in the same order

Let's suppose A = [1, 2, 3, 1, 2, 3, 2, 1, 3, 1, 2, 3].
If B is a subset of A, say [2, 1, 3];
I want to remove B from A in the same order to get [1, 2, 3, 1, 2, 3, 1, 2, 3].
Here's the code I used:
_set = [
    1, 2, 3,
    1, 2, 3,
    2, 1, 3,
    1, 2, 3
]
subset = [2, 1, 3]
def remove_subset(_set, subset):
    for i in subset:
        _set.remove(i)
    return _set
print(remove_subset(_set, subset))
But it gives the output [1, 2, 3, 2, 1, 3, 1, 2, 3].
The expected output is: [1, 2, 3, 1, 2, 3, 1, 2, 3].
As written, you're removing the individual elements without paying attention to the order they appear. It looks like you want to delete the elements of the subset only if they appear contiguously in the same order as they do in the subset.
Sadly, this is something Python lists won't do for you (there is no equivalent to str.replace that allows you to remove a fixed sequence wherever it occurs). So you're stuck finding the index where your subsequence occurs and deleting it from the list:
for i in range(len(lst)):  # Renamed set to lst, since it is a list, and to avoid shadowing set constructor
    if lst[i:i+len(sublst)] == sublst:  # Renamed subset to sublst to match
        del lst[i:i+len(sublst)]  # We found a place where the sublst begins, slice it out
        break
This only removes one copy (to avoid issues with mutating a list as you iterate over it, or considering how to handle it if the sublist can overlap itself in the main list).
Side-note: This (iterating over slices of a sequence) is basically the only circumstance in which iterating over sequence indices is the Pythonic solution (there are ways to avoid the unPythonic for i in range(len(lst)): involving enumerate and zip and iter and explicit sequence multiplication and star-unpacking, but they're so ugly it's not worth the bother).
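If every copy should be removed rather than just the first, one option is to build a new list instead of deleting in place, which sidesteps the mutation-while-iterating issue. The sketch below is an illustration, not part of the answer above: remove_sublist is a hypothetical name, both arguments are assumed to be lists, and overlapping matches are resolved left to right.

```python
def remove_sublist(lst, sublst):
    """Return a copy of lst with every non-overlapping copy of sublst removed."""
    if not sublst:
        return list(lst)  # nothing to remove
    result = []
    n = len(sublst)
    i = 0
    while i < len(lst):
        if lst[i:i + n] == sublst:
            i += n  # skip over the matched run
        else:
            result.append(lst[i])
            i += 1
    return result


a = [1, 2, 3, 1, 2, 3, 2, 1, 3, 1, 2, 3]
print(remove_sublist(a, [2, 1, 3]))  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
```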
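One way could be to convert your lists to strings and use the .replace method of strings. Note that this only works reliably when every element is a single digit, since multi-digit numbers would be split apart when converting back: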
a = [1, 2, 3, 1, 2, 3, 2, 1, 3, 1, 2, 3]
b = [2, 1, 3]
str_a = ''.join(map(str, a))
str_b = ''.join(map(str, b))
str_a_b_removed = str_a.replace(str_b, '')
list(map(int, list(str_a_b_removed)))
Output
[1, 2, 3, 1, 2, 3, 1, 2, 3]

Remove duplicate numbers from a list

I was attempting to remove all duplicated numbers in a list.
I was trying to understand what is wrong with my code.
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
for x in numbers:
    if numbers.count(x) >= 2:
        numbers.remove(x)
print(numbers)
The result I got was:
[1, 1, 6, 5, 2, 3]
I guess the idea is to write the code yourself without using library functions. In that case I would still suggest using an additional set to store the items already seen, going over the list only once:
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
unique = set()
for x in numbers:
    if x not in unique:
        unique.add(x)
numbers = list(unique)
print(numbers)
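Converting a set back to a list does not guarantee the original order. If order matters, a small variation keeps a separate result list next to the set. This is a sketch with made-up variable names (seen, unique_in_order), assuming the items are hashable.

```python
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]

seen = set()           # values already emitted
unique_in_order = []   # first occurrences, in original order
for x in numbers:
    if x not in seen:
        seen.add(x)
        unique_in_order.append(x)

print(unique_in_order)  # [1, 6, 5, 2, 3]
```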
If you want to keep your code, the problem is that you modify the collection inside the for loop that iterates over it, which is a big no-no in most programming languages. Python allows it, but the problem and its solution are already described in this answer: How to remove items from a list while iterating?:
Note: There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, i.e. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence, e.g.,
for x in a[:]:
    if x < 0: a.remove(x)
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
Using a shallow copy of the list:
for x in numbers[:]:
    if numbers.count(x) >= 2:
        numbers.remove(x)
print(numbers) # [1, 6, 5, 2, 3]
Alternatives:
Preserving the order of the list:
Using dict.fromkeys()
print(list(dict.fromkeys(numbers).keys())) # [1, 6, 5, 2, 3]
Using more_itertools.unique_everseen(iterable, key=None):
from more_itertools import unique_everseen
print(list(unique_everseen(numbers))) # [1, 6, 5, 2, 3]
Using pandas.unique:
import pandas as pd
print(pd.unique(numbers).tolist()) # [1, 6, 5, 2, 3]
Using collections.OrderedDict([items]):
from collections import OrderedDict
print(list(OrderedDict.fromkeys(numbers))) # [1, 6, 5, 2, 3]
Using itertools.groupby(iterable[, key]) (this only collapses consecutive duplicates, so it works here because equal values are adjacent):
from itertools import groupby
print([k for k,_ in groupby(numbers)]) # [1, 6, 5, 2, 3]
Ignoring the order of the list:
Using numpy.unique:
import numpy as np
print(np.unique(numbers).tolist()) # [1, 2, 3, 5, 6]
Using set():
print(list(set(numbers))) # [1, 2, 3, 5, 6]
Using frozenset([iterable]):
print(list(frozenset(numbers))) # [1, 2, 3, 5, 6]
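One caveat about the alternatives above: groupby() only merges runs of adjacent equal values, so it is not a general de-duplication tool. A small sketch with a made-up input where the duplicates are not adjacent shows the difference from dict.fromkeys():

```python
from itertools import groupby

scattered = [1, 6, 1, 5, 5, 1]  # the 1s are not adjacent

adjacent_only = [k for k, _ in groupby(scattered)]
all_duplicates = list(dict.fromkeys(scattered))

print(adjacent_only)   # [1, 6, 1, 5, 1]
print(all_duplicates)  # [1, 6, 5]
```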
Why don't you simply use a set:
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers = list(set(numbers))
print(numbers)
Before anything else, the first advice I can give is to never modify a list while you are looping over it. All kinds of wacky stuff happens. Your code is otherwise fine (I recommend reading the other answers though; there's an easier way to do this with a set, which pretty much handles the duplicates for you).
Instead of removing numbers from the list you are looping over, loop over a copy of it, made with slicing directly in the for statement.
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
for x in numbers[:]:
    if numbers.count(x) >= 2:
        numbers.remove(x)
    print(numbers)
print("Final")
print(numbers)
The key there is numbers[:], which gives back a copy of the list. Here's the printed output:
[1, 1, 1, 6, 5, 5, 2, 3]
[1, 1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
[1, 6, 5, 2, 3]
Final
[1, 6, 5, 2, 3]
Leaving a placeholder here until I figure out how to explain why in your particular case it's not working, like the actual step by step reason.
Another way to solve this, making use of the beautiful language that is Python, is through a comprehension and a set.
Why a set? Because the definition of this data structure is that its elements are unique, so even if you try to put in multiple elements that are the same, they won't appear as repeated in the set. Cool, right?
A comprehension is syntactic sugar for looping in one line. Get used to it with Python: you'll either use it a lot, or see it a lot :)
So with a comprehension you iterate over an iterable and return each item. In the code below, x represents each number in numbers, and x is returned to become part of the set (the braces make it a set comprehension). Because the set handles duplicates... voila, your code is done.
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers_as_set = {x for x in numbers}
print(numbers_as_set)
This seems like homework but here is a possible solution:
import numpy as np
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
filtered = np.unique(numbers).tolist()  # tolist() converts NumPy scalars to plain ints
print(filtered)
# [1, 2, 3, 5, 6]
This solution does not preserve the ordering. If you also need the ordering, use:
filtered_with_order = list(dict.fromkeys(numbers))
Why don't you use fromkeys?
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
numbers = list(dict.fromkeys(numbers))
Output: [1,6,5,2,3]
The flow is as follows.
At first the list is [1, 1, 1, 1, 6, 5, 5, 2, 3] and the index is 0.
x is 1. numbers.count(1) is 4, so the 1 at index 0 is removed.
Now the list becomes [1, 1, 1, 6, 5, 5, 2, 3], but the index is incremented to 1.
x is 1. numbers.count(1) is 3, so the first 1 is removed again (remove() always deletes the first occurrence).
Now the list becomes [1, 1, 6, 5, 5, 2, 3], but the index is incremented to 2.
x is now 6, so the two remaining 1s at indices 0 and 1 are never visited again.
etc...
So that's why there are two 1's.
Please correct me if I am wrong. Thanks!
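The walk-through above can be reproduced by replacing the for loop with an explicit index, which stands in for the loop's internal counter. This is only a sketch of the behaviour described in the quoted documentation; the names index and trace are made up for illustration.

```python
numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]

index = 0   # stands in for the for loop's internal counter
trace = []  # (index, x, list after this step)
while index < len(numbers):
    x = numbers[index]
    if numbers.count(x) >= 2:
        numbers.remove(x)  # shifts every later element one place left
    trace.append((index, x, list(numbers)))
    index += 1  # advances regardless, so an element is skipped after a removal

for step in trace:
    print(step)
print(numbers)  # [1, 1, 6, 5, 2, 3]
```

The loop body runs only six times for a nine-element list, and the third step already looks at 6.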
A fancy method is to use collections.Counter:
>>> from collections import Counter
>>> numbers = [1, 1, 1, 1, 6, 5, 5, 2, 3]
>>> c = Counter(numbers)
>>> list(c.keys())
[1, 6, 5, 2, 3]
This method has linear time complexity (O(n)) and relies only on the standard library. Counter is a dict subclass, so in Python 3.7+ its keys keep their first-insertion order.
You can try:
from more_itertools import unique_everseen
items = [1, 1, 1, 1, 6, 5, 5, 2, 3]
list(unique_everseen(items))
or
>>> from collections import OrderedDict
>>> items = [1, 1, 1, 1, 6, 5, 5, 2, 3]
>>> list(OrderedDict.fromkeys(items))
[1, 6, 5, 2, 3]
You can find more here:
How do you remove duplicates from a list whilst preserving order?

Repeat different elements of an array different amounts of times

Say I have an array with longitudes, lonPort:
lonPort =np.loadtxt('LongPorts.txt',delimiter=',')
for example:
lonPort=[0,1,2,3,...]
And I want to repeat each element a different number of times. How do I do this? This is what I tried:
Repeat =[5, 3, 2, 3,...]
lonPort1=[]
for i in range (0,len(lenDates)):
    lonPort1[sum(Repeat[0:i])]=np.tile(lonPort[i],Repeat[i])
So the result would be:
lonPort1=[0,0,0,0,0,1,1,1,2,2,3,3,3,...]
The error I get is:
list assignment index out of range
How do I get rid of the error and make my array?
Thank you!
You can use np.repeat():
np.repeat(a, [5,3,2,3])
Example:
In [3]: a = np.array([0,1,2,3])
In [4]: np.repeat(a, [5,3,2,3])
Out[4]: array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3])
Without relying on numpy, you can create a generator that consumes your items one by one and repeats each of them the desired number of times.
x = [0, 1, 2, 3]
repeat = [4, 3, 2, 1]
def repeat_items(x, repeat):
    for item, r in zip(x, repeat):
        while r > 0:
            yield item
            r -= 1
for value in repeat_items(x, repeat):
    print(value, end=' ')
This displays 0 0 0 0 1 1 1 2 2 3.
Providing a numpy-free solution for future readers that might want to use lists.
>>> lst = [0,1,2,3]
>>> repeat = [5, 3, 2, 3]
>>> [x for sub in ([x]*y for x,y in zip(lst, repeat)) for x in sub]
[0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
If lst contains mutable objects, be aware of the pitfalls of sequence multiplication for sequences holding mutable elements.
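To make that pitfall concrete: [x] * y repeats references to the same object, so mutating one repetition changes all of them. The sketch below uses a made-up nested list and shows one possible workaround with copy.copy (a shallow copy; deeper nesting would need copy.deepcopy).

```python
import copy

repeat = [2, 1]

# Sequence multiplication repeats references, not copies
lst = [[0], [1]]
shared = [x for sub in ([x] * y for x, y in zip(lst, repeat)) for x in sub]
shared[0].append(99)
print(shared)  # [[0, 99], [0, 99], [1]]

# Copying on each repetition keeps the items independent
lst = [[0], [1]]
independent = [copy.copy(x) for x, y in zip(lst, repeat) for _ in range(y)]
independent[0].append(99)
print(independent)  # [[0, 99], [0], [1]]
```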

Count frequency of words in multiple lists from a larger vocabulary?

I know how to count the frequency of elements in a list, but here's a slightly different question. I have a larger set of vocabulary and a few lists that only use part of the total vocabulary. Using numbers instead of words as an example:
vocab=[1,2,3,4,5,6,7]
list1=[1,2,3,4]
list2=[2,3,4,5,6,6,7]
list3=[3,2,4,4,1]
and I want the output to keep "0"s when a word is not used:
count1=[1,1,1,1,0,0,0]
count2=[0,1,1,1,1,2,1]
count3=[1,1,1,2,0,0,0]
I guess I need to sort the words, but how do I keep the "0" records?
This can be done using the list object's inbuilt count function, within a list comprehension.
>>> vocab = [1, 2, 3, 4, 5, 6, 7]
>>> list1 = [1, 2, 3, 4]
>>> list2 = [2, 3, 4, 5, 6, 6, 7]
>>> list3 = [3, 2, 4, 4, 1]
>>> [list1.count(v) for v in vocab]
[1, 1, 1, 1, 0, 0, 0]
>>> [list2.count(v) for v in vocab]
[0, 1, 1, 1, 1, 2, 1]
>>> [list3.count(v) for v in vocab]
[1, 1, 1, 2, 0, 0, 0]
Iterate over each value in vocab, accumulating the frequencies for them.
You could also achieve this with the following (Python 2; in Python 3, wrap it in list() because map returns an iterator):
map(lambda v: list1.count(v), vocab)
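Calling list.count() once per vocabulary entry rescans the whole list each time. For larger inputs, one alternative is to count once with collections.Counter and then look up each vocabulary entry; missing entries come back as 0, which keeps the zero records. A sketch reusing the question's data:

```python
from collections import Counter

vocab = [1, 2, 3, 4, 5, 6, 7]
list2 = [2, 3, 4, 5, 6, 6, 7]

counts = Counter(list2)              # one pass over the list
count2 = [counts[v] for v in vocab]  # a Counter returns 0 for missing keys
print(count2)  # [0, 1, 1, 1, 1, 2, 1]
```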

Generate Permutation in Lexicographic Order using Recursion

I'm doing Project Euler problem 24, but this snippet of code that generates permutations doesn't work as expected. I'm not sure how to explain the logic of the code, but it uses recursion to create every set of permutations at a certain index and then moves on to the next index.
def genPermutation(num,index):
    if (index == (len(num)-1)):
        print(num)
    else:
        i = 0
        while i<(len(num)-index):
            newList = num
            temp = newList[index+i]
            newList.pop(index+i)
            newList.insert(index,temp)
            genPermutation(newList,index+1)
            i = i+1
a = [0,1,2,3,4,5]
genPermutation(a,0)
Your major flaw is that assigning a list does not create a new list. When you recurse down, you are changing the same list as further up in the call stack, so you get duplicates and strange ordering.
You need:
newList = num[:] # Create a new list
However, you also have a few unnecessary pieces: A) you don't need a while loop, and B) you don't need a separate index lookup followed by a pop:
def genPermutation(num,index):
    if index == len(num)-1:
        print(num)
        return
    for i in range(index, len(num)):
        newList = num[:]
        temp = newList.pop(i)
        newList.insert(index, temp)
        genPermutation(newList, index+1)
Gives you the full list without duplicates:
>>> a = list(range(6))
>>> genPermutation(a, 0)
[0, 1, 2, 3, 4, 5]
[0, 1, 2, 3, 5, 4]
[0, 1, 2, 4, 3, 5]
[0, 1, 2, 4, 5, 3]
[0, 1, 2, 5, 3, 4]
[0, 1, 2, 5, 4, 3]
[0, 1, 3, 2, 4, 5]
[0, 1, 3, 2, 5, 4]
...
However, this whole approach is very inefficient. Using recursion with all these list creations is very expensive compared to an iterative approach; see the implementation of itertools.permutations().
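As a rough sketch of the itertools route: permutations() yields tuples in lexicographic order when its input is sorted, so picking the n-th one comes down to skipping ahead with islice. The helper name nth_permutation and its 0-based n are assumptions made for this example, not something from the question.

```python
from itertools import islice, permutations


def nth_permutation(items, n):
    """Return the n-th (0-based) permutation of items in lexicographic order."""
    ordered = sorted(items)  # sorted input gives lexicographically ordered output
    return next(islice(permutations(ordered), n, None))


print(list(permutations([0, 1, 2])))
# [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
print(nth_permutation([0, 1, 2], 3))  # (1, 2, 0)
```

If n is past the last permutation, next() raises StopIteration, so a caller may want to check the range first.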
