remove duplicate list elements - python

I want to remove the duplicate elements in a list only when one element is repeated many times, like this:
li = ['Human','Human','Human'] => li = ['Human']
but not when there are two or more different elements:
li = ['Human','Monkey','Human', 'Human']

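You can do it easily with sets as below. Note that this removes every duplicate and does not preserve order, so it also changes the second example: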
li = list(set(li))

def clean(lst):
    if lst.count(lst[0]) == len(lst):
        return [lst[0]]
    else:
        return lst
Does that do what you want?
If so, you can do it in place as well:
def clean_in_place(lst):
    if lst.count(lst[0]) == len(lst):
        lst[:] = [lst[0]]

lst = ['Human','Human','Human'] => lst = ['Human']
lst = ['Human','Monkey','Human', 'Human'] => lst = ['Human','Monkey','Human', 'Human']
Does this do what you want?
if lst.count(lst[0]) == len(lst):
    lst = list(set(lst))
    print(lst)
else:
    print(lst)
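For comparison, the same check can be written with a set: if every element is equal, the set built from the list has exactly one member. This is only a sketch. The name collapse_if_uniform is made up for illustration, and it assumes the elements are hashable.

```python
def collapse_if_uniform(lst):
    """Return a one-element list if all items are equal, else the list unchanged.

    Hypothetical helper, not taken from the answers above; assumes hashable items.
    """
    # The truthiness check returns an empty list as-is and avoids indexing lst[0].
    if lst and len(set(lst)) == 1:
        return [lst[0]]
    return lst


print(collapse_if_uniform(['Human', 'Human', 'Human']))
print(collapse_if_uniform(['Human', 'Monkey', 'Human', 'Human']))
```

Unlike lst.count(lst[0]), this version does not raise IndexError on an empty list.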

Related

python list of lists contain substring

I have the list_of_lists and I need to get the string that contains 'height' in the sublists and if there is no height at all I need to get 'nvt' for the whole sublist.
I have tried the following:
list_of_lists = [['width=9','length=3'],['width=6','length=4','height=4']]
_lists = []
for list in list_of_lists:
    list1 = []
    for st in list:
        if ("height" ) in st:
            list1.append(st)
        else:
            list1.append('nvt')
    _lists.append(list1)
OUT = _lists
the result I need to have is :
_lists = ['nvt', 'height=4']
what I'm getting is:
_lists = [['nvt','nvt'],['nvt','nvt','height=4']]
This is a good case for implementing a for/else construct as follows:
list_of_lists = [['width=9','length=3'],['width=6','length=4','height=4']]
result = []
for e in list_of_lists:
    for ss in e:
        if ss.startswith('height'):
            result.append(ss)
            break
    else:
        result.append('nvt')
print(result)
Output:
['nvt', 'height=4']
Note:
This could probably be done with a list comprehension, but I think this is more obvious and there is probably no significant difference in performance.
This should work: assign height to the first value in the sublist for which s.startswith("height") is True, and fall back to 'nvt' if nothing matches.
_lists = []
for sublist in list_of_lists:
    height = next(filter(lambda s: s.startswith("height"), sublist), 'nvt')
    _lists.append(height)
And if you wish to be crazy, you can use a list comprehension to reduce the code to:
_lists = [next(filter(lambda s: s.startswith("height"), sublist), 'nvt') for sublist in list_of_lists]
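The same next-with-default idea can also be written with a generator expression instead of filter and lambda. The sketch below wraps it in a small helper; first_height and its default parameter are names invented here for illustration.

```python
list_of_lists = [['width=9', 'length=3'], ['width=6', 'length=4', 'height=4']]


def first_height(sublist, default='nvt'):
    """Return the first string starting with 'height', or `default` if none does.

    Hypothetical helper for illustration only.
    """
    # next() stops at the first match; the second argument is the fallback.
    return next((s for s in sublist if s.startswith('height')), default)


result = [first_height(sub) for sub in list_of_lists]
print(result)  # ['nvt', 'height=4']
```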
Try this (Python 3.x):
import re
list_of_lists = [['width=9','length=3'],['width=6','length=4','height=4']]
_lists = []
r = re.compile("height=")
for li in list_of_lists:
    match = list(filter(r.match, li))
    if len(match) > 0:
        _lists.extend(match)
    else:
        _lists.append('nvt')
OUT = _lists
print(OUT)

Remove duplicates in nested list with specified index

I want to remove every sublist that contains the maximum number (16 in this list).
Code sample
Lst = [["D",16],["B",10],["A",13],["B",16]]
required output
Lst =[["B",10],["A",13]]
You can use max to get the maximum number and then filter the original list by using list comprehension:
lst = [["D",16],["B",10],["A",13],["B",16]]
max_num = max(x[1] for x in lst)
output = [sublst for sublst in lst if sublst[1] < max_num]
print(output) # [['B', 10], ['A', 13]]
Lst = [["D",16],["B",10],["A",13],["B",16]]
max_elem = -float("inf")
for element in Lst:
    if element[1] > max_elem:
        max_elem = element[1]
for i in reversed(range(len(Lst))):
    if Lst[i][1] == max_elem:
        Lst.pop(i)
print(Lst)
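If the list has to be modified in place, as the loop above does, slice assignment combined with max is another option. This is a sketch under the assumption that every sublist has a comparable value at the given position; drop_max and its index parameter are hypothetical names.

```python
def drop_max(rows, index=1):
    """Remove, in place, every row whose value at `index` equals the maximum.

    Hypothetical helper; assumes every row has a comparable value at `index`.
    """
    if not rows:
        return  # nothing to do, and max() would fail on an empty sequence
    largest = max(row[index] for row in rows)
    # Slice assignment replaces the contents but keeps the same list object.
    rows[:] = [row for row in rows if row[index] != largest]


lst = [["D", 16], ["B", 10], ["A", 13], ["B", 16]]
drop_max(lst)
print(lst)  # [['B', 10], ['A', 13]]
```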

Sorting a list based on upper and lower case

I have a list:
List1 = ['name','is','JOHN','My']
I want to append the pronoun as the first item in a new list and append the names at last. Other items should be in the middle and their positions can change.
So far I have written:
my_list = ['name','is','JOHN','My']
new_list = []
for i in my_list:
    if i.isupper():
        my_list.remove(i)
        new_list.append(i)
print(new_list)
Here, I can't check if an item is completely upper case or only its first letter is upper case.
Output I get:
['name','is','JOHN','My']
Output I want:
['My','name','is','JOHN']
or:
['My','is','name','JOHN']
EDIT: I have seen this post and it doesn’t have answers to my question.
i.isupper() will tell you if it's all uppercase.
To test whether just the first character is uppercase and the rest lowercase, you can use i.istitle().
To build your final result, you can append to different lists based on these conditions:
all_cap = []
init_cap = []
non_cap = []
for i in my_list:
    if i.isupper():
        all_cap.append(i)
    elif i.istitle():
        init_cap.append(i)
    else:
        non_cap.append(i)
new_list = init_cap + non_cap + all_cap
print(new_list)
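The three-bucket idea can also be expressed as a sort key, since sorted is stable and keeps the original order within each group. This is a sketch, and case_rank is a name made up for illustration. It checks isupper first, as the loop above does, so a single capital letter such as 'I' is treated as all caps.

```python
def case_rank(word):
    """Rank words: 0 for title case, 1 for everything else, 2 for all caps.

    Hypothetical helper for illustration only.
    """
    if word.isupper():
        return 2  # e.g. 'JOHN' goes last
    if word.istitle():
        return 0  # e.g. 'My' goes first
    return 1      # e.g. 'name', 'is' stay in the middle


my_list = ['name', 'is', 'JOHN', 'My']
new_list = sorted(my_list, key=case_rank)
print(new_list)  # ['My', 'name', 'is', 'JOHN']
```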
How about this:
s = ['name', 'is', 'JOHN', 'My']
pronoun = ''
name = ''
for i in s:
    if i.isupper():
        name = i
    if i.istitle():
        pronoun = i
result = [pronoun, s[0], s[1], name]
print(result)
Don't # me pls XD. Try this.
my_list = ['name','is','JOHN','My']
new_list = ['']
for i in range(len(my_list)):
    if my_list[i][0].isupper() and my_list[i][1].islower():
        new_list[0] = my_list[i]
    elif my_list[i].islower():
        new_list.append(my_list[i])
    elif my_list[i].isupper():
        new_list.append(my_list[i])
print(new_list)

Filter a list of strings by frequency

I have a list of strings:
a = ['book','book','cards','book','foo','foo','computer']
I want to return every item in this list that appears more than twice (count > 2).
Final output:
a = ['book','book','book']
I'm not quite sure how to approach this, but here are two methods I had in mind:
Approach One:
I've created a dictionary to count the number of times an item appears:
a = ['book','book','cards','book','foo','foo','computer']
from collections import defaultdict
def update_item_counts(item_counts, itemset):
    for item in itemset:
        item_counts[item] += 1
test = defaultdict(int)
update_item_counts(test, a)
print(test)
Out: defaultdict(<class 'int'>, {'book': 3, 'cards': 1, 'foo': 2, 'computer': 1})
I want to filter out the list with this dictionary but I'm not sure how to do that.
Approach two:
I tried to write a list comprehension but it doesn't seem to work:
res = [k for k in a if a.count > 2 in k]
A very bare-bones answer is that you should replace a.count > 2 in k with a.count(k) > 2 in your second attempt.
However, avoid list.count for this, as it traverses the whole list for each item. Instead, count occurrences first with collections.Counter, which traverses the list only once.
from collections import Counter
from itertools import repeat
a = ['book','book','cards','book','foo','foo','computer']
count = Counter(a)
output = [word for item, n in count.items() if n > 2 for word in repeat(item, n)]
print(output) # ['book', 'book', 'book']
Note that the list comprehension is equivalent to the loop below.
output = []
for item, n in count.items():
    if n > 2:
        output.extend(repeat(item, n))
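If the surviving items should keep their original positions in the list rather than being grouped by value, the Counter can be used as a lookup while filtering. This is a sketch; keep_frequent and min_count are hypothetical names, and min_count=3 corresponds to "more than twice".

```python
from collections import Counter


def keep_frequent(items, min_count=3):
    """Return the items that occur at least `min_count` times, in original order.

    Hypothetical helper for illustration only.
    """
    counts = Counter(items)  # one pass to count
    # A second pass filters; each lookup in `counts` is a dictionary access.
    return [item for item in items if counts[item] >= min_count]


a = ['book', 'book', 'cards', 'book', 'foo', 'foo', 'computer']
print(keep_frequent(a))  # ['book', 'book', 'book']
```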
Try this:
a_list = ['book','book','cards','book','foo','foo','computer']
b_list = []
for a in a_list:
    if a_list.count(a) > 2:
        b_list.append(a)
print(b_list)
# ['book', 'book', 'book']
Edit: You mentioned list comprehension. You are on the right track! You can do it with list comprehension like this:
a_list = ['book','book','cards','book','foo','foo','computer']
c_list = [a for a in a_list if a_list.count(a) > 2]
Good luck!
a = ['book','book','cards','book','foo','foo','computer']
list(filter(lambda s: a.count(s) > 2, a))
Your first attempt builds a dictionary with all of the counts. You need to take this a step further to get the items that you want:
res = [k for k in test if test[k] > 2]
Now that you have built this by hand, you should check out the collections.Counter class from the standard library, which does all of the work for you.
If you just want to print, there are better answers already. If you want to remove the infrequent items from the list in place, you can try this:
a = ['book','book','cards','book','foo','foo','computer']
countdict = {}
for word in a:
    if word not in countdict:
        countdict[word] = 1
    else:
        countdict[word] += 1
for x, y in countdict.items():
    if (2 >= y):
        for i in range(y):
            a.remove(x)
You can try this.
def my_filter(my_list, my_freq):
    '''Filter a list of strings by frequency'''
    # use set() to unique my_list, then turn set back to list
    unique_list = list(set(my_list))
    # count frequency in unique_list
    frequencies = []
    for value in unique_list:
        frequencies.append(my_list.count(value))
    # filter frequency
    return_list = []
    for i, frequency in enumerate(frequencies):
        if frequency > my_freq:
            for _ in range(frequency):
                return_list.append(unique_list[i])
    return return_list
a = ['book','book','cards','book','foo','foo','computer']
my_filter(a, 2)
['book', 'book', 'book']

How to delete item and his info in a list?

I have the following list:
lst= ['Jason', 999999999, 'jason#live.com', 'Curt', 333333333, 'curt#job.com']
I want to delete Jason and the next two entries, so I'm thinking of this:
for i in range(len(lst)):
    if "Jason" in lst:
        del lst[0]
        del lst[1]
        del lst[2]
    else:
        print("Jason not in lst")
Is this correct?
What I'm working with, thanks to Tigerhawk, is the following:
Original list:
lst = [['Curt', 333333333, 'curt#job.com'], ['Jason', 999999999, 'jason#live.com']]
def clean_lst(lst):
    name = str(input("Name you want to delete:"))
    lst = sum(lst, [])  # With this I get the flat lst from the first paragraph
    if len(lst) == 0:
        print("Empty List")
    elif name in lst:
        idx = lst.index(name)
        del lst[idx:idx+3]
    else:
        print("Name is not on the list")
End result should look like this:
lst = [['Curt', 333333333, 'curt#job.com']]
If the name can appear more than once, iterate from the end of the list and delete the slice l[i:i+3] whenever l[i] equals "Jason":
l = ['Jason', 999999999, 'jason#live.com', 'Curt', 333333333, 'curt#job.com', "Jason", "foo", "bar"]
for i in range(len(l) - 1, -1, -1):
    if l[i] == "Jason":
        del l[i:i+3]
Output:
['Curt', 333333333, 'curt#job.com']
As far as your own code goes, it presumes that "Jason" is always the first element, even after removing any previous entries, which seems unlikely, but only you know for sure.
The most efficient way to do this is to either create a new list or update the original using a generator function:
def rem_jas(l):
    it = iter(l)
    for ele in it:
        if ele == "Jason":
            # skip two elements
            next(it, "")
            next(it, "")
        else:
            yield ele
Output:
In [30]: l = ['Jason', 999999999, 'jason#live.com', 'Curt', 333333333, 'curt#job.com', "Jason", "foo", "bar"]
In [31]: l[:] = rem_jas(l)
In [32]: l
Out[32]: ['Curt', 333333333, 'curt#job.com']
If "Jason" can appear within two elements of another "Jason", you need to decide what the appropriate behaviour is. If the names are always at least three positions apart, it will be fine.
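One way to sidestep the overlap question entirely is to treat the flat list as fixed-size records and filter whole records. The sketch below assumes the list is always a sequence of (name, number, email) triples, which the question suggests but does not guarantee; drop_records and size are names invented for illustration.

```python
def drop_records(flat, name, size=3):
    """Return a new flat list without the records whose first field is `name`.

    Hypothetical helper; assumes `flat` is made of consecutive records of
    `size` fields each, with the name in the first field.
    """
    # Cut the flat list into consecutive chunks of `size` elements.
    records = [flat[i:i + size] for i in range(0, len(flat), size)]
    # Keep the records for other names and flatten them again.
    return [field for record in records if record[0] != name for field in record]


l = ['Jason', 999999999, 'jason#live.com', 'Curt', 333333333, 'curt#job.com',
     'Jason', 'foo', 'bar']
print(drop_records(l, 'Jason'))  # ['Curt', 333333333, 'curt#job.com']
```

Because only the first field of each record is compared, a value equal to the name in another field (for example 'Jason' stored as an email placeholder) does not trigger a deletion.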
Based on your edit and the fact that you have a list of lists, not a flat list, it seems you want to remove each sublist in which the name appears, which makes the code a lot simpler:
lst = [['Curt', 333333333, 'curt#job.com'], ['Jason', 999999999, 'jason#live.com']]
from itertools import chain
lst[:] = chain(*(sub for sub in lst if "Jason" not in sub))
print(lst)
Output:
['Curt', 333333333, 'curt#job.com']
sum is not a good way to flatten a list, itertools.chain is far more efficient.
If you want to keep the sublists then don't flatten:
lst[:] = (sub for sub in lst if "Jason" not in sub)
print(lst)
Or a hybrid if you have multiple Jasons and need to add a few prints based on conditions:
def rem_jas(l, name):
    it = iter(l)
    for ele in it:
        if ele == name:
            # skip two elements
            next(it, "")
            next(it, "")
        else:
            yield ele
def clean_lst(l):
    name = "Jason"
    for sub in l:
        tmp = list(rem_jas(sub, name))
        if tmp:
            yield tmp
        if len(tmp) == len(sub):
            print("{} not in sublist".format(name))
lst[:] = clean_lst(lst)
print(lst)
Demo:
In [5]: lst = [['Curt', 333333333, 'curt#job.com'], ['Jason', 999999999, 'jason#live.com']]
In [6]: lst[:] = clean_lst(lst)
Jason not in sublist
In [7]: print(lst)
[['Curt', 333333333, 'curt#job.com']]
And lastly if you want to let the user know which sublist was missing the name:
def clean_lst(l):
    name = "Jason"
    for ind, sub in enumerate(l):
        tmp = list(rem_jas(sub, name))
        if tmp:
            yield tmp
        if len(tmp) == len(sub):
            print("{} not in sublist {}".format(name, ind))
You can simply search for the appropriate index and then delete a slice of three entries:
lst = ['Jason', 999999999, 'jason#live.com', 'Curt', 333333333, 'curt#job.com']
if 'Jason' in lst:
    idx = lst.index('Jason')
    del lst[idx:idx+3]
Result:
>>> lst
['Curt', 333333333, 'curt#job.com']
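The membership test followed by index scans the list twice. A variant that scans once is to call index directly and handle the ValueError it raises when the name is missing. This is a sketch; remove_entry and width are hypothetical names.

```python
def remove_entry(lst, name, width=3):
    """Delete the first `name` and the entries that follow it, in place.

    Hypothetical helper; returns True if something was removed, else False.
    """
    try:
        idx = lst.index(name)  # raises ValueError if name is absent
    except ValueError:
        return False
    del lst[idx:idx + width]
    return True


lst = ['Jason', 999999999, 'jason#live.com', 'Curt', 333333333, 'curt#job.com']
print(remove_entry(lst, 'Jason'))  # True
print(lst)  # ['Curt', 333333333, 'curt#job.com']
```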
