Isolating a sub list from a big list in python

I have a big list in python. Here is a small example:
['MLEEDMEVAIKMVVVGNGAVGKSSMIQRYCKGIFTKDYKKTIGVDFLERQIQVNDEDVRLMLWDTAGQEEFDAITKAYYRGAQACVLVFSTTDRESFEAV', 'MDHTEGSPAEEPPAHAPSPGKFGERPPPKRLTREAMRNYLKERGDQTVLILHAKVAQKSYGNEKRFFCPPPCVYLMGSGWKKKKEQMERDGCSEQESQPCAFIGIGNSDQEMQQLNLEGKNYCTAKTLYISDSDKRKHFMLSVKMFYGNSDDIGVFLSKRIKVISKPSKKKQSLKNADLCIASGTKVALFNRLRSQTVSTRYLHVEGGNFHASSQQWGAFFIHLLDDDESEGEEFTVRDGYIHYGQTVKLVCSVTGMALPRLIIRKVDKQTALLDADDPVSQLHKCAFYLKDTERMYLCLSQERIIQFQATPCPKEPNKEMINDGASWTIISTDKAEYTFYEGMGPVLAPVTPVPVVESLQLNGGGDVAMLELTGQNFTPNLRVWFGDVEAETMYRCGESMLCVVPDISAFREGWRWVRQPVQVPVTLVRNDGIIYSTSLTFTYTPEPGPRPHCSAAGAILRANSSQVPPNESNTNSEGSYTNASTNSTSVTSSTATVVS']
The full list has many items, and each item is a sequence of characters. I want to make a new list that contains only the items with exactly one W. For the small example, the expected output is:
['MLEEDMEVAIKMVVVGNGAVGKSSMIQRYCKGIFTKDYKKTIGVDFLERQIQVNDEDVRLMLWDTAGQEEFDAITKAYYRGAQACVLVFSTTDRESFEAV']
I am trying to do that in python and wrote the following code:
newlist = []
for item in mylist:
    for c in item:
        if c == "W":
            newlist.append(item)
But it does not return what I want. Do you know how to fix it?

Use str.count.
Example:
res = []
mylist = ['MLEEDMEVAIKMVVVGNGAVGKSSMIQRYCKGIFTKDYKKTIGVDFLERQIQVNDEDVRLMLWDTAGQEEFDAITKAYYRGAQACVLVFSTTDRESFEAV', 'MDHTEGSPAEEPPAHAPSPGKFGERPPPKRLTREAMRNYLKERGDQTVLILHAKVAQKSYGNEKRFFCPPPCVYLMGSGWKKKKEQMERDGCSEQESQPCAFIGIGNSDQEMQQLNLEGKNYCTAKTLYISDSDKRKHFMLSVKMFYGNSDDIGVFLSKRIKVISKPSKKKQSLKNADLCIASGTKVALFNRLRSQTVSTRYLHVEGGNFHASSQQWGAFFIHLLDDDESEGEEFTVRDGYIHYGQTVKLVCSVTGMALPRLIIRKVDKQTALLDADDPVSQLHKCAFYLKDTERMYLCLSQERIIQFQATPCPKEPNKEMINDGASWTIISTDKAEYTFYEGMGPVLAPVTPVPVVESLQLNGGGDVAMLELTGQNFTPNLRVWFGDVEAETMYRCGESMLCVVPDISAFREGWRWVRQPVQVPVTLVRNDGIIYSTSLTFTYTPEPGPRPHCSAAGAILRANSSQVPPNESNTNSEGSYTNASTNSTSVTSSTATVVS']
for item in mylist:
    if item.count("W") == 1:
        res.append(item)
print(res)
or
res = [item for item in mylist if item.count("W") == 1]
Output:
['MLEEDMEVAIKMVVVGNGAVGKSSMIQRYCKGIFTKDYKKTIGVDFLERQIQVNDEDVRLMLWDTAGQEEFDAITKAYYRGAQACVLVFSTTDRESFEAV']

The problem is that you iterate over each character in each string and append whenever the condition is met. Your logic can't "undo" a list.append operation if another W is found, so if W occurs twice in a string, that string is appended twice.
Instead, you can use a list comprehension with str.count:
res = [i for i in mylist if i.count('W') == 1]
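The difference between the two approaches can be seen on a tiny input. The following is a minimal sketch; the three short strings are made-up stand-ins for the long sequences in the question.

```python
# Hypothetical short sequences standing in for the long strings above.
sequences = ["AWB", "WAW", "ABC"]

# Per-character approach from the question: one append per matching character.
per_char = []
for seq in sequences:
    for ch in seq:
        if ch == "W":
            per_char.append(seq)

# Counting first: keep a sequence only if it contains exactly one W.
exactly_one = [seq for seq in sequences if seq.count("W") == 1]

print(per_char)     # ['AWB', 'WAW', 'WAW']
print(exactly_one)  # ['AWB']
```

The per-character loop keeps "WAW" twice, while the count-based filter drops it.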

Related

Common items in list of lists

I have a list of lists, and I want to write a function that checks whether each of the inner lists has exactly one item in common with every other list, and returns True if so.
I couldn't make it work. Is there a simple way to do it without using modules?
I've tried something like this:
list_of_lists = [['d','b','s'],['e','b','f'],['s','f','l'],['b','l','t']]
new_list = []
for i in list_of_lists:
    for item in i:
        new_list.append(item)
if len(set(new_list)) == len(new_list)-len(list_of_lists):
    return True
If you want the items shared by all the sublists, you can convert each sublist to a set, intersect the sets, and check the size of the result:
list_of_lists = [['d','b','s'],['e','b','f'],['s','f','l'],['b','l','t']]
def has_one_common_item(list_of_lists):
    common_items = set.intersection(*[set(_) for _ in list_of_lists])
    return len(common_items) == 1
Using list comprehension:
def func(list_of_lists):
    return sum([all([item in lst for lst in list_of_lists[1:]]) for item in list_of_lists[0]]) == 1
This works if items are not repeated within a list. It returns True only if exactly one item of the first list appears in every other list.
Use a Counter on each pair of lists joined together to count occurrences, and make sure at least one item in the joined list has a frequency of 2.
from collections import Counter
list_of_lists = [['d','b','s'],['e','b','f'],['s','f','l'],['b','l','t']]
for i in range(len(list_of_lists)):
    for j in range(i+1,len(list_of_lists)):
        result=(list_of_lists[i]+list_of_lists[j])
        counts=(Counter(result))
        # items that occur in both lists of this pair
        matches={k for k,x in counts.items() if x==2}
        if len(matches)==0:
            print("missing a match")
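If the question is read as "every pair of sublists shares exactly one item", a pairwise check with set intersections needs no imports. This is only a sketch under that reading; the function name one_common_with_each is made up for illustration.

```python
def one_common_with_each(list_of_lists):
    """Return True if every pair of sublists shares exactly one item."""
    sets = [set(sub) for sub in list_of_lists]
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            # The intersection holds the items present in both sublists.
            if len(sets[i] & sets[j]) != 1:
                return False
    return True

list_of_lists = [['d','b','s'],['e','b','f'],['s','f','l'],['b','l','t']]
print(one_common_with_each(list_of_lists))  # True
```

Returning False at the first failing pair avoids checking the remaining pairs.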

Remove similar(but not the same) strings in a single list

I have a list of strings that look like:
my_list = ['https://www.google.com/', 'http://www.google.com/',
'https://www.google.com', 'http://www.google.com']
As you can see they are not the same but they all look very similar.
I also have a function which is:
from fuzzywuzzy import fuzz
def similar(a, b):
    return fuzz.ratio(a,b)
I want to use this function and say something like:
for a,b in my_list:
    print (a,b)
    if similar(a,b) > 0.95:
        my_list.remove(b)
So I'm trying to remove similar looking strings from a list if they are above a certain similarity ratio. I want to do this so that in this list I would end up with just the first url, in this case my_list would end up being:
my_list = ['https://www.google.com/']
After doing some googling, I found that fuzzywuzzy has a built-in function for this, which is pretty great:
from fuzzywuzzy import fuzz
from fuzzywuzzy.process import dedupe
deduped_list = list(dedupe(my_list, threshold=97, scorer=fuzz.ratio))
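In general you should never use list.remove() inside a for loop over the same list: removing an item shifts the remaining items, so the iterator skips elements. Also note that fuzz.ratio returns an integer score from 0 to 100, so the threshold should be something like 95 rather than 0.95; tune it to your data.
And because you always want to keep the first item, you can exclude it from the loop:
idx = 1
while idx < len(my_list):
    if similar(my_list[idx - 1], my_list[idx]) > 95:
        del my_list[idx]  # drop the near-duplicate; idx now points at the next item
    else:
        idx += 1  # only advance when nothing was removed
print(my_list)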
Alternative solution with a list comprehension (the threshold is on fuzz.ratio's 0 to 100 scale):
first_item = my_list[0]
my_list = [first_item] + [item for item in my_list[1:] if similar(first_item, item) <= 95]
print(my_list)
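For readers without fuzzywuzzy installed, the same greedy idea can be sketched with the standard library. Note the swap: this uses difflib.SequenceMatcher, whose ratio() is on a 0.0 to 1.0 scale, in place of fuzz.ratio. The function names and the 0.9 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity in [0.0, 1.0]; a stdlib stand-in for fuzz.ratio."""
    return SequenceMatcher(None, a, b).ratio()

def dedupe_similar(items, threshold=0.9):
    """Keep an item only if it is not too similar to any item already kept."""
    kept = []
    for item in items:
        if all(similarity(item, k) < threshold for k in kept):
            kept.append(item)
    return kept

my_list = ['https://www.google.com/', 'http://www.google.com/',
           'https://www.google.com', 'http://www.google.com']
print(dedupe_similar(my_list))  # ['https://www.google.com/']
```

Building a new list avoids mutating my_list while it is being traversed, and the first occurrence of each group of similar strings is the one that survives.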

split a list of list that contains string

I have a list of lists like this:
[["testo=text1","testo2=text2"],["testo3=text3","testo4=text4"]]
I want to split each element of each sublist by "=".
Desired result:
[['testo', 'text1'],['testo2', 'text2']]
My attempt was to iterate over each sub-list and split. But it's not working:
[j.split("=") for j in [i for i in splitted_params]]
I keep getting a 'list' object has no attribute 'split' error.
try:
l = [["testo=text1","testo2=text2"],["testo3=text3","testo4=text4"]]
new_l = [inner_element.split("=") for x in l for inner_element in x]
print(new_l)
output:
[['testo', 'text1'], ['testo2', 'text2'], ['testo3', 'text3'], ['testo4', 'text4']]
You shouldn't try to be clever with python list comprehensions. In my opinion, you should go for the readable solution. :)
if __name__ == '__main__':
    data = [
        ["testo=text1","testo2=text2"],
        ["testo3=text3","testo4=text4"]
    ]
    for arr in data:
        for index in range( len(arr) ):
            arr[index] = arr[index].split('=')
    print(data)
In your expression, [j.split("=") for j in [i for i in splitted_params]], the inner expression, [i for i in splitted_params] is evaluated first, which gives you a list. You did nothing in this list comprehension. Then, when you evaluate [j.split("=") for j in SOME_RESULT_YOU_GOT], you are trying to split a list, which is not possible.
You can use chain.from_iterable() to avoid the double for loop in the list comprehension:
from itertools import chain
l = [["testo=text1", "testo2=text2"], ["testo3=text3", "testo4=text4"]]
[i.split('=') for i in chain.from_iterable(l)]
# [['testo', 'text1'], ['testo2', 'text2'], ['testo3', 'text3'], ['testo4', 'text4']]
Explanation of why your solution doesn't work:
splitted_params = [["testo=text1", "testo2=text2"], ["testo3=text3", "testo4=text4"]]
print([i for i in splitted_params] == splitted_params)
# True
So when you use [i for i in splitted_params] inside your listcomp you get the same list.
I think the problem is that [i for i in splitted_params] doesn't return the inner lists of your list of lists.
It just returns your list of lists, so when you loop through it again, it tries to split the inner lists themselves rather than the strings.
So I would suggest you just use a loop in a loop, like this:
listoflists = [["testo=text1", "testo2=text2"], ["testo3=text3", "testo4=text4"]]
for i in listoflists:
    for j in i:
        print(j.split("="))
It may not be as pretty but it does get the job done.
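The answers above flatten the result into one list of pairs. If the grouping by sublist should be preserved, a nested comprehension is one option. The sketch below also shows a dict per sublist, which assumes each string contains an "=" and that keys are unique within a sublist.

```python
splitted_params = [["testo=text1", "testo2=text2"], ["testo3=text3", "testo4=text4"]]

# The inner comprehension splits each string; the outer one keeps the sublist grouping.
nested = [[s.split("=") for s in sub] for sub in splitted_params]
print(nested)
# [[['testo', 'text1'], ['testo2', 'text2']], [['testo3', 'text3'], ['testo4', 'text4']]]

# Optional: one dict per sublist; split("=", 1) splits only on the first "=".
as_dicts = [dict(s.split("=", 1) for s in sub) for sub in splitted_params]
print(as_dicts)
# [{'testo': 'text1', 'testo2': 'text2'}, {'testo3': 'text3', 'testo4': 'text4'}]
```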

Elegant way to delete items in a list which do not has substrings that appear in another list

Recently I encountered this problem:
Say there is a list of something I want to process:
process_list=["/test/fruit/apple","/test/fruit/pineapple","/test/fruit/banana","/test/tech/apple-pen","/test/animal/python","/test/animal/penguin"]
And I want to exclude something using another list, for instance:
exclude_list=["apple","python"]
The process_list should look like this after I apply the exclude_list to it (any process_list item that contains a substring from exclude_list is removed):
["/test/fruit/banana","/test/animal/penguin"]
or if the exclude_list is:
exclude_list=["pen","banana"]
The process_list should be this after apply the filter:
["/test/fruit/apple","/test/fruit/pineapple","/test/animal/python"]
So what I was trying at first was:
for item in exclude_list:
    for name in (process_list):
        if item in name:
            process_list.remove(name)
Of course this didn't work because removing elements from the list while iterating over it using a for loop is not permitted. The code only removed the first match and then stopped.
So then I came up with a way to do this with another list:
deletion_list=[] #Track names that need to be deleted
for item in exclude_list:
    for name in (process_list):
        if item in name:
            deletion_list.append(name)
# A list comprehension
process_list=[ x for x in process_list if x not in deletion_list ]
It works, but my gut tells me there may be a more elegant way. As it stands, it needs another list to store the names that need to be deleted. Any ideas?
You can use a list comprehension with all() as the filter:
# Here: `p` is the entry from `process_list`
# `e` is the entry from `exclude_list`
>>> [p for p in process_list if all(e not in p for e in exclude_list)]
['/test/fruit/banana', '/test/animal/penguin']
Regarding your statement:
Of course this didn't work because removing elements from the list while iterating over it using a for loop is not permitted. The code only removed the first match and then stopped.
You could have iterated over a copy of the list being modified, as:
for name in list(process_list): # OR, for name in process_list[:]:
#           ^-- creates a new copy, so process_list.remove(name) is safe inside the loop
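A short demonstration of the difference, as a sketch with made-up sample data: removing from the list being iterated makes the loop skip the element that slides into the freed position, while iterating over a copy does not.

```python
names = ["apple", "pineapple", "banana"]  # hypothetical sample data

# Removing while iterating over the same list: after "apple" is removed,
# "pineapple" slides into index 0 and the iterator moves on to index 1.
broken = list(names)
for name in broken:
    if "apple" in name:
        broken.remove(name)

# Iterating over a copy leaves the iterator unaffected by the removals.
fixed = list(names)
for name in list(fixed):
    if "apple" in name:
        fixed.remove(name)

print(broken)  # ['pineapple', 'banana']
print(fixed)   # ['banana']
```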
In addition, you can also use a regular expression, e.g.
import re
pattern = '|'.join(re.escape(e) for e in exclude_list)  # escape in case an entry contains regex metacharacters
list(filter(lambda l: re.search(pattern, l) is None, process_list))  # filter returns an iterator in Python 3, hence list()
Use os.path.basename to get the basename of each pathname, and use the built-in function all to check that no entry of exclude_list occurs in the basename.
import os
process_list=["/test/fruit/apple","/test/fruit/pineapple","/test/fruit/banana","/test/tech/apple-pen","/test/animal/python","/test/animal/penguin"]
# Case 1
exclude_list=["apple","python"]
l = [s for s in process_list
     if all(item not in os.path.basename(s) for item in exclude_list)]
print(l)
# ['/test/fruit/banana', '/test/animal/penguin']
# Case 2
exclude_list=["pen","banana"]
l = [s for s in process_list
     if all(item not in os.path.basename(s) for item in exclude_list)]
print(l)
# ['/test/fruit/apple', '/test/fruit/pineapple', '/test/animal/python']
[line for line in lines if not any(word in line for word in words)]
Another approach to achieve what you want is as follows:
[item for item in process_list if not any(exc in item.split('/')[-1] for exc in exclude_list)]
Output:
>>> [item for item in process_list if not any(exc in item.split('/')[-1] for exc in exclude_list)]
['/test/fruit/banana', '/test/animal/penguin']
If you want to reassign process_list directly, a list comprehension is safe because it builds a new list before the name is rebound. Iterating over a copy (process_list[:]), as below, is harmless but not required:
process_list=["/test/fruit/apple","/test/fruit/pineapple","/test/fruit/banana","/test/tech/apple-pen","/test/animal/python","/test/animal/penguin"]
exclude_list=["apple","python"]
process_list = [x for x in process_list[:] if not any(y in x for y in exclude_list)]
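If other variables refer to the same list and should see the filtered result too, slice assignment replaces the contents in place instead of rebinding the name. A minimal sketch; the alias variable is only there to illustrate the point.

```python
process_list = ["/test/fruit/apple", "/test/fruit/pineapple", "/test/fruit/banana",
                "/test/tech/apple-pen", "/test/animal/python", "/test/animal/penguin"]
exclude_list = ["apple", "python"]
alias = process_list  # a second reference to the same list object

# The right-hand side is built completely before the contents are replaced.
process_list[:] = [p for p in process_list if not any(e in p for e in exclude_list)]

print(process_list)           # ['/test/fruit/banana', '/test/animal/penguin']
print(alias is process_list)  # True
```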

Python: trying to get three elements of a list with slice over a iterator

I'm new to python.
I'm trying to create another list from a big one just with 3 elements of that list at a time.
I'm trying this:
my_list = ['test1,test2,test3','test4,test5,test6','test7,test8,test9','test10,test11,test12']
new_three = []
for i in my_list:
    item = my_list[int(i):3]
    new_three.append(item)
# here I'll write a file with these 3 elements. Next iteration I will write the next three ones, and so on...
I'm getting this error:
item = my_list[int(i):3]
ValueError: invalid literal for int() with base 10: 'test1,test2,test3'
I also tried:
from itertools import islice
for i in my_list:
    new_three.append(islice(my_list,int(i),3))
Got the same error. I cannot figure out what I'm doing wrong.
EDIT:
After many tries, and with the help here, I got it working.
listrange = []
for i in range((len(li) + 2) // 3):  # number of groups of three, rounded up
    item = li[i*3:(i*3)+3]
    listrange.append(item)
Is this what you meant?
my_list = ['test1,test2,test3','test4,test5,test6','test7,test8,test9','test10,test11,test12']
for item in my_list:
    print("this is one item from the list :", item)
    list_of_things = item.split(',')
    print("make a list with split on comma:", list_of_things)
    # you can write list_of_things to disk here
    print("--------------------------------")
In response to comments, if you want to generate a whole new list with the comma separated strings transformed into sublists, that is a list comprehension:
new_list = [item.split(',') for item in my_list]
And to split it up into groups of three items from the original list, see the answer linked in comments by PM 2Ring, What is the most "pythonic" way to iterate over a list in chunks?
I have adapted that to your specific case here:
my_list = ['test1,test2,test3','test4,test5,test6','test7,test8,test9','test10,test11,test12']
for i in range(0, len(my_list), 3):
    # get the next three items from my_list
    my_list_segment = my_list[i:i+3]
    # here is an example of making a new list with those three
    new_list = [item.split(',') for item in my_list_segment]
    print("three items from original list, with string split into sublist")
    print(new_list)
    print("-------------------------------------------------------------")
    # here is a more practical use of the three items, if you are writing separate files for each three
    filename_this_segment = 'temp' # make up a filename, possibly using i//3+1 in the name
    with open(filename_this_segment, 'w') as f:
        for item in my_list_segment:
            list_of_things = item.split(',')
            for thing in list_of_things:
                # obviously you'll want to format the file somehow, but that's beyond the scope of this question
                f.write(thing)
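The chunking step can also be pulled out into a small reusable generator. This is a sketch; the name chunks is made up here and is not a built-in.

```python
def chunks(seq, size):
    """Yield successive slices of seq, each at most `size` items long."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

my_list = ['test1,test2,test3', 'test4,test5,test6', 'test7,test8,test9', 'test10,test11,test12']
for number, group in enumerate(chunks(my_list, 3), start=1):
    print(number, group)
# 1 ['test1,test2,test3', 'test4,test5,test6', 'test7,test8,test9']
# 2 ['test10,test11,test12']
```

The group number from enumerate could be used to build a distinct filename for each group.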
