delete the second item that starts with the same substring - python

I have a list l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
I want to delete the elements that are a prefix of another element in the list, if there are any (in this case 'abcd' and 'ghi').
N.B.: in my situation, I know that the 'repeated' elements, if they exist, can only be 'abcd' or 'ghi'.
To delete them, I used this:
>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l
>>> l.remove('ghi') if ('ghijklm' in l and 'ghi' in l) else l
>>> l
['abcdef', 'ghijklm', 'xyz', 'pqrs']
Is there a more efficient (or more automated) way to do this?

You can do it with two passes over the list, using O(n*m²) time and memory (where n is the number of elements and m is the length of the longest element):
prefixes = set()
for word in l:
    # record every proper prefix of word: word[:1], word[:2], ..., word[:-1]
    for x in range(1, len(word)):
        prefixes.add(word[:x])
result = [word for word in l if word not in prefixes]
Iterate over each word and build a set of its prefixes: the first character, then the first two characters, and so on up to all the characters of the word except the last one. Then iterate over the list again: if a word appears in that set, it is a proper prefix of some other word in the list, so it is dropped.

l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
for a in l[:]:  # iterate over copies so that removing from l is safe
    for b in l[:]:
        if a.startswith(b) and a != b:
            l.remove(b)  # b is a shorter prefix of a
print(l)
Output:
['abcdef', 'ghijklm', 'xyz', 'pqrs']

The following code does what you described: whenever one element starts with another, it removes the shorter one.
your_list = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
print("Original list: %s" % your_list)
# iterate over copies, because removing items from a list while looping over it skips elements
for element in your_list[:]:
    for element2 in your_list[:]:
        if element.startswith(element2) and element != element2:
            print("%s starts with %s" % (element, element2))
            print("Remove: %s" % element2)
            your_list.remove(element2)
print("Removed list: %s" % your_list)
Output:
Original list: ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
abcdef starts with abcd
Remove: abcd
ghijklm starts with ghi
Remove: ghi
Removed list: ['abcdef', 'ghijklm', 'xyz', 'pqrs']
On the other hand, I think there is a simpler solution, and you can solve it with a list comprehension if you want.
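A list-comprehension version of the same pairwise check might look like the sketch below. It keeps a word only if no other, different word in the list starts with it; like the loops above it compares every pair of words, but it builds a new list instead of mutating the original. The function name `drop_shorter_prefixes` is just an illustrative choice.

```python
def drop_shorter_prefixes(words):
    """Keep each word unless a different (hence longer) word in `words` starts with it."""
    return [
        w for w in words
        if not any(other != w and other.startswith(w) for other in words)
    ]


l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
print(drop_shorter_prefixes(l))  # ['abcdef', 'ghijklm', 'xyz', 'pqrs']
```

Exact duplicates are left alone, since `other != w` is false for them.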

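# Andrew Allen's way
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
l = sorted(l)
i = 0
# after sorting, a prefix sits directly before a word that extends it
while i < len(l) - 1:
    if l[i + 1].startswith(l[i]):
        del l[i]  # drop the shorter prefix and re-check the same index
    else:
        i += 1
print(l)
# ['abcdef', 'ghijklm', 'pqrs', 'xyz']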
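The sorted approach works because, in lexicographic order, a proper prefix is immediately followed by a word that starts with it, so only neighbouring pairs need comparing. If you need the original order back, a sketch that collects the prefixes from neighbouring pairs and then filters the unsorted list might look like this. The function name is hypothetical, and `set()` collapses exact duplicates so they are not treated as prefixes of each other.

```python
def drop_prefixes_keep_order(words):
    """Remove every word that is a proper prefix of another word, keeping input order."""
    ordered = sorted(set(words))  # duplicates collapsed, lexicographic order
    # In sorted order, a proper prefix is directly followed by a word that starts with it.
    prefixes = {a for a, b in zip(ordered, ordered[1:]) if b.startswith(a)}
    return [w for w in words if w not in prefixes]


l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
print(drop_prefixes_keep_order(l))  # ['abcdef', 'ghijklm', 'xyz', 'pqrs']
```

Using `startswith` rather than `in` matters here: with `in`, 'b' would be dropped from ['b', 'cb'] even though it is not a prefix of 'cb'.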

Try this:
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
for i in l[:]:  # loop over copies; never remove from the list being iterated
    for j in l[:]:
        if len(i) > len(j) and i.startswith(j):
            l.remove(j)
print(l)

You can use:
l = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']
if "abcdef" in l:  # only 1 containment check instead of 2
    l = [x for x in l if x != "abcd"]  # removes _all_ "abcd" entries
    # or, if you know there is exactly one "abcd" in l:
    # l.remove("abcd")  # list.remove() works in place and returns None, so don't assign its result
This might be slightly faster (if you have far more elements than you show) because you only need to check once for "abcdef" and then scan once, up to the first match or through the whole list, to remove "abcd".
By contrast,
>>> l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l
scans l up to twice in full (if unlucky) to check containment, and then still needs to scan it again to remove the element.
DISCLAIMER:
Unless this is a proven, measured bottleneck or security-critical, I would not bother with it unless measurements suggest this is the biggest time saver in the code overall. With lists of up to a few dozen or hundred elements (gut feeling; your data does not support any real analysis), the estimated gain is negligible.
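If you do want measurements before optimizing, the standard-library `timeit` module is one way to compare the two variants. The sketch below times both on the small example list; `number=100_000` is an arbitrary choice, and the timings you get depend entirely on your machine and data.

```python
import timeit

setup = "base = ['abcdef', 'abcd', 'ghijklm', 'ghi', 'xyz', 'pqrs']"

# Variant 1: two containment checks, then remove()
two_checks = """
l = list(base)
l.remove('abcd') if ('abcdef' in l and 'abcd' in l) else l
"""

# Variant 2: one containment check, then rebuild the list without 'abcd'
one_check = """
l = list(base)
if 'abcdef' in l:
    l = [x for x in l if x != 'abcd']
"""

t1 = timeit.timeit(two_checks, setup=setup, number=100_000)
t2 = timeit.timeit(one_check, setup=setup, number=100_000)
print(f"two checks: {t1:.3f}s  one check: {t2:.3f}s")
```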

Related

How to filter out strings that do not start with specific chars from a list [python]? [duplicate]

Given the list ['a','ab','abc','bac'], I want to compute a list with strings that have 'ab' in them. I.e. the result is ['ab','abc']. How can this be done in Python?
This simple filtering can be achieved in many ways with Python. The best approach is to use "list comprehensions" as follows:
>>> lst = ['a', 'ab', 'abc', 'bac']
>>> [k for k in lst if 'ab' in k]
['ab', 'abc']
Another way is to use the filter function. In Python 2:
>>> filter(lambda k: 'ab' in k, lst)
['ab', 'abc']
In Python 3, it returns an iterator instead of a list, but you can cast it:
>>> list(filter(lambda k: 'ab' in k, lst))
['ab', 'abc']
Though it's better practice to use a comprehension.
[x for x in L if 'ab' in x]
# To match only at the beginning of each string, rather than anywhere:
items = ['a', 'ab', 'abc', 'bac']
prefix = 'ab'
list(filter(lambda x: x.startswith(prefix), items))  # ['ab', 'abc']
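Since the title asks about strings that start with specific characters, note that `str.startswith` also accepts a tuple of prefixes and returns True if any of them matches. A sketch that keeps only items beginning with one of several prefixes (the `prefixes` tuple and the extra 'xyz' item are just examples):

```python
items = ['a', 'ab', 'abc', 'bac', 'xyz']
prefixes = ('ab', 'ba')  # example prefixes; any tuple of strings works

# str.startswith accepts a tuple and returns True if any prefix matches
matches = [x for x in items if x.startswith(prefixes)]
print(matches)  # ['ab', 'abc', 'bac']
```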
Tried this out quickly in the interactive shell:
>>> l = ['a', 'ab', 'abc', 'bac']
>>> [x for x in l if 'ab' in x]
['ab', 'abc']
>>>
Why does this work? Because the in operator is defined for strings to mean: "is substring of".
Also, you might want to consider writing out the loop as opposed to using the list comprehension syntax used above:
l = ['a', 'ab', 'abc', 'bac']
result = []
for s in l:
    if 'ab' in s:
        result.append(s)
mylist = ['a', 'ab', 'abc']
assert 'ab' in mylist

How can I remove a specific number from a string?

Here is the code:
aList = ['0.01', 'xyz', 'J0.01', 'abc', 'xyz']
aList.remove('0.01')
print("List : ", aList)
Here is the output:
List :  ['xyz', 'J0.01', 'abc', 'xyz']
How can I remove the 0.01 attached to 'J0.01'? I would like to keep the J. Thanks for your time! =)
Seems like you want
>>> aList = ['0.01', 'xyz', 'J0.01', 'abc', 'xyz']
>>> [z.replace('0.01', '') for z in aList]
['', 'xyz', 'J', 'abc', 'xyz']
If you also want to drop empty or whitespace-only strings:
>>> [z.replace('0.01', '') for z in aList if z.replace('0.01', '').strip()]
['xyz', 'J', 'abc', 'xyz']
Using the re module:
import re
aList = ['0.01', 'xyz', 'J0.01', 'abc', 'xyz']
print([i for i in (re.sub(r'\d+\.?\d*$', '', i) for i in aList) if i])
Prints:
['xyz', 'J', 'abc', 'xyz']
EDIT:
The regex substitution re.sub(r'\d+\.?\d*$', '', i) replaces one or more digits, followed by an optional dot and then any number of digits, with the empty string. The $ means the number must be at the end of the string.
So, e.g., the following matches are valid: "0.01", "0.", "0".
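To see what the pattern `\d+\.?\d*$` actually strips, it helps to run it on a few inputs. Note that it removes any trailing number, not only `0.01`; the sample strings below are made up for illustration.

```python
import re

# One or more digits, an optional dot, any further digits, anchored at the end.
pattern = re.compile(r'\d+\.?\d*$')

samples = ['0.01', 'J0.01', '0.', 'x10', 'a1.5b', 'abc']
results = {s: pattern.sub('', s) for s in samples}
print(results)
# {'0.01': '', 'J0.01': 'J', '0.': '', 'x10': 'x', 'a1.5b': 'a1.5b', 'abc': 'abc'}
```

'a1.5b' is unchanged because its number is not at the end of the string.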
Something like this can work:
l = ['0.01', 'xyz', 'J0.01', 'abc', 'xyz']
string = '0.01'
result = []
for x in l:
    if string in x:
        substring = x.replace(string, '')
        if substring != "":
            result.append(substring)
    else:
        result.append(x)
print(result)
Try it.

List comprehension: Multiply each string to a single list

I have a list of strings and want to get a new list in which each element is repeated a number of times.
lst = ['abc', '123']
n = 3
I can do that with a for loop:
res = []
for i in lst:
    res = res + [i] * n
print(res)
['abc', 'abc', 'abc', '123', '123', '123']
How do I do it with list comprehension?
My best try so far:
[ [i]*n for i in ['abc', '123'] ]
[['abc', 'abc', 'abc'], ['123', '123', '123']]
Use a nested list comprehension
>>> lst = ['abc', '123']
>>> n = 3
>>> [i for i in lst for j in range(n)]
['abc', 'abc', 'abc', '123', '123', '123']
The idea is that the outer loop goes through the list once, and for each element the inner loop runs n times, emitting that element n times (thrice here).
See What does "list comprehension" mean? How does it work and how can I use it?
It can also be done as:
>>> lst = ['abc', '123']
>>> n=3
>>> [j for i in lst for j in (i,)*n]
['abc', 'abc', 'abc', '123', '123', '123']
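Outside a comprehension, the standard library's `itertools.repeat` and `itertools.chain.from_iterable` can express the same flattening. This is an alternative sketch using the same `lst` and `n` as above.

```python
from itertools import chain, repeat

lst = ['abc', '123']
n = 3

# repeat(i, n) yields i n times; chain.from_iterable flattens those iterators
res = list(chain.from_iterable(repeat(i, n) for i in lst))
print(res)  # ['abc', 'abc', 'abc', '123', '123', '123']
```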

Search a list of strings with a list of substrings

I have a list of strings and currently I can search for one substring at the time:
str = ['abc', 'efg', 'xyz']
[s for s in str if "a" in s]
which correctly returns
['abc']
Now let's say I have a list of substrings instead:
subs = ['a', 'ef']
I want a command like
[s for s in str if anyof(subs) in s]
which should return
['abc', 'efg']
>>> s = ['abc', 'efg', 'xyz']
>>> subs = ['a', 'ef']
>>> [x for x in s if any(sub in x for sub in subs)]
['abc', 'efg']
Don't use str as a variable name; it shadows the built-in str type.
It gets a little convoluted, but you could do:
[s for s in str if any([sub for sub in subs if sub in s])]
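Simply use them one after the other (note that a string is returned once for each substring it contains, so it can appear more than once in the result):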
[s for s in str for r in subs if r in s]
>>> r = ['abc', 'efg', 'xyz']
>>> s = ['a', 'ef']
>>> [t for t in r for x in s if x in t]
['abc', 'efg']
I still like map and filter, despite what is said against them and even though a comprehension can always replace a map and a filter. Hence, here is a map + filter + lambda version (Python 3):
print(list(filter(lambda x: any(map(x.__contains__, subs)), s)))
which reads:
filter elements of s that contain any element from subs
I like how this uses words that carry a strong semantic meaning, rather than only if, for, in
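When there are many substrings, another option is to combine them into a single regular expression so each string needs only one `search` call. This is a sketch; `re.escape` is used so that characters such as `.` in a substring are matched literally. Note that an empty `subs` list would produce an empty pattern that matches every string, whereas `any()` over an empty list is False.

```python
import re

s = ['abc', 'efg', 'xyz']
subs = ['a', 'ef']

# Build one alternation pattern like 'a|ef', escaping regex metacharacters.
pattern = re.compile('|'.join(map(re.escape, subs)))

# pattern.search finds a match anywhere in the string, like `sub in x`
matches = [x for x in s if pattern.search(x)]
print(matches)  # ['abc', 'efg']
```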

How to check if list1 contains some elements of list2?

I need to remove a few strings from list1, so I put them into list2, but I can't figure out how to make it work.
list1 = ['abc', 'def', '123']
list2 = ['def', 'xyz', 'abc'] # stuff to delete from list1
I would like to remove 'abc' and 'def' from list1 so that it only contains the things I need.
You can do this by using a list comprehension as a filter, like this:
set2, list1 = set(['def', 'xyz', 'abc']), ['abc', 'def', '123']
print([item for item in list1 if item not in set2])
# ['123']
We convert the elements of list2 to a set, because they offer faster lookups.
The logic is equivalent to writing:
result = []
for item in list1:
    if item not in set2:
        result.append(item)
If you don't have any duplicate in list1 (or if you want to remove duplicates), you can use this:
list1 = set(['abc', 'def', '123'])
list2 = set(['def', 'xyz', 'abc'])
print(list(list1 - list2))
list1 = set(['abc', 'def', '123'])
list2 = set(['def', 'xyz', 'abc'])
# result contains only the elements common to both sets,
# so it is usually small.
result = set(filter(set(list1).__contains__, list2))
newlist = list()
for elm in list1:
    if elm not in result:
        newlist.append(elm)
print(newlist)
Output:
['123']
An even shorter answer using builtin set methods:
list1 = ['abc', 'def', '123']
list2 = ['def', 'xyz', 'abc']
set1 = set(list1)
set2 = set(list2)
print(set1.difference(set2))
From the set.difference() documentation:
"Return a new set with elements in the set that are not in the others."
