list comprehension with substring replacement not working as intended - python

I have a list as follows:
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
I want to replace all occurrences of '_comb' with '_eeee' .
But not 'xx_combined'. Only if the word ends with '_comb', then the replacement should happen.
I tried
[sub.replace('_comb', '_eeee') for sub in alist if '_combined' not in sub]
But this does not work.

Only if the word ends with _comb, then the replacement should occur.
This is a job for .endswith rather than in (which tests for a substring). You should also use a conditional expression (ternary if) rather than the comprehension's trailing if, which filters items out. That is:
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
result = [i.replace('_comb', '_eeee') if i.endswith('_comb') else i for i in alist]
print(result) # ['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee']
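One caveat with the line above: str.replace swaps every occurrence of '_comb' in a matching string, not only the trailing one. A minimal sketch that touches only the suffix, assuming that is what is wanted (replace_suffix is a hypothetical helper name and '_comb_x_comb' is an extra test value, neither is from the question):

```python
def replace_suffix(text, old, new):
    """Return text with a trailing `old` swapped for `new`; otherwise unchanged."""
    if old and text.endswith(old):
        # Slice off the suffix so earlier occurrences of `old` are left alone.
        return text[:-len(old)] + new
    return text

# '_comb_x_comb' is a made-up value that contains the suffix twice.
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb', '_comb_x_comb']
result = [replace_suffix(s, '_comb', '_eeee') for s in alist]
print(result)  # ['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee', '_comb_x_eeee']
```

With plain replace the last item would come out as '_eeee_x_eeee' instead.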

The way your condition is written means that any value with _combined in it is not in your output list. Instead, you need to make the replace conditional on _combined not being in the value:
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
print([sub.replace('_comb', '_eeee') if '_combined' not in sub else sub for sub in alist])
Output:
['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee']
Based on the wording of your question though, you might be better off using re.sub to replace _comb at the end of the string with _eeee:
import re
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
print([re.sub(r'_comb$', '_eeee', sub) for sub in alist])
Output:
['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee']
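A small note on the anchor, in case the strings come from a file and still carry a line ending: in Python's re module, $ also matches just before a trailing newline, while \Z matches only at the very end of the string. A short sketch of the difference (the 'xx_comb\n' value is an assumed example, not from the question):

```python
import re

with_newline = 'xx_comb\n'

# `$` matches just before the trailing newline, so the suffix is rewritten.
dollar_result = re.sub(r'_comb$', '_eeee', with_newline)

# `\Z` matches only at the absolute end of the string, so nothing changes.
strict_result = re.sub(r'_comb\Z', '_eeee', with_newline)

print(repr(dollar_result))  # 'xx_eeee\n'
print(repr(strict_result))  # 'xx_comb\n'
```

Which behaviour is preferable depends on whether a trailing newline should count as part of the word.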

Related

check if a nested list contains a substring

How to check if a nested list contains a substring?
strings = [[],["one", "two", "three"]]
substring = "wo"
strings_with_substring = [string for string in strings if substring in string]
print(strings_with_substring)
this script just prints :
[]
how to fix it? output should be:
two
==
Sayse, solution you provided doesn't work for me. I am new to python. I am sure I am missing something here. any thoughts?
import re
s = [[],["one", "two", "three"]]
substring = "wo"
# strings_with_substring = [string for string in strings if substring in string]
strings_with_substring = next(s for sl in strings for s in sl if substring in s)
print(strings_with_substring)
You are missing another level of iteration. Here is the looping logic without using a comprehension:
for sublist in strings:
    for item in sublist:
        if substring in item:
            print(item)
Roll that up to a comprehension:
[item for sublist in strings for item in sublist if substring in item]
You're looking for
next(s for sl in strings for s in sl if substring in s)
This outputs "two". If you want a list of all matching elements, replace next with your list comprehension, adding the extra level of iteration; likewise, change next to any if you just want a boolean result.
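The three variants side by side, as a minimal sketch using the data from the question. The None default passed to next is an addition here: it avoids a StopIteration when nothing matches.

```python
strings = [[], ["one", "two", "three"]]
substring = "wo"

# All matches, as a list.
matches = [item for sublist in strings for item in sublist if substring in item]

# First match only; None is returned when there is no match.
first = next(
    (item for sublist in strings for item in sublist if substring in item),
    None,
)

# Just a yes/no answer.
found = any(substring in item for sublist in strings for item in sublist)

print(matches)  # ['two']
print(first)    # two
print(found)    # True
```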
Since you said it should just print the string ~ You could use itertools to flatten your list and run it through a filter that you loop over.
from itertools import chain
strings = [[], ['one', 'two', 'three']]
substring = 'wo'
for found in filter(lambda s: substring in s, chain.from_iterable(strings)):
    print(found)

Python finding multiple strings

I've a list ['test_x', 'text', 'x']. Is there any way to use regex in python to find if the string inside the list contains either '_x' or 'x'?
'x' should be a single string and not be part of a word without _.
The output should result in ['test_x', 'x'].
Thanks.
Using one line comprehension:
l = ['test_x', 'text', 'x']
result = [i for i in l if '_x' in i or 'x' == i]
You can use regexp this way:
import re
print(list(filter(lambda x: re.findall(r'_x|^x$',x),l)))
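The regexp searches for the exact patterns ('_x' anywhere, or a string that is exactly 'x') in each element of the list. filter applies the function to each element of the iterable and keeps the elements for which it returns a truthy value.
You can make your expression more generic this way: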
print(list(filter(lambda x: re.findall(r'[^A-Za-z]x|^\W*x\W*$',x),l)))
Here I am telling Python to search for an x preceded by a character that is not a letter (A to Z or a to z), OR for strings that consist of x with zero or more non-word characters before and after it. You can refer to this quick cheatsheet on regular expressions: https://www.debuggex.com/cheatsheet/regex/python
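The same filter can also be written as a comprehension with a precompiled pattern and re.search, which stops at the first hit instead of collecting every match the way re.findall does. A minimal sketch with the list from the question (the names items and pattern are just illustrative):

```python
import re

items = ['test_x', 'text', 'x']

# '_x' anywhere in the string, or a string that is exactly 'x'.
pattern = re.compile(r'_x|^x$')

# re.search returns a match object (truthy) or None (falsy).
result = [s for s in items if pattern.search(s)]
print(result)  # ['test_x', 'x']
```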
[re.findall('x|_x', s) for s in your_list]

Splitting string into two by comma using python

I have following data in a list and it is a hex number,
['aaaaa955554e']
I would like to split this into ['aaaaa9,55554e'] with a comma.
I know how to split this when there are delimiters in between, but how should I do it in this case?
Thanks
This will do what I think you are looking for:
yourlist = ['aaaaa955554e']
new_list = [','.join([x[i:i+6] for i in range(0, len(x), 6)]) for x in yourlist]
It will put a comma at every sixth character in each item in your list. (I am assuming you will have more than just one item in the list, and that the items are of unknown length. Not that it matters.)
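The same idea pulled out into a small function, as a sketch in case the chunk size or separator needs to vary. chunk_join and its parameters are hypothetical names, and 'abc' is an extra value added to show a short item.

```python
def chunk_join(text, size=6, sep=','):
    """Join consecutive `size`-character slices of `text` with `sep`."""
    return sep.join(text[i:i + size] for i in range(0, len(text), size))

# 'abc' is a made-up item shorter than one chunk; it comes back unchanged.
yourlist = ['aaaaa955554e', 'abc']
new_list = [chunk_join(x) for x in yourlist]
print(new_list)  # ['aaaaa9,55554e', 'abc']
```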
I assume you want to split at every 6th character
using regex
import re
lst = ['aaaaa955554e']
newlst = re.findall(r'\w{6}', lst[0])
# ['aaaaa9', '55554e']
Using list comprehension, this works for multiple items in lst
lst = ['aaaaa955554e']
newlst = [item[i:i+6] for item in lst for i in range(0, len(item), 6)]
# ['aaaaa9', '55554e']
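The regex and the slicing approach differ when the length is not a multiple of 6: \w{6} only matches complete groups of six, so any leftover characters are dropped, while slicing keeps them as a shorter last piece. A quick sketch with an assumed 14-character value (not from the question):

```python
import re

value = 'aaaaa955554e12'  # made-up input: 14 characters, two left over

# Only full six-character groups match; the trailing '12' is lost.
regex_chunks = re.findall(r'\w{6}', value)

# Slicing keeps the remainder as a final, shorter chunk.
slice_chunks = [value[i:i + 6] for i in range(0, len(value), 6)]

print(regex_chunks)  # ['aaaaa9', '55554e']
print(slice_chunks)  # ['aaaaa9', '55554e', '12']
```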
This could be done using a regular expression substitution as follows:
import re
print(re.sub(r'([a-zA-Z]+\d)(.*?)', r'\1,\2', 'aaaaa955554e', count=1))
Giving you:
aaaaa9,55554e
This splits after seeing the first digit.

How to get string list index?

I have a list text_lines = ['asdf','kibje','ABC','beea'] and I need to find the index where the string ABC appears.
ABC = [s for s in text_lines if "ABC" in s]
ABC is now ['ABC'].
How to get index?
First match only (raises StopIteration if not found):
index = next(i for i, s in enumerate(text_lines) if "ABC" in s)
Or, collect all of them:
indices = [i for i, s in enumerate(text_lines) if "ABC" in s]
text_lines = ['asdf','kibje','ABC','beea']
abc_index = text_lines.index('ABC')
If 'ABC' appears only once, the above code works, because index returns the index of the first occurrence.
For multiple occurrences, see wim's answer.
Simple python list function.
index = text_lines.index("ABC")
If the string is more complicated, you may need to combine with regex, but for a perfect match this simple solution is best.
Did you mean the index of "ABC" ?
If there is just one "ABC", you can use built-in index() method of list:
text_lines.index("ABC")
Else, if there are more than one "ABC"s, you can use enumerate over the list:
indices = [idx for idx,val in enumerate(text_lines) if val == "ABC"]
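Both next(...) and list.index raise when nothing matches (StopIteration and ValueError respectively). A minimal sketch of handling the not-found case, assuming -1 is an acceptable sentinel; "XYZ" is a made-up search string that is not in the list:

```python
text_lines = ['asdf', 'kibje', 'ABC', 'beea']

# Substring search: pass a default to next() instead of letting it raise.
first_index = next((i for i, s in enumerate(text_lines) if "ABC" in s), -1)
missing_index = next((i for i, s in enumerate(text_lines) if "XYZ" in s), -1)

# Exact match: list.index raises ValueError when the value is absent.
try:
    exact_index = text_lines.index("XYZ")
except ValueError:
    exact_index = -1

print(first_index)    # 2
print(missing_index)  # -1
print(exact_index)    # -1
```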

python: keep char only if it is within this list

i have a list:
a = ['a','b','c'.........'A','B','C'.........'Z']
and i have string:
string1= 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
I want to keep ONLY those characters in string1 that exist in a.
What is the most efficient way to do this? Perhaps instead of having a be a list, I should just make it a string, like this: a='abcdefg..........ABC..Z' ?
This should be faster.
>>> import re
>>> string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
>>> a = ['E', 'i', 'W']
>>> r = re.compile('[^%s]+' % ''.join(a))
>>> print r.sub('', string1)
EiiWW
This is even faster than that.
>>> all_else = ''.join( chr(i) for i in range(256) if chr(i) not in set(a) )
>>> string1.translate(None, all_else)
'EiiWW'
44 microsec vs 13 microsec on my laptop.
How about that?
(Edit: turned out, translate yields the best performance.)
''.join([s for s in string1 if s in a])
Explanation:
[s for s in string1 if s in a]
creates a list of all characters in string1, but only if they are also in the list a.
''.join([...])
turns it back into a string by joining it with nothing ('') in between the elements of the given list.
List comprehension to the rescue!
wanted = ''.join(letter for letter in string1 if letter in a)
(Note that when passing a list comprehension to a function you can omit the brackets so that the full list isn't generated prior to being evaluated. While semantically the same as a list comprehension, this is called a generator expression.)
If you are going to do this with large strings, there is a faster solution using translate; see this answer.
@katrielalex: To spell it out:
import string
string1= 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
non_letters= ''.join(chr(i) for i in range(256) if chr(i) not in string.letters)
print string1.translate(None,non_letters)
print 'Simpler, but possibly less correct'
print string1.translate(None, string.punctuation+string.digits+string.whitespace)
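The translate snippets in this thread use the Python 2 form translate(None, deletechars), and string.letters is also Python 2 only (string.ascii_letters is the Python 3 name for ASCII letters). A rough Python 3 sketch, assuming the same small list a as in the first answer: str.translate takes a single table that maps code points to replacements, with None meaning "delete". No timing claims are made for this version.

```python
string1 = 's#$%ERGdfhliisgdfjkskjdfW$JWLI3590823r'
a = ['E', 'i', 'W']  # same small example list as the first answer
allowed = set(a)     # set membership is faster than scanning a list

# Generator-expression version: keep characters that are in `allowed`.
kept = ''.join(ch for ch in string1 if ch in allowed)

# translate version: map every unwanted character that occurs to None.
delete_table = {ord(ch): None for ch in set(string1) - allowed}
kept_translate = string1.translate(delete_table)

print(kept)            # EiiWW
print(kept_translate)  # EiiWW
```

The table here is built from the characters that actually occur in string1, so it would need rebuilding for a different input string.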
