Python finding multiple strings

I have a list ['test_x', 'text', 'x']. Is there any way to use a regex in Python to find whether each string in the list contains either '_x' or 'x'?
'x' should match only as a standalone string, not as part of a word unless it is preceded by _.
The output should result in ['test_x', 'x'].
Thanks.

Using a one-line list comprehension:
l = ['test_x', 'text', 'x']
result = [i for i in l if '_x' in i or 'x' == i]

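You can use a regexp this way:
import re
l = ['test_x', 'text', 'x']
print(list(filter(lambda x: re.findall(r'_x|^x$', x), l)))
The regexp searches each element of the list for '_x' anywhere, or for a string that is exactly 'x' (^ and $ anchor the match to the whole string). filter applies the function to each element of the iterable and keeps the elements for which it returns a non-empty (truthy) list.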
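A minimal sketch of the same filter written as a list comprehension, assuming the list from the question: compiling the pattern once and using re.search (which returns a match object or None) avoids building a list of matches just to test whether there is one.

```python
import re

words = ['test_x', 'text', 'x']

# '_x' anywhere in the string, or a string that is exactly 'x'
pattern = re.compile(r'_x|^x$')

# pattern.search returns a Match object (truthy) or None (falsy)
result = [w for w in words if pattern.search(w)]
print(result)  # ['test_x', 'x']
```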
You can make your expression more generic this way:
print(list(filter(lambda x: re.findall(r'[^A-Za-z]x|^\W*x\W*$', x), l)))
Here I am telling Python to search for an x preceded by any character that is not a letter (A to Z or a to z), OR for a string consisting of an x surrounded only by zero or more non-word characters. You can refer to this quick cheatsheet on regular expressions: https://www.debuggex.com/cheatsheet/regex/python
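To see what the more generic pattern accepts, here is a sketch run against a few extra strings; '(x)', 'a-x', '3x' and 'box' are hypothetical inputs, not from the question. Note that [^A-Za-z] also accepts digits, so '3x' matches.

```python
import re

# Hypothetical inputs added to the question's list to probe the pattern
samples = ['test_x', 'text', 'x', '(x)', 'a-x', '3x', 'box']

generic = re.compile(r'[^A-Za-z]x|^\W*x\W*$')

matched = [s for s in samples if generic.search(s)]
print(matched)  # ['test_x', 'x', '(x)', 'a-x', '3x']
```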

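import re
[re.findall('x|_x', s) for s in your_list]
Note that this returns the list of matches for each string ([['_x'], ['x'], ['x']] for the example list) rather than filtering the list, and it also matches the x inside 'text'.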

Related

Need to remove quotes inside the array of a string using python

input = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"
expected output:
"'Siva', [Aswin,latha], 'Senthil',[Aswin,latha]"
I have used positive lookbehind and lookahead but it's not working.
Pattern:
(?<=\[)\'+(?=\])
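A short sketch of why the lookaround attempt finds nothing: (?<=\[)'+(?=\]) only matches quotes that sit directly between [ and ], as in the hypothetical string "['']". Python's re module also requires look-behind patterns to be fixed-width, so the lookbehind cannot simply be widened to "anything after a ["; matching the whole bracketed group, as the answers do, sidesteps that limit.

```python
import re

inp = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"
pattern = r"(?<=\[)'+(?=\])"

# No quote in the input is immediately preceded by '[' AND followed by ']'
print(re.findall(pattern, inp))     # []

# Only quotes sandwiched directly between the brackets match (hypothetical input)
print(re.findall(pattern, "['']"))  # ["''"]

# A variable-width look-behind is rejected by the re module
error_message = None
try:
    re.compile(r"(?<=\[[^\]]*)'")
except re.error as exc:
    error_message = str(exc)
print("re.error:", error_message)
```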
We can use re.sub here with a callback lambda function:
import re
inp = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"
output = re.sub(r'\[.*?\]', lambda x: x.group().replace("'", ""), inp)
print(output)
This prints:
'Siva', [Aswin,latha], 'Senthil',[Aswin,latha]
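The callback receives a re.Match object for each bracketed group. The same idea with a named function (strip_quotes is a hypothetical helper name) may be easier to read and test:

```python
import re

def strip_quotes(match: re.Match) -> str:
    # match.group() is the whole bracketed text, e.g. "['Aswin','latha']"
    return match.group().replace("'", "")

inp = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"

# Non-greedy .*? stops at the first ']' so each list is handled separately
output = re.sub(r'\[.*?\]', strip_quotes, inp)
print(output)  # 'Siva', [Aswin,latha], 'Senthil',[Aswin,latha]
```

This assumes the lists are not nested; with a nested list the non-greedy match would end at the first ].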
import re
input = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"
print(re.sub(r"(?<=\[).*?(?=\])", lambda val: re.sub(r"'(\w+?)'", r"\1", val.group()), input))
# 'Siva', [Aswin,latha], 'Senthil',[Aswin,latha]
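You can try something like this if you don't want to use re (ast.literal_eval parses the string as a tuple of Python literals and, unlike eval, does not execute arbitrary code):
import ast
X = ast.literal_eval("'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']")
Y = []
for x in X:
    # Lists are written without quotes around their items; plain strings keep them
    Y.append(f"[{','.join(x)}]" if isinstance(x, list) else f"'{x}'")
print(", ".join(Y))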
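You can use re.findall with an alternation pattern to find either fragments ending with [, fragments starting with ] and running up to the next [, or otherwise runs of non-single-quote characters, and then join the fragments together with ''.join. Quote characters inside the brackets are matched by no alternative, so they are dropped:
import re
''.join(re.findall(r"[^\]]*\[|\][^\[]*|[^']+", input))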
Demo: https://replit.com/#blhsing/ClassicFrostyBookmark
This is generally more efficient than using re.sub with a callback since there is overhead involved in making a callback for each match.
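The efficiency claim is easy to check on your own data. A rough sketch with timeit, assuming the input string from the question; it prints timings without asserting which is faster, since the results depend on the Python version and input size:

```python
import re
import timeit

inp = "'Siva', ['Aswin','latha'], 'Senthil',['Aswin','latha']"

def with_sub(s):
    # One Python-level callback invocation per bracketed group
    return re.sub(r'\[.*?\]', lambda m: m.group().replace("'", ""), s)

def with_findall(s):
    # Collect the fragments to keep and join them; no callback involved
    return ''.join(re.findall(r"[^\]]*\[|\][^\[]*|[^']+", s))

# Both approaches should produce the same string
assert with_sub(inp) == with_findall(inp)

for fn in (with_sub, with_findall):
    seconds = timeit.timeit(lambda: fn(inp), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s for 10,000 calls")
```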

list comprehension with substring replacement not working as intended

I have a list as follows:
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
I want to replace all occurrences of '_comb' with '_eeee',
but not in 'xx_combined': the replacement should happen only if the word ends with '_comb'.
I tried
[sub.replace('_comb', '_eeee') for sub in alist if '_combined' not in sub]
But this does not work.
"Only if the word ends with _comb, then the replacement should occur."
This is a job for .endswith, not in (which only checks for a substring). Also, you should use a conditional (ternary) expression rather than the comprehension's if clause, which filters elements out. That is:
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
result = [i.replace('_comb', '_eeee') if i.endswith('_comb') else i for i in alist]
print(result) # ['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee']
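One caveat with i.replace: it replaces every occurrence of '_comb', not just the trailing one, so a hypothetical element like 'a_comb_comb' would become 'a_eeee_eeee'. A sketch that rewrites only the suffix, using str.removesuffix (Python 3.9+):

```python
# The last item is a hypothetical example with '_comb' appearing twice
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb', 'a_comb_comb']

# removesuffix strips '_comb' only from the end; the new suffix is then appended
result = [s.removesuffix('_comb') + '_eeee' if s.endswith('_comb') else s for s in alist]
print(result)  # ['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee', 'a_comb_eeee']
```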
The way your condition is written means that any value with _combined in it is not in your output list. Instead, you need to make the replace conditional on _combined not being in the value:
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
print([sub.replace('_comb', '_eeee') if '_combined' not in sub else sub for sub in alist])
Output:
['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee']
Based on the wording of your question though, you might be better off using re.sub to replace _comb at the end of the string with _eeee:
import re
alist = ['xx_comb', 'xx_combined', 'xxx_rrr', '123_comb']
print([re.sub(r'_comb$', '_eeee', sub) for sub in alist])
Output:
['xx_eeee', 'xx_combined', 'xxx_rrr', '123_eeee']
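A detail worth knowing about the re.sub approach: $ also matches just before a trailing newline, so a hypothetical value 'xx_comb\n' is rewritten too. If the suffix must be the very end of the string, \Z is stricter:

```python
import re

s = 'xx_comb\n'  # hypothetical value with a trailing newline

dollar = re.sub(r'_comb$', '_eeee', s)
strict = re.sub(r'_comb\Z', '_eeee', s)
print(repr(dollar))  # 'xx_eeee\n'
print(repr(strict))  # 'xx_comb\n'
```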

Splitting string into two by comma using python

I have the following data in a list; it is a hex number:
['aaaaa955554e']
I would like to split this into ['aaaaa9,55554e'] with a comma.
I know how to split this when there is a delimiter, but how should I do it in this case?
Thanks
This will do what I think you are looking for:
yourlist = ['aaaaa955554e']
new_list = [','.join([x[i:i+6] for i in range(0, len(x), 6)]) for x in yourlist]
It will put a comma at every sixth character in each item in your list. (I am assuming you will have more than just one item in the list, and that the items are of unknown length. Not that it matters.)
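The same slicing idea wrapped in a small helper (chunk is a hypothetical name). It also shows that a final chunk shorter than six characters is kept rather than dropped:

```python
def chunk(s: str, size: int) -> list[str]:
    # Slice s into consecutive pieces of `size` characters; the last may be shorter
    return [s[i:i + size] for i in range(0, len(s), size)]

yourlist = ['aaaaa955554e', 'abcdefgh']  # second item is a hypothetical example

new_list = [','.join(chunk(x, 6)) for x in yourlist]
print(new_list)  # ['aaaaa9,55554e', 'abcdef,gh']
```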
I assume you want to split the string every 6th character,
using a regex:
import re
lst = ['aaaaa955554e']
newlst = re.findall(r'\w{6}', lst[0])
# ['aaaaa9', '55554e']
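One difference from the slicing approach: \w{6} only returns complete six-character runs, so trailing characters are silently dropped. Allowing a shorter last chunk with \w{1,6} keeps them; the 8-character string below is a hypothetical example:

```python
import re

s = 'abcdefgh'  # hypothetical input whose length is not a multiple of 6

exact = re.findall(r'\w{6}', s)
with_tail = re.findall(r'\w{1,6}', s)
print(exact)      # ['abcdef']
print(with_tail)  # ['abcdef', 'gh']
```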
Using a list comprehension, this works for multiple items in lst:
lst = ['aaaaa955554e']
newlst = [item[i:i+6] for item in lst for i in range(0, len(item), 6)]
# ['aaaaa9', '55554e']
This could be done using a regular expression substitution as follows:
import re
print(re.sub(r'([a-zA-Z]+\d)(.*?)', r'\1,\2', 'aaaaa955554e', count=1))
Giving you:
aaaaa9,55554e
This splits after the first run of letters followed by a digit.

How to get string list index?

I have a list text_lines = ['asdf','kibje','ABC','beea'] and I need to find the index where the string ABC appears.
ABC = [s for s in text_lines if "ABC" in s]
ABC is now ['ABC'].
How to get index?
First match only (raises StopIteration if nothing matches):
index = next(i for i, s in enumerate(text_lines) if "ABC" in s)
Or, collect all of them:
indices = [i for i, s in enumerate(text_lines) if "ABC" in s]
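If you would rather not handle StopIteration, next accepts a default value as its second argument. A sketch returning None when nothing matches ('XYZ' is just a string assumed to be absent from the list):

```python
text_lines = ['asdf', 'kibje', 'ABC', 'beea']

# The generator must be parenthesised when it is not the sole argument
index = next((i for i, s in enumerate(text_lines) if "ABC" in s), None)
missing = next((i for i, s in enumerate(text_lines) if "XYZ" in s), None)
print(index, missing)  # 2 None
```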
text_lines = ['asdf','kibje','ABC','beea']
abc_index = text_lines.index('ABC')
If 'ABC' appears only once, the code above works, because index returns the index of the first occurrence (it raises ValueError if there is none).
For multiple occurrences, you can check wim's answer.
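Since list.index raises ValueError when the value is absent, here is a minimal sketch of a wrapper that returns -1 instead (find_index is a hypothetical helper, mirroring the convention of str.find):

```python
def find_index(items, value):
    # list.index returns the first position of value, or raises ValueError
    try:
        return items.index(value)
    except ValueError:
        return -1

text_lines = ['asdf', 'kibje', 'ABC', 'beea']
print(find_index(text_lines, 'ABC'))  # 2
print(find_index(text_lines, 'XYZ'))  # -1
```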
A simple Python list method:
index = text_lines.index("ABC")
If the string is more complicated, you may need to combine with regex, but for a perfect match this simple solution is best.
Did you mean the index of "ABC" ?
If there is just one "ABC", you can use built-in index() method of list:
text_lines.index("ABC")
Else, if there are more than one "ABC"s, you can use enumerate over the list:
indices = [idx for idx,val in enumerate(text_lines) if val == "ABC"]

How do you use a regex in a list comprehension in Python?

I'm trying to locate all index positions of a string in a list of words and I want the values returned as a list. I would like to find the string if it is on its own, or if it is preceded or followed by punctuation, but not if it is a substring of a larger word.
The following code only captures "cow" and misses both "test;cow" and "cow.":
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == myString]
print(indices)
>> [5]
I have tried changing the code to use a regular expression:
import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == re.match('\W*myString\W*', myList)]
print(indices)
But this gives an error: expected string or buffer
If anyone knows what I'm doing wrong I'd be very happy to hear. I have a feeling it's something to do with the fact I'm trying to use a regular expression in there when it's expecting a string. Is there a solution?
The output I'm looking for should read:
>> [0, 4, 5]
Thanks
You don't need to compare the result of match with x. And your match should be on x rather than on the list. Note also that '\W*myString\W*' looks for the literal text myString, not the value of the variable.
Also, you need to use re.search instead of re.match, since the regex pattern '\W*myString\W*' will not match the first element. That's because test; is not matched by \W*. Actually, you only need to test the immediately preceding and following characters, not the complete string.
So, you can rather use word boundaries around the string:
pattern = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(pattern, x)]
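Word boundaries treat letters, digits and the underscore as word characters, so 'cow' next to an underscore or a digit is not matched. A sketch with hypothetical extra inputs to show where \b draws the line:

```python
import re

myString = 'cow'
pattern = r'\b' + re.escape(myString) + r'\b'

# 'cow-1' is a hypothetical input; '-' is not a word character, so it matches.
# 'my_cow' and 'cow2' are hypothetical too; '_' and '2' are word characters.
samples = ['test;cow', 'cow.', 'acow', 'cow-1', 'my_cow', 'cow2']
hits = [s for s in samples if re.search(pattern, s)]
print(hits)  # ['test;cow', 'cow.', 'cow-1']
```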
There are a few problems with your code. First, you need to match the expression against the list element (x), not against the whole list (myList). Second, in order to insert a variable into the expression, you have to use + (string concatenation). And finally, use raw literals (r'\W') so that backslashes in the expression are interpreted properly:
import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if re.match(r'\W*' + myString + r'\W*', x)]
print(indices)
If there are chances that myString contains special regexp characters (like a backslash or a dot), you'll also need to apply re.escape to it:
regex = r'\W*' + re.escape(myString) + r'\W*'
indices = [i for i, x in enumerate(myList) if re.match(regex, x)]
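As pointed out in the comments, the following is a better option: re.match only matches at the start of the string, so the patterns above miss 'test;cow' (they return [4, 5]). Word boundaries with re.search find all three: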
regex = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(regex, x)]
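A sketch comparing the three variants on the question's list: re.match with \W* is anchored at the start and misses 'test;cow', while re.search with \W* matches anywhere, and since \W* can match nothing it also accepts 'acow'. Only the word-boundary pattern gives the expected [0, 4, 5]:

```python
import re

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'

loose = r'\W*' + re.escape(myString) + r'\W*'
bounded = r'\b' + re.escape(myString) + r'\b'

match_loose = [i for i, x in enumerate(myList) if re.match(loose, x)]
search_loose = [i for i, x in enumerate(myList) if re.search(loose, x)]
search_bounded = [i for i, x in enumerate(myList) if re.search(bounded, x)]

print(match_loose)     # [4, 5]
print(search_loose)    # [0, 4, 5, 6]
print(search_bounded)  # [0, 4, 5]
```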
