Regular Expressions: Search in list in python3


I have a list of strings.
Consider the code below:
import re
mylist = ["http://abc/12345?abc", "https://abc/abc/2516423120?$abc$"]
r = re.compile(r"(\d{3,})")
result0 = list(filter(r.findall, mylist)) # Note 1
print(result0)
result1 = r.findall(mylist[0])
result2 = r.findall(mylist[1])
print(result1, result2)
The results are:
['http://abc/12345?abc', 'https://abc/abc/2516423120?$abc$']
['12345'] ['2516423120']
Why is there a difference in the results we get?

I'm not sure what you expected filter to do, but what it does here is return an iterator over all elements x of mylist for which bool(r.findall(x)) is True. That is the case whenever r.findall(x) returns a non-empty list, i.e. the regex matches somewhere in the string. Both strings contain a run of at least three digits, so here result0 contains the same values as mylist. filter only decides which of the original elements to keep; it never returns the matches themselves.
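If the goal was to collect the matched digits rather than to keep the matching strings, a list comprehension over r.findall is one way to get there. This is a minimal sketch using the same list and pattern as the question; the names kept, per_string and flat are only illustrative.

```python
import re

mylist = ["http://abc/12345?abc", "https://abc/abc/2516423120?$abc$"]
r = re.compile(r"(\d{3,})")

# filter() keeps each original string for which findall() returns
# a non-empty (truthy) list
kept = list(filter(r.findall, mylist))

# calling findall() on each string returns the matches themselves
per_string = [r.findall(s) for s in mylist]

# flatten everything into a single list of matched digit runs
flat = [m for s in mylist for m in r.findall(s)]

print(kept)        # the two original strings, unchanged
print(per_string)  # [['12345'], ['2516423120']]
print(flat)        # ['12345', '2516423120']
```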

Related

How to group all the first characters of a string in a list of string , all second character of a string and so on in a list of string in python

a=["cypatlyrm","aolsemone","nueeleuap"]
o/p needed is : canyoupleasetellmeyournamep
I have tried
res = ''
for i in range(len(a)):
    for j in range(len(a)):
        res += a[j][i]
it gives o/p : canyouple
how to get full output ?

You can use itertools.zip_longest with an empty string '' as the fill value, flatten the resulting tuples with itertools.chain, and then join the result to get what you want.
from itertools import zip_longest, chain
seq = ["cypatlyrm", "aolsemone", "nueeleuap"]
res = ''.join(chain.from_iterable(zip_longest(*seq, fillvalue='')))
print(res)
Output
canyoupleasetellmeyournamep
Using zip_longest makes sure that this also works with cases where the element sizes are not equal. If all elements in the list are guaranteed to be the same length then a normal zip would also work.
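The difference between zip_longest and zip only shows up when the strings have different lengths. The sketch below uses made-up input to illustrate it; interleave is a hypothetical helper name, not part of the answer above.

```python
from itertools import chain, zip_longest

def interleave(words):
    # take the 1st character of every word, then the 2nd, and so on;
    # shorter words contribute '' once they run out of characters
    return ''.join(chain.from_iterable(zip_longest(*words, fillvalue='')))

uneven = ["abc", "de", "f"]

print(interleave(uneven))                        # adfbec
# plain zip stops as soon as the shortest word is exhausted
print(''.join(chain.from_iterable(zip(*uneven))))  # adf
```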
If all the elements have the same length then you can use this approach that does not need libraries that have to be imported.
seq = ["cypatlyrm", "aolsemone", "nueeleuap"]
res = ''
for i in range(len(seq[0])):
    for j in seq:
        res += j[i]
print(res)

Regex: Split characters with "/"

I have these strings, for example:
['2300LO/LCE','2302KO/KCE']
I want to have output like this:
['2300LO','2300LCE','2302KO','2302KCE']
How can I do it with Regex in Python?
Thanks!

You can make a simple generator that yields the pairs for each string. Then you can flatten them into a single list with itertools.chain()
import re
from itertools import product, chain
def getCombos(s):
    # separate the leading digits from the slash-separated codes
    nums, code = re.match(r'(\d+)(.*)', s).groups()
    for pair in product([nums], code.split("/")):
        yield ''.join(pair)
a = ['2300LO/LCE','2302KO/KCE']
list(chain.from_iterable(map(getCombos, a)))
# ['2300LO', '2300LCE', '2302KO', '2302KCE']
This has the added side benefit of working with strings like '2300LO/LCE/XX/CC' which will give you ['2300LO', '2300LCE', '2300XX', '2300CC',...]
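The multi-code case can be checked with a compact variant of the same idea. This is a sketch rather than the answer's own code: expand_codes is a hypothetical helper name, and it assumes every input starts with at least one digit.

```python
import re

def expand_codes(s):
    # group 1: the leading digits, group 2: everything after them
    prefix, rest = re.match(r'(\d+)(.*)', s).groups()
    # attach the numeric prefix to every slash-separated code
    return [prefix + code for code in rest.split('/')]

print(expand_codes('2300LO/LCE'))        # ['2300LO', '2300LCE']
print(expand_codes('2300LO/LCE/XX/CC'))  # ['2300LO', '2300LCE', '2300XX', '2300CC']
```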

You can try something like this:
import re
list1 = ['2300LO/LCE','2302KO/KCE']
list2 = []
for x in list1:
    a = x.split('/')
    tmp = re.findall(r'\d+', a[0])  # extract the leading digits
    list2.append(a[0])
    list2.append(tmp[0] + a[1])
print(list2)

This can be implemented with simple string splits, but since you asked for a regex solution, here is one.
import re
list1 = ['2300LO/LCE','2302KO/KCE']
r = re.compile(r"([0-9]{1,4})([a-zA-Z].*)/([a-zA-Z].*)")
out = []
for s in list1:
    items = r.findall(s)[0]
    out.append(items[0] + items[1])
    out.append(items[0] + items[2])  # prepend the number to the second code as well
print(out)
The explanation for the regex: (a number of one to four digits), followed by (a letter and any further characters), followed by a / and (the rest of the characters).
The parts are grouped with (), so when you use findall, each group becomes an individual element of the returned tuple.
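To see what the grouping does, it can help to print what findall returns for one input. This is a small sketch using the pattern from the answer above; the variable names are illustrative.

```python
import re

pattern = re.compile(r"([0-9]{1,4})([a-zA-Z].*)/([a-zA-Z].*)")

# with several groups in the pattern, findall returns a list of tuples,
# one tuple per match and one tuple element per group
groups = pattern.findall('2300LO/LCE')
print(groups)  # [('2300', 'LO', 'LCE')]

number, first, second = groups[0]
print([number + first, number + second])  # ['2300LO', '2300LCE']
```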

Does string contain any of the words in my list?

I want to check a string to see if it contains any of the words I have in my list.
The list has somewhere around 100 individual words.
I have tried using regex but can't get it to work...
string = '<div class="header_links">$$ - $$$, Dansk, Veganske retter, Glutenfri retter</div>'
list = ['Café','Afrikansk','............','Sushi','Svensk','Sydamerikansk','Syditaliensk','Szechuan','Taiwansk','Thai','Tibetansk','Østeuropæisk','Dansk']
In this case the string has 'Dansk' in it. The string could contain more than one of the words in the list.
I want to write a piece of code that prints the words in the list that are also in the string.
In this case the output should be: Dansk
If there was more than one word in the string it should be: Dansk, ...., ....
I hope someone can help

>>> list = ['Café','Afrikansk','............','Sushi','Svensk','Sydamerikansk','Syditaliensk','Szechuan','Taiwansk','Thai','Tibetansk','Østeuropæisk','Dansk']
>>> string = """<div class="header_links">$$ - $$$, Dansk, Veganske retter, Glutenfri retter</div>"""
>>> [x for x in list if x in string]
['Dansk']
I recommend not using list as a variable name, as it shadows the built-in type list (like str or int).

Use a list comprehension with a membership check:
[x for x in lst if x in string]
Note that I have renamed your list to lst, as list is built-in.
Example:
string = '<div class="header_links">$$ - $$$, Dansk, Veganske retter, Glutenfri retter</div>'
lst = ['Café','Afrikansk','Sushi','Svensk','Sydamerikansk','Syditaliensk','Szechuan','Taiwansk','Thai','Tibetansk','Østeuropæisk','Dansk']
print([x for x in lst if x in string])
# ['Dansk']
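One caveat with the membership check x in string is that it matches substrings, so 'Thai' would also be found inside 'Thailand'. If whole-word matches are wanted, a regex with word boundaries is one option. The sketch below is an assumption about what is wanted, not part of the answers above; words_in_text is a hypothetical helper name and the word list is shortened.

```python
import re

def words_in_text(words, text):
    # build one alternation of all words; re.escape guards against
    # regex metacharacters, \b restricts matches to whole words
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')
    found = set(pattern.findall(text))
    # report the hits in the order of the original word list
    return [w for w in words if w in found]

text = '<div class="header_links">$$ - $$$, Dansk, Veganske retter, Glutenfri retter</div>'
lst = ['Café', 'Sushi', 'Thai', 'Østeuropæisk', 'Dansk']

print(words_in_text(lst, text))            # ['Dansk']
print('Thai' in 'Thailand')                # True (substring match)
print(words_in_text(['Thai'], 'Thailand')) # [] (no whole-word match)
```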

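In your case you can use a set intersection, where my_list is your list of words (this only finds words that appear in the string as whole, whitespace-separated tokens):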
string_intersection = set(string.replace(',', '').split()).intersection(my_list)
print(*string_intersection, sep =',')
output:
Dansk

Python String Match with respective index

str1 = ['106.51.107.185', '122.169.20.139', '123.201.53.226']
str2 = ['106.51.107.185', '122.169.20.138', '123.201.53.226']
I need to match the strings in the two lists above based on their respective index:
str1[0] match str2[0]
str1[1] match str2[1]
str1[2] match str2[2]
Based on the match I need the output.
In my attempt, str1[0] is checked against all of str2[:], but each element needs to be matched only against the element at the same index. Please assist.
Thanks !!!

Truth values
You can use:
from operator import eq
map(eq, str1, str2)
This will produce an iterator of booleans (True or False) in Python 3, and a list of booleans in Python 2. In case you want a list in Python 3, you can wrap the map(..) in list(..):
from operator import eq
list(map(eq, str1, str2))
This works since map takes a function as its first argument (here eq from the operator module), followed by one or more iterables. It then calls that function with one item from each iterable (so the first item of str1 and str2, then the second item of str1 and str2, and so on). The outcome of each function call is yielded.
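The pairing that map performs can be made visible by writing it out with zip. This is a minimal sketch using the lists from the question; via_map and via_zip are illustrative names.

```python
from operator import eq

str1 = ['106.51.107.185', '122.169.20.139', '123.201.53.226']
str2 = ['106.51.107.185', '122.169.20.138', '123.201.53.226']

# map(eq, str1, str2) calls eq(str1[0], str2[0]), eq(str1[1], str2[1]), ...
via_map = list(map(eq, str1, str2))

# the same pairing written out explicitly with zip
via_zip = [eq(x, y) for x, y in zip(str1, str2)]

print(via_map)  # [True, False, True]

# with inputs of unequal length, map stops at the shortest one
print(list(map(eq, [1, 2, 3], [1, 5])))  # [True, False]
```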
Indices
Alternatively, we can use a list comprehension to get the indices, for example:
same_indices = [i for i, (x, y) in enumerate(zip(str1, str2)) if x == y]
or the different ones:
diff_indices = [i for i, (x, y) in enumerate(zip(str1, str2)) if x != y]
We can also reuse the above map result with:
from operator import eq, itemgetter
are_same = map(eq, str1, str2)
same_indices = map(itemgetter(0),
                   filter(itemgetter(1), enumerate(are_same))
                   )
If we then convert same_indices to a list, we get:
>>> list(same_indices)
[0, 2]
The same construct works for the differing indices, using ne to build are_diff:
from operator import ne, itemgetter
are_diff = map(ne, str1, str2)
diff_indices = map(itemgetter(0),
                   filter(itemgetter(1), enumerate(are_diff))
                   )

You can use zip and a list comprehension, i.e.
[i == j for i, j in zip(str1, str2)]
# [True, False, True]

The following is a simple solution using a for loop:
res = []
for i in range(len(str1)):
    res.append(str1[i] == str2[i])
print(res)
Output:
[True, False, True]
One can also use list comprehension for this:
res = [ (str1[i] == str2[i]) for i in range(len(str1)) ]
Edit: to get indexes of matched and non-matched:
matched = []
non_matched = []
for i in range(len(str1)):
    if str1[i] == str2[i]:
        matched.append(i)
    else:
        non_matched.append(i)
print("matched:",matched)
print("non-matched:", non_matched)
Output:
matched: [0, 2]
non-matched: [1]

I am not sure of the exact output you need, but if you want to compare the two lists and get the elements of str1 that do not occur anywhere in str2, you can convert them to sets and subtract them as follows. Note that this ignores the positions of the elements:
st = str(set(str1) - set(str2))
# "{'122.169.20.139'}"

Splitting string into two by comma using python

I have the following data in a list; it is a hex number:
['aaaaa955554e']
I would like to split this into ['aaaaa9,55554e'] with a comma.
I know how to split a string when there are delimiters in it, but how should I do it in this case?
Thanks

This will do what I think you are looking for:
yourlist = ['aaaaa955554e']
new_list = [','.join([x[i:i+6] for i in range(0, len(x), 6)]) for x in yourlist]
It will put a comma after every sixth character in each item in your list. (I am assuming you will have more than just one item in the list, and that the items are of unknown length. Not that it matters.)

I assume you want to split after every 6th character.
Using regex:
import re
lst = ['aaaaa955554e']
newlst = re.findall(r'\w{6}', lst[0])
# ['aaaaa9', '55554e']
Using a list comprehension, which works for multiple items in lst:
lst = ['aaaaa955554e']
newlst = [item[i:i+6] for item in lst for i in range(0, len(item), 6)]
# ['aaaaa9', '55554e']
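Slicing and the \w{6} regex behave differently when the length is not a multiple of six: slicing keeps the short remainder, while the regex only returns complete six-character matches. The sketch below illustrates this with a made-up longer input; chunk_join is a hypothetical helper name.

```python
import re

def chunk_join(s, size=6, sep=','):
    """Join consecutive slices of `size` characters with `sep`."""
    return sep.join(s[i:i + size] for i in range(0, len(s), size))

print(chunk_join('aaaaa955554e'))    # aaaaa9,55554e
# 14 characters: the trailing '12' is kept as a short last chunk
print(chunk_join('aaaaa955554e12'))  # aaaaa9,55554e,12
# the regex drops the incomplete trailing chunk
print(re.findall(r'\w{6}', 'aaaaa955554e12'))  # ['aaaaa9', '55554e']
```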

This could be done using a regular expression substitution as follows:
import re
print(re.sub(r'([a-zA-Z]+\d)(.*?)', r'\1,\2', 'aaaaa955554e', count=1))
Giving you:
aaaaa9,55554e
This splits after seeing the first digit.
