Python String Match with respective index

str1 = ['106.51.107.185', '122.169.20.139', '123.201.53.226']
str2 = ['106.51.107.185', '122.169.20.138', '123.201.53.226']
I need to match the above strings based on their respective index.
str1[0] match str2[0]
str1[1] match str2[1]
str1[2] match str2[2]
Based on the match, I need the output.
I tried this myself, but str1[0] is checked against all of str2[:]; it needs to be matched only against the element at the same index. Please assist.
Thanks !!!

Truth values
You can use:
from operator import eq
map(eq, str1, str2)
This will produce an iterable of booleans (True or False) in python-3.x, and a list of booleans in python-2.x. In case you want a list in python-3.x, you can use the list(..) construct over the map(..):
from operator import eq
list(map(eq, str1, str2))
This works since map takes a function as its first argument (here eq from the operator module), followed by one or more iterables. It then calls that function on the corresponding items of the iterables (so the first item of str1 and str2, then the second item of str1 and str2, and so on). The outcome of each function call is yielded.
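As a minimal sketch of the behaviour described above, using the lists from the question (the variable names matches and shorter are only illustrative):

```python
from operator import eq

str1 = ['106.51.107.185', '122.169.20.139', '123.201.53.226']
str2 = ['106.51.107.185', '122.169.20.138', '123.201.53.226']

# map pairs the items by position: eq(str1[0], str2[0]), eq(str1[1], str2[1]), ...
matches = list(map(eq, str1, str2))
print(matches)  # [True, False, True]

# With iterables of different lengths, map stops at the shortest one.
shorter = list(map(eq, str1, str2[:2]))
print(shorter)  # [True, False]
```

Note that a length mismatch is silently truncated, so it may be worth comparing len(str1) and len(str2) first if the lists are expected to line up.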
Indices
Alternatively, we can use a list comprehension to get the indices, for example:
same_indices = [i for i, (x, y) in enumerate(zip(str1, str2)) if x == y]
or the different ones:
diff_indices = [i for i, (x, y) in enumerate(zip(str1, str2)) if x != y]
We can also reuse the above map result with:
from operator import eq, itemgetter
are_same = map(eq, str1, str2)
same_indices = map(itemgetter(0),
                   filter(itemgetter(1), enumerate(are_same))
                  )
If we then convert same_indices to a list, we get:
>>> list(same_indices)
[0, 2]
We can build the same construct on are_diff:
from operator import ne, itemgetter
are_diff = map(ne, str1, str2)
diff_indices = map(itemgetter(0),
                   filter(itemgetter(1), enumerate(are_diff))
                  )

You can use zip and a list comprehension, i.e.:
[i==j for i,j in zip(str1,str2)]
[True, False, True]

Following is a simple solution using for loop:
res = []
for i in range(len(str1)):
    res.append(str1[i] == str2[i])
print(res)
Output:
[True, False, True]
One can also use list comprehension for this:
res = [ (str1[i] == str2[i]) for i in range(len(str1)) ]
Edit: to get indexes of matched and non-matched:
matched = []
non_matched = []
for i in range(len(str1)):
    if str1[i] == str2[i]:
        matched.append(i)
    else:
        non_matched.append(i)
print("matched:",matched)
print("non-matched:", non_matched)
Output:
matched: [0, 2]
non-matched: [1]

I am not sure of the exact output you need but, if you want to compare those two lists and get the difference between them you can convert them to set then subtract them as follows:
st = str(set(str1) - set(str2))
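One caveat worth illustrating: a set difference ignores positions, so it answers a different question than an index-by-index comparison. The sketch below uses the question's data plus a reordered copy (shuffled is a made-up name for illustration):

```python
str1 = ['106.51.107.185', '122.169.20.139', '123.201.53.226']
str2 = ['106.51.107.185', '122.169.20.138', '123.201.53.226']

# Items of str1 that do not occur anywhere in str2, regardless of position.
only_in_str1 = set(str1) - set(str2)
print(only_in_str1)  # {'122.169.20.139'}

# A reordered copy has the same members, so the set difference is empty ...
shuffled = list(reversed(str1))
print(set(str1) - set(shuffled))  # set()

# ... even though an index-wise comparison reports mismatches.
positional = [a == b for a, b in zip(str1, shuffled)]
print(positional)  # [False, True, False]
```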

Related

How to group all the first characters of a string in a list of string , all second character of a string and so on in a list of string in python

a=["cypatlyrm","aolsemone","nueeleuap"]
o/p needed is : canyoupleasetellmeyournamep
I have tried
for i in range(len(a)):
    for j in range(len(a)):
        res+=a[j][i]
it gives o/p : canyouple
how to get full output ?

You can use itertools.zip_longest with an empty string '' as the fill value, together with itertools.chain, and then join the result to get what you want.
from itertools import zip_longest, chain
seq = ["cypatlyrm", "aolsemone", "nueeleuap"]
res = ''.join(chain.from_iterable(zip_longest(*seq, fillvalue='')))
print(res)
Output
canyoupleasetellmeyournamep
Using zip_longest makes sure that this also works with cases where the element sizes are not equal. If all elements in the list are guaranteed to be the same length then a normal zip would also work.
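To make the difference concrete, here is a small sketch with strings of unequal length (the input ["abc", "de", "f"] is invented for illustration):

```python
from itertools import chain, zip_longest

seq = ["abc", "de", "f"]

# zip stops at the shortest string, so only the first column survives.
with_zip = ''.join(chain.from_iterable(zip(*seq)))
print(with_zip)  # adf

# zip_longest pads the shorter strings with '' and keeps every character.
with_zip_longest = ''.join(chain.from_iterable(zip_longest(*seq, fillvalue='')))
print(with_zip_longest)  # adfbec
```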
If all the elements have the same length then you can use this approach that does not need libraries that have to be imported.
seq = ["cypatlyrm", "aolsemone", "nueeleuap"]
res = ''
for i in range(len(seq[0])):
    for j in seq:
        res += j[i]
print(res)

How to filter list based on multiple conditions?

I have the following lists:
target_list = ["FOLD/AAA.RST.TXT"]
and
mylist = [
    "FOLD/AAA.RST.12345.TXT",
    "FOLD/BBB.RST.12345.TXT",
    "RUNS/AAA.FGT.12345.TXT",
    "FOLD/AAA.RST.87589.TXT",
    "RUNS/AAA.RST.11111.TXT"
]
How can I filter only those records of mylist that correspond to target_list? The expected result is:
"FOLD/AAA.RST.12345.TXT"
"FOLD/AAA.RST.87589.TXT"
The following mask is considered for filtering mylist
xxx/yyy.zzz.nnn.txt
If xxx, yyy and zzz coincide with target_list, then the record should be selected. Otherwise it should be dropped from the result.
How can I solve this task without using a for loop?
selected_list = []
for t in target_list:
    r1 = t.split("/")[0]
    a1 = t.split("/")[1].split(".")[0]
    b1 = t.split("/")[1].split(".")[1]
    for l in mylist:
        r2 = l.split("/")[0]
        a2 = l.split("/")[1].split(".")[0]
        b2 = l.split("/")[1].split(".")[1]
        if (r1 == r2) and (a1 == a2) and (b1 == b2):
            selected_list.append(l)

You can define a "filter-making function" that preprocesses the target list. The advantages of this are:
Does minimal work by caching information about target_list in a set: The total time is O(N_target_list) + O(N), since set lookups are O(1) on average.
Does not use global variables. Easily testable.
Does not use nested for loops
def prefixes(target):
    """
    >>> prefixes("FOLD/AAA.RST.TXT")
    ('FOLD', 'AAA', 'RST')
    >>> prefixes("FOLD/AAA.RST.12345.TXT")
    ('FOLD', 'AAA', 'RST')
    """
    x, rest = target.split('/')
    y, z, *_ = rest.split('.')
    return x, y, z
def matcher(target_list):
    targets = set(prefixes(target) for target in target_list)
    def is_target(t):
        return prefixes(t) in targets
    return is_target
Then, you could do:
>>> list(filter(matcher(target_list), mylist))
['FOLD/AAA.RST.12345.TXT', 'FOLD/AAA.RST.87589.TXT']

Define a function to filter values:
target_list = ["FOLD/AAA.RST.TXT"]
def keep(path):
    template = get_template(path)
    return template in target_list
def get_template(path):
    front, numbers, ext = path.rsplit('.', 2)
    template = '.'.join([front, ext])
    return template
This uses str.rsplit which searches the string in reverse and splits it on the given character, . in this case. The parameter 2 means it only performs at most two splits. This gives us three parts, the front, the numbers, and the extension:
>>> 'FOLD/AAA.RST.12345.TXT'.rsplit('.', 2)
['FOLD/AAA.RST', '12345', 'TXT']
We assign these to front, numbers and ext.
We then build a string again using str.join
>>> '.'.join(['FOLD/AAA.RST', 'TXT'])
'FOLD/AAA.RST.TXT'
So this is what get_template returns:
>>> get_template('FOLD/AAA.RST.12345.TXT')
'FOLD/AAA.RST.TXT'
We can use it like so:
mylist = [
"FOLD/AAA.RST.12345.TXT",
"FOLD/BBB.RST.12345.TXT",
"RUNS/AAA.FGT.12345.TXT",
"FOLD/AAA.RST.87589.TXT",
"RUNS/AAA.RST.11111.TXT"
]
from pprint import pprint
pprint(list(filter(keep, mylist)))
Output:
['FOLD/AAA.RST.12345.TXT', 'FOLD/AAA.RST.87589.TXT']

You can use regular expressions to define a pattern, and check if your strings match that pattern.
In this case, split the target and insert a \d+ in between the xxx/yyy.zzz. and the .txt part. Use this as the pattern.
The pattern \d+ means one or more digits. The rest of the pattern will be created based on the literal values of xxx/yyy.zzz and .txt. Since the period has a special meaning in regular expressions, we have to escape it with a \.
import re
selected_list = []
for target in target_list:
    base, ext = target.rsplit(".", 1)
    # raw strings keep the backslashes intact for the regex engine
    pat = ".".join([base, r"\d+", ext]).replace(".", r"\.")
    selected_list.append([s for s in mylist if re.match(pat, s) is not None])
print(selected_list)
# [['FOLD/AAA.RST.12345.TXT', 'FOLD/AAA.RST.87589.TXT']]
If the pattern does not match, re.match returns None.
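A possible variant of the same idea, sketched under the assumption that every target has the form xxx/yyy.zzz.ext: re.escape handles all special characters (not just the period), and fullmatch rejects strings with trailing text. The helper name build_pattern is hypothetical, not part of the answer above.

```python
import re

target_list = ["FOLD/AAA.RST.TXT"]
mylist = [
    "FOLD/AAA.RST.12345.TXT",
    "FOLD/BBB.RST.12345.TXT",
    "RUNS/AAA.FGT.12345.TXT",
    "FOLD/AAA.RST.87589.TXT",
    "RUNS/AAA.RST.11111.TXT",
]

def build_pattern(target):
    # Hypothetical helper: split off the extension and put a digit run before it.
    base, ext = target.rsplit(".", 1)
    return re.compile(re.escape(base) + r"\.\d+\." + re.escape(ext))

patterns = [build_pattern(t) for t in target_list]

# Keep a string if it matches any target pattern from start to end.
selected = [s for s in mylist if any(p.fullmatch(s) for p in patterns)]
print(selected)  # ['FOLD/AAA.RST.12345.TXT', 'FOLD/AAA.RST.87589.TXT']
```

Unlike the loop above, this produces one flat list rather than one sub-list per target.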

Why not use filter + lambda function:
import re
result = list(filter(lambda item: re.sub(r'\.[0-9]+', '', item) == target_list[0], mylist))
Some comments:
The approach is to exclude the numeric part from the comparison. So in the lambda function, for each mylist item we replace a period followed by digits with '', then compare against the only item in target_list, target_list[0].
filter keeps all items for which the lambda function returns True.
Wrap everything in list to convert the filter object to a list object.

pattern match get list and dict from string

I have the string below, and I want to extract the lists, dicts and variables from it.
How can I split this string into a specific format?
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
import re
m1 = re.findall (r'(?=.*,)(.*?=\[.+?\],?)',s)
for i in m1:
    print('m1:', i)
Only the first result is correct.
Does anyone know how to do this?
m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],

Split on '=' instead; then you can work with each variable name and its value.
You still need to handle the type casting for the values (a regex, split, or try/except with casting may help).
Also, as others have commented, using a dict may be easier to handle.
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []
for a in al[1:-1]:
    var_l.append(a.split(',')[-1])
    value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])
output = dict(zip(var_l, value_l))
print(output)
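For the type casting mentioned above, one option is ast.literal_eval with a fallback to the raw string. This is only a sketch: the helper name cast is made up, and the dictionary is the one the split-based code is expected to produce for the string in the question.

```python
import ast

# The name -> text mapping expected from the split-based code.
output = {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch',
          'list_a': '[1,2]', 'dict_a': '{a:2,b:3}'}

def cast(text):
    # Hypothetical helper: parse Python literals, keep anything else as text.
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text

typed = {name: cast(value) for name, value in output.items()}
print(typed)
```

Here '3' becomes an int, '1.3' a float and '[1,2]' a list, while 'abch' and '{a:2,b:3}' stay strings because bare names are not literals; the dict-like value would need its own parsing step.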

You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:
re.findall(r"([^=,]+)=" # LHS (excluding the separating comma) and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
# ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]

The following also handles names that have no value assigned:
import re
from pprint import pprint
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
temp_d = {}
for i,j in m1:
    temp = i.strip(',').split(',')
    if len(temp)>1:
        for k in temp[:-1]:
            temp_d[k]=''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)
Output:
{'Record': '',
'Save': '',
'a': '3',
'b': '1.3',
'c': 'abch',
'dict_a': '{a:2,b:3}',
'list_a': '[1]',
'list_c': '[1,2]'}

Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):
import re
regex = re.compile(r'([a-z]|_)+=')
# note: to also allow uppercase letters and digits, use r'([a-z]|[A-Z]|[0-9]|_)+='
cases = [x.group() for x in re.finditer(regex, s)]
This gives a list of all the identifiers in the string:
['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']
We can now define a function that uses the above list to chop up s one identifier at a time:
def chop(mystr, mylist):
    temp = mystr.partition(mylist[0])[2]
    cut = temp.find(mylist[1]) #strip leading bits
    return mystr.partition(mylist[0])[2][cut:], mylist[1:]
mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)
This (convoluted) slicing operation gives this list of strings:
['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}']
Now cut off the ends using each successive entry:
result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1 #-1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop()) #get the last item
Now we have the full list:
['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
Each element is easily parsable into key:value pairs (and is also executable via exec).
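As a sketch of the key:value parsing step without exec, each element can be split on its first '='. The values stay strings here; note that exec on an element like 'c=abch' would raise a NameError unless abch happens to be defined.

```python
result = ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch',
          'list_a=[1,2]', 'dict_a={a:2,b:3}']

# Split each element once, on the first '=', into a (name, value) pair.
pairs = dict(item.split('=', 1) for item in result)
print(pairs)
```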

Regular Expressions: Search in list in python3

I have a list of strings.
Consider the code below:
import re
mylist = ["http://abc/12345?abc", "https://abc/abc/2516423120?$abc$"]
r = re.compile(r"(\d{3,})")
result0 = list(filter(r.findall, mylist)) # Note 1
print(result0)
result1 = r.findall(mylist[0])
result2 = r.findall(mylist[1])
print(result1, result2)
The results are:
['http://abc/12345?abc', 'https://abc/abc/2516423120?$abc$']
['12345'] ['2516423120']
Why is there a difference in the results we get?

I'm not sure what you expected filter to do, but what it does here is return an iterator over all elements x of mylist for which bool(r.findall(x)) is True. That is the case whenever r.findall(x) returns a non-empty list, i.e. the regex matches somewhere in the string. Both strings contain a run of at least three digits, so result0 contains the same values as mylist.
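The distinction becomes visible with a string that has no match. In this sketch a third URL without digits is added to the list from the question (that extra entry is invented for illustration):

```python
import re

r = re.compile(r"(\d{3,})")
mylist = [
    "http://abc/12345?abc",
    "https://abc/abc/2516423120?$abc$",
    "http://abc/no-digits",  # invented entry: contains no run of 3+ digits
]

# filter keeps the original strings for which findall returns a non-empty list.
kept = list(filter(r.findall, mylist))
print(kept)  # ['http://abc/12345?abc', 'https://abc/abc/2516423120?$abc$']

# To get the matched digits instead, call findall on each string.
extracted = [r.findall(s) for s in mylist]
print(extracted)  # [['12345'], ['2516423120'], []]
```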

removing duplicates from a bool list

I am trying to get the word in a list that follows a word with a '.' in it. For example, given this list:
test_list = ["hello", "how", "are.", "you"]
it would select the word 'you'. I have managed to pull this off, but I am trying to ensure that I do not get duplicate words.
Here is what I have so far:
list = []
i = 0
bool = False
words = sent.split()
for word in words:
    if bool:
        list.append(word)
        bool = False
    # the below if statement seems to make everything worse instead of fixing the duplicate problem
    if "." in word and word not in list:
        bool = True
return list

Your whole code can be reduced to this example using zip() and a list comprehension:
a = ['hello', 'how', 'are.', 'you']
def get_new_list(a):
    return [v for k,v in zip(a, a[1:]) if k.endswith('.')]
Then, to remove the duplicates, if there are any, you can use set(), like this example:
final = set(get_new_list(a))
Output:
{'you'}
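A set does not keep the order in which the words were found. If the order matters, one possible alternative is dict.fromkeys, which keeps the first occurrence of each key (insertion order is guaranteed from Python 3.7). The input list below is invented to show duplicates:

```python
def get_new_list(a):
    # Same comprehension as above: words that follow a word ending in '.'.
    return [v for k, v in zip(a, a[1:]) if k.endswith('.')]

words = ['one.', 'b', 'two.', 'a', 'three.', 'b']

followers = get_new_list(words)
print(followers)  # ['b', 'a', 'b']

# dict keys are unique and keep insertion order, so duplicates drop out in place.
unique_in_order = list(dict.fromkeys(followers))
print(unique_in_order)  # ['b', 'a']
```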

This isn't based off of the code you posted, however it should do exactly what you're asking.
def get_word_after_dot(words):
    for index, word in enumerate(words):
        if word.endswith('.') and len(words) - index > 1:
            yield words[index + 1]
Iterating over this generator yields each word that follows a word ending in a period.
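A short usage sketch for the generator, repeated here so the snippet runs on its own; the second call shows that a trailing word ending in a period is handled without an IndexError:

```python
def get_word_after_dot(words):
    for index, word in enumerate(words):
        # Only yield when there is a next word to return.
        if word.endswith('.') and len(words) - index > 1:
            yield words[index + 1]

test_list = ["hello", "how", "are.", "you"]
after_dot = list(get_word_after_dot(test_list))
print(after_dot)  # ['you']

# The last word ends in '.', but nothing follows it, so nothing is yielded.
nothing = list(get_word_after_dot(["the", "end."]))
print(nothing)  # []
```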

Here is a different approach to the same problem.
import itertools
from collections import deque
t = deque(map(lambda x: '.' in x, test_list)) # create a deque of bools
# deque([False, False, True, False])
t.rotate(1) # shift it by one since we want the word after the '.'
# deque([False, False, False, True])
set(itertools.compress(test_list, t)) # and then grab everywhere it is True
# {'you'}

The itertools recipes include a definition of pairwise, which is useful for iterating over a list two items at a time. The variant below returns the two iterators rather than zipping them:
import itertools as it
def pairwise(iterable):
    a, b = it.tee(iterable)
    next(b, None)
    return a, b
You can use this to create a list of words that follow a word ending in '.' (here l is the list of words):
words = [n for m, n in zip(*pairwise(l)) if m[-1] == '.']
Remove duplicates:
seen = set()
results = [x for x in words if not (x in seen or seen.add(x))]
