Count letter differences of two strings in Python

I'm having some problems with an exercise about strings in python.
I have 2 different lists:
list1= "ABCDEFABCDEF"
and
list2= "AZBYCXDWEVFABCDEF"
I need to compare those two strings position by position (the first letters together, then the second letters, and so on), going up to the length of the shorter one (here, the length of list1), and store each letter in a new variable depending on whether the pair is the same or different.
identicals=[]
different=[]
I tried to code something and it seems to find the same ones, but doesn't work on the different ones since it copies them multiple times.
for x in list1:
    for y in list2:
        if list1>list2:
            if x==y:
                identicals.append(x)
            if x!=y :
                different.append(x)
        if list2>list1:
            if y==x:
                identicals.append(y)
            if y!=x:
                different.append(y)
EDIT: Output result should be something like this:
identicals=['A']
different=["Z","B","Y","C","X","D","W","E","V","F","A"]
The thing is that the letter A is only shown on identicals but not in different even if F!=A.

You are getting unwanted duplicates because you have a nested pair of for loops, so each item in list2 gets tested against every item in list1.
The key idea is to iterate over the two strings in parallel. You can do that with the built-in zip function, which yields a tuple of the corresponding items from each iterable you feed it, stopping as soon as one of the iterables runs out of items.
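As a quick illustration of that stopping behaviour, here is a minimal sketch using two short made-up strings (short and longer are placeholder names, not taken from the question):

```python
short = "ABC"
longer = "AXCDE"

# zip pairs items by position and stops at the end of the shorter string,
# so the trailing "D" and "E" of the longer string are never visited.
pairs = list(zip(short, longer))
print(pairs)  # [('A', 'A'), ('B', 'X'), ('C', 'C')]
```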
From your example code, it looks like you want to take the items for the different list from the longer string. To do that efficiently, figure out which string is the longer before you start looping.
I've renamed your strings because it's confusing to give strings a name starting with "list".
s1 = "ABCDEFABCDEF"
s2 = "AZBYCXDWEVFABCDEF"
identicals = []
different = []
small, large = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
for x, y in zip(small, large):
    if x == y:
        identicals.append(y)
    else:
        different.append(y)
print(identicals)
print(different)
output
['A']
['Z', 'B', 'Y', 'C', 'X', 'D', 'W', 'E', 'V', 'F', 'A']
We can make the for loop more compact at the expense of readability. We put our destination lists into a tuple and then use the equality test to select which list in that tuple to append to. This works because False has a numeric value of 0, and True has a numeric value of 1.
for x, y in zip(small, large):
    (different, identicals)[x == y].append(y)

The problem is the inner loop. You are comparing each of the letters in list1 with all the letters of list2.
Instead you should have a single loop:
identicals=[]
different=[]
short_list = list1 if len(list1)<= len(list2) else list2
for i in range(len(short_list)):
    if list1[i] == list2[i]:
        identicals.append(list1[i])
    else:
        different.append(short_list[i])

Try this
a = "ABCDEFABCDEF"
b = "AZBYCXDWEVFABCDEF"
import numpy
A = numpy.array(list(a))
B = numpy.array(list(b))
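common = A[:len(B)] [ (A[:len(B)] == B[:len(A)]) ]
# Use ~ to invert a boolean mask; unary minus on boolean arrays raises a TypeError in current NumPy.
different = A[:len(B)] [ ~ (A[:len(B)] == B[:len(A)]) ]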
>>> common.tolist()
['A']
>>> different.tolist()
['B', 'C', 'D', 'E', 'F', 'A', 'B', 'C', 'D', 'E', 'F']

Related

pytest dynamically generate cases by comparing 2 lists

I am having so much trouble in dynamically generating pytest test cases. here is the scenario.
I have 2 lists with string elements
List1 = ['a', 'b', 'c']
List2 = ['a', 'c', 'd']
I want to compare these 2 lists and compare first element from List1 with first element of List2.
For instance if first element from List1 'a' == first element from List2 'a' then it's a PASS. Now 2nd element from List1 does not match with 2nd element from List2 so it should throw FAIL in this case 'b' != 'c'
I am not sure how to write pytest test case for this. These 2 lists are long lists with too many elements
this is what I am doing
def list1():
    for i in some_csv:
        list1.append(i)
def list2():
    for i in some_csv:
        list2.append(i)
@pytest.mark.parametrize("list1", list1())
def test_validation(list1):
    assert list1 in list2()
What am I doing wrong?
Why not just compare the list directly?
assert list1() == list2()
But if you insist on testing element-wise:
@pytest.mark.parametrize(
    "one, two", # these are the args of the test case
    zip(list1(), list2()), # generate tuples to unpack into the args
)
def test_element(one, two):
    assert one == two
The test names will be messy, though. You may want to provide a list / iterator to ids to make the names look nicer.
First, you forgot the return statement in each list function.
Second, the test does not check what you describe.
parametrize loops over the result of list1(), which means each test checks whether a single element, such as 'a', is in the result of list2().
Example:
'a' in ['a', 'c', 'd']
Using in can hide ordering problems. For example:
List1 = ['a', 'b', 'c', 'd']
List2 = ['a', 'c', 'd', 'b']
'b' in List2 # This is True, even though 'b' is at a different position
As I understand it, you want to check that the two lists are equal. To do that, you can write:
['a', 'b', 'c'] == ['a', 'c', 'd']
Your example will look like this
def list1():
    result = []  # a local list; appending to the function name itself would fail
    for i in some_csv:
        result.append(i)
    return result
def list2():
    result = []
    for i in some_csv:
        result.append(i)
    return result
def test_validation():
    assert list1() == list2()
Best regards

Combine two lists for web scraping project

I have two lists: one is a basic list, with some being "new line" symbols (\n), and the other is a list of lists.
I would like to combine these, inserting the elements from the second list into the first list where \n appears so that the end result looks like this:
first_list = ['a','b','c',\n, 'd','e','f','g','h',\n]
second_list = [[1,2,3], [4,5,6]]
combine the two lists to get:
combined_list = ['a','b','c',1,2,3,'d','e','f','g','h',4,5,6].
I'm not quite sure why, but all of the \n's in the first list in my example have the same index position. Thus, when I try to loop through both lists to first find the position of the first \n and insert [1,2,3] at that point, it ends up inserting [1,2,3] at all positions where \n appears. I tried to simplify the problem here to make it easier to communicate, but the original problem comes from a web scraping project I am working on to retrieve information from Linkedin, with the elements in these lists being profile attributes for Linkedin users. Perhaps that could help to explain why the \n's all have the same index position?
Any help with how to properly combine these lists in the above way/explanations for why the \n's have the same index position would be greatly appreciated! Please let me know if I can provide any additional details. Thanks.
I know you mentioned there were some indexing issues with the \n values, but hopefully this sets you on the right track. It works for the simplified example data you provided (reformatted as valid Python, with the letters and \n written as string literals).
l1 = ['a','b','c','\n','d','e','f','g','h','\n']
l2 = [[1,2,3], [4,5,6]]
l3 = []
n_count = 0
for item in l1:
    if item != '\n':
        l3.append(item)
    else:
        l3.extend(l2[n_count])
        n_count += 1
print(l3)
['a', 'b', 'c', 1, 2, 3, 'd', 'e', 'f', 'g', 'h', 4, 5, 6]
If you can figure out the indexing issue, this should work for you with minor modifications.
I assume that List1 and/or List2 can be longer than shown.
The number of sub-lists in List2 must be greater than or equal to the number of '\n's in List1.
List1 = ['a','b','c', '\n', 'd','e','f','g','h', '\n']
List2 = [[1,2,3], [4,5,6]]
# wanted = [a,b,c,1,2,3,d,e,f,g,h,4,5,6]
list3 = []
counter = 0
for val in List1:
    if val == '\n':
        list3.extend(List2[counter])
        counter += 1
    else:
        list3.append(val)
print(list3)
['a', 'b', 'c', 1, 2, 3, 'd', 'e', 'f', 'g', 'h', 4, 5, 6]
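If the second list might contain fewer sub-lists than there are '\n' markers, the same loop can be written around an iterator so that it does not raise an IndexError. This is only a sketch: merge_at_newlines is a made-up name, and dropping a marker that has no replacement is an assumption about the desired behaviour.

```python
def merge_at_newlines(items, replacements):
    """Replace each newline marker in items with the next sub-list."""
    pending = iter(replacements)
    merged = []
    for item in items:
        if item == '\n':
            # If replacements has run out, next() returns [] and the marker is dropped.
            merged.extend(next(pending, []))
        else:
            merged.append(item)
    return merged


first_list = ['a', 'b', 'c', '\n', 'd', 'e', 'f', 'g', 'h', '\n']
second_list = [[1, 2, 3], [4, 5, 6]]
combined = merge_at_newlines(first_list, second_list)
print(combined)
# ['a', 'b', 'c', 1, 2, 3, 'd', 'e', 'f', 'g', 'h', 4, 5, 6]
```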

How to get certain number of alphabets from a list?

I have a list of 26 numbers. I want to print out a list of letters according to those numbers. For example, I have this list (26 numbers taken from input):
[0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
I'd like the output to be like this:
[e,e,l,s]
'e' appears twice in the output because index 4 corresponds to 'e' in the English alphabet and the number at index 4 is 2. It's the same for 'l', since it corresponds to index 11 and its number is 1. The same goes for 's'. The other letters don't appear because their numbers are zero.
As another example, suppose I give a different list of 26 numbers, like this:
[1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
The output should be:
[a,b,b,c,c,d,d,d,e,e,e,e,g,g,g,h,h,h,h,i,i,i,i,j,k,k,k,l,m,m,m,m,n,n,n,n,o,u,u,u,u,v,v,w,w,w,x,x,y,y,z]
Is there any way to do this in Python 3?
You can use chr(97 + item_index) to get the respective items and then multiply by the item itself:
In [40]: [j * chr(97 + i) for i, j in enumerate(lst) if j]
Out[40]: ['ee', 'l', 's']
If you want them separate you can utilize itertools module:
In [44]: from itertools import repeat, chain
In [45]: list(chain.from_iterable(repeat(chr(97 + i), j) for i, j in enumerate(lst) if j))
Out[45]: ['e', 'e', 'l', 's']
Yes, it is definitely possible in Python 3.
Firstly, define an example list (as you did) of numbers and an empty list to store the alphabetical results.
The index is mapped to a letter with chr(97 + index): ord("a") is 97, so chr(97) is "a". The first index is 0, which gives chr(97); as the index increases during iteration, the letter advances with it.
Next, a nested for-loop to iterate over the list of numbers and then another for-loop to append the same alphabet multiple times according to the number list.
We could do this -> result.append(chr(97 + i) * my_list[i]) in the first loop itself but it wouldn't yield every alphabet separately [a,b,b,c,c,d,d,d...] rather it would look like [a,bb,cc,ddd...].
my_list = [1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
result = []
for i in range(len(my_list)):
    if my_list[i] > 0:
        for j in range(my_list[i]):
            result.append(chr(97 + i))
print(result)
An alternative to the wonderful answer by @Kasramvd
import string
n = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
res = [i * c for i, c in zip(n, string.ascii_lowercase) if i]
print(res) # -> ['ee', 'l', 's']
Your second example produces:
['a', 'bb', 'cc', 'ddd', 'eeee', 'ggg', 'hhhh', 'iiii', 'j', 'kkk', 'l', 'mmmm', 'nnnn', 'o', 'uuuu', 'vv', 'www', 'xx', 'yy', 'z']
Splitting the strings ('bb' to 'b', 'b') can be done with the standard schema:
[x for y in something for x in y]
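Applied to the first example, that schema might look like the sketch below (counts, grouped and letters are illustrative names):

```python
import string

counts = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
grouped = [n * c for n, c in zip(counts, string.ascii_lowercase) if n]
# The outer loop walks the grouped strings, the inner loop walks their characters.
letters = [ch for group in grouped for ch in group]
print(letters)  # ['e', 'e', 'l', 's']
```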
Using a slightly different approach, which gives the characters individually as in your example:
import string
import numpy as np
a = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
alphabet_lookup = np.repeat(np.arange(len(a)), a)
letter_lookup = np.array(list(string.ascii_lowercase))
res = letter_lookup[alphabet_lookup]
print(res)
To get
['e' 'e' 'l' 's']

Comparing Order of 2 Python Lists

I am looking for some help comparing the order of 2 Python lists, list1 and list2, to detect when list2 is out of order.
list1 is static and contains the strings a,b,c,d,e,f,g,h,i,j. This is the "correct" order.
list2 contains the same strings, but the order and the number of strings may change. (e.g. a,b,f,d,e,g,c,h,i,j or a,b,c,d,e)
I am looking for an efficient way to detect when list2 is out of order by comparing it against list1.
For example, if list2 is a,c,d,e,g,i, the check should return true (as the strings are in order).
If list2 is a,d,b,c,e, it should return false (as string d appears out of order).
First, let's define list1:
>>> list1='a,b,c,d,e,f,g,h,i,j'.split(',')
>>> list1
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
While your list1 happens to be in alphabetical order, we will not assume that. This code works regardless.
Now, let's create a list2 that is out-of-order:
>>> list2 = 'a,b,f,d,e,g,c,h,i,j'.split(',')
>>> list2
['a', 'b', 'f', 'd', 'e', 'g', 'c', 'h', 'i', 'j']
Here is how to test whether list2 is out of order or not:
>>> list2 == sorted(list2, key=lambda c: list1.index(c))
False
False means out-of-order.
Here is an example that is in order:
>>> list2 = 'a,b,d,e'.split(',')
>>> list2 == sorted(list2, key=lambda c: list1.index(c))
True
True means in-order.
Ignoring elements of list2 not in list1
Let's consider a list2 that has an element not in list1:
>>> list2 = 'a,b,d,d,e,z'.split(',')
To ignore the unwanted element, let's create list2b:
>>> list2b = [c for c in list2 if c in list1]
We can then test as before:
>>> list2b == sorted(list2b, key=lambda c: list1.index(c))
True
Alternative not using sorted
>>> list2b = ['a', 'b', 'd', 'd', 'e']
>>> indices = [list1.index(c) for c in list2b]
>>> all(c <= indices[i+1] for i, c in enumerate(indices[:-1]))
True
Why do you need to compare it to list1 since it seems like list1 is in alphabetical order? Can't you do the following?
def is_sorted(alist):
    return alist == sorted(alist)
print(is_sorted(['a','c','d','e','g','i']))
# True
print(is_sorted(['a','d','b','c','e']))
# False
Here's a solution that runs in expected linear time. That isn't too important if list1 is always 10 elements and list2 isn't any longer, but with longer lists, solutions based on index will experience extreme slowdowns.
First, we preprocess list1 so we can quickly find the index of any element. (If we have multiple list2s, we can do this once and then use the preprocessed output to quickly determine whether multiple list2s are sorted):
list1_indices = {item: i for i, item in enumerate(list1)}
Then, we check whether each element of list2 has a lower index in list1 than the next element of list2:
is_sorted = all(list1_indices[x] < list1_indices[y] for x, y in zip(list2, list2[1:]))
We can do better with itertools.izip and itertools.islice to avoid materializing the whole zip list, letting us save a substantial amount of work if we detect that list2 is out of order early in the list:
# On Python 3, we can just use zip. islice is still needed, though.
from itertools import izip, islice
is_sorted = all(list1_indices[x] < list1_indices[y]
                for x, y in izip(list2, islice(list2, 1, None)))
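For Python 3, where zip is already lazy, the same idea might be wrapped up as follows. This is a sketch: in_reference_order is a made-up name, and it assumes every item of the candidate list occurs in list1 (otherwise the dictionary lookup raises KeyError). Because the comparison is strict, a repeated item counts as out of order.

```python
from itertools import islice


def in_reference_order(reference, candidate):
    """Return True if candidate's items keep the relative order they have in reference."""
    position = {item: i for i, item in enumerate(reference)}
    successors = islice(candidate, 1, None)
    # all() stops at the first out-of-order pair.
    return all(position[x] < position[y] for x, y in zip(candidate, successors))


list1 = 'a,b,c,d,e,f,g,h,i,j'.split(',')
print(in_reference_order(list1, ['a', 'c', 'd', 'e', 'g', 'i']))  # True
print(in_reference_order(list1, ['a', 'd', 'b', 'c', 'e']))       # False
```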
is_sorted = not any(list1.index(list2[i]) > list1.index(list2[i+1]) for i in range(len(list2)-1))
The function any returns true if any of the items in an iterable are true. I combined this with a generator expression that loops through all the values of list2 and makes sure they're in order according to list1.
if list2 == sorted(list2,key=lambda element:list1.index(element)):
    print('sorted')
Let's assume that when you write that list1 contains the strings a,b,c,d,e,f,g,h,i, you mean that a could be 'zebra' and b could be 'elephant', so the order may not be alphabetical. Also, this approach returns False if an item is in list2 but not in list1.
good_list2 = ['a','c','d','e','g','i']
bad_list2 = ['a','d','b','c','e']
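def verify(somelist):
    list1 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
    remaining = list(somelist)  # work on a copy so the caller's list is not emptied
    # Walk somelist from the end; each item must lie before the previous one in list1.
    while remaining:
        try:
            list1 = list1[:list1.index(remaining.pop())]
        except ValueError:
            return False
    return True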

Fast way to find lists contains two particular items?

I have a list of lists (about 200) containing different strings:
lists = [
['a', 'b', 'c', 'g', ...],
['b', 'c', 'f', 'a', ...],
...
]
Now I'd like to find all the lists that contain two given strings, in the given order.
For example, given ('a', 'g'), ['a', 'b', 'c', 'g', ...] will be matched.
What's the Pythonic way of doing this?
In my opinion the most Pythonic way would be:
selection = [L for L in lists
             if x1 in L and x2 in L and L.index(x1) < L.index(x2)]
The drawback is that it searches for each element twice: first to check for presence (discarding the index) and then to check the ordering.
An alternative could be
def match(a, b, L):
    try:
        return L.index(a) < L.index(b)
    except ValueError:
        return False
selection = [L for L in lists if match(x1, x2, L)]
but I find it slightly uglier and I wouldn't use it unless performance is a problem.
If the logic required instead is to accept a list containing [... x2 ... x1 ... x2 ...] then the check is different:
selection = [L for L in lists
             if x1 in L and x2 in L[L.index(x1)+1:]]
This translates to English as "x1 is in the list and x2 is in the part following the first x1", and it also works as expected if x1 and x2 are the same value.
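The "x2 somewhere after the first x1" check can also be written without slicing, using the optional start argument of list.index. This is a sketch with a made-up helper name (ordered_match) and made-up sample data:

```python
def ordered_match(a, b, L):
    """True if b occurs somewhere after the first occurrence of a in L."""
    try:
        L.index(b, L.index(a) + 1)  # search for b only after the first a
        return True
    except ValueError:
        # Raised when a is missing, or when b does not occur after it.
        return False


lists = [['a', 'b', 'c', 'g'], ['b', 'c', 'f', 'a'], ['g', 'a', 'g']]
selection = [L for L in lists if ordered_match('a', 'g', L)]
print(selection)  # [['a', 'b', 'c', 'g'], ['g', 'a', 'g']]
```

Each list is scanned at most once for each of the two values, and no copy of the list tail is created.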
