Compare lists to find common elements in python [duplicate] - python

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Python - Intersection of two lists
I'm trying to compare two lists in order to find the number of elements they have in common.
The main problem I'm having is when either list contains repeated elements, for example
A = [1,1,1,1] and
B = [1,1,2,3]
using the code
n = 0
for x in A:
    if x in B:
        n += 1
print(n)
gives me n = 4, because technically every element of A is in B.
I'd like to get n = 2, preferably without using sets. Is there any way I can adapt my code, or a new way of thinking about the problem, to achieve this?
Thanks

It's not entirely clear what your specification is, but if you want the number of elements in A that appear in B, without regard to order, but with regard to multiplicity, use collections.Counter:
>>> from collections import Counter
>>> A = [1,1,1,1]
>>> B = [1,1,2,3]
>>> C = Counter(A) & Counter(B)
>>> sum(C.values())
2
>>> list(C.elements())
[1, 1]
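A minimal sketch of the same idea wrapped in a reusable function (the name count_common_multiset is just illustrative). Counter & Counter keeps, for each element, the smaller of its two counts, which is exactly the "common, with multiplicity" rule, and it works for any hashable items, including the characters of a string:

```python
from collections import Counter


def count_common_multiset(a, b):
    """Count the elements shared by a and b, respecting multiplicity."""
    # Counter & Counter keeps, for each key, the minimum of the two counts
    common = Counter(a) & Counter(b)
    return sum(common.values())


print(count_common_multiset([1, 1, 1, 1], [1, 1, 2, 3]))  # 2
print(count_common_multiset("hello", "world"))            # 2 ('l' and 'o')
```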

Here is an efficient (O(n log n)) way to do it without using sets:
def count_common(a, b):
    ret = 0
    a = sorted(a)
    b = sorted(b)
    i = j = 0
    while i < len(a) and j < len(b):
        # three-way comparison giving -1, 0 or 1 (Python 3 has no cmp())
        c = (a[i] > b[j]) - (a[i] < b[j])
        if c == 0:
            ret += 1  # match: count it, then advance in both lists
        if c <= 0:
            i += 1
        if c >= 0:
            j += 1
    return ret
print(count_common([1,1,1,1], [1,1,2,3]))
If your lists are always sorted, as they are in your example, you can drop the two sorted() calls. This would give an O(n) algorithm.

Here's an entirely different way of thinking about the problem.
Imagine I've got two words, "hello" and "world". To find the common elements, I could iterate through "hello", giving me ['h', 'e', 'l', 'l', 'o']. For each element, I check whether it is in the second list (word), and if it is, I remove it from that list.
Is 'h' in ['w', 'o', 'r', 'l', 'd']? No.
Is 'e' in ['w', 'o', 'r', 'l', 'd']? No.
Is 'l' in ['w', 'o', 'r', 'l', 'd']? Yes!
Remove it from "world", giving me ['w', 'o', 'r', 'd'].
Is 'l' in ['w', 'o', 'r', 'd']? No.
Is 'o' in ['w', 'o', 'r', 'd']? Yes! Remove it from ['w', 'o', 'r', 'd'], giving me ['w', 'r', 'd'].
Compare the length of the original object (make sure you've kept a copy around) to the newly generated object and you will see a difference of 2, indicating 2 common letters.
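Here is a minimal sketch of that removal approach in Python (the function name is illustrative). It copies the second list so the original is left untouched, then counts how many items were crossed off. Each `in` check and `remove()` call scans the list, so this is roughly O(len(a) * len(b)): fine for short lists, while the Counter answer scales better.

```python
def count_common_by_removal(a, b):
    """Count shared elements by crossing matches off a copy of b."""
    remaining = list(b)  # copy, so the caller's b is not modified
    for item in a:
        if item in remaining:
            remaining.remove(item)  # removes only the first occurrence
    # every removal corresponds to one common element
    return len(b) - len(remaining)


print(count_common_by_removal("hello", "world"))            # 2
print(count_common_by_removal([1, 1, 1, 1], [1, 1, 2, 3]))  # 2
```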

So you want the program to check only whether elements at the same indices in the two lists are equal? That would be pretty simple: iterate over the length of the two lists (which, I presume, are supposed to be the same length) using a variable i, and compare A[i] with B[i].
If you'd like, I could post the code.
If this is not what you want to do, please do make your problem clearer.
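If the positional interpretation is what you want, a short sketch (the function name is illustrative) uses zip() to pair up the items at each index, so there is no need to manage i yourself. For the example lists it also gives 2, but it answers a different question: [1, 2] vs [2, 1] gives 0 here, whereas the multiset count gives 2.

```python
def count_positional_matches(a, b):
    """Count the positions i where a[i] == b[i]."""
    # zip() stops at the shorter list, so any extra trailing items are ignored
    return sum(1 for x, y in zip(a, b) if x == y)


A = [1, 1, 1, 1]
B = [1, 1, 2, 3]
print(count_positional_matches(A, B))  # 2
```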

Related

Calculate the difference between 2 strings (Levenshtein distance)

I am trying to calculate the distance between two strings. The distance/difference between two strings refers to the minimum number of character insertions, deletions, and substitutions required to change one string to the other.
The method I have tried is to convert the two strings into lists, compare the lists, find the differences, and then add the differences up:
first_string = "kitten"
second_string = "sitting"
list_1 = list(first_string)
list_2 = list(second_string)
print("list_1 = ", list_1)
print("list_2 = ", list_2)
print(" ")
lengths = len(list_2) - len(list_1)
new_list = set(list_1) - set(list_2)
print(lengths)
print(new_list)
difference = lengths + int(new_list)
print(difference)
The output I get is:
list_1 = ['k', 'i', 't', 't', 'e', 'n']
list_2 = ['s', 'i', 't', 't', 'i', 'n', 'g']
1
{'e', 'k'}
From there, I'm trying to work out how to add these differences so the total is 3. I don't know how to make the two outputs compatible so I can add them together (1 plus {'e', 'k'} should give a distance of 3).
You're almost there. Calculate the length of new_list using len() like you did with lengths:
difference = lengths + len(new_list)
Looks like you just need to change this line:
difference = lengths + int(len(new_list))
That should give you 3 like you want :)
This is referred to as the Levenshtein distance. Check out this implementation as further reading.
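Note that the set-based arithmetic gives 3 for "kitten" and "sitting", but it does not compute the edit distance in general: for "ab" and "ba" the length difference is 0 and the set difference is empty, so it reports 0, while the Levenshtein distance is 2. Below is a minimal sketch of the standard dynamic-programming algorithm (Wagner-Fischer), keeping only two rows of the table at a time; the function name is illustrative:

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t (Wagner-Fischer DP)."""
    # prev[j] = distance between the current prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute cs with ct (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("ab", "ba"))           # 2
```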

How can method which evaluates a list to determine if it contains specific consecutive items be improved?

I have a nested list of tens of millions of lists (I can use tuples also). Each list is 2-7 items long. Each item in a list is a string of 1-5 characters and occurs no more than once per list. (I use single char items in my example below for simplicity)
#Example nestedList:
nestedList = [
['a', 'e', 'O', 'I', 'g', 's'],
['w', 'I', 'u', 'O', 's', 'g'],
['e', 'z', 's', 'I', 'O', 'g']
]
I need to find which lists in my nested list contain a pair of items so I can do stuff to these lists while ignoring the rest. This needs to be as efficient as possible.
I am using the following function but it seems pretty slow and I just know there has to be a smarter way to do this.
def isBadInList(bad, checkThisList):
    numChecks = len(checkThisList) - 1
    for x in range(numChecks):
        if checkThisList[x] == bad[0] and checkThisList[x + 1] == bad[1]:
            return True
        elif checkThisList[x] == bad[1] and checkThisList[x + 1] == bad[0]:
            return True
    return False
I call it like this:
bad = ['O', 'I']
for checkThisList in nestedList:
    result = isBadInList(bad, checkThisList)
    if result:
        doStuffToList(checkThisList)
# The function isBadInList() only returns True for the first and third lists in nestedList and False for all the others.
I need a way to do this faster if possible. I can use tuples instead of lists, or whatever it takes.
nestedList = [
['a', 'e', 'O', 'I', 'g', 's'],
['w', 'I', 'u', 'O', 's', 'g'],
['e', 'z', 's', 'I', 'O', 'g']
]
# first create a map
pairdict = dict()
for i in range(len(nestedList)):
    for j in range(len(nestedList[i])-1):
        pair1 = (nestedList[i][j], nestedList[i][j+1])
        if pair1 in pairdict:
            pairdict[pair1].append(i+1)
        else:
            pairdict[pair1] = [i+1]
        pair2 = (nestedList[i][j+1], nestedList[i][j])
        if pair2 in pairdict:
            pairdict[pair2].append(i+1)
        else:
            pairdict[pair2] = [i+1]
del nestedList
print(pairdict.get(('e','z'), None))
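Create a key for each pair of adjacent items (in both orders) and store it in a dict, where the key is the pair and the value is the list of (1-based) indexes of the lists that contain it. Then del your original list if memory is a concern (the dict itself may take a lot of memory).
After that, you can use the dict for lookups and print the indexes where the pair appears.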
I think you could use a regex here to speed this up, although it is still a sequential operation: every sublist has to be joined and searched, so the total work grows linearly with the number of items across all the sublists.
import re
p = re.compile('OI|IO')  # match only "OI" or "IO"
def is_bad(pattern, to_check):
    for item in to_check:
        # joining is safe for the single-character items shown here; with
        # multi-character items (e.g. 'xO' + 'Iy') it can give false matches
        maybe_found = pattern.search(''.join(item))
        if maybe_found:
            yield True
        else:
            yield False
l = list(is_bad(p, nestedList))
print(l)
# [True, False, True]
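Since the question states that each item occurs at most once per list, here is another option, sketched under that assumption and not benchmarked: look up the position of each of the two items with index() and check whether they are adjacent. The scanning then happens inside index() rather than in a Python-level loop, which may help with tens of millions of lists. It works on tuples as well as lists.

```python
def is_bad_in_list(bad, check_this_list):
    """Return True if the two items in bad sit next to each other, in either order.

    Assumes each item occurs at most once per list (as stated in the question).
    """
    first, second = bad
    try:
        # list.index / tuple.index do the scan for us
        i = check_this_list.index(first)
        j = check_this_list.index(second)
    except ValueError:
        # at least one of the two items is not in the list
        return False
    return abs(i - j) == 1


nestedList = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g'],
]
bad = ['O', 'I']
print([is_bad_in_list(bad, lst) for lst in nestedList])  # [True, False, True]
```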

How to rotate a list(not 2D) 90 degree clockwise?

As a beginner in Python, I think the biggest problem I have is overcomplicating a problem when it can be done a lot more simply. I have not found a solution for a list that is not two-dimensional, which is why I chose to ask.
Here is an example of what I am trying to do:
# Before
alphabet = ["ABCDEFG",
"HIJKLMN",
"OPQRSTU"]
# After
rotated_alphabet = ["OHA",
"PIB",
"QJC",
"RKD",
"SLE",
"TMF",
"UNG"]
What I have done so far:
length_of_column = len(alphabet)
length_of_row = len(alphabet[0])
temp_list = []
x = -1
for i in range(length_of_column):
    while x < length_of_row-1:
        x += 1
        for row in alphabet:
            temp_list.append(row[x])
temp_list = temp_list[::-1]
print(temp_list)
Output:
['U', 'N', 'G', 'T', 'M', 'F', 'S', 'L', 'E', 'R', 'K', 'D', 'Q', 'J', 'C', 'P', 'I', 'B', 'O', 'H', 'A']
I need to turn the list above into the desired format.
- How would I do this?
- Is there a simpler way to do it?
You can just zip the list of strings, which makes tuples character by character (one tuple per column); then you only have to join each tuple in reverse order. Here it is in one line:
rotated_alphabet = [''.join(list(i)[::-1]) for i in zip(*alphabet)]
A variant of @MuhammadAhmad's answer is to use reversed(): it works directly on the tuples that zip() produces, so there is no need to convert them to lists.
alphabet = ["ABCDEFG",
"HIJKLMN",
"OPQRSTU"]
rotated = [''.join(reversed(a)) for a in zip(*alphabet)]
print(rotated)
Output
['OHA', 'PIB', 'QJC', 'RKD', 'SLE', 'TMF', 'UNG']
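Another way to see why this works: rotating clockwise is the same as reversing the order of the rows and then transposing. A minimal sketch of that equivalent formulation:

```python
alphabet = ["ABCDEFG",
            "HIJKLMN",
            "OPQRSTU"]

# Reverse the rows first, so the bottom row "OPQRSTU" comes first;
# zip(*...) then reads each column top to bottom, starting from the old bottom row.
rotated = [''.join(column) for column in zip(*reversed(alphabet))]
print(rotated)  # ['OHA', 'PIB', 'QJC', 'RKD', 'SLE', 'TMF', 'UNG']
```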

Print same index of lists within lists [duplicate]

This question already has answers here:
Matrix Transpose in Python [duplicate]
(19 answers)
How do I transpose a List? [duplicate]
(4 answers)
Closed 4 years ago.
I have a list with lists in it.
[['H','J','K','L'],['Q','W','E','R'],['R','W','Q','T']]
I want to print the items at the same index on one line, with a space between them. So the output would be:
H Q R
J W W
K E Q
L R T
I tried a for loop with enumerate, and a while loop. Nothing seems to work, and I'm just a beginner, so I don't even know the correct way to approach this. I'd be really grateful if someone could help me out.
Thanks a lot! Have a nice day!
You can use str.join with zip:
s = [['H', 'J', 'K', 'L'], ['Q', 'W', 'E', 'R'], ['R', 'W', 'Q', 'T']]
new_s = '\n'.join(' '.join(i) for i in zip(*s))
print(new_s)
Output:
H Q R
J W W
K E Q
L R T
'''
OP has 3 nested lists. OP wants to print the items that share an index
on one line, with a space between each item.
'''
import numpy as np
sample_list = [['H', 'J', 'K', 'L'], ['Q', 'W', 'E', 'R'], ['R', 'W', 'Q', 'T']]
sample_array = np.array(sample_list)  # 3 rows x 4 columns
print(sample_array.T)                 # transpose: 4 rows x 3 columns
Here is the output:
[['H' 'Q' 'R']
 ['J' 'W' 'W']
 ['K' 'E' 'Q']
 ['L' 'R' 'T']]
As a beginner, it may be useful to learn about numpy, which simplifies tasks like this when you are working with data in lists. What I did here is create a numpy array from the nested list you showed; it has 3 rows and 4 columns. The .T attribute returns the transpose of the array, which swaps rows and columns, so you get 4 rows of 3 letters, one row per index. Note that .reshape(4, 3) would not do this: it keeps the letters in their original order and just regroups them three at a time, giving ['H' 'J' 'K'], ['L' 'Q' 'W'], and so on. I hope this answer helps as you learn more.
For your specific case it's sufficient to do the following:
list_a = [['H', 'J', 'K', 'L'], ['Q', 'W', 'E', 'R'], ['R', 'W', 'Q', 'T']]
for i in range(len(list_a[0])):
    print(" ".join([l[i] for l in list_a]))
The for loop walks through the indexes in order, and the list comprehension picks the element at that index from each sublist, so join() can print them on one line.
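One caveat with the zip()-based answers: zip() stops at the shortest sublist, so if the sublists could have different lengths, items would be silently dropped. Here is a sketch using itertools.zip_longest, which pads the missing positions instead; the ragged example list, the '-' fill character and the columns_as_lines name are made up for illustration:

```python
from itertools import zip_longest


def columns_as_lines(rows, fill='-'):
    """Return one space-separated line per index, padding short rows with fill."""
    # zip_longest keeps going until the longest row is exhausted
    return [' '.join(column) for column in zip_longest(*rows, fillvalue=fill)]


ragged = [['H', 'J', 'K', 'L'], ['Q', 'W', 'E'], ['R', 'W', 'Q', 'T']]
print('\n'.join(columns_as_lines(ragged)))
# H Q R
# J W W
# K E Q
# L - T
```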

Multiply an integer in a list by a word in the list

I'm not sure how to multiply a string by the number that follows it. I want to find the RMM of a compound, so I started by making a dictionary of RMMs that I then add together. My issue is with compounds such as H2O.
import re
name = input("Insert the name of a molecule/atom to find its RMM/RAM: ")
compound = re.sub('([A-Z])', r' \1', name)
Compound = compound.split(' ')
r = re.split(r'(\d+)', compound)
For example:
When name = H2O
Compound = ['', 'H2', 'O']
r = ['H', '2', 'O']
I want to multiply H by 2, giving ['H', 'H', 'O'].
TLDR: I want each integer that follows a name in the list to repeat the preceding item that many times (e.g. [O, 2] => O O, [C, O, 2] => C O O).
The question is somewhat complicated, so let me know if I can clarify it. Thanks.
How about the following, after you define compound:
test = re.findall(r'([a-zA-Z]+)(\d*)', compound)
expand = [a*int(b) if len(b) > 0 else a for (a, b) in test]
This matches one or more letters followed by an optional run of digits. If there's no digit, we just return the letters; if there is, we repeat the letters that many times. This doesn't quite return what you expected (it returns ['HH', 'O'] instead), so please let me know if this suits.
EDIT: assuming your compounds use elements consisting of either a single capital letter or a single capital followed by a number of lowercase letters, you can add the following:
final = re.findall('[A-Z][a-z]*', ''.join(expand))
Which will return your elements each as a separate entry in the list, e.g. ['H', 'H', 'O']
EDIT 2: with the assumption of my previous edit, we can actually reduce the whole thing down to just a couple of lines:
import re
name = input("Insert the name of a molecule/atom to find its RMM/RAM: ")
test = re.findall(r'([A-Z][a-z]*)(\d*)', name)
final = re.findall('[A-Z][a-z]*', ''.join([a*int(b) if len(b) > 0 else a for (a, b) in test]))
You could probably do something like...
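compound = 'h2o'
final = []
for x in range(len(compound)):
    if compound[x].isdigit() and x != 0:
        # the previous character was already appended once, so add it
        # (count - 1) more times; this only handles single-digit counts
        for count in range(int(compound[x])-1):
            final.append(compound[x-1])
    else:
        final.append(compound[x])
print(final)  # ['h', 'h', 'o']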
Use regex and a generator function:
import re
def multiply_string(seq):
    regex = re.compile("([a-zA-Z][0-9])|([a-zA-Z])")
    for alnum, alpha in regex.findall(''.join(seq)):
        if alnum:
            # a letter followed by one digit: repeat the letter that many times
            for char in alnum[0] * int(alnum[1]):
                yield char
        else:
            yield alpha
l = ['C', 'O', '2']  # ['C', 'O', 'O']
print(list(multiply_string(l)))
We join your list back together using ''.join. Then we compile a regex pattern that matches two kinds of substrings: a letter followed by a digit is captured in the first group, and a letter on its own is captured in the second group. We then iterate over the matches and, for whichever group matched, yield the correct values.
Here are a couple of nested list comprehensions that get it done in two lines (with re already imported):
In [1]: groups = [h*int(''.join(t)) if len(t) else h for h, *t in re.findall(r'[A-Z]\d*', 'H2O')]
In [2]: [c for cG in groups for c in cG]
Out[2]: ['H', 'H', 'O']
Note: I am deconstructing and reconstructing strings so this is probably not the most efficient method.
Here is a longer example:
In [2]: def findElements(molecule):
   ...:     groups = [h*int(''.join(t)) if len(t) else h for h, *t in re.findall(r'[A-Z]\d*', molecule)]
   ...:     return [c for cG in groups for c in cG]
In [3]: findElements("H2O5S7D")
Out[3]: ['H', 'H', 'O', 'O', 'O', 'O', 'O', 'S', 'S', 'S', 'S', 'S', 'S', 'S', 'D']
In Python (both 2 and 3) you can simply multiply a string by an integer to repeat it.
for example:
print("H"*2) # HH
print(2*"H") # HH
Applying this to your list:
r = ['H', '2', 'O']
replacements = [(index, int(ch)) for index, ch in enumerate(r) if ch.isdigit()]
for position, times in replacements:
    r[position] = (times - 1) * r[position - 1]
# flatten the result
r = [ch for s in r for ch in s]
print(r)  # ['H', 'H', 'O']
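Pulling these ideas together, here is a sketch that captures each element symbol and its count in a single regex pass, so two-letter symbols and multi-digit counts are handled too. It assumes symbols are one capital letter optionally followed by one lowercase letter, and that there are no brackets (as in Ca(OH)2); expand_formula is a hypothetical name:

```python
import re


def expand_formula(formula):
    """Expand e.g. 'H2O' into ['H', 'H', 'O'].

    Assumes element symbols are one capital letter optionally followed by
    one lowercase letter, and counts are plain digits (no brackets).
    """
    atoms = []
    for symbol, count in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        # no digits after a symbol means a single atom
        atoms.extend([symbol] * (int(count) if count else 1))
    return atoms


print(expand_formula('H2O'))   # ['H', 'H', 'O']
print(expand_formula('NaCl'))  # ['Na', 'Cl']
print(expand_formula('C6H12O6'))
```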
