Check if two lists differ by exactly one element - Python

I have two lists:
list1=['h', 'e', 'n', 'o', 'p']
list2=['e', 'h', 'c', 'n', 'p', 'o']
I want my function diff1 to return True if these two lists differ by exactly one element.
In this case diff1 should return True because list2 has an extra 'c'.
I can assume list2 always has exactly one more element than list1.
Thank you for any help you can provide.

You could use the symmetric difference of sets. From the documentation:
symmetric_difference(other)
set ^ other
Return a new set with elements in either the set or other but not both.
list1=['h', 'e', 'n', 'o', 'p']
list2=['e', 'h', 'c', 'n', 'p', 'o']
sym_diff = set(list1).symmetric_difference(list2)
print(sym_diff)
# {'c'}
And you just need to check if this difference contains one item:
one_different = len(sym_diff) == 1
print(one_different)
# True
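One caveat: sets discard duplicates, so if the extra element in list2 repeats one already in list1 (for example ['a'] versus ['a', 'a']), the symmetric difference is empty and the check above returns False. A minimal sketch of diff1 that counts duplicates instead, using collections.Counter as a multiset (a swapped-in technique, not part of the answer above):

```python
from collections import Counter


def diff1(list1, list2):
    """Return True if list2 is list1 plus exactly one extra element.

    Order is ignored but duplicates are counted: Counter subtraction
    keeps only positive counts.
    """
    extra = Counter(list2) - Counter(list1)    # elements list2 has beyond list1
    missing = Counter(list1) - Counter(list2)  # elements list1 has that list2 lacks
    return sum(extra.values()) == 1 and not missing


list1 = ['h', 'e', 'n', 'o', 'p']
list2 = ['e', 'h', 'c', 'n', 'p', 'o']
print(diff1(list1, list2))       # True
print(diff1(['a'], ['a', 'a']))  # True, where the set-based check sees no difference
```

Under the question's assumption that list2 has exactly one more element than list1, the two conditions together mean list2 is list1 plus one extra item.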

Related

To check whether at least one list contains a specific element

Could someone please tell me what is the shortest way to write this logic?
I have two lists, list_one and list_two, containing some letters. If neither of these lists contains 'B', I need to print(True). The snippet I have written works, but I am curious whether there is a more Pythonic way to write this instead of repeating 'B' twice in the same line.
list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']
if 'B' not in list_one and 'B' not in list_two:
    print('True')
 
Thanks in advance and any help would be greatly appreciated. 
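Well, you can do that with a set union (even though I think your way is the best). The union (|) is needed here: an intersection (&) would only contain 'B' if it were in both lists.
list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']
if 'B' not in (set(list_one) | set(list_two)):
    print('True')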
Or:
if 'B' not in list_one + list_two:
    print('True')
You can try the all() function if you find it more readable:
list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']
print(all('B' not in current_list for current_list in [list_one, list_two]))
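The all() expression is equivalent, by De Morgan's law, to not any(...). A small sketch wrapping that in a helper that accepts any number of lists; absent_from_all is a hypothetical name chosen for illustration:

```python
def absent_from_all(item, *lists):
    """Return True if item occurs in none of the given lists.

    any() stops at the first list that contains item.
    """
    return not any(item in lst for lst in lists)


list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']
if absent_from_all('B', list_one, list_two):
    print('True')
```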
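Python also has sets, and membership tests on them are much faster than on lists: a set lookup takes constant time on average, while in on a list scans it element by element. (Building the set is itself a full pass over the data.)
Here are some features of sets:
Sets are unordered.
Set elements are unique.
Duplicate elements are not allowed in sets.
Therefore you can search for the item in a single combined set: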
list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']
if 'B' not in set(list_one + list_two):
    print('True')
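Bonus:
You can use the extend method instead of + to avoid creating a new list, but extend() works in place and returns None, so set(list_one.extend(list_two)) raises a TypeError. Call it first (this modifies list_one), or skip the intermediate list entirely with set.union():
list_one.extend(list_two)
combined = set(list_one)
# or, without modifying list_one:
combined = set(list_one).union(list_two)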
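If you only test a single item, building a set does not save work, because constructing it already scans every element. A sketch using itertools.chain, which lets in walk both lists in turn without creating a concatenated list or a set, and stops at the first match; a set pays off when you will test many items against the same data:

```python
from itertools import chain

list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']

# chain() yields the elements of list_one, then those of list_two,
# without building a new list; "in" stops as soon as it finds a match.
if 'B' not in chain(list_one, list_two):
    print('True')
```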
A different way of doing this is to put all the lists in a pandas DataFrame first:
import pandas as pd
df = pd.DataFrame(list(zip(list_one, list_two)), columns=['l1', 'l2'])
Note that zip() stops at the shorter list, so this assumes both lists have the same length. Then you can check whether 'B' is absent; the expression below evaluates to True if it is. The first .any() reduces each column, and the second reduces across the columns:
~df.isin(['B']).any().any()

Unique elements of sublists depending on specific value in sublist

I am trying to select unique datasets from a very large, quite inconsistent list.
My dataset RawData consists of lists of strings of different lengths.
Some items occur many times, for example: ['a','b','x','15/30']
The key to compare the items is always the last string: for example '15/30'
The goal is to get a list UniqueData in which each key occurs only once (I want to keep the order).
Dataset:
RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]
My desired solution Dataset:
UniqueData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['i','j','k','l','m','n','o','p','20/60']]
I tried many possible solutions, for instance:
for index, elem in enumerate(RawData): and appending to a new list if.....
for element in list does not work, because the items are not exactly the same.
Can you help me find a solution to my problem?
Thanks!
The best way to remove duplicates is with a set. Add each sublist's last element to a set to keep track of the keys seen so far. If the key is already in the set unique, skip the sublist; otherwise add the key to unique and append the sublist to the result list new.
Try this:
new = []
unique = set()
for lst in RawData:
    if lst[-1] not in unique:
        unique.add(lst[-1])
        new.append(lst)
print(new)
# [['a', 'b', 'x', '15/30'],
#  ['d', 'e', 'f', 'g', 'h', '20/30'],
#  ['w', 'x', 'y', 'z', '10/10'],
#  ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]
You could set up a new list for the unique data and another to track the keys you have seen so far. Then, as you loop through the data, if you have not seen the last element of a sublist before, append the sublist to the unique data and add its key to the seen list.
RawData = [['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'],
           ['a', 'x', 'c', '15/30'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'], ['x', 'b', 'c', '15/30']]
seen = []
UniqueData = []
for data in RawData:
    if data[-1] not in seen:
        UniqueData.append(data)
        seen.append(data[-1])
print(UniqueData)
OUTPUT
[['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]
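Alternatively, record the indices of rows whose key was already seen and delete those rows from RawData in place, starting from the end so the remaining indices stay valid:
RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]
seen = []
seen_indices = []
for index, row in enumerate(RawData):
    # index -> position of the sublist
    # row -> the sublist itself
    if row[-1] not in seen:
        seen.append(row[-1])
    else:
        seen_indices.append(index)  # key already seen: mark this row for deletion
for index in sorted(seen_indices, reverse=True):
    del RawData[index]
print(RawData)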
Using a set to filter out entries for which the key has already been seen is the most efficient way to go.
Here's a one-liner using a list comprehension with internal side effects (seen.add() returns None, so the or clause records each new key while still letting the row through):
UniqueData = [rd for seen in [set()] for rd in RawData if not (rd[-1] in seen or seen.add(rd[-1]))]
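Since Python 3.7, dicts preserve insertion order, so a dict keyed on each sublist's last element can replace the separate set and result list. A minimal sketch; first_per_key is a hypothetical helper name:

```python
RawData = [['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'],
           ['w', 'x', 'y', 'z', '10/10'], ['a', 'x', 'c', '15/30'],
           ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'], ['x', 'b', 'c', '15/30']]


def first_per_key(rows):
    """Keep the first row for each key (the row's last element), in original order."""
    firsts = {}
    for row in rows:
        # setdefault() only stores the row if its key is not in the dict yet
        firsts.setdefault(row[-1], row)
    return list(firsts.values())  # keys keep the order of their first insertion


UniqueData = first_per_key(RawData)
print(UniqueData)
```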

How can method which evaluates a list to determine if it contains specific consecutive items be improved?

I have a nested list of tens of millions of lists (I can also use tuples). Each list is 2-7 items long. Each item in a list is a string of 1-5 characters and occurs no more than once per list. (I use single-character items in my example below for simplicity.)
#Example nestedList:
nestedList = [
['a', 'e', 'O', 'I', 'g', 's'],
['w', 'I', 'u', 'O', 's', 'g'],
['e', 'z', 's', 'I', 'O', 'g']
]
I need to find which lists in my nested list contain a given pair of items next to each other, in either order, so I can do stuff to these lists while ignoring the rest. This needs to be as efficient as possible.
I am using the following function but it seems pretty slow and I just know there has to be a smarter way to do this.
def isBadInList(bad, checkThisList):
    numChecks = len(checkThisList) - 1  # number of adjacent pairs
    for x in range(numChecks):
        if checkThisList[x] == bad[0] and checkThisList[x + 1] == bad[1]:
            return True
        elif checkThisList[x] == bad[1] and checkThisList[x + 1] == bad[0]:
            return True
    return False
I call it like this:
bad = ['O', 'I']
for checkThisList in nestedList:
    result = isBadInList(bad, checkThisList)
    if result:
        doStuffToList(checkThisList)
# The function isBadInList() only returns True for the first and third lists in nestedList and False for the rest.
I need a way to do this faster if possible. I can use tuples instead of lists, or whatever it takes.
nestedList = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g']
]
# first create a map from each adjacent pair (in both orders) to the 1-based indexes of the sublists containing it
pairdict = dict()
for i in range(len(nestedList)):
    for j in range(len(nestedList[i]) - 1):
        pair1 = (nestedList[i][j], nestedList[i][j + 1])
        if pair1 in pairdict:
            pairdict[pair1].append(i + 1)
        else:
            pairdict[pair1] = [i + 1]
        pair2 = (nestedList[i][j + 1], nestedList[i][j])
        if pair2 in pairdict:
            pairdict[pair2].append(i + 1)
        else:
            pairdict[pair2] = [i + 1]
del nestedList
print(pairdict.get(('e', 'z'), None))
Create each adjacent pair (in both orders) and store it in a dict: the key is the pair and the value is the list of indexes of the sublists where it occurs. Then delete your list (it may take too much memory to keep both).
After that, you can use the dict for lookups and print the indexes where the pair appears.
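Storing both (a, b) and (b, a) doubles the number of keys. A sketch of the same index that normalizes each pair into one order-independent key, assuming the items are mutually comparable strings; pair_key and pair_index are hypothetical names:

```python
from collections import defaultdict


def pair_key(a, b):
    """Order-independent key: ('O', 'I') and ('I', 'O') map to the same tuple."""
    return (a, b) if a <= b else (b, a)


nestedList = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g'],
]

pair_index = defaultdict(list)
for number, items in enumerate(nestedList, start=1):  # 1-based, like the answer above
    for a, b in zip(items, items[1:]):                 # consecutive pairs only
        pair_index[pair_key(a, b)].append(number)

print(pair_index.get(pair_key('e', 'z')))  # [3]
print(pair_index.get(pair_key('O', 'I')))  # [1, 3]
```

Lookups must go through pair_key() as well, and .get() on a defaultdict returns None for a missing pair without inserting it.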
I think you could use a regex here to speed this up, although it is still a sequential operation: you have to visit every sublist and every item in it, so the cost is O(n·k) for n sublists of up to k items. Note that joining the items into one string only works when every item is a single character; with multi-character items a match could span an item boundary.
import re
p = re.compile('OI|IO')  # match only OI or IO
def is_bad(pattern, to_check):
    for item in to_check:
        maybe_found = pattern.search(''.join(item))
        if maybe_found:
            yield True
        else:
            yield False
l = list(is_bad(p, nestedList))
print(l)
# [True, False, True]
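A different, regex-free sketch: zip(items, items[1:]) produces each consecutive pair, and a set holding the bad pair in both orders turns each test into a hash lookup. Unlike the joined-string regex, it also works for multi-character items. The names wanted and has_bad_pair are hypothetical, and I have not benchmarked this against the original loop:

```python
nestedList = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g'],
]

bad = ('O', 'I')
wanted = {bad, bad[::-1]}  # both orders: ('O', 'I') and ('I', 'O'), built once


def has_bad_pair(items):
    """True if any two consecutive items form one of the wanted pairs."""
    # zip(items, items[1:]) yields (items[0], items[1]), (items[1], items[2]), ...
    return any(pair in wanted for pair in zip(items, items[1:]))


flags = [has_bad_pair(items) for items in nestedList]
print(flags)  # [True, False, True]
```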

Python divide each string by the total length of string

Thank you for your help and patience.
I am new to Python and am attempting to calculate the number of times a particular atomic symbol appears divided by the total number of atoms. The function should accept a list of strings as its argument and return a list containing the fractions of 'C', 'H', 'O' and 'N'. But I keep getting one result instead of one for each of my atoms. My attempt is below:
Atoms = ['N', 'C', 'C', 'O', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'O', 'H']
def count_atoms (atoms):
    for a in atoms:
        total = atoms.count(a)/len(atoms)
        return total
Then
faa = count_atoms(atoms)
print(faa)
However, I only get one result, which is 0.07692307692307693. I was supposed to get a list starting with [0.23076923076923078, ...], but I don't know how. I was supposed to calculate the fraction of 'C', 'H', 'O' and 'N' atomic symbols in the molecule using a for loop and a return statement. :( Please help, it will be appreciated.
ganderson's comment explains the problem. As an alternative implementation, here is one using collections.Counter:
from collections import Counter
atoms = ['N', 'C', 'C', 'O', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'O', 'H']
def count_atoms(atoms):
    num = len(atoms)
    return {atom: count / num for atom, count in Counter(atoms).items()}
print(count_atoms(atoms))
Well, you return the variable total during the first iteration of your loop, so the function exits right away. Why don't you use a list to store your values? Like this:
atoms = ['N', 'C', 'C', 'O', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'O', 'H']  # Python is case sensitive!
def count_atoms(atoms):
    return_list = []  # empty list
    for a in atoms:
        total = atoms.count(a) / len(atoms)
        return_list.append(total)  # we add a new item
    return return_list  # we return everything and leave the function
It would be better to return a dictionary so you know which element the fraction corresponds to:
>>> fractions = {element: Atoms.count(element)/len(Atoms) for element in Atoms}
>>> fractions
{'N': 0.07692307692307693, 'C': 0.23076923076923078, 'O': 0.15384615384615385, 'H': 0.5384615384615384}
You can, then, even lookup the fraction for a particular element like:
>>> fractions['N']
0.07692307692307693
However, if you must use a for loop and a return statement, then the answer from not_a_bot_no_really_82353 would be the right one.
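Since the assignment asks for the fractions of 'C', 'H', 'O' and 'N' using a for loop and a return statement, here is a sketch that loops over those four symbols (the order C, H, O, N is assumed from the question) and returns only after the loop has finished:

```python
atoms = ['N', 'C', 'C', 'O', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'O', 'H']


def count_atoms(atoms):
    """Return the fractions of 'C', 'H', 'O' and 'N' in atoms, in that order."""
    total = len(atoms)
    fractions = []
    for symbol in ['C', 'H', 'O', 'N']:
        fractions.append(atoms.count(symbol) / total)
    return fractions  # return after the loop, not inside it


faa = count_atoms(atoms)
print(faa)  # [0.23076923076923078, 0.5384615384615384, 0.15384615384615385, 0.07692307692307693]
```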
A simple one-liner should do:
[atoms.count(a)/float(len(atoms)) for a in set(atoms)]
Or, better, create a dictionary using a comprehension:
{a:atoms.count(a)/float(len(atoms)) for a in set(atoms)}
Output
{'C': 0.23076923076923078,
'H': 0.5384615384615384,
'N': 0.07692307692307693,
'O': 0.15384615384615385}
If you still want a loop, I would suggest map, which is a lot cleaner. In Python 3, map returns a lazy iterator, so wrap it in list():
atoms = ['N', 'C', 'C', 'O', 'H', 'H', 'C', 'H', 'H', 'H', 'H', 'O', 'H']
def count_atoms(a):
    total = atoms.count(a) / float(len(atoms))
    return total
list(map(count_atoms, atoms))

Recursive Selection Sort python

I have to write a recursive selection sort for the following question:
def selsort(l):
    """
    sorts l in-place.
    PRE: l is a list.
    POST: l is a sorted list with the same elements; no return value.
    """
l1 = list("sloppy joe's hamburger place")
vl1 = l1
print(l1)  # should be: """['s', 'l', 'o', 'p', 'p', 'y', ' ', 'j', 'o', 'e', "'", 's', ' ', 'h', 'a', 'm', 'b', 'u', 'r', 'g', 'e', 'r', ' ', 'p', 'l', 'a', 'c', 'e']"""
ret = selsort(l1)
print(l1)  # should be """[' ', ' ', ' ', "'", 'a', 'a', 'b', 'c', 'e', 'e', 'e', 'g', 'h', 'j', 'l', 'l', 'm', 'o', 'o', 'p', 'p', 'p', 'r', 'r', 's', 's', 'u', 'y']"""
print(vl1)  # should be """[' ', ' ', ' ', "'", 'a', 'a', 'b', 'c', 'e', 'e', 'e', 'g', 'h', 'j', 'l', 'l', 'm', 'o', 'o', 'p', 'p', 'p', 'r', 'r', 's', 's', 'u', 'y']"""
print(ret)  # should be "None"
I know how to get this by using key → l.sort(key=str.lower). But the question wants me to extract the maximum element, instead of the minimum, only to .append(...) it on to a recursively sorted sublist.
If I could get any help I would greatly appreciate it.
So. Do you understand the problem?
Let's look at what you were asked to do:
extract the maximum element, instead of the minimum, only to .append(...) it on to a recursively sorted sublist.
So, we do the following things:
1) Extract the maximum element. Do you understand what "extract" means here? Do you know how to find the maximum element?
2) Recursively sort the sublist. Here, "the sublist" consists of everything else after we extract the maximum element. Do you know how recursion works? You just call your sort function again with the sublist, relying on it to do the sorting. After all, the purpose of your function is to sort lists, so this is supposed to work, right? :)
3) .append() the maximum element onto the result of sorting the sublist. This should not require any explanation.
Of course, we need a base case for the recursion. When do we have a base case? When we can't follow the steps exactly as written. When does that happen? Well, why would it happen? Answer: we can't extract the maximum element if there are no elements, because then there is no maximum element to extract.
Thus, at the beginning of the function we check if we were passed an empty list. If we were, we just return an empty list, because sorting an empty list results in an empty list. (Do you see why?) Otherwise, we go through the other steps.
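The three steps and the base case can be sketched as follows. This version works in place and returns None, matching the PRE/POST conditions in the question, so instead of returning an empty list the base case simply returns:

```python
def selsort(l):
    """
    sorts l in-place.
    PRE: l is a list.
    POST: l is a sorted list with the same elements; no return value.
    """
    if not l:                         # base case: no maximum element to extract
        return
    biggest = l.pop(l.index(max(l)))  # 1) extract the maximum element
    selsort(l)                        # 2) recursively sort the rest, in place
    l.append(biggest)                 # 3) append the maximum after the sorted rest


l1 = list("sloppy joe's hamburger place")
vl1 = l1
ret = selsort(l1)
print(l1)
print(ret)  # None
```

Each call scans the remaining list with max() and index(), so the sort takes quadratic time, and the recursion depth equals the length of the list, so Python's default recursion limit (about 1000) rules it out for long lists.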
The sort method should do what you want. If you want the reverse order, just use list.reverse() (or pass reverse=True to sort()).
If your job is to make your own sort method, that can be done, too.
Maybe try something like this (it compares ord() values, so it only works for lists of single characters):
def sort(l):
    li = l[:]  # make a copy so the caller's list is untouched
    newlist = []  # the sorted list will be stored here
    while len(li) != 0:  # while there is stuff to be sorted
        bestindex = -1  # index of the lowest element found so far
        bestchar = -1  # ord value of the lowest character found so far
        bestcharrep = -1  # the lowest character itself
        i = 0
        for v in li:
            if ord(v) < bestchar or bestchar == -1:  # check if this character is lower than the old best
                bestindex = i  # update best records
                bestchar = ord(v)
                bestcharrep = v
            i += 1
        del li[bestindex]  # delete the retrieved element from the copy
        newlist.append(bestcharrep)  # add the element to the new list
    return newlist  # return the sorted list
