Repetition inside of lists - python

I want to build a function that will return True if any two items in a list are the same.
For example, [1,7,3,7,4] should return True and ["one","ONE","One"] should return False.
I need help with which parts of python look for duplicates.

Loop over the values and use a set to track what you have already seen. As soon as you see a value again, return True:
def has_duplicates(lst):
    seen = set()
    for elem in lst:
        if elem in seen:
            return True
        seen.add(elem)
    return False
This is very efficient in that it short-circuits; it won't loop over the whole list if a duplicate has been detected early on.
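To make the short-circuiting visible, here is a small sketch that feeds the function a generator and counts how many items were actually consumed. The counting_iter helper is hypothetical and exists only for this illustration; note that the set-based approach assumes the items are hashable.

```python
def has_duplicates(lst):
    seen = set()
    for elem in lst:
        if elem in seen:
            return True
        seen.add(elem)
    return False


def counting_iter(items, counter):
    """Hypothetical helper: yield items while recording how many were consumed."""
    for item in items:
        counter[0] += 1
        yield item


consumed = [0]
found = has_duplicates(counting_iter([7, 7, 1, 2, 3, 4], consumed))
print(found, consumed[0])  # True 2 (the loop stopped at the second item)
```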

Martijn's answer is the best, with a few exceptions, but this is also worth a try.
>>> chk = lambda x: len(x) != len(set(x)) # compare the length before and after removing dupes.
>>> l = [1,7,3,7,4]
>>> chk(l)
True
>>> l = ["one","ONE","One"]
>>> chk(l)
False
Note - As Martijn mentions in a comment, this is a slower process.

Using a collections.Counter dict:
from collections import Counter
def has_dupes(l):
    # if most repeated key count is > 1 we have at least one dupe
    return Counter(l).most_common(1)[0][1] > 1
Or use any:
def has_dupes(l):
    return any(v > 1 for v in Counter(l).values())
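One edge case worth noting: Counter(l).most_common(1) returns an empty list for an empty input, so indexing it with [0] raises IndexError, while the any version simply returns False. A minimal sketch of a guarded variant (the name has_dupes_safe is made up here):

```python
from collections import Counter


def has_dupes_safe(items):
    """Return True if any value occurs more than once; False for empty input."""
    most_common = Counter(items).most_common(1)
    # most_common is [] when items is empty, so guard before indexing.
    return bool(most_common) and most_common[0][1] > 1
```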

Related

Fast and pythonic way to find out if an anagram is a palindrome?

Given a string, how do we check if any anagram of it can be a palindrome?
For example let us consider the string "AAC". An anagram of it is "ACA" which is a palindrome. We have to write a method which takes a string and outputs true if we can form a palindrome from any anagram of the given string. Otherwise outputs false.
This is my current solution:
from collections import defaultdict
def check(s):
    newdict = defaultdict(int)
    for e in s:
        newdict[e] += 1
    times = 0
    for e in newdict.values():
        if times == 2:
            return False
        if e == 1:
            times += 1
    return True
Any shorter solutions using the python library?
Here is a shorter solution that uses the standard library, with a corrected algorithm (all the character counts must be even, except for at most one):
from collections import Counter
def check(s):
    # .values() works on both Python 2 and 3 (itervalues() is Python 2 only)
    return sum(1 for count in Counter(s).values() if count % 2 == 1) <= 1
This is short but "slow", as the program goes through all the odd counts instead of stopping as soon as two are found. A faster solution that stops as soon as possible is:
def check(s):
    odd_counts = (count for count in Counter(s).values() if count % 2 == 1)
    try:
        next(odd_counts) # Fails if there is no odd count
        next(odd_counts) # Fails if there is one odd count
    except StopIteration:
        return True
    else:
        return False
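The same early exit can also be written without try/except by pulling at most two odd counts with itertools.islice. This is only a sketch of an alternative, not part of the answer above, and the name can_form_palindrome is chosen here for illustration. Building the Counter still scans the whole string; only the inspection of the counts stops early.

```python
from collections import Counter
from itertools import islice


def can_form_palindrome(s):
    """True if some anagram of s is a palindrome (at most one odd character count)."""
    odd_counts = (count for count in Counter(s).values() if count % 2 == 1)
    # islice stops after two items, so no more odd counts are examined than needed.
    return len(list(islice(odd_counts, 2))) <= 1
```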
This is probably a better fit for code golfing, but it is quite trivial.
Observe that a palindrome needs balanced sides, so in general each character must occur an even number of times. A single character can sit in the middle, though, so at most one character may have an odd count. This can be checked with a single list comprehension:
>>> from collections import Counter
>>> def is_palindrome(letters):
...     return len([v for v in Counter(letters).values() if v % 2]) <= 1
...
>>> is_palindrome('level')
True
>>> is_palindrome('levels')
False
>>> is_palindrome('levelss')
True
Oh wait, someone else beat me to it with a solution, but that's what I got.
Without using Counter:
>>> def isit(s):
...     ls = [x % 2 for x in [s.count(x) for x in set(s)]]
...     return sum(ls) <= 1  # at most one character with an odd count
...
>>> isit('abc')
False
>>> isit('abb')
True
>>> isit('abbd')
False
>>> isit('abbdd')
True
>>> isit('abbdda')
True
>>>
Even though it's not algorithmically the best (the number of permutations grows factorially, so it is only practical for short strings), I'd like to provide a more readable solution.
from itertools import permutations
def has_palindrome(s):
    return any(c == c[::-1] for c in permutations(s,len(s)))
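Since the permutation approach is slow but obviously correct, it is handy as a reference to check a faster version against on short inputs. A small sketch, assuming the odd-count rule described in the other answers (the function name odd_count_check is hypothetical):

```python
from collections import Counter
from itertools import permutations


def has_palindrome(s):
    # Brute force: try every ordering of the characters.
    return any(c == c[::-1] for c in permutations(s, len(s)))


def odd_count_check(s):
    # At most one character may occur an odd number of times.
    return sum(v % 2 for v in Counter(s).values()) <= 1


# Compare both implementations on a few short strings.
for word in ["", "a", "AAC", "abc", "level", "levels"]:
    assert has_palindrome(word) == odd_count_check(word)
```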

Search function python

Hi, I'm trying to create a search function in Python that goes through a list and searches for an element in it.
So far I've got:
def search_func(list, x)
    if list < 0:
        return("failure")
    else:
        x = list[0]
        while x > list:
            x = list [0] + 1 <---- how would you tell python to go to the next element in the list ?
            if (x = TargetValue):
                return "success"
            else
                return "failure"
Well, your current code isn't very Pythonic, and there are several mistakes! You have to use indexes to access an element in a list. Corrected, your code looks like this:
def search_func(lst, x):
    if len(lst) <= 0:   # this is how you test if the list is empty
        return "failure"
    i = 0               # we'll use this as index to traverse the list
    while i < len(lst): # this is how you test to see if the index is valid
        if lst[i] == x: # this is how you check the current element
            return "success"
        i += 1          # this is how you advance to the next element
    else:               # this executes only if the loop didn't find the element
        return "failure"
... But notice that in Python you rarely use while to traverse a list; a much more natural and simpler approach is to use for, which automatically binds a variable to each element, without having to use indexes:
def search_func(lst, x):
    if not lst:    # shorter way to test if the list is empty
        return "failure"
    for e in lst:  # look how easy is to traverse the list!
        if e == x: # we no longer care about indexes
            return "success"
    else:
        return "failure"
But we can be even more Pythonic! The functionality you want to implement is so common that it's already built into lists. Just use in to test if an element is inside a list:
def search_func(lst, x):
    if lst and x in lst: # test for emptiness and for membership
        return "success"
    else:
        return "failure"
Are you saying you want to see if an element is in a list? If so, there is no need for a function like that. Just use in:
>>> lst = [1, 2, 3]
>>> 1 in lst
True
>>> 4 in lst
False
>>>
This method is a lot more efficient.
If you have to do it without in, I suppose this will work:
def search_func(lst, x):
    return "success" if lst.count(x) else "failure"
You don't need to write a function for searching, just use
x in llist
Update:
def search_func(llist,x):
    for i in llist:
        if i==x:
            return True
    return False
You are making your problem more complex than it needs to be; think the problem through before starting to code. You are using while loops, which can easily turn into infinite loops. A for loop is a better fit here: just check each element against the condition you care about, and you are almost done.
def search_func(lst,x):
    for e in lst: #e is each element of the given list in turn
        if e==x: #if condition checks whether element is equal to x
            return True
    else: #runs only if the loop finished without finding x
        return False
def search(query, result_set):
    if isinstance(query, str):
        query = query.split()
    assert isinstance(query, list)
    results = []
    for i in result_set:
        if all(quer.casefold() in str(i).casefold() for quer in query):
            results.append(i)
    return results
Works best.
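For context, this last function does something different from the others: it returns every item whose string form contains all of the query words, ignoring case. A usage sketch with made-up data (str.casefold needs Python 3.3 or later):

```python
def search(query, result_set):
    if isinstance(query, str):
        query = query.split()
    assert isinstance(query, list)
    results = []
    for i in result_set:
        if all(quer.casefold() in str(i).casefold() for quer in query):
            results.append(i)
    return results


# Made-up sample data for illustration.
items = ["Red Apple", "green apple", "Banana", 42]
print(search("apple", items))      # ['Red Apple', 'green apple']
print(search("RED apple", items))  # ['Red Apple']
print(search("4", items))          # [42]
```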

How to tell if items occur sequentially in a list in python

I am trying to figure out a cleaner way to determine whether all occurrences of a particular item in my list are sequential (adjacent to each other).
For example, suppose I have a list:
my_list=[1,2,2,2,4,5,1,0]
In the above example, the repeated instances of 1 do not occur sequentially in the list, but all instances of 2 do. The only way I can figure out how to do this is very clumsy:
def check_sequencing(some_list,item_to_check):
    prev_instance = 0
    difference_list = []
    for counter, item in enumerate(some_list):
        if item_to_check == item:
            difference_list.append(counter - prev_instance)
            prev_instance = counter
    if set(difference_list[1:]) == set([1]):
        return 'True'
    else:
        return 'False'
I am trying to avoid importing another library (numpy). I was sure when I started down this road that there would be a one-liner, but I can't find it.
>>> collections.Counter(x[0] for x in itertools.groupby(my_list)).get(1, 0) > 1
True
>>> collections.Counter(x[0] for x in itertools.groupby(my_list)).get(2, 0) > 1
False
You can use itertools.groupby to do this:
>>> import itertools
>>> any(len(list(n[1])) >= 2 for n in itertools.groupby(l))
True
If you want to avoid using len(list(gen)), you could use something like this:
>>> import itertools
>>> any(sum(1 for i in n[1]) >= 2 for n in itertools.groupby(l))
True
OK, this time I have a real one-liner that works:
all(x==i for x in L[L.index(i):len(L)-[k for k in reversed(L)].index(i)])
If it's True, then every element between the first and last occurrence of i is also i, meaning all occurrences are adjacent. Replace L with your list and i with the item you're searching for.
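Building on the groupby idea from the earlier answers, one way to phrase the original question ("do all occurrences of this item sit next to each other?") is to count how many separate runs of the item exist. A minimal sketch; the name occurs_contiguously is hypothetical, an item that appears only once counts as contiguous, and an item that never appears does not.

```python
from itertools import groupby


def occurs_contiguously(values, item):
    """True if item appears in values and all its occurrences form one run."""
    # groupby yields one (key, group) pair per run of equal adjacent values.
    runs = sum(1 for key, _ in groupby(values) if key == item)
    return runs == 1


my_list = [1, 2, 2, 2, 4, 5, 1, 0]
print(occurs_contiguously(my_list, 2))  # True
print(occurs_contiguously(my_list, 1))  # False
```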

Is there a Python builtin for determining if an iterable contained a certain sequence?

For example, something like:
>>> [1, 2, 3].contains_sequence([1, 2])
True
>>> [1, 2, 3].contains_sequence([4])
False
I know that the in operator can do this for strings:
>>> "12" in "123"
True
But I'm looking for something that operates on iterables.
Referenced from https://stackoverflow.com/a/6822773/24718
modified to use a list.
from itertools import islice
def window(seq, n=2):
    """
    Returns a sliding window (of width n) over data from the iterable
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    it = iter(seq)
    result = list(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + [elem]
        yield result
def contains_sequence(all_values, seq):
    return any(seq == current_seq for current_seq in window(all_values, len(seq)))
test_iterable = [1,2,3]
search_sequence = [1,2]
result = contains_sequence(test_iterable, search_sequence)
Is there a Python builtin? No. You can accomplish this task in various ways. Here is a recipe that does it, and also gives you the position of the subsequence in the containing sequence:
def _search(forward, source, target, start=0, end=None):
    """Naive search for target in source."""
    m = len(source)
    n = len(target)
    if end is None:
        end = m
    else:
        end = min(end, m)
    if n == 0 or (end-start) < n:
        # target is empty, or longer than source, so obviously can't be found.
        return None
    if forward:
        x = range(start, end-n+1)
    else:
        x = range(end-n, start-1, -1)
    for i in x:
        if source[i:i+n] == target:
            return i
    return None
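The recipe only shows the internal helper, so here is a hedged sketch of how it might be called. The find and rfind wrappers are hypothetical names invented for this example; they simply fix the forward flag.

```python
def _search(forward, source, target, start=0, end=None):
    """Naive search for target in source."""
    m = len(source)
    n = len(target)
    if end is None:
        end = m
    else:
        end = min(end, m)
    if n == 0 or (end - start) < n:
        return None
    if forward:
        x = range(start, end - n + 1)
    else:
        x = range(end - n, start - 1, -1)
    for i in x:
        if source[i:i + n] == target:
            return i
    return None


def find(source, target, start=0, end=None):
    """Hypothetical wrapper: index of the first match, or None."""
    return _search(True, source, target, start, end)


def rfind(source, target, start=0, end=None):
    """Hypothetical wrapper: index of the last match, or None."""
    return _search(False, source, target, start, end)


data = [1, 2, 3, 1, 2, 3]
print(find(data, [1, 2]))   # 0
print(rfind(data, [1, 2]))  # 3
print(find(data, [4]))      # None
```

Because the comparison uses slicing, source and target should be the same sequence type (a list slice never equals a tuple).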
As far as I know, there's no way to do this. You can roll your own function pretty easily, but I doubt that will be terribly efficient.
>>> def contains_seq(seq,subseq):
...     #try: junk=seq[:]
...     #except: seq=tuple(seq)
...     #try: junk=subseq[:]
...     #except: subseq=tuple(subseq)
...     ll=len(subseq)
...     for i in range(len(seq)-ll+1): #on python2, use xrange. The +1 makes sure the last window is checked.
...         if(seq[i:i+ll] == subseq):
...             return True
...     return False
...
>>> contains_seq(range(10),range(3)) #True
>>> contains_seq(range(10),[2,3,6]) #False
Note that this solution does not work with generator type objects (it only works on objects that you can slice). You could check seq to see if it is sliceable before proceeding and cast to a tuple if it isn't sliceable -- But then you get rid of the benefits of slicing. You could re-write it to check one element at a time instead of using slicing, but I have a feeling performance would suffer even more.
If preserving of order is not necessary, you can use sets (builtin):
>>> set([1,2]).issubset([1,2,3])
True
>>> set([4]).issubset([1,2,3])
False
Otherwise:
def is_subsequence(sub, iterable):
    sub_pos, sub_len = 0, len(sub)
    for i in iterable:
        if i == sub[sub_pos]:
            sub_pos += 1
            if sub_pos >= sub_len:
                return True
        else:
            sub_pos = 0
    return False
>>> is_subsequence([1,2], [0,1,2,3,4])
True
>>> is_subsequence([2,1], [0,1,2,3,4]) # order preserved
False
>>> is_subsequence([1,2,4], [0,1,2,3,4])
False
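This one works with any iterator. Be aware, though, that on a mismatch it restarts from the beginning of sub without re-checking the current element, so it can miss a match that begins inside a failed partial match: is_subsequence([1,2], [1,1,2]) returns False.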
As others have said, there's no builtin for this. Here's an implementation that is potentially more efficient than the other answers I've seen -- in particular, it scans through the iterable, just keeping track of what prefix sizes of the target sequence it's seen. But that increased efficiency comes at some expense in increased verbosity over some of the other approaches that have been suggested.
import collections.abc
def contains_seq(iterable, seq):
    """
    Returns true if the iterable contains the given sequence.
    """
    # The following clause is optional -- leave it if you want to allow `seq` to
    # be an arbitrary iterable; or remove it if `seq` will always be list-like.
    # (collections.abc.Sequence is the Python 3 location of the Sequence ABC.)
    if not isinstance(seq, collections.abc.Sequence):
        seq = tuple(seq)
    if len(seq)==0: return True # corner case
    partial_matches = []
    for elt in iterable:
        # Try extending each of the partial matches by adding the
        # next element, if it matches.
        partial_matches = [m+1 for m in partial_matches if elt == seq[m]]
        # Check if we should start a new partial match
        if elt==seq[0]:
            partial_matches.append(1)
        # Check if we have a complete match (partial_matches will always
        # be sorted from highest to lowest, since older partial matches
        # come before newer ones).
        if partial_matches and partial_matches[0]==len(seq):
            return True
    # No match found.
    return False
deque appears to be useful here:
from collections import deque
def contains(it, seq):
    seq = deque(seq)
    deq = deque(maxlen=len(seq))
    for p in it:
        deq.append(p)
        if deq == seq:
            return True
    return False
Note that this accepts arbitrary iterables for both arguments (no slicing required).
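A quick usage sketch for the deque version, with generators and iterators as inputs to illustrate that no slicing is needed. The sample data is made up.

```python
from collections import deque


def contains(it, seq):
    seq = deque(seq)
    deq = deque(maxlen=len(seq))  # sliding window of the last len(seq) items
    for p in it:
        deq.append(p)
        if deq == seq:
            return True
    return False


squares = (n * n for n in range(10))        # 0, 1, 4, 9, 16, ...
print(contains(squares, iter([4, 9, 16])))  # True
print(contains(iter([1, 1, 2]), [1, 2]))    # True (match starts inside a failed partial match)
print(contains(range(5), [3, 2]))           # False
```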
As there's no builtin, I made a nice version:
import itertools as it
def contains(seq, sub):
    seq = iter(seq)
    o = object()
    return any(all(i==j for i,j in zip(sub, it.chain((n,),seq,
                   (o for i in it.count())))) for n in seq)
This does not require any extra lists (if you use it.izip or Py3k).
>>> contains([1,2,3], [1,2])
True
>>> contains([1,2,3], [1,2,3])
True
>>> contains([1,2,3], [2,3])
True
>>> contains([1,2,3], [2,3,4])
False
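Extra points if you have no trouble reading it. (It does the job for the cases above, but the implementation is not to be taken too seriously: the inner comparison consumes items from seq, so for example contains([1,1,2], [1,2]) returns False.) ;)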
You could convert it into a string and then do matching on it
full_list = " ".join([str(x) for x in [1, 2, 3]])
seq = " ".join([str(x) for x in [1, 2]])
seq in full_list
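One caveat with the string approach, shown as a hedged sketch: joining with a separator alone can match across element boundaries, so [11, 2] appears to contain [1, 2]. Wrapping both strings in the separator avoids that particular case, but it still assumes that no element's str() contains the separator. Both helper names are hypothetical.

```python
def contains_naive(full, sub):
    full_str = " ".join(str(x) for x in full)
    sub_str = " ".join(str(x) for x in sub)
    return sub_str in full_str


def contains_delimited(full, sub, sep=","):
    # Surround every element with the separator so matches align on element boundaries.
    full_str = sep + sep.join(str(x) for x in full) + sep
    sub_str = sep + sep.join(str(x) for x in sub) + sep
    return sub_str in full_str


print(contains_naive([11, 2], [1, 2]))        # True (a false positive)
print(contains_delimited([11, 2], [1, 2]))    # False
print(contains_delimited([1, 2, 3], [1, 2]))  # True
```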

tuple checking in python

I've written a small program:
def check(xrr):
    """ goes through the list and returns True if the list
    does not contain common pairs, IE ([a,b,c],[c,d,e]) = true
    but ([a,b,c],[b,a,c]) = false, note the lists can be longer than 2 tuples"""
    x = xrr[:]
    #sorting the tuples
    sorted(map(sorted,x))
    for i in range(len(x)-1):
        for j in range(len(x)-1):
            if [x[i]] == [x[i+1]] and [x[j]] == [x[j+1]]:
                return False
    return True
But it doesn't seem to work right. This is probably something extremely basic, but after a couple of days of trying on and off, I can't get my head around where the error is.
Thanks in advance
There are so many problems with your code as others have mentioned. I'll try to explain how I would implement this function.
It sounds like what you want to do is actually this: You generate a list of pairs from the input sequences and see if there are any duplicates among the pairs. When you formulate the problem like this it gets much easier to implement.
First we need to generate the pairs. It can be done in many ways, the one you would probably do is:
def pairs( seq ):
    ret = []
    # go to the 2nd last item of seq
    for k in range(len(seq)-1):
        # append a pair
        ret.append((seq[k], seq[k+1]))
    return ret
Now we want to treat (a,b) and (b,a) as the same tuple, so we simply sort the tuples:
def sorted_pairs( seq ):
    ret = []
    for k in range(len(seq)-1):
        x,y = (seq[k], seq[k+1])
        if x <= y:
            ret.append((x,y))
        else:
            ret.append((y,x))
    return ret
Now solving the problem is pretty straightforward. We just need to generate all these tuples and add them to a set. Once we see a pair twice we are done:
def has_common_pairs( *seqs ):
    """ checks if there are any common pairs among any of the seqs """
    # store all the pairs we've seen
    seen = set()
    for seq in seqs:
        # generate pairs for each seq in seqs
        pair_seq = sorted_pairs(seq)
        for pair in pair_seq:
            # have we seen the pair before?
            if pair in seen:
                return True
            seen.add(pair)
    return False
Now the function you were trying to implement is quite simple:
def check(xxr):
    return not has_common_pairs(*xxr)
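To check the pieces together on the examples from the question, here is a compact, self-contained sketch. It builds the adjacent pairs with zip and slicing instead of an index loop, which is a stylistic substitution and not what the answer above uses; the results should be the same for lists and tuples.

```python
def sorted_pairs(seq):
    # Pair each element with its successor, smaller value first.
    return [(x, y) if x <= y else (y, x) for x, y in zip(seq, seq[1:])]


def has_common_pairs(*seqs):
    seen = set()
    for seq in seqs:
        for pair in sorted_pairs(seq):
            if pair in seen:
                return True
            seen.add(pair)
    return False


def check(xrr):
    return not has_common_pairs(*xrr)


print(check([("a", "b", "c"), ("c", "d", "e")]))  # True
print(check([("a", "b", "c"), ("b", "a", "c")]))  # False
```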
PS: You can generalize the sorted_pairs function to work on any kind of iterable, not only those that support indexing. For completeness' sake I'll paste it below, but you don't really need it here and it's harder to understand:
def sorted_pairs( seq ):
    """ yield pairs (fst, snd) generated from seq
    where fst <= snd for all fst, snd"""
    it = iter(seq)
    try:
        fst = next(it)
    except StopIteration:
        return  # empty input: no pairs to yield
    for snd in it:
        if fst <= snd:
            yield fst, snd
        else:
            yield snd, fst
        fst = snd
I would recommend using a set for this:
def check(xrr):
    s = set()
    for t in xrr:
        u = tuple(sorted(t))
        if u in s:
            return False
        s.add(u)
    return True
This way, you don't need to sort the whole list and you stop when the first duplicate is found.
There are several errors in your code. One is that sorted returns a new list, and you just drop the return value. Another one is that you have two nested loops over your data where you would need only one. Here is the code that makes your approach work:
def check(xrr):
    x = sorted(map(sorted,xrr))
    for i in range(len(x)-1):
        if x[i] == x[i+1]:
            return False
    return True
This could be shortened to
def check(xrr):
    x = sorted(map(sorted,xrr))
    return all(a != b for a, b in zip(x[:-1], x[1:]))
But note that the first code I gave will be more efficient.
BTW, a list in Python is [1, 2, 3], while a tuple is (1, 2, 3).
sorted doesn't alter the source, it returns a new list.
def check(xrr):
    xrrs = list(map(sorted, xrr))  # list() so len() and slicing also work on Python 3
    for i in range(len(xrrs)):
        if xrrs[i] in xrrs[i+1:]: return False
    return True
I'm not sure that's what's being asked, but if I understood it correctly, I'd write:
def check(lst):
    return any(not set(seq).issubset(lst[0]) for seq in lst[1:])
print(check([(1, 2, 3), (2, 3, 5)])) # True
print(check([(1, 2, 3), (3, 2, 1)])) # False
Here is a more general solution. Note that it finds duplicates, not 'non-duplicates'; it's better this way than to use not.
def has_duplicates(seq):
    seen = set()
    for item in seq:
        if hasattr(item, '__iter__'):
            item = tuple(sorted(item))
        if item in seen:
            return True
        seen.add(item)
    return False
This is a more general solution for finding duplicates:
def get_duplicates(seq):
    seen = set()
    duplicates = set()
    for item in seq:
        item = tuple(sorted(item))
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates
Also, it is better to find duplicates than 'not duplicates'; it saves a lot of confusion. You're better off using a general and readable solution than single-purpose functions.
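A short usage sketch for get_duplicates with made-up input, to show that it reports each duplicated group once, in its sorted (normalized) form:

```python
def get_duplicates(seq):
    seen = set()
    duplicates = set()
    for item in seq:
        item = tuple(sorted(item))  # normalize so (2, 1) and (1, 2) compare equal
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates


pairs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
print(get_duplicates(pairs) == {(1, 2), (3, 4)})  # True
print(get_duplicates([(1, 2), (3, 4)]))           # set()
```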
