How to get count of non-repeating values in list

I know I can do something like the following to get the number of occurrences of each element in a list:
from collections import Counter
words = ['a', 'b', 'c', 'a']
Counter(words).keys() # equals to list(set(words))
Counter(words).values() # counts the elements' frequency
Outputs:
['a', 'c', 'b']
[2, 1, 1]
But I want to get the count 2, because b and c each occur exactly once in the list.
Is there a concise, Pythonic way to do this, either without Counter or by using the Counter output above?

You could write a small algorithm for that. Here is a one-liner (thanks @d.b):
sum(x for x in Counter(words).values() if x == 1)
Or, in more than one line:
seen = set()
repeated = set()
for word in words:
    if word in seen:
        repeated.add(word)  # second or later occurrence
    seen.add(word)
count = len(seen - repeated)  # values that occurred exactly once
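Since the question also asks for a version without Counter, here is a minimal sketch that tallies with a plain dict and cross-checks the result against the Counter one-liner. The function name count_singletons is made up for illustration.

```python
from collections import Counter


def count_singletons(items):
    """Return how many distinct values occur exactly once in items."""
    tally = {}
    for item in items:
        tally[item] = tally.get(item, 0) + 1
    # Each value with a tally of 1 contributes one to the result.
    return sum(1 for n in tally.values() if n == 1)


words = ['a', 'b', 'c', 'a']
print(count_singletons(words))  # 2 ('b' and 'c')

# Cross-check against the Counter-based one-liner from the answer.
print(sum(x for x in Counter(words).values() if x == 1))  # 2
```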

Related

How can I use enumerate to count backwards?

letters = ['a', 'b', 'c']
Assume this is my list, where for i, letter in enumerate(letters) yields:
0, a
1, b
2, c
How can I instead make it enumerate backwards, as:
2, a
1, b
0, c

This is a great solution and works perfectly:
items = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
for idx, item in enumerate(items, start=-len(items)):
    print(f"reverse index for {item}: {abs(idx)}")
Here is the OUTPUT of the above snippet:
reverse index for a: 7
reverse index for b: 6
reverse index for c: 5
reverse index for d: 4
reverse index for e: 3
reverse index for f: 2
reverse index for g: 1
Here is what is happening in the snippet above:
enumerate's start argument is given a negative value.
enumerate always steps forward by one.
Finally, abs(idx) gives the absolute value, which is never negative.
If you want zero-based reverse indices (so that the last item gets 0), use start=-len(items) + 1 to avoid the off-by-one error.

Try this:
letters = ['a', 'b', 'c']
for i, letter in reversed(list(enumerate(reversed(letters)))):
    print(i, letter)
Output:
2 a
1 b
0 c

Try this:
l = len(letters)
for i, letter in enumerate(letters):
    print(l - i - 1, letter)

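I would reverse the list first and then use enumerate(). Note that this visits the items in reverse order (0 c, 1 b, 2 a) and that reverse() modifies the list in place:
letters = ['a', 'b', 'c']
letters.reverse()
for i, letter in enumerate(letters):
    print(i, letter)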

The zip function pairs up the elements of two iterables. In Python 3 it returns an iterator, so wrap it in list() to see the pairs.
list(zip([i for i in range(len(letters))][::-1], letters))

letters = ['a', 'b', 'c']
for i, letter in zip(range(len(letters)-1, -1, -1), letters):
    print(i, letter)
prints
2 a
1 b
0 c
Taken from answer in a similar question: Traverse a list in reverse order in Python

tl;dr: size - index - 1
I'll assume the question you are asking is whether or not you can have the index be reversed while the item is the same, for example, the a has the ordering number of 2 when it actually has an index of 0.
To calculate this, consider that each element in your array or list wants to have the index of the item with the same "distance" (index wise) from the end of the collection. Calculating this gives you size - index.
However, many programming languages start arrays with an index of 0. Due to this, we would need to subtract 1 in order to make the indices correspond properly. Consider our last element, with an index of size - 1. In our original equation, we would get size - (size - 1), which is equal to size - size + 1, which is equal to 1. Therefore, we need to subtract 1.
Final equation (for each element): size - index - 1
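As a quick check of the formula, here is a small sketch applying size - index - 1 to the list from the question. The variable names are only for illustration.

```python
letters = ['a', 'b', 'c']
size = len(letters)

# Pair each item with its distance from the end: size - index - 1.
pairs = [(size - index - 1, letter) for index, letter in enumerate(letters)]

for reverse_index, letter in pairs:
    print(reverse_index, letter)
# 2 a
# 1 b
# 0 c
```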

We can define a utility function (Python 3.3+):
from itertools import count
def enumerate_ext(iterable, start=0, step=1):
    indices = count(start, step)
    yield from zip(indices, iterable)
and use it directly like
letters = ['a', 'b', 'c']
for index, letter in enumerate_ext(letters,
                                   start=len(letters) - 1,
                                   step=-1):
    print(index, letter)
or write a helper
def reverse_enumerate(sequence):
    yield from enumerate_ext(sequence,
                             start=len(sequence) - 1,
                             step=-1)
and use it like
for index, letter in reverse_enumerate(letters):
    print(index, letter)

How to standardize the format of element in the list from big data

I am trying to count the unique values in the following list without using collections:
('TOILET','TOILETS','AIR CONDITIONING','AIR-CONDITIONINGS','AIR-CONDITIONING')
The output which I require is :
('TOILET':2,'AIR CONDITIONiNGS':3)
My code currently is:
number = {}
for i in Data:
    if i in number:
        number[i] += 1
    else:
        number[i] = 1
print(number)
Is it possible to get that output?

Using difflib.get_close_matches to help determine uniqueness
import difflib
a = ('TOILET','TOILETS','AIR CONDITIONING','AIR-CONDITIONINGS','AIR-CONDITIONING')
d = {}
for word in a:
    similar = difflib.get_close_matches(word, d.keys(), cutoff=0.6, n=1)
    # print(similar)
    if similar:
        d[similar[0]] += 1
    else:
        d[word] = 1
The actual keys in the dictionary will depend on the order of the words in the list.
difflib.get_close_matches uses difflib.SequenceMatcher to calculate the closeness (ratio) of the word against all possibilities even if the first possibility is close - then sorts by the ratio. This has the advantage of finding the closest key that has a ratio greater than the cutoff. But as the dictionary grows the searches will take longer.
If needed, you might be able to optimize a little by sorting the list first, so that similar words appear in sequence, and stopping at the first key whose ratio exceeds an appropriately large cutoff:
import difflib, collections
z = collections.OrderedDict()
a = sorted(a)
cutoff = 0.6
for word in a:
    for key in z.keys():
        if difflib.SequenceMatcher(None, word, key).ratio() > cutoff:
            z[key] += 1
            break
    else:
        z[word] = 1
Results:
>>> d
{'TOILET': 2, 'AIR CONDITIONING': 3}
>>> z
OrderedDict([('AIR CONDITIONING', 3), ('TOILET', 2)])
>>>
I imagine there are python packages that do this sort of thing and may be optimized.
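To get a feel for why a cutoff of 0.6 groups these particular words, one can print the ratios directly. This is only a sketch: similarity is a hypothetical helper name, and the right cutoff for real data is something you would have to tune.

```python
import difflib


def similarity(x, y):
    """Similarity in [0, 1] as computed by difflib.SequenceMatcher."""
    return difflib.SequenceMatcher(None, x, y).ratio()


known = ['TOILET', 'AIR CONDITIONING']
for word in ('TOILETS', 'AIR-CONDITIONINGS'):
    # Best match among the known keys, or an empty list if none passes the cutoff.
    best = difflib.get_close_matches(word, known, n=1, cutoff=0.6)
    ratios = [round(similarity(word, key), 2) for key in known]
    print(word, '->', best, ratios)
```

Words that differ only by a trailing S or a hyphen score well above 0.6 against their own group and far below it against the other one, which is why the grouping works on this sample.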

I don't believe the python list has an easy built-in way to do what you are asking. It does, however, have a count method that can tell you how many of a specific element there are in a list. Example:
some_list = ['a', 'a', 'b', 'c']
some_list.count('a') #=> 2
Usually the way to get what you want is to build a dictionary of counts, taking advantage of the dict.get(key, default) method:
some_list = ['a', 'a', 'b', 'c']
counts = {}
for el in some_list:
    counts[el] = counts.get(el, 0) + 1
counts #=> {'a': 2, 'b': 1, 'c': 1}

You can try this:
import re
data = ('TOILETS','TOILETS','AIR CONDITIONING','AIR-CONDITIONINGS','AIR-CONDITIONING')
new_data = [re.sub(r"\W+", ' ', i) for i in data]
print(new_data)
final_data = {}
for i in new_data:
    # reuse an existing key when it is a prefix of the current item
    s = [b for b in final_data if i.startswith(b)]
    if s:
        final_data[s[0]] += 1
    else:
        final_data[i] = 1
print(final_data)
Output:
{'TOILETS': 2, 'AIR CONDITIONING': 3}

original = ('TOILETS', 'TOILETS', 'AIR CONDITIONING',
'AIR-CONDITIONINGS', 'AIR-CONDITIONING')
a_set = set(original)
result_dict = {element: original.count(element) for element in a_set}
First, making a set from the original list (or tuple) gives you each distinct value exactly once.
Then you build a dictionary whose keys come from that set and whose values are the number of occurrences in the original list (or tuple), obtained with the count() method.

a = ['TOILETS', 'TOILETS', 'AIR CONDITIONING', 'AIR-CONDITIONINGS', 'AIR-CONDITIONING']
b = {}
for i in a:
    b.setdefault(i, 0)
    b[i] += 1
You can use this code but, as Jon Clements said, TOILET and TOILETS are not the same string, so you have to normalize them first.

sort and swap in python same time

Enter first string:abbc
Enter second string:aAaabbccCCcccddeeffffggzba
Letters Found a:5
Letters Found b:3
Letters Found c:7
final [5, 3, 7]
emptylist ['a', 'b', 'c']
the 2nd one
================== RESTART: /Users/Handy/Documents/task3.py ==================
Enter first string:badcefdke
Enter second string:dkenzmdnfer
Letters Found d:2
Letters Found e:2
Letters Found f:1
Letters Found k:1
final [2, 2, 1, 1]
emptylist ['d', 'e', 'f', 'k']
I need help so that, in the first one, I can print the highest number and its letter in the correct order. If I sort final from highest to lowest, I don't want to sort emptylist independently; I want to swap its elements to match. But how do I swap them without using a function? Thanks.

This tallies the characters in a dictionary and then builds a list of [character, count] pairs in descending order of count, by repeatedly taking max() and removing that key with dict.pop(key, None):
first_str = input('Enter first string >').strip()
second_str = input('Enter second string >').strip()
first_str = first_str.lower()
first_str = set(first_str)
second_str = second_str.lower()
counts = {}
for character in first_str:
    counts[character] = second_str.count(character)
counts_list = []
popdict = dict(counts)  # working copy that shrinks as maxima are removed
for k in counts:
    key = max(popdict, key=popdict.get)
    counts_list.append([key, popdict[key]])
    popdict.pop(key, None)
print(counts_list)
for sublist in counts_list:
    print(sublist[0] + ' = ' + str(sublist[1]))
prints:
[['c', 7], ['a', 5], ['b', 3]]
c = 7
a = 5
b = 3
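If built-ins are acceptable, another way to keep the two lists in step is to pair them with zip, sort the pairs, and split them again. This is a sketch using the final and emptylist values from the question's first run, not the approach of the answer above; the names final_sorted and letters_sorted are made up here.

```python
final = [5, 3, 7]
emptylist = ['a', 'b', 'c']

# Pair each count with its letter so they move together when sorted.
pairs = sorted(zip(final, emptylist), reverse=True)

# Split the pairs back into two parallel lists, now in matching order.
final_sorted = [count for count, letter in pairs]
letters_sorted = [letter for count, letter in pairs]

print(final_sorted)    # [7, 5, 3]
print(letters_sorted)  # ['c', 'a', 'b']
```

Note that with equal counts the tuples are compared on the letter as well, so ties come out in reverse alphabetical order.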

Borda’s positional ranking

I have three lists of elements sorted by descending score. I need to use Borda’s positional ranking to combine the ranked lists using the ordinal ranks of the elements in each list. Given lists t1, t2, t3 ... tk, for each candidate c and list ti, the score B_ti(c) is the number of candidates ranked below c in ti.
So the total Borda score is B(c) = ∑ B_ti(c)
The candidates are then sorted by descending Borda score.
I tried the following, but it does not give the output needed:
for i in list1, list2, list3:
    borda = (((len(list1)-1) - list1.index(i)) + ((len(list2)-1) - list2.index(i)) + ((len(list3)-1) - list3.index(i)))
    print(borda)
Can someone help me to implement the above function?

Calling index(i) takes time proportional to the list size, and because you have to call it for every element, the whole thing takes O(N^2) time, where N is the list size. It is much better to iterate over one list at a time, where you already know the index, and add that part of the score to an accumulator in a dict.
def borda_sort(lists):
    scores = {}
    for l in lists:
        for idx, elem in enumerate(reversed(l)):
            if elem not in scores:
                scores[elem] = 0
            scores[elem] += idx
    return sorted(scores.keys(), key=lambda elem: scores[elem], reverse=True)
lists = [ ['a', 'c'], ['b', 'd', 'a'], ['b', 'a', 'c', 'd'] ]
print(borda_sort(lists))
# ['b', 'a', 'c', 'd']
The only tricky part here is scanning each list in reverse: the index then equals the number of candidates ranked below the element, and an element that is missing from a list simply gets nothing added for that list.
Compare with the other suggestion here:
import itertools
import random
def borda_simple_sort(lists):
    candidates = set(itertools.chain(*lists))
    # returns scores - a bit more work needed to return a list
    return sorted([sum([len(l) - l.index(c) - 1 for l in lists if c in l], 0) for c in candidates], reverse=True)
# make 10 random lists of size 10000
lists = [ random.sample(range(10000), 10000) for x in range(10) ]
%timeit borda_sort(lists)
10 loops, best of 3: 40.9 ms per loop
%timeit borda_simple_sort(lists)
1 loops, best of 3: 30.8 s per loop
That's not a typo :) 40 milliseconds vs 30 seconds, roughly a 750x speedup. The fast algorithm is not significantly harder to read in this case, and may even be easier; it just relies on an appropriate auxiliary data structure and on going through the data in the right order.

This could work:
sorted([sum([len(l) - l.index(c) - 1 for l in [list1, list2, list3] if c in l], 0) for c in [candidate1, candidate2, candidate3]], reverse=True)
Note that since scores are reordered you'll lose track of which candidate each score belongs to:
>>> list1 = ['a', 'c']
>>> list2 = ['b', 'd', 'a']
>>> list3 = ['b', 'a', 'c', 'd']
>>> candidates = ['a', 'b', 'c', 'd']
>>> sorted([sum([len(l) - l.index(c) - 1 for l in [list1, list2, list3] if c in l], 0) for c in candidates], reverse=True)
[5, 3, 1, 1]
In this case the first element of the list (the winner) is 'b', the second element in the list of candidates.
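One way to avoid losing track of which score belongs to which candidate is to sort (candidate, score) pairs instead of bare scores. The following is a sketch under that assumption; borda_scores is a hypothetical name, and the per-list score is still the number of candidates ranked below.

```python
def borda_scores(lists):
    """Return (candidate, score) pairs sorted by descending Borda score."""
    scores = {}
    for ranking in lists:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Candidates ranked below this one in the current list.
            scores[candidate] = scores.get(candidate, 0) + (n - position - 1)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


list1 = ['a', 'c']
list2 = ['b', 'd', 'a']
list3 = ['b', 'a', 'c', 'd']
ranked = borda_scores([list1, list2, list3])
print(ranked[0])  # ('b', 5)
print(ranked[1])  # ('a', 3)
```

Candidates c and d tie with a score of 1 each, so their relative order depends on how ties are broken.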

Difference Between Two Lists with Duplicates in Python

I have two lists that contain many of the same items, including duplicate items. I want to check which items in the first list are not in the second list. For example, I might have one list like this:
l1 = ['a', 'b', 'c', 'b', 'c']
and one list like this:
l2 = ['a', 'b', 'c', 'b']
Comparing these two lists I would want to return a third list like this:
l3 = ['c']
I am currently using some terrible code, shown below, that I wrote a while ago and that I'm fairly certain doesn't even work properly.
def list_difference(l1,l2):
    for i in range(0, len(l1)):
        for j in range(0, len(l2)):
            if l1[i] == l1[j]:
                l1[i] = 'damn'
                l2[j] = 'damn'
    l3 = []
    for item in l1:
        if item!='damn':
            l3.append(item)
    return l3
How can I better accomplish this task?

You didn't specify whether the order matters. If it does not, you can do this in Python 2.7 or later:
l1 = ['a', 'b', 'c', 'b', 'c']
l2 = ['a', 'b', 'c', 'b']
from collections import Counter
c1 = Counter(l1)
c2 = Counter(l2)
diff = c1-c2
print(list(diff.elements()))

Create Counters for both lists, then subtract one from the other.
from collections import Counter
a = [1,2,3,1,2]
b = [1,2,3,1]
c = Counter(a)
c.subtract(Counter(b))
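The snippet above stops right after subtract(). Here is a sketch of how the result might be read back, and how it differs from the - operator; the names leftover and diff are just for illustration.

```python
from collections import Counter

a = [1, 2, 3, 1, 2]
b = [1, 2, 3, 1]

c = Counter(a)
c.subtract(Counter(b))         # in place; zero and negative counts are kept
print(c[1], c[2], c[3])        # 0 1 0

leftover = list(c.elements())  # elements() skips counts below 1
print(leftover)                # [2]

diff = Counter(a) - Counter(b)  # the - operator drops non-positive counts
print(diff)                     # Counter({2: 1})
```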

To take into account both duplicates and the order of elements:
from collections import Counter
def list_difference(a, b):
    count = Counter(a)  # count items in a
    count.subtract(b)  # subtract items that are in b
    diff = []
    for x in a:
        if count[x] > 0:
            count[x] -= 1
            diff.append(x)
    return diff
Example
print(list_difference("z y z x v x y x u".split(), "x y z w z".split()))
# -> ['y', 'x', 'v', 'x', 'u']
Python 2.5 version:
from collections import defaultdict
def list_difference25(a, b):
    # count items in a
    count = defaultdict(int)  # item -> number of occurrences
    for x in a:
        count[x] += 1
    # subtract items that are in b
    for x in b:
        count[x] -= 1
    diff = []
    for x in a:
        if count[x] > 0:
            count[x] -= 1
            diff.append(x)
    return diff

Counters are new in Python 2.7.
For a general solution that subtracts a from b:
def list_difference(b, a):
    c = list(b)
    for item in a:
        try:
            c.remove(item)
        except ValueError:
            pass  # or maybe you want to keep a's values here
    return c

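You can try this. It removes each item of l2 from l1 in place, and raises ValueError if l2 contains an item that is not in l1:
list(filter(lambda x: l1.remove(x), l2))
print(l1)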

Try this one:
from collections import Counter
from typing import Sequence
def duplicates_difference(a: Sequence, b: Sequence) -> Counter:
    """
    >>> duplicates_difference([1,2],[1,2,2,3])
    Counter({2: 1, 3: 1})
    """
    shorter, longer = sorted([a, b], key=len)
    return Counter(longer) - Counter(shorter)
