Is it possible to count how many times a value appears in a given list, and get 0 if the item is not in the list?
I have to use zip, but the first list has 5 items and the other one, created using count, has only 3. That's why I need to fill the other two positions with 0 values.
You can achieve this with itertools.zip_longest.
With zip_longest you can zip two lists of different lengths; by default the missing values are filled with None, but you can specify a suitable fill value, as I have done below.
from itertools import zip_longest
a = ['a','b','c','d','e']
b = [1,4,3]
final_lst = list(zip_longest(a,b, fillvalue=0))
final_dict = dict(list(zip_longest(a,b, fillvalue=0))) #you may convert answer to dictionary if you wish
Alternatively:
If what you are trying to do is count the number of times the items in a reference list appear in another list (also recording reference items that don't appear in the other list), you can use a dictionary comprehension:
ref_list = ['a','b','c','d','e']#reference list
other_list = ['a','b','b','d','a','d','a','a','a']
count_dict = {n:other_list.count(n) for n in ref_list}
print (count_dict)
Output
{'a': 5, 'b': 2, 'c': 0, 'd': 2, 'e': 0}
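For longer lists, note that other_list.count(n) rescans other_list once per reference item. A minimal sketch of the same result with a single counting pass, assuming collections.Counter is acceptable (the list names are reused from the example; tally is an illustrative name):

```python
from collections import Counter

ref_list = ['a', 'b', 'c', 'd', 'e']  # reference list
other_list = ['a', 'b', 'b', 'd', 'a', 'd', 'a', 'a', 'a']

# Count every item of other_list in a single pass.
tally = Counter(other_list)

# Indexing a Counter with a key it has never seen returns 0,
# so reference items that are missing need no special case.
count_dict = {n: tally[n] for n in ref_list}
print(count_dict)
```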
Use collections.Counter, and then call get with a default value of 0 to see how many times any given element appears (a Counter also returns 0 for a missing key when indexed directly, e.g. counts[5]):
>>> from collections import Counter
>>> counts = Counter([1, 2, 3, 1])
>>> counts.get(1, 0)
2
>>> counts.get(2, 0)
1
>>> counts.get(5, 0)
0
If you want to count how many times a value appears in a list, you could do this:
def count_in_list(list_, value):
    count = 0
    for e in list_:
        if e == value:
            count += 1
    return count
And use the code like this:
MyList=[1,3,1,1,1,1,1,2]
count_in_list(MyList,1)
Output:
6
This will work without any additional things such as imports.
I have two lists like the following:
a=['not','not','not','not']
b=['not','not']
and I have to find the length of the list containing the intersection of the two lists above, so that the result is:
intersection=['not','not']
len(intersection)
2
Now the problem is that I have tried filter(lambda x: x in a, b) and filter(lambda x: x in b, a), but when one of the two lists is longer than the other I do not get an intersection, just a membership check. In the example above, since all the members of a are in b, I get 4 common elements; what I want instead is the intersection, which has length 2.
Using set().intersection(set()) would instead create a set, which is not what I want since all the elements are the same.
Can you suggest a good, compact solution to the problem?
If you don't mind using collections.Counter, then you could have a solution like
>>> import collections
>>> a=['not','not','not','not']
>>> b=['not','not']
>>> c1 = collections.Counter(a)
>>> c2 = collections.Counter(b)
and then index by 'not'
>>> c1['not'] + c2['not']
6
For the intersection, you need to
>>> (c1 & c2) ['not']
2
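If the intersection is needed as a list rather than as a single count, a sketch along the same lines, assuming a Python version where Counter supports & and elements() (2.7 or later); common and intersection are illustrative names:

```python
from collections import Counter

a = ['not', 'not', 'not', 'not']
b = ['not', 'not']

# & keeps each element with the minimum of its two counts,
# which is a multiset intersection.
common = Counter(a) & Counter(b)

# elements() expands the counts back into individual items.
intersection = list(common.elements())
print(intersection)
print(len(intersection))
```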
I don't see any particularly compact way to compute this. Let's just go for a solution first.
The intersection is some sublist of the shorter list (e.g. b). Now, for better performance when the shorter list is not extremely short, make the longer list a set (e.g. set(a)). The intersection can then be expressed as a list comprehension of those items in the shorter list which are also in the longer set:
def common_elements(a, b):
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    longer = set(longer)
    intersection = [item for item in shorter if item in longer]
    return intersection
a = ['not','not','not','not']
b = ['not','not']
print(common_elements(a,b))
Have you considered the following approach?
a = ['not','not','not','not']
b = ['not','not']
min(len(a), len(b))
# 2
Since all the elements are the same, the number of common elements is just the minimum of the lengths of both lists.
Do it with sets. First convert the lists to sets and take their intersection. Since the original lists may contain repetitions, for each element of the intersection take the minimum number of repetitions in a and b.
>>> a=['not','not','not','not']
>>> b=['not','not']
>>> def myIntersection(A,B):
...     setIn = set(A).intersection(set(B))
...     rt = []
...     for i in setIn:
...         for j in range(min(A.count(i),B.count(i))):
...             rt.append(i)
...     return rt
...
>>> myIntersection(a,b)
['not', 'not']
First question here, so I will get right to it:
Using Python 2.7.
I have a dictionary of items; the keys are x,y coordinates represented as tuples (x, y), and all the values are Boolean.
I am trying to figure out a quick and clean method of getting a count of how many items have a given value. I do NOT need to know which keys have the given value, just how many.
There is a similar post here:
How many items in a dictionary share the same value in Python. However, I do not need a dictionary returned, just an integer.
My first thought is to iterate over the items and test each one while keeping a count of the True values. I am just wondering, since I am still new to Python and don't know all the libraries, whether there is a better/faster/simpler way to do this.
Thanks in advance.
This first part is mostly for fun -- I probably wouldn't use it in my code.
sum(d.values())
will get the number of True values. (Of course, you can get the number of False values by len(d) - sum(d.values())).
Slightly more generally, you can do something like:
sum(1 for x in d.values() if some_condition(x))
In this case, if x works just fine in place of if some_condition(x) and is what most people would use in real-world code.
OF THE THREE SOLUTIONS I HAVE POSTED HERE, THE ABOVE IS THE MOST IDIOMATIC AND IS THE ONE I WOULD RECOMMEND
Finally, I suppose this could be written a little more cleverly:
sum( x == chosen_value for x in d.values() )
This is in the same vein as my first (fun) solution as it relies on the fact that True + True == 2. Clever isn't always better. I think most people would consider this version to be a little more obscure than the one above (and therefore worse).
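To make the three variants concrete, here is a small sketch on made-up data. The dictionary d and the result names are hypothetical and only stand in for the coordinate dictionary from the question:

```python
# Hypothetical sample data: (x, y) tuples mapped to booleans.
d = {(0, 0): True, (0, 1): False, (1, 0): True, (1, 1): True}

# Variant 1: True counts as 1 and False as 0 when summed.
true_count = sum(d.values())
false_count = len(d) - true_count

# Variant 2: add 1 for every value that passes the test.
true_count_explicit = sum(1 for x in d.values() if x)

# Variant 3: sum the results of the comparison itself.
chosen_value = False
chosen_count = sum(x == chosen_value for x in d.values())

print(true_count, false_count, true_count_explicit, chosen_count)
```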
If you want a data structure that you can quickly access to check the counts, you could try using a Counter (as @mgilson points out, this relies on the values themselves being hashable):
>>> from collections import Counter
>>> d = {(1, 2): 2, (3, 1): 2, (4, 4): 1, (5, 6): 4}
>>> Counter(d.values())
Counter({2: 2, 1: 1, 4: 1})
You could then plug in a value and get the number of times it appeared:
>>> c = Counter(d.values())
>>> c[2]
2
>>> c[4]
1
Update:
Hello again. My question is: how can I compare the values of a dictionary for equality? More information about my dictionary:
the keys are session numbers
the values of each key are nested lists, e.g.
[[1,0],[2,0],[3,1]]
the lengths of the values differ from key to key, so session number 1 may have more values than session number 2
Here is an example dictionary:
order_session =
{1:[[100,0],[22,1],[23,2]],10:[[100,0],[232,0],[10,2],[11,2]],22:[[5,2],[23,2],....],
... }
My Goal:
Step 1: compare the values of session number 1 with the values of all the other session numbers in the dictionary for equality
Step 2: take the next session number and compare its values with the values of the other session numbers, and so on
- finally, every session number's values have been compared
Step 3: save the result in a list, e.g.
output = [[100,0],[23,2], ... ] or output = [(100,0),(23,2), ... ]
As you can see, the value pair [100,0] appears in both session 1 and session 10. The value pair [23,2] likewise appears in both session 1 and session 22.
Thanks for helping me out.
Update 2
Thank you for all your help and for the tip to change the nested list of lists into a list of tuples, which is much easier to handle.
I prefer Boaz Yaniv's solution ;)
I also like the use of collections.Counter(). Unfortunately I use 2.6.4 (Counter requires 2.7); maybe I will switch to 2.7 at some point.
If your dictionary is long, you'd want to use sets, for better performance (looking up already-encountered values in lists is going to be quite slow):
def get_repeated_values(sessions):
    known = set()
    already_repeated = set()
    for lst in sessions.itervalues():
        session_set = set(tuple(x) for x in lst)
        repeated = (known & session_set) - already_repeated
        already_repeated |= repeated
        known |= session_set
        for val in repeated:
            yield val
sessions = {1:[[100,0],[22,1],[23,2]],10:[[100,0],[232,0],[10,2],[11,2]],22:[[5,2],[23,2]]}
for x in get_repeated_values(sessions):
    print x
I also suggest (again, for performance reasons) to nest tuples inside your lists instead of lists, if you're not going to change them on-the-fly. The code I posted here will work either way, but it would be faster if the values are already tuples.
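The code above targets Python 2 (itervalues and the print statement). For readers on Python 3, a possible port of the same set-based idea is sketched below; it is an adaptation, not the original answer's code, and the result name is illustrative:

```python
def get_repeated_values(sessions):
    known = set()             # every pair seen in earlier sessions
    already_repeated = set()  # pairs that have already been yielded
    for lst in sessions.values():  # values() replaces itervalues()
        session_set = set(tuple(x) for x in lst)
        # Pairs seen before, minus those already reported.
        repeated = (known & session_set) - already_repeated
        already_repeated |= repeated
        known |= session_set
        yield from repeated

sessions = {
    1: [[100, 0], [22, 1], [23, 2]],
    10: [[100, 0], [232, 0], [10, 2], [11, 2]],
    22: [[5, 2], [23, 2]],
}
result = list(get_repeated_values(sessions))
print(result)
```

Each repeated pair is yielded once, as a tuple, the first time it shows up in a second session.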
There's probably a nicer and more optimal way to do this, but I'd work my way from here:
seen = []
output = []
for val in order_session.values():
    for vp in val:
        if vp in seen:
            if not vp in output:
                output.append(vp)
        else:
            seen.append(vp)
print(output)
Basically, what this does is to look through all the values, and if the value has been seen before, but not output before, it is appended to the output.
Note that this works with the actual values of the value pairs: the in test on a list compares elements with ==, so two separate lists with the same contents, such as [100, 0] and [100, 0], count as the same value pair. Whether Python happens to reuse the same object for small integers (as in a = 5 and b = 5) does not affect the result, because the comparison is by value rather than by identity.
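A small check of how in behaves on lists of value pairs: it compares elements with ==, so the identity of the underlying objects does not matter. All names here are illustrative only:

```python
first = [100, 0]
second = [100, 0]  # a separate list object with the same contents

print(first is second)    # identity: these are two distinct objects
print(first == second)    # equality: the contents are the same
print(second in [first])  # list membership compares with ==

# The same holds for larger integers built in different ways.
big_a = 10 ** 5
big_b = int('100000')
print([big_b, 2] in [[big_a, 2]])
```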
>>> from collections import Counter
>>> D = {1:[[100,0],[22,1],[23,2]],
... 10:[[100,0],[232,0],[10,2],[11,2]],
... 22:[[5,2],[23,2]]}
>>> [k for k,v in Counter(tuple(j) for i in D.values() for j in i).items() if v>1]
[(23, 2), (100, 0)]
If you really really need a list of lists
>>> [list(k) for k,v in Counter(tuple(j) for i in D.values() for j in i).items() if v>1]
[[23, 2], [100, 0]]
order_session = {1:[[100,0],[22,1],[23,2]],10:[[100,0],[232,0],[10,2],[11,2]],22:[[5,2],[23,2],[80,21]],}
output = []
for pair in sum(order_session.values(), []):
    if sum(order_session.values(), []).count(pair) > 1 and pair not in output:
        output.append(pair)
print output
...
[[100, 0], [23, 2]]
What is the easiest way to sort a list of strings with digits at the end, where some have 3 digits and some have 4?
>>> list = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> list.sort()
>>> print list
['asdf111', 'asdf123', 'asdf1234', 'asdf124']
It should put the 1234 one at the end. Is there an easy way to do this?
is there an easy way to do this?
Yes
You can use the natsort module.
>>> from natsort import natsorted
>>> natsorted(['asdf123', 'asdf1234', 'asdf111', 'asdf124'])
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
Full disclosure, I am the package's author.
is there an easy way to do this?
No
It's perfectly unclear what the real rules are. The "some have 3 digits and some have 4" isn't really a very precise or complete specification. All your examples show 4 letters in front of the digits. Is this always true?
import re
key_pat = re.compile(r"^(\D+)(\d+)$")
def key(item):
    m = key_pat.match(item)
    return m.group(1), int(m.group(2))
That key function might do what you want. Or it might be too complex. Or maybe the pattern is really r"^(.*)(\d{3,4})$" or maybe the rules are even more obscure.
>>> data= ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> data.sort( key=key )
>>> data
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
What you're probably describing is called a Natural Sort, or a Human Sort. If you're using Python, you can borrow from Ned's implementation.
The algorithm for a natural sort is approximately as follows:
Split each value into alphabetical "chunks" and numerical "chunks"
Sort by the first chunk of each value
If the chunk is alphabetical, sort it as usual
If the chunk is numerical, sort by the numerical value represented
Take the values that have the same first chunk and sort them by the second chunk
And so on
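The steps above can be sketched as a key function. This is a minimal illustration, not Ned's implementation; natural_key is a hypothetical name, and it assumes the numeric chunks are plain runs of decimal digits:

```python
import re

def natural_key(text):
    # Splitting on a captured group keeps the digit runs in the result,
    # so chunks alternate: text, digits, text, digits, ...
    chunks = re.split(r'(\d+)', text)
    # Odd positions are always digit runs; compare those as integers.
    return [int(c) if i % 2 else c for i, c in enumerate(chunks)]

data = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
result = sorted(data, key=natural_key)
print(result)
```

Because every key starts with a (possibly empty) text chunk and alternates from there, strings are only ever compared with strings and integers with integers.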
l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
l.sort(cmp=lambda x,y:cmp(int(x[4:]), int(y[4:])))
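The cmp argument exists only in Python 2. A sketch of the same ordering with key, which works on both Python 2 and 3, under the same assumption that every string has a 4-character prefix before the digits:

```python
l = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']

# Sort by the integer value of everything after the 4-character prefix.
l.sort(key=lambda s: int(s[4:]))
print(l)
```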
You need a key function. You're willing to specify 3 or 4 digits at the end and I have a feeling that you want them to compare numerically.
sorted(list_, key=lambda s: (s[:-4], int(s[-4:])) if s[-4] in '0123456789' else (s[:-3], int(s[-3:])))
Without the lambda and conditional expression that's
def key(s):
    if s[-4] in '0123456789':
        return (s[:-4], int(s[-4:]))
    else:
        return (s[:-3], int(s[-3:]))
sorted(list_, key=key)
This just takes advantage of the fact that tuples sort by the first element, then the second. So because the key function is called to get a value to compare, the elements will now be compared like the tuples returned by the key function. For example, 'asdfbad123' will compare to 'asd7890' as ('asdfbad', 123) compares to ('asd', 7890). If the last 3 characters of a string aren't in fact digits, you'll get a ValueError which is perfectly appropriate given the fact that you passed it data that doesn't fit the specs it was designed for.
The issue is that the sorting is alphabetical here since they are strings. Each sequence of character is compared before moving to next character.
>>> 'a1234' < 'a124'  # positionally '3' is less than '4'
True
>>>
You will need to do numeric sorting to get the desired output.
>>> x = ['asdf123', 'asdf1234', 'asdf111', 'asdf124']
>>> y = [ int(t[4:]) for t in x]
>>> z = sorted(y)
>>> z
[111, 123, 124, 1234]
>>> l = ['asdf'+str(t) for t in z]
>>> l
['asdf111', 'asdf123', 'asdf124', 'asdf1234']
>>>
L.sort(key=lambda s:int(''.join(filter(str.isdigit,s[-4:]))))
Rather than splitting each line myself, I ask Python to do it for me with re.findall():
import re
import sys
def SortKey(line):
    result = []
    for part in re.findall(r'\D+|\d+', line):
        try:
            result.append(int(part, 10))
        except (TypeError, ValueError) as _:
            result.append(part)
    return result
print ''.join(sorted(sys.stdin.readlines(), key=SortKey)),