Build histogram - python

I'm trying to build a histogram in Python. I am starting with the following snippet:
def histogram(L):
    d = {}
    for x in L:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
    return d
I understand that it uses a dictionary to solve the problem.
But I'm confused about the 4th line: if x in d:
d is still being constructed and there's nothing in it yet, so how can x be in d?

Keep in mind that the if is inside a for loop.
So, when you're looking at the very first item in L there is nothing in d, but when you get to the next item in L, there is something in d, so you need to check whether to make a new bin on the histogram (d[x] = 1), or add the item to an existing bin (d[x] += 1).
In Python, we actually have some shortcuts for this:
from collections import defaultdict
def histogram(L):
    d = defaultdict(int)
    for x in L:
        d[x] += 1
    return d
This automatically starts each bin in d at zero (what int() returns) so you don't have to check if the bin exists. On Python 2.7 or higher:
from collections import Counter
d = Counter(L)
This automatically builds a mapping of the frequencies of each item in L. No other code is required.

The code inside of the for loop will be executed once for each element in L, with x being the value of the current element.
Let's look at the simple case where L is the list [3, 3]. The first time through the loop d will be empty, x will be 3, and 3 in d will be false, so d[3] will be set to 1. The next time through the loop x will be 3 again, and 3 in d will be true, so d[3] will be incremented by 1.
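To watch this happen, here is a minimal sketch that runs the same loop and prints the state of d before each membership test. The name histogram_trace and the print call are additions for illustration and are not part of the original snippet.

```python
def histogram_trace(L):
    """Same loop as the question's histogram(), printing d at every step."""
    d = {}
    for x in L:
        # On the very first pass d is still empty, so `x in d` is False.
        print("x =", x, "| d before:", d, "| x in d:", x in d)
        if x in d:
            d[x] += 1   # existing bin: add one to it
        else:
            d[x] = 1    # new bin: start it at one
    return d

result = histogram_trace([3, 3])
```

The first printed line shows an empty d with the test False, the second shows d already holding 3 with the test True.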

You can create a histogram with a dict comprehension:
histogram = {key: L.count(key) for key in set(L)}

I think the other answers have explained why if x in d is needed. But here is a hint at how this code could be written following "don't ask permission, ask forgiveness":
...
    try:
        d[x] += 1
    except KeyError:
        d[x] = 1
The reason for this is that you expect the error to be raised only once per distinct value, the first time that value is seen. Thus, there is no need to check if x in d on every iteration.
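As a complete function, the try/except fragment might look like the sketch below. The name histogram_eafp is hypothetical, and the surrounding loop is assumed to be the same one used in the question.

```python
def histogram_eafp(L):
    """Count items by attempting the increment and handling the miss."""
    d = {}
    for x in L:
        try:
            d[x] += 1        # works whenever x already has a bin
        except KeyError:
            d[x] = 1         # first sighting of x: create its bin
    return d

counts = histogram_eafp(["a", "b", "a", "c", "a"])
```

Here KeyError is raised three times, once each for "a", "b" and "c", while the other two increments succeed directly.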

You can use a Counter, available from Python 2.7 and Python 3.1+.
>>> # init empty counter
>>> from collections import Counter
>>> c = Counter()
>>> # add a single sample to the histogram
>>> c.update([4])
>>> # add several samples at once
>>> c.update([4, 2, 2, 5])
>>> # print content
>>> print(c)
Counter({4: 2, 2: 2, 5: 1})
The Counter class supports several useful operations, such as addition, subtraction, intersection and union of counters. A Counter can count anything that can be used as a dictionary key.
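A short sketch of those four operations, using two small sample lists chosen only for illustration:

```python
from collections import Counter

a = Counter([4, 4, 2])       # 4 appears twice, 2 once
b = Counter([4, 2, 2, 5])    # 4 once, 2 twice, 5 once

total = a + b        # addition: counts are summed per key
difference = a - b   # subtraction: only positive results are kept
common = a & b       # intersection: minimum of the two counts
either = a | b       # union: maximum of the two counts

print(total, difference, common, either)
```

Subtraction drops keys whose count would fall to zero or below, which is why difference holds only the key 4.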

If x isn't in d, it gets put into d with d[x] = 1. If x shows up in L more than once, each later occurrence increases the count stored for x.
Try using this to step through the code: http://people.csail.mit.edu/pgbovine/python/

You can also draw your own histogram in Python using, for example, matplotlib. For one example of how this could be implemented, you can refer to this answer.
In this specific case, assuming the hist3d_bubble function from that answer is defined, you can call it like this:
temperature = [4, 3, 1, 4, 6, 7, 8, 3, 1]
radius = [0, 2, 3, 4, 0, 1, 2, 10, 7]
density = [1, 10, 2, 24, 7, 10, 21, 102, 203]
points, sub = hist3d_bubble(temperature, density, radius, bins=4)
sub.axes.set_xlabel('temperature')
sub.axes.set_ylabel('density')
sub.axes.set_zlabel('radius')

Related

Unique integers and their counts in a list (making a histogram)

If I have a list like [10, 10, 2, 3, 1], how do I determine the unique values and the counts of each unique value? For example, the result should tell me that the unique values are 10, 2, 3, 1 (I'm sure it would be best if sorted) and then that the counts for each respective element are 2, 1, 1, 1.
My current solutions are as follows, though during an interview, the interviewer seemed to indicate these were not what he was looking for:
import numpy as np
arr = [10, 10, 2, 3, 1]
# Using numpy... this is what I would actually use in practice
# could also use `np.histogram` and set the number of bins
# equal to the number of unique elements??
unique, counts = np.unique(arr, return_counts=True)
histogram = zip(unique, counts)
# Should sort the elements of the array here... not sure
# which sorting algorithm to use and I haven't implemented
# one from scratch in a long time anyways so wouldn't know where
# to start
arr = sorting_algorithm(arr)  # placeholder, not implemented
# Using a python dictionary
histogram = dict()
for ele in arr:
    if ele not in histogram:
        histogram[ele] = 1
    else:
        histogram[ele] += 1
I would suggest using Counter from the collections module.
from collections import Counter
arr = [10, 10, 2, 3, 1]
c = Counter(arr)
for k, v in c.items():
    print(k, v)
Output:
10 2
2 1
3 1
1 1
You should use collections.Counter. Writing your own code to do this in an interview is probably not the end of the world, but I'd guess that the interviewer wasn't happy with your answer because numpy isn't necessary for this task.
import collections
arr = [10, 10, 2, 3, 1]
counters = collections.Counter(arr)
print(counters) # Gets frequencies.
print(sorted(counters.items())) # Obtain list of (key, value) pairs sorted by key.

Count elements in a list without using Counter

The function should return a dictionary in the following format:
key_count([1, 3, 2, 1, 5, 3, 5, 1, 4]) ⇒ {
    1: 3,
    2: 1,
    3: 2,
    4: 1,
    5: 2,
}
I know the fastest way to do it is the following:
import collections
def key_count(l):
    return collections.Counter(l)
However, I would like to do it without importing collections.Counter.
So far I have:
x = []
def key_count(l):
    for i in l:
        if i not in x:
            x.append(i)
    count = []
    for i in l:
        if i == i:
I approached the problem by trying to extract the two sides (keys and values) of the dictionary into separate lists and then use zip to create the dictionary. As you can see, I was able to extract the keys of the eventual dictionary but I cannot figure out how to add the number of occurrences for each number from the original list in a new list. I wanted to create an empty list count that will eventually be a list of numbers that denote how many times each number in the original list appeared. Any tips? Would appreciate not giving away the full answer as I am trying to solve this! Thanks in advance
Separating the keys and values is a lot of effort when you could just build the dict directly. Here's the algorithm. I'll leave the implementation up to you, though it sort of implements itself.
Make an empty dict
Iterate through the list
If the element is not in the dict, set the value to 1. Otherwise, add to the existing value.
See the implementation here:
https://stackoverflow.com/a/8041395/4518341
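For reference, the steps above can be sketched in a few lines. This version uses dict.get with a default of 0 to fold the "not in the dict" and "already in the dict" cases into one statement, which is one possible way to write it rather than the only one.

```python
def key_count(items):
    counts = {}                                  # make an empty dict
    for item in items:                           # iterate through the list
        # get() returns 0 for an unseen item, so both cases become "add 1"
        counts[item] = counts.get(item, 0) + 1
    return counts

result = key_count([1, 3, 2, 1, 5, 3, 5, 1, 4])
print(result)
```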
Classic reduce problem. Using a loop:
a = [1, 3, 2, 1, 5, 3, 5, 1, 4]
m = {}
for n in a:
    if n in m: m[n] += 1
    else: m[n] = 1
print(m)
Or explicit reduce:
from functools import reduce
a = [1, 3, 2, 1, 5, 3, 5, 1, 4]
def f(m, n):
    if n in m: m[n] += 1
    else: m[n] = 1
    return m
m2 = reduce(f, a, {})
print(m2)
Use a dictionary to pair keys with values, and use your list x to track the different items found so far.
import collections
def keycount(l):
    return collections.Counter(l)
key_count = [1, 3, 2, 1, 5, 3, 5, 1, 4]
x = []
dictionary = {}
def Collection_count(l):
    for i in l:
        if i not in x:
            x.append(i)
            dictionary[i] = 1
        else:
            dictionary[i] = dictionary[i] + 1
Collection_count(key_count)
for key, value in sorted(dictionary.items()):
    print(key, value)

Python arranging a list to include duplicates

I have a list in Python that is similar to:
x = [1,2,2,3,3,3,4,4]
Is there a way using pandas or some other list comprehension to make the list appear like this, similar to a queue system:
x = [1,2,3,4,2,3,4,3]
It is possible by using cumcount:
import pandas as pd
s=pd.Series(x)
s.index=s.groupby(s).cumcount()
s.sort_index()
Out[11]:
0 1
0 2
0 3
0 4
1 2
1 3
1 4
2 3
dtype: int64
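If you split your list into one separate list for each value (itertools.groupby, which groups adjacent equal values, so x must be sorted), you can then use the roundrobin recipe from the itertools documentation to get this behavior. Note that roundrobin is a recipe you copy into your own code, not a function you can import from itertools, and that each group must be turned into a list before groupby advances to the next one:
from itertools import groupby
x = [1, 2, 2, 3, 3, 3, 4, 4]
list(roundrobin(*(list(g) for _, g in groupby(x))))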
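A self-contained sketch of this approach follows. The roundrobin function is adapted from the recipe in the itertools documentation. Each group is copied into a list first, because groupby's group iterators share the underlying input and are emptied as soon as groupby moves on.

```python
from itertools import cycle, groupby, islice

def roundrobin(*iterables):
    """roundrobin('ABC', 'D', 'EF') yields A D E B F C."""
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next_item in nexts:
                yield next_item()
        except StopIteration:
            # Drop the iterator that just ran out and keep cycling the rest.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

x = [1, 2, 2, 3, 3, 3, 4, 4]
# groupby only groups adjacent equal values, hence sorted(); list(g) copies
# each group before groupby advances.
groups = [list(g) for _, g in groupby(sorted(x))]
interleaved = list(roundrobin(*groups))
print(interleaved)
```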
If I'm understanding you correctly, you want to retain all duplicates, but then have the list arranged in an order where you create what are in essence separate lists of unique values, but they're all concatenated into a single list, in order.
I don't think this is possible in a listcomp, and nothing's occurring to me for getting it done easily/quickly in pandas.
But the straightforward algorithm is:
Create a different list for each set of unique values: for i in x: if i is not in list1, add it to list1; else if it is not in list2, add it to list2; else if it is not in list3, add it to list3; and so on. There's certainly a way to do this with recursion, if it's an unpredictable number of lists.
Evaluate the lists based on their values, to determine the order in which you want to have them listed in the final list. It's unclear from your post exactly what order you want them to be in. Querying by the value in the 0th position could be one way. Evaluating the entire lists as >= each other is another way.
Once you have that set of lists and their orders, it's straightforward to concatenate them in order, in the final list.
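A minimal sketch of that algorithm, using a growing list of lists instead of recursion so the number of lists does not need to be known in advance. The name arrange_in_rounds is hypothetical, and the sketch assumes the lists are concatenated in the order they were created, which is only one of the orderings the steps above allow.

```python
def arrange_in_rounds(values):
    rounds = []                      # rounds[k] holds the (k+1)-th copy of each value
    for v in values:
        for current in rounds:
            if v not in current:     # first list that does not yet contain v
                current.append(v)
                break
        else:
            rounds.append([v])       # every existing list already has v
    # Concatenate the lists in creation order.
    return [v for current in rounds for v in current]

arranged = arrange_in_rounds([1, 2, 2, 3, 3, 3, 4, 4])
print(arranged)
```

Unlike the groupby approach, this does not require the input to be sorted: values keep the order in which they are first seen.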
Essentially, what you want is a pattern: the order in which unique numbers are first found while traversing the list x. For example, if x = [4,3,1,3,5] then the pattern is 4 3 1 5, and this helps us refill x so that the output is [4,3,1,5,3].
from collections import defaultdict
x = [1,2,2,3,3,3,4,4]
counts_dict = defaultdict(int)
for p in x:
    counts_dict[p]+=1
i =0
while i < len(x):
    for p,cnt in counts_dict.items():
        if i < len(x):
            if cnt > 0:
                x[i] = p
                counts_dict[p]-=1
                i+=1
            else:
                continue
        else:
            # we have placed all the 'p'
            break
print(x) # [1, 2, 3, 4, 2, 3, 4, 3]
Note: dicts preserve insertion order in Python 3.6+ (guaranteed by the language from 3.7), and I am assuming that you are using Python 3.6+.
This is what I thought of doing at first, but it fails in some cases:
'''
x = [3,7,7,7,4]
i = 1
while i < len(x):
    if x[i] == x[i-1]:
        x.append(x.pop(i))
        i = max(1,i-1)
    else:
        i+=1
print(x) # [1, 2, 3, 4, 2, 3, 4, 3]
# x = [2,2,3,3,3,4,4]
# output [2, 3, 4, 2, 3, 4, 3]
# x = [3,7,1,7,4]
# output [3, 7, 1, 7, 4]
# x = [3,7,7,7,4]
# output time_out
'''

Cannot iterate through two lists at once python

I am having a problem with my python code, but I am not sure what it is. I am creating a program that creates a table with all possible combinations of four digits provided the digits do not repeat, which I know is successful. Then, I create another table and attempt to add to this secondary table all of the values which use the same numbers in a different order (so I do not have, say, 1234, 4321, 3241, 3214, 1324, 2413, etc. on this table.) However, this does not seem to be working, as the second table has only one value. What have I done wrong? My code is below. Oh, and I know that the one value comes from appending the 1 at the top.
combolisttwo = list()
combolisttwo.append(1)
combolist = {(a, b, c, d) for a in {1, 2, 3, 4, 5, 6, 7, 8, 9, 0} for b in {1, 2, 3, 4, 5, 6, 7, 8, 9, 0} for c in {1, 2, 3, 4, 5, 6, 7, 8, 9, 0} for d in {1, 2, 3, 4, 5, 6, 7, 8, 9, 0} if a != b and a != c and a != d and b != c and b != d and c!=d}
for i in combolist:
    x = 0
    letternums = str(i)
    letters = list(letternums)
    for g in letters:
        n = 0
        hits = 0
        nonhits = 0
        letterstwo = str(combolisttwo[n])
        if g == letterstwo[n]:
            hits = hits + 1
        if g != letterstwo[n]:
            nonhits = nonhits + 1
        if hits == 4:
            break
        if hits + nonhits == 4:
            combolisttwo.append(i)
            break
x = len(combolisttwo)
print(x)
All possible combinations of four digits provided the digits do not repeat:
import itertools as IT
combolist = list(IT.combinations(range(10), 4))
Then, I create another table and attempt to add to this secondary table all of the values which use the same numbers in a different order (so I do not have, say, 1234, 4321, 3241, 3214, 1324, 2413, etc. on this table.):
combolist2 = [item for combo in combolist
for item in IT.permutations(combo, len(combo))]
Useful references:
combinations -- for enumerating collections of elements without replacement
permutations -- for enumerating collections of elements in all possible orders
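As a quick sanity check on those two calls, the sketch below counts the results and compares them with the set built in the question. The name all_orderings is introduced here for illustration; it rebuilds the question's set with itertools.product instead of four nested loops.

```python
import itertools as IT

# Unordered selections of 4 distinct digits: each group of digits appears once.
combolist = list(IT.combinations(range(10), 4))

# Every ordering of every selection.
combolist2 = [item for combo in combolist
              for item in IT.permutations(combo)]

# The question's table: all 4-tuples of digits with no repeated digit.
all_orderings = {t for t in IT.product(range(10), repeat=4)
                 if len(set(t)) == 4}

print(len(combolist))                    # 210 selections
print(len(combolist2))                   # 5040 orderings (210 * 24)
print(set(combolist2) == all_orderings)  # the same tuples as the question's set
```

So combolist is the table without reorderings of the same digits, and combolist2 expands it back to the full table.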
This code is pretty confused ;-) For example, you have n = 0 in your inner loop, but never set n to anything else. For another, you have x = 0, but never use x. Etc.
Using itertools is really best, but if you're trying to learn how to do these things yourself, that's fine. For a start, change your:
letters = list(letternums)
to
letters = list(letternums)
print(letters)
break
I bet you'll be surprised at what you see! The elements of your combolist are tuples, so when you do letternums = str(i) you get a string with a mix of digits, spaces, parentheses and commas. I don't think you're expecting anything but digits.
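To see this without running the whole program, here is a small sketch with one sample tuple. The variable names chars and digits are illustrative only.

```python
i = (1, 2, 3, 4)            # one element of combolist is a tuple like this

chars = list(str(i))        # str(i) is the text "(1, 2, 3, 4)"
print(chars)                # includes '(', ',', ' ' and ')' as well as digits

# Converting each element separately gives just the digits.
digits = [str(n) for n in i]
print(digits)
```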
Your letterstwo is the string "1" (always, because you never change n). But it doesn't much matter, because you set hits and nonhits to 0 every time your for g in letters loop iterates. So hits and nonhits can never be bigger than 1.
Which answers your literal question ;-) combolisttwo.append(i) is never executed because hits + nonhits == 4 is never true. That's why combolisttwo remains at its initial value ([1]).
Put some calls to print() in your code? That will help you see what's going wrong.

Filter list, only unique values - with python

I was wondering how I would take a list, e.g. a = [1, 5, 2, 5, 1], and filter it so that it only returns the numbers that occurred exactly once in the list. So it would give me a = [2] as a result.
I was able to figure out how to filter out duplicates, now how would I get rid of the duplicates?
No need for the straight answer, just a little tip or hint is welcome :)
I was able to find this on stackoverflow. It does what I want, but I do not understand the code though, can someone break it down for me?
d = {}
for i in l: d[i] = d.has_key(i)
[k for k in d.keys() if not d[k]]
>>> a = [1, 5, 2, 5, 1]
>>> from collections import Counter
>>> [k for k, c in Counter(a).items() if c == 1]
[2]
Here is what your code does:
d = {}
for i in l:
    # if the item is already in the dictionary then map it to True,
    # otherwise map it to False
    # the first time a given item is seen it will be assigned False,
    # the next time True
    # (has_key is Python 2 only; in Python 3 write: d[i] = i in d)
    d[i] = d.has_key(i)
# pull out all the keys in the dictionary whose value is False
# these are items in the original list that were only seen once in the loop
[k for k in d.keys() if not d[k]]
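The same idea can be sketched for Python 3, where dict.has_key no longer exists and the in operator takes its place. The names only_once and seen_before are hypothetical.

```python
def only_once(items):
    """Return the values that appear exactly once, in first-seen order."""
    seen_before = {}
    for item in items:
        # False on the first sighting, True on every later one.
        seen_before[item] = item in seen_before
    return [k for k, repeated in seen_before.items() if not repeated]

unique_values = only_once([1, 5, 2, 5, 1])
print(unique_values)
```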
