List with string code names to numeric codes in python3

I am very new to Python. I have a list called 'prediction' with results from an LDA classification problem. The elements in 'prediction' are strings, which I want to convert to numeric values. I am doing it by brute force like this:
aux2 = [0]*len(prediction)
i = 0
for k in prediction:
    if k == 'ALA':
        aux2[i] = 1
    elif k == 'ARG':
        aux2[i] = 2
    elif k == 'ASN':
        aux2[i] = 3
    elif k == 'ASP':
        aux2[i] = 4
    ...
    elif k == 'VAL':
        aux2[i] = 18
    i = i+1
But I am sure there is a better way to do it. Please put me out of my ignorance!

You could use a dictionary for this!
Dictionaries use keys to refer to values. This means you can assign each string an integer value as needed and use it like so:
translation_dict = {
    'ALA': 1,
    'ARG': 2,
    'ASN': 3,
    'ASP': 4,
    ...
    'VAL': 18
}
aux2 = [0]*len(prediction)
for i, k in enumerate(prediction):
    aux2[i] = translation_dict[k]
You'll also notice I swapped out your counter (i) for the enumerate function. This function takes any iterable and returns an iterator that yields the index along with each value, saving you from incrementing i manually.
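The loop can be shortened further with a list comprehension. This is a minimal sketch, assuming a three-entry mapping and a sample `prediction` list (both made up for illustration). Using `dict.get` with a default of 0 is just one possible way to handle labels missing from the mapping.

```python
# Hypothetical, shortened mapping; the real one would list every code.
translation_dict = {'ALA': 1, 'ARG': 2, 'ASN': 3}

# Sample input standing in for the LDA output.
prediction = ['ARG', 'ALA', 'ASN', 'ALA']

# One dictionary lookup per element; raises KeyError on an unknown label.
aux2 = [translation_dict[k] for k in prediction]
print(aux2)  # [2, 1, 3, 1]

# Variant that maps unknown labels to 0 instead of raising.
aux2_safe = [translation_dict.get(k, 0) for k in prediction + ['XXX']]
print(aux2_safe)  # [2, 1, 3, 1, 0]
```

The first form fails loudly on unexpected labels, which is often what you want when checking classifier output.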

Related

Efficient looping algorithm in python

I am doing the operation below, where each set of values in a dictionary (dictionary_with_large_values) can have more than 1 million entries. Looping over each of them takes a lot of time, and there are 7 to 8 keys with this much data in the dictionary. Is there a more time-efficient algorithm for this in Python? The algorithm checks whether two strings are the same after error checks and creates a mapping from a parent string to a set of similar strings (dictionary_test_1), plus another dictionary holding the reverse mapping (dictionary_test_2).
#type of dictionary. Values contains set of strings
dictionary_with_large_values = defaultdict(set)
dictionary_test_1 = defaultdict(set)
dictionary_test_2 = defaultdict()
# method which stores data to dictionary
# algorithm to parse the data
for k,v in dictionary_with_large_values.items():
    i=0
    values = list(v)
    while i < len(values):
        string_data = values[i].replace(" ", "")
        j = i + 1
        while j < len(values):
            string2_data = values[j].replace(" ", "")
            # algorithm to check if normalized(string_data) == normalized(string2_data)
            data = areTheySame(string_data,string2_data)
            if data:
                dictionary_test_1[values[i]].add(values[j])
                dictionary_test_2[values[j]] = values[i]
                del values[j]
            else:
                j += 1
        i += 1
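If areTheySame really amounts to comparing normalized forms for equality, as the comment in the loop suggests, one possible way to avoid the pairwise comparison is to group strings by their normalized form in a single pass. This is only a sketch under that assumption: `normalize`, `build_mappings`, `parent_to_similar` and `child_to_parent` are hypothetical names, and the normalization shown is a stand-in for the real error checks. It does not apply if the comparison is fuzzy rather than an equality test.

```python
from collections import defaultdict

def normalize(s):
    # Hypothetical stand-in for the real normalization / error checks.
    return s.replace(" ", "").lower()

def build_mappings(values):
    """Group strings by normalized form in one pass instead of comparing all pairs."""
    groups = defaultdict(list)
    for s in values:
        groups[normalize(s)].append(s)

    parent_to_similar = defaultdict(set)  # plays the role of dictionary_test_1
    child_to_parent = {}                  # plays the role of dictionary_test_2
    for members in groups.values():
        # The first string seen in each group acts as the parent.
        parent, rest = members[0], members[1:]
        for child in rest:
            parent_to_similar[parent].add(child)
            child_to_parent[child] = parent
    return parent_to_similar, child_to_parent

p2s, c2p = build_mappings(["a b", "ab", "A B", "cd"])
print(c2p)  # {'ab': 'a b', 'A B': 'a b'}
```

Each string is normalized once and hashed once, so the work grows linearly with the number of values rather than quadratically.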

Count occurrences of an item in a list and store it in another list if it exists more than once

Let's say I have the following list.
my_list = ['4/10', '8/-', '9/2', '4/11', '-/13', '19/10', '25/-', '26/-', '4/12', '10/16']
I would like to check the occurrence of each item and if it exists more than once I would like to store it in a new list.
For example, in the above list, 4 appears 3 times before the / (as 4/10, 4/11, 4/12), so I would like to create a new list and store them as new_list = ['4/10', '4/11', '4/12', '19/10'].
Additionally, the position relative to the / matters: if 10 appears twice, as in 4/10 and 10/16, I don't want to treat it as a duplicate, since one is after the / and the other before it.
Is there any way to count the occurrences of an item in a list and store them in a new list?
I tried the following but got an error.
new_list = []
d = Counter(my_list)
for v in d.items():
    if v > 1:
        new_list.append(v)
The error: TypeError: '>' not supported between instances of 'tuple' and 'int'
Can anyone help with this?
I think the code below is quite self-explanatory and should work. If you have any issues or need clarification, feel free to ask.
NOTE: This code is not very efficient and can be improved a lot, but it will work fine if you are not running it on extremely large data.
my_list = ['4/10', '8/-', '9/2', '4/11', '-/13', '19/10', '25/-', '26/-', '4/12', '10/16']
frequency = {}
new_list = []
for string in my_list:
    x = ''
    for j in string:
        if j == '/':
            break
        x += j
    if x.isdigit():
        frequency[x] = frequency.get(x, 0) + 1
for string in my_list:
    x = ''
    for j in string:
        if j == '/':
            break
        x += j
    if x.isdigit():
        if frequency[x] > 1:
            new_list.append(string)
print(new_list)
.items() is not what you think: it returns key-value pairs (tuples), not the values alone. You want:
d = Counter(node)
new_list = [ k for (k,v) in d.items() if v > 1 ]
Besides, I am not sure how node is related to my_list but I think there is some additional processing you didn't show.
Update: after reading your comment clarifying the problem, I think it requires two separate counters:
first_parts = Counter([x.split('/')[0] for x in my_list])
second_parts = Counter([x.split('/')[1] for x in my_list])
first_duplicates = { k for (k,v) in first_parts.items() if v > 1 and k != '-' }
second_duplicates = { k for (k,v) in second_parts.items() if v > 1 and k != '-' }
new_list = [ e for e in my_list if
e.split('/')[0] in first_duplicates or e.split('/')[1] in second_duplicates ]
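To make the tuple point concrete, here is a small sketch using the list from the question. It counts only the part before the slash and unpacks each (key, count) pair in the loop; the variable names `first_parts` and `repeated` are chosen for illustration.

```python
from collections import Counter

my_list = ['4/10', '8/-', '9/2', '4/11', '-/13', '19/10', '25/-', '26/-', '4/12', '10/16']

# Count only the part before the slash.
first_parts = Counter(item.split('/')[0] for item in my_list)

# items() yields (key, count) tuples, so unpack both names in the loop.
repeated = [key for key, count in first_parts.items() if count > 1]
print(repeated)  # ['4']
```

Comparing the whole tuple with an int, as in the question's loop, is what raises the TypeError.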
This might help: create a dict to hold the pairings, then extract the pairings that have more than one entry. defaultdict helps with aggregating data based on common keys.
from collections import defaultdict
d = defaultdict(list)
e = defaultdict(list)
# skip entries that contain a '-' placeholder
m = [ent for ent in my_list if '-' not in ent]
for ent in m:
    front, back = ent.split('/')
    d[front].append(ent)
    e[back].append(ent)
new_list = []
for k,v in d.items():
    if len(v) > 1:
        new_list.extend(v)
for k,v in e.items():
    if len(v) > 1:
        new_list.extend(v)
sortr = lambda x: [int(ent) for ent in x.split("/")]
# sorted() returns a new list, so assign it; set() drops entries matched on both sides
new_list = sorted(set(new_list), key = sortr)
print(new_list)
['4/10', '4/11', '4/12', '19/10']

How do I generate a table from a list

I have a list that contains sublists with 3 values and I need to print out a list that looks like:
I also need to compare the third-column values with each other to tell whether they are increasing or decreasing as you go down.
bb = 3.9
lowest = 0.4
#appending all the information to a list
allinfo= []
while bb>=lowest:
    everything = angleWithPost(bb,cc,dd,ee)
    allinfo.append(everything)
    bb-=0.1
I think the general idea for finding out whether or not the third column values are increasing or decreasing is:
#Checking whether or not Fnet are increasing or decreasing
ii=0
while ii<=(10*(bb-lowest)):
    if allinfo[ii][2]>allinfo[ii+1][2]:
        abc = "decreasing"
    elif allinfo[ii][2]<allinfo[ii+1][2]:
        abc = "increasing"
    ii+=1
Then I want to print out my table, similar to the one above:
jj=0
while jj<=(10*(bb-lowest)):
    print "%8.2f %12.2f %12.2f %s" %(allinfo[jj][0], allinfo[jj][1], allinfo[jj][2], abc)
    jj+=1
Here is the angleWithPost part:
import math as m  # the functions below call m.cosh, m.sqrt, m.atan, ...

def chainPoints(aa,DIS,SEG,H):
    #xtuple x chain points
    n=0
    xterms = []
    xterm = -DIS
    while n<=SEG:
        xterms.append(xterm)
        n+=1
        xterm = -DIS + n*2*DIS/(SEG)
    #
    #ytuple y chain points
    k=0
    yterms = []
    while k<=SEG:
        yterm = H + aa*m.cosh(xterms[k]/aa) - aa*m.cosh(DIS/aa)
        yterms.append(yterm)
        k+=1
    return(xterms,yterms)
#
#
def chainLength(aa,DIS,SEG,H):
    xterms, yterms = chainPoints(aa,DIS,SEG,H)# using x points and y points from the chainpoints function
    #length of chain
    ff=1
    Lterm=0.
    totallength=0.
    while ff<=SEG:
        Lterm = m.sqrt((xterms[ff]-xterms[ff-1])**2 + (yterms[ff]-yterms[ff-1])**2)
        totallength += Lterm
        ff+=1
    return(totallength)
#
def angleWithPost(aa,DIS,SEG,H):
    xterms, yterms = chainPoints(aa,DIS,SEG,H)
    totallength = chainLength(aa,DIS,SEG,H)
    #Find the angle
    thetaradians = (m.pi)/2. + m.atan(((yterms[1]-yterms[0])/(xterms[1]-xterms[0])))
    #Need to print out the degrees
    thetadegrees = (180/m.pi)*thetaradians
    #finding the net force
    Fnet = abs((rho*grav*totallength))/(2.*m.cos(thetaradians))
    return(totallength, thetadegrees, Fnet)
Review this Python2 implementation which uses map and an iterator trick.
from itertools import izip_longest, islice
from pprint import pprint
data = [
    [1, 2, 3],
    [1, 2, 4],
    [1, 2, 3],
    [1, 2, 5],
]
class AddDirection(object):
    def __init__(self):
        # This default is used if the series begins with equal values or has a
        # single element.
        self.increasing = True
    def __call__(self, pair):
        crow, nrow = pair
        if nrow is None or crow[-1] == nrow[-1]:
            # This is the last row or the direction didn't change. Just return
            # the direction we previously had.
            inc = self.increasing
        elif crow[-1] > nrow[-1]:
            inc = False
        else:
            # Here crow[-1] < nrow[-1].
            inc = True
        self.increasing = inc
        return crow + ["Increasing" if inc else "Decreasing"]
result = map(AddDirection(), izip_longest(data, islice(data, 1, None)))
pprint(result)
The output:
pts/1$ python2 a.py
[[1, 2, 3, 'Increasing'],
[1, 2, 4, 'Decreasing'],
[1, 2, 3, 'Increasing'],
[1, 2, 5, 'Increasing']]
Whenever you want to transform the contents of a list (in this case the list of rows), map is a good place to start.
When the algorithm requires data from several places in a list, offsetting the list and zipping the needed values is also a powerful technique. Using generators, so that the list doesn't have to be copied, makes this viable in real code.
Finally, when you need to keep state between calls (in this case the direction), using an object is the best choice.
Sorry if the code is too terse!
Basically you want to add a 4th column to the inner list and print the results?
#print headers of table here, use .format for consistent padding
previous = 0
for l in outer_list:
    if l[2] > previous:
        l.append('increasing')
    elif l[2] < previous:
        l.append('decreasing')
    previous = l[2]
    #print row here use .format for consistent padding
Update for list of tuples, add value to tuple:
import random
outer_list = [ (i, i, random.randint(0,10),)for i in range(0,10)]
previous = 0
allinfo = []
for l in outer_list:
    if l[2] > previous:
        allinfo.append(l +('increasing',))
    elif l[2] < previous:
        allinfo.append(l +('decreasing',))
    previous = l[2]
    #print row here use .format for consistent padding
print(allinfo)
This most definitely can be optimized and you could reduce the number of times you are iterating over the data.
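One way to cut the iteration down to a single pass over adjacent rows is to zip the list with itself shifted by one. The sketch below is written for Python 3 and uses made-up rows in place of the (length, angle, Fnet) tuples from the question; `trend` and `labels` are hypothetical names, and leaving the first row unlabeled is a choice made here, not something the question specifies.

```python
# Made-up rows standing in for (totallength, thetadegrees, Fnet) tuples.
rows = [(10.0, 45.0, 3.0), (9.5, 44.0, 4.0), (9.0, 43.0, 3.5), (8.5, 42.0, 3.5)]

def trend(prev, cur):
    """Label the change from the previous third-column value to the current one."""
    if cur > prev:
        return "increasing"
    if cur < prev:
        return "decreasing"
    return "unchanged"

# The first row has no predecessor, so it gets an empty label.
labels = [""]
for prev_row, cur_row in zip(rows, rows[1:]):
    labels.append(trend(prev_row[2], cur_row[2]))

# Fixed-width columns keep the table aligned.
for (length, angle, fnet), label in zip(rows, labels):
    print("{:8.2f} {:12.2f} {:12.2f} {}".format(length, angle, fnet, label))
```

Because each row gets its own label, the printed table shows the direction per row instead of reusing one value for every line.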

Finding the mode of a list

Given a list of items, recall that the mode of the list is the item that occurs most often.
I would like to know how to create a function that can find the mode of a list but that displays a message if the list does not have a mode (e.g., all the items in the list only appear once). I want to make this function without importing any functions. I'm trying to make my own function from scratch.
You can use the max function and a key. Have a look at python max function using 'key' and lambda expression.
max(set(lst), key=lst.count)
You can use the Counter supplied in the collections package which has a mode-esque function
from collections import Counter
data = Counter(your_list_in_here)
data.most_common() # Returns all unique items and their counts
data.most_common(1) # Returns the highest occurring item
Note: Counter is new in python 2.7 and is not available in earlier versions.
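Since the question also asks for a message when there is no mode, here is a rough sketch of how `most_common` could be used for that. It does rely on an import, so it only fits if that restriction is relaxed. The function name `mode_or_message` and the message texts are made up for illustration.

```python
from collections import Counter

def mode_or_message(items):
    """Return a list of the most frequent items, or a message if there is no mode."""
    counts = Counter(items)
    if not counts:
        return "The list is empty."
    ranked = counts.most_common()  # (item, count) pairs, highest count first
    top_count = ranked[0][1]
    if top_count == 1:
        return "No mode: every item appears only once."
    # Keep every item that ties for the highest count.
    return [item for item, count in ranked if count == top_count]

print(mode_or_message([1, 2, 3]))     # No mode: every item appears only once.
print(mode_or_message([1, 2, 2, 3]))  # [2]
```

Returning a list even for a single mode keeps the return type consistent when there are ties.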
Python 3.4 includes the method statistics.mode, so it is straightforward:
>>> from statistics import mode
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
3
You can have any type of elements in the list, not just numeric:
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
Taking a leaf from some statistics software, namely SciPy and MATLAB, these just return the smallest most common value, so if two values occur equally often, the smallest of these is returned. Hopefully an example will help:
>>> from scipy.stats import mode
>>> mode([1, 2, 3, 4, 5])
(array([ 1.]), array([ 1.]))
>>> mode([1, 2, 2, 3, 3, 4, 5])
(array([ 2.]), array([ 2.]))
>>> mode([1, 2, 2, -3, -3, 4, 5])
(array([-3.]), array([ 2.]))
Is there any reason why you can't follow this convention?
There are many simple ways to find the mode of a list in Python such as:
import statistics
statistics.mode([1,2,3,3])
>>> 3
Or, you could find the max by its count
max(array, key = array.count)
The problem with those two methods is that they don't handle multiple modes. The first raises StatisticsError on a tie in Python versions before 3.8 (from 3.8 on it returns the first mode encountered), while the second returns the first mode.
In order to find the modes of a set, you could use this function:
def mode(array):
    most = max(list(map(array.count, array)))
    return list(set(filter(lambda x: array.count(x) == most, array)))
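If Python 3.8 or later is available, the standard library's `statistics.multimode` covers the multiple-mode case directly. A short sketch, assuming that version requirement is acceptable:

```python
from statistics import multimode

# Returns every value that ties for the highest count,
# in the order first encountered in the data.
print(multimode([1, 2, 2, 3, 3]))         # [2, 3]
print(multimode("aabbbbccddddeeffffgg"))  # ['b', 'd', 'f']

# An empty input gives an empty list rather than raising.
print(multimode([]))                      # []
```

Note that when every value occurs once, `multimode` returns all of them, so a "no mode" message still needs a separate check.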
Extending the Community answer that will not work when the list is empty, here is working code for mode:
def mode(arr):
    if arr==[]:
        return None
    else:
        return max(set(arr), key=arr.count)
In case you are interested in either the smallest, largest or all modes:
def get_small_mode(numbers, out_mode):
    counts = {k:numbers.count(k) for k in set(numbers)}
    modes = sorted(dict(filter(lambda x: x[1] == max(counts.values()), counts.items())).keys())
    if out_mode=='smallest':
        return modes[0]
    elif out_mode=='largest':
        return modes[-1]
    else:
        return modes
A little longer, but can have multiple modes and can get string with most counts or mix of datatypes.
def getmode(inplist):
    '''with list of items as input, returns mode
    '''
    dictofcounts = {}
    listofcounts = []
    for i in inplist:
        countofi = inplist.count(i) # count items for each item in list
        listofcounts.append(countofi) # add counts to list
        dictofcounts[i]=countofi # add counts and item in dict to get later
    maxcount = max(listofcounts) # get max count of items
    if maxcount ==1:
        print "There is no mode for this dataset, values occur only once"
    else:
        modelist = [] # if more than one mode, add to list to print out
        for key, item in dictofcounts.iteritems():
            if item ==maxcount: # get item from original list with most counts
                modelist.append(str(key))
        print "The mode(s) are:",' and '.join(modelist)
        return modelist
The mode of a data set is the member (or members) that occurs most frequently in the set. If two members appear most often, the same number of times, the data has two modes and is called bimodal. If there are more than 2 modes, the data is called multimodal. If all the members of the data set appear the same number of times, the data set has no mode at all. The following function modes() can be used to find the mode(s) in a given list of data:
import numpy as np; import pandas as pd
def modes(arr):
    df = pd.DataFrame(arr, columns=['Values'])
    dat = pd.crosstab(df['Values'], columns=['Freq'])
    if len(np.unique((dat['Freq']))) > 1:
        mode = list(dat.index[np.array(dat['Freq'] == max(dat['Freq']))])
        return mode
    else:
        print("There is NO mode in the data set")
Output:
# For a list of numbers in x as
In [1]: x = [2, 3, 4, 5, 7, 9, 8, 12, 2, 1, 1, 1, 3, 3, 2, 6, 12, 3, 7, 8, 9, 7, 12, 10, 10, 11, 12, 2]
In [2]: modes(x)
Out[2]: [2, 3, 12]
# For a list of repeated numbers in y as
In [3]: y = [2, 2, 3, 3, 4, 4, 10, 10]
In [4]: modes(y)
Out[4]: There is NO mode in the data set
# For a list of strings/characters in z as
In [5]: z = ['a', 'b', 'b', 'b', 'e', 'e', 'e', 'd', 'g', 'g', 'c', 'g', 'g', 'a', 'a', 'c', 'a']
In [6]: modes(z)
Out[6]: ['a', 'g']
If we do not want to import numpy or pandas to call any function from these packages, then to get this same output, modes() function can be written as:
def modes(arr):
    cnt = []
    for i in arr:
        cnt.append(arr.count(i))
    uniq_cnt = []
    for i in cnt:
        if i not in uniq_cnt:
            uniq_cnt.append(i)
    if len(uniq_cnt) > 1:
        m = []
        for i in list(range(len(cnt))):
            if cnt[i] == max(uniq_cnt):
                m.append(arr[i])
        mode = []
        for i in m:
            if i not in mode:
                mode.append(i)
        return mode
    else:
        print("There is NO mode in the data set")
I wrote up this handy function to find the mode.
def mode(nums):
    corresponding={}
    occurances=[]
    for i in nums:
        count = nums.count(i)
        corresponding.update({i:count})
    for i in corresponding:
        freq=corresponding[i]
        occurances.append(freq)
    maxFreq=max(occurances)
    # list() so that indexing also works on Python 3, where keys()/values() are views
    keys=list(corresponding.keys())
    values=list(corresponding.values())
    index_v = values.index(maxFreq)
    # return the result directly; rebinding a global named mode would overwrite this function
    return keys[index_v]
Short, but somehow ugly:
def mode(arr) :
    m = max([arr.count(a) for a in arr])
    return [x for x in arr if arr.count(x) == m][0] if m>1 else None
Using a dictionary, slightly less ugly:
def mode(arr) :
    f = {}
    for a in arr : f[a] = f.get(a,0)+1
    m = max(f.values())
    t = [(x,f[x]) for x in f if f[x]==m]
    # conditional expression: value first, then the condition
    return t[0][0] if m > 1 else None
This function returns the mode or modes of a list, however many there are, together with the frequency of the mode(s) in the dataset. If there is no mode (i.e. all items occur only once), the function returns an error string. This is similar to A_nagpal's function above but is, in my humble opinion, more complete, and I think it is easier to understand for any Python novices (such as yours truly) reading this question.
def l_mode(list_in):
    count_dict = {}
    for e in (list_in):
        count = list_in.count(e)
        if e not in count_dict.keys():
            count_dict[e] = count
    max_count = 0
    for key in count_dict:
        if count_dict[key] >= max_count:
            max_count = count_dict[key]
    corr_keys = []
    for corr_key, count_value in count_dict.items():
        if count_dict[corr_key] == max_count:
            corr_keys.append(corr_key)
    if max_count == 1 and len(count_dict) != 1:
        return 'There is no mode for this data set. All values occur only once.'
    else:
        corr_keys = sorted(corr_keys)
        return corr_keys, max_count
For a number to be a mode, it must occur more times than at least one other number in the list, and it must not be the only number in the list. So, I refactored @mathwizurd's answer (to use the difference method) as follows:
def mode(array):
    '''
    returns a set containing valid modes
    returns a message if no valid mode exists
    - when all numbers occur the same number of times
    - when only one number occurs in the list
    - when no number occurs in the list
    '''
    most = max(map(array.count, array)) if array else None
    mset = set(filter(lambda x: array.count(x) == most, array))
    return mset if set(array) - mset else "list does not have a mode!"
These checks hold (none of these lists has a valid mode, so the message is returned):
mode([]) == "list does not have a mode!"
mode([1]) == "list does not have a mode!"
mode([1, 1]) == "list does not have a mode!"
mode([1, 1, 2, 2]) == "list does not have a mode!"
Here is how you can find mean,median and mode of a list:
import numpy as np
from scipy import stats
#to take input
size = int(input())
numbers = list(map(int, input().split()))
print(np.mean(numbers))
print(np.median(numbers))
print(int(stats.mode(numbers)[0]))
Simple code that finds the mode of the list without any imports:
nums = #your_list_goes_here
nums.sort()
counts = dict()
for i in nums:
counts[i] = counts.get(i, 0) + 1
mode = max(counts, key=counts.get)
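In case of multiple modes, it returns the minimum mode: the list is sorted first, and max returns the first key with the highest count (dicts keep insertion order in Python 3.7+).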
Why not just
def print_mode (thelist):
    counts = {}
    for item in thelist:
        counts [item] = counts.get (item, 0) + 1
    maxcount = 0
    maxitem = None
    for k, v in counts.items ():
        if v > maxcount:
            maxitem = k
            maxcount = v
    if maxcount == 1:
        print "All values only appear once"
    elif counts.values().count (maxcount) > 1:
        print "List has multiple modes"
    else:
        print "Mode of list:", maxitem
This doesn't have a few error checks that it should have, but it will find the mode without importing any functions and will print a message if all values appear only once. It will also detect multiple items sharing the same maximum count, although it wasn't clear if you wanted that.
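The function above is written for Python 2 (print statements, and `counts.values()` being a list). A rough Python 3 adaptation of the same idea might look like the sketch below. It returns the message instead of printing it and adds an empty-list check; the name `describe_mode` and the exact wording of the messages are choices made here, not part of the original answer.

```python
def describe_mode(items):
    """Count items with a plain dict and describe the mode without any imports."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    if not counts:
        return "List is empty"
    max_count = max(counts.values())
    if max_count == 1:
        return "All values only appear once"
    # Collect every item that reaches the highest count.
    modes = [item for item, count in counts.items() if count == max_count]
    if len(modes) > 1:
        return "List has multiple modes: {}".format(modes)
    return "Mode of list: {}".format(modes[0])

print(describe_mode([1, 2, 2, 3]))     # Mode of list: 2
print(describe_mode([1, 2, 3]))        # All values only appear once
print(describe_mode([1, 1, 2, 2, 3]))  # List has multiple modes: [1, 2]
```

Returning strings keeps the function easy to test; the caller can print the result if needed.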
This will return all modes:
def mode(numbers):
    largestCount = 0
    modes = []
    for x in numbers:
        if x in modes:
            continue
        count = numbers.count(x)
        if count > largestCount:
            del modes[:]
            modes.append(x)
            largestCount = count
        elif count == largestCount:
            modes.append(x)
    return modes
For those looking for the minimum mode (e.g. in the case of a bimodal distribution), using numpy. Note that np.bincount only accepts non-negative integers.
import numpy as np
mode = np.argmax(np.bincount(your_list))
OK! The community already has a lot of answers, and some of them use functions you may not want.
Let's create a very simple and easily understandable function.
import numpy as np
# Declare function name
def calculate_mode(lst):
    # Next step is to find the unique elements in the list and their respective frequencies.
    unique_elements,freq = np.unique(lst, return_counts=True)
    # Get mode
    max_freq = np.max(freq) #maximum frequency
    mode_index = np.where(freq==max_freq) #max freq index
    mode = unique_elements[mode_index] #get mode by index
    return mode
Example
lst =np.array([1,1,2,3,4,4,4,5,6])
print(calculate_mode(lst))
>>> Output [4]
How my brain decided to do it completely from scratch. Efficient and concise :) (jk lol)
import random
def removeDuplicates(arr):
    dupFlag = False
    for i in range(len(arr)):
        #check if we found a dup, if so, stop
        if dupFlag:
            break
        for j in range(len(arr)):
            if ((arr[i] == arr[j]) and (i != j)):
                arr.remove(arr[j])
                dupFlag = True
                break
    #if there was a duplicate repeat the process, this is so we can account for the changing length of the arr
    if (dupFlag):
        removeDuplicates(arr)
    else:
        #if no duplicates return the arr
        return arr
#currently returns modes and all there occurences... Need to handle dupes
def mode(arr):
    numCounts = []
    #init numCounts
    for i in range(len(arr)):
        numCounts += [0]
    for i in range(len(arr)):
        count = 1
        for j in range(len(arr)):
            if (arr[i] == arr[j] and i != j):
                count += 1
        #add the count for that number to the corresponding index
        numCounts[i] = count
    #find which has the greatest number of occurences
    greatestNum = 0
    for i in range(len(numCounts)):
        if (numCounts[i] > greatestNum):
            greatestNum = numCounts[i]
    #finally return the mode(s)
    modes = []
    for i in range(len(numCounts)):
        if numCounts[i] == greatestNum:
            modes += [arr[i]]
    #remove duplicates (using aliasing)
    print("modes: ", modes)
    removeDuplicates(modes)
    print("modes after removing duplicates: ", modes)
    return modes
def initArr(n):
    arr = []
    for i in range(n):
        arr += [random.randrange(0, n)]
    return arr
#initialize an array of random ints
arr = initArr(1000)
print(arr)
print("_______________________________________________")
modes = mode(arr)
#print result
print("Mode is: ", modes) if (len(modes) == 1) else print("Modes are: ", modes)
def mode(inp_list):
    sort_list = sorted(inp_list)
    dict1 = {}
    for i in sort_list:
        count = sort_list.count(i)
        if i not in dict1.keys():
            dict1[i] = count
    maximum = 0 #no. of occurences
    max_key = -1 #element having the most occurences
    for key in dict1:
        if(dict1[key]>maximum):
            maximum = dict1[key]
            max_key = key
        elif(dict1[key]==maximum):
            if(key<max_key):
                maximum = dict1[key]
                max_key = key
    return max_key
def mode(data):
    lst =[]
    hgh=0
    for i in range(len(data)):
        lst.append(data.count(data[i]))
    m= max(lst)
    ml = [x for x in data if data.count(x)==m ] #to find most frequent values
    mode = []
    for x in ml: #to remove duplicates of mode
        if x not in mode:
            mode.append(x)
    return mode
print mode([1,2,2,2,2,7,7,5,5,5,5])
Here is a simple function that gets the first mode that occurs in a list. It makes a dictionary with the list elements as keys and number of occurrences and then reads the dict values to get the mode.
def findMode(readList):
    numCount={}
    highestNum=0
    for i in readList:
        if i in numCount.keys(): numCount[i] += 1
        else: numCount[i] = 1
    for i in numCount.keys():
        if numCount[i] > highestNum:
            highestNum=numCount[i]
            mode=i
    if highestNum != 1: print(mode)
    elif highestNum == 1: print("All elements of list appear once.")
If you want a clear approach, useful for classroom and only using lists and dictionaries by comprehension, you can do:
def mode(my_list):
    # Form a new list with the unique elements
    unique_list = sorted(list(set(my_list)))
    # Create a comprehensive dictionary with the uniques and their count
    appearance = {a:my_list.count(a) for a in unique_list}
    # Calculate max number of appearances
    max_app = max(appearance.values())
    # Return the elements of the dictionary that appear that # of times
    return {k: v for k, v in appearance.items() if v == max_app}
#function to find mode
def mode(data):
    modecnt=0
    #for count of number appearing
    for i in range(len(data)):
        icount=data.count(data[i])
        #for storing count of each number in list will be stored
        if icount>modecnt:
            #the loop activates if current count if greater than the previous count
            mode=data[i]
            #here the mode of number is stored
            modecnt=icount
            #count of the appearance of number is stored
    return mode
print mode(data1)
import numpy as np
def get_mode(xs):
    values, counts = np.unique(xs, return_counts=True)
    max_count_index = np.argmax(counts) #return the index with max value counts
    return values[max_count_index]
print(get_mode([1,7,2,5,3,3,8,3,2]))
Perhaps try the following. It is O(n) and returns a list of floats (or ints). It is thoroughly, automatically tested. It uses collections.defaultdict, but I'd like to think you're not opposed to using that. It can also be found at https://stromberg.dnsalias.org/~strombrg/stddev.html
import collections
import typing

def compute_mode(list_: typing.List[float]) -> typing.List[float]:
    """
    Compute the mode of list_.
    Note that the return value is a list, because sometimes there is a tie for "most common value".
    See https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list
    """
    if not list_:
        raise ValueError('Empty list')
    if len(list_) == 1:
        raise ValueError('Single-element list')
    value_to_count_dict: typing.DefaultDict[float, int] = collections.defaultdict(int)
    for element in list_:
        value_to_count_dict[element] += 1
    count_to_values_dict = collections.defaultdict(list)
    for value, count in value_to_count_dict.items():
        count_to_values_dict[count].append(value)
    counts = list(count_to_values_dict)
    if len(counts) == 1:
        raise ValueError('All elements in list are the same')
    maximum_occurrence_count = max(counts)
    if maximum_occurrence_count == 1:
        raise ValueError('No element occurs more than once')
    minimum_occurrence_count = min(counts)
    if maximum_occurrence_count <= minimum_occurrence_count:
        raise ValueError('Maximum count not greater than minimum count')
    return count_to_values_dict[maximum_occurrence_count]

Learning Python and using dictionaries

I'm working through exercises in Building Skills in Python, which to my knowledge don't have any published solutions.
In any case, I'm attempting to have a dictionary count the number of occurrences of each number in the original list, before duplicates are removed. For some reason, despite a number of variations on the theme below, I can't seem to increment the value for each of the 'keys' in the dictionary.
How could I code this with dictionaries?
dv = list()
# arbitrary sequence of numbers
seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
# dictionary counting number of occurances
seqDic = { }
for v in seq:
    i = 1
    dv.append(v)
    for i in range(len(dv)-1):
        if dv[i] == v:
            del dv[-1]
    seqDic.setdefault(v)
    currentCount = seqDic[v]
    currentCount += 1
    print currentCount # debug
    seqDic[v]=currentCount
print "orig:", seq
print "new: ", dv
print seqDic
defaultdict is not dict (it's a subclass, and may do too much of the work for you to help you learn via this exercise), so here's a simple way to do it with plain dict:
dv = list()
# arbitrary sequence of numbers
seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
# dictionary counting number of occurances
seqDic = { }
for i in seq:
    if i in seqDic:
        seqDic[i] += 1
    else:
        dv.append(i)
        seqDic[i] = 1
This simple approach works particularly well here because you need the if i in seqDic test anyway for the purpose of building dv as well as seqDic. Otherwise, simpler would be:
for i in seq:
    seqDic[i] = 1 + seqDic.get(i, 0)
using the handy method get of dict, which returns the second argument if the first is not a key in the dictionary. If you like this idea, here's a solution that also builds dv:
for i in seq:
    seqDic[i] = 1 + seqDic.get(i, 0)
    if seqDic[i] == 1: dv.append(i)
Edit: If you don't care about the order of items in dv (rather than wanting dv to be in the same order as the first occurrence of each item in seq), then just using (after the simple version of the loop)
dv = seqDic.keys()
also works (in Python 2, where .keys returns a list), and so does
dv = list(seqDic)
which is fine in both Python 2 and Python 3. Under the same hypothesis (that you don't care about the order of items in dv) there are also other good solutions, such as
seqDic = dict.fromkeys(seq, 0)
for i in seq: seqDic[i] += 1
dv = list(seqDic)
here, we first use the fromkeys class method of dictionaries to build a new dict which already has 0 as the value corresponding to each key, so we can then just increment each entry without such precautions as .get or membership checks.
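A side note for readers on current Python: since Python 3.7, plain dicts are guaranteed to keep insertion order, so `list(seqDic)` gives the keys in first-occurrence order even with the simple loop. A small sketch, assuming Python 3.7 or later:

```python
seq = [2, 4, 5, 2, 4, 6, 3, 8, 9, 3, 7, 2, 47, 2]

seqDic = {}
for i in seq:
    # get() returns 0 for a key that has not been seen yet.
    seqDic[i] = seqDic.get(i, 0) + 1

# Keys come back in the order they were first inserted (Python 3.7+).
dv = list(seqDic)
print(dv)         # [2, 4, 5, 6, 3, 8, 9, 7, 47]
print(seqDic[2])  # 4
```

On older interpreters the key order is arbitrary, which is the situation the answer above describes.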
defaultdict makes this easy:
>>> from collections import defaultdict
>>> seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
>>> seqDic = defaultdict(int)
>>> for v in seq:
...     seqDic[v] += 1
>>> print seqDic
defaultdict(<type 'int'>, {2: 4, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 47: 1})
I'm not really sure what you try to do .. count how often each number appears?
#arbitrary sequence of numbers
seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
#dictionary counting number of occurances
seqDic = {}
### what you want to do, spelled out
for number in seq:
    if number in seqDic: # we had the number before
        seqDic[number] += 1
    else: # first time we see it
        seqDic[number] = 1
#### or:
for number in seq:
    current = seqDic.get(number, 0) # current count in the dict, or 0
    seqDic[number] = current + 1
### or, to show you how setdefault works
for number in seq:
    seqDic.setdefault(number, 0) # set to 0 if it doesnt exist
    seqDic[number] += 1 # increase by one
print "orig:", seq
print seqDic
How about this:
#arbitrary sequence of numbers
seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
#dictionary counting number of occurances
seqDic = { }
for v in seq:
    if v in seqDic:
        seqDic[v] += 1
    else:
        seqDic[v] = 1
dv = seqDic.keys()
print "orig:", seq
print "new: ", dv
print seqDic
It's clean and I think it demonstrates what you are trying to learn how to do in a simple manner. It is possible to do this using defaultdict as others have pointed out, but knowing how to do it this way is instructive too.
Or, if you use Python 2.7 or later (including Python 3), you can use collections.Counter, which is essentially a dict, albeit subclassed.
>>> from collections import Counter
>>> seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
>>> Counter(seq)
Counter({2: 4, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 47: 1})
for v in seq:
    try:
        seqDic[v] += 1
    except KeyError:
        seqDic[v] = 1
That's the way I've always done the inner loop of things like this.
Apart from anything else, it can be noticeably faster than testing membership before working on the element when most keys are already present, so with a few hundred thousand elements it can save a lot of time.
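Whether try/except actually beats a membership test depends on the data and the Python version, so it is worth measuring on your own input. The sketch below shows one way to compare the two with `timeit`; the function names and the sample data (few distinct keys, many repeats) are made up for illustration, and no timings are claimed here.

```python
import timeit

# Sample data with few distinct keys, so most lookups hit an existing key.
seq = [i % 100 for i in range(10000)]

def count_with_test(seq):
    counts = {}
    for v in seq:
        if v in counts:
            counts[v] += 1
        else:
            counts[v] = 1
    return counts

def count_with_try(seq):
    counts = {}
    for v in seq:
        try:
            counts[v] += 1
        except KeyError:
            counts[v] = 1
    return counts

# Both variants must agree before their speed is compared.
assert count_with_test(seq) == count_with_try(seq)

print("membership test:", timeit.timeit(lambda: count_with_test(seq), number=20))
print("try/except:     ", timeit.timeit(lambda: count_with_try(seq), number=20))
```

With many distinct keys the exception path is taken more often, which can reverse the result.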
