I am quite new to Python, so I am still getting to grips with the language.
I have the following function, which takes a string and applies it to an algorithm that tells us whether it aligns to models 1, 2, 3, 4, or 5.
Currently this piece of code:
def apply_text(text):
    test_str = [text]
    test_new = tfidf_m.transform(test_str)
    prediction = 0
    for m in range(0,5):
        percentage = '{P:.1%}'.format(M=cat[m], P=lr_m[m].predict_proba(test_new)[0][1])
        print(percentage)
And running the following function: apply_text('Terrible idea.')
Gives the following output:
71.4%
33.1%
2.9%
1.6%
4.9%
With Model 1 = 71.4%, Model 2 = 33.1%, ... Model 5 = 4.9%.
I want to only output the Model number where there is the highest percentage. So in the above example, the answer would be 1 as this has 71.4%.
As the output is a string, I am finding it difficult to convert it to a number and then compare the values (probably in a loop of some sort) to obtain the maximum.
I think you want to save the percentages along with the model number, sort it and then return the highest.
This can be done by something like this:
def apply_text(text):
    test_str = [text]
    test_new = tfidf_m.transform(test_str)
    percentage_list = []
    for m in range(0, 5):
        # keep the raw probability; rounding it through a formatted string loses precision
        probability = lr_m[m].predict_proba(test_new)[0][1]
        percentage_list.append([m + 1, probability])
    percentage_list.sort(reverse=True, key=lambda a: a[1])
    return percentage_list[0][0]
Things to note:
The sort is reversed because the default order is ascending. You could skip reversing and take the last element of percentage_list instead, using index -1.
The key function is needed because we want to sort by the percentage, not by the model number.
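If only the top model is needed, sorting can be skipped altogether. The following is a minimal sketch of that idea, assuming the probabilities have already been collected as plain floats; the function name best_model and the sample values are hypothetical and stand in for the lr_m[m].predict_proba(test_new)[0][1] calls above.

```python
def best_model(probabilities):
    """Return the 1-based number of the model with the highest probability."""
    # enumerate from 1 so the position matches "Model 1", "Model 2", ...
    number, _ = max(enumerate(probabilities, start=1), key=lambda pair: pair[1])
    return number


# Hypothetical values standing in for the five predict_proba results
example = [0.714, 0.331, 0.029, 0.016, 0.049]
print(best_model(example))
```

Comparing the numbers directly also avoids the pitfall of comparing formatted strings, where '9.0%' would sort above '71.4%'.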
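Try putting the values in a list; then you can use list methods. Keep the raw probabilities as numbers, because formatted strings such as '9.0%' and '71.4%' compare alphabetically, not numerically:
percentage = []
for m in range(0, 5):
    percentage.append(lr_m[m].predict_proba(test_new)[0][1])
print(*('{:.1%}'.format(p) for p in percentage), sep='\n')
# index() is 0-based, so add 1 to get the model number
print('Max on model', percentage.index(max(percentage)) + 1)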
Or using a dictionary:
percentage = {}
for m in range(0, 5):
    percentage['Model ' + str(m + 1)] = lr_m[m].predict_proba(test_new)[0][1]
print(*('{}: {:.1%}'.format(k, p) for k, p in percentage.items()), sep='\n')
print('Max on', max(percentage, key=percentage.get))
I currently have the numbers above in a list. How would you go about adding similar numbers (by nearest 850) and finding average to make the list smaller.
For example I have the list
l = [2000,2200,5000,2350]
In this list, I want to find the numbers that are within 500 of each other.
Those are 2000, 2200 and 2350. I want to add them up and divide by how many there are (3) to find the mean, which then replaces the three numbers, so the list becomes l = [2183, 5000].
The image above shows the numbers in my actual list. There I would like all the numbers within 850 of each other to be selected and their mean to be found.
It seems you are looking for a clustering algorithm, something like K-means.
This algorithm is implemented in the scikit-learn package.
After you find your K means, you can count how many of your data points were clustered with each mean, and do your computations.
However, it's not clear what K is in your case. You can try running the algorithm for several values of K until your constraint is met (the n+500 distance between the means).
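When K is unknown but the distance threshold is known, a simpler one-dimensional grouping may be enough. The sketch below is not K-means; it is a plain sort-and-sweep alternative, and the function name group_and_average and its width parameter are assumptions made for illustration. It starts a new group whenever a value lies more than width above the smallest value of the current group.

```python
def group_and_average(values, width):
    """Average runs of sorted values that lie within `width` of the run's smallest value."""
    groups = []
    for value in sorted(values):
        # groups[-1][0] is the smallest value of the current group
        if groups and value - groups[-1][0] <= width:
            groups[-1].append(value)
        else:
            groups.append([value])
    # truncate each mean to an integer, as in the desired output
    return [int(sum(group) / len(group)) for group in groups]


print(group_and_average([2000, 2200, 5000, 2350], 500))  # [2183, 5000]
```

Note that the result depends on where each group starts, so values near a boundary may land in a different group than a true clustering algorithm would choose.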
You can use:
import numpy as np
l = np.array([2000,2200,5000,2350])
# find similar numbers (those that fall in the same 500-wide bin)
similar = l // 500
# for each group get the average and convert it to a plain int (as in the desired output)
new_list = [int(np.average(l[similar == num])) for num in np.unique(similar)]
print(new_list)
Output:
[2183, 5000]
Step 1:
list = [5620.77978515625,
7388.43017578125,
7683.580078125,
8296.6513671875,
8320.82421875,
8557.51953125,
8743.5,
9163.220703125,
9804.7939453125,
9913.86328125,
9940.1396484375,
9951.74609375,
10074.23828125,
10947.0419921875,
11048.662109375,
11704.099609375,
11958.5,
11964.8232421875,
12335.70703125,
13103.0,
13129.529296875,
16463.177734375,
16930.900390625,
17712.400390625,
18353.400390625,
19390.96484375,
20089.0,
34592.15625,
36542.109375,
39478.953125,
40782.078125,
41295.26953125,
42541.6796875,
42893.58203125,
44578.27734375,
45077.578125,
48022.2890625,
52535.13671875,
58330.5703125,
61597.91796875,
62757.12890625,
64242.79296875,
64863.09765625,
66930.390625]
Step 2:
seen = [] #to log used indices pairs
diff_dic = {} #to record indices and diff
for i,a in enumerate(list):
    for j,b in enumerate(list):
        if i!=j and (i,j)[::-1] not in seen:
            seen.append((i,j))
            diff_dic[(i,j)] = abs(a-b)
keys = []
for ind, diff in diff_dic.items():
    if diff <= 850:
        keys.append(ind)
uniques_k = [] #to record unique indices
for pair in keys:
    for key in pair:
        if key not in uniques_k:
            uniques_k.append(key)
import numpy as np
list_arr = np.array(list)
nearest_avg = np.mean(list_arr[uniques_k])
list_arr = np.delete(list_arr, uniques_k)
list_arr = np.append(list_arr, nearest_avg)
list_arr
output:
array([ 5620.77978516, 34592.15625, 36542.109375, 39478.953125, 48022.2890625, 52535.13671875, 58330.5703125 , 61597.91796875, 62757.12890625, 66930.390625 , 20566.00205365])
You just need a conditional list comprehension like this:
import numpy as np
l = [2000,2200,5000,2350]
n = 2000
a = [x for x in l if n - 250 < x < n + 250]
Then you can average with
np.mean(a)
or whatever method you prefer.
Can anyone explain why my code for a HackerRank exercise is timing out? I'm new to the whole idea of code efficiency in terms of processing time. The code seems to work on small sets, but once I start testing cases with large datasets it times out. I've provided a brief explanation of the method and its purpose for context. Any tips on functions I'm using that might consume a large amount of runtime would be great.
Complete the migratoryBirds function below.
Params: arr: an array of tallies of species of birds sighted by index.
For example, arr = [Type1 = 1, Type2 = 4, Type3 = 4, Type4 = 4, Type5 = 5, Type6 = 3]
Return the lowest type that has the mode of sightings. In this case 4 sightings is the
mode. Type2 is the lowest type that has the mode. So return integer 2.
def migratoryBirds(arr):
    # list of counts of occurrences of birds types with the same
    # number of sightings
    bird_count_mode = []
    for i in range(1, len(arr) + 1):
        occurr_count = arr.count(i)
        bird_count_mode.append(occurr_count)
    most_common_count = max(bird_count_mode)
    common_count_index = bird_count_mode.index(most_common_count) + 1
    # Find the first occurrence of that common_count_index in arr
    # lowest_type_bird = arr.index(common_count_index) + 1
    # Expect Input: [1,4,4,4,5,3]
    # Expect Output: [1 0 1 3 1 0], 3, 4
    return bird_count_mode, most_common_count, common_count_index
P.S. Thank you for the edit Chris Charley. I just tried to edit it at the same time
Use collections.Counter() to create a dictionary that maps species to their counts. Get the maximum count from this, then get all the species with that count. Then search the list for the first element that is one of those species.
import collections
def migratoryBirds(arr):
    species_counts = collections.Counter(arr)
    most_common_count = max(species_counts.values())
    # .items() yields (species, count) pairs; == compares the counts
    most_common_species = {species for species, count in species_counts.items() if count == most_common_count}
    # enumerate supplies the position alongside each element
    for i, species in enumerate(arr):
        if species in most_common_species:
            return i + 1  # types are numbered from 1 in the question
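On the timeout itself: the original loop calls arr.count(i) once per candidate value, and each call scans the whole list, so the work grows roughly with the square of the input size. The sketch below counts everything in a single pass. It assumes the wanted result is the smallest value among the most frequent ones, which is what common_count_index holds in the question's code; the function name lowest_most_common is hypothetical.

```python
from collections import Counter


def lowest_most_common(arr):
    """Return the smallest value among those that occur most often in arr."""
    counts = Counter(arr)                 # one pass over arr
    highest = max(counts.values())        # the largest number of occurrences
    return min(value for value, count in counts.items() if count == highest)


print(lowest_most_common([1, 4, 4, 4, 5, 3]))  # 4
```

The remaining steps only touch the distinct values, so the total cost stays close to linear in the length of the list.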
I'm a newbie in Python, and I need to find the most frequent element among the pdInput values collected in the list mostFreqenNum, and how many times it occurs.
mostFreqenNum = []
contMostnum = [0]
ContTraining = int(input('How many time You like to Train you input: '))
for i in range(ContTraining):
    pdInput = int(
        input('Please input your number whatever you want: '))
    mostFreqenNum.append(pdInput)
for x in mostFreqenNum:
    coutFreqenNum = contMostnum.count(x)
Given a list of values inp, you can find the most common value like this:
Using collections.Counter:
from collections import Counter
most_common = Counter(inp).most_common(1)
The output is a list containing a single (value, count) tuple.
Using sorted:
sorted(inp, key=lambda x: inp.count(x), reverse=True)[0]
The output is the most common value in the list.
Using numpy (note: only works with non-negative integers):
import numpy as np
np.argmax(np.bincount(inp))
The output is the most common value in the list.
One more using builtins:
max(set(inp), key=inp.count)
The output is the most common value in the list.
Another using pandas:
import pandas as pd
pd.Series(inp).value_counts().index[0]
The output is the most common value in the list.
Why not use Python's built-in statistics module?
You can use it like this:
import statistics
### your input code
mode = statistics.mode(mostFreqenNum)
print(mode)
mode() takes a list (or any other iterable) as its argument.
Then you can use count() to find how many times that value occurs.
Another example:
>>> import statistics
>>> lists = [2,3,2,2,3,4,5]
>>> mode = statistics.mode(lists)
>>> print(mode)
2
>>> lists.count(2)
3
>>>
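When several values tie for most frequent, statistics.mode reports only one of them. On Python 3.8 and later, statistics.multimode returns every tied value in the order first encountered. The following is a small sketch under that version assumption; the helper name modes_with_count is hypothetical, and it expects a non-empty list.

```python
import statistics


def modes_with_count(values):
    """Return all most-frequent values and how often each of them occurs."""
    modes = statistics.multimode(values)   # every value tied for the top count
    # all modes share the same count, so counting the first one is enough
    return modes, values.count(modes[0])


print(modes_with_count([2, 3, 2, 3, 4]))  # ([2, 3], 2)
```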
I am not sure what you are trying to do exactly, but maybe this could work:
mostFreqenNum = {}
contMostnum = 0
myList = [1, 2, 3, 2, 4, 3, 2, 3, 5, 3]
for i in myList:
    if i in mostFreqenNum:
        mostFreqenNum[i] += 1
    else:
        mostFreqenNum[i] = 1
for x in mostFreqenNum:
    if mostFreqenNum[x] > contMostnum:
        contMostnum = mostFreqenNum[x]
        mostFreqKey = x
    else:
        continue
print(f'Most frequent key, {mostFreqKey}, seen {contMostnum} times.')
import statistics
def Prediction_Model_v3():
    alnv3 = [[],[]]
    inpv3 = int(input('How many time You like to Train you input V3: '))
    for i in range(inpv3):
        pdInpv3 = int(
            input('V3 input number whatever you want: '))
        alnv3[0].append(pdInpv3)
        mdv3 = statistics.mode(alnv3[0])
        if(pdInpv3 == mdv3):
            alnv3[1].append(str(len(alnv3[1])))
    print('numberInput V3: ', alnv3[0])
    print('Most Frequent number V3 is ', str(mdv3), ':', str(len(alnv3[1])))
    pdtISv3 = (((inpv3-int(len(alnv3[1])))*100)/inpv3)
    print('Result of prediction V3 is: ', str(
        mdv3), '=', str(pdtISv3), '%')
    alnv3.clear()
    return str(pdtISv3)
from collections import Counter
numbers = [1,3,7,4,3,0,3,6,3]
c = Counter(numbers).most_common()
print(f"The most frequent number {c[0][0]} was repeated {c[0][1]} times")
I need help with the following problem for computer science
A clerk works in a store where the cost of each item is a positive integer number of dollars. So, for example,
something might cost $21, but nothing costs $9.99.
In order to make change a clerk has an unbounded number
of bills in each of the following denominations: $1, $2, $5, $10, and $20.
Write a procedure that takes two
arguments, the cost of an item and the amount paid, and prints how to make change using the smallest
possible number of bills.
Since I am also a beginner, I'll take this as Python practice. Please see the code below:
def pay_change(paid, cost):
    # set up the change and an empty dictionary for result
    change = paid - cost
    result = {}
    # get the result dictionary values for each bill
    n_twenty = change // 20
    result['$20'] = n_twenty
    rest = change % 20
    n_ten = rest // 10
    result['$10'] = n_ten
    rest = rest % 10
    n_five = rest // 5
    result['$5'] = n_five
    rest = rest % 5
    n_two = rest // 2
    result['$2'] = n_two
    rest = rest % 2
    n_one = rest // 1
    result['$1'] = n_one
    # print(result) if you want to check the result dictionary
    # present the result, do not show if value is 0
    for k, v in result.items():
        if v != 0:
            print('Need', v, 'bills of', k)
The logic is to start with the largest bill and work down: // gives the number of bills of the current denomination, and % gives the change still left over for the smaller ones. Either way, we end up with a dictionary that records how many bills of each denomination are needed.
Denominations whose count is 0 don't need to be shown, so I wrote a for loop to examine the values in this dictionary and skip them.
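The same largest-first idea can be written as a loop over the denominations. This is a minimal sketch, assuming the amounts are whole dollars; the function name make_change and the denominations parameter are introduced here for illustration. Taking the largest bill first gives the fewest bills for this particular set ($20, $10, $5, $2, $1), though that is not guaranteed for arbitrary denominations.

```python
def make_change(paid, cost, denominations=(20, 10, 5, 2, 1)):
    """Return {bill value: count} for the change, largest bills first."""
    change = paid - cost
    if change < 0:
        raise ValueError("amount paid is less than the cost")
    bills = {}
    for value in denominations:
        # divmod returns (how many bills fit, what is still left over)
        count, change = divmod(change, value)
        if count:
            bills[value] = count
    return bills


print(make_change(100, 37))  # {20: 3, 2: 1, 1: 1}
```

Returning a dictionary instead of printing keeps the calculation separate from the presentation, so the caller can decide how to display it.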
OK, now I've simplified the code to avoid repeated snippets, and I am quite happy with it:
def pay_change(paid, price):
    # set up the change and the bill labels
    change = paid - price
    bills = ['$20', '$10', '$5', '$2', '$1']
    # divmod gives the number of bills and the remaining change in one step
    result = {}
    for label, value in zip(bills, (20, 10, 5, 2, 1)):
        result[label], change = divmod(change, value)
    # present the result, do not show if value is 0
    for k, v in result.items():
        if v != 0:
            print('Need', v, 'bills of', k)
I have data like this.
Ram,500
Sam,400
Test,100
Ram,800
Sam,700
Test,300
Ram,900
Sam,800
Test,400
What is the shortest way to find the "median" from the above data?
My result should be something like...
Median = 1/2(n+1), where n is the number of data values in the sample.
Test 500
Sam 700
Ram 800
Python 3.4 includes statistics built-in, so you can use the method statistics.median:
>>> from statistics import median
>>> median([1, 3, 5])
3
Use numpy's median function.
It's a little unclear how your data is actually represented, so I've assumed it is a list of tuples:
data = [('Ram',500), ('Sam',400), ('Test',100), ('Ram',800), ('Sam',700),
        ('Test',300), ('Ram',900), ('Sam',800), ('Test',400)]
from collections import defaultdict
def median(mylist):
    sorts = sorted(mylist)
    length = len(sorts)
    # // keeps the index an integer in Python 3
    if not length % 2:
        return (sorts[length // 2] + sorts[length // 2 - 1]) / 2.0
    return sorts[length // 2]
data_dict = defaultdict(list)
for el in data:
    data_dict[el[0]].append(el[1])
print([(key, median(val)) for key, val in data_dict.items()])
print(median([5,2,4,3,1]))
print(median([5,2,4,3,1,6]))
#output:
[('Ram', 800), ('Sam', 700), ('Test', 300)]
3
3.5
The function median returns the median of a list. If there is an even number of entries, it takes the mean of the two middle entries (this is standard).
I've used defaultdict to create a dict keyed by your data and their values, which is a more useful representation of your data.
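The grouping step can also be combined with the standard library's statistics.median instead of a hand-written function. This is a sketch under the same assumption that the data is a list of (name, value) tuples; the function name median_by_name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_name(rows):
    """Group (name, value) pairs by name and return the median per name."""
    grouped = defaultdict(list)
    for name, value in rows:
        grouped[name].append(value)
    return {name: median(values) for name, values in grouped.items()}


rows = [('Ram', 500), ('Sam', 400), ('Test', 100), ('Ram', 800), ('Sam', 700),
        ('Test', 300), ('Ram', 900), ('Sam', 800), ('Test', 400)]
print(median_by_name(rows))  # {'Ram': 800, 'Sam': 700, 'Test': 300}
```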
Check this out:
def median(lst):
    even = (0 if len(lst) % 2 else 1) + 1
    # // keeps the slice index an integer in Python 3
    half = (len(lst) - 1) // 2
    return sum(sorted(lst)[half:half + even]) / float(even)
Note:
sorted(lst) produces a sorted copy of lst;
sum([1]) == 1;
Easiest way to get the median of a list with an odd number of integer items:
x = [1,3,2]
print("The median of x is:", sorted(x)[len(x)//2])
I started with user3100512's answer and quickly realized it doesn't work for an even number of items. I added some conditionals to it to compute the median.
def median(x):
    if len(x)%2 != 0:
        return sorted(x)[len(x)//2]
    else:
        midavg = (sorted(x)[len(x)//2] + sorted(x)[len(x)//2-1])/2.0
        return midavg
median([4,5,6,7])
should return 5.5