I have a list of phone numbers that have been dialed (nums_dialed).
I also have a set of phone numbers which are the numbers in a client's office (client_nums).
How do I efficiently figure out how many times I've called a particular client (total)?
For example:
>>>nums_dialed=[1,2,2,3,3]
>>>client_nums=set([2,3])
>>>???
total=4
Problem is that I have a large-ish dataset: len(client_nums) ~ 10^5; and len(nums_dialed) ~10^3.
Which client has 10^5 numbers in his office? Do you work for an entire telephone company?
Anyway:
print sum(1 for num in nums_dialed if num in client_nums)
That gives you the count about as quickly as possible.
If you want to do it for multiple clients, using the same nums_dialed list, then you could cache the data on each number first:
import collections
nums_dialed_dict = collections.defaultdict(int)
for num in nums_dialed:
    nums_dialed_dict[num] += 1
Then just sum the ones on each client:
sum(nums_dialed_dict[num] for num in this_client_nums)
That would be a lot quicker than iterating over the entire list of numbers again for each client.
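A minimal sketch of the caching idea above, written for Python 3. The `clients` dictionary and its names are made up for illustration; only `nums_dialed` comes from the question.

```python
from collections import Counter

nums_dialed = [1, 2, 2, 3, 3]

# Hypothetical clients, each with its own set of office numbers.
clients = {
    "client_a": {2, 3},
    "client_b": {1, 4},
}

# Count every dialed number once, up front.
dialed_count = Counter(nums_dialed)

# Counter returns 0 for missing keys, so numbers never dialed add nothing.
totals = {
    name: sum(dialed_count[num] for num in client_nums)
    for name, client_nums in clients.items()
}
print(totals)  # {'client_a': 4, 'client_b': 1}
```

With around 10^5 client numbers and only 10^3 dialed ones, it may be cheaper to loop over `dialed_count` and test membership in the client set instead, since that iterates over the smaller collection.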
>>> client_nums = set([2, 3])
>>> nums_dialed = [1, 2, 2, 3, 3]
>>> count = 0
>>> for num in nums_dialed:
...     if num in client_nums:
...         count += 1
...
>>> count
4
>>>
Should be quite efficient even for the large numbers you quote.
Using collections.Counter from Python 2.7:
dialed_count = collections.Counter(nums_dialed)
count = sum(dialed_count[t] for t in client_nums)
This is a very common way to combine sorted lists in a single pass:
nums_dialed = [1, 2, 2, 3, 3]
client_nums = [2,3]
nums_dialed.sort()
client_nums.sort()
c = 0
i = iter(nums_dialed)
j = iter(client_nums)
try:
    a = i.next()
    b = j.next()
    while True:
        if a < b:
            a = i.next()
            continue
        if a > b:
            b = j.next()
            continue
        # a == b
        c += 1
        a = i.next() # next dialed
except StopIteration:
    pass
print c
I wrote it this way because a "set" is an unordered collection (I don't know why it uses hashing rather than a binary tree or a sorted list), so it can't be used directly in a sorted merge like this. You can implement your own "set" on top of "bisect" if you like lists, or use something more elaborate that yields an ordered iterator.
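As a rough sketch of the bisect idea, assuming the client numbers are kept in a sorted list: each lookup is a binary search. The function name `count_calls` is made up for this example.

```python
from bisect import bisect_left

def count_calls(nums_dialed, sorted_client_nums):
    """Count dialed numbers that appear in a sorted list of client numbers."""
    total = 0
    for num in nums_dialed:
        # Binary search for the leftmost position where num could be inserted.
        idx = bisect_left(sorted_client_nums, num)
        if idx < len(sorted_client_nums) and sorted_client_nums[idx] == num:
            total += 1
    return total

client_nums = sorted([3, 2])
print(count_calls([1, 2, 2, 3, 3], client_nums))  # 4
```

Each lookup here is O(log n), whereas a hash-based set lookup is O(1) on average, so this is mostly useful when the sorted order is needed for something else as well.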
The method I use is to simply convert the set into a list and then use the len() function to count its values.
set_var = {"abc", "cba"}
print(len(list(set_var)))
Output:
2
I'm new to Python and hit a wall with the last print in my program.
I got a list of numbers created with math int(numbers that when printed looks like this
[0, 0, 0, 0] #just with random numbers from 1 - 1000
I want to add text in front of every random number in list and print it out like this
[Paul 0 Frederick 0 Ape 0 Ida 0]
Any help would be appreciated. Thanks !
Sounds like you want to make a dictionary. You could type:
import random
d = dict()
d["Paul"] = random.randint(1,100)
....
print(d)
#output: {"Paul":1, "Fredrick":50, "Ape":25, "Ida":32}
Alternatively, nothing stops you from putting strings and integers in the same list: Python is dynamically typed.
If you have a list of numbers [45,5,59,253] and you want to add names to them you probably need a loop.
nums = [45,5,59,253]
names = ["Paul", "Frederick", "Ape", "Ida"]
d = dict()
i = 0
for n in nums:
    d[names[i]] = n
    i+=1
or if you wanted a list
nums = [45,5,59,253]
names = ["Paul", "Frederick", "Ape", "Ida"]
flat = [x for y in zip(names, nums) for x in y]  # avoid naming it "list", which shadows the built-in
You'd have to turn your random integers into strings and add them to the text (string) you want.
Example:
from random import randint
lst=[]
x = str(randint(0,1000))
text = 'Alex'
final_text = text+' '+x
lst.append(final_text)
Just added the space like in your example. It'll just be a little more complex to access the numbers if you do it this way.
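For the whole list at once, something like the following might do, assuming Python 3.6+ for f-strings. The names come from the question; the variable `labelled` is just illustrative.

```python
import random

names = ["Paul", "Frederick", "Ape", "Ida"]
nums = [random.randint(1, 1000) for _ in names]

# Pair each name with its number and format the pair as one string.
labelled = [f"{name} {num}" for name, num in zip(names, nums)]

# Join the pieces with spaces and add the brackets by hand.
print("[" + " ".join(labelled) + "]")
```

Keeping `names` and `nums` as separate lists means the numbers stay easy to access as integers, and the strings are only built for printing.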
I'm a newbie in Python, and I need to find the most frequent element in the list mostFreqenNum (filled from the pdInput values) and how many times that element occurs in the list.
mostFreqenNum = []
contMostnum = [0]
ContTraining = int(input('How many time You like to Train you input: '))
for i in range(ContTraining):
    pdInput = int(
        input('Please input your number whatever you want: '))
    mostFreqenNum.append(pdInput)
for x in mostFreqenNum:
    coutFreqenNum = contMostnum.count(x)
given a list of values inp, you can find the most common like this:
using collections.Counter
from collections import Counter
most_common = Counter(inp).most_common(1)
output is a tuple with (value, count) inside
using sorted
sorted(inp, key=lambda x: inp.count(x), reverse=True)[0]
output is the most common value in the list
using numpy (note: np.bincount only works with non-negative integers):
import numpy as np
np.argmax(np.bincount(inp))
output is the most common value in the list
one more using builtins:
max(set(inp), key=inp.count)
output is the most common value in the list
another using pandas:
import pandas as pd
pd.Series(inp).value_counts().index[0]
output is the most common value in the list
Why don't you use Python's built-in statistics module?
You can use it like this:
import statistics
### your input code
mode = statistics.mode(mostFreqenNum)
print(mode)
mode() takes a list (or any other iterable) of values.
Then you can use count() to find out how many times that value occurs.
Another example:
>>> import statistics
>>> lists = [2,3,2,2,3,4,5]
>>> mode = statistics.mode(lists)
>>> print(mode)
2
>>> lists.count(2)
3
>>>
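If several values tie for most frequent, `statistics.multimode` (available from Python 3.8) returns all of them. A small sketch, with a made-up `values` list:

```python
import statistics

values = [2, 3, 2, 3, 4]

# multimode returns every value tied for the highest count,
# in the order each was first encountered.
modes = statistics.multimode(values)
print(modes)  # [2, 3]

# Count the occurrences of each mode.
print({m: values.count(m) for m in modes})  # {2: 2, 3: 2}
```

Note that before Python 3.8, `statistics.mode` raised `StatisticsError` when there was a tie; from 3.8 on it returns the first mode encountered.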
I am not sure what you are trying to do exactly, but maybe this could work:
mostFreqenNum = {}
contMostnum = 0
myList = [1, 2, 3, 2, 4, 3, 2, 3, 5, 3]
for i in myList:
    if i in mostFreqenNum:
        mostFreqenNum[i] += 1
    else:
        mostFreqenNum[i] = 1
for x in mostFreqenNum:
    if mostFreqenNum[x] > contMostnum:
        contMostnum = mostFreqenNum[x]
        mostFreqKey = x
    else:
        continue
print(f'Most frequent key, {mostFreqKey}, seen {contMostnum} times.')
import statistics
def Prediction_Model_v3():
    alnv3 = [[],[]]
    inpv3 = int(input('How many time You like to Train you input V3: '))
    for i in range(inpv3):
        pdInpv3 = int(
            input('V3 input number whatever you want: '))
        alnv3[0].append(pdInpv3)
        mdv3 = statistics.mode(alnv3[0])
        if(pdInpv3 == mdv3):
            alnv3[1].append(str(len(alnv3[1])))
    print('numberInput V3: ', alnv3[0])
    print('Most Frequent number V3 is ', str(mdv3), ':', str(len(alnv3[1])))
    pdtISv3 = (((inpv3-int(len(alnv3[1])))*100)/inpv3)
    print('Result of prediction V3 is: ', str(
        mdv3), '=', str(pdtISv3), '%')
    alnv3.clear()
    return str(pdtISv3)
from collections import Counter
numbers = [1,3,7,4,3,0,3,6,3]
c = Counter(numbers).most_common()
print(f"The most frequent number {c[0][0]} was {c[0][1]} times repeated")
If I have an input string and an array:
s = "to_be_or_not_to_be"
pos = [15, 2, 8]
I am trying to find the longest common prefix between the consecutive elements of the array pos referencing the original s. I am trying to get the following output:
longest = [3,1]
The way I obtained this is by computing the longest common prefix of the following pairs:
s[15:] which is _be and s[2:] which is _be_or_not_to_be giving 3 ( _be )
s[2:] which is _be_or_not_to_be and s[8:] which is _not_to_be giving 1 ( _ )
However, if s is huge, I don't want to create multiple copies when I do something like s[x:]. After hours of searching, I found the function buffer that maintains only one copy of the input string but I wasn't sure what is the most efficient way to utilize it here in this context. Any suggestions on how to achieve this?
Here is a method without buffer which doesn't copy, as it only looks at one character at a time:
from itertools import islice, izip
s = "to_be_or_not_to_be"
pos = [15, 2, 8]
length = len(s)
for start1, start2 in izip(pos, islice(pos, 1, None)):
    pref = 0
    for pos1, pos2 in izip(xrange(start1, length), xrange(start2, length)):
        if s[pos1] == s[pos2]:
            pref += 1
        else:
            break
    print pref
# prints 3 1
I use islice, izip, and xrange in case you're talking about potentially very long strings.
I also couldn't resist this "One Liner" which doesn't even require any indexing:
[next((i for i, (a, b) in
enumerate(izip(islice(s, start1, None), islice(s, start2, None)))
if a != b),
length - max((start1, start2)))
for start1, start2 in izip(pos, islice(pos, 1, None))]
One final method, using os.path.commonprefix:
[len(commonprefix((buffer(s, n), buffer(s, m)))) for n, m in zip(pos, pos[1:])]
>>> import os
>>> os.path.commonprefix([s[i:] for i in pos])
'_'
Let Python manage memory for you. Don't optimize prematurely.
To get the exact output you could do (as @agf suggested):
print [len(commonprefix([buffer(s, i) for i in adj_indexes]))
for adj_indexes in zip(pos, pos[1:])]
# -> [3, 1]
I think your worrying about copies is unfounded. See below:
>>> s = "how long is a piece of string...?"
>>> t = s[12:]
>>> print t
a piece of string...?
>>> id(t[0])
23295440
>>> id(s[12])
23295440
>>> id(t[2:20]) == id(s[14:32])
True
Unless you're copying the slices and leaving references to the copies hanging around, I wouldn't think it could cause any problem.
edit: There are technical details with string interning and stuff that I'm not really clear on myself. But I'm sure that a string slice is not always a copy:
>>> x = 'google.com'
>>> y = x[:]
>>> x is y
True
I guess the answer I'm trying to give is to just let python manage its memory itself, to begin with, you can look at memory buffers and views later if needed. And if this is already a real problem occurring for you, update your question with details of what the actual problem is.
One way of doing this using buffer is given below. However, there could be much faster ways.
s = "to_be_or_not_to_be"
pos = [15, 2, 8]
lcp = []
length = len(pos) - 1
for index in range(0, length):
    pre = buffer(s, pos[index])
    # buffer's third argument is a size, not an end offset
    cur = buffer(s, pos[index+1], len(pre))
    count = 0
    # order by length so indexing into `longer` cannot run off the end
    shorter, longer = sorted((pre, cur), key=len)
    for i, c in enumerate(shorter):
        if c != longer[i]:
            break
        else:
            count += 1
    lcp.append(count)
print
print lcp
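`buffer` does not exist in Python 3, and its closest replacement, `memoryview`, works on bytes-like objects rather than `str`. A sketch that avoids both by comparing characters through indices, so no slice of `s` is ever created (the helper name `common_prefix_len` is made up here):

```python
def common_prefix_len(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:], without slicing s."""
    n = len(s)
    k = 0
    # Walk both suffixes in lockstep until they differ or one runs out.
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k

s = "to_be_or_not_to_be"
pos = [15, 2, 8]

# Compare each position with the one that follows it in pos.
longest = [common_prefix_len(s, a, b) for a, b in zip(pos, pos[1:])]
print(longest)  # [3, 1]
```

Only the small `pos` list is sliced; the large string is touched one character at a time.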
lst = [1,2,3,4,1]
I want to know that 1 occurs twice in this list. Is there an efficient way to do this?
lst.count(1) would return the number of times it occurs. If you're going to be counting items in a list, O(n) is what you're going to get.
The general function on the list is list.count(x), and will return the number of times x occurs in a list.
Are you asking whether every item in the list is unique?
len(set(lst)) == len(lst)
Whether 1 occurs more than once?
lst.count(1) > 1
Note that the above is not maximally efficient, because it won't short-circuit -- even if 1 occurs twice, it will still count the rest of the occurrences. If you want it to short-circuit you will have to write something a little more complicated.
Whether the first element occurs more than once?
lst[0] in lst[1:]
How often each element occurs?
import collections
collections.Counter(lst)
Something else?
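For the short-circuiting variant mentioned above, one possible sketch uses `itertools.islice` to stop scanning once enough matches have been seen. The function name `occurs_at_least` and its `n` parameter are illustrative, not from the original answer.

```python
from itertools import islice

def occurs_at_least(lst, target, n=2):
    """Return True as soon as target has been seen n times in lst."""
    matches = (x for x in lst if x == target)
    # islice stops pulling from the generator after n matches.
    return len(list(islice(matches, n))) == n

lst = [1, 2, 3, 4, 1]
print(occurs_at_least(lst, 1))  # True
print(occurs_at_least(lst, 2))  # False
```

In the worst case (fewer than `n` matches) this still scans the whole list, so it only helps when the duplicates tend to appear early.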
For multiple occurrences, this gives you the index of each occurrence:
>>> lst=[1,2,3,4,5,1]
>>> tgt=1
>>> found=[]
>>> for index, suspect in enumerate(lst):
...     if(tgt==suspect):
...         found.append(index)
...
>>> print len(found), "found at index:",", ".join(map(str,found))
2 found at index: 0, 5
If you want the count of each item in the list:
>>> lst=[1,2,3,4,5,2,2,1,5,5,5,5,6]
>>> count={}
>>> for item in lst:
...     count[item]=lst.count(item)
...
>>> count
{1: 2, 2: 3, 3: 1, 4: 1, 5: 5, 6: 1}
def valCount(lst):
    res = {}
    for v in lst:
        try:
            res[v] += 1
        except KeyError:
            res[v] = 1
    return res
u = [ x for x,y in valCount(lst).iteritems() if y > 1 ]
u is now a list of all values which appear more than once.
Edit:
@katrielalex: thank you for pointing out collections.Counter, of which I was not previously aware. It can also be written more concisely using a collections.defaultdict, as demonstrated in the following tests. All three methods are roughly O(n) and reasonably close in run-time performance (using collections.defaultdict is in fact slightly faster than collections.Counter).
My intention was to give an easy-to-understand response to what seemed a relatively unsophisticated request. Given that, are there any other senses in which you consider it "bad code" or "done poorly"?
import collections
import random
import time
def test1(lst):
    res = {}
    for v in lst:
        try:
            res[v] += 1
        except KeyError:
            res[v] = 1
    return res
def test2(lst):
    res = collections.defaultdict(lambda: 0)
    for v in lst:
        res[v] += 1
    return res
def test3(lst):
    return collections.Counter(lst)
def rndLst(lstLen):
    r = random.randint
    return [r(0,lstLen) for i in xrange(lstLen)]
def timeFn(fn, *args):
    st = time.clock()
    res = fn(*args)
    return time.clock() - st
def main():
    reps = 5000
    res = []
    tests = [test1, test2, test3]
    for t in xrange(reps):
        lstLen = random.randint(10,50000)
        lst = rndLst(lstLen)
        res.append( [lstLen] + [timeFn(fn, lst) for fn in tests] )
    res.sort()
    return res
And the results, for random lists containing up to 50,000 items, are as follows:
[Chart: vertical axis is time in seconds, horizontal axis is number of items in list]
Another way to get all items that occur more than once:
lst = [1,2,3,4,1]
d = {}
for x in lst:
    d[x] = x in d
print d[1] # True
print d[2] # False
print [x for x in d if d[x]] # [1]
You could also sort the list which is O(n*log(n)), then check the adjacent elements for equality, which is O(n). The result is O(n*log(n)). This has the disadvantage of requiring the entire list be sorted before possibly bailing when a duplicate is found.
For a large list with relatively rare duplicates, this could be about the best you can do. The best approach really does depend on the size of the data involved and its nature.
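The sort-then-compare idea could look roughly like this in Python 3 (the function name `has_duplicate` is made up for the example):

```python
def has_duplicate(lst):
    """Return True if any value appears more than once."""
    ordered = sorted(lst)  # O(n log n); leaves the original list untouched
    # Compare each element with its right-hand neighbour: O(n).
    return any(a == b for a, b in zip(ordered, ordered[1:]))

print(has_duplicate([1, 2, 3, 4, 1]))  # True
print(has_duplicate([1, 2, 3, 4]))     # False
```

The `any` call stops at the first equal pair, but as noted above, the full sort has already been paid for by then.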
I'm working through exercises in Building Skills in Python, which to my knowledge don't have any published solutions.
In any case, I'm attempting to have a dictionary count the number of occurrences of a certain number in the original list, before duplicates are removed. For some reason, despite a number of variations on the theme below, I can't seem to increment the value for each of the 'keys' in the dictionary.
How could I code this with dictionaries?
dv = list()
# arbitrary sequence of numbers
seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
# dictionary counting number of occurances
seqDic = { }
for v in seq:
    i = 1
    dv.append(v)
    for i in range(len(dv)-1):
        if dv[i] == v:
            del dv[-1]
    seqDic.setdefault(v)
    currentCount = seqDic[v]
    currentCount += 1
    print currentCount # debug
    seqDic[v]=currentCount
print "orig:", seq
print "new: ", dv
print seqDic
defaultdict is not dict (it's a subclass, and may do too much of the work for you to help you learn via this exercise), so here's a simple way to do it with plain dict:
dv = list()
# arbitrary sequence of numbers
seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
# dictionary counting number of occurances
seqDic = { }
for i in seq:
    if i in seqDic:
        seqDic[i] += 1
    else:
        dv.append(i)
        seqDic[i] = 1
this simple approach works particularly well here because you need the if i in seqDic test anyway for the purpose of building dv as well as seqDic. Otherwise, simpler would be:
for i in seq:
    seqDic[i] = 1 + seqDic.get(i, 0)
using the handy method get of dict, which returns the second argument if the first is not a key in the dictionary. If you like this idea, here's a solution that also builds dv:
for i in seq:
    seqDic[i] = 1 + seqDic.get(i, 0)
    if seqDic[i] == 1: dv.append(i)
Edit: If you don't care about the order of items in dv (rather than wanting dv to be in the same order as the first occurrence of item in seq), then just using (after the simple version of the loop)
dv = seqDic.keys()
also works (in Python 2, where .keys returns a list), and so does
dv = list(seqDic)
which is fine in both Python 2 and Python 3. Under the same hypothesis (that you don't care about the order of items in dv) there are also other good solutions, such as
seqDic = dict.fromkeys(seq, 0)
for i in seq: seqDic[i] += 1
dv = list(seqDic)
here, we first use the fromkeys class method of dictionaries to build a new dict which already has 0 as the value corresponding to each key, so we can then just increment each entry without such precautions as .get or membership checks.
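On Python 3.7 and later, dicts keep insertion order, so the ordering caveat above no longer applies there: `list(seqDic)` comes out in first-occurrence order. A quick sketch of the fromkeys version under that assumption:

```python
seq = [2, 4, 5, 2, 4, 6, 3, 8, 9, 3, 7, 2, 47, 2]

# Every distinct value becomes a key with a starting count of 0.
seqDic = dict.fromkeys(seq, 0)
for i in seq:
    seqDic[i] += 1

# Keys come back in the order they were first inserted (Python 3.7+).
dv = list(seqDic)
print(dv)      # [2, 4, 5, 6, 3, 8, 9, 7, 47]
print(seqDic)  # {2: 4, 4: 2, 5: 1, 6: 1, 3: 2, 8: 1, 9: 1, 7: 1, 47: 1}
```

On Python 2, or Python 3 before 3.7, the order of `dv` is not guaranteed.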
defaultdict makes this easy:
>>> from collections import defaultdict
>>> seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
>>> seqDic = defaultdict(int)
>>> for v in seq:
...     seqDic[v] += 1
>>> print seqDic
defaultdict(<type 'int'>, {2: 4, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 47: 1})
I'm not really sure what you try to do .. count how often each number appears?
#arbitrary sequence of numbers
seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
#dictionary counting number of occurances
seqDic = {}
### what you want to do, spelled out
for number in seq:
    if number in seqDic: # we had the number before
        seqDic[number] += 1
    else: # first time we see it
        seqDic[number] = 1
#### or:
for number in seq:
    current = seqDic.get(number, 0) # current count in the dict, or 0
    seqDic[number] = current + 1
### or, to show you how setdefault works
for number in seq:
    seqDic.setdefault(number, 0) # set to 0 if it doesnt exist
    seqDic[number] += 1 # increase by one
print "orig:", seq
print seqDic
How about this:
#arbitrary sequence of numbers
seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
#dictionary counting number of occurances
seqDic = { }
for v in seq:
    if v in seqDic:
        seqDic[v] += 1
    else:
        seqDic[v] = 1
dv = seqDic.keys()
print "orig:", seq
print "new: ", dv
print seqDic
It's clean and I think it demonstrates what you are trying to learn how to do in a simple manner. It is possible to do this using defaultdict as others have pointed out, but knowing how to do it this way is instructive too.
Or, if you use Python 2.7 or later (including Python 3), you can use collections.Counter, which is essentially a dict, albeit subclassed.
>>> from collections import Counter
>>> seq = [2,4,5,2,4,6,3,8,9,3,7,2,47,2]
>>> Counter(seq)
Counter({2: 4, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 47: 1})
for v in seq:
    try:
        seqDic[v] += 1
    except KeyError:
        seqDic[v] = 1
That's the way I've always done the inner loop of things like this.
Apart from anything else, it's significantly faster than testing membership before working on the element, so if you have a few hundred thousand elements it saves a lot of time.
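Whether try/except really beats the membership test likely depends on how often keys repeat and on the Python version, so it is worth measuring on your own data. A rough sketch using `timeit`; the function names and the `data` list are made up for this comparison:

```python
import timeit

def count_try(seq):
    """Count with try/except: cheap when the key usually exists."""
    counts = {}
    for v in seq:
        try:
            counts[v] += 1
        except KeyError:
            counts[v] = 1
    return counts

def count_membership(seq):
    """Count by testing membership before each update."""
    counts = {}
    for v in seq:
        if v in counts:
            counts[v] += 1
        else:
            counts[v] = 1
    return counts

# Hypothetical test data: many repeats, so the KeyError path is rare.
data = [i % 100 for i in range(100_000)]

for fn in (count_try, count_membership):
    elapsed = timeit.timeit(lambda: fn(data), number=10)
    print(f"{fn.__name__}: {elapsed:.3f}s")
```

With mostly unique values the picture may reverse, because raising and catching an exception costs more than a failed membership test.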