I need to build a counting function starting from a dictionary. The dictionary is a classical Bag_of_Words and looks as follows:
D={'the':5, 'pow':2, 'poo':2, 'row':2, 'bub':1, 'bob':1}
I need a function that, for a given integer, returns the number of words with at least that number of occurrences. In the example, F(2)=4: all words but 'bub' and 'bob'.
First of all I build up the inverse dictionary of D:
ID={5:1, 2:3, 1:2}
I think I'm fine with that. Then here is the code:
values=list(ID.keys())
values.sort(reverse=True)
Lk=[]
Nw=0
for val in values:
    Nw=Nw+ID[val]
    Lk.append([Nw, val])
The code works fine, but I do not like it. I would prefer to use a list comprehension to build up Lk, and I really hate the Nw variable I have used. It does not seem pythonic at all.
You can create a sorted array of your word counts, then find the insertion point with np.searchsorted to get how many items lie on either side of it. np.searchsorted uses a binary search, so it is very fast. If your dictionary doesn't change often, this call is basically free compared to other methods.
import numpy as np
def F(n, D):
    # creating the array each time would be slow; if D doesn't change,
    # move this outside the function
    arr = np.array(list(D.values()))  # list() is needed on Python 3, where values() is a view
    arr.sort()
    L = len(arr)
    return L - np.searchsorted(arr, n)  # this line does all the work...
What's going on:
First we take just the word counts (and convert them to a sorted array)...
D = {"I'm": 12, "pretty": 3, "sure":12, "the": 45, "Donald": 12, "is": 3, "on": 90, "crack": 11}
vals = np.array(list(D.values()))
#vals = array([90, 12, 12, 3, 11, 12, 45, 3])
vals.sort()
#vals = array([ 3, 3, 11, 12, 12, 12, 45, 90])
Then, if we want to know how many values are greater than or equal to n, we simply take the length of the array from the first number greater than or equal to n onwards. We do this by finding the leftmost index where n could be inserted while keeping the array sorted (a binary search), and subtracting that index from the total number of elements (len).
# how many are >= 10?
# insertion point for value of 10..
#
# | index: 2
# v
# array([ 3, 3, 11, 12, 12, 12, 45, 90])
#find how many elements there are
#len(arr) = 8
#subtract.. 8-2 = 6 elements that are >= 10
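The same leftmost-insertion-point idea can be sketched with the standard library's bisect module instead of NumPy. This is a swapped-in illustration rather than the code from the answer above, and the helper name count_at_least is made up for the example.

```python
from bisect import bisect_left

D = {"I'm": 12, "pretty": 3, "sure": 12, "the": 45,
     "Donald": 12, "is": 3, "on": 90, "crack": 11}

# Sort the counts once; the sorted list can be reused for every query.
sorted_counts = sorted(D.values())

def count_at_least(sorted_vals, n):
    """Number of entries >= n in an ascending list (hypothetical helper)."""
    # bisect_left returns the leftmost index where n could be inserted
    # while keeping the list sorted; everything from there on is >= n.
    return len(sorted_vals) - bisect_left(sorted_vals, n)

print(count_at_least(sorted_counts, 10))  # 8 - 2 = 6
```

Sorting costs O(n log n) once, after which each query is a single binary search.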
A fun little trick for counting things: True has a numerical value of 1 and False has a numerical value of 0. So we can do things like
sum(v >= k for v in D.values())
where k is the value you're comparing against.
collections.Counter() is an ideal choice for this. Use it on the values of the dict. It is also part of the standard library, so unlike numpy you do not need to install anything. Example:
>>> from collections import Counter
>>> D = {'the': 5, 'pow': 2, 'poo': 2, 'row': 2, 'bub': 1, 'bob': 1}
>>> c = Counter(D.values())
>>> c
Counter({2: 3, 1: 2, 5: 1})
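Starting from that Counter, the running total the question keeps in Nw could be produced with itertools.accumulate, which leaves only a comprehension to build Lk. This is a sketch under the assumption that Lk should hold [cumulative word count, occurrence value] pairs in descending order of value, as in the question's loop.

```python
from collections import Counter
from itertools import accumulate

D = {'the': 5, 'pow': 2, 'poo': 2, 'row': 2, 'bub': 1, 'bob': 1}
c = Counter(D.values())                      # occurrences -> number of words

values = sorted(c, reverse=True)             # [5, 2, 1]
running = accumulate(c[v] for v in values)   # yields 1, 4, 6
Lk = [[nw, val] for nw, val in zip(running, values)]
print(Lk)  # [[1, 5], [4, 2], [6, 1]]
```

Each pair reads as "nw words occur at least val times", so F(2) is 4, matching the question.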
I know there is an exhaustive solution that iterates through every single number, but how to implement the divide and conquer approach?
Given an array of integers without repeated numbers and a target product, return a list containing every number that belongs to a pair whose product equals the target.
def product_pair(num_arr, product):
    """
    :type num_arr: List[int]
    :type product: int
    :rtype: List[int]
    """
Example 1. Product_pair([3, 5, 9, 10, 23, 53], 20) => []
Example 2. Product_pair([10, 2, 9, 30, 5, 1], 10) => [10, 2, 5, 1]
Well I'm not so sure about divide and conquer, but this will be rather efficient and quite simple:
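def product_pair(num_arr, product):
    value_set = set(num_arr)
    # integer arithmetic avoids float rounding; skip 0 to avoid dividing by zero
    sol = [n for n in num_arr
           if n != 0 and product % n == 0 and product // n in value_set]
    return sol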
You can do this as follows:
def f(lst, n):
    lst = list(filter(lambda x: x<=n, lst)) # Note 3
    res = []
    seen = set()
    for i, x in enumerate(lst[:-1]): # Note 4
        if x in seen:
            continue
        rem = n / x
        if rem in lst[i+1:]: # Note 1, 2
            seen.add(rem)
            res.extend([x, int(rem)])
    return res
which for your examples, produces:
print(f([3, 5, 9, 10, 23, 53], 20)) # -> []
print(f([10, 2, 9, 30, 5, 1], 10)) # -> [10, 1, 2, 5]
Notes
1. Optimized membership test: you only look for membership in the slice of the list after the current element. If there was something before, you would have already found it.
2. I am assuming here that your list of candidates only contains integers.
3. You can filter out numbers that are bigger than the target number. No integer complement can exist for those.
4. It follows from Note 1 that there cannot be anything you haven't found already when reaching the last number on the list.
General
Duplicates in original list: e.g., having two 4s would not return a (4, 4) result for a target number of 16.
This is definitely not the fastest one can do, but it is not too slow either.
I am trying to solve an assignment with 13 lights: counting from light 1, every 5th light is turned off, and when the count passes 13 it continues from the 1st light again. The function should return the order in which the lights are turned off. In this case, for a list of 13 items, the returned list would be [5, 10, 2, 8, 1, 9, 4, 13, 12, 3, 7, 11, 6]. Lights that are already turned off are not counted again.
So the way I was going to approach this problem was to have a list named turnedon, which is [1,2,3,4,5,6,7,8,9,10,11,12,13] and an empty list called orderoff and append to this list whenever a light gets turned off in the turnedon list. So while the turnedon is not empty, iterate through the turnedon list and append the light getting turned off and remove that turnedoff light from the turnedon list, if that makes sense. I cannot figure out what should go into the while loop though. Any idea would be really appreciated.
def orderoff():
    n=13
    turnedon=[]
    for n in range(1,n+1):
        turnedon.append(n)
    orderoff=[]
    while turnedon !=[]:  # <- what goes in here?
This problem is equivalent to the well-known Josephus problem, in which n prisoners stand in a circle and are killed in sequence: each time, the next person to be killed is k steps around the circle from the previous one, and the steps are counted only over the remaining prisoners. A sample solution in Python can be found on the Rosetta Code website, which I've adapted slightly below:
def josephus(n, k):
    p = list(range(1, n+1))
    i = 0
    seq = []
    while p:
        i = (i+k-1) % len(p)
        seq.append(p.pop(i))
    return seq
Example:
>>> josephus(13, 5)
[5, 10, 2, 8, 1, 9, 4, 13, 12, 3, 7, 11, 6]
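The same elimination order can also be sketched with collections.deque, rotating the circle so that the light to switch off always sits at the front. This is an alternative illustration, not the Rosetta Code version; the name josephus_deque is made up here.

```python
from collections import deque

def josephus_deque(n, k):
    """Order in which n lights go off when every k-th remaining one is removed."""
    circle = deque(range(1, n + 1))
    order = []
    while circle:
        # Move the first k-1 remaining lights to the back,
        # so the k-th one ends up at the front.
        circle.rotate(-(k - 1))
        order.append(circle.popleft())
    return order

print(josephus_deque(13, 5))
# [5, 10, 2, 8, 1, 9, 4, 13, 12, 3, 7, 11, 6]
```

Only lights still in the deque are counted, which is exactly the "turned off lights do not count again" rule.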
This works, but the results are different from yours, because it keeps counting over lights that are already off instead of skipping them:
>>> pos = 0
>>> result = []
>>> while len(result) < 13 :
...     pos += 5
...     pos %= 13
...     if pos not in result :
...         result.append(pos)
...
>>> result = [i+1 for i in result] # make it 1-based, not 0-based
>>> result
[6, 11, 3, 8, 13, 5, 10, 2, 7, 12, 4, 9, 1]
>>>
I think a simpler solution would be to use a loop, add the displacement each time, and use the modulo operator to keep the number in range. Note that, as written, this counts over lights that are already off:
def orderoff(lights_num, step):
    turnd_off = []
    num = 0
    for i in range(lights_num):
        num = ((num+step-1) % lights_num) + 1
        turnd_off.append(num)
    return turnd_off
print(orderoff(13, 5))
I am using the itertools library module in python.
I am interested in the different ways to choose 15 of the first 26000 positive integers. The function itertools.combinations(range(1,26000), 15) enumerates all of these possible subsets, in lexicographical order.
The binomial coefficient 26000 choose 15 is a very large number, on the order of 10^54. However, python has no problem running the code y = itertools.combinations(range(1,26000), 15) as shown in the sixth line below.
If I try to do y[3] to find just the 3rd entry, I get a TypeError. This means I need to convert it into a list first. The problem is that trying to convert it into a list gives a MemoryError. All of this is shown in the screenshot above.
Converting it into a list does work for smaller combinations, like 6 choose 3, shown below.
My question is:
Is there a way to access specific elements in itertools.combinations() without converting it into a list?
I want to be able to access, say, the first 10000 of these ~10^54 enumerated 15-element subsets.
Any help is appreciated. Thank you!
You can use a generator expression:
comb = itertools.combinations(range(1,26000), 15)
comb1000 = (next(comb) for i in range(1000))
To jump directly to the nth combination, here is an itertools recipe:
def nth_combination(iterable, r, index):
    """Equivalent to list(combinations(iterable, r))[index]"""
    pool = tuple(iterable)
    n = len(pool)
    if r < 0 or r > n:
        raise ValueError
    c = 1
    k = min(r, n-r)
    for i in range(1, k+1):
        c = c * (n - k + i) // i
    if index < 0:
        index += c
    if index < 0 or index >= c:
        raise IndexError
    result = []
    while r:
        c, n, r = c*r//n, n-1, r-1
        while index >= c:
            index -= c
            c, n = c*(n-r)//n, n-1
        result.append(pool[-1-n])
    return tuple(result)
It's also available in more_itertools.nth_combination
>>> import more_itertools # pip install more-itertools
>>> more_itertools.nth_combination(range(1,26000), 15, 123456)
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19541)
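To instantly "fast-forward" a combinations instance to this position and continue iterating, you can set the state to the previously yielded state (note: 0-based state vector) and continue from there. Be aware that this relies on the pickling support of itertools objects, which was deprecated in Python 3.12, so it may not work on newer versions: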
>>> comb = itertools.combinations(range(1,26000), 15)
>>> comb.__setstate__((0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 19540))
>>> next(comb)
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19542)
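To make the indexing idea behind nth_combination easier to follow, here is a simplified sketch that uses math.comb (Python 3.8+) instead of updating the binomial coefficient incrementally. The name nth_combination_sketch is hypothetical, and unlike the recipe it does not support negative indices. At each position it asks how many combinations start with the current element and either takes that element or skips past that whole block.

```python
from itertools import combinations
from math import comb

def nth_combination_sketch(iterable, r, index):
    """Return the index-th r-combination in lexicographic order (0-based)."""
    pool = tuple(iterable)
    n = len(pool)
    if not 0 <= index < comb(n, r):
        raise IndexError("combination index out of range")
    result = []
    start = 0
    while r:
        # Number of combinations that use pool[start] as their next element.
        block = comb(n - start - 1, r - 1)
        if index < block:
            result.append(pool[start])   # the target lies inside this block
            r -= 1
        else:
            index -= block               # skip the whole block
        start += 1
    return tuple(result)

# Cross-check against brute force on a small case (6 choose 3 = 20).
expected = list(combinations(range(1, 7), 3))
assert [nth_combination_sketch(range(1, 7), 3, i) for i in range(20)] == expected
```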
If you want to access the first few elements, it's pretty straightforward with islice:
import itertools
print(list(itertools.islice(itertools.combinations(range(1,26000), 15), 1000)))
Note that islice internally iterates the combinations up to the specified point, so it can't magically give you the middle elements without iterating all the way there. You'd have to go down the route of computing the elements you want combinatorially in that case.
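As a small illustration of that behaviour, islice also accepts a start and a stop, which makes the skipping explicit. The variable names below are just for the example.

```python
from itertools import combinations, islice

# Small case: 6 choose 3 = 20 subsets.
small = combinations(range(1, 7), 3)
# Skip the first two subsets, then take the next three.
window = list(islice(small, 2, 5))
print(window)  # [(1, 2, 5), (1, 2, 6), (1, 3, 4)]

# The large case stays lazy: only the requested subsets are generated.
big = combinations(range(1, 26000), 15)
first_three = list(islice(big, 3))
print(first_three[0])  # (1, 2, ..., 15)
```

The skipped items are still generated and discarded one by one, so a start of 10^50 would never finish.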
Say I have a simple list of numbers, e.g.
simple_list = range(100)
I would like to shorten this list such that the gaps between the values are greater than or equal to 5 for example, so it should look like
[0, 5, 10...]
FYI the actual list does not have regular increments but it is ordered
I'm trying to use list comprehension to do it but the below obviously returns an empty list:
simple_list2 = [x for x in simple_list if x-simple_list[max(0,x-1)] >= 5]
I could do it in a loop by appending to a list if the condition is met but I'm wondering specifically if there is a way to do it using list comprehension?
This is not a use case for a comprehension. You have to use a loop, because any number of consecutive elements may lie within five of each other; you cannot just check the next element, or any fixed number of elements, unless you know the data has some very specific format:
simple_list = list(range(100))  # list() so that slice assignment works on Python 3
def f(l):
    it = iter(l)
    i = next(it)
    for ele in it:
        if abs(ele - i) >= 5:
            yield i
            i = ele
    yield i
simple_list[:] = f(simple_list)
print(simple_list)
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
A better example to use would be:
l = [1, 2, 2, 2, 3, 3, 3, 10, 12, 13, 13, 18, 24]
l[:] = f(l)
print(l)
Which would return:
[1, 10, 18, 24]
If your data is always in ascending order, you can remove the abs and just use if ele - i >= 5.
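On Python 3.8 or later, an assignment expression can carry the "last kept value" through a comprehension, so a one-liner is possible, although it is arguably less readable than the generator. The sketch below assumes ascending input, and the name thin_by_gap is made up for the example.

```python
def thin_by_gap(values, min_gap=5):
    """Keep a value only if it is at least min_gap above the last kept one."""
    last = None
    # (last := x) both records x as the last kept value and emits it.
    return [(last := x) for x in values
            if last is None or x - last >= min_gap]

print(thin_by_gap(range(20)))
# [0, 5, 10, 15]
print(thin_by_gap([1, 2, 2, 2, 3, 3, 3, 10, 12, 13, 13, 18, 24]))
# [1, 10, 18, 24]
```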
If I understand your question correctly, which I'm not sure I do (please clarify), you can do this easily. Assume that a is the list you want to process.
[v for i,v in enumerate(a) if abs(a[i] - a[i - 1]) >= 5]
This gives all elements whose difference from the previous one (or should it be the next one?) is greater than or equal to 5. There are some variations of this, according to what you need. Should the first element not be compared and excluded? The implementation above compares it with index -1 (the last element) and includes it if the criterion is met; this one excludes it from the result:
[v for i,v in enumerate(a) if i != 0 and abs(a[i] - a[i - 1]) >= 5]
On the other hand, should it always be included? Then use this:
[v for i,v in enumerate(a) if (i != 0 and abs(a[i] - a[i - 1]) >= 5) or (i == 0)]
Given a list of integers, is there a built-in method to find the maximum distance between adjacent values?
So if I have this array
[1, 3, 5, 9, 15, 30]
The max step between the values is 15. Does the list object have a method to do that?
No, list objects have no standard "adjacent differences" method or the like. However, using the pairwise function mentioned in the itertools recipes (available directly as itertools.pairwise since Python 3.10):
from itertools import tee
def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
...you can (concisely and efficiently) define
>>> max(b-a for (a,b) in pairwise([1, 3, 5, 9, 15, 30]))
15
No, but it's trivial to code:
def max_dist(data):
    last = data[0]
    dist = 0
    for i in data[1:]:
        dist = max(dist, i-last)
        last = i
    return dist
You can do:
>>> s = [1, 3, 5, 9, 15, 30]
>>> max(x[0] - x[1] for x in zip(s[1:], s))
15
This uses max and zip. It computes the difference between all consecutive elements and returns the max of those.
l=[1, 3, 5, 9, 15, 30]
max([j-i for i, j in zip(l[:-1], l[1:])])
That is using pure python and gives you the desired output "15".
If you like to work with "numpy" you could do:
import numpy as np
max(np.diff(l))
The list object does not. However, it is pretty quick to write a function that does that:
def max_step(my_list):
    max_step = 0
    for ind in range(len(my_list)-1):
        step = my_list[ind+1] - my_list[ind]
        if step > max_step:
            max_step = step
    return max_step
>>> max_step([1, 3, 5, 9, 15, 30])
15
Or if you prefer even shorter:
max_step = lambda l: max([l[i+1] - l[i] for i in range(len(l)-1)])
>>> max_step([1, 3, 5, 9, 15, 30])
15
It is possible to use the reduce() function, but it is not that elegant as you need some way to keep track of the previous value:
from functools import reduce
def step(maxStep, cur):
    if isinstance(maxStep, int):
        maxStep = (abs(maxStep-cur), cur)
    return (max(maxStep[0], abs(maxStep[1]-cur)), cur)
l = [1, 3, 5, 9, 15, 30]
print(reduce(step, l)[0])
The solution works by returning the accumulated max and the previous value as a tuple at each iteration.
Also what is the expected outcome for [10,20,30,5]? Is it 10 or 25? If 25 then you need to add abs() to your calculation.
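The difference between the two readings can be shown with a short sketch. The absolute flag is an assumption added for illustration; the question does not say which behaviour is wanted.

```python
def max_step(values, absolute=False):
    """Largest difference between adjacent values.

    With absolute=False only increases count; with absolute=True the
    size of a drop counts as well. Needs at least two values.
    """
    diffs = (b - a for a, b in zip(values, values[1:]))
    if absolute:
        diffs = (abs(d) for d in diffs)
    return max(diffs)

print(max_step([1, 3, 5, 9, 15, 30]))            # 15
print(max_step([10, 20, 30, 5]))                 # 10
print(max_step([10, 20, 30, 5], absolute=True))  # 25
```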