I want to generate two different lists of random numbers.
The condition is that the value at each index of the first list must not equal the value at the same index of the second list.
For example, with a=[5,6,7,5] and q=[2,7,3,5], the value at the fourth position of q equals the value at the same position of a. I want to avoid this. I created list a as follows:
import random
a=[]
b=list(range(1,7164))
for i in b:
    t=random.randint(1,20)
    a.append(t)
How can I generate the second list with the above condition?
import random
def generate_n_lists(num_of_lists, num_of_elements, value_from=0, value_to=100):
    s = random.sample(range(value_from, value_to + 1), num_of_lists * num_of_elements)
    return [s[i*num_of_elements:(i+1)*num_of_elements] for i in range(num_of_lists)]
print(generate_n_lists(2, 5, 0, 20)) # generate 2 lists, each 5 elements, values are from 0 to 20
Prints:
[[1, 16, 4, 3, 15], [0, 10, 14, 17, 7]]
This creates a and q as tuples, but you can easily convert them to lists.
In [29]: import random
In [30]: size = 15
In [31]: maxval = 20
In [32]: a, q = zip(*[random.sample(range(1, maxval+1), 2) for z in range(size)])
In [33]: a
Out[33]: (18, 7, 12, 6, 17, 16, 12, 1, 14, 20, 9, 5, 8, 5, 18)
In [34]: q
Out[34]: (12, 10, 6, 1, 12, 15, 20, 7, 6, 10, 5, 7, 16, 7, 10)
I think the best approach is to iterate over each item and offset it by a random amount chosen so that the result can't equal the original value.
Add the following to the end of your code:
c = []
for i in range(len(a)):
    t = (a[i] - 1 + random.randint(1, 19)) % 20 + 1
    c.append(t)
This shifts each item by an offset between 1 and 19 and wraps around modulo 20, so the result can never land back on the original value. (The -1 and +1 keep the result between 1 and 20 rather than 0 and 19.)
To avoid it, just check whether the value repeats. If it does, generate a different random number.
import random
n = 7163
a = [random.randint(1, 20) for _ in range(n)]
q = []
for i in range(n):
    t = random.randint(1, 20)
    # re-roll while t clashes with the first list at this index
    while t == a[i]:
        t = random.randint(1, 20)
    q.append(t)
print(q)
import random
a=[]
b=[]
for rand_a in range(7163):
    a.append(random.randint(1,20))
random.seed()
for rand_b in range(7163):
    r = random.randint(1,20)
    # keep rolling until you get a different number
    while (a[rand_b] == r):
        r = random.randint(1,20)
    b.append(r)
Your code example had one random list and one list of the numbers 1 to 7163.
This code generates two lists of values from 1 to 20, each with 7163 elements, where the values at the same position in the two lists always differ.
The random.seed() call probably isn't needed.
There are multiple ways to do this.
One would be to generate the second random value in a smaller range, and offset if it equals or exceeds the excluded value:
excluded_value = first_list[i]
new_value = random.randint(1, 19)
if new_value >= excluded_value:
    new_value += 1
Another is to generate the lists at the same time, using random.sample to select without replacement.
possible_values = range(1, 21) # or xrange on python 2.x
first_list, second_list = [], []
i = 0
while i < desired_num_values:
    a, b = random.sample(possible_values, 2)
    first_list.append(a)
    second_list.append(b)
    i += 1
I have not profiled to see if there's a notable performance difference. Both seem likely to be faster than repeatedly generating a random number until there isn't a conflict (but again, I haven't profiled to confirm). The second scales more gracefully if you want more than two lists.
These are not the only ways to do this.
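As a minimal sketch of the first approach above, the helper below builds the whole second list by drawing from a range that is one value smaller and shifting any draw at or above the excluded value. The function name `second_list_excluding` and its parameters `low` and `high` are illustrative and not from the original answer; it assumes every value of the first list lies within `low..high`.

```python
import random

def second_list_excluding(first_list, low=1, high=20):
    """Return a list whose i-th value is in [low, high] and differs from first_list[i]."""
    result = []
    for excluded_value in first_list:
        # Draw from a range with one fewer value ...
        new_value = random.randint(low, high - 1)
        # ... and skip over the excluded value by shifting upwards.
        if new_value >= excluded_value:
            new_value += 1
        result.append(new_value)
    return result

first_list = [random.randint(1, 20) for _ in range(7163)]
second_list = second_list_excluding(first_list)
```

Each element needs exactly one random draw, so there is no retry loop whose length depends on chance.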
I am using the itertools library module in python.
I am interested in the different ways to choose 15 of the first 26000 positive integers. The function itertools.combinations(range(1,26000), 15) enumerates all of these possible subsets, in a lexicographical ordering.
The binomial coefficient 26000 choose 15 is a very large number, on the order of 10^54. However, python has no problem running the code y = itertools.combinations(range(1,26000), 15) as shown in the sixth line below.
If I try to do y[3] to find just the 3rd entry, I get a TypeError. This means I need to convert it into a list first. The problem is that trying to convert it into a list gives a MemoryError. All of this is shown in the screenshot above.
Converting it into a list does work for smaller combinations, like 6 choose 3, shown below.
My question is:
Is there a way to access specific elements in itertools.combinations() without converting it into a list?
I want to be able to access, say, the first 10000 of these ~10^54 enumerated 15-element subsets.
Any help is appreciated. Thank you!
You can use a generator expression:
comb = itertools.combinations(range(1,26000), 15)
comb1000 = (next(comb) for i in range(1000))
To jump directly to the nth combination, here is an itertools recipe:
def nth_combination(iterable, r, index):
    """Equivalent to list(combinations(iterable, r))[index]"""
    pool = tuple(iterable)
    n = len(pool)
    if r < 0 or r > n:
        raise ValueError
    c = 1
    k = min(r, n-r)
    for i in range(1, k+1):
        c = c * (n - k + i) // i
    if index < 0:
        index += c
    if index < 0 or index >= c:
        raise IndexError
    result = []
    while r:
        c, n, r = c*r//n, n-1, r-1
        while index >= c:
            index -= c
            c, n = c*(n-r)//n, n-1
        result.append(pool[-1-n])
    return tuple(result)
It's also available in more_itertools.nth_combination
>>> import more_itertools # pip install more-itertools
>>> more_itertools.nth_combination(range(1,26000), 15, 123456)
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19541)
To instantly "fast-forward" a combinations instance to this position and continue iterating, you can set the state to the previously yielded state (note: 0-based state vector) and continue from there:
>>> comb = itertools.combinations(range(1,26000), 15)
>>> comb.__setstate__((0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 19540))
>>> next(comb)
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 19542)
If you want to access the first few elements, it's pretty straightforward with islice:
import itertools
print(list(itertools.islice(itertools.combinations(range(1,26000), 15), 1000)))
Note that islice internally iterates the combinations up to the specified point, so it can't magically give you the middle elements without iterating all the way there. You'd have to go down the route of computing the elements you want combinatorially in that case.
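To illustrate that point on a case small enough to check, the sketch below uses islice with a start and a stop to pull a window of combinations and compares it with slicing the fully materialised list. The small pool `range(6)` and the window 5..8 are chosen purely for illustration; for the 26000-element case only the islice form is feasible, and it still has to step through every skipped combination.

```python
import itertools

# Small pool so that the full list can be built for comparison.
pool = range(6)

# islice consumes and discards the first 5 combinations, then yields 3.
window = list(itertools.islice(itertools.combinations(pool, 3), 5, 8))

# Same window obtained by materialising everything first.
reference = list(itertools.combinations(pool, 3))[5:8]

print(window)
print(window == reference)
```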
for n in range(1,(len(randnum))/3):
    X.append(randnum(n))
for i in range((len(randnum))/3 , (2/3)*len(randnum)):
    Y.append(randnum(i))
for r in range ((2/3)*len(randnum) , len(randnum)):
    Z.append(randnum(r))
I have been trying to form a list based on this criteria and I keep getting this error message for specifically this line below:
for n in range(1,(len(randnum))/3):
TypeError: 'float' object cannot be interpreted as an integer
The part of the program that causes the problem is the part above; if I can fix it, I can apply the same fix to the rest.
Here is an example list used to fill the other three. It has 20 elements, and I want each list that I form to take about 1/3 of its elements from different positions of this list:
[ 59.18013391 12159.7881626 26308.21887981 8357.05103068
20718.85232457 16333.1546026 9828.75690047 10273.65018539
5949.58907673 8767.68292925 31826.29595355 13749.12915211
25423.61181129 28799.50849876 9517.54482827 27275.19296144
12460.2541769 25883.7888204 10393.9452616 26008.572598 ]
And I want this code to form 3 new lists containing, for example:
X = [59.18013391 12159.7881626 26308.21887981 8357.05103068
20718.85232457 16333.1546026]
Y = [9828.75690047 10273.65018539
5949.58907673 8767.68292925 31826.29595355 13749.12915211 ]
Z = [ 25423.61181129 28799.50849876 9517.54482827 27275.19296144
12460.2541769 25883.7888204 10393.9452616 26008.572598]
You cannot pass a float to the range() function.
See: https://docs.python.org/3/library/stdtypes.html#range
(len(randnum))/3
That's a float!
Possible fix:
int((len(randnum))/3)
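A floor-division variant avoids the float entirely. The sketch below assumes `randnum` is an indexable sequence and that contiguous thirds are wanted; the helper name `split_in_three` is hypothetical, and slicing replaces the three index loops from the question.

```python
def split_in_three(values):
    """Split a sequence into three contiguous parts of near-equal length."""
    n = len(values)
    first_cut = n // 3        # // keeps the result an int
    second_cut = 2 * n // 3
    return values[:first_cut], values[first_cut:second_cut], values[second_cut:]

randnum = list(range(20))     # stand-in data, 20 elements like the example
X, Y, Z = split_in_three(randnum)
print(len(X), len(Y), len(Z))
```

With 20 elements the cuts fall at 6 and 13, so the parts have 6, 7 and 7 elements and no element is skipped or repeated.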
Okay, perhaps you should try random.
With repetition
import random
original_list =[ i for i in range(20)]
X = random.sample(original_list, int(len(original_list)/3))
Y = random.sample(original_list, int(len(original_list)/3))
Z = random.sample(original_list, int(len(original_list)/3))
Sample Output
X: [7, 3, 18, 15, 19, 1]
Y : [6, 13, 17, 4, 14, 5]
Z: [19, 2, 8, 18, 13, 17]
Without repetition
from random import shuffle
shuffle(original_list)
list(zip(*[iter(original_list)]*int(len(original_list)/3)))
Sample Output
[(17, 13, 15, 5, 16, 12), (14, 4, 18, 2, 19, 6), (10, 11, 7, 3, 1, 0)]
If I follow your goal, a simple approach would be to shuffle a copy of the list and then take every 3rd element starting at 0, then 1, then 2:
tmp_data = data.copy()
random.shuffle(tmp_data)
new_lists = [tmp_data[i::3] for i in range(3)]
which gives me, e.g.
In [361]: new_lists
Out[361]:
[[13749.12915211,
26008.572598,
25423.61181129,
8767.68292925,
12460.2541769,
26308.21887981,
59.18013391],
[9828.75690047,
20718.85232457,
10273.65018539,
9517.54482827,
27275.19296144,
8357.05103068,
5949.58907673],
[28799.50849876,
12159.7881626,
25883.7888204,
16333.1546026,
10393.9452616,
31826.29595355]]
and you could then do
X, Y, Z = new_lists
if you insisted on separate named variables.
(You could also simply do tmp_data = random.sample(data, len(data)) to get a random permutation of the list instead, but for some reason I find this less clear than shuffling. Not sure why.)
You could use random here, but you also need to make sure you don't use the same random index again. My approach is to append each index to a used list and check that list before using an index. As for splitting, you have 20 items, so you can use // (floor division) and % to create two lists of 7 items and one of 6 items.
import random
data = [
    59.18013391,12159.7881626,26308.21887981, 8357.05103068,
    20718.85232457,16333.1546026,9828.75690047, 10273.65018539,
    5949.58907673, 8767.68292925, 31826.29595355, 13749.12915211,
    25423.61181129, 28799.50849876, 9517.54482827, 27275.19296144,
    12460.2541769, 25883.7888204, 10393.9452616, 26008.572598
]
y = len(data)//3
z = int((len(data) % 3)/2)
used = []
l1 = []
l2 = []
l3 = []
for i in range(y):
    x = random.randint(0, len(data)-1)
    while x in used:
        x = random.randint(0, len(data)-1)
    used.append(x)
    l1.append(data[x])
for i in range(y+z):
    x = random.randint(0, len(data)-1)
    while x in used:
        x = random.randint(0, len(data)-1)
    used.append(x)
    l2.append(data[x])
for i in range(y+z):
    x = random.randint(0, len(data)-1)
    while x in used:
        x = random.randint(0, len(data)-1)
    used.append(x)
    l3.append(data[x])
chrx#chrx:~/python/stackoverflow/9.23$ python3.7 loop.py
l1: [8357.05103068, 10273.65018539, 26008.572598, 5949.58907673, 28799.50849876, 8767.68292925]
l2: [25423.61181129, 13749.12915211, 26308.21887981, 9828.75690047, 59.18013391, 16333.1546026, 27275.19296144]
l3: [12460.2541769, 12159.7881626, 9517.54482827, 10393.9452616, 25883.7888204, 31826.29595355, 20718.85232457]
I have a large array of integers, and I need to print the maximum of every 10 integers and its corresponding index in the array as a pair.
ex. (max_value, index of max_value in array)
I can successfully find the maximum value and the corresponding index within the first 10 integers, however I am having trouble looping through the entire array.
I have tried using:
a = some array of integers
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
for i in split:
    j = max(i)
    k = i.index(max(i))
    print (j,k)
The issue with this method is that it splits my array into chunks of 10, so the max values are correct but the indexes are inaccurate (all of the indexes are between 0 and 9).
I need to find a way of doing this that doesn't split my array into chunks so that the original indices are retained. I'm sure there is an easier way of looping through to find max values but I can't seem to figure it out.
A small modification to your current code:
a = some array of integers
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
for index, i in enumerate(split):
    j = max(i)
    k = i.index(max(i))
    print (j, k+10*index)
You need to count the number of elements that appear before the current window. This will do the job:
a=list(range(5,35))
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
for ind,i in enumerate(split):
    j = max(i)
    k = i.index(j)
    print (j,k+ind*10)
This prints
(14, 9)
(24, 19)
(34, 29)
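The same idea can be written without building the list of chunks first. This is a Python 3 sketch (range instead of xrange); the function name `chunk_maxima` and the `size` parameter are illustrative, not taken from the answers above.

```python
def chunk_maxima(a, size=10):
    """Return (max_value, index_in_a) for each consecutive block of `size` items."""
    pairs = []
    for start in range(0, len(a), size):
        chunk = a[start:start + size]
        best = max(chunk)
        # chunk.index is relative to the chunk, so add the chunk's offset.
        pairs.append((best, start + chunk.index(best)))
    return pairs

print(chunk_maxima(list(range(5, 35))))
```

A final block shorter than `size` is still reported, because the slice simply stops at the end of the list.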
So with debugging with an example array, we find that split returns a 2d list like this one:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]]
Every time the for loop runs, it goes through one of those lists in order: first the first inner list, then the second one, and so on. So every time the for loop moves to the next list, we simply add 10. Since the outer list can contain more than 2 lists, we store the number we need to add in a variable and add 10 to it on every iteration:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
counter = 0
for i in split:
    j = max(i)
    k = i.index(max(i))
    print (j,k+counter)
    counter += 10
The toolz package has a partition_all function that divides a sequence up into equal-sized tuples, so you can do something like this.
import toolz
ns = list(range(25))
[max(sublist) for sublist in toolz.partition_all(10, ns)]
This will return [9, 19, 24].
You will need a loop to iterate through the list; however, we can change the loop over split so that it does what you want.
a = some array of integers
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
for i in range(len(split)):
    #Now instead of being the list, i is the index, so we can use 10*i as a counter
    j = max(split[i])
    #j = max(i)
    k = split[i].index(j) + 10*i #replaced max(i) with j since we already calculated it.
    #k = i.index(max(i))
    print (j,k)
Though in the future, please choose a different name for your split list, since split is also the name of a string method in Python. Perhaps split_list or separated, or some other name that doesn't look like the split() method.
numpy solution for arbitrary input:
import numpy as np
a = np.random.randint(1,21,40) #40 random numbers from 1 to 20
b = a.reshape([4,10]) #shape into chunks 10 numbers long
i = b.argsort()[:,-1] #take the index of the largest number (last number from argsort)
# from each chunk. (these don't take into account the reshape)
i += np.arange(0,40,10) #add back in index offsets due to reshape
out = zip(i, a[i]) #zip together indices and values
You could simplify this by only enumerating once and using zip to partition your list into groups:
n=10
for grp in zip(*[iter(enumerate(some_list))]*n):
    grp_max_ind, grp_mv=max(grp, key=lambda t: t[1])
    k=[t[1] for t in grp].index(grp_mv)
    print grp_mv, (grp_max_ind, k)
Use izip in Python 2 if you want a generator (or use Python 3)
from itertools import izip
for grp in izip(*[iter(enumerate(some_list))]*n):
    grp_max_ind, grp_mv=max(grp, key=lambda t: t[1])
    k=[t[1] for t in grp].index(grp_mv)
    print grp_mv, (grp_max_ind, k)
Note that zip drops the last group if it has fewer than n elements.
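If the final, shorter group should be kept rather than dropped, itertools.zip_longest can pad it instead. The following Python 3 sketch is one way to do that, under the assumption that a private sentinel object is an acceptable fill value; the names `grouped_maxima` and `_MISSING` are hypothetical.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel used only to pad the last group

def grouped_maxima(some_list, n=10):
    """Yield (max_value, original_index) per group of n, keeping a short last group."""
    indexed = iter(enumerate(some_list))
    for grp in zip_longest(*[indexed] * n, fillvalue=_MISSING):
        # Drop the padding before looking for the maximum.
        real = [pair for pair in grp if pair is not _MISSING]
        index, value = max(real, key=lambda t: t[1])
        yield value, index

print(list(grouped_maxima(list(range(5, 30)))))
```

Because the pairs come from enumerate, the index reported is already the position in the original list and needs no offset.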
An example using numpy. First let's generate some data, i.e., integers ranging from 1 to V and of length (number of values) L:
import numpy as np
V = 1000
L = 45 # method works with arrays not multiples of 10
a = np.random.randint(1, V, size=L)
Now solve the problem for sub-arrays of size N:
import numpy as np
N = 10 # example "split" size
sa = np.array_split(a, range(N, len(a), N))
sind = [np.argpartition(i, -1)[-1] for i in sa]
ind = [np.ravel_multi_index(i, (len(sa), N)) for i in enumerate(sind)]
vals = np.asarray(a)[np.asarray(ind)]
split_imax = zip(vals, ind) # <-- output
Assume I have a list [12,12,12,12,13,13,13,13,14,14,14,14,14,14,14,15,15,15, etc]
I would like my result to be the following:
[12,12,12,13,13,13,14,14,14,15,15,15]
The number of identical numbers in the first list can vary, but I want to get triplets for each range of the identical numbers. I assume I could iterate through the list starting from the first number (12) and get the first 3 identical numbers (12,12,12), and then compare the numbers and once the number 12 changes to 13, get the next 3 numbers (13,13,13), and so on. But I cannot think of a good approach to do it correctly. Thank you for any suggestions.
I would use itertools.groupby() to isolate the strings of identical numbers, then use a list comprehension to create the triplets:
import itertools
some_list = [12,12,12,12,13,13,13,13,14,14,14,14,14,14,14,15,15,15,]
updated_list = [i for k,_ in itertools.groupby(some_list) for i in [k]*3]
assert updated_list == [12,12,12,13,13,13,14,14,14,15,15,15]
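Note that `[k]*3` always emits three copies, even when a run in the input is shorter than three. If only values that are actually present should be kept, one hedged alternative is to take at most three items from each group with islice; the helper name `first_three_of_each_run` is illustrative.

```python
from itertools import groupby, islice

def first_three_of_each_run(values, limit=3):
    """Keep at most `limit` items from every run of equal consecutive values."""
    return [item
            for _, run in groupby(values)
            for item in islice(run, limit)]

some_list = [12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15]
print(first_three_of_each_run(some_list))
```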
updated_list = []
curr_number = some_list[0]
curr_count = 0
for n in some_list:
    if n == curr_number:
        curr_count += 1
        if not (curr_count > 3):
            updated_list.append(n)
    else:
        curr_number = n
        curr_count = 1
        updated_list.append(n)
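It seems a set approach is a bit faster than itertools. Note that list(set(A)) * 3 repeats the whole collection of unique values three times, so the result has to be sorted to get grouped triplets. With sorting the gain is smaller, but it is still faster.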
A = [12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15]
def with_set(A):
    A = set(A)
    return list(A) * 3
import itertools
def with_iter(A):
    return [i for k,_ in itertools.groupby(A) for i in [k]*3]
import timeit
print("Result with set: ", timeit.timeit(lambda:with_set(A),number = 1000))
print("Result with iter: ", timeit.timeit(lambda:with_iter(A),number = 1000))
Result with set: 0.008438773198370306
Result with iter: 0.018557160246834882
The code below is self-explanatory:
A = [1, 2, 3, 4, 1, 4]
A = list(set(A)) #removes duplicates
A *= 3 #repeats the unique list 3 times
print(sorted(A)) # makes a new sorted list
I need to build up a counting function starting from a dictionary. The dictionary is a classical Bag_of_Words and looks like as follows:
D={'the':5, 'pow':2, 'poo':2, 'row':2, 'bub':1, 'bob':1}
I need the function that for a given integer returns the number of words with at least that number of occurrences. In the example F(2)=4, all words but 'bub' and 'bob'.
First of all I build up the inverse dictionary of D:
ID={5:1, 2:3, 1:2}
I think I'm fine with that. Then here is the code:
values=list(ID.keys())
values.sort(reverse=True)
Lk=[]
Nw=0
for val in values:
    Nw=Nw+ID[val]
    Lk.append([Nw, val])
The code works fine, but I do not like it. The point is that I would prefer to use a list comprehension to build up Lk; also, I really hate the Nw variable I have used. It does not seem pythonic at all.
You can create a sorted array of your word counts, then find the insertion point with np.searchsorted to get how many items are on either side of it. np.searchsorted is very efficient and fast. If your dictionary doesn't change often, this call is basically free compared to other methods.
import numpy as np
def F(n, D):
    #creating the array each time would be slow; if it doesn't change, move this
    #outside the function
    arr = np.array(list(D.values()))
    arr.sort()
    L = len(arr)
    return L - np.searchsorted(arr, n) #this line does all the work...
What's going on:
First we take just the word counts (and convert them to a sorted array)...
D = {"I'm": 12, "pretty": 3, "sure":12, "the": 45, "Donald": 12, "is": 3, "on": 90, "crack": 11}
vals = np.array(list(D.values()))
#vals = array([90, 12, 12, 3, 11, 12, 45, 3])
vals.sort()
#vals = array([ 3, 3, 11, 12, 12, 12, 45, 90])
Then, if we want to know how many values are greater than or equal to n, we simply find the length of the array from the first number greater than or equal to n onwards. We do this by determining the leftmost index where n could be inserted while keeping the array sorted (np.searchsorted finds it with a binary search) and subtracting that from the total number of positions (len).
# how many are >= 10?
# insertion point for value of 10..
#
#                | index: 2
#                v
# array([ 3,  3, 11, 12, 12, 12, 45, 90])
#find how many elements there are
#len(arr) = 8
#subtract.. 8 - 2 = 6 elements that are >= 10
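The same counting logic can be checked with the standard library alone. The sketch below swaps np.searchsorted for bisect.bisect_left, which returns the same leftmost insertion point on a sorted list; it is an illustration of the idea rather than the answer's NumPy code, and the function name `count_at_least` is hypothetical.

```python
from bisect import bisect_left

def count_at_least(sorted_counts, n):
    """Number of entries in an ascending list that are >= n."""
    # bisect_left gives the index of the first entry >= n.
    return len(sorted_counts) - bisect_left(sorted_counts, n)

D = {"the": 5, "pow": 2, "poo": 2, "row": 2, "bub": 1, "bob": 1}
counts = sorted(D.values())   # [1, 1, 2, 2, 2, 5]
print(count_at_least(counts, 2))
```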
A fun little trick for counting things: True has a numerical value of 1 and False has a numerical value of 0. So we can do things like
sum(v >= k for v in D.values())
where k is the value you're comparing against.
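Wrapped in a function and applied to the dictionary from the question, this gives the expected F(2) = 4. The signature `F(k, D)` is an assumption made for this sketch.

```python
def F(k, D):
    """Count the words in bag-of-words D that occur at least k times."""
    # Each comparison is True (1) or False (0), so sum() counts the matches.
    return sum(v >= k for v in D.values())

D = {'the': 5, 'pow': 2, 'poo': 2, 'row': 2, 'bub': 1, 'bob': 1}
print(F(2, D))
```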
collections.Counter() is an ideal choice for this. Use it on the values returned by dict.values(). Also, unlike numpy, it is part of the standard library, so you do not need to install it separately. Example:
>>> from collections import Counter
>>> D = {'the': 5, 'pow': 2, 'poo': 2, 'row': 2, 'bub': 1, 'bob': 1}
>>> c = Counter(D.values())
>>> c
Counter({2: 3, 1: 2, 5: 1})
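From that Counter, the cumulative list the question builds by hand (Lk) can be produced without an explicit running total. This is a sketch using itertools.accumulate; pairing the running sums with the values via zip is a choice made here, not part of the original answer.

```python
from collections import Counter
from itertools import accumulate

D = {'the': 5, 'pow': 2, 'poo': 2, 'row': 2, 'bub': 1, 'bob': 1}
c = Counter(D.values())            # occurrence count -> number of words

values = sorted(c, reverse=True)   # [5, 2, 1]
# Running total of words with at least `val` occurrences, paired with val.
Lk = [[nw, val] for nw, val in zip(accumulate(c[v] for v in values), values)]
print(Lk)
```

Each entry `[nw, val]` reads as "nw words occur at least val times", which matches the layout of Lk in the question.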