I'm doing some work with the Ising model. I've written a code to help me count the multiplicity of a lattice but I can't get up to any big numbers without getting a MemoryError.
The basic idea is, you have a list of zeros and ones, say [0,0,1,1]. I want to generate a set of all possible orderings of the ones and zeros. So in this example I want a set like this:
[(1,1,0,0),(1,0,1,0),(1,0,0,1),(0,1,1,0),(0,1,0,1),(0,0,1,1)]
At the moment I have done it like this:
import itertools

set_initial = [0, 0, 1, 1]
set_intermediate = []
for subset in itertools.permutations(set_initial, 4):
    set_intermediate.append(subset)
set_final = list(set(set_intermediate))
The issue is that set_intermediate contains, for this example, 4! = 24 elements, only six of which are unique. And to take another example such as [0,0,0,0,0,0,0,0,1], there are 9! = 362,880 elements, only 9 of which are unique.
Is there another way of doing this so that set_intermediate isn't such a bottleneck?
Instead of permutations, you can think in terms of selecting the positions of the 1s as combinations. (I knew I'd done something similar before..)
from itertools import combinations

def binary_perm(seq):
    n_on = sum(seq)  # number of 1s to place
    # choose the positions of the 1s; each choice gives one distinct ordering
    for comb in combinations(range(len(seq)), n_on):
        out = [0]*len(seq)
        for loc in comb:
            out[loc] = 1
        yield out
Not super-speedy, but generates exactly the right number of outputs, and so can handle longer sequences:
>>> list(binary_perm([0,0,1,1]))
[[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
>>> %timeit sum(1 for x in binary_perm([1]+[0]*10**4))
1 loops, best of 3: 409 ms per loop
Of course, usually you'd want to avoid looping over these in the first place, but depending on what you're doing with the permutations you might not be able to get away with simply calculating the number of unique permutations directly.
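If the count is all you need, here is a minimal sketch of that direct calculation as a binomial coefficient (assuming Python 3.8+ for math.comb):

from math import comb

seq = [0, 0, 0, 0, 0, 0, 0, 0, 1]
# number of distinct orderings = C(len(seq), number of ones)
print(comb(len(seq), sum(seq)))  # 9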
Try the built-in method itertools.permutations(iterable, r).
Related
I have a bottleneck in one section of my code that is ruining the performance of the whole program. I rewrote the section, but, after timing it, things didn't improve.
The problem is as follows. Given a list of fixed-length lists of integers
data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]
For each column of the data, I need to build a separate list that contains the index of each sublist repeated as many times as that sublist's value in that column.
For instance, for the above data, there will be three resulting lists since the sub-lists have three columns.
There are 4 sublists, so we expect the numbers 0-3 to appear in each of the final lists.
We expect the following three lists to be generated from the above data
[[0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3],
[0, 0, 1, 1, 2, 3, 3, 3],
[0, 0, 0, 1, 3, 3, 3, 3]]
I have two ways of doing this:
processed_data = list([] for _ in range(len(data[0])))
for n in range(len(data)):
    sub_list = data[n]
    for k, proc_list in enumerate(processed_data):
        for _ in range(sub_list[k]):
            proc_list.append(n)
and
processed_data = []
for i, col in enumerate(zip(*data)):
    processed_data.append([j for j, count in enumerate(col) for _ in range(count)])
The average size of the data list is around 100,000.
Is there a way I can speed this up?
You can't improve the computational complexity of your algorithm unless you're able to tweak the output format (see below). In other words, you'll at best be able to improve the speed by a modest percentage (and the percentage will be independent of the size of the input).
I don't see any obvious implementation issues. The one idea I had was to get rid of the large number of append() calls and the overhead incurred by gradual list expansion by preallocating the output matrix, but #juanpa.arrivillaga suggests in their comment that append() is in fact very optimized on CPython.

If you're on another interpreter, you could try it: you know that the length of the output list for column c will be equal to the sum of all the input numbers in column c, so you can preallocate each output list as [0] * sum_of_input_values_at_column_c and then do proc_list[i] = n instead of proc_list.append(n) (and manually increment i). This does, however, require two passes over the input, so it might not actually be an improvement; your problem is quite memory-intensive, as its core computation is extremely simple.
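A rough sketch of that preallocation idea (two passes over the input; the data is the example from the question, the helper names are just for illustration):

data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]

col_sums = [sum(col) for col in zip(*data)]      # output length per column
processed_data = [[0] * s for s in col_sums]     # preallocate each output list
positions = [0] * len(col_sums)                  # next write index per column

for n, sub_list in enumerate(data):
    for k, count in enumerate(sub_list):
        i = positions[k]
        for _ in range(count):
            processed_data[k][i] = n
            i += 1
        positions[k] = i

print(processed_data)  # same three lists as in the question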
The reason you can't improve the computational complexity is that it is already optimal: any algorithm needs to spend time generating its output, so the size of the output is a lower bound on how fast the algorithm can possibly be. In your case, the size of the output is equal to the sum of the values in your input matrix (and it's generally considered bad when runtime depends on the input values themselves rather than on the number of input values). That is exactly the number of iterations your algorithm performs, so it is optimal.

However, if the output of this function is going to reside in memory to be consumed by another function (rather than being written to a file), and you are able to make some adaptations in that function, you could instead output a matrix of generators, where each generator knows that it needs to generate sub_list[k] occurrences of n. Then the complexity of your algorithm becomes proportional to the size of the input matrix (but consuming the output will still take the same amount of time that it would have taken to generate the full output).
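A rough sketch of that generator idea, using itertools.repeat so each cell stays lazy (illustrative only, not part of the original answer):

from itertools import repeat

data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]
lazy_columns = [[repeat(n, count) for n, count in enumerate(col)]
                for col in zip(*data)]

# Building lazy_columns is proportional to the input size; expanding a column
# still costs as much as producing that column's full output:
print([x for gen in lazy_columns[1] for x in gen])  # [0, 0, 1, 1, 2, 3, 3, 3]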
Perhaps itertools can make this go faster for you by minimizing the amount of python code inside loops:
from itertools import chain, repeat, starmap

data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]

# for each column, repeat each row index as many times as its value in that column
result = [list(chain.from_iterable(starmap(repeat, r)))
          for r in map(enumerate, zip(*data))]
print(result)
[[0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3],
[0, 0, 1, 1, 2, 3, 3, 3],
[0, 0, 0, 1, 3, 3, 3, 3]]
If you're processing the output in the same order as the result's rows come out, you can convert this to a generator and use it directly in your main process:
iResult = (chain.from_iterable(starmap(repeat, r))
           for r in map(enumerate, zip(*data)))

for iRow in iResult:  # iRow is also an iterator
    for resultItem in iRow:
        # Perform your item processing here
        print(resultItem, end=" ")
    print()
0 1 1 1 2 2 2 2 2 2 2 2 3
0 0 1 1 2 3 3 3
0 0 0 1 3 3 3 3
This will avoid creating and storing the lists of indexes altogether (i.e. bringing that bottleneck down to zero), but only if you process the result sequentially.
I am looking to take a NumPy array that is a 1D boolean mask of size N and transform it into a new mask where each element is the boolean AND of a pair of elements from the original mask (I don't want to repeat the same pair twice, since order has no importance for a logical AND).
Example input:
mask = [1, 0, 1] = [a, b, c]
Expected output:
newmask = [1*0, 1*1, 0*1] = [0, 1, 0] = [a*b, a*c, b*c]
From a list of elements you can create all possible combinations of them where their order doesn't matter, without wasting time on repeated combinations:
from itertools import combinations_with_replacement
import numpy as np
n = 3
elements_to_combine = [0, 1]
for c in combinations_with_replacement(elements_to_combine, n):
    x = np.array(list(c))
    print(x)
and the output is:
[0 0 0]
[0 0 1]
[0 1 1]
[1 1 1]
Now you have a straightforward method to compute only the combinations you need. You may also add elements to the list elements_to_combine, and you may also increase the size of n according to your needs. Since you didn't specify precisely the kind of elements to be used and how you intend to mask your elements using the logical AND operations, I will leave the rest to you. Hope this solves your performance issues.
Cheers!
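For the specific mask in the question, a minimal sketch of the pairwise AND itself, using combinations over index pairs (this part is an illustration, not from the answer above):

from itertools import combinations
import numpy as np

mask = np.array([1, 0, 1], dtype=bool)
# AND over every unordered pair of positions (i < j), i.e. [a*b, a*c, b*c]
newmask = np.array([mask[i] & mask[j] for i, j in combinations(range(len(mask)), 2)])
print(newmask.astype(int))  # [0 1 0]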
I am relatively new to Python and still figuring things out. I wanted to check if there is an equivalent of R's rep command in Python that replicates the entire vector rather than each element. I used numpy.repeat, but it only repeats each element a given number of times; is there a way to tweak it to repeat the entire vector?
example:
y=np.repeat(np.arange(0,2),3)
print(y)
array([0, 0, 0, 1, 1, 1])
expected output using R's rep:
a<-c(0,1)
rep(a,3)
0 1 0 1 0 1
I'm no expert in R by any means but as far as I can tell, this is what you are looking for:
>>> np.tile([0, 1], 3)
array([0, 1, 0, 1, 0, 1])
Your expected output is R, not Python (even though that's what you want), but translating it, you basically want something that transforms, let's say, [0,1,2]
into [0,1,2,0,1,2,0,1,2, ...] with any number of repetitions.
In Python you can simply multiply a list by a number to get that:
lst = [0,1]
lst2 = lst*3
print(lst2)
this will print [0, 1, 0, 1, 0, 1]
Straight from the docs: np.repeat simply repeats each element of the iterable the number of times specified in the argument.
Other than what has already been posted, you can use repeat and chain from itertools:
from itertools import repeat, chain
list(chain(*(repeat((1,2),3)))) # [1, 2, 1, 2, 1, 2]
I have a list of randomly generated 0's and 1's. I need to pick a single bit and flip it to its opposite: either a zero to a one or a one to a zero.
The bitwise not operator only works on integers and long integers, and the xor (^) operator works with two integers.
Enter a population size: 4
Enter an organism length: 2
[[1, 1], [0, 1], [0, 0], [0, 0]]
[[[1, 1], [0, 1]]]
The code above is part of a short user-input program where the user enters the population size and organism length. The program prints a list of randomly generated numbers based on those inputs and takes the top 50%, which is the second printed list. Now I need to pick a random bit from the second list and flip it to either a zero or a one - not the whole list, however.
Links and explanations are much appreciated; I'm looking to improve.
You can use randint to generate a random integer for the index, and then use, for instance, ^= 1 to flip it, like:
from random import randint
pop = [[1, 1], [0, 1], [0, 0], [0, 0]]
individual = pop[1] # select the second individual
individual[randint(0,len(individual)-1)] ^= 1 # flip the bit
After I ran this, I obtained:
>>> pop
[[1, 1], [0, 0], [0, 0], [0, 0]]
so it flipped the second bit. But it could have been the first as well. By using len(individual) we guarantee that if the number of bits of the individual increases, it will still work.
That being said, encoding bits as 0-1s in a list is not very efficient. You can use ints in Python as a list of bits (ints have arbitrary length in python-3.x).
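A minimal sketch of that idea (the values here are purely illustrative): flipping a random bit of an int with XOR:

from random import randrange

individual = 0b0110       # hypothetical individual encoded as an int
k = randrange(4)          # random bit position among 4 bits
individual ^= 1 << k      # flip that one bit
print(format(individual, '04b'))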
EDIT
If you want to flip a bit per individual (for every individual it can be a different bit), you can use a for loop:
for individual in pop:  # iterate over population
    individual[randint(0,len(individual)-1)] ^= 1  # flip a random bit
If I run this with your given initial population, I get:
>>> pop
[[0, 1], [1, 1], [0, 1], [1, 0]]
so every individual has exactly one bit flipped, and not all the same one. Of course it is random, so it is possible that at a certain run, the random number generator will pick the same bit for every individual.
To flip a single bit, use logical negation:
int(not 0)  # 1
int(not 1)  # 0
If you have to flip exactly one bit in the whole population, I would suggest:
import random

chrom = random.choice(pop)            # pick a random chromosome
j = random.randrange(len(chrom))      # pick a random bit position
chrom[j] = int(not chrom[j])          # flip it
I am attempting Project Euler #15, which essentially reduces to computing the number of binary lists of length 2*size whose entries sum to size, for the particular case size = 20. For example, if size = 2 there are 6 such lists: [1,1,0,0], [1,0,1,0], [1,0,0,1], [0,1,1,0], [0,1,0,1], [0,0,1,1]. The number of such sequences is of course trivial to compute for any value of size and is equal to a binomial coefficient, but I am interested in explicitly generating the correct sequences in Python. I have tried the following:
import itertools
size = 20
binary_lists = itertools.product(range(2), repeat = 2*size)
lattice_paths = {lists for lists in binary_lists if sum(lists) == size}
but the last line makes me run into memory errors. What would be a neat way to accomplish this?
There are far too many such lists for the case of size=20 to iterate over (even if we don't materialize them, 137846528820 is not a number we can loop over in a reasonable time), so explicitly generating them is not particularly useful.
But you can still do it using built-in tools by thinking of the positions of the 1s:
from itertools import combinations

def bsum(size):
    # choose which of the 2*size positions hold the 1s
    for locs in combinations(range(2*size), size):
        vec = [0]*(2*size)
        for loc in locs:
            vec[loc] = 1
        yield vec
which gives
>>> list(bsum(1))
[[1, 0], [0, 1]]
>>> list(bsum(2))
[[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
>>> sum(1 for x in bsum(12))
2704156
>>> from math import factorial
>>> factorial(24)//factorial(12)**2
2704156
I'm not 100% sure of the math on this problem, but your last line takes a generator and dumps it into a set, and based on your example and your size of 20, that is a massive collection. If you want to sum it, just iterate over the generator, but I don't think you can get a nice view of every combination.