Python Alpha Sort a List - python

I have a list from 1 to 25. Right now it's properly sorted from 1 to 25, but I need it sorted like this:
[1,10,11,12,13,14,15,16,17,18,19,2,20, .. etc]
I can't find anything online that would allow me to do that. Thanks.

Try this:
>>> l = range(1, 26)
>>> sorted(l, key=str)
[1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 23, 24, 25, 3, 4, 5, 6, 7, 8, 9]
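This works because key=str makes sorted compare the decimal text of each number, and strings are compared character by character, so "10" comes before "2". A minimal sketch for Python 3 (the names numbers and alpha_sorted are just illustrative):

```python
numbers = list(range(1, 26))

# Sort by the string form of each number; the elements themselves stay ints.
alpha_sorted = sorted(numbers, key=str)
print(alpha_sorted)

# Strings compare character by character, which is why 10 lands before 2.
print("10" < "2")
```

The second print shows True, which is the whole trick: the list is ordered as if the numbers were words.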

Related

Python speed up permutation of bits within 32Bit number

I'm interested in reordering the bits within a number, and since I want to do it several trillion times, I want to do it fast.
Here are the details: given a number num and an order matrix order.
order contains up to ~6000 lines of permutations of the numbers 0..31.
These are the positions to which the bits change.
Simplified example: binary(num) = 1001, order[1]=[0,1,3,2], reordered number for order[1] would be 1010 (binary).
Now I want to know whether my input number num is the smallest of these (~6000) reordered numbers. I'm searching for all 32-bit numbers which fulfill this criterion.
My current approach is too slow, so I'm looking for a speedup.
Minimal reproducible example:
num = 1753251840
order = [[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]]
patterns=set()
bits = format(num, '032b')
for perm in order:
    bitsn = [bits[perm[i]] for i in range(32)]
    patterns.add(int(''.join(bitsn), 2))
print(min(patterns) == num)
Where can I start to improve this?
Extracting bits via strings is generally very inefficient, whatever the language, and the same applies to parsing the result back into an integer. Moreover, for such a fast low-level operation, you need a JIT or a compiled language, as the comments already pointed out.
Here is a prototype using Numba's JIT (assume all numbers are unsigned):
import numpy as np
from numba import njit

npOrder = np.array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 27, 26, 25, 24, 31, 30, 29, 28],
[15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16],
[31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
[ 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31],
[21, 20, 23, 22, 29, 28, 31, 30, 17, 16, 19, 18, 25, 24, 27, 26, 5, 4, 7, 6, 13, 12, 15, 14, 1, 0, 3, 2, 9, 8, 11, 10]], dtype=np.uint32)

@njit
def extractBits(num):
    # bits[k] holds bit k of num, counted from the least significant bit
    bits = np.empty(32, dtype=np.int32)
    for i in range(32):
        bits[i] = (num >> i) & 0x01
    return bits

@njit
def permuteAndMerge(bits, perm):
    bitsnFinal = 0
    for i in range(32):
        # Position i of the 32-character string (most significant bit first)
        # is bit 31-i of the integer, so both the source index and the target
        # shift are mirrored to match the string-based version.
        bitsnFinal |= bits[31-perm[i]] << (31-i)
    return bitsnFinal

@njit
def computeOptimized(num):
    bits = extractBits(num)
    permCount = npOrder.shape[0]
    patterns = np.empty(permCount, dtype=np.uint32)
    for i in range(permCount):
        patterns[i] = permuteAndMerge(bits, npOrder[i])
    # The array can be converted to a set if needed here with: set(patterns)
    return min(patterns) == num
This code is about 25 times faster than the original one on my machine (run 5 000 000 times).
You can also use Numba to accelerate and parallelize the loop that runs the function computeOptimized, resulting in a significant additional speed-up.
Note that this code can be much faster again in C or C++ using low-level processor instructions (available for example on many x86_64 processors). With that and parallelism, the execution speed should be on the order of a billion permutations per second.
Couple of possible speed-ups, staying with Python and the current algorithm:
Bail out as soon as you find a pattern less than num; once one like that is found, the condition cannot possibly be true. (You also don't need to store patterns; at most a flag whether an equal one was found, if that's not guaranteed by the problem.)
bitsn could be a generator expression, and doesn't need to be in a variable; you'll have to measure whether that's faster.
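The two suggestions above can be sketched roughly as follows, staying with the string-based approach from the question. This assumes the identity permutation is part of order (as in the example), so that "nothing smaller was found" is the same as "num is the minimum". The function name is_smallest and the width parameter are made up for this illustration; width is only there so the function can be tried on short bit strings.

```python
def is_smallest(num, order, width=32):
    """Return True if no permutation in `order` maps `num` to a smaller value.

    Stops at the first permuted value below `num` instead of collecting
    every pattern. Assumes the identity permutation is included in `order`.
    """
    bits = format(num, "0{}b".format(width))
    for perm in order:
        # Generator expression instead of an intermediate list variable.
        candidate = int("".join(bits[p] for p in perm), 2)
        if candidate < num:
            return False  # a smaller reordering exists; no need to go on
    return True


# 4-bit example from the question: 1001 becomes 1010 under [0, 1, 3, 2].
small_order = [[0, 1, 2, 3], [0, 1, 3, 2]]
print(is_smallest(0b1001, small_order, width=4))  # True
print(is_smallest(0b1010, small_order, width=4))  # False
```

Whether this helps much depends on how early a smaller pattern tends to show up, so it is worth measuring on real data.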
More fundamental improvements:
If you want to find all the numbers (rather than just test a particular one), it feels like there ought to be a faster algorithm by considering what the bits mean. A couple of hours thinking could potentially let you process just the 6000 lists, rather than all 2³² integers.
As others have written, if you're after pure speed, Python is not the ideal language. Whether that matters depends on the balance between how much time you want to spend programming and how much time you want to spend running the program.
Side note:
Are the 32-bit integers signed or unsigned?

Rewriting code to use list comprehensions to sum values in initial list

I have written this code:
rand_map, lst = [2, 2, 6, 6, 8, 11, 4], []
for i in range(len(rand_map)):
    num = rand_map[i]
    lst.append(num)
    for j in range(i+1, len(rand_map)):
        assembly = num + rand_map[j]
        num += rand_map[j]
        lst.append(assembly)
print(sorted(lst))
Which gives this output:
[2, 2, 4, 4, 6, 6, 8, 8, 10, 11, 12, 14, 14, 15, 16, 19, 20, 22, 23, 24, 25, 29, 31, 33, 35, 35, 37, 39]
I've been trying to rewrite this code using list comprehensions, but I don't know how. I have tried multiple ways (standard and itertools) but I just can't get it right. I'll be very grateful for your help!
I came up with a couple of approaches for this problem:
Approach 1 - Vanilla list comprehension
In this approach, we iterate two variables, i and j and calculate the sum of the elements between these two indexes.
Code:
>>> rand_map = [2, 2, 6, 6, 8, 11, 4]
>>> sorted([sum(rand_map[i:i+j+1]) for i in range(len(rand_map)) for j in range(len(rand_map)-i)])
[2, 2, 4, 4, 6, 6, 8, 8, 10, 11, 12, 14, 14, 15, 16, 19, 20, 22, 23, 24, 25, 29, 31, 33, 35, 35, 37, 39]
Approach 2 - Itertools
In this approach, we use an itertools-based n-wise recipe to iterate through the rand_map list in windows of n elements, and calculate the sums accordingly. This works in approximately the same way as the first approach, but is a bit tidier.
Code:
from itertools import islice
def n_wise(iterable, n):
    return zip(*(islice(iterable, i, None) for i in range(n)))
print(sorted([sum(x) for n in range(len(rand_map)) for x in n_wise(rand_map, n+1)]))
Output:
[2, 2, 4, 4, 6, 6, 8, 8, 10, 11, 12, 14, 14, 15, 16, 19, 20, 22, 23, 24, 25, 29, 31, 33, 35, 35, 37, 39]
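A third variant, not from the answers above but arguably closest to the original loop, keeps a running sum per start index with itertools.accumulate instead of re-summing each slice. This is only a sketch; the name sums is illustrative.

```python
from itertools import accumulate

rand_map = [2, 2, 6, 6, 8, 11, 4]

# For each start index i, accumulate yields the running sums of rand_map[i:],
# i.e. the same values the inner loop appends in the original code.
sums = sorted(s for i in range(len(rand_map)) for s in accumulate(rand_map[i:]))
print(sums)
```

This avoids recomputing sum() over overlapping slices, which may matter for longer lists.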

Issue with printing clusters using igraph

I have a graph g and I want to find the clusters in this graph using igraph. Here's my code:
g = Graph.Read_Ncol('karate.txt', directed=False)
p = g.community_label_propagation()
I tried to print the clusters in 2 ways, first:
print(p)
second:
for idx, cluster in enumerate(p):
    print(cluster)
Here's the output of the first one:
[0] 0, 1, 3, 4, 6, 7, 10, 11, 12, 13, 17, 19, 21
[1] 2, 8, 31, 30, 9, 27, 28, 32, 33, 14, 15, 18, 20, 22, 23, 25, 29, 24, 26
[2] 5, 16
and the output of the second one is:
[0, 1, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15]
[2, 8, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]
[5, 22]
I was wondering why the clusters are different in these two outputs.
You are printing different data structures.
In the first, you are printing the entire clustering object. Apparently one of the igraph authors decided that printing the cluster number is a good idea, too.
In the second case, that would be your responsibility.
Note that the output of
a = [[1, 2], [3, 4]]
print(a)
for row in a:
    print(row)
is not the same either.

numpy meshgrid operations problems

Y, X = np.mgrid[-3:3:10j, -3:3:10j]
I've noticed that when applying certain operations on meshgrids like the one above, I get an error because the operations may not be compatible with numpy. Sometimes there is a numpy alternative, as for sin and cos, but not for every function, for example the quad function in scipy.integrate.
How do I get around this problem? I need to apply operations on the entire meshgrids.
Your question (with the follow-on comment) can be taken at least two different ways:
1. You have a function of multiple arguments, and you would like to be able to call that function in a manner that is syntactically similar to the broadcasted calls supported natively by numpy. Performance is not the issue, just the calling syntax of the function.
2. You have a function of multiple arguments that is to be evaluated on a sequence of numpy arrays, but the function is not implemented in such a manner that it can exploit the contiguous memory layout of numpy arrays. Performance is the issue; you would be happy to loop over the numpy arrays and call your function in a boring, plain old for-loop style, except that doing so is too slow.
For item 1, numpy provides a convenience function called vectorize, which takes a regular callable and returns a callable that can be called with numpy arrays as the arguments and will obey numpy's broadcasting rules.
Consider this contrived example:
def my_func(x, y):
    return x + 2*y
Now suppose I need to evaluate this function everywhere in a 2-D grid. Here is the plain old boring way:
Y, X = np.mgrid[0:10:1, 0:10:1]
Z = np.zeros_like(Y)
for i in range(Y.shape[0]):
    for j in range(Y.shape[1]):
        Z[i,j] = my_func(X[i,j], Y[i,j])
If we had a few different functions like my_func, it might be nice to generalize this process into a function that "mapped" a given function over the 2-D arrays.
import itertools
def array_map(some_func, *arg_arrays):
    output = np.zeros_like(arg_arrays[0])
    # One range per axis; itertools.imap only exists in Python 2, so use map
    coordinates = map(range, output.shape)
    # Unpack the ranges so product yields index tuples such as (i, j)
    for coord in itertools.product(*coordinates):
        args = [arg_array[coord] for arg_array in arg_arrays]
        output[coord] = some_func(*args)
    return output
Now we can see that array_map(my_func, X, Y) acts just like the nested for-loop:
In [451]: array_map(my_func, X, Y)
Out[451]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])
Now, wouldn't it be nice if we could call array_map(my_func) and leave off the extra array arguments? Instead just getting back a new function that was just waiting to do the required for-loops.
We can do this with functools.partial -- so we can write a handy little vectorizer like this:
import functools
def vectorizer(regular_function):
    awesome_function = functools.partial(array_map, regular_function)
    return awesome_function
and testing it out:
In [453]: my_awesome_func = vectorizer(my_func)
In [454]: my_awesome_func(X, Y)
Out[454]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])
Now my_awesome_func behaves as if you are able to call it directly on top of ndarrays!
I've overlooked many extra little performance details, bounds checking, etc., while making this toy version called vectorizer ... but luckily in numpy there is vectorize which already does just this!
In [455]: my_vectorize_func = np.vectorize(my_func)
In [456]: my_vectorize_func(X, Y)
Out[456]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
[ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
[ 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[16, 17, 18, 19, 20, 21, 22, 23, 24, 25],
[18, 19, 20, 21, 22, 23, 24, 25, 26, 27]])
Once again, as stressed in my earlier comments to the OP and in the documentation for vectorize -- this is not a speed optimization. In fact, the extra function calling overhead will be slower in some cases than just writing a for-loop directly. But, for cases when speed is not a problem, this method does allow you to make your custom functions adhere to the same calling conventions as numpy -- which can improve the uniformity of your library's interface and make the code more consistent and more readable.
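To tie this back to the question, here is a small sketch of np.vectorize wrapping a function that only accepts plain scalars. The function scalar_only is a made-up stand-in for something like a quad-based integral; it uses math.hypot and an if, both of which fail when handed whole arrays. The grid assumes the -3 to 3 range on both axes.

```python
import math

import numpy as np


def scalar_only(x, y):
    """Hypothetical scalar function: math.hypot and `if` reject whole arrays."""
    r = math.hypot(x, y)
    return math.sin(r) / r if r > 0 else 1.0


Y, X = np.mgrid[-3:3:10j, -3:3:10j]

# Calling scalar_only(X, Y) directly would raise a TypeError.
# np.vectorize loops over the elements and applies broadcasting rules.
vectorized = np.vectorize(scalar_only, otypes=[float])
Z = vectorized(X, Y)
print(Z.shape)
```

Passing otypes is optional; it just fixes the output dtype up front. As noted above, this buys calling convenience, not speed.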
A whole lot has already been written about item 2. If your problem is that you need to optimize your functions to leverage contiguous blocks of memory and bypass repeated dynamic type checking (the main features that numpy arrays add to Python lists), then here are a few links you may find helpful:
< http://pandas.pydata.org/pandas-docs/stable/enhancingperf.html >
< http://csl.name/C-functions-from-Python/ >
< https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow >
< nbviewer.ipython.org/url/jakevdp.github.io/downloads/notebooks/NumbaCython.ipynb >

Avoid chained numbers when printing lists

The code I wrote generates lists of 15 numbers from combinations. After sorting, some of these lists contain long runs of chained (consecutive) numbers, like:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 20]
I'm trying to think of a way to control this and print only lists with at most 4 chained numbers:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 20]
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) >>>> 11 Chained numbers: 1 to 11.
So it won't be stored in file.txt.
[1, 2, 3, 6, 7, 8, 9, 11, 12, 16, 17, 18, 19, 22, 23]
(1, 2, 3) >>>> 3 chained, OK
(6, 7, 8, 9) >>>> 4 chained, OK
(11, 12) >>>> 2 chained, OK
(16, 17, 18, 19) >>>> 4 chained, OK
(22, 23) >>>> 2 chained, OK
So this list will be stored in the file.
Could you give me an idea?
Here is the code I wrote; it generates a file with all possible combinations of 15 numbers from a list of 25:
import itertools
my_file = open('file.txt', 'w')
ALL_25 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
for subset in itertools.combinations(ALL_25, 15):
    sort_subsets = sorted(subset)
    my_file.write("{0}\n".format(sort_subsets))
    print(sort_subsets)
my_file.close()
If you convert the chain to the differences between consecutive elements, it is easier to identify incremental sequences, i.e., [1,2,3,4,7,8] gets converted to [1,1,1,3,1]. Further, by converting that into a string, it is easier to search for the pattern 111.
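import numpy as np
import re

def validate(seq):
    # Encode each difference as "1" (step of exactly 1) or "0" (any other gap),
    # so that a gap of 11 is not mistaken for two steps of 1.
    stl = "".join("1" if d == 1 else "0" for d in np.diff(seq))
    for x in re.findall("1+", stl):
        # More than 3 consecutive steps of 1 means more than 4 chained numbers
        if len(x) > 3:
            return False
    return True

print(validate([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 20]))
print(validate([1, 2, 3, 6, 7, 8, 9, 11, 12, 16, 17, 18, 19, 22, 23]))
Output: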
False
True
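The same check can also be done without numpy or regular expressions by tracking the longest run directly. This is only a sketch of an alternative; longest_run and the max_run parameter are names invented here, and the last lines use a deliberately tiny pool of numbers so the filter is quick to try.

```python
from itertools import combinations


def longest_run(seq):
    """Length of the longest run of consecutive integers in a sorted sequence."""
    best = current = 1 if seq else 0
    for prev, nxt in zip(seq, seq[1:]):
        # Extend the run on a step of exactly 1, otherwise start a new run.
        current = current + 1 if nxt - prev == 1 else 1
        best = max(best, current)
    return best


def validate(seq, max_run=4):
    return longest_run(seq) <= max_run


print(validate([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 20]))  # False
print(validate([1, 2, 3, 6, 7, 8, 9, 11, 12, 16, 17, 18, 19, 22, 23]))  # True

# Tiny demo of filtering combinations before storing them.
kept = [c for c in combinations(range(1, 7), 4) if validate(c, max_run=2)]
print(kept)
```

In the original script, the write to file.txt would simply be guarded by `if validate(sort_subsets):`.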
