Creating a nested list referencing specific ranges - python

I was challenged by a friend to make a simple program that asks a user to input a maximum value, and then a sample size (n). It then just uses randint to create a histogram in the shell using ASCII characters.
I can establish the class width and boundaries very easily. Where I'm having trouble is in understanding and implementing some sort of algorithm that will append all numbers that fall within a specific class to the histogram list to be printed. For example, if I have:
sample = [5, 1, 3, 9, 7, 13, 12, 5]
class_boundaries = [(1, 4), (4, 7), (7, 10), (10, 14)]
histogram = []
I just need to make a function that appends the sample values in the position that they would belong to in reference to the class boundaries. So for example, histogram[0] should return [1, 3]. I've been doing my best to try different solutions and understand how for-loop algorithms or list comprehensions function, but a practical explanation to my problem would be really helpful in my quest to better understand how to program. Thank you in advance!

sample = [5, 1, 3, 9, 7, 13, 12, 5]
class_boundaries = [(1, 4), (4, 7), (7, 10), (10, 14)]
classified = [[X for X in sample if LO <= X <= HI] for LO,HI in class_boundaries]
counts = [sum(LO <= X <= HI for X in sample) for LO,HI in class_boundaries]
Result: classified = [[1, 3], [5, 7, 5], [9, 7], [13, 12]], counts = [2, 3, 2, 2]
The computation of the counts doesn't need classified, so if that's all you need, skip the classified step.
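One thing to watch in the comprehension above: both comparisons are inclusive and adjacent classes share a boundary, so a value equal to a shared boundary (7 here) lands in two classes, and the counts add up to 9 for 8 samples. Below is a minimal sketch of one way around this. It assumes each class should be half-open (lower bound included, upper bound excluded), which is an assumption on my part and not something the question specifies. A value equal to the last upper bound (14) would then fall outside every class.

```python
sample = [5, 1, 3, 9, 7, 13, 12, 5]
class_boundaries = [(1, 4), (4, 7), (7, 10), (10, 14)]

# Half-open classes: lo <= x < hi, so every value falls into at most one class.
classified = [[x for x in sample if lo <= x < hi] for lo, hi in class_boundaries]
counts = [len(group) for group in classified]

print(classified)  # [[1, 3], [5, 5], [9, 7], [13, 12]]
print(counts)      # [2, 2, 2, 2]

# One row of the ASCII histogram per class.
for (lo, hi), count in zip(class_boundaries, counts):
    print(f"{lo:>3}-{hi:<3} {'*' * count}")
```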

Related

I would like to eliminate duplicate elements from nested lists, and then merge the nested lists

I need your help to figure out this problem.
When I use this code, I get this result:
input=[[1],[1,2],[5,7]]
output=[[1,2],[5,7]]
Tb2 = Tb[:1]
for t in Tb[1:]:
    if set(t).isdisjoint(Tb2[-1]):
        Tb2.append(t)
    else:
        Tb2[-1] = sorted({*t,*Tb2[-1]})
But I can't solve the problem when a sublist shares a number with a sublist other than the one directly before it.
input=[[2,3],[1,2],[5,7],[5,8],[7,8,9],[1]]
expected output=[[1,2,3],[5,7,8,9]]
Would you give me advice or help?
This looks like a connected components problem from graph theory, so you could use networkx for it:
import networkx as nx
from itertools import combinations
# input
lst = [[2, 3], [1, 2], [5, 7], [5, 8], [7, 8, 9], [1]]
# build graph
g = nx.Graph()
g.add_edges_from([edge for ls in lst for edge in combinations(ls, 2)])
# compute components
components = nx.algorithms.components.connected_components(g)
res = list(components)
print(res)
Output
[{1, 2, 3}, {8, 9, 5, 7}]
The idea is to build an edge between each pair of elements of the same list. This is achieved with the following list comprehension:
[edge for ls in lst for edge in combinations(ls, 2)]
# [(2, 3), (1, 2), (5, 7), (5, 8), (7, 8), (7, 9), (8, 9)]
Once that is done simply run the connected_components algorithm on the graph.
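If pulling in networkx is not an option, the same grouping can be done with plain sets. The sketch below is not the approach from the answer above; it keeps a list of disjoint groups and lets each new sublist absorb every group it overlaps. The function name merge_overlapping is made up for illustration.

```python
def merge_overlapping(lists):
    """Merge sublists that share at least one element, directly or transitively."""
    merged = []  # disjoint sets built so far
    for items in lists:
        current = set(items)
        remaining = []
        for group in merged:
            if group.isdisjoint(current):
                remaining.append(group)   # untouched by this sublist
            else:
                current |= group          # absorb every overlapping group
        remaining.append(current)
        merged = remaining
    return sorted(sorted(group) for group in merged)


print(merge_overlapping([[2, 3], [1, 2], [5, 7], [5, 8], [7, 8, 9], [1]]))
# [[1, 2, 3], [5, 7, 8, 9]]
```

Unlike the edge-based graph, this also keeps a single-element sublist that overlaps nothing, since it does not rely on pairs.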
for t in Tb:
    for j in range(len(Tb2)):
        if set(t).isdisjoint(Tb2[j]):
            Tb2.append(t)
        else:
            Tb2[j] = sorted({*t,*Tb2[j]})
Output = [[1, 2, 3], [5, 7, 8, 9], [5, 7, 8, 9], [7, 8, 9], [1], [1], [1]]
I can't eliminate the duplicate sublists and the smaller sublists that are already contained in a larger one.

Detect ranges in Python

I'm trying to solve this exercise in my coursework:
Create a function named detect_ranges that gets a list of integers as a parameter.
The function should then sort this list, and transform the list into another list where pairs are used for all the detected intervals.
So 3,4,5,6 is replaced by the pair (3,7).
Numbers that are not part of any interval result in just single numbers.
The resulting list consists of these numbers and pairs, separated by commas. An example of how this function works:
print(detect_ranges([2,5,4,8,12,6,7,10,13]))
[2,(4,9),10,(12,14)]
I couldn't comprehend the exercise and can't think of how to detect a range. Do you have any hints or tips?
Here is another way of doing this. It is not as efficient as the other answer, but since this is an exercise, it may be easier to follow.
It uses Python's built-in zip function for the steps explained below; see the zip documentation to learn more about it.
1. First sort the list data, so you get: [2, 4, 5, 6, 7, 8, 10, 12, 13]
2. Then find the differences between consecutive values in the list: (4-2), (5-4), and so on. If a difference is <= 1, the value belongs to the same range as the previous one:
(Also insert a 0 at the front to account for the first element, so that the resulting list has the same length as the original list.)
>>> diff = [j-i for i, j in zip(lst[:-1], lst[1:])]
>>> diff.insert(0, 0)
>>> diff
[0, 2, 1, 1, 1, 1, 2, 2, 1]
3. Now get the positions in the list above where the difference is >= 2. Each of these marks the start of a new range:
(Again, insert a 0 at the front so that the first element is picked up as the start of a range.)
>>> ind = [i for i,v in enumerate(diff) if v >= 2]
>>> ind.insert(0, 0)
>>> ind
[0, 1, 6, 7]
So the slices are 0 to 1, 1 to 6, 6 to 7, and 7 to the end of the sorted list.
4. Group the elements together that will form ranges, using the ind list obtained:
>>> groups = [lst[i:j] for i,j in zip(ind, ind[1:]+[None])]
>>> groups
[[2], [4, 5, 6, 7, 8], [10], [12, 13]]
5. Finally obtain your desired ranges:
>>> ranges = [(i[0],i[-1]+1) if len(i)>1 else i[0] for i in groups]
>>> ranges
[2, (4, 9), 10, (12, 14)]
Putting it all in a function detect_ranges:
def detect_ranges(lst):
    lst = sorted(lst)
    diff = [j-i for i, j in zip(lst[:-1], lst[1:])]
    diff.insert(0, 0)
    ind = [i for i,v in enumerate(diff) if v >= 2]
    ind.insert(0, 0)
    groups = [lst[i:j] for i,j in zip(ind, ind[1:]+[None])]
    ranges = [(i[0],i[-1]+1) if len(i)>1 else i[0] for i in groups]
    return ranges
Examples:
>>> lst = [2,6,1,9,3,7,12,45,46,13,90,14,92]
>>> detect_ranges(lst)
[(1, 4), (6, 8), 9, (12, 15), (45, 47), 90, 92]
>>> lst = [12,43,43,11,4,3,6,6,9,9,10,78,32,23,22,98]
>>> detect_ranges(lst)
[(3, 5), (6, 7), (9, 13), (22, 24), 32, (43, 44), 78, 98]
Iterate through the elements and save the start of each interval. This version is a generator and expects the input to be sorted already.
def detect_ranges(xs):
    it = iter(xs)
    try:
        start = next(it)
    except StopIteration:
        return
    prev = start
    for x in it:
        if prev + 1 != x:
            yield start, prev + 1
            start = x
        prev = x
    yield start, prev + 1
Usage:
>>> xs = [2, 4, 5, 6, 7, 8, 10, 12, 13]
>>> ranges = list(detect_ranges(xs))
>>> ranges
[(2, 3), (4, 9), (10, 11), (12, 14)]
If you want to reduce single item intervals like (2, 3) to 2, you can do:
>>> ranges = [a if a + 1 == b else (a, b) for a, b in ranges]
>>> ranges
[2, (4, 9), 10, (12, 14)]
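For completeness, here is a sketch that folds the sorting, the interval detection, and the single-number reduction into one function with the signature the exercise asks for. Dropping duplicates with set() is my own assumption; the exercise text does not say how repeated numbers should be treated.

```python
def detect_ranges(numbers):
    """Return sorted numbers with runs of consecutive values as (start, end + 1) pairs."""
    values = sorted(set(numbers))  # assumption: duplicates are ignored
    result = []
    i = 0
    while i < len(values):
        j = i
        # Extend the run while the next value is exactly one larger.
        while j + 1 < len(values) and values[j + 1] == values[j] + 1:
            j += 1
        if j == i:
            result.append(values[i])                   # isolated number
        else:
            result.append((values[i], values[j] + 1))  # half-open pair
        i = j + 1
    return result


print(detect_ranges([2, 5, 4, 8, 12, 6, 7, 10, 13]))
# [2, (4, 9), 10, (12, 14)]
```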

generate a list of permutations that preserve a given partitioning (context: Graph Isomorphism)

I'm working on a Python program that tests two given networkx graphs G and H for an isomorphism by using a brute force method. Each node in each graph has been assigned a label and color attribute, and the program should test all possible bijections between graph G, for which the labeling is fixed, and graph H, for which the labeling can be varied. In addition, I only need to examine the bijections that map each node of color 'i' in graph G onto a node in H which also has color 'i'. To that end, I've created a class which inherits all the methods/attributes from nx.Graph, and written several methods of my own.
So far what I've done is gone through both graphs, and created a dictionary which gives the possible valid mappings for each node in G onto H.
E.g. for the graph G == 1-2-3
the coloring would be: color_g = {1: 1, 2: 2, 3: 1} because '1' and '3' have the same degree, and '2' has a different degree.
If graph H == 2-1-3 then
color_h = {2: 1, 1: 2, 3: 1}
and when I run a group_by_color function to give possible mappings from G to H, I get the following dictionary:
map = {1: [2, 3], 2: [1], 3: [2, 3]}
This means that, due to the color partitioning, node '1' from G could be mapped onto either '2' or '3' from H, '2' from G can only be mapped onto '1' from H, and so on.
Here is the problem: I am struggling to generate a list of all valid permutations from G to H that preserve the partitioning given by the coloring. I am well aware of Python's permutations function and used it in a previous version of the brute force method where I didn't consider the color. That meant the list of permutations was significantly larger (and the run-time much slower), but the implementation was also easier. Now I want to speed things up by only considering permutations which are possible according to the given colorings.
The question: How can I take my map dictionary and use it to generate the bijection functions that are color-conserving/preserving (German: 'farbe-erhaltend')? Or would you suggest a different method altogether?
Some other facts:
the nodes in both graphs are labeled consecutively and ascending
the 'colors' I'm using are numbers because the graphs can become arbitrarily large.
I'd be grateful for any help,
ItsAnApe
To answer the algorithmic part of your question: Say your partition has k cells: C_1, ..., C_k. There is a 1 to 1 correspondence between permutations of the overall set that preserve the partition and the Cartesian product P_1 x P_2 x ... x P_k where P_i is the set of permutations of the cell C_i. itertools contains a method for generating the Cartesian product. Exactly how you want to use a tuple like (p_1, p_2, ..., p_k) where each p_i is a permutation of cell C_i depends on your purposes. You could write a function to combine them into a single permutation if you want -- or just iterate over them if you are going to be using the permutations on a cell by cell basis anyway.
The following is a proof of concept. It represents a partition as a list of tuples, where each tuple represents a cell, and it lists all permutations of the overall set that preserve the partition. In the test case it lists the 2x6x2 = 24 permutations of {1,2,3,4,5,6,7} which preserve the partition [(1,4), (2,3,5), (6,7)]. No need to step through and filter all 7! = 5040 permutations:
#list_perms.py
import itertools

def combinePermutations(perms):
    a = min(min(p) for p in perms)
    b = max(max(p) for p in perms)
    d = {i:i for i in range(a,b+1)}
    for p in perms:
        pairs = zip(sorted(p),p)
        for i,j in pairs:
            d[i] = j
    return tuple(d[i] for i in range(a,b+1))

def permute(cell):
    return [p for p in itertools.permutations(cell)]

def listGoodPerms(cells):
    products = itertools.product(*(permute(cell) for cell in cells))
    return [combinePermutations(perms) for perms in products]

#to test:
myCells = [(1,4), (2,3,5), (6,7)]
for p in listGoodPerms(myCells): print(p)
The output when the module is run:
(1, 2, 3, 4, 5, 6, 7)
(1, 2, 3, 4, 5, 7, 6)
(1, 2, 5, 4, 3, 6, 7)
(1, 2, 5, 4, 3, 7, 6)
(1, 3, 2, 4, 5, 6, 7)
(1, 3, 2, 4, 5, 7, 6)
(1, 3, 5, 4, 2, 6, 7)
(1, 3, 5, 4, 2, 7, 6)
(1, 5, 2, 4, 3, 6, 7)
(1, 5, 2, 4, 3, 7, 6)
(1, 5, 3, 4, 2, 6, 7)
(1, 5, 3, 4, 2, 7, 6)
(4, 2, 3, 1, 5, 6, 7)
(4, 2, 3, 1, 5, 7, 6)
(4, 2, 5, 1, 3, 6, 7)
(4, 2, 5, 1, 3, 7, 6)
(4, 3, 2, 1, 5, 6, 7)
(4, 3, 2, 1, 5, 7, 6)
(4, 3, 5, 1, 2, 6, 7)
(4, 3, 5, 1, 2, 7, 6)
(4, 5, 2, 1, 3, 6, 7)
(4, 5, 2, 1, 3, 7, 6)
(4, 5, 3, 1, 2, 6, 7)
(4, 5, 3, 1, 2, 7, 6)
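The same Cartesian-product idea can be applied directly to the two color dictionaries from the question, producing G-to-H mappings instead of permutations of a single set. This is only a sketch: the function name color_preserving_bijections is hypothetical, and it assumes that colors are sortable (numbers, as in the question) and that color_g and color_h have the shape shown there.

```python
from collections import defaultdict
from itertools import permutations, product


def color_preserving_bijections(color_g, color_h):
    """Yield dicts mapping every node of G to a node of H with the same color."""
    cells_g, cells_h = defaultdict(list), defaultdict(list)
    for node, color in color_g.items():
        cells_g[color].append(node)
    for node, color in color_h.items():
        cells_h[color].append(node)

    # If the color classes differ in size, no color-preserving bijection exists.
    sizes_g = {color: len(nodes) for color, nodes in cells_g.items()}
    sizes_h = {color: len(nodes) for color, nodes in cells_h.items()}
    if sizes_g != sizes_h:
        return

    colors = sorted(cells_g)
    # One permutation of H's nodes per color class, combined across classes.
    for choice in product(*(permutations(cells_h[c]) for c in colors)):
        mapping = {}
        for color, images in zip(colors, choice):
            mapping.update(zip(cells_g[color], images))
        yield mapping


color_g = {1: 1, 2: 2, 3: 1}
color_h = {2: 1, 1: 2, 3: 1}
for mapping in color_preserving_bijections(color_g, color_h):
    print(mapping)
```

For the example graphs this yields two candidate mappings (node 1 to 2 and node 3 to 3, or node 1 to 3 and node 3 to 2, with node 2 always mapped to 1). Each candidate still has to be checked against the edges to decide whether it is an isomorphism.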

unexpected result sorting list with mixed items tuples and lists

tmp = [
    (1, 2, 3),
    (4, 5, 6),
    [7, 8, 9],
    [10, 11, 12],
]
print tmp
tmp.sort()
print tmp
results in:
[(1, 2, 3), (4, 5, 6), [7, 8, 9], [10, 11, 12]]
[[7, 8, 9], [10, 11, 12], (1, 2, 3), (4, 5, 6)]
Apparently lists get precedence over tuples.
Is this correct?
In Python 2, yes. The documentation (https://docs.python.org/2/reference/expressions.html#not-in) says:
Most other objects of built-in types compare unequal unless they are the same object; the choice whether one object is considered smaller or larger than another one is made arbitrarily but consistently within one execution of a program.
The exact order is implementation dependent, though. The documentation describes it as a CPython implementation detail:
Objects of different types except numbers are ordered by their type
names; objects of the same types that don’t support proper comparison
are ordered by their address.
Since the type name "list" sorts before "tuple", the lists come first.
In Python 3, this is fixed, so that comparing tuples and lists gives
TypeError: unorderable types: tuple() > list().
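In Python 3 a mixed list like this has to be sorted with an explicit key. The sketch below shows two possible keys; which one is appropriate depends on what order is actually wanted, so treat both as illustrations. The exact wording of the TypeError message varies between Python 3 versions.

```python
tmp = [(1, 2, 3), (4, 5, 6), [7, 8, 9], [10, 11, 12]]

# Without a key, Python 3 refuses to compare a list with a tuple.
try:
    sorted(tmp)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Compare by content only, ignoring whether an item is a list or a tuple.
by_content = sorted(tmp, key=tuple)

# Reproduce the Python 2 result: order by type name first, then by content.
by_type_name = sorted(tmp, key=lambda seq: (type(seq).__name__, tuple(seq)))

print(by_content)    # [(1, 2, 3), (4, 5, 6), [7, 8, 9], [10, 11, 12]]
print(by_type_name)  # [[7, 8, 9], [10, 11, 12], (1, 2, 3), (4, 5, 6)]
```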

Algorithm to find the least difference between lists

I have been trying to understand the algorithm used here to compare two lists, implemented in this commit. The intention, as I understand it, is to find the least amount of changes needed to create dst from src. These changes are later listed as a sequence of patch commands. I am not a Python developer, and I learned about generators to understand the flow and how the recursion is done. But now I can't make much sense of the output generated by the _split_by_common_seq method. I fed it a few different lists, and the output is shown below. Can you please help me understand why the output looks the way it does in these cases?
in the reference case,
src [0, 1, 2, 3]
dst [1, 2, 4, 5]
[[(0, 1), None], [(3, 4), (2, 4)]]
I cannot see how it is related to the picture in the doc. Why (3,4) and (2,4) on the right? Is it a standard algorithm?
test cases
src [1, 2, 3]
dst [1, 2, 3, 4, 5, 6, 7, 8]
[[None, None], [None, (3, 8)]]
src [1, 2, 3, 4, 5]
dst [1, 2, 3, 4, 5, 6, 7, 8]
[[None, None], [None, (5, 8)]]
src [4, 5]
dst [1, 2, 3, 4, 5, 6, 7, 8]
[[None, (0, 3)], [None, (5, 8)]]
src [0, 1, 2, 3]
dst [1, 2, 4, 5]
[[(0, 1), None], [(3, 4), (2, 4)]]
src [0, 1, 2, 3]
dst [1, 2, 3, 4, 5]
[[(0, 1), None], [None, (3, 5)]]
src [0, 1, 3]
dst [1, 2, 4, 5]
[[(0, 1), None], [(2, 3), (1, 4)]]
For future reference, here's the code (taken from the aforementioned repository):
import itertools

def _longest_common_subseq(src, dst):
    """Returns pair of ranges of longest common subsequence for the `src`
    and `dst` lists.

    >>> src = [1, 2, 3, 4]
    >>> dst = [0, 1, 2, 3, 5]
    >>> # The longest common subsequence for these lists is [1, 2, 3]
    ... # which is located at (0, 3) index range for src list and (1, 4) for
    ... # dst one. Tuple of these ranges we should get back.
    ... assert ((0, 3), (1, 4)) == _longest_common_subseq(src, dst)
    """
    lsrc, ldst = len(src), len(dst)
    drange = list(range(ldst))
    matrix = [[0] * ldst for _ in range(lsrc)]
    z = 0  # length of the longest subsequence
    range_src, range_dst = None, None
    for i, j in itertools.product(range(lsrc), drange):
        if src[i] == dst[j]:
            if i == 0 or j == 0:
                matrix[i][j] = 1
            else:
                matrix[i][j] = matrix[i-1][j-1] + 1
            if matrix[i][j] > z:
                z = matrix[i][j]
            if matrix[i][j] == z:
                range_src = (i-z+1, i+1)
                range_dst = (j-z+1, j+1)
        else:
            matrix[i][j] = 0
    return range_src, range_dst
def split_by_common_seq(src, dst, bx=(0, -1), by=(0, -1)):
    """Recursively splits the `dst` list onto two parts: left and right.
    The left part contains differences on left from common subsequence,
    same as the right part by for other side.

    To easily understand the process let's take two lists: [0, 1, 2, 3] as
    `src` and [1, 2, 4, 5] for `dst`. If we've tried to generate the binary tree
    where nodes are common subsequence for both lists, leaves on the left
    side are subsequence for `src` list and leaves on the right one for `dst`,
    our tree would looks like::

            [1, 2]
           /     \
        [0]       []
                 /  \
              [3]   [4, 5]

    This function generate the similar structure as flat tree, but without
    nodes with common subsequences - since we're don't need them - only with
    left and right leaves::

              []
           /     \
        [0]       []
                 /  \
              [3]   [4, 5]

    The `bx` is the absolute range for currently processed subsequence of
    `src` list. The `by` means the same, but for the `dst` list.
    """
    # Prevent useless comparisons in future
    bx = bx if bx[0] != bx[1] else None
    by = by if by[0] != by[1] else None
    if not src:
        return [None, by]
    elif not dst:
        return [bx, None]
    # note that these ranges are relative for processed sublists
    x, y = _longest_common_subseq(src, dst)
    if x is None or y is None:  # no more any common subsequence
        return [bx, by]
    # the `dst` ranges are offset by by[0], the `src` ranges by bx[0]
    return [split_by_common_seq(src[:x[0]], dst[:y[0]],
                                (bx[0], bx[0] + x[0]),
                                (by[0], by[0] + y[0])),
            split_by_common_seq(src[x[1]:], dst[y[1]:],
                                (bx[0] + x[1], bx[0] + len(src)),
                                (by[0] + y[1], by[0] + len(dst)))]
It is a cute algorithm, but I don't think it's a "known" one. It's a clever way of comparing lists, and probably not the first time that someone thought of it, but I had never seen it before.
Basically, the output is telling you the ranges that look different in src and dst.
The function always returns a list with 2 lists. The first list refers to the elements in src and dst that are on the left side of the longest common subsequence between src and dst; the second refers to the elements that are on the right side of the longest common subsequence. Each of these lists holds a pair of tuples. Tuples represent a range in the list - (x, y) denotes the elements you would get if you performed lst[x:y]. From this pair of tuples, the first tuple is the range from src, the second tuple is the range from dst.
At each step, the algorithm computes the ranges of src and dst that are to the left of the longest common subsequence and to the right of the longest common subsequence between src and dst.
Let's look at your first example to clear things up:
src [0, 1, 2, 3]
dst [1, 2, 4, 5]
The longest common subsequence between src and dst is [1, 2]. In src, the range (0, 1) defines the elements that are immediately to the left of [1, 2]; in dst, that range is empty, because there is nothing before [1, 2]. So, the first list will be [(0, 1), None].
To the right of [1, 2], in src, we have the elements in the range (3, 4), and in dst we have 4 and 5, which are represented by the range (2, 4). So the second list will be [(3, 4), (2, 4)].
And there you go:
[[(0, 1), None], [(3, 4), (2, 4)]]
How does this relate to the tree in the comments?
The leaves in the tree use a different notation: instead of a tuple describing a range, the actual elements in that range are shown. In fact, [0] is the only element in the range (0, 1) in src. The same applies to the rest.
Once you get this, the other examples you posted should be pretty easy to follow. But note that the output can become more complex if there is more than one common subsequence: the algorithm finds the common subsequences in order of nonincreasing length, and since each invocation returns a list with 2 elements, you will get nested lists in cases like these. Consider:
src = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
dst = [46, 1, 2, 3, 4, 5, 99, 98, 97, 5, 6, 7, 30, 31, 32, 11, 12, 956]
This outputs:
[[(0, 1), (0, 1)], [[[None, (6, 10)], [(8, 11), (12, 15)]], [(13, 14), (17, 18)]]]
The second list is nested because there was more than one recursion level (your previous examples immediately fell on a base case).
The explanation shown before applies recursively to each list: the second list in [[(0, 1), (0, 1)], [[[None, (6, 10)], [(8, 11), (12, 15)]], [(13, 14), (17, 18)]]] shows the differences in the lists to the right of the longest common subsequence.
The longest common subsequence is [1, 2, 3, 4, 5]. To the left of [1, 2, 3, 4, 5], both lists are different in the first element (the ranges are equal and easy to check).
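Now, the procedure applies recursively. For the right side, there is a new recursive call, and src and dst become:
src = [6, 7, 8, 9, 10, 11, 12, 13]
dst = [99, 98, 97, 5, 6, 7, 30, 31, 32, 11, 12, 956]
# Two common runs have length 2: [6, 7] and [11, 12]. On a tie the later match wins, so LCS = [11, 12]
# LCS = [11, 12]; Call on the left
src = [6, 7, 8, 9, 10]
dst = [99, 98, 97, 5, 6, 7, 30, 31, 32]
# LCS = [6, 7]; Call on the left
src = []
dst = [99, 98, 97, 5]
# LCS = [6, 7]; Call on the right
src = [8, 9, 10]
dst = [30, 31, 32]
# LCS = [11, 12]; Call on the right
src = [13]
dst = [956]
Here the longest common subsequence is [11, 12], because _longest_common_subseq keeps the last match it finds among equally long ones. Its left side still contains [6, 7], so there is another level of recursion on the left: for src = [] and dst = [99, 98, 97, 5] the recursion stops immediately because src is empty, and for src = [8, 9, 10] and dst = [30, 31, 32] it stops because the lists share nothing. On the right, [13] and [956] have nothing in common either. This matches the nesting of the output: [[None, (6, 10)], [(8, 11), (12, 15)]] is the left side of [11, 12], and [(13, 14), (17, 18)] is its right side.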
Each nested list recursively represents the differences on the sub-list with which the procedure was invoked, but note that the indices always refer to positions in the original list (due to the way arguments for bx and by are passed - note that they always accumulate since the beginning).
The key point here is that you will get nested lists linearly proportional to the depth of the recursion, and in fact, you can actually tell how many common subsequences exist in the original lists just by looking at the nesting level.
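To read such an output more easily, the ranges can be turned back into the elements they denote, which gives the leaves of the tree from the docstring. The helpers below are hypothetical and not part of the library; they only assume the output shape described above, where an inner node is a list of two lists and a leaf is a pair of ranges (or None).

```python
def iter_leaves(tree):
    """Yield (src_range, dst_range) leaves of the nested result, left to right."""
    first, second = tree
    if isinstance(first, list) and isinstance(second, list):
        # Inner node: both entries are results of recursive calls.
        yield from iter_leaves(first)
        yield from iter_leaves(second)
    else:
        # Leaf: a pair of (start, stop) tuples, either of which may be None.
        yield first, second


def elements(seq, rng):
    """Return the slice of seq described by rng, or [] for an empty range."""
    return [] if rng is None else list(seq[rng[0]:rng[1]])


def describe(tree, src, dst):
    return [(elements(src, s), elements(dst, d)) for s, d in iter_leaves(tree)]


src = [0, 1, 2, 3]
dst = [1, 2, 4, 5]
result = [[(0, 1), None], [(3, 4), (2, 4)]]
print(describe(result, src, dst))
# [([0], []), ([3], [4, 5])]
```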
