Create tasks from lists - Airflow

I have a list of lists of tasks, like
[[a,b,c,d],[e,f,g,h],[i,j,k,l]]
and I want to set up the task dependencies in a DAG like this:
a >> b >> c >> d
e >> f >> g >> h
i >> j >> k >> l
Any help is appreciated.

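You can use the handy chain() function to do this in one line. When chain() is given consecutive lists of equal length, it links them element by element, so the call below produces a >> b >> c >> d, e >> f >> g >> h and i >> j >> k >> l.
from airflow.models.baseoperator import chain
# DummyOperator lives here in Airflow 2.0-2.3; later releases provide EmptyOperator in airflow.operators.empty
from airflow.operators.dummy import DummyOperator
[a,b,c,d,e,f,g,h,i,j,k,l] = [DummyOperator(task_id=f"{i}") for i in "abcdefghijkl"]
chain([a,e,i], [b,f,j], [c,g,k], [d,h,l])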

Assuming a, b, c, ... are operators, the code below should do the job (using a mock in place of the Airflow operator):
class Operator:
    def __init__(self, name):
        self.name = name

    def set_downstream(self, other):
        print(f'{self}: setting {other} as downstream')

    def __str__(self) -> str:
        return self.name

a = Operator('a')
b = Operator('b')
c = Operator('c')
d = Operator('d')
e = Operator('e')
f = Operator('f')
lst = [[a, b, c], [e, f, d]]
for oper_lst in lst:
    for i in range(0, len(oper_lst) - 1):
        oper_lst[i].set_downstream(oper_lst[i + 1])
Output:
a: setting b as downstream
b: setting c as downstream
e: setting f as downstream
f: setting d as downstream

op_lists = [[a,b,c,d],[e,f,g,h],[i,j,k,l]]
for op_list in op_lists:
    for i in range(len(op_list) - 1):
        op_list[i] >> op_list[i + 1]
EDIT: I didn't see balderman's answer. He was first.
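
A minimal sketch of the same pairwise loop, written with zip instead of index arithmetic. Task is a hypothetical stand-in for an Airflow operator (it only mimics >>), and link_chains is an illustrative name, so the snippet runs without Airflow installed:

```python
class Task:
    """Hypothetical stand-in for an Airflow operator; only `>>` is mimicked."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # Record `other` as a downstream task and return it so calls can be chained.
        self.downstream.append(other)
        return other


def link_chains(op_lists):
    """Link every inner list into its own linear chain."""
    for op_list in op_lists:
        # zip pairs each task with its successor: (a, b), (b, c), (c, d)
        for upstream, downstream in zip(op_list, op_list[1:]):
            upstream >> downstream


tasks = {name: Task(name) for name in "abcdefghijkl"}
op_lists = [[tasks[c] for c in row] for row in ("abcd", "efgh", "ijkl")]
link_chains(op_lists)

# Collect the resulting (upstream, downstream) pairs for inspection.
edges = [(t.task_id, d.task_id) for t in tasks.values() for d in t.downstream]
print(edges)
```

With real operators the body of link_chains should work unchanged, since Airflow operators support >> directly.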

With a, ..., l being nodes and >> describing the arcs, we can rewrite the nested list as a dictionary and treat that dictionary as a directed acyclic graph. Depending on your data (for example, the nested lists and the connections between them) you can adapt the code. In the example above, the three lists become three separate chains. One could convert the list to a dictionary like this:
graph = [["a","b","c","d"],["e","f","g","h"],["i","j","k","l"]]
dic_list = []
z = {}
for i in range(len(graph)):
    # map each node to its successor: a -> b, b -> c, c -> d
    b = dict(zip(graph[i], graph[i][1:]))
    dic_list.append(b)
    z = {**z, **dic_list[i]}
and then use this standard path-finding code from the Python documentation to find the paths through it:
def find_all_paths(graph, start, end, path=[]):
    path = path + [start]
    if start == end:
        return [path]
    if start not in graph:  # dict.has_key() no longer exists in Python 3
        return []
    paths = []
    # each value in z is a single successor name; iterating over it
    # works here only because the names are one character long
    for node in graph[start]:
        if node not in path:
            newpaths = find_all_paths(graph, node, end, path)
            for newpath in newpaths:
                paths.append(newpath)
    return paths
Does this answer your question?
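
A small self-contained sketch of the idea above, assuming each node may have several successors (so the dictionary values are lists rather than single names). The helper name successors is hypothetical, and find_all_paths is a lightly adapted variant of the function shown above, not the original:

```python
def successors(chains):
    """Build {node: [next nodes]} from a list of linear chains."""
    graph = {}
    for nodes in chains:
        for src, dst in zip(nodes, nodes[1:]):
            graph.setdefault(src, []).append(dst)
    return graph


def find_all_paths(graph, start, end, path=()):
    """Return every path from start to end as a list of node names."""
    path = path + (start,)
    if start == end:
        return [list(path)]
    paths = []
    for node in graph.get(start, []):
        if node not in path:  # skip nodes already on this path
            paths.extend(find_all_paths(graph, node, end, path))
    return paths


graph = successors([["a", "b", "c", "d"], ["e", "f", "g", "h"], ["i", "j", "k", "l"]])
print(find_all_paths(graph, "a", "d"))  # one path along the first chain
print(find_all_paths(graph, "a", "h"))  # no path: the chains are not connected
```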

Related

Split two lists of lists into subgraphs using python

I have a network as a list of lists, where the first list is the origin nodes and the second list is the destination nodes, and then the two lists combined tell you which origins have an edge to which destinations.
So essentially I have this:
edge_index = [[0,1,2,3,5,6,5,9,10,11,12,12,13],[1,2,3,4,6,7,8,10,11,10,13,12,9]]
And I want to split this list structure into:
[[0,1,2,3,5,6,5],[9,10,11,12,12,13]]
[[1,2,3,4,6,7,8],[10,11,10,13,12,9]]
i.e. there is no link between 8 and 9, so it's a new subgraph.
I cannot use networkx because it does not seem to give me the right number of subgraphs (I know how many networks there should be in advance). So I wanted to subgraph the list using a different method, and then see if I get the same number as NetworkX or not.
I wrote this code:
edge_index = [[0,1,2,3,5,6,5],[1,2,3,4,6,7,8]]
origins_split = edge_index[0]
dest_split = edge_index[1]
master_list_of_all_graph_nodes = [0,1,2,3,4,5,6,7,8] ##for testing
list_of_graph_nodes = []
list_of_origin_edges = []
list_of_dest_edges = []
graph_nodes = []
graph_edge_origin = []
graph_edge_dest = []
targets_list = []
for o,d in zip(origins_split,dest_split): #change
    if o not in master_list_of_all_graph_nodes:
        if d not in master_list_of_all_graph_nodes:
            nodes = [o,d]
            origin = [o]
            dest = [d]
            graph_nodes.append(nodes)
            graph_edge_origin.append(origin)
            graph_edge_dest.append(dest)
        elif d in master_list_of_all_graph_nodes:
            for index,graph_node_list in enumerate(graph_nodes):
                if d in graph_node_list:
                    origin_list = graph_edge_origin[index]
                    origin_list.append(o)
                    dest_list.append(d)
                    master_list_of_all_graph_nodes.append(o)
    if d not in master_list_of_all_graph_nodes:
        if o in master_list_of_all_graph_nodes:
            for index,graph_node_list in enumerate(graph_nodes):
                if o in graph_node_list:
                    origin_list = graph_edge_origin[index]
                    origin_list.append(o)
                    dest_list.append(d)
                    master_list_of_all_graph_nodes.append(d)
    if o in master_list_of_all_graph_nodes:
        if d in master_list_of_all_graph_nodes:
            o_index = ''
            d_index = ''
            for index,graph_node_list in enumerate(graph_nodes):
                if d in graph_node_list:
                    d_index = index
                if o in graph_node_list:
                    o_index = index
            if o_index == d_index:
                graph_edge_origin[o_index].append(o)
                graph_edge_dest[d_index].append(d)
                master_list_of_all_graph_nodes.append(o)
                master_list_of_all_graph_nodes.append(d)
            else:
                o_list = graph_edge_origin[o_index]
                d_list = graph_edge_dest[d_index]
                node_o_list = node_list[o_index]
                node_d_list = node_list[d_index]
                new_node_list = node_o_list + node_d_list
                node_list.remove(node_o_list)
                node_list.remove(node_d_list)
                graph_edge_origin.remove(o_list)
                graph_edge_dest.remove(d_list)
                new_origin_list = o_list.append(o)
                new_dest_list = d_list.append(d)
                graph_nodes.append(new_node_list)
                graph_edge_dest.append(new_dest_list)
                graph_edge_origin.append(new_origin_list)
                master_list_of_all_graph_nodes.append(o)
                master_list_of_all_graph_nodes.append(d)
print(graph_nodes)
print(graph_edge_dest)
print(graph_edge_origin)
And I get the error:
graph_edge_origin[o_index].append(o)
TypeError: list indices must be integers or slices, not str
I was wondering if someone could demonstrate where I'm going wrong, but also I feel like I'm doing this really inefficiently so if someone could demonstrate a better method I'd appreciate it. I can see other questions like this, but not one I can specifically figure out how to apply here.

In this line:
    graph_edge_origin[o_index].append(o)
o_index is a string. Most likely it is still the empty string it was initialised to, because graph_nodes is still empty at that point and the for-loop that would assign an integer index is never entered.
In general, either set a breakpoint on the failing line and inspect the variables in your debugger, or print the variables just before the failing line.
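
As for a simpler method, one option is a union-find (disjoint-set) pass over the edges. The sketch below is an assumption about what "subgraph" means here: it groups edges by weakly connected component, ignoring edge direction. The function name split_components is hypothetical. Note that under this definition the sample data splits into three groups rather than two, because no edge joins nodes 0-4 to nodes 5-8:

```python
def split_components(edge_index):
    """Group the edges of [origins, dests] by weakly connected component."""
    origins, dests = edge_index
    parent = {}

    def find(x):
        # Follow parent links to the representative, halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # First pass: merge the two endpoints of every edge.
    for o, d in zip(origins, dests):
        root_o, root_d = find(o), find(d)
        if root_o != root_d:
            parent[root_d] = root_o

    # Second pass: bucket each edge under its component's representative.
    groups = {}
    for o, d in zip(origins, dests):
        comp = groups.setdefault(find(o), [[], []])
        comp[0].append(o)
        comp[1].append(d)
    return list(groups.values())


edge_index = [[0, 1, 2, 3, 5, 6, 5, 9, 10, 11, 12, 12, 13],
              [1, 2, 3, 4, 6, 7, 8, 10, 11, 10, 13, 12, 9]]
for origins, dests in split_components(edge_index):
    print(origins, dests)
```

Each returned item is an [origins, dests] pair in the same layout as the input, so the number of items is the number of subgraphs.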

Merging overlapping string sequences in a list

I am trying to figure out how to merge overlapping strings in a list together, for example for
['aacc','accb','ccbe']
I would get
['aaccbe']
The following code works for the example above; however, it does not give me the desired result in the following case:
s = ['TGT','GTT','TTC','TCC','CCC','CCT','CCT','CTG','TGA','GAA','AAG','AGC','GCG','CGT','TGC','GCT','CTC','TCT','CTT','TTT','TTT','TTC','TCA','CAT','ATG','TGG','GGA','GAT','ATC','TCT','CTA','TAT','ATG','TGA','GAT','ATT','TTC']
a = s[0]
b = s[-1]
final_s = a[:a.index(b[0])]+b
print(final_s)
>>>TTC
My output is clearly not right, and I don't know why it doesn't work in this case. Note that I have already organized the list with the overlapping strings next to each other.

You can use a trie to store the running substrings and determine overlap more efficiently. When an overlap is possible (i.e. for an input string, the trie contains a letter that starts or ends the input string), a breadth-first search finds the largest possible overlap, and the remaining part of the string is then added to the trie:
from collections import deque
#trie node (which stores a single letter) class definition
class Node:
    def __init__(self, e, p = None):
        self.e, self.p, self.c = e, p, []
    def add_s(self, s):
        if s:
            self.c.append(self.__class__(s[0], self).add_s(s[1:]))
        return self

class Trie:
    def __init__(self):
        self.c = []
    def last_node(self, n):
        return n if not n.c else self.last_node(n.c[0])
    def get_s(self, c, ls):
        #for an input string, find a letter in the trie that the string starts or ends with.
        for i in c:
            if i.e in ls:
                yield i
            yield from self.get_s(i.c, ls)
    def add_string(self, s):
        q, d = deque([j for i in self.get_s(self.c, (s[0], s[-1])) for j in [(s, i, 0), (s, i, -1)]]), []
        while q:
            if (w:=q.popleft())[1] is None:
                d.append((w[0] if not w[0] else w[0][1:], w[2], w[-1]))
            elif w[0] and w[1].e == w[0][w[-1]]:
                if not w[-1]:
                    if not w[1].c:
                        d.append((w[0][1:], w[1], w[-1]))
                    else:
                        q.extend([(w[0][1:], i, 0) for i in w[1].c])
                else:
                    q.append((w[0][:-1], w[1].p, w[1], -1))
        if not (d:={a:b for a, *b in d}):
            self.c.append(Node(s[0]).add_s(s[1:]))
        elif (m:=min(d, key=len)):
            if not d[m][-1]:
                d[m][0].add_s(m)
            else:
                t = Node(m[0]).add_s(m)
                d[m][0].p = self.last_node(t)
Putting it all together:
t = Trie()
for i in ['aacc','accb','ccbe']:
    t.add_string(i)

def overlaps(trie, c = ''):
    if not trie.c:
        yield c+trie.e
    else:
        yield from [j for k in trie.c for j in overlaps(k, c+trie.e)]

r = [j for k in t.c for j in overlaps(k)]
Output:
['aaccbe']

Use difflib.SequenceMatcher.find_longest_match to find the overlap and concatenate appropriately, then use reduce to apply it across the entire list.
import difflib
from functools import reduce

def overlap(s1, s2):
    # https://stackoverflow.com/a/14128905/4001592
    s = difflib.SequenceMatcher(None, s1, s2)
    pos_a, pos_b, size = s.find_longest_match(0, len(s1), 0, len(s2))
    return s1[:pos_a] + s2[pos_b:]

s = ['aacc','accb','ccbe']
result = reduce(overlap, s, "")
print(result)
Output:
aaccbe
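
Note that find_longest_match returns the longest common block anywhere in the two strings, which is not necessarily a suffix of the first and a prefix of the second. A minimal alternative sketch, assuming (as the question states) that the list is already ordered so that each string overlaps the end of the previous one, is to look for the longest suffix-prefix overlap directly. The function name merge_overlapping is hypothetical:

```python
def merge_overlapping(parts):
    """Merge an ordered list of strings, collapsing suffix/prefix overlaps."""
    merged = parts[0]
    for nxt in parts[1:]:
        # Try the longest possible overlap first and shrink until one fits.
        k = min(len(merged), len(nxt))
        while k > 0 and not merged.endswith(nxt[:k]):
            k -= 1
        merged += nxt[k:]  # append only the part that is not already there
    return merged


print(merge_overlapping(['aacc', 'accb', 'ccbe']))
print(merge_overlapping(['TGT', 'GTT', 'TTC']))
```

Because the longest overlap is taken greedily, two adjacent identical strings collapse into one; whether that is desired depends on the data.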

Substitute list of paths with unique elements at every level (/)

I have a list of path strings (which can form a tree structure) like the one below:
xo = ['1/sometext1',
'1/1/sometext2',
'1/1/1/sometext3',
'1/1/2/sometext4',
'1/2/sometext5',
'1/2/1/sometext6',
'1/2/2/sometext7',
'2/sometext8',
'3/sometext9']
I want to convert the list above into a form like the one below, with an identifier that is unique to every node at every level, so that the 1's in ('1/', '1/1/', '1/1/1/') are distinguished from one another, as are the 2's in ('1/1/2/', '1/2/', '1/2/1/', '1/2/2/', '2/').
xd = ['123/sometext1',
'123/1234/sometext2',
'123/1234/12345/sometext3',
'123/1234/234/sometext4',
'123/2345/sometext5',
'123/2345/123456/sometext6',
'123/2345/23456/sometext7',
'234567/sometext8',
'3456/sometext9']
The unique values are just for example and can be any unique strings.

This example will add depth number to each level:
import re
xo = [
"1/sometext1",
"1/1/sometext2",
"1/1/1/sometext3",
"1/1/2/sometext4",
"1/2/sometext5",
"1/2/1/sometext6",
"1/2/2/sometext7",
"2/sometext8",
"3/sometext9",
]
pat = re.compile(r"((?:\d+/)+)(.*)")
out = []
for s in xo:
    nums, rest = pat.match(s).groups()
    nums = "/".join(f"{i}-{n}" for i, n in enumerate(nums.split("/"), 1) if n)
    out.append(nums + "/" + rest)
print(out)
Prints:
[
"1-1/sometext1",
"1-1/2-1/sometext2",
"1-1/2-1/3-1/sometext3",
"1-1/2-1/3-2/sometext4",
"1-1/2-2/sometext5",
"1-1/2-2/3-1/sometext6",
"1-1/2-2/3-2/sometext7",
"1-2/sometext8",
"1-3/sometext9",
]
EDIT: Modified example:
import re
xo = [
"1/sometext1",
"1/1/sometext2",
"1/1/1/sometext3",
"1/1/2/sometext4",
"1/2/sometext5",
"1/2/1/sometext6",
"1/2/2/sometext7",
"2/sometext8",
"3/sometext9",
]
pat = re.compile(r"((?:\d+/)+)(.*)")
out = []
for s in xo:
    nums, rest = pat.match(s).groups()
    tmp = [n for n in nums.split("/") if n]
    nums = "/".join(f"{'.'.join(tmp[:i])}" for i in range(1, len(tmp) + 1))
    out.append(nums + "/" + rest)
print(out)
Prints:
[
"1/sometext1",
"1/1.1/sometext2",
"1/1.1/1.1.1/sometext3",
"1/1.1/1.1.2/sometext4",
"1/1.2/sometext5",
"1/1.2/1.2.1/sometext6",
"1/1.2/1.2.2/sometext7",
"2/sometext8",
"3/sometext9",
]

The code below generates a unique random number for every distinct path component:
from collections import defaultdict
import random, string

class UniquePaths:
    def __init__(self):
        self.paths = []
    def new_path(self):
        # draw random digit strings until one is found that has not been used yet
        while (p:=''.join(random.choice(string.digits) for _ in range(random.randint(3, 10)))) in self.paths:
            pass
        self.paths.append(p)
        return p
    def build_results(self, d, new_p = []):
        _d = defaultdict(list)
        for i in d:
            if len(i) == 1:
                yield '/'.join(new_p)+'/'+i[0]
            else:
                _d[i[0]].append([*i[1:-1], i[-1]])
        yield from [j for b in _d.values() for j in self.build_results(b, new_p+[self.new_path()])]
    @classmethod
    def to_unique(cls, paths):
        return list(cls().build_results([i.split('/') for i in paths]))

xo = ['1/sometext1', '1/1/sometext2', '1/1/1/sometext3', '1/1/2/sometext4', '1/2/sometext5', '1/2/1/sometext6', '1/2/2/sometext7', '2/sometext8', '3/sometext9']
new_paths = UniquePaths.to_unique(xo)
Output:
['987498/sometext1',
'987498/3886405008/sometext2',
'987498/3886405008/4380239/sometext3',
'987498/3886405008/0407507/sometext4',
'987498/984618899/sometext5',
'987498/984618899/89110/sometext6',
'987498/984618899/45767633/sometext7',
'50264/sometext8',
'768/sometext9']
The solution above does not derive the unique values from the original component values. Instead it generates random digit strings of varying lengths and retries on a collision, so no two path components can receive the same replacement.
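
If reproducible output is preferred over random strings, a deterministic variant is possible. The sketch below is an assumption about what is acceptable (the question says any unique strings will do): it numbers each distinct directory prefix with a sequential counter. The function name relabel is hypothetical:

```python
from itertools import count


def relabel(paths):
    """Replace every directory component with an id unique to its full prefix."""
    ids = {}              # maps a prefix tuple such as ('1', '2') to its id
    counter = count(1)
    out = []
    for path in paths:
        *dirs, leaf = path.split("/")
        labels = []
        for depth in range(1, len(dirs) + 1):
            prefix = tuple(dirs[:depth])
            if prefix not in ids:
                ids[prefix] = str(next(counter))
            labels.append(ids[prefix])
        out.append("/".join(labels + [leaf]))
    return out


xo = ['1/sometext1', '1/1/sometext2', '1/1/1/sometext3', '1/1/2/sometext4',
      '1/2/sometext5', '1/2/1/sometext6', '1/2/2/sometext7',
      '2/sometext8', '3/sometext9']
for line in relabel(xo):
    print(line)
```

Keying the ids on the whole prefix rather than on the component value is what keeps the "1" in "1/" distinct from the "1" in "1/1/".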

Appending nodes of tree to a python list

I have a tree like this:
         A
      /  |  \
     B   C    D
    / \  |  / | \
   E   F G H  I  J
and I am trying to append the nodes of the tree to an empty list
such that the list looks like:
[[A], [B, C, D], [E, F, G, H, I, J]]
Suppose that I have a root_node A and I don't know how deep my tree is.
How can I append nodes from the tree to an empty list in the above-mentioned format?
I tried breadth-first search, but my list ends up much longer than the
depth of the tree.

Append each new depth of nodes as a new list:
1. Start with an empty list: tree = []
2. Create a new inner list for the current depth
3. Append each element at the current depth to that list: tree[depth].append(element)
4. Go to the next depth and repeat
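
The steps above can be sketched as follows. The tree is represented here as a hypothetical dictionary mapping each node to its children, and levels is an illustrative name; a depth-first walk is enough because each node is filed under its own depth:

```python
def levels(children, root):
    """Return the nodes of a tree grouped by depth."""
    tree = []

    def visit(node, depth):
        if depth == len(tree):      # first node seen at this depth
            tree.append([])         # create the inner list for it
        tree[depth].append(node)
        for child in children.get(node, []):
            visit(child, depth + 1)

    visit(root, 0)
    return tree


children = {"A": ["B", "C", "D"], "B": ["E", "F"], "C": ["G"], "D": ["H", "I", "J"]}
print(levels(children, "A"))
```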

Given a Node implementation like:
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def __repr__(self):
        return f'Node({self.name})'
You can create your nodes and arrange a graph with:
nodes = {letter: Node(letter) for letter in 'ABCDEFGHIJ'}
nodes['A'].children.extend([nodes['B'], nodes['C'], nodes['D']])
nodes['B'].children.extend([nodes['E'], nodes['F']])
nodes['C'].children.extend([nodes['G']])
nodes['D'].children.extend([nodes['H'], nodes['I'], nodes['J']])
Now you can start with the root node, and continually make a new list of nodes until you run out with a simple generator:
def make_lists(root):
    current = [root]
    while current:
        yield current
        current = [c for n in current for c in n.children]

list(make_lists(nodes['A']))
The while loop will end when there are no more children, resulting in:
[[Node(A)],
[Node(B), Node(C), Node(D)],
[Node(E), Node(F), Node(G), Node(H), Node(I), Node(J)]]

Disjoint Paths Algorithm

What is the simplest way to find augmenting paths?
Counting Edge-Disjoint Paths Using Labeling Traversal to Find Augmenting Paths
from itertools import chain

def paths(G, s, t):  # Edge-disjoint path count
    H, M, count = tr(G), set(), 0  # Transpose, matching, result
    while True:  # Until the function returns
        Q, P = {s}, {}  # Traversal queue + tree
        while Q:  # Discovered, unvisited
            u = Q.pop()  # Get one
            if u == t:  # Augmenting path!
                count += 1  # That means one more path
                break  # End the traversal
            forw = (v for v in G[u] if (u,v) not in M)  # Possible new edges
            back = (v for v in H[u] if (v,u) in M)  # Cancellations
            for v in chain(forw, back):  # Along out- and in-edges
                if v in P: continue  # Already visited? Ignore
                P[v] = u  # Traversal predecessor
                Q.add(v)  # New node discovered
        else:  # Didn't reach t?
            return count  # We're done
Can I use a while loop to finish it, and if so, how?

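I tried this and it worked! The loop goes after the inner while/else, still inside the outer while True, and walks back from t to s along the predecessors in P:
while u != s:
    u, v = P[u], u
    if v in G[u]:
        M.add((u,v))
    else:
        M.remove((v,u))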
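
For reference, here is a sketch of the listing with the backtracking loop in place. The helper tr is not shown in the post; the version below is an assumption (it reverses every edge), and the graph is assumed to be a dict mapping every node to a set of successors:

```python
from itertools import chain


def tr(G):
    """Assumed helper: transpose the graph by reversing every edge."""
    H = {u: set() for u in G}
    for u in G:
        for v in G[u]:
            H[v].add(u)
    return H


def paths(G, s, t):
    """Count edge-disjoint paths from s to t."""
    H, M, count = tr(G), set(), 0
    while True:
        Q, P = {s}, {}                       # traversal queue and predecessor tree
        while Q:
            u = Q.pop()
            if u == t:                       # augmenting path found
                count += 1
                break
            forw = (v for v in G[u] if (u, v) not in M)   # unused forward edges
            back = (v for v in H[u] if (v, u) in M)       # used edges we may cancel
            for v in chain(forw, back):
                if v in P:
                    continue
                P[v] = u
                Q.add(v)
        else:                                # queue emptied without reaching t
            return count
        while u != s:                        # augment: walk back from t to s
            u, v = P[u], u
            if v in G[u]:
                M.add((u, v))                # forward edge joins the matching
            else:
                M.remove((v, u))             # backward edge cancels an earlier choice


# Small example: 0 -> 1 -> 3 and 0 -> 2 -> 3, plus a cross edge 1 -> 2.
G = {0: {1, 2}, 1: {2, 3}, 2: {3}, 3: set()}
print(paths(G, 0, 3))
```

The check `v in G[u]` cannot tell a forward edge from a cancellation when both (u, v) and (v, u) exist in G, so the sketch assumes the graph has no such pairs.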
