Not sure if this can be done with pandas or if I need to write a loop with some logic.
I have some data representing chains of pairs of nodes:
pairs = [
# A1 -> B1 -> C1
{'source': 'A1', 'target': 'B1'},
{'source': 'B1', 'target': 'C1'},
# A1 -> D1
{'source': 'A1', 'target': 'D1'},
# C2 -> A2 -> B2
{'source': 'C2', 'target': 'A2'},
{'source': 'A2', 'target': 'B2'},
]
And I want to resolve those chains to create the list of nodes they contain:
results = [
['A1', 'B1', 'C1', 'D1'],
['C2', 'A2', 'B2'],
]
So far I have this code which does allow me to match some of those nodes together:
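import numpy as np
import pandas as pd
df = pd.DataFrame(pairs)
def pair_nodes(df, src, tgt):
    # Collect the unique partners of each node, then prepend the node itself
    df = df.groupby([src]).agg({tgt: 'unique'}).reset_index()
    df['nodes'] = df.apply(lambda r: np.append(r[src], r[tgt]), axis=1)
    return df
df1 = pair_nodes(df, 'source', 'target')
df2 = pair_nodes(df, 'target', 'source')
print(df1)
print(df2)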
Which gives me:
source target nodes
0 A1 [B1, D1] [A1, B1, D1]
1 A2 [B2] [A2, B2]
2 B1 [C1] [B1, C1]
3 C2 [A2] [C2, A2]
target source nodes
0 A2 [C2] [A2, C2]
1 B1 [A1] [B1, A1]
2 B2 [A2] [B2, A2]
3 C1 [B1] [C1, B1]
4 D1 [A1] [D1, A1]
And I'm stuck there. What I think I'm missing is a way to merge rows from df1 and df2 whenever source or target is found in nodes.
I had a look at df.merge but it only seems to work for exact key match.
Can this be achieved with pandas or do I need to write a custom loop/logic to do this?
Creating the desired result by merging dataframes can be a complicated process.
The merging logic used above will not work for every kind of graph. Have a look at the method below.
# Create graph
graph = {}
for pair in pairs:
    if pair['source'] in graph:
        graph[pair['source']].append(pair['target'])
    else:
        graph[pair['source']] = [pair['target']]
# Graph
print(graph)
{
'A1': ['B1', 'D1'],
'B1': ['C1'],
'C2': ['A2'],
'A2': ['B2']
}
# Generating list of nodes
start = 'A1'  # Starting node parameter
result = [start]
# The list grows while it is iterated, so newly added nodes are visited too
for each in result:
    if each in graph:
        result.extend(graph[each])
# Drop duplicates; sorting makes the order deterministic
result = sorted(set(result))
# Output
print(result)
['A1', 'B1', 'C1', 'D1']
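The traversal above follows edges in one direction from a single start node, so it needs a start parameter and would not terminate on a cyclic chain. A minimal sketch of an alternative, assuming each chain should be treated as a connected component of an undirected graph: build a two-way adjacency map and run a breadth-first search from every unvisited node. The name connected_groups is hypothetical, and each group is returned sorted rather than in the order shown in the question.

```python
from collections import defaultdict, deque

pairs = [
    {'source': 'A1', 'target': 'B1'},
    {'source': 'B1', 'target': 'C1'},
    {'source': 'A1', 'target': 'D1'},
    {'source': 'C2', 'target': 'A2'},
    {'source': 'A2', 'target': 'B2'},
]

def connected_groups(pairs):
    """Group nodes that are linked by any chain of pairs (hypothetical helper)."""
    # Record every edge in both directions so direction does not matter
    adjacency = defaultdict(set)
    for pair in pairs:
        adjacency[pair['source']].add(pair['target'])
        adjacency[pair['target']].add(pair['source'])

    seen = set()
    groups = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups

results = connected_groups(pairs)
print(results)
```

Because visited nodes are tracked in a set, a cycle such as A1 -> B1 -> A1 is handled without looping forever.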
How can I compare lists within two columns of a dataframe, identify whether the elements of one list are in the other list, and create another column with the missing elements?
The dataframe looks something like this:
df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
'B': [['b1', 'b2'], ['b1', 'b2', 'b3'], ['b2']],
'C': [['c1', 'b1'], ['b3'], ['b2', 'b1']],
'D': ['d1', 'd2', 'd3']})
I want to compare if elements of column C are in column B and output the missing values to column E, the desired output is:
df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
'B': [['b1', 'b2'], ['b1', 'b2', 'b3'], ['b2']],
'C': [['c1', 'b1'], ['b3'], ['b2', 'b1']],
'D': ['d1', 'd2', 'd3'],
'E': ['b2', ['b1','b2'],'']})
Like your previous related question, you can use a list comprehension. As a general rule, you shouldn't force multiple different types of output, e.g. list or str, depending on result. Therefore, I have chosen lists throughout in this solution.
df['E'] = [list(set(x) - set(y)) for x, y in zip(df['B'], df['C'])]
print(df)
A B C D E
0 a1 [b1, b2] [c1, b1] d1 [b2]
1 a2 [b1, b2, b3] [b3] d2 [b1, b2]
2 a3 [b2] [b2, b1] d3 []
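One caveat with the set-based version is that sets do not preserve the order of the elements in B. A minimal sketch of an order-preserving variant, shown on plain lists so it runs without pandas; the helper name ordered_difference is hypothetical, and the same comprehension could be assigned to df['E'] with zip(df['B'], df['C']).

```python
# Same data as columns B and C of the question's dataframe
B = [['b1', 'b2'], ['b1', 'b2', 'b3'], ['b2']]
C = [['c1', 'b1'], ['b3'], ['b2', 'b1']]

def ordered_difference(left, right):
    """Return the items of left that are not in right, keeping left's order."""
    exclude = set(right)  # set lookup keeps the membership test fast
    return [item for item in left if item not in exclude]

E = [ordered_difference(b, c) for b, c in zip(B, C)]
print(E)
```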
def Desintersection(i):
    Output = [b for b in df['B'][i] if b not in df['C'][i]]
    if(len(Output) == 0):
        return ''
    elif(len(Output) == 1):
        return Output[0]
    else:
        return Output
df['E'] = df.index.map(Desintersection)
df
As in my previous answer:
(df.B.map(set)-df.C.map(set)).map(list)
Out[112]:
0 [b2]
1 [b2, b1]
2 []
dtype: object
I agree with @jpp that you shouldn't mix the types so much: when you try to apply the same function to the new E column, it will fail because it expects each element to be a list.
This would work on E, as it converts single str values to [str] before comparison.
import pandas as pd
df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
'B': [['b1', 'b2'], ['b1', 'b2', 'b3'], ['b2']],
'C': [['c1', 'b1'], ['b3'], ['b2', 'b1']],
'D': ['d1', 'd2', 'd3']})
def difference(df, A, B):
    elements_to_list = lambda x: [n if isinstance(n, list) else [n] for n in x]
    diff = [list(set(a).difference(set(b))) for a, b in zip(elements_to_list(df[A]), elements_to_list(df[B]))]
    diff = [d if d else "" for d in diff]  # replace empty lists with empty strings
    return [d if len(d) != 1 else d[0] for d in diff]  # return with single values extracted from the list
df['E'] = difference(df, "B", "C")
df['F'] = difference(df, "B", "E")
print(list(df['E']))
print(list(df['F']))
['b2', ['b2', 'b1'], '']
['b1', 'b3', 'b2']
I'm new to python so I have been trying to figure out how to print the following list in the format below. I have tried to get them through indexes but it doesn't seem to work out.
So I have a list of tuples that I need to print out in the following format:
1. A1 to B1
2. C1 to D2 , C1 to B2
This is the following list I have that I need to convert to that format.
[['A1', 'B1'], ['C1', 'D2', 'C1', 'B2', 'C1', 'E3', 'C1', 'A3', 'C1', 'F4', 'C1', 'G5', 'C1', 'H6'], ['E1', 'D1', 'E1', 'F2', 'E1', 'D2', 'E1', 'G3', 'E1', 'H4']]
>>> tuples = [['A1', 'B1'],
... ['C1', 'D2', 'C1', 'B2', 'C1', 'E3', 'C1', 'C2'],
... ['E1', 'D1', 'E1', 'F2']]
>>> for i, sequence in enumerate(tuples, 1):
...     it = iter(sequence)
...     print(i, ', '.join('{} to {}'.format(x, y) for x, y in zip(it, it)))
1 A1 to B1
2 C1 to D2, C1 to B2, C1 to E3, C1 to C2
3 E1 to D1, E1 to F2
If the number of elements in a list is odd, replace zip() with itertools.zip_longest() (named izip_longest() in Python 2) so the last element is not dropped:
>>> from itertools import zip_longest
>>> tuples = [['A1', 'B1', 'T1'],
...           ['C1', 'D2', 'C1', 'B2', 'C1', 'E3', 'C1', 'C2', 'Z1'],
...           ['E1', 'D1', 'E1', 'F2']]
>>> for i, sequence in enumerate(tuples, 1):
...     it = iter(sequence)
...     print(i, ', '.join('{} to {}'.format(x, y)
...                        for x, y in zip_longest(it, it, fillvalue='X')))
1 A1 to B1, T1 to X
2 C1 to D2, C1 to B2, C1 to E3, C1 to C2, Z1 to X
3 E1 to D1, E1 to F2
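The pairing trick above works because zip(it, it) pulls two consecutive items from the same iterator on every step. A minimal Python 3 sketch that wraps it in a function and adds the "1." style numbering from the question; the name format_moves and the default fill value 'X' are assumptions made for illustration.

```python
from itertools import zip_longest

def format_moves(sequences, fillvalue='X'):
    """Return one numbered 'A to B, C to D' line per inner list."""
    lines = []
    for number, sequence in enumerate(sequences, 1):
        it = iter(sequence)
        # Both arguments are the same iterator, so items are consumed in pairs;
        # a trailing unpaired item is completed with fillvalue
        pairs = zip_longest(it, it, fillvalue=fillvalue)
        moves = ', '.join('{} to {}'.format(a, b) for a, b in pairs)
        lines.append('{}. {}'.format(number, moves))
    return lines

for line in format_moves([['A1', 'B1', 'T1'], ['E1', 'D1', 'E1', 'F2']]):
    print(line)
```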
start_list = [['A1', 'B1'], ['C1', 'D2', 'C1', 'B2', 'C1', 'E3', 'C1', 'A3',
'C1', 'F4', 'C1', 'G5', 'C1', 'H6'], ['E1', 'D1', 'E1', 'F2', 'E1', 'D2',
'E1', 'G3', 'E1', 'H4']]
for line, sublist in enumerate(start_list):
    # Pair items 0-1, 2-3, ... (zip(sublist, sublist[1:]) would also pair 1-2, 3-4, ...)
    pairs_list = [(a + " to " + b) for (a, b) in zip(sublist[::2], sublist[1::2])]
    print(str(line) + '.', ', '.join(pairs_list))
OK, enough dense advanced code only answers.
start_list = [['A1', 'B1'], ['C1', 'D2', 'C1', 'B2', 'C1', 'E3', 'C1', 'A3',
'C1', 'F4', 'C1', 'G5', 'C1', 'H6'], ['E1', 'D1', 'E1', 'F2', 'E1', 'D2',
'E1', 'G3', 'E1', 'H4']]
# Loop over the inner lists, with a counter
for line_num, sublist in enumerate(start_list):
    # Build up a string to print with a line number and dot
    output = str(line_num) + '. '
    # Count through the list of pairs, step 2
    # skipping every other one
    for i in range(0, len(sublist), 2):
        # take the current and next sublist item, into the output string
        # and add a trailing comma and space, to give the form
        # "A to B, "
        output += sublist[i] + " to " + sublist[i+1] + ", "
    # remove trailing comma after last pair in the line
    output = output.rstrip(', ')
    print(output)
e.g.
0. A1 to B1
1. C1 to D2, C1 to B2, C1 to E3, C1 to A3, C1 to F4, C1 to G5, C1 to H6
2. E1 to D1, E1 to F2, E1 to D2, E1 to G3, E1 to H4
An easier way to do this is to iterate over the inner lists using an index and an offset.
Something like this (not tested):
for j, inner in enumerate(list_of_lists, 1):
    pairs = []
    for i in range(0, len(inner) - 1, 2):
        pairs.append(inner[i] + ' to ' + inner[i + 1])
    # One line per inner list, pairs separated by commas
    print(str(j) + '.', ', '.join(pairs))
Using a loop and a list comprehension.
This also works for lists with an odd number of elements.
Code:
lst=[['A1', 'B1'], ['C1', 'D2', 'C1', 'B2', 'C1', 'E3', 'C1', 'A3', 'C1', 'F4', 'C1', 'G5', 'C1', 'H6','H5'], ['E1', 'D1', 'E1', 'F2', 'E1', 'D2', 'E1', 'G3', 'E1', 'H4']]
for count, line in enumerate(lst):
    print(str(count) + " , ".join([" %s to %s" % (tuple((line[i:i+2]))) if i+1 < len(line) else "Single king %s" % (line[i]) for i in range(0, len(line), 2)]))
Output:
0 A1 to B1
1 C1 to D2 , C1 to B2 , C1 to E3 , C1 to A3 , C1 to F4 , C1 to G5 , C1 to H6 , Single king H5
2 E1 to D1 , E1 to F2 , E1 to D2 , E1 to G3 , E1 to H4
Here is one way of doing this:
>>> l = [['A1', 'B1'], ['C1', 'D2', 'C1', 'B2', 'C1', 'E3', 'C1', 'A3', 'C1', 'F4', 'C1', 'G5', 'C1', 'H6'], ['E1', 'D1', 'E1', 'F2', 'E1', 'D2', 'E1', 'G3', 'E1', 'H4']]
>>>
>>>
>>> for i, m in enumerate(l):
...     print('%s. %s' % (i+1, ' , '.join([' to '.join(n) for n in zip(m[::2], m[1::2])])))
...
1. A1 to B1
2. C1 to D2 , C1 to B2 , C1 to E3 , C1 to A3 , C1 to F4 , C1 to G5 , C1 to H6
3. E1 to D1 , E1 to F2 , E1 to D2 , E1 to G3 , E1 to H4
We have some departures which can be assigned to different arrivals, just like this:
Dep1.arrivals = [A1, A2]
Dep2.arrivals = [A2, A3, A4]
Dep3.arrivals = [A3, A5]
The output of this function should be a list containing every possible combination of arrivals:
Output: [[A1, A2, A3], [A1, A2, A5], [A1, A3, A5], [A1, A4, A5], ...]
Notice that [A1, A3, A3] isn't contained in the list because you can not use an arrival twice. Also notice that [A1, A2, A3] is the same element as [A3, A1, A2] or [A3, A2, A1].
EDIT:
Many of the solutions given work in this case but not as a general solution, for instance if the 3 sets of arrivals are equal:
Dep1.arrivals = [A1, A2, A3]
Dep2.arrivals = [A1, A2, A3]
Dep3.arrivals = [A1, A2, A3]
Then it returns:
('A1', 'A2', 'A3')
('A1', 'A3', 'A2')
('A2', 'A1', 'A3')
('A2', 'A3', 'A1')
('A3', 'A1', 'A2')
('A3', 'A2', 'A1')
Which is wrong since ('A1', 'A2', 'A3') and ('A3', 'A2', 'A1') are the same solution.
Thank you anyway!
You can do this using a list comprehension with itertools.product:
>>> import itertools
>>> lol = [["A1", "A2"], ["A2", "A3", "A4"], ["A3", "A5"]]
>>> print([x for x in itertools.product(*lol) if len(set(x)) == len(lol)])
Result
[('A1', 'A2', 'A3'),
('A1', 'A2', 'A5'),
('A1', 'A3', 'A5'),
('A1', 'A4', 'A3'),
('A1', 'A4', 'A5'),
('A2', 'A3', 'A5'),
('A2', 'A4', 'A3'),
('A2', 'A4', 'A5')]
Note that this is notionally equivalent to the code that @Kevin has given.
Edit: As the OP mentions in the edit, this solution does not treat combinations that differ only in order as the same.
To resolve that, the last statement can be altered as follows: first build a list of sorted tuples of arrivals, then convert that list to a set:
>>> lol = [["A1", "A2", "A3"], ["A1", "A2", "A3"], ["A1", "A2", "A3"]]
>>> set([tuple(sorted(x)) for x in itertools.product(*lol) if len(set(x)) == len(lol)])
{('A1', 'A2', 'A3')}
>>> lol = [["A1", "A2"], ["A2", "A3", "A4"], ["A3", "A5"]]
>>> set([tuple(sorted(x)) for x in itertools.product(*lol) if len(set(x)) == len(lol)])
{('A1', 'A2', 'A3'),
('A1', 'A2', 'A5'),
('A1', 'A3', 'A4'),
('A1', 'A3', 'A5'),
('A1', 'A4', 'A5'),
('A2', 'A3', 'A4'),
('A2', 'A3', 'A5'),
('A2', 'A4', 'A5')}
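The same de-duplication can be written as a small function that keeps the first-seen order instead of returning an unordered set. This is only a sketch under the assumption that two assignments are "the same" when they use the same set of arrivals; the name unique_assignments is hypothetical. Like the versions above, it still enumerates the full product, so it may be slow for many departures.

```python
import itertools

def unique_assignments(arrival_lists):
    """Return order-insensitive combinations with no arrival used twice."""
    seen = set()
    result = []
    for combo in itertools.product(*arrival_lists):
        # Skip combinations that reuse an arrival
        if len(set(combo)) < len(arrival_lists):
            continue
        # A frozenset ignores order, so permutations share one key
        key = frozenset(combo)
        if key not in seen:
            seen.add(key)
            result.append(tuple(sorted(combo)))
    return result

same = unique_assignments([["A1", "A2", "A3"]] * 3)
mixed = unique_assignments([["A1", "A2"], ["A2", "A3", "A4"], ["A3", "A5"]])
print(same)
print(mixed)
```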
You could use product to generate all possible combinations of the departures, and then filter out combinations containing duplicates after the fact:
import itertools
arrivals = [
["A1", "A2"],
["A2", "A3", "A4"],
["A3", "A5"]
]
for items in itertools.product(*arrivals):
    if len(set(items)) < len(arrivals): continue
    print(items)
Result:
('A1', 'A2', 'A3')
('A1', 'A2', 'A5')
('A1', 'A3', 'A5')
('A1', 'A4', 'A3')
('A1', 'A4', 'A5')
('A2', 'A3', 'A5')
('A2', 'A4', 'A3')
('A2', 'A4', 'A5')
The question is tagged with itertools, but I suspect you did not look at itertools.combinations:
import itertools
arrivals = ['A1', 'A2', 'A3', 'A4']
[a for a in itertools.combinations(arrivals, 3)]
# [('A1', 'A2', 'A3'),
#  ('A1', 'A2', 'A4'),
#  ('A1', 'A3', 'A4'),
#  ('A2', 'A3', 'A4')]
I would like to combine two lists in Python to make one list in the following way:
Input:
list1 = [a, b, c, d]
list2 = [1, 2, 3, 4]
Result should be:
list3 = [a1, a2, a3, a4, b1, b2, b3, b4, c1 ... ]
Or using list comprehension as a one liner, instead of nested loops:
list1 = ['a', 'b', 'c', 'd']
list2 = [1, 2, 3, 4]
list3 = [x + str(y) for x in list1 for y in list2]
Note: I assume you forgot the quotes in list1, and list3 is supposed to be a list of strings.
list1 = ['a','b','c','d']
list2 = [1,2,3,4]
list3 = []
for letter in list1:
    for number in list2:
        newElement = letter+str(number)
        list3.append(newElement)
print(list3)
returns this:
['a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'b3', 'b4', 'c1', 'c2', 'c3', 'c4', 'd1', 'd2', 'd3', 'd4']
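Both answers build the Cartesian product of the two lists by hand. As a further sketch, itertools.product from the standard library produces the same pairs in the same order, which may read more clearly once more than two lists are involved. The quoted letters in list1 are the same assumption the first answer makes.

```python
from itertools import product

list1 = ['a', 'b', 'c', 'd']
list2 = [1, 2, 3, 4]

# product yields (letter, number) tuples with the last list varying fastest
list3 = [letter + str(number) for letter, number in product(list1, list2)]
print(list3)
```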