How do I keep the index of the duplicate element unchanged - python

Here is an input list:
['a', 'b', 'b', 'c', 'c', 'd']
The output I expect should be:
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]
I tried to use map():
>>> map(lambda (index, word): [index, word], enumerate(['a', 'b', 'b', 'c', 'c', 'd']))
[[0, 'a'], [1, 'b'], [2, 'b'], [3, 'c'], [4, 'c'], [5, 'd']]
How can I get the expected result?
EDIT: This is not a sorted list; the index of each element increases only when a new element is encountered.

>>> import itertools
>>> seq = ['a', 'b', 'b', 'c', 'c', 'd']
>>> [[i, c] for i, (k, g) in enumerate(itertools.groupby(seq)) for c in g]
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]

[
    [i, x]
    for i, (value, group) in enumerate(itertools.groupby(['a', 'b', 'b', 'c', 'c', 'd']))
    for x in group
]
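One thing worth knowing about the groupby approach: groupby only merges equal elements that sit next to each other, so a value that reappears later in the list starts a new group and gets a new index. Here is a minimal Python 3 sketch of that behaviour (the helper name index_runs is made up for illustration):

```python
from itertools import groupby


def index_runs(seq):
    """Pair each item with the index of the consecutive run it belongs to."""
    return [
        [i, item]
        for i, (_key, run) in enumerate(groupby(seq))
        for item in run
    ]


print(index_runs(['a', 'b', 'b', 'c', 'c', 'd']))
# [[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd']]

# 'a' appears again at the end, so it opens a third run.
print(index_runs(['a', 'b', 'a']))
# [[0, 'a'], [1, 'b'], [2, 'a']]
```

If a repeated value should reuse its earlier index instead, a dict-based approach is probably the better fit.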

It sounds like you want to rank the terms based on a lexicographical ordering.
input = ['a', 'b', 'b', 'c', 'c', 'd']
mapping = { v:i for (i, v) in enumerate(sorted(set(input))) }
[ [mapping[v], v] for v in input ]
Note that this works for unsorted inputs as well.
If, as your amendment suggests, you want to number items based on order of first appearance, a different approach is in order. The following is short and sweet, albeit offensively hacky:
[ [d.setdefault(v, len(d)), v] for d in [{}] for v in input ]
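To make the difference between the two numbering schemes concrete, here is a small sketch that wraps both in functions and runs them on the same unsorted input. The function names are hypothetical, and the second one is just the setdefault trick written with an ordinary local dict.

```python
def rank_lexicographically(items):
    """Number each item by its position among the sorted distinct values."""
    mapping = {v: i for i, v in enumerate(sorted(set(items)))}
    return [[mapping[v], v] for v in items]


def rank_by_first_appearance(items):
    """Number each item by the order in which its value first shows up."""
    seen = {}
    # len(seen) is evaluated before setdefault runs, so a new value
    # receives the next unused number and a known value keeps its own.
    return [[seen.setdefault(v, len(seen)), v] for v in items]


words = ['b', 'a', 'b', 'c']
print(rank_lexicographically(words))
# [[1, 'b'], [0, 'a'], [1, 'b'], [2, 'c']]
print(rank_by_first_appearance(words))
# [[0, 'b'], [1, 'a'], [0, 'b'], [2, 'c']]
```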

When the list is sorted, use groupby (see jamylak's answer); when it is not, just iterate over the list and check whether you have already seen each letter:
a = ['a', 'b', 'b', 'c', 'c', 'd']
result = []
d = {}
n = 0
for k in a:
    if k not in d:
        d[k] = n
        n += 1
    result.append([d[k], k])
This is efficient: it takes only O(n) time.
Example output for an unsorted list (here ['a', 'b', 'b', 'c', 'c', 'd', 'a']):
[[0, 'a'], [1, 'b'], [1, 'b'], [2, 'c'], [2, 'c'], [3, 'd'], [0, 'a']]
As you can see, the items keep the same order as in the input list.
Sorting the list first would cost an additional O(n log n) time.

Related

How to combine python list and create list within a list based on first element?

I am trying to compile a python list based on first elements of nested list. But I am not sure what is the correct way to do that.
I have this nested list.
list1 = [[1, a, b, c], [2, b, c, d], [2, b, d, e], [1, c, a, d]]
I am trying to get an output like this.
output_list = [[1, [a, b, c], [c, a, d]], [2, [b, c, d], [b, d, e]]]
Accumulating with a defaultdict, and then using a list comprehension at the end:
>>> list1 = [[1, 'a', 'b', 'c'], [2, 'b', 'c', 'd'], [2, 'b', 'd', 'e'], [1, 'c', 'a', 'd']]
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for first, *rest in list1:
...     d[first].append(rest)
...
>>> [[first, *rest] for first, rest in d.items()]
[[1, ['a', 'b', 'c'], ['c', 'a', 'd']], [2, ['b', 'c', 'd'], ['b', 'd', 'e']]]
list1 = [[1, "a", 'b', 'c'], [2, 'b', 'c', "d"], [2, 'b', 'd', 'e'], [1, 'c', 'a', 'd']]
firstList = []
output_list = []
for i, list in enumerate(list1):
    if list[0] not in firstList:
        firstList.append(list[0])
        anotherList = []
        for j in range(1, len(list)):
            anotherList.append(list[j])
        bList = [list[0], anotherList]
        output_list.append(bList)
    else:
        place = firstList.index(list[0])
        anotherList = []
        for j in range(1, len(list)):
            anotherList.append(list[j])
        output_list[place].append(anotherList)
print(output_list)
# [[1, ['a', 'b', 'c'], ['c', 'a', 'd']], [2, ['b', 'c', 'd'], ['b', 'd', 'e']]]
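The explicit loop can be condensed quite a bit. The sketch below assumes Python 3.7+, where a plain dict remembers insertion order, so the groups come out in the order their first element was first seen. The function name group_by_first is made up for illustration.

```python
def group_by_first(rows):
    """Group the tails of the rows under their shared first element."""
    groups = {}
    for first, *rest in rows:
        # setdefault creates the empty list the first time a key is seen.
        groups.setdefault(first, []).append(rest)
    return [[first, *tails] for first, tails in groups.items()]


list1 = [[1, 'a', 'b', 'c'], [2, 'b', 'c', 'd'], [2, 'b', 'd', 'e'], [1, 'c', 'a', 'd']]
print(group_by_first(list1))
# [[1, ['a', 'b', 'c'], ['c', 'a', 'd']], [2, ['b', 'c', 'd'], ['b', 'd', 'e']]]
```

Looking a key up in a dict is a constant-time operation on average, whereas a membership test plus .index() on a list scans the list each time.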

Get right label using indices?

Really stupid question as I am new to python:
If I have labels = ['a', 'b', 'c', 'd'],
and indices = [2, 3, 0, 1]
How should I get the corresponding label using each index so I can get: ['c', 'd', 'a', 'b']?
There are a few alternatives. One is to use a list comprehension:
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = [labels[i] for i in indices]
print(result)
Output
['c', 'd', 'a', 'b']
Basically, this iterates over each index and fetches the item at that position. It is equivalent to the following for loop:
result = []
for i in indices:
    result.append(labels[i])
A third option is to use operator.itemgetter:
from operator import itemgetter
labels = ['a', 'b', 'c', 'd']
indices = [2, 3, 0, 1]
result = list(itemgetter(*indices)(labels))
print(result)
Output
['c', 'd', 'a', 'b']
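One caveat with the itemgetter version, sketched below with longer labels than in the question so the effect is visible: when it is given exactly one index, itemgetter returns the bare item instead of a 1-tuple, so wrapping it in list() splits a string into characters. The list comprehension has no such special case. The helper name pick and the sample labels are made up for illustration.

```python
from operator import itemgetter

labels = ['ant', 'bee', 'cat', 'dog']

# Two or more indices: itemgetter returns a tuple, so list() behaves as hoped.
print(list(itemgetter(2, 3)(labels)))
# ['cat', 'dog']

# Exactly one index: itemgetter returns the item itself ('cat'),
# and list('cat') breaks the string into its characters.
print(list(itemgetter(2)(labels)))
# ['c', 'a', 't']


def pick(labels, indices):
    """Comprehension-based lookup; works for any number of indices."""
    return [labels[i] for i in indices]


print(pick(labels, [2]))
# ['cat']
print(pick(labels, []))
# []
```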

Sorting dictionaries in python by values that are actually lists

If I'm given a dictionary to represent a graph, where vertices are keys and values are lists, whose entries contain both a neighbor vertex and the weight between the two vertices, how can I return a list of edges in increasing order with no repeats? For example, I may be given the following dictionary...:
{"A": [["B",10], ["D",5]], "B": [["A",10], ["C",5]], "C": [["B",5],["D",15]], "D": [["C",15], ["A",5]]}.
Also I'm only allowed to import the copy library, so I could copy one list and use deepcopy() to create a new object with the same elements.
Right now, I'm trying to turn the dictionary into a list, because I figure it might be easier to sort elements within a list, and delete duplicate edges. So at the moment I have the following (graph is the dictionary, and in this case the one I provided above)...
def edge_get(graph):
    input_list = []
    sorted_list = []
    for key, value in graph.items():
        temp = [key, value]
        input_list.append(temp)
    print(input_list)
This prints out...
[['A', [['B', 10], ['D', 5]]], ['B', [['A', 10], ['C', 5]]], ['C', [['B', 5], ['D', 15]]], ['D', [['C', 15], ['A', 5]]]]
I would like to get it to output:
[['A', 'B', 10], ['A', 'D', 5], ['B', 'A', 10], ['B', 'C', 5],...
I figure if I can get it like this, I can compare the third element of each list, within the list, and if they are the same, check to see if the other elements match (same edge). And based off of that I can add it to the final list or forget it and move on.
For this example the ultimate goal is:
[['A', 'D'], ['B', 'C'], ['A', 'B'], ['C', 'D']]
So you have a dict that represents a graph as adjacency list, and you want to convert that adjacency list into an edge list.
You can do that with a nested list comprehension:
graph = {"A": [["B",10], ["D",5]], "B": [["A",10], ["C",5]], "C": [["B",5],["D",15]], "D": [["C",15], ["A",5]]}
edges = [(src, dst, weight) for src, adjs in graph.items() for dst, weight in adjs]
# edges = [('A', 'B', 10), ('A', 'D', 5), ('B', 'A', 10), ('B', 'C', 5), ('C', 'B', 5), ('C', 'D', 15), ('D', 'C', 15), ('D', 'A', 5)]
Then you can eliminate duplicate edges by converting to a dict. Note that if you have duplicate edges with conflicting weights, this will pick one of the weights arbitrarily:
uniques = {frozenset([src, dst]): weight for src, dst, weight in edges}
# uniques = {frozenset({'B', 'A'}): 10, frozenset({'A', 'D'}): 5, frozenset({'B', 'C'}): 5, frozenset({'C', 'D'}): 15}
and then sort the edges with sorted:
sorted_uniques = sorted(uniques.items(), key=lambda v: v[1])
# sorted_uniques = [(frozenset({'A', 'D'}), 5), (frozenset({'C', 'B'}), 5), (frozenset({'A', 'B'}), 10), (frozenset({'C', 'D'}), 15)]
Finally, to get the result in the structure you wanted, you simply do:
result = [sorted(e) for e, weight in sorted_uniques]
# result = [['A', 'D'], ['B', 'C'], ['A', 'B'], ['C', 'D']]
You can represent each edge as a frozenset and filter out duplicate edges with the help of a set:
G = {"A": [["B",10], ["D",5]], "B": [["A",10], ["C",5]], "C": [["B",5],["D",15]], "D": [["C",15], ["A",5]]}
edges = {(frozenset((k, i)), j) for k, v in G.items()
         for i, j in v}
[sorted(i) for i, _ in sorted(edges, key=lambda x: x[1])]
# [['B', 'C'], ['A', 'D'], ['A', 'B'], ['C', 'D']]  (edges of equal weight may come out in either order)
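If the two directions of an edge might carry different weights, it helps to decide explicitly which weight wins. The sketch below is import-free and keeps the smaller weight; that policy is an assumption on my part, since the question does not say what should happen. It also breaks ties between equal weights by vertex name, so the output is the same on every run. The function name is made up for illustration.

```python
def lightest_unique_edges(graph):
    """Return each undirected edge once, ordered by weight, then by name."""
    best = {}
    for src, neighbours in graph.items():
        for dst, weight in neighbours:
            # A sorted tuple gives ('A', 'B') for both A->B and B->A.
            edge = tuple(sorted((src, dst)))
            # Assumed policy: on conflicting weights, keep the smaller one.
            if edge not in best or weight < best[edge]:
                best[edge] = weight
    ordered = sorted(best, key=lambda edge: (best[edge], edge))
    return [list(edge) for edge in ordered]


graph = {"A": [["B", 10], ["D", 5]], "B": [["A", 10], ["C", 5]],
         "C": [["B", 5], ["D", 15]], "D": [["C", 15], ["A", 5]]}
print(lightest_unique_edges(graph))
# [['A', 'D'], ['B', 'C'], ['A', 'B'], ['C', 'D']]

conflicting = {"A": [["B", 10]], "B": [["A", 7]]}
print(lightest_unique_edges(conflicting))
# [['A', 'B']]
```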
You can use itertools.product to generate the combinations of key with each related sublist. If you sort and unpack the string components of each combination, then you get the initial output you are looking for. From there you can sort the entire list first by the weight value and then by the vertices in order to get an ordered list. If you slice that list with a step value you can remove the duplicates. Then you can just remove the weight value to get the list of pairs for your final output.
You could consolidate the steps below just a bit more but this goes through the steps outlined in your question to hopefully make it a bit easier to follow.
from itertools import product
from operator import itemgetter
d = {"A": [["B",10], ["D",5]], "B": [["A",10], ["C",5]], "C": [["B",5],["D",15]], "D": [["C",15], ["A",5]]}
combos = [[*sorted([c1, c2]), n] for k, v in d.items() for c1, [c2, n] in product(k, v)]
print(combos)
# [['A', 'B', 10], ['A', 'D', 5], ['A', 'B', 10], ['B', 'C', 5], ['B', 'C', 5], ['C', 'D', 15], ['C', 'D', 15], ['A', 'D', 5]]
ordered = sorted(combos, key=itemgetter(2, 0, 1))[::2]
print(ordered)
# [['A', 'D', 5], ['B', 'C', 5], ['A', 'B', 10], ['C', 'D', 15]]
pairs = [o[:-1] for o in ordered]
print(pairs)
# [['A', 'D'], ['B', 'C'], ['A', 'B'], ['C', 'D']]
EDIT (without imports):
Per comment highlighting a restriction on using imports in your solution, here is a modified version of the original. Differences are replacement of itertools.product with list comprehension that accomplishes the same thing and the replacement of operator.itemgetter with a lambda.
d = {"A": [["B",10], ["D",5]], "B": [["A",10], ["C",5]], "C": [["B",5],["D",15]], "D": [["C",15], ["A",5]]}
combos = [[*sorted([k, c]), n] for k, v in d.items() for c, n in v]
print(combos)
# [['A', 'B', 10], ['A', 'D', 5], ['A', 'B', 10], ['B', 'C', 5], ['B', 'C', 5], ['C', 'D', 15], ['C', 'D', 15], ['A', 'D', 5]]
ordered = sorted(combos, key=lambda x: (x[2], x[0], x[1]))[::2]
print(ordered)
# [['A', 'D', 5], ['B', 'C', 5], ['A', 'B', 10], ['C', 'D', 15]]
pairs = [o[:-1] for o in ordered]
print(pairs)
# [['A', 'D'], ['B', 'C'], ['A', 'B'], ['C', 'D']]
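Note that the [::2] slice relies on every edge being listed exactly twice, once from each endpoint. If that cannot be guaranteed, a possible variation is to drop adjacent repeats after sorting instead of slicing. This is only a sketch: the function name and the one-sided sample graph are made up for illustration.

```python
def unique_sorted_edges(graph):
    """Sorted edge pairs, deduplicated without assuming pairs of duplicates."""
    combos = [[*sorted([k, c]), n] for k, v in graph.items() for c, n in v]
    ordered = sorted(combos, key=lambda x: (x[2], x[0], x[1]))
    unique = []
    for edge in ordered:
        # After sorting, identical edges are adjacent; keep only the first.
        if not unique or unique[-1] != edge:
            unique.append(edge)
    return [edge[:-1] for edge in unique]


d = {"A": [["B", 10], ["D", 5]], "B": [["A", 10], ["C", 5]],
     "C": [["B", 5], ["D", 15]], "D": [["C", 15], ["A", 5]]}
print(unique_sorted_edges(d))
# [['A', 'D'], ['B', 'C'], ['A', 'B'], ['C', 'D']]

# Each edge is listed from one endpoint only; slicing with [::2] would lose B-C.
one_sided = {"A": [["B", 10], ["D", 5]], "B": [["C", 5]], "C": [], "D": []}
print(unique_sorted_edges(one_sided))
# [['A', 'D'], ['B', 'C'], ['A', 'B']]
```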

Appending to a list of lists sequentially

I have two lists of lists:
my_list = [[1,2,3,4], [5,6,7,8]]
my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]
I want my output to look like this:
my_list = [[1,2,3,4,'a','b','c'], [5,6,7,8,'d','e','f']]
I wrote the following code to do this but I end up getting more lists in my result.
my_list = map(list, (zip(my_list, my_list2)))
This produces the following result:
[[[1, 2, 3, 4], ['a', 'b', 'c']], [[5, 6, 7, 8], ['d', 'e', 'f']]]
Is there a way to remove the redundant lists?
Thanks
Using zip is the right approach. You just need to add the elements from the tuples zip produces.
>>> my_list = [[1,2,3,4], [5,6,7,8]]
>>> my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]
>>> [x+y for x,y in zip(my_list, my_list2)]
[[1, 2, 3, 4, 'a', 'b', 'c'], [5, 6, 7, 8, 'd', 'e', 'f']]
You can use zip in a list comprehension:
my_list = [[1,2,3,4], [5,6,7,8]]
my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]
new_list = [i+b for i, b in zip(my_list, my_list2)]
As an alternative, you may also use map with sum and a lambda function to achieve this (but the list comprehension approach mentioned in the other answers is better):
>>> map(lambda x: sum(x, []), zip(my_list, my_list2))
[[1, 2, 3, 4, 'a', 'b', 'c'], [5, 6, 7, 8, 'd', 'e', 'f']]
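A note for Python 3 users: there map returns a lazy iterator instead of a list, so the call needs to be wrapped in list() to see the result. If more than two lists of lists have to be joined row by row, itertools.chain is one option. The sketch below assumes all inputs have the same number of rows; the function name concat_rows is made up for illustration.

```python
from itertools import chain


def concat_rows(*tables):
    """Concatenate the i-th rows of every table into one flat row."""
    return [list(chain.from_iterable(rows)) for rows in zip(*tables)]


my_list = [[1, 2, 3, 4], [5, 6, 7, 8]]
my_list2 = [['a', 'b', 'c'], ['d', 'e', 'f']]

print(concat_rows(my_list, my_list2))
# [[1, 2, 3, 4, 'a', 'b', 'c'], [5, 6, 7, 8, 'd', 'e', 'f']]

# The map/sum version, made explicit for Python 3:
print(list(map(lambda x: sum(x, []), zip(my_list, my_list2))))
# [[1, 2, 3, 4, 'a', 'b', 'c'], [5, 6, 7, 8, 'd', 'e', 'f']]
```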

Identify duplicates in a list of lists and sum up their last items

I have a list of lists from which I would like to remove duplicates and sum up duplicates' last elements. An item is a duplicate if its first 2 elements are the same. This is better illustrated with an example:
input = [['a', 'b', 2], ['a', 'c', 1], ['a', 'b', 1]]
# Desired output
output = [['a', 'b', 3], ['a', 'c', 1]]
There are similar questions here on SO but I haven't found one which would deal with list of lists and summing up list items at the same time.
I tried several approaches but couldn't make it work:
create a copy of input list, make a nested loop, if second duplicate is found, add its last item to original --> this got too confusing with too much nesting
I looked into collections Counter but it doesn't seem to work with list of lists
itertools
Could you give me any pointers on how to approach this problem?
I don't think lists are the best data structure for this. I would use a dictionary with tuple keys. If you really need a list, you can create one later:
from collections import defaultdict
data = [['a', 'b', 2], ['a', 'c', 1], ['a', 'b', 1]]
result = defaultdict(int)  # new keys are auto-added and initialized as 0
for item in data:
    a, b, value = item
    result[(a, b)] += value
print(result)
# defaultdict(<class 'int'>, {('a', 'b'): 3, ('a', 'c'): 1})
print(dict(result))
# {('a', 'b'): 3, ('a', 'c'): 1}
print([[a, b, total] for (a, b), total in result.items()])
# [['a', 'b', 3], ['a', 'c', 1]]
You could use Counter; someone's already given a manual defaultdict solution; so here's an itertools.groupby one, just for variety:
>>> from itertools import groupby
>>> inp = [['a', 'b', 2], ['a', 'c', 1], ['a', 'b', 1]]
>>> [k[:2] + [sum(v[2] for v in g)] for k,g in groupby(sorted(inp), key=lambda x: x[:2])]
[['a', 'b', 3], ['a', 'c', 1]]
but I second @m.wasowski's view that a dictionary (or dict subclass like defaultdict or Counter) is probably a better data structure.
It'd also be somewhat more general to use [:-1] and [-1] instead of [:2] and [2], but I'm too lazy to make the change. :-)
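For completeness, here is roughly what the more general [:-1] / [-1] variant could look like. This is a sketch that assumes every row is a list whose last element is a number; the function name is made up for illustration.

```python
from itertools import groupby


def sum_last_by_prefix(rows):
    """Merge rows that share everything but the last item, summing that item."""
    def prefix(row):
        return row[:-1]

    # groupby only merges adjacent rows, hence the sort on the same key.
    return [
        key + [sum(row[-1] for row in group)]
        for key, group in groupby(sorted(rows, key=prefix), key=prefix)
    ]


print(sum_last_by_prefix([['a', 'b', 2], ['a', 'c', 1], ['a', 'b', 1]]))
# [['a', 'b', 3], ['a', 'c', 1]]

# Works for other row widths too.
print(sum_last_by_prefix([['x', 5], ['y', 1], ['x', 2]]))
# [['x', 7], ['y', 1]]
```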
I prefer this approach:
>>> from collections import Counter
>>> from itertools import repeat, chain
>>> sum((Counter({tuple(i[:-1]): i[-1]}) for i in input), Counter())
Counter({('a', 'b'): 3, ('a', 'c'): 1})
(Thanks to @DSM for pointing out an improvement to my original answer.)
If you want it in list form:
>>> [[a, b, n] for (a,b),n in _.items()]
[['a', 'b', 3], ['a', 'c', 1]]
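One thing to be aware of with the Counter approach: adding two Counters with + discards any key whose total is zero or negative. That is fine for the data in the question, but it would silently drop rows if the values could cancel out. Updating a single Counter in place avoids this. The sketch below assumes Python 3; the function name sum_by_key is made up for illustration.

```python
from collections import Counter


def sum_by_key(rows):
    """Sum the last item of each row, grouped by the items before it."""
    totals = Counter()
    for *key, value in rows:
        # In-place += on one Counter keeps zero and negative totals.
        totals[tuple(key)] += value
    return [[*key, total] for key, total in totals.items()]


print(sum_by_key([['a', 'b', 2], ['a', 'c', 1], ['a', 'b', 1]]))
# [['a', 'b', 3], ['a', 'c', 1]]

cancelling = [['a', 'b', 2], ['a', 'b', -2]]
print(sum_by_key(cancelling))
# [['a', 'b', 0]]

# Counter addition drops the key once its total is no longer positive.
print(sum((Counter({tuple(r[:-1]): r[-1]}) for r in cancelling), Counter()))
# Counter()
```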
>>> t = [['a', 'b', 2], ['a', 'c', 1], ['a', 'b', 1]]
>>> sums = {}
>>> for i in t:
...     sums[tuple(i[:-1])] = sums.get(tuple(i[:-1]),0) + i[-1]
...
>>> output = [[a,b,sums[(a,b)]] for a,b in sums]
>>> output
[['a', 'b', 3], ['a', 'c', 1]]
inp = [['a', 'b', 2], ['a', 'c', 1], ['a', 'b', 1], ['a', 'c', 2], ['a', 'b', 4]]
lst = []
seen = []
for i, first in enumerate(inp):
    if i in seen:
        continue
    found = False
    count = first[-1]
    for j, second in enumerate(inp[i + 1:]):
        if first[:2] == second[:2]:
            count += second[-1]
            found = True
            seen.append(i + j + 1)
    if found:
        lst.append(first[:-1] + [count])
    else:
        lst.append(first)
print(lst)
# [['a', 'b', 7], ['a', 'c', 3]]
