Changing dictionary format - python

I want to change a dictionary below ...
dict = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
into other form like this:
dict = [
('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7),
('B', 'D', 5), ('C', 'D', 12)]
This is what I have done:
dict = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
if(i[0] in dict):
    value = dict[i[0]]
    newvalue = i[1],i[2]
    value.append(newvalue)
    dict1[i[0]]=value
else:
    newvalue = i[1],i[2]
    l=[]
    l.append(newvalue)
    dict[i[0]]=l
print(dict)
Thanks

A Python tuple is immutable, so any operation that tries to modify it in place (such as append) is not allowed. As a workaround, convert each tuple to a list, insert the key at the front, and convert it back:
dict = {
    'A': [('B', 1), ('C', 3), ('D', 7)],
    'B': [('D', 5)],
    'C': [('D', 12)] }
new_dict = []
for key, tuple_list in dict.items():
    for tuple_item in tuple_list:
        entry = list(tuple_item)
        entry.insert(0, key)  # key goes first to match the requested (key, node, weight) order
        new_dict.append(tuple(entry))
print(new_dict)
Output:
[('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7), ('B', 'D', 5), ('C', 'D', 12)]

A simple approach could be:
new_dict = []
for letter1, pairs in dict.items():
    for letter2, value in pairs:
        new_dict.append((letter1, letter2, value))

With a list comprehension:
dict_ = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
result = [(key, value[0], value[1]) for key, list_ in dict_.items() for value in list_]
Output:
[('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7), ('B', 'D', 5), ('C', 'D', 12)]

You can iterate through the dictionary using .items(). Notice that each value is by itself a list of tuples. We want to unpack each tuple, so we need a nested for-loop as shown below. res is the output list that we will populate within the loop.
res = []
for key, values in dict.items():
    for value in values:
        res.append((key, value[0], value[1]))
Sample output:
>>> res
[('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7), ('B', 'D', 5), ('C', 'D', 12)]
EDIT: If value is a tuple of more than two elements, we would modify the last line as follows, using tuple unpacking:
res.append((key, *value))
This effectively unpacks all the elements of value. For example,
>>> test = (1, 2, 3)
>>> (0, *test)
(0, 1, 2, 3)
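The nested loop and the tuple unpacking above can also be combined into a single comprehension. The following is only a sketch: the function name flatten_adjacency and the variable names graph and edges are made up for illustration, and the (key, *entry) syntax assumes Python 3.5 or later.

```python
def flatten_adjacency(adjacency):
    """Return one (key, *entry) tuple for every entry in every value list."""
    return [
        (key, *entry)                      # prepend the key to each tuple
        for key, entries in adjacency.items()
        for entry in entries
    ]


graph = {
    'A': [('B', 1), ('C', 3), ('D', 7)],
    'B': [('D', 5)],
    'C': [('D', 12)],
}
edges = flatten_adjacency(graph)
print(edges)
```

Because the unpacking does not care how long each inner tuple is, the same function should also handle entries with more than two elements.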

Related

Remove elements from tuple array that have same value in first index position of each element

Let's say I have a list:
t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
There are two tuples with 'a' as the first element, and two tuples with 'c' as the first element. I want to only keep the first instance of each, so I end up with:
t = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
How can I achieve that?
You can use a dictionary to help you filter the duplicate keys:
>>> t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
>>> d = {}
>>> for x, y in t:
...     if x not in d:
...         d[x] = y
...
>>> d
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> t = list(d.items())
>>> t
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
@MrGeek's answer is good, but if you do not want to use a dictionary, you could do something simple like this. It builds a new list instead of removing items from t while iterating over it, which can skip elements:
>>> t = [('a', 1), ('a', 6), ('b', 2), ('c', 3), ('c', 5), ('d', 4)]
>>> already_seen = set()
>>> result = []
>>> for e in t:
...     if e[0] not in already_seen:
...         already_seen.add(e[0])
...         result.append(e)
...
>>> result
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
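@gold_cy's comment is the easiest way:
You can use itertools.groupby to group your data, using the key parameter to group by the first element of each tuple. Note that groupby only groups consecutive items, so this works as long as tuples with the same first element are adjacent in t, as they are here.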
import itertools as it
t = [list(my_iterator)[0] for g, my_iterator in it.groupby(t, key=lambda x: x[0])]
Output:
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
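Since itertools.groupby only merges consecutive items, input whose duplicate keys are scattered needs a little extra care. Below is a rough sketch of two ways to handle that; the function names and the scattered sample list are invented for illustration and are not part of the answers above.

```python
from itertools import groupby
from operator import itemgetter


def first_per_key_sorted(pairs):
    """Sort by key first (sorted is stable), then take the first of each group."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


def first_per_key_in_order(pairs):
    """Keep the first tuple seen for each key, preserving input order."""
    seen = set()
    kept = []
    for pair in pairs:
        if pair[0] not in seen:
            seen.add(pair[0])
            kept.append(pair)
    return kept


scattered = [('a', 1), ('b', 2), ('a', 6), ('c', 3), ('b', 9)]
print(first_per_key_sorted(scattered))
print(first_per_key_in_order(scattered))
```

The sorted variant returns the results ordered by key, while the set-based variant keeps the order of first appearance; which one fits depends on whether the original ordering matters.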

How to add an index to set members and convert members to tuple?

I want to set an index for my set elements. For example, if my set was equal to:
A = {'a', 'b', 'c', 'd'}
I want to convert this to:
B = {('a', 0), ('b', 1), ('c', 2), ('d', 3)}
Is there any way in Python to do this?
B = {(elem, idx) for idx, elem in enumerate(A)}
Order is not defined for sets, so if you need order, a set is not the right data structure.
This will return the index as the first element of the tuples:
set(enumerate(A))
# {(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')}
This will return the index as the second item of each tuple, as in your expected output:
set(zip(A, range(len(A))))
# {('a', 0), ('b', 1), ('c', 2), ('d', 3)}
You can do the following:
B=sorted({(j,i) for i,j in enumerate(A)},key= lambda x: x[1])
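Because a set has no defined order, enumerating it directly may pair elements with different indices from one run to the next. If reproducible indices are needed, one option is to sort the members before numbering them. This is a sketch under the assumption that the members are mutually comparable (for example, all strings); index_members is a hypothetical helper name.

```python
def index_members(members):
    """Pair each member with its position in sorted order."""
    return {(elem, idx) for idx, elem in enumerate(sorted(members))}


A = {'a', 'b', 'c', 'd'}
B = index_members(A)
print(sorted(B))  # sorted only to display the pairs in a stable order
```

The result is still a set, so it is printed in sorted form here; the pairs themselves are deterministic even though the set's iteration order is not.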

List duplicates manipulation

I am very new to python, and I am really lost right now. If anyone can help me, I will appreciate it a lot.
I have a list:
list1 = [((a, b), 2), ((a, b), 5), ((c, d), 1)), ((e, f), 2), ((e, f), 4)]
The output I am looking for is:
output = [((a, b), 7), ((c, d), 1), ((e, f), 6)]
I tried to put it in a dictionary
new_dict = {i: j for i, j in list1}
But it throws me an error
Maybe there are other ways?
Find the explanation in the code comments
list1 = [(('a', 'b'), 2), (('a', 'b'), 5), (('c', 'd'), 1), (('e', 'f'), 2), (('e', 'f'), 4)]
# let's create an empty dictionary
output = {}
# ('a', 'b') is a tuple and tuple is hashable so we can use it as dictionary key
# iterate over the list1
for i in list1:
    # for each item check if i[0] exists in output
    if i[0] in output:
        # if yes just add i[1]
        output[i[0]] += i[1]
    else:
        # create new key
        output[i[0]] = i[1]
# finally convert the dictionary to a list of tuples and print it
final_output = list(output.items())
print(final_output)
[(('a', 'b'), 7), (('c', 'd'), 1), (('e', 'f'), 6)]
You can use {}.get in this fashion:
list1 = [(('a', 'b'), 2), (('a', 'b'), 5), (('c', 'd'), 1), (('e', 'f'), 2), (('e', 'f'), 4)]
di={}
for t in list1:
    di[t[0]] = di.get(t[0], 0) + t[1]
>>> di
{('a', 'b'): 7, ('c', 'd'): 1, ('e', 'f'): 6}
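You can also use a Counter. Note that building it from a dict comprehension such as {t[0]: t[1] for t in list1} would keep only the last value for each key instead of summing, so add the values in a loop:
from collections import Counter
c = Counter()
for key, value in list1:
    c[key] += value
>>> c
Counter({('a', 'b'): 7, ('e', 'f'): 6, ('c', 'd'): 1})
Then to turn either of those into a list of tuples (as you have) you use list and {}.items():
>>> list(c.items())
[(('a', 'b'), 7), (('c', 'd'), 1), (('e', 'f'), 6)]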
list1 = [(('a', 'b'), 2), (('a', 'b'), 5), (('c', 'd'), 1), (('e', 'f'), 2), (('e', 'f'), 4)]
sorted_dict = {}
for ele in list1:
    if ele[0] in sorted_dict:
        sorted_dict[ele[0]] += ele[1]
    else:
        sorted_dict[ele[0]] = ele[1]
print(sorted_dict)
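The if/else pattern used in the loops above can be shortened with collections.defaultdict, which supplies a starting value of 0 for keys that have not been seen yet. The sketch below wraps this in a hypothetical helper named sum_by_key; it is one possible variation, not the method any of the answers prescribe.

```python
from collections import defaultdict


def sum_by_key(pairs):
    """Sum the second item of each pair, grouped by the first item."""
    totals = defaultdict(int)      # missing keys start at 0
    for key, amount in pairs:
        totals[key] += amount
    return list(totals.items())    # keys stay in first-seen order


list1 = [(('a', 'b'), 2), (('a', 'b'), 5), (('c', 'd'), 1),
         (('e', 'f'), 2), (('e', 'f'), 4)]
output = sum_by_key(list1)
print(output)
```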

How to efficiently search a list in python

I have a dictionary with only 4 keys (mydictionary) and a list (mynodes) as follows.
mydictionary = {0: {('B', 'E', 'G'), ('A', 'E', 'G'), ('A', 'E', 'F'), ('A', 'D', 'F'), ('C', 'D', 'F'), ('C', 'E', 'F'), ('A', 'D', 'G'), ('C', 'D', 'G'), ('C', 'E', 'G'), ('B', 'E', 'F')},
1: {('A', 'C', 'G'), ('E', 'F', 'G'), ('D', 'E', 'F'), ('A', 'F', 'G'), ('A', 'B', 'G'), ('B', 'D', 'F'), ('C', 'F', 'G'), ('A', 'C', 'E'), ('D', 'E', 'G'), ('B', 'F', 'G'), ('B', 'C', 'G'), ('A', 'C', 'D'), ('A', 'B', 'F'), ('B', 'D', 'G'), ('B', 'C', 'F'), ('A', 'D', 'E'), ('C', 'D', 'E'), ('A', 'C', 'F'), ('A', 'B', 'E'), ('B', 'C', 'E'), ('D', 'F', 'G')},
2: {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')},
3: {('A', 'B', 'C')}}
mynodes = ['E', 'D', 'G', 'F', 'B', 'A', 'C']
I am checking how many times each node in mynodes list is in each key of mydictionary. For example, consider the above dictionary and list.
The output should be:
{'E': [(0, 6), (1, 8), (2, 1), (3, 0)],
'D': [(0, 4), (1, 8), (2, 3), (3, 0)],
'G': [(0, 5), (1, 10), (2, 0), (3, 0)],
'F': [(0, 5), (1, 10), (2, 0), (3, 0)],
'B': [(0, 2), (1, 9), (2, 3), (3, 1)],
'A': [(0, 4), (1, 9), (2, 1), (3, 1)],
'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}
For example, consider E. It appears 6 times in key 0, 8 times in key 1, 1 time in key 2 and 0 times in key 3.
My current code is as follows.
triad_class_for_nodes = {}
for node in mynodes:
    temp_list = []
    for key, value in mydictionary.items():
        temp_counting = 0
        for triad in value:
            #print(triad[0])
            if node in triad:
                temp_counting = temp_counting + 1
        temp_list.append(tuple((key, temp_counting)))
    triad_class_for_nodes.update({node: temp_list})
print(triad_class_for_nodes)
This works fine with the small dictionary values.
However, in my real dataset, I have millions of tuples in the value list for each of my 4 keys in my dictionary. Hence, my existing code is really inefficient and takes days to run.
When I searched for how to make this more efficient, I came across this question (Fastest way to search a list in python), which suggests converting the list of values to a set. I tried this as well. However, it also takes days to run.
I am just wondering if there is a more efficient way of doing this in python. I am happy to transform my existing data formats into different structures (such as pandas dataframe) to make things more efficient.
A small sample of mydictionary and mynodes is attached below for testing purposes. https://drive.google.com/drive/folders/15Faa78xlNAYLPvqS3cKM1v8bV1HQzW2W?usp=sharing
mydictionary: see triads.txt
import ast
with open("triads.txt", "r") as file:
    mydictionary = ast.literal_eval(file.read())
mynodes: see nodes.txt
with open("nodes.txt", "r") as file:
    mynodes = ast.literal_eval(file.read())
I am happy to provide more details if needed.
Since you tagged pandas, first convert your dict to a pandas DataFrame, then stack it and use crosstab (this assumes import pandas as pd):
s=pd.DataFrame.from_dict(mydictionary,'index').stack()
s = pd.DataFrame(s.values.tolist(), index=s.index).stack()
pd.crosstab(s.index.get_level_values(0),s)
col_0  A  B  C  D  E   F   G
row_0
0      4  2  4  4  6   5   5
1      9  9  9  8  8  10  10
2      1  3  1  3  1   0   0
3      1  1  1  0  0   0   0
Update
s=pd.crosstab(s.index.get_level_values(0), s).stack().reset_index()
s[['row_0',0]].apply(tuple,1).groupby(s['col_0']).agg(list).to_dict()
If you're not using pandas, you could do this with Counter from collections:
from collections import Counter,defaultdict
from itertools import product
counts = Counter((c,k) for k,v in mydictionary.items() for t in v for c in t )
result = defaultdict(list)
for c,k in product(mynodes,mydictionary):
    result[c].append((k,counts[(c,k)]))
print(result)
{'E': [(0, 6), (1, 8), (2, 1), (3, 0)],
'D': [(0, 4), (1, 8), (2, 3), (3, 0)],
'G': [(0, 5), (1, 10), (2, 0), (3, 0)],
'F': [(0, 5), (1, 10), (2, 0), (3, 0)],
'B': [(0, 2), (1, 9), (2, 3), (3, 1)],
'A': [(0, 4), (1, 9), (2, 1), (3, 1)],
'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}
Counter will manage counting instances for each combination of mydictionary key and node. You can then use these counts to create the expected output.
EDIT: Expanded version of the counts line:
counts = Counter()  # initialize Counter() object
for key, tupleSet in mydictionary.items():  # loop through dictionary
    for tupl in tupleSet:  # loop through tuple set of each key
        for node in tupl:  # loop through node character in each tuple
            counts[(node, key)] += 1  # count 1 node/key pair
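The single-pass counting idea can be packaged as a small function and checked on a tiny input. The sketch below uses a hypothetical function name and made-up sample data, and it assumes each node appears at most once within a triad, so counting occurrences gives the same numbers as counting the triads that contain the node.

```python
from collections import Counter


def count_nodes_per_key(triads_by_key, nodes):
    """For each node, list (key, count) pairs, visiting every triad only once."""
    # One Counter per dictionary key, built in a single pass over its triads.
    per_key = {
        key: Counter(node for triad in triads for node in triad)
        for key, triads in triads_by_key.items()
    }
    # A Counter returns 0 for missing nodes, so absent nodes need no special case.
    return {
        node: [(key, per_key[key][node]) for key in triads_by_key]
        for node in nodes
    }


sample = {
    0: {('A', 'B', 'C'), ('A', 'D', 'E')},
    1: {('B', 'C', 'D')},
}
result = count_nodes_per_key(sample, ['A', 'B', 'E'])
print(result)
```

The work here grows with the total number of tuple elements plus the number of node/key pairs, rather than with the number of nodes multiplied by the number of triads as in the original nested loops.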

How to filter list of tuples with an item of a tuple?

I have this list -
d = [('A', 'B', 1), ('C', 'D', 1),
('B', 'D', 2), ('A', 'B', 3),
('A', 'D', 3), ('B', 'C', 4),
('A', 'C', 5), ('B', 'C', 8)]
Here the first two items in each tuple are nodes, and the third item is the weight. When two tuples share the same 1st and 2nd nodes, I want to remove the one with the higher weight.
Final List:
d = [('A', 'B', 1), ('C', 'D', 1),
('B', 'D', 2), ('A', 'D', 3),
('B', 'C', 4), ('A', 'C', 5)]
I have tried something like this, but looks like not a very clean solution.
edge_dict = {}
for x in d:
    key = '%s%s' % (x[0], x[1])
    if not edge_dict.get(key):
        edge_dict[key] = x[2]
    else:
        if edge_dict[key] > x[2]:
            edge_dict[key] = x[2]
final_list = []
for k, v in edge_dict.items():
    t = list(k)
    t.append(v)
    final_list.append(tuple(t))
final_list.sort(key=lambda x: x[2])
print(final_list)
Another way is to first sort the list of tuples by the first two elements of each tuple and then by the last element, in descending order, so that for each node pair the smallest weight comes last:
sorted_res = sorted(d, key = lambda x:((x[0], x[1]), x[2]),reverse=True)
print(sorted_res)
Result:
[('C', 'D', 1),
('B', 'D', 2),
('B', 'C', 8),
('B', 'C', 4),
('A', 'D', 3),
('A', 'C', 5),
('A', 'B', 3),
('A', 'B', 1)]
Now create a dictionary keyed on the first two elements. Later entries overwrite earlier ones, so each key ends up with the tuple that has the smallest weight:
my_dict = {(i[0], i[1]):i for i in sorted_res}
print(my_dict)
Result:
{('A', 'B'): ('A', 'B', 1),
('A', 'C'): ('A', 'C', 5),
('A', 'D'): ('A', 'D', 3),
('B', 'C'): ('B', 'C', 4),
('B', 'D'): ('B', 'D', 2),
('C', 'D'): ('C', 'D', 1)}
The final result is the values of the dictionary:
list(my_dict.values())
Result:
[('A', 'C', 5),
('A', 'B', 1),
('A', 'D', 3),
('B', 'D', 2),
('C', 'D', 1),
('B', 'C', 4)]
The above steps can be combined using sorted and a dictionary comprehension:
result = list({(i[0], i[1]): i
               for i in sorted(d, key=lambda x: ((x[0], x[1]), x[2]), reverse=True)}.values())
Just a little refactoring.
edge_dict = {}
for t in d:
    key = t[:2]
    value = t[-1]
    if key in edge_dict:
        edge_dict[key] = min(value, edge_dict[key])
    else:
        edge_dict[key] = value
final_list = [(q,r,t) for (q,r),t in edge_dict.items()]
final_list.sort(key=lambda x: x[2])
You can use itertools.groupby, and select the minimum value in each grouping:
import itertools
d = [('A', 'B', 1), ('C', 'D', 1),
('B', 'D', 2), ('A', 'B', 3),
('A', 'D', 3), ('B', 'C', 4),
('A', 'C', 5), ('B', 'C', 8)]
new_d = [min(list(b), key=lambda x:x[-1]) for _, b in itertools.groupby(sorted(d, key=lambda x:x[:-1]), key=lambda x:x[:-1])]
Output:
[('A', 'B', 1), ('A', 'C', 5), ('A', 'D', 3), ('B', 'C', 4), ('B', 'D', 2), ('C', 'D', 1)]
Slightly different from your code, but the idea is almost the same.
Check whether the node pair was found previously; if not, store it without any comparison.
If the node pair was found previously, compare the values and store the minimum value.
Use the sorted form of the node pair as the dictionary key to treat ('B','A') as ('A','B').
Also read the comments for better clarification:
check_dict = {}  # stores the minimum weight for each node pair
for i in d:
    key = tuple(sorted((i[0], i[1])))  # sorted so that ('B','A') is treated the same as ('A','B')
    if key not in check_dict:  # if the pair is absent then add it to check_dict
        check_dict[key] = i[2]
    else:  # if the pair is present then compare with the previous value and store the minimum one
        check_dict[key] = min(check_dict[key], i[2])
expected_list = [key + (value,) for key, value in check_dict.items()]  # create your list of tuples
print(expected_list)
Output :
[('A', 'B', 1), ('C', 'D', 1), ('B', 'D', 2), ('A', 'D', 3), ('B', 'C', 4), ('A', 'C', 5)]
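The dictionary-based answers above share one pattern: keep a mapping from node pair to the smallest weight seen so far. A compact sketch of that pattern follows. The function name lightest_edges and its undirected flag are hypothetical, added only to show how the direction-insensitive variant differs from the direction-sensitive one.

```python
def lightest_edges(edges, undirected=False):
    """Keep only the lowest-weight tuple for each (node, node) pair."""
    best = {}
    for u, v, weight in edges:
        # With undirected=True, ('B', 'A') and ('A', 'B') share one key.
        key = tuple(sorted((u, v))) if undirected else (u, v)
        if key not in best or weight < best[key]:
            best[key] = weight
    # Keys come back in first-seen order, each with its minimum weight.
    return [(u, v, weight) for (u, v), weight in best.items()]


d = [('A', 'B', 1), ('C', 'D', 1),
     ('B', 'D', 2), ('A', 'B', 3),
     ('A', 'D', 3), ('B', 'C', 4),
     ('A', 'C', 5), ('B', 'C', 8)]
final_list = lightest_edges(d)
print(final_list)
```

Testing membership with `key not in best` rather than a truthiness check on `best.get(key)` means an edge with weight 0 is handled like any other.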
