Create a Dictionary from a Dataframe With the Index as the Keys

df.to_dict() creates a nested dictionary where the headers form the keys, {column:{index:value}}.
Is there an easy way to create a dictionary where the index forms the keys, {index:{column:value}}? Or even {index:(column,value)}?
I can create the dictionary and then invert it, but I was wondering if this can be done in a single step.

Transpose the dataframe before you call df.to_dict().
import pandas as pd
df = pd.DataFrame({'a': [1, 3, 5], 'b': [2, 7, 5]})
print(df)
#    a  b
# 0  1  2
# 1  3  7
# 2  5  5
print(df.transpose().to_dict())
# {0: {'a': 1, 'b': 2},
#  1: {'a': 3, 'b': 7},
#  2: {'a': 5, 'b': 5}}
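The same shape can also be produced without transposing, since to_dict accepts an orient argument. A minimal sketch, reusing the frame from the answer; the names by_index and pairs are only illustrative, and the pairs variant is one possible reading of the {index:(column,value)} form asked about in the question:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5], 'b': [2, 7, 5]})

# orient='index' keys the outer dict by the row index
by_index = df.to_dict(orient='index')
print(by_index)

# Variant: map each index label to a list of (column, value) tuples
pairs = {idx: list(row.items()) for idx, row in by_index.items()}
print(pairs)
```

This skips building the intermediate transposed frame, which may matter for larger data.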

Related

Parsing a distance matrix csv into python dictionary structure

I have a distance matrix laid out like this in a csv file
, A, B, C,
A, 0
B, 3, 0
C, 6, 4, 0
And I would like to parse it into a python dictionary like this...
graph = {'A': {'B': 3, 'C': 6},
'B': {'A': 3, 'C': 4},
'C': {'A': 6, 'B': 4}}

With the file as you specified it, you will never get that dict in graph. If you provide a correctly formatted CSV file, the code below parses it into a nested dict.
Note that the header row (first row) of the CSV file must not end with a comma, and the column names in it must not contain spaces. Otherwise you will get a malformed dict.
import math
import pandas
d = pandas.read_csv('csv_file.csv', sep=',', header=0, index_col=0)
d_dict = d.to_dict()  # use d.to_dict(orient='index') for the transpose
# strip stray whitespace from the labels and drop the missing (NaN) cells
graph = {k.strip(): {k2.strip(): v2 for k2, v2 in v.items() if not math.isnan(v2)}
         for k, v in d_dict.items()}
print(graph)
This generates the following dict:
{'A': {'A': 0, 'B': 3, 'C': 6},
 'B': {'B': 0.0, 'C': 4.0},
 'C': {'C': 0.0}}
The contents of csv_file.csv:
,A,B,C
A, 0
B, 3, 0
C, 6, 4, 0
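The file only holds the lower triangle of the matrix, so the dict above is not symmetric and still contains the zero diagonal. One possible way to get the graph from the question is to fill the missing cells from the transposed frame and skip the diagonal. This is a sketch under assumptions: the CSV text is inlined through io.StringIO so the example is self-contained, skipinitialspace=True is added to ignore the spaces after the commas, and the distances are assumed to be whole numbers (hence the int conversion).

```python
import io
import pandas as pd

# Same content as csv_file.csv above, inlined for the example
csv_text = """,A,B,C
A, 0
B, 3, 0
C, 6, 4, 0
"""

d = pd.read_csv(io.StringIO(csv_text), index_col=0, skipinitialspace=True)

# Where d is NaN (upper triangle), take the value from the transpose
full = d.combine_first(d.T)

# Build the nested dict, dropping each node's distance to itself
graph = {
    row: {col: int(dist) for col, dist in cols.items() if col != row}
    for row, cols in full.to_dict(orient='index').items()
}
print(graph)
```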

Python combine values of identical dictionaries without using looping

I have list of identical dictionaries:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
I need to get something like this:
a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]
I know how to do it using for .. in .., but is there a way to do it without looping?
If I do
a, b, c = zip(*my_list)
I get
a = ('a', 'a', 'a')
b = ('b', 'b', 'b')
c = ('c', 'c', 'c')
Any solution?

You need to extract all the values in my_list. You could try:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
a, b, c = zip(*map(lambda d: d.values(), my_list))
print(a, b, c)
# (1, 4, 7) (2, 5, 8) (3, 6, 9)
As @Alexandre pointed out, this works only when the dicts are ordered. If you cannot guarantee the order, consider yatu's answer.
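The zip over d.values() depends on every dict yielding its values in the same order. One way to drop that dependency, sketched here with operator.itemgetter from the standard library, is to name the keys explicitly. The key order in the sample dicts is shuffled on purpose to illustrate the point:

```python
from operator import itemgetter

# Same data as in the question, but with the keys in different orders
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'c': 6, 'a': 4, 'b': 5}, {'b': 8, 'c': 9, 'a': 7}]

# itemgetter('a', 'b', 'c') returns the values in that fixed key order,
# whatever order each dict was built in
a, b, c = zip(*map(itemgetter('a', 'b', 'c'), my_list))
print(a, b, c)
```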

You will have to loop to obtain the values from the inner dictionaries. Probably the most appropriate structure would be a dictionary mapping each letter to a list of values. Assigning to separate variables is usually not the best idea, as it only works with a fixed number of variables.
You can iterate over the inner dictionaries and append to a defaultdict:
from collections import defaultdict
out = defaultdict(list)
for d in my_list:
    for k, v in d.items():
        out[k].append(v)
print(out)
# defaultdict(<class 'list'>, {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})

The pandas DataFrame has a factory method for exactly this, so it is worth using if you already have pandas as a dependency or if the input data is large enough:
import pandas as pd
my_list = ...
df = pd.DataFrame.from_records(my_list)
a = list(df['a']) # df['a'] is a pandas Series, essentially a wrapped C array
b = list(df['b'])
c = list(df['c'])

Here is a version without an explicit loop, although I believe that the version with a loop is much easier to read.
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
# we assume that all dictionaries have the same keys
a, b, c = map(list, map(lambda k: map(lambda d: d[k], my_list), my_list[0]))
print(a, b, c)
# [1, 4, 7] [2, 5, 8] [3, 6, 9]

Efficient way to select row from a DataFrame based on varying list of columns

Suppose we have the following DataFrame:
dt = {'A': ['a','a','a','a','a','a','b','b','c'],
'B': ['x','x','x','y','y','z','x','z','y'],
'C': [10, 14, 15, 11, 10, 14, 14, 11, 10],
'D': [1, 3, 2, 1, 3, 5, 1, 4, 2]}
df = pd.DataFrame(data=dt)
I want to extract certain rows based on a dictionary where keys are column names and values are row values. For example:
d = {'A': 'a', 'B': 'x'}
d = {'A': 'a', 'B': 'y', 'C': 10}
d = {'A': 'b', 'B': 'z', 'C': 11, 'D': 4}
It can be done using a loop (consider the last dictionary):
for iCol in d:
    df = df[df[iCol] == d[iCol]]
Out[215]:
   A  B   C  D
7  b  z  11  4
Since the DataFrame is expected to be pretty large and may have many columns to select on, I am looking for an efficient way to solve the problem without using a for loop to iterate over the dataframe.

Convert the dict to a Series and compare it against the matching columns:
print(df[(df[list(d)] == pd.Series(d)).all(axis=1)])
Output:
   A  B   C  D
7  b  z  11  4
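Because the set of columns varies from one dictionary to the next, it can help to wrap the comparison in a small function. A minimal sketch, using the data and the three dictionaries from the question; select_rows is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

dt = {'A': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'c'],
      'B': ['x', 'x', 'x', 'y', 'y', 'z', 'x', 'z', 'y'],
      'C': [10, 14, 15, 11, 10, 14, 14, 11, 10],
      'D': [1, 3, 2, 1, 3, 5, 1, 4, 2]}
df = pd.DataFrame(data=dt)

def select_rows(frame, criteria):
    """Return the rows where every column named in criteria equals its value."""
    cols = list(criteria)
    # The Series is aligned with the selected columns by label
    mask = (frame[cols] == pd.Series(criteria)).all(axis=1)
    return frame[mask]  # the original frame is left untouched

r1 = select_rows(df, {'A': 'a', 'B': 'x'})
r2 = select_rows(df, {'A': 'a', 'B': 'y', 'C': 10})
r3 = select_rows(df, {'A': 'b', 'B': 'z', 'C': 11, 'D': 4})
print(r1, r2, r3, sep='\n\n')
```

Unlike the loop in the question, this does not reassign df, so the same frame can be queried repeatedly.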

Pandas: how to remove duplicate rows, but keep ALL rows with max value [duplicate]

This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 4 years ago.
How can I remove duplicate rows but keep ALL rows with the max value? For example, I have a dataframe with 4 rows:
data = [{'a': 1, 'b': 2, 'c': 3},{'a': 7, 'b': 10, 'c': 2}, {'a': 7, 'b': 2, 'c': 20}, {'a': 7, 'b': 2, 'c': 20}]
df = pd.DataFrame(data)
From this dataframe, I want to have a dataframe like (3 rows, group by 'a', keep all rows that have max value in 'c'):
data = [{'a': 1, 'b': 2, 'c': 3}, {'a': 7, 'b': 2, 'c': 20}, {'a': 7, 'b': 2, 'c': 20}]
df = pd.DataFrame(data)

You can use GroupBy + transform with Boolean indexing:
res = df[df['c'] == df.groupby('a')['c'].transform('max')]
print(res)
   a  b   c
0  1  2   3
2  7  2  20
3  7  2  20

You can calculate the max of c per group using groupby and transform, then keep the rows where c equals that max:
df['max_c'] = df.groupby('a')['c'].transform('max')
df[df['c']==df['max_c']].drop(['max_c'], axis=1)

Convert a dictionary of equal length lists into a list of dictionaries [duplicate]

This question already has answers here:
Split dictionary of lists into list of dictionaries
(8 answers)
Closed 5 years ago.
I want to write a function that converts a dictionary of equal-length lists into a list of dictionaries.
Example:
{'a': [1, 2, 3], 'b': [3, 2, 1]}
=> [{'a': 1, 'b': 3}, {'a': 2, 'b': 2}, {'a': 3, 'b': 1}]
The thing I am stuck on is how to code this without knowing the size of the input dictionary in advance, and without hardcoding any values in the function.
My intuition was to try a defaultdict, but that didn't seem to help.
Any help would be much appreciated!

Zip the values together to get one tuple per position, then zip the keys with each of those tuples:
d = {'a': [1, 2, 3], 'b': [3, 2, 1]}
[dict(zip(d.keys(),i)) for i in zip(*d.values())]
Result:
[{'a': 1, 'b': 3}, {'a': 2, 'b': 2}, {'a': 3, 'b': 1}]
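Since the question asks for a function, the one-liner can be wrapped as follows. This is only a sketch: the function name is made up for illustration, and the explicit length check is an added assumption, because a plain zip would silently truncate to the shortest list.

```python
def dict_of_lists_to_list_of_dicts(d):
    """Turn {'a': [1, 2], 'b': [3, 4]} into [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]."""
    # Refuse input whose lists differ in length instead of truncating
    lengths = {len(values) for values in d.values()}
    if len(lengths) > 1:
        raise ValueError("all lists must have the same length")
    keys = list(d)
    # zip(*d.values()) yields one tuple per position across all lists
    return [dict(zip(keys, row)) for row in zip(*d.values())]

rows = dict_of_lists_to_list_of_dicts({'a': [1, 2, 3], 'b': [3, 2, 1]})
print(rows)
```

Nothing is hardcoded: the keys and the number of rows both come from the input.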
