Pandas get cell index by header and row names - python

I would like to access cells by header and row names (the size can vary):
df = pandas.read_excel('file.xlsx')
print(df)
id A B C
0 D 1 2 3
1 E 4 5 6
2 F 7 8 1
# into this...
{
("A", "D") : "1",
("B", "D") : "2",
...
}

If order is not important, create a MultiIndex Series with DataFrame.set_index and DataFrame.unstack, then call Series.to_dict:
d = df.set_index('id').unstack().to_dict()
print (d)
{('A', 'D'): 1, ('A', 'E'): 4, ('A', 'F'): 7, ('B', 'D'): 2,
('B', 'E'): 5, ('B', 'F'): 8, ('C', 'D'): 3, ('C', 'E'): 6, ('C', 'F'): 1}
If order is important (Python 3.7+, where dicts keep insertion order), sort by the second index level first:
d = df.set_index('id').unstack().sort_index(level=1).to_dict()
print (d)
{('A', 'D'): 1, ('B', 'D'): 2, ('C', 'D'): 3, ('A', 'E'): 4, ('B', 'E'): 5,
('C', 'E'): 6, ('A', 'F'): 7, ('B', 'F'): 8, ('C', 'F'): 1}
Alternatively, build the keys and values with NumPy:
import numpy as np
df1 = df.set_index('id')
cols = np.tile(df1.columns, len(df1))
rows = np.repeat(df1.index, len(df1.columns))
vals = np.ravel(df1)
d = {(c, r): v for c, r, v in zip(cols, rows, vals)}
print (d)
{('A', 'D'): 1, ('B', 'D'): 2, ('C', 'D'): 3, ('A', 'E'): 4, ('B', 'E'): 5,
('C', 'E'): 6, ('A', 'F'): 7, ('B', 'F'): 8, ('C', 'F'): 1}
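To try this without an Excel file, here is a minimal sketch that rebuilds the sample frame inline and produces the same (column, row) keys with a plain dict comprehension over label-based .at lookups. The inline DataFrame is an assumption standing in for pandas.read_excel('file.xlsx'), and the name cells is only illustrative.

```python
import pandas as pd

# Assumed stand-in for pandas.read_excel('file.xlsx'); values mirror the question.
df = pd.DataFrame({
    "id": ["D", "E", "F"],
    "A": [1, 4, 7],
    "B": [2, 5, 8],
    "C": [3, 6, 1],
})

# Use the 'id' column as row labels so cells can be addressed by name.
df1 = df.set_index("id")

# Walk the frame row by row; .at does a single label-based cell lookup.
cells = {(col, row): df1.at[row, col]
         for row in df1.index
         for col in df1.columns}

print(cells[("A", "D")])   # cell in column A, row D
print(df1.at["E", "B"])    # direct lookup without building the dict
```

If only a handful of cells are needed, the direct .at lookup may be enough and the dictionary can be skipped entirely.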

Related

Changing dictionary format

I want to change a dictionary below ...
dict = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
into other form like this:
dict = [
('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7),
('B', 'D', 5), ('C', 'D', 12)]
This is what I have done:
dict = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
if(i[0] in dict):
    value = dict[i[0]]
    newvalue = i[1],i[2]
    value.append(newvalue)
    dict1[i[0]]=value
else:
    newvalue = i[1],i[2]
    l=[]
    l.append(newvalue)
    dict[i[0]]=l
print(dict)
Thanks
A Python tuple is immutable, so any operation that tries to modify it in place (such as append) is not allowed. As a workaround, convert each tuple to a list, append the key, and convert the result back to a tuple:
dict = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
new_dict = []
for key, tuple_list in dict.items():
    for tuple_item in tuple_list:
        entry = list(tuple_item)
        entry.append(key)
        new_dict.append(tuple(entry))
print(new_dict)
Output (note that the key ends up last in each tuple, not first as in the question):
[('B', 1, 'A'), ('C', 3, 'A'), ('D', 7, 'A'), ('D', 5, 'B'), ('D', 12, 'C')]
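If the key should come first, as in the question, one option is tuple concatenation: tuples cannot be changed in place, but adding two tuples creates a new one. The sketch below assumes the same sample data; the names graph and triples are illustrative and chosen to avoid shadowing the built-in dict.

```python
graph = {
    'A': [('B', 1), ('C', 3), ('D', 7)],
    'B': [('D', 5)],
    'C': [('D', 12)],
}

triples = []
for key, pairs in graph.items():
    for pair in pairs:
        # (key,) is a one-element tuple; + builds a brand-new tuple,
        # so the original pair is never modified.
        triples.append((key,) + pair)

print(triples)
```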
A simple approach could be:
new_dict = []
for letter1, pairs in dict.items():
    for letter2, value in pairs:
        new_dict.append((letter1, letter2, value))
With a list comprehension:
dict_ = {
'A': [('B', 1), ('C', 3), ('D', 7)],
'B': [('D', 5)],
'C': [('D', 12)] }
result = [(key, value[0], value[1]) for key, list_ in dict_.items() for value in list_]
Output:
[('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7), ('B', 'D', 5), ('C', 'D', 12)]
You can iterate through the dictionary using .items(). Notice that each value is by itself a list of tuples. We want to unpack each tuple, so we need a nested for-loop as shown below. res is the output list that we will populate within the loop.
res = []
for key, values in dict.items():
    for value in values:
        res.append((key, value[0], value[1]))
Sample output:
>>> res
[('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7), ('B', 'D', 5), ('C', 'D', 12)]
EDIT: If value is a tuple of more than two elements, we would modify the last line as follows, using tuple unpacking:
res.append((key, *value))
This effectively unpacks all the elements of value. For example,
>>> test = (1, 2, 3)
>>> (0, *test)
(0, 1, 2, 3)
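The attempt in the question appears to go the other way, grouping triples back under their first element. Assuming that is the goal, a minimal sketch with dict.setdefault could look like this (edges and grouped are illustrative names):

```python
edges = [
    ('A', 'B', 1), ('A', 'C', 3), ('A', 'D', 7),
    ('B', 'D', 5), ('C', 'D', 12),
]

grouped = {}
for source, target, weight in edges:
    # setdefault returns the existing list for this key,
    # or stores and returns a new empty list the first time.
    grouped.setdefault(source, []).append((target, weight))

print(grouped)
```

This replaces the if/else branches with a single line and never needs to touch an existing tuple.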

Dataframe to a dictionary

I have a very large dataframe, a sample of which looks like this:
df = pd.DataFrame({'From':['a','b','c','a','d'], 'To':['b', 'c', 'a', 'd', 'e'], 'Rates':[1e-4, 2.3e-2, 1e-2, 100, 70]})
In[121]: df
Out[121]:
From To Rates
0 a b 0.0001
1 b c 0.0230
2 c a 0.0100
3 a d 100.0000
4 d e 70.0000
The end result I would like is a dictionary that looks like this:
{('a', 'b'): 0.0001,
('a', 'd'): 100.0,
('b', 'c'): 0.023,
('c', 'a'): 0.01,
('d', 'e'): 70.0}
The following code works but it is very inefficient for a large df.
from_comps = list(df['From'])
to_comps = list(df['To'])
transfer_rates = {}
for from_comp in from_comps:
    for to_comp in to_comps:
        try:
            transfer_rates[from_comp, to_comp] = df.loc[(df['From'] == from_comp) & (df['To'] == to_comp)]['Rates'].values[0]
        except:
            pass
Is there a more efficient way of doing this?
Given the input provided, it's far simpler to use the built-in to_dict() method. Note that for a more complex dataset, this might require more tweaking.
df = pd.DataFrame({'From':['a','b','c','a','d'], 'To':['b', 'c', 'a', 'd', 'e'], 'Rates':[1e-4, 2.3e-2, 1e-2, 100, 70]})
df.set_index(['From','To']).to_dict()
{'Rates': {('a', 'b'): 0.0001,
('b', 'c'): 0.023,
('c', 'a'): 0.01,
('a', 'd'): 100.0,
('d', 'e'): 70.0}}
df.set_index(['From','To']).to_dict()['Rates']
{('a', 'b'): 0.0001,
('b', 'c'): 0.023,
('c', 'a'): 0.01,
('a', 'd'): 100.0,
('d', 'e'): 70.0}
We can also use the to_records method to get the desired results.
{(item.From, item.To): item.Rates for item in df.to_records(index=False)}
{('a', 'b'): 0.0001,
('b', 'c'): 0.023,
('c', 'a'): 0.01,
('a', 'd'): 100.0,
('d', 'e'): 70.0}
You could also use pivot_table with to_dict, although each value then comes back wrapped in a {'Rates': ...} dict:
df['key'] = list(zip(df['From'], df['To']))
df[['key', 'Rates']].pivot_table(columns='key', values='Rates').to_dict()
{('a', 'b'): {'Rates': 0.0001}, ('a', 'd'): {'Rates': 100.0}, ('b', 'c'): {'Rates': 0.023}, ('c', 'a'): {'Rates': 0.01}, ('d', 'e'): {'Rates': 70.0}}
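Another way to get the flat dictionary directly, sketched here on the sample frame, is to zip the columns together, or to select the Rates column after setting the index. Both use only standard pandas and built-in calls; the names rates and same are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'From': ['a', 'b', 'c', 'a', 'd'],
    'To': ['b', 'c', 'a', 'd', 'e'],
    'Rates': [1e-4, 2.3e-2, 1e-2, 100, 70],
})

# Pair up (From, To) as keys and Rates as values in a single pass.
rates = dict(zip(zip(df['From'], df['To']), df['Rates']))

# Equivalent: index by both columns, pick the Series, convert it.
same = df.set_index(['From', 'To'])['Rates'].to_dict()

print(rates)
```

Both versions avoid the nested loops and repeated boolean filtering of the original code. If a (From, To) pair occurs more than once, the last occurrence wins in the zip version.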

How can I replace all the values of a Python dictionary with a range of values?

I have the following dictionary:
mydict = {('a', 'b'): 28.379,
('c', 'd'): 32.292,
('e', 'f'): 61.295,
('g', 'h'): 112.593,
('i', 'j'): 117.975}
And I would like to replace all the values with a range from 1 to 5, but keep the order of the keys. As a result, I would get this:
mydict = {('a', 'b'): 1,
('c', 'd'): 2,
('e', 'f'): 3,
('g', 'h'): 4,
('i', 'j'): 5}
The length of the dictionary is actually 22000, so I need a range from 1 to 22000.
How can I do it?
Thanks in advance.
Using enumerate to iterate on the keys, you can do:
mydict = {('a', 'b'): 28.379,
('c', 'd'): 32.292,
('e', 'f'): 61.295,
('g', 'h'): 112.593,
('i', 'j'): 117.975}
for i, key in enumerate(mydict):  # iterates on the keys
    mydict[key] = i
print(mydict)
# {('a', 'b'): 0, ('c', 'd'): 1, ('e', 'f'): 2, ('g', 'h'): 3, ('i', 'j'): 4}
Important note: dicts are only officially ordered since Python 3.7 (and in the CPython implementation since 3.6), so this wouldn't make much sense with older versions of Python.
To answer your comment: enumerate takes an optional second parameter, start, which defaults to 0.
So, if you want to start at 1, just do:
for i, key in enumerate(mydict, start=1):  # iterates on the keys
    mydict[key] = i
The simplest way is to create another dictionary from the keys of the previous one.
mydict2=dict()
for i,key in enumerate(mydict):
    mydict2[key]=i+1
You can do this with a one-liner which is more compact:
mydict = {('a', 'b'): 28.379,
('c', 'd'): 32.292,
('e', 'f'): 61.295,
('g', 'h'): 112.593,
('i', 'j'): 117.975}
{k: i for i, k in enumerate(mydict, start=1)}  # start=1 gives 1..N as requested
Pandas solution for this:
import pandas as pd
a = pd.DataFrame(mydict, index=[0]).T
a[0] = list(range(0,len(a)))
a.to_dict()[0]
# {('a', 'b'): 0, ('c', 'd'): 1, ('e', 'f'): 2, ('g', 'h'): 3, ('i', 'j'): 4}
This can be done gracefully with dict.update and itertools.count, and explicit loops can be avoided:
>>> mydict = {('a', 'b'): 28.379,
... ('c', 'd'): 32.292,
... ('e', 'f'): 61.295,
... ('g', 'h'): 112.593,
... ('i', 'j'): 117.975}
>>> from itertools import count
>>> mydict.update(zip(mydict, count(1)))
>>> mydict
{('a', 'b'): 1, ('c', 'd'): 2, ('e', 'f'): 3, ('g', 'h'): 4, ('i', 'j'): 5}
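If the original values should be kept around, a variant of the same idea is to build a new dictionary instead of updating in place. This is a small sketch using zip with range; renumbered is an illustrative name, and key order relies on dicts preserving insertion order (Python 3.7+).

```python
mydict = {('a', 'b'): 28.379,
          ('c', 'd'): 32.292,
          ('e', 'f'): 61.295,
          ('g', 'h'): 112.593,
          ('i', 'j'): 117.975}

# Iterating a dict yields its keys in insertion order,
# so each key is paired with 1, 2, 3, ... in that order.
renumbered = dict(zip(mydict, range(1, len(mydict) + 1)))

print(renumbered)
```

The same line should work unchanged for the 22000-entry dictionary, since range is sized from len(mydict).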

Changing Every Instance of a tuple

I am trying to write a function that converts every element of a list of
tuples. Basically, I need to convert every element of the list from ('value', number, 'value') to Arc('value', number, 'value').
Input: [('root', 1, 'a'), ('b', 0.0, 'root'), ('b', 2, 'c'), ('a', 5, 'd'), ('b', 7, 'a')]
def Convert(t):
    t1=('head', 'weight', 'tail')
    t2=namedtuple('Arc', (t1))
    return t2
Required Output: [Arc('root', 1, 'a'), Arc('b', 0.0, 'root'), Arc('b', 2, 'c'), Arc('a', 5, 'd'), Arc('b', 7, 'a')]
You can use a list comprehension to convert your list of tuples to a list of named tuples:
t = [ ('root', 1, 'a'), ('b', 0.0, 'root'), ('b', 2, 'c'), ('a', 5, 'd'), ('b', 7, 'a') ]
from collections import namedtuple
Arc = namedtuple('Arc', 'head weight tail')
def Convert(t):
    return [Arc(*item) for item in t]
print(Convert(t))
Prints:
[Arc(head='root', weight=1, tail='a'), Arc(head='b', weight=0.0, tail='root'), Arc(head='b', weight=2, tail='c'), Arc(head='a', weight=5, tail='d'), Arc(head='b', weight=7, tail='a')]
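Once the tuples are Arc instances, their fields can be read by name while the objects still behave like ordinary tuples. The sketch below uses the documented namedtuple classmethod _make as an alternative to Arc(*item); the variable names raw, arcs and heaviest are illustrative.

```python
from collections import namedtuple

Arc = namedtuple('Arc', ['head', 'weight', 'tail'])

raw = [('root', 1, 'a'), ('b', 0.0, 'root'), ('b', 2, 'c')]

# Arc._make builds an Arc from any iterable of the right length.
arcs = [Arc._make(item) for item in raw]

# Fields are available by name...
heaviest = max(arcs, key=lambda arc: arc.weight)
print(heaviest.head, heaviest.weight, heaviest.tail)

# ...and each Arc still compares equal to the plain tuple it came from.
print(arcs[0] == raw[0])
```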

Matrix file to dictionary in python

I have a file matrix.txt that contains :
A B C
A 1 2 3
B 4 5 6
C 7 8 9
I want to read the content of the file and store it in a dictionary as following :
{('A', 'A') : 1, ('A', 'B') : 2, ('A', 'C') : 3,
('B', 'A') : 4, ('B', 'B') : 5, ('B', 'C') : 6,
('C', 'A') : 7, ('C', 'B') : 8, ('C', 'C') : 9}
The following Python 3 function yields every matrix item together with its indices, in a form compatible with the dict constructor:
def read_mx_cells(file, parse_cell = lambda x:x):
    rows = (line.rstrip().split() for line in file)
    header = next(rows)
    for row in rows:
        row_id = row[0]
        for col_id,cell in zip(header, row[1:]):
            yield ((row_id, col_id), parse_cell(cell))
with open('matrix.txt') as f:
    for x in read_mx_cells(f, int):
        print(x)
# ('A','A'),1
# ('A','B'),2
# ('A','C'),3 ...
with open('matrix.txt') as f:
    print(dict(read_mx_cells(f, int)))
# { ('A','A'): 1, ('A','B'): 2, ('A','C'): 3 ... }
# Note that Python dicts retain insertion order only from Python 3.7 on
You can use itertools.product to build the keys from the file header and the first column (obtained by transposing the rows), then zip the remaining columns back into rows and chain them into a single iterable of the split substrings. To maintain order we also need to use an OrderedDict. The code below is Python 2:
from collections import OrderedDict
from itertools import izip, product, imap, chain
with open("matrix.txt") as f:
    head, zipped = next(f).split(), izip(*imap(str.split, f))
    cols = next(zipped)
    od = OrderedDict(zip(product(head, cols), chain.from_iterable(izip(*zipped))))
Output:
OrderedDict([(('A', 'A'), '1'), (('A', 'B'), '2'), (('A', 'C'), '3'),
(('B', 'A'), '4'), (('B', 'B'), '5'), (('B', 'C'), '6'), (('C', 'A'), '7'),
(('C', 'B'), '8'), (('C', 'C'), '9')])
For Python 3, just use the built-in map and zip in place of imap and izip.
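As a rough Python 3 sketch of the same transpose-and-zip idea, the version below reads from an in-memory io.StringIO so that it runs without matrix.txt on disk. The sample text, the name row_labels, and the choice of a plain dict (which keeps insertion order on Python 3.7+) are assumptions made for illustration. The keys are built as (row label, column header), which matches the layout the question asks for.

```python
import io
from itertools import chain, product

# Assumed stand-in for open("matrix.txt"); same content as the question.
sample = io.StringIO("  A B C\nA 1 2 3\nB 4 5 6\nC 7 8 9\n")

head = next(sample).split()               # column headers
zipped = zip(*map(str.split, sample))     # transpose the remaining rows
row_labels = next(zipped)                 # first column: row labels

# Transposing again restores row order; chain flattens it row by row,
# which lines up with product(row_labels, head).
values = chain.from_iterable(zip(*zipped))
result = dict(zip(product(row_labels, head), values))

print(result)
```

As in the original, the values stay strings; wrapping values in map(int, ...) would convert them.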
Or, without transposing, using the csv module (again Python 2; use zip instead of izip on Python 3):
from collections import OrderedDict
from itertools import izip,repeat
import csv
with open("matrix.txt") as f:
    r = csv.reader(f, delimiter=" ", skipinitialspace=True)
    head = repeat(next(r))
    od = OrderedDict((((row[0], k), v) for row in r
                      for k, v in izip(next(head), row[1:])))
The output will be the same.
pandas makes it pretty neat.
import pandas as pd
Approach 1
df = pd.read_table('matrix.txt', sep=' ')
>>> df
A B C
A 1 2 3
B 4 5 6
C 7 8 9
d = df.to_dict()
>>> d
{'A': {'A': 1, 'B': 4, 'C': 7},
'B': {'A': 2, 'B': 5, 'C': 8},
'C': {'A': 3, 'B': 6, 'C': 9}}
new_d = {(r, c): v for c, v1 in d.items() for r, v in v1.items()}
>>> new_d
{('A', 'A'): 1,
('A', 'B'): 2,
('A', 'C'): 3,
('B', 'A'): 4,
('B', 'B'): 5,
('B', 'C'): 6,
('C', 'A'): 7,
('C', 'B'): 8,
('C', 'C'): 9}
Approach 2
df = pd.read_table('matrix.txt', sep=' ')
>>> df
A B C
A 1 2 3
B 4 5 6
C 7 8 9
new_d = {}
for r, v in df.iterrows():
    for c, v1 in v.items():  # Series.iteritems() was removed in pandas 2.0
        new_d.update({(r,c): v1})
>>> new_d
{('A', 'A'): 1,
('A', 'B'): 2,
('A', 'C'): 3,
('B', 'A'): 4,
('B', 'B'): 5,
('B', 'C'): 6,
('C', 'A'): 7,
('C', 'B'): 8,
('C', 'C'): 9}
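A variant that may be easier to follow is to ask to_dict for a row-oriented result and flatten that. The sketch below builds the frame inline as an assumed stand-in for reading matrix.txt, and uses orient='index' so that the outer keys are row labels:

```python
import pandas as pd

# Assumed stand-in for the frame read from matrix.txt.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=list('ABC'), columns=list('ABC'))

# orient='index' gives {row_label: {column_label: value}}.
by_row = df.to_dict(orient='index')

# Flatten the nested dict into (row, column) keys.
new_d = {(r, c): v for r, row in by_row.items() for c, v in row.items()}

print(new_d)
```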
