Matrix file to dictionary in python

I have a file matrix.txt that contains:
A B C
A 1 2 3
B 4 5 6
C 7 8 9
I want to read the content of the file and store it in a dictionary as follows:
{('A', 'A') : 1, ('A', 'B') : 2, ('A', 'C') : 3,
('B', 'A') : 4, ('B', 'B') : 5, ('B', 'C') : 6,
('C', 'A') : 7, ('C', 'B') : 8, ('C', 'C') : 9}

The following Python 3 function yields every matrix cell together with its (row, column) indices, in a form the dict constructor accepts:
def read_mx_cells(file, parse_cell=lambda x: x):
    rows = (line.rstrip().split() for line in file)
    header = next(rows)  # first line holds the column labels
    for row in rows:
        row_id = row[0]  # first field is the row label
        for col_id, cell in zip(header, row[1:]):
            yield ((row_id, col_id), parse_cell(cell))
with open('matrix.txt') as f:
    for x in read_mx_cells(f, int):
        print(x)
# (('A', 'A'), 1)
# (('A', 'B'), 2)
# (('A', 'C'), 3) ...
with open('matrix.txt') as f:
    print(dict(read_mx_cells(f, int)))
# {('A', 'A'): 1, ('A', 'B'): 2, ('A', 'C'): 3, ...}
# Note: plain dicts keep insertion order only since Python 3.7 (CPython 3.6);
# on older versions use collections.OrderedDict
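If an eager result is enough, the same logic fits in a single dict comprehension. This is a minimal sketch, assuming whitespace-separated fields; read_matrix is a hypothetical name, io.StringIO stands in for the open file, and blank lines (such as a trailing empty line) are skipped.

```python
import io


def read_matrix(lines, parse_cell=int):
    """Return {(row_label, col_label): value} for a labelled matrix (hypothetical helper)."""
    # split each non-blank line on whitespace; blank lines are ignored
    rows = (line.split() for line in lines if line.strip())
    header = next(rows)  # first line holds the column labels
    return {
        (row[0], col): parse_cell(cell)
        for row in rows
        for col, cell in zip(header, row[1:])
    }


# io.StringIO stands in for open('matrix.txt')
sample = io.StringIO("A B C\nA 1 2 3\nB 4 5 6\nC 7 8 9\n\n")
matrix = read_matrix(sample)
print(matrix[('B', 'C')])  # 6
```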

You can use itertools.product to create your keys from the file header and the first column (obtained by transposing the rows), then zip the remaining columns back into their original rows and chain them into a single iterable of values. To maintain order in Python 2 we also need an OrderedDict:
from collections import OrderedDict
from itertools import izip, product, imap, chain
with open("matrix.txt") as f:
    head, zipped = next(f).split(), izip(*imap(str.split, f))
    cols = next(zipped)  # row labels: the first column after transposing
    # keys are (row label, column label) to match the row-by-row values
    od = OrderedDict(zip(product(cols, head), chain.from_iterable(izip(*zipped))))
Output:
OrderedDict([(('A', 'A'), '1'), (('A', 'B'), '2'), (('A', 'C'), '3'),
(('B', 'A'), '4'), (('B', 'B'), '5'), (('B', 'C'), '6'), (('C', 'A'), '7'),
(('C', 'B'), '8'), (('C', 'C'), '9')])
For Python 3, just use map and zip instead of imap and izip.
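A Python 3 version of the same transpose-and-product idea might look like this. It is a sketch: io.StringIO stands in for the file, the values are converted with int (the Python 2 version above keeps strings), and a plain dict replaces OrderedDict because dicts keep insertion order from Python 3.7 on.

```python
import io
from itertools import chain, product

# io.StringIO stands in for open("matrix.txt"); any iterable of lines works
f = io.StringIO("A B C\nA 1 2 3\nB 4 5 6\nC 7 8 9\n")

head = next(f).split()                       # column labels: ['A', 'B', 'C']
columns = zip(*map(str.split, f))            # transpose the remaining rows
row_labels = next(columns)                   # first column: ('A', 'B', 'C')
values = chain.from_iterable(zip(*columns))  # transpose back, flatten row by row

# plain dicts keep insertion order from Python 3.7, so OrderedDict is optional
d = dict(zip(product(row_labels, head), map(int, values)))
print(d)
```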
Or, without transposing, using the csv module:
from collections import OrderedDict
from itertools import izip
import csv
with open("matrix.txt") as f:
    r = csv.reader(f, delimiter=" ", skipinitialspace=True)
    head = next(r)  # column labels, reused for every row
    od = OrderedDict(((row[0], k), v) for row in r
                     for k, v in izip(head, row[1:]))
The output will be the same.

pandas makes it pretty neat.
import pandas as pd
Approach 1
df = pd.read_table('matrix.txt', sep=' ')
>>> df
A B C
A 1 2 3
B 4 5 6
C 7 8 9
d = df.to_dict()
>>> d
{'A': {'A': 1, 'B': 4, 'C': 7},
'B': {'A': 2, 'B': 5, 'C': 8},
'C': {'A': 3, 'B': 6, 'C': 9}}
# flatten {column: {row: value}} into {(row, column): value}
new_d = {(r, c): v for c, col in d.items() for r, v in col.items()}
>>> new_d
{('A', 'A'): 1,
('A', 'B'): 2,
('A', 'C'): 3,
('B', 'A'): 4,
('B', 'B'): 5,
('B', 'C'): 6,
('C', 'A'): 7,
('C', 'B'): 8,
('C', 'C'): 9}
Approach 2
df = pd.read_table('matrix.txt', sep=' ')
>>> df
A B C
A 1 2 3
B 4 5 6
C 7 8 9
new_d = {}
for r, v in df.iterrows():
    for c, v1 in v.items():  # Series.iteritems() was removed in pandas 2.0
        new_d[(r, c)] = v1
>>> new_d
{('A', 'A'): 1,
('A', 'B'): 2,
('A', 'C'): 3,
('B', 'A'): 4,
('B', 'B'): 5,
('B', 'C'): 6,
('C', 'A'): 7,
('C', 'B'): 8,
('C', 'C'): 9}
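pandas can also produce the tuple keys directly: DataFrame.stack() moves the column labels into a second index level, and Series.to_dict() maps each (row, column) pair to its value. A sketch, assuming pandas is installed; io.StringIO stands in for matrix.txt, and sep=r"\s+" splits on any run of whitespace.

```python
import io

import pandas as pd

# io.StringIO stands in for 'matrix.txt'. The header has one field fewer than
# the data rows, so pandas uses the first column as the row index.
text = "A B C\nA 1 2 3\nB 4 5 6\nC 7 8 9\n"
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# stack() gives a Series indexed by (row, column);
# to_dict() turns those index pairs into tuple keys.
d = df.stack().to_dict()
print(d)
```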

Related

Pandas dataframe to dict of list of tuples

Suppose I have the following dataframe:
df = pd.DataFrame({'id': [1,2,3,3,3], 'v1': ['a', 'a', 'c', 'c', 'd'], 'v2': ['z', 'y', 'w', 'y', 'z']})
df
id v1 v2
1 a z
2 a y
3 c w
3 c y
3 d z
And I want to transform it to this format:
{1: [('a', 'z')], 2: [('a', 'y')], 3: [('c', 'w'), ('c', 'y'), ('d', 'z')]}
I basically want a dict whose keys are the ids and whose values are lists of the (v1, v2) tuples for that id.
I tried grouping by id:
df.groupby('id')[['v1', 'v2']].apply(list)
But this didn't work.

Create the tuples first, then group them by id and aggregate each group into a list:
d = df[['v1', 'v2']].agg(tuple, axis=1).groupby(df['id']).apply(list).to_dict()
print(d)
{1: [('a', 'z')], 2: [('a', 'y')], 3: [('c', 'w'), ('c', 'y'), ('d', 'z')]}
Another idea is to use a MultiIndex:
d = df.set_index(['v1', 'v2']).groupby('id').apply(lambda x: x.index.tolist()).to_dict()

You can use defaultdict from the collections module:
from collections import defaultdict
d = defaultdict(list)
for k, v, s in df.to_numpy():  # each row is (id, v1, v2)
    d[k].append((v, s))
defaultdict(list,
{1: [('a', 'z')],
2: [('a', 'y')],
3: [('c', 'w'), ('c', 'y'), ('d', 'z')]})

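A column of tuples can also be converted with to_dict():
df['New'] = [tuple(x) for x in df[['v1', 'v2']].to_records(index=False)]
df = df[['id', 'New']]
df = df.set_index('id')
df.to_dict()
Output:
{'New': {1: ('a', 'z'), 2: ('a', 'y'), 3: ('d', 'z')}}
Note that this keeps only one tuple per id: dict keys are unique, so the last row for id 3 overwrites ('c', 'w') and ('c', 'y'), and the result is nested under the column name 'New'.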
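If you prefer staying in pandas while keeping every row, one option (a sketch, not taken from the answers above; it assumes pandas is installed) is to iterate over the groups and zip the two columns of each one:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 3, 3],
                   'v1': ['a', 'a', 'c', 'c', 'd'],
                   'v2': ['z', 'y', 'w', 'y', 'z']})

# one list per id: zip the v1 and v2 columns of each group into tuples,
# so repeated ids keep all of their rows instead of overwriting each other
d = {key: list(zip(group['v1'], group['v2'])) for key, group in df.groupby('id')}
print(d)
```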

Pandas get cell index by header and row names

I would like to access cells by header and row names (the size can vary):
df = pandas.read_excel('file.xlsx')
print(df)
id A B C
0 D 1 2 3
1 E 4 5 6
2 F 7 8 1
# into this...
{
("A", "D") : "1",
("B", "D") : "2",
...
}

If order is not important, create a MultiIndex Series with DataFrame.set_index and DataFrame.unstack, then use Series.to_dict:
d = df.set_index('id').unstack().to_dict()
print(d)
{('A', 'D'): 1, ('A', 'E'): 4, ('A', 'F'): 7, ('B', 'D'): 2,
('B', 'E'): 5, ('B', 'F'): 8, ('C', 'D'): 3, ('C', 'E'): 6, ('C', 'F'): 1}
If order is important (Python 3.7+, where dicts keep insertion order), sort by the id level first:
d = df.set_index('id').unstack().sort_index(level=1).to_dict()
print(d)
{('A', 'D'): 1, ('B', 'D'): 2, ('C', 'D'): 3, ('A', 'E'): 4, ('B', 'E'): 5,
('C', 'E'): 6, ('A', 'F'): 7, ('B', 'F'): 8, ('C', 'F'): 1}
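Or with NumPy, building the keys row by row:
import numpy as np
df1 = df.set_index('id')
cols = np.tile(df1.columns, len(df1))         # A, B, C, A, B, C, ...
ids = np.repeat(df1.index, len(df1.columns))  # D, D, D, E, E, E, ...
vals = df1.to_numpy().ravel().tolist()        # values row by row, as Python ints
d = {(a, b): v for a, b, v in zip(cols, ids, vals)}
print(d)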
{('A', 'D'): 1, ('B', 'D'): 2, ('C', 'D'): 3, ('A', 'E'): 4, ('B', 'E'): 5,
('C', 'E'): 6, ('A', 'F'): 7, ('B', 'F'): 8, ('C', 'F'): 1}

How can I replace all the values of a Python dictionary with a range of values?

I have the following dictionary:
mydict = {('a', 'b'): 28.379,
('c', 'd'): 32.292,
('e', 'f'): 61.295,
('g', 'h'): 112.593,
('i', 'j'): 117.975}
And I would like to replace all the values with a range from 1 to 5, but keep the order of the keys. As a result, I would get this:
mydict = {('a', 'b'): 1,
('c', 'd'): 2,
('e', 'f'): 3,
('g', 'h'): 4,
('i', 'j'): 5}
The length of the dictionary is actually 22000, so I need a range from 1 to 22000.
How can I do it?
Thanks in advance.

Using enumerate to iterate over the keys, you can do:
mydict = {('a', 'b'): 28.379,
('c', 'd'): 32.292,
('e', 'f'): 61.295,
('g', 'h'): 112.593,
('i', 'j'): 117.975}
for i, key in enumerate(mydict):  # iterates over the keys
    mydict[key] = i
print(mydict)
# {('a', 'b'): 0, ('c', 'd'): 1, ('e', 'f'): 2, ('g', 'h'): 3, ('i', 'j'): 4}
Important note: dicts are only officially ordered since Python 3.7 (and in the CPython implementation since 3.6), so this wouldn't make much sense with older versions of Python.
To answer your comment: enumerate takes an optional second parameter, start (which defaults to 0).
So, if you want to start at 1, just do:
for i, key in enumerate(mydict, start=1):  # iterates over the keys
    mydict[key] = i

The simplest way is to create another dictionary from the keys of the previous one:
mydict2 = dict()
for i, key in enumerate(mydict):
    mydict2[key] = i + 1

You can do this with a more compact one-liner:
mydict = {('a', 'b'): 28.379,
('c', 'd'): 32.292,
('e', 'f'): 61.295,
('g', 'h'): 112.593,
('i', 'j'): 117.975}
mydict = {k: i for i, k in enumerate(mydict, start=1)}

Pandas solution for this:
import pandas as pd
a = pd.DataFrame(mydict, index=[0]).T
a[0] = list(range(0,len(a)))
a.to_dict()[0]
# {('a', 'b'): 0, ('c', 'd'): 1, ('e', 'f'): 2, ('g', 'h'): 3, ('i', 'j'): 4}

This can be done gracefully with dict.update and itertools.count, and explicit loops can be avoided:
>>> mydict = {('a', 'b'): 28.379,
... ('c', 'd'): 32.292,
... ('e', 'f'): 61.295,
... ('g', 'h'): 112.593,
... ('i', 'j'): 117.975}
>>> from itertools import count
>>> mydict.update(zip(mydict, count(1)))
>>> mydict
{('a', 'b'): 1, ('c', 'd'): 2, ('e', 'f'): 3, ('g', 'h'): 4, ('i', 'j'): 5}

Most Pythonic way for creating a defaultdictionary counter

I am trying to count occurrences of various items based on a condition. What I have so far is this function that, given pairs of items, increments a nested counter like this:
given [('a', 'a'), ('a', 'b'), ('b', 'a')] it outputs defaultdict(<class 'collections.Counter'>, {'a': Counter({'a': 1, 'b': 1}), 'b': Counter({'a': 1})})
The function can be seen below:
from collections import Counter, defaultdict
def freq(samples=None):
    out = defaultdict(Counter)
    if samples:
        for (c, s) in samples:
            out[c][s] += 1
    return out
It is limited, though, to pairs, while I would like it to be more generic and work with any number of variables; e.g. [('a', 'a', 'b'), ('a', 'b', 'c'), ('b', 'a', 'a')] should still work, and I should be able to query the result for, let's say, res['a']['b'] and get the count for 'c', which is one.
What would be the best way to do this in Python?

Assuming all tuples in the list have the same length:
from collections import Counter
from itertools import groupby
from operator import itemgetter
def freq(samples=()):
    sorted_samples = sorted(samples)  # groupby needs equal keys to be adjacent
    if sorted_samples and len(sorted_samples[0]) > 2:
        # recurse on the remaining elements of each group
        return {key: freq(value[1:] for value in values) for key, values in groupby(sorted_samples, itemgetter(0))}
    else:
        return {key: Counter(value[1] for value in values) for key, values in groupby(sorted_samples, itemgetter(0))}
That gives:
freq([('a', 'a'), ('a', 'b'), ('b', 'a'), ('a', 'c')])
# {'a': Counter({'a': 1, 'b': 1, 'c': 1}), 'b': Counter({'a': 1})}
freq([('a', 'a', 'a'), ('a', 'b', 'c'), ('b', 'a', 'a'), ('a', 'c', 'c')])
# {'a': {'a': Counter({'a': 1}), 'b': Counter({'c': 1}), 'c': Counter({'c': 1})}, 'b': {'a': Counter({'a': 1})}}

One option is to use the full tuples as keys:
from collections import Counter
def freq(samples=()):
    out = Counter()
    for sample in samples:
        out[sample] += 1
    return out
which would then return things as
Counter({('a', 'a', 'b'): 1, ('a', 'b', 'c'): 1, ('b', 'a', 'a'): 1})
To select certain groups, filter the keys by a prefix slice, e.g. build a new dictionary {k: v for k, v in out.items() if k[:2] == ('a', 'b')}.
If the groups are indeed either 2 or 3 long, but never both, you can change it to:
from collections import defaultdict
def freq(samples):
    l = len(samples[0])
    if l == 2:
        out = defaultdict(lambda: defaultdict(int))  # out[a][b] -> count
        for a, b in samples:
            out[a][b] += 1
    elif l == 3:
        out = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))  # out[a][b][c] -> count
        for a, b, c in samples:
            out[a][b][c] += 1
    return out
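To handle tuples of any (fixed) length without special-casing 2 and 3, the nesting can be built recursively. This is a minimal sketch, assuming all tuples have the same length; nested_counter and freq_nested are hypothetical names. Every level but the last is a defaultdict, and the innermost level is a Counter, so res['a']['b'] returns the counts of the third elements.

```python
from collections import Counter, defaultdict


def nested_counter(depth):
    """Nested defaultdicts `depth - 1` levels deep, with a Counter at the bottom."""
    if depth <= 1:
        return Counter()
    return defaultdict(lambda: nested_counter(depth - 1))


def freq_nested(samples):
    """Count tuples so that res[x][y]...[z] is the count of that full tuple."""
    samples = list(samples)
    if not samples:
        return {}
    out = nested_counter(len(samples[0]))  # assumes every tuple has this length
    for sample in samples:
        node = out
        for key in sample[:-1]:  # walk (and create) the intermediate levels
            node = node[key]
        node[sample[-1]] += 1    # innermost level is a Counter
    return out


res = freq_nested([('a', 'a', 'b'), ('a', 'b', 'c'), ('b', 'a', 'a')])
print(res['a']['b']['c'])  # 1
```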

Python tuple operations and count

I have the following list of tuples. I want to build a string like the output shown below: I want to count all the elements corresponding to 'a', i.e. how many times k1 occurred with 'a', and so on. What is the easiest way to do this?
a=[('a','k1'),('b','k2'),('a','k2'),('a','k1'),('b','k2'),('a','k1'),('b','k2'),('c','k3'),('c','k4')]
The output should be in a string, output = "":
a k1 3
a k2 1
b k1 1
b k2 3
c k3 1
c k4 1

Use the Counter class from collections:
>>> a = [('a', 'k1'), ('b', 'k2'), ('a', 'k2'), ('a', 'k1'), ('b', 'k2'), ('a', 'k1'), ('b', 'k2'), ('c', 'k3'), ('c', 'k4')]
>>> from collections import Counter
>>> c = Counter(a)
>>> c
Counter({('a', 'k1'): 3, ('b', 'k2'): 3, ('a', 'k2'): 1, ('c', 'k3'): 1, ('c', 'k4'): 1})
You can use c.items() to iterate over the counts:
>>> for item in c.items():
...     print(item)
...
(('a', 'k1'), 3)
(('b', 'k2'), 3)
(('a', 'k2'), 1)
(('c', 'k3'), 1)
(('c', 'k4'), 1)
The code above is Python 3 (the Counter class has been available since Python 2.7). You can then sort the items into the desired order and convert them to a string if needed.

You can do the counting easily with defaultdict. A defaultdict works like a normal dictionary, except that it supplies a default value for missing keys, so you can increment your counter while iterating over your data set without first checking whether the key exists.
a=[('a','k1'),('b','k2'),('a','k2'),('a','k1'),('b','k2'),('a','k1'),('b','k2'),('c','k3'),('c','k4')]
from collections import defaultdict
b = defaultdict(int)
for item in a:
    b[item] += 1
print(b)
defaultdict(<class 'int'>, {('a', 'k1'): 3, ('b', 'k2'): 3, ('a', 'k2'): 1, ('c', 'k3'): 1, ('c', 'k4'): 1})
And to pretty-print it, just iterate over the resulting data and print it however you want:
for key, value in b.items():
    print('%s %s %s' % (key[0], key[1], value))
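To get the single string the question asks for, the counts can be sorted and joined line by line. A minimal sketch using Counter and an f-string; note that only pairs that actually occur in a are printed, so there is no "b k1" line.

```python
from collections import Counter

a = [('a', 'k1'), ('b', 'k2'), ('a', 'k2'), ('a', 'k1'), ('b', 'k2'),
     ('a', 'k1'), ('b', 'k2'), ('c', 'k3'), ('c', 'k4')]

counts = Counter(a)
# sort by (first element, second element) and emit one "x k n" line per pair
output = "\n".join(f"{x} {k} {n}" for (x, k), n in sorted(counts.items()))
print(output)
```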
