bar plot a multiheader dataframe in a desired format - python

I have the following DataFrame:
data = {('Case1', 'A'): {'One': 0.96396415, 'Two': 0.832049574, 'Three': 0.636568627, 'Four': 0.765846157},
('Case1', 'B'): {'One': 0.257496625, 'Two': 0.984418254, 'Three': 0.018891398, 'Four': 0.440278509},
('Case1', 'C'): {'One': 0.512732941, 'Two': 0.622697929, 'Three': 0.731555346, 'Four': 0.031419349},
('Case2', 'A'): {'One': 0.736783294, 'Two': 0.460765675, 'Three': 0.078558864, 'Four': 0.566186283},
('Case2', 'B'): {'One': 0.921473211, 'Two': 0.274749932, 'Three': 0.312766018, 'Four': 0.159229808},
('Case2', 'C'): {'One': 0.146389032, 'Two': 0.893299471, 'Three': 0.536288712, 'Four': 0.775763286},
('Case3', 'A'): {'One': 0.351607026, 'Two': 0.041402396, 'Three': 0.924265706, 'Four': 0.639154727},
('Case3', 'B'): {'One': 0.966538215, 'Two': 0.658236148, 'Three': 0.473447279, 'Four': 0.545974617},
('Case3', 'C'): {'One': 0.036585457, 'Two': 0.279443317, 'Three': 0.407991168, 'Four': 0.101083315}}
pd.DataFrame(data=data)
          Case1                         Case2                         Case3
              A         B         C         A         B         C         A         B         C
One    0.963964  0.257497  0.512733  0.736783  0.921473  0.146389  0.351607  0.966538  0.036585
Two     0.83205  0.984418  0.622698  0.460766   0.27475  0.893299  0.041402  0.658236  0.279443
Three  0.636569  0.018891  0.731555  0.078559  0.312766  0.536289  0.924266  0.473447  0.407991
Four   0.765846  0.440279  0.031419  0.566186   0.15923  0.775763  0.639155  0.545975  0.101083
There are two header rows.
In the end I need a plot like the following (which I created in Excel).
Another solution would be a separate plot for every case, instead of all in one.
What I have tried so far is:
df.T.melt(ignore_index=False)
to get the DataFrame into a format like the one I used in Excel.
But from there I could not figure out how to get the right plot. Maybe the transpose/melt is not even necessary.
Can anyone give me a hint on how to achieve the desired plot?

Make sure to reshape your DataFrame first so that the two levels of the x-axis become a MultiIndex on the rows.
df = df.T.stack().unstack(level=1)
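To see what this reshape produces, here is a small sketch on a subset of the question's data (two cases, two sub-columns, two rows; the subset and the name `reshaped` are chosen only for illustration). The case and the row label end up together in the row index, and A/B stay as columns:

```python
import pandas as pd

# A small subset of the question's data: two cases, two sub-columns, two rows.
data = {
    ('Case1', 'A'): {'One': 0.96396415, 'Two': 0.832049574},
    ('Case1', 'B'): {'One': 0.257496625, 'Two': 0.984418254},
    ('Case2', 'A'): {'One': 0.736783294, 'Two': 0.460765675},
    ('Case2', 'B'): {'One': 0.921473211, 'Two': 0.274749932},
}
df = pd.DataFrame(data)

# Move the row labels into the index next to the case, keep A/B as columns.
reshaped = df.T.stack().unstack(level=1)

print(reshaped.index.nlevels)   # number of index levels
print(list(reshaped.columns))   # remaining column labels
print(reshaped.xs('Case1'))     # the block that a single subplot would draw
```

Selecting one case with `xs` gives a plain frame with one row per label and one column per sub-column, which is the shape `plot.bar` expects.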
Then make the plot this way (highly inspired by @gyx-hh's answer):
import matplotlib.pyplot as plt

def plot_function(x, ax):
    # Draw the bars of one case on its own axes and label the axes with the case name.
    df.xs(x).plot.bar(ax=ax, legend=False, rot=0)
    ax.set_xlabel(x)
    ax.tick_params(axis="both", which="both", length=0)
    return ax

fig, axes = plt.subplots(nrows=1, ncols=len(df.index.levels[0]),
                         sharey=True, figsize=(16, 4))
# Map each case (first index level) to its subplot.
graph = dict(zip(df.index.levels[0], axes))
plots = [plot_function(x, ax) for x, ax in graph.items()]
fig.subplots_adjust(wspace=0)
plt.legend()
plt.show()
Output :

Given your data, if it is in Excel, one would read it like this:
import pandas as pd
df = pd.read_excel('values.xlsx', index_col=0, header=[0,1])
To plot something very similar to what you would like:
long_df = df.T.stack().reset_index()
long_df.columns = ['cases', 'subcases', 'observations', 'value']
transformed = long_df.pivot_table(
    index=['cases', 'observations'],
    columns='subcases',
    values='value',
    sort=False)
transformed.plot(kind='bar')
How it works:
long_df is the DataFrame normalized to a 'long' format using the stack function.
pivot_table then pivots the data into the orientation needed for simple plotting.
The .plot method with kind='bar' draws the bar chart through matplotlib.

Related

How to remove subset dictionaries from list of dictionaries

I have a list of dictionaries, and some of them are subsets:
l = [
{'zero': 'zero', 'one': 'example', 'two': 'second'},
{'zero': 'zero', 'one': 'example', 'two': 'second', 'three': 'blabla'},
{'zero': 'zero'},
{'zero': 'non-zero', 'one': 'example'}, ...
]
I want to create a new list that leaves out every dictionary that is a subset of another dictionary in the list:
res = [
{'zero': 'zero', 'one': 'example', 'two': 'second', 'three': 'blabla'},
{'zero': 'non-zero', 'one': 'example'}, ...
]
This workaround creates a new list that contains only the dictionaries that are not subsets of any other dictionary in the list:
res = [
    d for d in l
    if not any(set(d.items()).issubset(set(other.items()))
               for other in l if other != d)
]
print(res)
Output:
[{'zero': 'zero', 'one': 'example', 'two': 'second', 'three': 'blabla'},
{'zero': 'non-zero', 'one': 'example'}]
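The same filter could also be written with dictionary item views, which support subset comparison with `<=` directly, so no intermediate sets are built. This is only a sketch; `drop_subset_dicts` is a hypothetical helper name, not something from the question:

```python
def drop_subset_dicts(dicts):
    """Keep only the dicts whose items are not all contained in a different dict."""
    return [
        d for d in dicts
        # d.items() <= other.items() is True when every (key, value) pair of d is in other.
        if not any(d != other and d.items() <= other.items() for other in dicts)
    ]

l = [
    {'zero': 'zero', 'one': 'example', 'two': 'second'},
    {'zero': 'zero', 'one': 'example', 'two': 'second', 'three': 'blabla'},
    {'zero': 'zero'},
    {'zero': 'non-zero', 'one': 'example'},
]
res = drop_subset_dicts(l)
print(res)
```

As in the list comprehension above, two dictionaries that are exactly equal do not count as subsets of each other, so exact duplicates are both kept.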

Forming a nested list and calculating the sum of fields

The records are stored in the database in this form:
clu  name   a   b
q    one    7   6
q    two    6   9
q    five   6   10
e    three  8   7
e    four   10  4
I am writing a function that calculates a combined value for each clu value (for example, q and e). First it determines which name values belong to e (these are three and four). Then it looks them up in the dictionary "e_dict", which stores certain values for each name. For group e, the function should combine the e_dict entries of the names three and four and return the result.
Example of a dictionary e_dict:
{'one': {'u_mean': 4.25, 'c_mean': 4.25}, 'three': {'u_mean': 4.5, 'c_mean': 4.5}, 'two': {'u_mean': 4.583333333333334, 'c_mean': 4.583333333333334}, 'four': {'u_mean': 4.5625, 'c_mean': 4.5625}, 'five': {'u_mean': 4.65, 'c_mean': 4.65}}
The result should be like this:
{'e': {'u_mean': 4.531, 'c_mean': 4.531}, 'q': {'u_mean': 4.49443, 'c_mean': 4.49443}}
That is, for each group the u_mean values of its names are collected and averaged, and the same is done for c_mean.
The full code of my function:
from statistics import mean  # assumed source of mean(); the question does not show its imports

def group_names():
    st, c_clus, n_names = [], [], []
    for h in Utilizations.objects.values('clu', 'name', 'a', 'b'):
        st.append((h.get('clu'), h.get('name'), h.get('a'), h.get('b')))
        c_clus.append(h.get('clu'))
        n_names.append(h.get('name'))
    # getting the names
    names, clus = [], []
    for nam in n_names:
        if nam not in names:
            names.append(nam)
    for cl in c_clus:
        if cl not in clus:
            clus.append(cl)
    clu, e = {}, {}
    u_load, u_max = {}, {}
    mean_all, u_load_mean, u_max_mean = 0, 0, 0
    for nam in names:
        hs = Utilizations.objects.filter(name=nam)
        o, p = 0, 0
        for h in hs:
            o += h.a
            p += h.b
        u_load[nam] = o / 2 + 1
        u_max[nam] = p / 2 + 1
        u_max_mean = mean(u_max.values())
        u_load_mean = mean(u_load.values())
        mean_all = (u_max_mean + u_load_mean) / 2
        e[nam] = {'u_mean': mean_all, 'c_mean': mean_all}
    for cl in clus:
        for nam in names:
            s = Utilizations.objects.filter(name=nam, clu=cl)
            # iterate over s (the rows for this name and clu), not the stale hs
            for h in s:
                clu[nam] = cl
    return clu
This groups the names in this form: {'one': 'q', 'two': 'q', 'five': 'q', 'three': 'e', 'four': 'e'}
And I don't know what to do next.
I don't know how your original data is stored (I don't recognize "Utilizations.objects.values"), but here's code that will compute those averages based on a simple list of lists:
data = [
    ['q','one',7,6],
    ['q','two',7,6],
    ['q','five',7,6],
    ['e','three',7,6],
    ['e','four',7,6]
]
e_dict = {
    'one': {'u_mean': 4.25, 'c_mean': 4.25},
    'three': {'u_mean': 4.5, 'c_mean': 4.5},
    'two': {'u_mean': 4.583333333333334, 'c_mean': 4.583333333333334},
    'four': {'u_mean': 4.5625, 'c_mean': 4.5625},
    'five': {'u_mean': 4.65, 'c_mean': 4.65}
}
def group_names():
    sums = {}
    counts = {}
    for h in data:
        if h[0] not in sums:
            sums[h[0]] = { "u_mean": 0, "c_mean": 0 }
            counts[h[0]] = 0
        for k,v in e_dict[h[1]].items():
            sums[h[0]][k] += v
        counts[h[0]] += 1
    for k,v in sums.items():
        sums[k]['u_mean'] /= counts[k]
        sums[k]['c_mean'] /= counts[k]
    return sums
print(group_names())
Output:
{'q': {'u_mean': 4.4944444444444445, 'c_mean': 4.4944444444444445}, 'e': {'u_mean': 4.53125, 'c_mean': 4.53125}}
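If you would rather start from the name-to-clu mapping that your function already returns, the same averages could be computed from it directly. A minimal sketch, assuming that mapping and e_dict are both available as plain dictionaries; `average_by_group` is a hypothetical helper name:

```python
from collections import defaultdict
from statistics import mean

# The mapping produced by the question's group_names().
name_to_clu = {'one': 'q', 'two': 'q', 'five': 'q', 'three': 'e', 'four': 'e'}
e_dict = {
    'one': {'u_mean': 4.25, 'c_mean': 4.25},
    'three': {'u_mean': 4.5, 'c_mean': 4.5},
    'two': {'u_mean': 4.583333333333334, 'c_mean': 4.583333333333334},
    'four': {'u_mean': 4.5625, 'c_mean': 4.5625},
    'five': {'u_mean': 4.65, 'c_mean': 4.65},
}

def average_by_group(name_to_clu, e_dict):
    # Collect every field value per group, then average each list.
    grouped = defaultdict(lambda: defaultdict(list))
    for name, clu in name_to_clu.items():
        for field, value in e_dict[name].items():
            grouped[clu][field].append(value)
    return {
        clu: {field: mean(values) for field, values in fields.items()}
        for clu, fields in grouped.items()
    }

result = average_by_group(name_to_clu, e_dict)
print(result)
```

Collecting the values into lists first keeps the counting implicit, so there is no separate counts dictionary to keep in sync.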
You can use pandas for this:
Input:
data = {'clu': {0: 'q', 1: 'q', 2: 'q', 3: 'e', 4: 'e'}, 'name': {0: 'one', 1: 'two', 2: 'five', 3: 'three', 4: 'four'}, 'a': {0: 7, 1: 6, 2: 6, 3: 8, 4: 10}, 'b': {0: 6, 1: 9, 2: 10, 3: 7, 4: 4}}
e_dict = {'one': {'u_mean': 4.25, 'c_mean': 4.25}, 'three': {'u_mean': 4.5, 'c_mean': 4.5}, 'two': {'u_mean': 4.583333333333334, 'c_mean': 4.583333333333334}, 'four': {'u_mean': 4.5625, 'c_mean': 4.5625}, 'five': {'u_mean': 4.65, 'c_mean': 4.65}}
Code:
import pandas as pd
df = pd.DataFrame(data)
df_e = pd.DataFrame(e_dict).T.rename_axis('name').reset_index()
df = df.merge(df_e, on=['name'])
df = df.groupby(['clu']).agg({'u_mean':'mean', 'c_mean':'mean'})
df.to_dict(orient='index')
Output:
{'e': {'u_mean': 4.53125, 'c_mean': 4.53125},
'q': {'u_mean': 4.4944444444444445, 'c_mean': 4.4944444444444445}}
Explanation:
Pandas lets you handle data in a tabular format, similar to the data you showed in the example.
The first table (DataFrame) is df; the second one is the table containing the lookup values (e_dict), slightly preprocessed (transposed, with the index turned into a name column).
Then we merge both tables on their name values, so that the first table has the corresponding values of u_mean and c_mean.
Now we group the rows by their clu values and aggregate each group with the mean.
Finally we return the table as a dictionary.

multiple pie chart dimensions error in python

I have the dictionary in python:
dict = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613}, 2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590}, 3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}, 4: {'A': 1149, 'C': 24552, 'T': 7000, 'G': 5217}, 5: {'A': 27373, 'C': 3166, 'T': 4494, 'G': 2885}, 6: {'A': 19300, 'C': 4252, 'T': 7510, 'G': 6856}, 7: {'A': 17744, 'C': 5390, 'T': 7472, 'G': 7312}}
This dictionary has 7 sub-dictionaries and every sub-dictionary has 4 items. I am trying to make 7 pie charts in the same figure (multiple plots), where every pie chart has 4 sections. To plot the data I am using the following function.
def plot(array):
    array = np.array([list(val.values()) for val in dict.values()])
    df = pd.DataFrame(array, index=['a', 'b', 'c', 'd'], columns=['x', 'y','z','w', 'd', 't', 'u'])
    plt.style.use('ggplot')
    colors = plt.rcParams['axes.color_cycle']
    fig, axes = plt.subplots(1,4, figsize=(10,5))
    for ax, col in zip(axes, df.columns):
        ax.pie(df[col], labels=df.index, autopct='%.2f', colors=colors)
        ax.set(ylabel='', title=col, aspect='equal')
    axes[0].legend(bbox_to_anchor=(0, 0.5))
    fig.savefig('plot.pdf')
    plt.show()
But this function returns a figure with 4 pie charts, each with 7 sections. If I swap "index" and "columns", I get the following error:
ValueError: Shape of passed values is (4, 7), indices imply (7, 4)
Do you know how I can fix it? Here is the figure that I get, which is NOT correct.
There are two issues:
You want 7 subplots but you were only creating 4 using plt.subplots(1,4). You should use (1,7) to get 7 subplots.
You need to orient your data accordingly. The array built from the dictionary has shape (7, 4), one row per sub-dictionary. Since you need 7 pie charts, each with 4 entries, transpose it to shape (4, 7) so that each column holds one pie. A plain reshape((4, 7)) gives the same shape but mixes values from different sub-dictionaries.
P.S.: I am using matplotlib 2.2.2, where 'axes.color_cycle' is deprecated, so the colors are read from 'axes.prop_cycle' instead.
Below is your modified plot function.
def plot():
    # Rows of the array are the 7 sub-dictionaries; transpose so each column is one pie.
    array = np.array([list(val.values()) for val in dict.values()]).T
    df = pd.DataFrame(array, index=['a', 'b', 'c', 'd'], columns=['x', 'y','z','w', 'd', 't', 'u'])
    plt.style.use('ggplot')
    colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
    fig, axes = plt.subplots(1,7, figsize=(12,8))
    for ax, col in zip(axes, df.columns):
        ax.pie(df[col], labels=df.index, autopct='%.2f', colors=colors)
        ax.set(ylabel='', title=col, aspect='equal')
    axes[0].legend(bbox_to_anchor=(0, 0.5))
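One thing worth checking when changing the orientation of the array: transposing and reshaping to the target shape are not interchangeable. Both give the right shape, but only the transpose keeps the four counts of one sub-dictionary together in one column. A small check, using just the first three sub-dictionaries from the question to keep it short (the variable names are for illustration only):

```python
import numpy as np

# First three sub-dictionaries from the question.
counts = {
    1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
    2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
    3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150},
}

rows = np.array([list(val.values()) for val in counts.values()])  # shape (3, 4)

transposed = rows.T            # shape (4, 3): column j holds sub-dictionary j + 1
reshaped = rows.reshape(4, 3)  # shape (4, 3) as well, but values are re-chunked

print(transposed[:, 0].tolist())  # [11472, 8405, 11428, 6613], the counts of key 1
print(reshaped[:, 0].tolist())    # [11472, 6613, 10262, 25843], a mix of keys 1, 2 and 3
```

With the transpose, the first pie shows exactly the A, C, T and G counts of key 1.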

how to create a dictionary of list of dictionary

I need to create a dictionary of lists of dictionaries, or find any other way of meeting the requirement below.
I have a set of keys, let's say
keys = [1,2,3,4,5] (don't treat this as a list; I am just showing that I have, say, 5 keys)
For each key I will have a set of key-value pairs, so something like below:
d = {
    1:{
        {'one': 'test', 'two': 'new', 'three': 'dummy'}
        {'one': 'testtest', 'two': 'newnew', 'three': 'dummyextra'}
        {'one': 'newvalue', 'two': 'newvalue2', 'three': 'newvalue4'}
    }
    2:{
        {'one': 'test1', 'two': 'new1', 'three': 'dummy1'}
        {'one': 'testtest2', 'two': 'newnew2', 'three': 'dummyextra2'}
        {'one': 'newvalue3', 'two': 'newvalue23', 'three': 'newvalue43'}
    }
    1:{
        {'one': 'test', 'two': 'new', 'three': 'dummy'}
        {'one': 'testtest', 'two': 'newnew', 'three': 'dummyextra'}
        {'one': 'newvalue', 'two': 'newvalue2', 'three': 'newvalue4'}
    }
}
All the inner and outer dictionaries will be built in loops.
If the above is not possible because keys must be unique, what would be an alternative way to hold the data in the above format (a list of dictionaries, a dictionary of lists, or anything else)?
My main goal is this: I will have a unique tag that is the key of the outer dictionary, and using that tag I will create one HTML header.
Under that header I will populate the data, i.e. multiple links, and that data has to come from the inner dictionaries in the example.
So in this example I have an HTML page with header title 1, and under this header I will have 3 links that come from the inner dictionaries.
Then I will have header 2 with another 3 links under it, and so on.
Kindly help me to achieve this.
Thanks,
You just have to represent lists with [] not {} and don't forget the commas (,) to separate the elements:
d = {
    1:[
        {'one': 'test', 'two': 'new', 'three': 'dummy'},
        {'one': 'testtest', 'two': 'newnew', 'three': 'dummyextra'},
        {'one': 'newvalue', 'two': 'newvalue2', 'three': 'newvalue4'}
    ],
    2:[
        {'one': 'test1', 'two': 'new1', 'three': 'dummy1'},
        {'one': 'testtest2', 'two': 'newnew2', 'three': 'dummyextra2'},
        {'one': 'newvalue3', 'two': 'newvalue23', 'three': 'newvalue43'}
    ],
    3:[
        {'one': 'test', 'two': 'new', 'three': 'dummy'},
        {'one': 'testtest', 'two': 'newnew', 'three': 'dummyextra'},
        {'one': 'newvalue', 'two': 'newvalue2', 'three': 'newvalue4'}
    ]
}
I tried to guess something similar to your input (dlist), just to show how to build a dict that uses a list as the default value and appends data to it:
dlist = [[2, {'two': 'two1'}], [1, {'one': 'one1'}], [1, {'one': 'one2'}], [2, {'two': 'two2'}] ]
res = {}
for item in dlist:
    res.setdefault(item[0], list()).append(item[1])
print(res)
#=> {1: [{'one': 'one1'}, {'one': 'one2'}], 2: [{'two': 'two1'}, {'two': 'two2'}]}
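For the HTML page itself, one possible sketch: group the links per tag with collections.defaultdict, then emit one header per tag followed by its links. The input rows, the inner keys 'text' and 'url', and the `render` helper are all assumptions made for illustration; the question does not say what the inner dictionaries hold.

```python
from collections import defaultdict
from html import escape

# Hypothetical input: (tag, link) pairs as they might come out of your loops.
rows = [
    (1, {"text": "first", "url": "https://example.com/a"}),
    (2, {"text": "second", "url": "https://example.com/b"}),
    (1, {"text": "third", "url": "https://example.com/c"}),
]

# Dictionary of lists: every tag maps to the list of its link dictionaries.
links_by_tag = defaultdict(list)
for tag, link in rows:
    links_by_tag[tag].append(link)

def render(links_by_tag):
    """Build one <h2> header per tag, followed by a list of its links."""
    parts = []
    for tag, links in links_by_tag.items():
        parts.append(f"<h2>{escape(str(tag))}</h2>")
        parts.append("<ul>")
        for link in links:
            href = escape(link["url"], quote=True)
            parts.append(f'  <li><a href="{href}">{escape(link["text"])}</a></li>')
        parts.append("</ul>")
    return "\n".join(parts)

html_page = render(links_by_tag)
print(html_page)
```

defaultdict(list) does the same job as setdefault above; which one to use is mostly a matter of taste.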

itertools.izip() for not predefined count of lists

I have the following data structure: {'one':['a','b','c'],'two':['q','w','e'],'three':['t','u','y'],...}. The dictionary has a variable number of keys, and every list in it has the same length. How can I convert this structure to the following: [{'one':'a','two':'q','three':'t'},{'one':'b','two':'w','three':'u'},...]?
I think I should use itertools.izip(), but how can I apply it when the number of arguments is not known in advance? Maybe something like this: itertools.izip([data[l] for l in data.keys()])?
TIA!
Not terribly elegant, but does the trick:
In [9]: [{k:v[i] for (k,v) in d.items()} for i in range(len(d.values()[0]))]
Out[9]:
[{'one': 'a', 'three': 't', 'two': 'q'},
{'one': 'b', 'three': 'u', 'two': 'w'},
{'one': 'c', 'three': 'y', 'two': 'e'}]
I can't help thinking that there's got to be a better way to phrase the i loop, but nothing comes to mind right now.
Alternatively:
In [50]: map(dict, zip(*[[(k, v) for v in l] for k, l in d.items()]))
Out[50]:
[{'one': 'a', 'three': 't', 'two': 'q'},
{'one': 'b', 'three': 'u', 'two': 'w'},
{'one': 'c', 'three': 'y', 'two': 'e'}]
Not sure if this is much of an improvement on the readability front though.
Your idea of using izip is correct, but the way you call it is not quite right.
You first need to
get the items as a list of (key, value) tuples (using the iteritems() method on Py2.x or items() on Py3.x)
create the Cartesian product of each key with its values (using itertools.product)
flatten the list (using itertools.chain)
zip it (using itertools.izip)
and then create a dict from each element
Here is the sample code:
>>> from itertools import chain, izip, product
>>> from pprint import PrettyPrinter
>>> pp = PrettyPrinter(indent = 4)
>>> pp.pprint(map(dict, izip(*chain((product([k], v) for k, v in data.items())))))
[ { 'one': 'a', 'three': 't', 'two': 'q'},
{ 'one': 'b', 'three': 'u', 'two': 'w'},
{ 'one': 'c', 'three': 'y', 'two': 'e'}]
>>>
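On Python 3, itertools.izip no longer exists because the built-in zip is already lazy. There the whole conversion can be sketched by unpacking the dictionary's value lists into zip, which handles any number of keys (the variable names below are only for illustration):

```python
data = {'one': ['a', 'b', 'c'], 'two': ['q', 'w', 'e'], 'three': ['t', 'u', 'y']}

# zip(*data.values()) yields one tuple per position across all lists;
# zipping it with the keys rebuilds one dict per position.
rows = [dict(zip(data, column)) for column in zip(*data.values())]
print(rows)
```

This relies on a dictionary iterating its keys and its values in the same order, which Python guarantees as long as the dictionary is not modified in between.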
