multiple pie chart dimensions error in python - python

I have the following dictionary in Python:
dict = {1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613}, 2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590}, 3: {'A': 2945, 'C': 25843, 'T': 6980, 'G': 2150}, 4: {'A': 1149, 'C': 24552, 'T': 7000, 'G': 5217}, 5: {'A': 27373, 'C': 3166, 'T': 4494, 'G': 2885}, 6: {'A': 19300, 'C': 4252, 'T': 7510, 'G': 6856}, 7: {'A': 17744, 'C': 5390, 'T': 7472, 'G': 7312}}
This dictionary has 7 sub-dictionaries, and every sub-dictionary has 4 items. I am trying to make 7 pie charts in the same figure (multiple plots), and every pie chart should have 4 sections. To plot the data I am using the following function:
def plot(array):
    array = np.array([list(val.values()) for val in dict.values()])
    df = pd.DataFrame(array, index=['a', 'b', 'c', 'd'], columns=['x', 'y','z','w', 'd', 't', 'u'])
    plt.style.use('ggplot')
    colors = plt.rcParams['axes.color_cycle']
    fig, axes = plt.subplots(1,4, figsize=(10,5))
    for ax, col in zip(axes, df.columns):
        ax.pie(df[col], labels=df.index, autopct='%.2f', colors=colors)
        ax.set(ylabel='', title=col, aspect='equal')
    axes[0].legend(bbox_to_anchor=(0, 0.5))
    fig.savefig('plot.pdf')
    plt.show()
But this function returns a figure with 4 pie charts, and every pie chart has 7 sections. If I swap "index" and "columns", I get the following error:
ValueError: Shape of passed values is (4, 7), indices imply (7, 4)
Do you know how I can fix it? Here is the figure that I get, which is NOT correct.

There are two issues:
You want 7 subplots but you were only creating 4 using plt.subplots(1,4). You should use (1,7) to get 7 subplots.
You need to arrange your data accordingly. Since you need 7 pie charts, each with 4 entries, every column of the DataFrame must hold one sub-dictionary. The array built from the dictionary has shape (7, 4), so transpose it to (4, 7). Note that reshape((4, 7)) would also give that shape, but it only re-chunks the flat data and mixes values from different sub-dictionaries.
P.S.: I am using matplotlib 2.2.2, where 'axes.color_cycle' is deprecated, so the code below reads the colors from 'axes.prop_cycle' instead.
Below is your modified plot function.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot():
    # Rows of the raw array are sub-dictionaries; transpose so each column is one pie.
    array = np.array([list(val.values()) for val in dict.values()]).T
    df = pd.DataFrame(array, index=['a', 'b', 'c', 'd'], columns=['x', 'y','z','w', 'd', 't', 'u'])
    plt.style.use('ggplot')
    colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
    fig, axes = plt.subplots(1,7, figsize=(12,8))
    for ax, col in zip(axes, df.columns):
        ax.pie(df[col], labels=df.index, autopct='%.2f', colors=colors)
        ax.set(ylabel='', title=col, aspect='equal')
    axes[0].legend(bbox_to_anchor=(0, 0.5))
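As a possible shortcut, pandas can build the table directly from the nested dictionary, because the outer keys become columns and the inner keys become the index. The sketch below uses a shortened, hypothetical two-position dictionary named `counts` (not the full data from the question) and also compares a transpose with a plain reshape of the row-per-position array.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the nested dictionary in the question (assumption:
# only two positions are shown to keep the example small).
counts = {
    1: {'A': 11472, 'C': 8405, 'T': 11428, 'G': 6613},
    2: {'A': 11678, 'C': 9388, 'T': 10262, 'G': 6590},
}

# Outer keys become columns, inner keys become the index.
df = pd.DataFrame(counts)
print(df.shape)  # (4, 2): four slices for each of two pies

# Row-per-position array, as built in the question.
array = np.array([list(val.values()) for val in counts.values()])

transposed = array.T            # keeps each sub-dictionary together
reshaped = array.reshape(4, 2)  # same shape, but values are re-chunked

print(transposed[:, 0].tolist())  # [11472, 8405, 11428, 6613]
print(reshaped[:, 0].tolist())    # [11472, 11428, 11678, 10262]
```

Each column of `df` can then be passed to `ax.pie` with `labels=df.index`, so the slice labels come from the data rather than from a hand-written list.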

Related

How do I get the total sum of several columns based on a condition in Python

I have been trying to get the total value of all columns based on an argument, but it didn't work out.
import numpy as np
import pandas as pd
np.random.seed(100)
NO= pd.DataFrame({'TR':'NO', 'A': np.random.randint(1, 10,3), 'B': np.random.randint(10, 20,3), 'C': np.random.randint(25, 35,3)})
YS= pd.DataFrame({'TR':'YS', 'A': np.random.randint(1, 10,3), 'B': np.random.randint(10, 20,3), 'C': np.random.randint(25, 35,3)})
frames = (NO, YS)
df = pd.concat(frames)
Total=df.loc[df['TR'] == 'NO', ['A', 'B', 'C']].sum()
The total should be a single value: 152
The first .sum() returns one total per column (a Series), so you have to sum twice to reduce the result to a single value:
>>> df.loc[df['TR'] == 'NO', ['A', 'B', 'C']].sum().sum()
152
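An alternative sketch, assuming a small hand-written frame rather than the random data above: converting the selection to a NumPy array lets a single `sum()` reduce over all cells at once. The frame and its values here are made up for illustration.

```python
import pandas as pd

# Hypothetical data (not the random values from the question).
df = pd.DataFrame({
    'TR': ['NO', 'NO', 'YS'],
    'A': [1, 2, 3],
    'B': [10, 20, 30],
    'C': [100, 200, 300],
})

selected = df.loc[df['TR'] == 'NO', ['A', 'B', 'C']]

per_column = selected.sum()              # Series: one total per column
total_twice = selected.sum().sum()       # scalar via two reductions
total_numpy = selected.to_numpy().sum()  # scalar via one reduction over all cells

print(total_twice, total_numpy)  # 333 333
```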

TypeError: no numeric data to plot after creating a dataframe from a list of dictionaries

I'm trying to plot a dataset contained in a dictionary:
my_dict = [{'A': [0.7315847607219574],
'B': [0.5681159420289855],
'C': [0.9999999999999997],
'D': [0.5793801642856945],
'E': [0.6867350732769776],
'F': [0.7336804366512104]},
{'A': [0.4758837897858464],
'B': [0.4219886317147244],
'C': [0.6206223617183635],
'D': [0.3911170612926995],
'E': [0.5159829508133175],
'F': [0.479838956092881]},
{'A': [0.7315847607219574],
'B': [0.5681159420289855],
'C': [0.9999999999999997],
'D': [0.5793801642856945],
'E': [0.6867350732769776],
'F': [0.7336804366512104]}]
then
df = pd.DataFrame(my_dict)
df.plot(kind="barh")
plt.show()
df.dtypes shows object for every column, and plotting raises TypeError: no numeric data to plot
I've exhausted most of my brain cells trying to figure this out, but to no avail. All help will be appreciated.
Each value is a one-element list, so pandas stores the columns as object dtype. Extracting the number from each list does the job:
import pandas as pd
import matplotlib.pyplot as plt
my_dict = [{'A': [0.7315847607219574],
'B': [0.5681159420289855],
'C': [0.9999999999999997],
'D': [0.5793801642856945],
'E': [0.6867350732769776],
'F': [0.7336804366512104]},
{'A': [0.4758837897858464],
'B': [0.4219886317147244],
'C': [0.6206223617183635],
'D': [0.3911170612926995],
'E': [0.5159829508133175],
'F': [0.479838956092881]},
{'A': [0.7315847607219574],
'B': [0.5681159420289855],
'C': [0.9999999999999997],
'D': [0.5793801642856945],
'E': [0.6867350732769776],
'F': [0.7336804366512104]}]
for entry in my_dict:
    for k,v in entry.items():
        entry[k] = v[0]
df = pd.DataFrame(my_dict)
df.plot(kind="barh")
plt.show()
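A non-mutating variant, as a minimal sketch: build a new list of flat records with a comprehension instead of overwriting the input in place. The names `records` and `flat_records` and the shortened data are illustrative assumptions, and the sketch assumes every value is a non-empty list.

```python
# Shortened stand-in for the list of dictionaries in the question.
records = [
    {'A': [0.73], 'B': [0.57]},
    {'A': [0.48], 'B': [0.42]},
]

# Take the first element of each one-item list; the input is left untouched.
flat_records = [
    {key: values[0] for key, values in record.items()}
    for record in records
]

print(flat_records)  # [{'A': 0.73, 'B': 0.57}, {'A': 0.48, 'B': 0.42}]
```

Passing `flat_records` to `pd.DataFrame` should then give float columns that `df.plot` accepts.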

Plotting a double barplot in Python with values coming from two different dictionaries

I have two dictionaries of alphabets with values as the frequency of a character. When I try to plot them with my code below, I get the following error:
ValueError: shape mismatch: objects cannot be broadcast to a single shape
import string
import matplotlib.pyplot as plt
import numpy as np
labels = list(string.ascii_uppercase)
N = len(labels)
dict_1 = {'A': 0.08167, 'B': 0.01492, 'C': 0.02782, 'D': 0.04253, 'E': 0.12702, 'F': 0.02228, 'G': 0.02015, 'H': 0.06094, 'I': 0.06966, 'J': 0.00153, 'K': 0.00772, 'L': 0.04025, 'M': 0.02406, 'N': 0.06749, 'O': 0.07507, 'P': 0.01929, 'Q': 0.00095, 'R': 0.05987, 'S': 0.06327, 'T': 0.09056, 'U': 0.02758, 'V': 0.00978, 'W': 0.0236, 'X': 0.0015, 'Y': 0.01974, 'Z': 0.00074}
dict_2 = {'P': 0.05776173285198556, 'U': 0.05776173285198556, 'A': 0.09025270758122744, 'O': 0.05415162454873646, 'L': 0.1263537906137184, 'M': 0.02888086642599278, 'V': 0.06859205776173286, 'S': 0.061371841155234655, 'D': 0.02888086642599278, 'N': 0.032490974729241874, 'J': 0.04332129963898917, 'I': 0.021660649819494584, 'F': 0.036101083032490974, 'B': 0.032490974729241874, 'H': 0.08664259927797834, 'C': 0.007220216606498195, 'W': 0.032490974729241874, 'Y': 0.05415162454873646, 'X': 0.007220216606498195, 'Z': 0.05054151624548736, 'R': 0.007220216606498195, 'E': 0.0036101083032490976, 'K': 0.0036101083032490976, 'T': 0.007220216606498195}
X = np.arange(len(dict_1))
bar_width = 0.45
fig = plt.figure( figsize=(17,5) )
ax = plt.subplot(111)
ax.bar(X, dict_1.values(), bar_width, color='blue', align='center', hatch='//')
ax.bar(X-bar_width, dict_2.values(), bar_width, color='green', align='center', hatch='//')
ax.legend(('Usual English','Your Text'), fontsize = 15)
plt.xticks(X-(bar_width/2), dict_1.keys())
plt.xlabel('Character', fontsize = 15)
plt.ylabel('Frequency', fontsize = 15)
plt.title("Frequency Analysis", fontsize=17)
plt.show()
I first built the code using the same dictionary for both sets of bars and assumed it would also work with a second, different dictionary. When I use the same dictionary for both, I get the following plot:
I want the plot to show two bars with different values (corresponding to the value from the dictionary it came from) for each character.
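The error occurs because dict_2 has only 24 keys (it has no 'G' or 'Q'), while X has 26 positions. You need to get the values from each dict based on the merged keys, filling in 0 where a key is missing. Like this: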
>>> d1 = {'A': 10, 'C': 20, 'B': 42}
>>> d2 = {'X': 13, 'B': 21}
>>>
>>> d1.keys()
dict_keys(['A', 'C', 'B'])
>>> d2.keys()
dict_keys(['X', 'B'])
>>> all_keys = sorted(d1.keys() | d2.keys())
>>> all_keys
['A', 'B', 'C', 'X']
>>>
>>> d1_values = [d1.get(k, 0) for k in all_keys]
>>> d1_values
[10, 42, 20, 0]
>>>
>>> d2_values = [d2.get(k, 0) for k in all_keys]
>>> d2_values
[0, 21, 0, 13]
Make the plot from all_keys, d1_values and d2_values.
import matplotlib.pyplot as plt
import numpy as np
dict_1 = {'A': 0.08167, 'B': 0.01492, 'C': 0.02782, 'D': 0.04253, 'E': 0.12702, 'F': 0.02228, 'G': 0.02015, 'H': 0.06094, 'I': 0.06966, 'J': 0.00153, 'K': 0.00772, 'L': 0.04025, 'M': 0.02406, 'N': 0.06749, 'O': 0.07507, 'P': 0.01929, 'Q': 0.00095, 'R': 0.05987, 'S': 0.06327, 'T': 0.09056, 'U': 0.02758, 'V': 0.00978, 'W': 0.0236, 'X': 0.0015, 'Y': 0.01974, 'Z': 0.00074}
dict_2 = {'P': 0.05776173285198556, 'U': 0.05776173285198556, 'A': 0.09025270758122744, 'O': 0.05415162454873646, 'L': 0.1263537906137184, 'M': 0.02888086642599278, 'V': 0.06859205776173286, 'S': 0.061371841155234655, 'D': 0.02888086642599278, 'N': 0.032490974729241874, 'J': 0.04332129963898917, 'I': 0.021660649819494584, 'F': 0.036101083032490974, 'B': 0.032490974729241874, 'H': 0.08664259927797834, 'C': 0.007220216606498195, 'W': 0.032490974729241874, 'Y': 0.05415162454873646, 'X': 0.007220216606498195, 'Z': 0.05054151624548736, 'R': 0.007220216606498195, 'E': 0.0036101083032490976, 'K': 0.0036101083032490976, 'T': 0.007220216606498195}
all_keys = sorted(dict_1.keys() | dict_2.keys())
d1_values = [dict_1.get(k, 0) for k in all_keys]
d2_values = [dict_2.get(k, 0) for k in all_keys]
X = np.arange(len(all_keys))
bar_width = 0.45
fig = plt.figure( figsize=(17,5) )
ax = plt.subplot(111)
ax.bar(X, d1_values, bar_width, color='blue', align='center', hatch='//')
ax.bar(X-bar_width, d2_values, bar_width, color='green', align='center', hatch='//')
ax.legend(('Usual English','Your Text'), fontsize = 15)
plt.xticks(X-(bar_width/2), all_keys)
plt.xlabel('Character', fontsize = 15)
plt.ylabel('Frequency', fontsize = 15)
plt.title("Frequency Analysis", fontsize=17)
plt.show()
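The same key-merging step can be wrapped in a small helper that works for any number of dictionaries. This is only a sketch: `align_values` and its `fill` parameter are hypothetical names introduced here, not part of the answer above.

```python
def align_values(*dicts, fill=0):
    """Return the sorted union of keys and one value list per dict."""
    all_keys = sorted(set().union(*dicts))
    series = [[d.get(key, fill) for key in all_keys] for d in dicts]
    return all_keys, series


d1 = {'A': 10, 'C': 20, 'B': 42}
d2 = {'X': 13, 'B': 21}

keys, (d1_values, d2_values) = align_values(d1, d2)
print(keys)       # ['A', 'B', 'C', 'X']
print(d1_values)  # [10, 42, 20, 0]
print(d2_values)  # [0, 21, 0, 13]
```

Because every returned list has the same length as `keys`, each one can be passed to `ax.bar` against the same `np.arange(len(keys))` positions.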

Dictionary rearrangement and sorting

I need to count how many times each distinct value appears in the dict books, and print the counts sorted in descending order of occurrences.
books = {
123457889: 'A',
252435234: 'A',
434234341: 'B',
534524365: 'C',
354546589: 'D',
146546547: 'D',
353464543: 'F',
586746547: 'E',
511546547: 'F',
546546647: 'F',
541146127: 'F',
246546127: 'A',
434545127: 'B',
533346127: 'E',
544446127: 'F',
546446127: 'G',
155654627: 'G',
546567627: 'G',
145452437: 'H',
}
Output like this:
'F': 5,
'A': 3,
'G': 3,
'B': 2,
'D': 2,
'E': 2,
'C': 1,
'H': 1
I tried it:
import pprint
# to get the values from books
clist = [v for v in books.values()]
# values in books as keys in count,
count = {}
for c in clist:
    count.setdefault(c, 0)
    count[c] += 1
pprint.pprint(count)
But I couldn't get the dict sorted by count.
Your code works fine for counting. You can do this much more easily with Counter from the collections module. Simply pass books.values() to Counter:
from collections import Counter
counts = Counter(books.values())
print(counts)
Output:
Counter({'F': 5, 'A': 3, 'G': 3, 'E': 2, 'D': 2, 'B': 2, 'H': 1, 'C': 1})
To provide the layout of the output you are expecting in order of value, you can perform a simple iteration using the most_common method and print each line:
for char, value in counts.most_common():
    print("'{}': {}".format(char, value))
Output:
'F': 5
'G': 3
'A': 3
'E': 2
'D': 2
'B': 2
'C': 1
'H': 1
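If letters with equal counts should also appear alphabetically, as in the requested output, one option is to sort the counted items with a two-part key. This is a minimal sketch on a shortened, hypothetical `books` dictionary, not the full data from the question.

```python
from collections import Counter

# Shortened stand-in for the `books` dictionary in the question.
books = {1: 'B', 2: 'A', 3: 'F', 4: 'A', 5: 'F', 6: 'F', 7: 'B', 8: 'C'}

counts = Counter(books.values())

# Sort by descending count, then by letter for equal counts.
ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for char, value in ordered:
    print("'{}': {}".format(char, value))
# 'F': 3
# 'A': 2
# 'B': 2
# 'C': 1
```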

How to retrieve and store multiple values from a python Data Frame?

I have the following Dataframe that represents a From-To distance matrix between pairs of points. I have predetermined "trips" that visit specific pairs of points that I need to calculate the total distance for.
For example,
Trip 1 = [A:B] + [B:C] + [B:D] = 6 + 5 + 8 = 19
Trip 2 = [A:D] + [B:E] + [C:E] = 6 + 15 + 3 = 24
import pandas as pd
graph = {'A': {'A': 0, 'B': 6, 'C': 10, 'D': 6, 'E': 7},
'B': {'A': 10, 'B': 0, 'C': 5, 'D': 8, 'E': 15},
'C': {'A': 40, 'B': 30, 'C': 0, 'D': 9, 'E': 3}}
df = pd.DataFrame(graph).T
df.to_excel('file.xls')
I have many "trips" that I need to repeat this process for, and I then need to store each total in a row of a new DataFrame that I can export to Excel. I know I can use df.at['A', 'B'] to retrieve a specific value in the DataFrame, but how can I retrieve multiple values, sum them, store the result in a new DataFrame, and then repeat for the next trip?
Thank you in advance for any help or guidance.
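I think that if you don't transpose, an unstack will help: it turns the matrix into a Series indexed by (start, finish) pairs, so a trip is just a list of index tuples.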
import pandas as pd
graph = {'A': {'A': 0, 'B': 6, 'C': 10, 'D': 6, 'E': 7},
'B': {'A': 10, 'B': 0, 'C': 5, 'D': 8, 'E': 15},
'C': {'A': 40, 'B': 30, 'C': 0, 'D': 9, 'E': 3}}
df = pd.DataFrame(graph)
df = df.unstack()
df.index.names = ['start','finish']
# a list of tuples to represent the trip(s)
trip1 = [('A','B'),('B','C'),('B','D')]
trip2 = [('A','D'),('B','E'),('C','E')]
trips = [trip1,trip2]
my_trips = {}
for trip in trips:
    my_trips[str(trip)] = df.loc[trip].sum()
distance_df = pd.DataFrame(my_trips,index=['distance']).T
distance_df
distance
[('A', 'B'), ('B', 'C'), ('B', 'D')] 19
[('A', 'D'), ('B', 'E'), ('C', 'E')] 24
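For comparison, the same totals can be computed from the nested dictionary itself, without building a DataFrame first. This is a minimal sketch; the trip names used as keys are hypothetical labels added for readability.

```python
graph = {'A': {'A': 0, 'B': 6, 'C': 10, 'D': 6, 'E': 7},
         'B': {'A': 10, 'B': 0, 'C': 5, 'D': 8, 'E': 15},
         'C': {'A': 40, 'B': 30, 'C': 0, 'D': 9, 'E': 3}}

# Trip names are hypothetical labels; each leg is a (start, finish) pair.
trips = {
    'Trip 1': [('A', 'B'), ('B', 'C'), ('B', 'D')],
    'Trip 2': [('A', 'D'), ('B', 'E'), ('C', 'E')],
}

# Look up every leg in the nested dictionary and add the distances.
distances = {
    name: sum(graph[start][finish] for start, finish in legs)
    for name, legs in trips.items()
}
print(distances)  # {'Trip 1': 19, 'Trip 2': 24}
```

If a DataFrame is still needed for the Excel export, `pd.Series(distances, name='distance').to_frame()` would turn this dictionary into a one-column frame.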
