Plotting pandas dataframe with string labels


I have a pandas dataframe that has several fields. The ones of importance are:
In[191]: tasks[['start','end','appId','index']]
Out[189]:
start end appId index
2576 1464262540102.000 1464262541204.000 application_1464258584784_0012 1
2577 1464262540098.000 1464262541208.000 application_1464258584784_0012 0
2579 1464262540104.000 1464262541194.000 application_1464258584784_0012 3
2583 1464262540107.000 1464262541287.000 application_1464258584784_0012 6
2599 1464262540125.000 1464262541214.000 application_1464258584784_0012 26
2600 1464262541191.000 1464262541655.000 application_1464258584784_0012 28
.
.
.
2701 1464262562172.000 1464262591147.000 application_1464258584784_0013 14
2718 1464262578901.000 1464262588156.000 application_1464258584784_0013 28
2727 1464262591145.000 1464262602085.000 application_1464258584784_0013 40
I want to plot a line for each row that goes from (x1=start, y1=index) to (x2=end, y2=index). Each line should have a different color depending on the value of appId, which is a string. This is all done in a subplot I have inside a time series plot. I post the code here, but the important bit is the tasks.iterrows() part; you can ignore the rest.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm, colors

def plot_stage_in_host(dfm,dfg,appId,stageId,parameters,host):
    [s,e] = time_interval_for_app(dfm, appId,stageId, host)
    time_series = create_time_series_host(dfg, host, parameters, s,e)
    fig,p1 = plt.subplots()
    p2 = p1.twinx()
    for para in parameters:
        p1.plot(time_series.loc[time_series['parameter']==para].time,time_series.loc[time_series['parameter']==para].value,label=para)
    p1.legend()
    p1.set_xlabel("Time")
    p1.set_ylabel(ylabel='%')
    p1.set(ylim=(-1,1))
    p2.set_ylabel("TASK INDEX")
    tasks = dfm.loc[(dfm["hostname"]==host) & (dfm["start"]>s) & (dfm["end"]<e) & (dfm["end"]!=0)] #& (dfm["appId"]==appId) & (dfm["stageId"]==stageId)]
    apps = tasks.appId.unique()
    norm = colors.Normalize(0,len(apps))
    scalar_map = cm.ScalarMappable(norm=norm, cmap='hsv')
    # One horizontal line per task, colored by the position of its appId in apps
    for _,row in tasks.iterrows():
        color = scalar_map.to_rgba(np.where(apps == row['appId'])[0][0])
        p2.plot([row['start'],row['end']],[row['index'],row['index']],lw=4 ,c=color)
    p2.legend(apps,loc='lower right')
    plt.show()  # Axes objects have no show() method, so call it on pyplot
This is the result I get.
Apparently it is not considering the labels, and the legend shows the same color for all the lines. How can I label them correctly and show the legend as well?

The problem is that you are assigning the label each time you plot the graph in the for loop using the label= argument. Try removing it and passing p2.legend() a list of strings that represent the labels you want to show.
p2.legend(['label1', 'label2'])
If you want to assign a different color to each line try the following:
import matplotlib.pyplot as plt
import numpy as np
xdata = [1, 2, 3, 4, 5]
ydata = [[np.random.randint(0, 6) for i in range(5)],
         [np.random.randint(0, 6) for i in range(5)],
         [np.random.randint(0, 6) for i in range(5)]]
colors = ['r', 'g', 'b'] # can be hex colors as well
legend_names = ['a', 'b', 'c']
for c, y in zip(colors, ydata):
    plt.plot(xdata, y, c=c)
plt.legend(legend_names)
plt.show()
It gives the following result:
Hope this helps!
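When many lines share one label, as with one line per task but only a few appId values, one option is to hand legend() explicit proxy handles, one per label, instead of letting it pick the first plotted lines. The following is a minimal sketch of that idea. The small tasks table, the color_of dictionary and the tab10 colormap are assumptions made for illustration and are not taken from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Made-up stand-in for the tasks dataframe in the question
tasks = pd.DataFrame({
    "start": [0.0, 1.0, 2.0, 5.0, 6.0],
    "end":   [1.5, 2.5, 4.0, 7.0, 9.0],
    "appId": ["app_0012", "app_0012", "app_0012", "app_0013", "app_0013"],
    "index": [1, 0, 3, 14, 28],
})

# Map every distinct appId to one color
apps = list(tasks["appId"].unique())
cmap = plt.get_cmap("tab10")
color_of = {app: cmap(i) for i, app in enumerate(apps)}

fig, ax = plt.subplots()
# One horizontal segment per task, colored by its appId
for _, row in tasks.iterrows():
    ax.plot([row["start"], row["end"]], [row["index"], row["index"]],
            lw=4, color=color_of[row["appId"]])

# Proxy handles: one legend entry per appId, independent of the plotted lines
handles = [Line2D([0], [0], color=color_of[app], lw=4) for app in apps]
legend = ax.legend(handles, apps, loc="lower right")
ax.set_xlabel("Time")
ax.set_ylabel("TASK INDEX")
# Call plt.show() to display the figure interactively
```

Because the handles are built from the same color_of mapping used for plotting, the legend colors stay in sync with the lines regardless of the row order in the table.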

Related

How to create a plot with dynamic variables

Using the matplotlib library in Python, I would like to plot some graphs with dynamic y variables, i.e. variables which would change according to another variable stated before my plot functions.
From my imported data frame, I have extracted the concentrations (M**_conc) and fluxes (M**_flux) of different gases.
M33_conc = ec_top["M 33(ppbv)"]
M39_conc = ec_top["M 39(ncps)"]
M45_conc = ec_top["M 45(ppbv)"]
M59_conc = ec_top["M 59(ppbv)"]
M69_conc = ec_top["M 69(ppbv)"]
M71_conc = ec_top["M 71(ppbv)"]
M81_conc = ec_top["M 81(ppbv)"]
M137_conc = ec_top["M 137(ppbv)"]
M87_conc = ec_top["M 87(ppbv)"]
M47_conc = ec_top["M 47(ppbv)"]
M61_conc = ec_top["M 61(ppbv)"]
M33_flux = ec_top["Flux_M 33"]
M45_flux = ec_top["Flux_M 45"]
M59_flux = ec_top["Flux_M 59"]
M69_flux = ec_top["Flux_M 69"]
M71_flux = ec_top["Flux_M 71"]
M81_flux = ec_top["Flux_M 81"]
M137_flux = ec_top["Flux_M 137"]
M87_flux = ec_top["Flux_M 87"]
M47_flux = ec_top["Flux_M 47"]
M61_flux = ec_top["Flux_M 61"]
I want to be able to plot the evolution of these gas concentrations/fluxes with time, using a single function that lets me choose between plotting the concentrations or the fluxes.
Here is what I have written so far:
color_1 = 'black'
graph_type='conc'
fig, ((ax1, ax2, ax3), (ax5, ax7, ax8),(ax9,ax10,ax11)) = plt.subplots(3, 3, sharex=True, sharey=False)
fig.suptitle('Influence of wind direction of BVOCs concentration')
ax1.plot(wind_dir,'M33_'+graph_type,linestyle='',marker='.',color=color_1)
ax1.set_title('Methanol')
ax1.set(ylabel='Concentration [ppbv]')
ax2.plot(wind_dir,M39_conc,linestyle='',marker='.',color=color_1)
ax2.set_title('Water cluster')
ax2.set(ylabel='Concentration [ncps]')
ax3.plot(wind_dir,M45_conc,linestyle='',marker='.',color=color_1)
ax3.set_title('Acetaldehyde')
ax3.set(ylabel='Concentration [ppbv]')
# ax4.plot(wind_dir,M47_conc,linestyle='',marker='.',color='color_1')
# ax4.set_title('Unknown')
ax5.plot(wind_dir,M59_conc,linestyle='',marker='.',color=color_1)
ax5.set_title('Acetone')
ax5.set(ylabel='Concentration [ppbv]')
# ax6.plot(wind_dir,M61_conc,linestyle='',marker='.',color='color_1')
# ax6.set_title('Unknown')
ax7.plot(wind_dir,M69_conc,linestyle='',marker='.',color=color_1)
ax7.set_title('Isoprene')
ax7.set(ylabel='Concentration [ppbv]')
ax8.plot(wind_dir,M71_conc,linestyle='',marker='.',color=color_1)
ax8.set_title('Methyl vinyl, ketone and methacrolein')
ax8.set(ylabel='Concentration [ppbv]')
ax9.plot(wind_dir,M81_conc,linestyle='',marker='.',color=color_1)
ax9.set_title('Fragment of monoterpenes')
ax9.set(ylabel='Concentration [ppbv]',xlabel='Wind direction [°]')
ax10.plot(wind_dir,M87_conc,linestyle='',marker='.',color=color_1)
ax10.set_title('Methylbutenols')
ax10.set(ylabel='Concentration [ppbv]',xlabel='Wind direction [°]')
ax11.plot(wind_dir,M137_conc,linestyle='',marker='.',color=color_1)
ax11.set_title('Monoterpenes')
ax11.set(ylabel='Concentration [ppbv]',xlabel='Wind direction [°]')
plt.show()
When I try to parametrize the data I want to plot, I write, for example :
'M33_'+graph_type
which I am expecting to take the value 'M33_conc'.
Could someone help me to do this?
Thanks in advance

You have mentioned wanting to plot the evolution of the gases with time, but in the code sample you have given, you use wind_dir as the x variable. In this answer, I disregard this and use time as the x variable instead.
Looking at your code, I understand that you want to create two different figures made of small multiples, one for gas concentrations and one for gas fluxes. For this kind of plot, I recommend using pandas or seaborn so that you can plot all the variables contained in a pandas dataframe at once. Here I share an example using pandas.
Because you want to plot different measurements of the same substances, I recommend creating a table that lists the names of the variables and units associated with each unique substance (see df_subs below). I create one using code to extract the units and share it here, but this is easier to do with spreadsheet software.
Having a table like that makes it easier to create a plotting function that selects the group of variables you want to plot from the ec_top dataframe. You can then use the pandas plotting function like this: df.plot(subplots=True).
Most of the code shown below is to create some sample data based on your code to make it possible for you to recreate exactly what I show here and for anyone else who would like to give this a try. So if you want to use this solution, you can skip most of it, all you would need to do is create the substances table your way and then adjust the plotting function to fit your preferences.
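Before the full example, the core idea can be sketched in a few lines: choose the columns by type, then let pandas draw one subplot per column. This is only an illustration. The four-column ec_top stand-in, its random values and the helper name columns_for are assumptions and are not part of the question or of the longer example that follows.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for ec_top with two gases
time = pd.date_range("2021-01-15 12:00", periods=30, freq="min")
rng = np.random.default_rng(0)
ec_top = pd.DataFrame(
    rng.normal(10, 1, size=(30, 4)),
    index=time,
    columns=["M 33(ppbv)", "M 45(ppbv)", "Flux_M 33", "Flux_M 45"],
)

def columns_for(df, graph_type):
    """Return the column names that hold concentrations or fluxes."""
    if graph_type == "conc":
        # Concentration columns carry their unit in parentheses
        return [c for c in df.columns if "(" in c]
    if graph_type == "flux":
        return [c for c in df.columns if c.startswith("Flux")]
    raise ValueError(f'graph_type must be "conc" or "flux", got {graph_type!r}')

graph_type = "flux"
selected = columns_for(ec_top, graph_type)
# One subplot per selected column, drawn as markers only
axes = ec_top[selected].plot(subplots=True, marker=".", linestyle="")
# Call plt.show() to display the figure interactively
```

Selecting by column name avoids building variable names such as 'M33_' + graph_type as strings, which matplotlib would treat as literal text rather than as data.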
Create sample dataset
import io # from Python v 3.8.5
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates
pd.set_option("display.max_columns", 6)
rng = np.random.default_rng(seed=1) # random number generator
# Copy paste variable names from sample given in question
var_strings = '''
"M 33(ppbv)"
"M 39(ncps)"
"M 45(ppbv)"
"M 59(ppbv)"
"M 69(ppbv)"
"M 71(ppbv)"
"M 81(ppbv)"
"M 137(ppbv)"
"M 87(ppbv)"
"M 47(ppbv)"
"M 61(ppbv)"
"Flux_M 33"
"Flux_M 45"
"Flux_M 59"
"Flux_M 69"
"Flux_M 71"
"Flux_M 81"
"Flux_M 137"
"Flux_M 87"
"Flux_M 47"
"Flux_M 61"
'''
variables = pd.read_csv(io.StringIO(var_strings), header=None, names=['var'])['var']
# Create datetime variable
nperiods = 60
time = pd.date_range('2021-01-15 12:00', periods=nperiods, freq='min')
# Create range of numbers to compute sine waves for fake data
x = np.linspace(0, 2*np.pi, nperiods)
# Create dataframe containing gas concentrations
var_conc = np.array([var for var in variables if '(' in var])
conc_sine_wave = np.reshape(np.sin(x), (len(x), 1))
loc = rng.exponential(scale=10, size=var_conc.size)
scale = loc/10
var_conc_noise = rng.normal(loc, scale, size=(x.size, var_conc.size))
data_conc = conc_sine_wave + var_conc_noise + 2
df_conc = pd.DataFrame(data_conc, index=time, columns=var_conc)
# Create dataframe containing gas fluxes
var_flux = np.array([var for var in variables if 'Flux' in var])
flux_sine_wave = np.reshape(np.sin(x)**2, (len(x), 1))
loc = rng.exponential(scale=10, size=var_flux.size)
scale = loc/10
var_flux_noise = rng.normal(loc, scale, size=(x.size, var_flux.size))
data_flux = flux_sine_wave + var_flux_noise + 1
df_flux = pd.DataFrame(data_flux, index=time, columns=var_flux)
# Merge concentrations and fluxes into single dataframe
ec_top = pd.merge(left=df_conc, right=df_flux, how='outer',
                  left_index=True, right_index=True)
ec_top.head()
# M 33(ppbv) M 39(ncps) M 45(ppbv) ... Flux_M 87 Flux_M 47 Flux_M 61
# 2021-01-15 12:00:00 11.940054 5.034281 53.162767 ... 8.079255 2.402073 31.383911
# 2021-01-15 12:01:00 13.916828 4.354558 45.706391 ... 10.229084 2.494649 26.816754
# 2021-01-15 12:02:00 13.635604 5.500438 53.202743 ... 12.772899 2.441369 33.219213
# 2021-01-15 12:03:00 13.146823 5.409585 53.346907 ... 11.373669 2.817323 33.409331
# 2021-01-15 12:04:00 14.124752 5.491555 49.455010 ... 11.827497 2.939942 28.639749
Create substances table containing variable names and units
The substances are shown in the figure subplots in the order that they are listed here. Information from this table is used to create the labels and titles of the subplots.
# Copy paste substance codes and names from sample given in question
subs_strings = """
M33 "Methanol"
M39 "Water cluster"
M45 "Acetaldehyde"
M47 "Unknown"
M59 "Acetone"
M61 "Unknown"
M69 "Isoprene"
M71 "Methyl vinyl, ketone and methacrolein"
M81 "Fragment of monoterpenes"
M87 "Methylbutenols"
M137 "Monoterpenes"
"""
# Create dataframe containing substance codes and names
df_subs = pd.read_csv(io.StringIO(subs_strings), header=None,
                      names=['subs', 'subs_name'], index_col=False,
                      delim_whitespace=True)
# Add units and variable names matching the substance codes
# Do this for gas concentrations
for var in var_conc:
    var_subs, var_unit_raw = var.split('(')
    var_subs_num = var_subs.lstrip('M ')
    var_unit = var_unit_raw.rstrip(')')
    for i, subs in enumerate(df_subs['subs']):
        if var_subs_num == subs.lstrip('M'):
            df_subs.loc[i, 'conc_unit'] = var_unit
            df_subs.loc[i, 'conc_var'] = var
# Do this for gas fluxes
for var in var_flux:
    var_subs_num = var.split('M')[1].lstrip()
    var_unit = rng.choice(['unit_a', 'unit_b', 'unit_c'])
    for i, subs in enumerate(df_subs['subs']):
        if var_subs_num == subs.lstrip('M'):
            df_subs.loc[i, 'flux_unit'] = var_unit
            df_subs.loc[i, 'flux_var'] = var
df_subs
# subs subs_name conc_unit conc_var flux_unit flux_var
# 0 M33 Methanol ppbv M 33(ppbv) unit_c Flux_M 33
# 1 M39 Water cluster ncps M 39(ncps) NaN NaN
# 2 M45 Acetaldehyde ppbv M 45(ppbv) unit_a Flux_M 45
# 3 M47 Unknown ppbv M 47(ppbv) unit_b Flux_M 47
# 4 M59 Acetone ppbv M 59(ppbv) unit_a Flux_M 59
# 5 M61 Unknown ppbv M 61(ppbv) unit_c Flux_M 61
# 6 M69 Isoprene ppbv M 69(ppbv) unit_a Flux_M 69
# 7 M71 Methyl vinyl, ketone and methacrolein ppbv M 71(ppbv) unit_a Flux_M 71
# 8 M81 Fragment of monoterpenes ppbv M 81(ppbv) unit_c Flux_M 81
# 9 M87 Methylbutenols ppbv M 87(ppbv) unit_c Flux_M 87
# 10 M137 Monoterpenes ppbv M 137(ppbv) unit_b Flux_M 137
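The substance number and unit can also be pulled out of the column names with regular expressions rather than nested loops. The sketch below is an alternative under stated assumptions: it uses only four of the column names, assumes they follow the "M <number>(<unit>)" and "Flux_M <number>" patterns seen in the question, and the names conc, flux and table are chosen here for illustration.

```python
import pandas as pd

# A few of the column names from the question
names = pd.Series(["M 33(ppbv)", "M 39(ncps)", "Flux_M 33", "Flux_M 137"])

# Concentration columns: substance number and unit, keeping the original name
conc = (names.str.extract(r"^M (?P<num>\d+)\((?P<unit>\w+)\)$")
             .assign(conc_var=names)
             .dropna())

# Flux columns: substance number only, keeping the original name
flux = (names.str.extract(r"^Flux_M (?P<num>\d+)$")
             .assign(flux_var=names)
             .dropna())

# One row per substance that has a concentration column;
# flux_var is NaN where no matching flux column exists
table = conc.merge(flux, on="num", how="left")
print(table)
```

A left merge keeps substances such as M 39 that have a concentration but no flux, which mirrors the NaN entries in the df_subs table above.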
Create plotting function based on pandas
Here is one way of creating a plotting function that lets you select the variables for the plot with the graph_type argument. It works by selecting the relevant variables from the substances table using the if/elif statement. This and the ec_top[variables].plot(...) function are all that is really necessary to create the plot, the rest is all for formatting the figure. The variables are plotted in the order of the variables list. I draw only two columns of subplots because of width constraints here (max 10 inches width to get a sharp image on Stack Overflow).
# Create plotting function that creates a single figure showing all
# variables of the chosen type
def plot_grid(graph_type):
    # Set the type of variables and units to fetch in df_subs: using if
    # statements for the strings lets you use a variety of strings
    if 'conc' in graph_type:
        var_type = 'conc_var'
        unit_type = 'conc_unit'
    elif 'flux' in graph_type:
        var_type = 'flux_var'
        unit_type = 'flux_unit'
    else:
        return (f'Error: "{graph_type}" is not a valid string, '
                'it must contain "conc" or "flux".')
    # Create list of variables to plot depending on type
    variables = df_subs[var_type].dropna()
    # Set parameters for figure dimensions
    nvar = variables.size
    cols = 2
    rows = int(np.ceil(nvar/cols))
    width = 10/cols
    height = 3
    # Draw grid of line plots: note that x_compat is used to override the
    # default x-axis time labels, remove it if you do not want to use custom
    # tick locators and formatters like the ones created in the loop below
    grid = ec_top[variables].plot(subplots=True, figsize=(cols*width, rows*height),
                                  layout=(rows, cols), marker='.', linestyle='',
                                  xlabel='Time', x_compat=True)
    # The code in the following loop is optional formatting based on my
    # preferences, if you remove it the plot should still look ok but with
    # fewer informative labels and the legends may not all be in the same place
    # Loop through the subplots to edit format, including creating labels and
    # titles based on the information in the substances table (df_subs):
    for ax in grid.flatten()[:nvar]:
        # Edit tick locations and format
        plt.setp(ax.get_xticklabels(which='both'), fontsize=8, rotation=0, ha='center')
        loc = mdates.AutoDateLocator()
        ax.xaxis.set_major_locator(loc)
        ax.set_xticks([], minor=True)
        fmt = mdates.ConciseDateFormatter(loc, show_offset=False)
        ax.xaxis.set_major_formatter(fmt)
        # Edit legend
        handle, (var_name,) = ax.get_legend_handles_labels()
        subs = df_subs[df_subs[var_type] == var_name]['subs']
        ax.legend(handle, subs, loc='upper right')
        # Add y label
        var_unit, = df_subs[df_subs[var_type] == var_name][unit_type]
        ylabel_type = f'{"Concentration" if "conc" in graph_type else "Flux"}'
        ax.set_ylabel(f'{ylabel_type} [{var_unit}]')
        # Add title
        subs_name, = df_subs[df_subs[var_type] == var_name]['subs_name']
        ax.set_title(subs_name)
    # Edit figure format
    fig = plt.gcf()
    date = df_conc.index[0].strftime('%b %d %Y')
    title_type = f'{"concentrations" if "conc" in graph_type else "fluxes"}'
    fig.suptitle(f'BVOCs {title_type} on {date} from 12:00 to 13:00',
                 y=0.93, fontsize=15)
    fig.subplots_adjust(wspace=0.3, hspace=0.4)
    plt.show()

plot_grid('conc') # any kind of string works if it contains 'conc' or 'flux'
plot_grid('graph fluxes')
Documentation: matplotlib date ticks

Python/matplotlib: how do I change the color and/or symbol of every nth data point in a plot?

My experience with Python is pretty basic. I have written Python code to import data from an external file and perform a calculation. My result looks something like this (except much larger in reality).
1 1
1 1957
1 0.15
2 346
2 0.90
2 100
3 1920
3 100
3 40
What I want to do is plot these two columns as a single series, but then distinguish each data point according to a certain pattern. I know this sounds unnecessarily complicated, but it's something I need to do to help out the people who will use my code. Unfortunately, my Python skills fail me here. More specifically:
1. The first column has "1," "2," or "3." So first I want to make all the "1" data points circles (for example), all the "2" data points some other symbol, and likewise for the "3" data points.
2. Next. There are three rows for each distinct number. So for "1," the "0.15" in the second column is the average value, the "1957" is the maximum value, the "1" is the minimum value. I want to make the data point associated with each number's average value (the top row for each number) green (for example). I want the maximum and minimum values to have their own colors too.
So I will end up with a plot that shows one series only, but where each data point looks distinct. If anyone could please point me in the right direction, I would be very grateful. If I have not said this clearly, please let me know and I'll try again!

For different marker styles you currently need to create different plot instances (see this github issue). Using different colors can be done by passing an array as the color argument. So for example:
import matplotlib.pyplot as plt
import numpy as np
data = np.array([
[1, 0.15],
[1, 1957],
[1, 1],
[2, 346],
[2, 0.90],
[2, 100],
[3, 1920],
[3, 100],
[3, 40],
])
x, y = np.transpose(data)
symbols = ['o', 's', 'D']
colors = ['blue', 'orange', 'green']
for value, marker in zip(np.unique(x), symbols):
    mask = (x == value)
    plt.scatter(x[mask], y[mask], marker=marker, color=colors)
plt.show()

What I would do is separate the data into three different columns so that you have several series. Then I'd use plt.scatter with different markers to get the desired effect, as in this example from the matplotlib gallery:
import matplotlib.pyplot as plt
import numpy as np
# Fixing random state for reproducibility
np.random.seed(19680801)
N = 100
r0 = 0.6
x = 0.9 * np.random.rand(N)
y = 0.9 * np.random.rand(N)
area = (20 * np.random.rand(N))**2 # 0 to 10 point radii
c = np.sqrt(area)
r = np.sqrt(x ** 2 + y ** 2)
area1 = np.ma.masked_where(r < r0, area)
area2 = np.ma.masked_where(r >= r0, area)
plt.scatter(x, y, s=area1, marker='^', c=c)
plt.scatter(x, y, s=area2, marker='o', c=c)
# Show the boundary between the regions:
theta = np.arange(0, np.pi / 2, 0.01)
plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))
plt.show()
source: https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/scatter_masked.html#sphx-glr-gallery-lines-bars-and-markers-scatter-masked-py
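Applied to the data in the question, the idea of splitting the rows into series might look like the sketch below. It assumes the rows are sorted by the first column and that every number has exactly three rows in the same order, so that the position within a group identifies the kind of value. The marker and color choices and the log scale are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Data from the question: first column is the group, second the value
data = np.array([
    [1, 1], [1, 1957], [1, 0.15],
    [2, 346], [2, 0.90], [2, 100],
    [3, 1920], [3, 100], [3, 40],
])

# Assumption: exactly three consecutive rows per group
groups = data[:, 0].reshape(-1, 3)   # one row per group
values = data[:, 1].reshape(-1, 3)

markers = ["o", "s", "D"]                            # one marker per group
row_colors = ["tab:blue", "tab:red", "tab:green"]    # one color per row position

fig, ax = plt.subplots()
# One scatter call per group, because a single call accepts only one marker
for g_row, v_row, marker in zip(groups, values, markers):
    ax.scatter(g_row, v_row, marker=marker, c=row_colors)
ax.set_yscale("log")  # the values span several orders of magnitude
ax.set_xticks([1, 2, 3])
# Call plt.show() to display the figure interactively
```

Each scatter call fixes the marker for one group, while the list passed to c assigns a color to each row position within that group.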

How to draw proper chart of distributional tree?

I am using Python with matplotlib and need to visualize the percentage distribution of sub-groups of a data set.
Imagine this tree:
Data --- group1 (40%)
     -
     --- group2 (25%)
     -
     --- group3 (35%)
group1 --- A (25%)
       -
       --- B (25%)
       -
       --- c (50%)
And it can go on: each group can have several sub-groups, and the same goes for each sub-group.
How can I plot a proper chart for this info?

I created a minimal reproducible example that I think fits your description, but please let me know if that is not what you need.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.DataFrame()
n_rows = 100
data['group'] = np.random.choice(['1', '2', '3'], n_rows)
data['subgroup'] = np.random.choice(['A', 'B', 'C'], n_rows)
For instance, we could get the following counts for the subgroups.
In [1]: data.groupby(['group'])['subgroup'].value_counts()
Out[1]: group subgroup
1 A 17
C 16
B 5
2 A 23
C 10
B 7
3 C 8
A 7
B 7
Name: subgroup, dtype: int64
I created a function that computes the necessary counts given an ordering of the columns (e.g. ['group', 'subgroup']) and incrementally plots the bars with the corresponding percentages.
import matplotlib.pyplot as plt
import matplotlib.cm
def plot_tree(data, ordering, axis=False):
    """
    Plots a sequence of bar plots reflecting how the data
    is distributed at different levels. The order of the
    levels is given by the ordering parameter.

    Parameters
    ----------
    data: pandas DataFrame
    ordering: list
        Names of the columns to be plotted. They should be
        ordered top down, from the larger to the smaller group.
    axis: boolean
        Whether to plot the axis.

    Returns
    -------
    fig: matplotlib figure object.
        The final tree plot.
    """
    # Frame set-up
    fig, ax = plt.subplots(figsize=(9.2, 3*len(ordering)))
    ax.set_xticks(np.arange(-1, len(ordering)) + 0.5)
    ax.set_xticklabels(['All'] + ordering, fontsize=18)
    if not axis:
        plt.axis('off')
    counts=[data.shape[0]]
    # Get colormap
    labels = ['All']
    for o in reversed(ordering):
        labels.extend(data[o].unique().tolist())
    # Pastel is nice but has few colors. Change for a larger map if needed
    cmap = matplotlib.cm.get_cmap('Pastel1', len(labels))
    colors = dict(zip(labels, [cmap(i) for i in range(len(labels))]))
    # Group the counts
    counts = data.groupby(ordering).size().reset_index(name='c_' + ordering[-1])
    for i, o in enumerate(ordering[:-1], 1):
        if ordering[:i]:
            counts['c_' + o]=counts.groupby(ordering[:i]).transform('sum')['c_' + ordering[-1]]
    # Calculate percentages
    counts['p_' + ordering[0]] = counts['c_' + ordering[0]]/data.shape[0]
    for i, o in enumerate(ordering[1:], 1):
        counts['p_' + o] = counts['c_' + o]/counts['c_' + ordering[i-1]]
    # Plot first bar - all data
    ax.bar(-1, data.shape[0], width=1, label='All', color=colors['All'], align="edge")
    ax.annotate('All -- 100%', (-0.9, 0.5), fontsize=12)
    comb = 1 # keeps track of the number of possible combinations at each level
    for bar, col in enumerate(ordering):
        labels = sorted(data[col].unique())*comb
        comb *= len(data[col].unique())
        # Get only the relevant counts at this level
        local_counts = counts[ordering[:bar+1] +
                              ['c_' + o for o in ordering[:bar+1]] +
                              ['p_' + o for o in ordering[:bar+1]]].drop_duplicates()
        sizes = local_counts['c_' + col]
        percs = local_counts['p_' + col]
        bottom = 0 # start from 0
        for size, perc, label in zip(sizes, percs, labels):
            ax.bar(bar, size, width=1, bottom=bottom, label=label, color=colors[label], align="edge")
            ax.annotate('{} -- {:.0%}'.format(label, perc), (bar+0.1, bottom+0.5), fontsize=12)
            bottom += size # stack the bars
    ax.legend(colors)
    return fig
With the data shown above we would get the following.
fig = plot_tree(data, ['group', 'subgroup'], axis=True)

Have you tried stacked bar graph?
https://matplotlib.org/gallery/lines_bars_and_markers/bar_stacked.html#sphx-glr-gallery-lines-bars-and-markers-bar-stacked-py
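For the percentages given in the question, a stacked bar per tree level could be sketched as follows. The levels dictionary simply transcribes the tree from the question; the bar width, label placement and axis label are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt

# Percentages transcribed from the tree in the question
levels = {
    "Data":   {"group1": 40, "group2": 25, "group3": 35},
    "group1": {"A": 25, "B": 25, "C": 50},
}

fig, ax = plt.subplots()
for x, (level, shares) in enumerate(levels.items()):
    bottom = 0
    for name, pct in shares.items():
        # Each segment starts where the previous one ended
        ax.bar(x, pct, bottom=bottom, width=0.6, edgecolor="white")
        ax.annotate(f"{name} ({pct}%)", (x, bottom + pct / 2),
                    ha="center", va="center")
        bottom += pct

ax.set_xticks(range(len(levels)))
ax.set_xticklabels(list(levels))
ax.set_ylabel("Share of parent [%]")
# Call plt.show() to display the figure interactively
```

Every bar sums to 100% of its parent, so deeper levels can be added by appending more entries to the dictionary.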

How do I set different colors for subsets of line plot iterations in matplotlib?

I am iteratively plotting the np.exp results of 12 rows of data from a 2D array (12,5000), out_array. All data share the same x values, (x_d). I want the first 4 iterations to all plot as the same color, the next 4 to be a different color, and next 4 a different color...such that I have 3 different colors each corresponding to the 1st-4th, 5th-8th, and 9th-12th iterations respectively. In the end, it would also be nice to define these sets with their corresponding colors in a legend.
I have researched cycler (https://matplotlib.org/examples/color/color_cycle_demo.html), but I can't figure out how to assign colors into sets of iterations > 1. (i.e. 4 in my case). As you can see in my code example, I can have all 12 lines plotted with different (default) colors -or- I know how to make them all the same color (i.e. ...,color = 'r',...)
plt.figure()
for i in range(out_array.shape[0]):
    plt.plot(x_d, np.exp(out_array[i]),linewidth = 1, alpha = 0.6)
plt.xlim(-2,3)
I expect a plot like this, only with a total of 3 different colors, each corresponding to the chunks of iterations described above.

Another solution
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(10)
color = ['r', 'g', 'b']  # one color per block of 4 lines
for i in range(12):
    plt.plot(x, i*x, color[i//4])
plt.show()

plt.figure()
n = 0
color = ['r','g','b']
for i in range(out_array.shape[0]):
    n = n+1
    if n/4 <= 1:
        c = 1
    elif n/4 >1 and n/4 <= 2:
        c = 2
    elif n/4 >2:
        c = 3
    else:
        print(n)
    plt.plot(x_d, np.exp(out_array[i]),color = color[c-1])
plt.show()
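The question also asks for a legend that names each set of four lines. One way to get that, sketched here under assumptions, is to label only the first line of every block so that each color appears once in the legend. The out_array and x_d values are random stand-ins with the shapes described in the question (fewer columns for brevity), and the label text is made up.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data: 12 rows sharing the same x values
rng = np.random.default_rng(0)
x_d = np.linspace(-2, 3, 200)
out_array = -(x_d - rng.normal(size=(12, 1))) ** 2   # shape (12, 200)

group_size = 4
group_colors = ["r", "g", "b"]

fig, ax = plt.subplots()
for i, row in enumerate(out_array):
    group = i // group_size
    # Label only the first line of each block; unlabeled lines stay out of the legend
    if i % group_size == 0:
        label = f"rows {group * group_size + 1}-{(group + 1) * group_size}"
    else:
        label = None
    ax.plot(x_d, np.exp(row), color=group_colors[group],
            linewidth=1, alpha=0.6, label=label)

ax.set_xlim(-2, 3)
legend = ax.legend()
# Call plt.show() to display the figure interactively
```

Integer division by the block size gives the color index directly, which replaces the chain of if/elif comparisons.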

Python Pandas plot multiindex specify x and y

Below is an example DataFrame.
joaquin manolo
xx 0 0.000000e+00 44.000000
1 1.570796e+00 52.250000
2 3.141593e+00 60.500000
3 4.712389e+00 68.750000
4 6.283185e+00 77.000000
yy 0 0.000000e+00 37.841896
1 2.078796e+00 39.560399
2 5.292179e-17 41.026434
3 -8.983291e-02 42.304767
4 -4.573916e-18 43.438054
As you can see, the row index has two levels, ['xx', 'yy'] and [0, 1, 2, 3, 4]. I want to call DataFrame.plot() in such a way that it will produce two subplots, one for joaquin and one for manolo, and where I can specify to use data.loc["xx", :] for the domain data and to use data.loc["yy", :] for the ordinate data. In addition, I want the option to supply the subplots on which the plots should be drawn, in a list (or array) of matplotlib.axes._subplots.AxesSubplot instances, such as those that can be returned by the DataFrame.hist() method. How can this be done?
Generating the data above
Just in case you're wondering, below is the code I used to generate the data. If there is an easier way to generate this data, I'd be very interested to know as a side-note.
import numpy
import pandas
joaquin_dict = {}
xx_joaquin = numpy.linspace(0, 2*numpy.pi, 5)
yy_joaquin = 10 * numpy.sin(xx_joaquin) * numpy.exp(-xx_joaquin)
for i in range(len(xx_joaquin)):
    joaquin_dict[("xx", i)] = xx_joaquin[i]
    joaquin_dict[("yy", i)] = yy_joaquin[i]
manolo_dict = {}
xx_manolo = numpy.linspace(44, 77, 5)
yy_manolo = 10 * numpy.log(xx_manolo)
for i in range(len(xx_manolo)):
    manolo_dict[("xx", i)] = xx_manolo[i]
    manolo_dict[("yy", i)] = yy_manolo[i]
data_dict = {"joaquin": joaquin_dict, "manolo": manolo_dict}
data = pandas.DataFrame.from_dict(data_dict)

Just use a for loop:
import matplotlib.pyplot as pl
fig, axes = pl.subplots(1, 2)
for ax, col in zip(axes, data.columns):
    data[col].unstack(0).plot(x="xx", y="yy", ax=ax, title=col)
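To see why unstack(0) is the key step, the sketch below rebuilds the example data and prints the intermediate frame for one column. Building the frame with MultiIndex.from_product is an assumption about a shorter way to generate the data, which the question asks about as a side note; the variable name wide is chosen here for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rebuild the example data with a two-level row index
xx_joaquin = np.linspace(0, 2 * np.pi, 5)
xx_manolo = np.linspace(44, 77, 5)
index = pd.MultiIndex.from_product([["xx", "yy"], range(5)])
data = pd.DataFrame({
    "joaquin": np.concatenate([xx_joaquin,
                               10 * np.sin(xx_joaquin) * np.exp(-xx_joaquin)]),
    "manolo": np.concatenate([xx_manolo, 10 * np.log(xx_manolo)]),
}, index=index)

# unstack(0) moves the outer index level into the columns,
# leaving one row per point with "xx" and "yy" side by side
wide = data["joaquin"].unstack(0)
print(wide)

# Draw each column on an Axes object supplied by the caller
fig, axes = plt.subplots(1, 2)
for ax, col in zip(axes, data.columns):
    data[col].unstack(0).plot(x="xx", y="yy", ax=ax, title=col)
# Call plt.show() to display the figure interactively
```

Because the axes are passed in through the ax argument, any existing array of Axes can be reused in place of the one created by plt.subplots.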
