Plot separate pandas dataframe as subplots [duplicate] - python

This question already has an answer here:
matplotlib subplots - too many indices for array [duplicate]
(1 answer)
Closed 1 year ago.
I would like to plot pandas dataframes as subplots.
I read this post: How can I plot separate Pandas DataFrames as subplots?
Here is my minimum example where, like the accepted answer in the post, I used the ax keyword:
import pandas as pd
from matplotlib.pyplot import plot, show, subplots
import numpy as np
# Definition of the dataframe
df = pd.DataFrame({'Pressure': {0: 1, 1: 2, 2: 4}, 'Volume': {0: 2, 1: 4, 2: 8}, 'Temperature': {0: 3, 1: 6, 2: 12}})
# Plot
fig,axes = subplots(2,1)
df.plot(x='Temperature', y=['Volume'], marker = 'o',ax=axes[0,0])
df.plot(x='Temperature', y=['Pressure'], marker = 'o',ax=axes[1,0])
show()
Unfortunately, there is a problem with the indices:
df.plot(x='Temperature', y=['Volume'], marker = 'o',ax=axes[0,0])
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Could you please help me?

With a single row or column of subplots (such as 2 x 1), subplots returns a one-dimensional array of axes, so you can just use axes[0] and axes[1]. Only with a two-dimensional grid (2 x 3 subplots, for example) do you need to index with two numbers, as in axes[0, 0].
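As a minimal sketch of the fix, the question's example can be indexed with a single number. The second figure shows an alternative that is not part of the original answer: passing squeeze=False to subplots, which always returns a 2-D array so that two-index access works for any grid shape.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "Pressure": [1, 2, 4],
    "Volume": [2, 4, 8],
    "Temperature": [3, 6, 12],
})

# A 2 x 1 grid yields a 1-D array of Axes, so one index is enough.
fig, axes = plt.subplots(2, 1)
df.plot(x="Temperature", y="Volume", marker="o", ax=axes[0])
df.plot(x="Temperature", y="Pressure", marker="o", ax=axes[1])

# Alternative: squeeze=False keeps the array 2-D, so [row, col] indexing works.
fig2, axes2d = plt.subplots(2, 1, squeeze=False)
df.plot(x="Temperature", y="Volume", marker="o", ax=axes2d[0, 0])
df.plot(x="Temperature", y="Pressure", marker="o", ax=axes2d[1, 0])

# Call plt.show() to display both figures.
```

The squeeze=False form is convenient when the number of rows or columns is only known at run time.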

Related

price Histogram in python pandas [duplicate]

This question already has answers here:
How do I melt a pandas dataframe?
(3 answers)
Binning a column with pandas
(4 answers)
How to plot percentage with seaborn distplot / histplot / displot
(3 answers)
Closed 4 months ago.
I have the following table which shows the item and price for that item.
item CAR_PRIC1 Car_PRICE2
0 H1 17400.00 18400.00
1 H2 35450.00 27400.00
2 H3 55780.00 57400.00
3 H4 78500.00 37400.00
4 H5 25609.55 77400.00
5 H6 96000.00 97400.00
How can I draw a histogram that shows price categories on the Y-axis and, on the X-axis, the percentage of all contracts that fall into each price category?
Something like the following:
It's straightforward with seaborn displot (editing your df a bit to make the plot more readable):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {
'item': {0: 'H1', 1: 'H2', 2: 'H3', 3: 'H4', 4: 'H5', 5: 'H6'},
'CAR_PRICE1': {0: 7400.0, 1: 135450.0, 2: 5780.0, 3: 78500.0, 4: 25609.55, 5: 126000.0},
'CAR_PRICE2': {0: 78400.0, 1: 27400.0, 2: 37600.0, 3: 37400.0, 4: 77400.0, 5: 97400.0}
}
df = pd.DataFrame(data)
sns.displot(data=df[['CAR_PRICE1', 'CAR_PRICE2']])
plt.show()
Output:
If you want percentage instead of count:
sns.displot(data=df[['CAR_PRICE1', 'CAR_PRICE2']], stat='percent')
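If the percentages are needed as numbers rather than as a plot, a rough sketch with pandas alone is to melt the two price columns into one and bin them with pd.cut. The bin edges and the names long_df, price_range and percent are assumptions chosen for illustration; they do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["H1", "H2", "H3", "H4", "H5", "H6"],
    "CAR_PRICE1": [7400.0, 135450.0, 5780.0, 78500.0, 25609.55, 126000.0],
    "CAR_PRICE2": [78400.0, 27400.0, 37600.0, 37400.0, 77400.0, 97400.0],
})

# Stack both price columns into a single long column.
long_df = df.melt(id_vars="item", var_name="source", value_name="price")

# Hypothetical bin edges, picked only to illustrate the idea.
bins = [0, 50_000, 100_000, 150_000]
long_df["price_range"] = pd.cut(long_df["price"], bins=bins)

# Share of all prices (in percent) falling into each bin, in bin order.
percent = long_df["price_range"].value_counts(normalize=True).sort_index() * 100
print(percent)
```

Calling percent.plot.barh() on the result would put the price ranges on the Y-axis and the percentages on the X-axis, which is the orientation the question describes.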

How to create a scatter plot where x and y values are the column and row names

I have a question about plotting a scatter plot from a dataframe.
The data I would like to plot looks like this:
I would like a scatter plot where the x axis shows the years and the y axis shows the city names. The size of each marker should be based on the data value.
The desired visualization of the data:
I searched the documentation examples of different libraries and also Stack Overflow, but unfortunately I didn't find a suitable answer.
I would appreciate it if anyone could help; either an Excel or a Python solution would be fine.
Thanks
Something like this should work:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# assuming your example data is in a dataframe called df
# rename columns so that we can apply 'wide_to_long'
df.rename(columns={1990: 'A1990', 1991: 'A1991', 2019: 'A2019', 2020: 'A2020'}, inplace=True)
# reshape data using 'wide_to_long' to get it into the right format for scatter plot
df = pd.wide_to_long(df, "A", i="City", j="year")
df.reset_index(inplace=True)
df["A"] = df["A"].astype(int)
# OPTIONAL: scale the "bubble size" variable in column A to make graph easier to interpret
df["A"] = (df["A"] + 0.5) * 100
# map years onto integers so we can only plot the years that we want
year_dict = {1990: 1, 1991: 2, 2019: 3, 2020: 4}
df['year_num'] = df['year'].map(year_dict)
# plot the data
fig, ax = plt.subplots()
plt.scatter(df['year_num'], df['City'], s=df['A'], alpha=0.5)
# label the years corresponding to 'year_num' values on the x-axis
plt.xticks(np.arange(1, 5, 1.0))
labels = [1990, 1991, 2019, 2020]
ax.set_xticklabels(labels)
plt.show()
You can play around with the colors/formatting options in matplotlib to get the look you want, but the above should accomplish the basic idea.
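For a version that runs on its own, here is a sketch with made-up data, since the original table was posted as an image. It uses DataFrame.melt instead of wide_to_long and treats the years as category labels, so only the years present in the data get a tick. The city names, the values and the name long_df are all hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: the values below are invented for illustration.
df = pd.DataFrame({
    "City": ["Berlin", "Paris", "Rome"],
    1990: [1, 3, 2],
    1991: [2, 1, 4],
    2019: [5, 2, 1],
    2020: [3, 4, 2],
})

# Reshape from wide (one column per year) to long (one row per city and year).
long_df = df.melt(id_vars="City", var_name="year", value_name="value")

# Strings are plotted as categories, so the gap between 1991 and 2019 disappears.
long_df["year"] = long_df["year"].astype(str)

fig, ax = plt.subplots()
ax.scatter(long_df["year"], long_df["City"], s=long_df["value"] * 100, alpha=0.5)
ax.set_xlabel("Year")
ax.set_ylabel("City")

# Call plt.show() to display the figure.
```

The factor of 100 on the marker size is arbitrary and only makes the bubbles easier to see.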

How to draw the smooth lineplot and display the dates on the x-axis with python? [duplicate]

This question already has answers here:
Passing datetime-like object to seaborn.lmplot
(2 answers)
format x-axis (dates) in sns.lmplot()
(1 answer)
How to plot int to datetime on x axis using seaborn?
(1 answer)
Closed 10 months ago.
I would really appreciate it if you could point me to where to look. I have been trying to do this for 3 days and still can't find the right approach. I need to draw a chart that looks like the one in the first picture, and I need to display the dates on the X axis as in the second chart. I am a complete beginner with seaborn, Python and everything.
I used lineplot first, which met only one criterion: it displays the dates on the X axis. But the lines are sharp, like in the second picture, rather than smooth like in the first picture. Then I kept digging and found lmplot. With that, I could get the chart design I wanted (a smoothed chart). The problem is that when I tried to display the dates on the X axis, it didn't work. I got the error could not convert string to float: '2022-07-27T13:31:00Z'.
Here is the code for lmplot, which gives the wanted plot design but can't display dates on the X axis:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([ "2022-07-27T13:31:00Z",
"2022-08-28T13:31:00Z",
"2022-09-29T13:31:00Z",
"2022-10-30T13:31:00Z",])
power = np.array([10,25,60,42])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
If I use the number instead of date, the output is this. Exactly as I need
Here is the code with which all the data gets displayed correctly. But, the plot design is not smoothed.
import seaborn as sns
import numpy as np
import scipy
import matplotlib.pyplot as plt
import pandas as pd
from pandas.core.apply import frame_apply
years = ["2022-03-22T13:30:00Z",
"2022-03-23T13:31:00Z",
"2022-04-24T19:27:00Z",
"2022-05-25T13:31:00Z",
"2022-06-26T13:31:00Z",
"2022-07-27T13:31:00Z",
"2022-08-28T13:31:00Z",
"2022-09-29T13:31:00Z",
"2022-10-30T13:31:00Z",
]
feature_1 =[0,
6,
1,
5,
9,
15,
21,
4,
1,
]
data_preproc = pd.DataFrame({
'Period': years,
# 'Feature 1': feature_1,
# 'Feature 2': feature_2,
# 'Feature 3': feature_3,
# 'Feature 4': feature_4,
"Feature 1" :feature_1
})
data_preproc['Period'] = pd.to_datetime(data_preproc['Period'],
format="%Y-%m-%d",errors='coerce')
data_preproc['Period'] = data_preproc['Period'].dt.strftime('%b')
# aiAlertPlot =sns.lineplot(x='Period', y='value', hue='variable',ci=None,
# data=pd.melt(data_preproc, ['Period']))
sns.lineplot(x="Period",y="Feature 1",data=data_preproc)
# plt.xticks(np.linspace(start=0, stop=21, num=52))
plt.xticks(rotation=90)
plt.legend(title="features")
plt.ylabel("Alerts")
plt.legend(loc='upper right')
plt.show()
The output is this. Correct data, wrong chart design.
lmplot is a model-based method (it fits a regression), so it requires a numeric x. If you are willing to treat the date values as evenly spaced, you can create another, numeric variable (called range here), run lmplot on that variable, and then replace the xtick labels with the dates.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([ "2022-07-27T13:31:00Z",
"2022-08-28T13:31:00Z",
"2022-09-29T13:31:00Z",
"2022-10-30T13:31:00Z",])
power = np.array([10,25,60,42])
df = pd.DataFrame(data = {'T': T, 'power': power})
df['range'] = np.arange(df.shape[0])
sns.lmplot(x='range', y='power', data=df, ci=None, order=4, truncate=False)
plt.xticks(df['range'], df['T'], rotation = 45);
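If the dates are not evenly spaced, one possible variation is to use the elapsed days as the numeric x and fit the polynomial with np.polyfit directly instead of going through lmplot. This is a sketch under that assumption, not the answer's method; the names days, grid and grid_dates and the cubic degree are choices made here for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

T = ["2022-07-27T13:31:00Z", "2022-08-28T13:31:00Z",
     "2022-09-29T13:31:00Z", "2022-10-30T13:31:00Z"]
power = np.array([10, 25, 60, 42])

# Parse as UTC, then drop the timezone to get plain timestamps.
dates = pd.to_datetime(T, utc=True).tz_localize(None)

# Days since the first observation: numeric x that keeps the real spacing.
days = ((dates - dates[0]) / pd.Timedelta(days=1)).to_numpy()

# A cubic passes exactly through four points (assumed degree).
coeffs = np.polyfit(days, power, deg=3)
grid = np.linspace(days.min(), days.max(), 200)
smooth = np.polyval(coeffs, grid)

# Map the dense numeric grid back to timestamps for plotting.
grid_dates = dates[0] + pd.to_timedelta(grid, unit="D")

fig, ax = plt.subplots()
ax.plot(grid_dates, smooth)
ax.scatter(dates, power)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()

# Call plt.show() to display the figure.
```

Because the x axis is a real date axis here, the tick positions reflect the actual time between observations.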

Plot Multicolored Time Series Plot based on Conditional in Python [duplicate]

This question already has answers here:
How to plot multi-color line if x-axis is date time index of pandas
(2 answers)
Closed 5 years ago.
I have a pandas Financial timeseries DataFrame with two columns and one datetime index.
TOTAL.PAPRPNT.M Label
1973-03-01 25504.000 3
1973-04-01 25662.000 3
1973-05-01 25763.000 0
1973-06-01 25996.000 0
1973-07-01 26023.000 1
1973-08-01 26005.000 1
1973-09-01 26037.000 2
1973-10-01 26124.000 2
1973-11-01 26193.000 3
1973-12-01 26383.000 3
As you can see, each row has a 'Label'. The label classifies the line from one point to the next according to certain characteristics (different types of stock graph changes), so each labelled stretch should be drawn in its own color. This question is related to the question Plot Multicolored line based on conditional in python, but I did not understand the 'groupby' part there, and that scheme is bicolored rather than multicolored (I have four labels).
I want to create a multicolored plot of the graph based on the label associated with each entry in the dataframe.
Here's an example of what I think you're trying to do. It's based on the MPL documentation mentioned in the comments and uses randomly generated data.
Just map the colormap boundaries to the discrete values given by the number of classes.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import ListedColormap, BoundaryNorm
import pandas as pd
num_classes = 4
ts = range(10)
df = pd.DataFrame(data={'TOTAL': np.random.rand(len(ts)), 'Label': np.random.randint(0, num_classes, len(ts))}, index=ts)
print(df)
cmap = ListedColormap(['r', 'g', 'b', 'y'])
norm = BoundaryNorm(range(num_classes+1), cmap.N)
points = np.array([df.index, df['TOTAL']]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
lc = LineCollection(segments, cmap=cmap, norm=norm)
lc.set_array(df['Label'])
fig1 = plt.figure()
plt.gca().add_collection(lc)
plt.xlim(df.index.min(), df.index.max())
plt.ylim(-1.1, 1.1)
plt.show()
Each line segment is coloured according to the class label given in df['Label']. Here's a sample result:
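The same idea can be sketched for the datetime index from the question by converting the dates to matplotlib date numbers before building the segments. The values are copied from the question's table; colouring each segment by the label of its starting point is an assumption, since the question does not say which endpoint's label applies.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.collections import LineCollection
from matplotlib.colors import ListedColormap, BoundaryNorm

idx = pd.date_range("1973-03-01", periods=10, freq="MS")
df = pd.DataFrame({
    "TOTAL": [25504.0, 25662.0, 25763.0, 25996.0, 26023.0,
              26005.0, 26037.0, 26124.0, 26193.0, 26383.0],
    "Label": [3, 3, 0, 0, 1, 1, 2, 2, 3, 3],
}, index=idx)

# LineCollection needs numeric x values, so convert the dates first.
x = mdates.date2num(df.index.to_pydatetime())
points = np.column_stack([x, df["TOTAL"].to_numpy()]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

cmap = ListedColormap(["r", "g", "b", "y"])
norm = BoundaryNorm(range(5), cmap.N)  # four classes: 0, 1, 2, 3
lc = LineCollection(segments, cmap=cmap, norm=norm)

# Assumption: a segment takes the label of the point it starts from.
lc.set_array(df["Label"].to_numpy()[:-1])

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.set_xlim(x.min(), x.max())
ax.set_ylim(df["TOTAL"].min(), df["TOTAL"].max())
ax.xaxis_date()  # render the numeric x values as dates
fig.autofmt_xdate()

# Call plt.show() to display the figure.
```

There is one segment fewer than there are points, which is why the label array is trimmed by one entry.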

How to plot different parts of same Pandas Series column with different colors? [duplicate]

This question already has answers here:
Plotting line with different colors
(2 answers)
Closed 5 years ago.
Let's say I have a Series like this:
testdf = pd.Series([3, 4, 2, 5, 1, 6, 10])
When plotting, this is the result:
testdf.plot()
I want to plot, say, the line up to the first 4 values in blue (default) and the rest of the line in red. How can I do it?
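If I understand correctly, you can plot the whole series in blue and then draw the part from the fourth value onward in red on the same axes: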
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
testdf.plot(ax=ax,color='b')
testdf.iloc[3:].plot(ax=ax,color='r')
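A self-contained variant of the same idea is sketched below. It plots the two parts separately rather than overdrawing, and the name split for the position where the colour changes is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

testdf = pd.Series([3, 4, 2, 5, 1, 6, 10])
split = 3  # hypothetical name: position of the last blue point

fig, ax = plt.subplots()
# Both slices include the point at `split`, so the two lines join without a gap.
testdf.iloc[:split + 1].plot(ax=ax, color="b")
testdf.iloc[split:].plot(ax=ax, color="r")

# Call plt.show() to display the figure.
```

Sharing the boundary point between the two slices is what keeps the line continuous where the colour changes.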
