I want to combine line plots for several years in a single figure. To illustrate the problem, I made the simplified example below.
# packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Here I build a simplified dataframe that I want to plot:
data = {'Dates': ['02-04-2014', '18-08-2014', '05-03-2014', '06-06-2014', '05-08-2013', '06-11-2013', '03-01-2013', '12-02-2013'], 'Values':
[7, 8, 11, 3, 6, 1, 8, 13]}
df = pd.DataFrame.from_dict(data)
The conversion to datetime matters because that is the format I work with in my real problem. The dates are day-first, so the format is given explicitly:
df['Dates'] = pd.to_datetime(df['Dates'], format='%d-%m-%Y')
Here I do the plotting:
years = sorted(df['Dates'].dt.year.unique())
for i in years:
    df1 = df[df['Dates'].dt.year == i]
    df1 = df1.sort_values(by=['Dates'])
    # plot call assumed here; it is missing from the snippet as posted
    plt.plot(df1['Dates'], df1['Values'])
    plt.show()
In this case, this produces two separate line plots, one for 2013 and one for 2014. I want them combined in one graph with a legend showing the year.
Hope you can help!
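One possible approach, as a minimal sketch: draw every year on the same Axes object and give each line a label, so that a single legend can list the years. The day-month-year date format and the axis labels are assumptions made for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {
    "Dates": ["02-04-2014", "18-08-2014", "05-03-2014", "06-06-2014",
              "05-08-2013", "06-11-2013", "03-01-2013", "12-02-2013"],
    "Values": [7, 8, 11, 3, 6, 1, 8, 13],
}
df = pd.DataFrame(data)

# Assumption: the strings are day-month-year.
df["Dates"] = pd.to_datetime(df["Dates"], format="%d-%m-%Y")
df = df.sort_values("Dates")

# One shared Axes; each year becomes one labelled line.
fig, ax = plt.subplots()
for year, group in df.groupby(df["Dates"].dt.year):
    ax.plot(group["Dates"], group["Values"], marker="o", label=str(year))

ax.legend(title="Year")
ax.set_xlabel("Date")
ax.set_ylabel("Values")
```

Call plt.show() afterwards to display the figure. To overlay the years on a common January to December axis instead of placing them one after another, plot group["Dates"].dt.dayofyear on the x-axis.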
I have a question about plotting a scatter plot from a dataframe.
The data I would like to plot looks like this:
I would like a scatter plot where the x-axis shows the years and the y-axis shows the city names. The size of each marker should be based on the data value.
The desired visualization of the data:
I searched the documentation of several libraries and Stack Overflow, but unfortunately did not find a suitable answer.
I would appreciate any help; either an Excel or a Python solution would be fine.
Thanks
Something like this should work:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# assuming your example data is in a dataframe called df
# rename columns so that we can apply 'wide_to_long'
df.rename(columns={1990: 'A1990', 1991: 'A1991', 2019: 'A2019', 2020: 'A2020'}, inplace=True)
# reshape data using 'wide_to_long' to get it into the right format for scatter plot
df = pd.wide_to_long(df, "A", i="City", j="year")
df.reset_index(inplace=True)
df["A"] = df["A"].astype(int)
# OPTIONAL: scale the "bubble size" variable in column A to make graph easier to interpret
df["A"] = (df["A"] + 0.5) * 100
# map years onto integers so we can only plot the years that we want
year_dict = {1990: 1, 1991: 2, 2019: 3, 2020: 4}
df['year_num'] = df['year'].map(year_dict)
# plot the data
fig, ax = plt.subplots()
plt.scatter(df['year_num'], df['City'], s=df['A'], alpha=0.5)
# label the years corresponding to 'year_num' values on the x-axis
plt.xticks(np.arange(1, 5, 1.0))
labels = [1990, 1991, 2019, 2020]
ax.set_xticklabels(labels)
plt.show()
You can play around with the colors/formatting options in matplotlib to get the look you want, but the above should accomplish the basic idea.
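Since the original data is not shown, here is a self-contained variant of the same idea as a rough sketch. The city names and values are made up for illustration, and it uses melt plus string year labels instead of wide_to_long and a manual year-to-integer mapping.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical wide-format data: one row per city, one column per year.
wide = pd.DataFrame({
    "City": ["Amsterdam", "Berlin", "Cairo"],
    1990: [1, 3, 2],
    1991: [2, 1, 4],
    2019: [5, 2, 1],
    2020: [3, 4, 2],
})

# melt() turns the year columns into rows without renaming them first.
tidy = wide.melt(id_vars="City", var_name="year", value_name="value")

# String labels make matplotlib treat the years as evenly spaced categories,
# so the gap between 1991 and 2019 is not drawn to scale.
tidy["year_label"] = tidy["year"].astype(str)

fig, ax = plt.subplots()
# Scale the values so the marker areas are large enough to compare.
ax.scatter(tidy["year_label"], tidy["City"], s=tidy["value"] * 100, alpha=0.5)
ax.set_xlabel("Year")
ax.set_ylabel("City")
```

The factor of 100 for the marker size is arbitrary; adjust it to the range of your values.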
I would really appreciate it if you could point me to where to look. I have been trying for 3 days and still can't find the right approach. I need to draw a chart that looks like the one in the first picture, with the dates displayed on the x-axis as in the second chart. I am a complete beginner with seaborn and Python. I first used lineplot, which met only one criterion: it displays the dates on the x-axis. But the lines are sharp, as in the second picture, rather than smooth as in the first. I kept digging and found lmplot, which gives the smoothed chart I want. The problem is that when I try to display the dates on the x-axis, it fails with the error could not convert string to float: '2022-07-27T13:31:00Z'.
Here is the code for lmplot. It gives the plot design I want, but the dates can't be displayed on the x-axis.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([ "2022-07-27T13:31:00Z",
"2022-08-28T13:31:00Z",
"2022-09-29T13:31:00Z",
"2022-10-30T13:31:00Z",])
power = np.array([10,25,60,42])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
If I use numbers instead of dates, the output is this, exactly as I need:
Here is the code with which all the data gets displayed correctly. But, the plot design is not smoothed.
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
years = ["2022-03-22T13:30:00Z",
"2022-03-23T13:31:00Z",
"2022-04-24T19:27:00Z",
"2022-05-25T13:31:00Z",
"2022-06-26T13:31:00Z",
"2022-07-27T13:31:00Z",
"2022-08-28T13:31:00Z",
"2022-09-29T13:31:00Z",
"2022-10-30T13:31:00Z",
]
feature_1 =[0,
6,
1,
5,
9,
15,
21,
4,
1,
]
data_preproc = pd.DataFrame({
'Period': years,
# 'Feature 1': feature_1,
# 'Feature 2': feature_2,
# 'Feature 3': feature_3,
# 'Feature 4': feature_4,
"Feature 1" :feature_1
})
# parse the full ISO 8601 timestamps; a date-only format string does not match them
data_preproc['Period'] = pd.to_datetime(data_preproc['Period'], errors='coerce')
data_preproc['Period'] = data_preproc['Period'].dt.strftime('%b')
# aiAlertPlot =sns.lineplot(x='Period', y='value', hue='variable',ci=None,
# data=pd.melt(data_preproc, ['Period']))
sns.lineplot(x="Period", y="Feature 1", data=data_preproc, label="Feature 1")
# plt.xticks(np.linspace(start=0, stop=21, num=52))
plt.xticks(rotation=90)
plt.ylabel("Alerts")
# a single legend call; a second call would replace the first
plt.legend(title="features", loc='upper right')
plt.show()
The output is this. Correct data, wrong chart design.
lmplot is a model-based method that requires a numeric x. If the dates are (approximately) evenly spaced, you can create a numeric helper column, here called range, fit lmplot on that column, and then replace the x tick labels with the dates.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([ "2022-07-27T13:31:00Z",
"2022-08-28T13:31:00Z",
"2022-09-29T13:31:00Z",
"2022-10-30T13:31:00Z",])
power = np.array([10,25,60,42])
df = pd.DataFrame(data = {'T': T, 'power': power})
df['range'] = np.arange(df.shape[0])
sns.lmplot(x='range', y='power', data=df, ci=None, order=4, truncate=False)
plt.xticks(df['range'], df['T'], rotation = 45);
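If the dates are not evenly spaced, one alternative is to fit the curve on elapsed time and draw it against the real dates. The following is only a sketch of that idea using np.polyfit rather than lmplot. The degree of 2 is an assumption made here, since four points cannot constrain a degree-4 polynomial, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

T = ["2022-07-27T13:31:00Z", "2022-08-28T13:31:00Z",
     "2022-09-29T13:31:00Z", "2022-10-30T13:31:00Z"]
power = np.array([10, 25, 60, 42])

# Parse the timestamps and drop the UTC marker to get naive datetimes.
t = pd.Series(pd.to_datetime(T)).dt.tz_localize(None)

# Numeric x for the fit: days elapsed since the first sample.
days = ((t - t.min()).dt.total_seconds() / 86400).to_numpy()

# Assumption: a quadratic fit is smooth enough for this illustration.
coeffs = np.polyfit(days, power, deg=2)
grid = np.linspace(days.min(), days.max(), 200)
curve = np.polyval(coeffs, grid)

# Map the numeric grid back to dates for plotting.
curve_dates = t.min() + pd.to_timedelta(grid, unit="D")

fig, ax = plt.subplots()
ax.scatter(t, power)
ax.plot(curve_dates, curve)
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Because the x-axis holds real dates here, the spacing between points reflects the actual time gaps, and the tick labels need no manual replacement.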
A very basic question, as I am new to the Altair library. I have dates on the x-axis and a Y column containing 0 and 1. I want two lines, one for 0 and one for 1, in different colours. How can I do this?
You can do this by mapping this column to the color encoding. Here's a short example:
import altair as alt
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame({
'x': pd.date_range('2020-01-01', freq='D', periods=10),
'y': np.random.randn(10).cumsum(),
'z': np.random.randint(0, 2, 10),
})
alt.Chart(df).mark_line().encode(
x='x:T',
y='y:Q',
color='z:O'
)
I was trying to visualize a Facebook stock dataset covering 2014 to 2018. The dataset looks like this: dataset screenshot
My goal is to visualize the closing column, but by year. That is, year 2014, then 2015 and so on, but they should be in one figure, and one after another. Something like this: expected graph image
But whatever I try, all the graph parts start from index 0, instead of continuing from the end of the previous one. Here's what I got: the graph I generated
Please help me to solve this problem. Thanks!
The most straightforward way is to create a separate copy of the dataframe for each segment and set the values outside that segment's date range to NaN.
Here I use an example dataset.
import pandas as pd
import numpy as np
df = pd.DataFrame(
np.random.randint(0, 100, size=100),
index=pd.date_range(start="2020-01-01", periods=100, freq="D"),
)
Then you can create and select the data to plot
df1 = df.copy()
df2 = df.copy()
df1[df.index > pd.to_datetime('2020-02-01')] = np.nan
df2[df.index < pd.to_datetime('2020-02-01')] = np.nan
And then simply plot these on the same axis.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(18, 8))
ax.plot(df1)
ax.plot(df2)
The result
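For more than two segments, the same effect can be had without masking by grouping on the year of the index and plotting each group on the shared axis. This is a sketch on synthetic data; the Close column name and the 2014 to 2016 date range are assumptions standing in for the real stock file.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the stock data: a 'Close' column on a daily index.
rng = np.random.default_rng(0)
index = pd.date_range("2014-01-01", "2016-12-31", freq="D")
df = pd.DataFrame({"Close": rng.normal(0, 1, len(index)).cumsum() + 100},
                  index=index)

fig, ax = plt.subplots(figsize=(18, 8))

# Each year keeps its real dates, so the segments follow one another
# along the x-axis instead of all starting at position 0.
for year, part in df.groupby(df.index.year):
    ax.plot(part.index, part["Close"], label=str(year))

ax.legend(title="Year")
```

Each year is a separate line, so there is a small visual gap between 31 December and 1 January where consecutive segments are not joined.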
I am trying to compare two small, summarized pandas dataframes in a line plot with the Seaborn library, but one of the lines is shifted by one unit along the x-axis. What is wrong?
The dataframes are:
Here is my code:
df = pd.read_csv('/home/gazelle/Documents/m3inference/m3_result.csv',index_col='id')
df = df.drop("Unnamed: 0",axis=1)
for i, v in df.iterrows():
    if str(i) not in result:
        df.drop(i, inplace=True)
    else:
        df.loc[i, 'estimated'] = result[str(i)]
m3 = pd.read_csv('plot_result.csv').set_index('id')
ids = list(m3.index.values)
m3 = m3['age'].value_counts().to_frame().reset_index().sort_values('index')
m3 = m3.rename(columns={m3.columns[0]:'bucket', m3.columns[1]:'age'})
df_estimated = df[df.index.isin(ids)]['estimated'].value_counts().to_frame().reset_index().sort_values('index')
df_estimated = df_estimated.rename(columns={df_estimated.columns[0]:'bucket', df_estimated.columns[1]:'age'})
sns.lineplot(x='bucket', y='age', data=m3)
sns.lineplot(x='bucket', y='age', data=df_estimated)
And the result is:
As has been pointed out in the comments, the data and code you provide appear to produce the correct result:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set()
m3 = pd.DataFrame({"index": [2, 3, 4, 1], "age": [123, 116, 66, 33]})
df_estimated = pd.DataFrame({"index": [3, 2, 4, 1], "estimated": [200, 100, 37, 1]})
sns.lineplot(x="index", y="age", data=m3)
sns.lineplot(x="index", y="estimated", data=df_estimated)
plt.show()
This gives a plot which is different from the one you posted above:
From your screenshots it looks like you are working in a Jupyter notebook. You are probably suffering from the issue that at the time you plot, the dataframe m3 no longer has the values you printed above, but has been modified.
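One way to rule out stale notebook state is to rebuild each summary frame from the raw values in a single function and call it right before plotting. The helper below is a hypothetical sketch; bucket_counts and the sample values are not from the question. Note also that in pandas 2.0 and later, value_counts().reset_index() no longer produces a column named index, so sort_values('index') as written in the question would fail there.

```python
import pandas as pd

def bucket_counts(values: pd.Series) -> pd.DataFrame:
    """Count occurrences per bucket, sorted by bucket (hypothetical helper)."""
    counts = values.value_counts().sort_index()
    # Name the columns explicitly instead of relying on pandas defaults,
    # which differ between pandas versions.
    return counts.rename_axis("bucket").reset_index(name="age")

# Made-up raw bucket assignments, for illustration only.
raw = pd.Series([2, 3, 3, 1, 2, 3])
summary = bucket_counts(raw)
print(summary)
```

Calling the function in the same cell as sns.lineplot guarantees that the plotted frame is the one you just built, not a version modified by an earlier cell.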