I need to draw a bar chart from the data set below. The x-axis should be Territory, the y-axis should be the average production in each territory, and the hue should be the month taken from the Date column.
Not exactly sure what you are asking. When you say average production, do you want to calculate average production from a Territory, or just display the value that is in the production column? If you clarify I can update my answer. In my example I just display the data from the production column. First export your spreadsheet to csv. Then you can do the following:
import calendar
import datetime
import pandas as pd
import plotly.express as ex
df = pd.read_csv("data.csv")
def get_month_names(dataframe: pd.DataFrame):
    # Get all the dates
    dates = dataframe["Date"].to_list()
    # Convert date-string to datetime object
    # I assume month/day/year, if it is day/month/year, swap %m and %d
    date_objs = [datetime.datetime.strptime(date, "%m/%d/%Y %H:%M:%S") for date in dates]
    # Get all the months
    months = [date.month for date in date_objs]
    # Get the names of the months
    month_names = [calendar.month_name[month] for month in months]
    return month_names
fig = ex.bar(x=df["Territory"],
             y=df["Production"],
             color=get_month_names(df))
fig.show()
This produces a bar chart of Production per Territory, coloured by month name (image not included).
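If the goal is the average production per territory and month, as the question's wording suggests, the aggregation can be done with a pandas groupby before plotting. The following is a minimal sketch under assumptions: the column names Date, Territory and Production and the month/day/year format are taken from the answer above, the sample rows are made up, and the helper names monthly_average and plot_monthly_average are hypothetical.

```python
import calendar

import pandas as pd


def monthly_average(df: pd.DataFrame) -> pd.DataFrame:
    """Average Production for every (Territory, month) pair."""
    out = df.copy()
    # Assumed format: month/day/year plus a time part, as in the answer above
    out["Date"] = pd.to_datetime(out["Date"], format="%m/%d/%Y %H:%M:%S")
    out["MonthNum"] = out["Date"].dt.month
    avg = (
        out.groupby(["Territory", "MonthNum"], as_index=False)["Production"]
        .mean()
        .sort_values(["Territory", "MonthNum"])
    )
    # Translate month numbers (1..12) into month names for the legend
    avg["Month"] = avg["MonthNum"].map(lambda m: calendar.month_name[m])
    return avg[["Territory", "Month", "Production"]].reset_index(drop=True)


def plot_monthly_average(avg: pd.DataFrame):
    """Grouped bar chart: one bar per month within each territory."""
    import plotly.express as px  # imported here so the aggregation works without plotly

    return px.bar(avg, x="Territory", y="Production", color="Month", barmode="group")


# Made-up sample rows, only to show the shape of the result
sample = pd.DataFrame({
    "Date": ["01/05/2021 00:00:00", "01/20/2021 00:00:00",
             "02/03/2021 00:00:00", "01/07/2021 00:00:00"],
    "Territory": ["North", "North", "North", "South"],
    "Production": [10, 20, 40, 8],
})
avg = monthly_average(sample)
print(avg)
```

Calling plot_monthly_average(avg).show() would then display the chart; barmode="group" places the months side by side instead of stacking them.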
My problem is that when I plot the users joining per day, a future year appears on the axis: it should not show 2023. I searched my CSV file and no row holds a value from 2023.
data = pd.read_csv('users-current.csv')
#transform datetime to date
data['dateCreated'] = pd.to_datetime(data['created_on']).dt.date
#date Count Registered
dataCreated = data.groupby('dateCreated').size()
#dataCreatedArray = np.array([dataCreated], dtype = object)
dataCreated.head(50)
dataCreated.plot().invert_xaxis()
plt.title('Users Joining in a Day',pad=20, fontdict={'fontsize':24})
plt.show()
the output:
column in my csv used below:
This happens because the x-axis range is generated automatically. Instead, you can set the range explicitly with plt.xlim(), as follows:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('users-current.csv')
#transform datetime to date
data['dateCreated'] = pd.to_datetime(data['created_on']).dt.date
#date Count Registered
dataCreated = data.groupby('dateCreated').size()
#dataCreatedArray = np.array([dataCreated], dtype = object)
dataCreated.head(50)
dataCreated.plot().invert_xaxis()
# import datetime, and use this code to set a period as you want.
plt.xlim([datetime.date(2021, 1, 1), datetime.date(2022, 12, 31)])
plt.title('Users Joining in a Day', pad=20, fontdict={'fontsize':24})
plt.show()
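Instead of hard-coding the period, the limits can also be taken from the data itself. This is a minimal sketch, assuming a created_on column like the one in the question; the sample rows are made up and the Agg backend is only chosen so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the 'created_on' column of users-current.csv
data = pd.DataFrame({"created_on": [
    "2021-03-01 10:00:00", "2021-03-01 12:30:00",
    "2021-11-15 08:00:00", "2022-06-30 23:59:00",
]})
# Keep the datetime dtype but drop the time of day
data["dateCreated"] = pd.to_datetime(data["created_on"]).dt.normalize()
counts = data.groupby("dateCreated").size()

fig, ax = plt.subplots()
ax.plot(counts.index, counts.values)

# Clamp the axis to the first and last dates actually present in the data
first, last = counts.index.min(), counts.index.max()
ax.set_xlim(first, last)
ax.set_title("Users Joining in a Day", pad=20, fontdict={"fontsize": 24})

# Show the resulting limits as calendar dates
print([mdates.num2date(v).date() for v in ax.get_xlim()])
```

To keep the reversed axis from the question, pass the later date first, for example ax.set_xlim(last, first).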
I plotted a daily line plot for flights and I would like to highlight all the Saturdays and Sundays. I'm trying to do it with axvspan, but I'm struggling with how to use it. Any suggestions on how this can be coded?
(flights.loc[flights['date'].dt.month.between(1, 2), 'date']
    .dt.to_period('D')
    .value_counts()
    .sort_index()
    .plot(kind="line",figsize=(12,6))
)
Thanks in advance for any help provided.
With a date column of pandas Timestamp type, you can get the weekday of a date directly with pandas.Timestamp.weekday. You can then use df.iterrows() to check whether each date is a Saturday or Sunday and add a shape to the figure like this:
for index, row in df.iterrows():
    if row['date'].weekday() == 5 or row['date'].weekday() == 6:
        fig.add_shape(...)
With a setup like this, you would get a line marking each date that is a Saturday or Sunday. But given that you're dealing with a continuous time series, it probably makes more sense to show these periods as a shaded area instead of highlighting each individual day. So just identify each Saturday and let the shaded period run from that Saturday to the Saturday plus pd.DateOffset(1) to get this:
Complete code with sample data
# imports
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import datetime
pd.set_option('display.max_rows', None)
# data sample
cols = ['signal']
nperiods = 20
np.random.seed(12)
df = pd.DataFrame(np.random.randint(-2, 2, size=(nperiods, len(cols))),
                  columns=cols)
datelist = pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=nperiods).tolist()
df['date'] = datelist
df = df.set_index(['date'])
df.index = pd.to_datetime(df.index)
df.iloc[0] = 0
df = df.cumsum().reset_index()
df['signal'] = df['signal'] + 100
# plotly setup
fig = px.line(df, x='date', y=df.columns[1:])
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
for index, row in df.iterrows():
    if row['date'].weekday() == 5: #or row['date'].weekday() == 6:
        fig.add_shape(type="rect",
                      xref="x",
                      yref="paper",
                      x0=row['date'],
                      y0=0,
                      # x1=row['date'],
                      x1=row['date'] + pd.DateOffset(1),
                      y1=1,
                      line=dict(color="rgba(0,0,0,0)",width=3,),
                      fillcolor="rgba(0,0,0,0.1)",
                      layer='below')
fig.show()
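The weekend periods can also be computed up front, without iterrows(), and then handed to whichever plotting call draws the rectangles. This is a minimal sketch, assuming each weekend should run from Saturday 00:00 to Monday 00:00; the helper name weekend_spans is hypothetical.

```python
import pandas as pd


def weekend_spans(dates):
    """Return one (start, end) pair per weekend that occurs in `dates`.

    Each span runs from Saturday 00:00 to the following Monday 00:00.
    """
    days = pd.Series(pd.to_datetime(dates)).dt.normalize().drop_duplicates()
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6
    weekend = days[days.dt.weekday >= 5]
    # Shift Sundays back by one day so both weekend days map to their Saturday
    saturdays = (weekend - pd.to_timedelta(weekend.dt.weekday - 5, unit="D")).drop_duplicates()
    return [(sat, sat + pd.Timedelta(days=2)) for sat in saturdays]


# Same date range as the sample data above: 20 days starting 2020-01-01
spans = weekend_spans(pd.date_range("2020-01-01", periods=20))
for start, end in spans:
    print(start.date(), "->", end.date())
```

Each pair can then be used as x0 and x1 in fig.add_shape(type="rect", ...) exactly as in the loop above.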
You can use pandas' dt.weekday to get an integer corresponding to the weekday of a given date: 5 corresponds to Saturday and 6 to Sunday (Monday is 0). You can use this information as an additional condition to slice your dataframe and keep only the entries that fall on a Saturday or Sunday. As you mentioned, they can be highlighted with axvspan, and matplotlib versions >3 are able to use datetime objects as input. One day has to be added via datetime.timedelta, because no rectangle will be drawn if xmin=xmax.
Here is the code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
#create sample data and dataframe
datelist = pd.date_range(start="2014-12-09",end="2015-03-02").tolist()
datelist += datelist #test to see whether it works with multiple entries having the same date
flights = pd.DataFrame(datelist, columns=["date"])
#plot command, save object in variable
plot = flights.loc[flights['date'].dt.month.between(1, 2), 'date'].dt.to_period('D').value_counts().sort_index().plot(kind="line",figsize=(12,6))
#filter out saturdays and sundays from the date range needed
weekends = flights.loc[(flights['date'].dt.month.between(1, 2)) & ((flights['date'].dt.weekday == 5) | (flights['date'].dt.weekday == 6)), 'date']
#5 = Saturday, 6 = Sunday
#plot axvspan for every sat or sun, set() to get unique dates
for day in set(weekends.tolist()):
    plot.axvspan(day, day + datetime.timedelta(days=1))
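A variation on the same idea is to draw one band per weekend, from Saturday to Monday, in a light semi-transparent colour. This is a minimal sketch on a plain matplotlib axis with made-up counts; the Agg backend is only used so it runs without a display, and the colour and alpha values are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily counts for January and February 2015
dates = pd.date_range("2015-01-01", "2015-02-28")
counts = pd.Series(range(len(dates)), index=dates)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(counts.index, counts.values)

# weekday == 5 selects Saturdays; each band covers Saturday and Sunday
saturdays = counts.index[counts.index.weekday == 5]
for sat in saturdays:
    ax.axvspan(sat, sat + pd.Timedelta(days=2), color="grey", alpha=0.2)

print(len(saturdays), "weekend bands drawn")
```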
I've tried df['release_date'] = pd.to_datetime(df['release_date'], format='%d/%m/%Y')... but it doesn't match the format of the dates in the file. An example of a date in the file (stored as object) is: 8/14/1960
Can someone help me convert this to a datetime object so that I can plot it with matplotlib? Or how would I go about plotting a scatter plot with this data?
Try this. Your approach is right, except that your format string has day and month swapped: the dates are month/day/year, so use %m/%d/%Y.
Data
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'release_date':['8/14/1960','8/14/1989'],'value':[20,200]})
#df['release_date'] = pd.to_datetime(df['release_date'], format='%m/%d/%Y')
#or
df['release_date'] = pd.to_datetime(df['release_date']).dt.strftime('%m/%d/%Y')
df.set_index('release_date').plot(kind='bar')
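To illustrate the difference between the two format strings, here is a minimal sketch using the same two sample dates. With errors="coerce", a value that does not match the format becomes NaT instead of raising an error, which makes a mismatch easy to spot.

```python
import pandas as pd

df = pd.DataFrame({"release_date": ["8/14/1960", "8/14/1989"], "value": [20, 200]})

# Month first, then day: matches dates such as 8/14/1960
df["release_date"] = pd.to_datetime(df["release_date"], format="%m/%d/%Y")
print(df.dtypes)

# Day first does not match (there is no month 14); coerce turns the failure into NaT
mismatch = pd.to_datetime(pd.Series(["8/14/1960"]), format="%d/%m/%Y", errors="coerce")
print(mismatch.isna().all())
```

With a real datetime column, a scatter plot can then be drawn with plt.scatter(df["release_date"], df["value"]) after importing matplotlib.pyplot as plt.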
What I need to do is basically calculate the responses received over a period of time.
I.E
07/07/2019 | 6
08/07/2019 | 7
And plot the above to a graph.
But the current data is in the below format:
07/07/2019 17:33:07
07/07/2019 12:00:03
08/07/2019 21:10:05
08/07/2019 20:06:09
So far,
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('survey_results_public.csv')
df.head()
df['Timestamp'].value_counts().plot(kind="bar")
plt.show()
But the above doesn't look good.
You are counting all values in the timestamp column so you will have 1 response per timestamp.
You should parse the timestamp column, check the unique dates and then count the number of timestamps that belong to each date.
Only then should you plot the data.
So do something like this:
import pandas as pd
import datetime
def parse_timestamps(timestamp):
    # Parse day/month/year timestamps such as "07/07/2019 17:33:07"
    return datetime.datetime.strptime(timestamp, '%d/%m/%Y %H:%M:%S')
df = pd.read_csv('survey_results_public.csv')
df["Date"]=df["Timestamp"].map(lambda t: parse_timestamps(t).date())
df["Date"].value_counts().plot(kind="bar")
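The same count can be written without a helper function by letting pandas parse the whole column at once. This is a minimal sketch using the four sample timestamps from the question, assuming the day/month/year format; sort_index() is added because value_counts() orders by frequency rather than by date.

```python
import datetime

import pandas as pd

# The four sample timestamps from the question
df = pd.DataFrame({"Timestamp": [
    "07/07/2019 17:33:07", "07/07/2019 12:00:03",
    "08/07/2019 21:10:05", "08/07/2019 20:06:09",
]})

# Parse the whole column in one call, then keep only the calendar date
parsed = pd.to_datetime(df["Timestamp"], format="%d/%m/%Y %H:%M:%S")
per_day = parsed.dt.date.value_counts().sort_index()
print(per_day)
```

per_day.plot(kind="bar") then gives one bar per day in chronological order.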
I want to plot a line graph of ECG in mV against time in HH:MM:SS:MMM. It's a 10-second ECG strip.
Image of the ECG CSV file with two ECG values and time
I have extracted the Time column, and now I want to convert it to datetime in a pandas DataFrame and then plot it on a graph,
but when I apply the to_datetime() function it gives me the following error:
to assemble mappings requires at least that [year, month, day] be
specified: [day,month,year] is missing
Screenshot of the error I get
Please help me resolve this error. I only want to use %H:%M:%S.%f because I do not have the year, month, and day.
As commented you can add a date to those times. The date can be arbitrary. Then you can convert to datetime and use them to plot your graph.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
data = {'Time':['11:20:15.333','12:00:00.444', '13:46:00.100'],
        'A':[1,3,2],'B':[5,5,4]}
df = pd.DataFrame(data=data)
df["Date"] = "2019-09-09"
df['Datetime'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['Time'])
df = df[["Datetime", "A", "B"]].set_index("Datetime")
ax = df.plot(x_compat=True)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S.%f"))
plt.show()
See to_timedelta
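For a short strip it can also be convenient to plot against seconds elapsed since the first sample instead of clock time, which avoids the arbitrary date altogether. This is a minimal sketch with made-up values, assuming the time strings use a dot before the milliseconds (HH:MM:SS.mmm); the column names ECG1 and Elapsed_s are hypothetical.

```python
import pandas as pd

# Made-up samples; 'ECG1' stands in for one of the ECG columns
df = pd.DataFrame({
    "Time": ["11:20:15.000", "11:20:15.500", "11:20:17.250"],
    "ECG1": [0.10, 0.35, -0.05],
})

# Durations since midnight, then offset so the first sample is at 0 seconds
t = pd.to_timedelta(df["Time"])
df["Elapsed_s"] = (t - t.iloc[0]).dt.total_seconds()
print(df)
```

df.plot(x="Elapsed_s", y="ECG1") would then draw the trace with a plain numeric time axis.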
Data :
df1 = {'Time':['11:20:15.333','10:00:00.444'],'P1':['102','102'],'P2':['240','247']}
df1 = pd.DataFrame(data=df1)
df1
Code :
df1['Time'] = pd.to_timedelta(df1['Time'])
df1
Result:
             Time   P1   P2
0 11:20:15.333000  102  240
1 10:00:00.444000  102  247
Reference : https://stackoverflow.com/a/46801500/1855988
You need to specify the format of the time strings you are converting. The Python strftime/strptime format codes documentation explains what each symbol means.
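# before: the column has dtype object (strings)
df['column_name'] = pd.to_datetime(df['column_name'], format="%H:%M:%S,%f")
# newer pandas versions require an explicit unit such as [ns]
df = df.astype('datetime64[ns]')
df['column_name'] = pd.to_datetime(df['column_name'], format='%H:%M:%S', errors='coerce').dt.time
# after .dt.time the column has dtype object again (datetime.time values)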
print(df.head())
# print(df.info())
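To make the effect of a time-only format concrete, here is a minimal sketch with made-up values, assuming a dot before the fractional seconds. When the format contains no date fields, pandas fills in 1900-01-01 as the date; .dt.time then strips it again and leaves datetime.time objects.

```python
import datetime

import pandas as pd

times = pd.Series(["11:20:15.333", "12:00:00.444"])

# Time-only format: the date part defaults to 1900-01-01
parsed = pd.to_datetime(times, format="%H:%M:%S.%f")
print(parsed)

# .dt.time drops the placeholder date; the result has dtype object
only_time = parsed.dt.time
print(only_time)
```

The datetime64 version (parsed) is the one matplotlib can place on a time axis; the datetime.time version is mainly useful for display.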