A future year appears on the axis when plotting (pandas && matplotlib) - python

My problem is that when I plot the number of users joining per day, a future year appears on the axis; the year 2023 should not be there. I searched my CSV file and no row holds a 2023 value.
data = pd.read_csv('users-current.csv')
#transform datetime to date
data['dateCreated'] = pd.to_datetime(data['created_on']).dt.date
#date Count Registered
dataCreated = data.groupby('dateCreated').size()
#dataCreatedArray = np.array([dataCreated], dtype = object)
dataCreated.head(50)
dataCreated.plot().invert_xaxis()
plt.title('Users Joining in a Day',pad=20, fontdict={'fontsize':24})
plt.show()
The output: [image not included]
The column in my CSV used for the plot: [image not included]

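This happens because matplotlib generates the x-axis range automatically: it typically pads the data range and places ticks at round values, so a tick labelled 2023 can appear even though no row falls in 2023. Instead, you can set the x range explicitly with plt.xlim(), as follows: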
import pandas as pd
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('users-current.csv')
#transform datetime to date
data['dateCreated'] = pd.to_datetime(data['created_on']).dt.date
#date Count Registered
dataCreated = data.groupby('dateCreated').size()
#dataCreatedArray = np.array([dataCreated], dtype = object)
dataCreated.head(50)
dataCreated.plot().invert_xaxis()
# import datetime, and use this code to set a period as you want.
plt.xlim([datetime.date(2021, 1, 1), datetime.date(2022, 12, 31)])
plt.title('Users Joining in a Day', pad=20, fontdict={'fontsize':24})
plt.show()
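As a variation on the answer above, the limits can be taken from the data instead of being hard-coded. This is only a minimal sketch: the small DataFrame is a made-up stand-in for users-current.csv, and the `Agg` backend is selected so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for users-current.csv: one row per registered user.
data = pd.DataFrame({
    "created_on": ["2021-03-01 10:00:00", "2021-03-01 12:30:00",
                   "2022-07-15 08:45:00", "2022-11-30 23:10:00"],
})

# Keep real datetimes (normalized to midnight) instead of datetime.date objects.
data["dateCreated"] = pd.to_datetime(data["created_on"]).dt.normalize()
dataCreated = data.groupby("dateCreated").size()

# First and last dates that actually occur in the data.
first = dataCreated.index.min()
last = dataCreated.index.max()

fig, ax = plt.subplots()
ax.plot(dataCreated.index.to_numpy(), dataCreated.to_numpy())
# Clamp the axis to the observed range, then flip it as in the question.
ax.set_xlim(first.to_pydatetime(), last.to_pydatetime())
ax.invert_xaxis()
ax.set_title("Users Joining in a Day", pad=20, fontdict={"fontsize": 24})
```

With the limits tied to `first` and `last`, the axis cannot extend into a year that has no data, whatever period the file covers.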

Related

Bar graph drawing using month from date in pandas

Need to draw a bar chart using below data set. X axis needs to be Territory and Y axis needs to be average production in each territory and hue needs to contain the month from the date column.
Not exactly sure what you are asking. When you say average production, do you want to calculate average production from a Territory, or just display the value that is in the production column? If you clarify I can update my answer. In my example I just display the data from the production column. First export your spreadsheet to csv. Then you can do the following:
import calendar
import datetime
import pandas as pd
import plotly.express as ex
df = pd.read_csv("data.csv")
def get_month_names(dataframe: pd.DataFrame):
    # Get all the dates
    dates = dataframe["Date"].to_list()
    # Convert date-string to datetime object
    # I assume month/day/year, if it is day/month/year, swap %m and %d
    date_objs = [datetime.datetime.strptime(date, "%m/%d/%Y %H:%M:%S") for date in dates]
    # Get all the months
    months = [date.month for date in date_objs]
    # Get the names of the months
    month_names = [calendar.month_name[month] for month in months]
    return month_names
fig = ex.bar(x=df["Territory"],
             y=df["Production"],
             color=get_month_names(df))
fig.show()
This produces: [image not included]
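If the average per territory and month is what is wanted, it could be computed before plotting. The sketch below assumes the same column names as the answer (Territory, Date, Production) and a month/day/year date format; the rows themselves are invented for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for data.csv.
df = pd.DataFrame({
    "Territory": ["North", "North", "North", "South", "South"],
    "Date": ["01/05/2021 00:00:00", "01/20/2021 00:00:00", "02/03/2021 00:00:00",
             "01/11/2021 00:00:00", "02/14/2021 00:00:00"],
    "Production": [10.0, 20.0, 30.0, 40.0, 60.0],
})

# Parse the date strings (month/day/year assumed) and take the month name.
df["Month"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M:%S").dt.month_name()

# Average production for each (Territory, Month) pair.
avg = df.groupby(["Territory", "Month"], as_index=False)["Production"].mean()
print(avg)
```

The aggregated frame can then be handed to plotly, for example `ex.bar(avg, x="Territory", y="Production", color="Month", barmode="group")`, so each territory gets one bar per month.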

How to highlight a plotline chart with vertical color bar for specific weekdays (saturday and sunday)? [duplicate]

This question already has answers here:
how to highlight weekends for time series line plot in python
(3 answers)
Closed 2 years ago.
I plotted a daily line plot for flights and would like to highlight all the Saturdays and Sundays. I'm trying to do it with axvspan but I'm struggling to use it. Any suggestions on how this can be coded?
(flights.loc[flights['date'].dt.month.between(1, 2), 'date']
.dt.to_period('D')
.value_counts()
.sort_index()
.plot(kind="line",figsize=(12,6))
)
Thanks in advance for any help.
With a date column of type pandas Timestamp, you can get the weekday of a date directly using pandas.Timestamp.weekday. Then you can use df.iterrows() to check whether each date is a Saturday or Sunday and add a shape to the figure like this:
for index, row in df.iterrows():
    if row['date'].weekday() == 5 or row['date'].weekday() == 6:
        fig.add_shape(...)
With a setup like this, you would get a line marking each date that is a Saturday or Sunday. But since you're dealing with a continuous time series, it probably makes more sense to shade the whole period as an area instead of highlighting each individual day. So just identify each Saturday and let the shaded period run from that Saturday to the Saturday plus pd.DateOffset(1):
Complete code with sample data
# imports
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import datetime
pd.set_option('display.max_rows', None)
# data sample
cols = ['signal']
nperiods = 20
np.random.seed(12)
df = pd.DataFrame(np.random.randint(-2, 2, size=(nperiods, len(cols))),
                  columns=cols)
datelist = pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=nperiods).tolist()
df['date'] = datelist
df = df.set_index(['date'])
df.index = pd.to_datetime(df.index)
df.iloc[0] = 0
df = df.cumsum().reset_index()
df['signal'] = df['signal'] + 100
# plotly setup
fig = px.line(df, x='date', y=df.columns[1:])
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
for index, row in df.iterrows():
    if row['date'].weekday() == 5: #or row['date'].weekday() == 6:
        fig.add_shape(type="rect",
                      xref="x",
                      yref="paper",
                      x0=row['date'],
                      y0=0,
                      # x1=row['date'],
                      x1=row['date'] + pd.DateOffset(1),
                      y1=1,
                      line=dict(color="rgba(0,0,0,0)",width=3,),
                      fillcolor="rgba(0,0,0,0.1)",
                      layer='below')
fig.show()
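The Saturday lookup does not need iterrows(); a boolean mask on dt.weekday gives the same dates. The sketch below only computes the spans and leaves the drawing out. Ending each span on the following Monday at 00:00 is a choice made here so that both weekend days are covered; the answer above stops one day after the Saturday.

```python
import pandas as pd

# Hypothetical daily dates, same range as the sample data above.
dates = pd.Series(pd.date_range("2020-01-01", periods=20, freq="D"))

# dt.weekday counts from Monday = 0, so Saturday is 5.
saturdays = dates[dates.dt.weekday == 5]

# One (start, end) pair per weekend: Saturday 00:00 up to Monday 00:00.
weekend_spans = [(sat, sat + pd.Timedelta(days=2)) for sat in saturdays]

for start, end in weekend_spans:
    print(start.date(), "->", end.date())
```

Each pair can then be passed as `x0` and `x1` to `fig.add_shape(...)` exactly as in the loop above.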
You can use pandas' dt.weekday to get an integer for the weekday of a given date: 5 is Saturday and 6 is Sunday (Monday is 0). You can use this as an additional condition to slice your dataframe and keep only the entries that fall on a Saturday or Sunday. As you mentioned, they can be highlighted with axvspan, and matplotlib versions >3 accept datetime objects as input. One day has to be added via datetime.timedelta, because no rectangle is drawn if xmin=xmax.
Here is the code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime
#create sample data and dataframe
datelist = pd.date_range(start="2014-12-09",end="2015-03-02").tolist()
datelist += datelist #test to see whether it works with multiple entries having the same date
flights = pd.DataFrame(datelist, columns=["date"])
#plot command, save object in variable
plot = flights.loc[flights['date'].dt.month.between(1, 2), 'date'].dt.to_period('D').value_counts().sort_index().plot(kind="line",figsize=(12,6))
#filter out saturdays and sundays from the date range needed
weekends = flights.loc[(flights['date'].dt.month.between(1, 2)) & ((flights['date'].dt.weekday == 5) | (flights['date'].dt.weekday == 6)), 'date']
#5 = Saturday, 6 = Sunday
#plot axvspan for every sat or sun, set() to get unique dates
for day in set(weekends.tolist()):
    plot.axvspan(day, day + datetime.timedelta(days=1))
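A compact variant of the same idea, written against a plain matplotlib Axes with a DatetimeIndex rather than a period axis. This is a sketch under assumptions: the flights data is generated, the `alpha` and `color` values are arbitrary, and the `Agg` backend is selected so it runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: two flights per day for January and February 2015.
dates = pd.date_range("2015-01-01", "2015-02-28", freq="D")
flights = pd.DataFrame({"date": dates.append(dates)})

# Flights per day, indexed by a DatetimeIndex.
counts = flights.groupby("date").size()

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(counts.index.to_numpy(), counts.to_numpy())

# weekday >= 5 selects Saturdays (5) and Sundays (6).
weekend_days = counts.index[counts.index.weekday >= 5]

# Shade each weekend day as a translucent band one day wide.
spans = []
for day in weekend_days:
    start = day.to_pydatetime()
    spans.append(ax.axvspan(start, start + datetime.timedelta(days=1),
                            alpha=0.2, color="gray"))
```

Setting `alpha` keeps the line visible through the shaded bands, which the default opaque fill would hide.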

How to add month column to a date column in python?

date['Maturity_date'] = data.apply(lambda data: relativedelta(months=int(data['TRM_LNTH_MO'])) + data['POL_EFF_DT'], axis=1)
Tried this also:
date['Maturity_date'] = date['POL_EFF_DT'] + date['TRM_LNTH_MO'].values.astype("timedelta64[M]")
TypeError: 'type' object does not support item assignment
import pandas as pd
import datetime
#Convert the date column to date format
date['date_format'] = pd.to_datetime(date['Maturity_date'])
#Add a month column
date['Month'] = date['date_format'].apply(lambda x: x.strftime('%b'))
If you are using pandas, you can also use its frequency aliases with pd.date_range:
# periods=2 asks for two dates at the month-end frequency 'M'
# (newer pandas releases spell this alias 'ME').
import pandas as pd
_new_period = pd.date_range(_existing_date, periods=2, freq='M')
The second element returned is then the later date. Note that 'M' is a month-end frequency, so both elements are month-end dates; element 0 equals the existing date only if that date is itself a month end:
# Index 1 is the month end that follows the one at index 0.
_new_period.strftime('%Y-%m-%d')[1]
# strftime accepts other formats too: only the year, month, or day, for example.
See the pandas documentation on offset aliases for further information.
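For adding a per-row number of months to a date column, pd.DateOffset(months=n) is one option. The sketch below uses the column names from the question on an invented frame. As a side note, the TypeError in the question suggests that `date` was bound to a class (for example through `from datetime import date`) rather than to a DataFrame; that is a guess, since the imports are not shown.

```python
import pandas as pd

# Hypothetical policy data; column names taken from the question.
df = pd.DataFrame({
    "POL_EFF_DT": pd.to_datetime(["2020-01-15", "2020-01-31", "2021-06-01"]),
    "TRM_LNTH_MO": [12, 1, 6],
})

# Add each row's term length (in months) to its effective date.
# DateOffset clips to the last valid day when the target month is shorter.
df["Maturity_date"] = [
    start + pd.DateOffset(months=int(months))
    for start, months in zip(df["POL_EFF_DT"], df["TRM_LNTH_MO"])
]
print(df)
```

Unlike a fixed 30-day timedelta, a month offset keeps the day of the month where possible, so 2020-01-15 plus 12 months lands on 2021-01-15.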

Not able to display time on X-Axis using Matplotlib

I am reading some data from the file, and then I have to plot this data for visual representation.
The data in the file is present in the following format:
16:08:45,64,31.2
16:08:46,60,29.3
16:08:47,60,29.3
16:08:48,60,29.3
16:08:49,60,29.3
.
.
This data is stored in a file named with the current date.
Each line of the file consists of the time (Hour:Minute:Second), the ADC count, and the temperature value.
I used the following code to read the data from the file using pandas.
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
year = 2018 # For 2018
'''
month = input ('Enter Month : ')
date = input ('Enter Date : ')
month = int(month)
date = int(date)
'''
# Hardcoded Values for Testing Purpose
month = 1
date = 20
headers = ['Time', 'ADC', 'Temperature']
filename = '%.2d-%.2d-%.2d.txt' % (month, date, year-2000)
print (filename)
try:
    df = pd.read_table( filename, ',', names=headers,\
                        engine='python', header=None)
except:
    print ('No Such File in Database')
    print ('Exiting Program')
    exit()
FMT = '%H:%M:%S'
df['Time'] = df['Time'].map(lambda x: datetime.strptime(str(x), FMT))
df['Time'] = df['Time'].map(lambda x: x.replace(day=date, month=month, year=year))
plt.plot( df['Time'], df['ADC'])
plt.ylim( [0,200])
#plt.gcf().autofmt_xdate()
plt.show()
I don't understand why the x-axis doesn't show the correct values.
Is it because the samples are too close together (1 second apart)?
I only want time information on the x-axis.
Please suggest how I can get that.
Thanks in advance.
Update:
From the comments, I was able to find the reason why this is happening.
Pandas treats my date and time as Timestamp objects, while for plotting with matplotlib they should be datetime objects.
My problem would be solved if I could convert df['Time'] from Timestamp to datetime.
I searched online and found that pd.to_datetime should do this for me, but unfortunately it doesn't work.
The following commands show what I have done.
>>> x = pd.Timestamp( "2018-01-20")
>>> x
Timestamp('2018-01-20 00:00:00')
>>> pd.to_datetime( x, format="%Y-%m-%d")
Timestamp('2018-01-20 00:00:00')
As you can see above, I still get a Timestamp object.
I searched again to check why it is not working, and found that the following line works.
>>> x = pd.Timestamp( "2018-01-20")
>>> y = pd.to_datetime( x, format="%Y-%m-%d")
>>> y.to_datetime()
Warning (from warnings module):
File "D:\Program Files\Python36\lib\idlelib\run.py", line 457
exec(code, self.locals)
FutureWarning: to_datetime is deprecated. Use self.to_pydatetime()
datetime.datetime(2018, 1, 20, 0, 0)
>>>
To remove this warning following commands can be used.
>>> x = pd.Timestamp( "2018-01-20")
>>> y = pd.to_datetime( x, format="%Y-%m-%d")
>>> y.to_pydatetime()
datetime.datetime(2018, 1, 20, 0, 0)
Now I tested these command in my project to see if this is working with my dataframe or not, I just took one value for testing.
>>> x = df['Time'][0].to_pydatetime()
>>> x
datetime.datetime(2018, 1, 20, 16, 8, 45)
So, yes it is working, now I have to apply this on complete column of dataframe and I used the following command.
>>> df['New'] = df['Time'].apply(lambda x: x.to_pydatetime())
>>> df['New'][0]
Timestamp('2018-01-20 16:08:45')
But it's still the same.
I am a newbie, so I can't tell what I am doing wrong.
First we need a minimal, verifiable example:
u = u"""16:08:45,64,31.2
16:08:46,54,29.3
16:08:47,36,34.3
16:08:48,67,36.3
16:08:49,87,29.3"""
import io
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
year = 2018
month = 1
date = 20
headers = ['Time', 'ADC', 'Temperature']
df = pd.read_table(io.StringIO(u), ',', names=headers,\
                   engine='python', header=None)
FMT = '%H:%M:%S'
df['Time'] = df['Time'].map(lambda x: datetime.strptime(str(x), FMT))
df['Time'] = df['Time'].map(lambda x: x.replace(day=date, month=month, year=year))
plt.plot( df['Time'], df['ADC'])
plt.ylim( [20,100])
plt.gcf().autofmt_xdate()
plt.show()
Running this code with pandas 0.20.1 and matplotlib 2.1, the following plot is produced, which looks as desired:
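The reason this works, even though the dates are pandas Timestamps, is that matplotlib uses the pandas converters internally if they are available.
If not, one may first try to load them manually (newer pandas versions expose this publicly as pandas.plotting.register_matplotlib_converters()):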
import pandas.plotting._converter as pandacnv
pandacnv.register()
If this also fails, one may indeed try to convert the timestamps to datetime objects.
dt = [x.to_pydatetime() for x in df['Time']]
plt.plot( dt, df['ADC'])
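To show only the time of day on the x-axis, as the question asks, a DateFormatter from matplotlib.dates can be attached to the axis. The sketch below reuses the sample rows from this answer; reading with pd.read_csv and building the timestamp by string concatenation are choices made here, not part of the original code. The column staying a Timestamp after apply() in the question is expected, since pandas keeps datetimes in a datetime64 column and hands back Timestamp objects on access.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

raw = """16:08:45,64,31.2
16:08:46,54,29.3
16:08:47,36,34.3
16:08:48,67,36.3
16:08:49,87,29.3"""

df = pd.read_csv(io.StringIO(raw), names=["Time", "ADC", "Temperature"])

# Prefix the known date to each time string and parse in one step.
df["Time"] = pd.to_datetime("2018-01-20 " + df["Time"], format="%Y-%m-%d %H:%M:%S")

# Plain datetime objects, so no pandas converter is required.
times = [t.to_pydatetime() for t in df["Time"]]

fig, ax = plt.subplots()
ax.plot(times, df["ADC"].to_numpy())
ax.set_ylim(20, 100)

# Label the ticks with hour:minute:second only.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
fig.autofmt_xdate()
```

The formatter only changes how tick labels are rendered; the underlying x values are still full datetimes, so zooming and limits behave as before.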

Extract date from Pandas DataFrame

I want to download adjusted close prices and their corresponding dates from yahoo, but I can't seem to figure out how to get dates from pandas DataFrame.
I was reading an answer to this question
from pandas.io.data import DataReader
from datetime import datetime
goog = DataReader("GOOG", "yahoo", datetime(2000,1,1), datetime(2012,1,1))
print goog["Adj Close"]
and this part works fine; however, I need to extract the dates that correspond to the prices.
For example:
adj_close = np.array(goog["Adj Close"])
Gives me a 1-D array of adjusted closing prices, I am looking for 1-D array of dates, such that:
date = # what do I do?
adj_close[0] corresponds to date[0]
When I do:
>>> goog.keys()
Index([Open, High, Low, Close, Volume, Adj Close], dtype=object)
I see that none of the keys will give me anything similar to the date, but I think there has to be a way to create an array of dates. What am I missing?
You can get it with goog.index, which is stored as a DatetimeIndex.
To get a Series of dates, you can do
goog.reset_index()['Date']
import numpy as np
import pandas as pd
from pandas.io.data import DataReader
symbols_list = ['GOOG','IBM']
d = {}
for ticker in symbols_list:
    d[ticker] = DataReader(ticker, "yahoo", '2014-01-01')
pan = pd.Panel(d)
df_adj_close = pan.minor_xs('Adj Close') #also use 'Open','High','Low','Adj Close' and 'Volume'
#the dates of the adjusted closes from the dataframe containing adjusted closes on multiple stocks
df_adj_close.index
# create a dataframe that has data on only one stock symbol
df_individual = pan.get('GOOG')
# the dates from the dataframe of just 'GOOG' data
df_individual.index
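Since pandas.io.data and pd.Panel are no longer part of current pandas releases, the sketch below builds a small frame by hand to show how the dates line up with the prices. The prices are invented placeholders, and the index name "Date" is assumed to match the downloaded data.

```python
import pandas as pd

# Hypothetical prices standing in for the downloaded frame.
goog = pd.DataFrame(
    {"Adj Close": [100.0, 101.5, 99.8]},
    index=pd.DatetimeIndex(["2012-01-03", "2012-01-04", "2012-01-05"], name="Date"),
)

# 1-D array of prices, as in the question.
adj_close = goog["Adj Close"].to_numpy()

# 1-D array of dates in the same order (numpy datetime64 values).
date = goog.index.to_numpy()

# The same dates as datetime.datetime objects, if those are preferred.
py_dates = goog.index.to_pydatetime()

print(date[0], adj_close[0])
```

Because both arrays come from the same frame, `adj_close[i]` always corresponds to `date[i]`.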
