Trying to merge dataFrame

Please have a look at the picture.
I've been trying to solve this problem on my own, but unfortunately I haven't been successful.
I basically want to have the following format:
        close1  close2
date    x       x
        x       x
***code***
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Series, DataFrame
import ystockquote
#eon
his1 = ystockquote.get_historical_prices('EOAN.DE', '2013-01-01', '2013-01-10')
eon = DataFrame(his1)
close_eon = eon.ix["Close"]
#RWE
his2 = ystockquote.get_historical_prices('RWE.DE', '2013-01-01', '2013-01-10')
rwe = DataFrame(his2)
close_rwe = rwe.ix["Close"]
fig = plt.figure(); ax = fig.add_subplot(1,1,1)
ax.plot(close_eon)
ax.plot(close_rwe)
plt.show()
eonrwe = eon.append(rwe)

Can you concatenate the close series?
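import datetime
import matplotlib.pyplot as plt
import pandas as pd
# pandas.io.data was moved out of pandas into the separate pandas-datareader package
import pandas_datareader.data as web
# pd.datetime was removed in pandas 2.0; use the standard library datetime instead
start = datetime.datetime(2013, 1, 1)
end = datetime.datetime(2013, 1, 10)
eon = web.DataReader("EOAN.DE", 'yahoo', start, end)
rwe = web.DataReader("RWE.DE", 'yahoo', start, end)
# The dict keys become the column names; axis=1 aligns the two series on the date index
closes = pd.concat({"eon": eon["Close"], "rwe": rwe["Close"]}, axis=1)
closes.plot()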
Gives
eon rwe
2013-01-01 14.09 31.24
2013-01-02 14.35 31.61
2013-01-03 14.40 31.53
2013-01-04 14.51 31.90
2013-01-07 14.26 30.93
2013-01-08 14.22 30.95
2013-01-09 14.40 31.30
2013-01-10 14.35 30.70
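The concat step can also be tried without downloading anything. The following is a minimal sketch, assuming two hand-built series that stand in for eon["Close"] and rwe["Close"]; the values are the first three rows of the output above, and the variable names close_eon and close_rwe are only illustrative.

```python
import pandas as pd

# Hand-built stand-ins for eon["Close"] and rwe["Close"];
# the values are copied from the first three rows of the output above.
dates = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-03"])
close_eon = pd.Series([14.09, 14.35, 14.40], index=dates)
close_rwe = pd.Series([31.24, 31.61, 31.53], index=dates)

# The dict keys become the column names; axis=1 places the series
# side by side, aligned on the shared date index.
closes = pd.concat({"eon": close_eon, "rwe": close_rwe}, axis=1)
print(closes)
```

Because the alignment happens on the index, dates present in only one series would show up as NaN in the other column rather than raising an error.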

Related

seaborn : plotting time on x-axis

I'm working with a dataset that only contains datetime objects and I have retrieved the day of the week and reformatted the time in a separate column like this (conversion functions included below):
               datetime day_of_week time_of_day
0   2021-06-13 12:56:16      Sunday    20:00:00
5   2021-06-13 12:56:54      Sunday    20:00:00
6   2021-06-13 12:57:27      Sunday    20:00:00
7   2021-07-16 18:55:42      Friday    20:00:00
8   2021-07-16 18:56:03      Friday    20:00:00
9   2021-06-04 18:42:06      Friday    20:00:00
10  2021-06-04 18:49:05      Friday    20:00:00
11  2021-06-04 18:58:22      Friday    20:00:00
What I would like to do is create a kde plot with x-axis = time_of_day (spanning 00:00:00 to 23:59:59), y-axis to be the count of each day_of_week at each hour of the day, and hue = day_of_week. In essence, I'd have seven different distributions representing occurrences during each day of the week.
Here's a sample of the data and my code. Any help would be appreciated:
df = pd.DataFrame([
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:54',
'2021-06-13 12:56:54',
'2021-06-13 12:57:27',
'2021-07-16 18:55:42',
'2021-07-16 18:56:03',
'2021-06-04 18:42:06',
'2021-06-04 18:49:05',
'2021-06-04 18:58:22',
'2021-06-08 21:31:44',
'2021-06-09 02:14:30',
'2021-06-09 02:20:19',
'2021-06-12 18:05:47',
'2021-06-15 23:46:41',
'2021-06-15 23:47:18',
'2021-06-16 14:19:08',
'2021-06-17 19:08:17',
'2021-06-17 22:37:27',
'2021-06-21 23:31:32',
'2021-06-23 20:32:09',
'2021-06-24 16:04:21',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-08-31 21:38:07',
'2020-08-31 21:38:22',
'2020-08-31 21:38:42',
'2020-08-31 21:39:03',
], columns=['datetime'])
def convert_date(date):
    return calendar.day_name[date.weekday()]
def convert_hour(time):
    return time[:2]+':00:00'
df['day_of_week'] = pd.to_datetime(df['datetime']).apply(convert_date)
df['time_of_day'] = df['datetime'].astype(str).apply(convert_hour)

Let's try:
1. Convert the datetime column with to_datetime.
2. Create a Categorical column from the day_of_week codes (so that categorical ordering works correctly).
3. Normalize time_of_day to a single day (so that comparisons work correctly). This makes it look as if all events occurred on the same day, which makes the plotting logic much simpler.
4. Plot the kdeplot.
5. Set the x-axis formatter to display only HH:MM:SS.
import calendar
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt, dates as mdates
# df = pd.DataFrame({...})
# Convert to datetime
df['datetime'] = pd.to_datetime(df['datetime'])
# Create Categorical Column
cat_type = pd.CategoricalDtype(list(calendar.day_name), ordered=True)
df['day_of_week'] = pd.Categorical.from_codes(
    df['datetime'].dt.day_of_week, dtype=cat_type
)
# Create Normalized Date Column
df['time_of_day'] = pd.to_datetime('2000-01-01 ' +
                                   df['datetime'].dt.time.astype(str))
# Plot
ax = sns.kdeplot(data=df, x='time_of_day', hue='day_of_week')
# X axis format
ax.set_xlim([pd.to_datetime('2000-01-01 00:00:00'),
             pd.to_datetime('2000-01-01 23:59:59')])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
plt.tight_layout()
plt.show()
Note that the sample size here is small.
If you want counts on the y-axis, histplot may be a better fit:
ax = sns.histplot(data=df, x='time_of_day', hue='day_of_week')
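If a plain table of counts is enough, the numbers behind such a histogram with one-hour bins can be sketched with pd.crosstab. This is an addition rather than part of the answer above; it uses six timestamps from the question's sample, and the names stamps and counts are illustrative.

```python
import pandas as pd

# Six timestamps taken from the question's sample data.
stamps = pd.to_datetime(pd.Series([
    "2021-06-13 12:56:16", "2021-06-13 12:56:54", "2021-06-13 12:57:27",
    "2021-07-16 18:55:42", "2021-07-16 18:56:03", "2021-06-04 18:42:06",
]))

# Rows: hour of the day; columns: weekday name; cells: number of events.
counts = pd.crosstab(
    stamps.dt.hour, stamps.dt.day_name(),
    rownames=["hour"], colnames=["day_of_week"],
)
print(counts)
```

Only the hours and weekdays that actually occur appear in the table; reindexing the rows with range(24) would add the empty hours if they are needed.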

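I would use the pandas Timestamp directly. By the way, your convert_hour function looks wrong: it returns 20:00:00 as time_of_day for every row, because time[:2] takes the first two characters of the full timestamp string (the "20" of the year), not the hour.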
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_context("paper", font_scale=2)
sns.set_style('whitegrid')
df['day_of_week'] = df['datetime'].apply(lambda x: pd.Timestamp(x).day_name())
df['time_of_day'] = df['datetime'].apply(lambda x: pd.Timestamp(x).hour)
# `days` was not defined in the original snippet; assumed here to be the weekdays present in the data
days = df['day_of_week'].unique()
plt.figure(figsize=(8, 4))
for idx, day in enumerate(days):
    sns.kdeplot(df[df.day_of_week == day]['time_of_day'], label=day)
plt.legend()
The KDE for Wednesday looks a bit strange because its times range from 2 to 20, hence the long tails from about -20 to 40 in the plot.
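The convert_hour problem mentioned above can be reproduced with the standard library alone. Here is a minimal sketch, assuming the timestamps always have the form YYYY-MM-DD HH:MM:SS as in the question; the function names mirror the question's, but the bodies are a suggested fix, not the asker's code.

```python
import calendar
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the question's strings

def convert_date(stamp: str) -> str:
    """Return the weekday name for a timestamp string."""
    parsed = datetime.strptime(stamp, STAMP_FORMAT)
    return calendar.day_name[parsed.weekday()]

def convert_hour(stamp: str) -> str:
    """Return the hour of the timestamp, truncated to HH:00:00."""
    parsed = datetime.strptime(stamp, STAMP_FORMAT)
    return parsed.strftime("%H:00:00")

stamp = "2021-06-13 12:56:16"
print(stamp[:2] + ":00:00")   # original logic: slices the year, gives 20:00:00
print(convert_hour(stamp))    # parsed hour: 12:00:00
print(convert_date(stamp))    # Sunday
```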

Here is a simple version using df.plot.kde.
I added more data so that each day_of_week has multiple values for the KDE to plot, and simplified the code by removing the functions.
df1 = pd.DataFrame([
'2020-09-01 16:39:03',
'2020-09-02 16:39:03',
'2020-09-03 16:39:03',
'2020-09-04 16:39:03',
'2020-09-05 16:39:03',
'2020-09-06 16:39:03',
'2020-09-07 16:39:03',
'2020-09-08 16:39:03',
], columns=['datetime'])
df = pd.concat([df,df1]).reset_index(drop=True)
df['day_of_week'] = pd.to_datetime(df['datetime']).dt.day_name()
df['time_of_day'] = df['datetime'].str.split(expand=True)[1].str.split(':',expand=True)[0].astype(int)
df.pivot(columns='day_of_week').time_of_day.plot.kde()
Plots:

Adding new column to data frame with values based on years of data frame

I have a dataframe
import pandas_datareader as webreader
import math
import numpy as np
import pandas as pd
from datetime import date, timedelta, datetime
from pandas.plotting import register_matplotlib_converters
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from sklearn.metrics import mean_absolute_error, mean_squared_error
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras.callbacks import EarlyStopping
from sklearn.preprocessing import RobustScaler
stockname = 'INFOSYS'
symbol = 'INFY.NS'
df = webreader.DataReader(symbol, start=date_start, end=date_today, data_source="yahoo")
print(df)
The output is a price dataframe indexed by date.
I want to add a column "deb/eq" to this dataframe, with values that depend on the year.
For example:
for all dates in 2020, all deb/eq values should be 0.00
for all dates in 2019, all deb/eq values should be 1.00
for all dates in 2018 and earlier, all deb/eq values should be 2.00
How can I add a new column whose values are based on the year of the dataframe's index?

Use Index.map, then set the unmatched values (NaN) to 2.0 with Index.fillna:
df['deb/eq'] = df.index.year.map({2020:0.0, 2019:1.0}).fillna(2.0)
With numpy.select:
y = df.index.year
df['deb/eq'] = np.select([y == 2020, y == 2019], [0.0, 1.0], default=2.0)
EDIT:
df = pd.DataFrame(index=pd.date_range('2003', '2021', freq='A'))
df['ATR'] = df.index.year.map({2021:91.45, 2020:97.53, 2019:92.62, 2018:81.83, 2017:74.21, 2016:74.22, 2015:76.52, 2014:84.11, 2013:85.44, 2012:87.26, 2011:87.97, 2010:81.10, 2009:95.80, 2008:90.86, 2007:101.25, 2006:99.05, 2005:104.12, 2004:92.67}).fillna(0.0)
print (df)
ATR
2003-12-31 0.00
2004-12-31 92.67
2005-12-31 104.12
2006-12-31 99.05
2007-12-31 101.25
2008-12-31 90.86
2009-12-31 95.80
2010-12-31 81.10
2011-12-31 87.97
2012-12-31 87.26
2013-12-31 85.44
2014-12-31 84.11
2015-12-31 76.52
2016-12-31 74.22
2017-12-31 74.21
2018-12-31 81.83
2019-12-31 92.62
2020-12-31 97.53
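Both variants can be checked side by side on a tiny frame. The sketch below assumes a DatetimeIndex like the one in the question; the Close values are dummy placeholders and the column names via_map and via_select are illustrative.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the price frame; the Close values are dummy placeholders.
idx = pd.to_datetime(["2017-06-30", "2018-06-30", "2019-06-30", "2020-06-30"])
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=idx)

years = df.index.year

# Variant 1: look up each year in a dict; years not in the dict become NaN,
# which fillna then replaces with the default 2.0.
df["via_map"] = years.map({2020: 0.0, 2019: 1.0}).fillna(2.0)

# Variant 2: the same rule expressed as conditions with an explicit default.
df["via_select"] = np.select([years == 2020, years == 2019], [0.0, 1.0], default=2.0)

print(df)
```

The dict version is shorter when there are many years; np.select is handy when the conditions are ranges rather than exact values.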

Cannot plot predicted time series values using matplotlib

I am trying to plot my actual time series values and predicted values but it gives me this error:
ValueError: view limit minimum -36816.95989583333 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
I am using statsmodels to fit an arima model to the data.
This is a sample of my data:
datetime value
2017-01-01 00:00:00 10.18
2017-01-01 00:15:00 10.2
2017-01-01 00:30:00 10.32
2017-01-01 00:45:00 10.16
2017-01-01 01:00:00 9.93
2017-01-01 01:15:00 9.77
2017-01-01 01:30:00 9.47
2017-01-01 01:45:00 9.08
This is my code:
mod = sm.tsa.statespace.SARIMAX(
subset,
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 12),
enforce_stationarity=False,
enforce_invertibility=False
)
results = mod.fit()
pred_uc = results.get_forecast(steps=500)
pred_ci = pred_uc.conf_int(alpha = 0.05)
# Plot
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(1, 1, 1)
ax.plot(subset,color = "blue")
ax.plot(pred_uc.predicted_mean, color="black", alpha=0.5, label='SARIMAX')
plt.show()
Any idea how to fix this?

The issue is most likely in how the data is provided.
The datetime values must be the index of your subset variable; with that in place, the following works.
I created the data as follows, right before the code you provided:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
subset = pd.Series(
    [10.18, 10.2, 10.32, 10.16, 9.93, 9.77, 9.47, 9.08],
    index=pd.date_range(
        start='2017-01-01T00:00:00.000',
        end='2017-01-01T01:45:00.000',
        freq='15T',
    ),
)
With that, I got what I believe is your desired plot (cropped).
I used these versions of the libraries, in Python 3:
matplotlib.__version__
'3.1.2'
numpy.__version__
'1.17.4'
pandas.__version__
'0.25.3'
statsmodels.__version__
'0.12.0'
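If the data comes from a file with a datetime column rather than being typed in by hand, the index can be built as sketched below. This assumes the column names datetime and value from the question's sample; how the asker actually loaded the data is not shown, so the raw frame here is a stand-in.

```python
import pandas as pd

# Stand-in for the loaded file: the first four rows of the question's sample.
raw = pd.DataFrame({
    "datetime": ["2017-01-01 00:00:00", "2017-01-01 00:15:00",
                 "2017-01-01 00:30:00", "2017-01-01 00:45:00"],
    "value": [10.18, 10.2, 10.32, 10.16],
})

# Parse the strings, move them into the index, and declare the 15-minute
# frequency so that forecasts can continue the index with real timestamps.
subset = (
    raw.assign(datetime=pd.to_datetime(raw["datetime"]))
       .set_index("datetime")["value"]
       .asfreq("15min")
)
print(subset.index)
```

With a DatetimeIndex that carries a frequency, the observed series and the forecast share datetime units on the x-axis, which is the condition the error message complains about.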

Struggling formating xticks with DateTime from Pandas Dataframe

I'm currently struggling with my x-axis format: the dates don't start in 2017 but somewhere in the year 1138.
Is there something I've done wrong?
My tick labels should look like Thu 01.06.2017, Fri 02.06.2017, etc.
%pylab inline
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR, SA, SU
Gesamt_Apr_Sept_2017 = pd.read_csv('Gesamtstromverbrauch_Met_01.04.2017-30.09.2017-1h.csv',sep=';',decimal = ",", thousands = '.', index_col=0, parse_dates=True, dayfirst=True)
Daten = Gesamt_Apr_Sept_2017['6/1/2017':'06/30/2017']
# Create the figure
fig, ax = plt.subplots(figsize=(15,6))
Daten['LKHF-Strom-Met - Gesamt (kWh)'].plot()
ax.xaxis.set_major_locator(DayLocator(interval=7))
ax.xaxis.set_major_formatter(DateFormatter('%a %d.%m.%Y'))
for tick in ax.get_xticklabels():
    tick.set_rotation(90)
diagram: https://imgur.com/a/d4KQQ
The Dataframe looks like:
Date
2017-04-01 01:00:00 1008.0
2017-04-01 02:00:00 996.0
2017-04-01 03:00:00 976.0
2017-04-01 04:00:00 984.0
2017-04-01 05:00:00 1024.0
dtype='datetime64[ns]', name='Date', length=720, freq=None)

Just replace Daten['LKHF-Strom-Met - Gesamt (kWh)'].plot() with Daten[['LKHF-Strom-Met - Gesamt (kWh)']].plot() to fix your error.
Short explanation:
df[col] returns a pandas Series and df[[col]] a pandas DataFrame; for the DataFrame, the plot function uses the index as the x-axis.
%pylab inline
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR, SA, SU
Gesamt_Apr_Sept_2017 = pd.read_csv('Gesamtstromverbrauch_Met_01.04.2017-30.09.2017-1h.csv',sep=';',decimal = ",", thousands = '.', index_col=0, parse_dates=True, dayfirst=True)
Daten = Gesamt_Apr_Sept_2017['6/1/2017':'06/30/2017']
# Create the figure
fig, ax = plt.subplots(figsize=(15,6))
Daten[['LKHF-Strom-Met - Gesamt (kWh)']].plot()
ax.xaxis.set_major_locator(DayLocator(interval=7))
ax.xaxis.set_major_formatter(DateFormatter('%a %d.%m.%Y'))
for tick in ax.get_xticklabels():
    tick.set_rotation(90)
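A different route, not the one taken in the answer above, is to draw the line with matplotlib itself so that the x-axis is in matplotlib date units and the date locator and formatter apply directly. The sketch below assumes hourly data for June 2017 (720 points, as in the question); the values are made-up placeholders, and the Agg backend is chosen only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Placeholder data: 720 hourly points starting 2017-06-01 (values are made up).
index = pd.date_range("2017-06-01", periods=720, freq=pd.Timedelta(hours=1))
series = pd.Series(np.linspace(900.0, 1100.0, num=len(index)), index=index)

fig, ax = plt.subplots(figsize=(15, 6))
# Plot through matplotlib directly, passing real datetime objects for x.
ax.plot(series.index.to_pydatetime(), series.to_numpy())

# One major tick per week, labelled like "Thu 01.06.2017".
ax.xaxis.set_major_locator(mdates.DayLocator(interval=7))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%a %d.%m.%Y"))
for tick in ax.get_xticklabels():
    tick.set_rotation(90)
fig.tight_layout()
```

Note that the locator and formatter are referenced through the mdates module, so the example does not depend on the names that %pylab inline pulls into the namespace.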

Wrong labels when plotting a time series pandas dataframe with matplotlib

I am working with a dataframe containing data of 1 week.
y
ds
2017-08-31 10:15:00 1.000000
2017-08-31 10:20:00 1.049107
2017-08-31 10:25:00 1.098214
...
2017-09-07 10:05:00 99.901786
2017-09-07 10:10:00 99.950893
2017-09-07 10:15:00 100.000000
I create a new index by combining the weekday and time i.e.
y
dayIndex
4 - 10:15 1.000000
4 - 10:20 1.049107
4 - 10:25 1.098214
...
4 - 10:05 99.901786
4 - 10:10 99.950893
4 - 10:15 100.000000
The plot of this data is the following:
The plot is correct as the labels reflect the data in the dataframe. However, when zooming in, the labels do not seem correct as they no longer correspond to their original values:
What is causing this behavior?
Here is the code to reproduce this:
import datetime
import numpy as np
import pandas as pd
dtnow = datetime.datetime.now()
dindex = pd.date_range(dtnow , dtnow + datetime.timedelta(7), freq='5T')
data = np.linspace(1,100, num=len(dindex))
df = pd.DataFrame({'ds': dindex, 'y': data})
df = df.set_index('ds')
df = df.resample('5T').mean()
df['dayIndex'] = df.index.strftime('%w - %H:%M')
df= df.set_index('dayIndex')
df.plot()

"What is causing this behavior?"
The x-axis formatter of this pandas plot is a matplotlib.ticker.FixedFormatter (check with print(plt.gca().xaxis.get_major_formatter())). "Fixed" means that it labels the i-th tick (if shown) with a constant string, regardless of where that tick sits.
When zooming or panning, the tick locations shift, but the label strings do not.
In short: a pandas date plot may not be the best choice for interactive plots.
Solution
A solution is usually to use matplotlib formatters directly. This requires the dates to be datetime objects (which can be ensured using df.index.to_pydatetime()).
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates
dtnow = datetime.datetime.now()
dindex = pd.date_range(dtnow , dtnow + datetime.timedelta(7), freq='110T')
data = np.linspace(1,100, num=len(dindex))
df = pd.DataFrame({'ds': dindex, 'y': data})
df = df.set_index('ds')
df.index.to_pydatetime()
df.plot(marker="o")
plt.gca().xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%w - %H:%M'))
plt.show()
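The difference between the two formatter types can be seen without drawing anything, by calling the formatters directly. This is a small sketch; the label strings are taken from the question's dayIndex values, and the tick value 1234.5 is an arbitrary number chosen for illustration.

```python
from datetime import datetime

import matplotlib.dates as mdates
from matplotlib.ticker import FixedFormatter

# A FixedFormatter looks up the label by tick position and ignores the value.
fixed = FixedFormatter(["4 - 10:15", "4 - 10:20", "4 - 10:25"])
label_a = fixed(0.0, 1)      # tick value 0.0, position 1
label_b = fixed(1234.5, 1)   # different value, same position
print(label_a, "|", label_b)  # same label both times

# A DateFormatter computes the label from the tick value itself.
date_fmt = mdates.DateFormatter("%w - %H:%M")
x = mdates.date2num(datetime(2017, 8, 31, 10, 15, 30))
label_c = date_fmt(x)
print(label_c)
```

After a zoom, the tick at position 1 sits at a new x value; a FixedFormatter still returns the old string for it, while a DateFormatter recomputes the label from the new value.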
