How to change DataFrame column names - Python

Let's say I have a list of stock symbols: all = ['AAPL', 'TSLA', 'MSFT'].
I created a function to scrape their historical prices from yfinance.
def Scrapping(symbol):
    aapl = yf.Ticker(symbol)
    ainfo = aapl.history(period='1y')
    global castex_date
    castex_date = ainfo.index
    return ainfo.Close

all_Assets = list(map(Scrapping, all))
print(all_Assets)
This is an output sample:
[Date
2018-12-12 00:00:00-05:00 53.183998
2018-12-13 00:00:00-05:00 53.095001
Name: Close, Length: 1007, dtype: float64, Date
2018-12-12 00:00:00-05:00 24.440001
2018-12-13 00:00:00-05:00 25.119333
2022-12-08 00:00:00-05:00 247.399994
2022-12-09 00:00:00-05:00 245.419998
Name: Close, Length: 1007, dtype: float64]
The issue is that all these symbols' historical series have the same name, 'Close'. When putting all of these in a DataFrame, we get:
df = pd.DataFrame(all_Assets)
df2 = df.transpose()
print(df2)
Close Close Close
Date
2018-12-12 00:00:00-05:00 53.183998 24.440001 104.500511
2018-12-13 00:00:00-05:00 53.095001 25.119333 104.854965
2018-12-14 00:00:00-05:00 52.105000 24.380667 101.578560
2018-12-17 00:00:00-05:00 50.826500 23.228001 98.570389
2018-12-18 00:00:00-05:00 51.435501 22.468666 99.605042
This creates an issue when plotting the DF.
I need these column names to be equal to the 'symbol' parameter of the function, so that the column names would automatically be AAPL, TSLA, MSFT.

You can use:
import pandas as pd
import yfinance as yf
all_symbols = ['AAPL', 'TSLA', 'MSFT']
def scrapping(symbol):
    ticker = yf.Ticker(symbol)
    data = ticker.history(period='1y')
    return data['Close'].rename(symbol)

all_assets = map(scrapping, all_symbols)
df = pd.concat(all_assets, axis=1)
One-liner version:
df = pd.concat({symbol: yf.Ticker(symbol).history(period='1y')['Close']
for symbol in all_symbols}, axis=1)
Output:
>>> df
AAPL TSLA MSFT
Date
2022-02-14 167.863144 291.920013 292.261475
2022-02-15 171.749573 307.476654 297.680695
2022-02-16 171.511032 307.796661 297.333221
2022-02-17 167.863144 292.116669 288.626678
2022-02-18 166.292648 285.660004 285.846893
... ... ... ...
2023-02-08 151.688400 201.289993 266.730011
2023-02-09 150.639999 207.320007 263.619995
2023-02-10 151.009995 196.889999 263.100006
2023-02-13 153.850006 194.639999 271.320007
2023-02-14 152.949997 203.729996 272.790009
[252 rows x 3 columns]
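The key step above is .rename(symbol): pd.concat uses each Series' name as its column label. Here is a network-free sketch of that mechanism, with made-up placeholder prices instead of real quotes:

```python
import pandas as pd

# pd.concat labels each column with the Series' name, so renaming each
# 'Close' series to its ticker symbol produces the desired column names.
# The values below are made-up placeholders, not real prices.
s_aapl = pd.Series([53.18, 53.10], name='Close').rename('AAPL')
s_tsla = pd.Series([24.44, 25.12], name='Close').rename('TSLA')

df = pd.concat([s_aapl, s_tsla], axis=1)
print(list(df.columns))  # ['AAPL', 'TSLA']
```

Without the rename, both columns would be labeled 'Close', which is exactly the duplicate-column problem in the question.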

Related

Create a dataframe from a date range in python

Given an interval defined by two dates (each a pandas Timestamp):
create_interval('2022-01-12', '2022-01-17', 'Holidays')
Create the following dataframe:
date                 interval_name
2022-01-12 00:00:00  Holidays
2022-01-13 00:00:00  Holidays
2022-01-14 00:00:00  Holidays
2022-01-15 00:00:00  Holidays
2022-01-16 00:00:00  Holidays
2022-01-17 00:00:00  Holidays
If it can be in a few lines of code I would appreciate it. Thank you very much for your help.
If you're open to using pandas, this should accomplish what you've requested:
import pandas as pd

def create_interval(start, end, field_val):
    # set up the date range index
    idx = pd.date_range(start, end)
    # create the DataFrame using the index above, with an empty 'interval_name' column
    df = pd.DataFrame(index=idx, columns=['interval_name'])
    # set the index name
    df.index.names = ['date']
    # fill all rows of the 'interval_name' column with the field_val parameter
    df.interval_name = field_val
    return df

create_interval('2022-01-12', '2022-01-17', 'holiday')
I hope I coded exactly what you need.
import pandas as pd

def create_interval(ts1, ts2, interval_name):
    ts_list_dt = pd.date_range(start=ts1, end=ts2).to_pydatetime().tolist()
    ts_list = [str(ts) for ts in ts_list_dt]
    d = {'date': ts_list, 'interval_name': [interval_name] * len(ts_list)}
    df = pd.DataFrame(data=d)
    return df

df = create_interval('2022-01-12', '2022-01-17', 'Holidays')
print(df)
output:
date interval_name
0 2022-01-12 00:00:00 Holidays
1 2022-01-13 00:00:00 Holidays
2 2022-01-14 00:00:00 Holidays
3 2022-01-15 00:00:00 Holidays
4 2022-01-16 00:00:00 Holidays
5 2022-01-17 00:00:00 Holidays
If you want the DataFrame without the default integer index, add df = df.set_index('date') after creating the DataFrame with df = pd.DataFrame(data=d). Then you will get:
                    interval_name
date
2022-01-12 00:00:00      Holidays
2022-01-13 00:00:00      Holidays
2022-01-14 00:00:00      Holidays
2022-01-15 00:00:00      Holidays
2022-01-16 00:00:00      Holidays
2022-01-17 00:00:00      Holidays
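For completeness, the same frame can also be built in a single expression by broadcasting a constant over a named date range; the column and index names below are chosen to match the requested output:

```python
import pandas as pd

# A constant Series over a named date range; to_frame() turns it into
# the single-column DataFrame with 'date' as the index name.
df = (pd.Series('Holidays',
                index=pd.date_range('2022-01-12', '2022-01-17', name='date'),
                name='interval_name')
        .to_frame())
print(df)
```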

Turning daily stock prices into weekly/monthly/quarterly/semester/yearly?

I'm trying to convert daily prices into weekly, monthly, quarterly, semesterly, and yearly series, but the code only works when I run it for one stock. When I add another stock to the list, it crashes with two errors: 'ValueError: Length of names must match number of levels in MultiIndex.' and 'TypeError: other must be a MultiIndex or a list of tuples.' I'm not experienced with MultiIndexing and have searched everywhere without success.
This is the code:
import pandas as pd
import yfinance as yf
from pandas_datareader import data as pdr

symbols = ['AMZN', 'AAPL']
yf.pdr_override()
df = pdr.get_data_yahoo(symbols, start='2014-12-01', end='2021-01-01')

df = df.reset_index()
df.Date = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)

res = {'Open': 'first', 'Adj Close': 'last'}
dfw = df.resample('W').agg(res)
dfw_ret = (dfw['Adj Close'] / dfw['Open'] - 1)
dfm = df.resample('BM').agg(res)
dfm_ret = (dfm['Adj Close'] / dfm['Open'] - 1)
dfq = df.resample('Q').agg(res)
dfq_ret = (dfq['Adj Close'] / dfq['Open'] - 1)
dfs = df.resample('6M').agg(res)
dfs_ret = (dfs['Adj Close'] / dfs['Open'] - 1)
dfy = df.resample('Y').agg(res)
dfy_ret = (dfy['Adj Close'] / dfy['Open'] - 1)

print(dfw_ret)
print(dfm_ret)
print(dfq_ret)
print(dfs_ret)
print(dfy_ret)
This is what the original df prints:
Adj Close Open
AAPL AMZN AAPL AMZN
Date
2014-12-01 26.122288 326.000000 29.702499 338.119995
2014-12-02 26.022408 326.309998 28.375000 327.500000
2014-12-03 26.317518 316.500000 28.937500 325.730011
2014-12-04 26.217640 316.929993 28.942499 315.529999
2014-12-05 26.106400 312.630005 28.997499 316.799988
... ... ... ... ...
2020-12-24 131.549637 3172.689941 131.320007 3193.899902
2020-12-28 136.254608 3283.959961 133.990005 3194.000000
2020-12-29 134.440399 3322.000000 138.050003 3309.939941
2020-12-30 133.294067 3285.850098 135.580002 3341.000000
2020-12-31 132.267349 3256.929932 134.080002 3275.000000
And this is what the different df_ret print when I go from daily to weekly/monthly/etc., but it can only do it for one stock; the idea is to be able to do it for multiple stocks:
Date
2014-12-07 -0.075387
2014-12-14 -0.013641
2014-12-21 -0.029041
2014-12-28 0.023680
2015-01-04 0.002176
...
2020-12-06 -0.014306
2020-12-13 -0.012691
2020-12-20 0.018660
2020-12-27 -0.008537
2021-01-03 0.019703
Freq: W-SUN, Length: 318, dtype: float64
Date
2014-12-31 -0.082131
2015-01-30 0.134206
2015-02-27 0.086016
2015-03-31 -0.022975
2015-04-30 0.133512
...
2020-08-31 0.085034
2020-09-30 -0.097677
2020-10-30 -0.053569
2020-11-30 0.034719
2020-12-31 0.021461
Freq: BM, Length: 73, dtype: float64
Date
2014-12-31 -0.082131
2015-03-31 0.190415
2015-06-30 0.166595
2015-09-30 0.165108
2015-12-31 0.322681
2016-03-31 -0.095461
2016-06-30 0.211909
2016-09-30 0.167275
2016-12-31 -0.103026
2017-03-31 0.169701
2017-06-30 0.090090
2017-09-30 -0.011760
2017-12-31 0.213143
2018-03-31 0.234932
2018-06-30 0.199052
2018-09-30 0.190349
2018-12-31 -0.257182
2019-03-31 0.215363
2019-06-30 0.051952
2019-09-30 -0.097281
2019-12-31 0.058328
2020-03-31 0.039851
2020-06-30 0.427244
2020-09-30 0.141676
2020-12-31 0.015252
Freq: Q-DEC, dtype: float64
Date
2014-12-31 -0.082131
2015-06-30 0.388733
2015-12-31 0.538386
2016-06-30 0.090402
2016-12-31 0.045377
2017-06-30 0.277180
2017-12-31 0.202181
2018-06-30 0.450341
2018-12-31 -0.107405
2019-06-30 0.292404
2019-12-31 -0.039075
2020-06-30 0.471371
2020-12-31 0.180907
Freq: 6M, dtype: float64
Date
2014-12-31 -0.082131
2015-12-31 1.162295
2016-12-31 0.142589
2017-12-31 0.542999
2018-12-31 0.281544
2019-12-31 0.261152
2020-12-31 0.737029
Freq: A-DEC, dtype: float64
Without knowing what your df DataFrame looks like, I am assuming it is an issue with correctly handling the resampling on a MultiIndex, similar to the one discussed in this question.
The solution listed there is to use pd.Grouper with the freq and level parameters filled out correctly.
# This is just from the listed solution, so I am not sure if this is the correct level to choose
df.groupby(pd.Grouper(freq='W', level=-1))
If this doesn't work, I think you would need to provide some more detail or a dummy data set to reproduce the issue.
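If the crash comes from the MultiIndex columns that get_data_yahoo returns for multiple symbols, one workaround is to skip reset_index entirely and resample each top-level field separately. This is only a sketch on synthetic data (random numbers stand in for real quotes), not a verified fix for your exact frame:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the multi-symbol frame: rows indexed by date,
# columns a MultiIndex of (field, symbol), the shape pdr returns for a list.
idx = pd.date_range('2020-01-01', periods=20, freq='B')
cols = pd.MultiIndex.from_product([['Open', 'Adj Close'], ['AMZN', 'AAPL']])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(100, 200, (20, 4)), index=idx, columns=cols)

# Selecting a field drops one column level; resampling then works
# directly on the DatetimeIndex, giving one column per symbol.
dfw_open = df['Open'].resample('W').first()
dfw_close = df['Adj Close'].resample('W').last()
dfw_ret = dfw_close / dfw_open - 1
print(dfw_ret.columns.tolist())  # ['AMZN', 'AAPL']
```

The same pattern extends to the monthly/quarterly/yearly variants by swapping the resample frequency.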

Slicing pandas dataframe via index datetime

I tried to slice a pandas DataFrame that was read from a CSV file, with the index set from the first column of dates.
IN:
df = pd.read_csv(r'E:\...\^d.csv')
df["Date"] = pd.to_datetime(df["Date"])
OUT:
Date Open High Low Close Volume
0 1920-01-02 9.52 9.52 9.52 9.52 NaN
1 1920-01-03 9.62 9.62 9.62 9.62 NaN
2 1920-01-05 9.57 9.57 9.57 9.57 NaN
3 1920-01-06 9.46 9.46 9.46 9.46 NaN
4 1920-01-07 9.47 9.47 9.47 9.47 NaN
Date Open High Low Close Volume
26798 2020-10-26 3441.42 3441.42 3364.86 3400.97 2.435787e+09
26799 2020-10-27 3403.15 3409.51 3388.71 3390.68 2.395102e+09
26800 2020-10-28 3342.48 3342.48 3268.89 3271.03 3.147944e+09
26801 2020-10-29 3277.17 3341.05 3259.82 3310.11 2.752626e+09
26802 2020-10-30 3293.59 3304.93 3233.94 3269.96 3.002804e+09
IN:
df = df.set_index(['Date'])
print("my index type is ")
print(df.index.dtype)
print(type(df.index)) #type of index
OUT:
Open High Low Close Volume
Date
2007-01-03 1418.03 1429.42 1407.86 1416.60 1.905089e+09
2007-01-04 1416.95 1421.84 1408.22 1418.34 1.669144e+09
2007-01-05 1418.34 1418.34 1405.75 1409.71 1.621889e+09
2007-01-08 1409.22 1414.98 1403.97 1412.84 1.535189e+09
2007-01-09 1412.85 1415.61 1405.42 1412.11 1.687989e+09
... ... ... ... ...
2009-12-24 1120.59 1126.48 1120.59 1126.48 7.042833e+08
2009-12-28 1126.48 1130.38 1123.51 1127.78 1.509111e+09
2009-12-29 1127.78 1130.38 1126.08 1126.19 1.383900e+09
2009-12-30 1126.19 1126.42 1121.94 1126.42 1.265167e+09
2009-12-31 1126.42 1127.64 1114.81 1115.10 1.153883e+09
my index type is
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
I try to slice for Mondays using
monday_dow = df["Date"].dt.dayofweek==0
OUT (Spyder returns):
KeyError: 'Date'
I've read a lot of similar answers on Stack Overflow but couldn't fix this. I understand I'm doing something wrong with the index; should it be accessed another way?
You need to filter by the DatetimeIndex using DatetimeIndex.dayofweek (remove the .dt accessor, which is used only for columns):
monday_dow = df.index.dayofweek==0
So, to get all matching rows:
df1 = df[monday_dow]
You can also simplify the code by setting the DatetimeIndex directly in read_csv:
df = pd.read_csv(r'E:\...\^d.csv', index_col=['Date'], parse_dates=['Date'])
monday_dow = df.index.dayofweek==0
df1 = df[monday_dow]
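As a self-contained illustration of both techniques (a boolean mask computed on the index, and date-string slicing, which a DatetimeIndex also supports) on a small synthetic frame:

```python
import numpy as np
import pandas as pd

# Small frame with a DatetimeIndex, standing in for the CSV data.
# In January 2020, the 6th and 13th are Mondays.
idx = pd.date_range('2020-01-01', '2020-01-14')
df = pd.DataFrame({'Close': np.arange(len(idx), dtype=float)}, index=idx)

# Boolean mask computed directly on the index (no .dt accessor needed).
mondays = df[df.index.dayofweek == 0]
print(len(mondays))  # 2

# A DatetimeIndex also allows label slicing with plain date strings.
first_week = df.loc['2020-01-01':'2020-01-07']
print(len(first_week))  # 7
```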

converting daily stock data to weekly in pandas in Python

I have following data format.
Date Open High Low Close
2018-11-12 **10607.80** 10645.50 10464.05 10482.20
2018-11-13 10451.90 10596.25 10440.55 10582.50
2018-11-14 10634.90 10651.60 10532.70 10576.30
2018-11-15 10580.60 10646.50 10557.50 10616.70
2018-11-16 10644.00 10695.15 10631.15 **10682.20**
2018-11-19 **10731.25** 10774.70 10688.80 10763.40
2018-11-20 10740.10 10740.85 10640.85 10656.20
2018-11-21 10670.95 10671.30 10562.35 10600.05
2018-11-22 10612.65 10646.25 10512.00 **10526.75**
2018-11-26 **10568.30** 10637.80 10489.75 10628.60
2018-11-27 10621.45 10695.15 10596.35 10685.60
2018-11-28 10708.75 10757.80 10699.85 10728.85
2018-11-29 10808.70 10883.05 10782.35 10858.70
2018-11-30 10892.10 10922.45 10835.10 **10876.75**
I want to get the open price of Monday and the closing price of the following Friday.
This is my code for same.
open = df.Open.resample('W-MON').last()
print(open.tail(5))
close = df.Close.resample('W-FRI').last().resample('W-MON').first()
print(close.tail(5))
weekly_data = pd.concat([open, close], axis=1)
print(weekly_data.tail(5))
It gives me correct data for open and close individually, but when I merge them into weekly_data, it gives the wrong output for close: it shows me the previous Friday's closing price.
How to fix this issue?
You can shift by -4 days to align both DatetimeIndexes:
open = df.Open.resample('W-MON').last()
print (open.tail(5))
Date
2018-11-12 10607.80
2018-11-19 10731.25
2018-11-26 10568.30
2018-12-03 10892.10
Freq: W-MON, Name: Open, dtype: float64
close = df.Close.resample('W-FRI').last().shift(-4, freq='D')
print (close.tail(5))
Date
2018-11-12 10682.20
2018-11-19 10526.75
2018-11-26 10876.75
Freq: W-MON, Name: Close, dtype: float64
weekly_data = pd.concat([open, close], axis=1)
print (weekly_data)
Open Close
Date
2018-11-12 10607.80 10682.20
2018-11-19 10731.25 10526.75
2018-11-26 10568.30 10876.75
2018-12-03 10892.10 NaN
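An alternative worth noting (not from the answer above, just a common pattern): a single resample anchored on Friday, aggregating Open with 'first' and Close with 'last', picks up Monday's open and Friday's close in one step. Sketched on synthetic numbers rather than the question's data:

```python
import numpy as np
import pandas as pd

# Two full business weeks of synthetic prices (Mon 2018-11-12 .. Fri 2018-11-23).
idx = pd.bdate_range('2018-11-12', '2018-11-23')
df = pd.DataFrame({'Open': np.arange(10.0),
                   'Close': np.arange(10.0) + 0.5}, index=idx)

# Each W-FRI bin ends on Friday, so 'first' Open is Monday's open and
# 'last' Close is that same week's Friday close.
weekly = df.resample('W-FRI').agg({'Open': 'first', 'Close': 'last'})
print(weekly)
```

Because both aggregations share one set of bins, the open and close are aligned by construction and no shift is needed.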

Pulling specific value from Pandas DataReader dataframe?

Here is the code I am running:
def competitor_stock_data_report():
    import datetime
    import pandas_datareader.data as web

    date_time = datetime.datetime.now()
    date = date_time.date()
    stocklist = ['LAZ','AMG','BEN','LM','EVR','GHL','HLI','MC','PJT','MS','GS','JPM','AB']
    start = datetime.datetime(date.year-1, date.month, date.day-1)
    end = datetime.datetime(date.year, date.month, date.day-1)
    for x in stocklist:
        df = web.DataReader(x, 'google', start, end)
        print(df)
        print(df.loc[df['Date'] == start]['Close'].values)
The problem is in the last line. How do I pull the specific value of the date specified 'Close' value?
Open High Low Close Volume
Date
2016-08-02 35.22 35.25 33.66 33.75 861111
2016-08-03 33.57 34.72 33.42 34.25 921401
2016-08-04 33.89 34.22 33.77 34.07 587016
2016-08-05 34.55 34.94 34.31 34.35 463317
2016-08-08 34.54 34.75 34.31 34.74 958230
2016-08-09 34.68 35.12 34.64 34.87 732959
I would like to get 33.75, for example, but the date is dynamically changing.
Any suggestions?
IMO the easiest way to get a column's value in the first row:
In [40]: df
Out[40]:
Open High Low Close Volume
Date
2016-08-03 767.18 773.21 766.82 773.18 1287421
2016-08-04 772.22 774.07 768.80 771.61 1140254
2016-08-05 773.78 783.04 772.34 782.22 1801205
2016-08-08 782.00 782.63 778.09 781.76 1107857
2016-08-09 781.10 788.94 780.57 784.26 1318894
... ... ... ... ... ...
2017-07-27 951.78 951.78 920.00 934.09 3212996
2017-07-28 929.40 943.83 927.50 941.53 1846351
2017-07-31 941.89 943.59 926.04 930.50 1970095
2017-08-01 932.38 937.45 929.26 930.83 1277734
2017-08-02 928.61 932.60 916.68 930.39 1824448
[252 rows x 5 columns]
In [41]: df.iat[0, df.columns.get_loc('Close')]
Out[41]: 773.17999999999995
Last row:
In [42]: df.iat[-1, df.columns.get_loc('Close')]
Out[42]: 930.38999999999999
Recommended
df.at[df.index[-1], 'Close']
df.iat[-1, df.columns.get_loc('Close')]
df.loc[df.index[-1], 'Close']
df.iloc[-1, df.columns.get_loc('Close')]
Not intended as public api, but works
df.get_value(df.index[-1], 'Close')
df.get_value(-1, df.columns.get_loc('Close'), takeable=True)
Not recommended, chained indexing
There could be more, but do I really need to add them
df.iloc[-1].at['Close']
df.loc[:, 'Close'].iat[-1]
All yield
34.869999999999997
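Since the date in the question changes dynamically, looking up by label is often more direct than positional access, and Series.asof covers dates that fall on non-trading days. A sketch using the sample values from the question:

```python
import pandas as pd

# Frame indexed by date, mirroring the sample output in the question.
idx = pd.to_datetime(['2016-08-02', '2016-08-03', '2016-08-04',
                      '2016-08-05', '2016-08-08', '2016-08-09'])
df = pd.DataFrame({'Close': [33.75, 34.25, 34.07, 34.35, 34.74, 34.87]},
                  index=idx)

# Exact label lookup when the date is known to exist in the index.
val = df.at[pd.Timestamp('2016-08-02'), 'Close']
print(val)  # 33.75

# If the requested date may be a weekend/holiday, .asof returns the
# most recent prior value instead of raising a KeyError.
weekend_val = df['Close'].asof(pd.Timestamp('2016-08-07'))  # a Sunday
print(weekend_val)  # 34.35
```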
