Mismatch when filling yearly data into dataframe with daily data - python

I am trying to download data and add statistics and economic indicators; however, my data is on a daily basis and the indicators are on a yearly basis.
I tried to store year/indicator pairs in a dictionary, go through each day in the dates column returned from yfinance, and populate a list with the GDP deflator for each day using the dictionary. Then I convert that list to a DataFrame, add it as a column to the dataframe returned from yfinance, and save it as a csv.
However, when I look at the csv file, the GDP deflator for 2004 shows up for the last day in 2003, and for the last two days in 2004 the GDP deflator is that of 2005.
What am I doing wrong?
Code below:
import pandas as pd
import yfinance as yf
import world_bank_data as wb

df = pd.DataFrame()  # Empty DataFrame
GDPD = []
df = yf.download(tickers='USDSGD=X', period='max', interval='1d')
df.reset_index(inplace=True)
date = df['Date']
SGD_def_dict = {"Year": [], "GDP_Deflator": []}
for i in range(len(date)):
    if date[i].year in SGD_def_dict['Year']:
        GDPD.append(list(SGD_def_dict.values())[-1][-1])
    else:
        SGD_def_dict["Year"].append(date[i].year)
        try:
            SGD_def_dict["GDP_Deflator"].append(wb.get_series('NY.GDP.DEFL.ZS', country='SGP', date=date[i].year, id_or_value='id', simplify_index=True))
        except:
            SGD_def_dict["GDP_Deflator"].append(float("nan"))
        #GDPD.append(list(SGD_def_dict.values())[-1][-1])
df2 = pd.DataFrame({"GDP_Deflator": GDPD})
df["GDP_Deflator"] = df2
df.to_csv(r'C:..WBTEST.csv')

You need to match the year of each day to the corresponding GDP deflator in the dictionary, and then use the same value for all days in that year.
import pandas as pd
import yfinance as yf
import world_bank_data as wb

df = yf.download(tickers='USDSGD=X', period='max', interval='1d')
df.reset_index(inplace=True)
date = df['Date']
SGD_def_dict = {"Year": [], "GDP_Deflator": []}
for i in range(len(date)):
    year = date[i].year
    if year not in SGD_def_dict['Year']:
        SGD_def_dict["Year"].append(year)
        try:
            SGD_def_dict["GDP_Deflator"].append(wb.get_series('NY.GDP.DEFL.ZS', country='SGP', date=year, id_or_value='id', simplify_index=True))
        except Exception:
            SGD_def_dict["GDP_Deflator"].append(float("nan"))
df['Year'] = df['Date'].dt.year
df = df.merge(pd.DataFrame(SGD_def_dict), on='Year')
df.drop(['Year'], axis=1, inplace=True)
df.to_csv(r'C:..WBTEST.csv')
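The merge step is the key idea: build a small year-level table and join it onto the daily rows by year, so alignment can never drift by position. A minimal sketch with synthetic data (the dates and deflator values here are made up for illustration):

```python
import pandas as pd

# Daily rows spanning a year boundary
daily = pd.DataFrame({"Date": pd.to_datetime(
    ["2003-12-30", "2003-12-31", "2004-01-02", "2004-01-05"])})

# One deflator value per year (made-up numbers)
yearly = pd.DataFrame({"Year": [2003, 2004],
                       "GDP_Deflator": [78.1, 80.5]})

daily["Year"] = daily["Date"].dt.year
merged = daily.merge(yearly, on="Year", how="left").drop(columns="Year")

print(merged)
# Every 2003 row gets 78.1 and every 2004 row gets 80.5 -- there is no
# off-by-one drift, because rows are matched on the year value itself,
# not on the position of items in a list.
```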

Related

only pull rows for today's date from dataframe

I'm pulling data from an API and placing it into a Pandas dataframe. I want to then create a new df that includes only the rows with today's date in them. I know how to select between two static dates, but can't seem to filter by a 'today' timestamp.
import pandas as pd
import requests
from matplotlib import pyplot as plt
#Access API
r = requests.get('REMOVED')
x = r.json()
keys = x.keys()
old_df = pd.DataFrame(x['results'])
#set dataframe
df = old_df[['valid_from','valid_to','value_inc_vat']].copy()
df['valid_from'] = pd.to_datetime(df['valid_from'])
df['valid_to'] = pd.to_datetime(df['valid_to'])
#only today's rows
today = pd.Timestamp.today().date()
mask = (df['valid_from'] == today)
df_today = df.loc[mask]
Use Series.dt.date to compare by date:
mask = (df['valid_from'].dt.date == today)
df_today = df[mask]
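The point is that a datetime column never equals a plain date until the time-of-day part is stripped. A minimal sketch with synthetic timestamps (the column name and dates are illustrative):

```python
import pandas as pd

# Synthetic half-hourly rows around a fixed "today"
df = pd.DataFrame({"valid_from": pd.to_datetime([
    "2024-05-01 23:30", "2024-05-02 00:00", "2024-05-02 10:30"])})

# Stand-in for pd.Timestamp.today().date(), pinned for reproducibility
today = pd.Timestamp("2024-05-02").date()

# .dt.date reduces each timestamp to a plain date before comparing
df_today = df[df["valid_from"].dt.date == today]
print(df_today)  # keeps only the two rows dated 2024-05-02
```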

Comparing daily returns of every stock with S&P500 and sort the most which overperformed

I need to:
- download all the historical prices of every stock contained in the S&P 500, and the historical price of the index
- calculate the daily returns
- compare every daily return of every stock with the daily return of the S&P 500
- sort a list of the best performers
- calculate how many days out of the total they outperformed
This should be the code for downloading the data:
start_date = "2018-01-01"
end_date = date.today().strftime("%Y-%m-%d")
payload = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies%27')
first_table = payload[0]
df = first_table
symbols = df['Symbol'].values.tolist()
n = len(symbols)
data = yf.download(symbols[:5], group_by='Ticker', start=start_date, end=end_date)
snp = yf.download('SPY', start=start_date, end=end_date)
This is the code for calculating the daily returns
logRet = np.log(data/data.shift(1))
Thank you
Your inputs:
import pandas as pd
import yfinance as yf
from datetime import date
import numpy as np
start_date = "2018-01-01"
end_date = date.today().strftime("%Y-%m-%d")
payload = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
first_table = payload[0]
df = first_table
symbols = df['Symbol'].values.tolist()
n = len(symbols)
data = yf.download(symbols[:5], group_by='Ticker', start=start_date, end=end_date)
snp = yf.download('SPY', start=start_date, end=end_date)
Then:
# close only
data = data[data.columns[data.columns.get_level_values(1) == "Close"]]
data.columns = data.columns.droplevel(1)
snp = snp[["Close"]].rename(columns={"Close": "SNP"})
# daily returns
logRet = np.log(data/data.shift(1))
logRet_snp = np.log(snp/snp.shift(1))
returns = logRet.merge(logRet_snp, left_index=True, right_index=True)
# count where returns are greater than S&P
count = pd.Series(dtype=int)
# don't include S&P, which will be the last column from the merge.
for x in returns.columns[:-1]:
    # sum of days where the stock's return is greater than the S&P's
    count.loc[x] = ((returns[x] > returns["SNP"])*1).sum()
# sort in descending order.
count.sort_values(ascending=False, inplace=True)
Output:
#Out:
#ABMD 565
#ABBV 543
#ABT 543
#AOS 515
#MMM 494
#dtype: int64
Your website link for pd.read_html was incorrect, you had a few additional characters on the end.
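The per-column loop above can also be written as a single vectorized comparison. A minimal sketch with made-up return values (ticker names and numbers are illustrative only):

```python
import pandas as pd

# Synthetic daily returns: two stocks plus the index column "SNP"
returns = pd.DataFrame({
    "AAA": [0.01, -0.02, 0.03],
    "BBB": [0.00, 0.01, -0.01],
    "SNP": [0.005, -0.01, 0.00],
})

# Compare every non-index column against "SNP" in one shot (axis=0 aligns
# the Series row-wise), count the True values per column, sort descending.
count = (returns.drop(columns="SNP")
                .gt(returns["SNP"], axis=0)
                .sum()
                .sort_values(ascending=False))
print(count)
# AAA beat the index on 2 of 3 days, BBB on 1 of 3
```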

How to sort Months in Monthly order within Python Pandas when working with .CSV files?

For each NAME/LOCATION, calculate the average snow amount per month. Save the results in two separate .csv files (one for 2016 and the other for 2017) name the files average2016.csv and average2017.csv.
I am using Python 3.8 with Pandas.
I accomplished this with the following code:
import numpy as np
import pandas as pd
df = pd.read_csv('filteredData.csv')
df['DATE'] = pd.to_datetime(df['DATE'])
df['year'] = pd.DatetimeIndex(df['DATE']).year
df16 = df[(df.year == 2016)]
df17 = df[(df.year == 2017)]
df_2016 = df16.groupby(['NAME', 'Month'])['SNOW'].mean().reset_index()
df_2017 = df17.groupby(['NAME', 'Month'])['SNOW'].mean().reset_index()
df_2016[['NAME', 'Month', 'SNOW']].to_csv('average2016.csv')
df_2017[['NAME', 'Month', 'SNOW']].to_csv('average2017.csv')
This image shows my results for average 2016.
However, the problem that I am having is that the months are not in calendar order. I want them to go from January through December for each location. Example: for NAME 'ADA 0.7 SE, MI US' I want the months to be May, then June. How would I be able to accomplish this? Also, is there a way to get rid of the first numbered column?
You can sort on the DATE column, but then you need to pass sort=False in your groupby, else it will sort there using string ordering. In addition, your repetitive per-year code can be replaced with a single groupby, adding year to the grouping keys. You'd then write separately into different files, and index=False is how you get rid of the index column.
import numpy as np
import pandas as pd

df = pd.read_csv('filteredData.csv')
df['DATE'] = pd.to_datetime(df['DATE'])
df['year'] = df['DATE'].dt.year            # Datetime has this attribute already
df = df.sort_values(['NAME', 'DATE'])      # Output will be in order within each NAME
df = (df[df.year.between(2016, 2017)]      # Only 2016 and 2017
        .groupby(['year', 'NAME', 'Month'], sort=False)['SNOW']
        .mean().reset_index())
for year, gp in df.groupby('year'):        # Write files separately by year
    gp[['NAME', 'Month', 'SNOW']].to_csv(f'average{year}.csv', index=False)
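Another option, assuming the Month column holds month names as strings, is to make it an ordered categorical so that sorting becomes calendar-aware instead of alphabetical. A sketch with made-up station data:

```python
import calendar
import pandas as pd

df = pd.DataFrame({"NAME": ["ADA"] * 3,
                   "Month": ["June", "January", "May"],
                   "SNOW": [0.0, 2.5, 0.7]})

# calendar.month_name[1:] is ['January', ..., 'December']; declaring that
# order as categories makes sort_values follow the calendar, not the alphabet
df["Month"] = pd.Categorical(df["Month"],
                             categories=list(calendar.month_name)[1:],
                             ordered=True)

df = df.sort_values(["NAME", "Month"])
print(df["Month"].tolist())
# ['January', 'May', 'June']
```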

Filter particular date in a DF column

I want to filter particular date in a DF column.
My code:
df
df["Crawl Date"]=pd.to_datetime(df["Crawl Date"]).dt.date
date=pd.to_datetime("03-21-2020")
df=df[df["Crawl Date"]==date]
It is showing no match.
Note: df column is having time also with date which need to be trimmed.
Thanks in advance.
The following script assumes that the 'Crawl Date' column contains strings:
import pandas as pd
import datetime

column_names = ["Crawl Date"]
df = pd.DataFrame(columns=column_names)
#Populate dataframe with dates
df.loc[0] = ['03-21-2020 23:45:57']
df.loc[1] = ['03-22-2020 23:12:33']
df["Crawl Date"] = pd.to_datetime(df["Crawl Date"]).dt.date
date = pd.to_datetime("03-21-2020").date()  # .date() so both sides are plain date objects
df = df[df["Crawl Date"] == date]
Then df returns:
   Crawl Date
0  2020-03-21

iterating a stock tick data with append on python

I am trying to combine a series of stock tick data based on the dates, but it won't work. Please help.
import pandas as pd
import tushare as ts

def get_all_tick(stockID):
    dates = pd.date_range('2016-01-01', periods=5, freq='D')
    append_data = []
    for i in dates:
        stock_tick = pd.DataFrame(ts.get_tick_data(stockID, date=i))
        stock_tick.sort('volume', inplace=True, ascending=False)
        stock_tick = stock_tick[:10]
        stock_tick.sort('time', inplace=True, ascending=False)
        append_data.append(stock_tick.iterrows())

get_all_tick('300243')
I figured it out myself.
def get_all_tick(stockID):
    .........
    frames = []
    for i in get_date:
        stock_tick = ts.get_tick_data(stockID, date=i)
        stock_tick['Date'] = i
        stock_tick = stock_tick.sort_values('volume', ascending=False)
        stock_tick = stock_tick[:10]
        stock_tick = stock_tick.sort_values('time', ascending=False)
        frames.append(stock_tick)
    df = pd.concat(frames)  # one concat instead of repeated append
    df.to_excel('tick.xlsx', sheet_name='Sheet1')

get_all_tick('300243')
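Stripped of the tushare call, the combine step is just collecting one frame per date and concatenating once at the end. A sketch with synthetic tick rows standing in for ts.get_tick_data (column names and values are made up):

```python
import pandas as pd

dates = pd.date_range('2016-01-01', periods=3, freq='D')
frames = []
for d in dates:
    # Stand-in for the per-date download: two fake ticks per day
    tick = pd.DataFrame({"time": ["09:30", "09:31"], "volume": [120, 80]})
    tick["Date"] = d
    tick = tick.sort_values("volume", ascending=False)
    frames.append(tick)

# A single concat at the end is cheaper than growing a DataFrame
# row-by-row inside the loop
df = pd.concat(frames, ignore_index=True)
print(len(df))
# 6 -- two ticks for each of the three dates
```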
