Downloading stock data via Python to excel (missing date column) - python

I want to download stock data into a single Excel file, but I am struggling to get the date column into the output for my stock database.
Can someone help me find out why I am getting these results, and how I can fix it?
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
file = str(input('Enter File Name - '))
download_sourrce = (r"C:\Users\vladt\OneDrive\Рабочий стол\Stock\{}.xlsx".format(file))
writer = pd.ExcelWriter(download_sourrce, engine='xlsxwriter')
should_continue = True
while should_continue:
    x = str(input('Stock? - '))
    start = dt.datetime(2020,1,1)
    end = dt.datetime.today()
    df = pdr.get_data_yahoo(x,start,end)
    df.to_excel(writer, sheet_name=x, index=False)
    should_continue = input('Add new stock? [Y/N]').upper() == 'Y'
writer.save()

Try the following before saving to Excel:
df = df.reset_index()
Stock data feeds from pandas datareader (Yahoo or the other data providers it supports), and from the APIs or packages of other stock brokers, commonly return the dates as the row index rather than as a regular column.
When a date series is set as the row index, you have to reset the index to move its contents into a data column. Otherwise, writing with index=False drops the dates.
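As a minimal sketch of what reset_index changes, the frame below is built by hand to mimic a price feed. The dates and prices are made up, and nothing is downloaded:

```python
import pandas as pd

# Hypothetical stand-in for a downloaded price frame: the dates live in the index.
dates = pd.date_range("2020-01-01", periods=3, freq="D", name="Date")
df = pd.DataFrame({"Close": [10.0, 10.5, 10.2]}, index=dates)

print(list(df.columns))   # 'Date' is not a column yet

df = df.reset_index()     # move the index into a regular column
print(list(df.columns))   # 'Date' is now the first column
```

After this step, to_excel(..., index=False) keeps the dates because they are ordinary column data.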

Related

Format a column for excel date format from Format to Date within python script

df.to_excel(filename, index = False)
This is my current code for exporting a data frame to Excel. Columns 0 and 2 contain dates, and I need these columns to be written in an Excel date format. I have read about this online, but I don't understand how to achieve it.
Can someone please explain?
import pandas as pd
from datetime import datetime
# hold is the data frame being exported
writer = pd.ExcelWriter("testing.xlsx", datetime_format='dd/mm/yyyy')
hold.to_excel(writer, sheet_name="Sheet1", index = False)
writer.close()
print("done")
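One thing worth checking, as a hedged sketch: datetime_format only affects cells that hold real datetime values, so date columns stored as text need converting first. The frame, its column names, and the helper export_with_dates below are made up for illustration. Writing the file also needs an Excel engine such as xlsxwriter or openpyxl to be installed.

```python
import pandas as pd

# Hypothetical frame: columns 0 and 2 hold dates as dd/mm/yyyy text.
hold = pd.DataFrame({
    "start": ["01/02/2021", "15/03/2021"],
    "name": ["a", "b"],
    "end": ["05/02/2021", "20/03/2021"],
})

# Convert the date columns (positions 0 and 2) to real datetimes.
for col in hold.columns[[0, 2]]:
    hold[col] = pd.to_datetime(hold[col], format="%d/%m/%Y")

print(hold.dtypes)


def export_with_dates(frame, path):
    """Write frame to path, displaying datetime cells as dd/mm/yyyy."""
    with pd.ExcelWriter(path, datetime_format="dd/mm/yyyy") as writer:
        frame.to_excel(writer, sheet_name="Sheet1", index=False)
```

Calling export_with_dates(hold, "testing.xlsx") would then write the converted columns as Excel dates rather than as text.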

add column to a dataframe in python pandas

How do I loop through my Excel sheets and add each 'Adjusted Close' to a dataframe? I want to combine all the adjusted closes into a stock index.
When I try the code below, the dataframe Percent_Change is empty.
xls = pd.ExcelFile('databas.xlsx')
countSheets = len(xls.sheet_names)
Percent_Change = pd.DataFrame()
x = 0
for x in range(countSheets):
    data = pd.read_excel('databas.xlsx', sheet_name=x, index_col='Date')
    # Calculate the percent change from day to day
    Percent_Change[x] = pd.Series(data['Adj Close'].pct_change()*100, index=Percent_Change.index)
stock_index = data['Percent_Change'].cumsum()
Unfortunately, I do not have the data to replicate your complete example. However, there appears to be a bug in your code.
You are looping over "x", which takes integer positions from range(countSheets). You probably want to loop over the sheet names and add each one to your DF as a column. If you want to do that, your code should be:
import pandas as pd
xls = pd.ExcelFile('databas.xlsx')
# PEP 8 convention: use snake_case rather than camelCase, and avoid longer names where possible
sheets = xls.sheet_names
Percent_Change = pd.DataFrame()
# using sheet instead of x is more "pythonic"
for sheet in sheets:
    data = pd.read_excel('databas.xlsx', sheet_name=sheet, index_col='Date')
    # Calculate the percent change from day to day.
    # Do not reindex to Percent_Change.index: it is empty at first, which leaves the column empty.
    Percent_Change[sheet] = data['Adj Close'].pct_change()*100
# Assumption: the index is the cumulative sum of the average daily change across sheets
stock_index = Percent_Change.mean(axis=1).cumsum()
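A likely reason the frame stays empty, shown as a small sketch on made-up prices (the ticker name AAA and the values are hypothetical): building a Series with index=Percent_Change.index reindexes it against an index that has no rows yet, so every value is discarded.

```python
import pandas as pd

# Hypothetical prices for one sheet, indexed by date.
dates = pd.date_range("2021-01-04", periods=3, freq="D", name="Date")
prices = pd.Series([100.0, 110.0, 121.0], index=dates)
change = prices.pct_change() * 100

empty = pd.DataFrame()

# Reindexing against the empty frame's index throws every row away.
reindexed = pd.Series(change, index=empty.index)
print(len(reindexed))

# Assigning the series directly lets the frame adopt the dates.
filled = pd.DataFrame()
filled["AAA"] = change
print(filled)
```

Later columns assigned the same way are aligned on those dates.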

Subtract cell in one column from the one prior in xlsx file in Python

I have an xlsx file with multiple sheets. In each sheet, column A holds time stamps (as strings). I need the difference between each cell and the one above it to see how much time has elapsed.
For example, COLUMN A:
02/23/2017 08:25:39
02/23/2017 08:55:56
02/23/2017 08:55:57
02/23/2017 08:56:12
Here is what I have so far. Thank you in advance.
import xlrd
from datetime import datetime
def open_file(path):
    # Open and read an Excel file
    book = xlrd.open_workbook(path)
    # get the first worksheet
    first_sheet = book.sheet_by_index(0)
    # read first column
    column_values = first_sheet.col_values(0,0)
    column_list = []
    for i in column_values:
        i = datetime.strptime(i, '%m/%d/%Y %H:%M:%S')
        column_list.append(i)
    print(column_list[1] - column_list[0])
if __name__ == "__main__":
    path = '02-23-2017.xlsx'
    open_file(path)
You may want to check out pandas. It handles calculations like this quickly.
import pandas as pd
# create a dictionary of data frames, one for each sheet (sheet_name=None reads every sheet)
df_dict = pd.read_excel('C:/path/to/file.xlsx', sheet_name=None, header=None)
# iterate over each data frame
for df_key in df_dict:
    # pull the time data from the first column
    t = pd.to_datetime(df_dict[df_key].iloc[:,0])
    # calculate the time difference using .diff(1); fillna makes the first cell a zero timedelta
    dt = t.diff(1).fillna(pd.Timedelta(0))
    # assign the difference to a new column in the data frame
    df_dict[df_key]['time_delta'] = dt
# create a writer to make a new excel file
writer = pd.ExcelWriter('C:/path/to/new_file.xlsx')
# write each sheet to file
for name, df in df_dict.items():
    df.to_excel(writer, sheet_name='sheet{}'.format(name))
# close the writer to save the file (writer.save() in older pandas versions)
writer.close()
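To see the diff step in isolation, here is a small sketch that runs it on the four sample time stamps from the question, with no Excel file involved:

```python
import pandas as pd

# The sample time stamps from the question, as strings.
stamps = pd.Series([
    "02/23/2017 08:25:39",
    "02/23/2017 08:55:56",
    "02/23/2017 08:55:57",
    "02/23/2017 08:56:12",
])

# Parse the strings, then subtract each row from the one above it.
times = pd.to_datetime(stamps, format="%m/%d/%Y %H:%M:%S")
elapsed = times.diff().fillna(pd.Timedelta(0))

print(elapsed)
```

The first row has nothing above it, so it is filled with a zero timedelta. The remaining rows are 30 minutes 17 seconds, 1 second, and 15 seconds.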
Based on your existing code, you could do the following:
import xlrd
from datetime import datetime
def open_file(path):
    # Open and read an Excel file
    book = xlrd.open_workbook(path)
    # Open each sheet
    for sheet in book.sheet_names():
        current_sheet = book.sheet_by_name(sheet)
        # Read first column and convert to datetime objects
        column_values = [datetime.strptime(i, '%m/%d/%Y %H:%M:%S') for i in current_sheet.col_values(0, 0)]
        # Create a list of timedelta differences
        cur = column_values[0]
        differences = []
        for i in column_values[1:]:
            differences.append(i - cur)
            cur = i
        print(sheet)
        for d in differences:
            print(" {}".format(d))
if __name__ == "__main__":
    path = '02-23-2017.xlsx'
    open_file(path)
Assuming each of the sheets has the same format, this would give you something like:
Sheet1
0:30:17
0:00:01
0:00:15
Sheet2
0:30:17
0:00:01
0:00:15
Sheet3
0:30:17
0:00:01
0:00:15

Saving to_csv read just the columns, not the rows

I'm stuck reading all the rows of a csv file and saving the results into another csv file (I'm using pandas 0.17.1).
I have a list of tickers in a csv file, one ticker per column, like this:
Column A: AAPL / Column B:TSLA / Column C: EXPD... and so on.
Now I have to add 3000 new tickers to this list, so I changed the orientation of the csv, putting one ticker per row in the first column, like this:
Column A
AAPL
TSLA
EXPD
...and so on.
The issue is that when I save the result to a csv file, only the first row is read and nothing else.
In my example, if the first row is "AAPL", I get a csv file that contains only the data for AAPL.
This is my code:
symbols_list = pd.read_csv('/home/andrea/htrade/python/titoli_rows.csv')
symbols = []
for ticker in symbols_list:
    r = DataReader(ticker, "yahoo",
                   start=datetime.datetime.now() - BDay(20),
                   end=datetime.datetime.now())
    # add a symbol column
    r['Symbol'] = ticker
    symbols.append(r)
# concatenate all the dfs
df = pd.concat(symbols)
#define cell with the columns that i need
cell = df[['Symbol', 'Open', 'High', 'Low', 'Adj Close', 'Volume']]
cell.reset_index().sort_values(['Symbol', 'Date'], ascending=[1, 0]).set_index('Symbol').to_csv('/home/andrea/Dropbox/HT/stock20.csv', date_format='%d/%m/%Y')
Why is it that when I put one ticker in each column the csv contains the data for every ticker, but when I put one ticker in each row only the first row is read?
I already checked whether the read_csv function reads the csv correctly, and it does, so I don't understand why it isn't processing all of the tickers.
I just ran the code below, and with a short list of symbols imported via read_csv it seemed to work fine:
from datetime import datetime
import pandas as pd
# pandas.io.data was moved into the separate pandas_datareader package
from pandas_datareader import data as web
from pandas.tseries.offsets import BDay
# path_to_file: csv with the tickers in one column under the header 'symbols'
df = pd.read_csv(path_to_file).loc[:, ['symbols']].dropna().squeeze()
symbols = []
for ticker in df.tolist():
    r = web.DataReader(ticker, "yahoo",
                       start= datetime.now() - BDay(20),
                       end= datetime.now())
    r['Symbol'] = ticker
    symbols.append(r)
df = pd.concat(symbols).drop('Close', axis=1)
cell= df[['Symbol','Open','High','Low','Adj Close','Volume']]
cell.reset_index().sort_values(['Symbol', 'Date'], ascending=[1,0]).set_index('Symbol').to_csv(path_to_file, date_format='%d/%m/%Y')
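A likely explanation for the difference, as a small sketch: iterating over a DataFrame yields its column labels, not its rows. With one ticker per row and no header line, read_csv treats the first ticker as the header, so the loop sees only that one label. The csv text below is a made-up stand-in for the real file.

```python
import io
import pandas as pd

# Hypothetical file contents: one ticker per row, no header line.
csv_text = "AAPL\nTSLA\nEXPD\n"

# Default read: the first row becomes the column header.
symbols_list = pd.read_csv(io.StringIO(csv_text))
labels = [ticker for ticker in symbols_list]   # iterates column labels only
print(labels)

# Reading without a header and taking the first column gives every ticker.
tickers = pd.read_csv(io.StringIO(csv_text), header=None)[0].tolist()
print(tickers)
```

Looping over tickers instead of over the frame itself then requests data for each symbol.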

How to write data to excel using python for stock data being pulled from yahoo

I have the following code, which pulls daily data for a number of stocks that I have stored in a worksheet. What I was hoping to accomplish was to have the daily data returned and stored in another worksheet.
I am struggling to write code that accomplishes this. Currently I can pull the data for each individual stock, but I have no way of storing it. Any help will be appreciated. For the sake of testing I only tried to store Open and Close; ideally I would like all the parameters from Yahoo Finance to be stored.
import numpy as np
import pandas as pd
import xlsxwriter
df=pd.read_csv('Stock Companies Modified.csv', sep=',',header=True)
df.columns = ['StockSymbol', 'CompanyName', 'ClosingPrice', 'MarketCap', 'IPOYear', 'Sector', 'Industry']
workbook = xlsxwriter.Workbook('New Workbook.xlsx')
worksheet = workbook.add_worksheet()
df = df.convert_objects(convert_numeric=True)
df.dtypes
from pandas.io.data import DataReader
from datetime import datetime
for x in df.StockSymbol:
    if len(x)<=4:
        ClosingPrice = DataReader(x, 'yahoo', datetime(2015,1,1), datetime(2015,7,1))
        row = 0
        col = 0
        #This is the area where I am getting an error, and to be honest I dont know how to do it correctly
        for Open, Close in (ClosingPrice):
            worksheet.write(row, col, (ClosingPrice['Open']))
            worksheet.write(row,col+1,(ClosingPrice['Close']))
            row+=1
        workbook.close()
        print x
    else:
        print("This is not working")
I've yet to find a clean way to append data to sheets with xlsxwriter, so I typically create a temporary dataframe holding all of the new values plus the contents of the current sheet, if it exists, and then overwrite the sheet. I would definitely prefer to append to sheets as you attempted, but it doesn't seem possible.
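import pandas as pd
# pandas.io.data was moved into the separate pandas_datareader package
from pandas_datareader.data import DataReader
from datetime import datetime
symbols = ['GOOG','AAPL']
try:
    df = pd.read_excel('NewFile.xlsx')
except FileNotFoundError:
    # no existing file yet, so start from an empty frame
    df = pd.DataFrame()
for symbol in symbols:
    ClosingPrice = DataReader(symbol, 'yahoo', datetime(2015,1,1), datetime(2015,9,1))
    ClosingPrice = ClosingPrice.reset_index()
    ClosingPrice['Symbol'] = symbol
    # pd.concat replaces DataFrame.append, which pandas 2.0 removed
    df = pd.concat([df, ClosingPrice])
writer = pd.ExcelWriter('NewFile.xlsx', engine='xlsxwriter')
df.to_excel(writer,sheet_name='Sheet1',index=False)
# close() saves the file (writer.save() in older pandas versions)
writer.close()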
If you later append to this same file, the same pattern works:
df = pd.read_excel('NewFile.xlsx')
symbols = ['G']
for symbol in symbols:
    ClosingPrice = DataReader(symbol, 'yahoo', datetime(2015,1,1), datetime(2015,9,1))
    ClosingPrice = ClosingPrice.reset_index()
    ClosingPrice['Symbol'] = symbol
    # pd.concat replaces DataFrame.append, which pandas 2.0 removed
    df = pd.concat([df, ClosingPrice])
writer = pd.ExcelWriter('NewFile.xlsx', engine='xlsxwriter')
df.to_excel(writer,sheet_name='Sheet1',index=False)
# close() saves the file (writer.save() in older pandas versions)
writer.close()
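The combine step can be tried without a network connection or an Excel engine. This is only a sketch: fake_prices is a hypothetical stand-in for the DataReader call, and its dates and prices are made up.

```python
import pandas as pd


def fake_prices(start):
    """Hypothetical stand-in for a downloaded frame (dates in the index)."""
    dates = pd.date_range(start, periods=2, freq="D", name="Date")
    return pd.DataFrame({"Open": [1.0, 2.0], "Close": [1.5, 2.5]}, index=dates)


frames = []
for symbol in ["GOOG", "AAPL"]:
    prices = fake_prices("2015-01-01").reset_index()  # dates become a column
    prices["Symbol"] = symbol                          # tag rows with the ticker
    frames.append(prices)

# One concat at the end instead of growing the frame inside the loop.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Collecting the frames in a list and concatenating once avoids copying the growing frame on every iteration.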
What is the error you are getting? Have you tried Pandas dataframe.to_excel?
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
