In my CSV file, I save my dates in this format: "yyyy-mm-dd".
Every time I pull the data from the CSV into a pandas DataFrame, the format is reset to "yyyy/mm/dd" in my CSV file. This causes errors when I test my code again, so I have to open the CSV and reformat the date column to yyyy-mm-dd each time.
Do you know why this happens? Is there a permanent solution to make sure my date format doesn't reset every time pandas reads my CSV file?
Here is some of my code directly related to reading my csv file:
origindf = pd.read_csv('testlist.csv')
origindf = pd.DataFrame(origindf, columns=["ticker","date"])
origintickers = origindf['ticker'].values.tolist()
origintickersiterate = origindf['ticker'].values.tolist()
origindates = origindf['date'].values.tolist()
masterdf = pd.DataFrame(columns = ['ticker', 'date', 'time', 'vol', 'vwap', 'open', 'high', 'low','close','trades'])
for ticker in origintickersiterate:
    polygonapi = 'xxxxxxxxxxxxxxx'
    limit = 10000
    multiplier = 1
    timespan = 'minute'
    adjusted = 'False'
    theticker = origintickers.pop()
    thedate = origindates.pop()
pd.read_csv only reads the file and never writes back to it, so reading cannot change the dates stored in the CSV. Assuming that pandas recognises the original text as dates, it represents them as datetime64[ns], which is not text, so how they display on screen (e.g. with df.head()) is irrelevant. You can check the column types with df.dtypes to make sure.
The pandas to_csv method lets you control the format of the output dates with the date_format parameter, e.g.:
df.to_csv('testlist.csv', date_format='%Y-%m-%d')
I suggest viewing the output in a text editor, because Excel parses the dates when it opens the file and may convert them to its own format.
Current documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
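Here is a minimal sketch of the round trip described above. It assumes the date column is named date, as in the question, and uses made-up in-memory data instead of testlist.csv so it can run anywhere. parse_dates is what asks read_csv to convert the text to datetime64[ns]; without it the column would normally stay as plain strings.

```python
import io

import pandas as pd

# Made-up sample standing in for testlist.csv
csv_text = "ticker,date\nAAPL,2020-01-02\nTSLA,2020-03-15\n"

# Ask pandas to parse the date column while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Confirm the column is a datetime type rather than text
is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])

# Write back out with an explicit date format
out = io.StringIO()
df.to_csv(out, index=False, date_format="%Y-%m-%d")
written = out.getvalue()
```

Printing written, or opening the real output file in a text editor, shows the dates exactly as date_format specifies.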
I want to download stock data into a single Excel file, but I am struggling to get the date column into my stock database.
Goal - Picture example
Can someone help me find out why I am getting these results, and how I can fix this?
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
file = str(input('Enter File Name - '))
download_sourrce = (r"C:\Users\vladt\OneDrive\Рабочий стол\Stock\{}.xlsx".format(file))
writer = pd.ExcelWriter(download_sourrce, engine='xlsxwriter')
should_continue = True
while should_continue:
    x = str(input('Stock? - '))
    start = dt.datetime(2020,1,1)
    end = dt.datetime.today()
    df = pdr.get_data_yahoo(x,start,end)
    df.to_excel(writer, sheet_name=x, index=False)
    should_continue = input('Add new stock? [Y/N]').upper() == 'Y'
writer.save()
Try the following before saving to Excel:
df = df.reset_index()
Stock data feeds from pandas-datareader (Yahoo or the other providers it supports), and from APIs or packages supplied by brokers, commonly set the date column as the row index. Because the code writes with index=False, that index is left out of the Excel file.
To get a date series that is stored as the row index into an ordinary data column, you have to reset the index.
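To illustrate, here is a small sketch with a hand-built frame standing in for the data reader output. The index name Date and the Close values are made up for the example; real feeds may name the index differently.

```python
import pandas as pd

# Stand-in for a price frame whose dates live in the row index
dates = pd.date_range("2020-01-01", periods=3, freq="D", name="Date")
df = pd.DataFrame({"Close": [10.0, 10.5, 11.0]}, index=dates)

# reset_index moves the index into a regular column called "Date"
flat = df.reset_index()
columns = list(flat.columns)
```

After this, flat.to_excel(writer, sheet_name=x, index=False) writes the Date column along with the prices.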
My goal for this script is to loop through hundreds of Excel and CSV files and keep only the files that have "cash" in the file name, are CSV files, and fall between two dates, which appear in the file name formatted as YYYYMMDD. Once they are located, the script should pull the second row through the last row of the first worksheet and append them to a master worksheet.
This is all I have been able to put together so far, but the code errors out. Any help is appreciated. Thank you in advance!
import pandas as pd
from glob import glob
from datetime import datetime
path = r'\\base\sub1\sub2\sub3\sub4\sub5\sub6\SearchFolder'
base_date = datetime(2020, 10, 1, 00, 00)
dates = pd.date_range(base_date, periods=92).tolist()
dates = [i.strftime("%Y%m%d") for i in dates]
list_of_csvs = glob(path+'*.csv')
print (list_of_csvs)
csvs_to_keep = []
for csv in list_of_csvs:
    if 'cash activities' in str(csv).lower():
        print ('cash activities found')
        for date in dates:
            if str(date) in csv:
                print (csv)
                csvs_to_keep.append(csv)
master_df = pd.DataFrame()
for csv in csvs_to_keep:
    df = pd.read_csv(csv)
    master_df = master_df.append(df)
master_df.to_excel('master_file.xlsx', index=False)
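One possible way to do the file selection is sketched below. It is only a sketch under some assumptions: the helper name select_files and the sample file names are made up, and the date is assumed to appear in the file name as an 8-digit YYYYMMDD token. Note also that glob(path+'*.csv') puts no separator between the folder and the pattern, so it probably does not look inside SearchFolder at all; os.path.join(path, '*.csv') is likely what was intended. In recent pandas versions DataFrame.append is gone, and pd.concat does the same job.

```python
import re
from datetime import datetime
from pathlib import Path


def select_files(paths, keyword, start, end):
    """Keep CSV paths whose name contains keyword and a YYYYMMDD date in [start, end]."""
    kept = []
    for p in map(Path, paths):
        name = p.name.lower()
        # Skip anything that is not a CSV or lacks the keyword
        if p.suffix.lower() != ".csv" or keyword not in name:
            continue
        # Try every 8-digit run in the name as a candidate date
        for token in re.findall(r"\d{8}", name):
            try:
                stamp = datetime.strptime(token, "%Y%m%d")
            except ValueError:
                continue  # eight digits that are not a valid date
            if start <= stamp <= end:
                kept.append(str(p))
                break
    return kept


# Hypothetical file names for illustration only
sample = [
    "cash activities 20201015.csv",
    "cash activities 20200115.csv",
    "cash activities 20201015.xlsx",
    "positions 20201015.csv",
]
selected = select_files(sample, "cash", datetime(2020, 10, 1), datetime(2020, 12, 31))
```

The selected paths can then be read with pd.read_csv and combined in one step with pd.concat before writing the master file.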
The code is super simple, and the content of the cell is exactly what I'm writing in the code. I'm just trying to get the row numbers for all of the rows where the ticker column = A.
Code:
import pandas as pd
filename = 'DataFiles/SHARADAR_SF1_aafe962511a67db10c0a72fe536305b0.csv'
pattern = 'AAPL'
df = pd.read_csv(filename, index_col=0)
rows = df[df['ticker'] == pattern].index.to_list()
Example of the CSV (There are more tickers later in the file, for example AAPL or TSLA etc.):
ticker,dimension,calendardate,datekey,lastupdated,assets,assetsavg,cashneq,debt,debtc,debtusd,divyield,deposits,eps,epsusd,equity,equityavg,liabilities,netinc,pe,price,revenue
A,ARQ,1999-12-31,2000-03-15,2020-09-01,7107000000,,1368000000,665000000,111000000,665000000,0,0,0.3,0.3,4486000000,,2621000000,131000000,,114.3,2246000000
A,ARQ,2000-03-31,2000-06-12,2020-09-01,7321000000,,978000000,98000000,98000000,98000000,0,0,0.37,0.37,4642000000,,2679000000,166000000,,66,2485000000
A,ARQ,2000-06-30,2000-09-01,2020-09-01,7827000000,,703000000,129000000,129000000,129000000,0,0,0.34,0.34,4902000000,,2925000000,155000000,46.877,61.88,2670000000
A,ARQ,2000-09-30,2001-01-17,2020-09-01,8425000000,,996000000,110000000,110000000,110000000,0,0,0.67,0.67,5265000000,,3160000000,305000000,37.341,61.94,3372000000
A,ARQ,2000-12-31,2001-03-19,2020-09-01,9208000000,,433000000,556000000,556000000,556000000,0,0,0.34,0.34,5541000000,,3667000000,154000000,21.661,36.99,2841000000
With index_col=0, pandas uses the first column (ticker) as the index, so df['ticker'] no longer exists as a column. Pass index_col=None (the default) instead:
import pandas as pd
filename = 'DataFiles/SHARADAR_SF1_aafe962511a67db10c0a72fe536305b0.csv'
pattern = 'AAPL'
df = pd.read_csv(filename, index_col=None)
rows = df[df['ticker'] == pattern].index.to_list()
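The difference can be checked on a tiny in-memory sample. This is just an illustrative sketch: the three rows below are a cut-down, partly made-up stand-in for the real file, keeping only a few of its columns.

```python
import io

import pandas as pd

# Cut-down stand-in for the CSV (the AAPL row is invented for the example)
csv_text = "ticker,dimension,price\nA,ARQ,114.3\nA,ARQ,66\nAAPL,ARQ,25.5\n"

# With index_col=0 the ticker values become the index, not a column
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
has_ticker_column = "ticker" in indexed.columns

# With index_col=None the default integer index is kept
df = pd.read_csv(io.StringIO(csv_text), index_col=None)
rows = df[df["ticker"] == "A"].index.to_list()
```

Here rows holds positions counted from the first data row, starting at 0, so add 2 if you want the line number in the file including the header.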
I have the following code, which pulls daily data for a number of stocks I have stored in a worksheet. What I was hoping to accomplish was to have the daily data returned and stored in another worksheet.
I am struggling to write code that accomplishes this task. Currently I am able to pull the data for each of the individual stocks, but I have no way of storing this information. Any help will be appreciated. For the sake of testing I only tried to store Open and Close; ideally I would like all the parameters from Yahoo Finance to be stored.
import numpy as np
import pandas as pd
import xlsxwriter
df=pd.read_csv('Stock Companies Modified.csv', sep=',',header=True)
df.columns = ['StockSymbol', 'CompanyName', 'ClosingPrice', 'MarketCap', 'IPOYear', 'Sector', 'Industry']
workbook = xlsxwriter.Workbook('New Workbook.xlsx')
worksheet = workbook.add_worksheet()
df = df.convert_objects(convert_numeric=True)
df.dtypes
from pandas.io.data import DataReader
from datetime import datetime
for x in df.StockSymbol:
    if len(x)<=4:
        ClosingPrice = DataReader(x, 'yahoo', datetime(2015,1,1), datetime(2015,7,1))
        row = 0
        col = 0
        #This is the area where I am getting an error, and to be honest I dont know how to do it correctly
        for Open, Close in (ClosingPrice):
            worksheet.write(row, col, (ClosingPrice['Open']))
            worksheet.write(row,col+1,(ClosingPrice['Close']))
            row+=1
        workbook.close()
        print x
    else:
        print("This is not working")
I've yet to find a clean way to append data to sheets with xlsxwriter, so I typically build a temporary DataFrame holding all of the values, including the current sheet if it exists, and then overwrite the file. I would definitely prefer to append to sheets as you attempted, but it doesn't seem possible.
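import pandas as pd
# pandas.io.data was removed from pandas; DataReader now lives in pandas_datareader
from pandas_datareader.data import DataReader
from datetime import datetime
symbols = ['GOOG','AAPL']
try:
    # start from the existing sheet if the file is already there
    df = pd.read_excel('NewFile.xlsx')
except FileNotFoundError:
    df = pd.DataFrame()
for symbol in symbols:
    ClosingPrice = DataReader(symbol, 'yahoo', datetime(2015,1,1), datetime(2015,9,1))
    # move the date index into a regular column so it is written out
    ClosingPrice = ClosingPrice.reset_index()
    ClosingPrice['Symbol'] = symbol
    # pd.concat replaces DataFrame.append, which recent pandas versions no longer have
    df = pd.concat([df, ClosingPrice])
# overwrite the file with the combined data; the context manager saves on exit
with pd.ExcelWriter('NewFile.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)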
The same approach works if you later append to this same file:
df = pd.read_excel('NewFile.xlsx')
symbols = ['G']
for symbol in symbols:
    ClosingPrice = DataReader(symbol, 'yahoo', datetime(2015,1,1), datetime(2015,9,1))
    ClosingPrice = ClosingPrice.reset_index()
    ClosingPrice['Symbol'] = symbol
    # pd.concat replaces DataFrame.append, which recent pandas versions no longer have
    df = pd.concat([df, ClosingPrice])
# the context manager saves the file on exit, so no separate save call is needed
with pd.ExcelWriter('NewFile.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
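The read, combine, and overwrite pattern can be tried without a network connection or an Excel engine. The sketch below is only an illustration under those assumptions: it uses CSV in a temporary folder instead of Excel, the helper name append_and_overwrite is made up, and the prices are dummy values.

```python
import os
import tempfile

import pandas as pd


def append_and_overwrite(path, new_rows):
    """Load the existing file if present, add new_rows, and rewrite the whole file."""
    frames = [new_rows]
    if os.path.exists(path):
        # Put the rows already on disk ahead of the new ones
        frames.insert(0, pd.read_csv(path))
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(path, index=False)
    return combined


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "prices.csv")
    first = pd.DataFrame({"Symbol": ["GOOG", "AAPL"], "Close": [1.0, 2.0]})
    second = pd.DataFrame({"Symbol": ["G"], "Close": [3.0]})
    append_and_overwrite(target, first)   # file does not exist yet
    result = append_and_overwrite(target, second)  # file exists, rows are added
    symbols_on_disk = pd.read_csv(target)["Symbol"].tolist()
```

Swapping read_csv and to_csv for read_excel and to_excel gives the Excel version shown above.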
What is the error you are getting? Have you tried Pandas dataframe.to_excel?
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html