Saving with to_csv reads only the columns, not the rows - Python

I'm stuck reading all the rows of a CSV file and saving the results to a CSV file (I'm using pandas 0.17.1).
I have a list of tickers in a CSV file, one ticker per column, like this:
Column A: AAPL / Column B: TSLA / Column C: EXPD... and so on.
Now I have to add 3000 new tickers to this list, so I changed the orientation of the CSV, putting each ticker in its own row of the first column, like this:
Column A
AAPL
TSLA
EXPD
...and so on.
The issue is: when I save the result to a CSV file, only the first row has been read, and nothing else.
In my example, if I have "AAPL" in the first row, I get a CSV file that contains only the data for AAPL.
This is my code:
import datetime
import pandas as pd
from pandas.io.data import DataReader
from pandas.tseries.offsets import BDay
symbols_list = pd.read_csv('/home/andrea/htrade/python/titoli_rows.csv')
symbols = []
for ticker in symbols_list:
    r = DataReader(ticker, "yahoo",
                   start=datetime.datetime.now() - BDay(20),
                   end=datetime.datetime.now())
    # add a symbol column
    r['Symbol'] = ticker
    symbols.append(r)
# concatenate all the dfs
df = pd.concat(symbols)
# define cell with the columns that I need
cell = df[['Symbol', 'Open', 'High', 'Low', 'Adj Close', 'Volume']]
cell.reset_index().sort_values(['Symbol', 'Date'], ascending=[1, 0]).set_index('Symbol').to_csv('/home/andrea/Dropbox/HT/stock20.csv', date_format='%d/%m/%Y')
Why does the output CSV contain the data for every ticker when I put one ticker in each column, but only the first ticker when I put one ticker in each row?
I already checked that read_csv reads the CSV correctly, and it does, so I don't understand why not all of the tickers are processed.

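I just ran the code below with a short list of symbols imported via read_csv, and it seemed to work fine. The key difference is how the tickers are iterated: looping directly over a DataFrame (for ticker in symbols_list) yields its column labels, not its rows. With one ticker per column, read_csv uses that first line as the header, so the column labels are exactly your tickers; with one ticker per row, only the first ticker becomes a column label, so the loop runs once. Below, the file has a header row named symbols, and the loop iterates over the values of that column: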
from datetime import datetime
import pandas as pd
# pandas.io.data has since moved to the separate pandas-datareader package
import pandas_datareader.data as web
from pandas.tseries.offsets import BDay
# path_to_file: the ticker CSV, with a header row named 'symbols'
tickers = pd.read_csv(path_to_file)['symbols'].dropna()
symbols = []
for ticker in tickers.tolist():
    r = web.DataReader(ticker, "yahoo",
                       start=datetime.now() - BDay(20),
                       end=datetime.now())
    r['Symbol'] = ticker
    symbols.append(r)
df = pd.concat(symbols).drop('Close', axis=1)
cell = df[['Symbol', 'Open', 'High', 'Low', 'Adj Close', 'Volume']]
# output_path: placeholder for the destination CSV, kept separate so the ticker list is not overwritten
cell.reset_index().sort_values(['Symbol', 'Date'], ascending=[1, 0]).set_index('Symbol').to_csv(output_path, date_format='%d/%m/%Y')
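A minimal offline sketch of why the loop runs only once, assuming io.StringIO as a stand-in for titoli_rows.csv and using the three tickers from the question. No prices are downloaded; it only shows what read_csv produces for each layout and what iterating over the result yields.

```python
import io

import pandas as pd

# Old layout: one ticker per column. read_csv makes the first line the header,
# so the tickers become the column labels (and there are no data rows).
wide = pd.read_csv(io.StringIO("AAPL,TSLA,EXPD\n"))
print(list(wide))   # iterating a DataFrame yields column labels: ['AAPL', 'TSLA', 'EXPD']

# New layout: one ticker per row, with no header line.
rows_csv = "AAPL\nTSLA\nEXPD\n"
tall = pd.read_csv(io.StringIO(rows_csv))
print(list(tall))   # ['AAPL']: the first ticker was consumed as the header

# header=None keeps every line as data; the single column is labelled 0.
tickers = pd.read_csv(io.StringIO(rows_csv), header=None)[0].tolist()
print(tickers)      # ['AAPL', 'TSLA', 'EXPD']
```

Either giving the file a header line (as in the answer's symbols column) or reading it with header=None and iterating over the column's values avoids the problem.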

Related

"df = pd.read_csv('xxx.csv')" resets the date format in my csv file

In my csv file, I save my dates in this format: "yyyy-mm-dd".
Every time I pull the data from the CSV into a pandas DataFrame, the date format in my CSV file gets reset to "yyyy/mm/dd". This causes errors when I test my code again, so I have to open the CSV and reformat the date column to yyyy-mm-dd again.
Do you know why this happens? Is there a permanent solution to make sure my date format doesn't reset every time pandas reads my CSV file?
Here is some of my code directly related to reading my csv file:
import pandas as pd
origindf = pd.read_csv('testlist.csv')
origindf = pd.DataFrame(origindf, columns=["ticker", "date"])
origintickers = origindf['ticker'].values.tolist()
origintickersiterate = origindf['ticker'].values.tolist()
origindates = origindf['date'].values.tolist()
masterdf = pd.DataFrame(columns=['ticker', 'date', 'time', 'vol', 'vwap', 'open', 'high', 'low', 'close', 'trades'])
for ticker in origintickersiterate:
    polygonapi = 'xxxxxxxxxxxxxxx'
    limit = 10000
    multiplier = 1
    timespan = 'minute'
    adjusted = 'False'
    theticker = origintickers.pop()
    thedate = origindates.pop()
read_csv only reads the file; it never writes to it, so if the dates on disk change, something else is saving the file (for example a to_csv call, or Excel re-saving it). If pandas parses the column as dates (e.g. with parse_dates=['date']), it stores them as datetime64[ns], which is not text, so how they display on screen, e.g. with df.head(), is irrelevant. You can check the data types with df.dtypes to make sure.
Pandas to_csv lets you control the format of dates in the output with the date_format parameter, e.g.:
df.to_csv('testlist.csv', date_format='%Y-%m-%d')
I suggest viewing the output in a text editor, because Excel will parse the dates and may convert them.
Current documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
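A small sketch of the point above, using a made-up two-row sample with the question's ticker and date columns in an in-memory string instead of testlist.csv. It shows that the date column is only a datetime type when parse_dates is used, and that the text format is decided by date_format at write time (to_csv returns a string when no path is given).

```python
import io

import pandas as pd

# Made-up sample with the same columns as testlist.csv.
csv_text = "ticker,date\nAAPL,2021-03-01\nTSLA,2021-03-02\n"

# Without parse_dates the column is kept as text, not as dates.
raw = pd.read_csv(io.StringIO(csv_text))
print(pd.api.types.is_datetime64_any_dtype(raw["date"]))     # False

# With parse_dates the column holds datetime values, which have no text format of their own.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(pd.api.types.is_datetime64_any_dtype(parsed["date"]))  # True

# The format is applied only when writing.
print(parsed.to_csv(index=False, date_format="%Y-%m-%d"))
print(parsed.to_csv(index=False, date_format="%Y/%m/%d"))
```

Reading the file never changes it; only the write step, with whatever date_format it is given, decides what ends up on disk.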

Downloading stock data via Python to excel (missing date column)

I want to download stock data into a single Excel file, but the output is missing the date column for each stock.
Goal: [picture example of the desired output]
Can someone help me find why I'm getting these results, and how I can fix this?
import pandas as pd
import pandas_datareader as pdr
import datetime as dt
file = str(input('Enter File Name - '))
download_sourrce = (r"C:\Users\vladt\OneDrive\Рабочий стол\Stock\{}.xlsx".format(file))
writer = pd.ExcelWriter(download_sourrce, engine='xlsxwriter')
should_continue = True
while should_continue:
    x = str(input('Stock? - '))
    start = dt.datetime(2020, 1, 1)
    end = dt.datetime.today()
    df = pdr.get_data_yahoo(x, start, end)
    df.to_excel(writer, sheet_name=x, index=False)
    should_continue = input('Add new stock? [Y/N]').upper() == 'Y'
writer.save()
Try the following before saving to Excel:
df = df.reset_index()
Stock data from pandas-datareader (from Yahoo or the other sources it supports), and from many other broker APIs and packages, commonly comes with the date as the row index rather than as a column.
Because the code writes with index=False, that index, and with it the dates, is dropped. Calling reset_index() first moves the dates into a regular column, so they are written to the sheet.
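A short sketch of that effect with made-up prices standing in for a datareader result (a DatetimeIndex named "Date" is assumed, as Yahoo data usually has). It uses to_csv so it runs without an Excel engine; to_excel takes the same index argument.

```python
import pandas as pd

# Hypothetical stand-in for a pandas-datareader result: two days of prices
# indexed by a DatetimeIndex named "Date".
prices = pd.DataFrame(
    {"Open": [10.0, 11.0], "Close": [10.5, 11.5]},
    index=pd.DatetimeIndex(["2020-01-02", "2020-01-03"], name="Date"),
)

# index=False drops the index, and the dates go with it.
print(prices.to_csv(index=False))

# reset_index() moves the "Date" index into an ordinary first column.
flat = prices.reset_index()
print(list(flat.columns))   # ['Date', 'Open', 'Close']
print(flat.to_csv(index=False))
```

In the question's loop this means writing df.reset_index().to_excel(writer, sheet_name=x, index=False), or keeping the index with index=True.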

Python Iterate through Folder of CSV Workbooks and Append only Workbook Names with a Key Word and Date Range to Master Sheet

My goal for this script is to loop through hundreds of Excel and CSV files and find only the CSV files whose names contain "cash" and a date, formatted YYYYMMDD, between two dates. For each match, the script should take everything from the second row of the first worksheet down to the last row and append it to a master worksheet.
This is all I was able to put together so far, but the code errors out. Any help is appreciated. Thank you in advance!
import pandas as pd
from glob import glob
from datetime import datetime
path = r'\\base\sub1\sub2\sub3\sub4\sub5\sub6\SearchFolder'
base_date = datetime(2020, 10, 1, 00, 00)
dates = pd.date_range(base_date, periods=92).tolist()
dates = [i.strftime("%Y%m%d") for i in dates]
list_of_csvs = glob(path+'*.csv')
print(list_of_csvs)
csvs_to_keep = []
for csv in list_of_csvs:
    if 'cash activities' in str(csv).lower():
        print('cash activities found')
        for date in dates:
            if str(date) in csv:
                print(csv)
                csvs_to_keep.append(csv)
master_df = pd.DataFrame()
for csv in csvs_to_keep:
    df = pd.read_csv(csv)
    master_df = master_df.append(df)
master_df.to_excel('master_file.xlsx', index=False)
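A minimal sketch of one way to fix the script, under these assumptions: the keyword is 'cash activities' as in the code, the date is any 8-digit YYYYMMDD token in the file name, only the file name (not the folder path) is searched, and the first row of each CSV is its header. collect_cash_csvs and build_master are hypothetical helper names. The sketch builds the glob pattern with os.path.join, because path+'*.csv' has no separator between the folder and the pattern, adds each file at most once, and uses pd.concat because DataFrame.append was removed in pandas 2.0.

```python
import os
import re
from datetime import datetime
from glob import glob

import pandas as pd


def collect_cash_csvs(folder, start, end, keyword="cash activities"):
    """Return sorted CSV paths in `folder` whose file name contains `keyword`
    (case-insensitive) and a YYYYMMDD date between start and end, inclusive."""
    keep = []
    # os.path.join inserts the separator that path + '*.csv' was missing.
    for csv_path in glob(os.path.join(folder, "*.csv")):
        name = os.path.basename(csv_path)
        if keyword not in name.lower():
            continue
        for token in re.findall(r"\d{8}", name):
            try:
                stamp = datetime.strptime(token, "%Y%m%d")
            except ValueError:
                continue  # 8 digits that are not a valid date
            if start <= stamp <= end:
                keep.append(csv_path)
                break  # add each file at most once
    return sorted(keep)


def build_master(paths):
    # read_csv uses the first row as the header, so the data starts at the second row.
    frames = [pd.read_csv(p) for p in paths]
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With the question's path this would be files = collect_cash_csvs(path, datetime(2020, 10, 1), datetime(2020, 12, 31)) followed by build_master(files).to_excel('master_file.xlsx', index=False); writing .xlsx needs an Excel engine such as openpyxl. The range matches pd.date_range(base_date, periods=92), which runs from 1 October to 31 December 2020.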

Can't figure out why I keep getting a KeyError while trying to get the row numbers of CSV rows

The code is very simple, and the content of the cell is exactly the same as what I'm writing in the code. I'm just trying to get the row numbers of all the rows where the ticker column equals A.
Code:
import pandas as pd
filename = 'DataFiles/SHARADAR_SF1_aafe962511a67db10c0a72fe536305b0.csv'
pattern = 'AAPL'
df = pd.read_csv(filename, index_col=0)
rows = df[df['ticker'] == pattern].index.to_list()
Example of the CSV (there are more tickers later in the file, for example AAPL or TSLA):
ticker,dimension,calendardate,datekey,lastupdated,assets,assetsavg,cashneq,debt,debtc,debtusd,divyield,deposits,eps,epsusd,equity,equityavg,liabilities,netinc,pe,price,revenue
A,ARQ,1999-12-31,2000-03-15,2020-09-01,7107000000,,1368000000,665000000,111000000,665000000,0,0,0.3,0.3,4486000000,,2621000000,131000000,,114.3,2246000000
A,ARQ,2000-03-31,2000-06-12,2020-09-01,7321000000,,978000000,98000000,98000000,98000000,0,0,0.37,0.37,4642000000,,2679000000,166000000,,66,2485000000
A,ARQ,2000-06-30,2000-09-01,2020-09-01,7827000000,,703000000,129000000,129000000,129000000,0,0,0.34,0.34,4902000000,,2925000000,155000000,46.877,61.88,2670000000
A,ARQ,2000-09-30,2001-01-17,2020-09-01,8425000000,,996000000,110000000,110000000,110000000,0,0,0.67,0.67,5265000000,,3160000000,305000000,37.341,61.94,3372000000
A,ARQ,2000-12-31,2001-03-19,2020-09-01,9208000000,,433000000,556000000,556000000,556000000,0,0,0.34,0.34,5541000000,,3667000000,154000000,21.661,36.99,2841000000
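Use index_col=None (the default) here. With index_col=0, the ticker column becomes the index, so it is no longer a regular column and df['ticker'] raises a KeyError: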
import pandas as pd
filename = 'DataFiles/SHARADAR_SF1_aafe962511a67db10c0a72fe536305b0.csv'
pattern = 'AAPL'
df = pd.read_csv(filename, index_col=None)
rows = df[df['ticker'] == pattern].index.to_list()
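A small sketch reproducing the KeyError, using only the first three columns of the sample rows from the question in an in-memory string and matching on 'A', since the sample contains no AAPL rows.

```python
import io

import pandas as pd

# First three columns of the sample rows from the question.
sample = """ticker,dimension,calendardate
A,ARQ,1999-12-31
A,ARQ,2000-03-31
A,ARQ,2000-06-30
"""

# index_col=0 moves 'ticker' into the index, so it is no longer a column.
by_ticker = pd.read_csv(io.StringIO(sample), index_col=0)
print("ticker" in by_ticker.columns)   # False, so by_ticker['ticker'] raises KeyError
print(by_ticker.index.name)            # ticker

# With the default index_col=None, rows get a 0-based RangeIndex.
df = pd.read_csv(io.StringIO(sample), index_col=None)
rows = df[df["ticker"] == "A"].index.to_list()
print(rows)                            # [0, 1, 2]
```

These positions count data rows from 0; because line 1 of the file is the header, adding 2 gives the 1-based line number in the file. If the ticker index is kept instead, by_ticker.loc['A'] selects the same rows without a KeyError.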

How to write stock data pulled from Yahoo to Excel using Python

I have the following code, which pulls daily data for a number of stocks listed in a worksheet. I want the returned daily data to be stored in another worksheet.
I am struggling to write code that does this. Currently I can pull the data for each individual stock, but I have no way of storing it. Any help will be appreciated. For testing I only tried to store Open and Close; ideally I would like to store all the fields from Yahoo Finance.
import numpy as np
import pandas as pd
import xlsxwriter
df = pd.read_csv('Stock Companies Modified.csv', sep=',', header=True)
df.columns = ['StockSymbol', 'CompanyName', 'ClosingPrice', 'MarketCap', 'IPOYear', 'Sector', 'Industry']
workbook = xlsxwriter.Workbook('New Workbook.xlsx')
worksheet = workbook.add_worksheet()
df = df.convert_objects(convert_numeric=True)
df.dtypes
from pandas.io.data import DataReader
from datetime import datetime
for x in df.StockSymbol:
    if len(x) <= 4:
        ClosingPrice = DataReader(x, 'yahoo', datetime(2015, 1, 1), datetime(2015, 7, 1))
        row = 0
        col = 0
        # This is the area where I am getting an error, and to be honest I don't know how to do it correctly
        for Open, Close in (ClosingPrice):
            worksheet.write(row, col, (ClosingPrice['Open']))
            worksheet.write(row, col+1, (ClosingPrice['Close']))
            row += 1
        workbook.close()
        print x
    else:
        print("This is not working")
I haven't found a clean way to append data to an existing sheet with xlsxwriter (it writes new files and cannot read or modify existing ones), so I typically build a DataFrame with all of the values, including the current sheet's contents if the file already exists, and then overwrite the file. I would prefer to append to sheets as you attempted, but it doesn't seem possible with this engine.
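import pandas as pd
# pandas.io.data has since moved to the separate pandas-datareader package
from pandas_datareader.data import DataReader
from datetime import datetime
symbols = ['GOOG', 'AAPL']
try:
    df = pd.read_excel('NewFile.xlsx')
except FileNotFoundError:
    # no file yet: start from an empty frame
    df = pd.DataFrame()
for symbol in symbols:
    ClosingPrice = DataReader(symbol, 'yahoo', datetime(2015, 1, 1), datetime(2015, 9, 1))
    ClosingPrice = ClosingPrice.reset_index()  # move the Date index into a column
    ClosingPrice['Symbol'] = symbol
    df = pd.concat([df, ClosingPrice])  # DataFrame.append was removed in pandas 2.0
# the context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('NewFile.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)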
To append more symbols to the same file later, read it back in and rewrite it the same way:
df = pd.read_excel('NewFile.xlsx')
symbols = ['G']
for symbol in symbols:
    ClosingPrice = DataReader(symbol, 'yahoo', datetime(2015, 1, 1), datetime(2015, 9, 1))
    ClosingPrice = ClosingPrice.reset_index()
    ClosingPrice['Symbol'] = symbol
    df = pd.concat([df, ClosingPrice])
with pd.ExcelWriter('NewFile.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)
What error are you getting? Have you tried pandas DataFrame.to_excel?
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
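In the question's loop, for Open, Close in (ClosingPrice) iterates over the DataFrame's column labels ('Open', 'High', ...), not its rows, so each label string is unpacked into two names and the loop fails. A minimal sketch of a hypothetical helper, frame_to_cells, that instead yields (row, col, value) triples suitable for xlsxwriter's worksheet.write(row, col, value); it is shown here without xlsxwriter so it can be checked on made-up prices.

```python
import pandas as pd


def frame_to_cells(frame, columns, start_row=0):
    """Yield (row, col, value) triples: a header row, then one row per record.

    Iterating a DataFrame directly (`for a, b in frame`) walks its column labels,
    not its rows, which is why the loop in the question fails.
    """
    for col, name in enumerate(columns):
        yield start_row, col, name
    # itertuples(index=False, name=None) yields one plain tuple of values per data row.
    for offset, values in enumerate(frame[columns].itertuples(index=False, name=None), start=1):
        for col, value in enumerate(values):
            yield start_row + offset, col, value


# Made-up prices standing in for a DataReader result.
prices = pd.DataFrame({"Open": [1.0, 2.0], "Close": [1.5, 2.5], "Volume": [10, 20]})
for cell in frame_to_cells(prices, ["Open", "Close"]):
    print(cell)
```

With xlsxwriter, each triple would go to worksheet.write(r, c, v). For several symbols, advance start_row by len(frame) + 1 (header plus data rows) instead of resetting it to 0, and call workbook.close() once, after all symbols are written.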
