Python and Pandas: Creating Multiple Dynamic Excel Sheets with DataFrames

Thanks in advance! I have been struggling for a few days, so that means it is time for me to ask a question. I have a program that pulls information for three stocks using the "yfinance" module, driven by a ticker list in a txt file. I can get the intended information into a data frame for each ticker in the list using a for loop. I then want to save the information for each ticker on its own sheet in an Excel workbook, with the sheet name being the ticker. As of now I end up creating three distinct data frames, but the Excel output only has one tab holding the last requested ticker's information (MSFT). I think I may need an append process to create a new tab for each data frame. Thanks for any suggestions.
Code
import platform
import yfinance as yf
import pandas as pd
import csv

# check versions
print('Python Version: ' + platform.python_version())
print('YFinance Version: ' + yf.__version__)

# load txt of tickers to list, contains three tickers
tickerlist = []
with open('tickers.txt') as inputfile:
    for row in csv.reader(inputfile):
        tickerlist.append(row)

# iterate through ticker txt file
for i in range(len(tickerlist)):
    tickersymbol = tickerlist[i]
    stringticker = str(tickersymbol)
    stringticker = stringticker.replace("[", "")
    stringticker = stringticker.replace("]", "")
    stringticker = stringticker.replace("'", "")
    # set data to retrievable variable
    tickerdata = yf.Ticker(stringticker)
    tickerinfo = tickerdata.info
    # data items requested
    investment = tickerinfo['shortName']
    country = tickerinfo['country']
    # create dataframes from lists
    dfoverview = pd.DataFrame({'Label': ['Company', 'Country'],
                               'Value': [investment, country]})
    print(dfoverview)
    print('-----------------------------------------------------------------')
    # export data to each tab (PROBLEM AREA: to_excel() recreates output.xlsx
    # on every pass, so only the last ticker's sheet survives)
    dfoverview.to_excel('output.xlsx', sheet_name=stringticker)
Output
Python Version: 3.7.7
YFinance Version: 0.1.54
Company Walmart Inc.
Country United States
Company Tesla, Inc.
Country United States
Company Microsoft Corporation
Country United States
Process finished with exit code 0

If all of your ticker information is in a single data frame, Pandas' groupby() method works well here (if I'm understanding your problem correctly). This is pseudocode, but try something like this instead:
import pandas as pd

# df here represents your single data frame with all your ticker info
# column_value is the column you choose to group by
# this column_value will also be used to dynamically create your sheet names
ticker_group = df.groupby('column_value')

# create the writer obj once so every sheet lands in the same workbook
with pd.ExcelWriter('output.xlsx') as writer:
    # key=str obj of column_value, data=dataframe obj of data pertaining to key
    for key, data in ticker_group:
        data.to_excel(writer, sheet_name=key, index=False)
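If you'd rather keep the one-ticker-at-a-time loop from the question, the same idea applies: open a single pd.ExcelWriter before the loop and pass it to every to_excel() call, so each sheet is added to the same workbook instead of overwriting output.xlsx. A minimal sketch, assuming tickers.txt holds one plain symbol per line:

import pandas as pd
import yfinance as yf

# one symbol per line; strip() avoids the list-to-string cleanup from the question
with open('tickers.txt') as f:
    tickers = [line.strip() for line in f if line.strip()]

# a single writer keeps every sheet in the same workbook
with pd.ExcelWriter('output.xlsx') as writer:
    for symbol in tickers:
        info = yf.Ticker(symbol).info
        dfoverview = pd.DataFrame({'Label': ['Company', 'Country'],
                                   'Value': [info['shortName'], info['country']]})
        dfoverview.to_excel(writer, sheet_name=symbol, index=False)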

Related

Loading CSV into dataframe results in all records becoming "NaN"

I'm new to python (and posting on SO), and I'm trying to use some code I wrote that worked in another similar context to import data from a file into a MySQL table. To do that, I need to convert it to a dataframe. In this particular instance I'm using Federal Election Commission data that is pipe-delimited (it's the "Committee Master" data here). It looks like this.
C00000059|HALLMARK CARDS PAC|SARAH MOE|2501 MCGEE|MD #500|KANSAS CITY|MO|64108|U|Q|UNK|M|C||
C00000422|AMERICAN MEDICAL ASSOCIATION POLITICAL ACTION COMMITTEE|WALKER, KEVIN MR.|25 MASSACHUSETTS AVE, NW|SUITE 600|WASHINGTON|DC|200017400|B|Q||M|M|ALABAMA MEDICAL PAC|
C00000489|D R I V E POLITICAL FUND CHAPTER 886|JERRY SIMS JR|3528 W RENO||OKLAHOMA CITY|OK|73107|U|N||Q|L||
C00000547|KANSAS MEDICAL SOCIETY POLITICAL ACTION COMMITTEE|JERRY SLAUGHTER|623 SW 10TH AVE||TOPEKA|KS|666121627|U|Q|UNK|Q|M|KANSAS MEDICAL SOCIETY|
C00000729|AMERICAN DENTAL ASSOCIATION POLITICAL ACTION COMMITTEE|DI VINCENZO, GIORGIO T. DR.|1111 14TH STREET, NW|SUITE 1100|WASHINGTON|DC|200055627|B|Q|UNK|M|M|INDIANA DENTAL PAC|
When I run this code, all of the records come back "NaN."
import pandas as pd
import pymysql

print('convert CSV to dataframe')
data = pd.read_csv('Desktop/Python/FECupdates/cm.txt', delimiter='|')
df = pd.DataFrame(data, columns=['CMTE_ID', 'CMTE_NM', 'TRES_NM', 'CMTE_ST1', 'CMTE_ST2',
                                 'CMTE_CITY', 'CMTE_ST', 'CMTE_ZIP', 'CMTE_DSGN', 'CMTE_TP',
                                 'CMTE_PTY_AFFILIATION', 'CMTE_FILING_FREQ', 'ORG_TP',
                                 'CONNECTED_ORG_NM', 'CAND_ID'])
print(df.head(10))
If I remove the dataframe part and just do this, it displays the data, so it doesn't seem like it's a problem with the file itself (but what do I know?):
import pandas as pd
import pymysql

print('convert CSV to dataframe')
data = pd.read_csv('Desktop/Python/FECupdates/cm.txt', delimiter='|')
print(data.head(10))
I've spent hours looking at different questions here that seem to address similar issues -- in which the problems apparently stemmed from things like the encoding or different kinds of delimiters -- but each time I try the same changes to my code I get the same result. I've also converted the whole thing to a CSV by changing all the commas in fields to "$" and then changing the pipes to commas. It still shows up as all "NaN," even though the number of records is correct if I upload it to MySQL (they're just all empty).
The column names you pass in don't exist in the frame: read_csv, left to its defaults, promotes the first data row of this header-less file to the header, so pd.DataFrame(data, columns=[...]) reindexes to labels that aren't there and every column comes back NaN. If you just want the data as read, Pandas can recognize the columns automatically:
import pandas as pd

print('convert CSV to dataframe')
data = pd.read_csv('cm.txt', delimiter='|')
df = pd.DataFrame(data)
print(df.head(10))
Also, you can create an empty dataframe and concatenate it with the file you read:
import pandas as pd

print('convert CSV to dataframe')
data = pd.read_csv('cm.txt', delimiter='|')
data2 = pd.DataFrame()
df = pd.concat([data, data2], ignore_index=True)
print(df.head(10))
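Since the real issue is that cm.txt has no header row, you can also tell read_csv that directly and supply the FEC column names yourself, instead of reindexing afterwards; a minimal sketch:

import pandas as pd

cols = ['CMTE_ID', 'CMTE_NM', 'TRES_NM', 'CMTE_ST1', 'CMTE_ST2', 'CMTE_CITY',
        'CMTE_ST', 'CMTE_ZIP', 'CMTE_DSGN', 'CMTE_TP', 'CMTE_PTY_AFFILIATION',
        'CMTE_FILING_FREQ', 'ORG_TP', 'CONNECTED_ORG_NM', 'CAND_ID']

# header=None stops read_csv from promoting the first data row to the header;
# names= assigns the intended labels in one step
df = pd.read_csv('cm.txt', delimiter='|', header=None, names=cols)
print(df.head(10))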
Try this, worked for me:
import pandas as pd

path = 'Desktop/Python/FECupdates/'  # the path needs quotes and a trailing slash
df = pd.read_csv(path + 'cm.txt', encoding='unicode_escape', sep='|')
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
df.columns = ['CMTE_ID', 'CMTE_NM', 'TRES_NM', 'CMTE_ST1', 'CMTE_ST2', 'CMTE_CITY',
              'CMTE_ST', 'CMTE_ZIP', 'CMTE_DSGN', 'CMTE_TP', 'CMTE_PTY_AFFILIATION',
              'CMTE_FILING_FREQ', 'ORG_TP', 'CONNECTED_ORG_NM', 'CAND_ID']
df.head(200)

Python Edgar package - get CIK number

I am reading S-1 filings from EDGAR (SEC). I get my initial data from Bloomberg. Through the company name I can look for the matching CIK number using get_cik_by_company_name(company_name: str). I should be able to get the CIK number, which I then want to save in a list -> cik_list. However, it is not working - invalid syntax for str.
BloombergList is the Excel file Bloomberg created with all the relevant company names. In column 4 I have the names, which I import as a list, then get the matching CIK, and then export the CIK list in the right order back to the BloombergList - theoretically.
I am happy if someone can help. Thanks in advance.
# needed packages
import pandas as pd
from openpyxl import load_workbook
from edgar import Edgar

# from excel get company names
book = load_workbook('BloombergList.xlsx')
sheet = book['Cleaned up']
for row in sheet.rows:
    row_list = row[4].value
    print(row_list)

# use edgar package to get CIK numbers from row_list
edgar = Edgar()
cik_list = []
for x in row_list:
    possible_companies = edgar.find_company_name(x)
    cik_list.append(get_cik_by_company_name(company_name: str))

# export generated CIK numbers back to excel
df = pd.DataFrame({'CIK': [cik_list]})
df.to_excel('BloombergList.xlsx', sheet_name="CIK", index=False)
print("done")

Problems reading and concatenating CSV files into a single dataframe

I am going to Yahoo Finance and pulling data for German stocks, then writing them to individual CSV files.
I then want to read them back into a single dataframe.
# Code to get stocks
import datetime
import pandas_datareader as pdr  # provides get_data_yahoo

tickers = ["MUV2.DE", "DTE.DE", "VNA.DE", "ALV.DE", "BAYN.DE", "EOAN.DE", "RWE.DE", "CON.DE", "HEN3.DE", "BAS.DE", "FME.DE", "WDI.DE", "IFX.DE", "SAP.DE", "BMW.DE", "DPW.DE", "DB1.DE", "DAI.DE", "BEI.DE", "SIE.DE", "ADS.DE", "DBK.DE", "FRE.DE", "HEI.DE", "MRK.DE", "LHA.DE", "VOW3.DE", "1COV.DE", "LIN.DE", "TKA.DE"]
start = datetime.datetime(2012, 5, 31)
end = datetime.datetime(2020, 3, 1)

# Go to yahoo and pull data for the following tickers and then write them to CSV
for ticker in tickers:
    df = pdr.get_data_yahoo(ticker, start=start, end=end)
    df.to_csv(f"{ticker}.csv")
Once the above has been done, I read in a CSV of all the ticker names and then concatenate the individual CSV files into a single dataframe. Well, that's what I want to do at least.
import pandas as pd

tickers = pd.read_csv('C:/Users/XXX/Desktop/Momentum/tickers.csv', header=None)[1].tolist()

stocks = pd.concat(
    [pd.read_csv(f"C:/Users/XXX/Desktop/Momentum/{ticker}.csv",
                 index_col='Date', parse_dates=True)['Adj Close']
         .rename(ticker.replace(".DE", ""))  # name each column by ticker, minus the .DE suffix
     for ticker in tickers],
    axis=1,
    sort=True)

stocks = stocks.loc[:, ~stocks.columns.duplicated()]
Now, I've gotten this code to work before, but with other stock tickers; all my Jupyter notebook does now is spin.
I was wondering what the issue is here, and whether it's because the CSV file name is something like ADS.DE.csv and the first . is causing issues.
I solved this problem with the following code.
import os

# note: os.listdir() returns bare file names, so the os.rename() paths
# are resolved relative to the current working directory
for filename in os.listdir('dirname'):
    os.rename(filename, filename.replace('_intsect_d', ''))
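On the read-back side, the dot in a name like ADS.DE.csv is harmless to pd.read_csv; a notebook that spins forever more often points to a missing or half-written file. A hedged sketch of the same concatenation with an explicit existence check, using the same Momentum folder as above:

import os
import pandas as pd

folder = 'C:/Users/XXX/Desktop/Momentum'
tickers = pd.read_csv(f'{folder}/tickers.csv', header=None)[1].tolist()

frames = []
for ticker in tickers:
    path = f'{folder}/{ticker}.csv'
    if not os.path.exists(path):  # skip tickers that never downloaded
        print(f'missing: {path}')
        continue
    s = pd.read_csv(path, index_col='Date', parse_dates=True)['Adj Close']
    frames.append(s.rename(ticker.replace('.DE', '')))

stocks = pd.concat(frames, axis=1, sort=True)
stocks = stocks.loc[:, ~stocks.columns.duplicated()]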

CSV file from txt using pandas

I have a txt file with info inside of it, with every deal separated by a \n symbol.
DEAL: 896
CITY: New York
MARKET: Manhattan
PRICE: $9,750,000
ASSET TYPE: Rental Building
SF: 8,004
PPSF: $1,218
DATE: 11/01/2017
Is there any way to make a csv (or another) table with headers specified like CITY, MARKET, etc., using pandas or the csv module? All the info under a specific title should go into the corresponding header.
Updated to navigate around using : as a delimiter:
import pandas as pd

new_temp = open('temp.txt', 'w')  # write data to a new file, changing the first delimiter only
with open('fun.txt') as f:
    for line in f:
        line = line.replace(':', '|', 1)  # only replace the first instance of : to use it as the delimiter for pandas
        new_temp.write(line)
new_temp.close()

df = pd.read_csv('temp.txt', delimiter='|', header=None)
df = df.set_index([0]).T
df.to_csv('./new_transposed_df.csv', index=False)
This will make a csv with the left column as headers and the right column as data, without changing colons in the second column. It writes out a temp file called temp.txt, which you can remove after you run the program.
Use Pandas to input it and then transform/pivot your table.
import pandas as pd

df = pd.read_csv('data.txt', sep=':', header=None)
df = df.set_index(0).T
Example
import pandas as pd
from io import StringIO  # pd.compat.StringIO was removed from newer pandas

data = '''
DEAL: 896
CITY: New York
MARKET: Manhattan
PRICE: $9,750,000
ASSET TYPE: Rental Building
SF: 8,004
PPSF: $1,218
DATE: 11/01/2017
'''

df = pd.read_csv(StringIO(data), sep=':', header=None)
print(df.set_index(0).T)
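If fun.txt actually holds many deals, with each DEAL:/CITY:/... block separated by a blank line, either answer extends naturally by parsing block by block into one row per deal; a hedged sketch (deals.csv is an arbitrary output name):

import pandas as pd

records, current = [], {}
with open('fun.txt') as f:
    for line in f:
        line = line.strip()
        if not line:  # a blank line ends one deal block
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(':')  # split on the first colon only
        current[key.strip()] = value.strip()
if current:  # don't drop the last block
    records.append(current)

df = pd.DataFrame(records)  # headers: DEAL, CITY, MARKET, ...
df.to_csv('deals.csv', index=False)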

Read data from excel after a string matches

I want to read an entire row of data and store it in variables, then later use them in Selenium to write to webelements. The programming language is Python.
Example: I have an Excel sheet of incidents and their details regarding priority, date, assignee, etc.
If I give the string INC00000 it should match the Excel data, fetch all the above details, and store them in separate variables like
INC # = INC00000, Priority = Moderate, Date = 11/2/2020
Is this feasible? I tried and failed writing the code. Please suggest possible ways to do this.
I would,
load the sheet into a pandas DataFrame
filter the corresponding column in the DataFrame by the INC # of interest
convert the row to a dictionary (assuming the INC filter produces only 1 row)
get the corresponding value from the dictionary to assign to the corresponding webelement
Example:
import pandas as pd

df = pd.read_excel("full_file_path", sheet_name="name_of_sheet")
# assuming the INC numbers are in a column named "INC #" in the spreadsheet,
# and that the filter matches exactly one row
dict_data = df[df['INC #'] == 'INC00000'].to_dict("records")[0]
webelement1.send_keys(dict_data[columnname1])
webelement2.send_keys(dict_data[columnname2])
webelement3.send_keys(dict_data[columnname3])
.
.
.
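To make that concrete with the fields from the question, treating 'INC #', 'Priority' and 'Date' as assumed column names and incidents.xlsx as a made-up file name:

import pandas as pd

df = pd.read_excel('incidents.xlsx', sheet_name='Sheet1')  # hypothetical file and sheet

row = df[df['INC #'] == 'INC00000'].to_dict('records')[0]  # the one matching incident
inc_number = row['INC #']   # 'INC00000'
priority = row['Priority']  # e.g. 'Moderate'
date = row['Date']          # e.g. 11/2/2020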
Please find the below code and make the changes as per your variables, after saving your Excel file as CSV:
import csv

# Set up input and output variables for the script
gTrack = open("file1.csv", "r")

# Set up CSV reader and process the header
csvReader = csv.reader(gTrack)
header = next(csvReader)  # csvReader.next() works only in Python 2
print(header)

id_index = header.index("id")
date_index = header.index("date ")
var1_index = header.index("var1")
var2_index = header.index("var2")

# Make an empty list
cList = []

# Loop through the lines in the file and get the required id
for row in csvReader:
    id = row[id_index]
    if id == 'INC001':
        date = row[date_index]
        var1 = row[var1_index]
        var2 = row[var2_index]
        cList.append([id, date, var1, var2])

# Print the collected list
print(cList)
