Python Edgar package - get CIK number - python

I am reading S-1 filings from SEC EDGAR. I get my initial data from Bloomberg. Using the company name, I can look up the matching CIK number with get_cik_by_company_name(company_name: str). I should get back the CIK number, which I then want to save in a list (cik_list). However, it is not working: I get "invalid syntax" for str.
BloombergList is the Excel file Bloomberg created with all the relevant company names. Column 4 holds the names, which I import as a list; I then get the matching CIKs and export the CIK list, in the right order, back to BloombergList. In theory, at least.
I would be happy if someone can help. Thanks in advance.
#needed packages
import pandas as pd
from openpyxl import load_workbook
from edgar import Edgar
#from excel get company names
book = load_workbook('BloombergList.xlsx')
sheet = book['Cleaned up']
for row in sheet.rows:
    row_list = row[4].value
    print (row_list)
#use edgar package to get CIK numbers from row_list
edgar = Edgar()
cik_list = []
for x in row_list:
    possible_companies = edgar.find_company_name(x)
    cik_list.append(get_cik_by_company_name(company_name: str))
#export generated CIK numbers back to excel
df = pd.DataFrame({'CIK':[cik_list]})
df.to_excel('BloombergList.xlsx', sheet_name="CIK", index=False)
print ("done")
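The `company_name: str` in the package description reads like a type annotation documenting the parameter, not something to type in the call; the call takes the actual name, and the method belongs to the `edgar` object. Below is a minimal sketch of the collection loop under those assumptions. The helper `collect_ciks` and the lookup table are hypothetical, and the lookup is passed in as a callable so the example runs without network access. It also assumes the lookup raises `KeyError` for unknown names.

```python
def collect_ciks(company_names, lookup):
    """Return one CIK (or None) per company name, preserving order."""
    ciks = []
    for name in company_names:
        if not name:  # empty Excel cell
            ciks.append(None)
            continue
        try:
            ciks.append(lookup(name))
        except KeyError:  # assumption: unknown names raise KeyError
            ciks.append(None)
    return ciks


# Stand-in lookup table with made-up values; with the edgar package the
# lookup would presumably be edgar.get_cik_by_company_name.
fake_table = {"EXAMPLE CORP": "0000000001", "SAMPLE INC": "0000000002"}
names = ["EXAMPLE CORP", None, "UNKNOWN LLC", "SAMPLE INC"]
cik_list = collect_ciks(names, fake_table.__getitem__)
print(cik_list)
```

Two related details: the names need to be gathered into a list, for example `[row[4].value for row in sheet.rows]`, because assigning `row_list` inside the loop keeps only the last cell; and `pd.DataFrame({'CIK': cik_list})` without the extra brackets gives one row per CIK.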

Related

Trying to incorporate RegEX with Excel exercise from "Automate the Boring Stuff with Python"

I am trying to modify the project code from Automate the Boring Stuff with Python, chapter 13, page 315, to incorporate regular expressions. My Excel sheet has a list of sensor names that need the "Unit" column filled out appropriately (see the screenshot below for an example). I have updated the project code's dictionary to map the suffixes _BattV, _ft_H2O, etc. to their units. Each sensor name is structured as XXX_123_BattV, where XXX is the project code, 123 is the sensor number, and _BattV is the suffix indicating the type of sensor. I would like to match the last chunk of each sensor name with a regex so that the code fills in 'volts' in the unit column for each sensor ending in _BattV, and so on.
Here is the code I have modified from the project example so far.
#! python3
#updateProduce.py - Corrects cost in produce sales spreadsheet.
import openpyxl
filename = 'stackoverflowexample.xlsx'
wb = openpyxl.load_workbook(filename)
sheet = wb['stackoverflowexample']
#The produce types and their updated prices
UNIT_UPDATES = {
    '_BattV': 'volts',
    '_ft_H2O': 'Feet of H20',
    '_GWE': 'Elevation (ft)',
    '_PSI': 'PSI',
    '_TempC': 'deg C'}
#Loop through the rows and update the prices.
for rowNum in range(2, sheet.max_row): #skip the first row
    Sensor_name = sheet.cell(row=rowNum, column=1).value
    if Sensor_name in UNIT_UPDATES:
        sheet.cell(row=rowNum, column=2).value = UNIT_UPDATES[Sensor_name]
wb.save(f'updatedstackoverflowexample.xlsx')
Here is what I have gleaned from the RegEx section of the book:
import re
unitRegex = re.compile(r'_BattV|_ft_H2O|_GWE|_PSI|_TempC')
voltRegex = re.compile(r'.*_BattV')
fth20Regex = re.compile(r'.*_ft_H2O')
gweRegex = re.compile(r'.*_GWE')
psiRegex = re.compile(r'.*_PSI')
tempRegex = re.compile(r'.*_TempC')
mo = unitRegex.search('insert cell data here')
I am also curious if it is better to run the Regex and feed all of the matches into the dictionary first and then run the rest. Or if it is better to incorporate it within the for loop.
Finally, here is the example screenshot:
[Screenshot of Excel spreadsheet showing structure of data]
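While you can do this with a regex, you might simply split the string:
units = {"BattV":"volts", …}
for cell in ws["A"][1:]:
    # maxsplit=2 keeps a suffix that itself contains an underscore (ft_H2O) in one piece
    project, sensor, unit = cell.value.split("_", 2)
    # the unit column sits directly to the right of the sensor name
    cell.offset(column=1).value = units[unit]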
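For the regex route the question asks about, one option is to build a single pattern from the dictionary keys, compile it once before the loop, and apply it to each cell value inside the loop. This is only a sketch: the helper name `unit_for` is made up for illustration, and the pattern is anchored to the end of the string on the assumption that the suffix is always the last part of the sensor name.

```python
import re

UNIT_UPDATES = {
    "_BattV": "volts",
    "_ft_H2O": "Feet of H2O",
    "_GWE": "Elevation (ft)",
    "_PSI": "PSI",
    "_TempC": "deg C",
}

# One pattern built from the dictionary keys, anchored to the end of the name.
SUFFIX_RE = re.compile("(?:" + "|".join(re.escape(key) for key in UNIT_UPDATES) + ")$")


def unit_for(sensor_name):
    """Return the unit for a sensor name, or None if no known suffix matches."""
    match = SUFFIX_RE.search(sensor_name)
    return UNIT_UPDATES[match.group(0)] if match else None


print(unit_for("XXX_123_BattV"))
print(unit_for("XXX_123_ft_H2O"))
print(unit_for("XXX_123_Other"))
```

Inside the spreadsheet loop, the result of `unit_for(Sensor_name)` could then be written to column 2 whenever it is not `None`.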

Python and Pandas Creating Multiple Dynamic Excel Sheets with Dataframes

Thanks in advance! I have been struggling for a few days, so it is time for me to ask a question. I have a program that pulls information for three stocks using the module "yfinance". It uses a ticker list in a txt file. I can get the intended information into a data frame for each ticker in the list using a for loop. I then want to save the information for each ticker on its own sheet in an Excel workbook, with the sheet name being the ticker. As of now I end up creating three distinct data frames, but the Excel output only has one tab with the information for the last requested ticker (MSFT). I think I may need to use an append process to create a new tab for each data frame. Thanks for any suggestions.
Code
import platform
import yfinance as yf
import pandas as pd
import csv
# check versions
print('Python Version: ' + platform.python_version())
print('YFinance Version: ' + yf.__version__)
# load txt of tickers to list, contains three tickers
tickerlist = []
with open('tickers.txt') as inputfile:
    for row in csv.reader(inputfile):
        tickerlist.append(row)
# iterate through ticker txt file
for i in range(len(tickerlist)):
    tickersymbol = tickerlist[i]
    stringticker = str(tickersymbol)
    stringticker = stringticker.replace("[", "")
    stringticker = stringticker.replace("]", "")
    stringticker = stringticker.replace("'", "")
    # set data to retrievable variable
    tickerdata = yf.Ticker(stringticker)
    tickerinfo = tickerdata.info
    # data items requested
    investment = tickerinfo['shortName']
    country = tickerinfo['country']
    # create dataframes from lists
    dfoverview = pd.DataFrame({'Label': ['Company', 'Country'],
                               'Value': [investment, country]
                               })
    print(dfoverview)
    print('-----------------------------------------------------------------')
    #export data to each tab (PROBLEM AREA)
    dfoverview.to_excel('output.xlsx',
                        sheet_name=stringticker)
Output
Python Version: 3.7.7
YFinance Version: 0.1.54
Company Walmart Inc.
Country United States
Company Tesla, Inc.
Country United States
Company Microsoft Corporation
Country United States
Process finished with exit code 0
EDITS: Deleted original to try and post to correct forum/location
If all of your ticker information is in a single data frame, the pandas groupby() method works well here (if I'm understanding your problem correctly). This is pseudocode, but try something like this instead:
import pandas as pd
# df here represents your single data frame with all your ticker info
# column_value is the column you choose to group by
# this column_value will also be used to dynamically create your sheet names
ticker_group = df.groupby('column_value')
# create the writer obj once, so every sheet goes into the same workbook
with pd.ExcelWriter('output.xlsx') as writer:
    # key=value of column_value, data=dataframe obj of data pertaining to key
    for key, data in ticker_group:
        data.to_excel(writer, sheet_name=str(key), index=False)
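If the data frames are built one ticker at a time, as in the question, it helps to have each ticker as a plain string rather than a one-element list that has to be cleaned with `str()` and `replace()`. A small sketch of reading the ticker file that way follows; the helper `read_tickers` is hypothetical, `io.StringIO` stands in for the real `tickers.txt`, and the symbols are illustrative.

```python
import csv
import io


def read_tickers(lines):
    """Return ticker symbols as plain strings, one per non-empty CSV row."""
    return [row[0].strip() for row in csv.reader(lines) if row and row[0].strip()]


# io.StringIO stands in for open('tickers.txt'); the symbols are illustrative.
sample_file = io.StringIO("WMT\nTSLA\n\nMSFT\n")
tickerlist = read_tickers(sample_file)
print(tickerlist)
```

Each string can then be passed to `yf.Ticker` and reused as the sheet name, with `dfoverview.to_excel(writer, sheet_name=ticker)` called inside a single `with pd.ExcelWriter('output.xlsx') as writer:` block that is opened before the loop.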

Creating a new sheet every month with openpyxl

Hi, I'm involved with tourist lodges in Namibia. We record water readings etc. every day, enter them into an Excel file, and calculate consumption per pax. The problem is that not every staff member understands Excel, so I wrote a simple Python program to input the readings into Excel automatically. It works; the only problem is that I want to save each month in a new sheet and have all the data grouped by month (e.g. January (all readings), February (all readings)). I can create a new sheet, but I cannot input data to the new sheet; it just overwrites my data from the previous months. The code looks as follows:
import tkinter
from openpyxl import load_workbook
from openpyxl.styles import Font
import time
import datetime
book = load_workbook('sample.xlsx')
#sheet = book.active
Day = datetime.date.today().strftime("%B")
x = book.get_sheet_names()
list= x
if Day in list: # this checks if the sheet exists to stop the creation of multiple sheets with the same name
    sheet = book.active
else:
    book.create_sheet(Day)
    sheet = book.active
#sheet = book.active
And to write to the sheet I use an Entry widget, then save the value as follows:
Bh1=int(Bh1In.get())
if Bh1 == '0':
    import Error
else:
    sheet.cell(row=Day , column =4).value = Bh1
    number_format = 'Number'
Maybe I'm being stupid but please help!!
You're depending on getting the active worksheet instead of accessing it by name. Simply using something like:
try:
    sheet = book[Day]
except KeyError:
    sheet = book.create_sheet(Day)
is probably all you need.
Try
if Day in list: # this checks if the sheet exists to stop the creation of multiple sheets with the same name
    sheet = book[Day]  # get_sheet_by_name() is deprecated in favour of book[name]
else:
    book.create_sheet(Day)
    book.save('sample.xlsx')
    sheet = book[Day]
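A separate detail in the question's code is that `Day` holds the month name, yet it is also passed as `row=` to `sheet.cell`, which expects an integer. A minimal sketch of deriving both the sheet title and a row number from the date follows. It assumes one header row and one row per day of the month; `sheet_title_and_row` and `header_rows` are names made up for this example.

```python
import datetime


def sheet_title_and_row(day, header_rows=1):
    """Map a date to (month sheet title, row number for that day's reading)."""
    title = day.strftime("%B")   # month name, e.g. "January" under the default locale
    row = day.day + header_rows  # day 1 lands on the first row after the header
    return title, row


title, row = sheet_title_and_row(datetime.date(2021, 1, 15))
print(title, row)
```

With the sheet looked up by name as in the answers above, the reading could then be written with `sheet.cell(row=row, column=4).value = Bh1`.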

Exporting values from a Spreadsheet using Python for webscraping (BeautifulSoup4)

A. My Objective:
Use Python to extract unique OCPO IDs from an Excel spreadsheet and use these IDs to web-scrape the corresponding company names and NIN IDs. (Note: both NIN and OCPO IDs are unique to one company.)
B. Details:
i. Extract OCPO IDs from an Excel Spreadsheet using openpyxl.
ii. Search OCPO IDs one-by-one in a business registry (https://focus.kontur.ru/) and find corresponding company names and company IDs (NIN) using BeautifulSoup4.
Example: A search for OCPO ID "00044428" yields a matching company name ПАО "НК "РОСНЕФТЬ" and corresponding NIN ID "7706107510."
iii. Save in Excel the list of company names and NIN IDs.
C. My progress:
i. I'm able to extract the list of OCPO IDs from Excel to Python.
# Pull the Packages
import openpyxl
import requests
import sys
from bs4 import BeautifulSoup
# Pull OCPO from the Spreadsheet
wb = openpyxl.load_workbook(r"C:\Users\ksong\Desktop\book1.xlsx")
sheet = wb.active
sheet.columns[0]
for cellobjc in sheet.columns[0]:
    print(cellobjc.value)
ii. I'm able to search an OCPO ID and let Python scrape matching company name and corresponding company NIN ID.
# Part 1a: Pull the Website
r = requests.get("https://focus.kontur.ru/search?query=" + "00044428")
r.encoding = "UTF-8"
# Part 1b: Pull the Content
c = r.content
soup = BeautifulSoup(c, "html.parser", from_encoding="UTF-8")
# Part 2a: Pull Company name
name = soup.find("a", attrs={'class':"js-subject-link"})
name_box = name.text.strip()
print(name_box)
D. Help
i. How do I write the loop so that each OCPO ID is searched individually, giving me a list of search results rather than a list of OCPO IDs? In other words, each OCPO ID is searched and matched with its corresponding company name and NIN ID. Each ID ######## would have to be fed into the URL as ("https://focus.kontur.ru/search?query=" + "########").
ii. Also, what code should I use for Python to save all the search results in an Excel Spreadsheet?
1) Create an empty workbook to write to:
from openpyxl import Workbook
wb2 = Workbook()
ws1 = wb2.active
2) Put all that code in the 2nd box into your for loop from the first box.
3) Change "00044428" to str(cellobjc.value)
4) At the end of each loop, append your row to the new worksheet:
row = [cellobjc.value, name_box, other_variables]
ws1.append(row)
5) After the loop finishes, save your file
wb2.save("results.xlsx")
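One thing to watch in step 3 is that Excel may store an ID such as "00044428" as the number 44428, so `str(cellobjc.value)` would drop the leading zeros. The sketch below pads the value back out and builds the search URL. The 8-character width is an assumption based on the single example in the question, and `normalize_ocpo` and `search_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://focus.kontur.ru/search"


def normalize_ocpo(value, width=8):
    """Turn a cell value into a zero-padded ID string (width is an assumption)."""
    return str(value).strip().zfill(width)


def search_url(ocpo_id):
    """Build the search URL for one OCPO ID."""
    return SEARCH_URL + "?" + urlencode({"query": ocpo_id})


cell_values = [44428, "00044428"]  # the same ID as Excel might store it
urls = [search_url(normalize_ocpo(value)) for value in cell_values]
print(urls)
```

Inside the loop, `requests.get(search_url(normalize_ocpo(cellobjc.value)))` would then replace the hard-coded URL.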

Automatically writing data from Python dictionary to a very specific excel format

I have some data stored in a .csv file that is automatically read into a nested python dictionary. The code I already have will read any properly formatted file such that the dictionary is of the form dict[experiment][variable]=value.
My goal is to rewrite the data into a very specific format, namely:
Name            Experiment1
Notes
Componentnotes
Components      time  LR1R2    LR1R2_I  R1       R1_I     R2       R2_I
Values          0     1.69127  16.9127  271.087  2710.87  127.087  1270.87
                20    62.0374  356.28   146.54   2107.15  2.54022  667.147
                40    50.0965  451.149  146.061  1793.54  2.06075  353.535
Note that this is pasted from excel so Experiment1 is in cell B2.
My code so far:
import string
import pandas
import openpyxl
def write_experiment_files_template(self):
    alphabet=list(string.ascii_lowercase)#get alphabet for looping over later
    for i in self.child_experiments_dir: #loop over all locations for writing the properly formatted data
        for j in self.data_dct.keys(): #loop over experiments
            name = j[:-4] #get name of experiment and use it as name in spreadsheet
            for k in self.data_dct[j].keys():
                components= self.data_dct[j][k].keys() #get variable names
                data_shape =self.data_dct[j][k].shape #get dimensions of data (which is a pandas data frame)
                #write the easy bits into the excel workbook
                wb = openpyxl.Workbook()
                ws = wb.active
                ws['A1']='Name'
                ws['B1']=name
                ws['A2']='Notes'
                ws['A3']='Componentnotes'
                ws['A4']='Components'
                ws['B4']='time'
                ws['A5']='Values'
                #loop over variables and write the pandas data frame into the space for the values
                for l in range(len(components)):
                    ws[alphabet[l+2]+'4']=components[l] #plus 2 to account for time and headers
                    #loop over the space in the spreadsheet required for data input (B5:B35)
                    for m in ws[alphabet[l+2]+'5':alphabet[len(components)+1]+str(data_shape[0]+4)]:
                        m[0]= self.data_dct[j] #attempt to assign values
The above code does not do what I need it to and it appears I can't find a way to fix it. Does anybody have any ideas as to how I can either fix my code or take another approach to properly formatting this data?
Thanks
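One possible approach, offered as a sketch and not a drop-in fix, is to build the layout first as a plain list of rows and only then hand the rows to the spreadsheet library, which avoids computing cell addresses from letters. The function `build_template_rows` and its arguments are hypothetical, and it assumes each component maps to a list of values with one entry per time point.

```python
def build_template_rows(name, time_values, components):
    """Lay out one experiment as a list of spreadsheet rows (lists of cells)."""
    labels = list(components)  # component names in insertion order
    rows = [
        ["Name", name],
        ["Notes"],
        ["Componentnotes"],
        ["Components", "time"] + labels,
    ]
    for i, t in enumerate(time_values):
        first_cell = "Values" if i == 0 else None  # label only the first data row
        rows.append([first_cell, t] + [components[label][i] for label in labels])
    return rows


# Values copied from the first two data rows of the example layout.
components = {"LR1R2": [1.69127, 62.0374], "R1": [271.087, 146.54]}
rows = build_template_rows("Experiment1", [0, 20], components)
for row in rows:
    print(row)
```

With openpyxl, each row could then be written with `ws.append(row)`, which fills cells starting from column A.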
