Resolving 'number stored as text' errors - python

I am trying to complete a script to store all the trail reports my company gets from various clearing houses. As part of this script I rip the data from multiple Excel sheets (over 20 a month) and amalgamate it in a series of pandas dataframes (organized in a timeline). Unfortunately, when I try to output a new spreadsheet with the amalgamated summaries, I get a 'number stored as text' error from Excel.
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd

FinalFile = Workbook()
FinalFile.create_sheet(title='Summary') ### This will hold a summary table eventually
for i in Timeline:
    index = Timeline.index(i)
    sheet = FinalFile.create_sheet(title=i)
    sheet[i].number_format = 'Currency'
    df = pd.DataFrame(Output[index])
    df.columns = df.iloc[0]
    df = df.iloc[1:].reset_index(drop=True)
    df.head()
    df = df.set_index('Payment Type')
    for r in dataframe_to_rows(df, index=True, header=True):
        sheet.append(r)
    for cell in sheet['A'] + sheet[1]:
        cell.style = 'Pandas'
SavePath = SaveFolder + '/' + CurrentDate + '.xlsx'
FinalFile.save(SavePath)
Using number_format = 'Currency' to format as currency did not resolve this, nor did my attempt to use the write-only method from the openpyxl documentation page
https://openpyxl.readthedocs.io/en/stable/pandas.html
Fundamentally this code outputs the right index, headers, sheet names and formatting; the only issue is the numbers stored as text from B3:D7.
Attached is an example month's output, and an example dataframe for the same month:
0 Total Paid Net GST
Payment Type
Adjustments -2800 -2546 -254
Agency Upfront 23500 21363 2135
Agency Trail 46980 42708 4270
Referring Office Trail 16003 14548 1454
NilTrailPayment 0 0 0
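Excel flags the cells because the values ripped from the source sheets are strings, so openpyxl writes them as text; setting number_format afterwards does not change the stored type. A minimal sketch (with a toy frame mirroring the example table above) of coercing the columns to numeric before appending rows:

```python
import pandas as pd

# Toy frame mirroring the question's summary table, with values as strings,
# which is how they typically arrive after ripping mixed Excel sheets
df = pd.DataFrame(
    {'Total': ['-2800', '23500'], 'Paid Net': ['-2546', '21363'], 'GST': ['-254', '2135']},
    index=['Adjustments', 'Agency Upfront'],
)

# Coerce every column to a numeric dtype; dataframe_to_rows then yields
# real numbers, and openpyxl writes numeric cells instead of text
df = df.apply(pd.to_numeric)
print(df.dtypes)
```

In the original loop this would go right before the dataframe_to_rows call, e.g. df = df.apply(pd.to_numeric) after setting the index.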

Related

use ek.get_data in python; only one RIC from the csv shows results, the other lines disappear

I want to retrieve financial data from Refinitiv using the code below. The file 122.csv contains the RICs of many companies, but when I run the code the results only show the first company's information, and even that first RIC is incomplete.
import eikon as ek
from pandas import read_csv

with open(r'C:\Users\zhang\Desktop\122.csv', encoding='utf-8') as date:
    for i, line in enumerate(read_csv(date)):
        RIC = line[0]
        SDate = '2013-06-10'
        EDate = '2015-06-10'
        df1, das = ek.get_data(RIC,
                               fields=['TR.F.TotAssets(Scale=6)',
                                       'TR.F.DebtTot(Scale=6)'],
                               parameters={'SDate': SDate, 'EDate': EDate})
df1
the result:
  Instrument  Total Assets  Debt - Total
0          A         10686          2699
1          A         10815          1663
How can I generate the whole list of ric of companies in my csv file?
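Two things go wrong in the code above: iterating over a pandas DataFrame yields column names (not rows), so line[0] is just the first character of a header, and df1 is overwritten on every pass so only the last call survives. A sketch of the loop structure that collects every RIC's result, with the ek.get_data call replaced by a hypothetical fetch_data placeholder so the shape of the fix is visible without Eikon credentials:

```python
import pandas as pd

def fetch_data(ric):
    # Hypothetical stand-in for ek.get_data(ric, fields=[...], parameters=...),
    # which returns a (DataFrame, error) pair
    return pd.DataFrame({'Instrument': [ric], 'Total Assets': [0]}), None

# In practice, read the RIC column from 122.csv, e.g.
# rics = pd.read_csv(r'C:\Users\zhang\Desktop\122.csv')['RIC'].tolist()
rics = ['A', 'MSFT.O', 'IBM.N']

frames = []
for ric in rics:
    df1, err = fetch_data(ric)   # fetch one company
    frames.append(df1)           # keep every result instead of overwriting

# Stack the per-company frames into one table
result = pd.concat(frames, ignore_index=True)
print(result)
```

If I recall the Eikon API correctly, ek.get_data also accepts a list of instruments directly, which would avoid the loop entirely; check the Refinitiv documentation for your version.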

error "can only convert an array of size 1 to a Python scalar" when trying to get the index of a pandas dataframe row as an integer

I have an excel file with stock symbols and many other columns. I have a simplified version of the excel file below:
   Symbol  Industry
0  AAPL    Technology Manufacturing
1  MSFT    Technology Manufacturing
2  TSLA    Electric Car Manufacturing
Essentially, I am trying to get the Industry based on the Symbol.
For example, if I use 'AAPL' I want to get 'Technology Manufacturing'. Here is my code so far.
import pandas as pd
excel_file1 = 'file.xlsx'
df = pd.read_excel(excel_file1)
stock = 'AAPL'
row_index = df[df['Symbol'] == stock].index.item()
industry = df['Industry'][row_index]
print(industry)
After trying to get row_index, I get the error: "ValueError: can only convert an array of size 1 to a Python scalar"
Can someone solve this? Also, assuming row_index works: is this code (below) correct?
industry = df['Industry'][row_index]
The error means the boolean filter matched more (or fewer) than one row, so .index.item() cannot return a single scalar. Use:
stock = 'AAPL'
industry = df[df['Symbol'] == stock]['Industry'].iloc[0]
OR, if you want to search using the index, use df.loc:
stock = 'AAPL'
industry = df.loc[df[df['Symbol'] == stock].index[0], 'Industry']
But the first one is much better.
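An alternative (not in the original answer) that avoids repeated boolean filtering: build a Series mapping symbol to industry once, then look up by label. A sketch using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Symbol': ['AAPL', 'MSFT', 'TSLA'],
    'Industry': ['Technology Manufacturing', 'Technology Manufacturing',
                 'Electric Car Manufacturing'],
})

# Index the frame by symbol once; each subsequent lookup is a plain
# label access instead of a full-column comparison
lookup = df.set_index('Symbol')['Industry']
print(lookup['AAPL'])
```

This also fails loudly with a KeyError if the symbol is missing, rather than an opaque size-1 conversion error.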

Need a 'for loop' to get dividend data for a stock portfolio, from their respective api urls

I am trying to automate parsing of dividend data for a stock portfolio, and getting the stock wise dividend values into a single dataframe table.
The data for each stock in a portfolio is stored in a separate api url
The portfolio ids (for stocks - ITC, Britannia, Sanofi) are [500875, 500825, 500674].
I would first like to run a 'for loop' to generate each specific url (which looks like this - https://api.bseindia.com/BseIndiaAPI/api/CorporateAction/w?scripcode=500674), the last 6-digit number of each url being the respective company id.
Then I would like to use each url to pull the first line of the respective dividend table into a single dataframe. The code I used to get the individual dividend data, and the final dataframe that I need, are shown in the attached image.
Basically, I would like to run a 'for loop' that gets the first line of 'Table2' for each stock id and stores it in a single data frame as the final result.
PS - The code which I used to get individual dividend data is below:
url = 'https://api.bseindia.com/BseIndiaAPI/api/CorporateAction/w?scripcode=500674'
jsondata = requests.get(url, headers= {'User-Agent': 'Mozilla/5.0'}).json()
df = pd.DataFrame(jsondata['Table2'])
If you need a for-loop then you should use it, and show the code with the for-loop and the problem it gives you.
You can do all the work in a single for-loop: use string formatting to create the url with each code and read the data from the server, then take the first row (even without creating a DataFrame) and append it to a list of all rows. After the loop, convert this list to a DataFrame.
import requests
import pandas as pd

# --- before loop ---
headers = {'User-Agent': 'Mozilla/5.0'}
all_rows = []

# --- loop ---
for code in [500875, 500825, 500674]:
    # use an f-string or str.format() to create the url
    #url = f'https://api.bseindia.com/BseIndiaAPI/api/CorporateAction/w?scripcode={code}'
    url = 'https://api.bseindia.com/BseIndiaAPI/api/CorporateAction/w?scripcode={}'.format(code)
    r = requests.get(url, headers=headers)
    #print(r.text)         # to check error message
    #print(r.status_code)
    data = r.json()
    first_row = data['Table2'][0]  # no need to create a DataFrame
    #df = pd.DataFrame(data['Table2'])
    #first_row = df.iloc[0]
    #print(first_row)
    all_rows.append(first_row)

# --- after loop ---
df_result = pd.DataFrame(all_rows)
print(df_result)
Result:
scrip_code sLongName ... Details PAYMENT_DATE
0 500875 ITC LTD. ... 10.1500 2020-09-08T00:00:00
1 500825 BRITANNIA INDUSTRIES LTD. ... 83.0000 2020-09-16T00:00:00
2 500674 Sanofi India Ltd ... 106.0000 2020-08-06T00:00:00
[3 rows x 9 columns]

How to append two dataframe objects containing same column data but different column names?

I want to append an expense df to a revenue df but can't properly do so. Can anyone offer how I may do this?
import pandas as pd
import lxml
from lxml import html
import requests
import numpy as np
symbol = 'MFC'
url = 'https://www.marketwatch.com/investing/stock/'+ symbol +'/financials'
df=pd.read_html(url)
revenue = pd.concat(df[0:1]) # the revenue dataframe obj
revenue = revenue.dropna(axis='columns') # drop naN column
header = revenue.iloc[:0] # revenue df header row
expense = pd.concat(df[1:2]) # the expense dataframe obj
expense = expense.dropna(axis='columns') # drop naN column
statement = revenue.append(expense) #results in a dataframe with an added column (Unnamed:0)
revenue = pd.concat(df[0:1]) gives the columns:
Fiscal year is January-December. All values CAD millions. | 2015 | 2016 | 2017 | 2018 | 2019
expense = pd.concat(df[1:2]) gives the columns:
Unnamed: 0 | 2015 | 2016 | 2017 | 2018 | 2019
How can I append the expense dataframe to the revenue dataframe so that I am left with a single dataframe object?
Thanks,
Rename columns.
df = df.rename(columns={'old_name': 'new_name',})
Then combine them with merge(), join(), or concat().
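A sketch of that suggestion, using toy frames that mirror the question's column layouts (the real frames come from pd.read_html):

```python
import pandas as pd

# Toy frames mirroring the question: same data shape, different first-column names
revenue = pd.DataFrame({'Fiscal year is January-December. All values CAD millions.': ['Sales'],
                        '2019': [100]})
expense = pd.DataFrame({'Unnamed: 0': ['Cost of Goods Sold'],
                        '2019': [60]})

# Give both frames the same name for their first column, then stack them;
# referencing columns[0] avoids hard-coding the scraped header text
revenue = revenue.rename(columns={revenue.columns[0]: 'LineItem'})
expense = expense.rename(columns={expense.columns[0]: 'LineItem'})
statement = pd.concat([revenue, expense], ignore_index=True)
print(statement)
```

Because the column names now match, concat aligns the frames instead of adding an extra 'Unnamed: 0' column.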
I managed to append the dataframes with the following code. Thanks David for putting me on the right track. I admit this is not the best way to do it, because in a runtime environment I don't know the value of the text to rename, and I've hard-coded it here. Ideally it would be best to reference a placeholder at df.iloc[:0,0] instead, but I'm having a tough time getting that to work.
df=pd.read_html(url)
revenue = pd.concat(df[0:1])
revenue = revenue.dropna(axis='columns')
revenue.rename({'Fiscal year is January-December. All values CAD millions.':'LineItem'},axis=1,inplace=True)
header = revenue.iloc[:0]
expense = pd.concat(df[1:2])
expense = expense.dropna(axis='columns')
expense.rename({'Unnamed: 0':'LineItem'}, axis=1, inplace=True)
statement = revenue.append(expense,ignore_index=True)
Using the df=pd.read_html(url) construct, several lists are returned when scraping marketwatch financials. The below function returns a single dataframe of all balance sheet elements. The same code applies to quarterly and annual income and cash flow statements.
def getBalanceSheet(url):
    df = pd.read_html(url)
    count = sum([1 for Listitem in df if 'Unnamed: 0' in Listitem])
    statement = pd.concat(df[0:1])
    statement = statement.dropna(axis='columns')
    if 'q' in url:  # quarterly
        statement.rename({'All values CAD millions.': 'LineItem'}, axis=1, inplace=True)
    else:
        statement.rename({'Fiscal year is January-December. All values CAD millions.': 'LineItem'}, axis=1, inplace=True)
    for rowidx in range(count):
        df_name = pd.concat(df[rowidx + 1:rowidx + 2])
        df_name = df_name.dropna(axis='columns')
        df_name.rename({'Unnamed: 0': 'LineItem'}, axis=1, inplace=True)
        statement = statement.append(df_name, ignore_index=True)
    return statement

How to calculate growth rate from csv excel data sheet

I am working with a csv sheet which contains data from a brewery, e.g. Date Required, Quantity Ordered, etc.
I want to write a module to read the csv file structure and load the data into a suitable data structure
in Python. I have to interpret the data by calculating the average growth rate, the ratio of sales for
different beers and use these values to predict sales for a given week or month in the future.
I have no idea where to start. The only lines of code I have so far are:
df = pd.read_csv (r'file location')
print (df)
To illustrate, I have downloaded data on the US employment level (https://fred.stlouisfed.org/series/CE16OV) and population (https://fred.stlouisfed.org/series/POP).
import pandas as pd
employ = pd.read_csv('/home/brb/bugs/data/CE16OV.csv')
employ = employ.rename(columns={'DATE':'date'})
employ = employ.rename(columns={'CE16OV':'employ'})
employ = employ[employ['date']>='1952-01-01']
pop = pd.read_csv('/home/brb/bugs/data/POP.csv')
pop = pop.rename(columns={'DATE':'date'})
pop = pop.rename(columns={'POP':'pop'})
pop = pop[pop['date']<='2019-10-01']
df = pd.merge(employ,pop)
df['employ_monthly'] = df['employ'].pct_change()
df['employ_yoy'] = df['employ'].pct_change(periods=12)
df['employ_pop'] = df['employ']/df['pop']
df.head()
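The same pct_change pattern answers the brewery question directly: compute period-over-period growth, average it, and project it forward. A minimal sketch with hypothetical column names (the real names depend on the brewery CSV):

```python
import pandas as pd

# Hypothetical weekly sales; in practice: sales = pd.read_csv(r'file location')
sales = pd.DataFrame({
    'week': [1, 2, 3, 4],
    'quantity': [100, 110, 121, 133],
})

# Week-over-week growth rate and its average
sales['growth'] = sales['quantity'].pct_change()
avg_growth = sales['growth'].mean()

# Naive forecast for the next week: last observed value
# scaled by (1 + average growth)
forecast = sales['quantity'].iloc[-1] * (1 + avg_growth)
print(avg_growth, forecast)
```

The ratio of sales between different beers follows the same idea: divide one beer's quantity column by another's, as the employ_pop ratio does above.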
