Format pivot table with even columns as column names - python

I'm trying to parse scraped data from finviz quote profile pages. I think I can use pandas' pivot-table machinery to get the output I need, but I have never used pivot tables, so I'm unsure whether or how to format the output.
The table I receive is below. I would like each even-numbered column to become a column header of the output table, ending with a DataFrame of one row and 72 columns, since there are 72 values. Unless anyone can recommend a better output structure and how to access the values?
0 1 2 3 4 5 6 7 8 9 10 11
0 Index S&P 500 P/E 23.06 EPS (ttm) 3.15 Insider Own 0.20% Shs Outstand 1.01B Perf Week 3.94%
1 Market Cap 73.24B Forward P/E 21.29 EPS next Y 3.41 Insider Trans -34.81% Shs Float 996.66M Perf Month 4.85%
2 Income 3.21B PEG 2.31 EPS next Q 0.81 Inst Own 89.30% Short Float 2.05% Perf Quarter 4.47%
3 Sales 13.15B P/S 5.57 EPS this Y 9.70% Inst Trans 0.42% Short Ratio 4.21 Perf Half Y 25.19%
4 Book/sh 10.26 P/B 7.08 EPS next Y 7.88% ROA 20.10% Target Price 74.34 Perf Year 28.65%
5 Cash/sh 3.11 P/C 23.35 EPS next 5Y 10.00% ROE 32.10% 52W Range 45.51 - 72.28 Perf YTD 36.02%
6 Dividend 2 P/FCF 31.29 EPS past 5Y 1.50% ROI 21.60% 52W High 0.44% Beta 1.26
7 Dividend % 2.75% Quick Ratio 2.5 Sales past 5Y -1.40% Gross Margin 60.70% 52W Low 59.54% ATR 1.34
8 Employees 29977 Current Ratio 3.3 Sales Q/Q 7.20% Oper. Margin 35.20% RSI (14) 67.26 Volatility 1.65% 1.93%
9 Optionable Yes Debt/Eq 0.35 EPS Q/Q 23.80% Profit Margin 24.40% Rel Volume 1.13 Prev Close 72.08
10 Shortable Yes LT Debt/Eq 0.29 Earnings Oct 26 AMC Payout 47.70% Avg Volume 4.84M Price 72.6
11 Recom 2.5 SMA20 3.86% SMA50 4.97% SMA200 16.49% Volume 5476883 Change 0.72%
I know the formatting is hard to read.

Try reshaping
d1 = pd.DataFrame(df.values.reshape(-1, 2), columns=['key', 'value'])
d1.set_index('key').T
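If that shape works for you, here is a small self-contained sketch of accessing the values afterwards; the two-row df below is only a stand-in for the full 12x12 scraped frame, and the printed metrics are just examples:
import pandas as pd
# Stand-in for the scraped 12x12 frame; only a few cells are shown here.
df = pd.DataFrame([
    ['Index', 'S&P 500', 'P/E', '23.06', 'EPS (ttm)', '3.15'],
    ['Market Cap', '73.24B', 'Forward P/E', '21.29', 'EPS next Y', '3.41'],
])
# Pair up the (label, value) cells, then pivot them into a single wide row.
d1 = pd.DataFrame(df.values.reshape(-1, 2), columns=['key', 'value'])
wide = d1.set_index('key').T
print(wide['P/E'].iloc[0])               # '23.06'
print(wide[['Market Cap', 'EPS (ttm)']])
Note that the values stay strings after scraping, so convert with pd.to_numeric (after stripping suffixes like % or B) where you need numbers.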

Related

Determining average values over irregular number of rows in a csv file

I have a csv file with days of the year in one column and temperature in another. The days are split into sections and I want to find the average temperature over each day, e.g. day 0, 1, 2, 3, etc.
The temperature measurements were taken irregularly, meaning there is a different number of measurements for each day.
Typically I would use df.groupby(np.arange(len(df)) // n).mean(), but n, the number of rows, varies in this case.
I have an example of what the data is like.
Days   Temp
0.75   19
0.8    18
1.2    18
1.25   18
1.75   19
3.05   18
3.55   21
3.60   21
3.9    18
4.5    20
You could convert Days to an integer and use that to group.
>>> df.groupby(df["Days"].astype(int)).mean()
Days Temp
Days
0 0.775 18.500000
1 1.400 18.333333
3 3.525 19.500000
4 4.500 20.000000
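For reference, a self-contained version of the same idea with the sample data above; the astype(int) simply truncates each Days value to the day it belongs to:
import pandas as pd
df = pd.DataFrame({
    'Days': [0.75, 0.8, 1.2, 1.25, 1.75, 3.05, 3.55, 3.60, 3.9, 4.5],
    'Temp': [19, 18, 18, 18, 19, 18, 21, 21, 18, 20],
})
# Truncating Days labels every measurement with its day, so a plain groupby
# averages each day regardless of how many measurements it has.
print(df.groupby(df['Days'].astype(int)).mean())
# Only the temperature averages:
print(df.groupby(df['Days'].astype(int))['Temp'].mean())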

Calculate compounded return with external cashflows

I'm trying to calculate the cumulative price of a time series in the face of external cashflows.
This is the sample dataset:
reportdate fund mtd_return cashflow Desired Output
30/11/2018 Fund X -0.00860 15687713 15552798.98
31/12/2018 Fund X -0.00900 15412823.78
31/01/2019 Fund X 0.00920 15554621.76
28/02/2019 Fund X 0.00630 15652615.88
31/03/2019 Fund X 0.00700 15762184.19
30/04/2019 Fund X 0.01220 15954482.84
31/05/2019 Fund X 0.00060 1000000 16964655.53
30/06/2019 Fund X 0.00570 1200000 18268194.07
31/07/2019 Fund X 0.00450 18350400.94
31/08/2019 Fund X 0.00210 18388936.78
30/09/2019 Fund X 0.00530 18486398.15
31/10/2019 Fund X 0.00200 18523370.94
30/11/2019 Fund X 0.00430 18603021.44
31/12/2019 Fund X 0.00660 18725801.38
31/01/2020 Fund X 0.01070 18926167.45
29/02/2020 Fund X -0.00510 18829644.00
31/03/2020 Fund X -0.10700 16814872.09
30/04/2020 Fund X 0.02740 3400000 20768759.59
31/05/2020 Fund X 0.02180 2000000 23265118.55
30/06/2020 Fund X 0.02270 23793236.74
31/07/2020 Fund X 0.01120 24059720.99
31/08/2020 Fund X 0.01260 24362873.47
30/09/2020 Fund X 0.00750 24545595.02
31/10/2020 Fund X 0.00410 -8110576 16502402.68
30/11/2020 Fund X 0.02790 16962819.72
31/12/2020 Fund X 0.01230 17171462.40
In the above, the Desired Output column is calculated by taking the previous row's Desired Output, adding any cashflow in the current period, and multiplying by (1 + mtd_return). Effectively, I'm looking for a good way to calculate compounded returns in the presence of external cashflows.
Any help on implementing this in Python would be appreciated.
Many thanks!
Mike
import pandas as pd
df = pd.read_csv('df7.txt', sep=',', header=0)
df['reportdate'] = pd.to_datetime(df['reportdate'])
df = df.fillna(0)
qqq = []
def func_data(x):
    a = 0
    ind = x.index[0] - 1
    if x.index[0] > 0:
        a = (qqq[ind] + x['cashflow']) * (1 + x['mtd_return'])
        qqq.append(a.values[0])
    else:
        qqq.append(df.loc[0, 'cashflow'] * (1 + df.loc[0, 'mtd_return']))
    return a
df.groupby(['reportdate']).apply(func_data)
df['new'] = qqq
print(df)
Output
reportdate fund mtd_return cashflow Desired Output new
0 2018-11-30 Fund X -0.0086 15687713.0 15552798.98 1.555280e+07
1 2018-12-31 Fund X -0.0090 0.0 15412823.78 1.541282e+07
2 2019-01-31 Fund X 0.0092 0.0 15554621.76 1.555462e+07
3 2019-02-28 Fund X 0.0063 0.0 15652615.88 1.565262e+07
4 2019-03-31 Fund X 0.0070 0.0 15762184.19 1.576218e+07
5 2019-04-30 Fund X 0.0122 0.0 15954482.84 1.595448e+07
6 2019-05-31 Fund X 0.0006 1000000.0 16964655.53 1.696466e+07
7 2019-06-30 Fund X 0.0057 1200000.0 18268194.07 1.826819e+07
8 2019-07-31 Fund X 0.0045 0.0 18350400.94 1.835040e+07
9 2019-08-31 Fund X 0.0021 0.0 18388936.78 1.838894e+07
10 2019-09-30 Fund X 0.0053 0.0 18486398.15 1.848640e+07
11 2019-10-31 Fund X 0.0020 0.0 18523370.94 1.852337e+07
12 2019-11-30 Fund X 0.0043 0.0 18603021.44 1.860302e+07
13 2019-12-31 Fund X 0.0066 0.0 18725801.38 1.872580e+07
14 2020-01-31 Fund X 0.0107 0.0 18926167.45 1.892617e+07
15 2020-02-29 Fund X -0.0051 0.0 18829644.00 1.882964e+07
16 2020-03-31 Fund X -0.1070 0.0 16814872.09 1.681487e+07
17 2020-04-30 Fund X 0.0274 3400000.0 20768759.59 2.076876e+07
18 2020-05-31 Fund X 0.0218 2000000.0 23265118.55 2.326512e+07
19 2020-06-30 Fund X 0.0227 0.0 23793236.74 2.379324e+07
20 2020-07-31 Fund X 0.0112 0.0 24059720.99 2.405972e+07
21 2020-08-31 Fund X 0.0126 0.0 24362873.47 2.436287e+07
22 2020-09-30 Fund X 0.0075 0.0 24545595.02 2.454559e+07
23 2020-10-31 Fund X 0.0041 -8110576.0 16502402.68 1.650240e+07
24 2020-11-30 Fund X 0.0279 0.0 16962819.72 1.696282e+07
25 2020-12-31 Fund X 0.0123 0.0 17171462.40 1.717146e+07
All the values in the input file are comma-separated, including the empty ones (i.e. nothing between two commas). The file is read into a pandas DataFrame; header=0 means the first row is used as column headers. Next, the 'reportdate' column is converted to datetime format and empty values are replaced with zero. The data is grouped by date and func_data is applied to each group. On the first call the else branch runs; every later call runs the if branch. The results are appended to the qqq list, which then fills the 'new' column.
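For comparison, the same recurrence can also be written as a plain loop without groupby. Below is a minimal sketch under the assumption that the frame has the question's mtd_return and cashflow columns; only the first three rows of the data are reproduced:
import pandas as pd
# Only the first three rows of the question's data, for illustration.
df = pd.DataFrame({
    'mtd_return': [-0.0086, -0.0090, 0.0092],
    'cashflow': [15687713, 0, 0],
})
# Recurrence: balance[t] = (balance[t-1] + cashflow[t]) * (1 + mtd_return[t]),
# with balance[-1] taken as 0 so the first row is just cashflow * (1 + return).
balance = 0.0
out = []
for ret, cf in zip(df['mtd_return'], df['cashflow']):
    balance = (balance + cf) * (1 + ret)
    out.append(balance)
df['new'] = out
print(df)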

Calculating Accrued Interest with Pandas

I'm trying to calculate the amount of interest that would've accrued during a period of time. I have the starting DataFrame as below
MONTH_BEG_D NO_OF_DAYS RATE
1/10/2017 31 5.22
1/11/2017 30 5.22
1/12/2017 31 5.22
1/1/2018 31 3.5
1/2/2018 28 3.5
1/3/2018 31 3.5
If the starting value is 20, I would like the outcome to be:
FORMULA: INTEREST = (PRINCIPAL_A * RATE * NO_OF_DAYS) / 36500
PRINCIPAL_A MONTH_BEG_D RATE NO_OF_DAYS INTEREST NEW_BALANCE
20 1/10/2017 5.22 31 0.08866849 20.08866849
20.08866849 1/11/2017 5.22 30 0.08618864 20.17485713
20.17485713 1/12/2017 5.22 31 0.08944371 20.26430084
20.26430084 1/1/2018 3.5 31 0.06023772 20.32453856
20.32453856 1/2/2018 3.5 28 0.05456999 20.37910855
Just to explain, the 36500 comes from the 365 days of the year (for NO_OF_DAYS) and the 0.01 multiplier (since RATE is a percentage). I can easily add or modify columns for these two variables, so that is no problem. My problem lies in how to carry the NEW_BALANCE over as the next month's PRINCIPAL_A.
This is basically a cumprod between each column with a cumsum between each row. Is there an easier way of doing this while avoiding loops?
There you go, not the cleanest solution but it does what you require!
# Make sure numpy (and pandas) are imported.
import numpy as np
import pandas as pd
# Increase the displayed precision.
pd.options.display.precision = 7
# Insert a PRINCIPAL_A column filled with NaN; the initial value is set below.
df.insert(0, 'PRINCIPAL_A', np.nan)
# Set the initial principal of 20 in the first row only.
df.iloc[0, 0] = 20
# Loop over the DataFrame to roll the balance over month by month.
for row in range(1, len(df)):
    df['INTEREST'] = (df['PRINCIPAL_A'] * df['RATE'] * df['NO_OF_DAYS']) / 36500
    df['NEW_BALANCE'] = df['PRINCIPAL_A'] + df['INTEREST']
    # Roll the previous month's NEW_BALANCE over as this month's PRINCIPAL_A
    # (written with .loc to avoid chained-assignment issues).
    df.loc[df.index[row], 'PRINCIPAL_A'] = df['NEW_BALANCE'].iloc[row - 1]
Here is the output (slightly different from yours due to rounding):
PRINCIPAL_A MONTH_BEG_D NO_OF_DAYS RATE INTEREST NEW_BALANCE
0 20.0000000 01/10/2017 31 5.22 0.0886685 20.0886685
1 20.0886685 01/11/2017 30 5.22 0.0861886 20.1748571
2 20.1748571 01/12/2017 31 5.22 0.0894437 20.2643008
3 20.2643008 01/01/2018 31 3.50 0.0602377 20.3245386
4 20.3245386 01/02/2018 28 3.50 0.0545700 20.3791086
5 20.3791086 01/03/2018 31 3.50 NaN NaN
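Since there are no external cashflows in this question, the running balance is just a cumulative product of the monthly growth factors, so the loop can be avoided entirely. A sketch under that assumption, using the starting data from the question:
import pandas as pd
df = pd.DataFrame({
    'MONTH_BEG_D': ['1/10/2017', '1/11/2017', '1/12/2017', '1/1/2018', '1/2/2018', '1/3/2018'],
    'NO_OF_DAYS': [31, 30, 31, 31, 28, 31],
    'RATE': [5.22, 5.22, 5.22, 3.5, 3.5, 3.5],
})
start = 20
# Each month multiplies the balance by (1 + RATE * NO_OF_DAYS / 36500).
growth = 1 + df['RATE'] * df['NO_OF_DAYS'] / 36500
df['NEW_BALANCE'] = start * growth.cumprod()
df['PRINCIPAL_A'] = df['NEW_BALANCE'] / growth   # balance at the start of each month
df['INTEREST'] = df['NEW_BALANCE'] - df['PRINCIPAL_A']
print(df)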

How can I return multiple rows instead of just one using Python?

I'm attempting to pull data from finviz, and I'm only able to pull one row each time.
Here's my code:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
url = ('https://finviz.com/quote.ashx?t=' + ticker.upper())
r = Request(url, headers=header)
html = urlopen(r).read()
soup = BeautifulSoup(html, 'lxml')
rows = soup.find_all('tr')
rows = rows[13:20]
for row in rows:
    row_td = row.find_all('td')  # <-- I believe the issue is with this section?
    #print(row_td)
    str_cells = str(row_td)
    clean = BeautifulSoup(str_cells, "lxml").get_text()
    print(clean)
Only prints:
[Dividend %, 2.97%, Quick Ratio, 1.30, Sales past 5Y, -5.70%, Gross Margin, 60.60%, 52W Low, 20.59%, ATR, 0.64] - even though I specify rows[13:30]
I'd like to print out all of the rows from the table on the page.
Here is a screenshot of the table.
You can do it easily using only a pandas DataFrame.
Here is the full working output.
CODE:
import requests
import pandas as pd
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'}
url = "https://finviz.com/quote.ashx?t=KO"
req = requests.get(url, headers=headers)
wiki_table = pd.read_html(req.text, attrs={"class": "snapshot-table2"})
df = wiki_table[0]
print(df)
OUTPUT:
0 1 2 3 ... 8 9 10 11
0 Index DJIA S&P500 P/E 30.35 ... Shs Outstand 4.31B Perf Week -1.03%
1 Market Cap 245.44B Forward P/E 23.26 ... Shs Float 4.29B Perf Month 0.30%
2 Income 8.08B PEG 3.00 ... Short Float 0.75% Perf Quarter 3.70%
3 Sales 36.41B P/S 6.74 ... Short Ratio 2.34 Perf Half Y 11.87%
4 Book/sh 5.16 P/B 10.98 ... Target Price 62.06 Perf Year 19.62%
5 Cash/sh 3.01 P/C 18.82 ... 52W Range 46.97 - 57.56 Perf YTD 3.28%
6 Dividend 1.68 P/FCF 95.02 ... 52W High -1.60% Beta 0.63
7 Dividend % 2.97% Quick Ratio 1.30 ... 52W Low 20.59% ATR 0.64
8 Employees 80300 Current Ratio 1.50 ... RSI (14) 51.63 Volatility 1.17% 0.94%
9 Optionable Yes Debt/Eq 1.89 ... Rel Volume 0.76 Prev Close 56.86
10 Shortable Yes LT Debt/Eq 1.79 ... Avg Volume 13.67M Price 56.64
11 Recom 2.20 SMA20 -0.42% ... Volume 10340772 Change -0.39%
[12 rows x 12 columns]
In the for-loop, you're overwriting the variable row_td on every iteration. Store the content of each row in a list instead (in my example, I use the list all_data to store all rows).
To print all rows from the table, you can use the following example:
import requests
from bs4 import BeautifulSoup
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}
url = "https://finviz.com/quote.ashx?t=KO"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "lxml")
all_data = []
for tr in soup.select(".snapshot-table2 tr"):
    tds = [td.get_text(strip=True) for td in tr.select("td")]
    all_data.append(tds)
fmt_string = "{:<15}" * 12
for row in all_data:
    print(fmt_string.format(*row))
Prints:
Index DJIA S&P500 P/E 30.35 EPS (ttm) 1.87 Insider Own 0.30% Shs Outstand 4.31B Perf Week -1.03%
Market Cap 245.44B Forward P/E 23.26 EPS next Y 2.44 Insider Trans -2.65% Shs Float 4.29B Perf Month 0.30%
Income 8.08B PEG 3.00 EPS next Q 0.58 Inst Own 69.00% Short Float 0.75% Perf Quarter 3.70%
Sales 36.41B P/S 6.74 EPS this Y -13.30% Inst Trans 0.55% Short Ratio 2.34 Perf Half Y 11.87%
Book/sh 5.16 P/B 10.98 EPS next Y 7.84% ROA 8.90% Target Price 62.06 Perf Year 19.62%
Cash/sh 3.01 P/C 18.82 EPS next 5Y 10.12% ROE 40.10% 52W Range 46.97 - 57.56 Perf YTD 3.28%
Dividend 1.68 P/FCF 95.02 EPS past 5Y 1.40% ROI 12.20% 52W High -1.60% Beta 0.63
Dividend % 2.97% Quick Ratio 1.30 Sales past 5Y -5.70% Gross Margin 60.60% 52W Low 20.59% ATR 0.64
Employees 80300 Current Ratio 1.50 Sales Q/Q 41.70% Oper. Margin 25.70% RSI (14) 51.63 Volatility 1.17% 0.94%
Optionable Yes Debt/Eq 1.89 EPS Q/Q 47.70% Profit Margin 22.20% Rel Volume 0.76 Prev Close 56.86
Shortable Yes LT Debt/Eq 1.79 Earnings Jul 21 BMO Payout 87.90% Avg Volume 13.67M Price 56.64
Recom 2.20 SMA20 -0.42% SMA50 1.65% SMA200 6.56% Volume 10,340,772 Change -0.39%
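If you want to work with the scraped values rather than just print them, the list of rows converts straight into a DataFrame. A short sketch continuing from the all_data list built above; the reshape step is the same trick used in the first question on this page:
import pandas as pd
# all_data is the list of 12-item rows collected in the loop above.
table = pd.DataFrame(all_data)
print(table.shape)   # (12, 12)
# Optionally pair the label/value cells into a single wide row.
wide = (pd.DataFrame(table.values.reshape(-1, 2), columns=['key', 'value'])
        .set_index('key')
        .T)
print(wide['P/E'].iloc[0])   # '30.35'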

Labelling group boxplot in seaborn with mean and standard deviation values [duplicate]

I want to display mean and standard deviation values above each of the boxplots in the grouped boxplot (see picture).
My code is
import pandas as pd
import seaborn as sns
from os.path import expanduser as ospath
df = pd.read_excel(ospath('~/Documents/Python/Kandidatspeciale/TestData.xlsx'),'Ark1')
bp = sns.boxplot(y='throw angle', x='incident angle',
data=df,
palette="colorblind",
hue='Bat type')
bp.set_title('Rubber Comparison',fontsize=15,fontweight='bold', y=1.06)
bp.set_ylabel('Throw Angle [degrees]',fontsize=11.5)
bp.set_xlabel('Incident Angle [degrees]',fontsize=11.5)
Where my dataframe, df, is
Bat type incident angle throw angle
0 euro 15 28.2
1 euro 15 27.5
2 euro 15 26.2
3 euro 15 27.7
4 euro 15 26.4
5 euro 15 29.0
6 euro 30 12.5
7 euro 30 14.7
8 euro 30 10.2
9 china 15 29.9
10 china 15 31.1
11 china 15 24.9
12 china 15 27.5
13 china 15 31.2
14 china 15 24.4
15 china 30 9.7
16 china 30 9.1
17 china 30 9.5
I tried the following code. It needs to be independent of the number of x values (incident angles); for instance, it should also work for additional angles such as 45, 60, etc.
m = df.mean(axis=0)    # Mean values
st = df.std(axis=0)    # Standard deviation values
for i, line in enumerate(bp['medians']):
    x, y = line.get_xydata()[1]
    text = ' μ={:.2f}\n σ={:.2f}'.format(m[i], st[i])
    bp.annotate(text, xy=(x, y))
Can somebody help?
This question brought me here since I was also looking for a similar solution with seaborn.
After some trial and error, you just have to change the for loop to:
for i in range(len(m)):
    bp.annotate(
        ' μ={:.2f}\n σ={:.2f}'.format(m[i], st[i]),
        xy=(i, m[i]),
        horizontalalignment='center'
    )
This change worked for me (although I just wanted to print the actual median values). You can also change things like the fontsize, color, or style (e.g. weight) simply by adding them as arguments to annotate.
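For a fully self-contained variant that scales to any number of incident angles (45, 60, etc.), here is a sketch with made-up sample rows and the question's column names; it pools both bat types into one μ/σ annotation per angle:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Made-up sample rows with the question's columns.
df = pd.DataFrame({
    'Bat type': ['euro', 'euro', 'euro', 'euro', 'china', 'china', 'china', 'china'],
    'incident angle': [15, 15, 30, 30, 15, 15, 30, 30],
    'throw angle': [28.2, 27.5, 12.5, 14.7, 29.9, 31.1, 9.7, 9.1],
})
bp = sns.boxplot(y='throw angle', x='incident angle', hue='Bat type',
                 data=df, palette='colorblind')
# One annotation per x category; seaborn places the categories at x = 0, 1, 2, ...
stats = df.groupby('incident angle')['throw angle'].agg(['mean', 'std'])
for i, (angle, row) in enumerate(stats.iterrows()):
    bp.annotate(' μ={:.2f}\n σ={:.2f}'.format(row['mean'], row['std']),
                xy=(i, row['mean']), horizontalalignment='center')
plt.show()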
