python: Arrange scraped data in a pandas DataFrame

I extracted data from a webpage and would like to arrange it into a pandas DataFrame.
import requests
from lxml import html
finviz = requests.get('https://finviz.com/screener.ashx?v=152&o=ticker&c=0,1,2,3,4,5,6,7,10,11,12,14,16,17,19,21,22,23,24,25,31,32,33,38,41,48,65,66,67&r=1')
finz = html.fromstring(finviz.content)
col = finz.xpath('//table/tr/td[@class="table-top"]/text()')
data = finz.xpath('//table/tr/td/a[@class="screener-link"]/text()')
col holds the column names for the pandas DataFrame, and the values in the data list should be arranged into rows of 28: data points 1 to 28 in the first row, 29 to 56 in the second row, and so forth. How can I write this code elegantly?
datalist = []
for y in range(28):
    datalist.append(data[y])
>>> datalist
['1', 'Agilent Technologies, Inc.', 'Healthcare', 'Medical Laboratories & Research', 'USA', '23.00B', '29.27', '4.39', '4.53', '18.76', '1.02%', '5.00%', '5.70%', '324.30M', '308.52M', '2.07', '8.30%', '15.70%', '14.60%', '1.09', '1,775,149', '2', 'Alcoa Corporation', 'Basic Materials', 'Aluminum', 'USA', '1.21B', '-']
But the result is a flat list, not a table like a DataFrame.
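The reshaping the question describes (a fixed number of consecutive values per row) can be sketched directly on the xpath output, without re-parsing the page. This is a minimal sketch, not the answer's method: `chunk_to_frame` and the sample `col`/`data` values are hypothetical, and it assumes every row contributes exactly `len(columns)` values. That assumption is worth checking first: in the list printed above, the second company's number `'2'` already appears at position 22, not 29, so the scraped values may not split evenly into 28-value rows. The function raises an error rather than silently shifting values into the wrong columns.

```python
import pandas as pd


def chunk_to_frame(values, columns):
    """Split a flat list into consecutive rows of len(columns) values each."""
    width = len(columns)
    if len(values) % width:
        # A leftover partial row means the scraped values and headers don't line up
        raise ValueError(f"{len(values)} values do not divide into rows of {width}")
    rows = [values[i:i + width] for i in range(0, len(values), width)]
    return pd.DataFrame(rows, columns=columns)


# Hypothetical stand-ins for the `col` and `data` lists from the question
col = ['No.', 'Company', 'Sector']
data = ['1', 'Agilent Technologies, Inc.', 'Healthcare',
        '2', 'Alcoa Corporation', 'Basic Materials']
df = chunk_to_frame(data, col)
print(df)
```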

Pandas has a function to parse HTML: pd.read_html
You can try the following:
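# Modules
import io
import pandas as pd
import requests
# HTML content
finviz = requests.get('https://finviz.com/screener.ashx?v=152&o=ticker&c=0,1,2,3,4,5,6,7,10,11,12,14,16,17,19,21,22,23,24,25,31,32,33,38,41,48,65,66,67&r=1')
# Convert to dataframe (read_html needs lxml, or BeautifulSoup4 with html5lib).
# Wrapping the HTML in StringIO avoids the deprecation of literal HTML strings in pandas >= 2.1.
df = pd.read_html(io.StringIO(finviz.text))[-2]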
# Set 1st row to columns names
df.columns = df.iloc[0]
# Drop 1st row
df = df.drop(df.index[0])
# df = df.set_index('No.')
print(df)
# 0 No. Ticker Company Sector Industry Country ... Debt/Eq Profit M Beta Price Change Volume
# 1 1 A Agilent Technologies, Inc. Healthcare Medical Laboratories & Research USA ... 0.51 14.60 % 1.20 72.47 - 0.28 % 177333
# 2 2 AA Alcoa Corporation Basic Materials Aluminum USA ... 0.44 - 10.80 % 2.03 6.28 3.46 % 3021371
# 3 3 AAAU Perth Mint Physical Gold ETF Financial Exchange Traded Fund USA ... - - - 16.08 - 0.99 % 45991
# 4 4 AACG ATA Creativity Global Services Education & Training Services China ... 0.02 - 2.96 0.95 - 0.26 % 6177
# 5 5 AADR AdvisorShares Dorsey Wright ADR ETF Financial Exchange Traded Fund USA ... - - - 40.80 0.22 % 1605
# 6 6 AAL American Airlines Group Inc. Services Major Airlines USA ... - 3.70 % 1.83 12.81 4.57 % 16736506
# 7 7 AAMC Altisource Asset Management Corporation Financial Asset Management USA ... - -17.90 % 0.78 12.28 0.00 % 0
# 8 8 AAME Atlantic American Corporation Financial Life Insurance USA ... 0.28 - 0.40 % 0.29 2.20 3.29 % 26
# 9 9 AAN Aaron's, Inc. Services Rental & Leasing Services USA ... 0.20 0.80 % 1.23 22.47 - 0.35 % 166203
# 10 10 AAOI Applied Optoelectronics, Inc. Technology Semiconductor - Integrated Circuits USA ... 0.49 - 34.60 % 2.02 7.80 2.63 % 61303
# 11 11 AAON AAON, Inc. Industrial Goods General Building Materials USA ... 0.02 11.40 % 0.88 48.60 0.71 % 20533
# 12 12 AAP Advance Auto Parts, Inc. Services Auto Parts Stores USA ... 0.21 5.00 % 1.04 95.94 - 0.58 % 165445
# 13 13 AAPL Apple Inc. Consumer Goods Electronic Equipment USA ... 1.22 21.50 % 1.19 262.39 2.97 % 11236642
# 14 14 AAT American Assets Trust, Inc. Financial REIT - Retail USA ... 1.03 12.50 % 0.99 25.35 2.78 % 30158
# 15 15 AAU Almaden Minerals Ltd. Basic Materials Gold Canada ... 0.04 - 0.53 0.28 - 1.43 % 34671
# 16 16 AAWW Atlas Air Worldwide Holdings, Inc. Services Air Services, Other USA ... 1.33 - 10.70 % 1.65 22.79 2.70 % 56521
# 17 17 AAXJ iShares MSCI All Country Asia ex Japan ETF Financial Exchange Traded Fund USA ... - - - 60.13 1.18 % 161684
# 18 18 AAXN Axon Enterprise, Inc. Industrial Goods Aerospace/Defense Products & Services USA ... 0.00 0.20 % 0.77 71.11 2.37 % 187899
# 19 19 AB AllianceBernstein Holding L.P. Financial Asset Management USA ... 0.00 89.60 % 1.35 19.15 1.84 % 54588
# 20 20 ABB ABB Ltd Industrial Goods Diversified Machinery Switzerland ... 0.67 5.10 % 1.10 17.44 0.52 % 723739
# [20 rows x 29 columns]
I'll leave it to you to make the table selection more robust in case the HTML page structure changes; the id of the parent div might be useful.
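One way to make the selection less dependent on the table's position is to pick the table by its content instead. A minimal sketch, assuming (as in the output above) that `read_html` leaves the header cells in the first row; `find_table` is a hypothetical helper, and the sample tables stand in for the list `pd.read_html` returns. `pd.read_html` also accepts `match` (keep only tables containing matching text) and `attrs` (filter on HTML attributes such as an `id`), which may be simpler if the page markup allows it.

```python
import pandas as pd


def find_table(tables, header_cell):
    """Return the first DataFrame whose first row contains header_cell."""
    for table in tables:
        if table.empty:
            continue
        first_row = table.iloc[0].astype(str).tolist()
        if header_cell in first_row:
            return table
    raise LookupError(f"no table has a {header_cell!r} cell in its first row")


# Hypothetical stand-ins for the list returned by pd.read_html
tables = [
    pd.DataFrame([['Filters', 'Order']]),
    pd.DataFrame([['No.', 'Ticker'], ['1', 'A'], ['2', 'AA']]),
    pd.DataFrame([['footer']]),
]
screener = find_table(tables, 'Ticker')
print(screener)
```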
Explanation of "[-2]": read_html returns a list of DataFrames:
list_df = pd.read_html(finviz.content)
print(type(list_df))
# <class 'list'>
# Elements types in the lists
print(type(list_df[0]))
# <class 'pandas.core.frame.DataFrame'>
So to get the desired DataFrame, I select the second-to-last element with [-2]. This discussion explains negative indexes.

Related

How to web scrap Economic Calendar data from TradingView and load into Dataframe?

I want to load the Economic Calendar data from the TradingView link below into a DataFrame.
Link: https://in.tradingview.com/economic-calendar/
Filter-1: Select Data for India and United States
Filter-2: Data for This Week
You can request this url: https://economic-calendar.tradingview.com/events
import pandas as pd
import requests
url = 'https://economic-calendar.tradingview.com/events'
today = pd.Timestamp.today().normalize()
payload = {
'from': (today + pd.offsets.Hour(23)).isoformat() + '.000Z',
'to': (today + pd.offsets.Day(7) + pd.offsets.Hour(22)).isoformat() + '.000Z',
'countries': ','.join(['US', 'IN'])
}
data = requests.get(url, params=payload).json()
df = pd.DataFrame(data['result'])
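The `from`/`to` strings above span from 23:00 today to 22:00 seven days later, with a literal `Z` (UTC) appended. Note that `pd.Timestamp.today()` returns local, timezone-naive time, so the window is only exact when the machine runs on UTC. A minimal sketch that factors the window into a testable function; `week_window` is a hypothetical helper, and the offsets are copied from the answer, not taken from any TradingView documentation.

```python
import pandas as pd


def week_window(now):
    """Return (from, to) strings in the same format as the answer's payload."""
    day = now.normalize()  # midnight of the given day
    start = day + pd.Timedelta(hours=23)
    end = day + pd.Timedelta(days=7, hours=22)
    fmt = '%Y-%m-%dT%H:%M:%S.000Z'
    return start.strftime(fmt), end.strftime(fmt)


start, end = week_window(pd.Timestamp('2023-01-09 15:30'))
payload = {'from': start, 'to': end, 'countries': ','.join(['US', 'IN'])}
print(payload)
```

To anchor the window in UTC rather than local time, you could pass `pd.Timestamp.now(tz='UTC')`; `normalize` keeps the timezone and `strftime` then formats the UTC wall-clock time.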
Output:
>>> df
id title country ... ticker comment scale
0 312843 3-Month Bill Auction US ... NaN NaN NaN
1 312844 6-Month Bill Auction US ... NaN NaN NaN
2 316430 LMI Logistics Managers Index Current US ... USLMIC The Logistics Managers Survey is a monthly stu... NaN
3 316503 Exports US ... USEXP The United States is the world's third biggest... B
4 316504 Imports US ... USIMP The United States is the world's second-bigges... B
5 316505 Balance of Trade US ... USBOT The United States has been running consistent ... B
6 312845 Redbook YoY US ... USRI The Johnson Redbook Index is a sales-weighted ... NaN
7 316509 IBD/TIPP Economic Optimism US ... USEOI IBD/TIPP Economic Optimism Index measures Amer... NaN
8 337599 Fed Chair Powell Speech US ... USINTR In the United States, the authority to set int... NaN
9 334599 3-Year Note Auction US ... NaN NaN NaN
10 337600 Fed Barr Speech US ... USINTR In the United States, the authority to set int... NaN
11 316449 Consumer Credit Change US ... USCCR In the United States, Consumer Credit refers t... B
12 312846 API Crude Oil Stock Change US ... USCSC Stocks of crude oil refer to the weekly change... M
13 316575 Cash Reserve Ratio IN ... INCRR Cash Reserve Ratio is a specified minimum frac... NaN
14 334653 RBI Interest Rate Decision IN ... ININTR In India, interest rate decisions are taken by... NaN
15 312847 MBA 30-Year Mortgage Rate US ... USMR MBA 30-Year Mortgage Rate is average 30-year f... NaN
16 312848 MBA Mortgage Applications US ... USMAPL In the US, the MBA Weekly Mortgage Application... NaN
17 312849 MBA Mortgage Refinance Index US ... USMRI The MBA Weekly Mortgage Application Survey is ... NaN
18 312850 MBA Mortgage Market Index US ... USMMI The MBA Weekly Mortgage Application Survey is ... NaN
19 312851 MBA Purchase Index US ... USPIND NaN NaN
20 337604 Fed Williams Speech US ... USINTR In the United States, the authority to set int... NaN
21 316553 Wholesale Inventories MoM US ... USWI The Wholesale Inventories are the stock of uns... NaN
22 337601 Fed Barr Speech US ... USINTR In the United States, the authority to set int... NaN
23 312852 EIA Refinery Crude Runs Change US ... USRCR Crude Runs refer to the volume of crude oil co... M
24 312853 EIA Crude Oil Stocks Change US ... USCOSC Stocks of crude oil refer to the weekly change... M
25 312854 EIA Distillate Stocks Change US ... USDFS NaN M
26 312855 EIA Heating Oil Stocks Change US ... USHOS NaN M
27 312856 EIA Gasoline Production Change US ... USGPRO NaN M
28 312857 EIA Crude Oil Imports Change US ... USCOI NaN M
29 312858 EIA Gasoline Stocks Change US ... USGSCH Stocks of gasoline refers to the weekly change... M
30 312859 EIA Cushing Crude Oil Stocks Change US ... USCCOS Change in the number of barrels of crude oil h... M
31 312860 EIA Distillate Fuel Production Change US ... USDFP NaN M
32 337598 17-Week Bill Auction US ... NaN NaN NaN
33 334575 WASDE Report US ... NaN NaN NaN
34 334586 10-Year Note Auction US ... NaN Generally, a government bond is issued by a na... NaN
35 337602 Fed Waller Speech US ... USINTR In the United States, the authority to set int... NaN
36 312933 M3 Money Supply YoY IN ... INM3 India Money Supply M3 includes M2 plus long-te... NaN
37 312863 Jobless Claims 4-week Average US ... USJC4W NaN K
38 312864 Continuing Jobless Claims US ... USCJC Continuing Jobless Claims refer to actual numb... K
39 312865 Initial Jobless Claims US ... USIJC Initial jobless claims have a big impact in fi... K
40 312866 EIA Natural Gas Stocks Change US ... USNGSC Natural Gas Stocks Change refers to the weekly... B
41 312867 8-Week Bill Auction US ... NaN NaN NaN
42 312868 4-Week Bill Auction US ... NaN NaN NaN
43 334602 30-Year Bond Auction US ... NaN NaN NaN
44 312827 Deposit Growth YoY IN ... INDG In India, deposit growth refers to the year-ov... NaN
45 312869 Foreign Exchange Reserves IN ... INFER In India, Foreign Exchange Reserves are the fo... B
46 337022 Bank Loan Growth YoY IN ... INLG In India, bank loan growth refers to the year-... NaN
47 316685 Industrial Production YoY IN ... INIPYY In India, industrial production measures the o... NaN
48 316687 Manufacturing Production YoY IN ... INMPRYY Manufacturing production measures the output o... NaN
49 312902 Michigan Consumer Expectations Prel US ... USMCE The Index of Consumer Expectations focuses on ... NaN
50 312903 Michigan Current Conditions Prel US ... USMCEC The Index of Consumer Expectations focuses on ... NaN
51 312904 Michigan 5 Year Inflation Expectations Prel US ... USMIE5Y The Index of Consumer Expectations focuses on ... NaN
52 312905 Michigan Inflation Expectations Prel US ... USMIE1Y The Index of Consumer Expectations focuses on ... NaN
53 312906 Michigan Consumer Sentiment Prel US ... USCCI The Index of Consumer Expectations focuses on ... NaN
54 337603 Fed Waller Speech US ... USINTR In the United States, the authority to set int... NaN
55 312870 Baker Hughes Oil Rig Count US ... USCOR US Crude Oil Rigs refer to the number of activ... NaN
56 335652 Baker Hughes Total Rig Count US ... NaN US Total Rigs refer to the number of active US... NaN
57 335824 Monthly Budget Statement US ... USGBV Federal Government budget balance is the diffe... B
58 337605 Fed Harker Speech US ... USINTR In the United States, the authority to set int... NaN
[59 rows x 16 columns]
Info:
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59 entries, 0 to 58
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 59 non-null object
1 title 59 non-null object
2 country 59 non-null object
3 indicator 59 non-null object
4 period 59 non-null object
5 source 59 non-null object
6 actual 0 non-null object
7 previous 51 non-null float64
8 forecast 9 non-null float64
9 currency 59 non-null object
10 unit 28 non-null object
11 importance 59 non-null int64
12 date 59 non-null object
13 ticker 49 non-null object
14 comment 44 non-null object
15 scale 20 non-null object
dtypes: float64(2), int64(1), object(13)
memory usage: 7.5+ KB
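`df.info()` reports `date` (and `id`) as `object`, i.e. strings. Converting `date` to a timezone-aware datetime makes it possible to sort, filter and shift it to local time. A minimal sketch on hypothetical rows; it assumes `date` holds ISO-8601 strings ending in `Z`, which the output above does not show, so check a few real values first.

```python
import pandas as pd

# Hypothetical rows shaped like the API result (only a few columns kept)
events = pd.DataFrame({
    'title': ['Exports', 'RBI Interest Rate Decision', 'Initial Jobless Claims'],
    'country': ['US', 'IN', 'US'],
    'date': ['2023-01-10T13:30:00.000Z', '2023-01-11T04:30:00.000Z',
             '2023-01-12T13:30:00.000Z'],
})

# Parse the strings as UTC timestamps
events['date'] = pd.to_datetime(events['date'], utc=True)
# The same instants expressed in Indian Standard Time
events['date_ist'] = events['date'].dt.tz_convert('Asia/Kolkata')

# Keep only Indian events, in chronological order
india = events[events['country'] == 'IN'].sort_values('date')
print(india[['title', 'date_ist']])
```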

How to loop through CSV to read and extract ranking according to one column?

I feel that I am going to ask a very basic question, but please bear with me.
I have 3 CSV files. For each one, I want to find the best and worst rows according to a specific column.
I managed to do it for one CSV:
import pandas as pd
import os
import numpy as np
path = r"MyFolder"
file1 = '59107_20210630m.csv'
file2 = '65758_20210630m.csv'
file3 = '26389_20210630m.csv'
d1 = os.path.join(path,file1)
d2 = os.path.join(path,file2)
d3 = os.path.join(path,file3)
df1 = pd.read_csv(d1,dtype={'Fund':str, 'Identifier':str, 'Product Description':str, 'L/S':str},thousands=',')
df2 = pd.read_csv(d2,dtype={'Fund':str, 'Identifier':str, 'Product Description':str, 'L/S':str},thousands=',')
df3= pd.read_csv(d3,dtype={'Fund':str, 'Identifier':str, 'Product Description':str, 'L/S':str},thousands=',')
Best_Contributors = df1.sort_values('Net MTD P&L (Base)',
ascending=False).reset_index().head()[['Fund', 'Identifier', 'Product Description', 'L/S',
'% exp of NAV','Period % (Base)','Net MTD P&L (Base)','Contribution (bps)' ]]
Worst_Contributors = df1.sort_values('Net MTD P&L (Base)').reset_index().head()[['Fund', 'Identifier', 'Product Description', 'L/S',
'% exp of NAV','Period % (Base)','Net MTD P&L (Base)','Contribution (bps)']]
Fund = df1.iloc[0,0]
Fund
'59107'
Best_Contributors.style.set_caption(Fund+" Best_Contributors")
59107 Best_Contributors
Fund Identifier Product Description L/S % exp of NAV Period % (Base) Net MTD P&L (Base) Contribution (bps)
0 59107 4523 JP Equity EISAI CO LTD L 1.3 4.67 44 5.41060
1 59107 9517 JP Equity EREX CO LTD L 1.2 4.22 43 5.47042
2 59107 7203 JP Equity TOYOTA MOTOR CORP L 4.5 6.53 22 2.42082
3 59107 3382 JP Equity SEVEN & I HOLDINGS CO LTD L 2.3 1.68 18 2.45396
4 59107 6501 JP Equity HITACHI LTD L 1.9 1.01 17 2.51208
Worst_Contributors.style.set_caption(Fund+" Worst_Contributors")
59107 Worst_Contributors
Fund Identifier Product Description L/S % exp of NAV Period % (Base) Net MTD P&L (Base) Contribution (bps)
0 59107 6301 JP Equity KOMATSU LTD L 1.4 -1.40 -1680 -21.12414
1 59107 9984 JP Equity SOFTBANK GROUP CORP L 1.8 -5.94 -114 -14.41187
2 59107 3678 JP Equity MEDIA DO HOLDINGS CO LTD L 0.0 -1.90 -1133 -14.24195
3 59107 8630 JP Equity SOMPO HOLDINGS INC L 1.7 -6.36 -9766 -12.27612
4 59107 8750 JP Equity DAI-ICHI LIFE HOLDINGS INC L 1.2 -8.01 -931 -11.70994
How can I do this in a loop to get all 6 tables (DataFrames) at once, i.e. one Best_Contributors and one Worst_Contributors table for each fund?
Thank you.
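A minimal sketch of the loop, assuming all three files share the column layout shown above and that the fund id is the first cell of each file (as `df1.iloc[0, 0]` is in the question). `load_funds`, `best_and_worst` and `rank_funds` are hypothetical names; `nlargest`/`nsmallest` replace the sort-then-`head` step and select the same rows, apart from how ties are ordered.

```python
import os

import pandas as pd

COLS = ['Fund', 'Identifier', 'Product Description', 'L/S',
        '% exp of NAV', 'Period % (Base)', 'Net MTD P&L (Base)', 'Contribution (bps)']
SORT_COL = 'Net MTD P&L (Base)'
DTYPES = {'Fund': str, 'Identifier': str, 'Product Description': str, 'L/S': str}


def load_funds(folder, names):
    """Read each fund file with the same read_csv options as the question."""
    return [pd.read_csv(os.path.join(folder, name), dtype=DTYPES, thousands=',')
            for name in names]


def best_and_worst(df, n=5):
    """Top and bottom n rows by SORT_COL, restricted to COLS."""
    best = df.nlargest(n, SORT_COL)[COLS].reset_index(drop=True)
    worst = df.nsmallest(n, SORT_COL)[COLS].reset_index(drop=True)
    return best, worst


def rank_funds(frames, n=5):
    """Map each fund id (first cell of its frame) to its (best, worst) pair."""
    return {str(df.iloc[0, 0]): best_and_worst(df, n) for df in frames}


# Usage with the question's files (not run here):
# files = ['59107_20210630m.csv', '65758_20210630m.csv', '26389_20210630m.csv']
# results = rank_funds(load_funds(path, files))
# from IPython.display import display
# for fund, (best, worst) in results.items():
#     display(best.style.set_caption(fund + " Best_Contributors"))
#     display(worst.style.set_caption(fund + " Worst_Contributors"))
```

In a notebook, a `Styler` is only rendered automatically when it is the last expression of a cell, which is why the loop passes each table to `IPython.display.display`.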

Text Extraction from Images

I extracted text from an image, but the result is unstructured. I need to convert it into a structured form but haven't managed to do so.
The unstructured data extracted from image in python:
EQUITY-LARGE CAP ©# SBIMUTUAL FUND
A’ A PARTNER FOR LIFE
LSS LAST DIVIDENDS Ct EV a A)
i Option NAV #) Record Date Dividend (in /Unit) NAV (#)
BLUE CH | Pp FU N D Reg-Plan-Growth 34.9294 23-Sep-16 (Reg Plan) 1.00 18.5964
—————— a 23-Sep-16 (Dir Plan) 1.20 21.8569
= Reg-Plan-Dividend 19.8776 9 =
An Open-ended Growth Scheme = -Reg-Plan-Dividend 188776 TT a5 Reg Plan) 2.50 17.6880
Dir-Plan-Dividend 23.5613 17-Jul-15 (Dir Plan) 2.90 20.5395
. . ir a 21- Mar-14 (Reg Plan) 1.80 12.7618
Investment Objective Dir-Plan-Growth 36.2961
a. . a. Pursuant to payment of dividend, the NAV of Dividend Option of
To provide investors with opportunities scheme/plans would fall to the extent of payout and statutory levy, if
for long-term growth in capital through applicable.
anactive management of investments ina
diversified basket of equity stocks of
companies whose market capitalization
is at least equal to or more than the least PORTFOLIO
market capitalized stock of S&P BSE 100
face Stock Name (%) Of Total AUM Stock Name (%) Of Total AUM
. HDFC Bank Ltd. 8.29 Apollo Hospitals Enterprises Ltd. 1.04
Fund Details Larsen & Toubro Ltd. 4.46 Tata Motors Ltd. (Dvr-A-Ordy) 0.85
ITC Ltd. 4.07 Eicher Motors Ltd. 0.84
+ Type of Scheme UPL Ltd. 2.95 Shriram City Union Finance Ltd. 0.79
An Open - Ended Growth Scheme Infosys Ltd. 2.93 Divi's Laboratories Ltd. 0.73
Mahindra & Mahindra Ltd. 2.92 Pidilite Industries Ltd. 0.62
+ Date of Allotment: 14/02/2006 Nestle India Ltd. 2.90 Fag Bearings India Ltd. 0.62
. . Reliance Industries Ltd. 2.86 Sadbhav Engineering Ltd. 0.61
Reno AS ono /OG/2007 Indusind Bank Ltd. 2.68 Grasim Industries Ltd. 0.60
+ AAUM for the Month of June 2017 State Bank Of India 2.63 Petronet LNG Ltd. 0.60
214,204.29¢ Kotak Mahindra Bank Ltd. 2.57 Hudco Ltd. 0.58
, rores HCL Technologies Ltd. 2.50 Torrent Pharmaceuticals Ltd. 0.55
+» AUMas on June 30, 2017 Bharat Electronics Ltd. 2.48 Thermax Ltd. 0.52
% 14,292.59 Crores Cholamandalam Investment And Dr. Lal Path Labs Ltd. 0.49
: — - Finance Company Ltd. 2.36 Coal India Ltd. 0.44
+ Fund Manager: Ms. Sohini Andani Hero Motocorp Ltd. 2.16 Narayana Hrudayalaya Ltd. 0.41
Managing Since: Sep-2010 Hindustan Petroleum Corporation Ltd. 2.11 Britannia Industries Ltd. 0.40
i . Motherson Sumi Systems Ltd. 1.98 Tata Steel Ltd. 0.38
Total Experience: Over 22 years Maruti Suzuki India Ltd. 1.90 Procter & Gamble Hygiene And
+ Benchmark: S&P BSE 100 Index ICICI Bank Ltd. 1.88 Health Care Ltd. 0.38
— Sun Pharmaceuticals Industries Ltd. 1.66 SKF India Ltd. 0.35
+ Exit Load: HDFC Ltd. 1.66 ff Tata Motors Ltd. 0.26
For exit within 1 year from the date of Strides Shasun Ltd. 1.59 Equity Shares Total 90.22
allotment - 1%; For exit after 1 year Titan Company Ltd. 1.58 Motilal Oswal Securities Ltd
fi he d f n il Hindalco Industries Ltd. 1.57 CP Mat 28.07.2017. 0.42
rom the date of allotment - Ni Ultratech Cement Ltd. 1.52 [| Commercial Paper Total 0.42
+ Entry Load: N.A. Voltas Ltd. 1.48 HDFC Bank Ltd. 0.14
- - Mahindra & Mahindra Financial Services Ltd. 1.42 Fixed Deposits Total 0.14
+ Plans Available: Regular, Direct The Ramco Cements Ltd. 1.41 CBLO 8.24
. a ao PI Industries Ltd. 1.40 Cash & Other Receivables (4.29)
Options: Growth, Dividend Aurobindo Pharma Ltd. 1.39 Futures 4.72
+ SIP Indian Oil Corporation Ltd. 1.36 HDFC Ltd. 0.56
Weekly - Minimum & 1000 & in multiples The Federal Bank Ltd. 1.22 Warrants Total 0.56
LIC Housing Finance Ltd. 1.18 Grand Total 100.00
of = 1 thereafter for a minimum of 6 Shriram Transport Finance Company Ltd. 1.10
instalments.
Monthly - Minimum = 1000 & in
Eee ee aC PORTFOLIO CLASSIFICATION BY PORTFOLIO CLASSIFICATION BY
See ee eae Oe INDUSTRY ALLOCATION (%) ASSET ALLOCATION (%)
multiples of = 1 thereafter for minimum
one year. Financial Services 29.34
Quarterly - Minimum % 1500 & in Automobile 10.90 s.o6 172
multiples of = 1 thereafter for minimum ronsumer Goods 03
nergy :
one WEEN Construction 6.54 18.66
+ Minimum Investment Pharma 5.93 *
= 5000 & in multiples of = 1 IT 5.43
resi Fertilisers & Pesticides 4.35
. Additional Investment Industrial Manufacturing 3.97
< HOO © tho coawlittas Gtr Cement & Cement Products 3.53
Metals 2.39 71.55
Quantitative Data Healthcare Services 1.93
Chemicals 0.62
Standard Deviation® 112.21% Cash & Other Recivables -4.29 L c = Mia
mLarge Cap jidcap
Beta* :0.86 Futures 4.72
ae cBLO 8.24
Sharpe Ratio’ 0.76 Fixed Deposits 0.14 m Cash & Other Current Assets Futures
Portfolio Turnover* 11.03
*Source: CRISIL Fund Analyser Riskometor SBI Blue Chip Fund
“Portfolio Turnover = lower of total sale or one] > This product is suitable for investors who are seeking:
total purchase for the last 12 months L\E * Long term capital appreciation,
Fe on C aL a GCM cL OT LT Ss BAA Z*3\ * Investment in equity shares of companies whose market capitalization is at least equal to or more
Risk Free rate: FBIL Overnight Mibor rate Inve EE sical than the least market capitalized stock of S&P BSE 100 index to provide long term capital growth
(6.25% as on 30th June 2017) Basis for will best Moderately Highrisk | OPPOrtunities.
Ratio Calculation: eavcarsiMonthiy{Data ‘Alnvestors should consult their financial advisers if in doubt about whether the product is suitable for them.
Please help me convert this unstructured data into structured data. Can anyone suggest a library or function?
You need some delimiters to split on, for example:
import re
text = inp_text.split(".\n")  # split wherever a full stop is followed by a newline
text = re.split(r'\s{4,}', inp_text)  # split wherever there are at least 4 consecutive whitespace characters
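Splitting only gives you pieces of text; turning a region such as the PORTFOLIO block into a table still needs a pattern for each record. A minimal sketch for lines that alternate stock names and percentages, e.g. `ITC Ltd. 4.07 Eicher Motors Ltd. 0.84`. The regular expression is a heuristic assumption tuned to those lines: it will mishandle names that contain digits and will absorb OCR noise that precedes a name (e.g. `Fund Details Larsen & Toubro Ltd.`), so the result needs checking. `parse_holdings` is a hypothetical helper.

```python
import re

import pandas as pd

# Heuristic: a name that starts with a capital letter and contains no digits,
# followed by whitespace and a percentage with two decimals
PAIR = re.compile(r"(?P<name>[A-Z][^\d]*?)\s+(?P<pct>\d+\.\d{2})\b")


def parse_holdings(lines):
    """Collect every (name, percentage) pair found in the given OCR lines."""
    rows = []
    for line in lines:
        for m in PAIR.finditer(line):
            rows.append((m.group("name").strip(), float(m.group("pct"))))
    return pd.DataFrame(rows, columns=["Stock Name", "% of Total AUM"])


sample = [
    "ITC Ltd. 4.07 Eicher Motors Ltd. 0.84",
    "Mahindra & Mahindra Ltd. 2.92 Pidilite Industries Ltd. 0.62",
]
holdings = parse_holdings(sample)
print(holdings)
```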

Pandas add value to inner level of hierarchical index

I have a Pandas DataFrame with a hierarchical index (MultiIndex). I created this DataFrame by grouping values for "cousub" and "year".
annualMed = df.groupby(["cousub", "year"])[["ratio", "sr_val_transfer"]].median().round(2)
print(annualMed.head(8))
ratio sr_val_transfer
cousub year
Allen Park city 2013 0.51 75000.0
2014 0.47 85950.0
2015 0.47 95030.0
2016 0.45 102500.0
Belleville city 2013 0.49 113900.0
2014 0.55 114750.0
2015 0.53 149000.0
2016 0.48 121500.0
I would like to add an "Overall" value in the "year" level that I could then populate with values based on a grouping by "cousub" alone, i.e., excluding "year". I would like the result to look like this:
ratio sr_val_transfer
cousub year
Allen Park city 2013 0.51 75000.0
2014 0.47 85950.0
2015 0.47 95030.0
2016 0.45 102500.0
Overall 0.50 90000.0
Belleville city 2013 0.49 113900.0
2014 0.55 114750.0
2015 0.53 149000.0
2016 0.48 121500.0
Overall 0.50 135000.0
How can I add this new item to the "year" level of the MultiIndex?
If you just want to add these two rows explicitly, you can specify all the MultiIndex levels with loc.
df.loc[('Allen Park city', 'Overall'), :] = (0.50, 90000.)
df.loc[('Belleville city', 'Overall'), :] = (0.50, 135000.)
If you have a whole list of cities to add this row for, however, this gets tedious. Instead, you could add another DataFrame holding the overall values, with a bit of index manipulation.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
overall = pd.DataFrame([['Allen Park city', 'Overall', 0.5, 90000.],
                        ['Belleville city', 'Overall', 0.5, 135000.]],
                       columns=list(df.index.names) + list(df.columns))
(pd.concat([df.reset_index(), overall])
   .set_index(list(df.index.names))
   .sort_index())
Demo
Method 1 (smaller case)
>>> df.loc[('Allen Park city', 'Overall'), :] = (0.50, 90000.)
>>> df.loc[('Belleville city', 'Overall'), :] = (0.50, 135000.)
>>> df.sort_index()
ratio sr_val_transfer
cousub year
Allen Park city 2013 0.51 75000.0
2014 0.47 85950.0
2015 0.47 95030.0
2016 0.45 102500.0
Overall 0.50 90000.0
Belleville city 2013 0.49 113900.0
2014 0.55 114750.0
2015 0.53 149000.0
2016 0.48 121500.0
Overall 0.50 135000.0
Method 2 (larger case)
>>> overall = pd.DataFrame([['Allen Park city', 'Overall', 0.5, 90000.],
...                         ['Belleville city', 'Overall', 0.5, 135000.]],
...                        columns=list(df.index.names) + list(df.columns))
>>> (pd.concat([df.reset_index(), overall])
...    .set_index(list(df.index.names))
...    .sort_index())
ratio sr_val_transfer
cousub year
Allen Park city 2013 0.51 75000.0
2014 0.47 85950.0
2015 0.47 95030.0
2016 0.45 102500.0
Overall 0.50 90000.0
Belleville city 2013 0.49 113900.0
2014 0.55 114750.0
2015 0.53 149000.0
2016 0.48 121500.0
Overall 0.50 135000.0
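The answer hard-codes the overall values. To compute them from a grouping on `cousub` alone, as the question asks, one option is to aggregate twice and concatenate. A minimal sketch on hypothetical raw rows, assuming the overall figure is the median over all of a `cousub`'s rows; `year` is converted to strings first so the level holds a single type and `'Overall'` sorts after the years (digits sort before capital letters).

```python
import pandas as pd

# Hypothetical raw rows using the question's column names
raw = pd.DataFrame({
    'cousub': ['Allen Park city'] * 4 + ['Belleville city'] * 4,
    'year': [2013, 2013, 2014, 2014] * 2,
    'ratio': [0.50, 0.52, 0.46, 0.48, 0.49, 0.49, 0.55, 0.55],
    'sr_val_transfer': [70000.0, 80000.0, 85000.0, 86900.0,
                        113900.0, 113900.0, 114000.0, 115500.0],
})
cols = ['ratio', 'sr_val_transfer']
raw['year'] = raw['year'].astype(str)

# Medians per (cousub, year), as in the question
annual = raw.groupby(['cousub', 'year'])[cols].median()

# Medians per cousub over all years, labelled 'Overall' in the year level
overall = raw.groupby('cousub')[cols].median()
overall['year'] = 'Overall'
overall = overall.set_index('year', append=True)

result = pd.concat([annual, overall]).sort_index()
print(result.round(2))
```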

Pandas drop unique row in order to use groupby and qcut

How do I drop rows whose value in the industry column is unique (occurs only once)? They interfere with groupby and qcut.
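# psql.read_frame and DataFrame.sort no longer exist in pandas; read_sql and sort_values replace them
df0 = pd.read_sql(sql_query, conn)
df = df0.sort_values(['industry', 'C'], ascending=[False, True])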
Here is my dataframe:
id industry C
5 28 other industry 0.22
9 32 Specialty Eateries 0.60
10 33 Restaurants 0.84
1 22 Processed & Packaged Goods 0.07
0 21 Processed & Packaged Goods 0.14
8 31 Processed & Packaged Goods 0.43
11 34 Major Integrated Oil & Gas 0.07
14 37 Major Integrated Oil & Gas 0.50
15 38 Independent Oil & Gas 0.06
18 41 Independent Oil & Gas 0.06
19 42 Independent Oil & Gas 0.13
12 35 Independent Oil & Gas 0.43
16 39 Independent Oil & Gas 0.65
17 40 Independent Oil & Gas 0.91
13 36 Independent Oil & Gas 2.25
2 25 Food - Major Diversified 0.35
3 26 Beverages - Soft Drinks 0.54
4 27 Beverages - Soft Drinks 0.73
6 29 Beverages - Brewers 0.19
7 30 Beverages - Brewers 0.21
I used the following groupby/qcut code to rank column 'C', but it sadly blew up on me:
df['rank'] = df.groupby(['industry'])['C'].transform(lambda x: pd.qcut(x,5, labels=range(1,6)))
After some research, I found that qcut throws errors because some values in the industry column are unique (reference to the error, and another reference).
I would still prefer to rank without throwing out the unique rows (they should be assigned the value 1), if that is possible. But after many attempts, I am convinced that qcut can't handle them, so I am willing to settle for dropping the unique rows to keep qcut happy.
But if there is another way, I'd be very curious to know. I really appreciate your help.
In case anyone still wants to do this: you can keep only the duplicated rows, i.e. those whose industry occurs more than once:
df = df[df['industry'].duplicated(keep=False)]
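The question also asks whether singleton industries could simply get rank 1 instead of being dropped. A minimal sketch of that alternative (not the answer's approach): `quintile_rank` is a hypothetical helper that returns 1 for single-row groups and otherwise runs `qcut` on `rank(method='first')`. Ranking first gives every row a distinct value, which also guards against qcut's "Bin edges must be unique" error when many values in a group tie, at the cost of possibly putting tied values (such as the two 0.06 rows) in different quantiles.

```python
import pandas as pd


def quintile_rank(s, q=5):
    """Quantile label 1..q within one group; a group with a single row gets 1."""
    if len(s) == 1:
        return pd.Series(1, index=s.index)
    # rank(method='first') yields distinct values, so the bin edges are distinct
    ranks = s.rank(method='first')
    return pd.qcut(ranks, q, labels=list(range(1, q + 1))).astype(int)


# A few rows from the question's table
df = pd.DataFrame({
    'industry': ['other industry', 'Beverages - Brewers', 'Beverages - Brewers']
                + ['Independent Oil & Gas'] * 7,
    'C': [0.22, 0.19, 0.21, 0.06, 0.06, 0.13, 0.43, 0.65, 0.91, 2.25],
})
df['rank'] = df.groupby('industry')['C'].transform(quintile_rank)
print(df)
```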
