How to scrape Wikipedia tables with Python

I want to extract a table; the URL is https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia
My code is not returning any data. How can I get it?
Code:
import requests
from bs4 import BeautifulSoup as bs
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(url).text
soup = bs(html, 'html.parser')
ta=soup.find_all('table',class_="wikitable sortable jquery-tablesorter")
print(ta)

If I'm pulling a table and see <table> tags, I would always try pandas .read_html() first. It does the row iteration for you. Most of the time you get exactly what you need, or at worst have to do some minor manipulation of the DataFrame. In this case, it gives you the full table nicely:
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
table = pd.read_html(url)[1]
Output:
print (table.to_string())
0 1 2 3 4 5
0 Name Industry Sector Headquarters Founded Notes
1 Airfast Indonesia Consumer services Airlines Tangerang 1971 Private airline
2 Angkasa Pura Industrials Transportation services Jakarta 1962 State-owned airports
3 Astra International Conglomerates - Jakarta 1957 Automotive, financials, industrials, technology
4 Bank Central Asia Financials Banks Jakarta 1957 Bank
5 Bank Danamon Financials Banks Jakarta 1956 Bank
6 Bank Mandiri Financials Banks Jakarta 1998 Bank
7 Bank Negara Indonesia Financials Banks Jakarta 1946 Bank
8 Bank Rakyat Indonesia Financials Banks Jakarta 1895 Micro-finance bank
9 Bumi Resources Basic materials General mining Jakarta 1973 Mining
10 Djarum Consumer goods Tobacco Kudus and Jakarta 1951 Tobacco
11 Dragon Computer & Communication Technology Computer hardware Jakarta 1980 Computer hardware
12 Elex Media Komputindo Consumer services Publishing Jakarta 1985 Publisher
13 Femina Consumer services Media Jakarta 1972 Weekly magazine
14 Garuda Indonesia Consumer services Travel & leisure Tangerang 1949 State-owned airline
15 Gudang Garam Consumer goods Tobacco Kediri 1958 Tobacco
16 Gunung Agung Consumer services Specialty retailers Jakarta 1953 Bookstores
17 Indocement Tunggal Prakarsa Industrials Building materials & fixtures Jakarta 1985 Cement, part of HeidelbergCement (Germany)
18 Indofood Consumer goods Food products Jakarta 1968 Food production
19 Indonesian Aerospace Industrials Aerospace Bandung 1976 State-owned aircraft design
20 Indonesian Bureau of Logistics Consumer goods Food products Jakarta 1967 Food distribution
21 Indosat Telecommunications Fixed line telecommunications Jakarta 1967 Telecommunications network
22 Infomedia Nusantara Consumer services Publishing Jakarta 1975 Directory publisher
23 Jalur Nugraha Ekakurir (JNE) Industrials Delivery services Jakarta 1990 Express logistics
24 Kalbe Farma Health care Pharmaceuticals Jakarta 1966 Pharmaceuticals
25 Kereta Api Indonesia Industrials Railroads Bandung 1945 State-owned railway
26 Kimia Farma Health care Pharmaceuticals Jakarta 1971 State-owned pharma
27 Kompas Gramedia Group Consumer services Media agencies Jakarta 1965 Media holding
28 Krakatau Steel Basic materials Iron & steel Cilegon 1970 State-owned steel
29 Lion Air Consumer services Airlines Jakarta 2000 Low-cost airline
30 Lippo Group Financials Real estate holding & development Jakarta 1950 Development
31 Matahari Consumer services Broadline retailers Tangerang 1982 Department stores
32 MedcoEnergi Oil & gas Exploration & production Jakarta 1980 Energy, oil and gas
33 Media Nusantara Citra Consumer services Broadcasting & entertainment Jakarta 1997 Media
34 Panin Sekuritas Financials Investment services Jakarta 1989 Broker
35 Pegadaian Financials Consumer finance Jakarta 1901 State-owned financial services
36 Pelni Industrials Marine transportation Jakarta 1952 Shipping
37 Pos Indonesia Industrials Delivery services Bandung 1995 State-owned postal service
38 Pertamina Oil & gas Integrated oil & gas Jakarta 1957 State-owned oil and natural gas
39 Perusahaan Gas Negara Oil & gas Exploration & production Jakarta 1965 Gas
40 Perusahaan Gas Negara Utilities Gas distribution Jakarta 1965 State-owned natural gas transportation
41 Perusahaan Listrik Negara Utilities Conventional electricity Jakarta 1945 State-owned electrical distribution
42 Phillip Securities Indonesia, PT Financials Investment services Jakarta 1989 Financial services
43 Pindad Industrials Defense Bandung 1808 State-owned defense
44 PT Lapindo Brantas Oil & gas Exploration & production Jakarta 1996 Oil and gas
45 PT Metro Supermarket Realty Tbk Consumer services Food retailers & wholesalers Jakarta 1955 Supermarkets
46 Salim Group Conglomerates - Jakarta 1972 Industrials, financials, consumer goods
47 Sampoerna Consumer goods Tobacco Surabaya 1913 Tobacco
48 Semen Indonesia Industrials Building materials & fixtures Gresik 1957 Cement
49 Susi Air Consumer services Airlines Pangandaran 2004 Charter airline
50 Telkom Indonesia Telecommunications Fixed line telecommunications Bandung 1856 Telecommunication services
51 Telkomsel Telecommunications Mobile telecommunications Jakarta 1995 Mobile network, part of Telkom Indonesia
52 Trans Corp Conglomerates - Jakarta 2006 Media, consumer services, real estate, part of...
53 Unilever Indonesia Consumer goods Personal products Jakarta 1933 Personal care products, part of Unilever (Neth...
54 United Tractors Industrials Commercial vehicles & trucks Jakarta 1972 Heavy equipment
55 Waskita Industrials Heavy construction Jakarta 1961 State-owned construction
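The mechanics are easy to check offline. Here is a minimal, self-contained sketch; the HTML snippet is a made-up stand-in for the Wikipedia page, not its real markup:

```python
import pandas as pd
from io import StringIO

# read_html parses every <table> it finds and returns a list of DataFrames;
# <th> cells in the first row become the column header automatically.
html = StringIO("""
<table class="wikitable sortable">
  <tr><th>Name</th><th>Founded</th></tr>
  <tr><td>Airfast Indonesia</td><td>1971</td></tr>
  <tr><td>Angkasa Pura</td><td>1962</td></tr>
</table>
""")
df = pd.read_html(html)[0]
print(df)
```

Passing a file-like object (or a URL) is the supported input; recent pandas warns when you pass a literal HTML string directly, so wrapping it in StringIO is the safe habit.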

import requests
from bs4 import BeautifulSoup as bs
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(url).text
soup = bs(html, 'html.parser')
ta = soup.find_all('table', {'class': 'wikitable'})
print(ta)
You can search for a table by class name the old way, passing an attrs dict. It still works.

Fixes:
Use a consistent variable name for the URL (URL vs. url)
Use the class "wikitable"; the "jquery-tablesorter" class is added in the browser by JavaScript, so it never appears in the HTML that requests downloads
Optimized your code a little
Hence:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia")
soup = BeautifulSoup(page.content, 'html.parser')
ta = soup.find_all('table',class_="wikitable")
print(ta)
OUTPUT:
[<table class="wikitable sortable">
<tbody><tr>
<th>Rank
</th>
<th>Image
</th>
<th>Name
</th>
<th>2016 Revenues (USD $M)
</th>
<th>Employees
</th>
<th>Notes
.
.
.
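Once find_all hands you the tables, turning one into records is only a few lines. A minimal offline sketch (the HTML here is a hypothetical stand-in for one wikitable, not the page's real markup):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for one "wikitable" from the page.
html = """
<table class="wikitable sortable">
  <tr><th>Name</th><th>Founded</th></tr>
  <tr><td>Airfast Indonesia</td><td>1971</td></tr>
  <tr><td>Angkasa Pura</td><td>1962</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")
headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = []
for tr in table.find_all("tr")[1:]:          # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append(dict(zip(headers, cells)))
print(rows)
```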

Maybe it's not exactly what you are looking for, but you can try this:
import requests
from bs4 import BeautifulSoup as bs
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(url).text
soup = bs(html, 'html.parser')
for data in soup.find_all('table', {"class": "wikitable"}):
    for td in data.find_all('td'):
        for link in td.find_all('a'):
            print(link.text)
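If you also want the link targets and not just the anchor text, a["href"] gives the (usually relative) URL, which urljoin can resolve against the page. A hypothetical offline sketch:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

base = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
# Made-up fragment standing in for one table cell on the page.
html = '<table class="wikitable"><tr><td><a href="/wiki/Garuda_Indonesia">Garuda Indonesia</a></td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
links = [(a.get_text(strip=True), urljoin(base, a["href"]))
         for a in soup.select("table.wikitable td a[href]")]
print(links)
```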

Try the below:
import requests
from bs4 import BeautifulSoup as bs
URL = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(URL).text
soup = bs(html, 'html.parser')
ta = soup.find("table", {"class": "wikitable sortable"})
print(ta)
To get all the tables:
ta = soup.find_all("table", {"class": "wikitable sortable"})
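The two approaches also combine: once find/find_all has isolated the table you want, you can hand its string form to pandas for parsing. A small offline sketch with hypothetical markup in place of the fetched page:

```python
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page.
html = """<table class="wikitable sortable">
<tr><th>Name</th><th>Founded</th></tr>
<tr><td>Pertamina</td><td>1957</td></tr>
</table>"""
soup = BeautifulSoup(html, "html.parser")
ta = soup.find("table", {"class": "wikitable sortable"})
# Parse just this table into a DataFrame (StringIO avoids the
# literal-string deprecation warning in recent pandas).
df = pd.read_html(StringIO(str(ta)))[0]
print(df)
```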

If you want to parse table data, pandas does it cleanly and is very efficient when you also want to manipulate the result; you can navigate the table as a pandas DataFrame:
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
table = pd.read_html(url,header=0)
print(table[1])

Related

Getting 500 Response on Python Requests

import requests

url = "https://uk.eu-supply.com/ctm/supplier/publictenders?B=BLUELIGHT"
payload = {
    'Branding': 'BLUELIGHT',
    'SearchFilter.BrandingCode': 'BLUELIGHT',
    'CpvContainer.CpvIds': '',
    'CpvContainer.CpvIds': '',
    'SearchFilter.PagingInfo.PageNumber': '2',
    'SearchFilter.PagingInfo.PageSize': '25'
}
files = []
headers = {
    'Cookie': 'EUSSESSION=e3cc7bc4-ea51-4c4b-99f7-2ebde589c8e0'
}
response = requests.request("POST", url, headers=headers, data=payload, files=files)
print(response.status_code)
I am trying to send a request to this site; the same payload and URL work in Postman. Notice the two identical fields in the payload ("CpvContainer.CpvIds"); this is required: the field must be passed twice to get a 200 response. But a dict can only hold each key once, so I am getting a 500.
Make the repeated key's value a list:
payload = {
    ...
    'CpvContainer.CpvIds': ['', ''],
    ...
}
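Under the hood, requests form-encodes a list value into one key=value pair per element, the same thing urlencode(..., doseq=True) does, which you can verify offline:

```python
from urllib.parse import urlencode

payload = {
    'Branding': 'BLUELIGHT',
    'CpvContainer.CpvIds': ['', ''],  # the repeated field
}
# doseq=True expands list values into repeated keys in the encoded body.
body = urlencode(payload, doseq=True)
print(body)
```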
To read a table from the page, a simple pd.read_html is enough:
import pandas as pd
url = "https://uk.eu-supply.com/ctm/supplier/publictenders?B=BLUELIGHT"
df = pd.read_html(url)[0]
print(df)
Prints:
Quote/tender Id Reference Name Date of publication Response deadline (UK time) Process Buyers Countries
0 45024 7F- 2020-0278 Kennelling of Seized & Dangerous Dogs (Kent Police and Essex Police) 25/06/2021 26/07/2021 12:00 Above Threshold OPEN Procedure 7 Forces Procurement United Kingdom
1 45074 CONT0056 Leaflets for Crime Commissioner 25/06/2021 05/07/2021 12:00 Tender/Quotation (below Threshold value) Leicestershire Police United Kingdom
2 45072 7F-2021-P038 Bury St Edmunds Police Station - Car Park Extension 25/06/2021 26/07/2021 12:00 Formal Tender/Quotation (Advertised) (over £50,000) 7 Forces Procurement United Kingdom
3 44234 7F - 2020 - 0387 Fire Doors Inspection and Maintenance PMA 24/06/2021 26/07/2021 10:00 Above Threshold OPEN Procedure 7 Forces Procurement United Kingdom
4 45068 7F-2021-P040 Armoury Improvements- Bury St Edmunds Police Station 24/06/2021 26/07/2021 12:00 Formal Tender/Quotation (Advertised) (over £50,000) 7 Forces Procurement United Kingdom
5 45042 BLC-Aviation-003 Provision of Police Aviation Services including a Fleet Replacement Programme 23/06/2021 06/07/2021 12:00 Prior Information Notice (PIN) (Standalone) BlueLight Commercial United Kingdom
6 42896 DP0570 Reactive Mechanical Services and Small Works Projects 23/06/2021 03/08/2021 12:00 Above Threshold OPEN Procedure The Police and Crime Commissioner for Derbyshire United Kingdom
7 45043 BLC-Aviation-003 Market Engagement Event for the Provision of Police Aviation Services including a Fleet Replacement Programme 23/06/2021 06/07/2021 12:00 Request for Information /Market Consultation BlueLight Commercial United Kingdom
8 44914 20-21 EM15 Asbestos Management Services Contract 22/06/2021 20/07/2021 12:00 Formal Tender/Quotation (Advertised) ( up to £50,000) Tyne and Wear Fire and Rescue Service (TWFRS) United Kingdom
9 44793 1363-2016 Recovery of loose, stray and abandoned horses 21/06/2021 21/07/2021 14:00 Prior Information Notice (PIN) (Standalone) West Yorkshire Combined Authority United Kingdom
10 44734 7F 2020 0339 Bodyshop Replacement and Repairs 21/06/2021 23/07/2021 12:00 Above Threshold OPEN Procedure 7 Forces Procurement United Kingdom
11 44984 NaN Integrated Communications Control System (ICCS) for Northamptonshire Police 18/06/2021 30/06/2021 17:00 Prior Information Notice (PIN) (Standalone) MINT Commercial Services LLP United Kingdom
12 44983 WMP0126 Garage Equipment - Maintenance, Calibration and Repair 18/06/2021 19/07/2021 12:00 Tender/Quotation (below Threshold value) West Midlands Police United Kingdom
13 44962 DHR Chair & Author Domestic Homicide Review Chair & Author 17/06/2021 05/07/2021 12:00 Prior Information Notice (PIN) (Standalone) 7 Forces Procurement United Kingdom
14 44950 1197 Access Control/Automated Security Gate Maintenance and Related Services 17/06/2021 23/07/2021 14:00 Formal Tender/Quotation (Advertised) (over £50,000) Royal Berkshire Fire Authority United Kingdom
15 44959 627_RFI Evidence Based Practice Partnership Board 17/06/2021 16/07/2021 17:00 Request for Information /Market Consultation Chief Constable for Devon and Cornwall Police United Kingdom
16 44944 W344 Repair & Maintenance of UPS Systems 16/06/2021 13/07/2021 12:00 Tender/Quotation (below Threshold value) Northumbria Police United Kingdom
17 44942 7F-2021-C058 Specialist Accredited Fire Investigation 16/06/2021 30/06/2021 12:00 Request for Information /Market Consultation 7 Forces Procurement United Kingdom
18 44916 WYP/CS/019/JB Fire Risk Assessments 16/06/2021 30/06/2021 14:00 Formal Tender/Quotation (Advertised) ( up to £50,000) West Yorkshire Combined Authority United Kingdom
19 44594 7F-2020-0273 Norfolk Integrated Domestic Abuse Service 15/06/2021 20/07/2021 12:00 Above Threshold OPEN Procedure 7 Forces Procurement United Kingdom
20 44911 SPCC Sussex PCC RfQ - An evaluation for the domestic abuse perpetrator programme 15/06/2021 30/06/2021 12:00 Formal Tender/Quotation (Advertised) ( up to £50,000) Office of the Sussex Police & Crime Commissioner United Kingdom
21 44543 2420-2021 Force Medical Advisor (FMA) / Selected Medical Practitioner (SMP) and other Occupational Health Services 15/06/2021 14/07/2021 13:00 Above Threshold OPEN Procedure West Yorkshire Combined Authority United Kingdom
22 44845 C003196 THERAPEUTIC INTERVENTIONS AND TRAINING PROVISION FOR WMFRA 15/06/2021 06/07/2021 16:00 Formal Tender/Quotation (Advertised) (over £30,000) West Midlands Fire and Rescue Authority United Kingdom
23 44544 ESFA0199 CMI Leadership and Management Online Training Courses 14/06/2021 09/07/2021 12:00 Formal Tender/Quotation (Advertised) (over £25,000) East Sussex Fire Authority United Kingdom
24 44714 590 Regional Grounds Maintenance and Winter Salting Service 11/06/2021 15/07/2021 12:00 Above Threshold OPEN Procedure Chief Constable for Devon and Cornwall Police United Kingdom
For the above API, send the payload as raw text. I cross-checked that the code below fetches the data successfully:
import requests

url = "https://uk.eu-supply.com/ctm/Supplier/publictenders/PublicTenders"
payload = "Branding=BLUELIGHT&SearchFilter.BrandingCode=BLUELIGHT&CpvContainer.CpvIds=&CpvContainer.CpvIds=&SearchFilter.PagingInfo.PageNumber=2&SearchFilter.PagingInfo.PageSize=25"
headers = {
    'Connection': 'keep-alive',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'Accept': '*/*',
    'X-Requested-With': 'XMLHttpRequest',
    'sec-ch-ua-mobile': '?0',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Origin': 'https://uk.eu-supply.com',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://uk.eu-supply.com/ctm/supplier/publictenders?B=BLUELIGHT',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cookie': 'EUSSESSION=b18ce90d-32b9-429c-8d74-7151a997e8cd'
}
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
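An alternative to hand-building the raw string: requests also accepts data as a list of (key, value) tuples, which preserves repeated keys. A sketch that builds the request body without sending anything (the URL is just a placeholder):

```python
import requests

payload = [
    ('Branding', 'BLUELIGHT'),
    ('CpvContainer.CpvIds', ''),
    ('CpvContainer.CpvIds', ''),   # repeated key kept intact
    ('SearchFilter.PagingInfo.PageNumber', '2'),
]
# .prepare() builds the encoded body with no network traffic.
req = requests.Request("POST", "https://example.com/endpoint", data=payload).prepare()
print(req.body)
```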

How to get a list of tickers in Jupyter Notebook?

Write code to get a list of tickers for all S&P 500 stocks from Wikipedia. As of 2/24/2021, there are 505 tickers in that list. You can use any method you want as long as the code actually queries the following website to get the list:
https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
One way would be to use the requests module to get the HTML code and then use the re module to extract the tickers. Another option would be the .read_html function in pandas and then export the tickers column to a list.
You should save the tickers in a list with the name sp500_tickers
This will grab the data in the table named 'constituents'.
# find a specific table by table count
import pandas as pd
import requests
from bs4 import BeautifulSoup
res = requests.get("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
print(df[0].to_json(orient='records'))
Result:
[{"Symbol":"MMM","Security":"3M Company","SEC filings":"reports","GICS Sector":"Industrials","GICS Sub-Industry":"Industrial Conglomerates","Headquarters Location":"St. Paul, Minnesota","Date first added":"1976-08-09","CIK":66740,"Founded":"1902"},{"Symbol":"ABT","Security":"Abbott Laboratories","SEC filings":"reports","GICS Sector":"Health Care","GICS Sub-Industry":"Health Care Equipment","Headquarters Location":"North Chicago, Illinois","Date first added":"1964-03-31","CIK":1800,"Founded":"1888"},{"Symbol":"ABBV","Security":"AbbVie Inc.","SEC filings":"reports","GICS Sector":"Health Care","GICS Sub-Industry":"Pharmaceuticals","Headquarters Location":"North Chicago, Illinois","Date first added":"2012-12-31","CIK":1551152,"Founded":"2013 (1888)"},{"Symbol":"ABMD","Security":"Abiomed","SEC filings":"reports","GICS Sector":"Health Care","GICS Sub-Industry":"Health Care Equipment","Headquarters Location":"Danvers, Massachusetts","Date first added":"2018-05-31","CIK":815094,"Founded":"1981"},{"Symbol":"ACN","Security":"Accenture","SEC filings":"reports","GICS Sector":"Information Technology","GICS Sub-Industry":"IT Consulting & Other Services","Headquarters Location":"Dublin, Ireland","Date first added":"2011-07-06","CIK":1467373,"Founded":"1989"},{"Symbol":"ATVI","Security":"Activision Blizzard","SEC filings":"reports","GICS Sector":"Communication Services","GICS Sub-Industry":"Interactive Home Entertainment","Headquarters Location":"Santa Monica, California","Date first added":"2015-08-31","CIK":718877,"Founded":"2008"},{"Symbol":"ADBE","Security":"Adobe Inc.","SEC filings":"reports","GICS Sector":"Information Technology","GICS Sub-Industry":"Application Software","Headquarters Location":"San Jose, California","Date first added":"1997-05-05","CIK":796343,"Founded":"1982"},
Etc., etc., etc.
That's JSON. If you want a table, kind of like what you would use in Excel, simply print the df.
Result:
[ Symbol Security SEC filings GICS Sector \
0 MMM 3M Company reports Industrials
1 ABT Abbott Laboratories reports Health Care
2 ABBV AbbVie Inc. reports Health Care
3 ABMD Abiomed reports Health Care
4 ACN Accenture reports Information Technology
.. ... ... ... ...
500 YUM Yum! Brands Inc reports Consumer Discretionary
501 ZBRA Zebra Technologies reports Information Technology
502 ZBH Zimmer Biomet reports Health Care
503 ZION Zions Bancorp reports Financials
504 ZTS Zoetis reports Health Care
GICS Sub-Industry Headquarters Location \
0 Industrial Conglomerates St. Paul, Minnesota
1 Health Care Equipment North Chicago, Illinois
2 Pharmaceuticals North Chicago, Illinois
3 Health Care Equipment Danvers, Massachusetts
4 IT Consulting & Other Services Dublin, Ireland
.. ... ...
500 Restaurants Louisville, Kentucky
501 Electronic Equipment & Instruments Lincolnshire, Illinois
502 Health Care Equipment Warsaw, Indiana
503 Regional Banks Salt Lake City, Utah
504 Pharmaceuticals Parsippany, New Jersey
Date first added CIK Founded
0 1976-08-09 66740 1902
1 1964-03-31 1800 1888
2 2012-12-31 1551152 2013 (1888)
3 2018-05-31 815094 1981
4 2011-07-06 1467373 1989
.. ... ... ...
500 1997-10-06 1041061 1997
501 2019-12-23 877212 1969
502 2001-08-07 1136869 1927
503 2001-06-22 109380 1873
504 2013-06-21 1555280 1952
[505 rows x 9 columns]]
Alternatively, you can export the df to a CSV file.
df.to_csv('constituents.csv')
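For the concrete deliverable the exercise names, the ticker list is just the Symbol column of that first table. A sketch with hypothetical rows in place of the scraped data:

```python
import pandas as pd

# Stand-in for the constituents DataFrame scraped above.
df = pd.DataFrame({
    "Symbol": ["MMM", "ABT", "ABBV"],
    "Security": ["3M Company", "Abbott Laboratories", "AbbVie Inc."],
})
# The exercise asks for the tickers in a list named sp500_tickers.
sp500_tickers = df["Symbol"].tolist()
print(sp500_tickers)
```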

Value count combining two columns

import pandas as pd
import seaborn as sns
dfexcel= pd.read_excel('https://raw.githubusercontent.com/ArsenioMGonzalez3/Project3_ABDS/master/Open%20Parking%20and%20Camera%20Violations_OH%20NY_2019_2020%20YTD.xlsx')
dfexcel = dfexcel[['Issuing Agency','State']].sort_values(by = 'Issuing Agency' , ascending=False)
dfexcel
This code lists all the violations issued by each agency for vehicles registered in either NY or OH.
How can I see how many violations each agency issued for each of NY/OH?
For example: the Traffic agency issued 42 for NY and 2 for OH.
You can use groupby then get the size of each group:
dfexcel.groupby(['State','Issuing Agency']).size()
Output:
State Issuing Agency
NY CON RAIL 2
DEPARTMENT OF SANITATION 2457
DEPARTMENT OF TRANSPORTATION 22065
FIRE DEPARTMENT 8
HOUSING AUTHORITY 2
NYC TRANSIT AUTHORITY MANAGERS 6
NYS OFFICE OF MENTAL HEALTH POLICE 2
OTHER/UNKNOWN AGENCIES 239
PARKS DEPARTMENT 54
POLICE DEPARTMENT 10340
PORT AUTHORITY 5
TRAFFIC 26344
OH DEPARTMENT OF SANITATION 7
DEPARTMENT OF TRANSPORTATION 47
PARKS DEPARTMENT 1
POLICE DEPARTMENT 13
TRAFFIC 35
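To see the NY and OH counts side by side per agency, unstack the State level into columns. A sketch with a hypothetical subset of the data:

```python
import pandas as pd

# Hypothetical subset of the violations data.
dfexcel = pd.DataFrame({
    "State": ["NY", "NY", "OH", "NY", "OH"],
    "Issuing Agency": ["TRAFFIC", "TRAFFIC", "TRAFFIC", "POLICE DEPARTMENT", "TRAFFIC"],
})
# unstack pivots State into columns, giving one NY/OH count pair per agency.
counts = dfexcel.groupby(["Issuing Agency", "State"]).size().unstack(fill_value=0)
print(counts)
```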

Scraping an HTML tag on a web page using BeautifulSoup

I was looking to get the data in the stock dropdown. I went into the source and found the tag but I can't get the code to access the data. Can someone please help me fix the bug?
url = "http://www.moneycontrol.com/india/fnoquote/reliance-industries/RI/2020-07-30"
import requests
from bs4 import BeautifulSoup
import pandas as pd
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, "html.parser")
for i in soup.select("stock_id"):
    print(i.text)
You can use the selector #stock_code > option instead of stock_id to get the data in the stock dropdown. Try it:
url ="http://www.moneycontrol.com/india/fnoquote/reliance-industries/RI/2020-07-30"
headers = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}
from bs4 import BeautifulSoup
import requests
r = requests.get(url, headers = headers)
soup = BeautifulSoup(r.content, "html.parser")
a = soup.select("#stock_code > option")
for i in a:
    print(i.text)
Output will be:
ACC
Adani Enterpris
Adani Ports
Adani Power
Ajanta Pharma
Allahabad Bank
Amara Raja Batt
Ambuja Cements
Apollo Hospital
Apollo Tyres
Arvind
Ashok Leyland
Asian Paints
Aurobindo Pharm
Axis Bank
Bajaj Auto
Bajaj Finance
Bajaj Finserv
Balkrishna Ind
Bank of Baroda
Bank of India
Bata India
BEML
Berger Paints
Bharat Elec
Bharat Fin
Bharat Forge
Bharti Airtel
Bharti Infratel
BHEL
Biocon
Bosch
BPCL
Britannia
Cadila Health
Can Fin Homes
Canara Bank
Capital First
Castrol
Ceat
Century
CESC
CG Power
Chennai Petro
Cholamandalam
Cipla
Coal India
Colgate
Container Corp
Cummins
Dabur India
Dalmia Bharat
DCB Bank
Dewan Housing
Dish TV
Divis Labs
DLF
Dr Reddys Labs
Eicher Motors
EngineersInd
Equitas Holding
Escorts
Exide Ind
Federal Bank
GAIL
Glenmark
GMR Infra
Godfrey Phillip
Godrej Consumer
Godrej Ind
Granules India
Grasim
GSFC
Havells India
HCL Tech
HDFC
HDFC Bank
Hero Motocorp
Hexaware Tech
Hind Constr
Hind Zinc
Hindalco
HPCL
HUL
ICICI Bank
ICICI Prudentia
IDBI Bank
IDFC
IDFC Bank
IFCI
IGL
India Cements
Indiabulls Hsg
Indian Bank
IndusInd Bank
Infibeam Avenue
Infosys
Interglobe Avi
IOC
IRB Infra
ITC
Jain Irrigation
Jaiprakash Asso
Jet Airways
Jindal Steel
JSW Steel
Jubilant Food
Just Dial
Kajaria Ceramic
Karnataka Bank
Kaveri Seed
Kotak Mahindra
KPIT Tech
L&T Finance
Larsen
LIC Housing Fin
Lupin
M&M
M&M Financial
Mahanagar Gas
Manappuram Fin
Marico
Maruti Suzuki
Max Financial
MCX India
Mindtree
Motherson Sumi
MRF
MRPL
Muthoot Finance
NALCO
NBCC (India)
NCC
Nestle
NHPC
NIIT Tech
NMDC
NTPC
Oil India
ONGC
Oracle Fin Serv
Oriental Bank
Page Industries
PC Jeweller
Petronet LNG
Pidilite Ind
Piramal Enter
PNB
Power Finance
Power Grid Corp
PTC India
PVR
Ramco Cements
Raymond
RBL Bank
REC
Rel Capital
Reliance
Reliance Comm
Reliance Infra
Reliance Power
Repco Home
SAIL
SBI
Shree Cements
Shriram Trans
Siemens
South Ind Bk
SREI Infra
SRF
Strides Pharma
Sun Pharma
Sun TV Network
Suzlon Energy
Syndicate Bank
Tata Chemicals
Tata Comm
Tata Elxsi
Tata Global Bev
Tata Motors
Tata Motors (D)
Tata Power
Tata Steel
TCS
Tech Mahindra
Titan Company
Torrent Pharma
Torrent Power
TV18 Broadcast
TVS Motor
Ujjivan Financi
UltraTechCement
Union Bank
United Brewerie
United Spirits
UPL
V-Guard Ind
Vedanta
Vodafone Idea
Voltas
Wipro
Wockhardt
Yes Bank
Zee Entertain
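To turn the options into something structured, you can pair each option's value attribute with its text, skipping the empty "Select" placeholder. An offline sketch with a hypothetical fragment of the dropdown (the real page's value codes may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of the stock dropdown.
html = """
<select id="stock_code">
  <option value="">Select</option>
  <option value="RI">Reliance</option>
  <option value="TCS">TCS</option>
</select>
"""
soup = BeautifulSoup(html, "html.parser")
# Skip options whose value attribute is empty (the placeholder).
stocks = {o["value"]: o.get_text(strip=True)
          for o in soup.select("#stock_code > option") if o["value"]}
print(stocks)
```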

Inserting a header row for pandas dataframe

I have just started python and am trying to rewrite one of my perl scripts in python. Essentially, I had a long script to convert a csv to json.
I've tried to import my csv into a pandas dataframe, and wanted to insert a header row at the top, since my csv lacks that.
Code:
import pandas
db=pandas.read_csv("netmedsdb.csv",header=None)
db
Output:
0 1 2 3
0 3M CAVILON NO STING BARRIER FILM SPRAY 28ML OTC 0 Rs.880.00 3M INDIA LTD
1 BACTI BAR SOAP 75GM OTC Rs.98.00 6TH SKIN PHARMACEUTICALS PVT LTD
2 KWIKNIC MINT FLAVOUR 4MG CHEW GUM TABLET 30'S NICOTINE Rs.180.00 A S V LABORATORIES INDIA PVT LTD
3 RIFAGO 550MG TABLET 10'S RIFAXIMIN 550MG Rs.298.00 AAREEN HEALTHCARE
4 999 OIL 60ML AYURVEDIC MEDICINE Rs.120.00 AAKASH PHARMACEUTICALS
5 AKASH SOAP 75GM AYURVEDIC PRODUCT Rs.80.00 AAKASH PHARMACEUTICALS
6 GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
7 GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
8 RHUNS OIL 30ML AYURVEDIC Rs.50.00 AAKASH PHARMACEUTICALS
9 VILLO CAPSULE 10'S AYURVEDIC MEDICINE Rs.70.00 AAKASH PHARMACEUTICALS
10 VITAWIN FORTE CAPSULE 10'S AYURVEDIC MEDICINE Rs.150.00 AAKASH PHARMACEUTICALS
I wrote the following code to insert the first element at row 0, column 0:
db.insert(loc=0,column='0',value='Brand')
db
Output:
0 0 1 2 3
0 Brand 3M CAVILON NO STING BARRIER FILM SPRAY 28ML OTC 0 Rs.880.00 3M INDIA LTD
1 Brand BACTI BAR SOAP 75GM OTC Rs.98.00 6TH SKIN PHARMACEUTICALS PVT LTD
2 Brand KWIKNIC MINT FLAVOUR 4MG CHEW GUM TABLET 30'S NICOTINE Rs.180.00 A S V LABORATORIES INDIA PVT LTD
3 Brand RIFAGO 550MG TABLET 10'S RIFAXIMIN 550MG Rs.298.00 AAREEN HEALTHCARE
4 Brand 999 OIL 60ML AYURVEDIC MEDICINE Rs.120.00 AAKASH PHARMACEUTICALS
5 Brand AKASH SOAP 75GM AYURVEDIC PRODUCT Rs.80.00 AAKASH PHARMACEUTICALS
6 Brand GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
7 Brand GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
8 Brand RHUNS OIL 30ML AYURVEDIC Rs.50.00 AAKASH PHARMACEUTICALS
9 Brand VILLO CAPSULE 10'S AYURVEDIC MEDICINE Rs.70.00 AAKASH PHARMACEUTICALS
10 Brand VITAWIN FORTE CAPSULE 10'S AYURVEDIC MEDICINE Rs.150.00 AAKASH PHARMACEUTICALS
But unfortunately the word "Brand" got inserted into column 0 of every row.
I'm trying to add the header columns "Brand", "Generic", "Price", "Company".
You only need the names parameter in read_csv:
import pandas as pd
from io import StringIO

temp = u"""a,b,10,d
e,f,45,r
"""
#after testing replace StringIO(temp) with 'netmedsdb.csv'
df = pd.read_csv(StringIO(temp), names=["Brand", "Generic", "Price", "Company"])
print (df)
Brand Generic Price Company
0 a b 10 d
1 e f 45 r
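If the DataFrame is already loaded with header=None, you can also rename its integer columns in place instead of re-reading the file. A small sketch with hypothetical rows:

```python
import pandas as pd

# Stand-in for db = pd.read_csv("netmedsdb.csv", header=None)
db = pd.DataFrame([
    ["BACTI BAR SOAP 75GM", "OTC", "Rs.98.00", "6TH SKIN PHARMACEUTICALS PVT LTD"],
])
# Assign the header names directly; no new column is inserted.
db.columns = ["Brand", "Generic", "Price", "Company"]
print(db)
```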
