Value Count combining two columns - python

import pandas as pd
import seaborn as sns
dfexcel= pd.read_excel('https://raw.githubusercontent.com/ArsenioMGonzalez3/Project3_ABDS/master/Open%20Parking%20and%20Camera%20Violations_OH%20NY_2019_2020%20YTD.xlsx')
dfexcel = dfexcel[['Issuing Agency','State']].sort_values(by = 'Issuing Agency' , ascending=False)
dfexcel
This code generates all the violations issued by each agency for vehicles registered in either NY or OH.
How can I see how many violations each agency issued for each of NY and OH?
For example: the TRAFFIC agency issued 42 for NY and 2 for OH.

You can use groupby then get the size of each group:
dfexcel.groupby(['State','Issuing Agency']).size()
Output:
State Issuing Agency
NY CON RAIL 2
DEPARTMENT OF SANITATION 2457
DEPARTMENT OF TRANSPORTATION 22065
FIRE DEPARTMENT 8
HOUSING AUTHORITY 2
NYC TRANSIT AUTHORITY MANAGERS 6
NYS OFFICE OF MENTAL HEALTH POLICE 2
OTHER/UNKNOWN AGENCIES 239
PARKS DEPARTMENT 54
POLICE DEPARTMENT 10340
PORT AUTHORITY 5
TRAFFIC 26344
OH DEPARTMENT OF SANITATION 7
DEPARTMENT OF TRANSPORTATION 47
PARKS DEPARTMENT 1
POLICE DEPARTMENT 13
TRAFFIC 35
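To see the two states side by side, one row per agency, you can also use pd.crosstab. A minimal sketch, using a few made-up rows standing in for dfexcel (the real data comes from the Excel file above):

```python
import pandas as pd

# Made-up stand-in for dfexcel, just to illustrate the shape of the result.
dfexcel = pd.DataFrame({
    "State": ["NY", "NY", "NY", "OH", "OH"],
    "Issuing Agency": ["TRAFFIC", "TRAFFIC", "POLICE DEPARTMENT",
                       "TRAFFIC", "POLICE DEPARTMENT"],
})

# crosstab gives one row per agency and one column per state;
# combinations with no violations show up as 0.
counts = pd.crosstab(dfexcel["Issuing Agency"], dfexcel["State"])
print(counts)
```

The same table can be produced from the groupby answer by appending .unstack(level='State', fill_value=0) to the result of .size().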

Related

Pandas aggregate data by same ID and comma separate values in column

I have data such as the following:
ID   Category
1    Finance
2    Computer Science
3    Data Science
1    Marketing
2    Finance
My goal is to aggregate the common IDs into one row and put the differing categories into one column, separated by commas, like the following:
ID   Category
1    Finance, Marketing
2    Computer Science, Finance
3    Data Science
How would I go about this using Pandas?
Edit:
I also have other columns for the IDs that I would like to keep. For example:
ID   Category           Location
1    Finance            New York
2    Computer Science   Los Angeles
3    Data Science       Austin
1    Marketing          New York
2    Finance            Los Angeles
Since the additional data in the other columns is the same for all rows with the same ID (e.g. ID 1 has the same location in every instance, as does ID 2), I would like to keep all the columns and their data, like this:
ID   Category                    Location
1    Finance, Marketing          New York
2    Computer Science, Finance   Los Angeles
3    Data Science                Austin
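One possible approach, sketched with the sample data above: group by ID, join the categories with ', '.join, and take the first value of every other column (safe here because those values are identical within an ID):

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2],
    "Category": ["Finance", "Computer Science", "Data Science",
                 "Marketing", "Finance"],
    "Location": ["New York", "Los Angeles", "Austin",
                 "New York", "Los Angeles"],
})

# Join the categories per ID; keep the first Location per ID,
# since it repeats within each group.
out = (df.groupby("ID", as_index=False)
         .agg({"Category": ", ".join, "Location": "first"}))
print(out)
```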

How to get a list of tickers in Jupyter Notebook?

Write code to get a list of tickers for all S&P 500 stocks from Wikipedia. As of 2/24/2021, there are 505 tickers in that list. You can use any method you want as long as the code actually queries the following website to get the list:
https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
One way would be to use the requests module to get the HTML code and then use the re module to extract the tickers. Another option would be the .read_html function in pandas and then export the tickers column to a list.
You should save the tickers in a list with the name sp500_tickers
This will grab the data in the table named 'constituents'.
# find a specific table by table count
import pandas as pd
import requests
from bs4 import BeautifulSoup
res = requests.get("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
print(df[0].to_json(orient='records'))
Result:
[{"Symbol":"MMM","Security":"3M Company","SEC filings":"reports","GICS Sector":"Industrials","GICS Sub-Industry":"Industrial Conglomerates","Headquarters Location":"St. Paul, Minnesota","Date first added":"1976-08-09","CIK":66740,"Founded":"1902"},{"Symbol":"ABT","Security":"Abbott Laboratories","SEC filings":"reports","GICS Sector":"Health Care","GICS Sub-Industry":"Health Care Equipment","Headquarters Location":"North Chicago, Illinois","Date first added":"1964-03-31","CIK":1800,"Founded":"1888"},{"Symbol":"ABBV","Security":"AbbVie Inc.","SEC filings":"reports","GICS Sector":"Health Care","GICS Sub-Industry":"Pharmaceuticals","Headquarters Location":"North Chicago, Illinois","Date first added":"2012-12-31","CIK":1551152,"Founded":"2013 (1888)"},{"Symbol":"ABMD","Security":"Abiomed","SEC filings":"reports","GICS Sector":"Health Care","GICS Sub-Industry":"Health Care Equipment","Headquarters Location":"Danvers, Massachusetts","Date first added":"2018-05-31","CIK":815094,"Founded":"1981"},{"Symbol":"ACN","Security":"Accenture","SEC filings":"reports","GICS Sector":"Information Technology","GICS Sub-Industry":"IT Consulting & Other Services","Headquarters Location":"Dublin, Ireland","Date first added":"2011-07-06","CIK":1467373,"Founded":"1989"},{"Symbol":"ATVI","Security":"Activision Blizzard","SEC filings":"reports","GICS Sector":"Communication Services","GICS Sub-Industry":"Interactive Home Entertainment","Headquarters Location":"Santa Monica, California","Date first added":"2015-08-31","CIK":718877,"Founded":"2008"},{"Symbol":"ADBE","Security":"Adobe Inc.","SEC filings":"reports","GICS Sector":"Information Technology","GICS Sub-Industry":"Application Software","Headquarters Location":"San Jose, California","Date first added":"1997-05-05","CIK":796343,"Founded":"1982"},
Etc., etc., etc.
That's JSON. If you want a table, kind of like what you would use in Excel, simply print the df.
Result:
[ Symbol Security SEC filings GICS Sector \
0 MMM 3M Company reports Industrials
1 ABT Abbott Laboratories reports Health Care
2 ABBV AbbVie Inc. reports Health Care
3 ABMD Abiomed reports Health Care
4 ACN Accenture reports Information Technology
.. ... ... ... ...
500 YUM Yum! Brands Inc reports Consumer Discretionary
501 ZBRA Zebra Technologies reports Information Technology
502 ZBH Zimmer Biomet reports Health Care
503 ZION Zions Bancorp reports Financials
504 ZTS Zoetis reports Health Care
GICS Sub-Industry Headquarters Location \
0 Industrial Conglomerates St. Paul, Minnesota
1 Health Care Equipment North Chicago, Illinois
2 Pharmaceuticals North Chicago, Illinois
3 Health Care Equipment Danvers, Massachusetts
4 IT Consulting & Other Services Dublin, Ireland
.. ... ...
500 Restaurants Louisville, Kentucky
501 Electronic Equipment & Instruments Lincolnshire, Illinois
502 Health Care Equipment Warsaw, Indiana
503 Regional Banks Salt Lake City, Utah
504 Pharmaceuticals Parsippany, New Jersey
Date first added CIK Founded
0 1976-08-09 66740 1902
1 1964-03-31 1800 1888
2 2012-12-31 1551152 2013 (1888)
3 2018-05-31 815094 1981
4 2011-07-06 1467373 1989
.. ... ... ...
500 1997-10-06 1041061 1997
501 2019-12-23 877212 1969
502 2001-08-07 1136869 1927
503 2001-06-22 109380 1873
504 2013-06-21 1555280 1952
[505 rows x 9 columns]]
Alternatively, you can export the df to a CSV file.
df[0].to_csv('constituents.csv')
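The assignment also asks for a list named sp500_tickers. A shorter sketch using only pandas, shown here against a small stand-in HTML snippet (the real code would pass the Wikipedia URL instead, so the tickers below are placeholders):

```python
import pandas as pd
from io import StringIO

# Stand-in for the Wikipedia page; replace StringIO(html) with the URL
# https://en.wikipedia.org/wiki/List_of_S%26P_500_companies to run it live.
html = """
<table>
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MMM</td><td>3M Company</td></tr>
  <tr><td>ABT</td><td>Abbott Laboratories</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html))   # read_html returns a list of DataFrames
sp500_tickers = tables[0]["Symbol"].tolist()
print(sp500_tickers)
```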

How to scrape Wikipedia tables with Python

I want to extract a table; the URL is https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia
My code is not returning any data. How can I get it?
Code:
import requests
from bs4 import BeautifulSoup as bs
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(url).text
soup = bs(html, 'html.parser')
ta=soup.find_all('table',class_="wikitable sortable jquery-tablesorter")
print(ta)
If I'm pulling a table and see <table> tags, I always try Pandas .read_html() first. It does the iterating over rows for you. Most of the time you get exactly what you need, or at worst only have to do some minor manipulation of the dataframe. In this case, it gives you the full table nicely:
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
table = pd.read_html(url)[1]
Output:
print (table.to_string())
0 1 2 3 4 5
0 Name Industry Sector Headquarters Founded Notes
1 Airfast Indonesia Consumer services Airlines Tangerang 1971 Private airline
2 Angkasa Pura Industrials Transportation services Jakarta 1962 State-owned airports
3 Astra International Conglomerates - Jakarta 1957 Automotive, financials, industrials, technology
4 Bank Central Asia Financials Banks Jakarta 1957 Bank
5 Bank Danamon Financials Banks Jakarta 1956 Bank
6 Bank Mandiri Financials Banks Jakarta 1998 Bank
7 Bank Negara Indonesia Financials Banks Jakarta 1946 Bank
8 Bank Rakyat Indonesia Financials Banks Jakarta 1895 Micro-finance bank
9 Bumi Resources Basic materials General mining Jakarta 1973 Mining
10 Djarum Consumer goods Tobacco Kudus and Jakarta 1951 Tobacco
11 Dragon Computer & Communication Technology Computer hardware Jakarta 1980 Computer hardware
12 Elex Media Komputindo Consumer services Publishing Jakarta 1985 Publisher
13 Femina Consumer services Media Jakarta 1972 Weekly magazine
14 Garuda Indonesia Consumer services Travel & leisure Tangerang 1949 State-owned airline
15 Gudang Garam Consumer goods Tobacco Kediri 1958 Tobacco
16 Gunung Agung Consumer services Specialty retailers Jakarta 1953 Bookstores
17 Indocement Tunggal Prakarsa Industrials Building materials & fixtures Jakarta 1985 Cement, part of HeidelbergCement (Germany)
18 Indofood Consumer goods Food products Jakarta 1968 Food production
19 Indonesian Aerospace Industrials Aerospace Bandung 1976 State-owned aircraft design
20 Indonesian Bureau of Logistics Consumer goods Food products Jakarta 1967 Food distribution
21 Indosat Telecommunications Fixed line telecommunications Jakarta 1967 Telecommunications network
22 Infomedia Nusantara Consumer services Publishing Jakarta 1975 Directory publisher
23 Jalur Nugraha Ekakurir (JNE) Industrials Delivery services Jakarta 1990 Express logistics
24 Kalbe Farma Health care Pharmaceuticals Jakarta 1966 Pharmaceuticals
25 Kereta Api Indonesia Industrials Railroads Bandung 1945 State-owned railway
26 Kimia Farma Health care Pharmaceuticals Jakarta 1971 State-owned pharma
27 Kompas Gramedia Group Consumer services Media agencies Jakarta 1965 Media holding
28 Krakatau Steel Basic materials Iron & steel Cilegon 1970 State-owned steel
29 Lion Air Consumer services Airlines Jakarta 2000 Low-cost airline
30 Lippo Group Financials Real estate holding & development Jakarta 1950 Development
31 Matahari Consumer services Broadline retailers Tangerang 1982 Department stores
32 MedcoEnergi Oil & gas Exploration & production Jakarta 1980 Energy, oil and gas
33 Media Nusantara Citra Consumer services Broadcasting & entertainment Jakarta 1997 Media
34 Panin Sekuritas Financials Investment services Jakarta 1989 Broker
35 Pegadaian Financials Consumer finance Jakarta 1901 State-owned financial services
36 Pelni Industrials Marine transportation Jakarta 1952 Shipping
37 Pos Indonesia Industrials Delivery services Bandung 1995 State-owned postal service
38 Pertamina Oil & gas Integrated oil & gas Jakarta 1957 State-owned oil and natural gas
39 Perusahaan Gas Negara Oil & gas Exploration & production Jakarta 1965 Gas
40 Perusahaan Gas Negara Utilities Gas distribution Jakarta 1965 State-owned natural gas transportation
41 Perusahaan Listrik Negara Utilities Conventional electricity Jakarta 1945 State-owned electrical distribution
42 Phillip Securities Indonesia, PT Financials Investment services Jakarta 1989 Financial services
43 Pindad Industrials Defense Bandung 1808 State-owned defense
44 PT Lapindo Brantas Oil & gas Exploration & production Jakarta 1996 Oil and gas
45 PT Metro Supermarket Realty Tbk Consumer services Food retailers & wholesalers Jakarta 1955 Supermarkets
46 Salim Group Conglomerates - Jakarta 1972 Industrials, financials, consumer goods
47 Sampoerna Consumer goods Tobacco Surabaya 1913 Tobacco
48 Semen Indonesia Industrials Building materials & fixtures Gresik 1957 Cement
49 Susi Air Consumer services Airlines Pangandaran 2004 Charter airline
50 Telkom Indonesia Telecommunications Fixed line telecommunications Bandung 1856 Telecommunication services
51 Telkomsel Telecommunications Mobile telecommunications Jakarta 1995 Mobile network, part of Telkom Indonesia
52 Trans Corp Conglomerates - Jakarta 2006 Media, consumer services, real estate, part of...
53 Unilever Indonesia Consumer goods Personal products Jakarta 1933 Personal care products, part of Unilever (Neth...
54 United Tractors Industrials Commercial vehicles & trucks Jakarta 1972 Heavy equipment
55 Waskita Industrials Heavy construction Jakarta 1961 State-owned construction
import requests
from bs4 import BeautifulSoup as bs
URL = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(URL).text
soup = bs(html, 'html.parser')
ta=soup.find_all('table',{'class':'wikitable'})
print(ta)
You can search for the table by class name the old way; it still seems to work.
Fixes:
Use URL instead of url in your code (line 4)
Use class wikitable
Optimized your code a little
Hence:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia")
soup = BeautifulSoup(page.content, 'html.parser')
ta = soup.find_all('table',class_="wikitable")
print(ta)
OUTPUT:
[<table class="wikitable sortable">
<tbody><tr>
<th>Rank
</th>
<th>Image
</th>
<th>Name
</th>
<th>2016 Revenues (USD $M)
</th>
<th>Employees
</th>
<th>Notes
.
.
.
This may not be exactly what you are looking for, but you can try this one.
import requests
from bs4 import BeautifulSoup as bs
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(url).text
soup = bs(html, 'html.parser')
for data in soup.find_all('table', {"class": "wikitable"}):
    for td in data.find_all('td'):
        for link in td.find_all('a'):
            print(link.text)
Try the below:
import requests
from bs4 import BeautifulSoup as bs
URL = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
html = requests.get(URL).text
soup = bs(html, 'html.parser')
ta=soup.find("table",{"class":"wikitable sortable"})
print(ta)
to get all the tables
ta=soup.find_all("table",{"class":"wikitable sortable"})
If you want to parse table data, pandas is very efficient, and if you want to manipulate the data afterwards you can navigate the result as a pandas DataFrame:
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_companies_of_Indonesia"
table = pd.read_html(url,header=0)
print(table[1])
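As for why the original code returned an empty list: the class jquery-tablesorter is added by client-side JavaScript after the page loads, so it never appears in the raw HTML that requests downloads. A minimal sketch, using a stand-in HTML snippet, of how BeautifulSoup matches classes:

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page: in the raw HTML served to requests,
# the table's class is just "wikitable sortable".
html = '<table class="wikitable sortable"><tr><td>x</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# A multi-word class_ string must match the class attribute exactly,
# so the JavaScript-added "jquery-tablesorter" makes it find nothing:
n_exact = len(soup.find_all("table",
                            class_="wikitable sortable jquery-tablesorter"))

# Matching on a single class is robust: it matches any element whose
# class list contains it.
n_single = len(soup.find_all("table", class_="wikitable"))

print(n_exact, n_single)  # 0 1
```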

Inserting a header row for pandas dataframe

I have just started Python and am trying to rewrite one of my Perl scripts in it. Essentially, I had a long script to convert a CSV to JSON.
I've tried to import my CSV into a pandas dataframe, and want to insert a header row at the top, since my CSV lacks one.
Code:
import pandas
db=pandas.read_csv("netmedsdb.csv",header=None)
db
Output:
0 1 2 3
0 3M CAVILON NO STING BARRIER FILM SPRAY 28ML OTC 0 Rs.880.00 3M INDIA LTD
1 BACTI BAR SOAP 75GM OTC Rs.98.00 6TH SKIN PHARMACEUTICALS PVT LTD
2 KWIKNIC MINT FLAVOUR 4MG CHEW GUM TABLET 30'S NICOTINE Rs.180.00 A S V LABORATORIES INDIA PVT LTD
3 RIFAGO 550MG TABLET 10'S RIFAXIMIN 550MG Rs.298.00 AAREEN HEALTHCARE
4 999 OIL 60ML AYURVEDIC MEDICINE Rs.120.00 AAKASH PHARMACEUTICALS
5 AKASH SOAP 75GM AYURVEDIC PRODUCT Rs.80.00 AAKASH PHARMACEUTICALS
6 GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
7 GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
8 RHUNS OIL 30ML AYURVEDIC Rs.50.00 AAKASH PHARMACEUTICALS
9 VILLO CAPSULE 10'S AYURVEDIC MEDICINE Rs.70.00 AAKASH PHARMACEUTICALS
10 VITAWIN FORTE CAPSULE 10'S AYURVEDIC MEDICINE Rs.150.00 AAKASH PHARMACEUTICALS
I wrote the following code to insert the first element at row 0, column 0:
db.insert(loc=0,column='0',value='Brand')
db
Output:
0 0 1 2 3
0 Brand 3M CAVILON NO STING BARRIER FILM SPRAY 28ML OTC 0 Rs.880.00 3M INDIA LTD
1 Brand BACTI BAR SOAP 75GM OTC Rs.98.00 6TH SKIN PHARMACEUTICALS PVT LTD
2 Brand KWIKNIC MINT FLAVOUR 4MG CHEW GUM TABLET 30'S NICOTINE Rs.180.00 A S V LABORATORIES INDIA PVT LTD
3 Brand RIFAGO 550MG TABLET 10'S RIFAXIMIN 550MG Rs.298.00 AAREEN HEALTHCARE
4 Brand 999 OIL 60ML AYURVEDIC MEDICINE Rs.120.00 AAKASH PHARMACEUTICALS
5 Brand AKASH SOAP 75GM AYURVEDIC PRODUCT Rs.80.00 AAKASH PHARMACEUTICALS
6 Brand GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
7 Brand GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
8 Brand RHUNS OIL 30ML AYURVEDIC Rs.50.00 AAKASH PHARMACEUTICALS
9 Brand VILLO CAPSULE 10'S AYURVEDIC MEDICINE Rs.70.00 AAKASH PHARMACEUTICALS
10 Brand VITAWIN FORTE CAPSULE 10'S AYURVEDIC MEDICINE Rs.150.00 AAKASH PHARMACEUTICALS
But unfortunately I got the word "Brand" inserted at column 0 in all rows.
I'm trying to add the header columns "Brand", "Generic", "Price", "Company".
You only need the names parameter of read_csv:
import pandas as pd
from io import StringIO

temp = u"""a,b,10,d
e,f,45,r
"""
#after testing replace StringIO(temp) with 'netmedsdb.csv'
df = pd.read_csv(StringIO(temp), names=["Brand", "Generic", "Price", "Company"])
print (df)
Brand Generic Price Company
0 a b 10 d
1 e f 45 r
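Alternatively, if the DataFrame has already been read without names, you can assign the header afterwards. (The reason db.insert put "Brand" in every row is that DataFrame.insert adds a whole new column, broadcasting a scalar value down it.) A small sketch with stand-in data in place of netmedsdb.csv:

```python
import pandas as pd
from io import StringIO

csv_data = "a,b,10,d\ne,f,45,r\n"   # stand-in for netmedsdb.csv
db = pd.read_csv(StringIO(csv_data), header=None)

# Assigning to .columns renames the existing columns in place; it does not
# add a row, and unlike DataFrame.insert it does not add a column either.
db.columns = ["Brand", "Generic", "Price", "Company"]
print(db)
```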

Django Query Multiple Models with ForeignKey and ManyToMany Field with Count

I have 3 DB tables (models), and I'm trying to write a Django ORM query that gets the count of restaurants by cuisine type in every city.
Models:
RESTAURANT
    name ..
    city = models.ForeignKey('City')
    cuisine_types = models.ManyToManyField('Cuisinetype')

CITY
    name ..

CUISINETYPE
    name ..
If I were using a raw SQL query, something like this would give me the desired results:
SELECT
city.`name` AS city,
cuisinetype.`cuisine`,
COUNT(
restaurant_cuisine_types.`cuisinetype_id`
) AS total
FROM
restaurant_cuisine_types
JOIN cuisinetype
ON restaurant_cuisine_types.`cuisinetype_id` = cuisinetype.`id`
JOIN restaurant
ON restaurant_cuisine_types.`restaurant_id` = restaurant.`id`
JOIN city
ON restaurant.`city_id` = city.`id`
GROUP BY cuisinetype_id,
city.name
ORDER BY city,
cuisine
LIMIT 0, 1000;
RESULTS
city cuisine total
Albuquerque American 5
Albuquerque French 1
Albuquerque Italian 1
Albuquerque Southwest 2
Albuquerque Steak 2
Atlanta American 6
Atlanta Asian 1
Atlanta Continental 2
Atlanta Fusion 1
Atlanta International 1
Atlanta Italian 1
...
So what is the Django way to get these results?
