Graph python similar to R - python

I have a table like this one: (Ignore the columns "Index" and "D")
+-------+-----------------------+----------+----------+----------+
| Index | Type | Male | Female | D |
+-------+-----------------------+----------+----------+----------+
| 44 | Life struggles | 2.097324 | 3.681356 | 1.584032 |
| 2 | Writing notes | 2.677262 | 3.354730 | 0.677468 |
| 18 | Empathy | 3.528117 | 4.083051 | 0.554933 |
| 12 | Criminal damage | 2.926650 | 2.374150 | 0.552501 |
| 20 | Giving | 2.650367 | 3.196944 | 0.546577 |
| 21 | Compassion to animals | 3.666667 | 4.178268 | 0.511602 |
| 33 | Mood swings | 2.965937 | 3.451613 | 0.485676 |
| 10 | Funniness | 3.574572 | 3.104907 | 0.469665 |
| 38 | Children | 3.354523 | 3.805415 | 0.450891 |
| 47 | Small - big dogs | 3.221951 | 2.801695 | 0.420256 |
+-------+-----------------------+----------+----------+----------+
and I am trying to draw a similar graph:
I know how to do it in R but not in Python.
I tried this:
sns.stripplot(data=df,y="Male",color="Blue")
sns.stripplot(data=df,y="Female",color="red")
But I don't know how to continue. Does someone have an idea?

This is easily done with matplotlib: it is simply a scatter plot with the categories as y-values.
import matplotlib.pyplot as plt
plt.style.use('ggplot')
fig, ax = plt.subplots()
# One marker per category and group; the categories go on the y-axis
ax.plot(df['Male'], df['Type'], 'o', color='xkcd:reddish', ms=10, label='Male')
ax.plot(df['Female'], df['Type'], 'o', color='xkcd:teal', ms=10, label='Female')
# Vertical reference line at x = 3
ax.axvline(3, ls='-', color='k')
ax.set_xlim(1, 5)
ax.set_xlabel('avg response')
ax.set_ylabel('Variable')
# Place the legend above the axes
ax.legend(bbox_to_anchor=(0.5, 1.02), loc='lower center',
          ncol=2, title='group')
fig.tight_layout()
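If you would rather stay with seaborn, as in the attempt in the question, one option is to reshape the table from wide to long format first. The following is a minimal sketch using only three rows of the table; the names `long_df`, `group` and `avg response` are chosen here for illustration.

```python
import pandas as pd

# Three rows copied from the table in the question
df = pd.DataFrame({
    "Type": ["Life struggles", "Writing notes", "Empathy"],
    "Male": [2.097324, 2.677262, 3.528117],
    "Female": [3.681356, 3.354730, 4.083051],
})

# Wide -> long: one row per (Type, group) pair
long_df = df.melt(id_vars="Type", value_vars=["Male", "Female"],
                  var_name="group", value_name="avg response")
print(long_df)
```

With the data in this shape, a single call such as `sns.stripplot(data=long_df, x="avg response", y="Type", hue="group", jitter=False)` should draw both groups on the same category rows.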

Related

Plot multiple bar plots for multiple columns

I have a dataset that looks roughly like the table below.
I need to create a barplot for each column TS1 to TS5 that counts the number of each item in that column. The items are one of the following: NOT_SEEN NOT_ABLE HIGH_BAR and numerical values between 110 and 140 separated by 2 (so 110, 112, 114 etc).
I have found a way to do this that works fine, but is there a way to use a loop or something similar so I don't have to copy and paste the same code 5 times (once for each of the 5 columns)?
This is what I have tried, and it works:
import seaborn as sns
import matplotlib.pyplot as plt
from pandas.api.types import CategoricalDtype
num_range = list(range(110, 140, 2))
OUTCOMES = ['NOT_SEEN', 'NOT_ABLE', 'HIGH_BAR']
OUTCOMES.extend([str(num) for num in num_range])
OUTCOMES = CategoricalDtype(OUTCOMES, ordered=True)
fig, ax = plt.subplots(2, 3, sharey=True)
fig.tight_layout(pad=3)
The block below is what I copy 5 times, changing only the title (Testing 1, Testing 2, etc.) and the column name (TS1, TS2, ...) in the first line.
df["outcomes"] = df["TS1"].astype(OUTCOMES)
bpt=sns.countplot(x= "outcomes", data=df, palette='GnBu', ax=ax[0,0])
plt.setp(bpt.get_xticklabels(), rotation=60, size=6, ha='right')
bpt.set(xlabel='')
bpt.set_title('Testing 1')
The following code comes after the five copies of the block above.
ax[1,2].set_visible(False)
plt.show()
I am sure there is a way to do this that is much better but I'm new to all this.
Also, I need to make sure the bars of the barplot are ordered going left to right as: NOT_SEEN NOT_ABLE HIGH_BAR and 110, 112, 114 etc
Using python 2.7 (not my choice unfortunately) and pandas 0.24.2.
+----+------+------+----------+----------+----------+----------+----------+
| ID | VIEW | YEAR | TS1 | TS2 | TS3 | TS4 | TS5 |
+----+------+------+----------+----------+----------+----------+----------+
| AA | NO | 2005 | | 134 | | HIGH_BAR | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | YES | 2015 | | | NOT_SEEN | | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | YES | 2010 | 118 | | | | NOT_ABLE |
+----+------+------+----------+----------+----------+----------+----------+
| BB | NO | 2020 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | YES | 2020 | | | | NOT_SEEN | |
+----+------+------+----------+----------+----------+----------+----------+
| AA | NO | 2010 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | NO | 2015 | | | | | 132 |
+----+------+------+----------+----------+----------+----------+----------+
| BB | YES | 2010 | | HIGH_BAR | | 140 | NOT_ABLE |
+----+------+------+----------+----------+----------+----------+----------+
| AA | YES | 2020 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | NO | 2010 | | | | 112 | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | YES | 2015 | | | NOT_ABLE | | HIGH_BAR |
+----+------+------+----------+----------+----------+----------+----------+
| BB | NO | 2020 | | | | 145 | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | NO | 2015 | | 110 | | | |
+----+------+------+----------+----------+----------+----------+----------+
| AA | YES | 2010 | HIGH_BAR | | | NOT_SEEN | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | YES | 2015 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| AA | NO | 2020 | | | | 118 | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | YES | 2015 | | 180 | NOT_ABLE | | |
+----+------+------+----------+----------+----------+----------+----------+
| BB | YES | 2020 | | NOT_SEEN | | | 126 |
+----+------+------+----------+----------+----------+----------+----------+
You can put plotting lines in a function and call it in a for loop automatically changing column, title and axis in each iteration:
fig, axes = plt.subplots(2, 3, sharey=True)
fig.tight_layout(pad=3)
def plotting(column, title, ax):
    df["outcomes"] = df[column].astype(OUTCOMES)
    bpt = sns.countplot(x="outcomes", data=df, palette='GnBu', ax=ax)
    plt.setp(bpt.get_xticklabels(), rotation=60, size=6, ha='right')
    bpt.set(xlabel='')
    bpt.set_title(title)
columns = ['TS1', 'TS2', 'TS3', 'TS4', 'TS5']
titles = ['Testing 1', 'Testing 2', 'Testing 3', 'Testing 4', 'Testing 5']
for column, title, ax in zip(columns, titles, axes.flatten()):
    plotting(column, title, ax)
axes[1,2].set_visible(False)
plt.show()
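If the bar order matters more than the seaborn styling, the counts can also be computed up front in a fixed order. This is a rough sketch, assuming the cells are stored as strings; `ORDER` and `ordered_counts` are names made up for this example. Note that `range(110, 140, 2)` stops at 138, so the sketch uses `range(110, 142, 2)` to include 140.

```python
import pandas as pd

# Fixed left-to-right order of the bars (assumption: values are stored as strings)
ORDER = ["NOT_SEEN", "NOT_ABLE", "HIGH_BAR"] + [str(n) for n in range(110, 142, 2)]

def ordered_counts(frame, columns, order=ORDER):
    """Return one row per column, holding the count of each outcome in `order`."""
    counts = {
        # reindex fixes the order, fills missing outcomes with 0,
        # and drops values that are not listed in `order`
        col: frame[col].value_counts().reindex(order, fill_value=0)
        for col in columns
    }
    return pd.DataFrame(counts, columns=columns).reindex(order).T

# Tiny hypothetical sample shaped like the table in the question
sample = pd.DataFrame({
    "TS1": ["118", None, "HIGH_BAR", "HIGH_BAR"],
    "TS2": ["134", "NOT_SEEN", None, "180"],
})
counts = ordered_counts(sample, ["TS1", "TS2"])
print(counts)
```

Each row can then be drawn on its own axes, for example with `counts.loc["TS1"].plot.bar(ax=ax)`, and the bars keep the order given in `ORDER`.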

Plotting for next row after the slice

I am plotting the values of columns X and FT according to the value of column CN with the following code
import matplotlib.pyplot as plt
plt.plot(X[CN==1],FT[CN==1])
plt.plot(X[CN==36],FT[CN==36])
and the data is given as
+-------+-----+----+-------+-------+
| X | N | CN | Vdiff | FT |
+-------+-----+----+-------+-------+
| 524 | 2 | 1 | 0.0 | 0.12. |
| 534 | 2 | 1 | 0.0 |0.134. |
| 525 | 2 | 1 | 0.0 |0.154. |
| . | | | |. |
| . | | | |. |
| 5976 | 15 | 14 | 0.0 |3.54. |
| 5913 | 15 | 14 | 0.1 |3.98. |
| 5923 | 0 | 15 | 0.0 |3.87. |
| . | | | |. |
| . | | | |. |
| 33001 | 7 | 36 | 0.0 |7.36 |
| 33029 | 7 | 36 | 0.0 |8.99 |
| 33023 | 7 | 36 | 0.1 |12.45 |
| 33114 | 0 | 37 | 0.0 |14.33 |
+-------+-----+----+-------+-------+
I am getting incomplete graphs, so I need to include the next row in each plot. For example, for the graph of CN==36, plt.plot(X[CN==36],FT[CN==36]), I also want to use the first row of CN==37. Note that CN values are repetitive.
I have to plot multiple graphs in this way, so general code that works for all of them would be appreciated.
Addition on request in a comment: at the end of each circular shape the edges do not touch, so the circle is incomplete, for example in the aqua and green cycles. I want complete cycles, so I need 1 or 2 additional rows of data in each plot.
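One possible way to do this is to select rows by position instead of by mask, so that the slice can run a little past the end of the block. The sketch below assumes the data sits in a pandas DataFrame and that the rows for one CN value form a single contiguous block; the helper name `block_with_following_rows` and the last row of the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def block_with_following_rows(frame, cn_value, extra=1):
    """Rows where CN == cn_value plus the `extra` rows that come right after them.

    Assumes the rows for one CN value form a single contiguous block.
    """
    positions = np.flatnonzero(frame["CN"].to_numpy() == cn_value)
    if positions.size == 0:
        return frame.iloc[0:0]  # empty frame with the same columns
    stop = min(positions[-1] + 1 + extra, len(frame))
    return frame.iloc[positions[0]:stop]

# Miniature of the table in the question (the last row is made up)
data = pd.DataFrame({
    "X":  [33001, 33029, 33023, 33114, 33120],
    "CN": [36, 36, 36, 37, 37],
    "FT": [7.36, 8.99, 12.45, 14.33, 15.02],
})
closed = block_with_following_rows(data, 36)
print(closed)
```

The result can be plotted as before, for example `plt.plot(closed["X"], closed["FT"])`, and `extra=2` would pull in two following rows instead of one.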

Plotting a CDF from a multiclass pandas dataframe

I understand the package empiricaldist provides a CDF function as per the documentation.
However, I find it tricky to plot my dataframe when a column has multiple values.
df.head()
+------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
| | trip_id | seconds_start | seconds_end | duration | distance | speed | acceleration | lat_start | lon_start | lat_end | lon_end | travelmode |
+------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
| 0 | 318410 | 1461743310 | 1461745298 | 1988 | 5121.49 | 2.58 | 0.00130 | 41.162687 | -8.615425 | 41.177888 | -8.597549 | car |
| 1 | 318411 | 1461749359 | 1461750290 | 931 | 1520.71 | 1.63 | 0.00175 | 41.177949 | -8.597074 | 41.177839 | -8.597574 | bus |
| 2 | 318421 | 1461806871 | 1461806941 | 70 | 508.15 | 7.26 | 0.10370 | 37.091240 | -8.211239 | 37.092322 | -8.206681 | foot |
| 3 | 318422 | 1461837354 | 1461838024 | 670 | 1207.39 | 1.80 | 0.00269 | 37.092082 | -8.205060 | 37.091659 | -8.206462 | car |
| 4 | 318425 | 1461852790 | 1461853845 | 1055 | 1470.49 | 1.39 | 0.00132 | 37.091628 | -8.202143 | 37.092095 | -8.205070 | foot |
+------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
I would like to plot a CDF for each travel mode in the column travelmode.
groups = df.groupby('travelmode')
However, I don't really understand how this could be done from the documentation.
You can plot them in a loop like this:
import matplotlib.pyplot as plt
from empiricaldist import Cdf
def decorate_plot(title):
    ''' Adds labels to plot '''
    plt.xlabel('Outcome')
    plt.ylabel('CDF')
    plt.title(title)
for tm in df['travelmode'].unique():
    # Keep only the rows that belong to this travel mode
    subset = df[df['travelmode'] == tm]
    for col in df.columns:
        if col != 'travelmode':
            # Create a new figure for each plot
            fig, ax = plt.subplots()
            d4 = Cdf.from_seq(subset[col])
            d4.plot()
            decorate_plot(f"{tm} - {col}")
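If the goal is one curve per travel mode on a shared set of axes, the empirical CDF can also be computed directly. The sketch below uses plain numpy and pandas instead of empiricaldist; `ecdf_by_group` is a name invented for this example, and the sample frame holds only the speed and travelmode values from `df.head()` above.

```python
import numpy as np
import pandas as pd

def ecdf_by_group(frame, value_col, group_col):
    """Map each group label to (sorted values, cumulative probabilities)."""
    curves = {}
    for label, values in frame.groupby(group_col)[value_col]:
        x = np.sort(values.dropna().to_numpy())
        # The i-th smallest value has cumulative probability i / n
        y = np.arange(1, len(x) + 1) / len(x)
        curves[label] = (x, y)
    return curves

# speed and travelmode columns from df.head() in the question
sample = pd.DataFrame({
    "speed": [2.58, 1.63, 7.26, 1.80, 1.39],
    "travelmode": ["car", "bus", "foot", "car", "foot"],
})
curves = ecdf_by_group(sample, "speed", "travelmode")
for mode, (x, y) in curves.items():
    print(mode, x, y)
```

Each `(x, y)` pair can be drawn with `plt.step(x, y, where="post", label=mode)`, so that all travel modes share one figure.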

Multi-Index Lookup Mapping

I'm trying to create a new column which has a value based on 2 indices of that row. I have 2 dataframes with equivalent multi-index on the levels I'm querying (but not of equal size). For each row in the 1st dataframe, I want the value of the 2nd df that matches the row's indices.
I originally thought perhaps I could use a .loc[] and filter off the index values, but I cannot seem to get this to change the output row-by-row. If I wasn't using a dataframe object, I'd loop over the whole thing to do it.
I have tried to use the .apply() method, but I can't figure out what function to pass to it.
Creating some toy data with the same structure:
import pandas as pd
import numpy as np
np.random.seed(1)
df = pd.DataFrame({'Aircraft':np.ones(15),
                   'DC':np.append(np.repeat(['A','B'], 7), 'C'),
                   'Test':np.array([10,10,10,10,10,10,20,10,10,10,10,10,10,20,10]),
                   'Record':np.array([1,2,3,4,5,6,1,1,2,3,4,5,6,1,1]),
                   # There are multiple "value" columns in my data, but I have simplified here
                   'Value':np.random.random(15)
                   }
                  )
df.set_index(['Aircraft', 'DC', 'Test', 'Record'], inplace=True)
df.sort_index(inplace=True)
v = pd.DataFrame({'Aircraft':np.ones(7),
                  'DC':np.repeat('v',7),
                  'Test':np.array([10,10,10,10,10,10,20]),
                  'Record':np.array([1,2,3,4,5,6,1]),
                  'Value':np.random.random(7)
                  }
                 )
v.set_index(['Aircraft', 'DC', 'Test', 'Record'], inplace=True)
v.sort_index(inplace=True)
df['v'] = df.apply(lambda x: v.loc[df.iloc[x]])
Returns error for indexing on multi-index.
To set all values to a single "v" value:
df['v'] = float(v.loc[(slice(None), 'v', 10, 1), 'Value'])
So inputs look like this:
--------------------------------------------
| Aircraft | DC | Test | Record | Value |
|----------|----|------|--------|----------|
| 1.0 | A | 10 | 1 | 0.847576 |
| | | | 2 | 0.860720 |
| | | | 3 | 0.017704 |
| | | | 4 | 0.082040 |
| | | | 5 | 0.583630 |
| | | | 6 | 0.506363 |
| | | 20 | 1 | 0.844716 |
| | B | 10 | 1 | 0.698131 |
| | | | 2 | 0.112444 |
| | | | 3 | 0.718316 |
| | | | 4 | 0.797613 |
| | | | 5 | 0.129207 |
| | | | 6 | 0.861329 |
| | | 20 | 1 | 0.535628 |
| | C | 10 | 1 | 0.121704 |
--------------------------------------------
--------------------------------------------
| Aircraft | DC | Test | Record | Value |
|----------|----|------|--------|----------|
| 1.0 | v | 10 | 1 | 0.961791 |
| | | | 2 | 0.046681 |
| | | | 3 | 0.913453 |
| | | | 4 | 0.495924 |
| | | | 5 | 0.149950 |
| | | | 6 | 0.708635 |
| | | 20 | 1 | 0.874841 |
--------------------------------------------
And after the operation, I want this:
| Aircraft | DC | Test | Record | Value | v |
|----------|----|------|--------|----------|----------|
| 1.0 | A | 10 | 1 | 0.847576 | 0.961791 |
| | | | 2 | 0.860720 | 0.046681 |
| | | | 3 | 0.017704 | 0.913453 |
| | | | 4 | 0.082040 | 0.495924 |
| | | | 5 | 0.583630 | 0.149950 |
| | | | 6 | 0.506363 | 0.708635 |
| | | 20 | 1 | 0.844716 | 0.874841 |
| | B | 10 | 1 | 0.698131 | 0.961791 |
| | | | 2 | 0.112444 | 0.046681 |
| | | | 3 | 0.718316 | 0.913453 |
| | | | 4 | 0.797613 | 0.495924 |
| | | | 5 | 0.129207 | 0.149950 |
| | | | 6 | 0.861329 | 0.708635 |
| | | 20 | 1 | 0.535628 | 0.874841 |
| | C | 10 | 1 | 0.121704 | 0.961791 |
Edit:
As you are on pandas 0.23.4, just replace droplevel with reset_index and the option drop=True:
df_result = (df.reset_index('DC').assign(v=v.reset_index('DC', drop=True))
               .set_index('DC', append=True)
               .reorder_levels(v.index.names))
Original:
One way is to move the index level DC of df to a column, use assign to create the new column, and then use set_index and reorder_levels to restore the original index:
df_result = (df.reset_index('DC').assign(v=v.droplevel('DC'))
               .set_index('DC', append=True)
               .reorder_levels(v.index.names))
Out[1588]:
                            Value         v
Aircraft DC Test Record
1.0      A  10   1       0.847576  0.961791
                 2       0.860720  0.046681
                 3       0.017704  0.913453
                 4       0.082040  0.495924
                 5       0.583630  0.149950
                 6       0.506363  0.708635
            20   1       0.844716  0.874841
         B  10   1       0.698131  0.961791
                 2       0.112444  0.046681
                 3       0.718316  0.913453
                 4       0.797613  0.495924
                 5       0.129207  0.149950
                 6       0.861329  0.708635
            20   1       0.535628  0.874841
         C  10   1       0.121704  0.961791
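Another way to get the same result, which may be easier to follow, is an ordinary left merge on the shared levels. This is a sketch under the assumption that the lookup frame has a single `Value` column; the helper name `add_lookup_column` and the small sample values are made up for the example.

```python
import pandas as pd

def add_lookup_column(df, v, shared=("Aircraft", "Test", "Record"), drop="DC"):
    """Left-merge v['Value'] onto df as column 'v', matching on the shared levels."""
    shared = list(shared)
    lookup = (v.reset_index()
                .drop(columns=drop)
                .rename(columns={"Value": "v"})[shared + ["v"]])
    # A left merge keeps every row of df in its original order
    merged = df.reset_index().merge(lookup, on=shared, how="left")
    return merged.set_index(list(df.index.names))

names = ["Aircraft", "DC", "Test", "Record"]
# Small hypothetical frames with the same index structure as in the question
df_small = pd.DataFrame(
    {"Value": [0.1, 0.2, 0.3, 0.4, 0.5]},
    index=pd.MultiIndex.from_tuples(
        [(1.0, "A", 10, 1), (1.0, "A", 10, 2), (1.0, "A", 20, 1),
         (1.0, "B", 10, 1), (1.0, "C", 10, 1)], names=names),
)
v_small = pd.DataFrame(
    {"Value": [0.9, 0.8, 0.7]},
    index=pd.MultiIndex.from_tuples(
        [(1.0, "v", 10, 1), (1.0, "v", 10, 2), (1.0, "v", 20, 1)], names=names),
)
result = add_lookup_column(df_small, v_small)
print(result)
```

Rows of `df` without a matching entry in `v` would end up with NaN in the new column.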

How do I make this bs4 webscraping code travel down a table and store strings into a 2d array?

I am building a web scraper with BeautifulSoup that tracks a changing list. The HTML tags of the objects I am looking for are generic except for their ids, which are unique and constantly changing. I know the top id will always be the same, so I have gotten to the point where my output gives me the top result in the format I need, but I am trying to figure out a way of adding the next nine <tr> rows. I cannot use their ids because they change, so I thought of using .find_next('tr'), but I can't figure out how to get past the second <tr>. I know that there must be an elegant solution, but this is my first time using BS4, so I was hoping that someone could point me in the right direction.
import requests
from bs4 import BeautifulSoup
import numpy as np
website_url = requests.get('').text
soup = BeautifulSoup(website_url, 'lxml')
L = []
H = ["H1","H2","H3"]
for derp in soup.find(id='tr-id-1').findAll('a')[0:3:1]:
    L.append(derp.string)
A = np.vstack((H, L))
print(A)
This gets me the printed array in the right format, but only for the <tr> with the id I entered in the find. I can get the second row by writing
for derp in soup.find(id='tr-id-1').find_next('tr').findAll('a')[0:3:1]:
but I don't know how to get further. I am only trying to scrape the first 10 rows of the table, so I am thinking that I might need a while loop with a countdown marker? Is there a way to create a loop that takes the next 9 rows and appends the specific column data to the array?
This script prints the table with currencies (you can store the data in a list or a numpy array instead):
import requests
from bs4 import BeautifulSoup
url = 'https://coinmarketcap.com'
soup = BeautifulSoup(requests.get(url).text, 'lxml')
for tr in soup.select('#currencies tr'):
    # Skip rows without data cells (e.g. the header row)
    if not tr.select('td'):
        continue
    for i, td in enumerate(tr.select('td')[:-2]):
        txt = td.text.replace('\n', ' ').replace('*', '').strip()
        if i == 0:
            print('{: ^4}'.format(txt), end='|')
        else:
            print('{: ^24}'.format(txt), end='|')
    print()
Prints:
1 | BTC Bitcoin | $196,174,869,053 | $11020.77 | $24,501,665,241 | 17,800,475 BTC | -5.84% |
2 | ETH Ethereum | $30,603,567,177 | $286.61 | $8,821,119,760 | 106,776,759 ETH | -2.40% |
3 | XRP XRP | $16,148,857,177 | $0.379379 | $1,335,082,415 | 42,566,596,173 XRP | -4.06% |
4 | LTC Litecoin | $7,401,989,981 | $118.37 | $4,167,212,036 | 62,533,191 LTC | -4.15% |
5 | BCH Bitcoin Cash | $7,133,878,965 | $399.09 | $1,770,785,779 | 17,875,463 BCH | -3.87% |
6 | EOS EOS | $5,292,523,634 | $5.74 | $2,181,469,151 | 921,990,507 EOS | -2.58% |
7 | BNB Binance Coin | $4,621,088,383 | $32.73 | $267,344,456 | 141,175,490 BNB | -1.87% |
8 | USDT Tether | $3,684,665,566 | $0.999098 | $23,550,580,244 | 3,687,991,972 USDT | -0.53% |
9 | BSV Bitcoin SV | $3,534,271,930 | $197.94 | $351,461,514 | 17,854,986 BSV | -1.55% |
10 | TRX TRON | $2,113,478,617 | $0.031695 | $803,645,870 | 66,682,072,191 TRX | -0.96% |
11 | ADA Cardano | $1,981,827,482 | $0.076439 | $119,258,825 | 25,927,070,538 ADA | -3.69% |
12 | XLM Stellar | $1,940,474,350 | $0.099896 | $358,001,782 | 19,425,036,996 XLM | -2.60% |
13 | LEO UNUS SED LEO | $1,749,404,864 | $1.75 | $12,215,975 | 999,498,893 LEO | -2.01% |
14 | XMR Monero | $1,552,808,370 | $90.96 | $127,576,800 | 17,070,711 XMR | -0.03% |
15 | DASH Dash | $1,360,432,697 | $152.74 | $254,426,418 | 8,906,619 DASH | -4.60% |
16 | LINK Chainlink | $1,248,540,238 | $3.57 | $188,091,151 | 350,000,000 LINK | 4.24% |
17 | NEO NEO | $1,191,827,236 | $16.90 | $490,262,022 | 70,538,831 NEO | -4.08% |
18 | MIOTA IOTA | $1,069,692,929 | $0.384847 | $20,590,490 | 2,779,530,283 MIOTA | -2.74% |
19 | ATOM Cosmos | $1,021,900,211 | $5.36 | $63,724,815 | 190,688,439 ATOM | -4.31% |
20 | ETC Ethereum Classic | $872,993,215 | $7.81 | $751,025,201 | 111,727,165 ETC | -2.11% |
21 | XTZ Tezos | $817,988,097 | $1.24 | $7,008,121 | 658,849,612 XTZ | -1.62% |
22 | XEM NEM | $807,925,560 | $0.089770 | $24,960,771 | 8,999,999,999 XEM | -2.04% |
23 | ZEC Zcash | $700,497,262 | $101.45 | $281,578,113 | 6,905,119 ZEC | -1.92% |
24 | ONT Ontology | $675,289,519 | $1.36 | $133,352,633 | 494,757,215 ONT | -4.79% |
25 | MKR Maker | $655,751,917 | $655.75 | $1,156,367 | 1,000,000 MKR | 1.59% |
26 | CRO Crypto.com Chain | $552,054,533 | $0.071538 | $3,056,529 | 7,716,894,977 CRO | -2.14% |
27 | BTG Bitcoin Gold | $460,804,983 | $26.31 | $11,765,404 | 17,513,924 BTG | -5.22% |
28 | QTUM Qtum | $456,630,457 | $4.76 | $303,751,103 | 95,845,424 QTUM | -6.03% |
29 | DOGE Dogecoin | $452,796,907 | $0.003766 | $160,357,833 | 120,219,215,287 DOGE | 13.34% |
30 | VET VeChain | $416,649,897 | $0.007513 | $57,432,988 | 55,454,734,800 VET | 0.22% |
31 | BAT Basic Attenti... | $372,192,333 | $0.292373 | $28,777,633 | 1,273,006,300 BAT | -1.68% |
32 | USDC USD Coin | $366,029,067 | $0.997092 | $110,990,052 | 367,096,485 USDC | -0.35% |
33 | OMG OmiseGO | $323,389,435 | $2.31 | $102,874,355 | 140,245,398 OMG | -4.46% |
34 | VSYS V Systems | $312,745,092 | $0.178751 | $10,413,916 | 1,749,608,504 VSYS | -2.14% |
35 | DCR Decred | $297,378,169 | $29.63 | $1,739,049 | 10,037,096 DCR | -6.28% |
36 | BTT BitTorrent | $277,023,930 | $0.001306 | $56,422,080 | 212,116,500,000 BTT | -0.66% |
37 | HOT Holo | $229,759,018 | $0.001725 | $25,162,937 | 133,214,575,156 HOT | 0.21% |
38 | EGT Egretia | $222,938,874 | $0.052953 | $39,938,247 | 4,210,121,792 EGT | 2.24% |
39 | TUSD TrueUSD | $213,775,752 | $0.989291 | $131,504,347 | 216,089,898 TUSD | -1.22% |
40 | HC HyperCash | $208,038,166 | $4.78 | $13,740,143 | 43,529,781 HC | -5.15% |
41 | BCD Bitcoin Diamond | $202,441,610 | $1.09 | $3,011,645 | 186,492,898 BCD | -2.68% |
42 | RVN Ravencoin | $199,913,461 | $0.051124 | $15,906,528 | 3,910,345,000 RVN | -4.85% |
43 | HEDG HedgeTrade | $199,512,069 | $0.691805 | $1,395,797 | 288,393,355 HEDG | -4.07% |
44 | HT Huobi Token | $199,033,233 | $3.98 | $96,051,667 | 50,000,200 HT | -1.05% |
45 | AOA Aurora | $196,149,743 | $0.029982 | $10,250,381 | 6,542,330,148 AOA | 3.98% |
46 | LSK Lisk | $195,904,100 | $1.66 | $8,908,589 | 118,280,370 LSK | -4.19% |
47 | NPXS Pundi X | $193,713,239 | $0.000815 | $4,963,859 | 237,816,087,583 NPXS | -3.55% |
48 | KMD Komodo | $188,203,691 | $1.64 | $12,446,638 | 114,883,815 KMD | 5.35% |
49 | BTM Bytom | $188,040,836 | $0.187572 | $54,667,305 | 1,002,499,275 BTM | 11.50% |
50 | WAVES Waves | $177,993,586 | $1.78 | $12,125,048 | 100,000,000 WAVES | -5.44% |
51 | ZRX 0x | $171,782,082 | $0.287372 | $13,858,310 | 597,769,457 ZRX | -2.12% |
52 | QBIT Qubitica | $169,037,263 | $60.18 | $56,989 | 2,808,628 QBIT | -2.17% |
53 | BTS BitShares | $163,390,853 | $0.059810 | $3,167,151 | 2,731,850,000 BTS | 0.32% |
54 | PAX Paxos Standar... | $162,646,375 | $0.997482 | $136,732,457 | 163,056,875 PAX | -0.43% |
55 | NANO Nano | $161,436,979 | $1.21 | $15,637,856 | 133,248,297 NANO | -1.85% |
56 | BCN Bytecoin | $159,764,211 | $0.000868 | $116,637 | 184,066,828,814 BCN | -6.31% |
57 | REP Augur | $156,392,893 | $14.22 | $5,068,790 | 11,000,000 REP | -3.05% |
58 | NRG Energi | $156,304,538 | $8.70 | $1,129,572 | 17,972,740 NRG | -1.29% |
59 | MONA MonaCoin | $152,581,722 | $2.32 | $6,087,802 | 65,729,675 MONA | -3.20% |
60 | THR ThoreCoin | $148,663,548 | $1714.97 | $179,021 | 86,686 THR | -5.74% |
61 | IOST IOST | $146,115,831 | $0.012162 | $31,901,773 | 12,013,965,609 IOST | -2.13% |
62 | ICX ICON | $142,531,341 | $0.301076 | $10,890,859 | 473,406,688 ICX | -4.32% |
63 | DGB DigiByte | $138,427,137 | $0.011541 | $1,186,651 | 11,994,056,188 DGB | -5.09% |
64 | ZIL Zilliqa | $136,925,993 | $0.015762 | $13,494,196 | 8,687,360,058 ZIL | -2.40% |
65 | KCS KuCoin Shares | $129,815,988 | $1.45 | $26,079,874 | 89,659,415 KCS | -2.98% |
66 | LAMB Lambda | $125,711,992 | $0.251424 | $36,282,311 | 500,000,000 LAMB | -1.68% |
67 | XIN Mixin | $125,658,873 | $277.73 | $887,298 | 452,447 XIN | -5.58% |
68 | ABBC ABBC Coin | $125,028,718 | $0.247542 | $83,030,205 | 505,080,602 ABBC | -5.76% |
69 | SC Siacoin | $121,947,079 | $0.002949 | $2,043,076 | 41,353,612,700 SC | -4.06% |
70 | GXC GXChain | $121,693,968 | $2.03 | $3,042,648 | 60,000,000 GXC | -4.99% |
71 | AE Aeternity | $121,162,277 | $0.444154 | $38,500,380 | 272,793,174 AE | -4.23% |
72 | XVG Verge | $116,468,172 | $0.007369 | $1,604,278 | 15,805,409,499 XVG | -3.23% |
73 | ETP Metaverse ETP | $116,154,120 | $1.62 | $40,461,666 | 71,759,885 ETP | -9.33% |
74 | STEEM Steem | $110,124,952 | $0.340978 | $1,122,121 | 322,967,892 STEEM | -2.49% |
75 | ARDR Ardor | $109,691,634 | $0.109801 | $1,327,459 | 998,999,495 ARDR | -5.31% |
76 | ELF aelf | $108,074,954 | $0.217880 | $15,735,213 | 496,030,000 ELF | -1.35% |
77 | INB Insight Chain | $107,508,460 | $0.307252 | $5,435,619 | 349,902,689 INB | -6.46% |
78 | SOLVE SOLVE | $107,021,001 | $0.327169 | $9,286,887 | 327,112,052 SOLVE | 12.73% |
79 | VEST VestChain | $105,387,369 | $0.014889 | $396,465 | 7,078,400,000 VEST | -11.00% |
80 | QNT Quant | $101,347,141 | $10.37 | $13,719,402 | 9,777,236 QNT | 14.61% |
81 | NEX Nash Exchange | $99,192,791 | $2.74 | $1,985,748 | 36,196,678 NEX | -3.50% |
82 | THETA THETA | $98,543,466 | $0.113203 | $2,157,896 | 870,502,690 THETA | -5.94% |
83 | DENT Dent | $96,923,187 | $0.001332 | $6,458,186 | 72,745,838,994 DENT | -5.83% |
84 | WTC Waltonchain | $96,854,881 | $2.32 | $22,574,334 | 41,682,339 WTC | 16.17% |
85 | MCO Crypto.com | $93,731,944 | $5.93 | $8,346,732 | 15,793,831 MCO | -2.05% |
86 | MAID MaidSafeCoin | $93,684,812 | $0.207014 | $685,687 | 452,552,412 MAID | -6.19% |
87 | SNT Status | $91,353,836 | $0.026323 | $17,797,371 | 3,470,483,788 SNT | -3.86% |
88 | ENJ Enjin Coin | $89,470,435 | $0.115345 | $4,932,359 | 775,679,781 ENJ | -2.92% |
89 | EKT EDUCare | $89,461,163 | $0.124975 | $2,309,456 | 715,835,137 EKT | -2.38% |
90 | DAI Dai | $88,618,335 | $0.982690 | $18,747,792 | 90,179,367 DAI | -0.80% |
91 | GNT Golem | $88,468,793 | $0.091730 | $1,127,506 | 964,450,000 GNT | -3.47% |
92 | XZC Zcoin | $86,935,432 | $11.06 | $2,154,223 | 7,861,468 XZC | -4.14% |
93 | NAS Nebulas | $83,446,607 | $1.72 | $10,626,245 | 48,627,715 NAS | 4.07% |
94 | STRAT Stratis | $81,564,126 | $0.820611 | $3,546,613 | 99,394,330 STRAT | -7.04% |
95 | NET NEXT | $78,533,755 | $1.56 | $12,188,679 | 50,269,268 NET | 22.71% |
96 | REN Ren | $77,607,280 | $0.100819 | $8,913,932 | 769,764,831 REN | -8.98% |
97 | CCCX Clipper Coin | $74,987,003 | $0.019861 | $56,719 | 3,775,570,996 CCCX | 13.61% |
98 | MXM Maximine Coin | $70,357,334 | $0.042667 | $2,647,414 | 1,649,000,000 MXM | -2.66% |
99 | WAX WAX | $69,037,228 | $0.073224 | $548,672 | 942,821,662 WAX | -5.35% |
100 | SAN Santiment Net... | $69,026,823 | $1.10 | $21,347 | 62,660,371 SAN | -2.61% |
Using attribute and class selectors, you can easily scrape the table:
import requests
from pprint import pprint
from bs4 import BeautifulSoup
def make_soup(url: str) -> BeautifulSoup:
    res = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0'
    })
    res.raise_for_status()
    return BeautifulSoup(res.text, 'html.parser')
def scrape_coins(soup: BeautifulSoup) -> dict:
    table = soup.select_one('#currencies')
    coins = {}
    for row in table.select('tbody > tr'):
        symbol = row.select_one('.currency-symbol').text.strip()
        name = row.select_one('.currency-name-container').text.strip()
        cap = row.select_one('.market-cap')['data-usd']
        price = row.select_one('.price')['data-usd']
        volume = row.select_one('.volume')['data-usd']
        supply = row.select_one('[data-supply]')['data-supply']
        change = row.select_one('[data-percentusd]')['data-percentusd']
        # Key each coin by its ticker symbol
        coins[symbol] = {
            'name': name,
            'cap': float(cap),
            'price': float(price),
            'volume': float(volume),
            'supply': float(supply),
            'change': float(change),
        }
    return coins
if __name__ == "__main__":
    url = 'https://coinmarketcap.com/'
    soup = make_soup(url)
    info = scrape_coins(soup)
    pprint(info)
output:
{'BTC': {'cap': 196969226244.0,
         'change': -5.4235,
         'name': 'Bitcoin',
         'price': 11065.3915833,
         'supply': 17800475.0,
         'volume': 24574484943.9},
 'ETH': {'cap': 30724660168.6,
         'change': -2.00031,
         'name': 'Ethereum',
         'price': 287.746701554,
         'supply': 106776758.874,
         'volume': 8840470261.58},
 'LTC': {'cap': 7439287857.04,
         'change': -3.64038,
         'name': 'Litecoin',
         'price': 118.965428838,
         'supply': 62533190.774,
         'volume': 4181083872.28},
 'XRP': {'cap': 16149651071.9,
         'change': -4.05122,
         'name': 'XRP',
         'price': 0.379397286226,
         'supply': 42566596173.0,
         'volume': 1332204345.98}}
... and so on
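For the original question (start at a row with a known id and collect the following rows without knowing their ids), `find_next_siblings` with a `limit` may be enough. The sketch below assumes the rows are siblings inside the same table body; the helper name `first_rows` and the demo HTML are hypothetical, since the real page is not shown.

```python
from bs4 import BeautifulSoup

def first_rows(html, first_id, n_rows=10, n_cols=3):
    """Text of the first n_cols links in the row with id first_id and the rows after it."""
    soup = BeautifulSoup(html, "html.parser")
    first = soup.find(id=first_id)
    if first is None:
        return []
    # The known row plus up to n_rows - 1 <tr> siblings that follow it
    rows = [first] + first.find_next_siblings("tr", limit=n_rows - 1)
    return [[a.get_text(strip=True) for a in row.find_all("a")[:n_cols]]
            for row in rows]

# Hypothetical table: 12 rows, 4 links per row, ids that cannot be predicted
demo_rows = "".join(
    '<tr id="{}">{}</tr>'.format(
        "tr-id-1" if i == 1 else "x{}".format(i * 7),
        "".join("<td><a>{}{}</a></td>".format(c, i) for c in "abcd"),
    )
    for i in range(1, 13)
)
table = first_rows("<table>" + demo_rows + "</table>", "tr-id-1")
print(table)
```

The result is a plain list of lists, so a header row such as `["H1", "H2", "H3"]` can be prepended before passing it to `np.array` if a 2D array is needed.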
