Plot multiple bar plots for multiple columns - python
I have a dataset that looks roughly like the table below.
I need to create a barplot for each column TS1 to TS5 that counts the number of each item in that column. The items are one of the following: NOT_SEEN, NOT_ABLE, HIGH_BAR, and numerical values between 110 and 140 in steps of 2 (so 110, 112, 114, etc.).
I have found a way to do this which works fine, but what I am asking is whether there is a way to use a loop or something similar so I don't have to copy and paste the same code 5 times (once for each of the 5 columns).
This is what I have tried, and it works:
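import matplotlib.pyplot as plt
import seaborn as sns
from pandas.api.types import CategoricalDtype

# Note: range(110, 140, 2) stops at 138; use 142 as the upper bound if 140 should be a category
num_range = list(range(110, 140, 2))
OUTCOMES = ['NOT_SEEN', 'NOT_ABLE', 'HIGH_BAR']
OUTCOMES.extend([str(num) for num in num_range])
OUTCOMES = CategoricalDtype(OUTCOMES, ordered=True)
fig, ax = plt.subplots(2, 3, sharey=True)
fig.tight_layout(pad=3)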
The block below is what I copy 5 times, changing only the title (Testing 1, Testing 2, etc.) and the column name (TS1, TS2, ...) in the first line.
df["outcomes"] = df["TS1"].astype(OUTCOMES)
bpt=sns.countplot(x= "outcomes", data=df, palette='GnBu', ax=ax[0,0])
plt.setp(bpt.get_xticklabels(), rotation=60, size=6, ha='right')
bpt.set(xlabel='')
bpt.set_title('Testing 1')
The following code then comes after the 5 copies of the block above.
ax[1,2].set_visible(False)
plt.show()
I am sure there is a much better way to do this, but I'm new to all this.
Also, I need to make sure the bars of the barplot are ordered from left to right as: NOT_SEEN, NOT_ABLE, HIGH_BAR, and then 110, 112, 114, etc.
Using python 2.7 (not my choice unfortunately) and pandas 0.24.2.
+----+------+------+----------+----------+----------+----------+----------+
| ID | VIEW | YEAR | TS1 | TS2 | TS3 | TS4 | TS5 |
+----+------+------+----------+----------+----------+----------+----------+
| AA | NO | 2005 | | 134 | | HIGH_BAR | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | YES | 2015 | | | NOT_SEEN | | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | YES | 2010 | 118 | | | | NOT_ABLE |
+----+------+------+----------+----------+----------+----------+----------+
| BB | NO | 2020 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | YES | 2020 | | | | NOT_SEEN | |
+----+------+------+----------+----------+----------+----------+----------+
| AA | NO | 2010 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | NO | 2015 | | | | | 132 |
+----+------+------+----------+----------+----------+----------+----------+
| BB | YES | 2010 | | HIGH_BAR | | 140 | NOT_ABLE |
+----+------+------+----------+----------+----------+----------+----------+
| AA | YES | 2020 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | NO | 2010 | | | | 112 | |
+----+------+------+----------+----------+----------+----------+----------+
| AB | YES | 2015 | | | NOT_ABLE | | HIGH_BAR |
+----+------+------+----------+----------+----------+----------+----------+
| BB | NO | 2020 | | | | 145 | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | NO | 2015 | | 110 | | | |
+----+------+------+----------+----------+----------+----------+----------+
| AA | YES | 2010 | HIGH_BAR | | | NOT_SEEN | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | YES | 2015 | | | | | |
+----+------+------+----------+----------+----------+----------+----------+
| AA | NO | 2020 | | | | 118 | |
+----+------+------+----------+----------+----------+----------+----------+
| BA | YES | 2015 | | 180 | NOT_ABLE | | |
+----+------+------+----------+----------+----------+----------+----------+
| BB | YES | 2020 | | NOT_SEEN | | | 126 |
+----+------+------+----------+----------+----------+----------+----------+
You can put plotting lines in a function and call it in a for loop automatically changing column, title and axis in each iteration:
fig, axes = plt.subplots(2, 3, sharey=True)
fig.tight_layout(pad=3)

def plotting(column, title, ax):
    df["outcomes"] = df[column].astype(OUTCOMES)
    bpt = sns.countplot(x="outcomes", data=df, palette='GnBu', ax=ax)
    plt.setp(bpt.get_xticklabels(), rotation=60, size=6, ha='right')
    bpt.set(xlabel='')
    bpt.set_title(title)

columns = ['TS1', 'TS2', 'TS3', 'TS4', 'TS5']
titles = ['Testing 1', 'Testing 2', 'Testing 3', 'Testing 4', 'Testing 5']

for column, title, ax in zip(columns, titles, axes.flatten()):
    plotting(column, title, ax)

axes[1,2].set_visible(False)
plt.show()
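To check the bar order without drawing anything, the counts behind each subplot can be computed directly. The following is only a sketch under stated assumptions: the small frame is made-up stand-in data, the names `labels`, `outcome_dtype` and `ordered_counts` are hypothetical, and the upper bound of 142 assumes that 140 should be a valid category.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Category order: the three labels first, then 110, 112, ..., 140 as strings.
# Assumption: 140 should be included, hence the upper bound of 142.
labels = ['NOT_SEEN', 'NOT_ABLE', 'HIGH_BAR'] + [str(n) for n in range(110, 142, 2)]
outcome_dtype = CategoricalDtype(labels, ordered=True)

# Made-up stand-in for the real data (values stored as strings).
df = pd.DataFrame({
    'TS1': ['118', None, 'HIGH_BAR', 'NOT_SEEN', '118'],
    'TS2': ['134', 'HIGH_BAR', '110', '180', None],
})

def ordered_counts(frame, column):
    """Count every category in `column`, keeping the category order."""
    # Values that are not in `labels` (for example '180') become NaN here
    as_cat = frame[column].astype(outcome_dtype)
    # sort=False keeps the category order instead of sorting by frequency
    return as_cat.value_counts(sort=False)

counts = {col: ordered_counts(df, col) for col in ['TS1', 'TS2']}
print(counts['TS1'])
```

Values outside the category list are silently dropped by the conversion, which may explain bars that appear to be missing for entries such as 145 or 180.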
Related
Filter the pyspark dataframe based on values in list
I am fairly new to pyspark. I have a pyspark dataframe which has information about the number of times a particular person has received a message from a brand. It has three columns, id, brand and Count, as shown below.

| id  |  brand  | Count |
|:---:|:-------:|:-----:|
| 143 | AD-ABC  |   3   |
| 314 | AX-DEFG |   8   |
| 381 | AD-ABC  |   6   |
| 425 | AD-XYZP |   7   |
| 432 | AD-GAF  |   8   |
| 102 | AD-GAF  |   1   |
| 331 | AX-ABC  |  10   |
| 191 | AD-GAF  |   9   |
| 224 | AD-GAF  |   6   |

The brand column is a bit complex, and I want to derive a new column brand2 from the brand column as shown below (keep the characters after -):

+-----+---------+-------+--------+
| id  | brand   | Count | brand2 |
+-----+---------+-------+--------+
| 143 | AD-ABC  | 3     | ABC    |
| 314 | AX-DEFG | 8     | DEFG   |
| 381 | AD-ABC  | 6     | ABC    |
| 425 | AD-XYZP | 7     | XYZP   |
| 432 | AD-GAF  | 8     | GAF    |
| 102 | AD-GAF  | 1     | GAF    |
| 331 | AX-ABC  | 10    | ABC    |
| 191 | AD-GAF  | 9     | GAF    |
| 224 | AD-GAF  | 6     | GAF    |
+-----+---------+-------+--------+

I have a very large list of the brands that I want to select from the dataframe, as below:

brand_subset = ['ABC', 'DEF', 'XYZP'] #The list is very large !!

The desired dataframe is as below:

+-----+---------+-------+--------+
| id  | brand   | Count | brand2 |
+-----+---------+-------+--------+
| 143 | AD-ABC  | 3     | ABC    |
| 381 | AD-ABC  | 6     | ABC    |
| 425 | AD-XYZP | 7     | XYZP   |
| 331 | AX-ABC  | 10    | ABC    |
+-----+---------+-------+--------+

The above is just a sample scenario; in practice both the list and the table are very large. Any help will be appreciated. (It would be good if the solution is optimized considering the size of the database.)
Split the brand column and get the second element, then use isin to check if brand2 is in the list:

import pyspark.sql.functions as F

brand_subset = ['ABC', 'DEF', 'XYZP']
(df.withColumn("brand2", F.split("brand", "-")[1])
   .where(F.col("brand2").isin(brand_subset))).show()

or:

(df.withColumn("brand2", F.split("brand", "-")[1])
   .filter(F.col("brand2").isin(brand_subset))).show()

+---+-------+-----+------+
| id|  brand|Count|brand2|
+---+-------+-----+------+
|143| AD-ABC|    3|   ABC|
|381| AD-ABC|    6|   ABC|
|425|AD-XYZP|    7|  XYZP|
|331| AX-ABC|   10|   ABC|
+---+-------+-----+------+
Graph python similar to R
I have a table like this one (ignore the columns "Index" and "D"):

+-------+-----------------------+----------+----------+----------+
| Index | Type                  | Male     | Female   | D        |
+-------+-----------------------+----------+----------+----------+
| 44    | Life struggles        | 2.097324 | 3.681356 | 1.584032 |
| 2     | Writing notes         | 2.677262 | 3.354730 | 0.677468 |
| 18    | Empathy               | 3.528117 | 4.083051 | 0.554933 |
| 12    | Criminal damage       | 2.926650 | 2.374150 | 0.552501 |
| 20    | Giving                | 2.650367 | 3.196944 | 0.546577 |
| 21    | Compassion to animals | 3.666667 | 4.178268 | 0.511602 |
| 33    | Mood swings           | 2.965937 | 3.451613 | 0.485676 |
| 10    | Funniness             | 3.574572 | 3.104907 | 0.469665 |
| 38    | Children              | 3.354523 | 3.805415 | 0.450891 |
| 47    | Small - big dogs      | 3.221951 | 2.801695 | 0.420256 |
+-------+-----------------------+----------+----------+----------+

and I am trying to make a similar graph. I know how to do it in R but not in Python. I tried this:

sns.stripplot(data=df,y="Male",color="Blue")
sns.stripplot(data=df,y="Female",color="red")

But I don't know how to continue. Does someone have an idea?
This is easily done with matplotlib; it is simply a scatter plot with categories as y-values.

import matplotlib.pyplot as plt

plt.style.use('ggplot')
fig, ax = plt.subplots()
ax.plot(df['Male'], df['Type'], 'o', color='xkcd:reddish', ms=10, label='Male')
ax.plot(df['Female'], df['Type'], 'o', color='xkcd:teal', ms=10, label='Female')
ax.axvline(3, ls='-', color='k')
ax.set_xlim(1, 5)
ax.set_xlabel('avg response')
ax.set_ylabel('Variable')
ax.legend(bbox_to_anchor=(0.5, 1.02), loc='lower center', ncol=2, title='group')
fig.tight_layout()
Plotting a CDF from a multiclass pandas dataframe
I understand the package empiricaldist provides a CDF function as per the documentation. However, I find it tricky to plot my dataframe when a column has multiple values.

df.head()

+------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
|      | trip_id | seconds_start | seconds_end | duration | distance | speed | acceleration | lat_start | lon_start | lat_end   | lon_end   | travelmode |
+------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+
| 0    | 318410  | 1461743310    | 1461745298  | 1988     | 5121.49  | 2.58  | 0.00130      | 41.162687 | -8.615425 | 41.177888 | -8.597549 | car        |
| 1    | 318411  | 1461749359    | 1461750290  | 931      | 1520.71  | 1.63  | 0.00175      | 41.177949 | -8.597074 | 41.177839 | -8.597574 | bus        |
| 2    | 318421  | 1461806871    | 1461806941  | 70       | 508.15   | 7.26  | 0.10370      | 37.091240 | -8.211239 | 37.092322 | -8.206681 | foot       |
| 3    | 318422  | 1461837354    | 1461838024  | 670      | 1207.39  | 1.80  | 0.00269      | 37.092082 | -8.205060 | 37.091659 | -8.206462 | car        |
| 4    | 318425  | 1461852790    | 1461853845  | 1055     | 1470.49  | 1.39  | 0.00132      | 37.091628 | -8.202143 | 37.092095 | -8.205070 | foot       |
+------+---------+---------------+-------------+----------+----------+-------+--------------+-----------+-----------+-----------+-----------+------------+

I would like to plot the CDF of each column separately for each travel mode.

groups = df.groupby('travelmode')

However, I don't really understand from the documentation how this could be done.
You can plot them in a loop like this:

import matplotlib.pyplot as plt
from empiricaldist import Cdf

def decorate_plot(title):
    ''' Adds labels to plot '''
    plt.xlabel('Outcome')
    plt.ylabel('CDF')
    plt.title(title)

for tm in df['travelmode'].unique():
    for col in df.columns:
        if col != 'travelmode':
            # Create a new figure for each plot
            fig, ax = plt.subplots()
            # Use only the rows that belong to this travel mode
            d4 = Cdf.from_seq(df.loc[df['travelmode'] == tm, col])
            d4.plot()
            decorate_plot(f"{tm} - {col}")
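If empiricaldist is not available, the per-group CDF values can also be computed by hand. This is a minimal sketch that swaps in plain NumPy and pandas instead of `Cdf.from_seq`; the helper name `ecdf` is hypothetical, and the small frame only reuses the duration and travelmode values from the `df.head()` output in the question.

```python
import numpy as np
import pandas as pd

def ecdf(values):
    """Return the sorted values and the share of observations <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Subset of the sample rows shown in the question
df = pd.DataFrame({
    'duration': [1988, 931, 70, 670, 1055],
    'travelmode': ['car', 'bus', 'foot', 'car', 'foot'],
})

# One (x, y) pair per travel mode, computed only from that mode's rows
cdfs = {mode: ecdf(group['duration']) for mode, group in df.groupby('travelmode')}

for mode, (x, y) in cdfs.items():
    print(mode, x, y)
```

Each `(x, y)` pair can then be passed to `plt.step(x, y, where='post')` to draw one curve per travel mode on a shared axis.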
How do I make this bs4 webscraping code travel down a table and store strings into a 2d array?
I am building a web scraper that tracks a changing list using BS. The html tags for the objects I am looking for are generic except for their id's, which are unique and constantly changing. I know the top id will always be the same, so I have gotten to the point where my output gives me the top result in the format I need, but I am trying to figure out a way of adding the next nine rows. I cannot use their id's because they change, so I thought of using .find_next('tr'), but I can't figure out how to get it past the second row. I know that there must be an elegant solution, but it is my first time using BS4, so I was hoping that someone could point me in the right direction.

import requests
from bs4 import BeautifulSoup
import numpy as np

website_url = requests.get('').text
soup = BeautifulSoup(website_url, 'lxml')
L = []
H = ["H1","H2","H3"]
for derp in soup.find(id='tr-id-1').findAll('a')[0:3:1]:
    L.append(derp.string)
A = np.vstack((H, L))
print(A)

This gets me the printed array in the right format, but only for the row with the id I entered in the find. I can get the second row by writing

for derp in soup.find(id='tr-id-1').find_next('tr').findAll('a')[0:3:1]:

but I don't know how to get further. I am only trying to scrape the first 10 rows of the table, so I am thinking that I might need a while loop with a countdown marker? I am wondering if there is a way to create a loop that selectively takes the next 9 rows and appends the specific column data to the array.
This script prints the table with currencies (you can store data to list or numpy instead):

import requests
from bs4 import BeautifulSoup

url = 'https://coinmarketcap.com'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

for tr in soup.select('#currencies tr'):
    if not tr.select('td'):
        continue
    for i, td in enumerate(tr.select('td')[:-2]):
        txt = td.text.replace('\n', ' ').replace('*', '').strip()
        if i == 0:
            print('{: ^4}'.format(txt), end='|')
        else:
            print('{: ^24}'.format(txt), end='|')
    print()

Prints:

1 | BTC Bitcoin | $196,174,869,053 | $11020.77 | $24,501,665,241 | 17,800,475 BTC | -5.84% |
2 | ETH Ethereum | $30,603,567,177 | $286.61 | $8,821,119,760 | 106,776,759 ETH | -2.40% |
3 | XRP XRP | $16,148,857,177 | $0.379379 | $1,335,082,415 | 42,566,596,173 XRP | -4.06% |
4 | LTC Litecoin | $7,401,989,981 | $118.37 | $4,167,212,036 | 62,533,191 LTC | -4.15% |
5 | BCH Bitcoin Cash | $7,133,878,965 | $399.09 | $1,770,785,779 | 17,875,463 BCH | -3.87% |
6 | EOS EOS | $5,292,523,634 | $5.74 | $2,181,469,151 | 921,990,507 EOS | -2.58% |
7 | BNB Binance Coin | $4,621,088,383 | $32.73 | $267,344,456 | 141,175,490 BNB | -1.87% |
8 | USDT Tether | $3,684,665,566 | $0.999098 | $23,550,580,244 | 3,687,991,972 USDT | -0.53% |
9 | BSV Bitcoin SV | $3,534,271,930 | $197.94 | $351,461,514 | 17,854,986 BSV | -1.55% |
10 | TRX TRON | $2,113,478,617 | $0.031695 | $803,645,870 | 66,682,072,191 TRX | -0.96% |
11 | ADA Cardano | $1,981,827,482 | $0.076439 | $119,258,825 | 25,927,070,538 ADA | -3.69% |
12 | XLM Stellar | $1,940,474,350 | $0.099896 | $358,001,782 | 19,425,036,996 XLM | -2.60% |
13 | LEO UNUS SED LEO | $1,749,404,864 | $1.75 | $12,215,975 | 999,498,893 LEO | -2.01% |
14 | XMR Monero | $1,552,808,370 | $90.96 | $127,576,800 | 17,070,711 XMR | -0.03% |
15 | DASH Dash | $1,360,432,697 | $152.74 | $254,426,418 | 8,906,619 DASH | -4.60% |
16 | LINK Chainlink | $1,248,540,238 | $3.57 | $188,091,151 | 350,000,000 LINK | 4.24% |
17 | NEO NEO | $1,191,827,236 | $16.90 | $490,262,022 | 70,538,831 NEO | -4.08% |
18 | MIOTA IOTA | $1,069,692,929 | $0.384847 | $20,590,490 | 2,779,530,283 MIOTA | -2.74% |
19 | ATOM Cosmos | $1,021,900,211 | $5.36 | $63,724,815 | 190,688,439 ATOM | -4.31% |
20 | ETC Ethereum Classic | $872,993,215 | $7.81 | $751,025,201 | 111,727,165 ETC | -2.11% |
21 | XTZ Tezos | $817,988,097 | $1.24 | $7,008,121 | 658,849,612 XTZ | -1.62% |
22 | XEM NEM | $807,925,560 | $0.089770 | $24,960,771 | 8,999,999,999 XEM | -2.04% |
23 | ZEC Zcash | $700,497,262 | $101.45 | $281,578,113 | 6,905,119 ZEC | -1.92% |
24 | ONT Ontology | $675,289,519 | $1.36 | $133,352,633 | 494,757,215 ONT | -4.79% |
25 | MKR Maker | $655,751,917 | $655.75 | $1,156,367 | 1,000,000 MKR | 1.59% |
26 | CRO Crypto.com Chain | $552,054,533 | $0.071538 | $3,056,529 | 7,716,894,977 CRO | -2.14% |
27 | BTG Bitcoin Gold | $460,804,983 | $26.31 | $11,765,404 | 17,513,924 BTG | -5.22% |
28 | QTUM Qtum | $456,630,457 | $4.76 | $303,751,103 | 95,845,424 QTUM | -6.03% |
29 | DOGE Dogecoin | $452,796,907 | $0.003766 | $160,357,833 | 120,219,215,287 DOGE | 13.34% |
30 | VET VeChain | $416,649,897 | $0.007513 | $57,432,988 | 55,454,734,800 VET | 0.22% |
31 | BAT Basic Attenti... | $372,192,333 | $0.292373 | $28,777,633 | 1,273,006,300 BAT | -1.68% |
32 | USDC USD Coin | $366,029,067 | $0.997092 | $110,990,052 | 367,096,485 USDC | -0.35% |
33 | OMG OmiseGO | $323,389,435 | $2.31 | $102,874,355 | 140,245,398 OMG | -4.46% |
34 | VSYS V Systems | $312,745,092 | $0.178751 | $10,413,916 | 1,749,608,504 VSYS | -2.14% |
35 | DCR Decred | $297,378,169 | $29.63 | $1,739,049 | 10,037,096 DCR | -6.28% |
36 | BTT BitTorrent | $277,023,930 | $0.001306 | $56,422,080 | 212,116,500,000 BTT | -0.66% |
37 | HOT Holo | $229,759,018 | $0.001725 | $25,162,937 | 133,214,575,156 HOT | 0.21% |
38 | EGT Egretia | $222,938,874 | $0.052953 | $39,938,247 | 4,210,121,792 EGT | 2.24% |
39 | TUSD TrueUSD | $213,775,752 | $0.989291 | $131,504,347 | 216,089,898 TUSD | -1.22% |
40 | HC HyperCash | $208,038,166 | $4.78 | $13,740,143 | 43,529,781 HC | -5.15% |
41 | BCD Bitcoin Diamond | $202,441,610 | $1.09 | $3,011,645 | 186,492,898 BCD | -2.68% |
42 | RVN Ravencoin | $199,913,461 | $0.051124 | $15,906,528 | 3,910,345,000 RVN | -4.85% |
43 | HEDG HedgeTrade | $199,512,069 | $0.691805 | $1,395,797 | 288,393,355 HEDG | -4.07% |
44 | HT Huobi Token | $199,033,233 | $3.98 | $96,051,667 | 50,000,200 HT | -1.05% |
45 | AOA Aurora | $196,149,743 | $0.029982 | $10,250,381 | 6,542,330,148 AOA | 3.98% |
46 | LSK Lisk | $195,904,100 | $1.66 | $8,908,589 | 118,280,370 LSK | -4.19% |
47 | NPXS Pundi X | $193,713,239 | $0.000815 | $4,963,859 | 237,816,087,583 NPXS | -3.55% |
48 | KMD Komodo | $188,203,691 | $1.64 | $12,446,638 | 114,883,815 KMD | 5.35% |
49 | BTM Bytom | $188,040,836 | $0.187572 | $54,667,305 | 1,002,499,275 BTM | 11.50% |
50 | WAVES Waves | $177,993,586 | $1.78 | $12,125,048 | 100,000,000 WAVES | -5.44% |
51 | ZRX 0x | $171,782,082 | $0.287372 | $13,858,310 | 597,769,457 ZRX | -2.12% |
52 | QBIT Qubitica | $169,037,263 | $60.18 | $56,989 | 2,808,628 QBIT | -2.17% |
53 | BTS BitShares | $163,390,853 | $0.059810 | $3,167,151 | 2,731,850,000 BTS | 0.32% |
54 | PAX Paxos Standar... | $162,646,375 | $0.997482 | $136,732,457 | 163,056,875 PAX | -0.43% |
55 | NANO Nano | $161,436,979 | $1.21 | $15,637,856 | 133,248,297 NANO | -1.85% |
56 | BCN Bytecoin | $159,764,211 | $0.000868 | $116,637 | 184,066,828,814 BCN | -6.31% |
57 | REP Augur | $156,392,893 | $14.22 | $5,068,790 | 11,000,000 REP | -3.05% |
58 | NRG Energi | $156,304,538 | $8.70 | $1,129,572 | 17,972,740 NRG | -1.29% |
59 | MONA MonaCoin | $152,581,722 | $2.32 | $6,087,802 | 65,729,675 MONA | -3.20% |
60 | THR ThoreCoin | $148,663,548 | $1714.97 | $179,021 | 86,686 THR | -5.74% |
61 | IOST IOST | $146,115,831 | $0.012162 | $31,901,773 | 12,013,965,609 IOST | -2.13% |
62 | ICX ICON | $142,531,341 | $0.301076 | $10,890,859 | 473,406,688 ICX | -4.32% |
63 | DGB DigiByte | $138,427,137 | $0.011541 | $1,186,651 | 11,994,056,188 DGB | -5.09% |
64 | ZIL Zilliqa | $136,925,993 | $0.015762 | $13,494,196 | 8,687,360,058 ZIL | -2.40% |
65 | KCS KuCoin Shares | $129,815,988 | $1.45 | $26,079,874 | 89,659,415 KCS | -2.98% |
66 | LAMB Lambda | $125,711,992 | $0.251424 | $36,282,311 | 500,000,000 LAMB | -1.68% |
67 | XIN Mixin | $125,658,873 | $277.73 | $887,298 | 452,447 XIN | -5.58% |
68 | ABBC ABBC Coin | $125,028,718 | $0.247542 | $83,030,205 | 505,080,602 ABBC | -5.76% |
69 | SC Siacoin | $121,947,079 | $0.002949 | $2,043,076 | 41,353,612,700 SC | -4.06% |
70 | GXC GXChain | $121,693,968 | $2.03 | $3,042,648 | 60,000,000 GXC | -4.99% |
71 | AE Aeternity | $121,162,277 | $0.444154 | $38,500,380 | 272,793,174 AE | -4.23% |
72 | XVG Verge | $116,468,172 | $0.007369 | $1,604,278 | 15,805,409,499 XVG | -3.23% |
73 | ETP Metaverse ETP | $116,154,120 | $1.62 | $40,461,666 | 71,759,885 ETP | -9.33% |
74 | STEEM Steem | $110,124,952 | $0.340978 | $1,122,121 | 322,967,892 STEEM | -2.49% |
75 | ARDR Ardor | $109,691,634 | $0.109801 | $1,327,459 | 998,999,495 ARDR | -5.31% |
76 | ELF aelf | $108,074,954 | $0.217880 | $15,735,213 | 496,030,000 ELF | -1.35% |
77 | INB Insight Chain | $107,508,460 | $0.307252 | $5,435,619 | 349,902,689 INB | -6.46% |
78 | SOLVE SOLVE | $107,021,001 | $0.327169 | $9,286,887 | 327,112,052 SOLVE | 12.73% |
79 | VEST VestChain | $105,387,369 | $0.014889 | $396,465 | 7,078,400,000 VEST | -11.00% |
80 | QNT Quant | $101,347,141 | $10.37 | $13,719,402 | 9,777,236 QNT | 14.61% |
81 | NEX Nash Exchange | $99,192,791 | $2.74 | $1,985,748 | 36,196,678 NEX | -3.50% |
82 | THETA THETA | $98,543,466 | $0.113203 | $2,157,896 | 870,502,690 THETA | -5.94% |
83 | DENT Dent | $96,923,187 | $0.001332 | $6,458,186 | 72,745,838,994 DENT | -5.83% |
84 | WTC Waltonchain | $96,854,881 | $2.32 | $22,574,334 | 41,682,339 WTC | 16.17% |
85 | MCO Crypto.com | $93,731,944 | $5.93 | $8,346,732 | 15,793,831 MCO | -2.05% |
86 | MAID MaidSafeCoin | $93,684,812 | $0.207014 | $685,687 | 452,552,412 MAID | -6.19% |
87 | SNT Status | $91,353,836 | $0.026323 | $17,797,371 | 3,470,483,788 SNT | -3.86% |
88 | ENJ Enjin Coin | $89,470,435 | $0.115345 | $4,932,359 | 775,679,781 ENJ | -2.92% |
89 | EKT EDUCare | $89,461,163 | $0.124975 | $2,309,456 | 715,835,137 EKT | -2.38% |
90 | DAI Dai | $88,618,335 | $0.982690 | $18,747,792 | 90,179,367 DAI | -0.80% |
91 | GNT Golem | $88,468,793 | $0.091730 | $1,127,506 | 964,450,000 GNT | -3.47% |
92 | XZC Zcoin | $86,935,432 | $11.06 | $2,154,223 | 7,861,468 XZC | -4.14% |
93 | NAS Nebulas | $83,446,607 | $1.72 | $10,626,245 | 48,627,715 NAS | 4.07% |
94 | STRAT Stratis | $81,564,126 | $0.820611 | $3,546,613 | 99,394,330 STRAT | -7.04% |
95 | NET NEXT | $78,533,755 | $1.56 | $12,188,679 | 50,269,268 NET | 22.71% |
96 | REN Ren | $77,607,280 | $0.100819 | $8,913,932 | 769,764,831 REN | -8.98% |
97 | CCCX Clipper Coin | $74,987,003 | $0.019861 | $56,719 | 3,775,570,996 CCCX | 13.61% |
98 | MXM Maximine Coin | $70,357,334 | $0.042667 | $2,647,414 | 1,649,000,000 MXM | -2.66% |
99 | WAX WAX | $69,037,228 | $0.073224 | $548,672 | 942,821,662 WAX | -5.35% |
100 | SAN Santiment Net... | $69,026,823 | $1.10 | $21,347 | 62,660,371 SAN | -2.61% |
Using attribute and class selectors, you can easily scrape the table:

import requests
from bs4 import BeautifulSoup

def make_soup(url: str) -> BeautifulSoup:
    res = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0'
    })
    res.raise_for_status()
    return BeautifulSoup(res.text, 'html.parser')

def scrape_coins(soup: BeautifulSoup) -> dict:
    table = soup.select_one('#currencies')
    coins = {}
    for row in table.select('tbody > tr'):
        symbol = row.select_one('.currency-symbol').text.strip()
        name = row.select_one('.currency-name-container').text.strip()
        cap = row.select_one('.market-cap')['data-usd']
        price = row.select_one('.price')['data-usd']
        volume = row.select_one('.volume')['data-usd']
        supply = row.select_one('[data-supply]')['data-supply']
        change = row.select_one('[data-percentusd]')['data-percentusd']
        coins[symbol] = {
            'name': name,
            'cap': float(cap),
            'price': float(price),
            'volume': float(volume),
            'supply': float(supply),
            'change': float(change),
        }
    return coins

if __name__ == "__main__":
    url = 'https://coinmarketcap.com/'
    soup = make_soup(url)
    info = scrape_coins(soup)
    from pprint import pprint
    pprint(info)

output:

{'BTC': {'cap': 196969226244.0,
         'change': -5.4235,
         'name': 'Bitcoin',
         'price': 11065.3915833,
         'supply': 17800475.0,
         'volume': 24574484943.9},
 'ETH': {'cap': 30724660168.6,
         'change': -2.00031,
         'name': 'Ethereum',
         'price': 287.746701554,
         'supply': 106776758.874,
         'volume': 8840470261.58},
 'LTC': {'cap': 7439287857.04,
         'change': -3.64038,
         'name': 'Litecoin',
         'price': 118.965428838,
         'supply': 62533190.774,
         'volume': 4181083872.28},
 'XRP': {'cap': 16149651071.9,
         'change': -4.05122,
         'name': 'XRP',
         'price': 0.379397286226,
         'supply': 42566596173.0,
         'volume': 1332204345.98}}

... and so on
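For the original question (the row with the known id plus the next nine rows, three links each), one option is `find_next_siblings` with a `limit`. The sketch below runs on generated HTML rather than a real page, so the markup, the `n`/`p`/`q` cell values and the variable names are all assumptions for illustration; only the `tr-id-1` id and the H1 to H3 header come from the question.

```python
from bs4 import BeautifulSoup

# Made-up table with 12 rows; only the first row's id is assumed to be known
rows_html = ''.join(
    '<tr id="tr-id-{0}"><td><a>n{0}</a></td><td><a>p{0}</a></td>'
    '<td><a>q{0}</a></td><td><a>x{0}</a></td></tr>'.format(i)
    for i in range(1, 13)
)
soup = BeautifulSoup('<table>' + rows_html + '</table>', 'html.parser')

# Start at the known row, then take the next nine sibling rows
first = soup.find(id='tr-id-1')
rows = [first] + list(first.find_next_siblings('tr', limit=9))

# Keep the text of the first three links in each row, under a header row
header = ['H1', 'H2', 'H3']
table = [header] + [
    [a.get_text(strip=True) for a in tr.find_all('a')[:3]]
    for tr in rows
]

for line in table:
    print(line)
```

The resulting list of lists can be passed to `np.array(table)` if a 2D NumPy array is needed.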
How to shift selected rows to next adjacent column in pandas?
df3=pd.read_excel(r'may_2019.xlsx',sheet_name='Sheet2')

Here is a sample of my pandas DataFrame:

+--------------------------+
| Col1                     |
+--------------------------+
| G | 20 mins | 2015       |
| NR | 2                   |
| G | 11 mins | 302        |
| TV-MA | 44 mins | Apr 30 |
| G | 198                  |
| TV-MA | Apr 30           |
| NR | 2012                |
| NR | 57 mins             |
+--------------------------+

There are some exceptions in the data (i.e. 2, 198, 302).

Desired output for the given sample:

+--------+----------+------+-------+-----+
| Rating | Duration | Year | Month | Day |
+--------+----------+------+-------+-----+
| G      | 20       | 2015 |       |     |
| NR     |          | 2    |       |     |
| G      | 11       | 302  |       |     |
| TV-MA  | 44       |      | Apr   | 30  |
| G      |          | 198  |       |     |
| TV-MA  |          |      | Jan   | 20  |
| NR     |          | 2012 |       |     |
| NR     | 57       |      |       |     |
+--------+----------+------+-------+-----+

Things I've tried:

df5=pd.DataFrame(df3.Col1.str.split("|").tolist(),columns=['r','d','y'])
indx=df5.loc[df5.d.str.contains('\d{4}')].index
df6.loc[indx,['d','y']]=df5.loc[indx,['d','y']].shift(1,axis=1)

Then I failed to shift the date according to my required table, so I tried to create a function, but that did not work either.

def split_data(input):
    newd=input.split("|")
    if len(newd)==3:
        df['date']=newd[2]
        df['du']=newd[1]
        df['rating']=newd[0]
    if len(newd)==2:
        df['rating']=newd[0]
        if re.findall('\d{4}',newd[1]):
            df['date']=newd[1]
        else:
            df['du']=newd[1]
    return df

The things I've tried don't provide a complete solution for all cases. Does anyone know how to do this with pandas?
Looking at your inputs, I would first try reading in the data properly. It seems the separators etc. of the Excel file are not being defined correctly when the file is read.
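If the file cannot be re-read with proper separators, the combined column can also be split after loading. This is a rough sketch, not a definitive solution: the helper name `parse_entry` is hypothetical, and it assumes that a piece ending in "mins" is a duration, a three-letter word followed by a number is a month and day, and any other bare number belongs in Year (as in the desired output for 2, 198 and 302).

```python
import re
import pandas as pd

COLUMNS = ['Rating', 'Duration', 'Year', 'Month', 'Day']

def parse_entry(text):
    """Split one 'Rating | ...' string into the five target fields."""
    parts = [p.strip() for p in text.split('|')]
    row = dict.fromkeys(COLUMNS, '')
    row['Rating'] = parts[0]
    for part in parts[1:]:
        duration = re.fullmatch(r'(\d+) mins', part)
        month_day = re.fullmatch(r'([A-Za-z]{3}) (\d{1,2})', part)
        if duration:
            row['Duration'] = duration.group(1)
        elif month_day:
            row['Month'], row['Day'] = month_day.groups()
        elif part.isdigit():
            # Assumption: any remaining bare number is treated as the year
            row['Year'] = part
    return row

# Sample values from the question
col1 = pd.Series([
    'G | 20 mins | 2015', 'NR | 2', 'G | 11 mins | 302',
    'TV-MA | 44 mins | Apr 30', 'G | 198', 'TV-MA | Apr 30',
    'NR | 2012', 'NR | 57 mins',
])

result = pd.DataFrame([parse_entry(s) for s in col1], columns=COLUMNS)
print(result)
```

Because each piece is classified by its own pattern rather than by its position, rows with two or three pieces are handled by the same code path and nothing needs to be shifted afterwards.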