Pandas reading in file that has uneven column lengths - python

I'm trying to read in a discharge data file which looks like this:
Station number: 420
Location: Kotagaon Shringe
Latitude: 27 45 00
River: Kali Gandaki
Longitude: 84 20 50
Year: 2001
Mean daily discharge in m3/s
============================
Day Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. Year
01 118 99.3 85.9 75.5 119 182 656 2790 1690 402 232 158
02 123 97.4 82.9 74.3 134 251 514 2420 2180 397 230 158
03 118 95.5 80.7 73.1 168 377 466 2190 2190 386 226 157
-------------------------------- Skipping some rows of no real interest
25 95.5 85.5 70.7 83.3 163 583 898 3230 485 257 177 123
26 94.1 88.6 69.9 84.6 167 579 996 2330 474 252 175 121
27 92.2 88.6 71.9 88.1 166 736 1180 2270 461 248 173 120
28 91.8 87.3 69.9 91.3 172 419 1020 2270 431 246 168 118
29 95.5 71.9 93.2 165 446 1670 2140 410 244 163 118
30 98.4 76.0 109 176 575 2040 2100 403 239 159 117
31 98.4 75.1 174 3330 1600 234 117
My problem is that when using whitespace as the separator, the March value for day 29 gets shifted into the February column, since February has no day 29. The same happens everywhere else a value is missing.
Is there a good way to work around this?
I have looked for solutions online, but everything I could find deals with uneven row lengths, not uneven column lengths.
My attempt so far has resulted in this code:
disc = pd.read_csv(filename, header=6, sep=r'\s+', nrows=31)
disc['Year'] = 2001
With the dataframe looking like:
Day Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sep. Oct. Nov. Dec. Year
0 1 118.0 99.3 85.9 75.5 119 182 656 2790.0 1690.0 402.0 232.0 158.0 2001
1 2 123.0 97.4 82.9 74.3 134 251 514 2420.0 2180.0 397.0 230.0 158.0 2001
2 3 118.0 95.5 80.7 73.1 168 377 466 2190.0 2190.0 386.0 226.0 157.0 2001
----------------------------------------------- Skipping some rows of no real interest
28 29 95.5 71.9 93.2 165.0 446 1670 2140 410.0 244.0 163.0 118.0 NaN 2001
29 30 98.4 76.0 109.0 176.0 575 2040 2100 403.0 239.0 159.0 117.0 NaN 2001
30 31 98.4 75.1 174.0 3330.0 1600 234 117 NaN NaN NaN NaN NaN 2001

You can use pd.read_fwf() for reading fixed-width files and leverage the skiprows keyword:
disc = pd.read_fwf('test.csv', skiprows=11)
Yields:
Day Jan. Feb. Mar. Apr. ... Sep. Oct. Nov. Dec. Year
0 1 118.0 99.3 85.9 75.5 ... 1690.0 402 232.0 158 NaN
1 2 123.0 97.4 82.9 74.3 ... 2180.0 397 230.0 158 NaN
2 3 118.0 95.5 80.7 73.1 ... 2190.0 386 226.0 157 NaN
3 25 95.5 85.5 70.7 83.3 ... 485.0 257 177.0 123 NaN
4 26 94.1 88.6 69.9 84.6 ... 474.0 252 175.0 121 NaN
5 27 92.2 88.6 71.9 88.1 ... 461.0 248 173.0 120 NaN
6 28 91.8 87.3 69.9 91.3 ... 431.0 246 168.0 118 NaN
7 29 95.5 NaN 71.9 93.2 ... 410.0 244 163.0 118 NaN
8 30 98.4 NaN 76.0 109.0 ... 403.0 239 159.0 117 NaN
9 31 98.4 NaN 75.1 NaN ... NaN 234 NaN 117 NaN
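By default read_fwf infers the column boundaries from the first 100 data lines (the infer_nrows parameter), which is why the short end-of-month rows come through as NaN instead of being shifted. If that inference ever misaligns a field, you can pass explicit boundaries instead. A minimal sketch, assuming the file is named test.csv; the character offsets below are illustrative placeholders, not measured from the real file:
import pandas as pd

# Hypothetical (start, end) character offsets for each column; measure them
# against the actual file before relying on them.
colspecs = [(0, 3), (4, 10), (11, 17), (18, 24), (25, 31), (32, 38), (39, 45),
            (46, 52), (53, 59), (60, 66), (67, 73), (74, 80), (81, 87)]

disc = pd.read_fwf('test.csv', colspecs=colspecs, skiprows=11, nrows=31)
disc['Year'] = 2001  # add the year column as in the original attempt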

Related

How can I create group column

I want to divide my data into groups of two rows using pandas.
For example:
df_A: raw data
data1 data2 data3
23 13.3 983
13 33.4 124
24 62.3 574
25 78.5 554
63 93.3 982
29 43.3 123
53 62.6 364
83 74.3 453
21 83.0 165
93 23.4 433
df_B: result data
group data1 data2 data3
0 23 13.3 983
0 13 33.4 124
1 24 62.3 574
1 25 78.5 554
2 63 93.3 982
2 29 43.3 123
3 53 62.6 364
3 83 74.3 453
4 21 83.0 165
4 93 23.4 433
Thank you.
Try:
df["group"] = df.index // 2
Or, with numpy:
import numpy as np
df["group"] = np.arange(len(df)) // 2
This creates "group" column:
data1 data2 data3 group
0 23 13.3 983 0
1 13 33.4 124 0
2 24 62.3 574 1
3 25 78.5 554 1
4 63 93.3 982 2
5 29 43.3 123 2
6 53 62.6 364 3
7 83 74.3 453 3
8 21 83.0 165 4
9 93 23.4 433 4
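Once the column exists, each two-row group can be aggregated directly. A minimal usage sketch with the data from the question (the mean aggregation is only an example):
import pandas as pd

df = pd.DataFrame({
    "data1": [23, 13, 24, 25, 63, 29, 53, 83, 21, 93],
    "data2": [13.3, 33.4, 62.3, 78.5, 93.3, 43.3, 62.6, 74.3, 83.0, 23.4],
    "data3": [983, 124, 574, 554, 982, 123, 364, 453, 165, 433],
})

df["group"] = df.index // 2        # label every pair of rows 0, 0, 1, 1, ...
print(df.groupby("group").mean())  # aggregate each two-row group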

Scraping a team stats table from python using BS4

I'm trying to scrape a table from pro-football-reference, specifically the team offense table from https://www.pro-football-reference.com/years/2019/. Any time I try the code below, I get back an empty list or just a NoneType. I have scraped other websites like ESPN and have had no problems.
import requests
from bs4 import BeautifulSoup
url = 'https://www.pro-football-reference.com/years/{}/'
response = requests.get(url.format(2019))
soup = BeautifulSoup(response.text, 'lxml')
team_table = soup.find('table', {'id':'team_stats'})
I have also tried
soup = BeautifulSoup(response.text, 'html.parser')
to see if maybe it was the way I was bringing the data in. The page does have a bunch of tables, so I'm assuming that's why it's more difficult to scrape a specific one. Thank you.
The data is inside HTML comments <!-- ... -->. You can use this script to get them:
import requests
from bs4 import BeautifulSoup, Comment
url = "https://www.pro-football-reference.com/years/2019/"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
table = soup.select_one('#all_team_stats').find_next(text=lambda t: isinstance(t, Comment))
table = BeautifulSoup(table, 'html.parser')
for tr in table.select('tr'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print(*tds)
Prints:
Baltimore Ravens 16 531 6521 1064 6.1 15 7 386 289 440 3225 37 8 6.9 171 596 3296 21 5.5 188 109 867 27 52.1 8.6 245.99
San Francisco 49ers 16 479 6097 1012 6.0 23 10 336 331 478 3792 28 13 7.4 195 498 2305 23 4.6 110 105 939 31 44.3 12.0 146.39
Tampa Bay Buccaneers 16 458 6366 1086 5.9 41 11 353 382 630 4845 33 30 7.2 244 409 1521 15 3.7 81 133 1111 28 38.3 20.7 44.00
New Orleans Saints 16 458 5982 1011 5.9 8 2 347 418 581 4244 36 6 7.0 230 405 1738 12 4.3 97 120 1036 20 47.1 4.1 167.04
Kansas City Chiefs 16 451 6067 976 6.2 15 10 350 378 576 4498 30 5 7.5 211 375 1569 16 4.2 93 107 1029 46 49.4 8.0 265.38
Dallas Cowboys 16 434 6904 1069 6.5 18 7 379 388 597 4751 30 11 7.7 229 449 2153 18 4.8 120 109 1008 30 44.6 10.3 214.77
New England Patriots 16 420 5664 1095 5.2 15 6 338 378 620 3961 25 9 6.1 197 447 1703 17 3.8 110 94 828 31 36.8 7.6 39.62
Minnesota Vikings 16 407 5656 970 5.8 20 12 314 319 466 3523 26 8 7.1 171 476 2133 19 4.5 106 96 895 37 41.9 10.5 103.01
Seattle Seahawks 16 405 5991 1046 5.7 20 14 341 341 517 3791 31 6 6.7 190 481 2200 15 4.6 121 109 882 30 36.9 10.2 90.55
Tennessee Titans 16 402 5805 949 6.1 17 9 317 297 448 3582 29 8 7.1 177 445 2223 21 5.0 104 99 932 36 31.7 8.2 119.88
Los Angeles Rams 16 394 5998 1055 5.7 24 7 342 397 632 4499 22 17 6.9 222 401 1499 20 3.7 92 118 899 28 36.0 12.9 42.09
Philadelphia Eagles 16 385 5772 1104 5.2 23 15 354 391 613 3833 27 8 5.9 215 454 1939 16 4.3 104 100 836 35 35.5 10.4 67.85
Atlanta Falcons 16 381 6075 1096 5.5 25 10 383 459 684 4714 29 15 6.4 258 362 1361 10 3.8 84 119 956 41 41.3 13.4 104.51
Houston Texans 16 378 5792 1017 5.7 22 8 346 355 534 3783 27 14 6.5 203 434 2009 17 4.6 112 111 892 31 37.7 12.0 121.67
Green Bay Packers 16 376 5528 1020 5.4 13 9 320 356 573 3733 26 4 6.1 190 411 1795 18 4.4 90 100 774 40 37.1 6.9 118.40
Arizona Cardinals 16 361 5467 1000 5.5 18 6 314 355 554 3477 20 12 5.8 176 396 1990 18 5.0 109 121 956 29 38.8 10.1 67.36
Indianapolis Colts 16 361 5238 1016 5.2 21 11 340 307 513 3108 22 10 5.7 165 471 2130 17 4.5 131 79 670 44 36.1 11.4 78.80
Detroit Lions 16 341 5549 1021 5.4 23 8 313 344 571 3900 28 15 6.4 196 407 1649 7 4.1 82 113 937 35 33.3 11.7 33.07
New York Giants 16 341 5416 1012 5.4 33 16 311 376 607 3731 30 17 5.7 187 362 1685 11 4.7 89 90 784 35 28.3 18.3 -5.30
Carolina Panthers 16 340 5469 1077 5.1 35 14 335 382 633 3650 17 21 5.3 230 386 1819 20 4.7 82 87 754 23 32.3 16.9 -24.07
Los Angeles Chargers 16 337 5879 997 5.9 31 11 349 394 597 4426 24 20 7.0 220 366 1453 12 4.0 90 103 872 39 39.5 18.5 83.43
Cleveland Browns 16 335 5455 973 5.6 28 7 305 318 539 3554 22 21 6.1 180 393 1901 15 4.8 90 122 1106 35 34.1 14.8 16.54
Buffalo Bills 16 314 5283 1018 5.2 19 7 314 299 513 3229 21 12 5.8 162 465 2054 13 4.4 120 117 927 32 30.6 10.4 9.66
Oakland Raiders 16 313 5819 989 5.9 17 9 315 367 523 3926 22 8 7.1 194 437 1893 13 4.3 104 128 1138 17 32.9 9.9 80.57
Miami Dolphins 16 306 4960 1022 4.9 26 8 315 371 615 3804 22 18 5.7 210 349 1156 10 3.3 64 92 769 41 30.6 13.3 -23.85
Jacksonville Jaguars 16 300 5468 1020 5.4 20 12 298 364 589 3760 24 8 6.0 183 389 1708 3 4.4 85 132 1165 30 33.9 10.2 -14.15
Pittsburgh Steelers 16 289 4428 937 4.7 30 11 265 315 510 2981 18 19 5.5 147 395 1447 7 3.7 75 111 893 43 28.6 15.7 -84.56
Denver Broncos 16 282 4777 954 5.0 16 6 279 312 504 3115 16 10 5.7 162 409 1662 11 4.1 77 110 912 40 32.9 9.4 -11.61
Chicago Bears 16 280 4749 1020 4.7 19 7 297 371 580 3291 20 12 5.3 178 395 1458 8 3.7 85 103 838 34 29.1 10.5 -38.05
Cincinnati Bengals 16 279 5169 1049 4.9 30 14 312 356 616 3652 18 16 5.5 191 385 1517 9 3.9 85 93 761 36 30.3 16.0 -57.73
New York Jets 16 276 4368 956 4.6 25 9 253 323 521 3111 19 16 5.4 162 383 1257 6 3.3 61 115 1105 30 23.0 11.5 -108.92
Washington Redskins 16 266 4395 885 5.0 21 8 248 298 479 2812 18 13 5.3 154 356 1583 9 4.4 74 106 835 20 30.1 12.1 -82.30
Avg Team 365.0 5565.8 1016.1 5.5 22.2 9.4 324.0 354.1 557.9 3759.4 24.9 12.8 6.3 193.8 418.3 1806.4 14.0 4.3 97.3 107.8 915.8 32.9 36.0 11.8 56.6
League Total 11680 178107 32516 5.5 711 301 10369 11331 17853 120301 797 410 6.3 6200 13387 57806 447 4.3 3115 3451 29306 1054 36.0 11.8
Avg Tm/G 22.8 347.9 63.5 5.5 1.4 0.6 20.3 22.1 34.9 235.0 1.6 0.8 6.3 12.1 26.1 112.9 0.9 4.3 6.1 6.7 57.2 2.1 36.0 11.8
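As an alternative to printing the rows, you can hand the comment's HTML to pandas and get a DataFrame in one step. A sketch under the same assumptions as above (lxml or html5lib must be installed for read_html):
import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

url = "https://www.pro-football-reference.com/years/2019/"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# The table markup lives inside an HTML comment under the #all_team_stats
# wrapper, so pull out the comment text and let pandas parse the <table>.
comment = soup.select_one('#all_team_stats').find_next(
    text=lambda t: isinstance(t, Comment))
team_stats = pd.read_html(str(comment))[0]
print(team_stats.head())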

can we use clustering without target variables?

This is Sample data ..
Inn R B W Eco AVG SR
111 368 432 30 5.11 12.27 14.4
94 359 444 24 4.85 14.96 18.5
47 187 202 13 5.55 14.38 15.54
59 273 279 16 5.87 17.06 17.44
34 132 140 9 5.66 14.67 15.56
135 437 536 33 4.89 13.24 16.24
1 0 1 1 0 0 1
Now I would like to make a new column called Choice with values Good, Bad, or Moderate bowling option for each row. How can I achieve it?
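One common way to do this without a target variable is unsupervised clustering. A minimal sketch assuming scikit-learn's KMeans with three clusters; the mapping from cluster label to Good/Moderate/Bad below is arbitrary and would need to be decided after inspecting the clusters:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# The sample rows from the question.
df = pd.DataFrame(
    [[111, 368, 432, 30, 5.11, 12.27, 14.40],
     [94, 359, 444, 24, 4.85, 14.96, 18.50],
     [47, 187, 202, 13, 5.55, 14.38, 15.54],
     [59, 273, 279, 16, 5.87, 17.06, 17.44],
     [34, 132, 140, 9, 5.66, 14.67, 15.56],
     [135, 437, 536, 33, 4.89, 13.24, 16.24]],
    columns=["Inn", "R", "B", "W", "Eco", "AVG", "SR"],
)

# Scale the features so no single column dominates the distance metric,
# then ask KMeans for three clusters (one per intended label).
X = StandardScaler().fit_transform(df)
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Arbitrary label-to-name mapping; inspect the clusters before fixing it.
df["Choice"] = df["cluster"].map({0: "Good", 1: "Moderate", 2: "Bad"})
print(df)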

How to get values for the next month for a selected column from a pandas data frame with date time index

I have the below data frame (date time index, with all working days in us calender)
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
import random
us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())
dt_rng = pd.date_range(start='1/1/2018', end='12/31/2018', freq=us_bd)
n1 = [round(random.uniform(20, 35),2) for _ in range(len(dt_rng))]
n2 = [random.randint(100, 200) for _ in range(len(dt_rng))]
df = pd.DataFrame(list(zip(n1,n2)), index=dt_rng, columns=['n1','n2'])
print(df)
n1 n2
2018-01-02 24.78 197
2018-01-03 23.33 176
2018-01-04 33.19 128
2018-01-05 32.49 110
... ... ...
2018-12-26 31.34 173
2018-12-27 29.72 166
2018-12-28 31.07 104
2018-12-31 33.52 184
[251 rows x 2 columns]
For each row in column n1, how do I get the value from the same column for the same day of the next month? (If the value for that exact day is not available, due to weekends or holidays, it should take the value at the next available date.) I tried df.n1.shift(21), but that doesn't work because the number of working days differs from month to month.
Expected output as below
n1 n2 next_mnth_val
2018-01-02 25.97 184 28.14
2018-01-03 24.94 133 27.65 # three values below are same, because on Feb 2018, the next working day after 2nd is 5th
2018-01-04 23.99 143 27.65
2018-01-05 24.69 182 27.65
2018-01-08 28.43 186 28.45
2018-01-09 31.47 104 23.14
... ... ... ...
2018-12-26 29.06 194 20.45
2018-12-27 29.63 158 20.45
2018-12-28 30.60 148 20.45
2018-12-31 20.45 121 20.45
For December, the next-month value should be the last value of the data frame, i.e. the value at index 2018-12-31 (20.45).
Please help.
This is an interesting problem. I would shift the date by 1 month, then shift it again to the next business day:
df1 = df.copy().reset_index()
df1['new_date'] = df1['index'] + pd.DateOffset(months=1) + pd.offsets.BDay()
df.merge(df1, left_index=True, right_on='new_date')
Output (first 31st days):
n1_x n2_x index n1_y n2_y new_date
0 34.82 180 2018-01-02 29.83 129 2018-02-05
1 34.82 180 2018-01-03 24.28 166 2018-02-05
2 34.82 180 2018-01-04 27.88 110 2018-02-05
3 24.89 186 2018-01-05 25.34 111 2018-02-06
4 31.66 137 2018-01-08 26.28 138 2018-02-09
5 25.30 162 2018-01-09 32.71 139 2018-02-12
6 25.30 162 2018-01-10 34.39 159 2018-02-12
7 25.30 162 2018-01-11 20.89 132 2018-02-12
8 23.44 196 2018-01-12 29.27 167 2018-02-13
12 25.40 153 2018-01-19 28.52 185 2018-02-20
13 31.38 126 2018-01-22 23.49 141 2018-02-23
14 30.90 133 2018-01-23 25.56 145 2018-02-26
15 30.90 133 2018-01-24 23.06 155 2018-02-26
16 30.90 133 2018-01-25 24.95 174 2018-02-26
17 29.39 138 2018-01-26 21.28 157 2018-02-27
18 32.94 173 2018-01-29 20.26 189 2018-03-01
19 32.94 173 2018-01-30 22.41 196 2018-03-01
20 32.94 173 2018-01-31 27.32 149 2018-03-01
21 28.09 119 2018-02-01 31.39 192 2018-03-02
22 32.21 199 2018-02-02 28.22 151 2018-03-05
23 21.78 120 2018-02-05 34.82 180 2018-03-06
24 28.25 127 2018-02-06 24.89 186 2018-03-07
25 22.06 189 2018-02-07 32.85 125 2018-03-08
26 33.78 121 2018-02-08 30.12 102 2018-03-09
27 30.79 137 2018-02-09 31.66 137 2018-03-12
28 29.88 131 2018-02-12 25.30 162 2018-03-13
29 20.02 143 2018-02-13 23.44 196 2018-03-14
30 20.28 188 2018-02-14 20.04 102 2018-03-15
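To reshape that merge result into the layout from the question (original dates as the index plus a single next_mnth_val column), a hedged follow-up sketch continuing from the code above; note it does not cover the December edge case described in the question:
# n1_y/n2_y are the values at the original date ('index'), while n1_x is the
# value observed at new_date, i.e. the next-month value we want.
merged = df.merge(df1, left_index=True, right_on='new_date')
out = (merged.set_index('index')
             .rename(columns={'n1_y': 'n1', 'n2_y': 'n2',
                              'n1_x': 'next_mnth_val'})
             [['n1', 'n2', 'next_mnth_val']])
print(out.head())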

X-Axis Ticks labels by year with X-Axis gridlines by fiscal quarter

I am trying to set the x axis tick labels as the year but have the gridlines as the fiscal quarter. The data is quite simple, just a groupby date.count, see below. Each date has a count and I am plotting it as a line plot.
rc[(rc['form']=='Bakken')&(rc['tgt']=='oil')].groupby(['date']).date.count()
date count
2010-01-08 65
2010-01-15 68
2010-01-22 73
2010-01-29 76
2010-02-05 79
2010-02-12 76
2010-02-19 79
2010-02-26 83
2010-03-05 81
2010-03-12 83
2010-03-19 80
2010-03-26 87
2010-04-02 84
2010-04-09 87
2010-04-16 87
2010-04-23 91
2010-04-30 86
2010-05-07 92
2010-05-14 95
2010-05-21 91
2010-05-28 100
2010-06-04 96
2010-06-11 101
2010-06-18 100
2010-06-25 113
2010-07-02 112
2010-07-09 119
2010-07-16 121
2010-07-23 119
2010-07-30 115
2010-08-06 115
2010-08-13 114
2010-08-20 111
2010-08-27 114
2010-09-03 121
2010-09-10 128
2010-09-17 121
2010-09-24 118
2010-10-01 109
2010-10-08 120
2010-10-15 122
2010-10-22 120
2010-10-29 118
2010-11-05 117
2010-11-12 115
2010-11-19 113
2010-11-26 106
2010-12-03 112
2010-12-10 114
2010-12-17 122
2010-12-24 120
2010-12-31 120
2011-01-07 139
2011-01-14 141
2011-01-21 141
2011-01-28 145
2011-02-04 146
2011-02-11 145
2011-02-18 148
2011-02-25 149
2011-03-04 150
2011-03-11 149
2011-03-18 145
2011-03-25 140
2011-04-01 150
2011-04-08 153
2011-04-15 151
2011-04-22 148
2011-04-29 150
2011-05-06 148
2011-05-13 154
2011-05-20 155
2011-05-27 152
2011-06-03 158
2011-06-10 155
2011-06-17 152
2011-06-24 148
2011-07-01 160
2011-07-08 164
2011-07-15 163
2011-07-22 147
2011-07-29 158
2011-08-05 161
2011-08-12 166
2011-08-19 158
2011-08-26 154
2011-09-02 161
2011-09-09 166
2011-09-16 160
2011-09-23 169
2011-09-30 171
2011-10-07 155
2011-10-14 159
2011-10-21 156
2011-10-28 168
2011-11-04 154
2011-11-11 166
2011-11-18 168
2011-11-25 164
2011-12-02 179
2011-12-09 171
2011-12-16 172
2011-12-23 165
2011-12-30 170
2012-01-06 162
2012-01-13 172
2012-01-20 172
2012-01-27 186
2012-02-03 183
2012-02-10 175
2012-02-17 188
2012-02-24 182
2012-03-02 184
2012-03-09 189
2012-03-16 190
2012-03-23 181
2012-03-30 186
2012-04-06 180
2012-04-13 178
2012-04-20 179
2012-04-27 174
2012-05-04 201
2012-05-11 201
2012-05-18 201
2012-05-25 201
2012-06-01 206
2012-06-08 206
2012-06-15 199
2012-06-22 201
2012-06-29 186
2012-07-06 194
2012-07-13 192
2012-07-20 189
2012-07-27 189
2012-08-03 189
2012-08-10 194
2012-08-17 190
2012-08-24 192
2012-08-31 177
2012-09-07 186
2012-09-14 173
2012-09-21 178
2012-09-28 180
2012-10-05 173
2012-10-12 165
2012-10-19 167
2012-10-26 160
2012-11-02 160
2012-11-09 167
2012-11-16 159
2012-11-23 161
2012-11-30 166
2012-12-07 161
2012-12-14 150
2012-12-21 158
2012-12-28 122
2013-01-04 121
2013-01-11 115
2013-01-18 116
2013-01-25 119
2013-02-01 113
2013-02-08 112
2013-02-15 125
2013-02-22 113
2013-03-01 117
2013-03-08 113
2013-03-15 113
2013-03-22 116
2013-03-29 125
2013-04-05 113
2013-04-12 120
2013-04-19 120
2013-04-26 128
2013-05-03 131
2013-05-10 129
2013-05-17 135
2013-05-24 125
2013-05-31 140
2013-06-07 131
2013-06-14 129
2013-06-21 130
2013-06-28 139
2013-07-05 136
2013-07-12 137
2013-07-19 131
2013-07-26 132
2013-08-02 131
2013-08-09 138
2013-08-16 138
2013-08-23 140
2013-08-30 137
2013-09-06 132
2013-09-13 132
2013-09-20 129
2013-09-27 129
2013-10-04 128
2013-10-11 129
2013-10-18 130
2013-10-25 135
2013-11-01 128
2013-11-08 131
2013-11-15 130
2013-11-22 128
2013-11-29 134
2013-12-06 140
2013-12-13 131
2013-12-20 130
2013-12-27 125
2014-01-03 134
2014-01-10 138
2014-01-17 139
2014-01-24 129
2014-01-31 142
2014-02-07 145
2014-02-14 135
2014-02-21 140
2014-02-28 137
2014-03-07 148
2014-03-14 148
2014-03-21 140
2014-03-28 141
2014-04-04 148
2014-04-11 145
2014-04-18 145
2014-04-25 140
2014-05-02 157
2014-05-09 146
2014-05-16 143
2014-05-23 159
2014-05-30 152
2014-06-06 141
2014-06-13 145
2014-06-20 152
2014-06-27 145
2014-07-03 144
2014-07-11 150
2014-07-18 145
2014-07-25 146
2014-08-01 149
2014-08-08 145
2014-08-15 146
2014-08-22 151
2014-08-29 142
2014-09-05 155
2014-09-12 149
2014-09-19 158
2014-09-26 149
2014-10-03 154
2014-10-10 141
2014-10-17 150
2014-10-24 135
2014-10-31 145
2014-11-07 145
2014-11-14 155
2014-11-21 143
2014-11-26 148
2014-12-05 149
2014-12-12 151
2014-12-19 155
2014-12-26 143
2015-01-02 131
2015-01-09 132
2015-01-16 124
2015-01-23 132
2015-01-30 121
2015-02-06 116
2015-02-13 115
2015-02-20 105
2015-02-27 77
2015-03-06 73
2015-03-13 72
2015-03-20 65
2015-03-27 64
2015-04-03 65
2015-04-10 62
2015-04-17 61
2015-04-24 59
2015-05-01 56
2015-05-08 58
2015-05-15 54
2015-05-22 53
2015-05-29 50
2015-06-05 50
2015-06-12 52
2015-06-19 54
2015-06-26 52
2015-07-02 50
2015-07-10 48
2015-07-17 45
2015-07-24 44
2015-07-31 43
2015-08-07 42
2015-08-14 45
2015-08-21 45
2015-08-28 47
2015-09-04 46
2015-09-11 43
2015-09-18 43
2015-09-25 44
2015-10-02 44
2015-10-09 44
2015-10-16 40
2015-10-23 38
2015-10-30 39
2015-11-06 32
2015-11-13 30
2015-11-20 31
2015-11-27 28
2015-12-04 31
2015-12-11 26
2015-12-18 26
2015-12-25 28
2016-01-01 25
2016-01-08 26
2016-01-15 25
2016-01-22 21
2016-01-29 23
2016-02-05 20
2016-02-12 21
2016-02-19 37
2016-02-26 34
2016-03-04 32
2016-03-11 31
2016-03-18 32
2016-03-24 30
2016-04-01 27
2016-04-08 25
2016-04-15 23
2016-04-22 23
lanery pointed to the right place. You need to define your quarters and use them in the same fashion.
Define years
years = ['2009-12-31', '2010-12-31', '2011-12-30', '2012-12-31',
'2013-12-31', '2014-12-31', '2015-12-31']
Define quarters
quarters = ['2009-12-31', '2010-03-31', '2010-06-30', '2010-09-30',
'2010-12-31', '2011-03-31', '2011-06-30', '2011-09-30',
'2011-12-30', '2012-03-30', '2012-06-29', '2012-09-28',
'2012-12-31', '2013-03-29', '2013-06-28', '2013-09-30',
'2013-12-31', '2014-03-31', '2014-06-30', '2014-09-30',
'2014-12-31', '2015-03-31', '2015-06-30', '2015-09-30',
'2015-12-31', '2016-03-31']
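Instead of hard-coding these lists, you could also generate them. A minimal sketch, assuming calendar quarter-ends and year-ends are close enough to the hand-picked business-day dates above:
import pandas as pd

# Calendar quarter-end and year-end dates covering the plotted range
# (on newer pandas the frequency aliases are 'QE' and 'YE').
quarters = pd.date_range('2009-12-31', '2016-03-31', freq='Q')
years = pd.date_range('2009-12-31', '2015-12-31', freq='A')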
Load the data you supplied
import pandas as pd
from io import StringIO  # on Python 2: from StringIO import StringIO
text = """date count
2010-01-08 65
2010-01-15 68
2010-01-22 73
2010-01-29 76
2010-02-05 79
2010-02-12 76
2010-02-19 79
2010-02-26 83
2010-03-05 81
2010-03-12 83
2010-03-19 80
2010-03-26 87
2010-04-02 84
2010-04-09 87
2010-04-16 87
2010-04-23 91
2010-04-30 86
2010-05-07 92
2010-05-14 95
2010-05-21 91
2010-05-28 100
2010-06-04 96
2010-06-11 101
2010-06-18 100
2010-06-25 113
2010-07-02 112
2010-07-09 119
2010-07-16 121
2010-07-23 119
2010-07-30 115
2010-08-06 115
2010-08-13 114
2010-08-20 111
2010-08-27 114
2010-09-03 121
2010-09-10 128
2010-09-17 121
2010-09-24 118
2010-10-01 109
2010-10-08 120
2010-10-15 122
2010-10-22 120
2010-10-29 118
2010-11-05 117
2010-11-12 115
2010-11-19 113
2010-11-26 106
2010-12-03 112
2010-12-10 114
2010-12-17 122
2010-12-24 120
2010-12-31 120
2011-01-07 139
2011-01-14 141
2011-01-21 141
2011-01-28 145
2011-02-04 146
2011-02-11 145
2011-02-18 148
2011-02-25 149
2011-03-04 150
2011-03-11 149
2011-03-18 145
2011-03-25 140
2011-04-01 150
2011-04-08 153
2011-04-15 151
2011-04-22 148
2011-04-29 150
2011-05-06 148
2011-05-13 154
2011-05-20 155
2011-05-27 152
2011-06-03 158
2011-06-10 155
2011-06-17 152
2011-06-24 148
2011-07-01 160
2011-07-08 164
2011-07-15 163
2011-07-22 147
2011-07-29 158
2011-08-05 161
2011-08-12 166
2011-08-19 158
2011-08-26 154
2011-09-02 161
2011-09-09 166
2011-09-16 160
2011-09-23 169
2011-09-30 171
2011-10-07 155
2011-10-14 159
2011-10-21 156
2011-10-28 168
2011-11-04 154
2011-11-11 166
2011-11-18 168
2011-11-25 164
2011-12-02 179
2011-12-09 171
2011-12-16 172
2011-12-23 165
2011-12-30 170
2012-01-06 162
2012-01-13 172
2012-01-20 172
2012-01-27 186
2012-02-03 183
2012-02-10 175
2012-02-17 188
2012-02-24 182
2012-03-02 184
2012-03-09 189
2012-03-16 190
2012-03-23 181
2012-03-30 186
2012-04-06 180
2012-04-13 178
2012-04-20 179
2012-04-27 174
2012-05-04 201
2012-05-11 201
2012-05-18 201
2012-05-25 201
2012-06-01 206
2012-06-08 206
2012-06-15 199
2012-06-22 201
2012-06-29 186
2012-07-06 194
2012-07-13 192
2012-07-20 189
2012-07-27 189
2012-08-03 189
2012-08-10 194
2012-08-17 190
2012-08-24 192
2012-08-31 177
2012-09-07 186
2012-09-14 173
2012-09-21 178
2012-09-28 180
2012-10-05 173
2012-10-12 165
2012-10-19 167
2012-10-26 160
2012-11-02 160
2012-11-09 167
2012-11-16 159
2012-11-23 161
2012-11-30 166
2012-12-07 161
2012-12-14 150
2012-12-21 158
2012-12-28 122
2013-01-04 121
2013-01-11 115
2013-01-18 116
2013-01-25 119
2013-02-01 113
2013-02-08 112
2013-02-15 125
2013-02-22 113
2013-03-01 117
2013-03-08 113
2013-03-15 113
2013-03-22 116
2013-03-29 125
2013-04-05 113
2013-04-12 120
2013-04-19 120
2013-04-26 128
2013-05-03 131
2013-05-10 129
2013-05-17 135
2013-05-24 125
2013-05-31 140
2013-06-07 131
2013-06-14 129
2013-06-21 130
2013-06-28 139
2013-07-05 136
2013-07-12 137
2013-07-19 131
2013-07-26 132
2013-08-02 131
2013-08-09 138
2013-08-16 138
2013-08-23 140
2013-08-30 137
2013-09-06 132
2013-09-13 132
2013-09-20 129
2013-09-27 129
2013-10-04 128
2013-10-11 129
2013-10-18 130
2013-10-25 135
2013-11-01 128
2013-11-08 131
2013-11-15 130
2013-11-22 128
2013-11-29 134
2013-12-06 140
2013-12-13 131
2013-12-20 130
2013-12-27 125
2014-01-03 134
2014-01-10 138
2014-01-17 139
2014-01-24 129
2014-01-31 142
2014-02-07 145
2014-02-14 135
2014-02-21 140
2014-02-28 137
2014-03-07 148
2014-03-14 148
2014-03-21 140
2014-03-28 141
2014-04-04 148
2014-04-11 145
2014-04-18 145
2014-04-25 140
2014-05-02 157
2014-05-09 146
2014-05-16 143
2014-05-23 159
2014-05-30 152
2014-06-06 141
2014-06-13 145
2014-06-20 152
2014-06-27 145
2014-07-03 144
2014-07-11 150
2014-07-18 145
2014-07-25 146
2014-08-01 149
2014-08-08 145
2014-08-15 146
2014-08-22 151
2014-08-29 142
2014-09-05 155
2014-09-12 149
2014-09-19 158
2014-09-26 149
2014-10-03 154
2014-10-10 141
2014-10-17 150
2014-10-24 135
2014-10-31 145
2014-11-07 145
2014-11-14 155
2014-11-21 143
2014-11-26 148
2014-12-05 149
2014-12-12 151
2014-12-19 155
2014-12-26 143
2015-01-02 131
2015-01-09 132
2015-01-16 124
2015-01-23 132
2015-01-30 121
2015-02-06 116
2015-02-13 115
2015-02-20 105
2015-02-27 77
2015-03-06 73
2015-03-13 72
2015-03-20 65
2015-03-27 64
2015-04-03 65
2015-04-10 62
2015-04-17 61
2015-04-24 59
2015-05-01 56
2015-05-08 58
2015-05-15 54
2015-05-22 53
2015-05-29 50
2015-06-05 50
2015-06-12 52
2015-06-19 54
2015-06-26 52
2015-07-02 50
2015-07-10 48
2015-07-17 45
2015-07-24 44
2015-07-31 43
2015-08-07 42
2015-08-14 45
2015-08-21 45
2015-08-28 47
2015-09-04 46
2015-09-11 43
2015-09-18 43
2015-09-25 44
2015-10-02 44
2015-10-09 44
2015-10-16 40
2015-10-23 38
2015-10-30 39
2015-11-06 32
2015-11-13 30
2015-11-20 31
2015-11-27 28
2015-12-04 31
2015-12-11 26
2015-12-18 26
2015-12-25 28
2016-01-01 25
2016-01-08 26
2016-01-15 25
2016-01-22 21
2016-01-29 23
2016-02-05 20
2016-02-12 21
2016-02-19 37
2016-02-26 34
2016-03-04 32
2016-03-11 31
2016-03-18 32
2016-03-24 30
2016-04-01 27
2016-04-08 25
2016-04-15 23
2016-04-22 23"""
Parse your data
data = pd.read_csv(StringIO(text), index_col=[0], parse_dates=[0], delim_whitespace=True)
Use info from
How to add a grid line at a specific location in matplotlib plot?
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xticks(quarters, minor=True)
ax.set_xticks(years, minor=False)
ax.xaxis.grid(True, which='minor')
ax.xaxis.grid(False, which='major')
data.plot(ax=ax)
