I have a DataFrame with multiple samples; here is df1:
0 1 2 3 4 5 6 7 8 9 10 11 \
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 13.4 5.2 7.7 -2.1 1.6 -4.1 -0.5 8.2 15.9 12.9 11.8 9.3
2 -3.1 -0.6 -5.1 -0.5 -4.1 0.5 -3.6 -5.6 -9.7 -3.6 -4.7 -5.7
3 -10.3 -1.0 -9.8 0.5 -3.6 1.0 -1.5 -1.6 -5.1 -4.6 -13.3 -10.7
4 0.0 -5.6 -4.1 1.5 3.0 -1.0 2.6 6.7 12.3 6.6 -0.5 1.0
5 6.2 0.5 2.6 1.1 1.6 0.5 2.0 0.0 -0.5 0.5 7.7 5.6
6 -1.6 5.1 6.1 -1.1 -2.1 0.0 -1.5 -9.2 -13.9 -7.1 1.5 -0.5
7 -6.1 -4.1 -1.5 -1.5 -0.5 -0.5 -0.5 -2.6 2.6 -2.6 -6.6 -3.1
8 -0.5 -4.1 -6.7 1.5 0.0 0.5 1.0 8.2 8.7 4.1 -3.1 0.6
9 5.1 7.2 4.6 0.6 1.0 0.0 0.5 -0.5 -7.7 -0.5 5.6 0.5
10 0.5 2.6 5.7 -2.1 2.1 -1.0 0.0 -8.2 -10.2 -6.2 3.6 0.0
11 -5.1 -7.2 -4.6 1.0 -1.0 2.0 -2.0 8.7 14.3 10.3 2.1 8.2
12 0.5 -2.6 -1.6 3.1 -0.5 3.1 4.1 10.8 15.9 9.7 2.5 7.7
13 2.6 2.1 0.0 0.0 -0.6 -0.5 8.7 -8.7 -15.9 -13.3 -3.0 -9.3
14 -2.1 4.6 1.0 -2.6 1.1 0.0 0.0 -6.2 -10.8 -9.7 -1.1 -4.1
15 4.6 5.1 6.2 -0.5 7.7 3.1 -3.6 8.2 19.0 11.7 7.2 12.9
16 1.6 -6.1 -2.6 0.5 5.1 2.0 1.0 0.0 5.7 5.7 2.1 1.0
17 -11.8 -10.8 -10.7 -1.5 -6.2 -1.0 -3.1 -10.7 -23.1 -11.3 -6.7 -12.8
18 -6.2 -0.5 -0.5 0.0 -7.7 -3.6 -7.7 0.0 -3.1 0.0 -4.1 -0.6
19 9.8 3.6 4.1 1.5 -2.0 -4.6 -1.0 11.2 25.1 14.9 1.5 8.8
20 7.7 3.0 2.0 0.0 3.6 2.5 3.1 1.6 3.1 2.0 6.2 2.5
21 -1.1 3.1 3.1 -0.5 7.2 7.2 2.0 -11.3 -26.7 -12.8 3.1 -2.5
22 0.5 -1.0 0.0 0.5 3.0 0.5 -2.0 -0.5 -2.0 -1.0 -3.1 1.0
23 1.1 -2.1 -2.6 -1.5 -0.5 -0.5 -2.6 9.2 23.1 6.6 -1.0 1.0
24 -2.1 3.6 2.1 -1.0 1.1 3.6 1.6 -3.6 -9.3 -5.1 2.0 -2.0
25 1.6 4.6 5.1 2.0 -4.7 -2.6 1.0 -2.0 -11.8 -2.0 2.1 3.0
26 0.5 -1.5 -2.6 1.1 -7.7 -9.2 0.5 6.1 15.4 5.6 -2.1 0.0
27 -6.2 -11.3 -11.8 -0.6 -1.5 2.1 5.1 -3.6 -2.0 -4.6 -6.7 -11.2
28 -1.0 -4.1 -1.0 0.6 1.0 8.7 7.7 -4.1 -10.3 -2.1 0.0 -1.1
29 8.2 10.3 10.3 0.0 -0.5 0.5 3.6 3.1 6.2 4.7 6.2 10.3
.. ... ... ... ... ... ... ... ... ... ... ... ...
98 -1.0 -4.1 -1.0 1.1 0.5 -3.1 -6.7 1.5 10.3 2.5 -1.0 -6.1
99 5.6 9.8 5.1 -1.6 1.6 -1.0 4.7 11.8 18.4 22.6 11.8 13.3
100 -1.0 2.0 1.1 0.5 -1.1 1.5 11.2 -5.6 -4.1 8.2 4.1 9.3
101 -4.6 -9.2 -5.7 2.6 -1.0 1.6 -0.5 -10.8 -14.3 -16.4 -8.2 -14.4
102 1.5 0.0 2.6 -1.5 0.5 -0.5 -10.7 4.6 -2.1 -13.3 -0.5 -3.1
103 -2.5 -2.6 1.5 -1.6 -4.1 -2.6 -5.2 2.6 2.6 -2.6 -3.1 2.6
104 -7.7 -6.7 -7.2 1.6 -6.1 -4.1 1.1 -5.6 -2.1 -2.1 -10.3 -12.8
105 2.6 7.2 -1.0 0.0 2.0 -9.2 0.0 4.1 1.6 4.1 -1.0 -3.1
106 5.6 7.2 7.2 -2.6 5.6 -1.1 -2.1 4.6 1.5 7.2 4.1 9.7
107 -0.5 -5.6 1.0 3.1 1.6 8.8 1.0 -4.6 -1.5 -7.7 -2.6 -6.6
108 -4.6 -3.6 -8.7 1.5 -2.6 0.0 2.6 0.5 5.1 -4.6 -2.5 -7.7
109 -3.6 1.5 -5.6 -5.6 -7.7 -10.8 -7.7 2.5 -0.5 5.1 0.5 4.1
110 0.0 2.1 4.1 0.0 -1.5 -1.5 -8.7 -7.7 -11.3 -8.2 0.5 0.5
111 7.2 6.6 9.7 4.6 9.7 10.2 8.7 -4.1 -7.2 -7.7 3.6 2.5
112 5.1 2.6 2.1 -1.6 5.2 5.1 5.6 1.6 3.6 7.7 1.0 8.8
113 -7.7 -5.1 2.5 -2.5 -8.2 -8.7 -11.8 5.6 16.9 11.3 -1.0 -1.1
114 -7.2 1.5 10.8 2.0 -6.2 -8.2 -1.0 4.6 9.8 10.3 6.1 2.6
115 -2.6 -1.5 -2.1 0.5 5.1 5.7 12.8 -9.2 -22.1 -10.3 1.1 2.6
116 -2.0 -8.2 -9.7 -0.5 3.1 9.2 3.1 -9.2 -16.9 -24.6 -11.8 -11.8
117 0.5 1.5 5.6 0.0 -5.1 -1.5 -3.6 13.3 24.1 6.2 -1.6 -3.1
118 1.5 7.2 6.7 -0.5 1.0 -2.1 2.1 8.7 10.3 18.4 9.3 10.8
119 8.3 6.1 3.1 1.0 8.2 4.1 -0.5 -16.4 -22.6 -6.6 1.5 2.0
120 9.2 5.7 5.6 1.6 2.6 1.0 -10.3 -5.6 2.6 -2.1 2.1 0.0
121 -2.1 -4.1 -1.5 -2.6 -2.1 -3.6 -8.2 13.3 22.5 9.7 7.1 4.1
122 -5.6 -2.6 -2.6 -2.0 -1.0 1.1 6.2 4.6 -2.5 -0.5 -1.0 -0.5
123 4.1 7.2 6.7 2.5 -1.5 2.0 11.2 -7.2 -14.4 -6.1 -1.0 -1.0
124 3.1 1.5 4.6 -1.0 0.0 -1.0 -2.5 -3.5 -2.0 0.0 1.5 1.0
125 -2.1 -2.0 -3.6 -2.1 1.5 1.0 -6.7 -1.1 1.5 0.0 -5.1 -5.1
126 1.6 3.0 -0.5 1.1 0.0 -1.5 1.0 1.6 4.6 0.5 -4.6 0.0
127 -2.1 -1.0 -2.6 1.0 0.5 -6.2 4.1 7.1 12.3 3.6 2.0 3.6
12 13
0 NaN NaN
1 12.8 14.9
2 -3.0 -7.7
3 -5.7 -10.8
4 2.6 1.5
5 4.1 4.1
6 -2.1 1.1
7 -7.7 -2.6
8 1.6 -3.6
9 4.1 0.5
10 -5.7 3.1
11 2.1 2.6
12 8.2 3.6
13 -2.1 -1.6
14 -2.5 -2.0
15 6.1 5.6
16 0.5 0.0
17 -5.1 -11.3
18 -2.0 -6.1
19 1.5 8.7
20 2.1 7.2
21 -0.6 -0.5
22 2.1 3.0
23 2.0 2.6
24 -5.1 -3.6
25 2.1 2.6
26 2.5 3.6
27 -13.8 -12.3
28 -3.6 -4.1
29 11.3 11.2
.. ... ...
98 6.2 1.0
99 12.3 9.7
100 -2.1 2.1
101 -7.2 -6.2
102 3.6 -1.5
103 -4.1 -3.1
104 -12.8 -7.2
105 3.6 0.6
106 13.8 4.6
107 -2.5 -3.1
108 -11.3 -7.2
109 -2.6 -3.1
110 4.1 1.1
111 7.2 4.6
112 4.1 4.1
113 -4.1 -2.1
114 3.6 3.1
115 0.5 -4.1
116 -11.3 -12.3
117 0.0 3.1
118 9.3 11.8
119 0.5 3.6
120 -1.1 1.5
121 2.6 -0.5
122 -2.6 -2.6
123 -1.5 4.1
124 2.1 1.0
125 -1.1 -1.0
126 0.5 5.7
127 0.6 -2.6
[128 rows x 14 columns]
I have around 60 of these. I also have another DataFrame, df2, of size (1 x 14).
What I want to do is check whether each value in df2 is greater than, equal to, or less than the corresponding value in each row of df1, producing 1, 0, or -1 for each element of that row.
The result should look something like this:
0 1 0 0 0 0 -1 0 -0....
-1 0 1 1 0 -1 0 0 ....
.
.
Can anyone help me with this?
OK I think the following should work:
In [208]:
import io
import numpy as np
import pandas as pd

# load some data
t="""1 13.4 5.2 7.7 -2.1 1.6 -4.1 -0.5 8.2 15.9 12.9 11.8 9.3
2 -3.1 -0.6 -5.1 -0.5 -4.1 0.5 -3.6 -5.6 -9.7 -3.6 -4.7 -5.7
3 -10.3 -1.0 -9.8 0.5 -3.6 1.0 -1.5 -1.6 -5.1 -4.6 -13.3 -10.7
4 0.0 -5.6 -4.1 1.5 3.0 -1.0 2.6 6.7 12.3 6.6 -0.5 1.0
5 6.2 0.5 2.6 1.1 1.6 0.5 2.0 0.0 -0.5 0.5 7.7 5.6
6 -1.6 5.1 6.1 -1.1 -2.1 0.0 -1.5 -9.2 -13.9 -7.1 1.5 -0.5
7 -6.1 -4.1 -1.5 -1.5 -0.5 -0.5 -0.5 -2.6 2.6 -2.6 -6.6 -3.1
8 -0.5 -4.1 -6.7 1.5 0.0 0.5 1.0 8.2 8.7 4.1 -3.1 0.6
9 5.1 7.2 4.6 0.6 1.0 0.0 0.5 -0.5 -7.7 -0.5 5.6 0.5"""
df = pd.read_csv(io.StringIO(t), delim_whitespace=True, header = None, index_col=[0])
df.reset_index(inplace=True, drop=True)
df
Out[208]:
1 2 3 4 5 6 7 8 9 10 11 12
0 13.4 5.2 7.7 -2.1 1.6 -4.1 -0.5 8.2 15.9 12.9 11.8 9.3
1 -3.1 -0.6 -5.1 -0.5 -4.1 0.5 -3.6 -5.6 -9.7 -3.6 -4.7 -5.7
2 -10.3 -1.0 -9.8 0.5 -3.6 1.0 -1.5 -1.6 -5.1 -4.6 -13.3 -10.7
3 0.0 -5.6 -4.1 1.5 3.0 -1.0 2.6 6.7 12.3 6.6 -0.5 1.0
4 6.2 0.5 2.6 1.1 1.6 0.5 2.0 0.0 -0.5 0.5 7.7 5.6
5 -1.6 5.1 6.1 -1.1 -2.1 0.0 -1.5 -9.2 -13.9 -7.1 1.5 -0.5
6 -6.1 -4.1 -1.5 -1.5 -0.5 -0.5 -0.5 -2.6 2.6 -2.6 -6.6 -3.1
7 -0.5 -4.1 -6.7 1.5 0.0 0.5 1.0 8.2 8.7 4.1 -3.1 0.6
8 5.1 7.2 4.6 0.6 1.0 0.0 0.5 -0.5 -7.7 -0.5 5.6 0.5
Now use nested np.where calls, masking the df with gt and lt to set 1 and -1 respectively, and falling back to 0 (the values are equal) when neither condition is met:
In [213]:
# dummy comparison values standing in for the df2 row
df1 = pd.DataFrame(np.arange(12)).astype(float)
df = pd.DataFrame(np.where(df.gt(df1.squeeze(), axis=0), 1, np.where(df.lt(df1.squeeze(), axis=0), -1, 0)))
df
Out[213]:
0 1 2 3 4 5 6 7 8 9 10 11
0 1 1 1 -1 1 -1 -1 1 1 1 1 1
1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
3 -1 -1 -1 -1 0 -1 -1 1 1 1 -1 -1
4 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1
5 -1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1
6 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
7 -1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 -1
8 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
9 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0
The above should work so long as the indices and column labels match.
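As a side note (an alternative sketch, not part of the answer above): assuming the comparison row is a Series whose index matches the columns of df, np.sign of the element-wise difference gives the same 1/0/-1 matrix in one step (NaN cells stay NaN):
row = df2.squeeze()                    # assumed: df2 is the single-row comparison frame
result = np.sign(df.sub(row, axis=1))  # 1 where greater, -1 where less, 0 where equal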
Setup
import pandas as pd
import numpy as np
np.random.seed([3,1415])
df1 = pd.DataFrame(np.random.choice(range(10), (20, 10)))
df2 = pd.Series(np.random.choice(range(10), (10,)))
Try:
1 * (df1 > df2) - (df1 < df2)
0 1 2 3 4 5 6 7 8 9
0 -1 1 1 -1 -1 0 -1 -1 1 -1
1 -1 1 -1 -1 0 0 -1 -1 -1 -1
2 -1 1 1 -1 -1 -1 -1 -1 -1 1
3 0 1 1 -1 -1 -1 -1 -1 0 -1
4 -1 1 1 -1 -1 -1 1 -1 -1 -1
5 -1 1 1 -1 -1 -1 1 -1 -1 -1
6 -1 1 1 -1 0 1 -1 -1 -1 -1
7 -1 1 1 0 -1 -1 1 -1 -1 -1
8 -1 1 1 -1 -1 -1 -1 -1 -1 1
9 -1 1 1 -1 -1 -1 0 -1 1 1
10 -1 1 1 -1 -1 -1 -1 -1 -1 1
11 -1 1 1 -1 -1 -1 1 -1 1 0
12 -1 1 0 -1 -1 -1 -1 0 1 -1
13 1 1 1 -1 -1 1 -1 -1 -1 -1
14 -1 1 -1 -1 -1 0 1 -1 1 -1
15 -1 1 1 -1 -1 1 0 -1 -1 -1
16 -1 0 1 1 -1 -1 1 -1 1 0
17 -1 1 1 -1 -1 1 -1 -1 -1 1
18 -1 1 1 -1 -1 -1 -1 -1 1 1
19 -1 1 1 -1 -1 -1 -1 0 -1 -1
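Since the question mentions roughly 60 such DataFrames, a minimal sketch (assuming they are collected in a Python list called frames and that df2 is the single 1 x 14 comparison frame) would apply the same expression to each one; note that NaN cells compare as False on both sides and therefore end up as 0:
row = df2.squeeze()                               # 1 x 14 frame -> Series indexed by column label
results = [1 * (f > row) - (f < row) for f in frames]   # list of 1/0/-1 DataFrames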
Related
I am currently filtering my dataset with statements such as:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
data1 = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                     columns=iris['feature_names'] + ['target'])
# filter dataset
data1[(data1['sepal length (cm)'] > 4) | (data1['sepal width (cm)'] > 3)]
I want to be able to get the next 10 rows following each filter match too, and I am not sure how to even start. For example, when the filter finds a row where the length is greater than 4, I want to return that row as well as the next 10, and so on.
Please let me know how I can do this.
The dataset that you loaded has a sequential index starting at 0.
To get the 10 rows following a filter match, you are looking for rows whose index lies between the index of the most recent matching row and that index + 10.
However, the filter you provided, (data1['sepal length (cm)'] > 4) | (data1['sepal width (cm)'] > 3), matches every row (the minimum sepal length in the dataset is 4.3, and the minimum sepal width is 2). So for this illustration I'll use the filter sepal length (cm) == 4.6 and take the next 5 rows instead of 10.
# flag the rows that match
filt = data1['sepal length (cm)'] == 4.6
# record the index of each matching row, then forward-fill it
data1.loc[filt, 'sentinel'] = data1.index[filt]
data1.sentinel = data1.sentinel.ffill()
# keep rows between the most recent match and 5 rows after it
data1[(data1.index >= data1.sentinel) & (data1.index <= data1.sentinel + 5)]
This yields the 21 rows below:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target sentinel
3 4.6 3.1 1.5 0.2 0.0 3.0
4 5.0 3.6 1.4 0.2 0.0 3.0
5 5.4 3.9 1.7 0.4 0.0 3.0
6 4.6 3.4 1.4 0.3 0.0 6.0
7 5.0 3.4 1.5 0.2 0.0 6.0
8 4.4 2.9 1.4 0.2 0.0 6.0
9 4.9 3.1 1.5 0.1 0.0 6.0
10 5.4 3.7 1.5 0.2 0.0 6.0
11 4.8 3.4 1.6 0.2 0.0 6.0
22 4.6 3.6 1.0 0.2 0.0 22.0
23 5.1 3.3 1.7 0.5 0.0 22.0
24 4.8 3.4 1.9 0.2 0.0 22.0
25 5.0 3.0 1.6 0.2 0.0 22.0
26 5.0 3.4 1.6 0.4 0.0 22.0
27 5.2 3.5 1.5 0.2 0.0 22.0
47 4.6 3.2 1.4 0.2 0.0 47.0
48 5.3 3.7 1.5 0.2 0.0 47.0
49 5.0 3.3 1.4 0.2 0.0 47.0
50 7.0 3.2 4.7 1.4 1.0 47.0
51 6.4 3.2 4.5 1.5 1.0 47.0
52 6.9 3.1 4.9 1.5 1.0 47.0
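An alternative sketch, not from the answer above, that avoids the helper column: treat the boolean filter as 0/1 and take a backward-looking rolling maximum, so a row is kept whenever any of the rows within the preceding window matched. A window of 6 corresponds to the matching row plus the next 5, and should select the same rows as the sentinel approach:
window = 6   # the matching row itself plus the 5 rows after it
mask = filt.astype(int).rolling(window, min_periods=1).max().astype(bool)
data1[mask]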
I am still learning web scraping and would appreciate any help that I can get. Thanks to help from the community I was able to successfully scrape NBA player data (player name and player stats) and concatenate the data into one dataframe.
Here is the code below:
import pandas as pd
import requests
url = 'https://www.espn.com/nba/team/stats/_/name/lal/season/2020/seasontype/2'
df = pd.read_html(url)
df_concat = pd.concat([df[0], df[1], df[3]], axis=1)
I would now like to iterate through multiple urls to get data for different teams and then combine all of the different teams into one dataframe.
Here is the code that I have so far:
import pandas as pd
teams = ['chi','den','lac']
for team in teams:
    print(team)
    url = 'https://www.espn.com/nba/team/stats/_/name/{team}/season/2020/seasontype/2'.format(team=team)
    print(url)
    df = pd.read_html(url)
    df_concat = pd.concat([df[0], df[1], df[3]], axis=1)
I tried changing 'lal' in the URL to the variable team. When I ran this script the scrape was really, really slow and only gave me a DataFrame for the team 'lac', not 'chi' or 'den'. Any advice on the best way to do this? I have never tried scraping multiple URLs.
Again, I would like the data for each team combined into one large DataFrame if possible. Thanks in advance for any help that you may offer. I will learn a lot from this project. =)
The principle is the same: build a list of DataFrames and pass it to pd.concat. For example:
import requests
import pandas as pd
teams = ["chi", "den", "lac"]
dfs_to_concat = []
for team in teams:
    print(team)
    url = "https://www.espn.com/nba/team/stats/_/name/{team}/season/2020/seasontype/2".format(
        team=team
    )
    print(url)
    df = pd.read_html(url)
    df_concat = pd.concat([df[0], df[1], df[3]], axis=1)
    dfs_to_concat.append(df_concat)
df_final = pd.concat(dfs_to_concat)
print(df_final)
df_final.to_csv("data.csv", index=False)
Prints:
chi
https://www.espn.com/nba/team/stats/_/name/chi/season/2020/seasontype/2
den
https://www.espn.com/nba/team/stats/_/name/den/season/2020/seasontype/2
lac
https://www.espn.com/nba/team/stats/_/name/lac/season/2020/seasontype/2
Name GP GS MIN PTS OR DR REB AST STL BLK TO PF AST/TO PER FGM FGA FG% 3PM 3PA 3P% FTM FTA FT% 2PM 2PA 2P% SC-EFF SH-EFF
0 Zach LaVine SG 60 60.0 34.8 25.5 0.7 4.1 4.8 4.2 1.5 0.5 3.4 2.2 1.2 19.52 9.0 20.0 45.0 3.1 8.1 38.0 4.5 5.6 80.2 5.9 11.9 49.7 1.276 0.53
1 Lauri Markkanen PF 50 50.0 29.8 14.7 1.2 5.1 6.3 1.5 0.8 0.5 1.6 1.9 0.9 14.32 5.0 11.8 42.5 2.2 6.3 34.4 2.5 3.1 82.4 2.8 5.5 51.8 1.247 0.52
2 Coby White PG 65 1.0 25.8 13.2 0.4 3.1 3.5 2.7 0.8 0.1 1.7 1.8 1.6 11.92 4.8 12.2 39.4 2.0 5.8 35.4 1.6 2.0 79.1 2.8 6.4 43.0 1.085 0.48
3 Otto Porter Jr. SF 14 9.0 23.6 11.9 0.9 2.5 3.4 1.8 1.1 0.4 0.8 2.2 2.3 15.87 4.4 10.0 44.3 1.7 4.4 38.7 1.4 1.9 70.4 2.7 5.6 48.7 1.193 0.53
4 Wendell Carter Jr. C 43 43.0 29.2 11.3 3.2 6.2 9.4 1.2 0.8 0.8 1.7 3.8 0.7 15.51 4.3 8.0 53.4 0.1 0.7 20.7 2.6 3.5 73.7 4.1 7.3 56.4 1.411 0.54
5 Thaddeus Young PF 64 16.0 24.9 10.3 1.5 3.5 4.9 1.8 1.4 0.4 1.6 2.1 1.1 13.36 4.2 9.4 44.8 1.2 3.5 35.6 0.7 1.1 58.3 3.0 5.9 50.1 1.097 0.51
6 Tomas Satoransky SG 65 64.0 28.9 9.9 1.2 2.7 3.9 5.4 1.2 0.1 2.0 2.1 2.7 13.52 3.6 8.5 43.0 1.0 3.1 32.2 1.6 1.9 87.6 2.7 5.4 49.1 1.169 0.49
7 Chandler Hutchison F 28 10.0 18.8 7.8 0.6 3.2 3.9 0.9 1.0 0.3 1.0 1.7 1.0 12.45 2.9 6.3 45.7 0.4 1.4 31.6 1.6 2.8 59.0 2.4 4.9 49.6 1.246 0.49
8 Kris Dunn PG 51 32.0 24.9 7.3 0.5 3.2 3.6 3.4 2.0 0.3 1.3 3.1 2.5 12.15 3.0 6.7 44.4 0.6 2.2 25.9 0.8 1.1 74.1 2.4 4.5 53.5 1.091 0.49
9 Denzel Valentine SG 36 5.0 13.6 6.8 0.3 1.8 2.1 1.2 0.7 0.2 0.7 1.4 1.7 13.09 2.7 6.6 40.9 1.3 3.8 33.6 0.2 0.2 75.0 1.4 2.8 51.0 1.038 0.51
10 Luke Kornet C 36 14.0 15.5 6.0 0.6 1.7 2.3 0.9 0.3 0.7 0.4 1.5 2.3 12.70 2.3 5.2 43.8 0.9 3.0 28.7 0.6 0.8 71.4 1.4 2.2 64.6 1.150 0.52
11 Daniel Gafford C 43 7.0 14.2 5.1 1.2 1.3 2.5 0.5 0.3 1.3 0.7 2.3 0.7 16.21 2.2 3.1 70.1 0.0 0.0 0.0 0.7 1.4 53.3 2.2 3.1 70.1 1.642 0.70
12 Shaquille Harrison G 43 10.0 11.3 4.9 0.5 1.5 2.0 1.1 0.8 0.4 0.4 1.3 2.6 17.81 1.8 3.8 46.7 0.4 1.0 38.1 0.9 1.2 78.0 1.4 2.9 49.6 1.267 0.52
13 Ryan Arcidiacono PG 58 4.0 16.0 4.5 0.3 1.6 1.9 1.7 0.5 0.1 0.6 1.7 2.6 9.04 1.6 3.8 40.9 0.9 2.4 39.1 0.5 0.7 71.1 0.6 1.4 43.9 1.186 0.53
14 Cristiano Felicio PF 22 0.0 17.5 3.9 2.5 2.1 4.6 0.7 0.5 0.1 0.8 1.5 0.9 12.79 1.5 2.5 63.0 0.0 0.1 0.0 0.8 1.0 78.3 1.5 2.4 65.4 1.593 0.63
15 Adam Mokoka SG 11 0.0 10.2 2.9 0.6 0.3 0.9 0.4 0.4 0.0 0.2 1.5 2.0 8.18 1.1 2.5 42.9 0.5 1.4 40.0 0.2 0.4 50.0 0.5 1.2 46.2 1.143 0.54
16 Max Strus SG 2 0.0 3.0 2.5 0.5 0.0 0.5 0.0 0.0 0.0 0.0 0.5 0.0 30.82 1.0 1.5 66.7 0.0 0.5 0.0 0.5 0.5 100.0 1.0 1.0 100.0 1.667 0.67
17 Total 65 NaN NaN 106.8 10.5 31.4 41.9 23.2 10.0 4.1 14.6 21.8 1.6 NaN 39.6 88.6 44.7 12.2 35.1 34.8 15.5 20.5 75.5 27.4 53.5 51.1 1.205 0.52
0 Nikola Jokic C 73 73.0 32.0 19.9 2.3 7.5 9.7 7.0 1.2 0.6 3.1 3.0 2.3 24.97 7.7 14.7 52.8 1.1 3.5 31.4 3.4 4.1 81.7 6.6 11.2 59.4 1.359 0.56
1 Jamal Murray PG 59 59.0 32.3 18.5 0.8 3.2 4.0 4.8 1.1 0.3 2.2 1.7 2.2 17.78 6.9 15.2 45.6 1.9 5.5 34.6 2.8 3.1 88.1 5.0 9.7 51.9 1.220 0.52
2 Will Barton SF 58 58.0 33.0 15.1 1.3 5.0 6.3 3.7 1.1 0.5 1.5 2.1 2.4 15.70 5.7 12.7 45.0 1.9 5.0 37.5 1.8 2.3 76.7 3.9 7.8 49.8 1.184 0.52
3 Jerami Grant SF 71 24.0 26.6 12.0 0.8 2.7 3.5 1.2 0.7 0.8 0.9 2.2 1.4 14.46 4.3 8.9 47.8 1.4 3.5 38.9 2.1 2.8 75.0 2.9 5.4 53.7 1.342 0.56
4 Paul Millsap PF 51 48.0 24.3 11.6 1.9 3.8 5.7 1.6 0.9 0.6 1.4 2.9 1.2 16.96 4.1 8.6 48.2 1.1 2.4 43.5 2.3 2.8 81.6 3.1 6.2 50.0 1.349 0.54
5 Gary Harris SG 56 55.0 31.8 10.4 0.5 2.4 2.9 2.1 1.4 0.3 1.1 2.1 2.0 9.78 3.9 9.3 42.0 1.3 3.8 33.3 1.3 1.6 81.5 2.6 5.5 47.9 1.119 0.49
6 Michael Porter Jr. SF 55 8.0 16.4 9.3 1.2 3.5 4.7 0.8 0.5 0.5 0.9 1.8 0.9 19.84 3.5 7.0 50.9 1.1 2.7 42.2 1.1 1.3 83.3 2.4 4.3 56.4 1.337 0.59
7 Monte Morris PG 73 12.0 22.4 9.0 0.3 1.5 1.9 3.5 0.8 0.2 0.7 1.0 4.8 14.98 3.6 7.8 45.9 0.9 2.4 37.8 1.0 1.2 84.3 2.7 5.4 49.5 1.166 0.52
8 Malik Beasley SG * 41 0.0 18.2 7.9 0.2 1.7 1.9 1.2 0.8 0.1 0.9 1.2 1.3 10.51 2.9 7.3 38.9 1.4 3.9 36.0 0.8 0.9 86.8 1.4 3.4 42.1 1.080 0.49
9 Mason Plumlee C 61 1.0 17.3 7.2 1.6 3.6 5.2 2.5 0.5 0.6 1.3 2.3 1.9 18.86 2.9 4.7 61.5 0.0 0.1 0.0 1.4 2.5 53.5 2.9 4.6 62.5 1.517 0.61
10 PJ Dozier SG 29 0.0 14.2 5.8 0.3 1.6 1.9 2.2 0.5 0.2 0.9 1.6 2.3 11.66 2.2 5.4 41.4 0.6 1.7 34.7 0.7 1.0 72.4 1.7 3.7 44.4 1.070 0.47
11 Bol Bol C 7 0.0 12.4 5.7 0.7 2.0 2.7 0.9 0.3 0.9 1.4 1.6 0.6 14.41 2.0 4.0 50.0 0.6 1.3 44.4 1.1 1.4 80.0 1.4 2.7 52.6 1.429 0.57
12 Torrey Craig SF 58 27.0 18.5 5.4 1.1 2.2 3.3 0.8 0.4 0.6 0.4 2.3 1.9 10.79 2.1 4.6 46.1 0.8 2.4 32.6 0.4 0.6 61.1 1.4 2.3 60.3 1.171 0.54
13 Keita Bates-Diop SF * 7 0.0 14.0 5.3 0.6 1.9 2.4 0.0 0.3 0.6 0.4 1.0 0.0 12.13 1.9 4.0 46.4 0.4 1.3 33.3 1.1 1.4 80.0 1.4 2.7 52.6 1.321 0.52
14 Troy Daniels G * 6 0.0 12.7 4.3 0.0 1.0 1.0 0.5 0.5 0.0 0.5 1.2 1.0 5.35 1.7 4.7 35.7 1.0 3.3 30.0 0.0 0.0 0.0 0.7 1.3 50.0 0.929 0.46
15 Juancho Hernangomez PF * 34 0.0 12.4 3.1 0.7 2.1 2.8 0.6 0.1 0.1 0.5 0.9 1.2 6.89 1.1 3.2 34.5 0.4 1.8 25.0 0.5 0.7 64.0 0.7 1.5 46.0 0.973 0.41
16 Jordan McRae G * 4 0.0 8.0 2.3 0.3 1.0 1.3 1.0 0.5 0.3 0.0 0.5 inf 16.74 0.5 1.5 33.3 0.5 1.0 50.0 0.8 1.0 75.0 0.0 0.5 0.0 1.500 0.50
17 Tyler Cook F * 2 0.0 9.5 2.0 1.0 1.0 2.0 0.0 1.0 0.0 1.0 0.5 0.0 11.31 0.5 1.0 50.0 0.0 0.0 0.0 1.0 1.0 100.0 0.5 1.0 50.0 2.000 0.50
18 Noah Vonleh F * 7 0.0 4.3 1.9 0.4 0.7 1.1 0.3 0.0 0.0 0.3 0.6 1.0 17.61 0.7 0.9 83.3 0.1 0.1 100.0 0.3 0.6 50.0 0.6 0.7 80.0 2.167 0.92
19 Vlatko Cancar SF 14 0.0 3.2 1.2 0.4 0.4 0.7 0.2 0.1 0.1 0.2 0.5 1.0 11.45 0.4 1.1 40.0 0.1 0.4 16.7 0.3 0.3 100.0 0.4 0.6 55.6 1.133 0.43
20 Jarred Vanderbilt PF * 9 0.0 4.6 1.1 0.3 0.6 0.9 0.2 0.3 0.1 0.8 0.7 0.3 7.20 0.6 0.8 71.4 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0.8 71.4 1.429 0.71
21 Total 73 NaN NaN 111.3 10.8 33.4 44.1 26.7 8.0 4.6 13.1 20.3 2.0 NaN 42.0 88.9 47.3 11.0 30.6 35.9 16.2 20.9 77.7 31.1 58.3 53.3 1.252 0.53
0 Kawhi Leonard SF 57 57.0 32.4 27.1 0.9 6.1 7.1 4.9 1.8 0.6 2.6 2.0 1.9 26.91 9.3 19.9 47.0 2.2 5.7 37.8 6.2 7.1 88.6 7.2 14.2 50.6 1.362 0.52
1 Paul George SG 48 48.0 29.6 21.5 0.5 5.2 5.7 3.9 1.4 0.4 2.6 2.4 1.5 21.14 7.1 16.3 43.9 3.3 7.9 41.2 4.0 4.5 87.6 3.9 8.4 46.4 1.321 0.54
2 Montrezl Harrell C 63 2.0 27.8 18.6 2.6 4.5 7.1 1.7 0.6 1.1 1.7 2.3 1.0 23.26 7.5 12.9 58.0 0.0 0.3 0.0 3.7 5.6 65.8 7.5 12.6 59.3 1.445 0.58
3 Lou Williams SG 65 8.0 28.7 18.2 0.5 2.6 3.1 5.6 0.7 0.2 2.8 1.2 2.0 17.38 6.0 14.4 41.8 1.7 4.8 35.2 4.5 5.2 86.1 4.3 9.6 45.1 1.266 0.48
4 Marcus Morris Sr. SF * 19 19.0 28.9 10.1 0.6 3.5 4.1 1.4 0.7 0.7 1.3 2.7 1.1 8.96 3.9 9.2 42.5 1.4 4.4 31.0 0.9 1.2 81.8 2.5 4.7 53.3 1.103 0.50
5 Reggie Jackson PG * 17 6.0 21.3 9.5 0.4 2.6 3.0 3.2 0.3 0.2 1.6 2.2 1.9 12.66 3.4 7.5 45.3 1.5 3.7 41.3 1.1 1.2 90.5 1.9 3.8 49.2 1.258 0.55
6 Landry Shamet SG 53 30.0 27.4 9.3 0.1 1.8 1.9 1.9 0.4 0.2 0.8 2.7 2.4 8.51 3.0 7.4 40.4 2.1 5.6 37.5 1.2 1.4 85.5 0.9 1.8 49.5 1.258 0.55
7 Ivica Zubac C 72 70.0 18.4 8.3 2.7 4.8 7.5 1.1 0.2 0.9 0.8 2.3 1.3 21.75 3.3 5.3 61.3 0.0 0.0 0.0 1.7 2.3 74.7 3.3 5.3 61.6 1.548 0.61
8 Patrick Beverley PG 51 50.0 26.3 7.9 1.1 4.1 5.2 3.6 1.1 0.5 1.3 3.1 2.8 12.54 2.9 6.7 43.1 1.6 4.0 38.8 0.6 0.9 66.0 1.3 2.6 49.6 1.188 0.55
9 JaMychal Green PF 63 1.0 20.7 6.8 1.2 4.9 6.2 0.8 0.5 0.4 0.9 2.8 0.9 11.11 2.4 5.6 42.9 1.5 3.8 38.7 0.6 0.8 75.0 0.9 1.8 51.8 1.222 0.56
10 Maurice Harkless SF * 50 38.0 22.8 5.5 0.9 3.1 4.0 1.0 1.0 0.6 0.9 2.4 1.0 9.70 2.2 4.3 51.6 0.5 1.5 37.0 0.5 0.8 57.1 1.7 2.9 59.0 1.267 0.58
11 Patrick Patterson PF 59 18.0 13.2 4.9 0.6 2.0 2.6 0.7 0.1 0.1 0.4 0.9 2.0 11.57 1.6 3.9 40.8 1.1 2.9 39.0 0.6 0.7 81.4 0.5 1.0 45.9 1.253 0.55
12 Mfiondu Kabengele F 12 0.0 5.3 3.5 0.1 0.8 0.9 0.2 0.2 0.2 0.2 0.8 1.0 18.28 1.2 2.7 43.8 0.8 1.7 45.0 0.4 0.4 100.0 0.4 1.0 41.7 1.313 0.58
13 Rodney McGruder SG 56 4.0 15.6 3.3 0.5 2.2 2.7 0.6 0.5 0.1 0.4 1.3 1.5 6.75 1.3 3.2 39.8 0.4 1.6 27.0 0.3 0.6 55.9 0.9 1.6 52.2 1.033 0.46
14 Amir Coffey SG 18 1.0 8.8 3.2 0.2 0.7 0.9 0.8 0.3 0.1 0.4 1.1 1.8 8.55 1.3 3.0 42.6 0.3 1.1 31.6 0.3 0.6 54.5 0.9 1.9 48.6 1.074 0.48
15 Jerome Robinson SG * 42 1.0 11.3 2.9 0.1 1.3 1.4 1.1 0.3 0.2 0.6 1.3 1.8 4.86 1.1 3.2 33.8 0.5 1.6 28.4 0.3 0.5 57.9 0.6 1.6 39.1 0.897 0.41
16 Joakim Noah C 5 0.0 10.0 2.8 1.0 2.2 3.2 1.4 0.2 0.2 1.2 1.8 1.2 11.11 0.8 1.6 50.0 0.0 0.0 0.0 1.2 1.6 75.0 0.8 1.6 50.0 1.750 0.50
17 Terance Mann SG 41 6.0 8.8 2.4 0.2 1.1 1.3 1.3 0.3 0.1 0.4 1.1 2.9 10.58 0.9 1.9 46.8 0.2 0.5 35.0 0.4 0.7 66.7 0.7 1.4 50.8 1.253 0.51
18 Derrick Walton Jr. G * 23 1.0 9.7 2.2 0.1 0.6 0.7 1.0 0.2 0.0 0.2 0.8 5.5 8.43 0.7 1.6 47.2 0.4 0.9 42.9 0.3 0.4 77.8 0.3 0.7 53.3 1.389 0.60
19 Johnathan Motley F 13 0.0 3.2 2.2 0.2 0.5 0.8 0.6 0.2 0.0 0.4 0.5 1.6 28.53 0.8 1.2 73.3 0.1 0.1 100.0 0.4 0.5 71.4 0.8 1.1 71.4 1.867 0.77
20 Total 72 NaN NaN 116.3 10.7 37.0 47.7 23.7 7.1 4.7 14.0 22.1 1.7 NaN 41.6 89.2 46.6 12.4 33.5 37.1 20.8 26.3 79.1 29.1 55.8 52.2 1.304 0.54
and creates data.csv.
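One optional tweak (not part of the answer above): if you want to tell the teams apart in the combined frame, you could label each team's rows inside the loop before appending, for example with a hypothetical "team" column:
df_concat = pd.concat([df[0], df[1], df[3]], axis=1)
df_concat["team"] = team            # tag the rows with the team code
dfs_to_concat.append(df_concat)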
I want to use the nth value within each group as a column, without aggregating rows, because I want to create a feature that can be tracked at any point using window and aggregation functions.
R:
library(tidyverse)
iris %>% arrange(Species, Sepal.Length) %>% group_by(Species) %>%
mutate(cs = cumsum(Sepal.Length), cs4th = cumsum(Sepal.Length)[4]) %>%
slice(c(1:4))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species cs cs4th
<dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
1 4.3 3 1.1 0.1 setosa 4.3 17.5
2 4.4 2.9 1.4 0.2 setosa 8.7 17.5
3 4.4 3 1.3 0.2 setosa 13.1 17.5
4 4.4 3.2 1.3 0.2 setosa 17.5 17.5
5 4.9 2.4 3.3 1 versicolor 4.9 20
6 5 2 3.5 1 versicolor 9.9 20
7 5 2.3 3.3 1 versicolor 14.9 20
8 5.1 2.5 3 1.1 versicolor 20 20
9 4.9 2.5 4.5 1.7 virginica 4.9 22
10 5.6 2.8 4.9 2 virginica 10.5 22
11 5.7 2.5 5 2 virginica 16.2 22
12 5.8 2.7 5.1 1.9 virginica 22 22
Python: Too long and verbose!
import numpy as np
import pandas as pd
import seaborn as sns
iris = sns.load_dataset('iris')
iris.sort_values(['species','sepal_length']).assign(
    index_species=lambda x: x.groupby('species').cumcount(),
    cs=lambda x: x.groupby('species').sepal_length.cumsum(),
    tmp=lambda x: np.where(x.index_species==3, x.cs, 0),
    cs4th=lambda x: x.groupby('species').tmp.transform(sum)
).iloc[list(range(0,4))+list(range(50,54))+list(range(100,104))]
sepal_length sepal_width petal_length ... cs tmp cs4th
13 4.3 3.0 1.1 ... 4.3 0.0 17.5
8 4.4 2.9 1.4 ... 8.7 0.0 17.5
38 4.4 3.0 1.3 ... 13.1 0.0 17.5
42 4.4 3.2 1.3 ... 17.5 17.5 17.5
57 4.9 2.4 3.3 ... 4.9 0.0 20.0
60 5.0 2.0 3.5 ... 9.9 0.0 20.0
93 5.0 2.3 3.3 ... 14.9 0.0 20.0
98 5.1 2.5 3.0 ... 20.0 20.0 20.0
106 4.9 2.5 4.5 ... 4.9 0.0 22.0
121 5.6 2.8 4.9 ... 10.5 0.0 22.0
113 5.7 2.5 5.0 ... 16.2 0.0 22.0
101 5.8 2.7 5.1 ... 22.0 22.0 22.0
Python: my better solution (not elegant; there is room for improvement around the groupby specifics):
iris.sort_values(['species','sepal_length']).assign(
    cs=lambda x: x.groupby('species').sepal_length.transform('cumsum'),
    cs4th=lambda x: x.merge(
        x.groupby('species', as_index=False).nth(3).loc[:,['species','cs']], on='species')
        .iloc[:,-1]
)
This, however, does not work:
iris.groupby('species').transform('nth(3)')
Here is an updated solution, using Pandas, which is still longer than what you will get with dplyr:
import seaborn as sns
import pandas as pd
iris = sns.load_dataset('iris')
iris['cs'] = (iris
              .sort_values(['species','sepal_length'])
              .groupby('species')['sepal_length']
              .transform('cumsum'))

M = (iris
     .sort_values(['species','cs'])
     .groupby('species')['cs'])
groupby has an nth function that gets you a row per group: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.nth.html
iris = (iris
        .sort_values(['species','cs'])
        .reset_index(drop=True)
        .merge(M.nth(3), how='left', on='species')
        .rename(columns={'cs_x':'cs',
                         'cs_y':'cs4th'})
        )
iris.head()
sepal_length sepal_width petal_length petal_width species cs cs4th
0 4.3 3.0 1.1 0.1 setosa 4.3 17.5
1 4.4 2.9 1.4 0.2 setosa 8.7 17.5
2 4.4 3.0 1.3 0.2 setosa 13.1 17.5
3 4.4 3.2 1.3 0.2 setosa 17.5 17.5
4 4.5 2.3 1.3 0.3 setosa 22.0 17.5
Update: 16/04/2021 ... Below is a better way to achieve the OP's goal:
(iris
 .sort_values(['species', 'sepal_length'])
 .assign(cs = lambda df: df.groupby('species')
                           .sepal_length
                           .transform('cumsum'),
         cs4th = lambda df: df.groupby('species')
                              .cs
                              .transform('nth', 3)
         )
 .groupby('species')
 .head(4)
)
sepal_length sepal_width petal_length petal_width species cs cs4th
13 4.3 3.0 1.1 0.1 setosa 4.3 17.5
8 4.4 2.9 1.4 0.2 setosa 8.7 17.5
38 4.4 3.0 1.3 0.2 setosa 13.1 17.5
42 4.4 3.2 1.3 0.2 setosa 17.5 17.5
57 4.9 2.4 3.3 1.0 versicolor 4.9 20.0
60 5.0 2.0 3.5 1.0 versicolor 9.9 20.0
93 5.0 2.3 3.3 1.0 versicolor 14.9 20.0
98 5.1 2.5 3.0 1.1 versicolor 20.0 20.0
106 4.9 2.5 4.5 1.7 virginica 4.9 22.0
121 5.6 2.8 4.9 2.0 virginica 10.5 22.0
113 5.7 2.5 5.0 2.0 virginica 16.2 22.0
101 5.8 2.7 5.1 1.9 virginica 22.0 22.0
Now you can do it in a non-verbose way, as you did in R, with datar in Python:
>>> from datar.datasets import iris
>>> from datar.all import f, arrange, group_by, mutate, cumsum, slice
>>>
>>> (iris >>
... arrange(f.Species, f.Sepal_Length) >>
... group_by(f.Species) >>
... mutate(cs=cumsum(f.Sepal_Length), cs4th=cumsum(f.Sepal_Length)[3]) >>
... slice(f[1:4]))
Sepal_Length Sepal_Width Petal_Length Petal_Width Species cs cs4th
0 4.3 3.0 1.1 0.1 setosa 4.3 17.5
1 4.4 2.9 1.4 0.2 setosa 8.7 17.5
2 4.4 3.0 1.3 0.2 setosa 13.1 17.5
3 4.4 3.2 1.3 0.2 setosa 17.5 17.5
4 4.9 2.4 3.3 1.0 versicolor 4.9 20.0
5 5.0 2.0 3.5 1.0 versicolor 9.9 20.0
6 5.0 2.3 3.3 1.0 versicolor 14.9 20.0
7 5.1 2.5 3.0 1.1 versicolor 20.0 20.0
8 4.9 2.5 4.5 1.7 virginica 4.9 22.0
9 5.6 2.8 4.9 2.0 virginica 10.5 22.0
10 5.7 2.5 5.0 2.0 virginica 16.2 22.0
11 5.8 2.7 5.1 1.9 virginica 22.0 22.0
[Groups: ['Species'] (n=3)]
I am the author of the package. Feel free to submit issues if you have any questions.
How can I change the values of column 4 to 1 and -1, so that Iris-setosa is replaced with 1 and Iris-virginica with -1?
0 1 2 3 4
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
5 5.4 3.9 1.7 0.4 Iris-setosa
6 4.6 3.4 1.4 0.3 Iris-setosa
.. ... ... ... ... ...
120 6.9 3.2 5.7 2.3 Iris-virginica
121 5.6 2.8 4.9 2.0 Iris-virginica
122 7.7 2.8 6.7 2.0 Iris-virginica
123 6.3 2.7 4.9 1.8 Iris-virginica
124 6.7 3.3 5.7 2.1 Iris-virginica
125 7.2 3.2 6.0 1.8 Iris-virginica
126 6.2 2.8 4.8 1.8 Iris-virginica
I would appreciate the help.
You can use replace:
d = {'Iris-setosa': 1, 'Iris-virginica': -1}
df['4'] = df['4'].replace(d)
0 1 2 3 4
0 5.1 3.5 1.4 0.2 1
1 4.9 3.0 1.4 0.2 1
2 4.7 3.2 1.3 0.2 1
3 4.6 3.1 1.5 0.2 1
4 5.0 3.6 1.4 0.2 1
5 5.4 3.9 1.7 0.4 1
6 4.6 3.4 1.4 0.3 1
.. ... ... ... ... ...
120 6.9 3.2 5.7 2.3 -1
121 5.6 2.8 4.9 2.0 -1
122 7.7 2.8 6.7 2.0 -1
123 6.3 2.7 4.9 1.8 -1
124 6.7 3.3 5.7 2.1 -1
125 7.2 3.2 6.0 1.8 -1
126 6.2 2.8 4.8 1.8 -1
df.loc[df["4"] == "Iris-setosa", "4"] = 1
df.loc[df["4"] == "Iris-virginica", "4"] = -1
I would do something like this:
def encode_row(row):
    if row[4] == "Iris-setosa":
        return 1
    return -1

df_test[4] = df_test.apply(encode_row, axis=1)
assuming that df_test is your DataFrame.
Sounds like
df['4'] = np.where(df['4'] == 'Iris-setosa', 1, -1)
should do the job
I am trying to convert 10 years (1991-2000) of daily temperature data into monthly values in pandas, using Python 2.7. I took the data from this web page: http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/201j_EN.htm. But I have run into trouble. The data looks as follows:
datum d_ta d_tx d_tn d_rs d_rf d_ss
---------- ----- ----- ----- ----- ---- -----
1991-01-01 3.0 5.4 1.5 0.2 1 0.0
1991-01-02 4.0 7.2 1.9 0.0 1 6.8
1991-01-03 6.0 8.8 3.6 0.0 1 2.5
1991-01-04 3.7 7.6 2.3 . 2.9
1991-01-05 4.9 7.2 1.5 . 0.0
1991-01-06 2.7 6.2 0.5 . 0.9
1991-01-07 4.0 8.4 1.9 . 3.2
1991-01-08 6.7 8.9 4.6 0.0 0 0.0
1991-01-09 4.1 8.0 3.0 0.3 0 0.0
1991-01-10 4.2 8.1 2.4 0.0 0 0.2
1991-01-11 4.7 6.9 3.6 . 0.7
1991-01-12 7.0 9.8 3.2 . 0.1
1991-01-13 6.3 8.2 4.6 . 0.0
1991-01-14 3.7 6.8 2.2 . 4.7
1991-01-15 0.7 3.4 -1.0 . 7.6
1991-01-16 -1.4 1.4 -3.0 . 7.5
1991-01-17 -2.5 2.1 -5.0 . 8.1
1991-01-18 -1.8 4.0 -5.1 . 7.0
1991-01-19 -3.0 0.1 -4.0 . 5.8
1991-01-20 -2.8 0.5 -5.2 . 5.6
1991-01-21 -5.0 -1.7 -7.8 . 0.0
1991-01-22 -3.3 -1.8 -4.2 . 0.0
1991-01-23 -1.7 0.4 -2.5 . 0.0
1991-01-24 0.0 3.2 -1.6 . 2.2
1991-01-25 1.1 5.1 -0.9 . 6.4
1991-01-26 0.6 4.5 -0.5 . 7.1
1991-01-27 -1.5 2.2 -4.0 . 0.0
1991-01-28 1.3 5.6 -0.8 . 3.8
1991-01-29 0.7 2.6 -0.4 . 1.1
1991-01-30 0.3 4.0 -1.2 . 7.3
1991-01-31 -5.0 -0.2 -7.4 . 8.0
1991-02-01 -8.1 -3.7 -11.7 . 7.6
1991-02-02 -7.0 -2.0 -10.2 . 7.4
1991-02-03 -5.3 0.8 -9.9 . 7.8
1991-02-04 -5.1 -2.3 -7.7 0.1 4 3.7
1991-02-05 -7.5 -4.4 -8.3 . 2.6
1991-02-06 -7.1 -2.2 -11.0 2.0 4 4.9
1991-02-07 -1.8 0.0 -2.7 2.7 4 0.0
1991-02-08 -1.8 0.4 -3.6 21.8 4 0.0
1991-02-09 0.8 2.0 -0.2 1.3 1 0.0
1991-02-10 1.6 3.4 -0.2 3.4 1 0.0
1991-02-11 0.7 2.5 -0.5 1.1 4 0.0
1991-02-12 -0.5 1.2 -1.0 4.7 4 0.0
1991-02-13 -2.0 -0.8 -2.6 0.0 4 0.0
1991-02-14 -1.8 1.4 -3.5 0.1 4 6.3
1991-02-15 -4.2 -0.8 -6.4 . 8.4
1991-02-16 -5.6 -2.4 -9.5 0.1 4 1.5
1991-02-17 -1.3 1.9 -3.8 . 8.3
1991-02-18 -1.3 4.5 -5.5 . 8.5
1991-02-19 -1.5 3.6 -4.7 . 5.8
1991-02-20 -1.4 4.7 -5.4 . 7.3
1991-02-21 1.0 6.1 -2.1 . 6.9
1991-02-22 4.1 10.1 0.5 . 3.2
1991-02-23 5.1 9.7 2.9 . 7.5
1991-02-24 6.0 8.6 5.5 0.0 1 1.8
1991-02-25 3.6 9.2 0.6 . 8.1
1991-02-26 3.9 9.3 1.2 . 2.9
1991-02-27 3.1 6.5 0.3 . 8.8
1991-02-28 1.4 5.3 -2.4 . 4.3
1991-03-01 1.7 3.5 -0.2 . 0.0
1991-03-02 2.4 3.3 1.7 0.8 4 0.0
1991-03-03 3.1 3.8 1.7 . 0.0
1991-03-04 4.3 6.2 2.7 . 1.5
1991-03-05 3.0 5.7 0.6 . 1.2
.........
Can somebody please help me convert it into monthly values? Thanks!
After copying the table into memory starting from the numbers:
import pandas, bs4, requests, itertools, io
html = requests.get("http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/201j_EN.htm").text
soup = bs4.BeautifulSoup(html, "html.parser")
# the manual way:
# data = pandas.read_clipboard(names=["datum", "d_ta", "d_tx", "d_tn", "d_rs", "d_rf", "d_ss"], index_col='datum', parse_dates=['datum'])
# the automatic way:
table_html = '\n'.join(itertools.islice(map(lambda _: _.text, soup.find_all("pre")), 3, None))
data = pandas.read_table(io.StringIO(table_html), header=None, sep=r'\s+', index_col=0, parse_dates=[0],
                         names=["datum", "d_ta", "d_tx", "d_tn", "d_rs", "d_rf", "d_ss"])
data.resample('M').mean()
You can, of course, use an aggregation function other than the mean. The output:
d_ta d_tx d_tn d_rf d_ss
datum
1991-01-31 1.345161 4.609677 -0.574194 3.000000 1.583333
1991-02-28 -1.142857 2.592857 -3.639286 5.157143 1.516667
1991-03-31 8.158065 12.093548 5.141935 2.645161 0.775000
1991-04-30 9.920000 14.570000 6.510000 4.066667 4.450000
1991-05-31 13.396774 17.780645 9.738710 4.529032 4.280000
...
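As a further sketch, resample('M').agg lets you use a different aggregation per column; for example (assuming d_tx and d_tn are the daily maximum and minimum temperatures):
# monthly mean temperature, monthly maximum of the daily maxima,
# and monthly minimum of the daily minima
monthly = data.resample('M').agg({'d_ta': 'mean', 'd_tx': 'max', 'd_tn': 'min'})
monthly.head()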