I have a DataFrame as follows:
id        return1  return2
weekday1  0.1      0.2
weekday1  0.2      0.4
weekday1  0.3      0.5
weekday2  0.4      0.7
weekday2  0.5      0.6
weekday2  0.6      0.1
I know how to do the rolling groupby sum:
(df.groupby(df.index.dayofweek)  # originally the index is a time series
   .rolling(52).sum()
   .droplevel(level=0).sort_index())
Now I need to add 1 to all the elements first and then multiply those in the same group as follows.
Step 1 - add 1:
id        return1  return2
weekday1  1.1      1.2
weekday1  1.2      1.4
weekday1  1.3      1.5
weekday2  1.4      1.7
weekday2  1.5      1.6
weekday2  1.6      1.1
Step 2 - multiply by group:
id        return1      return2
weekday1  1.1×1.2×1.3  1.2×1.4×1.5
weekday2  1.4×1.5×1.6  1.7×1.6×1.1
I tried the following code:
(df.transform(lambda x: x + 1)
   .groupby(df.index.dayofweek)
   .rolling(52).mul()
   .droplevel(level=0).sort_index())
but it raises AttributeError: 'RollingGroupby' object has no attribute 'mul'.
cumprod() doesn't work either. Perhaps this is because of the rolling part, since there is no such thing as rolling.cumprod() or rolling.mul().
Is there a better way to do the multiplication within a group over a rolling window?
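Use numpy.prod in Rolling.apply:
import numpy as np
# raw=True passes each window to np.prod as a plain ndarray
df.add(1).groupby(df.index.dayofweek).rolling(52).apply(np.prod, raw=True)
Incidentally, judging from the expected output, you may only need GroupBy.prod:
df.add(1).groupby(df.index).prod()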
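A self-contained sketch of both suggestions on the sample data from the question. It assumes the group labels are kept in a separate array (the real data groups a DatetimeIndex by dayofweek) and uses a window of 3 instead of 52 so the 3-row sample groups actually produce values:

```python
import numpy as np
import pandas as pd

# Sample data from the question; labels kept apart from the numeric columns
ids = np.array(["weekday1"] * 3 + ["weekday2"] * 3)
df = pd.DataFrame({
    "return1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "return2": [0.2, 0.4, 0.5, 0.7, 0.6, 0.1],
})

growth = df.add(1)  # step 1: add 1 to every element

# Whole-group product: one row per label (the expected output in the question)
total = growth.groupby(ids).prod()

# Rolling product within each group; window=3 fits the sample (the question uses 52)
rolling = (
    growth.groupby(ids)
    .rolling(3)
    .apply(np.prod, raw=True)
    .droplevel(level=0)
    .sort_index()
)

print(total)
print(rolling)
```

The first two rows of each group in `rolling` are NaN because a full window of 3 is not yet available; passing `min_periods=1` to `rolling` would give partial products instead.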
I am trying to compute the difference between a list of values and each array stored in a dictionary. Each key in the dictionary holds 20 y-values for a predicted line, and I want the difference between these y-values and a different set of given values.
ydata contains 20 points. ycalc has 100 keys, from L0 to L99, and each key holds 20 points as well. I want to subtract each key's array from ydata. This is what I have tried; the main issue is that my method returns a single list of 20 values, whereas I expect 100 results, each of which is a list of 20 points.
ydata = [1.2, 1.8, 1.7, 3.0, 3.5, 3.2, 4.5, 4.8, 5.3, 6.2, 5.7, 6.8, 7.0, 7.8, 8.5, 8.6, 9.1, 11.5, 10.3, 10.8]
ycalc = 'L0': array([-0.8, -0.6, -0.4, -0.2, 0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2,
1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. ]), 'L1': array([-0.57777778, -0.37777778, -0.17777778, 0.02222222, 0.22222222,
0.42222222, 0.62222222, 0.82222222, 1.02222222, 1.22222222,
1.42222222, 1.62222222, 1.82222222, 2.02222222, 2.22222222,
2.42222222, 2.62222222, 2.82222222, 3.02222222, 3.22222222]), 'L2': array([-0.35555556, -0.15555556, 0.04444444, 0.24444444, 0.44444444,
0.64444444, 0.84444444, 1.04444444, 1.24444444, 1.44444444,
1.64444444, 1.84444444, 2.04444444, 2.24444444, 2.44444444,
2.64444444, 2.84444444, 3.04444444, 3.24444444, 3.44444444]), 'L3': array([-0.13333333, 0.06666667, 0.26666667, 0.46666667, 0.66666667,
0.86666667, 1.06666667, 1.26666667, 1.46666667, 1.66666667,
1.86666667, 2.06666667, 2.26666667, 2.46666667, 2.66666667,
2.86666667, 3.06666667, 3.26666667, 3.46666667, 3.66666667]), 'L4': array([0.08888889, 0.28888889, 0.48888889, 0.68888889, 0.88888889,
1.08888889, 1.28888889, 1.48888889, 1.68888889, 1.88888889,
2.08888889, 2.28888889, 2.48888889, 2.68888889, 2.88888889,
3.08888889, 3.28888889, 3.48888889, 3.68888889, 3.88888889]), etc.
for i in ycalc:
    ydiff = - i + array(ydata)
print(ydiff)
This returns [-0.2 0. -0.5 0.4 0.5 -0.2 0.7 0.6 0.7 1.2 0.3 1. 0.8 1.2 1.5 1.2 1.3 3.3 1.7 1.8]
but I want something like this:
([-0.2 0. -0.5 0.4 0.5 -0.2 0.7 0.6 0.7 1.2 0.3 1. 0.8 1.2 1.5 1.2 1.3 3.3 1.7 1.8]), ([-0.3 0.1 -0.6 0.4 0.5 -0.2 0.2 0.6 0.8 1.2 0.5 1. 0.8 1.2 1.5 1.2 1.3 3.3 1.7 1.8]), etc.
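The thread has no answer here; one common approach is a dict comprehension over `ycalc.items()` (iterating the dict directly yields only the string keys, and reassigning `ydiff` inside a loop keeps only the last result). This is a sketch with a hypothetical two-key stand-in for `ycalc`, built to match the first two arrays printed above; it computes ydata minus each prediction, as `- i + array(ydata)` does:

```python
import numpy as np

ydata = np.array([1.2, 1.8, 1.7, 3.0, 3.5, 3.2, 4.5, 4.8, 5.3, 6.2,
                  5.7, 6.8, 7.0, 7.8, 8.5, 8.6, 9.1, 11.5, 10.3, 10.8])

# Hypothetical stand-in for ycalc: 2 keys instead of 100,
# matching the 'L0' and 'L1' arrays shown in the question
ycalc = {
    "L0": np.linspace(-0.8, 3.0, 20),
    "L1": np.linspace(-0.8, 3.0, 20) + 2 / 9,
}

# One 20-point difference array per key, keyed like ycalc
ydiff = {key: ydata - yc for key, yc in ycalc.items()}

# Alternatively a 2-D array, one row per key in ycalc's insertion order
ydiff_matrix = ydata - np.array(list(ycalc.values()))

for key, diff in ydiff.items():
    print(key, diff)
```

The 2-D version relies on NumPy broadcasting: `ydata` has shape `(20,)` and the stacked predictions have shape `(n_keys, 20)`, so the subtraction happens row by row.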
I'm currently trying to get into regular expressions. At the moment I want to write one that behaves as follows:
import re
a = '[[0.1 0.1 0.1 0.1]\n [1.2 1.2 1.2 1.2]\n [2.3 2.3 2.3 2.3]\n [3.4 3.4 3.4 3.4]]'
a_transformed = re.sub(regex_expression, '', a)
# a_transformed = '0.1 0.1 0.1 0.1 1.2 1.2 1.2 1.2 2.3 2.3 2.3 2.3 3.4 3.4 3.4 3.4'
Basically I only need to remove all occurrences of '\n', '[' and ']', but I'm struggling to get the expression right.
Thanks in advance for the help!
You can try the following:
>>> re.sub(r'[^\d. ]', '', a)
'0.1 0.1 0.1 0.1 1.2 1.2 1.2 1.2 2.3 2.3 2.3 2.3 3.4 3.4 3.4 3.4'
Here '[^\d. ]' matches any character except a digit, a '.' or a literal space, so the brackets and the newlines are removed. A ^ at the start of [] negates the character class.
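Two alternatives, sketched on the same string: deleting exactly the characters named in the question, or extracting the numbers and re-joining them. The second assumes every number has the `digits.digits` form shown in the sample:

```python
import re

a = '[[0.1 0.1 0.1 0.1]\n [1.2 1.2 1.2 1.2]\n [2.3 2.3 2.3 2.3]\n [3.4 3.4 3.4 3.4]]'

# Option 1: delete only '[', ']' and newline; brackets are escaped inside the class
only_brackets = re.sub(r'[\[\]\n]', '', a)

# Option 2: pull out the numbers and join them with single spaces
numbers_only = ' '.join(re.findall(r'\d+\.\d+', a))

print(only_brackets)
print(numbers_only)
```

Option 1 is closest to the stated requirement, while option 2 also normalises any irregular spacing between the numbers.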
I want to print out a list of tuples in a formatted form.
I have the following list, in which each tuple is made of a string and another tuple of floats:
returnList = [('mountain_113', (1.5, 1.0, 1.0338541666666667, 1.9166666666666667, 0.6614583333333334, 1.2598673502604167, 0.03375385780420761, 2.3029198140922946, 0.1698906919822926, 0.746665039060726)),
('street_gre295', (2.0, 1.033203125, 0.84375, 0.7421875, 0.9375, 0.654083251953125, 1.9498253377306005, 1.7506276776130898, 1.1736444973883702, 0.6882098607982887)),
('opencountry_134', (1.0, 0.99609375, 1.10546875, 1.875, 0.9296875, 1.740234375, 0.015625, 1.90625, 0.0625, 0.75))]
I have the following code:
def format(inter): return f'{inter:.1f}'
[print(' '.join( list(map(format, n[1])) )) if i!=0 else print(n[0]) for i,n in enumerate(returnList)]
However, it raises the following error:
Traceback (most recent call last):
File "main.py", line 69, in <module>
[print(' '.join( list(map(format, n[1])) )) if i!=0 else print(n[0]) for i,n in enumerate(listStatics)]
File "main.py", line 69, in <listcomp>
[print(' '.join( list(map(format, n[1])) )) if i!=0 else print(n[0]) for i,n in enumerate(listStatics)]
TypeError: 'float' object is not iterable
I need to print them as follows: print the first element of the tuple, then print each float of the second element using the format function.
I would like it to be printed using a list comprehension. For the data sample it should output the data below:
mountain_113 1.5 1.0 1.0 1.9 0.7 1.3 0.0 2.3 0.2 0.7
street_gre295 2.0 1.0 0.8 0.7 0.9 0.7 1.9 1.8 1.2 0.7
opencountry_134 1.0 1.0 1.1 1.9 0.9 1.7 0.0 1.9 0.1 0.8
for i, t in returnList:
    print(i, " ".join(map("{:.1f}".format, t)))
Prints:
mountain_113 1.5 1.0 1.0 1.9 0.7 1.3 0.0 2.3 0.2 0.7
street_gre295 2.0 1.0 0.8 0.7 0.9 0.7 1.9 1.8 1.2 0.7
opencountry_134 1.0 1.0 1.1 1.9 0.9 1.7 0.0 1.9 0.1 0.8
If you want to call your format function:
def format(inter):
    return f"{inter:.1f}"

for i, t in returnList:
    print(i, " ".join(map(format, t)))
This should work:
for t in returnList:
    datapoints = [f'{dt:.1f}' for dt in t[1]]  # one decimal, as required
    print('{} {}'.format(t[0], " ".join(datapoints)))
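Since the question asks for a list comprehension, a sketch that uses the comprehension to build the output lines (rather than calling print for its side effect inside it) and prints them once:

```python
returnList = [
    ('mountain_113', (1.5, 1.0, 1.0338541666666667, 1.9166666666666667, 0.6614583333333334,
                      1.2598673502604167, 0.03375385780420761, 2.3029198140922946,
                      0.1698906919822926, 0.746665039060726)),
    ('street_gre295', (2.0, 1.033203125, 0.84375, 0.7421875, 0.9375, 0.654083251953125,
                       1.9498253377306005, 1.7506276776130898, 1.1736444973883702,
                       0.6882098607982887)),
    ('opencountry_134', (1.0, 0.99609375, 1.10546875, 1.875, 0.9296875, 1.740234375,
                         0.015625, 1.90625, 0.0625, 0.75)),
]

# One string per tuple: the name, then each float with one decimal
lines = [f"{name} " + " ".join(f"{v:.1f}" for v in values)
         for name, values in returnList]

print("\n".join(lines))
```

Unpacking `name, values` directly in the comprehension also removes the need for `enumerate` and the `i != 0` check.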
How do I replicate the structure of the result of itertools.product?
As you know, itertools.product returns an iterator, so we need to put its items into a list before we can print them,
something like this, right?
import itertools
import numpy as np
CN = np.asarray(list(itertools.product([0, 1], repeat=5)))  # list() materialises the iterator
print(CN)
I want to make something like that, but with the data coming from a CSV file, so I want something like this:
#PSEUDOCODE
import pandas as pd
df = pd.read_csv('csv here')
# a, b, c, d are the columns I want to get
x = list(df['a'] df['b'] df['c'] df['d'])
print(x)
So the result would be something like this:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]]
How can I do that?
EDIT:
I am trying to learn how to do recursive feature elimination, and I saw in some code found on Google that they use the iris dataset:
from sklearn import datasets
dataset = datasets.load_iris()
x = dataset.data
print(x)
When printed, it looks something like this:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]]
How can I make my dataset look like that so I can use this RFE template?
# Recursive Feature Elimination
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
# load the iris datasets
dataset = datasets.load_iris()
# create a base classifier used to evaluate a subset of attributes
model = LogisticRegression()
# create the RFE model and select 3 attributes
rfe = RFE(model, n_features_to_select=3)
print(rfe)
rfe = rfe.fit(dataset.data, dataset.target)
print("features:",dataset.data)
print("target:",dataset.target)
print(rfe)
# summarize the selection of the attributes
print(rfe.support_)
print(rfe.ranking_)
You don't have to. To use the rfe.fit function, you only need to pass the features and the target separately, and a DataFrame works for that.
So if your df looks like this:
a b c d target
0 5.1 3.5 1.4 0.2 1
1 4.9 3.0 1.4 0.2 1
2 4.7 3.2 1.3 0.2 0
3 4.6 3.1 1.5 0.2 0
4 5.0 3.6 1.4 0.2 1
5 5.4 3.9 1.7 0.4 1
6 4.6 3.4 1.4 0.3 0
7 5.0 3.4 1.5 0.2 0
8 4.4 2.9 1.4 0.2 1
9 4.9 3.1 1.5 0.1 1
you can use:
...
rfe = rfe.fit(df[['a', 'b', 'c', 'd']], df['target'])
...
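If you still want an array printed like `dataset.data`, selecting the feature columns and calling `.to_numpy()` gives that shape. A sketch, where the inline CSV text is a hypothetical stand-in for `pd.read_csv('csv here')` using the table above:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file, using the rows from the table above
csv_text = """a,b,c,d,target
5.1,3.5,1.4,0.2,1
4.9,3.0,1.4,0.2,1
4.7,3.2,1.3,0.2,0
4.6,3.1,1.5,0.2,0
5.0,3.6,1.4,0.2,1
5.4,3.9,1.7,0.4,1
4.6,3.4,1.4,0.3,0
5.0,3.4,1.5,0.2,0
4.4,2.9,1.4,0.2,1
4.9,3.1,1.5,0.1,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# 2-D array: one row per sample, one column per feature (like iris.data)
X = df[["a", "b", "c", "d"]].to_numpy()
# 1-D array of labels (like iris.target)
y = df["target"].to_numpy()

print(X)
```

`X` and `y` can then be passed to `rfe.fit(X, y)` in place of `dataset.data` and `dataset.target`.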
I have a DataFrame (df) in pandas with four columns, and I want a new column holding the mean of these four columns: df['mean'] = df.mean(axis=1)
1 2 3 4 mean
NaN NaN NaN NaN NaN
5.9 5.4 2.4 3.2 4.225
0.6 0.7 0.7 0.7 0.675
2.5 1.6 1.5 1.2 1.700
0.4 0.4 0.4 0.4 0.400
So far so good. But when I save the results to a csv file this is what I found:
5.9,5.4,2.4,3.2,4.2250000000000005
0.6,0.7,0.7,0.7,0.6749999999999999
2.5,1.6,1.5,1.2,1.7
0.4,0.4,0.4,0.4,0.4
I guess I can force the format of the mean column, but does anyone know why this is happening?
I am using winpython with python 3.3.2 and pandas 0.11.0
You could use the float_format parameter:
import io
import pandas as pd
content = '''\
1 2 3 4 mean
NaN NaN NaN NaN NaN
5.9 5.4 2.4 3.2 4.225
0.6 0.7 0.7 0.7 0.675
2.5 1.6 1.5 1.2 1.700
0.4 0.4 0.4 0.4 0.400'''
df = pd.read_csv(io.StringIO(content), sep=r'\s+')
df.to_csv('/tmp/test.csv', float_format='%g', index=False)
yields
1,2,3,4,mean
,,,,
5.9,5.4,2.4,3.2,4.225
0.6,0.7,0.7,0.7,0.675
2.5,1.6,1.5,1.2,1.7
0.4,0.4,0.4,0.4,0.4
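A variant that computes the mean instead of reading it back, writes to an in-memory buffer rather than `/tmp`, and uses a fixed three-decimal format. The `'%.3f'` choice and the column names are assumptions for illustration:

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan] * 4,
     [5.9, 5.4, 2.4, 3.2],
     [0.6, 0.7, 0.7, 0.7],
     [2.5, 1.6, 1.5, 1.2],
     [0.4, 0.4, 0.4, 0.4]],
    columns=["1", "2", "3", "4"],
)
df["mean"] = df.mean(axis=1)  # all-NaN row gives NaN

# Option 1: format every float column with three decimals when writing
buf = io.StringIO()
df.to_csv(buf, float_format="%.3f", index=False)
print(buf.getvalue())

# Option 2: round only the mean column and leave the others as they are
df["mean"] = df["mean"].round(3)
```

`float_format` affects every float column, which is why 5.9 becomes 5.900 here; rounding the column only touches the mean.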
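The answers are correct. Most decimal fractions, such as 5.9 or 0.7, cannot be represented exactly as binary floating point numbers, so small rounding errors are bound to appear. The DataFrame display rounds to pandas' display precision and hides them, while to_csv writes the full value. Read The Floating Point Guide.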
>>> a = 5.9+5.4+2.4+3.2
>>> a / 4
4.2250000000000005
As you said, you could always format the results if you want to get only a fixed number of points after the decimal.
>>> "{:.3f}".format(a/4)
'4.225'
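To see the effect directly, the standard-library `decimal` module can show the binary value actually stored for a float, and can also do exact decimal arithmetic when the inputs are given as strings. A small sketch:

```python
from decimal import Decimal

# Decimal(float) reveals the exact binary value stored for the literal 5.9
print(Decimal(5.9))

# Decimals built from strings are exact, so the mean comes out as 4.225
exact = (Decimal("5.9") + Decimal("5.4") + Decimal("2.4") + Decimal("3.2")) / 4
print(exact)
```

Decimal arithmetic is slower than float arithmetic, so for a DataFrame the usual approach is still to keep floats and format on output.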