I'm new to tsfresh. When I run the following lines, I get the extracted features as desired:
import numpy as np
import pandas as pd
from tsfresh.feature_extraction import ComprehensiveFCParameters
from tsfresh import extract_features
df = pd.DataFrame(np.array([[1, 2, 3, 4],[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]),
columns=['Context ID','Time Elapsed', 'time_serie A', 'time_serie B'])
settings = ComprehensiveFCParameters()
kind_to_fc_parameters = {
"time_serie A": {},
"time_serie B": {"mean": None}
}
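# Use a separate name for the result so the imported extract_features function is not overwritten
extracted_features = extract_features(df, kind_to_fc_parameters=kind_to_fc_parameters,
                                      column_id='Context ID', column_sort="Time Elapsed")
extracted_features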
However, when I replace {"mean": None} with {"absolute_maximum": None} or {"count_above": [{"t": 0.05}]}, it no longer works:
module 'tsfresh.feature_extraction.feature_calculators' has no attribute 'absolute_maximum'
What am I missing?
I had a similar issue with another calculator I chose: it simply was not in feature_calculators.py (you can open the file from yourdirectory\Python\Python37\Lib\site-packages\tsfresh\feature_extraction). I ran pip install tsfresh -U in a terminal to get the latest tsfresh and checked feature_calculators.py again. The function I wanted was there, and the code then ran fine.
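Before upgrading, you can confirm this diagnosis without opening the file by checking whether the installed module exposes the calculator names you pass in. The helper below is only a minimal sketch: missing_calculators is a hypothetical name, not part of tsfresh, and it assumes that each feature calculator is a module-level attribute of tsfresh.feature_extraction.feature_calculators, which is what the error message suggests.

```python
import importlib


def missing_calculators(names, module_name="tsfresh.feature_extraction.feature_calculators"):
    """Return the names that are not attributes of the given module.

    Hypothetical helper for illustration; it is not part of tsfresh.
    """
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        # Names taken from the question above.
        print(missing_calculators(["mean", "absolute_maximum", "count_above"]))
    except ImportError:
        print("tsfresh is not installed in this environment")
```

Any name that appears in the returned list is absent from the installed version, in which case upgrading as described above is the likely fix.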
I am trying to run the code below. It executes without errors but does not display any output, and I have no idea why.
Why is this happening?
import pandas as pd
import qgrid
df = pd.DataFrame({'A': [1.2, 'foo', 4], 'B': [3, 4, 5]})
df = df.set_index(pd.Index(['bar', 7, 3.2]))
view = qgrid.show_grid(df, grid_options={'fullWidthRows': True}, show_toolbar=True)
view
See also the attached screenshot.
I can validate a DataFrame index using the DataFrameSchema like this:
import pandas as pd
import pandera as pa
from pandera import Column, DataFrameSchema, Check, Index
schema = DataFrameSchema(
columns={
"column1": pa.Column(int),
},
index=pa.Index(int, name="index_name"),
)
# raises the error as expected
schema.validate(
pd.DataFrame({"column1": [1, 2, 3]}, index=pd.Index([1, 2, 3], name="index_incorrect_name"))
)
Is there a way to do the same using a SchemaModel?
You can do it as follows:
import pandas as pd
import pandera as pa
from pandera.typing import Index, Series

class Schema(pa.SchemaModel):
    # check_name=True makes validation compare the index name with the attribute name ("idx")
    idx: Index[int] = pa.Field(ge=0, check_name=True)
    column1: Series[int]

df = pd.DataFrame({"column1": [1, 2, 3]}, index=pd.Index([1, 2, 3], name="index_incorrect_name"))
Schema.validate(df)
I found an answer on GitHub.
You can use pa.typing.Index to type-annotate an index:
import pandera as pa

class Schema(pa.SchemaModel):
    column1: pa.typing.Series[int]
    index_name: pa.typing.Index[int] = pa.Field(check_name=True)

To validate a MultiIndex, see: https://pandera.readthedocs.io/en/stable/schema_models.html#multiindex
I have a problem. I want to plot the 5 most frequent names. Unfortunately, the names contain not only Latin letters but also Chinese characters. As soon as I draw the plot, I get:
C:\Users\user\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:240: RuntimeWarning: Glyph 32422 missing from current font.
How can I fix this warning?
import pandas as pd
import seaborn as sns
d = {'id': [1, 2, 3, 4, 5],
'name': ['Max Power', 'Jessica', '约翰·多伊', '哈拉尔量杯', 'Frank High'],
}
df = pd.DataFrame(data=d)
print(df)
df_count = df['name'].value_counts()[:5]
ax = sns.barplot(x=df_count.index, y=df_count)
I need to modify my code:
db_profit_platform=db[['Source','Device','Country','Profit']]
db_profit_final=db_profit_platform.groupby(['Source','Device','Country'])['Profit'].apply(sum).reset_index()
Now I need to add Bid and get the average bid after the group by (different aggregations for different columns), so that the result has these columns:
Source, Device, Country, SumProfit, Average Bid
How can I do this? (I may need more aggregations later.) Thanks.
You can use the agg function. Here is a minimal working example:
import numpy as np
import pandas as pd
size = 10
db = pd.DataFrame({
'Source': np.random.randint(1, 3, size=size),
'Device': np.random.randint(1, 3, size=size),
'Country': np.random.randint(1, 3, size=size),
'Profit': np.random.randn(size),
'Bid': np.random.randn(size)
})
db.groupby(["Source", "Device", "Country"]).agg(
sum_profit=("Profit", "sum"),
avg_bid=("Bid", "mean")
)
See also the official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.agg.html
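Since the question asks for flat columns and mentions that more aggregations may follow, here is a small sketch of the same named-aggregation pattern on fixed data, so the result can be checked by hand. The data values and the extra MaxBid column are made up for illustration; only the column names Source, Device, Country, Profit and Bid come from the question.

```python
import pandas as pd

# Small deterministic frame (values invented for this example).
db = pd.DataFrame({
    "Source": ["a", "a", "b", "b"],
    "Device": ["x", "x", "x", "y"],
    "Country": ["US", "US", "DE", "DE"],
    "Profit": [1.0, 2.0, 3.0, 4.0],
    "Bid": [0.5, 1.5, 2.0, 4.0],
})

summary = (
    db.groupby(["Source", "Device", "Country"])
      .agg(
          SumProfit=("Profit", "sum"),   # output name = (input column, aggregation)
          AverageBid=("Bid", "mean"),
          MaxBid=("Bid", "max"),         # extra aggregation, added for illustration
      )
      .reset_index()                     # turn the group keys back into ordinary columns
)
print(summary)
```

Each keyword argument adds one output column, so further aggregations only need one more line inside agg.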
If a city is mentioned in cities_specific, I would like to create a flag for it in the cities_all data. This is just a minimal example; in reality I would like to create several such flags based on multiple data frames, which is why I tried to solve it with isin instead of a join.
However, I am running into ValueError: Length of values (3) does not match length of index (7).
# import packages
import pandas as pd
import numpy as np
# create minimal data
cities_specific = pd.DataFrame({'city': ['Melbourne', 'Cairns', 'Sydney'],
'n': [10, 4, 8]})
cities_all = pd.DataFrame({'city': ['Vancouver', 'Melbourne', 'Athen', 'Vienna', 'Cairns',
'Berlin', 'Sydney'],
'inhabitants': [675218, 5000000, 664046, 1897000, 150041, 3769000, 5312000]})
# get value error
# how can this be solved differently?
cities_all.assign(in_cities_specific=np.where(cities_specific.city.isin(cities_all.city), '1', '0'))
# that's the solution I would like to get
expected_solution = pd.DataFrame({'city': ['Vancouver', 'Melbourne', 'Athen', 'Vienna', 'Cairns',
'Berlin', 'Sydney'],
'inhabitants': [675218, 5000000, 664046, 1897000, 150041, 3769000, 5312000],
'in_cities': [0, 1, 0, 0, 1, 0, 1]})
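I think you have swapped the two Series in the condition. isin has to be called on cities_all.city so that the result has one value per row of cities_all (7) rather than one per row of cities_specific (3).
Here are some alternatives: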
cities_all.assign(
in_cities_specific=np.where(cities_all.city.isin(cities_specific.city), '1', '0')
)
or
cities_all["in_cities_specific"] = (
    cities_all["city"].isin(cities_specific["city"]).astype(int).astype(str)
)
or
condlist = [cities_all["city"].isin(cities_specific["city"])]
choicelist = ["1"]
cities_all["in_cities_specific"] = np.select(condlist, choicelist, default="0")
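The question also mentions creating several flags from several data frames. One way to extend the isin approach is to loop over a dictionary of lookup frames, as in the sketch below. The second frame, cities_capitals, and the lookups dictionary are hypothetical and exist only for this example; the flags are stored as integers to match the expected solution in the question.

```python
import pandas as pd

cities_all = pd.DataFrame({
    "city": ["Vancouver", "Melbourne", "Athen", "Vienna", "Cairns", "Berlin", "Sydney"],
})
cities_specific = pd.DataFrame({"city": ["Melbourne", "Cairns", "Sydney"]})
# Hypothetical second lookup frame, invented for this example.
cities_capitals = pd.DataFrame({"city": ["Vienna", "Berlin", "Athen"]})

# Map each new flag column name to the frame it is derived from.
lookups = {
    "in_cities_specific": cities_specific,
    "in_cities_capitals": cities_capitals,
}

# isin is called on cities_all, so every flag has one value per row of cities_all.
flags = {
    name: cities_all["city"].isin(frame["city"]).astype(int)
    for name, frame in lookups.items()
}
cities_flagged = cities_all.assign(**flags)
print(cities_flagged)
```

Adding another flag then only requires one more entry in lookups.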