Pandas dynamic rolling on the dataframe

I have a pandas DataFrame:
import yfinance as yf
import numpy as np
from scipy.signal import argrelextrema
import matplotlib.pyplot as plt
import pandas as pd
n = 2
df = yf.Ticker("INFY.NS").history(period='400d', interval='1D')
df['max'] = df.iloc[argrelextrema(df['Close'].values, np.greater_equal, order=n)[0]]['Close']
print(df)
I have created a column named max, which holds the values shown in the screenshot. The screenshot is only for reference; sample data can be obtained by running the code above.
I want to compare the non-NaN max values with each other, but only in the forward direction.
For example, 777.244202 should be compared with every later value in the "max" column that is higher than 777.244202,
and I want to print the rows that have a 0.618 Fibonacci retracement with 777.244202.
Is there a simpler way to do this in pandas?
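A minimal sketch of the forward-only comparison, run on made-up numbers rather than the yfinance data. The function name forward_higher_pairs is hypothetical, and the formula high - ratio * (high - low) is only one common reading of a "0.618 retracement" between two peaks; the question does not say which definition is intended, so treat that column as an assumption.

```python
import numpy as np
import pandas as pd


def forward_higher_pairs(max_col: pd.Series, ratio: float = 0.618) -> pd.DataFrame:
    """Pair every non-NaN value with each later, strictly higher non-NaN value."""
    peaks = max_col.dropna()
    values = peaks.to_numpy()
    labels = peaks.index.to_numpy()

    # triu_indices with k=1 yields every (i, j) with i < j, i.e. forward only.
    i_idx, j_idx = np.triu_indices(len(values), k=1)
    keep = values[j_idx] > values[i_idx]
    i_idx, j_idx = i_idx[keep], j_idx[keep]

    low, high = values[i_idx], values[j_idx]
    return pd.DataFrame({
        "start": labels[i_idx],
        "start_max": low,
        "later": labels[j_idx],
        "later_max": high,
        # Assumed definition: price level after retracing `ratio` of the move.
        "retracement_level": high - ratio * (high - low),
    })


# Hypothetical stand-in for df['max']: NaN everywhere except at local maxima.
sample = pd.Series([np.nan, 10.0, np.nan, 8.0, 12.0, np.nan, 11.0], name="max")
pairs = forward_higher_pairs(sample)
print(pairs)
```

With the real data, the same call would be forward_higher_pairs(df['max']), and the resulting frame can be filtered on whatever retracement condition is actually wanted.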

Related

Change pandas DataFrame to numpy array but keeping column names

I have a pandas DataFrame from the sklearn.datasets Boston house price data and am trying to convert this to a numpy array but keeping column names. Here is the code I tried:
from sklearn import datasets ## imports datasets from scikit-learn
import numpy as np
import pandas as pd
data = datasets.load_boston() ## loads Boston dataset from datasets library
df = pd.DataFrame(data.data, columns=data.feature_names)
X = df.to_numpy()
print(X.dtype.names)
However this returns None and therefore column names are not kept. Does anyone understand why?
Thanks

Try this:
w = data.feature_names.reshape(13, 1)
X = np.vstack((w.T, data.data))
print(X)
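As for why the original attempt prints None: to_numpy() returns a plain ndarray with a single dtype, and a plain ndarray has no field names. A possible alternative, sketched here on a small made-up frame (the column names CRIM and ZN are borrowed from the Boston data purely for illustration), is DataFrame.to_records(index=False), which returns a record array whose dtype carries the column names.

```python
import pandas as pd

# Small stand-in frame; the real data would come from the dataset loader.
df = pd.DataFrame({"CRIM": [0.1, 0.2], "ZN": [18.0, 0.0]})

plain = df.to_numpy()                  # homogeneous ndarray, no field names
records = df.to_records(index=False)   # record array, one field per column

print(plain.dtype.names)    # None
print(records.dtype.names)  # ('CRIM', 'ZN')
print(records["ZN"])        # fields can be accessed by column name
```

Note that load_boston itself was removed in scikit-learn 1.2, so the question's code only runs on older versions.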

np.argmax does not return the correct index

I used np.argmax to search for the index of the highest value of this array:
And it returned 720. It was supposed to be 721. I tried to google the problem but haven't found the solution yet.
Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
from statsmodels.tsa.stattools import acf, pacf
dir='C:\\Users\\DELL\\Google Drive\\JVN couse materials\\Projects\\Practice projects\\Time series project\\energydata_complete.csv'
rawdata=pd.read_csv(dir, index_col='date')
timeseries=pd.DataFrame(rawdata['Appliances'])
timeseries.index=pd.to_datetime(timeseries.index)
timeseries['Log scale']=np.log10(timeseries['Appliances'])
lag_pacf = pacf(timeseries.loc['2016-01-12':'2016-01-21','Log scale'], nlags=1439, method='ols')
highest_pacf_lag=np.argmax(lag_pacf[1:]) ###this is where the problem happens

The CSV file numbers its values from 1, whereas Python (and NumPy and pandas too) is zero-indexed. Hence cell 721 is shown as 720 in Python.
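Another possible contributor, visible in the code itself, is the slice lag_pacf[1:]: np.argmax returns a position relative to the array it is given, so the result is relative to the slice and not to the full array. A small sketch with made-up values (the names values and offset are illustrative only):

```python
import numpy as np

values = np.array([1.0, 0.2, 0.9, 0.4])
offset = 1  # skip the first element, as lag_pacf[1:] does

idx_in_slice = int(np.argmax(values[offset:]))  # position within the slice
idx_in_full = idx_in_slice + offset             # position within the full array

print(idx_in_slice, idx_in_full)
```

Under that reading, adding the slice offset back (here, + 1) would give the lag in terms of the original array.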

Why does Pandas say this data frame has only one column?

I began a python course in linear and logistic regression but I am encountering what is probably a stupid error. I have to work with this data frame:
http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
And this is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
rwq = pd.read_csv('*filepath*/winequality-red.csv')
rows = len(rwq.index)
cols = rwq.shape[1]
When I print rows and cols, rows correctly prints 1599, but for some reason cols always equals 1 (when in fact there are 12 columns).
I also tried 'len(rwq.columns)' and I still get 1.
Am I doing something wrong or is the problem with the file provided?
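A likely cause is the delimiter: the UCI wine-quality files separate fields with semicolons, while pd.read_csv splits on commas by default, so each row is read as a single field. The sketch below uses a tiny in-memory stand-in for the file (three made-up columns instead of twelve) to show the difference.

```python
import io

import pandas as pd

# In-memory stand-in for winequality-red.csv, which is semicolon-delimited.
sample = io.StringIO(
    "fixed acidity;volatile acidity;quality\n"
    "7.4;0.7;5\n"
    "7.8;0.88;5\n"
)

wrong = pd.read_csv(sample)            # default sep=',' gives one wide column
sample.seek(0)
right = pd.read_csv(sample, sep=";")   # split on semicolons instead

print(wrong.shape)
print(right.shape)
```

For the real file, the equivalent change would be passing sep=';' to the existing read_csv call.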

Apply log2 transformation to a pandas DataFrame

I want to apply log2 to a DataFrame with applymap and np.log2 and show the result using a boxplot. Here is the code I have written:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = pd.read_csv('testdata.csv')
df = pd.DataFrame(data)
################################
# a.
df.boxplot()
plt.title('Raw Data')
################################
# b.
df.applymap(np.log2)
df.boxplot()
plt.title('Normalized Data')
Below is the boxplot I get for my raw data, which is fine, but I get the same boxplot after applying the log2 transformation. Can anyone tell me what I am doing wrong, and what should be changed to get the normalized data with applymap and np.log2?

A much faster way to do this would be:
df = np.log2(df)
Don't forget to assign the result back to df.

According to the API reference for DataFrame.applymap(func):
Apply a function to a DataFrame that is intended to operate
elementwise, i.e. like doing map(func, series) for each series in the
DataFrame
It won't change the DataFrame in place; you need to capture the return value and use it.

Pandas now has the transform() function, which in your case amounts to:
df = df.transform(lambda x: np.log2(x))
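The point the answers above share is that the transformation returns a new object and leaves the original untouched. A short sketch on made-up data (the column names a and b are illustrative) shows the difference between discarding and keeping the result. It uses np.log2 directly because applymap was deprecated in pandas 2.1 in favour of DataFrame.map.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [8.0, 16.0, 32.0]})
before = df.copy()

np.log2(df)                      # result discarded: df is not modified
unchanged = df.equals(before)    # True, which is why the boxplot looked the same

logged = np.log2(df)             # keep the returned DataFrame instead
print(logged)
```

Calling logged.boxplot() would then plot the transformed values.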

Loading SKLearn cancer dataset into Pandas DataFrame

I'm trying to load a scikit-learn dataset into a DataFrame, but a column is missing according to the keys (target_names, target and DESCR). I have tried various ways to include the last column, but they all raise errors.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
print(cancer.keys())
The keys are ['target_names', 'data', 'target', 'DESCR', 'feature_names'].
data = pd.DataFrame(cancer.data, columns=[cancer.feature_names])
print(data.describe())
With the code above I only get 30 columns, when I need 31. What is the best way to load scikit-learn datasets into a pandas DataFrame?

Another option, but a one-liner, to create the dataframe including the features and target variables is:
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
df = pd.DataFrame(np.c_[cancer['data'], cancer['target']],
                  columns=np.append(cancer['feature_names'], ['target']))

If you want to have a target column you will need to add it because it's not in cancer.data. cancer.target has the column with 0 or 1, and cancer.target_names has the label. I hope the following is what you want:
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
print(cancer.keys())
# Pass the names directly; wrapping them in a list would create a MultiIndex.
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
print(data.describe())
data = data.assign(target=pd.Series(cancer.target))
print(data.describe())
# In case you want labels instead of numbers.
data.replace(to_replace={'target': {0: cancer.target_names[0]}}, inplace=True)
data.replace(to_replace={'target': {1: cancer.target_names[1]}}, inplace=True)
print(data.shape)  # data.describe() won't show the "target" column here because its values were converted to strings.

This works too, also using pd.Series.
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
print(cancer.keys())
data = pd.DataFrame(cancer.data, columns=cancer.feature_names)
data['Target'] = pd.Series(data=cancer.target, index=data.index)
print(data.keys())
print(data.shape)

Only target column is missing, so you can just add one.
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['target'] = cancer.target

Mapping target names can be handled elegantly using map():
data["target"] = pd.Categorical(pd.Series(cancer.target).map(lambda x: cancer.target_names[x]))

As of scikit-learn 0.23 you can do the following to get a DataFrame with the target column included.
df = load_breast_cancer(as_frame=True)
df.frame
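To make the pattern in these answers concrete without downloading anything, here is a minimal sketch that uses small hand-written arrays as hypothetical stand-ins for cancer.data, cancer.feature_names, cancer.target and cancer.target_names. It adds the numeric target and, as one possible alternative to map(), builds the label column with pd.Categorical.from_codes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the attributes of the loaded dataset.
feature_names = np.array(["mean radius", "mean texture"])
features = np.array([[17.99, 10.38], [13.54, 14.36], [20.57, 17.77]])
target = np.array([0, 1, 0])
target_names = np.array(["malignant", "benign"])

# Pass the names directly so the columns form a flat Index.
df = pd.DataFrame(features, columns=feature_names)
df["target"] = target
# Each integer code selects the label at that position in target_names.
df["target_name"] = pd.Categorical.from_codes(target, categories=target_names)

print(df)
```

With the real dataset, the same lines would take cancer.data, cancer.feature_names, cancer.target and cancer.target_names in place of the stand-ins.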
