[image: screenshot of the Market_Basket_Optimisation dataset]
Here is my code, and above is an image of my dataset, "Market_Basket_Optimisation". I have built a list of lists of transactions to feed into the apriori algorithm, but I am not getting the rules. I am new to machine learning and I am not able to find the error.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Data Preprocessing
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
transactions = []
for i in range(0, 7501):
    transactions.append([str(dataset.values[i, j]) for j in range(0, 20)])
# Training Apriori on the dataset
from apyori import apriori
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)
# Visualising the results
results = list(rules)
It is not clear from your question whether you are using a Jupyter notebook or an IDE such as Spyder. If you are using an IDE such as Spyder, you are unlikely to see the result unless you use a print statement. I suggest adding another line as follows:
print(results)
You should then see the list of rules. I had the same issue, and adding the print statement solved it for me. You will still need to define a function to output the results in a tabular format that makes sense.
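For example, a small helper along these lines turns the rules into a readable DataFrame. This is just a sketch: it assumes apyori's RelationRecord structure (each record carries items, support and a list of ordered_statistics), and the name rules_to_dataframe is only illustrative.
import pandas as pd
def rules_to_dataframe(results):
    # flatten each apyori RelationRecord into one row per ordered statistic
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append({'antecedent': ', '.join(stat.items_base),
                         'consequent': ', '.join(stat.items_add),
                         'support': record.support,
                         'confidence': stat.confidence,
                         'lift': stat.lift})
    return pd.DataFrame(rows)
print(rules_to_dataframe(results).sort_values('lift', ascending=False).head())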
I am trying to solve a multi-objective optimization problem with 3 objectives and 2 decision variables using NSGA-II. The pymoo code for the NSGA2 algorithm and the termination criterion is given below. My pop_size is 100 and n_offsprings is 100, and the algorithm is run for 100 generations. I want to store the decision variable values considered in each generation, for all 100 generations, in a dataframe.
NSGA2 implementation in pymoo:
from pymoo.algorithms.nsga2 import NSGA2
from pymoo.factory import get_sampling, get_crossover, get_mutation
algorithm = NSGA2(
    pop_size=20,
    n_offsprings=10,
    sampling=get_sampling("real_random"),
    crossover=get_crossover("real_sbx", prob=0.9, eta=15),
    mutation=get_mutation("real_pm", prob=0.01, eta=20),
    eliminate_duplicates=True
)
from pymoo.factory import get_termination
termination = get_termination("n_gen", 100)
from pymoo.optimize import minimize
res = minimize(MyProblem(),
               algorithm,
               termination,
               seed=1,
               save_history=True,
               verbose=True)
What I have tried (my reference: a Stack Overflow question):
import pandas as pd
df2 = pd.DataFrame(algorithm.pop)
df2.head(10)
The result from the above code is blank, and on running
print(df2)
I get
Empty DataFrame
Columns: []
Index: []
Glad you intend to use pymoo for your research. You have correctly enabled the save_history option, which means you can access the algorithm objects.
To collect all solutions from the run, you can merge the offspring (algorithm.off) from each generation. Keep in mind that Population objects contain Individual objects; with the get method you can retrieve X, F, or other values. See the code below.
import pandas as pd
from pymoo.algorithms.nsga2 import NSGA2
from pymoo.factory import get_sampling, get_crossover, get_mutation, ZDT1
from pymoo.factory import get_termination
from pymoo.model.population import Population
from pymoo.optimize import minimize
problem = ZDT1()
algorithm = NSGA2(
    pop_size=20,
    n_offsprings=10,
    sampling=get_sampling("real_random"),
    crossover=get_crossover("real_sbx", prob=0.9, eta=15),
    mutation=get_mutation("real_pm", prob=0.01, eta=20),
    eliminate_duplicates=True
)
termination = get_termination("n_gen", 10)
res = minimize(problem,
               algorithm,
               termination,
               seed=1,
               save_history=True,
               verbose=True)
all_pop = Population()
for algorithm in res.history:
    all_pop = Population.merge(all_pop, algorithm.off)
df = pd.DataFrame(all_pop.get("X"), columns=[f"X{i+1}" for i in range(problem.n_var)])
print(df)
Another way would be to use a callback and fill the data frame each generation, similar to what is shown here: https://pymoo.org/interface/callback.html
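For instance, a minimal callback sketch along those lines could look like this (assuming the same pymoo version as above, where Callback lives in pymoo.model.callback; the CollectX name is just illustrative):
from pymoo.model.callback import Callback
class CollectX(Callback):
    # store the decision variables of the whole population at every generation
    def __init__(self):
        super().__init__()
        self.data["X"] = []
    def notify(self, algorithm):
        self.data["X"].append(algorithm.pop.get("X"))
# re-create problem, algorithm and termination as above before running again
callback = CollectX()
res = minimize(problem,
               algorithm,
               termination,
               seed=1,
               callback=callback,
               verbose=True)
# one array of decision variables per generation; e.g. the last generation as a DataFrame:
df_last = pd.DataFrame(callback.data["X"][-1],
                       columns=[f"X{i+1}" for i in range(problem.n_var)])
print(df_last)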
I'm trying to write a block of code that will allow me to identify the risk contribution of assets in a portfolio. The covariance matrix is a 6x6 pandas dataframe.
My code is as follows:
import numpy as np
import pandas as pd
weights = np.array([.1,.2,.05,.25,.1,.3])
data = pd.DataFrame(np.random.randn(1000,6),columns = 'a','b','c','d','e','f'])
covariance = data.cov()
portfolio_variance = (weights*covariance*weights.T)[0,0]
sigma = np.sqrt(portfolio_variance)
marginal_risk = covariance*weights.T
risk_contribution = np.multiply(marginal_risk, weights.T)/sigma
print(risk_contribution)
When I try to run the code I get a KeyError, and if I remove the [0,0] from portfolio_variance I get output that doesn't seem to make sense.
Can somebody point me to my error(s)?
Three problems with your code:
First, you forgot to open the square bracket for the column list in the line where you create the DataFrame:
data = pd.DataFrame(np.random.randn(1000,6),columns = ['a','b','c','d','e','f'])
Second, you are using the two-dimensional indexing operator incorrectly: you can't write [0,0] on a DataFrame, you have to chain the lookups as [0][0].
And third, because you named the columns, you have to use those names when indexing, so it's actually ['a'][0]:
portfolio_variance = (weights*covariance*weights.T)['a'][0]
Final working code:
import numpy as np
import pandas as pd
weights = np.array([.1,.2,.05,.25,.1,.3])
data = pd.DataFrame(np.random.randn(1000,6),columns = ['a','b','c','d','e','f'])
covariance = data.cov()
portfolio_variance = (weights*covariance*weights.T)['a'][0]
sigma = np.sqrt(portfolio_variance)
marginal_risk = covariance*weights.T
risk_contribution = np.multiply(marginal_risk, weights.T)/sigma
print(risk_contribution)
The line
portfolio_variance = (weights*covariance*weights.T)
should be
portfolio_variance = (weights @ covariance @ weights.T)
This will give the portfolio variance, which should be a single number. The same applies to the marginal risk; it should be
marginal_risk = covariance @ weights.T
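Putting the two corrections together, a minimal sketch of the full calculation (using the same random data as in the question; the final check is only an illustration, since the individual contributions should add up to sigma):
import numpy as np
import pandas as pd
weights = np.array([.1, .2, .05, .25, .1, .3])
data = pd.DataFrame(np.random.randn(1000, 6), columns=['a', 'b', 'c', 'd', 'e', 'f'])
covariance = data.cov()
# scalar portfolio variance: w' * Sigma * w
portfolio_variance = weights @ covariance @ weights.T
sigma = np.sqrt(portfolio_variance)
# marginal risk of each asset: Sigma * w
marginal_risk = covariance @ weights.T
# per-asset risk contribution: w_i * (Sigma * w)_i / sigma
risk_contribution = np.multiply(marginal_risk, weights.T) / sigma
print(risk_contribution)
print(risk_contribution.sum(), sigma)  # the contributions sum to sigma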
I am having an issue with this function. I want to perform a cross-sectional regression on 25 portfolios sorted on value and size, with 7 independent variables on the right-hand side of the equation.
import pandas as pd
import numpy as np
from linearmodels import FamaMacBeth
#creating a multi_index of independent variables
ind_var = pd.read_excel('FAMA_MACBETH.xlsx')
ind_var['date'] = pd.to_datetime(ind_var['date'])
# dropping our dependent variables
ind_var = ind_var.drop(['Mkt_rf', 'div_innovations', 'term_innovations',
'def_innovations', 'rf_innovations', 'hml_innovations',
'smb_innovations'],axis = 1)
ind_var = pd.DataFrame(ind_var.set_index('date').stack())
ind_var.columns = ['x']
x = np.asarray(ind_var)
len(x)
11600
#creatiing a multi_index of dependent variables
# reading in our data
dep_var = pd.read_excel('FAMA_MACBETH.xlsx')
dep_var['date'] = pd.to_datetime(dep_var['date'])
# dropping our independent variables
dep_var = dep_var.drop(['SMALL_LoBM', 'ME1_BM2', 'ME1_BM3', 'ME1_BM4',
'SMALL_HiBM', 'ME2_BM1', 'ME2_BM2', 'ME2_BM3', 'ME2_BM4', 'ME2_BM5',
'ME3_BM1', 'ME3_BM2', 'ME3_BM3', 'ME3_BM4', 'ME3_BM5', 'ME4_BM1',
'ME4_BM2', 'ME4_BM3', 'ME4_BM4', 'ME4_BM5', 'BIG_LoBM', 'ME5_BM2',
'ME5_BM3', 'ME5_BM4', 'BIG_HiBM'],axis = 1)
dep_var = pd.DataFrame(dep_var.set_index('date').stack())
dep_var.columns = ['y']
y = np.asarray(dep_var)
len(y)
3248
mod = FamaMacBeth(y, x)
res = mod.fit(cov_type='kernel', kernel='Parzen')
# ideally, output with t-stats and standard errors
I have tried numerous ways of getting this to work and am seriously considering switching to SAS at this point, but I would really prefer to get this running with pandas.
I expect a cross-sectional regression output with standard errors and t-stats.
I got it to work in one go. See this site and run the OLS example code below the sentence "Here the difference is presented using the canonical Grunfeld data on investment."
(Note that the line etdata = data.set_index(['firm','year']) is important; without it, linearmodels won't know the correct panel dimensions to run Fama-MacBeth on.)
Then run:
from linearmodels import FamaMacBeth
FamaMacBeth(etdata.invest,etdata[['value','capital']]).fit()
Note that I updated linearmodels to the latest version, which is what gave me access to the data.
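For reference, here is a minimal sketch of that Grunfeld example. I am assuming statsmodels' copy of the Grunfeld data as a stand-in for the dataset on the linked page:
import statsmodels.api as sm
from linearmodels import FamaMacBeth
# Grunfeld investment data: one row per firm-year
data = sm.datasets.grunfeld.load_pandas().data
data['year'] = data['year'].astype(int)  # the time level should be integer or date-like
# entity-time MultiIndex so linearmodels knows the panel dimensions
etdata = data.set_index(['firm', 'year'])
res = FamaMacBeth(etdata.invest, etdata[['value', 'capital']]).fit()
print(res)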
I am getting started with Google Facets, but I am finding the documentation insufficient.
I want to use Facets Dive to visualize images like they do here for CIFAR-10. There is another example here using the Quick, Draw! dataset.
However, I cannot find how to set it up.
This is roughly what I have so far; the code runs all right:
from sklearn.datasets import fetch_mldata
from sklearn.decomposition import PCA
import pandas as pd
import numpy as np
from IPython.core.display import display, HTML
#load data:
mnist = fetch_mldata("MNIST original")
idx = np.random.randint(0, 70000, 1000)  # get 1000 random samples
X = mnist.data[idx]/255.0
y = mnist.target[idx]
#dimensionality reduction to get 3 features
pca = PCA(n_components=3)
pca_result = pca.fit_transform(X)
#put everything in a dataframe
df = pd.DataFrame(data=pca_result, columns=["pca-one", "pca-two", "pca-three"])
df['y'] = y
#display facets dive
jsonstr = df.to_json(orient='records')
HTML_TEMPLATE = """<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html">
<facets-dive id="elem" height="600"></facets-dive>
<script>
var data = {jsonstr};
document.querySelector("#elem").data = data;
</script>"""
html = HTML_TEMPLATE.format(jsonstr=jsonstr)
display(HTML(html))
This script works fine, but I just get circles with the labels (or whichever feature I choose); I don't see how to integrate the actual images. The only hint I have so far is that I need facets_atlasmaker for that, but again I found its documentation rather insufficient.
If something is not clear, please let me know in the comments and I will try to add more relevant information.
I am new to Python and PyTables. I am currently working on a project involving clustering and the KNN algorithm. This is what I have got.
********** code *****************
import numpy as np
import numpy.random as npr
import tables
import ctypes
step0: obtain the cluster
dtype = np.dtype('f4')
pnts_inds = np.arange(100)
npr.shuffle(pnts_inds)
pnts_inds = pnts_inds[:10]
pnts_inds = np.sort(pnts_inds)
for i, ind in enumerate(pnts_inds):
    clusters[i] = pnts_obj[ind]
step1: save the result to a HDF5 file called clst_fn.h5
filters = tables.Filters(complevel = 1, complib = 'zlib')
clst_fobj = tables.openFile('clst_fn.h5', 'w')
clst_obj = clst_fobj.createCArray(clst_fobj.root, 'clusters',
                                  tables.Atom.from_dtype(dtype), clusters.shape,
                                  filters=filters)
clst_obj[:] = clusters
clst_fobj.close()
step2: other function
blabla
step3: load the cluster from clst_fn
pnts_fobj = tables.openFile('clst_fn.h5', 'r')
for pnts in pnts_fobj.walkNodes('/', classname='Array'):
    break
#
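For clarity, here is a minimal sketch of what I expect step 3 to give me, assuming the 'clusters' CArray written in step 1 is the only array in the file:
import numpy as np
import tables
# reopen the HDF5 file and pull the stored cluster array fully into memory
pnts_fobj = tables.openFile('clst_fn.h5', 'r')
pnts = pnts_fobj.root.clusters.read()  # plain numpy ndarray
pnts_fobj.close()
pnts = np.ascontiguousarray(pnts)
print(pnts.shape)  # I expect this to be 2-D before calling knn()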
step4: invoke another function (called knn). The function's input argument is the data from pnts. I have checked the knn function on its own, and it works well if the input is pnts = npr.rand(100, 128)
def knn(pnts):
    pnts = np.ascontiguousarray(pnts)
    N = ctypes.c_uint(pnts.shape[0])
    D = ctypes.c_uint(pnts.shape[1])
    #
invoke knn using the cluster from clst_fn (see step 3)
knn(pnts)
********** end of code *****************
My problem now is that Python keeps showing:
IndexError: tuple index out of range
The error comes from this line:
D = ctypes.c_uint(pnts.shape[1])
Obviously, there must be something wrong with the input argument. Any thoughts on how to fix this? Thank you in advance.