Tsfresh takes longer than the computer can handle - python

I am trying to use the tsfresh feature extraction library in Python 3.7.1 with the efficient parameters on a test file (24 rows x 366 columns).
It never finishes and just keeps processing. I also tried to run the same library on a different laptop with Python 2.17.16 installed, but the tsfresh library did not work there.
What should I do?
# Import Data from CSV file
#import csv
#with open('T7.csv') as T7:
# reader = csv.reader(T7)
# try:
# for row in reader:
# print(row)
# finally:
# T7.close()
from matplotlib import pyplot as plt
from matplotlib import style
import numpy as np
import pandas as pd
style.use('ggplot')
#from tsfresh import extract_features #as tsfreshobj
#from tsfresh import MinimalFeatureExtractionSettings
from tsfresh.feature_extraction import extract_features, EfficientFCParameters
#X = extract_features(df, column_id='id', column_sort='time')
y=pd.read_csv ('1.csv')#, skiprows=1)
#y=np.loadtxt('T7_2.csv')#,
#unpack=True,
# delimiter=',')
#y1=tsfreshobj.feature_extraction.extraction.generate_data_chunk_format(y)
#y2=tsfreshobj.feature_extraction.feature_calculators.absolute_sum_of_changes(y1)
#y1=extract_features(y, feature_extraction_settings=MinimalFeatureExtractionSettings)
print (y)
# from tsfresh.feature_extraction import MinimalFeatureExtractionSettings
y1=extract_features(y, column_id='time', default_fc_parameters=EfficientFCParameters())#, column_sort='time')
print (y)
print (y1)
plt.plot(y1)
print (y)
plt.title ('some numbers')
plt.ylabel('Y axis')
plt.xlabel ('X axis')
plt.show()

Have you tried the MinimalFCParameters to see if it works at all? With these, it should finish in a matter of seconds.
One problem could be that you need to wrap your code in an if __name__ == "__main__": block; otherwise the multiprocessing library will have a problem.
If this does not help, you could use any of the techniques I have described elsewhere to parallelize the tsfresh computation.
The issue with installing tsfresh on your other machine is unrelated to tsfresh: the error message shows that you did not have an internet connection while calling pip install.
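A minimal sketch of the first two suggestions, assuming the data can be brought into a long format with one row per observation. The helper wide_to_long, the column names id, time and value, and the tiny data set are made up for illustration; they are not taken from the question. Note that column_id should name the column that identifies each series, not the time stamp.

```python
def wide_to_long(rows):
    """Flatten a list of series into (id, time, value) records.

    Treating each inner list as one time series is an assumption about
    the data layout, not something the question confirms.
    """
    return [
        {"id": series_id, "time": step, "value": value}
        for series_id, series in enumerate(rows)
        for step, value in enumerate(series)
    ]


def main():
    # Imported here so the sketch exits cleanly when the libraries are missing.
    try:
        import pandas as pd
        from tsfresh import extract_features
        from tsfresh.feature_extraction import MinimalFCParameters
    except ImportError:
        print("pandas and tsfresh are required for this sketch")
        return None

    # Made-up data: two series with four observations each.
    frame = pd.DataFrame(wide_to_long([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]))
    features = extract_features(
        frame,
        column_id="id",
        column_sort="time",
        column_value="value",
        default_fc_parameters=MinimalFCParameters(),
    )
    print(features)
    return features


if __name__ == "__main__":
    # The guard keeps worker processes from re-running the extraction
    # when multiprocessing re-imports this file (as it does on Windows).
    main()
```

If this finishes quickly, switching MinimalFCParameters() back to EfficientFCParameters() shows whether the larger feature set is what takes the time.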

Related

Does Rmarkdown allow knitting of matplotlib plots? If so, will you help me troubleshoot?

EDIT: Found a solution in "Plot Python Plots Inline".
I have been making an Rmarkdown notebook to take notes on my Python studies. Within RStudio I was able to knit the document containing my Python code to HTML with no problem until I started plotting data using matplotlib. The curious part is that the plots are generated correctly within the code chunks. However, knitting fails with an error every time at 80%.
Here is my sample code:
---
title: "Python Plot"
output: html_document
---
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
library(reticulate) #Allows for Python to be run in chunks
```
```{python, eval=F}
import numpy as np
trees = np.array(r.trees) #Imported an internal R dataset. It got rid of headers and first row. Don't know how to deal with that right now.
type(trees)
np.shape(trees)
print(trees[1:6,:])
import matplotlib.pyplot as plt
plt.plot(trees[:,0], trees[:,1])
plt.show()
plt.clf() #Reset plot surface
```
Again, this plot comes out just fine when processing within the chunk, but does not knit. The error message says:
"This application failed to start because it could not find or load the Qt platform plugin "windows" in "".
Reinstalling the application may fix this problem."
I have uninstalled and reinstalled both Rstudio and Python and continue to have the same result. I find it odd that it works within the chunk but not to knit to HTML. All my other python code knits just fine.
I have
python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)]
Rstudio Version 1.2.1335.
I've read other solutions. I believe that the libEGL.dll is in the same place as all the other QT5*.dll.
I found an answer by Kevin Arseneau in "Plot Python Plots Inline".
I would not call this a duplicate because the problem was different, but the solution worked for both problems.
What is needed is to add the following code:
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = '/path/to/Anaconda3/Library/plugins/platforms'
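Since a wrong folder reproduces the same Qt error, it may help to check that the folder exists before setting the variable. This is only a sketch: the helper name set_qt_plugin_path is hypothetical, and the path is the same placeholder as above, to be replaced with the location of your own Anaconda installation.

```python
import os


def set_qt_plugin_path(candidate):
    """Point Qt at the platform plugins folder, but only if it exists.

    Returns True when the variable was set, False otherwise.
    """
    if os.path.isdir(candidate):
        os.environ["QT_QPA_PLATFORM_PLUGIN_PATH"] = candidate
        return True
    return False


# Placeholder path; this must run before matplotlib.pyplot is imported.
was_set = set_qt_plugin_path("/path/to/Anaconda3/Library/plugins/platforms")
print("Qt plugin path set:", was_set)
```

A False result means the path is wrong for this machine, which is easier to spot than a failed knit.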
Here is my updated working code. It is similar to the original question, updated with a Python chunk for imports that Bryan suggested.
---
title: "Python Plot"
output: html_document
---
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
library(reticulate) #Allows for Python to be run in chunks
```
```{python import}
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = '/path/to/Anaconda3/Library/plugins/platforms'
import numpy as np
import matplotlib.pyplot as plt
```
```{python, eval=TRUE}
trees = np.array(r.trees) #Imported an internal R dataset. It got rid of headers and first row. Don't know how to deal with that right now.
type(trees)
np.shape(trees)
print(trees[1:6,:])
plt.plot(trees[:,0], trees[:,1])
plt.show()
plt.clf() #Reset plot surface
```
You have to install PyQt5 in your Python environment and import it along with your other libraries. For example, my first chunks look like the following; the first line under # Base libraries is import PyQt5:
---
title: "Cancellations TS"
author: "Bryan Butler"
date: "7/1/2019"
output:
  html_document:
    toc: false
    toc_depth: 1
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, cache.lazy = FALSE)
```
# <strong>Time Series of Auto Policies</strong> {.tabset .tabset-fade .tabset-pills}
<style>
.main-container {
max-width: 1200px !important;
margin-left: auto;
margin-right: auto;
}
</style>
```{r, loadPython}
library(reticulate)
use_condaenv('timeseries')
```
## Load Python
```{python importLibaries}
# Base libraries
import PyQt5
import pandas as pd
from pandas import Series, DataFrame
from pandas.plotting import lag_plot
import numpy as np
import pyodbc
```
I was able to run your code with some modifications. I broke the chunks up for error checking. You need to import numpy as np, and I added others. Here is the code that I got to work. Also, I use conda virtual environments so that the Python environment is exact. This is what worked:
---
title: "test"
author: "Bryan Butler"
date: "7/2/2019"
output: html_document
---
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
library(reticulate) #Allows for Python to be run in chunks
use_condaenv('timeseries')
```
```{python import}
import PyQt5
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
```
```{python, test}
trees = np.array(r.trees) #Imported an internal R dataset. It got rid of headers and first row. Don't know how to deal with that right now.
type(trees)
np.shape(trees)
print(trees[1:6,:])
```
```{python plot}
plt.plot(trees[:,0], trees[:,1])
plt.show()
plt.clf() #Reset plot surface
```

Python regression analysis error

I'm trying to run a regression analysis with the code below. I get ImportError: No module named statsmodels.api and No module named matplotlib.pyplot. Any suggestions for overcoming this error would be appreciated.
import pandas as pd
import numpy as np
import seaborn as sns
from scipy import stats, integrate
import matplotlib.pyplot as plt
import statsmodels.api as sm
data = pd.read_csv("F:\Projects\Poli_Map\DAT_OL\MASTRTAB.csv")
# define the data/predictors as the pre-set feature names
df = pd.DataFrame(data.data, columns=data.feature_names)
# Put the target (IMR) in another DataFrame
target = pd.DataFrame(data.target, columns=["IMR"])
X = df["HH_LATR","COMM_TOILT","PWS"]
y = target["IMR"]
model = sm.OLS(y, X).fit()
predictions = model.predict(X) # make the predictions by the model
# Print out the statistics
model.summary()
plt.scatter(predictions, y, s=30, c='r', marker='+', zorder=10) #Plot graph
plt.xlabel("Independent variables")
plt.ylabel("Outcome variables")
plt.show()
I highly recommend that you install Anaconda. That way the environment variables are set automatically and you don't need to worry about anything else. Many useful packages (e.g. numpy, sympy, scipy) are bundled with Anaconda.
Moreover, based on personal experience, I can tell you that using pip on Windows and compiling from source (you need Visual Studio) is sometimes a pain in the neck. That's why Anaconda was created.
See: https://www.anaconda.com/download/
Hope this helps.
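Before reinstalling anything, it can be worth checking which interpreter actually runs the script and which of the modules it cannot find, since an ImportError like this often means the packages were installed for a different Python. A small sketch using only the standard library; the function name missing_modules is made up for this example.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that the running interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # For a dotted name, a missing parent package raises
            # instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Interpreter:", sys.executable)
print("Missing:", missing_modules(["statsmodels.api", "matplotlib.pyplot", "pandas"]))
```

If the printed interpreter is not the one Anaconda installed, the IDE or the PATH is pointing at another Python.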

statsmodels package code works in spyder's ipython console but not in python script

This is a python script in the Spyder IDE of Anaconda. I have Python 3.6.2. The last two lines do nothing (apparently) when I run the script, but work if I type them in the IPython console of Spyder. How do I get them to work in the script please?
# import packages
import os # misc operating system functions
import sys # system parameters and functions
import pandas as pd # data frame handling
import statsmodels.formula.api as sm # stats module
import matplotlib.pyplot as plt # matlab-like plotting
import numpy as np # big, fast arrays for maths
# set working directory to here
os.chdir(os.path.dirname(sys.argv[0]))
# read data using pandas
datafile = '1314 Powerview Pasture Potential.csv'
data1 = pd.read_csv(datafile)
#list(data) # to show column names
# plot some data
# don't know how to pop out a separate window
plt.scatter(data1['long'],data1['lat'],s=40,facecolors='none',edgecolors='b')
# simple multiple regression
Y = data1[['Pasture and Crop eaten t DM/ha']]
X = data1[['Net imported Supplements per Ha',
'LWT/ha',
'lat']]
result = sm.OLS(Y,X).fit()
result.summary()

Pyplot "cannot connect to X server localhost:10.0" despite ioff() and matplotlib.use('Agg')

I have a piece of code which gets called by a different function, carries out some calculations for me and then plots the output to a file. Seeing as the whole script can take a while to run for larger datasets and since I may want to analyse multiple datasets at a given time I start it in screen then disconnect and close my putty session and check back on it the next day. I am using Ubuntu 14.04. My code looks as follows (I have skipped the calculations):
import shelve
import os, sys, time
import numpy
import timeit
import logging
import csv
import itertools
import graph_tool.all as gt
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.ioff()
#Do some calculations
print 'plotting indeg'
# Let's plot its in-degree distribution
in_hist = gt.vertex_hist(g, "in")
y = in_hist[0]
err = numpy.sqrt(in_hist[0])
err[err >= y] = y[err >= y] - 1e-2
plt.figure(figsize=(6,4))
plt.errorbar(in_hist[1][:-1], in_hist[0], fmt="o",
label="in")
plt.gca().set_yscale("log")
plt.gca().set_xscale("log")
plt.gca().set_ylim(0.8, 1e5)
plt.gca().set_xlim(0.8, 1e3)
plt.subplots_adjust(left=0.2, bottom=0.2)
plt.xlabel("$k_{in}$")
plt.ylabel("$NP(k_{in})$")
plt.tight_layout()
plt.savefig("in-deg-dist.png")
plt.close()
print 'plotting outdeg'
#Do some more stuff
The script runs perfectly happily until I get to the plotting commands. To try and get to the root of the problem I am currently running it in PuTTY without screen and with no X11 applications. The output I get is the following:
plotting indeg
PuTTY X11 proxy: unable to connect to forwarded X server: Network error: Connection refused
: cannot connect to X server localhost:10.0
I presume this is caused by the code trying to open a window, but I thought that explicitly calling plt.ioff() would disable that. Since it didn't, I followed this thread (Generating matplotlib graphs without a running X server) and specified the backend, but that didn't solve the problem either. Where might I be going wrong?
The calling function also calls other functions that use matplotlib. These are called only after this one, but their dependencies are loaded during the import statements. Because they were loaded first, they made the later matplotlib.use('Agg') declaration ineffective. Moving that declaration to the main script solved the problem.
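The same ordering rule can be illustrated with a different mechanism from the matplotlib.use('Agg') call used above: matplotlib also reads the MPLBACKEND environment variable when it is first imported. The sketch below sets that variable and reports whether matplotlib had already been imported, which is the situation that caused the problem. The function name request_agg_backend is hypothetical.

```python
import os
import sys


def request_agg_backend():
    """Ask for the non-interactive Agg backend via MPLBACKEND.

    Returns True if matplotlib has not been imported yet, so the request
    can still take effect in this process, and False if it is too late.
    """
    too_late = "matplotlib" in sys.modules
    os.environ["MPLBACKEND"] = "Agg"
    return not too_late


# Call this at the very top of the main script, before any module
# that imports matplotlib is itself imported.
if not request_agg_backend():
    print("warning: matplotlib was imported before the backend was selected")
```

Placing the call in the main script, ahead of all other imports, mirrors the fix described above.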

Write a for loop in Abaqus Macro (Python)

I've been using Abaqus for a while, but I'm new to macros and Python scripting. I'm sorry if this kind of question has already been asked; I searched on Google for a similar problem, but nothing I found works.
My problem is the following:
I have a model in Abaqus. I've run an analysis with 2 steps and created a path in the model, and I'd like to extract the value of the von Mises stress along this path for each frame of each step.
Ideally I'd love to save it to an Excel or .txt file for easy further analysis (e.g. in Matlab).
Edit: I solved part of the problem; my macro works and all my data is correctly saved in the XY-Data manager.
Now I'd like to save all the "Y" data to an Excel or text file, and I have no clue how to do that. I'll keep digging, but if anyone has an idea I'll take it!
Here's the code from the abaqusMacros.py file:
# -*- coding: mbcs -*-
# Do not delete the following import lines
from abaqus import *
from abaqusConstants import *
import __main__
def VonMises():
    import section
    import regionToolset
    import displayGroupMdbToolset as dgm
    import part
    import material
    import assembly
    import step
    import interaction
    import load
    import mesh
    import optimization
    import job
    import sketch
    import visualization
    import xyPlot
    import displayGroupOdbToolset as dgo
    import connectorBehavior
odbFile = session.openOdb(name='C:/Temp/Job-1.odb')
stepsName = odbFile.steps.keys()
for stepId in range(len(stepsName)):
    numberOfFrames = len(odbFile.steps.values()[stepId].frames)
    for frameId in range(numberOfFrames):
        session.viewports['Viewport: 1'].odbDisplay.setPrimaryVariable(
            variableLabel='S', outputPosition=INTEGRATION_POINT, refinement=(
            INVARIANT, 'Mises'))
        session.viewports['Viewport: 1'].odbDisplay.setFrame(step=stepId, frame=frameId)
        pth = session.paths['Path-1']
        session.XYDataFromPath(name='Step_'+str(stepId)+'_'+str(frameId), path=pth, includeIntersections=False,
            projectOntoMesh=False, pathStyle=PATH_POINTS, numIntervals=10,
            projectionTolerance=0, shape=DEFORMED, labelType=TRUE_DISTANCE)
First of all, your function VonMises contains only import statements; the other parts of the code are not properly indented, so they are outside of the function.
Second, the function is never called. If you run your script using 'File > Run script', then you should call the function at the end of your file.
These two things seem like obvious errors, but there are some other bad things as well.
Also, I don't see the point of writing import __name__ at the top of your file, because I really doubt you have a module named __name__; the Python environment used by Abaqus probably doesn't either.
There are some other things that might be improved as well, but you should try to fix the errors first.
If you got an actual error message from Abaqus (either in the window or in the abaqus.rpy file), it would be helpful if you posted it here.
Got it :
I'll use the code posted above, repeated here :
# -*- coding: mbcs -*-
# Do not delete the following import lines
from abaqus import *
from abaqusConstants import *
import __main__
def VonMises():
    import section
    import regionToolset
    import displayGroupMdbToolset as dgm
    import part
    import material
    import assembly
    import step
    import interaction
    import load
    import mesh
    import optimization
    import job
    import sketch
    import visualization
    import xyPlot
    import displayGroupOdbToolset as dgo
    import connectorBehavior
odbFile = session.openOdb(name='C:/Temp/Job-1.odb')
stepsName = odbFile.steps.keys()
for stepId in range(len(stepsName)):
    numberOfFrames = len(odbFile.steps.values()[stepId].frames)
    for frameId in range(numberOfFrames):
        session.viewports['Viewport: 1'].odbDisplay.setPrimaryVariable(
            variableLabel='S', outputPosition=INTEGRATION_POINT, refinement=(
            INVARIANT, 'Mises'))
        session.viewports['Viewport: 1'].odbDisplay.setFrame(step=stepId, frame=frameId)
        pth = session.paths['Path-1']
        session.XYDataFromPath(name='Step_'+str(stepId)+'_'+str(frameId), path=pth, includeIntersections=False,
            projectOntoMesh=False, pathStyle=PATH_POINTS, numIntervals=10,
            projectionTolerance=0, shape=DEFORMED, labelType=TRUE_DISTANCE)
And I just discovered the "Excel Utilities" tool in Abaqus, which is enough for what I want to do.
Thanks y'all for your input.
Here is how you extract XY-data:
from odbAccess import *
# Placeholder: set this to the number of XYData objects to extract.
numberOfXYData = 1
odb = session.odbs['C:/Job-Directory/Job-1.odb']
output = open('Result.dat', 'w')
for i in range(numberOfXYData):
    xy1 = odb.userData.xyDataObjects['XYData-' + str(i)]
    for x in range(len(xy1)):
        output.write(str(xy1[x]) + "\n")
output.close()
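If a file that opens directly in Excel or Matlab is preferred, the same idea can be written with the csv module. This is a sketch under assumptions: plain lists of (x, y) tuples stand in for the Abaqus XYData objects, the stress values are made up, and the function name write_xy_csv is hypothetical. It targets Python 3; older Abaqus releases embed Python 2, where open() has no newline argument.

```python
import csv
import os
import tempfile


def write_xy_csv(xy_series, path):
    """Write {name: [(x, y), ...]} to a CSV file, one row per point.

    Returns the number of data rows written.
    """
    rows_written = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["series", "x", "y"])
        for name in sorted(xy_series):
            for x, y in xy_series[name]:
                writer.writerow([name, x, y])
                rows_written += 1
    return rows_written


# Made-up values standing in for the XY data of two frames.
demo = {
    "Step_0_0": [(0.0, 110.2), (0.5, 98.7)],
    "Step_0_1": [(0.0, 120.4), (0.5, 101.3)],
}
demo_path = os.path.join(tempfile.gettempdir(), "von_mises_path.csv")
print(write_xy_csv(demo, demo_path), "rows written to", demo_path)
```

Inside Abaqus, the dictionary could be filled from the XYData objects in the loop above, since each entry is read there as one point.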
