scipy.stats is lost when I import my module - python

The problem appears only when I run the code via the Linux command line, i.e. the Windows Subsystem for Linux. It does not occur when run via a conda environment on Windows. In both cases, scipy is properly installed.
I have created a function to perform linear regressions on values in rows across two dataframes, df_1 and df_2. Their column names are the same as the keys of the dictionary data_dict.
from scipy.stats import linregress
import numpy as np

def foo(df_1, df_2, data_dict):
    for index, row in df_2.iterrows():
        x = []
        for d in data_dict:
            x.append(row[d])
        x = np.array(x)
        for index, row in df_1.iterrows():
            y = []
            for d in data_dict:
                y.append(row[d])
            y = np.array(y)
            s, i, r, p, se = linregress(x, y)
This works fine as long as I run it from within the script it is written in. However, as soon as I import it into a different script, 'bar', and try to run it, I get the error AttributeError: module 'scipy' has no attribute 'stats', and the traceback points to the line where linregress is actually used, not the import line.
I have tried importing in other ways, e.g.
from scipy import stats
As well as importing directly before the linregress operation, i.e.
from scipy.stats import linregress
s, i, r, p, se = linregress(x, y)
And finally I've tried checking whether any of the other modules imported into 'bar' are interfering with scipy.stats; they are not.
Any idea why python is 'forgetting' scipy.stats?
I also tried checking that scipy.stats was imported by writing a list of all modules imported in 'bar' before calling foo:
import sys

with open('modules_on_import.txt', 'a') as f:
    for s in sys.modules:
        f.write(f"{s}\n")
and scipy.stats can be found in modules_on_import.txt
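Another check that might help narrow this down is printing which interpreter and which scipy installation are actually in use, e.g.:
import sys
import scipy
import scipy.stats

print(sys.executable)        # which Python binary is running
print(scipy.__file__)        # where scipy is imported from
print(scipy.stats.__file__)  # where scipy.stats is imported from
If scipy.__file__ pointed at an unexpected location (for example a local file or folder named scipy shadowing the real package), that could explain the missing attribute.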
Some more details:
I'm not running in a virtual environment; echo $VIRTUAL_ENV returns nothing.
Everything is run via the command line, i.e. directly in Bash. In this case I simply type python3 bar.py.
All modules were installed using pip via the command line, e.g. pip install scipy
Unsure if it matters, but I'm editing in vim.
A (simplified) example of bar.py:
from psd_processing import process_psd # function to make df_2 and data_dict
from uptake_processing import process_uptake # function to make df_1
from foo_test import foo
project = '0020'
loading_df = process_uptake(project, 'co2', 298) # this works
param_df, data_dict = process_psd(project, 'n2', 'V') # this works
correlation_df = foo(loading_df, param_df, data_dict) # this breaks on linregress in foo.py
It's not the installation method of scipy. I uninstalled and reinstalled with pip3 to be sure.
However, when I run the code via the Spyder IDE, it works!
Some pertinent information:
I was originally running the code via Ubuntu 20.04.3 LTS on Windows 10 x86_64. My Python installation is in /usr on Ubuntu.
When running in Spyder, the code is run directly on Windows. The python installation is in C:\Users\<user>\Anaconda3.
How do I get this code to run properly via the command line?

As noted in the question, this code works in a conda environment on Windows but not with the python3 installed directly on the Ubuntu WSL. As my preference is to use the Linux command line, I did the following workaround (a command sketch follows the list):
Install Anaconda on the Ubuntu WSL.
Create and activate a virtual environment.
Install required packages in virtual environment via conda install <pkg>.
Run everything in the new virtual environment.
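For example (the environment name and package list here are just placeholders):
conda create -n my_env python
conda activate my_env
conda install scipy numpy pandas
python bar.py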

Related

'double free or corruption (out)' error when running python script from conda env

I recently installed anaconda to be able to install a couple of packages in an easier fashion. However, it seems that importing things in the conda env has messed things up.
For example, if I have a script 'data.py' that simply contains:
import xarray as xr
print('1')
And then run it as:
python3 data.py
I get the output '1'. However, if I activate a conda env and then run the script:
conda activate pyn_env && python3 data.py
I get the error:
double free or corruption (out)
[1] #### abort data.py
Any thoughts why this is happening? I should also note that if I change the 'xarray' import to numpy, I do not get this error. So it is not all packages, only certain ones.

Retrieving the requirements of a single Python script

I would like to know how to extract the requirements from a single Python script. I tried the following approach, at the beginning of my file, immediately after the imports:
try:
    from pip._internal.operations import freeze
except ImportError:  # pip < 10.0
    from pip.operations import freeze

x = freeze.freeze()
for p in x:
    print(p)
The piece of code above, however, gives me back all the Python packages installed locally. I would like to extract only the requirements that are actually needed by the script, so that I can deploy the final application.
I hope I was clear.
pipreqs is simple to use
install:
pip install pipreqs
On Linux, in the same folder as your script,
use:
pipreqs .
then the requirements.txt file is created
pipreqs project page:
https://pypi.org/project/pipreqs/
You can do this easily with the modulefinder Python module.
I think you want to print all the modules required by a script.
So, you can refer to
http://blog.rtwilson.com/how-to-find-out-what-modules-a-python-script-requires/
or, for convenience, here is the code:
from modulefinder import ModuleFinder
f = ModuleFinder()
# Run the main script
f.run_script('run.py')
# Get names of all the imported modules
names = list(f.modules.keys())
# Get a sorted list of the root modules imported
basemods = sorted(set([name.split('.')[0] for name in names]))
# Print it nicely
print("\n".join(basemods))

How to import matplotlib python library in pyspark using sc.addPyFile()?

I am using Spark with Python, both interactively by launching the command pyspark from the terminal and by launching an entire script with the command spark-submit pythonFile.py
I am using it to analyze a local csv file, so no distributed computation is performed.
I would like to use the library matplotlib to plot columns of a dataframe. When importing matplotlib I get the error ImportError: No module named matplotlib. Then I came across this question and tried the command sc.addPyFile(), but I could not find any file relating to matplotlib that I could pass to it on my OS (OSX).
For this reason I created a virtual environment and installed matplotlib in it. Navigating through the virtual environment I saw there was no file such as matplotlib.py, so I tried to pass it the entire folder sc.addPyFile("venv//lib/python3.7/site-packages/matplotlib"), but again no success.
I do not know which file I should include or how at this point and I ran out of ideas.
Is there a simple way to import the matplotlib library inside Spark (installing it with virtualenv or referencing the OS installation)? And if so, which *.py files should I pass to sc.addPyFile()?
Again I am not interested in distributed computation: the python code will run only locally on my machine.
I will post what I have done. First of all I am working with virtualenv. So I created a new one with virtualenv path.
Then I activated it with source path/bin/activate.
I installed the packages I needed with pip3 install packageName.
After that I created a Python script that creates a zip archive of the libraries installed by virtualenv under ./path/lib/python3.7/site-packages/.
The code of this script is the following (it zips only numpy):
import zipfile
import os

# function to archive a single package
def ziplib(general_path, libName):
    libpath = os.path.dirname(general_path + libName)  # this should point to your packages directory
    zippath = libName + '.zip'  # some random filename in writable directory
    zf = zipfile.PyZipFile(zippath, mode='w')
    try:
        zf.debug = 3  # making it verbose, good for debugging
        zf.writepy(libpath)
        return zippath  # return path to generated zip archive
    finally:
        zf.close()

general_path = './path//lib/python3.7/site-packages/'
matplotlib_name = 'matplotlib'
seaborn_name = 'seaborn'
numpy_name = 'numpy'
zip_path = ziplib(general_path, numpy_name)  # generate zip archive containing your lib
print(zip_path)
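To archive all three packages instead of just numpy, a small extension of the script above (a sketch built on the ziplib() helper) could loop over the names:
for lib_name in (matplotlib_name, seaborn_name, numpy_name):
    archive = ziplib(general_path, lib_name)  # reuse the ziplib() helper defined above
    print(archive)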
After that, the archives must be referenced in the pyspark file myPyspark.py. You do this by calling the addPyFile() method of the SparkContext class. After that you can just import the packages in your code as usual. In my case I did the following:
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
sc.addPyFile("matplot.zip") #generate with testZip.py
sc.addPyFile("numpy.zip") #generate with testZip.py
import matplotlib
import numpy
When you launch the script you have to reference the zip archives with the --py-files option, which takes a comma-separated list. For example:
sudo spark-submit --py-files matplot.zip,numpy.zip myPyspark.py
I show two archives here because it was clear to me how to ship one of them but not two.
You can zip the matplotlib directory and pass it to addPyFile(). Alternatively, you can define an environment variable that includes your user packages: export PYTHONPATH="venv//lib/python3.7/site-packages/:$PYTHONPATH"
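A rough sketch of the zip approach (the venv path is illustrative, and packages with compiled extensions such as matplotlib may still not import cleanly from a zip archive):
import shutil
from pyspark import SparkContext

# zip the installed matplotlib package directory from the virtualenv
archive = shutil.make_archive('matplotlib', 'zip',
                              root_dir='venv/lib/python3.7/site-packages',
                              base_dir='matplotlib')

sc = SparkContext.getOrCreate()
sc.addPyFile(archive)  # ship the archive so that 'import matplotlib' can resolve it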
Create a .py file with your code, then add that file to the Spark context.
import matplotlib.pyplot as plt
plt.<your operations>
Save the file as file.py and add it to the SparkContext:
spark.sparkContext.addPyFile("file.py")
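Once added, the module can be imported by name (importing it runs the top-level code); assuming file.py instead wraps the plotting in a hypothetical make_plot() function, usage would look like:
import file          # file.py was shipped with addPyFile above
file.make_plot()     # make_plot() is a hypothetical function defined in file.py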

how to use rpy2 within a packrat environment?

I am trying to use an R package that I have installed using the R package packrat, which allows creating a virtual environment similar to virtualenv in Python, but I do not succeed.
From a console, using R, I can run the following code successfully:
cd /path/to/packrat/environment
R # this launches an R console in the packrat environment
library(mycustompackage)
result = mycustompackage::myfunc()
q()
I would like to do the same using rpy2, but I'm unable to activate the packrat environment. Here is what I've tested, unsuccessfully:
from rpy2.robjects import r
from rpy2.robjects.packages import importr
packrat_dir = r.setwd('/path/to/packrat/environment')
importr('mycustompackage')
result = r.mycustompackage.myfunc()
But it fails at importr because it cannot find the package 'mycustompackage'. This was equally unsuccessful:
importr('mycustompackage', lib_loc='/path/to/packrat/environment')
As was this:
os.environ['R_HOME'] = '/path/to/packrat/environment'
importr('mycustompackage', lib_loc ='/path/to/packrat/environment')
Any suggestion on how to use rpy2 with packrat environments?
I am not familiar with the R package packrat, but I notice that the bash + R and the Python/rpy2 code have a subtle difference that might matter a lot: in the bash + R case, R starts already inside your packrat project directory, whereas in the Python/rpy2 case R starts from a different directory and is only moved into the packrat project directory with setwd().
I read that packrat uses a .Rprofile file (https://rstudio.github.io/packrat/limitations.html), which R evaluates at startup if it is present in the current directory. I suspect the issue comes down to how packrat is used rather than an issue with rpy2.
Very good remark (hidden file = forgotten file). I found out how to make it run:
from rpy2.robjects import r
from rpy2.robjects.packages import importr
# Init the packrat environment
r.setwd('/path/to/packrat/environment')
r.source('.Rprofile')
# use the packages it contains
importr('mycustompackage')
result = r.myfunc()
lgautier, you made my day, thanks a lot.

Run .py script via sh import module error

This is a very basic beginner's question about how to write Python code and run the script.
I'm writing a script using Xcode 9.4.1, which is supposed to target Python 3.6. I then have an sh script, run.sh, in the same folder as the script (say "my_folder"), which simply looks like:
python my_script.py
The python script looks like
from tick.base import TimeFunction
import numpy as np
import matplotlib.pyplot as plt
v = np.arange(0., 10., 1.)
f_v = v + 1
u = TimeFunction((v, f_v))
plt.plot(v, u.value(v))
print('donne!\n')
But when I try to run run.sh from the terminal I get an "ImportError: No module named tick.base" error.
The tick folder is actually present in "my_computer/anaconda3/lib/python3.6/site-packages", and up until last week I was using Spyder from the Anaconda navigator and everything was working correctly, with no import error.
The question is quite trivial; in some sense it is simply "what is the typical procedure for writing and running a Python script, and how are modules supposed to be imported/installed on the machine where it runs?"
I need this because my script is to be run on another machine through ssh, and I'm using my laptop to make some attempts. Up to last year I worked in C and only needed to move some folders with code and .h files.
Thanks for the help!
EDIT 1:
From the Spyder 3.2.7 settings, where the script ran without problems, I printed the output of:
import sys
print(sys.path)
I then manually copied the contents into the sys.path variable in my_script.py, reran run.sh, and now I get a new (strange) error:
Traceback (most recent call last):
[...]
File "/Users/my_computer/anaconda3/lib/python3.6/site-packages/tick/array/build/array.py", line 106
def tick_double_array_to_file(_file: 'std::string', array: 'ArrayDouble const &') -> "void":
^
SyntaxError: invalid syntax
First, check that the python you are calling the script with points to the Anaconda Python and is the version you expect. You can run the "which python" command on Linux and Mac to see which path the python command resolves to. If it points to a different version or build of Python than the one you expect, add the needed path to the PATH environment variable. On Linux and Mac this can be done by adding the following line to the .bashrc file in your home folder:
export PATH=/your/python/path:$PATH
Then source the .bashrc file:
source .bashrc
If you are on an operating system like CentOS, breaking the default python path can break yum, so be careful before changing it.
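A quick way to confirm which interpreter run.sh actually uses is to print it from the top of my_script.py, for example:
import sys

print(sys.version)     # a 2.7.x version here would explain the SyntaxError on the function annotations
print(sys.executable)  # full path of the interpreter running the script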
I am running a script in PyCharm, and under the Project Interpreter I have the path
C:\envs\conda\keras2\python.exe
When I try to run the script via ssh on the server I get a 'no module named' error, and I get
/usr/bin/python as the answer to 'which python' on the server itself. Could you tell me which path I must add for the script to run properly?
