Why won't my code execute in either the terminal or the Spyder IDE? - python

I'm analysing this data set using ML techniques in Python 3.5 in the Spyder IDE (Ubuntu). My program should work (it matches the tutorial program exactly), but it does nothing when run: nothing gets printed or returned. The Spyder console displays the following and nothing after that:
runfile('/media/username/Laniakea/Projects/Training/SPYDER/classifier/sk_classifier.py', wdir='/media/username/Laniakea/Projects/Training/SPYDER/classifier')
I normally see this line when a program starts to run, and the output follows, but here I get nothing. My program:
from sklearn import svm
import pandas as pd
import numpy as np
df_pickled_train2 = pd.read_pickle('df_train.pickle')
df_pickled_test2 = pd.read_pickle('df_test.pickle')
df_pickled_train2_y = pd.read_pickle('df_train_y.pickle')
df_pickled_test2_y = pd.read_pickle('df_test_y.pickle')
X = np.array(df_pickled_train2)
y = np.array(df_pickled_train2_y)
X_test = np.array(df_pickled_test2)
y_test = np.array(df_pickled_test2_y)
clf = svm.SVC(kernel='linear')
clf.fit(X,y.ravel())
print(clf.score(X_test,y_test))
print("Done")
If you want to see how the pickles get created (and this program runs fine - it even prints out the final line "Done" or anything else I want it to print):
import pandas as pd
import numpy as np
df_train = pd.read_csv('Adult-Incomes/train-labelled-final-variables-condensed-coded-countries-removed-unlabelled-income-to-the-left-relabelled-copy.csv')
df_test = pd.read_csv('Adult-Incomes/test-final-variables-cleaned-coded-copy-unlabelled.csv')
df_train_no_y = df_train.drop('Income', axis=1)
df_test_no_y = df_test.drop(df_test.columns[0],axis=1)
df_train_y = pd.DataFrame(df_train['Income'])
df_train_y.to_pickle('df_train_y.pickle')
df_test_y = df_test[df_test.columns[0]]
df_test_y.to_pickle('df_test_y.pickle')
df_test_no_y.to_pickle('df_test.pickle')
df_train_no_y.to_pickle('df_train.pickle')
print ("DONE")
PS: Even when run from the terminal, it executes but does nothing. Normally the terminal prints the output and then prompts for another command, but here the cursor just moves to the next line and stays there. It doesn't look hung, as the cursor blinks and the computer stays responsive. It feels as if the code somehow sends the interpreter into limbo.
P.P.S: I even suspected that it was running a complex algorithm that genuinely needed time, so I left it running overnight. Nothing happened even then.
Can someone tell me why my program won't run or display anything?
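One way to narrow down a silent hang like this is to time each stage and flush the output immediately, so the last line printed shows which call never returns. The sketch below is only illustrative: the `stage` helper and the stage names are hypothetical, and the `time.sleep` calls stand in for the real `pd.read_pickle` and `clf.fit` calls.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed seconds


@contextmanager
def stage(name):
    """Announce a stage, run it, then record and print how long it took."""
    # flush=True so the line appears even if the stage never finishes
    print(f"[start] {name}", flush=True)
    started = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - started
        print(f"[done]  {name}: {timings[name]:.3f}s", flush=True)


# Placeholders for the real work (hypothetical stand-ins, not the original calls).
with stage("load pickles"):
    time.sleep(0.01)

with stage("fit classifier"):
    time.sleep(0.02)
```

If `[start] fit classifier` is the last line shown, the time is going into `clf.fit`. SVC training time grows faster than linearly with the number of samples, so that call is a plausible suspect, though this is a guess rather than something the post confirms.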

Related

Jupyter Notebook stuck at loading when importing data with pymongo to a pandas dataframe

Every time I run it, the cell stays loading and never finishes. The data is 2 MB, so it should load really fast. Pymongo was installed with pymongo[srv]. My code:
import pandas as pd
import pymongo
import os
pwd = os.getenv("mongodb_pwd")
client = pymongo.MongoClient(
f"mongodb+srv://...:{pwd}#.../test?authSource=admin&replicaSet=...&readPreference=primary&ssl=true"
)
db = client["general"]
data = db["orders"].find()
df = pd.DataFrame(list(data))
Any idea why this is happening? It used to work 2 weeks ago (under Python 3.9; I'm using 3.10 now).
I ran the file as a .py script, and it still got stuck.
After more than 5 minutes, the data was loaded. Any ideas why it is taking so long?
I restarted the PC and everything worked fast again. No idea what happened.

my python functions do not work in jupyter notebook

I have written lines of code that run normally in Jupyter Notebook (here is the code below):
import json
import requests
umd_departments = requests.get("https://api.umd.io/v0/courses/departments")
umd_departments_list = umd_departments.json()
umd_departments_list2 = json.dumps(umd_departments_list, indent=1)
department_storage = [department['dept_id'] for department in umd_departments_list]
print(department_storage)
The code above gives me the output I want and prints it out immediately. The issue I run into is that when I put the code above into a function of its own, it doesn't work:
def get_UMD_departments():
    umd_departments = requests.get("https://api.umd.io/v0/courses/departments")
    umd_departments_list = umd_departments.json()
    umd_departments_list2 = json.dumps(umd_departments_list, indent=1)
    department_storage = [department['dept_id'] for department in umd_departments_list]
    print(department_storage)
The problem I face with this version of the code is that it never prints anything, as opposed to the other code I showed. I also don't get an error when I run it: the * symbol doesn't show up, no error message appears, and the cell just shows how many times I ran it. So I'm not sure what the problem is. I was wondering if anyone knew a way to make the function version of my code work. Greatly appreciated.
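The behaviour described above is what Python does when a function is defined but never called: a `def` statement only creates the function, and its body runs when the function is invoked. A minimal sketch of the difference follows, using a made-up `sample` list in place of the API response so it runs without network access (`extract_dept_ids` and the sample values are hypothetical).

```python
def extract_dept_ids(departments):
    """Return the dept_id of every department record."""
    return [department["dept_id"] for department in departments]


# Stand-in for the parsed JSON the API would return (made-up values).
sample = [{"dept_id": "AASP"}, {"dept_id": "CMSC"}]

# Defining the function above printed nothing; calling it is what runs the body.
dept_ids = extract_dept_ids(sample)
print(dept_ids)
```

Applied to the function in the question, this would mean adding a line such as `get_UMD_departments()` after the definition, in the same cell or a later one.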

Process finished with exit code -1073741819 (0xC0000005) - Rpy2

I have searched a lot for this error, on Stack Overflow and other websites, but I cannot find a solution to my problem.
I have a program written in Python, and I am using the Python module rpy2 to call some R functions from it.
The problem is that when I run the code I sometimes, but not always, encounter this error. I am on Windows. Sometimes, after I restart my PC, the code runs for longer, but eventually the error pops up again. What should I do?
I have Python 3.6.7 with PyCharm 2018.3.3. However, I doubt the problem comes from PyCharm, because the same thing happens when I run my program from cmd, except that the program halts directly without showing the message "Process finished with exit code -1073741819 (0xC0000005)". This message only appears in PyCharm.
I have rpy2 version 2.9.5.
Code description
I know roughly which part of the code causes this, but I cannot optimize it further. In this part of the code, inside cross-validation, I overpopulate each of the train and validation sets in a certain way. To do that, I combine X_train and y_train back into one data frame, overpopulate this data frame, then split it back into the updated, overpopulated X_train and y_train and perform my analysis on those. I think combining the two numpy arrays into a pandas dataframe and then splitting them again is what creates this memory error. It's also important to note that this happens in each fold, and I'm doing a 10-fold, 10-repeat cross-validation. However, the same thing happens when I run this on a desktop PC rather than on my laptop, and my laptop has plenty of GBs free. So I wonder whether this is a Python/rpy2 error.
Code snippet
# I am calling this function inside each fold
df_combined = self.prepare_data(X_train, y_train)
and then after calling prepare_data() I do as follows:
# THE apply_f1(), apply_f2(), apply_f3(), and apply_f4() ARE THE FUNCTIONS
# THAT USE rpy2 INTERNALLY
if self.f1:
    X_train_inner, y_train_inner = self.apply_f1(df_combined)
elif self.f2:
    X_train_inner, y_train_inner = self.apply_f2(df_combined)
elif self.f3:
    X_train_inner, y_train_inner = self.apply_f3(df_combined)
else:
    X_train_inner, y_train_inner = self.apply_f4(df_combined)
The prepare_data() function:
def prepare_data(self, X_train, y_train):
    '''
    concatenates X_train_inner and y_train_inner into one, and make them a data frame
    so we are able to process the data frame by SMOGN, RandUnder, GN, or SMOTER
    '''
    # reshape + rename
    X_train_samp = X_train
    y_train_samp = y_train.reshape(-1, 1)
    # combine two numpy arrays together into one numpy array
    combined = np.concatenate((X_train_samp, y_train_samp), axis=1)
    # transform X_train + y_train into a pandas dataframe
    column_names = self.other + [self.target_variable]
    df_combined = pd.DataFrame(combined, columns=column_names)
    # convert the combined pandas dataframe to R Data.Frame
    df_combined = pandas2ri.py2ri(df_combined)
    return df_combined
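Exit code 0xC0000005 is a Windows access violation, which normally originates in native code rather than in pure Python. A possible way to see which Python call was active at the moment of the crash is the standard-library `faulthandler` module, which on Windows (Python 3.6 and later) also handles Windows exceptions. The sketch below only shows how to switch it on; the log file name is hypothetical, and it diagnoses the crash rather than fixing it.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.gettempdir(), "crash_trace.log")

# The file object must stay open for as long as the handler is enabled.
crash_log = open(log_path, "w")
faulthandler.enable(file=crash_log, all_threads=True)

handler_active = faulthandler.is_enabled()
print("faulthandler enabled:", handler_active)

# ... run the cross-validation loop that calls the rpy2-backed functions here ...
```

Running the script with `python -X faulthandler` enables the same handler without code changes and writes the traceback to stderr instead.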
I have had this same error message "Process finished with exit code -1073741819 (0xC0000005)" with PyCharm 2021.1.
It happened because I selected Python 3.9 as an interpreter, while PyCharm was actually trying to use Python 3.10. And actually I had only Python 3.8 installed.
In my case, the error disappeared after I selected Python 3.8 as the interpreter.
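To check for the kind of interpreter mismatch described in this answer, it can help to print which interpreter is actually executing the script and compare it with the one selected in the IDE. A small sketch using only the standard library (the `interpreter_info` name is just for illustration):

```python
import platform
import sys

# Collect the facts that identify the running interpreter.
interpreter_info = {
    "executable": sys.executable,              # path of the python binary in use
    "version": platform.python_version(),      # e.g. "3.8.10"
    "bits": platform.architecture()[0],        # "32bit" or "64bit"
}

for key, value in interpreter_info.items():
    print(f"{key}: {value}")
```

Running this both from the IDE and from the command line shows whether the two are using the same installation.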

TensorFlow: minimalist program fails on distributed mode

I wrote a very simple program that runs just fine without distribution but hangs on CheckpointSaverHook in distributed mode (everything on my localhost though!). I've seen there's been a few questions about hanging in distributed mode, but none seem to match my question.
Here's the script (made to toy with the new layers API):
import numpy as np
import tensorflow as tf
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.contrib import layers
DATA_SIZE=10
DIMENSION=5
FEATURES='features'
def generate_input_fn():
    def _input_fn():
        mid = int(DATA_SIZE/2)
        data = np.array([np.ones(DIMENSION) if x < mid else -np.ones(DIMENSION) for x in range(DATA_SIZE)])
        labels = ['0' if x < mid else '1' for x in range(DATA_SIZE)]
        table = tf.contrib.lookup.string_to_index_table_from_tensor(tf.constant(['0', '1']))
        label_tensor = table.lookup(tf.convert_to_tensor(labels, dtype=tf.string))
        return dict(zip([FEATURES], [tf.convert_to_tensor(data, dtype=tf.float32)])), label_tensor
    return _input_fn
def build_estimator(model_dir):
    features = layers.real_valued_column(FEATURES, dimension=DIMENSION)
    return tf.contrib.learn.DNNLinearCombinedClassifier(
        model_dir=model_dir,
        dnn_feature_columns=[features],
        dnn_hidden_units=[20,20])
def generate_exp_fun():
    def _exp_fun(output_dir):
        return tf.contrib.learn.Experiment(
            build_estimator(output_dir),
            train_input_fn=generate_input_fn(),
            eval_input_fn=generate_input_fn(),
            train_steps=100
        )
    return _exp_fun
if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.DEBUG)
    learn_runner.run(generate_exp_fun(), 'job_dir')
To test distributed mode, I simply launch it with the environment variable TF_CONFIG={"cluster": {"ps":["localhost:5040"], "worker":["localhost:5041"]}, "task":{"type":"worker","index":0}, "environment": "local"} (this is for the worker; the same with the ps type is used to launch the parameter server).
I use tensorflow-1.0.1 (but had the same behavior with 1.0.0) on windows-64, CPU only. I never get any error; it just hangs forever after INFO:tensorflow:Create CheckpointSaverHook. I've tried to attach the Visual Studio C++ debugger to the process, but with little success so far, so I can't print a stack trace of what's happening in the native part.
P.S.: It's not a problem with DNNLinearCombinedClassifier, because it fails as well with a simple tf.contrib.learn.LinearClassifier. And as noted in the comments, it's not due to both processes running on localhost, since it also fails when they run on separate VMs.
EDIT: I think there's actually an issue with server launching. It looks like the server is not launched when you're in local mode (no matter if distributed or not), cf. tensorflow/contrib/learn/python/learn/experiment.py l.250-258:
# Start the server, if needed. It's important to start the server before
# we (optionally) sleep for the case where no device_filters are set.
# Otherwise, the servers will wait to connect to each other before starting
# to train. We might as well start as soon as we can.
config = self._estimator.config
if (config.environment != run_config.Environment.LOCAL and
    config.environment != run_config.Environment.GOOGLE and
    config.cluster_spec and config.master):
    self._start_server()
This prevents the server from being started in local mode for the workers. Does anyone know whether this is a bug or whether I'm missing something?
This has been answered in https://github.com/tensorflow/tensorflow/issues/8796. In short, one should use the CLOUD environment for any distributed operation.
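Following that answer, the TF_CONFIG value from the question could be generated per task with the environment switched away from local. This is a sketch under assumptions: `make_tf_config` is a hypothetical helper, and the lowercase string `"cloud"` is assumed to be the value behind `run_config.Environment.CLOUD` in TensorFlow 1.0; the hosts and ports are the ones from the question.

```python
import json
import os


def make_tf_config(task_type, index=0, environment="cloud"):
    """Return the TF_CONFIG JSON string for one task of the two-process cluster."""
    config = {
        "cluster": {"ps": ["localhost:5040"], "worker": ["localhost:5041"]},
        "task": {"type": task_type, "index": index},
        # Assumption: "cloud" is the string form of Environment.CLOUD.
        "environment": environment,
    }
    return json.dumps(config)


worker_config = make_tf_config("worker")
ps_config = make_tf_config("ps")

# Set the variable for the current process before TensorFlow reads it.
os.environ["TF_CONFIG"] = worker_config
print(worker_config)
```

Each process (worker and parameter server) would get its own value, differing only in the `task` entry.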

Python script has to be run twice to execute

I have written a Python script to gather and analyze data from a .csv file and then plot it using the matplotlib.pyplot module. I'm using numpy.genfromtxt() to gather the data.
The first time I run the file, nothing happens. I get the console message:
>>> runfile('C:/my_filepath/thing.py')
and nothing more. If I run the file again, then it executes, prints the stuff, the plot comes up etc:
runfile('C:/my_filepath/thing.py')
~the stuff it's supposed to print~
More info: This problem only occurs on my laptop computer, which leads me to believe it has something to do with my matplotlib installation (mysterious to me because I installed an identical Anaconda package on my desktop and there is no issue). On the laptop the plot window is separate, and on the desktop the plot displays in the console. Maybe that's relevant.
Has anyone had this issue?
edit2: This only happens if I try to run the program multiple times in the same console. If I run the script in a fresh console, the first run works fine, but after that I have to run it twice for every execution. If I close the matplotlib window, kill the console, and open a new one, I can execute it fine every time.
edit: Here is a working example which exhibits the odd behavior. Don't make fun of my code - I learned Python over a weekend.
import os
import re
import numpy as np
from numpy import genfromtxt as gft
import matplotlib.pyplot as plt
def getnames(Pattern):
    #construct the list of files in the directory
    CSVList = []
    for FileName in os.listdir():
        if re.compile(Pattern).search(FileName):
            CSVList.append(FileName)
    print(len(CSVList), 'files found.')
    #sort the list by the integer values of the last 4 digits of the filename
    CSVList.sort(key=lambda x: int(x.rsplit('.')[0][-4:]))
    return CSVList
def xy_extract(Data):
    #first column is x, second column is y
    x = np.array([row[0] for row in Data])
    y = np.array([row[1] for row in Data])
    return x, y
CSVList = getnames(r'(?i)\.csv$')
print(CSVList)
#plt.figure(figsize=(10,5))
plt.xlim(799,3999)
for Filename in CSVList:
    Data = gft(Filename, delimiter=',',skip_header=2)
    x, y = xy_extract(Data)
    plt.plot(x, y,label=Filename)
plt.show()
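The file selection and sort key used in `getnames` can be checked on their own, without touching the file system or matplotlib. The sketch below uses made-up file names and a hypothetical `select_and_sort` helper; it compiles the pattern once and applies the same "last four characters before the first dot" key as the script.

```python
import re

# Compiled once instead of once per file; raw string avoids the "\." escape warning.
CSV_PATTERN = re.compile(r"(?i)\.csv$")


def select_and_sort(file_names):
    """Keep names ending in .csv and order them by their last four stem digits."""
    matches = [name for name in file_names if CSV_PATTERN.search(name)]
    return sorted(matches, key=lambda name: int(name.rsplit(".")[0][-4:]))


# Made-up file names for illustration.
names = ["scan_0010.csv", "notes.txt", "scan_0002.CSV", "scan_0001.csv"]
ordered = select_and_sort(names)
print(ordered)  # ['scan_0001.csv', 'scan_0002.CSV', 'scan_0010.csv']
```

Note that a matching name whose last four characters before the first dot are not all digits would make `int()` raise a `ValueError`, in this sketch and in the original script alike.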
