How do I print the calculated features in TensorFlow (Python)

I am a college student who has just started learning Python, doesn't have a lot of coding experience, and has been experimenting with TensorFlow out of curiosity. I know that I should become more fluent in Python before attempting such an ambitious project, but I really want to learn about this experimentally.
My goal is to take a pre-formatted CSV that has the RSI, RS, MACD, and signal of a stock, along with whether the price of that stock increased the next day (relative to the previous day). Whether or not it increased is represented by a 1 or a 0 (1 being an increase, 0 being no change or a decrease), so everything is an integer. What I am trying to find is what combination of these indicators leads to an increase. The increase or not is indicated by the class.
So far I have trained the model and tested my test set, getting it to 89% accuracy, but what I am trying to do is print the combination of values that the model found leads to an increase. So how might I print the Spy_features that result in Spy_labels (class) = 1, from the already trained model?
If any more information is needed, I will gladly provide it; I just feel like I've hit a wall with this aspect of my first project. Most of all I really would like to learn more about Python and machine learning, so an explanation of how to go about something like this would be greatly appreciated.
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib
# Data files (headerless CSVs: four indicator columns plus the 0/1 class)
SPY_Training = pd.read_csv(r'C:\Users\matth\Downloads\Spy_training(noheader).csv',
                           names=["RSI", "RS", "MACD", "signal", "Class"])
SPY_Test = pd.read_csv(r'C:\Users\matth\Downloads\Spy_test(noheader).csv',
                       names=["RSI", "RS", "MACD", "signal", "Class"])
SPY_Test.shape[0], SPY_Training.shape[0]
SPY_Training.head()

# Separate the features from the 'Class' labels
SpyTest_features = SPY_Test.copy()
SpyTest_labels = SpyTest_features.pop('Class')
Spy_features = SPY_Training.copy()
Spy_labels = Spy_features.pop('Class')

Spy_features = np.array(Spy_features)
print(Spy_features)

# Small network: 4 inputs -> 64 hidden units -> 1 output
Spy_model = tf.keras.Sequential([
    layers.Dense(64),
    layers.Dense(1)
])
Spy_model.compile(loss=tf.losses.MeanSquaredError(),
                  optimizer=tf.optimizers.Adam(),
                  metrics=['accuracy'])
Spy_model.fit(Spy_features, Spy_labels, epochs=10)

test_loss, test_acc = Spy_model.evaluate(SpyTest_features, SpyTest_labels, verbose=2)
print('Test accuracy:', test_acc)

In neural networks, every input value has an effect on the outcome (either positive or negative).
All four of your inputs are directly connected to 64 hidden units, which are in turn connected to the single output unit, so one cannot say in advance which values affect the output positively or negatively.
You can use matplotlib/seaborn to understand the data by plotting the values against each other, and you can query the trained model directly, as in the sketch below.
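To address the question as asked: rather than trying to read the winning combination out of the trained weights, a practical approach is to run the model over the feature rows and keep the ones it scores as class 1. A minimal sketch, reusing the variable names from the question's code; the 0.5 threshold on the raw output is an assumption, since the last Dense layer has no activation:
import seaborn as sns

predictions = Spy_model.predict(np.array(SpyTest_features))
predicted_up = SpyTest_features[predictions.flatten() > 0.5]
print(predicted_up)  # the indicator combinations the model maps to class 1

# To eyeball which indicators separate the two classes, a pairplot
# colored by the actual class is a good start:
sns.pairplot(SPY_Training, hue='Class')
plt.show()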

Related

Convolutional layer returns the same result for different inputs

I'm using a neural network for classification on DNA sequences, where the input is one-hot encoded.
I started with a very simple network, just for testing:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, GlobalAvgPool1D, Dense

ishape = (None, 4)  # variable-length sequences, 4 one-hot channels
ksize = 18

model = Sequential()
model.add(Conv1D(16, ksize, activation='relu', input_shape=ishape))
model.add(GlobalAvgPool1D(data_format="channels_last"))
model.add(Dense(2, activation='sigmoid'))
The problem is that it always returns the same results, which change each time I retrain the network.
So I looked at the output of the global-averaging layer, and what I found is that its output is almost identical for every input; for example, considering only one value, it goes from 0.17484 to 0.17424. (This also happens if I try to predict a fake input of all 0s or all 1s.)
I don't know how to resolve this... any suggestions?
P.S. This problem is independent of the training, because I see the same behaviour if I predict right after initialization, before training.
UPDATE:
I found that the weights of the convolutional layer are small, in the range of 0.1 or 0.01. Considering that I work with only 0 and 1 values, is it possible that this is the problem? How can I fix it?
I believe this is expected behaviour from Keras/TensorFlow: the same code can give you different results because of the randomness present during weight initialization and training.
Maybe you can try this, importing the lines below at the top of your code:
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)
For reference you can check this url; it's totally fine to have this kind of result.
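Note that set_random_seed is the TF 1.x API; if you are on TensorFlow 2.x, a sketch of the equivalent seeding would be:
import numpy as np
import tensorflow as tf

np.random.seed(1)      # seeds numpy's global RNG
tf.random.set_seed(2)  # TF 2.x replacement for set_random_seed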

Loss value in Linear regression

I have done a linear regression problem with the Boston dataset, and I have obtained the following result: the loss value does not change as the number of iterations increases. What is the reason for this? Please help me.
import pandas as pd
import torch
import numpy as np
import torch.nn as nn
from sklearn import preprocessing
training_set = pd.read_csv('boston_data.csv')
training_set = training_set.to_numpy()
test_set = test_set.to_numpy()  # (loading of test_set omitted in the original post)
inputs = training_set[:, 0:13]
inputs = preprocessing.normalize(inputs)
target = training_set[:, 13:14]
target = preprocessing.normalize(target)
inputs = torch.from_numpy(inputs)
target = torch.from_numpy(target)
test_set = torch.from_numpy(test_set)

w = torch.randn(13, 1, requires_grad=True)
b = torch.randn(404, 1, requires_grad=True)

def model(x):
    return x @ w + b

pred = model(inputs.float())

def loss_MSE(x, y):
    ras = x - y
    return torch.sum(ras * ras) / ras.numel()

for i in range(100):
    pred = model(inputs.float())
    loss = loss_MSE(target, pred)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()
    print(loss)
Welcome to Stack Overflow.
Your main loop is fine (you could have made your life much easier, however; you should probably read this), but your learning rate (1e-5) is most likely way too low.
I tried with a small dummy problem: it was solved very quickly with a learning rate around 1e-2, and would take tremendously longer with 1e-5. It does converge anyway, but only after far more than 100 epochs. You mentioned you tried increasing the number of epochs, but did not write how many you actually ran.
Please try increasing the learning rate to see whether it solves your issue. You can also try removing the division by numel(), which has a similar effect (the division is also applied to the gradients).
Next time, please provide a small minimal example that can be run and helps reproduce your error. Here most of your code is data loading, which could be replaced with two lines of dummy data generation, as in the sketch below.
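A sketch of what such a minimal version could look like, with made-up linear data and the learning rate raised to 1e-2 (the ground-truth weights and the single broadcast bias are illustrative choices, not the poster's setup):
import torch

torch.manual_seed(0)
inputs = torch.randn(404, 13)
target = inputs @ torch.randn(13, 1) + 0.5  # hypothetical linear ground truth

w = torch.randn(13, 1, requires_grad=True)
b = torch.randn(1, requires_grad=True)  # a single broadcast bias suffices

for i in range(100):
    pred = inputs @ w + b
    loss = ((pred - target) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-2
        b -= b.grad * 1e-2
        w.grad.zero_()
        b.grad.zero_()
print(loss.item())  # drops steadily with lr=1e-2; barely moves with 1e-5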

Different predictions on multiple run of the same algorithm scikit neural network

Since an MLP can implement any function, I have the following code, with which I am trying to implement the AND function. But I find that on running the program multiple times, I end up getting different predicted values. Why is this happening? Also, how does one decide which type of activation function to use at each layer?
from sknn.mlp import Regressor, Layer, Classifier
import numpy as np

X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = np.array([0, 0, 0, 1])

nn = Classifier(layers=[Layer("Softmax", units=2), Layer("Linear", units=2)],
                learning_rate=0.001, n_iter=25)
nn.fit(X_train, y_train)

X_example = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_example = nn.predict(X_example)
print(y_example)
- The different values obtained on every run are due to your weights being randomly initialized (see the hedged seeding sketch below).
- Activation functions have different properties. You can either use your experience to decide which is best for your situation, or read about how they work (https://stats.stackexchange.com/questions/115258/comprehensive-list-of-activation-functions-in-neural-networks-with-pros-cons).
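A sketch of one way to make runs repeatable; this assumes sknn's Theano/Lasagne backend draws its initial weights from numpy's global RNG, which is an assumption about sknn's internals rather than documented behaviour:
from sknn.mlp import Classifier, Layer
import numpy as np

np.random.seed(0)  # assumption: sknn's weight init consumes numpy's global RNG

X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = np.array([0, 0, 0, 1])

nn = Classifier(layers=[Layer("Softmax", units=2), Layer("Linear", units=2)],
                learning_rate=0.001, n_iter=25)
nn.fit(X_train, y_train)  # repeated runs should now start from identical weights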

How to get stable results with TensorFlow, setting random seed

I'm trying to run a neural network multiple times with different parameters in order to calibrate the network's parameters (dropout probabilities, learning rate, etc.). However, I am having the problem that the network still gives me different solutions even when I keep the parameters the same and run it in a loop as follows:
filename = create_results_file()
for i in range(3):
    g = tf.Graph()
    with g.as_default():
        accuracy_result, average_error = network.train_network(
            parameters, inputHeight, inputWidth, inputChannels, outputClasses)
    f, w = get_csv_writer(filename)
    w.writerow([accuracy_result, "did run %d" % i, average_error])
    f.close()
I am using the following code at the start of my train_network function before setting up the layers and error function of my network:
np.random.seed(1)
tf.set_random_seed(1)
I have also tried adding this code before the TensorFlow graph creation, but I keep getting different results in my output.
I am using an AdamOptimizer and am initializing the network weights with tf.truncated_normal. Additionally, I am using np.random.permutation to shuffle the incoming images for each epoch.
Setting the current TensorFlow random seed affects the current default graph only. Since you are creating a new graph for your training and setting it as default (with g.as_default():), you must set the random seed within the scope of that with block.
For example, your loop should look like the following:
for i in range(3):
    g = tf.Graph()
    with g.as_default():
        tf.set_random_seed(1)
        accuracy_result, average_error = network.train_network(
            parameters, inputHeight, inputWidth, inputChannels, outputClasses)
Note that this will use the same random seed for each iteration of the outer for loop. If you want to use a different—but still deterministic—seed in each iteration, you can use tf.set_random_seed(i + 1).
Backend setup: CUDA 10.1, cuDNN 7, tensorflow-gpu 2.1.0, keras 2.2.4-tf, and a customized VGG19 model.
After looking into the issue of unstable results for the TensorFlow backend with GPU training and large Keras-based neural network models, I was finally able to get reproducible (stable) results as follows.
First, import only the libraries required for setting the seed and initialize a seed value:
import tensorflow as tf
import os
import numpy as np
import random
SEED = 0
Next, define a function to initialize the seeds of all libraries that might have stochastic behaviour:
def set_seeds(seed=SEED):
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)
Then activate TensorFlow's deterministic behaviour:
def set_global_determinism(seed=SEED):
    set_seeds(seed=seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
    tf.config.threading.set_inter_op_parallelism_threads(1)
    tf.config.threading.set_intra_op_parallelism_threads(1)

# Call the above function with the seed value
set_global_determinism(seed=SEED)
Important notes:
Call the above code before executing any other code.
Model training might become slower since the code is deterministic, hence there's a tradeoff.
I experimented several times with varying numbers of epochs and different settings (including model.fit() with shuffle=True), and the above code gave me reproducible results every time.
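As a quick sanity check, a sketch that builds the same tiny model twice under the setup above and compares the initial weights (the layer sizes are arbitrary):
def make_model():
    set_global_determinism(seed=SEED)
    return tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(3,))])

w1 = make_model().get_weights()
w2 = make_model().get_weights()
print(all(np.array_equal(a, b) for a, b in zip(w1, w2)))  # expect: True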
References:
https://suneeta-mall.github.io/2019/12/22/Reproducible-ml-tensorflow.html
https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
https://www.tensorflow.org/api_docs/python/tf/config/threading/set_inter_op_parallelism_threads
https://www.tensorflow.org/api_docs/python/tf/random/set_seed?version=nightly
Deterministic behaviour can be obtained either by supplying a graph-level or an operation-level seed. Both worked for me. A graph-level seed can be set with tf.set_random_seed. An operation-level seed can be placed, e.g., in a variable initializer, as in:
myvar = tf.Variable(tf.truncated_normal((10, 10), stddev=0.1, seed=0))
TensorFlow 2.0 compatible answer: For TensorFlow versions greater than 2.0, if we want to set the global random seed, the command to use is tf.random.set_seed.
If we are migrating from TensorFlow 1.x to 2.x, we can use tf.compat.v2.random.set_seed.
Note that tf.function acts like a re-run of a program in this case.
To set the operation-level seed (as answered above), we can use, e.g., tf.random.uniform([1], seed=1).
For more details, refer to this TensorFlow page.
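A short sketch contrasting the two seed levels in TF 2.x:
import tensorflow as tf

tf.random.set_seed(1)               # global (graph-level) seed
a = tf.random.uniform([1])          # determined by the global seed and op order
b = tf.random.uniform([1], seed=3)  # additionally pinned by its own op-level seed
print(a.numpy(), b.numpy())         # both repeat across identical program runs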
It seems as if none of these answers will work completely, due to underlying implementation issues in cuDNN.
You can get a bit more determinism by adding an extra flag:
os.environ['PYTHONHASHSEED'] = str(SEED)
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'  # new flag present in tf 2.0+
random.seed(SEED)
np.random.seed(SEED)
tf.set_random_seed(SEED)
But this still won't be entirely deterministic. To get an even more exact solution, you need to use the procedure outlined in this nvidia repo.
Please add all the random seed functions before your code:
tf.reset_default_graph()
tf.random.set_seed(0)
random.seed(0)
np.random.seed(0)
I think some models in TensorFlow use numpy or the Python random functions.
I'm using TensorFlow 2 (2.2.0) and running code in JupyterLab. I've tested this on macOS Catalina and in Google Colab with the same results. I'll add a few things to Tensorflow Support's answer.
When I do some training using the model.fit() method, I do it in a cell and do other stuff in other cells. This is the code I run in that cell:
# To have same results always place this on top of the cell
tf.random.set_seed(1)
(x_train, y_train), (x_test, y_test) = load_data_onehot_grayscale()
model = get_mlp_model_compiled() # Creates the model, compiles it and returns it
history = model.fit(x=x_train, y=y_train,
                    epochs=30,
                    callbacks=get_mlp_model_callbacks(),
                    validation_split=.1)
This is what I understand:
1. TensorFlow has random processes that happen at different stages (initializing, shuffling, ...); every time one of those processes happens, TensorFlow uses a random function. When you set the seed using tf.random.set_seed(1), you make those processes use it, and if the seed is set and the processes don't change, the results will be the same.
2. Now, in the code above, if I move tf.random.set_seed(1) below the line model = get_mlp_model_compiled(), my results change; I believe this is because get_mlp_model_compiled() uses randomness and isn't using the seed I want.
3. Caveat about point 2: if I run the cell three times in a row, I do get the same results. I believe this happens because in run nº1 get_mlp_model_compiled() isn't using TensorFlow's internal counter with my seed, while in run nº2 and all subsequent runs it is, so after run nº2 the results stay the same.
I might have some information wrong, so feel free to correct me.
To understand what's going on you should read the docs; they're not long and fairly easy to understand. A sketch of the ordering point follows.
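A sketch of why the seed has to come before the model-building call; build_model here is a hypothetical stand-in for the poster's get_mlp_model_compiled():
import tensorflow as tf

def build_model():  # hypothetical stand-in for get_mlp_model_compiled()
    return tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(4,))])

tf.random.set_seed(1)  # seed set BEFORE building: initial weights reproducible
m1 = build_model()
tf.random.set_seed(1)
m2 = build_model()
print(all((w1 == w2).all() for w1, w2 in zip(m1.get_weights(), m2.get_weights())))  # True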
This answer is an addition to Luke's answer and for TF v2.2.0
import numpy as np
import os
import random
import tensorflow as tf  # 2.2.0

SEED = 42
os.environ['PYTHONHASHSEED'] = str(SEED)
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'  # TF 2.1
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

How to get reproducible results in keras

I get different results (test accuracy) every time I run the imdb_lstm.py example from the Keras framework (https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py).
The code contains np.random.seed(1337) at the top, before any Keras imports. It should prevent it from generating different numbers on every run. What am I missing?
UPDATE: How to repro:
Install Keras (http://keras.io/)
Execute https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py a few times. It will train the model and output test accuracy.
Expected result: Test accuracy is the same on every run.
Actual result: Test accuracy is different on every run.
UPDATE2: I'm running it on Windows 8.1 with MinGW/msys, module versions:
theano 0.7.0
numpy 1.8.1
scipy 0.14.0c1
UPDATE 3: I've narrowed the problem down a bit. If I run the example on the GPU (with the Theano flag device=gpu0), I get a different test accuracy every time, but if I run it on the CPU then everything works as expected. My graphics card: NVIDIA GeForce GT 635.
You can find the answer at the Keras docs: https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development.
In short, to be absolutely sure that you will get reproducible results with your Python script on one computer's/laptop's CPU, you will have to do the following:
Set the PYTHONHASHSEED environment variable at a fixed value
Set the python built-in pseudo-random generator at a fixed value
Set the numpy pseudo-random generator at a fixed value
Set the tensorflow pseudo-random generator at a fixed value
Configure a new global tensorflow session
Following the Keras link at the top, the source code I am using is the following:
# Seed value
# Apparently you may use different seed values at each stage
seed_value= 0
# 1. Set the `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)
# 2. Set the `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)
# 3. Set the `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)
# 4. Set the `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
tf.random.set_seed(seed_value)
# for later versions:
# tf.compat.v1.set_random_seed(seed_value)
# 5. Configure a new global `tensorflow` session
from keras import backend as K
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
# for later versions:
# session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
# sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
# tf.compat.v1.keras.backend.set_session(sess)
Needless to say, you do not have to specify any seed or random_state for the numpy, scikit-learn or tensorflow/keras functions used in your Python script, precisely because the source code above sets all of their pseudo-random generators globally at a fixed value.
Theano's documentation talks about the difficulties of seeding random variables and why they seed each graph instance with its own random number generator:

Sharing a random number generator between different {{{RandomOp}}} instances makes it difficult to produce the same stream regardless of other ops in the graph, and to keep {{{RandomOps}}} isolated. Therefore, each {{{RandomOp}}} instance in a graph will have its very own random number generator. That random number generator is an input to the function. In typical usage, we will use the new features of function inputs ({{{value}}}, {{{update}}}) to pass and update the rng for each {{{RandomOp}}}. By passing RNGs as inputs, it is possible to use the normal methods of accessing function inputs to access each {{{RandomOp}}}'s rng. In this approach there is no pre-existing mechanism to work with the combined random number state of an entire graph. So the proposal is to provide the missing functionality (the last three requirements) via auxiliary functions: {{{seed, getstate, setstate}}}.
They also provide examples of how to seed all the random number generators:

You can also seed all of the random variables allocated by a RandomStreams object by that object's seed method. This seed will be used to seed a temporary random number generator, that will in turn generate seeds for each of the random variables.

>>> srng.seed(902340)  # seeds rv_u and rv_n with different seeds each
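For reference, a minimal sketch of the RandomStreams pattern the documentation describes (the seed values mirror the Theano tutorial):
import theano
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=234)  # per-graph stream of random variables
rv_u = srng.uniform((2, 2))     # a random variable drawing from U(0, 1)
f = theano.function([], rv_u)

srng.seed(902340)               # re-seeds every variable allocated by srng
print(f())                      # reproducible across runs with the same seed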
I finally got reproducible results with my code. It's a combination of answers I saw around the web. The first thing is doing what @alex says:
Set numpy.random.seed;
Use PYTHONHASHSEED=0 for Python 3.
Then you have to solve the issue noted by @user2805751 regarding cuDNN, by calling your Keras code with the following additional THEANO_FLAGS (see the sketch below):
dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic
And finally, you have to patch your Theano installation as per this comment, which basically consists of replacing all calls to the *_dev20 operator by its regular version in theano/sandbox/cuda/opt.py.
This should get you the same results for the same seed.
Note that there might be a slowdown; I saw a running-time increase of about 10%.
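One way to pass those flags (a sketch): set THEANO_FLAGS in the environment before Theano is imported, since Theano reads the variable at import time.
import os
os.environ['THEANO_FLAGS'] = ('dnn.conv.algo_bwd_filter=deterministic,'
                              'dnn.conv.algo_bwd_data=deterministic')
import theano  # must come after the flags are set
# ... Keras code using the Theano backend follows ...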
In TensorFlow 2.0 you can set the random seed like this:
import tensorflow as tf
tf.random.set_seed(221)

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(2, name='one'),
    layers.Dense(3, activation='sigmoid', name='two'),
    layers.Dense(2, name='three')])

x = tf.random.uniform((12, 12))
model(x)
The problem is now solved in TensorFlow 2.0! I had the same issue with TF 1.x (see If Keras results are not reproducible, what's the best practice for comparing models and choosing hyper parameters?), but the following now gives reproducible results:
import os
#### *IMPORTANT*: Have to do this line *before* importing tensorflow
os.environ['PYTHONHASHSEED'] = str(1)

import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers
import random
import pandas as pd
import numpy as np

def reset_random_seeds():
    os.environ['PYTHONHASHSEED'] = str(1)
    tf.random.set_seed(1)
    np.random.seed(1)
    random.seed(1)

# make some random data
reset_random_seeds()
NUM_ROWS = 1000
NUM_FEATURES = 10
random_data = np.random.normal(size=(NUM_ROWS, NUM_FEATURES))
df = pd.DataFrame(data=random_data, columns=['x_' + str(ii) for ii in range(NUM_FEATURES)])
y = df.sum(axis=1) + np.random.normal(size=(NUM_ROWS))

def run(x, y):
    reset_random_seeds()
    model = keras.Sequential([
        keras.layers.Dense(40, input_dim=df.shape[1], activation='relu'),
        keras.layers.Dense(20, activation='relu'),
        keras.layers.Dense(10, activation='relu'),
        keras.layers.Dense(1, activation='linear')
    ])
    NUM_EPOCHS = 500
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x, y, epochs=NUM_EPOCHS, verbose=0)
    predictions = model.predict(x).flatten()
    loss = model.evaluate(x, y)  # this prints out the loss as a side-effect

# With Tensorflow 2.0 this is now reproducible!
run(df, y)
run(df, y)
run(df, y)
This works for me:
SEED = 123456
import os
import random as rn
import numpy as np
from tensorflow import set_random_seed

os.environ['PYTHONHASHSEED'] = str(SEED)
np.random.seed(SEED)
set_random_seed(SEED)
rn.seed(SEED)
It is easier than it seems. Adding only this works:
import numpy as np
import tensorflow as tf
import random as python_random

def reset_seeds():
    np.random.seed(123)
    python_random.seed(123)
    tf.random.set_seed(1234)

reset_seeds()
The key to the question, and it is very important, is to call reset_seeds() every time before running the model. Doing that, you will obtain reproducible results, as I verified in Google Colab.
I would like to add something to the previous answers. If you use Python 3 and you want to get reproducible results for every run, you have to:
set numpy.random.seed at the beginning of your code
give PYTHONHASHSEED=0 as a parameter to the Python interpreter
I have trained and tested Sequential() neural networks using Keras, performing nonlinear regression on noisy speech data. I used the following code to set the random seed:
import numpy as np
seed = 7
np.random.seed(seed)
I get exactly the same val_loss results each time I train and test on the same data.
I agree with the previous answers, but reproducible results sometimes also need the same environment (e.g. installed packages, machine characteristics and so on). So I recommend copying your environment somewhere else in order to have reproducible results. Try to use one of the following technologies:
Docker. If you are on Linux, it is very easy to move your environment elsewhere. You can also try DockerHub.
Binder. This is a cloud platform for reproducing scientific experiments.
Everware. This is yet another cloud platform for "reusable science". See the project repository on GitHub.
Unlike what has been said before, only the TensorFlow seed has an effect on the random generation of weights (tested with the latest versions, TensorFlow 2.6.0 and Keras 2.6.0).
Here is a small test you can run to check the influence of each seed (with np being numpy, tf being tensorflow, and random the Python random library):
# Testing how seeds influence results
# -----------------------------------
print("Seed specification")
my_seed = 36
# To vary the python hash, numpy random, python random and tensorflow random seeds
a, b, c, d = 0, 0, 0, 0
os.environ['PYTHONHASHSEED'] = str(my_seed + a)  # Has no effect
np.random.seed(my_seed + b)                      # Has no effect
random.seed(my_seed + c)                         # Has no effect
tf.random.set_seed(my_seed + d)                  # Has an effect

print("Making ML model")
keras.mixed_precision.set_global_policy('float64')
model = keras.Sequential([
    layers.Dense(2, input_shape=input_shape),  # activation='relu' left out
    layers.Dense(output_nb, activation=None),
])

weights_save = model.get_weights()
print("Some weights:", weights_save[0].flatten())
We notice that varying a, b, or c has no effect on the results; only d does.
So, in the latest versions of TensorFlow, only the TensorFlow random seed has an influence on the random choice of weights. A verification sketch follows.
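A self-contained sketch of the same check (arbitrary layer sizes): with only the TensorFlow seed fixed, two freshly built models get identical initial weights.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def fresh_weights():
    tf.random.set_seed(36)  # only the TF seed is reset between builds
    model = keras.Sequential([layers.Dense(2, input_shape=(4,)),
                              layers.Dense(1)])
    return model.get_weights()

w1, w2 = fresh_weights(), fresh_weights()
print(all(np.array_equal(a, b) for a, b in zip(w1, w2)))  # expect: True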
