How to split data into train,test and validation sets? [closed]

Following is my code:
Here is the main function
trainingSet=[]
testSet=[]
validationSet=[]
loadDataset('iris.data.txt', trainingSet, testSet,validationSet)
And this is the loadDataset function
def loadDataset(filename, trainingSet=[], testSet=[], validationSet=[]):
    with open(filename, 'rb') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset)-1):
            for y in range(4):
                dataset[x][y] = float(dataset[x][y])
        random.shuffle(dataset)
        trainingSet.append(dataset[:106])
        testSet.append(dataset[106:128])
        validationSet.append(dataset[128:150])
"loadDataset gets wine data set csv and converts it into a list of floats. Then it splits the data."
I am trying to split my data into 70-15-15. But when I print the lengths of each list it gives 1.

Simply using .extend instead of .append should solve your issue. .append adds the slice dataset[xxx] as a single element to the list. .extend, on the other hand, adds all the elements in dataset[xxx] to the list.
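A minimal illustration of the difference, using a made-up list named `rows` as a stand-in for the real dataset (the values are hypothetical):

```python
# Hypothetical stand-in for the parsed dataset
rows = [[5.1, 3.5], [4.9, 3.0], [4.7, 3.2]]

appended = []
appended.append(rows[:2])   # the whole slice becomes ONE element

extended = []
extended.extend(rows[:2])   # each row of the slice becomes its own element

print(len(appended))  # 1
print(len(extended))  # 2
```

This is why every list in the question reports a length of 1: each one holds a single nested list.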
However, if you only call loadDataset once, as in your example, there is no need to pass in empty lists; the function can return the three slices directly.
main function:
trainingSet, testSet, validationSet = loadDataset('iris.data.txt')
loadDataset function:
def loadDataset(filename):
    with open(filename, 'rb') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset)-1):
            for y in range(4):
                dataset[x][y] = float(dataset[x][y])
        random.shuffle(dataset)
        trainingSet = dataset[:106]
        testSet = dataset[106:128]
        validationSet = dataset[128:150]
        return trainingSet, testSet, validationSet
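The code above hard-codes the slice boundaries for a 150-row file and opens the file in binary mode, which `csv.reader` accepts only on Python 2. A possible Python 3 variant is sketched below. The names `load_dataset` and `split_dataset`, the fraction parameters, and the `seed` argument are illustrative assumptions, not part of the original answer.

```python
import csv
import random


def split_dataset(dataset, train_frac=0.70, test_frac=0.15, seed=None):
    """Shuffle a copy of dataset and cut it into train, test, validation."""
    shuffled = list(dataset)
    random.Random(seed).shuffle(shuffled)
    train_end = int(round(len(shuffled) * train_frac))
    test_end = train_end + int(round(len(shuffled) * test_frac))
    # Whatever is left after the train and test slices is the validation set.
    return shuffled[:train_end], shuffled[train_end:test_end], shuffled[test_end:]


def load_dataset(filename, train_frac=0.70, test_frac=0.15, seed=None):
    """Read rows of 4 numeric columns plus a label and split them three ways."""
    with open(filename, newline='') as csvfile:
        # Skip blank lines; data files often end with one.
        dataset = [row for row in csv.reader(csvfile) if row]
    for row in dataset:
        row[:4] = [float(value) for value in row[:4]]
    return split_dataset(dataset, train_frac, test_frac, seed)
```

Usage would look like `trainingSet, testSet, validationSet = load_dataset('iris.data.txt')`. Because the boundaries are rounded from fractions, the three sizes can differ by a row or so from an exact 70-15-15 split.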

Related

divide a matrix into multiple matrices in python at specific locations [closed]

I have the following matrix (or 2D list):
matrix = [['0','0'],
['2','3'],
['1','9'],
['7','11'],
['1','2'],
['7','23'],
['0','0'],
['6','8'],
['3','1'],
['8','1'],
['4','3'],
['0','0'],
['63','9'],
['31','10'],
['82','11'],
['41','31']]
I would like to split it into multiple matrices based on the value in the row. The zeros will determine the location of the split:
matrix1 = [['0','0'],
['2','3'],
['1','9'],
['7','11'],
['1','2'],
['7','23']]
matrix2 = [['0','0'],
['6','8'],
['3','1'],
['8','1'],
['4','3']]
matrix3 = [['0','0'],
['63','9'],
['31','10'],
['82','11'],
['41','31']]
Then I need to write them to a CSV file (adjacent to each other) like this:

import csv
from itertools import zip_longest
matrix = [['0','0'],
['2','3'],
['1','9'],
['7','11'],
['1','2'],
['7','23'],
['0','0'],
['6','8'],
['3','1'],
['8','1'],
['4','3'],
['0','0'],
['63','9'],
['31','10'],
['82','11'],
['41','31']]
zero_pos = [i for i,element in enumerate(matrix) if element == ['0', '0']]
num_mats = len(zero_pos)
matrices = [matrix[zero_pos[i]:zero_pos[i+1]] if i+1<num_mats else matrix[zero_pos[i]:] for i in range(num_mats)]
with open('temp_output.csv', 'w', newline = '') as csv_file:
    writer = csv.writer(csv_file, delimiter=',',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for row in zip_longest(*matrices):
        writer.writerow([element for matrix in row if matrix is not None for element in matrix])
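The same idea can be sketched as two small helpers. The names `split_on_marker` and `side_by_side` are hypothetical. Unlike the code above, this sketch keeps any rows that appear before the first marker (as their own group) and pads shorter groups with empty cells, so that the columns of later groups stay aligned.

```python
from itertools import zip_longest


def split_on_marker(rows, marker):
    """Start a new group every time a row equal to marker is seen."""
    groups = []
    for row in rows:
        if row == marker or not groups:
            groups.append([])
        groups[-1].append(row)
    return groups


def side_by_side(groups, width=2):
    """Lay the groups out next to each other, padding short ones with blanks."""
    blank = [''] * width
    return [[cell for row in rows for cell in row]
            for rows in zip_longest(*groups, fillvalue=blank)]


sample = [['0', '0'], ['2', '3'], ['0', '0'], ['6', '8'], ['3', '1']]
groups = split_on_marker(sample, ['0', '0'])
table = side_by_side(groups)
```

Each entry of `table` is one output line and can be passed to `csv.writer(...).writerow`.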

How to plot curve from polar coordinates data stored in a .txt file in python? [closed]

The program outputs polar coordinates to a text file the following way:
f = open(r'C:\Users\generic_user\Desktop\generic_txt_file.txt','w')
while n <= N-1:
    u_nn = u_n + k*v_n
    v_nn = v_n - k*u_n + k*mu*h**-2
    u_n = u_nn
    v_n = v_nn
    r_n = u_n**-1
    string_tita_n = str(n*k)
    string_r_n = str(r_n)
    f.write(string_tita_n + ' ' + string_r_n + '\n')
    n = n + 1
f.close()
The data looks like this:
0.0 59953022.59004443
0.06283185307179587 59989603.88149546
0.12566370614359174 60062900.551973954
0.1884955592153876 60173036.61693284
0.25132741228718347 60320124.93744161
0.3141592653589793 60504265.25498237
0.3769911184307752 60725541.31412035
I'm relatively new to programming in general and might be looking in the wrong places, but I'm struggling with matplotlib to make this work and would appreciate getting pointed in the right direction.

This is the simplest way you can get it done:
import matplotlib.pyplot as plt
file = open(r'C:\Users\generic_user\Desktop\generic_txt_file.txt','r')
lines = file.readlines()
theta = []
rad = []
for line in lines:
    t, r = line.replace("\n","").split(" ")
    t = float(t)
    r = float(r)
    theta.append(t)
    rad.append(r)
plt.polar(theta, rad)
plt.show()
You can have a look at the Pandas module and its method read_csv, which handles data reading.
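If you would rather stay with the standard library, the parsing step can be pulled into a small function that also tolerates blank or malformed lines. This is only a sketch; the name `parse_polar` is made up for illustration, and the sample lines are copied from the data shown in the question.

```python
def parse_polar(lines):
    """Turn 'theta r' text lines into two lists of floats."""
    theta, radius = [], []
    for line in lines:
        parts = line.split()       # splits on any whitespace, drops the newline
        if len(parts) != 2:
            continue               # skip blank or malformed lines
        theta.append(float(parts[0]))
        radius.append(float(parts[1]))
    return theta, radius


sample = [
    "0.0 59953022.59004443\n",
    "0.06283185307179587 59989603.88149546\n",
    "\n",
]
theta, radius = parse_polar(sample)
```

With a real file, `with open(path) as f: theta, radius = parse_polar(f)` closes the file automatically, and the two lists can then go to `plt.polar(theta, radius)` as before.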

Can openmdao compute partial derivatives across a Matlab ExternalCodeComp without explicitly defining them? [closed]

Is it possible to have OpenMDAO approximate the partial derivatives across an ExternalCodeComp using finite differences?
Just calling self.declare_partials('*','*', method='fd') does not seem to work.
The optimization converges after 1 iteration with only 1 function evaluation and 1 gradient evaluation.
The messages that pop up:
DerivativesWarning:Constraints or objectives [('p.f_xy', inds=[0])] cannot be impacted by the design variables of the problem.
DerivativesWarning:Design variables [('p.x', inds=[0]), ('p.y', inds=[0])] have no impact on the constraints or objective.
Optimization terminated successfully .

We run a test case very similar to this as part of the OpenMDAO test suite. Your declare_partials call is not quite correct because you list the first two args as '' and '' which would not match any variable names. I suspect that's just a typo in your post though because if you actually ran OpenMDAO while using those args you would get an exception telling you that the declared partials didn't match any variables. In the example shown below I declare the partials as self.declare_partials(of='*', wrt='*', method='fd'). Assuming that your partials are actually declared correctly, my guess is that for some reason the output file that your external code is generating is either not getting updated at all or you're always writing the same values to the output file. Below is a working example of an external code that computes a paraboloid. Hopefully that will help you track down the issue. If not, you can try posting your code here and we can go from there.
Here's the OpenMDAO script:
import sys
import openmdao.api as om
class ParaboloidExternalCodeCompFD(om.ExternalCodeComp):
    def setup(self):
        self.add_input('x', val=0.0)
        self.add_input('y', val=0.0)
        self.add_output('f_xy', val=0.0)
        self.input_file = 'paraboloid_input.dat'
        self.output_file = 'paraboloid_output.dat'
        # providing these is optional; the component will verify that any input
        # files exist before execution and that the output files exist after.
        self.options['external_input_files'] = [self.input_file]
        self.options['external_output_files'] = [self.output_file]
        self.options['command'] = [
            sys.executable, 'extcode_paraboloid.py', self.input_file, self.output_file
        ]
    def setup_partials(self):
        # this external code does not provide derivatives, use finite difference
        self.declare_partials(of='*', wrt='*', method='fd')
    def compute(self, inputs, outputs):
        x = inputs['x']
        y = inputs['y']
        # generate the input file for the paraboloid external code
        with open(self.input_file, 'w') as input_file:
            input_file.write('%.16f\n%.16f\n' % (x, y))
        # the parent compute function actually runs the external code
        super().compute(inputs, outputs)
        # parse the output file from the external code and set the value of f_xy
        with open(self.output_file, 'r') as output_file:
            f_xy = float(output_file.read())
        outputs['f_xy'] = f_xy
prob = om.Problem()
model = prob.model
model.add_subsystem('p', ParaboloidExternalCodeCompFD())
# find optimal solution with SciPy optimize
# solution (minimum): x = 6.6667; y = -7.3333
prob.driver = om.ScipyOptimizeDriver()
prob.driver.options['optimizer'] = 'SLSQP'
prob.model.add_design_var('p.x', lower=-50, upper=50)
prob.model.add_design_var('p.y', lower=-50, upper=50)
prob.model.add_objective('p.f_xy')
prob.driver.options['tol'] = 1e-9
prob.driver.options['disp'] = True
prob.setup()
# Set input values
prob.set_val('p.x', 3.0)
prob.set_val('p.y', -4.0)
prob.run_driver()
print('p.x =', prob.get_val('p.x'), " expected:", [6.66666667])
print('p.y =', prob.get_val('p.y'), " expected:", [-7.3333333])
And here's the external code script, which is named extcode_paraboloid.py:
#!/usr/bin/env python
#
# usage: extcode_paraboloid.py input_filename output_filename
#
# Evaluates the equation f(x,y) = (x-3)^2 + xy + (y+4)^2 - 3.
#
# Read the values of `x` and `y` from input file
# and write the value of `f_xy` to output file.
if __name__ == '__main__':
    import sys
    input_filename = sys.argv[1]
    output_filename = sys.argv[2]
    with open(input_filename, 'r') as input_file:
        file_contents = input_file.readlines()
    x, y = [float(f) for f in file_contents]
    f_xy = (x-3.0)**2 + x*y + (y+4.0)**2 - 3.0
    with open(output_filename, 'w') as output_file:
        output_file.write('%.16f\n' % f_xy)
If you place them both in the same directory and run the OpenMDAO script, you should get something like:
Optimization terminated successfully. (Exit mode 0)
Current function value: -27.333333333333
Iterations: 5
Function evaluations: 6
Gradient evaluations: 5
Optimization Complete
-----------------------------------
p.x = [6.66666633] expected: [6.66666667]
p.y = [-7.33333367] expected: [-7.3333333]
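To see what a finite-difference approximation amounts to, here is a plain-Python sketch applied to the same paraboloid. It only illustrates the idea and is not OpenMDAO's implementation; the helper name `forward_difference` and the step size are assumptions chosen for the example.

```python
def paraboloid(x, y):
    """Same function as the external code: (x-3)^2 + xy + (y+4)^2 - 3."""
    return (x - 3.0) ** 2 + x * y + (y + 4.0) ** 2 - 3.0


def forward_difference(func, x, y, step=1e-6):
    """Approximate (df/dx, df/dy) by perturbing one input at a time."""
    base = func(x, y)
    df_dx = (func(x + step, y) - base) / step
    df_dy = (func(x, y + step) - base) / step
    return df_dx, df_dy


# Gradient at the starting point used in the script above
grad_start = forward_difference(paraboloid, 3.0, -4.0)
# Gradient at the expected optimum, which should be close to zero
grad_optimum = forward_difference(paraboloid, 20.0 / 3.0, -22.0 / 3.0)
```

If the external code returned the same value for every perturbed input, for example because its output file is never rewritten, both differences would be exactly zero. That is consistent with warnings saying the design variables have no impact on the objective.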

How do I read a text file of numbers into an array of arrays

In python, using the OpenCV library, I need to create some polylines. The example code for the polylines method shows:
cv2.polylines(img,[pts],True,(0,255,255))
I have all the 'pts' laid out in a text file in the format:
x1,y1,x2,y2,x3,y3,x4,y4
x1,y1,x2,y2,x3,y3,x4,y4
x1,y1,x2,y2,x3,y3,x4,y4
How can I read this file and provide the data to the [pts] variable in the method call?
I've tried the np.array(csv.reader(...)) method as well as a few others I've found examples of. I can successfully read the file, but it's not in the format the polylines method wants. (I am a newbie when it comes to python, if this was C++ or Java, it wouldn't be a problem).

I would try to use numpy to read the csv as an array.
from numpy import genfromtxt
p = genfromtxt('myfile.csv', delimiter=',')
cv2.polylines(img,p,True,(0,255,255))
You may have to pass a dtype argument to genfromtxt if you need to coerce the data to a specific format.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html

In case you know it is a fixed number of items in each row:
import csv
with open('myfile.csv') as csvfile:
    rows = csv.reader(csvfile)
    res = list(zip(*rows))
print(res)

I know it's not pretty and there is probably a MUCH BETTER way to do this, but it works. That being said, if someone could show me a better way, it would be much appreciated.
pointlist = []
f = open(args["slots"])
data = f.read().split()
for row in data:
    tmp = []
    col = row.split(";")
    for points in col:
        xy = points.split(",")
        tmp += [[int(pt) for pt in xy]]
    pointlist += [tmp]
slots = np.asarray(pointlist)

You might need to draw each polyline individually (to expand on @Chris's answer):
import cv2
import numpy as np
lines = np.genfromtxt('myfile.csv', delimiter=',')
for line in lines:
    # polylines expects a list of integer point arrays
    pts = line.reshape((-1, 2)).astype(np.int32)
    cv2.polylines(img, [pts], True, (0,255,255))
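For reference, the file format from the question (one polygon per line, written as x1,y1,x2,y2,...) can also be parsed with the standard library alone. The function name `read_polygons` and the sample coordinates below are made up for illustration.

```python
import csv
import io


def read_polygons(csvfile):
    """Return one list of [x, y] pairs per non-empty line of the file."""
    polygons = []
    for row in csv.reader(csvfile):
        if not row:
            continue  # skip blank lines
        values = [int(value) for value in row]
        # Pair up consecutive values: [x1, y1, x2, y2] -> [[x1, y1], [x2, y2]]
        polygons.append([values[i:i + 2] for i in range(0, len(values), 2)])
    return polygons


# Hypothetical file contents standing in for the real text file
sample = io.StringIO("1,2,3,4,5,6,7,8\n10,20,30,40,50,60,70,80\n")
polygons = read_polygons(sample)
```

Each entry of `polygons` can then be converted with `np.array(polygon, dtype=np.int32)` and passed to `cv2.polylines` inside a list, assuming the coordinates are integer pixel positions.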

Calculating and plotting a grow rate in years from a dictionary

I am trying to plot a graph from a CSV file with the following Python code;
import csv
import matplotlib.pyplot as plt
def population_dict(filename):
    """
    Reads the population from a CSV file, containing
    years in column 2 and population / 1000 in column 3.
    @param filename: the filename to read the data from
    @return dictionary containing year -> population
    """
    dictionary = {}
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        f.next()
        for row in reader:
            dictionary[row[2]] = row[3]
        return dictionary
dict_for_plot = population_dict('population.csv')
def plot_dict(dict_for_plot):
    x_list = []
    y_list = []
    for data in dict_for_plot:
        x = data
        y = dict_for_plot[data]
        x_list.append(x)
        y_list.append(y)
    plt.plot(x_list, y_list, 'ro')
    plt.ylabel('population')
    plt.xlabel('year')
    plt.show()
plot_dict(dict_for_plot)
def grow_rate(data_dict):
    # fill lists
    growth_rates = []
    x_list = []
    y_list = []
    for data in data_dict:
        x = data
        y = data_dict[data]
        x_list.append(x)
        y_list.append(y)
    # calc grow_rate
    for i in range(0, len(y_list)-1):
        var = float(y_list[i+1]) - float(y_list[i])
        var = var/y_list[i]
        print var
        growth_rates.append(var)
    # growth_rate_dict = dict(zip(years, growth_rates))
grow_rate(dict_for_plot)
However, I'm getting a rather weird error on executing this code
Traceback (most recent call last):
File "/home/jharvard/Desktop/pyplot.py", line 71, in <module>
grow_rate(dict_for_plot)
File "/home/jharvard/Desktop/pyplot.py", line 64, in grow_rate
var = var/y_list[i]
TypeError: unsupported operand type(s) for /: 'float' and 'str'
I've been trying different methods to cast the y_list variable. For example; casting an int.
How can I solve this problem so I can get the percentage of the grow rate through the years to plot this.

Since CSV files are text files, the values you read from them are strings, and you need to convert them to numbers. It's easy to fix the TypeError. Just use
var/float(y_list[i])
Even though that gets rid of the error, there is a subtler bug that is harder to spot and may give incorrect results under some circumstances. The main reason is that dictionaries are not guaranteed to be ordered (before Python 3.7 they do not preserve insertion order), i.e. the x and y values are not ordered in any way. The indentation of your program appears to be a bit off on my computer, so I am unable to follow it exactly. But the gist of it appears to be that you are obtaining values from a file (x and y values) and then computing the sequence
var[i] = (y[i+1] - y[i]) / y[i]
Unfortunately, your y_list[i] may not be in the same sequence as in the CSV file, because it is being populated from a dictionary.
In the section where you did:
for row in reader:
    dictionary[row[2]] = row[3]
it is just better to preserve the order by doing
x, y = zip(*[ ( float(row[2]), float(row[3]) ) for row in reader])
x, y = map(numpy.array, [x, y])
return x, y
or something like this ...
Then, Numpy arrays have methods for handling your problem much more efficiently. You can then simply do:
growth_rates = numpy.diff(y) / y[:-1]
Hope this helps. Let me know if you have any questions.
Finally, if you do go the Numpy route, I would highly recommend its own csv reader. Check it out here: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html
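If you want to keep the dictionary, a pure-Python sketch of the same calculation is shown below. It sorts the years explicitly, so the result does not depend on dictionary ordering. The function name `growth_rates` and the sample figures are made up for illustration.

```python
def growth_rates(population_by_year):
    """Map each year (except the first) to its relative growth over the previous year."""
    years = sorted(population_by_year, key=int)   # keys are strings read from the CSV
    rates = {}
    for previous, current in zip(years, years[1:]):
        before = float(population_by_year[previous])
        after = float(population_by_year[current])
        rates[current] = (after - before) / before
    return rates


# Hypothetical data, deliberately out of order
sample = {'2001': '110', '2000': '100', '2002': '121'}
rates = growth_rates(sample)
```

Multiply each rate by 100 for a percentage. The sorted years and their rates can then be plotted with `plt.plot` in the same way as the population itself.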
