How to run a Python project multiple times with different configurations? - python

I have a Python project that takes a .csv file and some parameters as input and returns some results.
Every time I want to try a newly coded feature and get the results, I have to run the program after changing the .csv name and the other parameters. Changing these arguments by hand takes a long time because I have a lot of different input files.
Is there a way to write a Python program that does this for me?
For example a program that does:
- run "project.py" n times
- first time with "aaa.csv" file as input and parm1=7, then put results in "a_res.csv"
- second time with "bbb.csv" file as input and parm1=4, then put results in "b_res.csv"
- third time with "ccc.csv" file as input and parm1=2, then put results in "c_res.csv"
- fourth time with "ddd.csv" file as input and parm1=6, then put results in "d_res.csv"
- ...
Thanks!

Yes: make a list of the configurations you want and execute your function in a loop that iterates over them.
configurations = [
    ["aaa.csv", 7, "a_res.csv"],
    ["bbb.csv", 4, "b_res.csv"],
    ["ccc.csv", 2, "c_res.csv"],
    ["ddd.csv", 6, "d_res.csv"],
]
for c in configurations:
    # assuming your Python function accepts 3 parameters:
    # input_file, param1, output_file
    your_python_function(c[0], c[1], c[2])
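If "project.py" can only be run as a standalone script rather than imported as a function, the same loop can launch it as a subprocess. This is only a sketch: it assumes project.py accepts three positional command-line arguments (input file, param1, output file), which may not match your actual script.

```python
import os
import subprocess

def build_command(input_file, param1, output_file):
    # Assumes project.py is invoked as:
    #   python project.py <input.csv> <param1> <output.csv>
    return ["python", "project.py", input_file, str(param1), output_file]

configurations = [
    ("aaa.csv", 7, "a_res.csv"),
    ("bbb.csv", 4, "b_res.csv"),
    ("ccc.csv", 2, "c_res.csv"),
    ("ddd.csv", 6, "d_res.csv"),
]

for cfg in configurations:
    cmd = build_command(*cfg)
    if os.path.exists("project.py"):  # only launch when the script is present
        subprocess.run(cmd, check=True)
```

With check=True, the driver stops immediately if any run fails, so a broken configuration doesn't silently corrupt the remaining result files.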

Related

How do I keep a variable open and not refill it when running a Python script?

I want to ask how I can keep a variable loaded and not refill it every time I execute the script. As an example, I read a file and assign all of its lines to a variable, then define some functions that work on that data. After running the file I realized I needed to change something in one of those functions, so I changed a few lines and ran the script again. The file is large and I have to wait for it to load each time, so I wondered how I could keep the variable that holds this data in memory and make changes to my script without waiting so long for it to reload.
import numpy as np
from tqdm import tqdm
from scipy import spatial

# This is the variable that I want to keep loaded at all times
embeddings_dict = {}

# This is the current file
filename = "/some_filename"
with open(filename, 'r', encoding="utf-8") as f:
    lines = f.readlines()
    for i in tqdm(range(len(lines))):
        values = lines[i].split()
        word = values[0]
        vector = np.asarray(values[1:], "float32")
        embeddings_dict[word] = vector

# This is the process
def find_closest_embeddings_euc(embedding):
    return sorted(embeddings_dict.keys(),
                  key=lambda word: spatial.distance.euclidean(embeddings_dict[word], embedding))

print(find_closest_embeddings_euc(embeddings_dict['software'])[:10])
How can I achieve this?
You can't really persist memory in RAM once a process finishes. What you're describing is a classic workflow in the ML community (having to load in some huge dataset in memory and then apply and tweak a series of transformations to it) and a notebook environment is usually the answer.
You can check out how to set up your environment at either of these links:
https://docs.jupyter.org/en/latest/install/notebook-classic.html
https://code.visualstudio.com/docs/datascience/jupyter-notebooks (I recommend this one if you are already using VS Code)
Once you create your first notebook, you can add two cells to it - one for the data loading and another for the transformations. Now you can execute them independently - you can load your data once and apply the transformations and experiment with them as many times as you'd like.
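If a notebook isn't an option, another common workaround is to cache the parsed data in a binary format so that only the first run pays the parsing cost. This is a sketch under assumptions: the file names are illustrative, and for simplicity the vectors are stored as plain float lists rather than NumPy arrays.

```python
import os
import pickle

CACHE = "embeddings.pkl"   # illustrative name for the cache file
SOURCE = "embeddings.txt"  # the original text file of word vectors

def load_embeddings():
    # Load the pre-parsed cache if it exists; otherwise parse the
    # text file once and write the cache for the next run.
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as f:
            return pickle.load(f)
    embeddings = {}
    if os.path.exists(SOURCE):
        with open(SOURCE, encoding="utf-8") as f:
            for line in f:
                values = line.split()
                # the real code would use np.asarray(values[1:], "float32")
                embeddings[values[0]] = [float(x) for x in values[1:]]
        with open(CACHE, "wb") as f:
            pickle.dump(embeddings, f)
    return embeddings
```

Unpickling a dict is much faster than re-splitting and re-parsing millions of text lines, so later runs start almost immediately; delete the cache file whenever the source data changes.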

Python script that changes itself on each execution

How can I write a script that changes itself each time it is executed?
For example, suppose a script consists of two lines:
list = [1, 2, 3, 4, 5]
sliced_list = list[0:1]
After executing it once, the 2nd line should become:
sliced_list = list[1:2]
and then,
sliced_list = list[2:3]
I want the variable "sliced_list" to change every time I run this file.
Generally this is not something you should ever want to do, since it is likely to result in non-deterministic behavior, and in the event of a bug it's possible to completely overwrite your script and lose data.
If you want to change the data your script operates on, you should store it persistently in some fashion. This could be in a separate file somewhere or in an environment variable.
But to do what you're asking, you would need to open the script, read its contents, and modify them as desired, like this:
with open("/path/to/script.py", 'r+') as script:
    contents = script.read()
    # ... some string logic here ...
    # Point the cursor back to the beginning of the file.
    script.seek(0)
    script.write(contents)
    # If the original contents were longer than the new contents,
    # you'd have unwanted data at the end of the file, so cut the
    # file off at the current position.
    script.truncate()
You could save the start index into a file when you run the script, then increment it and save it again. Something like what is shown below.
import os

List = [1, 2, 3, 4, 5]

# Read the saved index (defaulting to 0 on the first run),
# print the slice, then save the incremented index for next time.
if os.path.exists('file.txt'):
    with open('file.txt') as file:
        start_index = int(file.read())
else:
    start_index = 0

print(List[start_index:start_index + 1])

with open('file.txt', 'w') as file:
    file.write(str(start_index + 1))

Saving arrays within a loop under different names

I have a question related to saving an array (.npy).
I have a computation that produces a new array on every loop iteration.
Once in a while (let's say every 10 steps) I would like to save my output to .npy with a specific name, say result_at_step_10.npy. At step 20, the file name should of course be result_at_step_20.npy.
How could I do that in Python?
If renaming the file is what you are looking for, here is the code:
import os
os.rename("old_name","new_name")

How to produce multiple output files with a single input file using the command 'np.random.normal'?

I have a file with 44,586 lines of data. It is read in using pylab:
data = pl.loadtxt("20100101.txt")
density = data[:,0]
I need to run something like...
densities = np.random.normal(density, 30, 1)
np.savetxt('1.txt', np.vstack((densities.ravel())).T)
...and create a new file named 1.txt which has all 44,586 lines of my data randomised within the parameters I desire. Will my above commands be sufficient to read through and perform what I want on every line of data?
The more complicated part on top of this - is that I want to run this 1,000 times and produce 1,000 .txt files (1.txt, 2.txt ... 1000.txt) which each run the exact same command.
I become stuck when trying to run loops in scripts, as I am still very inexperienced. I am having trouble even beginning to get this running how I desire - also I am confused how to handle saving the files with their different names. I have used np.savetxt in the past, but don't know how to make it perform this task.
Thanks for any help!
There are two minor issues. The first is how to build the file names, which can be solved with Python's string concatenation. The second relates to np.random.normal: when loc is an array, the size parameter must match its shape, so it is simplest to omit size entirely and let NumPy draw one sample per element.
import numpy as np
import pylab as pl

data = pl.loadtxt("20100101.txt")
density = data[:, 0]

for i in range(1, 1001):
    densities = np.random.normal(loc=density, scale=30)
    np.savetxt(str(i) + '.txt', densities)
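The broadcasting point can be seen in a standalone sketch with made-up data: with an array loc and no size argument, the output simply takes the shape of loc, while size=1 conflicts with it.

```python
import numpy as np

density = np.array([10.0, 20.0, 30.0])  # stand-in for the real data column

# With an array `loc` and no `size`, NumPy draws one sample per element,
# so the output has the same shape as `loc`.
densities = np.random.normal(loc=density, scale=30)
print(densities.shape)

# Passing size=1 together with a length-3 `loc` raises a ValueError,
# because a single draw cannot broadcast against three means.
try:
    np.random.normal(density, 30, 1)
except ValueError:
    print("size=1 is incompatible with an array loc")
```

So for the question above, dropping the third argument is exactly what makes the per-line randomisation work across all 44,586 values at once.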

What is the best way to write a batch script?

I have around 20 scripts; each produces one output file, which is fed back as input to the next script. I now want to give the user the option to restart the batch from any point in the chain.
My friend suggested using make or ant, with a target defined for each Python script. I would like to know your (advanced hackers') suggestions.
Thank you
Make works like this:

target: dependencies
	commands

A target is only rebuilt when it is missing or older than its dependencies, so name each target after the file its script produces. Based on your scripts, you might try this type of Makefile:

output20: output19
	script20    # reads output19 and produces the final output, output20
output19: output18
	script19    # reads output18 and produces output19
.. etc ..
output2: output1
	script2     # reads output1 and produces output2
output1:
	script1     # produces output1

That way, each script won't be run until the output from the previous step has been produced. Running make output20 will travel down the entire chain, starting at script1 if none of the outputs exist; if output15 already exists, it will resume by running script16.
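If you would rather stay in Python, the same skip-a-step-whose-output-already-exists logic is easy to sketch as a small driver. Script and file names here are illustrative placeholders, not the actual 20 scripts:

```python
import os
import subprocess

# Each step: (command to run, output file it produces).
# Script and file names are illustrative placeholders.
steps = [
    (["python", "script1.py"], "output1"),
    (["python", "script2.py"], "output2"),
    (["python", "script3.py"], "output3"),
]

def steps_to_run(steps):
    # A step still needs to run if its output file does not exist yet.
    return [cmd for cmd, output in steps if not os.path.exists(output)]

def run_pipeline(steps):
    for cmd in steps_to_run(steps):
        subprocess.run(cmd, check=True)
```

To restart from step N, the user just leaves outputs 1..N-1 in place and deletes the rest; unlike make, though, this sketch does not compare timestamps, so a stale intermediate file will be silently reused.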
