Python: glob, loops and local variables

My Python script loops over many files in a directory and performs some operations on each file, storing the results for each file in a variable created specifically for that file, using the exec() function:
import glob

# consider all files in the current directory with the .pdb extension
pdb_list = glob.glob('*.pdb')
# make a list of the file names
list = []
# loop over the list and do some operation on each file
for pdb in pdb_list:
    # take the file name without its extension
    pdb_name = pdb.rsplit(".", 1)[0]
    # save the file name
    list.append(pdb_name)
    # set a variable u_{pdb_name}, associated with some function that does something with the corresponding file
    exec(f'u_{pdb_name} = Universe(pdb)')
    exec(f'print("This is %s computed from %s" % (u_{pdb_name}, pdb_name))')
    # plot a graph using matplotlib
    # exec(f'plt.savefig("rmsd_traj_{pdb_name}.png")')
Basically, in my file-looping scripts I tend to use exec(f'...') whenever I need a new variable whose name is built from part of an existing value (like the name of the current file, u_{pdb_name}).
Is it possible to do something similar with variable names without constantly using exec()?

You could try something like this:
lst = []
universes = {}
# loop over the list and do some operation on each file
for pdb in pdb_list:
    # take the file name without its extension
    pdb_name = pdb.rsplit(".", 1)[0]
    # save the file name
    lst.append(pdb_name)
    key = f'u_{pdb_name}'
    universes[key] = Universe(pdb)
    print(f"This is {key} computed from {pdb_name}")
To access some value, just do:
universes[key] # where key is the variable name
If you want to iterate over all keys and values, do:
for key, universe in universes.items():
    print(key)
    print(universe.some_function())
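A minimal sketch of the same dictionary idea using pathlib, assuming all the .pdb files sit in one directory. `load_structure` is a hypothetical stand-in for `Universe` (here it only reads the file's text) so the example runs on its own; swap in your real constructor. `Path.stem` drops the extension, like `rsplit('.', 1)[0]`, and the keys are the bare file names rather than `u_`-prefixed ones.

```python
from pathlib import Path


def load_structure(path):
    # Hypothetical stand-in for Universe(pdb): just returns the file's text
    return path.read_text()


def load_all_pdbs(directory="."):
    """Map each .pdb file's stem to the object built from it (no exec needed)."""
    universes = {}
    for pdb in sorted(Path(directory).glob("*.pdb")):
        # pdb.stem is the file name without its extension, e.g. "protein" for protein.pdb
        universes[pdb.stem] = load_structure(pdb)
    return universes
```

Usage: `universes = load_all_pdbs()`, then `universes["protein"]` gives the object built from protein.pdb, and `universes.items()` iterates over all of them.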

Related

Assigning variable names from a dictionary in Python

I'm relatively new to working in Python and can't quite figure this little problem out.
I have a function that takes a .txt file input and reads each line, and based on what is on that line, it will reference a dictionary to assign a variable name. That variable is then assigned a value (also from the .txt file). I've managed to set it up to successfully do this part, but I cannot get it to return those variables as a function output.
Here is a simplified example of what I have:
The .txt file looks something like this:
File Title: 'Test_Template.env' # filename
Number of Objects: 1 # Ns
Object Size: 20 # sd
And the function is something like:
def read_env_inputs(envFilename):
    env_dict = {'File Title': 'filename',
                'Number of Objects': 'Ns',
                'Object Size': 'sd'}
    with open(envFilename) as f:
        lines = f.readlines()
        for line in lines:
            line = line.split(':')
            if line[0] in env_dict.keys():
                if line[0] == 'File Title':
                    vars()[env_dict[line[0]]] = line[1].split('#')[0].strip()
                else:
                    if len(line[1].split('#')[0].split(',')) == 1:
                        vars()[env_dict[line[0]]] = float(line[1].split('#')[0].strip())
                    else:
                        vars()[env_dict[line[0]]] = list(map(float, line[1].split('#')[0].split(',')))
    return filename, Ns, sd
If I run this as a script (not a function), I end up having the properly named variables in my workspace and can manipulate them. However, this does not successfully define them in a way that allows them to be an output of the function.
I'm trying to avoid creating an if/elif statement for each variable. I'd like it to be able to reference the dictionary based on the key (which is working) and use the value associated with that key as the variable name.
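The main problem is that inside a function, vars() returns the dictionary of local variables (the same as locals()), and writing to that dictionary does not create real local variables. So filename, Ns and sd are never defined inside the function, and there is nothing to return. At module level vars() is the module's globals dictionary, which is writable; that is why the script version works. vars() is rarely needed and isn't the right tool here; return a dictionary instead.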
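A minimal demonstration of that point (the function name `set_via_vars` is made up for illustration): the assignment through vars() runs without error, but the name is still undefined when the function tries to use it.

```python
def set_via_vars():
    # Writing into vars() inside a function only touches a dictionary of locals;
    # it does not create a real local variable named "answer".
    vars()["answer"] = 42
    try:
        # "answer" is never assigned in this function, so Python looks it up as a global
        return answer
    except NameError:
        return "NameError"


print(set_via_vars())  # prints: NameError
```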
Assuming that the txt file doesn't contain repeating lines you can do something like this:
def read_env_inputs(envFilename):
    env_dict = {"File Title": "filename", "Number of Objects": "Ns", "Object Size": "sd"}
    # Result dictionary
    res = {}
    with open(envFilename) as f:
        lines = f.readlines()
    # We already read the file and don't need to stay inside the with open block
    # Going back one level in indentation closes the file
    for line in lines:
        line = line.split(":")
        if line[0] in env_dict:  # No need for .keys()
            res_name = env_dict[line[0]]  # Name it will have in the result dictionary
            if line[0] == "File Title":
                # No need for vars()
                res[res_name] = line[1].split("#")[0].strip()
            else:
                if len(line[1].split("#")[0].split(",")) == 1:
                    # No need for vars()
                    res[res_name] = float(line[1].split("#")[0].strip())
                else:
                    # No need for vars()
                    res[res_name] = list(map(float, line[1].split("#")[0].split(",")))
    return res
You can call the function similar to this:
env = read_env_inputs(".env")
print(env["filename"])
If you really want to, you can assign the result to individual variables like this (it shouldn't be necessary):
filename = env["filename"]
Ns = env["Ns"]
sd = env["sd"]
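Or, if you want to use vars() (not best practice, and it only works at module level, where vars() is the globals dictionary; inside a function it has the same problem as before):
for name, value in env.items():
    vars()[name] = value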
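If the appeal of separate variables is writing `filename` instead of `env["filename"]`, one alternative to vars() (a different technique from the answer above, using the standard library's `types.SimpleNamespace`) is to turn the dictionary into an object with attributes. The `env` values below are illustrative sample data shaped like the function's output.

```python
from types import SimpleNamespace

# Sample result shaped like read_env_inputs() output (values are illustrative)
env = {"filename": "'Test_Template.env'", "Ns": 1.0, "sd": 20.0}

# Each dictionary key becomes an attribute: cfg.filename, cfg.Ns, cfg.sd
cfg = SimpleNamespace(**env)
print(cfg.Ns, cfg.sd)  # prints: 1.0 20.0
```

This works inside functions too, because nothing depends on the local-variable table.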
By the way, this code still contains some duplication: everywhere you have line[1].split("#")[0], you can replace it with a variable (similar to what is done with res_name).
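A sketch of that clean-up, assuming the same file format as in the question. It computes the comment-free value once as `value`, and it splits each line on the first colon only (`str.partition`), a small change from the answer so that a value which itself contains a colon stays intact. Lines whose label is not in the mapping are skipped.

```python
def read_env_inputs(env_filename):
    """Parse 'Label: value  # comment' lines into a dict keyed by short names."""
    env_dict = {"File Title": "filename", "Number of Objects": "Ns", "Object Size": "sd"}
    res = {}
    with open(env_filename) as f:
        lines = f.readlines()
    for line in lines:
        label, sep, rest = line.partition(":")
        if not sep or label not in env_dict:
            continue  # not a "Label: value" line we know about
        value = rest.split("#")[0].strip()  # drop the trailing comment once
        name = env_dict[label]
        if label == "File Title":
            res[name] = value
        elif "," in value:
            # several comma-separated numbers become a list of floats
            res[name] = [float(v) for v in value.split(",")]
        else:
            res[name] = float(value)
    return res
```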

I have two JSON files. One is updated every cycle as its values change, and I want to save the values from every cycle in my second JSON file

The X values are taken from a Y.json file, and the values in Y can change from cycle to cycle. I want the X file to keep all values without overwriting the previously saved ones.
import json

# Initialize a new dictionary/JSON for X
X = dict()
with open('X.json', 'w') as f:
    f.write(json.dumps(X))
If these cycles don't carry any value or meaning you'd like to include in the file name, you could encode the time in the file names instead. That way, each file name includes the time the file was saved.
import json
from datetime import datetime

now = datetime.now()
time = now.strftime("%y%m%d%M%S")  # choose any format you like (note: this one leaves out the hour, %H)
filename = time + '_X.json'
X = dict()
with open(filename, 'w') as f:
    f.write(json.dumps(X))
For example, creating a file every 4 seconds gives the following file names:
2106265848_X.json
2106265852_X.json
2106265856_X.json
2106265900_X.json
2106265904_X.json
2106265908_X.json
2106265912_X.json
2106265916_X.json
2106265920_X.json
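A small helper along those lines, sketched under the assumption that one JSON file per save is what you want. It includes the hour (`%H`) so names stay unique across hours, and takes an optional cycle number in case the cycle matters. `snapshot_filename` and `save_snapshot` are hypothetical names.

```python
import json
from datetime import datetime
from pathlib import Path


def snapshot_filename(cycle=None, now=None):
    """Build e.g. '210626105848_X.json', or '210626105848_X_c3.json' when a cycle is given."""
    if now is None:
        now = datetime.now()
    stamp = now.strftime("%y%m%d%H%M%S")  # year, month, day, hour, minute, second
    suffix = f"_X_c{cycle}.json" if cycle is not None else "_X.json"
    return stamp + suffix


def save_snapshot(data, out_dir=".", cycle=None):
    """Write `data` to a new timestamped JSON file and return its path."""
    path = Path(out_dir) / snapshot_filename(cycle)
    with open(path, "w") as f:
        json.dump(data, f)
    return path
```

Two saves within the same second (and without distinct cycle numbers) would still produce the same name, so the cycle suffix is the safer choice when cycles are fast.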
However, if the cycles (or whatever experiment you are running) do matter, I would strongly recommend including the cycle in the file name, e.g.:
filename = f"{time}_X_c{cycle}.json"
to end up with results like these:
2106260547_X_c0.json
2106260551_X_c1.json
2106260555_X_c2.json
2106260559_X_c3.json
2106260603_X_c4.json
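If you'd rather keep every cycle in the single X.json file from the question instead of one file per cycle, a different approach from the answer above is to store a list in X.json and append each cycle's copy of Y to it. A minimal sketch, assuming Y.json holds one JSON value per cycle and that only this script writes X.json; `append_cycle` is a hypothetical name.

```python
import json
from pathlib import Path


def append_cycle(y_path="Y.json", x_path="X.json"):
    """Append the current contents of Y.json to the history list stored in X.json."""
    x_file = Path(x_path)
    # Load the history saved so far, or start an empty one on the first cycle
    history = json.loads(x_file.read_text()) if x_file.exists() else []
    current = json.loads(Path(y_path).read_text())
    history.append({"cycle": len(history), "values": current})
    # Rewrite X.json with the old entries plus the new one, so nothing is lost
    x_file.write_text(json.dumps(history, indent=2))
    return len(history)
```

Call `append_cycle()` once per cycle. The whole file is rewritten each time, which is fine for modest histories; for very long runs, a JSON Lines file (one object per line, opened in append mode) avoids rewriting.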

How to iterate through different file names in Python

I have assigned different files to variables. Now I want to perform some operations by iterating over those variables. For example:
reduced_file1= 'names.xlsx'
reduced_file2= 'surnames.xlsx'
reduced_file3= 'city.xlsx'
reduced_file4= 'birth.xlsx'
The operations I want to repeat (with a for loop) are:
xls= pd.ExcelFile(reduced_file1)
xls= pd.ExcelFile(reduced_file2)
xls= pd.ExcelFile(reduced_file3)
xls= pd.ExcelFile(reduced_file4)
...and so on
Basically, the only thing that changes each time is the variable name: reduced_file(i)
Thanks
import pandas as pd

files = ['names.xlsx', 'surnames.xlsx', 'city.xlsx', 'birth.xlsx']
for file in files:
    xls = pd.ExcelFile(file)
You can also build strings, such as file names, dynamically with f-strings:
for i in range(4):
    print(f"this is number {i}")
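Note that `xls` is overwritten on every pass of the loop, so only the last file is kept afterwards. A sketch that keeps all of them in a dictionary keyed by the file name without its extension; the loader is passed in as a parameter (for example `pd.ExcelFile`), and `open_all` is a hypothetical name.

```python
from pathlib import Path


def open_all(paths, loader):
    """Return {stem: loader(path)} so every opened file stays accessible."""
    return {Path(p).stem: loader(p) for p in paths}


# Usage with pandas (assuming the four .xlsx files exist):
#   import pandas as pd
#   workbooks = open_all(['names.xlsx', 'surnames.xlsx', 'city.xlsx', 'birth.xlsx'], pd.ExcelFile)
#   workbooks['city']  # the ExcelFile for city.xlsx
```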

Need help adding value to a variable

Here is the whole code section
for entry in auth_log:
    # timestamp is converted to a readable date string with time.ctime() for CEF
    # repr is used to keep '\\' in the domain\username
    extension = {
        'rt=': str(time.ctime(int(entry['timestamp']))),
        'src=': entry['ip'],
        'dhost=': entry['host'],
        'duser=': repr(entry['username']).lstrip("u").strip("'"),
        'outcome=': entry['result'],
        'cs1Label=': 'new_enrollment',
        'cs1=': str(entry['new_enrollment']),
        'cs2Label=': 'factor',
        'cs2=': entry['factor'],
        'cs3Label=': 'integration',
        'cs3=': entry['integration'],
    }
    log_to_cef(entry['eventtype'], entry['eventtype'], **extension)
In line 5 (rt=), I would like to store the timestamp output in a variable that I can use later in the script.
You can access the value directly from the dictionary with extension["rt="].
If you want to keep all of these values after the loop has finished, collect them in a list.
Before your loop, create an empty list:
extensionRt = []
Then, after extension is created in each iteration of the loop, use:
extensionRt.append(extension["rt="])
You can then access the values in this list by index:
extensionRt[YOUR INDEX HERE]
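A self-contained sketch of the same idea, with a made-up two-entry `auth_log` (only the `timestamp` and `eventtype` fields) standing in for the real data and only the `rt=` key of `extension` built. It keeps one readable timestamp per log entry, in log order.

```python
import time

# Hypothetical sample data; real entries have more fields
auth_log = [
    {"timestamp": "1624700000", "eventtype": "authentication"},
    {"timestamp": "1624700060", "eventtype": "enrollment"},
]

extension_rt = []  # one readable timestamp per entry, in log order
for entry in auth_log:
    extension = {"rt=": str(time.ctime(int(entry["timestamp"])))}
    extension_rt.append(extension["rt="])

# Later in the script: first and last timestamps seen
print(extension_rt[0], extension_rt[-1])
```

The printed strings depend on the machine's local time zone, since `time.ctime` formats local time.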

Extract variable from nested for loop

I am trying to loop through multiple files and extract a calculated value from each one as its own variable (i.e. max_value[1], max_value[2], ...). Currently I'm using a dictionary to store each value.
### Create dictionary
max_value = dict()
### Loop through files using glob
for file in glob.glob('files'):
    ### Do calculations using file variables
    calculated_value = 10
    ### Store calculated value in dictionary
    for x in range(1, num_files+1):
        max_value[x] = calculated_value
However, the nested for loop overwrites every previously saved max_value with the calculated_value of the last file. How can I avoid overwriting each max_value in the dictionary with the last file's value?
I think this is what you want:
import glob

### Create dictionary
max_value = dict()
### Loop through files using glob; start=1 gives keys 1, 2, ... as in the question
for i, file in enumerate(glob.glob('files'), start=1):
    ### Do calculations using file variables
    calculated_value = 10
    ### Store calculated value in dictionary
    max_value[i] = calculated_value
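The inner loop writes the current file's value into every key on each pass, so after the last file all keys hold that file's value. With enumerate, each file gets its own key and is written exactly once.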
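A sketch that keys the results by file name rather than by position, which makes it easier to tell later which value came from which file. It assumes, purely for illustration, that each file is plain text containing whitespace-separated numbers and that the value you want is their maximum; `max_in_file` and `max_values` are hypothetical names, so replace `max_in_file` with your real calculation.

```python
import glob
import os


def max_in_file(path):
    # Hypothetical calculation: the largest number in a whitespace-separated text file
    with open(path) as f:
        return max(float(tok) for tok in f.read().split())


def max_values(pattern):
    """Return {file name: calculated value}, one entry per file matching the glob pattern."""
    return {os.path.basename(path): max_in_file(path) for path in sorted(glob.glob(pattern))}
```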
