Use of global dictionary to create variables - python

I need to create an unknown number of Python variables, based on a list of files in a folder.
I found that I could use the global dictionary to create and initialize those variables:
# libraries import
import os.path
import glob
import numpy as np

# list of all the text files in the folder (avoid shadowing the built-in `list`)
txt_files = glob.glob("*.txt")

# creation of the variables based on the name of each file
for file in txt_files:
    shortname = os.path.splitext(file)[0]
    globals()[shortname] = np.loadtxt(file)
However, I was wondering whether accessing the globals dictionary for variable assignment is good practice in Python (when the number and names of the variables are not known in advance), or whether there is a preferable alternative.

You should use a dedicated dictionary for this:
files = {f: np.loadtxt(f) for f in glob.glob("*.txt")}
Generally, you should not mix data with variable or attribute names: your code could shadow any Python built-in if a file with the same name exists.

No, you probably shouldn't be using globals for this. Instead, create a dictionary or class and store the values in that.
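For example, a minimal sketch of the dictionary approach, using hypothetical file names as stand-ins for the glob results:

```python
import os.path

# hypothetical file list standing in for glob.glob("*.txt")
txt_files = ["alpha.txt", "beta.txt"]

data = {}
for file in txt_files:
    shortname = os.path.splitext(file)[0]
    data[shortname] = file  # np.loadtxt(file) in the real code

print(sorted(data))  # ['alpha', 'beta']
```

Each array is then accessed as `data["alpha"]`, and the set of names can be inspected or iterated without touching `globals()`.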


Python: how to use glob and wildcard to open CDF files

I'm trying to open multiple .cdf files and store them in a dictionary, but when I try to use a wildcard within the pycdf.CDF() command, this error is returned: spacepy.pycdf.CDFError: NO_SUCH_CDF: The specified CDF does not exist.
The .cdf files have a set initial name (instrumentfile), a date (20010101) and then a variable section (could be 1, 2, 3, or 4). This means that I can't simply write code such as:
DayCDF = pycdf.CDF('/home/location/instrumentfile'+str(dates)+'.cdf')
I also need to change the names of the variables that the .cdf data is assigned to as well, so I'm trying to import the data into a dictionary (also not sure if this is feasible).
The current code looks like this:
dictDayCDF = {}
for x in range(len(dates)):
    dictDayCDF["DayCDF"+str(x)] = pycdf.CDF('/home/location/instrumentfile'+str(dates[x])+'*.cdf')
and returns the error spacepy.pycdf.CDFError: NO_SUCH_CDF: The specified CDF does not exist.
I have also tried using glob.glob as I have seen this recommended in answers to similar questions but I have not been able to work out how to apply the command to opening .cdf files:
dictDayCDF = {}
for x in range(len(dates)):
    dictDayCDF["DayCDF"+str(x)] = pycdf.CDF(glob.glob('/home/location/instrumentfile'+str(dates[x])+'*.cdf'))
with this error being returned: ValueError: pathname must be string-like
The expected result is a dictionary of .cdf files that can be called with names DayCDF1, DayCDF2, etc that can be imported no matter the end variable section.
How about starting with the following code skeleton:
import glob

for file_name in glob.glob('./*.cdf'):
    print(file_name)
    # do something else with the file_name
As for the root cause of the error message you're encountering: if you check the documentation of the method you're trying to use, it indicates that
Open or create a CDF file by creating an object of this class.
Parameters:
pathname : string
name of the file to open or create
Based on that, we can infer that it expects a single file name, not a list of file names. When you pass it a list of file names (the result of glob.glob), it complains as you've observed.
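Since glob.glob always returns a list of path strings, one fix is to index or loop over the matches before handing each one to pycdf.CDF (the paths below are hypothetical, and the pycdf part is left commented since it needs spacepy):

```python
import glob

# glob.glob always returns a list, even when nothing matches
matches = glob.glob('/home/location/instrumentfile20010101*.cdf')  # hypothetical path
print(type(matches) is list)  # True

# so open each matching file individually, e.g.:
# dictDayCDF = {}
# for x in range(len(dates)):
#     for match in glob.glob('/home/location/instrumentfile' + str(dates[x]) + '*.cdf'):
#         dictDayCDF["DayCDF" + str(x)] = pycdf.CDF(match)
```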

Naming a list in python using a String name

Does anyone know of a way to name a list in Python using a string? I am writing a script that iterates through a directory, parses each file, and generates a list with the contents of each file. I would like to use the filename to name each list. I was wondering if there was a way to do it similar to the exec() method, but using lists instead of just a normal variable.
If you really want to do it this way, then for instance:
import os

directory = os.getcwd()  # current directory or any other you would like to specify
for name in os.listdir(directory):
    globals()[name] = []
Each of the lists can be now referenced with the name of the file. Of course, this is a suboptimal approach, normally you should use other data structures, such as dictionaries, to perform your task.
You would be better off using a dictionary: store the file name as the key and place the file's contents in the corresponding value.
It's like
my_dict = {'file1.txt':'This is the contents of file1','file2.txt':'This is the content of file2'}
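A sketch of building such a dictionary from an actual directory (the directory path and the .txt filter are assumptions for illustration):

```python
import os

def read_texts(directory):
    """Map each .txt file name in `directory` to its contents."""
    contents = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.endswith('.txt') and os.path.isfile(path):
            with open(path) as f:
                contents[name] = f.read()
    return contents
```

Each file is then reachable as contents['file1.txt'] instead of a dynamically named variable.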

Query Related to Python - Folders Read

I want to read folders in Python and make a list of them. My main concern is that the most recent folder should be at a location known to me: either the first or the last element of the list. The folder names are dates such as 20181005, and I want that folder to be either first or last in the list.
I have tried os.listdir, but I am not confident about the order in which this function reads folders and stores them in the list. Does it store them in name order, or by creation or modification date? If I could sort on the basis of the name (20181005 etc.), that would be really good.
Kindly suggest a suitable method for this.
Regards
os.listdir returns directory contents in arbitrary order, but you can sort that yourself:
l = sorted(os.listdir())
Since it seems that your folder names are ISO dates, they should sort correctly and the most recent one should be the last element after sorting.
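For instance, with hypothetical folder names:

```python
names = ['20181005', '20180101', '20180930']  # hypothetical YYYYMMDD folder names
print(sorted(names)[-1])  # 20181005 - the most recent date sorts last
```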
If you need to access creation & modification times you can do that with os.path functions. If you want to sort by that, I would probably choose to put it in something like a pandas DataFrame to make it easier to manipulate.
import os
from datetime import datetime
import pandas as pd

path = "."
objects = os.listdir(path)
dirs = list()
for o in objects:
    opath = os.path.join(path, o)
    if os.path.isdir(opath):
        dirs.append(dict(path=opath,
                         mtime=datetime.fromtimestamp(os.path.getmtime(opath)),
                         ctime=datetime.fromtimestamp(os.path.getctime(opath))))
data = pd.DataFrame(dirs)
data.sort_values(by='mtime')
Assuming your directories follow the YYYYMMDD naming format, you can use listdir and sort to get the latest directory at the last index.
import os

mypath = 'D:\\anil'
list_dirs = []
for f in os.listdir(mypath):
    if os.path.isdir(os.path.join(mypath, f)):
        list_dirs.append(f)
list_dirs.sort()
for current_dir in list_dirs:
    print(current_dir)

Iterating files in directory by name

Ok, I have a directory with many files and subdirectories; among these ones there are 20 directories called mbr001, mbr002, ... until mbr020 that are the ones I am interested in.
I want to write a program that iteratively goes into mbr001, do somethings, then to mbr002, do the same things, and so on.
A solution with Python's os.chdir('./mbr001') until 20 seems pretty inefficient.
Besides, when I am in such directories, let's say for example mbr012, I want to create a variable that has in its name the number of the mbr where I am, in this case 12. Something like variable = 'mbr012_file.dat'. (This variable is used in what I am doing inside each directory, not relevant here).
What I would need would be something like this (note this is pseudo-code):
for i in mbr[i]:
    variable = "mbr[i]_file.dat"
    ...
How can I do the loop and the variable naming? Thanks in advance.
What about something like this ?
import os

for i in range(1, 21):
    dn = "./mbr{:03}".format(i)
    var = "mbr{:03}_file.dat".format(i)
    os.chdir(dn)
    # Do your stuff here, using `var`
    os.chdir("..")
Use format:
for i in list_of_is:
    filename = "mbr{0:03d}_file.dat".format(i)
You just need to concatenate '_file.dat' onto the directory name:
import re

for i in mbr_list:
    variable = i + '_file.dat'
    # use a regex if you are only interested in the numeric part
    # variable = re.sub(r'mbr(\d+)', r'mbr\1_file.dat', i)
    # do your thing here
In Python 3.4 and above you can use pathlib, and in Python 3.6 and above you can use f-strings to format text:
from pathlib import Path

for path in [d for d in Path.cwd().iterdir() if d.match("mbr*")]:
    variable = f"{path.name}_file.dat"
    # do other stuff

How to perform os.environ join in python?

I have a configuration of os.environ with default values (which covers 90% of my needs). I have a special application-framework package, for example one called SALOME, that does not provide installation into the system environment and tries to be self-contained; it also relies on special old technologies that use environment variables, so sys.path and PYTHONPATH are not the only things it needs. I can capture all the variables it needs by reading os.environ inside the environment it creates, and I can then serialize that os.environ dictionary.
I wonder how to merge the os.environ of my currently running system with the one I obtained by serializing?
Let's assume you have done something like the following to serialize the environment:
import json
import os

with open('environ.json', 'w') as f:
    json.dump(dict(**os.environ), f)
You can now read those back like this (in another program)
import json
import os

with open('environ.json', 'r') as f:
    os.environ.update(json.load(f))
This will only add or change the current environment variables to match the saved ones, but any additional variables will remain.
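The merge semantics can be illustrated with plain dictionaries standing in for os.environ and the saved snapshot (all keys and values here are hypothetical):

```python
current = {'PATH': '/usr/bin', 'LOCAL_ONLY': 'kept'}    # stand-in for os.environ
saved = {'PATH': '/salome/bin', 'SALOME_ROOT': '/opt'}  # loaded snapshot

current.update(saved)
# saved keys overwrite or are added; keys only in `current` survive
print(current['LOCAL_ONLY'], current['PATH'])  # kept /salome/bin
```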
If you want to update only specific variables by adding them (so for instance to add extra paths), you can do that explicitly:
with open('environ.json', 'r') as f:
    loadedenv = json.load(f)

pathvars = ['PATH', 'PYTHONPATH']
for p in pathvars:
    os.environ[p] += ':' + loadedenv[p]
You can use the package environs to export the os.environ dictionary. It has a built-in dumper/loader for exporting and importing environment variables.
from environs import Env
env = Env()
# reading an environment variable
gh_user = env('GITHUB_USER') # => 'sloria'
secret = env('SECRET') # => raises error if not set
# casting
api_key = env.str('API_KEY') # => '123abc'
date = env.date('SHIP_DATE') # => datetime.date(1984, 6, 25)
# serialize to a dictionary of simple types (numbers and strings)
env.dump()
# { 'API_KEY': '123abc',
# 'GITHUB_USER': 'sloria',
# 'SECRET': 'AASJI93WSJD93DWW3X0912NS2',
# 'SHIP_DATE': '1984-06-25'}
If you want multiple values per key, which the standard Python dictionary does not offer, then you can use werkzeug.datastructures.MultiDict:
os.environ = MultiDict([('Key1', 'First Value'), ('Key1', 'Second Value')])
The update will then work the same way as described below.
If you do not need to preserve the old key values when merging with the new dictionary, you can update os.environ directly. Since os.environ is a dictionary you already have in memory, only the other dictionary, the one you read from disk, needs to be deserialized from JSON. I generally use ujson since it is really fast.
os.environ.update(new_dict)
If you want to save the JSON, you can dump it to a file:
import ujson

with open('data.json', 'w') as file:
    ujson.dump(dict(**os.environ), file)
If you want to read the file back and update the os.environ dictionary, you can use:
with open('data.json', 'r') as f:
    os.environ.update(ujson.load(f))
