regarding the parameters in os.path.join - python

I am trying to reproduce a python program, which includes the following line of code:
data = glob(os.path.join("./data", config.dataset, "*.jpg"))
My guess is that it captures all .jpg files stored in the ./data folder, but I am not sure what config.dataset does here. Should the folder structure look like ./data/config.dataset/*.jpg? I need to understand this because I have to create a data input folder to run the program, and the original program does not document how the data is organized.

config.dataset in your code fragment is a variable. It's either a dataset attribute of some config object, or the dataset global variable in an imported config module (from this code's perspective they work the same).
As a few people have commented, for that code to work, config.dataset must evaluate to a string, probably a single directory name. So the result of the join call will be something like "./data/images/*.jpg" (if config.dataset is "images"). The variable could also have a (pre-joined) path section including one or more slashes. For instance, if config.dataset was "path/to/the/images", you'd end up with "./data/path/to/the/images/*.jpg".
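As a rough illustration of the above, here is a minimal sketch. The `config` object below is a hypothetical stand-in built with `SimpleNamespace`, and the value "images" is only an example; the real program defines its own config and dataset name.

```python
import os
from types import SimpleNamespace

# Hypothetical stand-in for the program's real config object.
config = SimpleNamespace(dataset="images")

# The variable's value (not the literal text "config.dataset") becomes
# the middle component of the path.
pattern = os.path.join("./data", config.dataset, "*.jpg")
print(pattern)

# A value that already contains several path components works the same way.
nested = SimpleNamespace(dataset=os.path.join("path", "to", "the", "images"))
nested_pattern = os.path.join("./data", nested.dataset, "*.jpg")
print(nested_pattern)
```

So the folder to create is ./data/<value of config.dataset>/, with the .jpg files placed directly inside it.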

Related

How do I tell a .py script to look for folders and file in the directory it is in?

I have a .py script that pulls in data from Google Sheets and outputs it in a yaml format. This is for my Hugo powered website which is served via Netlify. As I understand it, Netlify is capable of running Python too, so I thought I could upload the web content and the python file in the same directory. This is required for updating the content dynamically, and I expect the python file to run every time I trigger a build for the website. However, the python file requires certain credentials to work.
My code currently looks like this:
# Set location to write new files to.
outputpath = Path("D:/content/submission/")
#Read JSON:
json = Path("D:/credentials.json")
These are hardcoded local paths. When I bundle the script in the website directory, what paths should I write in so that when the script runs, these files are read in and outputted correctly?
I would want to output in my content/submission folder and read in from my creds/credentials.json. Should I just put these paths in? Will it know that it has to look within the directory for these folders, or is there something I need to add to the script that tells it to work within the directory it is sitting in?
🧨 First, credentials and secrets are best kept out of files (and especially out of source control).
For general file locations however, you can use something like:
pa_same_but_json = Path(__file__).with_suffix(".json")
pa_same_directory = Path(__file__).parent / "nosecrets.json"
To answer your comments
(mind you, I'm not 100% sure about Windows drives in the following):
parent is an attribute on Path objects that lets you "climb up" the hierarchy.
Path(r"c:\temp\foo.json").parent returns the same as Path(r"c:\temp").
Yes, you can do mypath.parent.parent.
/ is a path concatenation operator when applied to Path objects.
So
myfile = os.path.join("c:", "temp", "foo.json")
and
myfile_as_a_Path = Path("c:") / "temp" / "foo.json"
are the same, except that one is a string and the other a Path instance. Once the first Path has been built (on C:), the rest of the expression is operating on Path instances, and Path re-purposes the division operator (via the magic __truediv__ method, normally intended for numeric division) to support path concatenation. This works because most operations on Path instances return another Path, which allows this type of chaining.
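A small sketch of that chaining, for illustration only. It uses `PurePosixPath` so that it behaves the same on any operating system and never touches the filesystem; the directory "/srv/site" is a made-up example. In Python 3 the `/` operator is implemented by `__truediv__`.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/srv/site")  # hypothetical directory, need not exist

# Each "/" returns a new path object, which is what makes chaining possible.
target = base / "content" / "submission"
print(target)
print(type(target).__name__)

# .parent climbs one level per use and can be chained as well.
print(target.parent)
print(target.parent.parent == base)

# The operator support comes from the class itself.
print(hasattr(PurePosixPath, "__truediv__"))
```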
It's best not to write far up the hierarchy in a hosted/VM context (you never know the directory structure above you, or whether you have permissions), but something based on your script location might be:
pa_current = Path(__file__).parent
# could use `content/submission` but that assumes you're always on POSIX
# systems. Letting pathlib do the work is safer, even if Windows probably
# puts up with `/`
pa_write = pa_current / "content" / "submission"
pa_read = pa_current / "credentials.json"
At this point these are Path instances, but they are really not much different from strings, except that they have smarter methods for manipulating paths. They don't know or care whether the files exist or not.
P.S.
🧨 A consideration is that, in many web contexts, writing to code directories (like what happens in a content/submission under the python scripts) is a security goof as well.
Maybe pa_write = pa_current.parent.parent / "uploads" / "content" / "submission" would be better.
Specifically when it comes to user uploads and secrets, please refer to best practices for your platform, not just what Python can do. This answer was about pathlib.Path, not Hugo uploads.

Python in SPSS - KEEP variables

I have selected the variables I need based on a string within the variable name. I'm not sure how to keep only these variables from my SPSS file.
begin program.
import spss,spssaux
spssaux.OpenDataFile(r'XXXX.sav')
target_string = 'qb2'
variables = [var for var in spssaux.GetVariableNamesList() if target_string in var]
vars = spssaux.VariableDict().expand(variables)
nvars=len(vars)
for i in range(nvars):
    print vars[i]
spss.Submit(r"""
SAVE OUTFILE='XXXX_reduced.sav'.
ADD FILES FILE=* /KEEP \n %s.
""" %(vars))
end program.
The list of variables that it prints out is correct, but it falls over when trying to KEEP them. I'm guessing the errors have something to do with not activating a dataset or not bringing the file in again?
Have you tried reversing the order of the SAVE OUTFILE and ADD FILES commands? I haven't run this in SPSS via Python, but in standard SPSS, your syntax will write the file to disk, and then select the variables for the active version in memory--so if you later access the saved file, it will be the version before you selected variables.
If that doesn't work, can you explain what you mean by falling over trying to KEEP them?
It appears that the problem has been solved, but I would like to point out another solution that can be done without writing any Python code. The extension command SPSSINC SELECT VARIABLES defines a macro based on properties of the variables. This can be used in the ADD FILES command.
SPSSINC SELECT VARIABLES MACRONAME="!selected"
/PROPERTIES PATTERN = ".*qb2".
ADD FILES /FILE=* /KEEP !selected.
The SELECT VARIABLES command is actually implemented in Python. Its selection criteria can also include other metadata such as type and measurement level.
You'll want to run the ADD FILES FILE command before the SAVE so that your saved file is the "reduced" file.
I think your very last line in the python program should be trying to join the elements in the list vars. For example: %( " ".join(vars) )
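Putting the two suggestions together (join the list into a space-separated string, and run ADD FILES before SAVE), the syntax string could be built roughly like this. This is only a sketch: `build_keep_syntax` is a hypothetical helper, the variable names are invented, and the code deliberately stops short of calling SPSS so that it can be run anywhere.

```python
def build_keep_syntax(variables, outfile):
    """Return SPSS syntax that keeps only `variables`, then saves to `outfile`."""
    # " ".join gives "qb2_1 qb2_2 qb2_3"; formatting the list itself with %s
    # would insert its repr, "['qb2_1', 'qb2_2', 'qb2_3']", which is not
    # valid SPSS syntax.
    keep_list = " ".join(variables)
    return (
        "ADD FILES FILE=* /KEEP %s.\n"
        "SAVE OUTFILE='%s'.\n" % (keep_list, outfile)
    )


# Invented names standing in for the list produced by spssaux.
example_vars = ["qb2_1", "qb2_2", "qb2_3"]
syntax = build_keep_syntax(example_vars, "XXXX_reduced.sav")
print(syntax)
```

Inside a BEGIN PROGRAM block, the resulting string would then be passed to spss.Submit in place of the original template.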

Is there a set name I need to give my .py file containing the main function?

Whenever I start a project I have to think of a name for the first python file I start with. More often than not I simply give it the same name as the project folder (i.e. if folder name is project-x I often name the python file projectX.py).
Is there a particular name I need to give the first python file I start out with (such as main.py perhaps)?
When deciding on a name for your file, keep the following basic rules in mind:
The file name should be short.
The file name should be all lowercase.
The file name can also contain underscores (_).
Usually, main.py contains the 'top-level' code (it's called top-level because it imports the other modules that the program needs in order to run, and it is the code that runs first). Check the following documentation for more details.
There are no specific rules mandated by the language or the surrounding ecosystem. The only useful guidance is to avoid file names which make it hard to import your code, i.e. don't put dots (beyond the one just before the .py extension) or dashes in the file name.
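To make the import point concrete: for a plain `import` statement to work, the file name minus its .py extension has to be a valid Python identifier. The helper below is a hypothetical illustration of that check, not something provided by Python itself.

```python
import keyword


def is_importable_module_name(filename):
    """Return True if `filename` could be imported with a plain `import` statement."""
    if not filename.endswith(".py"):
        return False
    stem = filename[:-3]  # drop the ".py" extension
    # Dots and dashes are not allowed in identifiers; keywords cannot be
    # used as module names in an import statement either.
    return stem.isidentifier() and not keyword.iskeyword(stem)


for name in ["main.py", "projectX.py", "project_x.py", "project-x.py", "project.x.py"]:
    print(name, is_importable_module_name(name))
```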

What directory does os.path.join start at?

I made a script in the past to mass rename any file with a name longer than x characters in a directory. For that script, the source directory had to be entered manually. Any file over x characters in that directory would be stripped of its extension and renamed, then the extension would be re-added, and os.path.join would be used to join the source directory and the newly created filename+ext. I'm now making another script and used os.path.join("Folder in the current dir", "file in that dir"). Because this worked, I'm guessing that when os.path.join is called with just a folder name and no full path in its first parameter, it starts its search from the directory that the script was run in? Just wondering if this is correct.
os.path.join has nothing to do with any actual filesystem, and does not "start" anywhere. It simply joins two arbitrary paths, whether they exist or not.
What os.path.join does is join path elements in a system-compatible way, taking the platform's directory separator character, etc., into account. It's a simple string manipulation tool.
So the returned result simply starts from whatever you give to it as the first argument.
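A quick way to see this for yourself is sketched below; the folder and file names are deliberately ones that do not exist. A relative result only gets a "starting point" later, when something resolves it, and that starting point is the process's current working directory, which is often but not always the directory containing the script.

```python
import os

# Neither name has to exist: join only builds a string.
relative = os.path.join("no_such_folder", "no_such_file.txt")
print(relative)

# The result begins with whatever was passed as the first argument.
print(relative.startswith("no_such_folder"))

# Resolution happens elsewhere, e.g. in abspath (or when opening the file),
# and it is relative to the current working directory.
absolute = os.path.abspath(relative)
print(absolute)
print(os.getcwd())
```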

Store user defined data after inputted

I am making a python program, and I want to check whether it is the user's first time running the program (firstTime == True). After it has run, however, I want to permanently change firstTime to False. (There are other variables that I want to take input for on the first run and that should persist afterwards, but that should be solved the same way.)
Is there a better way than just reading from a file that contains the data? If not, how can I find where the file is being run from (so the data will be in the same dir)?
If you want to persist data, it will "eventually" be to disk files (though there might be intermediate steps, e.g. via a network or database system, eventually if the data is to be persistent it will be somewhere in disk files).
To "find out where you are",
import os
print(os.path.dirname(os.path.abspath(__file__)))
There are variants, but this is the basic idea. __file__ in any .py script or module gives the file path in which that file resides (won't work on the interactive command line, of course, since there's no file involved then;-).
The os.path module in Python's standard library has many useful functions to manipulate path strings -- here, we're using two: abspath to give an absolute (not relative) version of the file's path, so you don't have to care about what your current working directory is; and dirname to extract just the directory name (actually, the whole directory path;-) and drop the filename proper (you don't care if the module's name is foo.py or bar.py, only in what directory it is;-).
It is enough to just create a file in the same directory when the program is run for the first time (of course, that file can be deleted to trigger the first-run behaviour again, which can sometimes be useful):
import os

firstrunfile = 'config.dat'
if not os.path.exists(firstrunfile):
    ## configuration here
    open(firstrunfile, 'w').close()  ## .write(configuration)
    print('First run')
    firstTime = True
else:
    print('Not first run')
    ## read configuration
    firstTime = False
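The two answers can be combined into one small helper that both detects the first run and stores the other values. The following is only a sketch under some assumptions: the file name "settings.json", the JSON format, and the function name are all arbitrary choices made for illustration. The demonstration uses a temporary directory; in a real script the directory could instead be os.path.dirname(os.path.abspath(__file__)).

```python
import json
import tempfile
from pathlib import Path


def load_or_create_settings(directory, defaults):
    """Return (first_time, settings), writing the settings file on first run."""
    settings_file = Path(directory) / "settings.json"  # arbitrary file name
    if settings_file.exists():
        # Not the first run: read back what was stored earlier.
        with settings_file.open() as fh:
            return False, json.load(fh)
    # First run: persist the values so that later runs can find them.
    with settings_file.open("w") as fh:
        json.dump(defaults, fh)
    return True, dict(defaults)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    first_time, settings = load_or_create_settings(demo_dir, {"name": "example"})
    print(first_time, settings)
    first_time_again, settings_again = load_or_create_settings(demo_dir, {"name": "ignored"})
    print(first_time_again, settings_again)
```

On the second call the defaults are ignored, because the values saved by the first call take precedence.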
