I'm trying to write a program in which the input and output file names are listed at the very beginning.
The output is generated only after the whole program has run.
E.g.:
### First step. import files and assign names
df1= pd.read_csv(r'df1.csv',low_memory=False)
output file name = final_output
### final step. Output files and name it as 'final_output.csv'
df_final.to_csv('output file name.csv')
What I'm trying to do is define the name of the output file at the very beginning and then reference it at the end, rather than manually naming it at the very end of the program.
In SAS this would be something like: define A = 'output file name', then reference it using "&A" at the very end.
How can I do the same in Python?
As one of the commenters mentioned, you could include the extension in the filename and then just pass that variable to .to_csv(). If that's not possible, it seems to me like you're looking to use string formatting. You could try this:
import pandas as pd

df1 = pd.read_csv(r'df1.csv', low_memory=False)
output_file_name = 'final_output'  # a Python variable name cannot contain spaces
### final step. Output files and name it as 'final_output.csv'
df_final.to_csv(f'{output_file_name}.csv')
Using f-strings like this is a compact and clear way to format and concatenate strings with variables in-line.
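As a minimal sketch of declaring the names once at the top and reusing them at the end, the snippet below builds the output path from a variable. The names INPUT_FILE, OUTPUT_NAME, and output_path are illustrative and not from the question, and the to_csv call is left as a comment because df_final is not defined here.

```python
from pathlib import Path

# Names defined once, at the top of the program (illustrative values).
INPUT_FILE = "df1.csv"
OUTPUT_NAME = "final_output"

# ... the rest of the program would go here ...

# Build the output file name from the variable at the very end.
output_path = Path(f"{OUTPUT_NAME}.csv")
print(output_path)

# With pandas, the path could then be passed straight to to_csv:
# df_final.to_csv(output_path)
```

Changing OUTPUT_NAME at the top is then the only edit needed to rename the output file.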
Related
I need to download a file that is automated on SharePoint. The thing is this file has the following filename structure:
fileYYYYMmm.xlsb
Example: file2022M03.xlsb
I need to refer to this file using a wildcard or something similar (I don't know what exactly) so that I can get the file dynamically.
Example: file????M??.xlsb
I'm using the following line of code:
download_path = sp.create_link(f'https://enterprise.sharepoint.com/:x:/r/sites/GLB-GIS-PERISCPE/Shared%20Documents/TMS_Ch/file/file'+str(yy)+'M'+'??'+'.xlsb')
How can I do this in Python?
It's pretty easy with an f-string. You just need to reference the variables in curly braces like this:
the_year = '2022'  # four digits, to match the fileYYYYMmm.xlsb pattern
the_month = '03'
# print(f'https://enterprise.sharepoint.com/:x:/r/sites/GLB-GIS-PERISCPE/Shared%20Documents/TMS_Ch/file/file{the_year}M{the_month}.xlsb')
# https://enterprise.sharepoint.com/:x:/r/sites/GLB-GIS-PERISCPE/Shared%20Documents/TMS_Ch/file/file2022M03.xlsb
download_path = sp.create_link(f'https://enterprise.sharepoint.com/:x:/r/sites/GLB-GIS-PERISCPE/Shared%20Documents/TMS_Ch/file/file{the_year}M{the_month}.xlsb')
I am assuming you only need one file and that you have variables holding the year and the month.
You can use an f-string like this:
download_path = sp.create_link(f'https://enterprise.sharepoint.com/:x:/r/sites/GLB-GIS-PERISCPE/Shared%20Documents/TMS_Ch/file/file{yy_var}M{mm_var}.xlsb')
Here yy_var stores the year and mm_var stores the month.
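If the year and month come from a date rather than from hand-written strings, the file name can be built with format specifiers inside the f-string. This is only a sketch: the helper name monthly_file_name is hypothetical, and it assumes the target month is known in advance, so no wildcard is needed.

```python
from datetime import date


def monthly_file_name(day: date) -> str:
    """Return a name of the form fileYYYYMmm.xlsb for the given date."""
    # :04d and :02d zero-pad the year and the month.
    return f"file{day.year:04d}M{day.month:02d}.xlsb"


name = monthly_file_name(date(2022, 3, 15))
print(name)  # file2022M03.xlsb
```

The resulting name can then be appended to the SharePoint folder URL in the same way as in the answers above.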
I'm trying to open multiple .cdf files and store them in a dictionary, but when I try to use a wildcard within the pycdf.CDF() command, this error is returned: spacepy.pycdf.CDFError: NO_SUCH_CDF: The specified CDF does not exist.
The .cdf files have a set initial name (instrumentfile), a date (20010101) and then a variable section (could be 1, 2, 3, or 4). This means that I can't simply write code such as:
DayCDF = pycdf.CDF('/home/location/instrumentfile'+str(dates)+'.cdf')
I also need to change the names of the variables that the .cdf data is assigned to as well, so I'm trying to import the data into a dictionary (also not sure if this is feasible).
The current code looks like this:
dictDayCDF = {}
for x in range(len(dates)):
    dictDayCDF["DayCDF"+str(x)] = pycdf.CDF('/home/location/instrumentfile'+str(dates[x])+'*.cdf')
and returns the error spacepy.pycdf.CDFError: NO_SUCH_CDF: The specified CDF does not exist.
I have also tried using glob.glob as I have seen this recommended in answers to similar questions but I have not been able to work out how to apply the command to opening .cdf files:
dictDayCDF = {}
for x in range(len(dates)):
    dictDayCDF["DayCDF"+str(x)] = pycdf.CDF(glob.glob('/home/location/instrumentfile'+str(dates[x])+'*.cdf'))
with this error being returned: ValueError: pathname must be string-like
The expected result is a dictionary of .cdf files that can be called with names DayCDF1, DayCDF2, etc that can be imported no matter the end variable section.
How about starting with the following code skeleton:
import glob
for file_name in glob.glob('./*.cdf'):
    print(file_name)
    # do something else with the file_name
As for the root cause of the error message you're encountering: if you check the documentation of the method you're trying to use, it indicates that
Open or create a CDF file by creating an object of this class.
Parameters:
pathname : string
name of the file to open or create
Based on that, we can infer that it expects a single file name, not a list of file names. When you pass it a list of file names, which is what glob returns, it complains as you've observed.
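Putting the two points together, one possible sketch is to expand the wildcard with glob first and then hand a single path to the opener. The function name open_by_date and its parameters are hypothetical, and the opener is passed in so the snippet runs without spacepy installed; it also assumes one file per date and simply takes the first match.

```python
import glob
import os


def open_by_date(directory, prefix, dates, opener):
    """Open the file matching <prefix><date>*.cdf for each date, keyed as DayCDF<n>."""
    opened = {}
    for index, day in enumerate(dates):
        pattern = os.path.join(directory, f"{prefix}{day}*.cdf")
        matches = sorted(glob.glob(pattern))  # a list of paths, possibly empty
        if not matches:
            continue  # no file for this date; skip it
        # Assumption: one file per date, so take the first match.
        opened[f"DayCDF{index}"] = opener(matches[0])
    return opened


# Intended usage (requires spacepy, which is not imported here):
# from spacepy import pycdf
# dictDayCDF = open_by_date('/home/location', 'instrumentfile', dates, pycdf.CDF)
```

Because glob.glob returns a list, the opener only ever receives one string path, which is what the quoted documentation asks for.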
Does anyone know of a way to name a list in Python using a string? I am writing a script that iterates through a directory, parses each file, and generates lists with the contents of the file. I would like to use the filename to name each list. I was wondering if there is a way to do it similar to the exec() method, but using lists instead of just a normal variable.
If you really want to do it this way, then for instance:
import os
directory = os.getcwd() # current directory or any other you would like to specify
for name in os.listdir(directory):
    globals()[name] = []
Each of the lists can now be referenced through globals() using the name of the file. Of course, this is a suboptimal approach. Normally you should use another data structure, such as a dictionary, for this task.
You would be better off using a dictionary. Store the file name as the key and place the contents in the corresponding value.
For example:
my_dict = {'file1.txt':'This is the contents of file1','file2.txt':'This is the content of file2'}
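A minimal sketch of the dictionary approach applied to a whole directory might look like the following. The function name read_files_into_dict is hypothetical, and it assumes the files are UTF-8 text that should be split into lines; the real parsing step would replace splitlines().

```python
import os


def read_files_into_dict(directory):
    """Map each file name in `directory` to a list of its lines."""
    contents = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, encoding="utf-8") as handle:
            contents[name] = handle.read().splitlines()
    return contents
```

Each list is then available as contents['file1.txt'] and so on, without creating any global variables.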
I have selected the variables I need based on a string within the variable name. I'm not sure how to keep only these variables from my SPSS file.
begin program.
import spss,spssaux
spssaux.OpenDataFile(r'XXXX.sav')
target_string = 'qb2'
variables = [var for var in spssaux.GetVariableNamesList() if target_string in var]
vars = spssaux.VariableDict().expand(variables)
nvars=len(vars)
for i in range(nvars):
    print vars[i]
spss.Submit(r"""
SAVE OUTFILE='XXXX_reduced.sav'.
ADD FILES FILE=* /KEEP \n %s.
""" %(vars))
end program.
The list of variables that it prints out is correct, but it's falling over when trying to KEEP them. I'm guessing the errors have something to do with not activating a dataset or not bringing in the file again?
Have you tried reversing the order of the SAVE OUTFILE and ADD FILES commands? I haven't run this in SPSS via Python, but in standard SPSS, your syntax will write the file to disk, and then select the variables for the active version in memory--so if you later access the saved file, it will be the version before you selected variables.
If that doesn't work, can you explain what you mean by falling over trying to KEEP them?
It appears that the problem has been solved, but I would like to point out another solution that can be done without writing any Python code. The extension command SPSSINC SELECT VARIABLES defines a macro based on properties of the variables. This can be used in the ADD FILES command.
SPSSINC SELECT VARIABLES MACRONAME="!selected"
/PROPERTIES PATTERN = ".*qb2".
ADD FILES /FILE=* /KEEP !selected.
The SELECT VARIABLES command is actually implemented in Python. Its selection criteria can also include other metadata such as type and measurement level.
You'll want to use the ADD FILES FILE command before the SAVE for your saved file to be the "reduced" file.
I think the very last line of your Python program should join the elements of the list vars. For example: %(" ".join(vars))
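The effect of joining the list can be checked in plain Python, without SPSS. In this sketch the variable names are hypothetical, and the syntax string is only built and printed rather than passed to spss.Submit; it also places ADD FILES before SAVE, as suggested above.

```python
variables = ["qb2_1", "qb2_2", "qb2_3"]  # hypothetical variable names

# Formatting the list directly embeds its Python repr, brackets and quotes included.
broken = "ADD FILES FILE=* /KEEP %s." % (variables)
print(broken)

# Joining first gives the space-separated list of names.
keep_list = " ".join(variables)
syntax = (
    "ADD FILES FILE=* /KEEP %s.\n"
    "SAVE OUTFILE='XXXX_reduced.sav'.\n"
) % keep_list
print(syntax)
```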
I am trying to reproduce a python program, which includes the following line of code
data = glob(os.path.join("./data", config.dataset, "*.jpg"))
My guess is that it captures all the .jpg files stored in the /data folder, but I am not sure what config.dataset is doing here. Should the folder structure look like /data/config.dataset/*.jpg? I need to understand this because I have to create a data input folder to run the program, and the original program does not share any detail on how the data is organized.
config.dataset in your code fragment is a variable. It's either a dataset attribute of some config object, or the dataset global variable in an imported config module (from this code's perspective they work the same).
As a few people have commented, for that code to work, config.dataset must evaluate to a string, probably a single directory name. So the result of the join call will be something like "./data/images/*.jpg" (if config.dataset is "images"). The variable could also have a (pre-joined) path section including one or more slashes. For instance, if config.dataset was "path/to/the/images", you'd end up with "./data/path/to/the/images/*.jpg".
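To see what the pattern looks like for a given setting, the join can be run on its own. Here config is a stand-in object created with SimpleNamespace and "images" is a hypothetical dataset name; the real program presumably defines config elsewhere.

```python
import os
from glob import glob
from types import SimpleNamespace

# Stand-in for the program's config object (hypothetical value).
config = SimpleNamespace(dataset="images")

pattern = os.path.join("./data", config.dataset, "*.jpg")
print(pattern)  # on POSIX systems: ./data/images/*.jpg

# glob returns a list of matching paths; it is empty if the folder does not exist.
data = glob(pattern)
print(len(data))
```

Under that assumption, the input images would go in a data/images folder relative to the directory the program is run from.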