Python: how to use glob and wildcard to open CDF files

I'm trying to open multiple .cdf files and store them in a dictionary, but when I try to use a wildcard within the pycdf.CDF() command, this error is returned: spacepy.pycdf.CDFError: NO_SUCH_CDF: The specified CDF does not exist.
The .cdf files have a set initial name (instrumentfile), a date (20010101) and then a variable section (could be 1, 2, 3, or 4). This means that I can't simply write code such as:
DayCDF = pycdf.CDF('/home/location/instrumentfile'+str(dates)+'.cdf')
I also need to change the names of the variables that the .cdf data is assigned to as well, so I'm trying to import the data into a dictionary (also not sure if this is feasible).
The current code looks like this:
dictDayCDF = {}
for x in range(len(dates)):
    dictDayCDF["DayCDF"+str(x)] = pycdf.CDF('/home/location/instrumentfile'+str(dates[x])+'*.cdf')
and returns the error spacepy.pycdf.CDFError: NO_SUCH_CDF: The specified CDF does not exist.
I have also tried using glob.glob, as I have seen this recommended in answers to similar questions, but I have not been able to work out how to apply it to opening .cdf files:
dictDayCDF = {}
for x in range(len(dates)):
    dictDayCDF["DayCDF"+str(x)] = pycdf.CDF(glob.glob('/home/location/instrumentfile'+str(dates[x])+'*.cdf'))
This returns the error: ValueError: pathname must be string-like
The expected result is a dictionary of .cdf files that can be accessed by the names DayCDF1, DayCDF2, etc., and that can be imported no matter what the trailing variable section is.

How about starting with the following code skeleton:
import glob
for file_name in glob.glob('./*.cdf'):
    print(file_name)
    # do something else with the file_name
As for the root cause of the error message you're encountering: the documentation of the class you're trying to use says:
Open or create a CDF file by creating an object of this class.
Parameters:
pathname : string
name of the file to open or create
Based on that, we can infer that it expects a single file name, not a list of file names. glob.glob returns a list, so when you pass its result directly, pycdf.CDF complains as you've observed.
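A minimal sketch of how the skeleton could be adapted to the dictionary from the question, assuming that taking the first match per date is acceptable. The helper name open_daily_files and its opener parameter are hypothetical and not part of spacepy; passing pycdf.CDF as the opener would give CDF objects. The point is that glob.glob returns a list, so a single path is picked from it before anything is opened.

```python
import glob
import os


def open_daily_files(directory, prefix, dates, opener):
    """Open one file per date, whatever the trailing variable section is.

    opener is any callable that accepts a single path string
    (for example pycdf.CDF); it is a parameter so the sketch stays generic.
    """
    opened = {}
    for index, date in enumerate(dates):
        pattern = os.path.join(directory, f"{prefix}{date}*.cdf")
        matches = sorted(glob.glob(pattern))  # a list, possibly empty
        if not matches:
            print(f"No file matches {pattern}")
            continue
        # Hand over ONE path string, never the whole list.
        opened[f"DayCDF{index}"] = opener(matches[0])
    return opened
```

With spacepy installed, a call might look like open_daily_files('/home/location', 'instrumentfile', dates, pycdf.CDF). If several files can exist for one date, storing the whole list of matches per date may be the better choice.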

Related

How to correctly apply a RE for obtaining the last name (of a file or folder) from a given path and print it on Python?

I have written code that creates a dictionary storing the absolute paths of all folders in the current path as keys, and the filenames in each folder as values. This code would only be applied to paths whose folders contain only image files. Here:
import os
import re
# Main method
the_dictionary_list = {}
for name in os.listdir("."):
    if os.path.isdir(name):
        path = os.path.abspath(name)
        print(f'\u001b[45m{path}\033[0m')
        match = re.match(r'/(?:[^\\])[^\\]*$', path)
        print(match)
        list_of_file_contents = os.listdir(path)
        print(f'\033[46m{list_of_file_contents}')
        the_dictionary_list[path] = list_of_file_contents
        print('\n')
print('\u001b[43mthe_dictionary_list:\033[0m')
print(the_dictionary_list)
The thing is that I want this dictionary to store only the last folder names as keys instead of the absolute paths. I was planning to use the regular expression /(?:[^\\])[^\\]*$ to obtain the last name (of a file or folder) from a given path, and then add those last names as keys in the dictionary inside the for loop.
I wanted to test the code above first to see whether it did what I wanted, but it didn't: the value of the match variable was None in each iteration, which didn't make sense to me. Everything else works fine.
So I would like to know what I'm doing wrong here.

I would highly recommend using the standard-library pathlib module. It appears you are interested in the name attribute of a path object (f.name).
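A short sketch of the pathlib approach, assuming the goal is a dictionary keyed by folder name rather than absolute path. The function name list_subfolders is made up for illustration. As background on the None result: re.match only tries to match at the very beginning of the string, so a pattern that starts with / returns None for any path that does not begin with that character (a Windows path such as C:\..., for instance).

```python
from pathlib import Path


def list_subfolders(root="."):
    """Map each immediate subfolder's name (not its full path) to its entries."""
    contents = {}
    for entry in Path(root).iterdir():
        if entry.is_dir():
            # entry.name is the last path component, e.g. "photos"
            contents[entry.name] = sorted(child.name for child in entry.iterdir())
    return contents
```

For a path that is already a string, os.path.basename(path) or Path(path).name gives the same last component without any regular expression.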

I decided to rewrite the code above so that it applies only to the current directory (where this program is located), using the folder names themselves as keys.
import os
# Main method
the_dictionary_list = {}
for subdir in os.listdir("."):
    if os.path.isdir(subdir):
        path = os.path.abspath(subdir)
        print(f'\u001b[45m{path}\033[0m')
        list_of_file_contents = os.listdir(path)
        print(f'\033[46m{list_of_file_contents}')
        the_dictionary_list[subdir] = list_of_file_contents
        print('\n')
print('\033[1;37;40mThe dictionary list:\033[0m')
for subdir in the_dictionary_list:
    print('\u001b[43m'+subdir+'\033[0m')
    for archivo in the_dictionary_list[subdir]:
        print(" ", archivo)
print('\n')
print(the_dictionary_list)
This is useful when the user wants to run the program with a double click in a specific location (my personal case).

Rename directory with constantly changing name

I created a script that is supposed to download some data and then run a few processes. The data source (ArcGIS Online) always delivers the data as a zip file, and when it is extracted the folder name is a series of letters and numbers. I noticed that these names occasionally change (not entirely sure why). My thought is to run os.listdir to get the folder name and then rename it. Where I run into issues is that the list returns the folder name with brackets and quotes: it comes back as ['f29a52b8908242f5b1f32c58b74c063b.gdb'], while the folder in the file explorer does not have the brackets and quotes. Below is my code and the error I receive.
import os
from zipfile import ZipFile
file_name = "THDNuclearFacilitiesBaseSandboxData.zip"
with ZipFile(file_name) as zip:
    # unzipping all the files
    print("Unzipping "+ file_name)
    zip.extractall("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
    print('Unzip Complete')
#removes old zip file
os.remove(file_name)
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(str(x), "Test.gdb")
Output:
FileNotFoundError: [WinError 2] The system cannot find the file specified: "['f29a52b8908242f5b1f32c58b74c063b.gdb']" -> 'Test.gdb'
I'm relatively new to python scripting, so if there is an easier alternative, that would be great as well. Thanks!

os.listdir() returns a list of the files/objects that are in a folder.
Lists are represented, when printed to the screen, using a set of brackets.
The name of each file is a string of characters, and strings are represented, when printed to the screen, using quotes.
So we are seeing a list with a single filename:
['f29a52b8908242f5b1f32c58b74c063b.gdb']
To access an item within a list in Python, you can use index notation (which also happens to use brackets, telling Python which item in the list to use by referencing the index, or number, of the item).
Python list indexes start at zero, so to get the first (and in this case only) item in the list, you can use x[0].
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(x[0], "Test.gdb")
Having said that, I would generally not use x as a variable name in this case... I might write the code a bit differently:
files = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(files[0], "Test.gdb")
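One caveat worth sketching: os.listdir returns bare names relative to the directory it was given, so the rename only finds the folder if the script's working directory happens to be that same directory. The hypothetical helper below (rename_only_entry is not a standard function) joins the directory back onto the name and also checks that exactly one entry is present.

```python
import os


def rename_only_entry(directory, new_name):
    """Rename the single entry found in directory to new_name, in place."""
    entries = os.listdir(directory)
    if len(entries) != 1:
        raise ValueError(f"Expected exactly one entry, found {len(entries)}")
    source = os.path.join(directory, entries[0])  # full path to the old name
    target = os.path.join(directory, new_name)    # keep it in the same folder
    os.rename(source, target)
    return target
```

Raising an error when the directory holds zero or several entries is a design choice for this sketch; picking an entry by extension (for example names ending in .gdb) would be another reasonable option.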

Square brackets indicate a list. Try x[0]; that should get rid of the brackets and give you just the data.
The return value from listdir may be a list with only one value or a whole bunch.

Accommodating variable filename for python script

I have to read multiple files, which I will be treating as input for my Python script. But the input files may have variable names depending on the time they were generated.
File1: RM_Sales_Japan_2011201920191124194200.xlsx
File2: RM_Volume_Australia_201120192019154321194200.xlsx
How can I accommodate these changes when reading a file, instead of specifying the exact filename every time we run the script?
Things I tried:
I have used the method below in my previous scripts because there was only one file with a known extension:
xlsxfile = "*.xlsx"
filelocation = "/user/script/" + xlsxfile
But with multiple files with the same extension, I am not sure how to write the definition.
EDIT1:
I was trying to get more clarity on using glob with read_excel. Please see my example code below:
import os
import glob
import pandas as pd
os.chdir ('D:\\Users\\RMoharir\\Downloads\\Smart Spend\\Input')
fls=glob.glob("Medical*.*")
df1 = pd.read_excel(fls, parse_cols = 'A:H', skiprows = 10, header = None)
But this gives me an error:
ValueError: Invalid file path or buffer object type: <class 'list'>
Any help is appreciated.

If you simply need to find all the files that match a given pattern in a directory, os and re modules have you covered.
import os
import re
files = os.listdir()
for file in files:
    if re.match(r".*\.xlsx$", file):
        print(file)
This short program will print out every file in the current directory whose name ends with .xlsx. If you need to match a more complicated pattern, you may need to read up on regular expressions.
Note that os.listdir takes an optional string argument specifying which path to look in; if it is not given, it looks in the current working directory.
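As an alternative to os.listdir plus re, glob can do the pattern matching directly. The sketch below uses a made-up helper name, find_matching_files, and returns full paths. It also relates to the error in the question's edit: glob.glob returns a list, while a reader such as pd.read_excel expects one path at a time, so the list has to be looped over.

```python
import glob
import os


def find_matching_files(directory, pattern):
    """Return the full paths in directory whose names match a shell-style pattern."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def group_by_prefix(paths, prefixes):
    """Pick, for each prefix, the paths whose file name starts with it."""
    grouped = {prefix: [] for prefix in prefixes}
    for path in paths:
        name = os.path.basename(path)
        for prefix in prefixes:
            if name.startswith(prefix):
                grouped[prefix].append(path)
    return grouped
```

Each returned path can then be passed individually to whatever reader is in use, for example one pd.read_excel call per path inside a for loop, rather than passing the list itself.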

How to read a file from a directory and convert it to a table?

I have a class that takes in positional arguments (startDate, endDate, unmappedDir, and fundCodes), and I have the following methods.
The method below is supposed to take in an array of fundCodes, look in a directory, and see if it finds files matching a certain format:
def file_match(self, fundCodes):
    # Get a list of the files in the unmapped directory
    files = os.listdir(self.unmappedDir)
    # loop through all the files and search for matching fund code
    for check_fund in fundCodes:
        # set a file pattern
        file_match = 'unmapped_positions_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)
        # look in the unmappeddir and see if there's a file with that name
        if file_match in files:
            # if there's a match, load unmapped positions as etl
            return self.read_file(file_match)
        else:
            Logger.error('No file found with those dates/funds')
The other method is simply supposed to create an etl table from that file.
def read_file(self, filename):
    loadDir = Path(self.unmappedDir)
    for file in loadDir.iterdir():
        print('*' *40)
        Logger.info("Found a file : {}".format(filename))
        print(filename)
        unmapped_positions_table = etl.fromcsv(filename)
        print(unmapped_positions_table)
        print('*' * 40)
        return unmapped_positions_table
When running it, I'm able to retrieve the filename:
Found a file : unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv
unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv
But when trying to create the table, I get this error:
FileNotFoundError: [Errno 2] No such file or directory: 'unmapped_positions_PUPSFF_2018-07-01_2018-07-11.csv'
Is it expecting a full path to the filename or something?

The proximate problem is that you need a full pathname.
The filename that you're trying to call fromcsv on is passed into the function, and ultimately came from listdir(self.unmappedDir). This means it's a path relative to self.unmappedDir.
Unless that happens to also be your current working directory, it's not going to be a valid path relative to the current working directory.
To fix that, you'd want to use os.path.join(self.unmappedDir, filename) instead of just filename. Like this:
return self.read_file(os.path.join(self.unmappedDir, file_match))
Or, alternatively, you'd want to use pathlib objects instead of strings, as you do with the for file in loadDir.iterdir(): loop. If file_match is a Path instead of a dumb string, then you can just pass it to read_file and it'll work.
But, if that's what you actually want, you've got a lot of useless code. In fact, the entire read_file function should just be one line:
def read_file(self, path):
    return etl.fromcsv(path)
What you're doing instead is looping over every file in the directory, then ignoring that file and reading filename, and then returning early after the first one. So, if there's 1 file there, or 20 of them, this is equivalent to the one-liner; if there are no files, it returns None. Either way, it doesn't do anything useful except to add complexity, wasted performance, and multiple potential bugs.
If, on the other hand, the loop is supposed to do something meaningful, then you should be using file rather than filename inside the loop, and you almost certainly shouldn't be doing an unconditional return inside the loop.
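The pathlib alternative mentioned above could look roughly like this. It is only a sketch: find_fund_file is a hypothetical stand-alone function rather than the asker's method, and it returns the full path so the caller decides how to load it.

```python
from pathlib import Path


def find_fund_file(unmapped_dir, fund_codes, start_date, end_date):
    """Return the full path of the first matching file, or None if none exists."""
    directory = Path(unmapped_dir)
    for fund in fund_codes:
        name = f"unmapped_positions_{fund}_{start_date}_{end_date}.csv"
        candidate = directory / name  # directory and file name joined
        if candidate.is_file():
            return candidate
    return None
```

Because the returned Path already includes the directory, passing str(path) to the CSV loader no longer depends on the current working directory.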

With this:
files = os.listdir(self.unmappedDir)
you're getting only the bare file names inside self.unmappedDir.
So when you get a match on the name you generated, you have to read the file by passing the full path (otherwise the routine looks for the file relative to the current directory):
return self.read_file(os.path.join(self.unmappedDir,file_match))
Aside: use a set here:
files = set(os.listdir(self.unmappedDir))
so the filename lookup will be much faster than with a list.
Your read_file method (which I didn't see earlier) should just open the file instead of scanning the directory again (it returns at the first iteration anyway, so the loop doesn't make sense):
def read_file(self, filepath):
    print('*' *40)
    Logger.info("Found a file : {}".format(filepath))
    print(filepath)
    unmapped_positions_table = etl.fromcsv(filepath)
    print(unmapped_positions_table)
    print('*' * 40)
    return unmapped_positions_table
Alternately, don't change your main code (except for the set part), and prepend the directory name in read_file since it's an instance method so you have it handy.

Attempting to read data from multiple files to multiple arrays

I would like to be able to read data from multiple files in one folder into multiple arrays and then perform analysis on these arrays, such as plotting graphs. I am currently having trouble reading the data from these files into multiple arrays.
My solution process so far is as follows:
import numpy as np
import os
#Create an empty list to read filenames to
filenames = []
for file in os.listdir('C\\folderwherefileslive'):
    filenames.append(file)
This works so far. What I'd like to do next is to iterate over the filenames in the list using numpy.genfromtxt.
I'm trying to use os.path.join to put the individual list entry at the end of the path specified in listdir earlier. This is some example code:
for i in filenames:
    file_name = os.path.join('C:\\entryfromabove','i')
    'data_'+[i] = np.genfromtxt('file_name',skiprows=2,delimiter=',')
This piece of code returns "Invalid syntax".
To sum up the solution process I'm trying to use so far:
1. Use os.listdir to get all the filenames in the folder I'm looking at.
2. Use os.path.join to direct np.genfromtxt to open and read data from each file to a numpy array named after that file.
I'm not experienced with Python by any means, so any tips or questions on what I'm trying to achieve are welcome.

For this kind of task you'd want to use a dictionary.
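import os
import numpy as np

folder = 'C:\\folderwherefileslive'
data = {}
for file in os.listdir(folder):
    # build the full path from the folder and the bare file name
    path = os.path.join(folder, file)
    # skip_header replaces the skiprows argument, which newer NumPy versions no longer accept here
    data[file] = np.genfromtxt(path, skip_header=2, delimiter=',')
# now you could, for example, access
data['foo.txt']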
Notice that everything you put within single or double quotes ends up being a character string, so 'file_name' will just be some characters, whereas using file_name (without quotes) uses the value stored in the variable of that name.
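To illustrate the dictionary idea without depending on NumPy, here is a sketch that swaps np.genfromtxt for the standard-library csv module. The function name load_folder and the skip_rows parameter are assumptions made for this example, and it presumes every file contains only comma-separated numbers after the skipped header lines.

```python
import csv
import os


def load_folder(folder, skip_rows=2):
    """Read every file in folder into a dict keyed by file name."""
    data = {}
    for file_name in sorted(os.listdir(folder)):
        path = os.path.join(folder, file_name)  # variable, not the string 'path'
        if not os.path.isfile(path):
            continue  # ignore sub-directories
        with open(path, newline="") as handle:
            rows = list(csv.reader(handle))[skip_rows:]
        data[file_name] = [[float(value) for value in row] for row in rows]
    return data
```

The file name serves as the key, so data['foo.txt'] plays the role that a dynamically named variable such as data_foo was meant to play in the question.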
