Fast way to read filename from directory?


Given a local directory structure of /foo/bar, and assuming that a given path contains exactly one file (filename and content does not matter), what is a reasonably fast way to get the filename of that single file (NOT the file content)?

Take the first element of os.listdir():
import os
os.listdir('/foo/bar')[0]

Well, I know this code works:
for file in os.listdir('.'):
    pass  # do something with file

You can also use glob:
import glob
print(glob.glob("/path/*")[0])

os.path.basename returns the file name from a path, so if you already know the full path of the file you can call:
import os
os.path.basename("/foo/bar/file.file")
Or you can loop over the files in the folder and read all the names:
file_src = "/foo/bar/"
for x in os.listdir(file_src):
    print(os.path.basename(x))
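
Since the question assumes the directory holds exactly one file, another option is os.scandir, which yields entries lazily, so the loop can stop after the first one. This is only a sketch: the helper name single_filename is made up for illustration, and whether it is measurably faster than os.listdir depends on the file system and directory size.

```python
import os

def single_filename(directory):
    """Return the name of the first entry in `directory`, or None if it is empty."""
    # os.scandir is an iterator, so only the first entry is fetched here
    with os.scandir(directory) as entries:
        first = next(entries, None)
    return first.name if first is not None else None
```

Called as single_filename('/foo/bar'), it returns the bare file name, not the full path.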

Related

python: set file path to only point to files with a specific ending

I am trying to run a program which requires pVCF files alone as inputs. Due to the size of the data, I am unable to create a separate directory containing the particular files that I need.
The directory contains multiple files with 'vcf.gz.tbi' and 'vcf.gz' endings. Using the following code:
file_url = "file:///mnt/projects/samples/vcf_format/*.vcf.gz"
I tried to create a file path that only grabs the '.vcf.gz' files while excluding the '.vcf.gz.tbi' files, but I have been unsuccessful.

The code you have, as written, just assigns your file path to the variable file_url. For something like this, glob is popular but isn't the only option:
import glob, os
# os.chdir expects a local directory path, not a file:// URL
vcf_dir = "/mnt/projects/samples/vcf_format/"
os.chdir(vcf_dir)
for file in glob.glob("*.vcf.gz"):
    print(file)
Note that the directory path doesn't specify the kind of file you want (in this case, a gzipped VCF); the pattern passed to glob does that.
Check out this answer for more options.
It took some digging, but it looks like you're trying to use the import_vcf function of Hail. To put the files in a list so that it can be passed as input:
import glob, os
import hail as hl
vcf_dir = "/mnt/projects/samples/vcf_format"
def get_vcf_list(path):
    # glob works on the local directory; the file:// scheme is added back for Hail
    vcf_list = []
    for file in glob.glob(os.path.join(path, "*.vcf.gz")):
        vcf_list.append("file://" + file)
    return vcf_list
# Now you pass 'get_vcf_list(vcf_dir)' as your input instead of 'file_url'
mt = hl.import_vcf(get_vcf_list(vcf_dir), force_bgz=True, reference_genome="GRCh38", array_elements_required=False)
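
To see why the "*.vcf.gz" pattern leaves out the index files, here is a small check using fnmatch, which applies the same shell-style wildcard rules to plain strings. The file names in the list are hypothetical and nothing is read from disk.

```python
import fnmatch

# Hypothetical directory listing, for illustration only
names = ["a.vcf.gz", "a.vcf.gz.tbi", "b.vcf.gz", "notes.txt"]

# The pattern must match the whole name, so names ending in .tbi are excluded
vcf_only = fnmatch.filter(names, "*.vcf.gz")
print(vcf_only)
```

The pattern has to match the entire name, so 'a.vcf.gz.tbi' is dropped because it does not end in '.vcf.gz'.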

Find efficiently a file with unknown extension

I have a problem that feels easy, but I cannot come up with a satisfying solution.
I have a file structure with a directory containing a very large number of files. The file names are just their index with an unknown extension. For example, the 10th file is "10.pdf" and the 42nd file is "42.png". There can be many different extensions.
I need to access the i-th file from python, given index i but not knowing the extension. This will happen a lot, so I should be able to do it efficiently.
Here are the partial solutions I could think about:
I can glob the pattern f"{i}.*"
However, I think glob will check every file in the directory? This will be very slow for a large number of files.
I can save and preload the full name in a dict, in a JSON file like {..., 10: "10.pdf", ...}
This works, but I have to load and keep track of another heavy object. This feels wrong somehow...
If I have a list of all allowed extensions, I can just test all possibilities. This feels weird and unnecessary, but that's my best guess for now.
What do you think? Is one of these proposals the correct way to do it?

As I understand it, you only need the file name without the extension. One way is to split the extension off each file name, for example:
import os
path = r"Enter your folder's path here"
file_dict = {}
for file in os.listdir(path):
    # os.listdir returns both files and folders, and only bare names, so join with path before testing
    if os.path.isfile(os.path.join(path, file)):
        file_name, ext = os.path.splitext(file)
        print(file_name, ext)
For example, if your file is '10.pdf' then file_name='10' and ext='.pdf'. Inside the loop, you can then add it to a dictionary for later lookups:
        file_dict[file_name] = os.path.join(path, file)
Another way is to use regular expressions (the re module), which is handy if your names follow a pattern, even a complex one. You write the pattern you want, for example:
import os
import re
path = r"Enter your folder's path here"
file_dict = {}
for file in os.listdir(path):
    if os.path.isfile(os.path.join(path, file)):
        # group 1 is the name, group 2 the extension including the dot
        mo = re.search(r'(.*)(\..*)', file)
        if mo:  # names without a dot do not match
            file_name, ext = mo.groups()
            print(file_name, ext)
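
For the repeated lookups the question describes, one possible approach is to scan the directory once and keep the resulting mapping in memory, so each later access is a plain dict lookup. The sketch below assumes every index appears with only one extension; the function name build_index is made up for illustration.

```python
import os

def build_index(directory):
    """Map each file stem (e.g. '10') to its full name (e.g. '10.pdf')."""
    index = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():  # skip sub-directories
                stem, _ext = os.path.splitext(entry.name)
                index[stem] = entry.name
    return index
```

After index = build_index(path), the i-th file is index[str(i)]. The scan still touches every entry, but only once rather than on every lookup.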

Check if mp3 file is in given directory

I can't see what I am doing wrong here:
import os
def check():
    path = "d://Mu$ic//"
    for file in os.listdir(path):
        if file.endswith("*.mp3"):
            print(os.path.join(path, file))
check()
No output shows up. Any idea what I am doing wrong?

Omitting the * should work, since endswith matches on a string suffix (as opposed to *.mp3 which is a wildcard):
if file.endswith(".mp3"):
Also, you could replace // with just /.
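
As a small illustration of the suffix test described above, the sketch below also lowercases the name first so that files such as 'LOUD.MP3' are matched. The helper name is_mp3 and the sample names are hypothetical.

```python
def is_mp3(filename):
    """Return True if the name ends with .mp3, ignoring case."""
    return filename.lower().endswith(".mp3")

# Hypothetical file names, for illustration only
names = ["song.mp3", "LOUD.MP3", "cover.jpg", "mp3"]
matches = [n for n in names if is_mp3(n)]
print(matches)
```

Only 'song.mp3' and 'LOUD.MP3' pass the test; no wildcard character is involved.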

Are there only mp3 files in the folder? If so, why do you need the endswith check? If mp3 is the only file type in that folder, the loop will return the files without checking whether they end with .mp3. You only need the endswith check if there are other file types in there. I ran your code here without endswith and it returned every .wav file in the folder.

If you are using Python 3.5 or later, you can also use glob with recursive=True:
import glob
_path = '{0}**/*.mp3'.format("d://Mu$ic//")
file_lst = glob.glob(_path, recursive=True)
print(file_lst)
['2.mp3', '3.mp3', ...]

how to get name of a file in directory using python

There is an mkv file in a folder named "export". What I want to do is to make a python script which fetches the file name from that export folder.
Let's say the folder is at "C:\Users\UserName\Desktop\New_folder\export".
How do I fetch the name?
I tried using os.path.basename and os.path.splitext, but it didn't work out like I expected.

os.path implements some useful functions on pathnames, but it doesn't have access to the contents of the path. For that purpose, you can use os.listdir.
The following command will give you a list of the contents of the given path (note the raw string, so the backslashes are not treated as escapes):
os.listdir(r"C:\Users\UserName\Desktop\New_folder\export")
Now, if you just want .mkv files, you can use the fnmatch module (which provides support for Unix shell-style wildcards) to get the file names you expect:
import fnmatch
import os
print([f for f in os.listdir(r"C:\Users\UserName\Desktop\New_folder\export") if fnmatch.fnmatch(f, '*.mkv')])
Also, as @Padraic Cunningham mentioned, a more Pythonic way of dealing with file names is the glob module (with pth set to the folder path plus a trailing separator):
import glob
map(os.path.basename, glob.iglob(pth + "*.mkv"))

You can use glob:
from glob import glob
pth = "C:/Users/UserName/Desktop/New_folder/export/"
print(glob(pth + "*.mkv"))
pth + "*.mkv" will match all the files ending with .mkv.
To just get the basenames you can use map or a list comp with iglob:
from glob import iglob
from os import path
print(list(map(path.basename, iglob(pth + "*.mkv"))))
print([path.basename(f) for f in iglob(pth + "*.mkv")])
iglob returns an iterator, so you don't build a list for no reason.

I assume you're basically asking how to list files in a given directory. What you want is:
import os
print(os.listdir(r"C:\Users\UserName\Desktop\New_folder\export"))
If there are multiple files and you want the one(s) ending in .mkv, you could do:
import os
files = os.listdir(r"C:\Users\UserName\Desktop\New_folder\export")
mkv_files = [f for f in files if f.endswith(".mkv")]
print(mkv_files)

If you need a recursive folder search, you can get the file names with os.walk; the code below also gives you each file's directory (path).
import os, fnmatch
for path, dirs, files in os.walk(os.path.abspath(r"C:/Users/UserName/Desktop/New_folder/export/")):
    for filename in fnmatch.filter(files, "*.mkv"):
        print(filename)

You can use glob:
import glob
import os
for file in glob.glob(r'C:\Users\UserName\Desktop\New_folder\export\*.mkv'):
    print(os.path.basename(file))
This will list all the files with the extension .mkv, such as file.mkv, file2.mkv and so on.

With os.walk you can collect the file names of each directory as a list:
import os
files = [file_names for _, _, file_names in os.walk(DIRECTORY_PATH)]
for file_name in files[0]:  # files is a list of lists; files[0] holds the names in the top-level directory
    print(file_name)
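
For completeness, the same lookup can be sketched with pathlib, which returns path objects whose name attribute is already the bare file name. The helper name mkv_names is made up for illustration, and the result is sorted only to make the order predictable.

```python
from pathlib import Path

def mkv_names(directory):
    """Return the names of the .mkv files directly inside `directory`."""
    # Path.glob is not recursive unless the pattern contains '**'
    return sorted(p.name for p in Path(directory).glob("*.mkv"))
```

For the folder in the question, this would be called as mkv_names(r"C:\Users\UserName\Desktop\New_folder\export").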

Find directory of a file in Python

If I have the following file:
file = '/Users/david542/Desktop/work.txt'
I can use os.path.basename(file) to get the file name.
What command would I use to get the directory of the file (i.e., to get "/Users/david542/Desktop") ?

os.path.dirname(file) returns the directory of the passed file name. Alternatively, you can use os.path.split(file) which will give you a tuple containing the directory name and the file name in one call.
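
A short sketch of both calls on the path from the question, to show how they relate:

```python
import os

file = "/Users/david542/Desktop/work.txt"

# split returns (directory, file name) in one call
directory, name = os.path.split(file)
print(directory)
print(name)

# dirname gives the same first element on its own
same_directory = os.path.dirname(file)
```

Here directory and same_directory hold the same string, and name matches what os.path.basename(file) returns.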

>>> os.path.dirname(os.path.realpath('/Users/david542/Desktop/work.txt'))

os.path.dirname(file) returns the directory name.
import os
print(os.path.dirname("c:/windows/try.txt"))

I think you're searching for os.path.dirname. Otherwise you could use os.path.split which returns the path and the filename in a tuple.
