I have a script I run daily to compile a bunch of spreadsheets into one. After a year of running, one of the filenames changed because the file was produced 14 seconds later. I read the filename in like this:
uproduction = Path(r"\\server\folder\P"+year+month+day+r"235900.xls")
and then df = pd.read_excel(uproduction)
This was working fine until the file name changed to P20210225235914.xls. When I am using a raw string like that, is there a way I can make it pick any file that matches P20210225*.xls? I can't seem to find exactly what I'm looking for in the docs.
You can use glob:
from glob import glob
glob(r"\\server\folder\P"+year+month+day+"*.xls")
You can use the glob method on the Path:
for file in Path(r'\\server\folder').glob('P20210225*.xls'):
    print(file.name)
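The two answers above can be combined into a small helper that builds the pattern from the date parts and insists on exactly one match. This is only a sketch: the function name `find_daily_file` and the "exactly one file per day" rule are assumptions, not something stated in the question.

```python
from pathlib import Path


def find_daily_file(folder, year, month, day):
    """Return the single file in `folder` named P<year><month><day>*.xls.

    Hypothetical helper: it assumes one report per day and raises otherwise.
    """
    pattern = f"P{year}{month}{day}*.xls"
    matches = sorted(Path(folder).glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected one file matching {pattern}, found {len(matches)}"
        )
    return matches[0]
```

The returned path can be passed straight to pd.read_excel, so the timestamp part of the name no longer has to be known in advance.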
I have to read multiple files which I will be treating as input for my Python script. But the input files may have variable names depending on the time they were generated.
File1: RM_Sales_Japan_2011201920191124194200.xlsx
File2: RM_Volume_Australia_201120192019154321194200.xlsx
How to accommodate these changes while reading a file instead of exactly specifying the filename every time we run the script?
Things i tried:
I have used below method in my previous scripts because it had only one file with known extension:
xlsxfile = "*.xlsx"
filelocation = "/user/script/" + xlsxfile
But with multiple files with similar extension i am not sure how to get the definition done.
EDIT1:
I was trying to get more clarity on using glob with read_excel. Please see my example code below:
import os
import glob
import pandas as pd
os.chdir ('D:\\Users\\RMoharir\\Downloads\\Smart Spend\\Input')
fls=glob.glob("Medical*.*")
df1 = pd.read_excel(fls, parse_cols = 'A:H', skiprows = 10, header = None)
But this gives me an error:
ValueError: Invalid file path or buffer object type: <class 'list'>
Any help is appreciated.
If you simply need to find all the files that match a given pattern in a directory, os and re modules have you covered.
import os
import re
files = os.listdir()
for file in files:
    if re.match(r".*\.xlsx$", file):
        print(file)
This short program will print out every file in the current directory whose name ends with .xlsx. If you need to match a more complicated pattern, you may need to read up on regular expressions.
Note that os.listdir takes an optional string argument naming the path to look in; if it is not given, it looks in the current working directory.
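The os and re approach above can be wrapped in a function that takes the directory and the pattern as arguments. This is a minimal sketch; the name `matching_files` and the choice to return full, sorted paths are assumptions made for illustration.

```python
import os
import re


def matching_files(directory, pattern):
    """Return sorted full paths of entries in `directory` whose whole name
    matches the regular expression `pattern`."""
    regex = re.compile(pattern)
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if regex.fullmatch(name)
    )
```

For example, matching_files(".", r"RM_Sales_.*\.xlsx") would pick up the sales file whatever its timestamp. Since read_excel rejects a list (the ValueError in the question), each returned path has to be passed to it one at a time in a loop.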
I've already posted here with the same question, but sadly I couldn't come up with a solution (some of you gave me awesome answers, but most of them weren't what I was looking for), so I'll try again, this time giving more information about what I'm trying to do.
So, I'm using a program called GMAT to get some outputs (.txt files with numerical values). These outputs have different names, but because I'm using them for more than one thing I'm getting something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now, what I need to do is to use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files I want to rename them as I don't want to use them like './path/etc'.
So what I wanted was to write a loop that could get these files and rename them inside the script so I can use these files with the new name in other functions (outside the loop).
So instead of having to do this individually:
GMATds1= './path/GMATd_1.txt'
GMATds2= './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch
examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work but I can't use the dictionary key outside the loop.
If I understand correctly, you try to fetch files with similar names (at least a re-occurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os
all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    new_path = create_new_path(file)  # possibly split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library allows for searching files with * wildcards and makes it hence possible to search for files with a specific pattern. It lists all the files in a certain directory (or multiple directories if you include a * wildcard as a directory). When you iterate over the files, you could either directly work with the input of the files (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you would need to generate a new path - so you would have to write the create_new_path function that takes the old path and creates a new one.
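What create_new_path does is left open above. One possible version is sketched here, assuming the naming scheme hinted at in the question (GMATd_1.txt becomes GMATds1.txt); both the scheme and the optional `new_dir` parameter are assumptions.

```python
import os


def create_new_path(old_path, new_dir=None):
    """Hypothetical helper: turn '.../GMATd_1.txt' into '.../GMATds1.txt'.

    The last underscore in the name is replaced by 's'. Names without an
    underscore are left unchanged. If `new_dir` is given, the result is
    placed in that directory instead of the original one.
    """
    directory, name = os.path.split(old_path)
    stem, ext = os.path.splitext(name)
    prefix, sep, number = stem.rpartition("_")
    new_name = f"{prefix}s{number}{ext}" if sep else name
    target_dir = new_dir if new_dir is not None else directory
    return os.path.join(target_dir, new_name)
```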
Since python 3.4 you should be using the built-in pathlib package instead of os or glob.
from pathlib import Path
import shutil
for file_src in Path("path/to/files").glob("GMAT*.txt"):
    file_dest = str(file_src.resolve()).replace("ds", "d_")
    shutil.move(file_src, file_dest)
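If the goal is only to refer to the outputs by short names inside the script, rather than to rename anything on disk, a dictionary built once and kept in a variable may be enough. A minimal sketch follows; the function name `index_outputs` and the use of the file stem as the key are assumptions.

```python
from pathlib import Path


def index_outputs(directory, pattern="GMAT*.txt"):
    """Map each matching file's stem (e.g. 'GMATd_1') to its Path."""
    return {path.stem: path for path in sorted(Path(directory).glob(pattern))}
```

After outputs = index_outputs("./path"), an entry such as outputs["GMATd_1"] can be used in any later function, which addresses the "can't use the dictionary key outside the loop" problem from the question.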
you can use
import os
path='.....' # path where these files are located
path1='.....' ## path where you want these files to store
i=1
for file in os.listdir(path):
    if file.endswith('.txt'):  # str.endswith does not accept keyword arguments
        os.rename(path + "/" + file, path1 + "/" + str(i) + ".txt")
        i += 1
It will move every .txt file from the source folder into the destination folder, renamed to 1.txt, 2.txt, 3.txt, ..., n.txt.
I have the following files:
/tmp/test_glob/client.log.71.gz
/tmp/test_glob/client.log.63.gz
/tmp/test_glob/client.log.11
/tmp/test_glob/core_dump.log
/tmp/test_glob/client.log.32
/tmp/test_glob/dm.log
/tmp/test_glob/client.log
/tmp/test_glob/client.log.1
/tmp/test_glob/client.log.64.gz
I want to get all .log files, EXCEPT the files, that end with .gz.
The desired result should be the following:
/tmp/test_glob/client.log.11
/tmp/test_glob/core_dump.log
/tmp/test_glob/client.log.32
/tmp/test_glob/dm.log
/tmp/test_glob/client.log
/tmp/test_glob/client.log.1
I have written this simple code:
import glob
import os
glob_pattern = u'*.log*'
for log_path in glob.glob(os.path.join('/tmp/test_glob', glob_pattern)):
    print('log_path: ', log_path)
but it returns all file from folder /tmp/test_glob/
I tried to modify this pattern like this:
glob_pattern = u'*.log.[0-9][0-9]'
but it returns only
/tmp/test_glob/client.log.11
/tmp/test_glob/client.log.32
How to fix this pattern ?
Using Pythex(a Python regex tester), the match string
glob_pattern = u'.*(\.log)(?!.*(gz)).*'
Worked well for your goal.
Try **/*.log!(*.gz)
Test using globster.xyz
That isn't a glob pattern. You don't want glob. You want to use the re module functions to filter the results of os.listdir.
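The last suggestion (filter the results of os.listdir with the re module) can be sketched as follows. The regular expression is an assumption about what counts as a log file here: a name ending in .log, optionally followed by a numeric rotation suffix, which excludes anything ending in .gz.

```python
import os
import re

# Assumed rule: '<anything>.log' optionally followed by '.<digits>'.
LOG_NAME = re.compile(r".*\.log(\.\d+)?")


def uncompressed_logs(directory):
    """Return sorted full paths of log files in `directory`, skipping .gz ones."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if LOG_NAME.fullmatch(name)
    )
```

A simpler alternative under the same assumptions is to keep the original '*.log*' glob and drop every result whose name ends with '.gz'.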
I have a directory containing a number of files with this format:
1 or 2 numbers_S followed by 1 or 2 numbers_L001_R1 or R2_001.fastq
Examples: 1_S1_L001_R1_001.fastq or 14_S14_L001_R2_001.fastq
I want the file names to be like this: 1_R1.fastq 14_R2.fastq
I have figured out the regexp that reflects the file names and can successfully do the search and replace within TextWrangler. Below is the regexp that I came up with:
Search: (\d+)\wS\d+\wL001\w(R\d)\w001(\.fastq)
Replace: \1_\2\3 (or $1_$2$3 depending on the program)
However, I would like to know how to batch rename the files using a simple Python script. I would appreciate any advice.
Thank you!
You can do something like this
import glob, re, os
pattern = r'(\d+)\wS\d+\wL001\w(R\d)\w001(\.fastq)'  # the search regexp from the question
for filename in glob.glob('/some/dir/*.fastq'):
    new_name = re.sub(pattern, r'\1_\2\3', filename)
    os.rename(filename, new_name)
Consider using the package os, from which you can use os.rename(src, dst). The documentation is right here.
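A slightly more defensive variant of the rename is sketched below. It matches only the file name (not the whole path) and skips anything that does not fit the format described in the question. The names `short_name` and `rename_fastq_files` are made up for illustration, and the stricter pattern (one or two digits, R1 or R2) is an assumption taken from the question's description.

```python
import os
import re

# 1-2 digits, _S, 1-2 digits, _L001_, R1 or R2, _001.fastq
FASTQ_NAME = re.compile(r"(\d{1,2})_S\d{1,2}_L001_(R[12])_001\.fastq")


def short_name(filename):
    """Return e.g. '14_R2.fastq' for '14_S14_L001_R2_001.fastq', else None."""
    match = FASTQ_NAME.fullmatch(filename)
    if match is None:
        return None
    return f"{match.group(1)}_{match.group(2)}.fastq"


def rename_fastq_files(directory):
    """Rename every matching file in `directory`; return the new names."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        new_name = short_name(name)
        if new_name is not None:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
            renamed.append(new_name)
    return renamed
```

Trying short_name on a few names first, before calling rename_fastq_files on the real directory, is a cheap way to check the pattern.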
I have written a piece of code which is supposed to read the text inside several files located in a directory. These are basically text files, but they do not have any extension. My code is not able to read them:
import glob
import os
corpus_path = 'Reviews/'
for infile in glob.glob(os.path.join(corpus_path, '*.*')):
    with open(infile, 'r') as f:
        review_file = f.read()
    print(review_file)
To test if this code works, I put a dummy text file, dummy.txt. which worked because it has extension. But i don't know what should be done so files without the extensions could be read.
can someone help me? Thanks
Glob patterns don't work the same way as wildcards on the Windows platform. Just use * instead of *.*. i.e. os.path.join(corpus_path,'*'). Note that * will match every file in the directory - if that's not what you want then you can revise the pattern accordingly.
See the glob module documentation for more details.
Just use * instead of *.*.
The latter requires an extension to be present (more precisely, there needs to be a dot in the filename), the former doesn't.
You could search for * instead of *.*, but this will match every file in your directory.
Fundamentally, this means that you will have to handle cases where the file you are opening is not a text file.
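One way to handle that case is sketched here: glob for '*', skip directories, and skip anything that cannot be decoded as text. The function name `read_text_files` and the UTF-8 default are assumptions; a file that happens to decode cleanly is still treated as text even if it is not meaningful.

```python
from pathlib import Path


def read_text_files(directory, encoding="utf-8"):
    """Return {file name: contents} for every regular file in `directory`
    that decodes cleanly with `encoding`; other entries are skipped."""
    texts = {}
    for path in sorted(Path(directory).glob("*")):
        if not path.is_file():
            continue  # skip sub-directories and other non-files
        try:
            texts[path.name] = path.read_text(encoding=encoding)
        except UnicodeDecodeError:
            continue  # not decodable as text in this encoding
    return texts
```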
it seems that you need
from os import listdir
for filename in (fn for fn in listdir(corpus_path) if '.' not in fn):
    # do something
you could write
from os import listdir
for fn in listdir(corpus_path):
    if '.' not in fn:
        # do something
but the former with a generator spares one indentation level