I'd like to know how to import a dataframe based on part of the file name.
I have a file like 'Report_Lineup_Export_20220809_1354.xls', where the numbers are the date and time when the file was downloaded.
I'm importing it like:
lineup_source = pd.read_excel('C:/Users/fernandom/OneDrive/08_Scripts/01_Python/Report_Lineup_Export_20220809_1354.xls')
I want my code to read whatever is in the folder that starts with 'Report_Lineup_Export', ignoring the last bit.
Thanks.
You can use the glob module:
from glob import glob
import os
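_mydir = 'C:/Users/fernandom/OneDrive/08_Scripts/01_Python'  # folder from the question
for filename in glob(os.path.join(_mydir, 'Report_Lineup_Export*.xls')):
    do_something_with_filename(filename)  # placeholder for your own processing
Note that you can drop .xls from the pattern if you'd like to match every kind of file starting with that initial string.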
Adding to the above answer, you can do it with the pathlib module in Python.
from pathlib import Path
for file in Path('some/dir').glob('Report_Lineup_Export*.xls'):
    do_something(file)
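If several exports end up in the same folder, the glob results can be narrowed to a single file. Below is a minimal sketch, assuming the file names always end in the YYYYMMDD_HHMM stamp shown in the question, so that sorting by name also sorts by time. The helper name find_latest_report is made up for illustration.

```python
from pathlib import Path


def find_latest_report(folder, prefix="Report_Lineup_Export"):
    """Return the newest file in `folder` whose name starts with `prefix`."""
    # The pattern matches any suffix and extension after the prefix.
    matches = sorted(Path(folder).glob(f"{prefix}*"), key=lambda p: p.name)
    if not matches:
        raise FileNotFoundError(f"no file starting with {prefix!r} in {folder}")
    # Assumption: names end in YYYYMMDD_HHMM, so the last name in sorted
    # order is also the most recent export.
    return matches[-1]
```

The returned path can then be passed straight to pd.read_excel, which accepts Path objects.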
I'm writing a script to import a csv using pandas, then upload it to a SQL server. It all works when I have a test file with one name, but that won't be the case in production. I need to figure out how to import with only a semi-known filename (the filename will always follow the same convention, but there will be differences in the filenames such as dates/times). I should, however, note that there will only ever be one csv file in the folder it's checking. I have tried the wildcard in the import path, but that didn't work.
After it's imported, I also then need to return the filename.
Thanks!
Look into the os module:
import os
files = os.listdir("./")
csv_files = [filename for filename in files if filename.endswith(".csv")]
csv_files is a list of all the files that end with .csv.
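Since the question says the folder will only ever hold one CSV and the filename has to be returned afterwards, the listing can be wrapped in a small helper. This is a sketch under that assumption; find_single_csv is a hypothetical name, and it raises instead of guessing when the folder does not contain exactly one CSV.

```python
from pathlib import Path


def find_single_csv(folder="."):
    """Return the one CSV file in `folder`, or raise if there is not exactly one."""
    csv_files = [p for p in Path(folder).iterdir()
                 if p.is_file() and p.suffix.lower() == ".csv"]
    if len(csv_files) != 1:
        raise ValueError(
            f"expected exactly one CSV in {folder}, found {len(csv_files)}"
        )
    return csv_files[0]
```

With csv_path = find_single_csv("./"), the file can be loaded with pd.read_csv(csv_path), and csv_path.name gives the filename to return.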
I have a script I run daily to compile a bunch of spreadsheets into one. After a year of running, one of the filenames changed because the file was produced 14 seconds later. I build the filename like this:
uproduction = Path(r"\\server\folder\P"+year+month+day+r"235900.xls")
and then df = pd.read_excel(uproduction)
This was working fine until the file name changed to P20210225235914.xls. When I am using a raw string like that, is there a way I can make it pick any file that starts with P20210225*.xls? I can't seem to find exactly what I'm looking for in the docs.
You can use glob:
from glob import glob
glob(r"\\server\folder\P"+year+month+day+"*.xls")
You can use the glob method on the Path:
for file in Path(r'\\server\folder').glob('P20210225*.xls'):  # a raw string cannot end with a backslash
    print(file.name)
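For a daily script, the date part of the pattern can be built from a date object instead of concatenating year, month and day strings. A minimal sketch, assuming the P<YYYYMMDD><time>.xls naming from the question; files_for_day is a hypothetical helper name.

```python
from datetime import date
from pathlib import Path


def files_for_day(folder, day):
    """Return sorted files named P<YYYYMMDD><anything>.xls for the given date."""
    # The format spec inside the f-string is applied via date.strftime.
    pattern = f"P{day:%Y%m%d}*.xls"
    return sorted(Path(folder).glob(pattern))
```

Calling files_for_day(r"\\server\folder", date.today()) returns a list, so the script can check that exactly one file matched before passing it to pd.read_excel.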
I have about 20 CSV files that I need to read in. Is it possible to read in the whole folder instead of doing them individually? I am using Python. Thanks
You can't. The fileinput module almost meets your needs, allowing you to pretend a bunch of files are a single file, but it also doesn't meet the requirements of files for the csv module (namely, that newline translation must be turned off). Just open the files one-by-one and append the results of parsing to a single list; it's not that much more effort. No matter what you do something must "do them individually"; there is no magic to say "read 20 files exactly as if they were one file". Even fobbing off to cat or the like (to concatenate all the files into a single stream you can read from) is just shunting the same file-by-file work elsewhere.
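The "open the files one-by-one and append to a single list" approach described above could look roughly like this. It is a sketch using only the standard library; read_all_csv_rows is a made-up name, and it assumes every .csv file in the folder should be included.

```python
import csv
from pathlib import Path


def read_all_csv_rows(folder):
    """Parse every .csv file in `folder` and return all rows in one list."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        # newline="" turns off newline translation, as the csv module requires.
        with open(path, newline="") as handle:
            rows.extend(csv.reader(handle))
    return rows
```

Note that each file's header row ends up in the combined list; skip the first row of every file after the first if the files share a header.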
You can pull a list of files in Python by using os.listdir. From there, you can loop over your list of files, and generate a list of CSV files:
import os
filenames = os.listdir("path/to/directory/")
csv_files = []
for name in filenames:
    if name.endswith(".csv"):
        csv_files.append(name)
From there, you'll have a list containing every CSV in your directory.
The shortest thing that I can think of is this. It's not a one-liner, because you have to import a few names to keep the main line short:
from os import listdir
from os.path import isfile
from os.path import splitext
from os.path import join
import pandas as pd
source = '/tmp/'
dfs = [
    pd.read_csv(join(source, path))
    for path in listdir(source)
    if isfile(join(source, path)) and splitext(join(source, path))[1] == '.csv'
]
I am looking to pull in a csv file that is downloaded to my downloads folder into a pandas dataframe. Each time it is downloaded it adds a number to the end of the string, as the filename is already in the folder. For example, 'transactions (44).csv' is in the folder, the next time this file is downloaded it is named 'transactions (45).csv'.
I've looked into the glob library or using the os library to open the most recent file in my downloads folder. I was unable to produce a solution. I'm thinking I need some way to connect to the downloads path, find all csv files with the string 'transactions' in the name, and grab the one with the max number in the full filename string.
list(csv.reader(open(path + '/transactions (45).csv')))
I'm hoping for something like this: path + '/%transactions%' + 'max()' + '.csv'. I know the final answer will be completely different, but I hope this makes sense.
Assuming format "transactions (number).csv", try below:
import os
import numpy as np
import pandas as pd
files=os.listdir('Downloads/')
tranfiles=[f for f in files if 'transactions' in f]
Now, your target file is as below:
target_file=tranfiles[np.argmax([int(t.split('(')[1].split(')')[0]) for t in tranfiles])]
Read that desired file as below:
df=pd.read_csv('Downloads/'+target_file)
One option is to use regular expressions to extract the numerically largest file ID and then construct a new file name:
import re
import glob
last_id = max(int(re.findall(r" \(([0-9]+)\)\.csv", x)[0])
              for x in glob.glob("transactions*.csv"))
name = f'transactions ({last_id}).csv'
Alternatively, find the newest file directly by its modification time (for example, with os.path.getmtime).
Note that you should not use a CSV reader to read CSV files in Pandas. Use pd.read_csv() instead.
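The modification-time alternative mentioned above is not spelled out, so here is one way it might be done. This is a sketch that assumes the most recently downloaded file is also the most recently modified one; newest_transactions_file is a hypothetical helper name.

```python
import glob
import os


def newest_transactions_file(folder):
    """Return the most recently modified 'transactions*.csv' in `folder`."""
    candidates = glob.glob(os.path.join(folder, "transactions*.csv"))
    if not candidates:
        raise FileNotFoundError(f"no transactions CSV found in {folder}")
    # os.path.getmtime returns the last-modification time in seconds.
    return max(candidates, key=os.path.getmtime)
```

The result is a full path, so it can be handed directly to pd.read_csv without rebuilding the file name.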
The Python script I am writing should be able to work in any folder I put it in.
I have to find all the .avi files. If the file name ends with cropped.avi then I have to rename it and save it as .avi, without the word cropped.
import os
import glob
os.getcwd()
glob.glob('*.avi')
directory = glob.glob('*.avi')
for directory:
    if i.endswith("cropped.avi"):
        os.rename("cropped.avi",".avi")
I think my code is missing something and I don't know what to do!!
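The for loop syntax was incorrect, and os.rename needs the actual file name rather than the literal string "cropped.avi". This works:
import os
import glob
directory = glob.glob('*.avi')
for i in directory:
    if i.endswith("cropped.avi"):
        # strip the trailing "cropped.avi" and put the extension back
        os.rename(i, i[:-len("cropped.avi")] + ".avi")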
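Because the script is meant to work in any folder, the same idea can be written as a function that takes the folder as an argument. This is a sketch, not the only way to do it; strip_cropped_suffix is a made-up name, and skipping files whose target name already exists is an assumption about the desired behaviour.

```python
from pathlib import Path


def strip_cropped_suffix(folder="."):
    """Rename every '*cropped.avi' in `folder` to the same name without 'cropped'."""
    renamed = []
    # list() takes a snapshot so renames do not affect the iteration.
    for path in list(Path(folder).glob("*cropped.avi")):
        # Drop the trailing 'cropped.avi' and put the '.avi' extension back.
        new_name = path.name[: -len("cropped.avi")] + ".avi"
        if new_name == ".avi":
            continue  # a bare 'cropped.avi' would be left with no name
        target = path.with_name(new_name)
        if target.exists():
            continue  # assumption: never overwrite an existing file
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling strip_cropped_suffix() with no argument works on the current directory, which matches running the script from inside the folder.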