Change suffix of multiple files - python

I have multiple text files with names containing 6 groups of period-separated digits matching the pattern year.month.day.hour.minute.second.
I want to add a .txt suffix to these files to make them easier to open as text files.
I tried the following code, and I also tried os.rename, without success:

path = os.chdir('realpath')
for f in os.listdir():
    file_name = os.path.splitext(f)
    name = file_name + tuple(['.txt'])
    print(name)

Question
How can I add .txt to the end of these file names?

You have several problems in your script. You should read each function's documentation before using it. Here are some of your mistakes:
os.chdir('realpath') - do you really want to change into a directory literally named 'realpath'? Note also that os.chdir returns None, so path will be None.
os.listdir() - in Python 3 this defaults to the current directory, but it is clearer to pass the path you want explicitly.
print(name) - this will print the new filename, but it will not actually rename the file.
Here is a script that uses a regex to find files whose names are made of 6 groups of digits (corresponding to your pattern year.month.day.hour.minute.second) in the current directory, then adds the .txt suffix to those files with os.rename:
import os
import re

regex = re.compile("[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+")
for filename in os.listdir("."):
    if regex.match(filename):
        os.rename(filename, filename + ".txt")
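For reference, the same rename can be sketched with pathlib; this is a sketch run against a throwaway temporary directory (an assumption standing in for the real folder), with the `$` anchor added so only full date-stamped names match:

```python
import re
import tempfile
from pathlib import Path

# Match 6 dot-separated digit groups, then append ".txt" on rename.
regex = re.compile(r"[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+$")

folder = Path(tempfile.mkdtemp())          # stand-in for the real directory
(folder / "2016.08.24.09.47.16").touch()   # should be renamed
(folder / "notes.md").touch()              # should be left alone

for path in folder.iterdir():
    if regex.match(path.name):
        path.rename(path.with_name(path.name + ".txt"))

print(sorted(p.name for p in folder.iterdir()))
# → ['2016.08.24.09.47.16.txt', 'notes.md']
```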

Related

REGEX \d is not working while reading a glob

What I'm trying to do is find the highest number among the files I have in the folder.
Once the folder path is declared, I tried to add the names of the files that I want to read. In each file name I want to match any number of one or more digits.
import glob

path = "C:/Users/Desktop/Data/noupdated/FEB/batch/csv/"
files = glob.glob(path + "noupdated_\d+_FEB_españa_info_csv.csv")
for file in files:
    print(file)
In this case, it should print the following:
"C:/Users/Desktop/Data/noupdated/FEB/batch/csv/noupdated_0_FEB_españa_info_csv.csv"
"C:/Users/Desktop/Data/noupdated/FEB/batch/csv/noupdated_1_FEB_españa_info_csv.csv"
But instead, it's not printing anything.
Thank you! I'm relatively new with the regex syntax.
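For what it's worth, glob patterns are shell-style wildcards, not regular expressions, so the "\d+" in the pattern is taken literally and matches nothing. One possible workaround (a sketch using a temporary folder as a stand-in for the real path) is to glob broadly with "*" and then filter the results with a real regex:

```python
import glob
import os
import re
import tempfile

# Create a throwaway folder with two matching files and one non-match.
path = tempfile.mkdtemp()
for i in range(2):
    open(os.path.join(path, f"noupdated_{i}_FEB_españa_info_csv.csv"), "w").close()
open(os.path.join(path, "noupdated_x_FEB_españa_info_csv.csv"), "w").close()  # no digits

# glob finds every .csv; the regex then keeps only names with digits.
regex = re.compile(r"noupdated_\d+_FEB_españa_info_csv\.csv$")
files = [f for f in glob.glob(os.path.join(path, "*.csv")) if regex.search(f)]
for file in sorted(files):
    print(file)
```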

Pandas: How to read xlsx files from a folder matching only specific names

I have a folder full of Excel files and I have to read only 3 files from that folder and put them into individual dataframes.
File1: Asterix_New file_Jan2020.xlsx
File2: Asterix_Master file_Jan2020.xlsx
File3: Asterix_Mapping file_Jan2020.xlsx
I am aware of the syntax below, which finds xlsx files in a folder, but I am not sure how to restrict it to specific keywords, in this case names starting with "Asterix_".
files_xlsx = [f for f in files if f[-4:] == "xlsx"]
Also, I am trying to put each of the Excel files into an individual dataframe, but without success:

for i in files_xlsx:
    df[i] = pd.read_excel(files_xlsx[0])
Any suggestions are appreciated.
I suggest using pathlib. If all the files are in a folder:
from pathlib import Path
from fnmatch import fnmatch
folder = Path('name of folder')
Search for the files using glob. I also suggest using fnmatch so that files whose extensions are in capital letters are included.
iterdir allows you to iterate through the files in the folder.
name is an attribute in pathlib that gives you the file's name as a string.
Lowercasing with str.lower ensures that uppercase extensions such as XLSX are also captured.
excel_only_files = [xlsx for xlsx in folder.iterdir()
                    if fnmatch(xlsx.name.lower(), 'asterix_*.xlsx')]
OR
# you'll have to test this, I did not put it through any tests
excel_only_files = list(folder.rglob('Asterix_*.[xX][lL][sS][xX]'))
From there, you can run a list comprehension to read your files:
dataframes = [pd.read_excel(f) for f in excel_only_files]
Use glob.glob to do your pattern matches:

import glob

for i in glob.glob('Asterix_*.xlsx'):
    ...
First generate a list of files you want to read in using glob (based on #cup's answer) and then append them to a list.
import pandas as pd
import glob
my_df_list = [pd.read_excel(f) for f in glob.iglob('Asterix_*.xlsx')]
Depending on what you want to achieve, you can also use a dict to allow for key-value pairs.
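The key-value idea could look like the sketch below, which maps each file's stem to the file itself; in real use the value would be pd.read_excel(f) instead of the path (the temporary folder and file names here are stand-ins for the asker's setup):

```python
import tempfile
from pathlib import Path

# Throwaway folder with the three target files plus one that should be skipped.
folder = Path(tempfile.mkdtemp())
for name in ("Asterix_New file_Jan2020.xlsx",
             "Asterix_Master file_Jan2020.xlsx",
             "other.xlsx"):
    (folder / name).touch()

# Map stem -> file; swap the value for pd.read_excel(f) to hold DataFrames.
dfs = {f.stem: f for f in folder.glob("Asterix_*.xlsx")}
print(sorted(dfs))
```

Each dataframe is then retrievable by name, e.g. dfs["Asterix_New file_Jan2020"].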
At the end of the if statement you need to add another condition for files which also contain 'Asterix_':
files_xlsx = [f for f in files if f[-4:] == "xlsx" and "Asterix_" in f]
The f[-4:] == "xlsx" is to make sure the last 4 characters of the file name are xlsx and "Asterix_" in f makes sure that "Asterix_" exists anywhere in the file name.
To then read these using pandas try:
for file in files_xlsx:
    df = pd.read_excel(file)
    print(df)

That should print the DataFrame read from each Excel file.
If you have read in the file names, you can make sure that it starts with and ends with the desired strings by using this list comprehension:
files = ['filea.txt', 'fileb.xlsx', 'filec.xlsx', 'notme.txt']
files_xlsx = [f for f in files if f.startswith('file') and f.endswith('xlsx')]
files_xlsx # ['fileb.xlsx', 'filec.xlsx']
The list comprehension says, "Give me all the files that start with file AND end with xlsx."

How to remove characters from multiple files in python

I'm trying to write a simple program to batch rename files in a folder.
file format:
11170_tcd001-20160824-094716.txt
11170_tcd001-20160824-094716.rst
11170_tcd001-20160824-094716.raw
I have 48 of the above, each with a different 14-digit configuration after the first "-".
My final goal is to convert the above to:
11170_tcd001.txt
11170_tcd001.rst
11170_tcd001.raw
I know it's possible to rename files with os.rename in Python. However, I can't figure out how to batch rename multiple files that each have a different character configuration.
Is this possible?
Some pseudocode of what I would like to achieve is below:
import os

pathiter = (os.path.join(root, filename)
            for root, _, filenames in os.walk(folder)
            for filename in filenames
            )
for path in pathiter:
    newname = path.replace('14 digits.txt', '0 digits.txt')
    if newname != path:
        os.rename(path, newname)
If you are looking for a non-regex approach, and assuming your files all match the pattern you are expecting, what you can do first is get the extension of the file using splitext:
from os.path import splitext
file_name = '11170_tcd001-20160824-094716.txt'
extension = splitext(file_name)[1]
print(extension) # outputs: .txt
Then, with the extension in hand, split the file_name on the - and get the first item since you know that is the part that you want to keep:
new_filename = file_name.split('-')[0]
print(new_filename) # 11170_tcd001
Now, append the extension:
new_filename = new_filename + extension
print(new_filename) # 11170_tcd001.txt
Now you can proceed with the rename:
os.rename(file_name, new_filename)
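The steps above can be combined into one loop over a folder; this is a sketch run against a temporary directory seeded with the sample names from the question:

```python
import os
import tempfile
from os.path import splitext

# Throwaway folder with the question's three sample files.
folder = tempfile.mkdtemp()
for name in ("11170_tcd001-20160824-094716.txt",
             "11170_tcd001-20160824-094716.rst",
             "11170_tcd001-20160824-094716.raw"):
    open(os.path.join(folder, name), "w").close()

# For each file: keep the extension, keep the part before the first "-".
for file_name in os.listdir(folder):
    extension = splitext(file_name)[1]
    new_filename = file_name.split('-')[0] + extension
    os.rename(os.path.join(folder, file_name),
              os.path.join(folder, new_filename))

print(sorted(os.listdir(folder)))
# → ['11170_tcd001.raw', '11170_tcd001.rst', '11170_tcd001.txt']
```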
You should probably try using regular expressions, like
import re
<...>
newfilename = re.sub(r'-\d{8}-\d{6}\b', '', oldfilename)
<...>
This will replace any 'hyphen, 8 digits, hyphen, 6 digits' not followed by a letter, digit, or underscore with an empty string in your filename. Hope I got you right.
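Trying that substitution on the sample names from the question:

```python
import re

# Strip the "-YYYYMMDD-HHMMSS" block from each sample name.
renamed = [re.sub(r'-\d{8}-\d{6}\b', '', name)
           for name in ("11170_tcd001-20160824-094716.txt",
                        "11170_tcd001-20160824-094716.rst",
                        "11170_tcd001-20160824-094716.raw")]
print(renamed)
# → ['11170_tcd001.txt', '11170_tcd001.rst', '11170_tcd001.raw']
```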

Creating subdirectories and sorting files based on filename PYTHON

I have a large directory with many part files and their revisions. I want to recursively create a new folder for each part and then move all of the related files into that folder. I am trying to do this by isolating a 7-digit number that serves as an identifier for the part; all of the related filenames also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns
dirname = ' '
# pattern = '*???????*'
for root, dirs, files in os.walk(dirname):
    for fpath in files:
        print(fpath)
        if fpath[0:6].isdigit():
            matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
            partnum = str(fpath[0:6])
            pattern = str(partnum)
            filematch = fnmatch(files, pattern)
            print(filematch)
            shutil.move(filematch, matchdir)
This is what I have so far. Basically, I'm not sure how to get the original filename and use it as the matching pattern for the rest of the files. The original filename I want to use for this matching pattern is just a 7-digit number, and all of the related files may have other characters (REV-2, for example).
Don't overthink it
I think you're getting confused about what os.walk() gives you; recheck the docs. dirs and files are just lists of the names of the directories / files, not the full paths.
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
    1234567abc.txt
    directory2
        1234567abc.txt
        1234567bcd.txt
    2234567abc.txt
    not-interesting.txt
And want to end with something like:
directory1
    1234567
        abc.txt
    directory2
        1234567
            abc.txt
            bcd.txt
    2234567
        abc.txt
    not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil
for root, dirs, files in os.walk(dirname):
    for fname in files:
        # Match a string starting with 7 digits followed by everything else.
        # Capture each part in a group so we can access them later.
        match_object = re.match('([0-9]{7})(.*)$', fname)
        if match_object is None:
            # The regular expression did not match, ignore the file.
            continue
        # Form the new directory path using the number from the regular
        # expression and the current root.
        new_dir = os.path.join(root, match_object.group(1))
        if not os.path.isdir(new_dir):
            os.mkdir(new_dir)
        new_file_path = os.path.join(new_dir, match_object.group(2))
        # Or, if you don't want to change the filename, use:
        # new_file_path = os.path.join(new_dir, fname)
        old_file_path = os.path.join(root, fname)
        shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition: we continue the loop immediately if the file is not interesting. This is a useful pattern for making sure that your code does not get too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
Please clarify the question if that's not what you meant!
[edit] to show how to copy the file without changing its name.
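A quick check of the approach above against a throwaway tree (the temporary directory and sample names are stand-ins for the asker's real data); files whose names start with 7 digits get moved into a directory named after those digits, keeping their full original names:

```python
import os
import re
import shutil
import tempfile

# Throwaway tree: two files for part 1234567, one uninteresting file.
root_dir = tempfile.mkdtemp()
for name in ("1234567abc.txt", "1234567REV-2.txt", "other.txt"):
    open(os.path.join(root_dir, name), "w").close()

for root, dirs, files in os.walk(root_dir):
    for fname in files:
        match_object = re.match(r'([0-9]{7})(.*)$', fname)
        if match_object is None:
            continue  # not a part file, ignore it
        new_dir = os.path.join(root, match_object.group(1))
        if not os.path.isdir(new_dir):
            os.mkdir(new_dir)
        # Keep the full original filename inside the part directory.
        shutil.move(os.path.join(root, fname),
                    os.path.join(new_dir, fname))

print(sorted(os.listdir(os.path.join(root_dir, "1234567"))))
# → ['1234567REV-2.txt', '1234567abc.txt']
```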

Reading all files in all directories [duplicate]

This question already has answers here:
How do I list all files of a directory?
(21 answers)
Closed 9 years ago.
I have the code working to read in the values of a single text file but am having difficulties reading all files from all directories and putting all of the contents together.
Here is what I have:
import os

filename = '*'
filesuffix = '*'
location = os.path.join('Test', filename + "." + filesuffix)
Document = filename
thedictionary = {}
with open(location) as f:
    file_contents = f.read().lower().split(' ')  # split line on spaces to make a list
    for position, item in enumerate(file_contents):
        if item in thedictionary:
            thedictionary[item].append(position)
        else:
            thedictionary[item] = [position]
wordlist = (thedictionary, Document)
#print wordlist
#print thedictionary
Note that I am trying to use the wildcard * for the filename as well as for the filesuffix. I get the following error:
"IOError: [Errno 2] No such file or directory: 'Test/*.*'"
I am not sure if this is even the right way to do it, but it seems that if I somehow get the wildcards working, it should work.
I have gotten this example to work: Python - reading files from directory file not found in subdirectory (which is there)
That is a little different, but I don't know how to update it to read all files. I am thinking that in this initial set of code:
previous_dir = os.getcwd()
os.chdir('testfilefolder')
#add something here?
for filename in os.listdir('.'):
I would need to add an outer for loop, but I don't quite know what to put in it.
Any thoughts?
Python doesn't support wildcards in filenames passed to the open() call. You'll need to use the glob module to load files from a single level of subdirectories, or use os.walk() to walk an arbitrary directory structure.
Opening all text files in all subdirectories, one level deep:
import os
import glob

for filename in glob.iglob(os.path.join('Test', '*', '*.txt')):
    with open(filename) as f:
        # one file open, handle it, next loop will present you with a new file.
        ...
Opening all text files in an arbitrary nesting of directories:
import os
import fnmatch

for dirpath, dirs, files in os.walk('Test'):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            # one file open, handle it, next loop will present you with a new file.
            ...
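Combining the os.walk version with the question's word-position dictionary could look like the sketch below; it runs against a throwaway tree standing in for 'Test' (an assumption), and keeps the question's per-file enumerate logic, so positions restart at 0 in each file:

```python
import fnmatch
import os
import tempfile

# Throwaway tree: one file in the root, one in a subdirectory.
root_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(root_dir, "sub"))
with open(os.path.join(root_dir, "a.txt"), "w") as f:
    f.write("Hello world")
with open(os.path.join(root_dir, "sub", "b.txt"), "w") as f:
    f.write("world again")

# Map each word to the list of positions where it occurs, across all files.
thedictionary = {}
for dirpath, dirs, files in os.walk(root_dir):
    for filename in fnmatch.filter(sorted(files), '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            for position, item in enumerate(f.read().lower().split(' ')):
                thedictionary.setdefault(item, []).append(position)

print(thedictionary)
# → {'hello': [0], 'world': [1, 0], 'again': [1]}
```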
