I currently have several hundred PDF files whose names I would like to change.
The current file names don't follow any pattern, but I have an Excel file that lists the current name of each file next to the new name I want for it. It looks similar to this:
I am looking for a way in Python to rename all of my files (about 500) according to my Excel index.
What I tried:
import os
path = 'C:\\Users\\Desktop\\Project\\'
files = os.listdir(path)
for file in files:
    os.rename(os.path.join(path, file), os.path.join(path, '00' + file + '.pdf'))
Thanks.
If you can save the Excel file as CSV, this should work:
import os
import csv
path = 'C:\\Users\\Desktop\\Project\\'
with open('my_csv.csv') as f:
    reader = csv.reader(f)
    next(reader) # Skip first row
    for line in reader:
        src = os.path.join(path, line[0])
        dest = os.path.join(path, line[1])
        os.rename(src, dest)
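With about 500 files, it may be worth checking the mapping before anything is renamed. The following is only a sketch of that idea, assuming a two-column CSV with a header row as in the answer above; the function names `load_rename_pairs` and `rename_from_pairs` and the `dry_run` flag are hypothetical and not part of any library.

```python
import csv
import os


def load_rename_pairs(csv_path):
    """Return (old_name, new_name) pairs from a two-column CSV with a header row."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


def rename_from_pairs(folder, pairs, dry_run=True):
    """Rename files in `folder`; return the source names that were not found."""
    missing = []
    for old_name, new_name in pairs:
        src = os.path.join(folder, old_name)
        dest = os.path.join(folder, new_name)
        if not os.path.isfile(src):
            # Listed in the index but not present in the folder
            missing.append(old_name)
            continue
        if os.path.exists(dest):
            # Refuse to overwrite an existing file
            raise FileExistsError(f"{dest} already exists")
        if dry_run:
            print(f"would rename {src} -> {dest}")
        else:
            os.rename(src, dest)
    return missing
```

Running it once with `dry_run=True` prints the planned renames without touching any file; the returned list shows which names from the index were not found in the folder.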
You are really close!
You need to iterate over the names in your xlsx file. One simple way is to load the data with pandas.read_excel, then iterate over the source and destination columns and rename each file.
You can use os.path.join to build the full path from a given folder and a given file name.
Here is the code:
# Import modules
import os  # Rename files
import pandas as pd  # Read the Excel file
# Your different paths
path_folder = r'C:\Users\Desktop\Project'
path_csv = r'C:\Users\Desktop\Project\csv_file.xlsx'
# Load data
df = pd.read_excel(path_csv)
print(df)
#   Current file name Desired file name
# 0             a.pdf           001.pdf
# 1             b.pdf           002.pdf
# 2             c.pdf           003.pdf
# Iterate over each row of the dataframe
for old_name, new_name in zip(df["Current file name"], df["Desired file name"]):
    # Create source path and destination path
    source_file = os.path.join(path_folder, old_name)
    dest_file = os.path.join(path_folder, new_name)
    # Rename the current file using the source path (old name)
    # and the destination path (new name)
    os.rename(source_file, dest_file)
Excel file used:
Hope that helps !
Provided you have a table with the names, you can use the following code:
import os
names = '''a.pdf 001.pdf
b.pdf 002.pdf
c.pdf 003.pdf'''
os.chdir(r'C:\Users\Desktop\Project')
for line in names.splitlines(False):
    old, new = line.split()
    os.rename(old, new)
You can copy the table from Excel into this piece of code.
If you don't care about the table and only want sequential numbers, you can try:
import os
from itertools import count
numbers = count(1)
os.chdir(r'C:\Users\Desktop\Project')
for old in os.listdir('.'):
    if not old.endswith('.pdf'):
        continue
    new = '%03d.pdf' % next(numbers)
    os.rename(old, new)
Related
I am working on code that takes as input a zip file containing Excel files, extracts them into a folder, converts them to dataframes and collects all of these dataframes in a list. I would like to create a new folder, convert those dataframes to CSV files and save them in that folder. The goal is to be able to download the folder of CSV files as a zip file.
The main problem for me is making sure that every CSV file has the name of the Excel file it originated from.
I'm adding my code: the first block contains the first part, and the second block contains the part where I have a problem.
Running this last part of the code, I get this error:
"XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xef\xbb\xbf;data'"
%%capture
import os
import numpy as np
import pandas as pd
import glob
import os.path
!pip install xlrd==1.2.0
from google.colab import files
uploaded = files.upload()
%%capture
zipname = list(uploaded.keys())[0]
destination_path = 'files'
infolder = os.path.join('/content/', destination_path)
!unzip -o $zipname -d $destination_path
# Load an excel file, return events dataframe + file header dataframe
def load_xlsx(fullpath):
    return events, meta
tasks = [os.path.join(dp, fname) for dp, dn, filenames in os.walk(infolder) for fname in filenames if fname.lower().endswith('.xls')]
dfs = []
metas = []
for fname in tasks:
    df, meta = load_xlsx(fname)
    dfs.append(df)
    metas.append(meta)
newpath = 'csv2021'
if not os.path.exists(newpath):
    os.makedirs(newpath)
filepath = os.path.join('/content/files/', newpath)
for fname in tasks:
    filename = load_xlsx(fname)
    my_csv = filename.to_csv(os.path.join(filepath, filename), encoding="utf-8-sig" , sep = ';')
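One possible way to give each CSV the name of its source Excel file is to derive the output path from the input path. In the last loop above, the value returned by `load_xlsx` (a tuple of two dataframes) is used both as the dataframe and as the file name, which is probably not what was intended. Separately, the bytes `\xef\xbb\xbf` in the error message are a UTF-8 byte order mark, so one of the files being read may already be semicolon-separated text rather than a binary `.xls` file. The helper below is only a sketch; `csv_path_for` is a hypothetical name, and the usage in the comments assumes the `tasks` and `dfs` lists from the first block.

```python
import os


def csv_path_for(excel_path, out_dir):
    """Build the output CSV path that shares its base name with `excel_path`."""
    base = os.path.splitext(os.path.basename(excel_path))[0]
    return os.path.join(out_dir, base + ".csv")


# Hypothetical usage with the names from the question (`tasks`, `dfs`):
# out_dir = "csv2021"
# os.makedirs(out_dir, exist_ok=True)
# for fname, df in zip(tasks, dfs):
#     df.to_csv(csv_path_for(fname, out_dir), encoding="utf-8-sig", sep=";")
```

Reusing the dataframes already stored in `dfs` avoids reading every Excel file a second time.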
I have 7 vcf files present in 2 directories:
dir
I want to concatenate all files present on both folders and then read them through python.
I am trying this code:
# Import Modules
import os
import pandas as pd
import vcf
# Folder Path
path1 = "C://Users//USER//Desktop//Anas/VCFs_1/"
path2 = "C://Users//USER//Desktop//Anas/VCFs_2/"
#os.chdir(path1)
def read(f1,f2):
    reader = vcf.Reader(open(f1,f2))
    df = pd.DataFrame([vars(r) for r in reader])
    out = df.merge(pd.DataFrame(df.INFO.tolist()),
                   left_index=True, right_index=True)
    return out
# Read text File
def read_text_file(file_path1,file_path2):
    with open(file_path1, 'r') as f:
        with open(file_path2,'r') as f:
            print(read(path1,path2))
# iterate through all file
for file in os.listdir():
    # Check whether file is in text format or not
    if file.endswith(".vcf"):
        file_path1 = f"{path1}\{file}"
        file_path2 = f"{path2}\{file}"
        print(file_path1,"\n\n",file_path2)
        # call read text file function
        #data = read_text_file(path1,path2)
        print(read_text_file(path1,path2))
But it's giving me a permission error. I know that we get this error when we try to read folders instead of files. But how can I read the files inside the folders? Any suggestions?
You may need to run your Python code with Administrator privileges, if you are trying to access another user's files.
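Another possible cause, as the question itself suspects, is that `read_text_file(path1, path2)` passes the two folder paths to `open()` rather than the paths of the files inside them. A minimal sketch of collecting the actual file paths first is shown below; `list_vcf_files` is a hypothetical helper name, and parsing each file is left to the `vcf` reader used in the question.

```python
import os


def list_vcf_files(*folders):
    """Return the full paths of all .vcf files found directly inside `folders`."""
    paths = []
    for folder in folders:
        for name in sorted(os.listdir(folder)):
            full_path = os.path.join(folder, name)
            # Keep regular files only, so a directory is never passed to open()
            if name.lower().endswith(".vcf") and os.path.isfile(full_path):
                paths.append(full_path)
    return paths


# Hypothetical usage with the folder variables from the question:
# for vcf_path in list_vcf_files(path1, path2):
#     with open(vcf_path, "r") as handle:
#         ...  # hand `handle` to the VCF parser here
```

Each path returned this way points to a single file, so it can be opened individually and the resulting dataframes concatenated afterwards.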
I need to convert CSV files into Excel files automatically. I am failing to give each Excel file the name of the corresponding CSV file.
I saved the CSV files as 'Trials_1', 'Trials_2', 'Trials_3', but with the code that I wrote Python gives me an error and asks for a CSV file named 'Trials_4'. Then, if I rename the CSV file 'Trials_1' to 'Trials_4', the program works and generates an Excel file named 'Trials_1'.
How can I correct my code?
'''
import csv
import openpyxl as xl
import os, os.path
directory=r'C:\\Users\\PycharmProjects\\input\\'
folder=r'C:\\Users\\PycharmProjects\\output\\'
for csv_file in os.listdir(directory):
    def csv_to_excel(csv_file, excel_file):
        csv_data=[]
        with open(os.path.join(directory, csv_file)) as file_obj:
            reader=csv.reader(file_obj)
            for row in reader:
                csv_data.append(row)
        workbook= xl.Workbook()
        sheet=workbook.active
        for row in csv_data:
            sheet.append(row)
        workbook.save(os.path.join(folder,excel_file))
if __name__=="__main__":
    m = sum(1 for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f)))
    new_name = "{}Trial_{}.csv".format(directory, m + 1)
    k = sum(1 for file in os.listdir(folder) if os.path.isfile(os.path.join(folder, file)))
    new_name_e = "{}Trial_{}.xlsx".format(folder, k + 1)
    csv_to_excel(new_name,new_name_e)
'''
Thanks.
Hi Annachiara welcome to StackOverflow,
I would modify the "csv_to_excel" function by using only pandas.
Before that you should install 'xlsxwriter' with:
pip install XlsxWriter
Then the function would be like this:
import pandas as pd
def csv_to_excel(csv_file,excel_file,csv_sep=';'):
    # read the csv file with pandas
    df=pd.read_csv(csv_file,sep=csv_sep)
    # create the excel file
    writer=pd.ExcelWriter(excel_file, engine='xlsxwriter')
    # copy the csv content (df) into the excel file
    df.to_excel(writer,index=False)
    # save and close the excel file (writer.save() was removed in pandas 2.0)
    writer.close()
    # print what you converted for reference
    print(f'csv file {csv_file} saved as excel in {excel_file}')
Just make sure that the CSV is read correctly: I only added the separator parameter, but you might want to add other parameters (such as date parsing).
Then you can convert the list of CSV files with a for loop (I used more steps than necessary to make it clearer):
import os
dir_in=r'C:\Users\PycharmProjects\input'
dir_out=r'C:\Users\PycharmProjects\output'
csvs_to_convert=os.listdir(dir_in)
for csv_file_in in csvs_to_convert:
    # remove extension from csv files
    file_name_no_extension=os.path.splitext(csv_file_in)[0]
    # add excel extension .xlsx
    excel_name_out=file_name_no_extension+'.xlsx'
    # write names with their directories
    complete_excel_name_out=os.path.join(dir_out,excel_name_out)
    complete_csv_name_in=os.path.join(dir_in,csv_file_in)
    # convert csv file to excel file
    csv_to_excel(complete_csv_name_in,complete_excel_name_out,csv_sep=';')
Each CSV as a separate Excel file:
import glob
import pandas as pd
import os
csv_files = glob.glob('*.csv')
for filename in csv_files:
    # Same base name as the CSV, with an .xlsx extension
    excel_name = os.path.split(filename)[-1].replace('.csv', '.xlsx')
    df = pd.read_csv(filename)
    df.to_excel(excel_name, index=False)
All CSVs in the same Excel file, each on a different sheet:
import glob
import pandas as pd
import os
# Create excel file
writer = pd.ExcelWriter('all_csv.xlsx')
csv_files = glob.glob('*.csv')
for filename in csv_files:
    sheet_name = os.path.split(filename)[-1].replace('.csv', '')
    df = pd.read_csv(filename)
    # Append each csv as sheet
    df.to_excel(writer, sheet_name=sheet_name, index=False)
# Save and close the workbook (writer.save() was removed in pandas 2.0)
writer.close()
Assuming you would like to keep the same structure of your code, I just fixed some technical issues in your code to make it work (please change the folders path to your own):
import csv
import openpyxl as xl
import glob, os, os.path
directory= 'input'
folder= '../output' # Since 'input' would be my cwd, need to step back a directory to reach 'output'
# Using your function, just passing different arguments for convenience.
def csv_to_excel(f_path, f_name):
    csv_data=[]
    with open(f_path, 'r') as file_obj:
        reader=csv.reader(file_obj)
        for row in reader:
            csv_data.append(row)
    workbook= xl.Workbook()
    sheet=workbook.active
    for row in csv_data:
        sheet.append(row)
    workbook.save(os.path.join(folder, f_name + ".xlsx"))
def main():
    os.chdir(directory) # Defining input directory as your cwd
    # Searching for all files with csv extension and sending each to your function
    for file in glob.glob("*.csv"):
        f_path = os.path.join(os.getcwd(), file) # Saving the absolute path to the file
        f_name = (os.path.splitext(file)[0]) # Saving the name of the file
        csv_to_excel(f_path, f_name)
if __name__=="__main__":
    main()
P.S:
Please avoid defining a function inside a loop; a function only needs to be defined once.
I am trying to write a script which will rename a block of asp files to their correct name.
Currently I have a folder containing asp files that are just named id1.asp, id2.asp and so on.
I have a CSV file which has the ids in it and a description for that ID.
id1, pen
id2, rubber
id3, paper
etc.
I am trying to work out how to rename the id1.asp to be pen.asp, id2.asp to rubber.asp and so on.
Thank you. This is what I have tried so far:
import csv
import os
import shutil
a_csv_file = open("skuconvert.csv", "r")
dict_reader = csv.DictReader(a_csv_file)
for row in a_csv_file:
    print (row)
ordered_dict_from_csv = list(dict_reader)[0]
dict_from_csv = dict(ordered_dict_from_csv)
print(dict_from_csv)
dirs = os.listdir('./')
path = ''
head_tail = os.path.split(path)
You can use glob.glob() to find all of your ASP files in a folder. First load your CSV file in as a dictionary (this assumes there are two columns, the IDs and the names). If there are more columns or if there is a header this would need to be dealt with differently.
import glob
import csv
import os
import shutil
with open("skuconvert.csv", newline="") as f_input:
    # skipinitialspace drops the blank after each comma ("id1, pen" -> "pen")
    csv_input = csv.reader(f_input, skipinitialspace=True)
    ids = dict(csv_input)
asp_files = glob.glob(r"f:\dropbox\python temp\*.asp")
for asp_file in asp_files:
    path, basename = os.path.split(asp_file)
    filename, ext = os.path.splitext(basename)
    if filename in ids:
        new_name = os.path.join(path, f'{ids[filename]}{ext}')
        print(f"Renaming: '{asp_file}' to '{new_name}'")
        try:
            shutil.move(asp_file, new_name)
        except OSError:
            print(f"Unable to rename: {new_name}")
    else:
        print(f"ID unknown: {filename}")
Then for each ASP file that is found, split out the path and extension and determine if the ID is found in the CSV file. If it is, build the new filename and call shutil.move() to rename the file. If it is not, then print the unknown file.
You can use a CSV reader to read the file names and build a dictionary mapping each ID to its new file name. Then you can use os.listdir() with os.path.splitext() and shutil.move() to rename your files.
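The approach described in this answer could be sketched roughly as follows. This assumes a two-column CSV without a header, as in the sample from the question; `rename_by_id` is a hypothetical function name, and `skipinitialspace=True` is used because the sample rows have a space after the comma.

```python
import csv
import os
import shutil


def rename_by_id(folder, csv_path):
    """Rename <id><ext> files in `folder` to <description><ext>; return new names."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)
        # Map each ID (first column) to its description (second column)
        id_to_name = {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}

    renamed = []
    for entry in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(entry)
        if stem in id_to_name:
            new_entry = id_to_name[stem] + ext
            shutil.move(os.path.join(folder, entry), os.path.join(folder, new_entry))
            renamed.append(new_entry)
    return renamed
```

Files whose base name does not appear in the CSV are left untouched, so the function can be run on a folder that also contains unrelated files.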
I have several CSV files in a folder that I need to read, and I need to do the same thing to each file. I want to name each dataframe that is created after its file, but I am not sure how. Could I store the file names in a list and then refer to them later somehow? My current code is below. Thank you in advance.
import os
import pandas as pd
Path = r"C:\Users\DATA"
filelist = os.listdir(Path)
for x in filelist:
    RawData = pd.read_csv(r"C:\Users\DATA\%s" % x)
What if you keep all the dataframes in a single dictionary, keyed by file name?
import os
import pandas as pd
path = r"C:\Users\DATA"
raw_data = {i: pd.read_csv(os.path.join(path, i)) for i in os.listdir(path)}