Inside my input directory I have three reports: FinalReport_Nm, FinalReport_S01 and FinalReport_S02. I will be adding about 50 more reports, and the naming will continue with S03, S04, T01, T02, etc. I want this script to loop through the folder of reports, take FinalReport_Nm, paste it into my template and save the result as SecondaryReport_1a_Nm, then loop back, copy FinalReport_S01, paste it into the template and save it as SecondaryReport_1a_S01, and so on.
I thought that creating a list of schedules (Nm, S01, S02, as seen in the script below) and concatenating it into output_file at the bottom would work, but it fails completely. How can I get this script to rename the files as it loops through them?
import openpyxl as xl;
import os
input_dir = 'C:\\Python\\Reports'
output_dir = 'C:\\Reports\\output'
template = 'C:\\Python\\Report_Template.xlsx'
NewFileName = 'SecondaryReport_1a_'
schedule_index = 0
schedules=['Nm', 'S01', 'S02']
files = [file for file in os.listdir(input_dir)
         if os.path.isfile(file) and file.endswith('.xlsx')]
for file in files:
    input_file = os.path.join(input_dir, file)
    wb=xl.load_workbook(input_file)
    ws=wb.worksheets[1]
    # Open template
    wb2 = xl.load_workbook(template)
    ws2 = wb2.worksheets[2]
    # calculate total number of rows and
    # columns in source excel file
    mr = ws.max_row
    mc = ws.max_column
    # copying the cell values from source
    # excel file to destination excel file
    for i in range (1, mr + 1):
        for j in range (1, mc + 1):
            # reading cell value from source excel file
            c = ws.cell(row = i, column = j)
            # Cells for source data to pasted inside Template
            ws2.cell(row = i+12, column = j+1).value = c.value
    # saving the destination excel file
    output_file = (output_dir, f"{summaryFile}_{schedules[schedule_index]}")
    schedule_index += 1
    wb2.save(output_file)
As I understand the question, you have several xlsx files with names of the form "FinalReport_(suffix).xlsx", where the suffix can be "Nm", "S01", "S02", "T01", etc. For each one, you want to create a new file with a name like "SecondaryReport_1a_(suffix).xlsx", using the same suffix.
For this simple case, the suffix can be extracted with string slicing after removing the ".xlsx" extension:
prefix_length = len("FinalReport_")
suffix = os.path.splitext(file)[0][prefix_length:]  # "FinalReport_S01.xlsx" -> "S01"
The output filename can be created like so:
output_file = os.path.join(output_dir, f"SecondaryReport_1a_{suffix}.xlsx")
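Putting those pieces together, here is a minimal, standard-library-only sketch of the file-selection and naming part of the loop. It assumes every input is named FinalReport_<suffix>.xlsx; the helper names secondary_name and report_pairs are hypothetical. It also joins the directory before calling os.path.isfile: the original list comprehension passes the bare file name, which is resolved against the current working directory rather than input_dir, so the list can easily come out empty.

```python
import os

PREFIX = "FinalReport_"             # assumed prefix of every input report
NEW_PREFIX = "SecondaryReport_1a_"  # prefix for the generated reports


def secondary_name(filename):
    """Map 'FinalReport_S01.xlsx' to 'SecondaryReport_1a_S01.xlsx'."""
    stem, ext = os.path.splitext(filename)
    if not stem.startswith(PREFIX):
        raise ValueError(f"unexpected file name: {filename}")
    suffix = stem[len(PREFIX):]  # e.g. "Nm", "S01", "T01"
    return f"{NEW_PREFIX}{suffix}{ext}"


def report_pairs(input_dir, output_dir):
    """Yield (input_path, output_path) for every FinalReport_*.xlsx in input_dir."""
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)  # join first, then test the full path
        if os.path.isfile(path) and name.startswith(PREFIX) and name.endswith(".xlsx"):
            yield path, os.path.join(output_dir, secondary_name(name))
```

In the question's script this would replace both the files list and the schedules list: loop with `for input_file, output_file in report_pairs(input_dir, output_dir):`, do the openpyxl copying inside the loop, and finish each iteration with `wb2.save(output_file)`. New reports (S03, T01, ...) are then picked up automatically without editing a list.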
Related
I am trying to copy the contents of a source Excel file to a destination Excel file. My source file has hyperlinks in it, but when I paste its contents into the destination file, the hyperlinks are gone.
Is there any way to preserve the hyperlinks?
CODE:
import openpyxl as xl;
# opening the source excel file
filename ="C:\\Users\\Admin\\Desktop\\trading.xlsx"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]
# opening the destination excel file
filename1 ="C:\\Users\\Admin\\Desktop\\test.xlsx"
wb2 = xl.load_workbook(filename1)
ws2 = wb2.active
# calculate total number of rows and
# columns in source excel file
mr = ws1.max_row
mc = ws1.max_column
# copying the cell values from source
# excel file to destination excel file
for i in range (1, mr + 1):
    for j in range (1, mc + 1):
        # reading cell value from source excel file
        c = ws1.cell(row = i, column = j)
        # writing the read value to destination excel file
        ws2.cell(row = i, column = j).value = c.value
# saving the destination excel file
wb2.save(str(filename1))
Add ws2.cell(row = i, column = j).hyperlink = c.hyperlink after ws2.cell(row = i, column = j).value = c.value.
Do the same thing with .style if you want to preserve the blue colour of the links.
This is my first time using openpyxl. I am trying to copy contents from one Excel workbook to another with the library. The code runs without errors, but instead of copying the content below the content already in the destination workbook, it replaces that content, which is not what I want.
I have looked at the tutorial and done some research, but I cannot work out what I am doing wrong. Below is the code I am using:
import openpyxl as opxl
#Opening the source excel file
sourceFile = "C:\\ForTesting\\sourceFile.xlsx"
sourceworkbook = opxl.load_workbook(sourceFile)
sourceworksheet = sourceworkbook.worksheets[0]
#Opening the destination excel file
destinationFile = "C:\\ForTesting\\destinationFile.xlsx"
desinationworkbook = opxl.load_workbook(destinationFile)
destinationworksheet = desinationworkbook.active
# calculate total number of rows and
# columns in source excel file
maximum_row = sourceworksheet.max_row
maximum_column = sourceworksheet.max_column
# copying the cell values from source
# excel file to destination excel file
for i in range (1, maximum_row + 1):
    for j in range (1, maximum_column + 1):
        #reading the cell value from source excel file
        c = sourceworksheet.cell(row = i, column = j)
        #writing the value read from the source file to destination file
        destinationworksheet.cell(row = i, column = j).value = c.value
#Save the destination excel file
sourceworkbook.save(str(destinationFile))
print("The File has been saved successfully")
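I would appreciate it if someone could point out what I am doing wrong or what I need to do.
As I understand it, you want to append the contents of sourceFile to destinationFile, but you are overwriting it instead. There are two reasons: you start writing at the first row of the destination sheet instead of below its last used row, and you save sourceworkbook (not desinationworkbook) to destinationFile, which replaces the destination file with a copy of the source.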
import openpyxl as opxl
#Opening the source excel file
sourceFile = "C:\\ForTesting\\sourceFile.xlsx"
sourceworkbook = opxl.load_workbook(sourceFile)
sourceworksheet = sourceworkbook.worksheets[0]
#Opening the destination excel file
destinationFile = "C:\\ForTesting\\destinationFile.xlsx"
desinationworkbook = opxl.load_workbook(destinationFile)
destinationworksheet = desinationworkbook.active
# calculate total number of rows and
# columns in source excel file
maximum_row = sourceworksheet.max_row
maximum_column = sourceworksheet.max_column
# find the last used row in the destination sheet
max_row_dest = destinationworksheet.max_row
# copying the cell values from source
# excel file to destination excel file
for i in range (1, maximum_row + 1):
    for j in range (1, maximum_column + 1):
        #reading the cell value from source excel file
        c = sourceworksheet.cell(row = i, column = j)
        #writing the value below the rows already in the destination sheet
        destinationworksheet.cell(row = max_row_dest + i, column = j).value = c.value
#Save the destination workbook (not the source workbook)
desinationworkbook.save(destinationFile)
print("The File has been saved successfully")
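The solution I am about to give might be slightly slower, but it is a lot easier.
Use pandas, an intuitive Python data-analysis library. This assumes both sheets share the same header row; note that it rewrites the destination file, so only the data of its first sheet is kept and any cell formatting is lost:
import pandas as pd
sourceFile = "C:\\ForTesting\\sourceFile.xlsx"
destinationFile = "C:\\ForTesting\\destinationFile.xlsx"
# read the first sheet of each workbook
source_df = pd.read_excel(sourceFile)
destination_df = pd.read_excel(destinationFile)
# stack the source rows below the existing destination rows and write the result back
combined = pd.concat([destination_df, source_df], ignore_index=True)
combined.to_excel(destinationFile, index=False)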
I have many CSV files in one subfolder, say data. Each of these .csv files contains a date column.
430001.csv, 43001(1).csv,43001(2).csv,..........,43001(110).csv etc.
I want to rename all the files in the folder according to the date in the date column of each CSV file.
Desired output:
430001-1980.csv, 43001-1981.csv,43001-1985.csv,..........,43001-2010.csv etc.
I tried to follow the steps advised in:
Renaming multiple csv files
Still could not get the desired output.
Any help would be highly appreciated.
Thanks!
You can loop through the files, extract the date from each one to build the new filename, and then save the file under that name.
# packages to import
import os
import glob
import pandas as pd
data_p = "Directory with your data"
# use an absolute path here, because os.chdir below changes the working directory
output_p = "Directory where you want to save your output"
print(os.getcwd())  # see which folder you are in
os.chdir(data_p)  # move to the folder with your data
fnames = sorted(glob.glob('*.csv'))  # get the names of all your files
for fname in fnames:
    print(f'fname: {fname}\n')
    pfile = pd.read_csv(fname, delimiter=",")  # read in file
    # split the filename into base name and extension, e.g. "43001(1)" and ".csv"
    base, ext = os.path.splitext(fname)
    only_id = base.split("(")[0]  # drop the "(1)" part if there is a bracket
    # get date from your file (assuming it is on the first row);
    # if the column holds full dates such as 1980-01-01, you may only want the year
    filedate = str(pfile["date"][0])
    # build the new filename, e.g. "43001-1980.csv"
    newfilename = only_id + "-" + filedate + ext
    # save your file in the output directory
    pfile.to_csv(os.path.join(output_p, newfilename), index=False, header=True)
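If you would rather rename the files in place without pandas, here is a minimal sketch using only the standard library. It assumes the column is literally named date, that the value in the first data row contains a 4-digit year (as in 1980-01-01 or 05/06/1981), and that the new name is the part of the old name before any "(" plus that year; year_from_csv and rename_by_date are hypothetical helper names. It refuses to overwrite an existing file, because two files with the same id and year would otherwise clobber each other.

```python
import csv
import glob
import os
import re


def year_from_csv(path, date_column="date"):
    """Return the 4-digit year found in the first data row's date column."""
    with open(path, newline="") as f:
        first_row = next(csv.DictReader(f))
    value = first_row[date_column]
    match = re.search(r"\b(\d{4})\b", value)  # assumed: a standalone 4-digit year
    if match is None:
        raise ValueError(f"no 4-digit year in {value!r} ({path})")
    return match.group(1)


def rename_by_date(folder, date_column="date"):
    """Rename e.g. '43001(1).csv' to '43001-1981.csv' in place; return the new names."""
    new_names = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        base = os.path.splitext(os.path.basename(path))[0]
        file_id = base.split("(")[0]  # drop a '(1)', '(2)', ... copy marker
        new_name = f"{file_id}-{year_from_csv(path, date_column)}.csv"
        target = os.path.join(folder, new_name)
        if os.path.exists(target):
            raise FileExistsError(target)  # never silently overwrite another file
        os.rename(path, target)
        new_names.append(new_name)
    return new_names
```

Usage would be `rename_by_date("data")`. The glob result is collected before any renaming, so files that have just been renamed are not processed a second time.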
I have as many as 1500 text files and I want to copy 5 lines from every text file, say lines 4, 5, 9, 14 and 32. I want these lines as columns in an Excel sheet, one file per row, so that the 1500 text files end up one below the other. So far I have code that takes in only one .txt file, and it copies all of its data into rows. Any help will be appreciated.
Here is my code:
import csv
import xlwt
import os
import sys
# Look for input file in same location as script file:
inputfilename = os.path.join(os.path.dirname(sys.argv[0]),
                             'C:/path/filename.txt')
# Strip off the path
basefilename = os.path.basename(inputfilename)
# Strip off the extension
basefilename_noext = os.path.splitext(basefilename)[0]
# Get the path of the input file as the target output path
targetoutputpath = os.path.dirname(inputfilename)
# Generate the output filename
outputfilename = os.path.join(targetoutputpath, basefilename_noext + '.xls')
# Create a workbook object
workbook = xlwt.Workbook()
# Add a sheet object
worksheet = workbook.add_sheet(basefilename_noext, cell_overwrite_ok=True)
# Get a CSV reader object set up for reading the input file with tab delimiters
# (Python 3: open in text mode with newline='' instead of 'rb')
datareader = csv.reader(open(inputfilename, newline=''),
                        delimiter='\t', quotechar='"')
# Process the file and output to Excel sheet
for rowno, row in enumerate(datareader):
    for colno, colitem in enumerate(row):
        worksheet.write(rowno, colno, colitem)
# Write the output file.
workbook.save(outputfilename)
# Open it via the operating system (will only work on Windows)
# On Linux/Unix you would use subprocess.Popen(['xdg-open', filename])
os.startfile(outputfilename)
You would first need to put all of your required text files in the current folder; glob.glob('*.txt') can then be used to get a list of their filenames. For each text file, read the file with readlines() and extract the required lines using operator.itemgetter(). For each file, create a new row in your output worksheet and write each extracted line as a separate column entry.
import xlwt
import glob
import operator
# Create a workbook object
wb = xlwt.Workbook()
# Add a sheet object
ws = wb.add_sheet('Sheet1', cell_overwrite_ok=True)
rowy = 0
# readlines() is 0-indexed, so lines 4, 5, 9, 14 and 32 are indices 3, 4, 8, 13 and 31
get_lines = operator.itemgetter(3, 4, 8, 13, 31)
for text_filename in glob.glob('*.txt'):
    with open(text_filename) as f_input:
        try:
            lines = [line.strip() for line in get_lines(f_input.readlines())]
        except IndexError:
            print("'{}' is too short".format(text_filename))
            lines = []
    # Output to Excel sheet: one row per file, one column per extracted line
    for colno, colitem in enumerate(lines):
        ws.write(rowy, colno, colitem)
    rowy += 1
# Write the output file.
wb.save('output.xls')
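A minimal standard-library sketch of the same idea, assuming the line numbers are counted from 1 as in a text editor and that a CSV file (which Excel opens directly) is acceptable instead of an .xls workbook; extract_lines, collect and WANTED_LINES are hypothetical names. It uses plain list indexing rather than itemgetter, because itemgetter with a single index returns a bare string instead of a tuple, and it reports files that are too short instead of writing an empty row for them.

```python
import csv
import glob
import os

# Hypothetical configuration: 1-based line numbers as counted in an editor
WANTED_LINES = (4, 5, 9, 14, 32)


def extract_lines(path, wanted=WANTED_LINES):
    """Return the requested 1-based lines of a text file, stripped, or None if too short."""
    with open(path) as f:
        all_lines = f.readlines()
    try:
        return [all_lines[n - 1].strip() for n in wanted]  # convert to 0-based
    except IndexError:
        return None


def collect(folder, out_csv, wanted=WANTED_LINES):
    """Write one CSV row per text file (file name, then the lines); return skipped files."""
    skipped = []
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file"] + [f"line {n}" for n in wanted])
        for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
            lines = extract_lines(path, wanted)
            if lines is None:
                skipped.append(path)  # too short to contain every wanted line
                continue
            writer.writerow([os.path.basename(path)] + lines)
    return skipped
```

For example, `collect(".", "summary.csv")` writes a header row and then one row per .txt file in the current folder, and returns the list of files that were skipped.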
So far, my code to read text files and export them to Excel is:
import glob
data = {}
for infile in glob.glob("*.txt"):
    with open(infile) as inf:
        data[infile] = [l[:-1] for l in inf]
with open("summary.xls", "w") as outf:
    outf.write("\t".join(data.keys()) + "\n")
    for sublst in zip(*data.values()):
        outf.write("\t".join(sublst) + "\n")
The goal is to read all of the text files in a specific folder.
However, when I run it and open the result, Excel gives me this error:
"File cannot be opened because: Invalid at the top level of the document. Line 1, Position 1. outputgooderr.txt outputbaderr.txt. fixed_inv.txt"
Note: outputgooderr.txt, outputbaderr.txt and fixed_inv.txt are the names of the text files I want to export to Excel, one file per sheet.
When the program has only one file to read, it is able to extract the data. Unfortunately, that is not what I need, since I have multiple files.
Please let me know of any way I can fix this. I am very much a beginner in programming and would appreciate any advice! Thank you.
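If you're not opposed to the output Excel file being .xlsx rather than .xls, I'd recommend using some of the features of pandas, in particular pandas.read_csv() and DataFrame.to_excel(). Your current script writes tab-separated plain text into a file named summary.xls; that is not a real Excel workbook, which is most likely why Excel reports an error when opening it, and a plain-text file cannot hold one sheet per input file anyway.
I've provided a fully reproducible example of how you might go about doing this. Please note that the first three statements after the imports create two .txt files for the test.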
import pandas as pd
import numpy as np
import glob
# Creating a dataframe and saving as test_1.txt/test_2.txt in current directory
# feel free to remove the next 3 lines if you want to test in your own directory
df = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))
df.to_csv('test_1.txt', index=False)
df.to_csv('test_2.txt', index=False)
txt_list = []  # empty list
sheet_list = []  # empty list
# a for loop through filenames matching a specified pattern (.txt) in the current directory
for infile in glob.glob("*.txt"):
    outfile = infile.replace('.txt', '')  # removing '.txt' for excel sheet names
    sheet_list.append(outfile)  # appending the excel sheet name to sheet_list
    txt_list.append(infile)  # appending the '...txt' file name to txt_list
# the with block saves and closes the workbook (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('summary.xlsx', engine='xlsxwriter') as writer:
    # a for loop through all elements in txt_list, paired with their sheet names
    for txt_file, sheet_name in zip(txt_list, sheet_list):
        df = pd.read_csv(txt_file)  # reading one text file
        df.to_excel(writer, sheet_name=sheet_name, index=False)  # one sheet per text file
Output example: