Mail Merge With Python

I have been trying to write a Python script to mail merge labels. It needs to look in a folder, open each Excel document, merge it, and print the result as a PDF. All the rows in each Excel file belong to the same document, and I'd like them to be printed together. I've written a script that opens a Word template and pulls in the Excel file to populate the mail merge, but when I print it:
1. The printed copy only shows the merge fields, not the information from the workbook.
2. It only prints the first page, although some of the files I use to make labels would produce more than one page.
I've included my code, as well as pictures of what I'm currently getting and what I need the end product to look like.
If anyone can help me with this, you would be a lifesaver.
What I need: (image not included)
What I'm getting: (image not included)
from os import listdir
import win32com.client as win32
import pathlib
import os
import pandas as pd
pd.options.mode.chained_assignment = None
working_directory = os.getcwd()
path = pathlib.Path().resolve()
inputPath = os.path.join(str(path), 'Output')
outputPath = os.path.join(str(path), 'OutputPDF')
inputs = listdir(inputPath)
wordApp = win32.Dispatch('Word.Application')
wordApp.Visible = True
sourceDoc = wordApp.Documents.Open(os.path.join(working_directory, 'labelTemplate.docx'))
mail_merge = sourceDoc.MailMerge
for x in inputs[1:]:
    mail_merge.OpenDataSource(os.path.join(inputPath, x))
    print(x)
    y = x.replace('.xlsx', '')
    z = y.replace('output_', '')
    print(z)
    # Separate name, so mail_merge still refers to the MailMerge object on the next iteration
    active_doc = wordApp.ActiveDocument
    # 17 = wdExportFormatPDF
    active_doc.ExportAsFixedFormat(os.path.join(outputPath, z + '.pdf'), ExportFormat=17)
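The script above never calls MailMerge.Execute(), so what gets exported is presumably still the template itself, which would explain both symptoms (merge fields instead of data, and only one page). A minimal sketch of the missing step, assuming Windows with Word and pywin32, and assuming each workbook keeps its label data on a sheet called "Sheet1" (the sheet name and the helper names `pdf_name_for`, `merge_workbook_to_pdf` and `main` are hypothetical). Executing the merge with the destination set to a new document should put all rows of one workbook into a single merged document, which is then exported:

```python
from pathlib import Path

WD_SEND_TO_NEW_DOCUMENT = 0  # Word's wdSendToNewDocument
WD_EXPORT_FORMAT_PDF = 17    # Word's wdExportFormatPDF


def pdf_name_for(xlsx_name: str) -> str:
    """Mirror the question's naming: 'output_ABC.xlsx' -> 'ABC.pdf'."""
    stem = Path(xlsx_name).stem
    if stem.startswith("output_"):
        stem = stem[len("output_"):]
    return stem + ".pdf"


def merge_workbook_to_pdf(word, template: Path, workbook: Path,
                          out_dir: Path, sheet: str = "Sheet1") -> Path:
    """Merge every row of one workbook into a new document and export it as PDF.

    `word` is a Word.Application COM object (pywin32, Windows only).
    `sheet` is an assumption: the worksheet that holds the label data.
    """
    doc = word.Documents.Open(str(template.resolve()))
    try:
        mm = doc.MailMerge
        mm.OpenDataSource(Name=str(workbook.resolve()),
                          SQLStatement=f"SELECT * FROM `{sheet}$`")
        mm.Destination = WD_SEND_TO_NEW_DOCUMENT
        mm.Execute(Pause=False)       # this is the step that performs the merge
        merged = word.ActiveDocument  # the merged result opens as a new document
        pdf_path = out_dir / pdf_name_for(workbook.name)
        merged.ExportAsFixedFormat(OutputFileName=str(pdf_path.resolve()),
                                   ExportFormat=WD_EXPORT_FORMAT_PDF)
        merged.Close(SaveChanges=False)
        return pdf_path
    finally:
        doc.Close(SaveChanges=False)  # leave the template unchanged


def main() -> None:
    import win32com.client as win32  # pywin32; only available on Windows

    word = win32.Dispatch("Word.Application")
    try:
        for xlsx in sorted(Path("Output").glob("*.xlsx")):
            print(merge_workbook_to_pdf(word, Path("labelTemplate.docx"),
                                        xlsx, Path("OutputPDF")))
    finally:
        word.Quit()
```

Calling `main()` on a machine with Word installed would process every workbook in Output. Reopening the template for each workbook keeps one data source per merge, rather than swapping data sources on a single open template.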

Related

Difficulty trying to export query results as a CSV, uploaded to SharePoint (PySpark)

I am trying to run a query and save the result as a CSV that is uploaded to a SharePoint folder. This is within Databricks, via PySpark.
My code below is close, but the final line does not work correctly: the file created in SharePoint contains no data, even though the DataFrame does.
I'm new to Python and Databricks; if anyone can provide some guidance on how to correct that final line, I'd really appreciate it!
from shareplum import Site, Office365
from shareplum.site import Version
import pandas as pd
sharepointUsername = ...  # values elided in the original post
sharepointPassword = ...
sharepointSite = ...
website = ...
sharepointFolder = ...
# Connect to SharePoint Folder
authcookie = Office365(website, username=sharepointUsername, password=sharepointPassword).GetCookies()
site = Site(sharepointSite, version=Version.v2016, authcookie=authcookie)
folder = site.Folder(sharepointFolder)
FileName = "Data_Export.csv"
Query = "SELECT * FROM TABLE"
df = spark.sql(Query)
pandasdf = df.toPandas()
folder.upload_file(pandasdf.to_csv(FileName, encoding='utf-8'), FileName)
Sure, my code is still garbage, but it works. I needed to convert the DataFrame into a variable containing CSV-formatted data before uploading it to SharePoint; effectively, I was trying to skip a step. The last two lines were updated:
from shareplum import Site, Office365
from shareplum.site import Version
import pandas as pd
sharepointUsername = ...  # values elided in the original post
sharepointPassword = ...
sharepointSite = ...
website = ...
sharepointFolder = ...
# Connect to SharePoint Folder
authcookie = Office365(website, username=sharepointUsername, password=sharepointPassword).GetCookies()
site = Site(sharepointSite, version=Version.v2016, authcookie=authcookie)
folder = site.Folder(sharepointFolder)
FileName = "Data_Export.csv"
Query = "SELECT * FROM TABLE"
# to_csv() without a path returns the CSV text as a string
df = (spark.sql(Query)).toPandas().to_csv(header=True, index=False, encoding='utf-8')
folder.upload_file(df, FileName)
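The difference between the two versions comes down to pandas' DataFrame.to_csv: given a path, it writes a local file and returns None (so None was being uploaded); called without a path, it returns the CSV as a string. A minimal sketch of the working pattern as a reusable helper; the helper name is hypothetical, and `folder` is assumed to be a shareplum Folder or any object with an `upload_file(content, file_name)` method:

```python
import pandas as pd


def upload_dataframe_as_csv(folder, df: pd.DataFrame, file_name: str) -> str:
    """Serialise df to CSV text in memory and upload that text as file_name."""
    # No path argument: to_csv returns the CSV as a str instead of writing
    # a local file and returning None.
    csv_text = df.to_csv(index=False)
    folder.upload_file(csv_text, file_name)
    return csv_text
```

In Databricks this would be called as `upload_dataframe_as_csv(folder, spark.sql(Query).toPandas(), FileName)`.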

Python - Excel to HTML (keeping format)

I am trying to convert an Excel file to an HTML file while keeping the workbook's formatting.
In Excel, I can switch from xlsx to htm via File -> Save As -> Web page (*.html, *.htm).
Using Python, I always get gibberish, like the image below (not included), in workbook.htm or workbook.html.
# Attempt 1: xlwings
import xlwings as xw
file_path = "*.xlsx"
excel_app = xw.App(visible=False)
wb = excel_app.books.open(file_path)
wb.save("*.html")
wb.save("*.htm")
excel_app.quit()
# Attempt 2: xlsx2html
from xlsx2html import xlsx2html
xlsx2html('*.xlsx', '*.htm')
xlsx2html('*.xlsx', '*.html')
I have used dummy file names; I am just trying to go from the xlsx file to an htm/html file using Python while keeping the formatting, e.g. background colors, borders, etc.
I used to have the same problem. I also used the xlwings library; I customized it and it worked. Find the following in xlwings/_xlwindows.py and edit it as shown:
def save(self, path=None):
    saved_path = self.xl.Path
    source_ext = os.path.splitext(self.name)[1] if saved_path else None
    target_ext = os.path.splitext(path)[1] if path else '.xlsx'
    if saved_path and source_ext == target_ext:
        file_format = self.xl.FileFormat
    else:
        ext_to_file_format = {'.xlsx': FileFormat.xlOpenXMLWorkbook,
                              '.xlsm': FileFormat.xlOpenXMLWorkbookMacroEnabled,
                              '.xlsb': FileFormat.xlExcel12,
                              '.xltm': FileFormat.xlOpenXMLTemplateMacroEnabled,
                              '.xltx': FileFormat.xlOpenXMLTemplateMacroEnabled,
                              '.xlam': FileFormat.xlOpenXMLAddIn,
                              '.xls': FileFormat.xlWorkbookNormal,
                              '.xlt': FileFormat.xlTemplate,
                              '.xla': FileFormat.xlAddIn,
                              '.html': FileFormat.xlHtml  # ---> add new
                              }
    # ... rest of save() not shown
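An alternative that avoids patching the installed library is to call Excel's own Workbook.SaveAs through xlwings' `Book.api`, which on Windows exposes the underlying COM object; passing file format 44 (xlHtml) should be the scripted equivalent of File -> Save As -> Web page, so Excel itself keeps the formatting. A minimal sketch, assuming Windows with Excel installed; the helper names are hypothetical, and on macOS `Book.api` is an appscript object, so this SaveAs call would not apply there:

```python
import os
from pathlib import Path

XL_HTML = 44  # Excel's XlFileFormat.xlHtml ("Web page")


def html_target(xlsx_path: str, ext: str = ".htm") -> str:
    """Same folder and base name as the workbook, with a web-page extension."""
    return str(Path(xlsx_path).with_suffix(ext))


def save_as_web_page(xlsx_path: str, ext: str = ".htm") -> str:
    """Have Excel write the workbook as HTML (Windows + Excel only)."""
    import xlwings as xw

    target = os.path.abspath(html_target(xlsx_path, ext))
    app = xw.App(visible=False)
    try:
        app.display_alerts = False  # suppress Excel's compatibility prompts
        wb = app.books.open(xlsx_path)
        # Workbook.SaveAs(Filename, FileFormat): 44 writes a web page
        wb.api.SaveAs(target, XL_HTML)
        wb.close()
    finally:
        app.quit()
    return target
```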

File Not Found Error while Downloading Image files

I am using Windows 8.1. I have been doing a lot of web scraping recently and have managed to track down quite a few errors myself, but now I am stuck: the files will not download, and I get a FileNotFoundError.
I have removed all the unknown characters from the file names but still get this error. Any help?
I have also made the names lowercase, just in case. The error happens on the 22nd item; the items before it download fine.
My code, and the Excel file for reference:
import time
import pandas as pd
import requests
Final1 = pd.read_excel("Sneakers.xlsx")
Final1.index+=1
a = Final1.index.tolist()
Images = Final1["Images"].tolist()
Name = Final1["Name"].str.lower().tolist()
Brand = Final1["Brand"].str.lower().tolist()
s = requests.Session()
for i,n,b,l in zip(a,Name,Brand,Images):
    r = s.get(l).content
    with open("Images//" + f"{i}-{n}-{b}.jpg","wb") as f:
        f.write(r)
Excel file (Google Drive): Excel File
It seems you don't have an Images folder in your path.
It's better to use the os.path.join() function to join paths in Python.
Try the following:
import os
import time
import pandas as pd
import requests
Final1 = pd.read_excel("Sneakers.xlsx")
Final1.index+=1
a = Final1.index.tolist()
Images = Final1["Images"].tolist()
Name = Final1["Name"].str.lower().tolist()
Brand = Final1["Brand"].str.lower().tolist()
# Added
if not os.path.exists("Images"):
    os.mkdir("Images")
s = requests.Session()
for i,n,b,l in zip(a,Name,Brand,Images):
    r = s.get(l).content
    # with open("Images//" + f"{i}-{n}-{b}.jpg","wb") as f:
    with open(os.path.join("Images", f"{i}-{n}-{b}.jpg"),"wb") as f:
        f.write(r)
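If the Images folder already exists (the first 21 downloads do succeed), another possible cause is a character in that item's name or brand that Windows does not allow in file names. A "/" in particular is treated as a folder separator, so open() ends up looking for a subfolder that does not exist. A minimal sketch of sanitising the file name before opening it; `safe_filename` and `image_path` are hypothetical helpers, not part of the original code:

```python
import os
import re

# Characters Windows does not allow in file names: < > : " / \ | ? *
# plus the ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str, replacement: str = "_") -> str:
    """Replace characters Windows rejects and strip trailing spaces/dots."""
    cleaned = _INVALID.sub(replacement, name)
    cleaned = cleaned.rstrip(" .")  # Windows also rejects names ending in ' ' or '.'
    return cleaned or replacement


def image_path(folder: str, index: int, name: str, brand: str) -> str:
    """Build a safe path inside folder, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, safe_filename(f"{index}-{name}-{brand}.jpg"))
```

In the download loop, `open(image_path("Images", i, n, b), "wb")` would replace the hand-built path.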

Tabula does not recognize table

I have a simple Python program that takes in a PDF (containing a table) and saves the data to a CSV file using tabula:
import os
import tabula
if __name__ == '__main__':
    path = input('Filename: ')
    # Write the CSV next to the PDF, with the same base name
    dest = os.path.splitext(path)[0] + '.csv'
    print(dest)
    tabula.convert_into(path, dest, pages="all", output_format="csv")
I tried several different PDFs, for example one containing the following (image not included):
The result, however, is always an empty CSV file; tabula does not seem to recognize the tables.
Tabula isn't perfect at picking up tables. I would look into adding templates to give tabula more guidance. These templates could be dynamically generated depending on different features of the document. See the function tabula.read_pdf_with_template, documented here: https://tabula-py.readthedocs.io/en/latest/tabula.html#tabula.io.read_pdf_with_template.
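Before building templates, it may be worth trying tabula's two extraction modes explicitly: lattice mode targets tables drawn with ruling lines, stream mode targets tables separated only by whitespace. A minimal sketch, assuming tabula-py's `read_pdf` (which accepts `pages`, `lattice` and `stream` options and returns a list of DataFrames); the helper names are hypothetical, and `read_pdf` is injectable only so the logic can be tested without Java. Tabula only reads text-based PDFs, so if the table is a scanned image, every mode will come back empty:

```python
import os


def csv_path_for(pdf_path: str) -> str:
    """Same folder and base name as the PDF, with a .csv extension."""
    return os.path.splitext(pdf_path)[0] + ".csv"


def extract_tables(pdf_path: str, read_pdf=None):
    """Try lattice mode first, then stream mode; return the non-empty tables."""
    if read_pdf is None:
        import tabula
        read_pdf = tabula.read_pdf
    for mode in ({"lattice": True}, {"stream": True}):
        tables = read_pdf(pdf_path, pages="all", multiple_tables=True, **mode)
        non_empty = [t for t in tables if not t.empty]
        if non_empty:
            return non_empty
    return []


def pdf_to_csv(pdf_path: str, read_pdf=None) -> str:
    """Write all detected tables, stacked, to a CSV next to the PDF."""
    import pandas as pd

    tables = extract_tables(pdf_path, read_pdf)
    if not tables:
        raise ValueError(f"tabula found no tables in {pdf_path}")
    dest = csv_path_for(pdf_path)
    pd.concat(tables, ignore_index=True).to_csv(dest, index=False)
    return dest
```

If neither mode finds anything, the `area` option or a template from read_pdf_with_template is the next thing to try.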

Save selection of multiple Excel workbooks to one pdf with Python

I want to make a PDF composed of ranges from all the Excel workbooks located in a given folder (folderwithallfiles). All workbooks have the same structure, so the range reference is the same for all of them.
I thought I had it with the script below, but it does not work.
import win32com.client as win32
import glob
import os
xlfiles = sorted(glob.glob("*.xlsx"))
#print "Reading %d files..."%len(xlfiles)
cwd = r"C:\Users\user\folderwithallfiles"
#cwd = os.getcwd()
path_to_pdf = r'C:\Users\user\folderwithallfiles\multitest.pdf'
excel = win32.gencache.EnsureDispatch('Excel.Application')
for xlfile in xlfiles:
    wb = excel.Workbooks.Open(os.path.join(cwd, xlfile))
    ws = wb.Sheets('sheet 1')
    ws.Range("A1:Q59").Select()
    # Every iteration exports to the same path_to_pdf
    wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf)
Please check whether the code below works; I wrote it on the fly. Let me know if you find any issues with it.
import glob
import pandas as pd
import pdfkit as pdf  # pdfkit needs the wkhtmltopdf program installed
# Read every matching workbook and stack the rows into one DataFrame
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
frames = [pd.read_excel(f) for f in glob.glob(r"filepath\file*.xlsx")]
all_data = pd.concat(frames, ignore_index=True)
all_data.to_html(r"filepath\all_data.html")
pdf.from_file(r"filepath\all_data.html", r"filepath\all_data.pdf")
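Staying with the question's win32com approach instead, one way to end up with a single PDF is to export just the range from each workbook to its own PDF (Excel's Range object has an ExportAsFixedFormat method, with 0 meaning xlTypePDF) and then concatenate those files. A minimal sketch, assuming Windows with Excel and pywin32, plus the pypdf package (version 3 or later) for the concatenation; the helper names and the "_part.pdf" naming are hypothetical, and the per-workbook part files are left in place:

```python
import glob
import os


def pdf_parts_for(folder: str, pattern: str = "*.xlsx"):
    """Return sorted (workbook, part_pdf) pairs, one part PDF per workbook."""
    files = sorted(glob.glob(os.path.join(folder, pattern)))
    return [(f, os.path.splitext(f)[0] + "_part.pdf") for f in files]


def export_ranges_to_one_pdf(folder: str, out_pdf: str,
                             sheet: str = "sheet 1", cell_range: str = "A1:Q59"):
    """Export cell_range of each workbook, then merge the parts into out_pdf."""
    import win32com.client as win32  # Windows + Excel only
    from pypdf import PdfWriter       # assumption: pypdf >= 3 is installed

    excel = win32.Dispatch("Excel.Application")
    parts = []
    for xlsx, part in pdf_parts_for(folder):
        wb = excel.Workbooks.Open(os.path.abspath(xlsx))
        try:
            # Export only the given range; 0 = xlTypePDF
            wb.Sheets(sheet).Range(cell_range).ExportAsFixedFormat(
                0, os.path.abspath(part))
            parts.append(part)
        finally:
            wb.Close(False)  # close without saving
    writer = PdfWriter()
    for part in parts:
        writer.append(part)  # add all pages of each part, in folder order
    with open(out_pdf, "wb") as fh:
        writer.write(fh)
    return parts
```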
