Add custom properties to Excel workbook with openpyxl

Add custom properties to Excel workbook with openpyxl - python

With VBA, I can edit arbitrary workbook metadata like so, and it will be reflected on SharePoint:
With ThisWorkbook
.ContentTypeProperties("Property A") = 1
.ContentTypeProperties("Prop B") = “Something”
End With
Now, I am hoping to do the same with openpyxl
I can do this for properties without spaces:
wb.properties.title = 'test'
but properties with spaces won't work--I try this and the script runs, but nothing shows on SharePoint:
setattr(wb.properties, 'Project Title', 'hello')
wb.properties.__dict__['Project Number'] = '12'
This can be done with xlsxwriter
import xlsxwriter
workbook = xlsxwriter.Workbook(wb_path)
workbook.set_custom_property('Project Title', 'hello')
workbook.close()
but it will create a new workbook...
According to this https://foss.heptapod.net/openpyxl/openpyxl/-/merge_requests/384/diffs?commit_id=e00ce36aa92ae4fffa7014e460a8999681d73b8b
I could simply do
wb.custom_doc_props.add(k, v), but I'm getting no attribute custom_doc_props with the latest version (I believe), 3.0.10.
I installed the 3.2.0b1 version with pip, and now get AttributeError: 'CustomDocumentPropertyList' object has no attribute 'add'. I guess the method isn't fully implemented yet

This works with version 3.1.0 of openpyxl. You can download via pip with
python -m pip install https://foss.heptapod.net/openpyxl/openpyxl/-/archive/branch/3.1/openpyxl-branch-3.1.zip
and assign properties like so
from openpyxl.packaging.custom import (
BoolProperty,
DateTimeProperty,
FloatProperty,
IntProperty,
LinkProperty,
StringProperty,
CustomPropertyList,
)
props = CustomPropertyList()
props.append(StringProperty(name='hello world', value='foo bar'))
wb.custom_doc_props = props
wb.save(...)
Data is preserved on SharePoint. More info here: https://foss.heptapod.net/openpyxl/openpyxl/-/blob/branch/3.1/doc/workbook_custom_doc_props.rst

Related

Add notes using openpyxl

everyone. I'm looking for a way to make notes in excel worksheet, using python. Found a way to add comments, but I need notes like on screenshot. Is there an easy way to add them using openpyxl or any other lib? screenshot of a note

Updating this because this post is the top google result
The openpyxl docs incorrectly refer to notes as comments. Just follow the instructions in the docs on how to add comments: https://openpyxl.readthedocs.io/en/latest/comments.html. There is one issue with their example code though in the docs, so use the below code instead:
from openpyxl import Workbook
from openpyxl.comments import Comment
wb = Workbook()
ws = wb.active
comment = Comment('This is the comment text', 'Comment Author')
ws["A1".comment] = comment
wb.save('your_file.xlsx')

I've been trying to accomplish the same thing. Apparently that "note" is the same as data validation as described in the docs.
So what you do is:
from openpyxl import load_workbook
from openpyxl.worksheet.datavalidation import DataValidation
wb = load_workbook('my_sheets.xlsx')
# Create 'note'
dv = DataValidation()
dv.errorTitle = 'Your note title'
dv.error = 'Your note body'
# Add 'note' to A1 in the active sheet
dv.add(wb.active['A1'])
# This is required also, or you won't see the note
wb.active.add_data_validation(dv)
wb.save('my_sheet_with_note.xlsx')
It also mentions prompts, which is something you can look into:
# Optionally set a custom prompt message
dv.promptTitle = 'List Selection'
dv.prompt = 'Please select from the list'
Edit: I updated the answer with a part of the code that solved my problem at the time:
def add_note(cell, prompt_title='', prompt=''):
p = {
'promptTitle': prompt_title[:32],
'prompt': prompt[:255],
}
dv = DataValidation(**p)
dv.add(cell)
cell.parent.add_data_validation(dv)

Issues / question with batch update with pygsheets / google sheets

Ok I'm new to python but...I really like it. I have been trying to figure this out for awhile and thought someone could help that knows a lot more than I.
So what I would like to do is use pygsheets and combine batch the updates with one api call vs several. I have been searching for examples or ideas and found if you unlink and link it will do this? I tried and it speed it up only a little bit, then I looked and you could use update.values vs update.value. I have got it to work with the something like this wk1.update_values('A2:C4',[[1,2,3],[4,5,6],[7,8,9]]) but what if you want the updates to be in specific cell locations vs a range like a2:c4? I appreciate any advice in advance.
https://pygsheets.readthedocs.io/en/latest/worksheet.html#pygsheets.Worksheet.update_values
https://pygsheets.readthedocs.io/en/latest/sheet_api.html?highlight=batch_updates#pygsheets.sheet.SheetAPIWrapper.values_batch_update
import pygsheets
gc = pygsheets.authorize() # This will create a link to authorize
# Open spreadsheet
GS_ID = ''
File_Tab_Name = 'File1'
Main_Topic = 'Main Topic'
Actual_Company_Name = 'Company Name'
Street = 'Street Address'
City_State_Zip = 'City State Zip'
Phone_Number = 'Phone Number'
# 2. Open spreadsheet by key
sh = gc.open_by_key(GS_ID)
sh.title = File_Tab_Name
wk1 = sh[0]
wk1.title = File_Tab_Name
#wk1.update_values('A2:C4',[[1,2,3],[4,5,6],[7,8,9]])
wk1.update_values([['a1'],['h1'],['i3']],[[Main_Topic],[Actual_Company_Name],[Street]]) ### is this possible
#wk1.unlink()
#wk1.title = File_Tab_Name
#wk1.update_value("a1",Main_Topic) ###Topic
#wk1.update_value("h1",Actual_Company_Name) ###Company Name
#wk1.update_value("i3",Street) ###Street Address
#wk1.update_value("i4",City_State_Zip) ###City State Zip
#wk1.update_value("i5",Phone_Number) ### Phone Number
#wk1.link() # will do all the updates

From what I could undersand you want to batch update values. you can use the update_values_batch function.
wks.update_values_batch(['A1:A2', 'B1:B2'], [[[1],[2]], [[3],[4]]])
# or
wks.update_values_batch([((1,1), (2,1)), 'B1:B2'], [[[1,2]], [[3,4]]], 'COLUMNS')
# or
wks.update_values_batch(['A1:A2', 'B1:B2'], [[[1,2]], [[3,4]]], 'COLUMNS')
see doc here.
NB: update pygsheets to latest version or install from gitub
pip install --upgrade https://github.com/nithinmurali/pygsheets/archive/staging.zip

Unfortunately, pygsheets has no method for updating multiple ranges in batch. Instead, you can use gspread.
gspread has batch_update method where you can update multiple cell or range at once.
Example:
Code:
import gspread
gc = gspread.service_account()
sh = gc.open_by_key("insert spreadsheet key here").sheet1
sh.batch_update([{
'range': 'A1:B1',
'values': [['42', '43']],
}, {
'range': 'A2:B2',
'values': [['44', '45']],
}])
Output:
References:
gspread:batch_update()
gspread Authentication

How to fill PDF forms using Python

I have a PDF form created using Adobe LiveCycle Designer ES 10.4. I need to fill it using Python so that we can reduce manual labor. I searched the web and read some article most of them were focused around pdfrw library, I tried using it and extracted some information from PDF form as shown below
Code
from pdfrw import PdfReader
pdf = PdfReader('sample.pdf')
print(pdf.keys())
print(pdf.Info)
print(pdf.Root.keys())
print('PDF has {} pages'.format(len(pdf.pages)))
Output
['/Root', '/Info', '/ID', '/Size']
{'/CreationDate': "(D:20180822164509+05'30')", '/Creator': '(Adobe LiveCycle Designer ES 10.4)', '/ModDate': "(D:20180822165611+05'30')", '/Producer': '(Adobe XML Form Module Library)'}
['/AcroForm', '/MarkInfo', '/Metadata', '/Names', '/NeedsRendering', '/Pages', '/Perms', '/StructTreeRoot', '/Type']
PDF has 1 pages
I am not sure how further I can use pdfrw to access the fillable fields from the PDF form and fill them using Python is it possible. Any suggestions would be helpful.

You can find the form fields here:
pdf.Root.AcroForm.Fields
or here
pdf.Root.Pages.Kids[page_index].Annots
This is a PdfArray object. Basically a List.
The Name of the field is found here:
pdf.Root.AcroForm.Fields[field_index].T
Other keys include the value .V
There's a bunch of display information, like the font etc under .AP.N.Resources
However, if you update the value for a field and output the pdf file. It might only display the value when the field has focus i.e is clicked on.
I haven't figured out how to fix that yet.

I wrote a library built upon:'pdfrw', 'pdf2image', 'Pillow', 'PyPDF2' called fillpdf (pip install fillpdf and poppler dependency conda install -c conda-forge poppler)
Basic usage:
from fillpdf import fillpdfs
fillpdfs.get_form_fields("blank.pdf")
# returns a dictionary of fields
# Set the returned dictionary values a save to a variable
# For radio boxes ('Off' = not filled, 'Yes' = filled)
data_dict = {
'Text2': 'Name',
'Text4': 'LastName',
'box': 'Yes',
}
fillpdfs.write_fillable_pdf('blank.pdf', 'new.pdf', data_dict)
# If you want it flattened:
fillpdfs.flatten_pdf('new.pdf', 'newflat.pdf')
More info here:
https://github.com/t-houssian/fillpdf
If some fields don't fill, you can use fitz (pip install PyMuPDF) and PyPDF2 (pip install PyPDF2) like the following altering the points as needed:
import fitz
from PyPDF2 import PdfFileReader
file_handle = fitz.open('blank.pdf')
pdf = PdfFileReader(open('blank.pdf','rb'))
box = pdf.getPage(0).mediaBox
w = box.getWidth()
h = box.getHeight()
# For images
image_rectangle = fitz.Rect((w/2)-200,h-255,(w/2)-100,h-118)
pages = pdf.getNumPages() - 1
last_page = file_handle[pages]
last_page._wrapContents()
last_page.insertImage(image_rectangle, filename=f'image.png')
# For text
last_page.insertText(fitz.Point((w/2)-247 , h-478), 'John Smith', fontsize=14, fontname="times-bold")
file_handle.save(f'newpdf.pdf')

Use this to fill every fields if they are indexed.
template = PdfReader('template.pdf')
page_c = 0
while page_c < len(template.Root.Pages.Kids): #LOOP through pages
annot_c = 0
while annot_c < len(template.Root.Pages.Kids[page_c].Annots): #LOOP through fields
template.Root.Pages.Kids[page_c].Annots[annot_c].update(PdfDict(V=str(annot_c)+'-'+str(page_c)))
annot_c=annot_c+1
page_c=page_c+1
PdfWriter().write('output.pdf', template)

AcroForm based Forms using PDFix SDK
def SetFormFieldValue(email, key, open_path, save_path):
pdfix = GetPdfix()
if pdfix is None:
raise Exception('Pdfix Initialization fail')
if not pdfix.Authorize(pdfix_email, pdfix_license):
raise Exception('Authorization fail : ' + pdfix.GetError())
doc = pdfix.OpenDoc(open_path, "")
if doc is None:
raise Exception('Unable to open pdf : ' + pdfix.GetError())
field = doc.GetFormFieldByName("Text1")
if field is not None:
value = field.GetValue()
value = "New Value"
field.SetValue(value)
if not doc.Save(save_path, kSaveFull):
raise Exception(pdfix.GetError())
doc.Close()
pdfix.Destroy()

A full solution was provided here: How to edit editable pdf using the pdfrw library?
The key part is the:
template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))

References from Python Bigquery Client don't work

Im having troubles running the following code:
from google.cloud import bigquery
client = bigquery.Client.from_service_account_json(BQJSONKEY,project = BQPROJECT)
dataset = client.dataset(BQDATASET)
assert not dataset.exists()
The following error pop up:
'DatasetReference' object has no attribute 'exists'
Similarly when i do:
table = dataset.table(BQTABLE)
i get: 'TableReference' object has no attribute 'exists'
However, according to the docs it should work:
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery/usage.html#datasets
here is my pip freeze (the part with google-cloud):
gapic-google-cloud-datastore-v1==0.15.3
gapic-google-cloud-error-reporting-v1beta1==0.15.3
gapic-google-cloud-logging-v2==0.91.3
gevent==1.2.2
glob2==0.5
gmpy2==2.0.8
google-api-core==0.1.1
google-auth==1.2.1
google-cloud==0.30.0
google-cloud-bigquery==0.28.0
google-cloud-bigtable==0.28.1
google-cloud-core==0.28.0
google-cloud-datastore==1.4.0
google-cloud-dns==0.28.0
google-cloud-error-reporting==0.28.0
google-cloud-firestore==0.28.0
google-cloud-language==1.0.0
google-cloud-logging==1.4.0
google-cloud-monitoring==0.28.0
google-cloud-pubsub==0.29.1
google-cloud-resource-manager==0.28.0
google-cloud-runtimeconfig==0.28.0
google-cloud-spanner==0.29.0
google-cloud-speech==0.30.0
google-cloud-storage==1.6.0
google-cloud-trace==0.16.0
google-cloud-translate==1.3.0
google-cloud-videointelligence==0.28.0
google-cloud-vision==0.28.0
google-gax==0.15.16
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
I wonder how can i fix it and make it work?

Not sure how you got to this docs but you should be using these as reference:
https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#datasets
Code for 0.28 would be something like:
dataset_refence = client.dataset(BQDATASET)
dataset = client.get_dataset(dataset_reference)
assert dataset.created is not None

I think you forgot to create the dataset before calling exists()
dataset = client.dataset(BQDATASET)
dataset.create()
assert not dataset.exists()

AutoFilter method of Range class failed (Dispatch vs EnsureDispatch)

This code fails with error: "AutoFilter method of Range class failed"
from win32com.client.gencache import EnsureDispatch
excel = EnsureDispatch('Excel.Application')
excel.Visible = 1
workbook = excel.Workbooks.Add()
sheet = workbook.ActiveSheet
sheet.Cells(1, 1).Value = 'Hello world'
sheet.Columns.AutoFilter()
This code also fails although it used to work:
from win32com.client import Dispatch
excel = Dispatch('Excel.Application')
excel.Visible = 1
workbook = excel.Workbooks.Add()
sheet = excel.ActiveSheet
sheet.Cells(1, 1).Value = 'Hello world'
sheet.Columns.AutoFilter()

Python uses win32com to communicate directly with Windows applications, and can work with (via EnsureDispatch) or without (via Dispatch) prior knowledge of the application's API. When you call EnsureDispatch, the API is fetched and written into win32com.gen_py., thereby permanently adding the application's API into your Python library.
Once you've initialised an application with EnsureDispatch, any time that a script uses Dispatch for that application, it will be given the pre-fetched API. This is good, because you can then make use of the predefined application constants (from win32com.client import constants).
However, sometimes previously working code will break. For example, in the following code, AutoFilter() will work without an argument as long as the Excel API has never previously been cached in the library...
# ExcelAutoFilterTest1
# Works unless you ever previously called EnsureDispatch('Excel.Application')
from win32com.client import Dispatch
excel = Dispatch('Excel.Application')
excel.Visible = 1
workbook = excel.Workbooks.Add()
sheet = workbook.ActiveSheet
sheet.Cells(1, 1).Value = 'Hello world'
sheet.Columns.AutoFilter()
The following code will always fail because now the Excel API has been fetched and written to win32com.gen_py.00020813-0000-0000-C000-000000000046x0x1x7 in your Python library, it will no longer accept AutoFilter() without an argument.
# ExcelAutoFilterTest2
# Always fails with error: AutoFilter method of Range class failed
from win32com.client.gencache import EnsureDispatch
excel = EnsureDispatch('Excel.Application')
excel.Visible = 1
workbook = excel.Workbooks.Add()
sheet = workbook.ActiveSheet
sheet.Cells(1, 1).Value = 'Hello world'
sheet.Columns.AutoFilter()
The following code always works because we're now providing the VisibleDropDown argument (1=on, 0=off).
# ExcelAutoFilterTest3
# Always succeeds
from win32com.client.gencache import EnsureDispatch
excel = EnsureDispatch('Excel.Application')
excel.Visible = 1
workbook = excel.Workbooks.Add()
sheet = workbook.ActiveSheet
sheet.Cells(1, 1).Value = 'Hello world'
sheet.Columns.AutoFilter(1)
This seems to be a bug, because the Excel API documentation claims that all arguments to AutoFilter are optional:
"If you omit all the arguments, this method simply toggles the display
of the AutoFilter drop-down arrows in the specified range."

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Add custom properties to Excel workbook with openpyxl - python

Related

Add notes using openpyxl

Issues / question with batch update with pygsheets / google sheets

How to fill PDF forms using Python

References from Python Bigquery Client don't work

AutoFilter method of Range class failed (Dispatch vs EnsureDispatch)

Categories

Resources