Find and Replace text within headers with Win32COM - python

I'd like to find some words in the headers of a Word document and replace them with other words. I've done this in the body of the document with the following code, and it works fine:
import win32com.client
wdFindContinue = 1
wdReplaceAll = 2
app = win32com.client.DispatchEx("Word.Application")
app.Visible = 1
app.DisplayAlerts = 0
app.Documents.Open(document_path)
FromTo = {"<#TITLE#>":"My title", "<#DATE#>":"Today"}
for From in FromTo.keys():
    app.Selection.Find.Execute(From, False, False, False, False, False, True, wdFindContinue, False, FromTo[From], wdReplaceAll)
The problem is that this code doesn't work for headers and footers. I've also tried this:
app.ActiveDocument.Sections(1).Headers(win32com.client.constants.wdHeaderFooterPrimary).Range.Select
app.Selection.Find.Execute(From, False, False, False, False, False, True, wdFindContinue, False, FromTo[From], wdReplaceAll)
But that doesn't work either, even though I don't get any error message.
Does someone have an idea of how to do that? One more detail: there is an image inserted in the headers as well; I don't know whether that matters.

You must activate the header/footer pane after opening the document. The snippets below are Visual Basic; change the syntax to Python. Use
ActiveDocument.ActiveWindow.Panes(1).View.SeekView=wdSeekCurrentPageHeader
for the header, and
ActiveDocument.ActiveWindow.Panes(1).View.SeekView = wdSeekCurrentPageFooter
for the footer. Then run your search/replace. To switch the pane back to the main document, use
ActiveDocument.ActiveWindow.Panes(1).View.SeekView = wdSeekMainDocument
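A sketch of the same calls translated to Python, reusing app and FromTo from the question. The integer values are the WdSeekView (and wdPrintView) enumeration constants from the Word object model; with win32com.client.gencache.EnsureDispatch they are also available via win32com.client.constants:

wdPrintView = 3               # SeekView requires print layout view
wdSeekCurrentPageHeader = 9   # WdSeekView: current page header
wdSeekCurrentPageFooter = 10  # WdSeekView: current page footer
wdSeekMainDocument = 0        # WdSeekView: main document

app.ActiveDocument.ActiveWindow.View.Type = wdPrintView
# visit the header pane, then the footer pane, and run the replacements in each
for seek_view in (wdSeekCurrentPageHeader, wdSeekCurrentPageFooter):
    app.ActiveDocument.ActiveWindow.Panes(1).View.SeekView = seek_view
    for From in FromTo.keys():
        app.Selection.Find.Execute(From, False, False, False, False, False, True, wdFindContinue, False, FromTo[From], wdReplaceAll)

# switch back to the main document pane when done
app.ActiveDocument.ActiveWindow.Panes(1).View.SeekView = wdSeekMainDocument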

Related

Trying to make a local file dictionary and search system

I am trying to make a local file searcher, which will search for files based on tags and also by name. I don't have any idea how to build the searching system, nor how to structure the Python dictionary and search it by tags, which confuses me.
files = {'samplefile1.txt', 'samplefile2.txt'}
fileName = ''
fileDiscription = 'Enter Discription here'
isMP3File = True
isMP4File = True
isTxtFile = True
isArchived = True
tags = ['sample1', 'sample2', 'favorited']
filesDictionary = {
    'samplefile1.txt': {
        'fileName': 'coolFile1',
        'fileDiscription': 'cool disc.',
        'isMP3File': False,
        'isMP4File': False,
        'isTxtFile': True,
        'isArchived': False,
        'tags': ['sample1', 'favorited'],
    },
    'samplefile2.txt': {
        'fileName': 'coolFile2',
        'fileDiscription': 'cool disc2',
        'isMP3File': False,
        'isMP4File': False,
        'isTxtFile': True,
        'isArchived': True,
        'tags': ['sample2'],
    },
}
So in the code above, with the search function, it should show only samplefile1.txt when searched by 'sample1' or 'favorited', and samplefile2.txt when searched with 'sample2'.
(Also, fileName here is the display name I was talking about in this question, not the file name on the PC.)
(Also, any idea on how to automate adding to this files dictionary using a GUI? Something like how you would post to Twitter, with checkboxes and text boxes.)
Create a dictionary where you have each tag as a key and a list of filenames as the value.
Since you want to search by tag, having the tags as keys will make the search time constant.
searchDict = {
    'sample1': ['samplefile1.txt'],
    'favorited': ['samplefile1.txt'],
    'sample2': ['samplefile2.txt'],
}
Then, given a tag, you can get the filenames immediately:
searchDict['sample1'] # will return ['samplefile1.txt']
You can then use those filenames to access your main dictionary filesDictionary:
for filename in searchDict['sample1']:
    print(filesDictionary[filename])
will print
{
    'fileName': 'coolFile1',
    'fileDiscription': 'cool disc.',
    'isMP3File': False,
    'isMP4File': False,
    'isTxtFile': True,
    'isArchived': False,
    'tags': ['sample1', 'favorited']
}
To create the searchDict, you can iterate once over your database of files, collecting the tags and associating them with the filenames. It will be a costly operation if your database is big, but once it is done, your search will run in constant time.
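A minimal sketch of that one-time pass, assuming the filesDictionary structure from the question:

from collections import defaultdict

def build_search_dict(files_dict):
    # invert the file database: map each tag to the filenames that carry it
    search = defaultdict(list)
    for filename, info in files_dict.items():
        for tag in info['tags']:
            search[tag].append(filename)
    return dict(search)

searchDict = build_search_dict(filesDictionary)
# {'sample1': ['samplefile1.txt'], 'favorited': ['samplefile1.txt'], 'sample2': ['samplefile2.txt']}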

huggingface transformer question answer confidence score

How can we fetch the answer confidence score from the sample code of the Hugging Face Transformers question answering example? I see that the pipeline does return the score, but can the code below also return the confidence score?
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
questions = [
    "How many pretrained models are available in Transformers?",
    "What does Transformers provide?",
    "Transformers provides interoperability between which frameworks?",
]
for question in questions:
    inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(inputs)
    answer_start = tf.argmax(
        answer_start_scores, axis=1
    ).numpy()[0]  # Get the most likely beginning of answer with the argmax of the score
    answer_end = (
        tf.argmax(answer_end_scores, axis=1) + 1
    ).numpy()[0]  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
Code picked up from
https://huggingface.co/transformers/usage.html
The score is just the multiplication of the probabilities of the answer start token and the answer end token, obtained by applying the softmax function to the logits. Please have a look at the example below:
Pipeline output:
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
question = "How many pretrained models are available in Transformers?"
question_answerer = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(question_answerer(question=question, context=text))
Output:
{'score': 0.5254509449005127, 'start': 256, 'end': 264, 'answer': 'over 32+'}
Without pipeline:
inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
outputs = model(**inputs)
First, we create a mask that has a 1 for every context token and a 0 otherwise (question tokens and special tokens). We use the BatchEncoding.sequence_ids method for that:
non_answer_tokens = [x if x in [0,1] else 0 for x in inputs.sequence_ids()]
non_answer_tokens = torch.tensor(non_answer_tokens, dtype=torch.bool)
non_answer_tokens
Output:
tensor([False, False, False, False, False, False, False, False, False, False,
False, False, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, False])
We use this mask to set the logits of the special tokens and the question tokens to negative infinity and apply the softmax afterward (the negative infinity prevents these tokens from influencing the softmax result):
from torch.nn.functional import softmax
potential_start = torch.where(non_answer_tokens, outputs.start_logits, torch.tensor(float('-inf'),dtype=torch.float))
potential_end = torch.where(non_answer_tokens, outputs.end_logits, torch.tensor(float('-inf'),dtype=torch.float))
potential_start = softmax(potential_start, dim=1)
potential_end = softmax(potential_end, dim=1)
potential_start
Output:
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 1.0567e-04, 9.7031e-05, 1.9445e-06, 1.5849e-06, 1.2075e-07,
3.1704e-08, 4.7796e-06, 1.8712e-07, 6.2977e-08, 1.5481e-07, 8.0004e-08,
3.7896e-07, 1.6438e-07, 9.7762e-08, 1.0898e-05, 1.6518e-07, 5.6349e-08,
2.4848e-07, 2.1459e-07, 1.3785e-06, 1.0386e-07, 1.8803e-07, 8.1887e-08,
4.1088e-07, 1.5618e-07, 2.5624e-06, 1.8526e-06, 2.6710e-06, 6.8466e-08,
1.7953e-07, 3.6242e-07, 2.2788e-07, 2.3384e-06, 1.2147e-05, 1.6065e-07,
3.3257e-07, 2.6021e-07, 2.8140e-06, 1.3698e-07, 1.1066e-07, 2.8436e-06,
1.2171e-07, 9.9341e-07, 1.1684e-07, 6.8935e-08, 5.6335e-08, 1.3314e-07,
1.3038e-07, 7.9560e-07, 1.0671e-07, 9.1864e-08, 5.6394e-07, 3.0210e-08,
7.2176e-08, 5.4452e-08, 1.2873e-07, 9.2636e-08, 9.6012e-07, 7.8008e-08,
1.3124e-07, 1.3680e-06, 8.8716e-07, 8.6627e-07, 6.4750e-06, 2.5951e-07,
6.1648e-07, 8.7724e-07, 1.0796e-05, 2.6633e-07, 5.4644e-07, 1.7553e-07,
1.6015e-05, 5.0054e-07, 8.2263e-07, 2.6336e-06, 2.0743e-05, 4.0008e-07,
1.9330e-06, 2.0312e-04, 6.0256e-01, 3.9638e-01, 3.1568e-04, 2.2009e-05,
1.2485e-06, 2.4744e-06, 1.0092e-05, 3.1047e-06, 1.3597e-04, 1.5105e-06,
1.4960e-06, 8.1164e-08, 1.6534e-06, 4.6181e-07, 8.7354e-08, 2.2356e-07,
9.1145e-07, 8.8194e-06, 4.4202e-07, 1.9238e-07, 2.8077e-07, 1.4117e-05,
2.0613e-07, 1.2676e-06, 8.1317e-08, 2.2337e-06, 1.2399e-07, 6.1745e-08,
3.4725e-08, 2.7878e-07, 4.1457e-07, 0.0000e+00]],
grad_fn=<SoftmaxBackward>)
These probabilities can now be used to extract the start and end token of the answer and to calculate the answer score:
answer_start = torch.argmax(potential_start)
answer_end = torch.argmax(potential_end)
answer = tokenizer.decode(inputs.input_ids.squeeze()[answer_start:answer_end+1])
print(potential_start.squeeze()[answer_start])
print(potential_end.squeeze()[answer_end])
print(potential_start.squeeze()[answer_start] * potential_end.squeeze()[answer_end])
print(answer)
Output:
tensor(0.6026, grad_fn=<SelectBackward>)
tensor(0.8720, grad_fn=<SelectBackward>)
tensor(0.5255, grad_fn=<MulBackward0>)
over 32 +
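Applied to the original TF snippet from the question, the same computation can be done directly; a rough sketch using the variables from inside the question's for-loop (since that code does not mask question and special tokens before the softmax, the value can differ slightly from the pipeline's score):

# inside the question's for-loop, after answer_start and answer_end are computed
start_probs = tf.nn.softmax(answer_start_scores, axis=1)
end_probs = tf.nn.softmax(answer_end_scores, axis=1)
# answer_end was shifted by +1 for slicing; the end token itself is at answer_end - 1
score = (start_probs[0, answer_start] * end_probs[0, answer_end - 1]).numpy()
print(f"Score: {score}")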
P.S.: Please keep in mind that this answer does not cover any special cases, such as the end token appearing before the start token; one way to handle that is sketched below.
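A minimal sketch of such handling, reusing the potential_start and potential_end tensors from above: score all (start, end) pairs jointly and keep only the pairs where the end does not precede the start.

# joint[i, j] = probability that the answer starts at token i and ends at token j
joint = potential_start.squeeze().unsqueeze(1) * potential_end.squeeze().unsqueeze(0)
joint = torch.triu(joint)            # zero out invalid pairs where end < start
idx = torch.argmax(joint).item()     # flat index of the best valid pair
seq_len = joint.shape[1]
answer_start, answer_end = idx // seq_len, idx % seq_len
score = joint[answer_start, answer_end]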

How to Group by Date Field in a PivotTable using win32com.client

I have tried many approaches over the last few hours, but no luck. Can somebody please help me?
group_dt = pt.PivotFields('Created')
group_dt.LabelRange.Group(Start=True, End=True, Periods=Array(False, False, False, False, True, False, True))
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-75-0355ab1abb88> in <module>
1 group_dt = pt.PivotFields('Created')
----> 2 group_dt.LabelRange.Group(Start=True, End=True, Periods=Array(False, False, False, False, True, False, True))
TypeError: 'str' object is not callable
After a lot of research, I figured out the way to group a date field using win32com.client (Array is a VBA function, not Python; a plain Python list works instead):
cell = pivot_sheet.Range('B5')  # a cell inside the date field of the pivot table
# Periods flags, in order: Seconds, Minutes, Hours, Days, Months, Quarters, Years
cell.Group(Start=True, End=True, Periods=[False, False, False, False, True, False, True])

ConfigObj: prevent writing empty sections

I'm using ConfigObj (5.0.6, on both Python 2.7 and Python 3.8) to manage my configs, but when I write a config to file, sections that are present only in the configspec appear as empty sections, which is not desired. I would appreciate any suggestions for fixing this behaviour of ConfigObj.
Minimal example of what is happening:
from configobj import ConfigObj
from validate import Validator
spec = ["[Section]", "option = boolean(default=True)"]
config = ConfigObj(infile={'Section2': {'option2': False}}, configspec=spec)
config.validate(Validator())
print(config)
print(config.write())
Output:
{'Section2': {'option2': False}, 'Section': {'option': True}}
['[Section2]', ' option2 = False', '[Section]']
Desired output (there should be no empty sections when writing):
{'Section2': {'option2': False}, 'Section': {'option': True}}
['[Section2]', ' option2 = False']
Edit 1: I'm using write() to actually write into a file, so I would prefer not to just post-process the returned list of strings.
To put the default values in the output config file, pass copy=True to validate():
from configobj import ConfigObj
from validate import Validator
spec = ["[Section]", "option = boolean(default=True)"]
config = ConfigObj(infile={'Section2': {'option2': False}}, configspec=spec)
config.validate(Validator(), copy=True)  # note the copy=True
print(config)
print(config.write())
which no longer writes an empty [Section], because its default values are written out explicitly:
{'Section2': {'option2': False}, 'Section': {'option': True}}
['[Section2]', 'option2 = False', '[Section]', 'option = True']
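If you would rather get the literal desired output (no defaulted section written at all), another option is to prune empty sections after validation and before writing; a minimal sketch, assuming only one level of nesting:

# delete sections that are still empty after validation, then write
for name in list(config.sections):
    if not config[name]:  # no options and no subsections
        del config[name]
config.write()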

How to Escape true/false boolean at python JSON string

I have the following code:
headers = {'Content-Type': 'application/json', 'cwauth-token': token}
payload = {'namePostfix': 'test99682', 'costModel': 'NOT_TRACKED', 'clickRedirectType': 'REGULAR', 'trafficSource':{'id': '3a7ff9ec-19af-4996-94c1-7f33e036e7af'}, 'redirectTarget': 'DIRECT_URL', 'client':{'id': 'clentIDc', 'clientCode': 'xxx', 'mainDomain': 'domain.tld', 'defaultDomain': 'domain.tld'', 'dmrDomain': 'domain.tld'', 'customParam1Available': false, 'realtimeRoutingAPI': false, 'rootRedirect': false}, 'country':{'code': 'UK'}, 'directRedirectUrl': 'http://google.co.uk'}
r = requests.post('http://stackoverflow.com', json=payload, headers=headers)
When I run it, it gives this error:
NameError: name 'false' is not defined
How can I escape those false booleans in the payload?
Python doesn't use false; it uses False. Hence you're getting a NameError, because Python is looking for a variable called false, which doesn't exist.
Replace false with False in your dictionary. You've also got a few too many quotes in places, so I've removed those:
payload = {'namePostfix': 'test99682', 'costModel': 'NOT_TRACKED', 'clickRedirectType': 'REGULAR', 'trafficSource':{'id': '3a7ff9ec-19af-4996-94c1-7f33e036e7af'}, 'redirectTarget': 'DIRECT_URL', 'client':{'id': 'clentIDc', 'clientCode': 'xxx', 'mainDomain': 'domain.tld', 'defaultDomain': 'domain.tld', 'dmrDomain': 'domain.tld', 'customParam1Available': False, 'realtimeRoutingAPI': False, 'rootRedirect': False}, 'country':{'code': 'UK'}, 'directRedirectUrl': 'http://google.co.uk'}
Likewise, the opposite boolean value is True (not true), and the "null" data type is None.
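No escaping is needed after that: when the dictionary is serialized to JSON (which requests does for the json=payload argument), Python's False is converted to JSON's false automatically. A quick check:

import json

payload = {'customParam1Available': False, 'rootRedirect': False}
print(json.dumps(payload))
# prints: {"customParam1Available": false, "rootRedirect": false}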
False is the correct name to use in Python: the boolean values are written with capitalized True and False.
