I am trying to make a local file searcher that will search for files based on tags, and also by name. I don't have any idea how to build the searching system, nor how to set up the Python dictionary; searching with tags is what confuses me.
files = ['samplefile1.txt', 'samplefile2.txt']
fileName = ''
fileDiscription = 'Enter description here'
isMP3File = True
isMP4File = True
isTxtFile = True
isArchived = True
tags = ['sample1', 'sample2', 'favorited']
filesDictionary = {
    'samplefile1.txt': {
        'fileName': 'coolFile1',
        'fileDiscription': 'cool disc.',
        'isMP3File': False,
        'isMP4File': False,
        'isTxtFile': True,
        'isArchived': False,
        'tags': ['sample1', 'favorited']
    },
    'samplefile2.txt': {
        'fileName': 'coolFile2',
        'fileDiscription': 'cool disc2',
        'isMP3File': False,
        'isMP4File': False,
        'isTxtFile': True,
        'isArchived': True,
        'tags': ['sample2']
    },
}
So in the code above, a search function should show only samplefile1.txt when searching for 'sample1' or 'favorited', and samplefile2.txt when searching for 'sample2'.
(Also, fileName is the name I was talking about in this question, not the file name on the PC.)
(Also, any idea how to automate adding entries to this files dictionary using a GUI? Something like how you would post to Twitter, with checkboxes and message boxes.)
Create a dictionary where each tag is a key and a list of the filenames carrying that tag is the value.
Since you want to search by tag, having the tags as keys makes the lookup time constant.
searchDict = {
    'sample1': ['samplefile1.txt'],
    'favorited': ['samplefile1.txt'],
    'sample2': ['samplefile2.txt']
}
Then, given a tag, you can get the filenames immediately:
searchDict['sample1'] # will return ['samplefile1.txt']
You can then use each filename to access your main dictionary, filesDictionary:
for filename in searchDict['sample1']:
    print(filesDictionary[filename])
This will print:
{
    'fileName': 'coolFile1',
    'fileDiscription': 'cool disc.',
    'isMP3File': False,
    'isMP4File': False,
    'isTxtFile': True,
    'isArchived': False,
    'tags': ['sample1', 'favorited']
}
To create searchDict, you can iterate once over your database of files, getting the tags and associating them with the filenames. This is a costly operation if your database is big, but once it is done, your searches run in constant time.
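A minimal sketch of that one-time pass, assuming the filesDictionary structure from the question, plus a small search helper:
# Build the tag index once: tag -> list of filenames carrying that tag.
searchDict = {}
for filename, info in filesDictionary.items():
    for tag in info['tags']:
        searchDict.setdefault(tag, []).append(filename)

def search(tag):
    """Return the entries of every file carrying the given tag."""
    return [filesDictionary[name] for name in searchDict.get(tag, [])]

print(search('favorited'))  # [{'fileName': 'coolFile1', ...}]
setdefault creates the list the first time a tag appears, so new tags never need to be registered beforehand, and unknown tags simply return an empty list.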
How can we fetch the answer confidence score from the Hugging Face Transformers question-answering sample code? I see that the pipeline does return the score, but can the code below also return the confidence score?
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
questions = [
"How many pretrained models are available in Transformers?",
"What does Transformers provide?",
"Transformers provides interoperability between which frameworks?",
]
for question in questions:
    inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(inputs)
    answer_start = tf.argmax(
        answer_start_scores, axis=1
    ).numpy()[0]  # Get the most likely beginning of answer with the argmax of the score
    answer_end = (
        tf.argmax(answer_end_scores, axis=1) + 1
    ).numpy()[0]  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
Code picked up from
https://huggingface.co/transformers/usage.html
The score is just the product of the start token's probability and the end token's probability, i.e. the start and end logits after applying the softmax function. Please have a look at the example below:
Pipeline output:
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
question = "How many pretrained models are available in Transformers?"
question_answerer = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(question_answerer(question=question, context=text))
Output:
{'score': 0.5254509449005127, 'start': 256, 'end': 264, 'answer': 'over 32+'}
Without pipeline:
inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
outputs = model(**inputs)
First, we create a mask that has a 1 for every context token and a 0 otherwise (question tokens and special tokens). We use the BatchEncoding.sequence_ids method for that:
non_answer_tokens = [x if x in [0,1] else 0 for x in inputs.sequence_ids()]
non_answer_tokens = torch.tensor(non_answer_tokens, dtype=torch.bool)
non_answer_tokens
Output:
tensor([False, False, False, False, False, False, False, False, False, False,
False, False, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, False])
We use this mask to set the logits of the special tokens and the question tokens to negative infinity and apply the softmax afterward (the negative infinity prevents these tokens from influencing the softmax result):
from torch.nn.functional import softmax
potential_start = torch.where(non_answer_tokens, outputs.start_logits, torch.tensor(float('-inf'), dtype=torch.float))
potential_end = torch.where(non_answer_tokens, outputs.end_logits, torch.tensor(float('-inf'), dtype=torch.float))
potential_start = softmax(potential_start, dim=1)
potential_end = softmax(potential_end, dim=1)
potential_start
Output:
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 1.0567e-04, 9.7031e-05, 1.9445e-06, 1.5849e-06, 1.2075e-07,
3.1704e-08, 4.7796e-06, 1.8712e-07, 6.2977e-08, 1.5481e-07, 8.0004e-08,
3.7896e-07, 1.6438e-07, 9.7762e-08, 1.0898e-05, 1.6518e-07, 5.6349e-08,
2.4848e-07, 2.1459e-07, 1.3785e-06, 1.0386e-07, 1.8803e-07, 8.1887e-08,
4.1088e-07, 1.5618e-07, 2.5624e-06, 1.8526e-06, 2.6710e-06, 6.8466e-08,
1.7953e-07, 3.6242e-07, 2.2788e-07, 2.3384e-06, 1.2147e-05, 1.6065e-07,
3.3257e-07, 2.6021e-07, 2.8140e-06, 1.3698e-07, 1.1066e-07, 2.8436e-06,
1.2171e-07, 9.9341e-07, 1.1684e-07, 6.8935e-08, 5.6335e-08, 1.3314e-07,
1.3038e-07, 7.9560e-07, 1.0671e-07, 9.1864e-08, 5.6394e-07, 3.0210e-08,
7.2176e-08, 5.4452e-08, 1.2873e-07, 9.2636e-08, 9.6012e-07, 7.8008e-08,
1.3124e-07, 1.3680e-06, 8.8716e-07, 8.6627e-07, 6.4750e-06, 2.5951e-07,
6.1648e-07, 8.7724e-07, 1.0796e-05, 2.6633e-07, 5.4644e-07, 1.7553e-07,
1.6015e-05, 5.0054e-07, 8.2263e-07, 2.6336e-06, 2.0743e-05, 4.0008e-07,
1.9330e-06, 2.0312e-04, 6.0256e-01, 3.9638e-01, 3.1568e-04, 2.2009e-05,
1.2485e-06, 2.4744e-06, 1.0092e-05, 3.1047e-06, 1.3597e-04, 1.5105e-06,
1.4960e-06, 8.1164e-08, 1.6534e-06, 4.6181e-07, 8.7354e-08, 2.2356e-07,
9.1145e-07, 8.8194e-06, 4.4202e-07, 1.9238e-07, 2.8077e-07, 1.4117e-05,
2.0613e-07, 1.2676e-06, 8.1317e-08, 2.2337e-06, 1.2399e-07, 6.1745e-08,
3.4725e-08, 2.7878e-07, 4.1457e-07, 0.0000e+00]],
grad_fn=<SoftmaxBackward>)
These probabilities can now be used to extract the start and end token of the answer and to calculate the answer score:
answer_start = torch.argmax(potential_start)
answer_end = torch.argmax(potential_end)
answer = tokenizer.decode(inputs.input_ids.squeeze()[answer_start:answer_end+1])
print(potential_start.squeeze()[answer_start])
print(potential_end.squeeze()[answer_end])
print(potential_start.squeeze()[answer_start] * potential_end.squeeze()[answer_end])
print(answer)
Output:
tensor(0.6026, grad_fn=<SelectBackward>)
tensor(0.8720, grad_fn=<SelectBackward>)
tensor(0.5255, grad_fn=<MulBackward0>)
over 32 +
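Applied to the TensorFlow snippet from the question, the same calculation carries over directly: softmax the start and end logits, then multiply the two probabilities. A rough sketch of the extra lines inside that loop, assuming the variables defined there (note that without the context-token masking shown above, question and special tokens also receive probability mass, so the value can differ slightly from the pipeline's):
probs_start = tf.nn.softmax(answer_start_scores, axis=1)  # logits -> probabilities
probs_end = tf.nn.softmax(answer_end_scores, axis=1)
# Start-token probability times end-token probability
# (answer_end was shifted by +1 in the loop, hence the -1 here).
score = (probs_start[0, answer_start] * probs_end[0, answer_end - 1]).numpy()
print(f"Score: {score}")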
P.S.: Please keep in mind that this answer does not cover any special cases (e.g., the end token occurring before the start token).
I have the following code:
headers = {'Content-Type': 'application/json', 'cwauth-token': token}
payload = {'namePostfix': 'test99682', 'costModel': 'NOT_TRACKED', 'clickRedirectType': 'REGULAR', 'trafficSource':{'id': '3a7ff9ec-19af-4996-94c1-7f33e036e7af'}, 'redirectTarget': 'DIRECT_URL', 'client':{'id': 'clentIDc', 'clientCode': 'xxx', 'mainDomain': 'domain.tld', 'defaultDomain': 'domain.tld'', 'dmrDomain': 'domain.tld'', 'customParam1Available': false, 'realtimeRoutingAPI': false, 'rootRedirect': false}, 'country':{'code': 'UK'}, 'directRedirectUrl': 'http://google.co.uk'}
r = requests.post('http://stackoverflow.com', json=payload, headers=headers)
When I run it, it gives this error:
NameError: name 'false' is not defined
How can I escape those false booleans in the payload?
Python doesn't use false, it uses False; you're getting a NameError because Python is looking for a variable called false, which doesn't exist.
Replace false with False in your dictionary. You've also got a few too many quotes in places, so I've removed those:
payload = {'namePostfix': 'test99682', 'costModel': 'NOT_TRACKED', 'clickRedirectType': 'REGULAR', 'trafficSource':{'id': '3a7ff9ec-19af-4996-94c1-7f33e036e7af'}, 'redirectTarget': 'DIRECT_URL', 'client':{'id': 'clentIDc', 'clientCode': 'xxx', 'mainDomain': 'domain.tld', 'defaultDomain': 'domain.tld', 'dmrDomain': 'domain.tld', 'customParam1Available': False, 'realtimeRoutingAPI': False, 'rootRedirect': False}, 'country':{'code': 'UK'}, 'directRedirectUrl': 'http://google.co.uk'}
Likewise, the opposite boolean value is True (not true), and the "null" data type is None.
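As a side note, if your payload starts out as a raw JSON string, the json module will do this conversion for you: it parses JSON's lowercase false, true, and null into Python's False, True, and None.
import json

# JSON's lowercase literals parse to Python's capitalized equivalents.
snippet = '{"customParam1Available": false, "realtimeRoutingAPI": false, "rootRedirect": false}'
print(json.loads(snippet))
# {'customParam1Available': False, 'realtimeRoutingAPI': False, 'rootRedirect': False}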
False is the correct name to use in Python. In Python, the boolean values true and false are written with the capitalized names True and False.
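A quick check in the interpreter confirms this:
>>> type(True), type(False)
(<class 'bool'>, <class 'bool'>)
>>> false
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'false' is not defined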