huggingface transformer question answer confidence score - python
How can we fetch the answer confidence score from the Hugging Face Transformers question-answering sample code? I see that the pipeline returns the score, but can the code below also return the confidence score?
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
questions = [
"How many pretrained models are available in Transformers?",
"What does Transformers provide?",
"Transformers provides interoperability between which frameworks?",
]
for question in questions:
    inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(inputs)
    answer_start = tf.argmax(
        answer_start_scores, axis=1
    ).numpy()[0]  # Get the most likely beginning of answer with the argmax of the score
    answer_end = (
        tf.argmax(answer_end_scores, axis=1) + 1
    ).numpy()[0]  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
Code picked up from
https://huggingface.co/transformers/usage.html
The score is just the product of two probabilities: the probability of the answer start token and the probability of the answer end token, each obtained by applying the softmax function to the corresponding logits. Please have a look at the example below:
Pipeline output:
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
question = "How many pretrained models are available in Transformers?"
question_answerer = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(question_answerer(question=question, context=text))
Output:
{'score': 0.5254509449005127, 'start': 256, 'end': 264, 'answer': 'over 32+'}
Without pipeline:
inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
outputs = model(**inputs)
First, we create a mask that has a 1 for every context token and a 0 otherwise (question tokens and special tokens). We use the BatchEncoding.sequence_ids method for that:
non_answer_tokens = [x if x in [0,1] else 0 for x in inputs.sequence_ids()]
non_answer_tokens = torch.tensor(non_answer_tokens, dtype=torch.bool)
non_answer_tokens
Output:
tensor([False, False, False, False, False, False, False, False, False, False,
False, False, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, False])
We use this mask to set the logits of the special tokens and the question tokens to negative infinity and apply the softmax afterward (the negative infinity prevents these tokens from influencing the softmax result):
from torch.nn.functional import softmax
potential_start = torch.where(non_answer_tokens, outputs.start_logits, torch.tensor(float('-inf'),dtype=torch.float))
potential_end = torch.where(non_answer_tokens, outputs.end_logits, torch.tensor(float('-inf'),dtype=torch.float))
potential_start = softmax(potential_start, dim = 1)
potential_end = softmax(potential_end, dim = 1)
potential_start
Output:
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 1.0567e-04, 9.7031e-05, 1.9445e-06, 1.5849e-06, 1.2075e-07,
3.1704e-08, 4.7796e-06, 1.8712e-07, 6.2977e-08, 1.5481e-07, 8.0004e-08,
3.7896e-07, 1.6438e-07, 9.7762e-08, 1.0898e-05, 1.6518e-07, 5.6349e-08,
2.4848e-07, 2.1459e-07, 1.3785e-06, 1.0386e-07, 1.8803e-07, 8.1887e-08,
4.1088e-07, 1.5618e-07, 2.5624e-06, 1.8526e-06, 2.6710e-06, 6.8466e-08,
1.7953e-07, 3.6242e-07, 2.2788e-07, 2.3384e-06, 1.2147e-05, 1.6065e-07,
3.3257e-07, 2.6021e-07, 2.8140e-06, 1.3698e-07, 1.1066e-07, 2.8436e-06,
1.2171e-07, 9.9341e-07, 1.1684e-07, 6.8935e-08, 5.6335e-08, 1.3314e-07,
1.3038e-07, 7.9560e-07, 1.0671e-07, 9.1864e-08, 5.6394e-07, 3.0210e-08,
7.2176e-08, 5.4452e-08, 1.2873e-07, 9.2636e-08, 9.6012e-07, 7.8008e-08,
1.3124e-07, 1.3680e-06, 8.8716e-07, 8.6627e-07, 6.4750e-06, 2.5951e-07,
6.1648e-07, 8.7724e-07, 1.0796e-05, 2.6633e-07, 5.4644e-07, 1.7553e-07,
1.6015e-05, 5.0054e-07, 8.2263e-07, 2.6336e-06, 2.0743e-05, 4.0008e-07,
1.9330e-06, 2.0312e-04, 6.0256e-01, 3.9638e-01, 3.1568e-04, 2.2009e-05,
1.2485e-06, 2.4744e-06, 1.0092e-05, 3.1047e-06, 1.3597e-04, 1.5105e-06,
1.4960e-06, 8.1164e-08, 1.6534e-06, 4.6181e-07, 8.7354e-08, 2.2356e-07,
9.1145e-07, 8.8194e-06, 4.4202e-07, 1.9238e-07, 2.8077e-07, 1.4117e-05,
2.0613e-07, 1.2676e-06, 8.1317e-08, 2.2337e-06, 1.2399e-07, 6.1745e-08,
3.4725e-08, 2.7878e-07, 4.1457e-07, 0.0000e+00]],
grad_fn=<SoftmaxBackward>)
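To see why the masked positions come out as exactly 0.0, here is a minimal pure-Python sketch of the same masking idea. The function name masked_softmax and the toy logits are made up for illustration, and the sketch assumes at least one position is kept by the mask:

```python
import math

def masked_softmax(logits, mask):
    """Softmax over `logits`, ignoring positions where `mask` is False.

    Hypothetical helper for illustration; assumes at least one True in `mask`.
    """
    # Replace masked-out logits with -inf, as in the torch.where call above
    masked = [x if keep else float("-inf") for x, keep in zip(logits, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) is exactly 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits: index 0 plays the role of a question token with a large logit
logits = [5.0, 1.0, 2.0, 3.0]
mask = [False, True, True, True]
probs = masked_softmax(logits, mask)
print(probs)
```

Although index 0 has the largest logit, it receives probability 0 and the remaining probabilities still sum to 1.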
These probabilities can now be used to extract the start and end token of the answer and to calculate the answer score:
answer_start = torch.argmax(potential_start)
answer_end = torch.argmax(potential_end)
answer = tokenizer.decode(inputs.input_ids.squeeze()[answer_start:answer_end+1])
print(potential_start.squeeze()[answer_start])
print(potential_end.squeeze()[answer_end])
print(potential_start.squeeze()[answer_start] *potential_end.squeeze()[answer_end])
print(answer)
Output:
tensor(0.6026, grad_fn=<SelectBackward>)
tensor(0.8720, grad_fn=<SelectBackward>)
tensor(0.5255, grad_fn=<MulBackward0>)
over 32 +
P.S.: Please keep in mind that this answer does not cover any special cases (end token before start token).
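One possible way to handle that special case is to search for the best span whose end is not before its start, instead of taking the two argmax values independently. The following is only a sketch under assumptions: best_span and its max_answer_len parameter are hypothetical names, it works on plain Python lists of probabilities (for example potential_start.squeeze().tolist()), and its result is not guaranteed to match the pipeline score in every case:

```python
def best_span(start_probs, end_probs, max_answer_len=15):
    """Return (start, end, score) maximising start_probs[s] * end_probs[e]
    subject to s <= e < s + max_answer_len.

    Hypothetical helper; `max_answer_len` is an assumption of this sketch.
    """
    best = (0, 0, 0.0)
    for s, p_start in enumerate(start_probs):
        if p_start == 0.0:
            continue  # masked-out position, cannot start an answer
        last = min(s + max_answer_len, len(end_probs))
        for e in range(s, last):
            score = p_start * end_probs[e]
            if score > best[2]:
                best = (s, e, score)
    return best

# Toy probabilities where the independent argmax gives start=3, end=1 (invalid)
start_probs = [0.0, 0.1, 0.2, 0.7]
end_probs = [0.0, 0.6, 0.3, 0.1]
start, end, score = best_span(start_probs, end_probs)
print(start, end, score)
```

In this toy case the independent argmax would pair start index 3 with end index 1, while the constrained search falls back to the single-token span at index 3.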
Related
How to Group by Date Field in a PivotTable using win32com.client
I have tried many approaches over the last few hours, but with no luck. Can somebody please help me?
group_dt = pt.PivotFields('Created')
group_dt.LabelRange.Group(Start=True, End=True, Periods=Array(False, False, False, False, True, False, True))
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-75-0355ab1abb88> in <module>
1 group_dt = pt.PivotFields('Created')
----> 2 group_dt.LabelRange.Group(Start=True, End=True, Periods=Array(False, False, False, False, True, False, True))
TypeError: 'str' object is not callable
After a lot of research, I figured out how to group a date field using win32com.client:
cell = pivot_sheet.Range('B5')
cell.Group(Start=True, End=True, Periods=list([False, False, False, False, True, False, True]))
How to decode javascript-unicode string in python?
I have this string: {\u0022allow_group_shows\u0022: true, \u0022needs_supporter_to_pm\u0022: true, \u0022ads_zone_ids\u0022: {\u0022300x250,centre\u0022: \u0022\u0022, \u0022300x250,right\u0022: \u0022\u0022, \u0022300x250,left\u0022: \u0022\u0022, \u0022468x60\u0022: \u0022\u0022, \u0022160x600,top\u0022: \u0022\u0022, \u0022160x600,bottom\u0022: \u0022\u0022, \u0022160x600,middle\u0022: \u0022\u0022}, \u0022chat_settings\u0022: {\u0022sort_users_key\u0022: \u0022a\u0022, \u0022silence_broadcasters\u0022: \u0022false\u0022, \u0022highest_token_color\u0022: \u0022darkpurple\u0022, \u0022emoticon_autocomplete_delay\u0022: \u00220\u0022, \u0022ignored_users\u0022: \u0022\u0022, \u0022show_emoticons\u0022: true, \u0022font_size\u0022: \u00229pt\u0022, \u0022b_tip_vol\u0022: \u002210\u0022, \u0022allowed_chat\u0022: \u0022all\u0022, \u0022room_leave_for\u0022: \u0022org\u0022, \u0022font_color\u0022: \u0022#494949\u0022, \u0022font_family\u0022: \u0022default\u0022, \u0022room_entry_for\u0022: \u0022org\u0022, \u0022v_tip_vol\u0022: \u002280\u0022}, \u0022is_age_verified\u0022: true, \u0022flash_host\u0022: \u0022edge143.stream.highwebmedia.com\u0022, \u0022tips_in_past_24_hours\u0022: 0, \u0022dismissible_messages\u0022: [], \u0022show_mobile_site_banner_link\u0022: false, \u0022last_vote_in_past_90_days_down\u0022: false, \u0022server_name\u0022: \u0022113\u0022, \u0022num_users_required_for_group\u0022: 2, \u0022group_show_price\u0022: 18, \u0022is_mobile\u0022: false, \u0022chat_username\u0022: \u0022__anonymous__eiwBXR\u0022, \u0022recommender_hmac\u0022: \u0022dae28e4e9afa15da7c6227af2e8fb8abd85a3714aca8f86f01a53a6dd1377115\u0022, \u0022broadcaster_gender\u0022: \u0022couple\u0022, \u0022hls_source\u0022: \u0022https://localhost/live\u002Dhls/amlst:jimmy_and_amy\u002Dsd\u002De73f4b67186a2ec4c13137607d02470ac61f32b60ac15e691bf33493423ef477_trns_h264/playlist.m3u8\u0022, \u0022allow_show_recordings\u0022: true, \u0022is_moderator\u0022: false, 
\u0022room_status\u0022: \u0022public\u0022, \u0022edge_auth\u0022: \u0022{\u005C\u0022username\u005C\u0022:\u005C\u0022__anonymous__eiwBXR\u005C\u0022,\u005C\u0022org\u005C\u0022:\u005C\u0022A\u005C\u0022,\u005C\u0022expire\u005C\u0022:1590669977,\u005C\u0022sig\u005C\u0022:\u005C\u0022454d96141c66fb42f74e9620b9d79e937de3a774a5687021f8650cc4f563d371\u005C\u0022,\u005C\u0022room\u005C\u0022:\u005C\u0022jimmy_and_amy\u005C\u0022}\u0022, \u0022is_supporter\u0022: false, \u0022chat_password\u0022: \u0022{\u005C\u0022username\u005C\u0022:\u005C\u0022__anonymous__eiwBXR\u005C\u0022,\u005C\u0022org\u005C\u0022:\u005C\u0022A\u005C\u0022,\u005C\u0022expire\u005C\u0022:1590669977,\u005C\u0022sig\u005C\u0022:\u005C\u0022454d96141c66fb42f74e9620b9d79e937de3a774a5687021f8650cc4f563d371\u005C\u0022,\u005C\u0022room\u005C\u0022:\u005C\u0022jimmy_and_amy\u005C\u0022}\u0022, \u0022room_pass\u0022: \u0022b5b2408cd91e6c595a3f732a5b7b1567b566bcc92f384ce5e6a00a26a24fb5c7\u0022, \u0022low_satisfaction_score\u0022: false, \u0022tfa_enabled\u0022: false, \u0022room_title\u0022: \u0022(STEP SIS CUM FACE) shh... 
Luna is here and dont know what we are doing #hairy #creampie #stockings #new #lush [3536 tokens remaining]\u0022, \u0022satisfaction_score\u0022: {\u0022down_votes\u0022: 15, \u0022up_votes\u0022: 67, \u0022percent\u0022: 82, \u0022max\u0022: 31222657}, \u0022viewer_username\u0022: \u0022AnonymousUser\u0022, \u0022hidden_message\u0022: \u0022\u0022, \u0022following\u0022: false, \u0022wschat_host\u0022: \u0022https://chatws\u002D45.stream.highwebmedia.com/ws\u0022, \u0022has_studio\u0022: false, \u0022num_followed\u0022: 0, \u0022spy_private_show_price\u0022: 30, \u0022hide_satisfaction_score\u0022: false, \u0022broadcaster_username\u0022: \u0022jimmy_and_amy\u0022, \u0022ignored_emoticons\u0022: [], \u0022apps_running\u0022: \u0022[[\u005C\u0022Tip Goal\u005C\u0022,\u005C\u0022\u005C\u005C/apps\u005C\u005C/app_details\u005C\u005C/tip\u002Dgoal\u005C\u005C/?slot\u003D0\u005C\u0022],[\u005C\u0022Ultra Bot \u002D 4Sci\u005C\u0022,\u005C\u0022\u005C\u005C/apps\u005C\u005C/app_details\u005C\u005C/ultra\u002Dbot\u002D4sci\u005C\u005C/?slot\u003D2\u005C\u0022],[\u005C\u0022Roll The Dice\u005C\u0022,\u005C\u0022\u005C\u005C/apps\u005C\u005C/app_details\u005C\u005C/roll\u002Dthe\u002Ddice\u002D5\u005C\u005C/?slot\u003D3\u005C\u0022]]\u0022, \u0022token_balance\u0022: 0, \u0022private_min_minutes\u0022: 10, \u0022viewer_gender\u0022: \u0022m\u0022, \u0022allow_anonymous_tipping\u0022: false, \u0022num_users_waiting_for_group\u0022: 0, \u0022last_vote_in_past_24_hours\u0022: null, \u0022is_widescreen\u0022: true, \u0022num_viewers\u0022: 1672, \u0022broadcaster_on_new_chat\u0022: false, \u0022private_show_price\u0022: 30, \u0022num_followed_online\u0022: 0, \u0022allow_private_shows\u0022: true} and I want to decode it in python to view to make it easier for me to send it over the internet into our android app... 
anyways that why I am here it should look like "{"allow_group_shows": true, "needs_supporter_to_pm": true, "ads_zone_ids": {"300x250,centre": "", "300x250,right": "", "300x250,left": "", "468x60": "", "160x600,top": "", "160x600,bottom": "", "160x600,middle": ""}, "chat_settings": {"sort_users_key": "a", "silence_broadcasters": "false", "highest_token_color": "darkpurple", "emoticon_autocomplete_delay": "0", "ignored_users": "", "show_emoticons": true, "font_size": "9pt", "b_tip_vol": "10", "allowed_chat": "all", "room_leave_for": "org", "font_color": "#494949", "font_family": "default", "room_entry_for": "org", "v_tip_vol": "80"}, "is_age_verified": true, "flash_host": "edge306.stream.highwebmedia.com", "tips_in_past_24_hours": 0, "dismissible_messages": [], "show_mobile_site_banner_link": false, "last_vote_in_past_90_days_down": false, "server_name": "115", "num_users_required_for_group": 2, "group_show_price": 18, "is_mobile": false, "chat_username": "bom4b5", "recommender_hmac": "ed05e292bb82262255a96944d81bb04dc2d248ca69fff35cf5d7015889c005b1", "broadcaster_gender": "couple", "hls_source": "https://***/live-edge/****-sd-e73f4b67186a2ec4c13137607d02470ac61f32b60***%22%7D", "allow_show_recordings": true, "is_moderator": false, "room_status": "public", "edge_auth": "{\"username\":\"bom4b5\",\"org\":\"A\",\"expire\":1590666696,\"sig\":\"49b6844fde2c47c2430bd05946b6cfbc9c7864788b9236d7f5af5ff88efd3f95\",\"room\":\"jimmy_and_amy\"}", "is_supporter": false, "chat_password": "****", "room_pass": "b5b2408cd91e6c595a3f732a5b7b1567b566bcc92f384ce5e6a00a26a24fb5c7", "low_satisfaction_score": false, "tfa_enabled": false, "room_title": "(STEP SIS CUM FACE) shh... 
Luna is here and dont know what we are doing #hairy #creampie #stockings #new #lush [4310 tokens remaining]", "satisfaction_score": {"down_votes": 15, "up_votes": 67, "percent": 82, "max": 31222657}, "viewer_username": "bom4b5", "hidden_message": "", "following": false, "wschat_host": "https://chatws-45.stream.highwebmedia.com/ws", "has_studio": false, "num_followed": 0, "spy_private_show_price": 30, "hide_satisfaction_score": false, "broadcaster_username": "jimmy_and_amy", "ignored_emoticons": [], "apps_running": "[[\"Tip Goal\",\"\\/apps\\/app_details\\/tip-goal\\/?slot=0\"],[\"Ultra Bot - 4Sci\",\"\\/apps\\/app_details\\/ultra-bot-4sci\\/?slot=2\"],[\"Roll The Dice\",\"\\/apps\\/app_details\\/roll-the-dice-5\\/?slot=3\"]]", "token_balance": 0, "private_min_minutes": 10, "viewer_gender": "f", "allow_anonymous_tipping": false, "num_users_waiting_for_group": 0, "last_vote_in_past_24_hours": null, "is_widescreen": true, "num_viewers": 331, "broadcaster_on_new_chat": false, "private_show_price": 30, "num_followed_online": 0, "allow_private_shows": true}" I tried How to decode javascript unicode string in python? it worked but the only one issue is that I cant decode it after I split gett = s.get("https://localhost.com/34534535/") # print(gett.text + "\n\n\n\n") m = json.dumps({"k": gett.text}) # decodes it # split testt = (json.loads(m)["k"]).split('window.initialRoomDossier = "') testt = testt[1].split('";') final = testt[0] # show the same the thing is that if I load it from a string like a = "{\u0022allow_group_shows\u0022: true, \u0022nee..." it does work but not after I split
Here is how I solved it:
import json
import js2py
# Split out the string literal assigned to window.initialRoomDossier
splitt = (json.loads(m)["k"]).split('window.initialRoomDossier = "')
splitt = splitt[1].split('";')
final = str(splitt[0])
final = 'var kek = "' + final + '";'  # Turn it into JavaScript code (run below)
# JavaScript virtual machine
context = js2py.EvalJs({'python_sum': sum})
context.execute(final)  # Run the JavaScript code
# print(context.kek)
m3u8_url = json.loads(context.kek)["hls_source"]
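As a possible alternative that avoids a JavaScript interpreter: json.dumps followed by json.loads simply returns the original text, so the \u0022 escapes survive that round trip. A minimal sketch is to wrap the extracted text in double quotes and parse it as a JSON string instead. It assumes the text contains only escapes that JSON also accepts (such as \uXXXX) and no raw double quotes. The helper name decode_js_string_literal and the shortened sample string are made up for illustration:

```python
import json

def decode_js_string_literal(body: str) -> str:
    """Decode the body of a double-quoted JavaScript string literal.

    Hypothetical helper; assumes `body` uses only JSON-compatible escapes.
    """
    return json.loads('"' + body + '"')

# Shortened, made-up sample in the same style as the page content
raw = r'{\u0022is_mobile\u0022: false, \u0022edge_auth\u0022: \u0022{\u005C\u0022room\u005C\u0022:\u005C\u0022demo\u005C\u0022}\u0022}'

decoded = decode_js_string_literal(raw)  # now ordinary JSON text
data = json.loads(decoded)               # outer object
inner = json.loads(data["edge_auth"])    # nested JSON stored as a string
print(decoded)
```

The first json.loads call undoes the JavaScript string escaping; the second one parses the resulting JSON document.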
Expected indented block - line continuation?
solved_places = { 'a1': False,'a2': False,'a3': False,'a4': False,'a5': False, 'b1': False,'b2': False,'b3': False,'b4': False,'b5': False, 'c1': False,'c2': False,'c3': False,'c4': False,'c5': False, 'd1': False,'d2': False,'d3': False,'d4': False,'d5': False, 'e1': False,'e2': False,'e3': False,'e4': False,'e5': False } Getting error pointer on the second line, (b1) asking for an indentation. (ERROR: Expected an indented Block) Whats wrong? (no Spaces, all tabs, 1 tab = 4 Spaces...) I've tried different Levels of indentation, Spaces, commas etc...) like this: solved_places = {'a1': False,'a2': False,'a3': False,'a4': False,'a5': False, 'b1': False,'b2': False,'b3': False,'b4': False,'b5': False, 'c1': False,'c2': False,'c3': False,'c4': False,'c5': False, 'd1': False,'d2': False,'d3': False,'d4': False,'d5': False, 'e1': False,'e2': False,'e3': False,'e4': False,'e5': False}
Choosing a value from an API response in Python
I'm trying to work with API responses. Here is an example response that comes from the API:
{u'blog': {u'followed': False, u'is_adult': False, u'can_subscribe': False, u'is_nsfw': False, u'ask': True, u'likes': 920, u'is_blocked_from_primary': False, u'can_submit': True, u'ask_anon': True, u'subscribed': False, u'share_likes': True, u'updated': 1493576375, u'description': u'<p>"Che hai dei bellissimi occhi quando mi cerchi."</p><p>18 </p><p>Beginner Wiccan and Witch </p><p>\U0001f312\U0001f315\U0001f318</p>', u'total_posts': 13992, u'submission_page_title': u'Submit', u'submission_terms': {u'title': u'Submit', u'tags': [], u'guidelines': u'', u'accepted_types': [u'text', u'photo', u'quote', u'link', u'video']}, u'name': u'darknessinmyheartt', u'url': u'http://darknessinmyheartt.tumblr.com/', u'ask_page_title': u'lets ask something!/ haydi sor!', u'title': u'"Laurel"', u'posts': 13992, u'reply_conditions': u'3', u'can_send_fan_mail': False}}
How can I get only the value of u'updated' from that response?
u'updated': 1493576375
I have to assign that value to "x".
I think the Tumblr API response is a Python dict. Based on that, try:
x = client.blog_info('darknessinmyheartt')
print(x['blog']['updated'])
You got a JSON response, and in Python you can get a value from JSON data by its key. For your example, you need your_data['blog']['updated']. With your_data['blog'] you get the object with keys and values {'followed': False, u'is_adult': False, u'can_subscribe': False, ....}, and with your_data['blog']['updated'] you get the value 1493576375.
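The nested lookup described above can be tried on a small stand-in for the response. This is only a sketch: the dict below is a shortened copy of the response from the question, and the key no_such_key is made up to show a lookup that does not raise when a key is absent:

```python
# Shortened stand-in for the API response shown in the question
response = {
    "blog": {
        "name": "darknessinmyheartt",
        "updated": 1493576375,
        "posts": 13992,
    }
}

# Direct indexing: raises KeyError if either key is missing
x = response["blog"]["updated"]

# Defensive variant: returns None instead of raising for a missing key
missing = response.get("blog", {}).get("no_such_key")

print(x)
```

Direct indexing is the shortest form when the keys are known to exist; the .get chain is useful when the response shape can vary.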
Find and Replace text within headers with Win32COM
I'd like to find some words in the headers of a Word document and replace them with other words. I've done this in the body of the document with the following code, and it works fine:
import win32com.client
wdFindContinue = 1
wdReplaceAll = 2
app = win32com.client.DispatchEx("Word.Application")
app.Visible = 1
app.DisplayAlerts = 0
app.Documents.Open(document_path)
FromTo = {"<#TITLE#>":"My title", "<#DATE#>":"Today"}
for From in FromTo.keys():
    app.Selection.Find.Execute(From, False, False, False, False, False, True, wdFindContinue, False, FromTo[From], wdReplaceAll)
The problem is that this code doesn't work for headers and footers. I've also tried this:
app.ActiveDocument.Sections(1).Headers(win32com.client.constants.wdHeaderFooterPrimary).Range.Select
app.Selection.Find.Execute(From, False, False, False, False, False, True, wdFindContinue, False, FromTo[From], wdReplaceAll)
But it doesn't work any better (even though I don't get any error message). Does anyone have an idea how to do this? One more detail: I also have an image inserted in the headers; I don't know whether that matters.
You must activate the header/footer pane after opening the document. The statements below are in Visual Basic; change the syntax to Python.
For the header:
ActiveDocument.ActiveWindow.Panes(1).View.SeekView = wdSeekCurrentPageHeader
For the footer:
ActiveDocument.ActiveWindow.Panes(1).View.SeekView = wdSeekCurrentPageFooter
Then run the search and replace. To switch the pane back to the main document, use:
ActiveDocument.ActiveWindow.Panes(1).View.SeekView = wdSeekMainDocument