Expected indented block - line continuation? - python

solved_places = {
'a1': False,'a2': False,'a3': False,'a4': False,'a5': False,
'b1': False,'b2': False,'b3': False,'b4': False,'b5': False,
'c1': False,'c2': False,'c3': False,'c4': False,'c5': False,
'd1': False,'d2': False,'d3': False,'d4': False,'d5': False,
'e1': False,'e2': False,'e3': False,'e4': False,'e5': False
}
The error pointer lands on the second line (b1), asking for an indentation.
(ERROR: Expected an indented block)
What's wrong? (No spaces, all tabs, 1 tab = 4 spaces.)
I've tried different levels of indentation, spaces, commas, etc., like this:
solved_places =
{'a1': False,'a2': False,'a3': False,'a4': False,'a5': False,
'b1': False,'b2': False,'b3': False,'b4': False,'b5': False,
'c1': False,'c2': False,'c3': False,'c4': False,'c5': False,
'd1': False,'d2': False,'d3': False,'d4': False,'d5': False,
'e1': False,'e2': False,'e3': False,'e4': False,'e5': False}
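The dictionary itself is valid Python: newlines inside {...} are implicit line continuations, so the indentation of the b1...e5 lines cannot trigger this error. "Expected an indented block" usually means the line just above the dict ends with a colon (def, if, for, class, ...) and the dict is not indented under it. The second attempt, with { on its own line after solved_places =, fails for a different reason: a line break right after = is not allowed outside brackets. The sketch below checks this by compiling shortened, hypothetical snippets; the compiles helper and the snippet names are illustrative only.

```python
def compiles(source: str) -> str:
    """Compile a snippet and report the outcome instead of raising."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError:  # must come first: it subclasses SyntaxError
        return "IndentationError"
    except SyntaxError:
        return "SyntaxError"
    return "ok"


# 1. The dict on its own: continuation lines need no particular indentation.
dict_alone = "solved_places = {\n'a1': False,'a2': False,\n'b1': False,'b2': False\n}\n"

# 2. Same dict right after a line ending in ':' without indenting it.
after_colon = "def setup():\nsolved_places = {\n'a1': False,\n'b1': False\n}\n"

# 3. Breaking the line right after '=' is not allowed outside brackets.
break_after_equals = "solved_places =\n{'a1': False,\n'b1': False}\n"

# 4. Fix for case 2: indent the statement; lines inside the braces may stay as they are.
fixed = "def setup():\n    solved_places = {\n'a1': False,\n'b1': False\n    }\n"

for name, src in [("dict_alone", dict_alone), ("after_colon", after_colon),
                  ("break_after_equals", break_after_equals), ("fixed", fixed)]:
    print(name, compiles(src))
```

So the place to look is the line before solved_places, not the b1 line the error pointer highlights.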

huggingface transformer question answer confidence score

How can we fetch the answer confidence score from the Hugging Face transformers question-answering sample code? I see that the pipeline does return the score, but can the code below also return the confidence score?
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
questions = [
"How many pretrained models are available in Transformers?",
"What does Transformers provide?",
"Transformers provides interoperability between which frameworks?",
]
for question in questions:
    inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    # transformers v4+ returns an output object rather than a plain tuple, so read the logits by name
    outputs = model(inputs)
    answer_start_scores = outputs.start_logits
    answer_end_scores = outputs.end_logits
    # Get the most likely beginning of answer with the argmax of the score
    answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]
    # Get the most likely end of answer with the argmax of the score
    answer_end = (tf.argmax(answer_end_scores, axis=1) + 1).numpy()[0]
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
Code taken from
https://huggingface.co/transformers/usage.html
The score is just the product of the answer's start-token probability and end-token probability, each obtained by applying the softmax function to the logits. Please have a look at the example below:
With the pipeline:
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
question = "How many pretrained models are available in Transformers?"
question_answerer = pipeline("question-answering", model=model, tokenizer=tokenizer)
print(question_answerer(question=question, context=text))
Output:
{'score': 0.5254509449005127, 'start': 256, 'end': 264, 'answer': 'over 32+'}
Without the pipeline:
inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
outputs = model(**inputs)
First, we create a mask that is True for every context token and False otherwise (question tokens and special tokens). We use the BatchEncoding.sequence_ids method for that:
non_answer_tokens = [x if x in [0,1] else 0 for x in inputs.sequence_ids()]
non_answer_tokens = torch.tensor(non_answer_tokens, dtype=torch.bool)
non_answer_tokens
Output:
tensor([False, False, False, False, False, False, False, False, False, False,
False, False, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, False])
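Despite its name, non_answer_tokens is True for the context tokens, i.e. the tokens that may be part of the answer. The mapping from sequence_ids() to this mask can be checked without loading a model. The sketch below uses a made-up sequence_ids list following the convention BatchEncoding.sequence_ids() uses for a question/context pair (None for special tokens, 0 for the question, 1 for the context) and shows that the comprehension reduces to x == 1.

```python
# Hypothetical sequence_ids() output for a tiny pair: [CLS] q q [SEP] c c c [SEP]
sequence_ids = [None, 0, 0, None, 1, 1, 1, None]

# The comprehension from the answer: None and 0 become 0, context tokens stay 1
mask_from_answer = [x if x in [0, 1] else 0 for x in sequence_ids]

# Equivalent, more direct form: True only for context tokens
mask_direct = [x == 1 for x in sequence_ids]

print([bool(x) for x in mask_from_answer])
print(mask_direct)
```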
We use this mask to set the logits of the special tokens and the question tokens to negative infinity and apply the softmax afterward (negative infinity turns into a probability of 0, so these tokens cannot influence the softmax result):
from torch.nn.functional import softmax
potential_start = torch.where(non_answer_tokens, outputs.start_logits, torch.tensor(float('-inf'),dtype=torch.float))
potential_end = torch.where(non_answer_tokens, outputs.end_logits, torch.tensor(float('-inf'),dtype=torch.float))
potential_start = softmax(potential_start, dim = 1)
potential_end = softmax(potential_end, dim = 1)
potential_start
Output:
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 1.0567e-04, 9.7031e-05, 1.9445e-06, 1.5849e-06, 1.2075e-07,
3.1704e-08, 4.7796e-06, 1.8712e-07, 6.2977e-08, 1.5481e-07, 8.0004e-08,
3.7896e-07, 1.6438e-07, 9.7762e-08, 1.0898e-05, 1.6518e-07, 5.6349e-08,
2.4848e-07, 2.1459e-07, 1.3785e-06, 1.0386e-07, 1.8803e-07, 8.1887e-08,
4.1088e-07, 1.5618e-07, 2.5624e-06, 1.8526e-06, 2.6710e-06, 6.8466e-08,
1.7953e-07, 3.6242e-07, 2.2788e-07, 2.3384e-06, 1.2147e-05, 1.6065e-07,
3.3257e-07, 2.6021e-07, 2.8140e-06, 1.3698e-07, 1.1066e-07, 2.8436e-06,
1.2171e-07, 9.9341e-07, 1.1684e-07, 6.8935e-08, 5.6335e-08, 1.3314e-07,
1.3038e-07, 7.9560e-07, 1.0671e-07, 9.1864e-08, 5.6394e-07, 3.0210e-08,
7.2176e-08, 5.4452e-08, 1.2873e-07, 9.2636e-08, 9.6012e-07, 7.8008e-08,
1.3124e-07, 1.3680e-06, 8.8716e-07, 8.6627e-07, 6.4750e-06, 2.5951e-07,
6.1648e-07, 8.7724e-07, 1.0796e-05, 2.6633e-07, 5.4644e-07, 1.7553e-07,
1.6015e-05, 5.0054e-07, 8.2263e-07, 2.6336e-06, 2.0743e-05, 4.0008e-07,
1.9330e-06, 2.0312e-04, 6.0256e-01, 3.9638e-01, 3.1568e-04, 2.2009e-05,
1.2485e-06, 2.4744e-06, 1.0092e-05, 3.1047e-06, 1.3597e-04, 1.5105e-06,
1.4960e-06, 8.1164e-08, 1.6534e-06, 4.6181e-07, 8.7354e-08, 2.2356e-07,
9.1145e-07, 8.8194e-06, 4.4202e-07, 1.9238e-07, 2.8077e-07, 1.4117e-05,
2.0613e-07, 1.2676e-06, 8.1317e-08, 2.2337e-06, 1.2399e-07, 6.1745e-08,
3.4725e-08, 2.7878e-07, 4.1457e-07, 0.0000e+00]],
grad_fn=<SoftmaxBackward>)
These probabilities can now be used to extract the start and end token of the answer and to calculate the answer score:
answer_start = torch.argmax(potential_start)
answer_end = torch.argmax(potential_end)
answer = tokenizer.decode(inputs.input_ids.squeeze()[answer_start:answer_end+1])
print(potential_start.squeeze()[answer_start])
print(potential_end.squeeze()[answer_end])
print(potential_start.squeeze()[answer_start] *potential_end.squeeze()[answer_end])
print(answer)
Output:
tensor(0.6026, grad_fn=<SelectBackward>)
tensor(0.8720, grad_fn=<SelectBackward>)
tensor(0.5255, grad_fn=<MulBackward0>)
over 32 +
P.S.: Please keep in mind that this answer does not cover special cases such as the end token coming before the start token.
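One way to handle the special case from the P.S. is to score every (start, end) pair as p_start[start] * p_end[end] and keep only pairs with start <= end and a bounded length. The sketch below does this in plain Python on made-up logits; masked_softmax, best_span, max_answer_len and the toy numbers are assumptions for illustration, not the transformers implementation (the pipeline applies a similar constraint internally).

```python
import math


def masked_softmax(logits, mask):
    """Softmax over positions where mask is True; masked positions get probability 0
    (same effect as the -inf trick). Assumes at least one position is unmasked."""
    top = max(l for l, m in zip(logits, mask) if m)  # subtract the max for stability
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, mask, max_answer_len=15):
    """Return (start, end, score) maximizing p_start[start] * p_end[end]
    subject to start <= end < start + max_answer_len."""
    p_start = masked_softmax(start_logits, mask)
    p_end = masked_softmax(end_logits, mask)
    best = (0, 0, 0.0)
    for s, ps in enumerate(p_start):
        if ps == 0.0:
            continue  # masked token, cannot start the answer
        for e in range(s, min(s + max_answer_len, len(p_end))):
            score = ps * p_end[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy example: after masking, the most likely end (index 2) lies before the
# most likely start (index 4), so taking both argmaxes would give an empty span.
mask = [False, True, True, True, True, True, False]
start_logits = [9.0, 0.0, 1.0, 0.0, 3.0, 0.5, 9.0]
end_logits = [9.0, 0.0, 3.0, 0.0, 0.5, 2.5, 9.0]
start, end, score = best_span(start_logits, end_logits, mask)
print(start, end, round(score, 4))  # 4 5 0.2575
```

With real model output, start_logits and end_logits would be the per-token logits of a single example (e.g. converted to lists), and mask the context mask built above.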

How to efficiently filter a huge list by multiple rules?

I am writing an open-source PyPI package that filters AWS EC2 instance types.
In my function ec_compare__from_dict, I am filtering a list of 350+ elements that takes 364 KB on disk.
The following example call returns one filtered element:
>>> ec_compare__from_dict(_partial=_partial,InstanceType='z1d',FreeTierEligible=False,SupportedUsageClasses='spot',BareMetal=True)
[{'InstanceType': 'z1d.metal', 'CurrentGeneration': True, 'FreeTierEligible': False, 'SupportedUsageClasses': ['on-demand', 'spot'], 'SupportedRootDeviceTypes': ['ebs'], 'BareMetal': True, 'ProcessorInfo': {'SupportedArchitectures': ['x86_64'], 'SustainedClockSpeedInGhz': 4.0}, 'VCpuInfo': {'DefaultVCpus': 48}, 'MemoryInfo': {'SizeInMiB': 393216}, 'InstanceStorageSupported': True, 'InstanceStorageInfo': {'TotalSizeInGB': 1800, 'Disks': [{'SizeInGB': 900, 'Count': 2, 'Type': 'ssd'}]}, 'EbsInfo': {'EbsOptimizedSupport': 'default', 'EncryptionSupport': 'supported'}, 'NetworkInfo': {'NetworkPerformance': '25 Gigabit', 'MaximumNetworkInterfaces': 15, 'Ipv4AddressesPerInterface': 50, 'Ipv6AddressesPerInterface': 50, 'Ipv6Supported': True, 'EnaSupport': 'required'}, 'PlacementGroupInfo': {'SupportedStrategies': ['cluster', 'partition', 'spread']}, 'HibernationSupported': False, 'BurstablePerformanceSupported': False, 'DedicatedHostsSupported': True, 'AutoRecoverySupported': False}]
My problem is the following:
I want to apply all the filters, each with its own rule, in one single list comprehension.
But I am losing readability and ending up with spaghetti code. Please point me to better design decisions.
from typing import List
def ec2keys(*arg) -> List:
    values = {'str': ['InstanceType', 'Hypervisor'], 'bool': ['FreeTierEligible', 'HibernationSupported', 'CurrentGeneration', 'BurstablePerformanceSupported', 'AutoRecoverySupported', 'DedicatedHostsSupported', 'InstanceStorageSupported', 'BareMetal'], 'list': ['SupportedUsageClasses', 'SupportedRootDeviceTypes'], 'dict': ['InstanceStorageInfo', 'VCpuInfo', 'EbsInfo', 'FpgaInfo', 'PlacementGroupInfo', 'GpuInfo', 'InferenceAcceleratorInfo', 'MemoryInfo', 'NetworkInfo', 'ProcessorInfo'], 'other': []}
    return [elem for k, v in values.items() if k in arg or not arg for elem in v]
def ec_compare__from_dict(_partial: List, **kwargs):
    _instance_type = kwargs.get('InstanceType')
    flat_keys = set(ec2keys('str', 'bool')).intersection(
        set(kwargs.keys())) - {'InstanceType'}
    complex_filter_keys = set(ec2keys()).intersection(
        set(kwargs.keys()))
    list_keys_dict = {k: list(
        (lambda x: x if isinstance(x, list) else [x])(kwargs.get(k)))
        for k in set(ec2keys('list')).intersection(
            set(kwargs.keys()))
    }
    # here I started with list comprehension
    _partial = [x for x in _partial
                if all(elem in x.keys() for elem in flat_keys)
                and all(elem in x.keys() for elem in complex_filter_keys)
                and all(x[elem] == kwargs[elem] for elem in flat_keys)
                ]
    # this is re-apply filter again to all elements
    if isinstance(_instance_type, str) and _instance_type:
        _partial = [x for x in _partial
                    if str(x['InstanceType']).startswith(_instance_type)
                    ]
    elif isinstance(_instance_type, (list, set)) and _instance_type:
        _partial = [x for x in _partial
                    if any(str(x['InstanceType']).startswith(elem)
                           for elem in _instance_type)
                    ]
    # this is how I filter list values
    if list_keys_dict:
        _partial = [x for x in _partial
                    if any(set(x[k]).intersection(v) for k, v in list_keys_dict.items())
                    ]
    return _partial
Example data
_partial = [{'InstanceType': 'z1d.metal', 'CurrentGeneration': True, 'FreeTierEligible': False, 'SupportedUsageClasses': ['on-demand', 'spot'], 'SupportedRootDeviceTypes': ['ebs'], 'BareMetal': True, 'ProcessorInfo': {'SupportedArchitectures': ['x86_64'], 'SustainedClockSpeedInGhz': 4.0}, 'VCpuInfo': {'DefaultVCpus': 48}, 'MemoryInfo': {'SizeInMiB': 393216}, 'InstanceStorageSupported': True, 'InstanceStorageInfo': {'TotalSizeInGB': 1800, 'Disks': [{'SizeInGB': 900, 'Count': 2, 'Type': 'ssd'}]}, 'EbsInfo': {'EbsOptimizedSupport': 'default', 'EncryptionSupport': 'supported'}, 'NetworkInfo': {'NetworkPerformance': '25 Gigabit', 'MaximumNetworkInterfaces': 15, 'Ipv4AddressesPerInterface': 50, 'Ipv6AddressesPerInterface': 50, 'Ipv6Supported': True, 'EnaSupport': 'required'}, 'PlacementGroupInfo': {'SupportedStrategies': ['cluster', 'partition', 'spread']}, 'HibernationSupported': False, 'BurstablePerformanceSupported': False, 'DedicatedHostsSupported': True, 'AutoRecoverySupported': False}]
Because of your nested list and dict structure, I don't think a class-based comparison is the easiest approach. A class would let you write a separate comparison method for every item, splitting the big function into many small ones, but that leads to maintenance issues whenever the interface changes.
Your dictionary comparison approach is better in that case, but I would rewrite it using recursion for the nested dictionaries, which flattens the nesting a bit.
Using your provided input:
data = {
'InstanceType': 'z1d.metal',
'CurrentGeneration': True,
'FreeTierEligible': False,
'SupportedUsageClasses': ['on-demand', 'spot'],
'SupportedRootDeviceTypes': ['ebs'],
'BareMetal': True,
'ProcessorInfo': {'SupportedArchitectures': ['x86_64'],
'SustainedClockSpeedInGhz': 4.0},
'VCpuInfo': {'DefaultVCpus': 48},
'MemoryInfo': {'SizeInMiB': 393216},
'InstanceStorageSupported': True,
'InstanceStorageInfo': {'TotalSizeInGB': 1800,
'Disks': [{'SizeInGB': 900, 'Count': 2, 'Type': 'ssd'}]},
'EbsInfo': {'EbsOptimizedSupport': 'default', 'EncryptionSupport': 'supported'},
'NetworkInfo': {'NetworkPerformance': '25 Gigabit',
'MaximumNetworkInterfaces': 15,
'Ipv4AddressesPerInterface': 50,
'Ipv6AddressesPerInterface': 50,
'Ipv6Supported': True,
'EnaSupport': 'required'},
'PlacementGroupInfo': {'SupportedStrategies': ['cluster', 'partition', 'spread']},
'HibernationSupported': False,
'BurstablePerformanceSupported': False,
'DedicatedHostsSupported': True,
'AutoRecoverySupported': False}
I generated a few possible filters (the valid one should give True, the invalid ones False):
data_check_valid = {
'InstanceType': 'z1d.metal',
'InstanceStorageInfo': {'TotalSizeInGB': 1800},
'PlacementGroupInfo': {'SupportedStrategies': ['spread']},
}
data_check_invalid_strategy = {
'InstanceType': 'z1d.metal',
'InstanceStorageInfo': {'TotalSizeInGB': 1800},
'PlacementGroupInfo': {'SupportedStrategies': ['clustering']}, # Clustering is not supported.
}
data_check_invalid_count = {
'InstanceType': 'z1d.metal',
'InstanceStorageInfo': {'TotalSizeInGB': 1800,
'Disks': [{'SizeInGB': 900, 'Count': 4, 'Type': 'ssd'}]}, # Counts are unequal
}
Then we compare the two dictionaries element by element, including nested elements.
For this the following function is used:
def verify_element(original, check) -> bool:
    # Compare the types
    if type(original) != type(check):
        return False
    # Recursively call this function for every element in the dictionary (if the key exists)
    if isinstance(check, dict):
        for key, value in check.items():
            if key not in original:
                return False
            # Keep the argument order: original value first, check value second
            if not verify_element(original[key], value):
                return False
        return True
    # The value inside check has to occur in any of the original elements.
    # This behaviour is required because we do not know where the check element is positioned.
    if isinstance(check, (tuple, list)):
        for element in check:
            if not any(verify_element(each, element) for each in original):
                return False
        return True
    # Verify the element directly.
    if isinstance(check, (str, bool, int, float)):
        return original == check
    # Handle any unknown data types.
    raise TypeError(f"Type {type(check)}, with value {check} cannot be compared.")
To compare both dictionaries with each other, the final check will then look like this:
if __name__ == '__main__':
    print(verify_element(data, data_check_valid))             # True
    print(verify_element(data, data_check_invalid_strategy))  # False
    print(verify_element(data, data_check_invalid_count))     # False
    # When you change 'Count' to 2, the answer becomes True
If you want to use this cleanly, you can put it in a class and compare every element individually using the above function. This also makes it possible to include custom validators, such as "must be bigger or smaller than the original value" (which is currently impossible with the above code).
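A minimal sketch of the custom-validator idea, assuming one convention that is not part of the answer: a callable in the filter is called with the original value and must return True. It keeps verify_element's (original, check) argument order and applies the check to a list the way ec_compare__from_dict filters _partial. The helpers at_least and starts_with and the three trimmed instance records are hypothetical.

```python
from typing import Any, Callable


def verify_element(original: Any, check: Any) -> bool:
    # Assumed extension: a callable in the filter is a custom validator
    if callable(check):
        return bool(check(original))
    if type(original) != type(check):
        return False
    if isinstance(check, dict):
        # every filter key must exist and match recursively
        return all(key in original and verify_element(original[key], value)
                   for key, value in check.items())
    if isinstance(check, (tuple, list)):
        # every filter element must match at least one original element
        return all(any(verify_element(each, element) for each in original)
                   for element in check)
    if isinstance(check, (str, bool, int, float)):
        return original == check
    raise TypeError(f"Type {type(check)}, with value {check} cannot be compared.")


# Hypothetical validator factories
def at_least(limit: float) -> Callable[[Any], bool]:
    return lambda value: isinstance(value, (int, float)) and value >= limit


def starts_with(prefix: str) -> Callable[[Any], bool]:
    return lambda value: isinstance(value, str) and value.startswith(prefix)


# Hypothetical, trimmed instance records
instances = [
    {'InstanceType': 'z1d.metal', 'BareMetal': True,
     'VCpuInfo': {'DefaultVCpus': 48}, 'SupportedUsageClasses': ['on-demand', 'spot']},
    {'InstanceType': 'z1d.large', 'BareMetal': False,
     'VCpuInfo': {'DefaultVCpus': 2}, 'SupportedUsageClasses': ['on-demand', 'spot']},
    {'InstanceType': 'm5.large', 'BareMetal': False,
     'VCpuInfo': {'DefaultVCpus': 2}, 'SupportedUsageClasses': ['on-demand']},
]

query = {
    'InstanceType': starts_with('z1d'),
    'VCpuInfo': {'DefaultVCpus': at_least(4)},
    'SupportedUsageClasses': ['spot'],
}
matches = [x for x in instances if verify_element(x, query)]
print([x['InstanceType'] for x in matches])  # ['z1d.metal']
```

Each rule stays a small, separately testable function, and the single list comprehension only calls verify_element.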

How to decode javascript-unicode string in python?

I have this string:
{\u0022allow_group_shows\u0022: true, \u0022needs_supporter_to_pm\u0022: true, \u0022ads_zone_ids\u0022: {\u0022300x250,centre\u0022: \u0022\u0022, \u0022300x250,right\u0022: \u0022\u0022, \u0022300x250,left\u0022: \u0022\u0022, \u0022468x60\u0022: \u0022\u0022, \u0022160x600,top\u0022: \u0022\u0022, \u0022160x600,bottom\u0022: \u0022\u0022, \u0022160x600,middle\u0022: \u0022\u0022}, \u0022chat_settings\u0022: {\u0022sort_users_key\u0022: \u0022a\u0022, \u0022silence_broadcasters\u0022: \u0022false\u0022, \u0022highest_token_color\u0022: \u0022darkpurple\u0022, \u0022emoticon_autocomplete_delay\u0022: \u00220\u0022, \u0022ignored_users\u0022: \u0022\u0022, \u0022show_emoticons\u0022: true, \u0022font_size\u0022: \u00229pt\u0022, \u0022b_tip_vol\u0022: \u002210\u0022, \u0022allowed_chat\u0022: \u0022all\u0022, \u0022room_leave_for\u0022: \u0022org\u0022, \u0022font_color\u0022: \u0022#494949\u0022, \u0022font_family\u0022: \u0022default\u0022, \u0022room_entry_for\u0022: \u0022org\u0022, \u0022v_tip_vol\u0022: \u002280\u0022}, \u0022is_age_verified\u0022: true, \u0022flash_host\u0022: \u0022edge143.stream.highwebmedia.com\u0022, \u0022tips_in_past_24_hours\u0022: 0, \u0022dismissible_messages\u0022: [], \u0022show_mobile_site_banner_link\u0022: false, \u0022last_vote_in_past_90_days_down\u0022: false, \u0022server_name\u0022: \u0022113\u0022, \u0022num_users_required_for_group\u0022: 2, \u0022group_show_price\u0022: 18, \u0022is_mobile\u0022: false, \u0022chat_username\u0022: \u0022__anonymous__eiwBXR\u0022, \u0022recommender_hmac\u0022: \u0022dae28e4e9afa15da7c6227af2e8fb8abd85a3714aca8f86f01a53a6dd1377115\u0022, \u0022broadcaster_gender\u0022: \u0022couple\u0022, \u0022hls_source\u0022: \u0022https://localhost/live\u002Dhls/amlst:jimmy_and_amy\u002Dsd\u002De73f4b67186a2ec4c13137607d02470ac61f32b60ac15e691bf33493423ef477_trns_h264/playlist.m3u8\u0022, \u0022allow_show_recordings\u0022: true, \u0022is_moderator\u0022: false, \u0022room_status\u0022: 
\u0022public\u0022, \u0022edge_auth\u0022: \u0022{\u005C\u0022username\u005C\u0022:\u005C\u0022__anonymous__eiwBXR\u005C\u0022,\u005C\u0022org\u005C\u0022:\u005C\u0022A\u005C\u0022,\u005C\u0022expire\u005C\u0022:1590669977,\u005C\u0022sig\u005C\u0022:\u005C\u0022454d96141c66fb42f74e9620b9d79e937de3a774a5687021f8650cc4f563d371\u005C\u0022,\u005C\u0022room\u005C\u0022:\u005C\u0022jimmy_and_amy\u005C\u0022}\u0022, \u0022is_supporter\u0022: false, \u0022chat_password\u0022: \u0022{\u005C\u0022username\u005C\u0022:\u005C\u0022__anonymous__eiwBXR\u005C\u0022,\u005C\u0022org\u005C\u0022:\u005C\u0022A\u005C\u0022,\u005C\u0022expire\u005C\u0022:1590669977,\u005C\u0022sig\u005C\u0022:\u005C\u0022454d96141c66fb42f74e9620b9d79e937de3a774a5687021f8650cc4f563d371\u005C\u0022,\u005C\u0022room\u005C\u0022:\u005C\u0022jimmy_and_amy\u005C\u0022}\u0022, \u0022room_pass\u0022: \u0022b5b2408cd91e6c595a3f732a5b7b1567b566bcc92f384ce5e6a00a26a24fb5c7\u0022, \u0022low_satisfaction_score\u0022: false, \u0022tfa_enabled\u0022: false, \u0022room_title\u0022: \u0022(STEP SIS CUM FACE) shh... 
Luna is here and dont know what we are doing #hairy #creampie #stockings #new #lush [3536 tokens remaining]\u0022, \u0022satisfaction_score\u0022: {\u0022down_votes\u0022: 15, \u0022up_votes\u0022: 67, \u0022percent\u0022: 82, \u0022max\u0022: 31222657}, \u0022viewer_username\u0022: \u0022AnonymousUser\u0022, \u0022hidden_message\u0022: \u0022\u0022, \u0022following\u0022: false, \u0022wschat_host\u0022: \u0022https://chatws\u002D45.stream.highwebmedia.com/ws\u0022, \u0022has_studio\u0022: false, \u0022num_followed\u0022: 0, \u0022spy_private_show_price\u0022: 30, \u0022hide_satisfaction_score\u0022: false, \u0022broadcaster_username\u0022: \u0022jimmy_and_amy\u0022, \u0022ignored_emoticons\u0022: [], \u0022apps_running\u0022: \u0022[[\u005C\u0022Tip Goal\u005C\u0022,\u005C\u0022\u005C\u005C/apps\u005C\u005C/app_details\u005C\u005C/tip\u002Dgoal\u005C\u005C/?slot\u003D0\u005C\u0022],[\u005C\u0022Ultra Bot \u002D 4Sci\u005C\u0022,\u005C\u0022\u005C\u005C/apps\u005C\u005C/app_details\u005C\u005C/ultra\u002Dbot\u002D4sci\u005C\u005C/?slot\u003D2\u005C\u0022],[\u005C\u0022Roll The Dice\u005C\u0022,\u005C\u0022\u005C\u005C/apps\u005C\u005C/app_details\u005C\u005C/roll\u002Dthe\u002Ddice\u002D5\u005C\u005C/?slot\u003D3\u005C\u0022]]\u0022, \u0022token_balance\u0022: 0, \u0022private_min_minutes\u0022: 10, \u0022viewer_gender\u0022: \u0022m\u0022, \u0022allow_anonymous_tipping\u0022: false, \u0022num_users_waiting_for_group\u0022: 0, \u0022last_vote_in_past_24_hours\u0022: null, \u0022is_widescreen\u0022: true, \u0022num_viewers\u0022: 1672, \u0022broadcaster_on_new_chat\u0022: false, \u0022private_show_price\u0022: 30, \u0022num_followed_online\u0022: 0, \u0022allow_private_shows\u0022: true}
and I want to decode it in Python so that it's easier to send over the internet to our Android app. It should look like this:
"{"allow_group_shows": true, "needs_supporter_to_pm": true, "ads_zone_ids": {"300x250,centre": "", "300x250,right": "", "300x250,left": "", "468x60": "", "160x600,top": "", "160x600,bottom": "", "160x600,middle": ""}, "chat_settings": {"sort_users_key": "a", "silence_broadcasters": "false", "highest_token_color": "darkpurple", "emoticon_autocomplete_delay": "0", "ignored_users": "", "show_emoticons": true, "font_size": "9pt", "b_tip_vol": "10", "allowed_chat": "all", "room_leave_for": "org", "font_color": "#494949", "font_family": "default", "room_entry_for": "org", "v_tip_vol": "80"}, "is_age_verified": true, "flash_host": "edge306.stream.highwebmedia.com", "tips_in_past_24_hours": 0, "dismissible_messages": [], "show_mobile_site_banner_link": false, "last_vote_in_past_90_days_down": false, "server_name": "115", "num_users_required_for_group": 2, "group_show_price": 18, "is_mobile": false, "chat_username": "bom4b5", "recommender_hmac": "ed05e292bb82262255a96944d81bb04dc2d248ca69fff35cf5d7015889c005b1", "broadcaster_gender": "couple", "hls_source": "https://***/live-edge/****-sd-e73f4b67186a2ec4c13137607d02470ac61f32b60***%22%7D", "allow_show_recordings": true, "is_moderator": false, "room_status": "public", "edge_auth": "{\"username\":\"bom4b5\",\"org\":\"A\",\"expire\":1590666696,\"sig\":\"49b6844fde2c47c2430bd05946b6cfbc9c7864788b9236d7f5af5ff88efd3f95\",\"room\":\"jimmy_and_amy\"}", "is_supporter": false, "chat_password": "****", "room_pass": "b5b2408cd91e6c595a3f732a5b7b1567b566bcc92f384ce5e6a00a26a24fb5c7", "low_satisfaction_score": false, "tfa_enabled": false, "room_title": "(STEP SIS CUM FACE) shh... 
Luna is here and dont know what we are doing #hairy #creampie #stockings #new #lush [4310 tokens remaining]", "satisfaction_score": {"down_votes": 15, "up_votes": 67, "percent": 82, "max": 31222657}, "viewer_username": "bom4b5", "hidden_message": "", "following": false, "wschat_host": "https://chatws-45.stream.highwebmedia.com/ws", "has_studio": false, "num_followed": 0, "spy_private_show_price": 30, "hide_satisfaction_score": false, "broadcaster_username": "jimmy_and_amy", "ignored_emoticons": [], "apps_running": "[[\"Tip Goal\",\"\\/apps\\/app_details\\/tip-goal\\/?slot=0\"],[\"Ultra Bot - 4Sci\",\"\\/apps\\/app_details\\/ultra-bot-4sci\\/?slot=2\"],[\"Roll The Dice\",\"\\/apps\\/app_details\\/roll-the-dice-5\\/?slot=3\"]]", "token_balance": 0, "private_min_minutes": 10, "viewer_gender": "f", "allow_anonymous_tipping": false, "num_users_waiting_for_group": 0, "last_vote_in_past_24_hours": null, "is_widescreen": true, "num_viewers": 331, "broadcaster_on_new_chat": false, "private_show_price": 30, "num_followed_online": 0, "allow_private_shows": true}"
I tried the approach from "How to decode javascript unicode string in python?".
It worked, but the one issue is that I can't decode the string after I split it:
gett = s.get("https://localhost.com/34534535/")
# print(gett.text + "\n\n\n\n")
m = json.dumps({"k": gett.text}) # decodes it
# split
testt = (json.loads(m)["k"]).split('window.initialRoomDossier = "')
testt = testt[1].split('";')
final = testt[0] # show the same
The thing is, if I load it from a string literal like
a = "{\u0022allow_group_shows\u0022: true, \u0022nee..."
it works, but not after I split.
Here is how I solved it:
import json
import js2py
# split out the JavaScript string literal
splitt = (json.loads(m)["k"]).split('window.initialRoomDossier = "')
splitt = splitt[1].split('";')
final = str(splitt[0])
final = 'var kek = "' + final + '";'  # Turn it into JavaScript code (we run it later)
# JavaScript virtual machine
context = js2py.EvalJs({'python_sum': sum})
context.execute(final)  # Run the JavaScript code; the JS engine unescapes the string literal
# print(context.kek)
m3u8_url = json.loads(context.kek)["hls_source"]
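Why the literal works but the split text does not: in a Python string literal such as a = "{\u0022...", the Python parser turns each \u0022 into a quote character, while text fetched over HTTP keeps the backslash sequences as plain characters. Because these \uXXXX escapes are also valid JSON string escapes, a possible alternative to running a JavaScript VM is to wrap the extracted text in quotes and let json.loads unescape it. This sketch assumes the page only uses escapes JSON also accepts (JavaScript allows some, such as \x41 or \', that JSON rejects); the short raw string stands in for the split-out text and is hypothetical.

```python
import json

# What the parser does with a literal vs. what arrives over HTTP
literal = "\u0022hi\u0022"   # Python turns \u0022 into a quote: '"hi"'
fetched = r"\u0022hi\u0022"  # backslashes kept as ordinary characters
print(len(literal), len(fetched))  # 4 14

# Hypothetical, shortened stand-in for the text between
# 'window.initialRoomDossier = "' and '";', backslashes kept as in the page
final = (r'{\u0022allow_group_shows\u0022: true, '
         r'\u0022hls_source\u0022: \u0022https://localhost/live\u002Dhls/playlist.m3u8\u0022, '
         r'\u0022edge_auth\u0022: \u0022{\u005C\u0022room\u005C\u0022:\u005C\u0022demo\u005C\u0022}\u0022}')

# Step 1: read the JavaScript string-literal body as a JSON string literal,
# which undoes the \uXXXX escapes in a single pass
unescaped = json.loads('"' + final + '"')

# Step 2: the result is ordinary JSON text, so parse it into a dict
dossier = json.loads(unescaped)
print(dossier["hls_source"])

# Nested values such as edge_auth are JSON strings themselves
print(json.loads(dossier["edge_auth"])["room"])
```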

How to escape true/false booleans in a Python JSON string

I have the following code:
headers = {'Content-Type': 'application/json', 'cwauth-token': token}
payload = {'namePostfix': 'test99682', 'costModel': 'NOT_TRACKED', 'clickRedirectType': 'REGULAR', 'trafficSource':{'id': '3a7ff9ec-19af-4996-94c1-7f33e036e7af'}, 'redirectTarget': 'DIRECT_URL', 'client':{'id': 'clentIDc', 'clientCode': 'xxx', 'mainDomain': 'domain.tld', 'defaultDomain': 'domain.tld'', 'dmrDomain': 'domain.tld'', 'customParam1Available': false, 'realtimeRoutingAPI': false, 'rootRedirect': false}, 'country':{'code': 'UK'}, 'directRedirectUrl': 'http://google.co.uk'}
r = requests.post('http://stackoverflow.com', json=payload, headers=headers)
When I run it, I get this error:
NameError: name 'false' is not defined
How can I escape those false booleans in the payload?
Python doesn't use false; it uses False. You're getting a NameError because Python is looking for a variable called false, which doesn't exist.
Replace false with False in your dictionary. You've also got a few too many quotes in places, so I've removed those:
payload = {'namePostfix': 'test99682', 'costModel': 'NOT_TRACKED', 'clickRedirectType': 'REGULAR', 'trafficSource':{'id': '3a7ff9ec-19af-4996-94c1-7f33e036e7af'}, 'redirectTarget': 'DIRECT_URL', 'client':{'id': 'clentIDc', 'clientCode': 'xxx', 'mainDomain': 'domain.tld', 'defaultDomain': 'domain.tld', 'dmrDomain': 'domain.tld', 'customParam1Available': False, 'realtimeRoutingAPI': False, 'rootRedirect': False}, 'country':{'code': 'UK'}, 'directRedirectUrl': 'http://google.co.uk'}
Likewise, the opposite boolean value is True (not true), and the "null" data type is None.
False is the correct name to use in Python: the boolean values true and false are spelled with a capital letter, True and False.
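Once the dict uses Python's False and None, the JSON encoding produces the lowercase false and null for you; requests' json= argument serializes the payload, so nothing needs escaping by hand. The standard-library json.dumps call below shows the same conversion on a trimmed payload; the 'note' key is hypothetical and only there to show None becoming null.

```python
import json

# Trimmed version of the payload; Python literals use False and None
payload = {
    'costModel': 'NOT_TRACKED',
    'client': {'customParam1Available': False, 'rootRedirect': False},
    'note': None,  # hypothetical key
}

body = json.dumps(payload)
print(body)
# {"costModel": "NOT_TRACKED", "client": {"customParam1Available": false, "rootRedirect": false}, "note": null}
```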

ValueError: Expecting property name: line 1 column 2 (char 1)

I have a database with lots of tweets that I crawled using the Twitter API. Those tweets are in JSON format, which should allow me to convert them into dictionaries, right? So here we go: I'm trying to convert these strings into dictionaries using the json package, but every time I run json.loads(string), it gives me an error:
ValueError: Expecting property name: line 1 column 2 (char 1).
Here's an example of the "json strings" I have in my database.
{u"contributors": "None", u"truncated": False, u"text": u"RT #WhoScored: Most clear-cut chances missed at World Cup 2014: M\xfcller / Higua\xedn / Benzema (5), de Vrij / \xd6zil / Ronaldo (4)", u"in_reply_to_status_id": "None", u"id": 487968527395983360, u"favorite_count": 0, u"source": u"Twitter for BlackBerry\xae", u"retweeted": False, u"coordinates": "None", u"entities": {u"user_mentions": [{u"id": 99806132, u"indices": [3, 13], u"id_str": u"99806132", u"screen_name": u"WhoScored", u"name": u"WhoScored.com"}], u"symbols": [], u"trends": [], u"hashtags": [], u"urls": []}, u"in_reply_to_screen_name": "None", u"id_str": u"487968527395983360", u"retweet_count": 0, u"in_reply_to_user_id": "None", u"favorited": False, u"retweeted_status": {u"contributors": "None", u"truncated": False, u"text": u"Most clear-cut chances missed at World Cup 2014: M\xfcller / Higua\xedn / Benzema (5), de Vrij / \xd6zil / Ronaldo (4)", u"in_reply_to_status_id": "None", u"id": 487955847025143808, u"favorite_count": 17, u"source": u"TweetDeck", u"retweeted": False, u"coordinates": "None", u"entities": {u"user_mentions": [], u"symbols": [], u"trends": [], u"hashtags": [], u"urls": []}, u"in_reply_to_screen_name": "None", u"id_str": u"487955847025143808", u"retweet_count": 59, u"in_reply_to_user_id": "None", u"favorited": False, u"user": {u"follow_request_sent": "None", u"profile_use_background_image": True, u"default_profile_image": False, u"id": 99806132, u"verified": True, u"profile_image_url_https": u"https://pbs.twimg.com/profile_images/477005408557486083/9MVR7GdF_normal.jpeg", u"profile_sidebar_fill_color": u"DDEEF6", u"profile_text_color": u"333333", u"followers_count": 425860, u"profile_sidebar_border_color": u"C0DEED", u"id_str": u"99806132", u"profile_background_color": u"272727", u"listed_count": 3245, u"profile_background_image_url_https": u"https://pbs.twimg.com/profile_background_images/439356280/123abc.jpg", u"utc_offset": 3600, u"statuses_count": 24118, u"description": u"The 
largest detailed football statistics website, covering Europe"s top leagues and more. Follow #WSTipster for betting tips. Powered by Opta data.", u"friends_count": 67, u"location": u"London", u"profile_link_color": u"0084B4", u"profile_image_url": u"http://pbs.twimg.com/profile_images/477005408557486083/9MVR7GdF_normal.jpeg", u"following": "None", u"geo_enabled": False, u"profile_banner_url": u"https://pbs.twimg.com/profile_banners/99806132/1402565693", u"profile_background_image_url": u"http://pbs.twimg.com/profile_background_images/439356280/123abc.jpg", u"name": u"WhoScored.com", u"lang": u"en", u"profile_background_tile": False, u"favourites_count": 250, u"screen_name": u"WhoScored", u"notifications": "None", u"url": u"http://whoscored.com", u"created_at": u"Sun Dec 27 23:22:45 +0000 2009", u"contributors_enabled": False, u"time_zone": u"London", u"protected": False, u"default_profile": False, u"is_translator": False}, u"geo": "None", u"in_reply_to_user_id_str": "None", u"possibly_sensitive": False, u"lang": u"ro", u"created_at": u"Sat Jul 12 13:45:14 +0000 2014", u"filter_level": u"low", u"in_reply_to_status_id_str": "None", u"place": "None"}, u"user": {u"follow_request_sent": "None", u"profile_use_background_image": True, u"default_profile_image": False, u"id": 498676612, u"verified": False, u"profile_image_url_https": u"https://pbs.twimg.com/profile_images/485720258934603776/BmUaZHax_normal.jpeg", u"profile_sidebar_fill_color": u"DDEEF6", u"profile_text_color": u"333333", u"followers_count": 192, u"profile_sidebar_border_color": u"C0DEED", u"id_str": u"498676612", u"profile_background_color": u"C0DEED", u"listed_count": 1, u"profile_background_image_url_https": u"https://pbs.twimg.com/profile_background_images/654833637/l9fp65m0xqzsmoneg8pz.jpeg", u"utc_offset": "None", u"statuses_count": 6468, u"description": u"Garut 10 July | Asda islamic school | Farmasi | #judikajude | path : Dicky Darul Majid", u"friends_count": 153, u"location": u"", 
u"profile_link_color": u"B30000", u"profile_image_url": u"http://pbs.twimg.com/profile_images/485720258934603776/BmUaZHax_normal.jpeg", u"following": "None", u"geo_enabled": True, u"profile_banner_url": u"https://pbs.twimg.com/profile_banners/498676612/1404927261", u"profile_background_image_url": u"http://pbs.twimg.com/profile_background_images/654833637/l9fp65m0xqzsmoneg8pz.jpeg", u"name": u"DICKY ", u"lang": u"en", u"profile_background_tile": False, u"favourites_count": 74, u"screen_name": u"DickyDarulMajid", u"notifications": "None", u"url": "None", u"created_at": u"Tue Feb 21 09:21:59 +0000 2012", u"contributors_enabled": False, u"time_zone": "None", u"protected": False, u"default_profile": False, u"is_translator": False}, u"geo": "None", u"in_reply_to_user_id_str": "None", u"possibly_sensitive": False, u"lang": u"ro", u"created_at": u"Sat Jul 12 14:35:37 +0000 2014", u"filter_level": u"medium", u"in_reply_to_status_id_str": "None", u"place": "None"}
And here is the code:
import sys, codecs, json
encode = sys.stdin.encoding
all_entries = Tweet.objects.all()[1427:1435]
for entry in all_entries:
    tweet = entry.tweet.encode(encode)
    json_acceptable_string = tweet.replace("'", "\"")
    json_acceptable_string = json_acceptable_string.replace("None", "\"None\"")
    data = json.loads(json_acceptable_string)
    print data
Traceback:
Traceback (most recent call last):
  File "/home/kiko/workspace/SA_WorldCup/main.py", line 6, in <module>
    util.tweets_count()
  File "/home/kiko/workspace/SA_WorldCup/util/__init__.py", line 25, in tweets_count
    data = json.loads(json_acceptable_string, object_hook=JSONObject)
  File "/usr/lib/python2.7/json/__init__.py", line 351, in loads
    return cls(encoding=encoding, **kw).decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
I've tried many times without success. Can you help me out? Thanks a lot.
It looks like your "json" is actually Python literal syntax. In that case, it might be easier to ast.literal_eval the string.[1]
As a side benefit, you then don't have to do any (possibly sketchy) replacements of None with "None". (Consider a tweet that says "None of your base are belong to us".)
[1] Although it would probably be even better to make sure that proper JSON is being dumped into the database to begin with...
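A minimal sketch of the ast.literal_eval route, using a shortened, hypothetical record in the same Python-literal style as the stored tweets (u"" prefixes, True/False/None, \x escapes). It also illustrates the closing note: once the data is a dict, json.dumps produces real JSON that json.loads reads back. Written for Python 3, which still accepts u"" prefixes; the question's own code is Python 2.7.

```python
import ast
import json

# Hypothetical, shortened record stored as a Python literal (note u"" and None).
# The doubled backslashes keep \xfc / \xed as escape text inside `stored`.
stored = ('{u"id": 487968527395983360, u"truncated": False, u"place": None, '
          'u"text": u"M\\xfcller / Higua\\xedn"}')

try:
    json.loads(stored)
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    print("json.loads failed:", exc)

# literal_eval only evaluates Python literals, so it cannot run arbitrary code
tweet = ast.literal_eval(stored)
print(tweet["id"], tweet["truncated"], tweet["place"])

# Storing real JSON instead makes json.loads work directly
as_json = json.dumps(tweet)
print(json.loads(as_json) == tweet)
```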
Use eval():
In [13]: a = "[{'start_city': '1', 'end_city': 'aaa', 'number': 1},\
...: {'start_city': '2', 'end_city': 'bbb', 'number': 1},\
...: {'start_city': '3', 'end_city': 'ccc', 'number': 1}]"
In [14]: eval(a)
Out[14]:
[{'end_city': 'aaa', 'number': 1, 'start_city': '1'},
{'end_city': 'bbb', 'number': 1, 'start_city': '2'},
{'end_city': 'ccc', 'number': 1, 'start_city': '3'}]
