How to decode javascript-unicode string in python? - python

I have this string:
{\u0022allow_group_shows\u0022: true, \u0022needs_supporter_to_pm\u0022: true, \u0022ads_zone_ids\u0022: {\u0022300x250,centre\u0022: \u0022\u0022, \u0022300x250,right\u0022: \u0022\u0022, \u0022300x250,left\u0022: \u0022\u0022, \u0022468x60\u0022: \u0022\u0022, \u0022160x600,top\u0022: \u0022\u0022, \u0022160x600,bottom\u0022: \u0022\u0022, \u0022160x600,middle\u0022: \u0022\u0022}, \u0022chat_settings\u0022: {\u0022sort_users_key\u0022: \u0022a\u0022, \u0022silence_broadcasters\u0022: \u0022false\u0022, \u0022highest_token_color\u0022: \u0022darkpurple\u0022, \u0022emoticon_autocomplete_delay\u0022: \u00220\u0022, \u0022ignored_users\u0022: \u0022\u0022, \u0022show_emoticons\u0022: true, \u0022font_size\u0022: \u00229pt\u0022, \u0022b_tip_vol\u0022: \u002210\u0022, \u0022allowed_chat\u0022: \u0022all\u0022, \u0022room_leave_for\u0022: \u0022org\u0022, \u0022font_color\u0022: \u0022#494949\u0022, \u0022font_family\u0022: \u0022default\u0022, \u0022room_entry_for\u0022: \u0022org\u0022, \u0022v_tip_vol\u0022: \u002280\u0022}, \u0022is_age_verified\u0022: true, \u0022flash_host\u0022: \u0022edge143.stream.highwebmedia.com\u0022, \u0022tips_in_past_24_hours\u0022: 0, \u0022dismissible_messages\u0022: [], \u0022show_mobile_site_banner_link\u0022: false, \u0022last_vote_in_past_90_days_down\u0022: false, \u0022server_name\u0022: \u0022113\u0022, \u0022num_users_required_for_group\u0022: 2, \u0022group_show_price\u0022: 18, \u0022is_mobile\u0022: false, \u0022chat_username\u0022: \u0022__anonymous__eiwBXR\u0022, \u0022recommender_hmac\u0022: \u0022dae28e4e9afa15da7c6227af2e8fb8abd85a3714aca8f86f01a53a6dd1377115\u0022, \u0022broadcaster_gender\u0022: \u0022couple\u0022, \u0022hls_source\u0022: \u0022https://localhost/live\u002Dhls/amlst:jimmy_and_amy\u002Dsd\u002De73f4b67186a2ec4c13137607d02470ac61f32b60ac15e691bf33493423ef477_trns_h264/playlist.m3u8\u0022, \u0022allow_show_recordings\u0022: true, \u0022is_moderator\u0022: false, \u0022room_status\u0022: 
\u0022public\u0022, \u0022edge_auth\u0022: \u0022{\u005C\u0022username\u005C\u0022:\u005C\u0022__anonymous__eiwBXR\u005C\u0022,\u005C\u0022org\u005C\u0022:\u005C\u0022A\u005C\u0022,\u005C\u0022expire\u005C\u0022:1590669977,\u005C\u0022sig\u005C\u0022:\u005C\u0022454d96141c66fb42f74e9620b9d79e937de3a774a5687021f8650cc4f563d371\u005C\u0022,\u005C\u0022room\u005C\u0022:\u005C\u0022jimmy_and_amy\u005C\u0022}\u0022, \u0022is_supporter\u0022: false, \u0022chat_password\u0022: \u0022{\u005C\u0022username\u005C\u0022:\u005C\u0022__anonymous__eiwBXR\u005C\u0022,\u005C\u0022org\u005C\u0022:\u005C\u0022A\u005C\u0022,\u005C\u0022expire\u005C\u0022:1590669977,\u005C\u0022sig\u005C\u0022:\u005C\u0022454d96141c66fb42f74e9620b9d79e937de3a774a5687021f8650cc4f563d371\u005C\u0022,\u005C\u0022room\u005C\u0022:\u005C\u0022jimmy_and_amy\u005C\u0022}\u0022, \u0022room_pass\u0022: \u0022b5b2408cd91e6c595a3f732a5b7b1567b566bcc92f384ce5e6a00a26a24fb5c7\u0022, \u0022low_satisfaction_score\u0022: false, \u0022tfa_enabled\u0022: false, \u0022room_title\u0022: \u0022(STEP SIS CUM FACE) shh... 
Luna is here and dont know what we are doing #hairy #creampie #stockings #new #lush [3536 tokens remaining]\u0022, \u0022satisfaction_score\u0022: {\u0022down_votes\u0022: 15, \u0022up_votes\u0022: 67, \u0022percent\u0022: 82, \u0022max\u0022: 31222657}, \u0022viewer_username\u0022: \u0022AnonymousUser\u0022, \u0022hidden_message\u0022: \u0022\u0022, \u0022following\u0022: false, \u0022wschat_host\u0022: \u0022https://chatws\u002D45.stream.highwebmedia.com/ws\u0022, \u0022has_studio\u0022: false, \u0022num_followed\u0022: 0, \u0022spy_private_show_price\u0022: 30, \u0022hide_satisfaction_score\u0022: false, \u0022broadcaster_username\u0022: \u0022jimmy_and_amy\u0022, \u0022ignored_emoticons\u0022: [], \u0022apps_running\u0022: \u0022[[\u005C\u0022Tip Goal\u005C\u0022,\u005C\u0022\u005C\u005C/apps\u005C\u005C/app_details\u005C\u005C/tip\u002Dgoal\u005C\u005C/?slot\u003D0\u005C\u0022],[\u005C\u0022Ultra Bot \u002D 4Sci\u005C\u0022,\u005C\u0022\u005C\u005C/apps\u005C\u005C/app_details\u005C\u005C/ultra\u002Dbot\u002D4sci\u005C\u005C/?slot\u003D2\u005C\u0022],[\u005C\u0022Roll The Dice\u005C\u0022,\u005C\u0022\u005C\u005C/apps\u005C\u005C/app_details\u005C\u005C/roll\u002Dthe\u002Ddice\u002D5\u005C\u005C/?slot\u003D3\u005C\u0022]]\u0022, \u0022token_balance\u0022: 0, \u0022private_min_minutes\u0022: 10, \u0022viewer_gender\u0022: \u0022m\u0022, \u0022allow_anonymous_tipping\u0022: false, \u0022num_users_waiting_for_group\u0022: 0, \u0022last_vote_in_past_24_hours\u0022: null, \u0022is_widescreen\u0022: true, \u0022num_viewers\u0022: 1672, \u0022broadcaster_on_new_chat\u0022: false, \u0022private_show_price\u0022: 30, \u0022num_followed_online\u0022: 0, \u0022allow_private_shows\u0022: true}
and I want to decode it in Python so that it is easier to send over the internet to our Android app. It should look like this:
"{"allow_group_shows": true, "needs_supporter_to_pm": true, "ads_zone_ids": {"300x250,centre": "", "300x250,right": "", "300x250,left": "", "468x60": "", "160x600,top": "", "160x600,bottom": "", "160x600,middle": ""}, "chat_settings": {"sort_users_key": "a", "silence_broadcasters": "false", "highest_token_color": "darkpurple", "emoticon_autocomplete_delay": "0", "ignored_users": "", "show_emoticons": true, "font_size": "9pt", "b_tip_vol": "10", "allowed_chat": "all", "room_leave_for": "org", "font_color": "#494949", "font_family": "default", "room_entry_for": "org", "v_tip_vol": "80"}, "is_age_verified": true, "flash_host": "edge306.stream.highwebmedia.com", "tips_in_past_24_hours": 0, "dismissible_messages": [], "show_mobile_site_banner_link": false, "last_vote_in_past_90_days_down": false, "server_name": "115", "num_users_required_for_group": 2, "group_show_price": 18, "is_mobile": false, "chat_username": "bom4b5", "recommender_hmac": "ed05e292bb82262255a96944d81bb04dc2d248ca69fff35cf5d7015889c005b1", "broadcaster_gender": "couple", "hls_source": "https://***/live-edge/****-sd-e73f4b67186a2ec4c13137607d02470ac61f32b60***%22%7D", "allow_show_recordings": true, "is_moderator": false, "room_status": "public", "edge_auth": "{\"username\":\"bom4b5\",\"org\":\"A\",\"expire\":1590666696,\"sig\":\"49b6844fde2c47c2430bd05946b6cfbc9c7864788b9236d7f5af5ff88efd3f95\",\"room\":\"jimmy_and_amy\"}", "is_supporter": false, "chat_password": "****", "room_pass": "b5b2408cd91e6c595a3f732a5b7b1567b566bcc92f384ce5e6a00a26a24fb5c7", "low_satisfaction_score": false, "tfa_enabled": false, "room_title": "(STEP SIS CUM FACE) shh... 
Luna is here and dont know what we are doing #hairy #creampie #stockings #new #lush [4310 tokens remaining]", "satisfaction_score": {"down_votes": 15, "up_votes": 67, "percent": 82, "max": 31222657}, "viewer_username": "bom4b5", "hidden_message": "", "following": false, "wschat_host": "https://chatws-45.stream.highwebmedia.com/ws", "has_studio": false, "num_followed": 0, "spy_private_show_price": 30, "hide_satisfaction_score": false, "broadcaster_username": "jimmy_and_amy", "ignored_emoticons": [], "apps_running": "[[\"Tip Goal\",\"\\/apps\\/app_details\\/tip-goal\\/?slot=0\"],[\"Ultra Bot - 4Sci\",\"\\/apps\\/app_details\\/ultra-bot-4sci\\/?slot=2\"],[\"Roll The Dice\",\"\\/apps\\/app_details\\/roll-the-dice-5\\/?slot=3\"]]", "token_balance": 0, "private_min_minutes": 10, "viewer_gender": "f", "allow_anonymous_tipping": false, "num_users_waiting_for_group": 0, "last_vote_in_past_24_hours": null, "is_widescreen": true, "num_viewers": 331, "broadcaster_on_new_chat": false, "private_show_price": 30, "num_followed_online": 0, "allow_private_shows": true}"
I tried the approach from "How to decode javascript unicode string in python?".
It worked, but the one issue is that I can't decode the string after I split it out of the page:
gett = s.get("https://localhost.com/34534535/")
# print(gett.text + "\n\n\n\n")
m = json.dumps({"k": gett.text}) # decodes it
# split
testt = (json.loads(m)["k"]).split('window.initialRoomDossier = "')
testt = testt[1].split('";')
final = testt[0] # show the same
The thing is that if I load it from a string literal like
a = "{\u0022allow_group_shows\u0022: true, \u0022nee..."
it does work, but not after I split it out of the response.

Here is how I solved it
import json
import js2py
# Split the JavaScript string literal out of the page source
splitt = (json.loads(m)["k"]).split('window.initialRoomDossier = "')
splitt = splitt[1].split('";')
final = str(splitt[0])
final = 'var kek = "' + final + '";'  # Turn it into JavaScript code (run below)
# JavaScript virtual machine
context = js2py.EvalJs({'python_sum': sum})
# Run the JavaScript code; the interpreter resolves the \uXXXX escapes
context.execute(final)
# print(context.kek)
m3u8_url = json.loads(context.kek)["hls_source"]
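A possible alternative that avoids running a JavaScript interpreter, sketched under the assumption that the literal only uses escapes that JSON also understands (\uXXXX, \\, \" and so on): wrap the extracted text in double quotes and let json.loads decode it as a JSON string, then parse the result a second time as JSON. This also hints at why the hard-coded string worked while the split one did not: in a Python source literal, Python itself resolves \u0022 when the file is parsed, whereas text taken from an HTTP response still contains literal backslashes. The page content and the helper name decode_js_string_literal below are made up for illustration.

```python
import json


def decode_js_string_literal(body):
    """Decode the text found between the double quotes of a JS string literal.

    Assumption: the literal only contains escapes that are also valid JSON.
    strict=False additionally tolerates raw control characters such as newlines.
    """
    return json.loads('"' + body + '"', strict=False)


# Hypothetical page source; the backslashes are literal characters here,
# just as they would be in text received over HTTP.
page = (
    'window.initialRoomDossier = "{\\u0022hls_source\\u0022: '
    '\\u0022https://localhost/live\\u002Dhls/playlist.m3u8\\u0022, '
    '\\u0022num_viewers\\u0022: 1672}";'
)

body = page.split('window.initialRoomDossier = "')[1].split('";')[0]
dossier = json.loads(decode_js_string_literal(body))
print(dossier["hls_source"])
```

If the page ever uses JavaScript-only escapes such as \x41 or \', this sketch would raise an error and a JavaScript-aware approach like the one above would still be needed.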

Related

huggingface transformer question answer confidence score

How can we fetch the answer confidence score from the sample code of the huggingface transformers question answering example? I see that the pipeline does return the score, but can the code below also return the confidence score?
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
questions = [
    "How many pretrained models are available in Transformers?",
    "What does Transformers provide?",
    "Transformers provides interoperability between which frameworks?",
]
for question in questions:
    inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="tf")
    input_ids = inputs["input_ids"].numpy()[0]
    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(inputs)
    answer_start = tf.argmax(
        answer_start_scores, axis=1
    ).numpy()[0]  # Get the most likely beginning of answer with the argmax of the score
    answer_end = (
        tf.argmax(answer_end_scores, axis=1) + 1
    ).numpy()[0]  # Get the most likely end of answer with the argmax of the score
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
Code picked up from
https://huggingface.co/transformers/usage.html
The score is just the product of the probabilities of the answer start token and the answer end token, obtained by applying the softmax function to the start and end logits. Please have a look at the example below:
Pipeline output:
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""
question = "How many pretrained models are available in Transformers?"
question_answerer = pipeline("question-answering", model = model, tokenizer= tokenizer)
print(question_answerer(question=question, context = text))
Output:
{'score': 0.5254509449005127, 'start': 256, 'end': 264, 'answer': 'over 32+'}
Without pipeline:
inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
outputs = model(**inputs)
First, we create a mask that has a 1 for every context token and a 0 otherwise (question tokens and special tokens). We use the BatchEncoding.sequence_ids method for that:
non_answer_tokens = [x if x in [0,1] else 0 for x in inputs.sequence_ids()]
non_answer_tokens = torch.tensor(non_answer_tokens, dtype=torch.bool)
non_answer_tokens
Output:
tensor([False, False, False, False, False, False, False, False, False, False,
False, False, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, False])
We use this mask to set the logits of the special tokens and the question tokens to negative infinity and apply the softmax afterward (the negative infinity prevents these tokens from influencing the softmax result):
from torch.nn.functional import softmax
potential_start = torch.where(non_answer_tokens, outputs.start_logits, torch.tensor(float('-inf'),dtype=torch.float))
potential_end = torch.where(non_answer_tokens, outputs.end_logits, torch.tensor(float('-inf'),dtype=torch.float))
potential_start = softmax(potential_start, dim = 1)
potential_end = softmax(potential_end, dim = 1)
potential_start
Output:
tensor([[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
0.0000e+00, 1.0567e-04, 9.7031e-05, 1.9445e-06, 1.5849e-06, 1.2075e-07,
3.1704e-08, 4.7796e-06, 1.8712e-07, 6.2977e-08, 1.5481e-07, 8.0004e-08,
3.7896e-07, 1.6438e-07, 9.7762e-08, 1.0898e-05, 1.6518e-07, 5.6349e-08,
2.4848e-07, 2.1459e-07, 1.3785e-06, 1.0386e-07, 1.8803e-07, 8.1887e-08,
4.1088e-07, 1.5618e-07, 2.5624e-06, 1.8526e-06, 2.6710e-06, 6.8466e-08,
1.7953e-07, 3.6242e-07, 2.2788e-07, 2.3384e-06, 1.2147e-05, 1.6065e-07,
3.3257e-07, 2.6021e-07, 2.8140e-06, 1.3698e-07, 1.1066e-07, 2.8436e-06,
1.2171e-07, 9.9341e-07, 1.1684e-07, 6.8935e-08, 5.6335e-08, 1.3314e-07,
1.3038e-07, 7.9560e-07, 1.0671e-07, 9.1864e-08, 5.6394e-07, 3.0210e-08,
7.2176e-08, 5.4452e-08, 1.2873e-07, 9.2636e-08, 9.6012e-07, 7.8008e-08,
1.3124e-07, 1.3680e-06, 8.8716e-07, 8.6627e-07, 6.4750e-06, 2.5951e-07,
6.1648e-07, 8.7724e-07, 1.0796e-05, 2.6633e-07, 5.4644e-07, 1.7553e-07,
1.6015e-05, 5.0054e-07, 8.2263e-07, 2.6336e-06, 2.0743e-05, 4.0008e-07,
1.9330e-06, 2.0312e-04, 6.0256e-01, 3.9638e-01, 3.1568e-04, 2.2009e-05,
1.2485e-06, 2.4744e-06, 1.0092e-05, 3.1047e-06, 1.3597e-04, 1.5105e-06,
1.4960e-06, 8.1164e-08, 1.6534e-06, 4.6181e-07, 8.7354e-08, 2.2356e-07,
9.1145e-07, 8.8194e-06, 4.4202e-07, 1.9238e-07, 2.8077e-07, 1.4117e-05,
2.0613e-07, 1.2676e-06, 8.1317e-08, 2.2337e-06, 1.2399e-07, 6.1745e-08,
3.4725e-08, 2.7878e-07, 4.1457e-07, 0.0000e+00]],
grad_fn=<SoftmaxBackward>)
These probabilities can now be used to extract the start and end token of the answer and to calculate the answer score:
answer_start = torch.argmax(potential_start)
answer_end = torch.argmax(potential_end)
answer = tokenizer.decode(inputs.input_ids.squeeze()[answer_start:answer_end+1])
print(potential_start.squeeze()[answer_start])
print(potential_end.squeeze()[answer_end])
print(potential_start.squeeze()[answer_start] *potential_end.squeeze()[answer_end])
print(answer)
Output:
tensor(0.6026, grad_fn=<SelectBackward>)
tensor(0.8720, grad_fn=<SelectBackward>)
tensor(0.5255, grad_fn=<MulBackward0>)
over 32 +
P.S.: Please keep in mind that this answer does not cover any special cases (end token before start token).
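One way the "end token before start token" case could be handled, shown as a framework-free sketch rather than as what the pipeline does internally: instead of taking the two argmaxes independently, score every span with start <= end and keep the best product. The logits, the mask, and the max_len limit below are invented for illustration.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over the positions where mask is True; other positions get 0."""
    peak = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, mask, max_len=15):
    """Return (score, start, end) maximising P(start) * P(end) with start <= end.

    max_len is a hypothetical cap on the answer length in tokens.
    """
    start_p = masked_softmax(start_logits, mask)
    end_p = masked_softmax(end_logits, mask)
    best = (0.0, None, None)
    for i, ps in enumerate(start_p):
        for j in range(i, min(i + max_len, len(end_p))):
            score = ps * end_p[j]
            if score > best[0]:
                best = (score, i, j)
    return best


# Toy logits: the independent argmaxes would give start=3 and end=2,
# which is not a valid span.
start_logits = [5.0, 0.0, 1.0, 3.0, 0.0]
end_logits = [5.0, 0.0, 4.0, 1.0, 0.0]
mask = [False, True, True, True, False]  # True marks context tokens

score, start, end = best_span(start_logits, end_logits, mask)
print(score, start, end)
```

With real model outputs, the two logit lists would be outputs.start_logits[0].tolist() and outputs.end_logits[0].tolist(), and the mask would be non_answer_tokens.tolist().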

How to Group by Date Field in a PivotTable using win32com.client

I have tried many approaches over the last few hours with no luck. Could somebody please help me?
group_dt = pt.PivotFields('Created')
group_dt.LabelRange.Group(Start=True, End=True, Periods=Array(False, False, False, False, True, False, True))
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-75-0355ab1abb88> in <module>
1 group_dt = pt.PivotFields('Created')
----> 2 group_dt.LabelRange.Group(Start=True, End=True, Periods=Array(False, False, False, False, True, False, True))
TypeError: 'str' object is not callable
After a lot of research, I figured out how to group a date field using win32com.client: call Group on a cell inside the field's label range and pass Periods as a plain Python list:
cell = pivot_sheet.Range('B5')
cell.Group(Start=True, End=True, Periods=list([False, False, False, False, True, False, True]))
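The seven booleans in Periods are positional. In Excel's Range.Group they stand, in order, for seconds, minutes, hours, days, months, quarters, and years, so the list above groups by months and years. A small helper can make that intent readable; the names PERIOD_ORDER and group_periods are hypothetical and not part of win32com.

```python
# Order of the Periods flags expected by Excel's Range.Group
PERIOD_ORDER = ("seconds", "minutes", "hours", "days", "months", "quarters", "years")


def group_periods(*names):
    """Build the 7-element boolean list for the Periods argument."""
    unknown = set(names) - set(PERIOD_ORDER)
    if unknown:
        raise ValueError("unknown period(s): " + ", ".join(sorted(unknown)))
    return [period in names for period in PERIOD_ORDER]


periods = group_periods("months", "years")
print(periods)
```

The result could then be passed as cell.Group(Start=True, End=True, Periods=periods).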

Expected indented block - line continuation?

solved_places = {
'a1': False,'a2': False,'a3': False,'a4': False,'a5': False,
'b1': False,'b2': False,'b3': False,'b4': False,'b5': False,
'c1': False,'c2': False,'c3': False,'c4': False,'c5': False,
'd1': False,'d2': False,'d3': False,'d4': False,'d5': False,
'e1': False,'e2': False,'e3': False,'e4': False,'e5': False
}
I'm getting an error pointer on the second line (b1), asking for an indentation.
(ERROR: Expected an indented block)
What's wrong? (No spaces, all tabs, 1 tab = 4 spaces...)
I've tried different levels of indentation, spaces, commas, etc., like this:
solved_places =
{'a1': False,'a2': False,'a3': False,'a4': False,'a5': False,
'b1': False,'b2': False,'b3': False,'b4': False,'b5': False,
'c1': False,'c2': False,'c3': False,'c4': False,'c5': False,
'd1': False,'d2': False,'d3': False,'d4': False,'d5': False,
'e1': False,'e2': False,'e3': False,'e4': False,'e5': False}
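This does not diagnose the error: the first literal is valid Python as posted, so the message presumably comes from the surrounding code, for example an if, for, or def line just before it whose body is missing (an assumption, since that code is not shown). As a side note, the same dictionary can be built without any continuation lines, which sidesteps indentation questions inside the literal entirely:

```python
# Build {'a1': False, ..., 'e5': False} with a dict comprehension
solved_places = {
    f"{row}{col}": False
    for row in "abcde"
    for col in range(1, 6)
}

print(len(solved_places))
print(list(solved_places)[:6])
```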

Choosing a value from an API response in Python

I'm trying to work with API responses.
Here is an example response that comes from the API:
{u'blog': {u'followed': False, u'is_adult': False, u'can_subscribe': False, u'is_nsfw': False, u'ask': True, u'likes': 920, u'is_blocked_from_primary': False, u'can_submit': True, u'ask_anon': True, u'subscribed': False, u'share_likes': True, u'updated': 1493576375, u'description': u'<p>"Che hai dei bellissimi occhi quando mi cerchi."</p><p>18 </p><p>Beginner Wiccan and Witch </p><p>\U0001f312\U0001f315\U0001f318</p>', u'total_posts': 13992, u'submission_page_title': u'Submit', u'submission_terms': {u'title': u'Submit', u'tags': [], u'guidelines': u'', u'accepted_types': [u'text', u'photo', u'quote', u'link', u'video']}, u'name': u'darknessinmyheartt', u'url': u'http://darknessinmyheartt.tumblr.com/', u'ask_page_title': u'lets ask something!/ haydi sor!', u'title': u'"Laurel"', u'posts': 13992, u'reply_conditions': u'3', u'can_send_fan_mail': False}}
How can I get only the value of u'updated' from that response?
u'updated': 1493576375
I need to assign that value to "x".
I think the Tumblr API response is a Python dict. Based on that, try:
x = client.blog_info('darknessinmyheartt')
print x['blog']['updated']
You got a JSON response; in Python you can get a value from JSON data via its key.
For your example, you need to do the following:
your_data['blog']['updated']
Via your_data['blog'] you will get an object with keys and values:
{'followed': False, u'is_adult': False, u'can_subscribe': False, ....}
and via your_data['blog']['updated'] you will get the value 1493576375.
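The lookup described in both answers can be sketched as follows in Python 3, using a trimmed copy of the response from the question in place of a live API call. The .get variant is an optional extra for the case where a key might be missing.

```python
# Trimmed copy of the response shown in the question
response = {
    "blog": {
        "name": "darknessinmyheartt",
        "updated": 1493576375,
        "posts": 13992,
    }
}

# Direct lookup: raises KeyError if either key is absent
x = response["blog"]["updated"]

# Defensive lookup: yields None instead of raising when a key is absent
maybe_missing = response.get("blog", {}).get("last_seen")

print(x)
```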

ValueError: Expecting property name: line 1 column 2 (char 1)

I have a database with lots of tweets that I crawled using the Twitter API. Those tweets are in a JSON format, which allows me to convert them into dictionaries, right? So here we go: I'm trying to convert these strings into dictionaries using the json package. But every time I try to run json.loads(string), it gives me an error:
ValueError: Expecting property name: line 1 column 2 (char 1).
Here's an example of the "json strings" I have in my database.
{u"contributors": "None", u"truncated": False, u"text": u"RT #WhoScored: Most clear-cut chances missed at World Cup 2014: M\xfcller / Higua\xedn / Benzema (5), de Vrij / \xd6zil / Ronaldo (4)", u"in_reply_to_status_id": "None", u"id": 487968527395983360, u"favorite_count": 0, u"source": u"Twitter for BlackBerry\xae", u"retweeted": False, u"coordinates": "None", u"entities": {u"user_mentions": [{u"id": 99806132, u"indices": [3, 13], u"id_str": u"99806132", u"screen_name": u"WhoScored", u"name": u"WhoScored.com"}], u"symbols": [], u"trends": [], u"hashtags": [], u"urls": []}, u"in_reply_to_screen_name": "None", u"id_str": u"487968527395983360", u"retweet_count": 0, u"in_reply_to_user_id": "None", u"favorited": False, u"retweeted_status": {u"contributors": "None", u"truncated": False, u"text": u"Most clear-cut chances missed at World Cup 2014: M\xfcller / Higua\xedn / Benzema (5), de Vrij / \xd6zil / Ronaldo (4)", u"in_reply_to_status_id": "None", u"id": 487955847025143808, u"favorite_count": 17, u"source": u"TweetDeck", u"retweeted": False, u"coordinates": "None", u"entities": {u"user_mentions": [], u"symbols": [], u"trends": [], u"hashtags": [], u"urls": []}, u"in_reply_to_screen_name": "None", u"id_str": u"487955847025143808", u"retweet_count": 59, u"in_reply_to_user_id": "None", u"favorited": False, u"user": {u"follow_request_sent": "None", u"profile_use_background_image": True, u"default_profile_image": False, u"id": 99806132, u"verified": True, u"profile_image_url_https": u"https://pbs.twimg.com/profile_images/477005408557486083/9MVR7GdF_normal.jpeg", u"profile_sidebar_fill_color": u"DDEEF6", u"profile_text_color": u"333333", u"followers_count": 425860, u"profile_sidebar_border_color": u"C0DEED", u"id_str": u"99806132", u"profile_background_color": u"272727", u"listed_count": 3245, u"profile_background_image_url_https": u"https://pbs.twimg.com/profile_background_images/439356280/123abc.jpg", u"utc_offset": 3600, u"statuses_count": 24118, u"description": u"The 
largest detailed football statistics website, covering Europe"s top leagues and more. Follow #WSTipster for betting tips. Powered by Opta data.", u"friends_count": 67, u"location": u"London", u"profile_link_color": u"0084B4", u"profile_image_url": u"http://pbs.twimg.com/profile_images/477005408557486083/9MVR7GdF_normal.jpeg", u"following": "None", u"geo_enabled": False, u"profile_banner_url": u"https://pbs.twimg.com/profile_banners/99806132/1402565693", u"profile_background_image_url": u"http://pbs.twimg.com/profile_background_images/439356280/123abc.jpg", u"name": u"WhoScored.com", u"lang": u"en", u"profile_background_tile": False, u"favourites_count": 250, u"screen_name": u"WhoScored", u"notifications": "None", u"url": u"http://whoscored.com", u"created_at": u"Sun Dec 27 23:22:45 +0000 2009", u"contributors_enabled": False, u"time_zone": u"London", u"protected": False, u"default_profile": False, u"is_translator": False}, u"geo": "None", u"in_reply_to_user_id_str": "None", u"possibly_sensitive": False, u"lang": u"ro", u"created_at": u"Sat Jul 12 13:45:14 +0000 2014", u"filter_level": u"low", u"in_reply_to_status_id_str": "None", u"place": "None"}, u"user": {u"follow_request_sent": "None", u"profile_use_background_image": True, u"default_profile_image": False, u"id": 498676612, u"verified": False, u"profile_image_url_https": u"https://pbs.twimg.com/profile_images/485720258934603776/BmUaZHax_normal.jpeg", u"profile_sidebar_fill_color": u"DDEEF6", u"profile_text_color": u"333333", u"followers_count": 192, u"profile_sidebar_border_color": u"C0DEED", u"id_str": u"498676612", u"profile_background_color": u"C0DEED", u"listed_count": 1, u"profile_background_image_url_https": u"https://pbs.twimg.com/profile_background_images/654833637/l9fp65m0xqzsmoneg8pz.jpeg", u"utc_offset": "None", u"statuses_count": 6468, u"description": u"Garut 10 July | Asda islamic school | Farmasi | #judikajude | path : Dicky Darul Majid", u"friends_count": 153, u"location": u"", 
u"profile_link_color": u"B30000", u"profile_image_url": u"http://pbs.twimg.com/profile_images/485720258934603776/BmUaZHax_normal.jpeg", u"following": "None", u"geo_enabled": True, u"profile_banner_url": u"https://pbs.twimg.com/profile_banners/498676612/1404927261", u"profile_background_image_url": u"http://pbs.twimg.com/profile_background_images/654833637/l9fp65m0xqzsmoneg8pz.jpeg", u"name": u"DICKY ", u"lang": u"en", u"profile_background_tile": False, u"favourites_count": 74, u"screen_name": u"DickyDarulMajid", u"notifications": "None", u"url": "None", u"created_at": u"Tue Feb 21 09:21:59 +0000 2012", u"contributors_enabled": False, u"time_zone": "None", u"protected": False, u"default_profile": False, u"is_translator": False}, u"geo": "None", u"in_reply_to_user_id_str": "None", u"possibly_sensitive": False, u"lang": u"ro", u"created_at": u"Sat Jul 12 14:35:37 +0000 2014", u"filter_level": u"medium", u"in_reply_to_status_id_str": "None", u"place": "None"}
And here is the code:
import sys, codecs, json
encode = sys.stdin.encoding
all_entries = Tweet.objects.all()[1427:1435]
for entry in all_entries:
    tweet = entry.tweet.encode(encode)
    json_acceptable_string = tweet.replace("'", "\"")
    json_acceptable_string = json_acceptable_string.replace("None", "\"None\"")
    data = json.loads(json_acceptable_string)
    print data
Traceback:
Traceback (most recent call last):
File "/home/kiko/workspace/SA_WorldCup/main.py", line 6, in <module>
util.tweets_count()
File "/home/kiko/workspace/SA_WorldCup/util/__init__.py", line 25, in tweets_count
data = json.loads(json_acceptable_string, object_hook=JSONObject)
File "/usr/lib/python2.7/json/__init__.py", line 351, in loads
return cls(encoding=encoding, **kw).decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
I tried many times, but without success. Can you help me out? Thanks a lot.
It looks like your "json" is actually Python literal syntax. In that case, it might be easier to ast.literal_eval the string [1].
As a side benefit, then you don't have to do any (possibly sketchy) replacements of None with "None". (Consider a tweet which says "None of your base are belong to us")
[1] Although it would probably be even better to make sure that proper JSON is being dumped into the database to begin with...
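A minimal sketch of that suggestion in Python 3, using a shortened, made-up record in the same Python 2 repr style (u"" prefixes, False and None written as Python literals). ast.literal_eval only accepts literals, so it cannot execute arbitrary code. The result is an ordinary dict that json.dumps can then turn into real JSON.

```python
import ast
import json

# Shortened, hypothetical record; the \xfc and \xd6 escapes are literal
# backslash sequences in the stored text, as they would be in the database.
raw = (
    '{u"truncated": False, u"text": u"M\\xfcller / \\xd6zil", '
    'u"id": 487968527395983360, u"place": None}'
)

tweet = ast.literal_eval(raw)   # parses Python literal syntax only
as_json = json.dumps(tweet)     # proper JSON: false / null, double quotes

print(tweet["text"])
print(as_json)
```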
Use eval():
In [13]: a = "[{'start_city': '1', 'end_city': 'aaa', 'number': 1},\
...: {'start_city': '2', 'end_city': 'bbb', 'number': 1},\
...: {'start_city': '3', 'end_city': 'ccc', 'number': 1}]"
In [14]: eval(a)
Out[14]:
[{'end_city': 'aaa', 'number': 1, 'start_city': '1'},
{'end_city': 'bbb', 'number': 1, 'start_city': '2'},
{'end_city': 'ccc', 'number': 1, 'start_city': '3'}]
