How to fetch single item out of long string? - python

I have a very long string as the output of a function, as follows:
tmp = <"last seen":1568,"reviews [{"id":15869,"author":"abnbvg","changes":........>
How can I fetch the "id":15869 out of it?

The string content looks like JSON, so either use the json module or use a regular expression to extract the specific string you need.
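For the regular-expression route, a minimal sketch could look like the following. It assumes the id is always a plain integer and that only the first "id" key matters; the helper name extract_first_id is made up for illustration.

```python
import re

def extract_first_id(text):
    """Return the first integer that follows an "id" key, or None if there is none."""
    match = re.search(r'"id"\s*:\s*(\d+)', text)
    return int(match.group(1)) if match else None

# Shortened version of the string from the question.
tmp = '"last seen":1568,"reviews":[{"id":15869,"author":"abnbvg"}]'
first_id = extract_first_id(tmp)
print(first_id)
```

A regex like this does not understand nesting, so it may pick up an "id" that belongs to some other object; parsing the JSON is the safer choice when the structure matters.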

The data looks like a JSON string. Use:
try:
    import json
except ImportError:
    import simplejson as json
tmp = '"last seen":1568,"reviews":[{"id":15869,"author":"abnbvg"}]'
data = json.loads('{{{}}}'.format(tmp))
>>> print data
{u'reviews': [{u'id': 15869, u'author': u'abnbvg'}], u'last seen': 1568}
>>> print data['reviews'][0]['id']
15869
Note that I wrapped the string in { and } to make a dictionary. You might not have to do that if the actual JSON string is already encapsulated with braces.
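If it is not known in advance whether the braces are present, one possible approach is to add them only when they are missing. This is a sketch in Python 3; load_fragment is a hypothetical helper name, and it assumes the input is either a complete JSON object or the inside of one.

```python
import json

def load_fragment(fragment):
    """Parse a JSON object, adding the outer braces only when they are missing."""
    text = fragment.strip()
    if not text.startswith("{"):
        text = "{" + text + "}"
    return json.loads(text)

bare = '"last seen":1568,"reviews":[{"id":15869,"author":"abnbvg"}]'
wrapped = "{" + bare + "}"
print(load_fragment(bare)["reviews"][0]["id"])
print(load_fragment(wrapped)["reviews"][0]["id"])
```

Input that is still not valid JSON after wrapping makes json.loads raise json.JSONDecodeError, which the caller can catch.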

If id is the only thing you need from the string and it will always look like {"id":15869,"author":"abnbvg"..., then you can go with a simple string split instead of JSON conversion.
tmp = '"last seen":1568,"reviews" : [{"id":15869,"author":"abnbvg","changes":........'
tmp1 = tmp.split('"id":', 1)[1]
id = tmp1.split(",", 1)[0]
Note that the tmp1 line raises IndexError when the string contains no "id" key. Indexing with -1 instead of 1 sidesteps the exception, but catching it lets you report that "id" was not found:
try:
    tmp1 = tmp.split('"id":', 1)[1]
    id = tmp1.split(",", 1)[0]
except IndexError:
    print("id key is not present in the json")
    id = None
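The same idea can be written without exception handling by using str.partition, which reports whether the separator was found. This is only a sketch: value_after_key is a hypothetical name, and it assumes a comma follows the value, as in the question's string.

```python
def value_after_key(text, key='"id":'):
    """Return the text between `key` and the next comma, or None if `key` is absent."""
    _, found, rest = text.partition(key)
    if not found:
        return None
    return rest.split(",", 1)[0].strip()

tmp = '"last seen":1568,"reviews" : [{"id":15869,"author":"abnbvg"}]'
print(value_after_key(tmp))
print(value_after_key('"last seen":1568'))
```

The result is a string, so wrap it in int() if a number is needed.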
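If you need more values from the JSON string, go with mhawke's solution of converting the JSON to a dictionary and reading the values from it. Alternatively, you can use ast.literal_eval, which parses Python literals rather than JSON, so it fails on input containing true, false or null: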
from ast import literal_eval
tmp = '"last seen":1568,"reviews" : [{"id":15869,"author":"abnbvg","changes":........'
tmp_dict = literal_eval("""{%s}"""%(tmp))
print tmp_dict["reviews"][0]["id"]
In the second case, if you need to collect all the "id" keys in the list, this will help:
id_list = []
for id_dict in tmp_dict["reviews"]:
    id_list.append(id_dict["id"])
print(id_list)
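The loop above can also be written as a list comprehension. In this sketch the parsed data is hypothetical: the second and third reviews are invented to show what happens when several reviews exist and one of them lacks an "id".

```python
# Hypothetical parsed data; only the first review comes from the question.
tmp_dict = {
    "last seen": 1568,
    "reviews": [
        {"id": 15869, "author": "abnbvg"},
        {"id": 15870, "author": "someone"},
        {"author": "no-id-here"},
    ],
}

# Collect every id, skipping reviews that do not carry one.
id_list = [review["id"] for review in tmp_dict["reviews"] if "id" in review]
print(id_list)
```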

Related

Assuming the structure of the json string does not change, is the order of a jsonpath match value result stable?

import json
import jsonpath_ng
response = json.loads(response)
jsonpath_expression_name = jsonpath_ng.parse("$[forms][0][questionGroups][*][questions]..[name]")
match_name = [match.value for match in jsonpath_expression_name.find(response)]
jsonpath_expression_id = jsonpath_ng.parse("$[forms][0][questionGroups][*][questions]..[id]")
matches_id = [match.value for match in jsonpath_expression_id.find(response)]
survey_q_dict = {k: v for (k, v) in zip(matches_id, match_name)}
Thank you!
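One way to avoid depending on the order of two separate queries is to read id and name from the same object in a single pass. The sketch below uses plain Python instead of jsonpath_ng; the response shape is invented for illustration, and iter_id_name_pairs is a hypothetical helper that pairs the two keys wherever a dict carries both, which may not match the jsonpath expressions exactly.

```python
def iter_id_name_pairs(node):
    """Yield (id, name) from every dict that carries both keys, depth-first."""
    if isinstance(node, dict):
        if "id" in node and "name" in node:
            yield node["id"], node["name"]
        for value in node.values():
            yield from iter_id_name_pairs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_id_name_pairs(item)

# Hypothetical response shape, invented for illustration only.
response = {
    "forms": [
        {"questionGroups": [
            {"questions": [
                {"id": "q1", "name": "age"},
                {"id": "q2", "name": "city",
                 "questions": [{"id": "q3", "name": "zip"}]},
            ]}
        ]}
    ]
}

survey_q_dict = dict(iter_id_name_pairs(response["forms"][0]["questionGroups"]))
print(survey_q_dict)
```

Because each pair comes from one object, a question with a missing name cannot shift the remaining pairs, which is the risk when zipping two independent result lists.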

how to print after the keyword from python?

I have the following string in Python:
b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
I want to print the text that follows the keyword "name", so that my output is
waqas
Note that waqas can change to any other value, so I want to print whatever name follows the keyword name, using string operations or a regex.
First you need to decode the value, since it is a bytes object (note the b prefix). Then use ast.literal_eval to build the dictionary, after which you can access the value by key:
>>> s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
>>> import ast
>>> ast.literal_eval(s.decode())['name']
'waqas'
It is likely you should be reading your data into your program in a different manner than you are doing now.
If I assume your data is inside a JSON file, try something like the following, using the built-in json module:
import json
with open(filename) as fp:
    data = json.load(fp)
print(data['name'])
If you want a more algorithmic way to extract the value of name:
s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a",\
"persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],\
"name":"waqas"}'
s = s.decode("utf-8")
key = '"name":"'
start = s.find(key) + len(key)
stop = s.find('"', start)  # closing quote of the value; searching from start also handles an empty name
extracted_string = s[start : stop]
print(extracted_string)
output
waqas
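Since the question also mentions regex, here is a minimal sketch of that variant. It works on the bytes directly with a bytes pattern and assumes the name contains no escaped double quotes.

```python
import re

s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'

# Capture everything between the quotes that follow "name":
match = re.search(rb'"name"\s*:\s*"([^"]*)"', s)
name = match.group(1).decode("utf-8") if match else None
print(name)
```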
You can convert the string into a dictionary with json.loads()
import json
mystring = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
mydict = json.loads(mystring)
print(mydict["name"])
# output: waqas
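A slightly more defensive version of the json.loads() approach might look like this. It is a sketch only: name_from_payload is a hypothetical helper, and returning None for bad input is just one possible choice.

```python
import json

def name_from_payload(payload):
    """Return the "name" field of a JSON object given as bytes or str, else None."""
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return None
    # Only a JSON object (a dict) can carry a "name" key.
    return data.get("name") if isinstance(data, dict) else None

mystring = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
print(name_from_payload(mystring))
```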
First you need to turn the bytes into a str. The b prefix is not part of the data, so slicing with x[1:] would drop the opening brace instead; decode the value. Suppose you have a variable x:
import json
x = x.decode("utf-8")
data = json.loads(x)  # convert JSON string into dictionary
print(data["name"])

How to remove numbers from json in python

I have some JSON-like data in this format:
json= 5843080158430803{"name":"NAME", "age":"56",}
So, how do I get {"name":"NAME", "age":"56",} using regex or split (which one is the best method for it) in Python?
Thanks in Advance...
Split at the first occurrence of { and take the second element of the resulting list.
We also have to add the { back, because the split function removes it:
json = '5843080158430803{"name":"NAME", "age":"56",}'
json = '{' + json.split('{', 1)[1]
print(json)
Result: {"name":"NAME", "age":"56",}
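A variant with str.find avoids having to re-add the brace. The sketch below also checks the result with json.loads, because the trailing comma in the sample means the remaining text is still not valid JSON. The variable names are illustrative; raw is used instead of json so the module name is not shadowed.

```python
import json

raw = '5843080158430803{"name":"NAME", "age":"56",}'

# Keep everything from the first opening brace onwards.
start = raw.find("{")
payload = raw[start:] if start != -1 else None
print(payload)

# The trailing comma before } is not allowed in JSON, so parsing fails.
try:
    json.loads(payload)
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False
print(is_valid_json)
```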
Perhaps you could split at the first { and then remove the part before it.
I am assuming the json you have above is actually a string. Then you could do:
json_prefix = json.split("{", 1)[0]
json = json.replace(json_prefix, "", 1)
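For the regex option the question asks about, a minimal sketch is to strip the leading digits with re.sub. It assumes the unwanted prefix consists only of digits at the very start of the string.

```python
import re

raw = '5843080158430803{"name":"NAME", "age":"56",}'

# ^\d+ matches only the run of digits at the start of the string.
cleaned = re.sub(r'^\d+', '', raw)
print(cleaned)
```

Digits inside the object, such as "56", are left alone because the pattern is anchored to the start.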

Get list from string with exec in python

I have:
"[15765,22832,15289,15016,15017]"
I want:
[15765,22832,15289,15016,15017]
What should I do to convert this string to list?
P.S. Post was edited without my permission and it lost important part. The type of line that looks like list is 'bytes'. This is not string.
P.S. №2. My initial code was:
import urllib.request, re
f = urllib.request.urlopen("http://www.finam.ru/cache/icharts/icharts.js")
lines = f.readlines()
for line in lines:
    m = re.match(r'var\s+(\w+)\s*=\s*\[\s*(.+)\s*\]\;', line.decode('windows-1251'))
    if m is not None:
        varname = m.group(1)
        if varname == "aEmitentIds":
            aEmitentIds = line #its type is 'bytes', not 'string'
I need to get a list from the line.
The line from the web page looks like:
[15765, 22832, 15289, 15016, 15017]
Assuming s is your string, you can just use split and then cast each number to integer:
s = [int(number) for number in s[1:-1].split(',')]
For detailed information about split function:
Python3 split documentation
What you have is a stringified list. You could use a json parser to parse that information into the corresponding list
import json
test_str = "[15765,22832,15289,15016,15017]"
l = json.loads(test_str) # List that you need.
Or another way to do this would be to use ast
import ast
test_str = "[15765,22832,15289,15016,15017]"
data = ast.literal_eval(test_str)
The result is
[15765, 22832, 15289, 15016, 15017]
To understand why using eval() is bad practice you could refer to this answer
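Because the question says the value is bytes rather than str, it can be decoded first. A minimal sketch, assuming the bytes contain only ASCII digits, commas, spaces and brackets:

```python
import ast
import json

raw = b"[15765, 22832, 15289, 15016, 15017]"

# Decode the bytes to str, then parse with either approach.
text = raw.decode("ascii")
from_json = json.loads(text)
from_ast = ast.literal_eval(text)
print(from_json)
print(from_ast)
```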
You can also use regex to pull out numeric values from the string as follows:
import re
lst = "[15765,22832,15289,15016,15017]"
lst = [int(number) for number in re.findall(r'\d+', lst)]
Output of the above code is,
[15765, 22832, 15289, 15016, 15017]
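Tying this back to the loop in the question: the regex there already captures the list contents in its second group, so the numbers can be taken from the match itself. The sample line below is an assumption about what the script contains, inferred from the question's pattern.

```python
import re

# Assumed shape of one line of the script, inferred from the question's regex.
line = b"var aEmitentIds = [15765, 22832, 15289, 15016, 15017];"

m = re.match(r'var\s+(\w+)\s*=\s*\[\s*(.+)\s*\]\;', line.decode('windows-1251'))
ids = []
if m is not None and m.group(1) == "aEmitentIds":
    # group(2) holds the text between the brackets; int() ignores the spaces.
    ids = [int(number) for number in m.group(2).split(",")]
print(ids)
```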

How do i extract text from double quotes and add it to string ? python 3.x

import re
response_text = '{"captchaData":"/9j/4AAQSkZJRgABAQAAAQABAAD//gATYWJmNjUxYjM1ZjA3ZWRiMgD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCABGAMgDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iiigAooooAKKKKACikzRmgBaKTNGaAFopM0ZoAWikzRmgBaKTNGaAFopM0ZoAWikzRmgBaKTNGaAFopM0tABRRRQAUUUUAFFFFABRRRQAhooNJQAtFQPe2sb7HuYVf+60gBqZWDAFSCD0IouK6YUUtFAwopKWgBKKKWgBKWkrzTxzqmvafrkdlb6iyW90AYljUKVycYJ61lVqqnHmaMMRXVGHO1c9MpKbGgjiSMEkKoGT1NOrU3FooooASnU2nDpQAUUUUAFFFFABRRRQAUUUUAIetct4/1K70zw00lmzI8kqxtIvVVIP8Ahj8a6k9agu7W3vrV7e6iWWFxhkYZBqKkXKLSdjOtBzpuMXZs5Hw3omna14GgSe3RpZ1fdMwy4fcRuz1rhNJ8S6v4Uv57Xd5qxs0bwSklQwOMj0rv7TX4ptQHh/w1BEqQA752HyIAecDvya4jx/pEum63HPJMZjdx72kKhcuODwPbH5151bSCnT3jo2ePiVy041KT1jo2jc0Lx7qt/r4trmBGDoyRwRrg+Z1GSfoa0YvF+qaX4oGma/FAkMo3I8X8IOcc9xkYrh9JuDD4y0m6Bx50sLMfUthXP57q6rx+BF4v0K4A5yv6SA/1ohVn7Ny5tUwpV6vsnPmd4v8ABmrc+O5NP1uG11DSpLa0mwUmdvm2k43Ef5xXZM6Km9mAXrknArzr4rxjydLlxyGkXP12/wCFc5qevXPiPUdO04TOtoBDEVB+8xA3E+vOa2eJdKcoy12sdEsZKhUnCfvbW+Z7DHqFlK+yO8t3b+6sqk/zqxWa/h/S5LEWZsofKC7RhQCPcHrn3rjfDmv3GjeJbvw/qVw0ttGziGWQ5KYGRk+hFdEqrg0p9TsnXdOUVUW/XzPQJp4bdN88scSf3nYKP1rzbxXIl98SNFijdXjHkAlTkH94Sf0rX8LSnxTql/rV6m+CJ/JtInGVQdSceuMc/Wub8i1tPiuwUrHa28hlP91AI95+nOa569TnhF9GzjxVV1acWtnJHrVLXJaNqmpeK5ri6gnax0yJ/Li2IDJIe5JOQB+FV9X8Rah4U1e2ivpRe6fcDiQqFkTB56cH8q6PbxUebp3Ot4q
Cjzv4e52lLTUdZI1dDlWAII7inVsdIUo6U2nDpQAUUUUAFFFFABRRRQAUUUUAIetNkXfGy5xkEZpxpKAPFPC+pnwp4pk/tBHRNrQzALkryOcd+RW58Qbz+2dKs721tZxawyEefImwNuHYHnHHWvSXs7WWUSyW0LyDo7Rgn86W4tYbu2e3njV4nGCrAEflXGsLJU3T5tDzY4GapSo82j8jwjTiJNW0JEOWEkakDsfOY/1Fdv4/Xz/Fnh+3H3mdR+cgH9Kuj4Z2UF3HdWmo3MUsbiRCyKwBByOMCpdV8I6rfeI4NZXUreR7d1MUUkRUBVOQMgmsI0Kkabi1u0c0MLWhSlBx3a7bIzPixIBDpUXctK35bf8AGuH8NLv8T6YP+nmPP/fQruvGHhfxH4g1FZ1jtDDCpSJElOcZzk5A5rAXSb7QPFun32o2n2eB5w5KsGROeeR+dZ14Sdbna0ujLFU5yxPtGmldHsleEeLbnzfF+pyxN0lKZHsNp/ka9b13xHa6VpzPDKk93IMW8MZ3","captchaMime":"image/jpeg","captchaToken":"ALXfmJpxoaxq6LYBXm-kJzIl0Yd5mHG1XbttsBX-EKxMYtYNIc6uTv89fmRxeWZGEgpi2L9sjXYlkm6Vplav_wy2KjdB5J4j3i5fB6CEuPOMXIjEql6mPBJ8-YJTCpOzzk8kOcW5nuBbuLOdMVyVxquLbWjqLZzHeN0iT4Jm4SIZ9mQNfapNGkE","status":"CAPTCHA"}'
love = '"captchaData":"mydata"'
session_token_temp = re.search(r'(\"captchaData\":\")(\w*)',
response_text).group()
session_token = str(session_token_temp)
I want to extract the values of captchaData and captchaToken and store them in strings like this:
extracted_data = (value_of_captchaData)
extracted_data2 = (value_of_captchaToken)
It sounds like what you are actually trying to do is to parse JSON. JSON is a format that is often used to represent data on the web.
If you are using requests (which it sounds like from the tag), you can use .json() to parse the result. Otherwise use the built-in json module.
import requests
r = requests.get("https://httpbin.org/get")
data = r.json()
or
import json
data = json.loads('{"key": "value"}')
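Applied to the response in the question, this could look roughly like the sketch below. The captchaData and captchaToken values are short placeholders, not the real data, and the variable names follow the question.

```python
import json

# Placeholder values stand in for the long strings from the question.
response_text = '{"captchaData":"/9j/4AAQ","captchaMime":"image/jpeg","captchaToken":"ALXfmJpx","status":"CAPTCHA"}'

data = json.loads(response_text)
extracted_data = data["captchaData"]
extracted_data2 = data["captchaToken"]
print(extracted_data)
print(extracted_data2)
```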
If all you want is to remove a given character before and after a string, you can use strip
'"some string"'.strip('"') # 'some string'
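You can use ast.literal_eval instead of regex (this works here because the response contains only strings; JSON true, false or null would make it fail):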
import ast
response_text = '{"captchaData":"/9j/4AAQSkZJRgABAQAAAQABAAD//gATYWJmNjUxYjM1ZjA3ZWRiMgD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCABGAMgDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iiigAooooAKKKKACikzRmgBaKTNGaAFopM0ZoAWikzRmgBaKTNGaAFopM0ZoAWikzRmgBaKTNGaAFopM0tABRRRQAUUUUAFFFFABRRRQAhooNJQAtFQPe2sb7HuYVf+60gBqZWDAFSCD0IouK6YUUtFAwopKWgBKKKWgBKWkrzTxzqmvafrkdlb6iyW90AYljUKVycYJ61lVqqnHmaMMRXVGHO1c9MpKbGgjiSMEkKoGT1NOrU3FooooASnU2nDpQAUUUUAFFFFABRRRQAUUUUAIetct4/1K70zw00lmzI8kqxtIvVVIP8Ahj8a6k9agu7W3vrV7e6iWWFxhkYZBqKkXKLSdjOtBzpuMXZs5Hw3omna14GgSe3RpZ1fdMwy4fcRuz1rhNJ8S6v4Uv57Xd5qxs0bwSklQwOMj0rv7TX4ptQHh/w1BEqQA752HyIAecDvya4jx/pEum63HPJMZjdx72kKhcuODwPbH5151bSCnT3jo2ePiVy041KT1jo2jc0Lx7qt/r4trmBGDoyRwRrg+Z1GSfoa0YvF+qaX4oGma/FAkMo3I8X8IOcc9xkYrh9JuDD4y0m6Bx50sLMfUthXP57q6rx+BF4v0K4A5yv6SA/1ohVn7Ny5tUwpV6vsnPmd4v8ABmrc+O5NP1uG11DSpLa0mwUmdvm2k43Ef5xXZM6Km9mAXrknArzr4rxjydLlxyGkXP12/wCFc5qevXPiPUdO04TOtoBDEVB+8xA3E+vOa2eJdKcoy12sdEsZKhUnCfvbW+Z7DHqFlK+yO8t3b+6sqk/zqxWa/h/S5LEWZsofKC7RhQCPcHrn3rjfDmv3GjeJbvw/qVw0ttGziGWQ5KYGRk+hFdEqrg0p9TsnXdOUVUW/XzPQJp4bdN88scSf3nYKP1rzbxXIl98SNFijdXjHkAlTkH94Sf0rX8LSnxTql/rV6m+CJ/JtInGVQdSceuMc/Wub8i1tPiuwUrHa28hlP91AI95+nOa569TnhF9GzjxVV1acWtnJHrVLXJaNqmpeK5ri6gnax0yJ/Li2IDJIe5JOQB+FV9X8Rah4U1e2ivpRe6fcDiQqFkTB56cH8q6PbxUebp3Ot4q
Cjzv4e52lLTUdZI1dDlWAII7inVsdIUo6U2nDpQAUUUUAFFFFABRRRQAUUUUAIetNkXfGy5xkEZpxpKAPFPC+pnwp4pk/tBHRNrQzALkryOcd+RW58Qbz+2dKs721tZxawyEefImwNuHYHnHHWvSXs7WWUSyW0LyDo7Rgn86W4tYbu2e3njV4nGCrAEflXGsLJU3T5tDzY4GapSo82j8jwjTiJNW0JEOWEkakDsfOY/1Fdv4/Xz/Fnh+3H3mdR+cgH9Kuj4Z2UF3HdWmo3MUsbiRCyKwBByOMCpdV8I6rfeI4NZXUreR7d1MUUkRUBVOQMgmsI0Kkabi1u0c0MLWhSlBx3a7bIzPixIBDpUXctK35bf8AGuH8NLv8T6YP+nmPP/fQruvGHhfxH4g1FZ1jtDDCpSJElOcZzk5A5rAXSb7QPFun32o2n2eB5w5KsGROeeR+dZ14Sdbna0ujLFU5yxPtGmldHsleEeLbnzfF+pyxN0lKZHsNp/ka9b13xHa6VpzPDKk93IMW8MZ3","captchaMime":"image/jpeg","captchaToken":"ALXfmJpxoaxq6LYBXm-kJzIl0Yd5mHG1XbttsBX-EKxMYtYNIc6uTv89fmRxeWZGEgpi2L9sjXYlkm6Vplav_wy2KjdB5J4j3i5fB6CEuPOMXIjEql6mPBJ8-YJTCpOzzk8kOcW5nuBbuLOdMVyVxquLbWjqLZzHeN0iT4Jm4SIZ9mQNfapNGkE","status":"CAPTCHA"}'
new_data = ast.literal_eval(response_text)
print(new_data["captchaData"])
print(new_data['captchaToken'])
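If a regular expression is still preferred, the pattern in the question stops too early because \w does not match characters such as / and + that occur in the data, and .group() returns the whole match rather than the value. A possible fix is to capture everything up to the next double quote. The response below uses short placeholder values, and quoted_value is a hypothetical helper.

```python
import re

# Placeholder values stand in for the long strings from the question.
response_text = '{"captchaData":"/9j/4AAQ+abc==","captchaMime":"image/jpeg","captchaToken":"ALXf-mJ_px","status":"CAPTCHA"}'

def quoted_value(key, text):
    """Return the string value stored under `key`, or None if it is absent."""
    match = re.search(r'"' + re.escape(key) + r'":"([^"]*)"', text)
    return match.group(1) if match else None

extracted_data = quoted_value("captchaData", response_text)
extracted_data2 = quoted_value("captchaToken", response_text)
print(extracted_data)
print(extracted_data2)
```

This assumes there is no whitespace around the colon and no escaped quotes inside the values, which holds for the sample in the question.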
