I have a string that contains the data returned by
https://myanimelist.net/animelist/domis1/load.json?status=2&offset=0.
I want to find every 'anime_id' and put the values (numbers only) into a list.
I tried find('anime_id'), but I can't make it work for multiple occurrences in the string.
Here is an example of how to extract anime_id from a JSON file called test.json, using the built-in json module:
import json
with open('test.json') as f:
    data = json.load(f)
# Create generator and search for anime_id
gen = (i['anime_id'] for i in data)
# If needed, iterate over generator and create a list
gen_list = list(gen)
# Print list on console
print(gen_list)
Your string is in JSON format, so you can parse it with the built-in json module.
import json
data = json.loads(your_string)
for d in data:
    print(d["anime_id"])
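To end up with a list instead of printed values, the loop above can be folded into a list comprehension. This is a minimal sketch, assuming the string is a JSON array of objects; the sample payload assigned to your_string is made up for illustration and is not the real response.

```python
import json

# Hypothetical sample; the real string comes from the load.json response.
your_string = '[{"anime_id": 1, "status": 2}, {"anime_id": 20, "status": 2}, {"status": 2}]'

data = json.loads(your_string)

# Keep only the entries that actually carry an anime_id.
anime_ids = [entry["anime_id"] for entry in data if "anime_id" in entry]
print(anime_ids)
```

The membership check is optional; it only guards against entries that lack the key.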
I am using PySpark to extract a string from log files. The string looks like JSON but has no quotes. Here is an example:
{PlatformVersion=123,PlatformClient=html,namespace=NAT}
I want to convert it to either CSV or JSON so that I can store it in a relational DB using data pipelines. Is there a way to convert such a string to CSV or JSON?
The code below does the job.
Steps:
remove curly braces
split by ,
split by =
populate the dict with key & value
result = {}
log_line = '{PlatformVersion=123,PlatformClient=html,namespace=NAT}'
log_line = log_line[1:-1]
parts = log_line.split(',')
for part in parts:
    k, v = part.split('=')
    result[k] = v
print(result)
output
{'PlatformVersion': '123', 'PlatformClient': 'html', 'namespace': 'NAT'}
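The question asks for CSV or JSON, so here is a sketch that wraps the same steps in a function and then serialises the result both ways with the standard library. The function name parse_log_line is hypothetical, and the sketch assumes that neither keys nor values contain a comma.

```python
import csv
import io
import json

def parse_log_line(line):
    """Turn '{k1=v1,k2=v2}' into a dict of strings."""
    body = line.strip()[1:-1]  # drop the surrounding curly braces
    # Split on the first '=' only, so a value may itself contain '='.
    return dict(part.split('=', 1) for part in body.split(','))

record = parse_log_line('{PlatformVersion=123,PlatformClient=html,namespace=NAT}')

# JSON: one object per log line
json_text = json.dumps(record)

# CSV: header row plus one data row, written to an in-memory buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
csv_text = buffer.getvalue()

print(json_text)
print(csv_text)
```

All values stay strings here; converting '123' to a number would be a separate decision that depends on the target schema.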
I want to read a dictionary from a text file. The dictionary looks like {'key': [1, ord('#')]}. I read about eval() and literal_eval(), but neither of them works because of ord().
I also tried json.loads and json.dumps, without success.
What other way could I use to do it?
Assuming you read the text file in as a string with open() rather than parsing it with json, you could use a simple regex to find what is between the parentheses of ord, e.g. ord('#') -> '#'.
This is a minimal solution that reads the whole file as a single string, finds all instances of ord, and places their integer values in an output list called ord_. For testing this example, myfile.txt was a text file with the following content:
{"key": [1, "ord('#')"],
"key2": [1, "ord('K')"]}
import json
import re
with open(r"myfile.txt") as f:
    json_ = "".join([line.rstrip("\n") for line in f])
rgx = re.compile(r"ord\(([^\)]+)\)")
rgd = rgx.findall(json_)
ord_ = [ord(str_.replace(r"'", "")) for str_ in rgd]
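If the goal is the dictionary itself rather than only the list of code points, one option is to replace each ord(...) call in the text with its integer value and then pass the result to ast.literal_eval. This is a sketch, assuming the text looks like the dictionary in the question and that every ord() argument is a single quoted character (escape sequences such as '\n' are not handled). The variable text stands in for the file contents.

```python
import ast
import re

# Stand-in for the file contents from the question.
text = "{'key': [1, ord('#')]}"

# Match ord('<one character>') with either quote style.
ORD_CALL = re.compile(r"""ord\((['"])(.)\1\)""")

# Replace every ord(...) call with the integer it would evaluate to.
plain_text = ORD_CALL.sub(lambda m: str(ord(m.group(2))), text)

# The text is now a plain literal, so literal_eval can parse it safely.
data = ast.literal_eval(plain_text)
print(data)
```

No code from the file is executed this way; only the one recognised pattern is rewritten before parsing.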
json.dump() and json.load() will not work because ord() is not JSON serializable (meaning that a function call cannot be represented as a JSON value).
Yes, eval is really bad practice, I would never recommend it to anyone for any use.
The best way I can think of to solve this is to use conditions and an extra list.
import json
# data.json contains {"key": [1, ["ord", "#"]]}  (function name first, then its argument)
with open("data.json") as f:
    data = json.load(f)
# data['key'][1][0] is "ord"
if data['key'][1][0] == "ord":
    res = ord(data['key'][1][1])
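The same idea can be generalised so that the check does not have to be written once per key. The sketch below walks the parsed structure and replaces every ["name", arg] pair whose name appears in a whitelist. The names ALLOWED and resolve are hypothetical, and the convention has a known weakness: a genuine two-element list that happens to start with "ord" would be misread as a call.

```python
import json

# Only functions listed here may be named in the data.
ALLOWED = {"ord": ord, "len": len}

def resolve(value):
    """Replace ["name", arg] pairs whose name is allowed with the call result."""
    if isinstance(value, list):
        if len(value) == 2 and isinstance(value[0], str) and value[0] in ALLOWED:
            return ALLOWED[value[0]](value[1])
        return [resolve(item) for item in value]
    if isinstance(value, dict):
        return {key: resolve(item) for key, item in value.items()}
    return value

data = json.loads('{"key": [1, ["ord", "#"]]}')
resolved = resolve(data)
print(resolved)
```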
I have the following string in Python:
b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
I want to print the value that follows the key "name", so my output should be
waqas
Note that waqas can change to any other value, so I want to print whatever name follows the key "name", using string operations or a regex.
First you need to decode the value, since it is a bytes object (note the b prefix). Then use ast.literal_eval to build the dictionary, after which you can access the value by key:
>>> s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
>>> import ast
>>> ast.literal_eval(s.decode())['name']
'waqas'
It is likely you should be reading your data into your program in a different manner than you are doing now.
If I assume your data is inside a JSON file, try something like the following, using the built-in json module:
import json
with open(filename) as fp:
    data = json.load(fp)
print(data['name'])
If you want a more algorithmic way to extract the value of name:
s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a",\
"persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],\
"name":"waqas"}'
s = s.decode("utf-8")
key = '"name":"'
start = s.find(key) + len(key)
stop = s.find('"', start)  # closing quote; also correct when the name is empty
extracted_string = s[start : stop]
print(extracted_string)
output
waqas
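Since the question also mentions regex, here is a sketch of that variant. It assumes the name contains no escaped double quotes; for anything less regular, parsing the data as JSON is the more robust route.

```python
import re

s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'

# Capture everything between the quotes that follow "name":
match = re.search(r'"name"\s*:\s*"([^"]*)"', s.decode("utf-8"))
name = match.group(1) if match else None
print(name)
```

The `if match else None` guard keeps the code from raising when the key is absent.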
You can convert the string into a dictionary with json.loads()
import json
mystring = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
mydict = json.loads(mystring)
print(mydict["name"])
# output: waqas
First you need to convert the bytes object into a regular string by decoding it. The b prefix is part of how the literal is written, not part of the data, so slicing cannot remove it. Suppose the value is in a variable x:
import json
x = x.decode("utf-8")
parsed = json.loads(x)  # convert the JSON string into a dictionary
print(parsed["name"])
I have a string in the form of
[[sourceId:111, clientId:12345, clientName:testclient, module:test,source:Request, userName:Michelle Jackson],[sourceId:112, clientId:1233, clientName:testclient2, module:test, source:Request, userName:Michelle Jackson]]
How do I convert it into a valid Python list of JSON objects?
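Although I never recommend doing this, here's the code:
import re
# s holds the input string shown in the question
arr = []
for x in s.split('],['):
    kv = re.sub(r'\[|\]', '', x)
    # strip stray spaces around each pair and split on the first ':' only
    arr.append(dict(kvi.strip().split(':', 1) for kvi in kv.split(',')))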
NOTE: If the string is system generated, it's better to get it in JSON format in the first place.
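When the format cannot be changed at the source, a slightly more defensive sketch is to pull out each inner bracket group with a regex and then serialise the result with json.dumps. The function name parse_records is hypothetical, and the sketch assumes that keys contain no colons or commas and that values contain no commas or brackets.

```python
import json
import re

s = ('[[sourceId:111, clientId:12345, clientName:testclient, module:test,'
     'source:Request, userName:Michelle Jackson],'
     '[sourceId:112, clientId:1233, clientName:testclient2, module:test, '
     'source:Request, userName:Michelle Jackson]]')

def parse_records(text):
    records = []
    # Each inner [...] group is one record; the outer brackets are skipped
    # because the pattern forbids brackets inside a match.
    for group in re.findall(r'\[([^\[\]]+)\]', text):
        record = {}
        for pair in group.split(','):
            key, value = pair.split(':', 1)
            record[key.strip()] = value.strip()
        records.append(record)
    return records

records = parse_records(s)
print(json.dumps(records, indent=2))
```

All values remain strings; numeric fields such as sourceId would need an explicit int() conversion if the target needs numbers.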
Basically, I want to read a string from a text file and store it as an OrderedDict.
My file contains the following content.
content.txt:
variable_one=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']]))])
variable_two=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']]))])
How can I retrieve the values in Python as:
xxx
xxx_a -> xxx_b
xxx_c -> xxx_d
import re
from ast import literal_eval
from collections import OrderedDict
# This string is slightly different from your sample which had an extra bracket
line = "variable_one=OrderedDict([('xxx', [['xxx_a', 'xxx_b'],['xx_c', 'xx_d']]),('yyy', [['yyy_a', 'yyy_b'],['yy_c', 'yy_d']])])"
match = re.match(r'(\w+)\s*=\s*OrderedDict\((.+)\)\s*$', line)
variable, data = match.groups()
# This allows safe evaluation: data can only be a basic data structure
data = literal_eval(data)
data = [(key, OrderedDict(val)) for key, val in data]
data = OrderedDict(data)
Verification that it works:
print(variable)
import json
print(json.dumps(data, indent=4))
Output:
variable_one
{
    "xxx": {
        "xxx_a": "xxx_b",
        "xx_c": "xx_d"
    },
    "yyy": {
        "yyy_a": "yyy_b",
        "yy_c": "yy_d"
    }
}
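To print the data in the arrow layout the question asks for, the nested OrderedDict can be walked directly. In this sketch the structure is rebuilt by hand so the block runs on its own; it is assumed to match what the parsing code above produces for variable_one.

```python
from collections import OrderedDict

# Same structure the parsing code builds for variable_one.
data = OrderedDict([
    ('xxx', OrderedDict([('xxx_a', 'xxx_b'), ('xx_c', 'xx_d')])),
    ('yyy', OrderedDict([('yyy_a', 'yyy_b'), ('yy_c', 'yy_d')])),
])

lines = []
for name, pairs in data.items():
    lines.append(name)
    for left, right in pairs.items():
        lines.append('{} -> {}'.format(left, right))

print('\n'.join(lines))
```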
Having said all that, your request is very odd. If you can control the source of the data, use a real serialisation format that supports order (so not JSON). Don't output Python code.