Dictionary object has no attribute split - python

My data file looks like this:
{'data': 'xyz', 'code': '<:c:605445> **[Code](https://traindata/35.6547,56475', 'time': '2021-12-30T09:56:53.547', 'value': 'True', 'stats': '96/23', 'dupe_id': 'S<:c-74.18'}
I'm trying to print this line:
35.6547,56475
Here is my code:
data = "above mentioned data"
for s in data.values():
    print(s)
while data != "stop":
    if data == "quit":
        os.system("disconnect")
    else:
        x, y = s.split(',', 1)
The output is:
{'data': 'xyz', 'code': '<:c:605445> **[Code](https://traindata/35.6547,56475', 'time': '2021-12-30T09:56:53.547', 'value': 'True', 'stats': '95/23', 'dupe_id': 'S<:c-74.18'}
x, y = s.split(',', 1)
AttributeError: 'dict' object has no attribute 'split'
I've tried converting it into tuple, list but I'm getting the same error. The input in x,y should be the above mentioned expected output (35.6547,56475).
Any help will be highly appreciated.

You can do it like this:
x,y = d['code'].split('/')[-1].split(',')
That means you need to access the dictionary by one of its keys, here 'code'. That retrieves the string '<:c:605445> **[Code](https://traindata/35.6547,56475', which you can either parse with a regex or split at '/' and take the last element with [-1]. Then split the remaining numbers at the comma and assign them to x and y respectively.
Of course, you might want to validate your incoming data by catching the KeyError you mentioned in the comments:
try:
    x, y = d['code'].split('/')[-1].split(',')
except KeyError:
    print(f'Data invalid. Key "code" not found. Got: {d} instead')
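The same idea can be wrapped in a small function. This is only a sketch: the name extract_coords is made up for illustration, it assumes the coordinates always follow the last '/' in the 'code' value, and returning None for bad input is just one possible policy.

```python
def extract_coords(record):
    """Return (x, y) as strings from record['code'], or None if unavailable."""
    try:
        # Everything after the last '/' should look like '35.6547,56475'.
        tail = record['code'].split('/')[-1]
        x, y = tail.split(',', 1)
    except (KeyError, ValueError):
        # KeyError: there is no 'code' key.
        # ValueError: there is no comma, so unpacking into x, y fails.
        return None
    return x, y


d = {'data': 'xyz',
     'code': '<:c:605445> **[Code](https://traindata/35.6547,56475'}
coords = extract_coords(d)                # both parts are still strings
missing = extract_coords({'data': 'xyz'})  # no 'code' key at all
print(coords)
print(missing)
```

Convert the parts with float() afterwards if you need numbers rather than strings.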

Another option is a simple regex on the 'code' element. Anchored at the end of the string, it matches digits, a literal '.', more digits, a ',', and then digits.
import re
d = {'data': 'xyz', 'code': ':c: **[Code](https://traindata/35.6547,56475', 'time': '2021-12-30T09:56:53.547', 'value': 'True', 'stats': '96/23', 'dupe_id': 'S<:c-74.18'}
print(re.findall(r'\d+\.\d+,\d+$', d['code'])[0])
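If you want x and y separately instead of one string, capture groups do the split for you. A rough sketch, assuming the value always ends in "digits.digits,digits"; COORD_RE and find_coords are names invented for this example.

```python
import re

# Assumed shape: digits, a literal dot, digits, a comma, digits, end of string.
COORD_RE = re.compile(r'(\d+\.\d+),(\d+)$')


def find_coords(text):
    """Return the two captured parts as strings, or None if nothing matches."""
    match = COORD_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


code = ':c: **[Code](https://traindata/35.6547,56475'
result = find_coords(code)
print(result)
```

Using re.search and checking for None avoids the IndexError that findall(...)[0] raises when the string has no match.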

You can only split text (a str), not a dictionary.
First get the text that you want to split:
d['code']

Related

JSONDecodeError when trying to extract a json

I am getting this error every time I try to extract the JSON from this API:
Request_URL='https://freeserv.dukascopy.com/2.0/api/group=quotes&method=realtimeSentimentIndex&enabled=true&key=bsq3l3p5lc8w4s0c&type=swfx&jsonp=_callbacks____1kvynkpid'
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
import json
import requests
import pandas as pd
r = requests.get(Request_URL)
df = pd.DataFrame(r.json())
The problem is that the response coming back is in JSONP format. That is, it is JavaScript consisting of a call to a function whose argument is a JavaScript structure. Taken as text, that argument is usually valid JSON, but there is no guarantee that it is. In part it looks like:
_callbacks____1kvynkpid([{"id":"10012" ...])
So we first need to remove the JavaScript call, that is, the leading characters up to and including the first ( character, and the final ):
import requests
import json
request_url = 'https://freeserv.dukascopy.com/2.0/api?group=quotes&method=realtimeSentimentIndex&enabled=true&key=bsq3l3p5lc8w4s0c&type=swfx&jsonp=_callbacks____1kvynkpid'
r = requests.get(request_url)
text = r.text
idx = text.index('(')
# skip everything up to and including opening '(' and then skip closing ')'
text = text[idx+1:-1]
print(json.loads(text))
Prints:
[{'id': '10012', 'title': 'ESP.IDX/EUR', 'date': '1636925400000', 'long': '71.43', 'short': '28.57'}, {'id': '10104', 'title': 'AUS.IDX/AUD', 'date': '1636925400000', 'long': '70.59', 'short': '29.41'}, {'id': '10266', 'title': 'NLD.IDX/EUR', 'date': '1636925400000', 'long': '73.48', 'short': '26.52'},
... data too big to fully reproduce
{'id': '82862', 'title': 'MAT/USD', 'date': '1636925400000', 'long': '70.27', 'short': '29.73'}, {'id': '82866', 'title': 'ENJ/USD', 'date': '1636925400000', 'long': '72.16', 'short': '27.84'}]
In this case the structure when interpreted as a string adhered to the JSON format and so we were able to parse it with json.loads(). But what if the JavaScript structure had been (in part):
[{'id':'10012'}]
This is both legal JavaScript and legal Python, but not legal JSON because strings must be enclosed within double-quotes for it to be valid JSON. But since it is legal Python, we could use ast.literal_eval:
import requests
import ast
request_url = 'https://freeserv.dukascopy.com/2.0/api?group=quotes&method=realtimeSentimentIndex&enabled=true&key=bsq3l3p5lc8w4s0c&type=swfx&jsonp=_callbacks____1kvynkpid'
r = requests.get(request_url)
text = r.text
idx = text.index('(')
# skip everything up to and including opening '(' and then skip closing ')'
text = text[idx+1:-1]
print(ast.literal_eval(text))
Of course, for the current situation both json.loads and ast.literal_eval happened to work. However, if the JavaScript structure had been:
[{id:'10012'}]
This is valid JavaScript but, alas, not valid Python and cannot be parsed with either json.loads or ast.literal_eval.
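The two approaches above can be combined into one helper. This is a sketch, not a general JSONP parser: parse_jsonp is a hypothetical name, and it assumes the payload sits between the first '(' and the last ')'. The sample strings are made up so that the example runs without a network call.

```python
import ast
import json


def parse_jsonp(text):
    """Strip a JSONP wrapper such as callback(...) and parse the payload."""
    start = text.index('(') + 1   # first character after the opening '('
    end = text.rindex(')')        # position of the final ')'
    payload = text[start:end]
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        # Not strict JSON: try Python literal syntax (e.g. single quotes).
        return ast.literal_eval(payload)


strict = parse_jsonp('_cb([{"id": "10012", "long": "71.43"}]);')
loose = parse_jsonp("_cb([{'id': '10012'}])")
print(strict)
print(loose)
```

Using rindex(')') instead of slicing off the last character also copes with a trailing semicolon. A payload with unquoted keys, such as [{id:'10012'}], still fails in both parsers and raises an exception.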

How to extract only a specific value from a python sublist that I got as an API response from Monkeylearn

I have been training a text classification model in Monkeylearn and as a response to my API query, I get a python list as a result. I want to extract only the specific text classification value from it. Attaching the code below.
ml = MonkeyLearn('42b2344587')
data = reddittext[2] # dataset in a python list
model_id = 'cl7C'
result = ml.classifiers.classify(model_id, data)
print(result.body) #response from API in list format
Output I get is :
[{'text': 'comment\n', 'external_id': None, 'error': False, 'classifications': []},
{'text': 'So this is the worst series of Kohli like in years.\n', 'external_id': None, 'error': False, 'classifications': []},
{'text': 'Saini ODI average at 53 😂\n', 'external_id': None, 'error': False, 'classifications': [{'tag_name': 'Batting', 'tag_id': 122983950, 'confidence': 0.64}]}]
I want to print only the classification tag_name, i.e. "Batting", from this list.
type(result.body)
the output I get is: List
result.body is a list of dicts, which mirrors the JSON array of objects returned by the API.
You can get the desired information by iterating through the list with a for loop and performing dictionary lookups with d["key"] if you know the key exists, or d.get("key") if you don't know whether the key exists in the dictionary. The get method returns None if the key (here tag_name) doesn't exist.
for entry in result.body:
    for classification in entry['classifications']:
        tag_name = classification.get('tag_name')
        if tag_name is not None:
            print(tag_name)
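If you would rather collect the tag names than print them, the same loop fits in a list comprehension. A small sketch with a made-up sample in the same shape as the response above; tag_names is a hypothetical helper, not part of the MonkeyLearn client.

```python
def tag_names(results):
    """Collect every tag_name found in a list of classification results."""
    return [
        classification['tag_name']
        for entry in results
        # .get with a default tolerates entries lacking 'classifications'.
        for classification in entry.get('classifications', [])
        if 'tag_name' in classification
    ]


sample = [
    {'text': 'comment\n', 'error': False, 'classifications': []},
    {'text': 'Saini ODI average at 53\n', 'error': False,
     'classifications': [{'tag_name': 'Batting', 'confidence': 0.64}]},
]
names = tag_names(sample)
print(names)
```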
Since I don't know whether the response format is fixed, I'm assuming it isn't.
Use the json module to encode the response as a string, then use a regex to find the value.
This way, you can match multiple occurrences. Since the response most likely came from JSON in the first place, the json module won't complain about encoding it.
import json
import re
testcase = [{'text': 'comment\n', 'external_id': None, 'error': False, 'classifications': []},
{'text': 'So this is the worst series of Kohli like in years.\n', 'external_id': None, 'error': False, 'classifications': []},
{'text': 'Saini ODI average at 53 😂\n', 'external_id': None, 'error': False, 'classifications': [{'tag_name': 'Batting', 'tag_id': 122983950, 'confidence': 0.64}]}]
# if data format is fixed
print(testcase[-1]['classifications'][0]['tag_name'])
# if not, expensive but works.
def json_find(source, key_name):
    json_str = json.dumps(source)
    pattern = f'(?<={key_name}": ")([^,"]*)'
    found = re.findall(pattern, json_str)
    return found
print(json_find(testcase, 'tag_name')[0])
Result:
Batting
Batting
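A different technique that avoids the round trip through a JSON string is to walk the nested structure directly. This is a sketch of that alternative, not the regex approach above; find_values is a name chosen for the example, and it only descends into dicts and lists.

```python
def find_values(node, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the value itself may contain the key again.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


sample = [
    {'text': 'comment\n', 'classifications': []},
    {'text': 'Saini ODI average at 53\n',
     'classifications': [{'tag_name': 'Batting', 'tag_id': 122983950}]},
]
found = list(find_values(sample, 'tag_name'))
print(found)
```

Unlike the regex version, this also returns non-string values (numbers, nested dicts) unchanged.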

python-Get a string after specific word in line

I'm searching JIRA tickets which have a specific subject. I put the results in a JSON file (whole file: https://1drv.ms/f/s!AizscpxS0QM4attoSBbMLkmKp1s)
I wrote Python code to get the ticket description:
#!/usr/bin/python
import sys
import json
if sys.version[0] == '2':
    reload(sys)
    sys.setdefaultencoding("utf-8")
sys.stdout = open('output.txt','wt')
datapath = sys.argv[1]
data = json.load(open(datapath))
for issue in data['issues']:
    if len(issue['fields']['subtasks']) == 0 or 'description' in issue['fields']:
        custom_field = issue['fields']['description']
        my_string=custom_field
        #print custom_field
        print my_string.split("name:",1)[1]
Some tickets have this value in description:
"description": "name:some name\r\n\r\ncount:5\r\n\r\nregion:some region\r\n\r\n\u00a0",
I need to get the values after name, count and region for all tickets.
Desired output (in this example JSON file):
some name 5 some region
some name 5 some region
With the code above I can get everything after name:
some name^M
^M
count:5^M
^M
region:some region
Also, how can I skip tickets that don't have these values in the description? In that case I get:
print custom_field.split("name",1)[2]
IndexError: list index out of range
This looks like a job for a regular expression:
>>> import re
>>> x = r"(\w+):(.+)\r\n\r"
>>> regexp = re.compile(x)
>>> s = "name:some name\r\n\r\ncount:5\r\n\r\nregion:some region\r\n\r\n\u00a0"
>>> regexp.findall(s)
[('name', 'some name'), ('count', '5'), ('region', 'some region')]
Or, if you want a dictionary back,
>>> dict(regexp.findall(s))
{'count': '5', 'region': 'some region', 'name': 'some name'}
You can drop the keys from the dict like this:
>>> mydict = dict(regexp.findall(s))
>>> mydict.values()
['5', 'some region', 'some name']
But be careful, because they may not be in the order you expect. To match your desired output:
>>> mydict = dict(regexp.findall(s))
>>> print("{name} {count:2s} {region}".format(**mydict))
some name 5 some region
If you don't have the expected values, the findall() call will return an empty or incomplete list. In that case you must check the returned dict before printing it, otherwise the format() call will fail.
One way to ensure that the dict always has the expected values is to set it up beforehand with defaults.
>>> mydict = {'count': 'n/a', 'region': 'n/a', 'name': 'n/a'}
>>> mydict.update(dict(regexp.findall(s)))
Then the format() call will always work, even if one of the fields is missing from the data.
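The defaults idea also works without a regex. The sketch below is a swapped-in technique (splitting into lines and partitioning on ':') written for Python 3; parse_description and the 'n/a' placeholder are assumptions made for the example.

```python
def parse_description(text, fields=('name', 'count', 'region')):
    """Return a dict with every requested field, using 'n/a' when missing."""
    values = dict.fromkeys(fields, 'n/a')
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        # Only accept "key:value" lines whose key is one we asked for.
        if sep and key.strip() in values:
            values[key.strip()] = value.strip()
    return values


s = "name:some name\r\n\r\ncount:5\r\n\r\nregion:some region\r\n\r\n\u00a0"
full = parse_description(s)
partial = parse_description("count:7\r\n")
summary = "{name} {count} {region}".format(**full)
print(summary)
print(partial)
```

Because every field starts with a default, format() cannot fail on a ticket whose description lacks one of the values.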
You can use a try/except block to skip tickets that don't have the value:
try:
    print(custom_field.split("name:", 1)[1])
except IndexError:
    print("Skipping ..")

Printing json data in string format

I have many json fields in my model. I want to print them in the string format.
The code I am using is :
data=[]
detail=details.objects.filter(Id=item['Id'])
for i in compliance:
    data.append(str("Name")+str(":")+str(i.Name)+str(" , ")+str("Details")+str(":")+str(i.Details))
print data
The output I am getting is :
Name:ABC, Details:{u'Status': u'True', u'Remarks': u'No Remark'}
The expected output is:
Name:ABC, Details:Status:True,Remarks:No Remark
Any help will be appreciated.
Check whether your data is of type dict.
If not, print it as you are doing now.
If it is, pass the dictionary to another function that does the following:
def print_dict(d):
    return ",".join([key+":"+str(d[key]) for key in d])
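Putting the type check and the join together gives something like the sketch below. It is written for Python 3.7+, where dicts keep insertion order; format_value and the sample record are made up to mirror the output in the question.

```python
def format_value(value):
    """Render nested dicts as key:value pairs, everything else via str()."""
    if isinstance(value, dict):
        return ",".join(
            f"{key}:{format_value(item)}" for key, item in value.items()
        )
    return str(value)


record = {'Name': 'ABC',
          'Details': {'Status': 'True', 'Remarks': 'No Remark'}}
line = ", ".join(
    f"{key}:{format_value(value)}" for key, value in record.items()
)
print(line)
```

The recursion means a dict nested more than one level deep is flattened the same way.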
You can do it this way, assuming compliance is a dict / JSON object.
Save the keys in a list, in the order you want them printed.
Iterate over that list and build a list of key:value strings.
The code would look like this:
keyorder = ['Name', 'Status', 'Remarks']
res = []
for key in keyorder:
    res.append(key + ':' + compliance[key])
', '.join(res)
'Name:ABC, Status:True, Remarks:No remarks'
As @chkri suggested, first check whether your data is a dict; if it is, you can try this one-line solution:
d={'Name':'ABC', 'Details':{u'Status': u'True', u'Remarks': u'No Remark'}}
print({k:v for k,v in d.items()})
output:
{'Name': 'ABC', 'Details': {'Remarks': 'No Remark', 'Status': 'True'}}

Python: Extract info from xml to dictionary

I need to extract information from an xml file, isolate it from the xml tags before and after, store the information in a dictionary, then loop through the dictionary to print a list. I am an absolute beginner so I'd like to keep it as simple as possible and I apologize if how I've described what I'd like to do doesn't make much sense.
Here is what I have so far:
for line in open("/people.xml"):
    if "name" in line:
        print (line)
    if "age" in line:
        print(line)
Current Output:
<name>John</name>
<age>14</age>
<name>Kevin</name>
<age>10</age>
<name>Billy</name>
<age>12</age>
Desired Output
Name Age
John 14
Kevin 10
Billy 12
edit- So using the code below I can get the output:
{'Billy': '12', 'John': '14', 'Kevin': '10'}
Does anyone know how to get from this to a chart with headers like my desired output?
Try xmldict (converts XML to Python dictionaries, and vice versa):
>>> xmldict.xml_to_dict('''
... <root>
... <persons>
... <person>
... <name first="foo" last="bar" />
... </person>
... <person>
... <name first="baz" last="bar" />
... </person>
... </persons>
... </root>
... ''')
{'root': {'persons': {'person': [{'name': {'last': 'bar', 'first': 'foo'}}, {'name': {'last': 'bar', 'first': 'baz'}}]}}}
# Converting dictionary to xml
>>> xmldict.dict_to_xml({'root': {'persons': {'person': [{'name': {'last': 'bar', 'first': 'foo'}}, {'name': {'last': 'bar', 'first': 'baz'}}]}}})
'<root><persons><person><name><last>bar</last><first>foo</first></name></person><person><name><last>bar</last><first>baz</first></name></person></persons></root>'
or try xmlmapper (list of python dictionary with parent-child relationship):
>>> myxml='''<?xml version='1.0' encoding='us-ascii'?>
<slideshow title="Sample Slide Show" date="2012-12-31" author="Yours Truly" >
<slide type="all">
<title>Overview</title>
<item>Why
<em>WonderWidgets</em>
are great
</item>
<item/>
<item>Who
<em>buys</em>
WonderWidgets1
</item>
</slide>
</slideshow>'''
>>> x=xml_to_dict(myxml)
>>> for s in x:
...     print s
...
{'text': '', 'tail': None, 'tag': 'slideshow', 'xmlinfo': {'ownid': 1, 'parentid': 0}, 'xmlattb': {'date': '2012-12-31', 'author': 'Yours Truly', 'title': 'Sample Slide Show'}}
{'text': '', 'tail': '', 'tag': 'slide', 'xmlinfo': {'ownid': 2, 'parentid': 1}, 'xmlattb': {'type': 'all'}}
{'text': 'Overview', 'tail': '', 'tag': 'title', 'xmlinfo': {'ownid': 3, 'parentid': 2}, 'xmlattb': {}}
{'text': 'Why', 'tail': '', 'tag': 'item', 'xmlinfo': {'ownid': 4, 'parentid': 2}, 'xmlattb': {}}
{'text': 'WonderWidgets', 'tail': 'are great', 'tag': 'em', 'xmlinfo': {'ownid': 5, 'parentid': 4}, 'xmlattb': {}}
{'text': None, 'tail': '', 'tag': 'item', 'xmlinfo': {'ownid': 6, 'parentid': 2}, 'xmlattb': {}}
{'text': 'Who', 'tail': '', 'tag': 'item', 'xmlinfo': {'ownid': 7, 'parentid': 2}, 'xmlattb': {}}
{'text': 'buys', 'tail': 'WonderWidgets1', 'tag': 'em', 'xmlinfo': {'ownid': 8, 'parentid': 7}, 'xmlattb': {}}
The code above returns a generator. When you iterate over it, each item is a dict with the keys tag, text, xmlattb and tail, plus additional information in xmlinfo. The root element has a parentid of 0.
Use an XML parser for this. For example,
import xml.etree.ElementTree as ET
doc = ET.parse('people.xml')
names = [name.text for name in doc.findall('.//name')]
ages = [age.text for age in doc.findall('.//age')]
people = dict(zip(names,ages))
print(people)
# {'Billy': '12', 'John': '14', 'Kevin': '10'}
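To get from the parsed values to a table with headers, you can pair the names and ages and pad the first column. A sketch only: the XML string below is a stand-in for people.xml (the real file's layout is an assumption), and the column width of 8 is arbitrary.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for people.xml; the real file's structure is assumed.
XML = """<people>
  <person><name>John</name><age>14</age></person>
  <person><name>Kevin</name><age>10</age></person>
  <person><name>Billy</name><age>12</age></person>
</people>"""

root = ET.fromstring(XML)
names = [name.text for name in root.findall('.//name')]
ages = [age.text for age in root.findall('.//age')]

# A list of tuples keeps file order and tolerates duplicate names.
rows = list(zip(names, ages))

# Left-align the first column in 8 characters so the ages line up.
lines = ["{:<8}{}".format("Name", "Age")]
lines += ["{:<8}{}".format(name, age) for name, age in rows]
print("\n".join(lines))
```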
It seems to me that this is an exercise in learning how to parse this XML manually rather than simply pulling a library out of the bag to do it for you. If I am wrong, I suggest watching the udacity video by Steve Huffman that can be found here: http://www.udacity.com/view#Course/cs253/CourseRev/apr2012/Unit/362001/Nugget/365002. He explains how to use the minidom module to parse lightweight xml files such as these.
Now, the first point I want to make in my answer is that you don't want to create a Python dictionary to print all of these values. A Python dictionary is simply a set of keys that correspond to values. Before Python 3.7 there was no guaranteed ordering to them, so traversal in the order they appeared in the file is a pain in the butt. You are trying to print out all of the names together with their corresponding ages, so a data structure like a list of tuples would probably be better suited to collating your data.
It seems like the structure of your xml file is such that each name tag is succeeded by an age tag that corresponds to it. There also seems to only be a single name tag per line. This makes matters fairly simple. I'm not going to write the most efficient or universal solution to this problem, but instead I will try to make the code as simple to understand as I can.
So let's first create a list to store the data:
a_list = []
Now open your file, and initialize a couple of variables to hold each name and age:
from __future__ import with_statement
with open("/people.xml") as f:
    name, age = None, None  # initialize a name and an age variable to be used during traversal.
    for line in f:
        name = extract_name(line, name)  # This function will be defined later.
        age = extract_age(line)  # So will this one.
        if age:  # If age is defined, we can add a person to our list and reset our variables.
            a_list.append((name, age))
            name, age = None, None  # Re-initialize our variables.
        # Otherwise simply read the next line until age is defined.
Now, for each line in the file, we want to determine whether it contains a name. If it does, we want to extract it. Let's create a function to do this:
def extract_name(a_line, name):  # we pass in the line as well as the name value that we defined before beginning our traversal.
    if name:  # if the name is already set, keep its current value (we clear it upon encountering the corresponding age).
        return name
    if "<name>" not in a_line:  # if no "<name>" in a_line, return. Otherwise, extract the new name.
        return
    name_pos = a_line.find("<name>") + 6
    end_pos = a_line.find("</name>")
    return a_line[name_pos:end_pos]
Now, we must create a function to parse the line for a user's age. We can do this in a similar way to the previous function, but we know that once we have an age, it will be added into the list immediately. As such, we never need to concern ourselves with age's previous value. The function can therefore look like this:
def extract_age(a_line):
    if "<age>" not in a_line:  # if no "<age>" in a_line, return.
        return
    age_pos = a_line.find("<age>") + 5  # else extract the age from the line and return it.
    end_pos = a_line.find("</age>")
    return a_line[age_pos:end_pos]
Finally, you want to print the list. You might do it as follows:
for item in a_list:
    print '\t'.join(item)
Hope this helped. I haven't tested out my code, so it might still be slightly buggy. The concepts are there, though. :)
Here's another way using lxml library:
from lxml import objectify
def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(objectify.fromstring(xml_str))
xml_string = """<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""
print xml_to_dict(xml_string)
To preserve the parent node, use this instead:
def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}
And if you want to only return a subtree and convert it to dict, you can use Element.find() :
xml_obj.find('.//') # lxml.objectify.ObjectifiedElement instance
See lxml documentation.
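If installing lxml is not an option, a comparable conversion can be sketched with the standard library's xml.etree.ElementTree. This is a different technique from the objectify approach above, and element_to_dict is a name made up for the example. It assumes sibling tags are unique; repeated tags would overwrite each other in the dict.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an element's children into nested dicts."""
    children = list(elem)
    if not children:
        # Leaf element: return its text content (None if empty).
        return elem.text
    # Assumption: sibling tags are unique, so later ones do not clobber.
    return {child.tag: element_to_dict(child) for child in children}


xml_string = ("<Response><NewOrderResp><IndustryType>Test</IndustryType>"
              "<SomeData><SomeNestedData1>1234</SomeNestedData1>"
              "<SomeNestedData2>3455</SomeNestedData2></SomeData>"
              "</NewOrderResp></Response>")
root = ET.fromstring(xml_string)
result = {root.tag: element_to_dict(root)}  # keep the parent node as the key
print(result)
```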
