Unable to store non-character text in a dictionary in Python

I am trying sentiment analysis where I have data like
source_text-> #LiesbethHBC I have a good feeling actually 🙈 its not that long, it's pretty soon!\nAw you deserve these tickets
then! 💖
result_value-> Sentiment(polarity=0.0, subjectivity=0.0)
I want to store this key value pair in a python dictionary.
I tried creating one as:
dict={}
dict[source_text].append(result_value)
but I get KeyError
Is there a way to store such text(just not characters) in a dictionary?

Your problem has nothing to do with "non-character text" (which doesn't actually mean anything). The only requirement for an object to be usable as a dict key is that it is hashable, and there is no restriction at all on what you can use as a value.
Your problem simply comes from the fact that you're trying to get the value for a nonexistent key (that's what KeyError means: the key you asked for does not exist in the dict).
Here:
mydict = {}
At this point mydict is empty, so any item access will raise a KeyError.
Then you're doing this:
mydict[source_text].append(result_value)
which is basically:
something = mydict[source_text] # get value for key `source_text`
something.append(result_value)
Since your dict is empty, the first line WILL obviously raise a KeyError.
If you want to store one unique result_value for each source_text value then the proper syntax is:
mydict[source_text] = result_value
If you want to store a list of result_value for each source_text value, then you have to either explicitly test whether the key is set and, if it is not, set it to an empty list before appending to that list:
if source_text not in mydict:
    mydict[source_text] = []
mydict[source_text].append(result_value)
or just use a defaultdict instead:
from collections import defaultdict
mydict = defaultdict(list)
# defaultdict will automatically create the key with an empty list
# as its value if the key is missing
mydict[source_text].append(result_value)
Now I strongly suggest that you invest some time in properly learning Python (hint: there's a quite decent tutorial in the official documentation) if you have to use it, this will save on everyone's time.
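To make the points above concrete, here is a minimal sketch. The sample text and the tuple standing in for the Sentiment result are hypothetical placeholders, not the asker's real data. It shows that a string containing emoji is a perfectly valid key, that reading a missing key from a plain dict raises KeyError, and that collections.defaultdict (note the lowercase spelling) creates the list on first access.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the tweet and its sentiment result.
source_text = "I have a good feeling actually 🙈 it's pretty soon! 💖"
result_value = (0.0, 0.0)  # placeholder for Sentiment(polarity, subjectivity)

# Any hashable object works as a key, including strings with emoji.
single = {}
single[source_text] = result_value

# Reading a missing key from a plain dict raises KeyError.
empty = {}
try:
    empty[source_text].append(result_value)
    raised = False
except KeyError:
    raised = True

# defaultdict(list) creates an empty list the first time a key is accessed.
grouped = defaultdict(list)
grouped[source_text].append(result_value)
grouped[source_text].append(result_value)

print(single[source_text])
print(raised)
print(len(grouped[source_text]))
```

Plain assignment fits when there is one result per text; the defaultdict form fits when the same text can appear several times and every result should be kept.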

The problem is that you tried to look up the key #LiesbethHBC I have a good feeling actually 🙈 its not that long, it's pretty soon!\nAw you deserve these tickets then! 💖 in the dictionary, where it does not exist yet, so Python raised a KeyError. A simple way to solve this is to first check whether that key is in the dictionary: if it is, do whatever you want with it; otherwise create the key first.
By the way, avoid using dict (dictionary datatype) or any other datatypes as a variable name.
This is what you should actually do:
dictionary = {}  # 'dict' is the built-in dictionary type, so avoid it as a name
if source_text in dictionary:
    # If the key exists, append to its list
    dictionary[source_text].append(result_value)
else:
    # If the key does not exist, create it with the first value
    dictionary[source_text] = [result_value]
This should help...

Have you tried using '.update' method?
dict = {}
dict.update({'First':'Test'})
dict.update({'Lets Get':'Real'})
print (dict)
Output:
{'First': 'Test', 'Lets Get': 'Real'}
EDIT:
Or even:
dict = {}
dict.update({'Polarity':0.91})
dict.update({'Subjectivity':0.73})
print (dict)
Output:
{'Polarity': 0.91, 'Subjectivity': 0.73}
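One thing worth keeping in mind with update is that it replaces the value when the key already exists, so it does not accumulate several results under one key. A small sketch with hypothetical keys and values:

```python
# Hypothetical keys and values, for illustration only.
scores = {}
scores.update({"Polarity": 0.91})
scores.update({"Subjectivity": 0.73})

# A second update with an existing key replaces the old value.
scores.update({"Polarity": 0.5})

print(scores)
```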

Related

How to identify a dictionary inside a dictionary

I have a dictionary like this:
test = {'user_id':125, 'company':'XXXX', 'payload': {"tranx": "456b62448367","payload": {"snr": "25%","Soil": 45,"humidity": 85}}}
The requirement is:
The payload inside the dictionary (test) is dynamic: sometimes it is present and sometimes it is not, and its name is temporary, so after some time the key may become "abc" or anything else.
In this case, I want to identify whether "test" is a nested dict or not.
If it is a nested dict, I want to know the key of the nested dictionary. How can I solve this?
Iterate over the items and check each value:
for key, value in test.items():
    if isinstance(value, dict):
        print(key)
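The same check can be wrapped in a small helper so the result is reusable. This is only a sketch: nested_keys is a hypothetical name, and it inspects the top level only, not deeper levels.

```python
def nested_keys(mapping):
    """Return the keys of `mapping` whose values are themselves dicts."""
    return [key for key, value in mapping.items() if isinstance(value, dict)]


test = {
    'user_id': 125,
    'company': 'XXXX',
    'payload': {"tranx": "456b62448367",
                "payload": {"snr": "25%", "Soil": 45, "humidity": 85}},
}

found = nested_keys(test)
print(found)        # the keys that hold a nested dict
print(bool(found))  # True when `test` contains at least one nested dict
```

Because the helper looks at values rather than key names, it keeps working if the "payload" key is later renamed.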

Sorting large file of emails based on domain shows error

I am trying to sort a 1 GB file containing emails based on their domains, using the following logic:
data = {}
emails = open('test','r',encoding='ascii',errors='ignore')
for email in emails.readlines():
    (user, domain) = email.split('@')
    data[domain] = email
    keys = data.keys()
    keys.sort()
    print([data[x] for x in keys])
When I ran the file using Python 3.5 I got the following error:
keys.sort()
AttributeError: 'dict_keys' object has no attribute 'sort'
Kindly, let me know what to do to make it run successfully.
You need to call list on the returned dict_keys object to cast it into a list which has the list.sort method:
keys = list(data.keys())
keys.sort()
Or simply call sorted directly on the dict_keys object to return a sorted list:
keys = sorted(data.keys())
On another note, you should dedent this part of the code so the sorting is not done every time a new key is added to the dict, but at the end of the loop.
Or simply apply sorted on the dict directly if you don't actually need the list of keys:
for email in emails.readlines():
    (user, domain) = email.split('@')
    data[domain] = email
print([v for _, v in sorted(data.items(), key=lambda x: x[0])])
I am posting this as an answer because it will be easier to read. It doesn't answer your question directly, since it has already been answered, but is answering a problem you will notice the moment your code is able to be run.
Problem: Duplicated domains will result in only the last entry being saved. The line
data[domain] = email
overwrites what might have been written under that key before. What you want to do is substitute the line mentioned with this block:
try:
    data[domain].append(email)
except KeyError:
    data[domain] = [email]
That will create a list of emails on the same domain. If the key is not found, a KeyError is raised, which signals that this is a new domain and you have to create a new list. If the key is found, we just append the new email.
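Putting both answers together, a rough sketch could look like the following. It assumes addresses use '@' as the separator; group_by_domain and the sample lines are hypothetical. A file object could be passed in place of the list, since iterating over a file yields one line at a time and avoids loading everything with readlines().

```python
from collections import defaultdict


def group_by_domain(lines):
    """Group email addresses by the part after the last '@'."""
    grouped = defaultdict(list)
    for line in lines:
        email = line.strip()
        user, sep, domain = email.rpartition("@")
        if not sep:
            continue  # skip lines that contain no '@' at all
        grouped[domain].append(email)
    return grouped


# Hypothetical sample lines; an open file object works the same way.
sample = [
    "bob@example.org\n",
    "alice@example.com\n",
    "not-an-email\n",
    "carol@example.org\n",
]

grouped = group_by_domain(sample)

# Sort once, after the loop, and keep every address for each domain.
ordered = [email for domain in sorted(grouped) for email in grouped[domain]]
print(ordered)
```

Using rpartition instead of split avoids the unpacking error that split would raise on a line with no separator or with more than one.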

Retrieve JSON key and value

I am POSTing a .json object to my server with different keys attached.
ID, time, content
On my server I want to then wrap this again in another .json file with another APIs key and value formatting.
So... I want to store the key and value for 'content'
Currently I can obtain the value for 'content' by:
content = json_obj['content']
But this only returns the value. What is the syntax for storing the key and value in content? The desirable outcome:
content = {'content' : "........."}
Your json_obj here is acting as a dictionary, so you can use the dictionary's items (Python 3+) or iteritems (Python 2.7) method:
for k, v in json_obj.items():  # use iteritems() on Python 2.7
    foo = {k: v}
    # do something with foo
Solved, thanks SuperSaiyan you triggered this thought.
Just create a new dictionary:
content = json_obj['content']
test_obj = {'content':content}
Could also:
test_obj = {'content':json_obj['content']}
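If more than one field ever needs to be carried over, the same idea generalizes with a dict comprehension. This is a sketch only: pick is a hypothetical helper and the sample object is made up.

```python
def pick(mapping, keys):
    """Return a new dict holding only the requested keys that are present."""
    return {key: mapping[key] for key in keys if key in mapping}


# Hypothetical stand-in for the parsed POST body.
json_obj = {"ID": 7, "time": "12:00", "content": "hello"}

content = pick(json_obj, ["content"])
print(content)
```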

Printing output from json

I am still new to python, and brand new to json. I am trying to go through output that is in json. I am not yet sure which fields will need to be printed out, but I do know that two of them will be needed.
How could I change:
import json
from pprint import pprint
with open('out.json') as data_file:
    data = json.load(data_file)
pprint(data)
to print out say, field one, and field two?
I figure if I can print field one, and two, I can play around with it until I find the right fields. I imagine this is a derp level question, but being able to print specific fields is what I need to be able to do.
json.load returns a Python object (https://docs.python.org/3/library/json.html#json.load), so depending on the content of 'out.json' it can be a dict, a list, or one of a few other types.
For a dictionary you can use data['key']; for a list use data[index], where index is 0, 1, 2, ...
For looping use for, e.g. for a list:
for elem in data:
    print(elem)
or for a dictionary:
for key, value in data.items():
    print(key, value)
You could have found this easily in Python's json documentation.
Here data is a dict type object. You can get any value by using the corresponding key like this:
print(data['field'])
But it will raise a KeyError if the key is not present in the dict. To avoid this you can use the get() method:
print(data.get('field'))
This will return None if the key is missing.
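A short sketch of both access styles, using json.loads on an inline string so it runs without a file. The field names are hypothetical, since the real ones in out.json are unknown.

```python
import json

# Hypothetical JSON text standing in for the contents of out.json.
raw = '{"field_one": "alpha", "field_two": 2}'
data = json.loads(raw)

one = data["field_one"]                    # raises KeyError if the key is absent
missing = data.get("field_three")          # None when the key is absent
fallback = data.get("field_three", "n/a")  # explicit default instead of None

print(one, data["field_two"], missing, fallback)
```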

Python requests.json() object as dictionary does not recognize its own keys with hasattr() or value in object.keys() call

Summary: a dictionary/json object indicates it does not have a given key (using either a hasattr call or a value in object.keys() boolean test) even though that key shows up in an object.keys() call. So how can I access the value for that key?
Longer version: I am quite puzzled trying to parse some json coming back from an API. When I try to determine whether the json object, which is showing up as a dictionary, has a given key, the code returns false for the key even when it shows the key is there for the object.
Here is how I am retrieving the json:
r = requests.get(url, headers = {'User-Agent':UA})
try:
    print(r.json())
    jsonobject = r.json()
    print("class of jsonobject is %s"%jsonobject.__class__.__name__)
    print("here are dictionary keys %s"%jsonobject.keys())
    if hasattr(jsonobject, 'laps') and jsonobject['laps'] is not None:
        ...
    else:
        print("no laps object")
    if hasattr(jsonobject, 'points') and jsonobject['points'] is not None:
        ...
The reason I am doing this is that often I am getting encoding errors from the field nested within the 'laps' array or the 'points' array so that I cannot insert the json data into a MongoDB database. I would like to delete these fields from the json object since they don't contain useful information anyway.
The problem is that the json object always returns false for hasattr(jsonobject, 'laps') and hasattr(jsonobject, 'points'). It returned false even in the case of a record where I then printed out the keys and they showed:
here are dictionary keys dict_keys(['is_peptalk_allowed', 'show_workout', 'hydration', 'records', 'include_in_stats', 'expand', 'pb_count', 'start_time', 'calories', 'altitude_max', 'hashtags', 'laps', 'pictures', 'duration', 'playlist'\
, 'sport', 'points', 'show_map', 'local_start_time', 'speed_avg', 'tagged_users', 'distance', 'altitude_min', 'is_live', 'author', 'feed_id', 'speed_max', 'id'])
So I thought perhaps the dict was behaving strangely with hasattr, and rewrote the code as:
if 'laps' in jsonobject.keys() and jsonobject['laps'] is not None:
but that also returns false even though it again prints the same array of keys, which does include 'laps'.
hasattr() is entirely the wrong tool to use. It tests for attributes, but dictionary keys are not attributes.
To test for keys, use the in test directly against the dictionary:
if 'laps' in jsonobject:
Calling jsonobject.keys() is redundant and creates a new dictionary view object.
The test will be true for your dictionary, but that's not the only thing you are testing for. Your test is:
if 'laps' in jsonobject and jsonobject['laps'] is not None:
That'll fail if 'laps' is a key but the value in the dictionary is None.
The above test can be stated more simply and compactly as:
if jsonobject.get('laps') is not None:
If None is a valid value, don't test for it; stick to just 'laps' in jsonobject.
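The difference between the three tests can be seen on a tiny stand-in dictionary (the values here are made up). Since the question also mentions wanting to delete these fields, the sketch ends with dict.pop and a default, which removes a key without raising when it is absent.

```python
# Hypothetical stand-in for the parsed API response.
jsonobject = {"laps": [1, 2], "points": None, "sport": 0}

has_attr = hasattr(jsonobject, "laps")                  # attribute lookup, not a key test
has_key = "laps" in jsonobject                          # key membership test
has_points = jsonobject.get("points") is not None       # key exists but value is None

print(has_attr, has_key, has_points)

# Remove unwanted fields; the None default avoids KeyError for absent keys.
for key in ("laps", "points", "not_there"):
    jsonobject.pop(key, None)

print(jsonobject)
```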
