eval function doesn't turn dict-like string into dict? - python

So I have several strings in a DataFrame column looking like this one for example:
{'Free to Play': 17555, 'Multiplayer': 10499, 'FPS': 9248, 'Action': 8188, 'Shooter': 7857, 'Class-Based': 6098, 'Team-Based': 5363, 'Funny': 5155, 'First-Person': 4846, 'Trading': 4512, 'Cartoony': 4240, 'Competitive': 4116, 'Online Co-Op': 4016, 'Co-op': 3920, 'Robots': 3112, 'Comedy': 3049, 'Tactical': 2726, 'Crafting': 2491, 'Cartoon': 2450, 'Moddable': 2315}
I am trying to access the keys of the dict, but since each value is still a string I wanted to convert it into a dictionary, and found people saying that eval can be used for that. And yes, when I try it like this it works fine and test_dict is of type dict:
test_str = "{'Early Access': 77, 'RPG': 202}"
test_dict = eval(test_str)
Yet when working with the strings in the DataFrame
tags = main_data["tags"]
for taglist in tags:
taglist = "\"" + taglist + "\""
tag_dict = eval(taglist)
tag_dict always remains a string, and for some strings eval throws errors like this:
File "<string>", line 1
"{'Action': 2681, 'FPS': 2048, 'Multiplayer': 1659, 'Shooter': 1420, 'Classic': 1344, 'Team-Based': 943, 'First-Person': 799, 'Competitive': 790, 'Tactical': 734, "1990's": 564, 'e-sports': 550, 'PvP': 480, 'Military': 367, 'Strategy': 329, 'Score Attack': 200, 'Survival': 192, 'Old School': 164, 'Assassin': 151, '1980s': 144, 'Violent': 40}"
^
SyntaxError: invalid syntax
I thought it might be a problem with the length of the strings, because when using taglist = "\"\"\"" + taglist + "\"\"\"" eval doesn't throw any errors and goes through all the strings, but they are still not converted to a dict and remain str.
Maybe I have done some rookie mistake or there are better approaches to solving my problem?

Since you're serializing your dict to some kind of external storage, I would use json. It's designed for this, whereas eval is ... tricky: it actually runs code, so whatever someone puts in the database, you're going to execute.
There's one catch. JSON expects double quotes. Since it's already written to the database as Python code with single quotes around the dictionary keys, you're going to have to convert those to double quotes to be legal JSON. I'd suggest fixing it once in the database, and then using json going forward.
import json
data_dict = {'Free to Play': 17555, 'Multiplayer': 10499, 'FPS': 9248, 'Action': 8188, 'Shooter': 7857, 'Class-Based': 6098, 'Team-Based': 5363, 'Funny': 5155, 'First-Person': 4846, 'Trading': 4512, 'Cartoony': 4240, 'Competitive': 4116, 'Online Co-Op': 4016, 'Co-op': 3920, 'Robots': 3112, 'Comedy': 3049, 'Tactical': 2726, 'Crafting': 2491, 'Cartoon': 2450, 'Moddable': 2315}
data_dict.update({'Early Access': 77, 'RPG': 202})
data_string = json.dumps(data_dict)
# write it to a file or database
# read it later, we'll assume that's data_string
data_dict = json.loads(data_string)
print (data_dict['RPG'])
database_string = "{'Free to Play': 17555, 'Multiplayer': 10499, 'FPS': 9248, 'Action': 8188, 'Shooter': 7857, 'Class-Based': 6098, 'Team-Based': 5363, 'Funny': 5155, 'First-Person': 4846, 'Trading': 4512, 'Cartoony': 4240, 'Competitive': 4116, 'Online Co-Op': 4016, 'Co-op': 3920, 'Robots': 3112, 'Comedy': 3049, 'Tactical': 2726, 'Crafting': 2491, 'Cartoon': 2450, 'Moddable': 2315}"
# this isn't a general purpose converter, but works for this case
# just to change the single quotes to double quotes
converted_to_legal_json = database_string.replace("'", '"')
data_dict = json.loads(converted_to_legal_json)
print (data_dict['Multiplayer'])
I can probably correct your eval if you want, but can't do it right this second. But like I said, not recommended. And I'd use ast.literal_eval rather than actually executing it with eval, for security reasons.
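For what it's worth, the reason tag_dict stays a string is the added quotes: wrapping taglist in quotes makes eval evaluate a string literal, so it just hands the string back, and any embedded double quotes (for example the "1990's" key) cut that literal short and cause the SyntaxError. Passing the raw value straight to ast.literal_eval avoids both problems; a minimal sketch, assuming main_data["tags"] holds dict-like strings like the ones above:
import ast
import pandas as pd

# stand-in for the real DataFrame; main_data["tags"] holds the dict-like strings
main_data = pd.DataFrame({"tags": ["{'Early Access': 77, 'RPG': 202}"]})

# safely evaluate each string as a Python literal (no arbitrary code execution)
tag_dicts = main_data["tags"].apply(ast.literal_eval)

for tag_dict in tag_dicts:
    print(type(tag_dict), list(tag_dict.keys()))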

Related

Pymongo ignoring allowDiskUse = True

I've looked at the other answers to this question, and yet it is still not working. I am trying to delete duplicate cases; here is the function:
def deleteDups(datab):
    col = db[datab]
    pipeline = [
        {'$group': {
            '_id': {
                'CASE NUMBER': '$CASE NUMBER',
                'JURISDICTION': '$JURISDICTION'},  # needs to be case insensitive
            'count': {'$sum': 1},
            'ids': {'$push': '$_id'}
            }
        },
        {'$match': {'count': {'$gt': 1}}},
    ]
    results = col.aggregate(pipeline, allowDiskUse = True)
    count = 0
    for result in results:
        doc_count = 0
        print(result)
        it = iter(result['ids'])
        next(it)  # skip the first id so one copy of each case is kept
        for id in it:
            deleted = col.delete_one({'_id': id})
            count += 1
            doc_count += 1
            # print("API call received:", deleted.acknowledged)  # debug: is the database receiving requests?
    print("Total documents deleted:", count)
And yet, every time, I get this traceback:
File "C:\Users\*****\Documents\GitHub\*****\controller.py", line 202, in deleteDups
results = col.aggregate(pipeline, allowDiskUse = True)
File "C:\Python38\lib\site-packages\pymongo\collection.py", line 2375, in aggregate
return self._aggregate(_CollectionAggregationCommand,
File "C:\Python38\lib\site-packages\pymongo\collection.py", line 2297, in _aggregate
return self.__database.client._retryable_read(
File "C:\Python38\lib\site-packages\pymongo\mongo_client.py", line 1464, in _retryable_read
return func(session, server, sock_info, slave_ok)
File "C:\Python38\lib\site-packages\pymongo\aggregation.py", line 136, in get_cursor
result = sock_info.command(
File "C:\Python38\lib\site-packages\pymongo\pool.py", line 603, in command
return command(self.sock, dbname, spec, slave_ok,
File "C:\Python38\lib\site-packages\pymongo\network.py", line 165, in command
helpers._check_command_response(
File "C:\Python38\lib\site-packages\pymongo\helpers.py", line 159, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in.
I asterisked out bits of path to protect privacy. But it is driving me absolutely nuts that this line: results = col.aggregate(pipeline, allowDiskUse = True) very explicitly passes allowDiskUse = True, and Mongo is just ignoring it. If I misspelled something, I'm blind. True has to be capitalized to pass a bool in python.
I feel like I'm going crazy here.
According to the documentation:
Atlas Free Tier and shared clusters do not support the allowDiskUse option for the aggregation command or its helper method.
(Thanks to Shane Harvey for this info)
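If moving off the free/shared tier is not an option, one possible workaround (a sketch of my own, not from the documentation) is to skip the server-side $group entirely and build the duplicate groups client-side, which sidesteps the memory limit; lowercasing the key fields also covers the "needs to be case insensitive" note in the pipeline:
from collections import defaultdict

# only fetch the fields needed to build the duplicate key
seen = defaultdict(list)
for doc in col.find({}, {'CASE NUMBER': 1, 'JURISDICTION': 1}):
    key = (str(doc.get('CASE NUMBER', '')).lower(), str(doc.get('JURISDICTION', '')).lower())
    seen[key].append(doc['_id'])

count = 0
for ids in seen.values():
    for dup_id in ids[1:]:  # keep the first document in each group, delete the rest
        col.delete_one({'_id': dup_id})
        count += 1
print("Total documents deleted:", count)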

Take the items of a json file separately

Hey guys, so I am trying to read a JSON file and write a specific item of it into a list. But the JSON file uses single quotes, so I get this error:
simplejson.errors.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I tried to convert the JSON file from single quotes to double quotes but it didn't work (I also saw the other Stack Overflow questions about this, but they didn't work for me). I tried it with str.replace, json.dumps, etc., and it always had a different problem. My code is this:
messages = []
with open("commitsJson.json","r", encoding="utf8") as json_file:
data = json.load(json_file)
for p in data['items']:
messages.append(p['message'])
authors.write(p['message']+"\r\n")
print(p['message'])
So the expected result is to read the json file and write specific items of it into a file or list, etc...
EDIT:
Sample of json file:
{'total_count': 3, 'incomplete_results': False, 'items': [{'url': 'https://gits-20.bkf.sda.eu/api/v3/repos/repo/name/commits/2189312903jsadada',
'sha': '2131932103812jdskfsl', 'node_id': 'asl;dkas;ldjasldasio1203',
'html_url': 'https://gits-20.bkf.sda.eu/api/v3/repos/repo/name/commits/2189312903jsadada',
'comments_url': 'https://gits-20.bkf.sda.eu/api/v3/repos/repo/name/commits/2189312903jsadada',
'commit': {'url': 'https://gits-20.bkf.sda.eu/api/v3/repos/repo/name/commits/2189312903jsadada', 'message': 'Initial commit 1'
Something like that. Basically a GitHub API response, but with single quotes instead of double...
Desired output would be to get the 'message' items of the whole JSON file into another file, like:
Initial commit 1
Initial commit 2
Initial commit 3
Initial commit 4
Initial commit 5
Initial commit 6
Initial commit 7
....
The problem is that JSON expects double quotes to surround strings and property names.
Using ast.literal_eval on the file contents:
commitJson.json:
{
'total_count': 3, 'incomplete_results': False, 'items': [{'url': 'https://gits-20.bkf.sda.eu/api/v3/repos/repo/name/commits/2189312903jsadada',
'sha': '2131932103812jdskfsl', 'node_id': 'asl;dkas;ldjasldasio1203',
'html_url': 'https://gits-20.bkf.sda.eu/api/v3/repos/repo/name/commits/2189312903jsadada',
'comments_url': 'https://gits-20.bkf.sda.eu/api/v3/repos/repo/name/commits/2189312903jsadada',
'commit': {'url': 'https://gits-20.bkf.sda.eu/api/v3/repos/repo/name/commits/2189312903jsadada', 'message': 'Initial commit 1'}}]
}
Hence:
import ast
with open("commitJson.json","r", encoding="utf8") as json_file:
data = ast.literal_eval(json_file.read())
for elem in data['items']:
for e in elem['commit']:
if 'message' in e:
print(elem['commit'][e])
OUTPUT:
Initial commit 1
Shorter version:
print([elem['commit'][e] for elem in data['items'] for e in elem['commit'] if 'message' in e])
OUTPUT:
['Initial commit 1']
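To get the desired output file from the question, the same parsed data can be written out directly; a rough sketch, assuming each item has a 'commit' dict with a 'message' key and that messages.txt is just a placeholder name:
import ast

with open("commitsJson.json", "r", encoding="utf8") as json_file:
    data = ast.literal_eval(json_file.read())

messages = [item['commit']['message'] for item in data['items']]

with open("messages.txt", "w", encoding="utf8") as out_file:
    out_file.write("\n".join(messages))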

Exporting response.txt to csv file

I'm trying to parse data that I receive from a curl request through python. The data is in the following format:
{'meta': {'from': '1520812800',
'granularity': 'daily',
'to': '1523232000',
'total': 6380},
'data': [{'count': 660, 'date': '2018-03-12'},
{'count': 894, 'date': '2018-03-13'}]}
Originally, the data was returned as a string probably because I used response.text to retrieve the data. I converted the string into a dictionary using ast.literal_eval(response.text). I managed to parse the "data" key and ignore "meta". So currently,
data = [{"date":"2018-03-12","count":660},{"date":"2018-03-13","count":894}]}`.
I am trying to export the values for "date" and "count" to a csv file. In my code I have this:
keys = data[0].keys()
print("----------KEYS:---------")
print keys  # ['date', 'count']
print("------------------------")
with open('mycsv.csv', 'wb') as output_file:
    thewriter = csv.DictWriter(output_file, fieldnames=['date', 'count'])
    thewriter.writeheader()
    thewriter.writerow(data)
However, python does not like this and gives me an error:
Traceback (most recent call last):
File "curlparser.py", line 45, in <module>
thewriter.writerow(data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 152, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 148, in _dict_to_list
+ ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: {"date":"2018-03-12","count":660},{"date":"2018-03-13","count":894}
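The traceback points at the fix: writerow() expects a single dict per call, but data here is a list of dicts, so DictWriter treats the whole list as one malformed row. writerows() (or a loop) writes one row per dict; a minimal sketch under that assumption, keeping the Python 2 style file handling from the question:
import csv

data = [{"date": "2018-03-12", "count": 660}, {"date": "2018-03-13", "count": 894}]

with open('mycsv.csv', 'wb') as output_file:  # on Python 3, use open('mycsv.csv', 'w', newline='')
    thewriter = csv.DictWriter(output_file, fieldnames=['date', 'count'])
    thewriter.writeheader()
    thewriter.writerows(data)  # writerows takes the whole list; writerow expects a single dict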

python re extract items within curly brackets

I have a large dataset in my SQL database with rows such as:
("Successfully confirmed payment - {'PAYMENTINFO_0_TRANSACTIONTYPE': ['expresscheckout'], 'ACK': ['Success'], 'PAYMENTINFO_0_PAYMENTTYPE': ['instant'], 'PAYMENTINFO_0_RECEIPTID': ['1037-5147-8706-9322'], 'PAYMENTINFO_0_REASONCODE': ['None'], 'SHIPPINGOPTIONISDEFAULT': ['false'], 'INSURANCEOPTIONSELECTED': ['false'], 'CORRELATIONID': ['1917b2c0e5a51'], 'PAYMENTINFO_0_TAXAMT': ['0.00'], 'PAYMENTINFO_0_TRANSACTIONID': ['3U4531424V959583R'], 'PAYMENTINFO_0_ACK': ['Success'], 'PAYMENTINFO_0_PENDINGREASON': ['authorization'], 'PAYMENTINFO_0_AMT': ['245.40'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITY': ['Eligible'], 'PAYMENTINFO_0_ERRORCODE': ['0'], 'TOKEN': ['EC-82295469MY6979044'], 'VERSION': ['95.0'], 'SUCCESSPAGEREDIRECTREQUESTED': ['true'], 'BUILD': ['7507921'], 'PAYMENTINFO_0_CURRENCYCODE': ['GBP'], 'TIMESTAMP': ['2013-08-29T09:15:59Z'], 'PAYMENTINFO_0_SECUREMERCHANTACCOUNTID': ['XFQALBN3EBE8S'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITYTYPE': ['ItemNotReceivedEligible,UnauthorizedPaymentEligible'], 'PAYMENTINFO_0_ORDERTIME': ['2013-08-29T09:15:59Z'], 'PAYMENTINFO_0_PAYMENTSTATUS': ['Pending']}", 1L, datetime.datetime(2013, 8, 29, 11, 15, 59))
I use the following regex to pull the data from the first item that is within curly brackets:
paypal_meta_re = re.compile(r"""\{(.*)\}""").findall
This works as expected, but when I try to remove the square brackets from the dictionary values, I get an error.
here is my code:
paypal_meta = get_paypal(order_id)
paypal_msg_re = paypal_meta_re(paypal_meta[0])
print type(paypal_msg_re), len(paypal_msg_re)
paypal_str = ''.join(map(str, paypal_msg_re))
print paypal_str, type(paypal_str)
paypal = ast.literal_eval(paypal_str)
paypal_dict = {}
for k, v in paypal.items():
    paypal_dict[k] = str(v[0])
if paypal_dict:
    namespace['payment_gateway'] = {'paypal': paypal_dict}
and here is the traceback:
Traceback (most recent call last):
File "users.py", line 383, in <module>
orders = get_orders(user_id, mongo_user_id, address_book_list)
File "users.py", line 290, in get_orders
paypal = ast.literal_eval(paypal_str)
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/ast.py", line 49, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/ast.py", line 37, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
'PAYMENTINFO_0_TRANSACTIONTYPE': ['expresscheckout'], 'ACK': ['Success'], 'PAYMENTINFO_0_PAYMENTTYPE': ['instant'], 'PAYMENTINFO_0_RECEIPTID': ['2954-8480-1689-8177'], 'PAYMENTINFO_0_REASONCODE': ['None'], 'SHIPPINGOPTIONISDEFAULT': ['false'], 'INSURANCEOPTIONSELECTED': ['false'], 'CORRELATIONID': ['5f22a1dddd174'], 'PAYMENTINFO_0_TAXAMT': ['0.00'], 'PAYMENTINFO_0_TRANSACTIONID': ['36H74806W7716762Y'], 'PAYMENTINFO_0_ACK': ['Success'], 'PAYMENTINFO_0_PENDINGREASON': ['authorization'], 'PAYMENTINFO_0_AMT': ['86.76'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITY': ['PartiallyEligible'], 'PAYMENTINFO_0_ERRORCODE': ['0'], 'TOKEN': ['EC-6B957889FK3149915'], 'VERSION': ['95.0'], 'SUCCESSPAGEREDIRECTREQUESTED': ['true'], 'BUILD': ['6680107'], 'PAYMENTINFO_0_CURRENCYCODE': ['GBP'], 'TIMESTAMP': ['2013-07-02T13:02:50Z'], 'PAYMENTINFO_0_SECUREMERCHANTACCOUNTID': ['XFQALBN3EBE8S'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITYTYPE': ['ItemNotReceivedEligible'], 'PAYMENTINFO_0_ORDERTIME': ['2013-07-02T13:02:49Z'], 'PAYMENTINFO_0_PAYMENTSTATUS': ['Pending']
^
SyntaxError: invalid syntax
whereas if I split the string, using
msg, paypal_msg = paypal_meta[0].split(' - ')
paypal = ast.literal_eval(paypal_msg)
paypal_dict = {}
for k, v in paypal.items():
    paypal_dict[k] = str(v[0])
if paypal_dict:
    namespace['payment_gateway'] = {'paypal': paypal_dict}
    insert = orders_dbs.save(namespace)
    return insert
This works, but I can't use it, as some of the records don't split cleanly and the result is not accurate.
Basically, I want to take the items in the curly brackets and remove the square brackets from the values and then create a new dictionary from that.
You need to include the curly braces in the captured group; your code omits these:
r"""({.*})"""
Note that the parentheses are now around the {...}.
Alternatively, if there is always a message and one dash before the dictionary, you can use str.partition() to split that off:
paypal_msg = paypal_meta[0].partition(' - ')[-1]
or limit your splitting with str.split() to just once:
paypal_msg = paypal_meta[0].split(' - ', 1)[-1]
Try to avoid putting Python structures like that into the database in the first place; store JSON in a separate column rather than a string dump of the object.
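Until the storage format is fixed, the pieces above can be combined; a minimal sketch, assuming paypal_meta[0] holds a log string like the one in the question:
import ast

# assuming paypal_meta[0] is the full "Successfully confirmed payment - {...}" string
paypal_msg = paypal_meta[0].partition(' - ')[-1]   # everything after the first " - "
paypal = ast.literal_eval(paypal_msg)              # safe literal parsing instead of eval
paypal_dict = {k: str(v[0]) for k, v in paypal.items()}  # unwrap the one-element lists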

Python: converting a list of dictionaries to json

I have a list of dictionaries, looking some thing like this:
list = [{'id': 123, 'data': 'qwerty', 'indices': [1,10]}, {'id': 345, 'data': 'mnbvc', 'indices': [2,11]}]
and so on. There may be more documents in the list. I need to convert these to one JSON document that can be returned via bottle, and I cannot understand how to do this. Please help. I saw similar questions on this website, but I couldn't understand the solutions there.
Use the json library:
import json
json.dumps(list)
By the way, you might consider renaming the variable list: list is the built-in name for the list type, and shadowing it can lead to unexpected behaviour or buggy code.
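Since the list should be returned via bottle: bottle auto-converts a dict return value to JSON but not a list, so a common pattern is to dump it yourself and set the content type. A minimal sketch (route path and variable names are placeholders, not from the question):
import json
from bottle import response, route, run

records = [{'id': 123, 'data': 'qwerty', 'indices': [1, 10]},
           {'id': 345, 'data': 'mnbvc', 'indices': [2, 11]}]

@route('/records')
def get_records():
    response.content_type = 'application/json'
    return json.dumps(records)

# run(host='localhost', port=8080)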
import json
list = [{'id': 123, 'data': 'qwerty', 'indices': [1,10]}, {'id': 345, 'data': 'mnbvc', 'indices': [2,11]}]
Write to a JSON file:
with open('/home/ubuntu/test.json', 'w') as fout:
    json.dump(list, fout)
Read the JSON file:
with open(r"/home/ubuntu/test.json", "r") as read_file:
    data = json.load(read_file)
print(data)
# list = [{'id': 123, 'data': 'qwerty', 'indices': [1,10]}, {'id': 345, 'data': 'mnbvc', 'indices': [2,11]}]
response_json = ("{ \"response_json\":" + str(list_of_dict)+ "}").replace("\'","\"")
response_json = json.dumps(response_json)
response_json = json.loads(response_json)
To convert it to a single dictionary with keys of your choosing, you can use the code below.
data = ListOfDict.copy()
PrecedingText = "Obs_"
ListOfDictAsDict = {}
for i in range(len(data)):
    ListOfDictAsDict[PrecedingText + str(i)] = data[i]
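For example, with the list from the question (ListOfDict standing in for it), this produces a single dict that json.dumps() can then serialize:
# ListOfDictAsDict == {'Obs_0': {'id': 123, 'data': 'qwerty', 'indices': [1, 10]},
#                      'Obs_1': {'id': 345, 'data': 'mnbvc', 'indices': [2, 11]}}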
