python re extract items within curly brackets

I have a large dataset in my SQL database, with rows such as:
("Successfully confirmed payment - {'PAYMENTINFO_0_TRANSACTIONTYPE': ['expresscheckout'], 'ACK': ['Success'], 'PAYMENTINFO_0_PAYMENTTYPE': ['instant'], 'PAYMENTINFO_0_RECEIPTID': ['1037-5147-8706-9322'], 'PAYMENTINFO_0_REASONCODE': ['None'], 'SHIPPINGOPTIONISDEFAULT': ['false'], 'INSURANCEOPTIONSELECTED': ['false'], 'CORRELATIONID': ['1917b2c0e5a51'], 'PAYMENTINFO_0_TAXAMT': ['0.00'], 'PAYMENTINFO_0_TRANSACTIONID': ['3U4531424V959583R'], 'PAYMENTINFO_0_ACK': ['Success'], 'PAYMENTINFO_0_PENDINGREASON': ['authorization'], 'PAYMENTINFO_0_AMT': ['245.40'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITY': ['Eligible'], 'PAYMENTINFO_0_ERRORCODE': ['0'], 'TOKEN': ['EC-82295469MY6979044'], 'VERSION': ['95.0'], 'SUCCESSPAGEREDIRECTREQUESTED': ['true'], 'BUILD': ['7507921'], 'PAYMENTINFO_0_CURRENCYCODE': ['GBP'], 'TIMESTAMP': ['2013-08-29T09:15:59Z'], 'PAYMENTINFO_0_SECUREMERCHANTACCOUNTID': ['XFQALBN3EBE8S'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITYTYPE': ['ItemNotReceivedEligible,UnauthorizedPaymentEligible'], 'PAYMENTINFO_0_ORDERTIME': ['2013-08-29T09:15:59Z'], 'PAYMENTINFO_0_PAYMENTSTATUS': ['Pending']}", 1L, datetime.datetime(2013, 8, 29, 11, 15, 59))
I use the following regex to pull the data from the first item, which is the part within curly brackets:
paypal_meta_re = re.compile(r"""\{(.*)\}""").findall
This works as expected, but when I try to remove the square brackets from the dictionary values, I get an error.
here is my code:
paypal_meta = get_paypal(order_id)
paypal_msg_re = paypal_meta_re(paypal_meta[0])
print type(paypal_msg_re), len(paypal_msg_re)
paypal_str = ''.join(map(str, paypal_msg_re))
print paypal_str, type(paypal_str)
paypal = ast.literal_eval(paypal_str)
paypal_dict = {}
for k, v in paypal.items():
    paypal_dict[k] = str(v[0])
if paypal_dict:
    namespace['payment_gateway'] = { 'paypal' : paypal_dict}
and here is the traceback:
Traceback (most recent call last):
File "users.py", line 383, in <module>
orders = get_orders(user_id, mongo_user_id, address_book_list)
File "users.py", line 290, in get_orders
paypal = ast.literal_eval(paypal_str)
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/ast.py", line 49, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/ast.py", line 37, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
'PAYMENTINFO_0_TRANSACTIONTYPE': ['expresscheckout'], 'ACK': ['Success'], 'PAYMENTINFO_0_PAYMENTTYPE': ['instant'], 'PAYMENTINFO_0_RECEIPTID': ['2954-8480-1689-8177'], 'PAYMENTINFO_0_REASONCODE': ['None'], 'SHIPPINGOPTIONISDEFAULT': ['false'], 'INSURANCEOPTIONSELECTED': ['false'], 'CORRELATIONID': ['5f22a1dddd174'], 'PAYMENTINFO_0_TAXAMT': ['0.00'], 'PAYMENTINFO_0_TRANSACTIONID': ['36H74806W7716762Y'], 'PAYMENTINFO_0_ACK': ['Success'], 'PAYMENTINFO_0_PENDINGREASON': ['authorization'], 'PAYMENTINFO_0_AMT': ['86.76'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITY': ['PartiallyEligible'], 'PAYMENTINFO_0_ERRORCODE': ['0'], 'TOKEN': ['EC-6B957889FK3149915'], 'VERSION': ['95.0'], 'SUCCESSPAGEREDIRECTREQUESTED': ['true'], 'BUILD': ['6680107'], 'PAYMENTINFO_0_CURRENCYCODE': ['GBP'], 'TIMESTAMP': ['2013-07-02T13:02:50Z'], 'PAYMENTINFO_0_SECUREMERCHANTACCOUNTID': ['XFQALBN3EBE8S'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITYTYPE': ['ItemNotReceivedEligible'], 'PAYMENTINFO_0_ORDERTIME': ['2013-07-02T13:02:49Z'], 'PAYMENTINFO_0_PAYMENTSTATUS': ['Pending']
^
SyntaxError: invalid syntax
Whereas if I split the string instead, using
msg, paypal_msg = paypal_meta[0].split(' - ')
paypal = ast.literal_eval(paypal_msg)
paypal_dict = {}
for k, v in paypal.items():
    paypal_dict[k] = str(v[0])
if paypal_dict:
    namespace['payment_gateway'] = { 'paypal' : paypal_dict}
insert = orders_dbs.save(namespace)
return insert
This works, but I can't use it, because some of the records returned don't split this way, so the result is not accurate.
Basically, I want to take the items in the curly brackets and remove the square brackets from the values and then create a new dictionary from that.

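You need to include the curly braces in the captured group; your pattern omits them, so ast.literal_eval() receives the contents without the surrounding {...}:
paypal_meta_re = re.compile(r"""({.*})""").findall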
Note that the parentheses are now around the {...}.
Alternatively, if there is always a message and one dash before the dictionary, you can use str.partition() to split that off:
paypal_msg = paypal_meta[0].partition(' - ')[-1]
or limit your splitting with str.split() to just once:
paypal_msg = paypal_meta[0].split(' - ', 1)[-1]
Try to avoid putting Python structures like that into the database; instead, store JSON in a separate column rather than a string dump of the object.
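Putting the regex fix together with the square-bracket cleanup the question asks for, a minimal sketch might look like the following (Python 3). The helper name extract_paypal_dict and the shortened sample string are made up for illustration, and the sketch assumes every value is a non-empty list, as in the sample record.

```python
import ast
import re

# Capture the braces too, so the match is a complete dict literal.
DICT_RE = re.compile(r"(\{.*\})", re.DOTALL)

def extract_paypal_dict(message):
    """Return the {...} payload of `message` as a flat dict, or {} if absent."""
    match = DICT_RE.search(message)
    if match is None:
        return {}
    parsed = ast.literal_eval(match.group(1))
    # Each value is a one-element list; keep only its first item as a string.
    return {key: str(value[0]) for key, value in parsed.items()}

# Shortened, hypothetical version of the record shown in the question.
sample = "Successfully confirmed payment - {'ACK': ['Success'], 'PAYMENTINFO_0_AMT': ['245.40']}"
result = extract_paypal_dict(sample)
print(result)
```

Because the pattern is greedy, it spans from the first opening brace to the last closing brace, which suits a message that carries a single dictionary.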

Related

eval function doesn't turn dict-like string into dict?

So I have several strings in a DataFrame column looking like this one for example:
{'Free to Play': 17555, 'Multiplayer': 10499, 'FPS': 9248, 'Action': 8188, 'Shooter': 7857, 'Class-Based': 6098, 'Team-Based': 5363, 'Funny': 5155, 'First-Person': 4846, 'Trading': 4512, 'Cartoony': 4240, 'Competitive': 4116, 'Online Co-Op': 4016, 'Co-op': 3920, 'Robots': 3112, 'Comedy': 3049, 'Tactical': 2726, 'Crafting': 2491, 'Cartoon': 2450, 'Moddable': 2315}
I am trying to access the keys of the dict but as it is still a string I wanted to convert it into dictionaries and found people saying that eval can be used for that. And yes when I try like this it works fine and test_dict is of type dict:
test_str = "{'Early Access': 77, 'RPG': 202}"
test_dict = eval(test_str)
Yet when working with the strings in the DataFrame
tags = main_data["tags"]
for taglist in tags:
    taglist = "\"" + taglist + "\""
    tag_dict = eval(taglist)
tag_dict always remains a string and after some strings eval throws errors like these:
File "<string>", line 1
"{'Action': 2681, 'FPS': 2048, 'Multiplayer': 1659, 'Shooter': 1420, 'Classic': 1344, 'Team-Based': 943, 'First-Person': 799, 'Competitive': 790, 'Tactical': 734, "1990's": 564, 'e-sports': 550, 'PvP': 480, 'Military': 367, 'Strategy': 329, 'Score Attack': 200, 'Survival': 192, 'Old School': 164, 'Assassin': 151, '1980s': 144, 'Violent': 40}"
^
SyntaxError: invalid syntax
I thought it might be a problem with the length of the strings, because when using taglist = "\"\"\"" + taglist + "\"\"\"" eval doesn't throw any errors and goes through all the strings, but they are still not converted to a dict and remain str.
Maybe I have made some rookie mistake, or are there better approaches to solving my problem?

Since you're serializing your dict to some kind of external storage, I would use JSON. It's designed for this, whereas eval is ... tricky. And you're actually running code, so whatever someone puts in the database, you're going to run it.
There's one catch. JSON expects double quotes. Since the data is already written to the database as Python code with single quotes around the dictionary keys, you're going to have to convert those to double quotes to make it legal JSON. I'd suggest fixing it once in the database, and then using json going forward.
import json
data_dict = {'Free to Play': 17555, 'Multiplayer': 10499, 'FPS': 9248, 'Action': 8188, 'Shooter': 7857, 'Class-Based': 6098, 'Team-Based': 5363, 'Funny': 5155, 'First-Person': 4846, 'Trading': 4512, 'Cartoony': 4240, 'Competitive': 4116, 'Online Co-Op': 4016, 'Co-op': 3920, 'Robots': 3112, 'Comedy': 3049, 'Tactical': 2726, 'Crafting': 2491, 'Cartoon': 2450, 'Moddable': 2315}
data_dict.update({'Early Access': 77, 'RPG': 202})
data_string = json.dumps(data_dict)
# write it to a file or database
# read it later, we'll assume that's data_string
data_dict = json.loads(data_string)
print (data_dict['RPG'])
database_string = "{'Free to Play': 17555, 'Multiplayer': 10499, 'FPS': 9248, 'Action': 8188, 'Shooter': 7857, 'Class-Based': 6098, 'Team-Based': 5363, 'Funny': 5155, 'First-Person': 4846, 'Trading': 4512, 'Cartoony': 4240, 'Competitive': 4116, 'Online Co-Op': 4016, 'Co-op': 3920, 'Robots': 3112, 'Comedy': 3049, 'Tactical': 2726, 'Crafting': 2491, 'Cartoon': 2450, 'Moddable': 2315}"
# this isn't a general purpose converter, but works for this case
# just to change the single quotes to double quotes
converted_to_legal_json = database_string.replace("'", '"')
data_dict = json.loads(converted_to_legal_json)
print (data_dict['Multiplayer'])
I can probably correct your eval if you want, but can't do it right this second. But like I said, not recommended. And I'd use ast.literal_eval rather than actually executing it with eval, for security reasons.
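For reference, a minimal sketch of the ast.literal_eval route mentioned above. It assumes each cell already holds a Python dict literal as text, so no extra quotes are added around it. Wrapping the text in another pair of quotes, as the loop in the question does, makes the evaluated result a string again, or a syntax error once the text itself contains a double quote, as with the "1990's" key. The list tag_strings is a hypothetical stand-in for the DataFrame column.

```python
import ast

# Hypothetical stand-in for main_data["tags"]; each item is a dict literal as text.
tag_strings = [
    "{'Early Access': 77, 'RPG': 202}",
    "{'Action': 2681, \"1990's\": 564, 'e-sports': 550}",
]

# literal_eval only accepts literals (dicts, lists, strings, numbers, ...),
# so it will not run arbitrary code the way eval can.
tag_dicts = [ast.literal_eval(text) for text in tag_strings]

for tag_dict in tag_dicts:
    print(type(tag_dict).__name__, sorted(tag_dict))
```

With pandas, the same function could be applied per cell, for example main_data["tags"].apply(ast.literal_eval), assuming every cell is a valid literal. Note that the simple quote replacement shown earlier would not produce valid JSON for a key such as "1990's", which contains a single quote.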

AttributeError: 'dict' object has no attribute 'split'

I am trying to run this code where data of a dictionary is saved in a separate csv file.
Here is the dict:
body = {
'dont-ask-for-email': 0,
'action': 'submit_user_review',
'post_id': 76196,
'email': email_random(),
'subscribe': 1,
'previous_hosting_id': prev_hosting_comp_random(),
'fb_token': '',
'title': review_title_random(),
'summary': summary_random(),
'score_pricing': star_random(),
'score_userfriendly': star_random(),
'score_support': star_random(),
'score_features': star_random(),
'hosting_type': hosting_type_random(),
'author': name_random(),
'social_link': '',
'site': '',
'screenshot[image][]': '',
'screenshot[description][]': '',
'user_data_process_agreement': 1,
'user_email_popup': '',
'subscribe_popup': 1,
'email_asked': 1
}
Now this is the code to write in a CSV file and finally save it:
columns = []
rows = []
chunks = body.split('}')
for chunk in chunks:
    row = []
    if len(chunk)>1:
        entry = chunk.replace('{','').strip().split(',')
        for e in entry:
            item = e.strip().split(':')
            if len(item)==2:
                row.append(item[1])
                if chunks.index(chunk)==0:
                    columns.append(item[0])
        rows.append(row)
df = pd.DataFrame(rows, columns = columns)
df.head()
df.to_csv ('r3edata.csv', index = False, header = True)
but this is the error I get:
Traceback (most recent call last):
File "codeOffshoreupdated.py", line 125, in <module>
chunks = body.split('}')
AttributeError: 'dict' object has no attribute 'split'
I know that dict has no attribute named split but how do I fix it?
Edit:
format of the CSV I want:
dont-ask-for-email, action, post_id, email, subscribe, previous_hosting_id, fb_token, title, summary, score_pricing, score_userfriendly, score_support, score_features, hosting_type,author, social_link, site, screenshot[image][],screenshot[description][],user_data_process_agreement,user_email_popup,subscribe_popup,email_asked
0,'submit_user_review',76196,email_random(),1,prev_hosting_comp_random(),,review_title_random(),summary_random(),star_random(),star_random(),star_random(),star_random(),hosting_type_random(),name_random(),,,,,1,,1,1
Note: all the functions mentioned return values.
Edit2:
I am picking emails from the email_random() function like this:
def email_random():
    with open('emaillist.txt') as emails:
        read_emails = csv.reader(emails, delimiter = '\n')
        return random.choice(list(read_emails))[0]
and the emaillist.txt is like this:
xyz@gmail.com
xya@gmail.com
xyb@gmail.com
xyc@gmail.com
xyd@gmail.com
The other functions pick their data from files in the same way.

Since body is a dictionary, you don't have to do any manual parsing to get it into a CSV format.
If you want the function calls (like email_random()) to be written into the CSV as such, you need to wrap them into quotes (as I have done below). If you want them to resolve as function calls and write the results, you can keep them as they are.
import csv
def email_random():
    return "john@example.com"
body = {
'dont-ask-for-email': 0,
'action': 'submit_user_review',
'post_id': 76196,
'email': email_random(),
'subscribe': 1,
'previous_hosting_id': "prev_hosting_comp_random()",
'fb_token': '',
'title': "review_title_random()",
'summary': "summary_random()",
'score_pricing': "star_random()",
'score_userfriendly': "star_random()",
'score_support': "star_random()",
'score_features': "star_random()",
'hosting_type': "hosting_type_random()",
'author': "name_random()",
'social_link': '',
'site': '',
'screenshot[image][]': '',
'screenshot[description][]': '',
'user_data_process_agreement': 1,
'user_email_popup': '',
'subscribe_popup': 1,
'email_asked': 1
}
with open('example.csv', 'w') as fhandle:
    writer = csv.writer(fhandle)
    items = body.items()
    writer.writerow([key for key, value in items])
    writer.writerow([value for key, value in items])
What we do here is:
with open('example.csv', 'w') as fhandle:
This opens a new file (named example.csv) for writing ('w') and stores the reference in the variable fhandle. If the with statement is not familiar to you, you can learn more about it in PEP 343.
body.items() will return an iterable of tuples (this is done to guarantee dictionary items are returned in the same order). The output of this will look like [('dont-ask-for-email', 0), ('action', 'submit_user_review'), ...].
We can then write all the keys to the first row using a list comprehension, and all the values to the next row.
This results in
dont-ask-for-email,action,post_id,email,subscribe,previous_hosting_id,fb_token,title,summary,score_pricing,score_userfriendly,score_support,score_features,hosting_type,author,social_link,site,screenshot[image][],screenshot[description][],user_data_process_agreement,user_email_popup,subscribe_popup,email_asked
0,submit_user_review,76196,john@example.com,1,prev_hosting_comp_random(),,review_title_random(),summary_random(),star_random(),star_random(),star_random(),star_random(),hosting_type_random(),name_random(),,,,,1,,1,1
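An alternative sketch uses csv.DictWriter, which takes the column names from the dictionary keys and so avoids building the two rows by hand. The shortened body dict is illustrative only, and io.StringIO stands in for a real file so the result can be inspected.

```python
import csv
import io

# Shortened, illustrative version of the body dictionary.
body = {
    'action': 'submit_user_review',
    'post_id': 76196,
    'fb_token': '',
}

buffer = io.StringIO()  # stand-in for open('example.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=list(body))
writer.writeheader()    # first row: the keys
writer.writerow(body)   # second row: the values, in the same column order

csv_text = buffer.getvalue()
print(csv_text)
```

To write several dictionaries with the same keys, writer.writerows(list_of_dicts) could be used in place of the single writerow call.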

tinydb: how to update a document with a condition

Hi, I would like to update some documents that match a query. For each document, I would like to update the field 'parent_id' if and only if the document has an ID greater than, for example, 6.
for result in results:
    db.update(set('parent_id', current_element_id),
              result.get('id') > current_element_id)
error:
Traceback (most recent call last):
File "debug.py", line 569, in <module>
convertxml=parse(xmlfile, force_list=('interface',))
File "debug.py", line 537, in parse
parser.Parse(xml_input, True)
File "..\Modules\pyexpat.c", line 468, in EndElement
File "debug.py", line 411, in endElement
db.update(set('parent_id', current_element_id), result.get('id') > current_element_id )
File "C:\ProgramData\Miniconda3\lib\site-packages\tinydb\database.py", line 477, in update
cond, doc_ids
File "C:\ProgramData\Miniconda3\lib\site-packages\tinydb\database.py", line 319, in process_elements
if cond(data[doc_id]):
TypeError: 'bool' object is not callable
Example of a document that should be updated:
...,
{'URI': 'http://www.john-doe/',
'abbr': 'IDD',
'affiliation': 'USA',
'closed': False,
'created': '2018-06-01 22:49:02.927347',
'element': 'distrbtr',
'id': 7,
'parent_id': None
},...
In the TinyDB documentation I see that I can use set. If I don't use set, db.update(dict) will update all the documents, which I don't want.

According to the docs, using write_back to replace part of a document is better:
>>> docs = db.search(User.name == 'John')
[{name: 'John', age: 12}, {name: 'John', age: 44}]
>>> for doc in docs:
...     doc['name'] = 'Jane'
>>> db.write_back(docs) # Will update the documents we retrieved
>>> docs = db.search(User.name == 'John')
[]
>>> docs = db.search(User.name == 'Jane')
[{name: 'Jane', age: 12}, {name: 'Jane', age: 44}]
Applying it to my situation:
for result in results:
    if result['parent_id'] is not None:
        result['parent_id'] = current_element_id
db.write_back(results)

Exporting response.txt to csv file

I'm trying to parse data that I receive from a curl request through python. The data is in the following format:
{'meta': {'from': '1520812800',
'granularity': 'daily',
'to': '1523232000',
'total': 6380},
'data': [{'count': 660, 'date': '2018-03-12'},
{'count': 894, 'date': '2018-03-13'}]}
Originally, the data was returned as a string probably because I used response.text to retrieve the data. I converted the string into a dictionary using ast.literal_eval(response.text). I managed to parse the "data" key and ignore "meta". So currently,
data = [{"date":"2018-03-12","count":660},{"date":"2018-03-13","count":894}]
I am trying to export the values for "date" and "count" to a csv file. In my code I have this:
keys = data[0].keys()
print("----------KEYS:---------")
print keys #['date','count']
print("------------------------")
with open('mycsv.csv','wb') as output_file:
    thewriter = csv.DictWriter(output_file, fieldnames=['date','count'])
    thewriter.writeheader()
    thewriter.writerow(data)
However, python does not like this and gives me an error:
Traceback (most recent call last):
File "curlparser.py", line 45, in <module>
thewriter.writerow(data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 152, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 148, in _dict_to_list
+ ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: {"date":"2018-03-12","count":660},{"date":"2018-03-13","count":894}
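A likely cause: writerow expects a single dict, but data is a list of dicts, so DictWriter treats each list element as an unknown field name. Passing the list to writerows (or looping and calling writerow once per item) avoids this. A minimal Python 3 sketch, with io.StringIO standing in for the output file:

```python
import csv
import io

data = [
    {"date": "2018-03-12", "count": 660},
    {"date": "2018-03-13", "count": 894},
]

output = io.StringIO()  # stand-in for open('mycsv.csv', 'w', newline='')
writer = csv.DictWriter(output, fieldnames=["date", "count"])
writer.writeheader()
writer.writerows(data)  # one CSV row per dict in the list

csv_text = output.getvalue()
print(csv_text)
```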

dictionary update sequence element #0 has length 3; 2 is required

I want to add lines to the object account.bank.statement.line from another object, but I get the following error:
"dictionary update sequence element #0 has length 3; 2 is required"
Here is my code:
def action_account_line_create(self, cr, uid, ids):
    res = False
    cash_id = self.pool.get('account.bank.statement.line')
    for exp in self.browse(cr, uid, ids):
        company_id = exp.company_id.id
        #statement_id = exp.statement_id.id
        lines = []
        for l in exp.line_ids:
            lines.append((0, 0, {
                'name': l.name,
                'date': l.date,
                'amount': l.amount,
                'type': l.type,
                'statement_id': exp.statement_id.id,
                'account_id': l.account_id.id,
                'account_analytic_id': l.analytic_account_id.id,
                'ref': l.ref,
                'note': l.note,
                'company_id': l.company_id.id
            }))
        inv_id = cash_id.create(cr, uid, lines,context=None)
        res = inv_id
    return res
I changed it to the code shown after the traceback, but then I ran into this error:
File "C:\Program Files (x86)\OpenERP 6.1-20121029-003136\Server\server\.\openerp\workflow\wkf_expr.py", line 68, in execute
File "C:\Program Files (x86)\OpenERP 6.1-20121029-003136\Server\server\.\openerp\workflow\wkf_expr.py", line 58, in _eval_expr
File "C:\Program Files (x86)\OpenERP 6.1-20121029-003136\Server\server\.\openerp\tools\safe_eval.py", line 241, in safe_eval
File "C:\Program Files (x86)\OpenERP 6.1-20121029-003136\Server\server\.\openerp\tools\safe_eval.py", line 108, in test_expr
File "<string>", line 0
^
SyntaxError: unexpected EOF while parsing
Code:
def action_account_line_create(self, cr, uid, ids, context=None):
    res = False
    cash_id = self.pool.get('account.bank.statement.line')
    for exp in self.browse(cr, uid, ids):
        company_id = exp.company_id.id
        lines = []
        for l in exp.line_ids:
            res = cash_id.create ( cr, uid, {
                'name': l.name,
                'date': l.date,
                'amount': l.amount,
                'type': l.type,
                'statement_id': exp.statement_id.id,
                'account_id': l.account_id.id,
                'account_analytic_id': l.analytic_account_id.id,
                'ref': l.ref,
                'note': l.note,
                'company_id': l.company_id.id
            }, context=None)
    return res

This error is raised because you are trying to update a dict object from a sequence (list or tuple) with the wrong structure: every element must be a (key, value) pair.
cash_id.create(cr, uid, lines,context=None) tries to convert lines into a dict object, and each element of lines has three items:
(0, 0, {
'name': l.name,
'date': l.date,
'amount': l.amount,
'type': l.type,
'statement_id': exp.statement_id.id,
'account_id': l.account_id.id,
'account_analytic_id': l.analytic_account_id.id,
'ref': l.ref,
'note': l.note,
'company_id': l.company_id.id
})
Remove the second zero from this tuple to properly convert it into a dict object.
To test it yourself, try this in a Python shell:
>>> l=[(0,0,{'h':88})]
>>> a={}
>>> a.update(l)
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    a.update(l)
ValueError: dictionary update sequence element #0 has length 3; 2 is required
>>> l=[(0,{'h':88})]
>>> a.update(l)
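The same behaviour as a small script: dict.update accepts another dict or an iterable of (key, value) pairs, and raises this ValueError for elements of any other length. The field names and values below are made up for illustration.

```python
# An iterable of 2-item (key, value) pairs is accepted.
pairs = [('name', 'Fee'), ('amount', 12.5)]
values = {}
values.update(pairs)
print(values)

# A 3-item element, like the (0, 0, {...}) tuples in the question, is rejected.
triples = [(0, 0, {'name': 'Fee'})]
try:
    {}.update(triples)
except ValueError as exc:
    message = str(exc)
    print(message)
```

So if the called method expects a plain dict of field values, passing the dict itself rather than a list of tuples sidesteps the conversion entirely.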

I was getting this error when I was updating the dictionary with the wrong syntax: a set literal (comma) instead of a dict literal (colon).
Try with this:
lineItem.values.update({attribute: value})
instead of
lineItem.values.update({attribute, value})

Not really an answer to the specific question, but if there are others, like me, who are getting this error in FastAPI and end up here:
It is probably because your route response has a value that can't be JSON serialised by jsonable_encoder. For me it was WKBElement: https://github.com/tiangolo/fastapi/issues/2366
Like in the issue, I ended up just removing the value from the output.

One of the fast ways to create a dict from equal-length tuples:
>>> t1 = ('a', 'b', 'c', 'd')
>>> t2 = (1, 2, 3, 4)
>>> dict(zip(t1, t2))
{'a': 1, 'b': 2, 'c': 3, 'd': 4}

I got dictionary update sequence element #0 has length 3; 2 is required
When I was trying to convert a dict to a list using .values()
Solved it by using .items()
list(dict(new_row.items()))

To anyone following Patrick Collins' video [1] on smart contract development who runs into this error: in the brownie-config.yaml file I had an '=' instead of a '-' in the following line:
...
compiler:
  solc:
    remappings:
      = '@openzeppelin=OpenZeppelin/openzeppelin-contracts@4.2.0'
The first '=' should be '-'. The second one is as it should be.
reference:
[1]: https://www.youtube.com/watch?v=M576WGiDBdQ
