Exporting response.txt to csv file - python

I'm trying to parse data that I receive from a curl request through python. The data is in the following format:
{'meta': {'from': '1520812800',
'granularity': 'daily',
'to': '1523232000',
'total': 6380},
'data': [{'count': 660, 'date': '2018-03-12'},
{'count': 894, 'date': '2018-03-13'}]}
Originally, the data was returned as a string probably because I used response.text to retrieve the data. I converted the string into a dictionary using ast.literal_eval(response.text). I managed to parse the "data" key and ignore "meta". So currently,
data = [{"date":"2018-03-12","count":660},{"date":"2018-03-13","count":894}]}`.
I am trying to export the values for "date" and "count" to a csv file. In my code I have this:
keys = data[0].keys()
print("----------KEYS:---------")
print keys #['date','count']
print("------------------------")
with open('mycsv.csv','wb') as output_file:
thewriter = csv.DictWriter(output_file, fieldnames =
['date','count'])
thewriter.writeheader()
thewriter.writerow(data)
However, python does not like this and gives me an error:
Traceback (most recent call last):
File "curlparser.py", line 45, in <module>
thewriter.writerow(data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 152, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 148, in _dict_to_list
+ ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: {"date":"2018-03-12","count":660},{"date":"2018-03-13","count":894}

Related

tinydb: how to update a document with a condition

Hi I would like to update some documents that match a query. So for each document I would like to update the field 'parent_id' if and only if this document have an ID greater then i.e. 6
for result in results:
db.update(set('parent_id', current_element_id),
result.get('id') > current_element_id )
error:
Traceback (most recent call last):
File "debug.py", line 569, in <module>
convertxml=parse(xmlfile, force_list=('interface',))
File "debug.py", line 537, in parse
parser.Parse(xml_input, True)
File "..\Modules\pyexpat.c", line 468, in EndElement
File "debug.py", line 411, in endElement
db.update(set('parent_id', current_element_id), result.get('id') > current_element_id )
File "C:\ProgramData\Miniconda3\lib\site-packages\tinydb\database.py", line 477, in update
cond, doc_ids
File "C:\ProgramData\Miniconda3\lib\site-packages\tinydb\database.py", line 319, in process_elements
if cond(data[doc_id]):
TypeError: 'bool' object is not callable
example of document that should be update:
...,
{'URI': 'http://www.john-doe/',
'abbr': 'IDD',
'affiliation': 'USA',
'closed': False,
'created': '2018-06-01 22:49:02.927347',
'element': 'distrbtr',
'id': 7,
'parent_id': None
},...
In the documentation of tinydb I see that I can use set. Otherwise if I don't use Set it will update all the document db.update(dict) which I don't want to.
Using the Docs using write_back to replace part of a document is better
>>> docs = db.search(User.name == 'John')
[{name: 'John', age: 12}, {name: 'John', age: 44}]
>>> for doc in docs:
... doc['name'] = 'Jane'
>>> db.write_back(docs) # Will update the documents we retrieved
>>> docs = db.search(User.name == 'John')
[]
>>> docs = db.search(User.name == 'Jane')
[{name: 'Jane', age: 12}, {name: 'Jane', age: 44}]
implementing it to my situation
for result in results:
if result['parent_id'] != None:
result['parent_id'] = current_element_id
db.write_back(results)

python re extract items within curly brakets

I have a large dataset with such as in my sql such as:
("Successfully confirmed payment - {'PAYMENTINFO_0_TRANSACTIONTYPE': ['expresscheckout'], 'ACK': ['Success'], 'PAYMENTINFO_0_PAYMENTTYPE': ['instant'], 'PAYMENTINFO_0_RECEIPTID': ['1037-5147-8706-9322'], 'PAYMENTINFO_0_REASONCODE': ['None'], 'SHIPPINGOPTIONISDEFAULT': ['false'], 'INSURANCEOPTIONSELECTED': ['false'], 'CORRELATIONID': ['1917b2c0e5a51'], 'PAYMENTINFO_0_TAXAMT': ['0.00'], 'PAYMENTINFO_0_TRANSACTIONID': ['3U4531424V959583R'], 'PAYMENTINFO_0_ACK': ['Success'], 'PAYMENTINFO_0_PENDINGREASON': ['authorization'], 'PAYMENTINFO_0_AMT': ['245.40'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITY': ['Eligible'], 'PAYMENTINFO_0_ERRORCODE': ['0'], 'TOKEN': ['EC-82295469MY6979044'], 'VERSION': ['95.0'], 'SUCCESSPAGEREDIRECTREQUESTED': ['true'], 'BUILD': ['7507921'], 'PAYMENTINFO_0_CURRENCYCODE': ['GBP'], 'TIMESTAMP': ['2013-08-29T09:15:59Z'], 'PAYMENTINFO_0_SECUREMERCHANTACCOUNTID': ['XFQALBN3EBE8S'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITYTYPE': ['ItemNotReceivedEligible,UnauthorizedPaymentEligible'], 'PAYMENTINFO_0_ORDERTIME': ['2013-08-29T09:15:59Z'], 'PAYMENTINFO_0_PAYMENTSTATUS': ['Pending']}", 1L, datetime.datetime(2013, 8, 29, 11, 15, 59))
I use the following regex to pull the data from the first item list that is within curley brackets
paypal_meta_re = re.compile(r"""\{(.*)\}""").findall
This works as expected, but when I try to remove the square brackets from the dictionary values, I get an error.
here is my code:
paypal_meta = get_paypal(order_id)
paypal_msg_re = paypal_meta_re(paypal_meta[0])
print type(paypal_msg_re), len(paypal_msg_re)
paypal_str = ''.join(map(str, paypal_msg_re))
print paypal_str, type(paypal_str)
paypal = ast.literal_eval(paypal_str)
paypal_dict = {}
for k, v in paypal.items():
paypal_dict[k] = str(v[0])
if paypal_dict:
namespace['payment_gateway'] = { 'paypal' : paypal_dict}
and here is the traceback:
Traceback (most recent call last):
File "users.py", line 383, in <module>
orders = get_orders(user_id, mongo_user_id, address_book_list)
File "users.py", line 290, in get_orders
paypal = ast.literal_eval(paypal_str)
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/ast.py", line 49, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/usr/local/Cellar/python/2.7.2/lib/python2.7/ast.py", line 37, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
'PAYMENTINFO_0_TRANSACTIONTYPE': ['expresscheckout'], 'ACK': ['Success'], 'PAYMENTINFO_0_PAYMENTTYPE': ['instant'], 'PAYMENTINFO_0_RECEIPTID': ['2954-8480-1689-8177'], 'PAYMENTINFO_0_REASONCODE': ['None'], 'SHIPPINGOPTIONISDEFAULT': ['false'], 'INSURANCEOPTIONSELECTED': ['false'], 'CORRELATIONID': ['5f22a1dddd174'], 'PAYMENTINFO_0_TAXAMT': ['0.00'], 'PAYMENTINFO_0_TRANSACTIONID': ['36H74806W7716762Y'], 'PAYMENTINFO_0_ACK': ['Success'], 'PAYMENTINFO_0_PENDINGREASON': ['authorization'], 'PAYMENTINFO_0_AMT': ['86.76'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITY': ['PartiallyEligible'], 'PAYMENTINFO_0_ERRORCODE': ['0'], 'TOKEN': ['EC-6B957889FK3149915'], 'VERSION': ['95.0'], 'SUCCESSPAGEREDIRECTREQUESTED': ['true'], 'BUILD': ['6680107'], 'PAYMENTINFO_0_CURRENCYCODE': ['GBP'], 'TIMESTAMP': ['2013-07-02T13:02:50Z'], 'PAYMENTINFO_0_SECUREMERCHANTACCOUNTID': ['XFQALBN3EBE8S'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITYTYPE': ['ItemNotReceivedEligible'], 'PAYMENTINFO_0_ORDERTIME': ['2013-07-02T13:02:49Z'], 'PAYMENTINFO_0_PAYMENTSTATUS': ['Pending']
^
SyntaxError: invalid syntax
where as if i split the code, using
msg, paypal_msg = paypal_meta[0].split(' - ')
paypal = ast.literal_eval(paypal_msg)
paypal_dict = {}
for k, v in paypal.items():
paypal_dict[k] = str(v[0])
if paypal_dict:
namespace['payment_gateway'] = { 'paypal' : paypal_dict}
insert = orders_dbs.save(namespace)
return insert
This works, but I can't use it, as some of the records returned don't split and is not accurate.
Basically, I want to take the items in the curly brackets and remove the square brackets from the values and then create a new dictionary from that.
You need to include the curly braces, your code omits these:
r"""({.*})""")
Note that the parentheses are now around the {...}.
Alternatively, if there is always a message and one dash before the dictionary, you can use str.partition() to split that off:
paypal_msg = paypal_meta[0].partition(' - ')[-1]
or limit your splitting with str.split() to just once:
paypal_msg = paypal_meta[0].split(' - ', 1)[-1]
Try to avoid putting Python structures like that into the database instead; store JSON in a separate column rather than a string dump of the object.

bulk update failing when document has attachments?

I am performing the following operation:
Prepare some documents: docs = [ doc1, doc2, ... ]. The documents have maybe attachments
I POST to _bulk_docs the list of documents
I get an Exception > Problems updating list of documents (length = 1): (500, ('badarg', '58'))
My bulk_docs is (in this case just one):
[ { '_attachments': { 'image.png': { 'content_type': 'image/png',
'data': '...'}},
'_id': '08b8fc66-cd90-47a1-9053-4f6fefabdfe3',
'_rev': '15-ff3d0e8baa56e5ad2fac4937264fb3f6',
'docmeta': { 'created': '2013-10-01 14:48:24.311257',
'updated': [ '2013-10-01 14:48:24.394157',
'2013-12-11 08:19:47.271812',
'2013-12-11 08:25:05.662546',
'2013-12-11 10:38:56.116145']},
'org_id': 45345,
'outputs_id': None,
'properties': { 'auto-t2s': False,
'content_type': 'image/png',
'lang': 'es',
'name': 'dfasdfasdf',
'text': 'erwerwerwrwerwr'},
'subtype': 'voicemail-st',
'tags': ['RRR-ccc-dtjkqx'],
'type': 'recording'}]
This is the detailed exception:
Traceback (most recent call last):
File "portal_support_ut.py", line 470, in test_UpdateDoc
self.ps.UpdateDoc(self.org_id, what, doc_id, new_data)
File "/home/gonvaled/projects/new-wavilon-portal/python_modules/wav/ps/complex_ops.py", line 349, in UpdateDoc
success, doc = database.UpdateDoc(doc_id, new_data)
File "/home/gonvaled/projects/new-wavilon-portal/python_modules/wav/cdb/core/updater.py", line 38, in UpdateDoc
res = self.SaveDoc(doc_id, doc)
File "/home/gonvaled/projects/new-wavilon-portal/python_modules/wav/cdb/core/saver.py", line 88, in SaveDoc
else : self.bulk_append(doc, flush, update_revision)
File "/home/gonvaled/projects/new-wavilon-portal/python_modules/wav/cdb/core/bulker.py", line 257, in bulk_append
if force_send or flush or not self.timer.use_timer : self.BulkSend(show_progress=True)
File "/home/gonvaled/projects/new-wavilon-portal/python_modules/wav/cdb/core/bulker.py", line 144, in BulkSend
results = self.UpdateDocuments(self.bulk)
File "/home/gonvaled/projects/new-wavilon-portal/python_modules/wav/cdb/core/bulker.py", line 67, in UpdateDocuments
results = self.db.update(bulkdocs)
File "/home/gonvaled/.virtualenvs/python2.7.3-wavilon1/local/lib/python2.7/site-packages/couchdb/client.py", line 764, in update
_, _, data = self.resource.post_json('_bulk_docs', body=content)
File "/home/gonvaled/.virtualenvs/python2.7.3-wavilon1/local/lib/python2.7/site-packages/couchdb/http.py", line 527, in post_json
**params)
File "/home/gonvaled/.virtualenvs/python2.7.3-wavilon1/local/lib/python2.7/site-packages/couchdb/http.py", line 546, in _request_json
headers=headers, **params)
File "/home/gonvaled/.virtualenvs/python2.7.3-wavilon1/local/lib/python2.7/site-packages/couchdb/http.py", line 542, in _request
credentials=self.credentials)
File "/home/gonvaled/.virtualenvs/python2.7.3-wavilon1/local/lib/python2.7/site-packages/couchdb/http.py", line 398, in request
raise ServerError((status, error))
ServerError: (500, ('badarg', '58'))
What does that badarg mean? Is it possible to send attachments when doing _bulk_docs?
The solution is to remove the data:image/png;base64, prefix before sending the attachment to coudhdb.
For a python alternative, see here.
This was answered in our mailing list, repeating the answer here for completeness.
The data field was malformed in two ways;
'data': '....'
The 'data:image/png;base64,' prefix is wrong, and the base64 part was malformed (CouchDB obviously needs to decode it to store it).

Best practice to store column-based data before to write to CSV in Python

I have the current code in Python 3:
import csv
if __name__ == '__main__':
sp500_data = [
{
'company': 'GOOGLE',
'headquarters': 'GOOGLEPLEX',
'industry': 'ADS',
'sector': 'TECH',
'symbol': 'GOOG'
},
{
'company': 'HEWLPA',
'headquarters': 'WHATEVER',
'industry': 'HARDWARE',
'sector': 'TECH',
'symbol': 'HP'
}
]
myfile = open("D:/test.csv", 'w', newline='')
wr = csv.DictWriter(myfile, delimiter='\t', quoting=csv.QUOTE_ALL, fieldnames=sp500_data[0].keys)
for sp500_company in sp500_data:
wr.writerow(sp500_company)
However this gives the following error:
Traceback (most recent call last):
File "D:\DEV\BlueTS\src\tsRetriever\dataRetriever\test.py", line 24, in <module>
wr.writerow(sp500_company)
File "C:\Python33\lib\csv.py", line 153, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "C:\Python33\lib\csv.py", line 146, in _dict_to_list
wrong_fields = [k for k in rowdict if k not in self.fieldnames]
File "C:\Python33\lib\csv.py", line 146, in <listcomp>
wrong_fields = [k for k in rowdict if k not in self.fieldnames]
TypeError: argument of type 'builtin_function_or_method' is not iterable
I would like to understand what I am doing wrong, and in addition to this, I would like to know what is the best way in Python to store column-based data which was originally organised in tables.
You forgot to call the .keys() method:
wr = csv.DictWriter(myfile, delimiter='\t', quoting=csv.QUOTE_ALL,
fieldnames=sp500_data[0].keys())
Note the () after sp500_data[0].keys; .keys is not an attribute, it is a method.
Using a csv.DictWriter() is an excellent method to turn data already in dictionary format into CSV data.

How to push into array nested in dictionary?

I want to create a mongodb to store the homework results, I create a homework which is a dictionary storing the results' array of each subject.
import pymongo
DBCONN = pymongo.Connection("127.0.0.1", 27017)
TASKSINFO = DBCONN.tasksinfo
_name = "john"
taskid = TASKSINFO.tasksinfo.insert(
{"name": _name,
"homework": {"bio": [], "math": []}
})
TASKSINFO.tasksinfo.update({"_id": taskid},
{"$push": {"homework.bio", 92}})
When I tried to push some information to db, there's error:
Traceback (most recent call last):
File "mongo_push_demo.py", line 13, in <module>
{"$push": {"homework.bio", 92}})
File "/usr/local/lib/python2.7/dist-packages/pymongo-2.5-py2.7-linux-i686.egg/pymongo/collection.py", line 479, in update
check_keys, self.__uuid_subtype), safe)
File "/usr/local/lib/python2.7/dist-packages/pymongo-2.5-py2.7-linux-i686.egg/pymongo/message.py", line 110, in update
encoded = bson.BSON.encode(doc, check_keys, uuid_subtype)
File "/usr/local/lib/python2.7/dist-packages/pymongo-2.5-py2.7-linux-i686.egg/bson/__init__.py", line 567, in encode
return cls(_dict_to_bson(document, check_keys, uuid_subtype))
File "/usr/local/lib/python2.7/dist-packages/pymongo-2.5-py2.7-linux-i686.egg/bson/__init__.py", line 476, in _dict_to_bson
elements.append(_element_to_bson(key, value, check_keys, uuid_subtype))
File "/usr/local/lib/python2.7/dist-packages/pymongo-2.5-py2.7-linux-i686.egg/bson/__init__.py", line 466, in _element_to_bson
type(value))
bson.errors.InvalidDocument: cannot convert value of type <type 'set'> to bson
{"$push": {"homework.bio", 92}})
It should be :, not ,.
{'a', 1} is a set of two elements in Python, that's why you get the error.

Categories

Resources