Pymongo throws "OperationFailure: The key is too long" - python

After upgrading MongoDB from version 3.4.17 to 4.0.12, the following error is thrown from time to time when executing mongo.conn.COLLECTION_NAME.find({'email': ['1'] * 2000}), where email is a string field:
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 787, in count
cmd, self.__collation, session=self.__session)
File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 1600, in _count
_cmd, self._read_preference_for(session), session)
File "/usr/local/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 1465, in _retryable_read
return func(session, server, sock_info, slave_ok)
File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 1594, in _cmd
session=session)
File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 250, in _command
user_fields=user_fields)
File "/usr/local/lib/python2.7/dist-packages/pymongo/pool.py", line 613, in command
user_fields=user_fields)
File "/usr/local/lib/python2.7/dist-packages/pymongo/network.py", line 167, in command
parse_write_concern_error=parse_write_concern_error)
File "/usr/local/lib/python2.7/dist-packages/pymongo/helpers.py", line 159, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
OperationFailure: The key is too long
The returned error code is: 17280
Python: 2.7.16
Pymongo: 3.9.0
The query execution plan is:
{'executionStats': {'allPlansExecution': [],
'executionStages': {'advanced': 0,
'alreadyHasObj': 0,
'docsExamined': 0,
'executionTimeMillisEstimate': 0,
'filter': {'email': {'$eq': ['1',...,'1']}},
'inputStage': {'advanced': 0,
'direction': 'forward',
'dupsDropped': 0,
'dupsTested': 0,
'executionTimeMillisEstimate': 0,
'indexBounds': {'email': ['["1", "1"]', '[[ "1", ... , "1" ]]']},
'indexName': 'email_1',
'indexVersion': 2,
'invalidates': 0,
'isEOF': 1,
'isMultiKey': False,
'isPartial': False,
'isSparse': False,
'isUnique': False,
'keyPattern': {'email': 1},
'keysExamined': 1,
'multiKeyPaths': {'email': []},
'nReturned': 0,
'needTime': 1,
'needYield': 0,
'restoreState': 0,
'saveState': 0,
'seeks': 2,
'seenInvalidated': 0,
'stage': 'IXSCAN',
'works': 2},
'invalidates': 0,
'isEOF': 1,
'nReturned': 0,
'needTime': 1,
'needYield': 0,
'restoreState': 0,
'saveState': 0,
'stage': 'FETCH',
'works': 2},
'executionSuccess': True,
'executionTimeMillis': 0,
'nReturned': 0,
'totalDocsExamined': 0,
'totalKeysExamined': 1},
'ok': 1.0,
'queryPlanner': {'indexFilterSet': False,
'namespace': 'DATABASE_NAME.COLLECTION_NAME',
'parsedQuery': {'email': {'$eq': ['1', ... , '1']}},
'plannerVersion': 1,
'rejectedPlans': [],
'winningPlan': {'filter': {'email': {'$eq': ['1', ... , '1']}},
'inputStage': {'direction': 'forward',
'indexBounds': {'email': ['["1", "1"]', '[[ "1", ... ,"1" ]]']},
'indexName': 'email_1',
'indexVersion': 2,
'isMultiKey': False,
'isPartial': False,
'isSparse': False,
'isUnique': False,
'keyPattern': {'email': 1},
'multiKeyPaths': {'email': []},
'stage': 'IXSCAN'},
'stage': 'FETCH'}},
'serverInfo': {'gitVersion': '5776e3cbf9e7afe86e6b29e22520ffb6766e95d4',
'host': '*****',
'port': 27037,
'version': '4.0.12'}}
The error is probably caused by comparing against a very large value (a long list), but I couldn't find any documentation about it.
Is it a restriction that was added in version 4.0?

Related

Type Error for Path Data. Must be list or null

I am getting the following error when attempting to convert json data to a dataframe. I have successfully used this same method to convert json to a dataframe with similar data in the same script.
The full error:
TypeError: {'success': True, 'data': {'data1': 1, 'data2': 1, 'data3': 1, 'data4': True, 'data5': 0, 'data6': 0, 'data7': False, 'data8': 'ABC', 'start_date': '2000-04-14', 'end_date': '2000-09-23', 'data9': None, 'add_time': '2000-07-12 23:00:11', 'update_time': '2000-06-1420:18:55', 'data10': 1, 'data11': 'custom', 'data12': None}}
has non list value
{'data1': 1, 'data2': 1, 'data3': 1, 'data4': True, 'data5': 0, 'data6': 0, 'data7': False, 'data8': 'ABC', 'start_date': '2000-04-14', 'end_date': '2000-09-23', 'data9': None, 'add_time': '2000-07-12 23:00:11', 'update_time': '2000-06-1420:18:55', 'data10': 1, 'data11': 'custom', 'data12': None}
for path data. Must be list or null.
The function:
def get_subscriptions(id, df):
    subscriptions_params = {'api_token': 'abc'}
    subscriptions_headers = {'Content-Type': 'application/json'}
    subscriptions_response = requests.get('https://url/{}'.format(id), params=subscriptions_params,
                                          headers=subscriptions_headers)
    subscriptions_data = subscriptions_response.json()
    subscriptions_temp_df = pd.json_normalize(subscriptions_data, record_path=['data'])
I do the exact same thing with a similar (but actually more complex) piece of data with no problems. An example of the response that works:
{'success': True, 'data': [{'data1': 1, 'data2': {'data3': 1, 'name': 'name', 'email': 'email#email.com', 'data4': 0, 'data5': None, 'data6': False, 'data7': 1}, 'data8': {'data9': 1, 'name': 'name', 'email': 'email#email.com', 'data10': 0, 'data11': None, 'data12': True, 'data13': 1}, 'data14': {'data15': True, 'name': 'name' .... etc.
The working response is actually massive, whereas for the failing one the error message includes the full data.
I removed the actual data but did not change its types: strings inside single quotes stand in for other strings, 1s for other numbers, and so on.
Any ideas why one succeeds and the other fails?
I do not know what the issue/difference is, but this small change works:
def get_subscriptions(id, df):
    subscriptions_params = {'api_token': 'abc'}
    subscriptions_headers = {'Content-Type': 'application/json'}
    subscriptions_response = requests.get('https://url/{}'.format(id), params=subscriptions_params, headers=subscriptions_headers)
    subscriptions_data = subscriptions_response.json()
    subscriptions_data = subscriptions_data['data']
    subscriptions_temp_df = pd.json_normalize(subscriptions_data)
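The likely difference is visible in the two responses: the working one has a list under 'data', while the failing one has a single dict there, and record_path must resolve to a list (or null), as the error message says. Below is a minimal sketch of normalizing both shapes before handing them to pd.json_normalize. The helper name as_records is hypothetical, and the sample payloads are shortened stand-ins for the real responses.

```python
def as_records(payload, key="data"):
    """Return payload[key] as a list of dicts, whatever shape the API used."""
    value = payload.get(key)
    if value is None:
        return []              # missing or null: no records
    if isinstance(value, dict):
        return [value]         # single object: wrap it in a list
    return list(value)         # already a list of records


# Shortened stand-ins for the failing (dict) and working (list) responses.
single = {"success": True, "data": {"data1": 1, "data8": "ABC"}}
many = {"success": True, "data": [{"data1": 1}, {"data1": 2}]}

print(as_records(single))
print(as_records(many))
```

Either result can then be passed as pd.json_normalize(as_records(subscriptions_data)), with no record_path needed.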

Problems matching values from nested dictionary

In TestRail, I have created several testruns. When I execute:
test_runs = client.send_get('get_runs/1')
pprint(test_runs)
The following results are returned:
{'_links': {'next': None, 'prev': None},
'limit': 250,
'offset': 0,
'runs': [{'assignedto_id': None,
'blocked_count': 0,
'completed_on': None,
'config': None,
'config_ids': [],
'created_by': 1,
'created_on': 1651790693,
'custom_status1_count': 0,
'custom_status2_count': 0,
'custom_status3_count': 0,
'custom_status4_count': 0,
'custom_status5_count': 0,
'custom_status6_count': 0,
'custom_status7_count': 0,
'description': None,
'failed_count': 1,
'id': 13,
'include_all': False,
'is_completed': False,
'milestone_id': None,
'name': '2022-05-05-testrun',
'passed_count': 2,
'plan_id': None,
'project_id': 1,
'refs': None,
'retest_count': 0,
'suite_id': 1,
'untested_count': 0,
'updated_on': 1651790693,
'url': 'https://xxxxxxxxxx.testrail.io/index.php?/runs/view/13'},
{'assignedto_id': None,
'blocked_count': 0,
'completed_on': 1650989972,
'config': None,
'config_ids': [],
'created_by': 5,
'created_on': 1650966329,
'custom_status1_count': 0,
'custom_status2_count': 0,
'custom_status3_count': 0,
'custom_status4_count': 0,
'custom_status5_count': 0,
'custom_status6_count': 0,
'custom_status7_count': 0,
'description': None,
'failed_count': 0,
'id': 9,
'include_all': False,
'is_completed': True,
'milestone_id': None,
'name': 'This is a new test run',
'passed_count': 0,
'plan_id': None,
'project_id': 1,
'refs': None,
'retest_count': 0,
'suite_id': 1,
'untested_count': 3,
'updated_on': 1650966329,
'url': 'https://xxxxxxxxxx.testrail.io/index.php?/runs/view/9'}],
'size': 2}
In my code, I am trying to scan through all of the resulting testruns, locate the testrun I'm interested in by matching the testrun name, and then have the ID for the testrun returned.
from pprint import pprint
from testrail import *
class connecting():
    def connectPostRun(self):
        client = APIClient('https://xxxxxxxxxx.testrail.io')
        client.user = 'abc#abc.com'
        client.password = 'abc123'
        test_run_name = '2022-05-05-testrun'
        test_runs = client.send_get('get_runs/1')
        pprint(test_runs)
        for test_run in test_runs:
            if test_run['name'] == test_run_name:
                run_id = test_run['id']
                break
        return run_id
        pprint(run_id)
c=connecting()
c.connectPostRun()
Executing the code as is now results in the following error:
if test_run['name'] == test_run_name:
TypeError: string indices must be integers
You're looping over the wrong part of the data structure that the function returned. The loop for test_run in test_runs: only iterates over the keys of the top-level dictionary ("_links", "limit", etc.), which are strings. That is why indexing test_run['name'] raises the TypeError.
You want to loop over test_runs['runs'], which will give you the dictionaries with the "name" keys you're matching against. Try making your loop look like this:
for test_run in test_runs['runs']:
    if test_run['name'] == test_run_name:
        run_id = test_run['id']
        break
I'd note that there's a potential problem in this code: if you never find a matching run, the run_id variable will never be assigned, so the return statement at the end of the function will raise an exception. If you think that could ever happen, you should probably either set a default value or raise your own exception (with a clearer message) when you get into that situation.
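The corrected loop and the no-match case can be combined in one small helper. This is a minimal sketch rather than TestRail API code: find_run_id and RunNotFound are hypothetical names, and the response dict is a shortened stand-in for what get_runs returned in the question.

```python
class RunNotFound(LookupError):
    """Raised when no test run has the requested name (hypothetical)."""


def find_run_id(response, name):
    """Return the id of the first run in response['runs'] called `name`."""
    for run in response["runs"]:
        if run["name"] == name:
            return run["id"]
    # Reaching this point means the loop finished without a match.
    raise RunNotFound("no test run named {!r}".format(name))


# Shortened stand-in for the get_runs response shown in the question.
response = {
    "runs": [
        {"id": 13, "name": "2022-05-05-testrun"},
        {"id": 9, "name": "This is a new test run"},
    ],
    "size": 2,
}

print(find_run_id(response, "2022-05-05-testrun"))  # prints 13
```

Returning from inside the loop avoids the unassigned run_id entirely, and the custom exception makes the failure explicit to the caller.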

Successfully insert multiple documents into MongoDB [Python]

I have the following piece of code in python:
def pushHashtagPosts(hashtagPosts):
    from bson.json_util import loads
    myclient = pymongo.MongoClient(mongoUri)
    mydb = myclient["myDB"]
    mycol = mydb["hashtags"]
    data = loads(hashtagPosts)
    posts = mycol.insert_many(data)
The content of hashtagPosts looks something like this:
hashtagPosts = [{'hashtag': '###!', 'PostHashHex': '13fc9904028fb62490a3b5dc2111689376e52a06dc636c3322cfa16e33a41398', 'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a43'}, 'PostHashHex': '13fc9904028fb62490a3b5dc2111689376e52a06dc636c3322cfa16e33a41398', 'PosterPublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'ParentStakeID': '', 'Body': 'Need hashtags ####! Or else it’s a bit difficult to create personal brand and niche on this platform. \n\nDevs are u listening?', 'ImageURLs': [], 'RecloutedPostEntryResponse': None, 'CreatorBasisPoints': 0, 'StakeMultipleBasisPoints': 12500, 'TimestampNanos': 1.6177643730879583e+18, 'IsHidden': False, 'ConfirmationBlockHeight': 13248, 'InMempool': False, 'StakeEntry': {'TotalPostStake': 0, 'StakeList': []}, 'StakeEntryStats': {'TotalStakeNanos': 0, 'TotalStakeOwedNanos': 0, 'TotalCreatorEarningsNanos': 0, 'TotalFeesBurnedNanos': 0, 'TotalPostStakeNanos': 0}, 'ProfileEntryResponse': None, 'Comments': None, 'LikeCount': 5, 'PostEntryReaderState': None, 'InGlobalFeed': False, 'IsPinned': False, 'PostExtraData': {}, 'CommentCount': 2, 'RecloutCount': 0, 'ParentPosts': None, 'PublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'Username': ''}},
{'hashtag': 'investementstrategy', 'PostHashHex': '92f2d08ac8f2b47fe5868b748c7f472e13ad12c284bb0e327cf317b4c2514f83', 'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a3f'}, 'PostHashHex': '92f2d08ac8f2b47fe5868b748c7f472e13ad12c284bb0e327cf317b4c2514f83', 'PosterPublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'ParentStakeID': '', 'Body': 'Don’t say that you are going to buy ur own coin to have a steady growth of ur coin \U0001fa99. That doesn’t show the strength of ur investment nor the coin.📉📈 Strength lies in others believing in ur talent, creativity and passion enough to invest in U. 🚀🚀🚀\n#InvestementStrategy', 'ImageURLs': [], 'RecloutedPostEntryResponse': None, 'CreatorBasisPoints': 0, 'StakeMultipleBasisPoints': 12500, 'TimestampNanos': 1.6178065064906166e+18, 'IsHidden': False, 'ConfirmationBlockHeight': 13397, 'InMempool': False, 'StakeEntry': {'TotalPostStake': 0, 'StakeList': []}, 'StakeEntryStats': {'TotalStakeNanos': 0, 'TotalStakeOwedNanos': 0, 'TotalCreatorEarningsNanos': 0, 'TotalFeesBurnedNanos': 0, 'TotalPostStakeNanos': 0}, 'ProfileEntryResponse': None, 'Comments': None, 'LikeCount': 2, 'PostEntryReaderState': None, 'InGlobalFeed': False, 'IsPinned': False, 'PostExtraData': {}, 'CommentCount': 1, 'RecloutCount': 0, 'ParentPosts': None, 'PublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'Username': ''}},
{'hashtag': 'productivity', 'PostHashHex': 'c8fabd96f5d624d06ec8d23e90de19cf07ad4b6696dac321fda815c3000fbf1b', 'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a3d'}, 'PostHashHex': 'c8fabd96f5d624d06ec8d23e90de19cf07ad4b6696dac321fda815c3000fbf1b', 'PosterPublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'ParentStakeID': '', 'Body': 'What is the most productive thing u have done in last 24 hours apart from Bitclout???\n\n\U0001f9d0πŸ˜πŸ™ŒπŸΌ #productivity', 'ImageURLs': [], 'RecloutedPostEntryResponse': None, 'CreatorBasisPoints': 0, 'StakeMultipleBasisPoints': 12500, 'TimestampNanos': 1.6178362054980055e+18, 'IsHidden': False, 'ConfirmationBlockHeight': 13487, 'InMempool': False, 'StakeEntry': {'TotalPostStake': 0, 'StakeList': []}, 'StakeEntryStats': {'TotalStakeNanos': 0, 'TotalStakeOwedNanos': 0, 'TotalCreatorEarningsNanos': 0, 'TotalFeesBurnedNanos': 0, 'TotalPostStakeNanos': 0}, 'ProfileEntryResponse': None, 'Comments': None, 'LikeCount': 30, 'PostEntryReaderState': None, 'InGlobalFeed': True, 'IsPinned': False, 'PostExtraData': {}, 'CommentCount': 59, 'RecloutCount': 0, 'ParentPosts': None, 'PublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'Username': ''}}]
When I try to insert this data into MongoDB with insert_many(), I get the following error:
File "test.py", line X, in pushHashtagPosts
data = loads(hashtagPosts)
TypeError: the JSON object must be str, bytes or bytearray, not 'list'
I added the line data = loads(hashtagPosts) based on the answer to "bson.errors.InvalidDocument: key '$oid' must not start with '$' trying to insert document with pymongo", because without it I was getting the following error:
bson.errors.InvalidDocument: key '$oid' must not start with '$'
How can I resolve this and successfully insert many documents into the collection?
Your issue is that hashtagPosts is a list but loads expects to work on a string.
So, working backwards, the question becomes: how did you construct hashtagPosts in the first place? As it contains $oid values, it looks like output from dumps; but the output of dumps is a string, not a list. So how did it become a list?
If you are creating it manually, then just set it using ObjectId, e.g.
from bson import ObjectId
item = {'_id': ObjectId('608f8eb73718c7977f9c0a43')}
and then you won't need to use loads.
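If the list already exists as Python dicts (for example, because it was parsed with the plain json module), one option is to walk it and swap each {'$oid': ...} wrapper for a real ObjectId before inserting. The following is a minimal sketch under that assumption. replace_oids is a hypothetical helper, not part of bson or PyMongo, and the sample document is a shortened stand-in.

```python
def replace_oids(value, make_id):
    """Recursively replace {'$oid': '<hex>'} dicts with make_id('<hex>')."""
    if isinstance(value, dict):
        if set(value) == {"$oid"}:
            return make_id(value["$oid"])
        return {k: replace_oids(v, make_id) for k, v in value.items()}
    if isinstance(value, list):
        return [replace_oids(v, make_id) for v in value]
    return value  # scalars are returned unchanged


# Shortened stand-in for one entry of hashtagPosts.
posts = [{"hashtag": "productivity",
          "post": {"_id": {"$oid": "608f8eb73718c7977f9c0a3d"}, "LikeCount": 30}}]

# str is only a stand-in so the sketch runs without PyMongo installed;
# in real code, pass bson.ObjectId instead.
cleaned = replace_oids(posts, make_id=str)
print(cleaned[0]["post"]["_id"])
```

With PyMongo available, mycol.insert_many(replace_oids(hashtagPosts, ObjectId)) should then work without calling loads, since no key starting with '$' remains in the documents.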

Using JSON to format and indent website request responses in Python

I am trying to indent and format balance so that it is easier to read. I want to print the response in the style of the expected output below. The balance variable is a tuple. How can I do this?
import bybit
import json
balance = client.Wallet.Wallet_getBalance(coin="BTC").result()
print(balance)
Output:
({'ret_code': 0, 'ret_msg': 'OK', 'ext_code': '', 'ext_info': '', 'result': {'BTC': {'equity': 0.00208347, 'available_balance': 0.00208347, 'used_margin': 0, 'order_margin': 0, 'position_margin': 0, 'occ_closing_fee': 0, 'occ_funding_fee': 0, 'wallet_balance': 0.00208347, 'realised_pnl': 0, 'unrealised_pnl': 0, 'cum_realised_pnl': 8.347e-05, 'given_cash': 0, 'service_cash': 0}}, 'time_now': '1616685310.655072', 'rate_limit_status': 118, 'rate_limit_reset_ms': 1616685310652, 'rate_limit': 120}, <bravado.requests_client.RequestsResponseAdapter object at 0x000001F5E92EB048>)
Expected Output:
{
"cross_seq": 11518,
"data": [
{
"price": "2999.00",
"side": "Buy",
"size": 9,
"symbol": "BTCUSD"
},
{
"price": "3001.00",
"side": "Sell",
"size": 10,
"symbol": "BTCUSD"
}
],
"timestamp_e6": 1555647164875373,
"topic": "orderBookL2_25.BTCUSD",
"type": "snapshot"
}
I think you provided the wrong expected output, since its fields don't match those in your actual output. In general, though, if you want a more readable display of a dictionary, you can use the json package:
response = {'ret_code': 0, 'ret_msg': 'OK', 'ext_code': '', 'ext_info': '', 'result': {'BTC': {'equity': 0.00208347, 'available_balance': 0.00208347, 'used_margin': 0, 'order_margin': 0, 'position_margin': 0, 'occ_closing_fee': 0, 'occ_funding_fee': 0, 'wallet_balance': 0.00208347, 'realised_pnl': 0, 'unrealised_pnl': 0, 'cum_realised_pnl': 8.347e-05, 'given_cash': 0, 'service_cash': 0}}, 'time_now': '1616685310.655072', 'rate_limit_status': 118, 'rate_limit_reset_ms': 1616685310652, 'rate_limit': 120}
import json
print(json.dumps(response, indent=4, sort_keys=True))
This will give you the following output:
{
    "ext_code": "",
    "ext_info": "",
    "rate_limit": 120,
    "rate_limit_reset_ms": 1616685310652,
    "rate_limit_status": 118,
    "result": {
        "BTC": {
            "available_balance": 0.00208347,
            "cum_realised_pnl": 8.347e-05,
            "equity": 0.00208347,
            "given_cash": 0,
            "occ_closing_fee": 0,
            "occ_funding_fee": 0,
            "order_margin": 0,
            "position_margin": 0,
            "realised_pnl": 0,
            "service_cash": 0,
            "unrealised_pnl": 0,
            "used_margin": 0,
            "wallet_balance": 0.00208347
        }
    },
    "ret_code": 0,
    "ret_msg": "OK",
    "time_now": "1616685310.655072"
}
Another solution is to use pprint:
import pprint
pprint.pprint(response)
This will give you the following output:
{'ext_code': '',
 'ext_info': '',
 'rate_limit': 120,
 'rate_limit_reset_ms': 1616685310652,
 'rate_limit_status': 118,
 'result': {'BTC': {'available_balance': 0.00208347,
                    'cum_realised_pnl': 8.347e-05,
                    'equity': 0.00208347,
                    'given_cash': 0,
                    'occ_closing_fee': 0,
                    'occ_funding_fee': 0,
                    'order_margin': 0,
                    'position_margin': 0,
                    'realised_pnl': 0,
                    'service_cash': 0,
                    'unrealised_pnl': 0,
                    'used_margin': 0,
                    'wallet_balance': 0.00208347}},
 'ret_code': 0,
 'ret_msg': 'OK',
 'time_now': '1616685310.655072'}
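Import json, then print(json.dumps(balance[0], indent=4)) would get you that format. Note the [0]: balance is a tuple whose second element is the response adapter object, which json.dumps cannot serialize.
You can add the keyword argument sort_keys=True if you want the keys sorted.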
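Because balance in the question is a (payload, response object) tuple, only the first element can be serialized. Here is a minimal sketch of unpacking it before pretty-printing. The tuple is a shortened stand-in, with a bare object() playing the role of the response adapter.

```python
import json

# Stand-in for the (payload, http_response) tuple returned by .result();
# object() plays the role of the non-serializable response adapter.
balance = (
    {"ret_code": 0, "ret_msg": "OK",
     "result": {"BTC": {"equity": 0.00208347, "used_margin": 0}}},
    object(),
)

payload, _http_response = balance   # keep only the dict
pretty = json.dumps(payload, indent=4, sort_keys=True)
print(pretty)
```

json.dumps returns a string, so it has to be printed (or written somewhere) for the indentation to show. Passing the whole tuple would instead raise a TypeError on the second element.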

Python 3.5 Pandas and MongoDB -json_normalize: raise TypeError("data argument can't be an iterator")

I am trying to do data transformation using pandas on Python 3.5.
The data is fetched from MongoDB using MongoClient() and passed to json_normalize.
However, when I execute the code below, it throws the error "data argument can't be an iterator". Any pointers would help.
Sample data:
{'bank_code': 'CID005', 'status': 'Init', 'cpgmid': '7847', 'blaze_transId': 'ZI4YQFFOTGG96ZRUQWZS121111632121509-9173782788741', 'currency': 'INR', 'amount': 7800, 'merchant_trans_id': '121111632121509-9173782788741', 'date_time': datetime.datetime(2016, 11, 11, 14, 1, 14, 44000), 'consumer_mobile': 9999999999.0, 'consumer_email': 'test#test.com', '_id': ObjectId('5825cf2a11eae123023730a9')}
{'bank_code': 'CID001', 'status': 'Init', 'cpgmid': '228', 'blaze_transId': '1rjfeklmg2281610111931334hjlm4j8xwl', 'currency': 'INR', 'amount': 651.4, 'merchant_trans_id': '161111569056', 'date_time': datetime.datetime(2016, 11, 11, 14, 1, 14, 333000), 'consumer_mobile': 9999992399.0, 'consumer_email': 'test#air.com', '_id': ObjectId('5825cf2a11eae123023730af')}
{'bank_code': 'CID001', '_id': ObjectId('5825cf2a097752b55d0f17ac'), 'custom_params': {'suppress_trans': 1}, 'currency': 'INR', 'merchant_trans_id': 'BX819215014788728725757', 'date_time': datetime.datetime(2016, 11, 11, 14, 1, 14, 421000), 'consumer_mobile': 0, 'status': 'Init', 'cpgmid': '1656', 'blaze_transId': '1bygejlxl16561610111931423bkgfe1uxx', 'amount': 577, 'consumer_email': 'p.25#gmail.com'}
Code:
start_datetime1 = (datetime.now() - timedelta(days=1)).replace(hour=18, minute=30, second=00, microsecond=0)
start_datetime2 = (datetime.now() - timedelta(days=0)).replace(hour=18, minute=29, second=59, microsecond=0)
client = MongoClient(host_val, int(port_val))
db = client.cit
transactions_collection = db.transactions
cursor = json_normalize(transactions_collection.find({'date_time': {'$lt': start_datetime2, '$gte': start_datetime1}},
{'_id': 1, 'blaze_transId': 1, 'status': 1, 'merchant_trans_id': 1,
'date_time': 1, 'amount': 1, 'status': 1, 'cpgmid': 1, 'currency': 1,
'status_msg': 1, 'bank_code': 1, 'custom_params.suppress_trans': 1,
'consumer_email': 1,'consumer_mobile': 1}))
df_txn = pd.DataFrame(cursor)
Error:
ERROR:root:Exception in fetch
Traceback (most recent call last):
File "/opt/Analytics-services/ETLservices/transformationService/Blazenet_Txns_Fact.py", line 174, in fetchBlazenetTxnsFromDB
'consumer_email': 1,'consumer_mobile': 1}))
File "/usr/local/lib/python3.5/site-packages/pandas/io/json.py", line 717, in json_normalize
return DataFrame(data)
File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 283, in __init__
raise TypeError("data argument can't be an iterator")
TypeError: data argument can't be an iterator
You need to convert the cursor to a list before passing it to json_normalize.
cursor = transactions_collection.find({'date_time': {'$lt': start_datetime2, '$gte': start_datetime1}},
{'_id': 1, 'blaze_transId': 1, 'status': 1, 'merchant_trans_id': 1,
'date_time': 1, 'amount': 1, 'status': 1, 'cpgmid': 1, 'currency': 1,
'status_msg': 1, 'bank_code': 1, 'custom_params.suppress_trans': 1,
'consumer_email': 1,'consumer_mobile': 1})
df_txn = pd.DataFrame(json_normalize(list(cursor)))
You may also want to look at Monary if you want to avoid converting massive amounts of data to a list.
In addition to Steve's answer, I changed the Mongo query to exclude the fields that were not required instead of listing the fields to include. I did this because custom_params was not getting flattened when I selected it explicitly in the query.
cursor = transactions_collection.find({"date_time": {'$lt': start_datetime2, '$gte': start_datetime1}},{'bankRes':0,'rawDV':0})
df_txn = pd.DataFrame(json_normalize(list(cursor)))
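To illustrate what flattening does to a nested field such as custom_params, here is a stdlib-only sketch. It is not pandas' implementation, only an approximation of the dotted column names that json_normalize produces by default. flatten and fake_cursor are hypothetical names, and the generator stands in for the PyMongo cursor, which is a single-pass iterator and so is materialized with list() first.

```python
def flatten(doc, parent="", sep="."):
    """Flatten nested dicts into one level with dotted keys."""
    flat = {}
    for key, value in doc.items():
        name = parent + sep + key if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def fake_cursor():
    """Generator standing in for a PyMongo cursor (single-pass iterator)."""
    yield {"cpgmid": "1656", "amount": 577, "custom_params": {"suppress_trans": 1}}
    yield {"cpgmid": "228", "amount": 651.4}


# Materialize the iterator first, then flatten each document.
rows = [flatten(doc) for doc in list(fake_cursor())]
print(rows)
```

Each flattened dict corresponds to one row, with custom_params.suppress_trans as its own column where present.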
