Successfully insert multiple documents into MongoDB [Python]

I have the following piece of code in Python:
def pushHashtagPosts(hashtagPosts):
    from bson.json_util import loads
    myclient = pymongo.MongoClient(mongoUri)
    mydb = myclient["myDB"]
    mycol = mydb["hashtags"]
    data = loads(hashtagPosts)
    posts = mycol.insert_many(data)
The content of hashtagPosts looks something like this:
hashtagPosts = [{'hashtag': '###!', 'PostHashHex': '13fc9904028fb62490a3b5dc2111689376e52a06dc636c3322cfa16e33a41398', 'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a43'}, 'PostHashHex': '13fc9904028fb62490a3b5dc2111689376e52a06dc636c3322cfa16e33a41398', 'PosterPublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'ParentStakeID': '', 'Body': 'Need hashtags ####! Or else it’s a bit difficult to create personal brand and niche on this platform. \n\nDevs are u listening?', 'ImageURLs': [], 'RecloutedPostEntryResponse': None, 'CreatorBasisPoints': 0, 'StakeMultipleBasisPoints': 12500, 'TimestampNanos': 1.6177643730879583e+18, 'IsHidden': False, 'ConfirmationBlockHeight': 13248, 'InMempool': False, 'StakeEntry': {'TotalPostStake': 0, 'StakeList': []}, 'StakeEntryStats': {'TotalStakeNanos': 0, 'TotalStakeOwedNanos': 0, 'TotalCreatorEarningsNanos': 0, 'TotalFeesBurnedNanos': 0, 'TotalPostStakeNanos': 0}, 'ProfileEntryResponse': None, 'Comments': None, 'LikeCount': 5, 'PostEntryReaderState': None, 'InGlobalFeed': False, 'IsPinned': False, 'PostExtraData': {}, 'CommentCount': 2, 'RecloutCount': 0, 'ParentPosts': None, 'PublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'Username': ''}},
{'hashtag': 'investementstrategy', 'PostHashHex': '92f2d08ac8f2b47fe5868b748c7f472e13ad12c284bb0e327cf317b4c2514f83', 'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a3f'}, 'PostHashHex': '92f2d08ac8f2b47fe5868b748c7f472e13ad12c284bb0e327cf317b4c2514f83', 'PosterPublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'ParentStakeID': '', 'Body': 'Don’t say that you are going to buy ur own coin to have a steady growth of ur coin \U0001fa99. That doesn’t show the strength of ur investment nor the coin.📉📈 Strength lies in others believing in ur talent, creativity and passion enough to invest in U. 🚀🚀🚀\n#InvestementStrategy', 'ImageURLs': [], 'RecloutedPostEntryResponse': None, 'CreatorBasisPoints': 0, 'StakeMultipleBasisPoints': 12500, 'TimestampNanos': 1.6178065064906166e+18, 'IsHidden': False, 'ConfirmationBlockHeight': 13397, 'InMempool': False, 'StakeEntry': {'TotalPostStake': 0, 'StakeList': []}, 'StakeEntryStats': {'TotalStakeNanos': 0, 'TotalStakeOwedNanos': 0, 'TotalCreatorEarningsNanos': 0, 'TotalFeesBurnedNanos': 0, 'TotalPostStakeNanos': 0}, 'ProfileEntryResponse': None, 'Comments': None, 'LikeCount': 2, 'PostEntryReaderState': None, 'InGlobalFeed': False, 'IsPinned': False, 'PostExtraData': {}, 'CommentCount': 1, 'RecloutCount': 0, 'ParentPosts': None, 'PublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'Username': ''}},
{'hashtag': 'productivity', 'PostHashHex': 'c8fabd96f5d624d06ec8d23e90de19cf07ad4b6696dac321fda815c3000fbf1b', 'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a3d'}, 'PostHashHex': 'c8fabd96f5d624d06ec8d23e90de19cf07ad4b6696dac321fda815c3000fbf1b', 'PosterPublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'ParentStakeID': '', 'Body': 'What is the most productive thing u have done in last 24 hours apart from Bitclout???\n\n\U0001f9d0😏🙌🏼 #productivity', 'ImageURLs': [], 'RecloutedPostEntryResponse': None, 'CreatorBasisPoints': 0, 'StakeMultipleBasisPoints': 12500, 'TimestampNanos': 1.6178362054980055e+18, 'IsHidden': False, 'ConfirmationBlockHeight': 13487, 'InMempool': False, 'StakeEntry': {'TotalPostStake': 0, 'StakeList': []}, 'StakeEntryStats': {'TotalStakeNanos': 0, 'TotalStakeOwedNanos': 0, 'TotalCreatorEarningsNanos': 0, 'TotalFeesBurnedNanos': 0, 'TotalPostStakeNanos': 0}, 'ProfileEntryResponse': None, 'Comments': None, 'LikeCount': 30, 'PostEntryReaderState': None, 'InGlobalFeed': True, 'IsPinned': False, 'PostExtraData': {}, 'CommentCount': 59, 'RecloutCount': 0, 'ParentPosts': None, 'PublicKeyBase58Check': 'BC1YLhKJZZcPB2WbZSSekFF19UshsmmPoEjtEqrYakzusLmL25xxAJv', 'Username': ''}}]
When I try to insert this data into MongoDB with insert_many(), I get the following error:
  File "test.py", line X, in pushHashtagPosts
    data = loads(hashtagPosts)
TypeError: the JSON object must be str, bytes or bytearray, not 'list'
I added the line 'data = loads(hashtagPosts)' based on the solution to the question "bson.errors.InvalidDocument: key '$oid' must not start with '$' trying to insert document with pymongo", because without it I was getting the following error:
bson.errors.InvalidDocument: key '$oid' must not start with '$'
How can I resolve this and successfully insert many documents into the collection?

Your issue is that hashtagPosts is a list, but loads expects a string.
So working backwards, the question becomes: how did you construct hashtagPosts in the first place? As it contains $oid values, it looks like output from dumps; but the output of dumps is a string, not a list. So how did it become a list?
If you are creating it manually, then just set the _id using ObjectId, e.g.
from bson import ObjectId
item = {'_id': ObjectId('608f8eb73718c7977f9c0a43')}
and then you won't need to use loads.
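If the list already exists with {'$oid': ...} wrappers (for example because it was built from parsed Extended JSON), one option is to walk the structure and swap each wrapper for a real ObjectId before calling insert_many. The sketch below is only an illustration and not part of the original answer: replace_oids and its make_oid parameter are hypothetical names, and a tuple-building stand-in is used so the example runs without pymongo installed. In real code you would pass bson.ObjectId instead.

```python
def replace_oids(value, make_oid):
    """Recursively replace {'$oid': '<hex>'} wrappers with make_oid('<hex>').

    The input is not modified; a converted copy is returned.
    """
    if isinstance(value, dict):
        if set(value) == {'$oid'}:
            return make_oid(value['$oid'])
        return {key: replace_oids(item, make_oid) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_oids(item, make_oid) for item in value]
    return value


# Trimmed-down version of one entry from the question.
posts = [{'hashtag': 'productivity',
          'post': {'_id': {'$oid': '608f8eb73718c7977f9c0a3d'}, 'LikeCount': 30}}]

# Assumption: with pymongo installed you would pass bson.ObjectId here.
# The lambda is a stand-in so this sketch runs without the driver.
converted = replace_oids(posts, lambda hex_id: ('ObjectId', hex_id))
print(converted[0]['post']['_id'])
```

With pymongo available, the call would look like replace_oids(hashtagPosts, ObjectId), and the result could then be passed to mycol.insert_many(). Assuming the list is JSON-serialisable, round-tripping it through json.dumps and bson.json_util.loads should achieve the same conversion.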

Related

Type Error for Path Data. Must be list or null

I am getting the following error when attempting to convert json data to a dataframe. I have successfully used this same method to convert json to a dataframe with similar data in the same script.
The full error:
TypeError: {'success': True, 'data': {'data1': 1, 'data2': 1, 'data3': 1, 'data4': True, 'data5': 0, 'data6': 0, 'data7': False, 'data8': 'ABC', 'start_date': '2000-04-14', 'end_date': '2000-09-23', 'data9': None, 'add_time': '2000-07-12 23:00:11', 'update_time': '2000-06-1420:18:55', 'data10': 1, 'data11': 'custom', 'data12': None}}
has non list value
{'data1': 1, 'data2': 1, 'data3': 1, 'data4': True, 'data5': 0, 'data6': 0, 'data7': False, 'data8': 'ABC', 'start_date': '2000-04-14', 'end_date': '2000-09-23', 'data9': None, 'add_time': '2000-07-12 23:00:11', 'update_time': '2000-06-1420:18:55', 'data10': 1, 'data11': 'custom', 'data12': None}
for path data. Must be list or null.
The function:
def get_subscriptions(id, df):
    subscriptions_params = {'api_token': 'abc'}
    subscriptions_headers = {'Content-Type': 'application/json'}
    subscriptions_response = requests.get('https://url/{}'.format(id), params=subscriptions_params,
                                          headers=subscriptions_headers)
    subscriptions_data = subscriptions_response.json()
    subscriptions_temp_df = pd.json_normalize(subscriptions_data, record_path=['data'])
I do the exact same thing with a similar (but actually more complex) piece of data with no problems. An example of the response that works:
{'success': True, 'data': [{'data1': 1, 'data2': {'data3': 1, 'name': 'name', 'email': 'email@email.com', 'data4': 0, 'data5': None, 'data6': False, 'data7': 1}, 'data8': {'data9': 1, 'name': 'name', 'email': 'email@email.com', 'data10': 0, 'data11': None, 'data12': True, 'data13': 1}, 'data14': {'data15': True, 'name': 'name' .... etc.
This one is actually massive, whereas for the one with issues the error includes the full length of the data.
I removed the actual data but did not change the types. Strings inside single quotes stand in for other strings, 1s stand in for other numbers, etc.
Any ideas why one succeeds and the other fails?

I do not know what the issue/difference is, but this small change works:
def get_subscriptions(id, df):
    subscriptions_params = {'api_token': 'abc'}
    subscriptions_headers = {'Content-Type': 'application/json'}
    subscriptions_response = requests.get('https://url/{}'.format(id), params=subscriptions_params, headers=subscriptions_headers)
    subscriptions_data = subscriptions_response.json()
    subscriptions_data = subscriptions_data['data']
    subscriptions_temp_df = pd.json_normalize(subscriptions_data)
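Judging from the error text, the likely difference is that record_path must point at a list of records: in the working response 'data' is a list of dicts, while in the failing one it is a single dict. A minimal sketch of a normalising step follows. The helper name as_records is hypothetical, and the sample payloads are cut-down versions of the ones in the question.

```python
def as_records(payload, key='data'):
    """Return payload[key] as a list of records.

    A single dict is wrapped in a list, a list is returned unchanged,
    and a missing or None value becomes an empty list.
    """
    value = payload.get(key)
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


single = {'success': True, 'data': {'data1': 1, 'data8': 'ABC'}}
many = {'success': True, 'data': [{'data1': 1}, {'data1': 2}]}

print(as_records(single))
print(as_records(many))
```

The result is always a list of dicts, so it can be handed to pd.json_normalize(as_records(subscriptions_data)) whichever shape the API returns.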

Unable to access the key of a dictionary in python

I'm trying to access a key of a dictionary here
{'application_id': '467377486141980682',
 'attachments': [],
 'author': {'avatar': '25cb9058944599f9f1ba15279f9e4a8f',
            'bot': True,
            'discriminator': '0000',
            'id': '964810500611375104',
            'username': 'Sparky99'},
 'channel_id': '790967460039491644',
 'components': [],
 'content': '73561',
 'edited_timestamp': None,
 'embeds': [],
 'flags': 0,
 'id': '976330290698006528',
 'mention_everyone': False,
 'mention_roles': [],
 'mentions': [],
 'pinned': False,
 'timestamp': '2022-05-18T03:48:00.642000+00:00',
 'tts': False,
 'type': 0,
 'webhook_id': '964810500611375104'}
I can access all of these except the username.
When I run dic.get("username"), it returns None as if the key doesn't exist, but all the other elements are accessible using dic.get().
Note: Tell me if I need to add more info to the question.

username isn't a key in that dictionary. There's a key called author whose value is a dictionary and in there, you can find username.
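As a short illustration of the nested lookup, here is a sketch using a trimmed copy of the dictionary from the question. The variable names are illustrative only.

```python
message = {
    'author': {'bot': True, 'id': '964810500611375104', 'username': 'Sparky99'},
    'content': '73561',
    'id': '976330290698006528',
}

# 'username' is not a top-level key, so this lookup yields None.
top_level = message.get('username')

# Go through 'author' first. The empty-dict default avoids an AttributeError
# if a message ever arrives without an 'author' entry.
username = message.get('author', {}).get('username')
print(username)
```

If 'author' is guaranteed to be present, message['author']['username'] does the same thing and fails loudly with a KeyError when the assumption is wrong.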

Problems matching values from nested dictionary

In TestRail, I have created several testruns. When I execute:
test_runs = client.send_get('get_runs/1')
pprint(test_runs)
The following results are returned:
{'_links': {'next': None, 'prev': None},
 'limit': 250,
 'offset': 0,
 'runs': [{'assignedto_id': None,
           'blocked_count': 0,
           'completed_on': None,
           'config': None,
           'config_ids': [],
           'created_by': 1,
           'created_on': 1651790693,
           'custom_status1_count': 0,
           'custom_status2_count': 0,
           'custom_status3_count': 0,
           'custom_status4_count': 0,
           'custom_status5_count': 0,
           'custom_status6_count': 0,
           'custom_status7_count': 0,
           'description': None,
           'failed_count': 1,
           'id': 13,
           'include_all': False,
           'is_completed': False,
           'milestone_id': None,
           'name': '2022-05-05-testrun',
           'passed_count': 2,
           'plan_id': None,
           'project_id': 1,
           'refs': None,
           'retest_count': 0,
           'suite_id': 1,
           'untested_count': 0,
           'updated_on': 1651790693,
           'url': 'https://xxxxxxxxxx.testrail.io/index.php?/runs/view/13'},
          {'assignedto_id': None,
           'blocked_count': 0,
           'completed_on': 1650989972,
           'config': None,
           'config_ids': [],
           'created_by': 5,
           'created_on': 1650966329,
           'custom_status1_count': 0,
           'custom_status2_count': 0,
           'custom_status3_count': 0,
           'custom_status4_count': 0,
           'custom_status5_count': 0,
           'custom_status6_count': 0,
           'custom_status7_count': 0,
           'description': None,
           'failed_count': 0,
           'id': 9,
           'include_all': False,
           'is_completed': True,
           'milestone_id': None,
           'name': 'This is a new test run',
           'passed_count': 0,
           'plan_id': None,
           'project_id': 1,
           'refs': None,
           'retest_count': 0,
           'suite_id': 1,
           'untested_count': 3,
           'updated_on': 1650966329,
           'url': 'https://xxxxxxxxxx.testrail.io/index.php?/runs/view/9'}],
 'size': 2}
In my code, I am trying to scan through all of the resulting testruns, locate the testrun I'm interested in by matching the testrun name, and then have the ID for the testrun returned.
from pprint import pprint
from testrail import *
class connecting():
    def connectPostRun(self):
        client = APIClient('https://xxxxxxxxxx.testrail.io')
        client.user = 'abc@abc.com'
        client.password = 'abc123'
        test_run_name = '2022-05-05-testrun'
        test_runs = client.send_get('get_runs/1')
        pprint(test_runs)
        for test_run in test_runs:
            if test_run['name'] == test_run_name:
                run_id = test_run['id']
                break
        return run_id
        pprint(run_id)
c=connecting()
c.connectPostRun()
Executing the code as is now results in the following error:
    if test_run['name'] == test_run_name:
TypeError: string indices must be integers

You're looping over the wrong part of the data structure that the function returned. The loop for test_run in test_runs: only iterates over the keys of the top-level dictionary ("_links", "limit", etc.), which are strings; indexing a string with 'name' is what raises the TypeError.
You want to loop over test_runs['runs'], which will give you the dictionaries with the "name" keys you're matching against. Try making your loop look like this:
for test_run in test_runs['runs']:
    if test_run['name'] == test_run_name:
        run_id = test_run['id']
        break
I'd note that there's a potential problem in this code: if you never find a matching run, the run_id variable will never be assigned, so the return statement at the end of the function will raise an exception. If you think that could ever happen, you should probably either set a default value or raise your own exception (with a clearer message) in that situation.
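One way to handle the "no match" case is sketched below, assuming the response has the shape shown in the question. The function name find_run_id and the choice of LookupError are illustrative, not something TestRail's client provides.

```python
def find_run_id(test_runs, run_name):
    """Return the id of the run called run_name.

    Raises LookupError with a descriptive message when no run matches,
    instead of failing later on an unassigned variable.
    """
    for test_run in test_runs['runs']:
        if test_run['name'] == run_name:
            return test_run['id']
    raise LookupError(f"no test run named {run_name!r}")


# Trimmed-down copy of the response printed in the question.
response = {
    'runs': [{'id': 13, 'name': '2022-05-05-testrun'},
             {'id': 9, 'name': 'This is a new test run'}],
    'size': 2,
}

run_id = find_run_id(response, '2022-05-05-testrun')
print(run_id)
```

Returning from inside the loop removes the need for break and for a separate run_id variable that might never be set.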

How to retrieve a link from a Discord message?

I'm trying to create a program that reads messages from a Discord bot and retrieves links from those messages.
Here's the code:
import requests
import json
from bs4 import builder
import bs4
def retrieve_messages(channelid):
    headers = {
        'authorization': 'YOUR_DISCORD_TOKEN'
    }
    r = requests.get(f'https://discord.com/api/v9/channels/{channelid}/messages', headers=headers)
    jsonn = json.loads(r.text)
    for value in jsonn:
        print(value, '\n')
retrieve_messages('563699841377763348')
Here's the output:
{'id': '908857015412084796', 'type': 0, 'content': '<@&624528614330859520>', 'channel_id': '563699841377763348', 'author': {'id': '749499357761503284', 'username': 'shift', 'avatar': 'de9cd6f3224e660a4b6906a89fc2bc15', 'discriminator': '6125', 'public_flags': 0, 'bot': True}, 'attachments': [], 'embeds': [], 'mentions': [], 'mention_roles': ['624528614330859520'], 'pinned': False, 'mention_everyone': False, 'tts': False, 'timestamp': '2021-11-12T23:13:18.221000+00:00', 'edited_timestamp': None, 'flags': 0, 'components': []}
{'id': '908857014430629898', 'type': 0, 'content': '', 'channel_id': '563699841377763348', 'author': {'id':
'749499357761503284', 'username': 'shift', 'avatar': 'de9cd6f3224e660a4b6906a89fc2bc15', 'discriminator': '6125', 'public_flags': 0, 'bot': True}, 'attachments': [], 'embeds': [{'type': 'rich', 'title': '<:GoldenKey:273763771929853962> Borderlands 1: 5 gold keys', 'description': 'Platform: Universal\nExpires: 30 November,
2021.```\n5J53T-BKJK5-CTXBZ-JJJTJ-WW6F3```Redeem on the [website](https://shift.gearboxsoftware.com/rewards) or in game.\n\n[Source](https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source=json&utm_medium=shift&utm_campaign=automation)', 'color': 16040976}], 'mentions': [], 'mention_roles': [], 'pinned': False, 'mention_everyone': False, 'tts': False, 'timestamp': '2021-11-12T23:13:17.987000+00:00', 'edited_timestamp': None, 'flags': 1, 'components': []}
In the output there are two links, and I need to save the second one to a variable. How can I do that?

This is most easily done by treating the response body as text and scanning it with a regex to find the URLs.
Solution
The variable test_case_data is the response body in text form, as a string.
import re
regex = r"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])"
def find_embedded_urls(data):
    return re.finditer(regex, data)
test_case_data = """'id': '908857014430629898', 'type': 0, 'content': '', 'channel_id': '563699841377763348', 'author': {'id':
'749499357761503284', 'username': 'shift', 'avatar': 'de9cd6f3224e660a4b6906a89fc2bc15', 'discriminator': '6125', 'public_flags': 0, 'bot': True}, 'attachments': [], 'embeds': [{'type': 'rich', 'title': '<:GoldenKey:273763771929853962> Borderlands 1: 5 gold keys', 'description': 'Platform: Universal\nExpires: 30 November,
2021.```\n5J53T-BKJK5-CTXBZ-JJJTJ-WW6F3```Redeem on the [website](https://shift.gearboxsoftware.com/rewards) or in game.\n\n[Source](https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source=json&utm_medium=shift&utm_campaign=automation)', 'color': 16040976}], 'mentions': [], 'mention_roles': [], 'pinned': False, 'mention_everyone': False, 'tts': False, 'timestamp': '2021-11-12T23:13:17.987000+00:00', 'edited_timestamp': None, 'flags': 1, 'components': []}"""
# test_case_data = response.text
matches = find_embedded_urls(test_case_data)
matches = [match[0] for match in matches] #convert all urls to strings
print(matches) # List of all the urls! Index for whatever one you need
Output
['https://shift.gearboxsoftware.com/rewards', 'https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source=json&utm_medium=shift&utm_campaign=automation']
With the URLs in a list, you can assign any of them to a variable by indexing the list at whatever position you need.
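An alternative sketch, not taken from the answer above, is to work on the parsed message dictionaries rather than the raw text and to search only the fields that carry links. The helper name urls_in_message and the simplified pattern are assumptions made for illustration. The sample message is a trimmed copy of the second message in the question.

```python
import re

# Deliberately simple pattern: a URL runs until whitespace or a closing
# parenthesis, which suits Markdown-style [text](url) links.
URL_PATTERN = re.compile(r"https?://[^\s)]+")


def urls_in_message(message):
    """Collect URLs from a message's content and embed descriptions, in order."""
    texts = [message.get('content') or '']
    texts += [embed.get('description') or '' for embed in message.get('embeds', [])]
    return [url for text in texts for url in URL_PATTERN.findall(text)]


message = {
    'content': '',
    'embeds': [{
        'type': 'rich',
        'description': 'Redeem on the [website](https://shift.gearboxsoftware.com/rewards) or in game.\n\n'
                       '[Source](https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/'
                       '?utm_source=json&utm_medium=shift&utm_campaign=automation)',
    }],
}

urls = urls_in_message(message)
second_link = urls[1]  # the second URL found in the message
print(second_link)
```

Restricting the search to 'content' and the embed descriptions avoids accidentally matching URL-like strings elsewhere in the payload, though the pattern would need widening for URLs that legitimately contain a closing parenthesis.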

How do I only call the dictionary value within a list if it meets a condition?

I am trying to call all the values from a dictionary that is within a list if the value of the key is within a separate list.
For example, I have this list of dictionaries:
status = [{'name': 'Carrousel', 'wait': 0, 'isOpen': True, 'single_rider': None},
          {'name': 'Balloon Flite', 'wait': 0, 'isOpen': True, 'single_rider': None},
          {'name': 'Skyrush', 'wait': 0, 'isOpen': False, 'single_rider': None},
          {'name': 'SooperDooperLooper', 'wait': 5, 'isOpen': True, 'single_rider': None},
          {'name': 'Fahrenheit', 'wait': 20, 'isOpen': True, 'single_rider': None},
          {'name': 'Dummy', 'wait': 0, 'isOpen': False, 'single_rider': None}]
I also have this list:
route = ['Skyrush', 'SooperDooperLooper', 'Carrousel', 'Fahrenheit']
Basically, I want to print out the values of 'wait' in status for the names in route.
I know how to get the values of a row if I know its index, but I'm having trouble selecting only the rows that contain a specific value of 'name'.
My expected result is something like:
0
5
0
20
Those are the 'wait' times of the respective rides, in the order they appear in route.
Thank you! Any help would be greatly appreciated. I looked through other postings but couldn't find anything similar to my question.

Adding to @Chris's answer, to get the expected order:
[d['wait'] for d in sorted(status, key=lambda x: ''.join(route).find(x['name'])) if d['name'] in route]
Output:
[0, 5, 0, 20]

Use a list comprehension:
[d['wait'] for d in status if d['name'] in route]
Output:
[0, 0, 5, 20]

As indicated in @Chris's answer, a list comprehension is the way to go.
However, if you want the order to match the order of your route, this would be a solution:
[next(s for s in status if s['name'] == name)['wait'] for name in route]
That actually gets you [0, 5, 0, 20] instead of [0, 0, 5, 20].
This also directly answers your actual question: how to access an item in a list of dictionaries by referencing a dictionary key.
next(item for item in some_list_of_dicts if item['key'] == 'some value')
This gets you the first item matching the condition.

Another approach: change your input so you can access things more easily.
# Note: pop() removes the "name" key from each dictionary in status.
name_indexed_status = dict(map(lambda x: (x.pop("name"), x), status))
print(str(name_indexed_status))
for r in route:
    print(name_indexed_status[r]["wait"])
{
'Carrousel': {'wait': 0, 'isOpen': True, 'single_rider': None},
'Balloon Flite': {'wait': 0, 'isOpen': True, 'single_rider': None},
'Skyrush': {'wait': 0, 'isOpen': False, 'single_rider': None},
'SooperDooperLooper': {'wait': 5, 'isOpen': True, 'single_rider': None},
'Fahrenheit': {'wait': 20, 'isOpen': True, 'single_rider': None},
'Dummy': {'wait': 0, 'isOpen': False, 'single_rider': None}
}
0
5
0
20
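A variant of the same lookup-table idea that leaves status untouched is sketched below, using a dict comprehension keyed by ride name. The name wait_by_name is illustrative, and the sketch assumes ride names are unique.

```python
status = [{'name': 'Carrousel', 'wait': 0, 'isOpen': True, 'single_rider': None},
          {'name': 'Balloon Flite', 'wait': 0, 'isOpen': True, 'single_rider': None},
          {'name': 'Skyrush', 'wait': 0, 'isOpen': False, 'single_rider': None},
          {'name': 'SooperDooperLooper', 'wait': 5, 'isOpen': True, 'single_rider': None},
          {'name': 'Fahrenheit', 'wait': 20, 'isOpen': True, 'single_rider': None},
          {'name': 'Dummy', 'wait': 0, 'isOpen': False, 'single_rider': None}]
route = ['Skyrush', 'SooperDooperLooper', 'Carrousel', 'Fahrenheit']

# Build the lookup once, without modifying the dictionaries in status.
wait_by_name = {ride['name']: ride['wait'] for ride in status}

# Follow the order of route, not the order of status.
waits = [wait_by_name[name] for name in route]
print(waits)  # [0, 5, 0, 20]
```

Building the dictionary once makes each lookup constant-time, which matters only for long lists but also keeps the intent clear. A name in route that is missing from status would raise a KeyError here.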
