I have the following code. I am trying to access https://api.github.com/users/jtorre94 via the requests library.
import requests
api_url = "https://api.github.com/users"
response = requests.get(api_url, params={'login': 'jtorre94'})
response.json()
However, the response is something I do not recognize at all, as if it were not filtered by the jtorre94 parameter:
[{'login': 'mojombo',
'id': 1,
'node_id': 'MDQ6VXNlcjE=',
'avatar_url': 'https://avatars.githubusercontent.com/u/1?v=4',
'gravatar_id': '',
'url': 'https://api.github.com/users/mojombo',
'html_url': 'https://github.com/mojombo',
'followers_url': 'https://api.github.com/users/mojombo/followers',
'following_url': 'https://api.github.com/users/mojombo/following{/other_user}',
'gists_url': 'https://api.github.com/users/mojombo/gists{/gist_id}',
'starred_url': 'https://api.github.com/users/mojombo/starred{/owner}{/repo}',
'subscriptions_url': 'https://api.github.com/users/mojombo/subscriptions',
'organizations_url': 'https://api.github.com/users/mojombo/orgs',
'repos_url': 'https://api.github.com/users/mojombo/repos',
'events_url': 'https://api.github.com/users/mojombo/events{/privacy}',
'received_events_url': 'https://api.github.com/users/mojombo/received_events',
'type': 'User',
'site_admin': False},
{'login': 'defunkt',
'id': 2,
'node_id': 'MDQ6VXNlcjI=',
'avatar_url': 'https://avatars.githubusercontent.com/u/2?v=4',
'gravatar_id': '',
'url': 'https://api.github.com/users/defunkt',
'html_url': 'https://github.com/defunkt',
'followers_url': 'https://api.github.com/users/defunkt/followers',
'following_url': 'https://api.github.com/users/defunkt/following{/...
How can I retrieve the JSON for username jtorre94?
The https://api.github.com/users endpoint lists every user and ignores a login query parameter, which is why the response looks unfiltered. To fetch a single user, append the username to the URL path, as you already tried with your browser:
import requests
user = 'jtorre94'
api_url = f"https://api.github.com/users/{user}"
response = requests.get(api_url)
response.json()
Output:
{'login': 'jtorre94',
'id': 76944588,
'node_id': 'MDQ6VXNlcjc2OTQ0NTg4',
'avatar_url': 'https://avatars.githubusercontent.com/u/76944588?v=4',
'gravatar_id': '',
'url': 'https://api.github.com/users/jtorre94',
'html_url': 'https://github.com/jtorre94',
'followers_url': 'https://api.github.com/users/jtorre94/followers',
'following_url': 'https://api.github.com/users/jtorre94/following{/other_user}',
'gists_url': 'https://api.github.com/users/jtorre94/gists{/gist_id}',
'starred_url': 'https://api.github.com/users/jtorre94/starred{/owner}{/repo}',
'subscriptions_url': 'https://api.github.com/users/jtorre94/subscriptions',
'organizations_url': 'https://api.github.com/users/jtorre94/orgs',
'repos_url': 'https://api.github.com/users/jtorre94/repos',
'events_url': 'https://api.github.com/users/jtorre94/events{/privacy}',
'received_events_url': 'https://api.github.com/users/jtorre94/received_events',
'type': 'User',
'site_admin': False,
'name': None,
'company': None,
'blog': '',
'location': None,
'email': None,
'hireable': None,
'bio': None,
'twitter_username': None,
'public_repos': 4,
'public_gists': 0,
'followers': 0,
'following': 0,
'created_at': '2021-01-04T10:11:25Z',
'updated_at': '2022-07-23T11:17:18Z'}
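Side note: if the username does not exist, this endpoint returns a 404 rather than an empty body, so a small status check helps; here is a minimal sketch using only standard requests behavior:
import requests
user = 'jtorre94'
response = requests.get(f"https://api.github.com/users/{user}")
if response.status_code == 404:
    print(f"No such user: {user}")
else:
    response.raise_for_status()  # surfaces rate limiting and other HTTP errors
    print(response.json()['login'])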
I have working code in Python (publishing only the part connected with the question):
### Here: request and response from the API (the response contains some "id" value)
***
response = request.execute()
print(response)
for item in response["id"]:
    print(item)  # gives me only the ID from the response
json_data = response
for item in json_data["id"]:
    with open("response.json", "w") as outfile:
        outfile.write(str(json_data["id"]))  # write only the ID to the file "response.json"
with open('response.json', 'r') as f:  # assign the value of the 1st line of "response.json" to Variable1
    Variable1 = f.readline()
print(Variable1)  # print Variable1 as a check
print(response) from the API gives (some values replaced with "***"):
Loading Credentials From File...
Refreshing Access Token...
{'kind': 'youtube#liveBroadcast', 'etag': '*********', 'id': *********, 'snippet': {'publishedAt': '2022-10-13T10:14:14Z', 'channelId': '*********', 'title': 'Test broadcast', 'description': '', 'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/*********/default_live.jpg', 'width': 120, 'height': 90}, 'medium': {'url': 'https://i.ytimg.com/vi/*********/mqdefault_live.jpg', 'width': 320, 'height': 180}, 'high': {'url': 'https://i.ytimg.com/vi/*********/hqdefault_live.jpg', 'width': 480, 'height': 360}}, 'scheduledStartTime': '2022-10-13T17:31:00Z', 'scheduledEndTime': '2022-10-13T17:32:00Z', 'isDefaultBroadcast': False, 'liveChatId': '*********'}, 'status': {'lifeCycleStatus': 'created', 'privacyStatus': 'unlisted', 'recordingStatus': 'notRecording', 'madeForKids': False, 'selfDeclaredMadeForKids': False}, 'contentDetails': {'monitorStream': {'enableMonitorStream': True, 'broadcastStreamDelayMs': 0, 'embedHtml': '<iframe width="425" height="344" src="https://www.youtube.com/embed/*********?autoplay=1&livemonitor=1" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>'}, 'enableEmbed': True, 'enableDvr': True, 'enableContentEncryption': True, 'startWithSlate': False, 'recordFromStart': True, 'enableClosedCaptions': True, 'closedCaptionsType': 'closedCaptionsHttpPost', 'enableLowLatency': False, 'latencyPreference': 'normal', 'projection': 'rectangular', 'enableAutoStart': False, 'enableAutoStop': False}}
[Finished in 2.0s]
Is there any simpler way to assign the value of id from the API response to Variable1 in Python?
You should start with the 'JSON Response Content' section of the Requests documentation.
Then you can assign the value of id to your response_id variable:
json_response = r.json()
response_id = json_response["id"]
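In your case the request.execute() call already returns a parsed dict (that is what print(response) shows), so the loop and the temporary file are unnecessary; a minimal sketch reusing your variable names:
response = request.execute()  # your existing API call
Variable1 = response["id"]  # the id is a plain key lookup, no loop needed
print(Variable1)
with open("response.json", "w") as outfile:  # optional: persist it for later runs
    outfile.write(str(Variable1))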
I'm trying to create a program which needs to read messages from a Discord bot and retrieve links from these messages.
Here's the code:
import requests
import json
from bs4 import builder
import bs4
def retrieve_messages(channelid):
    headers = {
        'authorization': 'NTQ5OTM4ODEzOTUxMTQ4MDQ3.YMi7CQ.fOm6F-dmPJPEW0dehLwCkB_ilBU'
    }
    r = requests.get(f'https://discord.com/api/v9/channels/{channelid}/messages', headers=headers)
    jsonn = json.loads(r.text)
    for value in jsonn:
        print(value, '\n')

retrieve_messages('563699841377763348')
Here's the output:
{'id': '908857015412084796', 'type': 0, 'content': '<@&624528614330859520>', 'channel_id': '563699841377763348', 'author': {'id': '749499357761503284', 'username': 'shift', 'avatar': 'de9cd6f3224e660a4b6906a89fc2bc15', 'discriminator': '6125', 'public_flags': 0, 'bot': True}, 'attachments': [], 'embeds': [], 'mentions': [], 'mention_roles': ['624528614330859520'], 'pinned': False, 'mention_everyone': False, 'tts': False, 'timestamp': '2021-11-12T23:13:18.221000+00:00', 'edited_timestamp': None, 'flags': 0, 'components': []}
{'id': '908857014430629898', 'type': 0, 'content': '', 'channel_id': '563699841377763348', 'author': {'id':
'749499357761503284', 'username': 'shift', 'avatar': 'de9cd6f3224e660a4b6906a89fc2bc15', 'discriminator': '6125', 'public_flags': 0, 'bot': True}, 'attachments': [], 'embeds': [{'type': 'rich', 'title': '<:GoldenKey:273763771929853962> Borderlands 1: 5 gold keys', 'description': 'Platform: Universal\nExpires: 30 November,
2021.```\n5J53T-BKJK5-CTXBZ-JJJTJ-WW6F3```Redeem on the [website](https://shift.gearboxsoftware.com/rewards) or in game.\n\n[Source](https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source=json&utm_medium=shift&utm_campaign=automation)', 'color': 16040976}], 'mentions': [], 'mention_roles': [], 'pinned': False, 'mention_everyone': False, 'tts': False, 'timestamp': '2021-11-12T23:13:17.987000+00:00', 'edited_timestamp': None, 'flags': 1, 'components': []}
In the output there are 2 links, but I need to save the second link to a variable, and I'm wondering how I can do that.
This is most easily done by treating the response body as a text string that can be scanned with a regex to find the URLs.
Solution
The variable test_case_data is the response body in text form, as a string.
import re

regex = r"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])"

def find_embedded_urls(data):
    return re.finditer(regex, data)

test_case_data = """'id': '908857014430629898', 'type': 0, 'content': '', 'channel_id': '563699841377763348', 'author': {'id':
'749499357761503284', 'username': 'shift', 'avatar': 'de9cd6f3224e660a4b6906a89fc2bc15', 'discriminator': '6125', 'public_flags': 0, 'bot': True}, 'attachments': [], 'embeds': [{'type': 'rich', 'title': '<:GoldenKey:273763771929853962> Borderlands 1: 5 gold keys', 'description': 'Platform: Universal\nExpires: 30 November,
2021.```\n5J53T-BKJK5-CTXBZ-JJJTJ-WW6F3```Redeem on the [website](https://shift.gearboxsoftware.com/rewards) or in game.\n\n[Source](https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source=json&utm_medium=shift&utm_campaign=automation)', 'color': 16040976}], 'mentions': [], 'mention_roles': [], 'pinned': False, 'mention_everyone': False, 'tts': False, 'timestamp': '2021-11-12T23:13:17.987000+00:00', 'edited_timestamp': None, 'flags': 1, 'components': []}"""
# test_case_data = response.text

matches = find_embedded_urls(test_case_data)
matches = [match[0] for match in matches]  # convert all URL matches to strings
print(matches)  # list of all the URLs; index it for whichever one you need
Output
['https://shift.gearboxsoftware.com/rewards', 'https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source=json&utm_medium=shift&utm_campaign=automation']
With the URLs collected in a list, you can assign variables by indexing the list at whatever position you need.
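For example, to save the second link from the sample output above to a variable:
second_link = matches[1]  # index 1 is the second URL in the list
print(second_link)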
I want to pull a report with over 2000 rows from Salesforce via the API using Python. How do I update the POST request to send the updated metadata with the new filters in order to get the next 2000 rows of data? Here is the code I have, but the response of the POST request has the exact same filters as before. What am I doing wrong here?
Excerpt of Code:
headers = {
    'Content-type': 'application/json',
    'Accept-Encoding': 'gzip',
    'Authorization': 'Bearer %s' % access_token
}
parameters = {}
description = requests.request('get', instance_url + '/services/data/v51.0/analytics/reports/00O4Q000009VEPCUA4/describe',
                               headers=headers, params=parameters, timeout=30).json()
orig_metadata = description['reportMetadata']

id_column = 'CUST_NAME'
last_load_num = '162451'
sf_id_column = description['reportExtendedMetadata']['detailColumnInfo'][id_column]['label']
print(sf_id_column)

metadata = {
    'reportBooleanFilter': '({}) AND {}'.format(orig_metadata['reportBooleanFilter'],
                                                len(orig_metadata['reportFilters']) + 1),
    'reportFilters': orig_metadata['reportFilters'] + [{'column': id_column,
                                                        'filterType': 'fieldValue',
                                                        'isRunPageEditable': True,
                                                        'operator': 'greaterThan',
                                                        'value': last_load_num}],
    'standardDateFilter': [{'column': 'CUST_CREATED_DATE', 'durationValue': 'CUSTOM',
                            'endDate': '2021-07-14', 'startDate': '2021-07-01'}],
    'detailColumns': orig_metadata['detailColumns'][:],
    'sortBy': [{'sortColumn': id_column, 'sortOrder': 'Asc'}],
}

r = requests.request('post', instance_url + '/services/data/v51.0/analytics/reports/00O4Q000009VEPCUA4',
                     headers=headers, params={'metadata': metadata}, timeout=30).json()
Here is what's in the original metadata:
{'aggregates': ['s!rtms__Load__c.rtms__Carrier_Quote_Total__c', 'RowCount'],
'chart': None,
'crossFilters': [],
'currency': None,
'dashboardSetting': None,
'description': None,
'detailColumns': ['CUST_NAME',
'CUST_CREATED_DATE',
'rtms__Load__c.rtms__Expected_Ship_Date2__c',
'rtms__Load__c.rtms__Load_Status__c',
'rtms__Load__c.rtms__Total_Weight__c',
'rtms__Load__c.rtms__Equipment_Type__c',
'rtms__Load__c.rtms__Origin__c',
'rtms__Load__c.rtms__Destination__c',
'rtms__Load__c.rtms__Zip3_Lane__c',
'rtms__Load__c.rtms__Zip5_Lane__c',
'rtms__Load__c.rtms__Carrier_Quote_Total__c',
'rtms__Load__c.rtms__Customer_Quote_Total__c'],
'developerName': 'Adel_Past_Shipment_Test_Pricing_Tool',
'division': None,
'folderId': '00l1U000000eXWwQAM',
'groupingsAcross': [],
'groupingsDown': [],
'hasDetailRows': True,
'hasRecordCount': True,
'historicalSnapshotDates': [],
'id': '00O4Q000009VEPCUA4',
'name': 'Adel Past Shipment Test Pricing Tool',
'presentationOptions': {'hasStackedSummaries': True},
'reportBooleanFilter': None,
'reportFilters': [{'column': 'rtms__Load__c.rtms__Customer__c',
'filterType': 'fieldValue',
'isRunPageEditable': True,
'operator': 'contains',
'value': 'adel'},
{'column': 'rtms__Load__c.rtms__Load_Status__c',
'filterType': 'fieldValue',
'isRunPageEditable': True,
'operator': 'notContain',
'value': 'cancelled'}],
'reportFormat': 'TABULAR',
'reportType': {'label': 'Loads', 'type': 'CustomEntity$rtms__Load__c'},
'scope': 'organization',
'showGrandTotal': True,
'showSubtotals': True,
'sortBy': [{'sortColumn': 'CUST_CREATED_DATE', 'sortOrder': 'Desc'}],
'standardDateFilter': {'column': 'CUST_CREATED_DATE',
'durationValue': 'CUSTOM',
'endDate': None,
'startDate': None},
'standardFilters': None,
'supportsRoleHierarchy': False,
'userOrHierarchyFilterId': None}
And here is what's in r['reportMetadata']:
{'aggregates': ['s!rtms__Load__c.rtms__Carrier_Quote_Total__c', 'RowCount'],
'chart': None,
'crossFilters': [],
'currency': None,
'dashboardSetting': None,
'description': None,
'detailColumns': ['CUST_NAME',
'CUST_CREATED_DATE',
'rtms__Load__c.rtms__Expected_Ship_Date2__c',
'rtms__Load__c.rtms__Load_Status__c',
'rtms__Load__c.rtms__Total_Weight__c',
'rtms__Load__c.rtms__Equipment_Type__c',
'rtms__Load__c.rtms__Origin__c',
'rtms__Load__c.rtms__Destination__c',
'rtms__Load__c.rtms__Zip3_Lane__c',
'rtms__Load__c.rtms__Zip5_Lane__c',
'rtms__Load__c.rtms__Carrier_Quote_Total__c',
'rtms__Load__c.rtms__Customer_Quote_Total__c'],
'developerName': 'Adel_Past_Shipment_Test_Pricing_Tool',
'division': None,
'folderId': '00l1U000000eXWwQAM',
'groupingsAcross': [],
'groupingsDown': [],
'hasDetailRows': True,
'hasRecordCount': True,
'historicalSnapshotDates': [],
'id': '00O4Q000009VEPCUA4',
'name': 'Adel Past Shipment Test Pricing Tool',
'presentationOptions': {'hasStackedSummaries': True},
'reportBooleanFilter': None,
'reportFilters': [{'column': 'rtms__Load__c.rtms__Customer__c',
'filterType': 'fieldValue',
'isRunPageEditable': True,
'operator': 'contains',
'value': 'adel'},
{'column': 'rtms__Load__c.rtms__Load_Status__c',
'filterType': 'fieldValue',
'isRunPageEditable': True,
'operator': 'notContain',
'value': 'cancelled'}],
'reportFormat': 'TABULAR',
'reportType': {'label': 'Loads', 'type': 'CustomEntity$rtms__Load__c'},
'scope': 'organization',
'showGrandTotal': True,
'showSubtotals': True,
'sortBy': [{'sortColumn': 'CUST_CREATED_DATE', 'sortOrder': 'Desc'}],
'standardDateFilter': {'column': 'CUST_CREATED_DATE',
'durationValue': 'CUSTOM',
'endDate': None,
'startDate': None},
'standardFilters': None,
'supportsRoleHierarchy': False,
'userOrHierarchyFilterId': None}
I've written a script in Python to log in to a website and parse the username to make sure I've really been able to log in. The approach I've tried below seems to get me there. However, I've used hardcoded cookies, taken from Chrome dev tools, within the script to get success.
I've tried with:
import requests
from bs4 import BeautifulSoup
url = 'https://secure.imdb.com/ap/signin?openid.pape.max_auth_age=0&openid.return_to=https%3A%2F%2Fwww.imdb.com%2Fap-signin-handler&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=imdb_pro_us&openid.mode=checkid_setup&siteState=eyJvcGVuaWQuYXNzb2NfaGFuZGxlIjoiaW1kYl9wcm9fdXMiLCJyZWRpcmVjdFRvIjoiaHR0cHM6Ly9wcm8uaW1kYi5jb20vIn0&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0'
signin = 'https://secure.imdb.com/ap/signin'
mainurl = 'https://pro.imdb.com/'
with requests.Session() as s:
    res = s.get(url, headers={"User-agent": "Mozilla/5.0"})
    soup = BeautifulSoup(res.text, "lxml")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['email'] = 'some username'
    payload['password'] = 'some password'
    s.post(signin, data=payload, headers={
        "User-agent": "Mozilla/5.0",
"Cookie": 'adblk=adblk_yes; ubid-main=130-2884709-6520735; _msuuid_518k2z41603=95C56F3B-E3C1-40E5-A47B-C4F7BAF2FF5D; _fbp=fb.1.1574621403438.97041399; pa=BCYm5GYAag-hj1CWg3cPXjfv2X6NGPUp6kLguepMku7Yf0W9-iSTjgmVNGmQLwUfJ5XJPHqlh84f%0D%0Agrd2voq0Q7TR_rdXU4T1BJw-1a-DdvCNSVuWSm50IXJDC_H4-wM_Qli_%0D%0A; uu=BCYnANeBBdnuTg3UKEVGDiO203C7KR0AQTdyE9Y_Y70vpd04N5QZ2bD3RwWdMBNMAJtdbRbPZMpG%0D%0AbPpC6vZvoMDzucwsE7pTQiKxY24Gr4_-0ONm7hGKPfPbMwvI1NYzy5ZhTIyIUqeVAQ7geCBiS5NS%0D%0A1A%0D%0A; session-id=137-0235974-9052660; session-id-time=2205351554; session-token=jsvzgJ4JY/TCgodelKegvXcqdLyAy4NTDO5/iEvk90VA8qWWEPJpiiRYAZe3V0EYVFlKq590mXU0OU9XMbAzwyKqXIzPLzKfLf3Cc3k0g/VQNTo6roAEa5IxmOGZjWrJuhkRZ1YgeF5uPZLcatWF1y5PFHqvjaDxQrf2LZbgRXF5N7vacTZ8maK0ciJmQEjh; csm-hit=tb:8HH0DWNBDVSWP881GYKG+s-8HH0DWNBDVSWP881GYKG|1574631571950&t:1574631571952&adb:adblk_yes'
    })
    r = s.get(mainurl, headers={
"Cookie": 'adblk=adblk_yes; ubid-main=130-2884709-6520735; _msuuid_518k2z41603=95C56F3B-E3C1-40E5-A47B-C4F7BAF2FF5D; _fbp=fb.1.1574621403438.97041399; pa=BCYm5GYAag-hj1CWg3cPXjfv2X6NGPUp6kLguepMku7Yf0W9-iSTjgmVNGmQLwUfJ5XJPHqlh84f%0D%0Agrd2voq0Q7TR_rdXU4T1BJw-1a-DdvCNSVuWSm50IXJDC_H4-wM_Qli_%0D%0A; csm-hit=tb:KV47B1QVKP4DNB3QGY95+b-NM69W1Y35R7ARV0639V5|1574631544432&t:1574631544432&adb:adblk_yes; session-id=137-0235974-9052660; session-id-time=2205351554; session-token="EsIzROiSTmFDfXd5jnBPIBOpYG9jAu7tiWXDF8R52sUw5jS6OjddfOOQB+ytCmq0K3UnXs9wKBvQtkB4aVNsXieVbRcIUrKf3iPnYeJchbOlShMjg+MR+O7IQgPKkw0BKihdYQ1YIl7KQS8VeLxZjtzJ5sj5ocnY72fCKdwq/fGOjfieFYbe9Km3a8h++1GpC738JbwcVdpTG08v1pjhQKifqPQXnqhcyVKhi8CD1qk="; x-main="C1KbtQgFFBAYfwttdRSrU5CpCe#Fn6SPHnBTY6dO2ppimt#u1P1L7G0PueQMn6X3"; at-main=Atza|IwEBICfS3UKNp2mwmbyUPY1QzjXRHMcL6fjv2ND7BDXsZ1G-qDPJKsLJXeU9gJOvRpWsofSpOJCyhnap-bIOWCutU6VMIS9bn3UkNVRP8WFVqrs-CLB5opLbrEx6YxVGQlfaxx54gzuuGO4D30z-AgBpGe64_bn0K1iLOT3P3i7S3nBzvP_0AopwKlbU7SRnE5m21cVfVK7bwbtfZO4cf7DrpGcaHK4dlY5jKHPzNx_AR4ypqsEBFbHon36N1j8foty6wLJhFP1gNCvs24mVCec24TRho5ZXFDYqhLB-dw9V3XY1eq7q1QNgtAdYkDSJ6Mq1nllFu59WqIVs1Y3lLEaxDUExLtCt-VQArpS_hZtZR8C_kevhV01jEhWg8RUQaCdYTMwZHwa778MiEOrrrdGqFnR5; sess-at-main="tWwUfkZLx+mDAPqZo+J6yJlnjqBJvYJ0oVMS6/NcIKQ="; id=BCYhnxuM-3g3WFo4uvCv6C5LdGLJKaIcZj8E-rQwU_YsF991I3Tqe94W6IlU27FvaNcnuCyv5Te3%0D%0A0c3O1mMYhEE14wMdByo2SvGXkBS0A4oFMJMEIe0aC1X4fyNRwWYNZ72a6NDzAOqeDQi3_7sZZGH8%0D%0AxQ%0D%0A; uu=BCYsGSOaee6VbhMOMXpG3F_6i7cTIkPCN0S0_Jv7c3bVkUQ5gp9vqtfvVlOMOIOqXv-uHSTSibBp%0D%0ATO1e4tRpT1DolY2qkoOW8yICF7ZrXqAgont_ShTy8zVEg1wxWCxg3_XQX8r8_dGFCO4NWZiyLH-f%0D%0A2RpBF2IJLUSd8R4UCbbbtgo%0D%0A; sid=BCYp9inRAYR9sJgmF1FcA9Vgto81vmiCYHP_gEVv6r2ZdBtz1bKtOQg4_0iSwREudsZrPM8SHMUk%0D%0A5jFMp74veGrdwNTf8DONXPUCExLgkHzfeoZr-KHf4VbI7aI5TrJhqSioYbEhHYqm6q5RGrXfCVPr%0D%0AqA%0D%0A'
    })
    sauce = BeautifulSoup(r.text, "lxml")
    name = sauce.select_one("span.display-name").text
    print(name)
I tried the following to see if it works without the hardcoded cookies, but unfortunately it failed:
cookie_string = "; ".join([str(x)+"="+str(y) for x,y in s.cookies.items()])
This is how I tried it automatically:
cookie_string = "; ".join([str(x) + "=" + str(y) for x, y in s.cookies.items()])
s.post(signin, data=payload, headers={
    "User-agent": "Mozilla/5.0",
    "Cookie": cookie_string
})
cookie_string_ano = "; ".join([str(x) + "=" + str(y) for x, y in s.cookies.items()])
r = s.get(mainurl, headers={
    "Cookie": cookie_string_ano
})
When I tried the above, I could see that cookie_string and cookie_string_ano produce session-id=130-0171771-5726549; session-id-time=2205475101 and session-id=130-0171771-5726549; session-id-time=2205475101; ubid-main=135-8050026-6353151 respectively.
How can I fetch the username without using hardcoded cookies within the script?
To fetch cookies the way Chrome dev tools shows them, you need to interact with Google Chrome using the Chrome DevTools Protocol from within the Python script.
The PyChromeDevTools package gives you programmatic access to the browser's cookies, which helps you overcome the hardcoded-cookie issue. See the reference:
PyChromeDevTools.
Remember: screen scraping is explicitly forbidden by IMDb. See the IMDb Conditions of Use, which state:
Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on
this site, except with our express written consent as noted below.
Prerequisites:
First, add the Chrome executable's path to your system environment variables.
Then run an instance of Google Chrome with the remote-debugging option (see the reference: Remote debugging with Chrome Developer Tools).
Use the following command in a command prompt or terminal to start the instance:
chrome.exe --remote-debugging-port=9222 --user-data-dir=remote-profile
Workaround:
With the Chrome instance running, you can run the program as in the following example.
import time
import requests
import PyChromeDevTools
from bs4 import BeautifulSoup
url = 'https://secure.imdb.com/ap/signin?openid.pape.max_auth_age=0&openid.return_to=https%3A%2F%2Fwww.imdb.com%2Fap-signin-handler&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=imdb_pro_us&openid.mode=checkid_setup&siteState=eyJvcGVuaWQuYXNzb2NfaGFuZGxlIjoiaW1kYl9wcm9fdXMiLCJyZWRpcmVjdFRvIjoiaHR0cHM6Ly9wcm8uaW1kYi5jb20vIn0&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0'
signin = 'https://secure.imdb.com/ap/signin'
mainurl = 'https://pro.imdb.com/'
def parse_cookies(input_url):
    chrome = PyChromeDevTools.ChromeInterface()
    chrome.Network.enable()
    chrome.Page.enable()
    chrome.Page.navigate(url=input_url)
    time.sleep(2)
    cookies = chrome.Network.getCookies()
    return cookies["result"]["cookies"]

def get_cookies(parsed_cookie_string):
    cookie_names = [sub_cookie['name'] for sub_cookie in parsed_cookie_string]
    cookie_values = [sub_cookie['value'] for sub_cookie in parsed_cookie_string]
    cookie_string = "; ".join([str(x) + "=" + str(y) for x, y in zip(cookie_names, cookie_values)])
    return cookie_string

with requests.Session() as s:
    res = s.get(url, headers={"User-agent": "Mozilla/5.0"})
    soup = BeautifulSoup(res.text, "lxml")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['email'] = 'some username'
    payload['password'] = 'some password'
    cookie_string_for_post = parse_cookies(signin)
    print("Cookies for Post Request:\n ", cookie_string_for_post)
    cookie_string_for_get = parse_cookies(mainurl)
    print("Cookies for Get Request:\n ", cookie_string_for_get)
    post_req_cookies = get_cookies(cookie_string_for_post)
    print("Post Cookie_String:\n ", post_req_cookies)
    get_req_cookies = get_cookies(cookie_string_for_get)
    print("Get Cookie_String:\n ", get_req_cookies)
    s.post(signin, data=payload, headers={
        "User-agent": "Mozilla/5.0",
        "Cookie": post_req_cookies
    })
    r = s.get(mainurl, headers={
        "Cookie": get_req_cookies
    })
    sauce = BeautifulSoup(r.text, "lxml")
    name = sauce.select_one("span.display-name").text
    print("User-Name:", name)
In the above script, I have two helper methods:
parse_cookies(input_url)  # parses cookies from IMDb before and after sign-in
get_cookies(parsed_cookie_string)  # slices them into a "name=value; " pattern
Here are the results from the above script:
Cookies for Post Request:
[{'name': 'csm-hit', 'value': 'adb:adblk_no&t:1575551929829', 'domain': 'secure.imdb.com', 'path': '/', 'expires': 1636031929, 'size': 35, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'session-token', 'value': 'ojv7WWBxadoA7dlcquiw9uErP2rhrTH7rHbpVhoRy4T+qTDfhwZKdDt5jOeGfZp1TKvwtzTGuJ6pOltjNFPiIuP5Rd5Vw8/e1J3RY/iye5tEh7qoRC2NHF9wc003xKG3PPAAdmgf8/mv8GeLAOOKNgWKBTUeMre9xbj5GzXxZBPdXMZttHrMYqKKSuwWLpa0', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035367.931534, 'size': 205, 'httpOnly': True, 'secure': True, 'session': False}, {'name': '_msuuid_518k2z41603', 'value': '7EFA48D9-B808-4A94-AF25-DF946D700AE7', 'domain': '.imdb.com', 'path': '/', 'expires': 1607087673, 'size': 55, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'uu', 'value': 'BCYrG0JCGIzGSiHxLJnhMiZmYPKjX1M_R2SYqoaFp8H_0KTtNvuGu-u_h_WO9yjlPz2CTdiUs86i%0D%0Az7kP7F-mJu5OZVpOKhquJmQf7Ks8_flkk2XlZzTPnz7R4WTBpqeRfxQqr0M9q54Gvnd0f5s1lajr%0D%0AVA%0D%0A', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.37521, 'size': 174, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'ubid-main', 'value': '130-4270133-5864707', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035317.315112, 'size': 28, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'adblk', 'value': 'adblk_no', 'domain': '.imdb.com', 'path': '/', 'expires': 1607087639, 'size': 13, 'httpOnly': False, 'secure': False, 'session': False}, {'name': '_fbp', 'value': 'fb.1.1575551679007.40322953', 'domain': '.imdb.com', 'path': '/', 'expires': 1583327724, 'size': 31, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'session-id', 'value': '130-3480383-2108806', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.375339, 'size': 29, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'session-id-time', 'value': '2206271615', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.375396, 'size': 25, 'httpOnly': False, 'secure': True, 'session': False}]
Cookies for Get Request:
[{'name': 'vuid', 'value': 'pl1203459194.1031556308', 'domain': '.vimeo.com', 'path': '/', 'expires': 1638623938, 'size': 27, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'session-token', 'value': 'ojv7WWBxadoA7dlcquiw9uErP2rhrTH7rHbpVhoRy4T+qTDfhwZKdDt5jOeGfZp1TKvwtzTGuJ6pOltjNFPiIuP5Rd5Vw8/e1J3RY/iye5tEh7qoRC2NHF9wc003xKG3PPAAdmgf8/mv8GeLAOOKNgWKBTUeMre9xbj5GzXxZBPdXMZttHrMYqKKSuwWLpa0', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035367.931534, 'size': 205, 'httpOnly': True, 'secure': True, 'session': False}, {'name': '_msuuid_518k2z41603', 'value': '7EFA48D9-B808-4A94-AF25-DF946D700AE7', 'domain': '.imdb.com', 'path': '/', 'expires': 1607087673, 'size': 55, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'uu', 'value': 'BCYrG0JCGIzGSiHxLJnhMiZmYPKjX1M_R2SYqoaFp8H_0KTtNvuGu-u_h_WO9yjlPz2CTdiUs86i%0D%0Az7kP7F-mJu5OZVpOKhquJmQf7Ks8_flkk2XlZzTPnz7R4WTBpqeRfxQqr0M9q54Gvnd0f5s1lajr%0D%0AVA%0D%0A', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.37521, 'size': 174, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'ubid-main', 'value': '130-4270133-5864707', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035317.315112, 'size': 28, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'adblk', 'value': 'adblk_no', 'domain': '.imdb.com', 'path': '/', 'expires': 1607087639, 'size': 13, 'httpOnly': False, 'secure': False, 'session': False}, {'name': '_fbp', 'value': 'fb.1.1575551679007.40322953', 'domain': '.imdb.com', 'path': '/', 'expires': 1583327724, 'size': 31, 'httpOnly': False, 'secure': False, 'session': False}, {'name': 'session-id', 'value': '130-3480383-2108806', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.375339, 'size': 29, 'httpOnly': False, 'secure': True, 'session': False}, {'name': 'session-id-time', 'value': '2206271615', 'domain': '.imdb.com', 'path': '/', 'expires': 3723035262.375396, 'size': 25, 'httpOnly': False, 'secure': True, 'session': False}]
Post Cookie_String:
csm-hit=adb:adblk_no&t:1575551929829; session-token=ojv7WWBxadoA7dlcquiw9uErP2rhrTH7rHbpVhoRy4T+qTDfhwZKdDt5jOeGfZp1TKvwtzTGuJ6pOltjNFPiIuP5Rd5Vw8/e1J3RY/iye5tEh7qoRC2NHF9wc003xKG3PPAAdmgf8/mv8GeLAOOKNgWKBTUeMre9xbj5GzXxZBPdXMZttHrMYqKKSuwWLpa0; _msuuid_518k2z41603=7EFA48D9-B808-4A94-AF25-DF946D700AE7; uu=BCYrG0JCGIzGSiHxLJnhMiZmYPKjX1M_R2SYqoaFp8H_0KTtNvuGu-u_h_WO9yjlPz2CTdiUs86i%0D%0Az7kP7F-mJu5OZVpOKhquJmQf7Ks8_flkk2XlZzTPnz7R4WTBpqeRfxQqr0M9q54Gvnd0f5s1lajr%0D%0AVA%0D%0A; ubid-main=130-4270133-5864707; adblk=adblk_no; _fbp=fb.1.1575551679007.40322953; session-id=130-3480383-2108806; session-id-time=2206271615
Get Cookie_String:
vuid=pl1203459194.1031556308; session-token=ojv7WWBxadoA7dlcquiw9uErP2rhrTH7rHbpVhoRy4T+qTDfhwZKdDt5jOeGfZp1TKvwtzTGuJ6pOltjNFPiIuP5Rd5Vw8/e1J3RY/iye5tEh7qoRC2NHF9wc003xKG3PPAAdmgf8/mv8GeLAOOKNgWKBTUeMre9xbj5GzXxZBPdXMZttHrMYqKKSuwWLpa0; _msuuid_518k2z41603=7EFA48D9-B808-4A94-AF25-DF946D700AE7; uu=BCYrG0JCGIzGSiHxLJnhMiZmYPKjX1M_R2SYqoaFp8H_0KTtNvuGu-u_h_WO9yjlPz2CTdiUs86i%0D%0Az7kP7F-mJu5OZVpOKhquJmQf7Ks8_flkk2XlZzTPnz7R4WTBpqeRfxQqr0M9q54Gvnd0f5s1lajr%0D%0AVA%0D%0A; ubid-main=130-4270133-5864707; adblk=adblk_no; _fbp=fb.1.1575551679007.40322953; session-id=130-3480383-2108806; session-id-time=2206271615
User-Name: **Logged in user-name**
It seems like you are copying the cookies from the browser, so I'll go with that theory.
The first POST API you hit sets some cookies and returns a page, which calls some further URLs, which set more cookies, and so on. Check all the requests in the network tab to see if there are multiple calls that set different cookies.
If there are, you need to call all of them in the order they are called on the page, each call adding new cookies; then, finally, you should be able to see all the cookies that you are copying (see the sketch below).
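Note that a requests.Session accumulates Set-Cookie headers across requests on its own, so replaying the chain can look like this minimal sketch (the URLs are hypothetical placeholders, not the real call order):
import requests
with requests.Session() as s:
    # each response's Set-Cookie headers land in s.cookies automatically
    s.get('https://example.com/page-that-sets-cookies')  # hypothetical first call
    s.get('https://example.com/call-that-sets-more-cookies')  # hypothetical follow-up call
    # later requests send the accumulated cookies without a manual Cookie header
    r = s.get('https://example.com/protected-page')
    print(s.cookies.get_dict())  # inspect what has accumulated so far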
However, if random data is being calculated and sent in any of the calls, it might be for CSRF protection or bot protection, in which case you are better off using http://www.omdbapi.com/ or https://imdbpy.github.io/ to access official APIs instead of internal ones.