extract a link from a web page with python


I'm writing a script that sends a site's link in a chat (I already know how to do that part). I make the request, but the response contains other data along with the link. How do I get only the link?
link = requests.get(f"https://sugoi-api.herokuapp.com/episode/{Episodio}/{AnimeN}")
resultado = link.json()
This is the result:
{'status': 200, 'info': {'name': 'Naruto classico', 'slug': 'naruto-classico', 'fc': 'N', 'epi': '12'}, 'cdn': [{'name': 'Superanimes', 'url': 'https://cdn.superanimes.tv/', 'links': ['https://cdn.superanimes.tv/010/animes/n/naruto-classico-dublado/12.mp4', 'https://cdn.superanimes.tv/010/animes/n/naruto-classico-legendado/12.mp4']}, {'name': 'Serverotaku', 'url': 'https://cdn.serverotaku01.co/', 'links': ['https://cdn.serverotaku01.co/010/animes/n/naruto-classico-dublado/12.mp4', 'https://cdn.serverotaku01.co/010/animes/n/naruto-classico-legendado/12.mp4']}, {'name': 'Servertv', 'url': 'https://servertv001.com/', 'links': ['https://servertv001.com/animes/n/naruto-classico-dublado/12.mp4', 'https://servertv001.com/animes/n/naruto-classico-legendado/12.mp4']}]}
If someone knows how to get only the resulting link, it would help me a lot.

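One simple, general way to extract URLs from any data is shown below. First convert the JSON output you got into a string, then apply a regular expression. Excluding the double quote in the pattern keeps the JSON string delimiters out of the matches.
import json
import re
# resultado is the parsed response from the question
text = json.dumps(resultado)
urls = re.findall(r'https?://[^\s"]+', text)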
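Since the response is already a dictionary, the links can also be read by indexing instead of pattern matching. The following is a minimal sketch, assuming the response keeps the shape shown in the question (a "cdn" list whose entries each carry a "links" list). The sample data is shortened to two CDNs, and the helper names first_link and all_links are hypothetical.

```python
# Sample shaped like the response in the question (shortened to two CDNs).
resultado = {
    "status": 200,
    "info": {"name": "Naruto classico", "slug": "naruto-classico", "fc": "N", "epi": "12"},
    "cdn": [
        {
            "name": "Superanimes",
            "url": "https://cdn.superanimes.tv/",
            "links": [
                "https://cdn.superanimes.tv/010/animes/n/naruto-classico-dublado/12.mp4",
                "https://cdn.superanimes.tv/010/animes/n/naruto-classico-legendado/12.mp4",
            ],
        },
        {
            "name": "Servertv",
            "url": "https://servertv001.com/",
            "links": [
                "https://servertv001.com/animes/n/naruto-classico-dublado/12.mp4",
                "https://servertv001.com/animes/n/naruto-classico-legendado/12.mp4",
            ],
        },
    ],
}


def first_link(payload):
    """Return the first link of the first CDN that has any, or None."""
    for cdn in payload.get("cdn", []):
        links = cdn.get("links", [])
        if links:
            return links[0]
    return None


def all_links(payload):
    """Return every link from every CDN entry, in order."""
    return [link for cdn in payload.get("cdn", []) for link in cdn.get("links", [])]


print(first_link(resultado))
print(all_links(resultado))
```

Indexing keeps the CDN grouping available, which the regular expression approach discards; resultado['cdn'][0]['links'][0] is the short form when the keys are known to exist.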

Related

Using requests.json() returns a list, not JSON

I'm trying to get the tag_name from a GitHub release, but I always get a list instead of a JSON object, no matter what I do. I want to be able to separate out tag_name and use it for other things.
Code:
json_file = requests.get(url = "https://api.github.com/repos/USERNAME/REPONAME/releases", auth=("USERNAME","TOKEN")).json()
It always returns a list.
"[{\"url\":\"https://api.github.com/repos/USERNAME/REPONAME/releases/42178803\",\"assets_url\":\"https://api.github.com/repos/USERNAME/REPONAME/releases/42178803/assets\",\"upload_url\":\"https://uploads.github.com/repos/USERNAME/REPONAME/releases/42178803/assets{?name,label}\",\"html_url\":\"https://github.com/USERNAME/REPONAME/releases/tag/v1.0-pre\",\"id\":42178803,\"author\":{\"login\":\"USERNAME\",\"id\":72929861,\"node_id\":\"MDQ6VXNlcjcyOTI5ODYx\",\"avatar_url\":\"https://avatars.githubusercontent.com/u/72929861?v=4\",\"gravatar_id\":\"\",\"url\":\"https://api.github.com/users/USERNAME\",\"html_url\":\"https://github.com/USERNAME\",\"followers_url\":\"https://api.github.com/users/USERNAME/followers\",\"following_url\":\"https://api.github.com/users/USERNAME/following{/other_
[{'url': 'https://api.github.com/repos/USERNAME/REPONAME/releases/42178803', 'assets_url': 'https://api.github.com/repos/USERNAME/REPONAME/releases/42178803/assets', 'upload_url': 'https://uploads.github.com/repos/USERNAME/REPONAME/releases/42178803/assets{?name,label}', 'html_url': 'https://github.com/USERNAME/REPONAME/releases/tag/v1.0-pre', 'id': 42178803, 'author': {'login': 'USERNAME', 'id': 72929861, 'node_id': 'MDQ6VXNlcjcyOTI5ODYx', 'avatar_url': 'https://avatars.githubusercontent.com/u/72929861?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/USERNAME', 'html_url': 'https://github.com/USERNAME', 'followers_url': 'https://api.github.com/users/USERNAME/followers', 'following_url': 'https://api.github.com/users/USERNAME/following{/other_user}', 'gists_url': 'https://api.github.com/users/USERNAME/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/USERNAME/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/USERNAME/subscriptions', 'organizations_url': 'https://api.github.com/users/USERNAME/orgs', 'repos_url': 'https://api.github.com/users/USERNAME/repos', 'events_url': 'https://api.github.com/users/USERNAME/events{/privacy}', 'received_events_url': 'https://api.github.com/users/USERNAME/received_events', 'type': 'User', 'site_admin': False}, 'node_id': 'MDc6UmVsZWFzZTQyMTc4ODAz', 'tag_name': 'v1.0-pre', 'target_commitish': 'main', 'name': 'Prerelease', 'draft': False, 'prerelease': True, 'created_at': '2021-04-29T05:11:16Z', 'published_at': '2021-04-29T05:17:06Z', 'assets': [], 'tarball_url': 'https://api.github.com/repos/USERNAME/REPONAME/tarball/v1.0-pre', 'zipball_url': 'https://api.github.com/repos/USERNAME/REPONAME/zipball/v1.0-pre', 'body': 'Prerelease for a little test'}]
(just did a print(json_file.json()), here is the result)
I have no idea what is happening, Please help.

It should return a list. After all, what you get from the releases endpoint, e.g. https://api.github.com/repos/python-babel/babel/releases, is a list.
import requests
resp = requests.get("https://api.github.com/repos/python-babel/babel/releases")
resp.raise_for_status()
releases_list = resp.json()
for release in releases_list:
    print(release["tag_name"])
prints out
v2.9.1
v2.9.0
v2.8.1
v2.8.0
v2.7.0
v2.6.0
v2.5.3
v2.5.2
v2.5.1
v2.5.0
v2.4.0
2.3.4
2.3.2
2.3.1
2.2.0
2.1.1
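The same point can be checked without network access: a JSON document whose top level is an array always decodes to a Python list, so each release has to be reached by index or by iteration. This is a small sketch with made-up data; the second release ("v0.9") is hypothetical and only there to make the list longer than one item.

```python
import json

# A JSON array, like the body returned by the releases endpoint (heavily trimmed).
raw = '[{"tag_name": "v1.0-pre", "prerelease": true}, {"tag_name": "v0.9", "prerelease": false}]'

releases = json.loads(raw)

# The top-level JSON array becomes a list of dictionaries.
print(type(releases).__name__)

# Collect every tag name, or take only the first release if there is one.
tag_names = [release["tag_name"] for release in releases]
first_tag = releases[0]["tag_name"] if releases else None

print(tag_names)
print(first_tag)
```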

Parsing data in a JSON file

I want to parse some information from a JSON file, but I can't find a way to successfully retrieve the data I want.
In a file I want to output the profile name.
This is the code on how I am reading and parsing.
with open(json_data) as f:
    accounts = dict(json.loads(f.read()))
shell_script = accounts['OCredit']['Profile Name']
print(shell_script)
This gives me the output of
OCredit
In a sense this is what I want, but in the application the value that is now "OCredit" (the first key) would depend on the user.
with open(json_data) as f:
    accounts = dict(json.loads(f.read()))
shell_script = accounts['OCredit']
print(shell_script)
This outputs :
{'Profile Name': 'OCredit', 'Name': 'Andrew Long', 'Email':
'asasa#yahoo.com', 'Tel': '2134568790', 'Address': '213 clover ','Zip':
'95305', 'City': 'brooklyn', 'State': 'NY','CreditCard':'213456759090',
'EXP': '12/21', 'CVV': '213'}
The actual JSON file is :
{'OCredit': {'Profile Name': 'OCredit',
'Name': 'Andrew Long',
'Email': 'asasa#yahoo.com',
'Tel': '2134568790',
'Address': '213 clover ',
'Zip': '95305',
'City': 'Brooklyn',
'State': 'NY',
'CreditCard': '213456759090',
'EXP': '12/21',
'CVV': '213'}}
So, to sum it up: I want to read the JSON file and print just the value of "Profile Name" without hardcoding the first key.
I'm not sure if I have to change the way I'm saving the JSON file to achieve this. Any help would be appreciated.

Try this:
for key in accounts:
    print(accounts[key]['Profile Name'])
# OCredit
Or:
for value in accounts.values():
    print(value['Profile Name'])
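As a self-contained variant of the loops above, the sketch below collects the profile names into a list and also picks the first profile when only one is expected. It assumes the file holds valid JSON, which requires double-quoted keys and strings; the sample string is a trimmed, hypothetical stand-in for the file contents.

```python
import json

# Trimmed stand-in for the file contents; real JSON uses double quotes.
raw = '{"OCredit": {"Profile Name": "OCredit", "Name": "Andrew Long"}}'
accounts = json.loads(raw)

# Works for any number of top-level keys, whatever they are called.
profile_names = [profile["Profile Name"] for profile in accounts.values()]

# When exactly one profile is expected, take the first value directly.
first_profile = next(iter(accounts.values()), None)
first_profile_name = first_profile["Profile Name"] if first_profile else None

print(profile_names)
print(first_profile_name)
```

json.loads already returns a dict for a JSON object, so the extra dict(...) wrapper in the question is not needed.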

Accessing keys/values in a paginated/nested dictionary

I know that somewhat related questions have been asked here: Accessing key, value in a nested dictionary and here: python accessing elements in a dictionary inside dictionary among other places but I can't quite seem to apply the answers' methodology to my issue.
I'm getting a KeyError trying to access the keys within response_dict, which I know is due to it being nested/paginated and me going about this the wrong way. Can anybody help and/or point me in the right direction?
import requests
import json
URL = "https://api.constantcontact.com/v2/contacts?status=ALL&limit=1&api_key=<redacted>&access_token=<redacted>"
#make my request, store it in the requests object 'r'
r = requests.get(url = URL)
#status code to prove things are working
print (r.status_code)
#print what was retrieved from the API
print (r.text)
#visual aid
print ('---------------------------')
#decode json data to a dict
response_dict = json.loads(r.text)
#show how the API response looks now
print(response_dict)
#just for confirmation
print (type(response_dict))
print('-------------------------')
# HERE LIES THE ISSUE
print(response_dict['first_name'])
And my output:
200
{"meta":{"pagination":{}},"results":[{"id":"1329683950","status":"ACTIVE","fax":"","addresses":[{"id":"4e19e250-b5d9-11e8-9849-d4ae5275509e","line1":"222 Fake St.","line2":"","line3":"","city":"Kansas City","address_type":"BUSINESS","state_code":"","state":"OK","country_code":"ve","postal_code":"19512","sub_postal_code":""}],"notes":[],"confirmed":false,"lists":[{"id":"1733488365","status":"ACTIVE"}],"source":"Site Owner","email_addresses":[{"id":"1fe198a0-b5d5-11e8-92c1-d4ae526edd6c","status":"ACTIVE","confirm_status":"NO_CONFIRMATION_REQUIRED","opt_in_source":"ACTION_BY_OWNER","opt_in_date":"2018-09-11T18:18:20.000Z","email_address":"rsmith#fake.com"}],"prefix_name":"","first_name":"Robert","middle_name":"","last_name":"Smith","job_title":"I.T.","company_name":"FBI","home_phone":"","work_phone":"5555555555","cell_phone":"","custom_fields":[],"created_date":"2018-09-11T15:12:40.000Z","modified_date":"2018-09-11T18:18:20.000Z","source_details":""}]}
---------------------------
{'meta': {'pagination': {}}, 'results': [{'id': '1329683950', 'status': 'ACTIVE', 'fax': '', 'addresses': [{'id': '4e19e250-b5d9-11e8-9849-d4ae5275509e', 'line1': '222 Fake St.', 'line2': '', 'line3': '', 'city': 'Kansas City', 'address_type': 'BUSINESS', 'state_code': '', 'state': 'OK', 'country_code': 've', 'postal_code': '19512', 'sub_postal_code': ''}], 'notes': [], 'confirmed': False, 'lists': [{'id': '1733488365', 'status': 'ACTIVE'}], 'source': 'Site Owner', 'email_addresses': [{'id': '1fe198a0-b5d5-11e8-92c1-d4ae526edd6c', 'status': 'ACTIVE', 'confirm_status': 'NO_CONFIRMATION_REQUIRED', 'opt_in_source': 'ACTION_BY_OWNER', 'opt_in_date': '2018-09-11T18:18:20.000Z', 'email_address': 'rsmith#fake.com'}], 'prefix_name': '', 'first_name': 'Robert', 'middle_name': '', 'last_name': 'Smith', 'job_title': 'I.T.', 'company_name': 'FBI', 'home_phone': '', 'work_phone': '5555555555', 'cell_phone': '', 'custom_fields': [], 'created_date': '2018-09-11T15:12:40.000Z', 'modified_date': '2018-09-11T18:18:20.000Z', 'source_details': ''}]}
<class 'dict'>
-------------------------
Traceback (most recent call last):
File "C:\Users\rkiek\Desktop\Python WIP\Chris2.py", line 20, in <module>
print(response_dict['first_name'])
KeyError: 'first_name'

first_name = response_dict["results"][0]["first_name"]
I think this question would be better answered by reading some documentation, but I will explain what is going on here. The dict for the man named "Robert" sits inside a list, and that list is the value under the key "results". So first you need to access the value under "results", which is a Python list.
Then you can loop over the elements of the list and treat each element as a regular dictionary.
results = response_dict["results"]
results = response_dict.get("results", [])
# Use either of the two lines above: the first raises a KeyError if there is no "results" key, the second returns an empty list instead.
# results is now a list, according to the data you posted.
for item in results:
    # Each item is a normal dictionary; .get returns None if "first_name" is missing.
    print(item.get("first_name"))
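The same pattern extends to lists nested one level deeper, such as "email_addresses" inside each contact. Here is a minimal sketch using a trimmed copy of the response from the question; only a few of the original fields are kept, and the variable names are illustrative.

```python
# Trimmed copy of the decoded response from the question.
response_dict = {
    "meta": {"pagination": {}},
    "results": [
        {
            "id": "1329683950",
            "first_name": "Robert",
            "last_name": "Smith",
            "email_addresses": [{"status": "ACTIVE", "email_address": "rsmith#fake.com"}],
        },
    ],
}

# One level down: every contact's first name.
first_names = [contact.get("first_name") for contact in response_dict.get("results", [])]

# Two levels down: every email address of every contact.
emails = [
    entry.get("email_address")
    for contact in response_dict.get("results", [])
    for entry in contact.get("email_addresses", [])
]

print(first_names)
print(emails)
```

Using .get with a default of [] means a response without "results" yields empty lists rather than a KeyError.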

Extract specific keys from list of dict in python. Sentinelhub

I seem to be stuck on very simple task. I'm still dipping my toes into Python.
I'm trying to download Sentinel-2 images with the SentinelHub API.
The result of data that my code returns is like this:
{'geometry': {'coordinates': [[[[35.895906644, 31.602691754],
[36.264307655, 31.593801516],
[36.230618703, 30.604681346],
[35.642363693, 30.617971909],
[35.678587829, 30.757888786],
[35.715700562, 30.905919341],
[35.754290061, 31.053632806],
[35.793289298, 31.206946419],
[35.895906644, 31.602691754]]]],
'type': 'MultiPolygon'},
'id': 'ee923fac-0097-58a8-b861-b07d89b99310',
'properties': {'productType': 'S2MSI1C',
'centroid': {'coordinates': [18.1321538275, 31.10368655], 'type': 'Point'},
'cloudCover': 10.68,
'collection': 'Sentinel2',
'completionDate': '2017-06-07T08:15:54Z',
'description': None,
'instrument': 'MSI',
'keywords': [],
'license': {'description': {'shortName': 'No license'},
'grantedCountries': None,
'grantedFlags': None,
'grantedOrganizationCountries': None,
'hasToBeSigned': 'never',
'licenseId': 'unlicensed',
'signatureQuota': -1,
'viewService': 'public'},
'links': [{'href': 'http://opensearch.sentinel-hub.com/resto/collections/Sentinel2/ee923fac-0097-58a8-b861-b07d89b99310.json?&lang=en',
'rel': 'self',
'title': 'GeoJSON link for ee923fac-0097-58a8-b861-b07d89b99310',
'type': 'application/json'}],
'orbitNumber': 10228,
'organisationName': None,
'parentIdentifier': None,
'platform': 'Sentinel-2',
'processingLevel': '1C',
'productIdentifier': 'S2A_OPER_MSI_L1C_TL_SGS__20170607T120016_A010228_T36RYV_N02.05',
'published': '2017-07-26T13:09:17.405352Z',
'quicklook': None,
'resolution': 10,
's3Path': 'tiles/36/R/YV/2017/6/7/0',
's3URI': 's3://sentinel-s2-l1c/tiles/36/R/YV/2017/6/7/0/',
'sensorMode': None,
'services': {'download': {'mimeType': 'text/html',
'url': 'http://sentinel-s2-l1c.s3-website.eu-central-1.amazonaws.com#tiles/36/R/YV/2017/6/7/0/'}},
'sgsId': 2168915,
'snowCover': 0,
'spacecraft': 'S2A',
'startDate': '2017-06-07T08:15:54Z',
'thumbnail': None,
'title': 'S2A_OPER_MSI_L1C_TL_SGS__20170607T120016_A010228_T36RYV_N02.05',
'updated': '2017-07-26T13:09:17.405352Z'},
'type': 'Feature'}
Can you explain how can I iterate through this set of data and extract only 'productType'? For example, if there are several similar data sets it would return only different product types.
My code is :
import matplotlib.pyplot as plt
import numpy as np
from sentinelhub import AwsProductRequest, AwsTileRequest, AwsTile, BBox, CRS
betsiboka_coords_wgs84 = [31.245117,33.897777,34.936523,36.129002]
bbox = BBox(bbox=betsiboka_coords_wgs84, crs=CRS.WGS84)
date = ('2017-06-05', '2017-06-08')
data = sentinelhub.opensearch.get_area_info(bbox, date_interval=date, maxcc=None)
for i in data:
    print(i)

Based on what you have provided, replace your bottom for loop:
for i in data:
    print(i)
with the following:
for i in data:
    print(i['properties']['productType'])

If you want to access only the productType, you can use i['properties']['productType'] in your for loop. If you want to access it at any time without writing those keys out each time, you can define a generator like this:
def product_types(data_array):
    for data in data_array:
        yield data['properties']['productType']
You can then use it in a loop like this (your data_array is data, as returned by the sentinelhub API):
for product_type in product_types(data):
    # do stuff with product_type
    print(product_type)

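keys = []
# d is a single feature dictionary, i.e. one item of data
for key in d.keys():
    if key == 'properties':
        for k in d[key].keys():
            # compare the value, not the key name, to avoid duplicates
            if k == 'productType' and d[key][k] not in keys:
                keys.append(d[key][k])
print(keys)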

Getting only specific (nested) values: since the requested key is nested inside the parent "properties" object, you need to access that object first, preferably with the get method. This can be done as follows (note the {} argument in the first get, which returns an empty dictionary if the first key is not present):
data_dictionary = json.loads(data_string)
product_type = data_dictionary.get('properties', {}).get('productType')
You can then collect the different product_type values in a set, which automatically guarantees that no two entries are the same:
product_type_set = set()
product_type_set.add(product_type)
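Putting the nested get and the set together for several features at once might look like the sketch below. The features list is made-up sample data that only mirrors the shape of the result in the question; the values after the first entry and the function name distinct_product_types are hypothetical.

```python
# Made-up features that mirror the shape of the result in the question.
features = [
    {"type": "Feature", "properties": {"productType": "S2MSI1C", "cloudCover": 10.68}},
    {"type": "Feature", "properties": {"productType": "S2MSI1C", "cloudCover": 3.2}},
    {"type": "Feature", "properties": {"productType": "S2MSI2A", "cloudCover": 0.0}},
    {"type": "Feature"},  # no "properties" key at all
]


def distinct_product_types(items):
    """Return the set of productType values found in a list of feature dicts."""
    found = set()
    for item in items:
        product_type = item.get("properties", {}).get("productType")
        if product_type is not None:  # skip features without the key
            found.add(product_type)
    return found


print(sorted(distinct_product_types(features)))
```

With the real data from the question, the call would be distinct_product_types(data), since each item there is already a dictionary and needs no json.loads.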

How to get json elements from http response string in python

I am new to parsing JSON from an HTTP API in Python. Currently I have parsed the HTTP content in Python, and it contains a JSON object array. My code is given below:
import json
from urllib.request import urlopen
apilink=urlopen("api link")
data=json.loads(apilink.read().decode())
print(data)
and my current output is
{'Message': 'Success', 'Data': '[{"Did":"c055c3d2f3314725b69965e6c55adb5b","InsertedDate":"2017-08-02 7:27:11 AM","UpdatedDate":"2017-08-02 9:33:16 AM","CreatedBy":"1","UpdatedBy":"1","Name":"Hello World","ModuleName":"Rpt_Hello_World","ApplicationName":"Asset Inventory","Published":"true","UserId":"1","PostProcessor":""}]', 'Status': 'Success'}
But I want to extract only the 'Data' attribute, which is a JSON array:
'Data': '[{"Did":"c055c3d2f3314725b69965e6c55adb5b","InsertedDate":"2017-08-02 7:27:11 AM","UpdatedDate":"2017-08-02 9:33:16 AM","CreatedBy":"1","UpdatedBy":"1","Name":"Hello World","ModuleName":"Rpt_Hello_World","ApplicationName":"Asset Inventory","Published":"true","UserId":"1","PostProcessor":""}]'
The desired part is:
[{"Did":"c055c3d2f3314725b69965e6c55adb5b","InsertedDate":"2017-08-02 7:27:11 AM","UpdatedDate":"2017-08-02 9:33:16 AM","CreatedBy":"1","UpdatedBy":"1","Name":"Hello World","ModuleName":"Rpt_Hello_World","ApplicationName":"Asset Inventory","Published":"true","UserId":"1","PostProcessor":""}]
Kindly help me solve this.
Thank you.

data is a dictionary. Use dict indexing. You need the value associated with Data:
In [876]: data['Data']
Out[876]: '[{"Did":"c055c3d2f3314725b69965e6c55adb5b","InsertedDate":"2017-08-02 7:27:11 AM","UpdatedDate":"2017-08-02 9:33:16 AM","CreatedBy":"1","UpdatedBy":"1","Name":"Hello World","ModuleName":"Rpt_Hello_World","ApplicationName":"Asset Inventory","Published":"true","UserId":"1","PostProcessor":""}]'
This is a string. You can use json.loads one more time.
In [877]: json.loads(data['Data'])
Out[877]:
[{'ApplicationName': 'Asset Inventory',
'CreatedBy': '1',
'Did': 'c055c3d2f3314725b69965e6c55adb5b',
'InsertedDate': '2017-08-02 7:27:11 AM',
'ModuleName': 'Rpt_Hello_World',
'Name': 'Hello World',
'PostProcessor': '',
'Published': 'true',
'UpdatedBy': '1',
'UpdatedDate': '2017-08-02 9:33:16 AM',
'UserId': '1'}]
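The two decoding steps can be reproduced end to end with a locally built response body. This is a sketch under the assumption that "Data" arrives as a JSON-encoded string, as in the output above; the helper name decode_data is hypothetical, and the record is trimmed to three fields.

```python
import json

# Build a body like the one in the question: "Data" is itself a JSON string.
inner = [{"Name": "Hello World", "ModuleName": "Rpt_Hello_World", "Published": "true"}]
body = json.dumps({"Message": "Success", "Data": json.dumps(inner), "Status": "Success"})


def decode_data(payload):
    """Return payload["Data"] as Python objects, decoding it if it is still a string."""
    value = payload["Data"]
    return json.loads(value) if isinstance(value, str) else value


data = json.loads(body)       # first decode: the outer object
records = decode_data(data)   # second decode: the embedded array

print(type(data["Data"]).__name__)
print(records[0]["Name"])
```

The isinstance check keeps the helper usable if the API is later changed to return "Data" as a real array instead of a string.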

That's simple.
The format is jsonName[attributeName]:
data['Data']
