Finding all URLs in a single line - python

I'm trying to to fetch a page that has many urls and other stuff all in just one line in a plain text like
"link_url":"http://www.example.com/link1?site=web","mobile_link_url":"http://m.example.com/episode/link1?site=web" link_url":"http://www.example.com/link2?site=web","mobile_link_url":"http://m.example.com/episode/link2?site=web"
i tired
import re
import requests as req
response = req.get("http://api.example.com/?callback=jQuery112")
content = response.text
print content will give me the "link_url": output
but i need to find
http://www.example.com/link1?site=web
http://www.example.com/link2?site=web
and output only link1 and link2 to a file like
link1
link2
link3

The code below might be what you need.
import re
urls = '''"link_url":"http://www.example.com/link1?site=web","mobile_link_url":"http://m.example.com/episode/link1?site=web" link_url":"http://www.example.com/link2?site=web","mobile_link_url":"http://m.example.com/episode/link2?site=web"'''
links = re.findall(r'http://www[a-z/.?=:]+(link\d)+', urls)
print(links)

If it is a string and not a JSON object, then you could do this even though it's a bit hacky:
s1 ="\"link_url\":\"http://www.example.com/link1?site=web\",\"mobile_link_url\":\"http://m.example.com/episode/link1?site=web\" link_url\":\"http://www.example.com/link2?site=web\",\"mobile_link_url\":\"http://m.example.com/episode/link2?site=web\""
links = [x for x in s1.replace("\":\"", "LINK_DELIM").replace("\"", "").replace(" ", ",").split(",")]
for link in links:
print(link.split("LINK_DELIM")[1])
Which yields:
http://www.example.com/link1?site=web
http://m.example.com/episode/link1?site=web
http://www.example.com/link2?site=web
http://m.example.com/episode/link2?site=web
Though I think #al76's answer is more elegant for this.
But if it's a JSON which looks like:
[
{
"link_url": "http://www.example.com/link1?site=web",
"mobile_link_url": "http://m.example.com/episode/link1?site=web"
},
{
"link_url": "http://www.example.com/link2?site=web",
"mobile_link_url": "http://m.example.com/episode/link2?site=web"
}
]
Then you could do something like:
import json
s1 = "[{ \"link_url \": \"http://www.example.com/link1?site=web \", \"mobile_link_url \": \"http://m.example.com/episode/link1?site=web \"}, { \"link_url \": \"http://www.example.com/link2?site=web \", \"mobile_link_url \": \"http://m.example.com/episode/link2?site=web \"} ]"
data = json.loads(s1)
links = [y for x in data for y in x.values()]
for link in links:
print(link)

If this is a JSON api then you can use response.json() to get a python dictionary, as .text will give you the response as one long string.
You also do not need to use regex for something so simple, python comes with a url parser out of the box.
So provided your response is something like
[
{
"link_url": "http://www.example.com/link1?site=web",
"mobile_link_url": "http://m.example.com/episode/link1?site=web"
},
{
"link_url": "http://www.example.com/link2?site=web",
"mobile_link_url": "http://m.example.com/episode/link2?site=web"
}
]
(doesn't matter if IRL it's one line, as long as it's valid JSON)
You can iterate the results as a dictionary, then use urlparse to get specific components of your urls:
from urllib.parse import urlparse
import requests
response = requests.get("http://api.example.com/?callback=jQuery112")
for urls in response.json():
print(urlparse(url["link_url"]).path.rsplit('/', 1)[-1])
urlparse(...).path will return the path of your url only, eg. episode/link1, and we then we just get the last segment of that with rsplit to just get link1, link2 etc.

try
urls=""" "link_url":"http://www.example.com/link1?site=web","mobile_link_url":"http://m.example.com/episode/link1?site=web" link_url":"http://www.example.com/link2?site=web","mobile_link_url":"http://m.example.com/episode/link2?site=web" """
re.findall(r'"http://www[^"]+"',urls)

urls=""" "link_url":"http://www.example.com/link1?site=web","mobile_link_url":"http://m.example.com/episode/link1?site=web" link_url":"http://www.example.com/link2?site=web","mobile_link_url":"http://m.example.com/episode/link2?site=web" """
p = [i.split('":')[1] for i in urls.replace(' ', ",").split(",")[1:-1]]
#### Output ####
['"http://www.example.com/link1?site=web"',
'"http://m.example.com/episode/link1?site=web"',
'"http://www.example.com/link2?site=web"',
'"http://m.example.com/episode/link2?site=web"']
*Not as efficient as regex.

Related

How to replace URL parts with regex in Python?

I have a JSON file with URLs that looks something like this:
{
"a_url": "https://foo.bar.com/x/env_t/xyz?asd=dfg",
"b_url": "https://foo.bar.com/x/env_t/xyz?asd=dfg",
"some other property": "blabla",
"more stuff": "yep",
"c_url": "https://blabla.com/somethingsomething?maybe=yes"
}
In this JSON, I want to look up all URLs that have a specific format, and then replace some parts in it.
In URLs that have the format of the first 2 URLs, I want to replace "foo" by "fooa" and "env_t" by "env_a", so that the output looks like this:
{
"a_url": "https://fooa.bar.com/x/env_a/xyz?asd=dfg",
"b_url": "https://fooa.bar.com/x/env_a/xyz?asd=dfg",
"some other property": "blabla",
"more stuff": "yep",
"c_url": "https://blabla.com/somethingsomething?maybe=yes"
}
I can't figure out how to do this. I came up with this regex:
https://foo([a-z]?)\.bar\.com/x/(.+)/.+\"
In regex101 this matches my URLs and captures the groups that I'm seeking to replace, but I can't figure out how to do this with Python's regex.sub().
1. Using regex
import regex
url = 'https://foo.bar.com/x/env_t/xyz?asd=dfg'
new_url = regex.sub(
'(env_t)',
'env_a',
regex.sub('(foo)', 'fooa', url)
)
print(new_url)
output:
https://fooa.bar.com/x/env_a/xyz?asd=dfg
2. Using str.replace
with open('./your-json-file.json', 'r+') as f:
content = f.read()
new_content = content \
.replace('foo', 'fooa') \
.replace('env_t', 'env_a')
f.seek(0)
f.write(new_content)
f.truncate()

Why am I getting this error "TypeError: string indices must be integers" when trying to fetch data from an api?

json file =
{
"success": true,
"terms": "https://curr
"privacy": "https://cu
"timestamp": 162764598
"source": "USD",
"quotes": {
"USDIMP": 0.722761,
"USDINR": 74.398905,
"USDIQD": 1458.90221
}
}
The json file is above. i deleted lot of values from the json as it took too many spaces. My python code is in below.
import urllib.request, urllib.parse, urllib.error
import json
response = "http://api.currencylayer.com/live?access_key="
api_key = "42141e*********************"
parms = dict()
parms['key'] = api_key
url = response + urllib.parse.urlencode(parms)
mh = urllib.request.urlopen(url)
source = mh.read().decode()
data = json.loads(source)
pydata = json.dumps(data, indent=2)
print("which curreny do you want to convert USD to?")
xm = input('>')
print(f"Hoe many USD do you want to convert{xm}to")
value = input('>')
fetch = pydata["quotes"][0]["USD{xm}"]
answer = fetch*value
print(fetch)
--------------------------------
Here is the
output
"fetch = pydata["quotes"][0]["USD{xm}"]
TypeError: string indices must be integers"
First of all the JSON data you posted here is not valid. There are missing quotes and commas. For example here "terms": "https://curr. It has to be "terms": "https://curr",. The same at "privacy" and the "timestamp" is missing a comma. After i fixed the JSON data I found a solution. You have to use data not pydata. This mean you have to change fetch = pydata["quotes"][0]["USD{xm}"] to fetch = data["quotes"][0]["USD{xm}"]. But this would result in the next error, which would be a KeyError, because in the JSON data you provided us there is no array after the "qoutes" key. So you have to get rid of this [0] or the json data has to like this:
"quotes":[{
"USDIMP": 0.722761,
"USDINR": 74.398905,
"USDIQD": 1458.90221
}]
At the end you only have to change data["quotes"]["USD{xm}"] to data["quotes"]["USD"+xm] because python tries to find a key called USD{xm} and not for example USDIMP, when you type "IMP" in the input.I hope this fixed your problem.

Is there a way to print out certain elements of the JSON file in python?

I am using a (dutch) weather API and the result is showing a lot of information. However, I only want to print the temperature and location. Is there a way to filter out these keys?
from pip._vendor import requests
import json
response = requests.get(" http://weerlive.nl/api/json-data-10min.php?key=demo&locatie=52.0910879,5.1124231")
def jprint(obj):
weer = json.dumps(obj, indent=2)
print(weer)
jprint(response.json())
result:
{
"liveweer": [
{
"place": "Utrecht",
"temp": "7.7",
"gtemp": "5.2",
"summa": "Dry after rain",
"lv": "89",
etc.
How do I only print out the place and temp?
Thanks in advance
import requests
response = requests.get(" http://weerlive.nl/api/json-data-10min.php?key=demo&locatie=52.0910879,5.1124231")
data = response.json()
for station in data["liveweer"]:
print(f"Temp in {station['plaats']} is {station['temp']}")
output
Temp in Utrecht is 8.0
Note that you can use convenient Response.json() method
If you expect the API to returns you a list of place, you can do:
>>> {'liveweer': [{'plaats': item['plaats'], 'temp': item['temp']}] for item in response.json()['liveweer']}
{'liveweer': [{'plaats': 'Utrecht', 'temp': '8.0'}]}
Try this:
x=response.json()
print(x['liveweer'][0]['place'])
print(x['liveweer'][0]['temp'])
If you are only interested in working with place and temp I would advise making a new dictionary.
import requests
r = requests.get(" http://weerlive.nl/api/json-data-10min.php?key=demo&locatie=52.0910879,5.1124231")
if r.status_code == 200:
data = {
"place" : r.json()['liveweer'][0]["place"],
"temp": r.json()['liveweer'][0]["temp"],
}
print(data)

API not accepting my JSON data from Python

I'm new to Python and dealing with JSON. I'm trying to grab an array of strings from my database and give them to an API. I don't know why I'm getting the missing data error. Can you guys take a look?
###########################################
rpt_cursor = rpt_conn.cursor()
sql="""SELECT `ContactID` AS 'ContactId' FROM
`BWG_reports`.`bounce_log_dummy`;"""
rpt_cursor.execute(sql)
row_headers=[x[0] for x in rpt_cursor.description] #this will extract row headers
row_values= rpt_cursor.fetchall()
json_data=[]
for result in row_values:
json_data.append(dict(zip(row_headers,result)))
results_to_load = json.dumps(json_data)
print(results_to_load) # Prints: [{"ContactId": 9}, {"ContactId": 274556}]
headers = {
'Content-Type': 'application/json',
'Accept': 'application/json',
}
targetlist = '302'
# This is for their PUT to "add multiple contacts to lists".
api_request_url = 'https://api2.xyz.com/api/list/' + str(targetlist)
+'/contactid/Api_Key/' + bwg_apikey
print(api_request_url) #Prints https://api2.xyz.com/api/list/302/contactid/Api_Key/#####
response = requests.put(api_request_url, headers=headers, data=results_to_load)
print(response) #Prints <Response [200]>
print(response.content) #Prints b'{"status":"error","Message":"ContactId is Required."}'
rpt_conn.commit()
rpt_cursor.close()
###########################################################
Edit for Clarity:
I'm passing it this [{"ContactId": 9}, {"ContactId": 274556}]
and I'm getting this response body b'{"status":"error","Message":"ContactId is Required."}'
The API doc gives this as the from to follow for the request body.
[
{
"ContactId": "string"
}
]
When I manually put this data in there test thing I get what I want.
[
{
"ContactId": "9"
},
{
"ContactId": "274556"
}
]
Maybe there is something wrong with json.dumps vs json.load? Am I not creating a dict, but rather a string that looks like a dict?
EDIT I FIGURED IT OUT!:
This was dumb.
I needed to define results_to_load = [] as a dict before I loaded it at results_to_load = json.dumps(json_data).
Thanks for all the answers and attempts to help.
I would recommend you to go and check the API docs to be specific, but from it seems, the API requires a field with the name ContactID which is an array, rather and an array of objects where every object has key as contactId
Or
//correct
{
contactId: [9,229]
}
instead of
// not correct
[{contactId:9}, {contactId:229}]
Tweaking this might help:
res = {}
contacts = []
for result in row_values:
contacts.append(result)
res[contactId] = contacts
...
...
response = requests.put(api_request_url, headers=headers, data=res)
I FIGURED IT OUT!:
This was dumb.
I needed to define results_to_load = [] as an empty dict before I loaded it at results_to_load = json.dumps(json_data).
Thanks for all the answers and attempts to help.

Parsing nested json payload python

I am trying to get only value A from from this nested json payload.
My function:
import requests
import json
def payloaded():
from urllib.request import urlopen
with urlopen("www.example.com/payload.json") as r:
data = json.loads(r.read().decode(r.headers.get_content_charset("utf-8")))
text = (data["bod"]["id"])
print(text)
The payload:
bod: {
id: [
{
value: "A",
summary: "B",
format: "C"
}
]
},
Currently it is returning everything within the brackets [... value ... summary ... format ...]
Solution found:
def payloaded():
from urllib.request import urlopen
with urlopen("www.example.com/payload.json") as r:
data = json.loads(r.read().decode(r.headers.get_content_charset("utf-8")))
text = (data["bod"]["id"][0]["value"])
print(text)
Since the id value is a list (even though it just contains a single value), you'll need to go inside it with a list indexer. Since lists in Python are zero-indexed (they start from zero) you'll use [0] to extract the first element:
data["bod"]["id"][0]["value"]
This works:
def payloaded():
from urllib.request import urlopen
with urlopen("www.example.com/payload.json") as r:
data = json.loads(r.read().decode(r.headers.get_content_charset("utf-8")))
text = (data["bod"]["id"][0]["value"])
print(text)

Categories

Resources