I'm trying to to fetch a page that has many urls and other stuff all in just one line in a plain text like
"link_url":"http://www.example.com/link1?site=web","mobile_link_url":"http://m.example.com/episode/link1?site=web" link_url":"http://www.example.com/link2?site=web","mobile_link_url":"http://m.example.com/episode/link2?site=web"
i tired
import re
import requests as req
response = req.get("http://api.example.com/?callback=jQuery112")
content = response.text
print content will give me the "link_url": output
but i need to find
http://www.example.com/link1?site=web
http://www.example.com/link2?site=web
and output only link1 and link2 to a file like
link1
link2
link3
The code below might be what you need.
import re
urls = '''"link_url":"http://www.example.com/link1?site=web","mobile_link_url":"http://m.example.com/episode/link1?site=web" link_url":"http://www.example.com/link2?site=web","mobile_link_url":"http://m.example.com/episode/link2?site=web"'''
links = re.findall(r'http://www[a-z/.?=:]+(link\d)+', urls)
print(links)
If it is a string and not a JSON object, then you could do this even though it's a bit hacky:
s1 ="\"link_url\":\"http://www.example.com/link1?site=web\",\"mobile_link_url\":\"http://m.example.com/episode/link1?site=web\" link_url\":\"http://www.example.com/link2?site=web\",\"mobile_link_url\":\"http://m.example.com/episode/link2?site=web\""
links = [x for x in s1.replace("\":\"", "LINK_DELIM").replace("\"", "").replace(" ", ",").split(",")]
for link in links:
print(link.split("LINK_DELIM")[1])
Which yields:
http://www.example.com/link1?site=web
http://m.example.com/episode/link1?site=web
http://www.example.com/link2?site=web
http://m.example.com/episode/link2?site=web
Though I think #al76's answer is more elegant for this.
But if it's a JSON which looks like:
[
{
"link_url": "http://www.example.com/link1?site=web",
"mobile_link_url": "http://m.example.com/episode/link1?site=web"
},
{
"link_url": "http://www.example.com/link2?site=web",
"mobile_link_url": "http://m.example.com/episode/link2?site=web"
}
]
Then you could do something like:
import json
s1 = "[{ \"link_url \": \"http://www.example.com/link1?site=web \", \"mobile_link_url \": \"http://m.example.com/episode/link1?site=web \"}, { \"link_url \": \"http://www.example.com/link2?site=web \", \"mobile_link_url \": \"http://m.example.com/episode/link2?site=web \"} ]"
data = json.loads(s1)
links = [y for x in data for y in x.values()]
for link in links:
print(link)
If this is a JSON api then you can use response.json() to get a python dictionary, as .text will give you the response as one long string.
You also do not need to use regex for something so simple, python comes with a url parser out of the box.
So provided your response is something like
[
{
"link_url": "http://www.example.com/link1?site=web",
"mobile_link_url": "http://m.example.com/episode/link1?site=web"
},
{
"link_url": "http://www.example.com/link2?site=web",
"mobile_link_url": "http://m.example.com/episode/link2?site=web"
}
]
(doesn't matter if IRL it's one line, as long as it's valid JSON)
You can iterate the results as a dictionary, then use urlparse to get specific components of your urls:
from urllib.parse import urlparse
import requests
response = requests.get("http://api.example.com/?callback=jQuery112")
for urls in response.json():
print(urlparse(url["link_url"]).path.rsplit('/', 1)[-1])
urlparse(...).path will return the path of your url only, eg. episode/link1, and we then we just get the last segment of that with rsplit to just get link1, link2 etc.
try
urls=""" "link_url":"http://www.example.com/link1?site=web","mobile_link_url":"http://m.example.com/episode/link1?site=web" link_url":"http://www.example.com/link2?site=web","mobile_link_url":"http://m.example.com/episode/link2?site=web" """
re.findall(r'"http://www[^"]+"',urls)
urls=""" "link_url":"http://www.example.com/link1?site=web","mobile_link_url":"http://m.example.com/episode/link1?site=web" link_url":"http://www.example.com/link2?site=web","mobile_link_url":"http://m.example.com/episode/link2?site=web" """
p = [i.split('":')[1] for i in urls.replace(' ', ",").split(",")[1:-1]]
#### Output ####
['"http://www.example.com/link1?site=web"',
'"http://m.example.com/episode/link1?site=web"',
'"http://www.example.com/link2?site=web"',
'"http://m.example.com/episode/link2?site=web"']
*Not as efficient as regex.
Related
I have a JSON file with URLs that looks something like this:
{
"a_url": "https://foo.bar.com/x/env_t/xyz?asd=dfg",
"b_url": "https://foo.bar.com/x/env_t/xyz?asd=dfg",
"some other property": "blabla",
"more stuff": "yep",
"c_url": "https://blabla.com/somethingsomething?maybe=yes"
}
In this JSON, I want to look up all URLs that have a specific format, and then replace some parts in it.
In URLs that have the format of the first 2 URLs, I want to replace "foo" by "fooa" and "env_t" by "env_a", so that the output looks like this:
{
"a_url": "https://fooa.bar.com/x/env_a/xyz?asd=dfg",
"b_url": "https://fooa.bar.com/x/env_a/xyz?asd=dfg",
"some other property": "blabla",
"more stuff": "yep",
"c_url": "https://blabla.com/somethingsomething?maybe=yes"
}
I can't figure out how to do this. I came up with this regex:
https://foo([a-z]?)\.bar\.com/x/(.+)/.+\"
In regex101 this matches my URLs and captures the groups that I'm seeking to replace, but I can't figure out how to do this with Python's regex.sub().
1. Using regex
import regex
url = 'https://foo.bar.com/x/env_t/xyz?asd=dfg'
new_url = regex.sub(
'(env_t)',
'env_a',
regex.sub('(foo)', 'fooa', url)
)
print(new_url)
output:
https://fooa.bar.com/x/env_a/xyz?asd=dfg
2. Using str.replace
with open('./your-json-file.json', 'r+') as f:
content = f.read()
new_content = content \
.replace('foo', 'fooa') \
.replace('env_t', 'env_a')
f.seek(0)
f.write(new_content)
f.truncate()
json file =
{
"success": true,
"terms": "https://curr
"privacy": "https://cu
"timestamp": 162764598
"source": "USD",
"quotes": {
"USDIMP": 0.722761,
"USDINR": 74.398905,
"USDIQD": 1458.90221
}
}
The json file is above. i deleted lot of values from the json as it took too many spaces. My python code is in below.
import urllib.request, urllib.parse, urllib.error
import json
response = "http://api.currencylayer.com/live?access_key="
api_key = "42141e*********************"
parms = dict()
parms['key'] = api_key
url = response + urllib.parse.urlencode(parms)
mh = urllib.request.urlopen(url)
source = mh.read().decode()
data = json.loads(source)
pydata = json.dumps(data, indent=2)
print("which curreny do you want to convert USD to?")
xm = input('>')
print(f"Hoe many USD do you want to convert{xm}to")
value = input('>')
fetch = pydata["quotes"][0]["USD{xm}"]
answer = fetch*value
print(fetch)
--------------------------------
Here is the
output
"fetch = pydata["quotes"][0]["USD{xm}"]
TypeError: string indices must be integers"
First of all the JSON data you posted here is not valid. There are missing quotes and commas. For example here "terms": "https://curr. It has to be "terms": "https://curr",. The same at "privacy" and the "timestamp" is missing a comma. After i fixed the JSON data I found a solution. You have to use data not pydata. This mean you have to change fetch = pydata["quotes"][0]["USD{xm}"] to fetch = data["quotes"][0]["USD{xm}"]. But this would result in the next error, which would be a KeyError, because in the JSON data you provided us there is no array after the "qoutes" key. So you have to get rid of this [0] or the json data has to like this:
"quotes":[{
"USDIMP": 0.722761,
"USDINR": 74.398905,
"USDIQD": 1458.90221
}]
At the end you only have to change data["quotes"]["USD{xm}"] to data["quotes"]["USD"+xm] because python tries to find a key called USD{xm} and not for example USDIMP, when you type "IMP" in the input.I hope this fixed your problem.
I am using a (dutch) weather API and the result is showing a lot of information. However, I only want to print the temperature and location. Is there a way to filter out these keys?
from pip._vendor import requests
import json
response = requests.get(" http://weerlive.nl/api/json-data-10min.php?key=demo&locatie=52.0910879,5.1124231")
def jprint(obj):
weer = json.dumps(obj, indent=2)
print(weer)
jprint(response.json())
result:
{
"liveweer": [
{
"place": "Utrecht",
"temp": "7.7",
"gtemp": "5.2",
"summa": "Dry after rain",
"lv": "89",
etc.
How do I only print out the place and temp?
Thanks in advance
import requests
response = requests.get(" http://weerlive.nl/api/json-data-10min.php?key=demo&locatie=52.0910879,5.1124231")
data = response.json()
for station in data["liveweer"]:
print(f"Temp in {station['plaats']} is {station['temp']}")
output
Temp in Utrecht is 8.0
Note that you can use convenient Response.json() method
If you expect the API to returns you a list of place, you can do:
>>> {'liveweer': [{'plaats': item['plaats'], 'temp': item['temp']}] for item in response.json()['liveweer']}
{'liveweer': [{'plaats': 'Utrecht', 'temp': '8.0'}]}
Try this:
x=response.json()
print(x['liveweer'][0]['place'])
print(x['liveweer'][0]['temp'])
If you are only interested in working with place and temp I would advise making a new dictionary.
import requests
r = requests.get(" http://weerlive.nl/api/json-data-10min.php?key=demo&locatie=52.0910879,5.1124231")
if r.status_code == 200:
data = {
"place" : r.json()['liveweer'][0]["place"],
"temp": r.json()['liveweer'][0]["temp"],
}
print(data)
I'm new to Python and dealing with JSON. I'm trying to grab an array of strings from my database and give them to an API. I don't know why I'm getting the missing data error. Can you guys take a look?
###########################################
rpt_cursor = rpt_conn.cursor()
sql="""SELECT `ContactID` AS 'ContactId' FROM
`BWG_reports`.`bounce_log_dummy`;"""
rpt_cursor.execute(sql)
row_headers=[x[0] for x in rpt_cursor.description] #this will extract row headers
row_values= rpt_cursor.fetchall()
json_data=[]
for result in row_values:
json_data.append(dict(zip(row_headers,result)))
results_to_load = json.dumps(json_data)
print(results_to_load) # Prints: [{"ContactId": 9}, {"ContactId": 274556}]
headers = {
'Content-Type': 'application/json',
'Accept': 'application/json',
}
targetlist = '302'
# This is for their PUT to "add multiple contacts to lists".
api_request_url = 'https://api2.xyz.com/api/list/' + str(targetlist)
+'/contactid/Api_Key/' + bwg_apikey
print(api_request_url) #Prints https://api2.xyz.com/api/list/302/contactid/Api_Key/#####
response = requests.put(api_request_url, headers=headers, data=results_to_load)
print(response) #Prints <Response [200]>
print(response.content) #Prints b'{"status":"error","Message":"ContactId is Required."}'
rpt_conn.commit()
rpt_cursor.close()
###########################################################
Edit for Clarity:
I'm passing it this [{"ContactId": 9}, {"ContactId": 274556}]
and I'm getting this response body b'{"status":"error","Message":"ContactId is Required."}'
The API doc gives this as the from to follow for the request body.
[
{
"ContactId": "string"
}
]
When I manually put this data in there test thing I get what I want.
[
{
"ContactId": "9"
},
{
"ContactId": "274556"
}
]
Maybe there is something wrong with json.dumps vs json.load? Am I not creating a dict, but rather a string that looks like a dict?
EDIT I FIGURED IT OUT!:
This was dumb.
I needed to define results_to_load = [] as a dict before I loaded it at results_to_load = json.dumps(json_data).
Thanks for all the answers and attempts to help.
I would recommend you to go and check the API docs to be specific, but from it seems, the API requires a field with the name ContactID which is an array, rather and an array of objects where every object has key as contactId
Or
//correct
{
contactId: [9,229]
}
instead of
// not correct
[{contactId:9}, {contactId:229}]
Tweaking this might help:
res = {}
contacts = []
for result in row_values:
contacts.append(result)
res[contactId] = contacts
...
...
response = requests.put(api_request_url, headers=headers, data=res)
I FIGURED IT OUT!:
This was dumb.
I needed to define results_to_load = [] as an empty dict before I loaded it at results_to_load = json.dumps(json_data).
Thanks for all the answers and attempts to help.
I am trying to get only value A from from this nested json payload.
My function:
import requests
import json
def payloaded():
from urllib.request import urlopen
with urlopen("www.example.com/payload.json") as r:
data = json.loads(r.read().decode(r.headers.get_content_charset("utf-8")))
text = (data["bod"]["id"])
print(text)
The payload:
bod: {
id: [
{
value: "A",
summary: "B",
format: "C"
}
]
},
Currently it is returning everything within the brackets [... value ... summary ... format ...]
Solution found:
def payloaded():
from urllib.request import urlopen
with urlopen("www.example.com/payload.json") as r:
data = json.loads(r.read().decode(r.headers.get_content_charset("utf-8")))
text = (data["bod"]["id"][0]["value"])
print(text)
Since the id value is a list (even though it just contains a single value), you'll need to go inside it with a list indexer. Since lists in Python are zero-indexed (they start from zero) you'll use [0] to extract the first element:
data["bod"]["id"][0]["value"]
This works:
def payloaded():
from urllib.request import urlopen
with urlopen("www.example.com/payload.json") as r:
data = json.loads(r.read().decode(r.headers.get_content_charset("utf-8")))
text = (data["bod"]["id"][0]["value"])
print(text)