Open a JSON file link through code - Python

I'm creating an add-on and I'm modifying some functions that come within a .py file.
What I intend to do is the following. I have this code:
def channellist():
    return json.loads(openfile('lib.json', pastafinal=os.path.join(tugapath, 'resources')))
This code gives access to a lib.json file that is inside the resources subfolder of the tugapath folder. What I did was put the lib.json file on Dropbox, and I want to call the Dropbox link to lib.json instead of the local folders.
I tried to change the code, but without success.
def channellist():
    return json.loads(openfile('lib.json',pastafinal=os.path.join("https://www.dropbox.com/s/sj1246qtiodm6qd/lib.json?dl=1')))
If someone can help me, I'd be grateful!
Thank you in advance.

Given that your link holds valid JSON - which is not the case with the content you posted - you could use requests.
If the content at Dropbox looked like this:
{"tv":
{"epg": "tv",
"streams":
[{"url": "http://topchantv.net:3456/live/Stalker/Stalker/838.m3u8",
"name": "IPTV",
"resolve": False,
"visible": True}],
"name": "tv",
"thumb": "thumb_tv.png"
}
}
Then fetching the content would look like this:
import requests
url = 'https://www.dropbox.com/s/sj1246qtiodm6qd/lib.json?dl=1'
r = requests.get(url)
json_object = r.json()
So if you needed it inside a function, I guess you'd input the url and return the json like so:
def channellist(url):
    r = requests.get(url)
    json_object = r.json()
    return json_object
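For example (just a sketch, assuming the Dropbox link from the question with ?dl=1 and the JSON structure shown above):

url = 'https://www.dropbox.com/s/sj1246qtiodm6qd/lib.json?dl=1'
channels = channellist(url)
# With the structure above, the top-level "tv" entry is then directly accessible:
print(channels['tv']['name'])  # prints "tv"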

TypeError while following a tutorial

I am facing a problem trying to fetch API data and convert it to CSV.
I have successfully printed the data, but when I add the lines of code to sort the data, I get this error:
for x in myjson['data']:
TypeError: list indices must be integers or slices, not str
Here is my full code.
from ast import In
from email.mime import application
from webbrowser import get
import requests
import csv
from requests.api import head
url = "https://api-devnet.magiceden.dev/v2/collections/runecible/activities?offset=0&limit=100"
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}
response = requests.request("GET", url,headers=headers,data={})
myjson = response.json()
ourdata = []
for x in myjson['data']:
    listing = [['collection'], ['price']]
    ourdata.append(listing)
print(ourdata)
The main problem is a typo in the url.
It has to be runcible instead of runecible (without the e in run-e-cible).
See the url in the example in the documentation:
https://api.magiceden.dev/#3a6c0dd2-067f-4686-9a8b-8994667f1b67
The other problem is that the server sends data with a different structure. It is NOT a dictionary with ['data'] but a list of many dictionaries, and this needs different code in the for-loop.
Minimal working code.
I removed many elements because they are not needed.
import requests
import csv
url = "https://api-devnet.magiceden.dev/v2/collections/runcible/activities?offset=0&limit=100"
response = requests.get(url)
data = response.json()
print(data)
# ---
ourdata = []
for item in data:
    ourdata.append([item['collection'], item['price']])
print(ourdata)
# ---
with open('output.csv', 'w', newline='') as fh: # some systems may need `newline=''` to write it correctly
    writer = csv.writer(fh)
    writer.writerow(['collection', 'price'])  # `writerow` without `s` to write one row (with headers)
    writer.writerows(ourdata)                 # `writerows` with `s` to write many rows (with data)
BTW: you can also run it as
url = "https://api-devnet.magiceden.dev/v2/collections/runcible/activities"
payload = {
"offset": 0,
"limit": 100,
}
response = requests.get(url, params=payload)
EDIT:
To make sure (because this also caused a problem in a similar question):
The api-devnet url (Devnet) is for testing and it may have fake or outdated values.
If you need real data then you will need to use the api-mainnet url (Mainnet).
Text from documentation:
Devnet: api-devnet.magiceden.dev/v2 - this uses a testing Solana cluster, the tokens are not real
Mainnet: api-mainnet.magiceden.dev/v2 - this uses the real Solana cluster, the tokens are real and this is consistent with the data seen on https://magiceden.io/
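For example, switching between the two clusters quoted above could look like this (only a sketch; the USE_MAINNET flag is an illustrative name, not part of the API):

import requests

USE_MAINNET = True  # set to False to keep using the testing cluster

base = "https://api-mainnet.magiceden.dev/v2" if USE_MAINNET else "https://api-devnet.magiceden.dev/v2"
url = base + "/collections/runcible/activities"

response = requests.get(url, params={"offset": 0, "limit": 100})
print(response.json())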
BTW: At the top of the documentation you can select a language - i.e. Python Requests - and it will show all examples in that language. But some of that code can be reduced.

Scrapy - How to write to a custom FEED_URI

I'm new to Scrapy and I would like to write backups of the scraped HTML to S3. I found that by using the following, I could write a particular scrape's HTML:
settings.py
ITEM_PIPELINE = {
    'scrapy.pipelines.files.S3FilesStore': 1
}
AWS_ACCESS_KEY_ID = os.environ['S3_MAIN_KEY']
AWS_SECRET_ACCESS_KEY = os.environ['S3_MAIN_SECRET']
FEED_FORMAT = "json"
FEED_URI = f's3://bucket_name/%(name)s/%(today)s.html'
And then in my scraper file:
def parse(self, response):
    yield {'body': str(response.body, 'utf-8')}
However, I would like to write to a key that includes the url as a subfolder, for example:
FEED_URI = f's3://bucket_name/%(name)s/%(url)s/%(today)s.html'
How can I dynamically grab the url for the FEED_URI? I'm assuming that in
def start_requests(self):
    urls = [
        'http://www.example.com',
        'http://www.example_1.com',
        'http://www.example_2.com',
    ]
I have multiple urls. Also, is there any way to write the raw HTML file, not nested in JSON? Thanks.
Feed exports are not meant to export to a different file per item.
For that, write an item pipeline instead.
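A minimal sketch of such a pipeline (my own sketch, not the only way to do it), assuming boto3 is installed, that the spider also yields the page URL, e.g. yield {'url': response.url, 'body': str(response.body, 'utf-8')}, and that the bucket name and key scheme below are placeholders:

import os
import boto3  # assumed available; not part of Scrapy itself

class RawHtmlToS3Pipeline:
    # Writes each item's HTML body to its own S3 key, using the page URL as a "subfolder"-like segment.

    def open_spider(self, spider):
        self.s3 = boto3.client(
            's3',
            aws_access_key_id=os.environ['S3_MAIN_KEY'],
            aws_secret_access_key=os.environ['S3_MAIN_SECRET'],
        )

    def process_item(self, item, spider):
        # Turn the URL into a single path segment so it can sit inside the key.
        safe_url = item['url'].replace('://', '_').replace('/', '_')
        key = '{}/{}.html'.format(spider.name, safe_url)
        # Raw HTML, not wrapped in JSON.
        self.s3.put_object(Bucket='bucket_name', Key=key, Body=item['body'].encode('utf-8'))
        return item

Enable it in settings.py with ITEM_PIPELINES = {'myproject.pipelines.RawHtmlToS3Pipeline': 300} (the module path is a placeholder for wherever you put the class).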

Python: Download Images from Google

I am using Google's Books API to fetch information about books based on the ISBN number. I am getting thumbnails in the response along with other information. The response looks like this:
"imageLinks": {
"smallThumbnail": "http://books.google.com/books/content?id=tEDhAAAAMAAJ&printsec=frontcover&img=1&zoom=5&source=gbs_api",
"thumbnail": "http://books.google.com/books/content?id=tEDhAAAAMAAJ&printsec=frontcover&img=1&zoom=1&source=gbs_api"
},
I want to download the thumbnails at the links above and store them on the local file system. How can this be done in Python?
Use the urllib module. Ex:
import urllib
d = {"imageLinks": {
"smallThumbnail": "http://books.google.com/books/content?id=tEDhAAAAMAAJ&printsec=frontcover&img=1&zoom=5&source=gbs_api",
"thumbnail": "http://books.google.com/books/content?id=tEDhAAAAMAAJ&printsec=frontcover&img=1&zoom=1&source=gbs_api"
}
}
urllib.urlretrieve(d["imageLinks"]["thumbnail"], "MyThumbNail.jpg")
Python 3.x:
from urllib import request

with open("MyThumbNail.jpg", "wb") as infile:
    infile.write(request.urlopen(d["imageLinks"]["thumbnail"]).read())
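If you prefer requests (assumed installed), a small sketch that reuses the d dictionary above and downloads every link in imageLinks, naming each file after its key:

import requests

for name, link in d["imageLinks"].items():
    r = requests.get(link)
    r.raise_for_status()
    with open("{}.jpg".format(name), "wb") as out:  # e.g. smallThumbnail.jpg, thumbnail.jpg
        out.write(r.content)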

RuntimeWarning: Parent module 'src' not found while handling absolute import - AWS Lambda

I am having trouble with a simple Python Lambda function. The Lambda function (called by Zapier) basically creates a Confluence page and then calls another Zapier webhook. I upload a .zip file to my S3, which has all the folders of the required packages, and then a src folder with my Python file that has my handler function.
.zip --> src/lambda_function.py, in which I then call a handler function.
At the top of my lambda_function.py I have the following:
import string
import json
import pyconfluence as pyco
import requests
import os
import time
def create_page(name, content, label):
    data = {}
    data["type"] = "page"
    data["title"] = name
    data["ancestors"] = [{"id": str(12345678)}]  # 172949526 is the Parent
    data["space"] = {"key": "PK"}
    data["body"] = {"storage": {"value": content, "representation": "storage"}}
    data["metadata"] = {"labels": [{"name": label}]}
    return pyco.api.rest("/", "POST", json.dumps(data))

def lambda_handler(event, context):
    # Page 12345678 that is in fact a representation of the template
    content = pyco.get_page_content(12345678)
    page_title = event['issue_key'] + ": " + event['issue_title']
    content = string.replace(content, "PK-200", event['issue_key'])
    create_page(page_title, content, "active-product-requirement")
    api_url = "https://acmein.atlassian.net/rest/api/2/issue/" + event['issue_key'] + "/remotelink/"
    webhook = 'https://hooks.zapier.com/hooks/catch/123456/8v6fde/'
    requisite_headers = {'Accept': 'application/json',
                         'Content-Type': 'application/json'}
    auth = (os.environ["PYCONFLUENCE_USER"], os.environ["PYCONFLUENCE_TOKEN"])
    result = requests.get(api_url, headers=requisite_headers, auth=auth).json()
    if len(result) > 0:
        confluence_url = result[0]["object"]["url"]
    else:
        confluence_url = "Error getting the page"
    callback = requests.post(webhook, data={'issue_key': event['issue_key'], 'confluence_url': confluence_url})
    return ["Success"]
and then in my CloudWatch logs I get the RuntimeWarning from the title: Parent module 'src' not found while handling absolute import.
You have to zip the contents of the src folder, not the src folder itself. Your Python module (lambda_function.py) should be at the first level in your zip.
See the documentation:
Zip the directory content, not the directory. The contents of the Zip file are available as the current working directory of the Lambda function.
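If you would rather build the archive from Python than from the command line, a sketch like this (assuming the dependency folders and lambda_function.py all live under src/) puts the contents of src at the root of the zip:

import os
import zipfile

with zipfile.ZipFile('deployment.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for root, dirs, files in os.walk('src'):
        for name in files:
            path = os.path.join(root, name)
            # arcname is relative to src/, so lambda_function.py ends up at the top level of the zip
            zf.write(path, arcname=os.path.relpath(path, 'src'))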

using xpath to parse images from the web

I've written a bit of code in an attempt to pull photos from a website. I want it to find photos, then download them so they can be used in a tweet:
import urllib2
from lxml.html import fromstring
import sys
import time
url = "http://www.phillyhistory.org/PhotoArchive/Search.aspx"
response = urllib2.urlopen(url)
html = response.read()
dom = fromstring(html)
sels = dom.xpath('//*[(#id = "large_media")]')
for pic in sels[:1]:
    output = open("file01.jpg", "w")
    output.write(pic.read())
    output.close()
#twapi = tweepy.API(auth)
#twapi.update_with_media(imagefilename, status=xxx)
I'm new at this sort of thing, so I'm not really sure why this isn't working. No file is created, and no 'sels' are being created.
Your problem is that the image search (Search.aspx) doesn't just return an HTML page with all the content in it, but instead delivers a JavaScript application that then makes several subsequent requests (see AJAX) to fetch raw information about assets, and then dynamically builds an HTML page that contains all those search results.
You can observe this behavior by looking at the HTTP requests your browser makes when you load the page. Use the Firebug extension for Firefox or the builtin Chrome developer tools and open the Network tab. Look for requests that happen after the initial page load, particularly POST requests.
In this case the interesting requests are the ones to Thumbnails.ashx, Details.ashx and finally MediaStream.ashx. Once you identify those requests, look at what headers and form data your browser sends, and emulate that behavior with plain HTTP requests from Python.
The response from Thumbnails.ashx is actually JSON, so it's much easier to parse than HTML.
In this example I use the requests module because it's much, much better and easier to use than urllib(2). If you don't have it, install it with pip install requests.
Try this:
import requests
import urllib
BASE_URL = 'http://www.phillyhistory.org/PhotoArchive/'
QUERY_URL = BASE_URL + 'Thumbnails.ashx'
DETAILS_URL = BASE_URL + 'Details.ashx'
def get_media_url(asset_id):
    response = requests.post(DETAILS_URL, data={'assetId': asset_id})
    image_details = response.json()
    media_id = image_details['assets'][0]['medialist'][0]['mediaId']
    return '{}/MediaStream.ashx?mediaId={}'.format(BASE_URL, media_id)
def save_image(asset_id):
    filename = '{}.jpg'.format(asset_id)
    url = get_media_url(asset_id)
    with open(filename, 'wb') as f:
        response = requests.get(url)
        f.write(response.content)
    return filename
urlqs = {
    'maxx': '-8321310.550067',
    'maxy': '4912533.794965',
    'minx': '-8413034.983992',
    'miny': '4805521.955385',
    'onlyWithoutLoc': 'false',
    'sortOrderM': 'DISTANCE',
    'sortOrderP': 'DISTANCE',
    'type': 'area',
    'updateDays': '0',
    'withoutLoc': 'false',
    'withoutMedia': 'false'
}
data = {
    'start': 0,
    'limit': 12,
    'noStore': 'false',
    'request': 'Images',
    'urlqs': urllib.urlencode(urlqs)
}
response = requests.post(QUERY_URL, data=data)
result = response.json()
print '{} images found'.format(result['totalImages'])
for image in result['images']:
    asset_id = image['assetId']
    print 'Name: {}'.format(image['name'])
    print 'Asset ID: {}'.format(asset_id)
    filename = save_image(asset_id)
    print "Saved image to '{}'.\n".format(filename)
Note: I didn't check what http://www.phillyhistory.org/'s Terms of Service have to say about automated crawling. You need to check yourself and make sure you're not in violation of their ToS with whatever you're doing.
