How can I get Helm's binary from their GitHub repo? - python

I'm trying to download Helm's latest release using a script. I want to download the binary and copy it to a file. I tried looking at the documentation, but I found it very confusing and couldn't work this out. I have found a way to download specific files, but nothing regarding release binaries. So far, I have:
from github import Github

def get_helm(filename):
    f = open(filename, 'wb')  # The file I want to copy the binary to (binary mode)
    g = Github()
    r = g.get_repo("helm/helm")
    # Get binary and use f.write() to transfer it to the file
    f.close()
    return filename
I am also well aware of the rate limits on the queries I can make, since there are no credentials.

For Helm in particular, you're not going to have a good time since they apparently don't publish their release files via GitHub, only the checksum metadata.
See https://github.com/helm/helm/releases/tag/v3.6.0 ...
Otherwise, this would be as simple as:

1. Get the JSON data from https://api.github.com/repos/{repo}/releases
2. Get the first release in the list (it's the newest)
3. Look through the assets of that release to find the file you need (e.g. for your architecture)
4. Download it using your favorite HTTP client (e.g. the one you used to get the JSON data in the first step)
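For a project that does attach its binaries to GitHub releases, those four steps might look roughly like the following sketch (nothing here is Helm-specific; the repo and architecture strings are placeholders, and PyGithub's get_latest_release()/get_assets() would expose the same data if you'd rather stay with that library):

import requests

def download_latest_asset(repo, arch_tag):
    # Step 1: fetch the release list (newest first)
    releases_resp = requests.get(f"https://api.github.com/repos/{repo}/releases")
    releases_resp.raise_for_status()
    # Step 2: the first entry is the newest release
    newest = releases_resp.json()[0]
    # Step 3: find the asset matching the desired architecture
    for asset in newest.get("assets", []):
        if arch_tag in asset["name"]:
            # Step 4: download it with the same HTTP client
            with open(asset["name"], "wb") as f:
                f.write(requests.get(asset["browser_download_url"]).content)
            return asset["name"]
    raise ValueError("No matching asset found")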
Nevertheless, here's a script that works for Helm's additional hoops-to-jump-through:
import requests

def download_binary_with_progress(source_url, dest_filename):
    binary_resp = requests.get(source_url, stream=True)
    binary_resp.raise_for_status()
    with open(dest_filename, "wb") as f:
        for chunk in binary_resp.iter_content(chunk_size=524288):
            f.write(chunk)
            print(f.tell(), "bytes written")
    return dest_filename

def download_newest_helm(desired_architecture):
    releases_resp = requests.get(
        "https://api.github.com/repos/helm/helm/releases"
    )
    releases_resp.raise_for_status()
    releases_data = releases_resp.json()
    newest_release = releases_data[0]
    for asset in newest_release.get("assets", []):
        name = asset["name"]
        # For a project using regular releases, this would be simplified to
        # checking for the desired architecture and doing
        # download_binary_with_progress(asset["browser_download_url"], name)
        if desired_architecture in name and name.endswith(".tar.gz.asc"):
            tarball_filename = name.replace(".tar.gz.asc", ".tar.gz")
            tarball_url = f"https://get.helm.sh/{tarball_filename}"
            return download_binary_with_progress(
                source_url=tarball_url, dest_filename=tarball_filename
            )
    raise ValueError("No matching release found")

download_newest_helm("darwin-arm64")
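Since checksum metadata is what Helm does publish, you could optionally verify the tarball after downloading it. A minimal sketch, assuming a "<tarball>.sha256" file is published next to the tarball on get.helm.sh (the exact checksum filename is an assumption):

import hashlib
import requests

def verify_sha256(tarball_filename):
    # Assumption: "<tarball>.sha256" exists next to the tarball on get.helm.sh
    checksum_resp = requests.get(f"https://get.helm.sh/{tarball_filename}.sha256")
    checksum_resp.raise_for_status()
    expected = checksum_resp.text.split()[0]  # file may contain "hash  filename"
    digest = hashlib.sha256()
    with open(tarball_filename, "rb") as f:
        for chunk in iter(lambda: f.read(524288), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected:
        raise ValueError(f"Checksum mismatch for {tarball_filename}")
    return tarball_filename

verify_sha256(download_newest_helm("darwin-arm64"))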

Related

Downloading files from raw.githubusercontent.com is immensely slow

I'm building an application in Python 3 that requires downloading a whole bunch of *.java files from raw.githubusercontent.com. Basically, I use GitHub's API v3 to obtain all the paths ending with ".java" in a given repository, then I download them through raw.githubusercontent.com. The trouble is that this is really slow (< 10 kB/s). Sometimes it starts off at a decent rate (40-50 kB/s), but then it usually drops off pretty quickly.
I've tried keeping a persistent connection by using requests.Session(). I've also tried using an authorization token, which someone suggested. Neither gave an improvement.
This is what my code looks like:
with requests.Session() as s:
    path_index = ""
    for path in paths.splitlines():
        file_url = githubusercontent_prefix + path
        filename = path.split("/")[-1]
        res = s.get(file_url, stream=True, allow_redirects=True)
        outf = open("sources/" + filename, 'w')
        outf.write(res.text)
        outf.close()

how to download pics to a specific folder location on windows?

I have this script, which downloads all images from a given web URL:
from selenium import webdriver
import urllib

class ChromefoxTest:
    def __init__(self, url):
        self.url = url
        self.uri = []

    def chromeTest(self):
        # file_name = "C:\Users\Administrator\Downloads\images"
        self.driver = webdriver.Chrome()
        self.driver.get(self.url)
        self.r = self.driver.find_elements_by_tag_name('img')
        # output = open(file_name, 'w')
        for i, v in enumerate(self.r):
            src = v.get_attribute("src")
            self.uri.append(src)
            pos = len(src) - src[::-1].index('/')
            print src[pos:]
            self.g = urllib.urlretrieve(src, src[pos:])
            # output.write(src)
        # output.close()

if __name__ == '__main__':
    FT = ChromefoxTest("http://imgur.com/")
    FT.chromeTest()
My question is: how do I make this script save all the pics to a specific folder location on my Windows machine?
You need to specify the path where you want to save the file. This is explained in the documentation for urllib.urlretrieve:
The method is: urllib.urlretrieve(url[, filename[, reporthook[, data]]]).
And the documentation says:
The second argument, if present, specifies the file location to copy to (if absent, the location will be a tempfile with a generated name).
So...
urllib.urlretrieve(src, 'location/on/my/system/foo.png')
will save the image to the specified folder.
Also, consider reviewing the documentation for os.path. Those functions will help you manipulate file names and paths.
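For instance, a minimal sketch combining the two (assuming Python 2 as in your script, where src is the image URL from your loop; the target folder here is made up):

import os
import urllib

save_dir = r'C:\Users\Administrator\Downloads\images'  # hypothetical target folder
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filename = src.split('/')[-1]  # reuse the name from the URL, as your script does
urllib.urlretrieve(src, os.path.join(save_dir, filename))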
If you use the requests library you can slurp up really big image files (or small ones) efficiently and arrange to store them in a place of your choice in an obvious way.
Use this code and you'll get a nice picture of a beagle dog!
image_url is the link to the remote image.
file_path is where you want to store the image locally. It can include just a file name or a full path, at your option.
chunk_size is the size of the piece of the file to be downloaded with each slurp from the remote site.
length is the actual size of the piece that is written locally. Since I did this interactively I put this in mainly so that I wouldn't have to look at a long vertical stream of 1024s on my screen.
>>> import requests
>>> image_url = 'http://maxpixel.freegreatpicture.com/static/photo/1x/Eyes-Dog-Portrait-Animal-Familiar-Domestic-Beagle-2507963.jpg'
>>> file_path = r'c:\scratch\beagle.jpg'
>>> r = requests.get(image_url, stream=True)
>>> with open(file_path, 'wb') as beagle:
...     for chunk in r.iter_content(chunk_size=1024):
...         length = beagle.write(chunk)

Why are my pictures corrupted after downloading and writing them in python?

Preface
This is my first post on stackoverflow so I apologize if I mess up somewhere. I searched the internet and stackoverflow heavily for a solution to my issues but I couldn't find anything.
Situation
What I am working on is creating a digital photo frame with my Raspberry Pi that will also automatically download pictures from my wife's Facebook page. Luckily I found someone who was working on something similar:
https://github.com/samuelclay/Raspberry-Pi-Photo-Frame
One month ago this gentleman added the download_facebook.py script. This is what I needed! So a few days ago I started working on this script to get it working in my Windows environment first (before I throw it on the Pi). Unfortunately there is no documentation specific to that script and I am lacking in Python experience.
Based on the from urllib import urlopen statement, I can assume that this script was written for Python 2.x. This is because in Python 3.x it is from urllib import request.
So I installed the Python 2.7.9 interpreter, and I've had fewer issues than when I was attempting to work with the Python 3.4.3 interpreter.
Problem
I've gotten the script to download pictures from the Facebook account; however, the pictures are corrupted.
Here are pictures of the problem: http://imgur.com/a/3u7cG
Now, I was originally using Python 3.4.3 and had issues with my urlrequest(url) method (see code at the bottom of the post) and how it handled the image data. I tried decoding with different formats such as utf-8 and utf-16, but according to the content headers it shows utf-8 format (I think).
Conclusion
I'm not quite sure if the problem is with downloading the image or with writing the image to the file.
If anyone can help me with this I'd be forever grateful! Also let me know what I can do to improve my posts in the future.
Thanks in advance.
Code
from urllib import urlopen
from json import loads
from sys import argv
import dateutil.parser as dateparser
import logging

# plugin your username and access_token (Token can be get and
# modified in the Explorer's Get Access Token button):
# https://graph.facebook.com/USER_NAME/photos?type=uploaded&fields=source&access_token=ACCESS_TOKEN_HERE
FACEBOOK_USER_ID = "**USER ID REMOVED"
FACEBOOK_ACCESS_TOKEN = "** TOKEN REMOVED - GET YOUR OWN **"

def get_logger(label='lvm_cli', level='INFO'):
    """
    Return a generic logger.
    """
    format = '%(asctime)s - %(levelname)s - %(message)s'
    logging.basicConfig(format=format)
    logger = logging.getLogger(label)
    logger.setLevel(getattr(logging, level))
    return logger

def urlrequest(url):
    """
    Make a url request
    """
    req = urlopen(url)
    data = req.read()
    return data

def get_json(url):
    """
    Make a url request and return as a JSON object
    """
    res = urlrequest(url)
    data = loads(res)
    return data

def get_next(data):
    """
    Get next element from facebook JSON response,
    or return None if no next present.
    """
    try:
        return data['paging']['next']
    except KeyError:
        return None

def get_images(data):
    """
    Get all images from facebook JSON response,
    or return None if no data present.
    """
    try:
        return data['data']
    except KeyError:
        return []

def get_all_images(url):
    """
    Get all images using recursion.
    """
    data = get_json(url)
    images = get_images(data)
    next = get_next(data)
    if not next:
        return images
    else:
        return images + get_all_images(next)

def get_url(userid, access_token):
    """
    Generates a useable facebook graph API url
    """
    root = 'https://graph.facebook.com/'
    endpoint = '%s/photos?type=uploaded&fields=source,updated_time&access_token=%s' % \
        (userid, access_token)
    return '%s%s' % (root, endpoint)

def download_file(url, filename):
    """
    Write image to a file.
    """
    data = urlrequest(url)
    path = 'C:/photos/%s' % filename
    f = open(path, 'w')
    f.write(data)
    f.close()

def create_time_stamp(timestring):
    """
    Creates a pretty string from time
    """
    date = dateparser.parse(timestring)
    return date.strftime('%Y-%m-%d-%H-%M-%S')

def download(userid, access_token):
    """
    Download all images to current directory.
    """
    logger = get_logger()
    url = get_url(userid, access_token)
    logger.info('Requesting image direct link, please wait..')
    images = get_all_images(url)
    for image in images:
        logger.info('Downloading %s' % image['source'])
        filename = '%s.jpg' % create_time_stamp(image['created_time'])
        download_file(image['source'], filename)

if __name__ == '__main__':
    download(FACEBOOK_USER_ID, FACEBOOK_ACCESS_TOKEN)
Answering the question of why @Alastair's solution from the comments worked:
f = open(path, 'wb')
From https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files:
On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it'll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn't hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
(I was on a Mac, which explains why the problem wasn't reproduced for me.)
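Applied to the download_file function from the question, the fix is just the mode change (a sketch of the corrected function, otherwise unchanged):

def download_file(url, filename):
    """
    Write image to a file.
    """
    data = urlrequest(url)
    path = 'C:/photos/%s' % filename
    f = open(path, 'wb')  # 'b' = binary mode, so image bytes are written untouched
    f.write(data)
    f.close()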
Alastair McCormack posted something that worked!
He said: "Try setting binary mode when you open the file for writing: f = open(path, 'wb')"
It is now successfully downloading the images correctly. Does anyone know why this worked?

Downloading contents of several html pages using python

I'm new to Python and was trying to figure out how to code a script that will download the contents of HTML pages. I was thinking of doing something like:
y = 0
while y != 500:
    x = "example.com/example/" + str(y)
    # (code to download file)
    y += 1
So y is the file name, and I need to download files from example.com/example/1 all the way up to file number 500, regardless of the file type.
Read this official docs page:
This module provides a high-level interface for fetching data across the World Wide Web.
In particular, the urlopen() function is similar to the built-in function open(), but accepts Universal Resource Locators (URLs) instead of filenames.
Some restrictions apply — it can only open URLs for reading, and no seek operations are available.
So you have code like this:
import urllib

content = urllib.urlopen("http://www.google.com").read()
# urllib.request.urlopen(...).read() in Python 3
The following code should meet your need. It will download 500 pages and save them to disk.
import urllib2

def grab_html(url):
    response = urllib2.urlopen(url)
    mimetype = response.info().getheader('Content-Type')
    return response.read(), mimetype

for i in range(500):
    filename = str(i)  # Use digit as filename
    url = "http://example.com/example/{0}".format(filename)
    contents, _ = grab_html(url)
    with open(filename, "w") as fp:
        fp.write(contents)
Notes:
If you need parallel fetching, here is a great example https://docs.python.org/3/library/concurrent.futures.html
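A minimal Python 3 sketch of that parallel approach, using a thread pool (the URL pattern reuses the example above; the worker count is just illustrative):

import concurrent.futures
import urllib.request

def fetch(url):
    # Download one page and return its URL together with its bytes
    with urllib.request.urlopen(url) as resp:
        return url, resp.read()

urls = ["http://example.com/example/{0}".format(i) for i in range(500)]
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    for url, body in pool.map(fetch, urls):
        # Save each page under its trailing path segment, as before
        with open(url.rsplit("/", 1)[-1], "wb") as fp:
            fp.write(body)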

can python be useful to open multiple tabs in a browser in one shot?

I am looking for a faster way to do my task. I have 40,000 downloadable file URLs, and I would like to download them to my local desktop. Currently, I place each link in the browser and download the file via a script. What I am looking for is to pass 10 URLs at a time in a chunk and have those 10 files download at the same time; if that is possible, I hope the overall time will decrease.
Sorry I was late to give the code; here it is:
def _download_file(url, filename):
    """
    Given a URL and a filename, this method will save a file locally to the
    destination_directory path.
    """
    if not os.path.exists(destination_directory):
        print 'Directory [%s] does not exist, Creating directory...' % destination_directory
        os.makedirs(destination_directory)
    try:
        urllib.urlretrieve(url, os.path.join(destination_directory, filename))
        print 'Downloading File [%s]' % (filename)
    except:
        print 'Error Downloading File [%s]' % (filename)

def _download_all(main_url):
    """
    Given a URL list, this method will download each file in the destination
    directory.
    """
    url_list = _create_url_list(main_url)
    for url in url_list:
        _download_file(url, _get_file_name(url))
Thanks,
Why use a browser? This seems like an XY problem.
To download files, I'd use a library like requests (or make a system call to wget).
Something like this:
import requests

def download_file_from_url(url, file_save_path):
    r = requests.get(url)
    if r.ok:  # checks if the download succeeded
        with open(file_save_path, 'wb') as f:  # binary mode, so image bytes aren't mangled
            f.write(r.content)
        return True
    else:
        return r.status_code

download_file_from_url('http://imgs.xkcd.com/comics/tech_support_cheat_sheet.png', 'new_image.png')
# will download image and save to current directory as 'new_image.png'
You first have to install requests using whatever Python package manager you prefer, e.g. pip install requests. You can also get fancier from there.
