From Google Colab, I am trying to create a DataFrame from an .xlsx file that I have in a GitHub repo.
As the URL I use the permalink from GitHub; the repo is public and my account is connected to Colab.
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n\n<!'
Thank you in advance for your help!
Maybe the problem is due to the URL that you are using.
Try this to see what requests.get returns:
url = "https://github.com/your-user-name/your-repo-name/blob/main/data/raw/your-file-name.xlsx"
import requests
from pprint import pprint
response = requests.get(url)
pprint(response.content)
It is an HTML page. This is not what you want.
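A quick way to confirm this without reading the whole dump is to look at the first bytes of the response. The sketch below assumes the usual file signatures: an .xlsx file is a ZIP archive and starts with `PK\x03\x04`, a legacy .xls file starts with the OLE2 signature, and an HTML page starts (after whitespace) with `<`. The function name `sniff_payload` is made up for illustration.

```python
def sniff_payload(content: bytes) -> str:
    """Guess what a download returned by looking at its first bytes."""
    # .xlsx files are ZIP archives, which start with this signature.
    if content.startswith(b"PK\x03\x04"):
        return "zip/xlsx"
    # Legacy .xls files use the OLE2 compound-file signature.
    if content.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"
    # Markup such as an HTML page starts with "<" after optional whitespace.
    if content.lstrip()[:1] == b"<":
        return "html/xml"
    return "unknown"


# The bytes quoted in the XLRDError message are the start of an HTML page.
print(sniff_payload(b"\n\n\n\n\n\n<!DOCTYPE html>"))  # prints "html/xml"
```

With the code above, `sniff_payload(response.content)` can be called before handing the data to pandas.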
There are a couple of things you can do to solve this. This Medium post might be useful.
However, one simple fix is to use a URL like the example below:
https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx
I've already tried this and it works.
import requests
import pandas as pd
url = "https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx"
response = requests.get(url)
dest = 'local-file.xlsx'
with open(dest, 'wb') as file:
    file.write(response.content)
frame = pd.read_excel(dest)
frame.head()
Conclusion: change your URL.
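If you would rather keep copying the permalink from the GitHub page, the rewrite to the raw URL can be done in code. This is only a sketch: `github_blob_to_raw` is a hypothetical helper, and it assumes the usual `https://github.com/<user>/<repo>/blob/<ref>/<path>` layout.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Turn a github.com 'blob' page URL into a raw.githubusercontent.com URL."""
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    # Expected path segments: user, repo, "blob", ref, then the file path.
    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a blob URL: {url}")
    user, repo, _, *rest = segments
    raw_path = "/" + "/".join([user, repo, *rest])
    # Any query string (for example ?raw=true) is dropped; it is not needed here.
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


page_url = "https://github.com/your-user-name/your-repo-name/blob/main/data/raw/your-file-name.xlsx"
print(github_blob_to_raw(page_url))
```

The result can be passed to requests.get or directly to pd.read_excel.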
Use the link behind the "View raw" button. For my file I use the URL below:
url = 'https://github.com/mehadisaki/Sales-Forecasting-model-development-/blob/main/TV%20Delivery_2016-2022.xlsx?raw=true'
db=pd.read_excel(url)
With Google Colab one thing you could do is use the wget command, like this.
!wget "https://raw.githubusercontent.com/your-username/name-of-the-repository/master/name-of-the-file.xlsx"
Related
I am using Python 3.8.12. I tried the following code to download files from URLs with the requests package, but got an 'Unknown file format' message when opening the zip file. I tested different zip URLs, but every downloaded zip file is 18 KB and none of them can be opened.
import requests
file_url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
file_download = requests.get(file_url, allow_redirects=True, stream=True)
open(save_path+file_name, 'wb').write(file_download.content)
[Screenshot: zip file opening error message]
[Screenshot: zip file sizes]
However, once I changed the URL to file_url = 'https://www.td.gov.hk/datagovhk_tis/mttd-csv/en/table41a_eng.csv' the code worked well and the CSV file downloaded perfectly.
I tried the requests, urllib, wget, and zipfile/io packages, but none of them worked.
The reason may be that the zip URL points to a web page as well as the zip file, while the CSV URL points to the CSV file only.
I am really new to this field. Could anyone help? Thanks a lot!
You can send a HEAD request and examine the response headers to get information about the file; the Content-Type header reveals its actual type:
import requests
file_url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
r = requests.head(file_url)
print(r.headers["Content-Type"])
This gives the output
text/html
So the URL you have actually points to an HTML page, not to the zip file.
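The same check can be made on the downloaded bytes, which helps when a server reports a misleading Content-Type. The sketch below uses the standard library's `zipfile.is_zipfile` on an in-memory buffer; `is_real_zip` is a hypothetical helper name, and the demo builds its own tiny archive instead of touching the network.

```python
import io
import zipfile


def is_real_zip(content: bytes) -> bool:
    """Return True if the bytes look like a ZIP archive."""
    return zipfile.is_zipfile(io.BytesIO(content))


# Build a small archive in memory to stand in for a genuine download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")

print(is_real_zip(buffer.getvalue()))                 # True
print(is_real_zip(b"<!DOCTYPE html><html></html>"))   # False
```

Calling `is_real_zip(file_download.content)` before writing the file would flag the 18 KB HTML page immediately.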
import wget
url = 'https://www.censtatd.gov.hk/en/EIndexbySubject.html?pcode=D5600091&scode=300&file=D5600091B2022MM11B.zip'
#url = 'https://golang.org/dl/go1.17.3.windows-amd64.zip'
wget.download(url)
I am trying to download a file or folder from my GitLab repository, but the only way I have seen to do it is with curl on the command line. Is there any way to download files from the repository with just the python-gitlab API? I have read through the API documentation and have not found anything; other posts said it was possible but gave no solution.
You can do it like this:
import requests
response = requests.get('https://<your_path>/file.txt')
data = response.text
and then save the contents (data) as file...
Otherwise use the API:
f = project.files.get(file_path='<folder>/file.txt', ref='<branch or commit>')
and then decode using:
import base64
content = base64.b64decode(f.content)
and then save content as file...
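The "save content as file" step can be sketched as follows. The helper name `save_base64_content` is hypothetical, and the demo encodes its own sample bytes so that it runs without a GitLab server; in practice `encoded` would be `f.content` from the call above.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_content(encoded, dest) -> int:
    """Decode base64 text and write the raw bytes to dest; return bytes written."""
    raw = base64.b64decode(encoded)
    return Path(dest).write_bytes(raw)


# Stand-in for f.content: base64 text as an API would return it (assumption).
encoded = base64.b64encode(b"hello from the repo\n").decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "file.txt"
    written = save_base64_content(encoded, target)
    restored = target.read_bytes()

print(written, restored)
```

Writing bytes rather than text keeps binary files such as images or archives intact.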
I found this code that uploads a file from a direct link to Google Drive using Google Colab, but I have to edit the code each time I want to add a URL.
Can anyone change the code so that I can enter the URL in a form instead of editing the code, and maybe use the form to name the file manually? Automatic naming would be fine too, like "1.mp4", "2.mp4" and so on.
This is the code:
import requests
file_url = "http://1.droppdf.com/files/5iHzx/automate-the-boring-stuff-with-python-2015-.pdf"
r = requests.get(file_url, stream = True)
with open("/content/gdrive/My Drive/python.pdf", "wb") as file:
    for block in r.iter_content(chunk_size = 1024):
        if block:
            file.write(block)
You can make the file URL a form parameter by adding #@param {type:"string"} to the line:
file_url = "http://1.droppdf.com/files/5iHzx/automate-the-boring-stuff-with-python-2015-.pdf" #@param {type:"string"}
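The naming part of the question can be handled in the same cell. The sketch below is one possible approach, not the only one: `file_name`, `choose_file_name`, `counter`, and `default_ext` are all hypothetical names. A manually entered name wins; otherwise the last path segment of the URL is used; otherwise a numbered name such as "1.mp4" is generated. Outside Colab the `#@param` markers are ordinary comments, so the code also runs as plain Python.

```python
import posixpath
from urllib.parse import unquote, urlsplit

file_url = "http://1.droppdf.com/files/5iHzx/automate-the-boring-stuff-with-python-2015-.pdf"  #@param {type:"string"}
file_name = ""  #@param {type:"string"}


def choose_file_name(url, manual_name="", counter=1, default_ext=".mp4"):
    """Pick a local file name: manual entry, else URL basename, else a number."""
    if manual_name.strip():
        return manual_name.strip()
    # Last path segment of the URL, with %20 and similar escapes decoded.
    base = unquote(posixpath.basename(urlsplit(url).path))
    if base:
        return base
    return f"{counter}{default_ext}"


print(choose_file_name(file_url, file_name))
```

The chosen name can then be appended to "/content/gdrive/My Drive/" when opening the output file.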
I have a web link that downloads an Excel file directly. It opens a page saying "your file is downloading" and then starts downloading the file.
Is there any way I can automate this using the requests module?
I am able to do it with Selenium, but I want it to run in the background, so I was wondering whether I can use the requests module.
I have used requests.get, but it simply returns the text, i.e. "your file is downloading", and I am not able to get the file.
This Python 3 code downloads a file from the web into memory:
import requests
from io import BytesIO
url = 'your.link/path'
def get_file_data(url):
    response = requests.get(url)
    f = BytesIO()
    for chunk in response.iter_content(chunk_size=1024):
        f.write(chunk)
    f.seek(0)
    return f
data = get_file_data(url)
You can use the following code to read the Excel file:
import pandas as pd
xlsx = pd.read_excel(data, skiprows=0)
print(xlsx)
It sounds like you don't actually have a direct URL to the file, and instead need to engage with some JavaScript. Perhaps there is an underlying network call, which you can find by inspecting the page traffic in your browser, that shows a direct URL for downloading the file. With that URL you can read the Excel file directly with pandas:
import pandas as pd
url = "https://example.com/some_file.xlsx"
df = pd.read_excel(url)
print(df)
This is nice and tidy, but if you really want to use requests (or avoid pandas) you can download the raw file content as shown in this answer and then use the pyexcel_xlsx package's get_data function to read it without any pandas involvement.
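If the "your file is downloading" page happens to carry the real file URL in its HTML (an assumption; many pages build it in JavaScript, in which case this finds nothing), the link can be pulled out with the standard library. The sketch below checks `<a href>` links and `<meta http-equiv="refresh">` targets; `LinkCollector`, `find_file_links`, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> and <meta http-equiv=refresh> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "meta" and (attributes.get("http-equiv") or "").lower() == "refresh":
            # The content looks like "0; url=/files/report.xlsx".
            _, _, target = (attributes.get("content") or "").partition("=")
            target = target.strip().strip("'\"")
            if target:
                self.links.append(urljoin(self.base_url, target))


def find_file_links(html, base_url, suffixes=(".xlsx", ".xls")):
    """Return the links in html whose path ends with one of the suffixes."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links
            if urlsplit(link).path.lower().endswith(suffixes)]


sample_page = """<html><head>
<meta http-equiv="refresh" content="0; url=/files/report.xlsx">
</head><body>Your file is downloading. <a href="/help">Help</a></body></html>"""

print(find_file_links(sample_page, "https://example.com/download?id=1"))
```

In practice `sample_page` would be `response.text` from the requests.get call that returned the placeholder page.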
I am working on a project and I want to download a CSV file from a URL. I did some research on this site, but none of the solutions presented worked for me.
The URL directly offers to download or open the file, but I do not know how to tell Python to save the file (it would be nice if I could also rename it).
When I open the URL with this code, nothing happens:
import urllib.request
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
testfile = urllib.request.urlopen(url)
Any ideas?
Try this. Change "folder" to a folder on your machine
import os
import requests
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
response = requests.get(url)
with open(os.path.join("folder", "file"), 'wb') as f:
    f.write(response.content)
You can adapt an example from the docs
import urllib.request
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
with urllib.request.urlopen(url) as testfile, open('dataset.csv', 'w') as f:
    f.write(testfile.read().decode())
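On the renaming part of the question: servers that offer a file for download often suggest a name in the Content-Disposition response header, and you are free to ignore it and use your own. The sketch below parses that header with the standard library. `filename_from_headers` and the sample header value are hypothetical, and whether this particular server sends the header is an assumption.

```python
import os
from email.message import Message


def filename_from_headers(headers, fallback="dataset.csv"):
    """Return the file name suggested by Content-Disposition, else fallback."""
    # Note: a plain dict is case-sensitive; requests' response.headers is not.
    value = headers.get("Content-Disposition")
    if not value:
        return fallback
    message = Message()
    message["Content-Disposition"] = value
    suggested = message.get_filename()
    # Keep only the final path component so the server cannot pick a directory.
    return os.path.basename(suggested) if suggested else fallback


sample_headers = {"Content-Disposition": 'attachment; filename="dechets.csv"'}
print(filename_from_headers(sample_headers))
print(filename_from_headers({}))
```

With requests, `response.headers` can be passed in directly; the returned name then replaces the hard-coded 'dataset.csv' in the open call.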