How do I download an xlsm file and read it using openpyxl?

As the question states, openpyxl reads files the way I need it to, but I don't know how to download the file from a SharePoint site and read it with openpyxl.
The URL is something like this: http://teamsites.teamworks.net/sites/efit-eitecs-005/SiteAssets/Lists/Apr19/AllItems/Gluster-2019.15-OS.xlsm
I'm currently using the following code.
import requests

# a: URL of the .xlsm file; auth: credentials for the SharePoint site
resp = requests.get(a, auth=auth).content
with open(r'C:\Users\Me\temp.xlsx', 'wb') as output:
    output.write(resp)
Does anyone know the answer? Should I be saving it as an .xlsm file instead? I don't know what to do.
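A minimal sketch of one way to do this, assuming requests can reach the file and that auth is whatever credentials object your SharePoint site accepts (many SharePoint sites need NTLM or another scheme rather than basic auth, so treat that part as an assumption). openpyxl's load_workbook accepts a file-like object, so the downloaded bytes can be parsed from memory; writing a copy to disk is optional, and an .xlsm name fits a macro-enabled file. keep_vba=True only matters if you later save the workbook and want to keep its macros. The helper names download_xlsm, load_workbook_from_bytes and save_copy are hypothetical.

```python
from io import BytesIO

import openpyxl
import requests


def download_xlsm(url, auth=None, timeout=30.0):
    """Fetch the raw bytes of the workbook (hypothetical helper)."""
    resp = requests.get(url, auth=auth, timeout=timeout)
    # Fail loudly on 401/404 instead of handing an HTML error page to openpyxl.
    resp.raise_for_status()
    return resp.content


def load_workbook_from_bytes(data, keep_vba=True):
    """Parse .xlsm/.xlsx bytes with openpyxl without touching the disk.

    keep_vba=True keeps the VBA project so a later wb.save("copy.xlsm")
    does not drop the macros; it is not needed just to read cell values.
    """
    return openpyxl.load_workbook(BytesIO(data), keep_vba=keep_vba)


def save_copy(data, path):
    """Optional: write the bytes unchanged (binary mode) to `path`."""
    with open(path, "wb") as fh:
        fh.write(data)
```

For example, wb = load_workbook_from_bytes(download_xlsm(url, auth=auth)) gives a workbook whose wb.sheetnames lists the sheets; call save_copy(data, r'C:\Users\Me\temp.xlsm') first if you also want the file on disk.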

Related

I Keep On Getting this error: KeyError: "There is no item named 'xl/sharedStrings.xml' in the archive" when I try to load an Excel WorkSheet [duplicate]

I am trying to import data into Power BI using a Python script so that I can schedule it to refresh the data on a regular basis.
I am having trouble getting the data from an Excel file: I get the error 'KeyError: "There is no item named 'xl/sharedStrings.xml' in the archive"' while importing.
When I look inside the archive of the .xlsx file, the xl folder has no sharedStrings.xml file, as there are no strings in the workbook. The file opens properly in Excel without any issues, but not with Python.
import openpyxl
import pandas
import xlrd
import os
globaltrackerdf = pandas.read_excel(r'C:\Users\Documents\Trackers\Tracker-Global Tracker_V2-2022-06-13.xlsx', sheet_name="Sheet1", engine="openpyxl")

Solution that worked for me: re-save the file in Excel. My file also opened fine in Excel, but when I opened it as a ZIP archive and looked inside, there was no sharedStrings.xml. There seems to be a bug where saving an .xlsx might not produce the sharedStrings.xml file. I found various ideas about why it might happen, but since I don't have access to the client's Excel, I'm not sure what caused it.
For extra context on what an XLSX file is, I found this to be helpful: https://www.adimian.com/blog/fast-xlsx-parsing-with-python/
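Because an .xlsx file is a ZIP archive, the mismatch can be checked with the standard library before re-saving. The sketch below (shared_strings_status is a hypothetical helper name) compares the shared-strings parts that [Content_Types].xml declares with the files actually present in the archive. openpyxl locates the shared-strings part through that manifest, so a declared-but-missing part is consistent with the KeyError above.

```python
import xml.etree.ElementTree as ET
import zipfile

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
SST_TYPE = ("application/vnd.openxmlformats-officedocument."
            "spreadsheetml.sharedStrings+xml")


def shared_strings_status(xlsx):
    """Map each shared-strings part declared in [Content_Types].xml to
    whether that part really exists in the archive.

    `xlsx` can be a path or a binary file object. An empty dict means no
    shared-strings table is declared; {"xl/sharedStrings.xml": False}
    means it is declared but missing, i.e. the broken case.
    """
    with zipfile.ZipFile(xlsx) as archive:
        names = set(archive.namelist())
        manifest = ET.fromstring(archive.read("[Content_Types].xml"))
    declared = [
        override.get("PartName", "").lstrip("/")
        for override in manifest.iter(CT_NS + "Override")
        if override.get("ContentType") == SST_TYPE
    ]
    return {part: part in names for part in declared}
```

For example, shared_strings_status(r'C:\Users\Documents\Trackers\Tracker-Global Tracker_V2-2022-06-13.xlsx') returning {'xl/sharedStrings.xml': False} would point to the declared-but-missing case.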

How to Read CSV from url in pandas? - error tokenizing data

How can I download the following file in Python? I have no issue doing this in R. I believe the issue is the last row of the file, which will change. How can I change the code to make it work?
import pandas as pd
url = "https://ark-funds.com/wp-content/uploads/funds-etf-csv/ARK_INNOVATION_ETF_ARKK_HOLDINGS.csv"
test = pd.read_csv(url)

You're better off downloading the CSV file first with the requests module.
Then you can read the file from the download directory by passing the file path instead of the URL (pd.read_csv(download_path)).
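A minimal sketch of that approach with requests and pandas, parsing from an in-memory buffer instead of a download path (writing resp.content to a file and passing its path to pd.read_csv works the same way). Two choices here go beyond the answer and are assumptions: the browser-like User-Agent header, in case the server rejects the default one, and on_bad_lines="skip" (pandas 1.3 and later), which drops rows with more fields than the header, such as a free-text footer, instead of raising the tokenizing error. fetch_csv_bytes and read_csv_bytes are hypothetical helper names.

```python
from io import BytesIO

import pandas as pd
import requests


def fetch_csv_bytes(url, timeout=30.0):
    # Assumption: the server may refuse requests without a browser-like User-Agent.
    headers = {"User-Agent": "Mozilla/5.0"}
    resp = requests.get(url, headers=headers, timeout=timeout)
    resp.raise_for_status()
    return resp.content


def read_csv_bytes(data):
    # on_bad_lines="skip" (pandas >= 1.3) silently drops rows that have more
    # fields than the header instead of raising "Error tokenizing data".
    return pd.read_csv(BytesIO(data), on_bad_lines="skip")
```

With the URL from the question, df = read_csv_bytes(fetch_csv_bytes(url)); inspect the tail of the raw text first if you need to know exactly which footer rows were dropped.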

Get access to zipped excel sheet online without saving it using python

I want to access a zipped Excel sheet online using Python without downloading it to my PC. The link is as follows: https://www.richmondfed.org/-/media/richmondfedorg/research/regional_economy/surveys_of_business_conditions/manufacturing/zipfile/mfg_historicaldata.zip
It points to a zipped Excel file. Does anyone know how to handle it with Python? For example, I want to print the first row of the spreadsheet without unzipping it and saving the file to my PC.
I have found a similar question, "Downloading and unzipping a .zip file without writing to disk", but I cannot use its code to read the Excel file.

You can use pandas to read the Excel file.
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
import pandas as pd
resp = urlopen("https://www.richmondfed.org/-/media/richmondfedorg/research/regional_economy/surveys_of_business_conditions/manufacturing/zipfile/mfg_historicaldata.zip")
zipfile = ZipFile(BytesIO(resp.read()))
extracted_file = zipfile.open(zipfile.namelist()[0])
print(pd.read_excel(extracted_file))
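The answer assumes the Excel file is the first entry in the archive. A slightly more defensive sketch (first_excel_member and read_first_row are hypothetical helper names) picks the first member whose name ends in an Excel extension and reads only the first row, using header=None and nrows=1 so that row comes back as data rather than being used as column names. pandas needs openpyxl installed for .xlsx and xlrd for .xls.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd

EXCEL_SUFFIXES = (".xlsx", ".xlsm", ".xls")


def first_excel_member(archive):
    """Return the name of the first member that looks like an Excel file."""
    for name in archive.namelist():
        if name.lower().endswith(EXCEL_SUFFIXES):
            return name
    raise ValueError("no Excel file found in the archive")


def read_first_row(zip_bytes):
    """Read only the first row of the first Excel file inside the ZIP bytes."""
    with ZipFile(BytesIO(zip_bytes)) as archive:
        # Copy the member into memory so pandas gets a plain seekable buffer.
        data = archive.read(first_excel_member(archive))
    return pd.read_excel(BytesIO(data), header=None, nrows=1)
```

With the download from the answer, print(read_first_row(resp.read())) prints just that first row.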

Python: How to handle excel data from web without saving file

I'm new to Python and having trouble manipulating Excel files in Python.
Here's my situation: I'm using requests to get a .xls file from a web server. After that I save the content to an Excel file and open it with xlrd. I'm only interested in one value in that file, and I'm retrieving thousands of files from different URLs.
I want to know how I could handle the content I get from requests in some other way rather than creating a new file.
I've included my code below; comments on how I could improve it are welcome. Also, it doesn't work, since I'm trying to save new content into an already created Excel file (and I couldn't figure out how to delete the contents of that file for my code to work, even if that's not efficient).
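import requests
import xlrd

# string_of_years, string_of_months, mysheet_name and key_year_month are defined elsewhere
d = {}
for year in string_of_years:
    for month in string_of_months:
        dls = "http://.../name_year_month.xls"
        resp = requests.get(dls)
        # Save the download to a temporary file, then open it with xlrd
        with open('temp.xls', 'wb') as output:
            output.write(resp.content)
        workbook = xlrd.open_workbook('temp.xls')
        worksheet = workbook.sheet_by_name(mysheet_name)
        num_rows = worksheet.nrows
        for k in range(num_rows):
            if condition_im_looking_for:  # hypothetical placeholder for the real row test
                w = {key_year_month: worksheet.cell_value(k, 0)}
                d.update(w)  # store the matching value for this year/month
                break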

xlrd.open_workbook can accept the file data itself, through its file_contents argument, instead of a file name. Your code could pass the contents of the XLS directly rather than creating a file and passing its name.
Try this:
# UNTESTED
resp = requests.get(dls)
workbook = xlrd.open_workbook(file_contents=resp.content)
Reference: xlrd.open_workbook documentation
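Applied to the whole loop from the question, that might look like the sketch below. The predicate argument stands in for the question's "condition I'm looking for", and urls is assumed to be a mapping from each result key (such as key_year_month) to its download URL; both, like the helper names first_match and collect_values, are hypothetical. Note that xlrd 2.0 and later only read legacy .xls files, which is what this question downloads.

```python
import requests
import xlrd


def first_match(sheet, predicate, col=0):
    """Return sheet.cell_value(row, col) for the first row where
    predicate(sheet, row) is true, or None if no row matches."""
    for row in range(sheet.nrows):
        if predicate(sheet, row):
            return sheet.cell_value(row, col)
    return None


def collect_values(urls, sheet_name, predicate, timeout=30.0):
    """Download each .xls in `urls` (key -> URL) and keep one value per file."""
    results = {}
    for key, url in urls.items():
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
        # Parse from memory: no temp.xls to overwrite or delete.
        book = xlrd.open_workbook(file_contents=resp.content)
        value = first_match(book.sheet_by_name(sheet_name), predicate)
        if value is not None:
            results[key] = value
    return results
```

Keeping the row test in a separate function makes it easy to check against a small fake sheet before running it over thousands of downloads.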

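Alternatively, save it and then delete the file with os at the end of each loop iteration, once you're done with it.
import os
# Your stuff here
os.remove(temp_file_path)  # temp_file_path: hypothetical variable holding the path to the temporary file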
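If you keep the temporary-file approach, a try/finally ensures the file is removed even when parsing fails, and the standard tempfile module avoids reusing one fixed file name. A minimal sketch; parse_via_temp_file and its reader argument are hypothetical names, and reader must be done with the file before it returns (on Windows an open file cannot be deleted).

```python
import os
import tempfile


def parse_via_temp_file(data, reader, suffix=".xls"):
    """Write `data` to a fresh temporary file, return reader(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return reader(path)
    finally:
        os.remove(path)  # runs on success and on error alike
```

Inside the loop, workbook = parse_via_temp_file(resp.content, xlrd.open_workbook) would replace the write-then-open steps.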

Download Excel Spreadsheet Python

I am still pretty new to Python, so perhaps I am missing something obvious. I am trying to download a simple spreadsheet from Google Docs, save the file, and open it in Excel. When I did a test run with text files instead of excel files, it worked fine. However, using xls and xlsx, when excel opens the newly downloaded file, it says that the data is corrupted. How can I fix this?
import urllib2
print "Downloading..."
myfile = urllib2.urlopen("https://docs.google.com/spreadsheet/pub?key=0AoJYUIVnE85odGZxVHkybGxYRXF1TFpuQXdqZlJwNXc&output=xls")
output = open('C:\\Users\\Lucas\\Desktop\\downloaded.xlsx', 'w')
output.write(myfile.read())
output.close()
print "Done"
import subprocess
subprocess.call(['C:\\Program Files (x86)\\Microsoft Office\\Office14\\EXCEL.exe', 'C:\\Users\\Lucas\\Desktop\\downloaded.xlsx'])

You would want to open the file in 'wb' (binary write) mode; see the documentation for open().

You're writing the file in text mode. Excel documents are not plain text, so under that assumption their content gets mishandled (on Windows, for example, every \n byte is written out as \r\n).
To use data as-is, with zero assumptions about its format, you use binary mode. Here:
output = open('C:\\Users\\Lucas\\Desktop\\downloaded.xlsx', 'wb')
Notice the 'b' flag at the end.
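For reference, a Python 3 sketch of the same fix (urllib2 became urllib.request in Python 3): stream the response into a file opened with 'wb'. save_binary is a hypothetical helper name. Since the Google Docs link asks for output=xls, saving it under a .xls name rather than .xlsx is probably also needed; otherwise Excel may warn that the file format and extension do not match.

```python
import shutil
import urllib.request


def save_binary(url, path):
    """Download `url` to `path` byte for byte; return the number of bytes written."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)  # no text decoding or newline translation
        return out.tell()
```

For example, save_binary(url, 'C:\\Users\\Lucas\\Desktop\\downloaded.xls') with the Google Docs URL from the question.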
