As the question states, openpyxl reads files the way I need it to, but I don't know how to download the file from a SharePoint site and then read it with openpyxl.
The URL is something like this: http://teamsites.teamworks.net/sites/efit-eitecs-005/SiteAssets/Lists/Apr19/AllItems/Gluster-2019.15-OS.xlsm
I'm currently using the following code.
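import requests

# `a` holds the SharePoint URL and `auth` the credentials object; both are defined elsewhere.
resp = requests.get(a, auth=auth)
resp.raise_for_status()  # stop here if the server returned an error instead of the file
with open(r'C:\Users\Me\temp.xlsx', 'wb') as output:
    output.write(resp.content)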
Does anyone know how to do this? Should I be saving it as an .xlsm file instead?
I am trying to import data into Power BI using a Python script so that I can schedule it to refresh the data on a regular basis.
I am having trouble reading the data from an Excel file. The import fails with the error 'KeyError: "There is no item named 'xl/sharedStrings.xml' in the archive"'.
When I look inside the xlsx archive, the xl folder contains no sharedStrings.xml file, as there are no strings in the workbook. The file opens in Excel without any issues, but not with Python.
import pandas  # engine="openpyxl" requires the openpyxl package to be installed

globaltrackerdf = pandas.read_excel(
    r'C:\Users\Documents\Trackers\Tracker-Global Tracker_V2-2022-06-13.xlsx',
    sheet_name="Sheet1",
    engine="openpyxl",
)
Solution that worked for me: re-save the file in Excel. My file also opened fine in Excel, but when I unzipped it and looked inside there was no sharedStrings.xml. There seems to be a bug where saving an xlsx might not produce the sharedStrings.xml file. I found various ideas about why this might happen, but since I don't have access to the client's Excel I'm not sure what caused it.
For extra context on what an XLSX file is, I found this to be helpful: https://www.adimian.com/blog/fast-xlsx-parsing-with-python/
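Since an xlsx file is a zip archive, you can check for the missing part without unzipping anything by hand. This is a minimal sketch using only the standard library; the helper names `list_workbook_parts` and `has_shared_strings` are made up for illustration.

```python
import io
import zipfile

# Archive member that the KeyError in the question complains about.
SHARED_STRINGS = "xl/sharedStrings.xml"


def list_workbook_parts(source):
    """Return the names of all members of an .xlsx/.xlsm archive.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def has_shared_strings(source):
    """Report whether the archive contains xl/sharedStrings.xml."""
    return SHARED_STRINGS in list_workbook_parts(source)
```

Calling `has_shared_strings(path_to_workbook)` before and after re-saving the file in Excel should show whether the re-save added the part.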
How can I download the following file in Python? I have no trouble doing this in R. I believe the problem is the last row of the file, which changes. How can I change the code so that it works?
import pandas as pd
url = "https://ark-funds.com/wp-content/uploads/funds-etf-csv/ARK_INNOVATION_ETF_ARKK_HOLDINGS.csv"
test = pd.read_csv(url)
It is better to download the CSV file first, using the requests module.
You can then read the file from the download directory by passing the file path instead of the URL: pd.read_csv(download_path).
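The download-then-read flow might look roughly like the sketch below. To keep it dependency-free it uses the standard library's urllib.request and csv as stand-ins for requests and pd.read_csv. The function names and the User-Agent value are assumptions for illustration; nothing in the question confirms that the server requires a particular header.

```python
import csv
import shutil
import urllib.request


def download_file(url, download_path, user_agent="Mozilla/5.0"):
    """Save the response body of `url` to `download_path` and return the path.

    The User-Agent header is an assumption; drop it if the server does not care.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response, open(download_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return download_path


def read_csv_rows(download_path):
    """Read the downloaded file from disk and return its rows as lists of strings."""
    with open(download_path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

Once the file is on disk, the same path can be handed to pd.read_csv(download_path) instead of read_csv_rows.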
I want to access a zipped Excel sheet online using Python without downloading it to my PC. The link is https://www.richmondfed.org/-/media/richmondfedorg/research/regional_economy/surveys_of_business_conditions/manufacturing/zipfile/mfg_historicaldata.zip, which points to a zipped Excel file. Does anyone know how to handle this in Python? For example, I want to print the first row of the Excel file without unzipping and saving the file on my PC.
I found a similar question, "Downloading and unzipping a .zip file without writing to disk", but I cannot use its code to read the Excel file.
You can use pandas to read the excel file.
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
import pandas as pd
resp = urlopen("https://www.richmondfed.org/-/media/richmondfedorg/research/regional_economy/surveys_of_business_conditions/manufacturing/zipfile/mfg_historicaldata.zip")
zipfile = ZipFile(BytesIO(resp.read()))
extracted_file = zipfile.open(zipfile.namelist()[0])
print(pd.read_excel(extracted_file))
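The code above takes the first entry returned by namelist(), which assumes the workbook is the first member of the archive. If the zip might contain other files, a sketch like the following picks members by file extension instead. The helper names and the list of suffixes are assumptions for illustration.

```python
from io import BytesIO
from zipfile import ZipFile

# File extensions treated as Excel workbooks (an assumption; adjust as needed).
EXCEL_SUFFIXES = (".xlsx", ".xlsm", ".xls")


def excel_members(zip_bytes):
    """Return the names of archive members that look like Excel files."""
    with ZipFile(BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(EXCEL_SUFFIXES)]


def read_member(zip_bytes, name):
    """Return one member's bytes wrapped in BytesIO, entirely in memory."""
    with ZipFile(BytesIO(zip_bytes)) as archive:
        return BytesIO(archive.read(name))
```

With `data = resp.read()`, the object returned by `read_member(data, excel_members(data)[0])` can be passed to pd.read_excel in the same way as extracted_file.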
I'm new to Python and having trouble with Excel manipulation in Python.
Here's my situation: I'm using requests to get a .xls file from a web server. I then save the content to an Excel file and read it with xlrd. I'm only interested in one value from that file, and I'm retrieving thousands of files from different URLs.
I want to know how I could handle the content I get from requests in some other way than creating a new file.
I've included my code below, and I'd welcome comments on how I could improve it. It also doesn't work, because I'm trying to save new content into an already created Excel file, and I couldn't figure out how to delete the contents of that file to make my code work (even if that's not efficient).
import requests
import xlrd

d = {}
for year in string_of_years:
    for month in string_of_months:
        dls = "http://.../name_year_month.xls"
        resp = requests.get(dls)
        # Opening with 'wb' truncates temp.xls, so each download replaces the last one
        with open('temp.xls', 'wb') as output:
            output.write(resp.content)
        workbook = xlrd.open_workbook('temp.xls')
        worksheet = workbook.sheet_by_name(mysheet_name)
        num_rows = worksheet.nrows
        for k in range(num_rows):
            if condition I'm looking for:
                w = {key_year_month: worksheet.cell_value(k, 0)}
                d.update(w)
                break
xlrd.open_workbook can accept a string for the file data instead of the file name. Your code could pass the contents of the XLS, rather than creating a file and passing its name.
Try this:
# UNTESTED
resp = requests.get(dls)
workbook = xlrd.open_workbook(file_contents=resp.content)
Reference: xlrd.open_workbook documentation
Save the file, then delete it with os.remove at the end of each loop iteration, once you are done with it.
import os
# Your stuff here
os.remove('temp.xls')  # path to the temporary file
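If you do go through a file on disk, the standard library's tempfile module can choose a unique path for you, and a try/finally block ensures the file is removed even when processing fails. This is a minimal sketch; `with_temp_file` and its `process` callback are hypothetical names, where `process` stands for whatever reads the workbook (for example, a function that calls xlrd.open_workbook(path)).

```python
import os
import tempfile


def with_temp_file(content, process, suffix=".xls"):
    """Write `content` to a temporary file, call process(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # mkstemp returns an open file descriptor; wrap it so it gets closed.
        with os.fdopen(fd, "wb") as handle:
            handle.write(content)
        return process(path)
    finally:
        # Runs whether process() succeeded or raised.
        os.remove(path)
```

Inside the download loop this would be called as `with_temp_file(resp.content, process)`.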
I am still pretty new to Python, so perhaps I am missing something obvious. I am trying to download a simple spreadsheet from Google Docs, save the file, and open it in Excel. When I did a test run with text files instead of Excel files, it worked fine. However, with xls and xlsx, when Excel opens the newly downloaded file it says that the data is corrupted. How can I fix this?
import urllib2
print "Downloading..."
myfile = urllib2.urlopen("https://docs.google.com/spreadsheet/pub?key=0AoJYUIVnE85odGZxVHkybGxYRXF1TFpuQXdqZlJwNXc&output=xls")
output = open('C:\\Users\\Lucas\\Desktop\\downloaded.xlsx', 'w')
output.write(myfile.read())
output.close()
print "Done"
import subprocess
subprocess.call(['C:\\Program Files (x86)\\Microsoft Office\\Office14\\EXCEL.exe', 'C:\\Users\\Lucas\\Desktop\\downloaded.xlsx'])
You want to open the file in 'wb' mode; take a look at the docs for open.
You're writing the file in text mode. Excel documents are not plain text, so treating them as text mishandles the content: on Windows, text mode translates newline bytes as they are written, which corrupts binary data.
To use data as-is, with zero assumptions about its format, you use binary mode. Here:
output = open('C:\\Users\\Lucas\\Desktop\\downloaded.xlsx', 'wb')
Notice the 'b' flag at the end.
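The question's code is Python 2, where text mode silently accepts binary data. For comparison, here is a small Python 3 sketch of the same point; the function names are made up for illustration. In Python 3, writing bytes to a text-mode file fails with a TypeError, while binary mode writes the bytes unchanged.

```python
def save_bytes(path, data):
    """Write raw bytes unchanged by opening the file in binary mode."""
    with open(path, "wb") as output:
        output.write(data)


def load_bytes(path):
    """Read the file back as raw bytes."""
    with open(path, "rb") as source:
        return source.read()


def text_mode_rejects_bytes(path, data):
    """Return True if writing bytes through a text-mode file raises TypeError."""
    try:
        with open(path, "w") as output:
            output.write(data)
    except TypeError:
        return True
    return False
```

Comparing `load_bytes(path)` with the original download after `save_bytes(path, data)` is a quick way to confirm nothing was altered on the way to disk.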