I'm new to Python and having trouble dealing with Excel manipulation in Python.
So here's my situation: I'm using requests to get a .xls file from a web server. After that I'm using xlrd to read the content of the saved Excel file. I'm only interested in one value from each file, and there are thousands of files I'm retrieving from different URL addresses.
I want to know how I could handle the contents I get from the request in some other way, rather than creating a new file.
I've also included comments in my code on how I could improve it. Besides that, it doesn't work, since I'm trying to save new content into an already created Excel file (and I couldn't figure out how to delete the contents of that file for my code to work, even if it's not efficient).
import requests
import xlrd

d = {}
for year in string_of_years:
    for month in string_of_months:
        dls = "http://.../name_year_month.xls"
        resp = requests.get(dls)
        # Save the response body to a temp file so xlrd can open it
        output = open('temp.xls', 'wb')
        output.write(resp.content)
        output.close()
        workbook = xlrd.open_workbook('temp.xls')
        worksheet = workbook.sheet_by_name(mysheet_name)
        num_rows = worksheet.nrows
        for k in range(num_rows):
            if condition_im_looking_for:  # placeholder for the row test
                w = {key_year_month: worksheet.cell_value(k, 0)}
                d.update(w)  # was dic.update(w); the dict is named d
                break
xlrd.open_workbook can accept the file contents (via its file_contents argument) instead of a file name. Your code could pass the contents of the XLS directly, rather than creating a file and passing its name.
Try this:
# UNTESTED
resp = requests.get(dls)
workbook = xlrd.open_workbook(file_contents=resp.content)
Reference: xlrd.open_workbook documentation
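Folded into the loop from the question, the whole fetch-and-read could look like this (still untested; string_of_years, string_of_months, mysheet_name, key_year_month, and the row condition are placeholders carried over from the question):
import requests
import xlrd

d = {}
for year in string_of_years:
    for month in string_of_months:
        dls = "http://.../name_year_month.xls"
        resp = requests.get(dls)
        # No temp file: hand the downloaded bytes straight to xlrd
        workbook = xlrd.open_workbook(file_contents=resp.content)
        worksheet = workbook.sheet_by_name(mysheet_name)
        for k in range(worksheet.nrows):
            if condition_im_looking_for:  # placeholder row test
                d[key_year_month] = worksheet.cell_value(k, 0)
                break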
Alternatively, save it and then delete the file on each loop, after the work, with os:
import os
# ... your work with temp.xls here ...
os.remove('temp.xls')  # path to the temp file
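If you do keep a temp file, the standard tempfile module is a slightly more robust variant; a sketch, reusing resp from the question's loop:
import os
import tempfile
import xlrd

# delete=False so the file can be reopened on Windows; removed manually below
with tempfile.NamedTemporaryFile(suffix='.xls', delete=False) as tmp:
    tmp.write(resp.content)
    tmp_path = tmp.name
workbook = xlrd.open_workbook(tmp_path)
# ... read the value you need ...
os.remove(tmp_path)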
How do I make a dataset that shows historic data from snapshots?
I have a csv file that is updated and overwritten with new snapshot data once a day. I would like to make a Python script that regularly appends the current snapshot to a historic dataset.
One way I thought of was the following:
import pandas as pd
# Read csv-file
snapshot = pd.read_csv('C:/source/snapshot_data.csv')
# Try to read potential trend-data
try:
    historic = pd.read_csv('C:/merged/historic_data.csv')
    # Merge the two dfs and write back to historic file-path
    historic.merge(snapshot).to_csv('C:/merged/historic_data.csv')
except:
    snapshot.to_csv('C:/merged/historic_data.csv')
However, I don't like the fact that I use a try/except block to get the historic data if the file path exists, or to write the snapshot data to the historic path if it doesn't.
Is there anyone that knows a better way of creating a trend dataset?
You can use the os module to check whether the file exists, and the mode argument of the to_csv function to append data to the file.
The code below will:
Read from snapshot.csv.
Check whether the historic.csv file exists.
Write the header only if the file does not exist yet (appending would otherwise duplicate the header).
Save the file. If the file already exists, new data will be appended to it instead of overwriting it.
import os
import pandas as pd
# Read snapshot file
snapshot = pd.read_csv("snapshot.csv")
# Check if historic data file exists
file_path = "historic.csv"
header = not os.path.exists(file_path)  # whether the header needs to be written
# Create or append to the historic data file
snapshot.to_csv(file_path, header=header, index=False, mode="a")
You could easily one-line it by utilising the mode parameter in to_csv:
pd.read_csv('snapshot.csv').to_csv('historic.csv', mode='a', index=False, header=False)
It will create the file if it doesn't already exist, or append if it does. The index and header arguments stop pandas from writing a new index column and a repeated header row on every append; combine this with the existence check above if you want a header on the first write.
What happens if you don't have a new snapshot file? You might want to wrap that in a try... except block, as sketched below. The Pythonic way is typically to ask for forgiveness instead of permission.
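For instance, a minimal EAFP sketch (hypothetical file names, building on the one-liner above):
import pandas as pd

try:
    # Append today's snapshot; header/index suppressed as above
    pd.read_csv('snapshot.csv').to_csv('historic.csv', mode='a', index=False, header=False)
except FileNotFoundError:
    pass  # no new snapshot file today; nothing to append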
I wouldn't even bother with an external library like pandas, as the standard library has all you need to append to a file.
with open('snapshot.csv', 'r') as snapshot:
    with open('historic.csv', 'a') as historic:
        for line in snapshot:  # iterate over the snapshot's lines
            historic.write(line)  # note: this appends the header row too
I want users to be able to download an Excel file by clicking a button. I have an existing Excel file (though it can also be generated from a dataframe) that I want to provide in Excel format.
The documentation gives an example for .csv files:
with open('my_file.csv') as f:
    st.download_button('Download', f)
but I can't adapt this use case for an Excel file. I can't manage to put the Excel file into the right format so that the download_button method accepts it. I tried passing a pd.to_excel() object, but that didn't work either.
I'll appreciate any and every suggestion!
Found the solution:
with open(file_path, 'rb') as my_file:
    st.download_button(
        label='Download',
        data=my_file,
        file_name='filename.xlsx',
        mime='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    )
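If the file is generated from a dataframe rather than read from disk, a variant of the same idea is to write the workbook into an in-memory buffer first. This is a sketch, assuming a recent pandas with openpyxl installed; the dataframe here is dummy data:
import io
import pandas as pd
import streamlit as st

df = pd.DataFrame({'a': [1, 2, 3]})  # placeholder dataframe
buffer = io.BytesIO()
df.to_excel(buffer, index=False)  # writes an .xlsx workbook into memory
st.download_button(
    label='Download',
    data=buffer.getvalue(),
    file_name='filename.xlsx',
    mime='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
)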
I am using the cmis package available in Python to download a document from a FileNet repository, via the getContentStream method available in the package. However, it returns content that begins with 'PK' and ends in 'PK'. When I googled this, I learned it is Excel zip package content. Is there a way to save the content into an Excel file that I can open afterwards? I am using the code below, but I get 'a bytes-like object is required, not str'. I noticed the result is a StringIO object.
# export the result
result = testDoc.getContentStream()
outfile = open('sample.xlsx', 'wb')  # the file name must be a string
outfile.write(result.read())
result.close()
outfile.close()
Hi there and welcome to Stack Overflow. There are a few bits I noticed about your post.
To answer the error directly: you opened outfile in binary mode, but result.read() returns a Unicode string, which is why you are getting this error. You can encode it before passing it to the outfile.write() function (e.g. outfile.write(result.read().encode())).
You can also write the Unicode string into a zip archive directly:
result = testDoc.getContentStream()
result_text = result.read()

from zipfile import ZipFile
with ZipFile(filepath, 'w') as zf:
    zf.writestr('filename_that_is_zipped', result_text)
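That said, since the stream already starts with 'PK' (the signature of a finished zip), it may simply hold the complete .xlsx bytes, in which case writing them out in binary mode could be all that is needed. A sketch, with the encoding caveat from above:
result = testDoc.getContentStream()
data = result.read()
if isinstance(data, str):
    # The stream came back as text; re-encode before the binary write.
    # latin-1 maps code points 0-255 back to the same byte values.
    data = data.encode('latin-1')
with open('sample.xlsx', 'wb') as outfile:
    outfile.write(data)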
Now, I am not sure what you have in your ContentStream, but note that an Excel file is made up of XML files zipped up. The minimum file structure you need for an Excel file is as follows:
_rels/.rels contains excel schemas
docProps/app.xml contains the number of sheets and sheet names
docProps/core.xml boilerplate user info and date created
xl/workbook.xml contains sheet names and their rId links to the workbook
xl/worksheets/sheet1.xml (and more sheets in this folder) contains cell data for each sheet
xl/_rels/workbook.xml.rels contains sheet file locations within the zipfile
xl/sharedStrings.xml if you have string-only cell values
[Content_Types].xml applies schemas to file types
I recently went through piecing together an Excel file from scratch; if you want to see the code, check out https://github.com/PydPiper/pylightxl
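As a quick sanity check on the saved file, you can list the zip entries and compare them against the structure above (a sketch; assumes the file was saved as sample.xlsx):
from zipfile import ZipFile

with ZipFile('sample.xlsx') as zf:
    print(zf.namelist())  # expect entries such as 'xl/workbook.xml'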
As the title states, openpyxl reads files like I need it to, but I don't know how to download the file from a SharePoint site and read it using openpyxl.
The url is something like this http://teamsites.teamworks.net/sites/efit-eitecs-005/SiteAssets/Lists/Apr19/AllItems/Gluster-2019.15-OS.xlsm
I'm currently using the following code.
import requests

url = 'http://teamsites.teamworks.net/sites/efit-eitecs-005/SiteAssets/Lists/Apr19/AllItems/Gluster-2019.15-OS.xlsm'
resp = requests.get(url, auth=auth).content  # auth is defined elsewhere
output = open(r'C:\Users\Me\temp.xlsx', 'wb')
output.write(resp)
output.close()
Anyone know the answer? Should I be saving as an xlsm file instead? I don't know what to do.
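Since the URL ends in .xlsm, one thing worth trying (a sketch, not a confirmed fix) is saving the bytes with the matching extension and loading with openpyxl's keep_vba flag:
import openpyxl

# Assumes the download above was saved as temp.xlsm instead of temp.xlsx
wb = openpyxl.load_workbook(r'C:\Users\Me\temp.xlsm', keep_vba=True)
ws = wb.active
print(ws['A1'].value)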
I have a folder with a large number of Excel workbooks. Is there a way to convert every file in this folder into a CSV file using Python's xlrd, xlutils, and XlsxWriter?
I would like the newly converted CSV files to have the extension '_convert.csv'.
OTHERWISE...
Is there a way to merge all the Excel workbooks in the folder to create one large file?
I've been searching for ways to do both, but nothing has worked...
Using pywin32, this will find all the .xlsx files in the indicated directory and open and resave them as .csv. It is relatively easy to figure out the right commands with pywin32...just record an Excel macro and perform the open/save manually, then look at the resulting macro.
import os
import glob
import win32com.client

xl = win32com.client.gencache.EnsureDispatch('Excel.Application')
for f in glob.glob('tmp/*.xlsx'):
    fullname = os.path.abspath(f)
    xl.Workbooks.Open(fullname)
    xl.ActiveWorkbook.SaveAs(Filename=fullname.replace('.xlsx', '.csv'),
                             FileFormat=win32com.client.constants.xlCSVMSDOS,
                             CreateBackup=False)
    xl.ActiveWorkbook.Close(SaveChanges=False)
xl.Quit()  # release the Excel COM process when finished
I will give it a try with my library pyexcel:
import glob
import os
from pyexcel import Book, BookWriter

for f in glob.glob("your_directory/*.xlsx"):
    fullname = os.path.abspath(f)
    converted_filename = fullname.replace(".xlsx", "_converted.csv")
    book = Book(f)
    converted_csvs = BookWriter(converted_filename)
    converted_csvs.write_book_reader(book)
    converted_csvs.close()
If an xlsx file has more than one sheet, you will get one csv file per sheet; the naming convention is "file_converted_%s.csv" % your_sheet_name. The script saves all converted csv files in the same directory as the xlsx files.
In addition, if you want to merge them all into one workbook, that is super easy as well:
from pyexcel.cookbook import merge_all_to_a_book
import glob
merge_all_to_a_book(glob.glob("your_directory/*.xlsx"), "output.xlsx")
If you want to do more, please read the tutorial
Look at OpenOffice's Python library; I suspect OpenOffice will handle MS document files. Python itself has no native support for Excel files.
Sure. Iterate over your files using something like glob and feed them into one of the modules you mention. With xlrd, you'd use open_workbook to open each file by name. That will give you back a Book object. You'll then want to have nested loops that iterate over each Sheet object in the Book, each row in the Sheet, and each Cell in the Row. If your rows aren't too wide, you can append each Cell in a Row into a Python list and then feed that list to the writerow method of a csv.writer object.
Since it's a high-level question, this answer glosses over some specifics like how to call xlrd.open_workbook and how to create a csv.writer. Hopefully googling for examples on those specific points will get you where you need to go.
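For what it's worth, an untested sketch of those nested loops (the '_convert.csv' suffix comes from the question):
import csv
import glob
import xlrd

for filename in glob.glob('*.xls*'):
    book = xlrd.open_workbook(filename)
    with open(filename + '_convert.csv', 'w', newline='') as out:
        writer = csv.writer(out)
        for sheet in book.sheets():  # every Sheet in the Book
            for rowx in range(sheet.nrows):  # every row in the Sheet
                # every Cell in the Row, gathered into a list for writerow
                writer.writerow([sheet.cell_value(rowx, colx)
                                 for colx in range(sheet.ncols)])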
You can use this function to read the data from each file
import xlrd

def getXLData(Filename, min_row_len=1, get_datemode=False, sheetnum=0):
    Data = []
    book = xlrd.open_workbook(Filename)
    sheet = book.sheets()[sheetnum]
    rowcount = 0
    while rowcount < sheet.nrows:
        row = sheet.row_values(rowcount)
        if len(row) >= min_row_len:
            Data.append(row)
        rowcount += 1
    if get_datemode:
        return Data, book.datemode
    return Data
and this function to write the data after you combine the lists together
import csv

def writeCSVFile(filename, data, headers=[]):
    if headers:
        temp = [headers]
        temp.extend(data)
        data = temp
    f = open(filename, "w", newline="")  # "wb" was Python 2; use text mode with newline="" on Python 3
    writer = csv.writer(f)
    writer.writerows(data)
    f.close()
Keep in mind you may have to re-format the data, especially if there are dates or integers in the Excel files, since they're stored as floating-point numbers.
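For the dates, a sketch of the conversion using the datemode that getXLData can return (xlrd.xldate_as_tuple is part of xlrd):
import datetime
import xlrd

def to_datetime(cell_value, datemode):
    # xldate_as_tuple unpacks Excel's float date into
    # (year, month, day, hour, minute, second)
    return datetime.datetime(*xlrd.xldate_as_tuple(cell_value, datemode))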
Edited to add code calling the above functions:
import glob

filelist = glob.glob("*.xls*")
alldata = []
headers = []
for filename in filelist:
    data = getXLData(filename)
    headers = data.pop(0)  # omit this line if files do not have a header row
    alldata.extend(data)
writeCSVFile("Output.csv", alldata, headers)