Read table data from Excel file with python

I currently have an Excel workbook with some graphs (charts?). The graphs are plotted from numerical values. I can access the values in LibreOffice if I right click on the graph and select "Data table". These values are nowhere else in the file.
I would like to access these values programmatically with Python. I tried things like xlrd, but it seems xlrd ignores graphical elements. When I run it on my workbook I only get empty cells back.
Have you ever encountered this issue?
Sadly I cannot provide the file as it is confidential.

import pandas as pd
df = pd.read_excel('path/name_of_your_file.xlsx')
print(df.head())
You should now have a DataFrame (df) to work with in Python!

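I have never worked with Excel files that contain charts, but I used to read normal Excel files with the following code. Have you tried this?
import xlrd
# note: xlrd 2.0 and later only read legacy .xls files
file = 'temp.xls'
book = xlrd.open_workbook(file)
for sheet in book.sheets():
    # skip sheets without any columns
    if sheet.ncols:
        for row_index in range(sheet.nrows):
            # row_values(rowx) returns the values of one row as a list
            row_list = sheet.row_values(row_index)
            for value in row_list:
                print(value)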
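Neither answer above reads the chart data itself. A minimal sketch, assuming the workbook is saved as .xlsx (not .xls or .ods): an .xlsx file is a zip archive, charts live in parts such as xl/charts/chart1.xml, and the plotted values are normally stored inside the chart XML as cached points (`c:numCache`) or, when the data exists only in the chart, as literal points (`c:numLit`). The function name `read_chart_caches` is hypothetical, and it only reads what the chart XML contains; it never looks at worksheet cells.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the DrawingML chart elements (c:ser, c:val, c:pt, ...)
C = "{http://schemas.openxmlformats.org/drawingml/2006/chart}"


def read_chart_caches(xlsx_file):
    """Map each chart part of an .xlsx to a list of series (lists of floats).

    Only point values stored in the chart XML (c:numCache or c:numLit)
    are read. `xlsx_file` may be a path or a binary file object.
    """
    charts = {}
    with zipfile.ZipFile(xlsx_file) as zf:
        for name in sorted(zf.namelist()):
            # chart parts look like xl/charts/chart1.xml
            if not (name.startswith("xl/charts/chart") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            series = []
            for ser in root.iter(C + "ser"):
                # Y values: <c:val> for bar/line/pie, <c:yVal> for scatter charts
                values_el = ser.find(C + "val")
                if values_el is None:
                    values_el = ser.find(C + "yVal")
                if values_el is None:
                    continue
                points = {}
                for pt in values_el.iter(C + "pt"):
                    v = pt.find(C + "v")
                    if v is not None and v.text is not None:
                        points[int(pt.get("idx"))] = float(v.text)
                # order points by their idx attribute, not by document order
                series.append([points[i] for i in sorted(points)])
            charts[name] = series
    return charts
```

Calling `read_chart_caches("workbook.xlsx")` (placeholder path) returns one entry per chart part. An .xls or .ods file would first have to be re-saved as .xlsx, for example from LibreOffice, since this sketch only understands the Office Open XML layout.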

Related

Edit .xlsx with python

I have no idea where to start.
I want to edit something like:
To:
I want to save the result in a .txt file.
All I know is how to open and read the file.
Code:
import pandas as pd
file = "myfile.xlsx"
f = pd.read_excel(file)
print(f)
I think the colors in the images show how the code has to work. If not, I'll answer any questions.

My go-to for editing Excel spreadsheets is openpyxl.
I don't believe it can turn .csv or .xlsx/.xlsm files into .txt files, but it can read .xlsx/.xlsm files, you can write their cell values out as .csv with Python's csv module, and pandas can read CSV files, so you can probably go from there.
Quick example:
from openpyxl import load_workbook
wb = load_workbook("foo.xlsx")
sheet = wb["baz"]  # select the worksheet named "baz"
sheet["D5"] = "I'm cell D5"
wb.save("foo.xlsx")  # changes are only written to disk on save

Use openpyxl, and look at this answer:
Get cell color from .xlsx
# sh is a worksheet, e.g. sh = wb["Sheet1"]
color_in_hex = sh['A2'].fill.start_color.index  # fill colour of cell A2, usually an aRGB hex string such as 'FFFF0000'
So you'd have to iterate across your columns/rows checking for a colour match, then, if it's a match, grab the value and apply it to your new sheet.
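A minimal sketch of that loop, writing the matching values to a .txt file as the question asks. The function names, the target colour 'FFFFFF00' (opaque yellow) and the paths are hypothetical; cells coloured with theme or indexed colours report a number instead of an aRGB string and will not match.

```python
from openpyxl import load_workbook


def values_with_fill(path, sheet_name, argb="FFFFFF00"):
    """Return the values of non-empty cells whose fill start colour equals `argb`."""
    wb = load_workbook(path)
    ws = wb[sheet_name]
    matches = []
    for row in ws.iter_rows():
        for cell in row:
            # compare the same attribute as in the answer above
            if cell.value is not None and cell.fill.start_color.index == argb:
                matches.append(cell.value)
    return matches


def write_txt(values, out_path):
    """Write one value per line to a plain text file."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for value in values:
            fh.write(f"{value}\n")
```

For example, `write_txt(values_with_fill("myfile.xlsx", "Sheet1"), "result.txt")`, with placeholder names.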

How can I check values of an excel sheet using Python?

I have an Excel sheet with stock price data, and I want to write code that checks whether a guess is correct. So I need to compare a value in Python with the value in the Excel sheet.
I have tried using a .csv file on repl.it, but it was not compatible and I was not able to check my values. I have also tried using an .xlsx file on repl.it, but I still could not access the values.
Is there any way I can compare the values?

Try this:
import pandas as pd
table = pd.read_excel('file_name.xlsx')
# each row becomes a plain Python list of cell values
for row in table.values.tolist():
    first_value_in_row = row[0]
    second_value_in_row = row[1]
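A minimal sketch of the comparison itself, assuming the sheet has a header row with hypothetical columns 'Date' and 'Close'. It uses a tolerance rather than `==` because prices read from Excel are floats and can differ from the guess by rounding noise.

```python
import math

import pandas as pd


def check_guess(table, date, guess, rel_tol=1e-9, abs_tol=0.005):
    """Return True if `guess` matches the 'Close' value stored for `date`."""
    rows = table.loc[table["Date"] == date, "Close"]
    if rows.empty:
        raise KeyError(f"no row for date {date!r}")
    actual = float(rows.iloc[0])
    return math.isclose(guess, actual, rel_tol=rel_tol, abs_tol=abs_tol)


# Typical use (file name and date are placeholders):
# table = pd.read_excel("prices.xlsx")
# print(check_guess(table, pd.Timestamp("2020-01-02"), 101.25))
```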

How to write multiIndex-columns excel with pandas

I want to export a DataFrame with MultiIndex columns.
I read an excel file (https://drive.google.com/open?id=1G6nE5wiNRf5sip22dQ8dfhuKgxzm4f8E) and exported it with the following code:
df = pd.read_excel('sample.xlsx')
df.to_excel('sample2.xlsx', index = False)
However, sample2.xlsx has a different format from sample.xlsx.
For example, there are merged cells in sample.xlsx but not in sample2.xlsx, and the blank cells in sample.xlsx become Unnamed: xx.
You can view sample2.xlsx here.
How can I solve this problem?
Thank you.

Since you are working with .xlsx files, the openpyxl package will do the job.
import openpyxl
wb_obj = openpyxl.load_workbook('sample.xlsx')
wb_obj.save('sample2.xlsx')
Further reading on openpyxl
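If you would rather stay in pandas than copy the file, here is a sketch assuming the first two rows of sample.xlsx form the header. Reading them with `header=[0, 1]` gives MultiIndex columns, and pandas forward-fills the blank cells of a merged header label instead of producing Unnamed columns. The index has to be kept when writing, because pandas raises NotImplementedError for `to_excel(..., index=False)` with MultiIndex columns; with `merge_cells=True` (the default) repeated top-level labels are written back as merged cells, so the output gains an index column. The function name is hypothetical.

```python
import pandas as pd


def copy_two_row_header(src, dst):
    """Re-save an Excel sheet whose first two rows form a (merged) header."""
    # header=[0, 1] turns the two header rows into MultiIndex columns
    df = pd.read_excel(src, header=[0, 1])
    # index=False is not supported for MultiIndex columns, so keep the index;
    # merge_cells=True writes repeated top-level labels as merged cells
    df.to_excel(dst, merge_cells=True)
    return df
```

For the question's files this would be `copy_two_row_header('sample.xlsx', 'sample2.xlsx')`.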

How do I execute this python code automatically in in excel cells?

I need to extract the domain from a list of websites in an Excel sheet (for example http://www.example.com/example-page and http://test.com/test-page) and reduce each one to its domain (example.com, test.com). I have got the code part figured out, but I still need to get these commands to run automatically on the cells of a column in the Excel sheet.
here's the code

I think you should read the data into a pandas DataFrame (pd.read_excel), turn your code into a function, then apply it to the URL column (Series.apply). Then it is easy to save to Excel with DataFrame.to_excel().
Of course, you will need pandas installed.
Something like:
import pandas as pd
# file names, sheet name, column name and your_function are placeholders
dframe = pd.read_excel('your_file.xlsx', sheet_name='Sheet1')
dframe['domains'] = dframe['urls col name'].apply(your_function)
dframe.to_excel('your_output.xlsx', index=False)
Best
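A minimal sketch of that approach with a concrete function, using `urllib.parse` from the standard library to pull out the host and drop a leading "www.". The function names, the column names 'url' and 'domain', and the file names are hypothetical placeholders.

```python
from urllib.parse import urlparse

import pandas as pd


def extract_domain(url):
    """Return the host part of a URL, lowercased, without a leading 'www.'."""
    url = str(url).strip()
    if "//" not in url:
        # urlparse only fills in the host when it follows '//'
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def add_domain_column(df, url_col="url", out_col="domain"):
    """Return a copy of df with a column holding the domain of each URL."""
    df = df.copy()
    df[out_col] = df[url_col].apply(extract_domain)
    return df


# Typical use (placeholder file names):
# df = pd.read_excel("websites.xlsx")
# add_domain_column(df).to_excel("websites_with_domains.xlsx", index=False)
```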

xlsxwriter: is there a way to open an existing worksheet in my workbook?

I'm able to open my pre-existing workbook, but I don't see any way to open pre-existing worksheets within that workbook. Is there any way to do this?

You cannot append to an existing xlsx file with xlsxwriter.
There is a module called openpyxl that allows you to read and write preexisting Excel files, but I am fairly sure it does so by reading the whole Excel file, keeping all of its information in memory, and then rewriting the file when you save the workbook.
Similarly, you can use a method of your own to "append" to xlsx documents. I recently had to append to an xlsx file because I had a lot of different tests in which GPS data was coming into a main worksheet, and I also had to append a new sheet each time a test started. The only way I could get around this without openpyxl was to read the Excel file with xlrd and then run through the rows and columns...
i.e.
# sheet is an xlrd sheet, e.g. sheet = workbook.sheet_by_index(0)
cells = []
for row in range(sheet.nrows):
    cells.append([])
    for col in range(sheet.ncols):
        cells[row].append(sheet.cell(row, col).value)
You don't need arrays, though. For example, this works perfectly fine:
import xlrd  # reading .xlsx needs xlrd < 2.0; newer versions only read .xls
import xlsxwriter
from os.path import expanduser
home = expanduser("~")
# this writes test data to an excel file
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
sheet1 = wb.add_worksheet()
for row in range(10):
    for col in range(20):
        sheet1.write(row, col, "test ({}, {})".format(row, col))
wb.close()
# open the file for reading
wbRD = xlrd.open_workbook("{}/Desktop/test.xlsx".format(home))
sheets = wbRD.sheets()
# open the same file for writing (just don't write yet)
wb = xlsxwriter.Workbook("{}/Desktop/test.xlsx".format(home))
# run through the sheets and store sheets in workbook
# this still doesn't write to the file yet
for sheet in sheets:  # write data from old file
    newSheet = wb.add_worksheet(sheet.name)
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            newSheet.write(row, col, sheet.cell(row, col).value)
    for row in range(10, 20):  # write NEW data (rows 10-19)
        for col in range(20):
            newSheet.write(row, col, "test ({}, {})".format(row, col))
wb.close()  # THIS writes
However, I found it easier to read the data and store it in a 2-dimensional array, because I was manipulating the data and receiving input over and over again, and I did not want to write to the Excel file until the test was over (which you could just as easily do with xlsxwriter, since that is probably what it does anyway until you call .close()).

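After searching a bit for a way to open an existing sheet with xlsxwriter, I discovered:
existingWorksheet = wb.get_worksheet_by_name('Your Worksheet name goes here...')
existingWorksheet.write_row(0, 0, ['xyz'])  # write_row takes a list of values
You can now append/write any data to that worksheet. Note that this only finds worksheets added earlier to the same Workbook object; it does not load sheets from an existing file.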

You can use the workbook.get_worksheet_by_name() method:
https://xlsxwriter.readthedocs.io/workbook.html#get_worksheet_by_name
According to https://xlsxwriter.readthedocs.io/changes.html, the method was added in release 0.8.7 on May 13, 2016:
"Release 0.8.7 - May 13 2016
-Fix for issue when inserting read-only images on Windows. Issue #352.
-Added get_worksheet_by_name() method to allow the retrieval of a worksheet from a workbook via its name.
-Fixed issue where internal file creation and modification dates were in the local timezone instead of UTC."
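A short sketch of what `get_worksheet_by_name()` does: it looks up a worksheet that was added earlier to the same `Workbook` object, which helps when sheets are created in one part of a program and filled in another. It does not load sheets from a file already on disk, since xlsxwriter cannot read existing files. The function name, path and sheet names are placeholders.

```python
import xlsxwriter


def build_report(path):
    """Create a workbook, then fetch a sheet again by name before closing."""
    wb = xlsxwriter.Workbook(path)
    wb.add_worksheet("Data")  # the returned worksheet is deliberately not kept
    # ... later, in code that only has access to `wb` ...
    ws = wb.get_worksheet_by_name("Data")
    ws.write_row(0, 0, ["xyz", 1, 2])  # write_row expects a sequence of values
    missing = wb.get_worksheet_by_name("NoSuchSheet")  # None: never added
    wb.close()  # the file is only written here
    return ws, missing
```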

Although this is mentioned in the last two answers, with a link to its documentation, and the documentation suggests there are indeed new methods for working with worksheets, I couldn't find these methods in the latest package, xlsxwriter==3.0.3.
xlrd (since version 2.0) has removed support for anything other than .xls files.
So I worked it out with openpyxl, which gives you the expected functionality as mentioned in the first answer above.
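With openpyxl, adding data to an existing worksheet can look like this minimal sketch: load the file, select the sheet by name, append rows, and save. The function name, sheet name and rows are hypothetical. openpyxl does not read every kind of object an Excel file can contain, and anything it skips is lost when the file is saved again.

```python
from openpyxl import load_workbook


def append_rows(path, sheet_name, rows, out_path=None):
    """Append rows at the bottom of one existing worksheet and save."""
    wb = load_workbook(path)
    ws = wb[sheet_name]  # raises KeyError if the sheet does not exist
    for row in rows:
        ws.append(row)  # each row goes below the last used row
    wb.save(out_path or path)  # overwrite in place unless out_path is given
```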

Develop Reference
