In the lab where I work, we process a lot of data produced by a 96-well plate reader. I'm trying to write a script that will perform a few calculations and output a bar graph using matplotlib.
The problem is that the plate reader outputs data into a .xlsx file. I understand that some modules like pandas have a read_excel function; can you explain how I should go about reading the Excel file and putting it into a dataframe?
Thanks
Data sample of a 24 well plate (for simplicity):
0.0868 0.0910 0.0912 0.0929 0.1082 0.1350
0.0466 0.0499 0.0367 0.0445 0.0480 0.0615
0.6998 0.8476 0.9605 0.0429 1.1092 0.0644
0.0970 0.0931 0.1090 0.1002 0.1265 0.1455
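For reference, a minimal sketch of the kind of workflow I have in mind (the file name is hypothetical, and I'm assuming the export contains only the absorbance values, with no header row):
import pandas as pd
import matplotlib.pyplot as plt
# header=None because the plate export has no column labels
df = pd.read_excel('plate_reader_output.xlsx', header=None)
# One example calculation: mean absorbance per plate row, shown as a bar graph
row_means = df.mean(axis=1)
row_means.plot(kind='bar')
plt.xlabel('Plate row')
plt.ylabel('Mean absorbance')
plt.show()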
I'm not exactly sure what you mean when you say array, but if you mean a matrix, you might be looking for:
import pandas as pd
df = pd.read_excel([path here])
df.to_numpy()  # in older pandas this was df.as_matrix(), which was removed in pandas 1.0
This returns a numpy.ndarray type.
This task is super easy in Pandas these days.
import pandas as pd
df = pd.read_excel('file_name_here.xlsx', sheet_name='Sheet1')
or
df = pd.read_csv('file_name_here.csv')
This returns a pandas.DataFrame object, which is very powerful for performing operations by column, by row, over the entire df, or on individual items with iterrows, not to mention slicing in different ways.
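For example, a quick sketch of those access patterns, assuming a df with a hypothetical column named 'A':
col = df['A']                   # one column as a Series
first_row = df.iloc[0]          # one row by position
sub = df.iloc[:5, :2]           # slice: first 5 rows, first 2 columns
for idx, row in df.iterrows():  # iterate over rows as (index, Series) pairs
    print(idx, row['A'])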
There is also the excellent xlrd package, with a quick start example here.
You can just google it to find code snippets. I have never used pandas' read_excel function, but xlrd covers all my needs, and can offer even more, I believe.
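A minimal xlrd quick-start sketch, assuming an .xls file (note that xlrd 2.0 and later only read the old .xls format, so an older xlrd is needed for .xlsx):
import xlrd
book = xlrd.open_workbook('myfile.xls')
sheet = book.sheet_by_index(0)           # first worksheet
print(sheet.nrows, sheet.ncols)          # dimensions
print(sheet.cell_value(rowx=0, colx=0))  # top-left cell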
You could also try it with my wrapper library, which uses xlrd as well:
import pyexcel as pe # pip install pyexcel
import pyexcel.ext.xls # pip install pyexcel-xls
your_matrix = pe.get_array(file_name=path_here) # done
So my xlsb file contains real-time data fetching and calculations from a third-party service like Bloomberg. After the calculations are done in Excel, how do I import the file into Python?
I tried methods I found online, but none of them worked; they returned NA for cells that required real-time calculations.
Try the latest xlsb2xlsx package on PyPI:
pip install xlsb2xlsx
python -m xlsb2xlsx /filepath_with_xlsb_file
Then you can use pandas with something like:
import pandas as pd
df = pd.read_excel('your_filepath.xlsx')
And work with the df object from there.
See https://pypi.org/project/xlsb2xlsx/ for more info.
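Alternatively (this is my own suggestion, not covered by the answer above), recent pandas can read .xlsb files directly through the pyxlsb engine. Note that, like the conversion route, it only sees the values last saved in the workbook, so save the file after the real-time data has been fetched:
import pandas as pd  # also: pip install pyxlsb
df = pd.read_excel('your_filepath.xlsb', engine='pyxlsb')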
I'm trying to parse .ods files with pandas, using the pd.read_excel() function, which uses odf under the hood. The problem I face is simple: some cells have comments, and pandas treats them as if they were regular content.
Here is a basic example, given a very simple .ods file with a single comment.
Importing that file into a dataframe using
import pandas as pd
df = pd.read_excel("example_with_comment.ods")
gives a dataframe in which the comment text is mixed into the cell value, while I would have liked to retrieve the content of the cell only.
Does anyone know how to drop the comments during parsing?
I'm using pandas 1.3.4.
Thanks a lot to anyone who can give me a hint!
It seems like a bug. Instead of read_excel, you may try this module:
https://pypi.org/project/pandas-ods-reader/
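A minimal sketch with that module, if I remember its API correctly (the file name is from the question; the second argument selects the first sheet):
from pandas_ods_reader import read_ods  # pip install pandas-ods-reader
df = read_ods('example_with_comment.ods', 1)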
I have a file bigger than 7GB. I am trying to place it into a dataframe using pandas, like this:
df = pd.read_csv('data.csv')
But it takes too long. Is there a better way to speed up the creation of the dataframe? I was considering changing the parameter to engine='c', since the documentation says:
engine : {'c', 'python'}, optional
Parser engine to use. The C engine is faster, while the Python engine is currently more feature-complete.
But I don't see much gain in speed.
If the problem is that you are not able to create the dataframe at all, because the big size makes the operation fail, you can read the file in chunks, as explained in this answer and sketched below.
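A minimal chunked-read sketch (the chunk size is an arbitrary choice; tune it to your memory budget):
import pandas as pd
chunks = []
for chunk in pd.read_csv('data.csv', chunksize=1_000_000):
    # filter or aggregate each chunk here to keep memory usage low
    chunks.append(chunk)
df = pd.concat(chunks, ignore_index=True)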
In case the dataframe is created at some point but you consider it too slow, you can use datatable to read the file, then convert the result to pandas and continue with your operations:
import pandas as pd
import datatable as dt
# Read with datatable, whose fread is typically much faster on large files
datatable_df = dt.fread('myfile.csv')
# Then convert the datatable Frame into a pandas dataframe
pandas_df = datatable_df.to_pandas()
OK, so I'm looking to create a program that will interact with an Excel spreadsheet. The approach that seemed most workable is converting it to a csv file. I've managed to make a program that prints the data, but I want it to edit the data and thus change the results in the csv file itself.
Sorry if it's a bit confusing; my programming skills aren't great.
Here's the code:
import csv
with open('wert.csv') as csvfile:
    freq = csv.reader(csvfile, delimiter=',')
    for row in freq:
        print(row[0], row[1], row[2])
If anyone has a better idea of how to make this program work, it would be greatly appreciated.
Thanks
You could try using the pandas package, a widely used data analysis/manipulation library.
import pandas as pd
data = pd.read_csv('foo.csv')
# change data here, see the pandas documentation
data.to_csv('bar.csv', index=False)  # index=False keeps the row index out of the output file
You can find the docs here
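For instance, to double every value in the second column (a hypothetical edit, assuming that column is numeric):
data.iloc[:, 1] = data.iloc[:, 1] * 2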
If your csv file is composed of just numbers (floats), or numbers and a header, you can try reading it with:
import numpy as np
data = np.genfromtxt('name.csv', delimiter=',', skip_header=1)
Then modify your data in Python, and save it with:
data_modified = data**2  # for example
np.savetxt('name_modified.csv', data_modified, delimiter=',', header='whatever,header,you,want')
You can also read the Excel file directly using pandas and do the processing there:
import pandas
measured_data = pandas.read_excel(filename)
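A minimal round-trip sketch under the same approach (the file names are placeholders I chose, not from the original answer):
import pandas
measured_data = pandas.read_excel('measurements.xlsx')
# ... process measured_data here ...
measured_data.to_excel('processed.xlsx', index=False)  # write the processed data back out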
A simple, yet somewhat baffling, enquiry. I have a CSV file which originally contains 136 rows and 24 columns of data (plus a column of indices and two rows' worth of column headers). When I import this file into Python with the aid of pandas, everything is okay in both Python 3 and Python 2.
import pandas as pd
R = pd.read_csv('csv_file.csv', index_col=0, header=[0,1])
However, things go south when I reorder the CSV, compacting eight old file rows into a single new row. This results in 17 rows and 192 columns of data, which Python 3's pandas still handles fine. However, Python 2's pandas now just returns a giant dataframe of NaNs, although the indices and column names are imported fine.
Any idea what's going on here? How do I make it go away? I need this code to work in Python 2, for reasons. In case it's of relevance, the Python 2 installation is on Debian.
The problem stemmed from an outdated pandas version (0.14.1), which is what apt-get distributes on Debian. Updating pandas through pip to 0.17.1 solved the issue, so whatever bug caused this has since been fixed. Thanks to JohnE for the help!
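As a quick sanity check (my addition, not part of the original fix), you can confirm which pandas version is actually being picked up:
import pandas
print(pandas.__version__)  # anything as old as 0.14.1 is worth upgrading via pip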