How to open different Excel sheets using a class in Python

So I am pretty new to classes, but I am trying to write one that opens an Excel file in a DataFrame, extracts some information from it, then moves on to the next Excel file and does the same. Each file has the same name with a different number on the end, and this cannot be changed; the numbers are not consistent.
I have tried this code:
class Systems:
    def __init__(self, survey_number):
        self.survey_number = survey_number
        self.file_name = 's3://misc/survey' + survey_number + '.xlsx'
    def readfile(self):
        self.df = pd.read_excel(self.file_name, sheet_name='Results')
survey_1 = Systems('026')
survey_1.df
I thought this should bring up the dataframe for the first input, and then I could do the same with the other file names. However, I am getting this error:
AttributeError: Systems instance has no attribute 'df'
I have not included a sample of that data as I don't think it is needed for this? Let me know if it is.
I will be adding more functions to the class when it is working but think this step needs resolving first and I don't know how to fix it.
Thanks!
EDIT - I believe the problem is trying to read the file using the '...' + variable + '...' method - is there a better way to do this?

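In your code, df is only assigned inside readfile(), which is never called, so the attribute does not exist yet. Read the file in __init__ instead:
class Systems:
    def __init__(self, survey_number):
        self.file_name = 'yourpath' + survey_number + '.xlsx'
        self.df = pd.read_excel(self.file_name, sheet_name='Results')
and then
a = Systems('123')
a then has an attribute called df, which is what you are looking for:
a.df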
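Because the file is loaded in `__init__`, the same class can be created once per survey number. The sketch below is one possible way to do that, not the answerer's code: `load_surveys` and the `reader` parameter are hypothetical names introduced here so the example runs without pandas or S3 access. In real use, `reader` could be something like `lambda path: pd.read_excel(path, sheet_name='Results')`. The f-string is only an alternative spelling of the `'...' + variable + '...'` concatenation from the question, which was not the cause of the error.

```python
class Systems:
    def __init__(self, survey_number, reader):
        # `reader` is a hypothetical callable that takes a path and returns the data.
        self.survey_number = survey_number
        # An f-string builds the same path as 's3://misc/survey' + survey_number + '.xlsx'.
        self.file_name = f"s3://misc/survey{survey_number}.xlsx"
        # Loading here means .df exists as soon as the object is created.
        self.df = reader(self.file_name)


def load_surveys(survey_numbers, reader):
    """Return a dict mapping each survey number to its Systems object."""
    return {number: Systems(number, reader) for number in survey_numbers}
```

The numbers do not need to be consecutive; any list of strings such as `['026', '113']` works, and each result is then available as `surveys['026'].df`.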

Related

how to package all parameters as one?

I always import a function.py module to do some special calculations.
One day I found that function.py contains steps that read files, which means that every time I call the function, function.py opens several Excel files.
file1 = pd.read_excel('para/first.xlsx',sheet_name='Sheet1')
file2 = pd.read_excel('para/second.xlsx',sheet_name='Sheet1')
...
I think it's a waste of time. Is there a way to package all the Excel files as one parameter, so that I can read the files once in the main script rather than opening them many times in function.py?
function.calculate(parameter_group)
to replace
function.calculate(file1,file2, file3...)
How do I get "parameter_group"?
I'm also wondering whether saving those parameter files as .pkl would make them faster to read.
You can loop through the file names and put the DataFrames into a list, or do whatever calculations you want inside the loop and save the results in a list. For example:
pd_group = []
file_names = ['file1', 'file2', 'file3']
for name in file_names:
    pd_group.append(pd.read_excel(name, sheet_name='Sheet1'))
Then access the DataFrames using pd_group[0], pd_group[1] and so on, or assign individual names with myname0 = pd_group[0] and so on.
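A dict keyed by name is another possible way to build the parameter_group from the question, since the tables can then be looked up by name rather than by position. This is only a sketch: `load_group` and `reader` are hypothetical names, and `calculate` here is a stand-in for the real function in function.py. In real use, `reader` might be `lambda path: pd.read_excel(path, sheet_name='Sheet1')`.

```python
def load_group(paths, reader):
    """Read every file once and return the results keyed by a short name.

    `paths` maps a name to a file path; `reader` is a hypothetical callable
    that loads one file.
    """
    return {name: reader(path) for name, path in paths.items()}


def calculate(parameter_group):
    # Stand-in for the real calculation: it only reports which tables it received.
    return sorted(parameter_group)
```

The main script would call `load_group` once and pass the resulting dict to `calculate` as often as needed.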
Define a class and read the Excel file in __init__:
class Demo:
    def __init__(self):
        self.excel_info = self.mock_read_excel()
        print("done")
    def mock_read_excel(self):
        # df = pd.read_excel('para/second.xlsx',sheet_name='Sheet1')
        df = "mock data"
        print("reading")
        return df
    def use_df_data_1(self):
        print(self.excel_info)
    def use_df_data_2(self):
        print(self.excel_info)
if __name__ == '__main__':
    dm = Demo()
    dm.use_df_data_1()
    dm.use_df_data_2()
This way the Excel file is read once, when the object is created, instead of every time a function is called.
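On the .pkl idea raised in the question: loading a pickle is typically faster than parsing an Excel file, so caching the loaded data may help across separate runs of the script. The sketch below uses only the standard library. `load_cached` and `build` are hypothetical names, and `build` stands for whatever function reads the Excel files. Note that the cache goes stale if the Excel files change, so the .pkl file would need to be deleted in that case.

```python
import pickle
from pathlib import Path


def load_cached(cache_path, build):
    """Return the pickled data at cache_path, creating it with build() if missing."""
    cache = Path(cache_path)
    if cache.exists():
        # Fast path: reuse what a previous call or run already loaded.
        with cache.open("rb") as fh:
            return pickle.load(fh)
    data = build()  # slow path: e.g. read all the Excel files
    with cache.open("wb") as fh:
        pickle.dump(data, fh)
    return data
```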

How to call a variable to my function without repeating it

My code uses pandas to extract information from an Excel sheet. I created a function to read the file and extract what I need, and then I store the result in two variables so I can work with the data.
But the start of my code seems a bit messy. Is there a way to rewrite it?
file_locations = 'C:/Users/sheet.xlsx'
def process_file(file_locations):
    file_location = file_locations
    df = pd.read_excel(fr'{file_location}')
    wagon_list = df['Wagon'].tolist()
    weight_list = df['Weight'].tolist()
It seems stupid to have a variable with the file destination and then set file_location for pandas inside my function from that variable.
I'm not sure whether I could use file_location as the variable outside the function and then write file_location = file_location inside it.
Thanks for any input!
You can simply remove the setting of the file location inside the function.
file_location = 'C:/Users/sheet.xlsx'
def process_file():
    df = pd.read_excel(file_location)
    wagon_list = df['Wagon'].tolist()
    weight_list = df['Weight'].tolist()
But it also depends on what you are trying to do with the function. Are you using the same function with multiple files in different locations, or is it the same file over and over again?
If it's the latter, then this seems fine.
You could instead feed the location string directly into the function. This is the more conventional way to do it.
def process_file(file_location):
    df = pd.read_excel(file_location)
    wagon_list = df['Wagon'].tolist()
    weight_list = df['Weight'].tolist()
process_file('C:/Users/sheet.xlsx')

Python convert .txt to .csv without having a specific file name

I am currently working on an application that will convert a messy text file full of data into an organized CSV. It currently works well but when I convert the text file to csv I want to be able to select a text file with any name. The way my code is currently written, I have to name the file "file.txt". How do I go about this?
Here is my code. I can send the whole script if necessary. Just to note this is a function that is linked to a tkinter button. Thanks in advance.
def convert():
    df = pd.read_csv("file.txt",delimiter=';')
    df.to_csv('Cognex_Data.csv')
Try defining your function as follows:
def convert(input_filename, output_filename='Cognex_Data.csv'):
    df = pd.read_csv(input_filename, delimiter=';')
    df.to_csv(output_filename)
And, for instance, use it as follows:
filename = input("Enter filename: ")
convert(filename, "Cognex_Data.csv")
You can also put "Cognex_Data.csv" as a default value for the output_filename argument in the convert function definition (as done above).
Finally, you can get the filename any way you like (for instance with tkinter.filedialog, as suggested by matszwecja).
I haven't worked with tkinter, but I have used PySimpleGUI, which to my knowledge is built on tkinter, so you should be able to extract the variable that holds the name of the file selected by the user. That's what I'm doing with PySimpleGUI on a similar problem.
Then take the file name selected by the user through the prompt and pass it as an argument to your function:
def convert(file):
    df = pd.read_csv("{}.txt".format(file), delimiter=';')
    df.to_csv('Cognex_Data.csv')
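The tkinter.filedialog suggestion above could be wired to the button roughly as follows. This is a sketch under assumptions, not the asker's script: `on_button_click` is a hypothetical callback name, and the conversion uses the standard-library csv module in place of pandas so that the example is self-contained. Unlike pandas' `to_csv`, it does not write an index column. The output name is derived from the input name here, which is also an assumption.

```python
import csv
from pathlib import Path


def convert(input_filename, output_filename=None):
    """Rewrite a ';'-delimited text file as a comma-separated CSV file."""
    src = Path(input_filename)
    # Default: same folder and name as the input, with a .csv extension.
    dst = Path(output_filename) if output_filename else src.with_suffix(".csv")
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin, delimiter=";"):
            writer.writerow(row)
    return dst


def on_button_click():
    # Hypothetical callback for the tkinter button; imported here so that
    # convert() can be used on machines without a display.
    from tkinter import filedialog
    path = filedialog.askopenfilename(filetypes=[("Text files", "*.txt")])
    if path:  # an empty string means the dialog was cancelled
        convert(path)
```

The button would then be created with `command=on_button_click`, so the text file can have any name.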

Pythonic way to modify a for loop

I'm using python in the lab to control measurements. I often find myself looping over a value (let's say voltage), measuring another (current) and repeating that measurement a couple of times to be able to average the results later. Since I want to keep all the measured data, I like to write it to disk immediately and to keep things organized I use the hdf5 file format. This file format is hierarchical, meaning it has some sort of directory structure inside that uses Unix style names (e.g. / is the root of the file). Groups are the equivalent of directories and datasets are more or less equivalent to files and contain the actual data. The code resulting from such an approach looks something like:
import h5py
hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
    group = hdf_file.create_group('/'+str(v))
    v_source.voltage = v
    for i in range(3):
        group2 = group.create_group(str(i))
        current = i_probe.current
        group2.create_dataset('current', data = current)
hdf_file.close()
I've written a small library to handle the communication with instruments in the lab and I want this library to automatically store the data to file, without explicitly instructing to do so in the script. The problem I run into when doing this is that the groups (or directories if you prefer) still need to be explicitly created at the start of the for loop. I want to get rid of all the file handling code in the script and therefore would like some way to automatically write to a new group on each iteration of the for loop. One way of achieving this would be to somehow modify the for statement itself, but I'm not sure how to do this. The for loop can of course be nested in more elaborate experiments.
Ideally I would be left with something along the lines of:
import h5py
hdf_file = h5py.File('data.h5', 'w')
for v_source.voltage in range(5): # v_source.voltage=x sets the voltage of a physical device to x
    for i in range(3):
        current = i_probe.current # i_probe.current reads the current from a physical device
        current_group.create_dataset('current', data = current)
hdf_file.close()
Any pointers to implement this solution or something equally readable would be very welcome.
Edit:
The code below includes all class definitions etc and might give a better idea of my intentions. I'm looking for a way to move all the file IO to a library (e.g. the Instrument class).
import h5py
class Instrument(object):
    def __init__(self, address):
        self.address = address
    @property
    def value(self):
        print('getting value from {}'.format(self.address))
        return 2 # dummy value instead of value read from instrument
    @value.setter
    def value(self, value):
        print('setting value of {} to {}'.format(self.address, value))
source1 = Instrument('source1')
source2 = Instrument('source2')
probe = Instrument('probe')
hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
    source1.value = v
    group = hdf_file.create_group('/'+str(v))
    group.attrs['address'] = source1.address
    for i in range(4):
        source2.value = i
        group2 = group.create_group(str(i))
        group2.attrs['address'] = source2.address
        group2.create_dataset('current', data = probe.value)
hdf_file.close()
Without seeing the code it is hard to say, but from the looks of it the Pythonic way to do this is, every time you add a new dataset, to check whether the directory exists: if it does, append the new dataset to it, and if it doesn't, create a new directory first. This question might help:
Writing to a new file if not exist, append to file if it do exist
Instead of writing a new file, use the same approach to create a directory. Another helpful one might be:
How to check if a directory exists and create it if necessary?
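For HDF5 specifically, the "create it if it does not exist" check is already available: h5py files and groups have a `require_group` method that returns the named group and creates it first if needed. One possible way to hide the group handling from the loop body is a small generator that wraps the iterable. `grouped` is a hypothetical helper name, and this is a sketch of the idea rather than a finished design for the Instrument library.

```python
def grouped(values, parent):
    """Yield (value, group) pairs, one sub-group of `parent` per value.

    `parent` is assumed to be an h5py File or Group (anything with a
    require_group method). require_group returns the existing group or
    creates it, so re-running the same loop does not fail.
    """
    for value in values:
        yield value, parent.require_group(str(value))


# Intended use with the objects from the question (not executed here):
# for v, group in grouped(range(5), hdf_file):
#     source1.value = v
#     for i, group2 in grouped(range(4), group):
#         source2.value = i
#         group2.create_dataset('current', data=probe.value)
```

The loops stay as readable as plain `for` loops and nest naturally, while the only file handling left in the script is the final `create_dataset` call.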

Python win32 read/modify/store/map Name Box

I'm trying to do some work on a complex Excel Workbook which has a large number of variables which have been created and used using the Name Box feature. See picture attached for example/detail.
I'd like to store or change DeathRate or maybe read all the Name Boxes and create a dictionary between names and locations of the cell from outside Excel.
I'm using the win32com library in Python but I guess I could switch to another Excel reader as long as it copes with XLSX files.
Has someone come across this before?
Found the solution, see code below:
import os
from win32com.client import Dispatch  # win32com indexes cells starting at one.
app_xl = Dispatch("Excel.Application")
WORKING_DIR = os.getcwd()
excelPath = os.path.join(WORKING_DIR, "SampleModel.xls")
wb = app_xl.Workbooks.Open(excelPath)
# Get Named Boxes
name_box_list = [x for x in app_xl.ActiveWorkbook.Names]
name_box_map = {x.Name: x.Value for x in name_box_list}
print(name_box_list)
print(name_box_map)
# Change Named Boxes
name_box_list[0].Name = u'NewName'
name_box_list[0].Value = u'=model!$B$5'
name_box_map = {x.Name: x.Value for x in name_box_list}
