I regularly import a module, function.py, to do some special calculations.
One day I found that function.py reads files internally, which means that every time I call the function, function.py opens several Excel files:
file1 = pd.read_excel('para/first.xlsx',sheet_name='Sheet1')
file2 = pd.read_excel('para/second.xlsx',sheet_name='Sheet1')
...
I think this is a waste of time. Is there any way to package all the Excel files as one parameter, so that I can read the files once in the main script rather than opening them many times in function.py? That is, I would like to call
function.calculate(parameter_group)
instead of
function.calculate(file1, file2, file3...)
How do I build "parameter_group"?
I am also wondering whether converting those parameter files to .pkl would make them faster to read.
You can loop through the file names and put the DataFrames into a list, or do whatever calculations you want inside the loop and save the results in a list. For example:
import pandas as pd

pd_group = []
# placeholders for the paths of the Excel files to load
file_names = ['file1', 'file2', 'file3']
for name in file_names:
    pd_group.append(pd.read_excel(name, sheet_name='Sheet1'))
Then access the DataFrames using pd_group[0], pd_group[1], and so on, or assign individual names as you like, e.g. myname0 = pd_group[0].
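One way to get a single `parameter_group` is a dictionary that maps a short name to each loaded table, built once in the main script. The sketch below is only an illustration: `load_parameter_group` and its `reader` argument are hypothetical names, and it assumes `function.calculate` can be changed to accept such a dictionary instead of reading the files itself.

```python
def load_parameter_group(paths, reader):
    """Read every file once and return the results keyed by name.

    paths  -- dict mapping a short name to a file path
    reader -- callable that loads one path, for example
              lambda p: pd.read_excel(p, sheet_name='Sheet1')
    """
    return {name: reader(path) for name, path in paths.items()}


# Hypothetical usage in the main script (requires pandas):
#   import pandas as pd
#   paths = {'first': 'para/first.xlsx', 'second': 'para/second.xlsx'}
#   parameter_group = load_parameter_group(
#       paths, lambda p: pd.read_excel(p, sheet_name='Sheet1'))
#   function.calculate(parameter_group)  # uses parameter_group['first'], ...
```

A dictionary is used here rather than a list so that the function can look tables up by name instead of by position.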
Define a class and read the Excel file in __init__:
class Demo:
    def __init__(self):
        self.excel_info = self.mock_read_excel()
        print("done")

    def mock_read_excel(self):
        # df = pd.read_excel('para/second.xlsx',sheet_name='Sheet1')
        df = "mock data"
        print("reading")
        return df

    def use_df_data_1(self):
        print(self.excel_info)

    def use_df_data_2(self):
        print(self.excel_info)


if __name__ == '__main__':
    dm = Demo()
    dm.use_df_data_1()
    dm.use_df_data_2()
This solves the problem of reading the Excel file every time the function is called: the file is read once, when the object is created.
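On the .pkl idea from the question: loading a pickled object is usually much faster than parsing an .xlsx file, so caching the parsed result can help when the script is started many times. The following is a minimal sketch using only the standard library; `load_cached`, `cache_path` and `reader` are hypothetical names, and the freshness check (comparing modification times) is an assumption about how the parameter files change. Only load pickle files you created yourself, since unpickling untrusted data is unsafe.

```python
import os
import pickle


def load_cached(source_path, cache_path, reader):
    """Return reader(source_path), reusing a pickle cache when it is fresh."""
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    )
    if cache_is_fresh:
        # fast path: the source has not changed since the cache was written
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    data = reader(source_path)  # slow path: parse the original file
    with open(cache_path, 'wb') as f:
        pickle.dump(data, f)
    return data
```

With pandas, `reader` could be `lambda p: pd.read_excel(p, sheet_name='Sheet1')`, so the Excel file is parsed only when it is newer than its cache.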
My code uses pandas to extract information from an Excel sheet. I created a function to read and extract what I need, and then I assign the results to two variables so I can work with the data.
But the start of my code seems a bit messy. Is there a way to rewrite it?
file_locations = 'C:/Users/sheet.xlsx'

def process_file(file_locations):
    file_location = file_locations
    df = pd.read_excel(fr'{file_location}')
    wagon_list = df['Wagon'].tolist()
    weight_list = df['Weight'].tolist()
It seems silly to have a variable holding the file path and then, inside my function, set file_location for pandas from that same variable.
I'm not sure whether I could use file_location as the variable name both outside and inside the function, i.e. write file_location = file_location inside it.
Thanks for any input!
You can simply remove the setting of the file location inside the function.
file_location = 'C:/Users/sheet.xlsx'

def process_file():
    # file_location is read from the module-level variable above
    df = pd.read_excel(fr'{file_location}')
    wagon_list = df['Wagon'].tolist()
    weight_list = df['Weight'].tolist()
But it depends on what you are trying to do with the function. Are you using the same function with multiple files in different locations, or is it the same file over and over again?
If it's the latter, then this seems fine.
You could instead do something like this and pass the location string directly into the function. This is the more "proper" way to do things.
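def process_file(file_location):
    df = pd.read_excel(file_location)
    wagon_list = df['Wagon'].tolist()
    weight_list = df['Weight'].tolist()
    # return both lists so the caller can assign them to two variables
    return wagon_list, weight_list

wagon_list, weight_list = process_file('C:/Users/sheet.xlsx')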
I have a list of CSV files that contain numbers, all the same size. I also have lists of data associated with each file (e.g. a list of dates and a list of authors). What I'm trying to do is read in each file, apply a function to its values, then put the maximum value and its x-value, along with the data associated with the file, into a DataFrame.
I have approximately 100,000 files to process, so being able to use multiprocessing would save an incredible amount of time. Important to note: I'm running the following code in Jupyter Notebook on Windows, which I know has known issues with multiprocessing. The code I've tried so far is:
# import needed things
files = [file1, file2,...,fileN]
dates = [date1, ... , dateN]
authors = [author1, ... , authorN]

def func(file):
    # Reads in file, computes func on values
    index = files.index(file)
    d1 = {'Filename': file, 'Max': max, 'X-value': x-values, 'Date': dates[index], 'Author': authors[index]}
    return d1

if __name__ == '__main__':
    with mp.Pool() as pool:
        data = pool.map(func, files)
    summaryDF = pd.DataFrame(data)
This runs indefinitely, which is a known issue, but I'm not sure what the error is or how to fix it. Thank you in advance for any help.
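No fix is given in the thread, so the following is only a hedged sketch of one possible direction. A process pool on Windows starts fresh interpreters that must import the worker function, and a function defined in a notebook cell cannot be imported that way, which commonly shows up as a hang. The sketch swaps in a thread pool (`concurrent.futures.ThreadPoolExecutor`), which has no such import step and may be adequate if reading the files dominates the run time. It also passes each file's date and author along with the path instead of calling `files.index(file)`, which would scan the whole list for every file. The names `summarize` and `summarize_all`, the `Position` key, and the assumption that each CSV holds plain numbers are all illustrative.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def summarize(task):
    """Read one CSV of numbers and return its summary row."""
    path, date, author = task
    with open(path, newline="") as f:
        values = [float(cell) for row in csv.reader(f) for cell in row]
    peak = max(values)
    return {
        "Filename": path,
        "Max": peak,
        "Position": values.index(peak),  # index of the maximum (assumption)
        "Date": date,
        "Author": author,
    }


def summarize_all(files, dates, authors, workers=8):
    """Summarize all files concurrently, keeping the input order."""
    tasks = zip(files, dates, authors)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(summarize, tasks))
```

The returned list of dictionaries can be passed to `pd.DataFrame(...)` as in the question. If the computation is CPU-bound, the same `summarize` function could be moved into an importable .py module and mapped with a process pool instead.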
I'm using python in the lab to control measurements. I often find myself looping over a value (let's say voltage), measuring another (current) and repeating that measurement a couple of times to be able to average the results later. Since I want to keep all the measured data, I like to write it to disk immediately and to keep things organized I use the hdf5 file format. This file format is hierarchical, meaning it has some sort of directory structure inside that uses Unix style names (e.g. / is the root of the file). Groups are the equivalent of directories and datasets are more or less equivalent to files and contain the actual data. The code resulting from such an approach looks something like:
import h5py

hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
    group = hdf_file.create_group('/'+str(v))
    v_source.voltage = v
    for i in range(3):
        group2 = group.create_group(str(i))
        current = i_probe.current
        group2.create_dataset('current', data = current)
hdf_file.close()
I've written a small library to handle the communication with instruments in the lab and I want this library to automatically store the data to file, without explicitly instructing to do so in the script. The problem I run into when doing this is that the groups (or directories if you prefer) still need to be explicitly created at the start of the for loop. I want to get rid of all the file handling code in the script and therefore would like some way to automatically write to a new group on each iteration of the for loop. One way of achieving this would be to somehow modify the for statement itself, but I'm not sure how to do this. The for loop can of course be nested in more elaborate experiments.
Ideally I would be left with something along the lines of:
import h5py

hdf_file = h5py.File('data.h5', 'w')
for v_source.voltage in range(5):  # v_source.voltage=x sets the voltage of a physical device to x
    for i in range(3):
        current = i_probe.current  # i_probe.current reads the current from a physical device
        current_group.create_dataset('current', data = current)
hdf_file.close()
Any pointers to implement this solution or something equally readable would be very welcome.
Edit:
The code below includes all class definitions etc and might give a better idea of my intentions. I'm looking for a way to move all the file IO to a library (e.g. the Instrument class).
import h5py

class Instrument(object):
    def __init__(self, address):
        self.address = address

    @property
    def value(self):
        print('getting value from {}'.format(self.address))
        return 2  # dummy value instead of value read from instrument

    @value.setter
    def value(self, value):
        print('setting value of {} to {}'.format(self.address, value))

source1 = Instrument('source1')
source2 = Instrument('source2')
probe = Instrument('probe')

hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
    source1.value = v
    group = hdf_file.create_group('/'+str(v))
    group.attrs['address'] = source1.address
    for i in range(4):
        source2.value = i
        group2 = group.create_group(str(i))
        group2.attrs['address'] = source2.address
        group2.create_dataset('current', data = probe.value)
hdf_file.close()
Without seeing the code it is hard to say, but from the looks of it, the Pythonic way to do this is to check, every time you add a new dataset, whether the directory exists: if it does, append the new dataset to it; if it doesn't, create a new directory first. This question might help:
Writing to a new file if not exist, append to file if it do exist
Instead of writing a new file, use the same approach to create a directory. Another helpful one might be:
How to check if a directory exists and create it if necessary?
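To keep the group bookkeeping out of the measurement script, one option is to let a small helper track which nested iteration is running and build the group path from that. The sketch below is an assumption-laden illustration, not the library's API: `GroupPath` and `sweep` are hypothetical names, and the h5py call is shown only in a comment (it assumes `require_group`, which returns a group and creates it if it does not exist yet).

```python
class GroupPath:
    """Tracks which nested sweep iteration is currently running."""

    def __init__(self):
        self._parts = []

    def sweep(self, values):
        """Yield each value while it is part of the current group path."""
        for value in values:
            self._parts.append(str(value))
            try:
                yield value
            finally:
                # runs when the loop asks for the next value or is abandoned
                self._parts.pop()

    @property
    def current(self):
        """Unix-style path of the innermost active iteration."""
        return "/" + "/".join(self._parts)


tracker = GroupPath()
visited = []
for v in tracker.sweep(range(2)):
    for i in tracker.sweep(range(3)):
        # with h5py (assumption), the library could do here:
        #   hdf_file.require_group(tracker.current).create_dataset(...)
        visited.append(tracker.current)
```

Because every `sweep` call pushes onto the same tracker, the loops can be nested to any depth, and an instrument library holding a reference to `tracker` could look up `tracker.current` whenever a value is read.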
So, I started doing some Python recently, and I have always liked lifting weights as well. Therefore, I was thinking about a little program where I can enter my training progress (as a kind of Python exercise).
I do something like the following as an example:
from sys import argv
file = argv[1]
target_file = open(file, 'w')
weigth = raw_input("Enter what you lifted today: ")
weigth_list = []
weigth_list.append(weigth)
file.write(weigth_list)
file.close()
Now, I know that a lot is wrong here, but this is just to get across the idea I had in mind. What I was hoping to do was create a file, put a list into it, and store the raw_input() value in that list. Then I want to save that file, and the next time I run the script (say, after the next training session), I want to enter another number and add it to the list. Additionally, I want to do some plotting with the data stored in the list and the file.
Now, I know I could simply do that in Excel but I would prefer to do it in python. Hopefully, someone understood what I mean.
Unsure what exactly your weight_list looks like, or whether you're planning this for one specific workout or the general case, but you'll probably want to use something like a CSV (comma-separated values) format to save the info and be able to easily plot it (for the general case of N different workout types). See below for what I mean:
$ ./record-workout saved-workouts.csv
where each record has the form
<workout type>,<number of sets>,<number of reps>,<weight>,<date and time>
and saved-workouts.csv is the file we'll save to.
Then, modifying your script ever so slightly:
# even though this is a small example, it's usually preferred to import
# the module itself (rather than names from it) for readability [1]
import sys
# we'll import time so we can get todays date, so you can record
# when you worked out
import time
# you'll likely want to check that the user provided arguments
if len(sys.argv) != 2:
    # we'll print a nice message that will show the user
    # how to use the script
    print("usage: {} <workout_file>".format(sys.argv[0]))
    # after printing the message, we'll exit with an error-code
    # because we can't do anything else!
    sys.exit(1)
# `sys.argv[1]` should contain the first command line argument,
# which in this case is the name of the data file we want
# to write to (and subsequently read from when we're plotting)
# therefore, the type of `filename` below is `str` (string).
#
# Note: I changed the name from `file` to `filename` because although `file`
# is not a reserved word, it's the name of a built-in type (and constructor) [2]
filename = sys.argv[1]
# in Python, it's recommended to use a `with` statement
# to safely open a file. [3]
#
# Also, note that we're using 'a' as the mode with which
# to open the file, which means `append` rather than `write`.
# Opening with 'w' (`write`) would empty the file as soon as it is
# opened, but in this case we want to `append`.
#
# Lastly, note that `target_file` is the name of the file object,
# which is the object through which you read, write or append.
with open(filename, 'a') as target_file:
    # you'd probably want the csv-form to look like
    #
    #   benchpress,2,5,225
    #
    # so for the general case, let's build this up
    workout = raw_input("Enter what workout you did today: ")
    num_sets = raw_input("Enter the number of sets you did today: ")
    num_reps = raw_input("Enter the number of reps per set you did today: ")
    weight = raw_input("Enter the weight you lifted today: ")
    # you might also want to record the day and time you worked out [4]
    todays_date = time.strftime("%Y-%m-%d %H:%M:%S")
    # this says "join each element in the passed-in tuple/list
    # as a string separated by a comma"
    workout_entry = ','.join((workout, num_sets, num_reps, weight, todays_date))
    # you don't need to save all the entries to a list,
    # you can simply write the workout out to the file obj `target_file`;
    # the trailing newline keeps each entry on its own line
    target_file.write(workout_entry + '\n')
# Note: I removed the `target_file.close()` because the file closes when the
# program reaches the end of the `with` statement.
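The structure of saved-workouts.csv would thus be as follows (the header row is not written by the script above; add it once by hand if you want it):
workout,sets,reps,weight,date
benchpress,2,5,225,<date and time>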
This would also allow you to easily parse the data when you're getting ready to plot it. In this case, you'd want another script (or another function in the above script) to read the file using something like below:
import sys
# since we're reading the csv file, we'll want to use the `csv` module
# to help us parse it
import csv
if len(sys.argv) < 2:
    print("usage: {} <workout_file>".format(sys.argv[0]))
    sys.exit(1)
filename = sys.argv[1]
# now that we're reading the file, we'll use `r`
with open(filename, 'r') as data_file:
    # to use `csv`, you need to create a csv-reader object by
    # passing in the `data_file` `file` object
    reader = csv.reader(data_file)
    # now reader contains a parsed iterable version of the file
    for row in reader:
        # here's where you'll want to investigate different plotting
        # libraries and such, where you'll be accessing the various
        # points in each line as follows:
        workout_name = row[0]
        num_sets = row[1]
        num_reps = row[2]
        weight = row[3]
        workout_time = row[4]
        # optionally, if your csv file contains headers (as in the example
        # above), you can create the reader with `csv.DictReader(data_file)`
        # instead and access fields in each row by name:
        #
        #   row['weight'] or row['workout'], etc.
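For the plotting step, the values need to be numbers rather than strings. The following is a minimal sketch, written for Python 3 (unlike the Python 2 scripts above), that assumes the file has no header row and uses the five columns written by the recording script. `read_workouts` and `FIELDS` are hypothetical names introduced for illustration.

```python
import csv

# column names, in the order the recording script writes them (assumption)
FIELDS = ["workout", "sets", "reps", "weight", "date"]


def read_workouts(lines):
    """Parse header-less workout records into dictionaries."""
    entries = []
    for row in csv.DictReader(lines, fieldnames=FIELDS):
        # csv gives back strings; convert the numeric columns
        row["sets"] = int(row["sets"])
        row["reps"] = int(row["reps"])
        row["weight"] = float(row["weight"])
        entries.append(row)
    return entries
```

It could be called as `with open(filename, newline="") as f: entries = read_workouts(f)`, after which `[e["weight"] for e in entries]` is a list of numbers that any plotting library can take.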
Sources:
[1] https://softwareengineering.stackexchange.com/questions/187403/import-module-vs-from-module-import-function
[2] https://docs.python.org/2/library/functions.html#file
[3] http://effbot.org/zone/python-with-statement.htm
[4] How to get current time in Python
I have a function f2(a, b)
It is only ever called by a minimize algorithm which iterates the function for different values of a and b each time. I would like to store these iterations in excel for plotting.
Is it possible to extract these values easily (I only need to paste them all into Excel or a text file)? A conventional return or print won't work within f2. Is there some other way to collect the values of a and b in a public list in the main body?
The algorithm may iterate dozens or hundreds of times.
So far I have tried:
Print to console (can't paste this data into excel easily)
Write to file (csv) within f2, the csv file gets overwritten within the function each time though.
Append the values to a global list.
values = []

def f2(a,b):
    values.append((a,b))
    #todo: actual function logic goes here
Then you can look at values in the main scope once you're done iterating.
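If you would rather not edit f2 or rely on a global name inside it, the same idea can be sketched as a wrapper that records the arguments and then calls the original function. Here `record_calls` and `f2_logged` are hypothetical names, and the body of `f2` is a placeholder, since the real logic is not shown in the question.

```python
def record_calls(func, log):
    """Return a wrapper that appends each (a, b) to log, then calls func."""
    def wrapper(a, b):
        log.append((a, b))
        return func(a, b)
    return wrapper


def f2(a, b):
    # placeholder objective; the real logic is not shown in the question
    return (a - 1) ** 2 + (b + 2) ** 2


values = []
f2_logged = record_calls(f2, values)  # hand f2_logged to the minimizer
```

After the minimizer finishes, `values` holds one `(a, b)` tuple per call, in call order.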
"Write to file (csv) within f2, the csv file gets overwritten within the function each time though."
Not if you open the file in append mode:
with open("file.csv", "a") as myfile:
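The append-mode idea can be completed along the following lines. This is a minimal sketch in which `log_point` is a hypothetical helper name; it uses the standard `csv` module so that each call adds exactly one row.

```python
import csv


def log_point(path, a, b):
    """Append one (a, b) row to the CSV file at path."""
    # 'a' keeps the existing rows; newline="" lets csv control line endings
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([a, b])
```

Calling `log_point("file.csv", a, b)` at the top of f2 would add a row per iteration. Because the file is never emptied in append mode, delete or rename it before a new run if the runs should not be mixed.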