python: write input to file and save it - python

So, I started doing some python recently and I have always like to lift some weights as well. Therefore, I was thinking about a little program where I can put in my training progress (as some kind of python excercise).
I do something like the following as an example:
from sys import argv
file = argv[1]
target_file = open(file, 'w')
weigth = raw_input("Enter what you lifted today: ")
weigth_list = []
weigth_list.append(weigth)
file.write(weigth_list)
file.close()
Now, I know that a lot is wrong here but this is just to get across the idea I had in mind. So what I was hoping to do, was creating a file and getting a list into and store the "raw_input()" in that file. Then I want to save that file and the next time I run the script (say after the next training), I want to save another number and put that to the list. Additionally, I want to do some plotting with the data stored in the list and the file.
Now, I know I could simply do that in Excel but I would prefer to do it in python. Hopefully, someone understood what I mean.

Unsure what exactly your weight_list looks like, or whether you're planning this for one specific workout or the general case, but you'll probably want to use something like a CSV (comma-separated values) format to save the info and be able to easily plot it (for the general case of N different workout types). See below for what I mean:
$ ./record-workout saved-workouts.csv
where the record-form is
<workout type>,<number of sets>,<number of reps>,<weight>
and saved-workouts.csv is the file we'll save to
then, modifying your script ever-so-slightly:
# even though this is a small example, it's usually preferred
# to import the modules from a readability standpoint [1]
import sys
# we'll import time so we can get todays date, so you can record
# when you worked out
import time
# you'll likely want to check that the user provided arguments
if len(sys.argv) != 2:
# we'll print a nice message that will show the user
# how to use the script
print "usage: {} <workout_file>".format(sys.argv[0])
# after printing the message, we'll exit with an error-code
# because we can't do anything else!
sys.exit(1)
# `sys.argv[1]` should contain the first command line argument,
# which in this case is the name of the data file we want
# to write to (and subsequently read from when we're plotting)
# therefore, the type of `f` below is `str` (string).
#
# Note: I changed the name from `file` to `filename` because although `file`
# is not a reserved word, it's the name of a built-in type (and constructor) [2]
filename = sys.argv[1]
# in Python, it's recommended to use a `with` statement
# to safely open a file. [3]
#
# Also, note that we're using 'a' as the mode with which
# to open the file, which means `append` rather than `write`.
# `write` will overwrite the file when we call `f.write()`, but
# in this case we want to `append`.
#
# Lastly, note that `target_file` is the name of the file object,
# which is the object to which you'll be able to read or write or append.
with open(filename, 'a') as target_file:
# you'd probably want the csv-form to look like
#
# benchpress,2,5,225
#
# so for the general case, let's build this up
workout = raw_input("Enter what workout you did today: ")
num_sets = raw_input("Enter the number of sets you did today")
num_reps = raw_input("Enter the number of reps per set you did today")
weight = raw_input("Enter the weight you lifted today")
# you might also want to record the day and time you worked out [4]
todays_date = time.strftime("%Y-%m-%d %H:%M:%S")
# this says "join each element in the passed-in tuple/list
# as a string separated by a comma"
workout_entry = ','.join((workout, num_sets, num_reps, weight, todays_date))
# you don't need to save all the entries to a list,
# you can simply write the workout out to the file obj `target_file`
target_file.write(workout_entry)
# Note: I removed the `target_file.close()` because the file closes when the
# program reaches the end of the `with` statement.
The structure of saved-workouts.csv would thus be:
workout,sets,reps,weight
benchpress,2,5,225
This would also allow you to easily parse the data when you're getting ready to plot it. In this case, you'd want another script (or another function in the above script) to read the file using something like below:
import sys
# since we're reading the csv file, we'll want to use the `csv` module
# to help us parse it
import csv
if len(sys.argv) < 2:
print "usage: {} <workout_file>".format(sys.argv[0])
sys.exit(1)
filename = sys.argv[1]
# now that we're reading the file, we'll use `r`
with open(filename, 'r') as data_file:
# to use `csv`, you need to create a csv-reader object by
# passing in the `data_file` `file` object
reader = csv.reader(data_file)
# now reader contains a parsed iterable version of the file
for row in reader:
# here's where you'll want to investigate different plotting
# libraries and such, where you'll be accessing the various
# points in each line as follows:
workout_name = row[0]
num_sets = row[1]
num_reps = row[2]
weight = row[3]
workout_time = row[4]
# optionally, if your csv file contains headers (as in the example
# above), you can access fields in each row using:
#
# row['weight'] or row['workout'], etc.
Sources:
[1] https://softwareengineering.stackexchange.com/questions/187403/import-module-vs-from-module-import-function
[2] https://docs.python.org/2/library/functions.html#file
[3] http://effbot.org/zone/python-with-statement.htm
[4] How to get current time in Python

Related

Looking to compare values in two different files

I have two CSV files that have been renamed to text files. I need to compare a column in each one (a date) to confirm they have been updated.
For example, c:\temp\oldfile.txt has 6 columns and the last one is called version. I need to make sure that c:\temp\newfile.txt has a different value for version. It doesn't need to do any date verification of any kind, as long as the comparison sees that they're different, it can proceed. If possible, I would prefer to stick with 'standard' libraries as I'm just learning and don't want to start creating dictionaries and learning pandas and numpy just yet.
Edit
Here's a copy of oldfile.txt and newfile.txt.
oldfile.txt:
feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version
MyStuff,http://www.mystuff.com,en,20220103,20220417,22APR_20220401
newfile.txt:
feed_publisher_name,feed_publisher_url,feed_lang,feed_start_date,feed_end_date,feed_version
MyStuff,http://www.mystuff.com,en,20220103,20220417,22APR_20220414
In this case the comparison would note that the last column has a different value and would know to proceed with the rest of the script. Otherwise, if the values are the same, it will know that it was not updated and I'll have the program exit.
You can do it by using the csv module in the standard library since that's the format of your files.
import csv
with open('oldfile.txt', 'r', newline='') as oldfile, \
open('newfile.txt', 'r', newline='') as newfile:
old_reader = csv.DictReader(oldfile)
new_reader = csv.DictReader(newfile)
old_row = next(old_reader)
new_row = next(new_reader)
same = old_row['feed_version'] == new_row['feed_version']
print(f"The files are {'the same' if same else 'different'}.")
If you are only interested in checking if there two files are the equal (essentially "updated"), you can compute the hash of one file and compare with the hash of the other
To compute hash (for example, sha256), you can use the following function:
import hashlib
def sha256sum(filename):
# Opens the file
with open(filename, 'rb') as file:
content = file.read()
hasher = hashlib.sha256()
hasher.update(content)
return hasher.hexdigest()
hashlib is probably part of the standard library if you went through the default installation process.
For example, if you write "v1.0" in a text document, the hasher function will give "fa8b919c909d5eb9e373d090928170eb0e7936ac20ccf413332b96520903168e"
If you later change it to "v1.1", the hasher function will give "eb79768c42dbbf9f10733e525a06ea9eb08f28b7b8edf9c6dcacb63940aedcb0".
These are two different hexdigest values, so it would imply that two files are different.
Reading the file-
we don't need any libraries for this. just opening the file and reading it, then doing a little parsing:
a, b = "", "" # set the globals for the comparison
with open("c:/temp/oldfile.txt") as f: # open the file as f
text = f.read().split('\n')[1] # get the contents of the file then cut just the second line from it
a = text.split(',')[5] # spliting the string by ',' to an array then getting the 6th element
Then opening the other one:
with open("c:/temp/newfile.txt") as f:
text = f.read().split('\n')[1]
b = text.split(',')[5]
more on reading files here
Comparing the lines-
if a == b:
print("The date is the same!")
else:
print("The date is different...")
Of course you can make this into a function and make it return whether or not they're equal then use the value to determine the future of the program.
Hope this helps!

Python csv.dictreader does not appear to be reading all rows of csv

I'm working on a program that, in short, creates a CSV file and compares it line by line to a previously generated CSV file (created on previous script execution), and logs differences between the two.
I'm having a weird issue where csv.dictreader does not appear to be reading all rows of the NEW log - but it DOES read all rows of the OLD log. What makes this issue even more puzzling is that if I run the program again, it creates a new log, and will now read the previous log all the way to the end.
Here's what I mean if that made no sense:
Run program to generate LOG A
Run program again to generate LOG B
csv.dictreader does not read all the way through LOG B, but it DOES read all the way through LOG A
Run program again to generate LOG C
csv.dictreader does NOT read all the way through LOG C, but it DOES read all the way through LOG B (which it previously didn't although no information in the csv file has changed)
Here's the relevant function:
def compare_data(newLog, oldLog):
# this function compares the new scan Log to the old Log to determine if, and how much, memory space has changed
# within each directory
# both arguments should be filenames of the logs
# use the .csv library to create dictionary objects out of files and read thru them
newReader = csv.DictReader(newLog)
oldReader = csv.DictReader(oldLog)
oldDirs, newDirs = [], []
# write data from new log into dictionary list
for row in newReader:
newLogData = {}
newLogData['ScanDate'] = row['ScanDate']
newLogData['Directory'] = row['Directory']
newLogData['Size'] = int(row['Size'])
newDirs.append(newLogData)
# write data from old log into another dictionary list
for row in oldReader:
oldLogData = {}
oldLogData['ScanDate'] = row['ScanDate']
oldLogData['Directory'] = row['Directory']
oldLogData['Size'] = int(row['Size'])
oldDirs.append(oldLogData)
# now compare data between the two dict lists to determine what's changed
for newDir in newDirs:
for oldDir in oldDirs:
dirMatch = False
if newDir['Directory'] == oldDir['Directory']:
# we have linked dirs. now check to see if their size has changed
dirMatch = True
if newDir['Size'] != oldDir['Size']:
print(newDir['Directory'], 'has changed in size! It used to be', oldDir['Size'],
'bytes, now it\'s', newDir['Size'], 'bytes! Hype')
# now that we have determined changes, now we should check for unique dirs
# unique meaning either a dir that has been deleted since last scan, or a new dir added since last scan
find_unique_dirs(oldDirs, newDirs)
Based on the fact that the old log gets read in its entirety, I don't think it would be an issue of open quotes in filenames or anything like that.

Attempt to Load-Update-Resave Argument in a game

I am very new to programing and trying to learn by doing creating a text adventure game and reading Python documentation/blogs.
My issue is I'm attempting to save/load data in a text game to create some elements which carry over from game to game and are passed as arguments. Specifically with this example my goal recall, update and load an incrementing iteration each time the game is played past the intro. Specially my intention here is to import the saved march_iteration number, display it to the user as a default name suggestion, then iterate the iteration number and save the updated saved march_iteration number.
From my attempts at debugging this I seem to be updating the value and saving the updated value of 2 to the game.sav file correctly, so I believe my issues is either I'm failing to load the data properly or overwriting the saved value with the static one somehow. I've read as much documentation as I can find but from the articles I've read on saving and loading to json I cannot identify where my code is wrong.
Below is a small code snippet I wrote just to try and get the save/load working. Any insight would be greatly appreciated.
import json
def _save(dummy):
f = open("game.sav", 'w+')
json.dump(world_states, f)
f.close
def _continue(dummy):
f = open("game.sav", 'r+')
world_states = json.load(f)
f.close
world_states = {
"march_iteration" : 1
}
def _resume():
_continue("")
_resume()
print ("world_states['march_iteration']", world_states['march_iteration'])
current_iteration = world_states["march_iteration"]
def name_the_march(curent_iteration=world_states["march_iteration"]):
march_name = input("\nWhat is the name of your march? We suggest TrinMar#{}. >".format(current_iteration))
if len(march_name) == 0:
print("\nThe undifferentiated units shift nerviously, unnerved and confused, perhaps even angry.")
print("\nPlease give us a proper name executor. The march must not be nameless, that would be chaos.")
name_the_march()
else:
print("\nThank you Executor. The {} march begins its long journey.".format(march_name))
world_states['march_iteration'] = (world_states['march_iteration'] +1)
print ("world_states['march_iteration']", world_states['march_iteration'])
#Line above used only for debugging purposed
_save("")
name_the_march()
I seem to have found a solution which works for my purposes allowing me to load, update and resave. It isn't the most efficient but it works, the prints are just there to display the number being properly loaded and updated before being resaved.
Pre-requisite: This example assumes you've already created a file for this to open.
import json
#Initial data
iteration = 1
#Restore from previously saved from a file
with open('filelocation/filename.json') as f:
iteration = json.load(f)
print(iteration)
iteration = iteration + 1
print(iteration)
#save updated data
f = open("filename.json", 'w')
json.dump(iteration, f)
f.close

Pythonic way to modify a for loop

I'm using python in the lab to control measurements. I often find myself looping over a value (let's say voltage), measuring another (current) and repeating that measurement a couple of times to be able to average the results later. Since I want to keep all the measured data, I like to write it to disk immediately and to keep things organized I use the hdf5 file format. This file format is hierarchical, meaning it has some sort of directory structure inside that uses Unix style names (e.g. / is the root of the file). Groups are the equivalent of directories and datasets are more or less equivalent to files and contain the actual data. The code resulting from such an approach looks something like:
import h5py
hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
group = hdf_file.create_group('/'+str(v))
v_source.voltage = v
for i in range(3):
group2 = group.create_group(str(i))
current = i_probe.current
group2.create_dataset('current', data = current)
hdf_file.close()
I've written a small library to handle the communication with instruments in the lab and I want this library to automatically store the data to file, without explicitly instructing to do so in the script. The problem I run into when doing this is that the groups (or directories if you prefer) still need to be explicitly created at the start of the for loop. I want to get rid of all the file handling code in the script and therefore would like some way to automatically write to a new group on each iteration of the for loop. One way of achieving this would be to somehow modify the for statement itself, but I'm not sure how to do this. The for loop can of course be nested in more elaborate experiments.
Ideally I would be left with something along the lines of:
import h5py
hdf_file = h5py.File('data.h5', 'w')
for v_source.voltage in range(5): # v_source.voltage=x sets the voltage of a physical device to x
for i in range(3):
current = i_probe.current # i_probe.current reads the current from a physical device
current_group.create_dataset('current', data = current)
hdf_file.close()
Any pointers to implement this solution or something equally readable would be very welcome.
Edit:
The code below includes all class definitions etc and might give a better idea of my intentions. I'm looking for a way to move all the file IO to a library (e.g. the Instrument class).
import h5py
class Instrument(object):
def __init__(self, address):
self.address = address
#property
def value(self):
print('getting value from {}'.format(self.address))
return 2 # dummy value instead of value read from instrument
#value.setter
def value(self, value):
print('setting value of {} to {}'.format(self.address, value))
source1 = Instrument('source1')
source2 = Instrument('source2')
probe = Instrument('probe')
hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
source1.value = v
group = hdf_file.create_group('/'+str(v))
group.attrs['address'] = source1.address
for i in range(4):
source2.value = i
group2 = group.create_group(str(i))
group2.attrs['address'] = source2.address
group2.create_dataset('current', data = probe.value)
hdf_file.close()
Without seeing the code it is hard to see, but essentially from the looks of it the pythonic way to do this is that every time you add a new dataset, you want to check whether the directory exists, and if it does you want to append the new dataset, and if it doesn't you want to create a new directory - i.e. this question might help
Writing to a new file if not exist, append to file if it do exist
Instead of writing a new file, use it to create a directory instead. Another helpful one might be
How to check if a directory exists and create it if necessary?

What's a Fail-safe Way to Update Python CSV

This is my function to build a record of user's performed action in python csv. It will get the username from the global and perform increment given in the amount parameter to the specific location of the csv, matching the user's row and current date.
In brief, the function will read the csv in a list, and do any modification on the data before rewriting the whole list back into the csv file.
Every first item on rows is the username, and the header has the dates.
Accs\Dates,12/25/2016,12/26/2016,12/27/2016
user1,217,338,653
user2,261,0,34
user3,0,140,455
However, I'm not sure why sometimes, the header get's pushed down to the second row, and data gets wiped entirely when it crashes.
Also, I need to point out that there maybe multiple script running this function and writing on the same file, not sure if that causing the issue.
I'm thinking maybe I can write the stats separately and uniquely to each users and combine later, hence eliminating the possible clashing in writing. Although would be great if I could just improve from what I have here and read/write everything on a file.
Any fail-safe way to do what I'm trying to do here?
# Search current user in first rows and updating the count on the column (today's date)
# 'amount' will be added to the respective position
def dailyStats(self, amount, code = None):
def initStats():
# prepping table
with open(self.stats, 'r') as f:
reader = csv.reader(f)
for row in reader:
if row:
self.statsTable.append(row)
self.statsNames.append(row[0])
def getIndex(list, match):
# get the index of the matched date or user
for i, j in enumerate(list):
if j == match:
return i
self.statsTable = []
self.statsNames = []
self.statsDates = None
initStats()
today = datetime.datetime.now().strftime('%m/%d/%Y')
user_index = None
today_index = None
# append header if the csv is empty
if len(self.statsTable) == 0:
self.statsTable.append([r'Accs\Dates'])
# rebuild updated table
initStats()
# add new user/date if not found in first row/column
self.statsDates = self.statsTable[0]
if getIndex(self.statsNames, self.username) is None:
self.statsTable.append([self.username])
if getIndex(self.statsDates, today) is None:
self.statsDates.append(today)
# rebuild statsNames after table appended
self.statsNames = []
for row in self.statsTable:
self.statsNames.append(row[0])
# getting the index of user (row) and date (column)
user_index = getIndex(self.statsNames, self.username)
today_index = getIndex(self.statsDates, today)
# the row where user is matched, if there are previous dates than today which
# has no data, append 0 (e.g. user1,0,0,0,) until the column where today's date is match
if len(self.statsTable[user_index]) < today_index + 1:
for i in range(0,today_index + 1 - len(self.statsTable[user_index])):
self.statsTable[user_index].append(0)
# insert pv or tb code if found
if code is None:
self.statsTable[user_index][today_index] = amount + int(re.match(r'\b\d+?\b', str(self.statsTable[user_index][today_index])).group(0))
else:
self.statsTable[user_index][today_index] = str(re.match(r'\b\d+?\b', str(self.statsTable[user_index][today_index])).group(0)) + ' - ' + code
# Writing final table
with open(self.stats, 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(self.statsTable)
# return the summation of the user's total count
total_follow = 0
for i in range(1, len(self.statsTable[user_index])):
total_follow += int(re.match(r'\b\d+?\b', str(self.statsTable[user_index][i])).group(0))
return total_follow
As David Z says, concurrency is more likely the cause of your problem.
I will add that CSV format is not suitable for Database storing, indexing, sorting, because it is plain/text and sequential.
You could handle it using a RDBMS for storing and updating your data and periodically processing your stats. Then your CSV format is just an import/export format.
Python offers a SQLite binding in its Standard Library, if you build a connector that import/update CSV content in a SQLite schema and then dump results as CSV you will be able to handle concurency and keep your native format without worring about installing a database server and installing new packages in Python.
Also, I need to point out that there maybe multiple script running this function and writing on the same file, not sure if that causing the issue.
More likely than not that is exactly your issue. When two things are trying to write to the same file at the same time, the outputs from the two sources can easily get mixed up together, resulting in a file full of gibberish.
An easy way to fix this is just what you mentioned in the question, have each different process (or thread) write to its own file and then have separate code to combine all those files in the end. That's what I would probably do.
If you don't want to do that, what you can do is have different processes/threads send their information to an "aggregator process", which puts everything together and writes it to the file - the key is that only the aggregator ever writes to the file. Of course, doing that requires you to build in some method of interprocess communication (IPC), and that in turn can be tricky, depending on how you do it. Actually, one of the best ways to implement IPC for simple programs is by using temporary files, which is just the same thing as in the previous paragraph.

Categories

Resources