I've been trying to write a program that lets users view a text file's contents and delete some or all of a single entry block.
An example of the text file's contents can be seen below:
Special Type A Sunflower
2016-10-12 18:10:40
Asteraceae
Ingredient in Sunflower Oil
Brought to North America by Europeans
Requires fertile and moist soil
Full sun

Pine Tree
2018-12-15 13:30:45
Pinaceae
Evergreen
Tall and long-lived
Temperate climate

Tropical Sealion
2019-01-20 12:10:05
Otariidae
Found in zoos
Likes fish
Likes balls
Likes zookeepers

Big Honey Badger
2015-06-06 10:10:25
Mustelidae
Eats anything
King of the desert
As such, an entry block is a run of consecutive lines with no blank line between them.
Currently, my progress is at:
import time
import os
global o
global dataset
global database
from datetime import datetime

MyFilePath = os.getcwd()
ActualFile = "creatures.txt"
FinalFilePath = os.path.join(MyFilePath, ActualFile)

def get_dataset():
    database = []
    shown_info = []
    with open(FinalFilePath, "r") as textfile:
        sections = textfile.read().split("\n\n")
        for section in sections:
            lines = section.split("\n")
            database.append({
                "Name": lines[0],
                "Date": lines[1],
                "Information": lines[2:]
            })
    return database

def delete_creature():
    dataset = get_dataset()
    delete_question = str(input("Would you like to 1) delete a creature or 2) only some of its information from the dataset or 3) return to main page? Enter 1, 2 or 3: "))
    if delete_question == "1":
        delete_answer = str(input("Enter the name of the creature: "))
        for line in textfile:
            if delete_answer in line:
                line.clear()
    elif delete_question == "2":
        delete_answer = str(input("Enter the relevant information of the creature: "))
        for line in textfile:
            if delete_answer in line:
                line.clear()
    elif delete_question == "3":
        break
    else:
        raise ValueError
    except ValueError:
        print("\nPlease try again! Your entry is invalid!")

while True:
    try:
        option = str(input("\nGood day, This is a program to save and view creature details.\n" +
                           "1) View all creatures.\n" +
                           "2) Delete a creature.\n" +
                           "3) Close the program.\n" +
                           "Please select from the above options: "))
        if option == "1":
            view_all()
        elif option == "2":
            delete()
        elif option == "3":
            break
        else:
            print("\nPlease input one of the options 1, 2 or 3.")
    except:
        break
The delete_creature() function is meant to delete a creature by:
Name, which deletes the entire text block associated with the name
Information, which deletes only the line of information
I can't seem to get the delete_creature() function to work, however, and I am unsure of how to get it to work.
Does anyone know how to get it to work?
Many thanks!
The difficulty with removing lines from a section is that you have hardcoded which line represents what. Removing a whole section is easy in your design. Removing a single line, unless you change the concept, means setting that line to empty, or to some placeholder character that you later treat as an empty string.
Another question is whether your sections need to stay in the order in which they were entered, or whether they can be written back to the file in some other order.
What I would do is change the input file format, e.g. to the INI file format. Then you can use the configparser module to parse and edit the entries easily.
The INI file would look like:
[plant1]
name="Some plant's English name"
species="The plant's Latin species part"
subspecies="The plant's subspecies, in Latin of course"
genus="etc."
[animal1]
# Same as above for the animal
# etc. etc. etc.
configparser.ConfigParser() lets you load the file in a dictionary-like manner and edit sections and values. You can name the sections animal1, plant1 and so on, or use the section names for something else. I prefer to keep the name inside the section, usually under a name key, and then use configparser to build a normal dictionary keyed by name, where each value is another dictionary holding the key-value pairs from that section. I reverse the process when saving the results, either manually or using configparser again.
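A minimal sketch of that workflow, assuming hypothetical section names (plant1, animal1) and keys (name, family, note0, note1) that are not part of the original file. It parses from a string so that it runs on its own; with a real file you would call parser.read(path) instead.

```python
import configparser
import io

# Hypothetical INI version of two entries from the question's file.
INI_TEXT = """\
[plant1]
name = Pine Tree
family = Pinaceae
note0 = Evergreen
note1 = Tall and long-lived

[animal1]
name = Big Honey Badger
family = Mustelidae
note0 = Eats anything
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Build a plain dictionary keyed by each creature's name.
creatures = {
    parser[section]["name"]: dict(parser[section])
    for section in parser.sections()
}

# Delete a single line of information, then delete a whole entry.
parser.remove_option("plant1", "note1")
parser.remove_section("animal1")

# Serialize the result; write() accepts any text file-like object.
buffer = io.StringIO()
parser.write(buffer)
saved_text = buffer.getvalue()
```

After these calls only the plant1 section remains, without its note1 key, and saved_text holds the INI text that would be written back to disk.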
The other format you might consider is JSON, using the json module.
Using its dumps() function with the separators and indent arguments set appropriately, you get a fairly human-readable and editable output format. The nice thing is that you save the data structure you are working with, e.g. a dictionary, and when you load it, it comes back as you saved it, without the extra conversion steps that configparser needs. On the other hand, an INI file is less confusing to write by hand for a user who is not accustomed to JSON, and results in fewer errors. JSON must be strictly formatted, and any mistake in opening and closing brackets or in separators makes the whole file fail to load. That happens easily when the file is big.
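As a rough illustration of both points, here is a sketch that round-trips one entry through json and then shows how a single stray comma makes the input unloadable. The dictionary layout (date, family, information) is an assumption made for this example, not something the question prescribes.

```python
import json

# Assumed layout: creature name -> details.
creatures = {
    "Pine Tree": {
        "date": "2018-12-15 13:30:45",
        "family": "Pinaceae",
        "information": ["Evergreen", "Tall and long-lived", "Temperate climate"],
    },
}

# indent and separators control how readable the output is.
text = json.dumps(creatures, indent=4, separators=(",", ": "))
restored = json.loads(text)  # comes back with the same structure

# JSON is strict: a trailing comma is enough to reject the input.
try:
    json.loads('{"name": "Pine Tree",}')
    strict_error = None
except json.JSONDecodeError as exc:
    strict_error = exc
```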
Both formats allow users to put empty lines wherever they want without changing how the file is loaded, whereas your method is strict about empty lines.
If you expect your database to be edited only by your program, use the pickle module and save yourself the mess.
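A short sketch of the pickle route, reusing the dictionary shape that get_dataset() builds in the question. It pickles to bytes in memory so that it runs on its own; with a file you would use pickle.dump(obj, f) and pickle.load(f) on a file opened in binary mode. Keep in mind that pickle files are not human-editable and should only be loaded from sources you trust.

```python
import pickle

# Same shape as the entries built by get_dataset() in the question.
database = [
    {
        "Name": "Pine Tree",
        "Date": "2018-12-15 13:30:45",
        "Information": ["Pinaceae", "Evergreen", "Tall and long-lived"],
    },
]

blob = pickle.dumps(database)   # bytes, ready to be written with mode "wb"
restored = pickle.loads(blob)   # an equal copy of the original structure
```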
Otherwise you can:
def getdata(stringfromfile):
    end = {}
    l = []  # lines in the current section
    for x in stringfromfile.strip().splitlines():
        x = x.strip()
        if not x:  # Blank line: the current section is complete
            if l:  # Guard against several blank lines in a row
                end[l[0].lower()] = l[1:]
            l = []
            continue
        l.append(x)
    if l:
        end[l[0].lower()] = l[1:]  # Add last section
    # Connect keys to numbers in the same dict(), so that users can choose by number too
    for n, key in enumerate(sorted(end)):
        end[n] = key
    return end

# You define some constants for which line is what in a dict():
values = {"species": 0, "subspecies": 1, "genus": 2}

# You load the file and parse the data
with open("creatures.txt") as f:
    data = getdata(f.read())

def edit(name_or_number, edit_what, new_value):
    if isinstance(name_or_number, int):
        key = data[name_or_number]
    else:
        key = name_or_number.lower().strip()
    if isinstance(edit_what, str):
        edit_what = values[edit_what.strip().lower()]
    data[key][edit_what] = new_value.strip()

def add(name, list_of_lines):
    n = len(data) // 2  # Number for the new entry (half of the keys are numeric mappings)
    name = name.strip().lower()
    data[name] = list_of_lines
    data[n] = name

def remove(name):
    name = name.lower().strip()
    del data[name]
    # Well, this part is bad and clumsy
    # It would make more sense to keep numeric mappings in separate list
    # which will do this automatically, especially if the database file is big-big-big...
    # But I started this way, so, keeping it simple and stupid, just remap everything after removing the item (inefficient as hell itself)
    for x in list(data.keys()):  # Copy the keys: a dict must not change size while it is iterated
        if isinstance(x, int):
            del data[x]
    for n, key in enumerate(sorted(data)):
        data[n] = key

def getstring(d):
    # Serialize for saving
    end = []
    for l0, ls in d.items():
        if isinstance(l0, int):
            continue  # Skip numeric mappings
        lines = l0 + "\n" + "\n".join(ls)
        end.append(lines)
    return "\n\n".join(end)
I didn't test the code. There might be bugs.
If your lines have no fixed meaning, you can easily modify my code to search the lines with the list.index() method, or just address the lines by number, if they exist, when you need them. To do the same with configparser, use generic keys in a section such as answer0, answer1, ..., or just 0, 1, 2, ..., then ignore the keys and load the answers as a list. Note that if you edit the file through configparser, removals will leave gaps in the numbering, e.g. answer0, answer3, ...
And a warning: if you want to keep the creatures in the order in which the input file lists them, use collections.OrderedDict instead of a normal dictionary (from Python 3.7 on, a plain dict preserves insertion order as well).
Also, editing the opened file in place is possible, of course, but complicated and inadvisable, so just don't. Load the data, change it, and save it back. There are very rare situations in which you want to change the file directly, and for those you would use the mmap module. Just don't!
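For the original blank-line-separated format, the load, change and save back cycle could look roughly like the sketch below. The function names are made up for this example, and it assumes, as the question's get_dataset() does, that line 1 of a block is the name, line 2 the date, and everything after that is information.

```python
def parse_blocks(text):
    """Split blank-line-separated text into blocks (lists of lines)."""
    blocks = []
    for chunk in text.strip().split("\n\n"):
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        if lines:
            blocks.append(lines)
    return blocks

def delete_by_name(blocks, name):
    """Drop every block whose first line matches name (case-insensitive)."""
    wanted = name.strip().lower()
    return [block for block in blocks if block[0].lower() != wanted]

def delete_info_line(blocks, name, info):
    """Remove one information line from the named block; name and date stay."""
    wanted = name.strip().lower()
    result = []
    for block in blocks:
        if block[0].lower() == wanted:
            block = block[:2] + [line for line in block[2:] if line != info]
        result.append(block)
    return result

def serialize(blocks):
    """Turn the blocks back into text, one blank line between entries."""
    return "\n\n".join("\n".join(block) for block in blocks) + "\n"

sample = (
    "Pine Tree\n2018-12-15 13:30:45\nPinaceae\nEvergreen\n\n"
    "Tropical Sealion\n2019-01-20 12:10:05\nOtariidae\nLikes fish\n"
)
blocks = parse_blocks(sample)
blocks = delete_info_line(blocks, "Pine Tree", "Evergreen")
blocks = delete_by_name(blocks, "tropical sealion")
result = serialize(blocks)
```

In a real program, sample would come from reading creatures.txt, and result would be written back with open(path, "w") once all changes are done.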
I am trying to compare two files. One file has a list of stores. The other has the same list of stores, except that it is missing a few because of a filter I ran against it from another script. I would like to compare the two files: if a store in file 1 cannot be found anywhere in file 2, I want to print it out or append it to a list; I'm not too picky about that part. Below are partial examples of both files:
file 1:
Store: 00377
Main number: 8033056238
Store: 00525
Main number: 4075624470
Store: 00840
Main number: 4782736996
Store: 00920
Main number: 4783337031
Store: 00998
Main number: 9135631751
Store: 02226
Main number: 3107501983
Store: 02328
Main number: 8642148700
Store: 02391
Main number: 7272645342
Store: 02392
Main number: 9417026237
Store: 02393
Main number: 4057942724
File 2:
00377
00525
00840
00920
00998
02203
02226
02328
02391
02392
02393
02394
02395
02396
02397
02406
02414
02425
02431
02433
02442
Here is what I built to try and make this work, but it just keeps spewing all stores in the file.
def comparesitestest():
    with open("file_1.txt", "r") as pairsin:
        pairs = pairsin.readlines()
    pairsin.close
    with open("file_2.txt", "r") as storesin:
        stores = storesin.readlines()
    storesin.close
    for pair in pairs:
        for store in stores:
            if store not in pair:
                print(store)
When you read your first file, add the store number to a set.
store_nums_1 = set()
with open("file_1.txt") as f:
    for line in f:
        line = line.strip()  # Remove leading and trailing whitespace
        if line.startswith("Store"):
            store_nums_1.add(line[7:])  # Add only store number to set
Next, read the other file and add those numbers to another set:
store_nums_2 = set()
with open("file_2.txt") as f:
    for line in f:
        line = line.strip()  # Remove leading and trailing whitespace
        store_nums_2.add(line)  # The entire line is the store number, so no need to slice.
Finally, find the set difference between the two sets.
file1_extras = store_nums_1 - store_nums_2
This gives a set containing only the store numbers that are in file 1 but not in file 2. (I cut your file_2 down to its first three lines, because the file you've shown actually contains more store numbers than file_1, so file1_extras was empty with your input.)
{'00920', '00998', '02226', '02328', '02391', '02392', '02393'}
This is more efficient than using lists, because checking whether something exists in a list is an O(N) operation. Doing that once for each of the M items in your first list gives an O(N*M) operation. Membership checks in a set, on the other hand, are O(1) on average, so the entire set-difference operation is O(M) instead of O(N*M).
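You get that output because your condition does not test what you want: it prints every store number that is not a substring of the current line, which is nearly all of them, for every line. Try changing your for loop to something like this:
stores = [store.strip() for store in stores]  # drop the newlines that readlines() keeps
for pairline in pairs:
    pairline = pairline.strip()
    if pairline:
        name, number = pairline.split(': ')
        if name == "Store":
            if number not in stores:
                print(number)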
Explanation is as follows:
You start with a File 1 of pairs, and a File 2 of stores (store numbers, really). Your file 2 is in decent shape. After you read it in, you've got a list of store numbers. You don't need to put that through a second loop. In fact, it's wasteful and unnecessary.
Your File 1 is a little more complicated. Although you refer to its contents as pairs, each entry spans two lines: one with a store number and one with what I assume is a phone number. So, for each line in File 1, I would check whether the line starts with "Store:", knowing I can ignore all the other lines. If it does, the next part of the line is the store number I actually want to look up in the list from File 2.
So, the program above does a little more checking to see whether it is reading a line it needs to act on, and then acts on it if necessary by checking whether the store number is in the store number list.
Also, as a side note, it's great to use the with statement. It's good coding practice. But when you do that, you do not need to close the file explicitly. Once you leave the with block, the close happens automatically.
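A small sketch that makes the automatic close visible. It writes a throwaway temporary file first so that it runs on its own; the file contents are just two store numbers from the question. Note also that pairsin.close without parentheses, as in the question's code, only names the method and never calls it.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("00377\n00525\n")
    path = tmp.name

with open(path) as storesin:
    stores = [line.strip() for line in storesin]
    open_inside = not storesin.closed  # the file is still open inside the block

closed_after = storesin.closed  # closed automatically once the block is left

os.remove(path)  # clean up the temporary file
```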
As another side note, there are usually multiple good ways and bad ways to solve a problem. Another possible reasonable solution/version is:
stores = [store.strip() for store in stores]  # compare without the newlines readlines() keeps
for pairline in pairs:
    if pairline and pairline.startswith("Store:"):
        store = pairline.split()[1]
        if store not in stores:
            print(store)
It's different. Not necessarily better or worse, just different.
So I have compiled a list of NFL game projections from the 2020 season for fantasy-relevant players. Each row contains the team names, score, relevant players and their stats, as in the text below. The problem is that the player names and stats have different lengths or are written out in slightly different ways.
Bears 24-17 Jaguars
M.Trubisky- 234/2TDs
D.Montgomery- 113 scrim yards/1 rush TD/4 rec
A.Robinson- 9/114/1
C.Kmet- 3/35/0
G.Minshew- 183/1TD/2int
J.Robinson- 77 scrim yards/1 rush TD/4 rec
DJ.Chark- 3/36
I'm trying to create a data frame that splits the player name, receptions, yards, and touchdowns into separate columns. Then I will be able to compare these numbers to the actual game numbers and see how close the predictions were. Does anyone have an idea for a solution in Python? Even if you could point me in the right direction I'd greatly appreciate it!
You can split the full string using '-' (dash/minus sign) as the separator, then use indexing to get the different parts.
Using str.split(sep='-')[0] gives you the name. Here, str would be the row, for example M.Trubisky- 234/2TDs.
Similarly, str.split(sep='-')[1] gives you everything but the name.
As for splitting what comes after the name, there is no way of doing it unless the stats are in a known order. If you can ensure that, there is a way of splitting them into columns.
I am going to assume that the trend here is yards / touchdowns / receptions, in which case, we can again use the str.split() method. I am also assuming that the 'rows' only belong to one team. You might have to run this script once for each team to create a dataframe, and then join all dataframes with a new feature called 'team_name'.
You can define lists and append values to them, and then use the lists to create a dataframe. This snippet should help you.
import re
import pandas as pd

names, scrim_yards, touchdowns, receptions = [], [], [], []
for row in rows:
    # name = row.split(sep='-')[0] --> sample name: M.Trubisky
    names.append(row.split(sep='-')[0])
    stats = row.split(sep='-')[1]  # sample stats: ' 234/2TDs'
    # Since we only want the numbers from each stat, we can filter them out using a regular expression.
    numerical_stats = re.findall(r'\d+', stats)  # sample stats: ['234', '2']
    # Some rows have fewer than three stats, so pad with None before indexing
    numerical_stats += [None] * (3 - len(numerical_stats))
    # now we use indexing again to get desired values
    scrim_yards.append(numerical_stats[0])
    touchdowns.append(numerical_stats[1])
    receptions.append(numerical_stats[2])
# You can then create a pandas dataframe
nfl_player_stats = pd.DataFrame({'names': names, 'scrim_yards': scrim_yards, 'touchdowns': touchdowns, 'receptions': receptions})
As you are pointing out, often times the hardest part of processing a data file like this is handling all the variability and inconsistency in the file itself. There are a lot of things that can vary inside the file, and then sometimes the file also contains silly errors (typos, missing whitespace, and the like). Depending on the size of the data file, you might be better off simply hand-editing it to make it easier to read into Python!
If you tackle this directly with Python code, then it's a very good idea to be very careful to verify the actual data matches your expectations of it. Here are some general concepts on how to handle this:
First off, make sure to strip every line of whitespace and ignore blank lines:
for curr_line in file_lines:
    curr_line = curr_line.strip()
    if len(curr_line) > 0:
        pass  # Process the line...
Once you have your stripped, non-blank line, make sure to handle the "game" line (the matchup between two teams) differently from the lines denoting players:
TEAM_NAMES = ["Cardinals", "Falcons", "Panthers", "Bears", "Cowboys", "Lions",
              "Packers", "Rams", "Vikings"]  # and 23 more; you get the idea

# ...down in the code where we are processing the lines...
if any(tn in curr_line for tn in TEAM_NAMES):
    pass  # ...handle as a "matchup"
else:
    pass  # ...handle as a "player"
When handling a player and their stats, we can use "- " as a separator. (You must include the space, otherwise players such as Clyde Edwards-Helaire will split the line in a way you did not want.) Here we unpack into exactly two variables, which gives us a nice error check since the code will raise an exception if the line doesn't split into exactly two parts.
p_name, p_stats = curr_line.split("- ")
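To illustrate why the space matters, here is a small sketch. The Edwards-Helaire stat line is made up for this example (it is not in the question's data), and split_player is a hypothetical helper that turns the unpacking failure into a more descriptive error.

```python
line_plain = "M.Trubisky- 234/2TDs"
# Made-up stat line, used only to show a hyphenated surname.
line_hyphen = "C.Edwards-Helaire- 80 scrim yards/1 rush TD/3 rec"

# Splitting on "-" alone cuts the surname in two: three parts, not two.
parts_dash_only = line_hyphen.split("-")

# Splitting on "- " keeps the name intact.
p_name, p_stats = line_hyphen.split("- ")

def split_player(line):
    """Return (name, stats), or raise if the line has an unexpected shape."""
    try:
        name, stats = line.split("- ")
    except ValueError:
        raise ValueError(f"Unexpected player line: {line!r}") from None
    return name.strip(), stats.strip()

trubisky = split_player(line_plain)
```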
Handling the stats will be the hardest part. It will all depend on what assumptions you can safely make about your input data. I would recommend being very paranoid about validating that the input data agrees with the assumptions in your code. Here is one notional idea -- an over-engineered solution, but that should help to manage the hassle of finding all the little issues that are probably lurking in that data file:
if "scrim yards" in p_stats:
    # This is a running back, so "scrim yards" then "rush TD" then "rec"
    rb_stats = p_stats.split("/")
    # To get the number, just split by whitespace and grab the first one
    scrim_yds = int(rb_stats[0].split()[0])
    if len(rb_stats) >= 2:
        rush_tds = int(rb_stats[1].split()[0])
    if len(rb_stats) >= 3:
        rec = int(rb_stats[2].split()[0])
    # Always check for unexpected data...
    if len(rb_stats) > 3:
        raise Exception("Excess data found in rb_stats: {}".format(rb_stats))
elif "TD" in p_stats:
    # This is a quarterback, so "yards"/"TD"/"int"
    qb_stats = p_stats.split("/")
    qb_yards = int(qb_stats[0])  # Or store directly into the DF; you get the idea
    # Handle "TD" or "TDs". Personal preference is to avoid regexp's
    if len(qb_stats) >= 2:
        if qb_stats[1].endswith("TD"):
            qb_td = int(qb_stats[1][:-2])
        elif qb_stats[1].endswith("TDs"):
            qb_td = int(qb_stats[1][:-3])
        else:
            raise Exception("Unknown qb_stats: {}".format(qb_stats))
    # Handle "int" if it's there
    if len(qb_stats) >= 3:
        if qb_stats[2].endswith("int"):
            qb_int = int(qb_stats[2][:-3])
        else:
            raise Exception("Unknown qb_stats: {}".format(qb_stats))
    # Always check for unexpected data...
    if len(qb_stats) > 3:
        raise Exception("Excess data found in qb_stats: {}".format(qb_stats))
else:
    # Must be a receiver: receptions/yards/TD
    wr_rec, wr_yds, wr_td = p_stats.split("/")
I made a dictionary in my IDE and wrote a function that takes input from the user. It accepts the input, but when I rerun the program and try to print the dictionary, it doesn't show anything.
Here is the code; any help is appreciated.
# Created dictionary.
list = {}

# Made a Function to save data.
def up():
    v = int(input(f"How many inputs you want to give : "))
    for i in range(v):
        a = input(f"Give words you want to put : ")
        b = input(f"Assign : ")
        list.update({a:b})
        print(f"Saved",{a:b})

value = input(f"What you want to do ? \nSee List or update it. \nIf you want to update type 'u' , If you want to see list type 's' ")
if value == "s":
    print(list)
elif value == "u":
    up()
Information in your variables is stored during the execution of your script. It is not automatically carried across to different executions of your script. Each one is a blank slate. Even if it weren't, the first line of your program sets list to an empty dictionary.
At the moment, you're putting salt on your broccoli, eating it, then expecting the broccoli you eat tomorrow to also have salt on it.
You could serialise the dictionary to a file, that can be read back in on next execution, rather than starting with an empty dictionary each time.
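One way to do that is with the json module. The sketch below is only an illustration: the file name words.json and the helper names are assumptions, and it uses a temporary directory so that it can simulate two runs without touching your own files.

```python
import json
import os
import tempfile

def load_words(path):
    """Return the saved dictionary, or an empty one on the very first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def save_words(path, words):
    """Write the dictionary to disk so that the next run can read it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(words, f, indent=2)

# Hypothetical location; in your program this could simply be "words.json".
path = os.path.join(tempfile.mkdtemp(), "words.json")

words = load_words(path)      # "first run": nothing has been saved yet
words["hello"] = "greeting"
save_words(path, words)

words_next_run = load_words(path)  # "second run": the entry is still there
```

In your program you would call load_words() once at the start, in place of assigning an empty dictionary, and save_words() after every update.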
I am not able to add a number to a list that I have in a text file, and I don't know how to do it.
Code so far:
def add_player_points():
    # Allows the user to add points onto the player's information.
    L = open("players.txt","r+")
    name = raw_input("\n\tPlease enter the name of the player whose points you wish to add: ")
    for line in L:
        s = line.strip()
        string = s.split(",")
        if name == string[0]:
            opponent = raw_input("\n\t Enter the name of the opponent: ")
            points = raw_input("\n\t Enter how many points you would like to add?: ")
            new_points = string[5] + points
    L.close()
This is a sample line from the text file. There are about 100 of them in the file:
Joe,Bloggs,J.bloggs#anemailaddress.com,01269 512355, 1, 0, 0, 0,
                                                        ^
The value I would like to add to is the first 0, right after the 1 that is already there, indicated by the arrow below it. The text file is called players.txt, as shown.
A full code answer would be helpful.
This is likely the problem:
new_points = string[5] + points
You are adding one string to another, which concatenates them. You need to convert both to integers first:
new_points = int(string[5]) + int(points)
This is not checking for incorrect input, but assuming the file format is correct and the user input too, it should work.
Edit: If you want to update the file with the new information, a better way is to divide the problem into three parts: 1) read the player information into an appropriate data structure, e.g. a dictionary using the player name as key, 2) make the changes in the dictionary, and finally 3) save the changes back to the file. So your code should be split into three functions.
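The three steps could be sketched as follows. This is written for Python 3 (so input() would replace raw_input()), the function names are made up for the example, and it assumes that the first field identifies the player, as the question's code does. It also strips the spaces around each field, so the file is written back without them.

```python
import os
import tempfile

POINTS_FIELD = 5  # same index as string[5] in the question's code

def read_players(path):
    """Step 1: read every line into a dict keyed by the first field."""
    players = {}
    with open(path) as f:
        for line in f:
            fields = [field.strip() for field in line.rstrip("\n").split(",")]
            if fields[0]:
                players[fields[0]] = fields
    return players

def add_points(players, name, points):
    """Step 2: change the value in memory (KeyError if the name is unknown)."""
    fields = players[name]
    fields[POINTS_FIELD] = str(int(fields[POINTS_FIELD]) + int(points))

def write_players(path, players):
    """Step 3: write all players back to the file."""
    with open(path, "w") as f:
        for fields in players.values():
            f.write(",".join(fields) + "\n")

# Demonstration on a temporary copy of the sample line from the question.
path = os.path.join(tempfile.mkdtemp(), "players.txt")
with open(path, "w") as f:
    f.write("Joe,Bloggs,J.bloggs#anemailaddress.com,01269 512355, 1, 0, 0, 0,\n")

players = read_players(path)
add_points(players, "Joe", "3")
write_players(path, players)

with open(path) as f:
    updated = f.read()
```

Keying the dictionary on the first name alone means that two players with the same first name would overwrite each other; a key built from both name fields would avoid that.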
One part of my program allows the user to check whether the clues they have entered are correct by comparing them against the solved version, which is in an external file named solved.txt. So far I have the code shown below. However, it only shows the three clue pairings that are correct at the beginning of the program, not the ones I have added while the program was running. I think this is just something minor, but I am a bit stuck on what to change.
Here is my code so far...
def check_clues():
    # Dictionary to hold all of the symbol/letter combos
    coded_dict = {}
    # Go through each symbol/letter combination together
    with open("words.txt") as fw, open("solved.txt") as fs:
        for fw_line, fs_line in zip(fw, fs):
            for fw_symbol, fs_letter in zip(fw_line.strip(), fs_line.strip()):
                # Add the symbol/letter combination to the dictionary
                coded_dict[fw_symbol] = fs_letter
    correct_clues = []
    with open("clues.txt") as fc:
        for fc_line in fc:
            # If the symbol is in the dictionary and the letter matches the symbol
            if fc_line[1] in coded_dict and coded_dict[fc_line[1]] == fc_line[0]:
                # Add a correct clue to your list
                correct_clues.append(fc_line.strip())
    print("You got a total of {0} correct: {1}".format(len(correct_clues), ", ".join(correct_clues)))

if __name__ == "__main__":
    check_clues()
Below is a link to what is in each of the files...
http://www.codeshare.io/vgwrC
If needed, I will add the code for all of my program...