Python text file searching for values and compiling results found - python

I have a large text file of lots of experimental results to search through for specific pieces of data, which I need to compile. The text file has results from many different experiments, and I need to keep the data from each experiment together.
e.g. (Not the actual data)
Object 1
The colour of the object is blue.
The size of the object is 0.5 m^3
The mass of the object is 0.8 g
Object 2
The colour of the object is pink.
The size of the object is 0.3m^3
etc.
I know where the values I want will be, as I can search the text for a specific phrase that I know will be present on the line the data is on.
One way I thought of doing it would be to search through the file for each specific line (I'm looking for two different variables), and add the value needed to a list. From this I would then create a dictionary for each object, assuming that at the same number in each list will be data from the same object.
e.g.
variable_one = []
variable_two = []
def get_data(file):
    with open("filename.txt", "r") as file:
        for line in file:
            if "The colour" in line:
                variable_one.append(line.split()[6])
            if "The mass" in line:
                variable_two.append(line.split()[6])
    file.close()
or, to search through the file and create a list, with each entry being the section of data from a different object, then searching for the two variables for each object from within the different items in the list - again eventually storing the values from each object in a dictionary.
What I want to know is if there is a more efficient/better method for doing this than the ideas I had?

Here is an alternative that uses only one list, with fewer "append" calls and fewer "in" tests, and so should be more efficient.
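variables = []
with open('filename.txt') as infile:
    colour = mass = ''
    for line in infile:
        fields = line.split()
        if len(fields) > 6:
            value = fields[6]
            if 'The colour' in line:
                colour = value
            elif 'The mass' in line:
                mass = value
        elif line.startswith('Object'):
            # a new object starts: store the values of the previous one
            variables.append((colour, mass))
            colour = mass = ''  # may not be needed.
# store the values of the last object in the file
variables.append((colour, mass))
# drop the empty entry added when the first "Object" line was seen
del variables[0]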

The way you are doing it looks fine to me in general, except for the areas I mentioned in the comments and the indexing, which raises an IndexError on any matching line with fewer than 7 words, since line.split()[6] needs a seventh word.
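Since the question asks for the values of each experiment to stay together, one option is to build the dictionary directly while reading, keyed by the "Object" line. The following is only a minimal sketch under assumptions: the function name parse_objects is hypothetical, the lines are assumed to look like the sample in the question, and the colour is assumed to end in a full stop that should be dropped.

```python
def parse_objects(lines):
    """Group the colour and mass values under the object they belong to."""
    results = {}
    current = None  # key of the object whose lines are being read
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == "Object":
            current = line.strip()
            results[current] = {}
        elif current is not None and len(words) > 6:
            # the value is the 7th word, as in the question's line.split()[6]
            if line.startswith("The colour"):
                results[current]["colour"] = words[6].rstrip(".")
            elif line.startswith("The mass"):
                results[current]["mass"] = float(words[6])
    return results


# sample data copied from the question
sample = """Object 1
The colour of the object is blue.
The size of the object is 0.5 m^3
The mass of the object is 0.8 g
Object 2
The colour of the object is pink.
The size of the object is 0.3m^3
""".splitlines()

data = parse_objects(sample)
print(data)
```

Because each value is stored under its own object, a missing line (such as the mass of Object 2 in the sample) does not shift the values of later objects, which can happen with two parallel lists.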

Related

I'm having difficulty utilizing an array in Python

I'm trying to manipulate text from a word file. However, when I save it to an array of classes, all the indexes are being overwritten instead of the one particular index I intend to change.
for line in modified:
    if line.startswith('Date'):
        output.append(line)
        list2=line.split(' ')
        work.date=list2[1]
        # print(work.date)
    if line.startswith('PPV'): #list1[2]=l,[3]=t,[4]=v
        output.append(line)
        list1=line.split(' ')
        work.lpv=list1[2]
        # print("l is ",list1[2],work.lpv)
        work.tpv=list1[3]
        # print("t is ",list1[3],work.tpv)
        work.vpv=list1[4]
        # print("v is ",list1[4],work.vpv)
        daylist[count]=work
        #print("l2 is ",list1[2],work.lpv)
        #print("daylist", count, "saved")
        print(count,daylist[count].date) #this displays the correct value at the proper index but all other indexes have also been changed to this value
        count+=1
I'm trying to save a class which holds a string and a few floats to an array, but I cannot get it to save to each index properly as it is read from the file. I've tried changing the scope and the list initialization but can't figure it out. Any input would be appreciated, thanks!
Every index in the daylist array references the same work object. When you change an attribute of work (e.g. work.date) it's reflected in all references to that single object. You want each index to reference a separate, independent object but that's not what the code is doing.
Try something like this where work is a dictionary:
for line in modified:
    if line.startswith('Date'):
        work = {}  # <-- this makes the name "work" refer to a new, empty dict for each record
        output.append(line)
        list2 = line.split(' ')
        work["date"] = list2[1]
    elif line.startswith('PPV'):
        output.append(line)
        list1 = line.split(' ')
        work["lpv"] = list1[2]
        # ... (rest of the PPV handling from the question, including daylist[count] = work)
        print(count, daylist[count]["date"])
        count += 1
Here's a helpful link on how names reference objects: How does Python referencing work?
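The difference between storing one shared object many times and storing a new object per entry can be seen in a small self-contained sketch. The Work class and the date strings here are hypothetical stand-ins for the class and file contents in the question.

```python
class Work:
    """Hypothetical stand-in for the class used in the question."""

    def __init__(self):
        self.date = None


dates = ["2021-01-01", "2021-01-02", "2021-01-03"]

# One object reused: every list entry refers to the same Work instance.
shared = Work()
aliased = []
for d in dates:
    shared.date = d
    aliased.append(shared)

# A new object per iteration: every list entry is independent.
separate = []
for d in dates:
    work = Work()
    work.date = d
    separate.append(work)

aliased_dates = [w.date for w in aliased]
separate_dates = [w.date for w in separate]
print(aliased_dates)   # every entry shows the last date assigned
print(separate_dates)  # each entry keeps its own date
```

In the first loop the list holds three references to one object, so the last assignment is visible through all of them. In the second loop each pass creates a fresh object before assigning to it.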

My graph is repeating values and I suspect it's because one of my variables is not a list, so how do I convert the `variable` into a list?

When I read a text file into a variable and use that variable as a parameter in a function, I get a graph with repeated values. I suspect it is because variable in the code below is not a list, but when I try to use the list() function or wrap variable in [], it is unsuccessful. The goal of the code below is to read a text file into a variable, use that variable as a parameter in the function, count the words in a column of a dataframe based on search_words1, and output a horizontal bar chart of the counts, with the words on the y axis and the counts on the x axis. When I instead declare search_words1 = ['Job Completed','delay in estimate upload', 'mitigation needed'] rather than using the variable read in from the text file, the code works fine. My question is: how do I change search_words1 = variable into a list to see whether that resolves the issue, and if it doesn't, what direction can I take to get the right horizontal bar chart?
# read text into variable
with open('text2.txt') as f:
    variable=f.read()
# declarations
ser = df['Comments']
search_words1 = variable
l2 = []
# I want to count the words in df['Comments'] based on search_words1 variable and create dataframe
def count(search_words1):
    for i in search_words1:
        c = ser[(ser.str.contains(i,regex=False))].count()
        l2.append(c)
    return pd.DataFrame({'Comments':search_words1, 'Count':l2})
# Call function and sort
twords=count(search_words1).sort_values(by='Count',ascending=False)
# graph results
twords.plot(kind='barh',x='Comments',figsize=(5,5),fontsize=14, title ='2021 Comments Common words')
The first image is the bad output. The second image is the desired output.
Edits:
When using variable = f.read(), the variable prints as 'Job Completed', 'delay in estimate upload', 'mitigation needed' and is of type str. The graph output is the first image with this method.
When using variable = f.readlines(), the variable prints as ["'Job Completed',\n", " 'delay in estimate upload',\n", " 'mitigation needed' "] and is of type list. With this method, the graph output is similar to the second image but with no horizontal bars within the chart frame.
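Both edits suggest that the file holds the phrases as quoted, comma-separated text. With f.read() the result is one string, so the loop in count() walks over single characters, and with f.readlines() each item still carries quotes, commas and newlines, so nothing matches. One possible way to get a clean list is sketched below. The helper name parse_search_words is hypothetical, and the sketch assumes that no phrase itself contains a comma.

```python
def parse_search_words(text):
    """Split quoted, comma-separated text into a list of clean phrases."""
    words = []
    for piece in text.split(","):
        # remove surrounding whitespace/newlines, then the quote characters
        cleaned = piece.strip().strip("'\"").strip()
        if cleaned:
            words.append(cleaned)
    return words


# the two shapes of file content described in the edits above
one_line = "'Job Completed', 'delay in estimate upload', 'mitigation needed'"
multi_line = "'Job Completed',\n 'delay in estimate upload',\n 'mitigation needed' "

print(parse_search_words(one_line))
print(parse_search_words(multi_line))
```

The result of parse_search_words(f.read()) could then be assigned to search_words1 in place of the raw string.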

Parsing and arranging text in python

I'm having some trouble figuring out the best implementation.
I have data in a file in this format:
|serial #|machine_owner|machine_name|
If a machine_owner has multiple machines, I'd like the machines displayed as a comma-separated list in the field, so that this:
|1234|Fred Flinstone|mach1|
|5678|Barney Rubble|mach2|
|1313|Barney Rubble|mach3|
|3838|Barney Rubble|mach4|
|1212|Betty Rubble|mach5|
Looks like this:
|Fred Flinstone|mach1|
|Barney Rubble|mach2,mach3,mach4|
|Betty Rubble|mach5|
Any hints on how to approach this would be appreciated.
You can use a dict as a temporary container to group by name and then print it in the desired format:
import re
s = """|1234|Fred Flinstone|mach1|
|5678|Barney Rubble|mach2|
|1313|Barney Rubble||mach3|
|3838|Barney Rubble||mach4|
|1212|Betty Rubble|mach5|"""
results = {}
for line in s.splitlines():
    _, name, mach = re.split(r"\|+", line.strip("|"))
    if name in results:
        results[name].append(mach)
    else:
        results[name] = [mach]
for name, mach in results.items():
    print(f"|{name}|{','.join(mach)}|")
You need to store all the machine names in a list. Every time you want to append a machine name, first check that the name is not already in the list, so that it is not added again.
After storing them in a list called data, iterate over the names and use this call:
data[i].append([])
to add a list after each machine name stored in the i'th place.
Once you're done, iterate over the names and find them in the file, then append the owner.
All of this can be done in 2 steps.
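The grouping described in the answers above can also be sketched with collections.defaultdict, which removes the need to check whether a name has already been seen. The function name group_machines is hypothetical, and the sketch assumes each usable line has exactly three non-empty fields in the order serial, owner, machine, as in the sample data.

```python
from collections import defaultdict


def group_machines(lines):
    """Return one '|owner|mach1,mach2|' line per owner, in first-seen order."""
    machines_by_owner = defaultdict(list)
    for line in lines:
        # drop empty fields produced by leading, trailing or doubled pipes
        fields = [field for field in line.strip().split("|") if field]
        if len(fields) != 3:
            continue  # skip lines that do not match the assumed layout
        _serial, owner, machine = fields
        machines_by_owner[owner].append(machine)
    return [
        "|{}|{}|".format(owner, ",".join(machines))
        for owner, machines in machines_by_owner.items()
    ]


sample = [
    "|1234|Fred Flinstone|mach1|",
    "|5678|Barney Rubble|mach2|",
    "|1313|Barney Rubble|mach3|",
    "|3838|Barney Rubble|mach4|",
    "|1212|Betty Rubble|mach5|",
]
grouped = group_machines(sample)
for row in grouped:
    print(row)
```

Any iterable of lines works as input, so an open file object could be passed in place of the sample list.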

How to invert a dictionary in another function, and how to count the inverted values if they are not unique in the report?

My load_library function is working well, but the other two fail.
How do I invert the dictionary in another function (index_by_author), and how do I count the inverted values when they are not unique (report_author_counts)?
def load_library(f):
    with open(f,'rt') as x:
        return dict(map(str.strip, line.split("|")) for line in x)
def index_by_author(f):
    return {value:key for key, value in load_library(f).items()}
def count_authors(file_name):
    invert = {}
    for k, v in load_library(file_name).items():
        invert[v] = invert.get(v, 0) + 1
    return invert
def write_authors_counts(counts, file_name):
    with open(file_name, 'w') as fobj:
        for name, count in counts.items():
            fobj.write('{}: {}\n'.format(name, count))
def report_author_counts(lib_fpath, rep_filepath):
    counts = count_authors(lib_fpath)
    write_authors_counts(counts, rep_filepath)
Load library
In module library.py, implement function load_library().
Inputs:
Path to a text file (with contents similar to those above) containing the individual books.
Outputs:
The function shall produce a dictionary where the book titles are used as keys and the authors' names are stored as values.
You can expect that the input text file will always exist, but it may be empty. In that case, the function shall return an empty dictionary.
It can then be used as follows:
>>> from library import load_library
>>> book_author = load_library('books.txt')
>>> print(book_author['RUR'])
Capek, Karel
>>> print(book_author['Dune'])
Herbert, Frank
Index by author
In module library.py, create function index_by_author(), which - in a sense - inverts the dictionary of books produced by load_library().
Inputs:
A dictionary with book titles as keys and book authors as values (the same structure as produced by load_library() function).
Outputs:
A dictionary containing book authors as keys and a list of all books of the respective author as values.
If the input dictionary is empty, the function shall produce an empty dictionary as well.
For example, running the function on the following book dictionary (with reduced contents for the sake of brevity) would produce results shown below in the code:
>>> book_author = {'RUR': 'Capek, Karel', 'Dune': 'Herbert, Frank', 'Children of Dune': 'Herbert, Frank'}
>>> books_by = index_by_author(book_author)
>>> print(books_by)
{'Herbert, Frank': ['Dune', 'Children of Dune'], 'Capek, Karel': ['RUR']}
>>> books_by['Capek, Karel']
['RUR']
>>> books_by['Herbert, Frank']
['Dune', 'Children of Dune']
Report author counts
In module library.py, create function report_author_counts(lib_fpath, rep_filepath) which shall compute the number of books of each author and the total number of books, and shall store this information in another text file.
Inputs:
Path to a library text file (containing records for individual books).
Path to report text file that shall be created by this function.
Outputs: None
Assuming the file books.txt has the same contents as above, running the function like this:
>>> report_author_counts('books.txt', 'report.txt')
shall create a new text file report.txt with the following contents:
Clarke, Arthur C.: 2
Herbert, Frank: 2
Capek, Karel: 1
Asimov, Isaac: 3
TOTAL BOOKS: 8
The order of the lines is irrelevant. Do not forget the TOTAL BOOKS line! If the input file is empty, the output file shall contain just the line TOTAL BOOKS: 0.
Suggestion: There are basically 2 ways to implement this function. You can either
use the 2 functions above to load the library, transform it using index_by_author() and then easily iterate over the dictionary, or
work directly with the source text file, extract the author names, and count their occurrences.
Both options are possible, provided the function will accept the specified arguments and will produce the right file contents. The choice is up to you.
The index_by_author function needs to be a little more complex than the dict comprehension you suggested. dict.setdefault() comes in handy here, as described in Efficient way to either create a list, or append to it if one already exists?. Notice too that your assignment says a dictionary should be the parameter, not a file. Here is what I recommend:
def index_by_author(book_author):
    dict_by_author = {}
    for key, value in book_author.items():
        dict_by_author.setdefault(value, []).append(key)
    return dict_by_author
Then in your report_author_counts(), you can use index_by_author() to invert the dictionary. Then loop through the inverted dictionary. For each item, the count will be the length of the value, which will be a list of titles. The length of a list is determined with len(list).
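Put together, the approach described above might look like the following sketch. It assumes each line of the library file has the form title|author, which is inferred from the split("|") in the question and is not confirmed by the assignment text. It also writes the TOTAL BOOKS line that the assignment requires.

```python
def load_library(lib_fpath):
    """Read 'title|author' lines into a {title: author} dictionary."""
    book_author = {}
    with open(lib_fpath, "rt") as lib_file:
        for line in lib_file:
            if not line.strip():
                continue  # tolerate blank lines and an empty file
            title, author = (part.strip() for part in line.split("|", 1))
            book_author[title] = author
    return book_author


def index_by_author(book_author):
    """Invert {title: author} into {author: [titles]}."""
    books_by = {}
    for title, author in book_author.items():
        books_by.setdefault(author, []).append(title)
    return books_by


def report_author_counts(lib_fpath, rep_filepath):
    """Write 'author: count' lines followed by the TOTAL BOOKS line."""
    books_by = index_by_author(load_library(lib_fpath))
    total = 0
    with open(rep_filepath, "w") as report:
        for author, titles in books_by.items():
            # the count for an author is the length of the title list
            report.write("{}: {}\n".format(author, len(titles)))
            total += len(titles)
        report.write("TOTAL BOOKS: {}\n".format(total))
```

Calling report_author_counts('books.txt', 'report.txt') would then create the report file. For an empty library the loop body never runs, so only the line TOTAL BOOKS: 0 is written.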

Sorting on list values read into a list from a file

I am trying to write a routine to read values (names and scores) from a text file and then sort them, for example A-Z by name or highest to lowest by score. I am able to sort the data, but only by character position in the string, which is no good where names are different lengths. This is the code I have written so far:
ClassChoice = input("Please choose a class to analyse Class 1 = 1, Class 2 = 2")
if ClassChoice == "1":
    Classfile = open("Class1.txt",'r')
else:
    Classfile = open("Class2.txt",'r')
ClassList = [line.strip() for line in Classfile]
ClassList.sort(key=lambda s: s[x])
print(ClassList)
This is an example of one of the data files (Each piece of data is on a separate line):
Bob,8,7,5
Fred,10,9,9
Jane,7,8,9
Anne,6,4,8
Maddy,8,5,5
Jim, 4,6,5
Mike,3,6,5
Jess,8,8,6
Dave,4,3,8
Ed,3,3,4
I can sort on the name, but not on score 1, 2 or 3. Something obvious probably but I have not been able to find an example that works in the same way.
Thanks
How about something like this?
indexToSortOn = 0 # will sort on the first integer value of each line
classChoice = ""
while not classChoice.isdigit():
    # input() is the Python 3 name for Python 2's raw_input()
    classChoice = input("Please choose a class to analyse (Class 1 = 1, Class 2 = 2) ")
classFile = "Class%s.txt" % classChoice
with open(classFile, 'r') as fileObj:
    classList = [line.strip() for line in fileObj]
classList.sort(key=lambda s: int(s.split(",")[indexToSortOn+1]))
print(classList)
The key is to specify in the key function that you pass in what part of each string (the line) you want to be sorting on:
classList.sort(key=lambda s: int(s.split(",")[indexToSortOn+1]))
The cast to an integer is important as it ensures the sort is numeric instead of alphanumeric (e.g. 100 > 2, but "100" < "2")
I think I understand what you are asking. I am not a sort expert, but here goes:
Assuming you would like the ability to sort the lines by either the name, the first int, second int or third int, you have to realize that when you are creating the list, you aren't creating a two dimensional list, but a list of strings. Due to this, you may wish to consider changing your lambda to something more like this:
ClassList.sort(key=lambda s: str(s).split(',')[x])
This assumes that the x is defined as one of the fields in the line with possible values 0-3.
The one issue I see with this is that split() returns strings, so the scores are compared as text: Fred's score of 10 sorts before 2 (but after 0), because "10" < "2" when compared as strings. Converting the field with int() before comparing avoids this.
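An alternative to splitting inside every key function is to parse each line once into a name and a list of integer scores, then sort the parsed records. This is only a sketch: parse_scores is a hypothetical helper, and the sample lines are a subset of the data in the question.

```python
def parse_scores(lines):
    """Turn 'name,score,score,score' lines into (name, [int scores]) tuples."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        records.append((fields[0], [int(score) for score in fields[1:]]))
    return records


sample = ["Bob,8,7,5", "Fred,10,9,9", "Jane,7,8,9", "Jim, 4,6,5"]
records = parse_scores(sample)

# A-Z by name
by_name = sorted(records, key=lambda record: record[0])
# highest to lowest by the first score
by_first_score = sorted(records, key=lambda record: record[1][0], reverse=True)
# highest to lowest by each person's best score
by_best_score = sorted(records, key=lambda record: max(record[1]), reverse=True)

print([name for name, _ in by_name])
print([name for name, _ in by_first_score])
print([name for name, _ in by_best_score])
```

Because the scores are converted to int while parsing, every later sort compares numbers, and the stray space in "Jim, 4,6,5" is stripped before conversion.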
