I'm currently revising for my GCSE coursework. The task I have been given is a troubleshooting program in which the user describes their problem; I evaluate their input and test it for keywords, then read a text file into the program and print the corresponding solution.
This was my original code:
keywords = ["k1","k2","k3","k4","k5","k6","k7","k8","k9","kk"]
question_about_phone = input("What Seems to be the Problem? Please be specific but don't bombard us with too much info").lower()
file = open('code.txt','r')
solution = [line.strip(',') for line in file.readlines()]
for x in range(0, 10):
    if keywords[x] in question_about_phone:
        print(solution[x])
However, in the middle of my assessment I realised that you can't have it printing a solution for each keyword. So I decided to make it assign a value to a different list and then have many lines of
if list[1] == "true" and list[5] == "true":
    print(solution[1])
and so on...
However, this is inefficient. Is there any way I can use a dictionary with values and say something along the lines of:
dictionary = [list[1] list[5], (more combos)
then something like (probably a while loop)
for x in range(0,10):
if dictionary[x] == "TRUE":
print(solutions[x])
end code
You can do
keywords = ["battery", "off", "broken"]
question_about_phone = input("What Seems to be the Problem?")
with open('code.txt', 'r') as file:
    solutions = {k:line.strip(',\n') for k, line in zip(keywords, file)}
answers = [v for k, v in solutions.items() if k in question_about_phone]
if answers:
    print(answers)
else:
    print('Sorry, there are no answers to your question')
which, for example, with a file of
answer 4 battery
answer 4 off
answer 4 broken
...
and an input question of
What Seems to be the Problem? broken battery sunny
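produces (on Python 3.7+, where dictionaries keep insertion order; older versions may print the items in a different order)
['answer 4 battery', 'answer 4 broken']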
Basically, solutions is built by pairing each keyword with the corresponding line of the file.
Then answers is formed by picking the values of those keywords that appear in the question.
However, I strongly agree with Tim Seed's approach: it would be much more efficient to look up only the words present in the question instead of doing the opposite, since the possible answers outnumber the terms in the question.
To achieve that, simply change the answers line to
answers = [solutions[k] for k in question_about_phone.split() if k in solutions]
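As a minimal sketch of that word-by-word lookup, the snippet below uses an in-memory dictionary instead of code.txt. The function name find_answers and the sample solutions are made up for illustration. Note that split() matches whole words only, whereas the `in` test on the full question string also matches substrings.

```python
# Hypothetical sample data standing in for the keyword/file pairing above.
solutions = {
    "battery": "answer 4 battery",
    "off": "answer 4 off",
    "broken": "answer 4 broken",
}

def find_answers(question, solutions):
    """Return the solutions whose keyword appears as a whole word in the question."""
    words = question.lower().split()
    # One dictionary lookup per word of the question.
    return [solutions[word] for word in words if word in solutions]

print(find_answers("broken battery sunny", solutions))
# ['answer 4 broken', 'answer 4 battery']
```

The answers come back in the order the words appear in the question, not in keyword order.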
You have correctly deduced that iterating through a list (array) is inefficient - and using a dictionary is an option.
So using your Example
keywords = {"k1": "Turn Power on","k2":"Turn Power Off"}
for k in ['Bad','k1','k2','bad']:
    if k in keywords:
        print("%s: Answer is %s"%(k,keywords[k]))
    else:
        print("%s: No idea what the issue is"%(k))
You should get Answers for k1,k2 - but not for the others....
Giving you output of
Bad: No idea what the issue is
k1: Answer is Turn Power on
k2: Answer is Turn Power Off
bad: No idea what the issue is
Hope that helps
I assume that there is exactly one answer per keyword (from your example code).
You can then just return after the first answer has been found, as in:
def print_solution(question_about_phone):  # return only works inside a function
    for x in range(0, 10):
        if keywords[x] in question_about_phone:
            print(solution[x])
            return
    print("No solution found, be more specific")
You can also iterate in a more general way:
def print_solution(question_about_phone):  # return only works inside a function
    for idx, kw in enumerate(keywords):
        if kw in question_about_phone:
            print(solution[idx])
            return
So I am currently preparing for a competition (Australian Informatics Olympiad) and in the training hub, there is a problem in AIO 2018 intermediate called Castle Cavalry. I finished it:
input = open("cavalryin.txt").read()
output = open("cavalryout.txt", "w")
squad = input.split()
total = squad[0]
squad.remove(squad[0])
squad_sizes = squad.copy()
squad_sizes = list(set(squad))
yn = []
for i in range(len(squad_sizes)):
    n = squad.count(squad_sizes[i])
    if int(squad_sizes[i]) == 1 and int(n) == int(total):
        yn.append(1)
    elif int(n) == int(squad_sizes[i]):
        yn.append(1)
    elif int(n) != int(squad_sizes[i]):
        yn.append(2)
ynn = list(set(yn))
if len(ynn) == 1 and int(ynn[0]) == 1:
    output.write("YES")
else:
    output.write("NO")
output.close()
I submitted this code and I didn't pass because it was too slow, at 1.952secs. The time limit is 1.000 secs. I wasn't sure how I would shorten this, as to me it looks fine. PLEASE keep in mind I am still learning, and I am only an amateur. I started coding only this year, so if the answer is quite obvious, sorry for wasting your time 😅.
Thank you for helping me out!
One performance issue is calling int() over and over on the same entity, or on things that are already int:
if int(squad_sizes[i]) == 1 and int(n) == int(total):
elif int(n) == int(squad_sizes[i]):
elif int(n) != int(squad_sizes[i]):
if len(ynn) == 1 and int(ynn[0]) == 1:
But the real problem is your code doesn't work. And making it faster won't change that. Consider the input:
4
2
2
2
2
Your code will output "NO" (with missing newline) despite it being a valid configuration. This is due to your collapsing the squad sizes using set() early in your code. You've thrown away vital information and are only really testing a subset of the data. For comparison, here's my complete rewrite that I believe handles the input correctly:
with open("cavalryin.txt") as input_file:
    string = input_file.read()
total, *squad_sizes = map(int, string.split())
success = True
while squad_sizes:
    squad_size = squad_sizes.pop()
    for _ in range(1, squad_size):
        try:
            squad_sizes.remove(squad_size)  # eliminate n - 1 others like me
        except ValueError:
            success = False
            break
    else:  # no break
        continue
    break
with open("cavalryout.txt", "w") as output_file:
    print("YES" if success else "NO", file=output_file)
Note that I convert all the input to int early on so I don't have to consider that issue again. I don't know whether this will meet AIO's timing constraints.
I can see some things in there that might be inefficient, but the best way to optimize code is to profile it: run it with a profiler and sample data.
You can easily waste time trying to speed up parts that don't need it without having much effect. Read up on the cProfile module in the standard library to see how to do this and interpret the output. A profiling tutorial is probably too long to reproduce here.
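For a first look, something along these lines may be enough. It is only a sketch: count_with_list is a made-up stand-in for the code being measured and profile_call is a hypothetical helper. cProfile.Profile and pstats.Stats are the standard-library pieces.

```python
import cProfile
import io
import pstats

def count_with_list(items):
    # Stand-in workload: list.count() rescans the whole list for every distinct value.
    return {x: items.count(x) for x in set(items)}

def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    # Show the five entries with the highest cumulative time.
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()

result, report = profile_call(count_with_list, ["2", "3", "2", "3", "3"] * 200)
print(report)
```

For a whole script, running `python -m cProfile -s cumulative script.py` from the command line gives a similar table without changing the code.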
My suggestions, without profiling,
squad.remove(squad[0])
Removing the start of a big list is slow, because the rest of the list has to be copied as it is shifted down. (Removing the end of the list is faster, because lists are typically backed by arrays that are overallocated, with more slots than elements, to make .append()s fast, so it only has to decrease the length and can keep the same array.)
It would be better to set this to a dummy value and remove it when you convert it to a set (sets are backed by hash tables, so removals are fast), e.g.
dummy = object()
squad[0] = dummy # len() didn't change. No shifting required.
...
squad_sizes = set(squad)
squad_sizes.remove(dummy) # Fast lookup by hash code.
Since we know these will all be strings, you can just use None instead of a dummy object, but the above technique works even when your list might contain Nones.
squad_sizes = squad.copy()
This line isn't required; it's just doing extra work. The set() already makes a shallow copy.
n = squad.count(squad_sizes[i])
This line might be the real bottleneck. It's effectively a loop inside a loop, so it basically has to scan the whole list for each outer loop. Consider using collections.Counter for this task instead. You generate the count table once outside the loop, and then just look up the numbers for each string.
You can also avoid generating the set altogether if you do this. Just use the Counter object's keys for your set.
Another point unrelated to performance. It's unpythonic to use indexes like [i] when you don't need them. A for loop can get elements from an iterable and assign them to variables in one step:
from collections import Counter
...
count_table = Counter(squad)
for squad_size, n in count_table.items():
    ...
You can collect all occurrences of the preferred number for each knight in a dictionary.
Then test whether the number of knights with a given preferred number is divisible by that number.
with open('cavalryin.txt', 'r') as f:
    lines = f.readlines()
# convert to int
list_int = [int(a) for a in lines]
# initialise counting dictionary: key: preferred number, item: empty list to collect all knights with preferred number.
collect_dict = {a:[] for a in range(1,1+max(list_int[1:]))}
print(collect_dict)
# loop through list, ignoring first entry.
for a in list_int[1:]:
    collect_dict[a].append(a)
# initialise output
out='YES'
for key, item in collect_dict.items():
    # check number of items with preference for number is divisible
    # by that number
    if item: # if list has entries:
        if (len(item) % key) > 0:
            out='NO'
            break
with open('cavalryout.txt', 'w') as f:
    f.write(out)
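The same divisibility check can be sketched more compactly with collections.Counter, which counts every preferred size in a single pass. This is a sketch under the assumption stated above (a size n is satisfiable exactly when the number of knights asking for it is a multiple of n). The function names can_form_squads and solve are made up, and the file handling is left out so the logic can be tried on strings.

```python
from collections import Counter

def can_form_squads(sizes):
    """True if every preferred size n is requested by a multiple of n knights."""
    counts = Counter(sizes)  # one pass over the data, no nested scan
    return all(count % size == 0 for size, count in counts.items())

def solve(text):
    # The first number is the knight count; the rest are preferred squad sizes.
    total, *sizes = map(int, text.split())
    return "YES" if can_form_squads(sizes) else "NO"

print(solve("4\n2\n2\n2\n2\n"))  # YES: two squads of two
print(solve("3\n2\n2\n2\n"))     # NO: one knight is left over
```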
I am studying the book "Python for Everybody" by Charles R. Severance and I have a question about Exercise 2 from Chapter 7.
The task is to go through the mbox-short.txt file and "When you encounter a line that starts with “X-DSPAM-Confidence:” pull apart the line to extract the floating-point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence."
Here is my way of doing this task:
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
count = 0
values = list()
for line in fhand:
    if line.startswith('X-DSPAM-Confidence:'):
        string = line
        count = count + 1
        colpos = string.find(":")
        portion = string[colpos+1:]
        portion = float(portion)
        values.append(portion)
print('Average spam confidence:', sum(values)/count)
I know this code works because I get the same result as in the book, however, I think this code can be simpler. The reason I think so is because I used a list in this code (declared it and then stored values in it). However, "Lists" is the next topic in the book and when solving this task I didn't know anything about lists and had to google them. I solved this task this way, because this is what I'd do in the R language (which I am already quite familiar with), I'd make a vector in which I'd store the values from my iteration.
So my question is: Can this code be simplified? Can I do the same task without using list? If yes, how can I do it?
You could change the "values" object to a float. The overhead of a list is not really needed for this problem.
values = 0.0
Then in the loop use
values += portion
Otherwise, there really is not a simpler way as this problem has tasks and you must meet all of the tasks in order to solve it.
Open File
Check For Error
Loop Through Lines
Find certain lines
Total up said lines
Print average
If you can do it in 3 lines of code great but that doesn't make what goes on in the background necessarily simpler. It will also probably look ugly.
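The running-total idea above can be sketched as follows. To keep it self-contained, the function takes any iterable of lines instead of opening mbox-short.txt; the name average_confidence and the sample lines are made up for illustration.

```python
def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Average the numbers on lines starting with prefix, without building a list."""
    total = 0.0
    count = 0
    for line in lines:
        if line.startswith(prefix):
            # float() ignores the surrounding space and the trailing newline.
            total += float(line[len(prefix):])
            count += 1
    # Avoid dividing by zero when no line matched.
    return total / count if count else None

sample = [
    "From: someone@example.com\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Confidence: 0.6178\n",
]
print("Average spam confidence:", average_confidence(sample))
```

With a real file, the open file object can be passed directly, since iterating over it yields one line at a time.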
You could filter the file's lines before the loop, then you can collapse the other variables into one, and get the values using list-comprehension. From that, you have your count from the length of that list.
interesting_lines = (line for line in fhand if line.startswith('X-DSPAM-Confidence:'))
values = [float(line[(line.find(":")+1):]) for line in interesting_lines]
count = len(values)
Can I do the same task without using list?
If the output needs to be an average, yes, you can accumulate the sum and the count as their own variables, and not need a list to call sum(values) against.
Note that open(fname) is giving you an iterable collection anyway, and you're looping over the "list of lines" in the file.
List-comprehensions can often replace for-loops that add to a list:
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
values = [float(l[l.find(":")+1:]) for l in fhand if l.startswith('X-DSPAM-Confidence:')]
print('Average spam confidence:', sum(values)/len(values))
The inner part is simply your code combined, so perhaps less readable.
EDIT: Without using lists, it can be done with "reduce":
from functools import reduce
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
# "total" rather than "sum", so the built-in sum() is not shadowed
total, count = reduce(lambda acc, l: (acc[0] + float(l[l.find(":")+1:]), acc[1]+1) if l.startswith('X-DSPAM-Confidence:') else acc, fhand, (0,0))
print('Average spam confidence:', total / count)
Reduce is often called "fold" in other languages, and it basically allows you to iterate over a collection with an "accumulator". Here, I iterate the collection with an accumulator which is a tuple of (sum, count). With each item, we add to the sum and increment the count. See Reduce documentation.
All this being said, "simplify" does not necessarily mean as little code as possible, so I would stick with your own code if you're not comfortable with these shorthand notations.
I am trying to learn python and trying to create a simple application where I have it set to read lines from the text file. The first line is the question and second line is answer. Now, I am able to read the question and answer. But the part where I compare user answer with the actual answer, it doesn't perform the actions in the loop even when the answer entered is correct.
My code :
def populate():
print("**************************************************************************************************************")
f=open("d:\\q.txt")
questionList=[]
b = 1
score=0
start=0
for line in f.read().split("\n"):
questionList.append(line)
while b<len(questionList):
a = questionList[start]
print(a)
userinput=input("input user")
answer=questionList[b]
b = b + 2
print(answer)
if userinput==answer:
score =score+1
print(score)
else:
break
start += 2
I would really appreciate any guidance on this.
My q.txt file:
1. Set of instructions that you write to tell a computer what to do.
Program
2. A language's set of rules.
Syntax
3. Describes the programs that operate the computer.
System Software
4.To achieve a working program that accomplishes its intended tasks by removing all syntax and logical errors from the program
Debugging
5.a program that creates and names computer memory locations and can hold values, and write a series of steps or operations to manipulate those values
Procedural Program
6. The named computer memory locations.
Variables
7. The style of typing the first initial of an identifier in lowercase and making the initial of the second word uppercase. -- example -- "payRate"
Camel Casing
8. Individual operations within a computer program that are often grouped together into logical units.
Methods
9. This is an extension of procedural programming in terms of using variables and methods, but it focuses more on objects.
Object Oriented Programming
10. A concrete entity that has behaviors and attributes.
Objects
Your code was:
always asking the same question: questionList[start]
throwing away the value of every
replacing every space in answers with nothing, so "System Software" becomes "SystemSoftware"
failing to factor in case: need to use .lower() on userinput and answer.
Here's a more pythonic implementation:
#!/usr/bin/env python3
from itertools import islice

# Read in every line and then put alternating lines together in
# a list like this [ (q, a), (q, a), ... ]
def get_questions(filename):
    with open(filename) as f:
        lines = [line.strip() for line in f]
    number_of_lines = len(lines)
    if number_of_lines == 0:
        raise ValueError('No questions in {}'.format(filename))
    if number_of_lines % 2 != 0:
        raise ValueError('Missing answer for last question in {}'.format(filename))
    return list(zip(islice(lines, 0, None, 2), islice(lines, 1, None, 2)))

def quiz(questions):
    score = 0
    for question, answer in questions:
        user_answer = input('\n{}\n> '.format(question))
        if user_answer.lower() == answer.lower():
            print('Correct')
            score += 1
        else:
            print('Incorrect: {}'.format(answer))
    return score

questions = get_questions('questions.txt')
score = quiz(questions)
num_questions = len(questions)
print('You scored {}/{}'.format(score, num_questions))
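The pairing step can also be sketched with plain slicing instead of islice. The snippet below works on an in-memory list so it runs without a file. The helper names pair_lines and is_correct are made up, and stripping the user's input is an extra assumption that is not part of the code above.

```python
def pair_lines(lines):
    """Turn [q1, a1, q2, a2, ...] into [(q1, a1), (q2, a2), ...]."""
    # lines[0::2] are the questions, lines[1::2] the answers.
    return list(zip(lines[0::2], lines[1::2]))

def is_correct(user_answer, answer):
    # Compare case-insensitively and ignore stray whitespace around the input.
    return user_answer.strip().lower() == answer.strip().lower()

lines = [
    "2. A language's set of rules.",
    "Syntax",
    "3. Describes the programs that operate the computer.",
    "System Software",
]
questions = pair_lines(lines)
print(questions[1])
print(is_correct("  system software", questions[1][1]))
```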
In theory, this piece of code compares two lists of tweet IDs and prints an ID only if it does not already exist in the second list.
Instead, it prints every ID, whether or not it is listed.
Any suggestions on how to compare these two lists of IDs and print an ID from the first list only if it is not in the second?
Sorry for the inefficient code (and my English).
What I actually want is to avoid retweeting (RT) something I have already retweeted. I use the Tweepy library: I read the timeline and retweet only the tweets I have not retweeted yet.
def analizarRT():
    timeline = []
    temp = []
    RT = []
    fileRT = openFile('rt.txt')
    for status in api.user_timeline('cnn', count='6'):
        timeline.append(status)
    for i in range(6):
        temp.append(timeline[i].id)
    for a in range(6):
        for b in range(6):
            if str(temp[a]) == fileRT[b]:
                pass
        else:
            RT.append(temp[a])
    for i in RT:
        print i
Solved! I added this function:
def estaElemento(tweetId, arreglo):
    encontrado = False
    for a in range(len(arreglo)):
        if str(tweetId) == arreglo[a].strip():
            encontrado = True
            break
    return encontrado
It's a simple program, don't complicate it. As per your comments, there are two lists:
1. timeline
2. fileRT
Now, you want to compare the IDs in both these lists. Before you do that, you must know the nature of these two lists.
I mean, what is the type of data in the lists?
Is it
a list of strings? or
a list of objects? or
a list of integers?
So find that out by debugging or by adding print statements to your code. Or please add these details to your question, so that we can give a better answer.
Meanwhile, try this:
if timeline.id == fileRT.id should work.
Edited:
def analizarRT():
    timeline = []
    newlist = []  # collects the IDs that did not match
    fileRT = openFile('rt.txt')
    for status in api.user_timeline('cnn', count='6'):
        timeline.append(status)
    for i in range(6):
        for b in range(6):
            if timeline[i].id == fileRT[b].id:
                pass
            else:
                newlist.append(timeline[i].id)
    print newlist
As per your question, you want to collect the IDs that don't match, right? I have appended them to newlist. Now you can print newlist to see the items.
Your else statement is associated with the for statement, so it runs every time the inner loop finishes. You probably need to add one more level of indentation so that it belongs to the if statement.
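If the goal is only "which timeline IDs are not in the file yet", a set avoids the nested loops altogether. The sketch below assumes that the file lines are ID strings, possibly with trailing newlines, and that the timeline IDs are integers. The function name and sample values are hypothetical, and no Tweepy calls are made.

```python
def ids_not_yet_retweeted(timeline_ids, saved_lines):
    """Return the timeline IDs that do not appear in the saved lines."""
    # Normalise the file contents once; set membership tests are fast.
    seen = {line.strip() for line in saved_lines}
    return [tweet_id for tweet_id in timeline_ids if str(tweet_id) not in seen]

timeline_ids = [101, 102, 103]   # stand-in for status.id values
saved_lines = ["102\n", "999\n"]  # stand-in for the contents of rt.txt
print(ids_not_yet_retweeted(timeline_ids, saved_lines))  # [101, 103]
```

Each ID is checked exactly once, so it is appended at most once, which is the behaviour the nested loops were aiming for.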
I was browsing through "Text Processing in Python" and tried its example about Schwartzian sort.
I used following structure for sample data which also contains empty lines. I sorted this data by fifth column:
383230 -49 -78 1 100034 '06 text' 9562 'text' 720 'text' 867
335067 -152 -18 3 100030 'text' 2400 'text' 2342 'text' 696
136592 21 230 3 100035 '03. text' 10368 'text' 1838 'text' 977
Code used for Schwartzian sorting:
for n in range(len(lines)):       # Create the transform
    lst = string.split(lines[n])
    if len(lst) >= 4:             # Tuple w/ sort info first
        lines[n] = (lst[4], lines[n])
    else:                         # Short lines to end
        lines[n] = (['\377'], lines[n])
lines.sort()                      # Native sort
for n in range(len(lines)):       # Restore original lines
    lines[n] = lines[n][1]
open('tmp.schwartzian','w').writelines(lines)
I don't get how the author intended that short or empty lines should go to end of file by using this code. Lines are sorted after the if-else structure, thus raising empty lines to top of file. Short lines of course work as supposed with the custom sort (fourth_word function) as implemented in the example.
This is now bugging me, so any ideas? If I'm correct about this then how would you ensure that short lines actually stay at end of file?
EDIT: I noticed the square brackets around '\377'. This messed up sort() so I removed those brackets and output started working.
else:                         # Short lines to end
    lines[n] = (['\377'], lines[n])
print type(lines[n][0])
>>> <type 'list'>
I accepted nosklo's answer for good clarification about the meaning of '\377' and for his improved algorithm. Many thanks for the other answers also!
If curious, I used 2 MB sample file which took 0.95 secs with the custom sort and 0.09 with the Schwartzian sort while creating identical output files. It works!
Not directly related to the question, but note that in recent versions of python (since 2.3 or 2.4 I think), the transform and untransform can be performed automatically using the key argument to sort() or sorted(). eg:
def key_func(line):
    lst = string.split(line)
    if len(lst) >= 4:
        return lst[4]
    else:
        return '\377'
lines.sort(key=key_func)
I don't know what is the question, so I'll try to clarify things in a general way.
This algorithm sorts lines by taking the 5th field (lst[4]) and placing it in front of each line. The built-in sort() then uses this field to sort, and afterwards the original line is restored.
Lines that are empty or shorter than 5 fields fall into the else part of this structure:
if len(lst) >= 4:             # Tuple w/ sort info first
    lines[n] = (lst[4], lines[n])
else:                         # Short lines to end
    lines[n] = (['\377'], lines[n])
It puts ['\377'] into the first position of the tuple to sort. The algorithm does that in the hope that '\377' (octal for character 255, the highest single-byte value, just beyond the ASCII table) will be bigger than any string found in the 5th field, so the original line should go to the bottom when sorting.
I hope that clarifies the question. If not, perhaps you should indicate exactly what it is that you want to know.
A better, generic version of the same algorithm:
def sort_by_field(list_of_str, field_number, separator=' ', defaultvalue='\xFF'):
    # decorates each value:
    for i, line in enumerate(list_of_str):
        fields = line.split(separator)
        try:
            # places original line as second item:
            list_of_str[i] = (fields[field_number], line)
        except IndexError:
            list_of_str[i] = (defaultvalue, line)
    list_of_str.sort()  # sorts list, in place
    # undecorates values:
    for i, group in enumerate(list_of_str):
        list_of_str[i] = group[1]  # the second item is original line
The algorithm you provided is equivalent to this one.
An empty line won't pass the test
if len(lst) >= 4:
so it will have ['\377'] as its sort key, not the 5th column of your data, which is lst[4] ( lst[0] is the first column).
Well, it will sort short lines almost at the end, but not quite always.
Actually, both the "naive" and the schwartzian version are flawed (in different ways). Nosklo and wbg already explained the algorithm, and you probably learn more if you try to find the error in the schwartzian version yourself, therefore I will give you only a hint for now:
Long lines that contain certain text
in the fourth column will sort later
than short lines.
Add a comment if you need more help.
Although the use of the Schwartzian transform is pretty outdated in Python, it is worth mentioning that you could have written the code this way to avoid the possibility of a line whose fifth field starts with \377 being sorted into the wrong place:
for n in range(len(lines)):
    lst = lines[n].split()
    if len(lst)>4:
        lines[n] = ((0, lst[4]), lines[n])
    else:
        lines[n] = ((1,), lines[n])
Since tuples are compared elementwise, the tuples starting with 1 will always be sorted to the bottom.
Also note that the test should be len(lst) > 4 instead of >= 4, because lst[4] only exists when the list has at least five items.
The same logic applies when using the modern equivalent AKA the key= function
def key_func(line):
    lst = line.split()
    if len(lst)>4:
        return 0, lst[4]
    else:
        return 1,
lines.sort(key=key_func)
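A quick way to see the effect is to sort a few made-up lines, including one whose fifth field starts with \377 and two that are too short. This is only an illustrative sketch in Python 3 with invented sample data; key_func is repeated so the snippet runs on its own.

```python
def key_func(line):
    fields = line.split()
    if len(fields) > 4:
        return (0, fields[4])  # normal lines: ordered by the fifth field
    return (1,)                # short or empty lines: always after every (0, ...) key

lines = [
    "short line\n",
    "1 2 3 4 \377zzz tail\n",   # fifth field starts with character 255
    "\n",
    "1 2 3 4 100034 tail\n",
    "1 2 3 4 100030 tail\n",
]
lines.sort(key=key_func)
for line in lines:
    print(repr(line))
```

The two short lines end up last and keep their original relative order, because Python's sort is stable and both have the same key.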