How can I get my files to be opened? - python

Hi there, I'm working on a function that merges two separate .txt files and outputs a personalized letter. If I include the text directly in the function module, it works perfectly. But when I open the files and pass them to the function, I get this
error message:
Traceback (most recent call last):
File "/Users/nathandavis9752/CP104/davi0030_a10/src/q2_function.py", line 25, in
data = cleanData(q2)
File "/Users/nathandavis9752/CP104/davi0030_a10/src/q2_function.py", line 17, in cleanData
return [item.strip().split('\n\n') for item in query.split('--')]
AttributeError: 'file' object has no attribute 'split'
code:
letter = open('letter.txt', 'r')
q2 = open('q2.txt', 'r')
def cleanData(query):
    return [item.strip().split('\n\n') for item in query.split('--')]
def writeLetter(template, variables, replacements):
    # replace ith variable with ith replacement variable
    for i in range(len(variables)):
        template = template.replace(variables[i], replacements[i])
    return template
data = cleanData(q2)
print (data)
variables = ['[fname]', '[lname]', '[street]', '[city]']
letters = [writeLetter(letter, variables, person) for person in data]
for i in letters:
    print (i)
q2.txt file:
Michael
dawn
lock hart ln
Dublin
--
kate
Nan
webster st
king city
--
raj
zakjg
late Road
Toronto
--
dave
porter
Rock Ave
nobleton
letter.txt file:
[fname] [lname]
[street]
[city]
Dear [fname]:
As a fellow citizen of [city], you and all your neighbours
on [street] are invited to a celebration this Saturday at
[city]'s Central Park. Bring beer and food!

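You are calling split on a file object rather than on a string. Read the file's contents first with .read():
def cleanData(query):
    return [item.strip().split('\n\n') for item in query.read().split('--')]
The same applies to letter: it is passed to writeLetter as a file object, and a file object has no replace method either, so read it into a string before using it as the template.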
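As a minimal sketch of reading both files into strings before processing them: the file contents below are shortened, hypothetical stand-ins for q2.txt and letter.txt, written to a temporary directory so the example runs on its own. It also assumes one field per line, so it splits each record on '\n' rather than '\n\n'; adjust that to match the real file layout. The helper read_text is not part of the original code.

```python
import os
import tempfile

def clean_data(text):
    """Split the text into records on '--', then each record into its lines."""
    return [record.strip().split('\n') for record in text.split('--')]

def write_letter(template, variables, replacements):
    """Replace the i-th placeholder with the i-th replacement value."""
    for variable, replacement in zip(variables, replacements):
        template = template.replace(variable, replacement)
    return template

def read_text(path):
    """Return the whole file as one string and close it afterwards."""
    with open(path, 'r') as handle:
        return handle.read()

# Hypothetical stand-ins for q2.txt and letter.txt.
with tempfile.TemporaryDirectory() as folder:
    q2_path = os.path.join(folder, 'q2.txt')
    letter_path = os.path.join(folder, 'letter.txt')
    with open(q2_path, 'w') as handle:
        handle.write('Michael\ndawn\nlock hart ln\nDublin\n--\n'
                     'kate\nNan\nwebster st\nking city\n')
    with open(letter_path, 'w') as handle:
        handle.write('Dear [fname] [lname] of [street], [city]')
    data = clean_data(read_text(q2_path))   # strings, not file objects
    template = read_text(letter_path)

variables = ['[fname]', '[lname]', '[street]', '[city]']
letters = [write_letter(template, variables, person) for person in data]
for text in letters:
    print(text)
```

Using with open(...) also closes each file as soon as it has been read, which the module-level open calls in the question never do.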

Related

"IndexError: list index out of range" When creating an automated response bot

I'm creating a chatbot which uses questions from a CSV file and checks similarity using scikit-learn and NLTK. However, I'm getting an error if the same input is entered twice.
This is the main code that takes the user input and outputs an answer to the user:
import random
import string
import nltk
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
data=pd.read_csv('FootballQA.csv')
question=data['Q'].tolist()
answer=data['A'].tolist()
lemmer = nltk.stem.WordNetLemmatizer()
#WordNet is a semantically-oriented dictionary of English included in NLTK.
def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up","hey","how are you")
GREETING_RESPONSES = ["hi", "hey", "hi there", "hello", "I am glad! You are talking to me"]
def greeting(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
GI = ("how are you")
GR = ["i'm fine","good,how can i help you!"]
def greet(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
def responses(user):
    response=''
    question.append(user)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(question)
    val = cosine_similarity(tfidf[-1], tfidf)
    id1=val.argsort()[0][-2]
    flat = val.flatten()
    flat.sort()
    req = flat[-2]
    if(req==0):
        robo_response=response+"I am sorry! I don't understand you"
        return robo_response
    else:
        response = response+answer[id1]
        question.remove(user)
        return response
command=1
while(command):
    v = input("Enter your value: ")
    if(v=="exit"):
        command=0
    else:
        print(responses(str(v)))
When the program runs, it asks the user for input. The problem happens if the same input is entered twice: if I enter "football", it first displays the output I expect, but the second time the program stops and I'm given this error:
Enter your value: scored
Alan shearer holds the goal record in the premier league.
Enter your value: football
I am sorry! I don't understand you
Enter your value: football
Traceback (most recent call last):
File "C:\Users\Chris\Desktop\chatbot_simple\run.py", line 79, in <module>
print(responses(str(v)))
File "C:\Users\Chris\Desktop\chatbot_simple\run.py", line 68, in responses
response = response+answer[id1]
IndexError: list index out of range
The csv:
Q,A
Who has scored the most goals in the premier league?,Alan shearer holds the goal record in the premier league.
Who has the most appearences in the premier league?,Gareth Barry has the most appearences in premier league history.
I've tried deleting the variable after each input but it still somehow remembers it. Does anyone have any ideas?
Thanks
Chris
answer=data['A'].tolist()
and then later on
id1=val.argsort()[0][-2]
response = response+answer[id1]
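So if id1 is not a valid index into answer, you get "list index out of range". In your case id1 >= len(answer). This happens because responses() appends the input to question but only removes it again in the else branch. Every input that is not understood stays in question, so question grows longer than answer, and argsort can then return an index that answer does not have.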
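One way to avoid the mismatch is to never append the user input to the question list at all, so the question and answer lists always stay the same length. The sketch below illustrates only that idea: word_overlap is a hypothetical stand-in for the TF-IDF cosine similarity (a simple word-overlap score, chosen so the example runs without scikit-learn or NLTK), and respond is an illustrative name, not the original function.

```python
def word_overlap(a, b):
    """Hypothetical stand-in for TF-IDF cosine similarity: Jaccard overlap of words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def respond(user, questions, answers):
    """Return the answer for the most similar stored question."""
    # Score the input against the stored questions without mutating the list,
    # so every index into questions is also a valid index into answers.
    scores = [word_overlap(user, q) for q in questions]
    best = max(range(len(scores)), key=scores.__getitem__, default=None)
    if best is None or scores[best] == 0:
        return "I am sorry! I don't understand you"
    return answers[best]

questions = [
    "Who has scored the most goals in the premier league?",
    "Who has the most appearences in the premier league?",
]
answers = [
    "Alan shearer holds the goal record in the premier league.",
    "Gareth Barry has the most appearences in premier league history.",
]

print(respond("scored goals", questions, answers))
print(respond("football", questions, answers))
print(respond("football", questions, answers))  # repeating the input is now harmless
```

If the input does have to be appended for vectorizing, removing it in a try/finally block (or before either return) would keep the lists in step in the same way.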

p_mask=p_mask[span_idx].tolist(),AttributeError: - 'list' object has no attribute 'tolist'

I'm getting the following error when I attempt to pass a question and context to a Transformer pipeline. The abort is actually occurring in the HuggingFace code.
Traceback (most recent call last):
File "pipeline.py", line 66, in <module>
main()
File "pipeline.py", line 63, in main
answer = query(value)
File "pipeline.py", line 45, in query
answer = qa(question=question, context=context)
File "/home/pi/.local/lib/python3.7/site-packages/transformers/pipelines/question_answering.py", line 248, in __call__
return super().__call__(examples[0], **kwargs)
File "/home/pi/.local/lib/python3.7/site-packages/transformers/pipelines/base.py", line 915, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/home/pi/.local/lib/python3.7/site-packages/transformers/pipelines/base.py", line 921, in run_single
model_inputs = self.preprocess(inputs, **preprocess_params)
File "/home/pi/.local/lib/python3.7/site-packages/transformers/pipelines/question_answering.py", line 316, in preprocess
p_mask=p_mask[span_idx].tolist(),
AttributeError: 'list' object has no attribute 'tolist'
I am executing the code on a Raspberry Pi 4 with an armv7l CPU. I am using Transformers version 4.11.2 and PyTorch version 1.7.0a0. The code executes successfully under Windows 10.
Here is the code...
# -*- coding: iso-8859-15 -*-
import os, sys
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification
#import torchaudio
import torch
#from speaking import speak
# globals
context = ""
model = None
qa = pipeline("question-answering") # set up the model pipeline once
queries = { # set up dictionary to hold questions
"Question01": "How old are the Albuquerque volcanoes?",
"Question02": "Where does andesitic compositions exist?",
"Question03": "What is Tephrochronology?",
"Question04": "Do volcanoes exist on other planets?",
"Question05": "What type of volcano is Cerro Verde?",
"Question06": "How old is the oldest rock in Albuqueruqe?",
"Question07": "What does the gradual weathering of the loose cinders create?",
"Question08": "What causes fissure eruption of the Albuquerque Volcanoes?",
"Question09": "What is the name of the composite volcanoe that is 2.5 million years old?",
"Question10": "What are the gas bubble in basalt called?",
"Question11": "What do rocks in the Palomas Volcanic Field consist of?",
"Question12": "What are the volcanic features of the Palomas Volcanic Field?",
"Question13": "Where is the ZUNI-BANDERA FIELD AND MCCARTY'S LAVA FLOW located?",
"Question14": "What surface is fairly common to many pahoehoe flows?",
"Question15": "How far does the The Navajo Dine volcanic field extend?",
"Question16": "How was the Valles Caldera formed?",
"Question17": "What is the largest volcanic field within the Rio Grande rift?",
"Question18": "What geologic feature is located in the central part of Charette Mesa, northwest of Wagon Mound?",
"Question19": "Where did the most recent volcictivity occur in New Mexico?",
"Question20": "what are the floors of the Jornada caves covered with?",
"Question21": "The Albuquerque Volcanoes we see today are are the result of what?",
"Question22": "In what year did the eruption of Mount Pinatubo occur?",
"Question23": "How does volcanic ash form?",
"Question24": "What are volocanic mudflows called?",
"Question25": "What volcano is located about 33 miles east of Raton?"
}
def query(question):
    # Generating an answer to the question in context
    # qa = pipeline("question-answering")
    answer = qa(question=question, context=context)
    # save the model
    #torch.save(qa.model.state_dict(), "C:\\tmp")
    # Print the answer
    print("{0}: ".format(question))
    print(f"Answer: '{answer['answer']}' with score {answer['score']}")
def main():
    global qa, context
    # get the volcano corpus
    with open('volcanic.corpus', encoding="utf8") as file:
        context = file.read().replace('\n', '')
    # process each query
    for value in queries.values():
        answer = query(value)
if __name__ == "__main__":
    main()

rdd.first() does not give an error but rdd.collect() does

I am working in PySpark and have the following code, where I process tweets and build an RDD of (user_id, text) pairs. Below is the code:
"""
# Construct an RDD of (user_id, text) here.
"""
import json
def safe_parse(raw_json):
    try:
        json_object = json.loads(raw_json)
        if 'created_at' in json_object:
            return json_object
        else:
            return;
    except ValueError as error:
        return;
def get_usr_txt (line):
    tmp = safe_parse (line)
    return ((tmp.get('user').get('id_str'),tmp.get('text')));
usr_txt = text_file.map(lambda line: get_usr_txt(line))
print (usr_txt.take(5))
and the output looks okay (as shown below)
[('470520068', "I'm voting 4 #BernieSanders bc he doesn't ride a CAPITALIST PIG adorned w/ #GoldmanSachs $. SYSTEM RIGGED CLASS WAR "), ('2176120173', "RT #TrumpNewMedia: .#realDonaldTrump #America get out & #VoteTrump if you don't #VoteTrump NOTHING will change it's that simple!\n#Trump htt…"), ('145087572', 'RT #Libertea2012: RT TODAY: #Colorado’s leading progressive voices to endorse #BernieSanders! #Denver 11AM - 1PM in MST CO State Capitol…'), ('23047147', '[VID] Liberal Tears Pour After Bernie Supporter Had To Deal With Trump Fans '), ('526506000', 'RT #justinamash: .#tedcruz is the only remaining candidate I trust to take on what he correctly calls the Washington Cartel. ')]
However, as soon as I do
print (usr_txt.count())
I get an error like below
Py4JJavaError Traceback (most recent call last)
<ipython-input-60-9dacaf2d41b5> in <module>()
8 usr_txt = text_file.map(lambda line: get_usr_txt(line))
9 #print (usr_txt.take(5))
---> 10 print (usr_txt.count())
11
/usr/local/spark/python/pyspark/rdd.py in count(self)
1054 3
1055 """
-> 1056 return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
1057
1058 def stats(self):
What am I missing? Is the RDD not created properly, or is there something else? How do I fix it?
safe_parse returns None when the parsed JSON line has no created_at element or when parsing fails. Calling tmp.get('user') on that None is what raises the error in (tmp.get('user').get('id_str'),tmp.get('text')).
The solution is to check for None in get_usr_txt:
def get_usr_txt (line):
    tmp = safe_parse(line)
    if tmp is not None:
        return ((tmp.get('user').get('id_str'),tmp.get('text')));
Now the question is why print (usr_txt.take(5)) showed the result while print (usr_txt.count()) raised the error.
That's because usr_txt.take(5) only had to evaluate enough of the data to return the first five elements, none of which came from a line that parsed to None, whereas count() has to process every line.
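The difference between take and count can be illustrated without Spark. The sketch below is plain Python, not PySpark: a generator stands in for the lazily evaluated RDD, itertools.islice plays the role of take, and the three input lines are hypothetical sample data. It also guards against a record that has no 'user' entry, which the original code does not handle.

```python
import json
from itertools import islice

def safe_parse(raw_json):
    """Return the parsed tweet, or None if the line is invalid or lacks 'created_at'."""
    try:
        obj = json.loads(raw_json)
    except ValueError:
        return None
    if isinstance(obj, dict) and 'created_at' in obj:
        return obj
    return None

def get_usr_txt(line):
    """Return (user_id, text), or None for lines that could not be parsed."""
    tmp = safe_parse(line)
    if tmp is None:
        return None
    user = tmp.get('user') or {}
    return (user.get('id_str'), tmp.get('text'))

# Hypothetical sample input: two valid tweets followed by a malformed line.
lines = [
    '{"created_at": "x", "user": {"id_str": "1"}, "text": "a"}',
    '{"created_at": "y", "user": {"id_str": "2"}, "text": "b"}',
    'not json',
]

# A lazy mapping: nothing is parsed until the generator is consumed.
pairs = (get_usr_txt(line) for line in lines)
first_two = list(islice(pairs, 2))  # like take(2): never reaches the bad line

# Like a filter followed by collect: every line is processed, bad ones dropped.
valid = [pair for pair in map(get_usr_txt, lines) if pair is not None]
print(first_two, len(valid))
```

In PySpark the equivalent of the last step would be to filter out the None elements before counting or collecting.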

Understanding why this python code works randomly

I'm coding a little script that gets metadata from a sound file and creates a string with the desired values. I know I'm doing something wrong but I'm not sure what; it's probably the way I am iterating over the ifs. When I run the code:
import os, mutagen
XPATH= "/home/xavier/Code/autotube/tree/def"
DPATH="/home/xavier/Code/autotube/tree/down"
def get_meta():
    for dirpath, directories,files in os.walk(XPATH):
        for sound_file in files :
            if sound_file.endswith('.flac'):
                from mutagen.flac import FLAC
                metadata = mutagen.flac.Open(os.path.join(dirpath,sound_file))
                for (key, value) in metadata.items():
                    #print (key,value)
                    if key.startswith('date'):
                        date = value
                        print(date[0])
                    if key.startswith('artist'):
                        artist = value
                        #print(artist[0])
                    if key.startswith('album'):
                        album = value
                        #print(album[0])
                    if key.startswith('title'):
                        title = value
                        #print(title[0])
                    build_name(artist,album,title) # UnboundLocalError gets raised here
def build_name(artist,album,title):
    print(artist[0],album[0],title[0])
I get the desired result or an error, randomly :
RESULT :
1967 Ravi Shankar & Yehudi Menuhin West Meets East Raga: Puriya Kalyan
ERROR :
Traceback (most recent call last):
File "<stdin>", line 39, in <module>
File "<stdin>", line 31, in get_meta
build_name(artist,album,title)
UnboundLocalError: local variable 'album' referenced before assignment
If "title" comes before "album" in the metadata, album has not been assigned yet when build_name is called. "album" may also not exist at all.
Since you don't reset album for each track, once a track has defined "album", the next track that doesn't define it will reuse the previous track's value.
Give each variable a blank value for each track (if that's reasonable to you).
Looking at build_name, the values are lists of strings, so the default should be ['']:
for sound_file in files:
    artist = album = title = ['']
However, you will still get blank values if build_name is called before the loop has reached every key in the metadata.
You need to move build_name(artist, album, title) out of the loop:
for (key, value) in metadata.items():
    ... # searching metadata
build_name(artist, album, title)
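Both points (a default for missing tags, and building the name only once per track) can be combined by looking the tags up directly instead of looping over them. This is a minimal sketch under stated assumptions: plain dictionaries stand in for mutagen's FLAC metadata, the keys are assumed to be exactly 'artist', 'album' and 'title' (the original matches prefixes with startswith), and this build_name takes the whole mapping rather than three arguments.

```python
def build_name(metadata):
    """Join artist, album and title, skipping any tag that is absent or empty."""
    parts = []
    for tag in ('artist', 'album', 'title'):
        values = metadata.get(tag) or ['']  # default to [''] when the tag is missing
        parts.append(values[0])
    return ' '.join(part for part in parts if part)

# Hypothetical tag dictionaries standing in for mutagen's FLAC metadata.
complete = {
    'title': ['Raga: Puriya Kalyan'],
    'album': ['West Meets East'],
    'artist': ['Ravi Shankar & Yehudi Menuhin'],
}
no_album = {
    'title': ['Raga: Puriya Kalyan'],
    'artist': ['Ravi Shankar & Yehudi Menuhin'],
}

print(build_name(complete))
print(build_name(no_album))
```

Because each call reads only the dictionary it is given, the order of the keys no longer matters and no value can leak over from a previous track.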

Not iterating through whole dictionary

So basically, I have an API that returns several dictionaries/arrays (http://dev.c0l.in:5984/income_statements/_all_docs).
When getting the financial information for each company from the API (e.g. sector = technology and statement = income), Python is supposed to return 614 technology companies; however, I get this error:
Traceback (most recent call last):
File "C:\Users\samuel\Desktop\Python Project\Mastercopy.py", line 83, in <module>
user_input1()
File "C:\Users\samuel\Desktop\Python Project\Mastercopy.py", line 75, in user_input1
income_statement_fn()
File "C:\Users\samuel\Desktop\Python Project\Mastercopy.py", line 51, in income_statement_fn
if is_response ['sector'] == user_input3:
KeyError: 'sector'
on a random company (usually on one of the 550-600th ones)
Here is the function for income statements
def income_statement_fn():
    user_input3 = raw_input("Which sector would you like to iterate through in Income Statement?: ")
    print 'Starting...'
    for item in income_response['rows']:
        is_url = "http://dev.c0l.in:5984/income_statements/" + item['id']
        is_request = urllib2.urlopen(is_url).read()
        is_response = json.loads(is_request)
        if is_response ['sector'] == user_input3:
            csv.writerow([
                is_response['company']['name'],
                is_response['company']['sales'],
                is_response['company']['opening_stock'],
                is_response['company']['purchases'],
                is_response['company']['closing_stock'],
                is_response['company']['expenses'],
                is_response['company']['interest_payable'],
                is_response['company']['interest_receivable']])
            print 'loading...'
    print 'done!'
    print end - start
Any idea what could be causing this error?
(I don't believe that it is the api itself)
Cheers
Well, on testing the URL you pass to the urlopen call with a random id, I got this:
{"error":"not_found","reason":"missing"}
In that case, your function will raise exactly the error you get. If you want your program to handle the error gracefully and write a "missing" line instead of actual data, you could do this, for instance:
def income_statement_fn():
    user_input3 = raw_input("Which sector would you like to iterate through in Income Statement?: ")
    print 'Starting...'
    for item in income_response['rows']:
        is_url = "http://dev.c0l.in:5984/income_statements/" + item['id']
        is_request = urllib2.urlopen(is_url).read()
        is_response = json.loads(is_request)
        # .get() returns False instead of raising KeyError when 'sector' is absent
        if is_response.get('sector', False) == user_input3:
            csv.writerow([
                is_response['company']['name'],
                is_response['company']['sales'],
                is_response['company']['opening_stock'],
                is_response['company']['purchases'],
                is_response['company']['closing_stock'],
                is_response['company']['expenses'],
                is_response['company']['interest_payable'],
                is_response['company']['interest_receivable']])
            print 'loading...'
        else:
            # note: this branch also runs for documents whose sector simply differs
            csv.writerow(['missing data'])
    print 'done!'
    print end - start
The problem seems to be with the final row of your income_response data
{"id":"_design/auth","key":"_design/auth","value":{"rev":"1-3d8f282ec7c26779194caf1d62114dc7"}}
This does not have a sector value. You need to alter your code to handle this line, for example by ignoring any line where the sector key is not present.
You could easily have debugged this with a few print statements - for example insert
print item['id'], is_response.get('sector', None)
into your code before the part that outputs the CSV.
A KeyError means that the key you tried to use does not exist in the dictionary. When checking for a key, it is much safer to use .get(). So you would replace this line:
if is_response['sector'] == user_input3:
With this:
if is_response.get('sector') == user_input3:
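As a small illustration of skipping documents that have no sector key, here is a sketch in Python 3 (the code above is Python 2). The documents list is hypothetical sample data standing in for the decoded API responses, rows_for_sector is an illustrative helper rather than part of the original script, and only two of the company fields are shown.

```python
def rows_for_sector(documents, sector):
    """Collect CSV rows for documents in the given sector, ignoring the rest."""
    rows = []
    for doc in documents:
        # .get() returns None for design documents and error responses,
        # so they never match and are skipped instead of raising KeyError.
        if doc.get('sector') != sector:
            continue
        company = doc.get('company', {})
        rows.append([company.get('name'), company.get('sales')])
    return rows

# Hypothetical decoded responses: one match, one other sector, two without 'sector'.
documents = [
    {'sector': 'technology', 'company': {'name': 'Acme', 'sales': 100}},
    {'sector': 'retail', 'company': {'name': 'Shop', 'sales': 50}},
    {'error': 'not_found', 'reason': 'missing'},
    {'id': '_design/auth', 'key': '_design/auth'},
]

print(rows_for_sector(documents, 'technology'))
```

Comparing with != and continuing keeps "no sector at all" and "a different sector" on the same path, so neither case writes a spurious row.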
