I'm trying to make a simple web scraper that will send me an e-mail about deals posted on a website's page. I am using BeautifulSoup to scrape the info into a list called "list". I can get the output to look the way I want using print statements, but when I try to use the same loops to append the strings to a list, I get the following error:
> ----- Post with most thanks ------ Traceback (most recent call last):
> ----- Trending Hot Deals ------ File "C:/Users/Geoff/PycharmProjects/web_scraping/Historian_file.py", line
> 45, in <module>
> ----- Popular Threads ------
> print "\n".join(msg)
> ----- New Posts ------ TypeError: sequence item 0: expected string, NoneType found
>
> Process finished with exit code 1
Here is the code. The commented-out parts don't work; the print statements do.
def title(number):
    if number == 1:
        print "----- Post with most thanks ------"
    elif number == 2:
        print "----- Trending Hot Deals ------"
    elif number == 3:
        print "----- Popular Threads ------"
    else:
        print "----- New Posts ------"

msg = []
x = 1
for i in list:
    print title(x)
    #msg.append(title(x))
    x = x+1
    for j in i:
        l = j.encode_contents()
        print l
        #msg.append(l)
#print "\n".join(msg)
I appreciate any help on this.
Thanks
Change the print statements to return statements in the title function.
def title(number):
    if number == 1:
        return "----- Post with most thanks ------"
    elif number == 2:
        return "----- Trending Hot Deals ------"
    elif number == 3:
        return "----- Popular Threads ------"
    else:
        return "----- New Posts ------"
Remember that a function without a return statement always returns None.
Moved to an answer, because people are shorting you on valid information.
Yes you need to return data from the function, rather than printing it. So change print to return, and you will be set (as long as you return strings, or sanitize your data)
This is a good lesson on debugging code. Your stack trace states the issue, but your title ignores it, as do some of the other answers. .join() expects string types, so that is where this is causing you an issue.
To debug strange issues, you will want to pay more attention to the stacktrace, which was a little cluttered by the prints (don't worry - all of us have ignored this data before).
None is a valid type that can be in a list. As a result, calling mylist.append(Foo()) when Foo returns nothing, appends None to your list. It's completely valid.
Your actual issue, however, is when you try to call .join(ListWithNotStringsInIt). Read that as: I gave .join() a list of items, and at least one of them was not a string; .join() will not convert items to strings for you (you have to call str() yourself).
How should you solve that to avoid issues in the future? Sanitize your data.
List comprehension is a pretty good way to do this, though - it should be completely unnecessary if you are handling your data responsibly:
'\n'.join([str(x) for x in my_list])
Python expects that you as a programmer are wise about how you can use it. As a result, you get lots of rope to hang yourself with (such as seeing errors like this).
List comprehension should not be required here, and if you are responsible at using your list correctly, it is completely unnecessary. But it's a way to check what you are getting, especially when there are exceptions thrown and you're debugging the issue.
Your title function doesn't return anything, so when you try to append the result of calling the function, you're not actually appending the result from title. Instead, you are appending None. To fix this, you need to return them, instead of simply printing them.
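To tie the answers above together, here is a minimal sketch of the whole loop once title returns its string. It is written for Python 3, and the `sections` list is a hypothetical stand-in for the scraped data (plain strings instead of BeautifulSoup tags), so treat it as an illustration rather than a drop-in replacement.

```python
def title(number):
    """Return the section heading for a 1-based section number."""
    headings = {
        1: "----- Post with most thanks ------",
        2: "----- Trending Hot Deals ------",
        3: "----- Popular Threads ------",
    }
    return headings.get(number, "----- New Posts ------")

# Hypothetical stand-in for the scraped data: one list of strings per section.
sections = [["deal A", "deal B"], ["deal C"], ["thread D"], ["post E"]]

msg = []
for number, section in enumerate(sections, start=1):
    msg.append(title(number))  # title() returns a str, so no None ends up in msg
    msg.extend(section)

body = "\n".join(msg)
print(body)
```

Because every item in msg is a string, the join succeeds and body can be passed on as the e-mail text.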
So I have been struggling with this issue for what seems like forever now (I'm pretty new to Python). I am using Python 3.7 (need it to be 3.7 due to variations in the versions of packages I am using for the project) to develop an AI chatbot system that can converse with you based on your text input. The program reads the contents of a series of .yml files when it starts. In one of the .yml files I am developing a syntax for when the first 5 characters match a ^###^ pattern, it will instead execute the code and return the result of that execution rather than just output text back to the user. For example:
Normal Conversation:
- - What is AI?
- Artificial Intelligence is the branch of engineering and science devoted to constructing machines that think.
Service/Code-based conversation:
- - Say hello to me
- ^###^print("HELLO")
The idea is that when you ask it to say hello to you, the ^###^print("HELLO") string will be retrieved from the .yml file and the first 5 characters of the response will be removed. The rest will be sent to a separate function in the Python code, which will run the code and return the result, giving the nice, clean result of HELLO to the user. I realize that this may be a bit hard to follow, but I will straighten up my code and condense everything once I have this whole error resolved. As a side note: Oracle is just what I am calling the project. I'm not trying to weave Java into this whole mess.
THE PROBLEM is that it does not store the result of the code being run/executed/evaluated into the variable like it should.
My code:
def executecode(input):
    print("The code to be executed is: ",input)
    #note: the input may occasionally have single quotes and/or double quotes in the input string
    result = eval("{}".format(input))
    print ("The result of the code eval: ", result)
    test = eval("2+2")
    test
    print(test)
    return result

#app.route("/get")
def get_bot_response():
    userText = request.args.get('msg')
    print("Oracle INTERPRETED input: ", userText)
    ChatbotResponse = str(english_bot.get_response(userText))
    print("CHATBOT RESPONSE VARIABLE: ", ChatbotResponse)
    #The interpreted string was a request due to the ^###^ pattern in front of the response in the custom .yml file
    if ChatbotResponse[:5] == '^###^':
        print("---SERVICE REQUEST---")
        print(executecode(ChatbotResponse[5:]))
        interpreter_response = executecode(ChatbotResponse[5:])
        print("Oracle RESPONDED with: ", interpreter_response)
    else:
        print("Oracle RESPONDED with: ", ChatbotResponse)
    return ChatbotResponse
When I run this code, this is the output:
Oracle INTERPRETED input: How much RAM do you have?
CHATBOT RESPONSE VARIABLE: ^###^print("HELLO")
---SERVICE REQUEST---
The code to be executed is: print("HELLO")
HELLO
The result of the code eval: None
4
None
The code to be executed is: print("HELLO")
HELLO
The result of the code eval: None
4
Oracle RESPONDED with: None
Output on the website interface
Essentially, need it to say HELLO for the "The result of the code eval:" output. This should get it to where the chatbot responds with HELLO in the web interface, which is the end goal here. It seems as if it IS executing the code due to the HELLO's after the "The code to be executed is:" output text. It's just not storing it into a variable like I need it to.
I have tried eval, exec, ast.literal_eval(), converting the input to string with str(), changing up the single and double quotes, putting \ before pairs of quotes, and a few other things. Whenever I get it to where the program interprets "print("HELLO")" when it executes the code, it complains about the syntax. Also, from several days of looking online I have figured out that exec and eval aren't generally favored due to a bunch of issues, however I genuinely do not care about that at the moment because I am trying to make something that works before I make something that is good and works. I have a feeling the problem is something small and stupid like it always is, but I have no idea what it could be. :(
I used these 2 resources as the foundation for the whole chatbot project:
Text Guide
Youtube Guide
Also, I am sorry for the rather lengthy and descriptive question. It's rare that I have to ask a question of my own on stackoverflow because if I have a question, it usually already has a good answer. It feels like I've tried everything at this point. If you have a better suggestion of how to do this whole system or you think I should try approaching this another way, I'm open to ideas.
Thank you for any/all help. It is very much appreciated! :)
The issue is that python's print() doesn't have a return value, meaning it will always return None. eval simply evaluates some expression, and returns back the return value from that expression. Since print() returns None, an eval of some print statement will also return None.
>>> from_print = print('Hello')
Hello
>>> from_eval = eval("print('Hello')")
Hello
>>> from_print is from_eval is None
True
What you need is an I/O stream manager! Here is a possible solution that captures any output written to stdout or stderr and returns it if the expression evaluates to None.
from contextlib import redirect_stdout, redirect_stderr
from io import StringIO

# NOTE: I use the arg name `code` since `input` is a python builtin
def executecodehelper(code):
    # Capture all potential output from the code
    stdout_io = StringIO()
    stderr_io = StringIO()
    with redirect_stdout(stdout_io), redirect_stderr(stderr_io):
        # If `code` is already a string, this should work just fine without the need for formatting.
        result = eval(code)
    return result, stdout_io.getvalue(), stderr_io.getvalue()

def executecode(code):
    result, std_out, std_err = executecodehelper(code)
    if result is None:
        # This code didn't return anything. Maybe it printed something?
        if std_out:
            return std_out.rstrip() # Deal with trailing whitespace
        elif std_err:
            return std_err.rstrip()
        else:
            # Nothing was printed AND the return value is None!
            return None
    else:
        return result
As a final note, this approach is heavily linked to eval, since eval can only evaluate a single expression. If you want to extend your bot to multi-line statements, you will need to use exec, which changes the logic. Here's a great resource detailing the differences between eval and exec: What's the difference between eval, exec, and compile?
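For the multi-statement case, a minimal sketch of the exec variant might look like the following. The function name `run_statements` and the sample snippet are made up for illustration; since exec always returns None, the sketch relies entirely on the captured stdout.

```python
from contextlib import redirect_stdout
from io import StringIO

def run_statements(code):
    """Run one or more statements with exec and return whatever they printed."""
    buffer = StringIO()
    namespace = {}  # separate globals so the snippet does not see or change our names
    with redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue().rstrip()

# Hypothetical two-line snippet, as it might be stored after the ^###^ marker.
snippet = 'greeting = "HELLO"\nprint(greeting)'
print(run_statements(snippet))
```

Unlike the eval version, there is no return value to fall back on, so a snippet that prints nothing yields an empty string.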
One simple approach is to create a new list and append each updated value of the variable to it. For example, suppose you have a variable named myVar that stores the values (or even the questions; it doesn't matter).
1- First, declare a new list in your code:
myList = []
2- Whenever you need to answer with or display the value held in myVar, append it to the list:
myList.append(myVar)
This works when the values are produced one at a time, as with a generator. If instead the values are already known, change the second step to assign them by index (the list must already contain those positions, otherwise index assignment raises an IndexError):
myList[0]='The first answer of the first question'
myList[1]='The second answer of the second question'
All the values will then be stored in your list. You can also do this in other ways; for example, a loop is much better if you have multiple values or answers.
Could you help me understand why nothing gets printed?
list = [1, 2,3,4, 5, 7, 9]
def test(list):
    # Using for loop
    for i in list:
        d = i % 2
        if d == 0:
            print('odd')
            save = i + " it's odd"
        else:
            print('even')
            save= i + " it's even"
        return save
    print(save)
This should work for you; let me know if it doesn't. Here are a few pointers on your code above:
1. In the lines save = i + " it's odd" and save= i + " it's even" you are trying to add an int and a string. Python does not allow this and will raise a TypeError. But you can use the .format method to insert values into a string. Here's a link with some examples: https://www.geeksforgeeks.org/python-format-function/
2. The return will end your function. So the way you have it, the for loop only ever gets to the first item in the list (index 0). For your purposes I would just go ahead and ditch the return call.
3. You call your list "list", which is intuitive, especially if you are just starting out. But Python has built-in functions that you can call, and one of them is list. It is a good idea to think of these as "reserved words" and not use them as variable names. Try something like "lis" or "my_list" instead. Here is a link to some built-in Python functions that you will see a lot: https://docs.python.org/3/library/functions.html
4. Lastly, to get a function to run you must call it. You can see in the code below, at the very bottom, that I have test(lis). This tells Python to go ahead and run the function with the input we specify (in this case, lis).
lis = [1, 2,3,4, 5, 7, 9]
def test(list_):
    # edit 'def' requires tab indentation
    # Using for loop
    for i in list_:
        if (i % 2) != 0:
            save = ("{0} it's odd".format(i))
        else:
            save = ("{0} it's even".format(i))
        print(save)
test(lis)
Brack, Thanks for your patience.
1. That has been said before, but, believe it or not, Spyder doesn't complain. I will use .format as described on geeksforgeeks.
2. I understood your explanation. I tried to do it this way because I was following a piece of code in order to make some changes to it. To do that, I was trying to understand it first.
3. Got it.
4. That's another thing I can't see in that code (it works, because I use it).
The code I was trying to modify is this: https://gist.github.com/ericrobskyhuntley/0c293113aa75a254237c143e0cf962fa
What I want is to include a way to save the processed records whenever the count reaches, for example, a multiple of 500.
That would avoid losing the work already done if something goes wrong (as it often did).
Once again, thank you!
I just wanted to let you know that the answer you responded to is actually mine, not Moquin's. Their name shows up on there too because they edited a spelling error. Also, I am creating another answer here to respond to you because I am not yet able to respond in the comment section of other people's questions. (That is reserved for individuals who have a reputation of 50 or higher on this site.) If you liked my last answer, please consider accepting it as the answer by pressing the check mark.
To answer your other question regarding preserving your work, I would consider writing your output to a list. Before, we just printed it out, but if you write it to a list you can access it later in your code. Do so like this:
lis = [1, 2,3,4, 5, 7, 9]
empty_lis = []
def test(list_):
    # edit 'def' requires tab indentation
    # Using for loop
    for i in list_:
        if (i % 2) != 0:
            save = ("{0} it's odd".format(i))
            empty_lis.append(save)  # append only after save has been assigned
        else:
            save = ("{0} it's even".format(i))
            empty_lis.append(save)
    print(empty_lis)
test(lis)
Additionally, you could consider creating a log file to keep track of the output and of where your script breaks. It will write to a .log file in whatever path you specify.
import logging
# create log
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s:%(levelname)s:%(message)s")
file_handler = logging.FileHandler(r'{insert path here}testlog.log')  # put the file path in the curly
# braces and name the log whatever you want. Just make sure it ends in .log
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
lis = [1, 2,3,4, 6, 7, 9]
empty_lis = []
def test(list_):
    # edit 'def' requires tab indentation
    # Using for loop
    for i in list_:
        try:
            if (i % 2) != 0:
                save = ("{0} it's odd".format(i))
                empty_lis.append(save)
            else:
                save = ("{0} it's even".format(i))
                empty_lis.append(save)
            logger.info(" my list is{0}".format(empty_lis)) # this writes to your log
            print(empty_lis)
        except Exception:
            # logger.exception also records the traceback in the log file
            logger.exception("Did not work. See logs for details.")
test(lis)
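For the "save every 500 records" part of the question, one possible sketch is below. The function names, the JSON file format, and the temp-directory path are all assumptions made for illustration; the odd/even labelling simply stands in for whatever real processing the script does.

```python
import json
import os
import tempfile

def save_checkpoint(results, checkpoint_path):
    """Write results to a temporary file, then swap it in for the old checkpoint."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(results, handle)
    # os.replace swaps the file in one step, so a crash mid-write
    # leaves the previous checkpoint untouched.
    os.replace(tmp_path, checkpoint_path)

def process_with_checkpoints(records, checkpoint_path, every=500):
    """Process records, saving everything done so far after each `every` items."""
    results = []
    for count, record in enumerate(records, start=1):
        label = "odd" if record % 2 else "even"
        results.append("{0} it's {1}".format(record, label))
        if count % every == 0:
            save_checkpoint(results, checkpoint_path)
    save_checkpoint(results, checkpoint_path)  # final save covers the remainder
    return results

# Hypothetical location for the demo checkpoint file.
checkpoint_file = os.path.join(tempfile.gettempdir(), "checkpoint_demo.json")
processed = process_with_checkpoints(range(1, 1201), checkpoint_file, every=500)
print(len(processed))
```

If the script dies partway through, the checkpoint file still holds the results up to the last multiple of 500, which can be loaded with json.load before resuming.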
I'd like to make a program that makes offline copies of math questions from Khan Academy. I have a huge 21.6MB text file that contains data on all of their exercises, but I have no idea how to start analyzing it, much less start pulling the questions from it.
Here is a pastebin containing a sample of the JSON data. If you want to see all of it, you can find it here. Warning for long load time.
I've never used JSON before, but I wrote up a quick Python script to try to load up individual "sub-blocks" (or equivalent, correct term) of data.
import sys
import json
exercises = open("exercises.txt", "r+b")
byte = 0
frontbracket = 0
backbracket = 0
while byte < 1000: #while byte < character we want to read up to
    #keep at 1000 for testing purposes
    char = exercises.read(1)
    sys.stdout.write(char)
    #Here we decide what to do based on what char we have
    if str(char) == "{":
        frontbracket = byte
        while True:
            char = exercises.read(1)
            if str(char)=="}":
                backbracket=byte
                break
        exercises.seek(frontbracket)
        block = exercises.read(backbracket-frontbracket)
        print "Block is " + str(backbracket-frontbracket) + " bytes long"
        jsonblock = json.loads(block)
        sys.stdout.write(block)
        print jsonblock["translated_display_name"]
        print "\nENDBLOCK\n"
    byte = byte + 1
Ok, the repeated pattern appears to be this: http://pastebin.com/4nSnLEFZ
To get an idea of the structure of the response, you can use JSONlint to copy/paste portions of your string and 'validate'. Even if the portion you copied is not valid, it will still format it into something you can actually read.
First, I have used the requests library to pull the JSON for you. It's a super-simple library when you're dealing with things like this. The API is slow to respond because it seems you're pulling everything, but it should work fine.
Once you get a response from the API, you can convert that directly to python objects using .json(). What you have is essentially a mixture of nested lists and dictionaries that you can iterate through and pull specific details. In my example below, my_list2 has to use a try/except structure because it would seem that some of the entries do not have two items in the list under translated_problem_types. In that case, it will just put 'None' instead. You might have to use trial and error for such things.
Finally, since you haven't used JSON before, it's also worth noting that it can behave like a dictionary itself; you are not guaranteed the order in which you receive details. However, in this case, it seems the outermost structure is a list, so in theory it's possible that there is a consistent order but don't rely on it - we don't know how the list is constructed.
import requests
api_call = requests.get('https://www.khanacademy.org/api/v1/exercises')
json_response = api_call.json()
# Assume we first want to list "author name" with "author key"
# This should loop through the repeated pattern in the pastebin
# access items as a dictionary
my_list1 = []
for item in json_response:
    my_list1.append([item['author_name'], item['author_key']])
print my_list1[0:5]
# Now let's assume we want the 'sha' of the SECOND entry in translated_problem_types
# to also be listed with author name
my_list2 = []
for item in json_response:
    try:
        the_second_entry = item['translated_problem_types'][0]['items'][1]['sha']
    except IndexError:
        the_second_entry = 'None'
    my_list2.append([item['author_name'], item['author_key'], the_second_entry])
print my_list2[0:5]
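The same idea can be tried offline, without calling the API. The sketch below is Python 3 and uses a tiny made-up sample (`sample_text`) that only imitates the key names used above; the real data has many more fields. It also catches KeyError in addition to IndexError, on the assumption that some entries may lack a key entirely.

```python
import json

# Hypothetical two-entry sample shaped like the repeated pattern described above.
sample_text = """
[
  {"author_name": "Ada", "author_key": "k1",
   "translated_problem_types": [{"items": [{"sha": "aaa"}, {"sha": "bbb"}]}]},
  {"author_name": "Bob", "author_key": "k2",
   "translated_problem_types": [{"items": [{"sha": "ccc"}]}]}
]
"""

def second_sha(item):
    """Return the 'sha' of the second item of the first problem type, or None."""
    try:
        return item["translated_problem_types"][0]["items"][1]["sha"]
    except (KeyError, IndexError):
        return None

exercises = json.loads(sample_text)
rows = [(item["author_name"], item["author_key"], second_sha(item)) for item in exercises]
print(rows)
```

Returning the None object rather than the string 'None' makes it easier to filter out the missing entries later.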
I am trying to write a Go/No-go task in PsychoPy. Even though I managed to write a script that works, a few things still cause trouble. I present pictures of emotional stimuli (im_n, neutral; im_a, emotional), and people should respond by pressing "space" only when a neutral picture is presented. When I run the code below, everything works well unless I press no key or the wrong key. So my question is: how do I have to write the code so that I don't get kicked out of the run when I don't answer? Thanks everybody!
for im in imlist: # Loop for each Image in the List
    picShown = bitmap.setImage(im)
    bitmap.draw()
    win.flip()
    rt_clock.reset()
    resp = False
    while rt_clock.getTime() < timelimit: # timelimit is defined 2 s
        if not resp:
            resp = event.getKeys(keyList=['space'])
            rt = rt_clock.getTime()
    if im in im_n: # im_n is an extra list of one kind of images
        correctResp = 'space'
        if resp[0]==correctResp:
            corrFb.draw() # is defined as a "green O"
        else:
            incorrFb.draw() # is defined as a "red X"
    win.flip()
    core.wait(ISI)
I get the error message:
if resp[0]==correctResp:
IndexError: list index out of range
My guess is that you get this error message at the if resp[0]==correctResp: line:
IndexError: list index out of range
Is that true? If yes, it is simply because event.getKeys() returns an empty list [] if no responses were collected. And doing [][0] will give you the above error because there's no first element (index zero) just like [1,2,3,4][1000] will give you the same error. Note that even if you press a lot of keys and none of them are in the keyList, getKeys will return an empty list because it ignores everything but the contents of the keyList (unless you set keyList=None, in which case it accepts all keys).
There are a few simple ways out of this. Firstly, you can simply check whether resp is empty, give a "fail" score if it is, and only check for correctness if it is not. A more general solution, which would work with many response keys and scoring criteria, is to do if correctResp in resp and then score as a success if yes. This comparison will work with an empty list as well, in which case it always returns False, as empty lists by definition can't contain anything.
But in your particular case, you only have one response option so it is even simpler! Since you've "filtered" responses using the keyList, you KNOW that if resp is [], the subject answered "no-go" and conversely if resp is not [], he/she answered "go". So:
if im in im_n: # im_n is an extra list of one kind of images
    if resp: # if subject answered 'go'
        corrFb.draw() # is defined as a "green O"
    else:
        incorrFb.draw() # is defined as a "red X"
Actually, I suspect that you also want to give feedback in trials without neutral images. In that case, define correct as bool(resp) is (im in im_n):
if bool(resp) is (im in im_n): # if answer correspond to trial type
    corrFb.draw() # is defined as a "green O"
else:
    incorrFb.draw() # is defined as a "red X"
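The scoring rule can be checked without PsychoPy by isolating it in a small function. This is only a sketch: `is_correct` is a made-up helper name, and the lists below imitate what getKeys would hand back (a list with 'space' in it, or an empty list).

```python
def is_correct(resp, is_go_trial):
    """Score one go/no-go trial.

    resp is the list of accepted key presses for the trial (empty if none);
    is_go_trial is True when the image calls for a 'space' press.
    """
    return bool(resp) == is_go_trial

# All four combinations of response and trial type.
for resp, is_go in [(["space"], True), ([], True), (["space"], False), ([], False)]:
    print(resp, is_go, is_correct(resp, is_go))
```

An empty list is falsy, so a missed response is handled the same way as any other answer and nothing is ever indexed.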
I am using some code I found here on SO to google search a set of strings and return the "expected" amount of results. Here is that code:
for a in months:
    for b in range(1, daysInMonth[a] + 1):
        #Code
        if not myString:
            googleStats.append(None)
        else:
            try:
                query = urllib.urlencode({'q': myString})
                url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
                search_response = urllib.urlopen(url)
                search_results = search_response.read()
                results = json.loads(search_results)
                data = results['responseData']
                googleStats.append(data['cursor']['estimatedResultCount'])
            except TypeError:
                googleStats.append(None)
for x in range(0, len(googleStats)):
    if googleStats[x] != None:
        finalGoogleStats.append(googleStats[x])
There are two problems, which may be related. When I return the len(finalGoogleStats), it's different every time. One time it's 37, then it's 12. However, it should be more like 240.
This is TypeError I receive when I take out the try/except:
TypeError: 'NoneType' object has no attribute '__getitem__'
which occurs on line
googleStats.append(data['cursor']['estimatedResultCount'])
So, I just can't figure out why the number of Nones in googleStats changes every time and it's never as low as it should be. If anyone has any ideas, I'd love to hear them, thanks!
UPDATE
When I try to print out data for everything I'm searching, I get a ton of Nones and very, very few actual JSON dictionaries. The dictionaries I do get are spread out across all the searches; I don't see a pattern in what is a None and what isn't. So, the problem looks like it has more to do with the Google API than anything else.
First, I'd say remove your try..except clause and see where exactly the problem is. Then as a general good practice, when you try to access layers of dictionary elements, use .get() method instead for better control.
As a demonstration of your possible TypeError, here is my educated guess:
>>> a = {}
>>> a['lol'] = None
>>> a['lol']['teemo']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object has no attribute '__getitem__'
>>>
There are ways to use .get(), for a simple demonstration:
>>> a = {}
>>> b = a.get('lol') # will return None
>>> if type(b) is dict: # determine type
...     print b.get('teemo') # same technique if b is indeed of type dict
...
>>>
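Applied to the search response, the .get() approach could look like the sketch below (Python 3). The two sample dictionaries are hypothetical; they only reuse the key names that appear in this thread (responseData, cursor, estimatedResultCount, responseStatus).

```python
def estimated_count(results):
    """Return the estimated result count, or None if any layer is missing or None."""
    data = results.get("responseData") or {}   # covers a missing key and an explicit None
    cursor = data.get("cursor") or {}
    return cursor.get("estimatedResultCount")

# Hypothetical responses: one successful, one where responseData came back as None.
ok = {"responseData": {"cursor": {"estimatedResultCount": "1200"}}, "responseStatus": 200}
blocked = {"responseData": None, "responseStatus": 403}

print(estimated_count(ok))
print(estimated_count(blocked))
```

With this, no try/except around the lookup is needed, and printing responseStatus for the None cases shows why a particular search failed.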
The answer is what I was fearing for a while, but thanks to everyone who tried to help; I upvoted you if anything was useful.
So, Google seems to randomly freak out that I'm searching so much stuff. Here's the error they give me:
Suspected Terms of Service Abuse ...... responseStatus:403
So, I guess they put limits on how much I can search with them. What is still strange, though, is that it doesn't happen all the time, I still get sporadic successful searches within the sea of errors. That is still a mystery...
By default the Google API returns the smallest number of results. If you want to increase the number of results returned, add another parameter, 'rsz=8', to your URL (by default rsz=1, hence the small result).
So your new URL becomes:
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=8&%s' % query
see detailed documentation here: https://developers.google.com/web-search/docs/reference#_class_GSearch