I have created a Random Wikipedia Article Generator, but I want it to generate an article about a specific topic. For some reason it generates articles, just not about the topic I entered.
Code:
import requests
from bs4 import BeautifulSoup
import webbrowser

while True:
    topic = input("Plz enter a topic you would like to read about: ")
    url = requests.get(f"https://en.wikipedia.org/wiki/Special:Random/{topic}")
    soup = BeautifulSoup(url.content, "html.parser")
    title = soup.find(class_="firstHeading").text
    answer = input(f"The article is: {title} \nWould you like to view this article? (Y/N)\n").upper()
    if answer == "Y":
        url = "https://en.wikipedia.org/wiki/%s" % title
        webbrowser.open(url)
        leave = input("Would you like to read another one? (Y/N)\n").upper()
        if leave == "Y":
            pass
        else:
            break
    elif answer == "N":
        print("Try again!!")
    else:
        print("I don't know what it is")
According to https://en.wikipedia.org/wiki/Wikipedia:Special:Random, the endpoint for a random article within a category is:
https://en.wikipedia.org/wiki/Special:RandomInCategory/
and not:
https://en.wikipedia.org/wiki/Special:Random/
So you should change your URL to:
url = requests.get(f"https://en.wikipedia.org/wiki/Special:RandomInCategory/{topic}")
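For reference, a minimal sketch of that fix in isolation (the category name "Physics" is just an example; requests follows the redirect, so response.url ends up holding the resolved article link):

import requests
from bs4 import BeautifulSoup

# Fetch a random article from a category; assumes the topic matches an
# existing Wikipedia category name, e.g. "Physics".
topic = "Physics"
response = requests.get(f"https://en.wikipedia.org/wiki/Special:RandomInCategory/{topic}")
soup = BeautifulSoup(response.content, "html.parser")
title = soup.find(class_="firstHeading").text
print(title)
# response.url is the final URL after the redirect, so there is no need to
# rebuild the link from the title by hand before opening it in a browser.
print(response.url)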
I'm trying to make this web crawler ask the user for the position of a song in the top charts and then print that song, its position, etc. I've been able to do all of this, but the position is always off by one. For example, if you ask for the 2nd top song of the week, it gives you the third.
import requests
from bs4 import BeautifulSoup

lol = input("Number: ")
if lol == "1":
    url = 'https://www.officialcharts.com/charts/singles-chart/'
    req = requests.get(url)
    soup = BeautifulSoup(req.content, 'html.parser')
    print("")
    print("TYPE THE POSITION")
    user_pos = int(input("Please pick a number between 1 - 100: "))
    if 1 <= user_pos <= 100:
        hit = soup.find_all("div", class_="title-artist")[user_pos - 1]
        title, artist = [hr.text for hr in hit.findAll("a")]
        print("POSITION | TITLE | ARTISTS")
        print(user_pos, "|", title, "|", artist)
    else:
        print("Next time pick a number between 1 and 100")
It's because of the indexing. soup.find_all() returns a list, and list indexing starts from 0, so when you want the very first element you have to call soup.find_all("div", class_="title-artist")[user_pos - 1].
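In other words, a 1-based position from the user has to be mapped onto a 0-based list index. A tiny sketch of just that mapping (the chart list here is made up for illustration):

# Hypothetical chart list just to illustrate the off-by-one mapping.
chart = ["first song", "second song", "third song"]
user_pos = 2                 # the user asks for the 2nd song
print(chart[user_pos])       # wrong: prints "third song"
print(chart[user_pos - 1])   # right: prints "second song"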
Imagine a program that lists out all the redirects that a link has (e.g. a Grabify link).
Now let's say you wanted to print "yes" if the link redirects more than once. How can I do that?
This is the code that I have:
import requests
import time

def nowdotheyes():
    time.sleep(1)
    print("...")
    time.sleep(3)
    responses = requests.get(link)
    for responses in responses.history:
        print(responses.url)
    if (responses.history > 1):
        print("yes")
Here is the finalized code. I provided a boolean and an int value for you to use in your program if needed. The iCarly domain is the only one I know off hand that redirects.
import requests

def check(link):
    redirects = None
    responses = requests.get(link)
    print(responses.url)
    if len(responses.history) >= 1:
        redirects = True
        print("Redirects More Than One Time")
    elif len(responses.history) <= 1:
        redirects = False
        print("Does Not Redirect More Than one time")
    total = int(len(responses.history))
    print(f"Total Number of Redirects {total}")  # Just so you have the total number of redirects if you need it
    print(f"Redirect Status: {redirects}")  # A boolean that returns if it redirects

check(link="https://www.icarly.com")
Thanks to @Kim Kakan Andersson, I realized the problem. I should have removed the for loop and used len(). Thanks everyone.
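For reference, a minimal sketch of that len()-based check on response.history (the URL is only an example that happens to redirect from http to https):

import requests

# requests follows redirects by default; every hop is recorded in response.history.
response = requests.get("http://github.com")    # example: http -> https redirect
print([r.url for r in response.history])        # each intermediate URL, in order
if len(response.history) > 1:
    print("yes")                                # more than one redirect in the chain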
I am writing a program that asks the user for a baseball player and a card number, and it will return the card's value. I am using BeautifulSoup to web scrape https://mavin.io/search?q= and find the value referenced in the code below. When I type in Mike Ivie, with card number 45T, the program creates the URL https://mavin.io/search?q=mike+ivie+45T. From there I want to get the price shown in the green estimate box on the results page.
Instead of the price, I keep getting the wrong output.
Can anyone help out?
from bs4 import BeautifulSoup
import requests
import re

print('\t Want to know the value of your baseball card?')
print('You found the place, enter the name of the player and the card number below.')
print('Enter "STOP" as the player name when you are done.')
print('----------------------------------------------------------------------------')

url = 'https://mavin.io/search?q='
choice = False
while not(choice):
    player = input('Player Name: ').lower()
    card_number = input('Card Number: ')
    checkPlayer = list(player)
    for i in range(0,10):
        if str(i) in checkPlayer:
            print()
            print('-----------------------------')
            print('You entered an invalid input!')
            print('-----------------------------')
            continue
    if player == 'stop':
        choice = True
        print()
        print('--------------------------------------------')
        print('Thank you for using this tool. See you again')
        print('--------------------------------------------')
    else:
        #add contents to end of url
        finalPlayer = player.split(' ')
        for i in range(len(finalPlayer)):
            url += finalPlayer[i] + '+'
        url += card_number
        source = requests.get(url).content
        soup = BeautifulSoup(source,'lxml')
        article = soup.find('div', class_ = 'estimate-box equal-width')
        print(article.h4)
The data is in JSON format within the <script> tags. You can get the whole description (and pull the value out using a regex or other means). Or (and you'll have to check) it seems they just take the average of lowPrice and highPrice, so you can calculate that yourself.
from bs4 import BeautifulSoup
import requests
import re
import json
import statistics

print('\t Want to know the value of your baseball card?')
print('You found the place, enter the name of the player and the card number below.')
print('Enter "STOP" as the player name when you are done.')
print('----------------------------------------------------------------------------')

url = 'https://mavin.io/search?q='
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
choice = False
while not(choice):
    player = input('Player Name: ').lower()
    card_number = input('Card Number: ')
    checkPlayer = list(player)
    for i in range(0,10):
        if str(i) in checkPlayer:
            print()
            print('-----------------------------')
            print('You entered an invalid input!')
            print('-----------------------------')
            continue
    if player == 'stop':
        choice = True
        print()
        print('--------------------------------------------')
        print('Thank you for using this tool. See you again')
        print('--------------------------------------------')
    else:
        #add contents to end of url
        finalPlayer = player.split(' ')
        for i in range(len(finalPlayer)):
            url += finalPlayer[i] + '+'
        url += card_number
        source = requests.get(url, headers=headers).content
        soup = BeautifulSoup(source,'lxml')
        script = soup.find_all('script', {'type':'application/ld+json'})[1]
        jsonData = json.loads(script.string)[0]
        description = jsonData['description']
        highPrice = float(jsonData['offers']['highPrice'])
        lowPrice = float(jsonData['offers']['lowPrice'])
        average = round(statistics.mean([highPrice, lowPrice]),2)
        print(description)
Output:
Value estimate of $4.35. Based on 2 similar items sold in the Baseball Cards category.
Here's the calculated average:
print(average)
4.35
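If you would rather pull the dollar figure out of that description string with a regex, as mentioned above, a minimal sketch might look like this (the pattern is my own assumption, not from the original answer):

import re

description = "Value estimate of $4.35. Based on 2 similar items sold in the Baseball Cards category."
match = re.search(r"\$([\d,]+\.?\d*)", description)
if match:
    value = float(match.group(1).replace(",", ""))
    print(value)  # 4.35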
I noticed the price lives in h3.sold-price. That's all I know. Hope it helps. This is not an answer, just an observation.
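If that observation holds, a rough sketch of reading the figure straight from the page markup could look like the following (the h3.sold-price selector is untested and taken only from the comment above):

from bs4 import BeautifulSoup
import requests

headers = {'user-agent': 'Mozilla/5.0'}
source = requests.get('https://mavin.io/search?q=mike+ivie+45T', headers=headers).content
soup = BeautifulSoup(source, 'lxml')
price = soup.select_one('h3.sold-price')   # assumption: the estimate sits in <h3 class="sold-price">
if price is not None:
    print(price.get_text(strip=True))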
I'm writing a script in Python that prompts you to ask a question, then analyzes the AskReddit subreddit and gives you a response. My code is:
import requests
import json
import random

#The main function that will grab a reply
def grab_reply(question):
    #Navigate to the Search Reddit Url
    r = requests.get('https://www.reddit.com/r/AskReddit/search.json?q=' + question + '&sort=relevance&t=all', headers = {'User-agent': 'Chrome'})
    answers = json.loads(r.text) #Load the JSON file
    Children = answers["data"]["children"]
    ans_list = []
    for post in Children:
        if post["data"]["num_comments"] >= 5: #Greater than or equal to 5 comments
            ans_list.append(post["data"]["url"])
    #If no results are found return "I have no idea"
    if len(ans_list) == 0:
        return "I have no idea"
    #Pick A Random Post
    comment_url = ans_list[random.randint(0,len(ans_list)-1)] + '.json?sort=top' #Grab Random Comment Url and Append .json to end
    #Navigate to the Comments
    r = requests.get(comment_url, headers = {'User-agent': 'Chrome'})
    reply = json.loads(r.text)
    Children = reply[1]['data']['children']
    reply_list = []
    for post in Children:
        reply_list.append(post["data"]["body"]) #Add Comments to the List
    if len(reply_list) == 0:
        return "I have no clue"
    #Return a Random Comment
    return reply_list[random.randint(0,len(reply_list)-1)]

#Main Loop, Always ask for a question
while 1:
    input("Ask me anything: ")
    q = q.replace(" ", "+") #Replace Spaces with + for URL encoding
    print(grab_reply(q)) #Grab and Print the Reply
After running the script in my terminal, I get this response:
NameError: name 'q' is not defined
I have managed to get most of the errors out of my script, but this one is driving me crazy. Help me out, Stack Overflow.
Probably this will help:
while True:
    q = input("Ask me anything: ")
input("Ask me anything: ")
should be:
q = input("Ask me anything: ")
Since you are not assigning the result of input() to any variable, q is undefined.
q is not defined yet. You should define q before using it.
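Putting the answers together, a minimal sketch of the corrected main loop (grab_reply is the function defined in the question; swapping the manual replace for urllib.parse.quote_plus is only a suggestion):

from urllib.parse import quote_plus

while True:
    q = input("Ask me anything: ")   # assign the result of input() to q
    q = quote_plus(q)                # URL-encode spaces and other special characters
    print(grab_reply(q))             # grab and print the reply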
I've recently started learning Python APIs and I've run into a problem while trying to access the HaveIBeenPwned API. I can get it to print the JSON data, so I think it's a formatting problem? All the other solutions seem to force me to rewrite my entire code, only to find it doesn't work anyway or is incompatible.
#This program aims to provide 4 search functions by which users can test if their data is at risk.
import urllib.request as url
import json
import ast

def UsernameSearch():
    print("Username search selected!")

def PasswordSearch():
    print("Password search selected!")

def EmailSearch():
    Username = input("Please enter the Email that's going to be searched \n: ")

    def DataGetCurrent(Username):
        SearchURL = "https://haveibeenpwned.com/api/v2/breachedaccount/{}".format(Username)
        request = url.urlopen(url.Request(SearchURL, headers={'User-Agent' : "Mozilla/5.0"}))
        data = request.read()
        data = data.decode("utf-8")
        json_data = json.loads(data)
        return json_data[0]

    Data = DataGetCurrent(Username)
    a = ("Your Email address has been involved in [number] breaches: \nBreach \nTitle: {}\nWebsite: {}\nDate: {}\nInformation: {}\nLeaked Data: {}".format(Data['Title'],Data['Domain'],Data['BreachDate'],Data['Description'],Data['DataClasses']))
    print(a)

def SiteSearch():
    print("Website search selected!")

def loop():
    try:
        answer = input("There are currently 5 options: \n(1)Username search \n(2)Password search \n(3)Email search \n(4)Website search \n(5)Exit \n \n:")
        if answer.upper() == "1":
            UsernameSearch()
        elif answer.upper() == "2":
            PasswordSearch()
        elif answer.upper() == "3":
            EmailSearch()
        elif answer.upper() == "4":
            SiteSearch()
        else:
            print("\nThis is invalid, sorry. Please try again!\n")
        loop()
    except KeyboardInterrupt:
        print("\nYou don't need to exit the program this way, there's an exit option; just type \"exit\"!\n")
        loop()

loop()
The error it throws is:
TypeError: string indices must be integers
Edit:
Updated now, and it does retrieve some information, but it only returns the first dictionary entry, whereas I need it to return as many as there are (and preferably a count of them).
I'm also having trouble selecting the "DataClasses" entry and printing the individual items within it.
All help is appreciated, thanks.
To convert a JSON string to a dictionary, use the json module (standard library):
import json
data_str = '{"index" : 5}'
json_dict = json.loads(data_str)
In your example:
import json

# ...

def DataGetCurrent(Username):
    SearchURL = "https://haveibeenpwned.com/api/v2/breachedaccount/{}".format(Username)
    request = url.urlopen(url.Request(SearchURL, headers={'User-Agent' : "Mozilla/5.0"}))
    data = request.read()
    data = data.decode("utf-8")
    return json.loads(data)
EDIT
Apparently HaveIBeenPwned returns a list of dictionaries. Therefore, to get the results, you need to get the dictionary in the 0th index of the list:
def DataGetCurrent(Username):
    SearchURL = "https://haveibeenpwned.com/api/v2/breachedaccount/{}".format(Username)
    request = url.urlopen(url.Request(SearchURL, headers={'User-Agent' : "Mozilla/5.0"}))
    data = request.read()
    data = data.decode("utf-8")
    json_list = json.loads(data)
    return json_list[0]
EDIT 2
The 0th element of the list is only one of the results. To process all the results, return the list itself and iterate over it.
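For example, a rough sketch of returning the whole list and walking over every breach and its "DataClasses" entries (the field names are the ones already used in the question; the rest is a suggestion, not tested against the live API):

def DataGetCurrent(Username):
    SearchURL = "https://haveibeenpwned.com/api/v2/breachedaccount/{}".format(Username)
    request = url.urlopen(url.Request(SearchURL, headers={'User-Agent' : "Mozilla/5.0"}))
    data = request.read().decode("utf-8")
    return json.loads(data)                      # return the whole list of breaches

breaches = DataGetCurrent(Username)
print("Your Email address has been involved in {} breaches:".format(len(breaches)))
for breach in breaches:
    print("Title: {}\nWebsite: {}\nDate: {}".format(breach['Title'], breach['Domain'], breach['BreachDate']))
    # "DataClasses" is itself a list, so print each leaked data type on its own line
    for data_class in breach['DataClasses']:
        print(" -", data_class)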