How do you generate a valid YouTube URL with Python

Is there any way to generate a valid YouTube URL with Python?
import random

import requests
from string import ascii_uppercase, ascii_lowercase, digits

charset = list(ascii_uppercase) + list(ascii_lowercase) + list(digits)

def gen_id():
    res = ""
    for i in range(11):
        res += random.choice(charset)
    return res

youtube_url = "https://www.youtube.com/watch?v=" + gen_id()
resp = requests.get(youtube_url)
print(resp.status_code)
I am using this example to generate random YouTube URLs.
I get a 200 response code, but no video is found when I try to open the URL in the browser.
I looked at this method, but it does not work.

IDs are generated randomly and are not that predictable. They all appear to use a Base64-style alphabet, which helps limit the set of characters (you will probably want to add dashes and underscores to your random generation, since IDs like gbhDL8BT_w0 are possible). The only real approach known is generation and then testing, and as some commenters mentioned, this might get rate-limited by YouTube.
There are some additional details provided in this answer to a similar question that might help in doing the generation, or satiating curiosity.
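To tie those two points together, here is a minimal sketch of the generate-and-test idea, assuming an 11-character base64url-style alphabet (letters, digits, '-' and '_') and using YouTube's oEmbed endpoint as a rough existence check; whether oEmbed reliably returns a non-200 status for every unknown ID is an assumption, not something confirmed above:
import random
import string

import requests

# Base64url-style alphabet: letters, digits, dash and underscore (assumption)
CHARSET = string.ascii_letters + string.digits + "-_"

def gen_id(length=11):
    # Build a random 11-character candidate video ID
    return "".join(random.choice(CHARSET) for _ in range(length))

def video_exists(video_id):
    # Rough existence check via the oEmbed endpoint; the assumption is that it
    # returns 200 for real public videos and a 4xx status for unknown IDs
    resp = requests.get(
        "https://www.youtube.com/oembed",
        params={"url": f"https://www.youtube.com/watch?v={video_id}", "format": "json"},
    )
    return resp.status_code == 200

# Generate-and-test loop: expect an enormous number of misses before any hit,
# plus possible rate limiting from YouTube
candidate = gen_id()
print(candidate, video_exists(candidate))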

It's not possible to always pick a valid random URL from all the videos YouTube has; not every valid-looking sequence is an actual video ID. You have to check yourself that the URLs you want to choose from randomly are valid. Pick some videos and put them in a list.
import random

myUrls = [
    "https://www.youtube.com/watch?v=...",
    "https://www.youtube.com/watch?v=...",
    ...
]
youtube_url = random.choice(myUrls)

Related

Getting the Protein names and their ID for a given list of peptide sequence (using Python)

I have a list of peptide sequences, and I want to map them to the correct protein names from any open database like UniProt, i.e., the proteins the peptides belong to. Can someone guide me on how to find the protein names and map them? Thanks in advance.
I'd say your best bet is to use the requests module and hook into the API that Uniprot has on their website. The API for peptide sequence searching is here, and the docs for it link from the same page.
With this, you should be able to form a dict that contains your search parameters and send a request to the API that will return the results you are looking for. The requests module allows you to retrieve the results as json format, which you can very easily parse back into lists/dicts, etc for use in whatever way you wish.
Edit: I have code!
Just for fun, I tried the first part: looking up the proteins using the peptides. This works! You can see how easy the requests module makes this sort of thing :)
There is another API for retrieving the database entries once you have the list of "accessions" from this first step. All of the API end points and docs can be accessed here. I think you want this one.
import requests
from time import sleep

url = 'https://research.bioinformatics.udel.edu/peptidematchws/asyncrest'
# peps can be a comma separated list for multiple peptide sequences
data = {'peps': 'MKTLLLTLVVVTIVCLDLGYT', 'lEQi': 'off', 'spOnly': 'off'}
headers = {'Content-Type': 'application/x-www-form-urlencoded'}

response = requests.post(url, params=data, headers=headers)
if response.status_code == 202:
    print(f"Search accepted. Results at {response.headers['Location']}")
    search_job = requests.get(response.headers['Location'])
    while search_job.status_code == 303:
        sleep(30)
        search_job = requests.get(response.headers['Location'])
    if search_job.status_code == 200:
        results = search_job.text.split(',')
        print('Results found:')
        print(results)
    else:
        print('No matches found')
else:
    print('Error: Search not accepted')
    print(response.status_code, response.reason)
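As a follow-up to the "another API" mentioned above, here is a hedged sketch of how the returned accessions might be turned into protein names. It assumes the current UniProt REST endpoint https://rest.uniprot.org/uniprotkb/<accession>.json and the proteinDescription layout of its JSON; both are assumptions based on UniProt's present API rather than anything in the original answer:
import requests

def fetch_entry(accession):
    # Fetch one UniProtKB entry as JSON (assumed endpoint, not from the original answer)
    resp = requests.get(f"https://rest.uniprot.org/uniprotkb/{accession}.json")
    resp.raise_for_status()
    return resp.json()

# 'results' is the accession list produced by the peptide match search above
for accession in results:
    entry = fetch_entry(accession.strip())
    # The recommended protein name is assumed to live under proteinDescription
    name = (entry.get("proteinDescription", {})
                 .get("recommendedName", {})
                 .get("fullName", {})
                 .get("value", "unknown"))
    print(accession, name)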

How to use the Urban Dictionary API's built-in random() function

I want to build a simple app that will generate random words and their associated definitions from the Urban Dictionary API. I was thinking I could somehow scrape the website or find a database or .csv file with most of the Urban Dictionary words and then inject those into the API {word}.
I found their unofficial/official API online here: http://api.urbandictionary.com/v0
And more information about it here: https://pub.dev/documentation/urbandictionary/latest/urbandictionary/OfficialUrbanDictionaryClient-class.html
And here: https://pub.dev/documentation/urbandictionary/latest/urbandictionary/UrbanDictionary-class.html
Inside the second pub.dev link there appears to be a built-in function that generates a random list of words from the site. So obviously, rather than having to find a database or scrape the words, this would be a much better way to create this app. The problem is I don't know how to call that function in my code.
I'm new to APIs, and here is my code so far:
import requests
word = "all good in the hood"
response = requests.get(f"http://api.urbandictionary.com/v0/define?term={word}")
print(response.text)
This gives a long JSON/Dictionary in VSCODE. I think I'd be able to expand on this idea if it's possible to access that random function and just get a random word from the list.
Any help is appreciated.
Thanks
Scraping all the words in the Urban Dictionary would take a very long time. You can get a random word from the Urban Dictionary by calling https://api.urbandictionary.com/v0/random
Here's a function that gets a random word from the Urban Dictionary
import requests

def randomword():
    response = requests.get("https://api.urbandictionary.com/v0/random")
    return response.text
In order to convert the response to JSON, you have to import json and do json.loads(response.text). Once converted, it is basically a dictionary. Here's some code that gets the definition, word, and author of the first entry:
import json

data = json.loads(randomword())      # get a random word and convert the response to JSON
firstdef = data["list"][0]           # first definition in the list
author = firstdef["author"]          # author of the definition
definition = firstdef["definition"]  # definition of the word
word = firstdef["word"]              # the word itself
Referring to the comment above, here is the method you need.
import requests

response = requests.get("https://api.urbandictionary.com/v0/random")

# get all list items
for obj in response.json()['list']:
    print(obj)

# get index 0 of the list
print(response.json()['list'][0])

# get the word at index 0 of the list
print(response.json()['list'][0]['word'])
The text is in JSON format, so just use the json module to convert it to a dictionary. I also had it give only the definition with the most thumbs up:
import json
import requests

word = "all good in the hood"
response = requests.get(f"http://api.urbandictionary.com/v0/define?term={word}")
dictionary = json.loads(response.text)['list']

most_thumbs = -1
best_definition = ""
for definition in dictionary:
    if definition['thumbs_up'] > most_thumbs:
        most_thumbs = definition['thumbs_up']
        best_definition = definition['definition']

print(f"{word}: {best_definition}")

How to delete lines from a text until a keyword

I am requesting a Wikipedia page and printing all the text from that page, like so:
import requests

def my_function(addr):
    response = requests.get(addr)
    print(response.text)

my_function("https://en.wikipedia.org/wiki/Web_scraping")
Right now what I'm trying to do is delete the unwanted parts, basically all text before the element with the id 'See_also'. Is there a right and easy way to do so? I can't just delete a fixed number of lines, since this code is meant to work for different wiki pages.
You can use REGEX (hurray).
import requests
import re

def my_function(addr):
    response = requests.get(addr)
    print(re.findall("See_also[\\s\\S]*", response.text))

my_function("https://en.wikipedia.org/wiki/Web_scraping")

Handling multiple user URL inputs that then need to be split and processed individually

I'm newer to Python so please be easy on me senpai, since this is probably a simple loop I'm overlooking. Essentially what I'm attempting to do is have a user input a list of URLs separated by commas; then each of those URLs gets joined to the end of an API call. It works perfectly when I remove the .split for a single address, but I'd love to know how to get it to handle multiple inputs. I tried setting a counter and an upper limit for a loop and having it work that way, but couldn't get it working properly.
import requests
import csv
import os
Domain = input ("Enter the URLS seperated by commas").split(',')
URL = 'https:APIcalladdresshere&' + Domain
r = requests.get(URL)
lm = r.text
j = lm.replace(';',',')
file = open(Domain +'.csv', "w",)
file.write(j)
file.close()
file.close()
print (j)
print (URL)
I unfortunately don't have enough reputation to comment and ask what you mean by it not working properly (I'm guessing you mean something like what I've mentioned down below), but maybe if you keep a list of domains and look for a specific input that breaks the loop (so you don't need an upper limit like you said), that might solve your issue. Something like:
Domains = []
while True:
    domain = input("Enter the URLS separated by commas (enter 'exit' to exit): ")
    if 'exit' in domain.lower():
        break
    else:
        Domains.extend(domain.split(','))

Urls = []
for domain in Domains:
    URL = 'https:APIcalladdresshere&' + domain
    Urls.append(URL)  # or just Urls.append('https:APIcalladdresshere&' + domain)
But then the line URL = 'https:APIcalladdresshere&' + Domain will throw a TypeError because you're trying to add a list to a string (you converted Domain to a list with Domain.split(',')). The loop above works just fine, but if you insist on comma-separated urls, try:
URL = ['https:APIcalladdresshere&' + d for d in Domain]
where URL is now a list that you can iterate over.
Hope this helps!
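Putting the pieces together, here is a hedged sketch of the whole flow: split the comma-separated input, build one request per domain, and write each result to its own CSV. The 'https:APIcalladdresshere&' placeholder is kept exactly as in the question, so this is a sketch of the structure rather than a working call:
import requests

domains = input("Enter the URLS separated by commas: ").split(',')

for domain in domains:
    domain = domain.strip()
    if not domain:
        continue  # skip empty entries such as trailing commas
    url = 'https:APIcalladdresshere&' + domain  # placeholder API address from the question
    r = requests.get(url)
    j = r.text.replace(';', ',')
    # one CSV per domain; the with-block closes the file automatically
    with open(domain + '.csv', 'w') as f:
        f.write(j)
    print(url)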

output more than limited results from a form request

I have the following script that posts a search terms into a form and retrieves results:
import mechanize

url = "http://www.taliesin-arlein.net/names/search.php"
br = mechanize.Browser()
br.set_handle_robots(False)  # ignore robots
br.open(url)
br.select_form(name="form")
br["search_surname"] = "*"
res = br.submit()
content = res.read()

with open("surnames.txt", "w") as f:
    f.write(content)
However, the rendered web page, and hence the script here, limits the search to 250 results. Is there any way I can bypass this limit and retrieve all results?
Thank you
You could simply iterate over possible prefixes to get around the limit. There are 270,000 names and a limit of 250 results per query, so you need to make at least 1,080 requests. There are 26 letters in the alphabet, so if we assume an even distribution you would need a prefix of a little over 2 letters (log(1080)/log(26) ≈ 2.1). However, the distribution is unlikely to be that even (how many people have surnames starting with ZZ, after all?).
To get around this we use a modified depth-first search, like so:
import string
import time

import mechanize

def checkPrefix(prefix):
    # Return list of names with this prefix.
    url = "http://www.taliesin-arlein.net/names/search.php"
    br = mechanize.Browser()
    br.open(url)
    br.select_form(name="form")
    br["search_surname"] = prefix + '*'
    res = br.submit()
    content = res.read()
    return extractSurnames(content)

def extractSurnames(pageText):
    # write a function to extract the surnames from the returned HTML (placeholder stub)
    return []

Q = [x for x in string.ascii_lowercase]
listOfSurnames = []
while Q:
    curPrefix = Q.pop()
    print(curPrefix)
    curSurnames = checkPrefix(curPrefix)
    if len(curSurnames) < 250:
        # store surnames; could also write to file
        listOfSurnames += curSurnames
    else:
        # We clearly didn't get all of the names; need to subdivide more
        Q += [curPrefix + x for x in string.ascii_lowercase]
    time.sleep(5)  # Sleep here to avoid overloading the server for other people.
Thus we query more in places where there are too many results to be displayed, but we do not query ZZZZ if there are fewer than 250 surnames starting with ZZZ (or shorter). Without knowing how skewed the name distribution is, it is hard to estimate how long this will take, but the 5-second sleep multiplied by 1,080 requests is about 1.5 hours, so you are probably looking at at least half a day, if not longer.
Note: This could be made more efficient by declaring the browser globally, however whether this is appropriate depends on where this code will be placed.
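For what the note above suggests, here is a minimal sketch of checkPrefix using a module-level browser, reusing the extractSurnames stub from the code above; whether reusing one mechanize.Browser across this many requests behaves well is an assumption:
import mechanize

# One shared browser, created once and reused for every prefix query
BROWSER = mechanize.Browser()
BROWSER.set_handle_robots(False)

def checkPrefix(prefix):
    # Return list of names with this prefix, reusing the global browser
    url = "http://www.taliesin-arlein.net/names/search.php"
    BROWSER.open(url)
    BROWSER.select_form(name="form")
    BROWSER["search_surname"] = prefix + '*'
    res = BROWSER.submit()
    return extractSurnames(res.read())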
