I'm looking to get the list of all subdomains of a domain, using the VirusTotal API.
Running google.com as an example through their website, I can see that VirusTotal registers a total of 3.2k subdomains for google.com:
https://www.virustotal.com/gui/domain/google.com/relations -> subdomains screenshot
Using the API, however, I get a list of only 100 subdomains.
import requests

# VirusTotal v2 domain report
url = 'https://www.virustotal.com/vtapi/v2/domain/report'
params = {'apikey': 'MY_API_KEY', 'domain': 'google.com'}
response = requests.get(url, params=params)
print(response.json())
Is there an option to get ALL the subdomains in the response?
I tried the v2 and v3 APIs and cross-checked with their online GUI to confirm that my code wasn't returning all of the available output.
Looking at the VirusTotal API, there is a parameter called limit that defaults to 100.
I'm assuming that if you want to change it, you have to add it to your parameters entry.
import requests

url = 'https://www.virustotal.com/vtapi/v2/domain/report'
# try raising the limit above the default of 100
params = {'apikey': 'MY_API_KEY', 'domain': 'google.com', 'limit': 1000}
response = requests.get(url, params=params)
print(response.json())
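For completeness: in the v3 API the subdomains live under a relationship endpoint that pages its results with a cursor in `meta`, so collecting all 3.2k entries means looping until no cursor comes back. A sketch under that assumption (the endpoint URL and page size follow the v3 conventions; the page fetcher is injected so the loop itself can be exercised without an API key):

```python
import requests

def collect_all(fetch_page):
    """Follow v3-style cursor pagination until no cursor remains.

    fetch_page(cursor) must return a dict shaped like a v3 response:
    {"data": [...], "meta": {"cursor": "..."}} (cursor absent on the last page).
    """
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(d["id"] for d in page.get("data", []))
        cursor = page.get("meta", {}).get("cursor")
        if not cursor:
            return items

def vt_subdomains_page(cursor, api_key="MY_API_KEY"):
    # assumption: the v3 subdomains relationship endpoint with limit/cursor params
    params = {"limit": 40}
    if cursor:
        params["cursor"] = cursor
    r = requests.get(
        "https://www.virustotal.com/api/v3/domains/google.com/subdomains",
        headers={"x-apikey": api_key},
        params=params,
    )
    return r.json()

# all_subdomains = collect_all(vt_subdomains_page)
```

`collect_all` only cares about the `{"data": [...], "meta": {"cursor": ...}}` shape, so the pagination logic can be unit-tested with canned pages.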
I want to get data returned by this API:
https://www.instagram.com/api/v1/users/web_profile_info/?username=kateannedesigns
When we search for a user we can access basic data without even logging in, but when I make a request using this API (which actually fetches the data), it fails with a 400 response.
This is the request shown in the browser:
There is no session ID, but it still works in the browser; I want to reproduce this with Python requests.
Unfortunately Instagram banned my IP because I sent too many requests while testing. I did find out that the API expects an X-IG-App-ID header containing a number; I am not sure whether it actually needs to be a 15-digit number. Try this code:
import requests

headers = {'X-IG-App-ID': '743845927482847'}  # random 15-digit number
response = requests.get(
    'https://www.instagram.com/api/v1/users/web_profile_info/?username=kateannedesigns',
    headers=headers,
)
print(response.status_code)
print(response.json())
I'm just learning about the requests module in Python and want to test it.
First I looked at the Chrome Network tab to see the request URL, the method (POST), and the form data used. The Response tab in Chrome shows the expected data as JSON, but after doing it in a Python file like this:
import requests

uid = "23415191"
url = "https://info.gbfteamraid.fun/web/userrank"
honors = requests.post(url, data={
    "method": "getUserDayPoint",
    "params": {"teamraidid": "teamraid056", "userid": uid}
}).text
print(honors)
This gives me the HTML of the site's homepage instead of the JSON response; Postman gives me the same result.
You are being redirected to a different page, maybe because you are not authenticated; most probably you need to authenticate your request. I have added basic auth, which uses an ID and a password. Try this with the ID and password you have for that website and see if it works. Just replace <your username> and <your password> with your own credentials.
import requests
from requests.auth import HTTPBasicAuth

uid = "11111111"
url = "https://info.gbfteamraid.fun/web/userrank"
honors = requests.post(
    url,
    auth=HTTPBasicAuth(username="<your username>", password="<your password>"),
    data={
        "method": "getUserDayPoint",
        "params": {"teamraidid": "teamraid056", "userid": uid},
    },
).text
print(honors)
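Independent of authentication, note that requests cannot faithfully form-encode a nested dict passed via data=: the inner dict is stringified, which most servers cannot parse. If the site expects a JSON body, json= is the right parameter. A small offline demonstration with prepared requests (https://example.com is a placeholder URL, never contacted):

```python
import requests

payload = {"method": "getUserDayPoint",
           "params": {"teamraidid": "teamraid056", "userid": "23415191"}}

# data= form-encodes the top level and stringifies the nested dict
form = requests.Request("POST", "https://example.com", data=payload).prepare()
# json= serializes the whole structure as a JSON document
as_json = requests.Request("POST", "https://example.com", json=payload).prepare()

print(form.body)     # params=... holds a percent-encoded Python repr of the dict
print(as_json.body)  # proper JSON bytes
```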
I am trying to get data using the Requests library, but I'm doing something wrong. My explanation of the manual search:
URL - https://www9.sabesp.com.br/agenciavirtual/pages/template/siteexterno.iface?idFuncao=18
I fill in the “Informe o RGI” field and click the Prosseguir button (like Next):
[screenshot]
I get this result:
[screenshot]
Before coding, I did the manual search and checked the Form Data:
[screenshot]
And then I tried it with this code:
import requests
data = { "frmhome:rgi1": "0963489410"}
url = "https://www9.sabesp.com.br/agenciavirtual/block/send-receive-updates"
res = requests.post(url, data=data)
print(res.text)
My output is:
<session-expired/>
What am I doing wrong?
Many thanks.
When you go to the site in a browser, a session is created and stored in a cookie on your machine, and those cookies are sent along with subsequent requests. You receive a session-expired error because you're not sending any session data with your request.
Try this code. It requests the entry page first and stores the cookies; the cookies are then sent with the POST request.
import requests

session = requests.Session()  # start a session

# request the entry page so the server sets its session cookies
response = session.get('https://www9.sabesp.com.br/agenciavirtual/pages/home/paginainicial.iface', timeout=30)
cks = session.cookies  # save the cookies holding the session data
print(session.cookies.get_dict())

data = {"frmhome:rgi1": "0963489410"}
url = "https://www9.sabesp.com.br/agenciavirtual/block/send-receive-updates"
res = requests.post(url, data=data, cookies=cks)  # send the cookies with the request
print(res.text)
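A side note: if you issue the POST through the same Session object (session.post(...)), the stored cookies are attached automatically, so you don't have to pass them by hand. A minimal offline sketch showing that a Session merges its cookie jar into requests it prepares (the cookie name and value are made up):

```python
import requests

s = requests.Session()
s.cookies.set("JSESSIONID", "abc123")  # pretend a prior GET stored this

req = requests.Request("POST", "https://example.com/x", data={"k": "v"})
prepared = s.prepare_request(req)  # the Session merges its cookies in

print(prepared.headers.get("Cookie"))  # the jar becomes a Cookie header
```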
I'm trying to create a scraper for the OLX website (www.olx.pl) using requests and BeautifulSoup. I don't have any problems with most of the data, but the phone number is hidden (one has to click it first). I've already used Chrome's inspector to see what happens in the Network tab when I click it manually.
There is an ajax request with this information "?pt=5d1480fbad0a1f2006e865bfdf7a6fb07f244b82e17ab0ea4c5eaddc43f9da391b098e1926642564ffb781655d55be270c6913f7526a08298f43b24c0169636b"
This is the phoneToken which may be found in the website source (it changes on each page load).
I tried to send this kind of request using the requests library, but I got "000 000 000" in response.
I can get the phone number using Selenium, but it is slow to load.
The question is:
Is there a way to get around those security phone tokens?
or
How to speed up Selenium to scrape phone number in let's say 1-2sec?
Ad example:
https://www.olx.pl/561666735
EDIT:
Actually, now I get a message in the response that my IP address is blocked (but only when using requests; the IP is not blocked when I load the page manually).
Unfortunately I made some changes and can't reproduce the code that returned '000 000 000'. This is part of my code right now:
import requests

headers = {"User-Agent": "Mozilla/5.0"}  # stand-in; defined elsewhere in my code

def scrape_phone(id):
    s = requests.Session()
    url = "https://www.olx.pl/{}".format(id)
    response = s.get(url, headers=headers)
    page_text = response.text

    # getting the short id
    index_of_short_id = page_text.index("'id':'")
    short_id = page_text[index_of_short_id:index_of_short_id + 11].split("'")[-1]

    # getting the phone token
    index_of_token = page_text.index("phoneToken")
    phone_token = page_text[index_of_token + 10:index_of_token + 150].split("'")[1]

    url = "https://www.olx.pl/ajax/misc/contact/phone/{}".format(short_id)
    data = {'pt': phone_token}
    response = s.post(url, data=data, headers=headers)
    print(response.text)

scrape_phone(540006276)
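The index-and-slice extraction above is brittle. As a sketch of an alternative, a regex keeps the intent visible; this assumes the page embeds the values as 'id':'…' and phoneToken:'…' (the sample string below is made up for illustration):

```python
import re

def extract_tokens(page_text):
    # assumption: the page embeds 'id':'<short id>' and phoneToken:'<hex string>'
    short_id = re.search(r"'id':'(\w+)'", page_text)
    phone_token = re.search(r"phoneToken\s*:\s*'([0-9a-f]+)'", page_text)
    return (short_id and short_id.group(1),
            phone_token and phone_token.group(1))

sample = "var cfg = {'id':'abc12', phoneToken:'5d1480fbad0a1f20'};"
print(extract_tokens(sample))  # ('abc12', '5d1480fbad0a1f20')
```

If either pattern is missing, the function returns None for that slot instead of raising, unlike str.index.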
I'm trying to send a POST request to my Express server using Python. This is my code:
import requests

URL = "http://localhost:5000/api/chat"
PARAMS = {'ID': "99", 'otherID': "87", 'chat': [{"senderName": "tom", "text": "helloworld"}]}
r = requests.post(url=URL, data=PARAMS)
print("The response is: %s" % r.text)
But when I receive the call, I get an empty object on my Node server. Am I missing something?
(It works fine with Postman, so it's not the server.)
Typically the requests library differentiates the type of request based on the parameter used to make it. If you intend to POST JSON, you should use the json parameter, like so:
response = requests.post(url=URL, json=PARAMS)
What this does is set the accompanying Content-Type header to application/json; your form-encoded request lacked it, which is why, when your Express server attempted to parse the body, it came back empty.
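You can see the header difference without running a server by inspecting prepared requests (offline sketch; the URL is never contacted):

```python
import requests

params = {"ID": "99", "chat": [{"senderName": "tom", "text": "helloworld"}]}

form = requests.Request("POST", "http://localhost:5000/api/chat", data=params).prepare()
as_json = requests.Request("POST", "http://localhost:5000/api/chat", json=params).prepare()

print(form.headers["Content-Type"])     # application/x-www-form-urlencoded
print(as_json.headers["Content-Type"])  # application/json
```

Express's express.json() middleware only parses bodies whose Content-Type is application/json, which matches the behavior described above.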