I'm trying to create a simple login to a "Kahoot!" quiz.
The first thing I'm trying to do is load JSON objects from "https://kahoot.it/#/" so I can fill in the form (I tried to fill the form using 'mechanize', but it seems to support only HTML forms).
When I run the following script I get an exception saying the JSON could not be decoded:
import urllib, json
url = "https://kahoot.it/#/"
response = urllib.urlopen(url)
data = json.loads(response.read())
print data
output:
ValueError: No JSON object could be decoded
Any ideas?
Thanks.
type(response.read()) is str, containing the HTML of the page. That is obviously not valid JSON, which is why you are getting that error.
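You can see the difference with a minimal, self-contained sketch; the HTML string below is just a stand-in for what the page actually returns:

```python
import json

# Stand-in for what https://kahoot.it/#/ returns: an HTML shell, not JSON.
html_page = "<!DOCTYPE html><html><head><title>Kahoot!</title></head><body></body></html>"

try:
    json.loads(html_page)
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    print("Not JSON:", exc)

# By contrast, an actual JSON payload parses fine:
payload = '{"gamePin": "123456"}'
print(json.loads(payload)["gamePin"])
```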
EDIT: If you are trying to log in to that page, it is possible with Selenium:
from selenium import webdriver
url = "https://kahoot.it/#/"
driver = webdriver.Chrome() # or webdriver.Firefox()
driver.get(url)
# finding the text field and 'typing' the game pin
driver.find_element_by_xpath('//*[@id="inputSession"]').send_keys('your_game_pin')
# finding and clicking the submit button
driver.find_element_by_xpath('/html/body/div[3]/div/div/div/form/button').click()
Related
I am using Selenium and unittest to write automated tests for a web app.
I have a text field that works as a 'search engine'. The API returns a response in JSON format on the entry of each character in the text field.
For example, I get the search element and enter “Arrays” into it:
def test_search(self):
driver = self.driver
driver.get(URL)
# find text field
element = driver.find_element_by_id("gsc-i-id2")
# enter some text into a text field
element.send_keys("Arrays")
# --> api returns response in json format
# --> catch response
Is it possible to get the result list? The idea is to get the JSON from the response. Is that possible?
I wanted to scrape some PDFs from a website: https://dsscic.nic.in/cause-list-report-web/view-decision?commissionname=302&file_category=1&fileno=&name=&public_authority=&decisiontypeid=1&frdate=&todate=&page_length=10&search_button=Submit
But what I found is that the buttons for the PDFs actually issue POST requests carrying some kind of unique token as the value parameter. When I send a POST request to the server it responds with OK, but it does not return the PDF.
You need to scrape the filename value from the hidden <input> field present in that URL.
Example filename value:
Q0lDLVBHSU1FLUEtMjAxOC02MTY5NjEtQkoucGRm
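If you want to pull that value without a full scraping library, here is a minimal sketch using only the standard library's html.parser. The form markup below is an illustrative stand-in for the real page, not its exact HTML:

```python
from html.parser import HTMLParser

class HiddenInputParser(HTMLParser):
    """Collects the value of every hidden <input name="filename"> tag."""
    def __init__(self):
        super().__init__()
        self.filenames = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and attrs.get("name") == "filename":
            self.filenames.append(attrs.get("value"))

# Illustrative stand-in for the page markup around each PDF button:
html = '''<form action="/cause-list-report-web/download" method="post">
  <input type="hidden" name="filename" value="Q0lDLVBHSU1FLUEtMjAxOC02MTY5NjEtQkoucGRm">
  <button type="submit">View PDF</button>
</form>'''

parser = HiddenInputParser()
parser.feed(html)
print(parser.filenames[0])  # the token to POST back
```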
Now you can send the POST request to fetch the PDF file:
import requests
import base64

data = {
    'filename': 'Q0lDLVBHSU1FLUEtMjAxOC02MTY5NjEtQkoucGRm'
}

# To get the resulting file name you can either take it from the
# 'Content-Disposition' key in the response headers or just base64-decode
# the filename value obtained from the hidden input field.
filename = base64.b64decode(data['filename']).decode()  # 'CIC-PGIME-A-2018-616961-BJ.pdf'

response = requests.post('https://dsscic.nic.in/cause-list-report-web/download', data=data)

# open the file in binary write mode
with open(filename, 'wb') as file:
    file.write(response.content)
Here is the curl request to download the PDF, like in the code above.
curl 'https://dsscic.nic.in/cause-list-report-web/download' --data 'filename=Q0lDLVBHSU1FLUEtMjAxOC02MTY5NjEtQkoucGRm' -o $(base64 -d <<< Q0lDLVBHSU1FLUEtMjAxOC02MTY5NjEtQkoucGRm)
I suggest you use Selenium web drivers. You can try sending a POST like the one below.
from seleniumrequests import Firefox
webdriver = Firefox()
response = webdriver.request('POST', 'url here', data={"param1": "value1"})
print(response)
Or you could try manually clicking them, like:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Safari()
driver.implicitly_wait(20)
driver.get('https://dsscic.nic.in/cause-list-report-web/view-decision-all/1?opt=appCom&commissionname=302&file_category=1&fileno=&name=&public_authority=&decisiontypeid=1&frdate=&todate=&page_length=10&search_button=Submit')
driver.implicitly_wait(20) #because this site is slow.
driver.find_element_by_css_selector('.btn.btn-primary').click()  # find_element_by_class_name does not accept a compound class like 'btn.btn-primary'
The latter did not work for me, but I bet with some tinkering you could get it working. Cheers
I am trying to automate a web data gathering process using Python. In my case, I need to pull the information from the https://app.ixml.com.br/documentos/nfe page. However, before you go to this page, you need to log in at https://app.ixml.com.br/login. The code below should theoretically log into the site:
import re
from robobrowser import RoboBrowser
username = 'email'
password = 'password'
br = RoboBrowser()
br.open('https://app.ixml.com.br/login')
form = br.get_form()
form['email'] = username
form['senha'] = password
br.submit_form(form)
src = str(br.parsed())
However, when printing the src variable, I get the source code of the https://app.ixml.com.br/login page, i.e. before logging in. If I add the following lines at the end of the previous code:
br.open('https://app.ixml.com.br/documentos/nfe')
src2 = str(br.parsed())
The src2 variable contains the source code of the page https://app.ixml.com.br/. I tried some variations, such as creating a new br object, but got the same result. How can I access the information at https://app.ixml.com.br/documentos/nfe?
If it is OK to have a web page open, you can try to solve this using Selenium. This package makes it possible to create a program that reacts just like a user would.
The following code would log you in:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("https://app.ixml.com.br/login")
browser.find_element_by_id("email").send_keys("abc@mail")
browser.find_element_by_id("senha").send_keys("abc")
browser.find_element_by_css_selector("button").click()
I want to know if there is a way to POST parameters after reading the page source. Ex: read the captcha before posting the ID#
My current code:
import requests
id_number = "1"
url = "http://www.submitmyforum.com/page.php"
data = dict(id = id_number, name = 'Alex')
post = requests.post(url, data=data)
There is a captcha that changes after every request to http://submitforum.com/page.php (obviously not a real site). I would like to read that parameter and submit it in the "data" variable.
As discussed in the OP comments, Selenium can be used; methods without browser emulation may also exist!
Using Selenium (http://selenium-python.readthedocs.io/) instead of the requests module method:
import re
import selenium
from selenium import webdriver
regexCaptcha = "k=.*&co="
url = "http://submitforum.com/page.php"
# Get to the URL
browser = webdriver.Chrome()
browser.get(url)
# Example of getting page elements (using CSS selectors):
# here I'm getting the Google reCAPTCHA ID if present on the current page
try:
    element = browser.find_element_by_css_selector('iframe[src*="https://www.google.com/recaptcha/api2/anchor?k"]')
    captchaID = re.findall(regexCaptcha, element.get_attribute("src"))[0].replace("k=", "").replace("&co=", "")
    captchaFound = True
    print("Captcha found!", captchaID)
except Exception:
    print("No captcha found!")
    captchaFound = False
# Treat captcha
# --> Your treatment code
# Enter Captcha Response on page
captchResponse = browser.find_element_by_id('captcha-response')
captchResponse.send_keys(captcha_answer)
# Validate the form
validateButton = browser.find_element_by_id('submitButton')
validateButton.click()
# --> Analysis of returned page if needed
I am working on a URL using Python.
If I click the URL, I am able to get the Excel file.
But if I run the following code, it gives me weird output.
>>> import urllib2
>>> urllib2.urlopen('http://intranet.stats.gov.my/trade/download.php?id=4&var=2012/2012%20MALAYSIA%27S%20EXPORTS%20BY%20ECONOMIC%20GROUPING.xls').read()
output :
"<script language=javascript>window.location='2012/2012 MALAYSIA\\'S EXPORTS BY ECONOMIC GROUPING.xls'</script>"
Why is it not able to read the content with urllib2?
Take a look using an HTTP listener (or even Google Chrome Developer Tools); there's a redirect using JavaScript when you get to the page.
You will need to access the initial URL, parse the result, and then fetch the actual URL.
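For this particular page you don't even need a browser: the returned body is a one-line script, so you can extract the redirect target with a regular expression and rebuild the absolute URL yourself. A sketch using only the standard library; the body string is taken from the output above, and the assumption is that the relative path resolves against the /trade/ directory of the original URL:

```python
import re
from urllib.parse import urljoin, quote

# Body returned by download.php (from the question):
body = ("<script language=javascript>window.location="
        "'2012/2012 MALAYSIA\\'S EXPORTS BY ECONOMIC GROUPING.xls'</script>")

# Extract the redirect target from the inline script and un-escape the quote.
target = re.search(r"window\.location='(.*)'", body).group(1).replace("\\'", "'")

# Resolve it against the page URL, percent-encoding spaces and apostrophes.
base = "http://intranet.stats.gov.my/trade/download.php"
actual_url = urljoin(base, quote(target, safe="/"))
print(actual_url)
# A second GET to actual_url should then return the .xls file itself.
```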
@Kai in this question seems to have found an answer to JavaScript redirects using the Selenium module:
import time
from selenium import webdriver

driver = webdriver.Firefox()
link = "http://yourlink.com"
driver.get(link)

# this waits for the new page to load
while link == driver.current_url:
    time.sleep(1)

redirected_url = driver.current_url