There is a task: from a page, get the text of all posts with more than 0 likes. As I understand it, you must first collect all the tokens and IDs of the posts on the page (which was not difficult), then use the requests library to send a request to the server with these IDs and get a response, since the post itself appears in the page code only as a form, without any information about likes. But I don't know much about the requests themselves, and I can't figure out how to make such a request and get the HTML of the post. Do I need a token? Tokens are usually used for security and are generated per user.
(screenshot: directly finding the assumed token and request)
(screenshot: number of likes)
To do this, import the requests library and use requests.get(). A more detailed explanation can be found here: https://realpython.com/python-requests/.
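To make the answer concrete, here is a minimal sketch. The endpoint URL and the JSON shape (`text` and `likes` fields) are hypothetical placeholders; substitute whatever you actually found when inspecting the page.

```python
import requests

def posts_with_likes(posts):
    """Keep the text of every post whose like count is above zero."""
    return [p["text"] for p in posts if p.get("likes", 0) > 0]

def fetch_posts(url):
    resp = requests.get(url)   # a plain GET; no token needed for public data
    resp.raise_for_status()    # fail loudly on HTTP errors
    return resp.json()         # assumes the endpoint returns JSON
```

If the server returns HTML rather than JSON, you would parse `resp.text` (e.g. with BeautifulSoup) instead of calling `resp.json()`.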
I appreciate this is very simple, and I've managed to get the data I want using requests, but for the life of me I can't figure this out.
Essentially I am making a request to an API which returns nicely formatted JSON that includes site data for multiple sites. I then want to iterate through this list, extracting the site ID nested in the JSON, to make a request for the menu at each site. This should just be a case of passing the site ID into a request along with the rest of the URL and returning the JSON data that this new request generates, but I cannot figure out how to do it.
I'm happy to offer more detail if anyone is able to help and needs it. Thanks in advance!
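One way to do the loop described above. The field names (`"sites"`, `"id"`) and the URL patterns are assumptions, since the real API isn't shown; adjust them to match the JSON your API actually returns.

```python
import requests

API_BASE = "https://api.example.com"   # placeholder base URL

def site_ids(sites_json):
    """Extract the nested site ids from the first response."""
    return [site["id"] for site in sites_json["sites"]]

def fetch_menus(session=None):
    session = session or requests.Session()
    sites = session.get(f"{API_BASE}/sites").json()
    menus = {}
    for sid in site_ids(sites):
        # Interpolate each site id into the menu endpoint.
        menus[sid] = session.get(f"{API_BASE}/sites/{sid}/menu").json()
    return menus
```

The key step is simply string interpolation: build the second URL from the ID pulled out of the first response.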
I made a project in which you need to get a special token from the VK social network. I pass the token along with the link, which looks like this:
http://127.0.0.1:8000/vk/auth#access_token=7138dcd74f5da5e557943b955bbfbd9a62811da7874067e5fa0edef1ca8680216755be16&expires_in=86400&user_id=397697636
But the problem is that Django cannot see this link. I tried to read it from a POST request and from a GET request, but everything is empty there. I tried to make it arrive not as a fragment but as part of the link, like this:
http://127.0.0.1:8000/vk/auth #access_token=7138dcd74f5da5e557943b955bbfbd9a62811da7874067e5fa0edef1ca8680216755be16&expires_in=86400&user_id=397697636
But Django does not want to read the space. Can anyone help?
I think there is confusion between a query string (GET params), which follows a ?, and a fragment (the text that follows a #).
What follows the # is not sent to the server (and thus not received by Django); it is only useful to the web browser and to the JavaScript executed in the browser, which can use it to update parts of the screen or as virtual URLs/bookmarks for single-page web applications.
The JavaScript can of course also trigger AJAX requests using that data, but that's up to the JavaScript.
If, however, you write http://127.0.0.1:8000/vk/auth?access_token=7138dcd74f5da5e557943b955bbfbd9a62811da7874067e5fa0edef1ca8680216755be16&expires_in=86400&user_id=397697636 (replacing the # with a ?),
then you can receive the information in your Django view with
request.GET["access_token"], request.GET["expires_in"], and request.GET["user_id"].
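A minimal sketch of what that view could look like. The view name and the JSON response are illustrative choices, not anything VK or Django requires; the helper is kept as a plain function over a dict-like mapping so it works the same on `request.GET`.

```python
def extract_vk_params(query):
    """Pull the three VK OAuth fields out of a dict-like query mapping,
    defaulting to empty strings when a field is missing."""
    return {
        "access_token": query.get("access_token", ""),
        "expires_in": query.get("expires_in", ""),
        "user_id": query.get("user_id", ""),
    }

# In Django the view would look roughly like this:
#
# from django.http import JsonResponse
#
# def vk_auth(request):
#     params = extract_vk_params(request.GET)  # request.GET is dict-like
#     return JsonResponse(params)
```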
If it really is a #, then your JavaScript should parse whatever follows the # and make the corresponding AJAX requests to the server to send/validate the token.
For another question about fragments, see for example Is the URL fragment identifier sent to the server?
I am trying to scrape a news article. I try to log into the website with Python so that I can have full access to the whole web page, but I have looked at many tutorials and still fail.
Here is the code. Can anyone tell me why?
There is no bug in my code (it runs without errors), but I still cannot see the full text, which means I am still not logged in.
```python
import requests

url = 'https://id.wsj.com/access/pages/wsj/us/signin.html?mg=id-wsj&mg=id-wsj'
payload = {'username': 'my_user_name',
           'password': '******'}
session = requests.Session()
session.get(url)
response = session.post(url, data=payload)
print(response.cookies)
r = requests.get('https://www.wsj.com/articles/companies-push-to-repeal-amt-after-senates-last-minute-move-to-keep-it-alive-1512435711')
print(r.text)
```
Try sending your last GET request through the session object. After all, it's the session that performed the login and holds the cookies (if there are any). You used a fresh requests.get() for your last request, thus discarding the login you just made.
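The fix can be sketched as follows. The URLs and form field names are taken from the question as-is; whether the WSJ login form actually accepts these fields is an assumption, so verify them against the real form in the browser.

```python
import requests

LOGIN_URL = "https://id.wsj.com/access/pages/wsj/us/signin.html?mg=id-wsj&mg=id-wsj"
ARTICLE_URL = ("https://www.wsj.com/articles/companies-push-to-repeal-amt-"
               "after-senates-last-minute-move-to-keep-it-alive-1512435711")

def fetch_article(username, password):
    session = requests.Session()
    session.get(LOGIN_URL)     # pick up any initial cookies
    session.post(LOGIN_URL, data={"username": username, "password": password})
    # Reuse the SAME session here -- a fresh requests.get() would drop
    # the cookies the login just set.
    return session.get(ARTICLE_URL).text

if __name__ == "__main__":
    print(fetch_article("my_user_name", "******")[:500])
```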
From this question, the last answerer seems to think it is possible to use Python to open a web page, let me sign in manually and go through a bunch of menus, and then have Python parse the page once I get where I want. The website has an unusual sign-in procedure, so using requests and passing a username and password will not be sufficient.
However, it seems from this question that it's not a possibility.
So the question is: is it possible? If so, do you know of some example code out there?
The way to approach this problem is to log in normally with the browser's developer tools open next to you and see what request is being sent.
When logging in to Bandcamp, the developer tools show the XHR request that is being sent (screenshot omitted).
From that response you can see that an identity cookie is being set. That's probably how they identify that you are logged in, so once you have that cookie set you are authorized to view logged-in pages.
So in your program you could log in normally using requests, save the cookie in a variable, and then attach that cookie to further requests.
Of course, login procedures and the details of this authorization mechanism may differ between sites, but that's the general gist of it.
So when do you actually need Selenium? You need it if a lot of the page is rendered by JavaScript. requests can only fetch the HTML, so if the menus and similar elements are rendered with JavaScript, you will never see that information using requests alone.
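The general gist above can be sketched like this. The login path and form field names are placeholders; inspect the real XHR request in your browser's developer tools and copy what it actually sends. A requests.Session stores any Set-Cookie headers automatically, so there is no need to copy cookies by hand.

```python
import requests

def login(base_url, username, password):
    session = requests.Session()
    # The session stores any Set-Cookie headers (e.g. an identity cookie)
    # returned by the login endpoint.
    session.post(f"{base_url}/login", data={"user": username, "pass": password})
    return session

def fetch_private_page(session, url):
    # The stored cookies are attached automatically to further requests.
    return session.get(url).text
```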
I'm trying to log into Instagram using Python Requests. I figured it would be as simple as creating a requests.Session object and then sending a POST request, i.e.
```python
session.post(login_url, data={'username': ****, 'password': ****})
```
This didn't work, and I didn't know why, so I tried manually entering the browser's headers (I used Chrome dev tools to see the headers of the POST request) and passing them along with the request (headers={...}), even though I figured the session would deal with that. I also tried sending a GET request to the login URL first in order to get a cookie (and a CSRF token, I think) before doing the steps above. None of this worked.
I don't have much experience with this type of thing, and I just don't understand what differentiates my POST requests from Google Chrome's (I must be doing something wrong). Thanks.
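For what it's worth, the "GET first, then POST with the CSRF token" pattern described in the question usually looks roughly like the sketch below. The login URL, header names, and cookie name are assumptions; Instagram changes its endpoints and required headers often, so verify all of them in the browser's developer tools first.

```python
import requests

LOGIN_URL = "https://www.instagram.com/accounts/login/ajax/"  # may be outdated

def build_login_headers(csrf_token):
    """Headers the server typically checks, copied from a browser session."""
    return {
        "X-CSRFToken": csrf_token,
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "https://www.instagram.com/",
    }

def login(username, password):
    session = requests.Session()
    session.get("https://www.instagram.com/")      # sets the csrftoken cookie
    token = session.cookies.get("csrftoken", "")
    resp = session.post(
        LOGIN_URL,
        data={"username": username, "password": password},
        headers=build_login_headers(token),
    )
    return session, resp
```

The usual difference from a bare `session.post` is exactly those headers: many sites reject a login POST whose CSRF header does not match the cookie set by the preceding GET.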