How to extract user information from GitHub using their email address? - python

If I have the email addresses of some users and I want to extract their information from their GitHub accounts, how can I do this using Python? I found this (https://help.github.com/articles/searching-users/), but how can it help me extract user information?

You can either look up how to scrape the information from the web or try using the API for user emails.

I know that retrieving user data using their e-mail is possible using the GitHub API, which should return a JSON item with the user's information. I believe most of the tutorials for using the API use Ruby, though I see no reason why the same general principles wouldn't carry over to Python.
Otherwise, if you choose to use a web scraper instead, I'd recommend using BeautifulSoup.
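For example, a minimal sketch of the API route using the requests library and GitHub's user-search endpoint (the email and token are placeholders; note the search only matches users who have made their email public on their profile):

import requests

def find_user_by_email(email, token):
    """Search GitHub users by email; returns the first match or None."""
    resp = requests.get(
        "https://api.github.com/search/users",
        params={"q": email + " in:email"},  # in:email restricts the search to the email field
        headers={"Authorization": "token " + token},
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    return items[0] if items else None

user = find_user_by_email("someone@example.com", "YOUR_PERSONAL_ACCESS_TOKEN")
if user:
    print(user["login"], user["html_url"])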

You can try this. It isn't exactly the solution you might want, but it should work for you. In this code we take the username, not the email address, as input. The API endpoint is given in the code, but to connect you need an access token (similar to a password), so create your own personal token. Here is the link: https://github.com/settings/tokens. Once you have all the details, you can play around with loops and extract whatever information you want.
P.S.: If this solution doesn't meet your requirements, you can follow this link: https://developer.github.com/v3/users/emails/ and make some changes to the API call accordingly (see the sketch after the code).
import urllib.request
import json

serviceurl = 'https://api.github.com/users/'
# Create your own personal access token at https://github.com/settings/tokens
access_token = 'YOUR_PERSONAL_ACCESS_TOKEN'
while True:
    user = input('Enter user_name : ')
    if len(user) < 1:
        break
    url = serviceurl + user
    print('Retrieving', url)
    # GitHub no longer accepts the token as a query parameter,
    # so send it in an Authorization header instead
    req = urllib.request.Request(url, headers={'Authorization': 'token ' + access_token})
    data = urllib.request.urlopen(req).read()
    js = json.loads(data)
    print(json.dumps(js, indent=4))
    # e.g. print(js["email"])  # often None unless the user made their email public

Related

Alexa fetching user email using python

I am trying to make an Alexa skill with a Python backend.
I am using the Amazon developer console to create the model and code the backend.
I want to retrieve the user's email address.
I would appreciate it if you could provide me with sample code; I have tried many methods but none were working.
Here is some code I tried:
https://github.com/alexa/alexa-skills-kit-sdk-for-python/tree/master/samples/GetDeviceAddress
I know this is for the device address, but it was also not working, and I thought if I could get the address I could get the email.
Everything mentioned online is for Node, and I want my backend in Python.
As specified in the official documentation, you need to make an API call to the ASK.
For email, you need to call your api_endpoint followed by /v2/accounts/~current/settings/Profile.email
For me the endpoint is https://api.eu.amazonalexa.com, so the complete URL becomes:
https://api.eu.amazonalexa.com/v2/accounts/~current/settings/Profile.email
As for adding the token to authorize the request, it can be done with the requests library by passing a header to the GET request. You can learn more about that from here.
The final code should then look something like this:
import requests

# Fetching the access token from the request envelope
accesstoken = str(handler_input.request_envelope.context.system.api_access_token)

# Fetching the user's email address from the ASK API
endpoint = "https://api.eu.amazonalexa.com/v2/accounts/~current/settings/Profile.email"
api_access_token = "Bearer " + accesstoken
headers = {"Authorization": api_access_token}
r = requests.get(endpoint, headers=headers)
email = r.json()
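Note that this only works if the user has granted the email permission to your skill in the Alexa app; if they haven't, the API returns a 403, which is worth checking before using the result:

if r.status_code == 403:
    # the user has not granted the email permission to this skill
    email = None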
Hope this helps, Cheers!

logging into moodle using python

I'm trying to write some code that downloads content from a Moodle website.
The first thing was trying to log in, but from what I've tried so far, it seems as if I'm not actually being redirected to the post-login page (with the courses data etc.). Here's what I've tried:
import requests

user = 'my_username'
pas = 'my_password'
payload = {'username': user, 'password': pas}
login_site = "https://moodle2.cs.huji.ac.il/nu20/login/index.php?"  # actual login webpage
data_site = "https://moodle2.cs.huji.ac.il/nu20"  # should be the inner webpage with the courses etc.
with requests.Session() as session:
    post = session.post(login_site, data=payload)
    r = session.get(data_site)
    content = r.text
    print(content)  # doesn't actually contain the HTML of the main courses page (seems to me it's the login page)
Any idea why that might happen? Would appreciate your help ;)
It is difficult to help without knowing more about the specific site you are trying to log into.
One thing that's worth a try is changing
session.post(login_site, data=payload)
to
session.post(login_site, json=payload)
When the data parameter is used, the content-type header is not set to "application/json". Some sites will reject the POST based on this.
I've also run into sites which have protections against logins from scripts. They may require an additional token to be sent in the POST.
If all else fails, you could consider using Selenium. Selenium allows you to control a browser instance programmatically. You can simply load the page and send text input to the username and password fields on the login page. This would also get you access to any content which is rendered client-side via JavaScript. However, this may be overkill depending on your use case.
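A minimal sketch of that Selenium approach (the element IDs "username", "password" and "loginbtn" are assumptions; inspect the actual login form to confirm them):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://moodle2.cs.huji.ac.il/nu20/login/index.php")

# Element IDs are assumptions -- inspect the login form to confirm them
driver.find_element(By.ID, "username").send_keys("my_username")
driver.find_element(By.ID, "password").send_keys("my_password")
driver.find_element(By.ID, "loginbtn").click()

# The session cookies now live in the browser instance, so navigating
# to the courses page should show the logged-in content
driver.get("https://moodle2.cs.huji.ac.il/nu20")
print(driver.page_source)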

python scraping school's webpage which requires user login

I'm using Python to scrape my school's webpage, but in order to do that I needed to simulate a user login first. Here is my code:
import requests, lxml.html

s = requests.session()
url = "https://my.emich.edu"
login = s.get(url)
login_html = lxml.html.fromstring(login.text)
hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
form["username"] = "myusername"
form["password"] = "mypassword"
form["submit"] = "LOGIN"
response = s.post("https://netid.emich.edu/cas/login?service=https%3A%2F%2Fmy.emich.edu%2Fc%2Fportal%2Flogin", form)
response = s.get("http://my.emich.edu")
with open("result.html", "w") as f:
    f.write(response.text)
print(response.text)
I am expecting that response.text will give me my own student account page; instead, it gives me a login-required page. Can anyone help me with this issue?
BTW, this is not homework.
There are a few options here, and I think your requests approach can be made much easier by logging in manually and copying over the headers.
Use a python scripting package like http://wwwsearch.sourceforge.net/mechanize/ to scrape the site.
Use a browser emulator such as http://casperjs.org/. Using this you can do basically anything you'd be able to do in a browser.
My suggestion here would be to go to the website, log in, and then open the developer console and copy those headers/cookies into your requests headers/cookies. This way you can just hardcode the 'already-authenticated request' and it will work fine. Note that this method is the least reliable for doing robust, everyday scraping, but if you're looking for something that will be the quickest to implement and will work until the authentication runs out, use this method.
Also, you need to request the logged-in homepage (again) after you successfully do the POST.
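A minimal sketch of that copy-the-cookies approach (the cookie name and value are placeholders; copy the real ones from the developer console after logging in manually):

import requests

# Paste the real cookie names/values from your browser's developer console
cookies = {"JSESSIONID": "paste-value-here"}
# Matching your browser's user agent can help avoid rejection
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get("https://my.emich.edu", cookies=cookies, headers=headers)
print(response.text)  # should be the logged-in page, until the session expires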

Using Python, how do I connect to two different APIs at the same time so I can compare data?

I am using the requests and json modules. My current code looks like this:
import requests, json

# API URLs to connect to
url1 = url
url2 = url

# Authentication for url1; url2 doesn't need auth
usr = username
pwd = password  # "pass" is a reserved word in Python, so use a different name

r2 = requests.get(url1, auth=(usr, pwd), verify=False)
r3 = requests.get(url2, verify=False)

for obj in json.loads(r2.text)['results']:
    for other in json.loads(r3.text)['ip']:  # distinct name so the outer obj isn't shadowed
        if str(obj['ip']) == str(other['ip']):
            print("Hostname: " + str(obj['name']) + ", IP: " + str(obj['ip']))
What I need to do now is add another API to this mix, and I want to run an if statement that compares IP addresses, so that our servers in one reporting system can be cross-referenced against another system that is manually entered. That way we can know what is in there or not. Sorry for the poor example; I'm on my train ride home and am seriously stuck.
This question lacks the details needed to be fully answerable, but do it like this:
Get the IP addresses from one place using requests.
Get the IP addresses from the other place using requests.
Compare them as Python sets, dicts, lists or whatever format you got them in (see the sketch below).
Unless more specific information about formats, tables, URLs, etc. is included, it is hard to give any detailed help.
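A minimal sketch of the comparison, assuming each API returns JSON with an 'ip' field on every record (the URLs, credentials and field names here are placeholders):

import requests

# Placeholder URLs and credentials -- substitute the real endpoints
r1 = requests.get("https://api.example.com/one", auth=("user", "password"))
r2 = requests.get("https://api.example.org/two")

# Assumed response shapes -- adjust the keys to match the real payloads
ips1 = {record["ip"] for record in r1.json()["results"]}
ips2 = {record["ip"] for record in r2.json()["results"]}

print("In both systems:", sorted(ips1 & ips2))
print("Only in system one:", sorted(ips1 - ips2))
print("Only in system two:", sorted(ips2 - ips1))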

How to mirror a reddit moderator page with python

I'm trying to create a mirror of specific moderator pages (i.e. restricted) of a subreddit on my own server, for transparency purposes. Unfortunately my python-fu is weak and after struggling a bit with the reddit API, its python wrapper and even some answers in here, I'm no closer to having a working solution.
So what I need to do is log in to reddit with a specific user, access a moderator-only page, and copy its HTML to a file on my own server for others to access.
The problem I'm running into is that the API and its wrapper are not very well documented, so I haven't found whether there's a way to retrieve a reddit page after logging in. If I can do that, then I could theoretically copy the result to a simple HTML page on my server.
When trying to do it outside the Python API, I can't figure out how to use Python's built-in modules to log in and then read a restricted page.
Any help appreciated.
I don't use PRAW so I'm not sure about that, but if I were to do what you want to do, I'd do something like this: log in, save the modhash, then grab the HTML from the URL of the place you want to go.
It also looks like it's missing some CSS or something when I save it, but it's recognizable enough as it is. You'll need the requests module, along with pprint and json:
import requests, json
from pprint import pprint as pp2

USER = 'your_username'      # placeholder
PASSWORD = 'your_password'  # placeholder

#----------------------------------------------------------------------
def login(username, password):
    """logs into reddit, saves cookie"""
    print('begin log in')
    # username and password
    UP = {'user': username, 'passwd': password, 'api_type': 'json'}
    headers = {'user-agent': '/u/STACKOVERFLOW\'s API python bot'}
    # POST with user/pwd
    client = requests.session()
    client.headers.update(headers)  # reddit rejects requests without a user-agent
    r = client.post('http://www.reddit.com/api/login', data=UP)
    # if you want to see what you've got so far
    #print(r.text)
    #print(r.cookies)
    # gets and saves the modhash
    j = json.loads(r.text)
    client.modhash = j['json']['data']['modhash']
    print('{USER}\'s modhash is: {mh}'.format(USER=username, mh=client.modhash))
    #pp2(j)
    return client

client = login(USER, PASSWORD)

# mod mail url
url = r'http://www.reddit.com/r/mod/about/message/inbox/'
r = client.get(url)

# here's the HTML of the page
pp2(r.text)
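To actually mirror the page as the question asks, write the fetched HTML to a file your web server can serve (the filename here is a placeholder):

# Save the fetched HTML so it can be served from your own server
with open('modmail-mirror.html', 'w', encoding='utf-8') as f:
    f.write(r.text)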
