I've set up code in Python to search for tweets using only the oauth2 and urllib2 libraries. (I'm not using any particular Twitter library.)
I'm able to search for tweets based on keywords. However, I get zero tweets when I search for this particular keyword - "Jurgen%20Mayer-Hermann". (This is a challenge because my ultimate goal is to search for this keyword only.)
On the other hand, when I search for the same thing online (Twitter interface), I get plenty of tweets: https://twitter.com/search?q=Jurgen%20Mayer-Hermann&src=typd
Can someone please help me identify the issue?
The code is as follows:
def getfeed(mystr, tweetcount):
    url = "https://api.twitter.com/1.1/search/tweets.json?q=" + mystr + "&count=" + tweetcount
    parameters = []
    response = twitterreq(url, "GET", parameters)
    res = json.load(response)
    return res

search_str = "Jurgen Mayer-Hermann"
search_str = '%22'+search_str+'%22'
search = search_str.replace(" ","%20")
search = search.replace("#","%23")
tweetcount = str(50)
res = getfeed(search, tweetcount)
When I print the constructed url, I get
https://api.twitter.com/1.1/search/tweets.json?q=%22Jurgen%20Mayer-Hermann%22&count=50
I have actually never worked with the Twitter API, but it looks like the count parameter only applies to searches on timelines as a way to limit the amount of tweets per page of results. In other words, you use it with the GET statuses/home_timeline, GET statuses/mentions, and GET statuses/user_timeline endpoints.
Try without count and see what happens.
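For example, a minimal sketch of the same request with count dropped; this assumes the twitterreq helper, the json import, and the percent-encoded search string from the question:

# Sketch: same search request, but with the count parameter removed.
# twitterreq and the encoded search string come from the question above.
url = "https://api.twitter.com/1.1/search/tweets.json?q=" + search
response = twitterreq(url, "GET", [])
print json.load(response)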
Please use urllib.urlencode to encode your query parameters, like so:
import urllib
query = urllib.urlencode({'q': '"Jurgen Mayer-Hermann"', 'count': 50})
This produces 'q=%22Jurgen+Mayer-Hermann%22&count=50', which might bring you more luck...
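For instance, here is a sketch of how the question's getfeed could build the query with urlencode; twitterreq is assumed to be the same signed-request helper from the question:

import urllib
import json

def getfeed(mystr, tweetcount):
    # Let urlencode handle the quoting instead of hand-building the query string;
    # twitterreq is the signed-request helper from the question.
    params = urllib.urlencode({'q': mystr, 'count': tweetcount})
    url = "https://api.twitter.com/1.1/search/tweets.json?" + params
    response = twitterreq(url, "GET", [])
    return json.load(response)

res = getfeed('"Jurgen Mayer-Hermann"', 50)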
Related
I am trying to scrape tweets from a trending tag on Twitter. I tried to find the XPath of the text in a tweet, but it doesn't work.
browser = webdriver.Chrome('/Users/Suraj/Desktop/twitter/chromedriver')
url = 'https://twitter.com/search?q=%23'+'Swastika'+'&src=trend_click'
browser.get(url)
time.sleep(1)
The following piece of code doesn't give any results.
browser.find_elements_by_xpath('//*[@id="tweet-text"]')
Other selectors which I was able to find content with were:
browser.find_elements_by_css_selector("[data-testid=\"tweet\"]") # works
browser.find_elements_by_xpath("/html/body/div[1]/div/div/div[2]/main/div/div/div/div[1]/div/div[2]/div/div/section/div/div/div/div/div/div/article/div/div/div/div[2]/div[2]/div[1]/div/div") # works
I want to know how I can select the text from the tweet.
You can use Selenium to scrape Twitter, but it would be much easier/faster/more efficient to use the Twitter API with tweepy. You can sign up for a developer account here: https://developer.twitter.com/en/docs
Once you have signed up get your access keys and use tweepy like so:
import datetime as dt
import pytz
import tweepy

# connects to twitter and authenticates your requests
auth = tweepy.OAuthHandler(TWapiKey, TWapiSecretKey)
auth.set_access_token(TWaccessToken, TWaccessTokenSecret)

# wait_on_rate_limit prevents you from requesting too many times and having twitter block you
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

data = []
# loops through every tweet that tweepy.Cursor pulls -- api.search tells the cursor
# what to do, q is the search term, result_type can be recent, popular or mixed,
# max_id/since_id are snowflake ids (Twitter's way of representing time),
# and count is the maximum number of tweets returned per request.
for tweet in tweepy.Cursor(api.search, q=YourSearchTerm, result_type='recent',
                           max_id=snowFlakeCurrent, since_id=snowFlakeEnd,
                           count=100).items(500):
    createdTime = tweet.created_at.strftime('%Y-%m-%d %H:%M')
    createdTime = dt.datetime.strptime(createdTime, '%Y-%m-%d %H:%M').replace(tzinfo=pytz.UTC)
    data.append(createdTime)
This code is an example of a script that pulls 500 recent tweets matching YourSearchTerm and appends each tweet's creation time to a list. You can check out the tweepy documentation here: http://docs.tweepy.org/en/latest/
Each tweet that you pull with tweepy.Cursor() has many attributes that you can choose from and append to a list, or do something else with. Even though it is possible to scrape Twitter with Selenium, it's really not recommended, as it will be very slow, whereas tweepy returns results in mere seconds.
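For instance, here is a hedged sketch of collecting a few more fields per tweet; the attribute names (id, user.screen_name, text, created_at) are standard tweepy Status fields, and the placeholders are the same as above:

# Sketch: collect several fields per tweet instead of just the timestamp.
# YourSearchTerm, snowFlakeCurrent and snowFlakeEnd are the same placeholders as above.
rows = []
for tweet in tweepy.Cursor(api.search, q=YourSearchTerm, result_type='recent',
                           max_id=snowFlakeCurrent, since_id=snowFlakeEnd,
                           count=100).items(500):
    rows.append({
        'id': tweet.id,                  # numeric snowflake id of the tweet
        'user': tweet.user.screen_name,  # author's handle
        'text': tweet.text,              # tweet body (full_text needs tweet_mode='extended')
        'created_at': tweet.created_at,  # datetime of creation
    })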
Applying for API access is not always successful. I used Twint, which provides a means to scrape quickly; in this case, to a CSV output.
import twint

def search_twitter(terms, start_date, filename, lang):
    c = twint.Config()
    c.Search = terms
    c.Custom_csv = ["id", "user_id", "username", "tweet"]
    c.Output = filename
    c.Store_csv = True
    c.Lang = lang
    c.Since = start_date
    twint.run.Search(c)
    return
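A usage sketch; all argument values below are placeholders (the trending tag from the question, an arbitrary start date, and an output file name):

# Placeholder call: scrape English tweets for the trending tag since 2020-06-01
# into results.csv; adjust the arguments to your own search.
search_twitter("#Swastika", "2020-06-01", "results.csv", "en")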
My goal is to connect to the YouTube API and download the URLs of specific music producers. I found the following script, which I used from the following link: https://www.youtube.com/watch?v=_M_wle0Iq9M. In the video the code works beautifully, but when I try it in Python 2.7 it gives me KeyError: 'items'.
I know KeyErrors can occur when a dictionary is used incorrectly or when a key doesn't exist.
I have gone to the Google developers site for YouTube to make sure that 'items' exists, and it does.
I am also aware that using get() may be helpful for my problem, but I am not sure. Any suggestions for fixing my KeyError using the following code, or any suggestions on how to improve my code to reach my main goal of downloading the URLs (I have a YouTube API key)?
Here is the code:
#these modules help with HTTP request from Youtube
import urllib
import urllib2
import json

API_KEY = open("/Users/ereyes/Desktop/APIKey.rtf","r")
API_KEY = API_KEY.read()

searchTerm = raw_input('Search for a video:')
searchTerm = urllib.quote_plus(searchTerm)

url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&q='+searchTerm+'&key='+API_KEY
response = urllib.urlopen(url)
videos = json.load(response)

videoMetadata = [] #declaring our list
for video in videos['items']: #"for loop" cycles through json response and searches in items
    if video['id']['kind'] == 'youtube#video': #makes sure the item we are looking at is a video
        videoMetadata.append(video['snippet']['title'] + # getting title of video and putting into list
            "\nhttp://youtube.com/watch?v=" + video['id']['videoId'])

videoMetadata.sort() # sorts our list alphabetically

print ("\nSearch Results:\n") #print out search results
for metadata in videoMetadata:
    print (metadata)+"\n"

raw_input('Press Enter to Exit')
The problem is most likely a combination of using an RTF file instead of a plain text file for the API key, and confusion about whether to use urllib or urllib2, since you imported both.
Personally, I would recommend requests, but I think you need to read() the contents of the response to get a string:
response = urllib.urlopen(url).read()
You can check that by printing the response variable
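Putting both points together, a hedged sketch might look like this; the plain-text key file name is an assumption, and .get('items', []) is the defensive lookup the question already mentions:

import urllib
import json

# Sketch: use a plain-text key file (the name apikey.txt is an assumption),
# read the HTTP response into a string, and guard the 'items' lookup.
API_KEY = open("/Users/ereyes/Desktop/apikey.txt").read().strip()

searchTerm = urllib.quote_plus(raw_input('Search for a video:'))
url = ('https://www.googleapis.com/youtube/v3/search?part=snippet&q='
       + searchTerm + '&key=' + API_KEY)

raw = urllib.urlopen(url).read()  # read() gives the JSON body as a string
videos = json.loads(raw)          # loads() parses a string, load() parses a file object

# .get() avoids the KeyError and lets you inspect the error payload instead
for video in videos.get('items', []):
    if video['id']['kind'] == 'youtube#video':
        print(video['snippet']['title'])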
So I'm trying to create a Python script that will take a search term or query, then search Google for that term. It should then return 5 URLs from the results of the search.
I spent many hours trying to get PyGoogle to work, but later found out that Google no longer supports the SOAP API for search, nor do they provide new license keys. In a nutshell, PyGoogle is pretty much dead at this point.
So my question here is... What would be the most compact/simple way of doing this?
I would like to do this entirely in Python.
Thanks for any help
Use BeautifulSoup and requests to get the links from the google search results
import requests
from bs4 import BeautifulSoup
keyword = "Facebook" #enter your keyword here
search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + keyword
r = requests.get(search)
soup = BeautifulSoup(r.text, "html.parser")
container = soup.find('div',{'id':'search'})
url = container.find("cite").text
print(url)
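Since the question asks for 5 URLs, here is a hedged extension of the snippet above; Google's result markup changes often, so the cite tag is only a best-effort selector:

# Sketch: grab up to five result URLs instead of just the first one,
# reusing the container element from the snippet above.
for cite in container.find_all("cite", limit=5):
    print(cite.text)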
What issues are you having with pygoogle? I know it is no longer supported, but I've utilized that project on many occasions and it would work fine for the menial task you have described.
Your question did make me curious though--so I went to Google and typed "python google search". Bam, found this repository. Installed with pip and within 5 minutes of browsing their documentation got what you asked:
import google
for url in google.search("red sox", num=5, stop=1):
print(url)
Maybe try a little harder next time, ok?
Here is a link to the xgoogle library, which does the same.
I tried something similar to get the top 10 links, counting how often the target word occurs in each linked page. I have added the code snippet for your reference:
import operator
import urllib
#This line will import the GoogleSearch, SearchError classes from xgoogle/search.py file
from xgoogle.search import GoogleSearch, SearchError

my_dict = {}
print "Enter the word to be searched : "
#read user input
yourword = raw_input()

try:
    #This will perform a google search on our keyword
    gs = GoogleSearch(yourword)
    gs.results_per_page = 80
    #get google search results
    results = gs.get_results()
    source = ''
    #loop through all results to get each link and its content
    for res in results:
        #print res.url.encode('utf8')
        #this will give the url
        parsedurl = res.url.encode("utf8")
        myurl = urllib.urlopen(parsedurl)
        #the line above opens the url; the line below reads the content of that web page
        source = myurl.read()
        #This line will count occurrences of the entered keyword in our web page
        count = source.count(yourword)
        #We store our result in a dictionary data structure: for each url, we store its word occurrence count
        my_dict[parsedurl] = count
except SearchError, e:
    print "Search failed: %s" % e

print my_dict
#sorted_x = sorted(my_dict, key=lambda x: x[1])
for key in sorted(my_dict, key=my_dict.get, reverse=True):
    print(key, my_dict[key])
I want to create a script in Python which downloads the current KML files of all the Maps I created on Google Maps.
To do so manually, I can use this:
http://maps.google.com.br/maps/ms?msid=USER_ID.MAP_ID&msa=0&output=kml
where USER_ID is a constant number Google uses to identify me, and MAP_ID is the individual map identifier generated by the link icon on top-right corner.
This is not very straightforward, because I have to manually browse "My Places" page on Google Maps, and get the links one by one.
From Google Maps API HTTP Protocol Reference:
The Map Feed is a feed of user-created maps.
This feed's full GET URI is:
http://maps.google.com/maps/feeds/maps/default/full
This feed returns a list of all maps for the authenticated user.
** The page says this service is no longer available, so I wonder if there is a way to do the same at present.
So, the question is: Is there a way to get/download the list of MAP_IDs of all my maps, preferably using Python?
Thanks for reading
The correct answer to this question involves using the Google Maps Data API, HTML interface, which by the way is deprecated but still solves my need in a more official way, or at least in a way more convincing than parsing a web page. Here it goes:
# coding: utf-8
import urllib2, urllib, re, getpass

username = 'heltonbiker'
senha = getpass.getpass('Password for user ' + username + ':')  # "senha" = password

dic = {
    'accountType': 'GOOGLE',
    'Email': (username + '@gmail.com'),
    'Passwd': senha,
    'service': 'local',
    'source': 'helton-mapper-1'
}

url = 'https://www.google.com/accounts/ClientLogin?' + urllib.urlencode(dic)
output = urllib2.urlopen(url).read()
authid = output.strip().split('\n')[-1].split('=')[-1]

request = urllib2.Request('http://maps.google.com/maps/feeds/maps/default/full')
request.add_header('Authorization', 'GoogleLogin auth=%s' % authid)
source = urllib2.urlopen(request).read()

for link in re.findall('<link rel=.alternate. type=.text/html. href=((.)[^\1]*?)>', source):
    s = link[0]
    if 'msa=0' in s:
        print s
I arrived at this solution with the help of a bunch of other questions on SO, and a lot of people helped me a lot, so I hope this code might help anyone else trying to do the same in the future.
A quick and dirty way I have found, which skips the Google Maps API completely and perhaps might break in the near future, is this:
# coding: utf-8
import urllib, re
from BeautifulSoup import BeautifulSoup as bs

uid = '200931058040775970557'
start = 0
shown = 1

while True:
    url = 'http://maps.google.com/maps/user?uid='+uid+'&ptab=2&start='+str(start)
    source = urllib.urlopen(url).read()
    soup = bs(source)
    maptables = soup.findAll(id=re.compile('^map[0-9]+$'))
    for table in maptables:
        for line in table.findAll('a', 'maptitle'):
            mapid = re.search(uid+'\.([^"]*)', str(line)).group(1)
            mapname = re.search('>(.*)</a>', str(line)).group(1).strip()[:-2]
            print shown, mapid, mapname
            shown += 1
            # uncomment if you want to download the KML files:
            # urllib.urlretrieve('http://maps.google.com.br/maps/ms?msid=' + uid + '.' + str(mapid) +
            #     '&msa=0&output=kml', mapname + '.kml')
    if '<span>Next</span>' in str(source):
        start += 5
    else:
        break
Of course it only prints a numbered list, but from there, saving a dictionary and/or automating the KML download via the &output=kml url trick follows naturally.
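For example, a hedged sketch of that last step, reusing uid from the snippet above and the mapid/mapname values collected in its inner loop:

# Sketch: build a {mapid: mapname} dict while looping, then fetch each KML.
maps = {}  # fill this inside the inner loop above: maps[mapid] = mapname

for mapid, mapname in maps.items():
    kml_url = ('http://maps.google.com.br/maps/ms?msid=' + uid + '.' + mapid +
               '&msa=0&output=kml')
    urllib.urlretrieve(kml_url, mapname + '.kml')  # saves each map under its title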
I'm trying to screenscrape the first result of a Google search using Python and simplejson, but I can't access the search results the way that many examples online demonstrate. Here's a snippet:
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % (query)
search_results = urllib.urlopen(url)
json = simplejson.load(search_results)
try:
    results = json['responseData']['results'] # always fails at this line
    first_result = results[0]
except:
    print "attempt to set results failed"
When I go to http://ajax.googleapis.com/ajax/services/search/web?v=1.0&stackoverflow (or anything else substituted for the %s) in a browser, it displays the line "{"responseData": null, "responseDetails": "clip sweeping", "responseStatus": 204}." Is there some other way to access the results of a Google search in Python besides trying to use the apparently empty responseData?
You missed the &q= parameter. You should also consider using an API key: http://code.google.com/intl/de/apis/ajaxsearch/documentation/. Besides that, plain string concatenation won't work; you need to escape the parameter.
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=' + urllib.quote_plus(query)
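Putting it together, a hedged sketch of the corrected request; the search term is a placeholder, and responseData is checked before indexing since it can be null:

import urllib
import simplejson

query = 'stackoverflow'  # placeholder search term
url = ('http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q='
       + urllib.quote_plus(query))

data = simplejson.load(urllib.urlopen(url))
# responseData is null when the request is malformed or rejected,
# so check it before indexing into the results
if data.get('responseData'):
    results = data['responseData']['results']
    if results:
        print(results[0]['url'])
else:
    print('No results: %s' % data.get('responseDetails'))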