I've got a Python script intended to scrape tweets from Twitter and append them to a CSV file. I'm using the tweepy module, however it only returns 1 tweet. Is this a problem with my for loop, or with the call to the Twitter API?
for status in tweepy.Cursor(twitterapi.search, q="labour party", since="2018-05-01", until="2018-05-10").items(200):
    if 'RT' not in status.text:
        with open('C:/Users/User/Desktop/twittersentiment.csv', 'wb') as f:
            w = csv.writer(f)
            favourites = status.user.favourites_count
            location = status.user.location.encode('utf8')
            tweet_text = ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", status.text.encode('utf8')).split())
            date = status.created_at.strftime('%m/%d/%Y')
            a = [location]
            b = [favourites]
            c = [tweet_text]
            d = [date]
            zip(a, b, c, d)
            w.writerow(zip(a, b, c, d))
You should open the file before you start iterating the tweepy.Cursor; otherwise each iteration of the cursor creates a new file with one entry, overwriting the previous file.
with open('C:/Users/User/Desktop/twittersentiment.csv', 'wb') as f:
    w = csv.writer(f)
    for status in tweepy.Cursor(twitterapi.search, q="labour party", since="2018-05-01", until="2018-05-10").items(200):
        if 'RT' not in status.text:
            favourites = status.user.favourites_count
            location = status.user.location.encode('utf8')
            tweet_text = ' '.join(re.sub("(#[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", status.text.encode('utf8')).split())
            date = status.created_at.strftime('%m/%d/%Y')
            # write the four fields as one flat row; the bare zip() call
            # in the original did nothing, and writerow(zip(...)) would
            # put the whole tuple into a single cell
            w.writerow([location, favourites, tweet_text, date])
I trust all is well with everyone here. My apologies if this has been answered before; I am trying to do the following.
cursor = tweepy.Cursor(
    api.search_tweets,
    q='"Hello"',
    lang='en',
    result_type='recent',
    count=2
)
I want to match the number in count, to the number of json objects I will be iterating through.
for tweet in cursor.items():
    tweet_payload = json.dumps(tweet._json, indent=4, sort_keys=True)
I have tried several different ways to write the data, but it would appear that the following does not work (currently it only writes a single tweet):
with open("Tweet_Payload.json", "w") as outfile:
    outfile.write(tweet_payload)
time.sleep(.25)
outfile.close()
This is what it looks like put together.
import time
import tweepy
from tweepy import cursor
import Auth_Codes
import json

twitter_auth_keys = {
    "consumer_key": Auth_Codes.consumer_key,
    "consumer_secret": Auth_Codes.consumer_secret,
    "access_token": Auth_Codes.access_token,
    "access_token_secret": Auth_Codes.access_token_secret
}

auth = tweepy.OAuthHandler(
    twitter_auth_keys["consumer_key"],
    twitter_auth_keys["consumer_secret"]
)
auth.set_access_token(
    twitter_auth_keys["access_token"],
    twitter_auth_keys["access_token_secret"]
)
api = tweepy.API(auth)

cursor = tweepy.Cursor(
    api.search_tweets,
    q='"Hello"',
    lang='en',
    result_type='recent',
    count=2
)

for tweet in cursor.items():
    tweet_payload = json.dumps(tweet._json, indent=4, sort_keys=True)
    with open("Tweet_Payload.json", "w") as outfile:
        outfile.write(tweet_payload)
    time.sleep(.25)
    outfile.close()
Edit: using the suggestion by Mickael, the current code is:
tweet_payload = []
for tweet in cursor.items():
    tweet_payload.append(tweet._json)
    print(json.dumps(tweet_payload, indent=4, sort_keys=True))
    with open("Tweet_Payload.json", "w") as outfile:
        outfile.write(json.dumps(tweet_payload, indent=4, sort_keys=True))
    time.sleep(.25)
It just loops; I am not sure why that's the case when the count is 10. I thought it would make just one call for 10 results or fewer, then end.
Opening the file in write mode erases its previous data, so if you want to add each new tweet to the file, you should use append mode instead.
As an alternative, you could store all the tweets' JSON in a list and write them all at once. That should be more efficient, and the list at the root of your JSON file will make it valid JSON.
json_tweets = []
for tweet in cursor.items():
    json_tweets.append(tweet._json)

with open("Tweet_Payload.json", "w") as outfile:
    outfile.write(json.dumps(json_tweets, indent=4, sort_keys=True))
On a side note, the with statement closes the file automatically; you don't need to do it yourself.
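If you do prefer the append-mode route mentioned above, one common pattern (a sketch, not part of the original answer; the helper names are made up) is to write one JSON object per line, so each append is independently valid and the file never needs rewriting:

```python
import json

def append_jsonl(path, obj):
    # Append one JSON object per line (JSON Lines); write mode is
    # "a" so earlier tweets are never erased.
    with open(path, "a") as outfile:
        outfile.write(json.dumps(obj, sort_keys=True) + "\n")

def read_jsonl(path):
    # Read the file back into a list of dicts, one per line.
    with open(path) as infile:
        return [json.loads(line) for line in infile]
```

Each call to append_jsonl adds a line; read_jsonl recovers the full list later.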
I am attempting to loop through a series of text files in a directory, looking for occurrences of certain types of words, and prefixing each found word with a user-defined tag. My code is as follows.
ACC_Tagged_Test = 'C:/ACC_Tag_Test'
for filename in glob.glob(os.path.join(ACC_Tagged_Test, '*.txt')):
    with open(filename) as f:
        data = f.read()

data = data.lower()
modals = {"could":1, "would":1, "should":1, "can":1, "may":1, "might":1}
personal_attribute = {"believes":1, "guess":1, "surmise":1, "considers":1,
                      "presume":1, "speculate":1, "postulate":1, "surmised":1, "assume":1}
approx_adapt = {"broadly":1, "mainly":1, "mostly":1, "loosely":1,
                "generally":1, "usually":1, "typically":1, "regularly":1, "widely":1}
plaus_shields = {"wonder":1, "suspect":1, "theorize":1, "hypothesize":1,
                 "cogitate":1, "contemplate":1, "deliberate":1}
format_modal = "<555>{} ".format
format_attribute = "<666>{} ".format
format_app_adaptor = "<777>{} ".format
format_plaus_shield = "<888>{} ".format
data = " ".join(format_modal(word) if word in modals else word for word in data.split())
data = " ".join(format_attribute(word) if word in personal_attribute else word for word in data.split())
data = " ".join(format_app_adaptor(word) if word in approx_adapt else word for word in data.split())
data = " ".join(format_plaus_shield(word) if word in plaus_shields else word for word in data.split())
with open(filename, "w") as f:
    f.write(str(data))
print(data)  # This is just added in order to check on screen that all files are being processed.
My problem is that although the code works on the last file in the directory, it is not working on the previous files (only 1 out of 10 in this case). I've tried a second for loop above the file write-out statements, but that is not working at all. Can anyone explain what I'm doing wrong here?
regards
My speculation is that your code is only processing the last file because it's not indented properly to have all relevant code within the for loop. Try this indentation:
ACC_Tagged_Test = 'C:/ACC_Tag_Test'
for filename in glob.glob(os.path.join(ACC_Tagged_Test, '*.txt')):
    with open(filename) as f:
        data = f.read()
    data = data.lower()
    modals = {"could":1, "would":1, "should":1, "can":1, "may":1, "might":1}
    personal_attribute = {"believes":1, "guess":1, "surmise":1, "considers":1,
                          "presume":1, "speculate":1, "postulate":1, "surmised":1, "assume":1}
    approx_adapt = {"broadly":1, "mainly":1, "mostly":1, "loosely":1,
                    "generally":1, "usually":1, "typically":1, "regularly":1, "widely":1}
    plaus_shields = {"wonder":1, "suspect":1, "theorize":1, "hypothesize":1,
                     "cogitate":1, "contemplate":1, "deliberate":1}
    format_modal = "<555>{} ".format
    format_attribute = "<666>{} ".format
    format_app_adaptor = "<777>{} ".format
    format_plaus_shield = "<888>{} ".format
    data = " ".join(format_modal(word) if word in modals else word for word in data.split())
    data = " ".join(format_attribute(word) if word in personal_attribute else word for word in data.split())
    data = " ".join(format_app_adaptor(word) if word in approx_adapt else word for word in data.split())
    data = " ".join(format_plaus_shield(word) if word in plaus_shields else word for word in data.split())
    with open(filename, "w") as f:
        f.write(str(data))
    print(data)  # This is just added in order to check on screen that all files are being processed.
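As a side note (this helper is not part of the original answer), the four nearly identical join/split passes can be factored into one small function, which preserves the trailing-space convention of the format strings above:

```python
def tag_words(text, words, tag):
    # Prefix every word found in `words` with `tag`, reproducing the
    # "<555>{} ".format style used above (note the trailing space).
    fmt = (tag + "{} ").format
    return " ".join(fmt(w) if w in words else w for w in text.split())
```

Each pass then becomes, e.g., data = tag_words(data, modals, "<555>").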
Assuming all of your code is supposed to be inside your for loop: you are overwriting your text file, which is why it looks like only your last run worked:
# this overwrites the file
with open(filename, "w") as fh:
    fh.write(str(data))
change to:
# this appends to the file
with open(filename, "a") as fh:
    fh.write(str(data))
This will append to your text file and will not overwrite previously added data with the data from the last loop.
I'm trying to download files from a site, and due to search result limitations (max 300), I need to search for each item individually. I have a CSV file with the complete list, and I've written some basic code to return the ID# column.
With some help, I've got another script that iterates through each search result and downloads a file. What I need to do now is combine the two so that it searches for each individual ID# and downloads the file.
I know my loop is messed up here; I just can't figure out where, or whether I'm even looping in the right order.
import requests, json, csv

faciltiyList = []
with open('Facility List.csv', 'r') as f:
    csv_reader = csv.reader(f, delimiter=',')
    for searchterm in csv_reader:
        faciltiyList.append(searchterm[0])

url = "https://siera.oshpd.ca.gov/FindFacility.aspx"
r = requests.get(url+"?term="+str(searchterm))
searchresults = json.loads(r.content.decode('utf-8'))
for report in searchresults:
    rpt_id = report['RPT_ID']
    reporturl = f"https://siera.oshpd.ca.gov/DownloadPublicFile.aspx?archrptsegid={rpt_id}&reporttype=58&exportformatid=8&versionid=1&pageid=1"
    r = requests.get(reporturl)
    a = r.headers['Content-Disposition']
    filename = a[a.find("filename=")+9:len(a)]
    file = open(filename, "wb")
    file.write(r.content)
    r.close()
The original code I have is here:
import requests, json

searchterm = "ALAMEDA (COUNTY)"
url = "https://siera.oshpd.ca.gov/FindFacility.aspx"
r = requests.get(url+"?term="+searchterm)
searchresults = json.loads(r.content.decode('utf-8'))
for report in searchresults:
    rpt_id = report['RPT_ID']
    reporturl = f"https://siera.oshpd.ca.gov/DownloadPublicFile.aspx?archrptsegid={rpt_id}&reporttype=58&exportformatid=8&versionid=1&pageid=1"
    r = requests.get(reporturl)
    a = r.headers['Content-Disposition']
    filename = a[a.find("filename=")+9:len(a)]
    file = open(filename, "wb")
    file.write(r.content)
    r.close()
The searchterm "ALAMEDA (COUNTY)" returns more than 300 results, so I'm trying to replace it with a list that runs through each name (ID# in this case), so that each search returns just one result, then runs again for the next item on the list.
CSV - just 1 line
Tested with a CSV file with just 1 line:
406014324,"HOLISTIC PALLIATIVE CARE, INC.",550004188,Parent Facility,5707 REDWOOD RD,OAKLAND,94619,1,ALAMEDA,Not Applicable,,Open,1/1/2018,Home Health Agency/Hospice,Hospice,37.79996,-122.17075
Python code
This script reads the IDs from the CSV file. Then, it fetches the results from URL and finally writes the desired contents to the disk.
import requests, json, csv

# read IDs from csv
facilityIds = []
with open('Facility List.csv', 'r') as f:
    csv_reader = csv.reader(f, delimiter=',')
    for searchterm in csv_reader:
        facilityIds.append(searchterm[0])

# fetch and write file contents
url = "https://siera.oshpd.ca.gov/FindFacility.aspx"
for facilityId in facilityIds:
    r = requests.get(url+"?term="+str(facilityId))
    reports = json.loads(r.content.decode('utf-8'))
    # print(f"reports = {reports}")
    for report in reports:
        rpt_id = report['RPT_ID']
        reporturl = f"https://siera.oshpd.ca.gov/DownloadPublicFile.aspx?archrptsegid={rpt_id}&reporttype=58&exportformatid=8&versionid=1&pageid=1"
        r = requests.get(reporturl)
        a = r.headers['Content-Disposition']
        filename = a[a.find("filename=")+9:len(a)]
        # print(f"filename = {filename}")
        with open(filename, "wb") as o:
            o.write(r.content)
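One fragile spot in both versions is the slice-based filename extraction from the Content-Disposition header. A small helper (a sketch, not part of the original answer; the function name is made up) handles missing headers and strips any surrounding quotes:

```python
def filename_from_disposition(header):
    # Extract the value after "filename=" from a Content-Disposition
    # header, stripping surrounding double quotes if present.
    marker = "filename="
    idx = header.find(marker)
    if idx == -1:
        return None  # no filename present in the header
    value = header[idx + len(marker):]
    return value.strip().strip('"')
```

This can replace the `a[a.find("filename=")+9:len(a)]` slice, which silently returns garbage when "filename=" is absent.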
I am struggling to add a delimiter to my output file. I am grabbing data off of twitter in an attempt to export it to a csv for excel. Can someone please look at my code and tell me what I need to alter to get this fixed.
Thanks!
for tweet in ts.search_tweets_iterable(tso):
    text = tweet['text']
    user = tweet['user']
    meta = ['created_at']
    vector = [user, text, meta]
    dataToFile = toFileThread(keyWords[0]+'_tweets.txt', vector)
    dataToFile.start()
    entries_added = entries_added + 1
    sys.stdout.flush()
    sys.stdout.write('\r'+'Entries added: ' + str(entries_added))
    current_amount_of_queries = ts.get_statistics()[0]
    if not last_amount_of_queries == current_amount_of_queries:
        last_amount_of_queries = current_amount_of_queries
        time.sleep(sleep_for)
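For reference (a sketch under assumptions, not taken from an answer in the thread; the helper name is made up), the csv module handles delimiting and quoting for you, so writing each vector through csv.writer avoids hand-rolling a delimiter:

```python
import csv
import io

def rows_to_csv_text(rows, delimiter=","):
    # Serialize rows with csv.writer; fields containing the delimiter
    # are quoted automatically (QUOTE_MINIMAL is the default).
    buf = io.StringIO()
    w = csv.writer(buf, delimiter=delimiter)
    for row in rows:
        w.writerow(row)
    return buf.getvalue()
```

The same csv.writer call works directly on a file object opened with newline='' if you want to write to disk instead of a string buffer.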
I'm using Tweepy to collect tweets from the Twitter API by their tweet ID.
I'm trying to read in a file full of the IDs, get the previous tweet from the conversation stream, then store that tweet and its author's screen name etc. in a text file. Some of the tweets have been deleted or the user's profile has been set to private, in which case I want to ignore that tweet and move on to the next. However, for some reason, I'm not collecting all accessible tweets. It's storing maybe 3/4 of all tweets that aren't private and haven't been deleted. Any ideas why it's not catching everything?
Thanks in advance.
def getTweet(tweetID, tweetObj, callTweetObj, i):
    tweet = callTweetObj.text.encode("utf8")
    callUserName = callTweetObj.user.screen_name
    callTweetID = tweetObj.in_reply_to_status_id_str
    with open("call_tweets.txt", "a") as calltweets:
        output = (callTweetObj.text.encode('utf-8') + "\t" + callTweetID + "\t" + tweetID)
        calltweets.write(output)
        print output
    with open("callauthors.txt", "a") as callauthors:
        cauthors = (callUserName + "\t" + "\t" + callTweetID + "\n")
        callauthors.write(cauthors)
    with open("callIDs.txt", "a") as callIDs:
        callIDs.write(callTweetID + "\n")
    with open("newResponseIDs.txt", "a") as responseIDs:
        responseIDs.write(tweetID)

count = 0
file = "Response_IDs.txt"
with open(file, 'r+') as f:
    lines = f.readlines()
    for i in range(0, len(lines)):
        tweetID = lines[i]
        sleep(5)
        try:
            tweetObj = api.get_status(tweetID)
            callTweetID = tweetObj.in_reply_to_status_id_str
            callTweetObj = api.get_status(callTweetID)
            getTweet(tweetID, tweetObj, callTweetObj, i)
            count = count + 1
            print count
        except:
            pass
You haven't given any information about the responses coming back from api.get_status, so it's hard to tell what the error is, especially since the bare except: pass swallows every failure silently.
However, it might be that you have reached the rate limit for the statuses/show/:id request. The API limits this request to 180 requests per window.
You can use Tweepy to call application/rate_limit_status:
response = api.rate_limit_status()
remaining = response['resources']['statuses']['/statuses/show/:id']['remaining']
assert remaining > 0
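As an illustration (a sketch; the function name is made up, but the nested keys follow the rate_limit_status response shape shown above), you can wrap that lookup in a small helper and only proceed when calls remain:

```python
def remaining_show_calls(rate_limit_response):
    # Pull the remaining-call count for statuses/show/:id out of a
    # rate_limit_status()-style response dict.
    resources = rate_limit_response['resources']
    return resources['statuses']['/statuses/show/:id']['remaining']
```

Tweepy can also handle this for you: constructing the client with tweepy.API(auth, wait_on_rate_limit=True) makes it sleep automatically whenever a limit is hit, which would remove the need for the manual check.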