I think I'm overlooking something obvious, so I need a fresh pair of eyes. My script should search for tweets, then write each tweet's date, username, and text on one row, one value per column, with the next matching tweet on a new row, and so on. Printing the returned Twitter object values confirms everything is OK: I can print and separate the data for each tweet. However, when writing to Excel, my loop just writes the first tweet n times, without the remaining tweets.
Code:
print('TEST PRINT...')
for tweet in tweepy.Cursor(api.search, search).items(numberOfTweets):
    print(tweet.created_at)
    print(tweet.user.screen_name)
    print(tweet.text)
    print('\n')

for tweet in tweepy.Cursor(api.search, search).items(numberOfTweets):
    for rowNum in range(3, sheet.max_row):
        sheet.cell(row=rowNum, column=1).value = tweet.created_at
        sheet.cell(row=rowNum, column=2).value = tweet.user.screen_name
        sheet.cell(row=rowNum, column=3).value = tweet.text
        break
The second code block is the issue. How can I write the three above tweet values for each tweet on separate rows?
Thanks in advance...
Yes, you are writing the same tweet each time: the inner loop over rows restarts for every tweet and breaks after the first row, so only the current tweet's values ever land in the sheet. Keep a single row counter outside the loop and advance it once per tweet. Try this (I couldn't test it):

rowNum = 3  # openpyxl rows are 1-indexed; start at whichever row your data begins
for tweet in tweepy.Cursor(api.search, search).items(numberOfTweets):
    sheet.cell(row=rowNum, column=1).value = tweet.created_at
    sheet.cell(row=rowNum, column=2).value = tweet.user.screen_name
    sheet.cell(row=rowNum, column=3).value = tweet.text
    rowNum = rowNum + 1
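The counter pattern can be checked without Twitter access or an actual workbook. Below is a minimal sketch with made-up data: the `FakeTweet` tuple stands in for tweepy's status objects, and a plain list stands in for the worksheet; each tweet advances the row counter exactly once.

```python
from collections import namedtuple

# Stand-in for tweepy status objects (hypothetical sample data)
FakeTweet = namedtuple("FakeTweet", "created_at screen_name text")
tweets = [
    FakeTweet("2021-01-01", "alice", "first tweet"),
    FakeTweet("2021-01-02", "bob", "second tweet"),
]

rows = []
rowNum = 3  # data starts at row 3, as in the question's sheet
for tweet in tweets:
    # with openpyxl this would be: sheet.cell(row=rowNum, column=1).value = ...
    rows.append((rowNum, tweet.created_at, tweet.screen_name, tweet.text))
    rowNum += 1

print(rows[0])  # (3, '2021-01-01', 'alice', 'first tweet')
```

Each tweet gets its own row number, which is exactly what the inner-loop-plus-break version failed to do.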
I can't get the full text of an Arabic tweet, even though I use tweet_mode='extended' and tweet.full_text.encode('utf-8'). I don't know what the problem is.
HashValue = "#لقاح"
StartDate = "2022-01-02"
csvFile = open(HashValue + '.csv', 'a', encoding='utf-8')
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.search,
                           q=HashValue,
                           count=20,
                           lang="ar",
                           since=StartDate,
                           tweet_mode='extended').items():
    print(tweet.created_at, tweet.full_text)
    csvWriter.writerow([tweet.created_at, tweet.full_text.encode('utf-8')])
print("Scraping finished and saved to " + HashValue + ".csv")
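For what it's worth, a likely culprit here is the `.encode('utf-8')` call: the file is already opened with `encoding='utf-8'`, so handing bytes to `csv.writer` makes it write their `repr` (`b'...'`) instead of the Arabic text. A small sketch with an in-memory buffer standing in for the real file shows the difference:

```python
import csv
import io

text = "لقاح"  # sample Arabic text

buf = io.StringIO()  # stands in for a file opened with encoding='utf-8'
writer = csv.writer(buf)
writer.writerow(["2022-01-02", text.encode("utf-8")])  # bytes: their repr lands in the CSV
writer.writerow(["2022-01-02", text])                  # str: written as readable text

lines = buf.getvalue().splitlines()
print(lines[0])  # the escaped bytes repr, starting with b'...
print(lines[1])  # 2022-01-02,لقاح
```

In short: with a text-mode file, pass `tweet.full_text` directly and let the file's `encoding='utf-8'` do the encoding.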
I'm in need of some advice on my Twitter sentiment analysis.
I'm trying to do a fairly common sentiment analysis, but not on random tweets from a Twitter search: on the tweets of selected users.
What I've tried so far: I read in a CSV of the users, iterate over that list, and run the tweet analysis user by user.
I'll put my write_tweets function here, just so it can get some feedback maybe :)
def write_tweets(users_df, file):
    # If the file exists, read the existing data from the CSV file
    if os.path.exists(file):
        df = pd.read_csv(file, header=0)
    else:
        df = pd.DataFrame(columns=COLS)
    # page attribute in tweepy.Cursor and iteration
    for user in users_df[0]:
        print(user)
        try:
            for status in tweepy.Cursor(api.user_timeline, screen_name=user,
                                        count=1, tweet_mode="extended").items(1):
                new_entry = []
                status = status._json
                # skip tweets older than the start date
                if to_datetime(status['created_at']) < startDate:
                    continue
                # check whether the tweet is in English or skip to the next tweet
                if status['lang'] != 'en':
                    continue
                # basic preprocessing
                clean_text = clean(status['full_text'])
                # call clean_tweets for extra preprocessing
                filtered_tweet = clean_tweets(clean_text)
                # TextBlob sentiment calculations
                blob = TextBlob(filtered_tweet)
                blob_2 = TextBlob(filtered_tweet, analyzer=NaiveBayesAnalyzer())
                Sentiment = blob.sentiment
                Sentiment_2 = blob_2.sentiment
                # separate polarity and subjectivity into two variables
                polarity = Sentiment.polarity
                subjectivity = Sentiment.subjectivity
                positivity = Sentiment_2.p_pos
                negativity = Sentiment_2.p_neg
                # append the new entry
                new_entry += [status['id'], status['created_at'],
                              status['source'],
                              filtered_tweet, Sentiment, polarity, subjectivity,
                              positivity, negativity, status['lang'],
                              status['favorite_count'], status['retweet_count']]
                # append the original author of the tweet
                new_entry.append(status['user']['screen_name'])
                try:
                    is_sensitive = status['possibly_sensitive']
                except KeyError:
                    is_sensitive = None
                new_entry.append(is_sensitive)
                # hashtags and mentions are saved comma-separated
                hashtags = ", ".join([hashtag_item['text'] for hashtag_item in status['entities']['hashtags']])
                new_entry.append(hashtags)
                mentions = ", ".join([mention['screen_name'] for mention in status['entities']['user_mentions']])
                new_entry.append(mentions)
                # get the location of the tweet if possible
                try:
                    location = status['user']['location']
                except TypeError:
                    location = ''
                new_entry.append(location)
                try:
                    coordinates = [coord for loc in status['place']['bounding_box']['coordinates'] for coord in loc]
                except TypeError:
                    coordinates = None
                new_entry.append(coordinates)
                single_tweet_df = pd.DataFrame([new_entry], columns=COLS)
                df = df.append(single_tweet_df, ignore_index=True)
                csvFile = open(file, 'a', encoding='utf-8')
        except Exception:
            pass
    df.to_csv(csvFile, mode='a', columns=COLS, index=False, encoding="utf-8")

write_tweets(users_list, test_file)
The output is a few indicators of sentiment: positivity, negativity, neutrality, etc.
My question is: maybe some of you have done this kind of thing already and can give me some recommendations? My version seems very slow and not very efficient (to me, at least).
Thanks in advance
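One pattern that usually helps regardless of the Twitter part: appending to a DataFrame (and reopening the CSV) once per tweet is expensive, while buffering plain rows and writing them in a single pass at the end is cheap. A stdlib-only sketch of that shape, where the `rows_from_statuses` helper and the sample dicts are made up for illustration:

```python
import csv
import io

def rows_from_statuses(statuses):
    # Hypothetical helper: flatten already-fetched status dicts into rows
    return [(s["id"], s["user"], s["text"]) for s in statuses]

statuses = [
    {"id": 1, "user": "alice", "text": "hello"},
    {"id": 2, "user": "bob", "text": "world"},
]

buf = io.StringIO()  # stands in for the output CSV file
writer = csv.writer(buf)
writer.writerow(["id", "user", "text"])
writer.writerows(rows_from_statuses(statuses))  # one write for all rows

print(buf.getvalue().splitlines())  # ['id,user,text', '1,alice,hello', '2,bob,world']
```

Applied to the function above, that would mean collecting each `new_entry` in a list and building/writing the DataFrame once after the loop, instead of `df.append` plus `open(file, 'a', ...)` per tweet.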
I want to check for a name in a column of an existing spreadsheet and, if it exists, update a specific column with a timestamp for that row. I'm in a rut because I can't figure out how to go about this without a for loop. The for loop appends more rows for the names it didn't match, and nothing shows up in the column when I try to stamp it after matching a name to a row.
for rowNum in range(2, ws1.max_row):
    log_name = ws1.cell(row=rowNum, column=1).value
    if log_name == chkout_new_name_text:
        print 'apple' + 'pen'
        ws1.cell(row=rowNum, column=2).value = str(time.strftime("%m/%d/%y %H:%M %p"))
        break
    else:
        continue
print 'pen' + 'pineapple'
# Normal procedure
Any help will be greatly appreciated.
Figured it out
name_text = raw_input("Please enter name: ")
matching_row_nbr = None
for rowNum in range(2, ws1.max_row + 1):
    log_name = ws1.cell(row=rowNum, column=1).value
    if log_name == name_text:
        # Checks for a matching row and remembers the row number
        matching_row_nbr = rowNum
        break
if matching_row_nbr is not None:
    # Uses the matching row number to change the cell value of the specific row
    ws1.cell(row=matching_row_nbr, column=6).value = str(time.strftime("%m/%d/%y %H:%M - %p"))
    wb.save(filename=active_workbook)
else:
    # If none of the rows match, continue with the intended use of new data
    print name_text
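If the sheet is scanned repeatedly (one lookup per user input), it may be worth reading column A once into a name-to-row dict and doing O(1) lookups after that. A sketch with a plain list standing in for the column values (data rows start at 2, as above):

```python
# Column A values for rows 2..4 (made-up sample data)
names = ["alice", "bob", "carol"]

# Build the lookup once: name -> row number
name_to_row = {name: row for row, name in enumerate(names, start=2)}

print(name_to_row.get("bob"))   # 3
print(name_to_row.get("dave"))  # None (no matching row)
```

With openpyxl, `names` would come from iterating the cells once; every subsequent lookup then avoids rescanning the sheet.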
I have a JSON text file containing tweets from a certain hashtag. I want to transform it into a matrix with a row for each tweet and columns such as user, time, latitude, longitude, and so on. I have written the following code, but the information is not saved to the output file: it shows just the header row.
# import modules
import json
from csv import writer

# input file
tweets = ()
for line in open('file.txt'):
    try:
        tweets.append(json.loads(line))
    except:
        pass

# variables
ids = [tweet['id_str'] for tweet in tweets]
times = [tweet['created_at'] for tweet in tweets]
users = [tweet['user']['name'] for tweet in tweets]
texts = [tweet['text'] for tweet in tweets]
lats = [(T['geo']['coordinates'][0] if T['geo'] else None) for T in tweets]
lons = [(T['geo']['coordinates'][1] if T['geo'] else None) for T in tweets]
place_names = [(T['place']['full_name'] if T['place'] else None) for T in tweets]
place_types = [(T['place']['place_type'] if T['place'] else None) for T in tweets]

# output file
out = open('tweets_file.csv', 'w')
print >> out, 'id,created,text,user,lat,lon,place name,place type'
rows = zip(ids, times, texts, users, lats, lons, place_names, place_types)
csv = writer(out)
for row in rows:
    values = [(value.encode('utf8') if hasattr(value, 'encode') else value) for value in row]
    csv.writerow(values)
out.close()
Could you please help me find and fix the bug? Thanks in advance.
R.
In your code, tweets is a tuple, so tweets.append(...) fails with:

AttributeError: 'tuple' object has no attribute 'append'

It seems you have copy-pasted code from several sources without understanding what it does.

import json
from csv import writer

with open('file.txt') as data_file:
    data = json.load(data_file)

tweets = data['statuses']
ids = [tweet['id_str'] for tweet in tweets]
times = [tweet['created_at'] for tweet in tweets]
...
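If the input really is one JSON object per line (as the original loop suggests, rather than a single Search-API response), the minimal fix is just to make tweets a list instead of a tuple. A self-contained sketch of that per-line variant, with two made-up tweets inlined via `io.StringIO` in place of the real file:

```python
import io
import json

# Two sample tweets, one JSON object per line (made-up data)
raw = io.StringIO(
    '{"id_str": "1", "text": "hello"}\n'
    '{"id_str": "2", "text": "world"}\n'
)

tweets = []  # a list supports .append; a tuple does not
for line in raw:
    try:
        tweets.append(json.loads(line))
    except ValueError:  # skip malformed lines instead of crashing
        pass

ids = [tweet["id_str"] for tweet in tweets]
print(ids)  # ['1', '2']
```

Which shape applies depends on how file.txt was produced: `json.load` for one big response object, per-line `json.loads` for a stream of tweets.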
I am uploading a CSV file and would like to loop through the data and insert each cell into the database. Here is the Python code:
import csv

@app.route("/uploadcsv", methods=['POST'])
def uploadcsv():
    myfile = request.files['file']
    r = csv.reader(myfile)
    headers = r.next()
    for row in r:
        print str(row[0])
        print row[1]
        print row[2]
        print row[3]
        print row[4]
        print row[5]
        print row[6]
        print row[7]
        print row[8]
        # put into database
        return "OK"
There are currently 3 rows in the CSV file (many more later), but only the first row is printed. How can I print all the rows?
The csv file is:
first_name,last_name,email,phone,designation,company,industry,tag,created_at
john,smith,john@example.com,1234567,some designation,some company,some industry,some tag,now()
Reduce the indentation of the return "OK" statement by one level. As currently written, it returns from uploadcsv() as soon as the first row has been printed, and not, as you intended, after the for loop.
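The corrected shape can be sketched without Flask: an in-memory file stands in for the upload, and the Python 3 spelling `next(r)` replaces `r.next()`. The point is only the placement of `return` after the loop.

```python
import csv
import io

def uploadcsv(myfile):
    # myfile stands in for request.files['file'] (hypothetical upload object)
    r = csv.reader(myfile)
    headers = next(r)  # skip the header row
    seen = []
    for row in r:
        seen.append(row[0])  # a real app would insert the row into the database here
    return "OK", headers, seen  # return AFTER the loop, not inside it

myfile = io.StringIO(
    "first_name,last_name,email\n"
    "john,smith,john@example.com\n"
    "jane,doe,jane@example.com\n"
)
result = uploadcsv(myfile)
print(result)  # ('OK', ['first_name', 'last_name', 'email'], ['john', 'jane'])
```

With `return` dedented, every row is processed before the function responds.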