I'm trying to print the output of scraping a twitter feed using snscrape. It works on command line but I can't get it to print to file.
My code:
import snscrape.modules.twitter as twitter
maxTweets = 10
keyword='salvation'
for i, tweet in enumerate(twitter.TwitterSearchScraper(keyword + ' since:2021-11-01 until:2023-01-01 lang:"en" ').get_items()):
    tweets = {
        "tweet.url": tweet.url
    }
    print(tweets)
It prints to the command line but when I try:
with open('file.txt', 'w', encoding="utf-8") as f:
    print(tweets, file=f)
then it won't print and I get an error message:
FutureWarning: username is deprecated, use user.username instead
Note: printing after you open the file with mode='w' (write mode) means the file is overwritten on every print, so only the last print will survive. If you want all the tweet objects preserved, you should append instead, inside the loop (if you print outside the loop, again only the last one is saved, because tweets is also overwritten on every iteration):
# for i, tweet in ...:
#     tweets = { ... }
    with open('file.txt', 'a', encoding="utf-8") as f:
        print(tweets, file=f)
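To see the difference between the two modes, here is a minimal, self-contained sketch (demo.txt is a throwaway name, not from the question) showing that 'w' truncates on every open while 'a' accumulates:

```python
# 'w' truncates the file each time it is opened:
with open('demo.txt', 'w') as f:
    print('first', file=f)
with open('demo.txt', 'w') as f:   # 'first' is gone after this open
    print('second', file=f)
print(open('demo.txt').read())     # only 'second' remains

# 'a' appends to whatever is already there:
with open('demo.txt', 'a') as f:
    print('third', file=f)
print(open('demo.txt').read())     # 'second' then 'third'
```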
This doesn't make any sense: that's a warning message, not an error message, and it shouldn't halt or break your program. Also, that warning shouldn't appear unless you have something like tweet.username somewhere in your code (and if you do, you should probably replace it with tweet.user.username, as the warning instructs).
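As a quick illustration that a warning is not an error, this sketch (the function name is made up) emits a FutureWarning and keeps running normally:

```python
import warnings

def old_api():
    # Analogous to snscrape warning about the deprecated username attribute.
    warnings.warn("username is deprecated, use user.username instead", FutureWarning)
    return "still works"

result = old_api()   # the warning goes to stderr; nothing is raised
print(result)        # prints "still works"
```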
I am unable to reproduce the error, and how you print shouldn't really have anything to do with triggering the message. But if that really is the only difference between when the message appears and when it doesn't, you can try another way of saving to the file, such as collecting all the tweets into a list inside the loop and then, after the loop, converting that list to a multi-line string to write out:
import snscrape.modules.twitter as twitter

maxTweets = 10
keyword = 'salvation'
timeStr = 'since:2021-11-01 until:2023-01-01 lang:"en"'
twGen = twitter.TwitterSearchScraper(f'{keyword} {timeStr}').get_items()
allTweets = []
for i, tweet in enumerate(twGen):
    if i >= maxTweets:
        break
    tweets = {
        "tweet.url": tweet.url
    }
    allTweets.append(tweets)

## [OUTSIDE loop]
with open('file.txt', 'w', encoding="utf-8") as f:
    f.write('\n'.join(str(t) for t in allTweets))
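If you later want to load the results back programmatically, a JSON file is easier to parse than stringified dicts. A minimal sketch (the plain dicts here stand in for the scraped data; file.json is a name I chose):

```python
import json

# Stand-ins for the scraped results; real code would append tweet fields here.
allTweets = [{"tweet.url": "https://example.com/1"},
             {"tweet.url": "https://example.com/2"}]

# Write the whole list once, after the loop, as valid JSON.
with open('file.json', 'w', encoding='utf-8') as f:
    json.dump(allTweets, f, indent=4)

# Reading it back gives the original list of dicts.
with open('file.json', encoding='utf-8') as f:
    loaded = json.load(f)
print(loaded[0]["tweet.url"])
```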
I trust all is well with everyone here. My apologies if this has been answered before; I am trying to do the following.
cursor = tweepy.Cursor(
    api.search_tweets,
    q='"Hello"',
    lang='en',
    result_type='recent',
    count=2
)
I want to match the number in count, to the number of json objects I will be iterating through.
for tweet in cursor.items():
    tweet_payload = json.dumps(tweet._json, indent=4, sort_keys=True)
I have tried several different ways to write the data, but it appears the following does not work (currently only a single tweet ends up in the file):
with open("Tweet_Payload.json", "w") as outfile:
    outfile.write(tweet_payload)
    time.sleep(.25)
    outfile.close()
This is what it looks like put together.
import time
import tweepy
from tweepy import cursor
import Auth_Codes
import json

twitter_auth_keys = {
    "consumer_key": Auth_Codes.consumer_key,
    "consumer_secret": Auth_Codes.consumer_secret,
    "access_token": Auth_Codes.access_token,
    "access_token_secret": Auth_Codes.access_token_secret
}

auth = tweepy.OAuthHandler(
    twitter_auth_keys["consumer_key"],
    twitter_auth_keys["consumer_secret"]
)
auth.set_access_token(
    twitter_auth_keys["access_token"],
    twitter_auth_keys["access_token_secret"]
)
api = tweepy.API(auth)

cursor = tweepy.Cursor(
    api.search_tweets,
    q='"Hello"',
    lang='en',
    result_type='recent',
    count=2
)

for tweet in cursor.items():
    tweet_payload = json.dumps(tweet._json, indent=4, sort_keys=True)
    with open("Tweet_Payload.json", "w") as outfile:
        outfile.write(tweet_payload)
        time.sleep(.25)
        outfile.close()
Edit:
Using the suggestion by Mickael, this is the current code:
tweet_payload = []
for tweet in cursor.items():
    tweet_payload.append(tweet._json)
    print(json.dumps(tweet_payload, indent=4, sort_keys=True))
    with open("Tweet_Payload.json", "w") as outfile:
        outfile.write(json.dumps(tweet_payload, indent=4, sort_keys=True))
    time.sleep(.25)
It just loops; I am not sure why that's the case when the count is 10. I thought it would make just one call for 10 results or fewer, then end.
Opening the file with the write mode erases its previous data so, if you want to add each new tweet to the file, you should use the append mode instead.
As an alternative, you could store all the tweets' JSON in a list and write them all at once. That should be more efficient, and having a list at the root of your JSON file will make the file valid JSON.
json_tweets = []
for tweet in cursor.items():
    json_tweets.append(tweet._json)

with open("Tweet_Payload.json", "w") as outfile:
    outfile.write(json.dumps(json_tweets, indent=4, sort_keys=True))
On a side note, the with statement closes the file automatically, so you don't need to call close() yourself.
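If you do want to save tweets incrementally as they arrive, one common pattern (a sketch of my own, not taken from the thread; tweets.jsonl is a name I chose) is JSON Lines: one JSON object per line, appended in 'a' mode, so the file stays parseable without holding everything in memory:

```python
import json

# Stand-ins for tweet._json payloads; real code would use the cursor.
payloads = [{"id": 1, "text": "Hello"}, {"id": 2, "text": "Hello again"}]

with open("tweets.jsonl", "a") as outfile:
    for payload in payloads:
        outfile.write(json.dumps(payload) + "\n")  # one object per line

# Reading back: parse each line independently.
with open("tweets.jsonl") as f:
    tweets = [json.loads(line) for line in f]
print(len(tweets))
```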
I'm making a script that fills a text document with responses from an API. The API is asked to convert usernames from a list to universally unique identifiers. I keep getting this error and can't find a way around it: "json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)"
Sample of accounts.txt
knapplace
Coppinator
tynow
Pman59
ButterMusty
FlyHighGuy13
Seyashi
fluzzygirl1
SquidMan55
leonrules9
BarthGimble
MTR_30
Darkshadow402
Deathmyster
Team_Everlook
Sheathok
KCFrost
mendog
Allfaal117
theLP25D
Zimyx
Blurrnis
redboy678
moose_breeder
kaser12345
import requests
import json

file1 = open('accounts.txt', 'r')
usernames = []
for line in file1:
    stripped_line = line.strip()
    usernames.append(stripped_line)
file1.close()

for x in usernames:
    username = str(x)
    url = ("https://api.mojang.com/users/profiles/minecraft/" + username + "?at=1462770000")
    y = requests.get(url)
    y_data = y.json()
    uuid = y_data['id']
    uuids = []
    uuids.append(uuid)
    file2 = open('uuids.txt', 'w')
    file2.writelines(uuids)
    file2.close()

file2 = open('uuids.txt', 'r')
lines = file2.readlines()
Note: @Ali makes a great point about checking for an empty reply. With that fix it works like a champ for me, with a few other minor changes:
Used usernames provided by OP instead of reading them in from a file.
Moved initialization of uuids out of for loop to avoid it being reset for each username.
Modified file I/O stuff to what I am more used to working with. ;^)
import requests
import json

usernames = [
    "knapplace",
    "Coppinator",
    "tynow",
]

uuids = []
for x in usernames:
    username = str(x)
    url = ("https://api.mojang.com/users/profiles/minecraft/" + username + "?at=1462770000")
    y = requests.get(url)
    if len(y.content) == 0:
        continue  # Skip processing this username
    y_data = y.json()
    uuid = y_data['id']
    uuids.append(uuid)

with open('uuids.txt', 'w') as f:
    for uuid in uuids:
        f.write(uuid + '\n')

with open('uuids.txt', 'r') as f:
    read_data = f.read()
print(read_data)
Output:
c9998bafea3146d5935f4e215b6b4351
5c321f81409847a0907c4b30c342217f
9f206def69bf407fbab6de7c9b70ff80
I checked the URL you pasted. If the user does not exist, the API does not return any content, but it still returns a successful status. That is what the error means: the decoder expected a JSON value starting at char 0 of the response body and found nothing.
Essentially, you need to handle the case when the response is empty before you try to execute a y.json() by checking y.content. If y.content is empty, skip processing the current username and go to the next one.
y = requests.get(url)
if len(y.content) == 0:
    continue  # Skip processing this username
# The rest of the code only runs if y.content is not empty.
y_data = y.json()
uuid = y_data['id']
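The same "Expecting value: line 1 column 1 (char 0)" error can be reproduced without the network: it is simply what the json module raises on an empty body. A small standalone sketch of the guard (the helper name safe_json is mine, not from the thread):

```python
import json

def safe_json(body):
    """Return the decoded JSON, or None if the body is empty."""
    if not body:          # covers '' and b'' - the empty-reply case
        return None
    return json.loads(body)

print(safe_json(''))                          # None, no exception raised
print(safe_json('{"id": "c9998baf"}')['id'])  # c9998baf

# Without the guard, an empty body raises exactly the reported error:
try:
    json.loads('')
except json.JSONDecodeError as e:
    print(e)   # Expecting value: line 1 column 1 (char 0)
```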
I am unable to write the result of the following code to a file
import boto3

ACCESS_KEY = "XXX"
SECRET_KEY = "XXX"
regions = ['us-east-1', 'us-west-1', 'us-west-2', 'eu-west-1', 'sa-east-1', 'ap-southeast-1', 'ap-southeast-2', 'ap-northeast-1']

for region in regions:
    client = boto3.client('ec2', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=region)
    addresses_dict = client.describe_addresses()
    #f = open('/root/temps','w')
    for eip_dict in addresses_dict['Addresses']:
        with open('/root/temps', 'w') as f:
            if 'PrivateIpAddress' in eip_dict:
                print(eip_dict['PublicIp'])
                f.write(eip_dict['PublicIp'])
This prints the IPs, but nothing gets written to the file. The output of the print is:
22.1.14.1
22.1.15.1
112.121.41.41
....
I just need the content written to the file in this same format.
for eip_dict in addresses_dict['Addresses']:
    with open('/root/temps', 'w') as f:
        if 'PrivateIpAddress' in eip_dict:
            print(eip_dict['PublicIp'])
            f.write(eip_dict['PublicIp'])
You are re-opening the file for writing at each iteration of the loop. Perhaps the last iteration has no members with 'PrivateIpAddress' in its dict, so the file gets opened, truncated, and left empty. Write it this way instead:
with open('/root/temps', 'w') as f:
    for eip_dict in addresses_dict['Addresses']:
        if 'PrivateIpAddress' in eip_dict:
            print(eip_dict['PublicIp'])
            f.write(eip_dict['PublicIp'])
Alternatively, open the file in append mode:
with open('/root/temps', 'a') as f:
or declare the file outside the loop.
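Note that file.write does not add a newline the way print does, so the code above would run the IPs together on one line. A self-contained sketch of the open-once, write-many pattern with explicit newlines (the address list here is made up to stand in for addresses_dict['Addresses']):

```python
# Hypothetical stand-in for addresses_dict['Addresses']
addresses = [
    {'PublicIp': '22.1.14.1', 'PrivateIpAddress': '10.0.0.1'},
    {'PublicIp': '22.1.15.1'},                       # no private IP: skipped
    {'PublicIp': '112.121.41.41', 'PrivateIpAddress': '10.0.0.2'},
]

with open('temps.txt', 'w') as f:                    # opened once, before the loop
    for eip_dict in addresses:
        if 'PrivateIpAddress' in eip_dict:
            f.write(eip_dict['PublicIp'] + '\n')     # newline after each address

print(open('temps.txt').read())   # one IP per line
```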
I'm using Tweepy to collect tweets from the Twitter API by their Tweet ID.
I'm trying to read in a file full of the IDs, get the previous tweet from the conversation stream, then store that tweet, its author's screen name, etc. in a text file. Some of the tweets have been deleted or the user's profile has been set to private, in which case I want to ignore that tweet and move on to the next. However, for some reason, I'm not collecting all accessible tweets: it's storing maybe 3/4 of the tweets that aren't private and haven't been deleted. Any ideas why it's not catching everything?
Thanks in advance.
def getTweet(tweetID, tweetObj, callTweetObj, i):
    tweet = callTweetObj.text.encode("utf8")
    callUserName = callTweetObj.user.screen_name
    callTweetID = tweetObj.in_reply_to_status_id_str

    with open("call_tweets.txt", "a") as calltweets:
        output = (callTweetObj.text.encode('utf-8') + "\t" + callTweetID + "\t" + tweetID)
        calltweets.write(output)
        print(output)

    with open("callauthors.txt", "a") as callauthors:
        cauthors = (callUserName + "\t" + "\t" + callTweetID + "\n")
        callauthors.write(cauthors)

    with open("callIDs.txt", "a") as callIDs:
        callIDs.write(callTweetID + "\n")

    with open("newResponseIDs.txt", "a") as responseIDs:
        responseIDs.write(tweetID)
count = 0
file = "Response_IDs.txt"
with open(file, 'r+') as f:
    lines = f.readlines()
    for i in range(0, len(lines)):
        tweetID = lines[i]
        sleep(5)
        try:
            tweetObj = api.get_status(tweetID)
            callTweetID = tweetObj.in_reply_to_status_id_str
            callTweetObj = api.get_status(callTweetID)
            getTweet(tweetID, tweetObj, callTweetObj, i)
            count = count + 1
            print(count)
        except:
            pass
You haven't included any information about the responses coming back from api.get_status, so it's hard to tell exactly what is failing.
However, you may have reached the rate limit for the statuses/show/:id request; the API limits it to 180 requests per rate-limit window.
You can use Tweepy to call application/rate_limit_status:
response = api.rate_limit_status()
remaining = response['resources']['statuses']['/statuses/show/:id']['remaining']
assert remaining > 0
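One likely reason for silently missing tweets is the bare except: pass in the loop, which swallows every failure (rate limits, network errors, encoding errors inside getTweet) without a trace. A small self-contained sketch (the failing function is simulated, not the real API) of why that pattern hides bugs, and an alternative that records what went wrong:

```python
def process(item):
    # Stand-in for api.get_status + getTweet; fails on some inputs.
    if item % 3 == 0:
        raise ValueError("simulated API failure for %d" % item)
    return item

collected, errors = [], []
for item in range(1, 10):
    try:
        collected.append(process(item))
    except Exception as e:       # at minimum, record the failure
        errors.append((item, str(e)))

print(len(collected))  # 6 succeeded
print(len(errors))     # 3 failed, and now you can see which ones and why
```

With except: pass, the three failures would vanish silently, which is exactly the "storing maybe 3/4 of all tweets" symptom.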
What I am trying to do:
I am trying to use open() in Python, and this is the script I am trying to execute. I give a restaurant name as input and a file gets saved (reviews.txt).
Script: (in short, the script goes to a page and scrapes the reviews)
from bs4 import BeautifulSoup
from urllib import urlopen

queries = 0
while queries < 201:
    stringQ = str(queries)
    page = urlopen('http://www.yelp.com/biz/madison-square-park-new-york?start=' + stringQ)
    soup = BeautifulSoup(page)
    reviews = soup.findAll('p', attrs={'itemprop': 'description'})
    authors = soup.findAll('span', attrs={'itemprop': 'author'})
    flag = True
    indexOf = 1
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if endOf + 1 == len(dirtyEntry):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf] + dirtyEntry[endOf+1:]
        f = open("reviews.txt", "a")
        f.write(cleanEntry)
        f.write("\n")
        f.close()
    queries = queries + 40
Problem:
It's using append mode 'a', and according to the documentation, 'w' is the write mode, which overwrites. When I change it to 'w', nothing happens.
f=open("reviews.txt", "w") #does not work!
Actual Question:
EDIT: Let me clear the confusion.
I just want ONE reviews.txt file with all the reviews. Every time I run the script, I want it to overwrite the existing reviews.txt with new reviews according to my input.
Thank you,
If I understand properly what behavior you want, then this should be the right code:
with open("reviews.txt", "w") as f:
    for review in reviews:
        dirtyEntry = str(review)
        while dirtyEntry.index('<') != -1:
            indexOf = dirtyEntry.index('<')
            endOf = dirtyEntry.index('>')
            if flag:
                dirtyEntry = dirtyEntry[endOf+1:]
                flag = False
            else:
                if endOf + 1 == len(dirtyEntry):
                    cleanEntry = dirtyEntry[0:indexOf]
                    break
                else:
                    dirtyEntry = dirtyEntry[0:indexOf] + dirtyEntry[endOf+1:]
        f.write(cleanEntry)
        f.write("\n")
This will open the file for writing only once and will write all the entries to it. Otherwise, if the open is nested in the for loop, the file is reopened for each review and thus overwritten by the next one.
The with statement ensures that the file is closed when the program leaves the block, even on an exception. It also makes the code easier to read.
I'd also suggest avoiding redundant parentheses in the if statement, so instead of
if(endOf+1 == len(dirtyEntry)):
it's better to write just
if endOf + 1 == len(dirtyEntry):
If you want every record written to a different new file, you must name each one differently; otherwise you are always overwriting your old data with new data and are left with only the latest record.
You could increment the filename like this:
# at the beginning, above the loop:
i = 1

# inside the loop:
f = open("reviews_{0}.txt".format(i), "a")
f.write(cleanEntry)
f.write("\n")
f.close()
i += 1
UPDATE
According to your recent update, I see that this is not what you want. To achieve what you want, you just need to move f = open("reviews.txt", "w") and f.close() outside of the for loop. That way you won't be reopening the file on every iteration and overwriting your previous entries:
f = open("reviews.txt", "w")
for review in reviews:
    # ... other code here ... #
    f.write(cleanEntry)
    f.write("\n")
f.close()
But, I encourage you to use with open("reviews.txt", "w") as described in Alexey's answer.