I trust all is well with everyone here. My apologies if this has been answered before, but I am trying to do the following.
cursor = tweepy.Cursor(
    api.search_tweets,
    q='"Hello"',
    lang='en',
    result_type='recent',
    count=2
)
I want the number of JSON objects I iterate through to match the number given in count.
for tweet in cursor.items():
    tweet_payload = json.dumps(tweet._json, indent=4, sort_keys=True)
I have tried several different ways to write the data, but it would appear that the following does not work (currently it only writes a single tweet):
with open("Tweet_Payload.json", "w") as outfile:
outfile.write(tweet_payload)
time.sleep(.25)
outfile.close()
This is what it looks like put together.
import time
import tweepy
from tweepy import cursor
import Auth_Codes
import json

twitter_auth_keys = {
    "consumer_key": Auth_Codes.consumer_key,
    "consumer_secret": Auth_Codes.consumer_secret,
    "access_token": Auth_Codes.access_token,
    "access_token_secret": Auth_Codes.access_token_secret
}

auth = tweepy.OAuthHandler(
    twitter_auth_keys["consumer_key"],
    twitter_auth_keys["consumer_secret"]
)
auth.set_access_token(
    twitter_auth_keys["access_token"],
    twitter_auth_keys["access_token_secret"]
)
api = tweepy.API(auth)

cursor = tweepy.Cursor(
    api.search_tweets,
    q='"Hello"',
    lang='en',
    result_type='recent',
    count=2
)

for tweet in cursor.items():
    tweet_payload = json.dumps(tweet._json, indent=4, sort_keys=True)
    with open("Tweet_Payload.json", "w") as outfile:
        outfile.write(tweet_payload)
        time.sleep(.25)
        outfile.close()
Edit:
Using the suggestion by Mickael, the current code is:
tweet_payload = []
for tweet in cursor.items():
    tweet_payload.append(tweet._json)

print(json.dumps(tweet_payload, indent=4, sort_keys=True))
with open("Tweet_Payload.json", "w") as outfile:
    outfile.write(json.dumps(tweet_payload, indent=4, sort_keys=True))
    time.sleep(.25)
It just loops; I am not sure why that's the case when the count is 10. I thought it would make just one call for 10 results or fewer, then end.
Opening the file in write mode erases its previous contents, so if you want to add each new tweet to the file, you should use append mode instead.
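For illustration, a minimal sketch of that append-mode variant (note that a file of concatenated JSON documents is not itself one valid JSON document):

for tweet in cursor.items():
    tweet_payload = json.dumps(tweet._json, indent=4, sort_keys=True)
    # "a" adds to the end of the file instead of erasing what is already there
    with open("Tweet_Payload.json", "a") as outfile:
        outfile.write(tweet_payload + "\n")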
As an alternative, you could also store all the tweets' JSON in a list and write them all at once. That should be more efficient, and the list at the root of your JSON file will make it valid.
json_tweets = []
for tweet in cursor.items():
    json_tweets.append(tweet._json)

with open("Tweet_Payload.json", "w") as outfile:
    outfile.write(json.dumps(json_tweets, indent=4, sort_keys=True))
On a side note, the with statement closes the file automatically; you don't need to do it yourself.
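Regarding the looping mentioned in the edit: cursor.items() with no argument keeps requesting further pages until the API stops returning results; count only sets how many tweets come back per request. Passing a limit to items() should stop the iteration after that many tweets, for example:

# cap the iteration at 10 tweets in total; count=10 only sets the page size
for tweet in cursor.items(10):
    tweet_payload.append(tweet._json)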
I'm trying to print the output of scraping a Twitter feed using snscrape. It works on the command line, but I can't get it to print to a file.
My code:
import snscrape.modules.twitter as twitter
maxTweets = 10
keyword='salvation'
for i, tweet in enumerate(twitter.TwitterSearchScraper(keyword + ' since:2021-11-01 until:2023-01-01 lang:"en" ').get_items()):
    tweets = {
        "tweet.url": tweet.url
    }
    print(tweets)
It prints to the command line but when I try:
with open('file.txt', 'w', encoding="utf-8") as f:
    print(tweets, file=f)
then it won't print and I get an error message:
FutureWarning: username is deprecated, use user.username instead
Note: printing after you open the file with mode='w' [in write mode] means the file will be overwritten with every print and only the last print will show up. If you want all the tweets objects preserved, you should be appending instead, inside the loop [because if you print outside the loop, again only the last will be saved, as tweets is also overwritten every loop]:
# for i, tweet in...
    # tweets = ....
    with open('file.txt', 'a', encoding="utf-8") as f:
        print(tweets, file=f)
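Keep in mind that print writes each dict's repr as a plain line of text, not JSON. If you later need the dicts back, a minimal sketch (assuming the file was written as above) would be:

import ast

with open('file.txt', encoding='utf-8') as f:
    # each line is the repr of a dict, so literal_eval can rebuild it
    tweets_list = [ast.literal_eval(line) for line in f if line.strip()]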
This doesn't make any sense - that's a warning message, not an error message, and it shouldn't halt or break your program; and also, that warning shouldn't appear unless you have something like tweet.username somewhere in your code [and if you do, then you should probably replace it with tweet.user.username as the warning instructs].
I am unable to reproduce the error, and how you print shouldn't really have anything to do with triggering the message; but if that's really the only difference between when the message appears and when it doesn't, then you can try some other method to save it to the file, like collecting all the tweets into a list [in the loop] and then [after the loop] joining that list into a multi-line string to write to the file:
import snscrape.modules.twitter as twitter

maxTweets = 10
keyword = 'salvation'
timeStr = 'since:2021-11-01 until:2023-01-01 lang:"en"'
twGen = twitter.TwitterSearchScraper(f'{keyword} {timeStr}').get_items()

allTweets = []
for i, tweet in enumerate(twGen):
    if i >= maxTweets: break  # enumerate starts at 0, so stop once maxTweets are collected
    tweets = {
        "tweet.url": tweet.url
    }
    allTweets.append(tweets)

## [OUTSIDE loop]
with open('file.txt', 'w', encoding="utf-8") as f:
    f.write('\n'.join([str(t) for t in allTweets]))
I am trying to append values to a JSON file. How can I append the data? I have tried so many ways, but none are working.
Code:
def all(title, author, body, type):
    title = "hello"
    author = "njas"
    body = "vgbhn"
    data = {
        "id": id,
        "author": author,
        "body": body,
        "title": title,
        "type": type
    }
    data_json = json.dumps(data)
    #data = ast.literal_eval(data)
    #print data_json
    if os.path.isfile("offline_post.json"):
        with open('offline_post.json', 'a') as f:
            new = json.loads(f)
            new.update(a_dict)
            json.dump(new, f)
    else:
        open('offline_post.json', 'a')
        with open('offline_post.json', 'a') as f:
            new = json.loads(f)
            new.update(a_dict)
            json.dump(new, f)
How can I append data to the JSON file when this function is called?
I suspect you left out that you're getting a TypeError in the blocks where you're trying to write the file. Here's where you're trying to write:
with open('offline_post.json', 'a') as f:
    new = json.loads(f)
    new.update(a_dict)
    json.dump(new, f)
There are a couple of problems here. First, you're passing a file object to json.loads, which expects a string. You probably meant to use json.load.
Second, you're opening the file in append mode, which places the pointer at the end of the file. When you run json.load, you're not going to get anything because it's reading at the end of the file. You would need to seek to 0 before loading (edit: this would fail anyway, as append mode is not readable).
Third, when you json.dump the new data to the file, it's going to append it to the file in addition to the old data. From the structure, it appears you want to replace the contents of the file (as the new data contains the old data already).
You probably want to use r+ mode, seeking back to the start of the file between the read and the write, and truncating at the end just in case the size of the data structure ever shrinks.
with open('offline_post.json', 'r+') as f:
    new = json.load(f)
    new.update(a_dict)
    f.seek(0)
    json.dump(new, f)
    f.truncate()
Alternatively, you can open the file twice:
with open('offline_post.json', 'r') as f:
    new = json.load(f)
    new.update(a_dict)

with open('offline_post.json', 'w') as f:
    json.dump(new, f)
This is a different approach: I just wanted to append without reloading all the data. It runs on a Raspberry Pi, so I want to look after memory. The test code:
import os
import json

json_file_exists = 0
filename = "/home/pi/scratch_pad/test.json"

# remove the json data from the last run
try:
    os.remove(filename)
except OSError:
    pass

count = 0
boiler = 90
tower = 78
while count < 10:
    if json_file_exists == 0:
        # create the json file with the first entry in a list
        # (json.dumps gives double-quoted strings, so the file is valid JSON)
        with open(filename, mode='w') as fw:
            json_string = "[\n\t" + json.dumps({'boiler': boiler, 'tower': tower}) + "\n]"
            fw.write(json_string)
        json_file_exists = 1
    else:
        # append to the json file without reloading it
        boiler = boiler + .01
        tower = tower + .02
        # strip trailing bytes until the file ends at the last entry's "}"
        with open(filename, mode='rb+') as f:
            f.seek(-1, os.SEEK_END)
            while f.read(1) != b"}":
                f.seek(-2, os.SEEK_CUR)
            f.truncate()
        # append the new entry and close the list again
        with open(filename, mode='a') as fw:
            json_string = "\n\t," + json.dumps({'boiler': boiler, 'tower': tower}) + "\n]"
            fw.write(json_string)
    count = count + 1
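Since the entries are produced with json.dumps, the file stays valid JSON, so a quick sanity check (a minimal sketch) is to load it back after the loop:

import json

with open(filename) as f:
    entries = json.load(f)  # should be a list of ten {"boiler": ..., "tower": ...} dicts
print(len(entries))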
I am trying to make a point system for my Twitch bot and I am encountering KeyErrors when trying to make a new entry for some odd reason. Here is my code:
import urllib2, json

def updateUsers(chan):
    j = urllib2.urlopen('http://tmi.twitch.tv/group/user/' + chan + '/chatters')
    j_obj = json.load(j)
    with open('dat.dat', 'r') as data_file:
        data = json.load(data_file)
    for usr in j_obj['chatters']['viewers']:
        data[usr]['Points'] = "0"  # Where the KeyError: u'someguysusername' occurs
    with open('dat.dat', 'w') as out_file:
        json.dump(data, out_file)

updateUsers('tryhard_clan')
If you want to see the JSON itself, go to http://tmi.twitch.tv/group/user/tryhard_clan/chatters
I'm storing user data in a file in this format:
{"users": {"cupcake": {"Points": "0"}}}
A slightly more concise form than @Raunak suggested:
data.setdefault(usr, {})['Points'] = "0"
That will set data[usr] to an empty dict if it's not already there, and set the 'Points' element in either case.
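A quick illustration of that behaviour with made-up data:

data = {"cupcake": {"Points": "5"}}
data.setdefault("newuser", {})['Points'] = "0"   # inserts the missing dict, then sets Points
data.setdefault("cupcake", {})['Points'] = "0"   # the existing dict is reused and Points overwritten
# data == {"cupcake": {"Points": "0"}, "newuser": {"Points": "0"}}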
It happens that the variable usr doesn't resolve to an existing key in data. Do this instead:
if usr not in data:
    data[usr] = {}
data[usr]['Points'] = "0"
I have a csv file with several hundred organism IDs and a second csv file with several thousand organism IDs and additional characteristics (taxonomic information, abundances per sample, etc.).
I am trying to write code that will extract the information from the larger csv using the smaller csv file as a reference. Meaning it will look at both the smaller and larger files, and if an ID is in both files, it will extract all the information from the larger file and write it to a new file (basically write the entire row for that ID).
So far I have written the following, and while the code does not error out on me, I get a blank file in the end and I don't exactly know why. I am a graduate student who knows some simple coding, but I'm still very much a novice.
Thank you!
import sys
import csv
import os.path

SparCCnames = open(sys.argv[1], "rU")
OTU_table = open(sys.argv[2], "rU")
new_file = open(sys.argv[3], "w")

Sparcc_OTUs = csv.writer(new_file)
d = csv.DictReader(SparCCnames)
ids = csv.DictReader(OTU_table)

for record in ids:
    idstopull = record["OTUid"]
    if idstopull[0] == "OTUid":
        continue
    if idstopull[0] in d:
        new_id.writerow[idstopull[0]]

SparCCnames.close()
OTU_table.close()
new_file.close()
I'm not sure what you're trying to do in your code but you can try this:
import csv

def csv_to_dict(csv_file_path):
    csv_file = open(csv_file_path, 'rb')
    csv_file.seek(0)
    sniffdialect = csv.Sniffer().sniff(csv_file.read(10000), delimiters='\t,;')
    csv_file.seek(0)
    dict_reader = csv.DictReader(csv_file, dialect=sniffdialect)
    csv_file.seek(0)
    dict_data = []
    for record in dict_reader:
        dict_data.append(record)
    csv_file.close()
    return dict_data

def dict_to_csv(csv_file_path, dict_data):
    csv_file = open(csv_file_path, 'wb')
    writer = csv.writer(csv_file, dialect='excel')
    headers = dict_data[0].keys()
    writer.writerow(headers)
    # headers must be the same as dat.keys()
    for dat in dict_data:
        line = []
        for field in headers:
            line.append(dat[field])
        writer.writerow(line)
    csv_file.close()

if __name__ == "__main__":
    big_csv = csv_to_dict('/path/to/big_csv_file.csv')
    small_csv = csv_to_dict('/path/to/small_csv_file.csv')
    output = []
    for s in small_csv:
        for b in big_csv:
            if s['id'] == b['id']:
                output.append(b)
    if output:
        dict_to_csv('/path/to/output.csv', output)
    else:
        print "Nothing."
Hope that will help.
You need to read the data into a data structure; assuming OTUid is unique, you can store the rows in a dictionary keyed by OTUid for fast lookup:
import sys
import csv

with open(sys.argv[1], "rU") as SparCCnames:
    d = csv.DictReader(SparCCnames)
    fieldnames = d.fieldnames
    # map each OTUid to its full row for fast lookup
    data = {i['OTUid']: i for i in d}

with open(sys.argv[2], "rU") as OTU_table, open(sys.argv[3], "w") as new_file:
    Sparcc_OTUs = csv.DictWriter(new_file, fieldnames)
    Sparcc_OTUs.writeheader()  # write the header row to the output file
    ids = csv.DictReader(OTU_table)
    for record in ids:
        if record['OTUid'] in data:
            Sparcc_OTUs.writerow(data[record['OTUid']])
Thank you everyone for your help. I played with things and consulted with an advisor, and finally got a working script. I am posting it in case it helps someone else in the future.
Thanks!
import sys
import csv

input_file = csv.DictReader(open(sys.argv[1], "rU"))  # has all info
ref_list = csv.DictReader(open(sys.argv[2], "rU"))  # reference list
output_file = csv.DictWriter(
    open(sys.argv[3], "w"), input_file.fieldnames)  # to write output file with headers
output_file.writeheader()  # write headers in output file

white_list = {}  # create empty dictionary
for record in ref_list:  # for every line in my reference list
    white_list[record["Sample_ID"]] = None  # store the IDs into the dictionary as keys

for record in input_file:  # for every line in my input file
    record_id = record["Sample_ID"]  # store the ID into variable record_id
    if record_id in white_list:  # if the ID is in the reference list
        output_file.writerow(record)  # write the entire row into the new file
    else:  # if it is not in my reference list
        continue  # ignore it and continue iterating through the file
I have some simple code to ingest some JSON Twitter data and output some specific fields into separate columns of a CSV file. My problem is that I cannot for the life of me figure out the proper way to encode the output as UTF-8. Below is the closest I've been able to get, with the help of a member here, but it still isn't running correctly; it fails because of the unique characters in the tweet text field.
import json
import sys
import csv
import codecs

def main():
    writer = csv.writer(codecs.getwriter("utf-8")(sys.stdout), delimiter="\t")
    for line in sys.stdin:
        line = line.strip()
        data = []
        try:
            data.append(json.loads(line))
        except ValueError as detail:
            continue
        for tweet in data:
            ## deletes any rate limited data
            if tweet.has_key('limit'):
                pass
            else:
                writer.writerow([
                    tweet['id_str'],
                    tweet['user']['screen_name'],
                    tweet['text']
                ])

if __name__ == '__main__':
    main()
From Docs:
https://docs.python.org/2/howto/unicode.html
a = "string"
encodedstring = a.encode('utf-8')
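In Python 2 the csv module doesn't handle unicode directly, so one hedged sketch, reusing the fields from the question's writerow call, is to encode each value just before writing it:

# encode only unicode values; byte strings pass through unchanged
row = [tweet['id_str'], tweet['user']['screen_name'], tweet['text']]
writer.writerow([field.encode('utf-8') if isinstance(field, unicode) else field
                 for field in row])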
If that does not work:
Python DictWriter writing UTF-8 encoded CSV files
I have had the same problem. I have a large amount of data from the Twitter firehose, so every possible complication case has arisen!
I've solved it as follows using try / except:
If the dict value is a string (isinstance(value, basestring)), I try to encode it straight away. If it is not a string, I make it a string and then encode it.
If this fails, it's because some joker is tweeting odd symbols to mess up my script. In that case, I first decode and then re-encode: value.decode('utf-8').encode('utf-8') for strings, and str(value.decode('utf-8')).encode('utf-8') for non-strings (decode, make into a string, and re-encode).
Have a go with this:
import csv

def export_to_csv(list_of_tweet_dicts, export_name="flat_twitter_output.csv"):
    utf8_flat_tweets = []
    keys = []
    for tweet in list_of_tweet_dicts:
        tmp_tweet = tweet
        for key, value in tweet.iteritems():
            if key not in keys: keys.append(key)
            # convert fields to utf-8 if text
            try:
                if isinstance(value, basestring):
                    tmp_tweet[key] = value.encode('utf-8')
                else:
                    tmp_tweet[key] = str(value).encode('utf-8')
            except:
                if isinstance(value, basestring):
                    tmp_tweet[key] = value.decode('utf-8').encode('utf-8')
                else:
                    tmp_tweet[key] = str(value.decode('utf-8')).encode('utf-8')
        utf8_flat_tweets.append(tmp_tweet)
        del tmp_tweet
    list_of_tweet_dicts = utf8_flat_tweets
    del utf8_flat_tweets
    with open(export_name, 'w') as f:
        dict_writer = csv.DictWriter(f, fieldnames=keys, quoting=csv.QUOTE_ALL)
        dict_writer.writeheader()
        dict_writer.writerows(list_of_tweet_dicts)
    print "exported tweets to '" + export_name + "'"
    return list_of_tweet_dicts
hope that helps you.