Tweets streamed using tweepy, reading JSON file in Python

I streamed tweets using the following code:
import tweepy

class CustomStreamListener(tweepy.StreamListener):
    def on_data(self, data):
        try:
            with open('brasil.json', 'a') as f:
                f.write(data)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True
Now I have a JSON file (brasil.json). I want to open it in Python to do sentiment analysis, but I can't find a way. I managed to open the first tweet using this:
import json

tweets = []
with open('brasil.json') as f:
    for line in f:
        tweets.append(json.loads(line))
but it doesn't read the rest of the tweets. Any idea?

From the comments: after examining the contents of the JSON data file, it turned out that all the tweets are on the odd-numbered rows and the even-numbered rows are blank.
Passing a blank line to json.loads() is what caused the json.decoder.JSONDecodeError.
There are two ways to handle this error: either read only the odd rows or use exception handling.
Using only the odd rows:
import json

tweets = []
with open('brasil.json') as f:
    for n, line in enumerate(f, 1):
        if n % 2 == 1:  # this line is in an odd-numbered row
            tweets.append(json.loads(line))
Using exception handling:
import json

tweets = []
with open('brasil.json', 'r') as f:
    for line in f:
        try:
            tweets.append(json.loads(line))
        except json.decoder.JSONDecodeError:
            pass  # skip this line
Try both and see which one works best.
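A third option combines the two: skip lines that are empty after stripping whitespace, and still catch JSONDecodeError for lines that are genuinely malformed (for example, a tweet cut off when the stream was stopped). This is a minimal sketch assuming one tweet per line, as in brasil.json; the function name load_tweets is hypothetical. It also reports which lines failed instead of silently dropping them.

```python
import json


def load_tweets(lines):
    """Parse one tweet (a JSON object) per non-blank line.

    Returns (tweets, bad_line_numbers); bad_line_numbers holds the
    1-based numbers of non-blank lines that were not valid JSON.
    """
    tweets, bad_lines = [], []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip the blank rows between tweets
        try:
            tweets.append(json.loads(line))
        except json.JSONDecodeError:
            bad_lines.append(line_no)  # e.g. a record truncated mid-write
    return tweets, bad_lines


# Usage (assumes brasil.json is in the working directory):
# with open('brasil.json') as f:
#     tweets, bad_lines = load_tweets(f)
```

Because it accepts any iterable of strings, the same function works on an open file or on a list of lines.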

Related

Python csv get original raw data line

In Python it is easy to read and parse a CSV file and process it line by line:
reader = csv.reader(open("my_csv_file.csv"))
for row in reader:
    # row is a list of field values
    parsed_data = my_data_parser(row)
where my_data_parser is my own function that takes the input data, parses it, and applies some logic.
If my parser fails, I would like to log the entire original line of the CSV file, but it seems the csv reader no longer gives me access to it.
Is it possible to retrieve the original raw line data?
It doesn't seem like csv.reader() exposes the file object it's iterating over. However, you could use the reader's line_num attribute to achieve what you want.
For example:
import csv

with open("my_csv_file.csv") as file:
    lines = file.readlines()

reader = csv.reader(lines)
for row in reader:
    # row is a list of field values
    try:
        parsed_data = my_data_parser(row)
    except MyDataParserError:  # your parser's own exception type
        print(f"ERROR in line number {reader.line_num}")
        print("Full line:")
        # line_num is 1-based, so the current line is at index line_num - 1
        print(lines[reader.line_num - 1])
Alternative
If you'd like to avoid always loading the file into memory, you could instead keep your initial way of reading the file and only read the whole file into memory if an error occurred:
import csv

with open("my_csv_file.csv") as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        # row is a list of field values
        try:
            parsed_data = my_data_parser(row)
        except MyDataParserError:
            # Only read the whole file into memory when an error occurred.
            with open("my_csv_file.csv") as file:
                lines = file.readlines()
            print(f"ERROR in line number {reader.line_num}")
            print("Full line:")
            # line_num is 1-based, so the current line is at index line_num - 1
            print(lines[reader.line_num - 1])
You can access the row's line number with reader.line_num, but there seems to be no direct way to access the actual line (according to the documentation). Here is an iterative method that avoids reading the whole file into memory at any step:
import csv

class MyException(Exception):
    pass

def super_logic(line):  # Some silly logic to get test code running
    if len(line) != 2 or line[1] != '1':
        raise MyException("Invalid value")
    print("Process: %s" % line)

class LastLineReader:
    def __init__(self, fn):
        self.fid = open(fn)

    def __iter__(self):
        return self

    def __next__(self):
        line = self.fid.readline()  # Read a single line and cache it on this object
        if len(line) == 0:
            self.fid.close()
            raise StopIteration()
        self.current_line = line.strip()
        return line

reader_with_lines = LastLineReader("my_csv_file.csv")
reader = csv.reader(reader_with_lines)
for line in reader:
    try:
        super_logic(line)
    except MyException as e:
        print("Got exception: %s at line '%s'" % (e, reader_with_lines.current_line))
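The same idea can also be written as a generator instead of a class. This is a minimal sketch (rows_with_raw is a hypothetical name, not part of the csv module). It relies on csv.reader requesting the next line from its input only when it needs it to finish the current record, which is how CPython's implementation behaves, so every physical line of a quoted field that spans several lines is captured, not just the last one.

```python
import csv


def rows_with_raw(lines):
    """Yield (row, raw_text) for each CSV record read from `lines`.

    raw_text joins every physical line the csv module consumed for that
    record, so a quoted field spanning several lines is reported in full.
    """
    consumed = []  # physical lines read since the last record was yielded

    def recording(source):
        # Pass lines through to csv.reader while remembering them.
        for line in source:
            consumed.append(line)
            yield line

    for row in csv.reader(recording(lines)):
        raw = "".join(consumed)
        consumed.clear()
        yield row, raw


# Usage sketch; my_data_parser and MyDataParserError are hypothetical here:
# with open("my_csv_file.csv", newline="") as f:
#     for row, raw in rows_with_raw(f):
#         try:
#             my_data_parser(row)
#         except MyDataParserError:
#             print("Failed on raw text:", raw)
```

Only the lines of the current record are held in memory at any time.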
As an alternative to reader.line_num:
for index, row in enumerate(reader):
    print(index + 1, row)
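The two counters are not always equal: enumerate() counts records, while reader.line_num counts the physical lines the reader has consumed so far. They diverge as soon as a quoted field contains a newline. A small sketch with hypothetical sample data:

```python
import csv

# Hypothetical data: the second record's first field is quoted and
# contains a newline, so that record spans two physical lines.
lines = ['a,1\n', '"multi\n', 'line",2\n', 'c,3\n']

reader = csv.reader(lines)
counts = []
for index, row in enumerate(reader, start=1):
    # index counts records; reader.line_num counts source lines consumed
    counts.append((index, reader.line_num))
    print(index, reader.line_num, row)
```

For this input the (record, line) pairs are (1, 1), (2, 3) and (3, 4), so use line_num when you need to point a user at a line in the file.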

Python - how to make loop continue after Try Catch?

I have the following Python code. It works, but when it hits an error it jumps straight to the last line.
Then I remove the problematic line from the file and run the script again, but it finds the next problematic line and jumps to the end again.
I want to be able to print all lines without jumping to the end of the script (just skip the bad line and continue with the next one):
import csv
with open('data.tsv', "rb") as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            continue
    except csv.Error, e:
        print reader.line_num, e
        pass
print "End of file!\n"
Iterate manually, in the hope that the csv reader object can recover from the exception:
import csv
with open('data.tsv', "r") as f:
    reader = csv.reader(f)
    while True:
        try:
            row = next(reader)
            print(row)
        except csv.Error as e:
            print("line: {}, error: {}".format(reader.line_num, e))
        except StopIteration:
            break
print("End of file!\n")
The StopIteration exception is raised when the csv.reader object reaches the end of the file. At that point, break exits the infinite loop.
Let's test this by inserting a NUL byte in a row. An easy way is to replace f with a list of rows:
data = """hello,world
foo,bar
hi\x00,I'm joe
recovered,yeah
"""
f = data.splitlines()
Now f can be fed to csv.reader with the code above (remove the with block). Note the NUL byte inserted in the third line. Output:
['hello', 'world']
['foo', 'bar']
line: 3, error: line contains NULL byte
['recovered', 'yeah']
End of file!
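It works! As a bonus, the code is compatible with both Python 2 and Python 3. Note that Python 3.11 and later accept NUL characters in CSV input, so on those versions this particular test no longer raises csv.Error.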
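To exercise the manual next() loop with an error that any Python 3 version raises, a row the parser rejects can be used instead of a NUL byte. This minimal sketch assumes Python 3 and passes the reader's strict=True option, which makes csv raise csv.Error on a stray character after a closing quote; the sample rows and the function name read_rows_skipping_errors are hypothetical.

```python
import csv


def read_rows_skipping_errors(lines, **reader_options):
    """Return (rows, errors), where errors holds (line_num, message) pairs.

    next() is called by hand so that a csv.Error on one record does not
    end the loop; the reader simply continues with the following line.
    """
    reader = csv.reader(lines, **reader_options)
    rows, errors = [], []
    while True:
        try:
            rows.append(next(reader))
        except csv.Error as e:
            errors.append((reader.line_num, str(e)))
        except StopIteration:
            break
    return rows, errors


sample = ['hello,world', 'foo,"bar"x', 'recovered,yeah']
rows, errors = read_rows_skipping_errors(sample, strict=True)
print(rows)  # [['hello', 'world'], ['recovered', 'yeah']]
print([line for line, _ in errors])  # [2]
```

Collecting the errors instead of printing them lets the caller decide whether a file with bad rows should still be accepted.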
Move the try inside the for loop:
import csv
with open('data.tsv', "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        try:
            continue
        except csv.Error, e:
            print reader.line_num, e
            pass
print "End of file!\n"

How do I tell a user where the line error occurred?

I'm trying to incorporate enumerate so I can tell the user of the program which line the error occurred on and what the input on that line was.
Here is my code:
elif response == 'data2':
    print('Processing file:', response + '.txt')
    try:
        infile = open('data2.txt', 'r')
        for line in infile:
            amount = float(line)
            total += amount
        infile.close()
        print(format(total, ',.2f'))
    except IOError:
        print("IO Error occurred trying to read the file.")
    except ValueError:
        print("Non-numeric data found in file:", response + '.txt')
    except:
        print("An error occurred.")
As you can see, I want the ValueError handler to output something along the lines of:
Non-numeric data found in file: data2.txt at line: 3 with input: three hundred
I'm however stuck on how to accomplish this.
You can use the enumerate built-in function to get the line number:
elif response == 'data2':
    print('Processing file:', response + '.txt')
    try:
        # Python allows you to iterate over a file object directly;
        # enumerate() adds a 0-based counter and "with" closes the file.
        with open('data2.txt', 'r') as infile:
            for line_no, line in enumerate(infile):
                amount = float(line)
                total += amount
        print(format(total, ',.2f'))
    except IOError:
        print("IO Error occurred trying to read the file.")
    except ValueError:
        # I took the liberty of formatting your output in a way
        # that's a bit more readable than one long line of text.
        print("Non-numeric data found in file: {}.txt at line: {} "
              "with input: {}".format(response, line_no + 1, line.strip()))
    except:
        print("An error occurred.")
You need to track which line of the file you are on, and catch the exception while reading:
import sys

for line_number, line in enumerate(infile, 1):
    try:
        amount = float(line)
    except ValueError:
        print(
            "Non-numeric data found in file",
            response + ".txt on line",
            line_number,
            "with input",
            line.strip()
        )
        sys.exit(1)  # or whatever is appropriate for this script.
    total += amount
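Both answers can be folded into one self-contained function that raises a ValueError carrying the line number and input, so the calling code decides whether to print it or exit. A minimal sketch; the name sum_numeric_lines and the default filename are hypothetical, and it takes any iterable of lines so it can be tested without a file.

```python
def sum_numeric_lines(lines, filename="data2.txt"):
    """Sum one number per line.

    Raises ValueError naming the first non-numeric line (1-based) and
    its content, in the message format the question asks for.
    """
    total = 0.0
    for line_no, line in enumerate(lines, start=1):
        try:
            total += float(line)  # float() ignores surrounding whitespace
        except ValueError:
            raise ValueError(
                f"Non-numeric data found in file: {filename} "
                f"at line: {line_no} with input: {line.strip()}"
            ) from None
    return total
```

In the menu code this becomes `with open('data2.txt') as f: print(format(sum_numeric_lines(f), ',.2f'))`, with `except ValueError as e: print(e)` around it.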

Python - ValueError: Expecting value: line 1 column 1 (char 0)

This produces an error:
ValueError: Expecting value: line 1 column 1 (char 0)
Here is my code:
...
print("Your phonebook contains the following entries:")
for name, number in phoneBook.items():
    print("%s - %s" % (name, number))

while not created:
    if not os.path.isfile('phonebook.json'):
        with open('phonebook.json', 'wb') as f:
            try:
                f.write('{}')
            except TypeError:
                {}
        created = True
        print('New phonebook created!')
    else:
        print('Phonebook found!')
        created = True

with open('phonebook.json', 'r') as f:
    try:
        phoneBook_Ori = json.load(f)
        phoneBook_Upd = dict(phoneBook_Ori.items() + phoneBook.items())
        phoneBook_Ori.write(phoneBook_Upd)
    except EOFError:
        {}
if EOFError:
    with open('phonebook.json', 'w') as f:
        json.dump(phoneBook, f)
else:
    with open('phonebook.json', 'w') as f:
        json.dump(phoneBook_Ori, f)
Has anyone got an idea of how to fix this?
I have also previously asked a question on this code here
I copy-pasted your code into the Python 2.x interpreter.
I received a ValueError regarding the phonebook.json file. I created a dummy file with:
{'sean':'310'}
My error reads:
ValueError: Expecting property name: line 1 column 2
This was the only way I was able to receive a ValueError.
Therefore, I believe your issue lies in the way the json is written in phonebook.json. Can you post its contents or a subset?
Also, using phoneBook_Ori.write() seems very questionable: the json module has no write() function, and json.load() applied to a file containing a JSON object returns a dictionary, which has no write() method either. You probably want json.dump() instead.
Read more at:
https://docs.python.org/2/library/json.html
Anyway, I hope I was helpful.
I was getting this error whilst using json.load(var) with var containing an empty JSON response from a REST API call.
In your case, phonebook.json must not be empty: it needs to contain valid JSON (at minimum {}) for json.load() to succeed. This will fix the error.
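A likely cause, assuming the script runs under Python 3: the file is opened with 'wb' and f.write('{}') then passes a str to a binary file, which raises TypeError. The except TypeError clause swallows it, so phonebook.json is created empty, and json.load() on an empty file fails with exactly this message. The sketch below (load_phonebook and save_phonebook are hypothetical helper names) opens the file in text mode, treats a missing or empty file as an empty phonebook, and merges with dict.update() rather than adding two items() views, which Python 3 does not support.

```python
import json
import os


def load_phonebook(path="phonebook.json"):
    """Return the stored phonebook dict, or {} if the file is missing or empty."""
    if not os.path.isfile(path):
        return {}
    with open(path, "r") as f:
        text = f.read()
    if not text.strip():
        # An empty file is what json.load() rejects with
        # "Expecting value: line 1 column 1 (char 0)".
        return {}
    return json.loads(text)


def save_phonebook(entries, path="phonebook.json"):
    """Merge entries into the stored phonebook and write it back as JSON."""
    merged = load_phonebook(path)
    merged.update(entries)
    with open(path, "w") as f:  # text mode, so json.dump can write str
        json.dump(merged, f)
    return merged


# The error itself, reproduced without a file:
try:
    json.loads("")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)
```

JSONDecodeError is a subclass of ValueError, which is why the traceback in the question reports a ValueError.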

Searching and extracting WH-word from a file line by line with Python and regex

I have a file with one sentence per line. I am trying to read the file, use a regex to check whether each sentence is a question, extract the wh-word from it, and save the wh-words to another file in the order the sentences appeared in the first file.
This is what I have so far:
def whWordExtractor(inputFile):
    try:
        openFileObject = open(inputFile, "r")
        try:
            whPattern = re.compile(r'(.*)who|what|how|where|when|why|which|whom|whose(\.*)', re.IGNORECASE)
            with openFileObject as infile:
                for line in infile:
                    whWord = whPattern.search(line)
                    print whWord
                    # Save the whWord extracted from inputFile into another whWord.txt file
                    # writeFileObject = open('whWord.txt','a')
                    # if not whWord:
                    #     writeFileObject.write('None' + '\n')
                    # else:
                    #     whQuestion = whWord
                    #     writeFileObject.write(whQuestion+ '\n')
        finally:
            print 'Done. All WH-word extracted.'
            openFileObject.close()
    except IOError:
        pass
The result after running the code above: set([])
Is there something I am doing wrong here? I would be grateful if someone can point it out to me.
Something like this:
import re

def whWordExtractor(inputFile):
    try:
        with open(inputFile) as f1:
            whPattern = re.compile(r'(.*)who|what|how|where|when|why|which|whom|whose(\.*)', re.IGNORECASE)
            with open('whWord.txt', 'a') as f2:  # open the file only once, to reduce I/O operations
                for line in f1:
                    whWord = whPattern.search(line)
                    print whWord
                    if not whWord:
                        f2.write('None' + '\n')
                    else:
                        # re.search returns a match object, not a string, so you have to use either
                        # whWord.group() or, better, whPattern.findall(line)
                        whQuestion = whWord.group()
                        f2.write(whQuestion + '\n')
        print 'Done. All WH-word extracted.'
    except IOError:
        pass
Not sure if it's what you're looking for, but you could try something like this:
import re

def whWordExtractor(inputFile):
    try:
        whPattern = re.compile(r'who|what|how|where|when|why|which|whom|whose', re.IGNORECASE)
        with open(inputFile, "r") as infile:
            for line in infile:
                whMatch = whPattern.search(line)
                if whMatch:
                    whWord = whMatch.group()
                    print(whWord)
                    # save to file
                else:
                    pass  # no match
    except IOError:
        pass
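A bare alternation such as who|what|how still matches inside longer words ("how" in "show", "what" in "somewhat"), and because alternatives are tried left to right, who matches the first three letters of "Whom". A minimal sketch that wraps the alternation in \b word boundaries; extract_wh_words and write_wh_words are hypothetical names, and the sketch records only the first wh-word on each line.

```python
import re

# \b word boundaries stop matches inside words such as "show" or "somewhat";
# the (?:...) group makes both boundaries apply to every alternative,
# so "who" can no longer match the start of "whom".
WH_PATTERN = re.compile(
    r"\b(?:who|whom|whose|what|which|when|where|why|how)\b", re.IGNORECASE
)


def extract_wh_words(lines):
    """Return the first wh-word of each line (lower-cased), or None if absent."""
    results = []
    for line in lines:
        match = WH_PATTERN.search(line)
        results.append(match.group().lower() if match else None)
    return results


def write_wh_words(input_path, output_path="whWord.txt"):
    """Write one wh-word (or 'None') per input line, preserving line order."""
    with open(input_path) as infile, open(output_path, "w") as outfile:
        for word in extract_wh_words(infile):
            outfile.write(f"{word}\n")
```

If a sentence should only count when it is a question, an extra check such as `line.rstrip().endswith("?")` can be added before searching.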
Change '(.*)who|what|how|where|when|why|which|whom|whose(\.*)' to
".*(?:who|what|how|where|when|why|which|whom|whose).*\."
