I'm trying to scrape comments from a certain submission on Reddit and output them to a CSV file.
import praw
import csv

reddit = praw.Reddit(client_id='ClientID', client_secret='ClientSecret', user_agent='UserAgent')
submission = reddit.submission(id="SubmissionID")

with open('Reddit.csv', 'w') as csvfile:
    for comment in submission.comments:
        csvfile.write(comment.body)
The problem is that for each cell the comments seem to be randomly split up. I want each comment in its own cell. Any ideas on how to achieve this?
You are importing the csv module but never actually using it. Use it and your problem should go away.
https://docs.python.org/3/library/csv.html#csv.DictWriter
import csv

# ...
comment = "this string was created from your code"
# ...

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['comment']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'comment': comment})
To write a CSV file in Python, use the csv module, specifically csv.writer(). You import this module at the top of your code, but you never use it.
Using this in your code, this looks like:
with open('Reddit.csv', 'w', newline='') as csvfile:
    comment_writer = csv.writer(csvfile)
    for comment in submission.comments:
        comment_writer.writerow([comment.body])
Here, we use csv.writer() to create a CSV writer from the file that we've opened, and we call it comment_writer. Then, for each comment, we write another row to the CSV file. The row is represented as a list. Since we only have one piece of information to write on each row, the list contains just one item. The row is [comment.body].
The csv module takes care of making sure that values with new lines, commas, or other special characters are properly formatted as CSV values.
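To see this in action, here is a small self-contained sketch (using an in-memory buffer and a made-up comment string, so nothing from Reddit is needed) showing that a value containing a comma and a newline survives a write/read round trip intact:

```python
import csv
import io

# A comment containing both a comma and an embedded newline
comment = "First line,\nsecond line"

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([comment])  # the csv module quotes the value automatically

buf.seek(0)
rows = list(csv.reader(buf))
print(rows[0][0] == comment)  # True: the round trip preserves the text
```

The same applies when writing to a real file opened with newline=''.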
Note that, for some submissions with many comments, your PRAW code might raise an exception along the lines of 'MoreComments' object has no attribute 'body'. The PRAW docs discuss this, and I encourage you to read that to learn more, but know that we can avoid this happening in code by further modifying our loop:
from praw.models import Comment
# ...
with open('Reddit.csv', 'w', newline='') as csvfile:
    comment_writer = csv.writer(csvfile)
    for comment in submission.comments:
        if isinstance(comment, Comment):
            comment_writer.writerow([comment.body])
Also, your code only gets the top level comments of a submission. If you're interested in more, see this question, which is about how to get more than just top-level comments from a submission.
I'm guessing that the cells are not being split randomly, but rather at a comma, space, or semi-colon. You can choose which character the cells are split at by using the delimiter parameter.
import csv

with open('Reddit.csv', 'w') as csvfile:
    comments = ['comment one', 'comment two', 'comment three']
    csv_writer = csv.writer(csvfile, delimiter='-')
    csv_writer.writerow(comments)
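Keep in mind that whatever delimiter you write with must be used again when the file is read back. A quick sketch using an in-memory buffer instead of a file:

```python
import csv
import io

comments = ['comment one', 'comment two', 'comment three']

buf = io.StringIO()
# Write one row using '-' as the field separator
csv.writer(buf, delimiter='-').writerow(comments)

buf.seek(0)
# Reading with the same delimiter recovers the original fields
row = next(csv.reader(buf, delimiter='-'))
print(row)
```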
Related
Every day we get CSV files from a vendor, and we need to parse them and insert them into a database. We use a single Python 3 program for all the tasks.
The problem arises with multiline CSV records, where the content on the continuation lines is skipped.
48.11363;11.53402;81369;München;"";1.0;1962;I would need
help from
Stackoverflow;"";"";"";289500.0;true;""
Here the field "I would need help from Stackoverflow" is spread across three lines.
The problem is that Python only treats "I would need" as part of the record and skips the rest.
At present I am using the options below to read the file:
with open(file_path, newline='', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        {MY LOGIC}
Is there any way to read a multiline CSV record as a single record?
I understand that in PySpark there is an option("multiline", True), but we don't want to use PySpark in the first place.
Looking for options.
Thanks in advance.
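For what it's worth, the stdlib csv module already treats a field spanning several physical lines as one record, provided the field is quoted in the file (no CSV parser can keep an unquoted multi-line field together) and the reader's delimiter matches the data, which here is ';' rather than ','. A minimal sketch with an in-memory version of the sample record, assuming the multi-line field is quoted:

```python
import csv
import io

# Sample ';'-delimited record with one quoted field spanning three lines
data = ('48.11363;11.53402;81369;München;"";1.0;1962;"I would need\n'
        'help from\n'
        'Stackoverflow";"";"";"";289500.0;true;""\n')

reader = csv.reader(io.StringIO(data), delimiter=';', quotechar='"')
rows = list(reader)

print(len(rows))   # one logical record, despite three physical lines
print(rows[0][7])  # the multi-line field comes back as a single value
```

When reading a real file, open it with open(file_path, newline='', encoding='utf-8') as you already do, so the reader can handle the embedded newlines itself.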
I am trying to create a script to reformat a .CSV file. The file starts out pipe-delimited and gets written out comma-delimited.
All it needs to do is index the columns and output them to file the way I want.
I am able to make it work perfectly when printing the output to screen (see the two commented lines at the bottom of the code), but when I attempt to write to file I get the following error. I have tried changing the format of csv_writer.writerow({'F3'}) several different ways. It would seem I don't completely understand how to use writerow(), or I am completely missing something needed to make it function properly.
I also have to put static fields in front of the indexed fields (i.e. I need a "1" put in front of the F3 field). Is there an additional trick to that?
import csv

csv.register_dialect('piper', delimiter='|')

with open('pbfile.txt', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file, dialect='piper', quoting=csv.QUOTE_MINIMAL)
    with open('ouput2.csv', 'w', newline='') as new_file:
        fieldnames = ['F0', 'F1', 'F2', 'F3', 'F4', 'F5', 'F6']
        csv_writer = csv.DictWriter(new_file, delimiter=',', fieldnames=fieldnames)
        for line in csv_reader:
            #csv_writer.writeheader()
            csv_writer.writerow({'F3'})
            csv_writer.writerow({'F1', 'F2', 'F6'})
            #print('1', line['F3'])
            #print('380', line['F1'], line['F2'], line['F6'])
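For reference, DictWriter.writerow() expects a dict mapping field names to values; {'F3'} is a set literal, which is the likely source of the error. A static value can simply be given its own output column. A sketch under the assumption that the input really has fields F0..F6 (the sample data and the 'ID' column name below are made up for illustration):

```python
import csv
import io

# Simulated pipe-delimited input with a header row (hypothetical sample)
pipe_data = "F0|F1|F2|F3|F4|F5|F6\na|b|c|d|e|f|g\n"
csv_reader = csv.DictReader(io.StringIO(pipe_data), delimiter='|')

out = io.StringIO()
csv_writer = csv.DictWriter(out, fieldnames=['ID', 'F3'])
csv_writer.writeheader()
for line in csv_reader:
    # writerow() takes a dict; the static "1" gets its own column
    csv_writer.writerow({'ID': '1', 'F3': line['F3']})

print(out.getvalue())
```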
I am new to Python and I am trying to reduce the CSV file's records by matching specific strings. I want to write the matching rows to a new CSV file.
Here is an example dataset:
What I am trying to do is search through all of the rows for specific keywords (e.g. only write the rows containing WARRANT ARREST, as can be seen in the image) and write them to a new csv file.
Here is my code so far:
import csv

with open('test.csv', 'a') as myfile:
    with open('train3.csv', 'rb') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=',')
        for row in spamreader:
            for field in row:
                if field == "OTHER OFFENSES":
                    myfile.write(row)
test.csv is empty and train3 contains all the records.
You can often learn a lot about what's going on by simply adding some else statements. For instance, after if field == "OTHER OFFENSES":, you could write else: print(field) or else: print(r). It might become obvious why your comparison fails once you see the actual data.
There might also be a newline character after each row that's messing up the comparison (that was the cause of the problem the last time someone asked about this and I answered). Perhaps python sees OTHER OFFENSES\n which does not equal OTHER OFFENSES. To match these, use a less strict comparison or strip() the field.
Try replacing if field == "OTHER OFFENSES" with if "OTHER OFFENSES" in field:. When you do == you're asking for an exact match whereas something in something_else will search the whole line of text for something.
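A quick illustration of the difference between the two comparisons, using a hypothetical field with a trailing newline:

```python
field = "WARRANT ARREST\n"  # trailing newline, as might be read from a file

print(field == "WARRANT ARREST")          # False: not an exact match
print("WARRANT ARREST" in field)          # True: substring search
print(field.strip() == "WARRANT ARREST")  # True: strip() removes the newline
```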
Try the following approach, it is a bit difficult to test as your data cannot be copy/pasted:
import csv
with open('test.csv', 'a', newline='') as f_outputcsv, open('train3.csv', 'r') as f_inputcsv:
    csv_spamreader = csv.reader(f_inputcsv)
    csv_writer = csv.writer(f_outputcsv)

    for row in csv_spamreader:
        for field in row:
            if field == "WARRANT ARREST":
                csv_writer.writerow(row)
                break
This uses a csv.writer instance to write whole rows back to your output file.
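The inner loop with break can also be written with any(), which some find reads more clearly. A sketch with made-up in-memory data, same behaviour:

```python
import csv
import io

# Hypothetical input: one matching row, one non-matching row
data = "a,WARRANT ARREST,b\nc,d,e\n"

out = io.StringIO()
writer = csv.writer(out)
for row in csv.reader(io.StringIO(data)):
    # write the row once if any of its fields matches
    if any(field == "WARRANT ARREST" for field in row):
        writer.writerow(row)

print(out.getvalue())
```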
I'm writing some unicode output to CSV using the unicodecsv module. Everything is working as expected, but I'm trying to build it out by adding some headers or field names. So far I've tried a number of different ways, but I can't work out how to add the field names.
I've tried other unicode solutions and this module seems to be the most elegant to implement so I'm trying to use it if possible. If there are other suggestions, I'm up for them. Any ideas?? Please see relevant code below.
import unicodecsv

with open('c:\pull.csv', 'wb+') as f:
    csv_writer = unicodecsv.writer(f, encoding='utf-8')
    for i in changes['user']['login'], changes['title'], str(changes['changed_files']), str(changes['commits']):
        csv_writer.writerow([changes['user']['login'], changes['title'], changes['changed_files'], changes['commits']])
Sample output for changes in the csv file:
'John Doe', 'Some Title', 1, 1
For the json data you have, there is only one user entry, so the following should work:
with open('c:\pull.csv', 'wb+') as f:
    csv_writer = unicodecsv.writer(f, encoding='utf-8')
    # Write a header row (do once)
    # csv_writer.writerow(["login", "title", "changed_files", "commits"])
    # Write data row
    csv_writer.writerow([changes['user']['login'], changes['title'], changes['changed_files'], changes['commits']])
If you want a header row, uncomment the line. This would then give you an output file:
login,title,changed_files,commits
octocat,new-feature,5,3
I am extremely new to Python 3 and I am learning as I go here. I figured someone could help me with a basic question: how to store text from a CSV file as a variable to be used later in the code. The idea here would be to import a CSV file into the Python interpreter:
import csv

with open('some.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        ...
and then extract the text from that file and store it as a variable (i.e. w = ["csv file text"]) to then be used later in the code to create permutations:
print (list(itertools.permutations(["w"], 2)))
If someone could please help and explain the process, it would be very much appreciated as I am really trying to learn. Please let me know if any more explanation is needed!
itertools.permutations() wants an iterable (e.g. a list) and a length as its arguments, so your data structure needs to reflect that, but you also need to define what you are trying to achieve here. For example, if you wanted to read a CSV file and produce permutations on every individual CSV field you could try this:
import csv
import itertools

with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    w = []
    for row in reader:
        w.extend(row)

print(list(itertools.permutations(w, 2)))
The key thing here is to create a flat list that can be passed to itertools.permutations() - this is done by initialising w to an empty list, and then extending its elements with the elements/fields from each row of the CSV file.
Note: As pointed out by @martineau, for the reasons explained here, the file should be opened with newline='' when used with the Python 3 csv module.
If you want to use Python 3 (as you state in the question) and to process the CSV file using the standard csv module, you should be careful about how you open the file. So far, your code and the answers use the Python 2 way of opening the CSV file. Things have changed in Python 3.
As shengy wrote, the CSV file is just a text file, and the csv module gives you the elements as strings. Strings in Python 3 are unicode strings. Because of that, you should open the file in text mode, and you should supply the encoding. Because of the nature of CSV processing, you should also pass newline='' when opening the file.
Now, extending the explanation of Burhan Khalid: when reading the CSV file, you get the rows as lists of strings. If you want to read the whole content of the CSV file into memory and store it in a variable, you probably want a list of rows (i.e. a list of lists, where each nested list is a row). The for loop iterates through the rows; in the same way, the list() function iterates through the sequence of rows and builds a list of the items. To combine that with the wish to store everything in the content variable, you can write:
import csv

with open('some.csv', newline='', encoding='utf_8') as f:
    reader = csv.reader(f)
    content = list(reader)
Now you can do your permutations as you wish. itertools is the correct way to do the permutations.
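With that content list in hand, the permutation step might look like this (a sketch using made-up in-memory rows instead of a file):

```python
import csv
import io
import itertools

# Hypothetical two-row CSV
sample = "a,b\nc,d\n"
content = list(csv.reader(io.StringIO(sample)))  # [['a', 'b'], ['c', 'd']]

# Permutations of whole rows, two at a time
perms = list(itertools.permutations(content, 2))
print(perms)
```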
import csv

data = csv.DictReader(open('FileName.csv', 'r'))
print data.fieldnames
output = []
for each_row in data:
    row = {}
    try:
        p = dict((k.strip(), v) for k, v in each_row.iteritems() if v.lower() != 'null')
    except AttributeError, e:
        print e
        print each_row
        raise Exception()
    # based on the number of columns
    if p.get('col1'):
        row['col1'] = p['col1']
    if p.get('col2'):
        row['col2'] = p['col2']
    output.append(row)
Finally, all the data is stored in the output variable.
Is this what you need?
import csv

with open('some.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    rows = list(reader)

print('The csv file had {} rows'.format(len(rows)))

for row in rows:
    do_stuff(row)

do_stuff_to_all_rows(rows)
The interesting line is rows = list(reader), which reads every row from the csv file (each row is itself a list) into the list rows, in effect giving you a list of lists.
If you had a csv file with three rows, rows would be a list with three elements, each element a row representing each line in the original csv file.
If all you care about is to read the raw text in the file (csv or not) then:
with open('some.csv') as f:
    w = f.read()
This gives you the whole file as a single string, e.g. w = "csv, file, text\nwithout, caring, about columns\n".
You should try pandas, which works with both Python 2.7 and Python 3.2+:
import pandas as pd
csv = pd.read_csv("your_file.csv")
Then you can handle your data easily.
More fun here
First, a csv file is a text file too, so everything you can do with a file you can do with a csv file. That means f.read(), f.readline(), and f.readlines() can all be used; see detailed information on these functions here.
But, as your file is a csv file, you can utilize the csv module.
# input.csv
# 1,david,enterprise
# 2,jeff,personal

import csv

with open('input.csv') as f:
    reader = csv.reader(f)
    for serial, name, version in reader:
        # The csv module already extracts the information for you
        print serial, name, version
More details about the csv module are here.
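In Python 3 the same idea needs print() as a function, and the file should be opened in text mode with newline=''. A sketch using made-up in-memory data instead of input.csv:

```python
import csv
import io

# Hypothetical stand-in for input.csv
data = "1,david,enterprise\n2,jeff,personal\n"

for serial, name, version in csv.reader(io.StringIO(data)):
    # each row unpacks directly into its three fields
    print(serial, name, version)
```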