Python: Using BeautifulSoup to save content to CSV

With the amazing help of Martijn I came this far in my Python programming. However, I tried to export the content of my cells to a CSV file. I succeeded in writing the file, but my result is as follows:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('https://clinicaltrials.gov/ct2/show/study/NCT01718158?term=NCT01718158&rank=1&show_locs=Y#locn').read())
import csv
filename = 'Trial1.csv'
f = open(filename, 'wb')
with f:
    writer = csv.writer(f)
    for row in soup('table')[5].findAll('tr'):
        tds = row('td')
        result = u' '.join([cell.string for cell in tds if cell.string])
        writer.writerow(result)
        print result
f.close()
Result: |j|o|h|n|1|2|3
instead of |john|123| for each particular cell.
How do I correct this? Thanks.

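The problem is that writer.writerow() expects a sequence of fields, but you are passing it a single string, so every character is written as its own field. Pass the list of cell strings instead of joining them first.
Some of your cells also contain commas. The csv writer quotes such fields for you, but if you would rather avoid that you can change the delimiter, like this: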
...
# I'd suggest using "with ... as f" in one line
with open(filename, 'wb') as f:
    # set the delimiter to a tab (\t) rather than a comma
    writer = csv.writer(f, delimiter='\t')
    for row in soup('table')[5].findAll('tr'):
        tds = row('td')
        # pass the list to writerow directly; each item becomes one field
        writer.writerow([cell.string for cell in tds if cell.string])
...
Hope this helps.
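To make the difference concrete, here is a minimal Python 3 sketch of what writerow() does with a string versus a list. The helper name rows_to_csv and the sample values are made up for illustration; the output goes to an in-memory buffer rather than a file so the result is easy to inspect.

```python
import csv
import io


def rows_to_csv(rows):
    """Write each item of `rows` with writerow() and return the CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


# A string is itself a sequence of characters, so each character becomes a field.
as_string = rows_to_csv(["john"])

# A list of strings gives one field per list item.
as_list = rows_to_csv([["john", "123"]])

print(repr(as_string))
print(repr(as_list))
```

The first call reproduces the one-character-per-cell symptom from the question; the second is the behaviour you get when the list of cell strings is passed as-is.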

Related

Replacing multiple space in every line by comma

How can I replace the runs of whitespace in every line with a comma? I'm using tabulate and have been trying to figure out how.
Here's my code:
def print_extensions(self):
    i = InternetExplorer(self.os)
    content1 = tabulate(i.extensions(), headers="keys", tablefmt="plain")
    text_file = open("output.csv", "w")
    text_file.write(content1)
    text_file.close()
My CSV output:
where • = whitespace
path•••••name•••••id
C:\Windows•••••Microsoft•••••{CFBFAE00}
Expected CSV output:
path,name,id
C:\Windows,Microsoft,{CFBFAE00}

Probably you'd be better off using Python's standard csv module.
For example:
import csv
data = ["some", "header"], ["and", "data"]
with open("test.csv", "w") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(data)
produces the file
some,header
and,data
Tabulate is really more intended for pretty-printing. I certainly wouldn't suggest producing CSV files by parsing the output of tabulate with regular expressions.
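As a rough sketch of how this could apply to the question's data: headers="keys" suggests that i.extensions() returns a list of dictionaries, but that is an assumption, and the sample record below is a hypothetical stand-in built from the expected output. Under that assumption, csv.DictWriter can write the rows directly without going through tabulate.

```python
import csv
import io

# Hypothetical stand-in for i.extensions(); assumed to be a list of dicts.
extensions = [
    {"path": r"C:\Windows", "name": "Microsoft", "id": "{CFBFAE00}"},
]


def dicts_to_csv(records, out):
    """Write a list of dicts as CSV, using the first record's keys as the header."""
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)


buf = io.StringIO()
dicts_to_csv(extensions, buf)
csv_text = buf.getvalue()
print(csv_text)
```

To write to disk instead of a buffer, pass a file opened with open("output.csv", "w", newline="") as the second argument.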

Change CSV from writing in one column to multiple columns

I am attempting to write items that I feed my code with a for loop into a CSV. I am able to write to the CSV, but it throws everything into a single column. I've achieved this using this code:
workspace = arcpy.mapping.MapDocument("CURRENT")
with open(arcpy.GetParameterAsText(0), "wb") as csv_file:
    writer = csv.writer(csv_file, delimiter = ',')
    for textElement in arcpy.mapping.ListLayoutElements(workspace, "TEXT_ELEMENT", "elem*"):
        writer.writerow([textElement.name],)
        writer.writerow([textElement.text],)
The issue I have is that I want to push each new instance of "elem" into a new column. If anyone can help me write some code to create a CSV laid out that way, I would greatly appreciate it.

It looks like you need to do:
writer.writerow([textElement.name,textElement.text])
It would be easier to help if you provided something we could run. See: https://stackoverflow.com/help/mcve

I found this example a while back:
import csv
with open(newfilePath, "w") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
I work in pandas a lot and this has worked great for me!

I created a dictionary of all of the elements (elem1, elem2, elem3) and their content. Then I just wrote the dictionary into the CSV rather than writing each individual element and its content. I used this code:
dic = {}
workspace = arcpy.mapping.MapDocument(r"path.mxd")
for textElement in arcpy.mapping.ListLayoutElements(workspace, "TEXT_ELEMENT", "elem*"):
    name = ''.join(textElement.name)
    content = ''.join(textElement.text)
    dic[str(name)] = str(content)
print dic
with open(r'test.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file, delimiter = ',')
    for item in dic.items():
        writer.writerow(item)
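If the goal really is one column per element, as the question describes, the (name, text) pairs have to be transposed before writing, because each writerow() call produces one row. The following Python 3 sketch assumes that layout is what is wanted; the element names and texts are hypothetical placeholders, not values read from arcpy.

```python
import csv
import io

# Hypothetical (name, text) pairs standing in for the layout text elements.
elements = [("elem1", "First"), ("elem2", "Second"), ("elem3", "Third")]

# zip(*pairs) transposes the pairs into one tuple of names and one of texts.
names, texts = zip(*elements)

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(names)   # header row: one column per element
writer.writerow(texts)   # second row: the text of each element
transposed = buf.getvalue()
print(transposed)
```

Each element now occupies its own column, with its name in the first row and its text in the second.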

Conversion from XLSX to TXT asks for __exit__ method

As per other posts I saw on Stack Overflow, I wrote the following code to convert an XLSX to TXT. However, it throws AttributeError: __exit__:
import xlrd
import csv
with xlrd.open_workbook('data.xlsx').sheet_by_index(0) as in_xslx:
    in_reader = csv.reader(in_xslx)
    with open("data.txt", "w", newline='', encoding='utf8') as out_text:
        out_writer = csv.writer(out_text, delimiter = '\t')
        for row in in_reader:
            out_writer.writerow(row)
However, it successfully converts a CSV if I replace the first two lines after the imports with:
with open("data.csv", "r", encoding='utf-8') as in_csv:
    in_reader = csv.reader(in_csv)
Any idea why this happens when converting XLSX to TXT, and how to correct it?
Thank you

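The with statement fails because the sheet object returned by sheet_by_index() is not a context manager (it has no __exit__ method), and csv.reader cannot read a worksheet in any case. Open the workbook normally and write each row's values instead. What you need is:
import xlrd
import csv
with open("data.txt", "w", newline='', encoding='utf8') as out_text:
    # define output writer
    out_writer = csv.writer(out_text, delimiter = '\t')
    # Open and read an Excel file
    data_file = xlrd.open_workbook('data.xlsx')
    # get the first worksheet
    worksheet = data_file.sheet_by_index(0)
    # get the row values and write into output file
    for rownum in range(worksheet.nrows):
        out_writer.writerow(worksheet.row_values(rownum))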
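The error in the question can be reproduced without xlrd. In the sketch below, Sheet is a hypothetical stand-in class that, like a worksheet object, defines neither __enter__ nor __exit__. Older Python 3 versions report this as an AttributeError, while Python 3.11 and later raise a TypeError instead, so the helper catches both.

```python
import io


class Sheet:
    """Hypothetical stand-in for a worksheet: not a context manager."""
    nrows = 2


def try_with(obj):
    """Use obj in a with statement and report what happened."""
    try:
        with obj:
            return "ok"
    except (AttributeError, TypeError) as exc:
        # AttributeError on older Pythons, TypeError on 3.11+
        return type(exc).__name__


outcome_sheet = try_with(Sheet())
outcome_file = try_with(io.StringIO())  # file-like objects do support "with"
print(outcome_sheet, outcome_file)
```

This is why open(...) works in a with statement while the sheet does not: only the former implements the context manager protocol.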

Python script to turn input csv columns into output csv row values

I have an input CSV that looks like
email,trait1,trait2,trait3
foo#gmail,biz,baz,buzz
bar#gmail,bizzy,bazzy,buzzy
foobars#gmail,bizziest,bazziest,buzziest
and I need the output format to look like
Indv,AttrName,AttrValue,Start,End
foo#gmail,"trait1",biz,,,
foo#gmail,"trait2",baz,baz,,
foo#gmail,"trait3",buzz,,,
For each row in my input file I need to write a row for the N-1 columns in the input csv. The Start and End fields in the output file can be empty in some cases.
I'm trying to read in the data using a DictReader. So far I've been able to read in the data with
import unicodecsv
import os
import codecs
with open('test.csv') as csvfile:
    reader = unicodecsv.csv.DictReader(csvfile)
    outfile = codecs.open("test-write", "w", "utf-8")
    outfile.write("Indv", "ATTR", "Value", "Start","End\n")
    for row in reader:
        outfile.write([row['email'],"trait1",row['trait1'],'',''])
        outfile.write([row['email'],"trait2",row['trait2'],row['trait2'],''])
        outfile.write([row['email'],"trait3",row['trait3'],'',''])
This doesn't work (I think I need to cast the list to a string), and it is also very brittle because I'm hardcoding the column names for each row. The bigger issue is that the data within the for loop isn't written to "test-write". Only the line
outfile.write("Indv", "ATTR", "Value", "Start","End\n") actually gets written to the file. Is DictReader the appropriate class to use in my case?

This uses a unicodecsv.DictWriter and the zip() function to do what you want, and the code is fairly readable in my opinion.
import unicodecsv
import os
import codecs
with open('test.csv') as infile, \
     codecs.open('test-write.csv', 'w', 'utf-8') as outfile:
    reader = unicodecsv.DictReader(infile)
    fieldnames = 'Indv,AttrName,AttrValue,Start,End'.split(',')
    writer = unicodecsv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    for row in reader:
        email = row['email']
        trait1, trait2, trait3 = row['trait1'], row['trait2'], row['trait3']
        writer.writerows([  # writes three rows of output from each row of input
            dict(zip(fieldnames, [email, 'trait1', trait1])),
            dict(zip(fieldnames, [email, 'trait2', trait2, trait2])),
            dict(zip(fieldnames, [email, 'trait3', trait3]))])
Here's the contents of the test-write.csv file it produced from your example input csv file:
Indv,AttrName,AttrValue,Start,End
foo#gmail,trait1,biz,,
foo#gmail,trait2,baz,baz,
foo#gmail,trait3,buzz,,
bar#gmail,trait1,bizzy,,
bar#gmail,trait2,bazzy,bazzy,
bar#gmail,trait3,buzzy,,
foobars#gmail,trait1,bizziest,,
foobars#gmail,trait2,bazziest,bazziest,
foobars#gmail,trait3,buzziest,,

I may be completely off since I don't do a lot of work with unicode, but it seems to me that the following should work:
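import csv
# newline='' lets the csv module handle line endings itself (Python 3)
with open('test.csv', newline='', encoding='utf-8') as csvin, \
     open('test-write', 'w', newline='', encoding='utf-8') as csvout:
    reader = csv.DictReader(csvin)
    writer = csv.DictWriter(csvout, fieldnames=['Indv', 'AttrName',
                                                'AttrValue', 'Start', 'End'])
    writer.writeheader()
    for row in reader:
        for traitnum in range(1, 4):
            key = "trait{}".format(traitnum)
            # Start and End are omitted, so DictWriter writes them as empty fields
            writer.writerow({'Indv': row['email'], 'AttrName': key,
                             'AttrValue': row[key]})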

import pandas as pd
pd1 = pd.read_csv('input_csv.csv')
# reshape from wide (one column per trait) to long (one row per trait)
pd2 = (pd.melt(pd1, id_vars=['email'],
               value_vars=['trait1', 'trait2', 'trait3'],
               var_name='AttrName', value_name='AttrValue')
         .rename(columns={'email': 'Indv'})
         .sort_values(by=['Indv', 'AttrName'])
         .reset_index(drop=True))
pd2.to_csv('output_csv.csv', index=False)
Unclear on what the Start and End fields represent, but this gets you everything else.
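The question also mentions that hardcoding the column names is brittle. One possible way around that, sketched here for Python 3 with the standard csv module, is to take the trait columns from the reader's fieldnames. The function name unpivot and the id_column parameter are made up for this example, and Start and End are simply left empty because the question does not say how they are derived.

```python
import csv
import io


def unpivot(infile, outfile, id_column="email"):
    """Write one output row per (input row, non-id column) pair."""
    reader = csv.DictReader(infile)
    # Every column except the id column is treated as a trait.
    trait_columns = [name for name in reader.fieldnames if name != id_column]
    writer = csv.DictWriter(
        outfile,
        fieldnames=["Indv", "AttrName", "AttrValue", "Start", "End"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        for name in trait_columns:
            # Missing keys (Start, End) are written as empty fields.
            writer.writerow({"Indv": row[id_column],
                             "AttrName": name,
                             "AttrValue": row[name]})


source = io.StringIO("email,trait1,trait2,trait3\nfoo#gmail,biz,baz,buzz\n")
target = io.StringIO()
unpivot(source, target)
result = target.getvalue()
print(result)
```

With real files, the two StringIO objects would be replaced by files opened with newline='' so the csv module controls the line endings.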

Separate data with a comma CSV Python

I have some data that needs to be written to a CSV file. The data is as follows
A ,B ,C
a1,a2 ,b1 ,c1
a2,a4 ,b3 ,ct
The first column has a comma inside it. The entire data is in a list that I'd like to write to a CSV file, delimited by commas and without disturbing the data in column A. How can I do that? Specifying delimiter = ',' splits each row into four columns.

Just use the csv.writer from the csv module.
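import csv
data = [['A','B','C'],
        ['a1,a2','b1','c1'],
        ['a2,a4','b3','ct']]
fname = "myfile.csv"
# 'wb' is for Python 2; on Python 3 use open(fname, 'w', newline='')
with open(fname,'wb') as f:
    writer = csv.writer(f)
    for row in data:
        # fields containing a comma are quoted automatically
        writer.writerow(row)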
https://docs.python.org/library/csv.html#csv.writer
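A quick way to check that the embedded commas survive is to write the rows and read them back. This Python 3 sketch uses an in-memory buffer instead of myfile.csv; the data values are taken from the question.

```python
import csv
import io

data = [["A", "B", "C"],
        ["a1,a2", "b1", "c1"],
        ["a2,a4", "b3", "ct"]]

# Write the rows; csv.writer quotes any field that contains the delimiter.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(data)
written = buf.getvalue()

# Read the text back; the quoted fields come out as single values again.
read_back = list(csv.reader(io.StringIO(written)))

print(written)
print(read_back == data)
```

The quotes around "a1,a2" in the written text are what keep that value in a single column when the file is read again.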

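If you don't want to use the csv module you can join the fields yourself, but then you have to quote any field that contains a comma. A plain ', '.join(row) would write a1,a2 unquoted, and a CSV reader would see four columns in those rows. Something like this will work:
with open('myfile.csv', 'w') as f:
    for row in data:
        # quote fields that contain a comma so they stay in one column
        f.write(','.join('"%s"' % cell if ',' in cell else cell for cell in row))
        f.write('\n')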

You could try the below.
Code:
import csv
import re
with open('infile.csv', 'r') as f:
    lst = []
    for line in f:
        lst.append(re.findall(r',?(\S+)', line))
with open('outfile.csv', 'w', newline='') as w:
    writer = csv.writer(w)
    for row in lst:
        writer.writerow(row)
Output:
A,B,C
"a1,a2",b1,c1
"a2,a4",b3,ct
