Python: Using BeautifulSoup to save content to CSV

With Martijn's generous help I have come this far in my Python programming. Now I am trying to export the content of my table cells to a CSV file. The file gets written, but the result is wrong. My code is as follows:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('https://clinicaltrials.gov/ct2/show/study/NCT01718158?term=NCT01718158&rank=1&show_locs=Y#locn').read())
import csv
filename = 'Trial1.csv'
f = open(filename, 'wb')
with f:
    writer = csv.writer(f)
    for row in soup('table')[5].findAll('tr'):
        tds = row('td')
        result = u' '.join([cell.string for cell in tds if cell.string])
        writer.writerow(result)
        print result
f.close()
Result: |j|o|h|n|1|2|3
instead of |john|123| for each cell.
How do I correct this? Thanks.

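The problem is that writer.writerow() expects a sequence of fields, but you pass it one joined string, so the writer treats every character as a separate field. Some of your cells also contain commas, which the CSV (comma-separated values) writer has to quote.
Pass the list of cells to writerow() directly; changing the delimiter to a tab, as below, additionally keeps the commas in the cell text from being quoted: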
...
# I'd suggest opening the file in the with statement itself
with open(filename, 'wb') as f:
    # set the delimiter to a tab (\t) rather than a comma
    writer = csv.writer(f, delimiter='\t')
    for row in soup('table')[5].findAll('tr'):
        tds = row('td')
        # pass the list to writerow directly; it converts each item to a string for you
        writer.writerow([cell.string for cell in tds if cell.string])
...
Hope this helps.
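To see why the original output was split into single characters, here is a small sketch. It is written for Python 3 and uses an in-memory buffer instead of a file; the helper name `rows_to_csv` and the sample values are illustrative only. `writerow` iterates over whatever it is given, so a string becomes one field per character:

```python
import csv
import io

def rows_to_csv(rows):
    """Write rows to an in-memory CSV and return the resulting text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

# A string is iterated character by character: one field per character.
as_string = rows_to_csv(["john"])
# A list is written as one field per list item.
as_list = rows_to_csv([["john", "123"]])

print(as_string)
print(as_list)
```

The first call reproduces the symptom from the question; the second gives one cell per value.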

Related

Replacing multiple space in every line by comma

How can I replace the runs of whitespace in every line with a comma? I'm using tabulate and have been trying to figure out how.
Here's my code:
def print_extensions(self):
    i = InternetExplorer(self.os)
    content1 = tabulate(i.extensions(), headers="keys", tablefmt="plain")
    text_file = open("output.csv", "w")
    text_file.write(content1)
    text_file.close()
My CSV output:
where • = whitespace
path•••••name•••••id
C:\Windows•••••Microsoft•••••{CFBFAE00}
Expected CSV output:
path,name,id
C:\Windows,Microsoft,{CFBFAE00}
You'd probably be better off using Python's standard csv module.
For example:
import csv
data = ["some", "header"], ["and", "data"]
with open("test.csv", "w") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(data)
produces the file
some,header
and,data
Tabulate is really more intended for pretty-printing. I certainly wouldn't suggest producing CSV files by parsing the output of tabulate with regular expressions.
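As a sketch of that approach for the data in the question: `headers="keys"` suggests that `i.extensions()` returns a list of dictionaries, but that is an assumption, and the sample record below is made up from the expected output. The function name `write_extensions` is hypothetical as well.

```python
import csv
import io

# Hypothetical stand-in for i.extensions(): a list of dicts sharing the same keys.
extensions = [
    {"path": r"C:\Windows", "name": "Microsoft", "id": "{CFBFAE00}"},
]

def write_extensions(records, out):
    """Write a list of dicts as CSV, using the first record's keys as the header."""
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

buf = io.StringIO()
write_extensions(extensions, buf)
csv_text = buf.getvalue()
print(csv_text)
```

To write to disk instead of a buffer, pass a file opened with `open("output.csv", "w", newline="")` as `out`.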

Change CSV from writing in one column to multiple columns

I am attempting to write items that I feed to my code in a for loop into a CSV file. I am able to write the file, but everything ends up in a single column (see: the output of my code). I've achieved this using this code:
workspace = arcpy.mapping.MapDocument("CURRENT")
with open(arcpy.GetParameterAsText(0), "wb") as csv_file:
    writer = csv.writer(csv_file, delimiter = ',')
    for textElement in arcpy.mapping.ListLayoutElements(workspace, "TEXT_ELEMENT", "elem*"):
        writer.writerow([textElement.name],)
        writer.writerow([textElement.text],)
The issue I have is that I want to push each new instance of "elem" into a new column. If anyone can help me write some code to create a CSV that looks like this (desired csv), I would greatly appreciate it.
It looks like you need to do:
writer.writerow([textElement.name,textElement.text])
It would be easier to help if you provided something we could run. See: https://stackoverflow.com/help/mcve
I found this example a while back:
import csv
with open(newfilePath, "w") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
I work in pandas a lot and this has worked great for me!
I created a dictionary of all of the elements (elem1, elem2, elem3) and their content. Then I just write the dictionary into the CSV rather than writing each individual element and its content. I used this code:
dic = {}
workspace = arcpy.mapping.MapDocument(r"path.mxd")
for textElement in arcpy.mapping.ListLayoutElements(workspace, "TEXT_ELEMENT", "elem*"):
    name = ''.join(textElement.name)
    content = ''.join(textElement.text)
    dic[str(name)] = str(content)
print dic
with open(r'test.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file, delimiter = ',')
    for item in dic.items():
        writer.writerow(item)
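If the goal is one column per element rather than one row per element, one option is to collect the (name, text) pairs first and transpose them with `zip`. The sketch below assumes that this layout is what the missing screenshot showed. It is written for Python 3 and uses made-up pairs in place of the arcpy text elements; `write_as_columns` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical (name, text) pairs standing in for the arcpy text elements.
elements = [("elem1", "Title"), ("elem2", "Author"), ("elem3", "2015")]

def write_as_columns(pairs, out):
    """Write one column per pair: names on the first row, texts on the second."""
    writer = csv.writer(out, lineterminator="\n")
    # zip(*pairs) turns [(n1, t1), (n2, t2), ...] into (n1, n2, ...), (t1, t2, ...)
    writer.writerows(zip(*pairs))

buf = io.StringIO()
write_as_columns(elements, buf)
columns_csv = buf.getvalue()
print(columns_csv)
```

Collecting the pairs before writing is the key design choice here: a CSV writer emits whole rows, so columns can only be built by assembling each row in full first.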

Conversion from XLSX to TXT asks for __exit__ method

As per other posts I saw on Stack Overflow, I wrote the following code to convert an XLSX file to TXT; however, it throws AttributeError: __exit__:
import xlrd
import csv
with xlrd.open_workbook('data.xlsx').sheet_by_index(0) as in_xslx:
    in_reader = csv.reader(in_xslx)
    with open("data.txt", "w", newline='', encoding='utf8') as out_text:
        out_writer = csv.writer(out_text, delimiter = '\t')
        for row in in_reader:
            out_writer.writerow(row)
However, it successfully converts a CSV file if I replace the first two lines after the imports with:
with open("data.csv", "r", encoding='utf-8') as in_csv:
    in_reader = csv.reader(in_csv)
Any idea why this happens when converting XLSX to TXT, and how to correct it?
Thank you
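The sheet object returned by sheet_by_index() is not a context manager, so it cannot be used in a with statement, and it is not a file that csv.reader can parse. Read the rows from the worksheet directly instead. What you need is:
import xlrd
import csv
with open("data.txt", "w", newline='', encoding='utf8') as out_text:
    # define the output writer
    out_writer = csv.writer(out_text, delimiter='\t')
    # open and read the Excel file
    data_file = xlrd.open_workbook('data.xlsx')
    # get the first worksheet
    worksheet = data_file.sheet_by_index(0)
    # get the row values and write them into the output file
    for rownum in range(worksheet.nrows):  # range, since the question's code is Python 3
        out_writer.writerow(worksheet.row_values(rownum))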
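The original error comes from the `with` statement itself: it requires an object that implements the context manager protocol (`__enter__` and `__exit__`), and the object returned by `sheet_by_index` does not. Here is a minimal sketch of that behaviour, using a hypothetical `FakeSheet` class in place of the xlrd sheet so that it runs without xlrd or a workbook. Note that Python 3.11 and later report this failure as a `TypeError` rather than an `AttributeError`, so the sketch catches both.

```python
import csv
import io

class FakeSheet:
    """Hypothetical stand-in for an xlrd sheet: has rows, but no __enter__/__exit__."""
    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def row_values(self, rownum):
        return self._rows[rownum]

sheet = FakeSheet([["id", "name"], [1, "john"]])

# Using the sheet in a with statement fails, as in the question.
try:
    with sheet as s:
        pass
    with_failed = False
except (AttributeError, TypeError):
    with_failed = True

# Reading it row by row works and needs no csv.reader.
out_text = io.StringIO()
out_writer = csv.writer(out_text, delimiter="\t", lineterminator="\n")
for rownum in range(sheet.nrows):
    out_writer.writerow(sheet.row_values(rownum))
tsv_text = out_text.getvalue()
print(with_failed)
print(tsv_text)
```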

Python script to turn input csv columns into output csv row values

I have an input csv that look like
email,trait1,trait2,trait3
foo#gmail,biz,baz,buzz
bar#gmail,bizzy,bazzy,buzzy
foobars#gmail,bizziest,bazziest,buzziest
and I need the output format to look like
Indv,AttrName,AttrValue,Start,End
foo#gmail,"trait1",biz,,,
foo#gmail,"trait2",baz,baz,,
foo#gmail,"trait3",buzz,,,
For each row in my input file I need to write a row for the N-1 columns in the input csv. The Start and End fields in the output file can be empty in some cases.
I'm trying to read in the data using a DictReader. So far I've been able to read in the data with
import unicodecsv
import os
import codecs
with open('test.csv') as csvfile:
    reader = unicodecsv.csv.DictReader(csvfile)
    outfile = codecs.open("test-write", "w", "utf-8")
    outfile.write("Indv", "ATTR", "Value", "Start","End\n")
    for row in reader:
        outfile.write([row['email'],"trait1",row['trait1'],'',''])
        outfile.write([row['email'],"trait2",row['trait2'],row['trait2'],''])
        outfile.write([row['email'],"trait3",row['trait3'],'',''])
This doesn't work (I think I need to convert the list to a string), and it is also very brittle because I'm hardcoding the column names for each row. The bigger issue is that the data within the for loop isn't written to "test-write". Only the line
outfile.write("Indv", "ATTR", "Value", "Start","End\n") actually writes to the file. Is DictReader the appropriate class to use in my case?
This uses a unicodecsv.DictWriter and the zip() function to do what you want, and the code is fairly readable in my opinion.
import unicodecsv
import os
import codecs
with open('test.csv') as infile, \
     codecs.open('test-write.csv', 'w', 'utf-8') as outfile:
    reader = unicodecsv.DictReader(infile)
    fieldnames = 'Indv,AttrName,AttrValue,Start,End'.split(',')
    writer = unicodecsv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    for row in reader:
        email = row['email']
        trait1, trait2, trait3 = row['trait1'], row['trait2'], row['trait3']
        writer.writerows([  # writes three rows of output from each row of input
            dict(zip(fieldnames, [email, 'trait1', trait1])),
            dict(zip(fieldnames, [email, 'trait2', trait2, trait2])),
            dict(zip(fieldnames, [email, 'trait3', trait3]))])
Here's the contents of the test-write.csv file it produced from your example input csv file:
Indv,AttrName,AttrValue,Start,End
foo#gmail,trait1,biz,,
foo#gmail,trait2,baz,baz,
foo#gmail,trait3,buzz,,
bar#gmail,trait1,bizzy,,
bar#gmail,trait2,bazzy,bazzy,
bar#gmail,trait3,buzzy,,
foobars#gmail,trait1,bizziest,,
foobars#gmail,trait2,bazziest,bazziest,
foobars#gmail,trait3,buzziest,,
I may be completely off since I don't do a lot of work with unicode, but it seems to me that the following should work:
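import csv
# 'ur' and 'uw' are not valid file modes; plain 'r' and 'w' are used here
with open('test.csv', 'r') as csvin, open('test-write', 'w') as csvout:
    reader = csv.DictReader(csvin)
    writer = csv.DictWriter(csvout, fieldnames=['Indv', 'AttrName',
                                                'AttrValue', 'Start', 'End'])
    writer.writeheader()  # the desired output starts with a header row
    for row in reader:
        for traitnum in range(1, 4):
            key = "trait{}".format(traitnum)
            # Start and End are not supplied, so DictWriter leaves them empty
            writer.writerow({'Indv': row['email'], 'AttrName': key,
                             'AttrValue': row[key]})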
import pandas as pd
pd1 = pd.read_csv('input_csv.csv')
pd2 = (pd.melt(pd1, id_vars=['email'], value_vars=['trait1', 'trait2', 'trait3'],
               var_name='AttrName', value_name='AttrValue')
         .rename(columns={'email': 'Indv'})
         .sort_values(by=['Indv', 'AttrName'])  # DataFrame.sort() was removed from later pandas versions
         .reset_index(drop=True))
pd2.to_csv('output_csv.csv', index=False)
Unclear on what the Start and End fields represent, but this gets you everything else.
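To avoid hardcoding the trait column names, which the question calls brittle, the header can be read from `DictReader.fieldnames`. Below is a standard-library sketch for Python 3. It assumes the first column is always the identifier, and it leaves Start and End empty because the question does not define them. The function name `wide_to_long` is made up, and the in-memory buffers stand in for the input and output files.

```python
import csv
import io

SAMPLE = (
    "email,trait1,trait2,trait3\n"
    "foo#gmail,biz,baz,buzz\n"
    "bar#gmail,bizzy,bazzy,buzzy\n"
)

def wide_to_long(infile, outfile):
    """Write one output row per (input row, trait column) pair."""
    reader = csv.DictReader(infile)
    # Assumption: the first column identifies the row, the rest are traits.
    id_column, *trait_columns = reader.fieldnames
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(["Indv", "AttrName", "AttrValue", "Start", "End"])
    for row in reader:
        for name in trait_columns:
            writer.writerow([row[id_column], name, row[name], "", ""])

out = io.StringIO()
wide_to_long(io.StringIO(SAMPLE), out)
long_csv = out.getvalue()
print(long_csv)
```

Because the trait names come from the header, adding a trait4 column to the input needs no code change.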

Separate data with a comma CSV Python

I have some data that needs to be written to a CSV file. The data is as follows
A ,B ,C
a1,a2 ,b1 ,c1
a2,a4 ,b3 ,ct
The first column has commas inside it. The entire data set is in a list that I'd like to write to a CSV file, delimited by commas and without disturbing the data in column A. How can I do that? Specifying delimiter = ',' splits it into four columns overall.
Just use the csv.writer from the csv module.
import csv
# each row is its own list; the rows must be separated by commas
data = [['A','B','C'],
        ['a1,a2','b1','c1'],
        ['a2,a4','b3','ct']]
fname = "myfile.csv"
with open(fname,'wb') as f:
    writer = csv.writer(f)
    for row in data:
        writer.writerow(row)
https://docs.python.org/library/csv.html#csv.writer
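With `csv.writer`, fields that contain the delimiter are quoted automatically, which is what keeps the comma in column A from becoming a field separator. Here is a quick sketch of the round trip with the question's data, written for Python 3 and using an in-memory buffer instead of a file:

```python
import csv
import io

data = [["A", "B", "C"], ["a1,a2", "b1", "c1"], ["a2,a4", "b3", "ct"]]

# Write: fields containing the delimiter are quoted automatically.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(data)
written = buf.getvalue()

# Read back: the quoted fields come out as single values again.
read_back = list(csv.reader(io.StringIO(written)))
print(written)
```

Reading the text back yields three fields per row, so the data in column A survives intact.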
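If you don't need the embedded commas quoted, you can skip the csv module and join the fields yourself. Be aware that a CSV reader will then treat the ',' inside the first column as a separator, so each data row parses as four fields:
with open('myfile.csv', 'w') as f:
    for row in data:
        f.write(', '.join(row))
        f.write('\n')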
You could try the below.
Code:
import csv
import re
with open('infile.csv', 'r') as f:
    lst = []
    for line in f:
        lst.append(re.findall(r',?(\S+)', line))
with open('outfile.csv', 'w', newline='') as w:
    writer = csv.writer(w)
    for row in lst:
        writer.writerow(row)
Output:
A,B,C
"a1,a2",b1,c1
"a2,a4",b3,ct
