I have an input file that contains data in the same format repeatedly across 5 rows. I need to format this data into one row (CSV file) and have only few fields relevant to me. How do i achieve the mentioned output with the input file provided.
Note - I'm very new to learning any language and haven't reached to this depth of details yet to write my own. I have already written the code where i'm importing the input file, reaching to a specif word and then printing the rest of the data(this is where i need help as i don't need all the information in the input as using space is delimiter is not giving the output in correct columns). I have also written the code to write the output in a csv file.
Note 2 - I'm very to this forum as well and kindly excuse me in case i have made any posting in posting my query.
Input -
Input File
Output -
Output File
import itertools, csv
You should read in the file and parse it manually, then use the csv module to write it to a .csv file:
import re
with open('myfile.txt', 'r') as f:
lines = f.readlines()
# divide on whitespace characters, but not single spaces
lines = [re.split("\s\s+", line) for line in lines]
with open('output.csv', 'w', newline='') as csvfile:
writer = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
for line in lines:
writer.writerow(lines)
But this will include every piece of data. You can iterate through lines and remove the fields you don't want to keep. So before you do the csv writing, you could do:
def filter_line(line):
# see how the input file was parsed
print(line)
# for example, only keep the first 2 columns
return [line[0], line[1]]
lines = [filter_line(line) for line in lines]
Related
I have several lists of various lengths that I am trying to export to CSV so they can be called up again later when the program is launched again. Everytime I try the following, it only outputs a single line of data to the csv:
export = [solutions, fedata, bbcom, fenxt, ten_99, ten_99links]
with open('test.csv','w') as f:
writer = csv.writer(f)
# writer.writerow([<header row>]) #uncomment this and put header, if you want header row , if not leave it commented.
for x in zip(*export):
writer.writerow(x)
some of the lists currently only have 1 item in them, but I am trying to basically make a CSV be a database for this program as we will be adding more to the lists as it is expanded. Any help is appreciated, I am really banging my head against the wall here.
I tried the pasted code but it only outputs a single line of data
Do you want every item to be on a newline or every list on a newline?
If you want an empty line between the prints then you can remove the newline=''
Try this:
with open('test.csv', 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(export)
Currently the csv file is saved in line break mode. But it should be separated by comma for inputting these datas as an array.
The current csv file:
test#eaxmple.com
test#eaxmple.com
test#eaxmple.com
The ideal csv file:
test#eaxmple.com, test#eaxmple.com, test#eaxmple.com
The code:
def get_addresses():
with open('./addresses.csv') as f:
addresses_file = csv.reader(f)
# Need to be converted
How can I convert it? I hope to use Python.
tried this.
with open('./addresses.txt') as input, open('./addresses.csv', 'w') as output:
output.write(','.join(input.readlines()))
output.write('\n')
the result:
test#eaxmple.com
,test#eaxmple.com
,test#eaxmple.com
with open('./addresses.txt') as f:
print(",".join(f.read().splitlines()))
Load the original file into pandas using:
import pandas as pd
df = pd.read_csv({YOUR_FILE}, escapechar='\\')
Then export it back to .csv (by default this will be comma separated).
df.to_csv({YOUR_FILE})
For this simple task, just read them into an array, then join the array on commas.
with open('./addresses.txt') as input, open('./addresses.csv', 'w') as output:
output.write(','.join(input.read().splitlines()))
output.write('\n')
This ignores any complications in the CSV formatting - if your data could contain commas (which are reserved as the field separator) or double quotes (which are reserved for quoting other reserved characters) you will want to switch to the proper csv module for output and perhaps for input.
Overwriting your input file is also an unnecessary complication, so I suggest you rename the input file to addresses.txt and use addresses.csv only for output.
Demo: https://repl.it/repls/AdequateStunningVideogames
Another common trick is to read one line at a time, and write a separator before each output except the first. This is more scalable for large input files.
with open blah blah blah ...:
separator = '' # for first line
for line in input:
output.write(separator)
output.write(line)
separator = ',' # for subsequent input lines
output.write('\n')
I have quite a messy txt file which I need to convert to a dataframe to use as reference data. An Excerpt is shown below:
http://amdc.in2p3.fr/nubase/nubase2016.txt
I've cleaned it up the best I can but to cut a long story short I would like to space delimit most of each line and then fixed delimit the last column. i.e. ignore the spaces in the last section.
Cleaned Data Text File
Can anyone point me in the right direction of a resource which can do this? Not sure if Pandas copes with this?
Kenny
P.S. I have found some great resources to clean up the multiple whitespaces and replace the line breaks. Sorry can't find the original reference, so see attached.
fin = open("Input.txt", "rt")
fout = open("Ouput.txt", "wt")
for line in fin:
fout.write(re.sub(' +', ' ', line).strip() + "\n")
fin.close()
fout.close()
So what i would do is very simple, i would clean up the data as much as possible and then convert it to a csv file, because they are easy to use. An then i would step by step load it into a pandas dataframe and change if it needed.
with open("NudatClean.txt") as f:
text=f.readlines()
import csv
with open('dat.csv', 'w', newline='') as file:
writer = csv.writer(file)
for i in text:
l=i.split(' ')
row=[]
for a in l:
if a!='':
row.append(a)
print(row)
writer.writerow(row)
That should to the job for the beginning. But I don't know what you want remove exactly so I think the rest should be pretty clear.
The way I managed to do this was split the csv into two parts then recombine. Not particularly elegant but did the job I needed.
Split by Column
I have a file in the following format
name#company.com, information
name#company2.com, information
....
What I need to do is read in the file and output the email address only to a file. I have the following code created
with open ('n-emails.txt') as f:
lines = f.readlines()
print (lines)
Can someone please show me how to only get the email part of the file and how to output it to a file this is all done on a mac.
2 different ways of doing it:
without csv module: read each line, split according to tokens, strip the blanks, print:
with open ('n-emails.txt') as f:
for line in f:
toks = line.split(",")
if toks:
print(toks[0].strip())
with the csv module, map the opened file on a csv reader, iterate on the rows, print first (stripped) row.
import csv
with open ('n-emails.txt') as f:
cr = csv.reader(delimiter=",")
for row in cr:
print(row[0].strip())
the second method has the advantage of being robust to commas contained in cells, quotes, ... that's why I recommend it.
I have a text file (.txt) which could be in tab separated format or pipe separated format, and I need to convert it into CSV file format. I am using python 2.6. Can any one suggest me how to identify the delimiter in a text file, read the data and then convert that into comma separated file.
Thanks in advance
I fear that you can't identify the delimiter without knowing what it is. The problem with CSV is, that, quoting ESR:
the Microsoft version of CSV is a textbook example of how not to design a textual file format.
The delimiter needs to be escaped in some way if it can appear in fields. Without knowing, how the escaping is done, automatically identifying it is difficult. Escaping could be done the UNIX way, using a backslash '\', or the Microsoft way, using quotes which then must be escaped, too. This is not a trivial task.
So my suggestion is to get full documentation from whoever generates the file you want to convert. Then you can use one of the approaches suggested in the other answers or some variant.
Edit:
Python provides csv.Sniffer that can help you deduce the format of your DSV. If your input looks like this (note the quoted delimiter in the first field of the second row):
a|b|c
"a|b"|c|d
foo|"bar|baz"|qux
You can do this:
import csv
csvfile = open("csvfile.csv")
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.DictReader(csvfile, dialect=dialect)
for row in reader:
print row,
# => {'a': 'a|b', 'c': 'd', 'b': 'c'} {'a': 'foo', 'c': 'qux', 'b': 'bar|baz'}
# write records using other dialect
Your strategy could be the following:
parse the file with BOTH a tab-separated csv reader and a pipe-separated csv reader
calculate some statistics on resulting rows to decide which resultset is the one you want to write. An idea could be counting the total number of fields in the two recordset (expecting that tab and pipe are not so common). Another one (if your data is strongly structured and you expect the same number of fields in each line) could be measuring the standard deviation of number of fields per line and take the record set with the smallest standard deviation.
In the following example you find the simpler statistic (total number of fields)
import csv
piperows= []
tabrows = []
#parsing | delimiter
f = open("file", "rb")
readerpipe = csv.reader(f, delimiter = "|")
for row in readerpipe:
piperows.append(row)
f.close()
#parsing TAB delimiter
f = open("file", "rb")
readertab = csv.reader(f, delimiter = "\t")
for row in readerpipe:
tabrows.append(row)
f.close()
#in this example, we use the total number of fields as indicator (but it's not guaranteed to work! it depends by the nature of your data)
#count total fields
totfieldspipe = reduce (lambda x,y: x+ y, [len(f) for f in piperows])
totfieldstab = reduce (lambda x,y: x+ y, [len(f) for f in tabrows])
if totfieldspipe > totfieldstab:
yourrows = piperows
else:
yourrows = tabrows
#the var yourrows contains the rows, now just write them in any format you like
Like this
from __future__ import with_statement
import csv
import re
with open( input, "r" ) as source:
with open( output, "wb" ) as destination:
writer= csv.writer( destination )
for line in input:
writer.writerow( re.split( '[\t|]', line ) )
I would suggest taking some of the example code from the existing answers, or perhaps better use the csv module from python and change it to first assume tab separated, then pipe separated, and produce two output files which are comma separated. Then you visually examine both files to determine which one you want and pick that.
If you actually have lots of files, then you need to try to find a way to detect which file is which.
One of the examples has this:
if "|" in line:
This may be enough: if the first line of a file contains a pipe, then maybe the whole file is pipe separated, else assume a tab separated file.
Alternatively fix the file to contain a key field in the first line which is easily identified - or maybe the first line contains column headers which can be detected.
for line in open("file"):
line=line.strip()
if "|" in line:
print ','.join(line.split("|"))
else:
print ','.join(line.split("\t"))