Split into two columns and convert txt text into a csv file


I have the following data:
Graudo. A selection of Pouteria caimito, a minor member...
TtuNextrecod. A selection of Pouteria caimito, a minor member of the Sapotaceae...
I want to split it into two columns
Column1 Column2
------------------------------------------------------------------------------
Graudo A selection of Pouteria caimito, a minor member...
TtuNextrecod A selection of Pouteria caimito, a minor member of the Sapotaceae...
Need help with the code. Thanks,
import csv # convert
import itertools # functions for efficient looping
with open('Abiutxt.txt', 'r') as in_file:
    lines = in_file.read().splitlines() # returns a list of the lines as strings, without the line breaks
test = [line.split('. ')for line in lines ] #split period....but...need work
print(test)
stripped = [line.replace('', '').split('. ')for line in lines ]
grouped = itertools.izip(*[stripped]*1)
with open('logtestAbiutxt.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(('Column1', 'Column2'))
    for group in grouped:
        writer.writerows(group)

I am not sure you need zipping here at all. Simply iterate over every line of the input file, skip empty lines, split by the period and write to the csv file:
import csv
with open('Abiutxt.txt', 'r') as in_file:
    with open('logtestAbiutxt.csv', 'w') as out_file:
        writer = csv.writer(out_file, delimiter="\t")
        writer.writerow(['Column1', 'Column2'])
        for line in in_file:
            if not line.strip():
                continue
            writer.writerow(line.strip().split(". ", 1))
Notes:
I specified a tab as the delimiter, but you can change it as needed.
Thanks to @PatrickHaugh for the idea to split on the first occurrence of ". " only, since your second column may contain periods as well.
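The effect of the second argument to split can be checked without touching any files. This is only a minimal sketch: split_record is a hypothetical helper name, and the sample string is a made-up variation of the first line in the question with an extra period added to the description.

```python
def split_record(line):
    """Split a line into [name, description] on the first '. ' only."""
    return line.strip().split(". ", 1)

# Made-up sample: the description itself contains a '. '
sample = "Graudo. A selection of Pouteria caimito. A minor member..."

# With maxsplit=1 the description stays in one piece
print(split_record(sample))
# Without maxsplit the description is cut at every '. '
print(sample.split(". "))
```

With maxsplit=1 every row has at most two fields, which is what the two-column header expects.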

This should get you what you want. This will handle all the escaping.
import csv
with open('Abiutxt.txt', 'r') as in_file:
    x = in_file.read().splitlines()
x = [line.split('. ', 1) for line in x if line]
with open('logtestAbiutxt.csv', "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(['Column1', 'Column2'])
    writer.writerows(x)

Related

How to sort the values (from smallest to larger) of a column in an ascii file using python?

I have an ASCII file with the following columns :
ID, val1, val2, val3
where ID is a row number, but the rows are not sorted. I want to write a new ASCII file with the same columns, sorted by ID (from smallest to largest).
How could I do that in Python?
In fact, this file has been produced by the concatenation of 2 ascii files using the following code:
import os.path
maindir1="/home/d01/"
maindir2="/home/d02/"
outdir="/home/final/"
pols=[ "F1","F2","F3" ]
months=["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
for ipol in pols:
    for imonth in months:
        for kk in range(1, 7):
            template_args = {"ipol": ipol, "imonth": imonth, "kk": kk}
            filename = "{ipol}_{imonth}_0{kk}_L1.txt".format(ipol=ipol, imonth=imonth, kk=kk)
            out_name = os.path.join(outdir, filename)
            in_names = [os.path.join(maindir1, filename), os.path.join(maindir2, filename)]
            with open(out_name, "w") as out_file:
                for in_name in in_names:
                    with open(in_name, "r") as in_file:
                        out_file.write(in_file.read())
How could I change the above code so that it writes the final file sorted by the first column?

Assuming Comma Separated Values
I think you're talking about a Comma Separated Values (CSV) file. The character encoding is probably ASCII. If this is true, you'll have an input like this:
id,val1,val2,val3
3,a,b,c
1,a,b,c
2,a,b,c
Python has a good standard library for this: csv.
import csv
with open("in.csv") as f:
    reader = csv.reader(f)
We import the csv library first, then open the file using a context manager. Basically, it's a nice way to open the file, do stuff (in the with block) and then close it.
The csv.reader function takes the file object f as an argument. The reader can be iterated over and represents the contents of your file. If you convert it to a list, you get a list of lists. The first item in the list of lists is the header, which you want to save, and the rest is the contents:
contents = list(reader)
header = contents[0]
rows = contents[1:]
You then want to sort the rows. But sorting a list of lists might not do what you expect. You need to write a function that helps you find the key to use to perform the sorting:
lambda line: line[0]
This means for every line (which we expect to be a list), the key is equal to the first member of the list. If you prefer not to use lambdas, you can also define a function:
def get_key(line):
    return line[0]
get_key is identical to the lambda.
Combine this all together to get:
new_file = sorted(rows, key=lambda line: line[0])
If you didn't use the lambda, that's:
new_file = sorted(rows, key=get_key)
To write it to a file, you can use the csv library again. Remember to first write the header then the rest of the contents:
with open("out.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(new_file)
All together, the code looks like this:
import csv
with open("in.txt") as f:
    reader = csv.reader(f)
    contents = list(reader)
header = contents[0]
rows = contents[1:]
new_file = sorted(rows, key=lambda line: line[0])
with open("out.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(new_file)
Assuming Custom
If the file is custom and definitely has the spaces in the header like you described (almost like a CSV) or you don't want to use the csv library, you can extract rows like this:
contents = [row.replace(" ", "").split(",") for row in f.readlines()]
If, for instance, it's space-delimited instead of comma-delimited, you would use this:
contents = [row.split() for row in f.readlines()]
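You can write rows like this:
with open("out.csv", "w") as f:
    # strip() drops any newline left on the last field, so each row ends with exactly one "\n"
    f.write(", ".join(field.strip() for field in header) + "\n")
    for row in new_file:
        f.write(", ".join(field.strip() for field in row) + "\n")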
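All together:
with open("in.txt") as f:
    contents = [row.replace(" ", "").split(",") for row in f.readlines()]
header = contents[0]
rows = contents[1:]
new_file = sorted(rows, key=lambda line: line[0])
with open("out.csv", "w") as f:
    # strip() drops the newline left on the last field, so each row ends with exactly one "\n"
    f.write(", ".join(field.strip() for field in header) + "\n")
    for row in new_file:
        f.write(", ".join(field.strip() for field in row) + "\n")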
Hope that helps!
EDIT: This would perform a lexicographical sort on the first column (so "10" sorts before "2"), which is probably not what you want. If you can guarantee that every value in the first column (aside from the header) is an integer, you can convert it from a str:
lambda line: line[0]
...becomes:
lambda line: int(line[0])
...with full code:
import csv
with open("in.txt") as f:
    reader = csv.reader(f)
    contents = list(reader)
header = contents[0]
rows = contents[1:]
new_file = sorted(rows, key=lambda line: int(line[0]))
with open("out.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(new_file)
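To see the difference between the two keys on a small case, here is a sketch with made-up rows (the IDs 10, 2 and 1 are chosen only to expose the problem and are not from the question's file):

```python
# Made-up rows as csv.reader would return them: every field is a str
rows = [["10", "a"], ["2", "b"], ["1", "c"]]

# Comparing the IDs as strings puts "10" before "2"
lexicographic = sorted(rows, key=lambda line: line[0])
# Converting to int gives the numeric order
numeric = sorted(rows, key=lambda line: int(line[0]))

print([row[0] for row in lexicographic])  # ['1', '10', '2']
print([row[0] for row in numeric])        # ['1', '2', '10']
```

Note that int() raises ValueError on a non-numeric ID, so this only works if the header row has been separated from the data first, as in the code above.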

So, you need to sort your CSV data in ascending order by ID.
You can use this function to do it:
def Sort(sub_li):
    sub_li.sort(key = lambda x: x[0])
    return sub_li
x[0] sorts by the first column (the ID); change the index to suit your use case.
I took this as the input:
a = [["1a", 122323,1000,0],
["6a", 12323213,24,2],
["3a", 1233,1,3]]
So, using the above function I got the output as
[['1a', 122323, 1000, 0],
['3a', 1233, 1, 3],
['6a', 12323213, 24, 2]]
I hope this will help.

Sorting names in a text file, writing results to another text file

I have a csv file containing a few names written on one line, separated by commas with no spaces, e.g. "maho,baba,fika,anst,koka,root". What I would like to do is sort these names alphabetically and write them to a new text file, so the result looks like this:
anst
baba
fika
etc.
This is my attempt, which did not work:
names = list()
filename = 'users.csv'
with open (filename) as fin:
    for line in fin:
        names.append(line.strip())
names.sort()
print(names)
filename = 'names_sorted1.txt'
with open (filename, 'w') as fout:
    for name in names:
        fout.write(name + '\n')

You are trying to sort names, which will only contain one string: the entire chunk of comma-separated text. What you need is a way to separate it into a list of individual names, which can be done with the split method:
in_filename = 'users.csv'
with open(in_filename) as fin:
    names = sorted(fin.read().strip().split(','))
Then, we can use the join method to combine the list into one long string again, where each element from the list is separated from the next by '\n':
out_filename = 'names_sorted1.txt'
with open(out_filename, 'w') as fout:
    fout.write('\n'.join(names) + '\n')

You can use this one-liner:
with open("names.csv") as f, open("new.csv", "w") as fw:
    [fw.write(x+"\n") for x in sorted([x for x in ",".join(f.readlines()).split(",")])]
Contents of new.csv:
anst
baba
fika
koka
maho
root

You could use the csv module like,
$ cat names.csv
maho,baba,fika,anst,koka,root
$ cat sort_names.py
import csv
with open('names.csv') as csvfile, open('new.txt', 'w') as f:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        print(row)
        for word in sorted(row):
            f.write("{}\n".format(word))
$ python sort_names.py
$ cat new.txt
anst
baba
fika
koka
maho
root

Modify field in csv file with Python

I am attempting to remove special characters from a specific column within my csv file, but I can't figure out a way to specify the column I would like to change. Here is what I have:
import csv
input_file = open('src/list.csv', 'r')
output_file = open('src/list_new.csv', 'w')
data = csv.reader(input_file)
writer = csv.writer(output_file, quoting=csv.QUOTE_ALL) # dialect='excel')
specials = '#'
for line in data:
    line = str(line)
    new_line = str.replace(line, specials, '')
    writer.writerow(new_line.split(','))
input_file.close()
output_file.close()
Instead of searching through the whole file, how can I specify the column ("Names") from which I would like to remove the special characters?

Maybe use csv.DictReader? Then you can refer to the column by name.
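A rough sketch of what that could look like. It assumes the file has a header row containing a "Names" column; the sample data and the second column ("City") are made up, and io.StringIO objects stand in for src/list.csv and src/list_new.csv so the example runs on its own.

```python
import csv
import io

# In-memory stand-ins for the input and output files (sample data is made up)
input_file = io.StringIO("Names,City\n#Al#ice,Oslo\nBob#,#Rome\n")
output_file = io.StringIO()

specials = '#'
reader = csv.DictReader(input_file)
writer = csv.DictWriter(output_file, fieldnames=reader.fieldnames,
                        quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writeheader()
for row in reader:
    # Only the "Names" column is cleaned; other columns are left untouched
    for ch in specials:
        row["Names"] = row["Names"].replace(ch, '')
    writer.writerow(row)

print(output_file.getvalue())
```

Because each row is a dict keyed by the header names, the '#' in the "City" column survives while the ones in "Names" are removed.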

Replace character in line inside a file

I have these different lines with values in a text file
sample1:1
sample2:1
sample3:0
sample4:15
sample5:500
and I want the number after the ":" to be updated sometimes
I know I can split the name by ":" and get a list with 2 values.
f = open("test.txt","r")
lines = f.readlines()
lineSplit = lines[0].split(":",1)
lineSplit[1] #this is the value I want to change
I'm not quite sure how to update the lineSplit[1] value and write it back to the file.

You can use the fileinput module, if you're trying to modify the same file:
>>> strs = "sample4:15"
Take advantage of sequence unpacking to store the results in variables after splitting.
>>> sample, value = strs.split(':')
>>> sample
'sample4'
>>> value
'15'
Code:
import fileinput
for line in fileinput.input(filename, inplace = True):
    sample, value = line.split(':')
    value = int(value) #convert value to int for calculation purpose
    if some_condition:
        # do some calculations on sample and value
        # modify sample, value if required
        pass
    # now write the data (either modified or still the old one) back to the file
    print("{}:{}".format(sample, value))
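For a version that can be run as-is, here is a sketch that works on a temporary file. The condition (only sample3 is changed) and the update (adding 10) are hypothetical stand-ins for some_condition and the calculation placeholders above.

```python
import fileinput
import os
import tempfile

# Create a throwaway file with part of the sample data from the question
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("sample1:1\nsample2:1\nsample3:0\n")

# With inplace=True, whatever is printed inside the loop replaces the file contents
for line in fileinput.input(path, inplace=True):
    sample, value = line.rstrip("\n").split(":", 1)
    if sample == "sample3":        # hypothetical condition
        value = int(value) + 10    # hypothetical update
    print("{}:{}".format(sample, value))

with open(path) as f:
    result = f.read()
os.remove(path)
print(result)
```

Every line has to be printed, changed or not, because lines that are not printed are dropped from the rewritten file.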

Strings are immutable, meaning you can't assign new values inside them by index.
But you can split the whole file into a list of lines and replace individual lines (strings) entirely. That is what the assignment to lineSplit[1] does here:
with open(filename, 'r') as f:
    lines = f.read().splitlines()
for i, line in enumerate(lines):
    if condition:
        lineSplit = line.split(':')
        lineSplit[1] = str(new_integer)  # join() needs strings
        lines[i] = ':'.join(lineSplit)
with open(filename, 'w') as f:
    f.write('\n'.join(lines))

Maybe something like this (assuming that the first element before each : is indeed a unique key):
from collections import OrderedDict
with open('fin') as fin:
    # strip the newline so the values do not carry it around
    samples = OrderedDict(line.rstrip('\n').split(':', 1) for line in fin)
samples['sample3'] = 'something else'
with open('output', 'w') as fout:
    lines = (':'.join(el) + '\n' for el in samples.items())
    fout.writelines(lines)

Another option is to use csv module (: is a column delimiter in your case).
Assuming there is a test.txt file with the following content:
sample1:1
sample2:1
sample3:0
sample4:15
sample5:500
And you need to increment each value. Here's how you can do it:
import csv
# read the file
with open('test.txt', 'r') as f:
    reader = csv.reader(f, delimiter=":")
    lines = [line for line in reader]
# write the file
with open('test.txt', 'w') as f:
    writer = csv.writer(f, delimiter=":")
    for line in lines:
        # edit the data here
        # e.g. increment each value
        line[1] = int(line[1]) + 1
    writer.writerows(lines)
The contents of test.txt are now:
sample1:2
sample2:2
sample3:1
sample4:16
sample5:501
But, anyway, fileinput sounds more logical to use in your case (editing the same file).
Hope that helps.

Removing specific text from every line

I have a txt file with this format:
something text1 pm,bla1,bla1
something text2 pm,bla2,bla2
something text3 am,bla3,bla3
something text4 pm,bla4,bla4
and in a new file I want to hold:
bla1,bla1
bla2,bla2
bla3,bla3
bla4,bla4
I have this, which keeps, for example, the first 10 characters of every line. Can I adapt it, or is there another approach?
with open('example1.txt', 'r') as input_handle:
    with open('example2.txt', 'w') as output_handle:
        for line in input_handle:
            output_handle.write(line[:10] + '\n')

This is what the csv module was made for.
import csv
reader = csv.reader(open('file.csv'))
for row in reader: print(row[1])
You can then just redirect the output of the file to the new file using your shell, or you can do something like this instead of the last line:
with open('out.csv', 'w') as f:  # open once, outside the loop, so rows are not overwritten
    for row in reader:
        f.write(row[1]+'\n')

To remove the first ","-separated column from the file:
first, sep, rest = line.partition(",")
if rest: # don't write lines with less than 2 columns
    output_handle.write(rest)
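Put into a complete loop, the fragment above could look like the following sketch. The io.StringIO objects are stand-ins for example1.txt and example2.txt, and the third input line is made up to show what happens when a line has no comma.

```python
import io

# In-memory stand-ins for example1.txt and example2.txt
input_handle = io.StringIO(
    "something text1 pm,bla1,bla1\n"
    "something text2 pm,bla2,bla2\n"
    "no comma on this line\n"
)
output_handle = io.StringIO()

for line in input_handle:
    # partition splits at the first "," only; rest keeps the trailing newline
    first, sep, rest = line.partition(",")
    if rest:  # empty when the line has no comma
        output_handle.write(rest)

print(output_handle.getvalue())
```

Since rest still ends with the original line break, no extra '\n' needs to be added when writing.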

If the format is fixed:
with open('example1.txt', 'r') as input_handle:
    with open('example2.txt', 'w') as output_handle:
        for line in input_handle:
            if line.strip(): # skip blank lines; maybe add some other format check
                od = line.split(',', 1)
                output_handle.write(od[1]) # od[1] already ends with the line break

Here is how I would write it.
Python 2.7
import csv
with open('example1.txt', 'rb') as f_in, open('example2.txt', 'wb') as f_out:
    writer = csv.writer(f_out)
    for row in csv.reader(f_in):
        writer.writerow(row[-2:]) # keeps the last two columns
Python 3.x (note the differences in arguments to open)
import csv
with open('example1.txt', 'r', newline='') as f_in:
    with open('example2.txt', 'w', newline='') as f_out:
        writer = csv.writer(f_out)
        for row in csv.reader(f_in):
            writer.writerow(row[-2:]) # keeps the last two columns

Try:
output_handle.write(line.split(",", 1)[1])
From the docs:
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements).
