I am new to Python and have been stuck on this problem for two days. I tried looking for a basic answer but couldn't find one, so I finally decided to ask my own question.
I want to concatenate the values of only the first two rows of my CSV file (if possible with the help of built-in modules).
Any kind of help would be appreciated. Thanks in advance.
Below is my sample CSV file, without headers:
1,Suraj,Bangalore
2,Ahuja,Karnataka
3,Rishabh,Bangalore
Desired Output:
1 2,Suraj Ahuja,Bangalore Karnataka
3,Rishabh,Bangalore
Just create a csv.reader object (and a csv.writer object). Then use next() on the first 2 rows and zip them together (using list comprehension) to match the items.
Then process the rest of the file normally.
import csv

with open("file.csv") as fr, open("output.csv", "w", newline='') as fw:
    cr = csv.reader(fr)
    cw = csv.writer(fw)
    title_row = [" ".join(z) for z in zip(next(cr), next(cr))]
    cw.writerow(title_row)
    # dump the rest as-is
    cw.writerows(cr)
(you'll get an exception if the file has only 1 row of course)
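If you want to guard against that, here is a minimal sketch (same placeholder file names as above) that uses next() with a default instead of letting it raise StopIteration:

import csv

with open("file.csv") as fr, open("output.csv", "w", newline='') as fw:
    cr = csv.reader(fr)
    cw = csv.writer(fw)
    first = next(cr, None)
    second = next(cr, None)
    if first is None:
        pass                      # empty file: nothing to write
    elif second is None:
        cw.writerow(first)        # only one row: write it unchanged
    else:
        cw.writerow([" ".join(z) for z in zip(first, second)])
    cw.writerows(cr)              # dump the rest as-is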
You can use zip() for your first 2 lines like below:
with open('f.csv') as f:
    lines = f.readlines()
    res = ""
    for i, j in zip(lines[0].strip().split(','), lines[1].strip().split(',')):
        res += "{} {},".format(i, j)
    print(res.rstrip(','))
    for line in lines[2:]:
        print(line, end='')
Output:
1 2,Suraj Ahuja,Bangalore Karnataka
3,Rishabh,Bangalore
with open('file2', 'r') as f, open('file2_new', 'w') as f2:
    lines = [a.split(',') for a in f.read().splitlines() if a.strip()]
    newl2 = [[' '.join(x) for x in zip(lines[0], lines[1])]] + lines[2:]
    for a in newl2:
        f2.write(','.join(a) + '\n')
I have this data in my sample.txt file:
A2B3,32:45:63
A4N6,17:72:35
S2R3,13:14:99
What I want to do is put that data into a list, but I'm having problems separating the values at the commas.
My code goes like this:
with open('sample.txt', 'r') as f:
    for line in f:
        x = f.read().splitlines()
        print(x)
And the output goes like this:
['A2B3,32:45:63','A4N6,17:72:35','S2R3,13:14:99']
I altered my code in different ways to separate the values at the commas, but I can't seem to make it work. Can someone help me achieve this output?
['A2B3','32:45:63','A4N6','17:72:35','S2R3','13:14:99']
Use line.split(',') to separate the line at the ",":
x = []
with open('sample.txt', 'r') as f:
    for line in f:
        for j in line.split(','):
            x.append(j.split('\n')[0])
print(x)
Use this code, which splits the lines into a list like you have, and then splits those items at the comma.
filename = "sample.txt"
with open(filename) as file:
lines = file.read().split("\n")
output = []
for l in lines:
for j in l.split(","):
output.append(j)
print(output)
Output:
['A2B3', '32:45:63', 'A4N6', '17:72:35', 'S2R3', '13:14:99']
You probably could just do:
data = list()
with open('sample.txt', 'r') as f:
    for line in f.readlines():
        data.append(line)
And you should end up with a list of the lines (note that this doesn't split them at the commas yet). One caveat: .readlines() does load the whole file into memory; if that's a concern on big files, iterate over the file object directly with for line in f: instead.
Yes, it's very simple...
After splitting the file into lines, you get a list that looks like
['A2B3,32:45:63','A4N6,17:72:35','S2R3,13:14:99']
Then you split each element at the comma (,) and add the pieces to a new list, like this:
list_a = ['A2B3,32:45:63','A4N6,17:72:35','S2R3,13:14:99']
final_list = []
for i in list_a:
    part_1, part_2 = i.split(',')
    final_list.append(part_1)
    final_list.append(part_2)
print(final_list)
And it will give your desired output:
['A2B3','32:45:63','A4N6','17:72:35','S2R3','13:14:99']
It's not the most compact way, but it's very easy to understand.
Thank You :)
Here you go, just iterating once over the lines:
res = []
with open('sample.txt', 'r') as f:
    for line in f:
        res += line.strip().split(",")
print(res)
Gives:
['A2B3', '32:45:63', 'A4N6', '17:72:35', 'S2R3', '13:14:99']
Though I wonder why you'd want everything in one flat list; I think you lose the link between the items. It might be more useful to keep them in tuples like this:
res = []
with open('sample.txt', 'r') as f:
    for line in f:
        res.append(tuple(line.strip().split(",")))
print(res)
Gives:
[('A2B3', '32:45:63'), ('A4N6', '17:72:35'), ('S2R3', '13:14:99')]
From my point of view this result is easier to work with. But never mind, I guess you'll find your solution in one of the answers posted here.
x = [i.replace("\n", "").split(',') for i in open('data.txt', 'r')]
print(x)
print(x[0][1])
I am new to python and have only started working with files. I am wondering how to combine the data of two files into one list using list comprehension to read and combine them.
#for instance line 1 of galaxies = I
#line 1 of cycles = 0
#output = [IO] (list)
This is what I have so far. Thanks in advance!
comlist =[line in open('galaxies.txt') and line in open('cycles.txt')]
Update:
comlist = [mylist.append(gline[i]+cline[i]) for i in range(r)]
However, this only gives me a list of None values.
Like this:
#from itertools import chain
def chainer(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element
comlist = list(chainer(open('galaxies.txt'), open('cycles.txt')))
print(comlist)
Although leaving files open like that isn't generally considered a good practice.
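If you want the files closed deterministically, the same idea can be wrapped in with blocks. A small sketch using itertools.chain directly, with the same file names as above:

from itertools import chain

# the with blocks guarantee both files are closed when we're done
with open('galaxies.txt') as f1, open('cycles.txt') as f2:
    comlist = list(chain(f1, f2))
print(comlist)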
You can use zip to combine iterables
https://docs.python.org/3/library/functions.html#zip
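For example, a minimal sketch (assuming both files have the same number of lines and you want line 1 of one file glued to line 1 of the other, as in the question):

with open('galaxies.txt') as f1, open('cycles.txt') as f2:
    # zip pairs up corresponding lines and stops at the shorter file
    comlist = [gline.strip() + cline.strip() for gline, cline in zip(f1, f2)]
print(comlist)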
If it's only 2 files, why do you want a comprehension at all? Something like this would be easier:
[l for l in open('galaxies.txt')]+[l for l in open('cycles.txt')]
The question is, what if you had n files? Let's say in a list ... fileList = ['f1.txt', 'f2.txt', ... , 'fn.txt']. Then you may consider itertools.chain:
import itertools as it

filePointers = [open(f) for f in fileList]
lines = list(it.chain.from_iterable(filePointers))
for fp in filePointers:
    fp.close()
I haven't tested it, but this should work ...
f1 = open('galaxies.txt')
f2 = open('cycles.txt')
If you want to combine them by alternating the lines, use zip and comprehension:
comlist = [line for two_lines in zip(f1, f2) for line in two_lines]
You need two iterations here because the return value from zip is itself an iterable, in this case consisting of two lines, one from f1 and one from f2. You can combine two iterations in a single comprehension as shown.
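The comprehension is equivalent to this nested loop, which may make the two iterations easier to see:

comlist = []
for two_lines in zip(f1, f2):   # two_lines is a tuple: (line from f1, line from f2)
    for line in two_lines:
        comlist.append(line)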
If you want to combine them one after the other, use "+" for concatenation:
comlist = [line for line in f1] + [line for line in f2]
In both cases, it's a good practice to close each file:
f1.close()
f2.close()
You can achieve your task with lambda and map.
I assume data in in_file (the first file) like this:
1 2
3 4
5 6
7 8
And data in in_file2 (the second file) like this:
hello there!
And with this piece of code:
# file 1
a = "in_file"
# file 2
b = "in_file2"
f = lambda x,y: (open(x, 'r'),open(y, 'r'))
# replacing "\n" with an empty string
data = [k for k in map(lambda x:x.read().replace("\n",""), f(a,b))]
print(data)
The output will be:
['1 23 45 67 8', 'hello there!']
However, it's not good practice to leave files open like this.
Using only list comprehensions:
[line for file in (open('galaxies.txt'), open('cycles.txt')) for line in file]
However, it is bad practice to leave files open and hope the GC cleans them up. You should really do something like this:
import fileinput
with fileinput.input(files=('galaxies.txt', 'cycles.txt')) as f:
    comlist = f.readlines()
If you want to strip end of line characters a good way is with line.rstrip('\r\n').
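For example, a small sketch combining that with the fileinput snippet above:

import fileinput

# strip the line endings while reading the two files in sequence
with fileinput.input(files=('galaxies.txt', 'cycles.txt')) as f:
    comlist = [line.rstrip('\r\n') for line in f]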
This seems like a very basic question, but I am new to Python, and after spending a long time trying to find a solution on my own, I thought it was time to ask some more advanced people!
So, I have a file (sample):
ENSMUSG00000098737 95734911 95734973 3 miRNA
ENSMUSG00000077677 101186764 101186867 4 snRNA
ENSMUSG00000092727 68990574 68990678 11 miRNA
ENSMUSG00000088009 83405631 83405764 14 snoRNA
ENSMUSG00000028255 145003817 145032776 3 protein_coding
ENSMUSG00000028255 145003817 145032776 3 processed_transcript
ENSMUSG00000028255 145003817 145032776 3 processed_transcript
ENSMUSG00000098481 38086202 38086317 13 miRNA
ENSMUSG00000097075 126971720 126976098 7 lincRNA
ENSMUSG00000097075 126971720 126976098 7 lincRNA
and I need to write a new file with all the same information, but sorted by the first column.
What I use so far is :
lines = open(my_file, 'r').readlines()
output = open("intermediate_alphabetical_order.txt", 'w')
for line in sorted(lines, key=itemgetter(0)):
    output.write(line)
output.close()
It doesn't give me any error, but it just writes the output file exactly like the input file.
I know it is certainly a very basic mistake, but it would be amazing if some of you could tell me what I'm doing wrong!
Thanks a lot!
Edit
I am having trouble with the way I open the file, so the answers concerning already opened arrays don't really help.
The problem you're having is that you're not turning each line into a list. When you read in the file, you're just getting the whole line as a string. You're then sorting by the first character of each line, and this is always the same character in your input, 'E'.
To sort by just the first column, you need to split each line and take only that first field. So your key should be this:
for line in sorted(lines, key=lambda line: line.split()[0]):
split will turn your line into a list, and then the first column is taken from that list.
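Put together with the rest of the question's code, a minimal sketch might look like this (keeping the question's file names):

with open(my_file) as infile:
    lines = infile.readlines()

with open("intermediate_alphabetical_order.txt", 'w') as output:
    # sort on the first whitespace-separated field of each line
    for line in sorted(lines, key=lambda line: line.split()[0]):
        output.write(line)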
If your input file is tab-separated, you can also use the csv module.
import csv
from operator import itemgetter
reader = csv.reader(open("t.txt"), delimiter="\t")

for line in sorted(reader, key=itemgetter(0)):
    print(line)
This sorts by the first column.
Change the number in key=itemgetter(0) to sort by a different column.
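For example, to sort by the second column instead; note that csv gives you strings, so a numeric column needs converting if you want a numeric sort (a small sketch reusing the same reader setup):

reader = csv.reader(open("t.txt"), delimiter="\t")

# sort by the second column; the values are strings, so convert for a numeric sort
for line in sorted(reader, key=lambda row: int(row[1])):
    print(line)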
Same idea as SuperBiasedMan, but I prefer this approach: if you want another kind of ordering (for example: if the first columns match, sort by the second, then the third, etc.), it is more easily implemented.
with open(my_file) as f:
    lines = [line.split(' ') for line in f]

output = open("result.txt", 'w')
for line in sorted(lines):
    output.write(' '.join(line))
output.close()
You can write a function that takes a filename, delimiter and column to sort by using csv.reader to parse the file:
from operator import itemgetter
import csv
def sort_by(fle, col, delim):
    with open(fle) as f:
        r = csv.reader(f, delimiter=delim)
        for row in sorted(r, key=itemgetter(col)):
            yield row

for row in sort_by("your_file", 2, "\t"):
    print(row)
You can do this quickly with pandas as follows, with the data file set up exactly as you show it (i.e., with variable spaces as separators):
import pandas as pd
df = pd.read_csv('csvdata.csv', sep=' ', skipinitialspace=True, header=None)
df.sort_values(by=[0], inplace=True)  # df.sort(columns=[0], ...) in very old pandas versions
df.to_csv('sorted_csvdata.csv', header=None, index=None)
Just to check the result:
with open('sorted_csvdata.csv', 'r') as f:
    print(f.read())
ENSMUSG00000028255,145003817,145032776,3,protein_coding
ENSMUSG00000028255,145003817,145032776,3,processed_transcript
ENSMUSG00000028255,145003817,145032776,3,processed_transcript
ENSMUSG00000077677,101186764,101186867,4,snRNA
ENSMUSG00000088009,83405631,83405764,14,snoRNA
ENSMUSG00000092727,68990574,68990678,11,miRNA
ENSMUSG00000097075,126971720,126976098,7,lincRNA
ENSMUSG00000097075,126971720,126976098,7,lincRNA
ENSMUSG00000098481,38086202,38086317,13,miRNA
ENSMUSG00000098737,95734911,95734973,3,miRNA
You can do multi-column sorting by adding additional columns to the list in the by=[...] keyword argument.
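For example, to break ties on the first column using the second (same DataFrame as above):

df.sort_values(by=[0, 1], inplace=True)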
Here is another option. Similar to some of the ideas above. Basically, mysort is a function that does the custom sorting for you, which here is based on the first whitespace-separated field of each line:
def mysort(line):
    return line.split()[0]

with open("records.txt", "r") as f:
    text = f.readlines()

for line in sorted(text, key=mysort):
    print(line)
I have a csv file that looks like this:
123456,456789,12345,123.45,123456
123456,456789,12345,123.45,123456
123456,456789,12345,123.45,123456
I am extremely new to Python programming, but I'm learning and finding Python to be very useful. I basically want the output to look like this:
123456 456789 12345 123.45 123456
123456 456789 12345 123.45 123456
123456 456789 12345 123.45 123456
Basically, all fields right-justified, with a fixed length. There are no headings in the CSV file.
Here's the code I have tried so far and like I said, I'm very new to Python:
import csv
with open('test.csv') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        print(', '.join(row))
        with open('test2.txt', 'wb') as f:
            writer = csv.writer(f)
            writer.writerows(f)
Any help would be greatly appreciated: Thank You in advance.
OK you have a mess of problems with your code:
Your indentation is all wrong. Indentation is one of the basic concepts of Python; go search the web and read a little about it if you don't understand what I mean.
The part that opens 'test2.txt' is inside the spamreader loop, meaning it is re-opened and truncated for every row in 'test.csv'.
You are trying to write the file to itself with this line: writer.writerows(f) (remember? f is the file you are writing to...).
You are using a csv.writer to write lines to a txt file.
You want spacing between the items, but you're not doing that anywhere in your code.
So to sum up all those problems, here's a fixed example, which is really not that far away from your code as it is:
import csv

res = []
# start a loop to collect the data
with open('test.csv') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        line = '\t'.join(row) + '\r\n'  # the \n is for line breaks; \r is so Notepad loves you too
        res.append(line)

# now, outside the loop, we can do this:
with open('test2.txt', 'wb') as f:
    f.writelines(res)
EDIT
If you want to control the spacing you can use the ljust function like this:
line = ''.ljust(2).join(row)
This will make sure there are 2 spaces between the items. A space is the default fill character, but if you want to specify what ljust uses, you can add a second parameter:
line = ''.ljust(5, '-').join(row)
Then each line would look like this:
123456-----456789-----12345-----123.45-----123456
And thanks to Philippe T., who mentioned it in the comments.
2nd Edit
If you want a different spacing for each column you need to predefine it. The best way would be to create a list with the same length as your CSV columns, with each item being the separator that should follow that column and the last one being the line ending (which is convenient because ''.join doesn't add that by itself), then zip it with your row. Say you want a tab after the first column, then two spaces between each of the other columns. Then your code would look like this:
spacing = ['\t', ' ', ' ', ' ', '\r\n']
# ... the same code from before ...
line = ''.join([j for i in zip(row, spacing) for j in i])
# ... rest of the code ...
The list comprehension loop is a bit convoluted, but think about it like this:
for i in zip(row, spacing):  # the zip here equals ==> [(item1, '\t'), (item2, '  ') ...]
    for j in i:              # now i == (item1, '\t')
        j                    # so j is just the items of each tuple
With the list comprehension, this outputs: [item1, '\t', item2, '  ', ... ]. You join that together and that's it.
Try this:
import csv
with open('data.csv') as fin, open('out.txt', 'w') as fout:
    data = csv.reader(fin, delimiter=',')
    resl = csv.writer(fout, delimiter='\t')
    resl.writerows(data)
Another user was kind enough to help me with a script that reads in a file, removes/replaces '::', and moves columns to headers.
(I am reposting it as it may be useful to someone in this form; my question follows.)
import csv

with open('infile', "rb") as fin, open('outfile', "wb") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        split = [item.split("::") for item in line if item.strip()]
        if not split:  # blank line
            continue
        keys, vals = zip(*split)
        if i == 0:
            # first line: write header
            writer.writerow(keys)
        writer.writerow(vals)
I was not aware that the last column of this file had the following text at the end:
{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
How do I modify this existing code to take the above and:
1. Remove the brackets { }
2. Convert the '[%2C]' to a ',' - making it comma-delimited like the rest of the file
3. Produce 'Xa Ya Za' and 'Xb Yb Zb' as headers for the values liberated in #2
The above text is the input file. Output from the original script produces this:
{StartPoint,EndPoint}
7858.35924983374[%2C]1703.69341358077[%2C]-3.075, 7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
Is it possible to insert a simple strip command in there?
Thanks, I appreciate your guidance - I am a Python newbie
It seems like you're looking for the replace method on strings: http://docs.python.org/2/library/stdtypes.html#str.replace
Perhaps something like this after the zip:
keys = [key.replace('{', '') for key in keys]
vals = [val.split('[%2C]') for val in vals]
I'm not sure whether csv.writer will handle the nested list that vals becomes after this, but you get the idea. If it doesn't, vals needs to be flattened into a flat list. Perhaps something like this would flatten it out:
vals = [item for sublist in vals for item in sublist]
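Putting those fragments back into the loop from the question, an untested sketch might look like this (the extra '}' removal is an assumption based on the question, and the new Xa/Ya/Za header names would still need to be appended to keys on the first row):

for i, line in enumerate(reader):
    split = [item.split("::") for item in line if item.strip()]
    if not split:  # blank line
        continue
    keys, vals = zip(*split)
    keys = [key.replace('{', '') for key in keys]                 # drop the opening braces
    vals = [val.replace('}', '').split('[%2C]') for val in vals]  # drop closing braces, split the coordinates
    vals = [item for sublist in vals for item in sublist]         # flatten the nested list
    if i == 0:
        writer.writerow(keys)
    writer.writerow(vals)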