I have a csv file which is a list as seen here:
A
B
C
D
E
F
And I would like to transform it into a list of pairs like this:
AB
CD
EF
What is the simplest way to achieve this?
An alternative approach uses itertools.islice; it avoids reading the whole file into memory at once:
import csv
from itertools import islice

CHUNK = 2

def chunks():
    with open("test.csv", newline="") as f:
        reader = csv.reader(f)
        # Grab CHUNK rows at a time; the loop ends when islice yields nothing
        while chunk := tuple(islice(reader, CHUNK)):
            # zip(*chunk) transposes the rows, so each pair joins into one string
            yield "".join(*zip(*chunk))

def main():
    print(list(chunks()))

if __name__ == "__main__":
    main()
Note:
The walrus operator (:=) is available in Python 3.8+; in earlier versions you'll need something like this:
chunk = tuple(islice(reader, CHUNK))
while chunk:
    yield "".join(*zip(*chunk))
    chunk = tuple(islice(reader, CHUNK))
The easiest way is probably to put each line of your file in a list and then build a list of half the size containing your pairs. Since your .csv file appears to have only one column, the file format doesn't really matter.
Now, I assume that you have the file eggs.csv in the same directory as your Python file:
A
B
C
D
E
F
The following code produces the expected output:
output_lines = []
with open('eggs.csv', 'r') as file:
    for first, second in zip(file, file):
        output_lines.append(f'{first.strip()}{second.strip()}')
Because both arguments to zip are the same file iterator, each loop step consumes two consecutive lines. If you execute this code and print output_lines, you will get
['AB', 'CD', 'EF']
Note that if the number of lines is odd, the last line will simply be ignored. I don't know the desired behavior so I just assumed this, but you can easily change the code.
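If you would rather keep an odd trailing line than drop it, a minimal variant (my sketch, not from the original answer) can use itertools.zip_longest:

from itertools import zip_longest

output_lines = []
with open('eggs.csv', 'r') as file:
    # zip_longest pairs the final line of an odd-length file with the
    # fillvalue '' instead of dropping it
    for first, second in zip_longest(file, file, fillvalue=''):
        output_lines.append(f'{first.strip()}{second.strip()}')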
Related
I am new to Python and have only started working with files. I am wondering how to combine the data from two files into one list, using a list comprehension to read and combine them.
#for instance line 1 of galaxies = I
#line 1 of cycles = 0
#output = [IO] (list)
This is what I have so far. Thanks in advance!
comlist =[line in open('galaxies.txt') and line in open('cycles.txt')]
Update:
comlist = [mylist.append(gline[i]+cline[i]) for i in range(r)]
However, this only returns None values.
Like this:
#from itertools import chain
def chainer(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element
comlist = list(chainer(open('galaxies.txt'), open('cycles.txt')))
print(comlist)
Although leaving files open like that isn't generally considered a good practice.
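If you want that hygiene, a with block closes them automatically; a small variant of the same call (my addition, not the original answer's code):

# 'with' guarantees both files are closed, even if an error occurs
with open('galaxies.txt') as f1, open('cycles.txt') as f2:
    comlist = list(chainer(f1, f2))
print(comlist)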
You can use zip to combine iterables
https://docs.python.org/3/library/functions.html#zip
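For instance, a minimal sketch assuming the galaxies.txt and cycles.txt files from the question:

with open('galaxies.txt') as f1, open('cycles.txt') as f2:
    # zip pairs line 1 of f1 with line 1 of f2, and so on
    comlist = [g.strip() + c.strip() for g, c in zip(f1, f2)]
print(comlist)  # e.g. ['I0', ...] for the sample lines in the question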
If it's only 2 files, why do you want to use a comprehension at all? Something like this would be easier:
[l for l in open('galaxies.txt')]+[l for l in open('cycles.txt')]
The question is, what if you had n files? Let's say in a list: fileList = ['f1.txt', 'f2.txt', ... , 'fn.txt']. Then you may consider itertools.chain:
import itertools as it

filePointers = [open(f) for f in fileList]
lines = list(it.chain(*filePointers))  # chain the files end to end
for fp in filePointers:
    fp.close()
I haven't tested it, but this should work ...
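For what it's worth, a more robust sketch for n files (my variant, using contextlib.ExitStack so every file is guaranteed to be closed):

import itertools as it
from contextlib import ExitStack

fileList = ['f1.txt', 'f2.txt']  # hypothetical file names

with ExitStack() as stack:
    # enter_context registers each file so it is closed when the block exits
    filePointers = [stack.enter_context(open(f)) for f in fileList]
    lines = list(it.chain(*filePointers))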
f1 = open('galaxies.txt')
f2 = open('cycles.txt')
If you want to combine them by alternating the lines, use zip and comprehension:
comlist = [line for two_lines in zip(f1, f2) for line in two_lines]
You need two iterations here because the return value from zip is itself an iterable, in this case consisting of two lines, one from f1 and one from f2. You can combine two iterations in a single comprehension as shown.
If you want to combine them one after the other, use "+" for concatenation:
comlist = [line for line in f1] + [line for line in f2]
In both cases, it's a good practice to close each file:
f1.close()
f2.close()
You can achieve your task with lambda and map.
I assume the data in in_file (the first file) looks like this:
1 2
3 4
5 6
7 8
And the data in in_file2 (the second file) like this:
hello there!
And with this piece of code:
# file 1
a = "in_file"
# file 2
b = "in_file2"
f = lambda x,y: (open(x, 'r'),open(y, 'r'))
# replacing "\n" with an empty string
data = [k for k in map(lambda x:x.read().replace("\n",""), f(a,b))]
print(data)
The output will be:
['1 23 45 67 8', 'hello there!']
However, it's not good practice to leave files open like this.
Using only list comprehensions:
[line for file in (open('galaxies.txt'), open('cycles.txt')) for line in file]
However, it is bad practice to leave files open and hope the GC cleans them up. You should really do something like this:
import fileinput
with fileinput.input(files=('galaxies.txt', 'cycles.txt')) as f:
    comlist = f.readlines()
If you want to strip end of line characters a good way is with line.rstrip('\r\n').
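For example, combined with the fileinput snippet above (my variant):

import fileinput

with fileinput.input(files=('galaxies.txt', 'cycles.txt')) as f:
    # rstrip('\r\n') removes both Unix and Windows line endings
    comlist = [line.rstrip('\r\n') for line in f]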
I have some text files that I want to read file by file and line by line, sort, and write to one file in Python. For example:
file 1:
C
D
E
file 2:
1
2
3
4
file 3:
#
$
*
File 4,.......
The result should be like this sequence in one file:
C
1
#
D
2
$
E
3
*
C
4
#
D
1
#
You can use a list of iterators over your files. You then need to cycle through these iterators repeatedly until one of your files has been consumed. You could use a while loop, or, as shown here, itertools.cycle:
import glob
import itertools

fs = glob.glob("./*py")  # Use the glob module to get files matching a pattern
fits = [open(i, "r") for i in fs]  # Create a list of file iterators

with open("blah", "w") as out:
    for f in itertools.cycle(fits):  # Loop over the list until one file is consumed
        try:
            l = next(f).split(" ")
            s = sorted(l)
            out.write(" ".join(s) + "\n")
            print(s)
        except StopIteration:  # Once a file is exhausted, next(f) raises StopIteration and we stop
            break
This looks related to another question you asked (and then deleted).
I'm assuming you want to be able to read a file, create generators, combine generators, sort the output of generators, then write to a file.
Using yield to form your generator makes life a lot easier.
Keep in mind, to sort every line like this, you will have to store it in memory. If dealing with very large files, you will need to handle this in a more memory-conscious way.
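As an aside, if the input files happened to be pre-sorted, heapq.merge could combine them lazily without holding everything in memory (a sketch of that alternative, not part of this answer):

import heapq

# heapq.merge lazily merges already-sorted iterables, so the full
# contents never sit in memory at once
with open('f1.txt') as f1, open('f2.txt') as f2, open('merged.txt', 'w') as out:
    out.writelines(heapq.merge(f1, f2))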
First, let's make your generator that opens a file and reads line-by-line:
def line_gen(file_name):
    with open(file_name, 'r') as f:
        for line in f.readlines():
            yield line.rstrip('\n')  # strip the newline so sorting and the final write behave predictably
Then let's "merge" the generators, by creating a generator which will iterate through each one in order.
def merge_gens(*gens):
    for gen in gens:
        for x in gen:
            yield x
Then we can create our generators:
gen1 = line_gen('f1.txt')
gen2 = line_gen('f2.txt')
Combine them:
comb_gen = merge_gens(gen1, gen2)
Create a list from the generator. (This is the potentially-memory-intensive step.):
itered_list = [x for x in comb_gen]
Sort the list:
sorted_list = sorted(itered_list)
Write to the file:
with open('f3.txt', 'w') as f:
    for line in sorted_list:
        f.write(line + '\n')
I have a list like the one below:
lista=["a","b","c","\n","d","e","f","\n","g","h","i","\n"]
Can someone please advise how I can make the csv module write this so that every "\n" in the list is treated as a line break? To make it simpler, the csv should look like this:
a,b,c
d,e,f
g,h,i
Please let me know if the question is not clear, I will make changes as required.
import csv
import sys
def rows(lst):
    it = iter(lst)
    while True:
        row = list(iter(it.__next__, '\n'))  # collect items until the '\n' sentinel (it.next in Python 2)
        if not row:
            break
        yield row

lista = ["a","b","c","\n","d","e","f","\n","g","h","i","\n"]

writer = csv.writer(sys.stdout)  # Replace sys.stdout with a file object
writer.writerows(rows(lista))
You don't really need the CSV module:
for s in ','.join(lista).split('\n'):
    if s:  # skip the empty chunk after the final '\n'
        print(s.strip(','))
This gives:
a,b,c
d,e,f
g,h,i
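If you want the result in an actual file rather than printed, the same idea works (my sketch, using a hypothetical out.csv):

with open('out.csv', 'w') as f:
    for s in ','.join(lista).split('\n'):
        if s:  # skip the empty chunk after the final '\n'
            f.write(s.strip(',') + '\n')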
I have 125 data files, each containing two columns and 21 rows of data, and I'd like to import them into a single .csv file (as 250 columns and 21 rows).
I am fairly new to Python, but this is what I have been advised, code-wise:
import glob
Results = [open(f) for f in glob.glob("*.data")]
fout = open("res.csv", 'w')
for row in range(21):
    for f in Results:
        fout.write(f.readline().strip())
        fout.write(',')
    fout.write('\n')
fout.close()
However, there is a slight problem with the code: I only get 125 columns (i.e. the force and displacement values from each file end up in a single column).
I'd very much appreciate it if anyone could help me with this !
import glob
results = [open(f) for f in glob.glob("*.data")]
sep = ","
# Uncomment if your Excel formats decimal numbers like 3,14 instead of 3.14
# sep = ";"
with open("res.csv", 'w') as fout:
for row in range(21):
iterator = (f.readline().strip().replace("\t", sep) for f in results)
line = sep.join(iterator)
fout.write("{0}\n".format(line))
So, to explain what went wrong with your code: your source files use a tab as the field separator, but your code joins the values it reads with commas. If your Excel uses a period as the decimal separator, it uses a comma as its default field separator; the tab inside each field is then treated as ordinary whitespace unless enclosed in quotes, so both numbers end up in a single column, which is the result you saw.
If you use the text import feature of Excel (Data ribbon => From Text) you can ask it to consider both comma and tab as valid field separators, and then I'm pretty sure your original output would work too.
In contrast, the above code should produce a file that will open correctly when double clicked.
You don't need to write your own program to do this, in python or otherwise. You can use an existing unix command (if you are in that environment):
paste *.data > res.csv
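Note that paste joins columns with a tab by default; paste -d',' *.data > res.csv would give genuinely comma-separated output.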
Try this:
import glob, csv
from itertools import cycle, islice

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for nxt in nexts:
                yield nxt()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

Results = [open(f).readlines() for f in glob.glob("*.data")]

with open("res.csv", 'w', newline='') as f:
    fout = csv.writer(f, dialect="excel")
    row = []
    for line, c in zip(roundrobin(*Results), cycle(range(len(Results)))):
        row.extend(line.split())     # add this file's fields to the current row
        if c == len(Results) - 1:    # the last file's fields complete the row
            fout.writerow(row)
            row = []
It loops over the lines of your input files in round-robin order and stitches them together into rows, which the csv library writes in the Excel dialect.
I suggest getting used to the csv module. The reason is that if the data is not that simple (simple strings in headings, and then numbers only), it is difficult to implement everything correctly yourself. Try the following:
import csv
import glob
import os
datapath = './data'
resultpath = './result'
if not os.path.isdir(resultpath):
    os.makedirs(resultpath)

# Initialize the empty rows. It does not check how many rows are
# in the file.
rows = []

# Read data from the files to the above matrix.
for fname in glob.glob(os.path.join(datapath, '*.data')):
    with open(fname, newline='') as f:
        reader = csv.reader(f)
        for n, row in enumerate(reader):
            if len(rows) < n + 1:
                rows.append([])      # add another row
            rows[n].extend(row)      # append the elements from the file

# Write the data from memory to the result file.
fname = os.path.join(resultpath, 'result.csv')
with open(fname, 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
The with construct for opening a file can be replaced by the pair:
f = open(fname, 'w', newline='')
...
f.close()
The csv.reader and csv.writer are simply wrappers that parse or compose the lines of the file. In Python 3, the docs say the file should be opened with newline='' (in Python 2, they required binary mode).
Clarification:
So if my file has 10 lines:
The first line is a heading, so I want to append some text at the end of the first line.
Then I have a list which contains 9 elements.
I want to read that list and append each following line with the corresponding element.
So basically list[0] goes to the second line, list[1] to the third line, and so on.
I have a file which is delimited by commas,
something like this:
A,B,C
0.123,222,942
......
Now I want to do something like this:
A,B,C,D #append "D" just once
0.123,222,942,99293
............
This "D" is actually saved in a list so yeah I have this "D"
How do I do this? I mean I know the naive way.
like go thru each line and do something like
string += str(list[i])
Basically how do i append something at the end of the file in pythonic way :)
Just create a new file:
data = ['header', 1, 2, 3, 4]
with open("infile", 'r') as inf, open("infile.2", 'w') as outf:
    outf.writelines('%s,%s\n' % (s.strip(), n) for s, n in zip(inf, data))
If you want to "update" the input file, just rename the new one afterwards
import os
os.unlink("infile")
os.rename("infile.2", "infile")
Short answer: Use the csv module.
Long answer:
import csv
newvalues = [...]
with open("path/to/input.csv") as file:
data = list(csv.reader(file))
with open("path/to/input.csv", "w") as file:
writer = csv.writer(file)
for row, newvalue in zip(data, newvalues):
row.append(newvalue)
writer.writerow(row)
Naturally, this depends on the lines in the file and newvalues being the same length. If this isn't the case, you could use something like itertools.zip_longest to fill in the excess lines with a given value, as sketched below.
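For instance, a small illustrative sketch with made-up data:

from itertools import zip_longest

rows = [['a', '1'], ['b', '2'], ['c', '3']]
newvalues = ['x']  # shorter than rows

# zip_longest pads the missing values with '' instead of dropping rows
for row, newvalue in zip_longest(rows, newvalues, fillvalue=''):
    row.append(newvalue)

print(rows)  # [['a', '1', 'x'], ['b', '2', ''], ['c', '3', '']]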
If you are reading from one file and writing to a different one, we can do it even more easily:
import csv
newvalues = [...]
with open("path/to/input.csv") as from, open("path/to/output.csv", "w") as to:
reader = csv.reader(from)
writer = csv.writer(to)
for row, newvalue in zip(reader, newvalues):
row.append(newvalue)
writer.writerow(row)
This also has the advantage of not reading the entire file into memory, so for very large files, this is a better solution.