I have multiple txt files, and each of these txt files has 6 columns. What I want to do: add just one column as the last column, so that in the end the txt file has at most 7 columns, and if I run the script again it shouldn't add a new one.
At the beginning each file has six columns:
637.39 718.53 155.23 -0.51369 -0.18539 0.057838 3.209840789730089
636.56 720 155.57 -0.51566 -0.18487 0.056735 3.3520643559939938
635.72 721.52 155.95 -0.51933 -0.18496 0.056504 3.4997850701290125
What I want is to add a new column of zeros, but only if the current number of columns is 6; after that, it should not add another column when I run the script again (7 columns is the total, where the last one is all zeros):
637.39 718.53 155.23 -0.51369 -0.18539 0.057838 3.209840789730089 0
636.56 720 155.57 -0.51566 -0.18487 0.056735 3.3520643559939938 0
635.72 721.52 155.95 -0.51933 -0.18496 0.056504 3.4997850701290125 0
My code works and adds one additional column each time I run the script, but I want to add it just once, when the number of columns is 6. Here (a) gives me the number of columns, and if the condition is fulfilled a new column should be added:
from glob import glob
import numpy as np

new_column = [0] * 20

def get_new_line(t):
    l, c = t
    return '{} {}\n'.format(l.rstrip(), c)

def writecolumn(filepath):
    # Load data from file
    with open(filepath) as datafile:
        lines = datafile.readlines()
    a = np.loadtxt(lines, dtype='str').shape[1]
    print(a)
    #if a==6: (here is the problem)
    n, r = divmod(len(lines), len(new_column))
    column = new_column * n + new_column[:r]
    new_lines = list(map(get_new_line, zip(lines, column)))
    with open(filepath, "w") as f:
        f.writelines(new_lines)

if __name__ == "__main__":
    filepaths = glob("/home/experiment/*.txt")
    for path in filepaths:
        writecolumn(path)
When I check the number of columns with if a==6 and shift the content inside the if statement, I get an error. Without shifting the content inside the if, everything works fine but still adds one column each time I run it.
Any help is appreciated.
To test the code, create one or two txt files with six columns of random numbers.
Could be an indentation problem, i.e. the block below the 'if'. Writing the new lines should be indented properly.
This works:
def writecolumn(filepath):
    # Load data from file
    with open(filepath) as datafile:
        lines = datafile.readlines()
    a = np.loadtxt(lines, dtype='str').shape[1]
    print(a)
    if int(a) == 6:
        n, r = divmod(len(lines), len(new_column))
        column = new_column * n + new_column[:r]
        new_lines = list(map(get_new_line, zip(lines, column)))
        with open(filepath, "w") as f:
            f.writelines(new_lines)
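As an aside, since every appended value is 0, the new_column list and the divmod arithmetic aren't strictly needed. A minimal sketch of the same fix without them (same file layout assumed):

from glob import glob
import numpy as np

def writecolumn(filepath):
    with open(filepath) as datafile:
        lines = datafile.readlines()
    # count the columns; only pad files that still have six
    if np.loadtxt(lines, dtype='str').shape[1] == 6:
        with open(filepath, "w") as f:
            f.writelines('{} 0\n'.format(line.rstrip()) for line in lines)

if __name__ == "__main__":
    for path in glob("/home/experiment/*.txt"):
        writecolumn(path)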
Use pandas to read your text file:
import pandas as pd
df = pd.read_csv("whitespace.csv", header=None, delimiter=" ")
Add a column or more as needed
df['somecolname'] = 0
Save DataFrame with no header.
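Putting those pieces together for this question, a minimal sketch, assuming the files are single-space separated and that the zero column should only be added when a file still has six columns:

import pandas as pd
from glob import glob

for path in glob("/home/experiment/*.txt"):
    df = pd.read_csv(path, header=None, delimiter=" ")
    if df.shape[1] == 6:            # only add the zero column once
        df[df.shape[1]] = 0         # new last column of zeros
        df.to_csv(path, sep=" ", header=False, index=False)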
Related
Analysis software I'm using outputs many groups of results in one csv file and separates the groups with 2 empty lines.
I would like to break the results into groups so that I can then analyse them separately.
I'm sure there is a built-in function in Python (or one of its libraries) that does this. I tried this piece of code that I found somewhere, but it doesn't seem to work:
import csv
results = open('03_12_velocity_y.csv').read().split("\n\n")
# Feed first csv.reader
first_csv = csv.reader(results[0], delimiter=',')
# Feed second csv.reader
second_csv = csv.reader(results[1], delimiter=',')
Update:
The original code actually works, but my Python skills are pretty limited and I did not implement it properly.
The .split('\n\n\n') method does work, but csv.reader returns an iterator object; to get the data into a list (or something similar), you need to iterate through all the rows and append them to the list.
I then used Pandas to remove the header and convert the scientific-notation values to float. The code is below. Thanks everyone for the help.
import csv
import pandas as pd

# Open the csv file, read it and split it when it encounters 2 empty lines (\n\n\n)
results = open('03_12_velocity_y.csv').read().split('\n\n\n')

# Create csv.reader objects that are used to iterate over rows in a csv file
# Define the output - create an empty multi-dimensional list
output1 = [[],[]]

# Iterate through the rows in the csv file and append the data to the empty list
# Feed first csv.reader
csv_reader1 = csv.reader(results[0].splitlines(), delimiter=',')
for row in csv_reader1:
    output1.append(row)

df = pd.DataFrame(output1)
# remove first 7 rows of data (the start position of the slice is always included)
df = df.iloc[7:]
# Convert all data from string to float
df = df.astype(float)
If your row counts are inconsistent across groups, you'll need a little state machine to check when you're between groups and do something with the last group.
#!/usr/bin/env python3
import csv

def write_group(group, i):
    with open(f"group_{i}.csv", "w", newline="") as out_f:
        csv.writer(out_f).writerows(group)

with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    group_i = 1
    group = []
    last_row = []
    for row in reader:
        if row == [] and last_row == [] and group != []:
            write_group(group, group_i)
            group = []
            group_i += 1
            continue
        if row == []:
            last_row = row
            continue
        group.append(row)
        last_row = row
    # flush remaining group
    if group != []:
        write_group(group, group_i)
I mocked up this sample CSV (with two empty lines separating the groups, as described):
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3


g2r1c1,g2r1c2,g2r1c3
g2r2c1,g2r2c2,g2r2c3


g3r1c1,g3r1c2,g3r1c3
g3r2c1,g3r2c2,g3r2c3
g3r3c1,g3r3c2,g3r3c3
g3r4c1,g3r4c2,g3r4c3
g3r5c1,g3r5c2,g3r5c3
And when I run the program above I get three CSV files:
group_1.csv
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3
group_2.csv
g2r1c1,g2r1c2,g2r1c3
g2r2c1,g2r2c2,g2r2c3
group_3.csv
g3r1c1,g3r1c2,g3r1c3
g3r2c1,g3r2c2,g3r2c3
g3r3c1,g3r3c2,g3r3c3
g3r4c1,g3r4c2,g3r4c3
g3r5c1,g3r5c2,g3r5c3
If your row counts are consistent, you can do this with fairly vanilla Python or using the Pandas library.
Vanilla Python
Define your group size and the size of the break (in "rows") between groups.
Loop over all the rows adding each row to a group accumulator.
When the group accumulator reaches the pre-defined group size, do something with it, reset the accumulator, and then skip break-size rows.
Here, I'm writing each group to its own numbered file:
import csv

group_sz = 5
break_sz = 2

def write_group(group, i):
    with open(f"group_{i}.csv", "w", newline="") as f_out:
        csv.writer(f_out).writerows(group)

with open("input.csv", newline="") as f_in:
    reader = csv.reader(f_in)
    group_i = 1
    group = []
    for row in reader:
        group.append(row)
        if len(group) == group_sz:
            write_group(group, group_i)
            group_i += 1
            group = []
            for _ in range(break_sz):
                try:
                    next(reader)
                except StopIteration:  # gracefully ignore an expected StopIteration (at the end of the file)
                    break
group_1.csv
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3
g1r4c1,g1r4c2,g1r4c3
g1r5c1,g1r5c2,g1r5c3
With Pandas
I'm new to Pandas, and learning this as I go, but it looks like Pandas will automatically skip blank rows/records when reading a chunk of data.
With that in mind, all you need to do is specify the size of your group, and tell Pandas to read your CSV file in "iterator mode", where you can ask for a chunk (your group size) of records at a time:
import pandas as pd

group_sz = 5
with pd.read_csv("input.csv", header=None, iterator=True) as reader:
    i = 1
    while True:
        try:
            df = reader.get_chunk(group_sz)
        except StopIteration:
            break
        df.to_csv(f"group_{i}.csv")
        i += 1
Pandas adds an index column and a default header when it writes out the CSV:
group_1.csv
,0,1,2
0,g1r1c1,g1r1c2,g1r1c3
1,g1r2c1,g1r2c2,g1r2c3
2,g1r3c1,g1r3c2,g1r3c3
3,g1r4c1,g1r4c2,g1r4c3
4,g1r5c1,g1r5c2,g1r5c3
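If you don't want that index column and 0,1,2 header in the output files, to_csv takes index and header flags; the same loop with both suppressed:

import pandas as pd

group_sz = 5
with pd.read_csv("input.csv", header=None, iterator=True) as reader:
    i = 1
    while True:
        try:
            df = reader.get_chunk(group_sz)
        except StopIteration:
            break
        # index=False drops the row-number column, header=False the 0,1,2 header
        df.to_csv(f"group_{i}.csv", index=False, header=False)
        i += 1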
Try this out with your output:
import pandas as pd

# csv file name to be read in
in_csv = 'input.csv'

# get the number of lines of the csv file to be read
number_lines = sum(1 for row in open(in_csv))

# size of rows of data to write to the csv,
# you can change the row size according to your need
rowsize = 500

# start looping through data writing it to a new file for each set
for i in range(1, number_lines, rowsize):
    df = pd.read_csv(in_csv,
                     header=None,
                     nrows=rowsize,   # number of rows to read at each loop
                     skiprows=i)      # skip rows that have been read

    # csv to write data to a new file with indexed name: input1.csv etc.
    out_csv = 'input' + str(i) + '.csv'
    df.to_csv(out_csv,
              index=False,
              header=False,
              mode='a')  # append data to csv file
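As an aside, pandas can do this chunking itself via the chunksize argument, which avoids re-reading the file from the top on each pass; a minimal sketch with the same row size:

import pandas as pd

rowsize = 500
# chunksize makes read_csv yield DataFrames of up to `rowsize` rows each
for i, chunk in enumerate(pd.read_csv('input.csv', header=None, chunksize=rowsize), start=1):
    chunk.to_csv('input_{}.csv'.format(i), index=False, header=False)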
I updated the question with the last details that answer my question.
I've got this dataset in a .csv file:
https://www.dropbox.com/s/2kzpzkhoiolhnlc/output.csv?dl=0
19,3,12
3
12
16,4
26,15,8,3
2
8
15
20
12,25,20,2,16
12,16
12,25
2,16
1,12
16,4
11,19,25,20
11,20,16,21
25,20,21
.....
For each row, if there are fewer than 51 numbers, I need to add ? until the row has 51 values. For example, in the first row I have 19,3,12, so I have to add 48 ? to get a row like this: 19,3,12,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?
In the second row I have just one number, so I have to add 50 ?, the same for the other rows. Could you help me please?
EDIT: I've tried this, but it didn't work; it just added "" to some rows:
import pandas as pd
df = pd.read_csv('output.csv', sep=';')
df = df.fillna('?')
df.to_csv('sorted2.csv', index=False)
You can do it with just text-file manipulation if you want; no need to use pandas or the csv module for this simple case.
with open('source.csv') as f:
    with open('result.csv', 'w') as fw:
        for line in f:
            line = line.strip() + (',?' * (50 - line.count(',')))
            fw.write(line + '\n')
Use pandas to read the file in and set the number of columns you want. The following reads in a file and assigns n columns; the missing elements will by default have the value np.nan:

import pandas as pd
df = pd.read_csv('file', names=range(n))

If you want them to have a different value you can assign it with

df.fillna(value, inplace=True)

Then you can just write the dataframe back to the file and it will have the shape you want:

df.to_csv('file', index=False, header=False)
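Putting that together for this dataset, a minimal sketch, assuming rows never exceed 51 values and ? as the fill value:

import pandas as pd

# short rows are padded with NaN up to the 51 named columns
df = pd.read_csv('output.csv', header=None, names=range(51))
df = df.fillna('?')  # replace the padding with ?
df.to_csv('sorted2.csv', index=False, header=False)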
I have a script that goes through all the folders within a given path ('C:\Users\Razvi\Desktop\italia'), reads the number of lines from each file found with the name "cnt.csv", and then writes into another file named "counters.csv" the current date, the name of the folder, and the number of lines found in "cnt.csv".
Now the output file("counters.csv") looks like this:
30/9/2017
8dg5 5
8dg7 7
01/10/2017
8dg5 8
8dg7 10
The names 8dg5 and 8dg7 are the folders where the script found the file "cnt.csv", and the numbers 5, 7, 8, 10 are the line counts found in the csv files each time I ran the script; the date when I run the script is obviously appended every time.
Now what I want is to write those dates to the csv appended as columns, not rows, something like this:
30/9/2017 01/10/2017
8dg5 5 8dg5 8
8dg7 7 8dg7 10
Here is the code:
import re
import os
from datetime import datetime

#italia =r'C:\Users\timraion\Desktop\italia'
italia = r'C:\Users\Razvi\Desktop\italia'
file = open(r'C:\Users\Razvi\Desktop\counters.csv', 'a')
now = datetime.now()
dd = str(now.day)
mm = str(now.month)
yyyy = str(now.year)
date = (dd + "/" + mm + "/" + yyyy)
file.write(date + '\n')
for filename in os.listdir(italia):
    infilename = os.path.join(italia, filename)
    for files in os.listdir(infilename):
        if files.endswith("cnt.csv"):
            result = os.path.join(infilename, "cnt.csv")
            print(result)
            infilename2 = os.path.join(infilename, files)
            lines = 0
            for line in open(infilename2):
                lines += 1
            file = open(r'C:\Users\Razvi\Desktop\counters.csv', 'a')
            file.write(str(filename) + "," + str(lines) + "\n")
            file.close()
Thanks!
If you want to add the new entries as columns instead of rows further down, you'll have to read in the counters.csv file each time, append your columns, and rewrite it.
import os
from datetime import datetime
import csv

italia = r'C:\Users\Razvi\Desktop\italia'

counts = []
for filename in os.listdir(italia):
    infilename = os.path.join(italia, filename)
    for files in os.listdir(infilename):
        if files.endswith("cnt.csv"):
            result = os.path.join(infilename, "cnt.csv")
            print(result)
            infilename2 = os.path.join(infilename, files)
            lines = 0
            for line in open(infilename2):
                lines += 1
            counts.append((filename, lines))  # save the filename and count for later

if os.path.exists(r'C:\Users\Razvi\Desktop\counters.csv'):
    with open(r'C:\Users\Razvi\Desktop\counters.csv', 'r') as csvfile:
        counters = [row for row in csv.reader(csvfile)]  # read the csv file
else:
    counters = [[]]

now = datetime.now()
# Add the date and a blank cell to the first row
counters[0].append('{}/{}/{}'.format(now.day, now.month, now.year))
counters[0].append('')

for i, count in enumerate(counts):
    if i + 1 == len(counters):  # check if we need to add a row
        counters.append([''] * (len(counters[0]) - 2))
    while len(counters[i+1]) < len(counters[0]) - 2:
        counters[i+1].append('')  # ensure there are enough columns in the row we're on
    counters[i+1].extend(count)  # Add the count info as two new cells

with open(r'C:\Users\Razvi\Desktop\counters.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)  # write the new csv file
    for row in counters:
        writer.writerow(row)
Another note: file is a builtin name in Python 2, so you shouldn't use it as a variable name; shadowing builtins can cause unexpected behaviour further down in your code.
I'm new to Python and looking for a script that reformats a .csv file. In my .csv files there are rows which are not formatted correctly; it looks similar to this:
id,author,text,date,id,author,
text,date
id,author,text,date
id,author,text,date
It's supposed to have "id,author,text,date" on each line. My idea was to count the fields in each row, and when a specific number is reached (in this example 4), insert the remainder at the beginning of the next row. What I've got so far counts the fields in each row:
import csv

with open("test.csv") as f:
    r = csv.reader(f)  # create rows split on commas
    for row in r:
        com_count = 0
        com_count += len(row)
        print(com_count)
Thanks for your help!
We're going to build a generator that yields entries and then build the new rows from that
import csv

with open('oldfile.csv', newline='') as old:
    r = csv.reader(old)
    num_cols = int(input("How many columns: "))
    # flatten the csv into a single stream of entries
    entry_generator = (entry for row in r for entry in row)
    with open('newfile.csv', 'w+', newline='') as newfile:
        w = csv.writer(newfile)
        while True:
            try:
                w.writerow([next(entry_generator) for _ in range(num_cols)])
            except StopIteration:
                break
This will not work if you have a row that is missing entries.
If you want to handle getting the column width programmatically, you can either wrap this in a function that takes a width as input, or use the first row of the csv as a canonical length
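A sketch of the second option, using the first row's width as the canonical length instead of asking for input (assuming the file starts with at least one complete row):

import csv
from itertools import chain, islice

with open('oldfile.csv', newline='') as old:
    r = csv.reader(old)
    first = next(r)         # take the first row's width as canonical
    num_cols = len(first)
    # put the first row's entries back in front of the remaining entries
    entries = chain(first, (entry for row in r for entry in row))
    with open('newfile.csv', 'w', newline='') as newfile:
        w = csv.writer(newfile)
        while True:
            chunk = list(islice(entries, num_cols))
            if not chunk:
                break
            w.writerow(chunk)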
The unique.txt file contains 2 columns separated by tabs; the total.txt file contains 3 columns, each separated by tabs.
I take each row from unique.txt and look for it in total.txt. If present, I want to extract the entire row from total.txt and save it in a new output file.
###Total.txt
column a column b column c
interaction1 mitochondria_205000_225000 mitochondria_195000_215000
interaction2 mitochondria_345000_365000 mitochondria_335000_355000
interaction3 mitochondria_345000_365000 mitochondria_5000_25000
interaction4 chloroplast_115000_128207 chloroplast_35000_55000
interaction5 chloroplast_115000_128207 chloroplast_15000_35000
interaction15 2_10515000_10535000 2_10505000_10525000
###Unique.txt
column a column b
mitochondria_205000_225000 mitochondria_195000_215000
mitochondria_345000_365000 mitochondria_335000_355000
mitochondria_345000_365000 mitochondria_5000_25000
chloroplast_115000_128207 chloroplast_35000_55000
chloroplast_115000_128207 chloroplast_15000_35000
mitochondria_185000_205000 mitochondria_25000_45000
2_16595000_16615000 2_16585000_16605000
4_2785000_2805000 4_2775000_2795000
4_11395000_11415000 4_11385000_11405000
4_2875000_2895000 4_2865000_2885000
4_13745000_13765000 4_13735000_13755000
My program:
file = open('total.txt')
file2 = open('unique.txt')
all_content = file.readlines()
all_content2 = file2.readlines()
store_id_lines = []
ff = open('match.dat', 'w')
for i in range(len(all_content)):
    line = all_content[i].split('\t')
    seq = line[1] + '\t' + line[2]
    for j in range(len(all_content2)):
        if all_content2[j] == seq:
            ff.write(seq)
            break
Problem:
Instead of giving the desired output (including the first-column value of the rows that fulfil the if condition), it writes only the matched columns. I need something like: if the jth row of unique.txt equals columns b and c of the ith row of total.txt, then write the ith row of total.txt into the new file.
import csv

with open('unique.txt') as uniques, open('total.txt') as total:
    # the files are tab-separated, so tell csv.reader about the delimiter
    uniques = list(tuple(line) for line in csv.reader(uniques, delimiter='\t'))
    totals = {}
    for line in csv.reader(total, delimiter='\t'):
        totals[tuple(line[1:])] = line

with open('output.txt', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    for line in uniques:
        writer.writerow(totals.get(line, []))
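If you'd rather write only the rows that actually match (instead of a blank row per miss), a small variant of the same idea:

import csv

with open('unique.txt') as uniques, open('total.txt') as total:
    # set of (column b, column c) pairs we are looking for
    wanted = {tuple(line) for line in csv.reader(uniques, delimiter='\t')}
    matches = [line for line in csv.reader(total, delimiter='\t')
               if tuple(line[1:]) in wanted]

with open('output.txt', 'w', newline='') as outfile:
    csv.writer(outfile, delimiter='\t').writerows(matches)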
I would write your code this way:
file = open('total.txt')
list_file = list(file)
file2 = open('unique.txt')
list_file2 = list(file2)
store_id_lines = []
ff = open('match.dat', 'w')
for curr_line_total in list_file:
    line = curr_line_total.split('\t')
    seq = line[1] + '\t' + line[2]
    if seq in list_file2:
        ff.write(curr_line_total)
Please avoid readlines() and use the with syntax when you open your files. You usually don't need readlines() at all: a file object is already iterable line by line, and readlines() loads the whole file into memory at once.
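For example, the whole matching loop above can iterate over the file objects directly, with no readlines() or list() at all (a sketch, same tab-separated files assumed):

with open('total.txt') as total, open('unique.txt') as unique, open('match.dat', 'w') as ff:
    unique_lines = set(unique)          # one read; each entry keeps its trailing newline
    for curr_line_total in total:       # file objects yield one line at a time
        line = curr_line_total.split('\t')
        seq = line[1] + '\t' + line[2]  # columns b and c; the newline stays on line[2]
        if seq in unique_lines:
            ff.write(curr_line_total)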