I have a text file with some content.
I want to edit only the "Medicalization" column. For example, with a program, if I type B on the keyboard, the "Medicalization" column becomes B:
This column is at position 14 in every line.
I tried something but I get an "index out of range" error:
with open('d:/test.txt','r') as infile:
    with open('d:/test2.txt','w') as outfile:
        for line in infile :
            line = line.split()
            new_line = '"B"\n'.format(line[14])
            outfile.write(new_line)
Is it possible to do this with Python?
Since the data is tabular, you can use pandas.read_csv with sep=r"\s+" and then pandas.DataFrame.loc to replace A with B in the medicalization column.
import pandas as pd
df = pd.read_csv("test.txt", sep=r"\s+")  # raw string avoids an invalid "\s" escape warning
df.loc[df["medicalization"] == "A" ,"medicalization"] = "B"
print(df)
typtpt name medicalization
0 1 Entrance B
1 2 Departure B
2 3 Consultation B
3 4 Meeting B
4 5 Transfer B
And if you want to save it back then use:
df.to_csv('test.txt', sep='\t', index=False)
The 'A' value you wish to change cannot be at column 14 in every line. Look at the 4th row (with 'Consultation' as the name): even with a single space separating the columns, the third column would be at position 17. So your assumption about fixed column positions must be wrong. If, for example, a single space or tab character separates each column, then in the first row of actual data the 'A' value would be at offset 12, which would explain your exception.
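To see why a fixed character position cannot work, here is a small sketch that reports where the third field starts on each line. The two rows are hypothetical, modelled on the pandas output above with two-digit ids and single spaces; the real file's spacing is not shown in the question and may differ.

```python
def third_field_offsets(lines):
    """Return the 0-based character offset at which the third
    whitespace-separated field starts on each line."""
    offsets = []
    for line in lines:
        fields = line.split()
        # Find where the second field ends, then search for the third
        # field from there (good enough for this simple sample).
        start = line.index(fields[1]) + len(fields[1])
        offsets.append(line.index(fields[2], start))
    return offsets


# Hypothetical sample rows; the real file layout is not shown in the question.
sample = ["01 Entrance A", "04 Consultation A"]
print(third_field_offsets(sample))  # [12, 16]
```

The offset depends on the length of the name, so any single hard-coded position will be wrong for most rows.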
Assuming a single space is separating each column from one another, then you could use the csv module as follows:
import csv

with open('d:/test.txt', newline='') as infile:
    with open('d:/test2.txt', 'w', newline='') as outfile:
        rdr = csv.reader(infile, delimiter=' ')
        wtr = csv.writer(outfile, delimiter=' ')
        # copy the header row unchanged:
        header = next(rdr)
        wtr.writerow(header)
        for row in rdr:
            row[2] = 'B'  # third column is Medicalization
            wtr.writerow(row)
Or specify delimiter='\t' if a tab is used to separate the columns.
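If you want to try the tab-delimited variant without touching any files, here is a minimal sketch that runs the same logic on an in-memory string. The function name replace_third_column and the sample text are hypothetical; io.StringIO stands in for the real input and output files.

```python
import csv
import io


def replace_third_column(text, new_value="B", delimiter="\t"):
    """Rewrite every data row (not the header) so its third column is new_value."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        row[2] = new_value
        writer.writerow(row)
    return out.getvalue()


sample = "typtpt\tname\tmedicalization\n1\tEntrance\tA\n2\tDeparture\tA\n"
print(replace_third_column(sample), end="")
```

For real files, replace the two StringIO objects with open(..., newline='') calls as in the answer above.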
If an arbitrary number of whitespace characters (spaces or tabs) separates each column, then:
with open('test.txt') as infile:
    with open('test2.txt', 'w') as outfile:
        first_time = True
        for row in infile:
            columns = row.split()
            if first_time:
                first_time = False
            else:
                columns[2] = 'B'
            print(' '.join(columns), file=outfile)
The index out of range error comes from line = line.split(). This splits on any run of whitespace, so for line 2, for example, line.split() returns a list like ['01', 'Entrance', 'A']. Indexing that list at 14 fails because it only has three elements.
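A quick way to confirm this in the interpreter, using a hypothetical data line in the format described above:

```python
line = "01 Entrance A"  # hypothetical data line from the question's file

fields = line.split()   # splits on any run of whitespace
print(fields)           # ['01', 'Entrance', 'A']
print(len(fields))      # 3, so the valid indices are 0, 1 and 2

try:
    fields[14]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

print(fields[2])        # A  (the Medicalization value)
```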
If your data file's format is consistent (all Medicalization data is in the 3rd column), you can achieve what you're after with pure Python like so:
with open('test.txt', 'r') as infile:
    with open('test2.txt', 'w') as outfile:
        for idx, line in enumerate(infile):
            line = line.split()
            # if idx is 0 it's the header row, so we don't want to change it
            if idx != 0:
                line[2] = '"B"'
            outfile.write(' '.join(line) + '\n')
However, @Hamza's answer using pandas is potentially a nicer one.
Related
I've got multiple CSV files, which I received in the following line format:
-8,000E-04,2,8E+1,
The first and the third comma are meant to be decimal separators, the second comma is a column delimiter, and I think the last one is supposed to indicate a new line. So the CSV should consist of only two columns, and I have to prepare the data in order to plot it. Therefore I need to specify the two columns as x and y to plot the data. I tried removing or replacing the separators in every line, but by doing that I'm no longer able to tell the two columns apart. Is there a way to remove only certain separators from every line of the CSV?
You can split the string you read from each line on commas and rebuild the two numbers as follows:
line="-8,000E-04,2,8E+1,"
list_string = line.split(',')
x= float(list_string[0]+"."+list_string[1])
y= float(list_string[2]+"."+list_string[3])
print(x,y)
Result:
-0.0008 28.0
You can then arrange x and y in columns, or in whatever structure you need.
Here is a short Python program to convert your CSV files:
import csv

f1 = "in_test.csv"
f2 = "out_test.csv"
with open(f1, newline='') as csv_reader:
    reader = csv.reader(csv_reader, delimiter=',')
    with open(f2, mode='w', newline='') as csv_writer:
        writer = csv.writer(csv_writer, delimiter=";")
        for row in reader:
            out_row = [row[0] + '.' + row[1], row[2] + '.' + row[3]]
            writer.writerow(out_row)
Sample input:
-8,000E-04,2,8E+1,
-2,000E-03,2,7E+2,
Sample output:
-8.000E-04;2.8E+1
-2.000E-03;2.7E+2
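Once converted, the semicolon-separated output can be read back as floats and handed to a plotting library as x and y. A sketch, using the sample output above as an in-memory string; the helper name read_xy is hypothetical.

```python
import csv
import io


def read_xy(text):
    """Parse 'x;y' lines (the converted format) into two lists of floats."""
    xs, ys = [], []
    for row in csv.reader(io.StringIO(text, newline=""), delimiter=";"):
        if not row:  # skip blank lines
            continue
        xs.append(float(row[0]))
        ys.append(float(row[1]))
    return xs, ys


converted = "-8.000E-04;2.8E+1\n-2.000E-03;2.7E+2\n"
x, y = read_xy(converted)
print(x, y)  # [-0.0008, -0.002] [28.0, 270.0]
```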
I think you should replace the second comma using a regex. I'm definitely not an expert at it, but I've managed to come up with this:
import re
s = "-8,000E-04,2,8E+1,"
pattern = "^([^,]*,[^,]*),(.*),$"
grps = re.search(pattern, s).groups()
res = [float(s.replace(",", ".")) for s in grps]
print(res)
# [-0.0008, 28.0]
Sample csv file:
-8,000E-04,2,8E+1,
6,0E-6,-45E+2,
-5,550E-6,-6,2E+1,
And you can do something like this:
import re

x = []
y = []
# group 1: everything up to the second comma; group 2: the rest, minus the trailing comma
regex = re.compile("^([^,]*,[^,]*),(.*),$")
with open("a.csv") as f:
    for line in f:
        result = regex.search(line).groups()
        x.append(float(result[0].replace(",", ".")))
        y.append(float(result[1].replace(",", ".")))
The result is:
print(x, y)
# [-0.0008, 6e-06, -5.55e-06] [28.0, -4500.0, -62.0]
I'm not sure this is the most efficient way, but it works.
I have a CSV file with contents:
scenario1,5,dosomething
scenario2,10,donothing
scenario3,8,dosomething
scenario4,5,donothing
I would like to take the contents of a variable and first check whether it is in the first column. If it is, I would like to get the row number where it is found and the entire line's contents. There will be no duplicate values in column 1 of the CSV.
I can partly do the first step, which is to find whether the variable is in the CSV and return the whole line.
import csv

v = 'scenario1'
with open('/file.csv', newline='') as f:
    configfile = csv.reader(f, delimiter=",")
    for row in configfile:
        if v in row[0]:
            print(row)
The results I receive would be:
['scenario1','5','dosomething']
But I need assistance with the second part please. This is to find the row number.
Try this:
import csv

with open("ooo.csv", newline="") as f:
    reader = csv.reader(f)
    for line_num, content in enumerate(reader):
        if content[0] == "scenario1":
            print(content, line_num + 1)
Or without the csv module:
with open("ooo.csv") as f:
    for line_num, line in enumerate(f):
        # drop the trailing newline so the last field is clean
        data = line.rstrip("\n").split(",")
        if data[0] == "scenario1":
            print(data, line_num + 1)
Output:
['scenario1', '5', 'dosomething'] 1
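Because column 1 has no duplicates, you can stop at the first match. This sketch uses enumerate(..., start=1) so the + 1 is not needed; find_row is a hypothetical helper name, and io.StringIO stands in for the real file (pass open('file.csv', newline='') instead).

```python
import csv
import io


def find_row(csv_file, key):
    """Return (row_number, row) for the first row whose first column equals key,
    or None if no row matches. Row numbers start at 1."""
    for row_number, row in enumerate(csv.reader(csv_file), start=1):
        if row and row[0] == key:
            return row_number, row
    return None


data = io.StringIO(
    "scenario1,5,dosomething\n"
    "scenario2,10,donothing\n"
    "scenario3,8,dosomething\n"
    "scenario4,5,donothing\n"
)
print(find_row(data, "scenario3"))  # (3, ['scenario3', '8', 'dosomething'])
```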
I have a CSV file.
There are a fixed number of columns and an unknown number of rows.
The information I need is always in the same 2 columns but not in the same row.
When column 6 has a 17 character value I also need to get the data from column 0.
This is an example row from the CSV file:
E4:DD:EF:1C:00:4F, 2012-10-08 11:29:04, 2012-10-08 11:29:56, -75, 9, 18:35:2C:18:16:ED,
You could open the file and go through it line by line: split each line, and if element 6 has 17 characters, append element 0 to your result list.
res = []
with open(file_name, 'r') as f:
    for line in f:
        L = line.split(',')
        # strip() removes the space that follows each comma
        if len(L[6].strip()) == 17:
            res.append(L[0])
Now res holds the column 0 value of every row whose column 6 has 17 characters.
You can use the csv module to read CSV files, and provide whatever delimiter or dialect you need (comma, pipe, tab, etc.) when creating the reader.
csv.reader gives you each row/record as a list of column values. If you want to access each record as a dict instead, use csv.DictReader.
import csv

res = []
with open('simple.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        # Indices start at 0, so the 6th column is index 5
        # strip() trims the surrounding spaces from the column value
        # Check that the row has more than 5 columns to handle uneven rows
        if len(row) > 5 and len(row[5].strip()) == 17:
            res.append(row[0])
# res holds column 0 of every row whose 6th column is 17 characters long
print(res)
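In the example row, each comma is followed by a space, so the MAC address in the 6th column (index 5) is read as ' 18:35:2C:18:16:ED', which is 18 characters. Instead of calling strip() on each field, the csv module's skipinitialspace option drops that leading space. A sketch; the helper name and its defaults (col=5, width=17) are assumptions, and io.StringIO stands in for the file.

```python
import csv
import io


def first_column_where_width(csv_file, col=5, width=17):
    """Collect column 0 from every row whose column `col` is exactly `width`
    characters long (leading spaces are dropped by the reader)."""
    # skipinitialspace=True makes the reader ignore the space after each comma.
    reader = csv.reader(csv_file, skipinitialspace=True)
    return [row[0] for row in reader if len(row) > col and len(row[col]) == width]


sample = io.StringIO(
    "E4:DD:EF:1C:00:4F, 2012-10-08 11:29:04, 2012-10-08 11:29:56, -75, 9, 18:35:2C:18:16:ED,\n"
)
print(first_column_where_width(sample))  # ['E4:DD:EF:1C:00:4F']
```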
I am trying to select specific columns from a large tab-delimited CSV file and output only certain columns to a new CSV file. Furthermore, I want to recode the data as this happens. If the cell has a value of 0 then just output 0. However, if the cell has a value of greater than 0, then just output 1 (i.e., all values greater than 0 are coded as 1).
Here's what I have so far:
import csv

outputFile = open('output.csv', 'w', newline='')
outputWriter = csv.writer(outputFile)
included_cols = range(9,2844)
with open('source.txt', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        content = list(row[i] for i in included_cols)
        outputWriter.writerow(content)
The first issue I am having is that I also want to take column 6. I wasn't sure how to write column 6 and then columns 9-2844.
Second, I wasn't sure how to do the recoding on the fly as I write the new CSV.
I wasn't sure how to write column 6 and then columns 9-2844.
included_cols = [6] + list(range(9,2844))
This works because you can add two lists together. Note that in Python 3, range doesn't return a list, so we have to convert it.
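An equivalent spelling in Python 3.5 and later uses iterable unpacking inside the list literal instead of list(). Note that range(9, 2844) stops at 2843, so if column 2844 itself should be included, the stop value would need to be 2845.

```python
# Unpack the range directly into the list literal (Python 3.5+).
included_cols = [6, *range(9, 2844)]

print(included_cols[:3])   # [6, 9, 10]
print(included_cols[-1])   # 2843  (range excludes its stop value)
print(len(included_cols))  # 2836
```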
I wasn't sure how to do the recoding on the fly
content = [1 if int(row[i]) > 0 else 0 for i in included_cols]
This works because of the conditional expression 1 if int(row[i]) > 0 else 0. The general form A if cond else B evaluates to either A or B, depending on the condition. The int() call is needed because csv.reader returns every cell as a string.
Another form, which I think is "too clever by half", is content = [int(row[i]) and 1 for i in included_cols]. This works because the and operator always returns one of its operands: if int(row[i]) is 0, and returns that 0; otherwise it returns 1.
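Keep in mind that csv.reader yields strings, and every non-empty string, including '0', is truthy, so a cell has to be converted to a number before it is tested. A sketch of the recoding as a small function; using float() rather than int() is an assumption, in case some cells contain decimals.

```python
def recode(value):
    """Map a CSV cell (a string) to 0 if it is zero, else 1."""
    # Convert first: the string "0" is truthy, the number 0 is not.
    return 1 if float(value) > 0 else 0


row = ["0", "3", "0.0", "12"]
print([recode(v) for v in row])   # [0, 1, 0, 1]
print([(v and 1) for v in row])   # [1, 1, 1, 1]  ('0' is a non-empty string)
```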
This should work:
import csv

included_cols = [5] + list(range(8, 2844))  # you can just merge two lists
with open('source.txt', newline='') as f, open('output.csv', 'w', newline='') as outputFile:
    reader = csv.reader(f, delimiter='\t')
    outputWriter = csv.writer(outputFile)
    header = next(reader)  # csv.reader is not indexable, so take the header with next()
    outputWriter.writerow([header[i] for i in included_cols])  # names of the selected columns
    for row in reader:  # header row already consumed
        content = [int(row[i]) if i == 5 else (0 if int(row[i]) == 0 else 1) for i in included_cols]
        outputWriter.writerow(content)
unique.txt contains 2 columns separated by tabs. total.txt contains 3 columns, also separated by tabs.
I take each row from unique.txt and look for it in total.txt. If it is present, I extract the entire row from total.txt and save it to a new output file.
total.txt:
column a column b column c
interaction1 mitochondria_205000_225000 mitochondria_195000_215000
interaction2 mitochondria_345000_365000 mitochondria_335000_355000
interaction3 mitochondria_345000_365000 mitochondria_5000_25000
interaction4 chloroplast_115000_128207 chloroplast_35000_55000
interaction5 chloroplast_115000_128207 chloroplast_15000_35000
interaction15 2_10515000_10535000 2_10505000_10525000
unique.txt:
column a column b
mitochondria_205000_225000 mitochondria_195000_215000
mitochondria_345000_365000 mitochondria_335000_355000
mitochondria_345000_365000 mitochondria_5000_25000
chloroplast_115000_128207 chloroplast_35000_55000
chloroplast_115000_128207 chloroplast_15000_35000
mitochondria_185000_205000 mitochondria_25000_45000
2_16595000_16615000 2_16585000_16605000
4_2785000_2805000 4_2775000_2795000
4_11395000_11415000 4_11385000_11405000
4_2875000_2895000 4_2865000_2885000
4_13745000_13765000 4_13735000_13755000
My program:
file=open('total.txt')
file2 = open('unique.txt')
all_content=file.readlines()
all_content2=file2.readlines()
store_id_lines = []
ff = open('match.dat', 'w')
for i in range(len(all_content)):
    line=all_content[i].split('\t')
    seq=line[1]+'\t'+line[2]
    for j in range(len(all_content2)):
        if all_content2[j]==seq:
            ff.write(seq)
            break
Problem: instead of giving the desired output (which should include the 1st-column values of the rows that satisfy the if condition), it writes only the matched columns. I need something like: if the jth row of unique.txt == the ith row of total.txt, then write the ith row of total.txt to the new file.
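import csv

with open('unique.txt', newline='') as uniques, open('total.txt', newline='') as total:
    # both files are tab-separated
    uniques = list(tuple(line) for line in csv.reader(uniques, delimiter='\t'))
    totals = {}
    for line in csv.reader(total, delimiter='\t'):
        # key each total.txt row by its 2nd and 3rd columns
        totals[tuple(line[1:])] = line
with open('output.txt', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    for line in uniques:
        if line in totals:  # only write rows that were actually found
            writer.writerow(totals[line])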
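The answer above writes rows in the order of unique.txt. If you would rather keep total.txt's order and emit only the matching rows, a set of (column a, column b) pairs gives fast membership checks. A sketch, assuming both files are tab-separated as the question states; matching_rows is a hypothetical helper name and io.StringIO stands in for the files.

```python
import csv
import io


def matching_rows(total_file, unique_file):
    """Yield each row of total_file whose 2nd and 3rd columns appear
    as a (column a, column b) pair in unique_file."""
    wanted = {tuple(row) for row in csv.reader(unique_file, delimiter="\t") if row}
    for row in csv.reader(total_file, delimiter="\t"):
        if tuple(row[1:3]) in wanted:
            yield row


total = io.StringIO(
    "interaction1\tmitochondria_205000_225000\tmitochondria_195000_215000\n"
    "interaction15\t2_10515000_10535000\t2_10505000_10525000\n"
)
unique = io.StringIO(
    "mitochondria_205000_225000\tmitochondria_195000_215000\n"
    "2_16595000_16615000\t2_16585000_16605000\n"
)
for row in matching_rows(total, unique):
    print("\t".join(row))  # only the interaction1 row matches
```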
I would write your code this way:
with open('total.txt') as file, open('unique.txt') as file2, open('match.dat', 'w') as ff:
    # strip newlines so the comparison doesn't depend on line endings
    list_file2 = [u.rstrip('\n') for u in file2]
    for curr_line_total in file:
        line = curr_line_total.rstrip('\n').split('\t')
        seq = line[1] + '\t' + line[2]
        if seq in list_file2:
            ff.write(curr_line_total)
Please avoid readlines() and use the with syntax when you open your files.
readlines() is not needed because a file object can be iterated over line by line directly.