I have a text file with these lines of values:
sample1:1
sample2:1
sample3:0
sample4:15
sample5:500
and I want to update the number after the ":" from time to time.
I know I can split each line on ":" and get a list with two values:
f = open("test.txt","r")
lines = f.readlines()
lineSplit = lines[0].split(":",1)
lineSplit[1] #this is the value I want to change
I'm not quite sure how to update the lineSplit[1] value and write it back to the file with the write functions.
You can use the fileinput module if you're trying to modify the same file in place:
>>> strs = "sample4:15"
Take advantage of sequence unpacking to store the results in variables after splitting:
>>> sample, value = strs.split(':')
>>> sample
'sample4'
>>> value
'15'
Code:
import fileinput

for line in fileinput.input(filename, inplace=True):
    sample, value = line.split(':')
    value = int(value)  # convert value to int for calculation purposes
    if some_condition:
        # do some calculations on sample and value,
        # and modify them if required
        pass
    # with inplace=True, standard output is redirected into the file,
    # so this writes the data (modified or still the old one) back
    print("{}:{}".format(sample, value))
Strings are immutable, meaning you can't assign new values to them by index.
But you can split the whole file into a list of lines and replace individual lines (strings) entirely, which is what lineSplit[1] = new_integer does below.
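For illustration, here is what happens if you try to assign into a string by index:
>>> s = "sample4:15"
>>> s[8] = '9'
TypeError: 'str' object does not support item assignment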
with open(filename, 'r') as f:
    lines = f.read().splitlines()
for i, line in enumerate(lines):
    if condition:
        lineSplit = line.split(':')
        lineSplit[1] = str(new_integer)  # join() needs strings, not ints
        lines[i] = ':'.join(lineSplit)
with open(filename, 'w') as f:
    f.write('\n'.join(lines))
Maybe something like this (assuming that the first element before ":" on each line is indeed a unique key):
from collections import OrderedDict

with open('fin') as fin:
    samples = OrderedDict(line.strip().split(':', 1) for line in fin)

samples['sample3'] = 'something else'

with open('output', 'w') as fout:
    lines = (':'.join(el) + '\n' for el in samples.items())
    fout.writelines(lines)
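As a side note, in Python 3.7+ regular dictionaries preserve insertion order too, so a plain dict would work just as well as OrderedDict here:
with open('fin') as fin:
    samples = dict(line.strip().split(':', 1) for line in fin)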
Another option is to use the csv module (":" is the column delimiter in your case).
Assuming there is a test.txt file with the following content:
sample1:1
sample2:1
sample3:0
sample4:15
sample5:500
And you need to increment each value. Here's how you can do it:
import csv

# read the file
with open('test.txt', 'r', newline='') as f:
    reader = csv.reader(f, delimiter=":")
    lines = [line for line in reader]

# write the file
with open('test.txt', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=":")
    for line in lines:
        # edit the data here, e.g. increment each value
        line[1] = int(line[1]) + 1
    writer.writerows(lines)
The contents of test.txt now is:
sample1:2
sample2:2
sample3:1
sample4:16
sample5:501
But, anyway, fileinput sounds more logical to use in your case (editing the same file).
Hope that helps.
Related
I have the following data:
Graudo. A selection of Pouteria caimito, a minor member...
TtuNextrecod. A selection of Pouteria caimito, a minor member of the Sapotaceae...
I want to split it into two columns
Column1 Column2
------------------------------------------------------------------------------
Graudo A selection of Pouteria caimito, a minor member...
TtuNextrecod A selection of Pouteria caimito, a minor member of the Sapotaceae...
I need help with the code. Thanks.
import csv  # convert
import itertools  # functions for efficient looping

with open('Abiutxt.txt', 'r') as in_file:
    lines = in_file.read().splitlines()  # returns a list of all the lines as strings, without the line breaks
    test = [line.split('. ') for line in lines]  # split on the period... but... needs work
    print(test)
    stripped = [line.replace('', '').split('. ') for line in lines]
    grouped = itertools.izip(*[stripped]*1)
with open('logtestAbiutxt.csv', 'w') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(('Column1', 'Column2'))
    for group in grouped:
        writer.writerows(group)
I am not sure you need zipping here at all. Simply iterate over every line of the input file, skip empty lines, split by the period and write to the csv file:
import csv

with open('Abiutxt.txt', 'r') as in_file:
    with open('logtestAbiutxt.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file, delimiter="\t")
        writer.writerow(['Column1', 'Column2'])
        for line in in_file:
            if not line.strip():
                continue
            writer.writerow(line.strip().split(". ", 1))
Notes:
A tab is specified as the delimiter, but you can change it as appropriate.
Thanks to @PatrickHaugh for the idea to split on the first occurrence of ". " only, since your second column may contain periods as well.
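For example, a quick illustration of what maxsplit=1 does:
>>> "Graudo. A selection of Pouteria caimito, a minor member. Etc.".split(". ", 1)
['Graudo', 'A selection of Pouteria caimito, a minor member. Etc.']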
This should get you what you want. This will handle all the escaping.
import csv

with open('Abiutxt.txt', 'r') as in_file:
    x = in_file.read().splitlines()
x = [line.split('. ', 1) for line in x if line]

with open('logtestAbiutxt.csv', "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(['Column1', 'Column2'])
    writer.writerows(x)
Suppose I have a big file, file.txt, with around 300,000 lines of data. I want to split it based on a certain key location. See file.txt below:
Line 1: U0001;POUNDS;**CAN**;1234
Line 2: U0001;POUNDS;**USA**;1234
Line 3: U0001;POUNDS;**CAN**;1234
Line 100000: U0001;POUNDS;**CAN**;1234
The locations are limited to 10-15 different nations, and I need to separate the records for each country into their own file. How can I do this in Python?
Thanks for the help.
This will run with very low memory overhead as it writes each line as it reads it.
Algorithm:
open input file
read a line from input file
get country from line
if new country then open file for country
write the line to country's file
loop if more lines
close files
Code:
with open('file.txt', 'r') as infile:
    try:
        outfiles = {}
        for line in infile:
            country = line.split(';')[2].strip('*')
            if country not in outfiles:
                outfiles[country] = open(country + '.txt', 'w')
            outfiles[country].write(line)
    finally:
        for outfile in outfiles.values():
            outfile.close()
with open("file.txt") as f:
content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
text = [x.strip() for x in content]
x = [i.split(";") for i in text]
x.sort(key=lambda x: x[2])
from itertools import groupby
from operator get itemgetter
y = groupby(x, itemgetter(2))
res = [(i[0],[j for j in i[1]]) for i in y]
for country in res:
with open(country[0]+".txt","w") as writeFile:
writeFile.writelines("%s\n" % ';'.join(l) for l in country[1])
This will group the records by country!
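Note that itertools.groupby only groups consecutive items, which is why the list is sorted by the same key first. A quick illustration:
>>> from itertools import groupby
>>> [k for k, g in groupby('aabba')]
['a', 'b', 'a']
Without the sort, the same country could show up as several separate groups.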
Hope it helps!
Looks like what you have is a csv file. csv stands for comma-separated values, but any file that uses a different delimiter (in this case a semicolon ;) can be treated like a csv file.
We'll use the Python csv module to read the file in, and then write out a file for each country.
import csv
from collections import defaultdict

d = defaultdict(list)
with open('file.txt', 'r', newline='') as f:
    r = csv.reader(f, delimiter=';')
    for line in r:
        d[line[2].strip('*')].append(line)  # key on the country field, minus the ** markers

for country in d:
    with open('{}.txt'.format(country), 'w', newline='') as outfile:
        w = csv.writer(outfile, delimiter=';')
        for line in d[country]:
            w.writerow(line)
# the formatting function for the filename used for saving
outputFileName = "{}.txt".format
# alternative:
##import time
##outputFileName = lambda loc: "{}_{}.txt".format(loc, time.asctime())

# build a dictionary indexed by location; each entry holds the new file content for that location
sortedByLocation = {}
f = open("file.txt", "r")
# iterate over each line and look at the column with the location
for l in f.readlines():
    line = l.split(';')
    # the third field (indices begin with 0) is the location abbreviation;
    # lowercase it, because on case-insensitive filesystems files that differ
    # only in case would overwrite each other, while Python treats the names as distinct
    location = line[2].lower().strip().strip('*')  # also drop the ** markers for the filename
    # get the previous lines for this location and store the new one with them
    tmp = sortedByLocation.get(location, "")
    sortedByLocation[location] = tmp + l.strip() + '\n'
f.close()

# save a file for each location
for location, text in sortedByLocation.items():
    with open(outputFileName(location), 'w') as f:
        f.write(text)
nf = open(Output_File, 'w+')
with open(Input_File, 'read') as f:
    for row in f:
        Current_line = str(row)
        Reformated_line = str(','.join(Current_line.split('|')[1:-1]))
        nf.write(Reformated_line + "\n")
I'm trying to read an input file in table format and write it out as a CSV file, but my output contains one extra empty line at the end. How can I remove the last empty line in the CSV?
It sounds like you have an empty line in your input file. From your comments, you actually have a non-empty line that has no | characters in it. In either case, it is easy enough to check for an empty result line.
Try this:
# UNTESTED
nf = open(Output_File, 'w+')
with open(Input_File, 'read') as f:
    for row in f:
        Current_line = str(row)
        Reformated_line = str(','.join(Current_line.split('|')[1:-1]))
        if Reformated_line:
            nf.write(Reformated_line + "\n")
Other notes:
You should use with consistently. Open both files the same way.
str(row) is a no-op. row is already a str.
str(','.join(...)) is similarly redundant.
open(..., 'read') is not a valid use of the mode parameter to open(). You should use 'r' or even omit the parameter altogether (see the example after these notes).
I prefer not to introduce new names when changing the format of existing data. That is, I prefer row = row.split() over Reformatted_line = row.split().
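As an aside on that mode note: under Python 3 the bogus 'read' mode fails immediately with a ValueError, while CPython 2 happened to accept it on most platforms because only the first character of the mode string was validated:
>>> open("test.txt", "read")
ValueError: invalid mode: 'read'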
Here is a version that incorporates these and other suggestions:
with open(Input_File) as inf, open(Output_File, 'w+') as outf:
    for row in inf:
        row = ','.join(row.split('|')[1:-1])
        if row:
            outf.write(row + "\n")
Just a question of reordering things a little: write the newline before each row rather than after it, so the file never ends with a blank line:
first = True
with open(Input_File, 'r') as f, open(Output_File, 'w+') as nf:
    for row in f:
        Current_line = str(row)
        Reformated_line = str(','.join(Current_line.split('|')[1:-1]))
        if not first:
            nf.write('\n')
        else:
            first = False
        nf.write(Reformated_line)
I'm trying to write a very simple program using tuples. It works for the most part, but I can't really get it to work when accessing individual elements of the tuples.
I'm taking input from a file containing some info, converting it to tuples, and then storing the data in some other file.
It works if I write all the data or just the first tuple, but not in any other case. Following is the code:
filename = "in.txt"
stock_market = []
for line in open(filename):
    fields = line.split(",")
    name = fields[0]
    shares = int(fields[1])
    stock = (name, shares)
    portfolio.append(stock)

f = open("output.txt", "w")
print(portfolio[1], file=f)
f.close()
You can't append to portfolio without defining it first. Try something like this:
inFilename = "in.txt"
outFilename = "output.txt"

with open(inFilename, 'r') as inf:
    with open(outFilename, 'w') as outf:
        for line in inf:
            fields = line.split(',')
            print((fields[0], fields[1]), file=outf)
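Alternatively, the original loop works once the list is actually defined before appending to it. A minimal sketch of that fix (assuming in.txt has at least two lines, so portfolio[1] exists):
portfolio = []  # define the list before the loop
for line in open("in.txt"):
    fields = line.split(",")
    portfolio.append((fields[0], int(fields[1])))

with open("output.txt", "w") as f:
    print(portfolio[1], file=f)  # the second (name, shares) tuple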
How can I skip the header row and start reading a file from line 2?
with open(fname) as f:
    next(f)  # skip the header line
    for line in f:
        pass  # do something with line
f = open(fname,'r')
lines = f.readlines()[1:]
f.close()
If you want the first line, and then want to perform some operation on the rest of the file, this code will be helpful:
with open(filename, 'r') as f:
    first_line = f.readline()
    for line in f:
        pass  # perform some operations
If only slicing worked on iterators... itertools.islice provides the equivalent:
from itertools import islice

with open(fname) as f:
    for line in islice(f, 1, None):
        pass
islice skips the first line lazily, without reading the whole file into memory.
f = open(fname).readlines()
firstLine = f.pop(0)  # removes the first line
for line in f:
    ...
To generalize the task of reading multiple header lines and to improve readability I'd use method extraction. Suppose you wanted to tokenize the first three lines of coordinates.txt to use as header information.
Example
coordinates.txt
---------------
Name,Longitude,Latitude,Elevation, Comments
String, Decimal Deg., Decimal Deg., Meters, String
Euler's Town,7.58857,47.559537,0, "Blah"
Faneuil Hall,-71.054773,42.360217,0
Yellowstone National Park,-110.588455,44.427963,0
Then method extraction allows you to specify what you want to do with the header information (in this example we simply tokenize the header lines on the comma and return each as a list, but there's room to do much more):
def __readheader(filehandle, numberheaderlines=1):
    """Reads the specified number of lines and returns the comma-delimited
    strings on each line as a list"""
    for _ in range(numberheaderlines):
        yield list(map(str.strip, filehandle.readline().strip().split(',')))

with open('coordinates.txt', 'r') as rh:
    # Single header line
    # print(next(__readheader(rh)))
    # Multiple header lines
    for headerline in __readheader(rh, numberheaderlines=2):
        print(headerline)  # or do other stuff with the headerline tokens
Output
['Name', 'Longitude', 'Latitude', 'Elevation', 'Comments']
['String', 'Decimal Deg.', 'Decimal Deg.', 'Meters', 'String']
If coordinates.txt contains another header line, simply change numberheaderlines. Best of all, it's clear what __readheader(rh, numberheaderlines=2) is doing, and we avoid the ambiguity of having to figure out or comment on why the author of the accepted answer uses next() in his code.
If you want to read multiple CSV files starting from line 2, this works like a charm:
import csv

for files in csv_file_list:
    with open(files, 'r') as r:
        next(r)  # skip the header row
        rr = csv.reader(r)
        for row in rr:
            pass  # do something with row
(this is part of Parfait's answer to a different question)
# Open a connection to the file
with open('world_dev_ind.csv') as file:
    # Skip the column names
    file.readline()
    # Initialize an empty dictionary: counts_dict
    counts_dict = {}
    # Process only the first 1000 rows
    for j in range(0, 1000):
        # Split the current line into a list: line
        line = file.readline().split(',')
        # Get the value for the first column: first_col
        first_col = line[0]
        # If the column value is in the dict, increment its value
        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1
        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1

# Print the resulting dictionary
print(counts_dict)