Python
I have 1000+ files with numerically consecutive names like IMAG0000.JPG that I have saved as a list, converted to a string, and written to a text file. I want the text file to look like this:
IMAG0000.JPG
IMAG0001.JPG
IMAG0002.JPG
IMAG0003.JPG
...
Currently, it looks like:
IMAG0000.JPGIMAG0001.JPGIMAG0002.JPGIMAG0003.JPG...
I can't quite figure out where to put \n to make it format correctly. This is what I have so far...
import glob
newfiles=[]
filenames=glob.glob('*.JPG')
newfiles =''.join(filenames)
f=open('file.txt','w')
f.write(newfiles)
You're joining with an empty string '' instead of '\n'.
newfiles = '\n'.join(filenames)
f = open('file.txt','w')
f.write(newfiles) # keep in mind to use f.close()
or safer (i.e. releasing the file handle):
with open("file.txt", w) as f:
f.write('\n'.join(filenames))
or, instead of concatenating everything:
with open("file.txt", "w") as f:
    for filename in filenames:
        f.write(filename + '\n')
Try this:
newfiles = '\n'.join(filenames)
Side note: it's good practice to use the with keyword when dealing with file objects, so the code:
f=open('file.txt','w')
f.write(newfiles)
would become:
with open('file.txt','w') as f:
    f.write(newfiles)
That way you do not need to explicitly do f.close() to close the file.
I'm trying to have the output be without commas, and to separate each line into two strings and print them.
My code so far yields:
173,70
134,63
122,61
140,68
201,75
222,78
183,71
144,69
But I'd like it to print without the comma, with the values on each line separated as strings.
if __name__ == '__main__':
    # Complete main section of code
    file_name = "data.txt"
    # Open the file for reading here
    my_file = open('data.txt')
    lines = my_file.read()
    with open('data.txt') as f:
        for line in f:
            lines.split()
            lines.replace(',', ' ')
    print(lines)
In your sample code, lines contains the full content of the file as a str.
my_file = open('data.txt')
lines = my_file.read()
You then later re-open the file to iterate the lines:
with open('data.txt') as f:
    for line in f:
        lines.split()
        lines.replace(',', ' ')
Note, however, that str.split and str.replace do not modify the existing value, as strs in Python are immutable. Also note you are operating on lines there, rather than on the for-loop variable line.
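For example, a quick demonstration of that immutability:
>>> s = "173,70"
>>> s.replace(",", " ")  # returns a new string...
'173 70'
>>> s                    # ...while s itself is unchanged
'173,70'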
Instead, you'll need to assign the result of those functions to new names, or pass them as arguments (e.g., to print). So you'll want to open the file, iterate over the lines, and print each value with the "," replaced with a " ":
with open("data.txt") as f:
for line in f:
print(line.replace(",", " "))
Or, since you are operating on the whole file anyway:
with open("data.txt") as f:
print(f.read().replace(",", " "))
Or, as your file appears to be CSV content, you may wish to use the csv module from the standard library instead:
import csv
with open("data.txt", newline="") as csvfile:
for row in csv.reader(csvfile):
print(*row)
with open('data.txt', 'r') as f:
    for line in f:
        for value in line.strip().split(','):  # strip the newline so the last value prints cleanly
            print(value)
While Python can offer us several ways to open files, this is the preferred one for working with them: the file is read lazily (which matters especially for large files), and after exiting the with scope (the indentation block), the file handle is closed automatically by the system.
Here we are opening the file in read mode. Files follow the iterator protocol, so we can iterate over them like lists; each line is a true line in the file and is a string.
After getting the line, in the line variable, we split (see str.split()) the line into two tokens, one before the comma and the other after the comma. split returns a newly constructed list of strings. If you need to omit some unwanted characters, you can use the str.strip() method; strip and split are usually combined.
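For instance, a quick sketch of combining the two on one of the sample lines:
>>> line = "173,70\n"
>>> line.strip().split(',')  # strip the trailing newline, then split on the comma
['173', '70']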
elegant and efficient file reading - method 1
with open("data.txt", 'r') as io:
for line in io:
sl=io.split(',') # now sl is a list of strings.
print("{} {}".format(sl[0],sl[1])) #now we use the format, for printing the results on the screen.
non elegant, but efficient file reading - method 2
fp = open("data.txt", 'r')
line = None
while (line=fp.readline()) != '': #when line become empty string, EOF have been reached. the end of file!
sl=line.split(',')
print("{} {}".format(sl[0],sl[1]))
I have a large text file in python. I want to split it into 2, using a keyword. The file above the keyword must be copied to one file and the rest of the file into other. I want to save these files with different extensions in the same directory. Please help me with this.
Also, how to convert a file from one format to another format?
For example, .txt to .xml or .cite to .xml ?
To answer the first part of your question, you can simply use the split function after reading the text and write them to your new files:
with open('oldfile.txt', 'r') as fh:
    text_split = fh.read().split(keyword)  # keyword, extension1 and extension2 are assumed to be defined already
with open('newfile' + extension1, 'w') as fh:
    fh.write(text_split[0])
with open('newfile' + extension2, 'w') as fh:
    # If you know that the keyword only appears once
    # you can change this to fh.write(text_split[1])
    fh.write(keyword.join(text_split[1:]))
The second part of your question is much more difficult. I don't know what kind of file format you are working with, but txt files are just plain text with no specific structure, and XML cannot be produced from an arbitrary format automatically. If the files already contain XML and merely carry a .txt extension, you can simply rename them, but if you are looking to convert a structured format like CSV, I suggest you use a library such as lxml.
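For instance, a minimal sketch of a CSV-to-XML conversion using lxml (the file names and the rows/row/field tag names here are purely illustrative assumptions):
import csv
from lxml import etree

# Build an XML tree from CSV rows; the element names are illustrative, not a standard
root = etree.Element('rows')
with open('data.csv', newline='') as csvfile:
    for row in csv.reader(csvfile):
        row_elem = etree.SubElement(root, 'row')
        for value in row:
            etree.SubElement(row_elem, 'field').text = value

with open('data.xml', 'wb') as out:
    out.write(etree.tostring(root, pretty_print=True))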
Edit: If the file does not fit into memory, then you can iterate through the lines instead:
with open('oldfile.txt', 'r') as fh:
    fh_new = open('newfile' + extension1, 'w')
    keyword_found = False
    line = fh.readline()
    while line:
        if not keyword_found:
            text_split = line.split(keyword)
            fh_new.write(text_split[0])
            if len(text_split) > 1:
                fh_new.close()
                keyword_found = True
                fh_new = open('newfile' + extension2, 'w')
                fh_new.write(keyword.join(text_split[1:]))  # join, since write() expects a string, not a list
        else:
            fh_new.write(line)
        line = fh.readline()
    fh_new.close()
About splitting your file, this should do it (taking the size of the file into account):
import mmap

regex = b'your keyword'
f = open('your_path_to_the_main_file', 'rb')
s = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
first_occurrence_position = s.find(regex)
if first_occurrence_position < 1:  # -1 means the keyword is absent; 0 would leave the first part empty
    print('this is a mistake')
    f.close()
    quit()
buf_size = 0xfff
first_part_file = open('your_path_to_the_first_part' + '.its_extension', 'wb')
second_part_file = open('your_path_to_the_second_part' + '.its_extension', 'wb')
i = 0
if buf_size > first_occurrence_position:  # never read past the keyword in the first pass
    buf_size = first_occurrence_position
b = f.read(buf_size)
while b:
    i = i + buf_size
    first_part_file.write(b)
    if i == first_occurrence_position:
        break
    if first_occurrence_position - i < buf_size:
        buf_size = first_occurrence_position - i
    b = f.read(buf_size)
b = f.read(0xffff)
while b:
    second_part_file.write(b)
    b = f.read(0xffff)
first_part_file.close()
second_part_file.close()
f.close()
I have a text file with about 20 entries. They look like this:
~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
...
etc.
I would like to take these entries and turn them into a CSV.
There is a '~' separating each entry. I'm scratching my head trying to figure out how to go through it line by line and create the CSV values for each country. Can anyone give me a clue on how to go about this?
Use the libraries, Luke :)
I'm assuming your data is well formatted. Most real-world data isn't. So, here's a solution.
>>> entries = content.split('~')
>>> entries
['\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n', '\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n']
For writing the CSV, Python has standard library functions.
>>> import csv
>>> csvfile = open('foo.csv', 'w', newline='')  # newline='' is what the csv module expects on Python 3
>>> fieldnames = ['Country', 'Link', 'Capital']
>>> writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
>>> for entry in entries:
...     cols = entry.strip().splitlines()
...     writer.writerow({'Country': cols[0], 'Link': cols[1].split(': ')[1], 'Capital': cols[2].split(': ')[1]})
...
If your data is more semi structured or badly formatted, consider using a library like PyParsing.
Edit:
The second column contains URLs, so we need to handle the split carefully.
>>> cols[1]
'Link: http://imgur.com/foobar2.jpg'
>>> cols[1].split(':')[1]
' http'
>>> cols[1].split(': ')[1]
'http://imgur.com/foobar2.jpg'
The way that I would do it would be to use the open() function with the syntax:
f = open('NameOfFile.extensionType', 'a+')
Where "a+" is append mode. The file will not be overwritten and new data can be appended. You could also use "r+" to open the file in read mode, but would lose the ability to edit. The "+" after a letter signifies that if the document does not exist, it will be created. The "a+" I've never found to work without the "+".
After that I would use a for loop like this:
data = []
tmp = []
f.seek(0)  # 'a+' leaves the position at the end of the file, so rewind before reading
for line in f:
    line = line.strip()  # strip() returns a new string with the newline and surrounding whitespace removed
    if line == '~':
        data.append(tmp)
        tmp = []
    else:
        tmp.append(line)
Now you have all of the data stored in a list, but you could also reformat it as a class object using a slightly different algorithm.
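For example, a hedged sketch of that class-based variant (the Country class and its field names are my own invention, assuming each entry holds exactly a name, a 'Link: ...' line, and a 'Capital: ...' line):
class Country:
    """Holds one entry from the file; the field names are illustrative."""
    def __init__(self, name, link, capital):
        self.name = name
        self.link = link
        self.capital = capital

countries = []
for entry in data:                     # data as built by the loop above
    name = entry[0]
    link = entry[1].split(': ')[1]     # drop the 'Link: ' prefix
    capital = entry[2].split(': ')[1]  # drop the 'Capital: ' prefix
    countries.append(Country(name, link, capital))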
I have never edited CSV files using Python, but I believe you can use a loop like this to add the data:
f2 = open('CSVfileName.csv', 'w') # Can change "w" for other needs, i.e. "a+"
for entry in data:
    for subentry in entry:
        f2.write(str(subentry) + '\n') # Use '\n' to create a new line
From my knowledge of CSV that loop would create a single column of all of the data. At the end remember to close the files in order to save the changes:
f.close()
f2.close()
You could combine the two loops into one in order to save space, but for the sake of explanation I have not.
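For what it's worth, a sketch of that combined version (same assumptions as above: f has been rewound to the start and '~' separates the entries):
with open('CSVfileName.csv', 'w') as f2:
    for line in f:
        line = line.strip()
        if line != '~':  # skip the separators, write everything else on its own line
            f2.write(line + '\n')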
I am trying to search a large group of text files (160K) for a specific string that changes for each file. I have a text file that lists every file in the directory with the string value I want to search for. Basically, I want to use Python to create a new text file that gives the file name, the string, and a 1 if the string is present or a 0 if it is not.
The approach I am using so far is to create a dictionary from a text file. From there I am stuck. Here is what I figure in pseudo-code:
**assign dictionary**
d = {}
with open('file.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
**loop through directory**
for filename in os.listdir(os.getcwd()):
    ***here is where I get lost***
    match file name to dictionary
    look for string
    write filename, string, 1 if found
    write filename, string, 0 if not found
Thank you. It needs to be somewhat efficient since it's a large amount of text to go through.
Here is what I ended up with
d = {}
with open('ibes.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
import os
for filename in os.listdir(os.getcwd()):
    string = d.get(filename, "!##$%^&*")
    if string in open(filename, 'r').read():
        with open("ibes_in.txt", 'a') as out:
            out.write("{} {} {}\n".format(filename, string, 1))
    else:
        with open("ibes_in.txt", 'a') as out:
            out.write("{} {} {}\n".format(filename, string, 0))
As I understand your question, the dictionary relates file names to the strings to search for:
d = {
    "file1.txt": "widget",
    "file2.txt": "sprocket",  # etc.
}
If each file is not too large you can read each file into memory:
for filename in os.listdir(os.getcwd()):
    string = d[filename]
    if string in open(filename, 'r').read():
        print(filename, string, "1")
    else:
        print(filename, string, "0")
This example uses print, but you could write to a file instead: open the output file before the loop with outfile = open("outfile.txt", 'w') and instead of printing use
outfile.write("{} {} {}\n".format(filename, string, 1))
On the other hand, if each file is too large to fit easily into memory, you could use a mmap as described in Search for string in txt file Python
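For instance, a minimal sketch of that mmap variant (reusing the d dictionary from above, and assuming the files are non-empty, since mmap cannot map an empty file):
import mmap
import os

for filename in os.listdir(os.getcwd()):
    string = d[filename]
    with open(filename, 'rb') as f:
        # map the file into memory instead of reading it all at once
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        found = 1 if mm.find(string.encode()) != -1 else 0
        mm.close()
    print(filename, string, found)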
I have written a Python script to process a set of ASCII files within a given dir. I wonder if there is a more concise and/or "pythonesque" way to do it, without losing readability?
Python Code
import os
import fileinput
import glob
import string
indir='./'
outdir='./processed/'
for filename in glob.glob(indir+'*.asc'): # get a list of input ASCII files to be processed
    fin=open(indir+filename,'r') # input file
    fout=open(outdir+filename,'w') # out: processed file
    lines = iter(fileinput.input([indir+filename])) # iterator over all lines in the input file
    fout.write(next(lines)) # just copy the first line (the header) to output
    for line in lines:
        val=iter(string.split(line,' '))
        fout.write('{0:6.2f}'.format(float(val.next()))), # first value in the line has its own format
        for x in val: # iterate over the rest of the numbers in the line
            fout.write('{0:10.6f}'.format(float(val.next()))), # the rest of the values in the line have a different format
        fout.write('\n')
    fin.close()
    fout.close()
An example:
Input:
;;; This line is the header line
-5.0 1.090074154029272 1.0034662411357929 0.87336062116561186 0.78649408279093869 0.65599958665017222 0.4379879132749317 0.26310799350679176 0.087808018565486673
-4.9900000000000002 1.0890770415316042 1.0025480136545413 0.87256100700428996 0.78577373527626004 0.65539842673645277 0.43758616966566649 0.26286647978335914 0.087727357602906453
-4.9800000000000004 1.0880820021223023 1.0016316956763136 0.87176305623792771 0.78505488659611744 0.65479851808106115 0.43718526271594083 0.26262546925502467 0.087646864773454014
-4.9700000000000006 1.0870890372077564 1.0007172884938402 0.87096676998908273 0.78433753775986659 0.65419986152386733 0.4367851929843618 0.26238496225635727 0.087566540188423345
-4.9600000000000009 1.086098148170821 0.99980479337809591 0.87017214936140763 0.78362168975984026 0.65360245789061966 0.4363859610200459 0.26214495911617541 0.087486383957276398
Processed:
;;; This line is the header line
-5.00 1.003466 0.786494 0.437988 0.087808
-4.99 1.002548 0.785774 0.437586 0.087727
-4.98 1.001632 0.785055 0.437185 0.087647
-4.97 1.000717 0.784338 0.436785 0.087567
-4.96 0.999805 0.783622 0.436386 0.087486
Other than a few minor changes, due to how Python has changed through time, this looks fine.
You're mixing two different styles of next(); the old way was it.next() and the new is next(it). You should use the string method split() instead of going through the string module (that module is there mostly for backwards compatibility with Python 1.x). There's also no need to go through the almost useless fileinput module, since open file handles are also iterators (that module comes from a time before Python's file handles were iterators).
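A quick illustration of the two styles (shown in a Python 2 session; it.next() is gone in Python 3):
>>> it = iter([10, 20])
>>> next(it)   # works in Python 2.6+ and in Python 3
10
>>> it.next()  # the old spelling; Python 2 only
20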
Edit: As #codeape pointed out, glob() returns the full path. Your code would not have worked if indir was something other than "./". I've changed the following to use the correct listdir/os.path.join solution. I'm also more familiar with the "%" string interpolation than string formatting.
Here's how I would write this in more idiomatic, modern Python:
def reformat(fin, fout):
    fout.write(next(fin)) # just copy the first line (the header) to output
    for line in fin:
        fields = line.split(' ')
        # Make a format string specific to the number of fields
        fmt = '%6.2f' + ('%10.6f' * (len(fields)-1)) + '\n'
        fout.write(fmt % tuple(map(float, fields)))
basenames = os.listdir(indir) # get a list of input ASCII files to be processed
for basename in basenames:
    input_filename = os.path.join(indir, basename)
    output_filename = os.path.join(outdir, basename)
    with open(input_filename, 'r') as fin, open(output_filename, 'w') as fout:
        reformat(fin, fout)
The Zen of Python says "There should be one-- and preferably only one --obvious way to do it". It's interesting how constructs which, during the last 10+ years, were "obviously" the right solution are no longer. :)
fin=open(indir+filename,'r') # input file
fout=open(outdir+filename,'w') # out: processed file
#code
fin.close()
fout.close()
can be written as:
with open(indir+filename,'r') as fin, open(outdir+filename,'w') as fout:
    #code
In Python 2.6, you can use:
with open(indir+filename,'r') as fin:
    with open(outdir+filename,'w') as fout:
        #code
And the line
lines = iter(fileinput.input([indir+filename]))
is useless. You can just iterate over an open file (fin in your case).
You can also do line.split(' ') instead of string.split(line, ' ')
If you change those things, there is no need to import string and fileinput.
Edit: I didn't know you could use inline code. That's cool.
In my build script, I have this code:
inFile = open(sourceFile,'r')
outFile = open(targetFile,'w')
for line in inFile:
    line = doKeywordSubstitution(line)
    outFile.write(line)
inFile.close()
outFile.close()
I don't know of a way to make this any more concise. Putting the line-changing logic in a different function looks neater to me though.
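For example, a sketch of that refactor (with doKeywordSubstitution, sourceFile, and targetFile as in the snippet above):
def substituteKeywords(inFile, outFile):
    # the line-changing logic now lives in its own function
    for line in inFile:
        outFile.write(doKeywordSubstitution(line))

with open(sourceFile, 'r') as inFile, open(targetFile, 'w') as outFile:
    substituteKeywords(inFile, outFile)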
I may be missing the point of your code, but I don't understand why you have lines = iter(fileinput.input([indir+filename])).
I don't understand why you use string.split(line, ' ') instead of just line.split(' ').
Well maybe I would write the string-processing part like this:
values = line.split(' ')
values[0] = '{0:6.2f}'.format(float(values[0]))
values[1:] = ['{0:10.6f}'.format(float(v)) for v in values[1:]]
fout.write(' '.join(values) + '\n') # the reformatted values no longer carry a newline, so add one
At least for me this looks better but this might be subjective :)
Instead of indir I would use os.curdir. Instead of "./processed" I would do: os.path.join(os.curdir, 'processed').