So I'm importing a list of names, e.g. the text file would include:
Eleen
Josh
Robert
Nastaran
Miles
my_list = ['Eleen','Josh','Robert','Nastaran','Miles']
Then I'm reading each name into a list, and I want to write a new Excel file for each name in that list.
#1. Is there any way I can create a for loop on the line:
temp = os.path.join(dir, '....xls')
_________________________
def high_throughput(names):
    import os
    import re
    # Reading file
    in_file = open(names, 'r')
    dir, file = os.path.split(names)
    temp = os.path.join(dir, '***this is where i want to put a for loop
    for each name in the input list of names***.xls')
    out_file = open(temp, 'w')
    data = []
    for line in in_file:
        data.append(line)
    in_file.close()
I'm still not sure what you're trying to do (and by "not sure", I mean "completely baffled"), but I think I can explain some of what you're doing wrong, and how to do it right:
in_file=open(names,'r')
dir,file=os.path.split(names)
temp = os.path.join(dir,'***this is where i want to put a for loop
for each name in the input list of names***.xls')
At this point, you don't have the input list of names. That's what you're reading from in_file, and you haven't read it yet. Later on, you read those names into data, after which you can use them. So:
in_file = open(names, 'r')
dir, file = os.path.split(names)
data = []
for line in in_file:
    data.append(line)
in_file.close()
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name.strip()))  # strip the trailing newline
    out_file = open(temp, 'w')
Note that I put the for loop outside the os.path.join call, because you have to; you can't embed a loop inside a call expression. And that's a good thing, because you presumably want to open each path (and do stuff to each file) inside that loop, not build a single path out of the whole list of names.
But if you don't insist on using a for loop, there is something that may be closer to what you were looking for: a list comprehension. You have a list of names. You can use that to build a list of paths. And then you can use that to build a list of open files. Like this:
paths = [os.path.join(dir, '{}.xls'.format(name.strip())) for name in data]
out_files = [open(path, 'w') for path in paths]
Then, later, after you've built up the string you want to write to all the files, you can do this:
for out_file in out_files:
    out_file.write(stuff)
However, this is kind of an odd design, mainly because you have to close each file. They may get closed automatically by garbage collection, and even if they don't, they may get flushed… but unless you get lucky, all the data you wrote is just sitting around in buffers in memory and never gets written to disk. Normally you don't want to write programs that depend on getting lucky, so you want to close your files. With this design, you'd have to do something like:
for out_file in out_files:
    out_file.close()
It's probably a lot simpler to go back to the one big loop I suggested in the first place, so you can do this:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name.strip()))
    out_file = open(temp, 'w')
    out_file.write(stuff)
    out_file.close()
Or, even better:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name.strip()))
    with open(temp, 'w') as out_file:
        out_file.write(stuff)
A few more comments, while we're here…
First, you really shouldn't be trying to generate .xls files manually out of strings. You can use a library like openpyxl. Or you can create .csv files instead—they're easy to create with the csv module built into Python, and Excel handles them just as easily as .xls files. Or you can use win32com or pywinauto to take control of Excel and make it create your files. Really, anything is better than trying to generate them by hand.
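For example, here is a minimal sketch using the built-in csv module (the output directory, column names, and values are placeholders, not from the original code):
import csv
import os
out_dir = '.'  # hypothetical output directory
names = ['Eleen', 'Josh', 'Robert', 'Nastaran', 'Miles']
for name in names:
    path = os.path.join(out_dir, '{}.csv'.format(name))
    with open(path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(['column_a', 'column_b'])  # placeholder header row
        writer.writerow([name, 'some_value'])      # placeholder data row
Excel opens .csv files directly, so for simple tabular data this sidesteps the .xls format entirely.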
Second, the fact that you can write for line in in_file: means that a file is some kind of sequence of lines. So, if all you want to do is convert it to a list of lines, you can do that in one step:
data = list(in_file)
But really, the only reason you want this list in the first place is so you can loop over it later, creating the output files, right? So why not just hold off, and loop over the lines of the file directly?
Whatever you do to generate the output stuff, do that first. Then loop over the file with the list of filenames and write stuff. Like this:
stuff = # whatever you were doing later, in the code you haven't shown
dir = os.path.dirname(names)
with open(names, 'r') as in_file:
    for line in in_file:
        temp = os.path.join(dir, '{}.xls'.format(line.strip()))  # strip the newline
        with open(temp, 'w') as out_file:
            out_file.write(stuff)
That replaces all of the code in your sample (except for that function named high_throughput that imports some modules locally and then does nothing).
Take a look at openpyxl, especially if you need to create .xlsx files. The example below assumes the Excel workbooks are created blank.
from openpyxl import Workbook
names = ['Eleen', 'Josh', 'Robert', 'Nastaran', 'Miles']
for name in names:
    wb = Workbook()
    wb.save('{0}.xlsx'.format(name))
Try this:
in_file = open(names, 'r')
dir, file = os.path.split(names)
for name in in_file:
    temp = os.path.join(dir, name.strip() + '.xls')  # strip the trailing newline
    with open(temp, 'w') as out_file:
        pass  # write data to out_file
in_file.close()
I have a project named project1, where I take a big .txt file (around 1 GB) and make a list that has each line of the text as an element, with the following code:
txt = open('<path>', 'r', encoding="utf8")
lista = list(txt)
And then I edit the items in the list, which is not important for my question.
I need to use the variable lista in another project (project2), but I don't want to import it in the following way
from project1 import lista
because by doing that I have to run all the code in project1 to get the text in the .txt file and to edit the list.
So my goal is to use lista without having to run code that takes time, since lista will always be the same.
IMPORTANT NOTES
I can't just print it in project1, copy the output and paste it in project2 to use it as a variable, because the list is way too long.
One way I thought of was to save lista as a string in a .txt file (let's call it lista.txt), open the .txt file in project2 and, in some way, tell Python that the string in lista.txt is actually a list. An example to understand better:
In project 1
file_text = open('<path>\\lista.txt', 'w', encoding="utf8")
lista = ['<string_1>', '<string_2>', ..., '<string_n>']
file_text.write(f'{lista}')
file_text.close()
In project 2
file_text = open('<path>\\lista.txt', 'r', encoding="utf8")
list_as_string = file_text.read()  # read the file's contents, not the file object
def string_to_list(input_string):
    # some way to transform list_as_string back into the original "lista" list
    # return list
    pass
string_to_list(list_as_string)
IMPORTANT: The way I described seems too complex to me, so it was just an idea, but I'm sure there are better ways (maybe there is a way to save a Python variable to a file that keeps information like its type, and then directly import it into a project as a variable of that type, in this case a list).
May I suggest that you use txt.readlines() instead of list(txt) to get the lines, unless every line in the file contains a single character. As for persistence: with json or pickle, dump/dumps let you save an object to an open file (so you could save the list to a file) or obtain the serialized string/bytes, respectively, and load/loads restore the content from the corresponding dump. Personally, I would just rebuild the list from the file's path, or encapsulate the code in the other script so it is less slow on import.
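For example, a minimal sketch with the built-in json module (the file name is a placeholder):
import json
# In project1: dump the finished list to disk once.
lista = ['<string_1>', '<string_2>', '<string_n>']
with open('lista.json', 'w', encoding='utf8') as f:
    json.dump(lista, f)
# In project2: load it back; json.load restores it as a real list.
with open('lista.json', 'r', encoding='utf8') as f:
    lista = json.load(f)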
I have some txt files that their names have the following pattern:
arc.1.txt, arc.2.txt,...,arc.100.txt,..., arc.500.txt,...,arc.838.txt
I know that we can write a program using a for loop to open the files one by one, if we know the total number of files. I want to know: is it possible to use a while loop to open them without counting the number of files?
import glob
# glob uses shell-style wildcards, not regular expressions
for each_file in glob.glob("arc.*.txt"):
    print(each_file)
It is definitely possible to use a while loop assuming that the files are numbered in sequential order:
i = 0
while True:
    i += 1
    filename = 'arc.%d.txt' % i
    try:
        with open(filename, 'r') as file_handle:
            ...
    except IOError:
        break
Though this becomes pretty ugly with all the nesting. You're probably better off getting the list of filenames using something like glob.glob.
from glob import glob
filenames = glob('arc.*.txt')
for filename in filenames:
    with open(filename) as file_handle:
        ...
There are some race conditions associated with this second approach -- if a file somehow gets deleted between when glob found it and when it is actually time to process it, your program could have a bad day.
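If that matters, one guard (a sketch, not part of the original answer) is to reuse the try/except from the while-loop version inside the glob loop:
from glob import glob
for filename in glob('arc.*.txt'):
    try:
        with open(filename) as file_handle:
            ...  # process the file
    except IOError:
        continue  # the file vanished after glob() listed it; skip it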
If you add them all to a list, set the while condition to while len(files) > 0:, and remove them one by one, opening the next file on each pass.
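That might look something like this (a sketch; glob is just one way to build the initial list):
import glob
files = glob.glob('arc.*.txt')  # build the list of filenames up front
while len(files) > 0:
    filename = files.pop()  # remove one name from the list
    with open(filename) as file_handle:
        ...  # process the file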
I am using
for file in fileList:
    f.write(open(file).read())
I am combining the files in a folder into one CSV. However, I don't need X copies of the header in the one file.
Is there a way to use this but have it write everything except the first row (the header) coming from each of the files?
Use the Python csv module.
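For example, a minimal sketch (the file names are placeholders) that drops each file's header row:
import csv
file_list = ['a.csv', 'b.csv']  # hypothetical input files
with open('combined.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    for file_name in file_list:
        with open(file_name, newline='') as in_file:
            reader = csv.reader(in_file)
            next(reader, None)        # skip this file's header row, if any
            writer.writerows(reader)  # copy the remaining rows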
Or something like that:
for file_name in file_list:
    file_obj = open(file_name)
    file_obj.readline()  # read and discard the header line
    f.write(file_obj.read())
    file_obj.close()
Note that file_obj.read() still loads the rest of a file into memory as a single string, just as file_obj.readlines() would load it as a list of lines; for very large files, iterate over the file line by line instead.
Note that it isn't good practice to shadow builtin names like file with your own variables.
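A line-by-line variant (assuming f and fileList are the open output file and the input list from the question) might look like:
for file_name in fileList:
    with open(file_name) as file_obj:
        next(file_obj, None)  # skip the header line
        for line in file_obj:
            f.write(line)  # stream the rest without buffering the whole file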
for file in fileList:
    mylines = open(file).readlines()
    f.write("".join(mylines[1:]))
This should point you in the right direction. Please don't do your homework on Stack Overflow.
If it's a csv file, look into the Python csv lib.
OK, I'm going to try this again; apologies for my poor effort in my previous question.
I am writing a program in Java, and I wanted to move some files from one directory to another based on whether they appear in a list. I could do it manually, but there are thousands of files in the directory, so it would be an arduous task, and I need to repeat it several times! I tried to do it in Java, but with the version of Java I am using it appears I cannot use java.nio, and I am not allowed to use external libraries.
So I have tried to write something in Python.
import os
import shutil
with open('files.txt', 'r') as f:
    myNames = [line.strip() for line in f]
print myNames
dir_src = "trainfile"
dir_dst = "train"
for file in os.listdir(dir_src):
    print file  # testing
    src_file = os.path.join(dir_src, file)
    dst_file = os.path.join(dir_dst, file)
    shutil.move(src_file, dst_file)
"files.txt" is in the format:
a.txt
edfs.txt
fdgdsf.txt
and so on.
So at the moment it is moving everything from train to trainfile, but I need to move files only if they are in the myNames list.
Does anyone have any suggestions?
Check whether the file name exists in the myNames list, and put the check before shutil.move:
if file in myNames:
so at the moment it is moving everything from train to trainfile, but I need to only move files if they are in the myNames list
You can translate that "if they are in the myNames list" directly from English to Python:
if file in myNames:
    shutil.move(src_file, dst_file)
And that's it.
However, you probably want a set of names, rather than a list. It makes more conceptual sense. It's also more efficient, although the speed of looking things up in a list will probably be negligible compared to the cost of copying files around. Anyway, to do that, you need to change one more line:
myNames = {line.strip() for line in f}
And then you're done.
I have some data in simplejson format in txt files, which I read using:
with open("my_file.txt") as f: any_variable = simplejson.load(f)
It works fine, no problems. However, I now have hundreds of such text files (some of whose names I don't know!) to read from, and I was wondering if there was a pythonic way to read all these files and assign their contents to, say, any_variable1 to any_variableN. I don't really care in what order they are read.
Obviously, a simple way would be to loop and store the results, yet I was wondering if there was a pythonic way to do it.
If the files are inside a directory, you can use:
import os
import simplejson
variables = []
path = "/your/path"
for filename in os.listdir(path):
    with open(os.path.join(path, filename)) as f:
        variables.append(simplejson.load(f))
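If you also need to know which file each object came from, a dict keyed by filename may serve better than numbered variables (a sketch along the same lines):
import os
import simplejson
path = "/your/path"
data = {}
for filename in os.listdir(path):
    with open(os.path.join(path, filename)) as f:
        data[filename] = simplejson.load(f)
Then data['my_file.txt'] gives you the object parsed from that file.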