Skip first row in Python

I am using
for file in fileList:
    f.write(open(file).read())
I am combining the files of a folder into one csv. However, I don't need X amount of headers in the one file.
Is there a way to use this and have it write everything but the first row (the header) coming from each of the files?

Use the python csv module.
Or something like that:
for file_name in file_list:
    file_obj = open(file_name)
    file_obj.readline()  # consume the header line
    f.write(file_obj.read())
Note that file_obj.readlines() would load the whole file content into memory as a list of lines; readline() here just consumes the header.
Note that it isn't good practice to shadow builtin names (such as file) with your own variables.
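For completeness, here is a minimal sketch of the csv-module approach named above (an assumption-laden sketch: it reuses fileList from the question, invents a combined.csv output name, and assumes all files share the same comma-delimited header):
import csv

with open('combined.csv', 'w', newline='') as out_f:
    writer = csv.writer(out_f)
    for i, file_name in enumerate(fileList):
        with open(file_name, newline='') as in_f:
            reader = csv.reader(in_f)
            header = next(reader)        # consume this file's header row
            if i == 0:
                writer.writerow(header)  # keep the header once, from the first file
            writer.writerows(reader)     # copy only the data rows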

for file in fileList:
    mylines = open(file).readlines()
    f.write("".join(mylines[1:]))
This should point you in the right direction. Please don't do your homework on Stack Overflow.
If it's a csv file, look into the python csv lib.
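A middle ground that skips the header without reading each whole file into memory, as readlines() does, is itertools.islice (a sketch, assuming f is the already-open output file from the question):
from itertools import islice

for file_name in fileList:
    with open(file_name) as in_f:
        f.writelines(islice(in_f, 1, None))  # skip the first line, stream the rest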

Related

Read paths in csv and open in unix

I need to read paths in a csv and go to these file paths in unix. I am wondering if there is any way to do that using unix or python commands. I am unsure of how to proceed and am not able to find many resources on the net either.
excel.csv has a lot of rows, similar to the sample below. I need to open excel.csv, read the first line, and go to that file path. Once the file at that path is opened, I need to read it and extract certain information. I tried using python for this but could not find much information, and I am wondering if I can use unix commands to solve this instead. I am clueless on how to proceed, so I would appreciate any reference or help using either python or unix commands. Thank you!
/folder/file1
/folder/file2
/folder/file3
It shouldn't be very difficult to do this in Python, as reading csv files is part of the standard library. Your code could look something like this:
import csv

with open('data.csv', newline='') as fh:
    # depending on whether the first row is a header or not,
    # you can also use the simple csv.reader here
    for row in csv.DictReader(fh, strict=True):
        # again, if you use the simple csv.reader, you'll have to access
        # the column via index instead
        file_path = row['path']
        with open(file_path, 'r') as fh2:
            data = fh2.read()
            # do something with data
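Since the sample excel.csv above has no header row, a plain csv.reader variant may be closer to your data (a sketch, assuming the path sits in the first column):
import csv

with open('excel.csv', newline='') as fh:
    for row in csv.reader(fh):
        file_path = row[0]  # first column holds the path
        with open(file_path, 'r') as fh2:
            data = fh2.read()
            # extract the information you need from data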

Python search files in multiple subdirectories for specific string and return file path(s) if present

I would be very grateful indeed for some help for a frustrated and confused Python beginner.
I am trying to create a script that searches a Windows directory containing multiple subdirectories and different file types for a specific single string (a name) in the file contents, and if found, prints the filenames as a list. There are approximately 2000 files in 100 subdirectories, and the files I want to search don't necessarily have the same extension, but all are, in essence, ASCII files.
I've been trying to do this for many many days but I just cannot figure it out.
So far I have tried using glob recursive coupled with reading the file but I'm so very bewildered. I can successfully print a list of all the files in all subdirectories, but don't know where to go from here.
import glob
files = []
files = glob.glob('C:\TEMP' + '/**', recursive=True)
print(files)
Can anyone please help me? I am a 72-year-old scientist trying to improve my skills and "automate the boring stuff", but at the moment I'm just losing the will.
Thank you very much in advance to this community.
Great to have you here!
What you have done so far is find all the file paths. Now the simplest way is to go through each of the files, read them into memory one by one, and see if the name you are looking for is there.
import glob

files = glob.glob(r'C:\TEMP' + '/**', recursive=True)
target_string = 'John Smit'
# iterate over files
for file in files:
    try:
        # open file for reading
        with open(file, 'r') as f:
            # read the contents
            contents = f.read()
            # check if contents have your target string
            if target_string in contents:
                print(file)
    except:
        pass  # skip directories and files that can't be read
This will print the file path each time it finds the name.
Please also note I have removed the files = [] line from your code, because it is redundant; the glob.glob call on the next line rebinds the name anyway.
Hope it helps!
You could do it like this, though I think there must be a better approach.
When you find all files in your directory, you iterate over them and check if they contain that specific string.
import os

for file in files:
    if os.path.isfile(file):
        with open(file, 'r') as f:
            if 'search_string' in f.read():
                print(file)
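As a side note, the same search can also be written with pathlib, which sidesteps the backslash-escaping issue in Windows paths (a sketch, assuming the same C:\TEMP root as above):
from pathlib import Path

target_string = 'John Smit'
# rglob('*') walks every subdirectory; is_file() filters out the directories
for path in Path(r'C:\TEMP').rglob('*'):
    if path.is_file():
        try:
            if target_string in path.read_text():
                print(path)
        except (UnicodeDecodeError, PermissionError):
            pass  # skip binary or unreadable files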

Python csv : writing to a different directory

I'm downloading files from a site. I need to save the original file, then open it, add the url the file was downloaded from and the date of the download, and then save the file to a different directory.
I've used this answer to amend the csv: how to Add New column in beginning of CSV file by Python
but I'm struggling to redirect the file to a different directory before the write() function is called.
Is the best answer to write the file and then move it, or is there a way to write the file to a different directory within the open() function?
if fileName in fileList:
    print "already got file " + fileName
else:
    # download the file
    urllib.urlretrieve(csvUrl, os.path.basename(fileName))
    #print "Saving to 1_Downloaded " + fileName
    # open the file and then add the extra columns
    with open(fileName, 'rb') as inf, open("out_" + fileName, 'wb') as outf:
        csvreader = csv.DictReader(inf)
        # add column names to beginning
        fieldnames = ['url_source', 'downloaded_at'] + csvreader.fieldnames
        csvwriter = csv.DictWriter(outf, fieldnames)
        csvwriter.writeheader()
        for node, row in enumerate(csvreader, 1):
            csvwriter.writerow(dict(row, url_source=csvUrl, downloaded_at=today))
I believe both would work.
To me the neatest way would be to append to the file and relocate it afterwards.
Have a look at:
shutil.move
I believe rewriting the entire file would be less efficient.
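That said, open() accepts any path, so you could also write the new file straight into the destination directory and skip the move entirely (a sketch, assuming a hypothetical out_dir that already exists):
out_path = os.path.join(out_dir, "out_" + os.path.basename(fileName))
with open(fileName, 'rb') as inf, open(out_path, 'wb') as outf:
    # ... the same DictReader/DictWriter rewrite as in the question ...
    pass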
It's not necessary to rebuild the file. Try using the time module to create a time stamp string for your file name, and os.rename to move your file.
Example - this just moves the file to your specified location:
os.rename('filename.csv','NEW_dir/filename.csv')
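If you also want the time stamp mentioned above in the new name, a minimal sketch might be:
import os
import time

# build a timestamped file name, then move the file in one step
stamp = time.strftime('%Y%m%d-%H%M%S')
os.rename('filename.csv', 'NEW_dir/filename_{}.csv'.format(stamp))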
Hope this helps.
Went with an additional routine using shutil in the end:
# move and rename the 'out_' files to the right dir
source = os.listdir(downloaded)
for files in source:
    if files.startswith('out_'):
        newName = files.replace('out_', '')
        newPath = os.path.join(renamed, newName)
        # join with the source dir so this works from any working directory
        shutil.move(os.path.join(downloaded, files), newPath)

Create for loop for naming output file Python

So I'm importing a list of names,
e.g. the text file would include:
Eleen
Josh
Robert
Nastaran
Miles
my_list = ['Eleen','Josh','Robert','Nastaran','Miles']
Then I'm assigning each name to a list, and I want to write a new excel file for each name in that list.
#1. Is there any way I can create a for loop for the line:
temp = os.path.join(dir, '...' + '.xls')
_________________________
def high_throughput(names):
    import os
    import re
    # Reading file
    in_file = open(names, 'r')
    dir, file = os.path.split(names)
    temp = os.path.join(dir, '***this is where i want to put a for loop '
                             'for each name in the input list of names***.xls')
    out_file = open(temp, 'w')
    data = []
    for line in in_file:
        data.append(line)
    in_file.close()
I'm still not sure what you're trying to do (and by "not sure", I mean "completely baffled"), but I think I can explain some of what you're doing wrong, and how to do it right:
in_file = open(names, 'r')
dir, file = os.path.split(names)
temp = os.path.join(dir, '***this is where i want to put a for loop '
                         'for each name in the input list of names***.xls')
At this point, you don't have the input list of names. That's what you're reading from in_file, and you haven't read it yet. Later on, you read those names into data, after which you can use them. So:
in_file = open(names, 'r')
dir, file = os.path.split(names)
data = []
for line in in_file:
    data.append(line)
in_file.close()

for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name.strip()))  # strip the trailing newline
    out_file = open(temp, 'w')
Note that I put the for loop outside the function call, because you have to do that. And that's a good thing, because you presumably want to open each path (and do stuff to each file) inside that loop, not open a single path made out of a loop of files.
But if you don't insist on using a for loop, there is something that may be closer to what you were looking for: a list comprehension. You have a list of names. You can use that to build a list of paths. And then you can use that to build a list of open files. Like this:
paths = [os.path.join(dir, '{}.xls'.format(name.strip())) for name in data]
out_files = [open(path, 'w') for path in paths]
Then, later, after you've built up the string you want to write to all the files, you can do this:
for out_file in out_files:
    out_file.write(stuff)
However, this is kind of an odd design. Mainly because you have to close each file. They may get closed automatically by the garbage collection, and even if they don't, they may get flushed… but unless you get lucky, all that data you wrote is just sitting around in buffers in memory and never gets written to disk. Normally you don't want to write programs that depend on getting lucky. So, you want to close your files. With this design, you'd have to do something like:
for out_file in out_files:
    out_file.close()
It's probably a lot simpler to go back to the one big loop I suggested in the first place, so you can do this:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name.strip()))
    out_file = open(temp, 'w')
    out_file.write(stuff)
    out_file.close()
Or, even better:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name.strip()))
    with open(temp, 'w') as out_file:
        out_file.write(stuff)
A few more comments, while we're here…
First, you really shouldn't be trying to generate .xls files manually out of strings. You can use a library like openpyxl. Or you can create .csv files instead—they're easy to create with the csv library that comes built in with Python, and Excel can handle them just as easily as .xls files. Or you can use win32com or pywinauto to take control of Excel and make it create your files. Really, anything is better than trying to generate them by hand.
Second, the fact that you can write for line in in_file: means that in_file is some kind of sequence of lines. So, if all you want to do is convert it to a list of lines, you can do that in one step:
data = list(in_file)
But really, the only reason you want this list in the first place is so you can loop around it later, creating the output files, right? So why not just hold off, and loop over the lines in the file in the first place?
Whatever you do to generate the output stuff, do that first. Then loop over the file with the list of filenames and write stuff. Like this:
stuff = ...  # whatever you were doing later, in the code you haven't shown
dir = os.path.dirname(names)
with open(names, 'r') as in_file:
    for line in in_file:
        temp = os.path.join(dir, '{}.xls'.format(line.strip()))
        with open(temp, 'w') as out_file:
            out_file.write(stuff)
That replaces all of the code in your sample (except for that function named high_throughput that imports some modules locally and then does nothing).
Take a look at openpyxl, especially if you need to create .xlsx files. The example below assumes the Excel workbooks are created blank.
from openpyxl import Workbook

names = ['Eleen', 'Josh', 'Robert', 'Nastaran', 'Miles']
for name in names:
    wb = Workbook()
    wb.save('{0}.xlsx'.format(name))
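If you also want content in each workbook before saving, the active sheet can be written to first (a sketch; the A1 cell and its value are just illustrative):
from openpyxl import Workbook

names = ['Eleen', 'Josh', 'Robert', 'Nastaran', 'Miles']
for name in names:
    wb = Workbook()
    ws = wb.active   # the default sheet created with the workbook
    ws['A1'] = name  # write the name into the first cell
    wb.save('{0}.xlsx'.format(name))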
Try this:
in_file = open(names, 'r')
dir, file = os.path.split(names)
for name in in_file:
    temp = os.path.join(dir, name.strip() + '.xls')  # strip the trailing newline
    with open(temp, 'w') as out_file:
        pass  # write data to out_file

Python read 100's of text files

I have some data in simplejson format in txt files, which I read using:
with open("my_file.txt") as f: any_variable = simplejson.load(f)
It works fine, no problems. However, I now have 100's of such text files (some of which I don't know the names of!) to read, and I was wondering if there is a pythonic way to read all these files and assign them to, say, any_variable1 to any_variableN. I don't really care in what order they are read.
Obviously, a simple way would be to loop and store the results, yet I was wondering if there was a more pythonic way.
If the files are inside a directory, you can use:
import os
import simplejson

variables = []
path = "/your/path"
for filename in os.listdir(path):
    with open(os.path.join(path, filename)) as fh:
        variables.append(simplejson.load(fh))
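If you later need to look the data up by file, a dict keyed by filename may be more convenient than numbered variables (a sketch, using the same hypothetical path):
import os
import simplejson

path = "/your/path"
data_by_name = {}
for filename in os.listdir(path):
    full = os.path.join(path, filename)
    if os.path.isfile(full):  # skip any subdirectories
        with open(full) as fh:
            data_by_name[filename] = simplejson.load(fh)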
