Reading a csv file in python and creating database - python

I'm working on a function that needs to accept a CSV file name as a string, open and read the file, create a database, and then return the database. The attempt I have so far seems to have the correct logic, but it gives me the error "No such file or directory 'filename.csv'". The files I'm reading are called file0.csv, file1.csv, etc. I'll include an example of one of them with my code below. Does anyone have any advice on how to fix this? Thanks
Edit: I realize that what I included below is an example of the database. Apparently the first line of the file is the header row, and the code I have now reads the header row when it shouldn't. Here is the updated code below
Code:
def read_file(filename):
    thefile = open(filename)
    data = []
    for line in thefile:
        data.append(line)
    thefile.close()
    return data
Example database:
{'Leonardo da Vinci': [('Mona Lisa', 1503, 76.8, 53.0, 'oil paint', 'France'),
                       ('The Last Supper', 1495, 460.0, 880.0, 'tempera', 'Italy')]}

Let's look at just the first two lines of your code:
def read_file(filename):
    thefile = open('filename.csv')
I surmise that, since you want to be able to process more than one file with this code you want to be able to call read_file substituting various filenames in place of filename. Correct?
OK, then one flaw in the code is that filename in the first line is a variable but 'filename.csv' is a literal. This means that no matter what you put for filename in the first line it will NOT change the literal. To do that the second line would have to be, for instance,
thefile = open ('%s.csv' % filename, 'r')
This would put what's in the filename variable in place of the %s and do what you seem to want.
What most of the other respondents are pointing out: your script (i.e., the Python code) might be in one folder while the files you want to process are in another. When you open a file by a bare name, Python looks for it in the current working directory, which is usually the folder you ran the script from. At your stage of the game, the easiest thing to do is to put the Python script and the files it needs in the same folder, and then run them from that folder.
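To make the point about folders concrete, here is a small sketch of how a bare file name gets turned into a full path. The helper name resolve_data_path and its base_dir parameter are made up for illustration; they are not part of the question's code.

```python
import os


def resolve_data_path(filename, base_dir=None):
    """Return an absolute path for filename.

    base_dir is a hypothetical parameter. When it is omitted, the current
    working directory is used, which is where open() looks for a relative
    file name.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    return os.path.abspath(os.path.join(base_dir, filename))


# To look next to the script rather than in the working directory, one could
# pass base_dir=os.path.dirname(os.path.abspath(__file__)).
```

Printing resolve_data_path('file0.csv') before calling open() is a quick way to see where Python is actually going to look.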

Maybe you want something like this? It opens your file (provided you give it the correct location), accumulates its lines, and returns them as a list.
def read_file(filename):
    # Use the filename argument rather than a hard-coded name
    thefile = open(filename, 'r')
    lines = []
    for line in thefile:
        lines.append(line)
    thefile.close()
    return lines
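Since the question also asks for the header row to be skipped and for a database shaped like the example, here is a rough sketch of one way to do that with the csv module. The column order (artist, title, year, height, width, medium, country) is an assumption read off the example database; the real files may differ.

```python
import csv
from collections import defaultdict


def read_file(filename):
    """Build {artist: [(title, year, height, width, medium, country), ...]}."""
    database = defaultdict(list)
    # newline="" is what the csv module documentation recommends for files
    with open(filename, newline="") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            # Assumed column order; adjust it to match the real header.
            artist, title, year, height, width, medium, country = row
            database[artist].append(
                (title, int(year), float(height), float(width), medium, country)
            )
    return dict(database)
```

Calling read_file('file0.csv') would then return a plain dictionary keyed by artist, provided file0.csv is in the current working directory.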

Related

Python script not finding value in log file when the value is in the file

The code below is meant to find any xls or csv file used in a process. The .log file contains full paths with extensions and definitely contains multiple values with "xls" or "csv". However, Python can't find anything...Any idea? The weird thing is when I copy the content of the log file and paste it to another notepad file and save it as log, it works then...
infile=r"C:\Users\me\Desktop\test.log"
important=[]
keep_words=["xls","csv"]
with open(infile,'r') as f:
    for line in f:
        for word in keep_words:
            if word in line:
                important.append(line)
print(important)
I was able to figure it out: it was an encoding issue. The log file is UTF-16, so it has to be opened with that encoding (this needs import io):
with io.open(infile,encoding='utf16') as f:
You could also try changing the line
for line in f:
to
for line in f.readlines():
This makes Python search a list of the file's lines, as returned by the readlines method. Note that iterating over the file object directly also yields one line at a time, so this change alone does not fix an encoding mismatch.
I hope this helps.
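If you don't know in advance whether a log is UTF-16, one option is to peek at the first bytes for a byte-order mark before opening the file as text. This is only a sketch: detect_encoding and find_lines are hypothetical helper names, and files without a BOM simply fall back to the default you pass in.

```python
import codecs


def detect_encoding(path, default="utf-8"):
    """Guess a text encoding from a byte-order mark, else return default."""
    with open(path, "rb") as raw:
        head = raw.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return default


def find_lines(path, keep_words=("xls", "csv")):
    """Return the lines of path that contain any of keep_words."""
    with open(path, encoding=detect_encoding(path)) as f:
        return [line for line in f if any(word in line for word in keep_words)]
```

This would also explain why copying the text into a new Notepad file "fixed" it: the new file was most likely saved in a different encoding.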

Selectively replacing csv header names

I have been searching for a solution for this and haven't been able to find one. I have a directory of folders which contain multiple, very-large csv files. I'm looping through each csv in each folder in the directory to replace values of certain headers. I need the headers to be consistent (from file to file) in order to run a different script to process all the data properly.
I found this solution that I thought would work: change first line of a file in python.
However this is not working as expected. My code:
from_file = open(filepath)
# for line in f:
# if
data = from_file.readline()
# print(data)
# with open(filepath, "w") as f:
print 'DBG: replacing in file', filepath
# s = s.replace(search_pattern, replacement)
for i in range(len(search_pattern)):
    data = re.sub(search_pattern[i], replacement[i], data)
# data = re.sub(search_pattern, replacement, data)
to_file = open(filepath, mode="w")
to_file.write(data)
shutil.copyfileobj(from_file, to_file)
I want to replace the header values in search_pattern with values in replacement without saving or writing to a different file - I want to modify the file. I have also tried
shutil.copyfileobj(from_file, to_file, -1)
As I understand it that should copy the whole file rather than breaking it up in chunks, but it doesn't seem to have an effect on my output. Is it possible that the csv is just too big?
I haven't been able to determine a different way to do this or make this way work. Any help would be greatly appreciated!
The answer you copied from "change first line of a file in python" doesn't work on Windows.
On Linux, you can open a file for reading and writing at the same time. The system ensures that there's no conflict, but behind the scenes, 2 different file objects are being handled. This method is also very unsafe: if the program crashes while reading or writing (power off, disk full), the file is very likely to end up truncated or corrupt.
Anyway, on Windows, you cannot open a file for reading and writing at the same time using 2 handles. It just destroys the contents of the file.
So there are 2 options, which are portable and safe.
The first is to create a new file in the same directory and, once the copy is complete, delete the original and rename the new one, like this:
import os
import shutil
filepath = "test.txt"
with open(filepath) as from_file, open(filepath+".new","w") as to_file:
    data = from_file.readline()
    to_file.write("something else\n")
    shutil.copyfileobj(from_file, to_file)
os.remove(filepath)
os.rename(filepath+".new",filepath)
This doesn't take much longer, because the rename operation is instantaneous. Besides, if the program/computer crashes at any point, one of the files (old or new) is valid, so it's safe.
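Applied to the CSV header problem in the question, the temporary-file approach might look roughly like this. It is a sketch under some assumptions: rename_headers and the renames mapping are hypothetical names, the header is assumed to be a simple comma-separated line with no quoted commas, and os.replace is used for the final swap instead of a separate remove and rename.

```python
import os
import shutil
import tempfile


def rename_headers(filepath, renames):
    """Rewrite only the header line of a CSV, renaming columns per renames."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # Create the temporary file in the same directory so the final swap
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".new")
    try:
        with os.fdopen(fd, "w", newline="") as dst, \
                open(filepath, newline="") as src:
            line = src.readline()
            body = line.rstrip("\r\n")
            ending = line[len(body):]  # keep the original line ending
            names = [renames.get(name, name) for name in body.split(",")]
            dst.write(",".join(names) + ending)
            shutil.copyfileobj(src, dst)  # stream the remaining rows in chunks
        os.replace(tmp_path, filepath)  # swap the new file into place
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the rest of the file is streamed by shutil.copyfileobj, the size of the CSV should not matter much; only the header line is ever held in memory as a whole.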
The second, if the patterns have the same length, is to use read/write mode, like this:
filepath = "test.txt"
with open(filepath,"r+") as rw_file:
    data = rw_file.readline()
    data = "h"*(len(data)-1) + "\n"
    rw_file.seek(0)
    rw_file.write(data)
Here we read the first line, replace it with the same number of h characters, rewind the file, and write the line back, overwriting the previous contents and keeping the rest of the lines. This is also safe, and even if the file is huge, it's very fast. The only constraint is that the replacement must be exactly the same size as the original line (otherwise you would either leave remainders of the previous data or overwrite the start of the next line, since no data is shifted).
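Since a length mismatch silently corrupts the file, it may be worth guarding the in-place write with an explicit check. A minimal sketch, assuming a hypothetical helper overwrite_first_line and an encoding in which the line ending is a plain "\n" or "\r\n" byte sequence (UTF-8 or ASCII, for example):

```python
def overwrite_first_line(filepath, new_line, encoding="utf-8"):
    """Overwrite the first line in place; refuse if the byte length differs."""
    new_bytes = new_line.encode(encoding)
    # Binary mode so that lengths are compared in bytes, not characters.
    with open(filepath, "r+b") as rw_file:
        old_bytes = rw_file.readline().rstrip(b"\r\n")
        if len(new_bytes) != len(old_bytes):
            raise ValueError("replacement must have the same byte length")
        rw_file.seek(0)
        rw_file.write(new_bytes)  # the original line ending is left untouched
```

If the check fails, nothing has been written yet, so the file is still intact and the temporary-file approach can be used instead.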

Script to search and replace strings in a file

I'm new to Python and am struggling to understand why this program
#!/usr/bin/env python
infile = open('/usr/src/scripts/in_file.conf')
outfile = open('/usr/src/scripts/in_file.conf', 'w')
replacements = {'abcd':'ABCD', '1234':'bob'}
for line in infile:
    for src, target in replacements.items():
        line = line.replace(src, target)
    outfile.write(line)
infile.close()
outfile.close()
results in a blank file after script execution.
The original in_file.conf is:
testfile of junk
abcd
******************
1234
*************
Correct me if I'm wrong, but it is my understanding that the script opens in_file.conf and loads the contents into two temporary files in memory, infile and outfile. The dictionary variable replacements acts like an array to hold the "to find" and "to replace" strings.
It loops over each line, then a nested loop goes down the line and loads the variables src and target with the contents of the replacements variable (like an array); then it writes the line, until all the lines are written.
Am I way off in my understanding?
The in_file.conf is in the same directory as the script. Could it be that the script is just not finding in_file.conf and is writing a blank file?
I told you I was new to Python.
Kind Regards,
Reggie.
The problem is that you're opening the same file in read mode and then in write mode (which truncates the file). You should ideally have a different file for the output, but if you need the output to be in the same file, you can delete the old file and rename the new one afterwards.
Please use different files for infile and outfile. Opening a file in write mode deletes its contents. Because your infile and outfile are the same file, the file's contents are deleted before anything is read, and the body of your for loop never runs.
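If the output really has to end up in the same file, one simple way (fine for a small config file like this one) is to finish reading before reopening the file for writing. A sketch, with replace_in_file as a made-up function name:

```python
def replace_in_file(path, replacements):
    """Apply each src -> target replacement to every line of path."""
    with open(path) as infile:
        lines = infile.readlines()  # finish reading before reopening
    with open(path, "w") as outfile:  # "w" truncates, which is now safe
        for line in lines:
            for src, target in replacements.items():
                line = line.replace(src, target)
            outfile.write(line)
```

It would be called as replace_in_file('/usr/src/scripts/in_file.conf', {'abcd': 'ABCD', '1234': 'bob'}). Note that a crash between the two with blocks would still lose the file, so for anything important, writing to a second file and renaming it afterwards is the safer route.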

Issues reading and writing txt files line by line

The script is written using PyQt4.10.1 and Python2.7
I have been working on a simple tool to allow a user to search for paths and then save them out to a config file for another program to read later. If there is already a config file, then the script reads it and displays the existing paths for the user to edit or add to. I wrote a GUI to make it as user friendly as possible. There are a couple of issues I am having with it.
First, when I read in the config file I am using the following code:
try:
    self.paths = open(configFile, "r")
    self.data = self.paths.readlines()
    self.paths.close()
except:
    self.data = None
if self.data is not None:
    for line in self.data:
        print line
        #self.listDelegate is the model for my QListView
        self.listDelegate.insertRows(0, 1, line)
When I do that I get the following in my gui:
This (above) is how it looks when you first input the data (before the data is saved and then reopened)
This (above) is how the data looks after the config file is saved and then read back in (note the extra space below the path).
The config file is only read in when the script is first opened.
The following is how the config file looks when it is written out.
C:\Program Files
C:\MappedDrives
C:\NVIDIA
Now all of that wouldn't be a big deal, but when I open the config file to edit it with this tool, the extra space in the GUI is read as another line break, so the config file is then printed as:
C:\Program Files

C:\MappedDrives

C:\NVIDIA

Then the problem just gets bigger and bigger every time I edit the file.
This issue leads me to the second issue (which I think may be the culprit). When I write the lines from the gui to the config file I use the following code:
rowCount = self.listDelegate.rowCount()
if rowCount > 0:
    myfile = open(configFile, 'w')
    for i in range(rowCount):
        myfile.write(str(self.listDelegate.index(i).data(role = QtCore.Qt.DisplayRole).toPyObject()))
        myfile.write("\n")
    myfile.close()
I am assuming that the issue with the extra line breaks is because I am adding the line breaks in manually. The problem is that I need each path to be on its own line for the config file to be usable later. I don't have a lot of experience writing out text files and everyone says that the easiest way to write them out line by line is to add in the line breaks by hand. If anyone has any better ideas I would love to hear them.
Sorry for the long winded explanation. If I am not being clear enough please tell me and I will try to explain myself better.
Thanks for the help!
The problem is that every time you read the file the line break remains at the end of the line. From the description of readline:
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline.
If you try
self.paths = open(configFile, "r")
self.data = self.paths.readlines()
for line in self.data:
    print repr(line)
which prints the representation of every line as python code you will get something like
'C:\\Program Files\n'
'C:\\MappedDrives\n'
'C:\\NVIDIA\n'
As you later insert further newlines the easiest fix is probably to remove the trailing newline:
for line in self.data:
    strippedLine = line.rstrip('\n')
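Putting the two halves together, the read and write sides could be kept symmetric: strip the newline on the way in, add exactly one on the way out. This is a sketch independent of the Qt model; read_paths and write_paths are hypothetical names, and skipping blank lines on read is an extra assumption that also cleans up a file that has already grown empty lines.

```python
def read_paths(config_file):
    """Return the non-empty lines of config_file without trailing newlines."""
    try:
        with open(config_file) as f:
            return [line.rstrip("\n") for line in f if line.strip()]
    except IOError:  # the config file may not exist yet
        return []


def write_paths(config_file, paths):
    """Write one path per line, each followed by a single newline."""
    with open(config_file, "w") as f:
        for path in paths:
            f.write(path + "\n")
```

With this pairing, reading the file and writing it straight back leaves it unchanged, so the blank lines stop accumulating.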

question about splitting a large file

Hey I need to split a large file in python into smaller files that contain only specific lines. How do I do this?
You're probably going to want to do something like this:
big_file = open('big_file', 'r')
small_file1 = open('small_file1', 'w')
small_file2 = open('small_file2', 'w')
for line in big_file:
    if 'Charlie' in line: small_file1.write(line)
    if 'Mark' in line: small_file2.write(line)
big_file.close()
small_file1.close()
small_file2.close()
Opening a file for reading returns an object that allows you to iterate over the lines. You can then check each line (which is just a string of whatever that line contains) for whatever condition you want, then write it to the appropriate file that you opened for writing. It is worth noting that when you open a file with 'w' it will overwrite anything already written to that file. If you want to simply add to the end, you should open it with 'a', to append.
Additionally, if you expect there to be some possibility of error in your reading/writing code, and want to make sure the files are closed, you can use:
with open('big_file', 'r') as big_file:
    <do stuff prone to error>
Do you mean breaking it down into subsections? Like if I had a file with chapter 1, chapter 2, and chapter 3, you want it to be broken down into separate files for each chapter?
The way I've done this is similar to Wilduck's response, but closes the input file as soon as it reads in the data and keeps all the lines read in.
data_file = open('large_file_name', 'r')
lines = data_file.readlines()
data_file.close()
outputFile = open('output_file_one', 'w')
for line in lines:
    if 'SomeName' in line:
        outputFile.write(line)
outputFile.close()
If you wanted to have more than one output file you could either add more loops or open more than one outputFile at a time.
I'd recommend using Wilduck's response, however, as it uses less memory with larger files: the file is read line by line instead of being loaded all at once.
How big and does it need to be done in python? If this is on unix, would split/csplit/grep suffice?
First, open the big file for reading.
Second, open all the smaller file names for writing.
Third, iterate through every line. Every iteration, check to see what kind of line it is, then write it to that file.
More info on File I/O: http://docs.python.org/tutorial/inputoutput.html
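The three steps can be sketched with contextlib.ExitStack, which closes every file it opened even if an error occurs partway through. The function name split_by_keyword and the shape of the targets mapping are assumptions made for this example.

```python
from contextlib import ExitStack


def split_by_keyword(big_path, targets):
    """Copy each line of big_path into every output whose keyword it contains.

    targets maps a keyword to an output path (both hypothetical here).
    """
    with ExitStack() as stack:
        # First, open the big file for reading.
        big_file = stack.enter_context(open(big_path))
        # Second, open all the smaller files for writing.
        outputs = {
            keyword: stack.enter_context(open(out_path, "w"))
            for keyword, out_path in targets.items()
        }
        # Third, check every line and write it to the matching file(s).
        for line in big_file:
            for keyword, out_file in outputs.items():
                if keyword in line:
                    out_file.write(line)
```

For instance, split_by_keyword('big_file', {'Charlie': 'small_file1', 'Mark': 'small_file2'}) would mirror the earlier example while reading the big file only once.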
