Python: write to file, cannot understand behavior

I don't understand why I can't write to a file in my Python program. I have a list of strings, measurements, and I just want to write them to a file. Instead of all the strings, only one string gets written. I can't understand why.
This is my piece of code:
fmeasur = open(fmeasur_name, 'w')
line1st = 'rev number, alg time\n'
fmeasur.write(line1st)
for i in xrange(len(measurements)):
    fmeasur.write(measurements[i])
    print measurements[i]
fmeasur.close()
I can see all of these strings printed, but the file contains only one. What could be the problem?

The only plausible explanation that I have is that you execute the above code multiple times, each time with a single entry in measurements (or at least the last time you execute the code, len(measurements) is 1).
Since you're overwriting the file instead of appending to it, only the last set of measurements would be present in the file, but all of them would appear on the screen.
edit: Or do you mean that the data is there, but there are no newlines between the measurements? The easiest way to fix that is to use print >>fmeasur, measurements[i] instead of fmeasur.write(...).
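If missing newlines are the issue, here is a minimal sketch in modern Python 3 syntax (the filename and sample measurement strings are illustrative):

```python
# hypothetical sample data: each measurement is a string without a trailing newline
measurements = ['12, 0.5', '13, 0.7', '14, 0.9']

with open('measurements.txt', 'w') as fmeasur:
    fmeasur.write('rev number, alg time\n')
    for m in measurements:
        fmeasur.write(m + '\n')  # append the newline explicitly

with open('measurements.txt') as f:
    print(f.read())
```

Without the explicit `+ '\n'`, `write()` runs all the strings together on a single line, which looks like "only one string" in the file.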


How to iterate in reverse from the middle of a text file?

The Problem:
I am writing a program whose ultimate objective is to extract several specific lines from text versions of .json files. I want to automate the manual process of copy/pasting tens or hundreds of lines that all share the same keyword but sit a few lines away from that keyword.
Proposed solution:
The python program iterates through the .txt file to look for a specific keyword
Once it finds that word, it then stops and iterates backwards from
that line until it finds a SECOND keyword.
When the second keyword is found, the program writes the entire line that the keyword
is on to a new file, and then resumes iterating through the file again from the initial keyword's line.
Illustration:
<fields>
<fullName>NAME KEYWORD</fullName> ##line I want to iterate backwards to so I can write it to another file##
<label>example_label</label>
<length>131072</length>
<trackHistory>false</trackHistory> ##line with keyword to stop the iterating process##
<type>example_type</type>
</fields>
Once the line with "NAME KEYWORD" is written to a new file, then the program continues onto the next section, which will have many of the same fields, but a different "NAME KEYWORD", etc.
Attempted solutions:
I have been looking for clear information online about how to iterate through a text file in reverse from a given point. I have found one site (kite.com) that illustrates how to use the readlines() and reversed() functions, but those actions are performed on the document as a whole, as opposed to a distinct portion.
I also reviewed Python's own documentation, but the suggestions there do not appear to offer the functionality I'm looking for here. (Unless I'm misunderstanding.)
TL;DR
Does anyone have an idea about whether there is an existing module, function or practice which would allow Python to iterate backwards from the middle of a text file?
As others mentioned in the comments, it would be better to work with the original JSON or use an XML parser. But if these aren't possible (maybe the file is too big to load into memory at once), I think you can do it without having to read in reverse.
saved_line = None
for line in oldfile:
    if 'NAME KEYWORD' in line:
        saved_line = line
    elif '<trackHistory>false</trackHistory>' in line and saved_line:
        newfile.write(saved_line)
saved_line will always contain the same line that you would have found if you iterated backwards after finding the <trackHistory>false</trackHistory> line.
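A self-contained sketch of this idea, using io.StringIO in place of real file objects (the sample data below is invented to mirror the illustration in the question):

```python
import io

# stand-in for the real input file; two <fields> sections with invented names
sample = io.StringIO(
    "<fields>\n"
    "<fullName>NAME KEYWORD one</fullName>\n"
    "<label>example_label</label>\n"
    "<trackHistory>false</trackHistory>\n"
    "</fields>\n"
    "<fields>\n"
    "<fullName>NAME KEYWORD two</fullName>\n"
    "<trackHistory>false</trackHistory>\n"
    "</fields>\n"
)
newfile = io.StringIO()  # stand-in for the output file

# remember the most recent NAME KEYWORD line; emit it when the stop keyword appears
saved_line = None
for line in sample:
    if 'NAME KEYWORD' in line:
        saved_line = line
    elif '<trackHistory>false</trackHistory>' in line and saved_line:
        newfile.write(saved_line)

print(newfile.getvalue())
```

Because the target line always precedes the stop keyword within a section, remembering it forward replaces any need to iterate backwards.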

Writelines (python) clears the text file it is supposed to write to, and writes nothing

What I'm doing is pretty basic, but for some reason isn't actually writing anything into the text file I need.
The first thing I've done is gotten input from the user and assigned it to assoc. This works fine, as I can print out assoc whenever I please, and that appears to work completely fine.
Next, I open a different file depending on whether assoc is equal to 0, 1, or 2. I read all the lines and assign the list of read lines to the variable beta, then I take the length of beta and assign it to prodlen, add one to prodlen and assign that new value to localid, and close the file object. The only reason I'm including this is that I fear I've missed something crucial and simple.
if assoc==0:
    fob=open('pathto/textfile1.txt','r')
if assoc==1:
    fob=open('pathto/textfile2.txt','r')
if assoc==2:
    fob=open('pathto/textfile3.txt','r')
beta=fob.readlines()
prodlen=len(beta)
localid=prodlen+1
fob.close()
After I get the user input, open the file, list its contents, and read its length, I then use the user's input again to open the file with writing permissions. (I've only included one of the possible if statements, because the others are identical except for which file they write to and what VALUE, which is a string, is). I append list beta with \n to get a line break, followed by a string, which has been represented here by VALUE. I then add localid onto the end, in string form.
if assoc==0:
    fob=open('pathto/textfile1.txt','w')
beta.append("\nVALUE"+str(localid))
print (beta)
fob.writelines(beta)
My true problem, though, is in the last two lines. When I print out list beta, it includes the new value that I've appended. But when I try to write the list to the file, it clears any data that was currently in the file and doesn't write anything inside! I'm a python noob, so please, keep the solution simple (if possible). I assume the solution to this is relatively simple. I'm probably just overlooking something.
Use the 'a' option instead of 'w' in your open call: 'w' overwrites, while 'a' appends.
http://docs.python.org/2/library/functions.html#open
The question "python open built-in function: difference between modes a, a+, w, w+, and r+?" is a useful explanation of the different modes.
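A quick demonstration of the difference, in modern Python 3 syntax (the filename is illustrative):

```python
with open('demo.txt', 'w') as f:
    f.write('first\n')

with open('demo.txt', 'w') as f:   # 'w' truncates: the previous contents are gone
    f.write('second\n')

with open('demo.txt', 'a') as f:   # 'a' appends to the existing contents
    f.write('third\n')

with open('demo.txt') as f:
    print(f.read())   # 'first' was lost to the second 'w' open
```

Note also that using `with` guarantees the file is flushed and closed, so the written data actually lands on disk.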

Using Python to write a CSV file with delimiter

I'm new to programming, and also to this site, so my apologies in advance for anything silly or "newbish" I may say or ask.
I'm currently trying to write a script in python that will take a list of items and write them into a csv file, among other things. Each item in the list is really a list of two strings, if that makes sense. In essence, the format is [[Google, http://google.com], [BBC, http://bbc.co.uk]], but with different values of course.
Within the CSV, I want this to show up as the first item of each list in the first column and the second item of each list in the second column.
This is the part of my code that I need help with:
with open('integration.csv', 'wb') as f:
    writer = csv.writer(f, delimiter=',', dialect='excel')
    writer.writerows(w for w in foundInstances)
For whatever reason, it seems that the delimiter is being ignored. When I open the file in Excel, each cell has one list. Using the old example, each cell would have "Google, http://google.com". I want Google in the first column and http://google.com in the second. So basically "Google" and "http://google.com", and then below that "BBC" and "http://bbc.co.uk". Is this possible?
Within my code, foundInstances is the list in which all the items are contained. As a whole, the script works fine, but I cannot seem to get this last step. I've done a lot of looking around within stackoverflow and the rest of the Internet, but I haven't found anything that has helped me with this last step.
Any advice is greatly appreciated. If you need more information, I'd be happy to provide you with it.
Thanks!
In your code on pastebin, the problem is here:
foundInstances.append(['http://' + str(num) + 'endofsite' + ', ' + desc])
Here, for each row in your data, you create one string that already has a comma in it. That is not what you need for the csv module. The CSV module makes comma-delimited strings out of your data. You need to give it the data as a simple list of items [col1, col2, col3]. What you are doing is ["col1, col2, col3"], which already has packed the data into a string. Try this:
foundInstances.append(['http://' + str(num) + 'endofsite', desc])
I just tested the code you posted with
foundInstances = [[1,2],[3,4]]
and it worked fine. It definitely produces the output csv in the format
1,2
3,4
So I assume that your foundInstances has the wrong format. If you construct the variable in a complex manner, you could try to add
import pdb; pdb.set_trace()
before the actual variable usage in the csv code. This lets you inspect the variable at runtime with the python debugger. See the Python Debugger Reference for usage details.
As a side note, according to the PEP-8 Style Guide, the name of the variable should be found_instances in Python.
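For reference, here is a minimal Python 3 version of the corrected flow (in Python 3 the file is opened in text mode with newline='' rather than 'wb'; the data is the example from the question):

```python
import csv

# each row is a simple list of items, NOT a single pre-joined string
found_instances = [['Google', 'http://google.com'], ['BBC', 'http://bbc.co.uk']]

with open('integration.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',', dialect='excel')
    writer.writerows(found_instances)  # each inner list becomes one comma-delimited row

with open('integration.csv', newline='') as f:
    print(f.read())
```

Because each inner list has two items, the csv module places them in two separate columns, which is exactly what Excel needs.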

openpyxl: please do not assume text is a number when importing

There are numerous questions about how to stop Excel from interpreting text as a number, or how to output number formats with openpyxl, but I haven't seen any solutions to this problem:
I have an Excel spreadsheet given to me by someone else, so I did not create it. When I open the file with Excel, I have certain values like "5E12" (clone numbers, if anyone cares) that appear to display correctly, but there's a little green arrow next to each one warning me that "This appears to be a number stored as text". Excel then asks me if I would like to convert it to a number, and if I say yes, I get 5000000000000, which is then automatically converted to scientific notation and displays as 5E12 again, only this time a text output would show the full number with zeroes. Note that before the conversion, this really is text, even to Excel, and I'm only being warned/offered the conversion.
So, when reading this file in with openpyxl (from openpyxl.reader.excel import load_workbook), the 5E12 is getting converted automatically to 5000000000000. I assume that openpyxl is making the same assumption that Excel made, only the conversion happens without a prompt or input on my part.
How can I prevent this from happening? I do not want text that looks like a "number stored as text" to be converted to a number. It is text unless I say so.
So far, the only solution I have found is to add single quotes to the front of each cell, but this is not an ideal solution, as it's manual labor rather than a programmatic solution. Also, the solution needs to be general, since I don't always know where this problem might occur (I'm reading millions of lines per day, so I don't want to have to do anything by hand).
I think this is a problem with openpyxl. There is a google group discussion from the beginning of 2011 that mentions this problem, but assumes it's too rare to matter. https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/HZfpShMp8Tk
So, any suggestions?
If you want to use openpyxl again (for whatever reason), the following changes to the worksheet reader routine do the trick of keeping the strings as strings:
diff --git a/openpyxl/reader/worksheet.py b/openpyxl/reader/worksheet.py
--- a/openpyxl/reader/worksheet.py
+++ b/openpyxl/reader/worksheet.py
@@ -134,8 +134,10 @@
             data_type = element.get('t', 'n')
             if data_type == Cell.TYPE_STRING:
                 value = string_table.get(int(value))
-
-            ws.cell(coordinate).value = value
+                ws.cell(coordinate).set_value_explicit(value=value,
+                                                       data_type=Cell.TYPE_STRING)
+            else:
+                ws.cell(coordinate).value = value
             # to avoid memory exhaustion, clear the item after use
             element.clear()
Cell.value is a property, and assigning to it calls Cell._set_value, which in turn calls Cell.bind_value, which according to the method's docstring will, "Given a value, infer type and display options". Since the types of the values are already recorded in the XML file, those should be used (here I only do that for strings) instead of doing something 'smart'.
As you can see from the code, the test whether it is a string was already there.
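To illustrate the property-setter point in isolation, here is a toy class (explicitly not openpyxl's real API) contrasting "smart" inference on assignment with an explicit setter:

```python
class Cell:
    """Toy illustration of inferred vs. explicit typing; not openpyxl's real API."""

    def __init__(self):
        self._value = None

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, v):
        # "smart" inference: text that parses as a number becomes a number
        try:
            self._value = float(v)
        except (TypeError, ValueError):
            self._value = v

    def set_value_explicit(self, value):
        # bypass inference and store the value exactly as given
        self._value = value


inferred = Cell()
inferred.value = '5E12'               # the setter coerces this to the float 5e12
explicit = Cell()
explicit.set_value_explicit('5E12')   # stays the string '5E12'

print(type(inferred.value).__name__, type(explicit.value).__name__)
```

This is exactly the failure mode in the question: '5E12' happens to parse as scientific notation, so inference on assignment silently turns text into a number unless the reader routes strings through the explicit path.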

How do I parse a listing of files to get just the filenames in Python?

So lets say I'm using Python's ftplib to retrieve a list of log files from an FTP server. How would I parse that list of files to get just the file names (the last column) inside a list? See the link above for example output.
Using retrlines() probably isn't the best idea there, since it just prints to the console and so you'd have to do tricky things to even get at that output. A likely better bet would be to use the nlst() method, which returns exactly what you want: a list of the file names.
The best answer:
You may want to use ftp.nlst() instead of ftp.retrlines(). It will give you exactly what you want.
If you can't, read the following :
Generators for sysadmin processes
In his now-famous presentation, Generator Tricks For Systems Programmers: An Introduction, David M. Beazley gives a lot of recipes for answering this kind of data problem with quick and reusable code.
E.g.:
# empty list that will receive all the log entries
log = []
# we pass a callback function to replace the print_line that retrlines would otherwise call
# we do that only because we cannot use anything better than retrlines
ftp.retrlines('LIST', callback=log.append)
# we use rsplit because it's more efficient in our case if we have a big file
files = (line.rsplit(None, 1)[1] for line in log)
# get your file list
files_list = list(files)
Why don't we generate the list immediately?
Well, because doing it this way offers much more flexibility: you can apply any intermediate generator to filter the files before turning them into files_list. It's just like a pipe: add a line and you add a processing step, without overhead (since these are generators). And if you get rid of retrlines, it still works, and it's even better because you never store the whole list even once.
EDIT: well, I read the comment on the other answer, and it says that this won't work if there is a space in the name.
Cool, this illustrates why this method is handy. If you want to change something in the process, you just change one line. Swap:
files = (line.rsplit(None, 1)[1] for line in log)
for
# split the line, take every item from field 8 onward, then join them back together
files = (' '.join(line.split()[8:]) for line in log)
OK, this may not be obvious here, but for huge batch-processing scripts, it's nice :-)
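A self-contained comparison of the two generator expressions on a couple of invented LIST lines (the sample log entries below are made up and stand in for real server output):

```python
# hypothetical LIST output, including one filename that contains spaces
log = [
    'drwxrwsr-x 5 ftp-usr pdmaint 1536 Mar 20 09:48 logs',
    '-rw-r--r-- 1 ftp-usr pdmaint 2048 Mar 21 10:12 my log file.txt',
]

# rsplit version: fast, but breaks on names containing spaces
rsplit_names = [line.rsplit(None, 1)[1] for line in log]

# field-8 version: rejoin everything after the 8th whitespace-separated field
join_names = [' '.join(line.split()[8:]) for line in log]

print(rsplit_names)
print(join_names)
```

The rsplit variant keeps only the last whitespace-separated token ('file.txt'), while the field-8 variant recovers the full name ('my log file.txt').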
And a slightly less-optimal method, by the way, if you're stuck using retrlines() for some reason, is to pass a function as the second argument to retrlines(); it'll be called for each item in the list. So something like this (assuming you have an FTP object named 'ftp') would work as well:
filenames = []
ftp.retrlines('LIST', lambda line: filenames.append(line.split()[-1]))
The list 'filenames' will then be a list of the file names.
Is there any reason why ftplib.FTP.nlst() won't work for you? I just checked and it returns only names of the files in a given directory.
Since every filename in the output starts at the same column, all you have to do is get the position of the dot on the first line:
drwxrwsr-x 5 ftp-usr pdmaint 1536 Mar 20 09:48 .
Then slice the filename out of the other lines using the position of that dot as the starting index.
Since the dot is the last character on the line, you can use the length of the line minus 1 as the index. So the final code is something like this:
lines = []
ftp.retrlines('LIST', lines.append)  # retrlines returns only a status string, so collect the lines with a callback
filename_index = len(lines[0]) - 1
files = []
for line in lines:
    files.append(line[filename_index:])
If the FTP server supports the MLSD command, then please see section “single directory case” from that answer.
Use an instance (say ftpd) of the FTPDirectory class, call its .getdata method with connected ftplib.FTP instance in the correct folder, then you can:
directory_filenames= [ftpfile.name for ftpfile in ftpd.files]
I believe it should work for you.
file_name_list = [' '.join(each_file_detail.split()).split()[-1] for each_file_detail in file_list_from_log]
NOTES -
Here I am making an assumption that you want the data in the program (as a list), not on the console.
each_file_detail is each line that is being produced by the program.
' '.join(each_file_detail.split())
replaces multiple spaces with a single space.
