This question already has answers here:
Troubles while parsing with python very large xml file
(3 answers)
Closed 4 years ago.
I tried to modify and save an XML file using minidom in Python.
Everything works well except for one specific file, which I can read but cannot write back.
Code that I use to save the XML file:
from xml.dom import minidom

domXMLFile = minidom.parse(dom_document_filename)
# some modification
# the with block closes the file even if writexml raises
with open(dom_document_filename, "w") as f:
    domXMLFile.writexml(f)
My question is:
Is it true that minidom cannot handle a file this large (714 KB)?
How do I solve my problem?
In my opinion, lxml is much better than minidom for handling XML. If you have it installed, here is how to use it:
from lxml import etree

tree = etree.parse('path/file.xml')
# some changes to the tree
# etree.tostring() returns bytes, so the file is opened in binary mode
with open('path/file.xml', 'wb') as f:
    f.write(etree.tostring(tree, pretty_print=True))
If not, you could use pdb to debug your code. Just write import pdb; pdb.set_trace() in your code where you want a breakpoint; when you run your function in a shell, execution stops at that line. That may give you a better view of what is not working.
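One thing worth checking alongside the debugger: opening the target with "w" empties it before writexml runs, so any exception during writing leaves a truncated file behind. The sketch below is a standard-library alternative that serializes first and only then touches the file. The helper name save_dom and the demo file are hypothetical, and the sketch does not claim to identify what actually fails for the 714 KB file.

```python
import os
import tempfile
from xml.dom import minidom


def save_dom(dom, filename):
    """Serialize the document first, then overwrite the file (hypothetical helper)."""
    # If serialization raises, the file on disk has not been opened or truncated yet.
    data = dom.toxml(encoding="utf-8")  # bytes, including the XML declaration
    with open(filename, "wb") as f:
        f.write(data)


# Demo on a small throwaway file (assumed content, not the asker's data).
path = os.path.join(tempfile.mkdtemp(), "demo.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write("<root><item>old</item></root>")

dom = minidom.parse(path)
dom.getElementsByTagName("item")[0].firstChild.data = "new"
save_dom(dom, path)

# Parse the file again to confirm the round trip worked.
result = minidom.parse(path).getElementsByTagName("item")[0].firstChild.data
print(result)
```

If save_dom raises, the traceback points at either the serialization step or the file write, which narrows down where to set the breakpoint.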
This question already has answers here:
How to open a file for both reading and writing?
(4 answers)
Closed 3 years ago.
In simplest terms, when I try to run a short piece of code that deletes the contents of a file and then rewrites data to that file, it raises an error. I'm trying to get a temperature reading from a COM port using the file-write feature of CoolTerm. Perhaps the file is also in use by CoolTerm, so I can't edit it, but I'm unsure.
I've tried multiple ways to delete the file's contents, e.g. file.close() and others, but none seem to work.
while True:
    file = open("test.txt","r")
    file.truncate()
    x = file.read()
    x = x.split("\n")
    print(x[0])
    print(x[1])
    time.sleep(3)
I expect the console to output the contents of the file, but it doesn't. Something close to what I want would be for the console to output just the last two entries of the file, rather than deleting everything and then rewriting it.
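Changing the mode to r+ works; I have tested it.
with open('./install_cmd', 'r+') as f:
    print(f'before truncate: {f.read()}')
    f.truncate(0)  # empty the file; this needs a writable mode such as r+
    f.seek(0)      # move back to the start before reading or writing again
    print(f'after truncate: {f.read()}')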
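The question also mentions that printing only the last two entries would be enough. That can be done without truncating at all, as in the rough sketch below. The helper name last_lines and the sample readings are made up for illustration, and the sketch assumes the log is small enough to read in full and that CoolTerm allows other programs to read the file while it writes.

```python
import os
import tempfile


def last_lines(path, count=2):
    """Return the last `count` non-empty lines of a text file (hypothetical helper)."""
    with open(path, "r") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    return lines[-count:]


# Demo with made-up temperature readings standing in for the CoolTerm log.
demo_path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(demo_path, "w") as f:
    f.write("20.1\n20.4\n20.9\n21.3\n")

latest = last_lines(demo_path)
print(latest)
```

Because the file is only ever opened for reading, this avoids competing with the program that is writing it.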
This question already has an answer here:
How do I write all of these rows into a CSV file for a given range?
(1 answer)
Closed 6 years ago.
I'm parsing text from an XML file. Parsing works well, and I can print the results in full, but when I try to write the text into a text document, all I get in the document is the last item.
from bs4 import BeautifulSoup
import urllib.request
import sys
req = urllib.request.urlopen('file:///C:/Users/John/Desktop/Dow%20Jones/compaq%20neg%201.xml')
xml = BeautifulSoup(req, 'xml')
for item in xml.findAll('paragraph'):
    sys.stdout = open('CN1.txt', 'w')
    print(item.text)
    sys.stdout.close()
What am I missing here?
It looks like you are opening the file on every pass through the loop, which I am surprised it let you do. Each time, the file is opened in write mode, which wipes out everything that was written on the previous pass.
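A minimal sketch of the fix is to open the output file once, before the loop, and write every item into that same handle. To keep the example self-contained it uses the standard library's xml.etree.ElementTree and an inline sample document instead of BeautifulSoup and the asker's file; the sample XML and the temporary output path are assumptions.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Made-up stand-in for the asker's XML file.
sample_xml = "<doc><paragraph>First</paragraph><paragraph>Second</paragraph></doc>"
root = ET.fromstring(sample_xml)

out_path = os.path.join(tempfile.mkdtemp(), "CN1.txt")

# Open the file once; every paragraph goes into the same handle.
with open(out_path, "w", encoding="utf-8") as out:
    for item in root.iter("paragraph"):
        print(item.text, file=out)

# Read the result back to check that nothing was overwritten.
with open(out_path, encoding="utf-8") as f:
    written = f.read()
```

Passing file=out to print also avoids reassigning sys.stdout, so console output keeps working after the loop.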
This question already has answers here:
Find and Replace Values in XML using Python
(4 answers)
Closed 6 years ago.
I need to write a Python script that reads and replaces some data in an XML file.
The replacement value has to be read automatically from a directory (it is a file's name).
<setting name="abc" serializeAs="String">
<value>fw.version.1.1</value>
The fw.version.1.1 value has to be replaced with the name of a file from a folder.
Could use some help:)
thanks,
Robert
Assuming the XML file test.xml looks something like this:
<someXml>
<setting name="abc" serializeAs="String"/>
<value>fw.version.1.1</value>
</someXml>
To read the XML data from the file:
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
xmlData = etree.parse('test.xml', parser)
Reading the text from the value tag:
xmlData.xpath('//value')[0].text
Writing new text to the value tag:
xmlData.xpath('//value')[0].text = "test"
And finally, write your changes to the same (or any other) file:
xmlData.write('test.xml', pretty_print=True)
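The question also asks for the new value to come from a file name in a folder. The sketch below puts the two parts together using only the standard library (xml.etree.ElementTree rather than lxml). Several things here are assumptions made for the demo: the folder name firmware, the file fw.version.2.0, the rule of taking the first name in sorted order, and an XML layout where value is nested inside setting.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

work_dir = tempfile.mkdtemp()

# Hypothetical folder holding the file whose name becomes the new value.
firmware_dir = os.path.join(work_dir, "firmware")
os.mkdir(firmware_dir)
open(os.path.join(firmware_dir, "fw.version.2.0"), "w").close()

# Assumed XML layout: <value> nested inside <setting>.
xml_path = os.path.join(work_dir, "test.xml")
with open(xml_path, "w", encoding="utf-8") as f:
    f.write('<someXml><setting name="abc" serializeAs="String">'
            '<value>fw.version.1.1</value></setting></someXml>')

# Pick the file name; here simply the first entry in sorted order.
new_name = sorted(os.listdir(firmware_dir))[0]

tree = ET.parse(xml_path)
value = tree.find(".//setting[@name='abc']/value")
value.text = new_name
tree.write(xml_path, encoding="utf-8", xml_declaration=True)

# Parse the file again to confirm the replacement.
updated = ET.parse(xml_path).find(".//setting[@name='abc']/value").text
print(updated)
```

If the folder can contain several files, the selection rule (newest, highest version, a name pattern) has to be decided separately; sorted order is only a placeholder.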
This question already has answers here:
Python how to write to a binary file?
(7 answers)
Closed 8 years ago.
I am a beginner with Python. I can create ASCII files, but binary files seem harder to get into.
Writing binary files confuses me because I have not been able to find the simplest code EXAMPLES that would show me how it is actually done.
So here are the things I would like to solve:
python: a=254, write value a to binary file.
file1: FE
file2: 00FE
file3: 000000FE
file4: FE00
file5: FE000000
python: string="00AABBCCDDEEFF"
file: 00AABBCCDDEEFF
python: string="999 This is ASCII"
file: 090909[and the rest same way converted]
So those were the writing needs, but how do I reverse the process?
In addition, please explain how to read wwxxyyzz from
file: FFDD0045wwxxyyzzFA23
python: wwxxyyzz (as value or string)
python: zzyyxxww (reversed)
If I could find such basic information, it would help me a lot and give me new things to play with.
As you may see, this is my first post, so I am very much a newbie...
1st EDIT: Okay, first, thank you for the fast answer; as I am so new here, I could not comment or upvote. That example fits my file1, but file2-5 will still be hard to figure out, even with the provided links, without an equally clear and small (full) example. Also, my question was quickly marked as a duplicate, but the information there was still not clear enough for a newbie like me. I have to continue with trial and error.
Here's a basic example that does what you want for writing binary files (Python 3; a binary file expects bytes, so the value is wrapped in bytes([...]) rather than chr()):
>>> filename = "file"
>>> file = open(filename, "wb")
>>> a = 254
>>> file.write(bytes([a]))
1
>>> file.close()
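For file2 to file5 in the question, one option is the standard library's struct module, which packs an integer into a fixed number of bytes with a chosen byte order. The sketch below is for Python 3; the temporary output path is an assumption, and the variable names simply mirror the question's file numbers.

```python
import os
import struct
import tempfile

a = 254

# ">" means big-endian, "<" little-endian; B, H and I are 1, 2 and 4 bytes wide.
file1 = struct.pack("B", a)    # FE
file2 = struct.pack(">H", a)   # 00FE
file3 = struct.pack(">I", a)   # 000000FE
file4 = struct.pack("<H", a)   # FE00
file5 = struct.pack("<I", a)   # FE000000

# A string of hex digits converts directly to the bytes it spells out.
hex_bytes = bytes.fromhex("00AABBCCDDEEFF")

# Writing any of these is the same: open in binary mode and write the bytes.
out_path = os.path.join(tempfile.mkdtemp(), "file3.bin")
with open(out_path, "wb") as f:
    f.write(file3)

with open(out_path, "rb") as f:
    on_disk = f.read()
print(on_disk.hex().upper())
```

The format characters are the only thing that changes between the variants, so the same pattern covers other widths such as "Q" for 8 bytes.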
For reading binary files, and more examples:
Reading binary file in Python and looping over each byte
https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
Binary file IO in python, where to start?
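For the reading half of the question (pulling wwxxyyzz out of FFDD0045wwxxyyzzFA23), a rough sketch is to seek to the byte offset, read four bytes, and then interpret them. The bytes 11 22 33 44 below are made-up stand-ins for ww xx yy zz, and the temporary file path is an assumption.

```python
import os
import struct
import tempfile

# Build the sample file; 11223344 stands in for wwxxyyzz.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(bytes.fromhex("FFDD0045" "11223344" "FA23"))

with open(path, "rb") as f:
    f.seek(4)          # skip the first four bytes (FF DD 00 45)
    chunk = f.read(4)  # the four bytes of interest

as_string = chunk.hex().upper()               # the bytes as a hex string
as_value = struct.unpack(">I", chunk)[0]      # the same bytes as a big-endian integer
reversed_string = chunk[::-1].hex().upper()   # byte order reversed
reversed_value = struct.unpack("<I", chunk)[0]  # little-endian reading of the same bytes

print(as_string, hex(as_value), reversed_string, hex(reversed_value))
```

Reading the chunk as little-endian gives the same number as reversing the bytes and reading them as big-endian, which is why the last two results agree.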
This question already has answers here:
Whitespace gone from PDF extraction, and strange word interpretation
(7 answers)
Closed 5 years ago.
I'm looking for the easiest way to convert PDF to plain text in Python.
PyPDF2 seemed to be very easy, here is what I have:
def test_pdf(filename):
    import PyPDF2
    pdf = PyPDF2.PdfFileReader(open(filename, "rb"))
    for page in pdf.pages:
        print(page.extractText())
But it gives me:
InChapter5wepresentandevaluateourresults,togetherwiththetestenvironment.
How can I extract words from that PDF with PyPDF? Is there a different way (another library that works well for this)?
I have used PDFMiner successfully; with it you can parse and extract text from PDF documents.
More specifically, it ships a pdf2txt.py script that you can use to extract text. Installation is easy: pdfminer-xxx#python setup.py install, and then from bash or cmd a simple pdf2txt.py -o Application.txt Reference/Application.pdf command does the trick.
In the one-liner above, Application.pdf is your target PDF, the one you are going to process, and Application.txt is the file that will be generated.
For more complex tasks, you can take a look at the API and adapt it to your needs.
edit: I answered based on my personal experience, and that's that. I have no reason to "promote" the proposed tool. I hope that helps.
edit2: Something like this worked for me.
# -*- coding: utf-8 -*-
import os

dirpath = 'path\\to\\dir'
# List the input files before the output file is created in the same folder
filenames = os.listdir(dirpath)
nb = 0
# Concatenate every listed file into one text file
with open('path\\to\\dir\\file.txt', 'w') as outfile:
    for fname in filenames:
        nb = nb + 1
        print(fname)
        print(nb)
        currentfile = os.path.join(dirpath, fname)
        with open(currentfile) as infile:
            for line in infile:
                outfile.write(line)