I'm fairly new to Python. I need to read some text from one file (A), compare it with text from another file (B), change a part of file A, and write the result to a third file (C). The problem is that files A and B use an unusual notation that involves the symbol "¶".
So, I managed to bypass it (ignore it) by reading (or writing) in the following way:
input = codecs.open('bla.txt', 'r', 'ascii', 'ignore');
But that's not good enough. I NEED to read it precisely, compare it, and write it out successfully.
So, the content of my B file is: "Sugar=[Sugar#Butter¶Cherry]"
but when I read it, my variable has the value "Sugar=[Sugar#ButterÂ¶Cherry]".
As you can see, there is an additional "Â".
My A file contains a lot of text which needs to be copied to the C file, except for a certain part that follows the text from B mentioned above. That part needs to be changed and then written, BUT the two strings are never equal: my program never enters the IF condition in which I compare the "Sugar=[Sugar#Butter¶Cherry]" from A and the "Sugar=[Sugar#Butter¶Cherry]" from B.
Is there a way I can read the text so that this symbol "¶" appears as it is?
Yes.
Use the correct encoding.
input = codecs.open('bla.txt', 'r', 'UTF-8', 'ignore')
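If it helps, here is a minimal sketch of the full read/compare/write round trip with an explicit encoding. UTF-8 is an assumption (the stray "Â" is what UTF-8 bytes typically look like when decoded with a one-byte encoding), and the file names are placeholders:
import codecs

# Read both files with the same explicit encoding so the "¶" character
# survives intact and the comparison can succeed.
with codecs.open('A.txt', 'r', 'utf-8') as f_a:
    text_a = f_a.read()
with codecs.open('B.txt', 'r', 'utf-8') as f_b:
    marker = f_b.read().strip()        # e.g. "Sugar=[Sugar#Butter¶Cherry]"

if marker in text_a:                   # no stray "Â", so the match works
    with codecs.open('C.txt', 'w', 'utf-8') as f_c:
        f_c.write(text_a)              # write C with the same encoding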
I appreciate that this may be an issue with my computer/software, but I want to double check that my code isn't causing the problem before ruling it out.
I have written a fairly simple program. I read a short list of strings from one text file; then, with a second text file open, I iterate over each word in that second file, checking whether the first two letters of the word are contained in the list of strings.
If that condition is fulfilled, I use string interpolation to insert the word into a string of HTML code. Finally, I append that string to an existing, empty .html file. When the iteration is finished, I close the HTML file.
with open("strings.txt", "r") as f:
strings = f.read().splitlines()
urlfile = open("links.html", "a")
with open("words.txt", "r") as f:
text = f.read().splitlines()
for word in text:
if word[:2] in strings:
html = '<a href="[URL]/{}">'.format(word)
urlfile.write(html)
urlfile.close()
So far there don't actually seem to be any issues with my code doing what I want: I am generating the right HTML code, and if I print it to the console it does so quickly. It is being appended to the html file.
The problem I have is that something I am doing must be computationally expensive or problematic, because Notepad++ freezes every time I try to check links.html for the results. I have managed to see that it looks correct, but Notepad++ then becomes unusable, and my computer is clearly straining. The only solution I have is to close anything related to the html file.
None of the lists used are long and all the operations should in theory be quite simple, so I feel as though I must be doing something wrong. Am I writing to files in an unsafe way? Am I doing something wildly expensive that I'm just missing? I am using Notepad++ v7.9.5, Python 3, and Anaconda prompt.
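For reference, the same logic can also be written with links.html managed by a with block, so the handle is flushed and closed even if something fails partway through; a rough sketch using the same file names:
with open("strings.txt", "r") as f:
    strings = f.read().splitlines()

with open("words.txt", "r") as f:
    text = f.read().splitlines()

with open("links.html", "a") as urlfile:        # closed automatically on exit
    for word in text:
        if word[:2] in strings:
            urlfile.write('<a href="[URL]/{}">'.format(word))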
EDIT: I am now able to access the html file on my browser and on Notepad++ without issue. I think the source of the problem was some laptop software updating in the background without me noticing. I'll check that first next time!
This is my carDatabase.txt
CarID:c01 ModelName:honda VehicleType:city Price:20
CarID:c02 ModelName:honda VehicleType:x Price:30
I want to search by CarID and modify only that whole line, without disturbing the others.
My current code is here:
# Converting txt data into a string and modify
carsDatabaseFile = open('carsDatabase.txt', 'r')
allDataFromDatabase = [line.split(',') for line in carsDatabaseFile.readlines()]
Note:
Your question has a couple of issues: your sample from carDatabase.txt looks like it is tab-delimited, but your current code splits each line on the ',' character. This also looks like a place where a list comprehension might be hurting you more than it is helping you; break it up into a for loop if you are trying to add logic that manipulates a single line.
For looking at CSV files, I would highly recommend pandas for general manipulation of data in comma-separated as well as a number of other formats.
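For example, a rough sketch of loading the file with pandas, assuming it really is tab-delimited and has no header row (the column names below are invented for illustration, and each value will still carry its "CarID:"/"Price:" prefix, which you would strip afterwards):
import pandas as pd

cars = pd.read_csv('carsDatabase.txt', sep='\t', header=None,
                   names=['CarID', 'ModelName', 'VehicleType', 'Price'])
print(cars.head())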
That said, if you are truly restricted to built-in packages, or you are looking at this as a learning exercise, and your goal is to directly manipulate just one line of that file, what you are looking for is the seek method. You can use it in combination with the tell method (documented just below seek in the above link) to find where you are in the file; a sketch follows the steps below.
1. Write a for loop to identify which line in the file you are looking for.
2. From there, you can use the output of tell() to find the specific place in the file you are trying to manipulate.
3. Using the output of the above two steps, you can set the file pointer to that specific location with the seek() method (seek works on byte offsets: a file is really stored as a one-dimensional sequence of bytes).
4. You can now use the write() method to directly update the file at the location you determined above.
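Putting those steps together, here is a rough sketch of the seek()/tell() approach. The CarID and replacement values are made up, and this only works cleanly if the new line is exactly the same length as the old one (on Windows, consider opening in binary mode so newline translation does not change lengths):
target = "CarID:c01"                  # hypothetical CarID to update

with open('carsDatabase.txt', 'r+') as db:
    while True:
        position = db.tell()          # remember where this line starts
        line = db.readline()
        if not line:                  # reached end of file without a match
            break
        if line.startswith(target):
            # Replacement must be exactly the same length as the original,
            # otherwise the start of the next line gets overwritten.
            new_line = "CarID:c01\tModelName:honda\tVehicleType:city\tPrice:25\n"
            if len(new_line) != len(line):
                raise ValueError("replacement line must keep the same length")
            db.seek(position)         # jump back to the start of the matched line
            db.write(new_line)        # overwrite it in place
            break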
I am a new Python learner and have now moved on to file handling.
I tried to find a solution to my problem but failed, so I am posting my question; please consider it before marking it as a duplicate.
I tried to create a file, and it worked.
Writing to the file also worked.
But when I tried to read the text or values from the file, it returned empty.
I use the command-line terminal to work with Python, running on Ubuntu.
The code I have tried is given below. The file is created in the desired location and the written text is also present.
f0=open("filehandling.txt","wb")
f0.write("my second attempt")
s=f0.read(10);
print s
I also tried with wb+ and r+, but it still just returns empty.
Edit 1:
I have attached the code below. I entered it line by line at the command line.
fo = open("samp.txt", "wb")
fo.write( "Text is here\n");
fo.close()
fo = open("samp.txt", "r+")
str = fo.read(10);
print "Read String is : ", str
fo.close()
First of all, if you open with the wb flag then the file will be writeable only. If you want to both read and write, you need the wb+ flag. If you don't want the file to be truncated each time, you need rb+.
Now, files are streams with a pointer pointing at a certain location inside the file. If you write
f0.write("my second attempt")
then the pointer points at [the pointer position before writing] (in your case the beginning of the file, i.e. 0) plus [the length of the written bytes] (in your case 17, which is the end of the file). In order to read the whole file you have to move that pointer back to the beginning and then read:
f0.seek(0)
data = f0.read()
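Putting the two pieces together with the question's own file name (Python 2 syntax, to match the snippets above):
f0 = open("filehandling.txt", "wb+")   # wb+ allows writing and reading back
f0.write("my second attempt")          # the pointer is now at the end of the data
f0.seek(0)                             # move the pointer back to the start
print f0.read()                        # prints: my second attempt
f0.close()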
I want to write the output to a file, but all I got was "None", even for words with synonyms.
Note: when I am not writing to the file, the output works perfectly fine. Another note: the output appears on the screen whether I am writing to a file or not, but I get "None" in the file. Is there any way to fix this?
(I am using Python 2.7 on Mac.)
file = open("INPUT.txt", "w")  # opening a file
for xxx in Diacritics:
    print xxx
    synsets = wn.get_synsetids_from_word(xxx) or []
    for s in synsets:
        file.write(str(wn._items[s].describe()))
I tried to simplify the question and rewrote your code so that it's an independent test that you should be able to run, and eventually modify if that was the problem with your code.
test = "Is this a real life? Is this fantasy? Caught in a test slide..."
with open('test.txt', 'w') as f:
for word in test.split():
f.write(word) # test.txt output: Isthisareallife?Isthisfantasy?Caughtinatestslide...
As a side note, it almost sounds like you want to append rather than truncate, but I am not sure, so take a look at this, from the documentation for open():
The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position). If mode is omitted, it defaults to 'r'. The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode.
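A tiny illustration of the difference, in case appending is what you actually want:
with open('test.txt', 'w') as f:   # 'w' truncates: any previous contents are gone
    f.write('first run\n')
with open('test.txt', 'a') as f:   # 'a' appends: existing contents are kept
    f.write('second run\n')
# test.txt now contains both lines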
file.write() is going to write whatever is returned by the describe() call. Because 'None' is being written, and because the output always goes to the screen, the problem is that describe() is writing to the screen directly (probably with print) and returning None.
You need to use some other method besides describe, or give the correct parameters to describe to have it return the strings instead of printing them, or file a bug report. (I am not familiar with that package, so I don't know which is the correct course of action.)
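If describe() really does print instead of returning a string, one possible workaround is plain stdout redirection; this is only a sketch, and wn and _items come from the question's code:
import sys
from StringIO import StringIO      # Python 2, to match the question

def captured_describe(item):
    old_stdout = sys.stdout
    sys.stdout = StringIO()        # temporarily swap stdout for a buffer
    try:
        item.describe()            # whatever describe() prints lands in the buffer
        return sys.stdout.getvalue()
    finally:
        sys.stdout = old_stdout    # always restore the real stdout

# e.g. file.write(captured_describe(wn._items[s]))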
From what I've researched, csv.writer's writerow method should take a list and write it to the given CSV file. Here's what I tried:
from csv import writer
with open('Test.csv', 'wb') as file:
    csvFile, count = writer(file), 0
    titles = ["Hello", "World", "My", "Name", "Is", "Simon"]
    csvFile.writerow(titles)
I'm just trying to write it so that each word is in a different column.
When I open the file that it creates, however, I get a warning message. After pressing to continue anyway, I get a message saying that the file is either corrupted or is a SYLK file. I can then open the file, but only after going through the two error messages every time I open it.
Why is this?
Thanks!
It's a documented issue that Excel will assume a csv file is SYLK if the first two characters are 'ID'.
Venturing into the realm of opinion: it shouldn't, but Excel thinks it knows better than the extension. To be fair, people expect it to figure out cases where the extension really is wrong, but in a case like this, assuming the extension is wrong, and then further assuming the file is corrupt when it does not appear corrupt if interpreted according to the extension, is just mind-boggling.
@John Y points out:
One thing to watch out for: the "workaround" given by the Microsoft issue linked to by @PeterDeGlopper is to (manually) prepend an apostrophe to the file. (This is also advice commonly found on the Web, including Stack Overflow, to try to force CSV digits to be treated as strings rather than numbers.) That is not what I'd call good advice, as it injects a literal apostrophe into your data.
@DSM suggests using quoting=csv.QUOTE_NONNUMERIC on the writer. Excel is not confused by a file beginning with "ID" rather than ID, so if the other tools that will work with the CSV accept that quoting level, this is probably the best solution other than just ignoring Excel's confusion.
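A minimal sketch of that suggestion (Python 3 open() syntax; the header row here is hypothetical and starts with "ID" to reproduce the problem described above):
import csv

titles = ["ID", "Name", "Price"]   # hypothetical header that would trip Excel's SYLK check

with open('Test.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
    csv_writer.writerow(titles)    # file now begins with "ID" (quoted), which Excel accepts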