First line not capitalizing correctly in Python 3 - python

I'm trying to capitalize the first letter of every name in a file, so I wrote the following code:
with open('C:/Users/Nishesh/Documents/updated_firstnames.txt', 'r+', encoding='utf-8') as updated_fnames_file:
    with open('C:/Users/Nishesh/Documents/capitalized.txt', 'w', encoding='utf-8') as new_fnames:
        for line in updated_fnames_file:
            new_fnames.write(line.capitalize())
I'm new to Python, so I'm well aware that this is probably poor formatting/logic (and I'd appreciate suggestions to improve it), but for my purposes, this did manage to correctly capitalize every item in the file other than the very first one, as far as I can tell. Actually, the first name in the original file was already capitalized, but after I ran this it ended up lower case in the resulting file. The other items in the first file which were already capitalized were not made lower case however - just this one. Why is this happening?

capitalize() :
It returns a copy of the string with its first character capitalized and the rest lowercased.
You probably need capwords() from the string module.
string.capwords() :
Split the argument into words using str.split(), capitalize each word using str.capitalize(), and join the capitalized words using str.join().
Or you can do the same thing by hand (line.split() drops the trailing newline, so add it back):
new_fnames.write(' '.join(map(str.capitalize, line.split())) + '\n')
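To make the difference concrete, here is a small sketch comparing str.capitalize() with string.capwords(). It also illustrates one possible cause of the lowercased first name, which nothing in the question or answer confirms: if the file starts with a UTF-8 byte order mark (an assumption about the asker's file), the first character of the first line is the invisible '\ufeff', so capitalize() leaves it alone and lowercases the actual name. The sample names are made up.

```python
import string

# capitalize() uppercases the first character and lowercases the rest
print('john SMITH'.capitalize())      # John smith
# capwords() capitalizes every whitespace-separated word
print(string.capwords('john SMITH'))  # John Smith

# Assumption: the input file begins with a UTF-8 byte order mark (BOM).
raw = b'\xef\xbb\xbfJohn\n'
with_bom = raw.decode('utf-8')         # the BOM survives as '\ufeff'
without_bom = raw.decode('utf-8-sig')  # 'utf-8-sig' strips a leading BOM

print(repr(with_bom.capitalize()))     # '\ufeffjohn\n'
print(repr(without_bom.capitalize()))  # 'John\n'
```

If a BOM is indeed the cause, opening the input file with encoding='utf-8-sig' instead of 'utf-8' should be enough to fix the first line.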

Related

Weird behavior when writing a string to a file

I am trying to make an AutoHotKey script that removes the letter 'e' from most words you type. To do this, I am going to put a list of common words in a text file and have a python script add the proper syntax to the AHK file for each word. For testing purposes, my word list file 'words.txt' contains this:
apple
dog
tree
I want the output in the file 'wordsOut.txt' (which I will turn into the AHK script) to end up like this after I run the python script:
::apple::appl
::tree::tr
As you can see, it will exclude words without the letter 'e' and removes 'e' from everything else. But when I run my script which looks like this...
f = open('C:\\Users\\jpyth\\Desktop\\words.txt', 'r')
while True:
    word = f.readline()
    if not word: break
    if 'e' in word:
        sp_word = word.strip('e')
        outString = '::{}::{}'.format(word, sp_word)
        p = open('C:\\Users\\jpyth\\Desktop\\wordsOut.txt', 'a+')
        p.write(outString)
        p.close()
f.close()
The output text file ends up like this:
::apple
::apple
::tree::tr
The weirdest part is that, while it never gets it right, the text in the output file can change depending on the number of lines in the input file.
I'm making this an official answer rather than a comment because it's worth pointing out how strip works, and that you should be wary of hidden characters such as newlines.
f.readline() returns each line, including the '\n'. Because strip() only removes the character from beginning and end of string, not from the middle, it's not actually removing anything from most words with 'e'. In fact even a word that ends in 'e' doesn't get that 'e' removed, since you have a new line character to the right of it. It also explains why ::apple is printed over two lines.
'hello\n'.strip('o') outputs 'hello\n'
whereas 'hello'.strip('o') outputs 'hell'
As pointed out in the comments, just do sp_word = word.strip().replace('e', '') (strip() with no arguments already removes the trailing '\n', so no separate replace('\n', '') is needed).
lhay's answer is right about the behavior of strip() but I'm not convinced that list comprehension really qualifies as "simple".
I would instead go with replace():
>>> 'elemental'.replace('e', '')
'lmntal'
(Also side note: for word in f: does the same thing as the first three lines of your code.)
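Putting the two answers together, a corrected version might look like the sketch below. The function name build_hotstrings and the in-memory sample list are illustrative choices, not part of the original script; the list stands in for the lines of words.txt.

```python
def build_hotstrings(lines):
    """Return '::word::replacement' entries for every word containing 'e'."""
    entries = []
    for raw in lines:
        word = raw.strip()  # drop the trailing '\n' before doing anything else
        if 'e' in word:
            # replace() removes every 'e', not only those at the ends
            entries.append('::{}::{}'.format(word, word.replace('e', '')))
    return entries


# Stand-in for iterating over the open file with "for word in f:"
sample = ['apple\n', 'dog\n', 'tree']
print('\n'.join(build_hotstrings(sample)))
# ::apple::appl
# ::tree::tr
```

Writing the joined result to wordsOut.txt once, in 'w' mode, also avoids reopening the output file in append mode for every word.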

Analyze a text file for certain attributes in Python 3.x

For an assignment in Python 3.x, I have to create a program that reads a text file and outputs the total number of characters, lines, vowels, capital letters, numeric digits, and words. The user has to provide the file and path of the text file. Asking for the file is easy:
file = input("Please provide the file path and file name. \nFor example C:\\Users\\YourName\\Documents\\books\\book.txt \n:")
f = open(file, 'r')
text = f.read()
I tried to use simple functions like:
numberOfCharacters = len(text)
...but reading farther into the assignment reveals that I have to use a for loop to analyze each character in the string, and then use a multi-way if statement to check whether it is a vowel, digit, etc.
I know I can count the number of lines by counting the number of \n's, and I can use the .split() function for words, but I am rather lost on how to get going.
I want to format the output like this, though I think I can figure this out after I get the program to work.
------------width=35---------|--width=8----
|number of characters : #####|
|number of lines : #####|
|number of vowels : #####|
|number of capital letters : #####|
|number of numeric digits : #####|
|number of words : #####|
Any help getting going and showing me what to do would be greatly appreciated.
You can use the NLTK toolkit (http://www.nltk.org/) to get the info you want.
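Since the assignment asks for a for loop with a multi-way if, the counting can also be done with plain Python. The following is only a sketch: the function name analyze, the dictionary keys, and the choice to count an uppercase vowel as both a vowel and a capital letter are assumptions, not requirements taken from the assignment.

```python
VOWELS = set('aeiouAEIOU')


def analyze(text):
    """Count characters, lines, vowels, capitals, digits and words in text."""
    counts = {'characters': 0, 'lines': 0, 'vowels': 0,
              'capital letters': 0, 'numeric digits': 0}
    for ch in text:
        counts['characters'] += 1
        if ch == '\n':
            counts['lines'] += 1
        elif ch.isdigit():
            counts['numeric digits'] += 1
        elif ch in VOWELS:
            counts['vowels'] += 1
        # Checked separately so that 'A' counts as a vowel and a capital
        if ch.isupper():
            counts['capital letters'] += 1
    if text and not text.endswith('\n'):
        counts['lines'] += 1  # last line without a trailing newline
    counts['words'] = len(text.split())
    return counts


result = analyze('Hello World 42\nAn apple\n')
for name, value in result.items():
    print('|number of {:<18}: {:>5}|'.format(name, value))
```

The format specifiers `<18` and `>5` are one way to get the fixed-width columns shown in the question; the widths can be adjusted to match the required layout.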

Read. Check. Write. A Broken Python script

I am developing a word game, and for this game, I needed a list of words. Sadly, this list was so long that I just had to refine it (this list of words can be found on any Mac at /usr/share/dict/).
To refine it, I decided to use my own Python scripts. I already wrote a script before that removes all words that start with capital letters (thus removing names of places, etc.), and it worked. This is it:
with open("/Users/me/Desktop/oldwords.txt", "r") as text:
    with open("/Users/me/Desktop/newwords.txt", "w") as towriteto:
        for word in text:
            if word[0]==word[0].lower():
                towriteto.write(word)
Then, I decided to refine it even further; I decided that I would delete all words that are not in the pyenchant module English dictionary. This operation's code is very similar to the previous one's code. This is my code:
import enchant
with open("/Users/me/Desktop/newwords.txt", "r") as text:
    with open("/Users/me/Desktop/words.txt", "w") as towriteto:
        d = enchant.Dict("en_US")
        for word in text:
            if d.check(word):
                towriteto.write(word)
Sadly, this did not write anything to the "towriteto" file, and after some debugging, I found that
d.check(word) -> False
It always returned false. However, when I checked words separately, real words returned True, and fake words returned False as they should.
I have no idea what is wrong with my second script. The file locations are correct and the pyenchant installation had no issues.
Thanks in advance!
I don't know the input file format, but if there is only one word per line, try removing the end-of-line character from word before calling d.check(word):
word = word.rstrip()
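The effect is easy to reproduce without pyenchant. In the sketch below, check() and KNOWN_WORDS are hypothetical stand-ins for d.check and the en_US dictionary, and io.StringIO stands in for the word file; the only point is that each line read from a file still carries its '\n'.

```python
import io

# Hypothetical stand-in for enchant.Dict("en_US").check
KNOWN_WORDS = {'apple', 'tree'}


def check(word):
    return word in KNOWN_WORDS


source = io.StringIO('apple\nxqzt\ntree\n')  # stands in for newwords.txt

kept = []
for line in source:
    word = line.rstrip()  # 'apple\n' becomes 'apple'
    if word and check(word):
        kept.append(word + '\n')  # put the newline back when writing

print(check('apple\n'))  # False: the newline is part of the string
print(kept)              # ['apple\n', 'tree\n']
```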

Search a delimited string in a file - Python

I have the following read.json file
{:{"JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr","LAPTOP":"error"}
and python script :
import re
shakes = open("read.json", "r")
needed = open("needed.txt", "w")
for text in shakes:
    if re.search('JOL":"(.+?).tr', text):
        print >> needed, text,
I want it to find what's between two words (JOL":" and .tr) and then print it. But all it does is printing all the text set in "read.json".
You're calling re.search, but you're not doing anything with the returned match, except to check that there is one. Instead, you're just printing out the original text. So of course you get the whole line.
The solution is simple: just store the result of re.search in a variable, so you can use it. For example:
for text in shakes:
    match = re.search('JOL":"(.+?).tr', text)
    if match:
        print >> needed, match.group(1)
In your example, the match is JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr, and the first (and only) group in it is EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD, which is (I think) what you're looking for.
However, a couple of side notes:
First, . is a special pattern in a regex, so you're actually matching anything up to any character followed by tr, not .tr. For that, escape the . with a \. (And, once you start putting backslashes into a regex, use a raw string literal.) So: r'JOL":"(.+?)\.tr'.
Second, this is making a lot of assumptions about the data that probably aren't warranted. What you really want here is not "everything between JOL":" and .tr", it's "the value associated with key 'JOL' in the JSON object". The only problem is that this isn't quite a JSON object, because of that {: prefix. Hopefully you know where you got the data from, and therefore what format it's actually in. For example, if you know each line is a JSON object prefixed with the two characters {:, the right way to parse it is:
import json
d = json.loads(text[2:])  # skip the leading '{:' seen in the sample line
if 'JOL' in d:
    print >> needed, d['JOL']
Finally, you don't actually have anything named needed in your code; you opened a file named 'needed.txt', but you called the file object love. If your real code has a similar bug, it's possible that you're overwriting some completely different file over and over, and then looking in needed.txt and seeing nothing changed each time...
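The first two side notes can be checked with a short Python 3 sketch. The text value below is a shortened, made-up stand-in for the line in read.json, chosen so that the escaped and unescaped patterns give different results.

```python
import json
import re

# Shortened, made-up stand-in for the line in read.json
text = '{:{"JOL":"abcXtr.tr","LAPTOP":"error"}'

# Unescaped '.' matches any character, so the match stops at "Xtr"
unescaped = re.search('JOL":"(.+?).tr', text).group(1)
# Escaped '\.' in a raw string only matches a literal ".tr"
escaped = re.search(r'JOL":"(.+?)\.tr', text).group(1)

print(unescaped)  # abc
print(escaped)    # abcXtr

# Skipping the two-character '{:' prefix leaves a valid JSON object
value = json.loads(text[2:])['JOL']
print(value)      # abcXtr.tr
```

Note that the JSON route returns the whole value, including the trailing .tr, so it answers a slightly different question than the regex does.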
If you know that your starting and ending matching strings only appear once, you can ignore that it's JSON. If that's OK, then you can split on the starting characters (JOL":"), take the 2nd element of the split array [1], then split again on the ending characters (.tr) and take the 1st element of the split array [0].
>>> text = '{:{"JOL":"EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD.tr","LAPTOP":"error"}'
>>> text.split('JOL":"')[1].split('.tr')[0]
'EuXaqHIbfEDyvph%2BMHPdCOJWMDPD%2BGG2xf0u0mP9Vb4YMFr6v5TJzWlSqq6VL0hXy07VDkWHHcq3At0SKVUrRA7shgTvmKVbjhEazRqHpvs%3D-%1E2D%TL/xs23EWsc40fWD'

re.sub not replacing all occurrences

I'm not a Python developer, but I'm using a Python script to convert SQLite to MySQL.
The suggested script gets close, but no cigar, as they say.
The line giving me a problem is:
line = re.sub(r"([^'])'t'(.)", r"\1THIS_IS_TRUE\2", line)
...along with the equivalent line for false ('f'), of course.
The problem I'm seeing is that only the first occurrence of 't' in any given line is replaced.
So, input to the script,
INSERT INTO "cars" VALUES(56,'Bugatti Veyron','BUG 1',32,'t','t','2011-12-14 18:39:16.556916','2011-12-15 11:25:03.675058','81');
...gives...
INSERT INTO "cars" VALUES(56,'Bugatti Veyron','BUG 1',32,THIS_IS_TRUE,'t','2011-12-14 18:39:16.556916','2011-12-15 11:25:03.675058','81');
I mentioned I'm not a Python developer, but I have tried to fix this myself. According to the documentation, I understand that re.sub should replace all occurrences of 't'.
I'd appreciate a hint as to why I'm only seeing the first occurrence replaced, thanks.
The two substitutions you'd want in your example overlap - the comma between your two instances of 't' will be matched by (.) in the first case, so ([^']) in the second case never gets a chance to match it. This slightly modified version might help:
line = re.sub(r"(?<!')'t'(?=.)", r"THIS_IS_TRUE", line)
This version uses lookahead and lookbehind syntax, described in the documentation for Python's re module.
How about
line = line.replace("'t'", "THIS_IS_TRUE").replace("'f'", "THIS_IS_FALSE")
without using re. This replaces all occurrences of 't' and 'f'. Just make sure that no car is named t.
The first match you see is ,'t', (including both commas). Python then continues scanning from the next character, which is the ' before the second t. Because the comma has already been consumed, nothing is left for ([^']) to match in front of the second 't', so it is skipped.
In other words, subsequent matches to be replaced cannot overlap.
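The non-overlap behavior is easy to reproduce in Python 3. The line below is a shortened, made-up version of the INSERT statement from the question; the two patterns are the ones discussed above.

```python
import re

line = "VALUES(56,'BUG 1',32,'t','t','2011-12-14');"

# Original pattern: the trailing (.) consumes the comma that the
# next match would need for ([^']), so the second 't' is skipped.
consuming = re.sub(r"([^'])'t'(.)", r"\1THIS_IS_TRUE\2", line)

# Lookbehind/lookahead only inspect the neighbors without consuming them.
lookaround = re.sub(r"(?<!')'t'(?=.)", "THIS_IS_TRUE", line)

print(consuming)   # VALUES(56,'BUG 1',32,THIS_IS_TRUE,'t','2011-12-14');
print(lookaround)  # VALUES(56,'BUG 1',32,THIS_IS_TRUE,THIS_IS_TRUE,'2011-12-14');
```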
Using re.sub(r"\bt\b","THIS_IS_TRUE",line) (note that this leaves the surrounding quotes in place):
In [21]: strs="""INSERT INTO "cars" VALUES(56,'Bugatti Veyron','BUG 1',32,'t','t','2011-12-14 18:39:16.556916','2011-12-15 11:25:03.675058','81');"""
In [22]: print re.sub(r"\bt\b","THIS_IS_TRUE",strs)
INSERT INTO "cars" VALUES(56,'Bugatti Veyron','BUG 1',32,'THIS_IS_TRUE','THIS_IS_TRUE','2011-12-14 18:39:16.556916','2011-12-15 11:25:03.675058','81');
