So, I already have the code to remove all the words with digits in them from the text; now all I need is to print the result all on one line.
with open("lolpa.txt") as f:
    for word in f.readline().split():
        digits = [c for c in word if c.isdigit()]
        if not digits:
            print(word)
With the split, each word is printed on a separate line.
If I take out the .split(), the loop goes over single characters instead of words: it just takes the digits out of the words and prints every letter on a separate line.
EDIT: Yes, print(word, end=" ") works, thanks. But I also want the script to now read only one line. It can't read anything that is on line 2, 3, etc.
The second problem is that the script reads only the FIRST line. So if the input were
i li4ke l0ke like p0tatoes potatoes
300 bla-bla-bla 00bla-bla-0211
the output would be
i like potatoes
In Python 3.x you'd use
print(word, end='')
to avoid the newline.
In Python 2.x you'd use
print word,
with a comma after the items you are printing. Note that, unlike with end='' in 3.x, you'd get a single blank space between consecutive prints.
Note that print(word), won't prevent a newline in 3.x.
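A minimal Python 3 sketch of the end= approach follows. The helper name print_words_without_digits and the io.StringIO stand-in for lolpa.txt are assumptions made so the snippet runs on its own; passing end=" " keeps one space between words, and a single bare print() ends the line after the loop:

```python
import io
import sys

def print_words_without_digits(f, out=sys.stdout):
    # Hypothetical helper. Only the first line is read, as in the question's code.
    for word in f.readline().split():
        if not any(c.isdigit() for c in word):
            print(word, end=" ", file=out)  # a space instead of a newline
    print(file=out)  # end the output line once, after the loop

# io.StringIO stands in for open("lolpa.txt").
print_words_without_digits(io.StringIO("i li4ke l0ke like p0tatoes potatoes\n"))
```

Note that this leaves a trailing space before the newline; building one string with ' '.join() avoids that if it matters.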
--
Update based on the edit to the original post about the code problem:
With input:
i li4ke l0ke like p0tatoes potatoes
300 bla-bla-bla 00bla-bla-0211
this code:
def hasDigit(w):
    for c in w:
        if c.isdigit():
            return True
    return False
with open("data.txt") as f:
    for line in f:
        digits = [w for w in line.split() if not hasDigit(w)]
        if digits:
            print ' '.join(digits)
        # break # uncomment the "break" if you ONLY want to process the first line
will yield all the "words" that do not contain digits:
i like potatoes
bla-bla-bla <-- this line won't show if the "break" is uncommented above
Note:
The post was a bit unclear about whether you wanted to process only the first line of the file or whether the problem was that your script processed only the first line. This solution works either way, depending on whether the break statement is commented out.
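For Python 3, the same idea can be sketched with any() in place of hasDigit and a parameter in place of the commented-out break. The generator name words_without_digits and its first_line_only flag are hypothetical, and io.StringIO stands in for data.txt:

```python
import io

def words_without_digits(lines, first_line_only=False):
    """Yield one space-joined string of digit-free words per line (hypothetical helper)."""
    for line in lines:
        kept = [w for w in line.split() if not any(c.isdigit() for c in w)]
        if kept:
            yield " ".join(kept)
        if first_line_only:
            break  # same effect as uncommenting the break in the code above

sample = io.StringIO("i li4ke l0ke like p0tatoes potatoes\n300 bla-bla-bla 00bla-bla-0211\n")
for text in words_without_digits(sample):
    print(text)
```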
with open("lolpa.txt") as f:
    for word in f.readline().split():
        digits = [c for c in word if c.isdigit()]
        if not digits:
            print word,
    print
Note the , at the end of print.
If you're using Python 3.x, you can do:
print(word, end="")
to suppress the newline; Python 2.x uses the somewhat strange syntax:
print word,  # trailing comma
Alternatively, use sys.stdout.write(str(word)) after import sys. This works in both Python 2.x and 3.x, and writes no space or newline of its own.
You can use join():
with open("lolpa.txt") as f:
    print ' '.join(word for line in f for word in line.split()
                   if not any(c.isdigit() for c in word))
Using a simple for loop:
import sys
with open("data.txt") as f:
    for line in f:  # loop over f, not f.readline()
        for word in line.split():
            if not any(c.isdigit() for c in word):
                sys.stdout.write(word + " ")  # from mgilson's solution
sys.stdout.write("\n")
It's not a very difficult problem, and I've been trying to solve it. Here is my sample code:
import sys
for s in sys.stdin:
    s = s[0:1].upper() + s[1:len(s)-1] + s[len(s)-1:len(s)].upper()
    print(s)
This code only capitalizes the first letter and not the last letter as well. Any tips?
You are operating on lines, not words, since iterating over sys.stdin gives you one string per line of input. So your logic won't capitalize individual words.
There is nothing wrong with your logic for capitalizing the last character of a string. The reason the end of the line doesn't seem to be capitalized is that each line ends with a newline character ("\n"), and uppercasing a newline leaves it unchanged.
If you call strip() on the input line before you process it, you'll see the last character capitalized:
import sys
for s in sys.stdin:
    s = s.strip()
    s = s[0:1].upper() + s[1:len(s)-1] + s[len(s)-1:len(s)].upper()
    print(s)
Calculuswhiz's answer shows you how to capitalize each word in your input.
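To see why stripping matters, here is a small Python 3 sketch. The helper capitalize_ends is hypothetical, and it uses rstrip("\n") rather than strip() so that only the newline (not other surrounding spaces) is removed; that narrower choice is an assumption about what you want:

```python
def capitalize_ends(line):
    """Uppercase the first and last character of a line (hypothetical helper)."""
    s = line.rstrip("\n")  # drop the newline so the real last character gets changed
    return s[:1].upper() + s[1:-1] + s[-1:].upper() if len(s) > 1 else s.upper()

print(repr("\n".upper()))  # the newline comes back unchanged
print(capitalize_ends("hello world\n"))
```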
You first have to split each line of stdin into words; then you can operate on each word using map. Without splitting, the for loop only reads stdin line by line.
#!/usr/bin/python
import sys
def capitalize(t):
    # A one-character word would otherwise be duplicated (t[0] + t[-1])
    if len(t) == 1:
        return t.upper()
    else:
        return t[0].upper() + t[1:-1] + t[-1].upper()
for s in sys.stdin:
    splitLine = s.split()
    l = map(capitalize, splitLine)
    print(' '.join(l))
You could use the str.capitalize method, which uppercases the first character (note that it also lowercases the rest of the string), and then uppercase the last letter separately. This treats my_string as a single word; something like:
my_string = my_string.capitalize()
my_string = my_string[:-1] + my_string[-1].upper()
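Applied per word, the capitalize() approach could look like the Python 3 sketch below. cap_first_last is a hypothetical name; the empty-string guard is an addition, since my_string[-1] raises IndexError on "". Note in the example how capitalize() lowercases the middle of "wORLD":

```python
def cap_first_last(word):
    """str.capitalize() plus an uppercased last letter (hypothetical helper)."""
    if not word:
        return word  # guard: word[-1] would raise IndexError on ""
    word = word.capitalize()  # also lowercases every character after the first
    return word[:-1] + word[-1].upper()

line = "hello wORLD a"
print(" ".join(cap_first_last(w) for w in line.split()))
```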
This code is meant to read a text file and add every word to a dictionary where the key is the first letter and the values are all the words in the file that start with that letter. It mostly works, but I run into two problems:
1. The dictionary keys contain apostrophes and periods (how do I exclude them?).
2. The values aren't sorted alphabetically and are all jumbled up. The code ends up outputting something like this:
' - {"don't", "i'm", "let's"}
. - {'below.', 'farm.', 'them.'}
a - {'take', 'masters', 'can', 'fallow'}
b - {'barnacle', 'labyrinth', 'pebble'}
...
...
y - {'they', 'very', 'yellow', 'pastry'}
when it should be more like:
a - {'ape', 'army','arrow', 'arson',}
b - {'bank', 'blast', 'blaze', 'breathe'}
etc
# make empty dictionary
dic = {}
# read file
infile = open('file.txt', "r")
# read first line
lines = infile.readline()
while lines != "":
    # split the words up and remove "\n" from the end of the line
    lines = lines.rstrip()
    lines = lines.split()
    for word in lines:
        for char in word:
            # add if not in dictionary
            if char not in dic:
                dic[char.lower()] = set([word.lower()])
            # Else, add word to set
            else:
                dic[char.lower()].add(word.lower())
    # Continue reading
    lines = infile.readline()
# Close file
infile.close()
# Print
for letter in sorted(dic):
    print(letter + " - " + str(dic[letter]))
I'm guessing I need to remove the punctuation and apostrophes from the whole file when I first iterate through it, before adding anything to the dictionary? I'm totally lost on getting the values in the right order, though.
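Use defaultdict(set) and dic[word[0]].add(word), skipping any word that starts with punctuation. There's no need for the inner loop: it is what files each word under every character it contains instead of only under its first letter.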
from collections import defaultdict
def process_file(fn):
    my_dict = defaultdict(set)
    with open(fn) as f:
        for word in f.read().split():
            if word[0].isalpha():
                my_dict[word[0].lower()].add(word.lower())
    return my_dict
word_dict = process_file('file.txt')
for letter in sorted(word_dict):
    print(letter + " - " + ', '.join(sorted(word_dict[letter])))
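If you would rather keep words like "farm." (filed under f as farm) than skip them, one option is to strip punctuation from both ends of each word before grouping. This Python 3 sketch does that with str.strip(string.punctuation); the helper name group_by_first_letter and the sample sentence are made up for illustration:

```python
import string
from collections import defaultdict

def group_by_first_letter(text):
    """Map each first letter to a set of lowercased words (hypothetical helper)."""
    groups = defaultdict(set)
    for raw in text.split():
        # Strip punctuation only at the ends: "farm." -> "farm", but "don't" stays intact.
        word = raw.strip(string.punctuation).lower()
        if word and word[0].isalpha():
            groups[word[0]].add(word)
    return groups

sample = "Don't let's take the Ape to the farm. Blast! below."
for letter, words in sorted(group_by_first_letter(sample).items()):
    print(letter + " - " + ", ".join(sorted(words)))
```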
You have a number of problems:
1. splitting words on spaces AND punctuation
2. adding words to a set that may not exist yet at the time of the first addition
3. sorting the output
Here is a short program that tries to solve the above issues:
import re, string
# instead of using "text = open(filename).read()" we exploit a piece
# of text contained in one of the imported modules
text = re.__doc__
# 1. how to split at once the text contained in the file
#
# credit to https://stackoverflow.com/a/13184791/2749397
p_ws = string.punctuation + string.whitespace
words = re.split('|'.join(re.escape(c) for c in p_ws), text)
# 2. how to instantiate a set when we do the first addition to a key,
# that is, using the .setdefault method of every dictionary
d = {}
# Note: words regularized by lowercasing, we skip the empty tokens
for word in (w.lower() for w in words if w):
    d.setdefault(word[0], set()).add(word)
# 3. how to print the sorted entries corresponding to each letter
for letter in sorted(d.keys()):
    print(letter, *sorted(d[letter]))
My text contains numbers, so numbers show up in the output of the above program (see below); if you don't want them, filter them out, e.g. with if letter not in '0123456789': print(...).
And here is the output:
0 0
1 1
8 8
9 9
a a above accessible after ailmsux all alphanumeric alphanumerics also an and any are as ascii at available
b b backslash be before beginning behaviour being below bit both but by bytes
c cache can case categories character characters clear comment comments compatibility compile complement complementing concatenate consist consume contain contents corresponding creates current
d d decimal default defined defines dependent digit digits doesn dotall
e each earlier either empty end equivalent error escape escapes except exception exports expression expressions
f f find findall finditer first fixed flag flags following for forbidden found from fullmatch functions
g greedy group grouping
i i id if ignore ignorecase ignored in including indicates insensitive inside into is it iterator
j just
l l last later length letters like lines list literal locale looking
m m made make many match matched matches matching means module more most multiline must
n n name named needn newline next nicer no non not null number
o object occurrences of on only operations optional or ordinary otherwise outside
p p parameters parentheses pattern patterns perform perl plus possible preceded preceding presence previous processed provides purge
r r range rather re regular repetitions resulting retrieved return
s s same search second see sequence sequences set signals similar simplest simply so some special specified split start string strings sub subn substitute substitutions substring support supports
t t takes text than that the themselves then they this those three to
u u underscore unicode us
v v verbose version versions
w w well which whitespace whole will with without word
x x
y yes yielding you
z z z0 za
Without comments, and with a little obfuscation, it's just 3 lines of code:
import re, string
text = re.__doc__
p_ws = string.punctuation + string.whitespace
words = re.split('|'.join(re.escape(c) for c in p_ws), text)
d, add2d = {}, lambda w: d.setdefault(w[0],set()).add(w) #1
for word in (w.lower() for w in words if w): add2d(word) #2
for abc in sorted(d.keys()): print(abc, *sorted(d[abc])) #3
I wrote a program in which the user may enter any string. It should:
1. Delete all the vowels.
2. Insert the character "." before each consonant.
3. Replace all uppercase consonants with the corresponding lowercase ones.
Here is the code I wrote:
s=raw_input()
k=s.lower()
listaa=[]
for x in k:
    listaa.append(x)
    if x=='a':
        listaa.remove('a')
    if x=='o':
        listaa.remove('o')
    if x=='y':
        listaa.remove('y')
    if x=='e':
        listaa.remove('e')
    if x=='u':
        listaa.remove('u')
    if x=='i':
        listaa.remove('i')
for a in listaa:
    print '.%s'%(a),
This code works, but for example, if I enter tour, the output is .t .r. Although this is correct, it's not exactly what I want: I want to remove the spaces between them, so that the output looks like .t.r
How do I do this?
If you put a comma after a print statement, Python 2 separates consecutive prints with a space instead of a newline. To print neither a space nor a newline, use sys.stdout.write().
Fixed code:
import sys
s=raw_input()
k=s.lower()
listaa=[]
for x in k:
    listaa.append(x)
    if x=='a':
        listaa.remove('a')
    if x=='o':
        listaa.remove('o')
    if x=='y':
        listaa.remove('y')
    if x=='e':
        listaa.remove('e')
    if x=='u':
        listaa.remove('u')
    if x=='i':
        listaa.remove('i')
for a in listaa:
    sys.stdout.write('.%s' % a)
Note: you need the import sys statement to use sys.stdout.write.
Avoid using remove, since it removes the first occurrence of an item.
Instead, determine what you need to append before appending:
for x in k:
    if x in 'bcdfghjklmnpqrstvwxz':  # consonants ('y' counts as a vowel, as in your code)
        listaa.append('.')
        listaa.append(x)
    elif x not in 'aeiouy':
        # If x is neither a vowel nor a consonant, it is appended.
        listaa.append(x)
Also, it would be good to convert your list back to a string at the end:
result = ''.join(listaa)
print result
Regular expressions (in the re module) are designed for exactly this sort of thing.
import re
import string
alphabet = string.ascii_lowercase
vowels = 'aeiou'
consonants = "".join(set(alphabet)-(set(vowels)))
vowelre = re.compile("["+vowels+"]", re.IGNORECASE)
consonantre = re.compile("(["+consonants+"]){1}", re.IGNORECASE)
text = "Mary had a little lamb The butcher killed it DEAD"
print(vowelre.sub("", text))
print(consonantre.sub(".", text))
I note that you've tagged your question Python 2.7. If you are using 2.7, I would recommend getting used to the Python 3-style print function. To enable it you may need the line
from __future__ import print_function
in your code.
tim#merlin:~$ python retest.py
Mry hd lttl lmb Th btchr klld t DD
.a.. .a. a .i...e .a.. ..e .u...e. .i..e. i. .EA.
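The regex answer above performs the two substitutions separately on mixed-case text. A combined Python 3 sketch for the full task (lowercase, drop vowels, then prefix each consonant with ".") might look like this; it assumes, as the question's code does, that 'y' counts as a vowel, and dot_consonants is a hypothetical name:

```python
import re

# Assumption: 'y' is a vowel, matching the question's code.
VOWELS = re.compile(r"[aeiouy]")
CONSONANT = re.compile(r"([b-df-hj-np-tv-xz])")  # every lowercase letter except a, e, i, o, u, y

def dot_consonants(s):
    """Lowercase, delete vowels, then put '.' before each consonant (hypothetical helper)."""
    no_vowels = VOWELS.sub("", s.lower())
    return CONSONANT.sub(r".\1", no_vowels)

print(dot_consonants("tour"))        # .t.r
print(dot_consonants("Codeforces"))  # .c.d.f.r.c.s
```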
Using print elem, in Python 2 separates consecutive prints with a space, so a space ends up between every item. You can avoid this by calling print only once, on a single string:
print ''.join(listaa)
sep.join(list) is a built-in string method that joins the list's elements with sep between them. The problem is that your list doesn't contain any periods, so there won't be a period before each consonant.
Your logic can also be changed to get rid of the if statements. The best way to do this is to switch to an "add if not a vowel" approach:
s=raw_input()
k=s.lower()
listaa=[]
for x in k:
    if x not in ['a','e','i','o','u','y']: listaa.append('.'+x)
print ''.join(listaa)
All this does is check whether x is a vowel and, if not, append a string made up of a period and x.
You can also do it all in one line (lowercasing first, so uppercase vowels are removed too):
print ''.join('.'+x for x in raw_input().lower() if x not in 'aeiouy')
I have a .txt file full of text. I'd like to search it for specific characters (or, ideally, groups of characters, i.e. strings), then do things with the character found and with the characters 2 in front of and 4 behind it.
I made a version that searches line by line for the character, but I can't find the equivalent for searching character by character.
f = open(r"C:\Users\Calum\Desktop\Robopipe\Programming\data2.txt", "r")
searchlines = f.readlines()
f.close()
for i, line in enumerate(searchlines):
    if "_" in line:
        for l in searchlines[i:i+2]: print l, #if i+2 then prints line and the next
        print
If I understand the problem, what you want is to repeatedly search one giant string, instead of searching a list of strings one by one.
So the first step is: don't use readlines; use read, so you get that one giant string in the first place.
Next, how do you repeatedly search for all matches in a string?
Well, a string is an iterable, just like a list is: it's an iterable of characters (which are themselves strings of length 1). So you can just iterate over the string:
f = open(path)
searchstring = f.read()
f.close()
for i, ch in enumerate(searchstring):
    if ch == "_":
        print searchstring[i-4:i+2]
However, this only works if you're searching for a single-character match. It will also fail if there is a _ in the first four characters, because a negative start index counts from the end of the string. And it can be slow to loop over a few MB of text character by character.* So you probably want to loop over str.find instead:
i = 4
while True:
    i = searchstring.find("_", i)
    if i == -1:
        break
    print searchstring[i-4:i+2]
    i += 1  # resume after this match; otherwise find returns the same index forever
* You may be wondering how find could possibly be doing anything but the same kind of loop. And you're right, it's still iterating character by character. But it's doing it in optimized code provided by the standard library; with the usual CPython implementation, this means the "inner loop" is in C code rather than Python code, it doesn't have to "box up" each character to test it, etc., so it can be much, much faster.
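A Python 3 sketch of the str.find loop that also handles multi-character search strings and clamps the window at the start of the text. find_with_context is a hypothetical name, and the default window of 2 characters before and 4 after is taken from the question's wording (the same window the regex answer below uses):

```python
def find_with_context(text, needle, before=2, after=4):
    """Return each match of needle with up to `before` chars before it and `after` chars after it."""
    if not needle:
        raise ValueError("needle must be non-empty")  # "" would match at every index forever
    results = []
    i = text.find(needle)
    while i != -1:
        start = max(0, i - before)  # clamp so a match near the start doesn't wrap around
        results.append(text[start:i + len(needle) + after])
        i = text.find(needle, i + len(needle))  # resume after this match
    return results

print(find_with_context("hello_there my_wonderful_friend", "_"))
```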
You could use a regex for this:
The regex searches for any two characters (that are not _), an _, then any four characters that are not an underscore.
import re
with open(path) as f:
    searchstring = f.read()
regex = re.compile("([^_]{2}_[^_]{4})")
for match in regex.findall(searchstring):
    print match
With the input of:
hello_there my_wonderful_friend
The script returns:
lo_ther
my_wond
ul_frie
I'm writing a program in Python and want to compare two strings that are in a text file, separated by a newline character. How can I read the file and assign each string to a different variable, i.e. string1 and string2?
Right now I'm using:
file = open("text.txt").read();
but this gives me extra content, not just the strings. I'm not sure what it is returning, but this text file contains just two strings. I tried other methods such as .read().splitlines(), but they did not give the result I'm looking for. I'm new to Python, so any help would be appreciated!
This reads only the first 2 lines, strips the surrounding whitespace (including the newline at the end), and stores them in 2 separate variables. It does not read the entire file just to get the first 2 strings in it.
with open('text.txt') as f:
    word1 = f.readline().strip()
    word2 = f.readline().strip()
print word1, word2
# now you can compare word1 and word2 if you like
text.txt:
foo
bar
asdijaiojsd
asdiaooiasd
Output:
foo bar
EDIT: to make it work with any number of newlines or whitespace:
with open('text.txt') as f:
    # sequence of all words in all lines
    words = (word for line in f for word in line.split())
    # consume the first 2 items from the words sequence
    word1 = next(words)
    word2 = next(words)
I've verified this to work reliably with various "non-clean" contents of text.txt.
Note: I'm using generator expressions, which are like lazy lists, to avoid reading more data than needed. Generator expressions are otherwise equivalent to list comprehensions, except that they produce items lazily, i.e. only as many as are asked for.
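One caveat: next(words) raises StopIteration if the file holds fewer than two words. A Python 3 sketch that returns None instead, using next()'s default argument (first_two_words is a hypothetical name, and io.StringIO stands in for text.txt):

```python
import io

def first_two_words(f):
    """Return the first two whitespace-separated words of a file-like object (hypothetical helper)."""
    words = (word for line in f for word in line.split())  # lazy: lines are pulled on demand
    return next(words, None), next(words, None)  # None instead of StopIteration if words run out

print(first_two_words(io.StringIO("\n  foo \n\n bar\nasdijaiojsd\n")))
print(first_two_words(io.StringIO("only\n")))
```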
with open('text.txt') as f:
    lines = [line.strip() for line in f]
print lines[0] == lines[1]
I'm not sure what it is returning but this text file just contains two strings.
Your problem is likely related to whitespace characters (the most common being carriage return, linefeed/newline, space and tab). So if you compared your string1 to 'expectedvalue' and the comparison failed, it's likely because of the newline itself.
Try this: print the length of each string, then print each character of each string along with its code (via ord) to see why the comparison fails.
For example:
>>> print len(string1), len(expected)
4 3
>>> for got_character, expected_character in zip(string1, expected):
...     print 'got "{}" ({}), but expected "{}" ({})'.format(got_character, ord(got_character), expected_character, ord(expected_character))
...
got " " (32), but expected "f" (102)
got "f" (102), but expected "o" (111)
got "o" (111), but expected "o" (111)
If that's your problem, then you should strip off the leading and trailing whitespace and then execute the comparison:
>>> string1 = string1.strip()
>>> string1 == expected
True
If you're on a Unix-like system, you'll probably have the xxd or od utility available to dump a more detailed representation of the file. If you're using Windows, you can download one of many "hex editor" programs to do the same.
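In Python itself, repr() and ord() give a quick version of the same check. A minimal Python 3 sketch; describe and the sample value " foo\n" are hypothetical:

```python
def describe(s):
    """Show a string's repr and the code point of each character (hypothetical helper)."""
    return repr(s), [ord(c) for c in s]

string1 = " foo\n"  # hypothetical value read from the file
expected = "foo"
print(describe(string1))  # the repr exposes the leading space and the trailing \n
print(string1 == expected, string1.strip() == expected)
```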