I have many big strings (about 1000-1500 characters each) and I want to write them to a text file using Python. However, I need each string to occupy only a single line in the text file.
For example, consider two strings:
string_1 = "Mary had a little lamb
which was as white
as snow"
string_2 = "Jack and jill
went up a hill
to fetch a pail of
water"
When I write them to a text file, I want the strings to occupy only one line and not multiple lines.
Text file example:
Mary had a little lamb which was as white as snow
Jack and Jill went up a hill to fetch a pail of water
How can this be done?
If you just want all the strings written out on one line in a file without newline separators, there are a number of ways, as others here have shown.
The more interesting issue is how you get them back into a program again later, if that is needed, and back into the appropriate variables.
I like to use json (docs here) for this kind of thing, and you can get it to output everything onto one line. This:
import json
string_1 = "Mary had a little lamb which was as white as snow"
string_2 = "Jack and jill went up a hill to fetch a pail of water"
strs_d = {"string_1": string_1, "string_2": string_2}
with open("foo.txt","w") as fh:
json.dump(strs_d, fh)
would write out the following into a file:
{"string_1": "Mary had a little lamb which was as white as snow", "string_2": "Jack and jill went up a hill to fetch a pail of water"}
This can easily be loaded back into a dictionary and the original strings pulled back out.
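For instance, a minimal sketch of reading it back in (assuming the foo.txt written above):

import json

# Reload the dictionary and pull the original strings back out.
with open("foo.txt") as fh:
    strs_d = json.load(fh)

string_1 = strs_d["string_1"]
string_2 = strs_d["string_2"]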
If you do not care about the names of the original string variable, then you can use a list like this:
import json
string_1 = "Mary had a little lamb which was as white as snow"
string_2 = "Jack and jill went up a hill to fetch a pail of water"
strs_l = [string_1, string_2]
with open("foo.txt","w") as fh:
json.dump(strs_l, fh)
and it outputs this:
["Mary had a little lamb which was as white as snow", "Jack and jill went up a hill to fetch a pail of water"]
which, when reloaded from the file, gives you the strings back in a list that can then be unpacked into individual variables.
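A similar reload sketch for the list form (again assuming the foo.txt written above):

import json

with open("foo.txt") as fh:
    strs_l = json.load(fh)

string_1, string_2 = strs_l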
This all assumes that you want to reload the strings later (and so do not mind the extra json markup in the output that makes reloading possible), as opposed to just wanting them written to a file for some other purpose where the extra json formatting would be unacceptable.
Your example output does not have this markup, but your example output is also spread over more than one line while the question asks for everything on one line, so your needs are not entirely clear.
In [36]: string_1 = "Mary had a little lamb which was as white as snow"
    ...:
    ...: string_2 = "Jack and jill went up a hill to fetch a pail of water"

In [37]: s = [string_1, string_2]

In [38]: with open("a.txt", "w") as f:
    ...:     f.write(" ".join(s))
    ...:
Construct a single line from the multi-line string and then write it to the file as normal. Your example really should use triple quotes to allow for multi-line strings:
string_1 = """Mary had a little lamb
which was as white
as snow"""
string_2 = """Jack and jill
went up a hill
to fetch a pail of
water"""
with open("myfile.txt", "w") as f:
f.write(" ".join(string_1.split("\n")) + "\n")
f.write(" ".join(string_2.split("\n")) + "\n")
with open("myfile.txt") as f:
print(f.read())
Output:
Mary had a little lamb which was as white as snow
Jack and jill went up a hill to fetch a pail of water
You can split a long string across several source lines using parentheses (adjacent string literals are concatenated):
s = (
    "First line "
    "second line "
    "third line"
)
You can also use triple quotes and remove the newline characters using strip and replace:
s = """
First line
Second line
Third line
""".strip().replace("\n", " ")
Once each string has been collapsed to a single line, you can write them one per line like this:
total_str = [string_1, string_2]
with open(file_path + "file_name.txt", "w") as fp:
    for i in total_str:
        fp.write(i + '\n')
# no fp.close() needed; the with block closes the file
Related
I have the following code:
output = requests.get(url=url, auth=oauth, headers=headers, data=payload)
output_data = output.content
type(output_data)
<class 'bytes'>
output_data
Squeezed Text (3632 Lines)
When looking at the squeezed text, I have some values that look like this:
Steve likes to walk his dog. Steve says to John "I like \n Pineapple, oranges, \n and pizza.\n" and then he went to bed \n.
John likes his beer cold.\n
Sally likes her teeth brushed with a bottle of jack.\n
How can I remove the \n characters, but ONLY if they are contained within double quotes, so that my results look like this:
Steve likes to walk his dog. Steve says to John "I like Pineapple, oranges, and pizza." and then he went to bed \n.
John likes his beer cold.\n
Sally likes her teeth brushed with a bottle of jack.\n
I know how to remove \n characters, but I am not sure how to do this if I only want to remove the values if they are contained within double quotes.
Here is what I have tried:
I found this, and used this code:
my_text = re.sub(r'"\\n"','',my_text)
But it doesn't seem to be working.
I might be complicating it a bit, but something like this might work:
parts = content.split("\"")
for i, part in enumerate(parts):
    if i % 2:  # odd indices are the segments inside the double quotes
        parts[i] = part.replace("\n", "")
content = "\"".join(parts)
Figured it out.
Steps:
Convert bytes to String
Create the pattern for Regex
Use regex to format the values.
Step 1:
my_text = my_text.decode("utf-8")
Step 2:
pattern = re.compile(r'".*?"',re.DOTALL)
Step 3:
my_text = pattern.sub(lambda x:x.group().replace('\n',''),my_text)
This solves my problem.
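Putting the three steps together on a small bytes sample (the sample value is just an illustration):

import re

raw = b'Steve says to John "I like \n Pineapple, oranges, \n and pizza.\n" and then he went to bed \n.'

# Step 1: bytes -> str
my_text = raw.decode("utf-8")

# Step 2: match anything between double quotes, even across newlines
pattern = re.compile(r'".*?"', re.DOTALL)

# Step 3: strip newlines only inside the quoted spans
my_text = pattern.sub(lambda x: x.group().replace('\n', ''), my_text)
print(my_text)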
I have a list of movie titles and a list of names.
Movies:
Independence Day
Who Framed Roger Rabbit
Rosemary's Baby
Ghostbusters
There's Something About Mary
Names:
Roger
Kyle
Mary
Sam
I want to make a new list of all the movies that match a name from the names list.
Who Framed Roger Rabbit (matched "roger")
Rosemary's Baby (matched "mary")
There's Something About Mary (matched "mary")
I've tried to do this in Python, but for some reason it isn't working. The resulting file is empty.
with open("movies.csv", "r") as movieList:
movies = movieList.readlines()
with open("names.txt", "r") as namesToCheck:
names = namesToCheck.readlines()
with open("matches.csv", "w") as matches:
matches.truncate(0)
for i in range(len(movies)):
for j in range(len(names)):
if names[j].lower() in movies[i].lower():
matches.write(movies[i])
break
matches.close();
What am I missing here?
The reason you aren't getting any results is likely that readlines() in Python gives you a list of lines with the newline character, \n, still attached to the end of each one. Your program is therefore checking whether "roger\n" appears in a line of the movies file rather than just "roger".
To fix this, you could simply add a [:-1] to your if statement to only check the name and not the newline:
if names[j].lower()[:-1] in movies[i].lower():
You could also change the way you read the names file by using read().splitlines() to get rid of the newline character like this:
names = namesToCheck.read().splitlines()
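A minimal sketch of the corrected loop using splitlines() (file names as in the question):

with open("movies.csv", "r") as movieList:
    movies = movieList.readlines()

with open("names.txt", "r") as namesToCheck:
    names = namesToCheck.read().splitlines()  # no trailing '\n' on the names

with open("matches.csv", "w") as matches:
    for movie in movies:
        for name in names:
            if name.lower() in movie.lower():
                matches.write(movie)
                break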
This works ....
Movies="""Independence Day
Who Framed Roger Rabbit
Rosemary's Baby
Ghostbusters
There's Something About Mary
"""
Names="""Roger
Kyle
Mary
Sam"""
with StringIO(Movies) as movie_file:
    movies = [n.strip().lower() for n in movie_file.readlines()]

with StringIO(Names) as name_file:
    names = [n.strip().lower() for n in name_file.readlines()]

for name in names:
    for film in movies:
        if film.find(name) != -1:  # 'is not -1' would compare identity, not value
            print("{:20s} {:40s}".format(name, film))
Output:
roger who framed roger rabbit
mary rosemary's baby
mary there's something about mary
I have a txt file, single COLUMN, taken from excel, of the following type:
AMANDA (LOUDLY SPEAKING)
JEFF
STEVEN (TEASINGLY)
AMANDA
DOC BRIAN GREEN
As output I want:
AMANDA
JEFF
STEVEN
AMANDA
DOC BRIAN GREEN
I tried with a for loop over the whole column and then:
if (str[i] == '('):
    return str.split('(')
but it's clearly not working.
Do you have any possible solution? I would then need an output file as my original txt, so with each name for each line in a single column.
Thanks everyone!
(I am using PyCharm 3.2)
I'd use a regex in this situation: . matches any character, *? matches zero or more of them non-greedily, and the escaped \( and \) anchor the match between literal parentheses.
import re

# sample line content: "AMANDA (LOUDLY) JEFF STEVEN (TEASINGLY) AMANDA"
with open("mytext.txt", "r") as fi, open("out.txt", "w") as fo:
    for line in fi:
        fo.write(re.sub(r"\(.*?\)", "", line))
You can split the string into a list using a regular expression that matches either everything in parentheses or a full word, drop the elements that contain parentheses, and then join the list back into a string. The advantage is that there will be no double spaces in the result string where a word in parentheses was removed.
import re

text = "AMANDA (LOUDLY SPEAKING) JEFF STEVEN (TEASINGLY) AMANDA DOC BRIAN GREEN"
words = re.findall(r"\(.*?\)|[^\s]+", text)
print " ".join([x for x in words if "(" not in x])
I have a .txt file (scraped as pre-formatted text from a website) where the data looks like this:
B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON
ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS
I'd like to remove all the extra spaces between the columns (they're actually varying numbers of spaces, not tabs) and replace them with some delimiter (tab or pipe, since there are commas within the data), like so:
ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
Looked around and found that the best options are using regex or shlex to split. Two similar scenarios:
Python Regular expression must strip whitespace except between quotes,
Remove white spaces from dict : Python.
You can apply the regex '\s{2,}' (two or more whitespace characters) to each line and substitute the matches with a single '|' character.
>>> import re
>>> line = 'ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS '
>>> re.sub('\s{2,}', '|', line.strip())
'ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS'
Stripping any leading and trailing whitespace from the line before applying re.sub ensures that you won't get '|' characters at the start and end of the line.
Your actual code should look similar to this:
import re
with open(filename) as f:
    for line in f:
        subbed = re.sub(r'\s{2,}', '|', line.strip())
        # do something here
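One way to fill in the placeholder is to write the delimited rows straight to a new file (the file names here are assumptions):

import re

# input/output names are placeholders for your actual files
with open("scraped.txt") as f, open("delimited.txt", "w") as out:
    for line in f:
        subbed = re.sub(r'\s{2,}', '|', line.strip())
        out.write(subbed + "\n")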
What about this?
your_string = 'ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS'
print re.sub(r'\s{2,}','|',your_string.strip())
Output:
ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
Explanation:
I've used re.sub(), which takes 3 parameters: a pattern, the replacement string, and the string you want to work on.
The pattern matches runs of at least two whitespace characters, each of which is replaced with a |, applied to your string.
s = """B, NICKOLAS CT144531X D1026 JUDGE ANNIE WHITE JOHNSON
ANDREWS VS BALL JA-15-0050 D0015 JUDGE EDWARD A ROBERTS
"""
# Update
re.sub(r"(\S)\ {2,}(\S)(\n?)", r"\1|\2\3", s)
In [71]: print re.sub(r"(\S)\ {2,}(\S)(\n?)", r"\1|\2\3", s)
B, NICKOLAS|CT144531X|D1026|JUDGE ANNIE WHITE JOHNSON
ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
Considering there are at least two spaces separating the columns, you can use this:
lines = [
    'B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON ',
    'ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS '
]

for line in lines:
    parts = []
    for part in line.split('  '):
        part = part.strip()
        if part:  # checking if stripped part is a non-empty string
            parts.append(part)
    print('|'.join(parts))
Output for your input:
B, NICKOLAS|CT144531X|D1026|JUDGE ANNIE WHITE JOHNSON
ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
It looks like your data is in a "text-table" format.
I recommend using the first row to figure out the start point and length of each column (either by hand or write a script with regex to determine the likely columns), then writing a script to iterate the rows of the file, slice the row into column segments, and apply strip to each segment.
If you use a regex, you must keep track of the number of columns and raise an error if any given row has more than the expected number of columns (or a different number than the rest). Splitting on two-or-more spaces will break if a column's value itself contains two or more spaces, which is not only possible but likely. Text-tables like this aren't designed to be split with a regex; they're designed to be split on the column index positions.
In terms of saving the data, you can use the csv module to write/read into a csv file. That will let you handle quoting and escaping characters better than specifying a delimiter. If one of your columns has a | character as a value, unless you're encoding the data with a strategy that handles escapes or quoted literals, your output will break on read.
Parsing the text above would look something like this (I nested a list comprehension with brackets instead of the traditional format so it's easier to understand):
cols = ((0, 34),
        (34, 50),
        (50, 59),
        (59, None),
        )

for line in lines:
    cleaned = [i.strip() for i in [line[s:e] for (s, e) in cols]]
    print cleaned
then you can write it with something like:
import csv

with open('output.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='|',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for line in lines:
        spamwriter.writerow([line[col_start:col_end].strip()
                             for (col_start, col_end) in cols
                             ])
Looks like this library can solve this quite nicely:
http://docs.astropy.org/en/stable/io/ascii/fixed_width_gallery.html#fixed-width-gallery
Impressive...
Here is my dilemma: I'm writing an application in Python that will allow me to search a flat file (KJV bible.txt) for particular strings and return the line number, book, and string searched for. However, I would also like to return the chapter and verse in which the string was found, which calls for going to the beginning of the line and getting the chapter and verse number. I'm a Python neophyte and am currently still reading through the Python tutorial by Guido van Rossum. This is something I'm trying to accomplish for a Bible study group; something portable that can be run with the cmd module almost anywhere. I appreciate any help. Thanks. Below is an excerpt from an example of a Bible chapter:
Daniel
1:1 In the third year of the reign of Jehoiakim king of Judah came
Nebuchadnezzar king of Babylon unto Jerusalem, and besieged it.
Say I searched for 'Jehoiakim' and one of the search results was the first line above. I would like to go to the numbers that precede this line (in this case 1:1) and get the chapter (1) and verse (1) and print them to the screen.
1:2 And the Lord gave Jehoiakim king of Judah into his hand, with part
of the vessels of the house of God: which he carried into the land of
Shinar to the house of his god; and he brought the vessels into the
treasure house of his god.
Code:
import os
import sys
import re
word_search = raw_input(r'Enter a word to search: ')
book = open("KJV.txt", "r")
first_lines = {36: 'Genesis', 4812: 'Exodus', 8867: 'Leviticus', 11749: 'Numbers', 15718: 'Deuteronomy',
               18909: 'Joshua', 21070: 'Judges', 23340: 'Ruth', 23651: 'I Samuel', 26641: 'II Samuel',
               29094: 'I Kings', 31990: 'II Kings', 34706: 'I Chronicles', 37378: 'II Chronicles',
               40502: 'Ezra', 41418: 'Nehemiah', 42710: 'Esther', 43352: 'Job', 45937: 'Psalms', 53537: 'Proverbs',
               56015: 'Ecclesiastes', 56711: 'The Song of Solomon', 57076: 'Isaih', 61550: 'Jeremiah',
               66480: 'Lamentations', 66961: 'Ezekiel', 71548: 'Daniel'}

for ln, line in enumerate(book):
    if word_search in line:
        first_line = max(l for l in first_lines if l < ln)
        bibook = first_lines[first_line]
        template = "\nLine: {0}\nString: {1}\nBook: {2}\n"
        output = template.format(ln, line, bibook)
        print output
Do a single split on whitespace, then split on :.
passage, text = line.split(None, 1)
chapter, verse = passage.split(':')
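A sketch of how this could slot into the loop from the question, remembering the most recent chapter:verse marker so that matches on continuation lines still report the right verse (the isdigit check and the last_* names are my additions):

last_chapter = last_verse = None
for ln, line in enumerate(book):
    parts = line.split(None, 1)
    # Verse lines start with a marker like "1:1"; remember the most recent one.
    if parts and ':' in parts[0] and parts[0][0].isdigit():
        last_chapter, last_verse = parts[0].split(':', 1)
    if word_search in line:
        first_line = max(l for l in first_lines if l < ln)
        bibook = first_lines[first_line]
        print("Line {0}: {1} {2}:{3}".format(ln, bibook, last_chapter, last_verse))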
Use a regular expression: r'(\d+):(\d+)'
After finding a match (match = re.match(r'(\d+):(\d+)', line)), you can find the chapter in group 1 (chapter = match.group(1)) and the verse in group 2.
Use this code:
for ln, line in enumerate(book):
    match = re.match(r'(\d+):(\d+)', line)
    if match:
        chapter, verse = match.group(1), match.group(2)
    if word_search in line:
        ...
        print 'Book %s %s:%s ...%s...' % (bibook, chapter, verse, line)