Related
Following up on Python to replace a symbol between between 2 words in a quote
Extended input and expected output:
I am trying to replace the comma between the two words Durango and PC in the second line with & and then remove the quotes (") as well. The same goes for the third line with Orbis and PC. The 4th line has two quoted word combos that I would like to process: "AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC"
I would like to retain the rest of the lines using Python.
INPUT
2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened
3,SIN-Audio,AAA - Audio,"Orbis, PC",13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,"AAA - Character Tech, SOF - UPIs","Durango, Orbis, PC",29,Waiting For
...
...
...
Like these, there can be 100 lines in my sample. So the expected output is:
2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened
2,Kenny Chong,Core Tech - Rendering, Durango & PC,55,Reopened
3,SIN-Audio,AAA - Audio, Orbis & PC,13,Open
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango, Orbis & PC,29,Waiting For
...
...
...
So far, I could think of reading line by line and then if the line contains quote replace it with no character but then replacement of symbol inside is something I am stuck with.
Here is what I have right now:
for line in lines:
    expr2 = re.findall('"(.*?)"', line)
    if len(expr2)!=0:
        expr3 = re.split('"',line)
        expr4 = expr3[0]+expr3[1].replace(","," &")+expr3[2]
        print >>k, expr4
    else:
        print >>k, line
But it does not handle the case in the 4th line. There can be more than 3 combos as well, e.g.:
3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open
and wish to make this
3,SIN-Audio,AAA - Audio & xxxx & yyyy, Orbis & PC, 13 & 22,Open
How to achieve this, any suggestion? Learning Python.
So, by treating the input file as a .csv we can easily turn the lines into something easy to work with.
For example,
2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened
is read as:
['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango, PC', '55', 'Reopened']
Then, by replacing all instances of , with ' &' (a space followed by &) we would have the line:
['2', 'Kenny Chong', 'Core Tech - Rendering', 'Durango & PC', '55', 'Reopened']
This replaces every comma inside a field, however many there are, and because the fields are written back without quoting, the original double quotes are gone.
Here is the code, given that in.txt is your input file and it will write to out.txt.
import csv
with open('in.txt') as infile:
    reader = csv.reader(infile)
    with open('out.txt', 'w') as outfile:
        for line in reader:
            line = list(map(lambda s: s.replace(',', ' &'), line))
            outfile.write(','.join(line) + '\n')
The fourth line is outputted as:
LTY-168499,[PC][PS4][XB1] Missing textures from Fort Capture NPC face,3,CTU-CharacterTechBacklog,AAA - Character Tech & SOF - UPIs,Durango & Orbis & PC,29,Waiting For
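To check the csv approach without touching any files, here is a minimal sketch that runs the same idea over an in-memory string. The helper name ampersand_fields and the use of io.StringIO are assumptions made for illustration; they are not part of the answer above.

```python
import csv
import io

def ampersand_fields(text):
    """Parse CSV text and return lines with in-field commas turned into ' &'."""
    out_lines = []
    for row in csv.reader(io.StringIO(text)):
        # csv.reader has already stripped the quotes, so any comma that is
        # still inside a field was part of a quoted value in the input.
        out_lines.append(','.join(field.replace(',', ' &') for field in row))
    return out_lines

sample = (
    '2,SIN-Rendering,Core Tech - Rendering,PC,147,Reopened\n'
    '2,Kenny Chong,Core Tech - Rendering,"Durango, PC",55,Reopened\n'
    '3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open\n'
)
result = ampersand_fields(sample)
for line in result:
    print(line)
```

Lines without quoted fields pass through unchanged, and a line with several quoted fields is handled the same way as a line with one.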
Please check this: I could not find a single expression that does this, so I did it in a somewhat elaborate way. I will update if I find a better way (Python 3).
import re
st = "3,SIN-Audio,\"AAA - Audio, xxxx, yyyy\",\"Orbis, PC\",\"13, 22\",Open"
found = re.findall(r'\"(.*)\"',st)[0].split("\",\"")
final = ""
for word in found:
    final = final + (" &").join(word.split(","))+","
result = re.sub(r'\"(.*)\"',final[:-1],st)
print(result)
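One possible single-expression variant is re.sub with a function as the replacement, so each quoted run is rewritten on its own. This is only a sketch: the name unquote_fields is made up here, and it assumes that quoted fields never contain escaped double quotes.

```python
import re

def unquote_fields(line):
    # For every "..." run, keep the contents, drop the quotes,
    # and turn the commas inside it into ' &'.
    return re.sub(r'"([^"]*)"',
                  lambda m: m.group(1).replace(',', ' &'),
                  line)

st = '3,SIN-Audio,"AAA - Audio, xxxx, yyyy","Orbis, PC","13, 22",Open'
result = unquote_fields(st)
print(result)
```

Commas outside the quotes are never touched, because the pattern only matches text between a pair of double quotes.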
I am trying to read a file formatted like the one below. There are two '\n' characters between every line.
Great tool for healing your life--if you are ready to change your beliefs!<br /><a href="http
Bought this book for a friend. I read it years ago and it is one of those books you keep forever. Love it!
I read this book many years ago and have heard Louise Hay speak a couple of times. It is a valuable read...
I am using the Python code below to read the lines and convert them into a DataFrame:
open_reviews = open("C:\\Downloads\\review_short.txt","r",encoding="Latin-1" ).read()
documents = []
for r in open_reviews.split('\n\n'):
    documents.append(r)
df = pd.DataFrame(documents)
print(df.head())
The output I am getting is as below:
0 I was very inspired by Louise's Hay approach t...
1 \n You Can Heal Your Life by
2 \n I had an older version
3 \n I love Louise Hay and
4 \n I thought the book was exellent
Since I split on two newlines (\n\n), a \n ends up at the beginning of each line. Is there any other way to handle this, so that I get the output below?
0 I was very inspired by Louise's Hay approach t...
1 You Can Heal Your Life by
2 I had an older version
3 I love Louise Hay and
4 I thought the book was exellent
This appends every non-blank line.
filename = "..."
lines = []
with open(filename) as f:
    for line in f:
        line = line.strip()
        if line:
            lines.append(line)
>>> lines
['Great tool for healing your life--if you are ready to change your beliefs!<br /><a href="http',
'Bought this book for a friend. I read it years ago and it is one of those books you keep forever. Love it!',
'I read this book many years ago and have heard Louise Hay speak a couple of times. It is a valuable read...']
lines = pd.DataFrame(lines, columns=['my_text'])
>>> lines
my_text
0 Great tool for healing your life--if you are r...
1 Bought this book for a friend. I read it years...
2 I read this book many years ago and have heard...
Try using the .strip() method. It removes any whitespace characters, including newlines, from the beginning and end of a string.
You can use it like this:
for r in open_reviews.split('\n\n'):
    documents.append(r.strip())
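A small sketch of the same idea on an in-memory string, so the effect is visible without the review file. The sample text is invented for illustration; it mimics chunks that start with a stray newline and a space.

```python
# Invented sample: three reviews, the later ones preceded by "\n ".
raw = (
    "First review.\n\n"
    "\n Second review.\n\n"
    "\n Third review.\n"
)

# strip() removes the leading "\n " and trailing "\n" from each chunk;
# the condition drops chunks that are empty after stripping.
documents = [chunk.strip() for chunk in raw.split('\n\n') if chunk.strip()]
print(documents)
```

Filtering out empty chunks also guards against an empty entry when the file ends with blank lines.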
Use readlines() and clean the line with strip().
filename = "C:\\Downloads\\review_short.txt"
open_reviews = open(filename, "r", encoding="Latin-1")
documents = []
for r in open_reviews.readlines():
    r = r.strip()  # clean spaces and \n
    if r:
        documents.append(r)
I have a file X_true that consists of sentences like these:
evid emerg interview show done deal
munich hamburg train crash wednesday first gener ice model power two electr power locomot capac 759 passeng
one report earlier week said older two boy upset girlfriend broken polic confirm
jordan previous said
Now instead of storing these sentences in a file, I wish to put them in an array(List of strings) to work with them throughout the code. So the array would look something like this:
['evid emerg interview show done deal',
'munich hamburg train crash wednesday first gener ice model power two electr power locomot capac 759 passeng',
'one report earlier week said older two boy upset girlfriend broken polic confirm',
'jordan previous said']
Earlier when working with the file, this was the code I was using:
def run(command):
    output = subprocess.check_output(command, shell=True)
    return output
row = run('cat '+'/Users/mink/X_true.txt'+" | wc -l").split()[0]
Now that I am working with X_true as an array, how can I write an equivalent statement for the row assignment above?
Use len(X_true_array), where X_true_array is the list that holds your file's contents.
Before, you used wc -l to get the line count of the file; here the same count is simply the number of items in the list.
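As a minimal sketch, using the four sample sentences from the question as the list contents:

```python
X_true_array = [
    'evid emerg interview show done deal',
    'munich hamburg train crash wednesday first gener ice model power two electr power locomot capac 759 passeng',
    'one report earlier week said older two boy upset girlfriend broken polic confirm',
    'jordan previous said',
]

# One list item per line, so the item count is the line count.
row = len(X_true_array)
print(row)
```

One difference to keep in mind: len() returns an int, whereas the subprocess version returned the count as text (bytes in Python 3), so any later int(row) conversion is no longer needed.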
So I understand this correctly, you just want to read in a file and store each line as an element of an array?
X_true = []
with open("X_true.txt") as f:
    for line in f:
        X_true.append(line.strip())
Another option (thanks @roeland):
with open("X_true.txt") as f:
    X_true = list(map(str.strip, f))
with open("X_true.txt") as f:
    X_true = f.readlines()
or with stripping the newline character:
X_true = [line.rstrip('\n') for line in open("X_true.txt")]
Refer to Input and Output:
Try this:
Using readlines
X_true = open("x_true.txt").readlines()
Using read:
X_true = open("x_true.txt").read().split("\n")
Using List comprehension:
X_true = [line.rstrip() for line in open("x_true.txt")]
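These three options do not return quite the same list. The sketch below compares them on an in-memory string instead of x_true.txt (the io.StringIO stand-in is an assumption for illustration), and adds str.splitlines() as a further alternative.

```python
import io

text = "evid emerg interview show done deal\njordan previous said\n"

# readlines() keeps the trailing newline on every item.
with_newlines = io.StringIO(text).readlines()

# split("\n") drops the newlines but leaves an empty string at the end
# when the text finishes with a newline.
split_result = text.split("\n")

# splitlines() drops the newlines without the empty trailing item.
splitlines_result = text.splitlines()

# The list comprehension strips each line as it is read.
stripped = [line.rstrip() for line in io.StringIO(text)]

print(with_newlines, split_result, splitlines_result, stripped, sep="\n")
```

If the list length is later used as a line count, the extra empty string from split("\n") would make it one too large.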
with open("X_true.txt") as f:
    array_of_lines = f.readlines()
array_of_lines will look like your example above. Note: it will still have the newline characters at the end of each string in the array. Those can be removed with string.strip() if they're a concern.
I have a text file which stores data like name : score e.g.:
bob : 10
fred : 3
george : 5
However, I want to make it so it says
10 : bob
3 : fred
5 : george
What would the code be to flip it like that?
Would I need to separate them first by removing the colon? I have managed that with this code:
file = open("Class 3.txt", "r")
t4 = (file.read())
test =''.join(t4.split(':')[0:10])
print (test)
How would I finish it and make it say the reverse?
This code handles fractional scores (e.g. 9.5), and doesn't care whether there are extra spaces around the : delimiter. It should be much easier to maintain than your current code.
Class 3.txt:
bob : 10
fred : 3
george : 5
Code:
class_num = input('Which class (1, 2, or 3)? ')
score_sort = input('Sort by name or score? ').lower().startswith('s')
with open("Class " + class_num + ".txt", "r") as f:
    scores = {name.strip():float(score) for
              name,score in (line.strip().split(':') for line in f)}
if score_sort:
    for name in sorted(scores, key=scores.get, reverse=True):
        print(scores.get(name), ':', name)
else:
    for name in sorted(scores):
        print(name, ':', scores.get(name))
Input:
3
scores
Output:
10.0 : bob
5.0 : george
3.0 : fred
Input:
3
name
Output:
bob : 10.0
fred : 3.0
george : 5.0
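Because the scores are stored as floats, whole numbers print as 10.0 rather than 10. If that matters, one option is the 'g' format specifier, which drops the trailing .0. The sketch below is an illustration on in-memory lines rather than the file, and the variable names are assumptions.

```python
lines = ["bob : 10\n", "fred : 3\n", "george : 5\n", "amy : 9.5\n"]

scores = {}
for line in lines:
    name, score = line.strip().split(':')
    scores[name.strip()] = float(score)  # float() tolerates the leading space

# Highest score first; ':g' prints 10.0 as 10 but keeps 9.5 as 9.5.
by_score = [f"{scores[name]:g} : {name}"
            for name in sorted(scores, key=scores.get, reverse=True)]
for entry in by_score:
    print(entry)
```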
First, this is going to be a lot harder to do whole-file-at-once than line-at-a-time.
But, either way, you obviously can't just split(':') and then ''.join(…). All that's going to do is replace colons with nothing. You obviously need ':'.join(…) to put the colons back in.
And meanwhile, you have to swap the values around on each side of each colon.
So, here's a function that takes just one line, and swaps the sides:
def swap_sides(line):
    left, right = line.split(':')
    return ':'.join((right, left))
But you'll notice there's a few problems here. The left has a space before the colon; the right has a space after the colon, and a newline at the end. How are you going to deal with that?
The simplest way is to just strip out all the whitespace on both sides, then add back in the whitespace you want:
def swap_sides(line):
    left, right = line.split(':')
    return ':'.join((right.strip() + ' ', ' ' + left.strip())) + '\n'
But a smarter idea is to treat the space around the colon as part of the delimiter. (The newline, you'll still need to handle manually.)
def swap_sides(line):
    left, right = line.strip().split(' : ')
    return ' : '.join((right.strip(), left.strip())) + '\n'
But if you think about it, do you really need to add the newline back on? If you're just going to pass it to print, the answer is obviously no. So:
def swap_sides(line):
    left, right = line.strip().split(' : ')
    return ' : '.join((right.strip(), left.strip()))
Anyway, once you're happy with this function, you just write a loop that calls it once for each line. For example:
with open("Class 3.txt", "r") as file:
    for line in file:
        swapped_line = swap_sides(line)
        print(swapped_line)
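The versions above raise a ValueError on a blank line or a line without the delimiter, because the two-name unpacking fails. A more forgiving sketch, assuming such lines should simply be skipped, could use str.partition. The name swap_sides_lenient and the sample lines are made up for illustration.

```python
def swap_sides_lenient(line):
    """Return 'right : left' for a 'left : right' line, or None if no colon."""
    left, sep, right = line.partition(':')
    if not sep:
        # No colon found (for example a blank line): nothing to swap.
        return None
    return ' : '.join((right.strip(), left.strip()))

sample_lines = ["bob : 10\n", "fred:3\n", "\n", "george  :  5\n"]
swapped = [s for s in map(swap_sides_lenient, sample_lines) if s is not None]
for s in swapped:
    print(s)
```

Splitting on the bare colon and stripping both sides means the amount of space around the colon in the input no longer matters.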
Let's learn how to reverse a single line:
line = 'bob : 10'
line.partition(' : ')  # ('bob', ' : ', '10')
''.join(reversed(line.partition(' : ')))  # '10 : bob'
Now, combine with reading lines from a file:
for line in open('Class 3.txt').read().splitlines():
    print(''.join(reversed(line.partition(' : '))))
Update
I am re-writing the code to read the file, line by line:
with open('Class 3.txt') as input_file:
    for line in input_file:
        line = line.strip()
        print(''.join(reversed(line.partition(' : '))))
When using Python's textwrap library, how can I turn this:
short line,
long line xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
into this:
short line,
long line xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
I tried:
w = textwrap.TextWrapper(width=90,break_long_words=False)
body = '\n'.join(w.wrap(body))
But I get:
short line, long line xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(spacing not exact in my examples)
try
w = textwrap.TextWrapper(width=90,break_long_words=False,replace_whitespace=False)
that seemed to fix the problem for me
I worked that out from what I read here (I've never used textwrap before)
body = '\n'.join(['\n'.join(textwrap.wrap(line, 90,
                 break_long_words=False, replace_whitespace=False))
                 for line in body.splitlines() if line.strip() != ''])
How about wrapping only lines longer than 90 characters?
new_body = ""
lines = body.split("\n")
for line in lines:
    if len(line) > 90:
        w = textwrap.TextWrapper(width=90, break_long_words=False)
        line = '\n'.join(w.wrap(line))
    new_body += line + "\n"
TextWrapper is not designed to handle text that already has newlines in it.
There are two things you may want to do when your document already has newlines:
1) Keep old newlines, and only wrap lines that are longer than the limit.
You can subclass TextWrapper as follows:
class DocumentWrapper(textwrap.TextWrapper):
    def wrap(self, text):
        split_text = text.split('\n')
        lines = [line for para in split_text for line in textwrap.TextWrapper.wrap(self, para)]
        return lines
Then use it the same way as textwrap:
d = DocumentWrapper(width=90)
wrapped_str = d.fill(original_str)
Gives you:
short line,
long line xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx
2) Remove the old newlines and wrap everything.
original_str = original_str.replace('\n', ' ')  # str.replace returns a new string, so keep the result
wrapped_str = textwrap.fill(original_str, width=90)
Gives you
short line, long line xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(TextWrapper doesn't do either of these - it just ignores the existing newlines, which leads to a weirdly formatted result)
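The two strategies can also be sketched as plain functions, without subclassing. The function names, the sample text, and the width of 50 are assumptions chosen for illustration. Note that textwrap.wrap('') returns an empty list, so the first function puts blank lines back explicitly.

```python
import textwrap

def wrap_keep_newlines(text, width=90):
    """Wrap each existing line on its own so the original breaks survive."""
    wrapped = []
    for line in text.split('\n'):
        # An empty line wraps to [], so substitute [''] to keep it.
        wrapped.extend(textwrap.wrap(line, width=width) or [''])
    return '\n'.join(wrapped)

def wrap_all(text, width=90):
    """Treat the whole text as one paragraph; old newlines become spaces."""
    return textwrap.fill(text, width=width)

original_str = "short line,\n" + " ".join(["word"] * 40)
kept = wrap_keep_newlines(original_str, width=50)
merged = wrap_all(original_str, width=50)
print(kept)
print('---')
print(merged)
```

In the first result "short line," stays on a line of its own; in the second it is joined with the text that follows, because textwrap replaces whitespace characters (including newlines) with spaces by default.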
It looks like it doesn't support that. This code will extend it to do what I need though:
http://code.activestate.com/recipes/358228/
lines = text.split("\n")
lists = (textwrap.TextWrapper(width=90,break_long_words=False).wrap(line) for line in lines)
body = "\n".join("\n".join(list) for list in lists)
Here is a little module that can wrap text, break lines, handle extra indents (e.g. a bulleted list), and replace characters/words with markdown!
class TextWrap_Test:
    def __init__(self):
        self.Replace={'Sphagnum':'$Sphagnum$','Equisetum':'$Equisetum$','Carex':'$Carex$',
                      'Salix':'$Salix$','Eriophorum':'$Eriophorum$'}
    def Wrap(self,Text_to_fromat,Width):
        Text = []
        for line in Text_to_fromat.splitlines():
            if line[0]=='-':
                wrapped_line = textwrap.fill(line,Width,subsequent_indent=' ')
            if line[0]=='*':
                wrapped_line = textwrap.fill(line,Width,initial_indent=' ',subsequent_indent=' ')
            Text.append(wrapped_line)
        Text = '\n\n'.join(text for text in Text)
        for rep in self.Replace:
            Text = Text.replace(rep,self.Replace[rep])
        return(Text)
Par1 = "- Fish Island is a low center polygonal peatland on the transition"+\
" between the Mackenzie River Delta and the Tuktoyaktuk Coastal Plain.\n* It"+\
" is underlain by continuous permafrost, peat deposits exceede the annual"+\
" thaw depth.\n* Sphagnum dominates the polygon centers with a caonpy of Equisetum and sparse"+\
" Carex. Dwarf Salix grows allong the polygon rims. Eriophorum and carex fill collapsed ice wedges."
TW=TextWrap_Test()
print(TW.Wrap(Par1,Text_W))
Will output:
Fish Island is a low center polygonal peatland on the
transition between the Mackenzie River Delta and the
Tuktoyaktuk Coastal Plain.
It is underlain by continuous permafrost, peat
deposits exceede the annual thaw depth.
$Sphagnum$ dominates the polygon centers with a
caonpy of $Equisetum$ and sparse $Carex$. Dwarf $Salix$
grows allong the polygon rims. $Eriophorum$ and
carex fill collapsed ice wedges.
Characters between the $$ would be in italics if you were working in matplotlib for instance, but the $$ won't count towards the line spacing since they are added after!
So if you did:
fig,ax = plt.subplots(1,1,figsize = (10,7))
ax.text(.05,.9,TW.Wrap(Par1,Text_W),fontsize = 18,verticalalignment='top')
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
You'd get:
I had a similar problem formatting dynamically generated docstrings. I wanted to preserve the newlines put in place by hand and split any lines over a certain length. Reworking the answer by @far a bit, this solution worked for me. I only include it here for posterity:
import textwrap
wrapArgs = {'width': 90, 'break_long_words': True, 'replace_whitespace': False}
fold = lambda line, wrapArgs: textwrap.fill(line, **wrapArgs)
body = '\n'.join([fold(line, wrapArgs) for line in body.splitlines()])
Split, wrap+join, and rejoin:
import textwrap
def wrap(s):
    return "\n".join("\n".join(textwrap.wrap(x)) for x in s.splitlines())