Using the Join command to eliminate extra paragraph breaks

So I have this text:
'
Location
Address
Number
Website
'
The top and bottom lines are actually empty; the single quotes are not part of the text and only mark where those two lines are. I want each line to follow the previous one with no blank lines in between. This is what I would like it to look like:
Location
Address
Number
Website
I want to strip all of the line breaks and just have each result one line after another. This is the code to scrape the information from a webpage.
results = soup.findAll('div', class_='name')
for each in results:
    worksheet.write(row, 1, each.text)
    row += 1
Each time I run through this, I want the results to print one line after another. Thanks.

Is there a reason you cannot use a simple if?
results = soup.findAll('div', class_='name')
for each in results:
    if each.text:
        worksheet.write(row, 1, each.text)
        row += 1

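To join the results with a line break, join the text of each result (the results themselves are tag objects, not strings):
'\n'.join(each.text for each in results)
To join with newlines and also collapse any newlines already present, use:
import re
line = re.sub(r"\n+", "\n", '\n'.join(each.text for each in results))
This is useful if you don't know how many newlines there are between the pieces of text; it reduces every run of newlines to a single one.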
Also the answer given by Malvolio is to avoid the blank line while writing:
if each.text:
This line checks whether the element (each, in your case) has any text; if it doesn't, the statements below it are skipped.
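If the blank lines sit inside the text of a single element rather than in separate elements, another option is to split the text into lines, drop the empty ones, and join the rest. A minimal sketch, assuming a plain string stands in for each.text; the helper name drop_blank_lines is made up for illustration:

```python
def drop_blank_lines(text):
    """Return text with empty and whitespace-only lines removed."""
    kept = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n".join(kept)


# Sample text shaped like the one in the question (blank first and last lines).
raw = "\nLocation\nAddress\n\nNumber\nWebsite\n"
print(drop_blank_lines(raw))
```

The cleaned string could then be passed to worksheet.write in place of each.text.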

Related

Removing blank item from List not working when referencing it in a query

I have a file that contains tab-delimited data. I pull this data row by row and split it, then use it in a SQL query to insert into a table. Unfortunately some of these files have a trailing tab (a tab after the last column), which is of course interpreted as another column. When this happens I get an error saying SQL expected 16 parameters and got 17.
I am running Python 3. I have tried using list comprehensions and filters, but it's not working.
....
for line in islice(OutputFile, int(Quantity)):
    data = line.split(" ")
    # The following line removes ALL values in the list that are empty. This is for the case where there is a trailing tab in the data
    data[:] = [val for val in data if val]
    query = ("INSERT INTO [smthn].[dbo].smthn] (Country,ChargingType,OrderNumber,foo,bar,foo,bar,foo,bar,foo,bar,foo,bar,foo,Date,bar) values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)")
    cursor.execute(query, int(CountryCode), ChargingType, OrderNumber, *data, str(datetime.datetime.now().strftime("%Y-%m-%d")), data[1][:18])
N.B. cursor.execute is part of the for loop.
For the sake of company privacy I changed a lot of the names above. If something is off with the spacing or naming, please take me at face value that it is correct in the actual code and that I just missed it here when I changed things.
I expect the query to receive only 16 parameters (since I remove the blank item from the list) so that the insert query can be executed successfully.
As stated above, the error message is "The SQL contains 16 parameter markers, but 17 parameters were supplied".

You can get rid of trailing tabs by replacing data= line.split(" ") by:
data= line.strip().split(" ")

I can't actually see the data. But if it's really tabbed it could be better to split on a tab.
data = line.strip().split('\t')
or if you want a list comprehension:
data = [i for i in line.strip().split() if i] # no need to specify " " or '\t'

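Use this to remove all trailing tab characters (the r in rstrip stands for "right"). A line read from a file still ends with a newline, so strip that as well, otherwise the tab before it is not at the end of the string:
line.rstrip("\t\n").split("\t")
Or, without rstrip, you could slice the list down to the number of columns you expect, for example data[:n] where n is that column count. Note that the [:18] in your code slices the string data[1], not the list.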
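The list comprehension in the question probably fails because the last field is not actually empty: a line that ends with a tab followed by a newline leaves "\n" as the final item, and "\n" is truthy. A small sketch with made-up sample data, assuming the fields are tab-separated:

```python
line = "a\tb\tc\t\n"  # hypothetical row with a trailing tab

# Splitting first leaves the newline as a non-empty last item,
# so filtering on truthiness does not remove it.
unfiltered = [val for val in line.split("\t") if val]
print(unfiltered)

# Stripping trailing tabs and newlines before splitting avoids the extra item.
cleaned = line.rstrip("\t\r\n").split("\t")
print(cleaned)
```

Unlike filtering out empty values, stripping only the right-hand end keeps empty columns in the middle of a row, so the remaining values stay aligned with the parameter markers.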

Formatting csv file to utilize zabbix sender

I have a csv file that I need to format before I can send the data to Zabbix, but I have run into a problem with one of the csv files I have to use.
This is an example of the part of the file that gives me trouble:
Some_String_That_ineed_01[ifcontainsCell01removehere],xxxxxxxxx02
Some_String_That_ineed_01vtcrmp01[ifcontainsCell01removehere]-[1],aass
These are two lines from the file; the other lines I have already handled.
I need to check whether Cell01 is in the line:
if 'Cell01' in h: do something.
I need to remove everything between the [ ] that contains the word Cell01, including the brackets themselves, and leave only this:
Some_String_That_ineed_01,xxxxxxxxx02
Some_String_That_ineed_01vtcrmp01-[1],aass
Everything else my script already handles quite easily. There must be a better way than what I have in mind, which is to split h at the first [, split again at the comma, remove the content I don't want, and then concatenate the strings that are left. I can't simply use replace because I need to keep the [1] data.
Later on I will pass the desired result to zabbix sender as the item and item key_. I already have the hosts, timestamps and values.

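You should use a regular expression (the re module):
import re
s = "Some_String_That_ineed_01[ifcontainsCell01removehere],xxxxxxxxx02"
# [^\[\]]* keeps the match inside a single pair of brackets, so a later [1] is left alone
replaced = re.sub(r'\[[^\[\]]*Cell01[^\[\]]*\]', '', s)
print(replaced)
will return: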
[root#somwhere test]# python replace.py
Some_String_That_ineed_01,xxxxxxxxx02
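The regular expression approach can be checked against both sample lines from the question. This is a sketch that assumes the bracketed group never contains nested brackets; the name CELL01_GROUP is made up for illustration:

```python
import re

# Match one bracketed group that contains Cell01.
# [^\[\]]* cannot cross a bracket, so other groups such as [1] are untouched.
CELL01_GROUP = re.compile(r"\[[^\[\]]*Cell01[^\[\]]*\]")

lines = [
    "Some_String_That_ineed_01[ifcontainsCell01removehere],xxxxxxxxx02",
    "Some_String_That_ineed_01vtcrmp01[ifcontainsCell01removehere]-[1],aass",
]

cleaned = [CELL01_GROUP.sub("", h) for h in lines]
for h in cleaned:
    print(h)
```

A greedy pattern such as \[.*Cell01.*\] would run from the first [ to the last ] on the second line and remove -[1] as well, which is why the character class is used here.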

Python, return new line in function return

I have code like this:
celldata=""
count=0
for tableData in y:
    count = count+1
    strcount=str(count)
    celldata += strcount + ")" + tableData.text + "\n"
return celldata
I am returning the value for use in Flask. The problem is that I want each row from the for loop on a new line, but even with \n the Flask web app shows celldata on one single line, with one space between each row's output.
Here is my current output for celldata in flask web
1)xxxx 2)yyyy
I want the flask web url to return
1)xxxx
2)yyyy

You're presumably returning HTML, and viewing that HTML in a browser.
In HTML, all runs of whitespace are equivalent—there's no difference between '\n' and ' '. The browser should convert them all to single spaces, and then decide how to flow the results nicely.
So, you're going to have to learn some basic HTML. But here are a few quick hints to get you started:
<p>one paragraph</p> <p>another paragraph</p> defines two separate paragraphs.
<p>one paragraph<br />with a line break in the middle</p> defines a paragraph with a line break in the middle.
<table><tr><td>row one</td></tr> <tr><td>row two</td></tr></table> defines a table of two rows (and one column).
The last one is the most complicated, but given that you've got things named tableData and celldata, I suspect it may be what you actually want here.
HTML itself only specifies "structure", not layout. It's up to the browser to decide what "two paragraphs" or "a line break" or "two rows" actually means in terms of actual pixels. If you want finer control, you need to learn CSS as well as HTML, which lets you specify explicit styles for these elements.

If you are trying to format this as HTML, I would suggest you also add <br /> to the returned text:
celldata = []
for count, tableData in enumerate(y, start=1):
    celldata.append('{}) {}<br/>'.format(count, tableData.text))
return '\n'.join(celldata)
This first builds a list of entries with the correct numbering, and then joins the lines together with a newline. The newline is purely cosmetic and will only affect how the HTML appears when viewed as source. It is the <br /> that ensures each entry appears on a different line.
enumerate() is used to automatically count your entries for you.
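If the text can contain characters such as < or &, it may be worth escaping it before placing it in HTML. A minimal sketch of the same idea, assuming plain strings stand in for the tableData.text values; the function name numbered_html is hypothetical:

```python
from html import escape


def numbered_html(items):
    """Build '1) text<br>' lines from an iterable of plain strings."""
    rows = []
    for count, text in enumerate(items, start=1):
        # escape() turns <, > and & into entities so the browser shows them as text
        rows.append("{}) {}<br>".format(count, escape(text)))
    return "\n".join(rows)


print(numbered_html(["xxxx", "a < b"]))
```

In a Flask view, the returned string would be sent as the response body; the <br> tags, not the newlines, produce the visible line breaks.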

Python-How to stop code checking every line of a text file individually?

I'm working on a program that has a text file storing a list of 'Products'; a user should be able to enter the GTIN-8 code of the product they want and get a receipt for it.
I've made it so that when the user enters the GTIN-8 code, the program looks through the Products text file for a matching GTIN-8 code. When it finds one, it should take that line of the file and write it to another text file called 'Receipt' (I couldn't think of any other way to do this). It should also say 'Product found' or 'That product does not exist' depending on whether there is a match. However, my code checks every individual line of the file and prints, for every line, whether there is a match or not. I need the individual matching line so I can use the information in it, but I don't want a message for EVERY single line. I want something like an overview of the entire file that picks out just that specific line. I hope that makes sense.
with open("Productsfile.txt") as f:
    Found = False
    for line in f:
        if ProductsWanted in line:
            Receipt=open("ReceiptFile.txt","a")
            Receipt.write("%r x%r\n" % (line, AmountOfProducts))
            Receipt.close()
            print("Product found!")
            Found = True
        if not Found:
            print("That product does not exist")
(I deleted the screenshot/link and put it in as text, I am sorry if putting in a screenshot/link was the wrong thing to do)
Thank you!!!

How many products are there?
If there are fewer than, say, a million, consider reading the contents of the file into a dictionary, keyed on the GTIN-8 code.
(Does that make sense?)
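The dictionary idea might look like the sketch below. The file layout (GTIN-8 code first, fields separated by commas) and the sample data are assumptions, since the question does not show the file; io.StringIO stands in for the opened Products file:

```python
import io

# Hypothetical file contents: GTIN-8 code, product name, price.
sample = io.StringIO("12345670,Apple,0.50\n76543210,Bread,1.20\n")

# Read the file once and key every line on its GTIN-8 code.
products = {}
for line in sample:
    line = line.strip()
    if line:
        code = line.split(",")[0]
        products[code] = line


def find_product(code):
    """Return the stored line for a GTIN-8 code, or None if it is unknown."""
    return products.get(code)


print(find_product("76543210"))
print(find_product("00000000"))
```

After the one-time read, each lookup is a single dictionary access, and the "found" or "does not exist" message is printed once per request rather than once per line.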

If you are going to work with a smaller file, you can read the file's lines into a list and then check whether the line is in the list. Like this:
lines = [x.strip() for x in open("file.txt")]  # lines read from a file end with \n, so use strip to remove it
if product in lines:
    print("line no =", lines.index(product) + 1)
else:
    print("Product not found")
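If a substring match as in the question is wanted, the scan can also stop at the first hit and report the result once, after the loop. A sketch under the assumption that a list of strings stands in for the file; first_match and the sample data are made up for illustration:

```python
def first_match(lines, wanted):
    """Return the first line containing wanted, or None if no line matches."""
    for line in lines:
        if wanted in line:
            # Returning here ends the scan; later lines are never checked.
            return line.rstrip("\n")
    return None


stock = ["12345670 Apple\n", "76543210 Bread\n"]  # hypothetical file lines
match = first_match(stock, "76543210")
print("Product found!" if match is not None else "That product does not exist")
```

Because the message is printed outside the loop, it appears exactly once regardless of how many lines the file has.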

How do I preserve new lines when extracting text from html using lxml.text_content()

I am trying to learn to use Whoosh. I have a large collection of HTML documents I want to search. I discovered that the text_content() method creates some interesting problems. For example, I might have some text organized in a table that looks like
<html><table><tr><td>banana</td><td>republic</td></tr><tr><td>stateless</td><td>person</td></table></html>
When I take the original string, build the tree, and then use text_content() to get the text in the following manner
mytree = html.fromstring(myString)
text = mytree.text_content()
The results have no spaces (as should be expected)
'bananarepublicstatelessperson'
I tried to insert new lines using string.replace()
myString = myString.replace('</tr>','</tr>\n')
I confirmed that the new line was present
'<html><table><tr><td>banana</td><td>republic</td></tr>\n<tr><td>stateless</td><td>person</td></table></html>'
but when I run the same code as above, the line feeds are not present, so the resulting text_content() looks just like it did before.
This is a problem for me because I need to be able to separate words. I thought I could add non-breaking spaces after each td and line breaks after rows, as well as line breaks after body elements and so on, to get text that reasonably conforms to my original source.
I will note that I did some more testing and found that line breaks inserted after paragraph tag closes were preserved. But there is a lot of text in the tables that I need to be able to search.
Thanks for any assistance

You could use this solution:
import re
def striphtml(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)
>>> striphtml('I Want This <b>text!</b>')
'I Want This text!'
Found here: using python, Remove HTML tags/formatting from a string
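Removing tags outright still glues neighbouring cells together, so a variation is to replace tags with whitespace instead. This is only a sketch, not lxml's behaviour: it assumes regular expressions are good enough for the markup at hand (they are not a real HTML parser), and the name html_to_text is made up:

```python
import re


def html_to_text(data):
    """Replace row ends with newlines and every other tag with a space."""
    data = re.sub(r"</tr\s*>", "\n", data, flags=re.IGNORECASE)
    data = re.sub(r"<[^>]*>", " ", data)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in data.split("\n")]
    return "\n".join(line for line in lines if line)


sample = ("<html><table><tr><td>banana</td><td>republic</td></tr>"
          "<tr><td>stateless</td><td>person</td></table></html>")
print(html_to_text(sample))
```

With the table from the question this keeps the words in each row separated by spaces and puts each closed row on its own line.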
