Currently I am using:
for s in list:
    print(*s)
but it displays the list output as
['this is']
['the output']
But I would like the output to be displayed as
this is
the output
There should be a simple solution, but I have yet to come across one.
l = [['this is'], ['the output']]
for sub_list in l:
    print(sub_list[0])
list_string = ', '.join(list_name)
print(list_string)  # without brackets
Join using the newline character:
print("\n".join(your_list))
Please note that list is a Python type and shouldn't be used as a variable name.
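As a minimal sketch of how the join idea applies to the nested list from the question: each inner list is joined first, then the rows are joined with newlines. The helper name format_rows and its sep parameter are hypothetical, not taken from any of the answers.

```python
def format_rows(rows, sep=" "):
    """Join the items of each inner list with `sep`, one row per line."""
    # str(item) lets the helper accept non-string items as well.
    return "\n".join(sep.join(str(item) for item in row) for row in rows)


rows = [['this is'], ['the output']]
print(format_rows(rows))
```

Building the whole string first and printing once avoids a loop of print calls, though for short lists either approach should be fine.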
I am scraping data from a website and trying to build a table from it. However, the webpages can have several different structures, which is giving me trouble.
The two structures I have come across so far are:
text1 = ['\nPie\n Type\n\xa0\nMain\n Ingrediënt\n\xa0\nCountry\n of Origin', '\nApplie\n Pie\n\xa0\nApples\n\xa0\nUnited\n Kingdom']
and
text2 = ['\n\nPie Type\n\n\nMain Ingrediënt\n\n\nCountry of Origin\n\n', '\n\nApple Pie\n\n\nApples\n\n\nUnited Kingdom\n\n']
So far, using the following code:
dflist = []
for x in range(len(text1)):
    try:
        y = text1[x].strip().split('\xa0')
        tlist = []
        for p in y:
            lst = p.replace('\n ', ' ').replace('\n', '')
            tlist.append(lst)
        dflist.append(tlist)
    except:
        dflist.append('X')
text1 will return the following output:
[['Pie Type', 'Main Ingrediënt', 'Country of Origin'], ['Applie Pie', 'Apples', 'United Kingdom']]
which is also what I would like text2 to return. But using the same code on text2 returns:
[['Pie TypeMain IngrediëntCountry of Origin'], ['Apple PieApplesUnited Kingdom']]
because it contains \n\n\n instead of \n\xa0\n.
I have tried using an if statement to figure out which of the two the data contains, but that does not seem to work if I use if '\xa0' in text1:.
Can anyone help me with either a regex that can turn both of these into the desired structure, or another way to tackle this problem?
Thanks!
EDIT:
Thanks to everyone who responded to this question so quickly. Unfortunately none of your answers gave me the results I wanted, but I answered my own question. I will accept it as the answer in 2 days, when Stack Overflow allows me to, unless I receive another answer that works.
EDIT:
Both Suneesh Jacob and RJ Adriaansen provided a working solution to this problem with efficient code. I decided to time the answers and accept the fastest one as the best one.
results:
def decode(data):
    return [[j.replace('\n ',' ').strip() for j in re.split(r'\n\n\n|\n\xa0\n',i)] for i in data]
result:
0.07250859999999999
and
def clean_list(lst):
    lst = [re.split('\n\n\n|\n\xa0\n',i) for i in lst]
    return [[' '.join(i.split()) for i in sublist] for sublist in lst]
result:
0.0712564
Thank you all!
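The question does not show how the timings were taken. One way such a comparison could be set up is sketched below with the standard timeit module; the repetition count of 1000 is an assumption, and the numbers it prints will differ from the figures quoted above depending on machine and Python version.

```python
import re
import timeit

text2 = ['\n\nPie Type\n\n\nMain Ingrediënt\n\n\nCountry of Origin\n\n',
         '\n\nApple Pie\n\n\nApples\n\n\nUnited Kingdom\n\n']


def decode(data):
    return [[j.replace('\n ', ' ').strip()
             for j in re.split(r'\n\n\n|\n\xa0\n', i)] for i in data]


def clean_list(lst):
    lst = [re.split('\n\n\n|\n\xa0\n', i) for i in lst]
    return [[' '.join(i.split()) for i in sublist] for sublist in lst]


# Check that both functions agree before comparing their speed.
assert decode(text2) == clean_list(text2)

for func in (decode, clean_list):
    # number=1000 is an arbitrary repetition count chosen for this sketch.
    seconds = timeit.timeit(lambda: func(text2), number=1000)
    print(f"{func.__name__}: {seconds:.4f} s")
```

With results this close together, the difference is likely within measurement noise, so readability may be a better tie-breaker than speed.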
With regex you can specify multiple delimiters:
import re
def clean_list(lst):
    lst = [re.split('\n\n\n|\n\xa0\n',i) for i in lst]
    return [[' '.join(i.split()) for i in sublist] for sublist in lst]
print(clean_list(text1))
print(clean_list(text2))
Output:
[['Pie Type', 'Main Ingrediënt', 'Country of Origin'], ['Applie Pie', 'Apples', 'United Kingdom']]
[['Pie Type', 'Main Ingrediënt', 'Country of Origin'], ['Apple Pie', 'Apples', 'United Kingdom']]
You can try something like this:
import re
text1 = ['\nPie\n Type\n\xa0\nMain\n Ingrediënt\n\xa0\nCountry\n of Origin', '\nApplie\n Pie\n\xa0\nApples\n\xa0\nUnited\n Kingdom']
text2 = ['\n\nPie Type\n\n\nMain Ingrediënt\n\n\nCountry of Origin\n\n', '\n\nApple Pie\n\n\nApples\n\n\nUnited Kingdom\n\n']
def decode(data):
    return [[j.replace('\n ',' ').strip() for j in re.split(r'\n\n\n|\n\xa0\n',i)] for i in data]
print(decode(text1))
print(decode(text2))
There is no need for a regex in your case. Simply add a replace before the split:
y = text1[x].strip().replace("\n\n\n",'\n\xa0\n').split('\xa0')
instead of
y = text1[x].strip().split('\xa0')
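As a sketch of how that one-line change fits into the asker's loop, the version below wraps it in a function and runs it on both sample inputs. The function name split_cells is hypothetical; the cleaning steps are the ones from the question.

```python
text1 = ['\nPie\n Type\n\xa0\nMain\n Ingrediënt\n\xa0\nCountry\n of Origin',
         '\nApplie\n Pie\n\xa0\nApples\n\xa0\nUnited\n Kingdom']
text2 = ['\n\nPie Type\n\n\nMain Ingrediënt\n\n\nCountry of Origin\n\n',
         '\n\nApple Pie\n\n\nApples\n\n\nUnited Kingdom\n\n']


def split_cells(rows):
    """Split each scraped row into cells, accepting both separator styles."""
    table = []
    for row in rows:
        # Rewrite the '\n\n\n' separator as '\n\xa0\n' so one split handles both.
        parts = row.strip().replace('\n\n\n', '\n\xa0\n').split('\xa0')
        # Same clean-up as in the question: rejoin wrapped words, drop newlines.
        table.append([p.replace('\n ', ' ').replace('\n', '') for p in parts])
    return table


print(split_cells(text1))
print(split_cells(text2))
```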
Alright, so after messing around for a while I was able to figure out my issue. Thanks to everyone who responded to this question, but unfortunately none of your answers gave me the results I wanted. I will comment on each answer with the output it produced.
For my issue specifically, the problem was:
if '\n\n\n' in text1:
Does not work, since text1 is a list. It should be:
if '\n\n\n' in text1[0]:
I can't believe this took me 3 hours :/.
The code that wound up working for me was:
dflist = []
for x in range(len(text1)):
    try:
        if '\xa0' in text1[0]:
            y = text1[x].strip().split('\xa0')
        if "\n\n\n" in text1[0]:
            y = text1[x].strip().split('\n\n\n')
        tlist = []
        for p in y:
            lst = p.replace('\n ', ' ').replace('\n', '')
            tlist.append(lst)
        dflist.append(tlist)
    except:
        dflist.append('X')
EDIT:
After this post, everyone who initially answered my question updated their answers, and all of them seem to work now. I added timeit results to my question to show which one runs fastest and accepted that one as the answer.
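The root cause described in this answer can be reproduced in a few lines: the in operator on a list compares against whole elements, while on a string it searches for a substring. The helper contains_separator is a hypothetical name used only for this sketch.

```python
text2 = ['\n\nPie Type\n\n\nMain Ingrediënt\n\n\nCountry of Origin\n\n',
         '\n\nApple Pie\n\n\nApples\n\n\nUnited Kingdom\n\n']

# On a list, `in` asks "is any element equal to this value?"
print('\n\n\n' in text2)

# On a string, `in` asks "does this substring occur?"
print('\n\n\n' in text2[0])


def contains_separator(rows, sep):
    """Return True if `sep` occurs inside any of the strings in `rows`."""
    return any(sep in row for row in rows)


print(contains_separator(text2, '\n\n\n'))
```

Checking every row with any() also avoids assuming that the first row is representative of the rest.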
Say you have the following code:
bicycles = ['Trek','Cannondale','Redline','Specialized']
print(bicycles[0],bicycles[1],bicycles[2],bicycles[3])
This would print out:
Trek Cannondale Redline Specialized
I have two questions. First, is there a way to make the print statement more organized so that you don't have to type out bicycles multiple times? I know that if you were to just do:
print(bicycles)
It would print the brackets also, which I'm trying to avoid.
Second question: how would I insert commas between the list items when the list is printed?
This is how I would like the outcome:
Trek, Cannondale, Redline, Specialized.
I know that I could just do
print("Trek, Cannondale, Redline, Specialized.")
But using a list, is there any way to make it more organized? Or would printing the sentence out be the smartest way of doing it?
Use the .join() method:
The method join() returns a string in which the string elements of
sequence have been joined by str separator.
syntax: str.join(sequence)
bicycles = ['Trek','Cannondale','Redline','Specialized']
print(' '.join(bicycles))
output:
Trek Cannondale Redline Specialized
Example: change the separator to ', ':
print(', '.join(bicycles))
output:
Trek, Cannondale, Redline, Specialized
In Python 3 you can also use unpacking:
We can use * to unpack the list so that all elements of it can be
passed as different parameters.
We use operator *
bicycles = ['Trek','Cannondale','Redline','Specialized']
print(*bicycles)
output:
Trek Cannondale Redline Specialized
NOTE:
print() uses ' ' as the default separator; you can specify another one, e.g.:
print(*bicycles, sep=', ')
Output:
Trek, Cannondale, Redline, Specialized
It will also work if the elements in the list are of different types (without having to explicitly cast to string), e.g.:
bicycles = ['test', 1, 'two', 3.5, 'something else']
print(*bicycles, sep=', ')
output:
test, 1, two, 3.5, something else
You can use join:
' '.join(bicycles)
', '.join(bicycles)
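Unlike print(*items), join() only accepts strings, so a list with mixed types needs an explicit conversion. The sketch below shows one way to get the comma-separated sentence the asker wanted, including the final period; the helper name to_sentence and its parameters are hypothetical.

```python
def to_sentence(items, sep=", ", end="."):
    """Join items with `sep`, converting each to str, and append `end`."""
    return sep.join(map(str, items)) + end


bicycles = ['Trek', 'Cannondale', 'Redline', 'Specialized']
print(to_sentence(bicycles))
print(to_sentence(['test', 1, 'two', 3.5]))
```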
I am trying to replace any i's in a string with capital I's. I have the following code:
str.replace('i ','I ')
However, it does not replace anything in the string. I am looking to include a space after the I to differentiate between any I's in words and out of words.
Thanks if you can provide help!
The exact code is:
new = old.replace('i ','I ')
new = old.replace('-i-','-I-')
new = old.replace('i ','I ')
new = old.replace('-i-','-I-')
You throw away the first new when you assign the result of the second operation over it.
Either do
new = old.replace('i ','I ')
new = new.replace('-i-','-I-')
or
new = old.replace('i ','I ').replace('-i-','-I-')
or use regex.
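A small sketch of the difference, using a made-up sample string: str.replace returns a new string and leaves the original unchanged, so the second call has to start from the result of the first.

```python
old = "i am Sam-i-am"  # sample string invented for this sketch

# Buggy pattern: the second assignment starts again from `old`,
# so the result of the first replace is thrown away.
lost = old.replace('i ', 'I ')
lost = old.replace('-i-', '-I-')

# Chaining applies the second replace to the result of the first.
kept = old.replace('i ', 'I ').replace('-i-', '-I-')

print(lost)
print(kept)
```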
I think you need something like this.
>>> import re
>>> s = "i am what i am, indeed."
>>> re.sub(r'\bi\b', 'I', s)
'I am what I am, indeed.'
This only replaces each bare 'i' with 'I'; any 'i' that is part of another word is left untouched.
For your example from comments, you may need something like this:
>>> s = 'i am sam\nsam I am\nThat Sam-i-am! indeed'
>>> re.sub(r'\b(-?)i(-?)\b', r'\1I\2', s)
'I am sam\nsam I am\nThat Sam-I-am! indeed'
I'm trying to create a program where the user inputs a list of strings, each one on a separate line. I want to be able to return, for example, the third word in the second line. The input below would then return "blue".
input_string("""The cat in the hat
Red fish blue fish """)
Currently I have this:
def input_string(input):
    words = input.split('\n')
So I can output a certain line using words[n], but how do I output a specific word in a specific line? I've been trying to make words[1][2] work, but my attempts at creating a multidimensional array have failed.
I've been trying to split each words[n] for a few hours now and Google hasn't helped. I apologize if this is completely obvious, but I just started using Python a few days ago and am completely stuck.
It is as simple as:
input_string = ("""The cat in the hat
Red fish blue fish """)
words = [i.split(" ") for i in input_string.split('\n')]
It generates the following (a line that ends with a space yields a trailing empty string):
[['The', 'cat', 'in', 'the', 'hat', ''], ['Red', 'fish', 'blue', 'fish', '']]
It sounds like you want to split on os.linesep (the line separator for the current OS) before you split on space. Something like:
import os
def input_string(input):
    words = []
    for line in input.split(os.linesep):
        words.append(line.split())
    return words
That will give you a list of word lists for each line.
There is a method called splitlines() as well. It will split on newlines. If you don't pass it any arguments, it will remove the newline character. If you pass it True, it will keep it there, but separate the lines nonetheless.
words = [line.split() for line in input_string.splitlines()]
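As a sketch of how the splitlines() approach answers the original question, the function below builds the list of word lists and indexes into it. The name word_at and its parameters are hypothetical; indices are zero-based, so the third word of the second line is (1, 2).

```python
def word_at(text, line_index, word_index):
    """Return one word from `text`, addressed by line and word position."""
    # split() with no argument ignores repeated and trailing whitespace,
    # so no empty strings end up in the word lists.
    words = [line.split() for line in text.splitlines()]
    return words[line_index][word_index]


text = """The cat in the hat
Red fish blue fish """
print(word_at(text, 1, 2))
```

An out-of-range index raises IndexError here, which may or may not be the behavior wanted for user input.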
Try this:
lines = input.split('\n')
words = []
for line in lines:
    words.append(line.split(' '))
In English:
Construct a list of lines; this is analogous to reading from a file.
Loop over each line, splitting it into a list of words.
Append each list of words to another list. This produces a list of lists.
How can I match an exact string/word while searching a list? I have tried, but the result is not correct. Below are the sample list, my code, and the test results.
list = ['Hi, friend', 'can you help me?']
My code:
dic = dict()
for item in list:
    for word in item.split():
        dic.setdefault(word, []).append(item)
print(dic.get(s))
test results:
s = "can" ~ expected output: 'can you help me?' ~ output I get: 'can you help me?'
s = "you" ~ expected output: *nothing* ~ output I get: 'can you help me?'
s = "Hi," ~ expected output: 'Hi, friend' ~ output I get: 'Hi, friend'
s = "friend" ~ expected output: *nothing* ~ output I get: 'Hi, friend'
My list contains 1500 strings. Can anybody help me?
Looks like you need a map of sentences and their starting word, so you don't need to map all words in that sentence but only the first one.
from collections import defaultdict
sentences = ['Hi, friend', 'can you help me?']
start_sentence_map = defaultdict(list)
for sentence in sentences:
    start = sentence.split()[0]
    start_sentence_map[start].append(sentence)
for s in ["can", "you", "Hi,", "friend"]:
    print(s, ":", start_sentence_map.get(s))
output:
can : ['can you help me?']
you : None
Hi, : ['Hi, friend']
friend : None
Also note a few things from the code above:
Don't use list as a variable name, because Python uses it for the list class.
Use defaultdict, which lets you add entries to the dictionary directly instead of first adding a default entry.
Use more descriptive names instead of mylist or dic.
If you just want to check whether a sentence starts with a given word, you can use startswith if the searched word does not have to end at a word boundary, or split()[0] if it must match at a word boundary. As an example:
>>> def foo(s):  # word boundary
...     return [x for x in l if x.split()[0]==s]
>>> def bar(s):  # prefix
...     return [x for x in l if x.startswith(s)]
Also refrain from shadowing names in Python's built-in namespace, as you did when you named your list list. I have called it l in my example.
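The difference between the two checks shows up when the search term is only part of the first word. A self-contained sketch follows; the function names are hypothetical and the sentences are the ones from the question.

```python
sentences = ['Hi, friend', 'can you help me?']


def first_word_matches(word, sentences):
    """Sentences whose first whitespace-separated word equals `word`."""
    # The `s.split()` guard skips empty or whitespace-only strings.
    return [s for s in sentences if s.split() and s.split()[0] == word]


def prefix_matches(prefix, sentences):
    """Sentences that begin with `prefix`, regardless of word boundaries."""
    return [s for s in sentences if s.startswith(prefix)]


print(first_word_matches("ca", sentences))
print(prefix_matches("ca", sentences))
```

For 1500 strings either list comprehension should be fast enough; the dictionary approach from the other answer pays off mainly when many lookups are made against the same list.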