I'm writing code that pulls data from a website and prints all the text between certain tags. Each time the code pulls data from a tag I store the result in a list, so I have a list looking something like:
Warning
Not
News
Legends
Name1
Name2
Name3
Pickle
Stop
Hello
I want to search this list of strings and have code that finds the keywords "Legends" and "Pickle" and prints whatever strings are between them.
To elaborate on a further goal: I may create a whole list of all possible legend names and then, whenever I generate my list, print out the ones that occur in it. Any insight into either of these questions?
For the second approach, you could create a regex alternation of the expected names, then use a list comprehension to generate a list of matches:
import re

tags = ['Warning', 'Not', 'News', 'Legends', 'Name1', 'Name2', 'Name3', 'Pickle', 'Stop', 'Hello']
names = ['Name1', 'Name2', 'Name3']
regex = r'^(?:' + '|'.join(names) + r')$'
matches = [x for x in tags if re.search(regex, x)]
print(matches)  # ['Name1', 'Name2', 'Name3']
Try this:
words = [
    "Warning", "Not", "News", "Legends", "Name1",
    "Name2", "Name3", "Pickle", "Stop", "Hello"
]
words_in_between = words[words.index("Legends") + 1:words.index("Pickle")]
print(words_in_between)
output:
['Name1', 'Name2', 'Name3']
This assumes that both "Legends" and "Pickle" are in the list exactly once.
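If either marker might be missing, a small guard avoids the ValueError that list.index() raises. A sketch (the helper name is made up for illustration):

```python
words = [
    "Warning", "Not", "News", "Legends", "Name1",
    "Name2", "Name3", "Pickle", "Stop", "Hello"
]

def between(items, start, end):
    # Return the items strictly between start and end, or [] if either is absent
    if start in items and end in items:
        return items[items.index(start) + 1:items.index(end)]
    return []

print(between(words, "Legends", "Pickle"))  # ['Name1', 'Name2', 'Name3']
print(between(words, "Legends", "Missing"))  # []
```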
You can use the list.index() method to find the numerical index of an item within a list, and then use list slicing to return the items in your list between those two points:
your_list = ['Warning','Not','News','Legends','Name1','Name2','Name3','Pickle','Stop','Hello']
your_list[your_list.index('Legends')+1:your_list.index('Pickle')]
The caveat is that .index() returns only the index of the first occurrence of the given item, so if your list has two 'Legends' items, you'll only get the first index.
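If the markers can repeat, one workaround (a sketch, not part of the original answer; the sample data is extended here to show two occurrences) is to collect every index with enumerate and pair them up:

```python
your_list = ['Warning', 'Not', 'News', 'Legends', 'Name1', 'Name2',
             'Name3', 'Pickle', 'Stop', 'Legends', 'NameX', 'Pickle']

# Collect every position of each marker, not just the first
starts = [i for i, x in enumerate(your_list) if x == 'Legends']
ends = [i for i, x in enumerate(your_list) if x == 'Pickle']

# Pair each 'Legends' with the following 'Pickle' and slice between them
segments = [your_list[s + 1:e] for s, e in zip(starts, ends)]
print(segments)  # [['Name1', 'Name2', 'Name3'], ['NameX']]
```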
You can use list.index() to get the index of the first occurrence of 'Legends' and 'Pickle'. Then you can use list slicing to get the elements in between:
l = ['Warning','Not','News','Legends','Name1','Name2','Name3','Pickle','Stop','Hello']
l[l.index('Legends')+1 : l.index('Pickle')]
['Name1', 'Name2', 'Name3']
NumPy's where() function gives you all occurrences of a given item. So first make the list a NumPy array:
import numpy as np

my_array = np.array(["Warning", "Not", "News", "Legends", "Name1", "Name2",
                     "Name3", "Pickle", "Stop", "Hello", "Legends", "Name1",
                     "Name2", "Name3", "Pickle"])
From here on you can use NumPy's methods:
legends = np.where(my_array == "Legends")
pickle = np.where(my_array == "Pickle")
Concatenate for easier looping:
stack = np.concatenate([legends, pickle], axis=0)
Look for the values between "Legends" and "Pickle":
np.concatenate([my_array[stack[0, i] + 1:stack[1, i]] for i in range(stack.shape[1])])
The result in my case is:
array(['Name1', 'Name2', 'Name3', 'Name1', 'Name2', 'Name3'], dtype='<U7')
I am trying to locate words that contain a certain string inside a list of lists in Python. For example, if I have a list of tuples like:
the_list = [
    ('Had denoting properly #T-jointure you occasion directly raillery'),
    ('. In said to of poor full be post face snug. Introduced imprudence'),
    ('see say #T-unpleasing devonshire acceptance son.'),
    ('Exeter longer #T-wisdom gay nor design age.', 'Am weather to entered norland'),
    ('no in showing service. Nor repeated speaking', ' shy appetite.'),
    ('Excited it hastily an pasture #T-it observe.', 'Snug #T-hand how dare here too.')
]
I want to find a specific string that I search for and extract the complete word that contains it. For example:
for sentence in the_list:
    for word in sentence:
        if '#T-' in word:
            print(word)
import re

wordSearch = re.compile(r'#T-')
for entry in the_list:
    # A bare ('...') is just a string, not a tuple, so normalize first
    parts = entry if isinstance(entry, tuple) else (entry,)
    for part in parts:
        if wordSearch.search(part):
            print(part)
You could use a list comprehension on a flattened version of your list:
from pandas.core.common import flatten
[[word for word in x.split(' ') if '#T-' in word] for x in list(flatten(the_list)) if '#T-' in x]
#[['#T-jointure'], ['#T-unpleasing'], ['#T-wisdom'], ['#T-it'], ['#T-hand']]
Relevant places: How to make a flat list out of list of lists? (specifically this answer), Double for loop list comprehension.
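If pulling in pandas only for flatten feels heavy, a plain-Python flatten handles the mixed strings and tuples just as well. A sketch over a shortened copy of the question's data:

```python
the_list = [
    ('Had denoting properly #T-jointure you occasion directly raillery'),
    ('Exeter longer #T-wisdom gay nor design age.', 'Am weather to entered norland'),
    ('Excited it hastily an pasture #T-it observe.', 'Snug #T-hand how dare here too.')
]

# A bare ('...') is just a string, so flatten strings and tuples differently
flat = []
for item in the_list:
    if isinstance(item, tuple):
        flat.extend(item)
    else:
        flat.append(item)

matches = [word for s in flat for word in s.split() if '#T-' in word]
print(matches)  # ['#T-jointure', '#T-wisdom', '#T-it', '#T-hand']
```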
You would need to use re for this task:
import re

a = re.search(r"#(.*?)\s", 'Exeter longer #T-wisdom gay nor design age.')
a.group(0)
Note: you need to account for None (when there is no match), or it will throw an error:
for name in the_list:
    try:
        if isinstance(name, (list, tuple)):
            for name1 in name:
                result = re.search(r"#(.*?)\s", name1)
                print(result.group(0))
        else:
            result = re.search(r"#(.*?)\s", name)
            print(result.group(0))
    except AttributeError:
        pass
I have to sort keywords and values in a string.
This is my attempt:
import re
phrase = '$1000 is the price of the car, it is 10 years old. And this sandwish cost me 10.34£'
list1 = re.findall(r'\d*\.?\d+', phrase)  # find all the numbers in my phrase (1000, 10, 10.34)
list2 = ['car', 'year', 'sandwish']  # a list of all the keywords in the phrase I need to find
joinedlist = list1 + list2  # the combination of the two lists (the key elements in my sentence)
filter1 = sorted(joinedlist, key=phrase.find)  # find the key elements in my phrase and sort them by order of appearance
print(filter1)
Unfortunately, in some cases, because (I assumed) the "sorted" function works by lexical sorting, the numbers are printed in the wrong order. This means that in a case like this one, the output will be:
['1000', '10', 'car', 'year', 'sandwish', '10.34']
instead of:
['1000', 'car', '10', 'year', 'sandwish', '10.34']
since "car" appears before "10" in the initial phrase.
Lexical sorting has nothing to do with it, because your sorting key is the position in the original phrase; all the sorting is done by numeric values (the indices returned by find). The reason that the '10' is appearing "out of order" is that phrase.find returns the first occurrence of it, which is inside the 1000 part of the string!
Rather than breaking the sentence apart into two lists and then trying to reassemble them with a sort, why not just use a single regex that selects the different kinds of things you want to keep? That way you don't need to re-sort them at all:
>>> re.findall(r'\d*\.?\d+|car|year|sandwish', phrase)
['1000', 'car', '10', 'year', 'sandwish', '10.34']
The issue is that 10 and 1000 get the same value from phrase.find: both are first found at the same position in the string, since 10 is a substring of 1000.
You can implement a regex lookup into phrase using \b word boundaries, so that 10 only matches 10 in your string:
import re

def finder(s):
    if m := re.search(rf'\b{s}\b', phrase):
        return m.span()[0]
    elif m := re.search(rf'\b{s}', phrase):
        return m.span()[0]
    return -1
Test it:
>>> sorted(joinedlist, key=finder)
['1000', 'car', '10', 'year', 'sandwish', '10.34']
However, it is easier if you turn phrase into a lookup list of words. You will need some treatment for the keyword year vs. years in phrase; you can use the regex r'\d+\.\d+|\w+' to find the words, and then str.startswith() to test whether a word is close enough:
pl = re.findall(r'\d+\.\d+|\w+', phrase)

def finder2(s):
    try:  # first try an exact match
        return pl.index(s)
    except ValueError:
        pass  # not found; now try .startswith()
    try:
        return next(i for i, w in enumerate(pl) if w.startswith(s))
    except StopIteration:
        return -1
>>> sorted(joinedlist, key=finder2)
['1000', 'car', '10', 'year', 'sandwish', '10.34']
Hello, I would like to remove the time portion from this list of dates generated by an API:
['2020-07-31 00:00:00.000', '2020-04-30 04:00:00.000', '2020-01-28 05:00:00.000', '2019-10-30 04:00:00.000', '2019-07-30 04:00:00.000', '2019-04-30 04:00:00.000', '2019-01-29 05:00:00.000']
I want the list to look like this:
['2020-07-31', '2020-04-30', '2020-01-28', '2019-10-30', '2019-07-30', '2019-04-30', '2019-01-29']
The thing is I have no idea how to do this task and would like some help.
You can split the strings and use the first value of each split in a comprehension:
dates = [date.split()[0] for date in dates]
How I would do it, assuming that orig is your original list:
new_list = [item.split()[0] for item in orig]
This builds a new list from the original by splitting each item on whitespace and taking the first piece of each split entry.
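If you also want to validate that each entry really is a timestamp, the standard-library datetime module can parse and reformat it. A sketch assuming the fixed 'YYYY-MM-DD HH:MM:SS.mmm' layout shown in the question:

```python
from datetime import datetime

dates = ['2020-07-31 00:00:00.000', '2020-04-30 04:00:00.000']

# Parse each entry (raises ValueError on malformed input), then keep only the date
trimmed = [datetime.strptime(d, '%Y-%m-%d %H:%M:%S.%f').strftime('%Y-%m-%d')
           for d in dates]
print(trimmed)  # ['2020-07-31', '2020-04-30']
```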
Using:
cur.execute(SQL)
response = cur.fetchall()  # response is a LOB object
names = response[0][0].read()
I have the following SQL response as the string names:
'Mike':'Mike'
'John':'John'
'Mike/B':'Mike/B'
As you can see, it comes formatted. It is actually formatted like: \\'Mike\\':\\'Mike\\'\n\\'John\\'... and so on.
I want to check whether, for example, Mike is in the list at least once (I don't care how many times, just at least once).
I would like to have something like that:
l = ['Mike', 'Mike', 'John', 'John', 'Mike/B', 'Mike/B']
so I could simply iterate over the list and check:
for name in l:
    if 'Mike' == name:
        # do something
Any ideas how I could do that? Many thanks!
Edit:
When I do:
list = names.split()
I receive a list that is nearly what I want, but the elements inside still look like this:
list = ['\\'Mike\\':\\'Mike\\", ...]
names = ['\\'Mike\\':\\'Mike\\", ...]
for name in names:
    if "Mike" in name:
        print("Mike is here")
The \\' business is caused by MySQL escaping the ' character.
If you have a list of names, try this:
my_names = ["Tom", "Dick", "Harry"]
names = ['\\'Mike\\':\\'Mike\\", ...]
for name in names:
    for my_name in my_names:
        if my_name in name:
            print(my_name, "is here")
import re
pattern = re.compile(r"[\n\\:']+")
list_of_names = pattern.split(names)
# ['', 'Mike', 'Mike', 'John', 'John', 'Mike/B', '']
# Quick-tip: Try not to name a list with "list" as "list" is a built-in
You can keep your results this way or do a final cleanup to remove empty strings
clean_list = list(filter(lambda x: x!='', list_of_names))
I need to keep one list of the words that appear exactly once in a list, and another list of the words that appear twice, without using any count method. I tried using a set, but it removes only the duplicate, not the original. Is there any way to keep the words that appear once in one list and the words that appear twice in another?
The sample input is text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n'], so technically 'Andy' and 'Andy' would be in one list, and the rest in the other.
Using dictionaries is not allowed :/
for word in text:
    clean = clean_up(word)
    for words in clean.split():
        clean2 = clean_up(words)
        l = clean_list.append(clean2)
        if clean2 not in clean_list:
            clean_list.append(clean2)
print(clean_list)
This is a very bad, unPythonic way of doing things; but once you disallow Counter and dict, this is about all that's left. (Edit: except for sets, d'oh!)
text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']
once_words = []
more_than_once_words = []
for sentence in text:
    for word in sentence.split():
        if word in more_than_once_words:
            pass  # do nothing
        elif word in once_words:
            once_words.remove(word)
            more_than_once_words.append(word)
        else:
            once_words.append(word)
which results in
# once_words
['Fennimore', 'Cooper', 'Peter,', 'Paul,', 'and', 'Mary', 'Gosling']
# more_than_once_words
['Andy']
It is a silly problem that forbids key data structures or loops; why not just program in C then?
Editorial aside over, here is a solution:
>>> text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n','Andy Gosling\n']
>>> data=' '.join(e.strip('\n,.') for e in ''.join(text).split()).split()
>>> data
['Andy', 'Fennimore', 'Cooper', 'Peter', 'Paul', 'and', 'Mary', 'Andy', 'Gosling']
>>> [e for e in data if data.count(e)==1]
['Fennimore', 'Cooper', 'Peter', 'Paul', 'and', 'Mary', 'Gosling']
>>> list({e for e in data if data.count(e)==2})
['Andy']
If you can use a set (I wouldn't use it either, if you're not allowed to use dictionaries), then you can use the set to keep track of what words you have 'seen'... and another one for the words that appear more than once. Eg:
seen = set()
duplicate = set()
Then, each time you get a word, test if it is on seen. If it is not, add it to seen. If it is in seen, add it to duplicate.
At the end, you'd have a set of seen words, containing all the words, and a duplicate set, with all those that appear more than once.
Then you only need to subtract duplicate from seen, and the result is the words that have no duplicates (i.e. the ones that appear only once).
This can also be implemented using only lists (which would be more honest to your homework, if a bit more laborious).
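A minimal sketch of the set-based idea described above (note that punctuation is not stripped here, so 'Peter,' keeps its comma):

```python
text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']

seen = set()
duplicate = set()
for sentence in text:
    for word in sentence.split():
        if word in seen:
            duplicate.add(word)  # second (or later) sighting
        else:
            seen.add(word)  # first sighting

once = seen - duplicate  # words that appear exactly once
print(sorted(duplicate))  # ['Andy']
print(sorted(once))
```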
from itertools import groupby
from operator import itemgetter
text = ['Andy Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'Andy Gosling\n']
counts = [(word, len(list(group)))
          for word, group in groupby(sorted(' '.join(text).split()))]
one, two = [list(group)
            for key, group in groupby(sorted(counts, key=itemgetter(1)),
                                      key=itemgetter(1))]