I have a trigram like
trigrm = [((w1,tag1), (w2,tag2),(w3,tag3))]
I would like to extract only tags of each word from above trigram in a tuple like
tup = (tag1,tag2,tag3)
tup = tuple(x for _, x in sum(trigrm, ()))
You can try:
>>> trigrm = [(("w1","tag1"), ("w2","tag2"),("w3","tag3"))]
>>> output = ([x[1] for x in trigrm[0]])
>>> print(output)
['tag1', 'tag2', 'tag3']
>>> tuple(output)
('tag1', 'tag2', 'tag3')
You can use zip. Here is an example using strings because I don't know the variable values
trigrm = [(('w1','tag1'), ('w2','tag2'),('w3','tag3'))]
tuples = list(zip(*trigrm[0]))[1]
print (tuples)
# ('tag1', 'tag2', 'tag3')
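If the list holds more than one trigram, the same idea extends with a small helper. This is only a sketch, assuming every trigram is a tuple of (word, tag) pairs; the names trigrams and tags_of and the sample data are made up for illustration:

```python
# Hypothetical sample data: two trigrams of (word, tag) pairs.
trigrams = [
    (("w1", "tag1"), ("w2", "tag2"), ("w3", "tag3")),
    (("w4", "tag4"), ("w5", "tag5"), ("w6", "tag6")),
]


def tags_of(trigram):
    """Return the tags of one trigram as a tuple, dropping the words."""
    return tuple(tag for _word, tag in trigram)


# One tuple of tags per trigram.
tag_tuples = [tags_of(t) for t in trigrams]
print(tag_tuples)
```

Unpacking each pair as (_word, tag) avoids indexing with [1] and fails loudly if an element is not a pair.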
I have a list of strings, each of which contains the same two markers. I would like to extract the substring between those two markers for each string in the list.
Example:
With the markers 'XXX' and 'YYY', I want to extract 78665786 and 6866 from:
['XXX78665786YYYjajk', 'XXX6866YYYz6767'....]
You can just loop over your list and grab the substring. You can do something like:
import re
my_list = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
output = []
for item in my_list:
    output.append(re.search('XXX(.*)YYY', item).group(1))
print(output)
Output:
['78665786', '6866']
import re
l = ['XXX78665786YYYjajk', 'XXX6866YYYz6767'....]
l = [re.search(r'XXX(.*)YYY', i).group(1) for i in l]
This should work
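One caveat with the regex answers: (.*) is greedy, so it runs to the last 'YYY' in a string, and .group(1) raises an AttributeError when a string has no match. Here is a hedged sketch of a more defensive variant; the helper name between and the third sample string are my own additions, not from the question:

```python
import re


def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    # re.escape guards against markers that contain regex metacharacters;
    # (.*?) is non-greedy, so matching stops at the first `end` marker.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1) if match else None


items = ["XXX78665786YYYjajk", "XXX6866YYYz6767", "no markers here"]
results = [between(s, "XXX", "YYY") for s in items]
print(results)
```

Strings without both markers give None instead of raising, so you can filter them out afterwards.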
Another solution would be:
import re
test_string=['XXX78665786YYYjajk','XXX78665783336YYYjajk']
int_val=[int(re.search(r'\d+', x).group()) for x in test_string]
The split() method splits a string into a list of parts around a separator.
list1 = ['XXX78665786YYYjajk', 'XXX6866YYYz6767']
list2 = []
for i in list1:
    # Keep what follows 'XXX', then what precedes 'YYY'
    after_start = i.split("XXX", 1)[1]
    list2.append(after_start.split("YYY", 1)[0])
print(list2)
The extracted substrings are saved in list2.
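If you would rather avoid both regex and indexing into split() results, str.partition is another option. A rough sketch, assuming each string contains each marker at most once; the variable names are illustrative:

```python
list1 = ["XXX78665786YYYjajk", "XXX6866YYYz6767"]

extracted = []
for item in list1:
    # partition returns (before, separator, after); the separator part
    # is an empty string when the marker is not found.
    _before, found_start, rest = item.partition("XXX")
    middle, found_end, _after = rest.partition("YYY")
    if found_start and found_end:
        extracted.append(middle)

print(extracted)
```

Strings missing either marker are simply skipped instead of raising an IndexError.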
I have multiple strings that look like this:
"BPBA-SG790-NGTP-W-AU-BUN-3Y"
I want to compare the string to my list and if part of the string is in the list, I want to get only the part that is found on the list as a new variable.
This is my code:
mylist = ["770", "790", "1470", "1490"]
sq = "BPBA-SG790-NGTP-W-AU-BUN-3Y"
matching = [s for s in mylist if any(xs in s for xs in sq)]
print(matching)
>>> ['770', '790', '1470', '1490']
For example this is what I want to get:
mylist = ["770", "790", "1470", "1490"]
sq = "BPBA-SG790-NGTP-W-AU-BUN-3Y"
matching = [s for s in mylist if any(xs in s for xs in sq)]
print(matching)
>>> 790
Any idea how to do this?
Like this, you can use a list comprehension:
mylist = ["770", "790", "1470", "1490"]
sq = "BPBA-SG790-NGTP-W-AU-BUN-3Y"
matching = [m for m in mylist if m in sq]
print(matching)
Output:
['790']
You can use the in keyword from python:
mylist = ["770", "790", "1470", "1490"]
sq = "BPBA-SG790-NGTP-W-AU-BUN-3Y"
for i in mylist:
    if i in sq:
        print(i)
The code iterates through the list and prints the list element if it is in the string
Not sure I get your question, but the following should do the trick
[x for x in mylist if x in sq]
It returns a list of those elements of the list that appear in the string.
Try:
mylist = ["770", "790", "1470", "1490"]
sq = "BPBA-SG790-NGTP-W-AU-BUN-3Y"
b = [x for x in mylist if sq.find(x) != -1]
print(b)
[s for s in mylist if s in sq]
For those who dislike brevity:
This is a list comprehension. It evaluates to a list of strings s in mylist that satisfy the predicate s in sq (i.e., s is a substring of sq).
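The comprehension gives you a list, while the question asks for the match as a single variable. One way to get that, sketched here on the assumption that only the first match matters (the name first_match is mine):

```python
mylist = ["770", "790", "1470", "1490"]
sq = "BPBA-SG790-NGTP-W-AU-BUN-3Y"

# next() pulls the first item from the generator; the second argument
# is the fallback returned when nothing in mylist occurs in sq.
first_match = next((m for m in mylist if m in sq), None)
print(first_match)
```

If no element is found, first_match is None rather than an empty list, so check for that before using it.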
I have a list of dictionaries with text as a value, and I want to remove the dictionaries whose text includes certain words.
df = [{'name':'jon','text':'the day is light'},{'name':'betty','text':'good night'},{'name':'shawn','text':'good afternoon'}]
I want to remove the dictionaries whose 'text' value contains the word 'light' or 'night':
words = ['light','night']
pattern = re.compile(r"|".join(words))
Expected result:
df = [{'name':'shawn','text':'good afternoon'}]
[x for x in df if not any(w in x['text'] for w in words)]
You're close. All you need to do is write your list comprehension and apply the search pattern:
result = [x for x in df if not re.search(pattern, x['text'])]
Full example:
import re
df = [{'name':'jon','text':'the day is light'},{'name':'betty','text':'good night'},{'name':'shawn','text':'good afternoon'}]
words = ['light','night']
pattern = re.compile(r"|".join(words))
result = [x for x in df if not re.search(pattern, x['text'])]
print(result) # => [{'name': 'shawn', 'text': 'good afternoon'}]
I found my answer:
[x for x in df if not pattern.search(x['text'])]
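Note that the pattern above matches substrings, so 'light' would also hit 'delightful'. If whole-word matching is what you want, something along these lines might do. It is a sketch only; the \b boundaries, the re.escape call and the extra 'ann' record are my assumptions, not part of the question:

```python
import re

records = [
    {"name": "jon", "text": "the day is light"},
    {"name": "betty", "text": "good night"},
    {"name": "shawn", "text": "good afternoon"},
    {"name": "ann", "text": "a delightful morning"},  # hypothetical extra record
]
words = ["light", "night"]

# \b anchors the alternation to word boundaries; re.escape keeps any
# regex metacharacters in the words from being interpreted.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")

kept = [r for r in records if not pattern.search(r["text"])]
print(kept)
```

With substring matching the 'ann' record would be removed as well; with word boundaries it is kept.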
I have a dictionary like this:
{'CO,': {u'123456': [55111491410]},
u'OA,': {u'3215': [55111400572]},
u'KO,': {u'asdas': [55111186735],u'5541017924': [55111438755]},
u'KU': {u'45645': [55111281815],u'546465238': [55111461870]},
u'TU': {u'asdfds': [55111161462],u'546454149': [55111128782],
u'546454793': [55111167133],u'546456387': [55111167139],
u'546456925': [55111167140],u'546458931': [55111226912],
u'546458951': [55111226914],u'546459861': [55111226916],
u'546460165': [55111403171, 55111461858]}}
I want to get merged list of all the lists in nested dictionary.
Output should be like this:
[55111491410,55111400572,55111186735,55111438755,55111281815,55111461870,55111167133,55111167139,....55111403171,55111461858]
A regex-based answer that relies on all the values of interest being enclosed in square brackets (it operates on the dictionary's string representation):
import re
pat = r'(?<=\[).+?(?=\])'
s = """{'CO,': {u'123456': [55111491410]},
u'OA,': {u'3215': [55111400572]},
u'KO,': {u'asdas': [55111186735],u'5541017924': [55111438755]},
u'KU': {u'45645': [55111281815],u'546465238': [55111461870]},
u'TU': {u'asdfds': [55111161462],u'546454149': [55111128782],
u'546454793': [55111167133],u'546456387': [55111167139],
u'546456925': [55111167140],u'546458931': [55111226912],
u'546458951': [55111226914],u'546459861': [55111226916],
u'546460165': [55111403171, 55111461858]}}"""
print('[%s]' % ', '.join(map(str, re.findall(pat, s))))
Output
[55111491410, 55111400572, 55111186735, 55111438755, 55111281815, 55111461870, 55111161462, 55111128782, 55111167133, 55111167139, 55111167140, 55111226912, 55111226914, 55111226916, 55111403171, 55111461858]
Just a list comprehension over the dict's values and the inner dicts' values would do the job. But remember that dicts are only guaranteed to preserve insertion order from Python 3.7 onward (in CPython 3.6 this was an implementation detail). On older versions, the resulting list will not be in any particular order.
>>> dct = {'CO,': {u'123456': [55111491410]},
... u'OA,': {u'3215': [55111400572]},
... u'KO,': {u'asdas': [55111186735],u'5541017924': [55111438755]},
... u'KU': {u'45645': [55111281815],u'546465238': [55111461870]},
... u'TU': {u'asdfds': [55111161462],u'546454149': [55111128782],
... u'546454793': [55111167133],u'546456387': [55111167139],
... u'546456925': [55111167140],u'546458931': [55111226912],
... u'546458951': [55111226914],u'546459861': [55111226916],
... u'546460165': [55111403171, 55111461858]}}
>>>
>>> [e for idct in dct.values() for lst in idct.values() for e in lst]
[55111491410, 55111400572, 55111186735, 55111438755, 55111281815, 55111461870, 55111161462, 55111128782, 55111167133, 55111167139, 55111167140, 55111226912, 55111226914, 55111226916, 55111403171, 55111461858]
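The same flattening can also be written with itertools.chain.from_iterable, which some find easier to read than a triple-nested comprehension. A small sketch on a shortened copy of the data (the names nested and merged are illustrative, and the order shown assumes Python 3.7+):

```python
from itertools import chain

# Shortened, hypothetical subset of the dictionary from the question.
nested = {
    "CO,": {"123456": [55111491410]},
    "KO,": {"asdas": [55111186735], "5541017924": [55111438755]},
    "TU": {"546460165": [55111403171, 55111461858]},
}

# The generator yields each innermost list; chain.from_iterable
# concatenates those lists into one flat stream of values.
merged = list(chain.from_iterable(
    lst for inner in nested.values() for lst in inner.values()
))
print(merged)
```

Unlike taking element [0] of each list, this keeps every value of lists that hold more than one number.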
d = {'CO,': {u'123456': [55111491410]},
u'OA,': {u'3215': [55111400572]},
u'KO,': {u'asdas': [55111186735], u'5541017924': [55111438755]},
u'KU': {u'45645': [55111281815], u'546465238': [55111461870]},
u'TU': {u'asdfds': [55111161462], u'546454149': [55111128782],
u'546454793': [55111167133], u'546456387': [55111167139],
u'546456925': [55111167140], u'546458931': [55111226912],
u'546458951': [55111226914], u'546459861': [55111226916],
u'546460165': [55111403171, 55111461858]}}
z = []
for i in d.keys():
    for j in d[i].keys():
        # extend keeps every value; taking only element [0] would drop 55111461858
        z.extend(d[i][j])
print(z)
Output:
[55111491410, 55111400572, 55111186735, 55111438755, 55111281815, 55111461870, 55111161462, 55111128782, 55111167133, 55111167139, 55111167140, 55111226912, 55111226914, 55111226916, 55111403171, 55111461858]
(This is probably really simple, but) Say I have this input as a string:
"280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
and I want to separate each coordinate point onto separate lists, for each x and y value, so they'd end up looking like this:
listX = [280.2, 323.1, 135.8, 142.9]
listY = [259.8, 122.5, 149.5, 403.5]
I'd need this to be able to start out with any size string, thanks in advance!
Copy and paste this and it should work:
s_input = "280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
listX = [float(x.split(',')[0]) for x in s_input.split()]
listY = [float(y.split(',')[1]) for y in s_input.split()]
This would work.
my_string = "280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
# Convert to float, since the question asks for numbers rather than strings
listX = [float(item.split(",")[0]) for item in my_string.split()]
listY = [float(item.split(",")[1]) for item in my_string.split()]
or
X_list = []
Y_list = []
for val in [item.split(",") for item in my_string.split()]:
    X_list.append(float(val[0]))
    Y_list.append(float(val[1]))
Which version to use would probably depend on your personal preference and the length of your string.
Have a look at the split method of strings. It should get you started.
You can do the following:
>>> a ="280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
>>> b = a.split(" ")
>>> b
['280.2,259.8', '323.1,122.5', '135.8,149.5', '142.9,403.5']
>>> c = [ x.split(',') for x in b]
>>> c
[['280.2', '259.8'], ['323.1', '122.5'], ['135.8', '149.5'], ['142.9', '403.5']]
>>> X = [ d[0] for d in c]
>>> X
['280.2', '323.1', '135.8', '142.9']
>>> Y = [ d[1] for d in c]
>>> Y
['259.8', '122.5', '149.5', '403.5']
There's a handy method called str.split, which splits a string on a delimiter.
Assume we have the string in a variable s.
To split by the spaces and make a list, we would do
coords = s.split()
At this point, the most straightforward method of putting it into the lists would be to do
listX = [float(sub.split(",")[0]) for sub in coords]
listY = [float(sub.split(",")[1]) for sub in coords]
You can use a combination of zip and split with a list comprehension:
s = "280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"
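l = list(zip(*[a.split(',') for a in s.split()]))
This will return a list of 2 tuples (of strings). The list() call is needed in Python 3, where zip returns an iterator.
To get lists instead, use map on it.
l = list(map(list, zip(*[a.split(',') for a in s.split()])))
l[0] and l[1] will have your lists.
If your input is huge, consider itertools.izip() on Python 2; in Python 3, zip() is already lazy.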
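Putting the pieces together for Python 3, here is one possible version that parses each token once, converts to float, and copes with an empty string. Treat it as a sketch; the names points, list_x and list_y are my own:

```python
s = "280.2,259.8 323.1,122.5 135.8,149.5 142.9,403.5"

# Parse every "x,y" token once into a tuple of two floats.
points = [tuple(float(v) for v in token.split(",")) for token in s.split()]

if points:
    # zip(*points) transposes the list of pairs into an x column and a y column.
    list_x, list_y = map(list, zip(*points))
else:
    # zip(*[]) yields nothing, so unpacking would fail on empty input.
    list_x, list_y = [], []

print(list_x)
print(list_y)
```

Splitting each token only once also avoids doing the same work twice, as the two separate comprehensions do.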