I have a list of strings (from a .tt file) that looks like this:
list1 = ['have\tVERB', 'and\tCONJ', ..., 'tree\tNOUN', 'go\tVERB']
I want to turn it into a dictionary that looks like:
dict1 = { 'have':'VERB', 'and':'CONJ', 'tree':'NOUN', 'go':'VERB' }
I was thinking of substitution, but it doesn't work that well. Is there a way to treat the tab character '\t' as a divider?
Try the following:
dict1 = dict(item.split('\t') for item in list1)
Output:
>>> dict1
{'and': 'CONJ', 'go': 'VERB', 'tree': 'NOUN', 'have': 'VERB'}
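One caveat: dict() needs every item to be a two-element sequence, so a line that contains a second tab makes it raise ValueError. Here is a minimal sketch that assumes everything after the first tab should become the value. It passes maxsplit=1, and the 'odd\tX\tY' entry is a made-up example, not from the question:

```python
list1 = ['have\tVERB', 'and\tCONJ', 'tree\tNOUN', 'go\tVERB', 'odd\tX\tY']

# maxsplit=1 splits at the first tab only, so every item yields exactly
# two parts; 'odd\tX\tY' becomes ['odd', 'X\tY'] instead of three parts
dict1 = dict(item.split('\t', 1) for item in list1)
print(dict1)
```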
Since str.split with no arguments splits on any whitespace, and '\t' counts as whitespace, you can take a functional approach and feed dict a map, which looks quite elegant:
d = dict(map(str.split, list1))
The dictionary d is now in the desired form:
print(d)
{'and': 'CONJ', 'go': 'VERB', 'have': 'VERB', 'tree': 'NOUN'}
If you need a split only on '\t' (while ignoring ' ' and '\n') and still want to use the map approach, you can create a partial object with functools.partial that only uses '\t' as the separator:
from functools import partial
# splits only on '\t'; spaces, newlines, etc. are left alone
tabsplit = partial(str.split, sep='\t')
d = dict(map(tabsplit, list1))
This, of course, yields the same result for d with the sample list of strings.
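The two approaches only differ when a key or value itself contains a space. The small sketch below uses a hypothetical multi-word entry ('New York\tPROPN', not from the question) to show the difference:

```python
from functools import partial

list2 = ['New York\tPROPN', 'go\tVERB']

# str.split() with no arguments also splits on the space in 'New York',
# giving three parts, so dict() would raise ValueError on this item
print([s.split() for s in list2])  # [['New', 'York', 'PROPN'], ['go', 'VERB']]

# the tab-only splitter keeps 'New York' together as one key
tabsplit = partial(str.split, sep='\t')
d = dict(map(tabsplit, list2))
print(d)  # {'New York': 'PROPN', 'go': 'VERB'}
```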
You can do that with a simple dict comprehension and str.split (without arguments, split splits on whitespace):
list1 = ['have\tVERB', 'and\tCONJ', 'tree\tNOUN', 'go\tVERB']
dict1 = {x.split()[0]:x.split()[1] for x in list1}
result:
{'and': 'CONJ', 'go': 'VERB', 'tree': 'NOUN', 'have': 'VERB'}
EDIT: x.split()[0]:x.split()[1] splits each string twice, which is not optimal. Other answers here do it better without a dict comprehension.
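If you want to keep the dict comprehension, one way to split each string only once is to unpack the pair from a generator of splits:

```python
list1 = ['have\tVERB', 'and\tCONJ', 'tree\tNOUN', 'go\tVERB']

# The inner generator splits each string once; the comprehension unpacks
# each resulting [word, tag] pair into key and value
dict1 = {word: tag for word, tag in (x.split() for x in list1)}
print(dict1)  # {'have': 'VERB', 'and': 'CONJ', 'tree': 'NOUN', 'go': 'VERB'} (Python 3.7+ order)
```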
Since the split method splits on '\t' by default (as Jim Fasarakis-Hilliard pointed out), a short way to solve the problem is:
dictionary = dict(item.split() for item in list1)
print(dictionary)
I also wrote a simpler, more classic approach.
It's not very pythonic, but it's easy for beginners to understand:
list1 = ['have\tVERB', 'and\tCONJ', 'tree\tNOUN', 'go\tVERB']
dictionary1 = {}
for item in list1:
    splitted_item = item.split('\t')
    word = splitted_item[0]
    word_type = splitted_item[1]
    dictionary1[word] = word_type
print(dictionary1)
Here is the same code with very verbose comments:
# Let's start with our word list, we'll call it 'list1'
list1 = ['have\tVERB', 'and\tCONJ', 'tree\tNOUN', 'go\tVERB']
# Here's an empty dictionary, 'dictionary1'
dictionary1 = {}
# Let's start to iterate through 'list1' using the variable 'item'
for item in list1:
    # Here I split item into two parts by passing the '\t' character
    # to the split function, and put the resulting list of two elements
    # into the 'splitted_item' variable.
    # If you want to know more about the split function, check the link
    # available at the end of this answer
    splitted_item = item.split('\t')
    # Just to make the code more readable, I now put the 1st part
    # of the split item (part 0, because we start counting
    # from 0) in the "word" variable
    word = splitted_item[0]
    # I use the same approach to save the 2nd part of the
    # split item into the 'word_type' variable
    # Yes, you're right: we use 1 because we start counting from 0
    word_type = splitted_item[1]
    # Finally, I add the 'word' key with a value of 'word_type' to 'dictionary1'
    dictionary1[word] = word_type
# After the for loop has completed, I print the now
# complete dictionary1 to check if the result is correct
print(dictionary1)
Useful links:
You can quickly copy and paste this code here to check how it works and tweak it if you like: http://www.codeskulptor.com
If you want to learn more about split and string functions in general: https://docs.python.org/2/library/string.html
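Since the strings come from a .tt file, you may not need list1 at all. The sketch below builds the dictionary straight from the file's lines. It assumes one word<TAB>tag pair per line. io.StringIO stands in for the real file, and in practice you would use open() with your own path:

```python
import io

# Stand-in for the .tt file contents (assumed format: word<TAB>tag per line)
tt_file = io.StringIO("have\tVERB\nand\tCONJ\n\ntree\tNOUN\ngo\tVERB\n")

dictionary1 = {}
for line in tt_file:
    line = line.rstrip('\n')   # drop the trailing newline before splitting
    if not line:               # skip blank lines
        continue
    word, word_type = line.split('\t', 1)
    dictionary1[word] = word_type

print(dictionary1)
```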
Related
I'm working on a project that deciphers a secret sentence.
When I input
apple.appleapple.pear.orange.lemon
I want it to change into
A.B.E.R.T
I used split and replace to do it. However, I can't find a way to change
"apple" into A and
"appleapple" into B
at the same time because when I use replace(), appleapple changes into AA
This is what I've tried.
list1 = n.split()
list2 = f's.split([\\.]) : {list1}'
print(list2.replace("apple", "A"))
print(list2.replace("appleapple", "B"))
print(list2)
You can use a dictionary instead:
secret_dict = {'apple':'A','appleapple':'B','pear':'E','orange':'R','lemon':'T'}
n = 'apple.appleapple.pear.orange.lemon'
words_in_n = n.split('.')
# look up each whole word, so 'appleapple' maps to 'B' instead of 'AA'
resulting_secret_words = [secret_dict.get(word) for word in words_in_n]
secret_sentence = '.'.join(resulting_secret_words)
print(secret_sentence)
In this case, I think you should use a dictionary instead of calling replace over and over. You will be glad you did if your project's 'vocabulary' grows. I would do this:
dictionary = {
'apple': 'A',
'appleapple': 'B',
'pear': 'E',
'orange': 'R',
'lemon': 'T'
}
original = 'apple.appleapple.pear.orange.lemon'
words_list = original.split('.')
result = [dictionary.get(word, 'unknown') for word in words_list]
result = '.'.join(result)
print(result)
The above would print this:
A.B.E.R.T
Note the use of the dictionary's get() method to supply a default value if a word is not found in your vocabulary. For example, with the same dictionary and the string apple.appleapple.pear.orange.lemon.otherthing (I added 'otherthing' at the end), the result would be the string A.B.E.R.T.unknown.
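You can wrap the same lookup in a small function. decipher is a hypothetical name. The placeholder is a parameter here, so you can choose something other than 'unknown':

```python
SECRET = {'apple': 'A', 'appleapple': 'B', 'pear': 'E', 'orange': 'R', 'lemon': 'T'}

def decipher(sentence, mapping=SECRET, default='unknown'):
    # Split on '.', look up each whole word (so 'appleapple' never becomes
    # 'AA'), then rejoin the translated pieces with '.'
    return '.'.join(mapping.get(word, default) for word in sentence.split('.'))

print(decipher('apple.appleapple.pear.orange.lemon'))             # A.B.E.R.T
print(decipher('apple.appleapple.pear.orange.lemon.otherthing'))  # A.B.E.R.T.unknown
```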
I have a single list that can contain any number of elements.
['jeff','ham','boat','','my','name','hello']
How do I split this one list into two lists, or any number of lists, using the blank string elements as separators?
All these lists can then be put into one list of lists.
If you are certain that there is only one blank string in the list, you can use list.index to find its index, and then slice the list accordingly:
index = lst.index('')
[lst[:index], lst[index + 1:]]
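The sketch below applies the slicing approach to the sample list. It assumes you also want a sensible result when there is no blank string at all, because list.index raises ValueError in that case. split_at_blank is a hypothetical helper name:

```python
lst = ['jeff', 'ham', 'boat', '', 'my', 'name', 'hello']

def split_at_blank(items):
    # Split at the first '' only; if there is no blank string,
    # return the whole list (copied) as a single group
    try:
        index = items.index('')
    except ValueError:
        return [items[:]]
    return [items[:index], items[index + 1:]]

print(split_at_blank(lst))  # [['jeff', 'ham', 'boat'], ['my', 'name', 'hello']]
```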
If there could be more than one blank string in the list, you can use itertools.groupby like this:
lst = ['jeff','ham','boat','','my','name','hello','','hello','world']
from itertools import groupby
print([list(g) for k, g in groupby(lst, key=bool) if k])
This outputs:
[['jeff', 'ham', 'boat'], ['my', 'name', 'hello'], ['hello', 'world']]
Using itertools.groupby, you can do:
from itertools import groupby
lst = ['jeff','ham','boat','','my','name','hello']
[list(g) for k, g in groupby(lst, key=bool) if k]
# [['jeff', 'ham', 'boat'], ['my', 'name', 'hello']]
Using bool as the grouping key function relies on the fact that the empty string is the only falsy string.
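One behavior of groupby with key=bool is worth knowing. The filter `if k` keeps only the truthy runs, so leading, trailing, or consecutive blank strings never produce empty sublists. The list below is made up for illustration:

```python
from itertools import groupby

lst = ['', 'jeff', 'ham', '', '', 'my', 'name', '']

# groupby(key=bool) alternates between runs of truthy and falsy items;
# keeping only the truthy runs discards every blank, however many in a row
groups = [list(g) for k, g in groupby(lst, key=bool) if k]
print(groups)  # [['jeff', 'ham'], ['my', 'name']]
```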
This is one approach using a simple iteration.
Ex:
myList = ['jeff','ham','boat','','my','name','hello']
result = [[]]
for i in myList:
    if not i:
        result.append([])
    else:
        result[-1].append(i)
print(result)
Output:
[['jeff', 'ham', 'boat'], ['my', 'name', 'hello']]
Let list_string be your list. This should do the trick:
list_of_list = [[]]
for i in list_string:
    if len(i) > 0:
        list_of_list[-1].append(i)
    else:
        list_of_list.append([])
Basically, you create a list of lists and go through your original list of strings. Each time you encounter a word, you append it to the last list in your list of lists, and each time you encounter '', you start a new list. The output for your example would be:
[['jeff','ham','boat'],['my','name','hello']]
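The loop approach above can be written as a reusable generator. split_on is a hypothetical helper name. Unlike the groupby answers, it yields an empty list between two adjacent separators, which matches what the loop versions produce:

```python
def split_on(items, sep=''):
    # Collect items into the current chunk; emit the chunk at every separator
    chunk = []
    for item in items:
        if item == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(item)
    yield chunk  # the last chunk (empty if items ends with sep)

print(list(split_on(['jeff', 'ham', 'boat', '', 'my', 'name', 'hello'])))
# [['jeff', 'ham', 'boat'], ['my', 'name', 'hello']]
print(list(split_on(['a', '', '', 'b'])))
# [['a'], [], ['b']]
```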
I'm not sure this is what you're trying to do, but try:
my_list = ['jeff','ham','boat','','my','name','','hello']
list_tmp = list(my_list)
final_list = []
while '' in list_tmp:
    idx = list_tmp.index('')
    final_list.append(list_tmp[:idx])
    list_tmp = list_tmp[idx + 1:]
# whatever follows the last blank string is the final group
final_list.append(list_tmp)
So I have a long list of column headers. All are strings, some are several words long. I've yet to find a way to write a function that extracts the first word from each value in the list and returns a list of just those singular words.
For example, this is what my list looks like:
['Customer ID', 'Email','Topwater -https:', 'Plastics - some uml']
And I want it to look like:
['Customer', 'Email', 'Topwater', 'Plastics']
I currently have this:
def first_word(cur_list):
    my_list = []
    for word in cur_list:
        my_list.append(word.split(' ')[:1])
and it returns None when I run it on a list.
You can use a list comprehension to build a list of the first element of each string after splitting it on whitespace.
my_list = [x.split()[0] for x in your_list]
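To address "and it returns None when I run it on a list.":
You never return my_list. Your function builds a new list instead of changing the original list cur_list, and a function without a return statement returns None. Also note that word.split(' ')[:1] is a one-element list, not a string; use [0] to get the word itself.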
To extract the first word from every value in a list
From @dfundako, you can simplify it to
my_list = [x.split()[0] for x in cur_list]
The final code would be
def first_word(cur_list):
    my_list = [x.split()[0] for x in cur_list]
    return my_list
Here is a demo. Note that punctuation may be left behind, especially if it comes right after the first word:
names = ["OMG FOO BAR", "A B C", "Python Strings", "Plastics: some uml"]
first_word(names) would be ['OMG', 'A', 'Python', 'Plastics:']
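The sketch below handles leftover punctuation. It assumes you want trailing punctuation removed, so it strips it with string.punctuation. It also guards against empty strings, for which x.split()[0] raises IndexError:

```python
import string

def first_word(cur_list):
    result = []
    for x in cur_list:
        words = x.split()
        # Empty or all-whitespace strings have no first word; use '' for them
        first = words[0] if words else ''
        # Remove punctuation such as ':' or '-' from the end of the word
        result.append(first.rstrip(string.punctuation))
    return result

print(first_word(["OMG FOO BAR", "A B C", "Python Strings", "Plastics: some uml", ""]))
# ['OMG', 'A', 'Python', 'Plastics', '']
```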
>>> l = ['Customer ID', 'Email','Topwater -https://karls.azureedge.net/media/catalog/product/cache/1/image/627x470/9df78eab33525d08d6e5fb8d27136e95/f/g/fgh55t502_web.jpg', 'Plastics - https://www.bass.co.za/1473-thickbox_default/berkley-powerbait-10-power-worm-black-blue-fleck.jpg']
>>> list(next(zip(*map(str.split, l))))
['Customer', 'Email', 'Topwater', 'Plastics']
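Here is the one-liner broken into steps. If any string is empty or all whitespace, str.split returns [], so zip yields nothing and next raises StopIteration:

```python
l = ['Customer ID', 'Email', 'Topwater -https:', 'Plastics - some uml']

rows = list(map(str.split, l))      # each header as a list of words
columns = zip(*rows)                # transpose: first words, second words, ...
first_words = list(next(columns))   # take only the first "column"
print(first_words)  # ['Customer', 'Email', 'Topwater', 'Plastics']
```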
[column.split(' ')[0] for column in my_list] should do the trick.
And if you want it in a function:
def first_word(my_list):
    return [column.split(' ')[0] for column in my_list]
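Try using a regular expression in a loop to extract the words; for example, re.match(r'\w+', s).group() returns the first word of a string s that starts with a word character.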
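Here is a minimal sketch of the regex route, using re.match with the pattern \s*(\w+) (our choice of pattern). re.match anchors at the start of the string, and \w+ stops at the first non-word character, so trailing punctuation is dropped:

```python
import re

headers = ['Customer ID', 'Email', 'Topwater -https:', 'Plastics - some uml']

first_words = []
for h in headers:
    m = re.match(r'\s*(\w+)', h)   # optional leading spaces, then word characters
    first_words.append(m.group(1) if m else '')
print(first_words)  # ['Customer', 'Email', 'Topwater', 'Plastics']
```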
The list ['a','a #2','a(Old)'] should become {'a'} because '#' and '(Old)' are to be excised and a list of duplicates isn't needed. I struggled to develop a list comprehension with a generator and settled on this since I knew it'd work and valued time more than looking good:
l = []
groups = ['a','a #2','a(Old)']
for i in groups:
    if ('#') in i: l.append(i[:i.index('#')].strip())
    elif ('(Old)') in i: l.append(i[:i.index('(Old)')].strip())
    else: l.append(i)
groups = set(l)
What's the slick way to get this result?
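Here is a general solution if you want to remove the parts listed in wastes from the elements of lst:
lst = ['a','a #2','a(Old)']
wastes = ['#', '(Old)']
# cut each element at every waste marker; min() picks the shortest of
# these prefixes, i.e. the cut at the earliest marker
cleaned_set = {
    min(element.split(waste)[0].strip() for waste in wastes)
    for element in lst
}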
You could write this whole expression in a single set comprehension
>>> groups = ['a','a #2','a(Old)']
>>> {i.split('#')[0].split('(Old)')[0].strip() for i in groups}
{'a'}
This will get everything preceding a # and everything preceding '(Old)', then trim off whitespace. The remainder is placed into a set, which only keeps unique values.
You could define a helper function to apply all of the splits and then use a set comprehension.
For example:
lst = ['a','a #2','a(Old)', 'b', 'b #', 'b(New)']
splits = {'#', '(Old)', '(New)'}
def split_all(a):
    for s in splits:
        a = a.split(s)[0]
    return a.strip()
groups = {split_all(a) for a in lst}
#{'a', 'b'}
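A regex variant of the same idea is also possible. The sketch below assumes everything from the first marker onward should be dropped. re.escape makes markers like '(Old)' match literally, and joining them with | matches whichever marker comes first:

```python
import re

lst = ['a', 'a #2', 'a(Old)', 'b', 'b #', 'b(New)']
markers = ['#', '(Old)', '(New)']

# One alternation of the escaped markers
pattern = re.compile('|'.join(map(re.escape, markers)))

# Keep the text before the first marker, trim whitespace, deduplicate via a set
groups = {pattern.split(s, maxsplit=1)[0].strip() for s in lst}
print(sorted(groups))  # ['a', 'b']
```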
Still super new to Python 3 and have encountered a problem... I am trying to create a function which returns a dictionary with the keys being the length of each word and the values being the words in the string.
For example, if my string is: "The dogs run quickly forward to the park", my function should return
{2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
Problem is that when I loop through the items, it is only appending one of the words in the string.
def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    unique_list = []
    for item in input_list:
        if item.lower() not in unique_list:
            unique_list.append(item.lower())
    for word in unique_list:
        dictionary[len(word)] = []
        dictionary[len(word)].append(word)
    return (dictionary)
print (word_len_dict("The dogs run quickly forward to the park"))
The code returns
{2: ['to'], 3: ['run'], 4: ['park'], 7: ['forward']}
Can someone point me in the right direction? Perhaps not giving me the answer freely, but what do I need to look at next in terms of adding the missing words to the list. I thought that appending them to the list would do it, but it's not.
Thank you!
This will solve all your problems:
def word_len_dict(my_string):
    input_list = my_string.split(" ")
    unique_set = set()
    dictionary = {}
    for item in input_list:
        word = item.lower()
        if word not in unique_set:
            unique_set.add(word)
            key = len(word)
            if key not in dictionary:
                dictionary[key] = []
            dictionary[key].append(word)
    return dictionary
You were wiping the dict entries each time you encountered a new word. There were also some efficiency problems: searching a list for membership while growing it turned an O(n) task into an O(n**2) algorithm. Replacing the list membership test with a set membership test fixes that.
It gives the correct output for your sample sentence:
>>> print(word_len_dict("The dogs run quickly forward to the park"))
{2: ['to'], 3: ['the', 'run'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
I noticed some of the other posted solutions are failing to map words to lowercase and/or failing to remove duplicates, which you clearly wanted.
You can first create the set of unique words like this, which avoids the first loop, and then populate the dictionary in a second step:
unique_string = set("The dogs run quickly forward to the park".lower().split(" "))
by_length = {}
for word in unique_string:
    key, value = len(word), word
    if key not in by_length:  # or by_length.keys() if you find it more readable (same result)
        by_length[key] = [value]
    else:
        by_length[key].append(value)
print(by_length)
You are assigning an empty list to the dictionary item before you append the latest word, which erases all previous words. One fix is to build the full list for each length in a single assignment:
for word in unique_list:
    dictionary[len(word)] = [x for x in input_list if len(x) == len(word)]
Your code is simply resetting the key to an empty list each time, which is why you only get one value (the last value) in the list for each key.
To make sure there are no duplicates, you can make each key's default value a set, a collection that enforces uniqueness (in other words, a set cannot contain duplicates).
def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    for word in input_list:
        if len(word) not in dictionary:
            dictionary[len(word)] = set()
        dictionary[len(word)].add(word.lower())
    return dictionary
Once you add that check, you can get rid of the first loop as well. Now it will work as expected.
You can also optimize the code further by using the setdefault method of dictionaries:
for word in input_list:
    dictionary.setdefault(len(word), set()).add(word.lower())
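collections.defaultdict is an alternative to setdefault. It creates the empty set for a new key automatically, so neither the membership check nor setdefault is needed. Here is a minimal sketch of the same set-based function:

```python
from collections import defaultdict

def word_len_dict(my_string):
    # Missing keys are created automatically as empty sets
    dictionary = defaultdict(set)
    for word in my_string.split():
        dictionary[len(word)].add(word.lower())
    return dict(dictionary)  # convert back to a plain dict for printing

result = word_len_dict("The dogs run quickly forward to the park")
print(result)
```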
A Pythonic way,
using itertools.groupby:
>>> import itertools
>>> my_str = "The dogs run quickly forward to the park"
>>> {x: list(y) for x, y in itertools.groupby(sorted(my_str.split(), key=len), key=len)}
{2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
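The words must be sorted by the same key first because groupby only groups consecutive items. This short illustration shows what happens without sorting:

```python
from itertools import groupby

words = "The dogs run quickly forward to the park".split()

# Unsorted: length 3 shows up in several separate runs
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=len)]
print(unsorted_groups)
# [(3, ['The']), (4, ['dogs']), (3, ['run']), (7, ['quickly', 'forward']), (2, ['to']), (3, ['the']), (4, ['park'])]

# Sorted by len first (sorted is stable, so original order is kept within a length)
sorted_groups = {k: list(g) for k, g in groupby(sorted(words, key=len), key=len)}
print(sorted_groups)
# {2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
```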
This option starts by creating a set of unique lowercase words and then uses dict's setdefault to avoid checking the dictionary keys separately. Because sets are unordered, the order of the words within each list may vary between runs.
>>> a = "The dogs run quickly forward to the park"
>>> b = {word.lower() for word in a.split()}
>>> result = {}
>>> for word in b:
...     result.setdefault(len(word), []).append(word)
...
>>> result
{2: ['to'], 3: ['the', 'run'], 4: ['park', 'dogs'], 7: ['quickly', 'forward']}
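If you want a deterministic result, with words in the order they first appear, one option is dict.fromkeys. It removes duplicates while preserving insertion order (guaranteed from Python 3.7). The sketch below combines it with setdefault:

```python
def word_len_dict(my_string):
    # dict.fromkeys keeps the first occurrence of each lowercase word, in order
    unique_words = dict.fromkeys(word.lower() for word in my_string.split())
    result = {}
    for word in unique_words:
        result.setdefault(len(word), []).append(word)
    return result

print(word_len_dict("The dogs run quickly forward to the park"))
# {3: ['the', 'run'], 4: ['dogs', 'park'], 7: ['quickly', 'forward'], 2: ['to']}
```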