Splitline Python String - python

I have a list of elements whose text is like the following:
aSampleElementText = "Vraj Shroff\nIndia"
I want two lists, where the first list's element would be "Vraj Shroff" and the second list's element would be "India".
I looked at other posts about split and splitlines. However, my code below is not giving me the expected results.
Output:
"V",
"r"
Desired output:
"Vraj Shroff",
"India"
My code:
personName = []  # first list
personTitle = []  # second list
for i in range(len(names) - 1):
    # names is a list of elements (example above)
    # it is len - 1 because I don't want to do this to the first element of the list
    i += 1
    temp = names[i].text
    temp.splitlines()
    personName.append(temp[0])
    personTitle.append(temp[1])

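temp is a string, and temp.splitlines() returns a new list instead of modifying temp. Because that return value is discarded, temp[0] and temp[1] are just the first two characters of the string. Hence you are getting this kind of output.
Assign the result instead, something like:
x = temp.splitlines()
x will be the list with the elements, so x[0] is the name and x[1] is the country.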
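As a minimal sketch of why the original loop printed single characters, the snippet below contrasts indexing the string with indexing the list that splitlines() returns. The variable names chars and parts are illustrative only and do not come from the question.

```python
temp = "Vraj Shroff\nIndia"

# Calling splitlines() without keeping the result leaves temp untouched,
# so indexing temp still yields single characters.
temp.splitlines()
chars = (temp[0], temp[1])
print(chars)  # ('V', 'r')

# Keeping the returned list gives one element per line.
parts = temp.splitlines()
print(parts)  # ['Vraj Shroff', 'India']
```

Strings are immutable in Python, so no string method can change temp in place; the result always has to be assigned to a name.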

names = []
locations = []
a = ["Vraj Shroff\nIndia", "Vraj\nIndia", "Shroff\nxyz", "abd cvd\nUS"]
for i in a:
    b = i.splitlines()
    names.append(b[0])
    locations.append(b[1])
print(names)
print(locations)
Output:
['Vraj Shroff', 'Vraj', 'Shroff', 'abd cvd']
['India', 'India', 'xyz', 'US']
Is this what you were looking for?

Related

Python Append from a list to another on a condition

I'm a Python newbie and I want to check whether each list element is present in another list (while respecting the index) and append that element to a third list. For example, if the first element of listy ("11-02-jeej") contains the first element of list_of_dates ("11-02"), I want "11-02-jeej" to be appended to the first list of a list of lists. The code below doesn't work for me.
The output that I want from this code is: [["11-02-jeej"], ["2apples"], []]
But instead I get: [[], [], []]
Thank you so much!
list_of_dates = ["11-02,", "2", "5"]
listy = ["11-02-jeej", "2apples", "d44"]
length = len(list_of_dates)
lst = [[] for m in range(length)]
for i in range(len(list_of_dates)):
    date = list_of_dates[i]
    for j in range(len(listy)):
        name = listy[j]
    if date in name:
        lst[m].append(name)
print(lst)
There are the following issues in your code:
The input has a comma in the first string: "11-02,". As you expect this to be a prefix, I suppose that trailing comma should not be there: "11-02"
The if statement should be inside the inner loop, since it needs the name variable that is assigned there.
m is not the correct index. It should be i, so you get: lst[i].append(name)
So here is your code with those corrections:
list_of_dates = ["11-02", "2", "5"]
listy = ["11-02-jeej", "2apples", "d44"]
length = len(list_of_dates)
lst = [[] for m in range(length)]
for i in range(len(list_of_dates)):
    date = list_of_dates[i]
    for j in range(len(listy)):
        name = listy[j]
        if date in name:
            lst[i].append(name)
print(lst)
Note that these loops can be written with list comprehension:
lst = [[s for s in listy if prefix in s] for prefix in list_of_dates]
Be aware that for the given example, "2" also occurs in "11-02-jeej", so you have both "11-02" and "2" giving a match, and so that will impact the result. If you wanted "2" to only match with "2apples", then you may want to test a match only at the start of a string, using .startswith().
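A minimal sketch of the .startswith() variant mentioned above, using the corrected input from this answer. The second form, which pairs elements by position with zip, is only an assumption about what "respecting the index" means in the question.

```python
list_of_dates = ["11-02", "2", "5"]
listy = ["11-02-jeej", "2apples", "d44"]

# For each prefix, keep only the strings that begin with it.
lst = [[s for s in listy if s.startswith(prefix)] for prefix in list_of_dates]
print(lst)  # [['11-02-jeej'], ['2apples'], []]

# Assumed alternative: compare only elements at the same position.
by_index = [[s] if s.startswith(prefix) else []
            for prefix, s in zip(list_of_dates, listy)]
print(by_index)  # [['11-02-jeej'], ['2apples'], []]
```

Both forms give the output the question asks for on this sample; they differ once a string in one position starts with a prefix from another position.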

Loop over each item in a row and compare with each item from another row then save the result in a new column

I want to loop in Python over each item in a row and compare it against the items in the corresponding row of another column.
If an item is not present in the row of the second column, it should be appended to a new list that will become another column (this should also eliminate duplicates when appending, through if i not in c).
The goal is to compare the items from each row of one column against the items from the corresponding row in another column, and to save the unique values from the first column in a new column of the same df.
[image: df columns]
This is just an example; I have many more items in each row.
I tried the code below, but nothing happened, and from what I have tested the conversion of the list into a column is not correct.
a = df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
c = []
for i in df.values:
    for i in a:
        if i in a:
            if i not in b:
                if i not in c:
                    c.append(i)
print(c)
df['new'] = pd.Series(c)
Any help is more than needed, thanks in advance
So seeing as you have these two variables one way would be:
a= df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
Try something like this:
new = {}
for index, items in enumerate(a):
    for thing in items:
        if thing not in b[index]:
            if index in new:
                new[index].append(thing)
            else:
                new[index] = [thing]
Then map the dictionary to the df.
df['new'] = df.index.map(new)
There are better ways to do it but this should work.
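One possible row-wise alternative, sketched with plain lists so it runs without pandas. The helper name unique_items_not_in is hypothetical, and the sample rows are illustrative values modeled on the two columns in the question.

```python
def unique_items_not_in(targets, restricts):
    """Return the items of targets that are missing from restricts,
    in their original order and without duplicates."""
    restricted = set(restricts)
    kept = []
    for item in targets:
        if item not in restricted and item not in kept:
            kept.append(item)
    return kept

# Illustrative rows standing in for df['final_key_concat'].tolist()
# and df['attributes_tokenize'].tolist().
a = [['Camiseta', 'Tecnica', 'hombre', 'barate'],
     ['deportivas', 'calcetin', 'hombres', 'deportivas', 'shoes']]
b = [['The', 'North', 'Face', 'manga'],
     ['deportivas', 'calcetin', 'shoes', 'North']]

# Compare each row of a only against the same row of b.
new_column = [unique_items_not_in(x, y) for x, y in zip(a, b)]
print(new_column)
# [['Camiseta', 'Tecnica', 'hombre', 'barate'], ['hombres']]
```

Because the result has one list per row, it could then be assigned back with something like df['new'] = new_column, assuming the DataFrame has the same number of rows.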
This should be what you want:
import pandas as pd
data = {'final_key_concat': [['Camiseta', 'Tecnica', 'hombre', 'barate'],
                             ['deportivas', 'calcetin', 'hombres', 'deportivas', 'shoes']],
        'attributes_tokenize': [['The', 'North', 'Face', 'manga'],
                                ['deportivas', 'calcetin', 'shoes', 'North']]}  # recreated from your image
df = pd.DataFrame(data)
a = df['final_key_concat'].tolist()  # this generates a list of lists
b = df['attributes_tokenize'].tolist()  # this also generates a list of lists
# Both a and b need to be flattened to access their elements the way you want
c = [itm for sblst in a for itm in sblst]  # flatten list a using a list comprehension
d = [itm for sblst in b for itm in sblst]  # flatten list b using a list comprehension
final_list = [itm for itm in c if itm not in d]  # keep the elements of c that do not appear in d
print(final_list)
Result
['Camiseta', 'Tecnica', 'hombre', 'barate', 'hombres']
def parse_str_into_list(s):
    if s.startswith('[') and s.endswith(']'):
        return ' '.join(s.strip('[]').strip("'").split("', '"))
    return s
def filter_restrict_words(row):
    targets = parse_str_into_list(row[0]).split(' ', -1)
    restricts = parse_str_into_list(row[1]).split(' ', -1)
    print(restricts)
    # loop over each word
    # use a list rather than a set so the words stay in order
    words_to_keep = []
    for word in targets:
        # condition to keep eligible words
        if word not in restricts and 3 < len(word) < 45 and word not in words_to_keep:
            words_to_keep.append(word)
    print(words_to_keep)
    return ' '.join(words_to_keep)
# col_target and col_restrict must already hold the names of the two columns to compare
df['FINAL_KEYWORDS'] = df[[col_target, col_restrict]].apply(lambda x: filter_restrict_words(x), axis=1)

How can I find all the indices of a string type item(contained in a sublist) that matches a given string?

I have a list that has string type items in its sublists.
mylist = [["Apple"],["Apple"],["Grapes", "Peach"],["Banana"],["Apple"], ["Apple", "Orange"]]
I want to get the indices of the sublists that contain only Apple.
This is what I have tried so far:
get_apple_indices = [i for i, x in enumerate(list(mylist)) if x == "Apple"]
print(get_apple_indices)
Actual output:
[]
Expected output:
[0,1,4]
Perhaps compare each element against a single-item list ['Apple'] instead of comparing a list object against a string.
get_apple_indices = [i for i, x in enumerate(list(mylist)) if x == ["Apple"]]
Assuming you really do need to match on string instead of a list for some reason, here is a solution.
From just-so-snippets:
match_string = "Apple"
get_matching_indices = [i for i, x in enumerate(list(mylist)) if len(x) == 1 and x[0] == match_string]
You can see that it checks for sublists that have a length of 1 (if it has only "Apple", then it must have a length of 1), then it checks to see if the first (only) item matches the string.
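The two approaches above can be wrapped in a small function, sketched below. The name indices_of_exact_sublist is hypothetical; the second comprehension is added only to contrast an exact match with a membership test.

```python
def indices_of_exact_sublist(nested, target):
    """Return the indices of sublists whose only item is target."""
    return [i for i, sub in enumerate(nested) if sub == [target]]

mylist = [["Apple"], ["Apple"], ["Grapes", "Peach"], ["Banana"],
          ["Apple"], ["Apple", "Orange"]]

only_apple = indices_of_exact_sublist(mylist, "Apple")
print(only_apple)  # [0, 1, 4]

# For contrast: sublists that contain the target among other items.
contains_apple = [i for i, sub in enumerate(mylist) if "Apple" in sub]
print(contains_apple)  # [0, 1, 4, 5]
```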

Extracting the first word from every value in a list

So I have a long list of column headers. All are strings, some are several words long. I've yet to find a way to write a function that extracts the first word from each value in the list and returns a list of just those singular words.
For example, this is what my list looks like:
['Customer ID', 'Email','Topwater -https:', 'Plastics - some uml']
And I want it to look like:
['Customer', 'Email', 'Topwater', 'Plastics']
I currently have this:
def first_word(cur_list):
    my_list = []
    for word in cur_list:
        my_list.append(word.split(' ')[:1])
and it returns None when I run it on a list.
You can use a list comprehension that takes the first element of each string after splitting it on whitespace.
my_list = [x.split()[0] for x in your_list]
To address "and it returns None when I run it on a list":
You didn't return my_list. The function builds a new list and leaves the original cur_list unchanged, so without a return statement the call evaluates to None.
To extract the first word from every value in a list:
Following @dfundako, you can simplify it to
my_list = [x.split()[0] for x in cur_list]
The final code would be
def first_word(cur_list):
    my_list = [x.split()[0] for x in cur_list]
    return my_list
Here is a demo. Please note that some punctuation may be left behind especially if it is right after the last letter of the name:
names = ["OMG FOO BAR", "A B C", "Python Strings", "Plastics: some uml"]
first_word(names) would be ['OMG', 'A', 'Python', 'Plastics:']
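If the trailing punctuation matters, one option is a regular expression that keeps only the leading run of word characters. This is a sketch, not part of the answer above; the helper name first_word_alnum is hypothetical, and note that \w also matches digits and underscores.

```python
import re

def first_word_alnum(strings):
    """Return the leading run of word characters from each string,
    or an empty string when there is none."""
    result = []
    for s in strings:
        match = re.match(r"\s*(\w+)", s)
        result.append(match.group(1) if match else "")
    return result

names = ["OMG FOO BAR", "A B C", "Python Strings", "Plastics: some uml"]
cleaned = first_word_alnum(names)
print(cleaned)  # ['OMG', 'A', 'Python', 'Plastics']
```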
>>> l = ['Customer ID', 'Email','Topwater -https://karls.azureedge.net/media/catalog/product/cache/1/image/627x470/9df78eab33525d08d6e5fb8d27136e95/f/g/fgh55t502_web.jpg', 'Plastics - https://www.bass.co.za/1473-thickbox_default/berkley-powerbait-10-power-worm-black-blue-fleck.jpg']
>>> list(next(zip(*map(str.split, l))))
['Customer', 'Email', 'Topwater', 'Plastics']
[column.split(' ')[0] for column in my_list] should do the trick.
And if you want it in a function:
def first_word(my_list):
    return [column.split(' ')[0] for column in my_list]
Alternatively, try using a regex such as ^\w+ in a loop to extract the first word of each string.

How to get the values in split python?

['column1:abc,def', 'column2:hij,klm', 'column3:xyz,pqr']
I want to get the values after the :. Currently, if I split, the result also includes column1, column2, column3, which I don't want. I want only the values.
This is similar to key-value pairs in a dictionary. The only difference is that it is a list of strings.
How should I split it?
EDITED
user_widgets = Widgets.objects.filter(user_id = user_id)
if user_widgets:
    for widgets in user_widgets:
        widgets_list = widgets.gadgets_list  # [u'column1:', u'column2:', u'column3:widget_basicLine']
        print [item.split(":")[1].split(',') for item in widgets_list]  # yields list index out of range
But when the widgets_list value is copied from the terminal and passed in directly, it runs correctly.
user_widgets = Widgets.objects.filter(user_id = user_id)
if user_widgets:
    for widgets in user_widgets:
        widgets_list = [u'column1:', u'column2:', u'column3:widget_basicLine']
        print [item.split(":")[1].split(',') for item in widgets_list]  # prints correctly
Where am I going wrong?
You can split items by ":", then split the item with index 1 by ",":
>>> l = ['column1:abc,def', 'column2:hij,klm', 'column3:xyz,pqr']
>>> [item.split(":")[1].split(',') for item in l]
[['abc', 'def'], ['hij', 'klm'], ['xyz', 'pqr']]
Nothing wrong with a 'for' loop and testing whether the right-hand side has actual data:
li=[u'column1:', u'column2:', u'column3:widget_basicLine', u'column4']
out=[]
for us in li:
    us1,sep,rest=us.partition(':')
    if rest.strip():
        out.append(rest)
print out # [u'widget_basicLine']
Which can be reduced to a list comprehension if you wish:
>>> li=[u'column1:', u'column2:', u'column3:widget_basicLine', u'column4']
>>> [e.partition(':')[2] for e in li if e.partition(':')[2].strip()]
[u'widget_basicLine']
And you can further split by the comma if you have data:
>>> li=[u'column1:', u'column2:a,b', u'column3:c,d', u'column4']
>>> [e.partition(':')[2].split(',') for e in li if e.partition(':')[2].strip()]
[[u'a', u'b'], [u'c', u'd']]
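Since the question compares the data to key-value pairs, the partition approach can also build a dictionary. The sketch below is written for Python 3 (the snippets above use Python 2 syntax), and the helper name parse_columns and the sample items list are assumptions for illustration.

```python
def parse_columns(items):
    """Map each 'name:v1,v2' string to a list of its non-empty values."""
    parsed = {}
    for item in items:
        # partition never raises: if ':' is missing, rest is just empty,
        # which avoids the IndexError that split(':')[1] can produce.
        name, _sep, rest = item.partition(":")
        parsed[name] = [value for value in rest.split(",") if value.strip()]
    return parsed

items = ["column1:abc,def", "column2:", "column3:widget_basicLine", "column4"]
result = parse_columns(items)
print(result)
# {'column1': ['abc', 'def'], 'column2': [], 'column3': ['widget_basicLine'], 'column4': []}
```

Keeping the empty lists makes it visible which columns had no widgets; filtering them out afterwards is a one-line dictionary comprehension if they are not wanted.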
