Get the characters from a list of lists

I have this example:
example=[["hello i am adolf","hi my name is "],["this is a test","i like to play"]]
I want to get the following list of lists:
chars2=[['h', 'e', 'l', 'l', 'o', ' ', 'i', ' ', 'a', 'm', ' ', 'a', 'd', 'o', 'l', 'f','h', 'i', ' ', 'm', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's'],['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't', 'i', ' ', 'l', 'i', 'k', 'e', ' ', 't', 'o', ' ', 'p', 'l', 'a', 'y']]
I tried this:
chars2=[]
for list in example:
    for string in list:
        chars2.extend(string)
but I get the following:
['h', 'e', 'l', 'l', 'o', ' ', 'i', ' ', 'a', 'm', ' ', 'a', 'd', 'o', 'l', 'f', 'h', 'i', ' ', 'm', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's', ' ', 't', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't', 'i', ' ', 'l', 'i', 'k', 'e', ' ', 't', 'o', ' ', 'p', 'l', 'a', 'y']

For each list in example you need to add another list inside chars2. Currently you are extending chars2 directly with every character, so everything ends up in one flat list.
Example:
chars2=[]
for list in example:
    a = []
    chars2.append(a)
    for string in list:
        a.extend(string)
Demo:
>>> example=[["hello i am adolf","hi my name is "],["this is a test","i like to play"]]
>>> chars2=[]
>>> for list in example:
...     a = []
...     chars2.append(a)
...     for string in list:
...         a.extend(string)
...
>>> chars2
[['h', 'e', 'l', 'l', 'o', ' ', 'i', ' ', 'a', 'm', ' ', 'a', 'd', 'o', 'l', 'f', 'h', 'i', ' ', 'm', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's', ' '], ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 't', 'e', 's', 't', 'i', ' ', 'l', 'i', 'k', 'e', ' ', 't', 'o', ' ', 'p', 'l', 'a', 'y']]

Try a nested list comprehension, which builds one list of characters per inner list:
chars2 = [[char for item in sub for char in item] for sub in example]
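As a possible alternative, assuming the same `example` data as in the question, `itertools.chain.from_iterable` can flatten the strings of each inner list into a single list of characters. The name `chars_per_group` is only illustrative.

```python
from itertools import chain

example = [["hello i am adolf", "hi my name is "],
           ["this is a test", "i like to play"]]

# chain.from_iterable walks every string of one inner list character by character
chars_per_group = [list(chain.from_iterable(group)) for group in example]

print(len(chars_per_group))      # one character list per inner list
print(chars_per_group[1][:4])    # first four characters of the second group
```

The outer comprehension keeps the grouping, while the chain only flattens inside each group.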

Related

Joining individual elements of an array

I have an array consisting of labels, but each label has been broken down into individual characters. For example, these are the first 2 elements of the array:
array([['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', ''], ...
I would like it to be formatted as such:
array(['1. Identifying, Assessing and Improving Care',
'9. Non-Pharmacological Interventions', ...
I want to be able to iterate through the array and concatenate each label so the output is as shown above.
Any help in achieving this would be much appreciated :) Many thanks!

import numpy as np
k=np.array([['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', '']])
for x in k:
    print(''.join(x))
#output
1. Identifying, Assessing and Improving Care
9. Non-Pharmacological Interventions
Using List comprehension:
[''.join(x) for x in k]
#output
['1. Identifying, Assessing and Improving Care',
'9. Non-Pharmacological Interventions']

Considering the array as a list of lists, you could join all characters by looping through the list:
r = [['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', '']]
t = ["".join(i) for i in r]
print(t)
Output:
['1. Identifying, Assessing and Improving Care',
'9. Non-Pharmacological Interventions']

array = [['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', '']]
# array(['1. Identifying, Assessing and Improving Care',
# '9. Non-Pharmacological Interventions', ...
array = [''.join(i) for i in array]
print(array) #['1. Identifying, Assessing and Improving Care', '9. Non-Pharmacological Interventions']

Assuming from array([...]) that you are using numpy, here's a solution
import numpy as np
a = np.array([['1', '.', ' ', 'I', 'd', 'e', 'n', 't', 'i', 'f', 'y', 'i', 'n',
'g', ',', ' ', 'A', 's', 's', 'e', 's', 's', 'i', 'n', 'g', ' ',
'a', 'n', 'd', ' ', 'I', 'm', 'p', 'r', 'o', 'v', 'i', 'n', 'g',
' ', 'C', 'a', 'r', 'e', '', ''],
['9', '.', ' ', 'N', 'o', 'n', '-', 'P', 'h', 'a', 'r', 'm', 'a',
'c', 'o', 'l', 'o', 'g', 'i', 'c', 'a', 'l', ' ', 'I', 'n', 't',
'e', 'r', 'v', 'e', 'n', 't', 'i', 'o', 'n', 's', '', '', '',
'', '']])
b = np.empty(a.shape[0], dtype=object)
for i, x in enumerate(a):
    b[i] = ''.join(x)

If you loop over each row of your array, you can use str.join to get what you are looking for.
Something like:
arr = [['1', '.', ' ', 'I', ...], ...]
output = []
for x in arr:
    output.append(''.join(x))
print(output)
# ['1. Identifying, Assessing and Improving Care', ...]

python how to create many-to-many of lists inside one list

I have a variable like this:
message = "hello world"
and I want to put every 2 letters into their own list, like this:
list = [['h', 'e'], ['l', 'l'], ...]
I have tried this method:
message = "hello world"
x,test = 0,[[]] * len(message)
for i in message:
    if len(test[x]) >= 2:
        x += 1
        test[x].append(i)
    else:
        test[x].append(i)
but the result was that every list ended up containing all of hello world.

The problem here is that your outer list contains a reference to a single inner list, just repeated. You can see what I mean by taking your resulting test and reassigning the value of one of the elements:
>>> test[1][0] = 9999
>>> test
[[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'],
[9999, 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']]
So even though x is incrementing, you're still just appending to a single list object because your test variable is a list of repeated references to the same object.
You can get around this by using a comprehension to initialize your test variable:
test = [[] for _ in range(len(message))]
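You can also use zip and slicing to get what you want in a single line of code (zip stops at the shorter slice, so the last character of an odd-length string is dropped):
[[*z] for z in zip(message[0::2], message[1::2])]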
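The aliasing described above, and one way to chunk a string without it, can be sketched as follows. The helper name `chunk_pairs` is hypothetical and not part of the original answer. Unlike pairing with zip, it keeps a trailing single character when the length is odd.

```python
def chunk_pairs(text, size=2):
    """Split text into lists of `size` characters (hypothetical helper)."""
    return [list(text[i:i + size]) for i in range(0, len(text), size)]

# [[]] * n repeats one list object n times; a comprehension builds n separate lists
shared = [[]] * 3
separate = [[] for _ in range(3)]
shared[0].append('x')
separate[0].append('x')
print(shared)    # [['x'], ['x'], ['x']]
print(separate)  # [['x'], [], []]

print(chunk_pairs("hello world"))
# [['h', 'e'], ['l', 'l'], ['o', ' '], ['w', 'o'], ['r', 'l'], ['d']]
```

Slicing with a step avoids both the index bookkeeping and the pre-allocated list of empty lists.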

Removing specific characters from string

I am new to NLP and trying to do some pre-processing steps on my data for a classification task. I have already done most of the cleaning but there still are some special characters within the text that I am now trying to remove.
The text is in a Dataframe and is already tokenized and lemmatized, converted to lowercase, with no stopwords and no punctuation.
Each text record is represented by a list of words.
['â€‹â€˜the', 'redwood', 'massacreâ€™', 'five', 'adventurous', 'friend', 'visiting', 'legendary', 'murder', 'site', 'redwood', 'hallmark', 'exciting', 'thrilling', 'camping', 'weekend', 'away', 'soon', 'discover', 'theyâ€™re', 'people', 'mysterious', 'location', 'fun', 'camping', 'expedition', 'soon', 'turn', 'nightmare', 'sadistically', 'stalked', 'mysterious', 'unseen', 'killer']
I tried the following code and other solutions as well but I can't understand why the output splits the words into single letters instead of just removing the special character, leaving the words in a compact format.
def remove_character(text):
    new_text=[word.replace('€','') for word in text]
    return new_text
df["Column_name"]=df["Column_name"].apply(lambda x:remove_character(x))
After applying the function this is the output on the same text record:
"['[', ""'"", 'â', '', '‹', 'â', '', '˜', 't', 'h', 'e', ""'"", ',', ' ', ""'"", 'r', 'e', 'd', 'w', 'o', 'o', 'd', ""'"", ',', ' ', ""'"", 'm', 'a', 's', 's', 'a', 'c', 'r', 'e', 'â', '', '™', ""'"", ',', ' ', ""'"", 'f', 'i', 'v', 'e', ""'"", ',', ' ', ""'"", 'a', 'd', 'v', 'e', 'n', 't', 'u', 'r', 'o', 'u', 's', ""'"", ',', ' ', ""'"", 'f', 'r', 'i', 'e', 'n', 'd', ""'"", ',', ' ', ""'"", 'v', 'i', 's', 'i', 't', 'i', 'n', 'g', ""'"", ',', ' ', ""'"", 'l', 'e', 'g', 'e', 'n', 'd', 'a', 'r', 'y', ""'"", ',', ' ', ""'"", 'm', 'u', 'r', 'd', 'e', 'r', ""'"", ',', ' ', ""'"", 's', 'i', 't', 'e', ""'"", ',', ' ', ""'"", 'r', 'e', 'd', 'w', 'o', 'o', 'd', ""'"", ',', ' ', ""'"", 'h', 'a', 'l', 'l', 'm', 'a', 'r', 'k', ""'"", ',', ' ', ""'"", 'e', 'x', 'c', 'i', 't', 'i', 'n', 'g', ""'"", ',', ' ', ""'"", 't', 'h', 'r', 'i', 'l', 'l', 'i', 'n', 'g', ""'"", ',', ' ', ""'"", 'c', 'a', 'm', 'p', 'i', 'n', 'g', ""'"", ',', ' ', ""'"", 'w', 'e', 'e', 'k', 'e', 'n', 'd', ""'"", ',', ' ', ""'"", 'a', 'w', 'a', 'y', ""'"", ',', ' ', ""'"", 's', 'o', 'o', 'n', ""'"", ',', ' ', ""'"", 'd', 'i', 's', 'c', 'o', 'v', 'e', 'r', ""'"", ',', ' ', ""'"", 't', 'h', 'e', 'y', 'â', '', '™', 'r', 'e', ""'"", ',', ' ', ""'"", 'p', 'e', 'o', 'p', 'l', 'e', ""'"", ',', ' ', ""'"", 'm', 'y', 's', 't', 'e', 'r', 'i', 'o', 'u', 's', ""'"", ',', ' ', ""'"", 'l', 'o', 'c', 'a', 't', 'i', 'o', 'n', ""'"", ',', ' ', ""'"", 'f', 'u', 'n', ""'"", ',', ' ', ""'"", 'c', 'a', 'm', 'p', 'i', 'n', 'g', ""'"", ',', ' ', ""'"", 'e', 'x', 'p', 'e', 'd', 'i', 't', 'i', 'o', 'n', ""'"", ',', ' ', ""'"", 's', 'o', 'o', 'n', ""'"", ',', ' ', ""'"", 't', 'u', 'r', 'n', ""'"", ',', ' ', ""'"", 'n', 'i', 'g', 'h', 't', 'm', 'a', 'r', 'e', ""'"", ',', ' ', ""'"", 's', 'a', 'd', 'i', 's', 't', 'i', 'c', 'a', 'l', 'l', 'y', ""'"", ',', ' ', ""'"", 's', 't', 'a', 'l', 'k', 'e', 'd', ""'"", ',', ' ', ""'"", 'm', 'y', 's', 't', 'e', 'r', 'i', 'o', 'u', 's', ""'"", ',', ' ', ""'"", 'u', 'n', 's', 'e', 'e', 'n', ""'"", ',', 
' ', ""'"", 'k', 'i', 'l', 'l', 'e', 'r', ""'"", ']']"

It seems you have single words in cells like this
$ df.head()
Column_name
0 â€‹â€˜the
1 redwood
2 massacreâ€™
3 five
4 adventurous
so you shouldn't use for word in text: iterating over a string yields its characters one by one, so it behaves like for char in text.
Instead, call replace() directly inside apply(), which runs it on every cell (similar to a for loop):
df["Column_name"] = df["Column_name"].apply(lambda word: word.replace('€',''))
Minimal working example (so everyone can copy and run it)
import pandas as pd
def remove_character(text):
    return [word.replace('€', '') for word in text]
df = pd.DataFrame({'Column_name': ['â€‹â€˜the', 'redwood', 'massacreâ€™', 'five', 'adventurous', 'friend', 'visiting', 'legendary', 'murder', 'site', 'redwood', 'hallmark', 'exciting', 'thrilling', 'camping', 'weekend', 'away', 'soon', 'discover', 'theyâ€™re', 'people', 'mysterious', 'location', 'fun', 'camping', 'expedition', 'soon', 'turn', 'nightmare', 'sadistically', 'stalked', 'mysterious', 'unseen', 'killer']})
print(df.head())
df["Column_name"] = df["Column_name"].apply(lambda word: word.replace('€',''))
#df["Column_name"] = df["Column_name"].apply(lambda x:remove_character(x))
print(df.head())
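The character-by-character result from the question can be reproduced without pandas. This is only a sketch under one assumption: that a cell holds the text form of a list rather than a real list, which the leading '[' and quote characters in the question's output seem to suggest. The names `cell_as_list` and `cell_as_text` are illustrative.

```python
import ast

def remove_character(words, char='€'):
    """Strip `char` from every item of `words`."""
    return [word.replace(char, '') for word in words]

cell_as_list = ['massacreâ€™', 'five']
cell_as_text = str(cell_as_list)   # assumption: the cell stores the list as text

# Iterating over a string yields single characters, so the result is per character
print(remove_character(cell_as_text)[:3])                 # ['[', "'", 'm']

# Parsing the text back into a real list first keeps the words intact
print(remove_character(ast.literal_eval(cell_as_text)))   # ['massacreâ™', 'five']
```

If the cells turn out to hold single words instead, calling replace() on the cell value directly is enough.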

Your remove_character function should return a string rather than a list. However, pandas includes the str accessor on a Series to perform operations on strings so another option you can use is
df["Column_name"] = df["Column_name"].str.replace('€','')
(no need to use apply)

How to print a list without [ , '' in python

I want to print
*IBM is a trademark of the International Business Machine Corporation.
in python instead of this
['*', 'I', 'B', 'M', ' ', 'i', 's', ' ', 'a', ' ', 't', 'r', 'a', 'd', 'e', 'm', 'a', 'r', 'k', ' ', 'o', 'f', ' ', 't', 'h', 'e', ' ', 'I', 'n', 't', 'e', 'r', 'n', 'a', 't', 'i', 'o', 'n', 'a', 'l', ' ', 'B', 'u', 's', 'i', 'n', 'e', 's', 's', ' ', 'M', 'a', 'c', 'h', 'i', 'n', 'e', ' ', 'C', 'o', 'r', 'p', 'o', 'r', 'a', 't', 'i', 'o', 'n', '.']
My code:
n=str(input())
l=len(n)
m=[' ']*l
for i in range(l):
    m[i]=chr(ord(n[i])-7)
print(m)

Assuming that your list is:
this_is_a_list = ['*', 'I', 'B', 'M', ' ', 'i', 's', ' ', 'a', ' ', 't', 'r', 'a', 'd', 'e', 'm', 'a', 'r', 'k', ' ', 'o', 'f', ' ', 't', 'h', 'e', ' ', 'I', 'n', 't', 'e', 'r', 'n', 'a', 't', 'i', 'o', 'n', 'a', 'l', ' ', 'B', 'u', 's', 'i', 'n', 'e', 's', 's', ' ', 'M', 'a', 'c', 'h', 'i', 'n', 'e', ' ', 'C', 'o', 'r', 'p', 'o', 'r', 'a', 't', 'i', 'o', 'n', '.']
use join:
''.join(this_is_a_list)
Extended, in case you plan to use the string later. The following method is extremely inefficient, but I'm leaving it here as a showcase of what not to do (thanks to @PM 2Ring):
# BAD EXAMPLE, AVOID THIS METHOD
final_word = ""
for i in range(len(this_is_a_list)):
    final_word = final_word + this_is_a_list[i]
print(final_word)
Edited further, thanks to @kuro:
final_word = ''.join(this_is_a_list)

Use join
x = ['*', 'I', 'B', 'M', ' ', 'i', 's', ' ', 'a', ' ', 't', 'r', 'a', 'd', 'e', 'm', 'a', 'r', 'k', ' ', 'o', 'f', ' ', 't', 'h', 'e', ' ', 'I', 'n', 't', 'e', 'r', 'n', 'a', 't', 'i', 'o', 'n', 'a', 'l', ' ', 'B', 'u', 's', 'i', 'n', 'e', 's', 's', ' ', 'M', 'a', 'c', 'h', 'i', 'n', 'e', ' ', 'C', 'o', 'r', 'p', 'o', 'r', 'a', 't', 'i', 'o', 'n', '.']
print(''.join(x))
*IBM is a trademark of the International Business Machine Corporation.

The sensible way to do this is to use .join. And to perform the decoding operation you can loop directly over the chars of the input string rather than using indices.
s = input('> ')
a = []
for u in s:
    c = chr(ord(u) - 7)
    a.append(c)
print(''.join(a))
demo
> 1PIT'pz'h'{yhklthyr'vm'{ol'Pu{lyuh{pvuhs'I|zpulzz'Thjopul'Jvywvyh{pvu5
*IBM is a trademark of the International Business Machine Corporation.
We can make this much more compact by using a list comprehension.
s = input('> ')
print(''.join([chr(ord(u)-7) for u in s]))
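The same decoding step can be wrapped in a small reusable function. The sketch below uses a hypothetical helper, `shift_text`, and a shortened piece of the demo input above; it passes a generator expression straight to join, so no intermediate list is needed.

```python
def shift_text(text, offset):
    """Shift every character's code point by `offset` (hypothetical helper)."""
    return ''.join(chr(ord(ch) + offset) for ch in text)

encoded = "1PIT'pz'h'{yhklthyr"
decoded = shift_text(encoded, -7)
print(decoded)                              # *IBM is a trademark
print(shift_text(decoded, 7) == encoded)    # True
```

Shifting by the opposite offset restores the original text, which makes it easy to check the round trip.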

You can try this one
to_print = ['*', 'I', 'B', 'M', ' ', 'i', 's', ' ', 'a', ' ', 't', 'r', 'a', 'd', 'e', 'm', 'a', 'r', 'k', ' ', 'o', 'f', ' ', 't', 'h', 'e', ' ', 'I', 'n', 't', 'e', 'r', 'n', 'a', 't', 'i', 'o', 'n', 'a', 'l', ' ', 'B', 'u', 's', 'i', 'n', 'e', 's', 's', ' ', 'M', 'a', 'c', 'h', 'i', 'n', 'e', ' ', 'C', 'o', 'r', 'p', 'o', 'r', 'a', 't', 'i', 'o', 'n', '.']
word = ''
for i in range(len(to_print)):
    word = word + to_print[i]
print(word)

Complex Python List Comprehension

Can somebody explain how exactly the list comprehension works here?
page = 'one two one three\n' * 10
unique_words = list(word for line in page for word in line.split())
print(unique_words)
OUTPUT
['o', 'n', 'e', 't', 'w', 'o', 'o', 'n', 'e', 't', 'h', 'r', 'e', 'e', 'o', 'n', 'e', 't', 'w', 'o', 'o', 'n', 'e', 't', 'h', 'r', 'e', 'e', 'o', 'n', 'e', 't', 'w', 'o', 'o', 'n', 'e', 't', 'h', 'r', 'e', 'e']
I am confused over where the variables are declared and where they are used?
e.g. Initially we only know about page as a string,
line in page -> should return each character from the string.
word in line.split() -> is removing '\n' and whitespaces and returning each character
and hence the output. But I still don't understand the way of writing it so that the compiler understands what I want.
QUESTION: How exactly is word for line in page for word in line.split() processed by the compiler step by step?

You need to see the double for loops as nested, from left to right:
for line in page:
    for word in line.split():
        word
You have one long string going in, so for line in page loops over each individual character; line is one character at a time. Splitting that character gives you a list with just that one character, unless that character is whitespace (space, newline, tab, etc.):
>>> page = 'one two one three\n' * 10
>>> list(page)
['o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n', 'o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n', 'o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n', 'o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n', 'o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n', 'o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n', 'o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n', 'o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n', 'o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n', 'o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e', ' ', 't', 'h', 'r', 'e', 'e', '\n']
>>> page[0].split()
['o']
>>> page[3].split()
[]
so the end result is a list with individual characters.
Note that technically speaking you have a generator expression feeding a list() call; the output is the same as a list comprehension would produce, however. You'd get a list comprehension if you replaced list(...) with [...].
If you wanted unique words, use a set() instead and just a simple str.split() call, no need for looping:
unique_words = set(page.split())
str.split() will already split your sentences into words on all whitespace, including the newlines; set() removes any duplicates:
>>> set(page.split())
{'two', 'one', 'three'}

You read that left to right:
[word for line in page for word in line.split()]
is the same as:
mylist=[]
for line in page:
    for word in line.split():
        mylist.append(word)
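To see the difference the outer iterable makes, here is a short sketch that runs the same comprehension once over the string itself and once over its lines. It assumes a shorter `page` than in the question, and the names `per_char` and `per_word` are illustrative.

```python
page = 'one two one three\n' * 2

# Iterating over the string itself: `line` is one character at a time
per_char = [word for line in page for word in line.split()]

# Iterating over real lines: `line` is a whole line, so split() yields words
per_word = [word for line in page.splitlines() for word in line.split()]

print(per_char[:6])   # ['o', 'n', 'e', 't', 'w', 'o']
print(per_word)       # ['one', 'two', 'one', 'three', 'one', 'two', 'one', 'three']
```

Whitespace characters vanish in the first version because splitting a lone space or newline returns an empty list.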
