replace duplicate values in a list with white space - python

Say I have a sorted list, and I want to keep only the first occurrence of each value and blank out the repeats.
a = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
should be converted into
a = ['aa', ' ', ' ', 'bb', ' ', 'cc']
This may seem like an odd request. The reason is that I want a list of unique labels to use as the xticklabels of my seaborn heatmap. My list is very long (>1000 entries), and if I plot every value in it, the plot will be a disaster.

If the list is sorted, the simplest approach is to use itertools.groupby to convert each run of equal values, then stitch the results together:
from itertools import groupby
new_a = [x for k, v in groupby(a) for x in [k] + [' '] * (sum(1 for __ in v) - 1)]
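The one-liner above packs several steps together. Spelled out as a loop, it might look like the sketch below. The function name blank_repeats and the filler parameter are made up for illustration, and like the original it only collapses adjacent duplicates, so it assumes the list is sorted.

```python
from itertools import groupby

def blank_repeats(values, filler=" "):
    """Keep the first item of each run of equal values; blank out the rest."""
    result = []
    for key, group in groupby(values):
        run_length = sum(1 for _ in group)  # number of items in this run
        result.append(key)
        result.extend([filler] * (run_length - 1))
    return result

a = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
new_a = blank_repeats(a)
print(new_a)  # ['aa', ' ', ' ', 'bb', ' ', 'cc']
```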

Here's another approach that is easier to read.
org = None  # last distinct value seen so far
a = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
for i in range(len(a)):
    if a[i] == org:
        a[i] = " "
    else:
        org = a[i]
print(a)
Output:
['aa', ' ', ' ', 'bb', ' ', 'cc']

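One way is to use a Counter. This assumes the list is sorted, so that equal values are adjacent, and relies on Counter keeping its keys in first-seen order (Python 3.7+):
In [26]: a
Out[26]: ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
In [27]: from collections import Counter
In [28]: counter = Counter(a)
In [29]: data = []
In [30]: for i in counter:
    ...:     data.append(i)
    ...:     data.extend([" "] * (counter[i] - 1))
    ...:
In [31]: data
Out[31]: ['aa', ' ', ' ', 'bb', ' ', 'cc']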

a = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
newlist = []
for i in a:
    if i not in newlist:
        newlist.append(i)
    else:
        newlist.append('')
print(newlist)
>> ['aa', '', '', 'bb', '', 'cc']

First, create a new list:
new_a = []
Then append each element the first time it appears, and append a space for every later occurrence:
for i in a:
    if i not in new_a:
        new_a.append(i)
    else:
        new_a.append(" ")
print(new_a)
Output :
>> ['aa', ' ', ' ', 'bb', ' ', 'cc']
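Checking `i not in new_a` scans the whole output list on every iteration, which adds up for long lists such as the 1000+ entries mentioned in the question. A possible variant, assuming the labels are hashable, tracks the values already seen in a set. The names blank_seen and filler are illustrative only. Like the loop above, it blanks every later occurrence, adjacent or not, so it also works on unsorted input.

```python
def blank_seen(values, filler=" "):
    """Keep the first occurrence of each value; replace later ones with filler."""
    seen = set()
    result = []
    for value in values:
        if value in seen:
            result.append(filler)
        else:
            seen.add(value)
            result.append(value)
    return result

print(blank_seen(['aa', 'aa', 'aa', 'bb', 'bb', 'cc']))
# ['aa', ' ', ' ', 'bb', ' ', 'cc']
print(blank_seen(['aa', 'bb', 'aa']))
# ['aa', 'bb', ' ']
```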

Related

Splitting list of strings in a column of vaex dataframe

There is a vaex dataframe with a column such as:
df['col']
['aa', ' NO']
['aa', ' NO']
['aa', ' NO']
['aa', ' NO']
['aa', ' NO']
I want to convert this one column to two columns as follow:
df['col1', 'col2']
['aa'], [' NO']
['aa'], [' NO']
['aa'], [' NO']
['aa'], [' NO']
['aa'], [' NO']
Is there any way to do that in Vaex?
This is how I do it (not very clean, but it works). If you don't know where each word starts or ends in the string, you could use the find method to locate it:
df.head(10)
>>> col
>>> 0 ['aa', 'NO']
>>> 1 ['aa', 'NO']
>>> 2 ['aa', 'NO']
>>> 3 ['aa', 'NO']
>>> 4 ['aa', 'NO']
df['col1'] = [[x[1:5]] for x in df['col']]
df['col2'] = [[x[7:11]] for x in df['col']]
df.head(10)
>>> col col1 col2
>>> 0 ['aa', 'NO'] ['aa'] ['NO']
>>> 1 ['aa', 'NO'] ['aa'] ['NO']
>>> 2 ['aa', 'NO'] ['aa'] ['NO']
>>> 3 ['aa', 'NO'] ['aa'] ['NO']
>>> 4 ['aa', 'NO'] ['aa'] ['NO']
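The slices above rely on each value being a string with the words at fixed character positions. If the values really are strings shaped like Python list literals (an assumption; the question does not say how the column is stored), a plain-Python sketch can parse each one with ast.literal_eval instead of counting characters. This only illustrates the per-row logic on an ordinary list of strings. It is not Vaex-specific code, and split_pair is a made-up helper name.

```python
import ast

def split_pair(text):
    """Parse a string such as "['aa', ' NO']" into two one-element lists."""
    first, second = ast.literal_eval(text)
    return [first], [second]

rows = ["['aa', ' NO']", "['aa', ' NO']"]  # sample values, assumed to be strings
col1, col2 = zip(*(split_pair(row) for row in rows))
print(list(col1))  # [['aa'], ['aa']]
print(list(col2))  # [[' NO'], [' NO']]
```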

Python: how to remove key from list and keep value?

I have an array like this
myarr = [
[{'text':'da','id':'aa','info':'aaa'},{'text':'da','id':'aa','info':'aaa'},{'text':'da','id':'aa','info':'aaa'}],
[{'text':'da','id':'aa','info':'aaa'},{'text':'da','id':'aa','info':'aaa'},{'text':'da','id':'aa','info':'aaa'}]
]
I need result:
myarr = [
[['da','aa','aaa'],['da','aa','aaa'],['da','aa','aaa']],
[['da','aa','aaa'],['da','aa','aaa'],['da','aa','aaa']]
]
How can I get this result? Please help me!
You can try a list comprehension -
# l will iterate over each inner list and
# e will iterate over dictionaries in each inner list
myarr = [[list(e.values()) for e in l] for l in myarr]
print(myarr)
Output:
[[['da', 'aa', 'aaa'], ['da', 'aa', 'aaa'], ['da', 'aa', 'aaa']], [['da', 'aa', 'aaa'], ['da', 'aa', 'aaa'], ['da', 'aa', 'aaa']]]
For some variety, you could also use:
myarr = [[*map(list, map(dict.values, x))] for x in myarr]
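Both versions rely on every dictionary listing its keys in the same order (dictionaries preserve insertion order from Python 3.7 on). If that cannot be guaranteed, one option is to name the keys explicitly. The KEYS tuple and the shuffled sample data below are assumptions for illustration.

```python
KEYS = ('text', 'id', 'info')  # assumed output order, taken from the question's sample

myarr = [
    [{'text': 'da', 'id': 'aa', 'info': 'aaa'}, {'id': 'bb', 'info': 'bbb', 'text': 'db'}],
    [{'info': 'ccc', 'text': 'dc', 'id': 'cc'}],
]

# Look up each key by name, so the key order inside each dict does not matter.
result = [[[d[key] for key in KEYS] for d in inner] for inner in myarr]
print(result)
# [[['da', 'aa', 'aaa'], ['db', 'bb', 'bbb']], [['dc', 'cc', 'ccc']]]
```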

how to find all overlapping substrings of length k in a sample string in python

str1 = "ABCDEF"
I want to find a list of all substrings of length 3 in the above string including overlap
For example:
list1 = ['ABC','BCD','CDE','DEF']
I tried the following but it misses the overlap:
n = 3
lst = [str1[i:i+n] for i in range(0, len(str1), n)]
x = "ABCDEF"
print ([x[i:i+3] for i in range(len(x)-2)])
Output:
['ABC', 'BCD', 'CDE', 'DEF']
More generally:
x = "ABCDEF"
n = 2
print ([x[i:i+n] for i in range(len(x)-n+1)])
Output:
['AB', 'BC', 'CD', 'DE', 'EF']
Even more generally:
x = "ABCDEF"
for n in range(len(x)+1):
    print([x[i:i+n] for i in range(len(x)-n+1)])
Output:
['', '', '', '', '', '', '']
['A', 'B', 'C', 'D', 'E', 'F']
['AB', 'BC', 'CD', 'DE', 'EF']
['ABC', 'BCD', 'CDE', 'DEF']
['ABCD', 'BCDE', 'CDEF']
['ABCDE', 'BCDEF']
['ABCDEF']
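Wrapped up as a reusable function, the sliding-window idea could look like this sketch. The name substrings_of_length is hypothetical, and rejecting k < 1 is a design choice made here to avoid the row of empty strings that n = 0 produces above.

```python
def substrings_of_length(text, k):
    """Return every length-k substring of text, including overlapping ones."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # range() is empty when k > len(text), so the result is then [].
    return [text[i:i + k] for i in range(len(text) - k + 1)]

print(substrings_of_length("ABCDEF", 3))  # ['ABC', 'BCD', 'CDE', 'DEF']
print(substrings_of_length("ABCDEF", 7))  # []
```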

How to create sub list with fixed length from given number of inputs or list in Python?

I want to create sub-lists with fixed list length, from given number of inputs in Python.
For example, my inputs are: ['a','b','c',......'z']... Then I want to put those values in several lists. Each list length should be 6. So I want something like this:
first list = ['a','b','c','d','e','f']
second list = ['g','h','i','j','k','l']
last list = [' ',' ',' ',' ',' ','z' ]
How can I achieve this?
The smallest solution:
x = ["a","b","c","d","e","f","g","h","i","j"]
size = 3  # e.g. taken from user input
for counter in range(0, len(x), size):
    print(x[counter:counter+size])
This will split your list into 2 lists of equal length (6):
>>> my_list = [1, 'ab', '', 'No', '', 'NULL', 2, 'bc', '','Yes' ,'' ,'Null']
>>> x = my_list[:len(my_list)//2]
>>> y = my_list[len(my_list)//2:]
>>> x
[1, 'ab', '', 'No', '', 'NULL']
>>> y
[2, 'bc', '', 'Yes', '', 'Null']
If you want to split a list into many smaller lists, use:
chunks = [my_list[x:x+size] for x in range(0, len(my_list), size)]
Where size is the size of the smaller lists you want, example:
>>> size = 2
>>> chunks = [my_list[x:x+size] for x in range(0, len(my_list), size)]
>>> chunks
[[1, 'ab'], ['', 'No'], ['', 'NULL'], [2, 'bc'], ['', 'Yes'], ['', 'Null']]
>>> for item in chunks:
...     print(item)
...
[1, 'ab']
['', 'No']
['', 'NULL']
[2, 'bc']
['', 'Yes']
['', 'Null']
Your input is a string, and you need to split it first by comma, and then divide it further:
input_string = "1, 'ab', '', 'No', '', 'NULL', 2, 'bc', '','Yes' ,'' ,'Null'"
bits = input_string.split(',')
x,y = bits[:6],bits[6:] # divide by 6
x,y = bits[:len(bits)//2],bits[len(bits)//2:] # divide in half
This builds a 2D list b in which every sublist holds chunksize entries. Note that leftover elements that do not fill a whole chunk (here 'y' and 'z') are dropped:
a = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
b = []
chunksize = 6
def get_list(a, chunk):
    # Return the chunk-th slice of length chunksize.
    return a[chunk*chunksize:chunk*chunksize+chunksize]
for i in range(len(a) // chunksize):
    b.append(get_list(a, i))
print(b)
Output:
[['a', 'b', 'c', 'd', 'e', 'f'], ['g', 'h', 'i', 'j', 'k', 'l'], ['m', 'n', 'o', 'p', 'q', 'r'], ['s', 't', 'u', 'v', 'w', 'x']]
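A sketch that also keeps the final partial chunk (the 'y', 'z' tail of the 26-letter example) and pads it to the full length. Padding on the right with a single space is an assumption; the question's example appears to pad on the left. The name chunk_and_pad is illustrative.

```python
import string

def chunk_and_pad(items, size, filler=' '):
    """Split items into lists of length size, padding the last one if needed."""
    chunks = [list(items[i:i + size]) for i in range(0, len(items), size)]
    if chunks and len(chunks[-1]) < size:
        # Right-pad the trailing chunk so every chunk has the same length.
        chunks[-1].extend([filler] * (size - len(chunks[-1])))
    return chunks

letters = list(string.ascii_lowercase)
for chunk in chunk_and_pad(letters, 6):
    print(chunk)
# last line printed: ['y', 'z', ' ', ' ', ' ', ' ']
```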

Sorting a List of Strings, Ignoring ASCII Ordinal Values

I want to sort this list:
>>> L = ['A', 'B', 'C', ... 'Z', 'AA', 'AB', 'AC', ... 'AZ', 'BA' ...]
Exactly the way it is, regardless of the contents (assuming all CAPS alpha).
>>> L.sort()
>>> L
['A', 'AA', 'AB', 'AC'...]
How can I make this:
>>> L.parkinglot_sort()
>>> L
['A', 'B', 'C', ... ]
I was thinking of testing for length, and sorting each length, and mashing all the separate 1-length, 2-length, n-length elements of L into the new L.
Thanks!
What about this?
l.sort(key=lambda element: (len(element), element))
It will sort the list taking into account not only each element, but also its length.
>>> l = ['A', 'AA', 'B', 'BB', 'C', 'CC']
>>> l.sort(key=lambda element: (len(element), element))
>>> print(l)
['A', 'B', 'C', 'AA', 'BB', 'CC']
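The tuple key works because Python compares tuples element by element: length first, then the string itself as a tie-breaker. A small sketch with spreadsheet-style column names (sample data invented for illustration), using sorted() so the original list is left unchanged:

```python
columns = ['AA', 'B', 'Z', 'A', 'BA', 'AB']  # invented sample data

# Shorter names sort first; names of equal length sort alphabetically.
ordered = sorted(columns, key=lambda name: (len(name), name))
print(ordered)  # ['A', 'B', 'Z', 'AA', 'AB', 'BA']
```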
