Python: how to remove the keys from a list of dicts and keep the values?

I have an array like this
myarr = [
[{'text':'da','id':'aa','info':'aaa'},{'text':'da','id':'aa','info':'aaa'},{'text':'da','id':'aa','info':'aaa'}],
[{'text':'da','id':'aa','info':'aaa'},{'text':'da','id':'aa','info':'aaa'},{'text':'da','id':'aa','info':'aaa'}]
]
I need this result:
myarr = [
[['da','aa','aaa'],['da','aa','aaa'],['da','aa','aaa']],
[['da','aa','aaa'],['da','aa','aaa'],['da','aa','aaa']]
]
How can I get this result? Please help me!

You can try a list comprehension -
# l will iterate over each inner list and
# e will iterate over dictionaries in each inner list
myarr = [[list(e.values()) for e in l] for l in myarr]
print(myarr)
Output:
[[['da', 'aa', 'aaa'], ['da', 'aa', 'aaa'], ['da', 'aa', 'aaa']], [['da', 'aa', 'aaa'], ['da', 'aa', 'aaa'], ['da', 'aa', 'aaa']]]

For some variety, you could also use:
myarr = [[*map(list, map(dict.values, x))] for x in myarr]
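If you only need certain keys, or want to guarantee the order of the values, you can pull them out by explicit key instead of relying on dict insertion order. A minimal sketch, assuming the keys are always 'text', 'id' and 'info' as in the example:
myarr = [
    [{'text': 'da', 'id': 'aa', 'info': 'aaa'}, {'text': 'da', 'id': 'aa', 'info': 'aaa'}],
    [{'text': 'da', 'id': 'aa', 'info': 'aaa'}]
]
keys = ['text', 'id', 'info']  # explicit key order
result = [[[d[k] for k in keys] for d in inner] for inner in myarr]
print(result)
# [[['da', 'aa', 'aaa'], ['da', 'aa', 'aaa']], [['da', 'aa', 'aaa']]]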

Related

Get the list of values/entries that are common to all the columns in a dataframe (or a csv file) in python

I have this setup of a pandas dataframe (or a csv file):
df = {
'colA': ['aa', 'bb', 'cc', 'dd', 'ee'],
'colB': ['aa', 'bb', 'dd', 'qq', 'ee'],
'colC': ['aa', 'bb', 'cc', 'ee', 'dd'],
'colD': ['aa', 'bb', 'ee', 'cc', 'dd']
}
The goal is to get a list/column with the set of values that appear in all the columns; in other words, the entries that are common to all the columns.
Required output:
col
aa
bb
dd
ee
or an output with the list of common values:
common_list = ['aa', 'bb', 'dd', 'ee']
I have a silly solution, but it doesn't seem to be correct, as I am not getting what I want when I apply it to my dataframe:
import pandas as pd
df = pd.read_csv('Bus Names Concat test.csv') #i/p csv file (pandas df converted into csv)
df = df.stack().value_counts()
core_list = df[df>2].index.tolist() #defining the common list as core list
print(len(core_list))
df_core = pd.DataFrame(core_list)
print(df_core)
Any help/suggestion/feedback to get the required output will be appreciated.
In your case
s = df.melt().groupby('value')['variable'].nunique()
outlist = s[s==4].index.tolist()
Out[307]: ['aa', 'bb', 'dd', 'ee']
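The hard-coded 4 is the number of columns. If that can vary, a minimal sketch of the same idea keyed to df.shape[1], assuming the dictionary from the question is turned into a DataFrame:
import pandas as pd

df = pd.DataFrame({
    'colA': ['aa', 'bb', 'cc', 'dd', 'ee'],
    'colB': ['aa', 'bb', 'dd', 'qq', 'ee'],
    'colC': ['aa', 'bb', 'cc', 'ee', 'dd'],
    'colD': ['aa', 'bb', 'ee', 'cc', 'dd'],
})

# count in how many distinct columns each value appears
s = df.melt().groupby('value')['variable'].nunique()
# keep only the values that appear in every column
common_list = s[s == df.shape[1]].index.tolist()
print(common_list)  # ['aa', 'bb', 'dd', 'ee']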
You can use the .intersection() method of sets to find the common values between the sets of each column:
# wrapped in a list, take first column set and pass sets of other columns as arguments
common_list = list(set(df.colA).intersection(set(df.colB), set(df.colC), set(df.colD)))
common_list = sorted(common_list)  # sort alphabetically
print(common_list)
Output:
['aa', 'bb', 'dd', 'ee']
Alternatively, for an unspecified number of columns and without sorting:
# take the first column's set and intersect it with the sets of the remaining columns
common_list = list(
    set(df[df.columns[0]]).intersection(
        *(set(df[col]) for col in df.columns[1:])  # unpack a generator of the remaining columns' sets
    )
)
common_list
Output:
['dd', 'aa', 'ee', 'bb'] # sets are unordered, so the result comes out in a different order
Transpose the dataframe so columns are rows and rows are columns.
Convert the values to a list of lists.
Map each sublist as a Set and unpack the Intersection of the Sets to a List of unique values.
common_list = list(set.intersection(*map(set, df.values.transpose().tolist())))
print(common_list)
['aa', 'bb', 'dd', 'ee']
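The same idea can also be written with functools.reduce, folding set.intersection over one set per column; a minimal sketch, assuming df is the DataFrame built from the dictionary in the question:
from functools import reduce

# intersect the sets of all columns, two at a time
common_list = list(reduce(set.intersection, (set(df[col]) for col in df.columns)))
print(common_list)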

How to combine words from lists into one?

list1 = ['a', 'b', 'c']
list2 = ['d', 'e', 'f']
How can I combine these two lists into one so that the output is
['ad', 'be', 'cf']
You should use zip in conjunction with a list comprehension as follows:
list1 = ['a', 'b', 'c']
list2 = ['d', 'e', 'f']
list3 = [a+b for a, b in zip(list1, list2)]
print(list3)
Output:
['ad', 'be', 'cf']
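Note that zip stops at the shortest input. If the two lists can have different lengths and you want to keep the leftover items, itertools.zip_longest is one option; a minimal sketch, assuming an empty-string fill value is acceptable:
from itertools import zip_longest

list1 = ['a', 'b', 'c', 'x']
list2 = ['d', 'e', 'f']
# pad the shorter list with '' instead of silently dropping 'x'
list3 = [a + b for a, b in zip_longest(list1, list2, fillvalue='')]
print(list3)  # ['ad', 'be', 'cf', 'x']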
The following code will work:
list1 = ['a', 'b', 'c']
list2 = ['d', 'e', 'f']
list3 = []
for i in range(len(list1)):
    list3.append(list1[i] + list2[i])
print(list3)
Output:
['ad', 'be', 'cf']
Explanation:
We first create a new list, list3, that will store the combination of list1 and list2. To combine the two lists, we use a for loop to iterate over every index of list1 and list2, and use .append() to add the concatenation of the i-th elements of list1 and list2 to list3.
I hope this helped! Please let me know if you have any further questions or clarifications :)
list1 = ['a', 'b', 'c']
list2 = ['d', 'e', 'f']
for i in range(len(list1)):
    list1[i] += list2[i]
print(list1)
#['ad', 'be', 'cf']
You can use zip to iterate over several iterables in parallel, producing tuples with an item from each one, and then concatenate each tuple using "".join(sequence).
In [2]: ["".join(i) for i in zip(list1, list2)]
Out[2]: ['ad', 'be', 'cf']

Comparing list elements to sublist elements in Pandas

df
col1 col2
['aa', 'bb', 'cc', 'dd'] [['ee', 'ff', 'gg', 'hh'], ['qq', 'ww', 'ee', 'rr']]
['ss', 'dd', 'ff', 'gg'] [['mm', 'nn', 'vv', 'cc'], ['zz', 'aa', 'jj', 'kk']]
['ss', 'dd'] [['mm', 'nn', 'vv', 'cc'], ['zz', 'aa', 'jj', 'kk']]
I'd like to run a function that concatenates the first element of the list in col1 with the first element of each sublist in col2 (there are multiple sublists), the second element of the col1 list with the second element of each sublist, and so on.
Results would be like this column:
results
[['aaee', 'bbff', 'ccgg', 'ddhh'],['aaqq', 'bbww', 'ccee', 'ddrr']]
[['ssmm', 'ddnn', 'ffvv', 'ggcc'],['sszz', 'ddaa', 'ffjj', 'ggkk']]
[['ssmm', 'ddnn'],['sszz', 'ddaa']]
I'm thinking it has something to do with looping through the elements in col1 and matching them to the corresponding items in each sublist in col2. How can I do this?
Converted code:
[[[df1.agg(lambda x: get_top_matches(u, w), axis=1) for u, w in zip(x, v)]
  for v in y]
 for x, y in zip(df1['parent_org_name_list'], df1['children_org_name_sublists'])]
You can just use zip here:
[[[u+w for u,w in zip(x,v)] for v in y] for x,y in zip(df['col1'], df['col2'])]
Output:
[[['aaee', 'bbff', 'ccgg', 'ddhh'], ['aaqq', 'bbww', 'ccee', 'ddrr']],
[['ssmm', 'ddnn', 'ffvv', 'ggcc'], ['sszz', 'ddaa', 'ffjj', 'ggkk']],
[['ssmm', 'ddnn'], ['sszz', 'ddaa']]]
To assign back to your dataframe, you can do:
df['results'] = [[[u+w for u,w in zip(x,v)] for v in y]
for x,y in zip(df['col1'], df['col2'])]
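If the triple-nested comprehension is hard to follow, the same logic can be unpacked into a helper function; a minimal sketch (combine_row is just an illustrative name):
def combine_row(col1_list, col2_sublists):
    # concatenate col1 element-wise with each sublist of col2
    return [[u + w for u, w in zip(col1_list, sub)] for sub in col2_sublists]

df['results'] = [combine_row(x, y) for x, y in zip(df['col1'], df['col2'])]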
Max, try this solution with a loop. It allows finer control over the transformation, including dealing with uneven lengths (see len_limit in the example):
import pandas as pd
df = pd.DataFrame({'c1': [['aa', 'bb', 'cc', 'dd'], ['ss', 'dd', 'ff', 'gg']],
                   'c2': [[['ee', 'ff', 'gg', 'hh'], ['qq', 'ww', 'ee', 'rr']],
                          [['mm', 'nn', 'vv', 'cc'], ['zz', 'aa', 'jj', 'kk']]]})
df['c3'] = 'empty'  # assign a string to 'c3' so the column has object dtype
print(df)
c1 c2 c3
0 [aa, bb, cc, dd] [[ee, ff, gg, hh], [qq, ww, ee, rr]] empty
1 [ss, dd, ff, gg] [[mm, nn, vv, cc], [zz, aa, jj, kk]] empty
for i, row in df.iterrows():
    c3_list = []
    len_limit = len(row['c1'])  # cap each sublist at the length of the 'c1' list
    for c2_sublist in row['c2']:
        c3_list.append([j1 + j2 for j1, j2 in zip(row['c1'], c2_sublist[:len_limit])])
    df.at[i, 'c3'] = c3_list
print (df['c3'])
0 [[aaee, bbff, ccgg, ddhh], [aaqq, bbww, ccee, ...
1 [[ssmm, ddnn, ffvv, ggcc], [sszz, ddaa, ffjj, ...
Name: c3, dtype: object
Try:
df["results"] = df[["col1", "col2"]].apply(lambda x: [list(map(''.join, zip(x["col1"], el))) for el in x["col2"]], axis=1)
Outputs:
>>> df["results"]
0 [[aaee, bbff, ccgg, ddhh], [aaqq, bbww, ccee, ...
1 [[ssmm, ddnn, ffvv, ggcc], [sszz, ddaa, ffjj, ...
2 [[ssmm, ddnn], [sszz, ddaa]]

replace duplicate values in a list with white space

Say I have a sorted list, and I want to keep only the first occurrence of each value.
a = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
shall be converted into
a = ['aa', ' ', ' ', 'bb', ' ', 'cc']
This may seem like a very odd request. The reason is that I want a de-duplicated label list to use as the xticklabels of a seaborn heatmap. My list is very long (>1000 items), and if I label every value, the plot becomes unreadable.
If the list is sorted, the simplest approach is to use itertools.groupby to transform each run of equal values, then stitch the runs back together:
from itertools import groupby
new_a = [x for k, v in groupby(a) for x in [k] + [' '] * (sum(1 for __ in v) - 1)]
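If the one-liner is hard to read, the same groupby idea can be written as an explicit loop; a minimal sketch:
from itertools import groupby

a = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
new_a = []
for key, group in groupby(a):
    run_length = sum(1 for _ in group)      # size of this run of equal values
    new_a.append(key)                       # keep the first occurrence
    new_a.extend([' '] * (run_length - 1))  # blank out the rest
print(new_a)  # ['aa', ' ', ' ', 'bb', ' ', 'cc']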
Here's another, more readable approach.
org = None
a = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
for i in range(len(a)):
    if a[i] == org:
        a[i] = " "
    else:
        org = a[i]
print(a)
Output:
['aa', ' ', ' ', 'bb', ' ', 'cc']
One way is to use collections.Counter:
In [26]: a
Out[26]: ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
In [27]: from collections import Counter
In [28]: counter = Counter(a)
In [29]: data = []
In [30]: for i in counter:
    ...:     data.append(i)
    ...:     data.extend([" "] * (counter[i] - 1))
    ...:
In [31]: data
Out[31]: ['aa', ' ', ' ', 'bb', ' ', 'cc']
a = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
newlist = []
for i in a:
    if i not in newlist:
        newlist.append(i)
    else:
        newlist.append('')
print(newlist)
>> ['aa', '', '', 'bb', '', 'cc']
First, create a new list,
new_a = []
Then keep the first occurrence of each element and replace the later occurrences with whitespace:
for i in a:
    if i not in new_a:
        new_a.append(i)
    else:
        new_a.append(" ")
print(new_a)
Output :
>> ['aa', ' ', ' ', 'bb', ' ', 'cc']
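Since the motivation was seaborn xticklabels, here is a minimal sketch of how the blanked-out list can be passed to a heatmap (the toy data matrix is an assumption made up purely for illustration):
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from itertools import groupby

labels = ['aa', 'aa', 'aa', 'bb', 'bb', 'cc']
ticklabels = [k if i == 0 else ' '
              for k, group in groupby(labels)
              for i, _ in enumerate(group)]

data = np.random.rand(4, len(labels))      # toy matrix, one column per label
sns.heatmap(data, xticklabels=ticklabels)  # xticklabels accepts a list-like
plt.show()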

Sorting a List of Strings, Ignoring ASCII Ordinal Values

I want to sort this list:
>>> L = ['A', 'B', 'C', ... 'Z', 'AA', 'AB', 'AC', ... 'AZ', 'BA' ...]
Exactly the way it is, regardless of the contents (assuming all CAPS alpha).
>>> L.sort()
>>> L
['A', 'AA', 'AB', 'AC'...]
How can I make this:
>>> L.parkinglot_sort()
>>> L
['A', 'B', 'C', ... ]
I was thinking of grouping the elements by length, sorting each length group separately, and then concatenating the 1-character, 2-character, ..., n-character groups into a new L.
Thanks!
What about this?
l.sort(key=lambda element: (len(element), element))
It sorts the list by length first, then by the element itself within each length.
>>> l = ['A', 'AA', 'B', 'BB', 'C', 'CC']
>>> l.sort(key=lambda element: (len(element), element))
>>> print(l)
['A', 'B', 'C', 'AA', 'BB', 'CC']
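The same key also works with the built-in sorted() if you want a new list instead of sorting in place; a minimal sketch:
L = ['B', 'AA', 'A', 'AB', 'Z']
new_L = sorted(L, key=lambda s: (len(s), s))
print(new_L)  # ['A', 'B', 'Z', 'AA', 'AB']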
