Lowercase columns by name using dataframe method - python

I have a dataframe containing strings and NaNs. I want to lowercase certain columns, selected by name: to_lower = ['b', 'd', 'e']. Ideally I could do it with a method on the whole dataframe, rather than with a method on df[to_lower]. Currently I have
df[to_lower] = df[to_lower].apply(lambda x: x.astype(str).str.lower())
but I would like a way to do it without assigning to the selected columns.
df = pd.DataFrame({'a': ['A', 'a'], 'b': ['B', 'b']})
to_lower = ['a']
df2 = df.copy()
df2[to_lower] = df2[to_lower].apply(lambda x: x.astype(str).str.lower())

You can use the assign method and unpack the result as keyword arguments:
df = pd.DataFrame({'a': ['A', 'a'], 'b': ['B', 'b'], 'c': ['C', 'c']})
to_lower = ['a', 'b']
df.assign(**df[to_lower].apply(lambda x: x.astype(str).str.lower()))
#    a  b  c
# 0  a  b  C
# 1  a  b  c
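One caveat with astype(str) is that it turns NaN into the string 'nan'. The following is a minimal sketch of the same assign idea without that cast, assuming the selected columns hold only strings and NaN (as the question states) and that the column names are valid keyword names. The sample data and the name out are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative data: column 'a' contains a missing value.
df = pd.DataFrame({'a': ['A', np.nan], 'b': ['B', 'b'], 'c': ['C', 'c']})
to_lower = ['a', 'b']

# Build one lowercased Series per selected column and pass them to assign
# as keyword arguments. .str.lower() leaves NaN as NaN, and assign returns
# a new dataframe, so df itself is not modified.
out = df.assign(**{col: df[col].str.lower() for col in to_lower})
print(out)
```

Because assign returns a new object, this fits into a method chain without assigning to df[to_lower].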

You want this:
for column in to_lower:
    df[column] = df[column].str.lower()
This is far more efficient assuming you have more rows than columns.

Related

Comparing values between dataframes

I'm trying to find a way to compare the values contained in two different dataframes that have different column names.
label = {
    'aoo': ['a', 'b', 'c'],
    'boo': ['a', 'b', 'c'],
    'coo': ['a', 'b', 'c'],
    'label': ['label', 'label', 'label']
}
unlabel = {
    'unlabel1': ['a', 'b', 'c'],
    'unlabel2': ['a', 'b', 'c'],
    'unlabel3': ['a', 'b', 'hhh']
}
label = pd.DataFrame(label)
unlabel = pd.DataFrame(unlabel)
The desired output is a dataframe that contains the columns whose values are equal, plus the label column.
Where even a single value is not equal, as in unlabel['unlabel3'], I don't want to keep that column in the output.
desired_output = {
'unlabel1' : ['a', 'b', 'c'],
'unlabel2' : ['a', 'b', 'c'],
'label' : ['label', 'label', 'label']
}
If the labels were numbers I could try np.where, but I can't find a similar helper for strings.
Could you help?
Thanks
You can use pd.merge and specify the columns to merge on with left_on and right_on:
out = unlabel.merge(label, left_on=['unlabel1', 'unlabel2', 'unlabel3'], right_on=['aoo', 'boo', 'coo'], how='left').drop(['unlabel3', 'aoo', 'boo', 'coo'], axis=1)
print(out)
  unlabel1 unlabel2  label
0        a        a  label
1        b        b  label
2        c        c    NaN
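The merge above matches rows, so the row where unlabel3 differs gets a NaN label. If the goal is instead to keep whole columns only when all their values match, a column-wise comparison may be closer to the desired output. This is a minimal sketch, assuming both dataframes have the same number of rows in the same order; the names reference_cols and matching are illustrative.

```python
import pandas as pd

label = pd.DataFrame({
    'aoo': ['a', 'b', 'c'],
    'boo': ['a', 'b', 'c'],
    'coo': ['a', 'b', 'c'],
    'label': ['label', 'label', 'label'],
})
unlabel = pd.DataFrame({
    'unlabel1': ['a', 'b', 'c'],
    'unlabel2': ['a', 'b', 'c'],
    'unlabel3': ['a', 'b', 'hhh'],
})

# Columns of `label` to compare against (everything except the label itself).
reference_cols = [c for c in label.columns if c != 'label']

# Keep an unlabel column only if its values equal those of some reference column.
matching = [
    u for u in unlabel.columns
    if any(unlabel[u].tolist() == label[c].tolist() for c in reference_cols)
]

# Attach the label column to the surviving columns (aligned on the row index).
out = unlabel[matching].assign(label=label['label'])
print(out)
```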

Get counts of unique lists in Pandas

I have a pandas Dataframe where one of the columns is full of lists:
import pandas
df = pandas.DataFrame([[1, ['a', 'b', 'c']],
                       [2, ['d', 'e', 'f']],
                       [3, ['a', 'b', 'c']]])
And I'd like to make a pivot table that shows the list and a count of occurrences
List Count
[a,b,c] 2
[d,e,f] 1
Because list is a non-hashable type, what aggregate functions could do this?
You can zip a list of rows and a list of counts, then make a dataframe from the zip object:
import pandas
df = pandas.DataFrame([[1, ['a', 'b', 'c']],
                       [2, ['d', 'e', 'f']],
                       [3, ['a', 'b', 'c']]])
rows = []
counts = []
for index, row in df.iterrows():
    if row[1] not in rows:
        rows.append(row[1])
        counts.append(1)
    else:
        counts[rows.index(row[1])] += 1
df = pandas.DataFrame(zip(rows, counts))
print(df)
The solution I ended up using was:
import pandas
df = pandas.DataFrame([[1, ['a', 'b', 'c']],
                       [2, ['d', 'e', 'f']],
                       [3, ['a', 'b', 'c']]])
print(df[1])
df[1] = df[1].map(tuple)
#Thanks Ch3steR
df2 = pandas.pivot_table(df,index=df[1], aggfunc='count')
print(df2)
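Both answers work around the fact that lists are not hashable. Another way to sketch the same idea is to convert each list to a tuple and count with collections.Counter from the standard library, then turn the result back into a two-column dataframe. The column names List and Count follow the question; the variable names are illustrative.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame([[1, ['a', 'b', 'c']],
                   [2, ['d', 'e', 'f']],
                   [3, ['a', 'b', 'c']]])

# Tuples are hashable, so they can serve as Counter keys.
counts = Counter(map(tuple, df[1]))

# Rebuild a dataframe, converting the keys back to lists for display.
result = pd.DataFrame({
    'List': [list(key) for key in counts],
    'Count': list(counts.values()),
})
print(result)
```

Counter keeps keys in first-seen order, so the rows appear in the order the lists first occur in the column.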

Convert all rows of a Pandas dataframe column to comma-separated values with each value in single quote

I have a Pandas dataframe similar to:
df = pd.DataFrame(['a', 'b', 'c', 'd'], columns=['Col'])
df
Col
0 a
1 b
2 c
3 d
I am trying to convert all rows of this column to a comma-separated string with each value in single quotes, like below:
'a', 'b', 'c', 'd'
I have tried the following with several different combinations, but this is the closest I got:
s = df['Col'].str.cat(sep="', '")
s
"a', 'b', 'c', 'd"
I think that the end result should be:
"'a', 'b', 'c', 'd'"
A quick fix is to add the missing outer quotes yourself:
"'" + df['Col'].str.cat(sep="', '") + "'"
"'a', 'b', 'c', 'd'"
An alternative is to wrap each element in quotes and then use a plain .join:
', '.join([f"'{i}'" for i in df['Col']])
"'a', 'b', 'c', 'd'"
Try this:
s = df['Col'].tolist()
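Getting the values as a list is only the first step; each value still has to be quoted and joined. A minimal sketch of that step follows, using a hypothetical helper named quote_join (not part of pandas). It assumes the values contain no single quotes that would need escaping.

```python
import pandas as pd


def quote_join(values, sep=", "):
    """Wrap each value in single quotes and join them with `sep`."""
    return sep.join(f"'{value}'" for value in values)


df = pd.DataFrame(['a', 'b', 'c', 'd'], columns=['Col'])
s = quote_join(df['Col'].tolist())
print(s)
```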
Try something like this:
df = pd.DataFrame(['a', 'b', 'c', 'd'], columns=['Col1'])
values = df['Col1'].to_list()
with_quotes = ["'"+x+"'" for x in values]
','.join(with_quotes)
Output:
"'a','b','c','d'"

Return a list with dataframe column values ordered based on another list

I have a df with columns a-h, and I wish to create a list of these column values, but in the order of values in another list (list1). list1 corresponds to the index value in df.
df
a b c d e f g h
list1
[3,1,0,5,2,7,4,6]
Desired list
['d', 'b', 'a', 'f', 'c', 'h', 'e', 'g']
You can just do df.columns[list1]:
import pandas as pd
df = pd.DataFrame([], columns=list('abcdefgh'))
list1 = [3,1,0,5,2,7,4,6]
print(df.columns[list1])
# Index(['d', 'b', 'a', 'f', 'c', 'h', 'e', 'g'], dtype='object')
First get a np.array of letters (this needs import numpy as np)
arr = np.array(list('abcdefgh'))
Or in your case, an array of your df columns
arr = np.array(df.columns)
Then index it with your list of integer positions
arr[[3,1,0]]
out:
['d', 'b', 'a']
Check this (using .iloc so the integers in list1 are treated as positions, not labels):
df.columns.to_series().iloc[list1].tolist()
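Since the desired result is a plain list rather than an Index, the first answer can be finished with .tolist(). A minimal sketch, reusing the empty dataframe from that answer; the name ordered is illustrative.

```python
import pandas as pd

df = pd.DataFrame([], columns=list('abcdefgh'))
list1 = [3, 1, 0, 5, 2, 7, 4, 6]

# Index the column labels by position, then convert the Index to a list.
ordered = df.columns[list1].tolist()
print(ordered)
```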

Creating variable number of lists from pandas dataframes

I have a pandas dataframe being generated by some other piece of code - the dataframe may have different number of columns each time it is generated: let's call them col1,col2,...,coln where n is not fixed. Please note that col1,col2,... are just placeholders, the actual names of columns can be arbitrary like TimeStamp or PrevState.
From this, I want to convert each column into a list, with the name of the list being the same as the column. So, I want a list named col1 with the entries in the first column of the dataframe and so on till coln.
How do I do this?
Thanks
Creating variables dynamically is not recommended; it is better to create a dictionary:
d = df.to_dict('list')
Then select a list by using its column name as the dictionary key:
print (d['col'])
Sample:
df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
})
d = df.to_dict('list')
print (d)
{'A': ['a', 'b', 'c', 'd', 'e', 'f'], 'B': [4, 5, 4, 5, 5, 4], 'C': [7, 8, 9, 4, 2, 3]}
print (d['A'])
['a', 'b', 'c', 'd', 'e', 'f']
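The dictionary approach also copes with an unknown number of arbitrarily named columns, which is the situation described in the question. A minimal sketch, using the example names TimeStamp and PrevState from the question with made-up values; lists_by_column is an illustrative name.

```python
import pandas as pd

# Illustrative data; in practice the dataframe comes from other code.
df = pd.DataFrame({'TimeStamp': [1, 2, 3], 'PrevState': ['x', 'y', 'z']})

# One list per column, keyed by column name.
lists_by_column = df.to_dict(orient='list')

# Iterate without knowing the column names in advance.
for name, values in lists_by_column.items():
    print(name, values)
```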
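import pandas as pd
df = pd.DataFrame()
df["col1"] = [1,2,3,4,5]
df["colTWO"] = [6,7,8,9,10]
for col_name in df.columns:
    # repr() of a plain list is valid Python source; repr() of a NumPy array
    # is "array([...])", which fails in exec unless `array` is in scope.
    # This only works when the column name is a valid Python identifier.
    exec(col_name + " = " + repr(df[col_name].tolist()))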
