I would like to know if there is a function to change specific column names without having to type each name and without changing all of them.
I have the code:
df=df.rename(columns = {'nameofacolumn':'newname'})
But with this I have to change each column manually, writing out each name.
Also, to change all of them I have:
df.columns = ['name1', 'name2', 'etc']
I would like a function that changes, say, columns 1 and 3 without writing their names, just by stating their positions.
Say you have a dictionary mapping each old column name to its new name:
df.rename(columns={'old_col':'new_col', 'old_col_2':'new_col_2'}, inplace=True)
But, if you don't have that, and you only have the indices, you can do this:
column_indices = [1,4,5,6]
new_names = ['a','b','c','d']
old_names = df.columns[column_indices]
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)
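A minimal, self-contained sketch of the index-based rename above, assuming a small example DataFrame with columns w to z. rename_by_position is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def rename_by_position(df, positions, new_names):
    """Hypothetical helper: return a copy of df with the columns at
    `positions` renamed, by looking up their current labels and passing
    an {old_label: new_label} mapping to DataFrame.rename."""
    old_names = df.columns[positions]
    return df.rename(columns=dict(zip(old_names, new_names)))

df = pd.DataFrame([[1, 2, 3, 4]], columns=['w', 'x', 'y', 'z'])
# Rename the 1st and 3rd columns (positions 0 and 2) without typing their names
out = rename_by_position(df, [0, 2], ['first', 'third'])
print(list(out.columns))  # ['first', 'x', 'third', 'z']
```

Because rename works on labels, a label that appears more than once would be renamed everywhere it occurs, not only at the chosen position.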
You can build a dict with zip and pass it to rename:
In [246]:
df = pd.DataFrame(columns=list('abc'))
new_cols=['d','e']
df.rename(columns=dict(zip(df.columns[1:], new_cols)),inplace=True)
df
Out[246]:
Empty DataFrame
Columns: [a, d, e]
Index: []
It also works if you pass a list of ordinal positions:
df.rename(columns=dict(zip(df.columns[[1,2]], new_cols)),inplace=True)
You don't need to use the rename method at all.
You simply replace the old column names with new ones using lists. To rename columns 1 and 3 (at positions 0 and 2), you do something like this:
df.columns.values[[0, 2]] = ['newname0', 'newname2']
Or, if you are using a version of pandas older than 0.16.0:
df.keys().values[[0, 2]] = ['newname0', 'newname2']
The advantage of this approach is that you don't need to copy the whole DataFrame with df = df.rename(...); you just change the index values.
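Writing into df.columns.values changes the Index's underlying array in place, even though pandas treats Index objects as immutable. A sketch of the same positional rename that copies the labels into a list and assigns them back instead (the column names here are only examples):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c'])

# Copy the current labels into a plain Python list
cols = df.columns.tolist()
# Overwrite only the labels at positions 0 and 2
for pos, name in zip([0, 2], ['newname0', 'newname2']):
    cols[pos] = name
# Assigning a list builds a fresh Index instead of mutating the old one
df.columns = cols
print(df.columns.tolist())  # ['newname0', 'b', 'newname2']
```

Only the column labels are replaced here, so this still avoids the df = df.rename(...) round trip.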
You should be able to reference the columns by index using df.columns[index]:
>>> import numpy as np
>>> import pandas as pd
>>> temp = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
>>> print(temp.columns[0])
a
>>> print(temp.columns[1])
b
So to change the value of specific columns, first copy the labels into an array and change only the values you want:
>>> newcolumns = temp.columns.values.copy()
>>> newcolumns[0] = 'New_a'
Assign the new array back to the columns and you'll have what you need:
>>> temp.columns = newcolumns
>>> print(temp.columns[0])
New_a
If you have a dict of {position: new_name}, you can use items().
For example:
new_columns = {3: 'fourth_column'}
df.rename(columns={df.columns[i]: new_col for i, new_col in new_columns.items()})
Full example:
$ ipython
Python 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.24.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import numpy as np
...: import pandas as pd
...:
...: rng = np.random.default_rng(seed=0)
...: df = pd.DataFrame({key: rng.uniform(size=3) for key in list('abcde')})
...: df
Out[1]:
a b c d e
0 0.636962 0.016528 0.606636 0.935072 0.857404
1 0.269787 0.813270 0.729497 0.815854 0.033586
2 0.040974 0.912756 0.543625 0.002739 0.729655
In [2]: new_columns = {3: 'fourth_column'}
...: df.rename(columns={df.columns[i]: new_col for i, new_col in new_columns.items()})
Out[2]:
a b c fourth_column e
0 0.636962 0.016528 0.606636 0.935072 0.857404
1 0.269787 0.813270 0.729497 0.815854 0.033586
2 0.040974 0.912756 0.543625 0.002739 0.729655
I recently posted about how to create multiple variables from a CSV file. The code works in that the variables are created. However, it creates a bunch of variables that are all equal to the first row. I need the code to make one variable for each row in the DataFrame.
I need 208000 variables labeled A1:A20800
The code I currently have:
df = pandas.read_csv(file_name)
for i in range(1,207999):
    for c in df:
        exec("%s = %s" % ('A' + str(i), c))
        i += 1
I have tried adding quotation marks around the second %s (that gives a syntax error). I have also tried selecting all the rows of the df and using that. I'm not sure why it isn't working! Every time I print a variable to test whether it worked, it prints the same value (i.e. A1 = A2 = A3 = ... = A207999). What I actually want is:
A1 = row 1
A2 = row 2
...
Thank you in advance for any assistance!
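pandas.read_csv returns a DataFrame, and iterating over a DataFrame directly yields its column labels rather than its rows. Iterating over df.itertuples() gives you the rows instead, and islice then lets you take just the first 20800 of them:
from itertools import islice
df = pandas.read_csv(file_name)
A = list(islice(df.itertuples(index=False), 20800))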
# now access rows: A[index]
If you want to create a list containing the values of each row from your DataFrame, you can use the method df.iterrows():
[row[1].to_list() for row in df.iterrows()]
If you still want to create a large number of variables, you can do so in a loop:
for row in df.iterrows():
    # iterrows yields (index, Series) pairs; row[1] is the row's values
    list_with_row_values = row[1].to_list()
    # create your variables here...
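Instead of creating thousands of separate variables with exec, a common alternative is to keep the rows in a dict keyed by the names you wanted (A1, A2, ...). A sketch, assuming a small DataFrame in place of the CSV file:

```python
import pandas as pd

# Stand-in for pandas.read_csv(file_name)
df = pd.DataFrame({'a': [1, 2, 3], 'b': [42, 42, 42]})

# Map 'A1', 'A2', ... to the list of values in each row;
# name=None makes itertuples yield plain tuples
rows = {
    f'A{i}': list(values)
    for i, values in enumerate(df.itertuples(index=False, name=None), start=1)
}

print(rows['A1'])
print(rows['A3'])
```

Lookups such as rows['A2'] then replace the individual variables, and the dict can be passed around or iterated like any other object.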
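You are getting the same value for all the variables because for c in df iterates over the column labels, not the rows (and read_csv takes those labels from the first line of the file by default). Incrementing i inside the inner loop also makes each outer pass overwrite the variables set by the previous one.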
So you want something more like:
In [2]: df = pd.DataFrame({'a':[1,2,3], 'b':[42, 42, 42]})
In [3]: df
Out[3]:
a b
0 1 42
1 2 42
2 3 42
In [28]: i = 1
...: for c in df.iterrows():
...:     globals()['A' + str(i)] = c
...:     i += 1
...:
In [29]: A1
Out[29]:
(0L, a 1
b 42
Name: 0, dtype: int64)
In [30]: A1[0]
Out[30]: 0L
In [32]: A1[1]
Out[32]:
a 1
b 42
Name: 0, dtype: int64
I am trying to use the replace method several times in order to change the labels of a given level of a MultiIndex pandas DataFrame.
As seen in Pandas: Modify a particular level of Multiindex, #John got a solution that works great as long as the replace method is used only once.
The problem is that it does not work if I use this method several times.
For example:
df.index = df.index.set_levels(df.index.levels[0].str.replace("dataframe_",'').replace("_r",' r'), level=0)
I get the following error message:
AttributeError: 'Index' object has no attribute 'replace'
What am I missing?
Use str.replace twice:
idx = df.index.levels[0].str.replace("dataframe_",'').str.replace("_r",' r')
df.index = df.index.set_levels(idx, level=0)
Another solution is to convert the level with to_series and then call replace with a dictionary (this matches whole labels, not substrings):
d = {'dataframe_':'','_r':' r'}
idx = df.index.levels[0].to_series().replace(d)
df.index = df.index.set_levels(idx, level=0)
And a solution with map and fillna, if the data is large and performance matters:
d = {'dataframe_':'','_r':' r'}
s = df.index.levels[0].to_series()
df.index = df.index.set_levels(s.map(d).fillna(s), level=0)
Sample:
df = pd.DataFrame({
'A':['dataframe_','_r', 'a'],
'B':[7,8,9],
'C':[1,3,5],
}).set_index(['A','B'])
print (df)
C
A B
dataframe_ 7 1
_r 8 3
a 9 5
d = {'dataframe_':'','_r':' r'}
idx = df.index.levels[0].to_series().replace(d)
df.index = df.index.set_levels(idx, level=0)
print (df)
C
A B
7 1
r 8 3
a 9 5
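The to_series().replace(d) and map(d) variants only change labels that equal a key exactly, while chaining .str.replace also rewrites substrings inside longer labels. A sketch contrasting the two on an assumed label dataframe_x_r:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['dataframe_x_r', 'a'],
    'B': [7, 9],
    'C': [1, 5],
}).set_index(['A', 'B'])

lvl = df.index.levels[0]
d = {'dataframe_': '', '_r': ' r'}

# Chained .str.replace: substring replacement, applied one after the other
by_substring = (lvl.str.replace('dataframe_', '', regex=False)
                   .str.replace('_r', ' r', regex=False))

# Dict-based replace: only labels equal to a key are changed
by_whole_label = lvl.to_series().replace(d)

df.index = df.index.set_levels(by_substring, level=0)
print(df)
```

Here by_whole_label leaves dataframe_x_r untouched, because no key matches the full label.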
So I currently have a column containing values like this:
d = {'col1': [LINESTRING(174.76028 -36.80417, 174.76041 -36.80389, 175.76232 -36.82345)]}
df = pd.DataFrame(d)
and I am trying to make it so that I can:
1) apply a function to each of the numerical values, and
2) end up with something like this:
d = {'col1': ['LINESTRING'], 'col2': [[(174.76028, -36.80417), (174.76041, -36.80389), (175.76232, -36.82345)]]}
df = pd.DataFrame(d)
Any thoughts?
Thanks
Here is one way. Note that LineString accepts an ordered collection of tuples as an input. See the docs for more information.
We use operator.attrgetter to access the required attributes: coords and __class__.__name__.
import pandas as pd
from operator import attrgetter
class LineString():
    # Minimal stand-in for shapely's LineString: it only stores the coordinates
    def __init__(self, list_of_coords):
        self.coords = list_of_coords
df = pd.DataFrame({'col1': [LineString([(174.76028, -36.80417), (174.76041, -36.80389), (175.76232, -36.82345)])]})
# col2: the list of (x, y) tuples stored on each object
df['col2'] = df['col1'].apply(attrgetter('coords'))
# col1: replace each object with its class name
df['col1'] = df['col1'].apply(attrgetter('__class__')).apply(attrgetter('__name__'))
print(df)
col1 col2
0 LineString [(174.76028, -36.80417), (174.76041, -36.80389...
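The question also asks to apply a function to each numerical value. One way, sketched with the same stand-in LineString class (not shapely's) and a hypothetical example function shift that adds a fixed offset, is to rebuild the list of tuples inside apply:

```python
import pandas as pd

class LineString():
    # Minimal stand-in for shapely's LineString; it only stores coords
    def __init__(self, list_of_coords):
        self.coords = list_of_coords

def shift(value, offset=1.0):
    # Hypothetical per-value function; replace with your own
    return value + offset

df = pd.DataFrame({'col1': [LineString([(174.76028, -36.80417),
                                        (174.76041, -36.80389)])]})

# Apply `shift` to every x and y, keeping the (x, y) tuple structure
df['col2'] = df['col1'].apply(
    lambda line: [(shift(x), shift(y)) for x, y in line.coords]
)
print(df.loc[0, 'col2'])
```

With shapely, iterating a 2D LineString's coords also yields (x, y) tuples, so the same lambda should carry over.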
I am using Pandas and want to add rows to an empty DataFrame with columns already established.
So far my code looks like this...
def addRows(cereals,lines):
    for i in np.arange(1,len(lines)):
        dt = parseLine(lines[i])
        dt = pd.Series(dt)
        print(dt)
        # YOUR CODE GOES HERE (add dt to cereals)
        cereals.append(dt, ignore_index = True)
    return(cereals)
However, when I run...
cereals = addRows(cereals,lines)
cereals
the DataFrame comes back with no rows, just the columns. I'm not sure what I'm doing wrong, but I'm pretty sure it has something to do with the append method. Does anyone have an idea what I'm doing wrong?
There are two probable reasons your code is not working as intended:
cereals.append(dt, ignore_index = True) is given a Series, not a DataFrame. append accepts a Series when ignore_index=True, but the Series' index has to match the column names, otherwise its values end up in new columns.
cereals.append(dt, ignore_index = True) does not modify cereals in place; it returns a new DataFrame, which your code discards, so you end up returning the unchanged original. An analogous function would look like this:
>>> def foo(a):
...     a + 1
...     return a
...
>>> foo(1)
1
I haven't tested this on my machine, but I think your fixed solution would look like this:
def addRows(cereals, lines):
    for i in np.arange(1, len(lines)):
        data = parseLine(lines[i])
        # wrap the parsed values in a list so they form a single row
        new_df = pd.DataFrame([data], columns=cereals.columns)
        # append returns a new DataFrame, so reassign it
        cereals = cereals.append(new_df, ignore_index=True)
    return cereals
By the way, I don't know where lines is coming from, but I would at least restructure it like this:
data = [parseLine(line) for line in lines[1:]]
cereals = cereals.append(pd.DataFrame(data, columns=cereals.columns), ignore_index=True)
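DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current pandas the same idea needs pd.concat. A sketch, assuming a hypothetical parse_line that splits a comma-separated line into values (the question's parseLine is not shown) and made-up sample lines:

```python
import pandas as pd

def parse_line(line):
    # Hypothetical parser: the real parseLine from the question is not shown
    name, calories = line.split(',')
    return [name, int(calories)]

def add_rows(cereals, lines):
    # Skip lines[0] (the header), as the original loop did
    data = [parse_line(line) for line in lines[1:]]
    new_rows = pd.DataFrame(data, columns=cereals.columns)
    # concat returns a new DataFrame; it does not modify `cereals` in place
    return pd.concat([cereals, new_rows], ignore_index=True)

cereals = pd.DataFrame(columns=['name', 'calories'])
lines = ['name,calories', 'Cheerios,110', 'Corn Flakes,100']
cereals = add_rows(cereals, lines)
print(cereals)
```

Building all rows first and concatenating once is also much cheaper than growing the DataFrame one row at a time.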
See also: How to add an extra row to a pandas dataframe
You could also create a new DataFrame and just append that DataFrame to your existing one. E.g.
>>> import pandas as pd
>>> empty_alph = pd.DataFrame(columns=['letter', 'index'])
>>> alph_abc = pd.DataFrame([['a', 0], ['b', 1], ['c', 2]], columns=['letter', 'index'])
>>> empty_alph.append(alph_abc)
letter index
0 a 0.0
1 b 1.0
2 c 2.0
As I noted in the link, you can also use the loc method on a DataFrame:
>>> df = empty_alph.append(alph_abc)
>>> df.loc[df.shape[0]] = ['d', 3]  # df.shape[0] is the row count, i.e. the next label when the index is 0..n-1
>>> df
letter index
0 a 0.0
1 b 1.0
2 c 2.0
3 d 3.0
I have a df that looks like this:
df = pd.DataFrame(np.random.random((4,4)))
df.columns = pd.MultiIndex.from_product([['1','2'],['A','B']])
print(df)
1 2
A B A B
0 0.030626 0.494912 0.364742 0.320088
1 0.178368 0.857469 0.628677 0.705226
2 0.886296 0.833130 0.495135 0.246427
3 0.391352 0.128498 0.162211 0.011254
How can I rename column '1' and '2' as 'One' and 'Two'?
I thought df.rename() would help, but it doesn't. I have no idea how to do this.
That is indeed something missing in rename (ideally it should let you specify the level).
Another way is by setting the levels of the columns index, but then you need to know all values for that level:
In [41]: df.columns.levels[0]
Out[41]: Index([u'1', u'2'], dtype='object')
In [43]: df.columns = df.columns.set_levels(['one', 'two'], level=0)
In [44]: df
Out[44]:
one two
A B A B
0 0.899686 0.466577 0.867268 0.064329
1 0.162480 0.455039 0.736870 0.759595
2 0.620960 0.922119 0.060141 0.669997
3 0.871107 0.043799 0.080080 0.577421
In [45]: df.columns.levels[0]
Out[45]: Index([u'one', u'two'], dtype='object')
As of pandas 0.22.0 (and probably much earlier), you can specify the level:
df = df.rename(columns={'1': 'one', '2': 'two'}, level=0)
or, alternatively (new notation since pandas 0.21.0):
df = df.rename({'1': 'one', '2': 'two'}, axis='columns', level=0)
But actually, it works even when omitting the level:
df = df.rename(columns={'1': 'one', '2': 'two'})
In that case, all column levels are checked for occurrences to be renamed.
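A self-contained check of the level argument, using the MultiIndex columns from the question with the random data replaced by zeros:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((4, 4)))
df.columns = pd.MultiIndex.from_product([['1', '2'], ['A', 'B']])

# Rename labels only in the outer (level 0) column index
df = df.rename(columns={'1': 'One', '2': 'Two'}, level=0)
print(df.columns.tolist())

# With level=0, a key that only exists in level 1 is left alone
df2 = df.rename(columns={'A': 'alpha'}, level=0)
print(df2.columns.tolist())
```

Passing level is what keeps the second call from touching the inner A/B labels.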
Use set_levels:
>>> df.columns = df.columns.set_levels(['one', 'two'], level=0)
>>> print(df)
one two
A B A B
0 0.731851 0.489611 0.636441 0.774818
1 0.996034 0.298914 0.377097 0.404644
2 0.217106 0.808459 0.588594 0.009408
3 0.851270 0.799914 0.328863 0.009914
df.columns = df.columns.set_levels(['one', 'two'], level=0)
or:
df.rename(columns={'1': 'one', '2': 'two'}, inplace=True)
This is a good question. Combining the answers above, you can write a function:
import pandas as pd
def rename_col(df, columns, level=0):
    def rename_apply(x, rename_dict):
        # return the new name if one is given, otherwise keep the old one
        try:
            return rename_dict[x]
        except KeyError:
            return x
    if isinstance(df.columns, pd.MultiIndex):
        df.columns = df.columns.set_levels([rename_apply(x, rename_dict=columns) for x in df.columns.levels[level]], level=level)
    else:
        df.columns = [rename_apply(x, rename_dict=columns) for x in df.columns]
    return df
It worked for me.
Ideally, a functionality like this should be integrated into the "official" "rename" function in the future, so you don't need to write a hack like this.