how to set columns of pandas dataframe as list

I have a pandas DataFrame, and when I try to access its columns (like df[["a"]]) it does not work, because
the columns are defined as an "Index" object (pandas.core.indexes.base.Index), e.g. Index(['col2', 'col2'], dtype='object').
I tried converting it with df.columns = df.columns.tolist() and also df.columns = [str(col) for col in df.columns],
but the columns remained an Index object.
What I want is for df.columns to return a list object.
What can I do?

columns is an attribute, not a method, so it is not callable. Remove the parentheses ():
df.columns gives you the column names as an Index object.
list(df.columns) gives you the column names as a list.
In your example, list(ss.columns) will return a list of column names.
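A minimal sketch of the behaviour described above, assuming a small made-up frame (the column names "a" and "b" are illustrative, not from the question). It shows that assigning a list to df.columns still leaves an Index on the frame, while list(df.columns) gives a plain list:

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Assigning a plain list is allowed, but pandas stores it as an Index again.
df.columns = [str(col) for col in df.columns]
columns_type = type(df.columns)

# Converting on read gives an ordinary Python list.
column_names = list(df.columns)

# Selecting with a list of names works regardless and returns a DataFrame.
subset = df[["a"]]
```

So the Index is not an obstacle to selecting columns; convert it only where an actual list is required.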

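Try this:
df.columns.values.tolist()
You were already close with your approach: df.columns.tolist() returns the same list, with or without the values attribute. What does not work is assigning that list back to df.columns, because pandas always stores the column labels as an Index.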

You have to wrap it in the list constructor to get an actual list:
list(ss.columns)
Hope this works!

Related

Get a pandas column name as a string

I have a dataframe containing multiple columns and multiple rows. I am trying to find the column which contains the entry 'some_string'. I managed to do this with
col = df.columns[df.isin(['some_string']).any()]
I would like to have col as a string, but instead it is of the following type:
In [47]:
print(col)
Out[47]:
Index(['col_N'], dtype='object')
So how can I get just 'col_N' returned? I just can't find an answer to that! Thanks.

You can treat your output as a list. If you have only one match, you can ask for the first element:
print(col[0])
If you have one or more and you want to print them all, you can convert it to a list:
print(list(col))
or you can unpack the values of col into print:
print(*col)
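The options above can be sketched as follows, assuming a small hypothetical frame in which only col_N holds the target string (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical frame; only col_N contains 'some_string'.
df = pd.DataFrame({"col_A": ["x", "y"], "col_N": ["some_string", "z"]})

# Same lookup as in the question: an Index of matching column names.
col = df.columns[df.isin(["some_string"]).any()]

first_name = col[0]    # a single str, the first match
all_names = list(col)  # a list of str, one entry per match
```

Note that col[0] raises an IndexError when no column matches, so checking len(col) first may be worthwhile.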

I think typecasting will help
list_of_columns = list(df.columns)

How to find if a value exists in all rows of a dataframe?

I have an array of unique elements and a dataframe.
I want to find out whether each element in the array exists in every row of the dataframe.
P.S. I am new to Python.
This is the piece of code I've written:
for i in uniqueArray:
    for index, row in newDF.iterrows():
        if i in row['MKT']:
            pass  # do something to find out if the element i exists in all rows
Also, this way of iterating is quite expensive. Is there a better way to do the same?
Thanks in advance.

Pandas allows you to filter a whole column as if it were Excel:
import pandas
df = pandas.DataFrame(tableData)
Imagine your column names are "Column1", "Column2", etc.
df2 = df[df["Column1"] == "ValueToFind"]
df2 now has only the rows that have "ValueToFind" in df["Column1"]. You can combine several filters with the logical operators & (AND) and | (OR).
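As a rough illustration of combining filters, here is a sketch on a made-up table (the values and the numeric Column2 are assumptions for the example). Each condition needs its own parentheses, because & and | bind more tightly than == and >:

```python
import pandas as pd

# Hypothetical data standing in for tableData.
df = pd.DataFrame({
    "Column1": ["ValueToFind", "other", "ValueToFind"],
    "Column2": [1, 2, 3],
})

# AND: both conditions must hold for a row to be kept.
both = df[(df["Column1"] == "ValueToFind") & (df["Column2"] > 1)]

# OR: a row is kept if either condition holds.
either = df[(df["Column1"] == "ValueToFind") | (df["Column2"] == 2)]
```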

You can try:
for i in uniqueArray:
    # True if at least one row of MKT contains i
    if newDF['MKT'].str.contains(i).any():
        pass  # do your task

You can use the isin() method of a pd.Series object.
Assume you have a data frame named df and you want to check whether your column 'MKT' includes any items of your uniqueArray:
new_df = df[df.MKT.isin(uniqueArray)].copy()
new_df will only contain the rows where the value of MKT is contained in uniqueArray.
Now do your work on new_df, and join/merge/concat it back to the former df as you wish.
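For the original question (does an element occur in every row?), one possible vectorised sketch is shown below. It assumes MKT holds strings and that "exists in a row" means a plain substring match; the sample data is invented for illustration:

```python
import pandas as pd

# Hypothetical data: MKT is assumed to be a column of strings.
newDF = pd.DataFrame({"MKT": ["US,EU", "EU,ASIA", "EU"]})
uniqueArray = ["US", "EU", "ASIA"]

# For each element, test every row at once and require all rows to match.
# regex=False treats the element as literal text rather than a pattern.
in_all_rows = {
    item: bool(newDF["MKT"].str.contains(item, regex=False).all())
    for item in uniqueArray
}
```

Replacing .all() with .any() gives the weaker check "appears in at least one row". The loop over uniqueArray remains, but the expensive iterrows() loop over the rows is gone.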

Creating new pandas dataframe from certain columns of existing dataframe

I have read a csv file into a pandas dataframe and want to do some simple manipulations on the dataframe. I can not figure out how to create a new dataframe based on selected columns from my original dataframe. My attempt:
names = ['A','B','C','D']
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset['A','D']
I would like to create a new dataframe with the columns A and D from the original dataframe.

This is called subsetting: pass a list of columns in []:
dataset = pandas.read_csv('file.csv', names=names)
new_dataset = dataset[['A','D']]
which is the same as:
new_dataset = dataset.loc[:, ['A','D']]
If you only need the filtered output, add the usecols parameter to read_csv:
new_dataset = pandas.read_csv('file.csv', names=names, usecols=['A','D'])
EDIT:
If you use only:
new_dataset = dataset[['A','D']]
and then modify the data, you may get:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you modify values in new_dataset later, you will find that the modifications do not propagate back to the original data (dataset), and that pandas issues a warning.
As EdChum pointed out, add copy to remove the warning:
new_dataset = dataset[['A','D']].copy()

You must pass a list of column names to select columns. Otherwise, the key is interpreted as a single MultiIndex label; df['A','D'] would only work if df.columns were a MultiIndex.
The most obvious way is df.loc[:, ['A', 'D']], but there are other ways (note how all of them take lists):
df1 = df.filter(items=['A', 'D'])
df1 = df.reindex(columns=['A', 'D'])
df1 = df.get(['A', 'D']).copy()
N.B. items is the first positional argument, so df.filter(['A', 'D']) also works.
Note that filter() and reindex() return a copy as well, so you don't need to worry about getting SettingWithCopyWarning later.
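A small sketch of the points above, using an in-memory frame in place of file.csv (the values are made up). It shows the list-based selection with .copy(), and that the tuple-style lookup from the question fails on ordinary columns:

```python
import pandas as pd

# Hypothetical stand-in for the data read from file.csv.
dataset = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

# A list of names selects columns; .copy() makes the result independent.
new_dataset = dataset[["A", "D"]].copy()
new_dataset.loc[0, "A"] = 100  # changes only the copy

# dataset['A', 'D'] looks for one column labelled ('A', 'D') and fails here.
try:
    dataset["A", "D"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True
```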

pandas automatically create dataframe from list of series with column names

I have a list of pandas series objects. I have a list of functions that generate them. How do I create a dataframe of the objects with the column names being the names of the functions that created the objects?
So, to create the regular dataframe, I've got:
pandas.concat([list of series objects],axis=1,join='inner')
But I don't currently have a way to insert all the functionA.__name__, functionB.__name__, etc. as column names in the dataframe.
How would I preserve the same conciseness, and set the column names?

IIUC, you can first build your concatenated dataframe df:
df = pandas.concat(list_of_series, axis=1, join='inner')  # list_of_series: your list of Series objects
and then assign the column names as a list of function names:
df.columns = [functionA.__name__, functionB.__name__]  # ...and so on, one name per function
Hope that helps.

You can set the column names in a second step:
df = pandas.concat(list_of_series, axis=1, join='inner')  # list_of_series: your list of Series objects
df.columns = [functionA.__name__, functionB.__name__]
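If a single step is preferred, one alternative not mentioned in the answers is the keys argument of pandas.concat, which labels the concatenated Series along axis=1. The sketch below uses two hypothetical generator functions that stand in for the real ones:

```python
import pandas as pd

# Hypothetical generator functions, standing in for the real ones.
def functionA():
    return pd.Series([1, 2, 3])

def functionB():
    return pd.Series([4, 5, 6])

functions = [functionA, functionB]

# keys= supplies one column label per Series when concatenating along axis=1.
df = pd.concat(
    [func() for func in functions],
    axis=1,
    join="inner",
    keys=[func.__name__ for func in functions],
)
```

Because the names come from the same list of functions that produces the Series, the labels and the columns cannot get out of order.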

How to get pandas.DataFrame columns containing specific dtype

I'm using df.columns.values to make a list of column names which I then iterate over and make charts, etc... but when I set this up I overlooked the non-numeric columns in the df. Now, I'd much rather not simply drop those columns from the df (or a copy of it). Instead, I would like to find a slick way to eliminate them from the list of column names.
Now I have:
names = df.columns.values
what I'd like to get to is something that behaves like:
names = df.columns.values(column_type=float64)
Is there any slick way to do this? I suppose I could make a copy of the df, and drop those non-numeric columns before doing columns.values, but that strikes me as clunky.
Welcome any inputs/suggestions. Thanks.

Someone may give you a better answer than this, but if all my numeric data are int64 or float64, one thing I tend to do is create a dict of the column data types and then use its values to build the list of columns.
For example, in a dataframe with columns of type float64, int64 and object, you can first look at the data types like so:
DF.dtypes
If they conform to the standard whereby the non-numeric columns are all object types (as they are in my dataframes), then you can do the following to get a list of the numeric columns:
[key for key in dict(DF.dtypes) if dict(DF.dtypes)[key] in ['float64', 'int64']]
It's just a simple list comprehension. Nothing fancy. Again, whether this works for you will depend on how you set up your dataframe...

dtypes is a pandas Series.
That means it has index and values attributes.
If you only need the column names:
headers = df.dtypes.index
This returns an Index containing the column names of the "df" dataframe; wrap it in list() if you need a plain list.

Since pandas 0.14.1 there is select_dtypes, which selects columns by dtype given a list of dtypes to include or exclude.
For example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.random.randn(1000),
                   'b': range(1000),
                   'c': ['a'] * 1000,
                   'd': pd.date_range('2000-1-1', periods=1000)})
df.select_dtypes(['float64','int64'])
Out[129]:
          a  b
0  0.153070  0
1  0.887256  1
2 -1.456037  2
3 -1.147014  3
...
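To get from there to the list of names the question asks for, a sketch along these lines may help. It uses a shortened version of the frame above, and the explicit int64 dtype is an assumption made so the result does not depend on the platform's default integer type:

```python
import numpy as np
import pandas as pd

# Shortened version of the example frame; int64 is set explicitly.
df = pd.DataFrame({
    "a": np.random.randn(5),
    "b": np.arange(5, dtype="int64"),
    "c": ["a"] * 5,
    "d": pd.date_range("2000-1-1", periods=5),
})

# Keep only the float64/int64 columns, then read their names off as a list.
names = df.select_dtypes(include=["float64", "int64"]).columns.tolist()
```

Passing include="number" instead selects every numeric dtype, which may be preferable when the integer or float width is not known in advance.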

To get the column names from a pandas DataFrame in Python 3:
Here I am creating a DataFrame from a fileName.csv file:
>>> import pandas as pd
>>> df = pd.read_csv('fileName.csv')
>>> columnNames = list(df.head(0))
>>> print(columnNames)

You can also get the column names from a pandas DataFrame via df.columns, which returns the column names along with the dtype. Here I read a CSV file from https://mlearn.ics.uci.edu/databases/autos/imports-85.data, but since the file is read with header=None you have to define the header containing the column names yourself.
import pandas as pd
url="https://mlearn.ics.uci.edu/databases/autos/imports-85.data"
df=pd.read_csv(url,header = None)
headers=["symboling","normalized-losses","make","fuel-type","aspiration","num-of-doors","body-style",
"drive-wheels","engine-location","wheel-base","length","width","height","curb-weight","engine-type",
"num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower","peak-rpm"
,"city-mpg","highway-mpg","price"]
df.columns=headers
print(df.columns)
