I have a list of pandas Series objects, which I've created by doing something like this:
li = []
li.append(input_df.iloc[0])
li.append(input_df.iloc[4])
where input_df is a pandas DataFrame.
I want to convert this list of Series objects back into a pandas DataFrame, and was wondering if there is an easy way to do it.
Based on the other answers to this post, you can do this with:
pd.DataFrame(li)
To everyone suggesting pd.concat: li is not a Series anymore. The values are being appended to a list, so the type of li is list. To convert that list to a DataFrame, use pd.DataFrame(<list name>).
Since the right answer got hidden in the comments, I thought it would be better to surface it as an answer:
pd.concat(li, axis=1).T
will convert the list li of Series into a DataFrame.
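A minimal sketch, assuming a small numeric stand-in for input_df, showing that both the constructor route and the concat/transpose route rebuild the same row-wise frame:

```python
import pandas as pd

# Hypothetical stand-in for input_df from the question.
input_df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

li = [input_df.iloc[0], input_df.iloc[4]]

# Constructor route: each Series becomes one row.
df1 = pd.DataFrame(li)

# concat route: stack the Series as columns, then transpose back to rows.
df2 = pd.concat(li, axis=1).T

print(df1.equals(df2))  # True
```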
It seems that you want to perform a customized melt of your dataframe.
With the pandas library, you can do it in one line of code. Below I create an example to replicate your problem:
import pandas as pd
input_df = pd.DataFrame(data={'1': [1, 2, 3, 4, 5],
                              '2': [1, 2, 3, 4, 5],
                              '3': [1, 2, 3, 4, 5],
                              '4': [1, 2, 3, 4, 5],
                              '5': [1, 2, 3, 4, 5]})
Using pd.DataFrame, you can create a new dataframe from your two selected rows:
li = []
li.append(input_df.iloc[0])
li.append(input_df.iloc[4])
new_df = pd.DataFrame(li)
If what you want is for those two rows to end up in a single column, I would not pass them as a list back to the DataFrame constructor.
Instead, you can concatenate the two rows end to end, disregarding their column names:
new_df = pd.concat([input_df.iloc[0], input_df.iloc[4]])
(Series.append, which older answers used here, was deprecated and then removed in pandas 2.0.)
Let me know if this answers your question.
The answer was already mentioned, but I would like to share my version:
li_df = pd.DataFrame(li).T
If you want each Series to be a row of the dataframe, you should not use concat() followed by .T unless all your values are of the same dtype.
If your data has both numerical and string values, the transpose will mangle the dtypes, likely turning them all into object.
The right way to do this in general is:
Convert each Series to a dict with to_dict().
Pass the list of dicts either into the pd.DataFrame() constructor directly, or into pd.DataFrame.from_records().
In your case the following should work:
my_list_of_dicts = [s.to_dict() for s in li]
my_df = pd.DataFrame(my_list_of_dicts)
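A small sketch with made-up mixed-dtype data showing why the dict round trip helps: each row Series is dtype object, but rebuilding from dicts lets pandas re-infer per-column dtypes.

```python
import pandas as pd

# Hypothetical frame mixing strings and numbers.
df = pd.DataFrame({"name": ["x", "y"], "num": [1, 2]})
li = [df.iloc[0], df.iloc[1]]

# Round-tripping through dicts re-infers column dtypes instead of
# keeping everything as object.
my_list_of_dicts = [s.to_dict() for s in li]
my_df = pd.DataFrame(my_list_of_dicts)

print(my_df.dtypes["num"])  # int64
```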
Related
I have a problem with a list containing many dataframes. I create them in that way:
listWithDf = []
listWithDf.append(file)
Now I want to work with the data inside this list, but as one dataframe holding all the data. I know the way below is very ugly, and it has to be changed every time the number of dataframes changes.
df = pd.concat([listWithDf[0], listWithDf[1], ...])
So I was wondering: is there a better way to unpack a list like that? Or maybe a different way to build a dataframe in a loop that contains the data I need?
Here's a way you can do it, as suggested in the comments by @sjw:
df = pd.concat(listWithDf)
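A quick sketch with two made-up frames; ignore_index=True is an optional extra that rebuilds a clean 0..n-1 index for the combined frame:

```python
import pandas as pd

# Two hypothetical frames standing in for the ones appended in the loop.
listWithDf = [pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3, 4]})]

# One concat call replaces the manual [listWithDf[0], listWithDf[1], ...] unpacking.
df = pd.concat(listWithDf, ignore_index=True)
print(df["x"].tolist())  # [1, 2, 3, 4]
```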
Here's a method with a loop (but it's unnecessary!):
df = pd.concat([i for i in listWithDf])
I have a list of columns from a dataframe
df_date=[df[var1],df[var2]]
I want to change the data in that columns to date time type
for t in df_date:
    pd.DatetimeIndex(t)
For some reason it's not working.
I would like to understand what a more general solution is for applying several operations to several columns.
As an alternative, you can do:
for column_name in ["var1", "var2"]:
    df[column_name] = pd.DatetimeIndex(df[column_name])
You can use pandas.to_datetime and pandas.DataFrame.apply to convert a dataframe's entire content to datetime. You can also filter out the columns you need and apply it only to them.
df[['column1', 'column2']] = df[['column1', 'column2']].apply(pd.to_datetime)
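A runnable sketch of the apply approach, with made-up column names and date strings:

```python
import pandas as pd

# Hypothetical frame with date strings in two columns.
df = pd.DataFrame({"var1": ["2021-01-01", "2021-06-15"],
                   "var2": ["2020-12-31", "2021-03-01"]})

# apply runs pd.to_datetime column by column and assigns the result back.
df[["var1", "var2"]] = df[["var1", "var2"]].apply(pd.to_datetime)
print(df.dtypes)
```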
Note that a list of series and a DataFrame are not the same thing.
A DataFrame is accessed like this:
df[[columns]]
while a list of Series looks like this:
[seriesA, seriesB]
Assuming I have a pandas DF as follows:
mydf=pd.DataFrame([{'sectionId':'f0910b98','xml':'<p/p>'},{'sectionId':'f0345b98','xml':'<a/a>'}])
mydf.set_index('sectionId', inplace=True)
I would like to get a dictionary out of it, as follows:
{'f0910b98':'<p/p>', 'f0345b98':'<a/a>'}
I tried the following:
mydf.to_dict()
mydf.to_dict('records')
And it is not what I am looking for.
I am looking for the correct way to use to_dict()
Note: I know I can get the two columns into two lists and pack them in a dict like in:
mydict = dict(zip(mydf.sectionId, mydf.xml))
but I am looking for a pandas straight direct method (if there is one)
You could transpose your dataframe, call to_dict on it, and select the first item (the xml row of the transposed frame):
mydf.T.to_dict(orient='records')[0]
returns
{'f0910b98': '<p/p>', 'f0345b98': '<a/a>'}
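An even shorter route, assuming the frame is indexed by sectionId as above: selecting the xml column gives a Series, and Series.to_dict() maps index to values directly.

```python
import pandas as pd

mydf = pd.DataFrame([{'sectionId': 'f0910b98', 'xml': '<p/p>'},
                     {'sectionId': 'f0345b98', 'xml': '<a/a>'}])
mydf.set_index('sectionId', inplace=True)

# Series.to_dict() maps the index (sectionId) to the values (xml).
result = mydf['xml'].to_dict()
print(result)  # {'f0910b98': '<p/p>', 'f0345b98': '<a/a>'}
```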
I have a pandas dataframe which has maybe 1000 columns. However, I do not need so many columns; I only need columns that match/start with/contain specific strings.
So lets say I have a dataframe columns like
df.columns =
HYTY, ABNH, CDKL, GHY#UIKI, BYUJI##hy, BYUJI#tt, BBNNII#5, FGATAY#J ....
I want to select only the columns whose names are like HYTY, CDKL, BYUJI* & BBNNI*.
So what I was trying to do is to create a list of regular expressions like:
import re
relst = ['HYTY', 'CDKL*', 'BYUJI*', 'BBNI*']
my_w_lst = [re.escape(s) for s in relst]
mask_pattrn = '|'.join(my_w_lst)
This creates a logical vector of True/False values saying whether each pattern is present. However, I don't understand how to get a dataframe containing only the selected columns.
Any help will be appreciated.
Using what you already have, you can pass your mask to filter:
df.filter(regex=mask_pattrn)
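A sketch with the sample column names from the question. Note that re.escape on 'CDKL*' would turn the glob star into a literal character, so plain prefixes anchored with '^' are assumed here instead:

```python
import pandas as pd

# Hypothetical frame reproducing the column names from the question.
df = pd.DataFrame(columns=['HYTY', 'ABNH', 'CDKL', 'GHY#UIKI',
                           'BYUJI##hy', 'BBNNII#5', 'FGATAY#J'])

# filter(regex=...) keeps columns whose name matches the pattern;
# '^' anchors the alternatives so each must match as a prefix.
mask_pattrn = '^(HYTY|CDKL|BYUJI|BBNNII)'
print(list(df.filter(regex=mask_pattrn).columns))
# ['HYTY', 'CDKL', 'BYUJI##hy', 'BBNNII#5']
```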
Use re.findall() on the column names. It will give you a list of matching columns to pass as df[mylist].
We can use str.startswith:
relst = ['CDKL', 'BYUJI', 'BBNI']
subdf = df.loc[:,df.columns.str.startswith(tuple(relst))|df.columns.isin(['HYTY'])]
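The same idea as a runnable sketch; the last prefix is adjusted to BBNNII so it actually matches the sample column names from the question:

```python
import pandas as pd

# Hypothetical frame reproducing the column names from the question.
df = pd.DataFrame(columns=['HYTY', 'ABNH', 'CDKL', 'GHY#UIKI',
                           'BYUJI##hy', 'BBNNII#5', 'FGATAY#J'])

# startswith accepts a tuple of prefixes; isin catches the exact name HYTY.
relst = ['CDKL', 'BYUJI', 'BBNNII']
subdf = df.loc[:, df.columns.str.startswith(tuple(relst)) | df.columns.isin(['HYTY'])]
print(list(subdf.columns))  # ['HYTY', 'CDKL', 'BYUJI##hy', 'BBNNII#5']
```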
I have an array of unique elements and a dataframe.
I want to find out if the elements in the array exist in all the rows of the dataframe.
P.S. I am new to Python.
This is the piece of code I've written.
for i in uniqueArray:
    for index, row in newDF.iterrows():
        if i in row['MKT']:
            # do something to find out if the element i exists in all rows
Also, this way of iterating is quite expensive, is there any better way to do the same?
Thanks in Advance.
Pandas allows you to filter a whole column as if it were Excel:
import pandas
df = pandas.DataFrame(tableData)
Imagine your column names are "Column1", "Column2"... etc.
df2 = df[ df["Column1"] == "ValueToFind"]
df2 now has only the rows that have "ValueToFind" in df["Column1"]. You can chain several filters together with the logical operators & (and) and | (or).
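A minimal sketch with made-up data, including a combined filter:

```python
import pandas as pd

# Hypothetical table data.
df = pd.DataFrame({"Column1": ["a", "b", "a"], "Column2": [1, 2, 3]})

# Single filter.
df2 = df[df["Column1"] == "a"]

# Combined filter: parentheses around each condition are required with & and |.
df3 = df[(df["Column1"] == "a") & (df["Column2"] > 1)]

print(df2["Column2"].tolist())  # [1, 3]
print(df3["Column2"].tolist())  # [3]
```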
You can try:
for i in uniqueArray:
    if newDF['MKT'].str.contains(i).any():
        # do your task
You can use the isin() method of a pd.Series object.
Assuming you have a data frame named df, you can check whether your column 'MKT' includes any items of uniqueArray:
new_df = df[df.MKT.isin(uniqueArray)].copy()
new_df will only contain the rows where the value of MKT is contained in uniqueArray.
Now do your work on new_df, and join/merge/concat it with the former df as you wish.
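A minimal sketch of the isin() approach, with made-up market codes:

```python
import pandas as pd

# Hypothetical data: an MKT column plus an array of wanted values.
df = pd.DataFrame({"MKT": ["US", "EU", "JP", "US"]})
uniqueArray = ["US", "JP"]

# isin keeps only the rows whose MKT value appears in uniqueArray.
new_df = df[df.MKT.isin(uniqueArray)].copy()
print(new_df["MKT"].tolist())  # ['US', 'JP', 'US']
```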