Error while converting tuples to Pandas DataFrame - python

When I try to convert tuples to a pandas DataFrame, I get the following error:
DataFrame constructor not properly called!
I am using the following code:
columnlist=["Timestamp","Price","Month","Day","DayofWeek","tDaysleftMonth","tDayinMonth","tDayinWeek"]
tickerData=pd.DataFrame(tickerDataRaw,columns=columnlist)
The data was loaded into tuples from a MySQL database; please find a screenshot of the data I am trying to convert.

I think you can use DataFrame.from_records after converting the tuples to a list:
import pandas as pd
tuples = ((1,2,3),(4,6,7),(7,3,6),(8,2,7),(4,6,3),(7,3,6))
columnlist = ['a','b','c']
df = pd.DataFrame.from_records(list(tuples), columns=columnlist)
print (df)
a b c
0 1 2 3
1 4 6 7
2 7 3 6
3 8 2 7
4 4 6 3
5 7 3 6
Another solution, using only the DataFrame constructor:
import pandas as pd
tuples = ((1,2,3),(4,6,7),(7,3,6),(8,2,7),(4,6,3),(7,3,6))
columnlist = ['a','b','c']
df = pd.DataFrame(list(tuples), columns=columnlist)
print (df)
a b c
0 1 2 3
1 4 6 7
2 7 3 6
3 8 2 7
4 4 6 3
5 7 3 6
EDIT:
If you check the DataFrame documentation for the data parameter:
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
Dict can contain Series, arrays, constants, or list-like objects
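Since the documented inputs include a dict whose values are list-like, another route is to transpose the row tuples into columns first and pass a dict of columns. This is a minimal sketch reusing sample tuples like the ones above; it is an alternative to the two solutions shown, not a replacement for them.

```python
import pandas as pd

tuples = ((1, 2, 3), (4, 6, 7), (7, 3, 6))
columnlist = ["a", "b", "c"]

# zip(*tuples) transposes the row tuples into column tuples: (1, 4, 7), (2, 6, 3), (3, 7, 6)
columns_data = {name: list(col) for name, col in zip(columnlist, zip(*tuples))}

# A dict of equal-length lists is one of the documented inputs for `data`
df = pd.DataFrame(columns_data)
print(df)
```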

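According to the DataFrame documentation page, data is required to be a
numpy ndarray (structured or homogeneous), dict, or DataFrame.
The easiest way to resolve your problem is simply to load your data into a NumPy array, and it should work fine. (The error in the session below comes from older pandas releases; current versions also accept a tuple of tuples directly.)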
>>> import numpy as np
>>> import pandas as pd
>>> tuples = ((1,2,3),(1,2,3),(1,2,3))
>>> columns = ["A", "B", "C"]
>>> pd.DataFrame(tuples, columns=columns)
PandasError: DataFrame constructor not properly called!
>>> pd.DataFrame(np.array(tuples), columns=columns)
A B C
0 1 2 3
1 1 2 3
2 1 2 3
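Because the question's rows come from a MySQL query, it can be convenient to take the column names from the cursor instead of typing them by hand. The sketch below is hedged: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection (an assumption; DB-API drivers expose the same `fetchall()` and `cursor.description`), and the `prices` table and its rows are made up for illustration.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the MySQL source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (Timestamp TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("2017-01-02", 10.5), ("2017-01-03", 10.7)],  # hypothetical sample rows
)

cursor = conn.execute("SELECT Timestamp, Price FROM prices")
rows = cursor.fetchall()  # a list of tuples, one per row

# DB-API cursors describe each result column; the first field is its name
columns = [desc[0] for desc in cursor.description]
df = pd.DataFrame.from_records(rows, columns=columns)
print(df)
conn.close()
```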

Related

Pandas solution to computing the maximum of two sums of two columns?

So I have a DataFrame with (amongst others) four columns with numerical values. I want to add a column containing the maximum of two sums, each obtained by adding a pair of columns (a + b and c + d).
My solution so far is:
from pandas import DataFrame
df = DataFrame(data={'text': ['a','b','c'], 'a':[1,2,3],'b':[2,3,4],'c':[5,4,2],'d':[-2,4,1]})
df['sum1'] = df['a'].add(df['b'])
df['sum2'] = df['c'].add(df['d'])
df['maxsum'] = df[['sum1','sum2']].max(axis=1)
which gives the desired result.
I am pretty sure there is a more concise way to do this.
There is nothing wrong with your approach. In fact, it is the approach I would take, if for no other reason than that it is easy to read and to see what you are doing. But if you are looking for another solution, here is one using numpy.ufunc.reduceat:
import pandas as pd
import numpy as np
# sample frame
df = pd.DataFrame(data={'text': ['a','b','c'], 'a':[1,2,3],'b':[2,3,4],'c':[5,4,2],'d':[-2,4,1]})
# we skip the first column and convert to an array - df[df.columns[1:]].values
# we specify the indices to slice - np.arange(len(df.columns[1:]))[::2]
# then find the max
df['max'] = np.max(np.add.reduceat(df[df.columns[1:]].values,
                                   np.arange(len(df.columns[1:]))[::2],
                                   axis=1),
                   axis=1)
text a b c d max
0 a 1 2 5 -2 3
1 b 2 3 4 4 8
2 c 3 4 2 1 7
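To see what `np.add.reduceat` does with the indices `[0, 2]`, here is a tiny sketch on the first two rows of the value columns: each index starts a slice that runs up to the next index (the last slice runs to the end), and each slice is summed.

```python
import numpy as np

# First two rows of columns a, b, c, d from the sample frame
values = np.array([[1, 2, 5, -2],
                   [2, 3, 4, 4]])

# Slices along axis 1: [0:2] -> a + b, [2:] -> c + d
pair_sums = np.add.reduceat(values, [0, 2], axis=1)
print(pair_sums.tolist())               # [[3, 3], [5, 8]]
print(pair_sums.max(axis=1).tolist())   # [3, 8]
```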
It is not much more concise, but instead of your current approach you can use a single assign call:
df = df.assign(sum1=df[['a', 'b']].sum(axis=1), sum2=df[['c', 'd']].sum(axis=1),
               maxsum=lambda df: df[['sum1', 'sum2']].max(axis=1))
text a b c d sum1 sum2 maxsum
0 a 1 2 5 -2 3 3 3
1 b 2 3 4 4 5 8 8
2 c 3 4 2 1 7 3 7
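If there are more than two pairs, the same idea generalises by reshaping the value columns into (rows, pairs, 2) and reducing twice. This is a sketch under one stated assumption: the columns to be added together sit next to each other in `value_cols` (a + b, c + d, ...).

```python
import pandas as pd

df = pd.DataFrame(data={'text': ['a', 'b', 'c'], 'a': [1, 2, 3], 'b': [2, 3, 4],
                        'c': [5, 4, 2], 'd': [-2, 4, 1]})

# Assumption: value columns come in adjacent pairs
value_cols = ['a', 'b', 'c', 'd']
pairs = df[value_cols].to_numpy().reshape(len(df), -1, 2)  # shape (rows, n_pairs, 2)

# Sum inside each pair, then take the largest pair sum per row
df['maxsum'] = pairs.sum(axis=2).max(axis=1)
print(df)
```

Unlike the versions above, this does not keep the intermediate sum1/sum2 columns.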

Put dataframe rows in front

I have a dataframe like this:
A B
1 2
3 4
5 6
I want to take its rows and put them side by side in a single row, like this:
A B A B A B
1 2 3 4 5 6
Is there any way I can do that?
I tried using iloc but could not figure out how to do this.
One option is to:
1. flatten the values into a single row, using the .values DataFrame property and NumPy's reshape;
2. build a new DataFrame whose column names are obtained by repeating the original column list with np.tile.
import numpy as np
import pandas as pd

pd.DataFrame(
    df.values.reshape(1, -1),
    columns=np.tile(df.columns.values, len(df)).tolist()
)
Output:
A B A B A B
0 1 2 3 4 5 6
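Repeated column labels such as A, B, A, B make later selection awkward, because `df['A']` then returns several columns. If unique labels are acceptable, a variant of the same flattening idea is to suffix each label with its original row number. This is a sketch; the `_0`, `_1` naming scheme is an arbitrary choice and not part of the question.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})

# Flatten row by row (row-major order): 1, 2, 3, 4, 5, 6
flat = df.to_numpy().ravel()

# Suffix each column label with its row number so every label stays unique
labels = [f"{col}_{i}" for i in range(len(df)) for col in df.columns]

out = pd.DataFrame([flat], columns=labels)
print(out)
```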

Pandas add a second level index to the columns using a list

I have a dataframe with column headings (and for my real data multi-level row indexes). I want to add a second level index to the columns based on a list I have.
import pandas as pd
data = {"apple": [7,5,6,4,7,5,8,6],
"strawberry": [3,5,2,1,3,0,4,2],
"banana": [1,2,1,2,2,2,1,3],
"chocolate" : [5,8,4,2,1,6,4,5],
"cake":[4,4,5,1,3,0,0,3]
}
df = pd.DataFrame(data)
food_cat = ["fv","fv","fv","j","j"]
I want something that looks like this (fv as the upper column level over apple, strawberry and banana, and j over chocolate and cake):
I tried to use the answers to "How to add a second level column header/index to dataframe by matching to dictionary values?", but couldn't get them working (and that approach isn't ideal, since I'd need to figure out how to build the dictionary automatically, and I don't have one).
I also tried adding the list as a row in the dataframe and converting that row to a second level index, as in this answer, using:
df.loc[len(df)] = food_cat
df = pd.MultiIndex.from_arrays(df.columns, df.iloc[len(df)-1])
but got the error:
Check if lengths of all arrays are equal or not,
TypeError: Input must be a list / sequence of array-likes.
I also tried using df = pd.MultiIndex.from_arrays(df.columns, np.array(food_cat)) with import numpy as np but got the same error.
I feel like this should be a simple task (it is for rows), and there are a lot of questions asked, but I was struggling to find something I could duplicate to adapt to my data.
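pd.MultiIndex.from_arrays expects a single list (or list-like) containing one array per level as its first argument, so wrap both arrays in one list and assign the result to df.columns: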
df.columns = pd.MultiIndex.from_arrays([food_cat, df.columns])
df
fv j
apple strawberry banana chocolate cake
0 7 3 1 5 4
1 5 5 2 8 4
2 6 2 1 4 5
3 4 1 2 2 1
4 7 3 2 1 3
5 5 0 2 6 0
6 8 4 1 4 0
7 6 2 3 5 3
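To show why the original attempt failed and how the two-level columns behave afterwards, here is a small sketch on a two-row version of the question's data. The failing call passes `df.columns` itself as the `arrays` argument, so pandas sees a sequence of plain strings rather than a sequence of arrays and raises a TypeError (the exact message may vary by pandas version). The level names `category` and `food` are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({"apple": [7, 5], "strawberry": [3, 5], "banana": [1, 2],
                   "chocolate": [5, 8], "cake": [4, 4]})
food_cat = ["fv", "fv", "fv", "j", "j"]

# The original form: each element of df.columns is a string, not an array
try:
    pd.MultiIndex.from_arrays(df.columns, food_cat)
    failed = False
except TypeError:
    failed = True

# Working form: one list holding both levels, outer level first
df.columns = pd.MultiIndex.from_arrays([food_cat, df.columns],
                                       names=["category", "food"])

fv = df["fv"]                                # only apple, strawberry, banana
j = df.xs("j", axis=1, level="category")     # only chocolate, cake
print(fv)
print(j)
```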

Convert multi-dim list in one column in python pandas

I would like to know whether I can get some help in "translating" a multi dim list in a single column of a frame in pandas.
I found help here on translating a multi-dimensional list into a frame with multiple columns, but I need the data in a single column.
Suppose I have the following list of lists:
x=[[1,2,3],[4,5,6]]
If I create a frame, I get:
frame=pd.DataFrame(x)
0 1 2
1 2 3
4 5 6
But my desired outcome is:
0
1
2
3
4
5
6
with 0 as the column header.
Of course I can get the result with a for loop, but in my view that takes too much time. Is there a pythonic/pandas way to do it?
Thanks for helping me.
You can use np.concatenate:
import numpy as np
import pandas as pd
x=[[1,2,3],[4,5,6]]
frame=pd.DataFrame(np.concatenate(x))
print(frame)
Output:
0
0 1
1 2
2 3
3 4
4 5
5 6
First, flatten the nested lists, then pass the result to the DataFrame constructor:
df = pd.DataFrame([z for y in x for z in y])
Or:
from itertools import chain
df = pd.DataFrame(list(chain.from_iterable(x)))
print (df)
0
0 1
1 2
2 3
3 4
4 5
5 6
If you use NumPy, you can use the ravel() method:
pd.DataFrame(np.array(x).ravel())
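The three answers give the same single column for the example list. One difference worth knowing: `np.concatenate` and `chain.from_iterable` also handle inner lists of unequal length, while `np.array(x).ravel()` relies on `x` forming a regular 2D array. A sketch comparing them (the `ragged` list is a made-up input for illustration):

```python
from itertools import chain

import numpy as np
import pandas as pd

x = [[1, 2, 3], [4, 5, 6]]
ragged = [[1, 2], [3, 4, 5]]  # hypothetical input with unequal inner lengths

via_concat = pd.DataFrame(np.concatenate(x))
via_chain = pd.DataFrame(list(chain.from_iterable(x)))
via_ravel = pd.DataFrame(np.array(x).ravel())

# np.concatenate joins 1D pieces of any length, so ragged input is fine;
# np.array(ragged) would not give a 2D array, so ravel() is not suitable there
ragged_df = pd.DataFrame(np.concatenate(ragged))
print(ragged_df)
```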

How can I sort one column without changing other columns in pandas?

Example:
Current df looks like:
df=
A B
1 5
2 6
3 8
4 1
I want the resulting df to be like this (B is sorted and A remains untouched):
df=
A B
1 8
2 6
3 5
4 1
You need to bypass one of pandas' internal safety mechanisms, index alignment, which keeps the data consistent. Assigning a 1D NumPy array or a plain Python list does the trick: neither has an index, so pandas has nothing to align on:
df['B'] = df['B'].sort_values(ascending=False).values
or
df['B'] = df['B'].sort_values(ascending=False).tolist()
both yield:
In [77]: df
Out[77]:
A B
0 1 8
1 2 6
2 3 5
3 4 1
You can also do this:
df['B'] = sorted(df['B'].tolist())[::-1]
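To make the alignment point concrete, the sketch below uses the question's sample data and assigns the sorted values twice: once as a Series, which pandas realigns by index so the original order comes back, and once as a plain NumPy array via `Series.to_numpy()`, which keeps the sorted order.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 8, 1]})

sorted_b = df["B"].sort_values(ascending=False)  # still carries the old index labels

# Series assignment aligns on the index, which undoes the sort
df_aligned = df.assign(B=sorted_b)

# A NumPy array has no index, so the sorted order is kept as-is
df_sorted = df.assign(B=sorted_b.to_numpy())
print(df_sorted)
```

For a plain list, `sorted(df['B'].tolist(), reverse=True)` is equivalent to the `[::-1]` version above.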
