I'm trying to export the following dictionary to Excel via a pandas DataFrame:
results = {'ZF_DTSPP': [735.0500558302846,678.5413714617252,772.0300704610595,722.254907241738,825.2955175305726], 'ZF_DTSPPG': [732.0500558302845,637.4786326591071,655.8462451037873,721.404907241738,821.8455175305724]}
This is my code:
df = pd.DataFrame(data=results, index=[5, 2])
df = (df.T)
print(df)
df.to_excel('dict1.xlsx')
Somehow I always receive the following error:
"ValueError: Shape of passed values is (5, 2), indices imply (2, 2)".
What can I do? How do I need to adapt the index?
Is there a way to compare the different values of "ZF_DTSPP" and "ZF_DTSPPG" directly with python?
You can use pd.DataFrame.from_dict, as shown in the pandas from_dict documentation; then your code becomes:
df = pd.DataFrame.from_dict(results)
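As for comparing "ZF_DTSPP" and "ZF_DTSPPG" directly: since the two lists line up element-wise, you can compare the columns directly. A minimal sketch (the column names 'difference' and 'G_is_lower' are just my labels, not from the question):

```python
import pandas as pd

# The two result lists from the question
results = {
    'ZF_DTSPP': [735.0500558302846, 678.5413714617252, 772.0300704610595,
                 722.254907241738, 825.2955175305726],
    'ZF_DTSPPG': [732.0500558302845, 637.4786326591071, 655.8462451037873,
                  721.404907241738, 821.8455175305724],
}

df = pd.DataFrame.from_dict(results)

# Column-wise comparison: element-wise difference and a boolean check
df['difference'] = df['ZF_DTSPP'] - df['ZF_DTSPPG']
df['G_is_lower'] = df['ZF_DTSPPG'] < df['ZF_DTSPP']
print(df)
# df.T.to_excel('dict1.xlsx')  # writing to Excel requires openpyxl
```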
I'm trying to run a function over many partitions of a Dask dataframe. The code requires unpacking tuples and works well with Pandas but not with Dask map_partitions. The data corresponds to lists of tuples, where the length of the lists can vary, but the tuples are always of a known fixed length.
import dask.dataframe as dd
import pandas as pd
def func(df):
    for index, row in df.iterrows():
        tuples = row['A']
        for t in tuples:
            x, y = t
            # Do more stuff
# Create Pandas dataframe
# Each list may have a different length, tuples have fixed known length
df = pd.DataFrame({'A': [[(1, 1), (3, 4)], [(3, 2)]]})
# Pandas to Dask
ddf = dd.from_pandas(df, npartitions=2)
# Run function over Pandas dataframe
func(df)
# Run function over Dask dataframe
ddf.map_partitions(func).compute()
Here, the Pandas version runs with no issues. However, the Dask one raises the error:
ValueError: Metadata inference failed in `func`.
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
ValueError('not enough values to unpack (expected 2, got 1)')
In my original function, I'm using these tuples as auxiliary variables, and the data which is finally returned is completely different so using meta doesn't fix the problem. How can I unpack the tuples?
When you use map_partitions without specifying meta, Dask will try to run the function to infer what the output is. This can cause problems if your function is not compatible with the sample dataframe used; you can inspect this sample dataframe with ddf._meta_nonempty (in this case it will return a column of foo).
An easy fix in this case is to provide meta; it's okay for the returned data to be of a different format. For example, if each returned result is a list, you can provide meta=list:
import dask.dataframe as dd
import pandas as pd
def func(df):
    for index, row in df.iterrows():
        tuples = row['A']
        for t in tuples:
            x, y = t
    return [1, 2, 3]
df = pd.DataFrame({'A': [[(1, 1), (3, 4)], [(3, 2)]]})
ddf = dd.from_pandas(df, npartitions=2)
ddf.map_partitions(func, meta=list).compute()
Another approach is to make your function compatible with the sample dataframe used. The sample dataframe has an object column, but it contains the string foo rather than a list of tuples, so it cannot be unpacked as a 2-tuple. Modifying your function to accept non-tuple values (with x, *y = t) will make it work:
import dask.dataframe as dd
import pandas as pd
def func(df):
    for index, row in df.iterrows():
        tuples = row['A']
        for t in tuples:
            x, *y = t
    return [1, 2, 3]
df = pd.DataFrame({'A': [[(1, 1), (3, 4)], [(3, 2)]]})
ddf = dd.from_pandas(df, npartitions=2)
# notice that no meta is specified here
ddf.map_partitions(func).compute()
I have a Python list of strings which I want to convert into a pandas DataFrame with predefined columns. I have tried the following code, but it shows an error:
import pandas as pd
list = ['jack', '9860', 'datasc', 'vill','0', 'stack']
df = pd.DataFrame(list, columns= ['name', 'no','job'])
ValueError: Shape of passed values is (1, 6), indices imply (3, 6)
Don't use list as a variable name, because it shadows the Python builtin.
Convert the list to a NumPy array and reshape it:
import numpy as np

L = ['jack', '9860', 'datasc', 'vill', '0', 'stack']
df = pd.DataFrame(np.array(L).reshape(-1, 3), columns=['name', 'no', 'job'])
print(df)
name no job
0 jack 9860 datasc
1 vill 0 stack
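If you prefer to avoid NumPy, the same reshape can be done with plain list slicing (a sketch; the chunk size 3 matches the three columns):

```python
import pandas as pd

L = ['jack', '9860', 'datasc', 'vill', '0', 'stack']
# Split the flat list into consecutive chunks of 3
rows = [L[i:i + 3] for i in range(0, len(L), 3)]
df = pd.DataFrame(rows, columns=['name', 'no', 'job'])
print(df)
```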
for L, M in laundry1['latitude'], laundry1['longitude']:
    print('latitude:-')
    print(L)
    print('longitude:-')
    print(M)
I am trying to iterate over two columns of a DataFrame, assigning their values to L and M and printing them, but it shows the error "too many values to unpack (expected 2)".
sample output:
latitude:-
22.1449787
18.922290399999998
22.1544736
22.136872
22.173595499999998
longitude:-
-101.0056829
-99.234332
-100.98580909999998
-100.9345736
-100.9946027
Use zip:
for L, M in zip(laundry1['latitude'], laundry1['longitude']):
    print('latitude:-')
    print(L)
    print('longitude:-')
    print(M)
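To see why zip fixes it: without zip, the for statement tries to unpack the two Series objects themselves, while zip yields one (latitude, longitude) pair per row. A minimal sketch with made-up values:

```python
import pandas as pd

laundry1 = pd.DataFrame({'latitude': [22.14, 18.92],
                         'longitude': [-101.0, -99.23]})

# zip pairs the columns element-wise, one tuple per row
pairs = list(zip(laundry1['latitude'], laundry1['longitude']))
print(pairs)  # [(22.14, -101.0), (18.92, -99.23)]
```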
Pandas has its own iteration methods. If you just want to iterate over a DataFrame's values without modifying it, I suggest using the itertuples method:
import pandas as pd
values = [[22.1449787, -101.0056829],
          [18.922290399999998, -99.234332],
          [22.1544736, -100.98580909999998],
          [22.136872, -100.9345736],
          [22.173595499999998, -100.9946027]]
df = pd.DataFrame(values, columns=['latitude', 'longitude'])

for row in df.itertuples():
    print(row.latitude)
    print(row.longitude)
I'm trying to create a DataFrame in Pandas with the following code:
df_coefficients = pd.DataFrame(data = log_model.coef_, index = X.columns,
columns = ['Coefficients'])
However, I keep getting the following error:
Shape of passed values is (5, 1), indices imply (1, 5)
The values and indices are as follows:
Indices =
Index([u'Daily Time Spent on Site', u'Age', u'Area Income',
u'Daily Internet Usage', u'Male'],
dtype='object')
Values =
array([[ -4.45816498e-02, 2.18379839e-01, -7.63621392e-06,
-2.45264007e-02, 1.13334440e-03]])
How would I fix this? I've built the same type of table before and I've never gotten this error.
Any help would be appreciated.
Thanks
It looks like your index and values arrays have incompatible shapes. As you can see, the Index array has single brackets while the Values array has double brackets: the index has length 5 (shape (5,)), while the Values array has shape (1, 5).
If you enter Values as written in the question:
Values =
array([[ -4.45816498e-02, 2.18379839e-01, -7.63621392e-06,
-2.45264007e-02, 1.13334440e-03]])
and call Values.shape, it returns:
Values.shape
(1,5)
If instead you set Values as:
Values = np.array([ -4.45816498e-02, 2.18379839e-01, -7.63621392e-06,
-2.45264007e-02, 1.13334440e-03])
then the shape of Values will be (5,), which fits the index array.
Your data has five columns and one row instead of one column and five rows. Just use the transposed version of it with .T:
df_coefficients = pd.DataFrame(data = log_model.coef_.T, index = X.columns,
columns = ['Coefficients'])
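A minimal reproduction of the fix (the coefficient values are synthetic stand-ins shaped like log_model.coef_, which scikit-learn returns as a 2-D (1, n_features) array):

```python
import numpy as np
import pandas as pd

# Stand-in for log_model.coef_: shape (1, 5)
coef = np.array([[-4.45816498e-02, 2.18379839e-01, -7.63621392e-06,
                  -2.45264007e-02, 1.13334440e-03]])
cols = ['Daily Time Spent on Site', 'Age', 'Area Income',
        'Daily Internet Usage', 'Male']

# coef.T has shape (5, 1), which matches the 5-element index
df_coefficients = pd.DataFrame(data=coef.T, index=cols,
                               columns=['Coefficients'])
print(df_coefficients.shape)  # (5, 1)
```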
I am trying to create a pandas dataframe from a list of image files (.png files)
samples = []
img = misc.imread('a.png')
X = img.reshape(-1, 3)
samples.append(X)
I added multiple .png files to samples like this. I am then trying to create a pandas DataFrame from it:
df = pd.DataFrame(samples)
It throws the error "ValueError: Must pass 2-d input". What is wrong here? Is it really possible to convert a list of image files to a pandas DataFrame? I am totally new to pandas, so do not mind if this looks silly.
For example:
X = [[1, 2, 3, 4], [2, 3, 4, 5]]
df = pd.DataFrame(X)
gives me a nice DataFrame of 2 samples as expected (2 rows, 4 columns), but it does not happen with the image files.
You can use:
df = pd.DataFrame.from_records(samples)
If you want to create a DataFrame from a list, the easiest way to do this is to create a pandas.Series, like the following example:
import pandas as pd
samples = ['a','b','c']
s = pd.Series(samples)
print(s)
output:
0 a
1 b
2 c
X = img.reshape(-1, 3)
samples.append(X)
So X is a 2D array of shape (number_of_pixels, 3), which makes samples a 3D list of shape (number_of_images, number_of_pixels, 3). So the error you're getting ("ValueError: Must pass 2-d input") is legitimate.
What you probably want is:
X = img.flatten()
or
X = img.reshape(-1)
Either is going to give you X of shape (number_of_pixels*3,) and samples of shape (number_of_images, number_of_pixels*3).
You will probably want to take extra care to ensure that all images have the same number of pixels and channels.
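A sketch of the flattening approach, with synthetic arrays standing in for the misc.imread results (assuming all images share the same shape):

```python
import numpy as np
import pandas as pd

# Stand-ins for misc.imread('...') results: two 4x4 RGB images
images = [np.zeros((4, 4, 3), dtype=np.uint8),
          np.ones((4, 4, 3), dtype=np.uint8)]

# Flatten each image to 1-D so every image becomes one row
samples = [img.reshape(-1) for img in images]  # each has shape (4*4*3,) = (48,)
df = pd.DataFrame(samples)
print(df.shape)  # (2, 48)
```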
You can use reshape(-1):
x.append((img[::2, ::2] / 255.0).reshape(-1))
df = pd.DataFrame(x)