I'm trying to convert a data dictionary with the structure below:
{'name': array(['Ben', 'Sean', 'Fred']),
'age': array([22, 16, 35]),
'marks': array([98, 75, 60]),
'result': array(['HD', 'D', 'C'])}
I then need to filter the dictionary so that only name, marks and result end up in a new NumPy array that I can plot on a graph. (I can do the plotting, but I can't for the life of me filter the dictionary and then convert it to NumPy.)
Let's assume your dictionary is something like this.
data = {
    'name': ['Ben','Sean','Fred'],
    'age': [22, 16, 35],
    'marks': [98, 75, 60],
    'result': ['HD','D','C']
}
You can iterate over the dictionary, append the desired values to a list, and then convert that list into a NumPy array. Here I am using all of the keys (name, age, marks, result), but you can filter out some keys if you like, for example by wrapping the append in a condition such as if key not in ['age']:
import numpy as np
# `data` is used instead of `dict` so the built-in dict type is not shadowed
data_list = []
for key, val in data.items():
    data_list.append(val)  # one row per key
numpy_array = np.array(data_list)
transpose = numpy_array.T  # one row per person
transpose_list = transpose.tolist()
The end result will be following:
[['Ben', '22', '98', 'HD'],
['Sean', '16', '75', 'D'],
['Fred', '35', '60', 'C']]
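The filtering step mentioned above can be sketched as follows. This is only a sketch: the names data, wanted, table and marks are assumptions made for illustration, and the dictionary is rebuilt from the values shown in the question.

```python
import numpy as np

# Sample data rebuilt from the question; the variable names are assumptions.
data = {
    'name': np.array(['Ben', 'Sean', 'Fred']),
    'age': np.array([22, 16, 35]),
    'marks': np.array([98, 75, 60]),
    'result': np.array(['HD', 'D', 'C']),
}

wanted = ['name', 'marks', 'result']  # keys to keep, in this order

# Keep only the wanted keys; .tolist() turns each array into plain Python values.
rows = [np.asarray(data[key]).tolist() for key in wanted]

# One row per key, then transpose so each row describes one person.
# Mixing text and numbers makes NumPy store every value as a string.
table = np.array(rows).T

# For plotting, take the numeric column straight from the dictionary
# so the marks stay numeric.
marks = np.asarray(data['marks'])

print(table)
print(marks)
```

Because the combined array holds strings, it is usually easier to plot from the individual columns (names on one axis, marks on the other) than from the combined table.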
You can try pandas
import pandas as pd
d = {
'name': ['Ben','Sean','Fred'],
'age': [22, 16, 35],
'marks': [98, 75, 60],
'result': ['HD','D','C']
}
df = pd.DataFrame(d)
result = df[['name', 'marks', 'result']].T.values
print(type(result))
print(result)
<class 'numpy.ndarray'>
[['Ben' 'Sean' 'Fred']
[98 75 60]
['HD' 'D' 'C']]
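If one row per person is preferred over one row per column, the transpose can simply be dropped. A small sketch along the same lines, using to_numpy() (the variable names per_person and marks are assumptions):

```python
import pandas as pd

d = {
    'name': ['Ben', 'Sean', 'Fred'],
    'age': [22, 16, 35],
    'marks': [98, 75, 60],
    'result': ['HD', 'D', 'C'],
}
df = pd.DataFrame(d)

# Select the three columns; without .T each row is one person.
# Mixed column types give an object array, so the marks stay numbers.
per_person = df[['name', 'marks', 'result']].to_numpy()

# A single numeric column keeps its integer dtype, which is handy for plotting.
marks = df['marks'].to_numpy()

print(per_person)
print(marks)
```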
Related
import pandas as pd
df = pd.DataFrame({'Name': ['John', 'John', 'Sara', 'Sara', 'Sara', 'Peter', 'Peter'],
'Age': [11, 22, 33, 44, 55, 66, 77]})
Assume that I have a data frame which is given above. My goal is to convert this data frame to the following dictionary format below. Does anybody know a convenient way to solve this problem? Thanks in advance.
# Expected Output:
out_df = {'John':[11, 22], 'Sara': [33, 44, 55], 'Peter': [66, 77]}
First aggregate list and then convert Series to dictionary:
d = df.groupby('Name')['Age'].agg(list).to_dict()
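A runnable version of the same idea, as a sketch. By default groupby sorts the group keys, so sort=False is passed here on the assumption that the order of first appearance (John, Sara, Peter) should be kept in the dictionary:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'John', 'Sara', 'Sara', 'Sara', 'Peter', 'Peter'],
                   'Age': [11, 22, 33, 44, 55, 66, 77]})

# Collect the ages of each name into a list, then turn the Series into a dict.
# sort=False keeps the names in the order they first appear.
d = df.groupby('Name', sort=False)['Age'].agg(list).to_dict()

print(d)
```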
I would like to get the content of a specific row without the column headers. I tried df.iloc[row number], but it didn't give me the result I expected.
My code is below:
import pandas as pd
df = pd.DataFrame({
'first_name': ['John', 'Jane', 'Marry', 'Victoria', 'Gabriel', 'Layla'],
'last_name': ['Smith', 'Doe', 'Jackson', 'Smith', 'Brown', 'Martinez'],
'age': [34, 29, 37, 52, 26, 32]},
)
df.head()
df_temp = df.loc[2]
The result I get is:
first_name      Marry
last_name     Jackson
age                37
Name: 2, dtype: object
I expected it to give me a list, something like this:
['Marry', 'Jackson','37']
Any idea how to do this? Could you please advise?
Well, there are many functions in pandas that could help you do this; to_string() and values are two of them.
So if you do something like
import pandas as pd
df = pd.DataFrame({
'first_name': ['John', 'Jane', 'Marry', 'Victoria', 'Gabriel', 'Layla'],
'last_name': ['Smith', 'Doe', 'Jackson', 'Smith', 'Brown', 'Martinez'],
'age': [34, 29, 37, 52, 26, 32]},
)
df.head()
df_temp = df.loc[2].to_string()
print(df_temp)
you will get an output like this for your given code:
first_name      Marry
last_name     Jackson
age                37
However, because you want a list, you can just use the values attribute, which returns the row as a NumPy array. Here's your updated code:
import pandas as pd
df = pd.DataFrame({
'first_name': ['John', 'Jane', 'Marry', 'Victoria', 'Gabriel', 'Layla'],
'last_name': ['Smith', 'Doe', 'Jackson', 'Smith', 'Brown', 'Martinez'],
'age': [34, 29, 37, 52, 26, 32]},
)
df.head()
df_temp = df.loc[2].values
print(df_temp)
which gives you the output you probably want:
['Marry' 'Jackson' 37]
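Note that values returns a NumPy array rather than a Python list. If a plain list is needed, a minimal sketch is to call tolist() on the row (the names row and row_as_text are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'first_name': ['John', 'Jane', 'Marry', 'Victoria', 'Gabriel', 'Layla'],
    'last_name': ['Smith', 'Doe', 'Jackson', 'Smith', 'Brown', 'Martinez'],
    'age': [34, 29, 37, 52, 26, 32]},
)

# Row at position 2 as a plain Python list, without the column labels.
row = df.iloc[2].tolist()

# If every entry should be text, as in the expected output of the question,
# convert each value explicitly.
row_as_text = [str(value) for value in row]

print(row)
print(row_as_text)
```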
I have a big pandas dataframe (about 150000 rows). I tried groupby('id'), but it returns (key, group) tuples. I just need a list of dataframes, which I then convert into NumPy array batches to feed into an autoencoder (like this https://www.datacamp.com/community/tutorials/autoencoder-keras-tutorial but 1D).
So I have a pandas dataset:
data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'John', 'John', 'John', 'Krish'], 'Age': [20, 21, 19, 18, 18, 18, 18, 18],'id': [1, 1, 2, 2, 3, 3, 3, 3]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
df.head(10)
I need the output below (just a list of pandas dataframes). The rows must also keep their original, unsorted order; this is important because the data is a time series.
data1 = {'Name': ['Tom', 'Joseph'], 'Age': [20, 21],'id': [1, 1]}
data2 = {'Name': ['Krish', 'John', ], 'Age': [19, 18, ],'id': [2, 2]}
data3 = {'Name': ['John', 'John', 'John', 'Krish'], 'Age': [18, 18, 18, 18],'id': [3, 3, 3, 3]}
pd_1 = pd.DataFrame(data1)
pd_2 = pd.DataFrame(data2)
pd_3 = pd.DataFrame(data3)
array_list = [pd_1,pd_2,pd_3]
array_list
How can I split the dataframe?
Or you can try:
array_list = df.groupby(df.id.values).agg(list).to_dict('records')
Output:
[{'Name': ['Tom', 'Joseph'], 'Age': [20, 21], 'id': [1, 1]},
{'Name': ['Krish', 'John'], 'Age': [19, 18], 'id': [2, 2]},
{'Name': ['John', 'John', 'John', 'Krish'],
'Age': [18, 18, 18, 18],
'id': [3, 3, 3, 3]}]
UPDATE:
If you need a dataframe list:
df_list = [g for _,g in df.groupby('id')]
#OR
df_list = [pd.DataFrame(i) for i in df.groupby(df.id.values).agg(list).to_dict('records')]
To reset the index of each dataframe:
df_list = [g.reset_index(drop=True) for _,g in df.groupby('id')]
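Putting the pieces together for the autoencoder use case, here is a sketch under a few assumptions: sort=False is used so the groups come out in order of first appearance, and only the Age column is treated as the numeric feature (the question does not say which columns feed the model). The names df_list and batches are illustrative.

```python
import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John', 'John', 'John', 'John', 'Krish'],
        'Age': [20, 21, 19, 18, 18, 18, 18, 18],
        'id': [1, 1, 2, 2, 3, 3, 3, 3]}
df = pd.DataFrame(data)

# One dataframe per id. Rows keep their original order inside each group,
# and sort=False keeps the groups in order of first appearance.
df_list = [g.reset_index(drop=True) for _, g in df.groupby('id', sort=False)]

# One NumPy array per group, shaped (rows_in_group, n_features).
# Using only 'Age' as the feature is an assumption for this sketch.
batches = [g[['Age']].to_numpy() for g in df_list]

for batch in batches:
    print(batch.shape)
```

The groups have different lengths here, so the batches cannot be stacked into a single array without padding or truncating them first.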
Let us group on id and use to_dict with the list orientation to prepare the records per id:
[g.to_dict('list') for _, g in df.groupby('id', sort=False)]
[{'Name': ['Tom', 'Joseph'], 'Age': [20, 21], 'id': [1, 1]},
{'Name': ['Krish', 'John'], 'Age': [19, 18], 'id': [2, 2]},
{'Name': ['John', 'John', 'John', 'Krish'], 'Age': [18, 18, 18, 18], 'id': [3, 3, 3, 3]}]
I am not sure what exactly you need, but does something like this work for you?
df = df.set_index("id")
[df.loc[i].to_dict("list") for i in df.index.unique()]
Or, if you really want to keep the id column in the result (run this on the original dataframe, without set_index):
[df.query(f"id == {i}").to_dict("list") for i in df.id.unique()]
If you want to create new DataFrames storing the values:
(Previous answers are more relevant if you want to create a list)
This can be solved by iterating over each id with a for loop and creating a new dataframe on every iteration.
I refer you to #40498463 and the other answers for the usage of the groupby() function. Please note that I have changed the name of the id column to Id.
for Id, group in df.groupby("Id"):
    # Build the variable name, e.g. "df1" for Id 1
    new_name = "df" + str(Id)
    # Create a variable with that name holding this group's rows
    exec('{} = pd.DataFrame(group)'.format(new_name))
Output:
df1
     Name  Age  Id
0     Tom   20   1
1  Joseph   21   1
df2
    Name  Age  Id
2  Krish   19   2
3   John   18   2
df3
    Name  Age  Id
4   John   18   3
5   John   18   3
6   John   18   3
7  Krish   18   3
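As an alternative to creating variables with exec, the groups can be stored in a dictionary keyed by the same names. This is a different technique from the one above, shown only as a sketch; the dictionary name frames is an assumption.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Joseph', 'Krish', 'John', 'John', 'John', 'John', 'Krish'],
                   'Age': [20, 21, 19, 18, 18, 18, 18, 18],
                   'Id': [1, 1, 2, 2, 3, 3, 3, 3]})

# Map a name such as "df1" to the rows that share Id 1, and so on.
frames = {"df" + str(key): group for key, group in df.groupby("Id")}

print(frames["df3"])
```

Looking a group up with frames["df2"] avoids generating variable names at run time, which tends to be easier to debug.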
Okay, so I have a dataframe. Each element of column 'z' is a list of dictionaries.
For example, row two of column 'z' looks like this:
[ {'name': 'Tom', 'hw': [180, 79]},
{'name': 'Mark', 'hw': [119, 65]} ]
I would like it to just contain the 'name' values, in this case the element would be Tom and Mark without the 'hw' values.
I've tried converting it into a list, then removing every second element, but I lost which values came from the same row. Not every row has the same number of elements in it, some have 2 names, some might have 4.
One way using list comprehension with dict.get:
Example
df = pd.DataFrame({'z': [[{'name': 'Tom', 'hw': [180, 79]},
{'name': 'Mark', 'hw': [119, 65]}]]})
df['name'] = [[d.get('name') for d in x] for x in df['z']]
[out]
z name
0 [{'name': 'Tom', 'hw': [180, 79]}, {'name': 'M... [Tom, Mark]
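The same comprehension also copes with rows that hold a different number of dictionaries, which the question mentions. A sketch follows; the second row (Ann) is made-up sample data added only for illustration, and the explode step is an optional extra.

```python
import pandas as pd

# The first row comes from the question; the second row is hypothetical sample data.
df = pd.DataFrame({'z': [
    [{'name': 'Tom', 'hw': [180, 79]}, {'name': 'Mark', 'hw': [119, 65]}],
    [{'name': 'Ann', 'hw': [165, 58]}],
]})

# One list of names per row; dict.get returns None if a dict has no 'name' key.
df['name'] = [[d.get('name') for d in row] for row in df['z']]

# Optional: one name per line, with the index still showing the source row.
flat = df['name'].explode()

print(df['name'])
print(flat)
```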
Let us use pandas' Series.str.get. This works when each element of the column (named col here) is a single dictionary:
df['name']=df.col.str.get('name')
df
col name
0 {'name': 'Tom', 'hw': [180, 79]} Tom
1 {'name': 'Mark', 'hw': [119, 65]} Mark
I have a dataframe. I would like some of the data to be converted to a list of list. The columns I'm interested in are the index, Name, and Births. My code works, but it seems inefficient and for some reason the letter L is added to the end of each index.
My code:
import pandas as pd
data = [['Bob', 968, 'Male'], ['Jessica', 341, 'Female'], ['Mary', 77, 'Female'], ['John', 578, 'Male'], ['Mel', 434, 'Female']]
headers = ['Names', 'Births', 'Gender']
df = pd.DataFrame(data = data, columns=headers)
indexes = df.index.values.tolist()
mylist = [[x] for x in indexes]
for x in mylist:
    x.extend([df.ix[x[0],'Names'], df.ix[x[0],'Births']])
print mylist
Desired Output:
[[0, 'Bob', 968], [1, 'Jessica', 341], [2, 'Mary', 77], [3, 'John', 578], [4, 'Mel', 434]]
Why not just use .values.tolist() as you mentioned?
import pandas as pd
# your data
# =================================================
data = [['Bob', 968, 'Male'], ['Jessica', 341, 'Female'], ['Mary', 77, 'Female'], ['John', 578, 'Male'], ['Mel', 434, 'Female']]
headers = ['Names', 'Births', 'Gender']
df = pd.DataFrame(data = data, columns=headers)
# nested list
# ============================
df.reset_index()[['index', 'Names', 'Births']].values.tolist()
Out[46]:
[[0, 'Bob', 968],
[1, 'Jessica', 341],
[2, 'Mary', 77],
[3, 'John', 578],
[4, 'Mel', 434]]
Ok, this works (based on Jianxun Li's answer and comments):
import pandas as pd
# Data
data = [['Bob', 968, 'Male'], ['Jessica', 341, 'Female'], ['Mary', 77, 'Female'], ['John', 578, 'Male'], ['Mel', 434, 'Female']]
headers = ['Names', 'Births', 'Gender']
df = pd.DataFrame(data = data, columns=headers)
# Output
print df.reset_index()[['index', 'Names', 'Births']].values.astype(str).tolist()
Thank you Jianxun Li, this also helped me :-)
In general, one can use the following to transform the complete dataframe into a list of lists (which is what I needed):
df.values.astype(str).tolist()
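For readers on Python 3 and a recent pandas, where the print statement and df.ix (removed in pandas 1.0) are no longer available, a sketch of the same result is given below. The names rows and rows_alt are assumptions for illustration. The trailing L in the question looks like Python 2's long-integer repr, which does not occur in Python 3.

```python
import pandas as pd

data = [['Bob', 968, 'Male'], ['Jessica', 341, 'Female'], ['Mary', 77, 'Female'],
        ['John', 578, 'Male'], ['Mel', 434, 'Female']]
headers = ['Names', 'Births', 'Gender']
df = pd.DataFrame(data=data, columns=headers)

# itertuples yields (index, Names, Births) as plain tuples when name=None.
rows = [list(t) for t in df[['Names', 'Births']].itertuples(index=True, name=None)]

# The reset_index approach from the answer above gives the same nested list.
rows_alt = df.reset_index()[['index', 'Names', 'Births']].values.tolist()

print(rows)
```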