List into Dataframe - python

I have a list:
l = [{'County': 'SentenceCase'}, {'Postcode': 'UpperCase'}]
type(l) equals <class 'list'>
If I load it into a dataframe:
df = pd.DataFrame(l)
the dictionary keys end up as the column names:
County Postcode
0 SentenceCase NaN
1 NaN UpperCase
I've tried Header=None, etc. but nothing seems to work.
I would like the dataframe to be:
Header1 Header2
0 County SentenceCase
1 Postcode UpperCase

I think you are just using the wrong structure. Instead of a list of dictionaries, you should have a list of lists to load into the DataFrame:
[['County', 'SentenceCase'], ['Postcode', 'UpperCase']]
If for some reason you require your original structure, you could use something like this:
new_l = []
for item in l:
    for key, value in item.items():
        new_l.append([key, value])
new_df = pd.DataFrame(new_l)
new_df.columns = ['Header1', 'Header2']
new_df
which will give:
Header1 Header2
0 County SentenceCase
1 Postcode UpperCase
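The same flattening can also be written as a single comprehension. This is only a sketch of an alternative, assuming every dictionary in the list holds plain key/value pairs as in the question; the name rows is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
l = [{'County': 'SentenceCase'}, {'Postcode': 'UpperCase'}]

# Turn every (key, value) pair of every dictionary into one row
rows = [(key, value) for item in l for key, value in item.items()]

# Name the two columns while building the frame
new_df = pd.DataFrame(rows, columns=['Header1', 'Header2'])
print(new_df)
```

Passing columns= to the constructor avoids the separate assignment to new_df.columns.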

Every dictionary in the list represents a row in the table, and each key is a column header.
It should look like this:
l = [{'Header1': 'County', 'Header2': 'SentenceCase'},
     {'Header1': 'Postcode', 'Header2': 'UpperCase'}]

Related

Unlist dictionaries from a list

I have a very large list, so I will use the one below as a reproducible example. I would like to unlist the following so I can use the keys of the dictionaries as the columns of a dataframe.
[{'message':'Today is a sunny day.','comments_count':'45','id':
'1401305690071546_11252160039985938','created_time': '2020-02-29T13:43:46+0000'},
{'message':'Today is a cloudy day.','comments_count':'47','id':
'1401305690073586_11252160039985938','created_time': '2020-03-29T13:43:46+0000'}]
The desired output is a pandas dataframe with the following columns:
message comments_count id created_time
If it’s a list of dictionaries that you want to turn into a dataframe, you can just do the following:
df1 = pd.DataFrame(l)
# or
df2 = pd.DataFrame.from_dict(l)
the output of both use cases is:
print(df2)
print(df2.columns)
message ... created_time
0 Today is a sunny day. ... 2020-02-29T13:43:46+0000
1 Today is a cloudy day. ... 2020-03-29T13:43:46+0000
[2 rows x 4 columns]
Index(['message', 'comments_count', 'id', 'created_time'], dtype='object')
If you want to put all of the data into the dataframe:
import pandas as pd
my_container = [{'message':'Today is a sunny day.','comments_count':'45','id': '1401305690071546_11252160039985938','created_time': '2020-02-29T13:43:46+0000'}, {'message':'Today is a cloudy day.','comments_count':'47','id': '1401305690073586_11252160039985938','created_time': '2020-03-29T13:43:46+0000'}]
df = pd.DataFrame(my_container)
If you want an empty dataframe with the correct columns:
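columns = []
for d in my_container:
    for key in d:
        if key not in columns:
            # a list keeps first-seen order; recent pandas versions reject a set here
            columns.append(key)
df = pd.DataFrame(columns=columns)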
You can iterate through the list and check the type of each item:
dictList = []
for i in list(myList):  # iterate over a copy, since items are removed from myList
    if isinstance(i, dict):
        dictList.append(i)
        myList.remove(i)
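If the original list does not need to be modified in place, the type check can be expressed as two comprehensions. The sketch below uses a made-up mixed list (my_list and its contents are hypothetical) and then builds the dataframe from the dictionaries only.

```python
import pandas as pd

# Hypothetical mixed list: two dictionaries plus two stray non-dict items
my_list = [{'message': 'a', 'id': '1'}, 'stray string',
           {'message': 'b', 'id': '2'}, 42]

# Split the list without mutating it while iterating
dict_items = [item for item in my_list if isinstance(item, dict)]
other_items = [item for item in my_list if not isinstance(item, dict)]

# Only the dictionaries go into the dataframe; their keys become the columns
df = pd.DataFrame(dict_items)
print(df)
```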

Passing a defaultdict into a df

I am trying to import a txt file with states and universities listed in it. I have utilized defaultdict to import the txt and parse it to where I have a list whereby universities are attached to the state. How do I then put the data into a pandas dataframe with two columns (State, RegionName)? Nothing thus far has worked.
I built an empty dataframe with:
ut = pd.DataFrame(columns=['State', 'RegionName'])
and have tried a couple of different methods but none have worked.
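from collections import defaultdict
d = defaultdict(list)
with open('ut.txt') as ut:
    for line in ut:
        if '[edit]' in line:
            # state line, e.g. "State[edit]"
            a = line.rstrip().split('[')
            d[a[0]].append(a[1])
        else:
            # school line: keep the first word
            b = line.rstrip().split(' ')
            d[a[0]].append(b[0])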
This gets me a nice list:
defaultdict(<class 'list'>, {'State': ['edit]', 'School', 'School2', 'School3', 'School4', 'School5', 'School6', 'School7', 'School8'],
The edit] is part of the original txt file signifying a state. Everything after are the towns the schools are in.
I'd like to build a nice 2 column dataframe where state is the left column and all schools on the right...
Considering the following dictionary
data_dict = {"a": 1, "b": 2, "c": 3}
To create a dataframe from that dictionary with the columns named State and RegionName, the following will do the work:
data_items = data_dict.items()
data_list = list(data_items)
df = pd.DataFrame(data_list, columns = ["State", "RegionName"])
Which will get
[In]: print(df)
[Out]:
State RegionName
0 a 1
1 b 2
2 c 3
If one doesn't pass the column names when creating the dataframe, the columns get the default integer labels 0 and 1, and one can rename them with pandas.DataFrame.rename
df = df.rename(columns = {0: "State", 1: "RegionName"})
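The dictionary in the question maps each state to a list of names rather than to a single value, so each list needs to be expanded into one row per entry. A minimal sketch, assuming the defaultdict has the shape state -> list of region names; the states and towns below are invented sample data, not taken from the question's file.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical data in the shape the question describes
d = defaultdict(list)
d['Alabama'].extend(['Auburn', 'Florence'])
d['Alaska'].append('Fairbanks')

# One (state, region) row for every entry in every list
rows = [(state, region) for state, regions in d.items() for region in regions]

df = pd.DataFrame(rows, columns=['State', 'RegionName'])
print(df)
```

Each state is repeated on the left for every region on the right, which gives the two-column layout asked for.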
If the goal is solely reading a txt file with a structure like this
column1 column2
1 2
3 4
5 6
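Then the following will do the work (assuming the values are separated by whitespace, as shown):
colnames = ['State', 'RegionName']
# header=0 discards the file's own header line; names= supplies the new column names
df = pd.read_csv("file.txt", sep=r"\s+", names=colnames, header=0)
If the column names in the file are already the ones wanted, just use:
df = pd.read_csv("file.txt", sep=r"\s+")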
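To try the renaming without a file on disk, the text can be fed to read_csv through io.StringIO. This is a sketch under the assumption that the file is whitespace-separated and starts with a header line, as in the sample above; the variable text stands in for the contents of file.txt.

```python
import io

import pandas as pd

# Stand-in for file.txt: whitespace-separated, with a header line
text = "column1 column2\n1 2\n3 4\n5 6\n"

# header=0 treats the first line as a header to be replaced,
# names= supplies the replacement column names
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=0,
                 names=['State', 'RegionName'])
print(df)
```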

Python convert list to string in the dataframe

I have a mix of lists and list-like strings in the cell values of a pandas data frame. I am trying to convert the lists to strings. I can do that, but it splits the plain strings as well. How do I apply this logic only if the cell in that particular column contains a list []?
raw_data = {'Name': [['\'John Smith\''], ['\'Jane Doe\'']],
'id': [['\'A1005\'','\'A1006\''], 'A200,A400,A500']}
dfRaw = pd.DataFrame(raw_data, columns = ['Name','id'])
dfRaw['Name'] = dfRaw['Name'].astype(str)
Data
Name id
0 ["'John Smith'"] ['A1005', 'A1006']
1 ["'Jane Doe'"] A200,A400,A500
Need output like this:
Name id
0 ["'John Smith'"] 'A1005','A1006'
1 ["'Jane Doe'"] A200,A400,A500
But the code below is splitting string cell values too.
dfRaw['id'] = dfRaw['id'].apply(lambda x: ','.join([str(i) for i in x]))
Name id
0 ["'John Smith'"] 'A1005','A1006'
1 ["'Jane Doe'"] A,2,0,0,,,A,4,0,0,,,A,5,0,0
You could use a list comprehension to build a new id column, joining only those entries that are lists with str.join.
You can check whether an entry is a list using isinstance:
dfRaw['id'] = [','.join(i) if isinstance(i, list) else i for i in dfRaw['id']]
Output
Name id
0 ['John Smith'] A1005,A1006
1 ['Jane Doe'] A200,A400,A500
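The same check also works inside apply, which is closer to the code in the question. A small sketch using the question's data; the helper name join_if_list is made up for illustration.

```python
import pandas as pd

# Data from the question: the id column mixes a list and a plain string
raw_data = {'Name': [["'John Smith'"], ["'Jane Doe'"]],
            'id': [["'A1005'", "'A1006'"], 'A200,A400,A500']}
dfRaw = pd.DataFrame(raw_data, columns=['Name', 'id'])


def join_if_list(cell):
    """Join list cells with commas; return any other cell unchanged."""
    if isinstance(cell, list):
        return ','.join(str(i) for i in cell)
    return cell


dfRaw['id'] = dfRaw['id'].apply(join_if_list)
print(dfRaw['id'].tolist())
```

Because plain strings are returned untouched, they are no longer split into single characters.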

Creating pandas dataframes from nested json file that has lists

[Image: a picture of how the data looks]
So, I have a JSON file with data. The file is deeply nested; I want to take only the words and create a new dataframe for each post id. Can anyone help with this?
You can use apply with list comprehension:
df = pd.DataFrame({'member_info.vocabulary':[[], [{'post_iD':'3913', 'word':'Twisters'},
{'post_iD':'3911', 'word':'articulate'}]]})
df['words'] = df['member_info.vocabulary'].apply(lambda x: [y.get('word') for y in x])
print (df)
member_info.vocabulary words
0 [] []
1 [{'post_iD': '3913', 'word': 'Twisters'}, {'po... [Twisters, articulate]
And if the lists hold only one element, add .str[0] to select the first value of each list:
df = pd.DataFrame({'member_info.vocabulary':[[], [{'post_iD':'3913', 'word':'Twisters'}]]})
df['words'] = df['member_info.vocabulary'].apply(lambda x: [y.get('word') for y in x]).str[0]
print (df)
member_info.vocabulary words
0 [] NaN
1 [{'post_iD': '3913', 'word': 'Twisters'}] Twisters
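To get one dataframe per post id, as the question asks, the nested entries can first be flattened into rows and then grouped. This is a sketch assuming every entry is a dictionary with post_iD and word keys, as in the sample above; the names entries, words and per_post are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'member_info.vocabulary': [
    [],
    [{'post_iD': '3913', 'word': 'Twisters'},
     {'post_iD': '3911', 'word': 'articulate'}],
]})

# One row per vocabulary entry; empty lists contribute nothing
entries = [entry for vocab in df['member_info.vocabulary'] for entry in vocab]
words = pd.DataFrame(entries, columns=['post_iD', 'word'])

# One small dataframe per post id, keyed by that id
per_post = {post_id: group.reset_index(drop=True)
            for post_id, group in words.groupby('post_iD')}

print(per_post['3913'])
```

A dictionary keyed by post id keeps the frames addressable without creating one variable per id.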

create names of dataframes in a loop

I need to give names to previously defined dataframes.
I have a list of dataframes :
liste_verif = ( dffreesurfer,total,qcschizo)
And I would like to give them a name by doing something like:
for h in liste_verif:
h.name = str(h)
Would that be possible ?
When I test this code, it doesn't work: instead of treating h as a dataframe, Python considers each column of my dataframe.
I would like the name of my dataframe to be 'dffreesurfer', 'total' etc...
You can use dict comprehension and map DataFrames by values in list L:
dffreesurfer = pd.DataFrame({'col1': [7,8]})
total = pd.DataFrame({'col2': [1,5]})
qcschizo = pd.DataFrame({'col2': [8,9]})
liste_verif = (dffreesurfer,total,qcschizo)
L = ['dffreesurfer','total','qcschizo']
dfs = {L[i]:x for i,x in enumerate(liste_verif)}
print (dfs['dffreesurfer'])
col1
0 7
1 8
print (dfs['total'])
col2
0 1
1 5
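If the name should also travel with each dataframe, one option is to store it in the frame's attrs dictionary. This is only a sketch: pandas documents attrs as experimental, so the stored name may not survive every operation, and the 'name' key is an arbitrary choice made here.

```python
import pandas as pd

dffreesurfer = pd.DataFrame({'col1': [7, 8]})
total = pd.DataFrame({'col2': [1, 5]})
qcschizo = pd.DataFrame({'col2': [8, 9]})

# The names have to be written out: a dataframe does not know its variable name
names = ['dffreesurfer', 'total', 'qcschizo']
frames = [dffreesurfer, total, qcschizo]

# Pair each name with its dataframe
dfs = dict(zip(names, frames))

# Also record the name on the dataframe itself ('name' is an arbitrary key)
for name, frame in dfs.items():
    frame.attrs['name'] = name

print(total.attrs['name'])
```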
