I have a dataframe read from a CSV file, like this -
print(test.loc[1])
outlook sunny
temperature mild
humidity normal
wind weak
playtennis yes
Name: 1, dtype: object
I want to convert this into something like -
outlook.sunny.temperature.mild.humidity.normal.wind.weak.playtennis.yes
How can I achieve this?
Let ser = test.loc[1].
You can convert this series to a dictionary with .to_dict(),
Then convert the dictionary into a list of key/value tuples with .items(),
Then merge the tuples into one list with itertools.chain, and finally
Join the list items with periods with .join().
Python code:
from itertools import chain
'.'.join(chain.from_iterable(ser.to_dict().items()))
#'outlook.sunny.temperature.mild.humidity.normal....yes'
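A slightly more direct variant (a sketch, using Series.items() and an f-string) skips the intermediate dict entirely:

```python
import pandas as pd

# A hypothetical row matching the question's data
ser = pd.Series({'outlook': 'sunny', 'temperature': 'mild',
                 'humidity': 'normal', 'wind': 'weak', 'playtennis': 'yes'})

# Pair each index label with its value, then join everything with periods
result = '.'.join(f'{k}.{v}' for k, v in ser.items())
print(result)
# outlook.sunny.temperature.mild.humidity.normal.wind.weak.playtennis.yes
```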
This is one way using a list comprehension and str.join:
import pandas as pd
test = pd.DataFrame([['sunny', 'mild', 'normal', 'weak', 'yes']],
columns=['outlook', 'temperature', 'humidity', 'wind', 'playtennis'])
res = '.'.join([k+'.'+test[k].iloc[0] for k in test])
print(res)
'outlook.sunny.temperature.mild.humidity.normal.wind.weak.playtennis.yes'
Alternatively, you can zip column names and dataframe values:
res = '.'.join(i+'.'+j for i, j in zip(test, test.values[0]))
I'm new to pandas and I want to know if there is a way to map a column of lists in a dataframe to values stored in a dictionary.
Let's say I have the dataframe 'df' and the dictionary 'dic'. I want to create a new column named 'Description' in the dataframe where I can see the description of the codes shown. The values of the items in the column should be stored in a list as well.
import pandas as pd
data = {'Codes':[['E0'],['E0','E1'],['E3']]}
df = pd.DataFrame(data)
dic = {'E0':'Error Code', 'E1':'Door Open', 'E2':'Door Closed'}
Most efficient would be to use a list comprehension.
df['Description'] = [[dic.get(x, None) for x in l] for l in df['Codes']]
output:
Codes Description
0 [E0] [Error Code]
1 [E0, E1] [Error Code, Door Open]
2 [E3] [None]
If needed, you can post-process to replace the empty lists with NaN. You can also use an alternative list comprehension that skips non-matches: [[dic[x] for x in l if x in dic] for l in df['Codes']], but this would probably be ambiguous if you have one non-match among several matches (which one is which?).
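The post-processing mentioned above can be sketched like this: keep only the codes present in the dictionary, then map any resulting empty list to NaN.

```python
import pandas as pd
import numpy as np

data = {'Codes': [['E0'], ['E0', 'E1'], ['E3']]}
df = pd.DataFrame(data)
dic = {'E0': 'Error Code', 'E1': 'Door Open', 'E2': 'Door Closed'}

# Keep only codes that exist in the dictionary...
df['Description'] = [[dic[x] for x in l if x in dic] for l in df['Codes']]
# ...then replace any empty list (no matches at all) with NaN
df['Description'] = df['Description'].apply(lambda l: l if l else np.nan)
print(df)
```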
{'labels': ['travel', 'dancing', 'cooking'],
'scores': [0.9938651323318481, 0.0032737774308770895, 0.002861034357920289],
'sequence': 'one day I will see the world'}
I have this in a df['prediction'] column. I want to split this result into three different columns as df['travel'], df['dancing'], df['cooking'] with their respective scores. I am sorry if the question is not appropriate.
You can reshape your data as a list of dicts, where each dict is one row of data,
and at the end call set_index to choose the column you want as the index:
import pandas as pd
list_t = [{
"travel":0.9938651323318481,
"dancing": 0.0032737774308770895,
"cooking":0.002861034357920289,
"sequence":'one day I will see the world'
}]
df = pd.DataFrame(list_t)
df = df.set_index("sequence")  # set_index returns a new frame, so reassign
print(df)
#output
travel dancing cooking
sequence
one day I will see the world 0.993865 0.003274 0.002861
What you can do is iterate over this dict and make another dictionary
say s is the source dictionary and x is the new dictionary that you want
x = {}
x['sequence'] = s['sequence']
for i, l in enumerate(s['labels']):
    x[l] = s['scores'][i]
This should solve your problem.
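Once x is built, the one-row-DataFrame trick from the first answer applies to it directly; a minimal sketch, assuming s holds the prediction dict from the question:

```python
import pandas as pd

# s is the prediction dict from the question
s = {'labels': ['travel', 'dancing', 'cooking'],
     'scores': [0.9938651323318481, 0.0032737774308770895, 0.002861034357920289],
     'sequence': 'one day I will see the world'}

# Build a flat row dict: one key per label, plus the sequence itself
x = {'sequence': s['sequence']}
for i, l in enumerate(s['labels']):
    x[l] = s['scores'][i]

# A one-element list of dicts becomes a one-row DataFrame
df = pd.DataFrame([x]).set_index('sequence')
print(df)
```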
I am new to pandas, and I would appreciate any help. I have a pandas dataframe that comes from a csv file. The data contains 2 columns: dates and cashflows. Is it possible to convert these columns into a list of tuples? Here is how my dataset looks:
2021/07/15 4862.306832
2021/08/15 3474.465543
2021/09/15 7121.260118
The desired output is :
[(2021/07/15, 4862.306832),
(2021/08/15, 3474.465543),
(2021/09/15, 7121.260118)]
Use apply with a lambda function:
data = {
"date":["2021/07/15","2021/08/15","2021/09/15"],
"value":["4862.306832","3474.465543","7121.260118"]
}
df = pd.DataFrame(data)
listt = df.apply(lambda x: (x["date"], x["value"]), axis=1).tolist()
Output:
[('2021/07/15', '4862.306832'),
('2021/08/15', '3474.465543'),
('2021/09/15', '7121.260118')]
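As an alternative sketch, you can get the same list of tuples without apply, either by zipping the two columns or with itertuples:

```python
import pandas as pd

data = {
    "date": ["2021/07/15", "2021/08/15", "2021/09/15"],
    "value": ["4862.306832", "3474.465543", "7121.260118"]
}
df = pd.DataFrame(data)

# zip the two columns directly instead of applying a lambda row by row
pairs = list(zip(df["date"], df["value"]))

# itertuples(index=False, name=None) yields plain tuples and tends to be
# faster than apply on large frames
pairs2 = list(df.itertuples(index=False, name=None))
```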
I would like to construct a MultiIndex DataFrame from a deeply-nested dictionary of the form
md = {'50': {'100': {'col1': ('0.100',
                              '0.200',
                              '0.300',
                              '0.400'),
                     'col2': ('6.263E-03',
                              '6.746E-03',
                              '7.266E-03',
                              '7.825E-03')},
             '101': {'col1': ('0.100',
                              '0.200',
                              '0.300',
                              '0.400'),
                     'col2': ('6.510E-03',
                              '7.011E-03',
                              '7.553E-03',
                              '8.134E-03')},
             '102': ...
             },
      '51': ...
      }
I've tried
df = pd.DataFrame.from_dict({(i,j): md[i][j][v] for i in md.keys() for j in md[i].keys() for v in md[i][j]}, orient='index')
following Construct pandas DataFrame from items in nested dictionary, but I get a DataFrame with 1 row and many columns.
Bonus:
I'd also like to label the MultiIndex keys and the columns 'col1' and 'col2', as well as convert the strings to int and float, respectively.
How can I reconstruct my original dictionary from the dataframe?
I tried df.to_dict('list').
Check out this answer: https://stackoverflow.com/a/24988227/9404057. This method unpacks the keys and values of the dictionary, and reforms the data into an easily processed format for multiindex dataframes. Note that if you are using python 3.5+, you will need to use .items() rather than .iteritems() as shown in the linked answer:
>>> import pandas as pd
>>> reform = {(firstKey, secondKey, thirdKey): values
...           for firstKey, middleDict in md.items()
...           for secondKey, innerdict in middleDict.items()
...           for thirdKey, values in innerdict.items()}
>>> df = pd.DataFrame(reform)
To relabel col1 and col2 (note that rename changes the labels, not the data types), you can then use pandas.DataFrame.rename() and specify any values you want:
df.rename({'col1':1, 'col2':2.5}, axis=1, level=2, inplace=True)
Also, if you'd rather have the levels on the index than on the columns, you can use pandas.DataFrame.T.
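For the other part of the bonus (converting the string data to numbers), a minimal sketch on a trimmed-down version of md, assuming the goal is float values, is to transpose and then call astype:

```python
import pandas as pd

# A shortened version of the question's nested dict, for illustration
md = {'50': {'100': {'col1': ('0.100', '0.200'),
                     'col2': ('6.263E-03', '6.746E-03')},
             '101': {'col1': ('0.100', '0.200'),
                     'col2': ('6.510E-03', '7.011E-03')}}}

# Flatten the three dict levels into tuple keys
reform = {(k1, k2, k3): v
          for k1, d1 in md.items()
          for k2, d2 in d1.items()
          for k3, v in d2.items()}

# Transpose so the MultiIndex sits on the rows, name the levels
# (the level names here are made up), and convert strings to floats
df = pd.DataFrame(reform).T
df.index.names = ['outer', 'inner', 'col']
df = df.astype(float)
print(df)
```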
If you wanted to reconstruct your dictionary from this MultiIndex, you could do something like this:
>>> md2 = {}
>>> for i in df.columns:
...     if i[0] not in md2.keys():
...         md2[i[0]] = {}
...     if i[1] not in md2[i[0]].keys():
...         md2[i[0]][i[1]] = {}
...     md2[i[0]][i[1]][i[2]] = tuple(df[i[0]][i[1]][i[2]].values)
I have a pandas dataframe, and one of the columns has date values as strings (like "2014-01-01"). I would like to define a different list for each year that is present in the column, where the elements of the list are the index of the row in which the year is found in the dataframe.
Here's what I've tried:
import pandas as pd
df = pd.DataFrame(["2014-01-01","2013-01-01","2014-02-02", "2012-08-09"])
df = df.values.flatten().tolist()
for i in range(len(df)):
    df[i] = df[i][0:4]

y2012 = []; y2013 = []; y2014 = []
for i in range(len(df)):
    if df[i] == "2012":
        y2012.append(i)
    elif df[i] == "2013":
        y2013.append(i)
    else:
        y2014.append(i)

print(y2014)  # [0, 2]
print(y2013)  # [1]
print(y2012)  # [3]
Does anyone know a better way of doing this? This way works fine, but I have a lot of years, so I have to manually define each variable and then run it through the for loop, and so the code gets really long. I was trying to use groupby in pandas, but I couldn't seem to get it to work.
Thank you so much for any help!
Scan through the original DataFrame values and parse out the year. Given, that, add the index into a defaultdict. That is, the following code creates a dict, one item per year. The value for a specific year is a list of the rows in which the year is found in the dataframe.
A defaultdict sounds scary, but it's just a dictionary. In this case, each value is a list. If we append to a nonexistent value, then it gets spontaneously created. Convenient!
source
from collections import defaultdict
import pandas as pd
df = pd.DataFrame(["2014-01-01","2013-01-01","2014-02-02", "2012-08-09"])
# df = df.values.flatten().tolist()
dindex = defaultdict(list)
for index, dateval in enumerate(df.values):
    year = dateval[0].split('-')[0]
    dindex[year].append(index)

assert dindex == {'2014': [0, 2], '2013': [1], '2012': [3]}
print(dindex)
output
defaultdict(<class 'list'>, {'2014': [0, 2], '2013': [1], '2012': [3]})
Pandas is awesome for this kind of thing, so don't be so hasty to turn your dataframe back into lists right away.
The trick here lies in the .apply() method and the .groupby() method.
1. Take a dataframe that has strings with ISO-formatted dates in it
2. Parse the column containing the date strings into datetime objects
3. Create another column of years using the datetime.year attribute of the items in the datetime column
4. Group the dataframe by the new year column
5. Iterate over the groupby object and extract your column
Here's some code for you to play with and grok:
import pandas as pd
import dateutil.parser

df = pd.DataFrame({'strings': ["2014-01-01", "2013-01-01", "2014-02-02", "2012-08-09"]})
df['datetimes'] = df['strings'].apply(dateutil.parser.parse)
df['year'] = df['datetimes'].apply(lambda x: x.year)
grouped_data = df.groupby('year')
lists_by_year = {}
for year, data in grouped_data:
    lists_by_year[year] = list(data['strings'])
Which gives us a dictionary of lists, where the key is the year and the contents is a list of strings with that year.
print(lists_by_year)
{2012: ['2012-08-09'],
2013: ['2013-01-01'],
2014: ['2014-01-01', '2014-02-02']}
As it turns out
df.groupby('A') #is just syntactical sugar for df.groupby(df['A'])
This means that all you have to do to group by year is leverage the apply function and re-work the syntax
Solution
getYear = lambda x: x.split("-")[0]
yearGroups = df.groupby(df["dates"].apply(getYear))
Output
for key, group in yearGroups:
    print(key)
2012
2013
2014
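Since the original question asked for the row indices belonging to each year, it may be worth noting that the groupby object exposes exactly that through its groups attribute; a minimal sketch (using a column named "dates", as in the answer above):

```python
import pandas as pd

df = pd.DataFrame({"dates": ["2014-01-01", "2013-01-01", "2014-02-02", "2012-08-09"]})

# Group by the year prefix of each date string
year_groups = df.groupby(df["dates"].str[:4])

# .groups maps each year to the Index of matching rows,
# so no manual per-year lists are needed
index_lists = {year: list(map(int, idx)) for year, idx in year_groups.groups.items()}
print(index_lists)
# {'2012': [3], '2013': [1], '2014': [0, 2]}
```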