I am importing data from a CSV file and storing it in a pandas DataFrame. Here is the image from the CSV file:
For each row I want a string like this:
Here is the code I am using to import the data from the CSV file and store it in the DataFrame:
import csv
import pandas as pd
filename ="../Desktop/venkat.csv"
df = pd.read_table(filename,sep=" ")
How can I achieve this?
Consider the dataframe df
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(10).reshape(-1, 5), columns=list('ABCDE'))
A B C D E
0 0 1 2 3 4
1 5 6 7 8 9
You can get a series of json strings for each row
df.apply(pd.Series.to_json, 1)
0 {"A":0,"B":1,"C":2,"D":3,"E":4}
1 {"A":5,"B":6,"C":7,"D":8,"E":9}
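If you want a plain Python list of strings rather than a Series, here is a minimal sketch of one alternative. It assumes the same example frame as above; `row_strings` is just an illustrative name. `to_json` with `orient='records'` and `lines=True` writes one JSON object per row, so splitting on newlines gives one string per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 5), columns=list('ABCDE'))

# One JSON object per line; splitlines() turns that into one string per row
row_strings = df.to_json(orient='records', lines=True).splitlines()
for s in row_strings:
    print(s)
```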
I think it's better to use a dict to save your data with to_dict:
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df)
A B C D E F
0 1 4 7 1 5 7
1 2 5 8 3 3 4
2 3 6 9 5 6 3
#select some row - e.g. with index 2
print (df.loc[2])
A 3
B 6
C 9
D 5
E 6
F 3
Name: 2, dtype: int64
d = df.loc[2].to_dict()
print (d)
{'E': 6, 'B': 6, 'F': 3, 'A': 3, 'C': 9, 'D': 5}
print (d['A'])
3
If ordering is important use OrderedDict:
from collections import OrderedDict
print (OrderedDict(df.loc[2]))
OrderedDict([('A', 3), ('B', 6), ('C', 9), ('D', 5), ('E', 6), ('F', 3)])
If you need all values in columns use DataFrame.to_dict:
d = df.to_dict(orient='list')
print (d)
{'E': [5, 3, 6], 'B': [4, 5, 6], 'F': [7, 4, 3],
'A': [1, 2, 3], 'C': [7, 8, 9], 'D': [1, 3, 5]}
print (d['A'])
[1, 2, 3]
d = df.to_dict(orient='index')
print (d)
{0: {'E': 5, 'B': 4, 'F': 7, 'A': 1, 'C': 7, 'D': 1},
1: {'E': 3, 'B': 5, 'F': 4, 'A': 2, 'C': 8, 'D': 3},
2: {'E': 6, 'B': 6, 'F': 3, 'A': 3, 'C': 9, 'D': 5}}
#get value in row 2 and column A
print (d[2]['A'])
3
import csv

# With a tab delimiter, each comma-separated line arrives as a single field
with open('test.csv', 'r') as csvfile:
    csvFileArray = list(csv.reader(csvfile, delimiter='\t'))

header = csvFileArray[0][0].split(',')
str_list = []
for each in csvFileArray[1:]:
    data = each[0].split(',')
    # Pair each header with its value as "name = value"
    local_list = [' '.join([header[i], '=', data[i]]) for i in range(len(data))]
    str_list.append(local_list)
print(str_list)
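For comparison, a small sketch of the same "header = value" idea using `csv.DictReader`, which pairs headers with values for you. The sample text in `csv_text` is made up for illustration and stands in for the real file, and it assumes a comma-delimited layout.

```python
import csv
import io

# Hypothetical sample data standing in for the CSV file
csv_text = "name,age,city\nAnn,30,Oslo\nBob,25,Rome\n"

rows_as_strings = []
for record in csv.DictReader(io.StringIO(csv_text)):
    # record maps each header to that row's value
    rows_as_strings.append(', '.join(f'{key} = {value}' for key, value in record.items()))

print(rows_as_strings)
```

To read from disk instead, you would pass an open file object to `csv.DictReader` in place of the `io.StringIO` wrapper.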
Assume your csv is a comma delimited file.
import pandas as pd
filename ="../Desktop/venkat.csv"
df = pd.read_csv(filename,sep=",")
output = df.to_dict(orient='records')
print(output)  # this yields a list of dictionaries, one per row
I have list of identical dictionaries:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
I need to get something like this:
a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]
I know how to do it using for .. in .., but is there a way to do it without looping?
If I do
a, b, c = zip(*my_list)
I'm getting
a = ('a', 'a', 'a')
b = ('b', 'b', 'b')
c = ('c', 'c', 'c')
Any solution?
You need to extract all the values in my_list. You could try:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
a, b, c = zip(*map(lambda d: d.values(), my_list))
print(a, b, c)
# (1, 4, 7) (2, 5, 8) (3, 6, 9)
As pointed out by Alexandre, this works only when every dict yields its keys in the same order. If you can't guarantee the order, consider yatu's answer.
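If the key order may differ between dicts, one possible sketch is to name the keys explicitly with `operator.itemgetter`. The shuffled key order in `my_list` below is only there to show that the result does not depend on it.

```python
from operator import itemgetter

# Same data as in the question, but with keys deliberately in different orders
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'c': 6, 'a': 4, 'b': 5}, {'b': 8, 'c': 9, 'a': 7}]

# itemgetter('a', 'b', 'c') returns the values in exactly that key order for each dict
a, b, c = zip(*map(itemgetter('a', 'b', 'c'), my_list))
print(a, b, c)
# (1, 4, 7) (2, 5, 8) (3, 6, 9)
```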
You will have to loop to obtain the values from the inner dictionaries. Probably the most appropriate structure would be a dictionary that maps each letter to a list of its values. Assigning to separate variables is usually not the best idea, as it only works for a fixed number of variables.
You can iterate over the inner dictionaries, and append to a defaultdict as:
from collections import defaultdict
out = defaultdict(list)
for d in my_list:
    for k, v in d.items():
        out[k].append(v)
print(out)
#defaultdict(list, {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
Pandas DataFrame has a factory method for exactly this, so it is worth considering if you already have pandas as a dependency or if the input data is large:
import pandas as pd
my_list = ...
df = pd.DataFrame.from_records(my_list)
a = list(df['a']) # df['a'] is a pandas Series, essentially a wrapped C array
b = list(df['b'])
c = list(df['c'])
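If a single dictionary of lists is enough, here is a short sketch (assuming pandas is available; `columns` is an illustrative name) that skips the separate variables. The plain `DataFrame` constructor accepts a list of dicts, and `to_dict(orient='list')` returns one list per column.

```python
import pandas as pd

my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]

# Build the frame from the list of dicts, then ask for one list per column
columns = pd.DataFrame(my_list).to_dict(orient='list')
print(columns)
# {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]}
```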
Please find the code below. I believe that the version with a loop is much easier to read.
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
# we assume that all dictionaries have the same keys
a, b, c = map(list, map(lambda k: map(lambda d: d[k], my_list), my_list[0]))
print(a,b,c)
Given a Pandas df:
Name V1 V2
a 1 2
a 3 4
a 5 6
b 7 8
b 9 10
c 11 12
...
How can I reshape it into a nested dictionary of this format:
{a: [(1,2), (3,4), (5,6)], b: [(7,8), (9,10)], c: [(11,12)], ...}
Please note that values with the same name also need to be combined across rows; for example, "a" has three rows that should be combined into a single list of number pairs.
Try:
df['tup'] = df[['V1','V2']].agg(tuple, axis=1)
df.groupby('Name')['tup'].agg(list).to_dict()
Output:
{'a': [(1, 2), (3, 4), (5, 6)], 'b': [(7, 8), (9, 10)], 'c': [(11, 12)]}
If you don't mind the results being list instead of tuple, you can also use groupby in a dict comprehension:
d = {group:items[["V1","V2"]].values.tolist() for group, items in df.groupby("Name")}
print (d)
{'a': [[1, 2], [3, 4], [5, 6]], 'b': [[7, 8], [9, 10]], 'c': [[11, 12]]}
Try this approach, which works on specific columns:
import pandas as pd
data_frame = {
    "Name": ["a", "a", "a", "b", "b", "c"],
    "V1": [1, 3, 5, 7, 9, 11],
    "V2": [2, 4, 6, 8, 10, 12]
}
df = pd.DataFrame(data_frame, columns=['Name', 'V1', 'V2'])
data_dict = {}
for i, row in df.iterrows():
    # Each assignment replaces the value stored for that Name, so only the last row per name is kept
    data_dict[row["Name"]] = [row['V1'], row['V2']]
print(data_dict)
Output:
{'a': [5, 6], 'b': [9, 10], 'c': [11, 12]}
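Note that the output above holds only the last row for each Name, because a plain assignment replaces whatever was stored under the same key. A minimal sketch that accumulates (V1, V2) tuples instead, assuming the same three columns; `itertuples(index=False, name=None)` yields each row as a plain tuple:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["a", "a", "a", "b", "b", "c"],
    "V1": [1, 3, 5, 7, 9, 11],
    "V2": [2, 4, 6, 8, 10, 12],
})

data_dict = {}
for name, v1, v2 in df.itertuples(index=False, name=None):
    # setdefault creates the list the first time a name is seen, then we append to it
    data_dict.setdefault(name, []).append((v1, v2))

print(data_dict)
```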
Assuming the DataFrame variable is data_frame:
print(data_frame)
Name V1 V2
a 1 2
a 3 4
a 5 6
b 7 8
b 9 10
c 11 12
data_dict = {}
for data in data_frame.values:
    print(data)
    data_dict[data[0]] = [j for j in data[1:]]
print(data_dict)
The DataFrame object also has to_dict() variants. You can use to_dict('records') as well and manipulate the result as needed.
Reference: Data Frame Dict
Suppose, we have the following DataFrame:
dt = {'A': ['a','a','a','a','a','a','b','b','c'],
'B': ['x','x','x','y','y','z','x','z','y'],
'C': [10, 14, 15, 11, 10, 14, 14, 11, 10],
'D': [1, 3, 2, 1, 3, 5, 1, 4, 2]}
df = pd.DataFrame(data=dt)
I want to extract certain rows based on a dictionary where keys are column names and values are row values. For example:
d = {'A': 'a', 'B': 'x'}
d = {'A': 'a', 'B': 'y', 'C': 10}
d = {'A': 'b', 'B': 'z', 'C': 11, 'D': 4}
It can be done using a loop (consider the last dictionary):
for iCol in d:
    df = df[df[iCol] == d[iCol]]
Out[215]:
A B C D
7 b z 11 4
Since the DataFrame is expected to be pretty large and may have many columns to select on, I am looking for an efficient way to solve the problem without using a for loop to iterate over the DataFrame.
Use the following, which turns the dict into a Series:
print(df[(df[list(d)] == pd.Series(d)).all(axis=1)])
Output:
A B C D
7 b z 11 4
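Another way to get a single mask, sketched here under the assumption that every key in the dict is a column of the frame: build one boolean Series per condition and combine them with `np.logical_and.reduce`. The helper name `select_rows` is made up for this example. It still loops over the dict keys, but never over the rows, and it leaves `df` itself untouched.

```python
import numpy as np
import pandas as pd

dt = {'A': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'c'],
      'B': ['x', 'x', 'x', 'y', 'y', 'z', 'x', 'z', 'y'],
      'C': [10, 14, 15, 11, 10, 14, 14, 11, 10],
      'D': [1, 3, 2, 1, 3, 5, 1, 4, 2]}
df = pd.DataFrame(data=dt)


def select_rows(frame, conditions):
    """Return the rows of frame where every column in conditions equals its value."""
    # One boolean Series per condition, combined with a single logical AND
    mask = np.logical_and.reduce([frame[col] == val for col, val in conditions.items()])
    return frame[mask]


print(select_rows(df, {'A': 'a', 'B': 'y', 'C': 10}))
```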
Say we have a Python Pandas DataFrame:
In[1]: df = pd.DataFrame({'A': [1, 1, 2, 3, 5],
'B': [5, 6, 7, 8, 9]})
In[2]: print(df)
A B
0 1 5
1 1 6
2 2 7
3 3 8
4 5 9
I want to change rows that match a certain condition. I know that this can be done via direct assignment:
In[3]: df[df.A==1] = pd.DataFrame([{'A': 0, 'B': 5},
{'A': 0, 'B': 6}])
In[4]: print(df)
A B
0 0 5
1 0 6
2 2 7
3 3 8
4 5 9
My question is: Is there an equivalent solution to the above assignment that would return a new DataFrame with the rows changed, i.e. a stateless solution? I'm looking for something like pandas.DataFrame.assign but which acts on rows instead of columns.
DataFrame.copy
df2 = df.copy()
df2[df.A == 1] = pd.DataFrame([{'A': 0, 'B': 5}, {'A': 0, 'B': 6}])
DataFrame.mask + fillna
m = df.A == 1
fill_df = pd.DataFrame([{'A': 0, 'B': 5}, {'A': 0, 'B': 6}], index=df.index[m])
df2 = df.mask(m).fillna(fill_df)
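If you need this often, the copy-based variant can be wrapped in a small helper. This is only a sketch: `with_rows_replaced` is a hypothetical name, not a pandas function, and it assumes `new_rows` has exactly one entry per selected row, in the same order as those rows.

```python
import pandas as pd


def with_rows_replaced(frame, mask, new_rows):
    """Return a copy of frame with the rows selected by mask replaced by new_rows."""
    out = frame.copy()
    # Put the replacement values in the frame's column order, then assign positionally
    values = pd.DataFrame(new_rows, columns=frame.columns).to_numpy()
    out.loc[mask, :] = values
    return out


df = pd.DataFrame({'A': [1, 1, 2, 3, 5], 'B': [5, 6, 7, 8, 9]})
df2 = with_rows_replaced(df, df.A == 1, [{'A': 0, 'B': 5}, {'A': 0, 'B': 6}])
print(df2)
```

The original `df` is left unchanged, which is the "stateless" behaviour asked for.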
Say I have a dictionary that looks like this:
dictionary = {'A' : {'a': [1,2,3,4,5],
'b': [6,7,8,9,1]},
'B' : {'a': [2,3,4,5,6],
'b': [7,8,9,1,2]}}
and I want a dataframe that looks something like this:
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 2
Is there a convenient way to do this? If I try:
In [99]:
DataFrame(dictionary)
Out[99]:
A B
a [1, 2, 3, 4, 5] [2, 3, 4, 5, 6]
b [6, 7, 8, 9, 1] [7, 8, 9, 1, 2]
I get a dataframe where each element is a list. What I need is a multiindex where each level corresponds to the keys in the nested dict and the rows corresponding to each element in the list as shown above. I think I can work a very crude solution but I'm hoping there might be something a bit simpler.
Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest thing is to convert your dictionary to the right format before trying to pass it to DataFrame:
>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
('A', 'b'): [6, 7, 8, 9, 1],
('B', 'a'): [2, 3, 4, 5, 6],
('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 2
[5 rows x 4 columns]
You're looking for the functionality in .stack:
df = pandas.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
# to break out the lists into columns
df = pandas.DataFrame(df[0].values.tolist(), index=df.index)
dict_of_df = {k: pd.DataFrame(v) for k,v in dictionary.items()}
df = pd.concat(dict_of_df, axis=1)
Note that the order of columns is lost for python < 3.6
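To check that the concat approach really gives two column levels, here is a short sketch using the dictionary from the question. It builds the frame and then selects once by outer key and once by an (outer, inner) pair.

```python
import pandas as pd

dictionary = {'A': {'a': [1, 2, 3, 4, 5],
                    'b': [6, 7, 8, 9, 1]},
              'B': {'a': [2, 3, 4, 5, 6],
                    'b': [7, 8, 9, 1, 2]}}

# Each inner dict becomes its own DataFrame; concat along axis=1 uses the outer keys as the top level
df = pd.concat({k: pd.DataFrame(v) for k, v in dictionary.items()}, axis=1)

print(df.columns.tolist())      # pairs of (outer key, inner key)
print(df['A'])                  # all columns under 'A'
print(df[('B', 'b')].tolist())  # a single column
```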
This recursive function should work:
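def reform_dict(dictionary, t=tuple(), reform=None):
    # Create a fresh result dict per top-level call instead of sharing a mutable default
    if reform is None:
        reform = {}
    for key, val in dictionary.items():
        t = t + (key,)
        if isinstance(val, dict):
            reform_dict(val, t, reform)
        else:
            reform.update({t: val})
        t = t[:-1]
    return reform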
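The same flattening idea can also be written as a generator, which avoids passing an accumulator around. This is just a sketch: `iter_flat_items` and the sample `nested` dict are illustrative names and data, not part of the answer above.

```python
def iter_flat_items(nested, prefix=()):
    """Yield (key_tuple, value) pairs for every non-dict leaf in nested."""
    for key, val in nested.items():
        path = prefix + (key,)
        if isinstance(val, dict):
            # Recurse into the inner dict, carrying the key path so far
            yield from iter_flat_items(val, path)
        else:
            yield path, val


nested = {'A': {'a': [1, 2], 'b': {'x': [3, 4]}}, 'B': {'a': [5, 6]}}
flat = dict(iter_flat_items(nested))
print(flat)
# {('A', 'a'): [1, 2], ('A', 'b', 'x'): [3, 4], ('B', 'a'): [5, 6]}
```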
If the lists in the dictionary are not all the same length, you can adapt BrenBarn's method.
>>> dictionary = {'A' : {'a': [1,2,3,4,5],
'b': [6,7,8,9,1]},
'B' : {'a': [2,3,4,5,6],
'b': [7,8,9,1]}}
>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
('A', 'b'): [6, 7, 8, 9, 1],
('B', 'a'): [2, 3, 4, 5, 6],
('B', 'b'): [7, 8, 9, 1]}
>>> df = pandas.DataFrame.from_dict(reform, orient='index').transpose()
>>> df.columns = pandas.MultiIndex.from_tuples(df.columns)
>>> df
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 NaN
[5 rows x 4 columns]
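Another option for lists of unequal length, sketched here as an assumption rather than as part of the answer above, is to wrap each list in a `pd.Series` before building the frame. Series of different lengths are aligned on their index, so the short column is padded with NaN.

```python
import pandas as pd

reform = {('A', 'a'): [1, 2, 3, 4, 5],
          ('A', 'b'): [6, 7, 8, 9, 1],
          ('B', 'a'): [2, 3, 4, 5, 6],
          ('B', 'b'): [7, 8, 9, 1]}

# Wrapping each list in a Series lets pandas align on the index and fill the gap with NaN
df = pd.DataFrame({key: pd.Series(values) for key, values in reform.items()})
print(df)
```

Note that the padded column is stored as floats, since NaN cannot be held in an integer column.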
This solution works for a larger DataFrame and fits what was requested:
cols = df.columns
int_cols = len(cols)
col_subset_1 = [cols[x] for x in range(1,int(int_cols/2)+1)]
col_subset_2 = [cols[x] for x in range(int(int_cols/2)+1, int_cols)]
col_subset_1_label = list(zip(['A']*len(col_subset_1), col_subset_1))
col_subset_2_label = list(zip(['B']*len(col_subset_2), col_subset_2))
df.columns = pd.MultiIndex.from_tuples([('','myIndex'),*col_subset_1_label,*col_subset_2_label])
OUTPUT
A B
myIndex a b c d
0 0.159710 1.472925 0.619508 -0.476738 0.866238
1 -0.665062 0.609273 -0.089719 0.730012 0.751615
2 0.215350 -0.403239 1.801829 -2.052797 -1.026114
3 -0.609692 1.163072 -1.007984 -0.324902 -1.624007
4 0.791321 -0.060026 -1.328531 -0.498092 0.559837
5 0.247412 -0.841714 0.354314 0.506985 0.425254
6 0.443535 1.037502 -0.433115 0.601754 -1.405284
7 -0.433744 1.514892 1.963495 -2.353169 1.285580