How to write a Dictionary to Excel in Python

I have the following dictionary in python that represents a From - To Distance Matrix.
graph = {'A': {'A': 0, 'B': 6, 'C': INF, 'D': 6, 'E': 7},
         'B': {'A': INF, 'B': 0, 'C': 5, 'D': INF, 'E': INF},
         'C': {'A': INF, 'B': INF, 'C': 0, 'D': 9, 'E': 3},
         'D': {'A': INF, 'B': INF, 'C': 9, 'D': 0, 'E': 7},
         'E': {'A': INF, 'B': 4, 'C': INF, 'D': INF, 'E': 0}}
Is it possible to output this matrix to Excel or to a CSV file so that it has the following format? I have looked into using csv.writer and csv.DictWriter but cannot produce the desired output.

You can create a pandas DataFrame from that dict, then save it to CSV or Excel:
import pandas as pd

df = pd.DataFrame(graph).T  # transpose so the rows match the sheet above
df.to_csv('file.csv')
df.to_excel('file.xlsx')  # modern pandas writes .xlsx and needs an engine such as openpyxl

Probably not the most minimal solution, but pandas handles this marvellously (and if you're doing data analysis of any kind, I can highly recommend it!).
Your data is already in the perfect format for loading into a DataFrame:
INF = 'INF'
graph = {'A': {'A': 0, 'B': 6, 'C': INF, 'D': 6, 'E': 7},
         'B': {'A': INF, 'B': 0, 'C': 5, 'D': INF, 'E': INF},
         'C': {'A': INF, 'B': INF, 'C': 0, 'D': 9, 'E': 3},
         'D': {'A': INF, 'B': INF, 'C': 9, 'D': 0, 'E': 7},
         'E': {'A': INF, 'B': 4, 'C': INF, 'D': INF, 'E': 0}}
import pandas as pd
pd.DataFrame(graph).to_csv('OUTPUT.csv')
but the output you want is this transposed, so:
pd.DataFrame(graph).T.to_csv('OUTPUT.csv')
where .T returns the transpose of the DataFrame.
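As a minimal sketch of what the transposed CSV comes out as (using the string 'INF' as the placeholder for missing edges, and only the first two rows of the sample for brevity):

```python
import pandas as pd

INF = 'INF'  # placeholder for missing edges
graph = {'A': {'A': 0, 'B': 6, 'C': INF, 'D': 6, 'E': 7},
         'B': {'A': INF, 'B': 0, 'C': 5, 'D': INF, 'E': INF}}

# Outer keys become columns and inner keys the index, so .T restores
# the From-To orientation of the original matrix
csv_text = pd.DataFrame(graph).T.to_csv()
print(csv_text)
# ,A,B,C,D,E
# A,0,6,INF,6,7
# B,INF,0,5,INF,INF
```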

Related

Python write series object into csv file

I have the following two 'Series' objects, which I want to write into a normal csv file with two columns (date and value):
However, with my code (see below) I only managed to include the value, but not the date with it:
Help would be greatly appreciated!
Best,
Lena
You could create a pandas DataFrame from each of the Series and use the DataFrame's to_csv method, like this:
import pandas as pd

hailarea_south = pd.DataFrame(hailarea_south)
hailarea_south.to_csv("hailarea_south.csv")  # the date index is written as the first column by default
I suggest using pandas' DataFrame.to_csv, since it handles most of the things you're currently doing manually. For this approach a DataFrame is easiest to work with, so we need to convert first.
import pandas as pd
# setup mockdata for example
north = pd.Series({'2002-04-01': 0, '2002-04-02': 0, '2021-09-28': 167})
south = pd.Series({'2002-04-01': 0, '2002-04-02': 0, '2021-09-28': 0})
# convert series to DataFrame
df = pd.DataFrame(data={'POH>=80_south': south, 'POH>=80_north': north})
# save as csv
df.to_csv('timeseries_POH.csv', index_label='date')
output:
date,POH>=80_south,POH>=80_north
2002-04-01,0,0
2002-04-02,0,0
2021-09-28,0,167
In case you want different separators, quoting, or the like, refer to the pandas to_csv documentation for further reading.

How do I copy pandas nested column to another DF?

We have some data in a Delta source which has nested structures. For this example we are focusing on a particular field from the Delta named status which has a number of sub-fields: commissionDate, decommissionDate, isDeactivated, isPreview, terminationDate.
In our transformation we currently read the Delta file in using PySpark, convert the DF to pandas using df.toPandas() and operate on this pandas DF using the pandas API. Once we have this pandas DF we would like to access its fields without using row iteration.
The data in pandas looks like the following when queried using inventory_df["status"][0] (inventory_df["status"] is a Series of Row objects):
Row(commissionDate='2011-07-24T00:00:00+00:00', decommissionDate='2013-07-15T00:00:00+00:00', isDeactivated=True, isPreview=False, terminationDate=None)
We have found success using row iteration like:
unit_df["Active"] = [
    not row["isDeactivated"] for row in inventory_df["status"]
]
but we have to use row iteration each time we want to access data from inventory_df, which is more verbose and less efficient.
We would love to be able to do something like:
unit_df["Active"] = [
    not inventory_df["status.isDeactivated"]
]
which is similar to the Spark destructuring approach, and allows accessing all of the rows at once but there doesn't seem to be equivalent pandas logic.
The data within PySpark has a format like status: struct<commissionDate:string,decommissionDate:string,isDeactivated:boolean,isPreview:boolean,terminationDate:string> and we can use the format mentioned above, selecting a subcolumn like df.select("status.isDeactivated").
How can this approach be done using pandas?
This may get you to where you think you are:
unit_df["Active"] = inventory_df["status"].apply(lambda x: pd.DataFrame([x.asDict()]))
(wrapping x.asDict() in a list so the one-row DataFrame constructor accepts the scalar values). From here I would do:
unit_df = pd.concat([pd.concat(unit_df["Active"].tolist(), ignore_index=True), unit_df], axis=1)
which would get you a single pd.DataFrame, now with columns for commissionDate, decommissionDate, etc.
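A simpler vectorized route is to expand the struct column into a flat DataFrame once, then use ordinary column access. The sketch below mocks pyspark.sql.Row with a namedtuple (a real Row exposes .asDict(); namedtuple's equivalent is ._asdict()), so the field names here are illustrative:

```python
import pandas as pd
from collections import namedtuple

# Hypothetical mock standing in for pyspark.sql.Row
Row = namedtuple("Row", ["commissionDate", "isDeactivated"])
inventory_df = pd.DataFrame({"status": [
    Row("2011-07-24T00:00:00+00:00", True),
    Row("2012-01-01T00:00:00+00:00", False),
]})

# Expand the struct column into its own DataFrame once; afterwards every
# sub-field is a normal column, no per-row iteration needed
status_df = pd.DataFrame([r._asdict() for r in inventory_df["status"]])
unit_df = pd.DataFrame(index=inventory_df.index)
unit_df["Active"] = ~status_df["isDeactivated"]
```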

Using pyMannKendall python package for testing the trend for gridded rainfall data

I am using the pyMannKendall python package for testing the trend of gridded rainfall data. I am successful in carrying out the trend analysis for all the grids, but now I want to write the results of all the grids to a CSV file. I am new to coding and am facing a problem. My code is below.
import pymannkendall as mk
import pandas as pd

df = pd.read_csv("pr_1979_2018.csv", index_col="Year")
for i in df.columns:
    res = mk.hamed_rao_modification_test(df[i])
    new_df = pd.DataFrame(data=list(res),
                          index=['trend', 'h', 'p', 'z', 'Tau',
                                 's', 'var_s', 'slope', 'intercept'],
                          columns=['stats'], dtype=None)
    new_df.to_csv("Mk_2.csv")
On running this code I am getting only a single column in my CSV file; however, I want the results of all the columns in the resulting CSV file. Please help.
You can convert your rows into columns with transpose() in pandas before exporting.
Try this:
new_df = new_df.transpose()
new_df.to_csv("Mk_2.csv", header=True)
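Note that the loop above also overwrites Mk_2.csv on every iteration, which is why only one column's results survive. A sketch of collecting every grid's results first and writing a single CSV after the loop (the placeholder tuple stands in for mk.hamed_rao_modification_test, which returns nine values):

```python
import pandas as pd

fields = ['trend', 'h', 'p', 'z', 'Tau', 's', 'var_s', 'slope', 'intercept']
# mock data standing in for the real pr_1979_2018.csv columns
df = pd.DataFrame({'grid_1': [1.0, 2.0, 3.0], 'grid_2': [3.0, 2.0, 1.0]})

results = {}
for col in df.columns:
    # res = mk.hamed_rao_modification_test(df[col])  # real call
    res = ('increasing', True, 0.01, 2.1, 0.9, 3, 1.3, 1.0, 0.5)  # placeholder
    results[col] = list(res)

# one row per grid column, written once after the loop
out = pd.DataFrame(results, index=fields).T
out.to_csv("Mk_all_grids.csv")
```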

split json within dataframe column into multiple columns in python

I am fetching data from a database which returns data in the CSV format below:
DS,SID,SID_T,E_DATE,S_DATA
TECH,312,TID,2021-01-03,"{""idx"":""msc"",""cid"":""f323d3"",""iname"":""master_in_science"",""mcap"":21.33,""sg"":[{""upt"":true,""dwt"":true,""high_low"":false}]}"
TECH,343,TID,2021-01-03,"{""idx"":""bsc"",""cid"":""k33d3"",""iname"":""bachelor_in_science"",""mcap"":81.33,""sg"":[{""upt"":false,""dwt"":true,""high_low"":false}]}"
TECH,554,TID,2021-01-03,"{""idx"":""ba"",""cid"":""3d3f32"",""iname"":""bachelor_in_art"",""mcap"":67.83,""sg"":[{""upt"":true,""dwt"":false,""high_low"":false}]}"
TECH,323,TID,2021-01-03,"{""idx"":""ma"",""cid"":""m23k66"",""iname"":""master_in_art"",""mcap"":97.13,""sg"":[{""upt"":true,""dwt"":true,""high_low"":true}]}"
The dataframe looks like this, and I want to split the S_DATA column into multiple columns, so that the output looks like this.
What I have tried: I tried to convert the dataframe to JSON and normalize it using pandas.json_normalize, but I was unable to do so. The nested S_DATA.sg values, i.e. "sg":[{"upt":true,"dwt":true,"high_low":false}], also cause trouble during the conversion process.
Try this:
import pandas as pd
import json

df = pd.read_csv('data.csv')
df['S_DATA'] = df['S_DATA'].apply(json.loads)
df = pd.concat([df[df.columns.difference(['S_DATA'])], pd.json_normalize(df['S_DATA'])], axis=1)
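json_normalize flattens nested dicts but leaves lists such as "sg" as list objects. A sketch of expanding "sg" too, assuming each "sg" holds exactly one dict as in the sample rows:

```python
import pandas as pd

# two of the sample S_DATA payloads, already parsed from JSON
records = [
    {"idx": "msc", "cid": "f323d3", "iname": "master_in_science", "mcap": 21.33,
     "sg": [{"upt": True, "dwt": True, "high_low": False}]},
    {"idx": "bsc", "cid": "k33d3", "iname": "bachelor_in_science", "mcap": 81.33,
     "sg": [{"upt": False, "dwt": True, "high_low": False}]},
]

flat = pd.json_normalize(records)                   # top-level keys flattened
sg = pd.json_normalize(flat["sg"].str[0].tolist())  # expand the one-element "sg" list
flat = pd.concat([flat.drop(columns="sg"), sg.add_prefix("sg.")], axis=1)
```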

Creating a sparse matrix from csv file data

Data in the csv file is of the format ("user_id", "group_id", "group_value").
"group_id" ranges from 0 to 100.
For a given user_id, it may be possible that group_value for a particular group_id is not available.
I want to create a sparse matrix representation of the above data. ("group_id_0", "group_id_1", ... , "group_id_100")
What is the best way to achieve this in Python?
Edit: Data is too big to iterate over.
You could do this with pandas. Note that the result is an ordinary dense NumPy array, with NaN wherever a group_value is missing.
Update 08.08.2018:
As noted by Can Kavaklıoğlu, as_matrix() is deprecated as of pandas 0.23.0. Changed to values.
import pandas as pd
df = pd.read_csv('csv_file.csv', names=['user_id', 'group_id', 'group_value'])
df = df.pivot(index='user_id', columns='group_id', values='group_value')
mat = df.values
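If a genuinely sparse representation is needed (the pivot above is dense), one option is to build a scipy COO matrix directly from the (user_id, group_id, group_value) triplets without ever materializing the dense table. The column names and sample values below are assumptions for the sketch:

```python
import pandas as pd
from scipy import sparse

# mock triplets; group_id ranges 0..100 per the question
df = pd.DataFrame({'user_id': [10, 10, 42],
                   'group_id': [0, 3, 100],
                   'group_value': [1.5, 2.0, 7.0]})

# map arbitrary user ids onto row positions 0..n-1
users = df['user_id'].astype('category')

# one row per user, 101 columns for group_id 0..100;
# missing (user, group) pairs are simply absent, not stored
mat = sparse.coo_matrix(
    (df['group_value'], (users.cat.codes, df['group_id'])),
    shape=(users.cat.categories.size, 101),
).tocsr()
```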
