Transform excel table looping through columns - python

I'm working on transforming an Excel table into another file for database upload. The tables usually look like this:
The result should be a long list looking like this:
And this is the code I was trying to use...any thoughts?
import pandas as pd
from pandas import DataFrame
import numpy as np
df_excel = pd.read_excel('Excel_Forecast.xlsx', engine='openpyxl')
df_details = df_excel['Details']
df_base = []
for column in df_excel.columns[2:]:
    df_base['Details'].append(df_excel['Details'])
    df_base = DataFrame(df_base.append(df_excel[(column)]),columns=['Amount'])
df_base.to_excel('Temp.xlsx', index=False)

Use df.melt to reshape the wide table into a long one, passing the identifier columns that should stay fixed:
df.melt(['Group', 'Item'])
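To make the reshaping concrete, here is a minimal sketch on an invented table. The column names Group and Item come from the answer; the Jan/Feb columns, the sample values, and the output names Period and Amount are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical wide table: two identifier columns plus one column per period.
wide = pd.DataFrame({
    "Group": ["A", "A", "B"],
    "Item": ["x", "y", "z"],
    "Jan": [10, 20, 30],
    "Feb": [11, 21, 31],
})

# Keep Group and Item fixed; every other column is stacked into
# (Period, Amount) pairs, one row per original cell.
long = wide.melt(id_vars=["Group", "Item"], var_name="Period", value_name="Amount")
print(long)
```

No explicit loop over the columns is needed: melt handles every non-identifier column in one call, and the result can be written out with to_excel as in the question.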

Related

How to handle a .json file in tabular form in Python?

By using this code:
import pandas as pd
patients_df = pd.read_json('/content/students.json',lines=True)
patients_df.head()
the data is shown in tabular form like this:
The main JSON file looks like this:
import json
data = []
for line in open('/content/students.json', 'r'):
    data.append(json.loads(line))
How can I split the scores column of the table into separate columns named Exam, Quiz, and Homework?
One possible solution is the following:
# pip install pandas
import pandas as pd
import json
def separate_column(row):
    # Copy each entry of the "scores" list into its own column, named after its type
    for e in row["scores"]:
        row[e["type"]] = e["score"]
    return row
# Each line of the file is a separate JSON document
with open('/content/students.json', 'r') as file:
    data = [json.loads(line.rstrip()) for line in file]
df = pd.json_normalize(data)
df = df.apply(separate_column, axis=1)
df = df.drop(['scores'], axis=1)
print(df)
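As an alternative to the row-by-row apply, the nested list can be flattened by json_normalize itself and then pivoted. This is only a sketch: the two records below are invented, and the field names _id and name are assumptions. Only scores, type, and score are taken from the code above.

```python
import pandas as pd

# Invented sample records with the same nested "scores" structure.
records = [
    {"_id": 0, "name": "student a", "scores": [
        {"type": "exam", "score": 1.5},
        {"type": "quiz", "score": 2.5},
        {"type": "homework", "score": 3.5},
    ]},
    {"_id": 1, "name": "student b", "scores": [
        {"type": "exam", "score": 4.5},
        {"type": "quiz", "score": 5.5},
        {"type": "homework", "score": 6.5},
    ]},
]

# One row per (student, score entry); _id and name are repeated on each row.
long_df = pd.json_normalize(records, record_path="scores", meta=["_id", "name"])

# Turn each score type into its own column.
wide_df = (
    long_df.pivot(index=["_id", "name"], columns="type", values="score")
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover "type" label on the column axis
print(wide_df)
```

Note that pivot raises an error if a student has two entries of the same type; in that case pivot_table with an aggregation function would be needed instead.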

Filter data from a created list

I am working on my Covid data set (in CSV format) from GitHub and I would like to keep only the countries that appear in the EU_members list below.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')
df = df[df.continent == 'Europe']
# From here I want to just pick those countries that appear in the following list:
EU_members= ['Austria','Italy','Belgium','Latvia','Bulgaria','Lithuania','Croatia','Luxembourg','Cyprus','Malta','Czechia','Netherlands','Denmark','Poland','Estonia',
'Portugal','Finland','Romania','France','Slovakia','Germany','Slovenia','Greece','Spain','Hungary','Sweden','Ireland']
# I have tried something like this but it is not what I expected:
df.location.str.find('EU_members')
You can use .isin():
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')
EU_members= ['Austria','Italy','Belgium','Latvia','Bulgaria','Lithuania','Croatia','Luxembourg','Cyprus','Malta','Czechia','Netherlands','Denmark','Poland','Estonia',
'Portugal','Finland','Romania','France','Slovakia','Germany','Slovenia','Greece','Spain','Hungary','Sweden','Ireland']
df_out = df[df['location'].isin(EU_members)]
df_out.to_csv('data.csv')
This writes the filtered rows to data.csv.
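The .isin() filter can be tried on a small frame first. The sketch below uses an invented stand-in table and a shortened member list (both assumptions, not the real OWID data), and adds a quick check for list entries that never matched.

```python
import pandas as pd

# Tiny invented stand-in for the Covid data; only 'location' matters here.
df = pd.DataFrame({
    "location": ["Austria", "Norway", "Belgium", "Latvia", "Austria"],
    "new_cases": [5, 7, 3, 2, 4],
})
eu_members = ["Austria", "Belgium", "Latvia", "Malta"]  # shortened for the example

# Boolean Series: True where the row's location is one of the list entries.
mask = df["location"].isin(eu_members)
eu_only = df[mask]

# List entries that match no row at all; useful for spotting typos in the list.
unmatched = sorted(set(eu_members) - set(df["location"]))
print(eu_only)
print(unmatched)
```

The unmatched check is worth running on a hand-typed list: Python silently concatenates adjacent string literals, so a missing comma such as 'Belgium''Latvia' yields the single entry 'BelgiumLatvia', which matches nothing.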

Having Trouble Writing Table to Excel with Python

Hi, I am trying to create a table in Excel using a DataFrame read from one Excel spreadsheet and write the table to a new one. I believe my code is correct, but the table isn't being written to the new spreadsheet. Can someone take a look at my code and tell me what's wrong?
import xlsxwriter
import pandas as pd
import openpyxl as pxl
import xlsxwriter
import numpy as np
from openpyxl import load_workbook
path = '/Users/benlong/Downloads/unemployment.xlsx'
df = pd.read_excel(path)
rows = df.shape[0]
columns = df.shape[1]
wb = xlsxwriter.Workbook('UE2.xlsx')
ws = wb.add_worksheet('Sheet1')
ws.add_table(0,0,rows,columns, {'df': df})
wb.close()
Convert your DataFrame to a list of rows with df.values.tolist() and pass it under the data key of the add_table() options.
In your case, you should also set the table headers from the DataFrame's columns and handle NaN values, which otherwise raise an error when written.
For example:
import xlsxwriter as xlw
# With nan_inf_to_errors=True, NaN/Inf values in the DataFrame are written as Excel error values (such as '#NUM!') instead of raising an exception
wb = xlw.Workbook('UE2.xlsx',{'nan_inf_to_errors': True})
ws = wb.add_worksheet('Sheet1')
cell_range = xlw.utility.xl_range(0, 0, rows, columns-1)
header = [{'header': str(di)} for di in df.columns.tolist()]
ws.add_table(cell_range, {'header_row': True,'first_column': False,'columns':header,'data':df.values.tolist()})
wb.close()
Possible duplicate: How to use xlsxwriter .add_table() method with a dataframe?
You can try converting the DataFrame to a list of rows and passing it with the data keyword. The last column index is zero-based, so it is columns - 1:
ws.add_table(0, 0, rows, columns - 1, {'data': df.values.tolist()})
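Putting the two answers together, the arguments for add_table() can be derived from the DataFrame in one place. This is a sketch, not the library's own recipe: the helper names table_args and write_table are hypothetical, and replacing missing values with None (written as blank cells) is one possible choice instead of the nan_inf_to_errors option.

```python
import pandas as pd


def table_args(df):
    """Build (first_row, first_col, last_row, last_col, options) for add_table()."""
    n_rows, n_cols = df.shape
    # One inner list per DataFrame row; missing values become None.
    data = [[None if pd.isna(v) else v for v in row] for row in df.values.tolist()]
    options = {
        "columns": [{"header": str(name)} for name in df.columns],
        "data": data,
    }
    # Row 0 holds the header, so the data ends at row n_rows;
    # column indexes are zero-based, so the last one is n_cols - 1.
    return 0, 0, n_rows, n_cols - 1, options


def write_table(df, path):
    """Write df to a new workbook as an Excel table (hypothetical helper)."""
    import xlsxwriter  # third-party; imported here so table_args() works without it

    wb = xlsxwriter.Workbook(path)
    ws = wb.add_worksheet("Sheet1")
    first_row, first_col, last_row, last_col, options = table_args(df)
    ws.add_table(first_row, first_col, last_row, last_col, options)
    wb.close()
```

With the DataFrame from the question, the call would be write_table(df, 'UE2.xlsx').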

Python, how to add a new column in excel

I have the file below (file1.xlsx) as input. It has 32 columns and almost 2500 rows in total; the screenshot shows only 5 columns as an example.
I want to edit the same file with Python and get this output (file1.xlsx):
Note that I am adding one column named short, whose value is the substring of the Name (A) column up to the first dot, in the same Excel file.
Please help.
Regards
Kawaljeet
Here is what you need...
import pandas as pd
file_name = "file1.xlsx"
df = pd.read_excel(file_name) #Read Excel file as a DataFrame
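df['short'] = df['Name'].str.split(".").str[0]  # first element of each split list, row by row
df.to_excel(file_name, index=False)  # overwrite the same file without adding an index column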
Hello, I solved the problem with the code below:
import pandas as pd
import os
def add_column():
    file_name = "cmdb_inuse.xlsx"
    os.chmod(file_name, 0o777)  # make sure the file is writable
    df = pd.read_excel(file_name)  # Read Excel file as a DataFrame
    df['short'] = [x.split(".")[0] for x in df['Name']]  # text before the first dot
    df.to_excel(file_name, index=False)
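The splitting step can be checked without an Excel file. A minimal sketch with invented sample names (the column name Name is taken from the answers above):

```python
import pandas as pd

# Invented sample values, including a name without a dot and a missing cell.
df = pd.DataFrame({"Name": ["host1.example.com", "db2.internal", "plain", None]})

# .str.split(".") produces one list per row; the second .str[0] then picks
# the first element of each list. Missing cells stay missing.
df["short"] = df["Name"].str.split(".").str[0]
print(df)
```

Unlike the list comprehension with x.split("."), the .str accessor does not raise on empty cells, which matters when the Name column has gaps.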

How to subtract the previous row's value from each row in a CSV column using Python

I have CSV file with data like
data,data,10.00
data,data,11.00
data,data,12.30
I need to update this as
data,data,10.00
data,data,11.00,1.00(11.00-10.00)
data,data,12.30,1.30(12.30-11.00)
Could you help me update the CSV file using Python?
You can use pandas and numpy. pandas reads/writes the csv and numpy does the calculations:
import pandas as pd
import numpy as np
data = pd.read_csv('test.csv', header=None)
col_data = data[2].values
diff = np.diff(col_data)
diff = np.insert(diff, 0, 0)
data['diff'] = diff
# write data to file
data.to_csv('test1.csv', header=False, index=False)
When you open test1.csv, you will find the results described above, with the addition of a zero next to the first data point.
For more info see the following docs:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.to_csv.html
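The same row-to-row difference can also be computed with pandas alone, using Series.diff(). The sketch below reads the CSV from an in-memory string instead of test.csv so that it is self-contained; the values are the ones from the question's desired output, and leaving the first difference empty rather than zero is a choice made here, not part of the answer above.

```python
import io

import pandas as pd

csv_text = "data,data,10.00\ndata,data,11.00\ndata,data,12.30\n"
data = pd.read_csv(io.StringIO(csv_text), header=None)

# diff() subtracts the previous row from each row; the first row has no
# predecessor, so its difference is NaN. Rounding hides float noise.
data["diff"] = data[2].diff().round(2)

# NaN is written as an empty field; float_format keeps two decimals.
out = io.StringIO()
data.to_csv(out, header=False, index=False, float_format="%.2f")
print(out.getvalue())
```

To get a zero in the first row instead, as in the numpy version, add .fillna(0) after diff().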
