How to iterate inside iloc in a DataFrame read from Excel? - python

I need to read an Excel file named 2023-2.xlsx, in which each day of the month corresponds to 4 columns and the first 3 columns form the index, and write each day out to a separate file. For example, reading the file shown in the image should produce 4 files, named 01-02-2023.xlsx, 02-02-2023.xlsx, 03-02-2023.xlsx and 04-02-2023.xlsx, each with its corresponding data.
The same goes for the other days. How could I iterate over the column positions inside iloc, so that I don't have to write out every column group by hand?
import pandas as pd
import glob

all_files = glob.glob("C:/Users/ep_irojaso/Desktop/PROGRAMA DESEMPEÑO/saturn/2023-2.xlsx")
for f in all_files:
    df = pd.read_excel(f)
    # Shared leading columns plus one block of day columns each
    first = df.iloc[:, [0, 1, 2, 3, 4, 5, 6]]
    second = df.iloc[:, [0, 1, 2, 3, 7, 8, 9, 10]]
    third = df.iloc[:, [0, 1, 2, 3, 11, 12, 13, 14]]
    # to_excel writes the file directly; ExcelWriter.save() was removed in pandas 2.0
    first.to_excel("first.xlsx")
    second.to_excel("second.xlsx")
    third.to_excel("third.xlsx")
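One way to avoid writing out every column group is to generate the position lists for iloc in a loop. A minimal sketch, not from the original thread, assuming 4 shared leading columns (as in the code above), a fixed number of columns per day, and headers that can be parsed as dates; the block width and filename logic are hypothetical and need adjusting to the real layout:

import pandas as pd

df = pd.read_excel("2023-2.xlsx")

n_index = 4  # assumed number of shared leading columns repeated in every output
width = 4    # assumed number of columns per day; adjust to the real layout

# Walk the remaining columns in fixed-size blocks, one block per day
for start in range(n_index, df.shape[1], width):
    cols = list(range(n_index)) + list(range(start, min(start + width, df.shape[1])))
    day = df.iloc[:, cols]
    # Hypothetical filename scheme: parse the date from the block's first header
    date = pd.to_datetime(df.columns[start]).strftime("%d-%m-%Y")
    day.to_excel(f"{date}.xlsx", index=False)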

Related

How to concatenate a list of csv files (including empty ones) using Pandas

I have a list of .csv files stored in a local folder and I'm trying to concatenate them into one single dataframe.
Here is the code I'm using:
import pandas as pd
import os
folder = r'C:\Users\_M92\Desktop\myFolder'
df = pd.concat([pd.read_csv(os.path.join(folder, f), delimiter=';') for f in os.listdir(folder)])
display(df)
Only one problem: one of the files is sometimes empty (0 columns, 0 rows), and in that case pandas throws an EmptyDataError: No columns to parse from file on line 6.
Do you have any suggestions for how to skip the empty csv files?
And, while we're at it, how to concatenate the csv files in a simpler or more efficient way?
Ideally, I would also like to add a column (to the dataframe df) to carry the name of each .csv.
You can check if a file is empty with:
import os
os.stat(FILE_PATH).st_size == 0
In your use case:
import os
df = pd.concat([
    pd.read_csv(os.path.join(folder, f), delimiter=';')
    for f in os.listdir(folder)
    if os.stat(os.path.join(folder, f)).st_size != 0
])
Personally, I would check each file for content as it is read, merging them with a basic try-except.
import pandas as pd
import os

folder = r'C:\Users\_M92\Desktop\myFolder'
data = []
for f in os.listdir(folder):
    try:
        temp = pd.read_csv(os.path.join(folder, f), delimiter=';')
        # add the original filename column, as requested
        temp['origin'] = f
        data.append(temp)
    except pd.errors.EmptyDataError:
        continue
df = pd.concat(data)
display(df)
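Note that the two checks are not quite equivalent: os.stat(...).st_size == 0 only skips truly zero-byte files, while catching pd.errors.EmptyDataError also handles files that contain bytes (for example, only whitespace) but no parseable columns.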

Pandas generating an empty csv while trying to combine all csvs into one csv

I am writing a python script that will read all the csv files in the current location and merge them into a single csv file. Below is my code:
import os
import numpy as np
import pandas as pd
import glob

path = os.getcwd()
extension = 'csv'
os.chdir(path)
tables = glob.glob('*.{}'.format(extension))
data = pd.DataFrame()
for i in tables:
    try:
        df = pd.read_csv(os.path.join(path, i))
        # Here I want to create an index column with the name of the file and leave that column empty
        df[i] = np.NaN
        df.set_index(i, inplace=True)
        # The line below appends an empty row for easy differentiation
        df.loc[df.iloc[-1].name + 1, :] = np.NaN
        data = data.append(df)
    except Exception as e:
        print(e)
data.to_csv('final_output.csv', index=False, header=None)
If I remove the two lines below, then it works:
df[i] = np.NaN
df.set_index(i, inplace=True)
But I want the first column's name to be the name of the file, with its values left NaN or empty.
I want the output to look something like this:
I tend to avoid the .append method in favor of pandas.concat
Try this:
import os
from pathlib import Path
import pandas as pd

files = Path(os.getcwd()).glob('*.csv')
df = pd.concat([
    pd.read_csv(f).assign(filename=f.name)
    for f in files
], ignore_index=True)
df.to_csv('alldata.csv', index=False)
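If the filename should appear first rather than as the last column, one small variation (my own addition, not part of the original answer) is to make it the index before writing:

# Writing with the filename as the index puts it in the first CSV column
df.set_index('filename').to_csv('alldata.csv')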

How do I add a list of values for a parameter to a group of dataframes, where that parameter has a different value for each dataframe?

I have 15 samples that each have a unique value of a parameter L.
Each sample was tested and provided data which I have placed into separate DataFrames in Pandas.
Each of the DataFrames has a different number of rows, and I want to place the corresponding value of L in each row, i.e. create a column for parameter L.
Note that L is constant in its respective DataFrame.
Is there a way to write a loop that will take a value of L from a list containing all of its values, and create a column in its corresponding sample data DataFrame?
I have so far been copying and pasting each line, and then updating the values and DataFrame names manually, but I suppose that this is not the most effective way of using python/pandas!
Most of the code I have used so far has been based on what I have found online, and my actual understanding of it is quite limited but I have tried to comment where possible.
UPDATED based on first suggested answer.
import pandas as pd
import numpy as np
from glob import glob
from os.path import join

path = r'file-directory/'
data_files = glob(join(path, '*.txt'))

def main():
    from contextlib import ExitStack
    with ExitStack() as context_manager:  # Allows python to access different data folders
        files = [context_manager.enter_context(open(f, "r")) for f in data_files]
        # Define an empty list and start reading data files
        df1 = []
        for file in files:
            df = pd.read_csv(file,
                             encoding='utf-8',
                             skiprows=114,
                             header=0,
                             # names=heads,
                             skipinitialspace=True,
                             sep='\t')
            # Process the dataframe to remove unwanted rows and columns, and rename the headers
            df = df[df.columns[[1, 2, 4, 6, 8, 10, 28]]]
            df = df.drop(0, axis=0)
            df = df.reset_index(drop=True)
            df.rename(columns=dict(zip(df, heads)), inplace=True)  # heads is defined elsewhere
            for column in df:
                df[column] = pd.to_numeric(df[column], errors='coerce')
            # Append each new dataframe to the list
            df1.append(df)
    # Extract dataframes from the list
    data1_0 = df1[0]
    data1_1 = df1[1]
    data1_2 = df1[2]
    data1_3 = df1[3]
    data1_4 = df1[4]
    data1_5 = df1[5]
    data1_6 = df1[6]
    data1_7 = df1[7]
    data1_8 = df1[8]
    data1_9 = df1[9]
    data1_10 = df1[10]
    data1_11 = df1[11]
    data1_12 = df1[12]
    data1_13 = df1[13]
    data1_14 = df1[14]
    # Add in a new column for values of 'L'
    L = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5', 'L6', 'L7', 'L8', 'L9', 'L10', 'L11', 'L12', 'L13', 'L14']
    data1_0['L'] = L[0]
    data1_1['L'] = L[1]
    data1_2['L'] = L[2]
    data1_3['L'] = L[3]
    data1_4['L'] = L[4]
    data1_5['L'] = L[5]
    data1_6['L'] = L[6]
    data1_7['L'] = L[7]
    data1_8['L'] = L[8]
    data1_9['L'] = L[9]
    data1_10['L'] = L[10]
    data1_11['L'] = L[11]
    data1_12['L'] = L[12]
    data1_13['L'] = L[13]
    data1_14['L'] = L[14]
    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
The method I am using (copying and pasting lines) works so far; it just doesn't seem to be the most efficient use of my time or the tools I have, and I don't really know how to approach this one with my limited experience of python so far.
I also have several other parameters and datasets that I need to do this for, so any help would be greatly appreciated!
You can do just data1_0['L'] = L[0] and so on for the rest of the DataFrames. Assigning a single value like this fills the whole column with that value automatically, so there is no need to compute a length or index.
Untested code:
import os
import pandas as pd
from glob import glob
from os.path import join

path = r'file-directory/'
data_files = glob(join(path, '*.txt'))

def main():
    from contextlib import ExitStack
    with ExitStack() as context_manager:  # Allows python to access different data folders
        files = [context_manager.enter_context(open(f, "r")) for f in data_files]
        # Define an empty list and start reading data files
        df1 = []
        for file in files:
            df = pd.read_csv(file,
                             encoding='utf-8',
                             skiprows=114,
                             header=0,
                             # names=heads,
                             skipinitialspace=True,
                             sep='\t')
            # Process the dataframe to remove unwanted rows and columns, and rename the headers
            df = df[df.columns[[1, 2, 4, 6, 8, 10, 28]]]
            df = df.drop(0, axis=0)
            df = df.reset_index(drop=True)
            df.rename(columns=dict(zip(df, heads)), inplace=True)  # heads is defined elsewhere
            for column in df:
                df[column] = pd.to_numeric(df[column], errors='coerce')
            # Add the file name as an identifier
            df['FNAME'] = os.path.basename(file.name)
            # Append each new dataframe to the list
            df1.append(df)
        # Concatenate the results into a single dataframe
        data = pd.concat(df1)
        L = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5', 'L6', 'L7', 'L8', 'L9', 'L10', 'L11', 'L12', 'L13', 'L14']
        # Assuming the number of files and the length of L are the same
        repl_dict = dict(zip([os.path.basename(f.name) for f in files], L))
        # Add the new column by mapping each file name to its L value
        data['L'] = data.FNAME.map(repl_dict)
    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
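If you would rather keep the 15 separate DataFrames instead of concatenating them, the same idea works as a short loop. This is a sketch of my own, assuming df1 and L are ordered consistently:

# Pair each DataFrame with its L value; relies on df1 and L sharing an order
for frame, l_value in zip(df1, L):
    frame['L'] = l_value  # a scalar assignment broadcasts to every row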

Changing Column Heading CSV File

I am currently trying to change the headings of the file I am creating. The code I am using is as follows:
import pandas as pd
import os, sys
import glob

path = "C:\\Users\\cam19\\Desktop\\Test1\\*.csv"
list_ = []
for fname in glob.glob(path):
    df = pd.read_csv(fname, dtype=None, low_memory=False)
    output = df['logid'].value_counts()
    list_.append(output)
df2 = pd.concat(list_, axis=1)
df2.to_csv('final.csv')
Basically I am looping through a file directory and extracting data from each file. The output looks like the following image:
http://imgur.com/a/LE7OS
All I want to do is change the column names from 'logid' to the name of the file currently being processed, but I am not sure how to do this. Any help is great! Thanks.
Instead of appending the raw value counts, wrap them in a DataFrame and set the column name, i.e.
output = pd.DataFrame(df['logid'].value_counts())
output.columns = [os.path.basename(fname).split('.')[0]]
list_.append(output)
Applied to the code in the question:
import pandas as pd
import os, sys
import glob

path = "C:\\Users\\cam19\\Desktop\\Test1\\*.csv"
list_ = []
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    output = pd.DataFrame(df['logid'].value_counts())
    # name the column after the file being processed
    output.columns = [os.path.basename(fname).split('.')[0]]
    list_.append(output)
df2 = pd.concat(list_, axis=1)
df2.to_csv('final.csv')
Hope it helps
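A slightly shorter variant (my own, not from the answer above): a Series can carry the column name directly via rename, which pd.concat(..., axis=1) then uses as the header, so the DataFrame wrapper is not needed:

# rename() sets the Series name, which concat(..., axis=1) uses as the column header
output = df['logid'].value_counts().rename(os.path.basename(fname).split('.')[0])
list_.append(output)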

subtract consecutive rows from a .dat file

I wish to subtract each row from the preceding row in a .dat file and then make a new column out of the result. I want to do this with the first column, time: find the time interval for each timestep and put it into a new column. With help from the stackoverflow community I wrote pseudo code in pandas/python, but it's not working so far:
import pandas as pd
import numpy as np
from sys import argv
from pylab import *
import csv

script, filename = argv

# read flash.dat to a list of lists
datContent = [i.strip().split() for i in open("./flash.dat").readlines()]

# write it out as a new CSV file
with open("./flash.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)

columns_to_keep = ['#time']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)

df = pd.DataFrame({"#time": pd.date_range("24 sept 2016", periods=5*24, freq="1h")})
df["time"] = df["#time"] + [pd.Timedelta(minutes=m) for m in np.random.choice(a=range(60), size=df.shape[0])]
df["value"] = np.random.normal(size=df.shape[0])
df["prev_time"] = [np.nan] + df.iloc[:-1]["time"].tolist()
df["time_delta"] = df.time - df.prev_time

dataframe.plot(x='#time', y='time_delta', style='r')
print(dataframe)
show()
I am also sharing the file for your convenience; your help is much appreciated.
https://www.dropbox.com/s/w4jbxmln9e83355/flash.dat?dl=0
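For the consecutive-row subtraction itself, pandas has a built-in: Series.diff() subtracts each element from the one before it. A minimal sketch, assuming flash.dat is whitespace-separated with a '#time' column as in the code above:

import pandas as pd

# Read the whitespace-separated .dat file directly; no CSV round-trip needed
df = pd.read_csv("./flash.dat", delim_whitespace=True)

# diff() gives each row minus the preceding row; the first entry is NaN
df["time_delta"] = df["#time"].diff()
print(df[["#time", "time_delta"]])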
