I want to subtract the preceding row from each row in a .dat file and make a new column out of the result. In my file I want to do this with the first column, time: I want to find the time interval for each timestep and store it in a new column. With help from the Stack Overflow community I wrote some pseudocode in pandas (Python), but it isn't working so far:
```python
import sys
import pandas as pd
import matplotlib.pyplot as plt

script, filename = sys.argv  # usage: python script.py flash.dat

# Read flash.dat line by line and keep only the first whitespace-separated
# column (time), skipping blank lines and the header line that starts with "#"
with open(filename) as f:
    times = [float(line.split()[0]) for line in f
             if line.strip() and not line.startswith("#")]

dataframe = pd.DataFrame({"#time": times})
# Interval between each timestep and the previous one (NaN for the first row)
dataframe["time_delta"] = dataframe["#time"].diff()

dataframe.plot(x="#time", y="time_delta", style="r")
print(dataframe)
plt.show()
```
I am also sharing the file for your convenience; any help is much appreciated.
https://www.dropbox.com/s/w4jbxmln9e83355/flash.dat?dl=0
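A minimal sketch of the row-to-row subtraction on its own, using a few made-up time values in place of the real flash.dat column: `Series.diff()` computes each row minus the previous one, which is the same as subtracting a `shift(1)` copy of the column.

```python
import pandas as pd

# Made-up time values standing in for the first column of flash.dat
df = pd.DataFrame({"#time": [0.0, 0.5, 1.25, 2.0]})

# diff() gives row[i] - row[i-1]; the first row has no predecessor, so it is NaN
df["time_delta"] = df["#time"].diff()

# Equivalent explicit form: subtract a copy of the column shifted down by one row
df["time_delta_shift"] = df["#time"] - df["#time"].shift(1)

print(df)
```

`diff()` is the shorter spelling; the `shift()` form is handy when the new column needs something other than a plain difference.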
I need to read an Excel file named 2023-2.xlsx. It contains the dates of the month, each date spanning 4 columns, while the first 3 columns are the index. I need to write each day to a separate file. For example, when reading the file shown in the image, I should produce 4 files named 01-02-2023.xlsx, 02-02-2023.xlsx, 03-02-2023.xlsx and 04-02-2023.xlsx, each with its corresponding data.
The same goes for the other days. How can I iterate inside iloc so that I don't have to write out all the necessary columns by hand?
```python
import pandas as pd
import numpy as np
import xlsxwriter
import glob
import os
import csv

all_files = glob.glob("C:/Users/ep_irojaso/Desktop/PROGRAMA DESEMPEÑO/saturn/2023-2.xlsx")
file_list = []
for i, f in enumerate(all_files):
    df = pd.read_excel(f)
    first = df.iloc[:, [0, 1, 2, 3, 4, 5, 6]]
    second = df.iloc[:, [0, 1, 2, 3, 7, 8, 9, 10]]
    third = df.iloc[:, [0, 1, 2, 3, 11, 12, 13, 14]]
    # ExcelWriter.save() was removed in pandas 2.0; a with-block saves and closes the file
    with pd.ExcelWriter("first.xlsx") as firstWriter:
        first.to_excel(firstWriter)
    with pd.ExcelWriter("second.xlsx") as secondWriter:
        second.to_excel(secondWriter)
    with pd.ExcelWriter("third.xlsx") as thirdWriter:
        third.to_excel(thirdWriter)
```
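One way to avoid listing every column by hand is to compute each day's column positions in a loop. The sketch below assumes the layout described in the question (3 leading index columns, then 4 columns per day) and that the days are consecutive starting on 1 February 2023; `split_by_day` and `write_days` are hypothetical helper names. It builds the file names from that assumed start date rather than reading them from the sheet's headers, since the header format isn't shown.

```python
import pandas as pd


def split_by_day(df, n_index=3, cols_per_day=4, start="2023-02-01"):
    """Split a wide sheet into one DataFrame per day.

    Assumes the first `n_index` columns identify each row and every following
    block of `cols_per_day` columns belongs to one day, starting at `start`.
    """
    index_cols = list(range(n_index))
    n_days = (df.shape[1] - n_index) // cols_per_day
    parts = {}
    for day in range(n_days):
        first_col = n_index + day * cols_per_day
        day_cols = list(range(first_col, first_col + cols_per_day))
        date = pd.Timestamp(start) + pd.Timedelta(days=day)
        name = date.strftime("%d-%m-%Y") + ".xlsx"  # e.g. 01-02-2023.xlsx
        parts[name] = df.iloc[:, index_cols + day_cols]
    return parts


def write_days(parts):
    # Needs an Excel engine such as openpyxl or xlsxwriter to be installed
    for name, part in parts.items():
        part.to_excel(name, index=False)


# Small in-memory example: 3 index columns followed by 2 days x 4 columns
demo = pd.DataFrame([list(range(11))], columns=[f"c{i}" for i in range(11)])
parts = split_by_day(demo)
for name, part in parts.items():
    print(name, list(part.columns))
```

On the real workbook this would be used as `write_days(split_by_day(pd.read_excel("2023-2.xlsx")))`, adjusting `n_index`, `cols_per_day` and `start` if the sheet differs from the assumed layout.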
I have the file below (file1.xlsx) as input. In total it has 32 columns and almost 2500 rows; the screenshot shows only 5 of the columns as an example.
I want to edit the same file with Python and get the output shown (still as file1.xlsx).
Note that I am adding one column named short, whose data is the substring, up to the first dot, of the value in the Name column (column A) of the same sheet.
Please help.
Regards
Kawaljeet
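Here is what you need:
```python
import pandas as pd

file_name = "file1.xlsx"
df = pd.read_excel(file_name)  # Read the Excel file as a DataFrame
# .str[0] takes the first piece of each split; a plain [0] would select row 0 instead
df['short'] = df['Name'].str.split(".").str[0]
df.to_excel(file_name, index=False)  # index=False avoids writing an extra index column
```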
Hello everyone, I solved the problem with the code below:
```python
import pandas as pd
import os


def add_column():
    file_name = "cmdb_inuse.xlsx"
    os.chmod(file_name, 0o777)  # make the file writable (0o777 grants everyone full access)
    df = pd.read_excel(file_name)  # Read the Excel file as a DataFrame
    # Keep the part of each Name before the first "."
    df['short'] = [x.split(".")[0] for x in df['Name']]
    df.to_excel(file_name, index=False)


add_column()
```
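For reference, a vectorised version of the same idea on a small made-up Name column. `Series.str.split` with `n=1` splits only at the first dot and `.str[0]` keeps the part before it; unlike `x.split(".")` in a list comprehension, missing names stay missing instead of raising an `AttributeError`.

```python
import pandas as pd

# Made-up values standing in for the Name column of the spreadsheet
df = pd.DataFrame({"Name": ["server01.corp.example", "db-02.example", "nodot", None]})

# Split once at the first "." (a one-character pattern is treated literally)
# and keep the part before it; missing values propagate as missing
df["short"] = df["Name"].str.split(".", n=1).str[0]

print(df)
```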
The goal of this program is to read each column header and all of the data underneath each column, then make a list of it and log it all into a text file. This works with small data, but with large amounts of data (2000 lines and up) the text file records up to the number 30, then the next element is '...', and then it resumes recording correctly all the way up to the 2000th element.
I have tried everything I can think of. Please help; I almost punched a hole in the wall trying to fix this.
```python
import csv
import pandas as pd
import os
import linecache
from tkinter import *
from tkinter import filedialog


def create_dict(df):
    # Opens the dictionary for writing (creates it, or overwrites an old copy)
    with open("Dictionary.txt", 'w') as dictionary:
        # Creates an entry in the dictionary for each header
        for header in df.columns:
            dictionary.write("==========================\n"
                             "\t=" + str(header) + "=\n"
                             "==========================\n\n\n\n")
            # to_string() writes every row; str(df[header]) uses the truncated
            # display representation, which is where the '...' came from
            dictionary.write(df[header].to_string())
            dictionary.write('\n')
```
Some of these imports might not be used. I just included all of them.
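The '...' comes from str(df[...]): it uses pandas' display representation, which truncates long output according to the display options (display.max_rows). You can write the DataFrame to a text file directly with to_string(), which renders every row: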
```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(low=1, high=100, size=3000), columns=['Random Number'])
filename = 'dictionary.txt'
with open(filename, 'w') as file:
    df.to_string(file)
```
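To see where the '...' comes from, compare the display representation of a long Series with `to_string()`. This sketch uses 2000 made-up values; `pd.option_context` is shown as an alternative that lifts the row limit only inside the `with` block.

```python
import pandas as pd

s = pd.Series(range(2000))

truncated = str(s)     # display repr: long output is cut according to display.max_rows
full = s.to_string()   # renders every row, so there is no '...'

# Temporarily remove the row limit so str() renders every row as well
with pd.option_context("display.max_rows", None):
    untruncated = str(s)

print("..." in truncated, "..." in full, "..." in untruncated)
```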
I am currently trying to change the headings of the file I am creating. The code I am using is as follows:
```python
import pandas as pd
import os, sys
import glob

path = "C:\\Users\\cam19\\Desktop\\Test1\\*.csv"
list_ = []
for fname in glob.glob(path):
    df = pd.read_csv(fname, dtype=None, low_memory=False)
    output = (df['logid'].value_counts())
    list_.append(output)
df1 = pd.DataFrame()
df2 = pd.concat(list_, axis=1)
df2.to_csv('final.csv')
```
Basically, I am looping through a directory and extracting data from each file. This outputs the following:
http://imgur.com/a/LE7OS
All I want to do is change the column names from 'logid' to the name of the file currently being processed, but I am not sure how to do this. Any help is great! Thanks.
Instead of appending the value counts directly, wrap them in a DataFrame and set its column name to the file name, i.e.:
```python
output = pd.DataFrame(df['logid'].value_counts())
output.columns = [os.path.basename(fname).split('.')[0]]
list_.append(output)
```
With these changes, the code in the question becomes:
```python
import pandas as pd
import os
import glob

path = "C:\\Users\\cam19\\Desktop\\Test1\\*.csv"
list_ = []
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    # Count each logid, then name the column after the file (without extension)
    output = pd.DataFrame(df['logid'].value_counts())
    output.columns = [os.path.basename(fname).split('.')[0]]
    list_.append(output)
df2 = pd.concat(list_, axis=1)
df2.to_csv('final.csv')
```
Hope it helps
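A shorter variant, sketched with two hypothetical in-memory "files" (read through `io.StringIO`) instead of the real directory: `Series.rename` sets the name of each counts Series, and `pd.concat(..., axis=1)` uses those names as column headers, so no intermediate DataFrame is needed. `os.path.splitext` is used here to drop the extension.

```python
import io
import os
import pandas as pd

# Hypothetical file names and contents standing in for the CSVs matched by glob
files = {
    "C:/data/run1.csv": "logid\na\nb\na\n",
    "C:/data/run2.csv": "logid\nb\nc\n",
}

counts = []
for fname, text in files.items():
    df = pd.read_csv(io.StringIO(text))
    # Name the counts after the file, without directory or extension
    name = os.path.splitext(os.path.basename(fname))[0]
    counts.append(df["logid"].value_counts().rename(name))

# Each Series name becomes a column header; missing logids become NaN
result = pd.concat(counts, axis=1)
print(result)
```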
I have a CSV file with data like:
```
data,data,10.00
data,data,11.00
data,data,12.30
```
I need to update it to:
```
data,data,10.00
data,data,11.00,1.00(11.00-10.00)
data,data,12.30,1.30(12.30-11.00)
```
Could you help me update the CSV file using Python?
You can use pandas and numpy. pandas reads/writes the csv and numpy does the calculations:
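```python
import pandas as pd
import numpy as np

data = pd.read_csv('test.csv', header=None)
col_data = data[2].values
diff = np.diff(col_data)      # differences between consecutive rows
diff = np.insert(diff, 0, 0)  # the first row has no predecessor, so use 0
data['diff'] = diff
# write data to file; float_format keeps two decimals, since e.g.
# 12.30 - 11.00 is not exactly 1.30 in floating point
data.to_csv('test1.csv', header=False, index=False, float_format='%.2f')
```
When you open test1.csv you will find the results described above, with a difference of 0.00 added next to the first data point.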
For more info see the following docs:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.to_csv.html
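A pandas-only variant, sketched with the sample rows inlined via `io.StringIO` instead of reading test.csv: `Series.diff()` does the subtraction, `na_rep=""` leaves the first difference empty (closer to the desired output than a 0), and `float_format` keeps two decimals.

```python
import io
import pandas as pd

# Sample rows from the question, inlined instead of reading test.csv
csv_text = "data,data,10.00\ndata,data,11.00\ndata,data,12.30\n"
data = pd.read_csv(io.StringIO(csv_text), header=None)

# diff() gives row[i] - row[i-1]; the first row has no predecessor -> NaN
data[3] = data[2].diff()

# na_rep="" writes the first (NaN) difference as an empty field;
# float_format="%.2f" prints 1.30 instead of 1.3000000000000007
out = data.to_csv(header=False, index=False, float_format="%.2f", na_rep="")
print(out)
```

Passing a path such as `'test1.csv'` as the first argument of `to_csv` writes the same text to a file instead of returning it.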