When I read the Excel file into Python:
import pandas as pd
data = pd.read_excel('1.xlsx')
data
part of my time data loads correctly, but another part has problems. The problem is in these columns: in_time, call_time, process_in_time, out_time.
Why is this happening?
And how do I handle and normalize this time data?
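One hedged way to normalize such columns, sketched with made-up values since the real workbook isn't available: coerce everything with pd.to_datetime and inspect the rows that come back NaT.

```python
import pandas as pd

# Hypothetical stand-in for the problem columns; the real values come from
# the poster's workbook, so this is only a sketch.
data = pd.DataFrame({
    "in_time": ["09:15:00", "14:30:05", "not a time", None],
})

# Coerce to datetime: cells that cannot be parsed become NaT instead of
# raising, which makes the bad rows easy to find and fix.
parsed = pd.to_datetime(data["in_time"], errors="coerce")

# Keep only the time-of-day component for a clean time column.
data["in_time"] = parsed.dt.time
print(data)
```

The rows that come back NaT are the ones Excel stored in an unexpected format; those usually need a second pd.to_datetime pass with an explicit format= string.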
I am trying to read in temperature data from a .csv file using pandas. Here is a sample of the file:
My issue is that my code is not reading the data to 1 decimal place. This is what I have at the moment:
# Import packages
import pandas as pd
# Open data file
data_file_name = "data_set.csv"
data_file = pd.read_csv(data_file_name, header=2).astype(int)
# Extract temperature data
target_data = data_file["Temperature"].astype(float)
print(target_data.loc[[0]])
After adding the print statement to check that the first value is -23.5 as it should be, I instead get:
-23.0
Why isn't my code reading the data as a float with 1 d.p.?
I believe the issue is that you're reading the file with .astype(int), which converts everything in the CSV to an int and truncates the decimals, so you cannot recover them later with .astype(float). Try not specifying a type on the initial read_csv; pandas can normally infer the proper types automatically.
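For example, with an inline stand-in for data_set.csv (the real leading lines are unknown, so these are placeholders):

```python
import pandas as pd
from io import StringIO

# Stand-in for data_set.csv: two placeholder lines before the header,
# matching the header=2 in the question.
csv_text = "meta line 1\nmeta line 2\nTemperature\n-23.5\n-22.1\n"

# No .astype(int) on the initial read: pandas infers float64 on its own.
data_file = pd.read_csv(StringIO(csv_text), header=2)
target_data = data_file["Temperature"]
print(target_data.loc[0])  # -23.5
```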
I have an Excel sheet in .xls format containing live streaming stock data from a piece of software.
I want to read and process the data from the sheet in Python every 5 seconds.
Python only gets refreshed data when I manually save the .xls file; it does not automatically pick up new data points when the script runs after the first time.
Any help?
This should help you:
import threading
import pandas as pd
def main_task():
    threading.Timer(5.0, main_task).start()  # re-schedules main_task to run again in 5 seconds
    df = pd.read_excel("filename.xls")  # re-reads the Excel file
    # process df here
main_task()  # starts the cycle
This code will re-read the file into a fresh pandas DataFrame every 5 seconds, picking up whatever has been saved to it.
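If stacking a new timer thread on every tick is a concern, a plain loop in the current thread does the same job; poll, read_fn, and the numbers below are hypothetical names, not part of the question:

```python
import time

def poll(read_fn, interval=5.0, iterations=3):
    """Call read_fn every `interval` seconds and collect the results.
    A plain-loop alternative to threading.Timer that blocks the
    current thread instead of spawning a new timer thread each tick."""
    results = []
    for _ in range(iterations):
        results.append(read_fn())  # e.g. lambda: pd.read_excel("filename.xls")
        time.sleep(interval)
    return results
```

For the question's case this would be called as poll(lambda: pd.read_excel("filename.xls"), interval=5.0).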
I am trying to convert YAML data to a DataFrame through pandas with the yamltodb package, but it shows only a single row: the header and one row of data. I tried converting the YAML file to JSON and then using the normalize function, but that did not work either. I have attached a screenshot of the JSON function's output. I need to categorize it under batsman, bowler, runs, etc.
The code and output image are attached.
Just guessing, as I don't know what your data actually looks like:
import pandas as pd
import yaml
with open('fName.yaml', 'r') as f:
    df = pd.json_normalize(yaml.safe_load(f))
df.head()
(yaml.safe_load is the safe replacement for plain yaml.load, and pd.json_normalize supersedes the older pd.io.json.json_normalize.)
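If the YAML holds a list of records under a top-level key (a guess at the structure, with made-up names and values), normalizing the inner list rather than the whole document gives one row per record, which is likely why only a single row appeared:

```python
import pandas as pd

# Hypothetical structure after yaml.safe_load(); the real file may differ.
parsed = {
    "players": [
        {"name": "A", "role": "batsman", "runs": 57},
        {"name": "B", "role": "bowler", "runs": 4},
    ]
}

# Normalizing the whole dict yields a single row holding the nested list...
print(pd.json_normalize(parsed).shape)             # (1, 1)
# ...normalizing the inner list yields one row per record.
print(pd.json_normalize(parsed["players"]).shape)  # (2, 3)
```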
I have a huge pickle file which needs to be updated every 3 hours from a daily-data file (a CSV).
There are two fields named TRX_DATE and TIME_STAMP, having values like 24/11/2015 and 24/11/2015 10:19:02 respectively (there are also 50 additional fields).
So what I am doing is first reading the huge pickle into a DataFrame, then dropping any values for today's date by comparing with the TRX_DATE field.
Then I read the CSV file into another DataFrame, append the two DataFrames, and write out a new pickle.
My script looks like:
import pandas as pd
import datetime as dt
df = pd.read_pickle('hugedata.pkl')  # placeholder name for the huge pickle
Today = dt.datetime.today().replace(hour=0, minute=0, second=0, microsecond=0)
df = df[df.TRX_DATE < Today]  # keep only rows from before today, i.e. delete any entries for today
df1 = pd.read_csv('dailydata.csv')  # placeholder name for the daily CSV file
df = df.append(df1, ignore_index=True)
df.to_pickle('hugedata.pkl')
The problems are as follows:
1. It is taking huge memory as well as time to read that huge pickle.
2. I need to append df1 to df so that only the columns already in df remain; any new column coming from df1 should be excluded. But I am getting new columns with NaN values in many places.
So I need assistance on these things:
1. Is there a way to read only the small CSV and append it to the pickle file, or is reading the whole pickle mandatory?
2. Can it be done by converting the CSV to a pickle and merging the two pickles with the load/dump methods? (I have never actually used those.)
3. How do I read the time from the TIME_STAMP field, get the data between two timestamps (filtering by TIME_STAMP), and update the main pickle with it? Previously I was filtering by TRX_DATE values.
Is there a better way? Please suggest.
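For point 3, a hedged sketch of filtering by TIME_STAMP (the column name comes from the question; the sample rows here are made up):

```python
import pandas as pd

# Made-up sample rows in the question's day-first format.
df = pd.DataFrame({
    "TIME_STAMP": ["24/11/2015 10:19:02",
                   "24/11/2015 12:00:00",
                   "25/11/2015 09:30:00"],
})

# Parse the strings once; after that, timestamp comparisons are cheap.
df["TIME_STAMP"] = pd.to_datetime(df["TIME_STAMP"], format="%d/%m/%Y %H:%M:%S")

# Keep only the rows between two timestamps (inclusive).
start = pd.Timestamp("2015-11-24 10:00:00")
end = pd.Timestamp("2015-11-24 13:00:00")
between = df[df["TIME_STAMP"].between(start, end)]
print(len(between))  # 2
```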
HDF5 is made for what you are trying to do.
import pandas as pd
df.to_hdf('test.h5', key='test1')  # write df to an HDF5 file
pd.read_hdf('test.h5', key='test1')  # read it back
df.to_hdf() opens the file in append mode ('a') by default; reading and writing HDF5 through pandas requires the PyTables package.
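One possible refinement, sketched under the assumption that PyTables is installed and that the path and key are placeholder names: writing with format="table" and append=True lets the daily rows be added without reading the existing data back in.

```python
import os
import tempfile

import pandas as pd

# Placeholder path; format="table" makes the stored dataset appendable.
path = os.path.join(tempfile.mkdtemp(), "test.h5")

pd.DataFrame({"a": [1, 2]}).to_hdf(path, key="test1", format="table", append=True)

# Later, append new rows without re-reading what is already stored.
pd.DataFrame({"a": [3]}).to_hdf(path, key="test1", format="table", append=True)

print(len(pd.read_hdf(path, key="test1")))  # 3
```

Note that appending only works if the key was originally written in table format; the default fixed format cannot be appended to.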