This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 2 years ago.
Is there a way in Excel, Python, or R to convert data that is in the format of time and quantity per date into one long column? For instance:
Current format:
Instead, I want this data to be one long column: 17 zeros, followed by one 1, then 176 zeros, and so on.
Thank you in advance for any help.
To elaborate, the data looks like this:
Current data:
And I need this data to look like this:
Final result:
One option is uncount from tidyr:
library(tidyr)
uncount(dat, quantity)
Or, in base R, with rep:
with(dat, rep(time, quantity))
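Since the question also asks about Python, here is a rough pandas analogue of both R answers. It is a sketch that assumes the data has a time column and a quantity column, as in the R code; the small frame below is made up to reproduce the 17/1/176 pattern from the question.

```python
import pandas as pd

# Hypothetical data: each 'time' value and how many times it should be repeated.
dat = pd.DataFrame({"time": [0, 1, 0], "quantity": [17, 1, 176]})

# Repeat each row 'quantity' times (roughly what tidyr::uncount does).
long_rows = (
    dat.loc[dat.index.repeat(dat["quantity"].to_numpy())]
    .drop(columns="quantity")
    .reset_index(drop=True)
)

# Or build just the single long column (roughly what rep(time, quantity) does).
long_col = dat["time"].repeat(dat["quantity"].to_numpy()).reset_index(drop=True)

print(len(long_col))
```

The result has 17 + 1 + 176 = 194 rows. If the data lives in Excel, it can be loaded with pd.read_excel first and written back with to_excel.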
This question already has an answer here:
Pandas read_excel: parsing Excel datetime field correctly [duplicate]
(1 answer)
Closed 1 year ago.
I'm trying to convert an entire column containing a 5-digit date code (e.g. 43390, 43599) to a normal date format. This is just to make data analysis easier; it doesn't matter how it's formatted. As a Series, the DATE column looks like this:
1 43390
2 43599
3 43605
4 43329
5 43330
...
264832 43533
264833 43325
264834 43410
264835 43461
264836 43365
I don't understand the previous answers to this question, and when I tried code such as
date_col = df.iloc[:,0]
print((datetime.utcfromtimestamp(0) + timedelta(date_col)).strftime("%Y-%m-%d"))
I get this error:
unsupported type for timedelta days component: Series
Thanks, sorry if this is a basic question.
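df.iloc[:,0] selects the whole first column, so date_col is a Series rather than a single value. If you want the value in the first row, for example, use df.iloc[0, 0], which returns a single value (df.iloc[0] would return the whole first row, which is still a Series).
timedelta expects a number for its days component, not a Series.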
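To convert the column as a whole, a vectorised conversion avoids timedelta entirely. The sketch below assumes the numbers are Excel serial dates (the 1900 date system, as the linked read_excel question suggests), whose day 0 is effectively 1899-12-30; counting from the Unix epoch, as in the question's attempt, would give dates decades off. The three-row frame is hypothetical.

```python
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical frame standing in for the question's data: Excel serial day numbers in column 0.
df = pd.DataFrame({"DATE": [43390, 43599, 43605]})

# Vectorised: convert the whole column at once.
# Assumption: Excel 1900 date system, whose serial numbers count days from 1899-12-30.
df["DATE_parsed"] = pd.to_datetime(df["DATE"], unit="D", origin="1899-12-30")

# Scalar: timedelta needs a plain number, so pull out one value with iloc[row, col].
first = datetime(1899, 12, 30) + timedelta(days=int(df.iloc[0, 0]))
print(first.strftime("%Y-%m-%d"))
print(df)
```

With this origin, 43390 becomes 2018-10-17 and 43599 becomes 2019-05-14.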
This question already has answers here:
Convert Pandas Column to DateTime
(8 answers)
Closed 1 year ago.
I have a dataframe which contains a datetime column like this:
As you can see, the smallest time unit in the "date_time" column is the minute; there is no seconds component. For example, 4:24 appears in the first six rows, which means data was gathered every 10 seconds, while 4:25 is repeated 10 times, which means data was recorded every 6 seconds.
I am looking for a way to add seconds to the "date_time" column.
The desired format is:
Use the pandas to_datetime() function:
df['date_time']=pd.to_datetime(df['date_time'])
Then use the apply() method:
df['date_time']=df['date_time'].apply(lambda x:x.strftime("%H:%M:%S"))
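Note that to_datetime plus strftime only reformats the timestamps, so the seconds will all read 00; it does not recover the sub-minute spacing the question describes. If you are willing to assume the readings within each minute are evenly spaced and the first one falls on :00, something like the following sketch could spread them out. The data and that spacing assumption are hypothetical.

```python
import pandas as pd

# Hypothetical data: 6 readings stamped 04:24 and 10 readings stamped 04:25.
df = pd.DataFrame({"date_time": ["2021-01-01 04:24"] * 6 + ["2021-01-01 04:25"] * 10})
df["date_time"] = pd.to_datetime(df["date_time"])

# Position of each row within its minute (0, 1, 2, ...).
pos = df.groupby("date_time").cumcount()
# Number of rows sharing that minute.
size = df["date_time"].map(df["date_time"].value_counts())

# Assumption: readings are evenly spaced within the minute, starting at :00.
# 6 rows -> one every 10 s, 10 rows -> one every 6 s.
df["date_time"] = df["date_time"] + pd.to_timedelta(pos * 60 / size, unit="s")

out = df["date_time"].dt.strftime("%H:%M:%S").tolist()
print(out)
```

The first minute comes out as 04:24:00, 04:24:10, ..., 04:24:50, and the second as 04:25:00, 04:25:06, ..., 04:25:54.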
This question already has answers here:
Resampling Minute data
(2 answers)
Closed 2 years ago.
I have a dataset; let's say it is loaded like this:
import pandas as pd
dataset = pd.read_csv('some_stock_name_here.csv', index_col=['Date'], parse_dates=['Date'])
The CSV file has 2500 observations (Date and Close price), and I want to create a new CSV file containing the same time series at a much lower frequency, for example keeping every 40th row. How can I do this?
2. I'm also wondering whether I could change that frequency within the notebook without creating a new CSV file.
Thanks in advance.
You can slice your df using iloc:
This goes over all rows and keeps those whose position is divisible by X.
X = 40
df.iloc[::X]
To save the resulting DataFrame, use:
df.to_csv(FILE_PATH_HERE)
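A sketch covering both parts of the question, using made-up daily data in place of the CSV. The thinned frame can be used directly in the notebook, and resample (the approach in the linked duplicate) is an alternative when you want one value per fixed time period rather than every Nth row. The 40-day bin and the commented-out filename are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: 2500 daily closing prices indexed by date.
dates = pd.date_range("2010-01-01", periods=2500, freq="D")
dataset = pd.DataFrame({"Close": np.arange(2500, dtype=float)}, index=dates)

# Keep every 40th row in memory; no new CSV is needed to keep working with it.
every_40th = dataset.iloc[::40]

# Alternative: downsample by time period, taking the last close in each 40-day bin.
binned = dataset["Close"].resample("40D").last()

# Write the thinned series to a new file only if you need one (hypothetical filename).
# every_40th.to_csv("thinned.csv")

print(len(every_40th), len(binned))
```

Slicing with iloc keeps actual rows by position, while resample needs a DatetimeIndex (which parse_dates plus index_col provide) and aggregates within each period.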
This question already has answers here:
The first three max value in a column in python
(1 answer)
Count and Sort with Pandas
(5 answers)
Closed 3 years ago.
I am doing an online course with a problem like "Find the name of the state with the maximum number of counties". The problem dataframe is shown in the image below:
Problem Dataframe
I then gave the dataframe a two-level (hierarchical) index, after which it looks like the image below:
Modified Dataframe
I have used this code to get the modified dataframe:
def answer_five():
    new_df = census_df[census_df['SUMLEV'] == 50]
    new_df = new_df.set_index(['STNAME', 'CTYNAME'])
    return new_df

answer_five()
What I want to do now is find the name of the state with the most counties, i.e. the first-level index value with the most rows. How can I do that?
I know this can be done with something like the groupby() method, but I'm not familiar with it yet, so I'd rather not use it. Can anyone help? I have searched for this without success. Sorry if the problem is rudimentary. Thanks in advance.
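One way to do this without groupby is to count the labels in the first index level with value_counts and take idxmax. The sketch below uses a tiny made-up stand-in for census_df, assuming (as in the question's code) that SUMLEV 50 marks county rows.

```python
import pandas as pd

# Hypothetical miniature of census_df: SUMLEV 40 rows are states, SUMLEV 50 rows are counties.
census_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50, 50, 50],
    "STNAME": ["Alabama", "Alabama", "Alabama", "Texas", "Texas", "Texas", "Texas"],
    "CTYNAME": ["Alabama", "Autauga County", "Baldwin County",
                "Texas", "Anderson County", "Andrews County", "Angelina County"],
})

new_df = census_df[census_df["SUMLEV"] == 50].set_index(["STNAME", "CTYNAME"])

# Count rows per value of the first index level (the state) and take the most frequent label.
counts = new_df.index.get_level_values("STNAME").value_counts()
print(counts.idxmax())
```

Here Texas has three county rows and Alabama two, so idxmax returns "Texas".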
This question already has answers here:
Pandas groupby: How to get a union of strings
(8 answers)
Closed 3 years ago.
I'm new to pandas. I was able to create a dataframe from a CSV file and to sort it.
What I am struggling with now is the following (the image is an example from my pandas dataframe):
First column is the index,
Second column is a group number
Third column is what happened.
Based on the second column, I want to collect the values of the third column for each group, within the same dataframe.
A few examples: for group 9, return the sequence
[60,61,70,51]
For group 6, return the sequence
[65,55,56]
For group 8, return the single element 8.
How can groupby be used to do this extraction?
Thanks a lot
Regards
Alex
Based on the answers to the linked question, the following code gives the desired result:
import pandas as pd
dataframe = pd.DataFrame({'index':[0,1,2,3,4], 'groupNumber':[9,9,9,9,9], 'value':[12,13,14,15,16]})
grouped = dataframe.groupby('groupNumber')['value'].apply(list)
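Applied to data shaped like the question's example (the rows below are made up to reproduce the sequences for groups 9, 6 and 8), the same groupby call returns one list per group. This variant uses s.tolist() instead of list so the elements are plain Python ints; groupby keeps each group's rows in their original order.

```python
import pandas as pd

# Hypothetical data shaped like the question's: a group number and a value per row.
df = pd.DataFrame({
    "groupNumber": [9, 6, 9, 6, 8, 9, 6, 9],
    "value":       [60, 65, 61, 55, 8, 70, 56, 51],
})

# One list of values per group, in the order the rows appear in the frame.
grouped = df.groupby("groupNumber")["value"].apply(lambda s: s.tolist())

print(grouped[9])
print(grouped[6])
print(grouped[8])
```

The result is a Series indexed by group number, so grouped.to_dict() gives a plain {group: list} mapping if that is more convenient.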