I'm an environmental engineer trying to make a career change into data science, which interests me more.
I'm new to Python. I work at a company that evaluates air quality data, and I think that if I automate the analysis, I should save some time.
I've imported the CSV files with environmental data from the past month, applied some filters just to make sure the data were okay, and did a groupby to analyze the data day by day (I need that for my report to the regulatory agency).
Step by step, here is what I did:
medias = tabela.groupby(by=["Data"]).mean()
display(tabela)
As you can see, there is a column named Data, but when I run the info check it does not recognize Data as a column.
medias.info()
How can I solve this? I need to plot some graphs of the rain and dust concentrations per day.
After grouping, do a reset_index:
medias = tabela.groupby(by=["Data"]).mean()
medias = medias.reset_index()
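For completeness, a minimal sketch of the whole flow through to plotting, under some assumptions: the file name is made up, the non-Data columns are numeric, and "rain" and "dust" stand in for your actual concentration column names.

import pandas as pd
import matplotlib.pyplot as plt

# hypothetical file name; replace with your own CSV
tabela = pd.read_csv("dados_ambientais.csv")

# groupby moves "Data" into the index; reset_index turns it back into a column
# (assumes the remaining columns are all numeric)
medias = tabela.groupby(by=["Data"]).mean()
medias = medias.reset_index()

# "rain" and "dust" are placeholders for your concentration columns
medias.plot(x="Data", y=["rain", "dust"], kind="line")
plt.show()

Alternatively, groupby("Data", as_index=False).mean() keeps Data as a column in one step, so the reset_index call is not needed.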
OK, I'll ask in more detail. I'm updating the question and will add an image as well. I have job-vacancy data for a set of sectors, as in the picture. The first column is dates and it's the index; the other 18 columns are the job-vacancy data for the sectors.
Now my question is: when I chart calculations such as seasonality and moving average, I get 18 tables, one per sector (for example, healthcare or mining).
I have exactly 18 of each of these three tables, and by the end of the data preprocessing stage I will have close to a hundred tables. I wanted to describe them table by table in the README.md when I upload them to my GitHub profile, but that won't be feasible this way. Do you think I'm on the right track, or am I making things difficult for myself?
Is there another way to analyze them? Can't I merge them? I am open to suggestions at this point... this is my first time doing time-series analysis.
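On the "can't I merge?" part: since all 18 sectors share the same date index, they can live in one frame and every calculation can run on all of them at once. A minimal sketch, assuming a wide DataFrame with a DatetimeIndex and one vacancy column per sector (the sector names here are made up):

import pandas as pd

# hypothetical wide frame: one job-vacancy column per sector, shared date index
vacancies = pd.DataFrame(
    {"healthcare": range(24), "mining": range(24, 48)},
    index=pd.date_range("2020-01-01", periods=24, freq="MS"),
)

# 12-month moving average for every sector in a single call
moving_avg = vacancies.rolling(window=12).mean()

# long format (date, sector, value) is often easier to chart than 18 tables
long_form = (
    moving_avg.rename_axis("date")
    .reset_index()
    .melt(id_vars="date", var_name="sector", value_name="vacancies_ma")
)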
I am trying to build a binary classifier to predict the propensity of customers to transition from one account to another.
I have age, gender, and customer-segment data, but also a time series of each customer's monthly bank balance for the last 18 months, plus a lot of high-cardinality categorical variables.
So, what I want to know is: how do I transform the time-series data into a more compact, static form like the rest of the data points (age, gender, etc.)? Or can I just throw it into the algorithm too?
Sample data might look like the below:
customer number, age, gender, marital status code, 18mth-bal, 17mth-bal,...,3mth-bal, postcode-segment ..
Any help would be fantastic! Thank you.
I would generate descriptive statistics for each time series. Standard deviation seems interesting, but you could also use percentiles, mean, min, and max... or all of them.
import numpy as np

# add a column for the standard deviation (and/or percentiles etc.);
# np.std is applied to each 'balance' cell, which is assumed to hold
# the customer's last 18 monthly amounts as a list or array
df['standard_deviation'] = df['balance'].apply(np.std)
This last line works if the content of a 'balance' cell is a list or an array (of the last 18 amounts known for that customer, for example).
Sorry I can't be more specific as I can't see your dataframe, but I hope it helps!
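If instead each month's balance sits in its own column (18mth-bal, 17mth-bal, ..., as in the sample row in the question), the statistics can be computed straight across those columns. A minimal sketch, assuming the balance columns share a "-bal" suffix:

import pandas as pd

# hypothetical wide-format frame: one column per month's balance
df = pd.DataFrame({
    "customer_number": [1, 2],
    "18mth-bal": [1200.0, 300.0],
    "17mth-bal": [1100.0, 450.0],
    "3mth-bal": [900.0, 500.0],
})

# pick out the balance columns by their shared suffix
bal_cols = [c for c in df.columns if c.endswith("-bal")]

# collapse the monthly series into static per-customer features
df["bal_mean"] = df[bal_cols].mean(axis=1)
df["bal_std"] = df[bal_cols].std(axis=1)
df["bal_min"] = df[bal_cols].min(axis=1)
df["bal_max"] = df[bal_cols].max(axis=1)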
I have the following data frame. I am trying to group the data into rolling date categories based on the received date. I need the code to work so that when a new piece of data is added to the source, it still falls into these date groups: 0-3 months, 3-6 months, 6-9 months, 9-12 months, and 12+ months. Can someone please walk me through how to do this? I have searched high and low for days and I just don't understand it. I have tried grouping the dates based on the received date, but that doesn't account for new dates being added.

My ultimate goal is to deploy this in a Dash app. I have a ton of other graphs that are already deployed, but a handful of them have to use the datetime buckets, and I can't build them until I figure this part out.
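One common approach (a sketch, not necessarily the only way): recompute each row's age relative to today and bin it with pd.cut, so newly added rows automatically land in the right bucket. The column name 'received_date' is an assumption:

import pandas as pd

# hypothetical frame with a received-date column
df = pd.DataFrame(
    {"received_date": pd.to_datetime(["2024-01-15", "2023-06-01", "2022-11-20"])}
)

# age in months relative to "now", so the buckets roll forward as time passes
age_months = (pd.Timestamp.today() - df["received_date"]).dt.days / 30.44

# rolling buckets: 0-3, 3-6, 6-9, 9-12, 12+ months
df["age_bucket"] = pd.cut(
    age_months,
    bins=[0, 3, 6, 9, 12, float("inf")],
    labels=["0-3 months", "3-6 months", "6-9 months", "9-12 months", "12+ months"],
)

Because the age is computed at run time, rerunning this in the Dash app re-buckets every row against the current date.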
I want to make a chart on a country's map using Python. For example, to plot the Covid-19 infection count or death toll of a country on that country's map. What type of module do I need, and how can I do it?
First of all, as far as I understand, you need a dataset. For coronavirus data you can use: Corona virus data set. Download the CSV files there.
Then, with Python, you can learn how to transfer the data to Excel: csv to excel. To make a graphic, have a look here: Python chart.
Matplotlib and Seaborn can help you with this.
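For drawing values on a country map specifically, a choropleth is the usual chart type, and Plotly (a separate library from Matplotlib/Seaborn) makes it short. A minimal sketch, assuming a CSV with hypothetical 'country' and 'deaths' columns:

import pandas as pd
import plotly.express as px

# hypothetical file and column names
df = pd.read_csv("covid_data.csv")

# shade each country on the world map by its death toll
fig = px.choropleth(
    df,
    locations="country",           # column holding country names
    locationmode="country names",  # match on names rather than ISO codes
    color="deaths",
)
fig.show()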
I'm an Excel newbie and aspiring data analyst. I have this data, and I want to find the distribution of the city-wise shopping experience. Column M has the shopping experience rated from 1 to 5.
What I tried
I am not able to google how to do this at all. I tried running a correlation, but the built-in Excel Data Analysis tool does not let me run it on non-numeric data, and I am not able to group the City cells either. I thought of replacing every city with a numeric alias, but I don't know how to do that either. How should I search for this, or how do I go about the problem?
Update: I was thinking of some way to get this out of the Cities column.
I am thinking this is better done in Python.
How about something like this? I have taken just the cities and data to show AVERAGEIF, SUMIF, and COUNTIF:
I used Data Validation to provide the list to select from.
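And since the question mentions Python: in pandas the whole city-wise distribution is one crosstab call. A minimal sketch with made-up column names ('City' and 'Experience' for the 1-to-5 rating):

import pandas as pd

# hypothetical data: one survey response per row
df = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai", "Mumbai", "Pune"],
    "Experience": [5, 3, 4, 4, 2],
})

# count of each rating (1-5) per city: the city-wise distribution
distribution = pd.crosstab(df["City"], df["Experience"])
print(distribution)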