I have the following two 'Series' objects, which I want to write into a normal csv file with two columns (date and value):
However, with my code (see below) I only managed to include the value, but not the date with it:
Help would be greatly appreciated!
Best,
Lena
You could create a pandas DataFrame from each of the Series and use the DataFrame's to_csv method, like this:
import pandas as pd
# the Series' date index becomes the first column of the CSV
hailarea_south = pd.DataFrame(hailarea_south)
hailarea_south.to_csv("hailarea_south.csv")
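As a runnable sketch with mock data (the dates and values below are made up for illustration, since the original Series isn't shown in full), the Series' index ends up as the date column:

```python
import pandas as pd

# mock Series with dates as the index (illustrative values only)
hailarea_south = pd.Series(
    {'2002-04-01': 0, '2002-04-02': 0, '2021-09-28': 167},
    name='value')

# converting to a DataFrame keeps the date index,
# and index_label names that column in the CSV header
df = pd.DataFrame(hailarea_south)
df.to_csv('hailarea_south.csv', index_label='date')

print(open('hailarea_south.csv').read())
```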
I suggest using pandas' DataFrame.to_csv, since it handles most of the things you're currently doing manually. For this approach a DataFrame is easiest to work with, so we need to convert first.
import numpy as np
import pandas as pd
# set up mock data for the example
north = pd.Series({'2002-04-01': 0, '2002-04-02': 0, '2021-09-28': 167})
south = pd.Series({'2002-04-01': 0, '2002-04-02': 0, '2021-09-28': 0})
# convert series to DataFrame
df = pd.DataFrame(data={'POH>=80_south': south, 'POH>=80_north': north})
# save as csv
df.to_csv('timeseries_POH.csv', index_label='date')
output:
date,POH>=80_south,POH>=80_north
2002-04-01,0,0
2002-04-02,0,0
2021-09-28,0,167
In case you want different separators, quotes or the like, please refer to this documentation for further reading.
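For instance, a small sketch with made-up data: a semicolon separator and full quoting (both chosen here just for illustration) are passed straight through to to_csv:

```python
import csv
import pandas as pd

# tiny illustrative frame
df = pd.DataFrame({'date': ['2002-04-01'], 'value': [0]})

# semicolon-separated, with every field quoted
df.to_csv('example.csv', sep=';', quoting=csv.QUOTE_ALL, index=False)

print(open('example.csv').read())
```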
Good evening. I need help getting two columns together; my brain is stuck right now. Here's my code:
import pandas as pd
import numpy as np
tabela = pd.read_csv('/content/idkfa_linkedin_user_company_202208121105.csv', sep=';')
tabela.head(2)
coluna1 = 'startPeriodoMonth'
coluna2 = 'startPeriodoYear'
pd.concat([coluna1, coluna2])
ERROR: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid
I'm currently getting this error but I really don't know what to do. By the way, I'm a beginner and don't know much about coding, so any help is very appreciated.
I am new to pandas too, but I think I can help you. You seem to have created two string variables by wrapping the literal strings "startPeriodoMonth" and "startPeriodoYear" in single quotes ('xyz'). I think what you're trying to do is pass columns from your pandas DataFrame... and the way to do that is to explicitly reference the DataFrame and then wrap the column name in square brackets, like this:
coluna1 = tabela['startPeriodoMonth']
This is why it says you "cannot concatenate an object of type str": pd.concat only accepts Series or DataFrame objects.
From what I understand, coluna1 and coluna2 are columns of tabela. You have two options:
The first is selecting the columns from the dataframe and storing it in a new dataframe.
import pandas as pd
import numpy as np
tabela = pd.read_csv('/content/idkfa_linkedin_user_company_202208121105.csv', sep=';')
tabela.head(2)
coluna1 = 'startPeriodoMonth'
coluna2 = 'startPeriodoYear'
new_df = tabela[[coluna1, coluna2]]
The second option is creating a DataFrame that contains just the desired column (one for each column), followed by concatenating these DataFrames side by side.
coluna1 = 'startPeriodoMonth'
coluna2 = 'startPeriodoYear'
df_column1=tabela[[coluna1]]
df_column2=tabela[[coluna2]]
pd_concat = [df_column1, df_column2]
result = pd.concat(pd_concat, axis=1)  # axis=1 places the two columns side by side
You can create a new column in your existing data frame to get the desired output.
tabela['month_year'] = tabela[coluna1].apply(str) + '/' + tabela[coluna2].apply(str)
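A self-contained sketch of that approach, with a small mock frame standing in for the real CSV (the column names come from the question, the values are made up):

```python
import pandas as pd

# mock stand-in for the CSV read in the question
tabela = pd.DataFrame({'startPeriodoMonth': [8, 12],
                       'startPeriodoYear': [2021, 2022]})

# cast both columns to string and join them with a slash
tabela['month_year'] = (tabela['startPeriodoMonth'].astype(str)
                        + '/' + tabela['startPeriodoYear'].astype(str))
print(tabela['month_year'].tolist())
```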
I am trying to output a filtered list based on the input of the index.
In my case, I want to make the Location the index, and only show all the results whose location is 'Switzerland'. I am using jupyter-notebook
I have an xlsx file called Book1 containing [this][1], and I type this in:
import pandas as pd
from pandas import Series, DataFrame
from scipy import stats
substats=pd.read_excel('Book1.xlsx', index_col=1) #index_col=1 makes Location the index
I am stuck, but I am expecting [the output to be like this][2]
Notice that the second image index is not 4, 6, but instead 1, 2.
Can you help me with this?
[1]: https://i.stack.imgur.com/NIbKx.png
[2]: https://i.stack.imgur.com/whSEP.png
I believe this is an off-by-one error: pandas' index_col is 0-based, so index_col=1 selects the second column of the sheet, not the first. If Location is actually the first column in Book1.xlsx, use index_col=0 instead.
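Once Location is the index, selecting every row for one location is a plain .loc lookup. A minimal sketch with made-up rows, since the original spreadsheet isn't available here:

```python
import pandas as pd

# mock stand-in for Book1.xlsx, with Location as the index
substats = pd.DataFrame(
    {'Value': [10, 20, 30]},
    index=pd.Index(['Switzerland', 'Germany', 'Switzerland'],
                   name='Location'))

# .loc with a duplicated index label returns every matching row
swiss = substats.loc['Switzerland']
print(swiss)
```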
I have a Pandas dataframe in which one column contains JSON data (the JSON structure is simple: only one level, there is no nested data):
ID,Date,attributes
9001,2020-07-01T00:00:06Z,"{"State":"FL","Source":"Android","Request":"0.001"}"
9002,2020-07-01T00:00:33Z,"{"State":"NY","Source":"Android","Request":"0.001"}"
9003,2020-07-01T00:07:19Z,"{"State":"FL","Source":"ios","Request":"0.001"}"
9004,2020-07-01T00:11:30Z,"{"State":"NY","Source":"windows","Request":"0.001"}"
9005,2020-07-01T00:15:23Z,"{"State":"FL","Source":"ios","Request":"0.001"}"
I would like to normalize the JSON content in the attributes column so the JSON attributes become each a column in the dataframe.
ID,Date,attributes.State, attributes.Source, attributes.Request
9001,2020-07-01T00:00:06Z,FL,Android,0.001
9002,2020-07-01T00:00:33Z,NY,Android,0.001
9003,2020-07-01T00:07:19Z,FL,ios,0.001
9004,2020-07-01T00:11:30Z,NY,windows,0.001
9005,2020-07-01T00:15:23Z,FL,ios,0.001
I have been trying to use pandas' json_normalize, which requires a dictionary. So I figured I would convert the attributes column to a dictionary, but it does not quite work out as expected, for the dictionary has the form:
df.attributes.to_dict()
{0: '{"State":"FL","Source":"Android","Request":"0.001"}',
1: '{"State":"NY","Source":"Android","Request":"0.001"}',
2: '{"State":"FL","Source":"ios","Request":"0.001"}',
3: '{"State":"NY","Source":"windows","Request":"0.001"}',
4: '{"State":"FL","Source":"ios","Request":"0.001"}'}
And the normalization takes the key (0, 1, 2, ...) as the column name instead of the JSON keys.
I have the feeling that I am close but I can't quite work out how to do this exactly. Any idea is welcome.
Thank you!
json_normalize expects to work on an object (a dict), not a string, so parse the JSON first:
import json
import pandas as pd
df_final = pd.json_normalize(df.attributes.apply(json.loads))
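To attach the normalized columns back onto the original rows, join the result against the frame with the attributes column dropped. A sketch with a two-row mock of the question's data:

```python
import json
import pandas as pd

# mock of the dataframe from the question
df = pd.DataFrame({
    'ID': [9001, 9002],
    'attributes': ['{"State":"FL","Source":"Android","Request":"0.001"}',
                   '{"State":"NY","Source":"ios","Request":"0.001"}'],
})

# parse each JSON string, normalize to columns, and join back by row position
attrs = pd.json_normalize(df['attributes'].apply(json.loads))
df_final = df.drop(columns=['attributes']).join(attrs)
print(df_final)
```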
You shouldn’t need to convert to a dictionary first.
Try:
import pandas as pd
pd.json_normalize(df['attributes'])
I found a solution, but I am not overly happy with it; I reckon it is very inefficient.
import pandas as pd
import json
# Import full dataframe
df = pd.read_csv(r'D:/tmp/sample_simple.csv', parse_dates=['Date'])
# Create empty dataframe to hold the results of data conversion
df_attributes = pd.DataFrame()
# Loop through the data to fill the dataframe
for index in df.index:
    row_json = json.loads(df.attributes[index])
    normalized_row = pd.json_normalize(row_json)
    # DataFrame.append is deprecated; use pd.concat instead
    df_attributes = pd.concat([df_attributes, normalized_row], ignore_index=True)
# Reset the index of the attributes dataframe
df_attributes = df_attributes.reset_index(drop=True)
# Drop the original attributes column
df = df.drop(columns=['attributes'])
# Join the results
df_final = df.join(df_attributes)
# Show results
print(df_final)
print(df_final.info())
This gives me the expected result. However, as I said, there are several inefficiencies in it, starting with the DataFrame concatenation inside the for loop. According to the documentation, the best practice is to build a list and concatenate once, but I could not figure out how to do that while keeping the shape I wanted. I welcome all critiques and ideas.
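One way to follow that list-then-build advice: collect one parsed dict per row and construct the frame in a single call, avoiding repeated concatenation. A sketch with mock data standing in for the CSV:

```python
import json
import pandas as pd

# mock of the imported dataframe
df = pd.DataFrame({
    'ID': [9001, 9002],
    'attributes': ['{"State":"FL","Source":"Android","Request":"0.001"}',
                   '{"State":"NY","Source":"ios","Request":"0.001"}'],
})

# parse every row into a list of dicts, then build the frame once
rows = [json.loads(s) for s in df['attributes']]
df_attributes = pd.DataFrame(rows)

# replace the raw column with the expanded ones
df_final = df.drop(columns=['attributes']).join(df_attributes)
print(df_final)
```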
I am working from this dataset and I would like to combine yr_built and yr_renovated into one, preferably to yr_built, based on this: if the value in yr_renovated is bigger than 0, then I would like to have this value, otherwise the yr_built's value.
Can you please help me on this?
Thank you!
Here you go. You need pandas for the DataFrame, then create a new column with numpy's where: take 'yr_renovated' when it is greater than zero, and 'yr_built' otherwise.
import pandas as pd
import numpy as np
df = pd.read_csv('https://raw.githubusercontent.com/Jonasyao/Machine-Learning-Specialization-University-of-Washington-/master/Regression/Assignment_four/kc_house_data.csv', on_bad_lines='skip')  # error_bad_lines is deprecated
df = df[['yr_built', 'yr_renovated', 'date', 'bedrooms']]
df['MyYear'] = np.where(df['yr_renovated'] > 0, df['yr_renovated'], df['yr_built'])
df
I have the following dictionary in python that represents a From - To Distance Matrix.
graph = {'A':{'A':0,'B':6,'C':INF,'D':6,'E':7},
'B':{'A':INF,'B':0,'C':5,'D':INF,'E':INF},
'C':{'A':INF,'B':INF,'C':0,'D':9,'E':3},
'D':{'A':INF,'B':INF,'C':9,'D':0,'E':7},
'E':{'A':INF,'B':4,'C':INF,'D':INF,'E':0}
}
Is it possible to output this matrix to Excel or to a csv file so that it has the following format? I have looked into using csv.writer and csv.DictWriter but cannot produce the desired output.
You may create a pandas dataframe from that dict, then save to CSV or Excel:
import pandas as pd
df = pd.DataFrame(graph).T  # transpose so rows match the sheet above
df.to_csv('file.csv')
df.to_excel('file.xlsx')    # .xlsx; writing legacy .xls is no longer supported
Probably not the most minimal solution, but pandas handles this marvellously (and if you're doing data analysis of any kind, I can highly recommend pandas!).
Your data is already in a perfect format for bringing into a DataFrame:
INF = 'INF'
graph = {'A':{'A':0,'B':6,'C':INF,'D':6,'E':7},
'B':{'A':INF,'B':0,'C':5,'D':INF,'E':INF},
'C':{'A':INF,'B':INF,'C':0,'D':9,'E':3},
'D':{'A':INF,'B':INF,'C':9,'D':0,'E':7},
'E':{'A':INF,'B':4,'C':INF,'D':INF,'E':0}
}
import pandas as pd
pd.DataFrame(graph).to_csv('OUTPUT.csv')
but the output you want is this transposed, so:
pd.DataFrame(graph).T.to_csv('OUTPUT.csv')
where .T returns the transpose of the DataFrame.
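Equivalently, a small sketch: pd.DataFrame.from_dict with orient='index' builds the transposed frame directly, so each outer key of the dict becomes a row, matching the From-To layout (a two-node subset of the graph is used here for brevity):

```python
import pandas as pd

INF = 'INF'
graph = {'A': {'A': 0, 'B': 6},
         'B': {'A': INF, 'B': 0}}

# orient='index' makes each outer key a row, so no .T is needed
df = pd.DataFrame.from_dict(graph, orient='index')
print(df)
```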