I am iterating through a number of CSV files and want to append the mean temperatures to a blank CSV file. How do you create an empty CSV file with pandas?
import pandas as pd

for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
    MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
    with open('my_csv.csv', 'a') as f:
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)
So in the above code, how do I create my_csv.csv before the for loop?
Just a note: I know you can create a data frame and then save it to CSV, but I am interested in whether you can skip this step.
In terms of context, I have the following CSV files:
Each of which has the following structure:
The Day column goes up to 30 days in each file.
I would like to output a CSV file that looks like this:
But, obviously, including all the days for all the months.
My issue is that I don't know which months are included in each analysis, hence I wanted to use a for loop over a list that holds that information to access the relevant CSVs, calculate the mean temperatures and then save everything into one CSV.
Input as text:
Unnamed: 0 AirTemperature AirHumidity SoilTemperature SoilMoisture LightIntensity WindSpeed Year Month Day Hour Minute Second TimeStamp MonthCategorical TimeOfDay
6 6 18 84 17 41 40 4 2016 1 1 6 1 1 10106 January Day
7 7 20 88 22 92 31 0 2016 1 1 7 1 1 10107 January Day
8 8 23 1 22 59 3 0 2016 1 1 8 1 1 10108 January Day
9 9 23 3 22 72 41 4 2016 1 1 9 1 1 10109 January Day
10 10 24 63 23 83 85 0 2016 1 1 10 1 1 10110 January Day
11 11 29 73 27 50 1 4 2016 1 1 11 1 1 10111 January Day
Just open the file in write mode to create it.
with open('my_csv.csv', 'w'):
    pass
Anyway, I don't think you should be opening and closing the file so many times. You'd better open the file once and write to it several times:
import pandas as pd

with open('my_csv.csv', 'w') as f:
    for EachMonth in MonthsInAnalysis:
        TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
        MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)
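A self-contained sketch of the "open once, write several times" idea, assuming made-up monthly results in place of the real Day*.csv files. The names monthly_results and write_monthly_means are hypothetical; the header is written only for the first chunk, and a Month column is added so days from different months stay distinguishable.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the per-month results computed in the loop above
monthly_results = {
    1: pd.DataFrame({"Day": [1, 2], "MeanDailyAirTemperature": [20.5, 21.0]}),
    2: pd.DataFrame({"Day": [1, 2], "MeanDailyAirTemperature": [18.0, 19.5]}),
}


def write_monthly_means(path, results):
    """Open the output file once and write every month's rows into it."""
    # pandas recommends newline="" when handing it an open text file
    with open(path, "w", newline="") as f:
        for i, (month, frame) in enumerate(results.items()):
            frame = frame.assign(Month=month)  # remember which month each row came from
            # Write the column names only for the first chunk
            frame.to_csv(f, header=(i == 0), index=False)


out_path = os.path.join(tempfile.mkdtemp(), "my_csv.csv")
write_monthly_means(out_path, monthly_results)
print(pd.read_csv(out_path))
```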
Creating a blank CSV file is as simple as this:
import pandas as pd
pd.DataFrame({}).to_csv("filename.csv")
I would do it this way: first read all your CSV files (but only the columns that you really need) into one DF, then apply groupby(['Year','Month','Day']).mean() and save the resulting DF to a CSV file:
import glob
import pandas as pd
fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Year','Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Year','Month','Day']).mean().to_csv('my_csv.csv')
And if you want to ignore the year:
import glob
import pandas as pd
fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Month','Day']).mean().to_csv('my_csv.csv')
Some details:
(pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob(fmask))
is a generator expression that yields one data frame per matching CSV file, reading only the listed columns
pd.concat(...)
concatenates them into a single DF
df.groupby(['Year','Month','Day']).mean()
produces the wanted report as a data frame, which can then be saved to a new CSV file:
.to_csv('my_csv.csv')
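To see the glob + concat + groupby pipeline run end to end, here is a sketch that first writes two small sample files (hypothetical data, not the asker's) into a temporary directory laid out like MonthlyDataSplit/Day/Day<N>.csv. The only departure from the answer is selecting ['AirTemperature'] before .mean(), which gives the same numbers here because only those columns are read.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample data mimicking the MonthlyDataSplit/Day/Day<N>.csv files
data_dir = tempfile.mkdtemp()
samples = {
    1: {"Year": [2016] * 4, "Month": [1] * 4, "Day": [1, 1, 2, 2],
        "AirTemperature": [18, 20, 23, 25], "AirHumidity": [84, 88, 1, 3]},
    2: {"Year": [2016] * 2, "Month": [2] * 2, "Day": [1, 1],
        "AirTemperature": [10, 14], "AirHumidity": [50, 60]},
}
for month, columns in samples.items():
    # default index=True adds an unnamed index column, like "Unnamed: 0" in the question
    pd.DataFrame(columns).to_csv(os.path.join(data_dir, f"Day{month}.csv"))

fmask = os.path.join(data_dir, "Day*.csv")
# Read only the needed columns from every matching file and stack them
df = pd.concat(
    (pd.read_csv(f, usecols=["Year", "Month", "Day", "AirTemperature"])
     for f in sorted(glob.glob(fmask))),
    ignore_index=True,
)
# One row per (Year, Month, Day) holding the mean air temperature
result = df.groupby(["Year", "Month", "Day"])["AirTemperature"].mean()
result.to_csv(os.path.join(data_dir, "my_csv.csv"))
print(result)
```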
The problem is a little unclear, but assuming you have to iterate month by month and apply the groupby as stated, just use:
# Before the loop
dflist=[]
Then in each loop do something like:
dflist.append(MeanDailyTemperaturesForCurrentMonth)
Then at the end:
final_df = pd.concat(dflist, ignore_index=True)
This stacks the monthly frames on top of each other into one dataframe.
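Here is the list-then-concat approach as a runnable sketch. The raw dictionary is hypothetical data standing in for the pd.read_csv call on each month's file; everything else mirrors the question's loop. The extra Month column is an addition so that Day 1 of January and Day 1 of February remain distinguishable after stacking.

```python
import pandas as pd

# Hypothetical per-month data standing in for pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
raw = {
    1: pd.DataFrame({"Day": [1, 1, 2], "AirTemperature": [18, 20, 23]}),
    2: pd.DataFrame({"Day": [1, 1], "AirTemperature": [10, 14]}),
}
MonthsInAnalysis = [1, 2]

dflist = []  # collect one small frame per month
for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = raw[EachMonth]
    MeanDailyTemperaturesForCurrentMonth = (
        TheCurrentMonth.groupby("Day")["AirTemperature"]
        .mean()
        .reset_index(name="MeanDailyAirTemperature")
    )
    # Tag rows with their month so days from different months stay distinguishable
    MeanDailyTemperaturesForCurrentMonth["Month"] = EachMonth
    dflist.append(MeanDailyTemperaturesForCurrentMonth)

# Pass the list itself (not [dflist]); the default axis=0 stacks the months vertically
final_df = pd.concat(dflist, ignore_index=True)
print(final_df)
```

Collecting frames in a list and calling pd.concat once is generally cheaper than growing a DataFrame inside the loop, and the result can then be written with a single final_df.to_csv('my_csv.csv', index=False).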
Look at:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html
http://pandas.pydata.org/pandas-docs/stable/merging.html
You could also do this to create an empty CSV that contains only the column headers, without an index column:
import pandas as pd
pd.DataFrame(columns=["Col1","Col2","Col3"]).to_csv("filename.csv", index=False)
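A short sketch of what that header-only file looks like and how rows can be appended to it later without repeating the header. The temporary path and the sample row [1, 2, 3] are hypothetical; mode="a" and header=False are regular DataFrame.to_csv parameters.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "filename.csv")
# Write only the header line: no rows, no index column
pd.DataFrame(columns=["Col1", "Col2", "Col3"]).to_csv(path, index=False)

with open(path) as f:
    print(repr(f.read()))  # the file holds just the header row

# Later rows can be appended without repeating the header
pd.DataFrame([[1, 2, 3]], columns=["Col1", "Col2", "Col3"]).to_csv(
    path, mode="a", header=False, index=False
)
print(pd.read_csv(path))
```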
Name
4 A-------
5 ---
6 Father Name
7 ------
8 Gender
9 Country of
10 M
11 Oman
12 Identity Number -n?
13 Date of Birth
14 ------------9
15 28.10.1995
16 ----
17 Date of Issue
18 Date of Expiry
To extract a specific column from a CSV file, read it with pandas and then use the iloc indexer to select the column by position.
import pandas as pd
dataset = pd.read_csv("path_of_csv")
# Now once you've read the original csv file you can slice along the columns
# to get the desired column (Example: Name, 1st column)
Name = dataset.iloc[:,0]
Or select the column by its header name (label-based selection like this works across pandas versions, and definitely with pandas 1.3.5):
dataset = pd.read_csv("path_of_csv")
Name = dataset['Name']
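Both selections return the same Series, as this small sketch shows. The CSV text is hypothetical and read through io.StringIO in place of "path_of_csv".

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for "path_of_csv"
csv_text = "Name,Gender\nAlice,F\nBob,M\n"
dataset = pd.read_csv(io.StringIO(csv_text))

by_position = dataset.iloc[:, 0]  # first column, selected by position
by_label = dataset["Name"]        # same column, selected by header name
print(by_position.equals(by_label))
```

Positional selection keeps working if the header text changes, while label selection keeps working if the column order changes; pick whichever is more stable for your files.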
I have a dictionary with multiple values per key. For example:
data = {u'Days': [u'Monday', u'Tuesday', u'Wednesday', u'Thursday'], u'Temp_value': [54, 56, 57, 45], u'Level_value': [30, 34, 35, 36]}  # and so on...
I want to export this data to Excel in the format below:
Column 1    Column 2     Column 3     ...
Days        Temp_value   Level_value
Monday      54           30
Tuesday     56           34
Wednesday   57           35
Thursday    45           36
How can I do that?
Use pandas:
import pandas as pd
df = pd.DataFrame(your_dict)
df.to_excel('your_file.xlsx', index=False)
I am trying to create a plot from a CSV file. In the CSV file, the first column is a timestamp and the second through sixth columns are different parties. I want to create a graph where the x axis is the year (e.g. 2004) and the y axis shows the parties' values in percent.
The csv file looks like:
date,CSU/CDU,SPD,Gruene,FDP,Linke
891468000.0,34,44,6,5,6
891986400.0,34,44,6,5,6
892677600.0,35,43,6,5,5
894405600.0,32,46,6,6,5
895010400.0,33,46,5,5,5
I have tried the code below:
import numpy as np
import matplotlib.pyplot as plt

with open('polldata.csv') as f:
    names = f.readline().strip().split(',')
    data = np.loadtxt(f, delimiter=',')
cols = data.shape[1]
for n in range(1, cols):
    plt.plot(data[:, 0], data[:, n], label=names[n])
plt.xlabel('year', fontsize=14)
plt.ylabel('parties', fontsize=14)
plt.show()
From the first column of my CSV file, I want to convert the timestamp to a year. Also, I need to display the data as a bar chart so that the parties can easily be told apart by color.
I want the graph to look similar to the 5th one on the page below:
(https://moderndata.plot.ly/elections-analysis-in-r-python-and-ggplot2-9-charts-from-4-countries/)
Thanks in advance!
You can use the CSV reader from pandas. The documentation is here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
It looks like this:
import pandas as pd
import matplotlib.pyplot as plt
import datetime

df = pd.read_csv("polldata.csv", delimiter=',')
# Convert each Unix timestamp to its (UTC) year, as a string
df['date'] = df['date'].apply(lambda ts: datetime.datetime.utcfromtimestamp(ts).strftime('%Y'))
print(df)
cols = df.columns
# Plot every party column (column 0 is the date) against the year
for n in range(1, len(cols)):
    plt.plot(df['date'], df[cols[n]], label=cols[n])
plt.xlabel('year', fontsize=14)
plt.ylabel('parties', fontsize=14)
plt.legend()
plt.show()
It will print:
date CSU/CDU SPD Gruene FDP Linke
0 1998 34 44 6 5 6
1 1998 34 44 6 5 6
2 1998 35 43 6 5 5
3 1998 32 46 6 6 5
4 1998 33 46 5 5 5
Does that get you started?
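For the bar chart the question asks about, one option (not the answer's own code) is to convert the timestamps with pd.to_datetime(..., unit="s"), average each party per year, and let DataFrame.plot(kind="bar") draw one colored bar per party. The sample rows are the ones from the question; averaging per year is an assumption about how multiple polls in one year should be combined. Matplotlib is imported only inside the hypothetical plot_bars helper, so the aggregation can be checked without it.

```python
import io

import pandas as pd

# The sample rows from the question, inlined so the sketch is self-contained
csv_text = """date,CSU/CDU,SPD,Gruene,FDP,Linke
891468000.0,34,44,6,5,6
891986400.0,34,44,6,5,6
892677600.0,35,43,6,5,5
894405600.0,32,46,6,6,5
895010400.0,33,46,5,5,5
"""


def yearly_means(df):
    """Convert the Unix timestamps to years and average each party per year."""
    years = pd.to_datetime(df["date"], unit="s").dt.year
    return df.drop(columns="date").groupby(years).mean()


def plot_bars(yearly):
    # Imported here so yearly_means() works even without matplotlib installed
    import matplotlib.pyplot as plt

    ax = yearly.plot(kind="bar")  # one colored bar per party within each year
    ax.set_xlabel("year", fontsize=14)
    ax.set_ylabel("percent", fontsize=14)
    plt.show()


df = pd.read_csv(io.StringIO(csv_text))
yearly = yearly_means(df)
print(yearly)
```

Calling plot_bars(yearly) then draws the grouped bars; with only the five sample rows everything falls into 1998, so the full poll file is needed to get one group per year.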
I am using the Google Finance API to pull data into a pandas dataframe. The index is a number, and I would like to change it to a date that includes hours and minutes. Any ideas? Thanks!
import pandas as pd
api_call = 'http://finance.google.com/finance/getprices?q=SPY&i=300&p=1d&f=d,o,h,l,c,'
df = pd.read_csv(api_call, skiprows=8, header=None)
df.columns = ['Record', 'Open', 'High', 'Low', 'Close']
df['Record'] = df.index
Record Open High Low Close
0 0 268.19 268.48 268.18 268.46
1 1 268.14 268.23 267.98 268.19
2 2 268.11 268.19 268.06 268.13
3 3 268.05 268.16 267.96 268.11
4 4 267.93 268.1 267.9 268.06
5 5 267.98 268.01 267.89 267.92
6 6 267.95 267.99 267.86 267.97
7 7 267.88 267.95 267.85 267.94
8 8 267.78 267.9 267.78 267.88
9 9 267.94 267.96 267.68 267.78
10 10 267.91 267.95 267.87 267.94
It doesn't look like pandas supports reading the Google API format directly. If you look at the raw response from the API, it looks like this:
EXCHANGE%3DNYSEARCA
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=300
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN
DATA=
TIMEZONE_OFFSET=-300
a1514557800,268.51,268.55,268.48,268.55
1,268.19,268.48,268.18,268.46
2,268.14,268.23,267.98,268.19
3,268.11,268.19,268.06,268.13
That first datetime value (the one with the leading a) is a Unix timestamp. Each subsequent "datetime" is an offset from that timestamp counted in INTERVAL units (300 seconds here), so each row is 300 seconds after the previous one. You need to write something that parses the header information and uses it to build the timestamps.
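A sketch of such a parser, working on the raw text shown above rather than fetching the (long retired) endpoint. parse_getprices is a hypothetical helper; it assumes TIMEZONE_OFFSET is given in minutes and that every data row after an "a" row is an offset from that row. It also takes the column order from the COLUMNS line, which lists CLOSE before OPEN, unlike the order assigned in the question's code.

```python
import pandas as pd

# The raw response shown above; in practice this would be the API's text body
raw = """EXCHANGE%3DNYSEARCA
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=300
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN
DATA=
TIMEZONE_OFFSET=-300
a1514557800,268.51,268.55,268.48,268.55
1,268.19,268.48,268.18,268.46
2,268.14,268.23,267.98,268.19
3,268.11,268.19,268.06,268.13
"""


def parse_getprices(text):
    """Turn a getprices-style response into a DataFrame indexed by timestamp."""
    header, rows = {}, []
    for line in text.splitlines():
        if not line:
            continue
        if line[0] == "a" or line[0].isdigit():
            rows.append(line.split(","))  # a data row
        elif "=" in line:
            key, _, value = line.partition("=")
            header[key] = value  # e.g. INTERVAL -> "300"
    interval = int(header["INTERVAL"])
    records, base = [], None
    for fields in rows:
        if fields[0].startswith("a"):
            base = int(fields[0][1:])  # absolute Unix timestamp
            ts = base
        else:
            ts = base + int(fields[0]) * interval  # offset counted in INTERVAL units
        records.append([ts] + [float(v) for v in fields[1:]])
    df = pd.DataFrame(records, columns=header["COLUMNS"].split(","))
    # Assumption: TIMEZONE_OFFSET is in minutes (-300 -> UTC-5)
    offset = pd.Timedelta(minutes=int(header.get("TIMEZONE_OFFSET", "0")))
    df["DATE"] = pd.to_datetime(df["DATE"], unit="s") + offset
    return df.set_index("DATE")


prices = parse_getprices(raw)
print(prices)
```

With the sample above, the first row lands at 2017-12-29 09:30, which agrees with MARKET_OPEN_MINUTE=570 (9:30 local time).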
I am trying to create a CSV download, but the downloaded file comes out in a different format:
def csv_download(request):
    import csv
    import calendar
    from datetime import *
    from dateutil.relativedelta import relativedelta
    now = datetime.today()
    month = datetime.today().month
    d = calendar.mdays[month]
    # Create the HttpResponse object with the appropriate CSV header.
    response = HttpResponse(mimetype='text/csv')
    response['Content-Disposition'] = 'attachment; filename=somefilename.csv'
    m = Product.objects.filter(product_sellar='jhon')
    writer = csv.writer(response)
    writer.writerow(['S.No'])
    writer.writerow(['product_name'])
    writer.writerow(['product_buyer'])
    for i in xrange(1, d):
        writer.writerow(str(i) + "\t")
    for f in m:
        writer.writerow([f.product_name, f.porudct_buyer])
    return response
Output of the above code:
product_name
1
2
4
5
6
7
8
9
1|10
1|1
1|2
.
.
.
2|7
mgm | x_name
wge | y_name
I am looking for output like this:
s.no product_name product_buyer 1 2 3 4 5 6 7 8 9 10 .....27 total
1 mgm x_name 2 3 8 13
2 wge y_name 4 9 13
Can you please help me with the above CSV download?
If possible, can you also tell me how to sum up each user's total at the end?
Example:
We have a selling table into which each seller's info is inserted every day.
The table data looks like this:
S.no product_name product_seller sold Date
1 paint jhon 5 2011-03-01
2 paint simth 6 2011-03-02
I have created a table that displays the format below, and I am trying to create a CSV download of it:
s.no prod_name prod_sellar 1-03-2011 2-03-2011 3-03-2011 4-03-2011 total
1 paint john 10 15 0 0 25
2 paint smith 2 6 2 0 10
Please read the csv module documentation, particularly the writer object API.
You'll notice that writerow takes a single list whose elements map, in order, to the fields of your delimited line. So to get the desired output, you would need to pass in one list like so:
writer = csv.writer(response)
# one column per day of the month, 1..d
writer.writerow(['S.No', 'product_name', 'product_buyer'] + list(range(1, d + 1)) + ['total'])
This will give you your desired header output.
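Extending that to the data rows and the per-user total the question asks for, here is a Python 3 sketch that writes to io.StringIO instead of a Django HttpResponse. The sales tuples are hypothetical, modeled on the john/smith figures in the question; quantities are summed per (product, seller) and day, and each row ends with sum() over its day columns.

```python
import calendar
import csv
import io

# Hypothetical sales rows: (product_name, seller, day_of_month, quantity)
sales = [
    ("paint", "john", 1, 10), ("paint", "john", 2, 15),
    ("paint", "smith", 1, 2), ("paint", "smith", 2, 6), ("paint", "smith", 3, 2),
]
year, month = 2011, 3
days_in_month = calendar.monthrange(year, month)[1]  # 31 for March

# Sum quantities per (product, seller) and day
totals = {}
for product, seller, day, qty in sales:
    per_day = totals.setdefault((product, seller), [0] * days_in_month)
    per_day[day - 1] += qty

out = io.StringIO()  # in the Django view this would be the HttpResponse
writer = csv.writer(out)
# One list per row: each element becomes one column
writer.writerow(["S.No", "product_name", "product_seller"]
                + list(range(1, days_in_month + 1)) + ["total"])
for sno, ((product, seller), per_day) in enumerate(totals.items(), start=1):
    writer.writerow([sno, product, seller] + per_day + [sum(per_day)])
print(out.getvalue())
```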
You might want to explore the csv.DictWriter class if you only want to populate some parts of each row. It's much cleaner. This is how you would do it:
writer = csv.DictWriter(response,
                        ['S.No', 'product_name', 'product_buyer'] + list(range(1, d + 1)) + ['total'])
writer.writeheader()  # DictWriter does not write the header row on its own
Your write commands would then look like this:
for f in m:
    writer.writerow({'product_name': f.product_name, 'product_buyer': f.product_buyer})
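A standalone DictWriter sketch, again writing to io.StringIO rather than the HttpResponse. The row dictionaries are hypothetical; restval=0 fills every day column a row leaves out (the default fill would be an empty string), and the total is computed from the day keys before writing.

```python
import csv
import io

days = list(range(1, 32))  # day columns for a 31-day month
fieldnames = ["S.No", "product_name", "product_buyer"] + days + ["total"]

# Hypothetical rows; only some keys are filled in per row
rows = [
    {"product_name": "mgm", "product_buyer": "x_name", 2: 3, 5: 8},
    {"product_name": "wge", "product_buyer": "y_name", 4: 9},
]

out = io.StringIO()
# restval fills every column a row dict does not mention
writer = csv.DictWriter(out, fieldnames=fieldnames, restval=0)
writer.writeheader()
for sno, row in enumerate(rows, start=1):
    day_total = sum(row.get(d, 0) for d in days)
    writer.writerow({**row, "S.No": sno, "total": day_total})
print(out.getvalue())
```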