Name
A-------
---
Father Name
------
Gender
Country of
M
Oman
Identity Number -n?
Date of Birth
------------9
28.10.1995
----
Date of Issue
Date of Expiry
To extract a specific column from a CSV file, you can use the iloc indexer from the pandas library after reading the CSV file.
import pandas as pd
dataset = pd.read_csv("path_of_csv")
# Once you've read the original CSV file, you can slice along the columns
# to get the desired column (example: Name, the 1st column)
Name = dataset.iloc[:, 0]
Alternatively, select the column by its header label. This works regardless of pandas version
(confirmed with pandas 1.3.5):
import pandas as pd
dataset = pd.read_csv("path_of_csv")
Name = dataset['Name']
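A minimal self-contained sketch of both selections. The CSV contents are hypothetical and are read through io.StringIO in place of "path_of_csv".

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for "path_of_csv".
csv_text = "Name,Gender,Country\nAlice,F,Oman\nBob,M,Oman\n"
dataset = pd.read_csv(io.StringIO(csv_text))

by_position = dataset.iloc[:, 0]  # first column, selected by integer position
by_label = dataset["Name"]        # same column, selected by its header label

print(by_position.tolist())
print(by_position.equals(by_label))
```

Positional selection keeps working if the header changes; label selection keeps working if the column order changes.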
I'm trying to solve a trading problem, but I've rephrased it here in a different way.
I have an array of countries:
countries = {'country_name': ['France','Germany','Italy','Japan']}
For each country, I have a CSV stored on my laptop. Each CSV has 3 columns [Date, Birth, Death].
I loop over the array, read each CSV and create a DataFrame object.
import pandas as pd
countries = {'country_name': ['France','Germany','Italy','Japan']}
countries = pd.DataFrame(countries)
for country in countries['country_name']:
    country_file_name = country + '.csv'
    vars()[country] = pd.read_csv(country_file_name)
    ## Here I want to append country to each column except index
When I run France.head(), I get this output for France:
index Birth Deaths
2020-01-01 9 10
2002-01-02 5 12
...
2002-12-10 14 10
But I want the output for France to be:
index France_Birth France_Deaths
2020-01-01 9 10
2002-01-02 5 12
....
2002-12-10 14 10
Note: I do not want to do France.columns = ['France_Birth','France_Deaths'] because it would take me days to do that for every CSV.
I am using Jupyter Notebook here:
https://colab.research.google.com/drive/1aOg3eOhsigbewAhRwQE1QsxGDKzEPyW5?usp=sharing
I'm not sure whether there is a way to do this or whether I have to change my approach.
This can be achieved with the rename method of pandas.DataFrame:
import pandas as pd
countries = {'country_name': ['France','Germany','Italy','Japan']}
countries = pd.DataFrame(countries)
for country in countries['country_name']:
    country_file_name = country + '.csv'
    # Prefix every column label with the country name, e.g. Birth -> France_Birth
    vars()[country] = pd.read_csv(country_file_name).rename(columns=lambda s: country + "_" + s)
See the pandas documentation for DataFrame.rename for details.
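A sketch of the same idea using DataFrame.add_prefix, which prefixes column labels only. Two things here are assumptions rather than part of the answer above: the frames are kept in a dict instead of being created with vars(), and read_csv uses index_col="Date" so the dates become the index and are not renamed (the question asks for every column except the index to be prefixed). The CSV text is hypothetical and stands in for each <country>.csv file.

```python
import io

import pandas as pd

# Hypothetical file contents; in the question each comes from "<country>.csv".
csv_files = {
    "France": "Date,Birth,Deaths\n2020-01-01,9,10\n2020-01-02,5,12\n",
    "Germany": "Date,Birth,Deaths\n2020-01-01,7,8\n2020-01-02,6,9\n",
}

frames = {}
for country, text in csv_files.items():
    # index_col="Date" keeps the dates as the index, so only data columns get the prefix.
    frame = pd.read_csv(io.StringIO(text), index_col="Date")
    frames[country] = frame.add_prefix(country + "_")

print(frames["France"].head())
```

With real files, the read becomes pd.read_csv(country + ".csv", index_col="Date"), and each table is then frames["France"], frames["Germany"], and so on.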
I have a CSV file with two columns: the first is a date column in the format 01/01/2020, and the second is a number for each month representing that month's sales volume. The dates range from 2004 to 2019, and my task is to create a 12-bar chart, with each bar representing the average sales volume for that month across every year's data. I attempted to use a groupby function but got an error about not having numeric types to aggregate. I am very new to Python, so apologies for the beginner question. I have posted my code so far below. Thanks in advance for any help with this :)
# -*- coding: utf-8 -*-
import csv
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
file = "GlasgowSalesVolume.csv"
data = pd.read_csv(file)
typemean = (data.groupby(['Date', 'SalesVolume'], as_index=False).mean().groupby('Date')
['SalesVolume'].mean())
Output:
DataError: No numeric types to aggregate
I prepared a DataFrame limited to just 2 years and 3 months:
Date Sales
0 01/01/2019 3
1 01/02/2019 4
2 01/03/2019 8
3 01/01/2020 10
4 01/02/2020 20
5 01/03/2020 30
For now, the Date column is of string type, so the first step is to convert it to datetime64:
df.Date = pd.to_datetime(df.Date, dayfirst=True)
Now to compute your result, run:
result = df.groupby(df.Date.dt.month).Sales.mean()
The result is a Series containing:
Date
1 6.5
2 12.0
3 19.0
Name: Sales, dtype: float64
The index is the month number (1 through 12), and the value is the mean for the
respective month across all years.
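A self-contained sketch of the whole pipeline, using the same six sample rows and the question's column name SalesVolume. Two details are additions: reindex(range(1, 13)) guarantees one entry per calendar month (months with no data become NaN, so the chart still has 12 bars), and the pd.to_numeric call reflects an assumption about the original DataError, which typically appears when the value column was read as text.

```python
import pandas as pd

# Hypothetical sample mirroring the rows above; sales are stored as text on purpose
# (an assumption about why the original groupby found no numeric data).
data = pd.DataFrame({
    "Date": ["01/01/2019", "01/02/2019", "01/03/2019",
             "01/01/2020", "01/02/2020", "01/03/2020"],
    "SalesVolume": ["3", "4", "8", "10", "20", "30"],
})

# Parse dd/mm/yyyy dates and force the sales column to numbers.
data["Date"] = pd.to_datetime(data["Date"], format="%d/%m/%Y")
data["SalesVolume"] = pd.to_numeric(data["SalesVolume"], errors="coerce")

# Mean per calendar month across all years, with all 12 months present.
monthly = data.groupby(data["Date"].dt.month)["SalesVolume"].mean()
monthly = monthly.reindex(range(1, 13))
monthly.index.name = "Month"
print(monthly)
```

Calling monthly.plot(kind="bar") then draws the 12-bar chart with matplotlib.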
The value 18F-AV-1451-A07 refers to another sheet called "CONTENT" (column B, row 3).
I loaded the DataFrame with this code:
pd.read_excel('data/A07.xls', sheet_name='DM', skiprows=12, skipfooter=2)
I'm getting a null value in the "Conversion Definition" column instead of "18F-AV-1451-A07".
How can I get that value into my DataFrame without hardcoding it?
First, credit where it's due: I didn't actually solve this myself; I got help from user U9-Forwrad. To do this you need:
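import pandas as pd
xlsx = pd.ExcelFile('Sample.xlsx')
# Read CONTENT without a header row so the cells keep their raw positions
df1 = pd.read_excel(xlsx, 'CONTENT', header=None)
df2 = pd.read_excel(xlsx, 'Sheet2')
# Marker labels: column 0, falling back to column 1 where column 0 is empty
boolean = df2['Class'].isin(df1[0].fillna(df1[1]).dropna())
# Row positions whose Class matches one of the markers
idxs = boolean.index[boolean]
# Print everything from the first marker row through the second (inclusive)
print(df2.iloc[idxs[0]:idxs[1]+1])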
Which gives you
Day Month Class
1 tuesday Feb CM
2 Wednesday Mar NaN
3 Thursday Apr NaN
4 Friday May NaN
5 Saturday Jun NaN
6 Sunday Jul DM
Which I think is what you are looking for.
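Note: You may need to convert the file to xlsx first. Older versions of pandas cannot read ODS files; since pandas 0.25, read_excel can read them with engine='odf' (this requires the odfpy package).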
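The same marker-slicing logic can be tried without an Excel file. This sketch assumes two small hypothetical DataFrames standing in for the CONTENT sheet and Sheet2; it uses .loc, whose label slices include the end row, instead of iloc with +1 (equivalent here because both frames have a default 0-based index).

```python
import pandas as pd

# Hypothetical stand-in for CONTENT read with header=None:
# marker labels may sit in column 0 or column 1.
content = pd.DataFrame({0: ["CM", None, None], 1: [None, "DM", None]})

# Hypothetical stand-in for Sheet2.
sheet2 = pd.DataFrame({
    "Day": ["Monday", "tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday", "Monday"],
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug"],
    "Class": [None, "CM", None, None, None, None, "DM", None],
})

# Collect the marker labels: column 0, falling back to column 1, blanks dropped.
markers = content[0].fillna(content[1]).dropna()

# Index labels of the rows whose Class is one of the markers.
hits = sheet2.index[sheet2["Class"].isin(markers)]

# Keep everything from the first marker row through the second, inclusive.
block = sheet2.loc[hits[0]:hits[1]]
print(block)
```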
I have data that has been altered by Excel's formatting. When a value is a number containing a dash, Excel automatically converts it to a date.
For example, 1-1 changes to 01-Jan and 25-2 changes to 25-Feb in Excel.
Other values, such as 1A and 1001, are intact. But when I load the data in Spyder, the converted values change format again, this time into a datetime type.
In Excel, the data looks like this:
Name ID Value
Hello 1A 22
Hi 01-Jan 20
What 02-Jan 12
Is 1001 10
Up 25-Mar 11
In Python, the data loads as a pandas DataFrame, with the current year (2019) added, when I use this code:
import pandas as pd
FAC_sheet = pd.read_excel('data', dtype=str)
Name ID Value
Hello 1A 22
Hi 2019-01-01 00:00:00 20
What 2019-01-02 00:00:00 12
Is 1001 10
Up 2019-03-25 00:00:00 11
Is there a way to change only the strangely date-formatted values and keep the rest intact? The desired output is:
Name ID Value
Hello 1A 22
Hi 1-1 20
What 1-2 12
Is 1001 10
Up 3-25 11
You can try the following to override pandas' automatic date conversion (replace 'Date' with your column name, e.g. 'ID'):
pandas.read_excel(xlsx, sheet, converters={'Date': str})
From the docs (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html):
converters : dict, optional
Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
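If the cells are already stored as dates in the workbook, reading them as text may still give strings like 2019-01-01 00:00:00. This is a post-processing sketch that turns those values back into month-day form and leaves everything else untouched; the sample frame is hypothetical, built to mirror the FAC_sheet output in the question, and undo_excel_date is a hypothetical helper name.

```python
import datetime as dt

import pandas as pd

# Hypothetical frame mirroring the question's FAC_sheet: converted cells arrive
# either as "YYYY-MM-DD 00:00:00" strings (dtype=str) or as datetime objects.
fac_sheet = pd.DataFrame({
    "Name": ["Hello", "Hi", "What", "Is", "Up"],
    "ID": ["1A", "2019-01-01 00:00:00", dt.datetime(2019, 1, 2), "1001",
           "2019-03-25 00:00:00"],
    "Value": [22, 20, 12, 10, 11],
})


def undo_excel_date(value):
    """Turn a date produced by Excel back into 'month-day'; keep anything else as-is."""
    if isinstance(value, dt.datetime):  # also covers pandas Timestamps
        return f"{value.month}-{value.day}"
    try:
        parsed = dt.datetime.strptime(str(value), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return value  # e.g. "1A" or "1001": not a converted date
    return f"{parsed.month}-{parsed.day}"


fac_sheet["ID"] = fac_sheet["ID"].map(undo_excel_date)
print(fac_sheet)
```

The original day-month versus month-day order cannot be recovered from the dates themselves; this sketch emits month-day to match the desired output above.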
I am iterating through a number of CSV files and want to append the mean temperatures to a blank CSV file. How do you create an empty CSV file with pandas?
import pandas as pd
for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
    MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
    with open('my_csv.csv', 'a') as f:
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)
So in the above code, how do I create my_csv.csv before the for loop?
Just a note: I know you can create a DataFrame and then save it to CSV, but I am interested in whether you can skip that step.
In terms of context, I have the following CSV files:
Each of which has the following structure:
The Day column runs up to 30 days in each file.
I would like to output a CSV file that looks like this:
But it should obviously include all the days for all the months.
My issue is that I don't know in advance which months each analysis includes, so I wanted to use a for loop over a list holding that information to open the relevant CSVs, calculate the mean temperature, and save it all into one CSV.
Input as text:
Unnamed: 0 AirTemperature AirHumidity SoilTemperature SoilMoisture LightIntensity WindSpeed Year Month Day Hour Minute Second TimeStamp MonthCategorical TimeOfDay
6 6 18 84 17 41 40 4 2016 1 1 6 1 1 10106 January Day
7 7 20 88 22 92 31 0 2016 1 1 7 1 1 10107 January Day
8 8 23 1 22 59 3 0 2016 1 1 8 1 1 10108 January Day
9 9 23 3 22 72 41 4 2016 1 1 9 1 1 10109 January Day
10 10 24 63 23 83 85 0 2016 1 1 10 1 1 10110 January Day
11 11 29 73 27 50 1 4 2016 1 1 11 1 1 10111 January Day
Just open the file in write mode to create it:
with open('my_csv.csv', 'w'):
    pass
Anyway, I don't think you should open and close the file so many times. It's better to open the file once and write to it several times:
import pandas as pd
with open('my_csv.csv', 'w') as f:
    for EachMonth in MonthsInAnalysis:
        TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
        MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)
Creating a blank CSV file is as simple as this:
import pandas as pd
pd.DataFrame({}).to_csv("filename.csv")
I would do it this way: first read all your CSV files (only the columns you really need) into one DataFrame, then run groupby(['Year','Month','Day']).mean() and save the resulting DataFrame to a CSV file:
import glob
import pandas as pd
fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Year','Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Year','Month','Day']).mean().to_csv('my_csv.csv')
And if you want to ignore the year:
import glob
import pandas as pd
fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Month','Day']).mean().to_csv('my_csv.csv')
Some details:
(pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob('*.csv'))
is a generator expression that yields one DataFrame per CSV file
pd.concat(...)
concatenates them into a single DataFrame
df.groupby(['Year','Month','Day']).mean()
produces the wanted report as a DataFrame, which can be saved to a new CSV file:
.to_csv('my_csv.csv')
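To try the pipeline without the real MonthlyDataSplit folder, here is a sketch that first writes two tiny hypothetical files (Day1.csv, Day2.csv) into a temporary directory, then runs the same glob, concat and groupby steps. Selecting ["AirTemperature"] before .mean() is a small change so only the temperature column is averaged; ignore_index=True just gives the stacked frame a clean index.

```python
import glob
import os
import tempfile

import pandas as pd

# Throwaway folder with two tiny monthly files (hypothetical data).
workdir = tempfile.mkdtemp()
samples = {
    "Day1.csv": {"Year": [2016] * 4, "Month": [1] * 4, "Day": [1, 1, 2, 2],
                 "AirTemperature": [18, 20, 23, 25], "AirHumidity": [84, 88, 1, 3]},
    "Day2.csv": {"Year": [2016] * 2, "Month": [2] * 2, "Day": [1, 1],
                 "AirTemperature": [10, 14], "AirHumidity": [50, 60]},
}
for name, columns in samples.items():
    pd.DataFrame(columns).to_csv(os.path.join(workdir, name), index=False)

fmask = os.path.join(workdir, "Day*.csv")
# Read only the needed columns from every file and stack them into one DataFrame.
df = pd.concat(
    (pd.read_csv(f, usecols=["Year", "Month", "Day", "AirTemperature"])
     for f in sorted(glob.glob(fmask))),
    ignore_index=True,
)

# One row per calendar day with its mean air temperature, written to CSV.
daily = df.groupby(["Year", "Month", "Day"])["AirTemperature"].mean()
out_path = os.path.join(workdir, "my_csv.csv")
daily.to_csv(out_path)
print(pd.read_csv(out_path))
```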
The problem is a little unclear, but assuming you have to iterate month by month and apply the groupby as stated, just use:
#Before loops
dflist=[]
Then in each loop do something like:
dflist.append(MeanDailyTemperaturesForCurrentMonth)
Then at the end:
final_df = pd.concat(dflist, ignore_index=True)
This stacks all the monthly frames into one DataFrame.
Look at:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html
http://pandas.pydata.org/pandas-docs/stable/merging.html
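A runnable sketch of the collect-then-concat pattern. The per-month frames are hypothetical in-memory stand-ins for the Day%s.csv files, and grouping by both Month and Day (rather than Day alone) is a choice made here so rows from different months stay distinguishable after stacking.

```python
import pandas as pd

# Hypothetical per-month raw data standing in for the Day%s.csv files.
raw_months = {
    1: pd.DataFrame({"Month": [1, 1, 1], "Day": [1, 1, 2],
                     "AirTemperature": [18, 20, 23]}),
    2: pd.DataFrame({"Month": [2, 2], "Day": [1, 1],
                     "AirTemperature": [10, 14]}),
}

dflist = []  # one summary frame per month
for month, frame in raw_months.items():
    mean_daily = (
        frame.groupby(["Month", "Day"])["AirTemperature"]
        .mean()
        .reset_index(name="MeanDailyAirTemperature")
    )
    dflist.append(mean_daily)

# Stack the monthly summaries vertically into one table.
final_df = pd.concat(dflist, ignore_index=True)
print(final_df)
```

final_df.to_csv("my_csv.csv", index=False) would then write everything in a single call.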
You could also do this to create an empty CSV with column headers and without an index column:
import pandas as pd
pd.DataFrame(columns=["Col1","Col2","Col3"]).to_csv("filename.csv", index=False)
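Combining this with the question's loop: a sketch that creates the CSV once with only a header row, then appends each month's rows with to_csv(mode="a", header=False, index=False). The output path and the per-month result frames are hypothetical placeholders for what the loop would compute.

```python
import os
import tempfile

import pandas as pd

out_path = os.path.join(tempfile.mkdtemp(), "my_csv.csv")  # hypothetical location

# Create the file up front: header row only, no index column.
pd.DataFrame(columns=["Month", "Day", "MeanDailyAirTemperature"]).to_csv(out_path, index=False)

# Hypothetical per-month results; in the question these come from the groupby in the loop.
monthly_results = [
    pd.DataFrame({"Month": [1, 1], "Day": [1, 2], "MeanDailyAirTemperature": [19.0, 23.0]}),
    pd.DataFrame({"Month": [2], "Day": [1], "MeanDailyAirTemperature": [12.0]}),
]
for result in monthly_results:
    # Append rows only: the header already exists and the index is not wanted.
    result.to_csv(out_path, mode="a", header=False, index=False)

print(pd.read_csv(out_path))
```

Each appended frame must use the same column order as the header row, since header=False writes values positionally.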