Intro Python question: I am working on a program that counts the number of politicians in each political party for each session of the U.S. Congress. I'm starting from a .csv with biographical data, and wish to export my political party membership count as a new .csv. This is what I'm doing:
import pandas as pd
read = pd.read_csv('30.csv', delimiter = ';', names = ['Name', 'Years', 'Position', 'Party', 'State', 'Congress'])
party_count = read.groupby('Party').size()
with open('parties.csv', 'a') as f:
    party_count.to_csv(f, header=False)
This updates my .csv to read as follows:
'Year','Party','Count'
'American Party',1
'Democrat',162
'Independent Democrat',3
'Party',1
'Whig',145
I next need to include the date under my first column ('Year'). This is contained in the 'Congress' column in my first .csv. What do I need to add to my final line of code to make this work?
Here is a snippet from the original .csv file I am drawing from:
'Name';'Years';'Position';'Party';'State';'Congress'
'ABBOTT, Amos';'1786-1868';'Representative';'Whig';'MA';'1847'
'ADAMS, Green';'1812-1884';'Representative';'Whig';'KY';'1847'
'ADAMS, John Quincy';'1767-1848';'Representative';'Whig';'MA';'1847'
You can merge the per-party counts back into your original dataframe (df here is the dataframe you read from the .csv):
party_count = df.groupby('Party').size().reset_index(name='Count')
df = df.merge(party_count, on='Party', how='left')
Once you have the party counts, you can select the columns you need. For example, if you need [Congress, Party, Count] you can use:
out_df = df[['Congress', 'Party', 'Count']].drop_duplicates()
out_df.columns = ['Year', 'Party', 'Count']
Here out_df is the dataframe you can write to a .csv file:
out_df.to_csv('my.csv', index=False)
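As an alternative to merging, the counts can be grouped by both 'Congress' and 'Party' in one step. The following is only a sketch under a few assumptions: the sample rows are a made-up stand-in for 30.csv (the last row is invented), the file's own header row is used for the column names, and the single quotes are treated as the quote character. Note that passing names=... to read_csv on a file that already has a header row makes pandas treat that header as data, which is where the stray 'Party',1 row in the question's output comes from.

```python
import io
import pandas as pd

# Made-up sample in the same layout as the question's input file
# (semicolon-separated, single-quoted fields); the last row is invented.
raw = io.StringIO(
    "'Name';'Years';'Position';'Party';'State';'Congress'\n"
    "'ABBOTT, Amos';'1786-1868';'Representative';'Whig';'MA';'1847'\n"
    "'ADAMS, Green';'1812-1884';'Representative';'Whig';'KY';'1847'\n"
    "'EXAMPLE, Person';'1800-1870';'Representative';'Democrat';'NY';'1847'\n"
)

# The file's own header row supplies the column names; dtype=str keeps
# every field as text.
df = pd.read_csv(raw, sep=';', quotechar="'", dtype=str)

# One row per (Congress, Party) pair with the number of members.
counts = (
    df.groupby(['Congress', 'Party'])
      .size()
      .reset_index(name='Count')
      .rename(columns={'Congress': 'Year'})
)

# Written to an in-memory buffer here; a file path works the same way.
out = io.StringIO()
counts.to_csv(out, index=False)
print(out.getvalue())
```

To append each session's counts to an existing parties.csv instead, one option is counts.to_csv('parties.csv', mode='a', header=False, index=False).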
To clarify, I have 2 CSV files I want to read.
First CSV has the following headers: ['ISO3', 'Languages', 'Country Name'].
Second CSV has the following headers: ['ISO3', 'Area', 'Country Name']
I want to write to a new CSV file with the following headers (and their corresponding values obviously), so like: ['ISO3', 'Area', 'Languages', 'Country Name']. Basically, I want to merge the 2 CSVs, without having the duplication of ISO3 and Country Name.
Right now, I am reading both CSVs, and I am able to write the 'Area' values to the output CSV, which otherwise contains only ['ISO3', 'Languages', 'Country Name'].
However, the formatting is off.
import csv
filePath = '/file/path/shortlist_languages.csv'
fp_write = input("Enter fp for writing new CSV (do not include .csv extension): ")
country_data_fields = []
with open(filePath) as file:
    reader = csv.DictReader(file)
    for row in reader:
        country_data_fields.append({
            'Languages': row['Languages'],
            'Country Name': row['Country Name'],
            'ISO3': row['ISO3']
        })
with open('/file/path/shortlist_area.csv') as file_t:
    reader = csv.DictReader(file_t)
    for row in reader:
        country_data_fields.append({
            'Area': row['Area'],
        })
with open(fp_write+'country_data_table.csv', 'w',
          newline='') as country_data_fields_csv:
    fieldnames = ['Languages', 'Country Name', 'ISO3', 'Area']
    csv_dict_writer = csv.DictWriter(country_data_fields_csv, fieldnames=fieldnames)
    csv_dict_writer.writeheader()
    for data in country_data_fields:
        csv_dict_writer.writerow(data)
The CSV result looks like the below:
Languages,Country Name,ISO3,Area
Albanian,Albania,ALB,
Arabic,Algeria,DZA,
Catalan,Andorra,AND,
Portuguese,Angola,AGO,
English,Antigua and Barbuda,ATG,
,,,28748
,,,2381741
,,,468
,,,1246700
,,,442
How can I get the "Area" values to line up in the same rows as the other fields?
If I understand correctly, each record is identified by its 'ISO3' value. Use a dict instead of a list, with the 'ISO3' value as the key.
In the first loop, instead of calling .append, store each record in the dict under its key. In the second loop, look up the existing record for that key and set its ['Area'] to the row['Area'] value, so the existing record is updated in place. Something like this for the second loop (not tested):
for row in reader:
    iso3 = row['ISO3']
    country_record = country_data_fields[iso3]
    country_record['Area'] = row['Area']
Modify the final loop to iterate over the dict's values instead of a list.
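Put together, the dict-keyed approach might look like the sketch below. It uses in-memory sample data (two rows taken from the question's output, with the ISO3/area pairing assumed) instead of the real files, and the names records and merged are illustrative only.

```python
import csv
import io

# Stand-ins for the two input files; the headers follow the question.
languages_csv = io.StringIO(
    "ISO3,Languages,Country Name\n"
    "ALB,Albanian,Albania\n"
    "DZA,Arabic,Algeria\n"
)
area_csv = io.StringIO(
    "ISO3,Area,Country Name\n"
    "ALB,28748,Albania\n"
    "DZA,2381741,Algeria\n"
)

# First pass: one record per country, keyed by its ISO3 code.
records = {}
for row in csv.DictReader(languages_csv):
    records[row['ISO3']] = {
        'ISO3': row['ISO3'],
        'Languages': row['Languages'],
        'Country Name': row['Country Name'],
    }

# Second pass: attach the area to the record with the same ISO3 code.
for row in csv.DictReader(area_csv):
    record = records.get(row['ISO3'])
    if record is not None:  # skip codes missing from the first file
        record['Area'] = row['Area']

# Write one merged row per country.
fieldnames = ['ISO3', 'Area', 'Languages', 'Country Name']
merged = io.StringIO()
writer = csv.DictWriter(merged, fieldnames=fieldnames)
writer.writeheader()
for record in records.values():
    writer.writerow(record)

print(merged.getvalue())
```

A record that never receives an 'Area' is still written; DictWriter fills the missing field with an empty string by default.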
I used the JotForm Configurable List widget to collect data, but I am having trouble parsing the data because there are more than 2,000 records.
The configurable field is named Person Details, and the list takes these options as input:
Name, Gender, Date of Birth, Govt. ID, Covid Test, Covid Result, Type of Follow Up, Qualification, Medical History, Disabilities, Employment Status, Individual Requirement
A snapshot of the Excel file: Configurable List Submissions
The Excel or csv sheet has all of this data in one column, as in the snapshot. I want it exported into separate columns, with the list options mentioned above as the heading of each column.
I'm very new to Python, pandas and data parsing, and this is for a very important social-benefit project to help people during the COVID crisis, so any help would be greatly appreciated :)
Having the labels repeated in each row isn't something the standard pandas tools like read_csv handle natively. I would iterate through the rows as text strings, and then build the dataframe one row at a time. We will do this by getting each line into the form pd.Series({"Column1": "data", "Column2": "data"...}), and then building a dataframe out of a list of those objects.
import pandas as pd
## Sample data
data = ["Column1: Data1, Column2: Data2, Column3: Data3", "Column1: Data4, Column2: Data5, Column3: Data6"]
rows = []
## Iterate over rows
for line in data:
    ## Split along commas (including the space that follows each comma)
    split1 = line.split(', ')
    ## Split each "label: value" pair into [label, value]
    split2 = [s.split(': ') for s in split1]
    ## Now split2 for the first row looks like this:
    ## [['Column1', 'Data1'], ['Column2', 'Data2'], ['Column3', 'Data3']]
    ## Make a series
    row = pd.Series({item[0]: item[1] for item in split2})
    rows.append(row)
df = pd.DataFrame(rows)
Now df looks like this:
Column1 Column2 Column3
0 Data1 Data2 Data3
1 Data4 Data5 Data6
You can save it in this format with df.to_csv("filename.csv") and open it in tools like Excel.
I have data that consists of 3004 rows without a header, and each row has a different number of fields (e.g. rows 1, 2, 3 and 4 have 16, 17, 21 and 12 fields, respectively). Here is my code for reading the csv:
df = pd.read_csv(file,'rb', delimiter ='\t', engine='python')
Here is the output:
$GPRMC,160330.40,A,1341.,N,10020.,E,0.006,,150517,,,A*7D
$GPGGA,160330.40,1341.,N,10020.,E,1,..
$PUBX,00,160330.40,1341.,N,10020.,E,...
$PUBX,03,20,2,-,056,40,,000,5,U,014,39,41,026,...
$PUBX,04,160330.40,150517,144210.39,1949,18,-6...
ÿ$GPRMC,160330.60,A,1341.,N,10020.,E...
$GPGGA,160330.60,1341.,N,10020.,E,1,...
It seems the delimiter did not separate the data into columns at all. So I tried passing column names, based on the fields of the $PUBX,00 message. Here is the code with the column names added:
my_cols = ['MSG type', 'ID MSG', 'UTC','LAT', 'N/S', 'LONG', 'E/W', 'Alt', 'Status','hAcc', 'vAcc','SOG', 'COG', 'VD','HDOP', 'VDOP', 'TDOP', 'Svs', 'reserved', 'DR', 'CS', '<CR><LF>']
df = pd.read_csv(file, 'rb', header = None, na_filter = False, engine = 'python', index_col=False, names=my_cols)
The result looks like the picture below: the whole file ends up in the single column 'MSG type'.
(image: the output)
My goal, once I can read this csv, is to keep only the rows that combine $PUBX,00,... with one column of $PUBX,04,... and write them to a csv. But I am still struggling to separate the file into columns. Please advise me on this matter. Thank you very much.
pd.read_csv is used for reading CSV (comma-separated values) files, so you don't need to specify a delimiter.
If you want to read a TSV (tab-separated values) file, you can use:
pd.read_table(filepath)
The default separator for read_table is a tab.
Hat Tip to Ilja Everilä
@Hasanah Based on your code:
df = pd.read_csv(file,'rb', delimiter ='\t', engine='python')
delimiter='\t' tells pandas to separate the data into fields based on tab characters. Note also that the second positional argument of read_csv is the separator (sep), not a file mode, so 'rb' does not belong there.
The default delimiter when pandas reads in csv files is a comma, so you should not need to define a delimiter:
df = pd.read_csv(file, engine='python')
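Because the rows have different numbers of fields, another option is to split the lines with the csv module first and build a dataframe only from the message type of interest. This is a rough sketch, not a drop-in solution: the sample lines are invented, heavily shortened stand-ins for the real log, and the name pubx00 is illustrative.

```python
import csv
import io

import pandas as pd

# Invented, shortened NMEA-style lines with varying field counts.
raw = io.StringIO(
    "$GPRMC,160330.40,A,N,E\n"
    "$PUBX,00,160330.40,N,E,1.5\n"
    "$PUBX,04,160330.40,150517\n"
    "$PUBX,00,160330.60,N,E,1.7\n"
)

# csv.reader accepts rows of any length, unlike a fixed-width table.
rows = list(csv.reader(raw))

# Keep only the $PUBX,00 messages, which all share the same field count.
pubx00 = [r for r in rows if r[:2] == ['$PUBX', '00']]

# Columns are numbered 0..n-1; real names could be passed via columns=...
df = pd.DataFrame(pubx00)
print(df)
```

The same filter with ['$PUBX', '04'] would collect the other message type, and the two dataframes could then be combined on the time field.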
I am working on a web scraper and I have 3 lists of items that I am trying to save to a csv file. I want each list to go under its own column header: the name list under the column NAME, the email list under the column EMAIL, and so on. But for some reason everything is getting grouped together in one column.
Here is the code:
foundings = {'NAME': namelist, 'EMAIL': emaillist, 'PHONE': phoneliist}
Save_to_Csv(foundings)
def Save_to_Csv(data):
    filename = 'loki.csv'
    df = pd.DataFrame(data)
    df.set_index('NAME', drop=True, inplace=True)
    if os.path.isfile(filename):
        with open(filename,'a') as f:
            df.to_csv(f, mode='a', sep="\t", header=False, encoding='utf-8')
    else:
        df.to_csv(filename, sep="\t")
The code is simple enough: make NAME the index and add each list of elements to its associated column.
But it does something like this:
Groups everything together in a single column
What am I doing wrong here?
import csv
f = csv.reader(open('lmt.csv','r')) # open input file for reading
Date, Open, Hihh, mLow, Close, Volume = zip(*f) # split it into separate columns
ofile = open("MYFILEnew1.csv", "wb") # output csv file
c = csv.writer(ofile)
item = Date
item2 = Volume
rows = zip(item, item)
i = 0
for row in item2:
    print row
    writer = csv.writer(ofile, delimiter='\t')
    writer.writerow([row])
ofile.close()
Above is what I have produced so far.
As you can see in the 3rd line, I have extracted 6 columns from a spreadsheet.
I want to create a .csv file under the name of MYFILEnew1.csv which only has two columns, Date and Volume.
What I have above creates a .csv that only writes Volume column into the first column of the new .csv file.
How would you go about placing Date into the second column?
For example
Date Open High Low Close Volume
17-Feb-16 210 212.97 209.1 212.74 1237731
is what I have, and I'd like to produce a new csv file that has
Date Volume
17-Feb-16 1237731
If I understand your question correctly, you can achieve that very easily using pandas' read_csv and to_csv (@downvoter: Could you explain your downvote, please!?); the final solution to your problem can be found below under EDIT2:
import pandas as pd
# this assumes that your file is comma separated
# if it is e.g. tab separated you should use pd.read_csv('data.csv', sep = '\t')
df = pd.read_csv('data.csv')
# select desired columns
df = df[['Date', 'Volume']]
# write to the file (tab separated)
df.to_csv('MYFILEnew1.csv', sep='\t', index=False)
So, if your data.csv file looks like this:
Date,Open,Hihh,mLow,Close,Volume
1,5,9,13,17,21
2,6,10,14,18,22
3,7,11,15,19,23
4,8,12,16,20,24
Then the MYFILEnew1.csv would look like this after running the script above:
Date Volume
1 21
2 22
3 23
4 24
EDIT
Using your data (tab separated, stored in the file data3.csv):
Date Open Hihh mLow Close Volume
17-Feb-16 210 212.97 209.1 212.74 1237731
Then
import pandas as pd
df = pd.read_csv('data3.csv', sep='\t')
# select desired columns
df = df[['Date', 'Volume']]
# write to the file (tab separated)
df.to_csv('MYFILEnew1.csv', sep='\t', index=False)
gives the desired output
Date Volume
17-Feb-16 1237731
EDIT2
Since your header in your input csv file seems to be messed up (as discussed in the comments), you have to rename the first column. The following now works fine for me using your entire dataset:
import pandas as pd
df = pd.read_csv('lmt.csv', sep=',')
# get rid of the wrongly formatted column name
df.rename(columns={df.columns[0]: 'Date' }, inplace=True)
# select desired columns
df = df[['Date', 'Volume']]
# write to the file (tab separated)
df.to_csv('MYFILEnew1.csv', sep='\t', index=False)
Here I would suggest using the csv module's csv.DictReader and csv.DictWriter classes to read from and write to the files. To read the file, you would do something like
import csv
fieldnames=('Date', 'Open', 'High', 'mLow', 'Close', 'Volume')
with open('myfilename.csv') as f:
    reader = csv.DictReader(f, fieldnames=fieldnames)
Beyond this, you will just need to filter out the keys you don't want from each row and similarly use the csv.DictWriter class to write to your export file.
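A minimal sketch of that read-filter-write loop, using in-memory sample data rather than real files, might look like this. It assumes the input has a clean, comma-separated header row, so DictReader takes the field names from the file itself instead of from an explicit fieldnames argument, and it uses DictWriter's extrasaction='ignore' option to drop the unwanted keys.

```python
import csv
import io

# Stand-in for the input file: one header row and one data row.
source = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "17-Feb-16,210,212.97,209.1,212.74,1237731\n"
)
wanted = ['Date', 'Volume']
target = io.StringIO()  # stand-in for the output file

reader = csv.DictReader(source)  # the header row supplies the field names
writer = csv.DictWriter(
    target,
    fieldnames=wanted,
    delimiter='\t',
    extrasaction='ignore',  # silently skip keys not listed in fieldnames
)
writer.writeheader()
for row in reader:
    writer.writerow(row)

print(target.getvalue())
```

With real files, the two StringIO objects would be replaced by open(..., newline='') handles, opened for reading and for writing respectively.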
You were so close:
import csv
f = csv.reader(open('lmt.csv','rb')) # binary mode is the Python 2 convention for csv files
Date, Open, Hihh, mLow, Close, Volume = zip(*f)
rows = zip(Date, Volume)
ofile = open("MYFILEnew1.csv", "wb")
writer = csv.writer(ofile)
for row in rows:
    writer.writerow(row) # row is already a tuple so no need to make it a list
ofile.close()