Append to a DataFrame within a for loop - python

I have data in a CSV file, and I want to execute an HTTP GET request for every row in the CSV and store the results of the requests in a DataFrame.
Here's what I'm working with so far:
import csv
import pandas as pd
import requests

with open('input.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    df = pd.DataFrame()
    for row in csv_reader:
        result = requests.get(BASEURL + row['ID'] + "&access_token=" + TOKEN).json()
        data = pd.DataFrame(result)
        df.append(data)
However, this doesn't seem to append anything to df. Why?
Note: the JSON response will always contain id, first_name, last_name key-value pairs.

DataFrame.append does not modify the frame in place; it returns a new DataFrame containing the appended data. Change the last line to:
df = df.append(data)
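Note that appending inside the loop copies the whole frame on every iteration (and DataFrame.append is deprecated in recent pandas). A common alternative is to collect the parsed responses in a list and build the frame once at the end. A minimal sketch, assuming BASEURL and TOKEN are defined as in the question:

import csv

import pandas as pd
import requests

records = []
with open('input.csv') as csv_file:
    for row in csv.DictReader(csv_file):
        # one GET per CSV row; each response is assumed to be a dict
        # with id, first_name and last_name keys
        records.append(
            requests.get(BASEURL + row['ID'] + "&access_token=" + TOKEN).json()
        )

# build the frame once from the list of records
df = pd.DataFrame(records)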

Related

How do I read two CSV files, merge their data, and write the result to one CSV file in Python?

To clarify, I have 2 CSV files I want to read.
First CSV has the following headers: ['ISO3', 'Languages', 'Country Name'].
Second CSV has the following headers: ['ISO3', 'Area', 'Country Name']
I want to write a new CSV file with the following headers (and their corresponding values, obviously): ['ISO3', 'Area', 'Languages', 'Country Name']. Basically, I want to merge the two CSVs without duplicating ISO3 and Country Name.
Right now I am reading both CSVs, and I can successfully write the 'Area' values into the output CSV, which otherwise contains only ['ISO3', 'Languages', 'Country Name'].
However, the formatting is off.
import csv

filePath = '/file/path/shortlist_languages.csv'
fp_write = input("Enter fp for writing new CSV (do not include .csv extension): ")
country_data_fields = []

with open(filePath) as file:
    reader = csv.DictReader(file)
    for row in reader:
        country_data_fields.append({
            'Languages': row['Languages'],
            'Country Name': row['Country Name'],
            'ISO3': row['ISO3']
        })

with open('/file/path/shortlist_area.csv') as file_t:
    reader = csv.DictReader(file_t)
    for row in reader:
        country_data_fields.append({
            'Area': row['Area'],
        })

with open(fp_write + 'country_data_table.csv', 'w', newline='') as country_data_fields_csv:
    fieldnames = ['Languages', 'Country Name', 'ISO3', 'Area']
    csv_dict_writer = csv.DictWriter(country_data_fields_csv, fieldnames=fieldnames)
    csv_dict_writer.writeheader()
    for data in country_data_fields:
        csv_dict_writer.writerow(data)
The CSV result looks like this:
Languages,Country Name,ISO3,Area
Albanian,Albania,ALB,
Arabic,Algeria,DZA,
Catalan,Andorra,AND,
Portuguese,Angola,AGO,
English,Antigua and Barbuda,ATG,
,,,28748
,,,2381741
,,,468
,,,1246700
,,,442
I want the "Area" values to be nicely lined up with the others though, so how?
If I understand correctly, you're identifying each record by its 'ISO3' value. Use a dict instead of a list, with the 'ISO3' value as the key.
In the first loop, instead of .append, set the dict entry for that key; in the second loop, get the existing record dict for the key and set its ['Area'] to row['Area'], and it should update properly. Something like this (not tested):
for row in reader:
    iso3 = row['ISO3']
    country_record = country_data_fields[iso3]
    country_record['Area'] = row['Area']
Finally, modify the writing loop to iterate over the dict's values instead of a list, as in the sketch below.
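A complete sketch of that approach, assuming the same file paths as in the question and that every ISO3 in the area file also appears in the languages file:

import csv

country_data_fields = {}

# first pass: one record per ISO3 key
with open('/file/path/shortlist_languages.csv') as file:
    for row in csv.DictReader(file):
        country_data_fields[row['ISO3']] = {
            'Languages': row['Languages'],
            'Country Name': row['Country Name'],
            'ISO3': row['ISO3'],
        }

# second pass: update the existing record instead of appending a new one
with open('/file/path/shortlist_area.csv') as file_t:
    for row in csv.DictReader(file_t):
        country_data_fields[row['ISO3']]['Area'] = row['Area']

with open('country_data_table.csv', 'w', newline='') as out:
    writer = csv.DictWriter(out, fieldnames=['Languages', 'Country Name', 'ISO3', 'Area'])
    writer.writeheader()
    # iterate over the dict's values rather than a list
    for record in country_data_fields.values():
        writer.writerow(record)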

Python Pandas save to csv Column headers garbled

I am working on a web scraper. I have three lists of items, and I am trying to save them to a CSV file, associating each list with a column header: the name list goes under column NAME, the email list under column EMAIL, and so on. But for some reason everything is getting grouped together in one column.
Here is the code:
foundings = {'NAME': namelist, 'EMAIL': emaillist, 'PHONE': phoneliist}
Save_to_Csv(foundings)

def Save_to_Csv(data):
    filename = 'loki.csv'
    df = pd.DataFrame(data)
    df.set_index('NAME', drop=True, inplace=True)
    if os.path.isfile(filename):
        with open(filename, 'a') as f:
            df.to_csv(f, mode='a', sep="\t", header=False, encoding='utf-8')
    else:
        df.to_csv(filename, sep="\t")
It's simple enough code: make NAME the index and add each list of elements to its associated column. But instead, it groups everything together in a single column. What am I doing wrong here?
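One likely cause, assuming the output is being opened in a tool that expects comma-separated values: the frame is written with sep="\t", so each row becomes a single tab-separated string that lands in one comma-defined column. A minimal sketch that writes with the default comma separator instead:

import os

import pandas as pd

def save_to_csv(data, filename='loki.csv'):
    df = pd.DataFrame(data)
    df.set_index('NAME', drop=True, inplace=True)
    # append without repeating the header if the file already exists
    if os.path.isfile(filename):
        df.to_csv(filename, mode='a', header=False, encoding='utf-8')
    else:
        df.to_csv(filename, encoding='utf-8')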

Mapping CSV Header using a Dictionary

I have a reference file that looks like this:
Experiment,Array,Drug
8983,Genechip,Famotidine
8878,Microarray,Dicyclomine
8988,Genechip,Etidronate
8981,Microarray,Flunarizine
I successfully created a dictionary mapping the Experiment numbers to the Drug name using the following:
reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))

# Configure dictionary
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
di = result
I want to map this dictionary to the header of another file which consists of the experiment number. It currently looks like this:
Gene,8988,8981,8878,8983
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011
But it should look like this:
Gene,Etidronate,Flunarizine,Dicyclomine,Famotidine
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011
I tried using:
import csv
import pandas as pd

reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
di = result

df = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt')
df['row[0]'].replace(di, inplace=True)
but it returned a KeyError: 'row[0]'.
I tried the following as well, even transposing in order to merge:
import pandas as pd

df1 = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt').transpose()
df2 = pd.read_csv('C:\Users\Troy\Documents\ExPSRef.txt', delimiter=',', engine='python')
df3 = df1.merge(df2)
df4 = df3.set_index('Drug').drop(['Experiment', 'Array'], axis=1)
df4.index.name = 'Drug'
print df4
and this time received MergeError('No common columns to perform merge on').
Is there a simpler way to map my dictionary to the header that would work?
One thing to keep in mind is that the keys of the mapper dictionary and the header values they are mapped onto must be of the same data type.
Here, one is a string and the other an integer. So, while reading the reference DF, we stop pandas from inferring dtypes by setting dtype=str.
df1 = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt') # Original
df2 = pd.read_csv('C:\Users\Troy\Documents\ExPSRef.txt', dtype=str) # Reference
Convert the columns of the original DF to their series representation, then replace the old values (the experiment numbers) with the drug names retrieved from the reference DF:
df1.columns = df1.columns.to_series().replace(df2.set_index('Experiment').Drug)
df1
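To persist the relabeled table, to_csv can write it back out (the output filename here is illustrative):

df1.to_csv('ExPS2_mapped.txt', index=False)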
I used csv for the whole script. This fixes the header as you wanted and saves the result to a new file; the new filename can be replaced with the original one if that's what you prefer. This program is written for Python 3.
import csv

with open('sample.txt', 'r') as ref:
    reader = csv.reader(ref)
    # skip header line
    next(reader)
    # make dictionary
    di = dict([(row[0], row[2]) for row in reader])

data = []
with open('sample1.txt', 'r') as df:
    reader = csv.reader(df)
    header = next(reader)
    new_header = [header[0]] + [di[i] for i in header if i in di]
    data = list(reader)

# used to make a new file; can also replace with the same file name
with open('new_sample1.txt', 'w', newline='') as df_new:
    writer = csv.writer(df_new)
    writer.writerow(new_header)
    writer.writerows(data)

Removing a row from CSV with python if data wasn't recorded in a column

I'm trying to import a batch of CSVs into PostgreSQL and constantly run into an issue with missing data:
psycopg2.DataError: missing data for column "column_name"
CONTEXT: COPY table_name, line <wherever in the CSV the data wasn't recorded>: <the data values up to the missing column>
At times there is simply no complete set of data written to a row, and I have to deal with the files as they are. I am trying to figure out a way to remove a row if data wasn't recorded in any column. Here's what I have:
file_list = glob.glob(path)

for f in file_list:
    filename = os.path.basename(f)  # get the file name
    arc_csv = arc_path + filename   # path for revised copy of the CSV
    with open(f, 'r') as inp, open(arc_csv, 'wb') as out:
        writer = csv.writer(out)
        for line in csv.reader(inp):
            if "" not in line:  # if the row doesn't have any empty fields
                writer.writerow(line)
    cursor.execute("COPY table_name FROM %s WITH CSV HEADER DELIMITER ','", (arc_csv,))
You could use pandas to remove rows with missing values:
import glob, os, pandas

file_list = glob.glob(path)
for f in file_list:
    filename = os.path.basename(f)
    arc_csv = arc_path + filename
    data = pandas.read_csv(f, index_col=0)
    # index of all rows with no missing data
    ind = data.apply(lambda x: not pandas.isnull(x.values).any(), axis=1)
    data[ind].to_csv(arc_csv)  # write the revised data to csv
However, this could get slow if you're working with large datasets.
EDIT: added index_col=0 as an argument to pandas.read_csv() to prevent the added index column issue. This uses the first column of the csv as the existing index; replace 0 with another column's number if you have reason not to use the first column as the index.
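For this particular case, pandas' built-in dropna does the same filtering more directly (a sketch, assuming the missing fields are read in as NaN):

data = pandas.read_csv(f, index_col=0)
data.dropna().to_csv(arc_csv)  # keep only rows with no missing values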
Unfortunately, you cannot parameterize identifiers such as table or column names, and the file path in COPY has to appear as a quoted literal. Use string formatting, but make sure to validate/escape the value properly:
cursor.execute("COPY table_name FROM '{path}' WITH CSV HEADER DELIMITER ','".format(path=arc_csv))

how to write a matrix to a csv file in python with adding static headers in first row and first column?

I have a matrix generated by running a correlation, mat = Statistics.corr(result, method="pearson"). Now I want to write this matrix to a CSV file, but I want to add headers to the first row and first column of the file so that the output looks like this:
index,col1,col2,col3,col4,col5,col6
col1,1,0.005744233,0.013118052,-0.003772589,0.004284689
col2,0.005744233,1,-0.013269414,-0.007132092,0.013950261
col3,0.013118052,-0.013269414,1,-0.014029249,-0.00199437
col4,-0.003772589,-0.007132092,-0.014029249,1,0.022569309
col5,0.004284689,0.013950261,-0.00199437,0.022569309,1
I have a list containing the column names, colmn = ['col1','col2','col3','col4','col5','col6']. The index in the format above is a static string marking the column of row names. I wrote this code, but it only adds the header to the first row; I am unable to get the header into the first column as well:
with open("file1", "wb") as f:
writer = csv.writer(f,delimiter=",")
writer.writerow(['col1','col2','col3','col4','col5','col6'])
writer.writerows(mat)
How can I write the matrix to a CSV file with static headers in both the first row and the first column?
You could use pandas. DataFrame.to_csv() writes both the column headers and the index by default, and index_label supplies the static 'index' header for the first column:
import pandas as pd

headers = ['col1','col2','col3','col4','col5','col6']
df = pd.DataFrame(mat, columns=headers, index=headers)
df.to_csv('file1', index_label='index')
If on the other hand this is not an option, you can add your index with a little help from enumerate:
with open("file1", "wb") as f:
writer = csv.writer(f,delimiter=",")
headers = ['col1','col2','col3','col4','col5','col6']
writer.writerow(['index'] + headers)
# If your mat is already a python list of lists, you can skip wrapping
# the rows with list()
writer.writerows(headers[i:i+1] + list(row) for i, row in enumerate(mat))
You can use a first flag to write the header line once, then prepend each row's name as the row is written:
cols = ["col1", "col2", "col3", "col4", "col5"]
with open("file1", "wb") as f:
    writer = csv.writer(f)
    first = True
    for i, line in enumerate(mat):
        if first:
            writer.writerow(["index"] + cols)
            first = False
        writer.writerow([cols[i]] + list(line))
