How to write lists to individual columns in a CSV file? - python

I have a script that I have been working on where I pull info from a text file and output it to a CSV file with specific column headers.
I am having an issue with writing the output to the correct columns. Instead of writing everything in "interface_list" down the "Interface" column, it writes all of the port names across a single row. I am having the same issue with my other lists as well.
This is what the output looks like in the csv file:
Current Output
But I would like it to look like this:
Desired Output
I am kind of new to python but have been learning through online searches.
Can anybody help me understand how to get my lists to go in their respective columns?
Here is my code:
import netmiko
import csv
import datetime
import os
import sys
import re
import time
interface_pattern = re.compile(r'interface (\S+)')
regex_description = re.compile(r'description (.*)')
regex_switchport = re.compile(r'switchport (.*)')
with open('int-ports.txt', 'r') as file:
    output = file.read()

with open('HGR-Core2.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Interface', 'Description', 'Switchport'])
    interface_iter = interface_pattern.finditer(output)
    interface_list = []
    for interface in interface_iter:
        interface_list.append(interface.group(1))
    writer.writerow(interface_list)
    description_iter = regex_description.finditer(output)
    description_list = []
    for description in description_iter:
        description_list.append(description.group(1))
    writer.writerow(description_list)
    switchport_iter = regex_switchport.finditer(output)
    switchport_list = []
    for switchport in switchport_iter:
        switchport_list.append(switchport.group(0))
    f.close()
Thanks.

Calling append repeatedly in loops can get cumbersome; people don't realize how resource-hungry it can quickly become.
1.A.) Collect everything into one DataFrame that you can later export as CSV:
salary = [['Alice', 'Data Scientist', 122000],
          ['Bob', 'Engineer', 77000],
          ['Ann', 'Manager', 119000]]

import pandas as pd
df = pd.DataFrame(salary)
df.to_csv('file2.csv', index=False, header=False)
1.B.) Write one list to one specific column in a DataFrame:
L = ['Thanks You', 'Its fine no problem', 'Are you sure']
# create new df
df = pd.DataFrame({'col': L})
print(df)
                   col
0           Thanks You
1  Its fine no problem
2         Are you sure
2.) Export as CSV:
df.to_csv('name.csv',index=False)
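To write those three lists as side-by-side columns with the csv module alone, zip them into rows first. A minimal sketch with stand-in data, since the real lists are parsed from int-ports.txt:

```python
import csv
from itertools import zip_longest

# Stand-in data for the lists built from int-ports.txt
interface_list = ['Gi1/0/1', 'Gi1/0/2']
description_list = ['uplink']
switchport_list = ['switchport mode access', 'switchport mode trunk']

with open('HGR-Core2.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Interface', 'Description', 'Switchport'])
    # zip_longest pads the shorter lists so each value stays in its column
    writer.writerows(zip_longest(interface_list, description_list,
                                 switchport_list, fillvalue=''))
```

Each call to writer.writerow writes one horizontal row, which is why writing a whole list at once spreads it across the sheet; zipping transposes the lists into one row per interface.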

Related

I need to split one column in csv file into two columns using python

Hello everyone, I am learning python and I am new. I have a column in a csv file with this example of value:
I want to divide the programme column on that semicolon into two columns, for example:
program 1: H2020-EU.3.1.
program 2: H2020-EU.3.1.7.
This is what I wrote initially
import csv
import os

with open('IMI.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    with open('new_IMI.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file, delimiter='\t')
        #for line in csv_reader:
        #    csv_writer.writerow(line)
Please note that after I split the columns I need to write the file again as a csv and save it to my computer.
Please guide me.
Using .loc to iterate through each row of a dataframe is somewhat inefficient. It is better to split an entire column at once, using expand=True to assign the pieces to the new columns. As stated, pandas makes this easy:
Code:
import pandas as pd
df = pd.read_csv('IMI.csv')
df[['programme1','programme2']] = df['programme'].str.split(';', expand=True)
df.drop(['programme'], axis=1, inplace=True)
df.to_csv('IMI.csv', index=False)
Example of output:
Before:
print(df)
id acronym status programme topics
0 945358 BIGPICTURE SIGNED H2020-EU.3.1.;H2020-EU3.1.7 IMI2-2019-18-01
1 821362 EBiSC2 SIGNED H2020-EU.3.1.;H2020-EU3.1.7 IMI2-2017-13-06
2 116026 HARMONY SIGNED H202-EU.3.1. IMI2-2015-06-04
After:
print(df)
id acronym status topics programme1 programme2
0 945358 BIGPICTURE SIGNED IMI2-2019-18-01 H2020-EU.3.1. H2020-EU3.1.7
1 821362 EBiSC2 SIGNED IMI2-2017-13-06 H2020-EU.3.1. H2020-EU3.1.7
2 116026 HARMONY SIGNED IMI2-2015-06-04 H2020-EU.3.1. None
You can use the pandas library instead of csv.
import pandas as pd

df = pd.read_csv('IMI.csv')
p1 = {}
p2 = {}
for i in range(len(df)):
    if ';' in df['programme'].loc[i]:
        p1[df['id'].loc[i]] = df['programme'].loc[i].split(';')[0]
        p2[df['id'].loc[i]] = df['programme'].loc[i].split(';')[1]
df['programme1'] = df['id'].map(p1)
df['programme2'] = df['id'].map(p2)
and if you want to delete the programme column (drop returns a new frame, so assign the result):
df = df.drop('programme', axis=1)
To save the new csv file (to_csv has no inplace argument):
df.to_csv('new_file.csv', index=False)
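For completeness, since the question started from the csv module: the same split can be done without pandas using DictReader/DictWriter. A sketch with a tiny inline stand-in for IMI.csv (only the id and programme columns are assumed):

```python
import csv

# Tiny example input standing in for the real IMI.csv
with open('IMI.csv', 'w', newline='') as f:
    f.write('id,programme\n'
            '945358,H2020-EU.3.1.;H2020-EU.3.1.7.\n'
            '116026,H2020-EU.3.1.\n')

with open('IMI.csv', newline='') as src, open('new_IMI.csv', 'w', newline='') as dst:
    reader = csv.DictReader(src)
    # keep every column except 'programme', then add the two new ones
    fields = [f for f in reader.fieldnames if f != 'programme'] + ['programme1', 'programme2']
    writer = csv.DictWriter(dst, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        parts = row.pop('programme').split(';')
        row['programme1'] = parts[0]
        # rows without a semicolon get an empty second programme
        row['programme2'] = parts[1] if len(parts) > 1 else ''
        writer.writerow(row)
```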

How to read a specific value from a row in CSV file?

I was trying to write a function that read a CSV file that looks like this.
flowers.csv
petunia,5.95
alyssum,3.95
begonia,5.95
sunflower,5.95
coelius,4.95
I have tried this code for my function.
def read_csv(csv_pricefile):
    import csv
    f = open(csv_pricefile)
    li = []
    for row in csv.reader(f):
        li.append(row)
    f.close()
    print(li)

read_csv("flower.csv")
when I call my function it gives the following output.
[['petunia', '5.95'], ['alyssum', '3.95'], ['begonia', '5.95'], ['sunflower', '5.95'], ['coelius', '4.95']]
But I don't know how to write a function that will take two parameters for example,
read_csv("flowers.csv","alyssum")
If I call the function, it should give me the following output.
3.95
Use the pandas library to read the csv; this gives you a dataframe object.
import pandas as pd

# the file has no header row, so supply the column names up front
df = pd.read_csv('flowers.csv', header=None, names=['flower', 'price'])
Then if you want to know the price of any flower:
df = df.set_index(['flower'])
f = 'alyssum'
print("{} costs {}".format(f, df.loc[f].price))
Here is the solution I just tried:
def read_csv(csv_pricefile, flower):
    import csv
    f = open(csv_pricefile)
    my_dic = {}
    for row in csv.reader(f):
        myData = {row[0]: row[1]}
        my_dic.update(myData)
    f.close()
    print(my_dic[flower])

read_csv("flower.csv", "alyssum")
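A slightly tidier variant of the same dictionary idea, as a sketch: it returns the price as a float instead of printing it, and uses a two-line stand-in for flowers.csv:

```python
import csv

def read_csv(csv_pricefile, flower):
    # build a name -> price lookup from the two-column file
    with open(csv_pricefile, newline='') as f:
        prices = {row[0]: float(row[1]) for row in csv.reader(f)}
    return prices[flower]

# Tiny stand-in for flowers.csv
with open('flowers.csv', 'w', newline='') as f:
    f.write('petunia,5.95\nalyssum,3.95\n')

print(read_csv('flowers.csv', 'alyssum'))  # 3.95
```

Returning the value rather than printing it makes the function reusable in other code.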

Python: Compare two csv files single column and write non-duplicate entries in seperate file

I have two csv files containing email addresses. One file consists of email addresses that I need to remove from the second file. I have code, but it seems to be giving an IndexError.
The sample code i worked on is
import csv

# Open details file and get a unique set of links
details_csv = csv.DictReader(open('D:/emails_to_remove.csv', 'r'))
details = set(i.get('link') for i in details_csv)

# Open master file and only retain the data not in the set
master_csv = csv.DictReader(open('D:/emails-list.csv', 'r'))
master = [i for i in master_csv if i.get('link') not in details]

# Overwrite master file with the new results
with open('D:/master-output.csv', 'w') as file:
    writer = csv.DictWriter(file, master[0].keys(), lineterminator='\n')
    writer.writeheader()
    writer.writerows(master)
Content of file 1:
abc#123.com
efg#456.com
Content of file2:
ijk#987.com
abc#123.com
Desired Output:
efg#456.com
ijk#987.com
The problem can be easily solved with sets, like so (the result is the symmetric difference, which can also be written set1 ^ set2):
set1 = {"abc#123.com", "efg#456.com"}
set2 = {"ijk#987.com", "abc#123.com"}
set3 = set1.union(set2) - set1.intersection(set2)
print(set3)
# set(['ijk#987.com', 'efg#456.com'])
A good source to learn what can be done with sets is e.g. https://www.geeksforgeeks.org/intersection-function-python/.
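Applied to files, the same set idea looks roughly like this. This is a sketch assuming each file holds one address per line with no header; the inline sample mirrors the question's data:

```python
# Tiny example files standing in for the ones from the question
with open('emails_to_remove.csv', 'w') as f:
    f.write('abc#123.com\nefg#456.com\n')
with open('emails-list.csv', 'w') as f:
    f.write('ijk#987.com\nabc#123.com\n')

with open('emails_to_remove.csv') as f:
    to_remove = {line.strip() for line in f if line.strip()}
with open('emails-list.csv') as f:
    master = {line.strip() for line in f if line.strip()}

# Symmetric difference: addresses that appear in exactly one file
result = sorted(master ^ to_remove)
with open('master-output.csv', 'w') as f:
    f.write('\n'.join(result) + '\n')

print(result)  # ['efg#456.com', 'ijk#987.com']
```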
You can use a pandas dataframe for this purpose.
import pandas as pd

details_csv = pd.read_csv('D:/emails_to_remove.csv')
master_csv = pd.read_csv('D:/emails-list.csv')
fn = master_csv[~master_csv["emails"].isin(details_csv["emails"])].reset_index(drop=True)
cn = details_csv[~details_csv["emails"].isin(master_csv["emails"])].reset_index(drop=True)
final = pd.concat([cn, fn])
final.to_csv(r'Path\File Name.csv')
print(final)
The sample code works for your problem, but you must add an "emails" header to the csv files.
The pandas package simplifies csv processing. Below is how to use it for your purpose:
import pandas as pd
details_df = pd.read_csv('D:/emails_to_remove.csv')
master_df = pd.read_csv('D:/emails-list.csv')
# 1. Concat both csv
merged_df = pd.concat([details_df, master_df], ignore_index=True).reset_index(drop=True)
# 2. Drop rows with duplicate emails (keep=False removes both copies; assign the result back)
merged_df = merged_df.drop_duplicates(subset='emails', keep=False)
# You can save them if you wish
merged_df.to_csv("D:/final.csv")

How to cleanse a string of data so it can be used in Pandas / Converting one column into multiple columns

I am trying to analyse WhatsApp messages by putting them into a Pandas dataframe, however the data is read as a single column when I load it. What do I need to do to correct my error? I believe the problem is due to how the data is formatted.
I have tried to read the file and then use Pandas to make it into columns, but because of how it is read, I believe Pandas only sees one column.
I have also tried pd.read_csv, and that method does not yield the correct result either, even with the sep parameter.
The information from whatsapp is presented as follows in notebook:
[01/09/2017, 13:51:27] name1: abc
[02/09/2017, 13:51:28] name2: def
[03/09/2017, 13:51:29] name3: ghi
[04/09/2017, 13:51:30] name4: jkl
[05/09/2017, 13:51:31] name5: mno
[06/09/2017, 13:51:32] name6: pqr
The python code is as follows:
import re
import sys
import pandas as pd
pd.set_option('display.max_rows', 500)
def read_history1(file):
    chat = open(file, 'r', encoding="utf8")
    # get all lines which exist in this format
    messages = re.findall(r'\d+/\d+/\d+, \d+:\d+:\d+\W .*: .*', chat.read())
    print(messages)
    chat.close()
    # make messages into a dataframe
    history = pd.DataFrame(messages, columns=['Date', 'Time', 'Name', 'Message'])
    print(history)
    return history

# the encoding is added because of the way the file is written
# https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character/9233174

# i tried using sep, but it is not ideal for this data
def read_history2(file):
    messages = pd.read_csv(file)
    messages.columns = ['a', 'b']
    print(messages.head())
    return

filename = "AFC_Test.txt"
read_history2(filename)
The two methods I have tried are above.
I expect 4 columns: the date, time, name and the message for each row.
In case anyone comes across this, I resolved it as follows. The error was in the regex:
def read_history2(file):
    print('\n')
    chat = open(file, 'r', encoding="utf8")
    content = re.findall(r'\W(\d+/\d+/\d+), (\d+:\d+:\d+)\W (.*): (.*)', chat.read())
    history = pd.DataFrame(content, columns=['Date', 'Time', 'Name', 'Message'])
    print(history)

filename = "AFC_Test.txt"
read_history2(filename)
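A self-contained version of that regex fix, run against the sample lines from the question (a sketch; the raw-string prefix just avoids escape warnings in newer Python):

```python
import re

sample = (
    "[01/09/2017, 13:51:27] name1: abc\n"
    "[02/09/2017, 13:51:28] name2: def\n"
)

# capture date, time, sender and message from each bracketed line
content = re.findall(r'\W(\d+/\d+/\d+), (\d+:\d+:\d+)\W (.*): (.*)', sample)
print(content[0])  # ('01/09/2017', '13:51:27', 'name1', 'abc')
```

Passing content to pd.DataFrame(content, columns=['Date', 'Time', 'Name', 'Message']) then yields the four-column frame.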
So you can split each line into a list of strings, with code that might look a bit like this:
# read in file
with open(file, 'r', encoding="utf8") as chat:
    contents = chat.read()

# list for each line of the dataframe
rows = []

# clean data up into nice strings
for line in contents.splitlines():
    # split into at most four pieces, then strip the surrounding punctuation
    parts = [item.strip("[],:") for item in line.split(None, 3)]
    rows.append(parts)

# create dataframe
history = pd.DataFrame(rows, columns=['Date', 'Time', 'Name', 'Message'])
I think that should work!
Let me know how it goes :)

2 Nested for loops, 2 variables (1 from a csv) & Unexpected Results

I've been puzzling over why this for loop won't generate the expected result: there should be one matching entry that produces a 'yes' output. Can anyone help point out my error? The csv that I'm importing has 7 columns and 17,000 rows. "alle" is also imported from a csv into a list with 4 elements, each of which is a list of 6 elements. I'm using Python 2.7, and I realize I'm importing more libraries than I need, but I am new and didn't want to remove any that could break the code before posting this.
"alle" elements look like:
['Danlaw Inc', 'Applications Engineer', 'Novi, MI', 'http://www.indeed.com/rc/clk?jk=e199589101464b99', 'Novi', 'MI']
Rows of the csv file look like:
['4318055', 'Brownsville', 'LA', 'Brownsville, LA', '32.48709', '-92.1543', '4317']
Here's my code:
import math
import csv
import urllib2
from urllib2 import urlopen
import json
from json import load
import requests
from pprint import pprint
from time import sleep
f = open(r'C:\Users\****\*****\Python\Best City Pop Long Lat Data\UScities1000_Trimmed_Full_NoHeader.csv', "rb")
csv_f1 = csv.reader(f)

for a in alle:
    for e in csv_f1:
        if a[2] == e[3]:
            print('yes')
I have confirmed that both lists have the matching entry (which is a city & state - "Novi, MI"), but when I run the code I don't get any "yes" as output. Any thoughts? Thank you!
UPDATE:
Here is how I'm appending the "alle" csv list variable that I think is causing the problem:
def splitter(element):
    city, state = element.split(', ', 1)
    return city, state

# >>>>> Assign variable to input city
#cities = []
#location = []
for e in alle:
    if ', ' in e[2]:
        city, state = splitter(e[2])
        #location = [[city],[state]]
        #e.append(location)
        e.append(city)
        e.append(state)
Ok, so I figured it out. The way I was reading data from the csv file into the list variable was the culprit - although I don't know the exact reason. However, the below works:
alle = []
with open(r'C:\Users\****\******\Recruiting Business\Proximity Project\AllDailyCIJobs 26thJan10_12-30daysNoHeader.csv', "r+b") as f:
    reader = csv.reader(f)
    for row in reader:
        alle.append(row)
This is what I was trying to use before:
f = open(r'C:\Users\****\******\Recruiting Business\Proximity Project\AllDailyCIJobs 26thJan10_12-30daysNoHeader.csv', "rb")
alle = csv.reader(f)
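For anyone hitting the same thing: csv.reader returns a one-pass iterator over the open file, so the inner loop is exhausted after the first element of alle, and every later comparison silently skips. Materializing the rows into a list (as the update does) allows repeated passes. A minimal sketch of the difference, using inline data rather than the question's files:

```python
import csv
import io

data = '4318055,Brownsville,LA,"Brownsville, LA"\n4318056,Novi,MI,"Novi, MI"\n'

# A csv.reader is a one-shot iterator: the second pass sees nothing
reader = csv.reader(io.StringIO(data))
first_pass = [row[3] for row in reader]
second_pass = [row[3] for row in reader]
print(first_pass)   # ['Brownsville, LA', 'Novi, MI']
print(second_pass)  # []

# Reading the rows into a list first makes repeated passes safe
rows = list(csv.reader(io.StringIO(data)))
matches = [row for row in rows if row[3] == 'Novi, MI']
print(len(matches))  # 1
```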
