I am collecting data that is saved into CSV files. For presentation purposes, I want to reorder the columns in each CSV file based on its related "order".
I was using this question (write CSV columns out in a different order in Python) as a guide but I'm not sure why I'm getting the error
writeindices = [name2index[name] for name in writenames]
KeyError: % Processor Time
when I run it. Note that this error doesn't seem to be limited to the string '% Processor Time'.
Where am I going wrong?
Here is my code:
CPU_order=["%"+" Processor Time", "%"+" User Time", "Other"]
Memory_order=["Available Bytes", "Pages/sec", "Pages Output/sec", "Pages Input/sec", "Page Faults/sec"]
def reorder_csv(path,title,input_file):
    if title == 'CPU':
        order=CPU_order
    elif title == 'Memory':
        order=Memory_order
    output_file=path+'/'+title+'_reorder'+'.csv'
    writenames = order
    reader = csv.reader(input_file)
    writer = csv.writer(open(output_file, 'wb'))
    readnames = reader.next()
    name2index = dict((name, index) for index, name in enumerate(readnames))
    writeindices = [name2index[name] for name in writenames]
    reorderfunc = operator.itemgetter(*writeindices)
    writer.writerow(writenames)
    for row in reader:
        writer.writerow(reorderfunc(row))
Here is a sample of what the input CSV file looks like:
,CPU\% User Time,CPU\% Processor Time,CPU\Other
05/23/2016 06:01:51.552,0,0,0
05/23/2016 06:02:01.567,0.038940741537158409,0.62259056657940626,0.077882481554869071
05/23/2016 06:02:11.566,0.03900149141703179,0.77956981074955856,0
05/23/2016 06:02:21.566,0,0,0
05/23/2016 06:02:31.566,0,1.1695867249963632,0
Your code works. It is your data that does not have a column named "% Processor Time". Here is the sample data I used:
Other,% User Time,% Processor Time
o1,u1,p1
o2,u2,p2
And here is the code which I call:
reorder_csv('.', 'CPU', open('data.csv'))
With these settings, everything works fine. Please check your data.
Update
Now that I see your data, it looks like you have column names such as "CPU\% Processor Time" and want to translate them to "% Processor Time" before writing out. All you need to do is create your name2index this way:
name2index = dict((name.replace('CPU\\', ''), index) for index, name in enumerate(readnames))
The difference here is that instead of name, you use name.replace('CPU\\', ''), which gets rid of the CPU\ part.
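A minimal Python 3 sketch of the same idea, assuming the header prefix is always CPU\ and that the timestamp column is dropped as in the original code. The function name reorder_rows and the in-memory sample are illustrative and not part of the original post. It also reports which names are missing instead of failing on the first one.

```python
import csv
import io

def reorder_rows(lines, order, prefix="CPU\\"):
    """Yield the header and then each row, with columns arranged as in `order`."""
    reader = csv.reader(lines)
    readnames = next(reader)
    # Strip the prefix so that "CPU\% User Time" matches "% User Time".
    name2index = {name.replace(prefix, ""): i for i, name in enumerate(readnames)}
    missing = [name for name in order if name not in name2index]
    if missing:
        raise KeyError("columns not found in header: {}".format(missing))
    indices = [name2index[name] for name in order]
    yield list(order)
    for row in reader:
        yield [row[i] for i in indices]

# Made-up sample in the same layout as the input file shown in the question.
sample = io.StringIO(
    ",CPU\\% User Time,CPU\\% Processor Time,CPU\\Other\n"
    "05/23/2016 06:01:51.552,0.1,0.2,0.3\n"
)
result = list(reorder_rows(sample, ["% Processor Time", "% User Time", "Other"]))
```

Here result holds the reordered header followed by the reordered data rows, ready to be passed to csv.writer.writerows.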
Update 2
I reworked your code to use csv.DictReader and csv.DictWriter. I also assume that "CPU\% Privileged Time" will be transformed into "Other". If that is not the case, you can fix it in the transformer dictionary.
import csv
import os
def rename_columns(row):
    """ Take a row (dictionary) of data and return a new row with columns renamed """
    transformer = {
        'CPU\\% User Time': '% User Time',
        'CPU\\% Processor Time': '% Processor Time',
        'CPU\\% Privileged Time': 'Other',
    }
    new_row = {transformer.get(k, k): v for k, v in row.items()}
    return new_row

def reorder_csv(path, title, input_file):
    header = dict(
        CPU=["% Processor Time", "% User Time", "Other"],
        Memory=["Available Bytes", "Pages/sec", "Pages Output/sec", "Pages Input/sec", "Page Faults/sec"],
    )
    reader = csv.DictReader(input_file)
    output_filename = os.path.join(path, '{}_reorder2.csv'.format(title))
    with open(output_filename, 'wb') as outfile:
        # Create a new writer where each row is a dictionary.
        # If the row contains extra keys, ignore them
        writer = csv.DictWriter(outfile, header[title], extrasaction='ignore')
        writer.writeheader()
        for row in reader:
            # Each row is a dictionary, not list
            print row
            row = rename_columns(row)
            print row
            print
            writer.writerow(row)
Related
I stored data in a CSV file with Python. Now I need to read it back with Python, but there is a problem: every line ends with a ";;;;;;" string.
Here is the code that I used for writing data to CSV :
file = open("products.csv", "a")
writer = csv.writer(file, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerow(data)
And I am trying to read it with this code:
with open("products.csv", "r", newline="") as in_file, open("3.csv", "w", newline='') as to_file:
    reader = csv.reader(in_file, delimiter="," ,doublequote=True)
    for row in reader:
        print(row)
Of course, I am not reading it just to print it. I need to remove duplicated lines and turn it into a readable CSV.
I've tried the following to fetch the strings and edit them. It worked for the other fields but not for the semicolons, and I can't understand why I can't edit those.
for row in reader:
    print(row)
    rowList = row[0].split(",")
    for index, field in enumerate(rowList):
        if '"' in field:
            field = field.replace('"', "")
        elif ";;;;;;" in rowList[index]:
            field = field.replace(";;;;;;", "")
        rowList[index] = field
    print(rowList)
Here is the output of the code above :
['Product Name', 'Product Description', 'SKU', 'Regular Price', 'Sale Price', 'Images;;;;;;']
Can anybody help me?
I realized that I had used 'elif' there. I changed it to 'if' and that solved it. Thanks for the help. But I still don't know why those semicolons were added there.
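With if/elif, the semicolon check only runs for fields that contain no double quote, so a field with both is never fully cleaned. A minimal sketch of the fix, applying both steps unconditionally; clean_field is a hypothetical helper name and the sample row is made up. The sketch does not explain where the semicolons came from.

```python
def clean_field(field):
    """Remove stray double quotes and trailing semicolons from one field."""
    # Two independent steps: neither one is skipped because the other matched.
    field = field.replace('"', "")
    field = field.rstrip(";")
    return field

# Made-up row in the shape of the header shown in the question.
row = ['"Product Name"', '"SKU"', '"Images";;;;;;']
cleaned = [clean_field(field) for field in row]
```

Using rstrip(";") rather than replace(";;;;;;", "") also copes with a different number of trailing semicolons, while leaving semicolons inside a field untouched.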
I'm working on a project to create a Python Flask website that stores user logins in a text file. Each line of the text file is one user, and each user has seven parameters stored on the line. All user parameters are separated by a ; character.
Parameters are:
username
password
first name
last name
background color
title
avatar
Sample of the text file:
joebob;pass1;joe;bob;yellow;My title!!;https://upload.wikimedia.org/wikipedia/commons/c/cd/Stick_Figure.jpg
richlong;pass2;rich;long;blue;My title2!!;https://www.iconspng.com/images/stick-figure-walking/stick-figure-walking.jpg
How do I go about storing the parameters in a Python array, and how do I access them later when I need to look up logins?
Here is what I wrote so far:
accounts = { }
def readAccounts():
    file = open("assignment11-account-info.txt", "r")
    for accounts in file: #line
        tmp = accounts.split(';')
        for data in tmp: #data in line
            accounts[data[0]] = {
                'user': data[0],
                'pass': data[1],
                'first': data[2],
                'last': data[3],
                'color': data[4],
                'title': data[5],
                'avatar': data[6].rstrip()
            }
    file.close()
You can use Python's built-in csv module to parse the file:
import csv
with open("assignment11-account-info.txt", "r") as file:
    reader = csv.reader(file, delimiter=';')
    result = []
    for row in reader:
        fields = ('user', 'passwd', 'first', 'last', 'color','title','avatar')
        res = dict(zip(fields, row))
        result.append(res)
Or, equivalently, as a Pythonic list comprehension, which may be harder for a beginner to read:
with open("assignment11-account-info.txt", "r") as file:
    reader = csv.reader(file, delimiter=';')
    fields = ('user', 'passwd', 'first', 'last', 'color','title','avatar')
    result = [ dict(zip(fields, row)) for row in reader ]
Here's what I might do:
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        # "pass" is a reserved word in Python, so use "password" as the variable name
        password = fields[1]
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        accounts[user] = {
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        }
By using with, the file handle file is closed for you automatically. This is the most Pythonic way of doing things.
So long as user is unique, you won't overwrite any entries you put in as you read through the file assignment11-account-info.txt.
If you need to deal with a case where user is repeated in the file assignment11-account-info.txt, then each value in the dictionary needs to be a list ([...]) of entries rather than a single entry ({...}). This is because reusing the value of user as a key will overwrite any previous entry for that user in accounts. Overwriting existing entries is almost always a bad thing when using dictionaries!
If that is the case, I might do the following:
accounts = {}
with open("assignment11-account-info.txt", "r") as file:
    for line in file:
        fields = line.rstrip().split(";")
        user = fields[0]
        # "pass" is a reserved word in Python, so use "password" as the variable name
        password = fields[1]
        first = fields[2]
        last = fields[3]
        color = fields[4]
        title = fields[5]
        avatar = fields[6]
        if user not in accounts:
            accounts[user] = []
        accounts[user].append({
            "user" : user,
            "pass" : password,
            "first" : first,
            "last" : last,
            "color" : color,
            "title" : title,
            "avatar" : avatar
        })
In this way, you preserve any cases where user is duplicated.
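To access the stored parameters later, a dictionary keyed by username gives a direct lookup. A minimal sketch, assuming usernames are unique and the file layout is the seven ;-separated fields described in the question; load_accounts and check_login are hypothetical names, and the avatar URLs in the sample are placeholders.

```python
import csv
import io

FIELDS = ("user", "pass", "first", "last", "color", "title", "avatar")

def load_accounts(lines):
    """Build {username: {field: value}} from ;-separated lines."""
    reader = csv.reader(lines, delimiter=";")
    # Skip blank lines; the first field of each row is the username.
    return {row[0]: dict(zip(FIELDS, row)) for row in reader if row}

def check_login(accounts, user, password):
    """Return True only if the user exists and the password matches."""
    account = accounts.get(user)
    return account is not None and account["pass"] == password

# In-memory stand-in for assignment11-account-info.txt.
sample = io.StringIO(
    "joebob;pass1;joe;bob;yellow;My title!!;https://example.com/a.jpg\n"
    "richlong;pass2;rich;long;blue;My title2!!;https://example.com/b.jpg\n"
)
accounts = load_accounts(sample)
```

With this in place, accounts["richlong"]["color"] returns that user's background color, and check_login(accounts, "joebob", "pass1") can be called from a login route.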
I have created a function that fetches price, rating, etc after it hits an API:
def is_priced(business_id):
    try:
        priced_ind = get_business(API_KEY, business_id)
        priced_ind1 = priced_ind['price']
    except:
        priced_ind1 = 'None'
    return priced_ind1

priced_ind = is_priced(b_id)
print(priced_ind)
Similarly for the rating:
def is_rated(business_id):
    try:
        rated_ind = get_business(API_KEY, business_id)
        rated_ind1 = rated_ind['rating']
    except:
        rated_ind1 = 'None'
    return rated_ind1
However, I want my function to loop through the business names I have in my CSV file and catch all this data and export it to a new csv file with these two parameters beside the names of the business.
The CSV file has info on the name of the business along with its address,city,state,zip and country
Eg:
Name address city state zip country
XYZ(The) 5* WE 223899th St. New York NY 19921 US
My output:
Querying https://api.xyz.com/v3/businesses/matches ...
True
Querying https://api.xyz.com/v3/businesses/matches ...
4.0
Querying https://api.xyz.com/v3/businesses/matches ...
$$
Querying https://api.xyz.com/v3/businesses/matches ...
Querying https://api.xyz.com/v3/businesses/matches ...
The real issue is that my output CSV only contains the business ID, while the rating and the other values are, as you can see, just printed to the console. How do I set up a loop that writes the information I want for all the businesses into a single CSV?
The csv module is useful for this sort of thing, e.g.
import csv
with open('f.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    with open('tmp.csv', 'w') as output:
        writer = csv.writer(output)
        for row in reader:
            business_id = row[0]
            # append the price and rating from the question's functions to each row
            row.append(is_priced(business_id))
            row.append(is_rated(business_id))
            writer.writerow(row)
You can read the business names from the CSV file, iterate over them using a for loop, hit the API and store the results, and write to a new CSV file.
import csv
data = []
with open('businesses.csv') as fp:
    # skip header line
    header = next(fp)
    reader = csv.reader(fp)
    for row in reader:
        b_name = row[0]
        # not sure how you get the business ID:
        b_id = get_business_id(b_name)
        p = is_priced(b_id)
        r = is_rated(b_id)
        data.append((b_name, p, r))
# write out the results
with open('business_data.csv', 'w') as fp:
    writer = csv.writer(fp)
    writer.writerow(['name', 'price', 'rating'])
    for row in data:
        writer.writerow(row)
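Since is_priced and is_rated each query the API, the loop above makes two requests per business. A minimal sketch that fetches each business once and writes both values next to the original columns. It assumes the first CSV column holds the business ID; fetch_business is a hypothetical stand-in for get_business(API_KEY, business_id) backed by made-up data, and the in-memory files replace real paths.

```python
import csv
import io

def fetch_business(business_id):
    """Hypothetical stand-in for get_business(API_KEY, business_id)."""
    fake_api = {"b1": {"price": "$$", "rating": 4.0}, "b2": {"rating": 3.5}}
    return fake_api[business_id]

def price_and_rating(business_id):
    """Return (price, rating) from a single lookup, 'None' where missing."""
    try:
        info = fetch_business(business_id)
    except KeyError:
        return "None", "None"
    return info.get("price", "None"), info.get("rating", "None")

def enrich(in_lines, out_file):
    """Copy every input row to out_file with price and rating appended."""
    reader = csv.reader(in_lines)
    writer = csv.writer(out_file)
    header = next(reader)
    writer.writerow(header + ["price", "rating"])
    for row in reader:
        price, rating = price_and_rating(row[0])
        writer.writerow(row + [price, rating])

source = io.StringIO("business_id,name\nb1,XYZ\nb2,ABC\nb3,Unknown\n")
target = io.StringIO()
enrich(source, target)
output_rows = list(csv.reader(io.StringIO(target.getvalue())))
```

Replacing source and target with files opened via open(..., newline="") would produce the single combined CSV the question asks for.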
You can do this easily using pandas:
import pandas as pd
df = pd.read_csv('your_csv.csv', usecols=['business_name']) # since you only need the name
# each function receives a row and reads business_name from it
# store each result in its own column so the DataFrame is not overwritten
df['price'] = df.apply(is_priced, axis=1)
df['rating'] = df.apply(is_rated, axis=1)
df.to_csv('result.csv', index=False)
All you have to do in your functions is:
def is_priced(row):
    business_name = row['business_name']
    business_id = ??
    ...
I'm working on a Python script that takes Nessus data exported as CSV and removes duplicate data. However, because of the way the export works, results for different ports and protocols each get their own row, even though all the other data in the row is the same. I need to remove these duplicates, but I want to keep the Port and Protocol column data and append it to the previous row.
Here is a very small CSV I'm using to test and build the script:
Screenshot of CSV File
As you can see, all fields are exactly the same apart from the Port field, and sometimes the Protocol field will be different too. So I need to read both rows of the CSV file and then append the ports like this: 80, 443, and the same for the protocols: tcp, tcp
Then I want to save only one line, to remove the duplicate data. I have tried doing this by checking whether there has already been an instance of the Plugin ID, but my output only contains the second row's Port and Protocol.
protocollist = []
portlist = []
pluginid_list = []
multiple = False
with open(csv_file_input, 'rb') as csvfile:
    nessusreader = csv.DictReader(csvfile)
    for row in nessusreader:
        pluginid = row['Plugin ID']
        if pluginid != '':
            pluginid_list.append(row['Plugin ID'])
            print(pluginid_list)
            count = pluginid_list.count(pluginid)
            cve = row['CVE']
            if count > 0:
                protocollist.append(row['Protocol'])
                print(protocollist)
                portlist.append(row['Port'])
                print(portlist)
                print('Counted more than 1')
                multiple = True
            if multiple == True:
                stringlist = ', '.join(protocollist)
                newstring1 = stringlist
                protocol = newstring1
                stringlist2 = ', '.join(portlist)
                newstring2 = stringlist2
                port = newstring2
            else:
                protocol = row['Protocol']
                port = row['Port']
            cvss = row['CVSS']
            risk = row['Risk']
            host = row['Host']
            name = row['Name']
            synopsis = row['Synopsis']
            description = row['Description']
            solution = row['Solution']
            seealso = row['See Also']
            pluginoutput = row['Plugin Output']
with open(csv_file_output, 'w') as csvfile:
    fieldnames = ['Plugin ID', 'CVE', 'CVSS', 'Risk', 'Host', 'Protocol', 'Port', 'Name', 'Synopsis', 'Description', 'Solution', 'See Also', 'Plugin Output']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'Plugin ID': pluginid, 'CVE': cve, 'CVSS': cvss, 'Risk': risk, 'Host': host, 'Protocol': protocol, 'Port': port, 'Name': name, 'Synopsis': synopsis, 'Description': description, 'Solution': solution, 'See Also': seealso, 'Plugin Output': pluginoutput})
There are probably a few errors in the code as I've been trying different things, but just wanted to show the code I've been working on to give more context to the issue. This code works if the data is only as shown in the CSV as there are only two items, however I introduced a third set of data with a different Plugin ID and it then added that to the list also, probably due to the if statement being set to > 0.
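One way to avoid the shared lists and the count check is to group rows by every column except Port and Protocol, and to collect those two columns per group. This is a minimal sketch of that approach, not a tested solution for real Nessus exports: merge_duplicates is a hypothetical name, the sample uses a reduced set of columns with made-up values, and it assumes rows are duplicates only when all other columns match exactly.

```python
import csv
import io

MERGE_COLUMNS = ("Protocol", "Port")

def merge_duplicates(rows):
    """Combine rows that differ only in Protocol/Port into one row each."""
    merged = {}
    for row in rows:
        # Rows are the same finding when all non-merged columns are equal.
        key = tuple(value for name, value in row.items() if name not in MERGE_COLUMNS)
        if key not in merged:
            merged[key] = dict(row)
            for column in MERGE_COLUMNS:
                merged[key][column] = [row[column]]
        else:
            for column in MERGE_COLUMNS:
                merged[key][column].append(row[column])
    result = []
    for entry in merged.values():
        for column in MERGE_COLUMNS:
            entry[column] = ", ".join(entry[column])
        result.append(entry)
    return result

# Made-up sample with fewer columns than a real export.
sample = io.StringIO(
    "Plugin ID,Host,Protocol,Port,Name\n"
    "1001,10.0.0.1,tcp,80,Example finding\n"
    "1001,10.0.0.1,tcp,443,Example finding\n"
    "1002,10.0.0.1,tcp,22,Other finding\n"
)
merged_rows = merge_duplicates(csv.DictReader(sample))
```

Each entry in merged_rows is a dictionary, so it can be written with csv.DictWriter.writerows using the same fieldnames list as in the code above.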
I improved my code according to this suggestion from #paultrmbrth. What I need is to scrape data from pages similar to this one and this one, and I want the CSV output to look like the picture below.
But my code's CSV output is a little messy, like this:
I have two questions. First, is there any way the CSV output can look like the first picture? Second, I want the movie title to be scraped too. Please give me a hint or some code that I can use to scrape the movie title and the contents.
UPDATE
The problem has been solved perfectly by Tarun Lalwani. But now the CSV file's header only contains the categories from the first scraped URL. For example, when I scrape this webpage, which has the categories References, Referenced in, Features, Featured in and Spoofed in, and this webpage, which has the categories Follows, Followed by, Edited from, Edited into, Spin-off, References, Referenced in, Features, Featured in, Spoofs and Spoofed in, the CSV output header only contains the first webpage's categories, i.e. References, Referenced in, Features, Featured in and Spoofed in. Categories that appear only on the second webpage, such as Follows, Followed by, Edited from, Edited into and Spoofs, are missing from the output CSV header, and so are their contents.
Here is the code i used:
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["imdb.com"]
    start_urls = (
        'http://www.imdb.com/title/tt0093777/trivia?tab=mc&ref_=tt_trv_cnn',
        'http://www.imdb.com/title/tt0096874/trivia?tab=mc&ref_=tt_trv_cnn',
    )

    def parse(self, response):
        item = {}
        for cnt, h4 in enumerate(response.css('div.list > h4.li_group'), start=1):
            item['Title'] = response.css("h3[itemprop='name'] a::text").extract_first()
            key = h4.xpath('normalize-space()').get().strip()
            if key in ['Follows', 'Followed by', 'Edited into', 'Spun-off from', 'Spin-off', 'Referenced in',
                       'Featured in', 'Spoofed in', 'References', 'Spoofs', 'Version of', 'Remade as', 'Edited from',
                       'Features']:
                values = h4.xpath('following-sibling::div[count(preceding-sibling::h4)=$cnt]', cnt=cnt).xpath(
                    'string(.//a)').getall(),
                item[key] = values
        yield item
and here is exporters.py file:
try:
    from itertools import zip_longest as zip_longest
except:
    from itertools import izip_longest as zip_longest
from scrapy.exporters import CsvItemExporter
from scrapy.conf import settings

class NewLineRowCsvItemExporter(CsvItemExporter):
    def __init__(self, file, include_headers_line=True, join_multivalued=',', **kwargs):
        super(NewLineRowCsvItemExporter, self).__init__(file, include_headers_line, join_multivalued, **kwargs)

    def export_item(self, item):
        if self._headers_not_written:
            self._headers_not_written = False
            self._write_headers_and_set_fields_to_export(item)
        fields = self._get_serialized_fields(item, default_value='',
                                             include_empty=True)
        values = list(self._build_row(x for _, x in fields))
        values = [
            (val[0] if len(val) == 1 and type(val[0]) in (list, tuple) else val)
            if type(val) in (list, tuple)
            else (val, )
            for val in values]
        multi_row = zip_longest(*values, fillvalue='')
        for row in multi_row:
            self.csv_writer.writerow([unicode(s).encode("utf-8") for s in row])
What I'm trying to achieve is to have all of these categories in the CSV output header:
'Follows', 'Followed by', 'Edited into', 'Spun-off from', 'Spin-off', 'Referenced in',
'Featured in', 'Spoofed in', 'References', 'Spoofs', 'Version of', 'Remade as', 'Edited from', 'Features'
Any help would be appreciated.
You can extract the title using the following:
item = {}
item['Title'] = response.css("h3[itemprop='name'] a::text").extract_first()
For the CSV part, you would need to create a custom feed exporter that can split each row into multiple rows:
from itertools import zip_longest
# scrapy.contrib.exporter is the old location; current Scrapy exposes this as scrapy.exporters
from scrapy.exporters import CsvItemExporter

class NewLineRowCsvItemExporter(CsvItemExporter):
    def __init__(self, file, include_headers_line=True, join_multivalued=',', **kwargs):
        super(NewLineRowCsvItemExporter, self).__init__(file, include_headers_line, join_multivalued, **kwargs)

    def export_item(self, item):
        if self._headers_not_written:
            self._headers_not_written = False
            self._write_headers_and_set_fields_to_export(item)
        fields = self._get_serialized_fields(item, default_value='',
                                             include_empty=True)
        values = list(self._build_row(x for _, x in fields))
        # Normalize every value to a sequence so the columns can be zipped together
        values = [
            (val[0] if len(val) == 1 and type(val[0]) in (list, tuple) else val)
            if type(val) in (list, tuple)
            else (val, )
            for val in values]
        # One output row per position; shorter columns are padded with ''
        multi_row = zip_longest(*values, fillvalue='')
        for row in multi_row:
            self.csv_writer.writerow(row)
Then you need to assign the feed exporter in your settings
FEED_EXPORTERS = {
'csv': '<yourproject>.exporters.NewLineRowCsvItemExporter',
}
This assumes you put the code in an exporters.py file. The output will then be as desired.
Edit-1
To set the fields and their order you will need to define FEED_EXPORT_FIELDS in your settings.py
FEED_EXPORT_FIELDS = ['Title', 'Follows', 'Followed by', 'Edited into', 'Spun-off from', 'Spin-off', 'Referenced in',
'Featured in', 'Spoofed in', 'References', 'Spoofs', 'Version of', 'Remade as', 'Edited from',
'Features']
https://doc.scrapy.org/en/latest/topics/feed-exports.html#std:setting-FEED_EXPORT_FIELDS
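The two ideas in this answer, a fixed list of header fields and splitting multi-valued fields across rows with zip_longest, can be illustrated without Scrapy. A minimal sketch using only the standard library; item_to_rows, the shortened FIELDS list and the sample item are illustrative assumptions, not part of the exporter above.

```python
from itertools import zip_longest

# Shortened stand-in for FEED_EXPORT_FIELDS: the header is fixed up front,
# so a category missing from one page still gets its (empty) column.
FIELDS = ["Title", "Follows", "References"]

def item_to_rows(item, fields=FIELDS):
    """Expand one item with list-valued fields into several CSV rows."""
    columns = []
    for name in fields:
        value = item.get(name, "")
        # Wrap single values so every column is a sequence.
        if not isinstance(value, (list, tuple)):
            value = (value,)
        columns.append(value)
    # Shorter columns are padded with '' so all rows have the same width.
    return [list(row) for row in zip_longest(*columns, fillvalue="")]

# Made-up item: this page has no "Follows" category.
item = {"Title": "Movie A", "References": ["Film X", "Film Y"]}
rows = item_to_rows(item)
```

Because the column list comes from FIELDS rather than from the first item seen, the header no longer depends on which page is scraped first.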
One of the easiest ways to set the CSV data format is to clean the data using Excel Power Query. Follow these steps:
1: Open the CSV file in Excel.
2: Select all values using Ctrl+A.
3: Click Table on the Insert tab and create a table.
4: After creating the table, click Data in the top menu and select From Table.
5: This opens Power Query in a new Excel window.
6: Select any column and click Split Column.
7: From Split Column, select By Delimiter.
8: Now select the delimiter, such as comma or space.
9: As the final step, open the advanced options, where there are two choices: split into rows or into columns.
10: You can do all kinds of data cleaning with Power Query. It is the easiest way to set up the data format according to your needs.