How to get non-csv lines in csv file - python

I have a csv like:
"Equipment","LNKEQP","METAST","METSER","MODSTA","METEOD"
"HLL_POS_00098",1,1,0,0,0
"TOY_GAT_00003",0,0,0,3,0
"NAT_POS_00010",0,3,0,3,0
"NAT_GAT_00002",0,0,0,0,0
"NAT_GAT_00001",0,0,0,4,0
A machine A is unavailable
I use the following code to read that CSV file:
reader = csv.DictReader(f)
s=[]
for row in reader:
But none of the rows contains "A machine A is unavailable". How can I get this line too and produce output like this example?
'METEOD': '0', 'MODSTA': '0', 'METSER': '0', 'LNKEQP': '0', 'METAST': '0', 'Equipment': 'NAT_VCF_00001'
'METEOD': '0', 'MODSTA': '0', 'METSER': '0', 'LNKEQP': '1', 'METAST': '1', 'Equipment': 'NAT_TVM_00002'
A machine A is unavailable
Thanks for your help.

Remove the offending lines before parsing them:
import csv
from StringIO import StringIO
i = """"Equipment","LNKEQP","METAST","METSER","MODSTA","METEOD"
"HLL_POS_00098",1,1,0,0,0
"TOY_GAT_00003",0,0,0,3,0
"NAT_POS_00010",0,3,0,3,0
"NAT_GAT_00002",0,0,0,0,0
"NAT_GAT_00001",0,0,0,4,0
A machine A is unavailable
"""
# Take only those lines that contain a comma.
j = "".join([line for line in StringIO(i).readlines() if ',' in line])
# Parse the taken lines as CSV.
reader = csv.reader(StringIO(j))
for line in reader:
    print line
Output:
['Equipment', 'LNKEQP', 'METAST', 'METSER', 'MODSTA', 'METEOD']
['HLL_POS_00098', '1', '1', '0', '0', '0']
['TOY_GAT_00003', '0', '0', '0', '3', '0']
['NAT_POS_00010', '0', '3', '0', '3', '0']
['NAT_GAT_00002', '0', '0', '0', '0', '0']
['NAT_GAT_00001', '0', '0', '0', '4', '0']
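If the non-CSV line has to be kept rather than dropped, the same comma test can route it into a separate list. This is only a sketch in Python 3, assuming (as above) that any line without a comma is a free-text note; the helper name split_csv_and_notes is made up for illustration.

```python
import csv
import io


def split_csv_and_notes(text):
    """Return (rows, notes): CSV rows as dicts, other lines as raw strings.

    Assumption: a line without a comma is not CSV data.
    """
    csv_lines, notes = [], []
    for line in text.splitlines():
        if ',' in line:
            csv_lines.append(line)
        elif line.strip():
            notes.append(line)
    # Parse only the lines that looked like CSV.
    rows = list(csv.DictReader(io.StringIO("\n".join(csv_lines))))
    return rows, notes


sample = '''"Equipment","LNKEQP","METAST","METSER","MODSTA","METEOD"
"HLL_POS_00098",1,1,0,0,0
"TOY_GAT_00003",0,0,0,3,0
A machine A is unavailable
'''

rows, notes = split_csv_and_notes(sample)
for row in rows:
    print(dict(row))
for note in notes:
    print(note)
```

The notes are printed after all the rows here; if their original position matters, the line index would have to be stored alongside each note.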

Related

Sort list of strings numerically and filter duplicates?

Given a list of strings in the following format:
[
"464782,-100,4,3,1,100,0,0",
"465042,-166.666666666667,4,3,1,100,0,0",
"465825,-250.000000000001,4,3,1,100,0,0",
"466868,-166.666666666667,4,3,1,100,0,0",
"467390,-200.000000000001,4,3,1,100,0,0",
"469999,-100,4,3,1,100,0,0",
"470260,-166.666666666667,4,3,1,100,0,0",
"474173,-100,4,3,1,100,0,0",
"474434,-166.666666666667,4,3,1,100,0,0",
"481477,-100,4,3,1,100,0,1",
"531564,259.011439671919,4,3,1,60,1,0",
"24369,-333.333333333335,4,3,1,100,0,0",
"21082,410.958904109589,4,3,1,60,1,0",
"21082,-250,4,3,1,100,0,0",
"22725,-142.857142857143,4,3,1,100,0,0",
"23547,-166.666666666667,4,3,1,100,0,0",
"24369,-333.333333333335,4,3,1,100,0,0",
"27657,-200.000000000001,4,3,1,100,0,0",
"29301,-142.857142857143,4,3,1,100,0,0",
"30123,-166.666666666667,4,3,1,100,0,0",
"30945,-250,4,3,1,100,0,0",
"32588,-166.666666666667,4,3,1,100,0,0",
"34232,-250,4,3,1,100,0,0",
"35876,-142.857142857143,4,3,1,100,0,0",
"36698,-166.666666666667,4,3,1,100,0,0",
"37520,-250,4,3,1,100,0,0",
"42451,-142.857142857143,4,3,1,100,0,0",
"43273,-166.666666666667,4,3,1,100,0,0",
]
How can I sort the list based on the first number in each line with python?
And then, once sorted, remove all duplicates, if any are there?
The sorting criteria for the list is the number before the first comma in each line, which is always an integer.
I tried using list.sort(); however, this sorts the items in lexical order, not numerically.
You could use a dictionary for this. The key will be the number before the first comma and the value the entire string. Duplicates will be eliminated, but only the last occurrence of a particular number's string is stored.
l = ['464782,-100,4,3,1,100,0,0',
'465042,-166.666666666667,4,3,1,100,0,0',
'465825,-250.000000000001,4,3,1,100,0,0',
'466868,-166.666666666667,4,3,1,100,0,0',
'467390,-200.000000000001,4,3,1,100,0,0',
...]
d = {int(s.split(',')[0]) : s for s in l}
result = [d[key] for key in sorted(d.keys())]
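If lines that share a first number but differ elsewhere (such as the two 21082 entries) should both survive, a variant is to drop only exact duplicates and sort with a numeric key. A minimal sketch, assuming Python 3.7+ so that dict.fromkeys keeps first-seen order; the name sort_and_dedupe and the shortened sample are made up for illustration.

```python
def sort_and_dedupe(lines):
    """Drop exact duplicate strings, then sort by the integer before the first comma."""
    unique = dict.fromkeys(lines)  # keeps the first occurrence of each string
    return sorted(unique, key=lambda s: int(s.split(',', 1)[0]))


sample = [
    "464782,-100,4",
    "24369,-333.3,4",
    "21082,410.9,4",
    "21082,-250,4",
    "24369,-333.3,4",  # exact duplicate of the second item
]

result = sort_and_dedupe(sample)
print(result)
```

Because sorted() is stable, lines with the same first number stay in the order in which they first appeared.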
I would try one of these two methods:
def sort_list(lis):
    # str.isdigit() is a method on the string, not a built-in function.
    nums = [int(num) if num.isdigit() else float(num) for num in lis]
    nums = list(set(nums))
    nums.sort()
    return [str(i) for i in nums]  # I assumed you wanted them to be strings.
The first will raise a ValueError if any item in lis is not the string representation of a number. The second method converts the integers and the floats in separate passes, which makes it a bit wonkier; it raises the same error on non-numeric strings.
def sort_list(lis):
    ints = [int(num) for num in lis if num.isdigit()]
    floats = [float(num) for num in lis if not num.isdigit()]
    nums = ints.copy()
    nums.extend(floats)
    nums = list(set(nums))
    nums.sort()
    return [str(i) for i in nums]  # I assumed you wanted them to be strings.
Hope this helps.
You can try this.
First, we remove the duplicates from the list using set():
removed_duplicates_list = list(set(listr))
Then we convert the list of strings into a list of tuples:
list_of_tuples = [tuple(i.split(",")) for i in removed_duplicates_list]
Then we sort it using sort(). The fields are still strings, so the comparison is lexical, which happens to match numeric order for this data:
list_of_tuples.sort()
The complete code sample is below:
listr = [
"464782,-100,4,3,1,100,0,0",
"465042,-166.666666666667,4,3,1,100,0,0",
"465825,-250.000000000001,4,3,1,100,0,0",
"466868,-166.666666666667,4,3,1,100,0,0",
"467390,-200.000000000001,4,3,1,100,0,0",
"469999,-100,4,3,1,100,0,0",
"470260,-166.666666666667,4,3,1,100,0,0",
"474173,-100,4,3,1,100,0,0",
"474434,-166.666666666667,4,3,1,100,0,0",
"481477,-100,4,3,1,100,0,1",
"531564,259.011439671919,4,3,1,60,1,0",
"24369,-333.333333333335,4,3,1,100,0,0",
"21082,410.958904109589,4,3,1,60,1,0",
"21082,-250,4,3,1,100,0,0",
"22725,-142.857142857143,4,3,1,100,0,0",
"23547,-166.666666666667,4,3,1,100,0,0",
"24369,-333.333333333335,4,3,1,100,0,0",
"27657,-200.000000000001,4,3,1,100,0,0",
"29301,-142.857142857143,4,3,1,100,0,0",
"30123,-166.666666666667,4,3,1,100,0,0",
"30945,-250,4,3,1,100,0,0",
"32588,-166.666666666667,4,3,1,100,0,0",
"34232,-250,4,3,1,100,0,0",
"35876,-142.857142857143,4,3,1,100,0,0",
"36698,-166.666666666667,4,3,1,100,0,0",
"37520,-250,4,3,1,100,0,0",
"42451,-142.857142857143,4,3,1,100,0,0",
"43273,-166.666666666667,4,3,1,100,0,0",
]
removed_duplicates_list = list(set(listr))
list_of_tuples = [tuple(i.split(",")) for i in removed_duplicates_list]
list_of_tuples.sort()
print(list_of_tuples) # the output is a list of tuples
OUTPUT:
[('21082', '-250', '4', '3', '1', '100', '0', '0'),
('21082', '410.958904109589', '4', '3', '1', '60', '1', '0'),
('22725', '-142.857142857143', '4', '3', '1', '100', '0', '0'),
('23547', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
('24369', '-333.333333333335', '4', '3', '1', '100', '0', '0'),
('27657', '-200.000000000001', '4', '3', '1', '100', '0', '0'),
('29301', '-142.857142857143', '4', '3', '1', '100', '0', '0'),
('30123', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
('30945', '-250', '4', '3', '1', '100', '0', '0'),
('32588', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
('34232', '-250', '4', '3', '1', '100', '0', '0'),
('35876', '-142.857142857143', '4', '3', '1', '100', '0', '0'),
('36698', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
('37520', '-250', '4', '3', '1', '100', '0', '0'),
('42451', '-142.857142857143', '4', '3', '1', '100', '0', '0'),
('43273', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
('464782', '-100', '4', '3', '1', '100', '0', '0'),
('465042','-166.666666666667','4','3','1','100','0','0'),
('465825', '-250.000000000001', '4', '3', '1', '100', '0', '0'),
('466868', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
('467390', '-200.000000000001', '4', '3', '1', '100', '0', '0'),
('469999', '-100', '4', '3', '1', '100', '0', '0'),
('470260', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
('474173', '-100', '4', '3', '1', '100', '0', '0'),
('474434', '-166.666666666667', '4', '3', '1', '100', '0', '0'),
('481477', '-100', '4', '3', '1', '100', '0', '1'),
('531564', '259.011439671919', '4', '3', '1', '60', '1', '0')]
I hope this will help too.
I put all your list elements in a separate file named lista.txt.
In this example I read the list from that file. I like to stay organized and keep the data in a separate file, but you can keep it in the Python script as well. The idea is to go through the elements one by one (with a while or for loop) and add each one to a temporary list only if it is not already there; duplicates are skipped. After that you can simply use .sort(), which does the trick here even though the items are strings: for this data, lexical order matches numeric order.
# Global variables
file = "lista.txt"
tempList = []
# Logic: get items from file
def GetListFromFile(fileName):
    # Local variables
    showDoneMsg = True
    # Try to run this code
    try:
        # Open the file and try to read it
        with open(fileName, mode="r") as f:
            # Read the first line
            line = f.readline()
            # For every line in the file
            while line:
                # Strip trailing whitespace (\n, \r)
                item = line.rstrip()
                # Check that this item is not already in the list
                if item not in tempList:
                    # Append the item to the temporary list
                    tempList.append(item)
                # Report items that already exist
                else:
                    print("Dublicate >>", item)
                # Go to the next line
                line = f.readline()
            # This is optional because it is called automatically,
            # but I like to be sure
            f.close()
    # Exceptions
    except FileNotFoundError:
        print("ERROR >> File does not exist!")
        showDoneMsg = False
    # Sort the list
    tempList.sort()
    # Show when it is done, if the file exists
    if showDoneMsg == True:
        print("\n>>> DONE <<<\n")
# Logic: show list items
def ShowListItems(thisList):
    if len(thisList) == 0:
        print("Temporary list is empty...")
    else:
        print("This is new items list:")
        for i in thisList:
            print(i)
# Execute function
GetListFromFile(file)
# Check that the items were sorted
ShowListItems(tempList)
Output:
========================= RESTART: D:\Python\StackOverflow\help.py =========================
Dublicate >> 43273,-166.666666666667,4,3,1,100,0,0
>>> DONE <<<
21082,-250,4,3,1,100,0,0
21082,410.958904109589,4,3,1,60,1,0
22725,-142.857142857143,4,3,1,100,0,0
...
474434,-166.666666666667,4,3,1,100,0,0
481477,-100,4,3,1,100,0,1
531564,259.011439671919,4,3,1,60,1,0
>>>

How to reverse multiple lists?

scores=open('scores.csv','r')
for score in scores.readlines():
    score = score.strip()
    rev=[]
    for s in reversed(score[0:]):
        rev.append(s)
    print(rev)
This is my code. What I want to do is print each row of scores.csv reversed.
If I print scores at the beginning, the result is:
['0.74,0.63,0.58,0.89\n', '0.91,0.89,0.78,0.99\n', '0.43,0.35,0.34,0.45\n', '0.56,0.61,0.66,0.58\n', '0.50,0.49,0.76,0.72\n', '0.88,0.75,0.61,0.78\n']
It looks normal, and if I print score after I remove all \n in the list, the result is:
0.74,0.63,0.58,0.89
0.91,0.89,0.78,0.99
0.43,0.35,0.34,0.45
0.56,0.61,0.66,0.58
0.50,0.49,0.76,0.72
0.88,0.75,0.61,0.78
It still looks OK, but if I print rev at the end of the code, it shows:
['9', '8', '.', '0', ',', '8', '5', '.', '0', ',', '3', '6', '.', '0', ',', '4', '7', '.', '0']
['9', '9', '.', '0', ',', '8', '7', '.', '0', ',', '9', '8', '.', '0', ',', '1', '9', '.', '0']
['5', '4', '.', '0', ',', '4', '3', '.', '0', ',', '5', '3', '.', '0', ',', '3', '4', '.', '0']
['8', '5', '.', '0', ',', '6', '6', '.', '0', ',', '1', '6', '.', '0', ',', '6', '5', '.', '0']
['2', '7', '.', '0', ',', '6', '7', '.', '0', ',', '9', '4', '.', '0', ',', '0', '5', '.', '0']
['8', '7', '.', '0', ',', '1', '6', '.', '0', ',', '5', '7', '.', '0', ',', '8', '8', '.', '0']
It looks like Python converts my result from decimal to integer, but when I try to use float(s) to convert it back, it gives me an error. What is wrong with my code?
In your approach, score is a string, so it's doing exactly what you tell it to: reverse the entire line character by character. You can do one of two things:
1. Use the csv module to read your CSV file (recommended), to get a list of float values, then reverse that.
2. Split your line on commas, then reverse that list, and finally stitch it back together. An easy way to reverse a list in Python is mylist[::-1].
For number 2, it would be something like:
score = score.strip()
temp = score.split(',')
temp_reversed = temp[::-1]
score_reversed = ','.join(temp_reversed)
Always use the csv module to read CSV files. This module parses the data, splits it on commas, etc.
Your attempt just reverses the line character by character. I'd rewrite it completely using the csv module, which yields the tokens already split by comma (the default delimiter):
import csv
with open('scores.csv','r') as scores:
    cr = csv.reader(scores)
    rev = []
    for row in cr:
        rev.append(list(reversed(row)))
That doesn't convert the data to float. That said, I'd replace the loop with a comprehension plus float conversion:
rev = [[float(x) for x in reversed(row)] for row in cr]
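Put together, the csv-based approach might look like the sketch below. It reads from an in-memory io.StringIO so that it runs without scores.csv; the helper name reversed_rows is an assumption made for illustration, and with a real file the same function would be called inside the with open(...) block.

```python
import csv
import io


def reversed_rows(fileobj):
    """Parse CSV rows into floats and reverse each row; blank lines are skipped."""
    return [[float(x) for x in reversed(row)]
            for row in csv.reader(fileobj) if row]


# Stand-in for the first two lines of scores.csv.
sample = io.StringIO("0.74,0.63,0.58,0.89\n0.91,0.89,0.78,0.99\n")

rev = reversed_rows(sample)
for row in rev:
    print(row)
```

The reader has to be consumed while the file is still open, which is why the comprehension lives in a function that receives the open file object.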

Search of elements inside a big CSV file using Python

I'm trying to filter a CSV file and get the fifth value of a list that is inside another list, but I keep getting an index out of range error.
import csv
from operator import itemgetter
teste=[]
f = csv.reader(open('power_supply_info.csv'), delimiter =',' )
for word in f:
    teste.append(word)
#print teste
#print ('\n')
print map( itemgetter(5), teste)
But I got this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\rafael.paiva\Dev\Python2.7\WinPython-64bit-2.7.6.4\python-2.7.6.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "C:/Users/rafael.paiva/Desktop/Rafael/CSV.py", line 24, in <module>
print map( itemgetter(5), teste)
IndexError: list index out of range
The contents of the "word" variable, appended to "teste" at each step, are:
[['2015-12-31-21:02:30.754271', '25869', '500000', 'Unknown', '1', '0', '4790780', '1', '0', '0', '375', '0', '-450060', '-326040', '3437000', 'Normal', 'N/A', '93', 'Good', '19', '1815372', 'Unknown', 'Charging', '4195078', '4440000', '4208203', '4171093', '0', '44290', 'Li-ion', '95', '1', '3000000', '1', '375', '-450060', '-326040', '3437000', '93', 'Good', '1815372', '4195000', '4440000', '4208203', '4165625', '0', '44290', '95', '3000000', '1', ''],
['2015-12-31-21:03:30.910972', '25930', '500000', 'Unknown', '1', '0', '4794730', '1', '0', '0', '377', '0', '55692', '107328', '3437000', 'Normal', 'N/A', '92', 'Good', '19', '1814234', 'Unknown', 'Charging', '4200390', '4440000', '4207734', '4214062', '0', '41200', 'Li-ion', '95', '1', '3000000', '1', '377', '55692', '107328', '3437000', '92', 'Good', '1814234', '4200390', '4440000', '4207734', '4214062', '0', '41200', '95', '3000000', '1', '']]
Can someone help me with this, please?
You should add some diagnostics to your loop. This will help to show you where a problem might be in your CSV file:
import csv
from operator import itemgetter
teste = []
with open('power_supply_info.csv', 'rb') as f_input:
    for line, words in enumerate(csv.reader(f_input, delimiter =',' ), start=1):
        if len(words) <= 5:
            print "Line {} only has {} elements".format(line, len(words))
        teste.append(words)
print map(itemgetter(5), teste)
It is likely that one of your lines is either blank or has too few entries. This script will list which line numbers have problems.
I don't know what's in your power_supply_info.csv file, but it's clear what you have after csv.reader has done its job:
a list with 2 lists (ie: 2 elements)
That's why you get an error accessing the 5th element, there are only 2
A possible approach for your problem:
import csv
f = csv.reader(open('power_supply_info.csv'), delimiter =',' )
# First iterate over the rows and then get each list in the row
teste = [x for x in (row for row in f)]
print map(lambda x: x[5], teste)
The real challenge would be to see the input you have in your csv file to understand why you end up with those 2 lists inside a list.
Note: In case your output belongs to teste and not to word, the code could be:
import csv
f = csv.reader(open('power_supply_info.csv'), delimiter =',' )
teste = [row for row in f]
print [x[5] for x in teste]
Best regards
The code you show works correctly with the data sample you have provided:
In [8]: l = [['2015-12-31-21:02:30.754271', '25869', '500000', 'Unknown', '1', '0', '4790780', '1', '0', '0'],
...: ['2015-12-31-21:03:30.910972', '25930', '500000', 'Unknown', '1', '0', '4794730', '1', '0', '0']]
In [9]: list(map(itemgetter(5),l))
Out[9]: ['0', '0']
I suspect that one line (probably the last line) in your CSV file is blank, therefore the last element of teste is actually an empty list, and therefore itemgetter(5) fails for that last line.
Instead of cramming everything into a single line, try
for item in teste:
    if item:
        print item[5]
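The length check and the extraction can also be combined so that short or blank rows are reported instead of raising IndexError. This is only a sketch in Python 3 with made-up in-memory data; the helper name column_values is an assumption, and the reported numbers count rows yielded by csv.reader, starting at 1.

```python
import csv
import io


def column_values(fileobj, index):
    """Return (values, short_rows): the field at `index` for each row that
    has one, plus the 1-based numbers of rows that are too short."""
    values, short_rows = [], []
    for row_no, row in enumerate(csv.reader(fileobj), start=1):
        if len(row) > index:
            values.append(row[index])
        else:
            short_rows.append(row_no)
    return values, short_rows


# Made-up data: row 2 is blank and row 3 has only three fields.
sample = io.StringIO("a,b,c,d,e,f,g\n\n1,2,3\nu,v,w,x,y,z\n")

values, short_rows = column_values(sample, 5)
print(values)
print(short_rows)
```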

Doctest not running tests

import doctest
def create_grid(size):
    grid = []
    for i in range(size):
        row = ['0']*size
        grid.append(row)
    """
    >>> create_grid(4)
    [['0', '0', '0', '0'], ['0', '0', '0', '0'],
    ['0', '0', '0', '0'], ['0', '0', '0', '0']]
    """
    return grid
if __name__ == '__main__':
    doctest.testmod()
Running the above with python Test_av_doctest.py -v gives the following message:
2 items had no tests:
__main__
__main__.create_grid
0 tests in 2 items.
0 passed and 0 failed.
Test passed.
Any idea why no tests are run?
The issue is that your doctest-formatted string isn't a docstring.
Which docstrings are examined?
The module docstring, and all function, class and method docstrings are searched.
If you move the testing string directly below the def line, before any other statement, it will become a function docstring, and thus will be targeted by doctest:
import doctest
def create_grid(size):
    """
    >>> create_grid(4)  # doctest: +NORMALIZE_WHITESPACE
    [['0', '0', '0', '0'], ['0', '0', '0', '0'],
    ['0', '0', '0', '0'], ['0', '0', '0', '0']]
    """
    # NORMALIZE_WHITESPACE lets the expected output wrap over two lines.
    grid = []
    for i in range(size):
        row = ['0']*size
        grid.append(row)
    return grid
if __name__ == '__main__':
    doctest.testmod()
$ python Test_av_doctest.py -v
...
1 passed and 0 failed.
Test passed.
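A quick way to check whether doctest will see a string is to look at the function's __doc__ attribute, or to ask doctest.DocTestFinder how many examples it finds. A small sketch; the two function names and the count_examples helper are made up for illustration.

```python
import doctest


def with_docstring(size):
    """
    >>> with_docstring(2)
    [['0', '0'], ['0', '0']]
    """
    return [['0'] * size for _ in range(size)]


def with_stray_string(size):
    grid = [['0'] * size for _ in range(size)]
    """
    >>> with_stray_string(2)
    [['0', '0'], ['0', '0']]
    """
    return grid


def count_examples(func):
    """Count the doctest examples that DocTestFinder collects for func."""
    finder = doctest.DocTestFinder()
    return sum(len(test.examples) for test in finder.find(func))


print(with_docstring.__doc__ is not None, count_examples(with_docstring))
print(with_stray_string.__doc__ is None, count_examples(with_stray_string))
```

A string that is not the first statement in the function body is just an expression that gets evaluated and discarded, so __doc__ stays None and the finder has nothing to collect.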

Value Error in python numpy

I am trying to remove rows with ['tempn', '0', '0'] in it. Rows with ['tempn', '0'] should not be removed however.
my_input = [array([['temp1', '0', '32k'],
['temp2', '13C', '0'],
['temp3', '0', '465R'],
['temp4', '0', '0'],
['temp5', '22F', '0'],
['temp6', '0', '-15C'],
['temp7', '0', '0'],
['temp8', '212K', '1'],
['temp9', '32C', '0'],
['temp10', '0', '19F']],
dtype='|S2'), array([['temp1', '15K'],
['temp2', '0'],
['temp3', '16C'],
['temp4', '0'],
['temp5', '22F'],
['temp6', '0'],
['temp7', '457R'],
['temp8', '305K'],
['temp9', '0'],
['temp10', '0']], dtype='|S2')]
Based on a previous question, I tried
my_output = []
for c in my_input:
    my_output.remove(c[np.all(c[:, 1:] == '1', axis = 1)])
However, I got a ValueError saying that the truth value of an array with more than one element is ambiguous. Thanks!
The trick is to compare the elements individually rather than both at the same time, which is probably why you were getting the error.
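import numpy as np
final_out = []
for item1 in my_input:
    my_output = []
    for item2 in item1:
        try:
            # Keep the row unless both value columns are '0'.
            if item2[1] != '0' or item2[2] != '0':
                my_output.append(item2)
        except IndexError:
            # Two-column rows have no item2[2]; keep them as they are.
            my_output.append(item2)
    final_out.append(np.array(my_output))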
This will preserve your list-of-arrays structure while removing the ['tempn', '0', '0'] rows.
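The same filter can also be written with a boolean mask instead of the inner loop. This is a sketch rather than a drop-in replacement: it assumes each array is two-dimensional and that only three-column arrays should be filtered; the helper name drop_all_zero_rows and the shortened sample arrays are made up for illustration.

```python
import numpy as np


def drop_all_zero_rows(arrays):
    """For three-column arrays, drop rows whose last two fields are both '0'.

    Arrays with any other number of columns are passed through unchanged.
    """
    result = []
    for arr in arrays:
        if arr.shape[1] == 3:
            # True for rows where at least one value column is not '0'.
            keep = ~np.all(arr[:, 1:] == '0', axis=1)
            arr = arr[keep]
        result.append(arr)
    return result


three_col = np.array([['temp1', '0', '32k'],
                      ['temp4', '0', '0'],
                      ['temp8', '212K', '1']])
two_col = np.array([['temp1', '15K'],
                    ['temp2', '0']])

filtered = drop_all_zero_rows([three_col, two_col])
for arr in filtered:
    print(arr.tolist())
```

The mask is built per array and then used for indexing, so no array is ever evaluated as a single truth value, which is what triggers the ambiguity error.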
