Creating lists from rows with different lengths in Python

I am trying to create a list for each column in python of my data that looks like this:
399.75833 561.572000000 399.75833 561.572000000 a_Fe I 399.73920 nm
399.78316 523.227000000 399.78316 523.227000000
399.80799 455.923000000 399.80799 455.923000000 a_Fe I 401.45340 nm
399.83282 389.436000000 399.83282 389.436000000
399.85765 289.804000000 399.85765 289.804000000
The problem is that each row of my data is a different length. Is there any way to pad the shorter rows with a space so that they are all the same length?
I would like my data to be in the form:
list one= [399.75833, 399.78316, 399.80799, 399.83282, 399.85765]
list two= [561.572000000, 523.227000000, 455.923000000, 389.436000000, 289.804000000]
list three= [a_Fe, " ", a_Fe, " ", " "]
This is the code I used to import the data into python:
fh = open('help.bsp').read()
the_list = []
for line in fh.split('\n'):
    print line.strip()
    splits = line.split()
    if len(splits) == 1 and splits[0] == line.strip():
        splits = line.strip().split(',')
    if splits:
        the_list.append(splits)

You need to use izip_longest to build your column lists, since the standard zip stops at the shortest of the sequences it is given.
from itertools import izip_longest
with open('workfile', 'r') as f:
    fh = f.readlines()
# Process all the rows line by line
rows = [line.strip().split() for line in fh]
# Use izip_longest to get all columns, with None filled into the blank spots
cols = [col for col in izip_longest(*rows)]
# Then run your type conversions for your final data lists
list_one = [float(i) for i in cols[2]]
list_two = [float(i) for i in cols[3]]
# Since you want " " instead of None for blanks
list_three = [i if i else " " for i in cols[4]]
Output:
>>> print list_one
[399.75833, 399.78316, 399.80799, 399.83282, 399.85765]
>>> print list_two
[561.572, 523.227, 455.923, 389.436, 289.804]
>>> print list_three
['a_Fe', ' ', 'a_Fe', ' ', ' ']
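For readers on Python 3, where izip_longest is named zip_longest, the same idea can be sketched as follows. This is only an illustration: the rows are pasted in from the question instead of being read from a file, and fillvalue is used to pad with " " directly rather than replacing None afterwards.

```python
from itertools import zip_longest

# Sample rows copied from the question; a real script would read them from a file.
raw = """399.75833 561.572000000 399.75833 561.572000000 a_Fe I 399.73920 nm
399.78316 523.227000000 399.78316 523.227000000
399.80799 455.923000000 399.80799 455.923000000 a_Fe I 401.45340 nm
399.83282 389.436000000 399.83282 389.436000000
399.85765 289.804000000 399.85765 289.804000000"""

# Split every non-empty line on whitespace.
rows = [line.split() for line in raw.splitlines() if line.strip()]

# fillvalue pads the short rows with " " instead of None.
cols = list(zip_longest(*rows, fillvalue=" "))

list_one = [float(x) for x in cols[0]]
list_two = [float(x) for x in cols[1]]
list_three = list(cols[4])
print(list_three)
```

Padding with fillvalue only makes sense for the text columns; the float conversion above is applied to columns that are present in every row.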

So, your lines are either whitespace delimited or comma delimited, and if comma delimited, the line contains no whitespace? (note that if len(splits)==1 is true, then splits[0]==line.strip() is also true). That's not the data you're showing, and not what you're describing.
To get the lists you want from the data you show:
with open('help.bsp') as h:
    the_list = [ line.strip().split() for line in h.readlines() ]
list_one = [ d[0] for d in the_list ]
list_two = [ d[1] for d in the_list ]
list_three = [ d[4] if len(d) > 4 else ' ' for d in the_list ]
If you're reading comma separated (or similarly delimited) files, I always recommend using the csv module - it handles a lot of edge cases that you may not have considered.
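As a rough illustration of that recommendation (Python 3), here is a sketch with an invented comma-separated sample; the data, the io.StringIO stand-in for a file, and the variable names are assumptions, not part of the question. It shows one of those edge cases: a quoted field that itself contains a comma.

```python
import csv
import io

# Invented comma-separated sample: ragged rows and a quoted field with a comma in it.
sample = io.StringIO(
    '399.75833,561.572,"a_Fe I, 399.73920 nm"\n'
    '399.78316,523.227\n'
)

rows = list(csv.reader(sample))

wavelengths = [float(r[0]) for r in rows]
# Short rows get " " in place of the missing third field.
labels = [r[2] if len(r) > 2 else " " for r in rows]
print(labels)
```

A plain line.split(',') would have cut the quoted label in two; csv.reader keeps it as one field.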

Related

I want to get rid of all the special characters like [(' and im really stuck in how to

def trackItems():
    cursor.execute("SELECT ItemsBought, COUNT(*) FROM Purchase GROUP BY ItemsBought")
    stock = []
    Graphs = cursor.fetchall()
    print(Graphs)
    separator = " "
    f = open("Stock.txt", "w")
    values = ','.join([str(i) for i in Graphs])
    f.write(values)
Output
('DONT', 1),('MY', 2),('PLEASE', 2)
How can I get rid of the opening and closing brackets and all the quotation marks? If anyone could help, it would be much appreciated.
You can remove a substring from a string by replacing it with "" using str.replace(substring, ""), e.g.
"(abcd(.ad".replace("(", "") #output: abcd.ad
Then you can just write this string to the file.
Code:
def trackItems():
    cursor.execute("SELECT ItemsBought, COUNT(*) FROM Purchase GROUP BY ItemsBought")
    stock = []
    Graphs = cursor.fetchall()
    print(Graphs)
    separator = " "
    f = open("Stock.txt", "w")
    values = ','.join([str(i) for i in Graphs]).replace("(", "").replace(")", "").replace("'", "")
    f.write(values)
Rather than thinking about this as a problem about special characters, think about it as flattening a sequence (list) of subsequences (tuples, the rows) into a list of individual elements, which can then be joined.
You could do this with a for loop:
>>> flattened = []
>>> for row in Graphs:
...     flattened.extend(row)
...
>>> flattened
['DONT', 1, 'MY', 2, 'PLEASE', 2]
but a list comprehension is more idiomatic:
>>> Graphs = [('DONT', 1),('MY', 2),('PLEASE', 2)]
>>> values = ','.join([str(i) for j in Graphs for i in j])
>>> print(values)
DONT,1,MY,2,PLEASE,2
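Another way to express the same flattening, shown here only as a sketch, is itertools.chain.from_iterable from the standard library. The rows are hard-coded from the output in the question instead of coming from cursor.fetchall().

```python
from itertools import chain

# Rows as they would come back from cursor.fetchall() in the question.
graphs = [('DONT', 1), ('MY', 2), ('PLEASE', 2)]

# chain.from_iterable walks each tuple in turn, yielding its elements one by one.
values = ','.join(str(item) for item in chain.from_iterable(graphs))
print(values)
```

This avoids the nested comprehension, which some people find hard to read, at the cost of an import.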

Match two sets (/lists) in Python

I have two sets that are like below
Set A:
(['African American and Japanese', 'Indian', 'Chinese'])
Set B:
(['African', 'American', 'African American', 'Chinese', 'Russian'])
I want the output to be (['African American', 'Chinese']) but my script gives me either just Chinese or African, American, Chinese (splits African and American, I know that's how my script is, but am not sure how to edit).
I tried this so far.
import csv
alist, blist = [], []
with open("sample.csv", "rb") as fileA:
    reader = csv.reader(fileA, delimiter=',')
    for row in reader:
        for row_str in row:
            alist.append(row_str)
#alist = alist.strip().split() #If I use this, it also prints African, but doesn't print African American.
with open("ethnicity.csv", "rb") as fileB:
    reader = csv.reader(fileB, delimiter='\n')
    for row in reader:
        blist += row
blist = [x.lower() for x in blist]
first_set = set(alist)
second_set = set(blist)
print [s for s in first_set if second_set in s]
EDIT:
Elements in Set A are not always separated by "and"; the separator could be anything else, or just a space.
You can rearrange the list, i.e. split each item that contains "and" as a substring, and then use the intersection method of set to get the items common to both lists.
Code:
def convert(input):
    output = []
    for i in input:
        for j in i.split("and"):
            output.append(j.strip())
    return output
a = ['African American and Japanese', 'Indian', 'Chinese']
b = ['African American', 'Chinese']
a = convert(a)
print a
b = convert(b)
print set(a).intersection(set(b))
Output:
set(['African American', 'Chinese'])
Is this helpful ?
If it could be any string (spaces included) separating the words, you can do something like this:
import re
sep = ' ; '
_a = sep.join(re.split(' [a-z]* ', sep.join(a)))
_b = sep.join(re.split(' [a-z]* ', sep.join(b)))
set(_b.split(sep)).intersection(_a.split(sep))
It won't work when ; separates two words inside an item of your lists, but I think it handles all cases where the separator is a non-capitalized word.
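If the separator really can be anything, another option is to skip splitting altogether and search each item of Set A for the phrases of Set B. The following is only a sketch (Python 3): match_phrases is a made-up helper name, and the rule "drop a phrase when it is contained in a longer phrase matched in the same item" is an assumption about what the asker wants, chosen because it reproduces the expected output.

```python
import re

def match_phrases(items, vocabulary):
    """Return the vocabulary phrases that occur as whole words in any item,
    dropping phrases that are contained in a longer phrase matched in the same item."""
    found = set()
    for item in items:
        # \b anchors the phrase on word boundaries, so 'African' does not match 'Africans'.
        hits = [p for p in vocabulary
                if re.search(r'\b' + re.escape(p) + r'\b', item)]
        for p in hits:
            if not any(p != q and p in q for q in hits):
                found.add(p)
    return found

set_a = ['African American and Japanese', 'Indian', 'Chinese']
set_b = ['African', 'American', 'African American', 'Chinese', 'Russian']

result = sorted(match_phrases(set_a, set_b))
print(result)
```

The comparison here is case-sensitive; lowercasing both lists first, as the question's script does for blist, would make it case-insensitive.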

Change the display of a list taken from a text file

I have this code written in Python:
with open ('textfile.txt') as f:
    list=[]
    for line in f:
        line = line.split()
        if line:
            line = [int(i) for i in line]
            list.append(line)
    print(list)
This reads integers from a text file and puts them in a list, but the result is:
[[10,20,34]]
However, I would like it to display like:
10 20 34
How to do this? Thanks for your help!
You probably just want to add the items to the list, rather than appending them:
with open('textfile.txt') as f:
    list = []
    for line in f:
        line = line.split()
        if line:
            list += [int(i) for i in line]
print " ".join([str(i) for i in list])
If you append a list to a list, you create a sub list:
a = [1]
a.append([2,3])
print a # [1, [2, 3]]
If you add it you get:
a = [1]
a += [2,3]
print a # [1, 2, 3]!
with open('textfile.txt') as f:
    lines = [x.strip() for x in f.readlines()]
print(' '.join(lines))
With an input file 'textfile.txt' that contains:
10
20
30
prints:
10 20 30
It sounds like you are trying to print a list of lists. The easiest way to do that is to iterate over it and print each list.
for line in list:
    print " ".join(str(i) for i in line)
Also, list is the name of a built-in type in Python (not a keyword), so try to avoid using it as a name for your own variables.
If you know that the file is not extremely long and you want a flat list of integers, you can read it all at once in two lines (one of which is the with open(...) line). To print it your way, convert the elements to strings and join them with ' '.join(...), like this:
#!python3
# Load the content of the text file as one list of integers.
with open('textfile.txt') as f:
    lst = [int(element) for element in f.read().split()]
# Print the formatted result.
print(' '.join(str(element) for element in lst))
Do not use the list identifier for your variables as it masks the name of the list type.
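To make the masking concrete, here is a small sketch (Python 3; shadowing_demo is a made-up function name). It checks that list is an ordinary built-in name and not a reserved keyword, and shows what goes wrong once a variable with that name hides the type.

```python
import builtins
import keyword

def shadowing_demo():
    list = [10, 20, 34]      # this local name now hides the built-in type
    try:
        list("abc")          # would normally build ['a', 'b', 'c']
    except TypeError as exc:
        return str(exc)

is_keyword = keyword.iskeyword("list")   # keywords cannot be assigned to at all
is_builtin = builtins.list is list       # at module level the name still means the type
message = shadowing_demo()

print(is_keyword, is_builtin, message)
```

Because list is not a keyword, Python accepts the assignment silently, and the error only shows up later when the type is needed again.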

Python - how to write strings to file without quotes and spaces?

Is it possible to write a list to a file as a string without quotes and spaces (for items of any type in the list)?
For example, I have this list:
['blabla', 10, 'something']
How can I write it to a file so that the line in the file becomes:
blabla,10,something
Currently, every time I write it to the file I get this:
'blabla', 10, 'something'
So I then need to replace ' and ' ' with nothing. Is there some trick so that I don't need to do the replacement every time?
This will work:
lst = ['blabla', 10, 'something']
# Open the file with a context manager
with open("/path/to/file", "a+") as myfile:
    # Convert all of the items in lst to strings (for str.join)
    lst = map(str, lst)
    # Join the items together with commas
    line = ",".join(lst)
    # Write to the file
    myfile.write(line)
Output in file:
blabla,10,something
Note however that the above code can be simplified:
lst = ['blabla', 10, 'something']
with open("/path/to/file", "a+") as myfile:
    myfile.write(",".join(map(str, lst)))
Also, you may want to add a newline to the end of the line you write to the file:
myfile.write(",".join(map(str, lst))+"\n")
This will cause each subsequent write to the file to be placed on its own line.
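If the items might themselves contain commas or quotes, the csv module is an alternative worth considering. The sketch below (Python 3) is an illustration, not part of the answer above; io.StringIO stands in for a real file, which would be opened with newline='' as the csv documentation recommends.

```python
import csv
import io

lst = ['blabla', 10, 'something']

# io.StringIO stands in for a real file opened with newline=''.
buffer = io.StringIO()
writer = csv.writer(buffer)

# writerow converts non-string items with str() and adds the line terminator itself.
writer.writerow(lst)

written = buffer.getvalue()
print(repr(written))
```

With the default settings the writer only adds quotes around fields that need them, and it ends each row with "\r\n".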
Did you try something like this?
yourlist = ['blabla', 10, 'something']
open('yourfile', 'a+').write(','.join([str(i) for i in yourlist]) + '\n')
Where
','.join(...) takes a list of strings and glues them together with a string (',')
and
[str(i) for i in yourlist] converts your list into a list of strings (in order to handle numbers)
Initialise an empty string j, then concatenate each item of the list to j in a for loop. This adds no spaces, and printing j shows the result without quotes. Note that it also leaves out the commas between the items.
j=''
for item in list:
    j = j + str(item)
print str(j)

Merging 2 lists and remove double entries

I have a piece of code that loads up 2 lists with this code:
with open('blacklists.bls', 'r') as f:
    L = [dnsbls.strip() for dnsbls in f]
with open('ignore.bls', 'r') as f2:
    L2 = [ignbls.strip() for ignbls in f2]
dnsbls contains:
list1
list2
list3
ignbls contains
list2
What I want to do is merge dnsbls and ignbls, remove any lines that appear more than once, and print the remaining ones with a "for" loop. I was thinking of something like:
for combinedlist in L3:
    print combinedlist
Which in the above example would print out:
list1
list3
You need to use sets instead of lists:
L3 = list(set(L).difference(L2))
Demonstration:
>>> L=['list1','list2','list3']
>>> L2=['list2']
>>> set(L).difference(L2)
set(['list1', 'list3'])
>>> list(set(L).difference(L2))
['list1', 'list3']
For your purposes you probably don't have to convert it back to a list again; you can iterate over the resulting set just fine.
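One caveat, sketched below in Python 3: difference keeps only the blacklist entries that are not ignored, whereas a literal reading of "merge both and remove anything that appears more than once" would be a symmetric difference. The two agree on the example in the question but not in general. The 'list4' entry is invented here to show where they diverge.

```python
blacklists = ['list1', 'list2', 'list3']
ignores = ['list2', 'list4']   # 'list4' is a made-up entry that is not blacklisted

# Entries of blacklists that are not in ignores.
only_blacklisted = sorted(set(blacklists).difference(ignores))

# Entries that appear in exactly one of the two lists.
in_exactly_one = sorted(set(blacklists).symmetric_difference(ignores))

print(only_blacklisted)
print(in_exactly_one)
```

For a blacklist with an ignore file, the plain difference is most likely what is wanted, since an ignored name that was never blacklisted should not end up in the result.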
If the ignore list is smaller than the blacklists (which is normally the case, I think), then build the set from the ignore list and filter the blacklist lines against it (untested):
with open('blacklists.bls') as bl, open('ignore.bls') as ig:
    ignored = set(line.strip() for line in ig)
    res = [line.strip() for line in bl if line.strip() not in ignored]
