This question has been edited to make more sense.
The original question is how to insert values into a numpy record array. I have had some success but still have an issue. Based on the website below, I have been inserting values into a record array.
Python code
instance_format={
'names' : ('name','offset'),
'formats' : ('U100','U30')}
instance=np.zeros(20,dtype=instance_format)
#I am placing values in the array similar to this
instance[0]['name']="Wire 1"
instance[1]['name']="Wire 2"
instance[2]['name']="Wire 3"
instance[0]['offset']="0x103"
instance[1]['offset']="0x104"
instance[2]['offset']="0x105"
#Here is the insertion statement that works
instance1 = np.insert(instance1,1,"Module one")
print(instance1)
Output
[('One Wire 1', '0x103')
('Module One', 'Module One')
('One Wire 2', '0x104')
('One Wire 3', '0x105')
So the insert statement works; however, it inserts the value into both the name and the offset fields. I want to insert it into the name field only. How do I do this?
Thanks
Your instance
In [470]: instance
Out[470]:
array([('', '', ''), ('', '', ''), ('', '', ''), ('', '', ''),
('', '', ''), ('', '', ''), ('', '', ''), ('', '', ''),
('', '', ''), ('', '', ''), ('', '', ''), ('', '', ''),
('', '', ''), ('', '', ''), ('', '', ''), ('', '', ''),
('', '', ''), ('', '', ''), ('', '', ''), ('', '', '')],
dtype=[('name', '<U100'), ('module', '<U100'), ('offset', '<U30')])
does not look like
['One Wire Instance 1', 'One Wire Instance 2', 'One Wire Instance 3']
Are you talking about one record of instance, which would display as
('One Wire Instance 1', 'One Wire Instance 2', 'One Wire Instance 3')
with each string being the name, module, and offset.
Or are these 3 strings e.g. instance['name'][:3], the 'name' field from 3 records?
Inserting a new record into the instance array is one thing, adding a new field to the array is quite another.
To use np.insert with a structured array, you need to provide a 1-element array with the correct dtype.
With your new instance:
In [580]: newone = np.array(("module one",'',''),dtype=instance.dtype)
In [581]: newone
Out[581]:
array(('module one', '', ''),
dtype=[('name', '<U100'), ('module', '<U100'), ('offset', '<U30')])
In [582]: np.insert(instance,1,newone)
Out[582]:
array([('Wire 1', '', '0x103'), ('module one', '', ''),
('Wire 2', '', '0x104'), ('Wire 3', '', '0x105')],
dtype=[('name', '<U100'), ('module', '<U100'), ('offset', '<U30')])
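For reference, here is a self-contained sketch of the same idea using the two-field dtype from the question. It assumes the goal is a new record whose name is set and whose offset stays empty; the variable names (new_record, result) are only illustrative.

```python
import numpy as np

# Two-field dtype as in the question.
dtype = np.dtype([('name', 'U100'), ('offset', 'U30')])

instance = np.zeros(3, dtype=dtype)
instance['name'] = ['Wire 1', 'Wire 2', 'Wire 3']
instance['offset'] = ['0x103', '0x104', '0x105']

# Build a blank 1-element record of the same dtype and fill only 'name'.
new_record = np.zeros(1, dtype=instance.dtype)
new_record['name'] = 'Module one'

# np.insert returns a new array; the original is left unchanged.
result = np.insert(instance, 1, new_record)
print(result)
```

Passing a bare string to np.insert makes numpy cast that string to the whole record, which is why it ended up in every field. Building the record first keeps the other fields empty.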
np.insert is just a function that performs these steps:
In [588]: instance2 = np.zeros((4,),dtype=instance.dtype)
In [589]: instance2[:1]=instance[:1]
In [590]: instance2[2:]=instance[1:3]
In [591]: instance2
Out[591]:
array([('Wire 1', '', '0x103'), ('', '', ''), ('Wire 2', '', '0x104'),
('Wire 3', '', '0x105')],
dtype=[('name', '<U100'), ('module', '<U100'), ('offset', '<U30')])
In [592]: instance2[1]=newone
In [593]: instance2
Out[593]:
array([('Wire 1', '', '0x103'), ('module one', '', ''),
('Wire 2', '', '0x104'), ('Wire 3', '', '0x105')],
dtype=[('name', '<U100'), ('module', '<U100'), ('offset', '<U30')])
It creates a new array of the correct target size, copies elements from the original array, and puts the new record into the empty slot.
I can't understand what you mean by:
I want to insert the name "Reserved" in the second element which would make the array have the following contents
['One Wire Instance 1','Reserved' , 'One Wire Instance 2', 'One Wire Instance 3']
Do you want:
instance[1] = 'Reserved','', ''
?
How would I create a dictionary from a CSV file if the key is the last field (index 9) in every row? For example:
,,,,,,,,,KEY_1
,,,,,,,,,KEY_1
,,,,,,,,,KEY_1
,,,,,,,,,KEY_2
,,,,,,,,,KEY_2
,,,,,,,,,KEY_2
,,,,,,,,,KEY_3
,,,,,,,,,KEY_3
,,,,,,,,,KEY_3
Is there a way to create a dictionary that would look like this:
dictt = {
'KEY_1':[,,,,,,,,], [,,,,,,,,], [,,,,,,,,],
'KEY_2':[,,,,,,,,], [,,,,,,,,], [,,,,,,,,],
'KEY_3':[,,,,,,,,], [,,,,,,,,], [,,,,,,,,],
}
I only have 6 months of self-taught Python and I am working through the growing pains. Any help is greatly appreciated. Thank you in advance.
In answer to your "is it possible" question, one must say "not quite", because no Python construct matches the syntax you show:
dictt = {
'KEY_1':[,,,,,,,,], [,,,,,,,,], [,,,,,,,,],
'KEY_2':[,,,,,,,,], [,,,,,,,,], [,,,,,,,,],
'KEY_3':[,,,,,,,,], [,,,,,,,,], [,,,,,,,,],
}
Entering this would be a syntax error, and no code can thus build the equivalent.
But if you actually mean, e.g.,
dictt = {
'KEY_1':[['','',,,,,,,], [,,,,,,,,], [,,,,,,,,]],
'KEY_2':[[,,,,,,,,], [,,,,,,,,], [,,,,,,,,]],
'KEY_3':[[,,,,,,,,], [,,,,,,,,], [,,,,,,,,]],
}
(and so on, replacing each ,, so that it has something inside, e.g. an empty string -- not gonna spend a long time editing this to fix it!-), then sure, it is possible.
E.g:
import collections
import csv
dictt = collections.defaultdict(list)
with open('some.csv') as f:
    r = csv.reader(f)
    for row in r:
        # key on the last field of the row, store the remaining fields
        dictt[row[-1]].append(row[:-1])
When this is done dictt will be an instance of collections.defaultdict (a subclass of dict) but you can use it as a dict. Or if you absolutely insist on its being a dict and not a subclass thereof (though there is no conceivably good reason to thus insist), follow up with
dictt = dict(dictt)
and voila, it's converted:-)
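To try the grouping without a file on disk, a small sketch along these lines should work. The sample rows and the name grouped are made up for illustration, and io.StringIO stands in for the opened CSV file.

```python
import collections
import csv
import io

# Hypothetical sample data; the last field of each row is the key.
sample = io.StringIO(
    "a,b,KEY_1\n"
    "c,d,KEY_1\n"
    "e,f,KEY_2\n"
)

grouped = collections.defaultdict(list)
for row in csv.reader(sample):
    if not row:
        # skip blank lines, which csv.reader yields as empty lists
        continue
    grouped[row[-1]].append(row[:-1])

# Optional: convert to a plain dict.
grouped = dict(grouped)
print(grouped)
```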
Another way:
txt='''\
,,,,,,,,,KEY_1
,,,,,,,,,KEY_1
,,,,,,,,,KEY_1
,,,,,,,,,KEY_2
,,,,,,,,,KEY_2
,,,,,,,,,KEY_2
,,,,,,,,,KEY_3
,,,,,,,,,KEY_3
,,,,,,,,,KEY_3
'''
import csv
result={}
for line in csv.reader(txt.splitlines()):
    result.setdefault(line[-1], []).append(line[:-1])
>>> result
{'KEY_1': [['', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '']], 'KEY_3': [['', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '']], 'KEY_2': [['', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', ''], ['', '', '', '', '', '', '', '', '']]}
Below is a sample of the typical contents of a CSV file.
['**05:32:55PM**', '', '', 'Event Description', '0', "89.0 near Some Street; Suburb Ext 3; in Town Park; [**Long 37\xb0 14' 34.8 E Lat 29\xb0", '']
['', '', '', '', '', "17' 29.1 S** ]", '']
['06:09:11PM', '', '', 'Event Description', '0', "89.0 near Someother Street; Suburb Ext 3; in Town Park; [Long 37\xb0 14' 34.9 E Lat 29\xb0", '']
['', '', '', '', '', "17' 29.1 S ]", '']
['Report Line Header ', '', '', '', '', '', '']
['HeaderX', ': HeaderY', '', 'HeaderZ', '', 'HeaderAA', '']
['From Date', ': 2014/01/17 06:00:00 AM', '', 'To Date : 2014/01/17 06:15:36 PM', '', 'HeaderBB', '']
['HeaderA', 'HeaderB', 'Header0', 'Header1', 'Header2', 'Header3', '']
['', '', '', '', 'Header 4', 'Header5', '']
From each line containing the date/time and the location (marked with ** ... **), I would like to extract just that information and ignore the rest.
Printing the results to the screen would be OK; ideally, I would create a new CSV containing only the time and lat/long.
If you really want to extract the data of this file formatted as in your example, then you could use the following since the data in every line has a list representation:
>>> import ast
>>> f = open('data.txt', 'r')
>>> lines = f.readlines()
>>> for line in lines:
...     list_representation_of_line = ast.literal_eval(line)
...     for element in list_representation_of_line:
...         if element.startswith('**') and element.endswith('**'):
...             print(list_representation_of_line)
...             # or print single fields, e.g. time_index = 0 or another index
...             # print(list_representation_of_line[time_index])
...             break
...
['**05:32:55PM**', '', '', 'Event Description', '0', "89.0 near Some Street; Suburb Ext 3; in Town Park; [**Long 37\xb0 14' 34.8 E Lat 29\xb0", '']
>>>
Otherwise, you should reformat your data as CSV.
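If the rows are already available as lists, one possible way to pull out the time and the coordinates is sketched below. It rests on several assumptions that the question does not confirm: the ** markers are not in the real file, the time is always in column 0, the location is in column 5, and it may continue in column 5 of the next row when that row's first column is empty. The names TIME_RE, COORD_RE and extract_events are hypothetical.

```python
import re

# Sample rows modeled on the question, with the ** markers removed.
rows = [
    ['05:32:55PM', '', '', 'Event Description', '0',
     "89.0 near Some Street; Suburb Ext 3; in Town Park; [Long 37\xb0 14' 34.8 E Lat 29\xb0", ''],
    ['', '', '', '', '', "17' 29.1 S ]", ''],
    ['Report Line Header ', '', '', '', '', '', ''],
]

TIME_RE = re.compile(r'^\d{2}:\d{2}:\d{2}(?:AM|PM)$')
COORD_RE = re.compile(r'\[Long\s+(.+?)\s+([EW])\s+Lat\s+(.+?)\s+([NS])\s*\]')

def extract_events(rows):
    """Return (time, latitude, longitude) tuples for rows that start with a time."""
    events = []
    for i, row in enumerate(rows):
        if len(row) < 6 or not TIME_RE.match(row[0]):
            continue
        location = row[5]
        # Assumed layout: a wrapped location continues on the next row.
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        if nxt is not None and len(nxt) >= 6 and nxt[0] == '':
            location += ' ' + nxt[5]
        m = COORD_RE.search(location)
        if m:
            lon, ew, lat, ns = m.groups()
            events.append((row[0], lat + ' ' + ns, lon + ' ' + ew))
    return events

for event in extract_events(rows):
    print(event)
```

The resulting tuples could then be written out with csv.writer if a new CSV file is wanted.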
If that's really what your CSV file looks like, I wouldn't even bother. It's got different data on different rows, and a huge mess of nested ad-hoc strings, with separators within separators.
Even once you get to your lat and long figures, they look like a bizarre mix of decimal, hex and character data.
I think you'd be asking for trouble by giving the impression that you can deal with data in that format programmatically. If it's just a once off task, and that's the extent of the data, I'd do it by hand.
If not, I think the correct solution is to push back and try to get some cleaner data.
I developed an application to harvest any type of email address from files.
types : ishani#dolly.lk
ishani(at)dit.dolly.lk
ishani at cs dot dolly dot edu
The problem is that the output shows extra items in the list besides the full extracted email address, and I couldn't figure out why. I have tried various approaches. I think there is a problem in my regular expression or in the logic.
Here is my code:
data=f.read()
regexp_email = r'(([\w]+)#([\w]+)([.])([\w]+[\w.]+))|(([\w]+)(\(at\))([\w]+)([.])([\w]+[\w.]+))|(([\w]+)(\sat\s)([\w-]+)(\sdot\s)([\w]+(\sdot\s[\w]+)))'
pattern = re.compile(regexp_email)
emailAddresses = re.findall(pattern, data)
print emailAddresses
the output is like this
[('ishani#sliit.lk', 'ishani', 'sliit', '.', 'lk', '', '', '', '', '', '', '', '', '', '', '', '', ''), ('', '', '', '', '', 'ishani(at)dit.sliit.lk', 'ishani', '(at)', 'dit', '.', 'sliit.lk', '', '', '', '', '', '', ''), ('', '', '', '', '', '', '', '', '', '', '', 'ishani at cs dot dolly dot edu', 'ishani', ' at ', 'cs', ' dot ', 'dolly dot edu', ' dot edu')]
but I'm expecting output like this
['ishani#dolly.lk','ishani(at)dit.dolly.lk','ishani at cs dot dolly dot edu']
Is there any method that anyone has tried that would solve my problem?
Change your regexp_email to this:
r'[\w]+#[\w]+[.][\w]+[\w.]+|[\w]+\(at\)[\w]+[.][\w]+[\w.]+|[\w]+\sat\s[\w-]+\sdot\s[\w]+\sdot\s[\w]+'
It doesn't seem that you need the capturing groups, so I have removed all of them.
You also don't need the [] around \w if \w is all you need to specify:
r'\w+#\w+[.]\w+[\w.]+|\w+\(at\)\w+[.]\w+[\w.]+|\w+\sat\s[\w-]+\sdot\s\w+\sdot\s\w+'
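A quick way to check this is the sketch below. It assumes the three sample addresses from the question, one per line, stand in for the contents of the file. With no capturing groups in the pattern, re.findall returns the full matches as plain strings.

```python
import re

# Pattern without capturing groups, as suggested above.
regexp_email = (
    r'\w+#\w+[.]\w+[\w.]+'
    r'|\w+\(at\)\w+[.]\w+[\w.]+'
    r'|\w+\sat\s[\w-]+\sdot\s\w+\sdot\s\w+'
)
pattern = re.compile(regexp_email)

# Stand-in for f.read(); the real data comes from the file.
data = (
    "ishani#dolly.lk\n"
    "ishani(at)dit.dolly.lk\n"
    "ishani at cs dot dolly dot edu\n"
)

email_addresses = pattern.findall(data)
print(email_addresses)
```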
You could just skip the blanks
print([e for ea in emailAddresses for e in ea if e])
which produces
['ishani#sliit.lk', 'ishani', 'sliit', '.', 'lk', 'ishani(at)dit.sliit.lk', 'ishani', '(at)', 'dit', '.', 'sliit.lk', 'ishani at cs dot dolly dot edu', 'ishani', ' at ', 'cs', ' dot ', 'dolly dot edu', ' dot edu']
which isn't exactly what you asked for...
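If the capturing groups have to stay in the pattern, another option is to iterate with finditer and take group(0), which is always the whole match. This is only a sketch: it reuses the original pattern from the question, and the sample data is assumed.

```python
import re

# Original pattern from the question, capturing groups left in place.
regexp_email = (
    r'(([\w]+)#([\w]+)([.])([\w]+[\w.]+))'
    r'|(([\w]+)(\(at\))([\w]+)([.])([\w]+[\w.]+))'
    r'|(([\w]+)(\sat\s)([\w-]+)(\sdot\s)([\w]+(\sdot\s[\w]+)))'
)
pattern = re.compile(regexp_email)

# Stand-in for the file contents.
data = (
    "ishani#dolly.lk\n"
    "ishani(at)dit.dolly.lk\n"
    "ishani at cs dot dolly dot edu\n"
)

# group(0) is the entire match, regardless of how many groups exist.
full_matches = [m.group(0) for m in pattern.finditer(data)]
print(full_matches)
```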