Sort voluminous file text by date using python [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm new to Python and I have to sort a very large text file by date. It contains many lines like these:
CCC!LL!EEEE!EW050034!2016-04-01T04:39:54.000Z!7!1!1!1
CCC!LL!EEEE!GH676589!2016-04-01T04:39:54.000Z!7!1!1!1
CCC!LL!EEEE!IJ6758004!2016-04-01T04:39:54.000Z!7!1!1!1
Can someone help me please ?
Thank you all !

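Have you considered using the *nix sort program? In raw terms, it will probably be faster than most Python scripts.
Use -t \! to specify that columns are separated by a ! character, -k n to specify the sort key, where n is the number of the field the key starts at, and -o outputfile to write the result to a file.
Example:
sort -t \! -k 5 -o sorted.txt input.txt
This sorts input.txt on a key that starts at its 5th field and writes the result to sorted.txt. Use -k 5,5 to restrict the key to the 5th field alone. As long as every line uses the same timestamp format as the sample, plain text ordering of that field is also chronological ordering.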
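For comparison, the same idea can be sketched in Python: split each line on ! and sort on the fifth field as plain text. This is a minimal sketch, assuming every line has the timestamp in that position and in the same format. The function name sort_lines_by_field and the sample dates are made up for illustration (the question's sample lines all share one timestamp).

```python
def sort_lines_by_field(lines, field_index=4, sep="!"):
    """Return the lines sorted by one separator-delimited field (0-based index)."""
    # The timestamps are compared as text, which matches chronological order
    # only because they all share one fixed-width format.
    return sorted(lines, key=lambda line: line.split(sep)[field_index])


# Hypothetical sample: IDs from the question, dates changed so the order is visible.
sample = [
    "CCC!LL!EEEE!GH676589!2016-04-02T04:39:54.000Z!7!1!1!1",
    "CCC!LL!EEEE!EW050034!2016-04-01T04:39:54.000Z!7!1!1!1",
    "CCC!LL!EEEE!IJ6758004!2016-03-31T23:00:00.000Z!7!1!1!1",
]
sorted_sample = sort_lines_by_field(sample)
for line in sorted_sample:
    print(line)
```

This version holds every line in memory, so for a file that does not fit in RAM the external sort program above is likely the better fit.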

I would convert the time to a timestamp and then sort.
First split the data into a list of lines.
rawData = '''CCC!LL!EEEE!EW050034!2016-04-01T04:39:54.000Z!7!1!1!1
CCC!LL!EEEE!GH676589!2016-04-01T04:39:54.000Z!7!1!1!1
CCC!LL!EEEE!IJ6758004!2016-04-01T04:39:54.000Z!7!1!1!1'''
a = rawData.split('\n')
>>> import dateutil.parser, time
>>> sorted(a, key=lambda line: time.mktime(dateutil.parser.parse(line.split('!')[4]).timetuple()))
['CCC!LL!EEEE!EW050034!2016-04-01T04:39:54.000Z!7!1!1!1', 'CCC!LL!EEEE!GH676589!2016-04-01T04:39:54.000Z!7!1!1!1', 'CCC!LL!EEEE!IJ6758004!2016-04-01T04:39:54.000Z!7!1!1!1']

Take a look at the regular expression module. I've used it a couple of times, and it looks pretty simple to do what you want with it.
The docs are at https://docs.python.org/2/library/re.html, but try searching for Python regular expression examples to make it clearer. Good luck.
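One way the regular expression suggestion might look in practice: pull the timestamp out of each line with a pattern and use it as the sort key. This is only a sketch under the assumption that every timestamp looks like the ones in the question. TIMESTAMP_RE and timestamp_key are illustrative names, not part of the re module.

```python
import re
from datetime import datetime

# Matches timestamps shaped like 2016-04-01T04:39:54.000Z (assumed format).
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")


def timestamp_key(line):
    """Return the first timestamp found in the line as a datetime."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        # Lines without a timestamp sort first; this is an arbitrary choice.
        return datetime.min
    return datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S.%fZ")


lines = [
    "CCC!LL!EEEE!GH676589!2016-04-02T04:39:54.000Z!7!1!1!1",
    "CCC!LL!EEEE!EW050034!2016-04-01T04:39:54.000Z!7!1!1!1",
]
ordered = sorted(lines, key=timestamp_key)
```

Unlike splitting on !, this does not depend on the timestamp being in a fixed column, at the cost of a slower key function.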

Related

Splitting the paragraphs in two Python strings into lines of a maximum width [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I have two strings in a Python script which each contain single lines of text, blank lines and multiple paragraphs. Some of the paragraphs in the strings are very long so I would like to split them into multiple lines of text so that each line in the paragraphs is a certain maximum width. I would then like to split each string into lines so that the strings may be compared using the HtmlDiff class in the difflib module. Might someone know a quick and easy way to do this? I would greatly appreciate it. Thanks so much.

By searching, I found the following link:
How to modify list entries during for loop?
Using the information in the first answer there, and the first comment on this question, I was able to achieve what I was looking for with code like the following:
import textwrap

# Wrap each line in place, then re-split so every wrapped line is its own entry.
firstListOfLines = firstText.splitlines()
for index, line in enumerate(firstListOfLines):
    firstListOfLines[index] = textwrap.fill(line)
firstListOfLines = '\n'.join(firstListOfLines).splitlines()
secondListOfLines = secondText.splitlines()
for index, line in enumerate(secondListOfLines):
    secondListOfLines[index] = textwrap.fill(line)
secondListOfLines = '\n'.join(secondListOfLines).splitlines()
Thanks so much. The first comment helped me to think about what to do. Thanks again.
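The same steps can be folded into one helper and fed straight to HtmlDiff. This is a rough sketch rather than the poster's code: wrap_to_lines, the width of 40, and the two sample strings are assumptions chosen for illustration.

```python
import difflib
import textwrap


def wrap_to_lines(text, width=70):
    """Wrap every line of text to at most `width` characters and return a flat list."""
    wrapped = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for a blank line, so keep blank lines explicitly.
        wrapped.extend(textwrap.wrap(line, width=width) or [""])
    return wrapped


# Hypothetical inputs: a short line, a blank line, and one long paragraph.
first_text = "A short line.\n\n" + "word " * 30
second_text = "A short line.\n\n" + "word " * 29 + "changed"

first_lines = wrap_to_lines(first_text, width=40)
second_lines = wrap_to_lines(second_text, width=40)

# make_table returns an HTML table as a string; make_file returns a full page.
html_table = difflib.HtmlDiff().make_table(first_lines, second_lines)
```

Wrapping both texts with the same width keeps the line breaks aligned, so the diff highlights real differences rather than differences in wrapping.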

How to selectively replace characters in a string? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
How would I replace characters in a string for certain indices in Python?
For example, I have version = "00.00.00" and need to change each of the 0s to a different value, say 3, so that it looks like "33.33.33". Would this also be possible if I had a variable storing this value? If I have vnumber = "3", would I be able to get the same output by using the variable? I'm sure replace() is a good function to use for this, but I'm not sure about the syntax.

From an interactive session, you could type:
>>> help(str.replace)
But to answer the question most directly:
vnumber = '3'
newversion = version.replace('0', vnumber)
This is probably what you want to do.

Your guess about str.replace was right. It takes two arguments: the first is the substring to find in the original string, and the second is the string that replaces each occurrence found. The code could look like this:
vnumber = "3"
version = "00.00.00"
newversion = version.replace("0", vnumber)
print(newversion)
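str.replace changes every occurrence. If only certain indices should change, as the question's first sentence asks, one possible approach is to edit a list of characters and join it back. The helper name replace_at is made up for this sketch.

```python
def replace_at(text, indices, new_char):
    """Return a copy of text with the characters at the given indices replaced."""
    chars = list(text)  # strings are immutable, so work on a list of characters
    for i in indices:
        chars[i] = new_char
    return "".join(chars)


version = "00.00.00"
vnumber = "3"
# Only the first group changes here; the dots and the other digits are untouched.
print(replace_at(version, [0, 1], vnumber))  # prints 33.00.00
```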

Hash (MD5) an incremental list to provide individual hash values [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm hoping that somebody could help me. I'm really new to scripting. I'm not trying to reverse hash anything but I am interested in trying to generate a numeric list from 0 to 1000 and then obtain the MD5 for each number.
I appreciate that this sounds very simple. I've managed to generate a txt file with the list but I can't seem to get the numbers on individual lines which I could then hash using the likes of md5deep.
If anyone could help me out I would really appreciate it.

I am not clear what your exact problem is from the question, but it sounds like you have a few options. Here are a pure Python and a pure shell solution.
Pure Python:
import hashlib
for i in range(1, 1001):
    # hashlib.md5 needs bytes in Python 3, so encode the number's text first
    print(hashlib.md5(str(i).encode()).hexdigest())
Output
c4ca4238a0b923820dcc509a6f75849b
c81e728d9d4c2f636f067f89cc14862c
...
b706835de79a2b4e80506f582af3676a
a9b7ba70783b617e9998dc4dd82eb3c5
Pure shell (here shown on osx, similar on linux)
seq 1 1000 | xargs -I {} md5 -s "{}"
Output
MD5 ("1") = c4ca4238a0b923820dcc509a6f75849b
MD5 ("2") = c81e728d9d4c2f636f067f89cc14862c
...
MD5 ("999") = b706835de79a2b4e80506f582af3676a
MD5 ("1000") = a9b7ba70783b617e9998dc4dd82eb3c5
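The question asks for 0 to 1000 with each number on its own line, so a variant that keeps the number next to its hash may be closer to what was wanted. This is a sketch for Python 3. md5_lines and the tab-separated layout are assumptions, not something the question specifies.

```python
import hashlib


def md5_lines(start=0, stop=1000):
    """Return one 'number<TAB>md5' string per integer from start to stop inclusive."""
    return [
        "{}\t{}".format(n, hashlib.md5(str(n).encode("ascii")).hexdigest())
        for n in range(start, stop + 1)
    ]


lines = md5_lines()
# Joining with "\n" and writing the result to a file puts each number on its own line.
print(lines[1])
```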

How to randomly sample lines from a large text file - from the command line [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I am working with a ~0.5 GB text file, and I want to extract a representative subset of lines, say one millionth of them. I've created a small script to do this:
import random
result = []
with open("data.txt") as f:
    for line in f:
        # keep each line independently with probability one in a million
        if random.random() < 0.000001:
            result.append(line)
But it would be more useful for my purpose if I could do this from the command line, without a script. Note that I don't care how many lines are output; I just want to be able to set a percentage/probability of outputting each line.
MY QUESTION/REQUEST: How can I do this with a short one-liner suitable for the command line?

Is perl ok? Try this:
cat yourfile.txt | perl -ne 'print if (rand() < 0.000001)'
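If Perl is not available, the same per-line coin flip can be written in Python as a small generator. This is a sketch: sample_lines and its rng parameter are illustrative names, and passing a seeded random.Random is just one way to make a run repeatable.

```python
import random


def sample_lines(lines, probability, rng=random):
    """Yield each line independently with the given probability."""
    for line in lines:
        # rng.random() returns a float in [0.0, 1.0)
        if rng.random() < probability:
            yield line


# A seeded generator gives the same sample on every run.
seeded = random.Random(42)
picked = list(sample_lines(["a\n", "b\n", "c\n"], 0.5, rng=seeded))
```

The same logic fits on one command line as python3 -c "import sys, random; sys.stdout.writelines(l for l in sys.stdin if random.random() < 0.000001)" < yourfile.txt, which reads the file from standard input.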

How do I read a text file into a string variable in Python starting at the second line? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I use the following code segment to read a file in Python:
file = open("test.txt", "rb")
data = file.readlines()[1:]
file.close()
print data
However, I need to read the entire file (apart from the first line) as a string into the variable data.
As it is, when my file's contents are test test test, my variable contains the list ['testtesttest'].
How do I read the file into a string?
I am using python 2.7 on Windows 7.

The solution is pretty simple. Use a with ... as construct like this, read from line 2 onward, and then join the returned list into a string. In this particular instance I'm using "" as the join delimiter, but you can use whatever you like.
with open("/path/to/myfile.txt", "rb") as myfile:
    data_to_read = "".join(myfile.readlines()[1:])
...
The advantage of using a with ... as construct is that the file is closed automatically when the block exits, so you don't need to call myfile.close().
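An alternative that avoids building a list of lines is to consume the first line and then read the rest in one call. This is a sketch written for Python 3 in text mode, whereas the question uses Python 2.7. read_after_first_line and the temporary demo file are assumptions for illustration.

```python
import os
import tempfile


def read_after_first_line(path):
    """Return the file's contents as one string, skipping the first line."""
    with open(path, "r") as handle:
        handle.readline()     # consume and discard the first line
        return handle.read()  # everything that remains, as a single string


# Demo on a throwaway file so the sketch can be run as-is.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nline two\nline three\n")
    demo_path = tmp.name

data = read_after_first_line(demo_path)
os.remove(demo_path)
print(data)
```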
