I have a large data file where each row looks as follows. Each pipe-delimited value represents a consistent variable (e.g. 1517892812 and 1517892086 are the Unix timestamps, and the last pipe-delimited field will always be UnixTimestamp):
264|2|8|6|1.32235000|1.33070000|1.31400000|1257.89480966|1517892812
399|10|36|2|1.12329614|1.12659227|1.12000000|148194.47200218|1517892086
How can I pull out the values I need to make variables in Python? For example, looking at a row and getting UnixTimestamp=1517892812 (and other variables) out of it.
I want to pull out each relevant variable per line, work with them, and then look at the next line and reevaluate all of the variable values.
Is RegEx what I should be dealing with here?
No need for regex, you can use split():
int(a.strip().split('|')[-1])
If all the fields are numeric and you want a matrix with all of your values, you can simply do something like:
[[float(v) for v in line.strip().split('|')] for line in your_data.splitlines()]
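To process a whole file line by line and pull each row's values into variables, here is a minimal sketch; only the last field is identified in the question, so 'data.txt' and the handling of the other fields are assumptions:
def parse_row(line):
    # Split the pipe-delimited row into its fields.
    fields = line.strip().split('|')
    # The last field is always the Unix timestamp.
    unix_timestamp = int(fields[-1])
    # The other fields look numeric in the sample data, so convert them too.
    values = [float(v) for v in fields[:-1]]
    return values, unix_timestamp

# 'data.txt' is a placeholder for your actual file name.
with open('data.txt') as f:
    for line in f:
        values, unix_timestamp = parse_row(line)
        # work with this row's values here, then move on to the next line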
You can use regex and re.search():
int(re.search(r'[^|]+$', text).group())
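For completeness, a runnable version of that snippet (the sample line is taken from the question):
import re

line = "264|2|8|6|1.32235000|1.33070000|1.31400000|1257.89480966|1517892812"
# [^|]+$ matches the run of non-pipe characters at the end of the line, i.e. the timestamp.
unix_timestamp = int(re.search(r'[^|]+$', line).group())
print(unix_timestamp)  # 1517892812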
Hi all.
I have a question about how to add missing values to a dataset object.
I'm currently working on crop growth modeling, and I use the NASA Power API as my weather data source.
However, the NASA Power dataset has missing days.
I used the pcse library to retrieve the NASA Power dataset.
My question is: how do I add the missing days' data?
I tried
wdp(date) = wdp(date-timedelta(days=1))
but it gives me back 'can't assign to function call'
In any case, it seems that the data for the missing date does not exist in the object, and I am not able to create it.
You have the right idea, but the wrong syntax. In Python, list and dict access uses square brackets ([]), see the docs.
To add to that, pcse’s WeatherDataProvider object does not support this style of access. Checking out the code in this link, it appears there is a method you can call named _store_WeatherDataContainer, where the leading _ indicates it is not intended for public use, but that doesn’t mean you can’t :-)
It should look like this:
wdp._store_WeatherDataContainer(wdp(date-timedelta(days=1)), date)
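Putting that together, a fuller sketch (wdp is the WeatherDataProvider from the question, missing_date is a placeholder value, and _store_WeatherDataContainer is the internal pcse method described above):
from datetime import date, timedelta

missing_date = date(2021, 6, 15)  # placeholder for a day missing from the NASA Power data

# Fetch the previous day's weather container and store it again under the missing date.
previous_day = wdp(missing_date - timedelta(days=1))
wdp._store_WeatherDataContainer(previous_day, missing_date)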
I am extracting multiple datasets into one CSV file.
data = Dataset(r'C:/path/2011.daily_rain.nc', 'r')
I successfully assigned one dataset, but I still have ten more to work with in the same way. Is there a method or function that would allow me to assign or combine multiple datasets as one variable?
From what you've described, it sounds like you want to perform the same task on each set of data. If that is the case, consider storing your dataset paths in a list and then using a for .. in loop to iterate over each path.
Consider the following sample code:
from netCDF4 import Dataset  # assuming the netCDF4 library, which provides Dataset for .nc files

dataset_paths = [
    "C:/path/some_data_file-0.nc",
    "C:/path/some_data_file-1.nc",
    "C:/path/some_data_file-2.nc",
    "C:/path/some_data_file-3.nc",
    # ... and the rest of your dataset file paths
]

for path in dataset_paths:
    data = Dataset(path, 'r')
    # Code that uses the data here
Everything in the for .. in block will be run for each path defined in the dataset_paths list. This will allow you to work with each dataset in the same way.
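If the goal is also to have all of the datasets reachable through a single variable, one option is to keep the opened Dataset objects in a dictionary keyed by path (a sketch using the placeholder paths above):
# Open every file once and keep the Dataset objects in one dictionary.
datasets = {path: Dataset(path, 'r') for path in dataset_paths}

# Any individual dataset can then be looked up by its path.
rain_data = datasets["C:/path/some_data_file-0.nc"]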
I want to write a Python 3 script to manage my expenses, and I'm going to have a rules filter that says 'if the description contains a particular string, categorize it as x', and these rules will be read in from a text file.
The only way I can think of doing this is to apply str.find() for each rule on the description of each transaction, and break if one is found. But that is quadratic (every rule checked against every description); is there a better way of doing this?
Strip punctuation from the description, and split it into words. Make the words in the description into a set, and the categories into another set.
Since sets use dictionaries internally and dictionaries are built on hash-tables, average membership checking is O(1).
Only when a transaction is entered (or changed), intersect both sets to find the categories that apply (if any), and add the categories to your transaction record (dict, namedtuple, whatever).
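A minimal sketch of that approach; the rules dictionary here is hypothetical, and in practice you would build it from your rules file:
import string

# Hypothetical rules: keyword found in a description -> category to assign.
rules = {"uber": "transport", "grocery": "food", "netflix": "entertainment"}
rule_words = set(rules)

def categorize(description):
    # Strip punctuation and split the description into a set of lowercase words.
    cleaned = description.lower().translate(str.maketrans('', '', string.punctuation))
    words = set(cleaned.split())
    # Intersect the two sets; each membership check is O(1) on average.
    return {rules[word] for word in words & rule_words}

print(categorize("UBER *TRIP 5X82"))  # {'transport'}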
I have a CSV file that contains several columns.
One of the columns contains the length of my test in the format hh:mm:ss.
I need to divide this data into two datasets based on length: <00:16:00 or >00:16:00.
How can I do that?
Thanks for helping, and sorry for my bad English.
Brute force:
value = "00:15:47" # taken from csv
if value < "00:16:00":
# handle smaller values
else:
# handle bigger values
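A fuller sketch that reads the file with the csv module and splits the rows into two lists; the file name and the index of the duration column are assumptions:
import csv

shorter, longer = [], []

# 'tests.csv' and the column index 2 are placeholders for your actual file and column.
with open('tests.csv', newline='') as f:
    for row in csv.reader(f):
        # Zero-padded hh:mm:ss strings compare correctly as plain strings.
        if row[2] < "00:16:00":
            shorter.append(row)
        else:
            longer.append(row)
The plain string comparison only works because the values are fixed-width and zero-padded; otherwise convert them to timedelta objects before comparing.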
I have been given a set of 20,000 entries in Excel. Each entry is a string, and they are all names of events such as: Daytona 500, NASCAR, 3x1 Brand Rep, etc.
Many of the event names are repeated. I would like to make a list of them, sort it, and find the most common items in the list and how many times each one is entered. I am halfway through my first semester of Python and have just learned about lists, so I would like to use Python 2.7 for this task, but I am also open to using Excel or R if one of those makes more sense.
I'm not sure where to start or how to input such a large list into a program.
In Excel I would use a PivotTable; it takes about 15 seconds to set up. In Python, start by putting your values in a list:
your_list = ['Daytona 500', 'NASCAR'] # more values of course
Now use a dictionary comprehension to count items for each unique key.
your_dict = {i:your_list.count(i) for i in set(your_list)}
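Counter from the standard library is an alternative that also gives you the most common items directly (it works in Python 2.7 as well):
from collections import Counter

counts = Counter(your_list)
# The five most frequent event names and how many times each was entered.
print(counts.most_common(5))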