I am currently working on a Python project with TensorFlow and I need to preprocess my data.
The data I want to use is stored in an sqlite3 database with the columns:
timestamp|dev|event
10:00 |01 | on
11:00 |02 | off
11:15 |01 | off
11:30 |02 | on
And I would like to export the data into a file (.csv) that looks like this:
Timestamp|01 |02 |...
10:00 |on |0 |...
11:00 |on |off|...
11:15 |off|off|...
11:30 |off|on |...
Each row should carry the latest known information of every device at that timestamp: with every new timestamp the old values should stay, and if there is an update only the changed value(s) should be updated.
The number of devices does not change, and I can find that number with
SELECT COUNT(DISTINCT dev) FROM table01;
Currently that number is 38 different devices and a total of 10,000 entries.
Is there a way to do this computation with sqlite3, or do I have to write a Python program to process the data? I am new to both topics.
~Fabian
You can do it in SQLite, something along these lines:
select
    timestamp,
    group_concat(case when dev = '01' then event else '' end, '') as D01,
    group_concat(case when dev = '02' then event else '' end, '') as D02
from
    table01
group by
    timestamp;
Basically you are pivoting the table.
The challenge is that the pivot needs to be somewhat dynamic, i.e. the list of devices is not fixed in the query itself. You need to query the list of devices first and then build the query (the CASE WHEN ... END part) from that list; a Python sketch of this is shown below.
Also, you generally need to group by the timestamp, since the statuses of different devices sit in different rows for a single timestamp.
Also, if {timestamp, device} is not unique you need to make it unique first.
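For example, here is a minimal Python sketch of building the pivot query dynamically. It assumes the table and column names from the question (table01 with timestamp, dev, event); the database file name data.db and the output file pivot.csv are placeholders.
import csv
import sqlite3

# Minimal sketch: build the pivot query from the actual device list.
# Assumes the table/columns from the question (table01: timestamp, dev, event)
# and a hypothetical database file "data.db".
conn = sqlite3.connect("data.db")
cur = conn.cursor()

# 1. Query the list of devices.
devices = [row[0] for row in
           cur.execute("SELECT DISTINCT dev FROM table01 ORDER BY dev")]

# 2. Build one CASE expression per device. The ids come from the database
#    itself; single quotes are escaped defensively before interpolation.
cases = ",\n    ".join(
    "group_concat(CASE WHEN dev = '{0}' THEN event ELSE '' END, '') AS \"D{0}\""
    .format(str(d).replace("'", "''"))
    for d in devices
)
query = ("SELECT timestamp,\n    {}\nFROM table01\n"
         "GROUP BY timestamp\nORDER BY timestamp".format(cases))

# 3. Run the pivot and write the result to a CSV file.
with open("pivot.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp"] + ["D{}".format(d) for d in devices])
    writer.writerows(cur.execute(query))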
I'm trying to get information out of my MySQL database, but the data is stored every 10 seconds and I want to extract it every minute or every hour. I am using this SQL command to get the data from a sensor with tagid = 1186. Thanks!
SELECT t_stamp, floatvalue FROM database1.sqlt_data_1_2022_01 where tagid=1186;
console:
t_stamp floatvalue
1641013208360 86.2012939453125
1641013218361 86.32317352294922
1641013228362 86.3144760131836
1641013238365 86.53619384765625
1641013248366 86.37206268310547
1641013258367 86.31449890136719
1641013268368 86.36858367919922
1641013278369 86.26002502441406
1641013288370 86.34619903564453
1641013298375 86.14665985107422
1641013308372 86.06439971923828
1641013318373 86.54731750488281
Sounds to me like you're asking how to run a job on a schedule. You can create a record table and store the output from your query every minute or so. Try using an EVENT.
SET @@global.event_scheduler = ON; -- first of all, enable the event scheduler
delimiter //
CREATE EVENT get_data_every_minute
ON SCHEDULE EVERY 1 MINUTE STARTS now() DO
BEGIN
    INSERT INTO record_table
    SELECT t_stamp, floatvalue FROM database1.sqlt_data_1_2022_01 WHERE tagid = 1186;
END//
delimiter ;
On the other hand, if you intend to read the on-screen output rather than store the data, I would suggest using a procedure that runs the query periodically in an infinite loop.
delimiter //
create procedure show_result()
begin
while true do
SELECT t_stamp, floatvalue FROM database1.sqlt_data_1_2022_01 where tagid=1186;
select sleep(60); -- in seconds
end while;
end//
delimiter ;
call show_result; -- start the procedure
To terminate the procedure (the infinite loop), check the processlist and kill its id.
show processlist; -- the process of the infinite loop looks like below:
| 144306 | root | % | testdb | Query | 0 | User sleep | select sleep(10) |
kill 144306 ; -- kill the pid to stop it
I have a CSV with timestamps in UTC+8.
whatever.csv:
timestamp
2020-09-09 11:42:33
2020-09-09 11:42:51
2020-09-09 11:49:29
I want to store them in BQ. After storing to BQ, this is the result I'm getting:
It says UTC instead of UTC+8.
The timestamp value itself is correct, but is there any way I can store it like 2020-09-11 19:58:51 UTC+8, or something similar, as long as it reflects the actual timezone of the timestamp?
Secondly, can I specify this requirement in the field schema? I'm storing this using a Python script and mapping it through a schema from a YAML file such as:
somefile.yaml:
schema:
- name: "timestamp"
type: "TIMESTAMP"
mode: "NULLABLE"
You may need to state more about what you want to achieve to get better help.
For one, BigQuery always stores TIMESTAMP values in UTC. I have to guess that you don't really need the timestamp to be stored in a certain timezone (it's hard to see why how the timestamp is stored internally would matter to you); you care more about how to display the timestamp in UTC+8. If my guess is right, there are two ways:
SELECT STRING(TIMESTAMP "2008-12-25 15:30:00+00", "UTC+8")
This approach requires you to decorate each of your timestamp columns; a one-setting-for-all approach could be:
SET @@time_zone = "Asia/Shanghai";
-- All subsequent query will use time zone "Asia/Shanghai"
SELECT STRING(TIMESTAMP "2008-12-25 15:30:00+00");
Both output:
+------------------------+
| f0_ |
+------------------------+
| 2008-12-25 23:30:00+08 |
+------------------------+
_id   | name
------+--------
ew293 | item_1
13fse | item_2
dsv82 | item_3
Let's assume this is part of the database, and I want to fetch a limited amount of data.
Ex:
db.collection.find({}, {name:1,_id:0}).limit(40)
Every time I access it, I want the next set of 40 entries.
Is there any such command, like the one below?
db.collection.find({}, {name:1,_id:0}).limit(40).next()
I want the next set of data only if it exists.
And I access this in Python, so I need Python code for this.
Use the skip() method. This is supported in PyMongo (Python Mongo Driver) as well.
To get first 40 entries:
db.collection.find({}, {name:1,_id:0}).limit(40)
To get the next 40 entries:
db.collection.find({}, {name:1,_id:0}).skip(40).limit(40)
To get another 40 entries:
db.collection.find({}, {name:1,_id:0}).skip(80).limit(40)
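A minimal PyMongo sketch of the same pagination, assuming a local MongoDB instance and placeholder database/collection names ("mydb", "mycollection"):
from pymongo import MongoClient

# Minimal sketch: page through the collection 40 documents at a time.
# "mydb" and "mycollection" are placeholder names.
client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["mycollection"]

PAGE_SIZE = 40

def get_page(page_number):
    # 0-based page number; returns the names on that page.
    cursor = (collection.find({}, {"name": 1, "_id": 0})
              .skip(page_number * PAGE_SIZE)
              .limit(PAGE_SIZE))
    return list(cursor)

page = 0
while True:
    docs = get_page(page)
    if not docs:        # stop as soon as there is no further data
        break
    print(docs)
    page += 1
Note that skip() still scans past the skipped documents, so for very large collections a range query on an indexed field is the usual alternative.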
I have an sqlite3 database file with the following tables:
solicitari: id, id_im, data_inchidere, defect, tip_solicitare
operatii_aditionale: id, id_im, data_aditionala, descriere
First table (solicitari) contains:
id , id_im, data_inchidere, defect,tip_solicitare
---------------------------------------------------
1,123456,2017-01-01 10:00:00,faulty mouse,replacement
2,789456,2017-01-01 11:00:00,pos installed,intall
3,147852,2017-01-05 12:00:00, monitor installed,install
4,369852,2017-01-06 11:00:00, monitor installed,install
The second table (operatii_aditionale) contains additional operations:
id, id_im, data_aditionala, descriere
---------------------------------------------
1,123456,2017-01-02 10:00:00,mouse replaced need cd replacement to
2,123456,2017-01-03 10:00:00,cd replaced system ok
3,123456,2017-01-03 10:00:00,hdd replaced system ok
4,789456,2017-01-04 10:00:00,ac adapter not working anymore
What I want to do is build a table from these two tables, but only with the data that exists between two dates, which should look like this:
id_im, data_inchidere, defect,tip_solicitare, id_im, data_aditionala, descriere
-------------------------------------------------------------------------------
123456,2017-01-01 10:00:00,faulty mouse,replacement,123456,2017-01-02 10:00:00,mouse replaced need cd replacement to
123456,2017-01-03 10:00:00,cd replaced system ok
123456,2017-01-03 10:00:00,hdd replaced system ok
2,789456,2017-01-01 11:00:00,pos installed,intall,789456,2017-01-04 10:00:00,ac adapter not working anymore
3,147852,2017-01-05 12:00:00, monitor installed,install
4,369852,2017-01-06 11:00:00, monitor installed,install
I have used a comma as the column separator here.
I found something similar, but it fills in the left part for each additional row from the second table.
Is there a way to do this directly with an sqlite query, or do I need to use a Python script?
By the way, I need this for a Python app.
Thanks.
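A minimal sketch of the kind of LEFT JOIN this could be, run from Python with sqlite3; the file name service.db and the date range are placeholder values:
import sqlite3

# Minimal sketch: join the two tables and keep only requests closed
# between two dates. "service.db" and the date range are placeholders.
conn = sqlite3.connect("service.db")

query = """
    SELECT s.id_im, s.data_inchidere, s.defect, s.tip_solicitare,
           o.id_im, o.data_aditionala, o.descriere
    FROM solicitari AS s
    LEFT JOIN operatii_aditionale AS o ON o.id_im = s.id_im
    WHERE s.data_inchidere BETWEEN ? AND ?
    ORDER BY s.id_im, o.data_aditionala
"""

for row in conn.execute(query, ("2017-01-01 00:00:00", "2017-01-31 23:59:59")):
    print(row)
The LEFT JOIN repeats the solicitari columns on every matching operatii_aditionale row; blanking them out after the first row (as in the desired output above) is easier to do in Python while writing the rows out.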
I have a Python application. It has an SQLite database full of data about things that happen, retrieved from the Web by a scraper. This data includes date-times, stored as Unix timestamps in a column reserved for them. I want to retrieve the names of organisations that did things and count how often they did them, but to do this for each week (i.e. each 604,800-second interval) I have data for.
Pseudocode:
for each 604800-second increment in time:
select count(time), org from table group by org
Essentially what I'm trying to do is iterate through the database like a list sorted on the time column, with a step value of 604800. The aim is to analyse how the distribution of different organisations in the total changed over time.
If at all possible, I'd like to avoid pulling all the rows from the db and processing them in Python as this seems a) inefficient and b) probably pointless given that the data is in a database.
Not being familiar with SQLite, I think this approach should work for most databases, as it finds the week number and subtracts the offset:
SELECT org, ROUND(time/604800) - week_offset, COUNT(*)
FROM table
GROUP BY org, ROUND(time/604800) - week_offset
In Oracle I would use the following if time was a date column:
SELECT org, TO_CHAR(time, 'YYYY-IW'), COUNT(*)
FROM table
GROUP BY org, TO_CHAR(time, 'YYYY-IW')
SQLite probably has similar functionality that allows this kind of SELECT which is easier on the eye.
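For instance, with Unix timestamps SQLite's strftime can do the week bucketing directly. A minimal sketch from Python, where the file name events.db and the table/column names (events, time, org) are placeholders standing in for your schema:
import sqlite3

# Minimal sketch: count events per organisation per week directly in SQLite.
# "events.db" and the table/column names (events, time, org) are placeholders.
conn = sqlite3.connect("events.db")

query = """
    SELECT strftime('%Y-%W', time, 'unixepoch') AS week,
           org,
           COUNT(*) AS n
    FROM events
    GROUP BY week, org
    ORDER BY week, org
"""

for week, org, n in conn.execute(query):
    print(week, org, n)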
Create a table listing all weeks since the epoch, and JOIN it to your table of events.
CREATE TABLE Weeks (
week INTEGER PRIMARY KEY
);
INSERT INTO Weeks (week) VALUES (200919); -- e.g. this week
SELECT w.week, e.org, COUNT(*)
FROM Events e JOIN Weeks w ON (w.week = strftime('%Y%W', e.time, 'unixepoch')) -- e.time is a Unix timestamp
GROUP BY w.week, e.org;
There are only 52-53 weeks per year. Even if you populate the Weeks table for 100 years, that's still a small table.
To do this in a set-based manner (which is what SQL is good at) you will need a set-based representation of your time increments. That can be a temporary table, a permanent table, or a derived table (i.e. a subquery). I'm not too familiar with SQLite and it's been a while since I've worked with UNIX. Timestamps in UNIX are just the number of seconds since some set date/time? Using a standard Calendar table (which is useful to have in a database)...
SELECT
C1.start_time,
C2.end_time,
T.org,
COUNT(time)
FROM
Calendar C1
INNER JOIN Calendar C2 ON
C2.start_time = DATEADD(dy, 6, C1.start_time)
INNER JOIN My_Table T ON
T.time BETWEEN C1.start_time AND C2.end_time -- You'll need to convert to timestamp here
WHERE
DATEPART(dw, C1.start_time) = 1 AND -- Basically, only get dates that are a Sunday or whatever other day starts your intervals
C1.start_time BETWEEN @start_range_date AND @end_range_date -- Period for which you're running the report
GROUP BY
C1.start_time,
C2.end_time,
T.org
The Calendar table can take whatever form you want, so you could use UNIX timestamps in it for the start_time and end_time. You just pre-populate it with all of the dates in any conceivable range that you might want to use. Even going from 1900-01-01 to 9999-12-31 won't be a terribly large table. It can come in handy for a lot of reporting type queries.
Finally, this code is T-SQL, so you'll probably need to convert the DATEPART and DATEADD to whatever the equivalent is in SQLite.