Data-mining the Dog
2019-11-11 05:07 pm
I suppose I could have titled this one "Data-mining the to-DO loG", but since the log is kept in a directory called Dog ...
My to.do file is somewhere in between a bullet journal and a logbook. Since its start as a pure to-do list in 2006, it has come to take on an increasingly important role in my life. (Some people might say that that's because my memory is deteriorating; they might be right.)
If you haven't already seen my How to.do it post, you might want to read that first. Or look under the cut in any of my "done since" posts. I mentioned a new tagging convention in this post. I have since extended it to make it easier to extract information, and also written a more general-purpose search tool. Because tool-using bear.
The net effect is that I can now easily answer questions like "when was the last time Colleen was discharged from a hospital" (answer: September 10), "what else did I do that day?" (answer: fix a messed-up fstab on Nova, and start to make a list of things I avoid doing, among other things), and so on.
As long as I can reliably find the search term and the date on the same line, grep works pretty well. The convention is to put the mmdd date in parentheses right in front of one of the words "Admitted", "Discharged", or "Transferred" (or just the letter). With that in place, I can get "the last time Colleen was discharged from a hospital" from:
grep '[0-9])D' 2*/*.done | tail -1
and the number of hospital stays in 2018 with
grep '[0-9])A' 2018/*.done | wc -l
Requiring a digit before the right parenthesis keeps me from getting false positives on things like "(gastroenterologist)Dr.". Other queries are equally simple. With a date somewhere on the line, I can find things like "CPAP" and "litter".
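To make the digit requirement concrete, here is a minimal Python sketch of the same matching rule. It is not the grep one-liners themselves, and the sample lines are invented for illustration; only the "(mmdd)" plus tag-letter convention comes from the description above.

```python
import re

# Hypothetical sample lines in the style described above;
# they are NOT copied from the real log.
SAMPLE_LINES = [
    "  / (0905)Admitted to hospital",
    "  / (0910)Discharge instructions:",
    "  / call (gastroenterologist)Dr. about follow-up",
    "  / (0913)A",
]


def tagged(lines, letter):
    """Return the lines containing a digit, a right parenthesis, then the tag letter."""
    pattern = re.compile(r"[0-9]\)" + re.escape(letter))
    return [line for line in lines if pattern.search(line)]


discharges = tagged(SAMPLE_LINES, "D")
admissions = tagged(SAMPLE_LINES, "A")

print(discharges[-1])   # last match, like piping through `tail -1`
print(len(admissions))  # number of matches, like piping through `wc -l`
```

The "(gastroenterologist)Dr." line is skipped because the character before the right parenthesis is a letter, not a digit.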
Of course, I had to go back searching for things like "admit" and "hospital" and put them into the correct format. But none of that helps much with queries like "what else was I doing?", because grep just returns a filename and a line number along with the lines that it finds. Then I have to go to emacs or less and navigate down to the line. It's possible to do better.
The solution was a script called dgrep, where the "d" stands for "done" or something like that. It does a couple of things differently:
- Mainly, it knows that dates are four digits starting in column 1, so it can print them with the hit.
- It knows where my to-do archive is, so I don't need to tell it what directories to search if I just want to search all of them.
so I can do the following:
dgrep '[0-9]\)D' | tail -1
2019/09.done:247: 0910: / (0910)Discharge instructions:
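The date-tracking idea can be sketched in a few lines. This is an illustrative Python version, not the actual Perl script: the function name, the sample excerpt, and the exact layout of date lines (a bare four-digit date at column 1) are assumptions made for the example.

```python
import re
from typing import Iterable, Iterator, Tuple

# A date is four digits starting in column 1, as described above.
DATE_AT_COLUMN_1 = re.compile(r"^([0-9]{4})")


def dated_hits(lines: Iterable[str], pattern: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, date, text) for every line matching pattern.

    The date is the most recent four-digit string seen at column 1,
    or "" if no date line has been seen yet.
    """
    regex = re.compile(pattern)
    current_date = ""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        header = DATE_AT_COLUMN_1.match(line)
        if header:
            current_date = header.group(1)
        if regex.search(line):
            yield number, current_date, line.strip()


# Hypothetical excerpt; the real file layout may differ.
SAMPLE = """\
0909
  o call the pharmacy
0910
  / (0910)Discharge instructions:
  / fix fstab on Nova
""".splitlines()

for number, date, text in dated_hits(SAMPLE, r"[0-9]\)D"):
    # The filename is hard-coded here purely for illustration.
    print(f"2019/09.done:{number}: {date}: {text}")
```

Because the date comes from the nearest date line above the hit, this approach also works for lines that carry no date of their own, such as the fstab entry in the sample.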
But there's one more trick. The '--less' option prints, not a filename and line number (which emacs and other editors can parse), but a command that you can use to search for that date:
dgrep --less '[0-9]\)D' | tail -1
less -p ^0910 2019/09.done 247: / (0910)Discharge instructions:
I just select the command, and click the middle mouse button to paste it into the command line. The help message also tells you the command line you need to look at each of the hits in succession.
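Building such a command from a hit is simple. The following Python sketch shows one way to do it; the function name is hypothetical and this is not how the Perl script necessarily does it. It relies only on the standard `-p pattern` option of less, which opens the file at the first line matching the pattern.

```python
import shlex


def less_command(date: str, path: str) -> str:
    """Build a shell command that opens path in less at the first line starting with date."""
    # shlex.quote protects characters such as "^" from the shell,
    # so the result is safe to paste into a command line.
    return " ".join(["less", "-p", shlex.quote("^" + date), shlex.quote(path)])


print(less_command("0910", "2019/09.done"))
```

Quoting the pattern is a design choice for safe pasting; the command shown above works unquoted in most shells because "^0910" contains nothing they treat specially.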
The dgrep script is written in Perl and necessarily uses regular expressions, both of which are well into "now you have two problems" territory if you're not careful. But it works.
Another fine post from The Computer Curmudgeon (also at computer-curmudgeon.com). Donation buttons in profile.
NaBloPoMo stats:
7582 words in 11 posts this month (average 689/post)
618 words in 1 post today