Storing Nuggets of Information

Latest Update: Added link to Doug’s ‘Digital Packrat’ article.

Introduction

The Situation

Some of my favourite non-fiction writers have an amazing ability to spout any number of fascinating facts about any subject that comes up. Bill Bryson, for example, or Chuck Palahniuk. At first, I just assumed they had utterly incredible memories, and simply knew all this stuff. After a while, though, it occurred to me that they were actively finding this stuff, and storing away whatever they came across from day to day.

Researching stuff as and when needed is pretty easy now. Google will find facts about pretty much anything. There’s some skill to using it well, but it’s not difficult. Storing stuff away day to day, though, is still surprisingly challenging. If you want to be a good factual writer, it’s an important thing to work out.

Purpose

So why am I writing this? I want ideas. I’ll suggest a few of my own thoughts, but I’m interested in how other people have approached this. If nothing else, it’ll make a good article for our writing section. Oh, and getting things written down into an article can be a great way to help clarify your own thoughts.

Why It’s Difficult

It’s not something that sounds difficult to do, but I think there are a few problems here.

Quick Storage

If it takes too long to note something down, you won’t bother. If it takes too long to file that note away, you’ll never get around to it, and you’ll be left with just a stack of unsorted notes.

Quick Retrieval

When you come to write an article about eggs, you need to be able to find all the snips of info you’ve collected about eggs. And the quicker you can do that, the quicker you can write that article. If it takes too long, or too much effort, you’ll probably give up, and just use what you remember. The article won’t be as good, and your book (“I Really Like Eggs”) won’t sell very well.

Long Term Storage

You’re not storing stuff away to access next week. You might not need some of this stuff for years. That gives you two problems.

Safety

You need the notes to stay safe for some time. If you lose everything when your hard drive dies, or the paper notes go mouldy in a damp drawer, you’ve lost some important value.

Long-term Accessibility

Shouldn’t be a problem for paper, but if you store everything in computer files, you need to be able to access them when you need them. Will the file format you’re using be readable by the computer you’ll be using in ten or twenty years?

Solutions Writers Have Used

I don’t know how most writers approach this, though I’m interested to hear. A couple I do know about…

Chris Bidmead

I’m fairly sure it was Chris Bidmead who wrote about this a long time ago in PCW Magazine (UK computing magazine). He used plain text files, as they’re the only almost universal format. The filenames were just dates with a serial number appended. The first line of each file was in a specific format, with the source, title, and keywords in it. He then had scripts that could search all the files for anything matching specific sources, keywords, etc.

I remember him mentioning at the time that he’d kept the years as one digit, because the system was only designed with a ten-year lifespan in mind. Not sure what he did when he hit that limit. Could be easily enough expanded by using a folder for each decade, I suppose, or just renaming the files.
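As a rough sketch of the kind of script described here, the Python below reads the first line of each note and filters on one field. The bracketed header layout follows the form Chris gives in the comments ([Title of the piece][magazine][date][author]); the function names, the .txt-only glob, and the case-insensitive matching are assumptions for illustration, not his actual scripts.

```python
import re
from pathlib import Path

# Field order follows the header form quoted in the comments;
# the field names themselves are illustrative.
FIELDS = ("title", "magazine", "date", "author")


def parse_header(line):
    """Split a '[a][b][c][d]' header line into a dict, or return None."""
    values = re.findall(r"\[([^\]]*)\]", line)
    if len(values) != len(FIELDS):
        return None
    return dict(zip(FIELDS, values))


def search_notes(folder, field, wanted):
    """Yield (path, header) for notes whose header field contains `wanted`."""
    for path in sorted(Path(folder).glob("*.txt")):
        # Only the first line is read, so large notes stay cheap to scan.
        with path.open(encoding="utf-8", errors="replace") as handle:
            header = parse_header(handle.readline())
        if header and wanted.lower() in header[field].lower():
            yield path, header
```

Something like list(search_notes("notes", "magazine", "PCW")) would then list every note filed from that magazine, without opening the body of any file.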

Chuck Palahniuk

I’ve just read Chuck’s ‘Stranger Than Fiction’, and he mentions his filing system in there – a wall of filing cabinets he stores everything in. It’s in the context of receipts, though, so I don’t know if he uses something similar for notes too.

Bill Bryson

Bill is probably my favourite fact-based author, and I don’t know how he approaches this problem. He seems too convinced that his computer hates him, though, for it to seem likely that he entrusts it with storing all of his notes. I could be wrong. There are a few hints in Notes from a Big Country – he mentions scribbled notes on bits of paper, and storing notes in a file labelled ‘Absent Mindedness’, so paper filed by subject matter would seem to be it.

My Thoughts on the Problems

Quick Storage

Most of the stuff I find these days that I might want to store probably comes at me through my computer, so that seems the most sensible place to put stuff for speed’s sake. Printing it out for paper storage would add time and effort. If most of your stuff comes at you printed, you might be pushed in the other direction by the extra time needed to scan or grab digital snaps of the items.

Time needed to store things depends on how it’s stored, and in what formats. Copy and paste should work pretty well to a text editor to store plain text. Lots of apps now can export HTML. Getting stuff into an application like Microsoft OneNote is pretty easy – just copy and paste. Custom-built databases for this sort of purpose might make things even easier.

Quick Retrieval

Full-text search is probably quite important here, in whatever solution you choose. I think it would be important also, though, to be able to do just a keyword search. So you’re not scanning through every article you’ve ever snipped that mentioned eggs, just the ones about eggs.
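To make the difference concrete, here is a minimal sketch in Python. The two notes and their keywords are made up for the example, and keeping notes in a list of dictionaries is just an assumption to keep it short; a real system would read them from files.

```python
# Hypothetical notes: each has explicit keywords plus its body text.
NOTES = [
    {"title": "Boiling times", "keywords": {"eggs", "cooking"},
     "text": "Soft-boiled eggs take about four minutes."},
    {"title": "Cake recipe", "keywords": {"baking"},
     "text": "Beat two eggs into the flour and sugar."},
]


def full_text_search(notes, word):
    """Return titles of notes that mention `word` anywhere in the text."""
    word = word.lower()
    return [n["title"] for n in notes if word in n["text"].lower()]


def keyword_search(notes, word):
    """Return titles of notes explicitly tagged with `word`."""
    word = word.lower()
    return [n["title"] for n in notes if word in n["keywords"]]
```

Here full_text_search(NOTES, "eggs") returns both notes, because the cake recipe happens to mention eggs, while keyword_search(NOTES, "eggs") returns only the note that is actually about them.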

Long-term Storage

Safety

If you’re using paper, I guess your enemies here are fire and water. For computers, just be very sure of your backups. Enough good backups can protect against pretty much anything, as long as you keep copies off site.

Long-term Accessibility

This is a bit of a thorny point with computers. Your paper shouldn’t go out of date in ten years. Your computer files might. If I store everything in Microsoft OneNote, will the data still be accessible from whatever computer I’m using ten years from now? Will Microsoft still make OneNote? If they don’t, will the copy I have now still work under Windows MegaSplendid 2015, Ultimate Wicked Edition? I have no idea, and I have no way of finding out. Microsoft don’t know either. If OneNote doesn’t sell, they’ll stop making it. If it sells well, they’ll keep updating it, and make sure it upgrades the data with each new version. My data access is being governed by market forces.

PDF files seem pretty safe right now, but will they still be in twenty years? Plain text is pretty much certain to still work, and I doubt HTML will become unreadable any time soon, but there are limits to what data we can put into these formats.

If I wasn’t so anal about it, I’d probably just dump everything into OneNote and stop worrying. But I am, so I do worry. Reducing everything to text adds work, though, which damages our Quick Storage, and makes it difficult to store pictures.

Possible Solutions

OneNote

OneNote wouldn’t even be here, except that I use a tablet PC. OneNote goes well with tablets. At the touch of a button, a nice new window will pop up, sitting on top of whatever else you’re doing, for you to drag stuff to, or make a quick scribbled note with the tablet’s pen. That makes a big difference to your Quick Storage. You can search pretty quickly and easily too, though you can’t really limit it to keywords. The clever part is that you can scribble your notes in ink with the pen, and it can still search those notes. It reads your handwriting in the background, so it knows what the text says.

The problem, though, is the question of how long the data will last. If we’re looking for a solution that will last for ten or twenty years (or more), there’s no way of knowing if it will do the job. And if not, getting five years’ worth of data out of it all at once five years down the line could take some time. If there’s any chance of you switching platforms somewhere down the line (to Mac or Linux, say), then OneNote is unlikely to ever work on them, though if the much-rumoured Mac tablet ever actually happens, Microsoft may be tempted to port OneNote to it.

Plain Text Files Only

The safest option for the computing paranoid. Can be easily moved from system to system – no problem if you move to Mac or Linux, or whatever else should come along. Should certainly be readable in ten or twenty years.

You’re limited in what you can store – no pictures, no ink notes on the tablet, which does have an effect on how quickly you can store stuff.

Limited File Types

You could choose a limited set of file types you allow yourself to store, and store your data there. To have a consistent way of keeping them dated and tagged with keywords, you could just use a defined format for the filenames – say “YYYY-MM-DD_Note Title_keyword keyword keyword_Source of Note.ext” or something similar.
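A filename like that is easy to pull apart again. The Python below is only a sketch of one way to do it: it assumes underscores never appear inside the title, keywords, or source, and the function name and the example filename are invented for illustration.

```python
from datetime import date
from pathlib import Path


def parse_note_filename(filename):
    """Split 'YYYY-MM-DD_Title_kw kw kw_Source.ext' into its parts.

    Assumes no underscores inside the title, keywords, or source.
    Raises ValueError if the name does not fit the pattern.
    """
    path = Path(filename)
    parts = path.stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    stamp, title, keywords, source = parts
    return {
        "date": date.fromisoformat(stamp),   # rejects malformed dates
        "title": title,
        "keywords": keywords.lower().split(),
        "source": source,
        "type": path.suffix.lstrip(".").lower(),
    }
```

For example, parse_note_filename("2005-11-30_Egg Sizes_eggs farming_Wikipedia.txt") gives back the date, the title "Egg Sizes", the keywords eggs and farming, the source, and the file type, which is enough to filter a folder listing by keyword without opening any files.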

You can make this as flexible as you like, or as reliable as you like, depending on how many file types you allow.

Plain Text

Certainly should be allowed – simple and reliable, and readable on anything.

HTML

The success of the web means that HTML is readable by almost any computer, and should be reliably readable for plenty of years to come.

JPG

Jpeg images should be as safe as HTML.

Windows Journal

Relevant to tablet users – comes with tablet editions of Windows, so is pretty sure to be in the next version of Windows at least, but could be dropped at any time. Can export to HTML or TIFF images, but this would have to be done for each file, one at a time. Unlikely to be readable in anything other than Windows unless it catches on a lot more than it has so far.

Microsoft Word

Word is probably a settled enough application that its files will still be readable for a long time to come. Readable on Mac or Linux using OpenOffice, if platform-independence is important to you.

Purpose-designed Database

Although it’s not something I’ve really looked into, there are databases specifically designed for storing all these snippets of information for writers. The problem with these is that you would have to be pretty sure that the publisher will still be updating it for as long as you need it to work. I don’t know any of them that well. It also doesn’t seem to be such a unique thing to need to do as to require a special program writing.

Conclusions

I’m interested to hear how other people have tackled this same problem, but my own temptation is to stick to a few file formats with carefully defined filenames to make the searching easier. As a tablet user, though, the temptation of ink is strong. I think maybe OneNote and Journal are great for transitory stuff – to-do lists, making quick notes, planning and writing articles (I’m writing this in OneNote now) – but less useful for any sort of long-term storage.

I think I’ll probably go for lots of files, with just certain file types ‘allowed’ (though it’s my rule, so I can change it) and using a filename that includes keywords to make searching quick and easy.

Any ideas?

External Related Articles

27 thoughts on “Storing Nuggets of Information”

  1. A few things come to mind:

    Other writers: From his essays, I twigged that Martin Gardner kept drawers of index cards, meticulously cross-indexed, with relevant articles or snippets from his reading paper-clipped to them. He’d draw on these when writing his books/essays.

    The New Yorker magazine also had a legendary cross-indexed 3×5 index card catalog of the magazine’s contents going back to the founding. Their insurance company identified the index cards as a risk, which led them to move to a database, and then to scan in the issues, and then to release the magazine’s contents on DVD (I’m getting them for Christmas). The 3×5 card system has now been abandoned. (Read this in a NY Times article and an interview on NPR.)

    Journalist James Fallows (who worked with Msft on the development of OneNote, I think, esp from a journalist perspective) is a computer buff from way back. He touted the use of old DOS programs like Grandview (outliner program to help him organize his stories) and Lotus Agenda (“a spreadsheet for words,” which had pretty amazing natural-language processing of text on the fly – Google on that and breathe in the nostalgia). He used Agenda to collect snippets of everything, create categories and views on the fly, and essentially keep track of his research and notes.

    Nowadays, he uses Brainstorm and Mindmanager, and who knows what all.

    The novelist Robertson Davies kept a writer’s notebook of ideas, characters, etc (near to my heart as a writer). He numbered each page, and each entry on a page got a letter. When it came time to write a novel, he noted that entries 9F, 10A, and 12B related to a single character, and he drew the threads together that way.

    I’ve also had (and have) the info-packrat disease, which fueled my purchase of Agenda, Infoselect, Ecco Pro, and god knows how many others.

    The computer columnist Jim Seymour wrote somewhere, and it made an impression on me, that there is information that likes to be structured — by chronology, by someone’s name, by the alphabet, by location, by function, by program name, whatever — and then there is loose info that you can’t define a container for YET, but that you can’t bear to lose. This has caused me sleepless nights and I debate its core usefulness to me, often.

    The 43Folders post on living inside a single text file inspired me to try again at home with Notetab (Windows text editor). It has a simple structuring facility it calls an outline, but which is simply a flat list of topic headings on the left, and the text on the right. I’ve found I prefer the flat headings to hierarchical; they remind me of keeping notes in my Palm Memo (ie, “Books/Loaned to,” “Books/Library,” etc). It’s also like spreading everything out on a table so I can scan it quickly; nothing is hidden underneath another topic; everything is on the surface.

    Lately, I’m trying to bookmark less often, save info less often, UNLESS I have a specific project in mind. In that case, I create the folders/structures to contain that info and the info naturally adheres to it.

    At work, I use a dead-simple program called Electric Notebook (http://lincoln.midcoast.com/~ian/notebook.html), a very personal (ie, idiosyncratic) program with few of the amenities of OneNote, except that it can sit open all day, I type stuff in as it occurs to me, with (I hope) the right keywords, and then I search on it as I need to. Which is never as often as I think. It’s an electronic logbook, basically. It’s based on just keeping stuff chronologically, but in a rough-and-ready fashion. I find that it’s dumbed-down enough to suit my simple needs very nicely. I find, though, that I use it at home less than I use Notetab.

    For structured info at work, I use an OpenOffice Writer document to simulate Word’s Document Map function (which is similar to Notetab’s outline function — is there a pattern??). This document is called “infoindex” and holds various Unix commands, checklists, timecard chargecodes, etc., that demand to be stored and used as reference, not stuff that’s part of the passing scene. Stuff I input into Notebook that’s worth remembering or referring back to more than once gets migrated to the infoindex.

    I find this two-pronged approach works well for me. Electric Notebook for unstructured info, Infoindex for structured info. And it’s a simple enough process that I can use it when I’m distracted or under the weather.

    I would also refer you to the c2.com wiki’s entries on LogBook (http://c2.com/cgi/wiki?LogBook) and ElectronicLogBook (http://c2.com/cgi/wiki?ElectronicLogBook).

    Sorry for the long post! But this is a big interest of mine.

    Mike

  2. I’ve struggled with this for years. One of the first applications I bought for my Macintosh in 1985 was FactFinder. It was a slick little database for storing and retrieving text notes.

    What served me well from 1988 to 2004 was a small, spiral-bound notebook made by DayTimer. The model number is 98160. You can see them here

    This little book had several distinct advantages.

    1. Spiral bind lays flat.

    2. Pages are numbered (68 pages per book).

    3. Small enough (3.5 x 6.5 inches) to fit in a pocket.

    4. Fit beautifully in my Filofax.

    I would write the date at the top of each page and then write all of my notes for that day. Some days ran to several pages. Everything went in there: phone numbers, notes from phone calls, ideas, quotes from articles or speeches or conversations. I usually filled up 2 or 3 books each year. One year I filled 5.

    I think I got the idea from one of Jerry Pournelle’s BYTE columns.

    http://www.byte.com/art/9601/sec13/art1.htm

    For a while I kept a simple database of the books’ contents. Wished I’d kept it up.

    (Edited by pigpogm – just removing a long URL that was breaking the page flow – replaced it with a link.)

  3. Thanks for all the comments…

    Jim: XML is a step in the right direction, but I don’t know how much it helps practically – the application still has to be able to understand the specific markup used in the XML file, doesn’t it? Certainly better than anything binary, though – at least the text should all be readable and understandable, and in some way parseable.

    Rob: Google desktop certainly can help with the searching, but doesn’t help with file formats becoming unsupported over the years. I ended up uninstalling it, as it seemed to be taking too many resources, but the searching was quite impressive.

    mc: Mail.app stores everything as text? Nice. I assume it uses one of the standard old Unix mail formats, then, which should remain fully supported for a long time, and at least searchable and readable as text pretty much indefinitely. Sounds like quite a nice setup you have there – the Newton still has many fans.

    Steve: I only say HTML slightly less than text, as the markup does change a bit over the years, but you’re probably right – old HTML 2 files should still work just fine in any modern browser, and they’re still just as searchable and readable in a text editor. I thought current versions of Word would still open pretty much any previous version of Word’s files, along with WordPerfect 5.1, WordStar, and the like, but I have to admit, I’ve not tried it. Reopening and saving in new versions would be a lot of work each time, but would keep things up to date. I’d have thought JPEG should remain pretty much safe, since the web is so heavily dependent on it – can’t see JPEGs becoming unsupported in the next ten or twenty years. Not too easy to search for text, though – next version of OneNote will do it. You make a good point about CDs and DVDs, though. I’d probably keep everything on hard disk, though, with CDs and DVDs for backups. We’re probably not talking about that much data, but if we’re including such things as digital camera snaps, it could certainly add up to a lot.

    All: Sorry for the formatting problems in the comments – it seems to have been eating line breaks since upgrading WordPress – grabbed a new version of the MarkDown plug-in, and that seems to have fixed things. Thanks again for the input.

  4. “Limited File Formats” a slight quibble.

    HTML should be as good as text, provided you keep the markup extremely simple. It’s only text after all.

    I personally prefer HTML over text as it is structured but simple. And as text, it is searchable by command-level tools.

    jpeg? not nearly as durable as HTML since HTML is as good as text.

    MS Word? If you look back over the history of Word, you will find Microsoft changes the file format of .doc almost every version of Word. And backwards compatibility is lost every 2, 3 versions. This is extremely unlikely to endure.

    The only truly ‘safe’ way to keep documents readable, is to maintain your entire archive in current formats. Using MS Word as an example, then, you would upgrade Word/Office with each new version (at least every other version) and re-open all your old Word documents, saving them under the new format, and re-archiving.

    Ditto with jpegs and any other binary file formats.

    This approach has the added benefit of keeping your store of hard-drives, CDs or DVDs, relatively ‘fresh’ – though it does add a lot of effort to the whole process. For that reason, I’d say a yearly ‘archive refresh’ would be smart policy. Probably shouldn’t take more than a weekend?

  5. Thanks for the article. I have the original tablet, a Newton 2100 which compares most excellently with the travesty of an Acer tablet that my wife’s work gave her. (Sorry, don’t mean to start a war on that account, so ignore it, please.) The Newton files I have can easily be sent by mail, where they are stored in my Mail.app folders – text files, standards readable, etc. I do the “yyyymmdd title keyword” naming system, in addition. When not writing on the Newton, I use plain text where possible. All this gets merged together into searchable files, and it works great. Plain text archives, readable in any format, linked together with OS level search and archiving. Works for me…

  6. You could store it using any application then use Google desktop to find it.

  7. The Commonplace Book

    Over at the PigPog Blog is a great post about Storing Nuggets of Information, calling for ideas. This is something I’ve been struggling with for many years myself, and have only lately been making any sort of headway. When I think about all the years …

  8. Good problem statement. One of the best tools I’ve come across for this is an app called Tinderbox. Unfortunately, it’s only available for the Mac, although a Win version has been promised – for over two years. Data is in XML format, which, while not a guarantee of future accessibility, at least increases the probability. Something to consider in whatever solution you select.

  9. Wow, thanks, Mike. That’s a lot of information. It’s fascinating to know some of the history behind some of this stuff – I assume you (and most other people) have come from Douglas’ article on DIYPlanner, where he discusses Commonplace Books?

    Not seen Brainstorm. Mindmanager is a program that you can’t seem to help being repeatedly told about when you have a tablet PC, but the price is just way too much for me to consider. I’ve never tried Agenda, but it’s certainly a legendary app – Mitch Kapor’s work, wasn’t it, in the early days of Lotus? Vaguely heard of Infoselect, but don’t know much about it. Ecco Pro still has its fans, and it’s distributed free now, which always makes things more popular.

    The 43 Folders post you mention was an inspiring one. It did tempt me for a while, but plain text doesn’t really go well with a tablet. Admittedly, I was thinking then about lists of current information, which has very different requirements from reference material stored away in case it’s needed years from now. GTD lists and the like don’t need to be in a format that’s readable in ten years, as long as they’re readable for ten days or weeks.

    Thanks again for all the info. I’ll probably do a follow-up article soon to this, taking in some of what I’ve learned in the comments and from Doug.

  10. Hi Michael — Yes, I got to this topic from DIY, though your site has been on my radar for quite a while and I check in occasionally. I tried PigPog for a while, didn’t click with me, but it helped me think through how I personally process info and tasks, and I re-read the post from time to time.

    I’ve always been a fan of commonplace books, don’t know why. I keep a Word file that I dump them into, and at end of year I print it out and put it in a “Commonplace” folder; the folder also holds hard copy I come across that I want to preserve.

    See, information packrat 🙂

    I bought Brainstorm and have tried it a couple of times, but it also doesn’t click with me. I’ll probably try it again; I like trying out idiosyncratic programs made by developers at home. Notebox Disorganizer is another oddity; the UI is basically a spreadsheet grid, but each cell is a cubbyhole in which you can dump your information. The Editorium newsletter had a neat description of how he uses it; scroll down to “Resources.”

    Mindmaps are more fun to hand draw and noodle with, IMO, than the software-based ones. Too much cognitive overhead and time spent getting it just right on the puter, when a good-enough handdrawn one will help sort out your thinking.

    There’s also Evernote, if you’ve not tried it. It’s been getting some good buzz.

    Yes, Agenda was Kapor’s brainchild, and he’s now working on something called Chandler, supposedly another info mgmt tool. Agenda still has quite a loyal following.

    So much software, so little value from so much of it. I wonder if, in a world of less software meant to save time and improve my life, I’d have read more books.

    I think software is sometimes best used for a specific project or purpose, not as something to live in. That’s why I like the idea of the single text file approach — Google has taught us that categorization is not vital if you have full text search. And there’s little in my personal life that requires the full categorization that I need in my workplace.

    Still, I’m one of those people who like to file and make categories, so it comes naturally to me. I remember something I read a long time ago, that humans (esp computer people) tend to leap for the complicated solution first, thinking of all the exceptions that have to be trapped, and so on. In reality, a good-enough system will probably work and you only should handle exceptions as they arise.

    This is why I’ve drifted away from all-in-one software solutions, because I find I tend not to think of them as easy to use as a pencil or a text editor. (I daresay PigPog is an attempt to simplify GTD in the same way.) I also think that’s the value of the weekly review, to refresh those brain synapses about what’s out there. You can’t remember everything, but if you can remember where you put it, then that’s just as good. As the Extreme Programming guys say, do the simplest thing that could possibly work.

    You probably read/heard about the researcher who used DevonThink as his commonplace book/dumping ground for bits of text. He had an assistant type in lots of stuff and then Devon searched around and made unusual connections the writer would not have thought of. But the time cost of doing something like that is prohibitive to me. And as you say, what if the software never progresses (like Agenda or Ecco)?

    Sorry for another long post! I find this kind of discussion hits on things I’ve tried to figure out in my own life/work.

    All best — mike

  11. I have tried dozens of programs for info-keeping on the PC and settled on TexNotes from GemX. However, my “real” commonplace book is a Moleskine notebook. Yep….pen and paper….hard to beat.

  12. The Commonplace Book: Part I

    Over at the PigPog Blog is a great post about Storing Nuggets of Information, calling for ideas. This is something I’ve been struggling with for many years myself, and have only lately been making any sort of headway. When I think about all the years …

  13. I use topic tagging, and the same tagging for my folder system for storage, both virtual and physical. What I finally settled on was the Library of Congress system, but smushed and shortened for my convenience.

    Example:

    Z-Lib

    Z-Lib_Cataloging

    Z-Lib_Typography_Fonts

    Z-Lib_Work

    The fonts category has a symbolic link to it from N-ART_NE_Print, since that would be the other place to find that kind of information in an academic library like the one I work in.

    Another thing that’s helped is having an AA Media and AA Programs folder/tag, that contains all the other tags. That way my media and programs are always separate, and in my main organization scheme it takes less than three tries to find the information I stored.

    You can see an example of it here at my LibraryThing catalog. http://www.librarything.com/work/2418640&book=216333

    While I found it’s great for books, it’s also great for storing nuggets of information, as long as I remember to tag it with every tag I’ve created that I think is relevant.

    It’s been working wonderfully for me. It’s text based and folder based, so should work on any computer system. I use Ubuntu Debian GNU/Linux, which has a great finder program called “Beagle”, but I found that didn’t help me with my physical notes and files. The system I devised has been working great for me.

  14. I remember him mentioning at the time that he’d kept the years as one digit, because the system was only designed with a ten-year lifespan in mind. Not sure what he did when he hit that limit.

    I recall this too, but slightly differently. It was the month, not the year, that I reduced to a single (hex) digit so that I could keep the filename prefix down to 8 chars, allowing me three leading chars for descriptive purposes, and reserving the three char suffix for defining filetype or some other parameter like magazine. Eg: SEX11A89.PCW. As you say, all these files were .txt files, at least initially, although later I mixed in .rtf files as well. Any “time limit” on this scheme would have been because I foresaw the eventual removal of the damnfool restriction to 8.3 chars per filename imposed by MSDOS.

    Incidentally the first line header took this form:

    [Title of the piece][magazine][date][author]

    … which allowed these fields to be parsed out by a variety of different text processing utils. This part of the scheme is something I still use today.

    — Chris

  15. Thanks for the clarification, Chris. I do remember the first line being special information, which was pulled out and processed by a bit of Unix scripting, and I think the filename limit was just so you could move the whole thing to any other system without having to chop filenames down, so you limited it to DOS/ISO9660. The month was certainly hex, because I picked up the habit for a while of using the same thing, which annoyed anyone attempting to read those dates 😉

    Shame I don’t still have the back issues to be able to check, but after keeping a couple of years of PCWs, the house foundations were starting to crumble, so we had to hire a low-loader and a crane, and have them removed. They don’t make magazines that thick any more.

    I vaguely remember, though, some mention of the limit being relatively short, but that you were confident that technology would catch up a bit by then.

    Anyway, ‘SEX’ and ‘PCW’? Did I miss an issue? Actually, you could never tell with Michael Hewitt’s column.

    Thanks for those great columns, though – it says a lot that they are still inspiring discussion all this time later. Now it feels like we’re having the world’s slowest conversation.

  16. Pingback: BrownStudies : Storing Nuggets of Information

  17. “To have a consistent way of keeping them dated and tagged with keywords, you could just use a defined format for the filenames – say “YYYY-MM-DD_Note Title_keyword keyword keyword_Source of Note.ext””

    I was just wondering whether there’s a special reason for this file name layout. Could you explain your reasons in more detail, and say whether you’re currently using this or any other scheme? Regards.
