Paradigms of Information Retrieval


What a fancy title for a simple concept.

I think of information as organizable in 3 ways, ordered by increasing flexibility:

  • Folders
  • Tags
  • Search

These methods can and do overlap.

Folders

Information is stored in a hierarchy and is inherently ordered.

This is good because you get efficient iteration and a clear hierarchy.

This is great until it isn’t. Not everything is clearly orderable. Sometimes there is more than one natural order.

Folders are analogous to arrays in programming languages (or, once you nest them, to trees).
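As a rough sketch of what "efficient iteration and a clear hierarchy" looks like in code, here is a made-up folder tree stored as nested dictionaries. The names `tree` and `walk` and the file names are invented for illustration. Every item has exactly one place, so walking the tree always visits items in the same order.

```python
# Hypothetical folder tree: a dict is a folder, a string is a file's contents.
tree = {
    "notes": {
        "math": {"groups.txt": "compactness", "surfaces.txt": "minimal surfaces"},
        "todo.txt": "buy milk",
    },
    "readme.txt": "start here",
}

def walk(folder, prefix=""):
    """Yield every file path, visiting names in sorted order at each level."""
    for name in sorted(folder):
        path = prefix + "/" + name
        if isinstance(folder[name], dict):
            # A nested dict is a subfolder: descend into it.
            yield from walk(folder[name], path)
        else:
            yield path

paths = list(walk(tree))
```

Note that `groups.txt` can only live under one folder here, which is exactly the limitation described above.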

Tags

Information is tagged with keywords or some other map.

Tags are analogous to maps in programming (a.k.a. hash tables, dictionaries, hashmaps, etc.). Each tag maps to the items that carry it, so tags inherit the pros and cons of maps.

Tags are, above all, flexible. You can come up with any sort of mapping scheme, and even implement some sort of ordering (for example, by tagging each piece of info with an integer, which amounts to an array).
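A minimal sketch of tags as a map, assuming a few made-up items. The names `items`, `by_tag`, `with_tags`, and `position` are invented for illustration. The last lines show the integer-tag trick for recovering an order.

```python
from collections import defaultdict

# Hypothetical items, each carrying a set of tags.
items = {
    "groups.txt": {"math", "topology"},
    "surfaces.txt": {"math", "geometry"},
    "todo.txt": {"personal"},
}

# Build the map from each tag to the items that carry it.
by_tag = defaultdict(set)
for name, tags in items.items():
    for tag in tags:
        by_tag[tag].add(name)

def with_tags(first, *rest):
    """Return the items that carry every one of the given tags."""
    result = set(by_tag.get(first, set()))
    for tag in rest:
        result &= by_tag.get(tag, set())
    return result

# Ordering via integer tags: give each item a position and sort by it.
position = {"todo.txt": 0, "groups.txt": 1, "surfaces.txt": 2}
ordered = sorted(items, key=position.get)
```

Unlike the folder case, one item can show up under as many tags as you like.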

Their disadvantages are specific to the mapping scheme used, but in a technical sense, they’re strictly more capable than folders, since folders are isomorphic to a specific tagging scheme (for instance, tag each item with the path of the folder that contains it).
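One way to see the "folders are a tagging scheme" claim is to tag each item with every folder on its path. This is only a sketch of one possible encoding, and `folder_tags` is a made-up helper name.

```python
def folder_tags(path):
    """Turn a file path into one tag per enclosing folder."""
    # Drop the file name itself; keep only the folder components.
    parts = path.strip("/").split("/")[:-1]
    return {"/" + "/".join(parts[: i + 1]) for i in range(len(parts))}

tags = folder_tags("/notes/math/groups.txt")
```

Here `tags` holds `/notes` and `/notes/math`, so asking for everything tagged `/notes` gives back the contents of that folder, subfolders included.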

Search

This method is only possible because modern computers are really fast. Rather than organizing information, you just search for what you need.

You can impose structure on your data (such as tags and folders), and either use or ignore it when searching.
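To make "use or ignore it" concrete, here is a small sketch where the tag filter is optional. The data and the `search` function are invented for illustration.

```python
# Hypothetical notes, each with free text and a set of tags.
notes = [
    {"text": "compactness of the classical groups", "tags": {"math"}},
    {"text": "grocery list for the weekend", "tags": {"personal"}},
    {"text": "minimal surfaces and soap films", "tags": {"math"}},
]

def search(query, tag=None):
    """Substring search over the text; the tag filter applies only if given."""
    return [
        note["text"]
        for note in notes
        if query in note["text"] and (tag is None or tag in note["tags"])
    ]

everything = search("the")            # ignores the tags entirely
only_math = search("the", tag="math")  # uses the tags to narrow the result
```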

There’s not much to say here. We’ve all used Google, and the programmers among us have tried grep.

Search is analogous to a really fast computer that makes a lot (but not all) of data structure optimization pointless.

When the data gets big, you need to start imposing some structure to efficiently find anything.
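A sketch of the difference, under the assumption that documents are short strings and queries are single words. `scan` reads every document on every query, the way grep does. `lookup` uses an inverted index, which is one common way (not the only one) to impose structure up front. All names here are made up for illustration.

```python
from collections import defaultdict

# Hypothetical document collection.
docs = {
    1: "folders store information in a hierarchy",
    2: "tags map keywords to information",
    3: "search finds information without organizing it",
}

def scan(word):
    """Brute-force search: look at every document on every query."""
    return {doc_id for doc_id, text in docs.items() if word in text.split()}

# Inverted index: built once, maps each word to the documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def lookup(word):
    """Indexed search: one dictionary lookup instead of a full pass."""
    return index.get(word, set())
```

Both functions return the same answers. The index costs extra memory and has to be kept up to date, and in exchange each query stops depending on reading the whole collection.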

Errors

If you find any errors, or if I’m wrong, or you want to tell me something, feel free to tell me.


November 2, 2023
