Reintegrating “memex”

2017-03-08

At the moment, the codebase that runs https://hypothes.is is divided into two parts:

  1. An installable package, memex, is responsible for annotation persistence and management, and provides the annotation-specific parts of the API (/api/annotations, /api/search). For historical reasons it’s also responsible for the hypermedia-ish API root view.
  2. An application package, h, which does everything else, including managing the static assets, accounts system, groups, administration pages, websocket server, activity pages, and so on.

memex was made a separate package with a view to releasing it independently of the rest of the h application for reuse by others, on the basis that:

Although we have not yet released memex as an independent package, it is my view that we should reintegrate the code that is currently part of memex into the main h codebase.

People aren’t trying to use our code to run competing services

So far, everyone who has chosen to use our server-side code has done so because they have policy or security restrictions that prevent them from using our public service, and not because they are building an alternate service.

These people are investigative journalists and others who do not want to (or cannot) have their content on our servers. The other annotation services will come in time, but it’s probably too early to be worrying about them.

Integrating memex isn’t trivial, anyway

In practice, integrating memex (even a hypothetical documented and released version) is, in most cases, substantially harder than learning to deploy and customise h. Even more importantly, customising h is currently made substantially harder by the need to understand the distinction between h and memex.

Apart from the extension points that we provide in the code, the integration between memex and a hypothetical integrator’s code is through the mechanism of Pyramid modules (memex is a Pyramid module). Given the small market share that Pyramid has in the Python web development world, we can expect this concept to be new to most developers choosing to integrate with memex.

The fact that memex doesn’t provide an accounts system was (we thought) a feature, but in practice it’s probably a bug. If integrating memex means building your own Pyramid-compatible accounts system, then integrating memex is a lot of work.

Existing integrators of h have in practice chosen to use the existing accounts system as-is, or with the small modification that authentication can be done by a layer above h itself.

Changing templates and branding doesn’t seem to be a huge blocker

Not one person who has sought to use our backend code has asked how to change the theming or branding of the service. It seems we can happily ignore this problem until someone asks about it.

Database migrations… ¯\_(ツ)_/¯

Asking people to integrate with memex rather than h is asking people to integrate with a system which:

It doesn’t actually work with our client

Our client, in practice, depends on functionality implemented in h (such as the profile endpoint and, shortly, flagging and moderation endpoints).

We should endeavour to make as much of this new functionality orthogonal and optional (so that not every service has to implement every feature). But in practice most reusers of our software will want this functionality, and won’t want to have to build it again themselves.

Summary

I don’t believe that there is a strong case for continuing the attempt to fully separate memex from the rest of h.

I believe we can substantially reduce the cognitive overhead needed to understand the h codebase by integrating memex models into h.models, memex views into h.views, and so on.

We should and will continue to use good programming practice to minimise the complexity of the dependency relationships between different parts of the code. Integrating memex does not mean we should replace flexible extension points with hard-coded behaviours, especially where those extension points serve as useful abstractions that make it easier to think about components of the codebase in isolation.