Preface

The following pages describe the architecture of InvenioRDM and the Invenio Framework from a high-level point of view. They are meant to describe and expose the higher-level structure, and to help guide how code should be organized.

How did we arrive at the current architecture?

The perfect architecture

The overarching goal of the architecture is similar to that of any other software system. We want a software system that's easily maintainable, scalable, extendable, adaptable, resilient, and ...insert your other favorite buzzwords.

Reality

The reality is that there are many methodologies and patterns for building and architecting software systems. In practice, however, while methodologies are useful, it is often more about tradeoffs and finding the right balance than about strictly applying a specific methodology. Most of the time you have to deal with deadlines, requirements, design patterns, costs, legacy code, people, projects, prior history and practices.

Evolving

InvenioRDM is no different. The architecture is largely a byproduct of our past experiences and challenges we've faced. The architecture as described here is not meant to be a final answer, but rather an evolving architecture that adapts and improves over time. You also won't find the answer to all your questions in the following documents. As we work with the architecture, we identify shortcomings, missing things and concepts that could be better defined.

Past experiences and challenges

Some of the experiences and challenges we faced:

High developer turnover and many juniors: Because of CERN's organizational framework and contract policies, we've often had a high turnover of developers and many juniors. This means we must focus on good onboarding and great documentation to get developers to a level where they can efficiently contribute and develop high-quality software.
Spaghetti code: We have had our share of "data massaging", type conversions and fluffy requirements which all led to big balls of mud and interdependency hell.
Bad design choices: We've obviously sometimes made bad design choices and learned from our mistakes.
Recovering from failures: We've had to recover from some spectacular database crashes and file loss incidents on big distributed storage clusters. These helped us understand which features were helpful and which features we wished we had had.

By no means have we solved all of these, and any software project out there is likely facing the same issues.

Why not X?

InvenioRDM is a monolithic application using something as old as a relational database system, so we sometimes get asked questions like: Why not use microservices? Why not serverless? Why not use NoSQL? Here is an attempt to give some vague answers.

Meta

First, hardly anything ever starts from scratch. Even when you start a new project, the people and organizations involved bring a long history.

Microservices

Microservices themselves are not a substitute for an architecture. They are simply another way of tying different systems together. Part of the complexity at the software level is moved to the infrastructure level, but regardless of where the components of the system reside, they still have to communicate and have clear boundaries. You can build good and bad architectures with both monoliths and microservices (search for "microservices death star" for some examples).

NoSQL

SQL databases have been around for the past 40 years and are often highly reliable systems. Most NoSQL systems, on the other hand, have been around for a much shorter period and do not provide the same reliability. On top of that, InvenioRDM uses a hybrid approach: the primary data is written to the relational database, while a secondary copy is kept indexed in a NoSQL system for faster reads.
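The hybrid approach can be illustrated with a minimal sketch. This is not InvenioRDM's actual code: the `RecordStore` class and its methods are hypothetical names, an in-memory SQLite database stands in for the relational database, and a plain dictionary stands in for the NoSQL index. The point it tries to show is only the division of roles: writes go to the relational database first, reads are served from the secondary copy, and the secondary copy can be rebuilt from the primary data.

```python
import json
import sqlite3


class RecordStore:
    """Hypothetical store: SQL holds the primary data, the index is a derived copy."""

    def __init__(self):
        # Assumption: in-memory SQLite stands in for the relational database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE records (id TEXT PRIMARY KEY, json TEXT NOT NULL)"
        )
        # Assumption: a plain dict stands in for the NoSQL index.
        self.index = {}

    def create(self, record_id, data):
        # Write the primary copy first; the connection context manager
        # commits on success and rolls back if an exception is raised.
        with self.db:
            self.db.execute(
                "INSERT INTO records (id, json) VALUES (?, ?)",
                (record_id, json.dumps(data)),
            )
        # Only after the commit succeeded, update the secondary copy.
        self.index[record_id] = data

    def search(self, field, value):
        # Reads are served from the index, not from the database.
        return [rid for rid, doc in self.index.items() if doc.get(field) == value]

    def reindex(self):
        # The secondary copy can be rebuilt from the primary data at any time.
        rows = self.db.execute("SELECT id, json FROM records")
        self.index = {rid: json.loads(raw) for rid, raw in rows}


store = RecordStore()
store.create("rec-1", {"title": "Dataset A", "type": "dataset"})
store.create("rec-2", {"title": "Paper B", "type": "publication"})
print(store.search("type", "dataset"))

# Simulate losing the index, then rebuild it from the database.
store.index.clear()
store.reindex()
print(store.search("type", "dataset"))
```

In this sketch, losing the index costs nothing but a reindex, because the relational database remains the single source of truth. A real deployment would also have to handle the window in which the index lags behind the database, which the sketch ignores.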