Tuesday, August 21, 2018

The future of prooph components

There is a lot of development going on in the prooph team at the moment. The new event-store-client is becoming more stable every day, and development of event-store v8 will begin soon (a lot of planning and experimenting has already been done).

So naturally it was time to check whether other components needed a new major version as well. There was a bit of a discussion about changes to prooph/common (#70, #71, #72), and a new prototype for the snapshotter and event-sourcing components was created as well.

This was the point where I stopped and started rethinking it completely. After I had put my thoughts together, I spoke with Alexander Miertsch (the other prooph maintainer). As usual, even though we are two different people, we share the same mind and could agree pretty quickly on how to proceed.

So here's the deal...

Prooph hereby announces that the development of prooph/service-bus and prooph/event-sourcing will be dropped. This also includes all snapshot-store implementations as well as message producers.

Here is the full list of the components:

- https://github.com/prooph/service-bus
- https://github.com/prooph/event-sourcing
- https://github.com/prooph/common
- https://github.com/prooph/event-store-bus-bridge
- https://github.com/prooph/snapshotter
- https://github.com/prooph/http-middleware
- https://github.com/prooph/psb-bernard-producer
- https://github.com/prooph/pdo-snapshot-store
- https://github.com/prooph/memcached-snapshot-store
- https://github.com/prooph/mongodb-snapshot-store
- https://github.com/prooph/humus-amqp-producer
- https://github.com/prooph/annotations
- https://github.com/prooph/psb-http-producer
- https://github.com/prooph/psb-enqueue-producer
- https://github.com/prooph/snapshot-store
- https://github.com/prooph/arangodb-snapshot-store
- https://github.com/prooph/psb-zeromq-producer
- https://github.com/prooph/service-bus-zfc-rbac-bridge

All components will receive support until December 31, 2019 and will then be marked as deprecated and receive no further support from the prooph core team.

Let me explain why we decided this:

Event-Sourcing:
In fact, we recommend not using any framework or library as part of your domain model, and instead building the few lines of code needed to implement a left fold yourself - partly to ensure understanding and partly to keep your domain model dependency-free. The event-sourcing component was always meant to be a blueprint for a homegrown implementation, but most people installed and used it as-is, so the prooph core team was complicit in advertising bad practices.

Don't worry, we will still ship some blueprints as inspiration on how to implement an aggregate root and aggregate repository, but this time as part of the documentation rather than an installable repository. You can still copy and paste from it and make it run; it's your choice.
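
To give a taste of how little code such a homegrown left fold needs, here is a minimal sketch. All class, event, and method names are made up for illustration; this is not the prooph API, just one possible dependency-free shape:

```php
<?php

declare(strict_types=1);

// Hypothetical domain events - plain immutable value objects, no framework.
final class UserRegistered
{
    public function __construct(public readonly string $name) {}
}

final class UserRenamed
{
    public function __construct(public readonly string $newName) {}
}

// The aggregate rebuilds its state as a left fold over its event history.
final class User
{
    private string $name = '';
    private int $version = 0;

    /** @param iterable<object> $events */
    public static function reconstituteFromHistory(iterable $events): self
    {
        $user = new self();

        foreach ($events as $event) {
            $user->apply($event);
        }

        return $user;
    }

    private function apply(object $event): void
    {
        match ($event::class) {
            UserRegistered::class => $this->name = $event->name,
            UserRenamed::class   => $this->name = $event->newName,
        };

        $this->version++;
    }

    public function name(): string { return $this->name; }
    public function version(): int { return $this->version; }
}
```

That's the whole trick: state is never stored, only derived by folding events in order. Copy, adapt, own it.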

Service-Bus:
When it was originally developed, there was no good alternative out there. In the meantime, Symfony shipped its own Messenger component (I looked into it, and it looks really great!), and sooner or later it will probably have more downloads than our famous service-bus implementation.

This also means we now have a clear focus on developing prooph/event-store and providing even better documentation for it. The next goals are first to stabilize the new event-store-client and write documentation and more tests for it, and then to start development of prooph/event-store v8.

Sunday, March 18, 2018

Why there will be no Kafka EventStore in prooph

tl;dr

When even Greg Young, the author of the original EventStore implementation (http://geteventstore.com/), says that it's a bad idea to implement a Kafka EventStore, then it's a bad idea. The prooph team will not provide a Kafka EventStore implementation.

Before we begin, let's look at the requirements we have for an event-store:

- Concurrency checks
  When an event with the same version is appended twice to the event-store, only the first attempt is allowed to succeed. This is very important: imagine you have multiple processes inserting events into an existing stream. Let's say we have an existing aggregate with only one event (version 1). Now two processes each insert two events, so we have: event 1, event 2, event 2', event 3, event 3'. Next, another process inserts an event 4. If the consumer of the stream has to decide whether or not an event belongs to the stream, it's a hard decision now, because the following event streams are all possible: event 1, event 2, event 3, event 4 or event 1, event 2', event 3, event 4 or event 1, event 2, event 3', event 4 or event 1, event 2', event 3', event 4. Additionally, you can't rely on timing (say event 2 was inserted slightly before event 2'), because in a different data center the order could be the other way around. The matter gets more complicated the further the event stream goes. And this is not a blockchain-like problem where the longest chain simply wins, because sometimes there will be no additional events appended to an existing stream. I have to add that this concurrency check and version constraint might not be needed for all use-cases - in some applications it might be okay to just record whatever happened, and versions / order don't matter at all (or not that much) - but for a general-purpose event-store implementation (where you don't want to put in dozens of warnings and caveats), this will only bring problems and lots of bug reports.

- One stream per aggregate
  In Greg Young's original event-store implementation (https://github.com/EventStore/EventStore), there is by default one stream per aggregate. That means that not all events related to aggregate type "User" are stored in a single stream; instead there is one stream for each aggregate instance, e.g. "User-<user_id_1>", "User-<user_id_2>", "User-<user_id_3>", ...
  This option is also available for prooph-event-store; restricting usage to disallow this strategy would be possible, but is not really wanted.
  To quote Greg Young: "You need stream per aggregate not type. You can do single aggregate instance for all instances but it's yucky"

- Store forever
  While obvious at first glance, the event store should persist events forever; they must not be removed, garbage collected, or deleted on server shutdown.

- Querying event offset
  Another thing that seems obvious at first: given you loaded an aggregate from a snapshot, you already have it at a specific version (say event version 10). Then you only need to load events starting from 11. This is especially important once you have thousands of events in an aggregate (imagine having to load all 5000 events again instead of only the last 3, even when you have a snapshot). It's even more important when one stream per aggregate is not possible.
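
The first and last requirements above can be illustrated with a toy in-memory store. This is only a sketch to make the requirements concrete - the class and method names are invented, and a real event-store would of course persist to disk and guard the version check with a transaction or unique constraint:

```php
<?php

declare(strict_types=1);

final class ConcurrencyException extends RuntimeException {}

// Toy in-memory event store illustrating two of the requirements:
// an expected-version check on append, and loading from an offset.
final class InMemoryEventStore
{
    /** @var array<string, list<object>> */
    private array $streams = [];

    public function append(string $streamName, int $expectedVersion, object ...$events): void
    {
        $currentVersion = count($this->streams[$streamName] ?? []);

        // Optimistic concurrency check: only the writer that saw the
        // latest version may append; every other writer must reload and retry.
        if ($currentVersion !== $expectedVersion) {
            throw new ConcurrencyException(sprintf(
                'Expected version %d, but stream "%s" is at version %d',
                $expectedVersion, $streamName, $currentVersion
            ));
        }

        foreach ($events as $event) {
            $this->streams[$streamName][] = $event;
        }
    }

    /** Load events starting after $fromVersion, e.g. the version of a snapshot. */
    public function load(string $streamName, int $fromVersion = 0): array
    {
        return array_slice($this->streams[$streamName] ?? [], $fromVersion);
    }
}
```

With this, two processes that both read version 1 and then try to append will race: the first append wins, the second throws a `ConcurrencyException`, and the stream stays unambiguous - exactly the property Kafka cannot give us.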
 

Now let's look at what Kafka has to offer:

- Concurrency checks

  Well, Kafka doesn't have that. It would not be such a problem with something like the actor model; to quote Greg Young again:
  >> I also really like the idea of in memory models (especially when built
  >> up as actors in say erlang or akka). One of the main benefits here is
  >> that the actor infrastructure can assure you a single instance in the
  >> cluster and gets rid of things like the need for optimistic concurrency.
  >>
  >> Greg
  So this one is a really big issue.

- One stream per aggregate
  If I have thousands of aggregates and each has its own topic, Kafka (and ZooKeeper specifically) will explode. These are real problems with Kafka: ZooKeeper can't handle 10M partitions right now.

- Store forever
  On Kafka, events expire! Yes, really, they expire! Fortunately, with newer versions of Kafka, you can configure it to not expire messages at all (day saved!).

- Querying event offset
  Here we go, that's not possible with Kafka! Combine that with the "One stream per aggregate" problem, and we have a full nightmare. It's simply not reasonable to read millions or even billions of events just to replay the 5 events you're interested in into your aggregate root.
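
Regarding the "Store forever" point: on newer Kafka versions, expiry can indeed be switched off per topic. Roughly like this (the topic name is made up, and the exact flags vary by Kafka version - older releases use --zookeeper instead of --bootstrap-server):

```shell
# Create a topic whose messages never expire:
# retention.ms=-1 disables time-based expiry, retention.bytes=-1 size-based.
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic user-events \
  --config retention.ms=-1 \
  --config retention.bytes=-1
```

That fixes expiry, but none of the other problems above.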

Ways around some limitations:

- Use some actor-model-ish implementation (like Akka) - this would solve the "Concurrency checks" issue as well as the
  "Querying event offset" issue, because you can use things like "idempotent producer semantics" (to track producer id and position in the event stream) and in-memory checks of aggregates to get around the concurrency checks requirement. But prooph provides PHP components, and even if we implemented some Akka-like infrastructure, it would be a rare use-case that someone wanted to use this implementation.
 
 
Another interesting quote from Greg Young about a Kafka EventStore:

>> Most systems have some amount of data that is "live" and some much
>> larger amount of data that is essentially "historical" a perfect example
>> of this might be mortgage applications in a bank. There is some tiny %
>> that are currently in process in the system and a vast number that are
>> "done".
>>
>> If you wanted to just put everything into one stream you would need to
>> hydrate all of them and keep them in memory (even the ones you really
>> don't care about any more).
>>
>> This can turn into a very expensive operation/decision.
>>
>> Cheers,
>>
>> Greg

and also this one:

>> Most of the systems discussed are not really event sourced they just
>> raise events and are distributing them. They do not keep their events as
>> their source of truth (they just throw them away). They don't do things
>> like replaying (which is kind of the benchmark)
>>
>> Not everyone who sends a message is "event sourcing"
>>
>> Greg

Final thoughts:

Kafka is great for stream processing. With enqueue (github.com/php-enqueue/enqueue-dev) and prooph's enqueue-producer (github.com/prooph/psb-enqueue-producer/) you can already send messages to Kafka for processing. So send your messages to Kafka, if you need or want to.
In my opinion, a Kafka EventStore implementation would be very limited and not useful for most PHP applications being built. Therefore I think there will never be a Kafka EventStore implementation (not in prooph, nor in any other programming language - correct me if I'm wrong and you know of an open source Kafka EventStore implementation somewhere!).
And since even Greg Young thinks a Kafka EventStore is a bad idea, I'm at least not the only one out there.

References:

Most of the Greg Young quotes about a Kafka EventStore are taken from here: https://groups.google.com/forum/#!topic/dddcqrs/rm02iCfffUY
I recommend this thread for everyone who wants to dig deeper into the problems with Kafka as an EventStore.