Sunday, March 18, 2018

Why there will be no Kafka EventStore in prooph


When even Greg Young, the author of the original EventStore implementation, says that it's a bad idea to implement a Kafka EventStore, then it's a bad idea. The prooph team will not provide a Kafka EventStore implementation.

Before we begin, let's see what requirements we need from an event-store:

- Concurrency checks
  When an event with the same version is appended twice to the event store, only the first attempt is allowed to succeed. This is very important. Imagine you have multiple processes inserting events into an existing stream, say an existing aggregate with only one event (version 1). Now two processes each insert two events, so we have: event 1, event 2, event 2', event 3, event 3'. Next, another process inserts an event 4. If the consumer of the stream has to decide whether or not an event belongs to the stream, that is now a hard decision, because any of the following event streams is possible: event 1, event 2, event 3, event 4 or event 1, event 2', event 3, event 4 or event 1, event 2, event 3', event 4 or event 1, event 2', event 3', event 4. Additionally, you can't rely on timing (let's say event 2 was inserted slightly before event 2'), because in a different data center the order could be the other way around. The matter gets more complicated the further the event stream goes. And this is not a blockchain-like problem where simply the longest chain wins, because sometimes no further events will ever be added to an existing stream. I have to add that this concurrency check requirement and version constraint might not be needed for all use cases. In some applications it might be okay to just record whatever happened, and versions / order don't matter at all (or not that much). But for a general-purpose event-store implementation (where you don't want to put up dozens of warnings), skipping the check will only bring problems and a lot of bug reports.

- One stream per aggregate
  In Greg Young's original event-store implementation, there is by default one stream per aggregate. That means that not all events related to the aggregate type "User" are stored in a single stream, but that we have one stream for each aggregate, e.g. "User-<user_id_1>", "User-<user_id_2>", "User-<user_id_3>", ...
  This option is also available for prooph-event-store. Limiting the usage to disallow this strategy is possible, but not really wanted.
  To quote Greg Young: "You need stream per aggregate not type. You can do single aggregate instance for all instances but it's yucky"

- Store forever
  While obvious at first glance, the event store should persist events forever. They must not be removed, garbage collected, or deleted on server shutdown.

- Querying event offset
  Another quite obvious thing to consider at first: given you loaded an aggregate from a snapshot, you already have it at a specific version (let's say event version 10). Then you only need to load events starting from version 11. This is especially important once you have thousands of events in an aggregate (imagine you had to load all 5000 events again instead of only the last 3, even though you have a snapshot). It's even more important when one stream per aggregate is not possible.
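
To make the "Concurrency checks" and "Querying event offset" requirements concrete, here is a minimal in-memory sketch in Python. It is not prooph's API: the class name `InMemoryEventStore`, the `expected_version` parameter and the 1-based version numbering are assumptions chosen only for illustration.

```python
from collections import defaultdict


class ConcurrencyError(Exception):
    """Raised when the expected stream version does not match the actual one."""


class InMemoryEventStore:
    """Hypothetical store with one stream per aggregate, e.g. "User-1"."""

    def __init__(self):
        self._streams = defaultdict(list)

    def append(self, stream, expected_version, events):
        # Optimistic concurrency check: the writer states which version it
        # last saw; if the stream has moved on, the append is rejected.
        current = len(self._streams[stream])
        if current != expected_version:
            raise ConcurrencyError(
                f"{stream}: expected version {expected_version}, found {current}"
            )
        self._streams[stream].extend(events)
        return len(self._streams[stream])

    def load(self, stream, from_version=1):
        # Versions are 1-based, so event N sits at list index N - 1.
        return list(self._streams[stream][from_version - 1:])


store = InMemoryEventStore()
store.append("User-1", 0, ["registered"])        # stream is now at version 1
store.append("User-1", 1, ["email_changed"])     # first writer wins, version 2

try:
    # A second writer that also read version 1 tries to append "its" version 2.
    store.append("User-1", 1, ["name_changed"])
    conflict = False
except ConcurrencyError:
    conflict = True

# With a snapshot at version 1, only the events from version 2 on are loaded.
tail = store.load("User-1", from_version=2)
```

The second writer gets an error instead of silently creating an "event 2'", and a reader holding a snapshot only fetches the tail of the stream.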

Now let's look at what Kafka has to offer:

- Concurrency checks

  Well, Kafka doesn't have that. It would not be such a problem with something like the actor model, to quote Greg Young again:
  >> I also really like the idea of in memory models (especially when built
  >> up as actors in say erlang or akka). One of the main benefits here is
  >> that the actor infrastructure can assure you a single instance in the
  >> cluster and gets rid of things like the need for optimistic concurrency.
  >> Greg
  This one is a really big issue.

- One stream per aggregate
  If I have thousands of aggregates and each has its own topic, Kafka (and ZooKeeper specifically) will explode. These are real problems with Kafka. ZooKeeper can't handle 10M partitions right now.

- Store forever
  On Kafka, events expire! Yes really, they expire! Fortunately, with newer versions of Kafka, you can configure it to not expire messages at all (day saved!).

- Querying event offset
  Here we go, that's not possible with Kafka! Combine that with the "One stream per aggregate" problem, and it becomes a full nightmare. It's simply not reasonable to read millions or even billions of events just to replay the 5 events you're interested in to your aggregate root.
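
The cost of the last point can be sketched with a plain Python list standing in for a shared topic. This is a simulation under stated assumptions, not Kafka client code: the record layout (`aggregate_id`, `version`) and the function name `replay_from_shared_log` are hypothetical. The point is that a position in a shared log is not an aggregate version, so finding one aggregate's events means reading past everyone else's.

```python
def replay_from_shared_log(log, aggregate_id, after_version):
    """Scan a shared log; return (matching events, number of records read)."""
    scanned = 0
    matching = []
    for record in log:
        scanned += 1
        # The log has no per-aggregate index, so every record is inspected.
        if record["aggregate_id"] == aggregate_id and record["version"] > after_version:
            matching.append(record)
    return matching, scanned


# Hypothetical shared log: 1,000 aggregates with 5 events each, interleaved.
log = [
    {"aggregate_id": f"User-{n}", "version": v}
    for v in range(1, 6)
    for n in range(1000)
]

# A snapshot of "User-7" exists at version 3, so only versions 4 and 5 are needed.
events, scanned = replay_from_shared_log(log, "User-7", after_version=3)
```

Here two events are wanted, but all 5,000 records in the log are read to find them. With one stream per aggregate and a query by version, only those two would be loaded.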

Way around some limitations:

- Use some actor-model-ish implementation (like Akka). This would solve the "Concurrency checks" issue as well as the "Querying event offset" issue, because you can use things like "idempotent producer semantics" (to track producer id and position in the event stream) and in-memory checks of aggregates to get around the concurrency check requirement. But prooph consists of PHP components, and even if we implemented some Akka-like infrastructure, it would be rare for someone to want to use this implementation.

Another interesting quote from Greg Young about a Kafka EventStore:

>> Most systems have some amount of data that is "live" and some much
>> larger amount of data that is essentially "historical" a perfect example
>> of this might be mortgage applications in a bank. There is some tiny %
>> that are currently in process in the system and a vast number that are
>> "done".
>> If you wanted to just put everything into one stream you would need to
>> hydrate all of them and keep them in memory (even the ones you really
>> don't care about any more).
>> This can turn into a very expensive operation/decision.
>> Cheers,
>> Greg

and also this one:

>> Most of the systems discussed are not really event sourced they just
>> raise events and are distributing them. They do not keep their events as
>> their source of truth (they just throw them away). They don't do things
>> like replaying (which is kind of the benchmark)
>> Not everyone who sends a message is "event sourcing"
>> Greg

Final thoughts:

Kafka is great for stream processing. With enqueue and prooph's enqueue-producer you can already send messages to Kafka for processing. So send your messages to Kafka, if you need or want to.
In my opinion a Kafka EventStore implementation would be very limited and not useful for most of the PHP applications being built. Therefore I think there will never be a Kafka EventStore implementation (neither in prooph nor in any other programming language; correct me if I'm wrong and you know of an open source Kafka EventStore implementation somewhere!).
When even Greg Young thinks a Kafka EventStore is a bad idea, I'm at least not the only one out there.


Most of the Greg Young quotes about a Kafka EventStore are taken from here:!topic/dddcqrs/rm02iCfffUY
I recommend this thread to everyone who wants to dig deeper into the problems with Kafka as an EventStore.


  1. Hi there Sascha

    The Kafka broker is really a log so it’s best not to think of it as a database. It provides a powerful event storage layer. Atop that you add a view (i.e. the Query side of CQRS). People do this in two ways: either via the Kafka Streams API (which is part of Kafka) which provides an internal ‘state store’ where you can build views that allow you to query aggregates directly within your app. Alternatively via a database (which it can be connected to via one of the connectors).

    The approach is little different to the way people use Event Store, so I can see where the confusion would come from, but it has some nice advantages:
    - it encourages you to process events directly which takes you down a more event-driven route.
    - It separates the log of events (the source of truth) from the view(s) which makes it work well in multi tenant systems like microservices. This is useful because the source of truth remains (and is authoritative), but the view(s) typically change frequently.
    - This also means microservices can share a single source of truth, but each microservice has views that are lightweight and targeted to the job that microservice needs to do, as well as being wholly owned and operated by that service.
    - You can leverage all the tools that come with a stream processing engine, right on your log of events.

    So I see Event Store as an event sourcing database, and it definitely has its place. Kafka is more like an event store that facilitates a *set* of event driven applications through the event sourcing / command sourcing, EDA and CQRS patterns, as well as providing a lot of scalability and resiliency primitives. That’s why it tends to be used a lot with microservices.

    There is an example of a small application here:

    There is also a bit more detail in this free book:

    Oh, and you can store data in Kafka for as long as you want (this is a retention setting). In fact, topics with hundreds of TBs are not uncommon.


    1. Hi Ben, thanks for your input. At the open source project prooph we got a lot of requests for an event store implemented with Kafka. So I looked into it and this blog post is the result of my research.

      I am not saying that Kafka is a bad tool, or it cannot be used in event sourced systems at all. All I am saying is that Kafka is not suitable for an event store. By event store I mean a database that is able to store events (like or as in prooph we use MySQL / Postgres). This works really well and Kafka can be a nice addition for event-processing, view generation, etc. But as you said yourself, the Kafka broker is a log. I would not use the word "event store" when talking about Kafka, because an event store has already a specific meaning. People get confused quickly, if you say things like "Kafka is an event store that..." - Kafka is not an event store. As I laid out in this blog post, it's not suitable for an event store at all. But Kafka is still a great tool that can give awesome results when used in combination, no doubts about that.

    2. An RDBMS is a good choice for implementing an event store, but it falls short when it comes to rebuilding read models and projections. For example, if we have 1M streams (aka tables in the RDBMS with the per-aggregate-id / per-aggregate-type strategies), how can I project from multiple tables?

    3. Hello Ben,

      I'm currently investigating on using Kafka as an Event Store, and I have a few questions regarding your previous comment (sorry for digging up the thread).

      - it encourages you to process events directly which takes you down a more event-driven route.

      => That is not directly related to using Kafka, but is always true when going in the Event Sourcing path, isn't it?

      - It separates the log of events (the source of truth) from the view(s) which makes it work well in multi tenant systems like microservices. This is useful because the source of truth remains (and is authoritative), but the view(s) typically change frequently.
      - This also means microservices can share a single source of truth, but each microservice has views that are lightweight and targeted to the job that microservice needs to do, as well as being wholly owned and operated by that service.

      => That also will be achieved when using CQRS/Event Sourcing with any event store infrastructure, isn't it?

      - You can leverage all the tools that come with a stream processing engine, right on your log of events.

      Are you talking about kafka-streams, KStream and KTable for instance?

      More generally, my main concern about using Kafka as an event store is the difficulty/impossibility of retrieving a stream of events for a given aggregate type and id. A workaround for this is to have a topic per aggregate type and consumers that index events by aggregate id. Then, when a process needs to fetch a stream of events to rebuild an aggregate (during an update operation for instance), it can rely on this separate index. This means that the write processes rely on a view (these indexes are projections/views of the source of truth), and not on the source of truth, which makes me very uncomfortable (as these indexes are eventually consistent by nature)!

      My second concern, as Sascha has stated, is the inability to perform a concurrency check when publishing events (aka optimistic locking) with Kafka. That means that domain invariants can be silently broken, and that does not sound good at all... However, adding such a feature to Kafka is under discussion here:

      Please note that these concerns could arise because I have a biased idea of what, or how, should an event sourced system work (mainly by experiencing on Prooph/Event Store), so feel free to correct me if I'm raising incorrect issues (or non issues)!



    4. Regarding concurrency issue, I've just had a breakthrough while browsing this example project:

      Basically, by publishing commands on a topic and using aggregate id as the message keys, the owner of the project was able to ensure that no command would be simultaneously handled for a given aggregate id.

      Thus, completely removing the need for concurrency check when appending the new events.

      I believe the magic happens here:



    5. Hi Gildas,
      I'm the author of, and also gave a talk about it in Codemotion Milan 2018 (unfortunately, the recording had failed but you can see slides here:

      You were right to notice that the concurrency issue can be solved by making sure that all writes go through a Kafka topic with a partition key of the aggregate ID, ensuring that command handling is serializable and thus there are no concurrency problems.

      Regarding your other concerns: there's no need to create a topic per aggregate. Kafka was not designed for such use case - you should use a single topic for events, which has enough partitions to accommodate for your scalability needs.

      For your query needs - you first need to understand what you're trying to achieve. If you need to query the events in order to rebuild your aggregate's state: I solved it by using a state store to hold current snapshots of all aggregates. So there's no need to actually rebuild the state, because our snapshots are constantly updating. I also sent these snapshots to a compacted log - which makes it very convenient to build "reactive" systems by listening to changes in aggregate states.

      Another use case for querying the event stream for a specific aggregate is to view its history. If you need this, it's trivial to just spin up another consumer (group) to the events topic, which starts at offset 0, and persist the events to some data store which is better suited for querying (Cassandra, MySQL, DynamoDB, ...) according to your query needs.

      I have built event sourced systems that rely on RDBMS and found it very painful compared to Kafka + Kafka Streams.

      Feel free to reach out if you have any more questions (@amitayh on Twitter)

