
Should we think of BuildFlow as an alternative to workflow managers like Prefect or Kubeflow, or is it a higher-level library for stream processing like Beam?


More of a higher-level library like Beam, and I could see it being plugged into a Prefect workflow.


I see, what fault tolerance mechanisms does it provide?

I don’t see anything on snapshotting or checkpointing like Flink. Is this just for stateless jobs?


We don't support any snapshotting or checkpointing directly in BuildFlow at the moment, but these are great features we should support.

But we do have some fault tolerance baked into our I/O operations. Specifically, for Google Cloud Pub/Sub, the acks don't happen until the data has been successfully processed and written to the sink. So if there is a bug or some transient failure, the message will be redelivered later, depending on your subscriber configuration.
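To make the ack-after-write idea concrete, here is a rough sketch in plain Python. It is not BuildFlow's actual code and does not use the real Pub/Sub client; `FakeSubscription`, `run_once`, and `flaky_write` are hypothetical names made up for illustration, and the in-memory list only stands in for a subscription that redelivers unacked messages.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeSubscription:
    """Hypothetical in-memory stand-in for a subscription (not a real client)."""
    pending: List[str] = field(default_factory=list)

    def pull(self) -> List[str]:
        # Unacked messages stay pending, so a later pull returns them again.
        return list(self.pending)

    def ack(self, message: str) -> None:
        self.pending.remove(message)


def run_once(sub: FakeSubscription,
             process: Callable[[str], str],
             sink_write: Callable[[str], None]) -> int:
    """Pull, process, and write; ack only after the sink write succeeds."""
    acked = 0
    for message in sub.pull():
        try:
            sink_write(process(message))
        except Exception:
            # No ack here: the message stays pending and is retried later.
            continue
        sub.ack(message)
        acked += 1
    return acked


# Demo: the sink fails once for "b", so "b" is only acked on the second pass.
sink: List[str] = []
failures = {"b": 1}


def flaky_write(row: str) -> None:
    key = row.lower()
    if failures.get(key, 0) > 0:
        failures[key] -= 1
        raise IOError("transient sink failure")
    sink.append(row)


sub = FakeSubscription(pending=["a", "b", "c"])
first = run_once(sub, str.upper, flaky_write)   # "b" fails and is not acked
second = run_once(sub, str.upper, flaky_write)  # "b" is redelivered and acked
```

The trade-off is at-least-once delivery: if the write succeeds but the ack is lost, the same message can reach the sink twice, so the sink or the processing step needs to tolerate duplicates.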


I should also mention BuildFlow does support stateful processing with the Processor class API: https://www.buildflow.dev/docs/processors/overview#processor...


Is there an underlying stream processor (e.g. Flink)? How many messages per second can it process?


All of our processing is done via Ray (https://www.ray.io/). Our early benchmarks show about 5k messages per second on a single 4-core VM, but we believe we can increase that with some more optimizations.

This benchmark was consuming a Google Cloud Pub/Sub stream and outputting to BigQuery.


Delighted to hear your choice of Ray and building atop Ray.



