I’m happy to announce that Confluent will be hosting the first Stream Data Hackathon this April 25th in San Francisco! Apache Kafka has recently introduced two major new features: Kafka Connect for data integration and Kafka Streams for distributed, fault-tolerant stream processing. Combined, these form a powerful toolset for building real-time data pipelines, and we’re hosting a hackathon to help the community build connectors and learn how to build Kafka Streams applications. Whether you’re a beginner with Kafka or a seasoned expert, join us to help improve the ecosystem of connectors, create proof-of-concept stream processing applications, and maybe win a prize in the process. Here are the key details:
WHEN: Monday, April 25, 2016 from 6:00 PM to 10:00 PM (PDT)
WHERE: Hilton San Francisco Union Square – Imperial B Ballroom – 333 O’Farrell Street, San Francisco, CA 94102
We’ll have food, drinks, and prizes for participants, with Kafka developers on hand to help you with any questions. Already interested? Register here. Want to know more? Read on below for more details.
About Kafka Connect
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other data systems. It makes it simple to quickly define connectors that move large streaming datasets into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. A sink connector can deliver data from Kafka topics into secondary indexes like Elasticsearch or into batch systems such as Hadoop for offline analysis.
Kafka Connect abstracts away the common problems every connector to Kafka needs to solve: schema management, fault tolerance, partitioning, offset management and delivery semantics, operations, and monitoring. This allows connector developers to focus on the details specific to the system they are copying data from, while relying on Kafka Connect to solve the hard problems. Connect users can pick from a repository of open-source connectors without having to worry about interoperability, and gain a single system for managing, monitoring, and deploying multiple connectors.
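To give a flavor of what this looks like in practice, a connector is typically configured with nothing more than a small properties file handed to a Connect worker. The sketch below is modeled on the JDBC source connector that ships with Confluent Platform; the connector name, database URL, column, and topic prefix are hypothetical placeholders:

```properties
# Hypothetical config for a JDBC source connector (e.g. jdbc-source.properties),
# run with: bin/connect-standalone worker.properties jdbc-source.properties
name=my-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:sqlite:test.db
# Stream new rows as they appear, using an auto-incrementing id column
mode=incrementing
incrementing.column.name=id
# Each table becomes a Kafka topic with this prefix
topic.prefix=my-db-
```

Notice that there is no code here at all: the framework handles offsets, serialization, and fault tolerance, and the connector only needs to know how to read from the source system.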
About Kafka Streams
Kafka Streams is a library for building streaming applications, specifically applications that transform input Kafka topics into output Kafka topics. Kafka Streams has a very low barrier to entry, is easy to operationalize, and offers a natural DSL for writing stream processing applications. It achieves this unique feature set by working directly with Kafka and leveraging the existing distributed, fault-tolerant clients. By implementing stream processing as a library instead of a framework, it remains agnostic to resource management and configuration tools, so it is easily adopted in any organization — write and deploy your stream processing applications like you would any other. And because it builds upon important concepts for stream processing such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state, you get all the modern stream processing features you expect in a lightweight library.
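As a taste of the DSL, here is a non-runnable sketch of the classic word-count application against the pre-release 0.10 API. The topic names and application id are hypothetical, API details may shift before release, and running it for real requires the kafka-streams library plus a Kafka broker:

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class WordCountSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        KStreamBuilder builder = new KStreamBuilder();
        // Read lines from a (hypothetical) input topic, split them into words,
        // re-key the stream by word, count occurrences, and write the running
        // counts to an output topic.
        KStream<String, String> lines = builder.stream("text-input");
        lines.flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
             .map((key, word) -> new KeyValue<>(word, word))
             .countByKey("Counts")
             .to("word-counts");

        new KafkaStreams(builder, props).start();
    }
}
```

The point of the sketch is the shape of the program: it is an ordinary `main` method you deploy like any other Java application, with no cluster-specific framework wrapping it.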
Interested in the hackathon but not sure what to build? No problem. Here are a few ideas to get the creative juices flowing:
- Graphite Source Connector – Make your metrics available at scale in real time. If your organization already standardizes on Graphite for metrics, you can stream them into Kafka and make them available to complex alerting applications, or buffer them for delivery to batch storage systems for offline analysis.
- Real-time Alerting – Build a Kafka Streams application that consumes metrics or logs from Kafka, performs aggregations, and can trigger alerts on certain conditions (e.g. a metric’s 1 minute average exceeds a configurable threshold).
- Slack Source Connector – The hottest thing in chat since IRC. Connect to the Slack Real Time Message API to stream message contents into Kafka. Downstream applications could aggregate and analyze this data.
- Twitter Trending Topics – Using data imported by a Twitter connector, analyze and aggregate tweets to detect trending topics. This might be accomplished by computing popular topics over multiple time periods, then joining these data sets to find terms/hashtags that are significantly more popular recently (e.g. last hour vs last day).
- PagerDuty Sink Connector – Use the PagerDuty Event API to trigger alerts for your ops team directly from events generated in Kafka. This could be the final stage in an alerting system (which you might build with the Graphite connector described above).
Get creative! Databases and message queues are obvious targets for connectors, but you can connect all sorts of systems, from loading Wikipedia edits into Kafka to creating JIRA tickets from messages in a Kafka topic. Kafka Streams applications could leverage and combine data from any set of available connectors.
Still not sure what to build? During registration you can include systems you’re interested in and we’ll help connect you with other participants so you can work in a team to come up with and implement a project.
Entries will be judged by a panel at the end of the hackathon on creativity, features, and completeness.
1st place: Lunch with Kafka co-creator and Confluent co-founder Jay Kreps
2nd and 3rd place: $100 gift cards for Amazon or iTunes
Everyone: T-Shirt and stickers
Prizes will be awarded only to entries that open source their code, make it available on a code sharing site like GitHub or Bitbucket, and are willing to list it on the Kafka Connector Hub.
Q & A
Is attendance restricted to Kafka Summit attendees?
No, this is a community event and anyone is welcome to register and participate.
Do I need to already be familiar with Kafka Connect and Kafka Streams?
No previous experience with Kafka Connect or Kafka Streams is required, but we encourage you to review some of the resources listed below to get basic familiarity with both frameworks. This will let you focus on designing and writing your connector or application during the hackathon.
Do I need to know what I’m going to build before I arrive?
No, although it will help you get up and running more quickly if you come with a few ideas. We’ve provided some examples of possible projects in the “Project Ideas” section above to give you an idea of the types of systems you might want a connector for and applications you might build with Kafka Streams.
Can I work in a team?
Absolutely, and we encourage it! To help form teams, you can include projects you are interested in building with your registration. We’ll connect you with other participants with similar interests at the beginning of the event.
What type of food will be provided?
Light dinner and drinks.
Am I required to submit my code or open source it?
You are not required to do either, but you must publish your code under an open source license to be eligible for the prizes. We recommend the Apache License v2, but other popular open source licenses are acceptable.
How complete are projects expected to be at the end of the hackathon?
The hackathon is just one evening, but that’s enough time to get a prototype up and running. We hope this will motivate you to get started on a fully featured connector or Kafka Streams application, but the expectation is only to have a prototype by the end of the night.
Will a skeleton be provided to help get started?
Yes, a repository with a skeleton connector will be provided in the resources section before the event, and example applications for Kafka Streams can be found here. We encourage starting from a skeleton so you can make the most of the time during the hackathon.
Who will be available to provide help with Kafka Connect and Kafka Streams?
Kafka committers, Kafka Connect and Kafka Streams developers, Confluent engineers, and community members will attend the event to help you go from design to implementation of your project.
How will projects be judged?
Near the end of the hackathon we’ll ask you to give a brief overview of what you’ve built and provide us a link to the repository. No need for a fancy demo, just a quick summary. A small panel of judges will select the most outstanding project, based on creativity, features, and completeness.
The Stream Data Hackathon is a free event, but all attendees must register. For more details and to complete your registration, please click here.
The hackathon will be most productive if you’ve done a bit of prep work so you can get straight to coding. Here are some resources you might find useful:
- Introductory blog post and tutorial
- Documentation and connector developer guide
- Find example connectors in Confluent Platform (JDBC, HDFS) and on the Connector Hub