- As an individual contributor, design and implement tools and libraries to improve the reliability of the Canva microservices (for instance, adding a new long-awaited feature in our circuit breaker library)
- Conduct and organise reliability experiments to identify scenarios in which failures might occur and to verify the reliability measures in place
- Design and build services and tooling that help ensure each of Canva’s microservices is working optimally and is resilient to failures
- Have widespread impact across the organisation by facilitating and spearheading cross-team initiatives that raise the bar for reliability, velocity and security
- Investigate production incidents in depth and apply the learnings to code
- Research, develop, and justify the best choices, in the form of design docs, for features that will shape the future of Canva
- Identify and address performance bottlenecks within the application and broader infrastructure
- Propose new approaches and solutions to ensure we future-proof Canva’s distributed cloud infrastructure as we scale
- Participate in design meetings, hiring interviews, and code reviews
Required Skills & Experience
- Five-plus (5+) years of commercial experience developing complex, distributed web applications on the JVM, including experience writing Java (we primarily use Java 13, but experience with other JVM languages such as Scala or Kotlin is a big plus)
- Familiarity with asynchronous or reactive programming techniques and associated patterns. At our scale, writing blocking code is often not an option
- Comfortable with computer science and engineering fundamentals: concurrency, data structures, and distributed systems
- Experience working in large distributed cloud environments (ideally AWS), including CDNs and DNS. We’re hosted on AWS and leverage the tools it provides as much as possible
- Disciplined coding practices, experience with code reviews and pull requests, and a creative, conceptual approach to problem solving rather than a framework- or library-focused one. We strive to build fast and lean solutions, not library/framework-heavy patchwork
- Strong communication and team collaboration skills, both written and verbal. A Canva engineer strives to understand the exact problem before jumping into writing code – look before you leap!
Nice to Have (Not Required)
- Experience with RPC frameworks such as Finagle, Thrift or gRPC is a huge plus. We strive to make sure that the communication layer is not a bottleneck and that it simplifies the life of product engineers as much as possible
- Knowledge of networking protocols and wire formats such as TCP, HTTP/1.1, HTTP/2, QUIC, protobuf, etc. would be a big plus. The life of a request doesn’t start inside the backend web server, but rather in the browser of a user
- An understanding of resiliency techniques and patterns such as load balancing, throttling, back pressure, and circuit breaking. For Canva, reliability is a feature, and the Gateway and broader infrastructure teams aim to provide the best tools possible to the application engineers to ensure that Canva stays available at all times