THE PURPOSE:
The Data Engineer works within the Data Solutions organization on critical reporting, visualization, and analysis initiatives. Reporting spans custom ad-hoc requests, scheduled jobs, support for our growing data warehouse, and the build-out of our future cloud analytics platform. The developer must be able to communicate to business users the exact scope of each metric as well as the confidence and quality of the data behind every report.
THE ROLE:
- Work directly with business users to understand their reporting needs and guide them to practical solutions
- Help translate business requirements into specification documents that track and analyze new and existing site features
- Understand the necessity of data quality and the need for confidence in the accuracy of any report
- Develop, monitor, and maintain new reports, dashboards, visualizations, procedures, data structures, and databases
- Design and maintain data pipelines in cloud or on-premises environments
- Design data schemas and perform data transformations, enrichments, and manipulations with efficiency and reusability in mind (see the sketch after this list)
- Plan, conduct, and direct the analysis of complex business problems and projects
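To make the schema and transformation work above concrete, here is a minimal PySpark sketch. The "events" JSON source, the column names, and the S3 paths are illustrative assumptions, not an actual pipeline at this company; it simply shows an explicit schema, a simple enrichment, and a partitioned write.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (
    StructType, StructField, StringType, TimestampType, DoubleType
)

spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

# An explicit schema keeps ingestion predictable and documents the data contract.
event_schema = StructType([
    StructField("user_id", StringType(), nullable=False),
    StructField("event_type", StringType(), nullable=False),
    StructField("event_ts", TimestampType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
])

# Hypothetical raw source; in practice this could be S3, ODBC extracts, flat files, etc.
events = spark.read.schema(event_schema).json("s3://example-bucket/raw/events/")

# A small, reusable enrichment: drop bad rows and derive a partition date.
cleaned = (
    events
    .filter(F.col("user_id").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
)

# Partitioned output lets downstream reports and dashboards prune efficiently.
cleaned.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)
```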
THE CANDIDATE:
- Understanding of data structures and algorithms, plus basic statistics (confidence intervals, statistical significance, etc.)
- Experience working with very large data sets (billions of rows, petabytes of data)
- Experience working with a variety of data sources (ODBC, flat files, etc.)
- Experience working with and designing complex data schemas
- Strong skills in SQL, Java and/or Python
- Experience with SQL query performance optimization
- Strong experience with Apache big data frameworks (Hadoop/EMR/Databricks, Spark, Hive)
- Strong experience with Spark performance optimization and troubleshooting
- Experience with Kafka and event driven architectures
- Familiarity with workflow scheduling/orchestration tools (Airflow, Jenkins); a minimal Airflow example follows this list
- Experience with AWS
- Experience with Tableau and/or other self-service analytics tools
- Experience implementing Redshift, Snowflake, Azure Data Warehouse, ADLS, S3, Kafka, Presto, EMR, Databricks, or a data lake architecture in one or more public clouds in a large-scale production environment
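For the orchestration tooling mentioned above, the following is a minimal Airflow 2.x sketch of a daily two-step pipeline. The DAG id, schedule, and placeholder task callables are assumptions for illustration only.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull the previous day's data from a source system.
    pass


def load():
    # Placeholder: load transformed data into the warehouse.
    pass


with DAG(
    dag_id="daily_reporting_pipeline",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract before load.
    extract_task >> load_task
```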
To Be Successful You Will Be:
- Highly motivated, with a great attitude and a desire to dive into raw data to understand behavioral trends and surface insights
- An excellent multitasker who can execute multiple requests and reports under tight timelines
- Inquisitive, self-starter, able to work autonomously
- Able to work in a fast-paced, dynamic, startup-like environment
- Detail-oriented tactician who strives for perfection
- Strong verbal and written communication (and listening) skills
- Excellent reading comprehension and attention to detail.
- Strong problem-solving skills
- Strong documentation skills as you code (Jira, Confluence)
As a Data Engineer, your day-to-day tasks will include:
- Helping us leverage large-scale data stores and data infrastructure by building out data pipelines, streams, and utilities in Spark and other technologies for feedback to our business systems, partners, or users
- Developing robust, low-latency, fault-tolerant pipelines to support business-critical systems
- Aggregating key metrics for business partners to inform decisions (a minimal aggregation sketch follows this list)
- Working with cloud technologies to build and deploy your applications
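As a sketch of the metric-aggregation task above, the PySpark snippet below computes daily order counts and revenue by region. The "orders" table, its columns, and the storage paths are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-metrics").getOrCreate()

# Hypothetical curated orders table written by an upstream pipeline.
orders = spark.read.parquet("s3://example-bucket/curated/orders/")

# Daily order counts and revenue per region, suitable for a business dashboard.
daily_metrics = (
    orders
    .groupBy("order_date", "region")
    .agg(
        F.count("*").alias("order_count"),
        F.sum("amount").alias("revenue"),
    )
)

daily_metrics.write.mode("overwrite").parquet("s3://example-bucket/metrics/daily_orders/")
```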
Environment
Able to work effectively on a small, nimble team, with no trouble context-switching
Education
B.S./M.S. in Computer Science or Computer Engineering or 3+ years of equivalent experience