As a Big Data Engineer, you will work on the collection, storage, processing, and analysis of large data sets from heterogeneous domains. Your primary focus will be on researching the solutions best suited to these purposes, then implementing, maintaining, and monitoring them.

Responsibilities:
- Researching, designing, and developing appropriate algorithms for Big Data collection, processing, and analysis;
- Selecting and integrating any Big Data tools and frameworks required to enable new and existing product capabilities;
- Collaborating closely with the product team to define requirements and set milestones for Big Data features;
- Detecting anomalies and auditing raw and processed data;
- Monitoring performance and advising on any necessary infrastructure changes;
- Defining data retention policies;
- Presenting data findings to internal and external stakeholders;
- Interacting closely with Data Scientists to provide feature-engineered datasets.

Requirements:
- Experience with at least one of Java or Scala;
- Proficiency with Hadoop ecosystem services such as MapReduce v2, HDFS, YARN, Hive, HBase;
- Experience building stream-processing systems using solutions such as Apache Kafka and Apache Spark Streaming;
- Experience designing, implementing, and deploying data pipelines in cluster mode using the Apache Spark framework (RDD, DataFrame, Streaming);
- Experience integrating data from multiple heterogeneous sources in various formats (CSV, XML, JSON, Avro, Parquet);
- Experience with SQL databases and NoSQL databases, such as Elasticsearch and MongoDB;
- Proficient understanding of microservices architecture and distributed systems;
- Experience with the Hadoop ecosystem, on-premises or in the cloud;
- Nice to have: hands-on experience with Docker and Kubernetes.