Job Title: Data Engineer

Duration: Full-Time Employee

Location: Houston, TX

Summary:
As a Data Engineer, you will develop and maintain scalable data pipelines while collaborating with analytical and business teams to improve data models.

Responsibilities:

  • Implements processes and systems to monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
  • Writes unit/integration tests, contributes to the engineering wiki, and documents work.
  • Performs the data analysis required to troubleshoot data-related issues and assists in their resolution.
  • Works closely with a team of frontend and backend engineers, product managers, and analysts.
  • Defines company data assets (data models) and the jobs that populate those data models.
  • Designs data integrations and a data quality framework.
  • Designs and evaluates open-source and vendor tools for data lineage.
  • Works closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.

Minimum Qualifications:

  • A degree in an analytical field (e.g. Computer Science, Mathematics, Statistics, Engineering, Operations Research, Management Science) is preferred, along with 4+ years of professional experience.
  • At least 4 years of data analytics experience in a distributed computing environment
  • Database maintenance
  • Building and analyzing dashboards and reports
  • Evaluating and defining metrics and performing exploratory analysis
  • Monitoring key product metrics and understanding root causes of changes in metrics
  • Empowering and assisting operations and product teams by building key data sets and making data-based recommendations
  • Automating analyses and authoring pipelines via a SQL/Python-based ETL framework
  • Superb SQL programming skills.
  • Understanding of ETL tools and database architecture.
  • Advanced knowledge of data warehousing.
  • Strong knowledge of code and programming concepts. Experience with Python.
  • Experience with Kubernetes deployments and a DevOps approach
  • Highly motivated self-starter who is flexible and goal-oriented
  • Strong Python Knowledge
    • Data Models
    • Object-Oriented Programming
    • Testing (Unit / Regression)
  • Database Experience
    • Window Functions
    • Partitioning/Indexes
    • Relational and Non-Relational
  • Big Data Experience
    • Hadoop
    • Spark
    • DataFrame API
  • Performance Benchmarking
    • Cluster Configuration/Optimization
    • Spark Optimization
  • Version Control, CI/CD
    • Git
    • Jenkins, Drone
  • Some Cloud Experience
    • AWS (primary), Azure, Google Cloud

Nice to Have:

  • Data Science Experience (either direct or from working closely with a DS team)
    • Scikit-Learn, TensorFlow, Spark MLlib, General Algebra & Algorithms
  • Airflow (Scheduling Tools)
  • Container Experience
    • Docker, Kubernetes
  • Streaming Experience
    • Kafka, Spark Streaming, Flink

Benefits and Additional Perks:

  • Fast-paced, collaborative, and fun environment
  • Work with data and the latest technology to transform the industry
  • Competitive salary and bonus
  • Medical, dental, vision, 401(k), life, and long-term disability insurance
  • Paid Time Off