As a Data Engineer with Ag Data Services, you will work with faculty, graduate students, and other researchers to improve data accessibility and quality by implementing automated data pipelines and applying best practices for data management and cyberinfrastructure. The Data Engineer will translate researchers' needs and requirements into technical implementations of automated data pipelines encompassing data transport, quality assurance, cleansing, profiling, extract-transform-load (ETL), feature extraction, metadata enrichment, and provenance and lineage, moving data from a variety of sources into appropriate file stores, databases, and other data stores. This position will also help design and implement effective data management practices that enhance the accessibility and reusability of research data by adding metadata and by creating and applying consistent naming and organization schemas for data sets. The Data Engineer will assist with the requirements, planning, and implementation of data-related cyberinfrastructure to meet researchers' needs. In addition, the Data Engineer will provide consulting and assistance to faculty, graduate students, and other researchers in the effective use of data pipeline components and analytic tool sets, and will influence and implement the ongoing development and continuous improvement of college-level data services supporting the College's scientific and research endeavors.
- Bachelor’s degree
- 6 years of experience consisting of the following:
- 4 years of relevant experience in one or more of the following areas: programming, application development, database administration, data analytics, production deployment and production support
- 2 or more years of relevant experience in the design and implementation of distributed data pipelines, preferably using tools and languages prevalent in the Spark ecosystem, such as Scala, Kafka, Hive, and/or Python
- Experience working with research data or other forms of unstructured data and applying data management practices
- Knowledge of statistical and/or analytical algorithm implementation, data interpretation, data processing, and/or modeling
- Excellent communication skills in a customer-facing and/or relationship-building role, especially in translating customer needs into technical requirements and designs
- Strong time and project management skills
- Proficiency in configuration and use of Linux or Unix operating systems
- Programming skills in Java and/or Scala, R, and Python
- Knowledge of statistical and/or analytical algorithm implementation, data interpretation, and/or modeling
- Experience in a big data environment, such as Apache Hadoop/Spark
- Experience working with non-relational data and NoSQL systems such as HBase, MongoDB, Cassandra, or Parquet
- Experience working with agricultural data, especially in the areas of remote sensing and geospatial data
- Proficiency in administration of the Red Hat operating system
- All new hires will be expected to follow Protect Purdue guidelines. To learn more, visit https://bit.ly/3DH3z6f
- Purdue will not sponsor work authorization for this position.
- A background check is required for employment in this position.
- FLSA: Exempt (Not Eligible for Overtime)
- Retirement: Defined contribution, effective immediately.
- Purdue University is an EEO/AA employer. All individuals, including minorities, women, individuals with disabilities, and veterans are encouraged to apply.