Remote - Biostatistician/Data Scientist


Role and Responsibilities:

  • Perform advanced data quality control and analysis of public health related data (e.g., large EMR datasets, large relational databases) using R or Python
  • Cleanse data (data munging) and prepare datasets for analytics
  • Perform statistical analyses and develop advanced data analysis and data visualization tools and applications in Python and/or R
  • Develop multilevel statistical models towards specific outcomes
  • Understand, specify, process and present information from potentially several disparate data sets
  • Generate actionable knowledge from data
  • Prepare technical reports and routine data analysis reports from large-scale data sets using Python and/or R
  • Assist in interpreting data analysis findings and offer solutions for issues identified
  • Communicate and collaborate with other scientists on aspects of study analysis and interpretation

Basic Qualifications:

  • Master's or doctoral degree in Statistics, Biostatistics, Engineering, Computer Science, Mathematics, or a similar field, including a minimum of 3 years of related work experience and graduate-level coursework in statistics
  • Experience with statistical techniques (e.g., measures of central tendency, dispersion, variance, regression)
  • Experience with cleaning data for analysis (i.e., data munging)
  • Experience working with large real-world databases and identifying data gaps and inconsistencies (i.e., data validation and missingness patterns)
  • 3-5 years of experience with quantitative analysis and data interpretation
  • 3-5 years of experience programming in PySpark/Python for statistical analyses and data management
  • Demonstrated analytical and problem-solving skills necessary for quickly developing recommendations based on quantitative and qualitative data from many different types of sources
  • Excellent organizational skills, commitment to generating accurate data, ability to meet short deadlines, and demonstrated experience in multi-tasking
  • Excellent oral and written communication skills, particularly with multidisciplinary staff from different organizations/agencies
  • Strong interpersonal and teamwork skills
  • Adherence to reproducible research practices

Preferred Qualifications:

  • Experience with a cloud platform, preferably Azure (Databricks, Azure Data Lake, Azure Data Factory, Python, and Power BI)
  • Demonstrated application of innovative statistical and machine learning methods in public health practice
  • Prior CDC and public health experience, especially with the CDC OCIO Enterprise Data Analytics and Visualization (EDAV) Azure platform
  • Experience in developing machine learning models and algorithms, including the use of the following Python machine learning libraries: NumPy, SciPy, scikit-learn, Theano, TensorFlow, Keras, PyTorch, pandas, and Matplotlib
  • Experience using RStudio/R
  • Experience with programming in R and experience using machine learning packages (e.g., caret, randomForest, nnet, neuralnet, e1071, hiernet, tree, xgboost, SMOTE)
  • Experience developing user-friendly, interactive data insight and visualization tools (e.g., R Shiny applications, Power BI, Tableau)
  • Microsoft Certified Azure Data Analyst Associate/Azure Data Fundamentals
  • Experience developing and supporting Python based AI/ML solutions
  • Experience using source control and DevOps tools (such as Git/Bitbucket, Atlassian Confluence, Jira)


