Data Scientist

Home Exercise

A new kind of virus has been spreading around the world. Very little is known about it, but the Israeli health care bodies have collected some patient data and made it public in an attempt to get help from the country’s AI communities.


What do we have?


A file containing data for ~77K patients (identified by ‘serial_number’), with each patient’s test results and diagnosis (labeled ‘Diagnosis’).


Our goal

  1. Build a predictor for a new patient’s status




Please submit 

  1. Code - either a Python script or a Jupyter notebook (please make it legible and use comments; GitHub repos are best)

  2. Validation set results (confusion matrix) 
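
For orientation, the validation deliverable could look roughly like the sketch below. It is only an illustration under stated assumptions: the rows are random placeholders rather than the real file, the label values ‘negative’ and ‘positive’ are invented, the helper names are hypothetical, and the majority-class baseline stands in for whatever predictor you actually build. Only the column names ‘serial_number’ and ‘Diagnosis’ come from the data description.

```python
import random
from collections import Counter


def split_train_validation(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and hold out a fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns predicted."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Placeholder records standing in for the real file (assumption: the label
# values are invented; the real 'Diagnosis' values may differ).
rng = random.Random(42)
rows = [
    {"serial_number": i, "Diagnosis": rng.choice(["negative", "positive"])}
    for i in range(1000)
]

train, validation = split_train_validation(rows)

# Trivial baseline: always predict the most common training label.
# A real solution would replace this with a model fitted on the test results.
majority_label = Counter(r["Diagnosis"] for r in train).most_common(1)[0][0]
y_true = [r["Diagnosis"] for r in validation]
y_pred = [majority_label for _ in validation]

labels = sorted({r["Diagnosis"] for r in rows})
matrix = confusion_matrix(y_true, y_pred, labels)

print("columns (predicted):", labels)
for label, row in zip(labels, matrix):
    print("true", label, row)
```

The validation rows are held out before anything is fitted, so the matrix reflects performance on data the predictor has not seen. If you use scikit-learn, `sklearn.metrics.confusion_matrix` produces the same table with the same row and column convention.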



  1. The code should be able to run on a laptop and shouldn’t require external compute power

  2. The entire exercise should take no more than 4h start to finish



Helpful things to consider before starting


  1. The answer to the exercise can be short, and based on open source. We actually appreciate simple & elegant solutions 💪. 

  2. Make sure you are able to explain how your algorithm works and to justify your choices.

  3. Does the data have any special characteristics?

  4. Does this problem resemble other data science problems? If so, which ones, and in what ways?

  5. If your laptop / home machine can’t handle the compute effort - change approach 🙂

  6. If you’re entering hour 5, reconsider your approach 

  7. You can use any open source / public library you wish, but be sure you’re able to explain how it works and why you chose it.

  8. You only need to answer 1 of the 2 questions.

  9. If you choose question 2, which metric would be best to prove your results?

  10. Stuck? Call / email us. Don’t be shy.

+1 (650) 843-9196 (toll free)

©2019 by Vanti.