Crowdsourcing scientific skills: Kaggle and data modeling

Kaggle is not exactly a newcomer, but it is an excellent example of how Web 2.0 can boost science and help solve scientific problems.

Kaggle harnesses the power of crowdsourcing to solve problems that call for data modeling. Predictive models are everywhere: they help forecast all sorts of phenomena, from customer behavior to bird migration. However, there is no general rule for designing such models, and they often end up being optimized by trial and error. So the field seems well suited to the massive number of work-hours that crowdsourcing can provide.

Kaggle asks participants to develop predictive models for problems submitted by companies (GE, Allstate, Merck, Ford…) and other organisations (universities, governmental organisations…). Tens to hundreds of different models can then be compared, and the best is chosen as the winner.

Turning work into a game is a common strategy to motivate participation, but it is interesting to see how far Kaggle pushes the sports analogy: the terms “player”, “competition” and “winner” are used often. And a winner there is, with the creator of the best-performing model usually rewarded with hundreds if not millions of dollars.

Founded in 2010, the company has successfully raised millions of dollars, and major companies, convinced by a series of successful projects, are coming on board with their own data to be modeled. A great example of how to put brilliant minds (with some free time on their hands) to work collaboratively!