Dedan Kimathi University, Nyeri, Kenya
More generally, a data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human. She spends a lot of time in the process of collecting, cleaning, and munging data, because data is never clean. This process requires persistence, statistics, and software engineering skills—skills that are also necessary for understanding biases in the data, and for debugging logging output from code.
Cathy O’Neil and Rachel Schutt, from O’Neil and Schutt (2013)
We don’t know what science we’ll want to do in five years’ time, but we won’t want slower experiments, we won’t want more expensive experiments and we won’t want a narrower selection of experiments.
Table: Portion of data that was annotated.
| | Twitter Data | Facebook Data |
---|---|---|
Initial dataset | 15,354 | 430,075 |
Dataset after Annotation | 3,527 | 4,479 |
Cohen’s kappa was used to measure inter-annotator agreement.
Table: Cohen’s kappa agreement scores for the data.
Category | Score |
---|---|
Language | 0.89 |
Aspect | 0.69 |
Sentiment | 0.73 |
Misinformation | 0.74 |
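
As a reminder of what these scores measure, Cohen’s kappa compares the observed agreement between two annotators, $p_o$, with the agreement expected by chance, $p_e$, as $\kappa = (p_o - p_e)/(1 - p_e)$. The following is a minimal sketch of that calculation, not the code used to produce the table. The function name `cohens_kappa` and the toy labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of the two
    # annotators' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, so we return 1.0 by convention (an assumption).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations, not taken from the dataset above.
annotator_1 = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu"]
annotator_2 = ["pos", "neg", "neg", "neg", "neu", "pos", "pos", "neu"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the annotators agree on 6 of 8 items, yet kappa is noticeably lower than 0.75 because part of that agreement would be expected by chance.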
company: Trent AI
book: The Atomic Human
twitter: @lawrennd
The Atomic Human, pages 343-347, 359, 365-368: human-analogue machines (HAMs).
newspaper: Guardian Profile Page
blog posts: