The Neuromation Platform will use distributed computing along with blockchain proof-of-work tokens to revolutionize AI model development.
The revolution is long overdue: deep learning employs artificial neural networks of extremely large capacity and, therefore, requires large volumes of accurately labeled training data. Collecting large datasets of images, text and sound is easy, but describing and annotating data to make it usable has traditionally been challenging and costly. Crowdsourcing was applied to the problem of dataset creation and labeling a few years ago, employing large numbers of humans to correct mistakes and improve accuracy. It proved slow and expensive, and it introduced human bias. Besides, there were tasks that humans simply could not do well, such as estimating distances between objects, quantifying lighting in a scene, accurately translating text, and so on.
We propose a solution whose accuracy is guaranteed by construction: synthesizing large datasets along with perfectly accurate labels. The benefits of synthetic data are manifold. It is fast to synthesize and render, its labels are perfectly accurate, it can be tailored to the task at hand, and it can be modified to improve the model and the training process itself. It is important to note that real data with accurate labels is still required for evaluating models trained on synthetic data, in order to confirm acceptable performance at inference time. However, the amount of validation data required is orders of magnitude smaller than the amount of training data.
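
To make "accurate by construction" concrete, here is a minimal sketch in Python. It is a toy illustration, not the Neuromation pipeline: the function names, the image size, and the single-rectangle "scene" are all assumptions chosen for brevity. The point it shows is that the generator picks the object's position first and derives both the pixels and the label from that same choice, so no annotation step is needed.

```python
import random


def synthesize_sample(width=32, height=32, rng=None):
    """Render one toy grayscale image containing a bright rectangle.

    Hypothetical helper for illustration only. Returns (image, label),
    where label is the exact bounding box of the rectangle.
    """
    rng = rng or random.Random()
    # Choose the object's size and placement first; pixels and label
    # are both derived from these same numbers.
    w = rng.randint(4, width // 2)
    h = rng.randint(4, height // 2)
    x0 = rng.randint(0, width - w)
    y0 = rng.randint(0, height - h)

    image = [[0] * width for _ in range(height)]
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            image[y][x] = 255

    label = {"x_min": x0, "y_min": y0, "x_max": x0 + w - 1, "y_max": y0 + h - 1}
    return image, label


def bounding_box_from_pixels(image):
    """Recover the box by scanning pixels, as an independent check of the label."""
    ys = [y for y, row in enumerate(image) if any(row)]
    xs = [x for row in image for x, value in enumerate(row) if value]
    return {"x_min": min(xs), "y_min": min(ys), "x_max": max(xs), "y_max": max(ys)}


# A fixed seed makes the toy dataset reproducible.
rng = random.Random(0)
dataset = [synthesize_sample(rng=rng) for _ in range(100)]
```

In a realistic setting the rectangle would be replaced by a rendered 3D scene, but the principle is the same: quantities such as object positions, distances and lighting are known to the generator, so they can be written out as labels directly.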