Born to Learn: 2011

Not the titles, not the fancy tools. It is not the expertise, either. It is the responsibility one takes in doing one's job. Why? Because what one wants from a craftsman is to have one's problem solved. You want a doctor to see you through the illness and make you feel better, a developer to write software that serves your purpose, and so on. You want the result.
There always was and always will be a tendency to consider the tools and titles when faced with a choice, which makes them necessary to you as a craftsman. If you're a developer, you'd better have all those fancy certificates saying you really know the languages, platforms, methodologies, etc. People expect it. It will help you get your customers. Yet if you don't care, if you take no responsibility or don't provide the result they want, no certificate will help you keep them.
That is exactly why the most important part of your CV is the one saying what you have done. That is why, when choosing a doctor, one checks his customer reviews rather than his diplomas. That is the ultimate quality criterion.

The job I'm sabotaging at the moment involves machine learning for probabilistic models. Such models are necessary when the problem cannot be solved deterministically, either due to high computational complexity or due to our lack of knowledge of the actual structure of the problem. One example of the latter case is natural language, which is quite complex and inherently ambiguous. Linguists, psycholinguists and cognitive scientists keep generating and checking numerous hypotheses on the subject, and yet there's no single comprehensive model of the language, no algorithm to pass the Turing test. So we stick to small subtasks we can manage, and even there we generally design a simplistic model reflecting our idea of the underlying structure and leave a number of degrees of freedom in the model to be subsequently adjusted to fit the data. The process of adjustment is called training. The idea is simple -- we provide a number of sample inputs together with their correct outputs and try to find parameter values which allow the model to best reproduce those outputs. The examples used for training are called the training set.
Sounds simple, yeah. But there are a number of catches, of course. Firstly, the performance of the model on the training set is only partially indicative of its expected performance on other, unseen data. Besides, depending on the structure of the model, training can be an extremely computationally intensive task -- it can usually be seen as an optimization process, not necessarily convex and usually in a high-dimensional space. Finally, the more complex the problem, the bigger the training set we need. And training data may be expensive, since it often takes a skilled linguist or other specialist to produce it.
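To make the idea of training a bit more tangible, here is a minimal sketch. It has nothing to do with language: the model is a toy straight line with two free parameters, the training set is made up, and the names (`predict`, `train`, `training_set`) are my own choices for illustration. Plain gradient descent on the squared error is just one possible way to do the adjustment, picked here because it is short.

```python
def predict(params, x):
    """Toy model: a straight line with two free parameters."""
    w, b = params
    return w * x + b


def mean_squared_error(params, examples):
    """Average squared gap between model outputs and the correct outputs."""
    return sum((predict(params, x) - y) ** 2 for x, y in examples) / len(examples)


def train(examples, learning_rate=0.05, steps=2000):
    """Adjust (w, b) by gradient descent to reduce the error on the examples."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up training set: inputs paired with their correct outputs (here y = 2x + 1).
training_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

params = train(training_set)
print("learned parameters:", params)
print("error on the training set:", mean_squared_error(params, training_set))
```

Real models have far more parameters and a far less friendly error surface, but the loop is the same in spirit: measure how badly the outputs are reproduced, nudge the parameters, repeat.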
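The first catch is easy to show on a toy case. The sketch below uses a deliberately naive "model" that simply memorizes its training examples and answers with the output of the nearest one. The data, the held-out split and all the names are my own assumptions for the sake of illustration; this is not meant as a recommended evaluation procedure.

```python
def nearest_neighbour_predict(memory, x):
    """Answer with the output of the memorized example whose input is closest to x."""
    _, closest_y = min(memory, key=lambda example: abs(example[0] - x))
    return closest_y


def mean_squared_error(memory, examples):
    """Average squared gap between the memorizing model and the correct outputs."""
    return sum(
        (nearest_neighbour_predict(memory, x) - y) ** 2 for x, y in examples
    ) / len(examples)


# Made-up data following y = 2x + 1, sampled every 0.5.
points = [(i * 0.5, 2 * (i * 0.5) + 1) for i in range(12)]

training_set = points[0::2]  # examples the model gets to memorize
held_out = points[1::2]      # unseen examples, used only for checking

train_error = mean_squared_error(training_set, training_set)
held_out_error = mean_squared_error(training_set, held_out)

print("error on the training set:", train_error)
print("error on unseen data:", held_out_error)
```

The memorizing model looks perfect on the training set, since every training input finds itself in memory, and yet it is wrong on every unseen input. That is why one usually keeps some examples aside and checks the model on those.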
Funny, really. Most humans are exposed to only a limited number of training examples and still manage to learn well from them. We still don't exactly know how.

Born to Learn

Monday, 7 February 2011

What Makes a Craftsman?

Friday, 21 January 2011
