How to know for sure if we can learn from a given data or not?


I want to know: given a set of data and a target, how can we know for sure whether we can learn from that data to make any inference?










Tags: machine-learning, neural-network, deep-learning, data, learning






edited Sep 3 at 10:14 by Media
asked Sep 3 at 8:27 by Pranav Pandey
4 Answers

















Accepted answer (6 votes)











          how can we know for sure




We can't.
A toy example shows why even humans cannot do this for sure:

Assume you are given the numbers 2, 4, 8, 16, 32, ?? and want to extrapolate the next number. A natural extension of the series would be 64, but we cannot take this for granted; the next number could just as well be 0. You cannot be sure.

Given only the data, without additional assumptions about what you would expect to see, you cannot learn a correct model per se. You always have to be critical about your data.
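The point can be made concrete: infinitely many functions are consistent with 2, 4, 8, 16, 32. A small sketch (a hypothetical illustration using Lagrange interpolation) constructs a degree-5 polynomial that matches the series exactly and then yields 0 instead of 64:

```python
from fractions import Fraction

def lagrange_poly(points):
    """Return a function evaluating the unique polynomial through `points`.

    Uses the Lagrange form with exact rational arithmetic, so the
    interpolation conditions hold without floating-point error.
    """
    def p(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return p

# Fit the "obvious" series 2, 4, 8, 16, 32 ... but force the 6th term to be 0.
points = [(1, 2), (2, 4), (3, 8), (4, 16), (5, 32), (6, 0)]
p = lagrange_poly(points)

print([int(p(n)) for n in range(1, 7)])  # -> [2, 4, 8, 16, 32, 0]
```

The same construction works for any desired "next number" (including π), which is exactly why the data alone cannot settle the question.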






answered Sep 3 at 8:54 by André
















– vsz (Sep 3 at 20:47): I like to troll puzzles with such "guess the next number" questions, by answering π and providing a function which really does that.

















Answer (3 votes)













With respect to the other answers, I want to add an extra explanation. Basically, what ML approaches do is approximate a mapping from inputs to outputs. This mapping usually needs to be well-behaved; otherwise you need a great deal of data for your model to be able to learn it in the current feature space. More specifically, you should examine the distribution of your training data in the current feature space. For classification tasks, this lets you investigate how much the distributions of the different labels overlap. From that overlap you can estimate the best performance any ML approach can achieve: the distribution of your data in the current feature space determines the Bayes error, the minimum error achievable by any classifier.

If the Bayes error turns out to be large, you can be sure that your data cannot be learned well in the current feature space, and you have to change the features.
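For one simple case the overlap argument can be computed exactly: two equal-prior 1-D Gaussian classes with common variance have a closed-form Bayes error, so you can see directly how class overlap caps the accuracy of any model. A hypothetical sketch:

```python
import math

def bayes_error_two_gaussians(mu0, mu1, sigma):
    """Bayes error for two equal-prior 1-D Gaussian classes with common sigma.

    The optimal decision boundary is the midpoint between the means; the
    error is the tail mass of each class on the wrong side, i.e.
    Phi(-|mu1 - mu0| / (2 * sigma)) where Phi is the standard normal CDF.
    """
    d = abs(mu1 - mu0) / sigma
    return 0.5 * math.erfc(d / (2 * math.sqrt(2)))

# Well-separated classes: low Bayes error, learnable in this feature space.
print(round(bayes_error_two_gaussians(0.0, 4.0, 1.0), 4))  # -> 0.0228
# Heavily overlapping classes: no model can do much better than chance.
print(round(bayes_error_two_gaussians(0.0, 0.5, 1.0), 4))  # -> 0.4013
```

With real data the class-conditional distributions are unknown, so the Bayes error can only be estimated (e.g. from nearest-neighbour error rates), but the qualitative conclusion is the same: large overlap in the current feature space means you need different features.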






answered Sep 3 at 10:13 by Media



























Answer (2 votes)













            The current standard is essentially:




            Given this input data, can any other system or approach classify it or estimate a quantity of interest? If so, then a machine learning approach may be able to achieve the same.




This is basically how machine learning challenges in computer perception can be treated as tractable. We have humans and other animals as working models, and make the assumption that the process can be automated. A similar argument can be made for any machine learning system that attempts to re-create the behaviour of an expert: provided we use the exact same input data, and enough of it, the ML system can learn what the expert does through statistical approximation.



            The "expert" can be a statistician/data scientist looking at the data, using any tool. Exploratory plots of features and measures of correlation are a good way to assess whether a data set might be amenable to training a ML model for prediction. If you can visually separate classes on a scatter plot using some combination of features, then it is likely that a suitable ML model will be able to separate those classes too.



            There are hard cases, where it seems on the surface like there is no pattern. Perhaps a relationship could be teased out and shown to exist with statistical analysis, but you could eschew that and directly throw some non-linear ML model at the problem in the hope that it finds it for you with the correct hyper-parameters. Of course you don't know in advance whether that is a worthwhile approach, and this carries some risks. But it is not that expensive to do once you have some data - just throw a fairly robust non-linear model at the problem, like XGBoost, and see what happens.



Of course, ML is not magic. If there is nothing to find, it will tend to find nothing. Worse, it can find spurious correlations, or patterns due to prejudice inherent in the data collection or labelling. Those issues are a problem regardless of whether it was theoretically possible to achieve a result at all. However, the kind of thinking that drives "let's throw some neural networks at this" has led to some published works which are quite terrifying and wrong on many levels. One example was a neural network that classified a person as criminal or not from a picture of their face; luckily, flaws in the data collection were pointed out on that one, but the original story made headline news in many places, despite being essentially a modern rebirth of phrenology.
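The "just throw a robust non-linear model at it" sanity check can be sketched as follows (hypothetical synthetic data, using scikit-learn's gradient boosting as a stand-in for XGBoost): compare cross-validated accuracy against chance on data with real structure versus data whose labels are pure noise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

# A data set with real structure relating features to labels...
X_signal, y_signal = make_classification(n_samples=500, n_informative=5,
                                         random_state=0)
# ...and one where the labels are independent random coin flips.
X_noise = rng.normal(size=(500, 20))
y_noise = rng.randint(0, 2, size=500)

model = GradientBoostingClassifier(random_state=0)
print(cross_val_score(model, X_signal, y_signal, cv=5).mean())  # well above 0.5
print(cross_val_score(model, X_noise, y_noise, cv=5).mean())    # near 0.5
```

A score well above chance suggests there is something learnable; a score near chance is weaker evidence (the signal might need different features), which is why this is a cheap first probe rather than proof.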






answered Sep 3 at 9:55 by Neil Slater




















– Pranav Pandey (Sep 3 at 11:22): Thanks, Neil for your detailed explanation. :)

















Answer (2 votes)













One cannot evaluate ML models using a deterministic approach. ML models do not simply follow if-else statements where one can verify whether the model predicts the outcome correctly or not; the majority of ML algorithms take a probabilistic approach, predicting the most probable (or nearest) class.

In addition, the boundary between different classes is not always simple and linear; in the majority of cases, the boundary that separates the data points is a complex, highly non-linear function.

Often, noisy data makes the separating boundary more complex and degrades the performance of the model. The bias-variance trade-off is an important concept to learn in order to make the model work as intended.
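The bias-variance trade-off mentioned above can be seen in a small sketch (hypothetical toy data): fit polynomials of increasing degree to noisy samples of a sine wave, and measure error against the noise-free truth.

```python
import numpy as np

rng = np.random.RandomState(0)
true_f = lambda x: np.sin(2 * np.pi * x)

# 20 noisy training samples of the underlying function.
x_train = np.linspace(0, 1, 20)
y_train = true_f(x_train) + rng.normal(scale=0.3, size=x_train.shape)
x_test = np.linspace(0, 1, 200)

errors = {}
for degree in (1, 5, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    # Test error against the noise-free truth exposes under- and over-fitting,
    # even though training error keeps falling as the degree rises.
    errors[degree] = np.mean((np.polyval(coeffs, x_test) - true_f(x_test)) ** 2)
    print(degree, round(errors[degree], 3))
```

Degree 1 underfits (high bias), a very high degree chases the noise (high variance), and a moderate degree balances the two; this is the sense in which noisy data makes the separating boundary, and model selection, harder.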




























              4 Answers
              4






              active

              oldest

              votes








              4 Answers
              4






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              6
              down vote



              accepted











              how can we know for sure




              We can't.
              A toy example to show why even humans can not do this for sure:



              Assume you get the number 2, 4, 8, 16, 32, ?? and want to extrapolate to the next number ??. A natural extension of the series would be 64, but we can not take this for granted. The next number can just as well be 0. You can not be sure.



              Only given the data and without additional assumptions about what you would expect to see, you can not learn a correct model per-se. You always have to be critical about your data.






              share|improve this answer
















              • 1




                I like to troll puzzles with such "guess the next number" questions, by answering π and providing a function which really does that.
                – vsz
                Sep 3 at 20:47














              up vote
              6
              down vote



              accepted











              how can we know for sure




              We can't.
              A toy example to show why even humans can not do this for sure:



              Assume you get the number 2, 4, 8, 16, 32, ?? and want to extrapolate to the next number ??. A natural extension of the series would be 64, but we can not take this for granted. The next number can just as well be 0. You can not be sure.



              Only given the data and without additional assumptions about what you would expect to see, you can not learn a correct model per-se. You always have to be critical about your data.






              share|improve this answer
















              • 1




                I like to troll puzzles with such "guess the next number" questions, by answering π and providing a function which really does that.
                – vsz
                Sep 3 at 20:47












              up vote
              6
              down vote



              accepted







              up vote
              6
              down vote



              accepted







              how can we know for sure




              We can't.
              A toy example to show why even humans can not do this for sure:



              Assume you get the number 2, 4, 8, 16, 32, ?? and want to extrapolate to the next number ??. A natural extension of the series would be 64, but we can not take this for granted. The next number can just as well be 0. You can not be sure.



              Only given the data and without additional assumptions about what you would expect to see, you can not learn a correct model per-se. You always have to be critical about your data.






              share|improve this answer













              how can we know for sure




              We can't.
              A toy example to show why even humans can not do this for sure:



              Assume you get the number 2, 4, 8, 16, 32, ?? and want to extrapolate to the next number ??. A natural extension of the series would be 64, but we can not take this for granted. The next number can just as well be 0. You can not be sure.



              Only given the data and without additional assumptions about what you would expect to see, you can not learn a correct model per-se. You always have to be critical about your data.







              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Sep 3 at 8:54









              André

              4389




              4389







              • 1




                I like to troll puzzles with such "guess the next number" questions, by answering π and providing a function which really does that.
                – vsz
                Sep 3 at 20:47












              • 1




                I like to troll puzzles with such "guess the next number" questions, by answering π and providing a function which really does that.
                – vsz
                Sep 3 at 20:47







              1




              1




              I like to troll puzzles with such "guess the next number" questions, by answering π and providing a function which really does that.
              – vsz
              Sep 3 at 20:47




              I like to troll puzzles with such "guess the next number" questions, by answering π and providing a function which really does that.
              – vsz
              Sep 3 at 20:47










              up vote
              3
              down vote













              With respect to the presented answers, I want to add an extra explanation. Basically, what ML approaches do is approximating a mapping from inputs to outputs. This function usually should be well-behaved 1 otherwise you should have so much data to enable your model to learn it in the current feature space. To be more specific, you should find the distribution of your training data in your current feature space. It helps you investigate how much the distribution of your different labels overlap, for classification tasks. By doing that, you'll be able to figure out the best performance your best ML approach can have. The distribution of your data in your current feature space can show you the Bayes error of your model.



              If you find out that the current Bayes error is a large value, then you can be sure that your data cannot be learned in the current feature space and you have to change the current features.






              share|improve this answer
























                up vote
                3
                down vote













                With respect to the presented answers, I want to add an extra explanation. Basically, what ML approaches do is approximating a mapping from inputs to outputs. This function usually should be well-behaved 1 otherwise you should have so much data to enable your model to learn it in the current feature space. To be more specific, you should find the distribution of your training data in your current feature space. It helps you investigate how much the distribution of your different labels overlap, for classification tasks. By doing that, you'll be able to figure out the best performance your best ML approach can have. The distribution of your data in your current feature space can show you the Bayes error of your model.



                If you find out that the current Bayes error is a large value, then you can be sure that your data cannot be learned in the current feature space and you have to change the current features.






                share|improve this answer






















                  up vote
                  3
                  down vote










                  up vote
                  3
                  down vote









                  With respect to the presented answers, I want to add an extra explanation. Basically, what ML approaches do is approximating a mapping from inputs to outputs. This function usually should be well-behaved 1 otherwise you should have so much data to enable your model to learn it in the current feature space. To be more specific, you should find the distribution of your training data in your current feature space. It helps you investigate how much the distribution of your different labels overlap, for classification tasks. By doing that, you'll be able to figure out the best performance your best ML approach can have. The distribution of your data in your current feature space can show you the Bayes error of your model.



                  If you find out that the current Bayes error is a large value, then you can be sure that your data cannot be learned in the current feature space and you have to change the current features.






                  share|improve this answer












                  With respect to the presented answers, I want to add an extra explanation. Basically, what ML approaches do is approximating a mapping from inputs to outputs. This function usually should be well-behaved 1 otherwise you should have so much data to enable your model to learn it in the current feature space. To be more specific, you should find the distribution of your training data in your current feature space. It helps you investigate how much the distribution of your different labels overlap, for classification tasks. By doing that, you'll be able to figure out the best performance your best ML approach can have. The distribution of your data in your current feature space can show you the Bayes error of your model.



                  If you find out that the current Bayes error is a large value, then you can be sure that your data cannot be learned in the current feature space and you have to change the current features.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Sep 3 at 10:13









                  Media

                  5,54041443




                  5,54041443




















                      up vote
                      2
                      down vote













                      The current standard is essentially:




                      Given this input data, can any other system or approach classify it or estimate a quantity of interest? If so, then a machine learning approach may be able to achieve the same.




                      This is basically how machine learning challenges in computer perception can be treated as tractable. We have humans and other animals as working models, and make the assumption that the process can be automated. A similar approach can be made on any machine learning system which attempts to re-create the behaviour of an expert - provided we use the exact same input data, and enough of it, the ML system can learn what the expert does through statistical approximation.



                      The "expert" can be a statistician/data scientist looking at the data, using any tool. Exploratory plots of features and measures of correlation are a good way to assess whether a data set might be amenable to training a ML model for prediction. If you can visually separate classes on a scatter plot using some combination of features, then it is likely that a suitable ML model will be able to separate those classes too.



                      There are hard cases, where it seems on the surface like there is no pattern. Perhaps a relationship could be teased out and shown to exist with statistical analysis, but you could eschew that and directly throw some non-linear ML model at the problem in the hope that it finds it for you with the correct hyper-parameters. Of course you don't know in advance whether that is a worthwhile approach, and this carries some risks. But it is not that expensive to do once you have some data - just throw a fairly robust non-linear model at the problem, like XGBoost, and see what happens.



                      Of course, ML is not magic. If there is nothing to find, it will tend to find nothing. Worse than that, it can find spurious correlations, or patterns due to prejudice inherent in the data collection or labelling. Those issues are a problem regardless of evidence on whether it was theoretically possible to achieve a result at all. However, the kind of thinking that drives "let's throw some neural networks at this" has led to some published works which are quite terrifying and wrong on many levels. An example of such a system was a NN which classified a person as criminal or not according to a picture of their face - luckily flaws were pointed out in data collection on that one, but the original story made headline news in many places, despite essentially being a modern re-birth of Phrenology.






                      share|improve this answer




















                      • Thanks, Neil for your detailed explanation. :)
                        – Pranav Pandey
                        Sep 3 at 11:22














                      up vote
                      2
                      down vote













                      The current standard is essentially:




                      Given this input data, can any other system or approach classify it or estimate a quantity of interest? If so, then a machine learning approach may be able to achieve the same.




                      This is basically how machine learning challenges in computer perception can be treated as tractable. We have humans and other animals as working models, and make the assumption that the process can be automated. A similar approach can be made on any machine learning system which attempts to re-create the behaviour of an expert - provided we use the exact same input data, and enough of it, the ML system can learn what the expert does through statistical approximation.



                      The "expert" can be a statistician/data scientist looking at the data, using any tool. Exploratory plots of features and measures of correlation are a good way to assess whether a data set might be amenable to training a ML model for prediction. If you can visually separate classes on a scatter plot using some combination of features, then it is likely that a suitable ML model will be able to separate those classes too.



                      There are hard cases, where it seems on the surface like there is no pattern. Perhaps a relationship could be teased out and shown to exist with statistical analysis, but you could eschew that and directly throw some non-linear ML model at the problem in the hope that it finds it for you with the correct hyper-parameters. Of course you don't know in advance whether that is a worthwhile approach, and this carries some risks. But it is not that expensive to do once you have some data - just throw a fairly robust non-linear model at the problem, like XGBoost, and see what happens.



                      Of course, ML is not magic. If there is nothing to find, it will tend to find nothing. Worse than that, it can find spurious correlations, or patterns due to prejudice inherent in the data collection or labelling. Those issues are a problem regardless of evidence on whether it was theoretically possible to achieve a result at all. However, the kind of thinking that drives "let's throw some neural networks at this" has led to some published works which are quite terrifying and wrong on many levels. An example of such a system was a NN which classified a person as criminal or not according to a picture of their face - luckily flaws were pointed out in data collection on that one, but the original story made headline news in many places, despite essentially being a modern re-birth of Phrenology.






                      share|improve this answer




















                      • Thanks, Neil for your detailed explanation. :)
                        – Pranav Pandey
                        Sep 3 at 11:22












                      up vote
                      2
                      down vote










                      up vote
                      2
                      down vote









                      The current standard is essentially:




                      Given this input data, can any other system or approach classify it or estimate a quantity of interest? If so, then a machine learning approach may be able to achieve the same.




                      This is basically how machine learning challenges in computer perception can be treated as tractable. We have humans and other animals as working models, and make the assumption that the process can be automated. A similar approach can be made on any machine learning system which attempts to re-create the behaviour of an expert - provided we use the exact same input data, and enough of it, the ML system can learn what the expert does through statistical approximation.



                      The "expert" can be a statistician/data scientist looking at the data, using any tool. Exploratory plots of features and measures of correlation are a good way to assess whether a data set might be amenable to training a ML model for prediction. If you can visually separate classes on a scatter plot using some combination of features, then it is likely that a suitable ML model will be able to separate those classes too.



                      There are hard cases, where it seems on the surface like there is no pattern. Perhaps a relationship could be teased out and shown to exist with statistical analysis, but you could eschew that and directly throw some non-linear ML model at the problem in the hope that it finds it for you with the correct hyper-parameters. Of course you don't know in advance whether that is a worthwhile approach, and this carries some risks. But it is not that expensive to do once you have some data - just throw a fairly robust non-linear model at the problem, like XGBoost, and see what happens.



                      Of course, ML is not magic. If there is nothing to find, it will tend to find nothing. Worse than that, it can find spurious correlations, or patterns due to prejudice inherent in the data collection or labelling. Those issues are a problem regardless of evidence on whether it was theoretically possible to achieve a result at all. However, the kind of thinking that drives "let's throw some neural networks at this" has led to some published works which are quite terrifying and wrong on many levels. An example of such a system was a NN which classified a person as criminal or not according to a picture of their face - luckily flaws were pointed out in data collection on that one, but the original story made headline news in many places, despite essentially being a modern re-birth of Phrenology.






                      share|improve this answer












                      The current standard is essentially:




                      Given this input data, can any other system or approach classify it or estimate a quantity of interest? If so, then a machine learning approach may be able to achieve the same.




                      This is basically how machine learning challenges in computer perception can be treated as tractable. We have humans and other animals as working models, and make the assumption that the process can be automated. A similar approach can be made on any machine learning system which attempts to re-create the behaviour of an expert - provided we use the exact same input data, and enough of it, the ML system can learn what the expert does through statistical approximation.



                      The "expert" can be a statistician/data scientist looking at the data, using any tool. Exploratory plots of features and measures of correlation are a good way to assess whether a data set might be amenable to training a ML model for prediction. If you can visually separate classes on a scatter plot using some combination of features, then it is likely that a suitable ML model will be able to separate those classes too.



                      There are hard cases, where it seems on the surface like there is no pattern. Perhaps a relationship could be teased out and shown to exist with statistical analysis, but you could eschew that and directly throw some non-linear ML model at the problem in the hope that it finds it for you with the correct hyper-parameters. Of course you don't know in advance whether that is a worthwhile approach, and this carries some risks. But it is not that expensive to do once you have some data - just throw a fairly robust non-linear model at the problem, like XGBoost, and see what happens.



                      Of course, ML is not magic. If there is nothing to find, it will tend to find nothing. Worse than that, it can find spurious correlations, or patterns due to prejudice inherent in the data collection or labelling. Those issues are a problem regardless of evidence on whether it was theoretically possible to achieve a result at all. However, the kind of thinking that drives "let's throw some neural networks at this" has led to some published works which are quite terrifying and wrong on many levels. An example of such a system was a NN which classified a person as criminal or not according to a picture of their face - luckily flaws were pointed out in data collection on that one, but the original story made headline news in many places, despite essentially being a modern re-birth of Phrenology.







                      share|improve this answer












                      share|improve this answer



                      share|improve this answer










                      answered Sep 3 at 9:55









                      Neil Slater

                      15.2k22356




                      15.2k22356











                      • Thanks, Neil for your detailed explanation. :)
                        – Pranav Pandey
                        Sep 3 at 11:22
















                      • Thanks, Neil for your detailed explanation. :)
                        – Pranav Pandey
                        Sep 3 at 11:22















                      Thanks, Neil for your detailed explanation. :)
                      – Pranav Pandey
                      Sep 3 at 11:22




                      Thanks, Neil for your detailed explanation. :)
                      – Pranav Pandey
                      Sep 3 at 11:22










                      up vote
                      2
                      down vote













                      One cannot evaluate ML models using deterministic approach. ML models do not simply follows if else statement where one can verify whether the model predict the outcome correctly or not. Majority of ML algorithms work on probabilistic approach that predicts the most probable or near class.



In addition, the boundary that distinguishes classes is not always simple and linear; in the majority of cases, the boundary that separates the data points follows higher-order functions.



Often, noisy data makes the separating boundary more complex and degrades the model's performance. The bias-variance trade-off is an important concept to learn in order to make a model work as intended.
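To illustrate the bias-variance point (a toy sketch with made-up data, not part of the answer): a high-variance 1-nearest-neighbour model chases the label noise in the training set, while averaging over more neighbours smooths the noise out and generalises better.

```python
import random

random.seed(2)

# True rule: label = 1 iff x > 0; training labels are flipped 25% of the time.
train = [(x, (1 if x > 0 else 0) ^ (1 if random.random() < 0.25 else 0))
         for x in [random.uniform(-1, 1) for _ in range(400)]]
# Test against the true (noise-free) rule.
test = [(x, 1 if x > 0 else 0) for x in [random.uniform(-1, 1) for _ in range(400)]]

def knn_predict(x, k):
    # Majority vote among the k nearest training points.
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return 1 if sum(y for _, y in neighbours) * 2 > k else 0

def accuracy(k):
    return sum(knn_predict(x, k) == y for x, y in test) / len(test)

acc_1, acc_15 = accuracy(1), accuracy(15)
print(f"1-NN  test accuracy: {acc_1:.2f}")   # hurt by memorising noisy labels
print(f"15-NN test accuracy: {acc_15:.2f}")  # averaging reduces variance
```

The underlying concept is the same whatever the model: too little flexibility misses the pattern (bias), too much fits the noise (variance).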






                          answered Sep 18 at 13:49









                          Nirav Gandhi




























                               
