What is the precise definition of “performance” in machine learning?

In machine learning, people usually refer to the "performance of a model" or "performance of an optimizer". What is the exact definition of "performance"? What would be the "performance of an optimizer"?

I know that there are ways of measuring how far away current predictions of machine learning models are from the expected ones: for example, you can measure the accuracy of a model, perplexity, etc. Is this what the performance of a ML model refers to? Is it a name to refer to any way of measuring the correctness of the predictions (or, in general, outputs) of the model? Or is performance actually the time it takes to perform the prediction? Or something else?

asked Jan 2 at 16:08

nbro

472619

add a comment |

asked Jan 2 at 16:08

nbro

472619

add a comment |

asked Jan 2 at 16:08

nbro

472619

machine-learning terminology model-evaluation definition

asked Jan 2 at 16:08

nbro

472619

asked Jan 2 at 16:08

nbro

472619

asked Jan 2 at 16:08

nbro

472619

asked Jan 2 at 16:08

nbro

472619

asked Jan 2 at 16:08

nbro

472619

add a comment |

3 Answers
3

active

oldest

votes

In the absence of any specific clarifying context, "performance" is just a synonym for "quality."

The sentence "I want a model that performs better" is essentially the same as the sentence "I want a higher-quality model." Readers understand that the speaker is not satisfied with how well the model solves some particular problem, but the reader does not know, precisely, what about the model is dissatisfactory. Does the model predict too many false positives? Or false negatives? Does it predict incorrect classes for images that have a tilted horizon, or are taken on cloudy days? Understanding what about the model needs improvement would require further, specific elaboration.

Likewise, if someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim. One way to assesses performance of an optimizer is how many iterations it takes to reach some neighborhood around a minimum; another, which is particular to machine learning classifiers, is how well the solutions obtained by an optimizer generalize to out-of-sample data.

edited Jan 2 at 16:41

answered Jan 2 at 16:14

Sycorax

39.3k1299198

$begingroup$
What about the performance of a "optimizer" (e.g. Adam)?
$endgroup$
– nbro
Jan 2 at 16:30

$begingroup$
In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
$endgroup$
– Sycorax
Jan 2 at 16:34

$begingroup$
@nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
$endgroup$
– Ray
Jan 2 at 23:45

$begingroup$
In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
$endgroup$
– Cliff AB
Jan 3 at 0:36

$begingroup$
@Ray Training samples or epochs/iterations to train (or both)?
$endgroup$
– nbro
Jan 3 at 2:16

|
show 1 more comment

Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:

the context of the model you are trying to implement

the metrics you are using to evaluate the model's output

the objective you are pursuing

The use of one metric or another will depend whether you are trying to predict a continuous or a discrete variable. Some of these are: Accuracy, Recall, Precision, R2, F-Measure, Mean Square Error, etc.

To make this clear, say for instance you are working on a credit card fraud machine learning algorithm, where you want to predict the number of fraud transactions. To evaluate how well the algorithm works: a) understand the context b) understand the metrics that are applicable to the problem

a) We are dealing with a classification problem; a transaction can be fraud or not (target is a discrete variable). We will be most-likely facing a highly imbalanced dataset since most of the credit card transactions are non-fraudulent.

b) Since it is a classification problem, we can discard all the metrics associated to continuous variables (for example, R2 and MSE). Moreover, due to the possible imbalance, we can also discard Accuracy metric. This leaves us with two metrics: Recall and Precision. We know by theory that these two metrics present a trade-off (if I increase one, the other decreases). Recall would help detect the major possible transactions that were fraud. Precision would help with avoiding misclassifying frauds.

Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:

Is our goal to detect the highest amount of fraudulent transactions
as possible? If it is, our metric to evaluate the model's
performance, and further improvements, will be Recall.

Is our goal to avoid classifying a transaction as non-fraud when it
was a fraud? If it is, our metric to evaluate the model's
performance, and further improvements, will be Precision.

Hopefully this provides you a better insight.

edited Jan 3 at 13:08

answered Jan 3 at 2:35

Fran J

312

add a comment |

As the other answer correctly points out, there is no universal definition or measurement of performance of a machine learning model. Rather, performance metrics are highly dependent on the domain and ultimate purpose of the model being built. Performance of an ML model is just "how good" it does at a particular task, but the definition of "good" can take many forms. A "good" model could be one that predicts well, one that trains quickly, one that finds a robust solution, or any combination of the above.

For example, an algorithm used for a medical screening test should be highly sensitive - we want to catch all possible cases of a disease, at the cost of misdiagnosing some people who aren't actually sick. These individuals can go on for further tests that may optimize other metrics like positive predictive value, indicating that a positive test result is likely a result of actually having the disease. Depending on the purpose of the test, we may want to put more weight on true positives/negatives at the cost of errors on the other side.

Performance can also be a function of the error measure used. Suppose your classifier outputs values on a continuous scale which are then thresholded for a binary classification. Do you care only if points are on the correct side of the boundary (accuracy measure)? Or do you care how badly you missed on the misclassified points (RMSE)? There is no universal best way to optimize performance.

answered Jan 2 at 16:47

Nuclear Wang

2,567820

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f385312%2fwhat-is-the-precise-definition-of-performance-in-machine-learning%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

In the absence of any specific clarifying context, "performance" is just a synonym for "quality."

edited Jan 2 at 16:41

answered Jan 2 at 16:14

Sycorax

39.3k1299198

$begingroup$
What about the performance of a "optimizer" (e.g. Adam)?
$endgroup$
– nbro
Jan 2 at 16:30

$begingroup$
In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
$endgroup$
– Sycorax
Jan 2 at 16:34

$begingroup$
@nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
$endgroup$
– Ray
Jan 2 at 23:45

$begingroup$
In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
$endgroup$
– Cliff AB
Jan 3 at 0:36

$begingroup$
@Ray Training samples or epochs/iterations to train (or both)?
$endgroup$
– nbro
Jan 3 at 2:16

|
show 1 more comment

In the absence of any specific clarifying context, "performance" is just a synonym for "quality."

edited Jan 2 at 16:41

answered Jan 2 at 16:14

Sycorax

39.3k1299198

$begingroup$
What about the performance of a "optimizer" (e.g. Adam)?
$endgroup$
– nbro
Jan 2 at 16:30

$begingroup$
In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
$endgroup$
– Sycorax
Jan 2 at 16:34

$begingroup$
@nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
$endgroup$
– Ray
Jan 2 at 23:45

$begingroup$
In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
$endgroup$
– Cliff AB
Jan 3 at 0:36

$begingroup$
@Ray Training samples or epochs/iterations to train (or both)?
$endgroup$
– nbro
Jan 3 at 2:16

|
show 1 more comment

In the absence of any specific clarifying context, "performance" is just a synonym for "quality."

edited Jan 2 at 16:41

answered Jan 2 at 16:14

Sycorax

39.3k1299198

In the absence of any specific clarifying context, "performance" is just a synonym for "quality."

edited Jan 2 at 16:41

answered Jan 2 at 16:14

Sycorax

39.3k1299198

edited Jan 2 at 16:41

answered Jan 2 at 16:14

Sycorax

39.3k1299198

answered Jan 2 at 16:14

Sycorax

39.3k1299198

answered Jan 2 at 16:14

Sycorax

39.3k1299198

$begingroup$
What about the performance of a "optimizer" (e.g. Adam)?
$endgroup$
– nbro
Jan 2 at 16:30

$begingroup$
In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
$endgroup$
– Sycorax
Jan 2 at 16:34

$begingroup$
@nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
$endgroup$
– Ray
Jan 2 at 23:45

$begingroup$
In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
$endgroup$
– Cliff AB
Jan 3 at 0:36

$begingroup$
@Ray Training samples or epochs/iterations to train (or both)?
$endgroup$
– nbro
Jan 3 at 2:16

|
show 1 more comment

$begingroup$
What about the performance of a "optimizer" (e.g. Adam)?
$endgroup$
– nbro
Jan 2 at 16:30

$begingroup$
In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
$endgroup$
– Sycorax
Jan 2 at 16:34

$begingroup$
@nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
$endgroup$
– Ray
Jan 2 at 23:45

$begingroup$
In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
$endgroup$
– Cliff AB
Jan 3 at 0:36

$begingroup$
@Ray Training samples or epochs/iterations to train (or both)?
$endgroup$
– nbro
Jan 3 at 2:16

What about the performance of a "optimizer" (e.g. Adam)?

– nbro
Jan 2 at 16:30

In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.

– Sycorax
Jan 2 at 16:34

@nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.

– Ray
Jan 2 at 23:45

In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).

– Cliff AB
Jan 3 at 0:36

@Ray Training samples or epochs/iterations to train (or both)?

– nbro
Jan 3 at 2:16

|
show 1 more comment

Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:

the context of the model you are trying to implement

the metrics you are using to evaluate the model's output

the objective you are pursuing

Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:

Is our goal to detect the highest amount of fraudulent transactions
as possible? If it is, our metric to evaluate the model's
performance, and further improvements, will be Recall.

Is our goal to avoid classifying a transaction as non-fraud when it
was a fraud? If it is, our metric to evaluate the model's
performance, and further improvements, will be Precision.

Hopefully this provides you a better insight.

edited Jan 3 at 13:08

answered Jan 3 at 2:35

Fran J

312

add a comment |

Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:

the context of the model you are trying to implement

the metrics you are using to evaluate the model's output

the objective you are pursuing

Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:

Is our goal to detect the highest amount of fraudulent transactions
as possible? If it is, our metric to evaluate the model's
performance, and further improvements, will be Recall.

Is our goal to avoid classifying a transaction as non-fraud when it
was a fraud? If it is, our metric to evaluate the model's
performance, and further improvements, will be Precision.

Hopefully this provides you a better insight.

edited Jan 3 at 13:08

answered Jan 3 at 2:35

Fran J

312

add a comment |

Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:

the context of the model you are trying to implement

the metrics you are using to evaluate the model's output

the objective you are pursuing

Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:

Is our goal to detect the highest amount of fraudulent transactions
as possible? If it is, our metric to evaluate the model's
performance, and further improvements, will be Recall.

Is our goal to avoid classifying a transaction as non-fraud when it
was a fraud? If it is, our metric to evaluate the model's
performance, and further improvements, will be Precision.

Hopefully this provides you a better insight.

edited Jan 3 at 13:08

answered Jan 3 at 2:35

Fran J

312

Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:

the context of the model you are trying to implement

the metrics you are using to evaluate the model's output

the objective you are pursuing

Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:

Is our goal to detect the highest amount of fraudulent transactions
as possible? If it is, our metric to evaluate the model's
performance, and further improvements, will be Recall.

Is our goal to avoid classifying a transaction as non-fraud when it
was a fraud? If it is, our metric to evaluate the model's
performance, and further improvements, will be Precision.

Hopefully this provides you a better insight.

edited Jan 3 at 13:08

answered Jan 3 at 2:35

Fran J

312

edited Jan 3 at 13:08

answered Jan 3 at 2:35

Fran J

312

answered Jan 3 at 2:35

Fran J

312

answered Jan 3 at 2:35

Fran J

312

add a comment |

answered Jan 2 at 16:47

Nuclear Wang

2,567820

add a comment |

answered Jan 2 at 16:47

Nuclear Wang

2,567820

add a comment |

answered Jan 2 at 16:47

Nuclear Wang

2,567820

answered Jan 2 at 16:47

Nuclear Wang

2,567820

answered Jan 2 at 16:47

Nuclear Wang

2,567820

answered Jan 2 at 16:47

Nuclear Wang

2,567820

answered Jan 2 at 16:47

Nuclear Wang

2,567820

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Cross Validated!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

搜尋此網誌

mjhjmtu