What is the precise definition of “performance” in machine learning?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP












5












$begingroup$


In machine learning, people usually refer to the "performance of a model" or "performance of an optimizer". What is the exact definition of "performance"? What would be the "performance of an optimizer"?



I know that there are ways of measuring how far away current predictions of machine learning models are from the expected ones: for example, you can measure the accuracy of a model, perplexity, etc. Is this what the performance of a ML model refers to? Is it a name to refer to any way of measuring the correctness of the predictions (or, in general, outputs) of the model? Or is performance actually the time it takes to perform the prediction? Or something else?










share|cite|improve this question









$endgroup$
















    5












    $begingroup$


    In machine learning, people usually refer to the "performance of a model" or "performance of an optimizer". What is the exact definition of "performance"? What would be the "performance of an optimizer"?



    I know that there are ways of measuring how far away current predictions of machine learning models are from the expected ones: for example, you can measure the accuracy of a model, perplexity, etc. Is this what the performance of a ML model refers to? Is it a name to refer to any way of measuring the correctness of the predictions (or, in general, outputs) of the model? Or is performance actually the time it takes to perform the prediction? Or something else?










    share|cite|improve this question









    $endgroup$














      5












      5








      5


      1



      $begingroup$


      In machine learning, people usually refer to the "performance of a model" or "performance of an optimizer". What is the exact definition of "performance"? What would be the "performance of an optimizer"?



      I know that there are ways of measuring how far away current predictions of machine learning models are from the expected ones: for example, you can measure the accuracy of a model, perplexity, etc. Is this what the performance of a ML model refers to? Is it a name to refer to any way of measuring the correctness of the predictions (or, in general, outputs) of the model? Or is performance actually the time it takes to perform the prediction? Or something else?










      share|cite|improve this question









      $endgroup$




      In machine learning, people usually refer to the "performance of a model" or "performance of an optimizer". What is the exact definition of "performance"? What would be the "performance of an optimizer"?



      I know that there are ways of measuring how far away current predictions of machine learning models are from the expected ones: for example, you can measure the accuracy of a model, perplexity, etc. Is this what the performance of a ML model refers to? Is it a name to refer to any way of measuring the correctness of the predictions (or, in general, outputs) of the model? Or is performance actually the time it takes to perform the prediction? Or something else?







      machine-learning terminology model-evaluation definition






      share|cite|improve this question













      share|cite|improve this question











      share|cite|improve this question




      share|cite|improve this question










      asked Jan 2 at 16:08









      nbronbro

      472619




      472619




















          3 Answers
          3






          active

          oldest

          votes


















          8












          $begingroup$

          In the absence of any specific clarifying context, "performance" is just a synonym for "quality."



          The sentence "I want a model that performs better" is essentially the same as the sentence "I want a higher-quality model." Readers understand that the speaker is not satisfied with how well the model solves some particular problem, but the reader does not know, precisely, what about the model is dissatisfactory. Does the model predict too many false positives? Or false negatives? Does it predict incorrect classes for images that have a tilted horizon, or are taken on cloudy days? Understanding what about the model needs improvement would require further, specific elaboration.



          Likewise, if someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim. One way to assesses performance of an optimizer is how many iterations it takes to reach some neighborhood around a minimum; another, which is particular to machine learning classifiers, is how well the solutions obtained by an optimizer generalize to out-of-sample data.






          share|cite|improve this answer











          $endgroup$












          • $begingroup$
            What about the performance of a "optimizer" (e.g. Adam)?
            $endgroup$
            – nbro
            Jan 2 at 16:30











          • $begingroup$
            In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
            $endgroup$
            – Sycorax
            Jan 2 at 16:34











          • $begingroup$
            @nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
            $endgroup$
            – Ray
            Jan 2 at 23:45










          • $begingroup$
            In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
            $endgroup$
            – Cliff AB
            Jan 3 at 0:36










          • $begingroup$
            @Ray Training samples or epochs/iterations to train (or both)?
            $endgroup$
            – nbro
            Jan 3 at 2:16


















          3












          $begingroup$

          Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:



          • the context of the model you are trying to implement

          • the metrics you are using to evaluate the model's output

          • the objective you are pursuing

          The use of one metric or another will depend whether you are trying to predict a continuous or a discrete variable. Some of these are: Accuracy, Recall, Precision, R2, F-Measure, Mean Square Error, etc.



          To make this clear, say for instance you are working on a credit card fraud machine learning algorithm, where you want to predict the number of fraud transactions. To evaluate how well the algorithm works: a) understand the context b) understand the metrics that are applicable to the problem



          a) We are dealing with a classification problem; a transaction can be fraud or not (target is a discrete variable). We will be most-likely facing a highly imbalanced dataset since most of the credit card transactions are non-fraudulent.



          b) Since it is a classification problem, we can discard all the metrics associated to continuous variables (for example, R2 and MSE). Moreover, due to the possible imbalance, we can also discard Accuracy metric. This leaves us with two metrics: Recall and Precision. We know by theory that these two metrics present a trade-off (if I increase one, the other decreases). Recall would help detect the major possible transactions that were fraud. Precision would help with avoiding misclassifying frauds.



          Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:



          • Is our goal to detect the highest amount of fraudulent transactions
            as possible? If it is, our metric to evaluate the model's
            performance, and further improvements, will be Recall.

          • Is our goal to avoid classifying a transaction as non-fraud when it
            was a fraud? If it is, our metric to evaluate the model's
            performance, and further improvements, will be Precision.

          Hopefully this provides you a better insight.






          share|cite|improve this answer











          $endgroup$




















            2












            $begingroup$

            As the other answer correctly points out, there is no universal definition or measurement of performance of a machine learning model. Rather, performance metrics are highly dependent on the domain and ultimate purpose of the model being built. Performance of an ML model is just "how good" it does at a particular task, but the definition of "good" can take many forms. A "good" model could be one that predicts well, one that trains quickly, one that finds a robust solution, or any combination of the above.



            For example, an algorithm used for a medical screening test should be highly sensitive - we want to catch all possible cases of a disease, at the cost of misdiagnosing some people who aren't actually sick. These individuals can go on for further tests that may optimize other metrics like positive predictive value, indicating that a positive test result is likely a result of actually having the disease. Depending on the purpose of the test, we may want to put more weight on true positives/negatives at the cost of errors on the other side.



            Performance can also be a function of the error measure used. Suppose your classifier outputs values on a continuous scale which are then thresholded for a binary classification. Do you care only if points are on the correct side of the boundary (accuracy measure)? Or do you care how badly you missed on the misclassified points (RMSE)? There is no universal best way to optimize performance.






            share|cite|improve this answer









            $endgroup$












              Your Answer





              StackExchange.ifUsing("editor", function ()
              return StackExchange.using("mathjaxEditing", function ()
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
              );
              );
              , "mathjax-editing");

              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "65"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: false,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              imageUploader:
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              ,
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );













              draft saved

              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f385312%2fwhat-is-the-precise-definition-of-performance-in-machine-learning%23new-answer', 'question_page');

              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              8












              $begingroup$

              In the absence of any specific clarifying context, "performance" is just a synonym for "quality."



              The sentence "I want a model that performs better" is essentially the same as the sentence "I want a higher-quality model." Readers understand that the speaker is not satisfied with how well the model solves some particular problem, but the reader does not know, precisely, what about the model is dissatisfactory. Does the model predict too many false positives? Or false negatives? Does it predict incorrect classes for images that have a tilted horizon, or are taken on cloudy days? Understanding what about the model needs improvement would require further, specific elaboration.



              Likewise, if someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim. One way to assesses performance of an optimizer is how many iterations it takes to reach some neighborhood around a minimum; another, which is particular to machine learning classifiers, is how well the solutions obtained by an optimizer generalize to out-of-sample data.






              share|cite|improve this answer











              $endgroup$












              • $begingroup$
                What about the performance of a "optimizer" (e.g. Adam)?
                $endgroup$
                – nbro
                Jan 2 at 16:30











              • $begingroup$
                In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
                $endgroup$
                – Sycorax
                Jan 2 at 16:34











              • $begingroup$
                @nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
                $endgroup$
                – Ray
                Jan 2 at 23:45










              • $begingroup$
                In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
                $endgroup$
                – Cliff AB
                Jan 3 at 0:36










              • $begingroup$
                @Ray Training samples or epochs/iterations to train (or both)?
                $endgroup$
                – nbro
                Jan 3 at 2:16















              8












              $begingroup$

              In the absence of any specific clarifying context, "performance" is just a synonym for "quality."



              The sentence "I want a model that performs better" is essentially the same as the sentence "I want a higher-quality model." Readers understand that the speaker is not satisfied with how well the model solves some particular problem, but the reader does not know, precisely, what about the model is dissatisfactory. Does the model predict too many false positives? Or false negatives? Does it predict incorrect classes for images that have a tilted horizon, or are taken on cloudy days? Understanding what about the model needs improvement would require further, specific elaboration.



              Likewise, if someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim. One way to assesses performance of an optimizer is how many iterations it takes to reach some neighborhood around a minimum; another, which is particular to machine learning classifiers, is how well the solutions obtained by an optimizer generalize to out-of-sample data.






              share|cite|improve this answer











              $endgroup$












              • $begingroup$
                What about the performance of a "optimizer" (e.g. Adam)?
                $endgroup$
                – nbro
                Jan 2 at 16:30











              • $begingroup$
                In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
                $endgroup$
                – Sycorax
                Jan 2 at 16:34











              • $begingroup$
                @nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
                $endgroup$
                – Ray
                Jan 2 at 23:45










              • $begingroup$
                In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
                $endgroup$
                – Cliff AB
                Jan 3 at 0:36










              • $begingroup$
                @Ray Training samples or epochs/iterations to train (or both)?
                $endgroup$
                – nbro
                Jan 3 at 2:16













              8












              8








              8





              $begingroup$

              In the absence of any specific clarifying context, "performance" is just a synonym for "quality."



              The sentence "I want a model that performs better" is essentially the same as the sentence "I want a higher-quality model." Readers understand that the speaker is not satisfied with how well the model solves some particular problem, but the reader does not know, precisely, what about the model is dissatisfactory. Does the model predict too many false positives? Or false negatives? Does it predict incorrect classes for images that have a tilted horizon, or are taken on cloudy days? Understanding what about the model needs improvement would require further, specific elaboration.



              Likewise, if someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim. One way to assesses performance of an optimizer is how many iterations it takes to reach some neighborhood around a minimum; another, which is particular to machine learning classifiers, is how well the solutions obtained by an optimizer generalize to out-of-sample data.






              share|cite|improve this answer











              $endgroup$



              In the absence of any specific clarifying context, "performance" is just a synonym for "quality."



              The sentence "I want a model that performs better" is essentially the same as the sentence "I want a higher-quality model." Readers understand that the speaker is not satisfied with how well the model solves some particular problem, but the reader does not know, precisely, what about the model is dissatisfactory. Does the model predict too many false positives? Or false negatives? Does it predict incorrect classes for images that have a tilted horizon, or are taken on cloudy days? Understanding what about the model needs improvement would require further, specific elaboration.



              Likewise, if someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim. One way to assesses performance of an optimizer is how many iterations it takes to reach some neighborhood around a minimum; another, which is particular to machine learning classifiers, is how well the solutions obtained by an optimizer generalize to out-of-sample data.







              share|cite|improve this answer














              share|cite|improve this answer



              share|cite|improve this answer








              edited Jan 2 at 16:41

























              answered Jan 2 at 16:14









              SycoraxSycorax

              39.3k1299198




              39.3k1299198











              • $begingroup$
                What about the performance of a "optimizer" (e.g. Adam)?
                $endgroup$
                – nbro
                Jan 2 at 16:30











              • $begingroup$
                In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
                $endgroup$
                – Sycorax
                Jan 2 at 16:34











              • $begingroup$
                @nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
                $endgroup$
                – Ray
                Jan 2 at 23:45










              • $begingroup$
                In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
                $endgroup$
                – Cliff AB
                Jan 3 at 0:36










              • $begingroup$
                @Ray Training samples or epochs/iterations to train (or both)?
                $endgroup$
                – nbro
                Jan 3 at 2:16
















              • $begingroup$
                What about the performance of a "optimizer" (e.g. Adam)?
                $endgroup$
                – nbro
                Jan 2 at 16:30











              • $begingroup$
                In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
                $endgroup$
                – Sycorax
                Jan 2 at 16:34











              • $begingroup$
                @nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
                $endgroup$
                – Ray
                Jan 2 at 23:45










              • $begingroup$
                In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
                $endgroup$
                – Cliff AB
                Jan 3 at 0:36










              • $begingroup$
                @Ray Training samples or epochs/iterations to train (or both)?
                $endgroup$
                – nbro
                Jan 3 at 2:16















              $begingroup$
              What about the performance of a "optimizer" (e.g. Adam)?
              $endgroup$
              – nbro
              Jan 2 at 16:30





              $begingroup$
              What about the performance of a "optimizer" (e.g. Adam)?
              $endgroup$
              – nbro
              Jan 2 at 16:30













              $begingroup$
              In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
              $endgroup$
              – Sycorax
              Jan 2 at 16:34





              $begingroup$
              In the absence of any specific clarifying context, "performance" just means quality. If someone says that Adam has better performance than another optimizer, they're making a claim that Adam does better at some task, which they would have to specify for it to be possible to assess the truthfulness of the claim.
              $endgroup$
              – Sycorax
              Jan 2 at 16:34













              $begingroup$
              @nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
              $endgroup$
              – Ray
              Jan 2 at 23:45




              $begingroup$
              @nbro For an optimizer, better "performance" usually means it needs fewer training samples than the baseline in order to reach the same level of accuracy/perplexity/whatever.
              $endgroup$
              – Ray
              Jan 2 at 23:45












              $begingroup$
              In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
              $endgroup$
              – Cliff AB
              Jan 3 at 0:36




              $begingroup$
              In other words, there is not a precise definition of performance. However, in most cases it is thought to be implied by the situation (lower out of sample error for a model, faster convergence for an algorithm).
              $endgroup$
              – Cliff AB
              Jan 3 at 0:36












              $begingroup$
              @Ray Training samples or epochs/iterations to train (or both)?
              $endgroup$
              – nbro
              Jan 3 at 2:16




              $begingroup$
              @Ray Training samples or epochs/iterations to train (or both)?
              $endgroup$
              – nbro
              Jan 3 at 2:16













              3












              $begingroup$

              Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:



              • the context of the model you are trying to implement

              • the metrics you are using to evaluate the model's output

              • the objective you are pursuing

              The use of one metric or another will depend whether you are trying to predict a continuous or a discrete variable. Some of these are: Accuracy, Recall, Precision, R2, F-Measure, Mean Square Error, etc.



              To make this clear, say for instance you are working on a credit card fraud machine learning algorithm, where you want to predict the number of fraud transactions. To evaluate how well the algorithm works: a) understand the context b) understand the metrics that are applicable to the problem



              a) We are dealing with a classification problem; a transaction can be fraud or not (target is a discrete variable). We will be most-likely facing a highly imbalanced dataset since most of the credit card transactions are non-fraudulent.



              b) Since it is a classification problem, we can discard all the metrics associated to continuous variables (for example, R2 and MSE). Moreover, due to the possible imbalance, we can also discard Accuracy metric. This leaves us with two metrics: Recall and Precision. We know by theory that these two metrics present a trade-off (if I increase one, the other decreases). Recall would help detect the major possible transactions that were fraud. Precision would help with avoiding misclassifying frauds.



              Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:



              • Is our goal to detect the highest amount of fraudulent transactions
                as possible? If it is, our metric to evaluate the model's
                performance, and further improvements, will be Recall.

              • Is our goal to avoid classifying a transaction as non-fraud when it
                was a fraud? If it is, our metric to evaluate the model's
                performance, and further improvements, will be Precision.

              Hopefully this provides you a better insight.






              share|cite|improve this answer











              $endgroup$

















                3












                $begingroup$

                Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:



                • the context of the model you are trying to implement

                • the metrics you are using to evaluate the model's output

                • the objective you are pursuing

                The use of one metric or another will depend whether you are trying to predict a continuous or a discrete variable. Some of these are: Accuracy, Recall, Precision, R2, F-Measure, Mean Square Error, etc.



                To make this clear, say for instance you are working on a credit card fraud machine learning algorithm, where you want to predict the number of fraud transactions. To evaluate how well the algorithm works: a) understand the context b) understand the metrics that are applicable to the problem



                a) We are dealing with a classification problem; a transaction can be fraud or not (target is a discrete variable). We will be most-likely facing a highly imbalanced dataset since most of the credit card transactions are non-fraudulent.



                b) Since it is a classification problem, we can discard all the metrics associated to continuous variables (for example, R2 and MSE). Moreover, due to the possible imbalance, we can also discard Accuracy metric. This leaves us with two metrics: Recall and Precision. We know by theory that these two metrics present a trade-off (if I increase one, the other decreases). Recall would help detect the major possible transactions that were fraud. Precision would help with avoiding misclassifying frauds.



                Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:



                • Is our goal to detect the highest amount of fraudulent transactions
                  as possible? If it is, our metric to evaluate the model's
                  performance, and further improvements, will be Recall.

                • Is our goal to avoid classifying a transaction as non-fraud when it
                  was a fraud? If it is, our metric to evaluate the model's
                  performance, and further improvements, will be Precision.

                Hopefully this provides you a better insight.






                share|cite|improve this answer











                $endgroup$















                  3












                  3








                  3





                  $begingroup$

                  Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:



                  • the context of the model you are trying to implement

                  • the metrics you are using to evaluate the model's output

                  • the objective you are pursuing

                  The use of one metric or another will depend whether you are trying to predict a continuous or a discrete variable. Some of these are: Accuracy, Recall, Precision, R2, F-Measure, Mean Square Error, etc.



                  To make this clear, say for instance you are working on a credit card fraud machine learning algorithm, where you want to predict the number of fraud transactions. To evaluate how well the algorithm works: a) understand the context b) understand the metrics that are applicable to the problem



                  a) We are dealing with a classification problem; a transaction can be fraud or not (target is a discrete variable). We will be most-likely facing a highly imbalanced dataset since most of the credit card transactions are non-fraudulent.



                  b) Since it is a classification problem, we can discard all the metrics associated to continuous variables (for example, R2 and MSE). Moreover, due to the possible imbalance, we can also discard Accuracy metric. This leaves us with two metrics: Recall and Precision. We know by theory that these two metrics present a trade-off (if I increase one, the other decreases). Recall would help detect the major possible transactions that were fraud. Precision would help with avoiding misclassifying frauds.



                  Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:



                  • Is our goal to detect the highest amount of fraudulent transactions
                    as possible? If it is, our metric to evaluate the model's
                    performance, and further improvements, will be Recall.

                  • Is our goal to avoid classifying a transaction as non-fraud when it
                    was a fraud? If it is, our metric to evaluate the model's
                    performance, and further improvements, will be Precision.

                  Hopefully this provides you a better insight.






                  share|cite|improve this answer











                  $endgroup$



                  Performance does not englobe a formal definition, it is mostly dependent on, but not limited to:



                  • the context of the model you are trying to implement

                  • the metrics you are using to evaluate the model's output

                  • the objective you are pursuing

                  The use of one metric or another will depend whether you are trying to predict a continuous or a discrete variable. Some of these are: Accuracy, Recall, Precision, R2, F-Measure, Mean Square Error, etc.



                  To make this clear, say for instance you are working on a credit card fraud machine learning algorithm, where you want to predict the number of fraud transactions. To evaluate how well the algorithm works: a) understand the context b) understand the metrics that are applicable to the problem



                  a) We are dealing with a classification problem; a transaction can be fraud or not (target is a discrete variable). We will be most-likely facing a highly imbalanced dataset since most of the credit card transactions are non-fraudulent.



                  b) Since it is a classification problem, we can discard all the metrics associated to continuous variables (for example, R2 and MSE). Moreover, due to the possible imbalance, we can also discard Accuracy metric. This leaves us with two metrics: Recall and Precision. We know by theory that these two metrics present a trade-off (if I increase one, the other decreases). Recall would help detect the major possible transactions that were fraud. Precision would help with avoiding misclassifying frauds.



                  Concluding, how well our it works, after considering the previous things, will ultimately depend on what is our objective:



                  • Is our goal to detect the highest amount of fraudulent transactions
                    as possible? If it is, our metric to evaluate the model's
                    performance, and further improvements, will be Recall.

                  • Is our goal to avoid classifying a transaction as non-fraud when it
                    was a fraud? If it is, our metric to evaluate the model's
                    performance, and further improvements, will be Precision.

                  Hopefully this provides you a better insight.







                  share|cite|improve this answer














                  share|cite|improve this answer



                  share|cite|improve this answer








                  edited Jan 3 at 13:08

























                  answered Jan 3 at 2:35









                  Fran JFran J

                  312




                  312





















                      2












                      $begingroup$

                      As the other answer correctly points out, there is no universal definition or measurement of performance of a machine learning model. Rather, performance metrics are highly dependent on the domain and ultimate purpose of the model being built. Performance of an ML model is just "how good" it does at a particular task, but the definition of "good" can take many forms. A "good" model could be one that predicts well, one that trains quickly, one that finds a robust solution, or any combination of the above.



                      For example, an algorithm used for a medical screening test should be highly sensitive - we want to catch all possible cases of a disease, at the cost of misdiagnosing some people who aren't actually sick. These individuals can go on for further tests that may optimize other metrics like positive predictive value, indicating that a positive test result is likely a result of actually having the disease. Depending on the purpose of the test, we may want to put more weight on true positives/negatives at the cost of errors on the other side.



                      Performance can also be a function of the error measure used. Suppose your classifier outputs values on a continuous scale which are then thresholded for a binary classification. Do you care only if points are on the correct side of the boundary (accuracy measure)? Or do you care how badly you missed on the misclassified points (RMSE)? There is no universal best way to optimize performance.






                      share|cite|improve this answer









                      $endgroup$

















                        2












                        $begingroup$

                        As the other answer correctly points out, there is no universal definition or measurement of performance of a machine learning model. Rather, performance metrics are highly dependent on the domain and ultimate purpose of the model being built. Performance of an ML model is just "how good" it does at a particular task, but the definition of "good" can take many forms. A "good" model could be one that predicts well, one that trains quickly, one that finds a robust solution, or any combination of the above.



                        For example, an algorithm used for a medical screening test should be highly sensitive - we want to catch all possible cases of a disease, at the cost of misdiagnosing some people who aren't actually sick. These individuals can go on for further tests that may optimize other metrics like positive predictive value, indicating that a positive test result is likely a result of actually having the disease. Depending on the purpose of the test, we may want to put more weight on true positives/negatives at the cost of errors on the other side.



                        Performance can also be a function of the error measure used. Suppose your classifier outputs values on a continuous scale which are then thresholded for a binary classification. Do you care only if points are on the correct side of the boundary (accuracy measure)? Or do you care how badly you missed on the misclassified points (RMSE)? There is no universal best way to optimize performance.






                        share|cite|improve this answer









                        $endgroup$















                          2












                          2








                          2





                          $begingroup$

                          As the other answer correctly points out, there is no universal definition or measurement of performance of a machine learning model. Rather, performance metrics are highly dependent on the domain and ultimate purpose of the model being built. Performance of an ML model is just "how good" it does at a particular task, but the definition of "good" can take many forms. A "good" model could be one that predicts well, one that trains quickly, one that finds a robust solution, or any combination of the above.



                          For example, an algorithm used for a medical screening test should be highly sensitive - we want to catch all possible cases of a disease, at the cost of misdiagnosing some people who aren't actually sick. These individuals can go on for further tests that may optimize other metrics like positive predictive value, indicating that a positive test result is likely a result of actually having the disease. Depending on the purpose of the test, we may want to put more weight on true positives/negatives at the cost of errors on the other side.



                          Performance can also be a function of the error measure used. Suppose your classifier outputs values on a continuous scale which are then thresholded for a binary classification. Do you care only if points are on the correct side of the boundary (accuracy measure)? Or do you care how badly you missed on the misclassified points (RMSE)? There is no universal best way to optimize performance.






                          share|cite|improve this answer









                          $endgroup$



                          As the other answer correctly points out, there is no universal definition or measurement of performance of a machine learning model. Rather, performance metrics are highly dependent on the domain and ultimate purpose of the model being built. Performance of an ML model is just "how good" it does at a particular task, but the definition of "good" can take many forms. A "good" model could be one that predicts well, one that trains quickly, one that finds a robust solution, or any combination of the above.



                          For example, an algorithm used for a medical screening test should be highly sensitive - we want to catch all possible cases of a disease, at the cost of misdiagnosing some people who aren't actually sick. These individuals can go on for further tests that may optimize other metrics like positive predictive value, indicating that a positive test result is likely a result of actually having the disease. Depending on the purpose of the test, we may want to put more weight on true positives/negatives at the cost of errors on the other side.



                          Performance can also be a function of the error measure used. Suppose your classifier outputs values on a continuous scale which are then thresholded for a binary classification. Do you care only if points are on the correct side of the boundary (accuracy measure)? Or do you care how badly you missed on the misclassified points (RMSE)? There is no universal best way to optimize performance.







                          share|cite|improve this answer












                          share|cite|improve this answer



                          share|cite|improve this answer










                          answered Jan 2 at 16:47









                          Nuclear WangNuclear Wang

                          2,567820




                          2,567820



























                              draft saved

                              draft discarded
















































                              Thanks for contributing an answer to Cross Validated!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid


                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.

                              Use MathJax to format equations. MathJax reference.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function ()
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f385312%2fwhat-is-the-precise-definition-of-performance-in-machine-learning%23new-answer', 'question_page');

                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown






                              Popular posts from this blog

                              How to check contact read email or not when send email to Individual?

                              Displaying single band from multi-band raster using QGIS

                              How many registers does an x86_64 CPU actually have?