Avoiding the overhead of C# virtual calls

I have a few heavily optimized math functions that take 1-2 nanoseconds to complete. These functions are called hundreds of millions of times per second, so call overhead is a concern, despite the already-excellent performance.



To keep the program maintainable, the classes that provide these methods implement an IMathFunction interface, so that other objects can store a specific math function directly and use it when needed.



```csharp
public interface IMathFunction
{
    double Calculate(double input);
    double Derivate(double input);
}

public class SomeObject
{
    // Note: There are cases where this is mutable
    private readonly IMathFunction mathFunction_;

    public SomeObject(IMathFunction mathFunction)
    {
        mathFunction_ = mathFunction;
    }

    public double SomeWork(double input, double step)
    {
        // Two interface calls per invocation.
        var f = mathFunction_.Calculate(input);
        var dv = mathFunction_.Derivate(input);
        return f - (dv * step);
    }
}
```




Because of how the consuming code uses it, this interface causes enormous overhead compared to a direct call: a direct call takes 1-2 ns, whereas the interface call takes 8-9 ns. Evidently, the interface and the virtual dispatch it implies are the bottleneck in this scenario.



I would like to retain both maintainability and performance if possible. Is there a way I can resolve the virtual function to a direct call when the object is instantiated so that all subsequent calls are able to avoid the overhead? I assume this would involve creating delegates with IL, but I wouldn't know where to start with that.

c# virtual-functions micro-optimization

edited Dec 14 at 19:46
asked Dec 14 at 19:35 by Haus

  • How do you measure the nanosecond timing?
    – z3nth10n
    Dec 14 at 19:38

  • @z3nth10n BenchmarkDotNet. It warms up and pre-JITs everything before benchmarking. This is in Release mode. Profiling with dotTrace also shows similar results.
    – Haus
    Dec 14 at 19:39

  • Have you looked at the generated IL? I'm not an expert in this regard, but my understanding is that both virtual and non-virtual calls in C# and VB use the callvirt instruction (in order to fail properly when the object instance associated with the call is null). Your question surprised me.
    – Flydog57
    Dec 14 at 19:46

  • Is there any feasible way you can offload the core of the math functions to C or C++? That way you can have a struct with a function pointer in it, which you'd only have to fill once, and you'd never see a vtable lookup (well, in C; not sure about C++). When we're talking nanosecond-level timings, it could be worth it.
    – Chris Akridge
    Dec 14 at 19:47

  • Cory Nelson's solution is good and that is what I would pursue. It might be worthwhile, however, to do a quick check of the performance of saving the function into a delegate and then invoking the delegate. That also has some indirection, but it might be slightly smaller than the interface indirection.
    – Eric Lippert
    Dec 14 at 19:49

2 Answers

This has obvious limitations and should not be used everywhere you have an interface, but where performance really needs to be maximized you can use generics:



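```csharp
public class SomeObject<TMathFunction> where TMathFunction : struct, IMathFunction
{
    // For a stateless struct, the field's default value is already a usable instance.
    private readonly TMathFunction mathFunction_;

    public double SomeWork(double input, double step)
    {
        // TMathFunction is a value type, so these calls are bound to the struct's
        // own methods in the specialized code and can be inlined.
        var f = mathFunction_.Calculate(input);
        var dv = mathFunction_.Derivate(input);
        return f - (dv * step);
    }
}
```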



Instead of passing an interface, pass your implementation as the type argument TMathFunction. This avoids the vtable lookups that come with the interface and also allows inlining.



Using a struct is important here: for reference-type arguments, the runtime shares a single compiled body across all instantiations, so the calls would still go through the interface.
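
The same pattern works at method level as well. The sketch below is an illustration under assumptions, not code from this answer: it restates the question's IMathFunction so it compiles on its own, and SomeImplementationByVal (a struct counterpart of the test class shown below), GenericDispatch and SomeWorkGeneric are hypothetical names. Because the type argument is a value type, the JIT should compile a dedicated body of SomeWorkGeneric for each struct it is used with, which is what lets the calls be bound directly and inlined.

```csharp
using System;

public interface IMathFunction
{
    double Calculate(double input);
    double Derivate(double input);
}

// Hypothetical struct implementation with the same math as SomeImplementationByRef.
public struct SomeImplementationByVal : IMathFunction
{
    public double Calculate(double input) => input + input;
    public double Derivate(double input) => input * input;
}

public static class GenericDispatch
{
    // One specialized body per struct type argument; the calls on f are constrained
    // calls that resolve to the struct's own methods rather than interface dispatch.
    public static double SomeWorkGeneric<TMathFunction>(TMathFunction f, double input, double step)
        where TMathFunction : struct, IMathFunction
    {
        var value = f.Calculate(input);
        var dv = f.Derivate(input);
        return value - (dv * step);
    }
}
```

For example, GenericDispatch.SomeWorkGeneric(new SomeImplementationByVal(), 3.0, 0.5) returns 1.5 (6 - 9 * 0.5); the type argument is inferred from the struct passed in.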



For testing, I made a simple implementation of IMathFunction:



```csharp
class SomeImplementationByRef : IMathFunction
{
    public double Calculate(double input)
    {
        return input + input;
    }

    public double Derivate(double input)
    {
        return input * input;
    }
}
```


I also made a struct version and an abstract-class version.
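
The abstract-class version is not shown in the answer. A plausible shape for it, using the hypothetical names MathFunctionBase, SomeImplementationAbstract and SomeObjectAbstract, is sketched below; it is a guess at the setup behind the second disassembly listing, in which each call goes through the vtable of the object's runtime type.

```csharp
using System;

// Hypothetical abstract base standing in for the interface.
public abstract class MathFunctionBase
{
    public abstract double Calculate(double input);
    public abstract double Derivate(double input);
}

// Same math as SomeImplementationByRef, but deriving from the abstract base.
public class SomeImplementationAbstract : MathFunctionBase
{
    public override double Calculate(double input) => input + input;
    public override double Derivate(double input) => input * input;
}

public class SomeObjectAbstract
{
    private readonly MathFunctionBase mathFunction_;

    public SomeObjectAbstract(MathFunctionBase mathFunction) => mathFunction_ = mathFunction;

    // The field's static type is the abstract base, so each call is a virtual call
    // dispatched through the vtable unless the JIT can prove the exact type.
    public double SomeWork(double input, double step)
    {
        var f = mathFunction_.Calculate(input);
        var dv = mathFunction_.Derivate(input);
        return f - (dv * step);
    }
}
```

Usage mirrors the interface version: new SomeObjectAbstract(new SomeImplementationAbstract()).SomeWork(3.0, 0.5) returns 1.5.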



So, here's what happens with the interface version. You can see it is relatively inefficient because it performs two levels of indirection:



```asm
 return obj.SomeWork(input, step);
sub esp,40h
vzeroupper
vmovaps xmmword ptr [rsp+30h],xmm6
vmovaps xmmword ptr [rsp+20h],xmm7
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov r11,7FFED7980020h ; load vtable address of the IMathFunction.Calculate function.
cmp dword ptr [rcx],ecx
call qword ptr [r11] ; call IMathFunction.Calculate function which will call the actual Calculate via vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov r11,7FFED7980028h ; load vtable address of the IMathFunction.Derivate function.
cmp dword ptr [rcx],ecx
call qword ptr [r11] ; call IMathFunction.Derivate function which will call the actual Derivate via vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7
vmovaps xmm6,xmmword ptr [rsp+30h]
vmovaps xmm7,xmmword ptr [rsp+20h]
add rsp,40h
pop rsi
ret
```


Here's the abstract-class version. It's a little more efficient, but only negligibly:



```asm
 return obj.SomeWork(input, step);
sub esp,40h
vzeroupper
vmovaps xmmword ptr [rsp+30h],xmm6
vmovaps xmmword ptr [rsp+20h],xmm7
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
call qword ptr [rax+20h] ; call Calculate via offset 0x20 of vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
call qword ptr [rax+28h] ; call Derivate via offset 0x28 of vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7
vmovaps xmm6,xmmword ptr [rsp+30h]
vmovaps xmm7,xmmword ptr [rsp+20h]
add rsp,40h
pop rsi
ret
```


So both the interface and the abstract class rely heavily on branch target prediction for acceptable performance. Even then, there's quite a lot more going on, so the best case is still relatively slow, while the worst case is a stalled pipeline due to a mispredict.



And finally here's the generic version with a struct. You can see it's massively more efficient because everything has been fully inlined so there's no branch prediction involved. It also has the nice side effect of removing most of the stack/parameter management that was in there too, so the code becomes very compact:



```asm
 return obj.SomeWork(input, step);
push rax
vzeroupper
movsx rax,byte ptr [rcx+8]
vmovaps xmm0,xmm1
vaddsd xmm0,xmm0,xmm1 ; Calculate - got inlined
vmulsd xmm1,xmm1,xmm1 ; Derivate - got inlined
vmulsd xmm1,xmm1,xmm2 ; dv * step
vsubsd xmm0,xmm0,xmm1 ; f - (dv * step)
add rsp,8
ret
```



  • This is a very clever solution! Thanks for the post. It absolutely works for classes where mathFunction_ is readonly, but I also have cases where it is a mutable field. Nevertheless, I will experiment with this.
    – Haus
    Dec 14 at 19:45

  • What is that trailing underscore there for?
    – Robert Harvey
    Dec 14 at 22:16

  • @RobertHarvey it's part of the field's name
    – Cory Nelson
    Dec 14 at 22:20

  • @RobertHarvey I assume it is a convention denoting a private field. It is part of the question's code.
    – Cory Nelson
    Dec 14 at 22:26

  • I don't think this will actually do anything to optimize it. From what I understand of CLR internals (and, admittedly, it may have changed in the last 3 or so years), the CLR will only generate one version of the generic class internally, i.e. it will generate an IMathFunction call rather than a direct call. If TMathFunction is a struct, it will generate separate code for each type of TMathFunction. The only way I could think to optimize it all the way would be to use an abstract base class (because virtual calls are faster than interface calls) or to require where T: struct, IMathFunction.
    – Colorfully Monochrome
    Dec 14 at 23:28


I would assign the methods to delegates. This lets you still program against the interface, while the interface methods are resolved only once, when the delegates are created.



```csharp
using System;

public class SomeObject
{
    private readonly Func<double, double> _calculate;
    private readonly Func<double, double> _derivate;

    public SomeObject(IMathFunction mathFunction)
    {
        // The interface methods are resolved here, once, when the delegates are created.
        _calculate = mathFunction.Calculate;
        _derivate = mathFunction.Derivate;
    }

    public double SomeWork(double input, double step)
    {
        var f = _calculate(input);
        var dv = _derivate(input);
        return f - (dv * step);
    }
}
```





In response to @CoryNelson's comment, I ran tests to see what the impact really is. I also sealed the function class, but this seems to make no difference at all, since my methods are not virtual.



Test results (mean time per call in ns over 100 million iterations; the value with the empty-method time subtracted is in parentheses):

Empty Work method: 1.48
Interface: 5.69 (4.21)
Delegates: 5.78 (4.30)
Sealed Class: 2.10 (0.62)
Class: 2.12 (0.64)




The delegate version takes about the same time as the interface version (the exact times vary from run to run), while working against the class directly is about 6.8 times faster (comparing times with the empty-method time subtracted). This means that my suggestion to use delegates was not helpful!



What surprised me was that the interface version did not take much longer; I had expected a far longer execution time. Since this test does not reproduce the exact context of the OP's code, its validity is limited.



```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class TimingInterfaceVsDelegateCalls
{
    const int N = 100_000_000;
    const double msToNs = 1e6 / N; // converts total ms into mean ns per iteration

    static SquareFunctionSealed _mathFunctionClassSealed;
    static SquareFunction _mathFunctionClass;
    static IMathFunction _mathFunctionInterface;
    static Func<double, double> _calculate;
    static Func<double, double> _derivate;

    static TimingInterfaceVsDelegateCalls()
    {
        _mathFunctionClass = new SquareFunction();
        _mathFunctionClassSealed = new SquareFunctionSealed();
        _mathFunctionInterface = _mathFunctionClassSealed;
        _calculate = _mathFunctionInterface.Calculate;
        _derivate = _mathFunctionInterface.Derivate;
    }

    interface IMathFunction
    {
        double Calculate(double input);
        double Derivate(double input);
    }

    sealed class SquareFunctionSealed : IMathFunction
    {
        public double Calculate(double input)
        {
            return input * input;
        }

        public double Derivate(double input)
        {
            return 2 * input;
        }
    }

    class SquareFunction : IMathFunction
    {
        public double Calculate(double input)
        {
            return input * input;
        }

        public double Derivate(double input)
        {
            return 2 * input;
        }
    }

    public static void Test()
    {
        var stopWatch = new Stopwatch();

        // Baseline: loop overhead plus a non-inlined call that does no work.
        stopWatch.Start();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkEmpty(i);
        }
        stopWatch.Stop();
        double emptyTime = stopWatch.ElapsedMilliseconds * msToNs;
        Console.WriteLine($"Empty Work method: {emptyTime:n2}");

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkInterface(i);
        }
        stopWatch.Stop();
        PrintResult("Interface", stopWatch.ElapsedMilliseconds, emptyTime);

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkDelegate(i);
        }
        stopWatch.Stop();
        PrintResult("Delegates", stopWatch.ElapsedMilliseconds, emptyTime);

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkClassSealed(i);
        }
        stopWatch.Stop();
        PrintResult("Sealed Class", stopWatch.ElapsedMilliseconds, emptyTime);

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkClass(i);
        }
        stopWatch.Stop();
        PrintResult("Class", stopWatch.ElapsedMilliseconds, emptyTime);
    }

    private static void PrintResult(string text, long elapsed, double emptyTime)
    {
        // Mean ns per call, then the same value minus the empty-method baseline.
        Console.WriteLine($"{text}: {elapsed * msToNs:n2} ({elapsed * msToNs - emptyTime:n2})");
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkEmpty(int i)
    {
        return 0.0;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkInterface(int i)
    {
        double f = _mathFunctionInterface.Calculate(i);
        double dv = _mathFunctionInterface.Derivate(i);
        return f - (dv * 12.34534);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkDelegate(int i)
    {
        double f = _calculate(i);
        double dv = _derivate(i);
        return f - (dv * 12.34534);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkClassSealed(int i)
    {
        double f = _mathFunctionClassSealed.Calculate(i);
        double dv = _mathFunctionClassSealed.Derivate(i);
        return f - (dv * 12.34534);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkClass(int i)
    {
        double f = _mathFunctionClass.Calculate(i);
        double dv = _mathFunctionClass.Derivate(i);
        return f - (dv * 12.34534);
    }
}
```



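The purpose of [MethodImpl(MethodImplOptions.NoInlining)] is to keep the JIT from inlining the SomeWork* methods into the loops; if they were inlined, it could resolve the call targets once, before the loop, and the timings would no longer reflect the cost of each call.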
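
The benchmark above does not include the generic struct approach from the other answer. A self-contained sketch of that case in the same Stopwatch style might look like this; SquareFunctionStruct, TimingGenericStruct, SomeWorkGenericStruct and MeasureNs are hypothetical names, and the figure it produces is only as trustworthy as the Stopwatch method above (BenchmarkDotNet, which the question's author uses, would give more reliable numbers).

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public interface IMathFunction
{
    double Calculate(double input);
    double Derivate(double input);
}

// Hypothetical struct with the same math as SquareFunction in the harness above.
public struct SquareFunctionStruct : IMathFunction
{
    public double Calculate(double input) => input * input;
    public double Derivate(double input) => 2 * input;
}

public static class TimingGenericStruct
{
    // Same shape as the other SomeWork* methods. NoInlining keeps the call itself,
    // but the JIT still specializes this body for the struct and can inline the math.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static double SomeWorkGenericStruct<TMathFunction>(TMathFunction f, int i)
        where TMathFunction : struct, IMathFunction
    {
        double value = f.Calculate(i);
        double dv = f.Derivate(i);
        return value - (dv * 12.34534);
    }

    // Mean time per call in ns over the given number of iterations.
    public static double MeasureNs(int iterations)
    {
        var f = new SquareFunctionStruct();
        var stopWatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            double result = SomeWorkGenericStruct(f, i);
        }
        stopWatch.Stop();
        return stopWatch.Elapsed.TotalMilliseconds * 1e6 / iterations;
    }
}
```

To compare it with the other rows, one could print Console.WriteLine($"Generic struct: {TimingGenericStruct.MeasureNs(100_000_000):n2}") and subtract the empty-method time as before.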


  • Does this have any meaningful effect? At first glance, it appears you're substituting the interface vtable for a custom-made one.
    – Cory Nelson
    Dec 14 at 20:41

  • Sorry, my previous version was run in Debug mode. This final version was run in Release mode.
    – Olivier Jacot-Descombes
    Dec 14 at 21:49

  • @CoryNelson: In response to your comments, I have done some tests. It indeed has no noticeable effect.
    – Olivier Jacot-Descombes
    Dec 14 at 22:16










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53785910%2favoiding-the-overhead-of-c-sharp-virtual-calls%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









34














So this has obvious limitations and should not be used all the time anywhere you have an interface, but if you have a place where perf really needs to be maximized you can use generics:



public SomeObject<TMathFunction> where TMathFunction: struct, IMathFunction 

private readonly TMathFunction mathFunction_;

public double SomeWork(double input, double step)

var f = mathFunction_.Calculate(input);
var dv = mathFunction_.Derivate(input);
return f - (dv * step);




And instead of passing an interface, pass your implementation as TMathFunction. This will avoid vtable lookups due to an interface and also allow inlining.



Note the use of struct is important here, as generics will otherwise access the class via the interface.



Some implementation:



I made a simple implementation of IMathFunction for testing:



class SomeImplementationByRef : IMathFunction

public double Calculate(double input)

return input + input;


public double Derivate(double input)

return input * input;




... as well as a struct version and an abstract version.



So, here's what happens with the interface version. You can see it is relatively inefficient because it performs two levels of indirection:



 return obj.SomeWork(input, step);
sub esp,40h
vzeroupper
vmovaps xmmword ptr [rsp+30h],xmm6
vmovaps xmmword ptr [rsp+20h],xmm7
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov r11,7FFED7980020h ; load vtable address of the IMathFunction.Calculate function.
cmp dword ptr [rcx],ecx
call qword ptr [r11] ; call IMathFunction.Calculate function which will call the actual Calculate via vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov r11,7FFED7980028h ; load vtable address of the IMathFunction.Derivate function.
cmp dword ptr [rcx],ecx
call qword ptr [r11] ; call IMathFunction.Derivate function which will call the actual Derivate via vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7
vmovaps xmm6,xmmword ptr [rsp+30h]
vmovaps xmm7,xmmword ptr [rsp+20h]
add rsp,40h
pop rsi
ret


Here's an abstract class. It's a little more efficient but only negligibly:



 return obj.SomeWork(input, step);
sub esp,40h
vzeroupper
vmovaps xmmword ptr [rsp+30h],xmm6
vmovaps xmmword ptr [rsp+20h],xmm7
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
call qword ptr [rax+20h] ; call Calculate via offset 0x20 of vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
call qword ptr [rax+28h] ; call Derivate via offset 0x28 of vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7
vmovaps xmm6,xmmword ptr [rsp+30h]
vmovaps xmm7,xmmword ptr [rsp+20h]
add rsp,40h
pop rsi
ret


So both an interface and an abstract class rely heavily on branch target prediction to have acceptable performance. Even then, you can see there's quite a lot more going into it, so the best-case is still relatively slow while the worst-case is a stalled pipeline due to a mispredict.



And finally here's the generic version with a struct. You can see it's massively more efficient because everything has been fully inlined so there's no branch prediction involved. It also has the nice side effect of removing most of the stack/parameter management that was in there too, so the code becomes very compact:



 return obj.SomeWork(input, step);
push rax
vzeroupper
movsx rax,byte ptr [rcx+8]
vmovaps xmm0,xmm1
vaddsd xmm0,xmm0,xmm1 ; Calculate - got inlined
vmulsd xmm1,xmm1,xmm1 ; Derivate - got inlined
vmulsd xmm1,xmm1,xmm2 ; dv * step
vsubsd xmm0,xmm0,xmm1 ; f -
add rsp,8
ret





share|improve this answer






















  • This is a very clever solution! Thanks for the post. It absolutely works for classes where mathFunction_ is readonly, but I also have cases where it is a mutable field. Nevertheless, I will experiment with this.
    – Haus
    Dec 14 at 19:45










  • What is that trailing underscore there for?
    – Robert Harvey
    Dec 14 at 22:16










  • @RobertHarvey it's part of the field's name
    – Cory Nelson
    Dec 14 at 22:20






  • 3




    @RobertHarvey I assume it is a convention denoting a private field. It is part of the question's code.
    – Cory Nelson
    Dec 14 at 22:26






  • 2




    I don't think this will actually do anything to optimize it. From what I understand of CLR internals (and admittedly, it may have changed in the last 3 or so years), the CLR will only generate one version of the generic class internally, i.e., it will generate an IMathFunction call rather than a direct call. If TMathFunction is a struct, it will generate separate code for each type of TMathFunction. The only way I could think to optimize it all the way would be to use an abstract base class (because virtual calls are faster than interface calls) or require where T: struct, IMathFunction.
    – Colorfully Monochrome
    Dec 14 at 23:28
























edited Dec 19 at 18:24

























answered Dec 14 at 19:40









Cory Nelson


























9














I would assign the methods to delegates. This lets you still program against the interface while avoiding the interface method resolution.



using System;

public class SomeObject
{
    private readonly Func<double, double> _calculate;
    private readonly Func<double, double> _derivate;

    public SomeObject(IMathFunction mathFunction)
    {
        // Bind the interface methods to delegates once, at construction time.
        _calculate = mathFunction.Calculate;
        _derivate = mathFunction.Derivate;
    }

    public double SomeWork(double input, double step)
    {
        var f = _calculate(input);
        var dv = _derivate(input);
        return f - (dv * step);
    }
}




In response to @CoryNelson's comment, I ran some tests to see what the impact really is. I also sealed the function class, but this seems to make no difference at all, since my methods are not virtual.



Test results (mean time per call in ns over 100 million iterations), with the empty-method time subtracted shown in parentheses:




Empty Work method: 1.48

Interface: 5.69 (4.21)

Delegates: 5.78 (4.30)

Sealed Class: 2.10 (0.62)

Class: 2.12 (0.64)




The delegate version takes about the same time as the interface version (the exact times vary from run to run), while calling through the class is about 6.8x faster (comparing the times with the empty-method time subtracted). This means that my suggestion to use delegates does not help!
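For reference, the 6.8x figure is the ratio of the baseline-corrected times from the results list, taking the sealed class as the comparison; against the unsealed class the same ratio comes out slightly lower:

$$
\frac{t_\text{interface} - t_\text{empty}}{t_\text{sealed class} - t_\text{empty}} = \frac{5.69 - 1.48}{2.10 - 1.48} = \frac{4.21}{0.62} \approx 6.8,
\qquad
\frac{t_\text{interface} - t_\text{empty}}{t_\text{class} - t_\text{empty}} = \frac{4.21}{0.64} \approx 6.6
$$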



What surprised me is that the interface version was not even slower; I had expected a much longer execution time for it. Since this kind of test does not reproduce the exact context of the OP's code, its validity is limited.



using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class TimingInterfaceVsDelegateCalls
{
    const int N = 100_000_000;
    const double msToNs = 1e6 / N; // total elapsed ms -> mean ns per iteration

    static SquareFunctionSealed _mathFunctionClassSealed;
    static SquareFunction _mathFunctionClass;
    static IMathFunction _mathFunctionInterface;
    static Func<double, double> _calculate;
    static Func<double, double> _derivate;

    static TimingInterfaceVsDelegateCalls()
    {
        _mathFunctionClass = new SquareFunction();
        _mathFunctionClassSealed = new SquareFunctionSealed();
        _mathFunctionInterface = _mathFunctionClassSealed;
        _calculate = _mathFunctionInterface.Calculate;
        _derivate = _mathFunctionInterface.Derivate;
    }

    interface IMathFunction
    {
        double Calculate(double input);
        double Derivate(double input);
    }

    sealed class SquareFunctionSealed : IMathFunction
    {
        public double Calculate(double input)
        {
            return input * input;
        }

        public double Derivate(double input)
        {
            return 2 * input;
        }
    }

    class SquareFunction : IMathFunction
    {
        public double Calculate(double input)
        {
            return input * input;
        }

        public double Derivate(double input)
        {
            return 2 * input;
        }
    }

    public static void Test()
    {
        var stopWatch = new Stopwatch();

        // Baseline: the loop plus a non-inlined call that does no work.
        stopWatch.Start();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkEmpty(i);
        }
        stopWatch.Stop();
        double emptyTime = stopWatch.ElapsedMilliseconds * msToNs;
        Console.WriteLine($"Empty Work method: {emptyTime:n2}");

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkInterface(i);
        }
        stopWatch.Stop();
        PrintResult("Interface", stopWatch.ElapsedMilliseconds, emptyTime);

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkDelegate(i);
        }
        stopWatch.Stop();
        PrintResult("Delegates", stopWatch.ElapsedMilliseconds, emptyTime);

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkClassSealed(i);
        }
        stopWatch.Stop();
        PrintResult("Sealed Class", stopWatch.ElapsedMilliseconds, emptyTime);

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkClass(i);
        }
        stopWatch.Stop();
        PrintResult("Class", stopWatch.ElapsedMilliseconds, emptyTime);
    }

    private static void PrintResult(string text, long elapsed, double emptyTime)
    {
        // Mean time per call, then (in parentheses) the same time minus the baseline.
        Console.WriteLine($"{text}: {elapsed * msToNs:n2} ({elapsed * msToNs - emptyTime:n2})");
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkEmpty(int i)
    {
        return 0.0;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkInterface(int i)
    {
        double f = _mathFunctionInterface.Calculate(i);
        double dv = _mathFunctionInterface.Derivate(i);
        return f - (dv * 12.34534);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkDelegate(int i)
    {
        double f = _calculate(i);
        double dv = _derivate(i);
        return f - (dv * 12.34534);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkClassSealed(int i)
    {
        double f = _mathFunctionClassSealed.Calculate(i);
        double dv = _mathFunctionClassSealed.Derivate(i);
        return f - (dv * 12.34534);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkClass(int i)
    {
        double f = _mathFunctionClass.Calculate(i);
        double dv = _mathFunctionClass.Derivate(i);
        return f - (dv * 12.34534);
    }
}



The [MethodImpl(MethodImplOptions.NoInlining)] attribute keeps the work methods from being inlined into the loops; otherwise the compiler could compute the method addresses once before each loop.
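To compare the struct-constrained generic approach from the other answer under the same Stopwatch conditions, a minimal sketch could look like this. `SquareFunctionStruct`, `TimingGenericStruct` and `SomeWorkGeneric` are hypothetical names; the results are summed and printed so the computed values are observably used. The printed timing depends on runtime and hardware, so no expected number is given.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

interface IMathFunction
{
    double Calculate(double input);
    double Derivate(double input);
}

// Hypothetical struct counterpart of SquareFunction, usable as a generic argument.
readonly struct SquareFunctionStruct : IMathFunction
{
    public double Calculate(double input) => input * input;
    public double Derivate(double input) => 2 * input;
}

static class TimingGenericStruct
{
    const int N = 100_000_000;
    const double msToNs = 1e6 / N; // total elapsed ms -> mean ns per iteration

    // NoInlining matches the other work methods; inside this method the struct
    // constraint lets the JIT call Calculate/Derivate directly (and possibly inline them).
    [MethodImpl(MethodImplOptions.NoInlining)]
    static double SomeWorkGeneric<T>(T function, int i) where T : struct, IMathFunction
    {
        double f = function.Calculate(i);
        double dv = function.Derivate(i);
        return f - (dv * 12.34534);
    }

    public static void Main()
    {
        var function = new SquareFunctionStruct();
        double sum = 0;

        var stopWatch = Stopwatch.StartNew();
        for (int i = 0; i < N; i++)
        {
            sum += SomeWorkGeneric(function, i);
        }
        stopWatch.Stop();

        Console.WriteLine($"Generic struct: {stopWatch.ElapsedMilliseconds * msToNs:n2} ns per call (checksum {sum})");
    }
}
```

Subtracting the empty-method baseline from this figure would make it comparable with the parenthesized values in the results list.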
























  • 1




    Does this have any meaningful effect? At first glance, it appears you're substituting the interface vtable for a custom made one.
    – Cory Nelson
    Dec 14 at 20:41











  • Sorry, my previous version was done in Debug mode. This final version is in Release mode.
    – Olivier Jacot-Descombes
    Dec 14 at 21:49










  • @CoryNelson: In response to your comments, I have done some tests. It does indeed have no noticeable effect.
    – Olivier Jacot-Descombes
    Dec 14 at 22:16















9














I would assign the methods to delegates. This allows you to still program against the interface, while avoiding the interface method resolution.



public SomeObject

private readonly Func<double, double> _calculate;
private readonly Func<double, double> _derivate;

public SomeObject(IMathFunction mathFunction)

_calculate = mathFunction.Calculate;
_derivate = mathFunction.Derivate;


public double SomeWork(double input, double step)

var f = _calculate(input);
var dv = _derivate(input);
return f - (dv * step);





In response to @CoryNelson's comment I made tests so see what the impact really is. I have sealed the function class, but this seems to make absolutely no difference since my methods are not virtual.



Test Results (mean time of 100 million iterations in ns) with the empty method time subtracted in braces:




Empty Work method: 1.48

Interface: 5.69 (4.21)

Delegates: 5.78 (4.30)

Sealed Class: 2.10 (0.62)

Class: 2.12 (0.64)




The delegate version time is about the same as for the interface version (the exact times vary from test execution to test execution). While working against the class is about 6.8 x faster (comparing times minus the empty work method time)! This means that my suggestion to work with delegates was not helpful!



What surprised me was, that I expected a much longer execution time for the interface version. Since this kind of test does not represent the exact context of the OP's code, its validity is limited.



static class TimingInterfaceVsDelegateCalls

const int N = 100_000_000;
const double msToNs = 1e6 / N;

static SquareFunctionSealed _mathFunctionClassSealed;
static SquareFunction _mathFunctionClass;
static IMathFunction _mathFunctionInterface;
static Func<double, double> _calculate;
static Func<double, double> _derivate;

static TimingInterfaceVsDelegateCalls()

_mathFunctionClass = new SquareFunction();
_mathFunctionClassSealed = new SquareFunctionSealed();
_mathFunctionInterface = _mathFunctionClassSealed;
_calculate = _mathFunctionInterface.Calculate;
_derivate = _mathFunctionInterface.Derivate;


interface IMathFunction

double Calculate(double input);
double Derivate(double input);


sealed class SquareFunctionSealed : IMathFunction

public double Calculate(double input)

return input * input;


public double Derivate(double input)

return 2 * input;



class SquareFunction : IMathFunction

public double Calculate(double input)

return input * input;


public double Derivate(double input)

return 2 * input;



public static void Test()

var stopWatch = new Stopwatch();

stopWatch.Start();
for (int i = 0; i < N; i++)
double result = SomeWorkEmpty(i);

stopWatch.Stop();
double emptyTime = stopWatch.ElapsedMilliseconds * msToNs;
Console.WriteLine($"Empty Work method: emptyTime:n2");

stopWatch.Restart();
for (int i = 0; i < N; i++)
double result = SomeWorkInterface(i);

stopWatch.Stop();
PrintResult("Interface", stopWatch.ElapsedMilliseconds, emptyTime);

stopWatch.Restart();
for (int i = 0; i < N; i++)
double result = SomeWorkDelegate(i);

stopWatch.Stop();
PrintResult("Delegates", stopWatch.ElapsedMilliseconds, emptyTime);

stopWatch.Restart();
for (int i = 0; i < N; i++)
double result = SomeWorkClassSealed(i);

stopWatch.Stop();
PrintResult("Sealed Class", stopWatch.ElapsedMilliseconds, emptyTime);

stopWatch.Restart();
for (int i = 0; i < N; i++)
double result = SomeWorkClass(i);

stopWatch.Stop();
PrintResult("Class", stopWatch.ElapsedMilliseconds, emptyTime);


private static void PrintResult(string text, long elapsed, double emptyTime)

Console.WriteLine($"text: elapsed * msToNs:n2 (elapsed * msToNs - emptyTime:n2)");


[MethodImpl(MethodImplOptions.NoInlining)]
private static double SomeWorkEmpty(int i)

return 0.0;


[MethodImpl(MethodImplOptions.NoInlining)]
private static double SomeWorkInterface(int i)

double f = _mathFunctionInterface.Calculate(i);
double dv = _mathFunctionInterface.Derivate(i);
return f - (dv * 12.34534);


[MethodImpl(MethodImplOptions.NoInlining)]
private static double SomeWorkDelegate(int i)

double f = _calculate(i);
double dv = _derivate(i);
return f - (dv * 12.34534);


[MethodImpl(MethodImplOptions.NoInlining)]
private static double SomeWorkClassSealed(int i)

double f = _mathFunctionClassSealed.Calculate(i);
double dv = _mathFunctionClassSealed.Derivate(i);
return f - (dv * 12.34534);


[MethodImpl(MethodImplOptions.NoInlining)]
private static double SomeWorkClass(int i)

double f = _mathFunctionClass.Calculate(i);
double dv = _mathFunctionClass.Derivate(i);
return f - (dv * 12.34534);




The idea of [MethodImpl(MethodImplOptions.NoInlining)] is to prevent the compiler from calculating the addresses of the methods before the loop if the method was inlined.






share|improve this answer


















  • 1




    Does this have any meaningful effect? At first glance, it appears you're substituting the interface vtable for a custom made one.
    – Cory Nelson
    Dec 14 at 20:41











  • Sorry. my previous version was done in Debug mode. This final verion is now in Release mode.
    – Olivier Jacot-Descombes
    Dec 14 at 21:49










  • @CoryNelson: In response to your comments, I have done some tests. It does indeed have no noticable effects.
    – Olivier Jacot-Descombes
    Dec 14 at 22:16













9












9








9






I would assign the methods to delegates. This allows you to still program against the interface, while avoiding the interface method resolution.



public SomeObject

private readonly Func<double, double> _calculate;
private readonly Func<double, double> _derivate;

public SomeObject(IMathFunction mathFunction)

_calculate = mathFunction.Calculate;
_derivate = mathFunction.Derivate;


public double SomeWork(double input, double step)

var f = _calculate(input);
var dv = _derivate(input);
return f - (dv * step);





In response to @CoryNelson's comment I made tests so see what the impact really is. I have sealed the function class, but this seems to make absolutely no difference since my methods are not virtual.



Test Results (mean time of 100 million iterations in ns) with the empty method time subtracted in braces:




Empty Work method: 1.48

Interface: 5.69 (4.21)

Delegates: 5.78 (4.30)

Sealed Class: 2.10 (0.62)

Class: 2.12 (0.64)




The delegate version time is about the same as for the interface version (the exact times vary from test execution to test execution). While working against the class is about 6.8 x faster (comparing times minus the empty work method time)! This means that my suggestion to work with delegates was not helpful!



What surprised me is that the interface version was faster than I expected; I had anticipated a much longer execution time. Since this kind of test does not reproduce the exact context of the OP's code, its validity is limited.



using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class TimingInterfaceVsDelegateCalls
{
    const int N = 100_000_000;
    const double msToNs = 1e6 / N; // total ms -> mean ns per iteration

    static SquareFunctionSealed _mathFunctionClassSealed;
    static SquareFunction _mathFunctionClass;
    static IMathFunction _mathFunctionInterface;
    static Func<double, double> _calculate;
    static Func<double, double> _derivate;

    static TimingInterfaceVsDelegateCalls()
    {
        _mathFunctionClass = new SquareFunction();
        _mathFunctionClassSealed = new SquareFunctionSealed();
        _mathFunctionInterface = _mathFunctionClassSealed;
        _calculate = _mathFunctionInterface.Calculate;
        _derivate = _mathFunctionInterface.Derivate;
    }

    interface IMathFunction
    {
        double Calculate(double input);
        double Derivate(double input);
    }

    sealed class SquareFunctionSealed : IMathFunction
    {
        public double Calculate(double input)
        {
            return input * input;
        }

        public double Derivate(double input)
        {
            return 2 * input;
        }
    }

    class SquareFunction : IMathFunction
    {
        public double Calculate(double input)
        {
            return input * input;
        }

        public double Derivate(double input)
        {
            return 2 * input;
        }
    }

    public static void Test()
    {
        var stopWatch = new Stopwatch();

        // Baseline: the loop plus a non-inlined call that does no work
        stopWatch.Start();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkEmpty(i);
        }
        stopWatch.Stop();
        double emptyTime = stopWatch.ElapsedMilliseconds * msToNs;
        Console.WriteLine($"Empty Work method: {emptyTime:n2}");

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkInterface(i);
        }
        stopWatch.Stop();
        PrintResult("Interface", stopWatch.ElapsedMilliseconds, emptyTime);

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkDelegate(i);
        }
        stopWatch.Stop();
        PrintResult("Delegates", stopWatch.ElapsedMilliseconds, emptyTime);

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkClassSealed(i);
        }
        stopWatch.Stop();
        PrintResult("Sealed Class", stopWatch.ElapsedMilliseconds, emptyTime);

        stopWatch.Restart();
        for (int i = 0; i < N; i++)
        {
            double result = SomeWorkClass(i);
        }
        stopWatch.Stop();
        PrintResult("Class", stopWatch.ElapsedMilliseconds, emptyTime);
    }

    private static void PrintResult(string text, long elapsed, double emptyTime)
    {
        // Mean ns per call, then the same value minus the baseline in parentheses
        Console.WriteLine($"{text}: {elapsed * msToNs:n2} ({elapsed * msToNs - emptyTime:n2})");
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkEmpty(int i)
    {
        return 0.0;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkInterface(int i)
    {
        double f = _mathFunctionInterface.Calculate(i);
        double dv = _mathFunctionInterface.Derivate(i);
        return f - (dv * 12.34534);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkDelegate(int i)
    {
        double f = _calculate(i);
        double dv = _derivate(i);
        return f - (dv * 12.34534);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkClassSealed(int i)
    {
        double f = _mathFunctionClassSealed.Calculate(i);
        double dv = _mathFunctionClassSealed.Derivate(i);
        return f - (dv * 12.34534);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static double SomeWorkClass(int i)
    {
        double f = _mathFunctionClass.Calculate(i);
        double dv = _mathFunctionClass.Derivate(i);
        return f - (dv * 12.34534);
    }
}



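The purpose of [MethodImpl(MethodImplOptions.NoInlining)] is to keep the JIT compiler from inlining the SomeWork methods into the loops. If they were inlined, the compiler could compute the method addresses once before the loop instead of on every call, which would distort the measurement.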
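A reduced sketch of the same pattern, with a hypothetical `Square` method standing in for the `SomeWork*` methods. As an extra precaution that is not part of the original benchmark, the loop accumulates the results and prints the sum, so the computed values are observably used and cannot be treated as dead code:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class NoInliningSketch
{
    const int N = 10_000_000;

    // NoInlining keeps this a real call inside the loop, so the JIT cannot
    // merge its body into the loop and optimize the two together.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static double Square(double x) => x * x;

    public static double Run(int n)
    {
        double sum = 0; // accumulating keeps every result observable
        for (int i = 0; i < n; i++)
        {
            sum += Square(i);
        }
        return sum;
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        double sum = Run(N);
        sw.Stop();
        // TotalMilliseconds * 1e6 gives total ns; dividing by N gives ns per call
        Console.WriteLine($"{sw.Elapsed.TotalMilliseconds * 1e6 / N:F2} ns per call (sum = {sum})");
    }
}
```

A harness such as BenchmarkDotNet, which the OP already uses, handles warm-up and result consumption automatically and is usually more reliable than a hand-written `Stopwatch` loop.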





















edited Dec 21 at 17:21

























answered Dec 14 at 20:01









Olivier Jacot-Descombes

65.3k885136



























