Avoiding the overhead of C# virtual calls

I have a few heavily optimized math functions that take 1-2 nanoseconds to complete. These functions are called hundreds of millions of times per second, so call overhead is a concern, despite the already-excellent performance.

In order to keep the program maintainable, the classes that provide these methods inherit an IMathFunction interface, so that other objects can directly store a specific math function and use it when needed.

public interface IMathFunction

 double Calculate(double input);
 double Derivate(double input);


public SomeObject

 // Note: There are cases where this is mutable
 private readonly IMathFunction mathFunction_; 

 public double SomeWork(double input, double step)
 
 var f = mathFunction_.Calculate(input);
 var dv = mathFunction_.Derivate(input);
 return f - (dv * step);

This interface is causing an enormous overhead compared to a direct call due to how the consuming code uses it. A direct call takes 1-2ns, whereas the virtual interface call takes 8-9ns. Evidently, the presence of the interface and its subsequent translation of the virtual call is the bottleneck for this scenario.

I would like to retain both maintainability and performance if possible. Is there a way I can resolve the virtual function to a direct call when the object is instantiated so that all subsequent calls are able to avoid the overhead? I assume this would involve creating delegates with IL, but I wouldn't know where to start with that.

edited Dec 14 at 19:46

asked Dec 14 at 19:35

Haus

476112

5

How do you measure the nano second timing?
– z3nth10n
Dec 14 at 19:38

3

@z3nth10n BenchmarkDotNet. It warms up and pre-JIT's everything before benchmarking. This is in release mode. Profiling with dotTrace also shows similar results.
– Haus
Dec 14 at 19:39

1

Have you looked at the generated IL. I'm not an expert in this regard, but my understanding is that both virtual and non-virtual calls in C# and VB use the callvirt instruction (in order to properly fail with the object instance associated with the call is null). Your question surprised me.
– Flydog57
Dec 14 at 19:46

1

Is there any feasible way you can offload the core of the math functions to C or C++? That way you can have a struct with a function pointer in it, which you'd only have to fill once, and you'd never see a vtable lookup (well, in C, not sure about C++). When we're talking nanosecond-level timings, it could be worth it.
– Chris Akridge
Dec 14 at 19:47

4

Cory Nelson's solution is good and that is what I would pursue. It might be worthwhile however to do a quick check and see what the performance of saving the function into a delegate is, and then invoking the delegate. That also has some indirection, but it might be slightly smaller than the interface indirection.
– Eric Lippert
Dec 14 at 19:49

|
show 17 more comments

public interface IMathFunction

 double Calculate(double input);
 double Derivate(double input);


public SomeObject

 // Note: There are cases where this is mutable
 private readonly IMathFunction mathFunction_; 

 public double SomeWork(double input, double step)
 
 var f = mathFunction_.Calculate(input);
 var dv = mathFunction_.Derivate(input);
 return f - (dv * step);

edited Dec 14 at 19:46

asked Dec 14 at 19:35

Haus

476112

5

How do you measure the nano second timing?
– z3nth10n
Dec 14 at 19:38

3

@z3nth10n BenchmarkDotNet. It warms up and pre-JIT's everything before benchmarking. This is in release mode. Profiling with dotTrace also shows similar results.
– Haus
Dec 14 at 19:39

1

Have you looked at the generated IL. I'm not an expert in this regard, but my understanding is that both virtual and non-virtual calls in C# and VB use the callvirt instruction (in order to properly fail with the object instance associated with the call is null). Your question surprised me.
– Flydog57
Dec 14 at 19:46

1

Is there any feasible way you can offload the core of the math functions to C or C++? That way you can have a struct with a function pointer in it, which you'd only have to fill once, and you'd never see a vtable lookup (well, in C, not sure about C++). When we're talking nanosecond-level timings, it could be worth it.
– Chris Akridge
Dec 14 at 19:47

4

Cory Nelson's solution is good and that is what I would pursue. It might be worthwhile however to do a quick check and see what the performance of saving the function into a delegate is, and then invoking the delegate. That also has some indirection, but it might be slightly smaller than the interface indirection.
– Eric Lippert
Dec 14 at 19:49

|
show 17 more comments

public interface IMathFunction

 double Calculate(double input);
 double Derivate(double input);


public SomeObject

 // Note: There are cases where this is mutable
 private readonly IMathFunction mathFunction_; 

 public double SomeWork(double input, double step)
 
 var f = mathFunction_.Calculate(input);
 var dv = mathFunction_.Derivate(input);
 return f - (dv * step);

edited Dec 14 at 19:46

asked Dec 14 at 19:35

Haus

476112

public interface IMathFunction

 double Calculate(double input);
 double Derivate(double input);


public SomeObject

 // Note: There are cases where this is mutable
 private readonly IMathFunction mathFunction_; 

 public double SomeWork(double input, double step)
 
 var f = mathFunction_.Calculate(input);
 var dv = mathFunction_.Derivate(input);
 return f - (dv * step);

c# virtual-functions micro-optimization

edited Dec 14 at 19:46

asked Dec 14 at 19:35

Haus

476112

edited Dec 14 at 19:46

asked Dec 14 at 19:35

Haus

476112

edited Dec 14 at 19:46

asked Dec 14 at 19:35

Haus

476112

asked Dec 14 at 19:35

Haus

476112

asked Dec 14 at 19:35

Haus

476112

5

How do you measure the nano second timing?
– z3nth10n
Dec 14 at 19:38

3

@z3nth10n BenchmarkDotNet. It warms up and pre-JIT's everything before benchmarking. This is in release mode. Profiling with dotTrace also shows similar results.
– Haus
Dec 14 at 19:39

1

Have you looked at the generated IL. I'm not an expert in this regard, but my understanding is that both virtual and non-virtual calls in C# and VB use the callvirt instruction (in order to properly fail with the object instance associated with the call is null). Your question surprised me.
– Flydog57
Dec 14 at 19:46

1

Is there any feasible way you can offload the core of the math functions to C or C++? That way you can have a struct with a function pointer in it, which you'd only have to fill once, and you'd never see a vtable lookup (well, in C, not sure about C++). When we're talking nanosecond-level timings, it could be worth it.
– Chris Akridge
Dec 14 at 19:47

4

Cory Nelson's solution is good and that is what I would pursue. It might be worthwhile however to do a quick check and see what the performance of saving the function into a delegate is, and then invoking the delegate. That also has some indirection, but it might be slightly smaller than the interface indirection.
– Eric Lippert
Dec 14 at 19:49

|
show 17 more comments

5

How do you measure the nano second timing?
– z3nth10n
Dec 14 at 19:38

3

@z3nth10n BenchmarkDotNet. It warms up and pre-JIT's everything before benchmarking. This is in release mode. Profiling with dotTrace also shows similar results.
– Haus
Dec 14 at 19:39

1

Have you looked at the generated IL. I'm not an expert in this regard, but my understanding is that both virtual and non-virtual calls in C# and VB use the callvirt instruction (in order to properly fail with the object instance associated with the call is null). Your question surprised me.
– Flydog57
Dec 14 at 19:46

1

Is there any feasible way you can offload the core of the math functions to C or C++? That way you can have a struct with a function pointer in it, which you'd only have to fill once, and you'd never see a vtable lookup (well, in C, not sure about C++). When we're talking nanosecond-level timings, it could be worth it.
– Chris Akridge
Dec 14 at 19:47

4

Cory Nelson's solution is good and that is what I would pursue. It might be worthwhile however to do a quick check and see what the performance of saving the function into a delegate is, and then invoking the delegate. That also has some indirection, but it might be slightly smaller than the interface indirection.
– Eric Lippert
Dec 14 at 19:49

How do you measure the nano second timing?
– z3nth10n
Dec 14 at 19:38

@z3nth10n BenchmarkDotNet. It warms up and pre-JIT's everything before benchmarking. This is in release mode. Profiling with dotTrace also shows similar results.
– Haus
Dec 14 at 19:39

Have you looked at the generated IL. I'm not an expert in this regard, but my understanding is that both virtual and non-virtual calls in C# and VB use the callvirt instruction (in order to properly fail with the object instance associated with the call is null). Your question surprised me.
– Flydog57
Dec 14 at 19:46

Is there any feasible way you can offload the core of the math functions to C or C++? That way you can have a struct with a function pointer in it, which you'd only have to fill once, and you'd never see a vtable lookup (well, in C, not sure about C++). When we're talking nanosecond-level timings, it could be worth it.
– Chris Akridge
Dec 14 at 19:47

Cory Nelson's solution is good and that is what I would pursue. It might be worthwhile however to do a quick check and see what the performance of saving the function into a delegate is, and then invoking the delegate. That also has some indirection, but it might be slightly smaller than the interface indirection.
– Eric Lippert
Dec 14 at 19:49

|
show 17 more comments

2 Answers
2

active

oldest

votes

So this has obvious limitations and should not be used all the time anywhere you have an interface, but if you have a place where perf really needs to be maximized you can use generics:

public SomeObject<TMathFunction> where TMathFunction: struct, IMathFunction 

 private readonly TMathFunction mathFunction_;

 public double SomeWork(double input, double step)
 
 var f = mathFunction_.Calculate(input);
 var dv = mathFunction_.Derivate(input);
 return f - (dv * step);

And instead of passing an interface, pass your implementation as TMathFunction. This will avoid vtable lookups due to an interface and also allow inlining.

Note the use of struct is important here, as generics will otherwise access the class via the interface.

Some implementation:

I made a simple implementation of IMathFunction for testing:

class SomeImplementationByRef : IMathFunction

 public double Calculate(double input)
 
 return input + input;
 

 public double Derivate(double input)
 
 return input * input;

... as well as a struct version and an abstract version.

So, here's what happens with the interface version. You can see it is relatively inefficient because it performs two levels of indirection:

 return obj.SomeWork(input, step);
sub esp,40h 
vzeroupper 
vmovaps xmmword ptr [rsp+30h],xmm6 
vmovaps xmmword ptr [rsp+20h],xmm7 
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2 
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980020h ; load vtable address of the IMathFunction.Calculate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Calculate function which will call the actual Calculate via vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980028h ; load vtable address of the IMathFunction.Derivate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Derivate function which will call the actual Derivate via vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7 
vmovaps xmm6,xmmword ptr [rsp+30h] 
vmovaps xmm7,xmmword ptr [rsp+20h] 
add rsp,40h 
pop rsi 
ret

Here's an abstract class. It's a little more efficient but only negligibly:

 return obj.SomeWork(input, step);
 sub esp,40h 
 vzeroupper 
 vmovaps xmmword ptr [rsp+30h],xmm6 
 vmovaps xmmword ptr [rsp+20h],xmm7 
 mov rsi,rcx 
 vmovsd qword ptr [rsp+60h],xmm2 
 vmovaps xmm6,xmm1 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+20h] ; call Calculate via offset 0x20 of vtable.
 vmovaps xmm7,xmm0 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+28h] ; call Derivate via offset 0x28 of vtable.
 vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
 vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
 vmovaps xmm0,xmm7
 vmovaps xmm6,xmmword ptr [rsp+30h] 
 vmovaps xmm7,xmmword ptr [rsp+20h] 
 add rsp,40h 
 pop rsi 
 ret

So both an interface and an abstract class rely heavily on branch target prediction to have acceptable performance. Even then, you can see there's quite a lot more going into it, so the best-case is still relatively slow while the worst-case is a stalled pipeline due to a mispredict.

And finally here's the generic version with a struct. You can see it's massively more efficient because everything has been fully inlined so there's no branch prediction involved. It also has the nice side effect of removing most of the stack/parameter management that was in there too, so the code becomes very compact:

 return obj.SomeWork(input, step);
push rax 
vzeroupper 
movsx rax,byte ptr [rcx+8] 
vmovaps xmm0,xmm1 
vaddsd xmm0,xmm0,xmm1 ; Calculate - got inlined
vmulsd xmm1,xmm1,xmm1 ; Derivate - got inlined
vmulsd xmm1,xmm1,xmm2 ; dv * step
vsubsd xmm0,xmm0,xmm1 ; f - 
add rsp,8 
ret

edited Dec 19 at 18:24

answered Dec 14 at 19:40

Cory Nelson

22.8k24983

This is a very clever solution! Thanks for the post. It absolutely works for classes where mathFunction_ is readonly, but I also have cases where it is a mutable field. Nevertheless, I will experiment with this.
– Haus
Dec 14 at 19:45

What is that trailing underscore there for?
– Robert Harvey♦
Dec 14 at 22:16

@RobertHarvey it's part of the field's name
– Cory Nelson
Dec 14 at 22:20

3

@RobertHarvey i assume it is a convention denoting a private field. It is part of the question's code.
– Cory Nelson
Dec 14 at 22:26

2

I don't think this will actually do anything to optimize it. From what I understand of CLR internals (and, admittedly it may have changed in the last 3 or so years), the CLR will only generate one version of the generic class internally. i.e. it will generate an IMathFunction call rather than a direct call. If TMathFunction is a struct, it will generate separate code for each type of TMathFunction. The only way I could think to optimize it all the way would be to use an abstract base class (because virtual calls are faster than interface calls) or require where T: struct, IMathFunction.
– Colorfully Monochrome
Dec 14 at 23:28

|
show 3 more comments

I would assign the methods to delegates. This allows you to still program against the interface, while avoiding the interface method resolution.

public SomeObject

 private readonly Func<double, double> _calculate;
 private readonly Func<double, double> _derivate;

 public SomeObject(IMathFunction mathFunction)
 
 _calculate = mathFunction.Calculate;
 _derivate = mathFunction.Derivate;
 

 public double SomeWork(double input, double step)
 
 var f = _calculate(input);
 var dv = _derivate(input);
 return f - (dv * step);

In response to @CoryNelson's comment I made tests so see what the impact really is. I have sealed the function class, but this seems to make absolutely no difference since my methods are not virtual.

Test Results (mean time of 100 million iterations in ns) with the empty method time subtracted in braces:

Empty Work method: 1.48

Interface: 5.69 (4.21)

Delegates: 5.78 (4.30)

Sealed Class: 2.10 (0.62)

Class: 2.12 (0.64)

The delegate version time is about the same as for the interface version (the exact times vary from test execution to test execution). While working against the class is about 6.8 x faster (comparing times minus the empty work method time)! This means that my suggestion to work with delegates was not helpful!

What surprised me was, that I expected a much longer execution time for the interface version. Since this kind of test does not represent the exact context of the OP's code, its validity is limited.

static class TimingInterfaceVsDelegateCalls

 const int N = 100_000_000;
 const double msToNs = 1e6 / N;

 static SquareFunctionSealed _mathFunctionClassSealed;
 static SquareFunction _mathFunctionClass;
 static IMathFunction _mathFunctionInterface;
 static Func<double, double> _calculate;
 static Func<double, double> _derivate;

 static TimingInterfaceVsDelegateCalls()
 
 _mathFunctionClass = new SquareFunction();
 _mathFunctionClassSealed = new SquareFunctionSealed();
 _mathFunctionInterface = _mathFunctionClassSealed;
 _calculate = _mathFunctionInterface.Calculate;
 _derivate = _mathFunctionInterface.Derivate;
 

 interface IMathFunction
 
 double Calculate(double input);
 double Derivate(double input);
 

 sealed class SquareFunctionSealed : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 class SquareFunction : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 public static void Test()
 
 var stopWatch = new Stopwatch();

 stopWatch.Start();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkEmpty(i);
 
 stopWatch.Stop();
 double emptyTime = stopWatch.ElapsedMilliseconds * msToNs;
 Console.WriteLine($"Empty Work method: emptyTime:n2");

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkInterface(i);
 
 stopWatch.Stop();
 PrintResult("Interface", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkDelegate(i);
 
 stopWatch.Stop();
 PrintResult("Delegates", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClassSealed(i);
 
 stopWatch.Stop();
 PrintResult("Sealed Class", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClass(i);
 
 stopWatch.Stop();
 PrintResult("Class", stopWatch.ElapsedMilliseconds, emptyTime);
 

 private static void PrintResult(string text, long elapsed, double emptyTime)
 
 Console.WriteLine($"text: elapsed * msToNs:n2 (elapsed * msToNs - emptyTime:n2)");
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkEmpty(int i)
 
 return 0.0;
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkInterface(int i)
 
 double f = _mathFunctionInterface.Calculate(i);
 double dv = _mathFunctionInterface.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkDelegate(int i)
 
 double f = _calculate(i);
 double dv = _derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClassSealed(int i)
 
 double f = _mathFunctionClassSealed.Calculate(i);
 double dv = _mathFunctionClassSealed.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClass(int i)
 
 double f = _mathFunctionClass.Calculate(i);
 double dv = _mathFunctionClass.Derivate(i);
 return f - (dv * 12.34534);

The idea of [MethodImpl(MethodImplOptions.NoInlining)] is to prevent the compiler from calculating the addresses of the methods before the loop if the method was inlined.

edited Dec 21 at 17:21

answered Dec 14 at 20:01

Olivier Jacot-Descombes

65.3k885136

1

Does this have any meaningful effect? At first glance, it appears you're substituting the interface vtable for a custom made one.
– Cory Nelson
Dec 14 at 20:41

Sorry. my previous version was done in Debug mode. This final verion is now in Release mode.
– Olivier Jacot-Descombes
Dec 14 at 21:49

@CoryNelson: In response to your comments, I have done some tests. It does indeed have no noticable effects.
– Olivier Jacot-Descombes
Dec 14 at 22:16

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53785910%2favoiding-the-overhead-of-c-sharp-virtual-calls%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

So this has obvious limitations and should not be used all the time anywhere you have an interface, but if you have a place where perf really needs to be maximized you can use generics:

public SomeObject<TMathFunction> where TMathFunction: struct, IMathFunction 

 private readonly TMathFunction mathFunction_;

 public double SomeWork(double input, double step)
 
 var f = mathFunction_.Calculate(input);
 var dv = mathFunction_.Derivate(input);
 return f - (dv * step);

And instead of passing an interface, pass your implementation as TMathFunction. This will avoid vtable lookups due to an interface and also allow inlining.

Note the use of struct is important here, as generics will otherwise access the class via the interface.

Some implementation:

I made a simple implementation of IMathFunction for testing:

class SomeImplementationByRef : IMathFunction

 public double Calculate(double input)
 
 return input + input;
 

 public double Derivate(double input)
 
 return input * input;

... as well as a struct version and an abstract version.

So, here's what happens with the interface version. You can see it is relatively inefficient because it performs two levels of indirection:

 return obj.SomeWork(input, step);
sub esp,40h 
vzeroupper 
vmovaps xmmword ptr [rsp+30h],xmm6 
vmovaps xmmword ptr [rsp+20h],xmm7 
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2 
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980020h ; load vtable address of the IMathFunction.Calculate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Calculate function which will call the actual Calculate via vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980028h ; load vtable address of the IMathFunction.Derivate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Derivate function which will call the actual Derivate via vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7 
vmovaps xmm6,xmmword ptr [rsp+30h] 
vmovaps xmm7,xmmword ptr [rsp+20h] 
add rsp,40h 
pop rsi 
ret

Here's an abstract class. It's a little more efficient but only negligibly:

 return obj.SomeWork(input, step);
 sub esp,40h 
 vzeroupper 
 vmovaps xmmword ptr [rsp+30h],xmm6 
 vmovaps xmmword ptr [rsp+20h],xmm7 
 mov rsi,rcx 
 vmovsd qword ptr [rsp+60h],xmm2 
 vmovaps xmm6,xmm1 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+20h] ; call Calculate via offset 0x20 of vtable.
 vmovaps xmm7,xmm0 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+28h] ; call Derivate via offset 0x28 of vtable.
 vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
 vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
 vmovaps xmm0,xmm7
 vmovaps xmm6,xmmword ptr [rsp+30h] 
 vmovaps xmm7,xmmword ptr [rsp+20h] 
 add rsp,40h 
 pop rsi 
 ret

 return obj.SomeWork(input, step);
push rax 
vzeroupper 
movsx rax,byte ptr [rcx+8] 
vmovaps xmm0,xmm1 
vaddsd xmm0,xmm0,xmm1 ; Calculate - got inlined
vmulsd xmm1,xmm1,xmm1 ; Derivate - got inlined
vmulsd xmm1,xmm1,xmm2 ; dv * step
vsubsd xmm0,xmm0,xmm1 ; f - 
add rsp,8 
ret

edited Dec 19 at 18:24

answered Dec 14 at 19:40

Cory Nelson

22.8k24983

This is a very clever solution! Thanks for the post. It absolutely works for classes where mathFunction_ is readonly, but I also have cases where it is a mutable field. Nevertheless, I will experiment with this.
– Haus
Dec 14 at 19:45

What is that trailing underscore there for?
– Robert Harvey♦
Dec 14 at 22:16

@RobertHarvey it's part of the field's name
– Cory Nelson
Dec 14 at 22:20

3

@RobertHarvey i assume it is a convention denoting a private field. It is part of the question's code.
– Cory Nelson
Dec 14 at 22:26

2

I don't think this will actually do anything to optimize it. From what I understand of CLR internals (and, admittedly it may have changed in the last 3 or so years), the CLR will only generate one version of the generic class internally. i.e. it will generate an IMathFunction call rather than a direct call. If TMathFunction is a struct, it will generate separate code for each type of TMathFunction. The only way I could think to optimize it all the way would be to use an abstract base class (because virtual calls are faster than interface calls) or require where T: struct, IMathFunction.
– Colorfully Monochrome
Dec 14 at 23:28

|
show 3 more comments

So this has obvious limitations and should not be used all the time anywhere you have an interface, but if you have a place where perf really needs to be maximized you can use generics:

public SomeObject<TMathFunction> where TMathFunction: struct, IMathFunction 

 private readonly TMathFunction mathFunction_;

 public double SomeWork(double input, double step)
 
 var f = mathFunction_.Calculate(input);
 var dv = mathFunction_.Derivate(input);
 return f - (dv * step);

And instead of passing an interface, pass your implementation as TMathFunction. This will avoid vtable lookups due to an interface and also allow inlining.

Note the use of struct is important here, as generics will otherwise access the class via the interface.

Some implementation:

I made a simple implementation of IMathFunction for testing:

class SomeImplementationByRef : IMathFunction

 public double Calculate(double input)
 
 return input + input;
 

 public double Derivate(double input)
 
 return input * input;

... as well as a struct version and an abstract version.

So, here's what happens with the interface version. You can see it is relatively inefficient because it performs two levels of indirection:

 return obj.SomeWork(input, step);
sub esp,40h 
vzeroupper 
vmovaps xmmword ptr [rsp+30h],xmm6 
vmovaps xmmword ptr [rsp+20h],xmm7 
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2 
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980020h ; load vtable address of the IMathFunction.Calculate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Calculate function which will call the actual Calculate via vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980028h ; load vtable address of the IMathFunction.Derivate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Derivate function which will call the actual Derivate via vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7 
vmovaps xmm6,xmmword ptr [rsp+30h] 
vmovaps xmm7,xmmword ptr [rsp+20h] 
add rsp,40h 
pop rsi 
ret

Here's an abstract class. It's a little more efficient but only negligibly:

 return obj.SomeWork(input, step);
 sub esp,40h 
 vzeroupper 
 vmovaps xmmword ptr [rsp+30h],xmm6 
 vmovaps xmmword ptr [rsp+20h],xmm7 
 mov rsi,rcx 
 vmovsd qword ptr [rsp+60h],xmm2 
 vmovaps xmm6,xmm1 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+20h] ; call Calculate via offset 0x20 of vtable.
 vmovaps xmm7,xmm0 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+28h] ; call Derivate via offset 0x28 of vtable.
 vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
 vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
 vmovaps xmm0,xmm7
 vmovaps xmm6,xmmword ptr [rsp+30h] 
 vmovaps xmm7,xmmword ptr [rsp+20h] 
 add rsp,40h 
 pop rsi 
 ret

 return obj.SomeWork(input, step);
push rax 
vzeroupper 
movsx rax,byte ptr [rcx+8] 
vmovaps xmm0,xmm1 
vaddsd xmm0,xmm0,xmm1 ; Calculate - got inlined
vmulsd xmm1,xmm1,xmm1 ; Derivate - got inlined
vmulsd xmm1,xmm1,xmm2 ; dv * step
vsubsd xmm0,xmm0,xmm1 ; f - 
add rsp,8 
ret

edited Dec 19 at 18:24

answered Dec 14 at 19:40

Cory Nelson

22.8k24983

This is a very clever solution! Thanks for the post. It absolutely works for classes where mathFunction_ is readonly, but I also have cases where it is a mutable field. Nevertheless, I will experiment with this.
– Haus
Dec 14 at 19:45

What is that trailing underscore there for?
– Robert Harvey♦
Dec 14 at 22:16

@RobertHarvey it's part of the field's name
– Cory Nelson
Dec 14 at 22:20

3

@RobertHarvey i assume it is a convention denoting a private field. It is part of the question's code.
– Cory Nelson
Dec 14 at 22:26

2

I don't think this will actually do anything to optimize it. From what I understand of CLR internals (and, admittedly it may have changed in the last 3 or so years), the CLR will only generate one version of the generic class internally. i.e. it will generate an IMathFunction call rather than a direct call. If TMathFunction is a struct, it will generate separate code for each type of TMathFunction. The only way I could think to optimize it all the way would be to use an abstract base class (because virtual calls are faster than interface calls) or require where T: struct, IMathFunction.
– Colorfully Monochrome
Dec 14 at 23:28

|
show 3 more comments

So this has obvious limitations and should not be used all the time anywhere you have an interface, but if you have a place where perf really needs to be maximized you can use generics:

public SomeObject<TMathFunction> where TMathFunction: struct, IMathFunction 

 private readonly TMathFunction mathFunction_;

 public double SomeWork(double input, double step)
 
 var f = mathFunction_.Calculate(input);
 var dv = mathFunction_.Derivate(input);
 return f - (dv * step);

And instead of passing an interface, pass your implementation as TMathFunction. This will avoid vtable lookups due to an interface and also allow inlining.

Note the use of struct is important here, as generics will otherwise access the class via the interface.

Some implementation:

I made a simple implementation of IMathFunction for testing:

class SomeImplementationByRef : IMathFunction

 public double Calculate(double input)
 
 return input + input;
 

 public double Derivate(double input)
 
 return input * input;

... as well as a struct version and an abstract version.

So, here's what happens with the interface version. You can see it is relatively inefficient because it performs two levels of indirection:

 return obj.SomeWork(input, step);
sub esp,40h 
vzeroupper 
vmovaps xmmword ptr [rsp+30h],xmm6 
vmovaps xmmword ptr [rsp+20h],xmm7 
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2 
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980020h ; load vtable address of the IMathFunction.Calculate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Calculate function which will call the actual Calculate via vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980028h ; load vtable address of the IMathFunction.Derivate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Derivate function which will call the actual Derivate via vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7 
vmovaps xmm6,xmmword ptr [rsp+30h] 
vmovaps xmm7,xmmword ptr [rsp+20h] 
add rsp,40h 
pop rsi 
ret

Here's an abstract class. It's a little more efficient but only negligibly:

 return obj.SomeWork(input, step);
 sub esp,40h 
 vzeroupper 
 vmovaps xmmword ptr [rsp+30h],xmm6 
 vmovaps xmmword ptr [rsp+20h],xmm7 
 mov rsi,rcx 
 vmovsd qword ptr [rsp+60h],xmm2 
 vmovaps xmm6,xmm1 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+20h] ; call Calculate via offset 0x20 of vtable.
 vmovaps xmm7,xmm0 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+28h] ; call Derivate via offset 0x28 of vtable.
 vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
 vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
 vmovaps xmm0,xmm7
 vmovaps xmm6,xmmword ptr [rsp+30h] 
 vmovaps xmm7,xmmword ptr [rsp+20h] 
 add rsp,40h 
 pop rsi 
 ret

 return obj.SomeWork(input, step);
push rax 
vzeroupper 
movsx rax,byte ptr [rcx+8] 
vmovaps xmm0,xmm1 
vaddsd xmm0,xmm0,xmm1 ; Calculate - got inlined
vmulsd xmm1,xmm1,xmm1 ; Derivate - got inlined
vmulsd xmm1,xmm1,xmm2 ; dv * step
vsubsd xmm0,xmm0,xmm1 ; f - 
add rsp,8 
ret

edited Dec 19 at 18:24

answered Dec 14 at 19:40

Cory Nelson

22.8k24983

So this has obvious limitations and should not be used all the time anywhere you have an interface, but if you have a place where perf really needs to be maximized you can use generics:

public SomeObject<TMathFunction> where TMathFunction: struct, IMathFunction 

 private readonly TMathFunction mathFunction_;

 public double SomeWork(double input, double step)
 
 var f = mathFunction_.Calculate(input);
 var dv = mathFunction_.Derivate(input);
 return f - (dv * step);

And instead of passing an interface, pass your implementation as TMathFunction. This will avoid vtable lookups due to an interface and also allow inlining.

Note the use of struct is important here, as generics will otherwise access the class via the interface.

Some implementation:

I made a simple implementation of IMathFunction for testing:

class SomeImplementationByRef : IMathFunction

 public double Calculate(double input)
 
 return input + input;
 

 public double Derivate(double input)
 
 return input * input;

... as well as a struct version and an abstract version.

So, here's what happens with the interface version. You can see it is relatively inefficient because it performs two levels of indirection:

 return obj.SomeWork(input, step);
sub esp,40h 
vzeroupper 
vmovaps xmmword ptr [rsp+30h],xmm6 
vmovaps xmmword ptr [rsp+20h],xmm7 
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2 
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980020h ; load vtable address of the IMathFunction.Calculate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Calculate function which will call the actual Calculate via vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6 
mov r11,7FFED7980028h ; load vtable address of the IMathFunction.Derivate function.
cmp dword ptr [rcx],ecx 
call qword ptr [r11] ; call IMathFunction.Derivate function which will call the actual Derivate via vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7 
vmovaps xmm6,xmmword ptr [rsp+30h] 
vmovaps xmm7,xmmword ptr [rsp+20h] 
add rsp,40h 
pop rsi 
ret

Here's an abstract class. It's a little more efficient but only negligibly:

 return obj.SomeWork(input, step);
 sub esp,40h 
 vzeroupper 
 vmovaps xmmword ptr [rsp+30h],xmm6 
 vmovaps xmmword ptr [rsp+20h],xmm7 
 mov rsi,rcx 
 vmovsd qword ptr [rsp+60h],xmm2 
 vmovaps xmm6,xmm1 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+20h] ; call Calculate via offset 0x20 of vtable.
 vmovaps xmm7,xmm0 
 mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
 vmovaps xmm1,xmm6 
 mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
 mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
 call qword ptr [rax+28h] ; call Derivate via offset 0x28 of vtable.
 vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
 vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
 vmovaps xmm0,xmm7
 vmovaps xmm6,xmmword ptr [rsp+30h] 
 vmovaps xmm7,xmmword ptr [rsp+20h] 
 add rsp,40h 
 pop rsi 
 ret

 return obj.SomeWork(input, step);
push rax 
vzeroupper 
movsx rax,byte ptr [rcx+8] 
vmovaps xmm0,xmm1 
vaddsd xmm0,xmm0,xmm1 ; Calculate - got inlined
vmulsd xmm1,xmm1,xmm1 ; Derivate - got inlined
vmulsd xmm1,xmm1,xmm2 ; dv * step
vsubsd xmm0,xmm0,xmm1 ; f - 
add rsp,8 
ret

edited Dec 19 at 18:24

answered Dec 14 at 19:40

Cory Nelson

22.8k24983

edited Dec 19 at 18:24

answered Dec 14 at 19:40

Cory Nelson

22.8k24983

answered Dec 14 at 19:40

Cory Nelson

22.8k24983

answered Dec 14 at 19:40

Cory Nelson

22.8k24983

This is a very clever solution! Thanks for the post. It absolutely works for classes where mathFunction_ is readonly, but I also have cases where it is a mutable field. Nevertheless, I will experiment with this.
– Haus
Dec 14 at 19:45

What is that trailing underscore there for?
– Robert Harvey♦
Dec 14 at 22:16

@RobertHarvey it's part of the field's name
– Cory Nelson
Dec 14 at 22:20

3

@RobertHarvey i assume it is a convention denoting a private field. It is part of the question's code.
– Cory Nelson
Dec 14 at 22:26

2

I don't think this will actually do anything to optimize it. From what I understand of CLR internals (and, admittedly it may have changed in the last 3 or so years), the CLR will only generate one version of the generic class internally. i.e. it will generate an IMathFunction call rather than a direct call. If TMathFunction is a struct, it will generate separate code for each type of TMathFunction. The only way I could think to optimize it all the way would be to use an abstract base class (because virtual calls are faster than interface calls) or require where T: struct, IMathFunction.
– Colorfully Monochrome
Dec 14 at 23:28

|
show 3 more comments

This is a very clever solution! Thanks for the post. It absolutely works for classes where mathFunction_ is readonly, but I also have cases where it is a mutable field. Nevertheless, I will experiment with this.
– Haus
Dec 14 at 19:45

What is that trailing underscore there for?
– Robert Harvey♦
Dec 14 at 22:16

@RobertHarvey it's part of the field's name
– Cory Nelson
Dec 14 at 22:20

3

@RobertHarvey i assume it is a convention denoting a private field. It is part of the question's code.
– Cory Nelson
Dec 14 at 22:26

2

I don't think this will actually do anything to optimize it. From what I understand of CLR internals (and, admittedly it may have changed in the last 3 or so years), the CLR will only generate one version of the generic class internally. i.e. it will generate an IMathFunction call rather than a direct call. If TMathFunction is a struct, it will generate separate code for each type of TMathFunction. The only way I could think to optimize it all the way would be to use an abstract base class (because virtual calls are faster than interface calls) or require where T: struct, IMathFunction.
– Colorfully Monochrome
Dec 14 at 23:28

This is a very clever solution! Thanks for the post. It absolutely works for classes where mathFunction_ is readonly, but I also have cases where it is a mutable field. Nevertheless, I will experiment with this.
– Haus
Dec 14 at 19:45

What is that trailing underscore there for?
– Robert Harvey♦
Dec 14 at 22:16

@RobertHarvey it's part of the field's name
– Cory Nelson
Dec 14 at 22:20

@RobertHarvey i assume it is a convention denoting a private field. It is part of the question's code.
– Cory Nelson
Dec 14 at 22:26

I don't think this will actually do anything to optimize it. From what I understand of CLR internals (and, admittedly it may have changed in the last 3 or so years), the CLR will only generate one version of the generic class internally. i.e. it will generate an IMathFunction call rather than a direct call. If TMathFunction is a struct, it will generate separate code for each type of TMathFunction. The only way I could think to optimize it all the way would be to use an abstract base class (because virtual calls are faster than interface calls) or require where T: struct, IMathFunction.
– Colorfully Monochrome
Dec 14 at 23:28

|
show 3 more comments

I would assign the methods to delegates. This allows you to still program against the interface, while avoiding the interface method resolution.

public SomeObject

 private readonly Func<double, double> _calculate;
 private readonly Func<double, double> _derivate;

 public SomeObject(IMathFunction mathFunction)
 
 _calculate = mathFunction.Calculate;
 _derivate = mathFunction.Derivate;
 

 public double SomeWork(double input, double step)
 
 var f = _calculate(input);
 var dv = _derivate(input);
 return f - (dv * step);

In response to @CoryNelson's comment I made tests so see what the impact really is. I have sealed the function class, but this seems to make absolutely no difference since my methods are not virtual.

Test Results (mean time of 100 million iterations in ns) with the empty method time subtracted in braces:

Empty Work method: 1.48

Interface: 5.69 (4.21)

Delegates: 5.78 (4.30)

Sealed Class: 2.10 (0.62)

Class: 2.12 (0.64)

What surprised me was, that I expected a much longer execution time for the interface version. Since this kind of test does not represent the exact context of the OP's code, its validity is limited.

static class TimingInterfaceVsDelegateCalls

 const int N = 100_000_000;
 const double msToNs = 1e6 / N;

 static SquareFunctionSealed _mathFunctionClassSealed;
 static SquareFunction _mathFunctionClass;
 static IMathFunction _mathFunctionInterface;
 static Func<double, double> _calculate;
 static Func<double, double> _derivate;

 static TimingInterfaceVsDelegateCalls()
 
 _mathFunctionClass = new SquareFunction();
 _mathFunctionClassSealed = new SquareFunctionSealed();
 _mathFunctionInterface = _mathFunctionClassSealed;
 _calculate = _mathFunctionInterface.Calculate;
 _derivate = _mathFunctionInterface.Derivate;
 

 interface IMathFunction
 
 double Calculate(double input);
 double Derivate(double input);
 

 sealed class SquareFunctionSealed : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 class SquareFunction : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 public static void Test()
 
 var stopWatch = new Stopwatch();

 stopWatch.Start();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkEmpty(i);
 
 stopWatch.Stop();
 double emptyTime = stopWatch.ElapsedMilliseconds * msToNs;
 Console.WriteLine($"Empty Work method: emptyTime:n2");

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkInterface(i);
 
 stopWatch.Stop();
 PrintResult("Interface", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkDelegate(i);
 
 stopWatch.Stop();
 PrintResult("Delegates", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClassSealed(i);
 
 stopWatch.Stop();
 PrintResult("Sealed Class", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClass(i);
 
 stopWatch.Stop();
 PrintResult("Class", stopWatch.ElapsedMilliseconds, emptyTime);
 

 private static void PrintResult(string text, long elapsed, double emptyTime)
 
 Console.WriteLine($"text: elapsed * msToNs:n2 (elapsed * msToNs - emptyTime:n2)");
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkEmpty(int i)
 
 return 0.0;
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkInterface(int i)
 
 double f = _mathFunctionInterface.Calculate(i);
 double dv = _mathFunctionInterface.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkDelegate(int i)
 
 double f = _calculate(i);
 double dv = _derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClassSealed(int i)
 
 double f = _mathFunctionClassSealed.Calculate(i);
 double dv = _mathFunctionClassSealed.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClass(int i)
 
 double f = _mathFunctionClass.Calculate(i);
 double dv = _mathFunctionClass.Derivate(i);
 return f - (dv * 12.34534);

The idea of [MethodImpl(MethodImplOptions.NoInlining)] is to prevent the compiler from calculating the addresses of the methods before the loop if the method was inlined.

edited Dec 21 at 17:21

answered Dec 14 at 20:01

Olivier Jacot-Descombes

65.3k885136

1

Does this have any meaningful effect? At first glance, it appears you're substituting the interface vtable for a custom made one.
– Cory Nelson
Dec 14 at 20:41

Sorry. my previous version was done in Debug mode. This final verion is now in Release mode.
– Olivier Jacot-Descombes
Dec 14 at 21:49

@CoryNelson: In response to your comments, I have done some tests. It does indeed have no noticable effects.
– Olivier Jacot-Descombes
Dec 14 at 22:16

add a comment |

I would assign the methods to delegates. This allows you to still program against the interface, while avoiding the interface method resolution.

public SomeObject

 private readonly Func<double, double> _calculate;
 private readonly Func<double, double> _derivate;

 public SomeObject(IMathFunction mathFunction)
 
 _calculate = mathFunction.Calculate;
 _derivate = mathFunction.Derivate;
 

 public double SomeWork(double input, double step)
 
 var f = _calculate(input);
 var dv = _derivate(input);
 return f - (dv * step);

In response to @CoryNelson's comment I made tests so see what the impact really is. I have sealed the function class, but this seems to make absolutely no difference since my methods are not virtual.

Test Results (mean time of 100 million iterations in ns) with the empty method time subtracted in braces:

Empty Work method: 1.48

Interface: 5.69 (4.21)

Delegates: 5.78 (4.30)

Sealed Class: 2.10 (0.62)

Class: 2.12 (0.64)

What surprised me was, that I expected a much longer execution time for the interface version. Since this kind of test does not represent the exact context of the OP's code, its validity is limited.

static class TimingInterfaceVsDelegateCalls

 const int N = 100_000_000;
 const double msToNs = 1e6 / N;

 static SquareFunctionSealed _mathFunctionClassSealed;
 static SquareFunction _mathFunctionClass;
 static IMathFunction _mathFunctionInterface;
 static Func<double, double> _calculate;
 static Func<double, double> _derivate;

 static TimingInterfaceVsDelegateCalls()
 
 _mathFunctionClass = new SquareFunction();
 _mathFunctionClassSealed = new SquareFunctionSealed();
 _mathFunctionInterface = _mathFunctionClassSealed;
 _calculate = _mathFunctionInterface.Calculate;
 _derivate = _mathFunctionInterface.Derivate;
 

 interface IMathFunction
 
 double Calculate(double input);
 double Derivate(double input);
 

 sealed class SquareFunctionSealed : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 class SquareFunction : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 public static void Test()
 
 var stopWatch = new Stopwatch();

 stopWatch.Start();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkEmpty(i);
 
 stopWatch.Stop();
 double emptyTime = stopWatch.ElapsedMilliseconds * msToNs;
 Console.WriteLine($"Empty Work method: emptyTime:n2");

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkInterface(i);
 
 stopWatch.Stop();
 PrintResult("Interface", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkDelegate(i);
 
 stopWatch.Stop();
 PrintResult("Delegates", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClassSealed(i);
 
 stopWatch.Stop();
 PrintResult("Sealed Class", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClass(i);
 
 stopWatch.Stop();
 PrintResult("Class", stopWatch.ElapsedMilliseconds, emptyTime);
 

 private static void PrintResult(string text, long elapsed, double emptyTime)
 
 Console.WriteLine($"text: elapsed * msToNs:n2 (elapsed * msToNs - emptyTime:n2)");
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkEmpty(int i)
 
 return 0.0;
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkInterface(int i)
 
 double f = _mathFunctionInterface.Calculate(i);
 double dv = _mathFunctionInterface.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkDelegate(int i)
 
 double f = _calculate(i);
 double dv = _derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClassSealed(int i)
 
 double f = _mathFunctionClassSealed.Calculate(i);
 double dv = _mathFunctionClassSealed.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClass(int i)
 
 double f = _mathFunctionClass.Calculate(i);
 double dv = _mathFunctionClass.Derivate(i);
 return f - (dv * 12.34534);

The idea of [MethodImpl(MethodImplOptions.NoInlining)] is to prevent the compiler from calculating the addresses of the methods before the loop if the method was inlined.

edited Dec 21 at 17:21

answered Dec 14 at 20:01

Olivier Jacot-Descombes

65.3k885136

1

Does this have any meaningful effect? At first glance, it appears you're substituting the interface vtable for a custom made one.
– Cory Nelson
Dec 14 at 20:41

Sorry. my previous version was done in Debug mode. This final verion is now in Release mode.
– Olivier Jacot-Descombes
Dec 14 at 21:49

@CoryNelson: In response to your comments, I have done some tests. It does indeed have no noticable effects.
– Olivier Jacot-Descombes
Dec 14 at 22:16

add a comment |

I would assign the methods to delegates. This allows you to still program against the interface, while avoiding the interface method resolution.

public SomeObject

 private readonly Func<double, double> _calculate;
 private readonly Func<double, double> _derivate;

 public SomeObject(IMathFunction mathFunction)
 
 _calculate = mathFunction.Calculate;
 _derivate = mathFunction.Derivate;
 

 public double SomeWork(double input, double step)
 
 var f = _calculate(input);
 var dv = _derivate(input);
 return f - (dv * step);

In response to @CoryNelson's comment I made tests so see what the impact really is. I have sealed the function class, but this seems to make absolutely no difference since my methods are not virtual.

Test Results (mean time of 100 million iterations in ns) with the empty method time subtracted in braces:

Empty Work method: 1.48

Interface: 5.69 (4.21)

Delegates: 5.78 (4.30)

Sealed Class: 2.10 (0.62)

Class: 2.12 (0.64)

What surprised me was, that I expected a much longer execution time for the interface version. Since this kind of test does not represent the exact context of the OP's code, its validity is limited.

static class TimingInterfaceVsDelegateCalls

 const int N = 100_000_000;
 const double msToNs = 1e6 / N;

 static SquareFunctionSealed _mathFunctionClassSealed;
 static SquareFunction _mathFunctionClass;
 static IMathFunction _mathFunctionInterface;
 static Func<double, double> _calculate;
 static Func<double, double> _derivate;

 static TimingInterfaceVsDelegateCalls()
 
 _mathFunctionClass = new SquareFunction();
 _mathFunctionClassSealed = new SquareFunctionSealed();
 _mathFunctionInterface = _mathFunctionClassSealed;
 _calculate = _mathFunctionInterface.Calculate;
 _derivate = _mathFunctionInterface.Derivate;
 

 interface IMathFunction
 
 double Calculate(double input);
 double Derivate(double input);
 

 sealed class SquareFunctionSealed : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 class SquareFunction : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 public static void Test()
 
 var stopWatch = new Stopwatch();

 stopWatch.Start();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkEmpty(i);
 
 stopWatch.Stop();
 double emptyTime = stopWatch.ElapsedMilliseconds * msToNs;
 Console.WriteLine($"Empty Work method: emptyTime:n2");

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkInterface(i);
 
 stopWatch.Stop();
 PrintResult("Interface", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkDelegate(i);
 
 stopWatch.Stop();
 PrintResult("Delegates", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClassSealed(i);
 
 stopWatch.Stop();
 PrintResult("Sealed Class", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClass(i);
 
 stopWatch.Stop();
 PrintResult("Class", stopWatch.ElapsedMilliseconds, emptyTime);
 

 private static void PrintResult(string text, long elapsed, double emptyTime)
 
 Console.WriteLine($"text: elapsed * msToNs:n2 (elapsed * msToNs - emptyTime:n2)");
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkEmpty(int i)
 
 return 0.0;
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkInterface(int i)
 
 double f = _mathFunctionInterface.Calculate(i);
 double dv = _mathFunctionInterface.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkDelegate(int i)
 
 double f = _calculate(i);
 double dv = _derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClassSealed(int i)
 
 double f = _mathFunctionClassSealed.Calculate(i);
 double dv = _mathFunctionClassSealed.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClass(int i)
 
 double f = _mathFunctionClass.Calculate(i);
 double dv = _mathFunctionClass.Derivate(i);
 return f - (dv * 12.34534);

The idea of [MethodImpl(MethodImplOptions.NoInlining)] is to prevent the compiler from calculating the addresses of the methods before the loop if the method was inlined.

edited Dec 21 at 17:21

answered Dec 14 at 20:01

Olivier Jacot-Descombes

65.3k885136

I would assign the methods to delegates. This allows you to still program against the interface, while avoiding the interface method resolution.

public SomeObject

 private readonly Func<double, double> _calculate;
 private readonly Func<double, double> _derivate;

 public SomeObject(IMathFunction mathFunction)
 
 _calculate = mathFunction.Calculate;
 _derivate = mathFunction.Derivate;
 

 public double SomeWork(double input, double step)
 
 var f = _calculate(input);
 var dv = _derivate(input);
 return f - (dv * step);

In response to @CoryNelson's comment I made tests so see what the impact really is. I have sealed the function class, but this seems to make absolutely no difference since my methods are not virtual.

Test Results (mean time of 100 million iterations in ns) with the empty method time subtracted in braces:

Empty Work method: 1.48

Interface: 5.69 (4.21)

Delegates: 5.78 (4.30)

Sealed Class: 2.10 (0.62)

Class: 2.12 (0.64)

What surprised me was, that I expected a much longer execution time for the interface version. Since this kind of test does not represent the exact context of the OP's code, its validity is limited.

static class TimingInterfaceVsDelegateCalls

 const int N = 100_000_000;
 const double msToNs = 1e6 / N;

 static SquareFunctionSealed _mathFunctionClassSealed;
 static SquareFunction _mathFunctionClass;
 static IMathFunction _mathFunctionInterface;
 static Func<double, double> _calculate;
 static Func<double, double> _derivate;

 static TimingInterfaceVsDelegateCalls()
 
 _mathFunctionClass = new SquareFunction();
 _mathFunctionClassSealed = new SquareFunctionSealed();
 _mathFunctionInterface = _mathFunctionClassSealed;
 _calculate = _mathFunctionInterface.Calculate;
 _derivate = _mathFunctionInterface.Derivate;
 

 interface IMathFunction
 
 double Calculate(double input);
 double Derivate(double input);
 

 sealed class SquareFunctionSealed : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 class SquareFunction : IMathFunction
 
 public double Calculate(double input)
 
 return input * input;
 

 public double Derivate(double input)
 
 return 2 * input;
 
 

 public static void Test()
 
 var stopWatch = new Stopwatch();

 stopWatch.Start();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkEmpty(i);
 
 stopWatch.Stop();
 double emptyTime = stopWatch.ElapsedMilliseconds * msToNs;
 Console.WriteLine($"Empty Work method: emptyTime:n2");

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkInterface(i);
 
 stopWatch.Stop();
 PrintResult("Interface", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkDelegate(i);
 
 stopWatch.Stop();
 PrintResult("Delegates", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClassSealed(i);
 
 stopWatch.Stop();
 PrintResult("Sealed Class", stopWatch.ElapsedMilliseconds, emptyTime);

 stopWatch.Restart();
 for (int i = 0; i < N; i++) 
 double result = SomeWorkClass(i);
 
 stopWatch.Stop();
 PrintResult("Class", stopWatch.ElapsedMilliseconds, emptyTime);
 

 private static void PrintResult(string text, long elapsed, double emptyTime)
 
 Console.WriteLine($"text: elapsed * msToNs:n2 (elapsed * msToNs - emptyTime:n2)");
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkEmpty(int i)
 
 return 0.0;
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkInterface(int i)
 
 double f = _mathFunctionInterface.Calculate(i);
 double dv = _mathFunctionInterface.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkDelegate(int i)
 
 double f = _calculate(i);
 double dv = _derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClassSealed(int i)
 
 double f = _mathFunctionClassSealed.Calculate(i);
 double dv = _mathFunctionClassSealed.Derivate(i);
 return f - (dv * 12.34534);
 

 [MethodImpl(MethodImplOptions.NoInlining)]
 private static double SomeWorkClass(int i)
 
 double f = _mathFunctionClass.Calculate(i);
 double dv = _mathFunctionClass.Derivate(i);
 return f - (dv * 12.34534);

The idea of [MethodImpl(MethodImplOptions.NoInlining)] is to prevent the compiler from calculating the addresses of the methods before the loop if the method was inlined.

edited Dec 21 at 17:21

answered Dec 14 at 20:01

Olivier Jacot-Descombes

65.3k885136

edited Dec 21 at 17:21

answered Dec 14 at 20:01

Olivier Jacot-Descombes

65.3k885136

answered Dec 14 at 20:01

Olivier Jacot-Descombes

65.3k885136

answered Dec 14 at 20:01

Olivier Jacot-Descombes

65.3k885136

1

Does this have any meaningful effect? At first glance, it appears you're substituting the interface vtable for a custom made one.
– Cory Nelson
Dec 14 at 20:41

Sorry. my previous version was done in Debug mode. This final verion is now in Release mode.
– Olivier Jacot-Descombes
Dec 14 at 21:49

@CoryNelson: In response to your comments, I have done some tests. It does indeed have no noticable effects.
– Olivier Jacot-Descombes
Dec 14 at 22:16

add a comment |

1

Does this have any meaningful effect? At first glance, it appears you're substituting the interface vtable for a custom made one.
– Cory Nelson
Dec 14 at 20:41

Sorry. my previous version was done in Debug mode. This final verion is now in Release mode.
– Olivier Jacot-Descombes
Dec 14 at 21:49

@CoryNelson: In response to your comments, I have done some tests. It does indeed have no noticable effects.
– Olivier Jacot-Descombes
Dec 14 at 22:16

Does this have any meaningful effect? At first glance, it appears you're substituting the interface vtable for a custom made one.
– Cory Nelson
Dec 14 at 20:41

Sorry. my previous version was done in Debug mode. This final verion is now in Release mode.
– Olivier Jacot-Descombes
Dec 14 at 21:49

@CoryNelson: In response to your comments, I have done some tests. It does indeed have no noticable effects.
– Olivier Jacot-Descombes
Dec 14 at 22:16

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

搜尋此網誌

mjhjmtu