Quickly creating a sparse array
The motivation for my question is that I'm creating very large, very sparse symmetric matrices and finding the largest eigenvalue. (Think upwards of $10^5 \times 10^5$ matrices, with at most $10^2$ nonzero entries in each row. However, to give a workable example, below I'll use $10^4 \times 10^4$ matrices with $30$ nonzero entries per row.) The entries are mostly independent of one another, although there are some time savings in working with one row at a time. (In the example below, I model this type of behavior by having the constant $c$ in the computation depend only on the row index. In the actual work I do, the dependence isn't quite that simple.) By symmetry, we may focus on the part of the matrix above the main diagonal. (For simplicity, I'm assuming the main diagonal is zero in the example below.)
So, to give a toy example, consider the following code:
M = Table[0.0, {i, 1, 10000}, {j, 1, 10000}];
Do[
 c = RandomInteger[{1, 10}];
 PossibleColumns = RandomSample[Range[i + 1, 10000], Min[10000 - i - 1, 30]];
 Do[
  M[[PossibleColumns[[j]], i]] = c/RandomInteger[{1, 10}];
  M[[i, PossibleColumns[[j]]]] = c/RandomInteger[{1, 10}],
  {j, 1, Length[PossibleColumns]}
  ],
 {i, 1, 9999}
 ];
Eigenvalues[M, 1];
The creation of the large array takes a couple of seconds and seems to fill up about 1 GB of RAM. (For the larger matrices I want to build, I quickly run out of RAM.) It takes about 3 seconds to update the nonzero entries in the matrix $M$. Finally, the eigenvalue computation takes about 3 minutes.
If I replace the first line with the line
M = SparseArray[{}, {10000, 10000}, 0.0];
and rerun the entire code, then there are no RAM problems (even for much, much larger matrices). The eigenvalue step is almost instantaneous. However, the updating takes over an hour! I was very surprised that the updating took so long, so I made the following intermediate version, which, rather than updating each entry of $M$ independently, updates each row of $M$ once.
M = SparseArray[{}, {10000, 10000}, 0.0];
Do[
 c = RandomInteger[{1, 10}];
 jRow = SparseArray[{}, {10000}, 0.0];
 PossibleColumns = RandomSample[Range[i + 1, 10000], Min[10000 - i - 1, 30]];
 Do[
  jRow[[PossibleColumns[[j]]]] = c/RandomInteger[{1, 10}],
  {j, 1, Length[PossibleColumns]}
  ];
 M[[i]] = jRow,
 {i, 1, 9999}
 ];
M = M + Transpose[M];
Eigenvalues[M, 1];
This keeps the RAM savings and the fast eigenvalue computation, and finishes updating $M$ in a little over 4.5 minutes.
My question is this: Is there a way to keep (most of) the RAM savings of using a sparse array, along with the time savings in the eigenvalue computation, while also reducing the time used to update $M$ down to the timing in the first example (which took only 3 seconds)?
Tags: performance-tuning, sparse-arrays
If you can express the values of the non-zero terms as a function of the position indices, you should be able to construct the SparseArray from the ground up, by giving a list of rules, or by using patterns and conditions. That should be MUCH better than modifying them element-wise.
– MarcoB
Feb 27 at 20:27
@MarcoB Yes, I have a function $f(i,j)$ that gives the value of the matrix in the $(i,j)$ coordinate. Moreover, I have a function $g(i)$ that gives me a list of those $j$ such that the $(i,j)$ entry is nonzero. (In the toy example above, you would replace "PossibleColumns" by $g(i)$, and you would replace "c/RandomInteger[{1,10}]" with $f(i,j)$.) I don't know how to turn that into a faster population of the sparse array though; any help with the syntax would be appreciated!
– Pace Nielsen
Feb 27 at 20:45
Can you share those functions, perhaps expressed both in plain language, and in Mathematica code? Just in case one of the two is more readable than the other.
– MarcoB
Feb 27 at 21:00
Changing elements of sparse arrays is slow because the whole array must be re-built after each change. Do not assign elements in a loop!! Instead, generate a list like {i, j} -> value first, then build the sparse array in a single step.
– Szabolcs
Feb 27 at 21:11
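For concreteness, here is a minimal sketch of the rule-based construction Szabolcs describes, written in terms of the hypothetical functions f[i, j] and g[i] from the earlier comment (these names are assumptions for illustration, not code from the thread):
(* Collect one rule {i, j} -> f[i, j] for every nonzero upper-triangular position, *)
(* then hand all rules to SparseArray in a single call and symmetrize. *)
n = 10000;
rules = Flatten @ Table[{i, j} -> f[i, j], {i, 1, n - 1}, {j, g[i]}];
A = SparseArray[rules, {n, n}, 0.];
M = A + Transpose[A];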
2 Answers
You can use the first documented usage for SparseArray:
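That usage is, paraphrasing the documentation:
SparseArray[{pos1 -> val1, pos2 -> val2, ...}]
which yields a sparse array with the values vali at positions posi (all other entries take the background value).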
So what you want to do is collect all of the rules {i, j} -> val before passing them to SparseArray. There is already a built-in method for efficiently collecting a set of rules with no duplicate keys, namely AssociateTo.
So with a slight modification of the OP's code:
data = <||>;
Do[
 c = RandomInteger[{1, 10}];
 PossibleColumns = RandomSample[Range[i + 1, 10000], Min[10000 - i - 1, 30]];
 Do[
  AssociateTo[data, {PossibleColumns[[j]], i} -> c/RandomInteger[{1, 10}]];
  AssociateTo[data, {i, PossibleColumns[[j]]} -> c/RandomInteger[{1, 10}]],
  {j, 1, Length[PossibleColumns]}
  ],
 {i, 1, 9999}
 ];
The above runs in ~3.5 seconds on my machine, and constructing the SparseArray is very fast as well:
SparseArray[Normal[data]] // AbsoluteTiming
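A usage sketch, assuming the data association built above, to finish the computation as in the question (the explicit dimensions and the call to N are assumptions on my part, not taken from the answer):
M = SparseArray[Normal[data], {10000, 10000}, 0.];
Eigenvalues[N[M], 1]  (* largest eigenvalue, as in the original code *)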
This is both clean and very fast. I didn't realize that "AssociateTo" for association lists was so much faster than "AppendTo" for normal lists. Thanks!
– Pace Nielsen
Feb 27 at 23:06
Glad I can help!
– Jason B.
Feb 27 at 23:11
Let's assume we are given the following data:
n = 100000;
k = 100;
g[i_] := RandomSample[i + 1 ;; n, Min[n - i - 1, k]];
f[i_, jlist_] := RandomReal[{1, 10}, Length[jlist]];
rowcolumnindices = g /@ Range[1, n - 1];
rowlengths = Length /@ rowcolumnindices;
rowvalues = MapIndexed[
   {jlist, i} \[Function] f[i[[1]], jlist],
   rowcolumnindices
   ];
Probably the most efficient way to populate the sparse array with this data is to prepare the CRS (compressed row storage) data by hand and to use the following undocumented constructor:
First @ AbsoluteTiming[
  orderings = Ordering /@ rowcolumnindices;
  columnindices = Partition[Join @@ MapThread[Part, {rowcolumnindices, orderings}], 1];
  rowpointers = Accumulate[Join[{0}, Length /@ rowcolumnindices, {0}]];
  values = Join @@ MapThread[Part, {rowvalues, orderings}];
  M = # + Transpose[#] &[
    SparseArray @@ {Automatic, {n, n}, 0., {1, {rowpointers, columnindices}, values}}
    ];
  ]
1.53083
I used machine real values here (not rational numbers) because they will be processed much faster, in particular in Eigenvalues.
The CRS format assumes that the column indices for each row are in ascending order. This is why we have to sort the column indices and the corresponding values first (this explains the extra fuzz around orderings).
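As an aside, the same CRS components can be read back from the assembled SparseArray through its named properties, which is a convenient way to check the layout (a sketch assuming the standard property names "RowPointers", "ColumnIndices", and "NonzeroValues"):
M["RowPointers"]    (* accumulated row lengths *)
M["ColumnIndices"]  (* column indices, row by row, in ascending order *)
M["NonzeroValues"]  (* the corresponding stored values *)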
In the toy example I gave above, the entries are chosen randomly. In the actual computation I care about, the entries are determined by the coordinates. I made the toy example different in this way for two reasons: (1) the deterministic function is very complicated, and (2) I wanted to easily create a large matrix to help illustrate the situation. So, I'm having trouble converting your beautiful ideas to my situation, because "rowvalues" is not so easily created. My problem really boils down to trying to put all that data in a list quickly, as you have done with rowvalues.
– Pace Nielsen
Feb 27 at 21:38
Yes, I assumed that you can compute the values of each row before the actual matrix is constructed. I tried to address the functions f and g in my recent edit. The only difference of my function f to yours is that it expects a whole list of j's as a second argument, so that it creates all values of a row at once. Does this help?
– Henrik Schumacher
Feb 27 at 21:50
This looks great! I'm still trying to understand it all. I'll let you know soon if I have any further questions.
– Pace Nielsen
Feb 27 at 21:54
@PaceNielsen Just in case you haven't noticed it yet: I used the sizes of n = 100000 and k = 100 as you indicated for your goal (instead of n = 10000 and k = 30). By direct comparison, my approach is more than 20 times faster than collecting the array rules with AssociateTo and applying SparseArray on these rules. Might be relevant for your application...
– Henrik Schumacher
Mar 2 at 20:03
I'll give your method another look. (I did code it up and the other was faster [because of the nature of how I was computing my functions], but I might have done it in a bad way.)
– Pace Nielsen
Mar 2 at 22:38