How to back up GitLab at large scale?
When asking GitLab support how to do a 3 TB backup of an on-premise GitLab, they reply: use our tool that produces a tarball.
This just seems wrong to me on all levels. That tarball contains the Postgres dump, Docker images, repo data, Git LFS data, config and so on. Backing up terabytes of static data together with kilobytes of very dynamic data doesn't seem right. And then comes the issue that we want to do a backup every hour.
Question
I'd really like to know from others how they do it, to get a consistent backup.
ZFS on Linux would be fine with me, if that is part of the solution.
linux backup postgresql zfs gitlab
edited Feb 5 at 14:51
asked Feb 5 at 13:57
Sandra
3
Why is this wrong? You back up your GitLab completely to restore it completely. I don't think this is wrong. Of course it uses much more space than, say, incremental backups, but... I wouldn't care about backup size.
– Lenniey
Feb 5 at 14:05
3
Having a backup every hour is not unheard of, but it is impossible to make a 3 TB backup in less than an hour with their approach. And backups for just one day would be ~100 TB, where there might only be 10 MB of changes to the data.
– Sandra
Feb 5 at 14:11
OK, this is a different question, not about the backup in general but about frequent backups.
– Lenniey
Feb 5 at 14:13
5
In their official docs they even mention their method as being slow and suggest alternatives: "If your GitLab server contains a lot of Git repository data you may find the GitLab backup script to be too slow. In this case you can consider using filesystem snapshots as part of your backup strategy."
I can't speak from experience, though. But I may have to include something like this soon...
– Lenniey
Feb 5 at 14:19
GitLab has options in the config file and backup flags that will allow you to exclude sections, or go so far as to store images and artifacts on an object store.
– ssube
Feb 5 at 19:28
2 Answers
For such a short time between backups (1 h), your best bet is to rely on filesystem-level snapshots and send/recv support.
If using ZoL (ZFS on Linux) is not a problem in your environment, I would strongly advise using it. ZFS is a very robust filesystem and you will really like all the extras (e.g. compression) it offers. When coupled with sanoid/syncoid, it can provide a very strong backup strategy. The main disadvantage is that it is not included in the mainline kernel, so you need to install/update it separately.
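A minimal sketch of the snapshot-and-ship pattern described above, using plain zfs commands; the dataset name, backup host and schedule are assumptions, and sanoid/syncoid automate exactly this bookkeeping:

    #!/bin/sh
    # Hourly snapshot of the dataset holding GitLab data (assumed: tank/gitlab),
    # then ship only the blocks changed since the previous snapshot to a backup
    # host that already holds a replica of the same dataset.
    set -eu

    DATASET=tank/gitlab          # assumption: adjust to your pool/dataset
    REMOTE=backup@backuphost     # assumption: SSH target with the replica
    NOW=$(date +%Y%m%d-%H00)

    zfs snapshot "${DATASET}@hourly-${NOW}"

    # Most recent snapshot already present on the receiver (the very first run
    # needs a full send instead, i.e. zfs send without -i).
    LAST=$(ssh "${REMOTE}" zfs list -H -t snapshot -o name -s creation -d 1 "${DATASET}" \
           | tail -n 1 | cut -d@ -f2)

    # Incremental send: only the delta since ${LAST} crosses the wire.
    zfs send -i "@${LAST}" "${DATASET}@hourly-${NOW}" \
        | ssh "${REMOTE}" zfs receive -F "${DATASET}"

With hourly snapshots the per-run transfer is roughly the amount of data that actually changed, not the full 3 TB.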
Alternatively, if you really need to restrict yourself to mainline-included stuff, you can use BTRFS. But be sure to understand its (many) drawbacks and pain points.
Finally, an alternative solution is to use lvmthin to take regular snapshots (e.g. with snapper), relying on third-party tools (e.g. bdsync, blocksync, etc.) to copy/ship deltas only.
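A hedged sketch of that lvmthin route; the volume group and LV names are assumptions, and snapper (or a simple cron job) would handle creation and rotation for you:

    #!/bin/sh
    # Take a thin snapshot of the LV backing GitLab data (assumed: vg0/gitlab is a
    # thin LV) so a delta-copy tool such as bdsync can read a stable block device.
    set -eu

    STAMP=$(date +%Y%m%d-%H00)

    # Thin snapshots need no preallocated size; they share the origin's thin pool.
    lvcreate --snapshot --name "gitlab-${STAMP}" vg0/gitlab

    # Thin snapshots are created with the activation-skip flag set; activate with -K
    # so /dev/vg0/gitlab-${STAMP} becomes readable for bdsync/blocksync.
    lvchange -ay -K "vg0/gitlab-${STAMP}"

From there, bdsync or blocksync compares the snapshot device against the previous copy on the backup host and ships only the changed blocks.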
A different approach would be to have two replicated machines (via DRBD) where you take independent snapshots via lvmthin.
answered Feb 5 at 14:59
shodanshok
What about Postgres? Would it work to stop GitLab and Postgres for a minute, so a consistent snapshot could be made? Ideally it would be great if Postgres could be put in a read-only mode while the snapshot is made.
– Sandra
Feb 5 at 15:07
4
@Sandra restoring from a filesystem snapshot should appear to PostgreSQL (and any other properly written database) as a generic "host crash" scenario, triggering its own recovery procedure (i.e. committing to the main database any partially written pages). In other words, you do not need to put Postgres into read-only mode when taking snapshots.
– shodanshok
Feb 5 at 16:01
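Not required, per the comment above, but if you want a restored snapshot to replay less WAL on startup you can force a CHECKPOINT just before snapshotting; this is only a sketch, and the connection details (superuser account, default GitLab database name) are assumptions:

    #!/bin/sh
    # Optional: flush dirty buffers so a snapshot restored later recovers faster.
    # Assumptions: a stock PostgreSQL install with a 'postgres' superuser OS account
    # and GitLab's default database name; on Omnibus installs the gitlab-psql
    # wrapper plays this role instead.
    set -eu

    sudo -u postgres psql -d gitlabhq_production -c "CHECKPOINT;"
    # ...take the filesystem snapshot immediately afterwards.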
I would review what you are backing up and possibly use a "multi-path" approach. For example, you could back up the Git repositories by constantly running Git pulls on a backup server. That would copy only the diff and leave you with a second copy of all Git repositories. Presumably you could detect new repos with the API.
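A minimal sketch of that pull-based mirroring, assuming the standard GitLab REST API and a token with read access; the URL, token and target directory are placeholders:

    #!/bin/sh
    # Keep bare mirrors of every project on a backup host.
    # Assumptions: jq is installed; pagination is ignored for brevity (real use
    # would loop over the paginated /api/v4/projects endpoint); cloning private
    # projects needs credentials on the clone URL or SSH, omitted here.
    set -eu

    GITLAB_URL=https://gitlab.example.com      # placeholder
    TOKEN=glpat-xxxxxxxxxxxxxxxxxxxx           # placeholder access token
    MIRROR_DIR=/backup/git-mirrors

    mkdir -p "${MIRROR_DIR}"

    curl -sf --header "PRIVATE-TOKEN: ${TOKEN}" \
         "${GITLAB_URL}/api/v4/projects?simple=true&per_page=100" |
    jq -r '.[].path_with_namespace' |
    while read -r repo; do
        dest="${MIRROR_DIR}/${repo}.git"
        if [ -d "${dest}" ]; then
            git -C "${dest}" remote update --prune   # fetch only what changed
        else
            mkdir -p "$(dirname "${dest}")"
            git clone --mirror "${GITLAB_URL}/${repo}.git" "${dest}"
        fi
    done

As the comments below point out, this mirrors whatever the server currently holds, so snapshot the mirror directory as well if you need protection against forced pushes or deletions.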
And use the "built-in" backup procedures to back up the issues, etc. I doubt that the 3 TB comes from this part, so you would be able to do backups very often at very little cost. You could also set up the PostgreSQL database as a warm standby with replication.
Possibly your 3 TB comes from container images in the Docker registry. Do you need to back those up? If so, then there may be a better approach just for that.
Basically, I would recommend really looking at what it is that makes up your backup, and backing up the data in various parts.
Even the backup tool from GitLab has options to include/exclude certain parts of the system, such as the Docker Registry.
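For that last point, a hedged example of the upstream tool's exclude mechanism; the SKIP variable is a documented GitLab backup option, while the particular components skipped here are just an illustration:

    # Back up only the small, fast-changing parts and skip the bulky components
    # that are handled elsewhere. On older GitLab versions the command is
    # `gitlab-rake gitlab:backup:create` instead of `gitlab-backup create`.
    sudo gitlab-backup create SKIP=repositories,registry,artifacts,lfs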
answered Feb 5 at 15:08
ETL
1
Git pulls are not a perfect incremental backup. git push --force will either break the backups or erase history from them, depending on how it's implemented.
– dn3s
Feb 6 at 3:23
@dn3s that's why you always disable git push --force on the main repository. If someone wants to change history they can make their own fork, and accept all the risks it brings.
– charlie_pl
Feb 6 at 6:39
2
That might be fine for replication, but you don't want your backups' integrity to rely on correct application behavior. What happens if there's a bug in the application, or it's misconfigured down the road? What if your server is compromised by a malicious user? If your application has the ability to remove content from the backup host, much of the value of incremental remote backups is lost.
– dn3s
Feb 6 at 7:46