The case of source code backup


Never before have organizations dealt with so much information – or been more concerned about it falling into the wrong hands. This concern applies to all data, but especially to the source code they rely on to run their processes.

Companies and individuals rely on platforms such as GitHub, GitLab, and Bitbucket to store and manage their source code and keep their development projects running. These platforms are hugely popular: GitHub has over 73 million developers and 200 million repositories, GitLab estimates 30 million registered users, and Bitbucket reported 10 million users in 2019.

If security teams aren’t worried about the source code stored on these platforms, they should be, because chances are their developers keep at least a few projects there. Recent attacks have highlighted the threat: a 2019 ransomware campaign hit Git repositories on GitHub, GitLab, and Bitbucket alike, wiping their contents and replacing them with a ransom note. There is also the risk of downtime, as when GitHub was down for at least two hours in June 2020.

The cost of losing source code is high, says John Bambenek, principal threat hunter at Netenrich.

“Everything essential to an organization should be backed up,” he says. “A good rule of thumb is, ‘Can the business continue to operate without it?’ and if the answer is no, there must be a backup plan.”

There are many reasons why a company may not think about backing up its source code. Partly it may want to save money, and partly it may feel invulnerable to attacks that would compromise that code. There’s also the reality that backups cost money with no tangible benefit until they’re needed, notes Mark Loveless, principal security engineer at GitLab.

“For the most part, you’re just doing something that you don’t see an immediate gain from,” he says. “That’s how backups work. You don’t see an immediate gain, and you never want to see an immediate gain on backups because you hope everything works and you never have to resort to them. But you need a plan for that.”

Awareness is another issue. Some people may not back up their source code because they don’t think they have to, he adds. GitLab, GitHub, and Bitbucket, like major cloud providers, follow a “shared responsibility model” in which users and service providers share responsibility for protecting their information.

GitLab backs up to its own servers “almost constantly,” Loveless says, but many people have their own instance of GitLab running on their own private cloud space or on a physical server in their data center. In these cases, users should consider which cloud provider they use, what kind of backups they keep, and when they want their data backed up.

“Git…since it stores a history of code saves and you can perform rollbacks to a previous version of code, [users] tend to think there’s a backup,” says Loveless. “There is, as far as revisions and code changes…but those are stored in a database [and] data files, and these should be backed up.”

A working copy of the repository on each computer should not be considered a backup because it usually only contains the source code and not the issues, comments, pull requests, and other metadata associated with it. It’s common to think that a Git repository or other version control is enough, adds Taylor Gulley, senior application security consultant at nVisium. Version control, while very useful, still only has your code stored in one centralized location.

“Unless your disaster recovery plan is to extract code from a developer’s local machine – assuming there are any that survive the incident that destroyed the server – proper backups are essential,” says Gulley.
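The difference between a working copy and a Git-level backup can be shown with a minimal Python sketch. It assumes the `git` command-line tool is installed and has read access to the repository. A `git clone --mirror` copies every branch and tag into a bare repository. `git bundle create ... --all` then packs that mirror into a single file that is easy to move offsite. The function names and paths are hypothetical. The sketch still captures only what Git itself stores. Issues, comments, and pull requests live in the platform’s database and need a separate export.

```python
import subprocess
from pathlib import Path


def mirror_clone_command(repo_url: str, dest: Path) -> list[str]:
    # A mirror clone copies every ref (all branches and tags) into a bare
    # repository, unlike a working copy that may track only a few branches.
    return ["git", "clone", "--mirror", repo_url, str(dest)]


def bundle_command(mirror_dir: Path, bundle_path: Path) -> list[str]:
    # Pack every ref of the mirror into one file for offsite storage.
    # The bundle path is made absolute because "-C" changes git's working
    # directory, so a relative path would land inside the mirror.
    return ["git", "-C", str(mirror_dir), "bundle", "create",
            str(bundle_path.resolve()), "--all"]


def backup_repo(repo_url: str, workdir: Path) -> Path:
    """Mirror a repository, bundle it, and return the bundle's path.

    Hypothetical helper: it captures Git data only, not the issues,
    comments, or pull requests stored by the hosting platform.
    """
    mirror_dir = workdir / "repo.git"
    bundle_path = workdir / "repo.bundle"
    subprocess.run(mirror_clone_command(repo_url, mirror_dir), check=True)
    subprocess.run(bundle_command(mirror_dir, bundle_path), check=True)
    return bundle_path
```

For example, `backup_repo("https://example.com/org/project.git", Path("backups/project"))` would leave `repo.bundle` in that directory. The URL is a placeholder. Running `git clone repo.bundle restored` later rebuilds a full repository from the file.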

What businesses need to know about the process
Source code backups can take many forms. Businesses can choose to manage their own backups and bear the associated infrastructure, process, and maintenance costs. While this gives them more control over their data, it may cost more in the long run because of the resources spent on upkeep.

Manual backups also involve technical challenges. It’s difficult to keep all assets consistent and restorable across Git platforms, because each vendor has its own API, processes, and quirks. API rate limits pose another hurdle: a Git backup typically sends many requests to the provider’s API, and providers cap how many requests a client can send within a given time window.
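One common way to stay under such limits is to pace requests on the client side. The Python sketch below uses a token bucket. The backup job gets a budget of `max_requests` per `period` seconds and sleeps once that budget is spent. The class name and the 5-requests-per-second figure in the usage note are illustrative assumptions, not any provider’s published limits. A real tool would also respect whatever rate-limit responses the provider sends back.

```python
import time
from typing import Callable


class RequestPacer:
    """Token bucket: at most `max_requests` calls per `period` seconds.

    Hypothetical helper. The clock and sleep functions are injectable so
    the pacing can be tested without actually waiting.
    """

    def __init__(self, max_requests: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_requests <= 0 or period <= 0:
            raise ValueError("max_requests and period must be positive")
        self.capacity = float(max_requests)
        self.rate = max_requests / period  # tokens refilled per second
        self.tokens = float(max_requests)  # start with a full budget
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self) -> None:
        """Wait until one request may be sent, then spend one token."""
        self._refill()
        if self.tokens < 1:
            # Sleep just long enough for one token to accumulate.
            self.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1
```

Creating `pacer = RequestPacer(max_requests=5, period=1.0)` and calling `pacer.acquire()` before each API call spaces a large backup run evenly. The job then avoids sending bursts that are likely to be throttled.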

Alternatively, businesses can hand backup management to a third party. In many cases, cloud services can help with this, notes Bambenek. Organizations can turn to a service such as GitProtect, a tool designed to back up code hosted on GitHub, GitLab, and Bitbucket.

“The need was found within our own company,” GitProtect Product Development Manager Greg Bak says of the creation of the product. “We had internal scripts to protect these repositories, but no one was able to guarantee that we would always be able to restore these repositories… that they are properly protected, that our backups are tested. So we decided to [build] this.”

GitProtect is available in two models: backup as a service and on-premises, so organizations can install it locally or deploy it to the public cloud. The product’s goal is not only to protect source code, but also all associated metadata needed to maintain a repository’s consistency, such as comments, issues, and CI/CD tasks, Bak explains.

Beyond attacks targeting repositories and outages at these platforms, other threats can compromise source code. Human error and unwanted changes to the code itself could also force a restore from backup to get processes running again, he adds.

Backup Best Practices
However you decide to back up your source code, GitLab’s Loveless advises bringing a security expert into the room.

“Invest in security people,” he says. “If you can get people in there, experienced people who know how to do this, invest in people and you should get much better results.”

Experts also advise keeping backups encrypted and stored in a safe place. If you are running a multi-cloud environment, rotate backups offsite or off-system. Gulley recommends keeping a few copies onsite and one offsite, in case the primary location is compromised. Automated backup processes and accounts should not be able to modify or delete previous backups.
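These recommendations can be written down as checks that a backup plan either passes or fails. The Python sketch below models each stored copy with a hypothetical `BackupCopy` record and reports any shortfalls. The field names are assumptions made for illustration. So is the default of two onsite copies, which stands in for “a few copies onsite”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One stored copy of a backup (hypothetical model for illustration)."""
    location: str            # e.g. "onsite-nas", "offsite-cloud"
    offsite: bool            # stored away from the primary site?
    encrypted: bool          # encrypted at rest?
    writer_can_delete: bool  # can the automated backup account modify or delete it?


def plan_problems(copies: list[BackupCopy], min_onsite: int = 2) -> list[str]:
    """Return every way the plan falls short; an empty list means it passes."""
    problems = []
    onsite = [c for c in copies if not c.offsite]
    if len(onsite) < min_onsite:
        problems.append(f"only {len(onsite)} onsite copies (want {min_onsite})")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    for c in copies:
        if not c.encrypted:
            problems.append(f"{c.location}: not encrypted")
        if c.writer_can_delete:
            problems.append(f"{c.location}: backup account can modify or delete it")
    return problems
```

Running a check like this whenever storage targets change helps keep the plan from drifting. The immutability flag matters most. If the account that writes backups can also delete them, an attacker who compromises that account can destroy the backups too.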

All experts agree that making source code backups is not enough. It’s also important to test them and make sure they work: if they don’t, you don’t want to find out when you need them. Test the process for accessing and using backups to ensure that you can use them and that everyone involved understands their role in the event of an attack, failure, or compromise.
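One small, automatable part of such a test is checking that a backup restores intact. The Python sketch below uses only the standard library. It archives a directory together with a manifest of SHA-256 digests. It then restores the archive into a scratch directory and reports any file that is missing or altered. The function names and the `MANIFEST.json` layout are hypothetical. The check covers file integrity only and says nothing about whether people know their roles during an incident.

```python
import hashlib
import io
import json
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def create_backup(src: Path, archive: Path) -> None:
    """Archive every file under `src` plus a MANIFEST.json of SHA-256 digests."""
    manifest = {p.relative_to(src).as_posix(): sha256_of(p)
                for p in sorted(src.rglob("*")) if p.is_file()}
    with tarfile.open(archive, "w:gz") as tar:
        for rel in manifest:
            tar.add(src / rel, arcname=rel)
        data = json.dumps(manifest).encode()
        info = tarfile.TarInfo("MANIFEST.json")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


def verify_restore(archive: Path) -> list[str]:
    """Restore into a scratch directory; return files missing or changed."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp)
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safe "data" extraction filter where this Python has it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
        manifest = json.loads((dest / "MANIFEST.json").read_text())
        return [rel for rel, digest in manifest.items()
                if not (dest / rel).is_file() or sha256_of(dest / rel) != digest]
```

An empty list from `verify_restore` means every file came back byte-for-byte. Running it on a schedule, rather than only after an incident, is what turns a backup into a tested one.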

