# backup.sh ~ Descartes Underwriting technical test for DevOps

The script takes as input a repository URL, a branch, a destination directory and, optionally, a number of commits to back up, a verbose flag and a debug flag.

It clones a git repository, lists all commits in the given branch (from oldest to most recent) and, for each commit SHA, checks whether it has already been backed up (that is, whether a `${DATA_DIR}/<commit sha>` directory exists). If not, it checks out every file mentioned in the commit and dumps its state at that commit into the per-commit directory.

The included workflow that actually performs the backup can be run manually through the GitHub UI.
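
The per-commit loop described above can be sketched roughly as follows. This is not the actual content of `scripts/backup.sh`: the function name `backup_commits`, its arguments and the choice of `git diff-tree` / `git show` are assumptions made for illustration. It also assumes the repository is already cloned, that the limit counts newly backed-up commits, and that file names contain no characters git would quote.

```sh
#!/usr/bin/env bash
# Hypothetical sketch of the backup loop; names are illustrative only.

# backup_commits <repo_dir> <branch> <data_dir> [limit]
backup_commits() {
  local repo_dir="$1" branch="$2" data_dir="$3" limit="${4:-0}"
  local done_count=0 sha dest

  # Commit SHAs contain no whitespace, so word splitting is safe here.
  # --reverse lists commits from oldest to most recent.
  for sha in $(git -C "$repo_dir" rev-list --reverse "$branch"); do
    dest="${data_dir}/${sha}"

    # A commit counts as backed up once its directory exists.
    if [ -d "$dest" ]; then continue; fi
    mkdir -p "$dest"

    # List files added, copied, modified or renamed by the commit.
    # Deleted files are filtered out; merge commits list nothing by default.
    git -C "$repo_dir" diff-tree --root --no-commit-id --name-only -r \
        --diff-filter=ACMR "$sha" |
      while IFS= read -r file; do
        mkdir -p "$(dirname "${dest}/${file}")"
        # Dump the file content as it was at this commit.
        git -C "$repo_dir" show "${sha}:${file}" > "${dest}/${file}"
      done

    echo "new commit: ${sha}"
    done_count=$((done_count + 1))
    if [ "$limit" -gt 0 ] && [ "$done_count" -ge "$limit" ]; then break; fi
  done

  echo "done: ${done_count}"
}
```

Under these assumptions, a call such as `backup_commits ./devops-technical-test-data 01-01-2022-test "$(pwd)/data" 5` would mirror the `-n 5` run shown in the usage section.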

## Usage

```sh
$ /bin/bash scripts/backup.sh -r <repository> -b <branch> -d </path/to/data> [-n <commit limit>] [-v] [-x]
```

Ex:

```sh
$ /bin/bash scripts/backup.sh -r https://github.com/descartes-underwriting/devops-technical-test-data.git -b 01-01-2022-test -d $(pwd)/data -n 5
Cloning into 'devops-technical-test-data'...
remote: Enumerating objects: 21265, done.
...
new commit: 282180fe7e5d9cbf297f2f0ef813cffe60ce2328
new commit: 46fe26c9dcf2354a0ed3f304ed6818de9606f7b5
new commit: 21e5331d1c0256701bb90cf017e519d54a88f618
new commit: 47998b5317e66b3bd456cfb07268c93e223704f2
new commit: 7c5aebc1feeef4eaf19083019547457b8cf3fc3d
done: 5

$ ls -l data/
total 0
drwxr-xr-x 1 patrick chicac 8 Jan 1 09:28 21e5331d1c0256701bb90cf017e519d54a88f618
drwxr-xr-x 1 patrick chicac 18 Jan 1 09:28 282180fe7e5d9cbf297f2f0ef813cffe60ce2328
drwxr-xr-x 1 patrick chicac 14 Jan 1 09:28 46fe26c9dcf2354a0ed3f304ed6818de9606f7b5
drwxr-xr-x 1 patrick chicac 14 Jan 1 09:28 47998b5317e66b3bd456cfb07268c93e223704f2
drwxr-xr-x 1 patrick chicac 14 Jan 1 09:28 7c5aebc1feeef4eaf19083019547457b8cf3fc3d

$
```

Note that files deleted by a commit are ignored (no file is created for them). However, so that every commit ID stays tracked, empty directories can still be created and archived by adding a `.gitkeep` file.

See `.github/workflows/backup.yaml` for a usage sample.
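
One possible way to apply the `.gitkeep` idea mentioned above is sketched below. The helper name `keep_empty_dirs` is hypothetical and is not taken from the repository; the sketch also assumes a `find` that supports `-mindepth` and `-empty` (GNU and BSD `find` both do) and directory names without newlines, which holds for commit SHAs.

```sh
#!/usr/bin/env bash
# Hypothetical helper: drop a .gitkeep into every empty directory under
# <data_dir> so that git can still track the commit's directory.

# keep_empty_dirs <data_dir>
keep_empty_dirs() {
  local data_dir="$1"
  find "$data_dir" -mindepth 1 -type d -empty |
    while IFS= read -r dir; do
      touch "${dir}/.gitkeep"
    done
}
```

Git does not track empty directories, which is why a placeholder file is needed before the backup is committed.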

## Tests

A sample test script verifies the basic use case of backing up a limited number of commits. `scripts/test.sh` runs a couple of backups and runs `diff -r` against a manually verified backup included in the repository. A workflow runs the tests on push.

The script tests a few backups (1, 5 and 10 commits) against the Descartes Underwriting repository. I then created another repository whose commits touch multiple files, to test slightly more complex commits. I have the feeling this would require more work.

See `.github/workflows/tests.yaml` for the test workflow.
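
The comparison step described above could look roughly like this. It is only a sketch, not the content of `scripts/test.sh`: the function name `check_backup` and the directory arguments are assumptions.

```sh
#!/usr/bin/env bash
# Hypothetical sketch of the comparison step of a backup test.

# check_backup <expected_dir> <actual_dir>
# Succeeds when both trees are identical, fails (and prints the diff) otherwise.
check_backup() {
  local expected="$1" actual="$2"
  if diff -r "$expected" "$actual"; then
    echo "OK: ${actual} matches ${expected}"
  else
    echo "FAIL: ${actual} differs from ${expected}" >&2
    return 1
  fi
}
```

A test would then run a backup into a temporary directory and call `check_backup` with the verified reference directory and that temporary directory.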

## Not covered / Improvement ideas

- Keep track of the latest backed-up commit to allow incremental backups. For now, the script checks every commit, which is not very efficient. However, if the latest backed-up commit is tracked and that commit is later overwritten by a force push, the whole backup will eventually have to be redone.
- Keep commit messages and metadata (committer, date), so that the git repository DAG can be rebuilt.
- Move the script/workflow out of the backup repository to allow a reusable workflow. Having the script in the same repository is also quite annoying, as it creates conflicts on push: pushing starts a backup, which then requires pulling the code again.
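
As an illustration of the first idea, incremental backups with a fallback after a force push might be sketched as below. Nothing here exists in the current script: the function name `commits_to_backup` and the state file holding the last backed-up SHA are assumptions.

```sh
#!/usr/bin/env bash
# Hypothetical sketch: list only the commits that still need a backup.

# commits_to_backup <repo_dir> <branch> <state_file>
# <state_file> is assumed to contain the SHA of the last backed-up commit.
commits_to_backup() {
  local repo_dir="$1" branch="$2" state_file="$3" last=""
  if [ -f "$state_file" ]; then
    last=$(cat "$state_file")
  fi

  # If the recorded commit is still an ancestor of the branch, only the
  # commits after it are needed.
  if [ -n "$last" ] &&
     git -C "$repo_dir" merge-base --is-ancestor "$last" "$branch" 2>/dev/null; then
    git -C "$repo_dir" rev-list --reverse "${last}..${branch}"
  else
    # No state yet, or history was rewritten (for example by a force push):
    # fall back to walking the whole branch.
    git -C "$repo_dir" rev-list --reverse "$branch"
  fi
}
```

The fallback branch corresponds to the caveat in the first bullet: once the tracked commit is no longer part of the branch history, the full list of commits has to be examined again.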