# backup.sh ~ Descartes Underwriting technical test for DevOps

The script takes a repository URL, a branch, and a destination directory as input, plus optional flags for the number of commits to back up, verbose mode, and debug mode.

It clones a git repository and lists all commits in the given branch, from oldest to most recent. For each commit SHA, it checks whether the commit was already backed up (that is, whether the `${DATA_DIR}/<commit sha>` directory exists). If not, it checks out all files mentioned in the commit and dumps their state at that commit into the per-commit directory.

The included workflow that actually performs the backup can also be run manually through the GitHub UI.

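
The backup procedure described above could be sketched roughly as follows. This is a minimal illustration and not the actual content of `scripts/backup.sh`: the function names `backup_commit` and `backup_branch` are hypothetical, the repository is assumed to be cloned locally already, and merge commits and file names that git C-quotes (quotes, control characters) are not handled.

```sh
# Sketch only: hypothetical helpers, not taken from scripts/backup.sh.

# backup_commit <repo_dir> <data_dir> <sha>
# Dump every file added or modified by <sha> into <data_dir>/<sha>/.
# The body runs in a subshell, so its variables stay local.
backup_commit() (
    repo_dir=$1 data_dir=$2 sha=$3
    mkdir -p "${data_dir}/${sha}" || exit 1
    # --root: also list the files of the very first commit
    # --diff-filter=d: leave deleted files out
    # core.quotePath=false: print non-ASCII file names as-is
    # Note: a merge commit lists no files here, so its directory stays empty.
    git -C "$repo_dir" -c core.quotePath=false diff-tree --root --no-commit-id \
        --name-only -r --diff-filter=d "$sha" |
    while IFS= read -r path; do
        mkdir -p "${data_dir}/${sha}/$(dirname -- "$path")" || exit 1
        git -C "$repo_dir" show "${sha}:${path}" > "${data_dir}/${sha}/${path}" || exit 1
    done
)

# backup_branch <repo_dir> <branch> <data_dir> [max_commits]
# Walk the branch from oldest to most recent and back up every commit
# that has no directory yet. max_commits=0 (default) means unlimited.
backup_branch() (
    repo_dir=$1 branch=$2 data_dir=$3 max=${4:-0}
    count=0
    commits=$(git -C "$repo_dir" rev-list --reverse "$branch") || exit 1
    for sha in $commits; do
        # An existing directory means the commit was already backed up.
        [ -d "${data_dir}/${sha}" ] && continue
        backup_commit "$repo_dir" "$data_dir" "$sha" || exit 1
        echo "new commit: ${sha}"
        count=$((count + 1))
        if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then break; fi
    done
    echo "done: ${count}"
)
```

With a fresh clone in `devops-technical-test-data`, a call such as `backup_branch devops-technical-test-data origin/01-01-2022-test data 5` would back up at most five new commits into `data/`.
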
## Usage

```sh
$ scripts/backup.sh -h
Usage: scripts/backup.sh [-r <repository url>] [-b <branch>] [-d <path dest>] [-n <num>] [-v] [-x] [-i] [-h]

Ex: scripts/backup.sh \
    -r https://github.com/descartes-underwriting/devops-technical-test-data.git \
    -b main \
    -d /home/mycroft/dev/private-backup/data

Available flags:
-r <repository url> - set remote repository url (suffixed by .git) to backup
-b <branch> - branch to backup
-d </path/to/path> - where to backup; relative or absolute
-n <num> - number of commit to backup (default: unlimited)
-i - ignore tracking information - to restart from scratch
-v - verbose mode
-x - debug mode
-h - this help
```

Example:

```sh
$ /bin/bash scripts/backup.sh -r https://github.com/descartes-underwriting/devops-technical-test-data.git -b 01-01-2022-test -d $(pwd)/data -n 5
Cloning into 'devops-technical-test-data'...
remote: Enumerating objects: 21265, done.
...
new commit: 282180fe7e5d9cbf297f2f0ef813cffe60ce2328
new commit: 46fe26c9dcf2354a0ed3f304ed6818de9606f7b5
new commit: 21e5331d1c0256701bb90cf017e519d54a88f618
new commit: 47998b5317e66b3bd456cfb07268c93e223704f2
new commit: 7c5aebc1feeef4eaf19083019547457b8cf3fc3d
done: 5

$ ls -l data/
total 0
drwxr-xr-x 1 patrick chicac 8 Jan 1 09:28 21e5331d1c0256701bb90cf017e519d54a88f618
drwxr-xr-x 1 patrick chicac 18 Jan 1 09:28 282180fe7e5d9cbf297f2f0ef813cffe60ce2328
drwxr-xr-x 1 patrick chicac 14 Jan 1 09:28 46fe26c9dcf2354a0ed3f304ed6818de9606f7b5
drwxr-xr-x 1 patrick chicac 14 Jan 1 09:28 47998b5317e66b3bd456cfb07268c93e223704f2
drwxr-xr-x 1 patrick chicac 14 Jan 1 09:28 7c5aebc1feeef4eaf19083019547457b8cf3fc3d

$
```

Note that deleted files are ignored (no file is created for them). However, so that every commit ID stays tracked, a commit directory that would otherwise be empty can still be created and archived by placing a `.gitkeep` file in it.

Check `.github/workflows/backup.yaml` for a usage sample.

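
The `.gitkeep` handling mentioned in the note above might look like the following. This is only a sketch under the assumption that the placeholder is added after a commit's files have been dumped; `keep_if_empty` is a hypothetical name, not a function from `backup.sh`.

```sh
# Sketch only: hypothetical helper, not taken from scripts/backup.sh.

# keep_if_empty <commit_dir>
# Git does not store empty directories, so drop a .gitkeep placeholder
# into <commit_dir> when nothing was dumped into it.
keep_if_empty() {
    # ls -A also lists dotfiles, so an existing .gitkeep counts as content.
    if [ -z "$(ls -A "$1")" ]; then
        : > "$1/.gitkeep"
    fi
}
```
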
## Where is the full backup?!

It is not here. I intentionally limited the workflow to back up only 10 commits automatically. It is still possible to trigger the backup manually through the GitHub UI, or to run the script on your local workstation:

```sh
$ git clone git@github.com:mycroft/private-backup.git
$ cd private-backup
$ /bin/bash scripts/backup.sh -r https://github.com/descartes-underwriting/devops-technical-test-data.git -b 01-01-2022-test -d all
Cloning into 'devops-technical-test-data'...
remote: Enumerating objects: 21364, done.
remote: Counting objects: 100% (5149/5149), done.
remote: Compressing objects: 100% (1908/1908), done.
remote: Total 21364 (delta 1300), reused 5121 (delta 1274), pack-reused 16215 (from 1)
Receiving objects: 100% (21364/21364), 1.86 MiB | 9.48 MiB/s, done.
Resolving deltas: 100% (6177/6177), done.
new commit: 282180fe7e5d9cbf297f2f0ef813cffe60ce2328
new commit: 46fe26c9dcf2354a0ed3f304ed6818de9606f7b5
new commit: 21e5331d1c0256701bb90cf017e519d54a88f618
new commit: 47998b5317e66b3bd456cfb07268c93e223704f2
...
new commit: 39fb02bb4a57073d202b489e6bcca3279aecfb24
new commit: f20996fbdd4c424b81278ca8dec7e3da4571eca7
done: 4307 commits

$ ls -ld all/* | wc -l
4307

$ cat all/.track
f20996fbdd4c424b81278ca8dec7e3da4571eca7

$ find all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/oild
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/oild/Laugho
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/oild/Laugho/ˈeɪʒə
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/oild/Laugho/ˈeɪʒə/arcluv.txt
```

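
The `.track` file shown above holds the SHA of the last commit that was backed up. How `backup.sh` actually uses it is not shown here, so the following is only a guess at how resuming from it, and ignoring it with `-i`, could work. Both function names are hypothetical.

```sh
# Sketch only: hypothetical helpers, not taken from scripts/backup.sh.

# skip_tracked <track_file> [ignore]
# Reads commit SHAs (oldest first) on stdin and prints only those that come
# after the SHA stored in <track_file>. With ignore=1 (similar to -i), or
# when the track file is missing or empty, every SHA is printed.
skip_tracked() {
    if [ "${2:-0}" -eq 0 ] && [ -s "$1" ]; then
        # Assumption: the tracked SHA is still part of the list. If it is
        # not (rewritten history, for instance), nothing is printed.
        awk -v last="$(cat "$1")" 'found { print } $0 == last { found = 1 }'
    else
        cat
    fi
}

# record_tracked <track_file> <sha>
# Remember the last commit that was backed up.
record_tracked() {
    printf '%s\n' "$2" > "$1"
}
```

It could be combined with the commit listing, for instance `git rev-list --reverse 01-01-2022-test | skip_tracked all/.track`.
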
## Tests

A sample test script verifies the basic use case of backing up a limited number of commits. `scripts/test.sh` runs a couple of backups and runs `diff -r` against a manually verified backup included in the repository. A workflow runs the tests on push.

The script tests a few backups (1, 5 and 10 commits) against the Descartes Underwriting repository. I then created another repository whose commits contain multiple files, to test slightly more complex commits. I have the feeling this would require more work.

Check `.github/workflows/tests.yaml` for the test workflow.

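
The comparison step of such a test could be sketched as below. This is not the content of `scripts/test.sh`; `check_backup` is a hypothetical helper, and it assumes both directories contain only the backed-up commit directories.

```sh
# Sketch only: hypothetical helper, not taken from scripts/test.sh.

# check_backup <expected_dir> <actual_dir>
# Compare a fresh backup with a manually verified one, recursively.
# Returns 0 when both trees are identical, 1 otherwise.
check_backup() {
    if diff -r "$1" "$2" > /dev/null; then
        echo "OK: $2 matches $1"
    else
        echo "FAIL: $2 differs from $1" >&2
        return 1
    fi
}
```
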
## Not covered / Improvement ideas

- Keep commit messages and metadata (committer, date), so that it is possible to rebuild the git repository DAG.
- Move the script/workflow out of the backup repository to allow a reusable workflow! It is also quite annoying to have the script in the same repository: pushing starts a backup, which requires pulling the code again and creates conflicts on the next push.
- The script does not track deletions, file modes, links, or submodules. Rebuilding a repository from the backup would require a lot more work.
- More tests with badly formatted file names (with double quotes?).
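
As a starting point for the first idea above, commit metadata could be stored next to the dumped files. This is only a sketch of one possible approach: the `save_commit_metadata` name and the `.metadata` file name are hypothetical, and the per-commit directory is assumed to exist already.

```sh
# Sketch only: hypothetical helper, not part of scripts/backup.sh.

# save_commit_metadata <repo_dir> <data_dir> <sha>
# Write SHA, parents, author, committer, dates and the full commit message
# of <sha> to <data_dir>/<sha>/.metadata (hypothetical file name).
save_commit_metadata() {
    # -s suppresses the diff; %P lists the parent SHAs needed to rebuild
    # the DAG; %aI and %cI are strict ISO 8601 dates; %B is the raw message.
    git -C "$1" show -s \
        --format='commit %H%nparents %P%nauthor %an <%ae>%nauthor-date %aI%ncommitter %cn <%ce>%ncommitter-date %cI%n%n%B' \
        "$3" > "$2/$3/.metadata"
}
```
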