
backup.sh ~ descartes underwriting technical test for devops

The script takes as input a repository URL, a branch, a destination directory and, optionally, a number of commits to back up, a verbose flag and a debug flag.

It clones a git repository and lists all commits in the given branch, from oldest to most recent. For each commit SHA, it checks whether the commit was already backed up (it looks for the commit's directory under ${DATA_DIR}/). If not, it checks out every file mentioned in the commit and dumps its state at that commit into the per-commit directory.
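The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual scripts/backup.sh: the function name backup_commits and its three arguments are hypothetical, and it works on an already cloned repository rather than cloning one itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-commit loop; not the real scripts/backup.sh.

# backup_commits <git work tree> <branch> <destination directory>
backup_commits() {
  local repo=$1 branch=$2 dest=$3 sha path
  mkdir -p "$dest"
  # --reverse lists the commits from oldest to most recent.
  git -C "$repo" rev-list --reverse "$branch" | while read -r sha; do
    # A commit that already has a directory was backed up by an earlier run.
    [ -d "$dest/$sha" ] && continue
    mkdir -p "$dest/$sha"
    # Files added or modified by the commit: --diff-filter=d drops deletions,
    # --root makes the first commit list its files, -z keeps odd names intact.
    git -C "$repo" diff-tree --root --no-commit-id --name-only -r -z \
        --diff-filter=d "$sha" |
      while IFS= read -r -d '' path; do
        mkdir -p "$dest/$sha/$(dirname -- "$path")"
        # Dump the file content as it was at this commit (plain files only).
        git -C "$repo" show "$sha:$path" > "$dest/$sha/$path"
      done
    echo "new commit: $sha"
  done
}
```

Calling backup_commits ./devops-technical-test-data main ./data twice in a row should print nothing the second time, since every commit directory already exists. Note that git diff-tree lists no files for merge commits by default, so this sketch leaves their directories empty.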

The included workflow, which actually performs the backup, can be run manually through the GitHub UI.

Usage

$ scripts/backup.sh -h
Usage: scripts/backup.sh [-r <repository url>] [-b <branch>] [-d <path dest>] [-n <num>] [-v] [-x] [-i] [-h]

Ex: scripts/backup.sh \
      -r https://github.com/descartes-underwriting/devops-technical-test-data.git \
      -b main \
      -d /home/mycroft/dev/private-backup/data

Available flags:
      -r <repository url> - set remote repository url (suffixed by .git) to backup
      -b <branch> - branch to backup
      -d </path/to/path> - where to backup; relative or absolute
      -n <num> - number of commit to backup (default: unlimited)
      -i - ignore tracking information - to restart from scratch
      -v - verbose mode
      -x - debug mode
      -h - this help

Ex:

$ /bin/bash scripts/backup.sh -r https://github.com/descartes-underwriting/devops-technical-test-data.git -b 01-01-2022-test -d $(pwd)/data -n 5
Cloning into 'devops-technical-test-data'...
remote: Enumerating objects: 21265, done.
...
new commit: 282180fe7e5d9cbf297f2f0ef813cffe60ce2328
new commit: 46fe26c9dcf2354a0ed3f304ed6818de9606f7b5
new commit: 21e5331d1c0256701bb90cf017e519d54a88f618
new commit: 47998b5317e66b3bd456cfb07268c93e223704f2
new commit: 7c5aebc1feeef4eaf19083019547457b8cf3fc3d
done: 5

$ ls -l data/
total 0
drwxr-xr-x 1 patrick chicac  8 Jan  1 09:28 21e5331d1c0256701bb90cf017e519d54a88f618
drwxr-xr-x 1 patrick chicac 18 Jan  1 09:28 282180fe7e5d9cbf297f2f0ef813cffe60ce2328
drwxr-xr-x 1 patrick chicac 14 Jan  1 09:28 46fe26c9dcf2354a0ed3f304ed6818de9606f7b5
drwxr-xr-x 1 patrick chicac 14 Jan  1 09:28 47998b5317e66b3bd456cfb07268c93e223704f2
drwxr-xr-x 1 patrick chicac 14 Jan  1 09:28 7c5aebc1feeef4eaf19083019547457b8cf3fc3d

$

Note that deleted objects are ignored: no file is created for them. To keep track of such commit IDs anyway, an empty directory can be created and archived using a .gitkeep file.
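Git does not track empty directories, so the directory of a commit that only deletes files would disappear once the backup is committed. A minimal sketch of the placeholder approach, where the helper name keep_empty_dir is hypothetical and not taken from the script:

```shell
# keep_empty_dir <commit directory>
# Hypothetical helper: add a .gitkeep placeholder to a directory with no entries,
# so that git can still archive it.
keep_empty_dir() {
  local dir=$1
  if [ -z "$(ls -A "$dir")" ]; then
    : > "$dir/.gitkeep"
  fi
}
```

A directory that already contains files is left untouched.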

Check .github/workflows/backup.yaml for usage sample.

Where is the full backup?!

It is not here. I intentionally limited the workflow to back up 10 commits automatically. It is still possible to trigger the backup manually through the GitHub UI, or by running the script on your local workstation:

$ git clone git@github.com:mycroft/private-backup.git 
$ cd private-backup
$ /bin/bash scripts/backup.sh -r https://github.com/descartes-underwriting/devops-technical-test-data.git -b 01-01-2022-test -d all
Cloning into 'devops-technical-test-data'...
remote: Enumerating objects: 21364, done.
remote: Counting objects: 100% (5149/5149), done.
remote: Compressing objects: 100% (1908/1908), done.
remote: Total 21364 (delta 1300), reused 5121 (delta 1274), pack-reused 16215 (from 1)
Receiving objects: 100% (21364/21364), 1.86 MiB | 9.48 MiB/s, done.
Resolving deltas: 100% (6177/6177), done.
new commit: 282180fe7e5d9cbf297f2f0ef813cffe60ce2328
new commit: 46fe26c9dcf2354a0ed3f304ed6818de9606f7b5
new commit: 21e5331d1c0256701bb90cf017e519d54a88f618
new commit: 47998b5317e66b3bd456cfb07268c93e223704f2
...
new commit: 39fb02bb4a57073d202b489e6bcca3279aecfb24
new commit: f20996fbdd4c424b81278ca8dec7e3da4571eca7
done: 4307 commits

$ ls -ld all/* | wc -l
4307

$ cat all/.track
f20996fbdd4c424b81278ca8dec7e3da4571eca7

$ find all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/oild
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/oild/Laugho
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/oild/Laugho/ˈeɪʒə
all/f20996fbdd4c424b81278ca8dec7e3da4571eca7/oild/Laugho/ˈeɪʒə/arcluv.txt
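
The .track file shown above holds the SHA of the most recent commit that was backed up. One way such a file could be used to resume a backup is sketched below. How the real script reads .track is an assumption here, and the function name pending_commits is hypothetical.

```shell
# pending_commits <git work tree> <branch> <destination directory>
# Hypothetical helper: print the commits still to back up, oldest first.
pending_commits() {
  local repo=$1 branch=$2 dest=$3 last=""
  # Assumption: .track contains the SHA of the last commit already backed up.
  [ -f "$dest/.track" ] && last=$(cat "$dest/.track")
  if [ -n "$last" ]; then
    # Only the commits reachable from the branch but not from the tracked one.
    git -C "$repo" rev-list --reverse "$last..$branch"
  else
    # No tracking information: start from the first commit.
    git -C "$repo" rev-list --reverse "$branch"
  fi
}
```

Removing the .track file makes the sketch start from scratch, which is the kind of behavior the -i flag describes.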

Tests

A sample test script verifies the basic use case of backing up a limited number of commits. scripts/test.sh runs a couple of backups and then runs diff -r against a manually verified backup included in the repository. A workflow runs the tests on push.
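
The comparison step can be sketched as below. This is an illustration rather than the content of scripts/test.sh; the function name check_backup and the directory arguments are hypothetical.

```shell
# check_backup <expected directory> <actual directory>
# Hypothetical helper: compare a fresh backup with a manually verified one.
check_backup() {
  local expected=$1 actual=$2
  # diff -r compares both trees recursively and exits non-zero on any difference.
  if diff -r "$expected" "$actual"; then
    echo "ok: $actual"
  else
    echo "FAILED: $actual differs from $expected" >&2
    return 1
  fi
}
```

Because the function returns a non-zero status on a mismatch, a workflow step that calls it fails as soon as a backup diverges from the reference.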

The script tests a few backups (1, 5 and 10 commits) against the Descartes Underwriting repository. I then created another repository whose commits touch multiple files, to test slightly more complex commits. I have the feeling this part would require more work.

Check .github/workflows/tests.yaml for the test workflow.

Not covered / Improvement ideas

  • Keep commit messages and metadata (committer, date), so that it is possible to rebuild the git repository DAG.
  • Move the script/workflow out of the backup repository to make it a reusable workflow. Keeping the script in the same repository is also tedious: every push starts a backup, which creates conflicts and requires pulling the code again before the next push.
  • The script does not track deletions, file modes, links or submodules. Rebuilding a repository from the backup would take a lot of extra work.
  • More tests with awkward file names (containing double quotes, for example).
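
As a starting point for the first idea, the raw commit object already carries the tree, parents, author, committer, dates and message. The sketch below stores it next to the backed-up files. It is not part of the current script: the function name save_metadata and the file name .commit-metadata are hypothetical.

```shell
# save_metadata <git work tree> <commit sha> <commit directory>
# Hypothetical helper: store the raw commit object alongside the backed-up files.
save_metadata() {
  local repo=$1 sha=$2 dir=$3
  # git cat-file prints the tree, parent, author and committer lines,
  # followed by a blank line and the commit message.
  git -C "$repo" cat-file commit "$sha" > "$dir/.commit-metadata"
}
```

The parent lines are what would make it possible to reconnect the commits into a DAG later.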