From 7e2bdb069602711f55cea9e3fc42c95da2b400af Mon Sep 17 00:00:00 2001
From: Patrick Marie
Date: Thu, 3 Oct 2024 11:54:28 +0200
Subject: [PATCH] feat: keep track of empty commits; handle deleted files

---
 README.md         |   4 +-
 scripts/backup.sh |  24 ++++-
 .../README.md     | 100 ------------------
 .../README.md     |  92 ----------------
 .../.gitkeep      |   0
 .../.gitkeep      |   0
 6 files changed, 22 insertions(+), 198 deletions(-)
 delete mode 100644 tests/tests_cases/10/77feffe3f6ccdf44a5f4150b3258e1a745e57807/README.md
 delete mode 100644 tests/tests_cases/10/c37c77fbf9d12418c83e1510fb0bdece8bd2e11c/README.md
 create mode 100644 tests/tests_cases/10/d6c5752c05317a5c82f2c4de20acc3d22b110fcd/.gitkeep
 create mode 100644 tests/tests_cases/10/f782f60197e2802cd52f6e5b424a0e246962e3f4/.gitkeep

diff --git a/README.md b/README.md
index dab68af..0f0860d 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,8 @@ drwxr-xr-x 1 patrick chicac 14 Jan 1 09:28 7c5aebc1feeef4eaf19083019547457b8cf3
 $
 ```
 
+Note that deleted objects are ignored (no file is created). However, to track commit ids, empty directories can be created and archived, using a `.gitkeep`.
+
 Check `.github/workflows/backup.yaml` for usage sample.
 
 ## Tests
@@ -45,5 +47,5 @@ A sample test script exists to verify basic use case of backup'ing a limited num
 
 ## Not covered / Improvements ideas
 
-- Deleted files in commits; The backup.sh script does not cover deleted files;
 - Keep track of latest commit backuped to allow iterative backups. As for now, the script checks all commits (which is not really efficient); However, if latest commit backup is tracked and this commit to be overwritten by a force push, it will be eventually required to re-do the whole backup.
+
diff --git a/scripts/backup.sh b/scripts/backup.sh
index 915e644..24019e2 100644
--- a/scripts/backup.sh
+++ b/scripts/backup.sh
@@ -70,13 +70,12 @@ do
     # There are malformed files names that are creating complex filenames to parse.
     # Those malformed filenames are double quotes, so to remove quotes, -c core.quotepath=false
     # and -z are used. sed 's/\x0//g' is removing the null byte
-    FILES=$($GIT ${GIT_OPTS[@]} show --pretty= --name-only -z ${COMMIT_SHA} | sed 's/\x0//g')
+    FILES=$($GIT ${GIT_OPTS[@]} show --pretty= --name-only -z ${COMMIT_SHA} | sed 's/\x0//g')
 
     if test -z ${FILES}
     then
         # merge commit, etc. There is no file here.
-        echo "No file was found in commit ${COMMIT_SHA}; skipping"
-        continue
+        echo "No file was found in commit ${COMMIT_SHA}"
     fi
 
     TARGET_BACKUP_SHA=${DATA_DIR}/${COMMIT_SHA}
@@ -92,10 +91,25 @@ do
             echo "Writing ${TARGET_BACKUP_SHA}/${FILE}"
         fi
 
-        # ${FILE} contains path/to/file
-        $GIT ${GIT_OPTS[@]} show ${COMMIT_SHA}:${FILE} > ${TARGET_BACKUP_SHA}/${FILE}
+        # Retrieve file state (Added, Modified, Deleted)
+        STATE=$(${GIT} ${GIT_OPTS[@]} show --name-status --pretty= ${COMMIT_SHA} | grep -E "^..${FILE}\$" | cut -f1)
+
+        if test "${STATE}" != "D"
+        then
+            # ${FILE} contains path/to/file
+            $GIT ${GIT_OPTS[@]} show ${COMMIT_SHA}:${FILE} > ${TARGET_BACKUP_SHA}/${FILE}
+        else
+            echo "Skipping ${FILE} as file was deleted in this commit"
+        fi
     done
 
+    # if ${TARGET_BACKUP_SHA} is empty, keep track that it was backed up
+    if test -z "$(ls -A ${TARGET_BACKUP_SHA})"
+    then
+        echo "Folder ${TARGET_BACKUP_SHA} is empty. Marking it to keep it in the backup"
+        touch ${TARGET_BACKUP_SHA}/.gitkeep
+    fi
+
     NUM_ADDED=$((NUM_ADDED + 1))
 
     if test ${NUM_ADDED} -eq ${MAX_NUM}
diff --git a/tests/tests_cases/10/77feffe3f6ccdf44a5f4150b3258e1a745e57807/README.md b/tests/tests_cases/10/77feffe3f6ccdf44a5f4150b3258e1a745e57807/README.md
deleted file mode 100644
index be0f4bf..0000000
--- a/tests/tests_cases/10/77feffe3f6ccdf44a5f4150b3258e1a745e57807/README.md
+++ /dev/null
@@ -1,100 +0,0 @@
-# Descartes Underwriting
-
-## Context
-
-We wish to create a backup tool that will save only the last modified files of a storage unit.
-
-In our example, the storage unit is **not a bucket**.
-
-The storage unit is the `DD-MM-YYYY-test` branch of the current `descartes-underwriting/devops-technical-test-data` git repository.
-
-## Property
-
-The `descartes-underwriting/devops-technical-test-data` repository is not frozen and will have new commits.
-
-Commits will be added to the `DD-MM-YYYY-test` branch multiple times every day.
-
-The `DD-MM-YYYY-test` branch name will be adapted using standard datetime convention eg: `01-01-2022-test` for the 1st of January 2022.
-
-## Task
-
-Develop a backup tool to save the modified files at each commit.
-
-### Submission
-
-If something is not clear, you can ask questions to the recruiter.
-
-When submitting your project, your version should **not be draft** but complete and following best practices.
-
-The solution should be saved on a **private** `descartes-devops` repository on your github account.
-
-The solution should include:
-
-- source code
-- test code
-
-When the final version is ready:
-
-1. Send an email to the recruiter indicating that you finished the project and sharing the url of the project
-2. Grant access to:
-
--
--
--
-
-### Script
-
-Create a script to automate the backup process using open source software.
-
-The script should track the changes fo the branch `DD-MM-YYYY-test` of the `descartes-underwriting/devops-technical-test-data` repository.
-
-The execution of the script should be carried out with a github-action / gitlab-pipeline or any other tool automating git workflow on your git project.
-
-It is highly recommended to use a scheduling tool to execute the back up process.
-
-### Data
-
-The backup should store files in separate folders.
-
-The backup file structure should be based on the sha1 of the `descartes-underwriting/devops-technical-test-data`.
-
-Starting from the initial commit [282180fe7e5d9cbf297f2f0ef813cffe60ce2328](https://github.com/descartes-underwriting/devops-technical-test-data/commit/282180fe7e5d9cbf297f2f0ef813cffe60ce2328), all the history should be backup.
-
-## File structure example
-
-For the following commits on the `descartes-underwriting/devops-technical-test-data`:
-
-| SHA | OPERATION |
-|-----|-----------|
-| Commit_N | create readme.md |
-| Commit_N+1 | create doc.txt |
-| Commit_N+2 | create data/test/test.txt |
-| Commit_N+3 | append text to ./doc.txt |
-| Commit_N+4 | create test/project/project1.txt |
-
-The `candidate/descartes-backup-project` repository should have
-
-```bash
-$ tree .
-.
-├── .gitworkflow
-│   └── workflows
-│       └── my-lovely-workflow.yml
-├── data
-│   ├── N
-│   │   └── readme.md
-│   ├── N+1
-│   │   └── doc.txt
-│   ├── N+2
-│   │   └── data
-│   │       └── test
-│   │           └── test.txt
-│   ├── N+3
-│   │   └── doc.txt
-│   └── N+4
-│       └── test
-│           └── project
-│               └── project1.txt
-└── script
-    └── my-beautiful-script.best-language
-```
diff --git a/tests/tests_cases/10/c37c77fbf9d12418c83e1510fb0bdece8bd2e11c/README.md b/tests/tests_cases/10/c37c77fbf9d12418c83e1510fb0bdece8bd2e11c/README.md
deleted file mode 100644
index b8c1f6a..0000000
--- a/tests/tests_cases/10/c37c77fbf9d12418c83e1510fb0bdece8bd2e11c/README.md
+++ /dev/null
@@ -1,92 +0,0 @@
-# Descartes Underwriting
-
-## Context
-
-We wish to create a backup tool that will save only the last modified files of a storage unit.
-
-In our example, the storage unit is **not a bucket**.
-
-The storage unit is the `DD-MM-YYYY-test` branch of the current `descartes-underwriting/devops-technical-test-data` git repository.
-
-## Property
-
-The `descartes-underwriting/devops-technical-test-data` repository is not frozen and will have new commits.
-
-Commits will be added to the `DD-MM-YYYY-test` branch multiple times every day.
-
-The `DD-MM-YYYY-test` branch name will be adapted using standard datetime convention eg: `01-01-2022-test` for the 1st of January 2022.
-
-## Task
-
-Develop a backup tool to save the modified files at each commit.
-
-### Submission
-
-Script and data should be saved on a private `candidate/descartes-backup-project` repository on your github account.
-
-Access should be granted to all members of the `descartes-underwriting` group:
-
-
-
-Especially:
-
-*
-*
-*
-
-### Script
-
-Create a script to automate the backup process using open source software.
-
-The script should track the changes fo the branch `DD-MM-YYYY-test` of the `descartes-underwriting/devops-technical-test-data` repository.
-
-The execution of the script should be carried out with a github-action / gitlab-pipeline or any other tool automating git workflow on your git project.
-
-It is highly recommended to use a scheduling tool to execute the back up process.
-
-### Data
-
-The backup should store files in separate folders.
-
-The backup file structure should be based on the sha1 of the `descartes-underwriting/devops-technical-test-data`.
-
-Starting from the initial commit [282180fe7e5d9cbf297f2f0ef813cffe60ce2328](https://github.com/descartes-underwriting/devops-technical-test-data/commit/282180fe7e5d9cbf297f2f0ef813cffe60ce2328), all the history should be backup.
-
-## File structure example
-
-For the following commits on the `descartes-underwriting/devops-technical-test-data`:
-
-| SHA | OPERATION |
-|-----|-----------|
-| Commit_N | create readme.md |
-| Commit_N+1 | create doc.txt |
-| Commit_N+2 | create data/test/test.txt |
-| Commit_N+3 | append text to ./doc.txt |
-| Commit_N+4 | create test/project/project1.txt |
-
-The `candidate/descartes-backup-project` repository should have
-
-```bash
-$ tree .
-.
-├── .gitworkflow
-│   └── workflows
-│       └── my-lovely-workflow.yml
-├── data
-│   ├── N
-│   │   └── readme.md
-│   ├── N+1
-│   │   └── doc.txt
-│   ├── N+2
-│   │   └── data
-│   │       └── test
-│   │           └── test.txt
-│   ├── N+3
-│   │   └── doc.txt
-│   └── N+4
-│       └── test
-│           └── project
-│               └── project1.txt
-└── script
-    └── my-beautiful-script.best-language
-```
diff --git a/tests/tests_cases/10/d6c5752c05317a5c82f2c4de20acc3d22b110fcd/.gitkeep b/tests/tests_cases/10/d6c5752c05317a5c82f2c4de20acc3d22b110fcd/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/tests/tests_cases/10/f782f60197e2802cd52f6e5b424a0e246962e3f4/.gitkeep b/tests/tests_cases/10/f782f60197e2802cd52f6e5b424a0e246962e3f4/.gitkeep
new file mode 100644
index 0000000..e69de29
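
Review note: the deleted-file check and the empty-directory marker in `scripts/backup.sh` can be exercised in isolation. The sketch below is one possible shape, not the script's actual code; `file_state` and `mark_if_empty` are hypothetical helper names. It assumes the status listing arrives on stdin as tab-separated `STATUS<TAB>PATH` lines, as printed by `git show --name-status --pretty=`. Comparing the path as a plain string avoids treating the file name as a regular expression.

```shell
#!/bin/sh
# Hypothetical helper: print the one-letter status (A, M, D, R, ...) recorded
# for a given path. Reads `git show --name-status --pretty=` style lines on
# stdin. The last field is compared so that rename lines (R100<TAB>old<TAB>new)
# match on the new path.
file_state() {
    awk -F '\t' -v path="$1" '$NF == path { print substr($1, 1, 1); exit }'
}

# Hypothetical helper: drop a .gitkeep into a directory that has no entries,
# so that an otherwise empty commit folder survives in the backup repository.
mark_if_empty() {
    if test -z "$(ls -A "$1")"
    then
        touch "$1/.gitkeep"
    fi
}
```

In the backup loop this could be called as `STATE=$($GIT show --name-status --pretty= "${COMMIT_SHA}" | file_state "${FILE}")`, followed by `mark_if_empty "${TARGET_BACKUP_SHA}"` once all files of the commit are processed. Note that `awk -v` interprets backslash escapes in the assigned value, so paths containing backslashes would need extra care.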