feat: keep track of empty commits; handle deleted files

Patrick MARIE 2024-10-03 11:54:28 +02:00
parent 08424d1bf1
commit 7e2bdb0696
Signed by: mycroft
GPG Key ID: BB519E5CD8E7BFA7
6 changed files with 22 additions and 198 deletions


@@ -37,6 +37,8 @@ drwxr-xr-x 1 patrick chicac 14 Jan 1 09:28 7c5aebc1feeef4eaf19083019547457b8cf3
$ $
``` ```
+Note that deleted objects are ignored (no file is created). However, to track commit ids, empty directories can be created and archived, using a `.gitkeep`.
 Check `.github/workflows/backup.yaml` for usage sample.
 ## Tests
@@ -45,5 +47,5 @@ A sample test script exists to verify basic use case of backup'ing a limited num
 ## Not covered / Improvements ideas
-- Deleted files in commits; The backup.sh script does not cover deleted files;
 - Keep track of latest commit backuped to allow iterative backups. As for now, the script checks all commits (which is not really efficient); However, if latest commit backup is tracked and this commit to be overwritten by a force push, it will be eventually required to re-do the whole backup.
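
The remaining improvement idea (iterative backups that survive a force push) could look roughly like the sketch below. It is not part of this commit: the function name `commits_to_backup` and the state file it reads are hypothetical. The sketch assumes the SHA of the last commit backed up is stored in a plain text file, and uses `git merge-base --is-ancestor` to check that this commit is still part of the branch history. If it is not (for example after a force push), the whole history is listed again.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in backup.sh): print the commits that still need
# a backup, oldest first.
#   $1: path to the git repository
#   $2: branch (or any revision) to back up
#   $3: state file holding the SHA of the last commit backed up (assumed location)
commits_to_backup() {
    local repo=$1 branch=$2 state_file=$3 last=""

    # Read the recorded SHA when the state file exists and is not empty.
    if [ -s "${state_file}" ]; then
        last=$(cat "${state_file}")
    fi

    # The recorded commit is usable only if it is still an ancestor of the branch.
    if [ -n "${last}" ] && git -C "${repo}" merge-base --is-ancestor "${last}" "${branch}" 2>/dev/null; then
        git -C "${repo}" rev-list --reverse "${last}..${branch}"
    else
        # No state yet, or the history was rewritten: list every commit.
        git -C "${repo}" rev-list --reverse "${branch}"
    fi
}
```

A caller would loop over the printed SHAs and write the last one processed back to the state file once the backup of that commit succeeded.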


@@ -70,13 +70,12 @@ do
 # There are malformed files names that are creating complex filenames to parse.
 # Those malformed filenames are double quotes, so to remove quotes, -c core.quotepath=false
 # and -z are used. sed 's/\x0//g' is removing the null byte
 FILES=$($GIT ${GIT_OPTS[@]} show --pretty= --name-only -z ${COMMIT_SHA} | sed 's/\x0//g')
 if test -z ${FILES}
 then
 # merge commit, etc. There is no file here.
-echo "No file was found in commit ${COMMIT_SHA}; skipping"
-continue
+echo "No file was found in commit ${COMMIT_SHA}"
 fi
 TARGET_BACKUP_SHA=${DATA_DIR}/${COMMIT_SHA}
@@ -92,10 +91,25 @@ do
 echo "Writing ${TARGET_BACKUP_SHA}/${FILE}"
 fi
-# ${FILE} contains path/to/file
-$GIT ${GIT_OPTS[@]} show ${COMMIT_SHA}:${FILE} > ${TARGET_BACKUP_SHA}/${FILE}
+# Retrieve file state (Added, Modified, Deleted)
+STATE=$(${GIT} ${GIT_OPTS[@]} show --name-status --pretty= ${COMMIT_SHA} | grep -E '^..${FILE}$' | cut -f1)
+if test "${STATE}" != "D"
+then
+# ${FILE} contains path/to/file
+$GIT ${GIT_OPTS[@]} show ${COMMIT_SHA}:${FILE} > ${TARGET_BACKUP_SHA}/${FILE}
+else
+echo "Skipping ${FILE} as file was deleted in this commit"
+fi
 done
+# if ${TARGET_BACKUP_SHA} is empty, keep track it was "backuped"
+if test -z "$(ls -A ${TARGET_BACKUP_SHA})"
+then
+echo "Folder ${TARGET_BACKUP_SHA} is empty. Marking it to keep as it in the backup"
+touch ${TARGET_BACKUP_SHA}/.gitkeep
+fi
 NUM_ADDED=$((NUM_ADDED + 1))
 if test ${NUM_ADDED} -eq ${MAX_NUM}
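
Review note: as rendered here, the `grep -E` pattern sits in single quotes, so the shell would not expand `${FILE}` inside it, and a file name containing regular expression metacharacters would be read as a pattern. The sketch below is one possible alternative, not the code of this commit. It asks git for the state of every file once per commit with `git diff-tree --name-status` and compares the state field directly. The function name `backup_commit` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the backup.sh implementation).
# Writes the files touched by commit $2 of repository $1 under $3/<sha>/,
# skips deleted files, and adds a .gitkeep when nothing was written.
backup_commit() {
    local repo=$1 sha=$2 data_dir=$3
    local target="${data_dir}/${sha}"
    local tab
    tab=$(printf '\t')

    mkdir -p "${target}"

    # diff-tree prints one "<state><TAB><path>" line per file.
    # --root makes the initial commit list its files as added.
    # Merge commits print nothing by default, so they end up with a .gitkeep.
    git -C "${repo}" -c core.quotepath=false \
        diff-tree --no-commit-id --name-status -r --root "${sha}" |
    while IFS="${tab}" read -r state file; do
        if [ "${state}" = "D" ]; then
            echo "Skipping ${file} as file was deleted in this commit"
            continue
        fi
        mkdir -p "${target}/$(dirname "${file}")"
        git -C "${repo}" show "${sha}:${file}" > "${target}/${file}"
    done

    # Keep a trace of commits that produced no file.
    if [ -z "$(ls -A "${target}")" ]; then
        touch "${target}/.gitkeep"
    fi
}
```

This sketch still assumes file names without tabs, newlines, double quotes or backslashes: git quotes such names in this output even with `core.quotepath=false`, and handling them would require the `-z` output format.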


@@ -1,100 +0,0 @@
# Descartes Underwriting
## Context
We wish to create a backup tool that will save only the last modified files of a storage unit.
In our example, the storage unit is **not a bucket**.
The storage unit is the `DD-MM-YYYY-test` branch of the current `descartes-underwriting/devops-technical-test-data` git repository.
## Property
The `descartes-underwriting/devops-technical-test-data` repository is not frozen and will have new commits.
Commits will be added to the `DD-MM-YYYY-test` branch multiple times every day.
The `DD-MM-YYYY-test` branch name will be adapted using standard datetime convention eg: `01-01-2022-test` for the 1st of January 2022.
## Task
Develop a backup tool to save the modified files at each commit.
### Submission
If something is not clear, you can ask questions to the recruiter.
When submitting your project, your version should **not be draft** but complete and following best practices.
The solution should be saved on a **private** `descartes-devops` repository on your github account.
The solution should include:
- source code
- test code
When the final version is ready:
1. Send an email to the recruiter indicating that you finished the project and sharing the url of the project
2. Grant access to:
- <https://github.com/alexandreCameron>
- <https://github.com/Mareak>
- <https://github.com/jrdescartes>
### Script
Create a script to automate the backup process using open source software.
The script should track the changes fo the branch `DD-MM-YYYY-test` of the `descartes-underwriting/devops-technical-test-data` repository.
The execution of the script should be carried out with a github-action / gitlab-pipeline or any other tool automating git workflow on your git project.
It is highly recommended to use a scheduling tool to execute the back up process.
### Data
The backup should store files in separate folders.
The backup file structure should be based on the sha1 of the `descartes-underwriting/devops-technical-test-data`.
Starting from the initial commit [282180fe7e5d9cbf297f2f0ef813cffe60ce2328](https://github.com/descartes-underwriting/devops-technical-test-data/commit/282180fe7e5d9cbf297f2f0ef813cffe60ce2328), all the history should be backup.
## File structure example
For the following commits on the `descartes-underwriting/devops-technical-test-data`:
| SHA | OPERATION |
|-----|-----------|
| Commit_N | create readme.md |
| Commit_N+1 | create doc.txt |
| Commit_N+2 | create data/test/test.txt |
| Commit_N+3 | append text to ./doc.txt |
| Commit_N+4 | create test/project/project1.txt |
The `candidate/descartes-backup-project` repository should have
```bash
$ tree .
.
├── .gitworkflow
│   └── workflows
│   └── my-lovely-workflow.yml
├── data
│   ├── N
│   │   └── readme.md
│   ├── N+1
│   │   └── doc.txt
│   ├── N+2
│   │   └── data
│   │   └── test
│   │   └── test.txt
│   ├── N+3
│   │   └── doc.txt
│   └── N+4
│   └── test
│   └── project
│   └── project1.txt
└── script
└── my-beautiful-script.best-language
```
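
For illustration only, two requirements of the statement above (the dated branch name and the backup of the whole history starting from a given commit) could be expressed as the small helpers below. The names `todays_branch` and `commits_since` are hypothetical and do not come from the repository. The second helper assumes the full SHA of the starting commit is passed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the statement; not taken from backup.sh.

# Branch name for the current date, e.g. 01-01-2022-test on 1 January 2022.
todays_branch() {
    date +%d-%m-%Y-test
}

# Commits reachable from revision $2 in repository $1, oldest first,
# starting at the commit whose full SHA is $3 (included).
commits_since() {
    git -C "$1" rev-list --reverse "$2" | sed -n "/^$3/,\$p"
}
```

Each SHA printed by `commits_since` would then name one folder under `data/`, as in the tree shown above.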


@@ -1,92 +0,0 @@
# Descartes Underwriting
## Context
We wish to create a backup tool that will save only the last modified files of a storage unit.
In our example, the storage unit is **not a bucket**.
The storage unit is the `DD-MM-YYYY-test` branch of the current `descartes-underwriting/devops-technical-test-data` git repository.
## Property
The `descartes-underwriting/devops-technical-test-data` repository is not frozen and will have new commits.
Commits will be added to the `DD-MM-YYYY-test` branch multiple times every day.
The `DD-MM-YYYY-test` branch name will be adapted using standard datetime convention eg: `01-01-2022-test` for the 1st of January 2022.
## Task
Develop a backup tool to save the modified files at each commit.
### Submission
Script and data should be saved on a private `candidate/descartes-backup-project` repository on your github account.
Access should be granted to all members of the `descartes-underwriting` group:
<https://github.com/orgs/descartes-underwriting/people>
Especially:
* <https://github.com/alexandreCameron>
* <https://github.com/Mareak>
* <https://github.com/jrdescartes>
### Script
Create a script to automate the backup process using open source software.
The script should track the changes fo the branch `DD-MM-YYYY-test` of the `descartes-underwriting/devops-technical-test-data` repository.
The execution of the script should be carried out with a github-action / gitlab-pipeline or any other tool automating git workflow on your git project.
It is highly recommended to use a scheduling tool to execute the back up process.
### Data
The backup should store files in separate folders.
The backup file structure should be based on the sha1 of the `descartes-underwriting/devops-technical-test-data`.
Starting from the initial commit [282180fe7e5d9cbf297f2f0ef813cffe60ce2328](https://github.com/descartes-underwriting/devops-technical-test-data/commit/282180fe7e5d9cbf297f2f0ef813cffe60ce2328), all the history should be backup.
## File structure example
For the following commits on the `descartes-underwriting/devops-technical-test-data`:
| SHA | OPERATION |
|-----|-----------|
| Commit_N | create readme.md |
| Commit_N+1 | create doc.txt |
| Commit_N+2 | create data/test/test.txt |
| Commit_N+3 | append text to ./doc.txt |
| Commit_N+4 | create test/project/project1.txt |
The `candidate/descartes-backup-project` repository should have
```bash
$ tree .
.
├── .gitworkflow
│   └── workflows
│   └── my-lovely-workflow.yml
├── data
│   ├── N
│   │   └── readme.md
│   ├── N+1
│   │   └── doc.txt
│   ├── N+2
│   │   └── data
│   │   └── test
│   │   └── test.txt
│   ├── N+3
│   │   └── doc.txt
│   └── N+4
│   └── test
│   └── project
│   └── project1.txt
└── script
└── my-beautiful-script.best-language
```