Document Management with Git | CORPORATE ETHOS


May 28, 2018

When we run a project or a business, we produce a huge volume of documents, and the information they contain is the cornerstone of the project's success. A document goes through multiple iterations while it is being written. As the work progresses, you and other team members create different versions of it. Tracking your own versions and merging those made by your collaborators is a daunting task.

When you save a file after making changes, you can save the complete file as a different version. But if you keep changing different files and save the entire file under a different name each time, the project will occupy a huge chunk of your storage with unnecessary, redundant data. Another option is to save only the changes you make instead of the entire project. This method saves storage, but it can be difficult to view the whole project as it stood at a given time.

Naming the different versions is another issue. Of course, one can come up with a comprehensive naming scheme, but as the project grows, naming the versions consistently becomes difficult. A further issue is finding out exactly what differs between versions (for example, what changed between version 1 and version 2). The best solution is to automate all of these tasks, and this is what a version control system does.

You can think of version control as a system that manages the changes you make to a project from start to finish. The changes may involve adding new files, modifying existing files, and so on. Every time you record a change in your project, the version control system creates a snapshot (the entire state of the project at a particular time) and saves it. These snapshots are the different versions of the project. A snapshot, or version, contains information on the files you had at that time and the changes you made.

For instance, assume you are developing a website. In the beginning, you may have just the index.html file. A few days later you may add a couple of files (say, about.html and style.css), and on another day you may modify some of them. The version control system records all these changes and saves them as different versions. It always keeps your older versions neatly packed and lets you roll back to any of them.

Tracking files with Git

Git is a free, open-source version control system developed by Linus Torvalds, the creator of Linux. It tracks modifications to files over time so that you can recall older versions later. Git does this by letting you take snapshots of your files at any time. These snapshots are called commits.

Git demo

Once Git is installed, invoke it using the method appropriate to your OS. On Windows, you can use either Git CMD or Git GUI.


We start with a standard directory on our file system. Let us create a directory called ‘git_corp’ and move into it. A repository (repo) is a data space where you store all the project files. For the purpose of this demo, let us add a couple of files to this directory: a text file called ‘readme.md’ and a Word document called ‘corp.docx’. Our project folder now holds two files, and we can turn it into a Git repository with the command ‘git init’.


With the ‘init’ command, Git added a new hidden subdirectory, ‘.git’, to the ‘git_corp’ folder. The directory ‘git_corp’ is our repo’s working tree, where we add, remove and edit the files of our project.
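The setup described above can be sketched as a short shell session. This is a minimal sketch rather than the exact session from the demo: the scratch parent directory is an assumption made so the example can be re-run safely, and ‘corp.docx’ here is a placeholder file, not a real Word document.

```shell
set -eu

# Assumption: work under a throwaway parent directory so the demo is repeatable.
workdir="$(mktemp -d)"
cd "$workdir"

# Create the project folder and move into it.
mkdir git_corp
cd git_corp

# Add the two demo files. corp.docx is only a placeholder here,
# not a real Word document.
echo "Demo project" > readme.md
printf 'placeholder for a Word document\n' > corp.docx

# Turn the folder into a Git repository; this creates the hidden .git directory.
git init

# List everything, including hidden entries, to confirm .git is present.
ls -a
```

On your own machine you would normally skip the scratch directory and run ‘mkdir git_corp’ wherever you keep your projects.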

Before moving further, let us do a simple configuration task. Whenever we make a commit, Git includes our name, email address and a timestamp with the commit. This is important for tracking when changes were made and who made them. To set these, execute the following commands: ‘git config --global user.name "your_name"’ and ‘git config --global user.email "your_email"’.
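As a rough illustration of the configuration step, the sketch below sets the two values and reads them back. The name and email address are placeholders, and redirecting HOME to a scratch directory is an assumption added only so that trying the example does not overwrite your real settings.

```shell
set -eu

# Assumption: point HOME at a scratch directory so this sketch does not
# touch your real ~/.gitconfig. Omit this line to configure your own account.
export HOME="$(mktemp -d)"

# Placeholder identity; replace with your own name and email address.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Read the values back to confirm they were stored.
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"
echo "$configured_name <$configured_email>"
```

The ‘--global’ option stores the values for your user account, so they apply to every repo you work in unless a repo overrides them.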

Now let us make our first commit. In our working directory, we have two files, readme.md and corp.docx. If you want Git to track changes to these files, you need to commit them to the repo. You can view the current status of the repo using the ‘git status’ command.

The ‘git status’ command tells us how things stand in our working tree and in the staging area. First, we see that we are on the master branch. We also see two untracked files. Let us add ‘readme.md’ to the staging area with the command ‘git add readme.md’, so that we can make our first commit, and then check the repo status again.
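The status and staging steps might look like the following. It is a sketch under assumptions: the repo is rebuilt in a scratch directory so the block runs on its own, and ‘corp.docx’ is a placeholder file.

```shell
set -eu

# Setup (assumption): rebuild the demo folder in a scratch location.
repo="$(mktemp -d)/git_corp"
mkdir "$repo"
cd "$repo"
git init --quiet
echo "Demo project" > readme.md
printf 'placeholder for a Word document\n' > corp.docx   # not a real Word file

# Both files are reported as untracked.
git status

# Stage readme.md only.
git add readme.md

# readme.md is now listed under "Changes to be committed";
# corp.docx is still untracked.
git status
```

The branch name in the first line of the status output depends on your Git configuration; recent installations may show a name other than master.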

Now Git is tracking readme.md, and it is time to make our first commit using the command ‘git commit -m "First commit"’.

In the above command, we used the ‘-m’ option to provide a short message describing what is being changed. Git assigns each commit a unique 40-character hexadecimal identifier (a kind of fingerprint for the commit). This identifier will help us recover the content of this commit later. You can view the details of the commit with the command ‘git log’.
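A minimal sketch of the first commit and its log follows. The scratch repo and the "Demo User" identity are assumptions made so the block runs on its own; the identity is stored in this repo only.

```shell
set -eu

# Setup (assumption): scratch repo with a placeholder identity.
repo="$(mktemp -d)/git_corp"
mkdir "$repo"
cd "$repo"
git init --quiet
git config user.name "Demo User"          # placeholder, stored in this repo only
git config user.email "demo@example.com"  # placeholder
echo "Demo project" > readme.md

# Stage the file and record the first commit with a short message.
git add readme.md
git commit -m "First commit"

# Show the author, date, message and full identifier of each commit.
git log

# The identifier of the latest commit can also be printed on its own.
commit_id="$(git rev-parse HEAD)"
echo "$commit_id"
```

You rarely need to type the full identifier; the first few characters are usually enough for Git to find the commit.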

We have one more file (corp.docx) in the folder, and we need to put it in the repo as well. Before that, let us edit readme.md and add this line: “Git repo documentation”. Now we have two files (one new and one modified) in our working tree. To see the difference between the modified version of readme.md and the one already in the repo, simply use the command ‘git diff’.
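The comparison above could be tried as follows. This is a sketch with an assumed scratch repo and placeholder identity; the edit to readme.md is made by appending the line from the shell rather than in an editor.

```shell
set -eu

# Setup (assumption): scratch repo with readme.md already committed.
repo="$(mktemp -d)/git_corp"
mkdir "$repo"
cd "$repo"
git init --quiet
git config user.name "Demo User"          # placeholder, stored in this repo only
git config user.email "demo@example.com"  # placeholder
echo "Demo project" > readme.md
git add readme.md
git commit --quiet -m "First commit"
printf 'placeholder for a Word document\n' > corp.docx   # new, still untracked

# Edit readme.md by appending the new line.
echo "Git repo documentation" >> readme.md

# Show unstaged changes to tracked files; the added line is prefixed with '+'.
# The untracked corp.docx does not appear in this output.
git diff

# Keep the diff for readme.md in a variable (used only for checking here).
changes="$(git diff -- readme.md)"
```

Lines starting with ‘+’ were added and lines starting with ‘-’ were removed relative to the version in the repo.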

Let us put the file ‘corp.docx’ and the edited version of ‘readme.md’ into our repo. You can use the command ‘git add readme.md corp.docx’ or ‘git add .’. Here the dot (.) refers to the current folder, so all new and modified files in the working tree are staged. To commit them to the repo, use the command ‘git commit -m "Second commit: add corp.docx and modified readme.md"’.

As the log shows, our most recent commit is at the top and the first commit is below it.

So now we have two commits in our commit history. The first commit has the file ‘readme.md’ as it was when we first created it. Our second commit has the updated version of ‘readme.md’ and ‘corp.docx’.
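The second commit can be reproduced roughly like this. It is a sketch, assuming a scratch repo, a placeholder identity and a placeholder ‘corp.docx’; the ‘--oneline’ option is used only to keep the log short.

```shell
set -eu

# Setup (assumption): scratch repo with the first commit already made.
repo="$(mktemp -d)/git_corp"
mkdir "$repo"
cd "$repo"
git init --quiet
git config user.name "Demo User"          # placeholder, stored in this repo only
git config user.email "demo@example.com"  # placeholder
echo "Demo project" > readme.md
git add readme.md
git commit --quiet -m "First commit"

# One modified file and one new file in the working tree.
echo "Git repo documentation" >> readme.md
printf 'placeholder for a Word document\n' > corp.docx   # not a real Word file

# Stage everything under the current directory, then commit.
git add .
git commit -m "Second commit: add corp.docx and modified readme.md"

# One line per commit, newest first.
git log --oneline
```

Running ‘git status’ before ‘git add .’ is a useful habit, because the dot stages every change in the folder, including files you may not have meant to include.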

Now let us modify the file corp.docx by adding the line “Document management with Git” (screenshot below).

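Let us see how the current working tree content differs from the one in the repo, using the command ‘git diff’. Because a Word document is a binary file, Git by default reports only that the file differs rather than showing a line-by-line comparison.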

Let us now add the modified version to our repo using the commands ‘git add corp.docx’ and ‘git commit -m "Third commit: modified corp"’.
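The edit, comparison and commit of ‘corp.docx’ might be sketched as below. Several things here are assumptions: the repo is a condensed scratch version of the demo, the identity is a placeholder, and ‘corp.docx’ is a small stand-in file containing a NUL byte so that Git treats it as binary, as it would a real Word document.

```shell
set -eu

# Setup (assumption): scratch repo in which corp.docx is already committed.
repo="$(mktemp -d)/git_corp"
mkdir "$repo"
cd "$repo"
git init --quiet
git config user.name "Demo User"          # placeholder, stored in this repo only
git config user.email "demo@example.com"  # placeholder
# Stand-in file with a NUL byte so Git detects it as binary.
printf 'PK\003\004\000placeholder body\n' > corp.docx
git add corp.docx
git commit --quiet -m "Second commit: add corp.docx"

# Stand-in for editing the document in Word.
printf 'Document management with Git\n' >> corp.docx

# For a binary file, Git prints a "Binary files ... differ" line
# instead of a line-by-line comparison.
git diff
binary_diff="$(git diff -- corp.docx)"

# Stage and commit the modified document.
git add corp.docx
git commit -m "Third commit: modified corp"
```

Even though the diff is not readable for binary files, each committed version is still stored in full and can be recovered later.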

Now we have a history of three commits, and the project is currently in this state: ‘readme.md’ is in its updated state, the same as in the second commit, and ‘corp.docx’ is in its new, modified version. Perhaps you are not satisfied with this edit and wish to replace the current version of ‘corp.docx’ with its previous version. Let us see how we can go back to the older version.

View the log of Git commits. The log shows the commits that affect corp.docx.

As can be seen in the screenshot above, we added ‘corp.docx’ in the second commit, whose unique identifier starts with 9d7f5. We can extract the old version of corp.docx from this snapshot using the command ‘git checkout 9d7f5 -- corp.docx’. Take a look at the screenshot below and see the change in the size of corp.docx.
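The rollback can be tried end to end with the sketch below. It assumes a condensed scratch repo, a placeholder identity and a stand-in ‘corp.docx’. Because commit identifiers differ in every repo, the hypothetical variable ‘old_commit’ is looked up with ‘git rev-parse’ and plays the role of 9d7f5.

```shell
set -eu

# Setup (assumption): scratch repo where corp.docx is added in one commit
# and modified in the next.
repo="$(mktemp -d)/git_corp"
mkdir "$repo"
cd "$repo"
git init --quiet
git config user.name "Demo User"          # placeholder, stored in this repo only
git config user.email "demo@example.com"  # placeholder
printf 'PK\003\004\000placeholder body\n' > corp.docx   # stand-in binary file
git add corp.docx
git commit --quiet -m "Second commit: add corp.docx"
printf 'Document management with Git\n' >> corp.docx
git add corp.docx
git commit --quiet -m "Third commit: modified corp"

# List only the commits that touched corp.docx, newest first.
git log --oneline -- corp.docx

# Abbreviated identifier of the commit before the latest one.
old_commit="$(git rev-parse --short HEAD~1)"

# Size in bytes of the current, modified file.
wc -c < corp.docx

# Replace the working copy of corp.docx with the version from that commit.
git checkout "$old_commit" -- corp.docx

# Size in bytes after the rollback; it is smaller, since the appended line is gone.
wc -c < corp.docx
```

While the third commit is still the latest one, ‘git checkout HEAD -- corp.docx’ brings the modified version back in the same way.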

Of course, if you want, you can bring back the modified version again by extracting it from the third commit, which holds it. Initially, you may find Git difficult to use, but once you get the hang of it you will realise its utility.