Graduate Program KB


Foundations

  • At its core, git is like a key-value store.
    • The key is the SHA1 of the data: a cryptographic hash function takes the data and produces a 40-digit hexadecimal number.
    • The value is the data/content itself.
  • Git stores compressed data in a blob, along with metadata in a header:
    • The identifier: blob
    • the size of the content
    • \0 delimiter
    • content
  • We can ask git for the SHA1 of some content. Here we ask for the SHA1 of the string "Hello, World!":
echo "Hello, World!" | git hash-object --stdin
  • We use the --stdin flag because git hash-object expects to read from a file by default, but we want it to read from standard input.
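As a sketch of how the header and the content combine into the key, the same SHA1 can be rebuilt by hand. This assumes a sha1sum binary is available (coreutils; on macOS shasum would be the rough equivalent) and that git is using its default SHA1 object format.

```shell
# Ask git for the object ID of the content (echo appends a newline).
git_hash=$(echo "Hello, World!" | git hash-object --stdin)

# Rebuild the object by hand: identifier, a space, the size in bytes
# (13 characters + 1 newline = 14), a NUL delimiter, then the content.
manual_hash=$(printf 'blob 14\0Hello, World!\n' | sha1sum | cut -d' ' -f1)

echo "git:    $git_hash"
echo "manual: $manual_hash"
```

Both lines should print the same 40-digit hexadecimal number, since git hashes the header together with the content.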

The Tree

A tree in git represents a directory. It contains pointers (using SHA1):
  • to blobs (files)
  • to other trees (directories)

and metadata:
  • type of pointer (blob or tree)
  • filename or directory name
  • mode (executable file, symbolic link, etc.)

Git stores identical files only once: files with the same content have the same SHA1, so a copied file or folder simply points to the blob that already exists.
This is how git saves space and time.

Commits

A commit points to a tree and contains metadata:
  • Author
  • Committer
  • Date
  • Commit message
  • Parent commit(s)

The SHA1 of the commit is the hash of all of this information. A commit is like a snapshot of the code: what the code looked like at the time of the commit.

  • The commit objects are compressed binary objects and hence can't be viewed with cat.
  • We can view the contents of a commit with the following:
git cat-file -p <commit SHA1>
  • We can view the type of a commit with the following:
git cat-file -t <commit SHA1>
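To see blobs, trees and commits together, here is a minimal sketch in a throwaway repository. The directory, file names and user identity are made up for illustration.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Two files with identical content.
echo "Hello, World!" > hello.txt
cp hello.txt copy.txt
git add .
git commit -q -m "Add hello and copy"

# The commit points to a tree; the tree points to blobs.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
git cat-file -p HEAD
git cat-file -p 'HEAD^{tree}'

# Count the distinct blob SHA1s listed in the tree (third column).
blob_count=$(git cat-file -p 'HEAD^{tree}' | awk '{print $3}' | sort -u | wc -l)
echo "distinct blobs: $blob_count"
```

The tree lists two file names, but both entries carry the same blob SHA1, so the content is stored once.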

Why can't we change commits?

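If you change any data in a commit (its tree, message, author, or parents), its hash changes, so the result is a new commit with a new SHA1. The original commit is left untouched.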

References

References are pointers to commits. They could be:
  • Branches - .git/refs/heads/
  • Tags - .git/refs/tags/
  • HEAD - .git/HEAD
    • Is usually a pointer to the current branch, which points to the most recent commit
  • Remote branches
  • etc.

cat-file

A very useful command for viewing the contents of the .git directory and seeing what's going on under the hood.
Some useful cat commands:

To see where HEAD is pointing to:

cat .git/HEAD

This prints something like: ref: refs/heads/master

To see what the latest commit on a certain branch is:

cat .git/refs/heads/master

cat-file commands:

To see the contents of a commit:

git cat-file -p <commit SHA1>

To see the type of a commit:

git cat-file -t <commit SHA1>

Creating a Branch

To create a branch without switching to it, we can use the following command:

git branch <branch-name>

Random Takeaways

  • You can get a condensed view of git log with the following:
    git --no-pager log --oneline
    
  • You can interactively stage with the following:
    git add -p
    
    This will prompt you with the changes and ask if you want to stage them or not, with options such as (y,n,q,a,d,/,j,J,g,e,?):
    • y: stage this hunk
    • n: do not stage this hunk
    • q: quit; do not stage this hunk or any of the remaining ones
    • a: stage this hunk and all later hunks in the file
    • d: do not stage this hunk or any of the later hunks in the file
    • g: select a hunk to go to
    • /: search for a hunk matching the given regex
    • j: leave this hunk undecided, see next undecided hunk
    • J: leave this hunk undecided, see next hunk
    • k: leave this hunk undecided, see previous undecided hunk
    • K: leave this hunk undecided, see previous hunk
    • s: split the current hunk into smaller hunks
    • e: manually edit the current hunk
    • ?: print help
  • You can unstage your changes with the following:
    git reset
    

Areas and Stashing

  • Save un-committed work

  • The stash is safe from destructive operations

  • Stash changes: git stash

  • List changes: git stash list

  • Show the contents: git stash show stash@{0}

  • Apply the last stash: git stash apply

  • Apply a specific stash: git stash apply stash@{0}

  • Also stash untracked files: git stash --include-untracked

  • Also stash untracked and ignored files: git stash --all

  • Name stashes for easy reference: git stash push -m "message" (the older git stash save "message" is deprecated)

  • Start a new branch from a stash: git stash branch <branch-name>

  • Grab a single file from a stash: git checkout <stash name> -- <file>

Cleaning the Stash:

  • Apply the last stash and remove it: git stash pop
    • The stash is not removed if there is a merge conflict
  • Remove the last stash: git stash drop
  • Remove the nth stash: git stash drop stash@{n}
  • Remove all stashes: git stash clear
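A minimal sketch of a stash round trip in a throwaway repository. The file name, stash message and user identity are made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "committed" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Make an uncommitted change, then stash it.
echo "work in progress" >> notes.txt
git stash push -q -m "wip on notes"
stashed_state=$(cat notes.txt)        # working area is back to the commit
git stash list

# Apply the stash and drop it in one step.
git stash pop -q
restored_state=$(tail -n 1 notes.txt) # the uncommitted line is back
remaining=$(git stash list | wc -l)
```

After the pop, the uncommitted line is back in notes.txt and the stash list is empty.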

Merging and Rebasing

Fast Forward

  • When the branch you are merging into has not diverged from the branch you are merging in
  • A clear path from the tip of the current branch to the tip of the target branch
  • During a fast forward merge, we add the new commits on top of the master branch and then just move the master pointer.
  • If you don't want this, you can force a merge commit with the --no-ff flag
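The difference between a fast-forward and a forced merge commit can be sketched in a throwaway repository. The branch names feature and feature2 are hypothetical, and the default branch name is read from git rather than assumed to be master.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# Fast-forward: the main branch has no new commits of its own,
# so git only moves the branch pointer.
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
git checkout -q "$main_branch"
git merge -q --ff-only feature
ff_tip=$(git rev-parse HEAD)
feature_tip=$(git rev-parse feature)

# --no-ff: force a merge commit even though a fast-forward is possible.
git checkout -q -b feature2
git commit -q --allow-empty -m "more work"
git checkout -q "$main_branch"
git merge -q --no-ff -m "Merge feature2" feature2

# A merge commit prints its own SHA1 plus two parent SHA1s (3 words).
words=$(git rev-list --parents -n 1 HEAD | wc -w)
git log --oneline --graph
```

After the fast-forward the two branch tips are the same commit; after --no-ff the tip is a new commit with two parents.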

Merge Conflicts

  • Attempt to merge, but files have diverged
  • Git stops until the conflicts have been resolved

REUSE RECORDED RESOLUTION (RERERE)

  • Git saves how you resolved a conflict.
  • The next time the same conflict occurs, git reuses the recorded resolution.
  • Useful for rebasing.

Turn it on globally:

git config --global rerere.enabled true

Turn it on, on a per project basis:

git config rerere.enabled true

Look at the merge conflict differences:

git rerere diff

You'll know it worked because when you resolve and commit, you will be notified that the resolution was recorded.


History and Diffs

Good commit messages are really important!

  • Help with code base history integrity
  • Debugging
  • Code reviews
  • Rolling back
  • Associating code with an issue or ticket

To add descriptive multi-line commit messages to your commits, you can use the following command:

git commit -m "Title" -m "Description line 1" -m "Description line 2"

To see the history of your commits, you can use the following command:

git log
git log --since="2 weeks ago"
git log --since="yesterday"

We can see what has been moved or renamed with the --follow flag:

git log --name-status --follow -- <file>

We can search for commits whose message matches a regular expression with:

git log --grep="search term"

We can selectively include or exclude files that have been Added, Deleted, Modified, and many more:

git log --diff-filter=A
git log --diff-filter=D
git log --diff-filter=M

Referencing commits can be done with ^ and ~

git show HEAD^
git show HEAD~2

The first command shows the parent of the current commit.
The second command shows the commit two before the current one.

  • ^ refers to a parent commit. A merge commit has several parents, so you can pick one with a number: HEAD^2 is the second parent. Parents are numbered left to right when looking at a graph.
  • ~ refers to how many commits back you want to go, always following the first parent: HEAD~2 is the grandparent.
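A small sketch of both notations on a linear history of three commits. The repository is a throwaway and the commit messages are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Linear history: one <- two <- three (HEAD)
for msg in one two three; do
  git commit -q --allow-empty -m "$msg"
done

parent_subject=$(git log -1 --format=%s 'HEAD^')
grandparent_subject=$(git log -1 --format=%s 'HEAD~2')

# On a linear history, HEAD^^ and HEAD~2 name the same commit.
if [ "$(git rev-parse 'HEAD^^')" = "$(git rev-parse 'HEAD~2')" ]; then
  same=yes
else
  same=no
fi

echo "HEAD^  -> $parent_subject"
echo "HEAD~2 -> $grandparent_subject"
```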

Show commits and contents:

git show <commit>

Show changes between commits:

git diff <commit1> <commit2>

Unstaged changes:

git diff

Staged changes:

git diff --staged

Branch Clean Up

To see which branches are merged into master or, alternatively, not merged:

git branch --merged master
git branch --no-merged master

Fixing Mistakes

Git Checkout

Is used for restoring working tree files or switching branches.

What happens when you git checkout a branch?

  1. Change HEAD to point to the new branch
  2. Copy the commit snapshot to the staging area.
  3. Update the working area with the branch contents
  • Generally a pretty safe operation

What happens when you git checkout a file (-- file)?

Replace the working area copy with the version from the current staging area.
This operation overwrites the files in the working directory without warning!
If nothing newer is staged for the file, the staging area version is the one from the last commit.

git checkout -- <file_path>

What happens when you git checkout commit -- file?

  1. Update the staging area to match the commit
  2. Update the working area to match the staging area

WARNING: This operation overwrites files in the staging area and working directory without warning!

When we checkout a file from a specific commit

Checking out a file from a specific commit copies it to both the working area and the staging area:

git checkout <commit> -- <file_path>

Restore a deleted file

git checkout <deleting_commit>^ -- <file_path>

WARNING: This operation overwrites files in the staging area and working directory without warning!

Git Clean

Clears your working area by deleting untracked files.
  • Use the --dry-run flag to see what would be deleted
  • The -f flag to do the deletion
  • The -d flag to clean directories
  • This operation is more friendly but cannot be undone

git clean -d --dry-run
git clean -d -f

First command shows everything that would be deleted. Second command actually deletes.
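The dry run followed by the real deletion can be sketched in a throwaway repository. The file and directory names are made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "base"

# Create an untracked file and an untracked directory.
mkdir scratch
echo tmp > scratch/tmp.txt
echo tmp > untracked.txt

# Dry run: only reports what would be removed.
preview=$(git clean -d --dry-run)
echo "$preview"
if [ -e untracked.txt ]; then still_there=yes; else still_there=no; fi

# Real run: deletes untracked files and directories for good.
git clean -d -f -q
if [ ! -e untracked.txt ] && [ ! -d scratch ]; then gone=yes; else gone=no; fi
```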

Git Reset

Is another command that performs different actions depending on the arguments:

  • with a path
  • without a path
  • by default git performs a git reset --mixed

For commits:
  • Moves the HEAD pointer, optionally modifies files

For file paths:
  • Does not move the HEAD pointer, modifies files in the staging area

Modes:
  • --soft just moves the HEAD pointer => git reset --soft HEAD~
  • --mixed moves the HEAD pointer and resets the staging area to match the commit it now points to
  • --hard moves the HEAD pointer, resets the staging area, and also overwrites the working area to match that commit

Git Reset CHEATSHEET

  1. Move HEAD and current branch
  2. Reset the staging area
  3. Reset the working area

  • --soft = (1)
  • --mixed = (1) & (2) - is the default
  • --hard = (1) & (2) & (3)

Git reset with a commit and a file only resets the staging area.
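The three modes can be compared side by side in a throwaway repository. This is a sketch: the file name and contents are made up, and the helper reset --hard back to the second commit is only there to restore the starting point between experiments.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first version" > file.txt
git add file.txt
git commit -q -m "first"
echo "second version, a bit longer" > file.txt
git commit -q -am "second"
second=$(git rev-parse HEAD)

# --soft: only HEAD moves; the change stays staged.
git reset -q --soft HEAD~
soft_staged=$(git diff --cached --name-only)
git reset -q --hard "$second"          # restore the starting point

# --mixed: HEAD moves and the staging area is reset;
# the change survives only in the working area.
git reset -q --mixed HEAD~
mixed_staged=$(git diff --cached --name-only)
mixed_unstaged=$(git diff --name-only)
git reset -q --hard "$second"          # restore the starting point

# --hard: HEAD, staging area and working area all match the older commit.
git reset -q --hard HEAD~
hard_content=$(cat file.txt)
```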

Git Revert

  • The safe version of Reset
  • Always revert if in a shared repository
  • Git revert creates a new commit that introduces the opposite changes from the specified commit.
  • The original commit stays in the repo

Tips:

  • Use revert if you're undoing a commit that has already been shared.
  • Revert does NOT change history

Bringing back a deleted file

We want to find when the file was deleted:

git log --diff-filter=D --oneline -- deleted_file

Locate the commit where the file was deleted and checkout the commit before it.

git checkout <commit_of_deletion_SHA>^ -- deleted_file
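The two steps above can be sketched end to end in a throwaway repository. The file content is made up, and --format=%H is used instead of --oneline only so the script gets a full hash to work with.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "keep me" > deleted_file
git add deleted_file
git commit -q -m "Add file"
git rm -q deleted_file
git commit -q -m "Delete file"

# Find the commit that deleted the file.
deleting_commit=$(git log --diff-filter=D --format=%H -n 1 -- deleted_file)

# Check the file out from the parent of that commit.
git checkout "$deleting_commit^" -- deleted_file

restored=$(cat deleted_file)
staged=$(git diff --cached --name-only)
echo "restored content: $restored"
```

The file comes back in both the working area and the staging area, ready to be committed.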

Bring back a commit without rewriting history - REVERT

Find the commit where you deleted the file and use that hash with a revert command:

git revert <SHA>

This opens up an editor. Double check to see if it is doing what you want.
Save then exit when you are happy. This will create a new commit with the file you wanted.


Rebase and Amend

Amend a commit

  • Is a quick and easy shortcut that lets you make changes to the previous commit
  • Will add your changes to the last commit
  • For example if you:
    • Commit a file but you forgot a file that should be in that commit
    • Stage the missed file and commit it with the --amend flag
    • This adds it to the previous commit
  • The amended commit is a new commit that replaces the original. The original becomes dangling and will eventually be garbage collected.

Rebase

  • If branches have diverged and we don't want a messy merge commit in our history.
  • We can pull in all the latest changes from master, and apply our commits on top of them by changing the parent commit of our commits
    Rebase = give a commit a new parent
  • Rebase essentially rewinds HEAD to the new base, then replays your commits on top of it.
  • We can use the -i flag for a more interactive rebase
  • rebasing with the --exec flag is good for running a program (tests) after each commit

My Understanding

  • If your branch has diverged and you're on that branch (ie. HEAD is pointing to your branch)
  • git rebase master will copy your diverged commits on top of the master branch
  • HEAD will still be pointing to your branch, which now starts from the tip of master. master itself has not moved.
  • To bring master up to date, switch to it and run git rebase diverged-branch (or git merge diverged-branch). This is a fast-forward.
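A sketch of that flow in a throwaway repository. The file names are made up, and the default branch name is read from git rather than assumed to be master.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# Let the two branches diverge.
git checkout -q -b diverged-branch
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"
git checkout -q "$main_branch"
echo main > main.txt
git add main.txt
git commit -q -m "main work"

# Rebase the diverged branch onto the main branch.
git checkout -q diverged-branch
git rebase -q "$main_branch"
current_branch=$(git symbolic-ref --short HEAD)
new_parent=$(git rev-parse 'HEAD^')
main_tip=$(git rev-parse "$main_branch")

# Bring the main branch up to date: a plain fast-forward.
git checkout -q "$main_branch"
git merge -q --ff-only diverged-branch
main_after=$(git rev-parse HEAD)
branch_tip=$(git rev-parse diverged-branch)
```

After the rebase, the parent of the replayed commit is the tip of the main branch, and you are still on diverged-branch.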

Power of Rebasing

  • Commits can be:
    • edited
    • removed
    • combined
    • re-ordered
    • inserted
  • Before they're "replayed" on top of the new HEAD
  • Rebase options:
    • pick - keeps the commit
    • reword - keep the commit, just change the message
    • edit - keep the commit, but stop to edit more than the message
    • squash - combine the commit with the previous commit. Stop to edit the message
    • fixup - combine the commit with previous one, keep previous commit message
    • exec - run the command on this line after picking the previous commit
    • drop - remove the commit

Pro Tip

  • Before you rebase / fixup / squash / reorder:
    • Make a copy of your current branch: git branch my_branch_backup
  • git branch will make a new branch, without switching to it
  • If rebase "succeeds" but you messed up...
  • git reset my_branch_backup --hard

Amending

What if we want to amend an arbitrary commit?

  1. git add new files
  2. git commit --fixup SHA - this creates a new commit, the message starts with 'fixup!'
  3. git rebase -i --autosquash SHA^ - ^ gets the parent
  4. git will generate the right todos for you! just save and quit.
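The steps above can be sketched non-interactively in a throwaway repository. Setting GIT_SEQUENCE_EDITOR=true is an assumption made so the generated todo list is accepted without opening an editor; the file names are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for name in a b c; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "Add $name"
done

# The commit we want to amend is "Add b", one before HEAD.
target=$(git rev-parse 'HEAD~1')

# 1. Stage the forgotten file.
echo forgotten > b_extra.txt
git add b_extra.txt

# 2. Create the fixup commit (its message starts with "fixup!").
git commit -q --fixup "$target"

# 3. Rebase from the parent of the target; autosquash orders the todo list.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target^"

commit_count=$(git rev-list --count HEAD)
files_in_b=$(git show --name-only --format= 'HEAD~1')
git log --oneline
```

The history is back to three commits, and "Add b" now also contains b_extra.txt.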

Good Advice to Follow

  • Commit often, perfect later, publish once
  • When working locally
    • commit whenever you make changes
    • It'll help you be more productive
  • Before pushing work to a shared repo
    • Rebase to clean up commit history
  • Never rebase public history

Forks and Remote Repos

Remotes

  • Is a git repo stored elsewhere (on the web, github, etc)
  • Origin is the default name git gives the server you cloned from
  • Cloning a remote repo from a URL will fetch the whole repo, and make a local copy in your .git folder
  • You may have different privileges for a remote
  • To view your remotes:
    git remote -v
    

Cloning

To clone a remote repo to your local machine

git clone git@github.com:username/repo/path.git

Fork

  • A fork is a copy of a repo that's stored in your GitHub account
  • You can clone your fork to your local computer with no restrictions because you own the copy
  • You can propose changes from your fork to the original project via pull requests
  • To keep your fork up to date with the original project, you need to set up an upstream.

Upstream

  • The upstream repo is the base repo you created a fork from.
  • This isn't set up by default, you need to manually set it up
  • Once you add your upstream, you can pull changes that have been made to the remote repo after you have forked it
git remote add upstream https://github.com/ORIG_OWNER/REPO.git

GitHub Workflow

Usually follows a triangular workflow:
  • Fetch from upstream
  • Push changes to your fork
  • Propose changes to the upstream via a pull request

Tracking branch

  • To track a branch you can tie it to an upstream branch
    • allows you to push / pull with no arguments
  • To checkout a remote branch with tracking:
    git checkout -t origin/feature
    
  • Tell git which branch to track the first time you push:
    git push -u origin feature
    
  • Useful command to show which upstream branch you are tracking locally: git branch -vv

Fetch

  • Git fetch is important for keeping your local repo up to date with a remote
  • It pulls down all the changes that happened on the server and updates your remote-tracking branches (e.g. origin/master)
  • But, it doesn't change your local branches or working area

Pull

  • Pulls down the changes from the remote repo to your local repo, and merges them with a local branch.
  • Under the hood of pull
    git pull = git fetch && git merge
    
  • If you have local commits and changes also happened upstream, git creates a merge commit
  • Otherwise it will fast-forward (ff)
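The difference between fetch and pull can be sketched with a local bare repository standing in for the server. The clone names alice and bob are hypothetical, and everything lives in a temporary directory.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

# First clone creates a commit and pushes it.
git clone -q "$work/remote.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git push -q -u origin HEAD

# Second clone starts from that state.
git clone -q "$work/remote.git" "$work/bob"

# A new commit appears on the server.
cd "$work/alice"
git commit -q --allow-empty -m "second"
git push -q

cd "$work/bob"
before=$(git rev-parse HEAD)

# fetch: downloads the new commit, but the local branch does not move.
git fetch -q origin
after_fetch=$(git rev-parse HEAD)
upstream_tip=$(git rev-parse '@{upstream}')

# pull: fetch plus merge; with no local commits this is a fast-forward.
git pull -q --ff-only
after_pull=$(git rev-parse HEAD)
```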

Push

  • Pushing sends your changes to the remote repo
  • git only allows you to push if the remote branch can be fast-forwarded to your commits, i.e. you already have everything that is on the remote
  • Tip: to see commits which haven't been pushed upstream yet: git cherry -v

Pull --rebase

  • Will fetch, update your local branch to copy the upstream branch, then replay any commits you made via rebase
  • When you open a PR, there will be no unsightly merge commits

Pull Request Tips

Before opening a PR

  • Keep commit history clean and neat, Rebase if needed
  • Run the project's tests on your code
  • Pull in upstream changes (preferably via rebase to avoid merge commits)
  • check for a CONTRIBUTING (.md/.txt) file in the project root

After Opening the PR

  • Explain your changes thoroughly in the PR
  • Link to any open issues that your PR might fix
  • Check back for comments from the maintainers

Danger Zone

Local destructive operations

  • Overwrites the working area copy with the staging area version, discarding unstaged changes
    git checkout -- <file>
    
  • Will overwrite changes that are staged and in the working area
    git reset --hard
    
  • Unless changes are stashed there is no way of getting them back.
  • TIP: use git stash --include-untracked to also include untracked files in your stash

Remote destructive operations

  • Operations that can rewrite history:
    • rebase
    • amend
    • reset
  • If your code is hosted or shared never run git push -f

Recover Lost Work

  • Use ORIG_HEAD
    • The commit HEAD was pointing to before a:
      • reset
      • merge
    • using this method can be done via
      git reset --merge ORIG_HEAD
      
      • The --merge flag is used to preserve any uncommitted changes
  • Check repo for copies
    • github
    • coworker

Using Git Reflog and '@' Syntax

  • Git keeps unreferenced commits for a while (at least around 2 weeks by default) before garbage collecting them
  • If you need to go back in time, and find a commit that's no longer referenced, you can look in the reflog.
  • The syntax of reflog is like so:
    • HEAD@{2} means the value of HEAD 2 moves ago
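A sketch of recovering a commit after a hard reset, in a throwaway repository with made-up commit messages.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
lost=$(git rev-parse HEAD)

# "Lose" the latest commit: no branch points to it any more.
git reset -q --hard HEAD~

# The reflog still remembers where HEAD was one move ago.
git reflog
previous=$(git rev-parse 'HEAD@{1}')

# Move the branch back to the lost commit.
git reset -q --hard 'HEAD@{1}'
recovered=$(git rev-parse HEAD)
```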
