Understanding GitHub

It is always advisable to become familiar with a domain before processing or analyzing its data. So, before looking at how to extract, process, and analyze data from GitHub, we will spend some time on GitHub itself: its vision and the major features used by software and technology enthusiasts across the world.

As mentioned before, GitHub is at its core a web-based service for hosting Git repositories. You can think of a repository as a directory or folder containing subdirectories, code files, and other assets such as images, media, and documents. People build software by collaborating on repositories that they create and maintain. GitHub promotes open source principles, and many open source projects are developed, improved, and maintained on it. Anyone can become an open source contributor by talking to the members of a project maintained in a repository, adhering to its coding standards, and being open to collaborative development, reviews, and feedback.

Because GitHub is built on Git, users get all the features of distributed source code management and version control. The following are some of the concepts and terms widely used in GitHub and the collaborative software development community:

  • A repository is a container that holds all the code and assets needed for a software product or project.
  • Repository forks are copies of the parent (original) repository. Any user can fork a repository and then modify the copy to add enhancements that suit their own needs, as long as they comply with the license.
  • Repository stars are GitHub's version of Facebook likes or +1 from Google+. Users can star a repository if they like it, which makes them a stargazer of that repository. Stargazer and fork counts are often used to find trending and popular repositories.
  • Users can also create issues, such as bug reports and feature requests, and track them over time.
  • Each repository usually has one or more languages, which are the programming languages used to build the project.
  • New content is added to a repository through commits.
  • When multiple users modify and add content to a repository, they usually send pull requests that contain their commits with the proposed modifications.
  • Pull requests are usually merged into the repository after review and after any conflicts have been resolved.
  • A repository might have multiple branches, where each branch can hold the same or different code and assets. A branch usually focuses on specific new content, features, or enhancements, which can then be merged into the repository's default branch (traditionally named master).
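The terms above can be made concrete with a small data model. The following is a minimal sketch in Python, not anything GitHub itself provides: the `Repository` class, the `rank_by_popularity` function, and the sample repositories are all hypothetical and exist only to show how stargazer and fork counts could be used to order repositories by popularity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repository:
    """Hypothetical record holding the repository attributes described above."""
    name: str
    stars: int = 0   # number of stargazers
    forks: int = 0   # number of forks
    languages: List[str] = field(default_factory=list)


def rank_by_popularity(repos):
    """Sort repositories by star count, breaking ties on fork count."""
    return sorted(repos, key=lambda r: (r.stars, r.forks), reverse=True)


# Made-up sample data, for illustration only.
repos = [
    Repository("example/alpha", stars=120, forks=30, languages=["Python"]),
    Repository("example/beta", stars=950, forks=210, languages=["R", "C++"]),
    Repository("example/gamma", stars=120, forks=45, languages=["JavaScript"]),
]

ranked = rank_by_popularity(repos)
for repo in ranked:
    print(repo.name, repo.stars, repo.forks)
```

Sorting on the tuple `(stars, forks)` is just one possible design choice; a real analysis might weight the two counts differently or look at how they change over time.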

GitHub also offers a variety of interesting features and capabilities besides code hosting, version control, and management. Some of its popular features include the following:

  • Ability to collaborate on, develop, and manage code in repositories that multiple users can work on at the same time without losing data.
  • Ability to build attractive Markdown-based documentation, wikis, and README files for code repositories.
  • GitHub Pages to host personal websites or build websites for projects and repositories.
  • Ability to track issues, bugs, and feature requests for various repositories and help improve and evolve software with time.
  • Ability to search through code, repository lists, and users.
  • Ability to browse curated collections of repositories as well as repositories trending by language over time.
  • Visualizations and statistics with regard to commits, code frequency, punch cards, contributors, members, and networks.
  • Feature-rich issue and enhancement tracking, with the ability to review code, add comments, tag contributors and users, and assign tasks.
  • E-mail notification capabilities.
  • Ability to view CSV files, Jupyter notebooks, PSD files, images, and PDF documents directly on GitHub.

As the features and capabilities listed above suggest, a lot can be done with GitHub, and it has done a great deal to make coding and software development more social, fun, and collaborative. Indeed, GitHub's official trademarked mascot, the Octocat, is quite popular in the developer community; it is shown in the following figure:

The very popular GitHub mascot: Octocat

GitHub also provides various other features, including public and private repositories and GitHub Enterprise, an offering for enterprise software development that is usually hosted in private environments behind corporate firewalls. GitHub also has gists for hosting short snippets of code. Besides this, it also has Speaker Deck, which can be used for hosting slide and presentation decks. Now that we are well acquainted with GitHub and collaborative software development, let's start our journey by retrieving some data from GitHub.
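As a preview of what such data looks like, the sketch below (in Python, chosen here purely for illustration) shows how the concepts from this section map onto fields of a repository record from the GitHub REST API. It makes no network call: `sample_response` is a hand-written stand-in with invented values, and the helper names `repo_url` and `summarize_repo` are hypothetical. The field names `full_name`, `stargazers_count`, `forks_count`, `language`, and `open_issues_count` are the ones the API uses for a single repository.

```python
import json

API_ROOT = "https://api.github.com"


def repo_url(owner, repo):
    """Build the REST endpoint URL for a single repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}"


def summarize_repo(payload):
    """Pick out the fields that map onto the concepts discussed above."""
    return {
        "name": payload["full_name"],
        "stars": payload["stargazers_count"],
        "forks": payload["forks_count"],
        "language": payload.get("language"),  # may be absent or null
        "open_issues": payload["open_issues_count"],
    }


# Hand-written stand-in for an API response; the values are invented.
sample_response = """
{
  "full_name": "example-owner/example-repo",
  "stargazers_count": 42,
  "forks_count": 7,
  "language": "R",
  "open_issues_count": 3
}
"""

summary = summarize_repo(json.loads(sample_response))
print(repo_url("example-owner", "example-repo"))
print(summary)
```

A real response contains many more fields than this; the sketch keeps only those that correspond to stars, forks, languages, and issues as described in this section.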
