It is always advisable to become familiar with a domain before processing or analyzing its data. Hence, before we look at how to extract, process, and analyze data from GitHub, we will spend some time understanding GitHub itself, its vision, and the major features used across the world by software and technology enthusiasts.
As mentioned before, GitHub is at its core a web-based service for hosting Git repositories. You can think of a repository as a directory or folder containing subdirectories, code files, and other assets such as images, media, and documents. People build software by collaborating on the repositories they create and maintain. GitHub promotes open source principles, and many open source projects are developed, improved, and maintained on it. Anyone can become an open source contributor by talking to the members of a project maintained in a repository, adhering to its coding standards, and being open to collaborative development, reviews, and feedback.
Because GitHub is built on Git, its users get all the features of distributed source code management and version control. The following are some of the concepts and terminology widely used in GitHub and the collaborative software development community:
GitHub also offers a variety of interesting features and capabilities besides code hosting, version control, and management. Some of its popular features include the following:
As the features and capabilities we listed suggest, a lot can be done with GitHub, and it has done much to make coding and software development more social, fun, and collaborative! Indeed, GitHub's official trademark mascot, the Octocat, is quite popular in the developer community, and its logo can be seen in the following figure:
GitHub also provides various other features, including public and private repositories and enterprise software development capabilities in the form of GitHub Enterprise, which is usually hosted in private enterprise environments behind corporate firewalls. GitHub also has gists for hosting short snippets of code, and Speaker Deck for hosting slide and presentation decks. Now that we are well acquainted with GitHub and collaborative software development, let's start our journey by retrieving some data from GitHub.
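As a first taste of what retrieving data looks like, here is a minimal Python sketch. It assumes we query GitHub's public REST API at `https://api.github.com`, where a repository is addressed by its owner and name. The helper names `repo_api_url` and `fetch_repo_metadata` are hypothetical and chosen only for this illustration; they are not part of any library.

```python
import json
import urllib.request
from urllib.parse import quote

# Root of GitHub's public REST API.
API_ROOT = "https://api.github.com"


def repo_api_url(owner: str, repo: str) -> str:
    """Build the API URL that describes a single repository.

    Hypothetical helper for illustration: a repository is identified
    by the account that owns it and by the repository name.
    """
    if not owner or not repo:
        raise ValueError("owner and repo must be non-empty")
    # Percent-encode both parts so unusual characters cannot break the path.
    return f"{API_ROOT}/repos/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_repo_metadata(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Download a repository's metadata as a dictionary.

    Hypothetical helper: it sends an unauthenticated request, which
    GitHub rate-limits, so it is suitable only for light exploration.
    """
    request = urllib.request.Request(
        repo_api_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds the URL; call fetch_repo_metadata to hit the network.
    print(repo_api_url("octocat", "Hello-World"))
```

Building the URL is kept separate from sending the request so that the addressing scheme can be inspected without network access. Any real analysis would add authentication and error handling on top of this.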