Extracting the financial statements and notes dataset

The following code downloads and extracts all historical filings contained in the Financial Statement and Notes (FSN) datasets for the given range of quarters (see edgar_xbrl.ipynb for additional details):

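from io import BytesIO
from pathlib import Path
from zipfile import ZipFile

import requests

SEC_URL = 'https://www.sec.gov/files/dera/data/financial-statement-and-notes-data-sets/'
data_path = Path('data')  # assumed local target directory; adjust as needed

first_year, this_year, this_quarter = 2014, 2018, 3
past_years = range(first_year, this_year)
# all four quarters of each past year, plus the quarters of the current year so far
filing_periods = [(y, q) for y in past_years for q in range(1, 5)]
filing_periods.extend([(this_year, q) for q in range(1, this_quarter + 1)])

for yr, qtr in filing_periods:
    filing = f'{yr}q{qtr}_notes.zip'
    path = data_path / f'{yr}_{qtr}' / 'source'
    path.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    response = requests.get(SEC_URL + filing).content
    # unzip in memory and write each archive member to the quarter's source folder
    with ZipFile(BytesIO(response)) as zip_file:
        for file in zip_file.namelist():
            local_file = path / file
            with local_file.open('wb') as output:
                output.write(zip_file.read(file))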
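The quarter list drives everything that follows, so it can help to check it in isolation before starting the downloads. The following is a minimal sketch; the helper names filing_periods and archive_name are hypothetical and only wrap the logic shown above:

```python
def filing_periods(first_year, last_year, last_quarter):
    """Return (year, quarter) pairs from Q1 of first_year through
    last_quarter of last_year (hypothetical helper, not part of the notebook)."""
    periods = [(y, q) for y in range(first_year, last_year) for q in range(1, 5)]
    periods.extend((last_year, q) for q in range(1, last_quarter + 1))
    return periods


def archive_name(year, quarter):
    """Build the archive file name using the naming pattern from the code above."""
    return f'{year}q{quarter}_notes.zip'


periods = filing_periods(2014, 2018, 3)
names = [archive_name(y, q) for y, q in periods]
print(len(periods), names[0], names[-1])
# prints: 19 2014q1_notes.zip 2018q3_notes.zip
```

Four full years plus three quarters of 2018 give 19 archives, which is a quick sanity check on the range arithmetic.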

The data is fairly large. To enable faster access than the original text files permit, it is better to convert the text files to the binary, columnar Parquet format (see the Efficient data storage with pandas section in this chapter for a performance comparison of various data-storage options compatible with pandas DataFrames):

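from pathlib import Path
import pandas as pd

for f in data_path.glob('**/*.tsv'):
    file_name = f.stem + '.parquet'
    path = Path(f.parents[1]) / 'parquet'
    path.mkdir(exist_ok=True)  # make sure the target folder exists
    df = pd.read_csv(f, sep='\t', encoding='latin1', low_memory=False)  # tab-delimited
    df.to_parquet(path / file_name)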
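The conversion loop relies on f.parents[1] to step from a file in the source folder up to its quarter folder. A small sketch of that path mapping follows, using pure paths so that nothing touches the disk; the function name parquet_target and the example path are assumptions made for illustration:

```python
from pathlib import PurePosixPath


def parquet_target(tsv_path):
    """Map <root>/<quarter>/source/<name>.tsv to <root>/<quarter>/parquet/<name>.parquet
    (hypothetical helper mirroring the loop above)."""
    tsv_path = PurePosixPath(tsv_path)
    quarter_dir = tsv_path.parents[1]  # parents[0] is 'source', parents[1] is the quarter folder
    return quarter_dir / 'parquet' / (tsv_path.stem + '.parquet')


# Example path is illustrative only; actual file names depend on the archive contents
target = parquet_target('data/2018_3/source/num.tsv')
print(target)
# prints: data/2018_3/parquet/num.parquet
```

Keeping the Parquet files in a sibling folder leaves the original text files untouched, so the conversion can be repeated if needed.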

For each quarter, the FSN data is organized into eight file sets that contain information about submissions, numbers, taxonomy tags, presentation, and more. Each dataset consists of rows and fields and is provided as a tab-delimited text file:

File  Dataset       Description
----  ------------  ---------------------------------------------------------------
SUB   Submission    Identifies each XBRL submission by company, form, date, and so on
TAG   Tag           Defines and explains each taxonomy tag
DIM   Dimension     Adds detail to numeric and plain text data
NUM   Numeric       One row for each distinct data point in filing
TXT   Plain text    Contains all non-numeric XBRL fields
REN   Rendering     Information for rendering on SEC website
PRE   Presentation  Detail on the tag and number presentation in primary statements
CAL   Calculation   Shows arithmetic relationships among tags
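Because the datasets are separate files, most analyses need to combine them, for example by attaching company information from SUB to the data points in NUM. The sketch below illustrates the idea with the standard library on two tiny in-memory tab-delimited samples. The rows are made up, and the column names (adsh as the shared submission key, plus name, form, tag, ddate, uom, and value) are assumptions that should be checked against the SEC's dataset documentation:

```python
import csv
from io import StringIO

# Made-up sample rows for illustration only; not actual SEC data
sub_tsv = (
    "adsh\tcik\tname\tform\n"
    "0000000000-18-000001\t1\tEXAMPLE CORP\t10-Q\n"
)
num_tsv = (
    "adsh\ttag\tddate\tuom\tvalue\n"
    "0000000000-18-000001\tRevenues\t20180630\tUSD\t1000\n"
    "0000000000-18-000001\tNetIncomeLoss\t20180630\tUSD\t100\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(StringIO(text), delimiter='\t'))


# Index submissions by the assumed key column, then enrich each numeric row
submissions = {row['adsh']: row for row in read_tsv(sub_tsv)}
facts = [
    {**row,
     'name': submissions[row['adsh']]['name'],
     'form': submissions[row['adsh']]['form']}
    for row in read_tsv(num_tsv)
]

for fact in facts:
    print(fact['name'], fact['form'], fact['tag'], fact['value'])
```

With the full datasets, the same join would more naturally be done with a pandas merge on the shared key after loading the Parquet files.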

 

 
