Data set basics
A data set is a collection of logically related data. It can be a source program, a library of macros, or a file of data records used by a processing program. Data records (also called logical records) are the basic unit of information used by a processing program. This chapter introduces you to the data set types available for use on z/OS, the devices they can be allocated onto, and their characteristics.
By placing your data into volumes of organized data sets, you can save and process the data efficiently. You can also print the contents of a data set, or display the contents on a workstation.
This chapter includes the following sections:
2.1 Data sets on storage devices
A z/OS data set is a collection of logically related data records stored on one or a set of volumes. A data set can be, for example, a source program, a library of macros, or a file of data records used by a processing program. You can print a data set or display it on a workstation. The logical record is the basic unit of information used by a processing program.
 
Note: As an exception, the z/OS UNIX services component supports zSeries file system (zFS) data sets, where the collection is of bytes and the concept of logically related data records is not used.
2.1.1 Storage devices
Data can be stored on a magnetic direct access storage device (DASD), magnetic tape volume, or optical media. As mentioned previously, the term DASD applies to disks or simulated equivalents of disks. All types of data sets can be stored on DASD, but only sequential data sets can be stored on magnetic tape. The types of data sets are described in 2.2, “Data set types” on page 11.
2.1.2 DASD volumes
Each block of data on a DASD volume has a distinct location and a unique address, making it possible to find any record without extensive searching. You can store and retrieve records either directly or sequentially. Use DASD volumes for storing data and executable programs, including the operating system itself, and for temporary working storage. You can use one DASD volume for many separate data sets, and reallocate or reuse space on the volume.
2.1.3 Tape volumes
A tape volume provides serial access to the media for read and write operations. Because only sequential access is possible, only sequential data sets can be stored on tape. A tape can be physical, such as the model 3592JD used in automated tape libraries (ATLs), or logical, emulated by hardware such as the IBM Virtualization Engine (VE).
The following sections discuss the logical attributes of a data set, which affect how a data set can be created, which devices can store it, and what access method can be used to access it.
2.2 Data set types
The data organization that you choose depends on your applications and the operating environment. z/OS allows you to use temporary or permanent data sets, and to use several ways to organize files for data to be stored on magnetic media, as described in this section.
2.2.1 Non-VSAM data sets
Non-VSAM data sets are the most common type of data set found on z/OS. They are a collection of fixed-length or variable-length records, which can be physically stored in groups (blocks), or individually. There are several different types of non-VSAM data sets:
Physical sequential data set
Physical sequential (PS) data sets contain logical records that are stored in physical order. New records are appended to the end of the data set. A sequential data set can be in extended format or not.
Partitioned data set
Partitioned data sets (PDSs) are similar in organization to a library and are often referred to as libraries. Normally, a library contains a great number of “books” and sorted directory entries that are used to locate them. In a partitioned data set, the “books” are called members, and they are located through entries in a directory.
The members are individual sequential data sets and can be read or written sequentially after they have been located through the directory. Partitioned data sets can exist only on DASD and are not eligible for multivolume allocation. Each member has a unique name that is one to eight characters in length and is stored in a directory that is part of the data set.
Space for the directory is expressed in 256 byte blocks. Each block contains from 3 to 21 entries, depending on the length of the user data field. If you expect 200 directory entries, request at least 30 blocks. Any unused space on the last track of the directory is wasted unless there is enough space left to contain a block of the first member.
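As a rough check on the directory sizing guidance above, the bounds implied by 3 to 21 entries per block can be worked out directly. The following Python sketch is illustrative only: the function name is hypothetical, and it considers nothing beyond the entries-per-block range stated in the text.

```python
import math

# Entries per 256-byte directory block, as stated in the text:
# from 3 to 21, depending on the length of the user data field.
MIN_ENTRIES_PER_BLOCK = 3
MAX_ENTRIES_PER_BLOCK = 21

def directory_block_bounds(expected_entries):
    """Return (fewest, most) directory blocks needed for the entry count."""
    fewest = math.ceil(expected_entries / MAX_ENTRIES_PER_BLOCK)
    most = math.ceil(expected_entries / MIN_ENTRIES_PER_BLOCK)
    return fewest, most

print(directory_block_bounds(200))  # (10, 67)
```

For 200 entries, the requirement lies somewhere between 10 and 67 blocks. A request of 30 blocks falls inside that range and is sufficient when each block holds about seven or more entries on average.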
Partitioned data set extended
Partitioned data set extended (PDSE) is a type of data set organization that improves the PDS organization. It has an improved indexed directory structure and a different member format. You can use PDSE for source (programs and text) libraries, macros, and program object (the name of executable code when loaded in PDSE) libraries.
Logically, a PDSE directory is similar to a PDS directory. It consists of a series of directory records in a block. Physically, it is a set of “pages” at the front of the data set, plus additional pages interleaved with member pages. Five directory pages are initially created at the same time as the data set.
New directory pages are added, interleaved with the member pages, as new directory entries are required. A PDSE always occupies at least five pages of storage. It cannot be overwritten by being opened for sequential output.
There are several advantages of using PDSE data sets over regular PDS. For a list of the differences, see IBM z/OS DFSMS Using data sets, SC23-6855.
You can define several attributes when allocating a new non-VSAM file, including data set size, record length, record format, and so on. For a complete list of attributes and their functions, see IBM z/OS DFSMS Using data sets, SC23-6855.
2.2.2 VSAM data sets
Virtual Storage Access Method (VSAM) data sets are formatted differently than non-VSAM data sets. Except for linear data sets, VSAM data sets are collections of records, grouped into control intervals. The control interval is a fixed area of storage space in which VSAM stores records. The control intervals are grouped into contiguous areas of storage called control areas. To access VSAM data sets, use the VSAM access method.
VSAM arranges records by an index key, by a relative byte address, or by a relative record number. VSAM data sets are cataloged for easy retrieval.
Any type of VSAM data set can be in extended format. Extended-format data sets have a different internal storage format than data sets that are not extended. This storage format gives extended-format data sets additional usability characteristics and possibly better performance due to striping. You can request that an extended-format key-sequenced data set be in compressed format. Extended-format data sets must be SMS-managed.
For more information about VSAM data sets, see 4.4, “Virtual Storage Access Method” on page 48.
Key-sequenced data set
In a key-sequenced data set (KSDS), logical records are placed in the data set in ascending collating sequence by key. The key contains a unique value, which determines the record's collating position in the cluster. The key must be in the same position (offset) in each record.
The key field must be contiguous and each key’s contents must be unique. After it is specified, the value of the key cannot be altered, but the entire record can be deleted. When a new record is added to the data set, it is inserted in its logical collating sequence by key.
A KSDS has a data component and an index component. The index component tracks the used keys and is used by VSAM to retrieve a record from the data component quickly when a request is made for a record with a certain key.
A KSDS can have fixed or variable length records. A KSDS can be accessed in sequential mode, direct mode, or skip sequential mode (meaning that you process sequentially, but directly skip portions of the data set).
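The KSDS behavior described above (unique keys, insertion in collating sequence, and sequential, direct, or skip sequential access) can be illustrated with a small in-memory model. This is a conceptual Python sketch, not how VSAM is implemented: the class and method names are hypothetical, a sorted list stands in for the index component, and a dictionary stands in for the data component.

```python
import bisect

class KeySequencedSketch:
    """Toy model of a KSDS: unique keys kept in ascending order."""

    def __init__(self):
        self._keys = []     # stands in for the index component (sorted keys)
        self._records = {}  # stands in for the data component

    def insert(self, key, record):
        # Each key's contents must be unique.
        if key in self._records:
            raise KeyError(f"duplicate key: {key!r}")
        # The new record takes its place in collating sequence by key.
        bisect.insort(self._keys, key)
        self._records[key] = record

    def read_direct(self, key):
        """Direct mode: retrieve one record by its key."""
        return self._records[key]

    def read_sequential(self, start_key=None):
        """Sequential mode, or skip sequential when start_key is given:
        position at start_key, then read onward in key order."""
        start = 0 if start_key is None else bisect.bisect_left(self._keys, start_key)
        return [(k, self._records[k]) for k in self._keys[start:]]

ksds = KeySequencedSketch()
for key, rec in [("C003", "third"), ("A001", "first"), ("B002", "second")]:
    ksds.insert(key, rec)
print(ksds.read_sequential())        # records come back in key order
print(ksds.read_sequential("B002"))  # skips directly to key B002
```

The records are returned in key order regardless of the order in which they were inserted, which is the property that distinguishes a KSDS from an entry-sequenced data set.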
Entry-sequenced data set
An entry-sequenced data set (ESDS) is comparable to a sequential data set. It contains fixed-length or variable-length records. Records are sequenced by the order of their entry in the data set, rather than by a key field in the logical record. All new records are placed at the end of the data set. An ESDS cluster has only a data component.
Records can be accessed sequentially or directly by relative byte address (RBA). When a record is loaded or added, VSAM indicates its RBA. The RBA is the offset of the first byte of the logical record from the beginning of the data set. The first record in a data set has an RBA of 0, the second record has an RBA equal to the length of the first record, and so on. The RBA of a logical record depends only on the record's position in the sequence of records. The RBA is always expressed as a full-word binary integer.
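The RBA rule described above amounts to a running total of record lengths. The following Python sketch follows that simplified model only; the function name is hypothetical, and the sketch ignores control interval boundaries and control information, which can also affect offsets in a real data set.

```python
from itertools import accumulate

def record_rbas(record_lengths):
    """Return the RBA of each record: the total length of all records before it."""
    # accumulate(..., initial=0) yields 0, len1, len1+len2, ...
    # The final value is the total size, not a record RBA, so drop it.
    return list(accumulate(record_lengths, initial=0))[:-1]

print(record_rbas([80, 120, 50]))  # [0, 80, 200]
```

The first record is at RBA 0, the second at the length of the first (80), and the third at the combined length of the first two (200).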
Although an entry-sequenced data set does not contain an index component, alternate indexes are allowed. You can build an alternate index with keys to track these RBAs.
Relative-record data set
A relative-record data set (RRDS) consists of a number of preformatted, fixed-length slots. Each slot has a unique relative record number, and the slots are sequenced by ascending relative record number. Each (fixed-length) record occupies a slot, and it is stored and retrieved by the relative record number of that slot. The position of a data record is fixed, so its relative record number cannot change.
An RRDS cluster has a data component only. Random load of an RRDS requires a user program implementing such logic.
Linear data set
A linear data set (LDS) is a VSAM data set that contains data that can be accessed as byte-addressable strings in virtual storage. A linear data set does not have the embedded control information that other VSAM data sets hold.
The primary difference among these types of data sets is how their records are stored and accessed. VSAM arranges records by an index key, by relative byte address, or by relative record number. Data organized by VSAM must be cataloged and is stored in one of four types of data sets, depending on an application designer option.
2.2.3 z/OS UNIX files
z/OS UNIX System Services (z/OS UNIX) enables applications and even z/OS to access UNIX files. UNIX applications also can access z/OS data sets. You can use the hierarchical file system (HFS), z/OS Network File System (z/OS NFS), zSeries file system (zFS), and temporary file system (TFS) with z/OS UNIX. You can use the BSAM, QSAM, BPAM, and VSAM access methods to access data in UNIX files and directories. z/OS UNIX files are byte-oriented, similar to objects.
Hierarchical file system
On DASD, you can define an HFS data set on the z/OS system. Each HFS data set contains a hierarchical file system. Each hierarchical file system is structured like a tree with subtrees, and consists of directories and all their related files. Although HFS is still available, zFS provides the same capabilities with better performance. Plan to use zFS instead.
z/OS Network File System
The z/OS NFS is a distributed file system that enables users to access UNIX files and directories that are on remote computers as though they were local. NFS is independent of machine types, operating systems, and network architectures. Use the NFS for file serving (as a data repository) and file sharing between platforms supported by z/OS.
Figure 2-1 illustrates the client/server relationship:
The upper center portion shows the DFSMS NFS address space server, and the lower portion shows the DFSMS NFS address space client.
The left side of the figure shows various NFS clients and servers that can interact with the DFSMS NFS server and client.
In the center of the figure is the Transmission Control Protocol/Internet Protocol (TCP/IP) network used to communicate between clients and servers.
Figure 2-1 DFSMS Network File System
zSeries file system
A zFS is a UNIX file system that contains one or more file systems in a VSAM linear data set. zFS is application-compatible with HFS and performs better than HFS.
Temporary file system
A TFS is stored in memory and delivers high-speed I/O. A systems programmer can use a TFS for storing temporary files.
2.2.4 Object data sets
Objects are named streams of bytes that have no specific format or record orientation. Use the object access method (OAM) to store, access, and manage object data. You can use any type of data in an object because OAM does not recognize the content, format, or structure of the data. For example, an object can be a scanned image of a document, an engineering drawing, or a digital video. OAM objects are stored either on DASD in an IBM DB2® database, on an optical drive, or on tape storage volumes.
The storage administrator assigns objects to object storage groups and object backup storage groups. The object storage groups direct the objects to specific DASD, optical, or tape devices, depending on their performance requirements. You can have one primary copy of an object and up to two backup copies of an object. A Parallel Sysplex allows you to access objects from all instances of OAM and from optical hardware within the sysplex.
2.2.5 Other data set attributes
This section describes these data set types and their attributes:
Basic format data sets
Basic format data sets are sequential data sets that are specified as neither extended-format nor large-format. Basic format data sets have a size limit of 65,535 tracks (4369 cylinders) per volume. They can be system-managed or not, and can be accessed by using QSAM, BSAM, or EXCP.
You can allocate a basic format data set by using the DSNTYPE=BASIC parameter on the DD statement, dynamic allocation (SVC 99), TSO/E ALLOCATE, the access method services ALLOCATE command, or the data class. If no DSNTYPE value is specified from any of these sources, the default is BASIC.
Large format data sets
Large format data sets are sequential data sets that can grow beyond the size limit of 65,535 tracks (4369 cylinders) per volume that applies to other sequential data sets. Large format data sets can be system-managed or not. They can be accessed by using QSAM, BSAM, or EXCP.
Large format data sets reduce the need to use multiple volumes for single data sets, especially very large ones such as spool data sets, memory dumps, logs, and traces. Unlike extended-format data sets, which also support more than 65,535 tracks per volume, large format data sets are compatible with EXCP and do not need to be SMS-managed.
Data sets defined as large format must be accessed by using QSAM, BSAM, or EXCP.
Large format data sets have a maximum of 16 extents on each volume. Each large format data set can have a maximum of 59 volumes. Therefore, a large format data set can have a maximum of 944 extents (16 times 59).
A large format data set can occupy any number of tracks, without the limit of 65,535 tracks per volume. The minimum size limit for a large format data set is the same as for other sequential data sets that contain data: One track, which is about 56,000 bytes. Primary and secondary space can both exceed 65,535 tracks per volume.
Large format data sets can be on SMS-managed DASD or non-SMS-managed DASD.
 
Restriction: The following types of data sets cannot be allocated as large format data sets:
PDS, PDSE, and direct data sets
Virtual I/O data sets, password data sets, and system memory dump data sets
You can allocate a large format data set by using the DSNTYPE=LARGE parameter on the DD statement, dynamic allocation (SVC 99), TSO/E ALLOCATE, or the access method services ALLOCATE command.
For more information about large data sets, see IBM z/OS DFSMS Using data sets, SC23-6855.
Extended-format data sets
While sequential data sets have a maximum of 16 extents on each volume, extended-format data sets have a maximum of 123 extents on each volume. Each extended-format data set can have a maximum of 59 volumes, so an extended-format sequential data set can have a maximum of 7257 extents (123 times 59).
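The extent limits quoted for large format and extended-format data sets are both the product of a per-volume extent limit and the 59-volume limit. The short Python sketch below reproduces that arithmetic; the dictionary and function names are hypothetical.

```python
MAX_VOLUMES = 59  # maximum number of volumes per data set, as stated in the text

# Maximum extents on each volume, by data set format (figures from the text).
EXTENTS_PER_VOLUME = {
    "large format": 16,
    "extended format": 123,
}

def max_total_extents(data_set_format):
    """Return the maximum number of extents across all volumes."""
    return EXTENTS_PER_VOLUME[data_set_format] * MAX_VOLUMES

for fmt in EXTENTS_PER_VOLUME:
    print(fmt, max_total_extents(fmt))
# large format 944
# extended format 7257
```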
When defined as extended-format, sequential data sets can go beyond the limit of 65,535 tracks per volume. For VSAM data sets, extended addressability must be enabled for the data set in addition to extended format.
An extended-format, striped sequential data set can contain up to approximately four billion blocks. The maximum size of each block is 32,760 bytes.
An extended-format data set supports the following additional functions:
Compression, which reduces the space for storing data and improves I/O, caching, and buffering performance.
Data striping, which in a sequential processing environment distributes data for one data set across multiple SMS-managed DASD volumes, improving I/O performance and reducing the batch window. For more information about data set striping, see 2.3, “Data set striping” on page 17.
Extended-addressability, which enables you to create a VSAM data set that is larger than 4 GB.
Recovery from padding error situations.
System-managed buffering (SMB).
Virtual input/output data sets
You can manage temporary data sets with a function called virtual input/output (VIO). VIO uses DASD space and system I/O more efficiently than other temporary data sets.
You can use the BPAM, BSAM, QSAM, BDAM, and EXCP access methods with VIO data sets. SMS can direct SMS-managed temporary data sets to VIO storage groups.
Data set organization
Data set organization (DSORG) specifies the organization of the data set as physical sequential (PS), partitioned (PO), or direct (DA). If the data set is processed using absolute rather than relative addresses, you must mark it as unmovable by adding a U to the DSORG parameter (for example, by coding DSORG=PSU). You must specify the data set organization in the DCB macro. In addition, remember these guidelines:
When creating a direct data set, the DSORG in the DCB macro must specify PS or PSU and the DD statement must specify DA or DAU.
PS is the data set organization for sequential data sets, including those with an extended-format DSNTYPE.
PO is the data set organization for both PDSEs and PDSs. DSNTYPE is used to distinguish between PDSEs and PDSs.
For more information about data set organization, data set types, and other data set attributes, see IBM z/OS DFSMS Using data sets, SC23-6855.
Data set naming
Whenever you allocate a new data set, you (or z/OS) must give the data set a unique name. Usually, the data set name is given as the DSNAME keyword in job control language (JCL).
A data set name can be one name segment, or a series of joined name segments. Each name segment represents a level of qualification. For example, the data set name HARRY.FILE.EXAMPLE.DATA is composed of four name segments. The first name on the left is called the high-level qualifier (HLQ), and the last name on the right is the lowest-level qualifier (LLQ).
Each name segment (qualifier) is 1 to 8 characters, the first of which must be alphabetic (A - Z) or national (# @ $). The remaining seven characters are either alphabetic, numeric (0 - 9), national, or a hyphen (-). Name segments are separated by a period (.).
 
Note: Including all name segments and periods, the length of the data set name must not exceed 44 characters. Thus, a maximum of 22 name segments can make up a data set name.
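The naming rules above can be expressed as a short validation routine. The following Python sketch checks only the rules stated in this section (qualifier length, permitted characters, and the 44-character overall limit). The function name is hypothetical, and the sketch assumes that names are supplied in uppercase.

```python
import re

# One qualifier: the first character is alphabetic or national (# @ $),
# followed by up to seven alphabetic, numeric, national, or hyphen characters.
_QUALIFIER = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")

MAX_DSNAME_LENGTH = 44  # includes all qualifiers and the separating periods

def is_valid_dsname(name):
    """Return True if name satisfies the naming rules described in the text."""
    if not 1 <= len(name) <= MAX_DSNAME_LENGTH:
        return False
    # Every period-separated segment must be a valid qualifier;
    # an empty segment (for example, from "A..B") fails the match.
    return all(_QUALIFIER.fullmatch(q) for q in name.split("."))

print(is_valid_dsname("HARRY.FILE.EXAMPLE.DATA"))  # True
print(is_valid_dsname("1HARRY.DATA"))              # False: starts with a digit
print(is_valid_dsname("HARRY.VERYLONGNAME"))       # False: qualifier exceeds 8 characters
```

The limit of 22 qualifiers follows from the same rules: 22 one-character qualifiers plus 21 periods make 43 characters, while a 23rd qualifier would bring the total to 45.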
2.3 Data set striping
Striping is a software implementation that distributes sequential data sets across multiple 3390 volumes. Sequential data striping can be used for physical sequential data sets that cause I/O bottlenecks for critical applications. Sequential data striping uses extended-format sequential data sets that SMS can allocate over multiple volumes, preferably on separate channel paths and control units, to improve performance. These data sets must be on 3390 volumes that are on the IBM DS8000 family.
Sequential data striping can reduce the processing time that is required for long-running batch jobs that process large, physical sequential data sets. Smaller sequential data sets can also benefit because of DFSMS's improved buffer management for QSAM and BSAM access methods for striped extended-format sequential data sets.
A stripe in DFSMS is the portion of a striped data set that is on a single volume. The records in that portion are not always logically consecutive. The system distributes records among the stripes so that the volumes can be read from or written to simultaneously to gain better performance. Whether the data set is striped is not apparent to the application program.
Data striping distributes data for one data set across multiple SMS-managed DASD volumes, improving I/O performance and reducing the batch window. For example, a data set with 28 stripes is distributed across 28 volumes.
You can write striped extended-format sequential data sets with the maximum physical block size for the data set plus the control information required by the access method. The access method writes data on the first volume selected until a track is filled. The next physical blocks are written on the second volume selected until a track is filled, continuing until all volumes selected have been used or no more data exists. Data is written again to selected volumes in this way until the data set has been created. A maximum of 59 stripes can be allocated for a data set. For striped data sets, the maximum number of extents on a volume is 123.
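The write pattern described above, in which one track is filled on each volume in turn, is a round-robin distribution. The following Python sketch models it at the level of whole tracks; the function name is hypothetical, and the sketch makes no attempt to model blocks, control information, or extents.

```python
MAX_STRIPES = 59  # maximum number of stripes for a data set, as stated in the text

def assign_tracks_to_stripes(track_count, stripe_count):
    """Map each track's worth of data, in write order, to a stripe (volume)."""
    if not 1 <= stripe_count <= MAX_STRIPES:
        raise ValueError(f"stripe count must be between 1 and {MAX_STRIPES}")
    stripes = [[] for _ in range(stripe_count)]
    for track in range(track_count):
        # Fill one track on a volume, then move to the next volume,
        # wrapping back to the first volume after the last one.
        stripes[track % stripe_count].append(track)
    return stripes

print(assign_tracks_to_stripes(7, 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```

Each inner list shows which tracks of data land on one stripe. The data on a single stripe is not logically consecutive, which matches the earlier description of a stripe.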
2.3.1 Physical sequential and VSAM data sets
The sustained data rate (SDR) only affects extended-format data sets. Striping allows you to spread data across DASD volumes and controllers. The number of stripes is the number of volumes on which the data set is initially allocated. Striped data sets must be system-managed and must be in extended format. When no volumes that use striping are available, the data set is allocated as nonstriped if EXT=P is specified in the data class. The allocation fails if EXT=R is specified in the data class.
Physical sequential data sets cannot be extended if none of the stripes can be extended. For VSAM data sets, each stripe can be extended to an available candidate volume if extensions fail on the current volume.
Data classes
Only extended-format, SMS-managed data sets can be striped. To achieve that, use a data class to allocate sequential and VSAM data sets in extended format, which provides the benefits of compression (sequential and VSAM KSDS), striping, and large data set sizes.
Storage groups
SMS calculates the average preference weight of each storage group using the preference weights of the volumes that will be selected if the storage group is selected for allocation. SMS then selects the storage group that contains at least as many primary volumes as the stripe count and has the highest average weight.
If there are no storage groups that meet these criteria, the storage group with the largest number of primary volumes is selected. If multiple storage groups are tied for the largest number of primary volumes, the one with the highest average weight is selected. If there are still multiple storage groups that meet the selection criteria, SMS selects one at random.
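The storage group selection rules above can be sketched as a small function. This Python example is a simplified model rather than the SMS implementation. The function names and the input layout (a dictionary of storage group names mapped to the preference weights of their primary volumes) are hypothetical. It also assumes that the volumes "that will be selected" are the highest-weighted ones in each group.

```python
import random

def average_weight(weights, stripe_count):
    """Average preference weight of the volumes assumed to be selected."""
    # Assumption: the highest-weighted volumes, up to the stripe count, are used.
    chosen = sorted(weights, reverse=True)[:stripe_count]
    return sum(chosen) / len(chosen) if chosen else 0.0

def select_storage_group(groups, stripe_count, rng=random):
    """groups maps a storage group name to its primary volume weights."""
    if not groups:
        raise ValueError("no storage groups to choose from")
    # First choice: groups with at least as many primary volumes as stripes.
    candidates = {n: w for n, w in groups.items() if len(w) >= stripe_count}
    if not candidates:
        # Otherwise: the groups with the largest number of primary volumes.
        most = max(len(w) for w in groups.values())
        candidates = {n: w for n, w in groups.items() if len(w) == most}
    # Among the candidates, prefer the highest average weight.
    best = max(average_weight(w, stripe_count) for w in candidates.values())
    finalists = sorted(
        n for n, w in candidates.items()
        if average_weight(w, stripe_count) == best
    )
    # If several groups are still tied, pick one at random.
    return rng.choice(finalists)

groups = {"SG1": [10, 8, 6], "SG2": [9, 9, 9, 9], "SG3": [12]}
print(select_storage_group(groups, stripe_count=3))  # SG2 (average 9 beats 8)
print(select_storage_group(groups, stripe_count=5))  # SG2 (most primary volumes)
```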
For striped data sets, ensure that enough separate paths are available to DASD volumes in the storage group to allow each stripe to be accessible through a separate path. The maximum number of stripes for physical sequential (PS) data sets is 59. For VSAM data sets, the maximum number of stripes is 16. Only sequential or VSAM data sets can be striped.
Striping volume selection
Striping volume selection is very similar to conventional volume selection. Volumes that are eligible for selection are classified as primary and secondary, and assigned a volume preference weight based on preference attributes.
Data set separation
Data set separation allows you to designate groups of data sets in which all SMS-managed data sets within a group are kept separate, on the physical control unit (PCU) level or the volume level, from all the other data sets in the same group.
To use data set separation, you must create a data set separation profile and specify the name of the profile to the base configuration. During allocation, SMS attempts to separate the data sets listed in the profile. A data set separation profile contains at least one data set separation group.
Each data set separation group specifies whether separation is at the PCU or volume level, whether it is required or preferred, and includes a list of data set names to be separated from each other during allocation.
2.4 Accessing your data sets
After your data sets are created, you must be able to locate your data for future use by other programs and applications. To accomplish that, z/OS provides catalogs and volume tables of contents (VTOCs) to store data set related information and to provide a quick search interface.
Several structures can be searched during a locate request on z/OS systems, including these:
VTOC: The volume table of contents is a sequential data set located on each DASD volume that describes the data set contents of that volume. The VTOC is used to find empty space for new allocations and to locate non-VSAM data sets.
Master catalog: This structure is where all catalog searches begin. The master catalog contains information about system data sets, or points to the catalog that has the information about the requested data set.
User catalog: A user catalog is the next step in the search hierarchy. After the master catalog points to a user catalog through an alias, the user catalog supplies the data set location information.
Alias: A special entry in the master catalog that coincides with the HLQ of a data set name and points to a user catalog. The alias identifies the user catalog in which data sets with this HLQ are cataloged, and therefore where their location information is found.
Figure 2-2 shows the difference between referencing a cataloged and an uncataloged data set in JCL.
Figure 2-2 Referencing cataloged and uncataloged data sets in JCL
2.4.1 Accessing cataloged data sets
When an existing data set is cataloged, z/OS obtains unit and volume information from the catalog by using the LOCATE macro service. When z/OS tries to locate an existing data set, the following sequence takes place:
1. The master catalog is examined:
 – If it has the searched data set name, the volume information is picked up and the volume VTOC is used to locate the data set on the specified volume. If the data set name is not in the master catalog and there is no alias for the data set HLQ, or if the data set does not physically exist on the volume, the locate returns a not found error.
 – If the HLQ is a defined alias in the master catalog, the user catalog is searched. If the data set name is found, processing proceeds as in a master catalog find. If the data set is not found in the user catalog, or the data set does not physically exist, the locate returns a not found error.
2. The requesting program accesses the data set. Without the catalog concept, it would be impractical to keep track of the location of millions of data sets.
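The search sequence above can be modeled with a few dictionaries. The following Python sketch is conceptual only and is not the LOCATE service: the function, the exception class, and the catalog, alias, and VTOC structures are all hypothetical stand-ins, and the data set and volume names are examples.

```python
class DataSetNotFound(Exception):
    """Raised when the locate sequence cannot find the data set."""

def locate(dsname, master_catalog, aliases, user_catalogs, vtocs):
    """Return the volume serial for dsname, following the steps in the text.

    master_catalog: dsname -> volume serial
    aliases:        HLQ -> user catalog name (alias entries in the master catalog)
    user_catalogs:  user catalog name -> {dsname -> volume serial}
    vtocs:          volume serial -> set of data set names on that volume
    """
    if dsname in master_catalog:
        # The master catalog itself holds the entry.
        volser = master_catalog[dsname]
    else:
        # Otherwise, the HLQ must be an alias that points to a user catalog.
        hlq = dsname.split(".")[0]
        catalog_name = aliases.get(hlq)
        if catalog_name is None:
            raise DataSetNotFound(f"{dsname}: not cataloged and no alias for {hlq}")
        entries = user_catalogs.get(catalog_name, {})
        if dsname not in entries:
            raise DataSetNotFound(f"{dsname}: not in user catalog {catalog_name}")
        volser = entries[dsname]
    # The volume's VTOC confirms that the data set physically exists there.
    if dsname not in vtocs.get(volser, set()):
        raise DataSetNotFound(f"{dsname}: cataloged on {volser} but not in its VTOC")
    return volser

master = {"SYS1.PARMLIB": "SYSRES"}
aliases = {"HARRY": "UCAT.USERS"}
user_catalogs = {"UCAT.USERS": {"HARRY.FILE.EXAMPLE.DATA": "VOL001"}}
vtocs = {"SYSRES": {"SYS1.PARMLIB"}, "VOL001": {"HARRY.FILE.EXAMPLE.DATA"}}

print(locate("HARRY.FILE.EXAMPLE.DATA", master, aliases, user_catalogs, vtocs))  # VOL001
```

A name whose HLQ has no alias, or a catalog entry that points to a volume where the data set no longer exists, ends in the not found condition in this model.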
For detailed information about catalogs, see Chapter 6, “Catalogs” on page 103.
2.4.2 Accessing uncataloged data sets
When an existing data set is not cataloged, you must know its volume location in advance and specify it in your JCL by using the UNIT and VOL=SER parameters.
See z/OS MVS JCL Reference, SA22-7597 for information about UNIT and VOL parameters.
 
Note: Avoid having uncataloged data sets in your installation because uncataloged data sets can cause problems with duplicate data and possible incorrect data set processing.
2.5 VTOC and DSCBs
The VTOC is a data set that describes the contents of the DASD volume on which it resides. It is a contiguous data set. That is, it is in a single extent on the volume, starts after cylinder 0, track 0, and ends before track 65,535.
Each VTOC record contains information about the volume or data sets on the volume. These records are called data set control blocks (DSCBs) and are described next.
2.5.1 Data set control block
The VTOC is composed of 140-byte DSCBs that point to data sets currently residing on the volume, or to contiguous, unassigned (free) tracks on the volume (depending on the DSCB type).
DSCBs also describe the VTOC itself. The common VTOC access facility (CVAF) routines automatically construct a DSCB when space is requested for a data set on the volume. Each data set on a DASD volume has one or more DSCBs (depending on its number of extents) describing space allocation and other control information such as operating system data, device-dependent information, and data set characteristics. There are nine kinds of DSCBs, each with a different purpose and a different format number.
The first record in every VTOC is the VTOC DSCB (format-4). The record describes the device, the volume that the data set is on, the volume attributes, and the size and contents of the VTOC data set itself. The next DSCB in the VTOC data set is a free-space DSCB (format-5) that describes the unassigned (free) space in the full volume. The function of various DSCBs depends on whether an optional index VTOC is allocated on the volume. The index VTOC is a sort of B-tree that makes VTOC searches faster.
For more information about VTOC and DSCB allocation and usage, see z/OS DFSMSdfp Advanced Services, SC23-6861.
2.5.2 VTOC index
The VTOC index enhances the performance of VTOC access. The VTOC index is a physical-sequential data set on the same volume as the related VTOC, created by the ICKDSF utility program. It consists of an index of data set names in format-1 DSCBs contained in the VTOC and volume free space information.
 
Important: An SMS-managed volume requires an indexed VTOC. For other volumes, a VTOC index is highly recommended. For more information about SMS-managed volumes, see z/OS DFSMS Implementing System-Managed Storage, SC26-7407.
If the system detects a logical or physical error in a VTOC index, the system disables further access to the index from all systems that might be sharing the volume. In that case, the VTOC remains usable but with possibly degraded performance.
If a VTOC index becomes disabled, you can rebuild the index without taking the volume offline to any system. All systems can continue to use that volume without interruption to other applications, except for a brief pause during the index rebuild. After the system rebuilds the VTOC index, it automatically reenables the index on each system that has access to it.
2.5.3 Creating VTOC and VTOC Index
Before you can use a DASD, you must initialize your volume, creating a volume label, a VTOC, and preferably a VTOC index. ICKDSF is a program that you can use to perform the functions needed for the initialization, installation, use, and maintenance of DASD volumes. You can also use it to perform service functions, error detection, and media maintenance. However, because volumes are now virtualized, there is no need to run media maintenance.
You use the INIT command to initialize volumes. The INIT command writes a volume label (on cylinder 0, track 0) and a VTOC on the device for use by z/OS. It reserves and formats tracks for the VTOC at the location specified by the user and for the number of tracks specified. If no location is specified, tracks are reserved at the default location.
If the volume is SMS-managed, the STORAGEGROUP option must be specified so that the SMS-managed status is recorded in the format-4 DSCB.
For more information about how to use ICKDSF to maintain your DASD devices, see ICKDSF R17 User's Guide, GC35-0033.