
Linux Logical Volume Manager (LVM)

The Logical Volume Manager uses the device mapper framework, which creates an abstract layer of virtual devices to manage physical devices with greater flexibility. The device mapper is not used by LVM alone; it also underpins software RAID, disk encryption, and features such as snapshots. Other targets provided by the device mapper are:

  • cache - allows the creation of a hybrid volume similar to SSHDs such as Apple's Fusion Drive. This is almost identical to using Intel's Smart Response Technology.
  • crypt - Using the Linux Kernel's Crypto API, data is encrypted as data is passed through the logical volume manager's layer before being written to disk.
  • delay - Simply delays reads and writes to different devices for testing purposes.
  • era - Tracks which parts of a block device are written to over time. This is useful for maintaining cache coherency when using vendor snapshots.
  • flaky - Simulates intermittent I/O failures for debugging purposes.
  • linear - Concatenates multiple devices to form one large device.
  • mirror - Allows volume managers to mirror logical volumes. This is also needed for live data migration tools such as pvmove.
  • multipath - Adds support for multipathed devices, using devices connected to multiple SCSI or Fibre Channel ports.
  • RAID - offers an interface to the Linux kernel's software RAID driver.
  • snapshot - Allows snapshots of the volume to be taken.
  • striped - Stripes the data across physical devices. The related switch target allows an arbitrary mapping of fixed-size regions of I/O across a fixed set of paths.
  • zero - equivalent of /dev/zero

The common uses for LVM include utilizing several physical volumes or devices as a single logical volume for RAID, easier management of large hard disk farms by allowing disks to be added or replaced without any disruption, and the ability to resize partitions.

Terminology

  • Physical Volume (PV) - Physical volume is either a disk or partition.
  • Physical Extent (PE) - A chunk of data belonging to a physical volume. Each PE is the same size as the logical extents for the volume group.
  • Volume Group (VG) - Group of physical volumes and the highest level abstraction used within the LVM.
  • Logical Volume (LV) - Equivalent of a disk partition in a non-LVM system. A logical volume is seen as a standard block device and thus can contain a file system.
  • Logical Extent (LE) - A chunk of data belonging to a volume group. Each logical extent in a volume group is the same size.
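
The arithmetic relating these pieces is straightforward. As a sketch (assuming the default 4 MiB extent size; pe_count is a hypothetical helper, not an LVM command), a physical volume's usable space divided by the extent size gives its extent count:

```shell
#!/bin/sh
# Hypothetical sketch: a PV's extent count is its usable size divided by
# the extent size (integer division). Sizes are in MiB.
pe_count() {
    usable_mib=$1
    pe_size_mib=$2
    echo $(( usable_mib / pe_size_mib ))
}

# A 19976 MiB (~19.51 GiB) physical volume with the default 4 MiB extents:
pe_count 19976 4    # prints 4994
```

The result matches the Total PE value that pvdisplay reports for such a volume later in this page.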

/etc/lvm/lvm.conf

The lvm.conf file contains the default configuration and the command line options used if none are specified during the execution of the various lvm commands. For desktop computers and laptops, the defaults in this file are generally fine. In other environments, such as data centers or business settings, some of the configuration settings may need to be tweaked.

For more information about the settings, explanations for each one can be found within the /etc/lvm/lvm.conf file itself.


Physical Volumes

Physical volumes are the foundation of LVM; without them you cannot create volume groups or logical volumes. If you plan on using a whole disk as a physical volume, the disk must have no partition table.

# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda             8:0    0   20G  0 disk 
├─sda1          8:1    0  500M  0 part /boot
└─sda2          8:2    0 19.5G  0 part 
  ├─rhel-root 253:0    0 17.5G  0 lvm  /
  └─rhel-swap 253:1    0    2G  0 lvm  [SWAP]
sdb             8:16   0    8G  0 disk 
sdc             8:32   0    8G  0 disk 
sr0            11:0    1 1024M  0 rom 

To clear a drive's partition table, run the dd command and zero out the first 512 bytes.

# dd if=/dev/zero of=/dev/sdb bs=512 count=1
# dd if=/dev/zero of=/dev/sdc bs=512 count=1
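
To see what this dd invocation does without risking a real disk, the same command can be run against a scratch image file. This is purely an illustration; sdb.img is a stand-in for /dev/sdb:

```shell
#!/bin/sh
# Illustration on a scratch file instead of a real disk (sdb.img is a
# stand-in for /dev/sdb). Build a 1 MiB image of random bytes, then zero
# out its first 512 bytes exactly as done for the real device.
dd if=/dev/urandom of=sdb.img bs=1024 count=1024 2>/dev/null

# conv=notrunc keeps the rest of the file intact; a real disk is never
# truncated by dd, but a regular file would be without it.
dd if=/dev/zero of=sdb.img bs=512 count=1 conv=notrunc 2>/dev/null

# The first 512 bytes are now zero; the rest of the image is untouched.
dd if=sdb.img bs=512 count=1 2>/dev/null | od -An -v -tx1 | sort -u
```

Note that zeroing the first sector clears an MBR partition table; other metadata (for example a GPT backup header at the end of the disk) may need separate handling.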

Initialize the physical volumes

# pvcreate /dev/sdb /dev/sdc
Physical volume "/dev/sdb" successfully created
Physical volume "/dev/sdc" successfully created

or initialize disk partitions as physical volumes rather than whole disks

# pvcreate /dev/sdb1 /dev/sdc1
Physical volume "/dev/sdb1" successfully created
Physical volume "/dev/sdc1" successfully created

Options can be added to the pvcreate command to specify where the data and metadata are to be written on the device.

--norestorefile
Used in conjunction with --uuid, this allows the uuid to be specified without also requiring that a backup of the metadata be provided.
--restorefile <file>
Used in conjunction with the --uuid option, the pvcreate command extracts the location and size of the data on the PV from the file (produced by vgcfgbackup) and ensures that the metadata the program produces is consistent with the contents of the file.
-M|--metadatatype 1|2
Specifies the LVM metadata format version to use.
--pvmetadatacopies #copies
Specifies the number of metadata copies to keep on the physical volume. Either 0, 1, or 2 can be specified. The default is 1, which is kept at the start of the physical volume. When specifying 2, the first copy is at the start of the physical volume and the second at the end.
--bootloaderareasize BootLoaderAreaSize[bBsSkKmMgGtTpPeE]

Creates a separate bootloader area of the specified size. The bootloader area is reserved space to which no physical extents will be allocated;
bootloaders use it to embed their own data or metadata. The start of the bootloader area is always aligned.

--dataalignment Alignment[bBsSkKmMgGtTpPeE]

Specifies how to align the data on the physical volume by using multiples of the specified size. When creating a volume group you should also specify an appropriate PhysicalExtentSize.

To see the location of the first physical extent of an existing volume, use the following command:
pvs -o +pe_start

--dataalignmentoffset AlignmentOffset[bBsSkKmMgGtTpPeE]
Shifts the start of the data area by the size specified.
--setphysicalvolumesize PhysicalVolumeSize[bBsSkKmMgGtTpPeE]
Overrides the auto-detection of the physical volume's size.
-u|--uuid uuid
Specifies the uuid of the physical volume.
-y|--yes
Answers yes to all questions.
-Z|--zero y|n
Whether or not to wipe the first 4 sectors (2048 bytes) of the device. If the option is not given, the default is to zero the first 4 sectors unless either or both of the --restorefile or --uuid options are provided.

If you go the route of creating LVM partitions rather than using whole disks, it's best not to create multiple LVM partitions on a single drive. Doing so makes administration more difficult and degrades striping performance.

To view a list of block devices that can be used:

# lvmdiskscan -l
  WARNING: only considering LVM devices
  /dev/sda2      [      19.51 GiB] LVM physical volume
  /dev/sdb       [       8.00 GiB] LVM physical volume
  /dev/sdc       [       8.00 GiB] LVM physical volume
  2 LVM physical volume whole disks
  1 LVM physical volume

Additional tools to display properties of LVM physical volumes are pvs, pvdisplay, and pvscan.

# pvs
  PV         VG   Fmt  Attr PSize  PFree 
  /dev/sda2  rhel lvm2 a--  19.51g 40.00m
  /dev/sdb        lvm2 ---   8.00g  8.00g
  /dev/sdc        lvm2 ---   8.00g  8.00g

Options to the pvs and pvdisplay commands allow you to customize the output: --all shows everything, --noheadings removes the headers, --nosuffix drops the size suffixes, and --units (hHbBsSkKmMgGtTpPeE) specifies the units used when displaying sizes.
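
Output like this is easy to post-process in scripts. The following sketch totals free space from pvs-style output; the heredoc stands in for real output from pvs --noheadings --nosuffix --units m -o pv_name,pv_free, which a real invocation would pipe in instead:

```shell
#!/bin/sh
# Hypothetical sketch: sum the free space (in MiB) across physical volumes.
# The heredoc stands in for real output from:
#   pvs --noheadings --nosuffix --units m -o pv_name,pv_free
awk '{ total += $2 } END { printf "%.2f MiB free\n", total }' <<'EOF'
  /dev/sda2    40.00
  /dev/sdb   8192.00
  /dev/sdc   8192.00
EOF
```

The --noheadings, --nosuffix, and --units options exist precisely so the output can be parsed without stripping headers or unit letters first.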

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               rhel
  PV Size               19.51 GiB / not usable 3.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              4994
  Free PE               10
  Allocated PE          4984
  PV UUID               Bgdkmr-FK04-AiLF-nCTK-Eefn-DNN5-tmmE16
   
  "/dev/sdb" is a new physical volume of "8.00 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/sdb
  VG Name               
  PV Size               8.00 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               w7c8Df-ovfj-oYxI-BhBX-uB1D-FpEk-YvqCz3
   
  "/dev/sdc" is a new physical volume of "8.00 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/sdc
  VG Name               
  PV Size               8.00 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               s3qvN0-uD2N-u7xp-yMfG-90b3-55E0-5EWFE8

The pvscan command offers fewer options for customizing its output.

# pvscan
  PV /dev/sda2   VG rhel   lvm2 [19.51 GiB / 40.00 MiB free]
  PV /dev/sdb              lvm2 [8.00 GiB]
  PV /dev/sdc              lvm2 [8.00 GiB]
  Total: 3 [35.51 GiB] / in use: 1 [19.51 GiB] / in no VG: 2 [16.00 GiB]

To change the size of a physical volume, use the pvresize command. This command, as well as pvremove, has few command line options and is generally used with just arguments.

# pvresize --setphysicalvolumesize 18G /dev/sda2

Removing physical volumes is straight forward.

# pvremove /dev/sdb

To change the attributes of a physical volume, use the pvchange command. For example, to disallow the allocation of physical extents on a volume:

# pvchange -x n /dev/sdb

Checking the physical volumes metadata is done by using the pvck command:

# pvck /dev/sdb

 


Volume Groups

As the name suggests, a volume group is a group of physical volumes.

# vgcreate volgrp /dev/sdb /dev/sdc
  Volume group "volgrp" successfully created

By default the disk space is divided into 4MiB extents. When a logical volume is resized, it is resized in increments defined by the extent size. The extent size can be specified with the -s, --physicalextentsize option.
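
The effect of extent-based allocation can be sketched with a little arithmetic: a requested size is rounded up to the next multiple of the extent size. round_to_extent below is a hypothetical helper, not an LVM command:

```shell
#!/bin/sh
# Sketch: LVM allocates whole extents, so a requested size is rounded up
# to the next multiple of the extent size. Sizes in MiB; 4 MiB is the
# default extent size.
round_to_extent() {
    size=$1; pe=$2
    echo $(( ( (size + pe - 1) / pe ) * pe ))
}

round_to_extent 10 4   # a 10 MiB request becomes 12 MiB (3 extents)
round_to_extent 16 4   # already a multiple of the extent size: stays 16 MiB
```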

-c, --clustered y|n
Enabled by default. This must be disabled (--clustered n) if the new volume group contains only local disks.
-l, --maxlogicalvolumes MaxLogicalVolumes
Sets the maximum number of logical volumes allowed in the group. This setting can be changed using the vgchange command.
-p, --maxphysicalvolumes MaxPhysicalVolumes
Sets the maximum number of physical volumes that can belong to the group. This setting can be changed using the vgchange command.
--vgmetadatacopies NumberOfCopies|unmanaged|all
Specifies the number of metadata copies in the volume group. Setting this to a non-zero value will cause LVM to manage the metadataignore flags on the physical volumes. Setting the value to all will clear the metadataignore flags on all metadata areas in the group, then set the value to unmanaged.
-s, --physicalextentsize PhysicalExtentSize[bBsSkKmMgGtTpPeE]
Sets the size of the physical extents on the physical volumes.
--shared
Creates the volume group as shared. This allows multiple hosts to share a volume group on shared devices.
--systemid SystemID
Specifies the system ID that will be assigned to the volume group.
--alloc AllocationPolicy
Specifies how extents are placed. The default is normal, where LVM will not place parallel stripes on the same physical volume. With the contiguous policy, extents are placed right after the previous extent, provided there is sufficient space. Cling places new extents on the same physical volume as existing extents in the same stripe. Policies can be modified using the vgchange command.

Physical volumes can be added or removed from a volume group at any time. Using the vgextend command, one or more physical volumes can be added. In the event that a physical volume has gone missing, use the --restoremissing option to add it back without re-initializing it. 

# vgextend volgrp /dev/sdd
# vgreduce volgrp /dev/sdd

As demonstrated above, the vgreduce command removes a specified physical volume. Issuing the -a, --all option will remove all unused physical volumes from a group. In the event that there are missing physical volumes in a group, the --removemissing option will remove them, provided there are no logical volumes allocated on them.

Parameters of a volume group can be modified by using the vgchange command. A useful option of the vgchange command is the -a, --activate option that allows you to activate or deactivate groups.

-A, --autobackup y|n
Sets whether or not a backup of the metadata is created after a change. The default is yes.
-a, --activate [a|e|l] y|n

Controls whether or not logical volumes are known to the kernel. If the activation policy is set to auto, each logical volume in the group is activated only if it matches an item in the activation/auto_activation_volume_list set in the lvm.conf file. If the list is not defined, then all volumes are considered for activation. To activate the group at boot, use the -aay option.

--activationmode complete|degraded|partial
Determines whether or not logical volumes are allowed to activate when there are physical volumes missing.
-c, --clustered y|n
Indicates whether the specified volume group is shared with other nodes or contains only local disks not visible on other nodes.
--monitor y|n
Starts or stops monitoring a mirrored or snapshot logical volume with dmeventd.
--poll y|n
Without polling, a logical volume's background transformation process will never complete. In the event of an incomplete pvmove or lvconvert, this enables the process to complete.
--sysinit
Indicates that vgchange is being invoked from an early system initialization program such as rc.sysinit or initrd, before writeable filesystems are available.
--noudevsync
Disables udev synchronization.
--ignoremonitoring
Disables monitoring. This shouldn't be used if dmeventd is already monitoring a device.
-l, --logicalvolume MaxLogicalVolumes
Changes the maximum number of logical volumes that can exist in the group.
-p, --maxphysicalvolumes MaxPhysicalVolumes
Changes the maximum number of physical volumes that can exist in the group.
-s, --physicalextentsize PhysicalExtentSize[bBsSkKmMgGtTpPeE]
Changes the physical extent size on physical volumes of the specified group.
--refresh
Reloads the metadata for logical volumes in the group.
-x, --resizeable y|n
Enables or disables the ability to resize the volume group.

To rename a volume group, use:

# vgs
  VG     #PV #LV #SN Attr   VSize  VFree 
  rhel     1   2   0 wz--n- 19.51g 40.00m
  volgrp   2   0   0 wz--n-  7.99g  7.99g
# vgrename volgrp storage
  Volume group "volgrp" successfully renamed to "storage"
# vgs
  VG      #PV #LV #SN Attr   VSize  VFree 
  rhel      1   2   0 wz--n- 19.51g 40.00m
  storage   2   0   0 wz--n-  7.99g  7.99g

In the event that volume groups are not being displayed or changes are not being reflected, run the vgscan command:

# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "rhel" using metadata type lvm2
  Found volume group "storage" using metadata type lvm2

To display volume group information:

# vgdisplay
  --- Volume group ---
  VG Name               rhel
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               19.51 GiB
  PE Size               4.00 MiB
  Total PE              4994
  Alloc PE / Size       4984 / 19.47 GiB
  Free  PE / Size       10 / 40.00 MiB
  VG UUID               TGgpMv-TJGX-TIKu-gsjF-dtLc-2Vck-qgdKL3
   
  --- Volume group ---
  VG Name               storage
  System ID             
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               7.99 GiB
  PE Size               4.00 MiB
  Total PE              2046
  Alloc PE / Size       0 / 0   
  Free  PE / Size       2046 / 7.99 GiB
  VG UUID               sxdpcG-rDI4-uSXe-Wqqb-wuRK-ha8t-w15U6g

To move the physical volumes of a volume group to another system, use the vgexport and vgimport commands. The VG system ID is cleared on export and updated with the new system's ID on import. If this is not done, the LVM2 tools will ignore volume groups that do not belong to the system.

# vgchange -an storage
# vgexport storage
# pvscan
# vgimport storage
# vgchange -ay storage

If you want to merge two volume groups, this can be done with the vgmerge command. This can only be done if the physical extent sizes of the groups are equal and the physical and logical volume summaries of both groups fit into the destination's limits. The -t option will test the operation before performing it.

# vgchange -an to_merge 
# vgmerge storage to_merge

When a volume group is created, a backup of its descriptors is created and stored under the /etc/lvm directory. If no volume groups are specified to the vgcfgbackup command, each group is backed up to a file named after the volume group. The -f option allows you to specify an alternative file name. This command writes into the /etc/lvm directory rather than to external storage, so it's important that this directory itself be backed up.

# vgcfgbackup storage

To restore the metadata of a volume group use the vgcfgrestore command. If no file is specified by using the -f option, the most recent backup will be used for that volume group.

# vgcfgrestore storage

In the event that device nodes for volume groups are missing or incorrect within the /dev directory, the vgmknodes command will create or repair them.

# vgmknodes 
# vgscan --mknodes

When a volume group is no longer needed:

# vgremove storage

A volume group can be split into two groups with the use of the vgsplit command. Some of the options for this command are also found in the vgcreate command, such as -p --maxphysicalvolumes and -n --name.

# vgsplit -n storage2 storage /dev/sdb

Logical Volumes

Logical volumes are where the data is written, at least indirectly. To create a basic logical volume:

# lvcreate -L 8G -n storage vgStorage

-a, --activate y|n
Sets the availability of logical volumes for immediate use after running the lvcreate command. By default, volumes are activated when the command completes. A logical volume can also be created without being activated, as when using the --type snapshot option. This does not apply to thin volume snapshots, which by default are not activated.
-H, --cache
Creates the logical volume as a cache, cache pool, or both. Using the optional --size argument will cause the creation of the cache logical volume.
--cachemode passthrough|writeback|writethrough
Specifies the cache mode. With writeback, a write is considered complete as soon as it is stored in the cache pool.
--cachepolicy policy
Sets the cache policy, which is only available for cached LVs. mq is the basic policy name and smq is the more advanced version available in newer kernels.
--cachepool CachePoolLogicalVolume{Name|Path}
Specifies the cache pool volume name.
--cachesettings key=value
Only applicable to cached LVs. Defaults should be adequate in most cases. See lvmcache.
-c, --chunksize ChunkSize[bBsSkKmMgG]
Gives the chunk size for snapshot, cache pool, and thin pool logical volumes. Snapshot chunk sizes must be a power of 2 between 4KiB and 512KiB, with the default being 4KiB.
For cache pools the values must be multiples of 32KiB between 32KiB and 1GiB with the default being 64KiB.
Thin pool values must be a multiple of 64KiB between 64KiB and 1GiB with the default being 64KiB and grows to fit the pool metadata size within 128MiB, if the pool metadata size is not specified.
-C, --contiguous y|n
Sets or resets the allocation policy for logical volumes. The default is no. 
--discards ignore|nopassdown|passdown
Sets discard behavior for thin pool.
--errorwhenfull y|n
The default is no, in which case the device will queue I/O operations until the target timeout expires.
-K, --ignoreactivationskip
Ignores the flag that indicates the logical volume should be skipped during activation.
--ignoremonitoring
Makes no attempt to interact with the dmeventd unless --monitor is specified.
-l, --extents LogicalExtentsNumber[%{VG|PVS|FREE|ORIGIN}]
 Specifies the number of logical extents to allocate for the new logical volume. 
-j, --major major
Sets the major number. Major numbers are not supported with pool volumes.
--minor minor
Sets the minor number. Minor numbers are not supported with pool volumes. 
-m, --mirrors Mirrors
Creates a mirrored logical volume with Mirrors copies. Specifying --nosync will cause the creation of the mirror to skip the initial resynchronization. 
--mirrorlog disk|core|mirrored
Specifies the type of log to be used for logical volumes utilizing the legacy "mirror" segment type. The default is disk, which is persistent and requires a small amount of storage space. 
Using core means the mirror is regenerated by copying the data from the first device each time the logical volume is activated, such as after each reboot.
Using mirrored will create a persistent log that is itself mirrored.
--monitor y|n
Starts or avoids monitoring a mirrored, snapshot, or thin pool logical volume with dmeventd.
-n, --name LogicalVolumeName|Path
Specifies the name of the logical volume.
--nosync
Causes the creation of the mirror to skip the initial resynchronization.
--noudevsync
Disables udev synchronization meaning that the process will not wait for notification from udev.
-p, --permission r|rw
Sets access permissions of the logical volume. 
-M, --persistent y|n
Set to y to make the minor number specified persistent. Pool volumes cannot have persistent major and minor numbers. 
--poolmetadatasize MetadataVolumeSize[bBsSkKmMgG]
Sets the size of the pool metadata volume, as specified by MetadataVolumeSize.
Thin pool metadata can range from 2MiB to 16GiB.
Cache pool metadata can be up to 16GiB.
--poolmetadataspare y|n
Controls the creation and maintenance of the pool metadata spare logical volume that will be used for automated pool recovery.
--[raid]maxrecoveryrate Rate[bBsSkKmMgG]
Sets the maximum recovery rate. If no unit is specified, KiB is assumed. A rate of 0 means unbounded.
--[raid]minrecoveryrate Rate[bBsSkKmMgG]
Sets the minimum recovery rate. If no unit is specified, KiB is assumed. A rate of 0 means unbounded.
-r, --readahead {ReadAheadSectors|auto|none}
Sets the read ahead sector count of the logical volume. 0 and auto are the same.
-R, --regionsize MirrorLogRegionSize[bBsSkKmMgG]
A mirror is divided into regions of the size specified. The mirror log uses the granularity to track which regions are in sync.
-k, --setactivationskip y|n
Sets whether or not the volume should be persistently flagged to be skipped during activation.
-L, --size LogicalVolumeSize[bBsSkKmMgGtTpPeE]
Sets the size of the logical volume.
-s, --snapshot OriginalLogicalVolume{Name|Path}
Creates a snapshot logical volume of the named origin logical volume.
A thin snapshot is created when the origin is a thin volume and the size is not specified.
-i, --stripes Stripes
Specifies the number of stripes, that is, the number of physical volumes to use. When setting up a RAID 4/5/6 logical volume, the extra devices used for parity are internally accounted for. For example, specifying 3 stripes for RAID 4/5 results in 4 devices being used; for a RAID 6 logical volume, 5 devices.
-I, --stripesize StripeSize
Specifies the stripe size in powers of 2 but cannot exceed the physical extent size. 
-T, --thin
Creates thin pool or thin logical volume or both. Using the --size or --extents will cause the creation of the thin pool logical volume.
--thinpool ThinPoolLogicalVolume{Name|Path}
Specifies the name of the thin pool volume name.
--type SegmentType
Specifies the logical volume's type. The following types are supported: cache, cache-pool, error, linear, mirror, raid1, raid4, raid5_la, raid5_ls (used when raid5 is specified), raid5_ra, raid5_rs, raid6_nc, raid6_nr, raid6_zr (used when raid6 is specified), raid10, snapshot, striped, thin, thin-pool, or zero. See dm_raid for more information on the different RAID variants.
-V, --virtualsize VirtualSize[bBsSkKmMgGtTpPeE]
Creates a thinly provisioned device or spare device of a given size.
-W, --wipesignatures y|n
Determines wiping of detected signatures on newly created logical volumes. 
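
The parity accounting described above for -i, --stripes can be sketched numerically. This is an illustration of the rule, not LVM code; devices_used is a hypothetical helper:

```shell
#!/bin/sh
# Sketch of the device accounting described for -i/--stripes: RAID 4/5 add
# one parity device to the stripe count, RAID 6 adds two.
devices_used() {
    level=$1; stripes=$2
    case $level in
        raid4|raid5) echo $(( stripes + 1 )) ;;
        raid6)       echo $(( stripes + 2 )) ;;
        *)           echo "$stripes" ;;
    esac
}

devices_used raid5 3   # prints 4
devices_used raid6 3   # prints 5
```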

Other uses of the lvcreate include:

# lvcreate -i 3 -I 8 -L 10G storage
# lvcreate -m1 -L 10G storage
# lvcreate --size 100m --snapshot --name snap /dev/storage/data
# lvcreate -s -l 20%ORIGIN --name snap storage/data
# lvcreate --virtualsize 1T --size 8G --snapshot --name sparse storage
# lvcreate -L64M -n data storage /dev/sdb:0-2 /dev/sdc:0-2
# lvcreate --type raid5 -L 10G -i 3 -I 64 -n raid5 storage
# lvcreate --type raid10 -L 10G -i 2 -m 1 -n raid10 storage
  1. Creates a new logical volume with three stripes, each stripe being 8KiB. The total size of the volume is 10GiB, with the name selected by LVM.
  2. The new volume's name will be determined by LVM. The volume will be a mirrored volume with a size of 10GiB.
  3. Creates a new 100MiB snapshot volume named snap of the origin volume /dev/storage/data.
  4. Creates a new snapshot volume called snap with a size of 20% of the origin logical volume storage/data.
  5. Creates a sparse volume with a virtual size of 1TiB backed by 8GiB of actual space.
  6. Creates a new 64MiB logical volume using the physical extents sdb:0-2 and sdc:0-2.
  7. Creates a 10GiB RAID5 volume utilizing 3 stripes with a 64KiB stripe size.
  8. Creates a new 10GiB RAID10 logical volume with two stripes and one mirror.

Like volume groups and physical volumes, LVM provides commands for listing, resizing, renaming, and converting logical volumes: lvs, lvdisplay, lvscan, lvchange, lvconvert, lvextend, lvreduce, lvremove, lvrename, and lvresize.


Putting it all together

In the following example, we're creating a new RAID1 (mirrored) logical volume using all of the free space on two 8GB disks.

# pvs -a
  PV             VG   Fmt  Attr PSize  PFree 
  /dev/rhel/root           ---      0      0 
  /dev/rhel/swap           ---      0      0 
  /dev/sda1                ---      0      0 
  /dev/sda2      rhel lvm2 a--  19.51g 40.00m
  /dev/sdb                 ---      0      0 
  /dev/sdc                 ---      0      0
# dd if=/dev/zero of=/dev/sdb bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000737609 s, 694 kB/s
# dd if=/dev/zero of=/dev/sdc bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000341841 s, 1.5 MB/s
# pvcreate /dev/sdb /dev/sdc
  Physical volume "/dev/sdb" successfully created
  Physical volume "/dev/sdc" successfully created
# vgcreate storage /dev/sdb /dev/sdc
  Volume group "storage" successfully created
# lvcreate --type raid1 -l 100%FREE -m 1 -n data storage
  Logical volume "data" created.
# mkfs.xfs /dev/storage/data
meta-data=/dev/storage/data      isize=512    agcount=4, agsize=523776 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=2095104, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# pvs
  PV         VG      Fmt  Attr PSize  PFree 
  /dev/sda2  rhel    lvm2 a--  19.51g 40.00m
  /dev/sdb   storage lvm2 a--   8.00g     0 
  /dev/sdc   storage lvm2 a--   8.00g     0
# vgs
  VG      #PV #LV #SN Attr   VSize  VFree 
  rhel      1   2   0 wz--n- 19.51g 40.00m
  storage   2   1   0 wz--n- 15.99g     0
# lvs
  LV   VG      Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root rhel    -wi-ao---- 17.47g                                                    
  swap rhel    -wi-ao----  2.00g                                                    
  data storage rwi-aor---  7.99g                                    100.00

mysqldump Too many open files

When running the mysqldump command there's a possibility that you will receive the error message: mysqldump: Couldn't execute 'show create table `table`': Out of resources when opening file './database/table.MYD' (Errcode: 24 "Too many open files") (23)

There are several possibilities as to why this error occurs. Before changing the number of open files allowed, you can try using the --single-transaction command line option.

# mysqldump --all-databases --single-transaction -p > databases.sql

Unfortunately, this didn't work in my case and I had to try several fixes before I was able to do a full database dump.

open_files_limit

The open_files_limit configuration variable and --open-files-limit command line option sets the number of file descriptors available to MariaDB and MySQL.

# mysqld --open-files-limit=4096 --basedir=/usr

Instead of having to specify the number of open files allowed to MariaDB/MySQL each time, open_files_limit can be added to the appropriate configuration file under different sections. For example, if you just want to change the open file limit for mysqldump, then add the open_files_limit setting under the [mysqldump] section.

[mysqld]
open_files_limit = 4096

[mariadb]
open_files_limit = 4096

[mysqldump]
open_files_limit = 4096
# systemctl restart mariadb

ulimit

The number of files opened by a process may be limited by the OS:

# ulimit -Sn
1024
# ulimit -Hn
1024
# su -l -s /bin/bash mysql
$ ulimit -Sn
1024
$ ulimit -Hn
1024

The su command logs into the mysql account so you can verify the file limits for the mysql user. The -l and -s /bin/bash options are necessary if the mysql user is a system account. To make the change to the number of open files permanent, edit /etc/security/limits.conf:

mysql soft nofile 4096
mysql hard nofile 4096

systemctl

The above methods will not work if MariaDB/MySQL is being started via systemd. LimitNOFILE and LimitMEMLOCK need to be added to the [Service] section of the systemd service file for MariaDB/MySQL.

[Unit]
Description=MySQL database server
After=syslog.target
After=network.target
Conflicts=mariadb.service

[Service]
Type=simple
User=mysql
Group=mysql

# Add the following lines and set the value accordingly. 
LimitNOFILE = infinity 
LimitMEMLOCK = infinity

# Note: we set --basedir to prevent probes that might trigger SELinux alarms,
# https://bugzilla.redhat.com/show_bug.cgi?id=547485
ExecStart=/usr/sbin/mysqld --basedir=/usr
ExecStartPost=/usr/libexec/mysqld-wait-ready $MAINPID

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=300

# We rely on systemd, not mysqld_safe, to restart mysqld if it dies
# Restart crashed server only, on-failure would also restart, for example, when
# my.cnf contains unknown option
Restart=on-abort
RestartSec=5s

# Place temp files in a secure directory, not /tmp
PrivateTmp=true

# To allow memlock to be used as non-root user if set in configuration
CapabilityBoundingSet=CAP_IPC_LOCK

# Prevent writes to /usr, /boot, and /etc
ProtectSystem=full

# Currently has issues with SELinux https://jira.mariadb.org/browse/MDEV-10404
# This is safe to uncomment when not using SELinux
#NoNewPrivileges=true

PrivateDevices=true

# Prevent accessing /home, /root and /run/user
ProtectHome=true

UMask=007

[Install]
WantedBy=multi-user.target

Any time a change is made to a systemd file, the daemon needs to be reloaded so it can apply the changes.

# systemctl daemon-reload
# systemctl restart mariadb
# mysql -u root -p
Enter password:
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'open%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| open_files_limit | 4162  |
+------------------+-------+
1 row in set (0.00 sec)

Running SHOW GLOBAL VARIABLES LIKE 'open%'; verifies that the change took effect. In my case, prior to updating the mariadb.service file, the open_files_limit value was 1024.
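
As an additional cross-check on Linux, the limits actually applied to a running process can be read from /proc/<pid>/limits. This sketch inspects the current shell; for mysqld you would substitute its PID:

```shell
#!/bin/sh
# Sketch (Linux-specific): read the open-file limits applied to a process
# from /proc/<pid>/limits; here we inspect the current shell itself.
# For mysqld, substitute its PID, e.g. $(pidof mysqld).
awk '/Max open files/ { print "soft=" $4, "hard=" $5 }' /proc/self/limits
```

If the soft value here doesn't match what systemd was told, the unit file change hasn't taken effect yet (remember daemon-reload plus a restart).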

MariaDB Replication

Master Server

[mariadb]
log-bin
server_id=1
log-basename=master
#skip-networking
bind-address=192.168.1.100
  • log-bin causes the server to keep a log of all changes to the databases, both data and structure, as well as how long each statement took to execute.
  • The server_id is an integer from 1 to 4,294,967,295. This ID must be unique for each server in a replication group.
  • log-basename=master specifies the basename of the log files. If this is not specified then the hostname will be used. Setting this will ensure that log files used won't change when the hostname does.
  • skip-networking will only allow local connections and thus will deny any replication.
  • If bind-address is set to 127.0.0.1, then only connections from the local machine will be accepted. 

Restart the service when done:

# systemctl restart mariadb

The binary log coordinates need to be obtained and used when setting up the slave server. To ensure that they don't change, the tables on the master must be locked.

GRANT REPLICATION SLAVE ON *.* TO replication_user;
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000007 |      327 |              |                  |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

The replication_user is an account set up beforehand for the sole purpose of replication. While the tables are locked, the existing databases, tables, and data need to be migrated.

$ mysqldump --all-databases -u root -p > databases.sql

Slave Server

Set the server_id to any value other than the one given to the master server. Don't set the log-bin or log-basename options.

[mariadb]
server_id=2
# systemctl restart mariadb

Restore the database from the backup

$ mysql -u root -p < databases.sql

Configure the slave to point to the real master server instead of itself and start the slave.

CHANGE MASTER TO
  MASTER_HOST='192.168.1.100',
  MASTER_USER='replication_user',
  MASTER_PASSWORD='password',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='master-bin.000007',
  MASTER_LOG_POS=327,
  MASTER_CONNECT_RETRY=10;
START SLAVE;

Testing

To make sure that replication is working, create a new database on the master then log into the slave. If the new database is there then replication is working.
