  dm-log-writes
  =============
  
  This target takes 2 devices: one to pass all IO to normally, and one to log all
  of the write operations to.  This is intended for file system developers wishing
  to verify the integrity of metadata or data as the file system is written to.
  There is a log_write_entry written for every WRITE request and the target is
  able to take arbitrary data from userspace to insert into the log.  The data
  that is in the WRITE requests is copied into the log so that the replay happens
  exactly as it happened originally.
  
  Log Ordering
  ============
  
  We log things in order of completion once we are sure the write is no longer in
  cache.  This means that normal WRITE requests are not actually logged until the
  next REQ_FLUSH request.  This lets userspace replay the log in a way that
  reflects what is on disk rather than what is in cache, which makes it easier to
  detect improper waiting/flushing.
  
  This works by attaching all WRITE requests to a list once the write completes.
  Once we see a REQ_FLUSH request we splice this list onto the request, and once
  the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
  WRITEs that have completed by the time the REQ_FLUSH is issued are added, in
  order to simulate the worst case scenario with regard to power failures.
  Consider the following example (W means write, C means complete):
  
  W1,W2,W3,C3,C2,Wflush,C1,Cflush
  
  The log would show the following:
  
  W3,W2,flush,W1....
  
  Again, this is to simulate what is actually on disk.  It allows us to detect
  cases where a power failure at a particular point in time would create an
  inconsistent file system.
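
The ordering rule above can be modelled in a few lines of bash.  This is only
a sketch of the behaviour described in this section, not the kernel code: the
function name and the event tokens (Wn, Cn, Wflush, Cflush) are made up for
illustration, and writes still waiting for a flush are simply printed last,
mirroring the "W1...." tail of the example.

```shell
#!/usr/bin/env bash
# Model of the dm-log-writes ordering rule (illustrative only).
# Event tokens are hypothetical and mirror the example above:
#   Wn = write n issued, Cn = write n completed,
#   Wflush = flush issued, Cflush = flush completed.
simulate_log_order() {
    local pending=()   # writes completed but not yet covered by a flush
    local spliced=()   # writes attached to the in-flight flush
    local log=()       # entries in the order they reach the log
    local ev
    for ev in "$@"; do
        case "$ev" in
            Wflush)
                # Splice the completed writes onto the flush request.
                spliced=("${pending[@]}")
                pending=()
                ;;
            Cflush)
                # Flush done: log the spliced writes, then the flush itself.
                log+=("${spliced[@]}" "flush")
                spliced=()
                ;;
            C*)
                # A write completed: park it until the next flush is issued.
                pending+=("W${ev#C}")
                ;;
            W*)
                # Issuing a write logs nothing by itself.
                ;;
        esac
    done
    # Assumption for display only: writes still waiting for a flush are
    # shown at the end, as in the "W1...." tail of the example.
    log+=("${pending[@]}")
    echo "${log[*]}"
}

simulate_log_order W1 W2 W3 C3 C2 Wflush C1 Cflush   # prints: W3 W2 flush W1
```

W1 completes after the flush was issued, so it is not part of that flush and
only shows up after it.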
  
  Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
  they complete as those requests will obviously bypass the device cache.
  
  Any REQ_DISCARD requests are treated like WRITE requests.  Otherwise the log
  would contain all the DISCARD requests, then the WRITE requests, and then the
  FLUSH request.  Consider the following example:
  
  WRITE block 1, DISCARD block 1, FLUSH
  
  If we logged DISCARD when it completed, the replay would look like this:
  
  DISCARD 1, WRITE 1, FLUSH
  
  which isn't quite what happened and wouldn't be caught during the log replay.
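
To see why the order matters, here is a toy model in bash.  It tracks a single
hypothetical block whose state is "data" after a WRITE and "empty" after a
DISCARD; the function name and the two state labels are assumptions made for
this sketch only.

```shell
#!/usr/bin/env bash
# Apply a sequence of operations to one hypothetical block and report
# its final state: "data" after a WRITE, "empty" after a DISCARD.
final_block_state() {
    local state="empty" op
    for op in "$@"; do
        case "$op" in
            WRITE)   state="data" ;;
            DISCARD) state="empty" ;;
            FLUSH)   ;;  # a flush does not change block contents
        esac
    done
    echo "$state"
}

final_block_state WRITE DISCARD FLUSH   # what actually happened: prints "empty"
final_block_state DISCARD WRITE FLUSH   # DISCARD logged on completion: prints "data"
```

The two orders leave the block in different states, so a replay in the wrong
order would not reproduce what was on disk.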
  
  Target interface
  ================
  
  i) Constructor
  
     log-writes <dev_path> <log_dev_path>
  
     dev_path	: Device that all of the IO will go to normally.
     log_dev_path : Device where the log entries are written to.
  
  ii) Status
  
      <#logged entries> <highest allocated sector>
  
      #logged entries	       : Number of logged entries
      highest allocated sector   : Highest allocated sector
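
As a sketch, the two status fields can be pulled out of a `dmsetup status`
line with a small bash function.  This assumes the usual dmsetup layout, where
the target-specific fields follow `<start> <length> <target type>`; the
function name and the sample numbers are made up.

```shell
#!/usr/bin/env bash
# Parse one line of `dmsetup status` output for a log-writes target.
# Assumes the usual dmsetup layout: <start> <length> <target type>,
# followed by the two target-specific fields documented above.
parse_log_writes_status() {
    local start length target entries sector
    read -r start length target entries sector || return 1
    # Refuse lines that belong to some other target type.
    [ "$target" = "log-writes" ] || return 1
    echo "entries=$entries highest_sector=$sector"
}

# Sample line with made-up values; on a live system one would run:
#   dmsetup status log | parse_log_writes_status
echo "0 2097152 log-writes 1234 56789" | parse_log_writes_status
```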
  
  iii) Messages
  
      mark <description>
  
  	You can use a dmsetup message to set an arbitrary mark in a log.
  	For example, say you want to fsck a file system after every
  	write, but first you need to replay up to the mkfs to make sure
  	you're fsck'ing something reasonable.  You would do something
  	like this:
  
  	  mkfs.btrfs -f /dev/mapper/log
  	  dmsetup message log 0 mark mkfs
  	  <run test>
  
  	This would allow you to replay the log up to the mkfs mark and
  	then replay from that point on, doing the fsck check in the
  	interval that you want.
  
  	Every log has a mark at the end labeled "dm-log-writes-end".
  
  Userspace component
  ===================
  
  There is a userspace tool that will replay the log for you in various ways.
  It can be found here: https://github.com/josefbacik/log-writes
  
  Example usage
  =============
  
  Say you want to test fsync on your file system.  You would do something like
  this:
  
  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs
  
  mount /dev/mapper/log /mnt/btrfs-test
  <some test that does fsync at the end>
  dmsetup message log 0 mark fsync
  md5sum /mnt/btrfs-test/foo
  umount /mnt/btrfs-test
  
  dmsetup remove log
  replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
  mount /dev/sdb /mnt/btrfs-test
  md5sum /mnt/btrfs-test/foo
  <verify that the md5sums match>
  
  Another option is to do a complicated file system operation and verify the file
  system is consistent during the entire operation.  You could do this with:
  
  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
  dmsetup create log --table "$TABLE"
  mkfs.btrfs -f /dev/mapper/log
  dmsetup message log 0 mark mkfs
  
  mount /dev/mapper/log /mnt/btrfs-test
  <fsstress to dirty the fs>
  btrfs filesystem balance /mnt/btrfs-test
  umount /mnt/btrfs-test
  dmsetup remove log
  
  replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
  btrfsck /dev/sdb
  replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
  	--fsck "btrfsck /dev/sdb" --check fua
  
  This replays the log until it sees a FUA request and then runs the fsck
  command.  If the fsck passes, it replays to the next FUA, and so on until the
  log is completed or the fsck command exits abnormally.
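
The control flow of that last command can be sketched as a bash loop.  This is
a model of the behaviour described above, not the replay-log implementation:
the function names, the token list standing in for log entries, and the fake
check command are all hypothetical.

```shell
#!/usr/bin/env bash
# Model of "replay to each FUA, then run a check" (illustrative only).
# $1 is the check command; the remaining arguments stand in for log
# entries, where the token "FUA" marks a FUA request.
replay_with_check() {
    local check_cmd=$1; shift
    local replayed=0 entry
    for entry in "$@"; do
        replayed=$((replayed + 1))       # "replay" this entry
        if [ "$entry" = "FUA" ]; then
            # Run the check at every FUA and stop at the first failure.
            if ! "$check_cmd" "$replayed"; then
                echo "check failed after entry $replayed"
                return 1
            fi
        fi
    done
    echo "replayed $replayed entries"
}

# Hypothetical check that fails once 5 or more entries have been replayed.
fake_fsck() { [ "$1" -lt 5 ]; }

replay_with_check true W W FUA W FUA W                        # prints: replayed 6 entries
replay_with_check fake_fsck W W FUA W FUA W || echo "stopped" # check fails at entry 5
```

Stopping at the first failing check pins down the earliest FUA after which the
file system is inconsistent.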