今天遇到了一个很有意思的问题:“如果我在 mv 操作的中间 Ctrl C 会发生什么?文件会损坏吗?” 在不考虑错误处理的情况下,这个问题需要分情况讨论:
- 是否在同一个文件系统上?
- mv 的对象是文件还是文件夹?
首先是最简单的情况:在同一个文件系统上 mv 一个文件,此时 mv 会使用 rename
syscall 来完成:
execve("/usr/bin/mv", ["mv", "a", "b"], 0x7ffe6ae9f580 /* 51 vars */) = 0
...
renameat2(AT_FDCWD, "a", AT_FDCWD, "b", RENAME_NOREPLACE) = 0
...
exit_group(0) = ?
+++ exited with 0 +++
根据 POSIX 的定义,rename
操作是原子的,Linux 内核中后续增加的 renameat
与 renameat2
同样符合该要求。rename(2) 中明确了这一点:
rename() renames a file, moving it between directories if
required. Any other hard links to the file (as created using
link(2)) are unaffected. Open file descriptors for oldpath are
also unaffected.
If newpath already exists, it will be atomically replaced, so
that there is no point at which another process attempting to
access newpath will find it missing. However, there will
probably be a window in which both oldpath and newpath refer to
the file being renamed.
If newpath exists but the operation fails for some reason,
rename() guarantees to leave an instance of newpath in place.
在同一个文件系统上 mv 一个文件夹同样可以使用 rename
,此处不再赘述。
综上,在同一个文件系统上的 mv 操作总是原子的,不存在被 Ctrl-C
打断的时机,也就是人们常说的 uninterruptable
。
接下来再处理不在同一个文件系统上的情形。对 Linux 来说,不在同一个文件系统上实际上会被更严格的限制为不在同一个挂载点下。也就是说,无论他们底层是否是同一个文件系统,只要他们的挂载点不同,rename
syscall 都会返回 EXDEV
错误:
EXDEV oldpath and newpath are not on the same mounted
filesystem. (Linux permits a filesystem to be mounted at
multiple points, but rename() does not work across
different mount points, even if the same filesystem is
mounted on both.)
作为暴露给用户使用的 mv
不支持在文件系统中移动显然会非常不舒服,mv 中记录这些考虑:
The rename() function is able to move directories within the same
file system. Some historical versions of mv have been able to
move directories, but not to a different file system. The
standard developers considered that this was an annoying
inconsistency, so this volume of POSIX.1‐2017 requires
directories to be able to be moved even across file systems.
There is no -R option to confirm that moving a directory is
actually intended, since such an option was not required for
moving directories in historical practice. Requiring the
application to specify it sometimes, depending on the
destination, seemed just as inconsistent. The semantics of the
rename() function were preserved as much as possible. For
example, mv is not permitted to ``rename'' files to or from
directories, even though they might be empty and removable.
具体到实现上,mv 是通过退化成 copy & unlink
的方式来实现跨文件系统移动操作的,我们能够通过阅读源码和分析 strace 来证实这一点。Archlinux 使用的是 coreutils,不妨来简单的看一眼:
static bool
do_move (char const *source, char const *dest, const struct cp_options *x)
{
bool copy_into_self;
bool rename_succeeded;
bool ok = copy (source, dest, false, x, ©_into_self, &rename_succeeded);
if (ok)
{
...
if (dir_to_remove != NULL)
{
...
status = rm ((void*) dir, &rm_options);
assert (VALID_STATUS (status));
if (status == RM_ERROR)
ok = false;
}
}
return ok;
}
所以如果在跨文件系统 mv 文件的过程中调用 Ctrl C
,mv 可能会在 src 或者 dst 处留下一个不完整的文件,但是 mv 总是会保证 src/dst 中的一个是完整的,而不是两处都不完整。就像 manual page 中提到的:
If the copying or removal of source_file is prematurely
terminated by a signal or error, mv may leave a partial copy of
source_file at the source or destination. The mv utility shall
not modify both source_file and the destination path
simultaneously; termination at any point shall leave either
source_file or the destination path complete.
上述提到的保证同样适用于跨文件系统 mv 文件夹,src/dst 中总有一处的文件是完整的,mv 总是会保证所有文件都复制完毕后再开始删除。
总的来说,在 mv 的过程中 Ctrl C
并不会破坏文件,在最极端的情况下也只是会出现 dst 复制不完整或者 src 的文件没有被完整删除。