• sync fsync fdatasync --- tracing with SystemTap


    aa.stp:

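    # Print the kernel and user-space backtraces the first time sys_sync is entered.
    probe kernel.function("sys_sync") {
        printf("probfunc:%s fun:%s ", execname(), ppfunc());
        print_backtrace();     # kernel stack
        print_ubacktrace();    # user-space stack (modules are named with -d)
        exit();                # stop after the first hit
    }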

    Terminal A:

    [root@localhost ~]# sync

    Terminal B:

    stap -v aa.stp -d /lib64/libc-2.5.so -d /bin/sync

    probfunc:sync fun:sys_sync
     0xffffffff810e73e7 : sys_sync+0x0/0x2e [kernel]
     0xffffffff8100bb29 : tracesys+0xd9/0xde [kernel]
     0x34688ce477 : sync+0x7/0x30 [/lib64/libc-2.5.so]
     0x4011b5 : usage+0x1f5/0x240 [/bin/sync]
     0x346881d9f4 : __libc_start_main+0xf4/0x1b0 [/lib64/libc-2.5.so]
     0x400f09 [/bin/sync+0xf09/0x4000]

    int fsync(int fd);
    int fdatasync(int fd);
    fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) where that file resides. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file (see stat(2)).

    Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed.

    fdatasync() is similar to fsync(), but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. For example, changes to st_atime or st_mtime (respectively, time of last access and time of last modification; see stat(2)) do not require flushing because they are not necessary for a subsequent data read to be handled correctly. On the other hand, a change to the file size (st_size, as made by say ftruncate(2)), would require a metadata flush.

    The aim of fdatasync(2) is to reduce disk activity for applications that do not require all metadata to be synchronised with the disk.
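The man page text above says that fsync() on the file does not by itself guarantee that the directory entry has reached disk. The following is a minimal sketch of that two-step pattern, not a definitive implementation. The function name `durable_write` and its parameters are made up for illustration, and the caller is assumed to pass the directory that actually contains the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to filepath, fsync() the file,
 * then fsync() the containing directory dirpath so that the directory
 * entry is flushed as well. Returns 0 on success, -1 on error. */
int durable_write(const char *dirpath, const char *filepath,
                  const void *buf, size_t len)
{
    assert(dirpath != NULL && filepath != NULL && buf != NULL);

    int fd = open(filepath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* write() may be short, so loop until everything is handed over. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Flush the file's data and metadata. */
    if (fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    if (close(fd) == -1)
        return -1;

    /* Flush the directory entry with an explicit fsync() on the directory. */
    int dfd = open(dirpath, O_RDONLY);
    if (dfd == -1)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

A call such as `durable_write("/tmp", "/tmp/demo.txt", "hello\n", 6)` returns 0 once both flushes have succeeded. Replacing the first fsync() with fdatasync() would skip metadata that is not needed to read the data back, as described above.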
    int open(const char *pathname, int flags);
    int open(const char *pathname, int flags, mode_t mode);
        
    flags:
           O_APPEND
                  The file is opened in append mode. Before each write(), the  file
                  offset  is positioned at the end of the file, as if with lseek().
                  O_APPEND may lead to corrupted files on NFS file systems if  more
                  than one process appends data to a file at once.  This is because
                  NFS does not support appending to a file, so  the  client  kernel
                  has to simulate it, which can’t be done without a race condition.
    
           O_ASYNC
                  Enable signal-driven I/O: generate a signal  (SIGIO  by  default,
                  but  this  can  be  changed  via  fcntl(2))  when input or output
                  becomes possible on this file descriptor.  This feature  is  only
                  available  for  terminals,  pseudo-terminals, sockets, and (since
                  Linux 2.6) pipes and FIFOs.  See fcntl(2) for further details.
    
           O_CREAT
                  If the file does not exist it will be created.  The  owner  (user
                  ID)  of  the file is set to the effective user ID of the process.
                  The group ownership (group ID) is set  either  to  the  effective
                  group  ID  of the process or to the group ID of the parent
                  directory (depending on filesystem type and  mount  options,  and
                  the mode  of  the parent directory, see, e.g., the mount options
                  bsdgroups and sysvgroups of the ext2  filesystem,  as  described  in
                  mount(8)).
    
           O_DIRECT
                  Try  to  minimize cache effects of the I/O to and from this file.
                  In general this will degrade performance, but  it  is  useful  in
                  special  situations,  such  as  when  applications  do  their own
                  caching.  File I/O is done directly to/from user  space  buffers.
                  The  I/O  is synchronous, i.e., at the completion of a read(2) or
                  write(2), data is guaranteed to  have  been  transferred.   Under
                  Linux  2.4  transfer  sizes, and the alignment of user buffer and
                  file offset must all be multiples of the logical  block  size  of
                  the  file  system.  Under  Linux 2.6 alignment must fit the block
                  size of the device.
    
                  A semantically  similar  (but  deprecated)  interface  for  block
                  devices is described in raw(8).
    
           O_DIRECTORY
                  If  pathname  is  not  a directory, cause the open to fail.  This
                  flag is Linux-specific, and was added in kernel version  2.1.126,
                  to  avoid denial-of-service problems if opendir(3) is called on a
                  FIFO or tape device, but should not be used outside of the
                  implementation of opendir.
    
           O_EXCL When used with O_CREAT, if the file already exists it is an error
                  and the open() will  fail.  In  this  context,  a  symbolic  link
                  exists,  regardless  of  where it points to.  O_EXCL is broken on
                  NFS file systems; programs which rely on it for performing
                  locking  tasks  will contain a race condition.  The solution for
                  performing atomic file locking using  a  lockfile  is  to  create  a
                  unique file on the same file system (e.g., incorporating hostname
                  and pid), use link(2) to make a link to the lockfile.  If  link()
                  returns 0, the lock is successful.  Otherwise, use stat(2) on the
                  unique file to check if its link count has  increased  to  2,  in
                  which case the lock is also successful.
    
           O_LARGEFILE
                  (LFS)  Allow  files whose sizes cannot be represented in an off_t
                  (but can be represented in an off64_t) to be opened.
    
           O_NOATIME
                  (Since Linux 2.6.8) Do not  update  the  file  last  access  time
                  (st_atime  in  the inode) when the file is read(2).  This flag is
                  intended for use by indexing or backup programs,  where  its  use
                  can  significantly reduce the amount of disk activity.  This flag
                  may not be effective on all filesystems.   One  example  is  NFS,
                  where the server maintains the access time.
    
           O_NOCTTY
                  If  pathname  refers  to a terminal device — see tty(4) — it will
                  not become the process’s controlling terminal even if the process
                  does not have one.
    
           O_NOFOLLOW
                  If  pathname  is a symbolic link, then the open fails.  This is a
                  FreeBSD extension, which was added to Linux in  version  2.1.126.
                  Symbolic  links  in earlier components of the pathname will still
                  be followed.
    
           O_NONBLOCK or O_NDELAY
                  When possible, the file is opened in non-blocking  mode.  Neither
                  the  open()  nor any subsequent operations on the file descriptor
                  which is returned will cause the calling process  to  wait.   For
                  the  handling  of  FIFOs  (named pipes), see also fifo(7).  For a
                  discussion of the effect of O_NONBLOCK in conjunction with
                  mandatory file locks and with file leases, see fcntl(2).
    
           O_SYNC The  file  is  opened  for  synchronous  I/O. Any write()s on the
                  resulting file descriptor will block the  calling  process  until
                  the  data has been physically written to the underlying hardware.
                  But see RESTRICTIONS below.
    
           O_TRUNC
                  If the file already exists and is a regular  file  and  the  open
                  mode  allows  writing  (i.e.,  is  O_RDWR or O_WRONLY) it will be
                  truncated to length 0.  If the file is a FIFO or terminal  device
                  file,  the  O_TRUNC  flag  is  ignored.  Otherwise  the effect of
                  O_TRUNC is unspecified.
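The O_EXCL entry above describes a lockfile procedure based on link(2): create a unique file, link it to the lockfile, and fall back to checking the link count. The following is a rough sketch of those steps under some assumptions. The function name `acquire_lockfile` and the `<lockpath>.<hostname>.<pid>` naming scheme are invented here for illustration, and the lock is released simply by unlinking the lockfile.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: try to take the lock at lockpath.
 * Returns 0 if the lock was acquired, -1 otherwise. */
int acquire_lockfile(const char *lockpath)
{
    assert(lockpath != NULL);

    /* Build a unique name on the same file system from hostname and pid. */
    char host[256] = "unknown";
    gethostname(host, sizeof host - 1);   /* last byte stays '\0' */

    char unique[4096];
    int n = snprintf(unique, sizeof unique, "%s.%s.%ld",
                     lockpath, host, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof unique)
        return -1;

    /* The name itself is unique, so O_EXCL is not relied upon here. */
    int fd = open(unique, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    close(fd);

    int locked = 0;
    if (link(unique, lockpath) == 0) {
        locked = 1;                        /* link() returned 0: lock taken */
    } else {
        /* link() may report failure even though it succeeded (e.g. on NFS),
         * so check whether the link count has increased to 2. */
        struct stat st;
        if (stat(unique, &st) == 0 && st.st_nlink == 2)
            locked = 1;
    }

    unlink(unique);                        /* the unique name is no longer needed */
    return locked ? 0 : -1;
}
```

While one process holds the lock, a second call with the same path returns -1 because link() fails and the link count of the new unique file stays at 1. Calling `unlink(lockpath)` releases the lock.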


    [root@localhost ~]# stap -L 'kernel .function ( "sys_*sync" )' 
    kernel.function("sys_fdatasync@fs/sync.c:284") $fd:unsigned int
    kernel.function("sys_fsync@fs/sync.c:279") $fd:unsigned int
    kernel.function("sys_msync@mm/msync.c:32") $start:long unsigned int $len:size_t $flags:int $mm:struct mm_struct*
    kernel.function("sys_sync@fs/sync.c:129")
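    [root@localhost ~]# stap -v aa.stp -d /lib64/libc-2.5.so -d /lib64/libpthread-2.5.so  -d /usr/local/mysql56/bin/mysqld
    (the output below reports sys_fsync, so the probe point in aa.stp was evidently changed from sys_sync to sys_fsync for this run)
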
    probfunc:mysqld fun:sys_fsync
     0xffffffff810e718d : sys_fsync+0x0/0x10 [kernel]
     0xffffffff8100bb29 : tracesys+0xd9/0xde [kernel]
     0x346940e1d7 : __fsync_nocancel+0x2e/0x67 [/lib64/libpthread-2.5.so]
     0xba81a5 : _Z13os_file_fsynci+0x1b/0xda [/usr/local/mysql56/bin/mysqld]
     0xba8277 : _Z18os_file_flush_funci+0x13/0x94 [/usr/local/mysql56/bin/mysqld]
     0xd4d3b5 : _Z22pfs_os_file_flush_funciPKcm+0x7d/0xb4 [/usr/local/mysql56/bin/mysqld]
     0xd4dbf9 : _Z9fil_flushm+0x363/0x486 [/usr/local/mysql56/bin/mysqld]
     0xb8dcef : _Z15log_write_up_tommm+0x5b3/0x7c0 [/usr/local/mysql56/bin/mysqld]
     0xc96813 : _Z27trx_flush_log_if_needed_lowm+0x53/0x88 [/usr/local/mysql56/bin/mysqld]
     0xc96873 : _Z23trx_flush_log_if_neededmP5trx_t+0x2b/0x40 [/usr/local/mysql56/bin/mysqld]
     0xc97434 : _Z29trx_commit_complete_for_mysqlP5trx_t+0x84/0x96 [/usr/local/mysql56/bin/mysqld]
     0xb32ca8 : _Z15innobase_commitP10handlertonP3THDb+0x2a4/0x2f4 [/usr/local/mysql56/bin/mysqld]
     0x625e75 : _Z13ha_commit_lowP3THDbb+0xa1/0x1e6 [/usr/local/mysql56/bin/mysqld]
     0x70300f : _ZN12TC_LOG_DUMMY6commitEP3THDb+0x25/0x3e [/usr/local/mysql56/bin/mysqld]
     0x6264c0 : _Z15ha_commit_transP3THDbb+0x506/0x612 [/usr/local/mysql56/bin/mysqld]
     0x89ee02 : _Z17trans_commit_stmtP3THD+0x1cc/0x292 [/usr/local/mysql56/bin/mysqld]
     0x7d60b7 : _Z21mysql_execute_commandP3THD+0x7bc3/0x7ec8 [/usr/local/mysql56/bin/mysqld]
     0x7d67c4 : _Z11mysql_parseP3THDPcjP12Parser_state+0x408/0x690 [/usr/local/mysql56/bin/mysqld]
     0x7d83f4 : _Z16dispatch_command19enum_server_commandP3THDPcj+0xd0a/0x227e [/usr/local/mysql56/bin/mysqld]
     0x7d9c80 : _Z10do_commandP3THD+0x318/0x394 [/usr/local/mysql56/bin/mysqld]
     0x78e3fb : _Z24do_handle_one_connectionP3THD+0x1ad/0x246 [/usr/local/mysql56/bin/mysqld]
     0x78e4c1 : handle_one_connection+0x2d/0x34 [/usr/local/mysql56/bin/mysqld]
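    The innodb_flush_method parameter and file system I/O

    The MySQL InnoDB engine uses the innodb_flush_method parameter to control how it interacts with the file system.
    The options available on Linux are:
    fdatasync
    O_DIRECT
    O_DSYNC
    The default is fdatasync.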
    
    How the three settings affect the way MySQL handles its log files and data files:

                 Open log   Flush log   Open datafile   Flush data
    fdatasync    -          fsync()     -               fsync()
    O_DSYNC      O_SYNC     -           -               fsync()
    O_DIRECT     -          fsync()     O_DIRECT        fsync()
    
     
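    Notes:
    1) The fdatasync setting actually calls fsync(). fsync() acts only on the single file named by the file descriptor filedes, and it waits for the disk write to finish before returning. It suits applications such as databases, which need modified blocks written to disk immediately. fsync() is the function called in the flush stage.
    2) The O_DIRECT setting tells the operating system to bypass its cache; fsync() is then used to flush the data to disk. O_DIRECT is a flag set in the open stage.
    3) The O_DSYNC setting actually opens the log files with the O_SYNC flag. O_SYNC is a flag set in the open stage; it also requests synchronous I/O, meaning a write returns only after the cached data has been written to disk.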

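    The difference between O_SYNC and O_DIRECT is that O_SYNC does not disable caching at the operating-system level, but it does tell the hardware device not to use its cache.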
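The table and notes above can be condensed into a small lookup. This is only an illustrative sketch of the mapping as this article describes it, not InnoDB's actual code. The struct `flush_policy`, its field names and the function `flush_policy_for` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of one row of the table above. */
struct flush_policy {
    bool log_open_o_sync;     /* log files opened with O_SYNC        */
    bool log_flush_fsync;     /* fsync() called to flush log files   */
    bool data_open_o_direct;  /* data files opened with O_DIRECT     */
    bool data_flush_fsync;    /* fsync() called to flush data files  */
};

/* Fill *out for the given innodb_flush_method value.
 * Returns 0 on success, -1 if the value is not one of the three
 * settings discussed in the table. */
int flush_policy_for(const char *method, struct flush_policy *out)
{
    assert(method != NULL && out != NULL);

    if (strcmp(method, "fdatasync") == 0) {
        *out = (struct flush_policy){
            .log_open_o_sync = false, .log_flush_fsync = true,
            .data_open_o_direct = false, .data_flush_fsync = true };
    } else if (strcmp(method, "O_DSYNC") == 0) {
        *out = (struct flush_policy){
            .log_open_o_sync = true, .log_flush_fsync = false,
            .data_open_o_direct = false, .data_flush_fsync = true };
    } else if (strcmp(method, "O_DIRECT") == 0) {
        *out = (struct flush_policy){
            .log_open_o_sync = false, .log_flush_fsync = true,
            .data_open_o_direct = true, .data_flush_fsync = true };
    } else {
        return -1;
    }
    return 0;
}
```

Looking up "O_DIRECT", for example, yields a policy in which data files are opened with O_DIRECT and both logs and data are still flushed with fsync(), matching the last row of the table.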


    Ordinary file I/O goes through three stages, open, write, and fdatasync: opening the file, writing to it, and flushing (pushing the cached file data to disk).
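    1. Open stage: open("test.file", O_WRONLY|O_APPEND|O_DSYNC)
    The open system call returns a file descriptor fd to the process [Appendix 2]. Here the file is opened with O_WRONLY|O_APPEND|O_DSYNC:
    O_WRONLY opens the file "write-only", telling the kernel that we intend to write data to it;
    O_APPEND tells the kernel to write in "append" mode;
    O_DSYNC tells the kernel that a write counts as complete (write returns success) only once the data has reached the disk.
    Flags of the same kind as O_DSYNC are O_SYNC, O_RSYNC, and O_DIRECT.
    O_SYNC is stricter than O_DSYNC: not only must the data have reached the disk, the file's attributes (for example its length) must also have been updated before write succeeds. O_SYNC therefore does somewhat more work than O_DSYNC.
    O_RSYNC means that when the file is read, its OS cache must already have been flushed to disk in full [Appendix 3].
    With O_DIRECT, reads and writes skip the OS cache and go directly to the device (disk). Because the OS cache is bypassed, O_DIRECT reduces the efficiency of sequential reads and writes.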
    2. Write stage: write(fd, buf, 6)
    Once open has returned a file descriptor, we can call write to write data. How write behaves depends on the flags that were passed to open.
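    3. Flush stage: fdatasync(fd)
    After the write we also call fdatasync to make sure the file data has been flushed to disk. Once fdatasync returns successfully, the data can be considered written to disk. The other flush functions are fsync and sync.
    The difference between fsync and fdatasync mirrors the difference between O_SYNC and O_DSYNC: fdatasync works like fsync but flushes only the data of the file, not metadata such as the modification time.
    sync queues the data held in the OS cache for writing and does not confirm that it has actually reached the disk, so sync is not reliable. (This is the traditional UNIX behavior, which POSIX permits; on Linux, sync() does wait for the writes to complete.)
    Leaving aside how the file is opened, "writing a file" is usually said to have two stages: calling write, the write stage (whose behavior depends on the open flags), and calling fsync (or fdatasync), the flush stage.
    Traditional UNIX implementations keep a buffer cache or page cache in the kernel, and most disk I/O goes through it. When data is written to a file, the kernel normally copies it into one of these buffers first. If the buffer is not yet full, it is not queued for output; the kernel waits until the buffer fills up, or until it needs to reuse the buffer for another disk block, and only then queues it. The actual I/O happens when the buffer reaches the head of the queue. This style of output is called delayed write (Bach [1986], chapter 3, discusses the buffer cache in detail).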
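The three stages above (open, write, flush) can be put together as follows. This is a minimal sketch under a few assumptions. The function name `append_record` is hypothetical, and O_CREAT with mode 0644 is added here only so that the example works when the file does not exist yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to path and make sure they
 * have reached the disk. Returns 0 on success, -1 on error. */
int append_record(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    /* Stage 1: open. O_DSYNC makes each write() return only after
     * the data has reached the disk. */
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT | O_DSYNC, 0644);
    if (fd == -1)
        return -1;

    /* Stage 2: write. A short write is treated as an error here
     * to keep the sketch small. */
    ssize_t n = write(fd, buf, len);
    if (n == -1 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    /* Stage 3: flush. With O_DSYNC this is largely redundant, but it
     * mirrors the open/write/fdatasync sequence described above. */
    if (fdatasync(fd) == -1) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

For instance, `append_record("test.file", "hello\n", 6)` corresponds to the `write(fd, buf, 6)` call in the text. Dropping O_DSYNC from the open flags leaves fdatasync() as the only point at which the data is forced to disk.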
    The innodb_flush_method options for Unix-like systems include:
    
    fsync: InnoDB uses the fsync() system call to flush both the data and log files. fsync is the default setting.
    
    O_DSYNC: InnoDB uses O_SYNC to open and flush the log files, and fsync() to flush the data files. InnoDB does not use O_DSYNC directly because there have been problems with it on many varieties of Unix.
    
    littlesync: This option is used for internal performance testing and is currently unsupported. Use at your own risk.
    
    nosync: This option is used for internal performance testing and is currently unsupported. Use at your own risk.
    
    O_DIRECT: InnoDB uses O_DIRECT (or directio() on Solaris) to open the data files, and uses fsync() to flush both the data and log files. This option is available on some GNU/Linux versions, FreeBSD, and Solaris.
    
    O_DIRECT_NO_FSYNC: InnoDB uses O_DIRECT during flushing I/O, but skips the fsync() system call afterwards. This setting is suitable for some types of file systems but not others. For example, it is not suitable for XFS. If you are not sure whether the file system you use requires an fsync(), for example to preserve all file metadata, use O_DIRECT instead. This option was introduced in MySQL 5.6.7 (Bug #11754304, Bug #45892).
  • Original article: https://www.cnblogs.com/zengkefu/p/5587881.html