• Asynchronous Programming in Rust


    https://rust-lang.github.io/async-book

    Async

    Why Async

    A simple threaded version in Rust can be written as follows:

    use std::thread;

    // `download` is assumed to be a blocking function that fetches a URL.
    fn get_two_sites() {
        // Spawn two threads to do work.
        let thread_one = thread::spawn(|| download("https://www.foo.com"));
        let thread_two = thread::spawn(|| download("https://www.bar.com"));
    
        // Wait for both threads to complete.
        thread_one.join().expect("thread one panicked");
        thread_two.join().expect("thread two panicked");
    }
    

    However, `thread::spawn` still has drawbacks:

    1. Switching between threads incurs overhead, and each thread ties up resources even while it is doing nothing.

    Using `async`/`.await` lets us run multiple tasks concurrently without creating one thread per task.

    use futures::join;

    // `download_async` is assumed to be an async function that fetches a URL.
    async fn get_two_sites_async() {
        // Create two different "futures" which, when run to completion,
        // will asynchronously download the webpages.
        let future_one = download_async("https://www.foo.com");
        let future_two = download_async("https://www.bar.com");
    
        // Run both futures to completion at the same time.
        join!(future_one, future_two);
    }
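
    Note that futures are lazy: calling `get_two_sites_async()` by itself does nothing until the returned future is polled by an executor. A minimal sketch of driving it, assuming the `futures` crate and the hypothetical `download_async` above:

    use futures::executor::block_on;

    fn main() {
        // Nothing is downloaded until an executor polls the future to completion.
        block_on(get_two_sites_async());
    }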
    

    In Rust, an `async fn` defines an asynchronous function whose return value is a `Future`.

    Note that while the `async` syntax brings lower overhead and ergonomic, high-level abstractions, it also comes with its own development burden.

    async & .await

    An `async` block or function is compiled into a state machine. `.await` is called on a value that implements the `Future` trait: the executor tries to drive that future to completion, and when the future cannot make progress it hands control back to the executor instead of blocking the thread. Once it can make progress again, the executor resumes polling it and runs it to completion, at which point the `.await` expression finishes.

    block_on(async_func());       // drive a top-level future to completion on the current thread
    async fn f1() { f2().await }  // `.await` yields control to the executor instead of blocking the thread
    

    async lifetimes

    An `async fn` whose parameters include references or other non-`'static` lifetimes returns a `Future` whose lifetime is bounded by those parameters.
    For example:

    // This function:
    async fn foo(x: &u8) -> u8 { *x }
    
    // Is equivalent to this function:
    fn foo_expanded<'a>(x: &'a u8) -> impl Future<Output = u8> + 'a {
        async move { *x }
    }
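
    This means the future must be `.await`ed while the borrowed arguments are still alive. A common workaround, along the lines of the async-book, is to move the arguments and the call into an owning `async` block. In the sketch below, `borrow_x` is a hypothetical async helper, and `bad` is shown only to illustrate the error:

    use std::future::Future;

    async fn borrow_x(x: &u8) -> u8 { *x }

    fn bad() -> impl Future<Output = u8> {
        let x = 5;
        borrow_x(&x) // ERROR: `x` does not live long enough
    }

    fn good() -> impl Future<Output = u8> {
        async {
            // `x` is owned by the async block, so the returned future is `'static`.
            let x = 5;
            borrow_x(&x).await
        }
    }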
    

    So be careful when passing references into async code; it is often simplest to hand ownership of the data to the async block or function itself. An `async move {}` block takes ownership of the variables it captures. Separately, a value used by a task that may run on multiple threads must be transferable between threads, which in Rust means it must implement `Send` (a short sketch of this requirement follows the two blocks below).

    /// `async` block:
    ///
    /// Multiple different `async` blocks can access the same local variable
    /// so long as they're executed within the variable's scope
    async fn blocks() {
        let my_string = "foo".to_string();
    
        let future_one = async {
            // ...
            println!("{}", my_string);
        };
    
        let future_two = async {
            // ...
            println!("{}", my_string);
        };
    
        // Run both futures to completion, printing "foo" twice:
        let ((), ()) = futures::join!(future_one, future_two);
    }
    
    /// `async move` block:
    ///
    /// Only one `async move` block can access the same captured variable, since
    /// captures are moved into the `Future` generated by the `async move` block.
    /// However, this allows the `Future` to outlive the original scope of the
    /// variable:
    fn move_block() -> impl Future<Output = ()> {
        let my_string = "foo".to_string();
        async move {
            // ...
            println!("{}", my_string);
        }
    }
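
    Regarding the `Send` requirement mentioned above: a multi-threaded executor may move a task between worker threads, so its spawn function demands `Send`. A minimal sketch with a hypothetical `spawn_on_pool` to illustrate the bound:

    use std::{future::Future, sync::Arc};

    // Hypothetical multi-threaded spawn: the `Send + 'static` bounds are what let
    // an executor move the future onto another worker thread.
    fn spawn_on_pool(fut: impl Future<Output = ()> + Send + 'static) {
        // A real executor would queue `fut` here; this sketch just drops it.
        drop(fut);
    }

    fn main() {
        // `Arc<String>` is `Send`, so a future capturing it satisfies the bound.
        let shared = Arc::new("foo".to_string());
        spawn_on_pool(async move {
            println!("{}", shared);
        });
        // Capturing a `!Send` type such as `Rc` would make the future `!Send`,
        // and the `spawn_on_pool` call above would then fail to compile.
    }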
    

    Future

    A `Future` represents a deferred asynchronous computation. Its basic shape is as follows:

    Basic

    
    trait SimpleFuture {
        type Output;
        fn poll(&mut self, wake: fn()) -> Poll<Self::Output>;
    }
    
    enum Poll<T> {
        Ready(T),
        Pending,
    }
    

    The caller (ultimately, an executor) invokes `poll` to check whether the asynchronous computation has produced its value. The `wake()` callback here is crucial: it acts as a notification channel back to the caller, making sure the "ready to make progress" signal propagates upward so that the future is polled again and its data collected. This needs care in Rust async code; if `wake` is never wired up, a future that returned `Pending` can easily get stuck and never be polled again.

    pub struct SocketRead<'a> {
        socket: &'a Socket,
    }

    impl SimpleFuture for SocketRead<'_> {
        type Output = Vec<u8>;

        fn poll(&mut self, wake: fn()) -> Poll<Self::Output> {
            if self.socket.has_data_to_read() {
                // The socket has data -- read it into a buffer and return it.
                Poll::Ready(self.socket.read_buf())
            } else {
                // The socket does not yet have data.
                //
                // Arrange for `wake` to be called once data is available.
                // When data becomes available, `wake` will be called, and the
                // user of this `Future` will know to call `poll` again and
                // receive data.
                self.socket.set_readable_callback(wake);
                Poll::Pending
            }
        }
    }
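
    In this model futures compose naturally. In the spirit of the async-book's `Join` example, here is a sketch of a `SimpleFuture` that drives two sub-futures to completion concurrently:

    /// A SimpleFuture that runs two other futures to completion concurrently.
    pub struct Join<FutureA, FutureB> {
        a: Option<FutureA>,
        b: Option<FutureB>,
    }

    impl<FutureA, FutureB> SimpleFuture for Join<FutureA, FutureB>
    where
        FutureA: SimpleFuture<Output = ()>,
        FutureB: SimpleFuture<Output = ()>,
    {
        type Output = ();
        fn poll(&mut self, wake: fn()) -> Poll<Self::Output> {
            // Attempt to complete future `a`.
            if let Some(a) = &mut self.a {
                if let Poll::Ready(()) = a.poll(wake) {
                    self.a.take();
                }
            }
            // Attempt to complete future `b`.
            if let Some(b) = &mut self.b {
                if let Poll::Ready(()) = b.poll(wake) {
                    self.b.take();
                }
            }
            if self.a.is_none() && self.b.is_none() {
                // Both futures have completed -- we can return successfully.
                Poll::Ready(())
            } else {
                // One or both futures returned `Poll::Pending` and still have
                // work to do. They will call `wake()` when progress can be made.
                Poll::Pending
            }
        }
    }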
    

    The real `Future` trait looks like this:
    Concretely, a future executor takes a set of top-level futures (the outermost async functions to be awaited) and runs them by polling. It polls each future once to start it; afterwards, whenever a future can make progress, its `wake` call puts the corresponding task back onto the executor's queue, telling the executor to poll it again and check its state.

    trait Future {
        type Output;
        fn poll(
            // Note the change from `&mut self` to `Pin<&mut Self>`:
            self: Pin<&mut Self>,
            // and the change from `wake: fn()` to `cx: &mut Context<'_>`:
            cx: &mut Context<'_>,
        ) -> Poll<Self::Output>;
    }
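
    As a minimal illustration of this trait (not from the book), here is a sketch of a toy future that returns `Pending` on its first poll, immediately asks to be woken, and completes on the second poll:

    use std::{
        future::Future,
        pin::Pin,
        task::{Context, Poll},
    };

    /// A toy future: `Pending` on the first poll, `Ready` on the second.
    struct YieldOnce {
        yielded: bool,
    }

    impl Future for YieldOnce {
        type Output = ();

        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
            if self.yielded {
                Poll::Ready(())
            } else {
                self.yielded = true;
                // Ask to be polled again right away. A real future would instead
                // hand the waker to whatever will eventually produce the value
                // (an IO reactor, a timer thread, ...).
                cx.waker().wake_by_ref();
                Poll::Pending
            }
        }
    }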
    

    Here `Pin<&mut Self>` makes it possible to create immovable (self-referential) futures.
    For example, the following `TimerFuture` stores the `Waker` and, once a background thread has slept long enough, calls `waker.wake()`.

    use std::{
        future::Future,
        pin::Pin,
        sync::{Arc, Mutex},
        task::{Context, Poll, Waker},
        thread,
        time::Duration,
    };

    /// A future that completes once a background thread has slept for `duration`.
    pub struct TimerFuture {
        shared_state: Arc<Mutex<SharedState>>,
    }

    /// Shared state between the future and the thread doing the sleeping.
    struct SharedState {
        /// Whether or not the sleep time has elapsed.
        completed: bool,
        /// The waker for the task that `TimerFuture` is running on.
        waker: Option<Waker>,
    }

    impl Future for TimerFuture {
        type Output = ();
        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
            // Look at the shared state to see if the timer has already completed.
            let mut shared_state = self.shared_state.lock().unwrap();
            if shared_state.completed {
                Poll::Ready(())
            } else {
                // Set waker so that the thread can wake up the current task
                // when the timer has completed, ensuring that the future is polled
                // again and sees that `completed = true`.
                //
                // It's tempting to do this once rather than repeatedly cloning
                // the waker each time. However, the `TimerFuture` can move between
                // tasks on the executor, which could cause a stale waker pointing
                // to the wrong task, preventing `TimerFuture` from waking up
                // correctly.
                //
                // N.B. it's possible to check for this using the `Waker::will_wake`
                // function, but we omit that here to keep things simple.
                shared_state.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
    
    impl TimerFuture {
        /// Create a new `TimerFuture` which will complete after the provided
        /// timeout.
        pub fn new(duration: Duration) -> Self {
            let shared_state = Arc::new(Mutex::new(SharedState {
                completed: false,
                waker: None,
            }));
    
            // Spawn the new thread
            let thread_shared_state = shared_state.clone();
            thread::spawn(move || {
                thread::sleep(duration);
                let mut shared_state = thread_shared_state.lock().unwrap();
                // Signal that the timer has completed and wake up the last
                // task on which the future was polled, if one exists.
                shared_state.completed = true;
                if let Some(waker) = shared_state.waker.take() {
                    waker.wake()
                } 
            });
    
            TimerFuture { shared_state }
        }
    }
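
    Because `TimerFuture` only relies on the standard `Waker` machinery, any executor can drive it, including the one built in the next section or `futures::executor::block_on`. A minimal usage sketch:

    use futures::executor::block_on;
    use std::time::Duration;

    fn main() {
        block_on(async {
            println!("waiting...");
            // Completes after roughly two seconds.
            TimerFuture::new(Duration::from_secs(2)).await;
            println!("done!");
        });
    }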
    

    Future Executor Demo

    Note that futures do not run themselves: ultimately they are executed on threads driven by an executor, and the language and standard library do not ship a built-in runtime, so here we write a simple single-threaded executor ourselves.
    The executor's logic is as follows:

    use {
        futures::{
            future::{BoxFuture, FutureExt},
            task::{waker_ref, ArcWake},
        },
        std::{
            future::Future,
            sync::mpsc::{sync_channel, Receiver, SyncSender},
            sync::{Arc, Mutex},
            task::{Context, Poll},
            time::Duration,
        },
        // The timer we wrote in the previous section:
        timer_future::TimerFuture,
    };
    
    /// Task executor that receives tasks off of a channel and runs them.
    struct Executor {
        ready_queue: Receiver<Arc<Task>>,
    }
    
    /// `Spawner` spawns new futures onto the task channel.
    #[derive(Clone)]
    struct Spawner {
        task_sender: SyncSender<Arc<Task>>,
    }
    
    /// A future that can reschedule itself to be polled by an `Executor`.
    struct Task {
        /// In-progress future that should be pushed to completion.
        ///
        /// The `Mutex` is not necessary for correctness, since we only have
        /// one thread executing tasks at once. However, Rust isn't smart
        /// enough to know that `future` is only mutated from one thread,
        /// so we need to use the `Mutex` to prove thread-safety. A production
        /// executor would not need this, and could use `UnsafeCell` instead.
        future: Mutex<Option<BoxFuture<'static, ()>>>,
    
        /// Handle to place the task itself back onto the task queue.
        task_sender: SyncSender<Arc<Task>>,
    }
    
    fn new_executor_and_spawner() -> (Executor, Spawner) {
        // Maximum number of tasks to allow queueing in the channel at once.
        // This is just to make `sync_channel` happy, and wouldn't be present in
        // a real executor.
        const MAX_QUEUED_TASKS: usize = 10_000;
        let (task_sender, ready_queue) = sync_channel(MAX_QUEUED_TASKS);
        (Executor { ready_queue }, Spawner { task_sender })
    }
    
    impl Spawner {
        fn spawn(&self, future: impl Future<Output = ()> + 'static + Send) {
            let future = future.boxed();
            let task = Arc::new(Task {
                future: Mutex::new(Some(future)),
                task_sender: self.task_sender.clone(),
            });
            self.task_sender.send(task).expect("too many tasks queued");
        }
    }
    
    impl ArcWake for Task {
        fn wake_by_ref(arc_self: &Arc<Self>) {
            // Implement `wake` by sending this task back onto the task channel
            // so that it will be polled again by the executor.
            let cloned = arc_self.clone();
            arc_self
                .task_sender
                .send(cloned)
                .expect("too many tasks queued");
        }
    }
    
    impl Executor {
        fn run(&self) {
            while let Ok(task) = self.ready_queue.recv() {
                // Take the future, and if it has not yet completed (is still Some),
                // poll it in an attempt to complete it.
                let mut future_slot = task.future.lock().unwrap();
                if let Some(mut future) = future_slot.take() {
                    // Create a `LocalWaker` from the task itself
                    let waker = waker_ref(&task);
                    let context = &mut Context::from_waker(&*waker);
                    // `BoxFuture<T>` is a type alias for
                    // `Pin<Box<dyn Future<Output = T> + Send + 'static>>`.
                    // We can get a `Pin<&mut dyn Future + Send + 'static>`
                    // from it by calling the `Pin::as_mut` method.
                    if let Poll::Pending = future.as_mut().poll(context) {
                        // We're not done processing the future, so put it
                        // back in its task to be run again in the future.
                        *future_slot = Some(future);
                    }
                }
            }
        }
    }
    
    fn main() {
        let (executor, spawner) = new_executor_and_spawner();
    
        // Spawn a task to print before and after waiting on a timer.
        spawner.spawn(async {
            println!("howdy!");
            // Wait for our timer future to complete after two seconds.
            TimerFuture::new(Duration::new(2, 0)).await;
            println!("done!");
        });
    
        // Drop the spawner so that our executor knows it is finished and won't
        // receive more incoming tasks to run.
        drop(spawner);
    
        // Run the executor until the task queue is empty.
        // This will print "howdy!", pause, and then print "done!".
        executor.run();
    }
    

    Taking sockets as an example, this is how new data is detected: through poll/queue-style OS mechanisms such as epoll on Linux and kqueue on FreeBSD and macOS. These APIs let a thread block on a set of IO events until one of them occurs. Their basic shape is roughly the following:

    struct IoBlocker {
        /* ... */
    }
    
    struct Event {
        // An ID uniquely identifying the event that occurred and was listened for.
        id: usize,
    
        // A set of signals to wait for, or which occurred.
        signals: Signals,
    }
    
    impl IoBlocker {
        /// Create a new collection of asynchronous IO events to block on.
        fn new() -> Self { /* ... */ }
    
        /// Express an interest in a particular IO event.
        fn add_io_event_interest(
            &self,
    
            /// The object on which the event will occur
            io_object: &IoObject,
    
            /// A set of signals that may appear on the `io_object` for
            /// which an event should be triggered, paired with
            /// an ID to give to events that result from this interest.
            event: Event,
        ) { /* ... */ }
    
        /// Block until one of the events occurs.
        fn block(&self) -> Event { /* ... */ }
    }
    
    let mut io_blocker = IoBlocker::new();
    io_blocker.add_io_event_interest(
        &socket_1,
        Event { id: 1, signals: READABLE },
    );
    io_blocker.add_io_event_interest(
        &socket_2,
        Event { id: 2, signals: READABLE | WRITABLE },
    );
    let event = io_blocker.block();
    
    // prints e.g. "Socket 1 is now READABLE" if socket one became readable.
    println!("Socket {:?} is now {:?}", event.id, event.signals);
    

    So the socket's `set_readable_callback` can naturally be implemented like this:

    impl Socket {
        fn set_readable_callback(&self, waker: Waker) {
            // `local_executor` is a reference to the local executor.
            // this could be provided at creation of the socket, but in practice
            // many executor implementations pass it down through thread local
            // storage for convenience.
            let local_executor = self.local_executor;
    
            // Unique ID for this IO object.
            let id = self.id;
    
            // Store the local waker in the executor's map so that it can be called
            // once the IO event arrives.
            local_executor.event_map.insert(id, waker);
            local_executor.add_io_event_interest(
                &self.socket_file_descriptor,
                Event { id, signals: READABLE },
            );
        }
    }
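
    To close the loop, here is a hypothetical sketch (not from the book) of the executor side: one turn of an IO event loop that blocks on the `IoBlocker` from above and fires the stored waker, which reschedules the task so that its next `poll` can read the data:

    use std::{collections::HashMap, task::Waker};

    // Hypothetical: one iteration of the executor's IO event loop. `event_map`
    // holds the wakers stored by `set_readable_callback`, keyed by IO object id.
    fn dispatch_io_events(io_blocker: &IoBlocker, event_map: &mut HashMap<usize, Waker>) {
        // Block until some registered IO event occurs.
        let event = io_blocker.block();
        if let Some(waker) = event_map.remove(&event.id) {
            // Waking the task puts it back on the executor's ready queue,
            // so it will be polled again and can now make progress.
            waker.wake();
        }
    }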
    

    Pin

    The `Pin` type wraps pointer types and guarantees that the value behind the pointer will not be moved. For a type that does not implement the `Unpin` trait, once it is pinned it can no longer be moved; for an `Unpin` type, pinning has no effect and the value can still be moved freely. A `Pin<&mut T>` where `T: Unpin` behaves just like an ordinary `&mut T` (see the short sketch below).
    After that sketch, the following example pins a custom `!Unpin` type and then tries to move the underlying data, which the compiler rejects.
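
    A quick sketch of the `Unpin` case: pinning a type that implements `Unpin` (such as `u8`) changes nothing; `Pin::new` is safe, and the value can still be mutated and unwrapped:

    use std::pin::Pin;

    fn main() {
        let mut x: u8 = 5;
        // Safe because `u8: Unpin`.
        let mut pinned: Pin<&mut u8> = Pin::new(&mut x);
        *pinned = 6;                                  // can still mutate through the pin
        let plain: &mut u8 = Pin::into_inner(pinned); // and unwrap it again
        *plain = 7;
        assert_eq!(x, 7);
    }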

    pub fn main() {
        let mut test1 = Test::new("test1");
        let mut test1 = unsafe { Pin::new_unchecked(&mut test1) };
        Test::init(test1.as_mut());
    
        let mut test2 = Test::new("test2");
        let mut test2 = unsafe { Pin::new_unchecked(&mut test2) };
        Test::init(test2.as_mut());
    
        println!("a: {}, b: {}", Test::a(test1.as_ref()), Test::b(test1.as_ref()));
        std::mem::swap(test1.get_mut(), test2.get_mut());
        println!("a: {}, b: {}", Test::a(test2.as_ref()), Test::b(test2.as_ref()));
    }
    

    Here, the `Test` type is defined as follows:

    use std::pin::Pin;
    use std::marker::PhantomPinned;
    
    #[derive(Debug)]
    struct Test {
        a: String,
        b: *const String,
        _marker: PhantomPinned,
    }
    
    
    impl Test {
        fn new(txt: &str) -> Self {
            Test {
                a: String::from(txt),
                b: std::ptr::null(),
                _marker: PhantomPinned, // This makes our type `!Unpin`
            }
        }
        fn init<'a>(self: Pin<&'a mut Self>) {
            let self_ptr: *const String = &self.a;
            let this = unsafe { self.get_unchecked_mut() };
            this.b = self_ptr;
        }
    
        fn a<'a>(self: Pin<&'a Self>) -> &'a str {
            &self.get_ref().a
        }
    
        fn b<'a>(self: Pin<&'a Self>) -> &'a String {
            assert!(!self.b.is_null(), "Test::b called without Test::init being called first");
            unsafe { &*(self.b) }
        }
    }
    

    Stack pinning like this always has to go through `unsafe` (`Pin::new_unchecked`): once the pointer is pinned for some lifetime `'a`, the compiler cannot guarantee that the pointee stays unmoved after `'a` ends, whereas the `Pin` contract requires that the pointee never be moved again.
    The following code violates the `Pin` contract, and it does not even trigger a compiler error. So after pinning an object, it is best to shadow the original binding so that the underlying value cannot be touched directly again.

    use std::mem;

    fn main() {
       let mut test1 = Test::new("test1");
       let mut test1_pin = unsafe { Pin::new_unchecked(&mut test1) };
       Test::init(test1_pin.as_mut());
       drop(test1_pin);
       println!(r#"test1.b points to "test1": {:?}..."#, test1.b);
       let mut test2 = Test::new("test2");
       mem::swap(&mut test1, &mut test2);
       println!("... and now it points nowhere: {:?}", test1.b);
    }
    

    Heap pinning, on the other hand, gives the data a stable address: once pinned it will not move for the rest of its life, so no `unsafe` is needed. For example, the following version performs the `Box::pin` inside `new`:

    use std::pin::Pin;
    use std::marker::PhantomPinned;
    
    #[derive(Debug)]
    struct Test {
        a: String,
        b: *const String,
        _marker: PhantomPinned,
    }
    
    impl Test {
        fn new(txt: &str) -> Pin<Box<Self>> {
            let t = Test {
                a: String::from(txt),
                b: std::ptr::null(),
                _marker: PhantomPinned,
            };
            let mut boxed = Box::pin(t);
            let self_ptr: *const String = &boxed.as_ref().a;
            unsafe { boxed.as_mut().get_unchecked_mut().b = self_ptr };
    
            boxed
        }
    
        fn a<'a>(self: Pin<&'a Self>) -> &'a str {
            &self.get_ref().a
        }
    
        fn b<'a>(self: Pin<&'a Self>) -> &'a String {
            unsafe { &*(self.b) }
        }
    }
    
    pub fn main() {
        let mut test1 = Test::new("test1");
        let mut test2 = Test::new("test2");
    
        println!("a: {}, b: {}",test1.as_ref().a(), test1.as_ref().b());
        println!("a: {}, b: {}",test2.as_ref().a(), test2.as_ref().b());
    }
    

    When an API requires `Unpin` but the `Future` or `Stream` at hand is `!Unpin`, you can wrap it in `Pin<Box<T>>` (e.g. via `Box::pin`) or pin it on the stack as `Pin<&mut T>` (e.g. via the `pin_mut!` macro). Most standard-library types implement `Unpin`.

    use pin_utils::pin_mut; // `pin_utils` is a handy crate available on crates.io
    
    // A function which takes a `Future` that implements `Unpin`.
    fn execute_unpin_future(x: impl Future<Output = ()> + Unpin) { /* ... */ }
    
    let fut = async { /* ... */ };
    execute_unpin_future(fut); // Error: `fut` does not implement `Unpin` trait
    
    // Pinning with `Box`:
    let fut = async { /* ... */ };
    let fut = Box::pin(fut);
    execute_unpin_future(fut); // OK
    
    // Pinning with `pin_mut!`:
    let fut = async { /* ... */ };
    pin_mut!(fut);
    execute_unpin_future(fut); // OK
    
  • Original article: https://www.cnblogs.com/xuesu/p/14188968.html