@@ -10,9 +10,6 @@ This article seeks to help you build a sturdy mental model of how asyncio
1010fundamentally works.
1111Something that will help you understand the how and why behind the recommended
1212patterns.
13- The final section, :ref:`which_concurrency_do_I_want`, zooms out a bit and
14- compares the common approaches to concurrency -- multiprocessing,
15- multithreading & asyncio -- and describes where each is most useful.
1613
1714During my own asyncio learning process, a few aspects particularly drove my
1815curiosity (read: drove me nuts).
@@ -27,15 +24,6 @@ of this article.
2724- How would I go about writing my own asynchronous variant of some operation?
2825 Something like an async sleep, database request, etc.
2926
30- The first two sections feature some examples but are generally focused on theory
31- and explaining concepts.
32- The next two sections are centered around examples, focused on further
33- illustrating and reinforcing ideas practically.
34-
35- .. contents:: Sections
36- :depth: 1
37- :local:
38-
3927---------------------------------------------
4028A conceptual overview part 1: the high-level
4129---------------------------------------------
@@ -489,368 +477,3 @@ For reference, you could implement it without futures, like so::
489477 else:
490478 await YieldToEventLoop()
491479
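Since the hunk above only preserves the tail of that futures-free implementation, here is a self-contained sketch of the whole idea (the name ``async_sleep`` is my own; the article's actual code may differ in details):

```python
import time

class YieldToEventLoop:
    # Awaitable that hands control back to the event-loop for one cycle.
    def __await__(self):
        yield

# A futures-free async sleep: poll the clock each time the event-loop
# resumes us, and cede control again until the deadline has passed.
async def async_sleep(delay_seconds: float) -> None:
    deadline = time.time() + delay_seconds
    while True:
        if time.time() >= deadline:
            return
        else:
            await YieldToEventLoop()
```

Unlike ``time.sleep``, nothing here blocks: every pass either finishes or yields, so the event-loop can run other tasks in between checks.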
492-
493- .. _anaylzing-control-flow-example:
494-
495- ----------------------------------------------
496- Analyzing an example program's control flow
497- ----------------------------------------------
498-
499- We'll walk through, step by step, a simple asynchronous program, following
500- along in the key methods of Task & Future that are leveraged when asyncio is
501- orchestrating the show.
502-
503-
504- ===============
505- Task.step
506- ===============
507-
508- The actual method that invokes a Task's coroutine,
509- ``asyncio.tasks.Task.__step_run_and_handle_result``, is about 80 lines long.
510- For the sake of clarity, I've removed all of the edge-case error handling,
511- simplified some aspects and renamed it, but the core logic remains unchanged.
512-
513- ::
514-
515-     1   class Task(Future):
516-     2       ...
517-     3       def step(self):
518-     4           try:
519-     5               awaited_task = self.coro.send(None)
520-     6           except StopIteration as e:
521-     7               super().set_result(e.value)
522-     8           else:
523-     9               awaited_task.add_done_callback(self.step)
524-     10      ...
525-
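The two behaviors ``step`` leans on -- ``coro.send(None)`` resuming a coroutine, and ``StopIteration`` smuggling out its return value -- can be observed directly, without any event-loop. A small standalone demonstration (my own, not from the article):

```python
async def triple(val: int) -> int:
    return val * 3

coro = triple(5)
try:
    # send(None) starts (or resumes) the coroutine, just as step() does.
    coro.send(None)
except StopIteration as exc:
    # A coroutine that returns raises StopIteration, with the return
    # value tucked into the exception's value attribute.
    print(exc.value)  # 15
```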
526-
527- ======================
528- Example program
529- ======================
530-
531- ::
532-
533-     # Filename: program.py
534-     1   async def triple(val: int):
535-     2       return val * 3
536-     3
537-     4   async def main():
538-     5       triple_task = asyncio.Task(coro=triple(val=5))
539-     6       tripled_val = await triple_task
540-     7       return tripled_val + 2
541-     8
542-     9   loop = asyncio.new_event_loop()
543-     10  main_task = asyncio.Task(main(), loop=loop)
544-     11  loop.run_forever()
545-
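If you want to run an equivalent program yourself, the same flow can be written against the high-level API (``asyncio.run`` instead of the manual loop management shown above, since ``run_forever()`` as written never returns):

```python
import asyncio

async def triple(val: int) -> int:
    return val * 3

async def main() -> int:
    # ensure_future wraps the coroutine in a Task and schedules it
    # on the running event-loop.
    triple_task = asyncio.ensure_future(triple(val=5))
    tripled_val = await triple_task
    return tripled_val + 2

print(asyncio.run(main()))  # 17
```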
546- =====================
547- Control flow
548- =====================
549-
550- At a high-level, this is how control flows:
551-
552- .. code-block:: none
553-
554-     1   program
555-     2     event-loop
556-     3       main_task.step
557-     4         main()
558-     5           triple_task.__await__
559-     6         main()
560-     7       main_task.step
561-     8     event-loop
562-     9       triple_task.step
563-     10        triple()
564-     11      triple_task.step
565-     12    event-loop
566-     13      main_task.step
567-     14          triple_task.__await__
568-     15        main()
569-     16      main_task.step
570-     17    event-loop
571-
572- And, in much more detail:
573-
574- 1. Control begins in ``program.py``
575- Line 9 creates an event-loop, line 10 creates ``main_task`` and adds it to
576- the event-loop, line 11 indefinitely passes control to the event-loop.
577- 2. Control is now in the event-loop
578- The event-loop pops ``main_task`` off its queue, then invokes it by calling
579- ``main_task.step()``.
580- 3. Control is now in ``main_task.step``
581- We enter the try-block on line 4 then begin the coroutine ``main()`` on
582- line 5.
583- 4. Control is now in the coroutine: ``main()``
584- The Task ``triple_task`` is created on line 5, which adds it to the
585- event-loop's queue. Line 6 ``await``\ s ``triple_task``.
586- Remember, that calls ``Task.__await__`` then percolates any ``yield``\ s.
587- 5. Control is now in ``triple_task.__await__``
588- ``triple_task`` is not done, given it was just created, so we enter
589- the first if-block on line 5 and ``yield`` the thing we'll be waiting
590- for -- ``triple_task``.
591- 6. Control is now in the coroutine: ``main()``
592- ``await`` percolates the ``yield`` and the yielded value -- ``triple_task``.
593- 7. Control is now in ``main_task.step``
594- The variable ``awaited_task`` is ``triple_task``.
595- No ``StopIteration`` was raised, so the else-branch of the try-block on
596- line 8 executes.
597- A done-callback, ``main_task.step``, is added to ``triple_task``.
598- The step method ends and returns to the event-loop.
599- 8. Control is now in the event-loop
600- The event-loop cycles to the next task in its queue.
601- The event-loop pops ``triple_task`` from its queue and invokes it by
602- calling ``triple_task.step()``.
603- 9. Control is now in ``triple_task.step``
604- We enter the try-block on line 4 then begin the coroutine ``triple()``
605- via line 5.
606- 10. Control is now in the coroutine: ``triple()``
607- It computes 3 times 5, then finishes and raises a ``StopIteration``
608- exception.
609- 11. Control is now in ``triple_task.step``
610- The ``StopIteration`` exception is caught, so we go to line 7.
611- The return value of the coroutine ``triple()`` is embedded in the value
612- attribute of that exception.
613- ``Future.set_result()`` saves the result, marks the task as done and adds
614- the done-callbacks of ``triple_task`` to the event-loop's queue.
615- The step method ends and returns control to the event-loop.
616- 12. Control is now in the event-loop
617- The event-loop cycles to the next task in its queue.
618- The event-loop pops ``main_task`` and resumes it by calling
619- ``main_task.step()``.
620- 13. Control is now in ``main_task.step``
621- We enter the try-block on line 4 then resume the coroutine ``main``,
622- which will pick up again from where it ``yield``-ed.
623- Recall, it ``yield``-ed not in the coroutine, but in
624- ``triple_task.__await__`` on line 6.
625- 14. Control is now in ``triple_task.__await__``
626- We evaluate the if-statement on line 8, which ensures that ``triple_task``
627- was completed.
628- Then, it returns the result of ``triple_task``, which was saved earlier.
629- Finally, that result is returned to the caller
630- (i.e. ``... = await triple_task``).
631- 15. Control is now in the coroutine: ``main()``
632- ``tripled_val`` is 15. The coroutine finishes and raises a
633- ``StopIteration`` exception with the return value of 17 attached.
634- 16. Control is now in ``main_task.step``
635- The ``StopIteration`` exception is caught, ``main_task`` is marked
636- as done, and its result is saved.
637- The step method ends and returns control to the event-loop.
638- 17. Control is now in the event-loop
639- There's nothing in the queue.
640- The event-loop cycles aimlessly onwards.
641-
642- ----------------------------------------------
643- Barebones network I/O example
644- ----------------------------------------------
645-
646- Here we'll see a simple but thorough example showing how asyncio can offer an
647- advantage over serial programs.
648- The example doesn't rely on any asyncio operators (besides the event-loop).
649- It's all non-blocking sockets & custom awaitables that help you see what's
650- actually happening under the hood and how you could do something similar.
651-
652- Performing a database request across a network might take half a second or so,
653- but that's ages in computer-time.
654- Your processor could have done millions or even billions of things.
655- The same is true for, say, requesting a website, downloading a car, loading a
656- file from disk into memory, etc.
657- The general theme is those are all input/output (I/O) actions.
658-
659- Consider performing two tasks: requesting some information from a server and
660- doing some computation locally.
661- A serial approach would look like: ping the server, idle while waiting for a
662- response, receive the response, perform the local computation.
663- An asynchronous approach would look like: ping the server, do some of the
664- local computation while waiting for a response, check if the server is ready
665- yet, do a bit more of the local computation, check again, etc.
666- Basically we're freeing up the CPU to do other activities instead of scratching
667- its belly button.
668-
669- This example has a server (a separate, local process) compute the sum of many
670- samples from a Gaussian (i.e. normal) distribution.
671- And the local computation finds the sum of many samples from a uniform
672- distribution.
673- As you'll see, the asynchronous approach runs notably faster, since progress
674- can be made on computing the sum of many uniform samples, while waiting for
675- the server to calculate and respond.
676-
677- =====================
678- Serial output
679- =====================
680-
681- .. code-block:: none
682-
683- $ python serial_approach.py
684- Beginning server_request.
685- ====== Done server_request. total: -2869.04. Ran for: 2.77s. ======
686- Beginning uniform_sum.
687- ====== Done uniform_sum. total: 60001676.02. Ran for: 4.77s. ======
688- Total time elapsed: 7.54s.
689-
690- =====================
691- Asynchronous output
692- =====================
693-
694- .. code-block:: none
695-
696- $ python async_approach.py
697- Beginning uniform_sum.
698- Pausing uniform_sum at sample_num: 26,999,999. time_elapsed: 1.01s.
699-
700- Beginning server_request.
701- Pausing server_request. time_elapsed: 0.00s.
702-
703- Resuming uniform_sum.
704- Pausing uniform_sum at sample_num: 53,999,999. time_elapsed: 1.05s.
705-
706- Resuming server_request.
707- Pausing server_request. time_elapsed: 0.00s.
708-
709- Resuming uniform_sum.
710- Pausing uniform_sum at sample_num: 80,999,999. time_elapsed: 1.05s.
711-
712- Resuming server_request.
713- Pausing server_request. time_elapsed: 0.00s.
714-
715- Resuming uniform_sum.
716- Pausing uniform_sum at sample_num: 107,999,999. time_elapsed: 1.04s.
717-
718- Resuming server_request.
719- ====== Done server_request. total: -2722.46. ======
720-
721- Resuming uniform_sum.
722- ====== Done uniform_sum. total: 59999087.62 ======
723-
724- Total time elapsed: 4.60s.
725-
726- ======================
727- Code
728- ======================
729-
730- Now, we'll explore some of the most important snippets.
731-
732- Below is the portion of the asynchronous approach responsible for checking
733- whether the server's done yet and, if not, yielding control back to the
734- event-loop instead of idly waiting.
735- I'd like to draw your attention to a specific part of this snippet.
736- Setting a socket to non-blocking mode means the ``recv()`` call won't idle
737- while waiting for a response.
738- Instead, if there's no data to be read, it'll immediately raise a
739- ``BlockingIOError``.
740- If there is data available, the ``recv()`` will proceed as normal.
741-
742- .. code-block:: python
743-
744-     class YieldToEventLoop:
745-         def __await__(self):
746-             yield
747-     ...
748-
749-     async def get_server_data():
750-         client = socket.socket()
751-         client.connect(server.SERVER_ADDRESS)
752-         client.setblocking(False)
753-
754-         while True:
755-             try:
756-                 # For reference, the first argument to recv() is the maximum
757-                 # number of bytes to attempt to read. Setting it to 4096 means we
758-                 # could get 2 bytes or 4 bytes, or even 4091 bytes, but not 4097
759-                 # bytes back. However, if there are no bytes available to be read,
760-                 # this recv() will raise a BlockingIOError since the socket was
761-                 # set to non-blocking mode.
762-                 response = client.recv(4096)
763-                 break
764-             except BlockingIOError:
765-                 await YieldToEventLoop()
766-         return response
767-
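You can observe that non-blocking behavior in isolation with a ``socket.socketpair``, no server process required (a standalone demonstration, separate from the article's programs):

```python
import socket

reader, writer = socket.socketpair()
reader.setblocking(False)

try:
    reader.recv(4096)  # nothing has been sent yet...
except BlockingIOError:
    print("no data yet")  # ...so recv() raises instead of idling

writer.sendall(b"hello")
print(reader.recv(4096))  # data is available now, so recv() succeeds

reader.close()
writer.close()
```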
768-
769- And this is the portion of code responsible for asynchronously computing
770- the uniform sums.
771- It's designed to allow for working through the sum a portion at a time.
772- The ``time_allotment`` argument to the coroutine function decides how long the
773- sum function will iterate -- in other words, synchronously hog control --
774- before ceding back to the event-loop.
775-
776- .. code-block:: python
777-
778-     async def uniform_sum(n_samples: int, time_allotment: float) -> float:
779-
780-         start_time = time.time()
781-
782-         total = 0.0
783-         for _ in range(n_samples):
784-             total += random.random()
785-
786-             time_elapsed = time.time() - start_time
787-             if time_elapsed > time_allotment:
788-                 await YieldToEventLoop()
789-                 start_time = time.time()
790-
791-         return total
792-
793- The above snippet was simplified a bit. Reading ``time.time()`` and evaluating
794- an if-condition on every iteration for many, many iterations (in this case
795- roughly a hundred million) more than eats up the runtime savings associated
796- with the asynchronous approach.
797- The actual implementation involves chunking the iteration, so you only perform
798- the check every few million iterations.
799- With that change, the asynchronous approach wins in a landslide.
800- This is important to keep in mind.
801- Too much checking or constantly jumping between tasks can ultimately cause more
802- harm than good!
803-
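A chunked variant along those lines might look like the following (a sketch of my own; the repo's implementation may choose the chunk size and structure differently):

```python
import random
import time

class YieldToEventLoop:
    def __await__(self):
        yield

async def uniform_sum_chunked(
    n_samples: int, time_allotment: float, chunk_size: int = 3_000_000
) -> float:
    start_time = time.time()
    total = 0.0
    for chunk_start in range(0, n_samples, chunk_size):
        # Tight inner loop: no clock reads, no if-checks per sample.
        for _ in range(min(chunk_size, n_samples - chunk_start)):
            total += random.random()
        # The timing check now runs once per chunk, not once per sample.
        if time.time() - start_time > time_allotment:
            await YieldToEventLoop()
            start_time = time.time()
    return total
```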
804- The server, async & serial programs are available in full here:
805- https://github.com/anordin95/a-conceptual-overview-of-asyncio/tree/main/barebones-network-io-example.
806-
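The glue that interleaves ``get_server_data`` and ``uniform_sum`` is asyncio's event-loop, but the scheduling idea itself fits in a few lines. Here is a hypothetical round-robin driver (my own sketch, not the repo's code):

```python
from collections import deque

def run_round_robin(*coros):
    # Each send(None) resumes a coroutine until its next yield;
    # StopIteration signals it finished and carries its return value.
    queue = deque(coros)
    results = []
    while queue:
        coro = queue.popleft()
        try:
            coro.send(None)
        except StopIteration as exc:
            results.append(exc.value)  # finished: record the result
        else:
            queue.append(coro)  # not finished: rotate to the back
    return results
```

Handing it the two coroutines above would bounce control between them, much like the alternating ``Pausing``/``Resuming`` lines in the program output.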
807- .. _which_concurrency_do_I_want:
808-
809- ------------------------------
810- Which concurrency do I want
811- ------------------------------
812-
813- ===========================
814- multiprocessing
815- ===========================
816-
817- For any computationally bound work in Python, you likely want to use
818- multiprocessing.
819- Otherwise, the Global Interpreter Lock (GIL) will generally get in your way!
820- For those who don't know, the GIL is a lock which ensures only one thread
821- in a process executes Python bytecode at a time.
822- Of course, since processes are generally entirely independent from one another,
823- the GIL in one process won't be impeded by the GIL in another process.
824- Granted, I believe there are ways to also get around the GIL in a single process
825- by leveraging C extensions or via subinterpreters.
826-
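As a minimal sketch of that pattern (the worker function and pool size here are illustrative choices of mine):

```python
import multiprocessing

def sum_of_squares(n: int) -> int:
    # Pure-Python arithmetic: CPU-bound work that the GIL would
    # serialize if we used threads instead of processes.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter and
    # its own GIL, so the calls genuinely run in parallel.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(sum_of_squares, [500_000] * 4)
    print(results)
```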
827- ===========================
828- multithreading & asyncio
829- ===========================
830-
831- Multithreading and asyncio are much more similar in where they're useful with
832- Python: not at all for computationally-bound work, and crucially for I/O bound
833- work.
834- For applications that need to manage a very large number of distinct I/O
835- connections or chunks-of-work, asyncio is a must.
836- For example, a web server handling thousands of requests "simultaneously"
837- (in quotes, because, as we saw, the frequent handoffs of control only create
838- the illusion of simultaneous execution).
839- Otherwise, I think the choice between which to use is somewhat down to taste.
840-
841- Multithreading maintains an OS-managed thread for each chunk of work, whereas
842- asyncio uses a Task for each work-chunk and manages them via the event-loop's
843- queue.
844- I believe the marginal overhead of one more chunk of work is a fair bit lower
845- for asyncio than threads, which matters a lot for applications that need to
846- manage many, many chunks of work.
847-
848- There are some other benefits associated with using asyncio.
849- One is clearer visibility into when and where interleaving occurs.
850- The code between two ``await``\ s is guaranteed to run without interleaving.
851- Another is simpler debugging, since it's easier to attach and follow a trace and
852- reason about code execution.
853- With threading, the interleaving is more of a black-box.
854- One benefit of multithreading is not really having to worry about greedy threads
855- hogging execution, something that could happen with asyncio where a greedy
856- coroutine never awaits and effectively stalls the event-loop.
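That hazard is easy to demonstrate with the real asyncio API (a toy example of mine): a task that never awaits runs to completion before any other task gets a turn.

```python
import asyncio

order = []

async def greedy():
    # No await anywhere: once scheduled, this hogs the event-loop
    # until it returns.
    for _ in range(3):
        order.append("greedy")

async def polite():
    for _ in range(3):
        order.append("polite")
        await asyncio.sleep(0)  # cede control back to the event-loop

async def main():
    # greedy is scheduled first and finishes entirely before polite
    # ever runs, despite polite's cooperative yielding.
    await asyncio.gather(greedy(), polite())

asyncio.run(main())
print(order)
```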