Every time I see people use flamegraphs it's the ancient Perl version. There's a much better version!!!
Use the Go version of pprof: https://github.com/google/pprof
Run it like `pprof -http : your_profile.out` and it will open a browser with a really nice interactive flamegraph (way better than the Perl version), plus a call graph, source line profiling, top functions, etc. etc.
It's so much better. Don't use the Perl version. I should probably write a post showing how to do this.
Another also-much-better alternative is Samply (https://github.com/mstange/samply) which uses the Firefox Profiler as a GUI. I don't like it quite as much as pprof but it's clearly still much better than what's in this article:
https://share.firefox.dev/3j3PJoK
It should be noted that even though the post links to the Perl version for some reason, that is actually not what cargo flamegraph [0] uses; it uses a reimplementation of it in Rust called inferno [1].
[0]: https://github.com/flamegraph-rs/flamegraph
[1]: https://github.com/jonhoo/inferno
Also Jon streamed the process of porting flamegraph to Rust: https://youtu.be/jTpK-bNZiA4?si=VsvBln60BCEU7hbz
Ah interesting. It seems to be no better than the Perl one though, except not requiring Perl. It's still a static SVG.
It's not a static SVG though. The SVG supports interactivity. You can click on each element to zoom in and it even has a search feature.
Ah really? Their example here doesn't do that: https://github.com/jonhoo/inferno/blob/main/tests/data/flame...
But even so, pprof's is better. (You'll have to try it or take my word for it; they don't seem to have a demo anywhere unfortunately.)
When you hover a function it highlights all the other calls to that function (in different stacks), and if you click it it shows all the calls to and from that function in all stacks with two-sided flame graph.
The one at https://github.com/flamegraph-rs/flamegraph/blob/main/exampl... supports both zooming and search.
Ah I see. Yeah pprof is significantly superior.
I recently discovered that `perf` itself can spit out flamegraphs. My workflow has been:
You can also run `perf script -F +pid > out.perf` and then open `out.perf` in Firefox's built-in profile viewer (which is super neat): https://profiler.firefox.com

That repo has no builds and no releases, kind of surprising? And needs another tool to consume perf data?
edit: And I can only build it using bazel, and I need bazel to build bazel? I think I'll stick with Perl...
I guess you didn't get very far in the README because near the top it tells you how to install it. It's a single command:
I got very far in the README, bazel is required for the other repo (perf_data_converter).
Not everybody wants to install Go just to get an application.
pprof doesn't do an amazing job of explaining how to use it with perf (which you'd need to use for a rust project like OP), so:
First install perf, graphviz, perf_data_converter and ofc pprof, then generate the data with `perf record [command]`, and display it with `pprof -http=: perf.data`.
I typically use gperftools for profiling instead of perf. You can LD_PRELOAD it.
> We’re paying a cost for each split_whitespace call, which allocates intermediate slices.
This part seems a bit confused; I don't think `split_whitespace` does any allocations. I wish there were a few intermediate steps here, e.g. going from &str and split_whitespace to &[u8] and split.
The tokenizer at that point is a bit clunky, and it is not really comparable to split_whitespace. The new tokenizer doesn't actually have any whitespace handling; it just assumes that every token is followed by exactly one whitespace character. That alone might explain some of the speedup.
Likely the reason `split_whitespace` is so slow is this, from its documentation:
> ‘Whitespace’ is defined according to the terms of the Unicode Derived Core Property White_Space.
If they used `split_ascii_whitespace` things would likely be faster.
Switching parsing from `&str` to `&[u8]` can offer other benefits. In their case, they do `&str` comparisons and are switching that to a `u8` comparison. A lot of other parsers do `char` comparisons, which require decoding the `&str` into `char`s; that can be expensive and is usually not needed, because most grammars can be parsed as `&[u8]` just fine.
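As a tiny illustration of the idea (my own sketch, not the article's code): compare raw bytes directly and no UTF-8 ever has to be decoded, since every byte of a multi-byte UTF-8 sequence has its high bit set and so can never collide with an ASCII operator:

```rust
/// Classify the first byte of `input` without decoding any UTF-8.
/// Assumes an ASCII grammar: multi-byte UTF-8 sequences always have
/// the high bit set, so they can never look like b'+' and friends.
fn first_op(input: &str) -> Option<u8> {
    let b = *input.as_bytes().first()?;
    match b {
        b'+' | b'-' | b'*' | b'/' => Some(b),
        _ => None,
    }
}
```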
> I don't think `split_whitespace` does any allocations.
Correct. Here's the implementation of split_whitespace (as it appears in the core::str source, lightly simplified):
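```rust
// core::str, simplified: no allocation anywhere, just a lazy
// iterator adapter built from split() and filter().
pub fn split_whitespace(&self) -> SplitWhitespace<'_> {
    SplitWhitespace { inner: self.split(IsWhitespace).filter(IsNotEmpty) }
}
```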
So, we're just calling split(IsWhitespace).filter(IsNotEmpty) and keeping the resulting iterator.

Rust's iterators are lazy: they only do work when asked for the next item, so their internal state is only what is necessary to keep doing that each time.
IsWhitespace and IsNotEmpty are both predicates which do exactly what you think they do. They're provided in the library because they might not get inlined, and if they don't, we might as well only implement them exactly once.
Can you help me understand what’s happening between the split and the filter on “a <space> <space> <space> b”? I expect that to be a series of calls to split, each yielding an empty slice. So the whole iterator yields a slice pointing at a, then a slice pointing at b—but it’s had to handle three intermediate slices to get the b. Right?
It creates a Split<'_> iterator using the IsWhitespace function as the pattern. As the user calls .next() on the outer SplitWhitespace<'_>, it calls .next() on the inner Split<'_>, which yields slices "a", "", "", and "b", and the filtered iterator reduces them to "a" and "b".
(But as mentioned, this doesn't perform any allocations, since each slice is just a pointer + length into the original string.)
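To make that concrete, a small runnable check of the "a <space> <space> <space> b" case (my own example):

```rust
fn main() {
    let s = "a   b";
    // The inner split yields an empty slice between each pair of
    // consecutive separators...
    let raw: Vec<&str> = s.split(char::is_whitespace).collect();
    assert_eq!(raw, ["a", "", "", "b"]);
    // ...and the IsNotEmpty filter drops them again.
    let words: Vec<&str> = s.split_whitespace().collect();
    assert_eq!(words, ["a", "b"]);
}
```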
You're right - split_whitespace returns an iterator that yields string slices (&str) which are just views into the original string, without allocation. The performance difference likely comes from avoiding the iterator indirection and bounds checks.
I don't think memory mapping does anything to prevent false sharing. All threads still get the same data at the same address. You may get page alignment for the file, but the free-form data in the file still crosses page boundaries and cache lines.
Also you don't get contention when you don't write to the memory.
The speedup may be from just starting the work before the whole file is loaded, allowing the OS to prefetch the rest in parallel.
You probably would get the same result if you loaded the file in smaller chunks.
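Something like this, say (my own sketch; the names and chunk size are invented, and a real tokenizer would also have to deal with tokens straddling chunk boundaries):

```rust
use std::fs::File;
use std::io::{self, Read};

const CHUNK: usize = 1 << 20; // 1 MiB; tune to your cache/readahead size

/// Stream the file through `process` one chunk at a time, instead of
/// mapping the whole thing into memory.
fn for_each_chunk(path: &str, mut process: impl FnMut(&[u8])) -> io::Result<()> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; CHUNK];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        process(&buf[..n]);
    }
    Ok(())
}
```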
I feel like I'm taking crazy pills. It's not a parser, but a fused parser AND interpreter. This changes the game considerably! It doesn't have to produce an intermediate AST, and therefore can avoid the majority of the allocation that most parsers will perform.
However, avoiding creating the AST is not very realistic for most uses. It's usually needed to perform optimizations, or even just for more complicated languages that have interesting control-flow.
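A toy version of the fused approach (my own sketch, not the article's code): a prefix-notation evaluator can produce a value straight off the token stream, with no AST in between:

```rust
/// Parse *and* evaluate a prefix expression like "+ 1 * 2 3" in one
/// pass over the tokens -- no intermediate tree is ever built.
fn eval<'a>(tokens: &mut impl Iterator<Item = &'a str>) -> i64 {
    match tokens.next().expect("unexpected end of input") {
        "+" => eval(tokens) + eval(tokens),
        "-" => eval(tokens) - eval(tokens),
        "*" => eval(tokens) * eval(tokens),
        n => n.parse().expect("expected a number"),
    }
}

fn main() {
    assert_eq!(eval(&mut "+ 1 * 2 3".split_whitespace()), 7);
}
```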
Thought it would be fun to write this in Raku.
This reminds me I should actually write a "natural" arithmetic expression parser for my Rust crate realistic
Right now, realistic can parse "(* (^ 40 0.5) (^ 90 0.5))" and it will tell you that's 60, because yeah, it's sixty, that's how real arithmetic works.
But it would be nice to write "(40^0.5) * (90^0.5)" or similar and have that work instead or as well. The months of work on realistic meant I spent so long without a "natural" parser that I got used to this.
I think having a Polish-notation parser is good enough for math-y applications; I wouldn't worry about it too much if I were you. Nice crate, by the way!
I very much like these optimization articles. What could be interesting is to look at benchmarks not only for wall time but also for other metrics:
- cpu time (better CPU usage can mean shorter wall time but higher CPU)
- memory usage
- and maybe most interestingly, complexity of code (not an absolute metric, but very complex/non-portable code for a 5% speedup may or may not be worth it)
EDIT: formatting
I'm somewhat curious whether these optimizations would all have roughly the same impact if done in other orders. The presentation certainly makes it look like creating a big list of tokens is always the culprit here. That seems somewhat expected, so I agree with the text; but I still wonder whether the other optimizations are best looked at in terms of percentage gains or absolute gains here.
Neat write up! Kudos on that.
I am a bit surprised that the author didn't try to implement a stream parser. This could avoid loading the entire file in memory or relying on OS features like memory-mapped files.
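For reference, a sketch of what streaming could look like, under the (big) assumption that the input is one expression per line; the reply below points out why it's harder in general:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Evaluate the input line by line; only the current line is ever
/// held in memory, never the whole file.
fn eval_stream(path: &str, mut eval_line: impl FnMut(&str) -> i64) -> io::Result<i64> {
    let reader = BufReader::new(File::open(path)?);
    let mut total = 0;
    for line in reader.lines() {
        total += eval_line(&line?);
    }
    Ok(total)
}
```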
A math expression is basically a tree but represented here as a string in a way that's probably impossible to stream.
This would be a perfect class project. The first lesson is letting them go loose; the second lesson would be finding out which optimizations they used and which are available.
I am not even a newbie in Rust, and this could be just nitpicking, but it seems that the match is comparing strings and not characters. If this is the case, then I think Common Lisp can optimize more, since there is a special comparison for characters in CL.
Edited: In the optimized version the author uses bytes and generators and avoids using strings. I don't know if Rust generators are optimized for speed or memory; ideally you could define the length of the buffer according to the cache memory available.
Edited: I find it strange to use `input = read_input_file()?` and then `eval(&input)`; what happens when there is an error reading the file? Rust is supposed to be a high-security language. In CL there are keywords like `if-does-not-exist` to decide what to do, and `read` accepts additional parameters for end-of-file and for expressing that this read is inside a recursive procedure inside another `read`.
I should stop comparing Rust to CL; better to learn Rust first. I consider this kind of article a very good way of learning Rust for those interested in parsing and optimization. Rust seems to be a very nice language when you can afford the time to develop your program.
> what happens when there is an error reading the file?
the question mark `?` denotes the fact that the error is bubbled up (kind of like an exception, but with stronger typing and less silent)
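In stable-Rust terms, for the `Result` case, the `?` in `read_input_file()?` expands to roughly this match (a sketch; the file name is made up, and the real desugaring also converts the error type via `From`):

```rust
use std::{fs, io};

fn read_input_file() -> io::Result<String> {
    fs::read_to_string("input.txt") // "input.txt" is a stand-in name
}

fn run() -> io::Result<()> {
    // `let input = read_input_file()?;` behaves like:
    let input = match read_input_file() {
        Ok(value) => value,
        Err(e) => return Err(e), // bubble the error up to the caller
    };
    println!("read {} bytes", input.len());
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        eprintln!("error: {e}");
    }
}
```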
Specifically, the `?` symbol is currently implemented via the operator trait Try, as Try::branch(), which gets you a ControlFlow.
If Try::branch gives us a ControlFlow::Break we're done here, return immediately with the value wrapped by Break [if any] inside an Err, otherwise we have ControlFlow::Continue wrapping a value we can use to continue with execution of this function.
This is type checked, so if the function says it returns Result<Goose, Dalek> then the type of the value wrapped in a ControlFlow::Break had better be Err(Dalek) or else we can't use our ? operator here.
Reifying ControlFlow here separates concerns properly - if we want to stop early successfully then control flow can represent that idea just fine whereas an Exception model ties early exit to failure.
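ControlFlow itself is an ordinary (stable) enum in std::ops, and it can express exactly that "stop early, successfully" case (a small sketch of my own):

```rust
use std::ops::ControlFlow;

/// Early exit that is *not* a failure: Break carries the answer.
fn first_even(xs: &[i32]) -> ControlFlow<i32> {
    for &x in xs {
        if x % 2 == 0 {
            return ControlFlow::Break(x); // found it, stop early
        }
    }
    ControlFlow::Continue(()) // looked at everything, found nothing
}

fn main() {
    assert_eq!(first_even(&[1, 3, 4, 5]), ControlFlow::Break(4));
}
```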
Thanks for the info. I imagine that in this case, since it seems the error is not captured, it should end up producing a panic. So a question mark is used when the expected result is of type Result or Error. Also the web page, https://doc.rust-lang.org/rust-by-example/error/result.html, describes the Result type as Ok(T) or Err(E), and indicates that it is a richer version of Option.
Yeah, if `main` returns an error I think it exits with an error code and prints it out, so quite similar to a panic.
I think the blog post is not focusing on error handling too much, but in any case this is 'safe'; it just could likely be handled better in a real-world case.
> I should stop comparing Rust to CL, better learn Rust first
Yes
The ? will directly return Err if there is one during read_input_file(). This is just some syntactic sugar.
If you wanted to match on characters (`char`s) then you could do this with single quotes (`'+'`)
Or if you wanted to do it on bytes, you could also do this, with (`b'+'`).
Unsure if that would provide a meaningful boost or not
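Side by side, the three flavors look like this (my own sketch; per the thread, the article's first version does the `&str` comparison):

```rust
fn is_plus_str(s: &str) -> bool {
    matches!(s, "+") // &str comparison: a slice compare
}

fn is_plus_char(c: char) -> bool {
    matches!(c, '+') // char comparison: the &str had to be decoded first
}

fn is_plus_byte(b: u8) -> bool {
    matches!(b, b'+') // byte comparison: no decoding at all
}

fn main() {
    assert!(is_plus_str("+"));
    assert!(is_plus_char('+'));
    assert!(is_plus_byte(b'+'));
}
```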
Likely, comparing on `char` ('+') would be slower, as it requires decoding the `&str` as a `char`, which comes with some significant overhead (I've seen 9% on a fairly optimized parser). Ideally, when your grammar is 7-bit ASCII (or any 8-bit UTF-8 values are opaque to your grammar), you instead parse on `&[u8]` and do `u8` comparisons, rather than `char` or `&str` comparisons.
Thanks for all the information you provided. I will read Rust by Example and stop posting in this thread to avoid deviating from the OP. Anyway, perhaps other readers are learning Rust and having the same questions in their minds, so your answers are also welcome for them.
Edited: I will eliminate my catfacts username (changing the password to a random one); I don't like being downvoted, and I know I should not mention it, but things are what they are. Goodbye, catfacts!
Can somebody explain this line: `n => Token::Operand(n.parse().unwrap()),`
How does the compiler derive the type of n?

If you've never been exposed to a Hindley-Milner type system[1] it can seem a bit magical, but it essentially works by trying to figure out the types from the inside out, inferring from usage all the way to the top. The type of `n` itself is `&str`, but I take it you mean the parse. `n.parse()` can produce anything that implements `FromStr`, but `Token::Operand` can only take a `u32`, so the compiler can immediately infer that the result of `n.parse().unwrap()` must be `u32` (`n.parse()` is a `Result<u32, Err>`).
[1]: https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_sy...
Operand(u32) (see the definition of Token): n will be parsed as a u32. See here: https://doc.rust-lang.org/std/primitive.str.html#method.pars...
We're doing a pattern match, so this variable n has to be something that matches the entire value matched; its type will be identical to the type of the value matched, `s`, a few lines earlier.
That value is an item from the iterator we got from calling split_whitespace(), and split_whitespace() returns a SplitWhitespace, a custom iterator whose items are themselves sub-strings of the input string with (no surprise) no whitespace in them. In Rust's terminology these are &str, references to a string slice.
So, the type is &str
Aha. But what type does n.parse() have then, and how does the compiler derive it?
In this case the compiler actually first wants the type for a parameter to the function Token::Operand
That function is not shown, but it is included in the full source code which was linked. Well, technically: Rust says that if a sum type has a variant Token::Operand with an associated value, we can always call a function to make a Token::Operand from that value, and it just names this function Token::Operand too.
So, Token::Operand takes an i32, a 32-bit signed integer. The compiler knows we're eventually getting an i32 to call this function; if not, our program isn't valid.
Which means n.parse().unwrap() has the type i32
We know n is an &str; the &str type has a generic function parse(), with the following signature:

`pub fn parse<F>(&self) -> Result<F, <F as FromStr>::Err> where F: FromStr`

So the type you now care about, that of n.parse(), has to be a Result of some kind, and we're going to call Result::unwrap() on that to get an i32.

This can only work if the type F in that generic signature above is i32.
Which means the new type you care about was Result<i32, ParseIntError> and the parse function called will be the one which makes Ok(i32) when presented an in-range integer.
Edited: Word-smithing, no significant change of meaning.
That function is returning a Vec<Token>, so the compiler knows the .collect() call needs to produce a Vec<Token>, which means the .map() closure needs to return a Token, and therefore each match arm needs to return a Token too. And since that arm wraps the parse result in Token::Operand, the compiler selects the implementation of .parse() that returns the type Token::Operand takes.
I admit when I started rust, seeing calls to .parse() was one of the more confusing things I saw in rust code, because of how much it leans on type inference to be readable. In places like these, it's a bit more readable:

`let ip: IpAddr = ip_str.parse()?;`

But when you see the .parse buried several levels deep and you have no idea what type it's trying to produce, it's a pain in the ass to read. This is why it's nice to use the turbo-fish syntax:

`let ip = ip_str.parse::<IpAddr>()?;`

since you can drop .parse::<IpAddr>()? anywhere to make the type explicit, especially when buried in type-inferred blocks like the code in TFA.

`n` is the same type as `s` from "match s", and `n` is just `s` but renamed, if none of the previous conditions passed.
Because `match <exp>` could have contained an expression, you might need to handle a "catch all" case where you can refer to the result of that expression.
The code could have been `match s.doSomething() { ...`. The lines above what you have quoted just compare the result to a couple of constants. If none are true, the line that you have quoted is equivalent to renaming the result of that expression to `n` and then handling that case.
maybe this isn't the question you meant to ask, but:
`n` has the same type as the input of the `match` block. In other words, it's a fallback case. (In this case, it's `&str`; the same as `"+"`, `"-"`, etc)
If you're wondering how `n.parse().unwrap()` has its type computed, well that part is because type inference is able to look at the definition of `Token::Operand(u32)` and discover that it's `u32`.
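A compilable miniature of that inference (the `Token` enum here is a minimal stand-in for the article's):

```rust
#[derive(Debug)]
enum Token {
    Operand(u32),
}

fn main() {
    let n = "42";
    // No annotation and no turbofish: Token::Operand demands a u32,
    // unwrap() passes the Ok value through, so parse() must be
    // parse::<u32>().
    let token = Token::Operand(n.parse().unwrap());
    println!("{token:?}");
}
```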
From my experience: The compiler can do this, as long as the first usage of the unknown-typed-thing gives it a type. If the first usage of it doesn't, then it won't try any harder to infer the type and it won't compile unless you add your own annotations on.
Might also be useful for me to link to the docs for `parse` [1] and to the trait `FromStr` [2] that it relies on:
[1]: https://doc.rust-lang.org/std/primitive.str.html#method.pars...
[2]: https://doc.rust-lang.org/std/str/trait.FromStr.html
That's an i32 not a u32 - the operands are allowed to be say -1234 not only positive numbers apparently.
I have observed that "fearless concurrency" didn't really do much in this case compared to basic practices like not allocating in tight loops.
I am wondering if there is a different approach that 'peaks' better in terms of perf, like instead of doing:

- Optimization 1: Do not allocate a Vector when tokenizing
- Optimization 2: Zero allocations — parse directly from the input bytes
- Optimization 3: Do not use Peekable
- Optimization 4: Multithreading and SIMD
- Optimization 5: Memory-mapped I/O
Example:

- Optimization 1: Memory-mapped I/O
- Optimization 2: Do not use Peekable
- Optimization 3: Do not allocate a Vector when tokenizing
- Optimization 4: Zero allocations — parse directly from the input bytes

Conclusion

- Optimization 5: Multithreading and SIMD
I might be guessing, but in this order, probably by Optimization 3 you would already reach such a high throughput that you wouldn't bother with manual SIMD or multithreading. (This is the pragmatic way; in real life you will try to minimize risk and try to reach the goal as fast as possible, and SIMD/multithreading carry a lot of risk for your average dev team.)
> I might be guessing, but in this order, probably by Optimization 3 you would already reach such a high throughput that you wouldn't bother with manual SIMD or multithreading.
I agree with you, though from my experience memory mapping is only useful if you need to jump around the file or read it multiple times (as is the case after the author added SIMD and a two-pass step: the first to identify whitespace and the second to parse the operation and the operands). If you just need to read the file once, it's better to avoid memory mapping, as it adds a little overhead.
On the other hand parsing directly from the input bytes avoiding the UTF-8 validation needed to have &str type is easy enough to do but still improves performance quite a bit. Even the rust csv crate, which does much more, is around 30% faster with this optimization. https://docs.rs/csv/latest/csv/tutorial/index.html#amortizin...
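With the csv crate, that switch amounts to reading ByteRecords instead of StringRecords (a sketch; the file name is invented):

```rust
use std::error::Error;
use csv::{ByteRecord, Reader};

fn count_rows() -> Result<u64, Box<dyn Error>> {
    let mut rdr = Reader::from_path("data.csv")?;
    let mut record = ByteRecord::new();
    let mut rows = 0;
    // read_byte_record yields raw &[u8] fields, skipping the UTF-8
    // validation that the StringRecord-based API performs per field.
    while rdr.read_byte_record(&mut record)? {
        rows += 1;
    }
    Ok(rows)
}
```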
This is to say, my list for "easy optimizations, big gains" would be 1) Do not allocate a Vector — 2) Do not use Peekable — 3) Avoid UTF-8 validation.

I'm still guessing, but I think memory mapping can be skipped, and might be worth it only if you plan on also implementing SIMD.