iterator - Rust String indexing. Compare str[i] with char
I want to read strings from "input.txt" and keep only those that have no # (comment) symbol at the start of the line. I wrote this code:
    use std::io::BufferedReader;
    use std::io::File;

    fn main() {
        let path = Path::new("input.txt");
        let mut file = BufferedReader::new(File::open(&path));
        let lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
        // Keep only the lines whose first character is not '#'.
        let mut iter = lines.iter().filter(|&x| x.as_slice().chars().next() != "#".chars().next());
        println!("{}", iter.next().unwrap());
    }
But the line
|&x| x.as_slice().chars().next() != "#".chars().next()
smells bad to me, because it could be something like |x| x[0] == "#", and this way I can't check, e.g., the second char in the string.
So how can I refactor this code?
Rust strings are stored as a sequence of bytes representing characters in the UTF-8 encoding. UTF-8 is a variable-width encoding, so byte indexing can leave you in the middle of a character, which is unsafe. Getting a code point by index, on the other hand, is an O(n) operation. Moreover, indexing code points is usually not what you want to do either, because there are code points which are not characters on their own, such as combining diacritics or other modifiers. Indexing grapheme clusters is closer to the right approach, but that is mostly needed in text rendering or, probably, language processing.
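The difference between bytes and code points can be seen in a minimal sketch. It assumes a current stable Rust toolchain, and the helper names byte_len, char_count and nth_char are hypothetical, introduced only for illustration. Grapheme clusters are not shown because the standard library does not segment them.

```rust
/// Length of the string in bytes (constant time).
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode code points (walks the whole string).
fn char_count(s: &str) -> usize {
    s.chars().count()
}

/// The n-th code point, found by decoding from the start: O(n).
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    // "h", U+00E9 (e with acute accent, two bytes in UTF-8), "llo".
    let s = "h\u{e9}llo";
    println!("bytes: {}", byte_len(s)); // prints "bytes: 6"
    println!("chars: {}", char_count(s)); // prints "chars: 5"
    println!("second code point: {:?}", nth_char(s, 1));

    // Byte offset 2 falls inside the two-byte sequence for U+00E9.
    println!("offset 2 is a boundary: {}", s.is_char_boundary(2)); // false

    // "e" followed by U+0301 (combining acute accent): one visible
    // character, but two code points and three bytes.
    let decomposed = "e\u{301}";
    println!("{} code points, {} bytes", char_count(decomposed), byte_len(decomposed));
}
```

The last line is the reason code point indexing is rarely what you want: the accent is a separate code point that only makes sense together with the letter before it.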
What I mean is that indexing a string is hard to define properly, and what most people want from it is usually wrong. Hence Rust does not provide a generic index operation on strings.
Occasionally, however, you do need to index strings, for example if you know in advance that your string contains only ASCII characters, or if you are working with binary data. In this case Rust, of course, provides all the necessary means.
First, you can always obtain a view of the underlying sequence of bytes. &str has an as_bytes() method which returns &[u8], a slice of the bytes the string consists of. Then you can use the usual indexing operation:
x[].as_bytes()[0] != b'#'
Note the special notation: b'#' means "ASCII character # of type u8", i.e. it is a byte character literal (also note that you don't need to write "#".chars().next() to get the character #, you can just write '#', a plain character literal). This is unsafe, however, because &str is a UTF-8-encoded string and the first character can consist of more than one byte.
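As a hedged sketch of the byte comparison on a current stable toolchain (an assumption, since the snippets above use older syntax), the check can be written so that it does not panic on an empty line. The helper names is_not_comment and byte_at_is are hypothetical. Because every byte of a multi-byte UTF-8 sequence has its high bit set, a comparison against an ASCII byte such as b'#' cannot match in the middle of a character; what the byte view cannot do is hand you a whole non-ASCII character.

```rust
/// True when the line does not begin with the comment marker '#'.
/// `first()` returns None for an empty line instead of panicking.
fn is_not_comment(line: &str) -> bool {
    line.as_bytes().first() != Some(&b'#')
}

/// Compares the byte at `index` with an expected ASCII byte.
/// `get()` returns None when `index` is out of range.
fn byte_at_is(line: &str, index: usize, expected: u8) -> bool {
    line.as_bytes().get(index) == Some(&expected)
}

fn main() {
    let lines = ["# a comment", "", "data line", "x# not a comment"];

    // Keep the lines that do not start with '#'.
    let kept: Vec<&str> = lines
        .iter()
        .copied()
        .filter(|l| is_not_comment(l))
        .collect();
    println!("{:?}", kept);

    // Checking the second byte, as asked in the question.
    println!("{}", byte_at_is("x# not a comment", 1, b'#'));
}
```

For the specific case of a prefix, line.starts_with('#') expresses the same test without dropping down to bytes.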
The proper way to handle ASCII data in Rust is to use Ascii, and slices of it. You can go from &str to &[Ascii] with the to_ascii() and to_ascii_opt() methods. They are described in the standard library documentation. Then you can use it like this:
x[].to_ascii()[0] != '#'.to_ascii()
This way you need more typing, but you get much more safety in return, because to_ascii() checks that you work with ASCII data only.
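The Ascii type and to_ascii() belong to the version of Rust this answer targets. As a rough sketch for a current stable toolchain (an assumption, since the answer names no version), a similar guarantee can be had by checking str::is_ascii() before looking at individual bytes. The helper name ascii_byte_at is hypothetical.

```rust
/// Returns the byte at `index` only when the whole string is ASCII,
/// so that one byte is guaranteed to be one character.
fn ascii_byte_at(s: &str, index: usize) -> Option<u8> {
    if s.is_ascii() {
        s.as_bytes().get(index).copied()
    } else {
        None
    }
}

fn main() {
    // ASCII-only input: the first byte can be compared directly.
    println!("{}", ascii_byte_at("#comment", 0) != Some(b'#')); // prints "false"

    // Input with a non-ASCII character is rejected as a whole.
    println!("{:?}", ascii_byte_at("h\u{e9}llo", 0)); // prints "None"
}
```

Unlike a bare as_bytes()[0], this makes the "ASCII only" assumption explicit, at the cost of scanning the string once.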
Sometimes, however, you just want to work with binary data, without interpreting it as characters, even if the source contains some ASCII characters. This can happen, for example, when you're writing a parser for a markup language like Markdown. In this case you can treat the whole input as a sequence of bytes:
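    fn main() {
        let path = Path::new("input.txt");
        let mut file = BufferedReader::new(File::open(&path));
        // Read the whole file as raw bytes, without interpreting it as text.
        let buf: Vec<u8> = file.read_to_end().unwrap();
        // Split on newlines and keep the lines that do not start with '#'
        // (line[0] assumes there are no empty lines).
        let mut iter = buf[].split(|&c| c == b'\n').filter(|line| line[0] != b'#');
        println!("{}", iter.next().unwrap());
    }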
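The same byte-oriented filtering can be sketched for a current stable toolchain (an assumption about the reader's setup). The function name non_comment_lines is hypothetical, and the input here is an in-memory buffer so the example runs without a file; with a real file, std::fs::read("input.txt") would supply the Vec<u8>.

```rust
/// Splits raw input on b'\n' and keeps the lines that do not start
/// with b'#'. Empty lines are kept, since `first()` returns None for them.
fn non_comment_lines(buf: &[u8]) -> Vec<&[u8]> {
    buf.split(|&b| b == b'\n')
        .filter(|line| line.first() != Some(&b'#'))
        .collect()
}

fn main() {
    // Stand-in for the contents of input.txt.
    let buf: &[u8] = b"# header\nfirst\n# note\nsecond";

    for line in non_comment_lines(buf) {
        // The bytes are only converted to text for display.
        println!("{}", String::from_utf8_lossy(line));
    }
}
```

Note that slice::split yields a trailing empty slice when the input ends with a newline, so a caller that cares may want to drop empty lines as well.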