Domenico Luciani - Save 5 minutes web scraping with Rust

Often I read that for simple tasks Rust is not a good choice. Go is more adopted when we need to create a small script to automatize our jobs. But is it true?

So far, I have been in Spain for two years, and now I can consider myself fully on-boarded. I learned Spanish, I have my residency here, and I have my Spanish telephone number.
The company that I chose is Lycamobile, it’s cheap, and it has the minimum requirements to live without any¹ problem.
The website tho has a problem with the login: I can’t copy-paste my credentials. Besides that, in general, the UI it’s a mess, and it’s pretty slow, especially when you log in and want to see your data, but as I said, it’s cheap.
Every time I want to see how much money/internet data I still have, I have to spend some time to get my creds and access my profile. It’s very annoying, so I thought to automatize this process by getting that info directly in my shell with one execution. Of course, I decided to use Rust for this job.

vamos

As the first task, I needed to log in and keep the session alive. How to do it in Rust?

I used the reqwest crave; I just need to add

reqwest = { version = "0.11", features = ["blocking", "cookies"] }

to my Cargo.toml file to use it and have the blocking and cookies support since I don’t need any async operation and need to save the session from one request to another.

Then let’s start with the login:

let client = reqwest::blocking::Client::builder().cookie_store(true).build()?;
client.post("https://www.lycamobile.es/wp-admin/admin-ajax.php")
    .form(&[
            ("action", "lyca_login_ajax"),
            ("method", "login"),
            ("mobile_no", "<MOBILE_PHONE_NUMBER>"),
            ("pass", "<SUPER_SECRET_PASSWORD>")
    ])
    .send()?;

Yes, that’s it. If everything is as it should be, I should have logged in. No, in this article, I won’t do any further checks for simplicity.

Once I have done the login, I should get into my profile to get my stats; I can do it with these two lines of code:

let response = client.get("https://www.lycamobile.es/es/my-account/").send()?;
let body_response = response.text()?;

If everything went well, I should have gotten the page’s body right into the body_response variable. Easy no?

easy

Scraping the body

Now the tricky part, scraping the page’s body; how to do it?

I used the Scraper crate; I need to add

scraper = "0.12.0"

to my Cargo.toml file to be able to use it.

First of all, we need to parse our document that, in this case, is in the body_response variable.

let parsed_html = Html::parse_document(&body_response)

and then we can start scraping it:

let selector = &Selector::parse("p.bdl-balance > span")
    .expect("Error during the parsing using the given selector");
let span_text = parsed_html
    .select(selector)
    .flat_map(|el| el.text())
    .collect()

Here, I’m navigating the DOM of the page as I’m using JQuery. I say to the scraper that I want to get the text inside the span element, children of the paragraph with class bdl-balance. Incredible uh?

Then a bit of text manipulation:

span_text.split("hasta").nth(1)
    .expect("Can't get the expiration date correctly")
    .trim_start().to_string()

And the result: 07-05-2021

Same thing for all the information I need, just a bit of walking through the DOM using the correct selector, manipulate the result to obtain what I need and print it.

For example, the internet data left:

let selector = &Selector::parse("div.bdl-mins")
    .expect("Error during the parsing using the given selector");
let div_text = parsed_html
    .select(selector)
    .flat_map(|el| el.text())
    .collect()

and

div_text.get(2..)
.unwrap_or_else(||"Can't get the internet balance correctly")
.to_string()

To obtain: 5.66GB

I spent like 20minutes in total to get it done, the crate’s documentation and the examples I found on its pages were super helpful, and I’ve got my script done without any headache.

Later on, I decided that maybe this could have been helpful for someone else, so I decided to add more things like reading the credentials from a YAML file, having a pretty output, and checking more errors during the process.

I made it public on my Github: https://github.com/dlion/rustyca, of course, any constructive feedback is welcome.

Here an example of the final output:

__________                __
\______   \__ __  _______/  |_ ___.__. ____ _____
 |       _/  |  \/  ___/\   __<   |  |/ ___\\__  \
 |    |   \  |  /\___ \  |  |  \___  \  \___ / __ \_
 |____|_  /____//____  > |__|  / ____|\___  >____  /
        \/           \/        \/         \/     \/

Money Balance: €0.03
Internet Balance: 5.66GB
Expiration Date: 07-05-2021

Nice uh?

bello

My final thoughts about it are that I haven’t had any issues using new crates or reading through the documentation and that the major problem I had was related to the fact that I’m still learning the language. I need more time to get used to the basics; doing that, I’m pretty sure I will speed up even the “let’s do the right things in the right ways” part (the one that I care more, to be honest).

Actually I’ve had some problems with this cheap company but it’s another another story ↩

Menu

Login

Scraping the body