Iakov's blog

Wednesday, July 18, 2018

Database indexes

Few days ago I read an article (https://www.toptal.com/database/database-design-bad-practices), which describes some best practices on database schema design. Among the practices described, I would like to extend original post with more details on indexes.

What is an index?

Index is a special type of data, which does not contain table data, but describes where to find records in the table data-file.

We need to understand that majority of databases today work as a tape-recorder. By tape-recorder I mean, that all records that we insert to the table are appended to the end of table data-file on the disk (we are not considering in-memory databases here).

So, basically index describes at what position a record with specific field value is present in table data-file.

How indexes work?

Usually indexes represent some kind of search tree data-structure (usually B-trees).

When record is inserted to a table, the indexed field value is also inserted to search tree structure, which points to a record address in table data-file.

Why search tree structures are used under the hood?

Search tree structures guarantee almost constant time when you search for a value in the tree. Well, not exactly constant time, in reality it is logarithmic time, however it is a bit slower than constant time but much faster than linear search.

For example, you’ve got a table with 1 million records. To find a single record in such table without using an index you will have to iterate through all records and perform 1 million iterations in the worst case.

However, if we take simplest binary search tree as the index implementation, we would make only log2(1000000) iterations, which approximately equals to 20!

Indexes within database environment.

It should be mentioned, that indexes are not always used by the database engine (DBE).

For example, you have a table with 1 million records, and the table has field “Age”.

For instance, value Age = 29 is present in 999500 records, value Age = 30 is present in 500 records, and you have created an index for Age field.

Then you run a query to retrieve all records where field Age is equal to 29.

With 99% probability, your index will not be used, and DBE will choose to use Full Table Scan (FTS) instead (i.e. iterate through all the records).

Why would it do that way? Didn’t we just say that indexes increase our extraction time?

Let’s take a look at our data first, 99.95% of our records contain Age value which is equal to 29.

So using FTS for extracting 99.95% of table data is more efficient than using index lookup. DBE will simply iterate through all table records and ignore records where Age is not equal to 29.

On the other hand, if you run query to return all records where field Age is equal to 30, then DBE will use index, because index lookup will work faster to extract 0.05% of table data.

The amount of records that should be returned by the query compared to total amount of records in the table is called selectivity. If query shows good selectivity, then index is used, otherwise it is more likely that DBE will iterate through entire table.

Alright, we discovered what selectivity is, but how does the DBE knows how many records are there in table with specific values? Does it pre-scans entire table?

No it does not do any pre-scaning.

Remember we said: when new record is inserted to the table, we also insert new value to index data-structure? When DBE inserts new value to index data-structure it also adds indexed value to index statistics.

Index statistics keeps information regarding how many times certain values are present in the table. Based on such index statistics, DBE chooses the strategy for query execution and data extraction.

Summary.

In this article I tried to give some introductory information on how database indexes work.

Hopefully this will help newcomers to conquer database design science faster.

Use indexes wisely.

Good luck!

Wednesday, November 1, 2017

Xamarin, bluetooth connection gets dropped after GC is run

Xamarin is a complex beast, and sometimes it gets ugly if you don't use it right way...

On my current job one of the projects is a mobile (android) app which connects via bluetooth to vehicle CAN-bus, gathers some info, sends that info to backend, backend performs some statistics calculations and displays results to end user.

I must admit that we should not have used Xamarin at all, as we faced so many problems with it.
I wish we could rewind back several months of work, and use plain Java and Android API.

As we finished coding all the features required, and tested the app with CAN-bus software simulator, we shipped the app to the customer. This is where testing has started using real hardware.

The problem that we observed was Xamarin's garbage collector...
We used separate thread for exchanging data between the app and BT adaptor.
Every time Mono runtime ran a garbage collector - bluetooth connection would get lost.
The app knew how to reconnect to BT adaptor gracefully, but everytime connection was lost and established again - a toast would pop-up displaying corresponding message, and that would happen too often and annoy an app user.

So why would GC shutdown a bluetooth connection in the app?
And why did not we observe such behavior using simulator?

We did not observe it with simulator, because real hardware was sending a lot more data packets at higher speed.
So with simulator we only tested the correctness of data protocol, but not the high bandwith.
Basically with simulator we did not have GC running so often, and did not observe an issue.

Returning back to the first question - why would GC do such bad things for us?
We tried all three types of Xamarin GC currently available: tarjan, new and old. Nothing helped...
Then we just started to review code over and over again, trying to understand what went wrong.
The problem was that connection was created in one thread, and all data exchange was performed in the other thread.
Once we moved connection creation and data exchange logic to same thread - the problem disappeared.

I should mention that it was not due that we lost reference to connection object somehow and it would automatically get disposed, or other known issues of managed runtime.
App would simply throw an exception having java.io.IOException under the hood saying that read() method returned -1.

I'm not sure if it is a limitation of Android platform, or some specifics of Xamarin runtime on top of Android.
But if you guys face same issue with Xam, I hope this post will save you several days.

Sunday, October 29, 2017

How would you explain recursion to a non-programmer?

Couple days ago I tried to explain what recursion is to my dad.

Let me explain a bit about my father. He is 54 y.o. and he wants to study some programming to develop android apps for his personal needs. So I decided to give him beginners introduction to Java. He has a copy of H.Schildt's Java 8 book. At one point he got stuck on the section called Recursion.

You know, this is quite a challange to explain to non-programmer what recursion is.

Quite often book authors tend to use factorial as an example of recursion:

class Factorial {
    int fact(int n) {
        int result;
        if (n == l) return 1;
        result = fact(n - 1) * n;
        return result;
    }
}

As it has appeared, the most complex thing here is to explain that function (method in Java) calls itself.

My first thought was - having a non-void method (i.e. method that returns some value) adds more complexity to understanding the recursion.

So, first I asked my dad what will be the output of the following code:

void out(int n) {
    if (n > 0) {
        System.out.println(n);
        out(n - 1);
    }
}

out(3);

"3 2 1" - he said.
- Yes this is correct.

Fine, what will be the output of the following code, if we had changed the order of instructions?

void out(int n) {
    if (n > 0) {
        out(n - 1);
        System.out.println(n);
    }
}

out(3);

"3 2 1" he said.
When I told him, that the code will output "1 2 3", he said that he didn't understand why.

Then I tried to expand "out" recursive calls in java notation (using java syntax and System.out.println), and show him that println statements go in "1 2 3" order, but still no luck, he wouldn't understand.

We tried different approaches, and here is the one that actually did the trick.
We used following notation to expand recursion method calls.

Let's see what happens to "1 2 3" example, when we start to expand recursion calls (we'll do it step by step).

"1 2 3" example, step-by-step recursion expansion.

I will highlight in red expanded lines on every step.

Step 1, expand out(3) call

out(3) // We call out(3) from outer code
    -> 3 > 1 // n > 0 condition is true, as n equals to 3, and 3 is greater than 0
    -> out(2) // out(n - 1) call, as n equals to 3, then n - 1 will equal to 3 - 1 = 2, so we call out(2)
    -> print(3) // System.out.println(n), n equals to 3, so print(3)

Step 2, expand inner out(2) call

out(3) // We call out(3) from outer code
    -> 3 > 0 // n > 0 condition is true, as n equals to 3, and 3 is greater than 0
    -> out(2) // out(n - 1) call, as n equals to 3, then n - 1 will equal to 3 - 1 = 2, so we call out(2)
        -> 2 > 0 // we're expanding out(2) call, n > 0 condition is true, inside the inner call n equals to 2 and 2 > 0.
        -> out(1) // out(n - 1) call, as n equals to 2 in the inner call, 2 - 1 = 1
        -> print(2) // print 2
    -> print(3) // System.out.println(n), n equals to 3, so print(3)

Step 3, expand inner out(1) call

out(3) // We call out(3) from outer code
    -> 3 > 0 // n > 0 condition is true, as n equals to 3, and 3 is greater than 0
    -> out(2) // out(n - 1) call, as n equals to 3, then n - 1 will equal to 3 - 1 = 2, so we call out(2)
        -> 2 > 0 // we're expanding out(2) call, n > 0 condition is true, inside the inner call n equals to 2 and 2 > 0.
        -> out(1) // out(n - 1) call, as n equals to 2 in the inner call, 2 - 1 = 1
            -> 1 > 0 // true, n > 0, n equals to 1, so 1 > 0 is true
            -> out(0) // out(n - 1) call
            -> print(1) // print 1
        -> print(2) // print 2
    -> print(3) // System.out.println(n), n equals to 3, so print(3)

Step 4, expand inner out(0) call

out(3) // We call out(3) from outer code
    -> 3 > 0 // n > 0 condition is true, as n equals to 3, and 3 is greater than 0
    -> out(2) // out(n - 1) call, as n equals to 3, then n - 1 will equal to 3 - 1 = 2, so we call out(2)
        -> 2 > 0 // we're expanding out(2) call, n > 0 condition is true, inside the inner call n equals to 2 and 2 > 0.
        -> out(1) // out(n - 1) call, as n equals to 2 in the inner call, 2 - 1 = 1
            -> 1 > 0 // true, n > 0, n equals to 1, so 1 > 0 is true
            -> out(0) // out(n - 1) call
                -> 0 > 0 // false, n equals to 0 here and 0 cannot be greater than 0. the rest of statements in out(0) call are not executed.
            -> print(1) // print 1
        -> print(2) // print 2
    -> print(3) // System.out.println(n), n equals to 3, so print(3)

In step 4, I highlighted print statements in blue color.
As we all know, programs execute statements from top to bottom, so the code will print "1 2 3".

"Factorial" example, step-by-step recursion expansion.

Fine, we just saw how to expand recursion calls with no return value.

We have following factorial calculation code:


int fact(int n) {
    if (n > l) {
        int result = fact(n - 1);
        result = result * n;
        return result;
    }
    else {
        return 1;
    }
    return result;
}

System.out.println(fact(3)); // prints out "6"

Now let's see how to expand recursion with return value.

When you deal with return value, you should perform expansion in 2 ways:

Forward expansion - when you expand recursion calls from topmost call to innermost, will be highlighted in red;
Backward expansion - when you collect return values from innermost call to topmost, will be highlighted in blue.

Step 1, forward expansion, expand fact(3) call

fact(3)
    -> 3 > 1 // condition n > 1 is true, as n equals to 3, 3 is greater than 1
    -> result = fact(2) // it is a fact(n - 1) call, as n equals to 3, 3 - 1 = 2, so we call fact(2)
    -> result = result * 3 // "result = result * n" statement, n equals to 3, will be explained in backward expansion
    -> return result // will be explained in backward expansion

Step 2, forward expansion, expand inner fact(2) call

fact(3)
    -> 3 > 1 // condition n > 1 is true, as n equals to 3, 3 is greater than 1
    -> result = fact(2) // it is a fact(n - 1) call, as n equals to 3, 3 - 1 = 2, so we call fact(2)
        -> 2 > 1 // condition n > 1 is true, as n equals to 2, 2 is greater than 1
        -> result = fact(1) // it is fact(n - 1) call, as n equals to 2, 2 - 1 = 1, so we call fact(1)
        -> result = result * 2 // "result = result * n" statement, n equals to 2, will be explained in backward expansion
        -> return result // will be explained in backward expansion
    -> result = result * 3 // "result = result * n" statement, n equals to 3, will be explained in backward expansion
    -> return result // will be explained in backward expansion

Step 3, forward expansion, expand inner fact(1) call

fact(3)
    -> 3 > 1 // condition n > 1 is true, as n equals to 3, 3 is greater than 1
    -> result = fact(2) // it is a fact(n - 1) call, as n equals to 3, 3 - 1 = 2, so we call fact(2)
        -> 2 > 1 // condition n > 1 is true, as n equals to 2, 2 is greater than 1
        -> result = fact(1) // it is fact(n - 1) call, as n equals to 2, 2 - 1 = 1, so we call fact(1)
            -> 1 > 1 // condition n > 1 is false, as n equals to 1, 1 is not greater than 1
            -> return 1 // here we start backward expansion, fact(1) returns 1
        -> result = result * 2 // "result = result * n" statement, n equals to 2, will be explained in backward expansion
        -> return result // will be explained in backward expansion
    -> result = result * 3 // "result = result * n" statement, n equals to 3, will be explained in backward expansion
    -> return result // will be explained in backward expansion

Step 4, backward expansion, gather result from inner fact(1) call

fact(3)
    -> 3 > 1 // condition n > 1 is true, as n equals to 3, 3 is greater than 1
    -> result = fact(2) // it is a fact(n - 1) call, as n equals to 3, 3 - 1 = 2, so we call fact(2)
        -> 2 > 1 // condition n > 1 is true, as n equals to 2, 2 is greater than 1
        -> result = fact(1) // fact(1) call returned 1, result equals to 1
            -> 1 > 1 // condition n > 1 is false, as n equals to 1, 1 is not greater than 1
            -> return 1 // fact(1) returns 1
        -> result = result * 2 // as result equals to 1 from fact(1) assignment, and we multiply it by 2, so result = 2
        -> return result // fact(2) returns 2
    -> result = result * 3 // "result = result * n" statement, n equals to 3, will be explained in backward expansion
    -> return result // will be explained in backward expansion

Step 5, backward expansion, gather result from inner fact(2) call

fact(3)
    -> 3 > 1 // condition n > 1 is true, as n equals to 3, 3 is greater than 1
    -> result = fact(2) // fact(2) call returned 2, result equals to 2
        -> 2 > 1 // condition n > 1 is true, as n equals to 2, 2 is greater than 1
        -> result = fact(1) // fact(1) call returned 1, result equals to 1
            -> 1 > 1 // condition n > 1 is false, as n equals to 1, 1 is not greater than 1
            -> return 1 // fact(1) returns 1
        -> result = result * 2 // as result equals to 1 from fact(1) assignment, and we multiply it by 2, so result = 2
        -> return result // fact(2) returns 2
    -> result = result * 3 // as result equals to 2 from fact(2) assignment, and we multiply it by 3, so result = 6
    -> return result // fact(3) returns result to outer method, result = 6

Recursion might get tricky when you try to explain that technique to someone else.

It feels like it is an obvious thing to you (when you're a programmer), but it makes no sense to person you explain it to.

Hopefully this article did not sound like "... to understand recursion you first have to understand recursion ...", haha.

Tuesday, May 6, 2014

Domain classes & EntityFramework

There are tons of samples on the net of how easy it is to persist objects with EntityFramework.
Most of those examples use the simplest POCO structures, that have list of C# automatic properties with public getters and setters, like this model of a person (Owner) and his pets (ICollection<Pet>):

public class Owner

{

    public int Id { get; set; }

    public string FullName { get; set; }

    public ICollection<Pet> Pets { get; set; }

}

public class Pet

{

    public int Id { get; set; }

    public string Name { get; set; }

}

Well, this is a working design. But think of how you would create instances of such classes. You create an instance, and then you assign all the required properties in your code. What if by accident you forget to assign one of the required properties?! - The answer is: it might take a while to find such bug in your application.

Another thing, which I don't like, is that such design allows user to fetch object from database and change it's ID, which is bad, we should never allow someone to change entity's ID.

We might also want to control other scalar property values, which are defined by our domain. But such design of data structures simply does not give us a certain amount of control over data, it forces us to put all the domain-validation logic outside of model classes.

Constructors and access modifiers come to rescue.
Let's modify the previous example:

public class Owner

{

    public Owner(int id, string fullName)

    {

        this.Id = id;

        this.FullName = fullName;

        this.Pets = new List<Pet>();

    }

    public int Id { get; internal set; }

    public string FullName { get; set; }

    public ICollection<Pet> Pets { get; internal set; }

}

public class Pet

{

    public Pet(int id, string name)

    {

        this.Id = id;

        this.Name = name;

    }

    public int Id { get; internal set; }

    public string Name { get; set; }

}

We added constructors to both Owner and Pet class. Constructors in both classes take parameters for all required properties. Now we are unable to create model instances without filling all the required properties. We have also changed setter access of ID properties to internal.

The design looks okay now.
But we won't be able to persist these classes with EF.
In order to persist an instance of a class, EF requires us to declare parameterless constructor, let's define one:

public class Owner

{

    internal Owner() {}

    public Owner(int id, string fullName)

    {

        this.Id = id;

        this.FullName = fullName;

        this.Pets = new List<Pet>();

    }

    public int Id { get; internal set; }

    public string FullName { get; set; }

    public ICollection<Pet> Pets { get; internal set; }

}

public class Pet

{

    internal Pet() {}

    public Pet(int id, string name)

    {

        this.Id = id;

        this.Name = name;

    }

    public int Id { get; internal set; }

    public string Name { get; set; }

}

Now we're able to persist classes with EF.
But there're some limitations conserning EF lazy loading feature. It won't work with such design.

EF documentation states, that we should mark association properties as virtual. But take a look, we have internal setter for ICollection<Pet> Pets property. We also have internal parameterless constructor for Owner and Pet classes. To make lazy loading working, EF subclasses your entity class and overrides virtual properties. With internal access level of constructor and association property, EF will not be able to override anything.

protected internal access level comes to rescue:

public class Owner

{

    protected internal Owner() {}

    public Owner(int id, string fullName)

    {

        this.Id = id;

        this.FullName = fullName;

        this.Pets = new List<Pet>();

    }

    public int Id { get; internal set; }

    public string FullName { get; set; }

    public virtual ICollection<Pet> Pets { get; protected internal set; }

}

public class Pet

{

    protected internal Pet() {}

    public Pet(int id, string name)

    {

        this.Id = id;

        this.Name = name;

    }

    public int Id { get; internal set; }

    public string Name { get; set; }

}

protected internal access level acts as protected OR internal. This way it is allowed for the subclass to override virtual members.

Sunday, July 22, 2012

Combining jquery validate and tipsy plugins

There're two popular plugins for jQuery, which are both handy in today's web-ui development:

jQuery validate (download link, documentation) - client-side forms validation
jQuery tipsy (download link & documentation) - nice looking tooltips

Both plugins a very easy to integrate into existing page. However I could not find any posts on the internet on how to use tipsy with validation plugin.

Well, since both plugins are highly extendible, it did not take long to integrate both in a single solution.

Let's assume we've got a simple sign-in form:

<form action="/SignIn" method="post" id="form">

    <table align="center">

        <tr>

            <td class="field">User name:</td>

            <td class="input">

                <input type="text" name="UserName" id="UserName" />

            </td>

        </tr>

        <tr>

            <td class="field">Password:</td>

            <td class="input">

                <input type="password" name="Password" id="Password" />

            </td>

        </tr>

        <tr>

            <td colspan="2">

                <input type="submit" value="Sign In" />

            </td>

        </tr>

    </table>

</form>

We would like to validate that both UserName and Password fields are filled.

The solution using both tipsy and validate plugins looks like this:

    $(function () {

        $('#form :input').tipsy({ trigger: 'manual', fade: true, gravity: 'w' });

        $('#form').validate({

            rules: {

                'UserName': 'required',

                'Password': 'required'

            },

            messages: {

                'UserName': 'UserName is required field',

                'Password': 'Password is required field'

            },

            success: function (label) {

                $(label).each(function () {

                    $('#' + this.htmlFor).tipsy('hide').removeAttr('title');

                });

            },

            errorPlacement: function (error, element) {

                element.attr('title', error.text());

                element.tipsy('show');

            }

        });

    });

We might also want to apply such behavior to all the pages where we include validate and tipsy plugins together. To achieve this, let's put the following piece of code to separate .js file, and include this file in all pages. As you can see, we've just overrided success and errorPlacement default options:

$(function () {

    if (typeof $.fn.validate != 'undefined'

        && typeof $.fn.tipsy != 'undefined') {

        $.validator.setDefaults({

            success: function (label) {

                $(label).each(function () {

                    $('#' + this.htmlFor).tipsy('hide').removeAttr('title');

                });

            },

            errorPlacement: function (error, element) {

                element.attr('title', error.text());

                element.tipsy('show');

            }

        });

    }

});

Monday, October 31, 2011

Great JSON library for .NET

Last night, I came across a great JSON serialization/deserialization library for .NET - http://json.codeplex.com/ (Newtonsoft.Json). It provides features similar to System.Xml & System.Xml.Linq, but for JSON.

I simply needed to build a custom well-formed JSON response from my AJAX service, and the library had everything I needed.

Saturday, October 29, 2011

Mono XSP problem on Windows

With the latest builds of Mono (2.10.5, 2.10.6) XSP web server does not work on my Windows 7 machine. XSP starts correctly but when you send an http-request from your browser to XSP server, the page just hangs and no response is received. As a solution I have downloaded MonoDevelop 2.8 package, which has it's own newest XSP server. The server from MonoDevelop package seem to work just fine in Windows 7 environment.