[VOTE] Add new check to measure quality of tests


[VOTE] Add new check to measure quality of tests

vmassol
Hi devs,

As part of the STAMP research project, we’ve developed a new tool (Descartes, based on Pitest) to measure the quality of tests. It generates a mutation score for your tests, which reflects how good the tests are. Technically, Descartes performs some extreme mutations on the code under test (e.g. removing the content of void methods, returning true for methods that return a boolean, etc. - see https://github.com/STAMP-project/pitest-descartes). If a test still passes against such a mutant, it means the test isn’t killing the mutant, and its mutation score decreases. For example, a test that calls a void method but asserts nothing about its effects will still pass when that method’s body is emptied, so the mutant survives.

So in short:
* Jacoco/Clover: measure how much of the code is tested
* Pitest/Descartes: measure how good the tests are

Both provide a percentage value.

I’m proposing to compute the current mutation scores for xwiki-commons and xwiki-rendering and to fail the build when new code is added that brings the mutation score below the threshold (exactly the same threshold and strategy as we use for Jacoco).
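
To give an idea, the Maven side would look roughly like the following (just a sketch; the plugin versions, the Descartes artifact coordinates and the xwiki.pitest.mutationThreshold property name are indicative and would need to be confirmed against the pitest/Descartes documentation):

  <plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>1.4.0</version>
    <configuration>
      <!-- Use the Descartes extreme-mutation engine instead of the default pitest mutators -->
      <mutationEngine>descartes</mutationEngine>
      <!-- Fail the build when the mutation score drops below the module's recorded threshold -->
      <mutationThreshold>${xwiki.pitest.mutationThreshold}</mutationThreshold>
    </configuration>
    <dependencies>
      <dependency>
        <groupId>eu.stamp-project</groupId>
        <artifactId>descartes</artifactId>
        <version>1.2.4</version>
      </dependency>
    </dependencies>
  </plugin>

Each module would then set xwiki.pitest.mutationThreshold to its current score, the same way we already record the Jacoco threshold per module.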

I consider this an experiment to push the limits of software engineering a bit further. I don’t know how well it will work. I propose to do the work, try it out for 2-3 months, and then decide whether to keep it (i.e. whether the gains it brings outweigh the problems it causes).

Here’s my +1 to try this out.

Some links:
* pitest: http://pitest.org/
* descartes: https://github.com/STAMP-project/pitest-descartes
* http://massol.myxwiki.org/xwiki/bin/view/Blog/ControllingTestQuality
* http://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes

If you’re curious, you can see a screenshot of a mutation score report at http://massol.myxwiki.org/xwiki/bin/download/Blog/MutationTestingDescartes/report.png

Please cast your votes.

Thanks
-Vincent

Re: [VOTE] Add new check to measure quality of tests

Thomas Mortagne
+1

--
Thomas Mortagne

Re: [VOTE] Add new check to measure quality of tests

Guillaume Delhumeau
+1

--
Guillaume Delhumeau ([hidden email])
Research & Development Engineer at XWiki SAS
Committer on the XWiki.org project

Re: [VOTE] Add new check to measure quality of tests

Alex Cotiugă
+1

Thanks,
Alex


Re: [VOTE] Add new check to measure quality of tests

Eduard Moraru
Sounds interesting,
+1.

Thanks,
Eduard


Re: [VOTE] Add new check to measure quality of tests

Ecaterina Moraru (Valica)
+1

Thanks,
Caty


Re: [VOTE] Add new check to measure quality of tests

Marius Dumitru Florea
+1

Thanks,
Marius


Re: [VOTE] Add new check to measure quality of tests

vmassol
FYI I’ve implemented it locally for all modules of xwiki-commons and did some build time measurements:

* With pitest/descartes: 37:16 minutes
* Without pitest/descartes: 5:10 minutes

So that’s a pretty significant hit…

So I think one strategy could be to not run pitest/descartes by default in the quality profile (i.e. have it off by default with <xwiki.pitest.skip>true</xwiki.pitest.skip>) and to run it on the CI from time to time, for example once per day or once per week.
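
In practice that would just mean binding the plugin's skip flag to a Maven property that defaults to true, along these lines (sketch only; the exact pitest-maven parameter name should be double-checked):

  <properties>
    <!-- pitest/descartes is off by default; the CI job passes -Dxwiki.pitest.skip=false -->
    <xwiki.pitest.skip>true</xwiki.pitest.skip>
  </properties>

  <plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <configuration>
      <skip>${xwiki.pitest.skip}</skip>
    </configuration>
  </plugin>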

Small issue: I need to find/test a way to run a crontab-type job from a Jenkins pipeline script. I know how to do it in theory, but I need to test it and verify that it works. I still have some doubts ATM...
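
In theory it should be doable from the pipeline script itself, something like this (untested sketch, the cron spec is just an example):

  // Register a nightly timer trigger from within the pipeline script
  properties([
      pipelineTriggers([cron('H 2 * * *')])
  ])

  node {
      // checkout and run the Maven build with pitest/descartes enabled, e.g.
      // sh 'mvn clean install -Pquality -Dxwiki.pitest.skip=false'
  }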

WDYT?

Thanks
-Vincent



Re: [VOTE] Add new check to measure quality of tests

vmassol


> On 27 Mar 2018, at 19:32, Vincent Massol <[hidden email]> wrote:
>
> FYI I’ve implemented it locally for all modules of xwiki-commons and did some build time measurements:
>
> * With pitest/descartes: 37:16 minutes
> * Without pitest/descartes 5:10 minutes

Actually, I was able to reduce the time to 15:12 minutes by configuring pitest with 4 threads.
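
For reference, that’s simply the pitest-maven threads option, e.g. (sketch):

  <configuration>
    <!-- Run the mutation analysis on 4 threads to reduce wall-clock time -->
    <threads>4</threads>
  </configuration>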

Thanks
-Vincent



Re: [VOTE] Add new check to measure quality of tests

vmassol
If you’re curious, I’ve also added some more information at https://github.com/STAMP-project/pitest-descartes/issues/54

ATM, even with a time of “only” 10 minutes to run pitest/descartes on xwiki-commons, it’s still too much IMO, so I’ll work on turning it off by default and having a job run it on the CI.

Let me know if you have remarks.

Thanks
-Vincent



Re: [VOTE] Add new check to measure quality of tests

Thomas Mortagne
On Wed, Mar 28, 2018 at 1:25 PM, Vincent Massol <[hidden email]> wrote:
> If you’re curious I’ve also added some more information at https://github.com/STAMP-project/pitest-descartes/issues/54
>
> ATM, and even with a time of “only” 10mn to run pitest/descartes on xwiki-commons, it’s still too much IMO so I’ll work on turning it off by default but having a job to run it on the CI.

Sounds good. I agree that we can't enable it all the time given the time lost.


--
Thomas Mortagne

Re: [VOTE] Add new check to measure quality of tests

vmassol
FYI I’ve now committed this under issue https://jira.xwiki.org/browse/XCOMMONS-1385 for xwiki-commons.

And I’ve created an ad hoc job at http://ci.xwiki.org/job/xwiki-commons_pitest/ which executes PIT/Descartes (to be moved into our Jenkins pipeline later on).

Let’s now make sure we fix the build when this job breaks and verify if the strategy works or not!

Thanks
-Vincent





Re: [VOTE] Add new check to measure quality of tests

vmassol
Note: now that this is done for xwiki-commons, I’d also like to do it for xwiki-rendering when I get the time.

Thanks
-Vincent
