ESEM 2017: EMPIRICAL SOFTWARE ENGINEERING AND MEASUREMENT
PROGRAM FOR THURSDAY, NOVEMBER 9TH

09:30-10:00 Session 2: Presentation
Location: Markham Ballroom A/B
09:30
An empirical analysis of FLOSS repositories to compare One-Time Contributors to Core and Periphery Developers

ABSTRACT. Context: Free/Libre Open Source Software (FLOSS) communities consist of different types of contributors. Core contributors and peripheral contributors work together to create a successful project, each playing a different role. One-Time Contributors (OTCs), who are on the very fringe of the peripheral developers, are largely unstudied despite offering unique insights into the overall development process. In a prior survey, we identified OTCs and discovered their motivations and barriers. Aims: The objective of this study is to corroborate the survey results and provide a better understanding of OTCs. We compare OTCs to other peripheral and core contributors to determine whether they are distinct. Method: We mined data from the same code-review repository used to generate survey respondents in our previous study. After identifying each contributor as core, periphery, or OTC, we compared them in terms of patch size, time interval from patch submission to reply, the nature of their conversations, and their patch acceptance rates. Results: We identified a continuum between core developers and OTCs. OTCs create smaller patches, face longer time intervals between patch submission and rejection, have longer review conversations, and face lower patch acceptance rates. Conversely, core contributors create larger patches, face shorter time intervals for feedback, have shorter review conversations, and have patches accepted at the highest rate. The peripheral developers fall in between the OTCs and the core contributors. Conclusion: OTCs do, in fact, face the barriers identified in our prior survey. They represent a distinct group of contributors compared to core and peripheral developers.

10:00-10:30 Coffee Break
10:30-12:00 Session 3A: Prediction/Estimation Models
Location: Markham Ballroom A/B
10:30
Code churn: A neglected metric in effort-aware Just-In-Time defect prediction

ABSTRACT. A recent study by Yang et al. leveraged individual change metrics to build unsupervised Just-in-Time (JIT) defect prediction models. They found that many unsupervised models perform similarly to or better than the state-of-the-art supervised models in effort-aware JIT defect prediction. However, churn, the most important change-size metric, was not used to build any of the evaluated unsupervised models. In this study, consistent with Yang et al., we first use churn to build an unsupervised model. Then, we evaluate the prediction performance of the churn-based unsupervised model against the state-of-the-art supervised and unsupervised models under three prediction settings: cross-validation, time-wise cross-validation, and cross-project prediction. Based on six open-source projects, our experimental results show that the churn-based unsupervised model performs better than all the state-of-the-art supervised and unsupervised models. This result suggests that future JIT defect prediction research should use the churn-based unsupervised model as the baseline for comparison when a novel model is proposed.
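As a rough sketch of the unsupervised ranking idea behind this abstract (Yang et al.-style models rank changes by the reciprocal of a single metric; the change records and field names below are invented for illustration, not the paper's implementation):

```python
# Churn-based unsupervised effort-aware ranking (illustrative sketch).
# Churn = lines added + lines deleted; ranking by descending 1/churn is
# equivalent to inspecting the smallest changes first.

def churn(change):
    """Churn of a change: total lines added plus lines deleted."""
    return change["lines_added"] + change["lines_deleted"]

def rank_by_churn(changes):
    """Rank changes for inspection by ascending churn (descending 1/churn)."""
    return sorted(changes, key=churn)

changes = [
    {"id": "c1", "lines_added": 120, "lines_deleted": 30},
    {"id": "c2", "lines_added": 5,   "lines_deleted": 2},
    {"id": "c3", "lines_added": 40,  "lines_deleted": 10},
]
ranking = [c["id"] for c in rank_by_churn(changes)]  # smallest churn first
```

Under this ordering, reviewers spend their inspection effort on many small changes before any large one, which is what makes the model effort-aware.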

11:00
Security Vulnerabilities in Categories of Clones and Non-Cloned Code: An Empirical Study

ABSTRACT. Background: Software security has gained immense importance in recent years. While effort is expected to be spent on minimizing security vulnerabilities in source code, the developers' practice of code cloning often multiplies such vulnerabilities and program faults. Although previous studies examined the bug-proneness, stability, and changeability of clones against non-cloned code, the security aspects have remained ignored. Aims: The objective of this work is to explore and understand the security vulnerabilities, and their severity, in different types of clones compared to non-cloned code. Method: Using a state-of-the-art clone detector and two reputed security vulnerability detection tools, we detect clones and vulnerabilities in 8.7 million lines of code across 34 software systems. We perform a comparative study of the vulnerabilities identified in different types of clones and in non-cloned code. The results are derived from quantitative analyses with statistical significance. Results: Our study reveals that the security vulnerabilities found in code clones carry higher-severity security risks than those in non-cloned code. However, the proportion (i.e., density) of vulnerabilities in clones and non-cloned code does not differ significantly. Conclusion: The findings from this work add to our understanding of the characteristics and impacts of clones, which will be useful for clone-aware software development with improved software security.

11:30
Early Phase Cost Models for Agile Software Processes in the US DoD

ABSTRACT. Background: Software effort estimates are necessary and critical at an early phase for decision makers to establish initial budgets and, in a government context, to select the most competitive bidder for a contract. The challenge is that estimated software requirements are the only size information available at this stage, compounded by the newly increasing adoption of agile processes in the US DoD. Aims: The objectives are to improve cost estimation by investigating available sizing measures, and to provide practical effort estimation models for agile software development projects during the contract bidding phase or earlier. Method: The analysis explores the effects of the independent variables product size, peak staff, and domain on effort. The empirical data for model calibration come from 20 industrial projects completed in the last two years for the US DoD, drawn from a larger dataset of recent projects using other lifecycle processes. Results: Statistical results showed that initial software requirements are a valid size metric for estimating both agile and traditional software development effort. Prediction accuracy improves when peak staff and domain are added as inputs to the cost models. Conclusion: These models may be used for estimating agile projects and for evaluating software development contract cost proposals with inputs available during the bidding phase or earlier, before contract award.

10:30-12:00 Session 3B: Infrastructures
Location: Markham Ballroom C
10:30
Automatic Building of Java Projects in Software Repositories: A Study on Feasibility and Challenges

ABSTRACT. Despite the advancement of software build tools such as Maven and Gradle, human involvement is still often required in software building. To engage in large-scale advanced program analysis and data mining of software artifacts, software engineering researchers need a large corpus of built software, so automatic software building becomes essential to improving research productivity. In this paper, we present the first feasibility study of automatic software building. In particular, we first put state-of-the-art build automation tools (Maven, Ant, and Gradle) to the test by automatically executing their respective default build commands on the top 200 Java projects from GitHub. Next, we focused on the 86 projects that failed this initial automated build attempt, manually examining and determining the correct build sequences for each of these projects. We present a detailed build failure taxonomy from these build results and also show that at least 57% of build failures can be automatically resolved.
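A toy illustration of the first step described above, choosing a tool's default build command from a project's build files (the marker files are the standard ones for the three tools; the exact commands the study executed are not given here, so those are assumptions):

```python
import os

# Map standard build-file markers to an assumed default build command.
DEFAULTS = [
    ("pom.xml", "mvn -B compile"),      # Maven
    ("build.gradle", "gradle build"),   # Gradle
    ("build.xml", "ant compile"),       # Ant
]

def default_build_command(project_dir):
    """Return the default build command for the first build file found
    in project_dir, or None if no known build file is present."""
    for marker, cmd in DEFAULTS:
        if os.path.exists(os.path.join(project_dir, marker)):
            return cmd
    return None
```

In an automated pipeline, the returned command would be executed with a timeout, and a None result or a non-zero exit code would send the project to the manual-examination bucket.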

11:00
Coding in Your Browser: Characterizing Programming Behavior in Cloud Based IDEs

ABSTRACT. Background: Cloud-based integrated development environments (IDEs) are rapidly gaining popularity for their native support of, and potential to accelerate, DevOps. However, there is little research on how developers behave when interacting with these environments. Aims: To develop empirical knowledge about how developers behave when interacting with cloud-based IDEs to deal with programming tasks at various difficulty levels. Method: We conducted a laboratory user study using a cloud-based IDE, JazzHub. We collected and coded session trace data, self-reported effort and frustration levels, and screen recordings. Results: We built a Markov activity transition model that describes the transitions among common development activities such as coding, debugging, and searching for information, and also captures extended interactions with remote resources. We also correlated activity transitions with different code growth trajectories. Conclusion: The findings are an early step toward realizing the potential for enhanced interactions in cloud-based IDEs. Our study provides empirical evidence that may inspire the future evolution of cloud-based IDE designs and features.
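The Markov activity transition model mentioned in the abstract can be sketched as follows; the activity labels and traces are invented and not the study's actual coding scheme:

```python
from collections import defaultdict

def transition_probs(sessions):
    """Estimate P(next activity | current activity) from coded
    activity sequences by counting adjacent pairs and normalizing."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
        for cur, nxts in counts.items()
    }

# Two toy coded sessions (labels are illustrative).
sessions = [
    ["code", "debug", "code", "search"],
    ["code", "debug", "debug", "code"],
]
probs = transition_probs(sessions)
```

Each row of `probs` is a probability distribution over next activities, so frequent loops (e.g., code ↔ debug) show up directly as high transition probabilities.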

11:30
(Journal First) Improving the delivery cycle: A multiple-case study of the toolchains in Finnish software intensive enterprises

ABSTRACT. (JOURNAL FIRST: https://doi.org/10.1016/j.infsof.2016.09.001)

Context: Software companies seek to benefit from agile development approaches in order to meet evolving market needs without losing their innovative edge. Agile practices emphasize frequent releases with the help of an automated toolchain from code to delivery. Objective: We investigate which tools are used in software delivery, what the reasons are for omitting certain parts of the toolchain, and what implications toolchains have on how rapidly software gets delivered to customers. Method: We present a multiple-case study of the toolchains currently in use in Finnish software-intensive organizations interested in improving their delivery frequency. We conducted qualitative semi-structured interviews in 18 case organizations from various software domains. The interviewees were key representatives of their organizations with respect to delivery activities. Results: Commodity tools, such as version control and continuous integration, were used in almost every organization. Modestly used tools, such as UI testing and performance testing, were more distinctly missing from some organizations. Uncommon tools, such as artifact repositories and acceptance testing, were used in only a minority of the organizations. Tool usage is affected by the state of current workflows, manual work, and the relevancy of tools. Organizations whose toolchains were more automated and contained fewer manual steps were able to deploy software more rapidly. Conclusions: There is variety in the need for tool support in different development steps, as there are domain-specific differences in the goals of the case organizations. Still, a well-founded toolchain supports speedy delivery of new software.

10:30-12:00 Session 3C: Code Smells
Location: Butternut/Holly
10:30
An Empirical Examination of the Relationship Between Code Smells and Merge Conflicts

ABSTRACT. Background: Merge conflicts are a common occurrence in software development. Researchers have shown the negative impact of conflicts on the resulting code quality and the development workflow. Thus far, no one has investigated the effect of bad design (code smells) on merge conflicts. Aims: We posit that entities that exhibit certain types of code smells are more likely to be involved in a merge conflict. We also postulate that code elements that are both “smelly” and involved in a merge conflict are associated with other undesirable effects (they are more likely to be buggy). Method: We mined 143 repositories from GitHub and recreated 6,979 merge conflicts to obtain metrics about code changes and conflicts. We categorized conflicts as semantic or non-semantic, based on whether the changes affected the Abstract Syntax Tree. For each conflicting change, we calculated the number of code smells and the number of future bug fixes associated with the affected lines of code. Results: We found that smelly entities are three times more likely to be involved in merge conflicts. Method-level code smells (Blob Operation and Internal Duplication) are highly correlated with semantic conflicts. We also found that code that is smelly and experiences merge conflicts is more likely to be buggy. Conclusion: Bad code design not only impacts maintainability; it also impacts the day-to-day operations of a project, such as merging contributions, and negatively impacts the quality of the resulting code. Our findings indicate that research is needed to identify better ways to support merge conflict resolution and minimize its effect on code quality.

11:00
On the Influence of Human Factors for Identifying Code Smells: A Multi-Trial Empirical Study

ABSTRACT. Context: Code smells are symptoms in the source code that represent poor design choices. Professional developers often perceive several types of code smells as indicators of actual design problems. However, the identification of code smells involves multiple steps that are subjective in nature, requiring the engagement of humans. Human factors are likely to play a key role in the precise identification of code smells in industrial settings. Unfortunately, there is limited knowledge about the influence of human factors on smell identification. Goal: We aim at investigating whether the precision of smell identification is influenced by three key human factors, namely the reviewer's professional background, the reviewer's module knowledge, and the collaboration of reviewers during the task. We also aim at deriving recommendations for allocating human resources to smell identification tasks. Method: We performed 19 comparisons among different subsamples from two trials of an empirical study on code smell identification. One trial was conducted in industrial settings, while the other involved graduate students. The diversity of the samples allowed us to analyze the influence of the three factors in isolation and in conjunction. Results: We found that (i) reviewers' collaboration significantly increases the precision of smell identification, but (ii) some professional background is required from the reviewers to reach high precision. Surprisingly, we also found that (iii) having previous knowledge of the reviewed module does not affect the precision of experienced reviewers. However, this factor was influential in the successful identification of more complex smells. Conclusion: We expect that our findings will be helpful to researchers in conducting proper experimental procedures in the future. Besides, they may also be useful for supporting project managers in allocating resources for smell identification tasks.

11:30
What if I had no smells?

ABSTRACT. What would have happened if I had not had any code smells? This is an interesting question that, to the best of our knowledge, no previous study has tried to answer. In this paper we present a method for implementing a what-if scenario analysis that estimates the number of defective files in the absence of smells. Our industrial case study shows that 20% of the total defective files were likely avoidable by avoiding smells. Such an estimate needs to be used with due care, though, as it is based on a hypothetical history (i.e., zero smells but the same process and product change characteristics). Specifically, the number of defective files could even increase for some types of smells. In addition, we note that in some circumstances, accepting code with smells might still be a good option for a company.

12:00-13:00 Lunch Break
13:00-14:30 Session 4A: Testing
Location: Markham Ballroom A/B
13:00
Introducing automated GUI testing and observing its benefits: an industrial case study in the context of law-practice management software

ABSTRACT. Motivated by a real-world industrial need in the context of a large IT solutions company based in Turkey, the authors and their colleagues conducted an industry-academia collaborative project in which they developed and introduced automated test suites for GUI testing of two large-scale law-practice management software systems (comprising 414 and 105 KLOC). We report in this paper our experience in developing and introducing a set of large automated test suites (more than 50 KLOC in total), using best practices from the state of the art and the state of the practice, and report the observed benefits through a cost-benefit analysis in this specific industrial context. The project was conducted based on the principles of case-study and “action research”, in which real industrial needs drove the research. Among the best practices that we used are the following: (1) the page-object test pattern, (2) modularity in test code, (3) creating test-specific libraries, and (4) using systematic guidelines to decide when and what (test cases) to automate. To assess the cost-benefit and Return On Investment (ROI) of test automation, we followed a hybrid measurement approach and assessed both the quantitative and qualitative (intangible) benefits of test automation. The empirical findings showed that the automated GUI testing approach has indeed benefited the test and QA team in the company under study, and automation has been highly welcomed by the test engineers. By serving as a success story and experience report on the development and introduction of automated test suites in an industrial setting, this paper adds to the body of evidence in this area; it aims at sharing both technical (e.g., using automated test patterns) and process aspects (e.g., test process improvement) of our project with other practitioners and researchers, with the hope of encouraging more industry-academia collaborations in test automation.
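As an illustration of best practice (1), the page-object test pattern, here is a minimal sketch with a stub driver standing in for a real GUI driver (all class, screen, and field names are invented, not taken from the project's test suites):

```python
class StubDriver:
    """Stand-in for a real GUI/WebDriver, so the sketch is runnable."""
    def __init__(self):
        self.fields = {}
        self.page = "login"
    def type(self, field, text):
        self.fields[field] = text
    def click(self, button):
        if button == "submit" and self.fields.get("user"):
            self.page = "case-list"

class LoginPage:
    """Page object: encapsulates one screen's locators and actions, so
    tests read in domain terms and survive UI-layout changes."""
    def __init__(self, driver):
        self.driver = driver
    def login_as(self, user, password):
        self.driver.type("user", user)
        self.driver.type("password", password)
        self.driver.click("submit")
        return CaseListPage(self.driver)

class CaseListPage:
    def __init__(self, driver):
        self.driver = driver
    def is_open(self):
        return self.driver.page == "case-list"
```

A test then reads `LoginPage(driver).login_as("alice", "secret").is_open()` rather than manipulating raw widget locators, which is the maintainability benefit the pattern targets.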

13:30
(Journal First) Maintenance of automated test suites in industry: An empirical study on Visual GUI Testing

ABSTRACT. (JOURNAL FIRST: https://doi.org/10.1016/j.infsof.2016.01.012)

Context: Verification and validation (V&V) activities make up 20–50% of the total development costs of a software system in practice. Test automation is proposed to lower these V&V costs, but available research only provides limited empirical data from industrial practice about the maintenance costs of automated tests and what factors affect these costs. In particular, these costs and factors are unknown for automated GUI-based testing. Objective: This paper addresses this lack of knowledge through analysis of the costs and factors associated with the maintenance of automated GUI-based tests in industrial practice. Method: An empirical study at two companies, Siemens and Saab, is reported where interviews about, and empirical work with, Visual GUI Testing is performed to acquire data about the technique's maintenance costs and feasibility. Results: 13 factors are observed that affect maintenance, e.g. tester knowledge/experience and test case complexity. Further, statistical analysis shows that developing new test scripts is costlier than maintenance, but also that frequent maintenance is less costly than infrequent, big-bang maintenance. In addition, a cost model, based on previous work, is presented that estimates the time to positive return on investment (ROI) of test automation compared to manual testing. Conclusions: It is concluded that test automation can lower overall software development costs of a project while also having positive effects on software quality. However, maintenance costs can still be considerable, and the less time a company currently spends on manual testing, the more time is required before positive economic ROI is reached after automation.
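The flavor of such a ROI model can be sketched as a simple break-even calculation (the functional form and all cost figures below are invented for illustration, not the paper's calibrated model or the Siemens/Saab data):

```python
import math

def months_to_positive_roi(dev_cost, maint_per_month, manual_per_month):
    """Smallest whole month m at which cumulative automation cost
    (dev_cost + m * maint_per_month) drops strictly below cumulative
    manual-testing cost (m * manual_per_month); None if automated
    maintenance is not cheaper than manual testing per month."""
    saving = manual_per_month - maint_per_month
    if saving <= 0:
        return None  # automation never pays off under these costs
    return math.floor(dev_cost / saving) + 1
```

Note how the model reproduces the abstract's conclusion: the smaller the monthly saving over manual testing, the longer until positive ROI, and a high up-front development cost delays it further.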

14:00
Would You Like to Motivate Software Testers? Ask Them How

ABSTRACT. Context. Considering the importance of software testing to the development of high-quality and reliable software systems, this paper aims to investigate an important human aspect related to this specific activity in software development: the motivation of software testers. Method. We applied a questionnaire, developed based on a previous theory of the motivation and satisfaction of software engineers, to conduct a survey-based study exploring how software testers perceive and value work-related factors that could influence their motivation at work. Results. With a sample of 80 software testers, we observed that these professionals are strongly motivated by variety of work, creative tasks, and activities that allow them to acquire new knowledge, but in general the social impact of this activity has low influence on their motivation. Conclusion. This study discusses the difference of opinions between experienced professionals and those at the beginning of their careers, which can be relevant for managers and leaders regarding the motivation of the team.

13:00-14:30 Session 4B: Qualitative Research I
Location: Butternut/Holly
13:00
Characterizing Software Developers by Perceptions of Productivity

ABSTRACT. Understanding developer productivity is important to deliver software on time and at reasonable cost. Yet, there are numerous definitions of productivity and, as previous research found, productivity means different things to different developers. In this paper, we analyze the variation in productivity perceptions based on an online survey with 413 professional software developers at Microsoft. Through a cluster analysis, we identify and describe six groups of developers with similar perceptions of productivity: social, lone, focused, balanced, leading, and goal-oriented developers. We discuss design implications of these clusters for tools to support developers’ productivity.

13:30
Beyond Continuous Delivery: An Empirical Investigation of Continuous Deployment Challenges

ABSTRACT. Context: A growing number of software organizations have been adopting Continuous DElivery (CDE) and Continuous Deployment (CD) practices. Researchers have started investing significant effort in studying different aspects of CDE and CD. Many studies treat CDE (i.e., where an application is potentially capable of being deployed) and CD (i.e., where an application is automatically deployed on every update) as synonyms and do not distinguish them from each other. Despite CDE being successfully adopted by a large number of organizations, it is not empirically known why organizations are still unable, or demotivated, to deploy automatically and continuously (i.e., the CD practice). Goal: This study aims at empirically investigating and classifying the factors that may impact the adoption and implementation of the CD practice. Method: We conducted a mixed-method empirical study consisting of interviews with 21 software practitioners, followed by a survey with 98 respondents. Results: Our study reveals 11 confounding factors that limit or demotivate software organizations from pushing changes automatically and continuously to production. The most important ones are “lack of automated (user) acceptance test”, “manual quality check”, “deployment as business decision”, “insufficient level of automated test coverage”, and “highly bureaucratic deployment process”. Conclusion: Our findings highlight several areas for future research and provide suggestions for practitioners to streamline the deployment process.

14:00
(Journal First) Benefits and drawbacks of software reference architectures: A case study

ABSTRACT. (JOURNAL FIRST: https://doi.org/10.1016/j.infsof.2017.03.011)

Context: Software Reference Architectures (SRAs) play a fundamental role for organizations whose business greatly depends on the efficient development and maintenance of complex software applications. However, little is known about the real value and risks associated with SRAs in industrial practice. Objective: To investigate the current industrial practice of SRAs in a single company from the perspective of different stakeholders. Method: An exploratory case study that investigates the benefits and drawbacks perceived by relevant stakeholders in nine SRAs designed by a multinational software consulting company. Results: The study shows the perceptions of different stakeholders regarding the benefits and drawbacks of SRAs (e.g., both SRA designers and users agree that they benefit from reduced development costs; by contrast, only application builders strongly highlighted the extra learning curve as a drawback associated with mastering SRAs). Furthermore, some of the SRA benefits and drawbacks commonly highlighted in the literature (e.g., the use of best practices) were remarkably not mentioned by the stakeholders. Likewise, other aspects arose that are not usually discussed in the literature, such as higher time-to-market for applications when their dependencies on the SRA are managed inappropriately. Conclusions: This study aims to help practitioners and researchers better understand real SRA projects, the contexts where these benefits and drawbacks appeared, and some SRA improvement strategies. This would contribute to strengthening the evidence regarding SRAs and support practitioners in making better informed decisions about the expected SRA benefits and drawbacks. Furthermore, we make available the instruments used in this study and the anonymized data gathered, to motivate others to provide similar evidence that helps mature SRA research and practice.

13:00-14:30 Session 4C: Change/Issue Management I
Location: Markham Ballroom C
13:00
Where is the Road for Issue Reports Classification Based on Text Mining?

ABSTRACT. Currently, open source projects receive various kinds of issues daily because of the extreme openness of the Issue Tracking System (ITS) in GitHub. Issue categorization is a labor-intensive and time-consuming task for project managers. Moreover, a contributor is only required to provide a short textual abstract to report an issue in GitHub, so most traditional classification approaches based on detailed and structured data (e.g., priority, severity, software version, and so on) are difficult to adopt. In this paper, issue classification approaches were investigated on a large-scale dataset including 101 popular projects and over 252,000 issue reports collected from GitHub. First, four traditional text-based classification methods and their performance were discussed. A quantitative and qualitative study showed that semantic perplexity (i.e., the degree to which an issue's description confuses bug-related sentences with non-bug-related sentences) is a crucial factor affecting classification performance. Finally, a two-stage classifier framework based on the novel metric of semantic perplexity of issue reports was designed. Results show that our two-stage classification can significantly improve issue classification performance.
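A toy sketch of the two-stage idea, routing issues by an ambiguity score before classifying (the word lists, the scoring proxy, and the threshold are invented simplifications, not the paper's perplexity metric):

```python
# Stage 1 classifies unambiguous issues with a cheap keyword model;
# ambiguous (high-perplexity) issues are deferred to a second stage.

BUG_WORDS = {"crash", "error", "exception", "fail"}
FEATURE_WORDS = {"add", "support", "improve", "feature"}

def perplexity_proxy(text):
    """Crude ambiguity proxy: how evenly the text mixes bug-related and
    non-bug-related vocabulary. 0 = unambiguous, 1 = maximally mixed."""
    words = set(text.lower().split())
    b = len(words & BUG_WORDS)
    f = len(words & FEATURE_WORDS)
    if b + f == 0:
        return 1.0  # no signal at all counts as ambiguous
    return 1.0 - abs(b - f) / (b + f)

def classify(text, threshold=0.5):
    """Two-stage routing: keyword classifier for clear issues,
    deferral for ambiguous ones."""
    words = set(text.lower().split())
    if perplexity_proxy(text) < threshold:
        return "bug" if len(words & BUG_WORDS) >= len(words & FEATURE_WORDS) else "non-bug"
    return "needs-second-stage"
```

The second stage would then apply a heavier model (or a human) only to the deferred reports, which is where the framework claims its performance gain.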

13:30
Predicting the Vector Impact of Change - An Industrial Case Study at Brightsquid

ABSTRACT. Background: Understanding and controlling the impact of change determines the success or failure of evolving products. The problem is magnified for start-ups operating with limited resources. Their usual focus is on a Minimum Viable Product (MVP) providing specialized functionality; thus, they have little budget available for handling changes.

Aims: Change Impact Analysis (CIA) refers to the identification of the source code files that will be impacted when implementing a change request. We extend this question to predict not only the impact on files, but also the effort needed to implement the change and the duration needed to do so.

Method: This study evaluates the performance of three textual techniques for CIA, based on a bag-of-words representation used alone or in combination with either topic modeling or file coupling.

Results: The approaches are applied to data from two industrial projects. The data comes from an industrial collaboration with Brightsquid, a Canadian start-up company specializing in secure communication solutions. Performance analysis shows that combining textual similarity with file coupling improves impact prediction significantly, achieving a highest recall of 67%. Using only textual similarity, effort and duration could be predicted for 84% and 67% of change requests, respectively.

Conclusions: The relative effort invested in CIA for predicting impacted files can be reduced by extending its applicability to multiple dimensions, which include impacted files, effort, and duration.
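The combination of textual similarity and file coupling can be sketched as a weighted score per candidate file (the blending weight and the coupling measure are assumptions for illustration, not Brightsquid's actual model):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def impact_score(request_text, file_text, coupling, alpha=0.5):
    """Blend text similarity with a historical co-change coupling
    value in [0, 1]; higher scores mean likelier impact."""
    sim = cosine(Counter(request_text.split()),
                 Counter(file_text.split()))
    return alpha * sim + (1 - alpha) * coupling
```

Files would then be ranked by `impact_score`, so a file with weak textual overlap can still surface if it historically co-changed with the files the request mentions.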

14:00
Managing Hidden Dependencies in OO Software: a Study Based on Open Source Projects

ABSTRACT. Dependency-based software change impact analysis is the domain concerned with estimating the sets of artifacts impacted by a change to a related artifact. Research has shown that analysing the various class dependency types independently will not reveal a complete estimate of impact sets. Therefore, dependency types are combined to improve the precision of the estimated impact sets. Software classes can be linked in different ways: for instance semantically, if their meanings are somewhat related, or structurally, if one class depends on the services of other classes.

'Hidden' dependencies arise when two classes, linked structurally, do not share the same semantic namespace or when semantically dependent classes do not share a structural link. With the goal of revealing hidden dependencies during change impact analysis, we empirically investigated the interplay between structural and semantic class dependencies in object-oriented software systems.

Results show that (i) semantic and structural links are significantly associated, (ii) the strength of those links does not play a significant role, and (iii) a significant number of dependencies are hidden. We propose refactoring techniques to deal with hidden dependencies, based on existing design patterns. Our approach has the potential to reduce refactoring and testing effort.
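The definition of hidden dependencies above can be expressed directly as a set operation over the two dependency views (the class pairs below are invented examples):

```python
def hidden_dependencies(structural, semantic):
    """Pairs linked in exactly one of the two views: structurally linked
    but semantically unrelated, or semantically related but without a
    structural link. This is the symmetric difference of the pair sets."""
    return structural ^ semantic

structural = {("Order", "Invoice"), ("Order", "Customer")}
semantic = {("Order", "Invoice"), ("Invoice", "Payment")}
hidden = hidden_dependencies(structural, semantic)
```

Pairs in both views (here, Order/Invoice) are ordinary, visible dependencies; only the one-view pairs are the candidates for the refactorings the abstract proposes.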

14:30-15:00 Coffee Break
15:00-16:30 Session 5A: Tools/Frameworks
Location: Markham Ballroom A/B
15:00
STRESS: A Semi-Automated, Fully Replicable Approach for Project Selection

ABSTRACT. The mining of software repositories has provided significant advances in a multitude of software engineering fields, including defect prediction. Several studies show that the performance of a software engineering technology (e.g., a prediction model) differs across project repositories. Thus, it is important that project selection be replicable. The aim of this paper is to present STRESS, a semi-automated and fully replicable approach that allows researchers to select projects by configuring the desired level of diversity, fit, and quality. STRESS records the rationale behind the researchers' decisions and allows different users to re-run or modify such decisions. STRESS is open-source and can be used locally or online (www.falessi.com/STRESS/). We perform a systematic mapping study that considers studies that analyzed projects managed with JIRA and Git, to assess the project selection replicability of past studies. We validate the feasibility of applying STRESS in realistic research scenarios by using it to select projects among the 211 Apache Software Foundation (ASF) projects. Our systematic mapping study results show that none of the 68 analyzed studies is completely replicable. STRESS successfully supported the project selection among all 211 ASF projects. It also supported the measurement of 100 project characteristics, including the 32 criteria of the studies analyzed in our mapping study. The mapping study and STRESS are, to the best of our knowledge, the first attempt to investigate and support the replicability of project selection. We plan to extend them to other technologies such as GitHub.

15:15
Change-Aware Build Prediction Model for Stall Avoidance in Continuous Integration

ABSTRACT. Continuous Integration (CI) is a widely used development practice in which developers integrate their work by submitting code changes to a central repository. CI servers usually monitor the central repository for code change submissions and automatically build the software with the changed code, perform unit and integration testing, and provide a test summary report. If the build or a test fails, developers fix those issues and resubmit their code changes. The continuous submission of code modifications by developers, combined with build latency, creates stalls in the CI server's build pipeline, and hence developers have to wait a long time to get the build outcome. In this paper, we propose a build prediction model that uses the TravisTorrent data set, together with build error log clustering and AST-level code change data, to predict whether a build will be successful without attempting the actual build, so that developers can get an early build outcome. With the proposed model, we can predict build outcomes with over 90 percent accuracy for Ant and Maven and 87 percent accuracy for Gradle.

15:30
Delta-Bench: Differential Benchmark for Static Analysis Security Testing Tools

ABSTRACT. Background: SAST tools may be evaluated using synthetic micro benchmarks and benchmarks based on real-world software. Aims: The aim of this study is to address the limitations of existing SAST tool benchmarks: lack of vulnerability realism, uncertain ground truth, and the large number of findings unrelated to the analyzed vulnerability. Method: We propose Delta-Bench, a novel approach for the automatic construction of benchmarks for SAST tools based on differencing vulnerable and fixed versions in Free and Open Source Software (FOSS) repositories. To test our approach, we ran 7 state-of-the-art SAST tools against 70 revisions of four major versions of Apache Tomcat, spanning 62 distinct CVE fixes and vulnerable files totalling over 100K LoC, as the source of ground truth vulnerabilities. Results: Our experiment allows us to draw interesting conclusions (e.g., tools perform differently depending on the selected benchmark). Conclusions: Delta-Bench allows SAST tools to be automatically evaluated on real-world historical vulnerabilities, using only the findings that a tool produced for the analyzed vulnerability.
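Delta-Bench's exact differencing procedure is not given in the abstract; the sketch below only illustrates the general idea of deriving ground-truth vulnerable lines by diffing a vulnerable file against its fixed version. The code snippets and the resulting line numbers are hypothetical.

```python
import difflib

# Hypothetical before/after versions of a file around a security fix.
vulnerable = [
    'query = "SELECT * FROM users WHERE id = " + user_id',
    "rows = db.execute(query)",
]
fixed = [
    'query = "SELECT * FROM users WHERE id = ?"',
    "rows = db.execute(query, (user_id,))",
]

def ground_truth_lines(old, new):
    """Return 1-based line numbers in `old` that the fix changed or removed."""
    matcher = difflib.SequenceMatcher(a=old, b=new)
    lines = []
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            lines.extend(range(i1 + 1, i2 + 1))
    return lines

print(ground_truth_lines(vulnerable, fixed))  # both lines were changed by the fix
```

A SAST finding on the vulnerable revision can then be scored against exactly these lines, which is how a diff-based benchmark can ignore findings unrelated to the analyzed vulnerability.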

15:45
An Ontology-based Approach to Automate Tagging of Software Artifacts

ABSTRACT. Context: Software engineering repositories contain a wealth of textual information such as source code comments, developers' discussions, commits, and bug reports. Common to these resources is that they contain information about security concerns; however, these concerns are often only discussed implicitly. Goal: Derive an approach to extract security concerns from textual information, which can yield several benefits such as bug management (e.g., prioritization), bug triage, or capturing zero-day attacks. Method: Propose a fully automated classification and tagging approach that can extract security tags from these texts without the need for manual training data. Results: We introduce an ontology-based Software Security Tagger Framework that can automatically identify and classify cybersecurity-related entities and concepts in the text of software artifacts. Conclusion: Our preliminary results indicate that the framework can successfully extract and classify cybersecurity knowledge captured in the unstructured text found in software artifacts.

16:00
REACT: An Approach for Capturing Rationale in Chat Messages

ABSTRACT. Background: Developers’ chat messages are a rich source of rationale behind development decisions. Rationale comprises valuable knowledge during software evolution for understanding and maintaining the software system. However, developers resist explicit methods for rationale capturing in practice, due to their intrusiveness and cognitive overhead. Aim: Our primary goal is to help developers capture rationale in chat messages with low effort. Further, we seek to encourage the collaborative capturing of rationale in development teams. Method: In this paper, we present REACT, a lightweight approach for annotating chat messages that contain rationale. To evaluate the feasibility of REACT, we conducted two studies. In the first study, we evaluated the approach with eleven development teams during a short-term design task. In the second study, we evaluated the approach with one development team over a duration of two months. In addition, we distributed a questionnaire to the participants of both studies. Results: Our preliminary results show that REACT is easily learned and used by developers. Also, it encourages the collaborative capturing of rationale. Remarkably, the majority of participants do not perceive privacy as a barrier when capturing rationale from their informal communication. Conclusions: REACT is a first step towards enhancing rationale capturing in developers’ chat messages.

15:00-16:30 Session 5B: Research Methods
Location: Markham Ballroom C
15:00
Describing What Experimental Software Engineering Experts Do When They Design their Experiments – A Qualitative Study

ABSTRACT. Although there has been a significant amount of research focused on designing and conducting controlled experiments, few studies report how experienced software engineering researchers actually design and conduct their studies. This study aimed to offer a practical perspective on controlled experiment planning from their viewpoint. We collected data through semi-structured interviews with 11 researchers, and we used qualitative analysis methods from the grounded theory approach to analyze them. Although the complete study presents four research questions, in this paper we answer the first one. As a result, we present preliminary findings about what these experts actually do when they design experiments. This work contributes to a better understanding of the practice of experimental software engineering.

15:15
Using a Visual Abstract as a Lens for Communicating and Promoting Design Science Research in Software Engineering

ABSTRACT. Much empirical software engineering research aims at producing prescriptive knowledge that helps software engineers improve their work or solve their problems. But deriving general knowledge from real-world problem-solving instances can be challenging. In this paper, we promote design science as a paradigm to support producing and communicating prescriptive knowledge. We propose a visual abstract template for communicating design science contributions, highlighting the main problem and solution constructs of the research, and presenting validity aspects of design knowledge. Our conceptualization of design science is derived from the existing literature and was refined iteratively, together with the visual abstract, as we applied both to different examples of design science research. We present and discuss one example application of the visual abstract. This is work in progress, and further evaluation by practitioners and researchers is encouraged.

15:30
Member Checking in Software Engineering Research: Lessons Learned from an Industrial Case Study

ABSTRACT. Context. Member checking can be defined as a phase of qualitative research in which the researcher compares her interpretations and understanding obtained from the data analysis with the actual perceptions of the participants, to increase the accuracy and consistency of the results. This is an important step for any qualitative research. However, in a sample of 66 case studies developed and published in the context of software engineering, only 10 briefly described the use of this technique. Method. In this article, we present a set of lessons learned from planning and performing member checking to validate the results of an industrial case study conducted in a large software company. Results. Member checking was effective for validating the findings obtained from the qualitative case study and was also useful for revealing important information not observed during the data analysis process. It also proved effective for observing divergences among different groups of participants. Conclusion. We describe how member checking was undertaken and discuss seven lessons learned in this process. We expect that our experience will be useful to software engineering researchers performing member checking.

15:45
Investigating the Use of a Hybrid Search Strategy for Systematic Reviews

ABSTRACT. [Background] Systematic Literature Reviews (SLRs) are one of the important pillars of the evidence-based paradigm in Software Engineering. To date, most SLRs have been conducted using a search strategy involving several digital libraries. However, significant issues have been reported with digital libraries, and applying such a search strategy requires substantial effort. Snowballing has recently arisen as a potentially more efficient alternative, but it requires a relevant seed set of papers. [Aims] This paper proposes and evaluates a hybrid search strategy combining a search in a single digital library (Scopus) with backward and forward snowballing. [Method] The proposed hybrid strategy was applied to two previously published SLRs that adopted database searches. We investigate whether it retrieves the same included papers with lower effort in terms of the number of analysed papers. The two selected SLRs relate, respectively, to elicitation techniques (not confined to Software Engineering (SE)) and to a specific SE topic, cost estimation. [Results] Our results provide preliminary support for the proposed hybrid search strategy as suitable for SLRs investigating a specific research topic within the SE domain. Furthermore, it helps overcome existing issues with using digital libraries in SE. [Conclusions] The hybrid search strategy provides competitive results, similar to using several digital libraries. However, further investigation is needed to evaluate it.
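The snowballing part of the hybrid strategy can be pictured as a traversal of a citation graph. The sketch below uses a toy graph (not real SLR data): backward snowballing follows a paper's references, forward snowballing follows papers that cite it, and both repeat until no new papers appear.

```python
from collections import deque

# Toy citation graph: paper -> papers it cites. Names are purely illustrative.
cites = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": [],
    "C": [],
    "D": ["seed"],  # D cites "seed", so forward snowballing finds it
}

def snowball(seeds, cites):
    """Expand a seed set via backward (references) and forward (citers)
    snowballing until a fixed point is reached."""
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        for nxt in cites.get(paper, []) + cited_by.get(paper, []):
            if nxt not in found:
                found.add(nxt)
                queue.append(nxt)
    return found

print(sorted(snowball({"seed"}, cites)))
```

In the hybrid strategy, the database search supplies the seed set, and the effort saved comes from analysing only the papers this traversal reaches rather than the full result lists of several digital libraries.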

16:00
Notifying and Involving Users in Experimentation: Ethical Perceptions of Software Practitioners

ABSTRACT. Background: Experiment-driven development with the help of real usage data helps to build software products and services that are of high value to their users. As more software companies use experimentation in their development practices, ethical concerns become increasingly important. Objective: There is a need to understand the ethical issues companies must take into account when practising experimentation as a development strategy. This paper examines how the ethical issues involved in experimentation are currently understood by practitioners in software development. Method: We conducted a survey within four software companies, inviting employees in different functional roles to indicate their attitudes and perceptions on a number of ethical statements. Results: Employees in different roles hold different viewpoints on ethical issues. While managers are more conscious of company-customer relationships, UX designers appear more familiar with involving users. Developers think that details of experiments can be withheld from users if the results depend on it. Conclusion: Barriers to successfully conducting experiment-driven development differ across roles. Clear and specific guidelines are needed for the ethical aspects of experimentation.

16:15
Reporting Ethics Considerations in Software Engineering Publications

ABSTRACT. The ethical guidelines of software engineering journals require authors to provide statements related to conflicts of interest and to the process of obtaining consent (if human subjects are involved). The objective of this study is to review the reporting of ethical considerations in Empirical Software Engineering - An International Journal. The results indicate that only two out of seven studies reported some ethical information, and even then not explicitly. The ethical discussions focused on anonymity and confidentiality. Ethical aspects such as competence, comprehensibility, and vulnerability of the subjects were not discussed in any of the papers reviewed in this study. It is important not only to state that consent was obtained but also to report the procedure of obtaining it, to improve accountability and trust.

15:00-16:30 Session 5C: Human Factors
Location: Butternut/Holly
15:00
Analysis of the understanding of the concepts of Task and Skill Variety by software engineering professionals

ABSTRACT. Context: In the organizational psychology literature, Task Variety and Skill Variety are considered different aspects of work design. Albeit related to different aspects of the work, strong correlations between these constructs are commonly found. After applying the Work Design Questionnaire (WDQ) to a sample of 102 software professionals, we found similar correlations and conjectured that they were partly due to a misunderstanding about what a task is, what a skill is, and what could be considered a variety of those concepts in the practice of software development. Goal: Our goal in this study was to investigate the actual existence and the possible sources of such a misunderstanding. Method: We performed semi-structured interviews with software professionals who had previously participated in the application of the WDQ and analyzed the results using qualitative research techniques. We used a mix of random and purposive sampling to select four software professionals from among those with greater experience in software development. Results: We collected a rich amount of quantitative and qualitative data about software engineering professionals and their views on the design of their work. The quantitative data indicated a correlation between the constructs and the similarity of our sample to other samples in the literature. The qualitative data showed why such a correlation could have arisen in our sample. Conclusions: Our findings point out that the misunderstanding of these concepts may affect the results of quantitative questionnaires that measure them.

15:15
Understanding the Heterogeneity of Contributors in Bug Bounty Programs

ABSTRACT. Background: While bug bounty programs are not new in software development, an increasing number of companies, as well as open source projects, rely on external parties to perform the security assessment of their software for reward. However, there is relatively little empirical knowledge about the characteristics of bug bounty program contributors. Aim: This paper aims to understand those contributors by highlighting the heterogeneity among them. Method: We analyzed the histories of 82 bug bounty programs and 2,504 distinct bug bounty contributors, and conducted a quantitative and qualitative survey. Results: We found that there are project-specific and non-specific contributors who have different motivations for contributing to the products and organizations. Conclusions: Our findings provide insights for improving bug bounty programs and for further studies of emerging software development roles.

15:30
Autonomy in Software Engineering: A Preliminary Study on the Influence of Education Level and Professional Experience

ABSTRACT. Context: The software development process is executed by professionals with different roles, who are responsible for distinct activities. These roles can have different degrees of autonomy depending on factors such as the adopted process and the hierarchy. Goal: This study aims to identify which factors can impact autonomy and to investigate how autonomy is granted to an employee based on two main factors: education level and professional experience. Methodology: Initially, a survey was carried out to understand how autonomy is perceived by software engineers, as well as by professionals from other areas. The next step was to conduct semi-structured interviews with software engineers to gain a better understanding of the quantitative findings. Results: In general, education level and professional experience do not impact autonomy. Only when autonomy is evaluated from the education level perspective is there a significant difference among the respondents. Autonomy depends more on the experience a software engineer has in the current project than on her overall professional experience. The development process adopted by the company also influences how autonomy is perceived. Conclusion: While professional qualification and experience are not directly related to autonomy, the lack of a defined process and the amount of work experience on specific projects are relevant factors to be aware of.

15:45
Team Maturity in Software Engineering Teams

ABSTRACT. Background: In Software Engineering (SE), the term maturity is often linked to work process and product quality. In many cases, team maturity is seen as a backdrop to the SE process, and sometimes as something that is known to exist but that cannot be fully understood, accurately measured, or even properly valued. Aim: In this article, we seek to understand the concept of mature teams in the context of SE, from the perspective of software engineers themselves. Methods: We performed exploratory qualitative research, collecting data from 26 practitioners from 6 companies in 4 cities in Brazil. Data was analyzed using coding techniques from qualitative research. Results: Our findings point out three major dimensions of the concept of Team Maturity (Learning, Relationship, and Technical Maturity) which, when kept in balance, can enhance collective productivity, product quality, and customer satisfaction. Conclusions: These results extend our current understanding of the maturity of SE teams, shedding light on aspects that have been little explored so far in this field. The proposed model can serve as a guide for teams to enhance their approach to teamwork and for future research in this area.

16:00
Towards an Approach to Prevent Social Loafing in Software Development Teams

ABSTRACT. A high-functioning team is a decisive factor for a successful software development project. However, building such a team is not easy. Among the many issues and obstacles encountered by teams, social loafing is a common but difficult one to tackle. This study presents a social loafing prevention approach that we have applied in an educational context. The approach starts by increasing team members' awareness of social loafing. A Team Expectations Agreement (TEA) is then used to help the team write down terms that explicitly prevent social loafing. During the project, a small survey instrument is used to regularly track whether the specified terms are being followed by the team members. At the end of a period, the absence or presence of social loafing is assessed by the team using another short survey. How to interpret the results of the surveys is explained as part of the presented approach. The approach has the potential to improve students' teamwork skills, which are not adequately addressed in higher education programs. It can also be adapted to professional software development environments to prevent social loafing and improve teamwork. The next step of our study will be to use the collected data to evaluate the proposed approach and to formulate a set of recommendations for using it in a professional software development context.