A View from the Chair with Michael Bolton (Volume 2)

  • 04/03/2013
  • Posted by EuroSTAR

What Do We Want To Talk About?

The call for papers closed a couple of weeks ago, and now the hard part has begun. There were lots of proposals submitted, and organizing them into a programme takes plenty of review and scrutiny. For sorting, filtering, and manipulating raw data quickly and easily, nothing beats a spreadsheet. So I took the records of all the proposals, and used Excel to do some quick-and-dirty analysis on them.

Let’s start with some raw data. We received 405 submissions in all; that’s easy to see from the number of rows in the sheet. The next thing was to find out how many proposals were in each class of presentation. I named the master spreadsheet “Master” (clever, I thought). Copying the “Submission Type” column from the master spreadsheet to the first column of a new sheet, and then using Excel’s Data / Remove Duplicates feature, gave me a list of unique session types.

Submission Type
Track Session (45 mins)
Workshop (90 mins)
Full-day tutorial
Special
Half-day Tutorial

Then I created a simple formula in the cell adjacent to “Track Session”, in Column B of my new sheet. Excel’s =countif function allows you to search a range of cells for strings of text, and to report how many cells contain a string exactly the same as the search term.

=countif(Master!B:B, A2)

That says, “count every cell in Column B on the Master sheet that is equal to the value in cell A2 on this sheet” (which in this case was “Track Session”). So, instantly, the number 332 appeared in the cell beside “Track Session”. Then I copied that formula downwards. This gave:

Submission Type Count
Track Session (45 mins) 332
Workshop (90 mins) 23
Full-day tutorial 19
Special (A type other than those mentioned. Please state type in presentation summary) 9
Half-day Tutorial 22
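
The same tally can be reproduced outside Excel. Here is a minimal Ruby sketch of the Remove Duplicates plus =countif step. The count_values helper and the sample data are made up for illustration; they stand in for the Master sheet's “Submission Type” column and are not the real submissions.

```ruby
# Count how many times each distinct value appears: the equivalent of
# Remove Duplicates followed by one =countif per unique value.
# (count_values is a hypothetical helper name.)
def count_values(values)
  counts = Hash.new(0)
  values.each { |value| counts[value] += 1 }
  counts
end

# Hypothetical sample data standing in for the "Submission Type" column.
submission_types = [
  "Track Session (45 mins)",
  "Workshop (90 mins)",
  "Track Session (45 mins)",
  "Half-day Tutorial",
  "Track Session (45 mins)",
  "Workshop (90 mins)"
]

# Print the most common type first.
count_values(submission_types).sort_by { |_type, count| -count }.each do |type, count|
  puts "#{type}: #{count}"
end
```

One difference to keep in mind: =countif ignores case when it compares text, while a Ruby Hash treats “Tutorial” and “tutorial” as different keys.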

I used a similar approach for other things I was curious about. If you want evidence that EuroSTAR is truly an international conference, you need look no further. We received proposals from 34 countries in all, including at least one from every continent but Antarctica (I suppose they’re in a code freeze at the moment). The largest number of submissions was from the Netherlands, with 79, closely followed by the UK, with 73. Then there’s a substantial dropoff: we had 28 submissions from the United States; 26 each from Sweden (the host country) and India; 21 each from Belgium and Denmark; 15 from Finland; and 12 from Germany. Rounding out the top ten was a three-way tie: 7 each from Canada, Israel, and Norway.

A little more simple Excel magic shows that 281 different people sent in proposals. Of these, 203 sent in a single proposal, 53 submitted two, and 14 submitted three. Five people prepared four proposals, five prepared five, and one enterprising aspirant sent in nine.

Having made these easy observations, a new question occurred to me: what were people proposing? I wanted to take a first-order look at that, so I set up another set of formulas to probe keywords in the abstracts. In addition to counting exact matches, =countif can match substrings within a cell, but the syntax of the function is a little weird, in that it requires you to put the substring that you seek between wildcard characters. For Excel, those wildcards are ? (match any single character) and * (match any string of characters), so if I wanted to look for the substring “foo” within the cells, my search string would have to be “*foo*”. The abstracts appeared in Column H on the master sheet, and I wanted to find out how many abstracts mentioned “testing”, so my countif function looked like this:

=countif(Master!H:H, "*testing*")

This revealed that 344 abstracts contained the word “testing”. That’s interesting; evidently, 61 didn’t. What about “test”?

=countif(Master!H:H, "*test*")

391 was the answer there; that is, 391 of the 405 abstracts contained the string “test”. Interesting: 14 abstracts didn’t mention either “test” or “testing”. I made a note to myself to find those and have a look at them.

From time to time in any experiment, I have to pause and consider how my tools might be misleading me. For example, my first search string would match “testing”, but it could also match “protesting”; the second string could match “test” (and “testing”, too), but it could also match “attest” or “contest”. So I decided to modify the search string to make sure that whitespace preceded the first letter of the word:

=countif(Master!H:H, "* test*")

Now only 390 matched. This isn’t a particularly rigorous experiment; it’s just a finger in the air, so to speak, but I reckoned it would probably be good enough to collect some potentially interesting data. Except typing in a bunch of =countif statements for every interesting word would get old pretty quickly. Why not let Excel do the dirty work? In Column A, I put a header and a list of search terms:

Keyword
test
testing
experience
quality
process
agile
skill
design
practice

I wanted the count for each keyword to appear in the next column, Column B. But I needed an intermediate calculation that I put in Column C:

="* "&A2&"*"

This says “create a string that consists of an asterisk, a space, whatever value (like “test”) is in cell A2, and then another asterisk”. Then, in cell B2, next to “test”, I put my countif formula:

=COUNTIF(Master!H:H, C2)

Here, C2 refers to the string that Excel had constructed based on the keyword. The net effect was as though cell B2 contained the countif function

=COUNTIF(Master!H:H, "* test*")

Copying this formula downwards meant that whatever value appeared in Column A would be used in a substring search, and the count of abstracts that contained that substring would appear in the adjacent cell. Here’s the data:

 

Keyword Occurrences
test 390
testing 342
experience 155
quality 149
process 130
agile 102
skill 94
design 87
practice 92
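
For readers who would rather script this than build helper columns, here is a rough Ruby sketch of the two kinds of match discussed above. The function names and the sample abstracts are invented for illustration; the real abstracts lived in Column H of the Master sheet. The second function uses a regular-expression word boundary rather than a leading space, which is a swapped-in technique and not what the spreadsheet did.

```ruby
# Like =countif(range, "*keyword*"): count the abstracts that contain the
# keyword anywhere, ignoring case (=countif also ignores case).
def count_containing(abstracts, keyword)
  abstracts.count { |text| text.downcase.include?(keyword.downcase) }
end

# Count the abstracts in which some word starts with the keyword.
# \b is a word boundary, so "protesting" and "contest" no longer count.
def count_word_start(abstracts, keyword)
  pattern = /\b#{Regexp.escape(keyword)}/i
  abstracts.count { |text| text.match?(pattern) }
end

# Hypothetical sample abstracts, not taken from the real submissions.
sample_abstracts = [
  "Testing in an agile context",
  "Lessons from protesting a contest result",
  "How we test our test tools",
  "Quality is value to some person"
]

%w[test testing quality].each do |keyword|
  anywhere = count_containing(sample_abstracts, keyword)
  word_start = count_word_start(sample_abstracts, keyword)
  puts "#{keyword}: #{anywhere} containing, #{word_start} starting a word"
end
```

The word boundary also catches a keyword that is the very first word of an abstract, or one that follows an opening bracket. The "* test*" pattern needs a space immediately before the keyword, so it would skip those cases; that may account for some small differences in counts.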

 

A lot of other ideas for keywords occurred to me, so I put them in Column A and copied down the formulas. Then I realized that I might be missing some important words, so I wrote a little Ruby program to count the frequency of every word in every abstract. (If you’re not familiar with Ruby, the lines that start with # are comments to explain what’s going on. You can use this program to count the frequency of words in your own text files.)

    # dexer.rb - Count word frequency in a text file
    # Usage: dexer.rb [filename]

    # create an array of all the lines in the file
    lines = File.readlines(ARGV[0])

    # create a master list to track the occurrence of each word
    words = Hash.new(0)

    # go through the array -- each line in the file...
    lines.each do |line|
      # ...and split each line into a list of words
      tokens = line.split(/\s+/)
      tokens.each do |token|
        # keep the leading run of letters and digits, which
        # strips out trailing punctuation marks
        word = token[/\A[0-9A-Za-z]+/]
        # skip tokens with nothing left (empty, or starting with punctuation)
        next if word.nil?
        # put each new word in the master list in lowercase or, if
        # we've seen it before, increment the number of occurrences
        words[word.downcase] += 1
      end
    end

    # sort our list of words by the number of occurrences
    sorted = words.sort_by { |word, count| count }

    # change the order so the most frequent words appear first
    # and, for each word, tell us how many times it appears
    sorted.reverse.each do |keyword, number|
      puts "#{keyword} => #{number}"
    end

I copied the abstracts to a single text file, and used that as input for my program. This printed the number of occurrences of each word to the command line, so I dumped that into a file:

    dexer.rb abstracts.txt > abstract_words.txt

Here are the top entries:

 


 

Word Occurrences
the 6178
and 3982
to 3947
of 3148
a 2663
in 2174
is 1887
testing 1761
test 1494

(Thanks to an earlier bug in my pattern matching, you can also see the frequency of letters.

 

Letter Occurrences
e 77796
t 63988
a 46223
i 46018
o 45771
s 44622
n 42817
r 34292
l 24315
h 24023
d 20822
c 19298
u 17641

 

Those results are quite consistent with patterns of distribution that have been observed before: http://en.wikipedia.org/wiki/Etaoin_shrdlu)

Back to my word list. When I stripped out the ordinary words, and kept the testing-related ones, my top ten looked like this:

 

Keyword Occurrences
testing 1761
test 1494
software 474
testers 438
quality 409
development 290
project 286
tester 262
agile 261
automation 252
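
Stripping out the ordinary words can be automated too. The sketch below assumes a hash of word counts like the one dexer.rb builds; the top_keywords helper and the short stop-word list are illustrative assumptions, and a real list of ordinary words would be much longer. This is one way to do the filtering, not necessarily how the list above was produced.

```ruby
# Drop the "ordinary" words and return the n most frequent remaining
# words as [word, count] pairs. (top_keywords is a hypothetical helper.)
def top_keywords(word_counts, stop_words, n)
  word_counts
    .reject { |word, _count| stop_words.include?(word) }
    .sort_by { |_word, count| -count }
    .first(n)
end

# Illustrative stop-word list (an assumption, far from complete).
stop_words = %w[the and to of a in is]

# A few of the word counts reported above.
word_counts = {
  "the" => 6178, "and" => 3982, "to" => 3947, "of" => 3148,
  "testing" => 1761, "test" => 1494, "software" => 474
}

top_keywords(word_counts, stop_words, 3).each do |word, count|
  puts "#{word} => #{count}"
end
```

Deciding which words are “ordinary” is still a judgment call; the code only applies whatever list you give it.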

 

I reviewed the rest of the list and tried to combine words (like tester and testers) for the table in the Excel sheet. So here it is: a list of 100 things that people who submitted proposals want to talk about at EuroSTAR:

 

Keyword Occurrences Percentage of Abstracts
test 390 96
testing 342 84
tester 233 58
time 187 46
question 167 41
manage 159 39
experien 156 39
quality 149 37
approach 140 35
product 135 33
process 130 32
system 124 31
tool 115 28
value 105 26
automat 103 25
agile 102 25
explor 98 24
skill 94 23
practice 92 23
management 90 22
manager 88 22
design 87 21
plan 87 21
developer 86 21
environment 78 19
risk 76 19
technique 74 18
user 73 18
model 71 18
profession 67 17
defect 65 16
cost 64 16
context 63 16
document 57 14
bug 55 14
customer 54 13
exploratory 53 13
data 50 12
test case 48 12
strategy 46 11
stakeholder 46 11
professional 46 11
measure 44 11
release 39 10
story 38 9
script 36 9
future 36 9
coverage 35 9
phase 35 9
framework 35 9
control 32 8
performance 32 8
review 31 8
structur 31 8
technical 31 8
technical 31 8
acceptance 30 7
assur 27 7
scrum 27 7
metric 23 6
study 23 6
estimat 20 5
unit 20 5
QA 20 5
mobile 19 5
profession 19 5
certif 18 4
programming 18 4
valid 17 4
best practice 15 4
quality assurance 15 4
heuristic 14 3
verif 14 3
measurement 12 3
procedure 12 3
certification 11 3
continuous integration 11 3
programmer 11 3
programmer 11 3
boundar 10 2
risk-based 10 2
traceab 10 2
TDD 10 2
testability 9 2
ISTQB 9 2
integration test 9 2
test script 7 2
BDD 7 2
ROI 6 1
TMM 5 1
critical thinking 4 1
devops 4 1
equivalence 3 1
SBTM 3 1
session-based 3 1
Tmap 3 1
classification tree 2 0
ISEB 2 0
oracle 2 0
equivalence class 1 0
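
The percentage column is simply the number of matching abstracts divided by the 405 submissions, rounded to a whole number. A small Ruby sketch of that arithmetic, together with the stem-style matching that lets “tester” also catch “testers”, might look like this. The helper names and sample abstracts are hypothetical, and the word-boundary match is an approximation of the spreadsheet’s "* keyword*" pattern rather than an exact copy of it.

```ruby
# Whole-number percentage, rounded to the nearest integer.
def percentage(hits, total)
  (100.0 * hits / total).round
end

# Count the abstracts in which some word starts with the stem, ignoring
# case, so the stem "tester" also catches "testers".
def abstracts_with_stem(abstracts, stem)
  pattern = /\b#{Regexp.escape(stem)}/i
  abstracts.count { |text| text.match?(pattern) }
end

# The first row of the table: 390 of 405 abstracts.
puts percentage(390, 405)

# Hypothetical sample abstracts, not taken from the real submissions.
sample_abstracts = [
  "Testers and developers pairing",
  "What a tester can learn from automation",
  "Automated checks in continuous integration",
  "Exploring risk"
]

%w[tester automat explor].each do |stem|
  hits = abstracts_with_stem(sample_abstracts, stem)
  puts "#{stem} #{hits} #{percentage(hits, sample_abstracts.size)}"
end
```

Rounding explains why a keyword found in two abstracts shows 0 percent while one found in three shows 1 percent.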

 
