Crowdsourcing using Mechanical
Turk for Human Computer
Interaction Research	


Ed H. Chi	

	

Research Scientist	

Google	

(work done while at [Xerox] PARC)	





                                       1
Historical Footnote
                               	


De Prony, 1794, hired hairdressers 	

•  (unemployed after French revolution; knew only
    addition and subtraction) 	

•  to create logarithmic and trigonometric tables. 	

	

•  He managed the process by splitting the
    work into very detailed workflows.	

    –  Grier, When computers were human, 2005

[Figure: human "computers" performing math computation; Clairaut and
 fellow astronomers computed the Halley comet orbit (the three-body
 problem) by dividing the numeric computations across astronomers
 (Grier, When Computers Were Human)]




                                                                  2
Talk in 3 Acts
                                    	


•  Act 1:	

     –  How we almost failed in using MTurk?! 	

     –  [Kittur, Chi, Suh, CHI2008]	


•  Act II:	

     –  Apply MTurk to visualization evaluation 	

     –  [Kittur, Suh, Chi, CSCW2008]	


•  Act III:	

     –  Where are the limits?	



  Aniket Kittur, Ed H. Chi, Bongwon Suh. 	

  Crowdsourcing User Studies With Mechanical Turk. In CHI2008.	

  	

  Aniket Kittur, Bongwon Suh, Ed H. Chi. Can You Ever Trust a Wiki?
  Impacting Perceived Trustworthiness in Wikipedia. In CSCW2008.	



                                                                      3
Example Task from Amazon MTurk	





                                    4
Using Mechanical Turk for user studies	


                         Traditional user studies    Mechanical Turk

Task complexity          Complex, Long               Simple, Short

Task subjectivity        Subjective, Opinions        Objective, Verifiable

User information         Targeted demographics,      Unknown demographics,
                         High interactivity          Limited interactivity


         Can Mechanical Turk be usefully used for user studies?	



                                                                              5
Task	


•  Assess quality of Wikipedia articles	

•  Started with ratings from expert Wikipedians	

    –  14 articles (e.g., "Germany", "Noam Chomsky")	

    –  7-point scale	

•  Can we get matching ratings with mechanical turk?	





                                                          6
Experiment 1	


•  Rate articles on 7-point scales:	

    –  Well written	

    –  Factually accurate	

    –  Overall quality	

•  Free-text input:	

    –  What improvements does the article need?	

•  Paid $0.05 each	





                                                     7
Experiment 1: Good news	


•  58 users made 210 ratings (15 per article)	

   –  $10.50 total	

•  Fast results	

   –  44% within a day, 100% within two days	

   –  Many completed within minutes	





                                                   8
Experiment 1: Bad news	


•  Correlation between turkers and Wikipedians
   only marginally significant (r=.50, p=.07)	

•  Worse, 59% potentially invalid responses	

                         Experiment 1
   Invalid comments          49%
   <1 min responses          31%

•  Nearly 75% of these done by only 8 users	





                                                  9
Not a good start	

•  Summary of Experiment 1:	

   –  Only marginal correlation with experts.	

   –  Heavy gaming of the system by a minority	

•  Possible Response:	

   –  Can make sure these gamers are not rewarded	

   –  Ban them from doing your hits in the future	

   –  Create a reputation system [Delores Lab]	

•  Can we change how we collect user input?	





                                                       10
Design changes	


•  Use verifiable questions to signal monitoring	

   –  "How many sections does the article have?"	

   –  "How many images does the article have?"	

   –  "How many references does the article have?"	
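A minimal sketch of checking such verifiable answers against ground truth derived from the article itself. The field names, counts, and tolerance here are illustrative assumptions, not part of the original study:

```python
# Sketch: accept a response only if its verifiable counts match the
# article's actual counts. All field names are hypothetical.

def validate_response(response, truth, tolerance=0):
    """True if every verifiable count matches the article (within tolerance)."""
    checks = ["num_sections", "num_images", "num_references"]
    return all(
        abs(int(response[k]) - truth[k]) <= tolerance
        for k in checks
    )

# Invented example data: counts extracted from the article vs. two responses.
truth = {"num_sections": 12, "num_images": 4, "num_references": 57}
good = {"num_sections": "12", "num_images": "4", "num_references": "57"}
bad = {"num_sections": "3", "num_images": "0", "num_references": "1"}
```

A nonzero `tolerance` would forgive honest miscounts while still catching random answers.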





                                                       11
Design changes	


•  Use verifiable questions to signal monitoring	

•  Make malicious answers as high cost as good-faith
   answers	

   –  "Provide 4-6 keywords that would give someone a
      good summary of the contents of the article"	





                                                       12
Design changes	


•  Use verifiable questions to signal monitoring	

•  Make malicious answers as high cost as good-faith
   answers	

•  Make verifiable answers useful for completing
   task	

   –  Used tasks similar to how Wikipedians evaluate quality
      (organization, presentation, references)	





                                                               13
Design changes	


•  Use verifiable questions to signal monitoring	

•  Make malicious answers as high cost as good-faith
   answers	

•  Make verifiable answers useful for completing
   task	

•  Put verifiable tasks before subjective responses	

   –  First do objective tasks and summarization	

   –  Only then evaluate subjective quality	

   –  Ecological validity?	





                                                        14
Experiment 2: Results	


    •  124 users provided 277 ratings (~20 per article)	

    •  Significant positive correlation with Wikipedians 	

        –  r=.66, p=.01	

    •  Smaller proportion malicious responses	

    •  Increased time on task	
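The expert-turker agreement figures (r=.50 in Experiment 1, r=.66 here) are Pearson correlations over ratings; a self-contained sketch with invented per-article mean ratings (a significance test for the reported p-values would be a separate step):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-article mean ratings (experts vs. turkers), for illustration.
experts = [6.5, 3.0, 5.0, 2.5, 4.0]
turkers = [6.0, 3.5, 5.5, 2.0, 4.5]
```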


                     Experiment 1    Experiment 2

   Invalid comments      49%             3%

   <1 min responses      31%             7%

   Median time          1:30            4:06

                                                               15
Quick Summary of Tips	


•  Mechanical Turk offers the practitioner a way to access a
   large user pool and quickly collect data at low cost	

•  Good results require careful task design	


  1.    Use verifiable questions to signal monitoring	

  2.    Make malicious answers as high cost as good-faith answers	

  3.    Make verifiable answers useful for completing task	

  4.    Put verifiable tasks before subjective responses	





                                                                       16
Generalizing to other MTurk studies	


•  Combine objective and subjective questions	

    –  Rapid prototyping: ask verifiable questions about content/
      design of prototype before subjective evaluation	

    –  User surveys: ask common-knowledge questions before
      asking for opinions	

•  Filtering for Quality	

    –  Put in a field for Free-Form Responses and Filter out
       data without answers	

    –  Results that came in too quickly	

    –  Sort by WorkerID and look for cut and paste answers	

    –  Look for outliers in the data that are suspicious	
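The filtering heuristics above can be sketched as a single pass over response records. The field names (`worker_id`, `comment`, `seconds`) and the one-minute threshold are assumptions for illustration, not the MTurk API:

```python
from collections import Counter

MIN_SECONDS = 60  # responses faster than this are suspect

def filter_responses(responses):
    """Drop responses with empty free-form fields, sub-minute completion
    times, or answers cut-and-pasted across a worker's HITs."""
    # Count identical free-text answers per worker to catch cut-and-paste.
    pasted = Counter((r["worker_id"], r["comment"]) for r in responses)
    kept = []
    for r in responses:
        if not r["comment"].strip():
            continue                       # no free-form answer
        if r["seconds"] < MIN_SECONDS:
            continue                       # came in too quickly
        if pasted[(r["worker_id"], r["comment"])] > 1:
            continue                       # duplicate answer from same worker
        kept.append(r)
    return kept
```

Outlier detection on the ratings themselves would be a further pass, e.g. flagging workers whose ratings sit far from the per-article consensus.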





                                                                   17
Talk in 3 Acts
                                 	


•  Act 1:	

     –  How we almost failed?!	



•  Act II:	

     –  Applying MTurk to visualization evaluation	



•  Act III:	

     –  Where are the limits?	





                                                        18
What would make you trust Wikipedia more?




                                        20
What is Wikipedia?




    "Wikipedia is the best thing ever. Anyone in the world can write
anything they want about any subject, so you know you're getting the
                      best possible information."
                      – Steve Carell, The Office


                                                                   21
What would make you trust Wikipedia more?




              Nothing



                                        22
What would make you trust Wikipedia more?




       Wikipedia, just by its nature, is
      impossible to trust completely. I don't
      think this can necessarily be
      changed.




                                                23
WikiDashboard
•  Transparency of social dynamics can reduce conflict and coordination
   issues
•  Attribution encourages contribution
     –  WikiDashboard: Social dashboard for wikis
     –  Prototype system: http://wikidashboard.parc.com



•  Visualization for every wiki page
   showing edit history timeline and
   top individual editors

•  Can drill down into activity history
   for specific editors and view edits
   to see changes side-by-side

Citation: Suh et al.
CHI 2008 Proceedings


                    2011 UCBerkeley Visual Computing Retreat                   24
Hillary Clinton




                                25
Top Editor - Wasted Time R




                                                  26
Surfacing information

•  Numerous studies mining Wikipedia revision
   history to surface trust-relevant information
   –  Adler & Alfaro, 2007; Dondio et al., 2006; Kittur et al., 2007;
      Viegas et al., 2004; Zeng et al., 2006




                                          Suh, Chi, Kittur, & Pendleton, CHI2008


•  But how much impact can this have on user
   perceptions in a system which is inherently
   mutable?
                                                                              27
Hypotheses

1.  Visualization will impact perceptions of trust
2.  Compared to baseline, visualization will
    impact trust both positively and negatively
3.  Visualization should have most impact when
    there is high uncertainty about the article
   •    Low quality
   •    High controversy




                                                     28
Design

        •  3 x 2 x 2 design

Visualization conditions:
•    High stability
•    Low stability
•    Baseline (none)

                       Controversial                  Uncontroversial

High quality           Abortion, George Bush          Volcano, Shark

Low quality            Pro-life feminism,             Disk defragmenter,
                       Scientology and celebrities    Beeswax




                                                                           29
Example: High trust visualization




                                    30
Example: Low trust visualization




                                   31
Summary info

          •  % from anonymous
             users




                                32
Summary info

          •  % from anonymous
             users
          •  Last change by
             anonymous or
             established user




                                33
Summary info

          •  % from anonymous
             users
          •  Last change by
             anonymous or
             established user
          •  Stability of words




                                  34
Graph

•  Instability




                         35
Method

•  Users recruited via Amazon's Mechanical Turk
   –    253 participants
   –    673 ratings
   –    7 cents per rating
   –    Kittur, Chi, & Suh, CHI 2008: Crowdsourcing user studies
•  To ensure salience and valid answers, participants
   answered:
   –    In what time period was this article the least stable?
   –    How stable has this article been for the last month?
   –    Who was the last editor?
   –    How trustworthy do you consider the above editor?



                                                                   36
Results

[Figure: mean trustworthiness ratings (1-7) by stability condition
 (high stability / baseline / low stability) for low- and high-quality
 articles, uncontroversial vs. controversial]

Main effects of quality and controversy:
• high-quality articles > low-quality articles (F(1, 425) = 25.37, p < .001)
• uncontroversial articles > controversial articles (F(1, 425) = 4.69, p = .031)

                                                                            37
Results

[Figure: same trustworthiness chart as on the previous slide]

Interaction effects of quality and controversy:
• high-quality articles were rated equally trustworthy whether controversial
  or not, while
• low-quality articles were rated lower when they were controversial than
  when they were uncontroversial.
                                                                           38
Results
1.  Significant effect of visualization: High-Stability > Low-Stability, p < .001
2.  Viz has both positive and negative effects:
    –  High-Stability > Baseline (p < .001) > Low-Stability, p < .01
3.  No interaction of visualization with either quality or controversy
    –  Robust across visualization conditions
[Figure: trustworthiness ratings (1-7) by stability condition, quality,
 and controversy, as on the preceding slides]
                                                                           39
Talk in 3 Acts
                                 	


•  Act 1:	

     –  How we almost failed?!	



•  Act II:	

     –  Applying MTurk to visualization evaluation	



•  Act III:	

     –  Where are the limits?	





                                                        42
Limitations of Mechanical Turk	


•  No control of users' environment	

   –  Potential for different browsers, physical distractions	

   –  General problem with online experimentation 	

•  Not yet designed for user studies	

   –  Difficult to do between-subjects design	

   –  May need some programming	

•  Hard to control user population	

   –  hard to control demographics, expertise	





                                                                   43
Crowdsourcing for HCI Research
                                   	


•  Does my interface/visualization work?	

   –  WikiDashboard: transparency vis for Wikipedia [Suh et al.]	

   –  Replicating Perceptual Experiments [Heer et al., CHI2010]	

•  Coding of large amount of user data	

   –  What is a Question in Twitter? [Sharoda Paul, Lichan Hong, Ed Chi]	

•  Incentive mechanisms	

   –  Intrinsic vs. Extrinsic rewards: Games vs. Pay	

   –  [Horton & Chilton, 2010 for Mturk] and [Ariely, 2009] in general	





                                                                              44
Crowdsourcing for HCI Research
                                    	


•  Does my interface/visualization work?	

   –  WikiDashboard: transparency vis for Wikipedia [Suh et al. VAST,
      Kittur et al. CSCW2008]	

   –  Replicating Perceptual Experiments [Heer et al., CHI2010]	

•  Coding of large amount of user data	

   –  What is a Question in Twitter? [S. Paul, L. Hong, E. Chi, ICWSM 2011]	

•  Incentive mechanisms	

   –  Intrinsic vs. Extrinsic rewards: Games vs. Pay	

   –  [Horton & Chilton, 2010 on MTurk] and Satisficing	

   –  [Ariely, 2009] in general: Higher pay != Better work	





                                                                                 45
Managing Quality
                                 	


•  Quality through redundancy: Combining votes 	

      –  Majority vote [works best when worker quality is similar]	

      –  Worker-quality-adjusted vote	

      –  Managing dependencies	

•  Quality through gold data	

      –  Advantageous with imbalanced datasets & bad workers	

•  Estimating worker quality (Redundancy + Gold)	

      –  Calculate the confusion matrix and see if you actually
         get some information from the worker	

	

•  Toolkit: http://code.google.com/p/get-another-label/	
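A minimal sketch of the first two ideas: plain majority vote, and a worker-quality-weighted vote where quality is each worker's accuracy on gold questions. All data and names here are invented for illustration (the linked toolkit does this properly):

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Plain majority vote over redundant labels for one item."""
    return Counter(labels).most_common(1)[0][0]

def worker_accuracy(answers, gold):
    """Per-worker accuracy on items with known gold labels.
    `answers` is a list of (worker, item, label) triples."""
    hits, total = defaultdict(int), defaultdict(int)
    for worker, item, label in answers:
        if item in gold:
            total[worker] += 1
            hits[worker] += (label == gold[item])
    return {w: hits[w] / total[w] for w in total}

def weighted_vote(answers, item, quality):
    """Vote for one item, weighting each worker by estimated quality
    (unknown workers get a neutral 0.5)."""
    scores = defaultdict(float)
    for worker, it, label in answers:
        if it == item:
            scores[label] += quality.get(worker, 0.5)
    return max(scores, key=scores.get)
```

A fuller treatment estimates each worker's confusion matrix rather than a single accuracy number, which is what separates "always wrong but informative" workers from random ones.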




                                  Source: Ipeirotis, WWW2011        46
Coding and Machine Learning
                Simple Solution	

    •  Integration with Machine Learning	

       –  Humans label training data: build automatic classification
          models using crowdsourced data	

       –  Use training data to build model

                          Data from existing
                        crowdsourced answers
                                 |
                                 v
New Case        →        Automatic Model          →      Automatic
                    (through machine learning)            Answer

                                    Source: Ipeirotis, WWW2011
                                                                     47
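The loop on this slide (crowd labels in, automatic answers out) can be sketched with a toy word-overlap classifier. The data and the classifier are invented for illustration; a real pipeline would use a proper learner trained on the crowdsourced labels:

```python
from collections import Counter

def bag(text):
    """Lowercased word counts for a snippet of text."""
    return Counter(text.lower().split())

def train(examples):
    """Build one word-count centroid per label from crowd-labeled text."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(bag(text))
    return centroids

def predict(centroids, text):
    """Answer a new case by word overlap with each label's centroid."""
    words = bag(text)
    def overlap(centroid):
        return sum(min(words[w], centroid[w]) for w in words)
    return max(centroids, key=lambda label: overlap(centroids[label]))
```

Once the model is good enough, new cases no longer need a crowd worker at all, which is the payoff of the diagram above.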
Crowd Programming for Complex Tasks
                                  	


  •  Decompose tasks into smaller tasks	

     –  Digital Taylorism	

     –  Frederick Winslow Taylor (1856-1915) 	

     –  1911 'Principles Of Scientific Management’	

  •  Crowd Programming Explorations	

     –  MapReduce Models	

         •  Kittur, A.; Smus, B.; and Kraut, R. CHI2011EA on CrowdForge.	

         •  Kulkarni, Can, Hartmann, CHI2011 workshop & WIP	

     –  Little, G.; Chilton, L.; Goldman, M.; and Miller, R. C. In
        KDD 2010 Workshop on Human Computation.	
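The partition/map/reduce pattern these systems explore can be sketched as plain functions with the crowd steps stubbed out. All names and outputs here are illustrative, not the CrowdForge or Turkomatic APIs:

```python
def crowd_partition(topic):
    """Partition HIT: a worker drafts section headings (stubbed)."""
    return ["History", "Geography"]

def crowd_map(topic, heading):
    """Map HIT: a worker collects one fact for a heading (stubbed)."""
    return f"A fact about the {heading.lower()} of {topic}."

def crowd_reduce(topic, facts):
    """Reduce HIT: a worker consolidates facts into one article (stubbed)."""
    return " ".join(facts)

def write_article(topic):
    # Each call below would be posted as one or more HITs in a real system.
    headings = crowd_partition(topic)
    facts = [crowd_map(topic, h) for h in headings]
    return crowd_reduce(topic, facts)
```

In the real systems each step can recurse (a partition's sections can be partitioned again), and redundant HITs plus voting replace the single stubbed call.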





                                                                              48
CHI 2011 • Work-in-Progress                  May 7–12, 2011 • Vancouver, BC, Canada


                 Crowd Programming for Complex Tasks

                              •  Crowd Programming Explorations	

                                 –  Kittur, A.; Smus, B.; and Kraut, R. CHI2011EA on
                                    CrowdForge.	

                                 –  Kulkarni, Can, Hartmann, CHI2011 workshop & WIP	

[Figure: page from the Turkomatic work-in-progress paper. The authors posed
 the task "Please solve the 16-question SAT located at http://bit.ly/SATexam"
 to Turkomatic, paying workers between $0.10 and $0.40 per HIT. Each
 "subdivide" or "merge" HIT received answers within 4 hours; solutions to the
 initial task were complete within 72 hours. Figure 4 shows sixteen questions
 from a high school Scholastic Aptitude Test uploaded to the web.]

                                                                              49
Future Directions in Crowdsourcing
                                                	


                 •  Real-time Crowdsourcing	

                        –  Bigham, et al. VizWiz, UIST 2010	


[Figure 2 from VizWiz: six questions asked by participants, the photographs
 they took, and the answers received, with latency in seconds]

                                                                      50
Future Directions in Crowdsourcing
                                 	


•  Real-time Crowdsourcing	

   –  Bigham, et al. VizWiz, UIST 2010	

•  Embedding of Crowdwork inside Tools	

   –  Bernstein, et al. Soylent, UIST 2010	





                                                51
Future Directions in Crowdsourcing


•  Real-time Crowdsourcing	

   –    Bigham, et al. VizWiz, UIST 2010	

•  Embedding of Crowdwork inside Tools	

   –    Bernstein, et al. Soylent, UIST 2010	

•  Shepherding Crowdwork	

   –    Dow et al. CHI2011 WIP	

[Figure 2 from Dow et al.: current systems (in orange) focus on asynchronous,
 single-bit feedback by requesters; the design space for crowd feedback spans
 timeliness (synchronous vs. asynchronous), among other dimensions]

                                                                          52
Tutorials

•  Matt Lease: http://ir.ischool.utexas.edu/crowd/

•  AAAI 2011 (w/ HCOMP 2011): Human Computation: Core Research Questions
   and State of the Art (E. Law & Luis von Ahn)

•  WSDM 2011: Crowdsourcing 101: Putting the WSDM of Crowds to Work for
   You (Omar Alonso and Matthew Lease)
   –  http://ir.ischool.utexas.edu/wsdm2011_tutorial.pdf

•  LREC 2010: Statistical Models of the Annotation Process (Bob Carpenter
   and Massimo Poesio)
   –  http://lingpipe-blog.com/2010/05/17/

•  ECIR 2010: Crowdsourcing for Relevance Evaluation (Omar Alonso)
   –  http://wwwcsif.cs.ucdavis.edu/~alonsoom/crowdsourcing.html

•  CVPR 2010: Mechanical Turk for Computer Vision (Alex Sorokin and Fei-Fei Li)
   –  http://sites.google.com/site/turkforvision/

•  CIKM 2008: Crowdsourcing for Relevance Evaluation (D. Rose)
   –  http://videolectures.net/cikm08_rose_cfre/

•  WWW2011: Managing Crowdsourced Human Computation (Panos Ipeirotis)
   –  http://www.slideshare.net/ipeirotis/managing-crowdsourced-human-computation

                                                                              53
Social Q&A on Twitter

S. Paul, L. Hong, E. Chi, ICWSM 2011

3/27/12                                                                       54
Why social Q&A?

People turn to their friends on social networks because they
trust their friends to provide tailored answers to subjective
questions on niche topics.

                                                                              55
Research Questions

What kinds of questions are Twitter users asking their friends?
   –  Types and topics of questions

Are users receiving responses to the questions they are asking?
   –  Number, speed, and relevancy of responses

How does the nature of the social network affect Q&A behavior?
   –  Size and usage of network, reciprocity of relationship

                                                                              58
Identifying question tweets was challenging

   –  Advertisement framed as question
   –  Rhetorical question
   –  Missing context

We used heuristics to identify candidate tweets that were
possibly questions.

                                                                              59
Classifying candidate tweets using Mechanical Turk

We crowdsourced question-tweet identification to Amazon Mechanical Turk.

•  Each tweet was classified by two Turkers
•  Each Turker classified 25 tweets: 20 candidates and 5 control tweets
•  We only accepted data from Turkers who classified all control tweets
   correctly

                                                                              60
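The control-tweet check above can be sketched in a few lines. This is an illustrative reconstruction, not the study's actual code: `CONTROL_GOLD`, the tweet IDs, and the label strings are made-up names for the example.

```python
# Hypothetical sketch of the quality check: each Turker labels 20 candidate
# tweets plus 5 gold-standard "control" tweets, and we keep a Turker's
# labels only if every control tweet is classified correctly.

CONTROL_GOLD = {"c1": "question", "c2": "not_question", "c3": "question",
                "c4": "not_question", "c5": "question"}

def accept_turker(labels: dict) -> bool:
    """True iff all control tweets were labeled correctly."""
    return all(labels.get(cid) == gold for cid, gold in CONTROL_GOLD.items())

def filter_submissions(submissions: dict) -> dict:
    """Keep candidate-tweet labels only from Turkers who passed all controls."""
    kept = {}
    for turker, labels in submissions.items():
        if accept_turker(labels):
            # Drop the controls themselves; keep only candidate labels.
            kept[turker] = {tid: lab for tid, lab in labels.items()
                            if tid not in CONTROL_GOLD}
    return kept
```

Because each candidate is labeled by two accepted Turkers, the surviving labels can then be checked for agreement before a tweet is counted as a question.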
Overall method for filtering questions

•  Random sample of 1.2 million public tweets
•  Applied heuristics to identify 12,000 candidate tweets
   (4,100 presented to Turkers)
•  Classified candidates using Mechanical Turk: 1,152 question tweets
•  Tracked responses to each candidate tweet: 624 with responses

                                                                              61
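A heuristic pre-filter of the kind used in the pipeline above might look like the following. The slide does not list the actual heuristics, so the rules here (question mark, interrogative opener, skipping retweets and link-bearing tweets) are assumptions for illustration only.

```python
import re

# Illustrative (assumed) heuristics for flagging tweets that might be
# questions; the study's real rules are not shown on the slide.
QUESTION_WORDS = ("who", "what", "when", "where", "why", "how",
                  "does", "do", "is", "are", "can", "should", "any")

def is_candidate_question(tweet: str) -> bool:
    text = tweet.strip().lower()
    if text.startswith("rt "):   # skip retweets
        return False
    if "http" in text:           # links often signal ads, not questions
        return False
    if "?" in text:              # explicit question mark
        return True
    first_word = re.split(r"\W+", text, maxsplit=1)[0]
    return first_word in QUESTION_WORDS  # interrogative opener

tweets = ["Any good iPad app recommendations?",
          "RT @foo: best deal ever http://example.com",
          "Just had lunch."]
candidates = [t for t in tweets if is_candidate_question(t)]
```

Rules like these are deliberately high-recall: they pass many false positives (ads, rhetorical questions) on to the Mechanical Turk classification step, which is exactly why that step was needed.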
Findings: Types and topics of questions

Rhetorical (42%), factual (16%), and poll (15%) questions were common.
A significant percentage of questions were about personal & health topics (11%).

[Pie chart of question topics: entertainment 32%, others 16%, personal &
health 11%, technology 10%, greetings 7%, ethics & philosophy 7%,
uncategorized 5%, professional 4%, restaurant/food 4%, current events 4%]

Example questions:
   –  "How do you feel about interracial dating?"
   –  "Which team is better raiders or steelers?"
   –  "Any good iPad app recommendations?"
   –  "In UK, when you need to see a specialist, do you need special forms
      or permission?"
   –  "Any idea how to lost weight fast?"

                                                                              62
Findings: Responses to questions

[Chart: log(number of questions) vs. number of answers, ranging from 0 to 147]

•  The number of responses had a long-tail distribution
•  Low (18.7%) response rate in general, but quick responses
•  Most often, reciprocity between asker and answerer was one-way (55%)
•  Responses were largely (84%) relevant

                                                                              63
Findings: Social network characteristics

Which characteristics of an asker predict whether she will receive a response?

Network size and status in the network are good predictors of whether an
asker will receive a response.

Logistic regression modeling (structural properties):
   –  Number of followers (+)
   –  Number of days on Twitter (+)
   –  Ratio of followers/followees (+)
   –  Reciprocity rate (-)
   –  Not predictive: number of tweets posted, frequency of use of Twitter

                                                                              64
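To make the modeling step concrete, here is a minimal sketch of a logistic regression predicting response receipt from structural features. The feature names follow the slide, but the data is synthetic and the plain gradient-descent fit is an assumption; the study's actual estimation procedure is not shown.

```python
import math
import random

# Feature names from the slide; data below is synthetic, for illustration.
FEATURES = ["num_followers", "days_on_twitter",
            "follower_followee_ratio", "reciprocity_rate"]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Stochastic gradient descent on log-loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Synthetic askers: response probability driven mainly by follower count.
random.seed(0)
X = [[random.gauss(0, 1) for _ in FEATURES] for _ in range(200)]
y = [1 if x[0] + 0.5 * x[1] + random.gauss(0, 0.5) > 0 else 0 for x in X]
w, b = fit_logistic(X, y)
```

On data generated this way, the fitted weight on `num_followers` comes out positive, mirroring the slide's finding that number of followers is a positive predictor of receiving a response.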
Thanks!

•  chi@acm.org
•  http://edchi.net
•  @edchi

•  Aniket Kittur, Ed H. Chi, Bongwon Suh. Crowdsourcing User Studies
   With Mechanical Turk. In Proceedings of the ACM Conference on Human
   Factors in Computing Systems (CHI2008), pp. 453-456. ACM Press, 2008.
   Florence, Italy.
•  Aniket Kittur, Bongwon Suh, Ed H. Chi. Can You Ever Trust a Wiki?
   Impacting Perceived Trustworthiness in Wikipedia. In Proc. of Computer-
   Supported Cooperative Work (CSCW2008), pp. 477-480. ACM Press, 2008.
   San Diego, CA. [Best Note Award]

                                                                              66

Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 

Crowdsourcing using MTurk for HCI research

  • 1. Crowdsourcing using Mechanical Turk for Human Computer Interaction Research Ed H. Chi Research Scientist Google (work done while at [Xerox] PARC) 1
  • 2. Historical Footnote De Prony, 1794, hired hairdressers •  (unemployed after French revolution; knew only addition and subtraction) •  to create logarithmic and trigonometric tables. •  He managed the process by splitting the work into very detailed workflows. –  Grier, When Computers Were Human, 2005 (Slide figure: examples of human computation, from Grier.) 2
  • 3. Talk in 3 Acts •  Act 1: –  How we almost failed in using MTurk?! –  [Kittur, Chi, Suh, CHI2008] •  Act II: –  Apply MTurk to visualization evaluation –  [Kittur, Suh, Chi, CSCW2008] •  Act III: –  Where are the limits? Aniket Kittur, Ed H. Chi, Bongwon Suh. Crowdsourcing User Studies With Mechanical Turk. In CHI2008. Aniket Kittur, Bongwon Suh, Ed H. Chi. Can You Ever Trust a Wiki? Impacting Perceived Trustworthiness in Wikipedia. In CSCW2008. 3
  • 4. Example Task from Amazon MTurk 4
  • 5. Using Mechanical Turk for user studies. Traditional user studies vs. Mechanical Turk: task complexity (complex, long vs. simple, short); task subjectivity (subjective opinions vs. objective, verifiable); user information (targeted demographics, high interactivity vs. unknown demographics, limited interactivity). Can Mechanical Turk be usefully used for user studies? 5
  • 6. Task •  Assess quality of Wikipedia articles •  Started with ratings from expert Wikipedians –  14 articles (e.g., "Germany", "Noam Chomsky") –  7-point scale •  Can we get matching ratings with Mechanical Turk? 6
  • 7. Experiment 1 •  Rate articles on 7-point scales: –  Well written –  Factually accurate –  Overall quality •  Free-text input: –  What improvements does the article need? •  Paid $0.05 each 7
  • 8. Experiment 1: Good news •  58 users made 210 ratings (15 per article) –  $10.50 total •  Fast results –  44% within a day, 100% within two days –  Many completed within minutes 8
  • 9. Experiment 1: Bad news •  Correlation between Turkers and Wikipedians only marginally significant (r=.50, p=.07) •  Worse, 59% potentially invalid responses. Experiment 1: invalid comments 49%; <1 min responses 31% •  Nearly 75% of these done by only 8 users 9
  • 10. Not a good start •  Summary of Experiment 1: –  Only marginal correlation with experts. –  Heavy gaming of the system by a minority •  Possible Response: –  Can make sure these gamers are not rewarded –  Ban them from doing your HITs in the future –  Create a reputation system [Delores Lab] •  Can we change how we collect user input? 10
  • 11. Design changes •  Use verifiable questions to signal monitoring –  How many sections does the article have? –  How many images does the article have? –  How many references does the article have? 11
  • 12. Design changes •  Use verifiable questions to signal monitoring •  Make malicious answers as high cost as good-faith answers –  Provide 4-6 keywords that would give someone a good summary of the contents of the article 12
  • 13. Design changes •  Use verifiable questions to signal monitoring •  Make malicious answers as high cost as good-faith answers •  Make verifiable answers useful for completing task –  Used tasks similar to how Wikipedians evaluate quality (organization, presentation, references) 13
  • 14. Design changes •  Use verifiable questions to signal monitoring •  Make malicious answers as high cost as good-faith answers •  Make verifiable answers useful for completing task •  Put verifiable tasks before subjective responses –  First do objective tasks and summarization –  Only then evaluate subjective quality –  Ecological validity? 14
  • 15. Experiment 2: Results •  124 users provided 277 ratings (~20 per article) •  Significant positive correlation with Wikipedians –  r=.66, p=.01 •  Smaller proportion malicious responses •  Increased time on task. Experiment 1 vs. Experiment 2: invalid comments 49% vs. 3%; <1 min responses 31% vs. 7%; median time 1:30 vs. 4:06 15
  • 16. Quick Summary of Tips •  Mechanical Turk offers the practitioner a way to access a large user pool and quickly collect data at low cost •  Good results require careful task design 1.  Use verifiable questions to signal monitoring 2.  Make malicious answers as high cost as good-faith answers 3.  Make verifiable answers useful for completing task 4.  Put verifiable tasks before subjective responses 16
  • 17. Generalizing to other MTurk studies •  Combine objective and subjective questions –  Rapid prototyping: ask verifiable questions about content/design of prototype before subjective evaluation –  User surveys: ask common-knowledge questions before asking for opinions •  Filtering for Quality –  Put in a field for free-form responses and filter out data without answers –  Filter results that came in too quickly –  Sort by WorkerID and look for cut-and-paste answers –  Look for outliers in the data that are suspicious 17
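The quality filters listed on this slide can be sketched in plain Python. The field names (`worker_id`, `seconds_on_task`, `free_text`) are hypothetical stand-ins for whatever columns your MTurk results file actually contains:

```python
from collections import Counter

def filter_responses(responses, min_seconds=60):
    """Drop suspicious MTurk responses using simple heuristics.

    `responses` is a list of dicts with (hypothetical) keys:
    worker_id, seconds_on_task, free_text.
    """
    # 1. Drop empty free-form answers and results that came in too quickly.
    kept = [r for r in responses
            if r["free_text"].strip() and r["seconds_on_task"] >= min_seconds]

    # 2. Flag workers who paste the same free-text answer into many HITs.
    text_counts = Counter((r["worker_id"], r["free_text"]) for r in kept)
    kept = [r for r in kept
            if text_counts[(r["worker_id"], r["free_text"])] == 1]
    return kept

responses = [
    {"worker_id": "W1", "seconds_on_task": 240, "free_text": "Needs more references."},
    {"worker_id": "W2", "seconds_on_task": 12,  "free_text": "good"},   # too fast
    {"worker_id": "W3", "seconds_on_task": 90,  "free_text": "nice"},   # copy-paste
    {"worker_id": "W3", "seconds_on_task": 95,  "free_text": "nice"},   # copy-paste
]
print(len(filter_responses(responses)))  # 1 (only W1 survives)
```

Outlier screening (e.g., ratings far from the item mean) would be a third pass over `kept` in the same style.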
  • 18. Talk in 3 Acts •  Act 1: –  How we almost failed?! •  Act II: –  Applying MTurk to visualization evaluation •  Act III: –  Where are the limits? 18
  • 19. What would make you trust Wikipedia more? 20
  • 20. What is Wikipedia? "Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you're getting the best possible information." – Steve Carell, The Office 21
  • 21. What would make you trust Wikipedia more? Nothing 22
  • 22. What would make you trust Wikipedia more? Wikipedia, just by its nature, is impossible to trust completely. I don't think this can necessarily be changed. 23
  • 23. WikiDashboard •  Transparency of social dynamics can reduce conflict and coordination issues •  Attribution encourages contribution –  WikiDashboard: Social dashboard for wikis –  Prototype system: http://wikidashboard.parc.com •  Visualization for every wiki page showing edit history timeline and top individual editors •  Can drill down into activity history for specific editors and view edits to see changes side-by-side. Citation: Suh et al. CHI 2008 Proceedings. 2011 UC Berkeley Visual Computing Retreat 24
  • 24. Hillary Clinton (WikiDashboard screenshot). 2011 UC Berkeley Visual Computing Retreat 25
  • 25. Top Editor - Wasted Time R (WikiDashboard screenshot). 2011 UC Berkeley Visual Computing Retreat 26
  • 26. Surfacing information •  Numerous studies mining Wikipedia revision history to surface trust-relevant information –  Adler & Alfaro, 2007; Dondio et al., 2006; Kittur et al., 2007; Viegas et al., 2004; Zeng et al., 2006 Suh, Chi, Kittur, & Pendleton, CHI2008 •  But how much impact can this have on user perceptions in a system which is inherently mutable? 27
  • 27. Hypotheses 1.  Visualization will impact perceptions of trust 2.  Compared to baseline, visualization will impact trust both positively and negatively 3.  Visualization should have most impact when high uncertainty about article •  Low quality •  High controversy 28
  • 28. Design •  3 x 2 x 2 design: Visualization (high stability, low stability, baseline/none) x Quality (high, low) x Controversy (controversial, uncontroversial). High quality: Abortion, George Bush (controversial); Volcano, Shark (uncontroversial). Low quality: Pro-life feminism, Scientology and celebrities (controversial); Disk defragmenter, Beeswax (uncontroversial) 29
  • 29. Example: High trust visualization 30
  • 30. Example: Low trust visualization 31
  • 31. Summary info •  % from anonymous users 32
  • 32. Summary info •  % from anonymous users •  Last change by anonymous or established user 33
  • 33. Summary info •  % from anonymous users •  Last change by anonymous or established user •  Stability of words 34
  • 35. Method •  Users recruited via Amazon's Mechanical Turk –  253 participants –  673 ratings –  7 cents per rating –  Kittur, Chi, & Suh, CHI 2008: Crowdsourcing user studies •  To ensure salience and valid answers, participants answered: –  In what time period was this article the least stable? –  How stable has this article been for the last month? –  Who was the last editor? –  How trustworthy do you consider the above editor? 36
  • 36. Results (Figure: trustworthiness ratings by stability condition, quality, and controversy.) Main effects of quality and controversy: •  high-quality articles > low-quality articles (F(1, 425) = 25.37, p < .001) •  uncontroversial articles > controversial articles (F(1, 425) = 4.69, p = .031) 37
  • 37. Results (Figure: trustworthiness ratings by stability condition, quality, and controversy.) Interaction effect of quality and controversy: •  high-quality articles were rated equally trustworthy whether controversial or not, while •  low-quality articles were rated lower when they were controversial than when they were uncontroversial. 38
  • 38. Results 1.  Significant effect of visualization: High-Stability > Low-Stability, p < .001 2.  Viz has both positive and negative effects: –  High-Stability > Baseline (p < .001) > Low-Stability, p < .01 3.  No interaction of visualization with either quality or controversy –  Robust across visualization conditions (Figure: trustworthiness ratings by stability condition, quality, and controversy.) 39
  • 41. Talk in 3 Acts •  Act 1: –  How we almost failed?! •  Act II: –  Applying MTurk to visualization evaluation •  Act III: –  Where are the limits? 42
  • 42. Limitations of Mechanical Turk •  No control of users environment –  Potential for different browsers, physical distractions –  General problem with online experimentation •  Not yet designed for user studies –  Difficult to do between-subjects design –  May need some programming •  Hard to control user population –  hard to control demographics, expertise 43
  • 43. Crowdsourcing for HCI Research •  Does my interface/visualization work? –  WikiDashboard: transparency vis for Wikipedia [Suh et al.] –  Replicating Perceptual Experiments [Heer et al., CHI2010] •  Coding of large amount of user data –  What is a Question in Twitter? [Sharoda Paul, Lichan Hong, Ed Chi] •  Incentive mechanisms –  Intrinsic vs. Extrinsic rewards: Games vs. Pay –  [Horton & Chilton, 2010 for Mturk] and [Ariely, 2009] in general 44
  • 44. Crowdsourcing for HCI Research •  Does my interface/visualization work? –  WikiDashboard: transparency vis for Wikipedia [Suh et al. VAST, Kittur et al. CSCW2008] –  Replicating Perceptual Experiments [Heer et al., CHI2010] •  Coding of large amount of user data –  What is a Question in Twitter? [S. Paul, L. Hong, E. Chi, ICWSM 2011] •  Incentive mechanisms –  Intrinsic vs. Extrinsic rewards: Games vs. Pay –  [Horton & Chilton, 2010 on MTurk] and Satisficing –  [Ariely, 2009] in general: Higher pay != Better work 45
  • 45. Managing Quality •  Quality through redundancy: Combining votes –  Majority vote [works best when similar worker quality] –  Worker-quality-adjusted vote –  Managing dependencies •  Quality through gold data –  Advantageous when imbalanced dataset & bad workers •  Estimating worker quality (Redundancy + Gold) –  Calculate the confusion matrix and see if you actually get some information from the worker •  Toolkit: http://code.google.com/p/get-another-label/ Source: Ipeirotis, WWW2011 46
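A minimal sketch of the first two ideas on this slide, combining redundancy with gold data (toy data; the get-another-label toolkit implements far more careful estimators, e.g. full confusion matrices rather than a single accuracy number):

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Plain majority vote over redundant labels for one item."""
    return Counter(labels).most_common(1)[0][0]

def worker_quality(answers, gold):
    """Estimate each worker's accuracy from the gold-labeled items."""
    correct, seen = defaultdict(int), defaultdict(int)
    for worker, item, label in answers:
        if item in gold:
            seen[worker] += 1
            correct[worker] += (label == gold[item])
    return {w: correct[w] / seen[w] for w in seen}

def weighted_vote(answers, item, quality):
    """Worker-quality-adjusted vote: weight each label by estimated accuracy."""
    scores = defaultdict(float)
    for worker, it, label in answers:
        if it == item:
            scores[label] += quality.get(worker, 0.5)  # unknown workers count 0.5
    return max(scores, key=scores.get)

# Toy data: (worker, item, label); q1 and q2 have gold answers, q3 does not.
answers = [
    ("A", "q1", "yes"), ("B", "q1", "no"), ("C", "q1", "no"),
    ("A", "q2", "no"),  ("B", "q2", "no"), ("C", "q2", "yes"),
    ("A", "q3", "yes"), ("B", "q3", "no"), ("C", "q3", "no"),
]
gold = {"q1": "yes", "q2": "no"}
q = worker_quality(answers, gold)          # A: 1.0, B: 0.5, C: 0.0
print(majority_vote(["yes", "no", "no"]))  # "no": raw majority on q3
print(weighted_vote(answers, "q3", q))     # "yes": A's track record outweighs B and C
```

The example shows why quality adjustment matters: the raw majority on q3 is overturned once the gold questions reveal that worker C contributes no information.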
  • 46. Coding and Machine Learning •  Integration with Machine Learning: build automatic classification models using crowdsourced data •  Use training data (from existing crowdsourced answers) to build a model; a new case then gets an automatic answer through machine learning. Source: Ipeirotis, WWW2011 47
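The pipeline on this slide can be sketched end to end: aggregate redundant crowd labels into a training set, fit a model, and answer new cases automatically. The tiny unigram Naive Bayes below is an illustrative stand-in, not the toolkit's actual estimator:

```python
import math
from collections import Counter, defaultdict

def aggregate(crowd_answers):
    """Turn redundant crowd labels into one training label per item (majority)."""
    votes = defaultdict(list)
    for item, label in crowd_answers:
        votes[item].append(label)
    return {item: Counter(ls).most_common(1)[0][0] for item, ls in votes.items()}

class NaiveBayes:
    """Tiny unigram Naive Bayes with add-one smoothing."""
    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        self.totals = Counter()
        for text, label in zip(texts, labels):
            self.totals[label] += 1
            self.counts[label].update(text.lower().split())
        return self

    def predict(self, text):
        words = text.lower().split()
        vocab = len({w for c in self.counts.values() for w in c})
        def score(label):
            n = sum(self.counts[label].values())
            return math.log(self.totals[label]) + sum(
                math.log((self.counts[label][w] + 1) / (n + vocab)) for w in words)
        return max(self.totals, key=score)

# Toy crowd data: (item, label) pairs from redundant workers.
crowd = [("t1", "question"), ("t1", "question"), ("t1", "statement"),
         ("t2", "statement"), ("t2", "statement")]
texts = {"t1": "any good ipad app recommendations", "t2": "i love this new ipad"}
labels = aggregate(crowd)                     # t1 -> question, t2 -> statement
model = NaiveBayes().fit([texts[i] for i in labels], [labels[i] for i in labels])
print(model.predict("any good pizza recommendations"))  # "question"
```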
  • 47. Crowd Programming for Complex Tasks •  Decompose tasks into smaller tasks –  Digital Taylorism –  Frederick Winslow Taylor (1856-1915) –  1911 "Principles of Scientific Management" •  Crowd Programming Explorations –  MapReduce Models •  Kittur, A.; Smus, B.; and Kraut, R. CHI2011 EA on CrowdForge. •  Kulkarni, Can, Hartmann, CHI2011 workshop & WIP –  Little, G.; Chilton, L.; Goldman, M.; and Miller, R. C. In KDD 2010 Workshop on Human Computation. 48
  • 48. Crowd Programming for Complex Tasks (CHI 2011 Work-in-Progress, May 7–12, 2011, Vancouver, BC, Canada) •  Crowd Programming Explorations –  Kittur, A.; Smus, B.; and Kraut, R. CHI2011 EA on CrowdForge –  Kulkarni, Can, Hartmann, CHI2011 workshop & WIP •  Case studies: a partition/map/reduce flow for writing an encyclopedia article, and "Please solve the 16-question SAT located at http://bit.ly/SATexam" (Figure 4: sixteen questions from a high school Scholastic Aptitude Test uploaded to the web). In both cases, workers were paid between $0.10 and $0.40 per HIT. Each "subdivide" or "merge" HIT received answers within 4 hours; solutions to the initial task were complete within 72 hours. 49
  • 49. Future Directions in Crowdsourcing •  Real-time Crowdsourcing –  Bigham, et al. VizWiz, UIST 2010 (Figure 2: six questions asked by participants, the photographs they took, and the answers received, with latency in seconds.) 50
  • 50. Future Directions in Crowdsourcing •  Real-time Crowdsourcing –  Bigham, et al. VizWiz, UIST 2010 •  Embedding of Crowdwork inside Tools –  Bernstein, et al. Soylent, UIST 2010 51
  • 51. Future Directions in Crowdsourcing •  Real-time Crowdsourcing –  Bigham, et al. VizWiz, UIST 2010 •  Embedding of Crowdwork inside Tools –  Bernstein, et al. Soylent, UIST 2010 •  Shepherding Crowdwork –  Dow et al. CHI2011 WIP (Figure: design space for crowd feedback; a key dimension is timeliness: synchronous feedback delivered while workers are engaged in a set of tasks vs. asynchronous feedback after workers have completed them. Current systems focus on asynchronous, single-bit feedback by requesters.) 52
  • 52. Tutorials •  Matt Lease http://ir.ischool.utexas.edu/crowd/ •  AAAI 2011 (w HCOMP 2011): Human Computation: Core Research Questions and State of the Art (E. Law & Luis von Ahn) •  WSDM 2011: Crowdsourcing 101: Putting the WSDM of Crowds to Work for You (Omar Alonso and Matthew Lease) –  http://ir.ischool.utexas.edu/wsdm2011_tutorial.pdf •  LREC 2010 Tutorial: Statistical Models of the Annotation Process (Bob Carpenter and Massimo Poesio) –  http://lingpipe-blog.com/2010/05/17/ •  ECIR 2010: Crowdsourcing for Relevance Evaluation. (Omar Alonso) –  http://wwwcsif.cs.ucdavis.edu/~alonsoom/crowdsourcing.html •  CVPR 2010: Mechanical Turk for Computer Vision. (Alex Sorokin and Fei‐Fei Li) –  http://sites.google.com/site/turkforvision/ •  CIKM 2008: Crowdsourcing for Relevance Evaluation (D. Rose) –  http://videolectures.net/cikm08_rose_cfre/ •  WWW2011: Managing Crowdsourced Human Computation (Panos Ipeirotis) –  http://www.slideshare.net/ipeirotis/managing-crowdsourced-human-computation 53
  • 53. Social Q&A on Twitter. S. Paul, L. Hong, E. Chi, ICWSM 2011. 3/27/12 54
  • 54. Why social Q&A? People turn to their friends on social networks because they trust their friends to provide tailored answers to subjective questions on niche topics. 3/27/12 55
  • 55. Research Questions •  What kinds of questions are Twitter users asking their friends? Types and topics of questions •  Are users receiving responses to the questions they are asking? Number, speed, and relevancy of responses •  How does the nature of the social network affect Q&A behavior? Size and usage of network, reciprocity of relationship 3/27/12 58
  • 56. Identifying question tweets was challenging •  Advertisement framed as question •  Rhetorical question •  Missing context •  Used heuristics to identify candidate tweets that were possibly questions 3/27/12 59
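Heuristics of this sort can be sketched as simple text filters. The rules below are illustrative guesses in the spirit of the slide (question mark plus interrogative words, minus links and retweets), not the paper's actual heuristics:

```python
import re

# Hypothetical filters: interrogative cue words and a URL pattern.
QUESTION_WORDS = r"\b(who|what|when|where|why|how|which|any|anyone|does|is|are|can|should)\b"
URL = r"https?://\S+"

def is_candidate_question(tweet):
    """Rough guess at whether a tweet is a genuine question."""
    t = tweet.lower()
    if re.search(URL, t):      # links often signal ads framed as questions
        return False
    if t.startswith("rt "):    # retweets lack the original asker's context
        return False
    return t.rstrip().endswith("?") and re.search(QUESTION_WORDS, t) is not None

print(is_candidate_question("Any good iPad app recommendations?"))    # True
print(is_candidate_question("Win big! Click http://bit.ly/x now?"))   # False
```

A filter like this deliberately over-collects; the Mechanical Turk classification step on the next slide does the fine-grained judging.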
  • 57. Classifying candidate tweets using Mechanical Turk •  Crowd-sourced question tweet identification to Amazon Mechanical Turk, with control tweets •  Each tweet classified by two Turkers •  Each Turker classified 25 tweets: 20 candidates and 5 control tweets •  Only accepted data from Turkers who classified all control tweets correctly 3/27/12 60
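The control-tweet gate described above is easy to sketch (the data shapes below are hypothetical; real MTurk result files differ):

```python
def accept_workers(classifications, control_answers):
    """Keep only workers who got every control tweet right.

    `classifications`: {worker_id: {tweet_id: label}}
    `control_answers`: {tweet_id: correct_label} for the control tweets.
    """
    accepted = {}
    for worker, labels in classifications.items():
        if all(labels.get(t) == ans for t, ans in control_answers.items()):
            # Drop the control tweets; keep only the real candidate labels.
            accepted[worker] = {t: l for t, l in labels.items()
                                if t not in control_answers}
    return accepted

controls = {"c1": "question", "c2": "not_question"}
workers = {
    "W1": {"c1": "question", "c2": "not_question", "t9": "question"},
    "W2": {"c1": "question", "c2": "question",     "t9": "not_question"},  # failed c2
}
print(sorted(accept_workers(workers, controls)))  # ['W1']
```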
  • 58. Overall method for filtering questions •  Random sample of 1.2 million public tweets •  Applied heuristics to identify 12,000 candidate tweets (4,100 presented to Turkers) •  Classified candidates using Mechanical Turk: 624 •  Tracked responses to each candidate tweet: 1,152 3/27/12 61
  • 59. Findings: Types and topics of questions •  Rhetorical (42%), factual (16%), and poll (15%) questions were common •  Significant percentage of personal & health (11%) questions. Question topics: entertainment 32%, technology 10%, personal & health 11%, ethics & philosophy 7%, greetings 7%, uncategorized 5%, professional 4%, restaurant/food 4%, current events 4%, others 16%. Example questions: "How do you feel about interracial dating?"; "Which team is better, Raiders or Steelers?"; "Any good iPad app recommendations?"; "In UK, when you need to see a specialist, do you need special forms or permission?"; "Any idea how to lost weight fast?" 3/27/12 62
  • 60. Findings: Responses to questions •  Number of responses has a long-tail distribution •  Low (18.7%) response rate in general, but quick responses •  Most often reciprocity between asker and answerer was one-way (55%) •  Responses were largely (84%) relevant (Figure: log(number of questions) vs. number of answers.) 3/27/12 63
  • 61. Findings: Social network characteristics •  Which characteristics of asker predict whether she will receive a response? •  Network size and status in network are good predictors of whether asker will receive a response •  Logistic regression modeling (structural properties): Number of followers (+), Number of days on Twitter (+), Ratio of followers/followees (+), Reciprocity rate (-); also examined: Number of tweets posted, Frequency of use of Twitter 3/27/12 64
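A logistic regression like the one on this slide can be sketched in plain Python. The feature set merely follows the slide's predictors, and the data values are toy numbers, not the paper's:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Plain-Python logistic regression via stochastic gradient descent.

    Features per asker (hypothetical, following the slide): log #followers,
    days on Twitter, follower/followee ratio, reciprocity rate.
    """
    w = [0.0] * (len(X[0]) + 1)                # feature weights + bias at w[-1]
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = yi - p
            for j in range(len(xi)):
                w[j] += lr * err * xi[j]
            w[-1] += lr * err
    return w

def predict(w, xi):
    """True if the model predicts the question will receive a response."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) >= 0.5

# Toy data: [log_followers, days_on_twitter/1000, follower_ratio, reciprocity]
X = [[5.0, 0.9, 1.2, 0.2], [6.1, 1.5, 2.0, 0.1],
     [1.2, 0.1, 0.3, 0.8], [2.0, 0.2, 0.5, 0.9]]
y = [1, 1, 0, 0]                               # 1 = question received a response
w = fit_logistic(X, y)
print(predict(w, [5.5, 1.0, 1.5, 0.15]))       # well-connected asker -> True
```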
  • 62. Thanks! •  chi@acm.org •  http://edchi.net •  @edchi •  Aniket Kittur, Ed H. Chi, Bongwon Suh. Crowdsourcing User Studies With Mechanical Turk. In Proceedings of the ACM Conference on Human- factors in Computing Systems (CHI2008), pp.453-456. ACM Press, 2008. Florence, Italy. •  Aniket Kittur, Bongwon Suh, Ed H. Chi. Can You Ever Trust a Wiki? Impacting Perceived Trustworthiness in Wikipedia. In Proc. of Computer- Supported Cooperative Work (CSCW2008), pp. 477-480. ACM Press, 2008. San Diego, CA. [Best Note Award] 66