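I. Objectives
(1) Become familiar with basic Spark RDD operations and key-value pair operations;
(2) Become familiar with using RDD programming to solve concrete practical problems.
II. Platform
Operating system: Ubuntu 16.04
Spark version: 2.1.0
III. Tasks and requirements
1. Interactive programming in spark-shell
Download chapter5-data1.txt from the "Data Sets" area of the "Download Zone" on this tutorial's
official website. The data set contains the grades of a university's computer science department, in the following format: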
Tom,DataBase,80
Tom,Algorithm,50
Tom,DataStructure,60
Jim,DataBase,90
Jim,Algorithm,60
Jim,DataStructure,80
……
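The data used in this experiment is listed below (records are separated by spaces here; the format sample above shows one record per line):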
Aaron,OperatingSystem,100 Aaron,Python,50 Aaron,ComputerNetwork,30 Aaron,Software,94 Abbott,DataBase,18 Abbott,Python,82 Abbott,ComputerNetwork,76 Abel,Algorithm,30 Abel,DataStructure,38 Abel,OperatingSystem,38 Abel,ComputerNetwork,92 Abraham,DataStructure,12 Abraham,ComputerNetwork,78 Abraham,Software,98 Adair,DataBase,20 Adair,Python,98 Adair,Software,88 Adam,Algorithm,18 Adam,ComputerNetwork,70 Adam,Software,80 Adolph,DataStructure,82 Adolph,CLanguage,100 Adolph,ComputerNetwork,70 Adolph,Software,18 Adonis,DataBase,86 Adonis,Algorithm,34 Adonis,DataStructure,52 Adonis,CLanguage,30 Adonis,Python,86 Alan,Algorithm,48 Alan,OperatingSystem,86 Alan,CLanguage,72 Alan,Python,94 Alan,ComputerNetwork,88 Albert,DataStructure,60 Albert,CLanguage,76 Albert,ComputerNetwork,62 Aldrich,DataBase,42 Aldrich,Python,98 Aldrich,ComputerNetwork,80 Alexander,Algorithm,56 Alexander,DataStructure,4 Alexander,CLanguage,74 Alexander,Python,70 Alfred,Algorithm,60 Alfred,Python,96 Alger,Algorithm,50 Alger,OperatingSystem,32 Alger,Python,96 Alger,ComputerNetwork,20 Alger,Software,74 Allen,Algorithm,76 Allen,OperatingSystem,70 Allen,Python,10 Allen,Software,76 Alston,Algorithm,78 Alston,DataStructure,74 Alston,Python,96 Alston,Software,28 Alva,DataBase,72 Alva,DataStructure,64 Alva,CLanguage,0 Alva,ComputerNetwork,58 Alva,Software,82 Alvin,DataBase,88 Alvin,Algorithm,96 Alvin,OperatingSystem,26 Alvin,Python,84 Alvin,ComputerNetwork,76 Alvis,Algorithm,18 Alvis,DataStructure,56 Alvis,OperatingSystem,64 Alvis,CLanguage,56 Alvis,Python,64 Alvis,ComputerNetwork,56 Amos,DataBase,60 Amos,Algorithm,22 Amos,DataStructure,46 Amos,OperatingSystem,42 Amos,ComputerNetwork,4 Andrew,Algorithm,96 Andrew,DataStructure,62 Andrew,CLanguage,20 Andrew,Python,94 Andy,Algorithm,52 Andy,Python,76 Andy,ComputerNetwork,20 Angelo,CLanguage,30 Angelo,Software,54 Antony,DataBase,100 Antony,OperatingSystem,72 Antony,CLanguage,98 Antony,Python,46 Antony,ComputerNetwork,28 Antonio,DataBase,92 Antonio,CLanguage,22 
Antonio,ComputerNetwork,0 Archer,Algorithm,18 Archer,OperatingSystem,70 Archer,CLanguage,44 Archer,Python,54 Archer,Software,10 Archibald,DataBase,20 Archibald,Algorithm,0 Archibald,CLanguage,30 Archibald,Python,84 Archibald,ComputerNetwork,30 Aries,Algorithm,60 Aries,DataStructure,10 Arlen,DataStructure,34 Arlen,OperatingSystem,2 Arlen,ComputerNetwork,52 Arlen,Software,54 Armand,DataBase,26 Armand,DataStructure,42 Armand,OperatingSystem,18 Armstrong,DataBase,28 Armstrong,Software,26 Baron,Algorithm,12 Baron,DataStructure,40 Baron,OperatingSystem,72 Baron,CLanguage,86 Baron,ComputerNetwork,96 Baron,Software,54 Barry,DataStructure,90 Barry,OperatingSystem,60 Barry,Python,100 Barry,ComputerNetwork,28 Barry,Software,16 Bartholomew,Algorithm,16 Bartholomew,CLanguage,44 Bartholomew,Python,100 Bartholomew,ComputerNetwork,34 Bartholomew,Software,50 Bart,DataBase,64 Bart,Algorithm,12 Bart,DataStructure,62 Bart,Python,56 Bart,Software,8 Barton,Python,90 Basil,DataBase,8 Basil,CLanguage,92 Basil,Python,98 Basil,Software,48 Beck,DataBase,92 Beck,DataStructure,66 Beck,OperatingSystem,30 Beck,ComputerNetwork,0 Ben,DataBase,52 Ben,Algorithm,100 Ben,Python,40 Ben,ComputerNetwork,42 Benedict,DataBase,60 Benedict,DataStructure,96 Benedict,CLanguage,8 Benedict,Python,98 Benedict,ComputerNetwork,84 Benedict,Software,76 Benjamin,Algorithm,74 Benjamin,DataStructure,94 Benjamin,Python,60 Benjamin,Software,82 Bennett,DataBase,88 Bennett,Algorithm,42 Bennett,DataStructure,60 Bennett,CLanguage,74 Bennett,ComputerNetwork,56 Bennett,Software,38 Benson,Algorithm,64 Benson,DataStructure,52 Benson,OperatingSystem,38 Benson,CLanguage,86 Berg,Algorithm,88 Berg,DataStructure,28 Berg,CLanguage,92 Berg,Python,70 Bernard,DataStructure,46 Bernard,Python,98 Bernie,DataStructure,46 Bernie,ComputerNetwork,4 Bernie,Software,28 Bert,DataBase,58 Bert,Python,16 Bert,Software,94 Bertram,OperatingSystem,54 Bertram,ComputerNetwork,86 Bertram,Software,4 Bevis,OperatingSystem,74 Bevis,CLanguage,66 Bevis,Python,84 
Bevis,ComputerNetwork,72 Bill,DataBase,56 Bill,ComputerNetwork,86 Bing,DataBase,74 Bing,DataStructure,28 Bing,OperatingSystem,100 Bing,CLanguage,18 Bing,Python,56 Bing,ComputerNetwork,100 Bishop,Algorithm,12 Bishop,OperatingSystem,60 Blair,CLanguage,98 Blair,Python,4 Blair,ComputerNetwork,18 Blair,Software,90 Blake,DataBase,88 Blake,CLanguage,18 Blake,Python,52 Blake,ComputerNetwork,94 Blithe,DataStructure,64 Blithe,ComputerNetwork,94 Blithe,Software,86 Bob,DataBase,64 Bob,Algorithm,20 Bob,CLanguage,56 Booth,Algorithm,76 Booth,OperatingSystem,70 Booth,CLanguage,48 Booth,Python,26 Booth,ComputerNetwork,22 Booth,Software,82 Borg,DataBase,52 Borg,CLanguage,30 Borg,Python,60 Borg,ComputerNetwork,38 Boris,Algorithm,60 Boris,DataStructure,16 Boris,OperatingSystem,16 Boris,CLanguage,72 Boris,Python,10 Boris,Software,94 Bowen,DataBase,68 Bowen,Algorithm,40 Bowen,DataStructure,62 Bowen,CLanguage,26 Bowen,Python,60 Boyce,DataBase,74 Boyce,Software,6 Boyd,DataStructure,18 Boyd,OperatingSystem,94 Boyd,Software,40 Bradley,DataBase,34 Bradley,Algorithm,14 Brady,DataBase,10 Brady,Algorithm,92 Brady,DataStructure,72 Brady,CLanguage,50 Brady,Python,100 Brandon,DataBase,68 Brandon,Algorithm,74 Brandon,DataStructure,20 Brandon,OperatingSystem,80 Brandon,Software,80 Brian,Algorithm,56 Brian,DataStructure,34 Brian,OperatingSystem,12 Brian,CLanguage,2 Brian,Python,14 Brian,Software,8 Broderick,Algorithm,34 Broderick,DataStructure,32 Broderick,ComputerNetwork,48 Brook,DataStructure,72 Brook,OperatingSystem,58 Brook,CLanguage,66 Brook,Software,56 Bruce,Algorithm,100 Bruce,OperatingSystem,62 Bruce,CLanguage,26 Bruno,DataBase,98 Bruno,DataStructure,6 Bruno,CLanguage,92 Bruno,Python,68 Bruno,Software,78 Chad,DataBase,36 Chad,Algorithm,26 Chad,DataStructure,18 Chad,OperatingSystem,68 Chad,Python,36 Chad,ComputerNetwork,30 Channing,DataStructure,38 Channing,CLanguage,2 Channing,ComputerNetwork,18 Channing,Software,90 Chapman,DataBase,42 Chapman,Algorithm,42 Chapman,OperatingSystem,72 
Chapman,Python,86 Charles,DataBase,36 Charles,Algorithm,14 Charles,OperatingSystem,86 Chester,DataBase,78 Chester,Algorithm,66 Chester,DataStructure,40 Chester,OperatingSystem,10 Chester,ComputerNetwork,52 Chester,Software,58 Christ,DataStructure,98 Christ,CLanguage,58 Christian,DataStructure,38 Christian,CLanguage,62 Christopher,DataBase,4 Christopher,Algorithm,22 Christopher,DataStructure,58 Christopher,Software,36 Clare,DataStructure,74 Clare,OperatingSystem,30 Clare,CLanguage,76 Clare,Software,36 Clarence,DataBase,82 Clarence,Algorithm,64 Clarence,DataStructure,98 Clarence,OperatingSystem,78 Clarence,CLanguage,22 Clarence,ComputerNetwork,92 Clarence,Software,56 Clark,DataBase,26 Clark,Algorithm,60 Clark,DataStructure,14 Clark,OperatingSystem,56 Clark,CLanguage,8 Clark,Software,44 Claude,CLanguage,52 Claude,ComputerNetwork,70 Clement,DataBase,92 Clement,OperatingSystem,8 Clement,CLanguage,86 Clement,Python,92 Clement,ComputerNetwork,16 Cleveland,DataBase,78 Cleveland,Algorithm,70 Cleveland,OperatingSystem,74 Cleveland,CLanguage,70 Cliff,Algorithm,46 Cliff,DataStructure,10 Cliff,CLanguage,52 Cliff,ComputerNetwork,74 Cliff,Software,10 Clyde,DataBase,86 Clyde,Algorithm,76 Clyde,DataStructure,82 Clyde,OperatingSystem,82 Clyde,Python,22 Clyde,ComputerNetwork,78 Clyde,Software,76 Colbert,DataBase,4 Colbert,Algorithm,4 Colbert,Python,32 Colbert,Software,12 Colby,DataBase,70 Colby,Algorithm,24 Colby,DataStructure,94 Colby,OperatingSystem,62 Colin,Algorithm,10 Colin,CLanguage,90 Colin,Python,82 Colin,ComputerNetwork,62 Colin,Software,30 Conrad,DataBase,48 Conrad,ComputerNetwork,76 Corey,DataBase,22 Corey,Algorithm,58 Corey,OperatingSystem,6 Corey,Python,94 Dean,DataBase,26 Dean,Algorithm,54 Dean,DataStructure,90 Dean,CLanguage,26 Dean,Python,98 Dean,ComputerNetwork,50 Dean,Software,82 Dempsey,DataStructure,70 Dempsey,OperatingSystem,70 Dempsey,CLanguage,98 Dempsey,ComputerNetwork,30 Dennis,Algorithm,100 Dennis,DataStructure,40 Dennis,Python,22 Dennis,ComputerNetwork,94 
Derrick,DataBase,44 Derrick,Algorithm,26 Derrick,CLanguage,16 Derrick,Python,100 Derrick,ComputerNetwork,36 Derrick,Software,74 Devin,DataBase,16 Devin,DataStructure,70 Devin,Python,98 Devin,Software,0 Dick,DataStructure,62 Dick,Python,32 Dick,ComputerNetwork,2 Dominic,DataBase,16 Dominic,Python,30 Dominic,ComputerNetwork,12 Dominic,Software,24 Don,Algorithm,52 Don,ComputerNetwork,36 Donahue,DataBase,86 Donahue,DataStructure,88 Donahue,CLanguage,16 Donahue,ComputerNetwork,24 Donahue,Software,40 Donald,Algorithm,28 Donald,CLanguage,18 Donald,Python,52 Donald,ComputerNetwork,62 Drew,Algorithm,78 Drew,DataStructure,0 Drew,OperatingSystem,14 Drew,Python,28 Drew,Software,46 Duke,DataBase,14 Duke,Algorithm,28 Duke,OperatingSystem,68 Duke,CLanguage,78 Duncann,Algorithm,34 Duncann,DataStructure,86 Duncann,Python,94 Duncann,ComputerNetwork,24 Duncann,Software,78 Edward,DataBase,18 Edward,Algorithm,22 Edward,DataStructure,2 Edward,CLanguage,4 Egbert,Algorithm,26 Egbert,CLanguage,24 Egbert,Python,92 Egbert,ComputerNetwork,12 Eli,DataBase,54 Eli,Algorithm,54 Eli,CLanguage,94 Eli,Python,60 Eli,ComputerNetwork,30 Elijah,CLanguage,30 Elijah,Python,62 Elijah,ComputerNetwork,96 Elijah,Software,36 Elliot,Algorithm,60 Elliot,OperatingSystem,96 Elliot,Software,78 Ellis,Algorithm,90 Ellis,OperatingSystem,36 Ellis,ComputerNetwork,56 Ellis,Software,28 Elmer,DataStructure,34 Elmer,CLanguage,98 Elmer,Python,22 Elmer,ComputerNetwork,44 Elroy,DataBase,48 Elroy,Algorithm,82 Elroy,DataStructure,44 Elroy,OperatingSystem,56 Elroy,CLanguage,78 Elton,DataBase,80 Elton,DataStructure,2 Elton,OperatingSystem,16 Elton,CLanguage,44 Elton,Python,40 Elvis,DataBase,32 Elvis,DataStructure,20 Emmanuel,DataBase,32 Emmanuel,OperatingSystem,42 Emmanuel,CLanguage,12 Enoch,DataBase,54 Enoch,Algorithm,22 Enoch,Python,78 Eric,DataBase,18 Eric,Algorithm,62 Eric,ComputerNetwork,68 Eric,Software,64 Ernest,DataBase,62 Ernest,OperatingSystem,6 Ernest,CLanguage,70 Ernest,Python,94 Ernest,ComputerNetwork,16 
Eugene,CLanguage,80 Evan,DataStructure,8 Evan,OperatingSystem,100 Evan,Python,20 Ford,DataBase,32 Ford,Algorithm,66 Ford,Python,68 Francis,DataBase,58 Francis,OperatingSystem,78 Francis,CLanguage,6 Francis,Software,76 Frank,DataBase,74 Frank,Python,58 Frank,ComputerNetwork,60 Geoffrey,OperatingSystem,4 Geoffrey,CLanguage,24 Geoffrey,Python,86 Geoffrey,Software,52 George,Algorithm,72 George,DataStructure,80 George,Python,36 George,ComputerNetwork,50 Gerald,Algorithm,46 Gerald,OperatingSystem,94 Gerald,CLanguage,90 Gerald,ComputerNetwork,8 Gilbert,Algorithm,80 Gilbert,CLanguage,96 Gilbert,ComputerNetwork,72 Giles,DataBase,6 Giles,Algorithm,12 Giles,DataStructure,26 Giles,CLanguage,6 Giles,Python,72 Giles,ComputerNetwork,18 Giles,Software,78 Glenn,DataBase,12 Glenn,Algorithm,42 Glenn,OperatingSystem,82 Glenn,CLanguage,20 Glenn,Python,84 Glenn,ComputerNetwork,76 Gordon,DataBase,60 Gordon,Algorithm,64 Gordon,OperatingSystem,38 Gordon,Python,48 Greg,Algorithm,18 Greg,DataStructure,28 Greg,Python,78 Greg,Software,72 Griffith,Algorithm,40 Griffith,DataStructure,58 Griffith,OperatingSystem,10 Griffith,Software,4 Harlan,Algorithm,44 Harlan,OperatingSystem,46 Harlan,CLanguage,86 Harlan,Python,86 Harlan,ComputerNetwork,56 Harlan,Software,12 Harold,DataStructure,78 Harold,OperatingSystem,100 Harold,CLanguage,52 Harold,Python,12 Harry,DataBase,74 Harry,OperatingSystem,60 Harry,Python,42 Harry,Software,46 Harvey,DataBase,86 Harvey,Algorithm,88 Harvey,DataStructure,40 Harvey,OperatingSystem,74 Harvey,Python,14 Harvey,ComputerNetwork,78 Harvey,Software,22 Hayden,Algorithm,36 Hayden,DataStructure,80 Hayden,Software,34 Henry,Python,4 Henry,ComputerNetwork,74 Herbert,OperatingSystem,88 Herbert,CLanguage,26 Herbert,ComputerNetwork,18 Herman,OperatingSystem,24 Herman,ComputerNetwork,14 Herman,Software,78 Hilary,DataStructure,58 Hilary,Python,2 Hilary,ComputerNetwork,98 Hilary,Software,32 Hiram,DataBase,12 Hiram,Algorithm,44 Hiram,DataStructure,74 Hiram,OperatingSystem,70 
Hiram,CLanguage,46 Hiram,ComputerNetwork,38 Hobart,DataBase,26 Hobart,Algorithm,0 Hobart,DataStructure,44 Hobart,ComputerNetwork,48 Hogan,DataBase,80 Hogan,CLanguage,40 Hogan,Python,10 Hogan,Software,26 Horace,DataBase,22 Horace,OperatingSystem,52 Horace,CLanguage,54 Horace,ComputerNetwork,10 Horace,Software,24 Ivan,OperatingSystem,70 Ivan,Python,10 Ivan,ComputerNetwork,100 Ivan,Software,36 Jason,Algorithm,38 Jason,OperatingSystem,18 Jason,CLanguage,8 Jason,ComputerNetwork,4 Jay,Algorithm,58 Jay,DataStructure,30 Jay,OperatingSystem,24 Jay,CLanguage,22 Jay,Python,38 Jay,Software,6 Jeff,DataBase,20 Jeff,DataStructure,0 Jeff,ComputerNetwork,18 Jeff,Software,16 Jeffrey,DataStructure,66 Jeffrey,OperatingSystem,4 Jeffrey,CLanguage,100 Jeffrey,Software,86 Jeremy,DataBase,84 Jeremy,Algorithm,44 Jeremy,DataStructure,90 Jeremy,CLanguage,94 Jeremy,Python,60 Jeremy,Software,66 Jerome,DataBase,16 Jerome,DataStructure,64 Jerome,OperatingSystem,10 Jerry,DataStructure,30 Jerry,Python,46 Jerry,ComputerNetwork,94 Jesse,Algorithm,78 Jesse,DataStructure,50 Jesse,OperatingSystem,14 Jesse,CLanguage,100 Jesse,Python,28 Jesse,ComputerNetwork,94 Jesse,Software,84 Jim,Algorithm,32 Jim,OperatingSystem,36 Jim,Python,4 Jim,ComputerNetwork,38 Jo,DataBase,14 Jo,DataStructure,52 Jo,OperatingSystem,68 Jo,CLanguage,92 Jo,ComputerNetwork,28 John,DataBase,60 John,Algorithm,14 John,OperatingSystem,64 John,Python,34 John,ComputerNetwork,34 John,Software,36 Jonas,Algorithm,38 Jonas,Python,84 Jonas,ComputerNetwork,0 Jonas,Software,44 Jonathan,OperatingSystem,74 Jonathan,CLanguage,38 Jonathan,Python,86 Jonathan,Software,30 Joseph,DataStructure,30 Joseph,CLanguage,28 Joseph,ComputerNetwork,84 Joshua,Algorithm,30 Joshua,DataStructure,46 Joshua,OperatingSystem,74 Joshua,Software,0 Ken,Algorithm,74 Ken,OperatingSystem,60 Ken,CLanguage,68 Kennedy,DataBase,68 Kennedy,DataStructure,32 Kennedy,OperatingSystem,20 Kennedy,Python,14 Kenneth,OperatingSystem,74 Kenneth,CLanguage,18 Kenneth,ComputerNetwork,34 
Kent,DataBase,82 Kent,DataStructure,50 Kent,CLanguage,34 Kent,Python,20 Kerr,Algorithm,70 Kerr,Python,32 Kerr,ComputerNetwork,36 Kerr,Software,36 Kerwin,Algorithm,64 Kerwin,OperatingSystem,24 Kerwin,ComputerNetwork,58 Kevin,DataBase,54 Kevin,DataStructure,44 Kevin,CLanguage,6 Kevin,Software,26 Kim,DataBase,0 Kim,Algorithm,40 Kim,DataStructure,14 Kim,Python,6 Len,DataBase,60 Len,OperatingSystem,22 Len,Python,88 Len,ComputerNetwork,76 Len,Software,92 Lennon,DataBase,84 Lennon,Algorithm,2 Lennon,OperatingSystem,98 Lennon,Software,42 Leo,DataBase,44 Leo,OperatingSystem,42 Leo,CLanguage,46 Leo,Python,38 Leo,Software,20 Leonard,Algorithm,96 Leonard,Software,20 Leopold,DataBase,48 Leopold,Algorithm,38 Leopold,DataStructure,96 Leopold,CLanguage,24 Leopold,Python,52 Leopold,ComputerNetwork,90 Leopold,Software,94 Les,DataBase,72 Les,Algorithm,58 Les,DataStructure,26 Les,CLanguage,2 Les,Python,38 Les,ComputerNetwork,20 Lester,DataStructure,100 Lester,CLanguage,100 Lester,Python,96 Lester,ComputerNetwork,50 Levi,CLanguage,36 Levi,Software,86 Lewis,Algorithm,62 Lewis,DataStructure,60 Lewis,OperatingSystem,18 Lewis,Python,60 Lionel,DataStructure,82 Lionel,OperatingSystem,88 Lionel,CLanguage,22 Lionel,ComputerNetwork,22 Lou,OperatingSystem,88 Lou,Software,52 Louis,DataBase,50 Louis,Algorithm,76 Louis,DataStructure,32 Louis,OperatingSystem,18 Louis,Python,56 Louis,Software,94 Lucien,DataStructure,22 Lucien,CLanguage,58 Lucien,Python,94 Lucien,ComputerNetwork,94 Lucien,Software,58 Luthers,Algorithm,44 Luthers,DataStructure,16 Luthers,OperatingSystem,84 Luthers,CLanguage,22 Luthers,ComputerNetwork,88 Marico,DataBase,56 Marico,Algorithm,56 Marico,DataStructure,16 Marico,CLanguage,40 Marico,ComputerNetwork,18 Marico,Software,24 Mark,DataBase,66 Mark,Algorithm,46 Mark,DataStructure,36 Mark,OperatingSystem,86 Mark,Python,84 Mark,ComputerNetwork,30 Mark,Software,60 Marlon,DataStructure,44 Marlon,OperatingSystem,52 Marlon,CLanguage,34 Marlon,Software,62 Marsh,Algorithm,64 Marsh,Python,86 
Marsh,ComputerNetwork,68 Marsh,Software,42 Marshall,DataBase,38 Marshall,OperatingSystem,38 Marshall,CLanguage,50 Marshall,Software,76 Martin,CLanguage,84 Martin,Python,98 Martin,Software,38 Marvin,Algorithm,12 Marvin,OperatingSystem,82 Marvin,CLanguage,64 Matt,DataBase,46 Matt,DataStructure,48 Matt,CLanguage,22 Matt,Python,100 Matthew,CLanguage,14 Matthew,ComputerNetwork,48 Maurice,DataStructure,26 Maurice,ComputerNetwork,16 Max,Algorithm,32 Max,DataStructure,38 Max,ComputerNetwork,36 Maxwell,OperatingSystem,78 Maxwell,Python,52 Maxwell,ComputerNetwork,82 Maxwell,Software,22 Meredith,DataBase,26 Meredith,Algorithm,42 Meredith,OperatingSystem,42 Meredith,Python,52 Merle,OperatingSystem,12 Merle,ComputerNetwork,40 Merle,Software,4 Merlin,Algorithm,62 Merlin,DataStructure,2 Merlin,OperatingSystem,90 Merlin,ComputerNetwork,60 Merlin,Software,20 Michael,Algorithm,92 Michael,CLanguage,66 Michael,Python,6 Michael,ComputerNetwork,42 Michael,Software,98 Mick,DataStructure,64 Mick,OperatingSystem,98 Mick,Python,2 Mick,Software,76 Mike,Algorithm,92 Mike,DataStructure,56 Mike,ComputerNetwork,62 Miles,DataBase,56 Miles,Algorithm,76 Miles,DataStructure,66 Miles,OperatingSystem,60 Miles,Python,32 Miles,ComputerNetwork,80 Milo,CLanguage,68 Milo,Python,64 Monroe,DataBase,42 Monroe,Algorithm,16 Monroe,ComputerNetwork,28 Montague,Algorithm,36 Montague,OperatingSystem,24 Montague,ComputerNetwork,16 Nelson,DataBase,40 Nelson,Algorithm,80 Nelson,DataStructure,16 Nelson,OperatingSystem,24 Nelson,Python,36 Newman,Algorithm,84 Newman,Software,52 Nicholas,DataBase,24 Nicholas,Algorithm,38 Nicholas,DataStructure,58 Nicholas,OperatingSystem,78 Nicholas,CLanguage,100 Nick,OperatingSystem,100 Nick,CLanguage,56 Nick,Python,12 Nick,ComputerNetwork,92 Nick,Software,64 Nigel,Algorithm,4 Nigel,ComputerNetwork,10 Nigel,Software,4 Noah,DataBase,80 Noah,OperatingSystem,54 Noah,CLanguage,44 Noah,Python,22 Payne,DataBase,50 Payne,Algorithm,30 Payne,DataStructure,62 Payne,Python,94 
Payne,ComputerNetwork,92 Payne,Software,80 Perry,DataStructure,38 Perry,OperatingSystem,88 Perry,CLanguage,18 Perry,ComputerNetwork,68 Perry,Software,98 Pete,DataStructure,10 Pete,OperatingSystem,42 Pete,Software,74 Peter,DataBase,88 Peter,Algorithm,46 Peter,DataStructure,58 Peter,Software,54 Phil,DataBase,16 Phil,OperatingSystem,16 Phil,Software,14 Philip,DataBase,24 Philip,OperatingSystem,30 Randolph,Algorithm,18 Randolph,DataStructure,82 Randolph,OperatingSystem,90 Raymondt,DataBase,86 Raymondt,Algorithm,54 Raymondt,DataStructure,78 Raymondt,CLanguage,46 Raymondt,Python,78 Raymondt,Software,100 Robin,Algorithm,68 Robin,DataStructure,2 Robin,Python,90 Robin,Software,54 Rock,DataBase,6 Rock,Algorithm,92 Rock,OperatingSystem,88 Rock,CLanguage,0 Rock,Python,94 Rock,Software,98 Rod,Algorithm,84 Rod,OperatingSystem,94 Rod,Python,18 Rod,ComputerNetwork,56 Roderick,DataBase,50 Roderick,Algorithm,62 Roderick,OperatingSystem,66 Roderick,CLanguage,12 Rodney,Algorithm,34 Rodney,OperatingSystem,52 Rodney,ComputerNetwork,44 Ron,DataBase,82 Ron,Algorithm,76 Ron,DataStructure,36 Ron,CLanguage,58 Ron,Python,40 Ron,ComputerNetwork,36 Ronald,DataBase,66 Ronald,Algorithm,20 Ronald,CLanguage,32 Rory,Algorithm,68 Rory,OperatingSystem,12 Rory,CLanguage,90 Rory,Software,76 Roy,DataBase,88 Roy,DataStructure,58 Roy,OperatingSystem,20 Roy,CLanguage,74 Roy,Python,70 Roy,ComputerNetwork,0 Samuel,DataBase,66 Samuel,Algorithm,32 Samuel,OperatingSystem,20 Samuel,ComputerNetwork,96 Sandy,DataStructure,72 Saxon,DataBase,44 Saxon,Algorithm,52 Saxon,DataStructure,52 Saxon,OperatingSystem,46 Saxon,CLanguage,60 Saxon,ComputerNetwork,66 Saxon,Software,38 Scott,Algorithm,46 Scott,OperatingSystem,78 Scott,Software,4 Sean,DataBase,62 Sean,Algorithm,92 Sean,OperatingSystem,92 Sean,CLanguage,0 Sean,Python,62 Sean,ComputerNetwork,34 Sebastian,DataBase,68 Sebastian,Algorithm,38 Sebastian,OperatingSystem,62 Sebastian,CLanguage,10 Sebastian,Python,64 Sebastian,ComputerNetwork,100 Sid,DataBase,14 
Sid,OperatingSystem,20 Sid,CLanguage,88 Sidney,DataBase,96 Sidney,Algorithm,36 Sidney,DataStructure,8 Sidney,ComputerNetwork,0 Sidney,Software,34 Simon,ComputerNetwork,96 Simon,Software,64 Solomon,DataBase,2 Solomon,Algorithm,46 Solomon,DataStructure,20 Solomon,ComputerNetwork,64 Solomon,Software,18 Spencer,DataStructure,24 Spencer,OperatingSystem,88 Spencer,CLanguage,96 Spencer,Python,14 Spencer,ComputerNetwork,98 Stan,DataStructure,64 Stan,CLanguage,48 Stan,Python,46 Todd,OperatingSystem,82 Todd,Python,52 Todd,ComputerNetwork,42 Tom,DataBase,26 Tom,Algorithm,12 Tom,OperatingSystem,16 Tom,Python,40 Tom,Software,60 Tony,DataBase,30 Tony,Algorithm,12 Tony,Python,96 Tracy,DataBase,34 Tracy,CLanguage,72 Tracy,Software,74 Truman,Algorithm,60 Truman,Python,74 Truman,ComputerNetwork,54 Upton,DataBase,94 Upton,Algorithm,52 Upton,DataStructure,28 Upton,Python,86 Upton,ComputerNetwork,78 Uriah,Algorithm,54 Valentine,DataBase,10 Valentine,DataStructure,76 Valentine,CLanguage,96 Valentine,Python,38 Valentine,Software,60 Valentine,DataBase,0 Valentine,DataStructure,40 Valentine,CLanguage,56 Verne,OperatingSystem,30 Verne,Python,74 Verne,Software,94 Vic,DataBase,62 Vic,CLanguage,56 Vic,ComputerNetwork,66 Victor,ComputerNetwork,42 Victor,Software,6 Vincent,DataBase,70 Vincent,Algorithm,98 Vincent,OperatingSystem,48 Vincent,ComputerNetwork,64 Vincent,Software,48 Virgil,DataStructure,30 Virgil,OperatingSystem,8 Virgil,Python,22 Virgil,ComputerNetwork,68 Virgil,Software,60 Walter,DataBase,96 Walter,Algorithm,34 Walter,OperatingSystem,62 Walter,Software,4 Ward,DataStructure,38 Ward,OperatingSystem,64 Ward,ComputerNetwork,96 Ward,Software,88 Webb,DataBase,26 Webb,Algorithm,32 Webb,DataStructure,94 Webb,CLanguage,38 Webb,Python,44 Webb,ComputerNetwork,42 Webb,Software,84 Webster,OperatingSystem,98 Webster,Software,16 Will,Algorithm,30 Will,OperatingSystem,96 Will,CLanguage,38 William,DataBase,74 William,DataStructure,36 William,OperatingSystem,58 William,CLanguage,98 
William,ComputerNetwork,68 William,Software,74 Willie,DataStructure,24 Willie,OperatingSystem,70 Willie,Python,48 Willie,ComputerNetwork,92 Winfred,Algorithm,16 Winfred,CLanguage,22 Winfred,Software,26 Winston,DataStructure,66 Winston,OperatingSystem,26 Winston,CLanguage,98 Winston,Software,40 Woodrow,DataBase,26 Woodrow,OperatingSystem,72 Woodrow,Python,44 Wordsworth,DataStructure,50 Wordsworth,OperatingSystem,62 Wordsworth,Python,42 Wordsworth,ComputerNetwork,4 Wright,DataBase,76 Wright,OperatingSystem,100 Wright,ComputerNetwork,44 Wright,Software,60
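Using the given data, write code in spark-shell to compute the following:
Change to the spark/bin directory and run spark-shell to start Spark.
(1) How many students are there in the department;
(2) How many courses does the department offer;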
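Tasks (1) and (2) both come down to counting the distinct values in one column. The following is a minimal sketch, not the report's own solution: it assumes a handful of records in the format shown earlier, loaded with sc.parallelize in place of the real file, and the app name and local master are illustrative assumptions. SparkContext.getOrCreate returns the existing context when run inside spark-shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the spark-shell context if one exists; otherwise starts a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("rdd-sketch").setMaster("local[*]"))

// Illustrative stand-in for sc.textFile(...) on the real data set.
val lines = sc.parallelize(Seq(
  "Tom,DataBase,80", "Tom,Algorithm,50", "Tom,DataStructure,60",
  "Jim,DataBase,90", "Jim,Algorithm,60", "Jim,DataStructure,80"))

// Split each record into Array(name, course, score).
val fields = lines.map(_.split(","))

// (1) number of students: distinct values in column 0
val studentCount = fields.map(_(0)).distinct().count()

// (2) number of courses: distinct values in column 1
val courseCount = fields.map(_(1)).distinct().count()

println(s"students = $studentCount, courses = $courseCount")
```

On the real data set, replacing the parallelize call with sc.textFile on the downloaded file should be the only change needed.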
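(3) What is Tom's average score across all of his courses;
val lines = sc.textFile("file:///usr/local/sparkdata/Data01.txt")
// Keep Tom's records as (name, score) pairs.
val tomScores = lines.map(_.split(",")).filter(_(0) == "Tom").map(f => (f(0), f(2).toInt))
// Pair each score with a count of 1, then sum scores and counts per name.
val sumAndCount = tomScores.mapValues(x => (x, 1)).reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
// toDouble avoids truncating integer division.
sumAndCount.mapValues(x => x._1.toDouble / x._2).collect()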
(4) How many courses has each student taken;
(5) How many students take the DataBase course;
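Tasks (4) and (5) can be sketched as a per-key count and a filtered count. This is an illustration under assumptions, not the report's own code: the sample records, app name, and local master are made up for the example. Note that the sketch counts records; the full data set lists Valentine with DataBase twice, so on the real data one may prefer to count distinct names instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the spark-shell context if one exists; otherwise starts a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("rdd-sketch").setMaster("local[*]"))

// Illustrative stand-in for the real data file.
val lines = sc.parallelize(Seq(
  "Tom,DataBase,80", "Tom,Algorithm,50", "Tom,DataStructure,60",
  "Jim,DataBase,90", "Jim,Algorithm,60", "Jim,DataStructure,80"))
val fields = lines.map(_.split(","))

// (4) courses per student: emit (name, 1) for every record and sum per name.
val coursesPerStudent = fields.map(f => (f(0), 1)).reduceByKey(_ + _).collectAsMap()
coursesPerStudent.foreach(println)

// (5) number of DataBase records.
val dbRecords = fields.filter(_(1) == "DataBase").count()

// Variant that counts each student name once, in case a name repeats for the course.
val dbStudents = fields.filter(_(1) == "DataBase").map(_(0)).distinct().count()
println(s"DataBase records = $dbRecords, distinct students = $dbStudents")
```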
(6) What is the average score of each course;
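The per-course average follows the same sum-and-count pattern as the code for task (3), keyed by course instead of by student. A minimal sketch, assuming illustrative sample records and a local master (neither comes from the report):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the spark-shell context if one exists; otherwise starts a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("rdd-sketch").setMaster("local[*]"))

// Illustrative stand-in for the real data file.
val lines = sc.parallelize(Seq(
  "Tom,DataBase,80", "Tom,Algorithm,50", "Tom,DataStructure,60",
  "Jim,DataBase,90", "Jim,Algorithm,60", "Jim,DataStructure,80"))

// Key by course and carry (score, 1) so sums and counts are reduced together.
val scoreAndOne = lines.map(_.split(",")).map(f => (f(1), (f(2).toInt, 1)))
val sumAndCount = scoreAndOne.reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))

// Convert to Double before dividing to avoid integer truncation.
val courseAvg = sumAndCount.mapValues { case (sum, n) => sum.toDouble / n }.collectAsMap()
courseAvg.foreach(println)
```

For this sample the averages are 85.0 for DataBase, 55.0 for Algorithm, and 70.0 for DataStructure.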
(7) Use an accumulator to count how many students take the DataBase course.
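For task (7), one possible approach uses the longAccumulator API available in Spark 2.x. The sketch below is an assumption-laden illustration: the sample records, accumulator name, app name, and local master are invented for the example. The accumulator is updated inside foreach, which is an action, so each record should be counted exactly once.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the spark-shell context if one exists; otherwise starts a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("rdd-sketch").setMaster("local[*]"))

// Illustrative stand-in for the real data file.
val lines = sc.parallelize(Seq(
  "Tom,DataBase,80", "Tom,Algorithm,50", "Tom,DataStructure,60",
  "Jim,DataBase,90", "Jim,Algorithm,60", "Jim,DataStructure,80"))

// Named accumulator; the name is only a label shown in the Spark UI.
val dbAccum = sc.longAccumulator("DataBase enrolments")

// Tasks add to the accumulator; only the driver reads its value.
lines.foreach { line =>
  if (line.split(",")(1) == "DataBase") dbAccum.add(1)
}

val dbTotal: Long = dbAccum.value
println(s"DataBase enrolments = $dbTotal")
```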
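2. Writing a standalone application for data deduplication
Given two input files A and B, write a standalone Spark application that merges the two files and removes
duplicate content, producing a new file C. Sample input and output files are shown below for reference.
Sample input file A: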
20170101 x
20170102 y
20170103 x
20170104 y
20170105 z
20170106 z
Sample input file B:
20170101 y
20170102 y
20170103 x
20170104 z
20170105 y
Sample output file C, obtained by merging input files A and B:
20170101 x
20170101 y
20170102 y
20170103 x
20170104 y
20170104 z
20170105 y
20170105 z
20170106 z
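Code:
package sy4

import org.apache.spark.{SparkConf, SparkContext}

object sjqc {
  def main(args: Array[String]): Unit = {
    // The master URL is expected from spark-submit; when running inside the IDE,
    // add .setMaster("local[*]") to the configuration.
    val conf = new SparkConf().setAppName("Sjqc")
    val sc = new SparkContext(conf)
    // textFile accepts a comma-separated list of paths. Forward slashes are used
    // because "\I", "\W", and similar sequences are invalid escapes in a Scala string.
    val dataFile = "E:/IntelliJ IDEA 2019.3.3/WorkSpace/MyScala/src/main/scala/sy4/A.txt," +
      "E:/IntelliJ IDEA 2019.3.3/WorkSpace/MyScala/src/main/scala/sy4/B.txt"
    val lines = sc.textFile(dataFile, 2)
    // distinct() removes lines that occur more than once across both files.
    val distinct_lines = lines.distinct()
    // saveAsTextFile creates a directory named C.txt containing a single part file.
    distinct_lines.repartition(1).saveAsTextFile("./src/main/scala/sy4/C.txt")
    sc.stop()
  }
}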
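The sample output C is listed in sorted order, whereas distinct() alone makes no promise about ordering. The sketch below shows one way to reproduce the sample ordering on the sample data; it is an illustration under assumptions (in-memory stand-ins for files A and B, a local master, and a sortBy step that the report's own program does not use).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the spark-shell context if one exists; otherwise starts a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("dedup-sketch").setMaster("local[*]"))

// In-memory stand-ins for the sample input files A and B.
val fileA = sc.parallelize(Seq(
  "20170101 x", "20170102 y", "20170103 x", "20170104 y", "20170105 z", "20170106 z"))
val fileB = sc.parallelize(Seq(
  "20170101 y", "20170102 y", "20170103 x", "20170104 z", "20170105 y"))

// Merge, drop duplicates, then sort into a single partition so the order is well defined.
val merged = fileA.union(fileB).distinct()
val sorted = merged.sortBy(line => line, ascending = true, numPartitions = 1)

val result = sorted.collect()
result.foreach(println)
```

Sorting directly into one partition avoids a later repartition(1), which shuffles the data and would not be expected to preserve the sorted order.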
Results:
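3. Writing a standalone application to compute averages
Each input file holds the grades of a class for one subject. Every line consists of two fields: the first is the
student's name and the second is the student's grade. Write a standalone Spark application that computes the
average grade of every student and writes it to a new file. Sample input and output files are shown below for reference.
Algorithm grades: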
小明 92
小红 87
小新 82
小丽 90
Database grades:
小明 95
小红 81
小新 89
小丽 85
Python grades:
小明 82
小红 83
小新 94
小丽 91
The average grades are as follows:
(小红,83.67)
(小新,88.33)
(小明,89.67)
(小丽,88.67)
Code:
package sy4

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object exercise03 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("exercise03")
    val sc = new SparkContext(conf)
    // Three slashes before the drive letter; with only two, "E:" is parsed as a host name.
    val dataFile = "file:///E:/IntelliJ IDEA 2019.3.3/WorkSpace/MyScala/src/main/scala/sy4/student.txt"
    val data = sc.textFile(dataFile, 3)
    val res = data
      .filter(_.trim.nonEmpty)                 // skip blank lines
      .map { line =>
        val fields = line.trim.split(" ")
        (fields(0), fields(1).toInt)           // (student, grade)
      }
      .partitionBy(new HashPartitioner(1))     // one partition, so one output file
      .groupByKey()
      .map { case (name, grades) =>
        val avg = grades.sum.toDouble / grades.size
        (name, f"$avg%1.2f".toDouble)          // round to two decimal places
      }
    res.saveAsTextFile("./result")
    sc.stop()
  }
}
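As an alternative to groupByKey, the same averages can be computed by reducing (sum, count) pairs per student, which avoids collecting all grades of a student in one place. The sketch below is illustrative only: it assumes the three sample subject files are held in memory, uses a local master, and rounds with math.round rather than string formatting.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the spark-shell context if one exists; otherwise starts a local one (assumption).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("average-sketch").setMaster("local[*]"))

// In-memory stand-ins for the three sample subject files.
val algorithm = sc.parallelize(Seq("小明 92", "小红 87", "小新 82", "小丽 90"))
val database = sc.parallelize(Seq("小明 95", "小红 81", "小新 89", "小丽 85"))
val python = sc.parallelize(Seq("小明 82", "小红 83", "小新 94", "小丽 91"))

// Treat all subject files as one RDD of "name grade" lines.
val all = sc.union(Seq(algorithm, database, python))

// (name, (grade, 1)) pairs reduced to (name, (sum, count)).
val pairs = all.map(_.split(" ")).map(f => (f(0), (f(1).toInt, 1)))
val sums = pairs.reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))

// Average rounded to two decimal places.
val averages = sums.mapValues { case (sum, n) => math.round(sum.toDouble / n * 100) / 100.0 }
val result = averages.collectAsMap()
result.foreach(println)
```

For the sample data this yields the same values as the sample output above, for example (269 / 3) rounded to 89.67 for 小明.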
Results: