vendetta Posted September 21, 2017

Created an initial RDD in the Spark shell by importing data from a traditional database into HDFS and processing it with Scala on Spark. Sample:

column1  column2  column3
1        10.00    2
2        30.00    2
3        16.96    3
4        18.06    3

I have to produce the result as a single-line RDD expression using only aggregateByKey in Scala (no Spark SQL, no DataFrame API), giving max(column2), count(column1), avg(column2), min(column2) grouped by column3 (like GROUP BY column3 in SQL). I want single-line Scala code using aggregateByKey for Spark. I can't work out how aggregateByKey behaves in this case; I already have the solution via the DataFrame API and Spark SQL. Don't tell me to google it — I searched and found nothing, and it felt tricky. aggregateByKey carries four values here, so the initial values are (0.0F, 0.0F, 0.0F, 0.0F) 😔 If anyone can explain, I'll post the complete question in a PM.
JavaBava Posted September 21, 2017

Look at you, hoping to find in this DB what you couldn't even find on Google.. you're really something.
former Posted September 21, 2017

5 hours ago, vendetta said: (the aggregateByKey question quoted above)

Can you try Spark with Python? Why only Scala?
kasi Posted September 21, 2017

7 hours ago, vendetta said: (the aggregateByKey question quoted above)

Sister, aggregateByKey can only be used on a pair RDD..... I can't understand what exactly you're trying to do here, and why are you trying to use only aggregateByKey????
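To see kasi's point about pair RDDs, here is a plain-Scala stand-in (no Spark needed; the tuples below just mirror the sample table, and the names `rows`/`pairs` are illustrative) for the keying step — each row is keyed by column3, keeping column2 as the value:

```scala
// The sample rows as (column1, column2, column3) tuples
val rows = List((1, 10.00, 2), (2, 30.00, 2), (3, 16.96, 3), (4, 18.06, 3))

// aggregateByKey needs (key, value) pairs, so key each row by column3
// and keep column2 as the value to aggregate
val pairs = rows.map { case (_, c2, c3) => (c3, c2) }
// pairs: List((2,10.0), (2,30.0), (3,16.96), (3,18.06))
```

In real Spark the same `map` on an RDD of rows produces a PairRDD, which is what unlocks the `*ByKey` operations.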
kasi Posted September 21, 2017

1 hour ago, former said: Can you try Spark with Python? Why only Scala?

Spark supports Python, Scala, R, and Java.
vendetta Posted September 21, 2017 Author

41 minutes ago, kasi said: (why only aggregateByKey?)

Uncle, convert the given data into (key, value) pairs and get the output using aggregateByKey. The data above was already converted to an RDD; I just posted a sample laid out like table columns. The original data has more columns and TBs of records. If you can explain, PM me.
vendetta Posted September 21, 2017 Author

@kasi you ask why only aggregateByKey — I already posted that getting the solution was easy with the DataFrame API and Spark SQL; aggregateByKey is just for learning. And if there's an alternative without a case class, the LOC goes down. What I was given:

var Result = resultDF
  .map(a => (a(1).toString.toInt, a(2).toString.toDouble))  // pair RDD: (key column, value column) — adjust indices to your schema
  .aggregateByKey((Double.MinValue, 0L, 0.0, Double.MaxValue))(  // zero value: (max, count, sum, min)
    (a, b) => (math.max(a._1, b), a._2 + 1, a._3 + b, math.min(a._4, b)),  // seqOp: fold value b into accumulator a
    (a, b) => (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4))  // combOp: merge two accumulators
  )
  .sortBy(_._1, false)
Result.collect().foreach(println)

I can't understand how the accumulator and combiner work here. That's the solution, and I need an explanation; the guy who gave it to me didn't explain — he's busy with other work and I don't want to disturb him. If you know, tell me — well and good; if not, that's OK.
vendetta Posted September 21, 2017 Author

2 hours ago, former said: Can you try Spark with Python? Why only Scala?

I need it in Scala only, and that too using only aggregateByKey.
vendetta Posted September 21, 2017 Author

4 hours ago, JavaBava said: Look at you, hoping to find in this DB what you couldn't even find on Google.. you're really something.

I posted late at night when this was driving me crazy — I was in full flow until I got stuck here and couldn't understand it, so I posted. I post on other forums, so why not here? I know plenty of people on this DB too, and last time when someone ran free Spark/Scala training classes, a lot of our folks attended — so I was hopeful. But now I wonder why I even posted here: whether they know the answer or not, people just serve up sarcasm. Anyway, even if I don't get this now, that's fine — the people who can explain it will be free next week. I know how to get the result using reduceByKey and groupByKey, but aggregateByKey felt tricky. All of this is just part of learning.
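For contrast with the reduceByKey route vendetta already knows: the same four aggregates can be done with one merge function if each value is first lifted into the accumulator shape. This is a plain-Scala stand-in (an illustrative sketch, not Spark: `groupBy` + `reduce` here simulate the shuffle, and the (max, count, sum, min) tuple layout is an assumption):

```scala
// Sample (column3, column2) pairs from the table above
val pairs = List((2, 10.00), (2, 30.00), (3, 16.96), (3, 18.06))

val viaReduce = pairs
  .map { case (k, v) => (k, (v, 1L, v, v)) }  // lift each value to (max, count, sum, min)
  .groupBy(_._1)                              // stand-in for the by-key shuffle
  .map { case (k, kvs) =>
    k -> kvs.map(_._2).reduce { (a, b) =>     // one merge function does all four aggregates
      (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4))
    }
  }
```

The trade-off aggregateByKey buys you is skipping the per-value lift: its zero value plays that role once per key instead of once per record.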
former Posted September 21, 2017

16 minutes ago, vendetta said: I need it in Scala only, and that too using only aggregateByKey.

This is the Scala code (the aggregateByKey one-liner you posted above). You're trying to understand what's going on in that Result variable?
vendetta Posted September 21, 2017 Author

3 minutes ago, former said: You're trying to understand what's going on in that Result variable?

Yes — the accumulator and combiner part, man.
vendetta Posted September 21, 2017 Author

If you know, PM me. That's it for now — I'll post again in the evening.
kasi Posted September 21, 2017

var Result = resultDF.map(a => (a(1).toString.toInt, a(2).toString.toDouble))
What this does: it takes each row from resultDF, picks out a(1) and a(2), and creates a pair RDD — each output element looks like (a(1), a(2)).

.aggregateByKey((Double.MinValue, 0L, 0.0, Double.MaxValue))
Now, per key of that pair RDD, you start a 4-tuple of values (max, count, sum, min); this step is the initialization. I changed some of the code below — I don't think your version will compile as posted:

(a, b) => (math.max(a._1, b), a._2 + 1, a._3 + b, math.min(a._4, b))
This is the first function (the accumulator, seqOp): within each partition it folds every value b for a key into the running tuple a.

(a, b) => (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4))
This is the second function (the combiner, combOp): it takes the per-partition tuples for the same key and merges them into the final 4-tuple of aggregates. .sortBy(_._1, false) then just sorts the output by key, descending.
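kasi's walkthrough can be checked without a cluster. Below is a plain-Scala simulation (an illustrative sketch, not Spark itself: the names `zero`/`seqOp`/`combOp` and the two-partition `splitAt` are assumptions made for the demo) showing exactly what the accumulator and the combiner each do to the sample (column3, column2) pairs:

```scala
// Per-key accumulator layout: (max, count, sum, min); avg = sum / count afterwards
val zero = (Double.MinValue, 0L, 0.0, Double.MaxValue)

// seqOp (the "accumulator"): fold one value into a partition-local tuple
def seqOp(acc: (Double, Long, Double, Double), v: Double) =
  (math.max(acc._1, v), acc._2 + 1, acc._3 + v, math.min(acc._4, v))

// combOp (the "combiner"): merge two partition-local tuples for the same key
def combOp(a: (Double, Long, Double, Double), b: (Double, Long, Double, Double)) =
  (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4))

val pairs = List((2, 10.00), (2, 30.00), (3, 16.96), (3, 18.06))

val result = pairs.groupBy(_._1).map { case (k, kvs) =>
  val (p1, p2) = kvs.map(_._2).splitAt(1)  // pretend the key's values sit on two partitions
  k -> combOp(p1.foldLeft(zero)(seqOp), p2.foldLeft(zero)(seqOp))
}
// result(2) == (30.0, 2, 40.0, 10.0)  i.e. max, count, sum, min for key 2
```

Note the zero value is neutral: combining it in changes nothing (max against Double.MinValue, adding 0, min against Double.MaxValue), which is what makes empty partitions safe. avg(column2) is just sum / count taken at the end — that is why the tuple carries both.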
former Posted September 22, 2017

8 hours ago, kasi said: (the step-by-step breakdown above)

@kasi what are a._1, b._1, a._2, b._2, etc., brother?? And does initializing in aggregateByKey mean passing reference values of the expected output data type?
kasi Posted September 22, 2017

a and b are both tuples — say a = (val1, val2) and b = (val3, val4). Then a._1 is val1, a._2 is val2, b._1 is val3, and b._2 is val4; ._n is just Scala's positional tuple accessor. And yes, the zero value you pass to aggregateByKey sets both the starting contents and the type of the per-key accumulator.
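A tiny concrete version of kasi's answer (the values are taken from the sample aggregation above; the variable names are just for the demo):

```scala
// a and b are two accumulator tuples being merged, shape (max, count, sum, min)
val a = (30.0, 2L, 40.0, 10.0)
val b = (18.06, 1L, 18.06, 18.06)

a._1  // 30.0  — first element of a
b._2  // 1     — second element of b

// merging them field by field, exactly as the combiner does:
val merged = (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4))
// merged._1 == 30.0, merged._2 == 3
```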