Sarvapindi Posted July 18, 2020 Author
14 minutes ago, SharkTank said:
In real time it's a minimum of 500 lines of code. For simple practice, around 30 lines is enough. There are plenty of datasets. Go to Kaggle and download a dataset. Install Scala and Spark, and also IntelliJ. Start practicing.
500? Then I'm done for...
kevinUsa Posted July 18, 2020
3 minutes ago, Sarvapindi said:
500? Then I'm done for...
It's easy, but you have to understand it.
Killer66 Posted July 18, 2020
Bhaiya, let big data go and look for something else. If coding doesn't come to you, surviving in big data is hard 😐
Picheshwar Posted July 18, 2020
5 minutes ago, Killer66 said:
Bhaiya, let big data go and look for something else. If coding doesn't come to you, surviving in big data is hard 😐
😂
kevinUsa Posted July 18, 2020
1) Create the folder on HDFS:
hadoop fs -mkdir //////*****
2) Start the spark-shell:
spark-shell --master yarn
3) Import packages:
import org.apache.spark.sql.functions.{expr, col, column, desc}
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
4) Remove rows with empty or zero values from the dataset (in Spark 2 the column inequality operator is =!=):
val af_temp = af.where((col("maker") =!= "") && (col("model") =!= "") && (col("mileage") =!= 0) && (col("manufacture_year") =!= 0) && (col("engine_displacement") =!= 0) && (col("engine_power") =!= 0) && (col("body_type") =!= "") && (col("color_slug") =!= "") && (col("stk_year") =!= 0) && (col("transmission") =!= "") && (col("door_count") =!= "") && (col("seat_count") =!= "") && (col("fuel_type") =!= "") && (col("date_created") =!= "") && (col("date_last_seen") =!= "") && (col("price_eur") =!= 0))
5) Remove null values from the dataset:
val cars_nullaf = af_temp.where(col("maker").isNotNull && col("model").isNotNull && col("mileage").isNotNull && col("manufacture_year").isNotNull && col("engine_displacement").isNotNull && col("engine_power").isNotNull && col("stk_year").isNotNull && col("transmission").isNotNull && col("door_count").isNotNull && col("seat_count").isNotNull && col("fuel_type").isNotNull && col("date_created").isNotNull && col("date_last_seen").isNotNull && col("price_eur").isNotNull)
6) Queries:
cars_nullaf.select("maker", "model", "engine_power", "transmission", "fuel_type").orderBy(desc("engine_power")).show()
cars_nullaf.createOrReplaceTempView("cars")
val sqlDF = spark.sql("SELECT * FROM cars")
val Total_number_rows = spark.sql("SELECT COUNT(*) FROM cars")
Total_number_rows.show()
val Number_of_models_by_manufacturer = spark.sql("SELECT maker, model, COUNT(model) AS top_car_models_sold FROM cars GROUP BY maker, model")
Number_of_models_by_manufacturer.show()
val Type_of_Transmissions_sold = spark.sql("SELECT transmission, COUNT(transmission) FROM cars GROUP BY transmission")
Type_of_Transmissions_sold.show()
val Type_of_Car_sold = spark.sql("SELECT maker, transmission, COUNT(*) FROM cars GROUP BY transmission, maker")
Type_of_Car_sold.show()
val AVG_Price_car_by_model = spark.sql("SELECT maker, model, AVG(price_eur) FROM cars GROUP BY maker, model")
AVG_Price_car_by_model.show()
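For comparison, the empty/zero and null filtering in steps 4-5 can be written as one short DataFrame chain instead of a long hand-written conjunction. This is only a sketch, assuming the same `af` DataFrame and a subset of the column names from the post above, and it assumes it runs inside spark-shell where `spark` and the implicits are already in scope:

```scala
// Sketch: compact version of the cleanup in steps 4-5 above.
// Assumes `af` and its column names from the post; run inside spark-shell.
import org.apache.spark.sql.functions.col

val stringCols  = Seq("maker", "model", "body_type", "transmission", "fuel_type")
val numericCols = Seq("mileage", "manufacture_year", "engine_power", "price_eur")

val cleaned = af
  .na.drop(stringCols ++ numericCols)                      // drop rows with nulls in these columns
  .filter(stringCols.map(c => col(c) =!= "").reduce(_ && _)) // drop empty strings
  .filter(numericCols.map(c => col(c) =!= 0).reduce(_ && _)) // drop zero values
```

Building the predicate with `map`/`reduce` means adding a column is a one-word change instead of editing a 16-term boolean expression.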
kevinUsa Posted July 18, 2020
The above is the code I have used.
SharkTank Posted July 18, 2020 (edited)
22 minutes ago, Sarvapindi said:
500? Then I'm done for...
If you have interest, anything becomes easy bro... the start always feels hard, but when you get into it and do some hard work you will start enjoying it. Hard work is a must. Practise, practise coding. First practise Spark bro, it's simple; then slowly move to application building, where OOPS concepts are a must... and from there you keep learning more and more.
Edited July 18, 2020 by SharkTank
SharkTank Posted July 18, 2020
11 minutes ago, kevinUsa said:
(the spark-shell code posted above)
Try to implement everything using DataFrames in a single shot. Limit Spark SQL usage as much as you can, because a view occupies space. Use Spark 2. There's nothing wrong in this; post the error. As I said, run
spark.conf.set("spark.sql.shuffle.partitions", 2000)
on your spark-scala console. This should fix it.
spark.conf.set("spark.sql.shuffle.partitions", 100)
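As a sketch of what "everything using DataFrames" could look like, here is one of the temp-view queries from the earlier post rewritten with `groupBy`/`agg`, so no view is registered at all. It assumes the `cars_nullaf` DataFrame from kevinUsa's post and a running spark-shell session:

```scala
// Sketch: "models per manufacturer" query without a temp view.
// Assumes `cars_nullaf` from the earlier post; `spark` is the shell's session.
import org.apache.spark.sql.functions.{count, avg, desc}

// Tune shuffle parallelism before wide operations like groupBy, as suggested above.
spark.conf.set("spark.sql.shuffle.partitions", 100)

val modelsByMaker = cars_nullaf
  .groupBy("maker", "model")
  .agg(count("model").as("top_car_models_sold"),
       avg("price_eur").as("avg_price_eur"))
  .orderBy(desc("top_car_models_sold"))
modelsByMaker.show()
```

The DataFrame version goes through the same Catalyst optimizer as the SQL string, so this is a readability/housekeeping choice rather than a performance one.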
raaajaaa Posted July 18, 2020
19 minutes ago, Sarvapindi said:
500? Then I'm done for...
Why are you so scared, man? Learn Python basics and try for a job in automation testing with Python.
SharkTank Posted July 18, 2020
@Sarvapindi bro https://sparkbyexamples.com/category/spark/
This site would help you; the Spark examples are good. Spark alone won't be enough... there is more. In some projects we don't use Spark at all; we use only the Hadoop stack and Oozie workflows for scheduling jobs. Good luck. Bye.
kevinUsa Posted July 18, 2020
2 minutes ago, SharkTank said:
Try to implement everything using DataFrames in a single shot. Limit Spark SQL usage as much as you can, because a view occupies space. Use Spark 2. There's nothing wrong in this; post the error. As I said, run spark.conf.set("spark.sql.shuffle.partitions", 2000) on your spark-scala console. This should fix it.
Still learning it, bro. If I have any doubts I'll PM you. If you have any good material, PM me or post it here. BTW, is my approach correct? I used GCP for this.
soodhilodaaram Posted July 18, 2020
2 hours ago, Sarvapindi said:
What are big data real-time scenarios like? Tell us a bit.
They're real.
kevinUsa Posted July 18, 2020
3 minutes ago, soodhilodaaram said:
They're real.
You didn't actually say anything!!!!
Sarvapindi Posted July 18, 2020 Author
27 minutes ago, Killer66 said:
Bhaiya, let big data go and look for something else. If coding doesn't come to you, surviving in big data is hard 😐
In the end that's what it will come to.
Sarvapindi Posted July 18, 2020 Author
38 minutes ago, Killer66 said:
Bhaiya, let big data go and look for something else. If coding doesn't come to you, surviving in big data is hard 😐
The pay isn't great either. In the current situation they're offering only about what reporting tools fetch, damn it... if that's the case it's a waste. If we're really strong at it we can demand more; otherwise it's just a waste.