dasari4kntr Posted July 18, 2020. 57 minutes ago, kevinUsa said: "@dasari4kntr sir, please add this one too, under Scala." Added.
Tellugodu Posted July 18, 2020. 1 hour ago, kevinUsa said:

1) creating the folder on HDFS
hadoop fs -mkdir//////*****

2) starting spark-shell
spark-shell --master yarn

3) importing packages
import org.apache.spark.sql.functions.{expr, col, column, desc}
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)

4) removing rows with empty or zero values from the data set
// af is the source DataFrame; the step that loads it is not shown in the post.
// =!= is used in place of !==, which is deprecated since Spark 2.0.
val af_temp = af.where(
  (col("maker") =!= "") &&
  (col("model") =!= "") &&
  (col("mileage") =!= 0) &&
  (col("manufacture_year") =!= 0) &&
  (col("engine_displacement") =!= 0) &&
  (col("engine_power") =!= 0) &&
  (col("body_type") =!= "") &&
  (col("color_slug") =!= "") &&
  (col("stk_year") =!= 0) &&
  (col("transmission") =!= "") &&
  (col("door_count") =!= "") &&
  (col("seat_count") =!= "") &&
  (col("fuel_type") =!= "") &&
  (col("date_created") =!= "") &&
  (col("date_last_seen") =!= "") &&
  (col("price_eur") =!= 0))

// Same filter without the color_slug condition; in spark-shell this redefinition replaces the one above.
val af_temp = af.where(
  (col("maker") =!= "") &&
  (col("model") =!= "") &&
  (col("mileage") =!= 0) &&
  (col("manufacture_year") =!= 0) &&
  (col("engine_displacement") =!= 0) &&
  (col("engine_power") =!= 0) &&
  (col("body_type") =!= "") &&
  (col("stk_year") =!= 0) &&
  (col("transmission") =!= "") &&
  (col("door_count") =!= "") &&
  (col("seat_count") =!= "") &&
  (col("fuel_type") =!= "") &&
  (col("date_created") =!= "") &&
  (col("date_last_seen") =!= "") &&
  (col("price_eur") =!= 0))

5) removing null values from the data set
val cars_nullaf = af_temp.where(
  (col("maker").isNotNull) &&
  (col("model").isNotNull) &&
  (col("mileage").isNotNull) &&
  (col("manufacture_year").isNotNull) &&
  (col("engine_displacement").isNotNull) &&
  (col("engine_power").isNotNull) &&
  (col("stk_year").isNotNull) &&
  (col("transmission").isNotNull) &&
  (col("door_count").isNotNull) &&
  (col("seat_count").isNotNull) &&
  (col("fuel_type").isNotNull) &&
  (col("date_created").isNotNull) &&
  (col("date_last_seen").isNotNull) &&
  (col("price_eur").isNotNull))

// Show the cars ordered by engine power, highest first.
cars_nullaf.select("maker", "model", "engine_power", "transmission", "fuel_type").orderBy(desc("engine_power")).show()

// Register the cleaned data as a temporary view so it can be queried with SQL.
cars_nullaf.createOrReplaceTempView("cars")
val sqlDF = spark.sql("SELECT * FROM cars")

// COUNT(*) returns the number of rows, despite the variable name.
val Total_number_columns = spark.sql("SELECT COUNT(*) FROM cars")
Total_number_columns.show()

val Number_of_models_by_manufactuer = spark.sql("SELECT model, maker, COUNT(model) FROM cars GROUP BY maker, model")
Number_of_models_by_manufactuer.show()

val Number_of_models_by_manufactuer = spark.sql("SELECT maker, model, COUNT(model) AS top_car_models_sold FROM cars GROUP BY maker, model")
Number_of_models_by_manufactuer.show()

val Type_of_Tranmissions_sold = spark.sql("SELECT transmission, COUNT(transmission) FROM cars GROUP BY transmission")
Type_of_Tranmissions_sold.show()

val Type_of_Car_sold = spark.sql("SELECT maker, transmission, COUNT(*) FROM cars GROUP BY transmission, maker")
Type_of_Car_sold.show()

val AVG_Price_car_by_model = spark.sql("SELECT maker, model, AVG(price_eur) FROM cars GROUP BY maker, model")
AVG_Price_car_by_model.show()

Hey @kevinUsa, how did you pick up this much programming?? I thought you were a timepass type like my grandpa @raaajaaa.
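The long filters in steps 4 and 5 of the quoted post repeat the same check for every column, so they can be built from lists of column names instead. The following is only a minimal sketch, not kevinUsa's code: the object name `CleanCars` and the helper `dropIncomplete` are made up for illustration, only a few of the post's columns are listed, and the split into text and numeric columns is an assumption because the post never shows the schema. It uses `=!=`, the Spark 2.x replacement for the deprecated `!==`.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

// Hypothetical helper, not part of the original post.
object CleanCars {
  // Column names come from the post; which are text and which are numeric is an assumption.
  val textCols: Seq[String] = Seq("maker", "model", "transmission", "fuel_type")
  val numericCols: Seq[String] = Seq("mileage", "engine_power", "price_eur")

  /** Keep only rows where every listed column is non-null and neither empty nor zero. */
  def dropIncomplete(df: DataFrame): DataFrame = {
    // One condition per column: not null, and not the "missing" marker for its type.
    val textChecks: Seq[Column] = textCols.map(c => col(c).isNotNull && (col(c) =!= ""))
    val numericChecks: Seq[Column] = numericCols.map(c => col(c).isNotNull && (col(c) =!= 0))
    // Combine all conditions with AND, like the chained && in the post.
    df.where((textChecks ++ numericChecks).reduce(_ && _))
  }
}
```

In spark-shell, `val cars_nullaf = CleanCars.dropIncomplete(af)` would stand in for steps 4 and 5, assuming `af` is already loaded; adding or removing a column then means editing one of the two lists rather than the whole expression.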
Sarvapindi (Author) Posted July 18, 2020. 58 minutes ago, dasari4kntr said: "Added." Where was it added?
dasari4kntr Posted July 18, 2020. 1 minute ago, Sarvapindi said: "Where was it added?"
Sarvapindi (Author) Posted July 18, 2020. In reply to dasari4kntr's post from 2 minutes ago: Okay... add one called "Python learning" too, bro. We can use it for Python doubts.
dasari4kntr Posted July 18, 2020. 2 minutes ago, Sarvapindi said: "Okay... add one called 'Python learning' too, bro. We can use it for Python doubts." Post the thread and give me the link; I will add it.
Sarvapindi (Author) Posted July 18, 2020. 7 minutes ago, dasari4kntr said: "Post the thread and give me the link; I will add it." How do I post a thread link?
acuman Posted July 18, 2020. 4 hours ago, Sarvapindi said: "What do real-time Big Data scenarios look like? Please explain." Big Data is gone, because of AWS.
Sarvapindi (Author) Posted July 18, 2020. Just now, acuman said: "Big Data is gone, because of AWS." Big Data isn't going anywhere... there are just new technologies coming.
dasari4kntr Posted July 18, 2020. 3 minutes ago, Sarvapindi said: "How do I post a thread link?" Go to the thread, copy the URL, and post it in the AFDB Directory thread; I will add it.
dasari4kntr Posted July 18, 2020. 4 minutes ago, Sarvapindi said: "How do I post a thread link?" You recently started a thread saying programming is difficult, right? Bump that one up; I will add that.
Sarvapindi (Author) Posted July 18, 2020. 4 minutes ago, acuman said: "Big Data is gone, because of AWS." What is there in AWS...?
acuman Posted July 18, 2020. 21 minutes ago, Sarvapindi said: "Big Data isn't going anywhere... there are just new technologies coming." I meant that companies no longer need separate Big Data tech, because AWS provides many Big Data features instantly through its various tools. It's the start of the AWS era.
jajjanaka_jandri Posted July 18, 2020. 5 hours ago, Sarvapindi said: "What do real-time Big Data scenarios look like? Please explain." I have been working in this for 5 years, bro. You have to know the following: 1. Hadoop framework tools (Sqoop, Oozie, Hive, Pig, HBase, HDFS, etc.). 2. SQL querying and performance tuning. 3. Python and Java/Scala. 4. APIs. 5. Hands-on Linux and scripting. 6. NoSQL, e.g. MongoDB. 7. On top of that, the newer tools that keep coming: Flink, Beam, Kafka, and the list goes on.
Sarvapindi (Author) Posted July 18, 2020. 28 minutes ago, jajjanaka_jandri said: "I have been working in this for 5 years, bro. You have to know the following: 1. Hadoop framework tools (Sqoop, Oozie, Hive, Pig, HBase, HDFS, etc.). 2. SQL querying and performance tuning. 3. Python and Java/Scala. 4. APIs. 5. Hands-on Linux and scripting. 6. NoSQL, e.g. MongoDB. 7. On top of that, the newer tools that keep coming: Flink, Beam, Kafka, and the list goes on." If you need to know all of that, the billing rate must be pretty good, right? What is it generally these days?