Big data guys, come on in....


Sarvapindi

1 hour ago, kevinUsa said:


1) Creating the folder on HDFS
hadoop fs -mkdir//////*****
2) Starting spark-shell
spark-shell --master yarn
3) Importing packages
// desc is added to the import because the orderBy further down uses it
import org.apache.spark.sql.functions.{expr, col, column, desc}
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)

4) Removing rows with empty or zero values from the data set

// `af` is the cars DataFrame; the step that loads it is not shown in this post.
// `=!=` is used instead of `!==`, which is deprecated since Spark 2.0.
val af_temp = af.where(
  (col("maker") =!= "") && (col("model") =!= "") && (col("mileage") =!= 0) &&
  (col("manufacture_year") =!= 0) && (col("engine_displacement") =!= 0) &&
  (col("engine_power") =!= 0) && (col("body_type") =!= "") && (col("color_slug") =!= "") &&
  (col("stk_year") =!= 0) && (col("transmission") =!= "") && (col("door_count") =!= "") &&
  (col("seat_count") =!= "") && (col("fuel_type") =!= "") && (col("date_created") =!= "") &&
  (col("date_last_seen") =!= "") && (col("price_eur") =!= 0)
)

// Same filter without the color_slug check; in the shell this definition replaces the one above.
val af_temp = af.where(
  (col("maker") =!= "") && (col("model") =!= "") && (col("mileage") =!= 0) &&
  (col("manufacture_year") =!= 0) && (col("engine_displacement") =!= 0) &&
  (col("engine_power") =!= 0) && (col("body_type") =!= "") &&
  (col("stk_year") =!= 0) && (col("transmission") =!= "") && (col("door_count") =!= "") &&
  (col("seat_count") =!= "") && (col("fuel_type") =!= "") && (col("date_created") =!= "") &&
  (col("date_last_seen") =!= "") && (col("price_eur") =!= 0)
)

5) Removing rows with null values from the data set

val cars_nullaf = af_temp.where(
  (col("maker").isNotNull) && (col("model").isNotNull) && (col("mileage").isNotNull) &&
  (col("manufacture_year").isNotNull) && (col("engine_displacement").isNotNull) &&
  (col("engine_power").isNotNull) && (col("stk_year").isNotNull) &&
  (col("transmission").isNotNull) && (col("door_count").isNotNull) &&
  (col("seat_count").isNotNull) && (col("fuel_type").isNotNull) &&
  (col("date_created").isNotNull) && (col("date_last_seen").isNotNull) &&
  (col("price_eur").isNotNull)
)
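
Steps 4 and 5 could also be written more compactly by building the conditions from lists of column names. This is only a minimal sketch, assuming Spark 2.x or later (for example inside spark-shell). The small `af` DataFrame, its sample rows, and the names `stringCols`, `numericCols`, and `cleaned` are made up for illustration and cover just a few of the columns used above:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// In spark-shell a session already exists; getOrCreate() then simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cars-cleaning-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the real cars data set.
val rows: Seq[(Option[String], Option[String], Option[Int], Option[Double])] = Seq(
  (Some("skoda"), Some("octavia"), Some(150000), Some(10584.0)), // clean row
  (Some(""),      Some("fabia"),   Some(90000),  Some(5000.0)),  // empty maker
  (Some("audi"),  None,            Some(0),      Some(20000.0)), // null model, zero mileage
  (Some("bmw"),   Some("x5"),      Some(120000), None)           // null price
)
val af = rows.toDF("maker", "model", "mileage", "price_eur")

// Assumed split of the columns into text and numeric ones.
val stringCols  = Seq("maker", "model")
val numericCols = Seq("mileage", "price_eur")

// One condition per column: not null and not empty (text) or not zero (numeric).
val conditions: Seq[Column] =
  stringCols.map(c => col(c).isNotNull && (col(c) =!= "")) ++
  numericCols.map(c => col(c).isNotNull && (col(c) =!= 0))

// A row is kept only if every condition holds.
val cleaned = af.where(conditions.reduce(_ && _))
cleaned.show()
```

A comparison against null evaluates to null in Spark SQL, and `where` drops such rows, so the `=!=` checks alone already remove nulls in the compared columns. The explicit `isNotNull` mainly makes the intent visible.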


cars_nullaf.select("maker", "model", "engine_power", "transmission", "fuel_type").orderBy(desc("engine_power")).show()


cars_nullaf.createOrReplaceTempView("cars")

val sqlDF = spark.sql("SELECT * FROM cars")

// COUNT(*) returns the number of rows in the view, not the number of columns.
val Total_number_columns = spark.sql("SELECT COUNT(*) FROM cars")

Total_number_columns.show()


val Number_of_models_by_manufactuer = spark.sql("SELECT model, maker, COUNT(model) FROM cars GROUP BY maker, model")

Number_of_models_by_manufactuer.show()

// Same query with the count column given an alias
val Number_of_models_by_manufactuer = spark.sql("SELECT maker, model, COUNT(model) AS top_car_models_sold FROM cars GROUP BY maker, model")

Number_of_models_by_manufactuer.show()

val Type_of_Tranmissions_sold = spark.sql("SELECT transmission, COUNT(transmission) FROM cars GROUP BY transmission")

Type_of_Tranmissions_sold.show()


val Type_of_Car_sold = spark.sql("SELECT maker, transmission, COUNT(*) FROM cars GROUP BY transmission, maker")

Type_of_Car_sold.show()

val AVG_Price_car_by_model = spark.sql("SELECT maker, model, AVG(price_eur) FROM cars GROUP BY maker, model")
AVG_Price_car_by_model.show()
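
The GROUP BY queries above could also be expressed with the DataFrame API instead of SQL strings. This is a rough sketch, assuming Spark 2.x or later. The `sampleCars` data, the `modelsByMaker` name, and the `avg_price_eur` alias are made up for illustration, and the sketch combines the count and the average in one aggregation rather than in separate queries:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, count, desc}

// In spark-shell a session already exists; getOrCreate() then simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cars-aggregation-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample rows standing in for the cleaned cars data.
val sampleCars = Seq(
  ("skoda", "octavia", "man",  10000.0),
  ("skoda", "octavia", "auto", 12000.0),
  ("skoda", "fabia",   "man",   5000.0),
  ("audi",  "a4",      "auto", 20000.0)
).toDF("maker", "model", "transmission", "price_eur")

// Count and average price per (maker, model), most frequent model first.
val modelsByMaker = sampleCars
  .groupBy("maker", "model")
  .agg(
    count(col("model")).as("top_car_models_sold"),
    avg(col("price_eur")).as("avg_price_eur")
  )
  .orderBy(desc("top_car_models_sold"))

modelsByMaker.show()
```

Column names are checked when the query is analyzed either way, but with the API version a typo in a method name or a missing parenthesis is caught by the Scala compiler instead of at run time inside a SQL string.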


 

Hey @kevinUsa, how do you know this much programming?? I thought you were just here to pass the time, like my grandpa @raaajaaa


2 minutes ago, Sarvapindi said:

Okay... add one called "Python learning" too, bro... we can use it for Python doubts...

Post the thread... and give me the link.

I will add it.


4 minutes ago, Sarvapindi said:

How do I post the thread link?

You recently started a thread saying programming is difficult, right? Bump that one up... I will add that.


21 minutes ago, Sarvapindi said:

Big data isn't going anywhere... it's just that new tech keeps coming..

I meant that companies no longer need Bigdata tech, since AWS provides many Bigdata features instantly through its various tools. It's the start of the AWS era.


5 hours ago, Sarvapindi said:

What do real-world big data scenarios look like? Please explain a bit.

I have been working in this for 5 years, bro.

You have to know the following.

1. Hadoop framework tools (Sqoop, Oozie, Hive, Pig, HBase, HDFS, etc.).

2. SQL querying, performance tuning

3. Python, Java/Scala languages

4. APIs

5. Hands-on Linux environment / scripting

6. NoSQL, like MongoDB

7. And on top of that, the latest tools that keep coming: Flink, Beam, Kafka, and the list goes on.



28 minutes ago, jajjanaka_jandri said:

I have been working in this for 5 years, bro.

You have to know the following.

1. Hadoop framework tools (Sqoop, Oozie, Hive, Pig, HBase, HDFS, etc.).

2. SQL querying, performance tuning

3. Python, Java/Scala languages

4. APIs

5. Hands-on Linux environment / scripting

6. NoSQL, like MongoDB

7. And on top of that, the latest tools that keep coming: Flink, Beam, Kafka, and the list goes on.


If you need to know all of this, the billing rate must be pretty good, right? What is it generally these days?
