
Big data guys, come on over....


Recommended Posts

Posted
14 minutes ago, SharkTank said:

In real time it depends, minimum 500 lines of code. For simple practice it's around 30 lines. There are plenty of datasets. Go to Kaggle and download a dataset. Install Scala, Spark and also IntelliJ. Start practicing.

500?... then I'm done for

Posted
3 minutes ago, Sarvapindi said:

500?... then I'm done for

It's easy, but you have to understand it.

Posted

Bhaiya, take big data lightly and look for something else. If you can't code, it's hard to survive in big data 😐

Posted
5 minutes ago, Killer66 said:

Bhaiya, take big data lightly and look for something else. If you can't code, it's hard to survive in big data 😐

😂

Posted


1) Create a folder on HDFS
hadoop fs -mkdir //////*****
2) Start the Spark shell
spark-shell --master yarn
3) Import packages
import org.apache.spark.sql.functions.{expr, col, column, desc}
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc) // SQLContext is deprecated in Spark 2.x; the built-in `spark` session is used for the SQL queries below
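
Note: the steps below filter a DataFrame called af that isn't shown in this post; a rough sketch of how it could have been loaded (the HDFS path and read options here are just guesses):

// Hypothetical load of the cars dataset into `af` (path and options are assumptions)
val af = spark.read
  .option("header", "true")      // first row contains the column names
  .option("inferSchema", "true") // let Spark infer the column types
  .csv("hdfs:///path/to/cars.csv")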

4) Remove rows with empty or zero values from the dataset

val af_temp = af.where((col("maker") =!= "") && (col("model") =!= "") && (col("mileage") =!= 0) && (col("manufacture_year") =!= 0) && (col("engine_displacement") =!= 0) && (col("engine_power") =!= 0) && (col("body_type") =!= "") && (col("color_slug") =!= "") && (col("stk_year") =!= 0) && (col("transmission") =!= "") && (col("door_count") =!= "") && (col("seat_count") =!= "") && (col("fuel_type") =!= "") && (col("date_created") =!= "") && (col("date_last_seen") =!= "") && (col("price_eur") =!= 0))

// Same filter re-run without the color_slug condition
val af_temp = af.where((col("maker") =!= "") && (col("model") =!= "") && (col("mileage") =!= 0) && (col("manufacture_year") =!= 0) && (col("engine_displacement") =!= 0) && (col("engine_power") =!= 0) && (col("body_type") =!= "") && (col("stk_year") =!= 0) && (col("transmission") =!= "") && (col("door_count") =!= "") && (col("seat_count") =!= "") && (col("fuel_type") =!= "") && (col("date_created") =!= "") && (col("date_last_seen") =!= "") && (col("price_eur") =!= 0))
5) Remove null values from the dataset

val cars_nullaf = af_temp.where((col("maker").isNotNull) && (col("model").isNotNull) && (col("mileage").isNotNull) && (col("manufacture_year").isNotNull) && (col("engine_displacement").isNotNull) && (col("engine_power").isNotNull) && (col("stk_year").isNotNull) && (col("transmission").isNotNull) && (col("door_count").isNotNull) && (col("seat_count").isNotNull) && (col("fuel_type").isNotNull) && (col("date_created").isNotNull) && (col("date_last_seen").isNotNull) && (col("price_eur").isNotNull))

cars_nullaf.select("maker", "model", "engine_power", "transmission", "fuel_type").orderBy(desc("engine_power")).show()


cars_nullaf.createOrReplaceTempView("cars")

val sqlDF = spark.sql("SELECT * FROM cars")

// Counts the rows (not columns) in the view
val Total_number_columns = spark.sql("SELECT COUNT(*) FROM cars")

Total_number_columns.show()

val Number_of_models_by_manufactuer = spark.sql("SELECT model, maker, COUNT(model) FROM cars GROUP BY maker, model")

Number_of_models_by_manufactuer.show()

// Same query, with an alias on the count column
val Number_of_models_by_manufactuer = spark.sql("SELECT maker, model, COUNT(model) AS top_car_models_sold FROM cars GROUP BY maker, model")

Number_of_models_by_manufactuer.show()

val Type_of_Tranmissions_sold = spark.sql("SELECT transmission, COUNT(transmission) FROM cars GROUP BY transmission")

Type_of_Tranmissions_sold.show()

val Type_of_Car_sold = spark.sql("SELECT maker, transmission, COUNT(*) FROM cars GROUP BY transmission, maker")

Type_of_Car_sold.show()

val AVG_Price_car_by_model = spark.sql("SELECT maker, model, AVG(price_eur) FROM cars GROUP BY maker, model")

AVG_Price_car_by_model.show()


 

Posted (edited)
22 minutes ago, Sarvapindi said:

500?... then I'm done for

If you have interest anything becomes easy, bro....... the start always feels hard; when you get into it and do some hard work you will start enjoying it. Hard work is a must. Practice, practice coding. First practice Spark, bro, it's simple; then slowly application building, OOPs concepts are a must... and like that you keep learning more and more as you go.
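
Just to give an idea of the application-building step, a rough sketch of the same kind of practice as a small standalone Spark job instead of the shell (the object name and passing the CSV path as an argument are only illustrative):

import org.apache.spark.sql.SparkSession

// Hypothetical minimal standalone job; build with sbt and run with spark-submit
object CarsPractice {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CarsPractice").getOrCreate()

    // args(0) = path to the CSV downloaded from Kaggle
    val cars = spark.read.option("header", "true").csv(args(0))
    cars.groupBy("maker").count().show()

    spark.stop()
  }
}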

Edited by SharkTank
Posted
11 minutes ago, kevinUsa said:


[full code from the post above, quoted]

Try to implement everything using DataFrames in a single shot. The more you limit Spark SQL usage the better, because a view occupies space. Use Spark 2. Now, what's the problem here? Post the error. As I said, run

spark.conf.set("spark.sql.shuffle.partitions", 2000) on your Spark Scala console. This should fix it.

spark.conf.set("spark.sql.shuffle.partitions", 100)
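
For example, the "number of models by manufacturer" query above could be done straight on the DataFrame without a temp view (a rough sketch, reusing the column names from the earlier post):

import org.apache.spark.sql.functions.{col, count}

// DataFrame-API version of the "models per manufacturer" query -- no createOrReplaceTempView needed
val modelsPerMaker = cars_nullaf
  .groupBy("maker", "model")
  .agg(count(col("model")).as("top_car_models_sold"))
  .orderBy(col("top_car_models_sold").desc)

modelsPerMaker.show()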
Posted
19 minutes ago, Sarvapindi said:

500?... then I'm done for

Why are you getting so scared, man? Learn basic Python and try for a job in automation testing with Python.

Posted
2 minutes ago, SharkTank said:

Try to implement everything using DataFrames in a single shot. The more you limit Spark SQL usage the better, because a view occupies space. Use Spark 2. Now, what's the problem here? Post the error. As I said, run

spark.conf.set("spark.sql.shuffle.partitions", 2000) on your Spark Scala console. This should fix it.

spark.conf.set("spark.sql.shuffle.partitions", 100)

I'm still learning that, bro.

If I have any doubts I'll PM you; if you have any good material, PM me,

or post it here.

Btw, is my approach correct?

I used GCP for this.

 

Posted
2 hours ago, Sarvapindi said:

How do big data real-time scenarios actually look.. tell us a bit

they look real

Posted
3 minutes ago, soodhilodaaram said:

they look real

What an answer!!!!

Posted
27 minutes ago, Killer66 said:

Bhaiya, take big data lightly and look for something else. If you can't code, it's hard to survive in big data 😐

in the end, that's what it comes down to

Posted
38 minutes ago, Killer66 said:

Bhaiya, take big data lightly and look for something else. If you can't code, it's hard to survive in big data 😐

The pay isn't great either.. in the current situation they're offering big data the same rates as reporting tools, damn it... if that's how it is, it's a waste. If we're really strong we can demand more, otherwise it's just a waste.
