
Big data guys, come over here....

14 minutes ago, SharkTank said:

Real-time work needs a minimum of 500 lines of code. For simple practice, around 30 lines is enough. There are plenty of datasets. Go to Kaggle and download a dataset. Install Scala, Spark, and also IntelliJ. Start practicing.

500? Then I'm done for...



Most Popular Posts

  • Sarvapindi

    If you know Java, why all this hassle...

  • Sarvapindi

    Knowing SQL doesn't mean PySpark comes for free... PySpark means Python on Spark... you can use SQL inside PySpark... knowing either Python or Scala, one of the two, is enough...

  • SharkTank

    These days they expect you to set up the DevOps infrastructure too...

3 minutes ago, Sarvapindi said:

500? Then I'm done for...

It's easy, but you have to understand it.

Bhaiya, take big data lightly and look for something else. If coding doesn't come to you, surviving in big data is hard 😐

5 minutes ago, Killer66 said:

Bhaiya, take big data lightly and look for something else. If coding doesn't come to you, surviving in big data is hard 😐

😂


1) Create the folder on HDFS:

hadoop fs -mkdir //////*****

2) Start the Spark shell:

spark-shell --master yarn

3) Import packages:

import org.apache.spark.sql.functions.{expr, col, column, desc}
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)

4) Filter empty and zero placeholder values out of the data set:

// Note: Spark 2 uses =!= for column inequality (the old !== operator was removed).
val af_temp = af.where((col("maker") =!= "") && (col("model") =!= "") && (col("mileage") =!= 0) && (col("manufacture_year") =!= 0) && (col("engine_displacement") =!= 0) && (col("engine_power") =!= 0) && (col("body_type") =!= "") && (col("color_slug") =!= "") && (col("stk_year") =!= 0) && (col("transmission") =!= "") && (col("door_count") =!= "") && (col("seat_count") =!= "") && (col("fuel_type") =!= "") && (col("date_created") =!= "") && (col("date_last_seen") =!= "") && (col("price_eur") =!= 0))

// Variant without the color_slug filter:
val af_temp = af.where((col("maker") =!= "") && (col("model") =!= "") && (col("mileage") =!= 0) && (col("manufacture_year") =!= 0) && (col("engine_displacement") =!= 0) && (col("engine_power") =!= 0) && (col("body_type") =!= "") && (col("stk_year") =!= 0) && (col("transmission") =!= "") && (col("door_count") =!= "") && (col("seat_count") =!= "") && (col("fuel_type") =!= "") && (col("date_created") =!= "") && (col("date_last_seen") =!= "") && (col("price_eur") =!= 0))

5) Remove null values from the data set:

val cars_nullaf = af_temp.where(col("maker").isNotNull && col("model").isNotNull && col("mileage").isNotNull && col("manufacture_year").isNotNull && col("engine_displacement").isNotNull && col("engine_power").isNotNull && col("stk_year").isNotNull && col("transmission").isNotNull && col("door_count").isNotNull && col("seat_count").isNotNull && col("fuel_type").isNotNull && col("date_created").isNotNull && col("date_last_seen").isNotNull && col("price_eur").isNotNull)

cars_nullaf.select("maker", "model", "engine_power", "transmission", "fuel_type").orderBy(desc("engine_power")).show()

cars_nullaf.createOrReplaceTempView("cars")

val sqlDF = spark.sql("SELECT * FROM cars")

// Counts rows, not columns, despite the name.
val Total_number_columns = spark.sql("SELECT COUNT(*) FROM cars")
Total_number_columns.show()

val Number_of_models_by_manufactuer = spark.sql("SELECT maker, model, COUNT(model) AS top_car_models_sold FROM cars GROUP BY maker, model")
Number_of_models_by_manufactuer.show()

val Type_of_Tranmissions_sold = spark.sql("SELECT transmission, COUNT(transmission) FROM cars GROUP BY transmission")
Type_of_Tranmissions_sold.show()

val Type_of_Car_sold = spark.sql("SELECT maker, transmission, COUNT(*) FROM cars GROUP BY transmission, maker")
Type_of_Car_sold.show()

val AVG_Price_car_by_model = spark.sql("SELECT maker, model, AVG(price_eur) FROM cars GROUP BY maker, model")
AVG_Price_car_by_model.show()

The above is the code I have used.
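The long && chains above can also be collapsed. A minimal sketch, assuming the same af DataFrame and Spark 2.x; the split into string vs. numeric columns below is inferred from the filters in the post, so adjust it to the real schema:

```scala
import org.apache.spark.sql.functions.col

// Columns filtered in the post, grouped by the placeholder value they use.
val stringCols  = Seq("maker", "model", "body_type", "transmission",
                      "door_count", "seat_count", "fuel_type",
                      "date_created", "date_last_seen")
val numericCols = Seq("mileage", "manufacture_year", "engine_displacement",
                      "engine_power", "stk_year", "price_eur")

// Build one predicate instead of a giant hand-written && chain.
val notPlaceholder = (stringCols.map(c => col(c) =!= "") ++
                      numericCols.map(c => col(c) =!= 0)).reduce(_ && _)

// na.drop removes rows with nulls in the listed columns in one call.
val cleaned = af.na.drop(stringCols ++ numericCols).where(notPlaceholder)
```

This replaces steps 4 and 5 in one pass, and adding or removing a column means editing a list instead of a long expression.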

22 minutes ago, Sarvapindi said:

500? Then I'm done for...

If you have interest, anything becomes easy, bro... the start always feels hard; when you get into it and put in some hard work, you'll start enjoying it. Hard work is a must. Practice, practice coding. First practice Spark, bro, it's simple, then slowly move to application building; OOP concepts are a must... and you keep learning more and more from there.

Edited by SharkTank

11 minutes ago, kevinUsa said:


[quotes the full code post above]
Try to implement everything using DataFrames in a single shot. The more you limit Spark SQL usage, the better, because views occupy space. Use Spark 2. What's the problem here now? Post the error. As I said, run

spark.conf.set("spark.sql.shuffle.partitions", 2000)

in your Spark Scala console. That should fix it.


 
spark.conf.set("spark.sql.shuffle.partitions",100)
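A quick note on what that setting does, as a hedged sketch for Spark 2.x: spark.sql.shuffle.partitions sets how many partitions Spark SQL uses after a shuffle (joins, GROUP BY); the default is 200. Raising it spreads a huge shuffle over more, smaller tasks; lowering it avoids thousands of near-empty tasks on a small dataset.

```scala
// Sketch (Spark 2.x spark-shell): inspect, then tune, shuffle parallelism.
// 2000 and 100 are the values suggested in this thread, not universal answers.
spark.conf.get("spark.sql.shuffle.partitions")        // default is "200"
spark.conf.set("spark.sql.shuffle.partitions", 2000)  // big cluster-wide shuffles
spark.conf.set("spark.sql.shuffle.partitions", 100)   // small or local experiments
```

The setting only affects shuffles triggered after it is set, so run it before the aggregation, not after.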
19 minutes ago, Sarvapindi said:

500? Then I'm done for...

Why are you so scared, man? Learn Python basics and try for an automation-testing job with Python.

@Sarvapindi bro 

https://sparkbyexamples.com/category/spark/

This site would help you; the Spark examples are good. Knowing only Spark won't be enough... there is more; some projects don't use Spark at all, we use only the Hadoop stack and Oozie workflows for scheduling jobs.

Good luck. Bye 
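For anyone curious what the "Oozie workflows for scheduling jobs" part looks like, here is a hedged sketch of a minimal Oozie workflow running a Spark action; the app name, class, and jar path are hypothetical placeholders, not anything from this thread:

```xml
<!-- Minimal Oozie workflow sketch; daily-etl, com.example.Etl, and the
     jar path below are made-up placeholders. -->
<workflow-app name="daily-etl" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-etl"/>
  <action name="spark-etl">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>daily-etl</name>
      <class>com.example.Etl</class>
      <jar>${nameNode}/apps/etl/etl.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <fail name="fail">
    <message>Spark ETL failed</message>
  </fail>
  <end name="end"/>
</workflow-app>
```

A coordinator definition on top of this is what actually gives the daily or hourly schedule.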

2 minutes ago, SharkTank said:

Try to implement everything using DataFrames in a single shot. The more you limit Spark SQL usage, the better, because views occupy space. Use Spark 2. What's the problem here now? Post the error. As I said, run

spark.conf.set("spark.sql.shuffle.partitions", 2000)

in your Spark Scala console. That should fix it.



 

spark.conf.set("spark.sql.shuffle.partitions",100)

I'm still learning that, bro.

If I have any doubts, I'll PM you; if you have any good material, PM me.

Or post it here.

BTW, is my approach correct?

I used GCP for this.

 

2 hours ago, Sarvapindi said:

What are big data real-time scenarios actually like? Tell us a bit.

They're... real.

3 minutes ago, soodhilodaaram said:

They're... real.

What an answer!!!!

27 minutes ago, Killer66 said:

Bhaiya, take big data lightly and look for something else. If coding doesn't come to you, surviving in big data is hard 😐

In the end, that's how it'll turn out.

38 minutes ago, Killer66 said:

Bhaiya, take big data lightly and look for something else. If coding doesn't come to you, surviving in big data is hard 😐

The pay isn't great either... the way things are now, they're offering about the same as for reporting tools, damn it... if that's the case, it's a waste... if we're really strong at it we can demand more; otherwise it's just a waste.
