mettastar

Spark - Scala - help

mettastar

Gurus,

I need some help. I have a large dataset that I read into Spark, and I want to dynamically partition the data into multiple folders based on a date field.

I was able to do that using hiveContext, but it is taking a lot of time.

So instead, I want to read the distinct dates from that date field, store them in a variable, and use a for loop to manually create the folders and load the data into them. That way I don't have to use HiveQL.

Any other suggestions?

mettastar

Where is everyone who works on Spark?

Bhai

What even is Spark Scala? I can't follow what is being said here.

ranku_mogudu

Calling @Keerthana. Unless she is tagged, she will just read this and stay quiet.

former

Convert data into DataFrames and create different DataFrames based on the date. 

kasi

use partitioning

read about it

kasi

PARTITIONED BY (<columnname>)

Partitioning on the date is not such a good option; better to partition weekly.
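One way to read the weekly suggestion, as a rough sketch only: derive a week-start column and partition on that instead of the raw date. The helper name `withWeekStart` and the column name `week_start` are made up for illustration, and the sketch assumes the date column is a date or a `yyyy-MM-dd` string.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, date_sub, next_day}

// Hypothetical helper: adds a "week_start" column holding the Monday
// of the week that each row's date falls in.
def withWeekStart(df: DataFrame, dayCol: String): DataFrame =
  // next_day(d, "Mon") is the first Monday strictly after d;
  // stepping back 7 days gives the Monday on or before d.
  df.withColumn("week_start", date_sub(next_day(col(dayCol), "Mon"), 7))
```

Writing with `withWeekStart(df, "processed_day").write.partitionBy("week_start")` would then produce roughly 52 folders per year instead of 365.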

mettastar
24 minutes ago, former said:

Convert data into DataFrames and create different DataFrames based on the date. 

I am already creating DataFrames, bro. The data is 1.7 billion rows; the job had been running for 6+ hours, so I just killed it.

How do we automate creating DataFrames based on the date?

mettastar
10 minutes ago, kasi said:

PARTITIONED BY (<columnname>)

Partitioning on the date is not such a good option; better to partition weekly.

It has to be on the date, bro. By PARTITIONED BY, do you mean in the Hive table DDL?

kasi

You can also use it directly in the write:

df.repartition(df("entity"), df("year"), df("month"), df("day"), df("status")).write.partitionBy("entity", "year", "month", "day", "status").mode("append").parquet(location)

Enjoy!

mettastar

I tried this, bro. There is no big difference between this and using the Hive query.

I tried it with one month of data and it was comparatively fast, so now I want to identify the max and min dates in the dataset and loop through in 1-month increments.

How do I assign HiveQL output to variables, bro? I can't find it on Google.

val max = hiveContext(s"""select max(processed_day) from table """)   -- this is not assigning the output 
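A minimal sketch of one way to get the query output into plain Scala values. `sql(...)` only returns a DataFrame, so the row has to be fetched to the driver before anything can be assigned. The helper name `minMaxDay` is hypothetical, and the sketch assumes the table is not empty and that the column values print as sortable strings.

```scala
import org.apache.spark.sql.SQLContext

// Hypothetical helper. HiveContext extends SQLContext, so hiveContext can be passed in.
def minMaxDay(sqlContext: SQLContext, table: String, dayCol: String): (String, String) = {
  // first() runs the query and returns its single result row to the driver.
  val row = sqlContext.sql(s"SELECT min($dayCol), max($dayCol) FROM $table").first()
  // Assumption: the table has at least one non-null value, otherwise get(i) is null.
  (row.get(0).toString, row.get(1).toString)
}
```

Usage would look like `val (minDay, maxDay) = minMaxDay(hiveContext, "table", "processed_day")`.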

kasi

did you create hiveContext in the first place?

kasi

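import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext()   // in spark-shell, sc already exists
val sqlContext = new HiveContext(sc)

sqlContext.sql("""select * from table""")   // ... write it like this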

siritptpras

Totally different question: how are the job opportunities for Spark and Scala? OP, sorry for posting here; I am planning to learn them.

mettastar
10 minutes ago, kasi said:

val sc = new SparkContext()   // in spark-shell, sc already exists
val sqlContext = new HiveContext(sc)

sqlContext.sql("""select * from table""")   // ... write it like this

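import org.apache.spark.sql.{Row, SaveMode}

// r is the input RDD of tab-separated lines and schema is its StructType (both defined elsewhere)
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val rowRDD = r.map(row => Row.fromSeq(row.split("\t", -1)))
val df = hiveContext.createDataFrame(rowRDD, schema)

// one sub-folder per distinct processed_day value, written as ORC
df.write.mode(SaveMode.Overwrite).format("orc").partitionBy("processed_day").save("/user/hive/warehouse/df_d_distributor_return_items/")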

 

I tried it this way and it was running forever. I also tried Hive dynamic partitioning, and that ran forever too.

With one month of data, Hive dynamic partitioning finished in about 20 minutes.

Now I want to get the max and min of the processed_day column and then use a for loop to go through the data in one-month increments, so that the dynamic partitioning runs one month at a time.

So can you tell me how to find the max and min and loop in one-month increments, bro? I am googling it too. Thanks.
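The one-month loop could be sketched roughly as follows. `monthWindows` and `writeByMonth` are hypothetical names, and the sketch assumes processed_day holds `yyyy-MM-dd` strings (so string comparison matches date order). It appends rather than overwrites, on the assumption that each pass should keep the months written by earlier passes.

```scala
import java.time.LocalDate
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.col

// Splits [minDay, maxDay] into calendar-month windows (startInclusive, endExclusive).
def monthWindows(minDay: LocalDate, maxDay: LocalDate): Seq[(LocalDate, LocalDate)] =
  Iterator.iterate(minDay.withDayOfMonth(1))(_.plusMonths(1))
    .takeWhile(start => !start.isAfter(maxDay))
    .map(start => (start, start.plusMonths(1)))
    .toList

// Writes one month of data per pass, still partitioned by day within that month.
def writeByMonth(df: DataFrame, dayCol: String, minDay: LocalDate,
                 maxDay: LocalDate, location: String): Unit =
  for ((start, end) <- monthWindows(minDay, maxDay)) {
    df.filter(col(dayCol) >= start.toString && col(dayCol) < end.toString)
      .write
      .mode(SaveMode.Append)   // re-running a month would duplicate its rows
      .format("orc")
      .partitionBy(dayCol)
      .save(location)
  }
```

The min and max strings from the query can be turned into dates with `LocalDate.parse(...)` before calling `writeByMonth`. Caching `df` first may help, since each pass otherwise rescans the full input.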
