mettastar Posted November 21, 2017 Author Report Posted November 21, 2017 7 minutes ago, siritptpras said: Totally different question: how are the job opportunities for Spark and Scala? Thanks, OP, for posting here, as I am planning to learn. No idea, bro. Quote
mettastar Posted November 21, 2017 Author Report Posted November 21, 2017 1 minute ago, mettastar said:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val rowRDD = r.map(row => Row.fromSeq(row.split("\t", -1)))
val df = hiveContext.createDataFrame(rowRDD, schema)
df.write.mode(SaveMode.Overwrite).format("orc").partitionBy("processed_day").save("/user/hive/warehouse/df_d_distributor_return_items/")

I tried it this way and it ran forever. Same thing when I tried Hive dynamic partitioning: it also ran forever. But when I did Hive dynamic partitioning with just one month of data, it finished in about 20 minutes. So now I want to get the max and min of the processed_day column and then use a for loop to go through the data in one-month increments, doing the dynamic partitioning one month at a time. Can you tell me, bro, how to find that max and min and then loop in one-month increments? I'm googling it too. Thanks. Quote
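For the min/max and one-month-increment part being asked about, here is a minimal sketch. The `monthRanges` helper is a hypothetical name, and the commented-out Spark calls assume the `df` from the snippet above; they are illustrative, not something tested against a cluster:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper: given the min and max processed_day values
// (yyyy-MM-dd strings), return the [start, end) month boundaries to loop over.
def monthRanges(minDay: String, maxDay: String): Seq[(String, String)] = {
  val fmt = DateTimeFormatter.ISO_LOCAL_DATE
  var start = LocalDate.parse(minDay, fmt).withDayOfMonth(1)
  val last  = LocalDate.parse(maxDay, fmt)
  val ranges = scala.collection.mutable.ArrayBuffer[(String, String)]()
  while (!start.isAfter(last)) {
    val next = start.plusMonths(1)
    ranges += ((start.toString, next.toString))  // LocalDate.toString is yyyy-MM-dd
    start = next
  }
  ranges.toSeq
}

// Illustrative Spark side (assumes df from the post above; not runnable here):
//   import org.apache.spark.sql.functions.{min, max}
//   val Row(lo: String, hi: String) =
//     df.agg(min("processed_day"), max("processed_day")).head
//   for ((from, until) <- monthRanges(lo, hi)) {
//     df.filter(s"processed_day >= '$from' AND processed_day < '$until'")
//       .write.mode(SaveMode.Append).format("orc")
//       .partitionBy("processed_day")
//       .save("/user/hive/warehouse/df_d_distributor_return_items/")
//   }
```

Note the sketch uses `SaveMode.Append` inside the loop, since `Overwrite` on each iteration would clobber the months already written.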
k2s Posted November 21, 2017 Report Posted November 21, 2017 4 hours ago, mettastar said: Gurus, I need some help. I have a large dataset that I read in Spark, and I want to dynamically partition the data into multiple folders based on a date field. I was able to do that using hiveContext, but it is taking a lot of time. So instead, I want to read the distinct dates from that date field, store them in a variable, and use a for loop to manually create the folders and load the data into them. That way I don't have to use HiveQL. Any other suggestions? You already know the solution, no? So why ask how to do it again? Quote
Bhai Posted November 21, 2017 Report Posted November 21, 2017 51 minutes ago, k2s said: You already know the solution, no? So why ask how to do it again? Quote
mettastar Posted November 21, 2017 Author Report Posted November 21, 2017 51 minutes ago, k2s said: You already know the solution, no? So why ask how to do it again? I'm no good at this programming stuff, uncle, and Scala examples are hard to come by, which is why I'm asking. Quote
kittaya Posted November 21, 2017 Report Posted November 21, 2017 10 hours ago, mettastar said: Gurus, I need some help. I have a large dataset that I read in Spark, and I want to dynamically partition the data into multiple folders based on a date field. I was able to do that using hiveContext, but it is taking a lot of time. So instead, I want to read the distinct dates from that date field, store them in a variable, and use a for loop to manually create the folders and load the data into them. That way I don't have to use HiveQL. Any other suggestions? Ask @kasi Quote
fake_Bezawada Posted November 21, 2017 Report Posted November 21, 2017 13 hours ago, mettastar said: Gurus, I need some help. I have a large dataset that I read in Spark, and I want to dynamically partition the data into multiple folders based on a date field. I was able to do that using hiveContext, but it is taking a lot of time. So instead, I want to read the distinct dates from that date field, store them in a variable, and use a for loop to manually create the folders and load the data into them. That way I don't have to use HiveQL. Any other suggestions? Dora, you're speaking some Hadoop language; I don't understand a thing. Quote
k2s Posted November 21, 2017 Report Posted November 21, 2017 16 hours ago, mettastar said: I'm no good at this programming stuff, uncle, and Scala examples are hard to come by, which is why I'm asking. You're beating a dead cow, uncle. Quote
mettastar Posted November 23, 2017 Author Report Posted November 23, 2017 Done. It takes almost 1.5 hours for one year of partitions; not bad. You can't iterate over a DataFrame directly; we have to go through an RDD. So I stored the distinct dates in an RDD, then used a foreach loop over it, reading one month of data at a time and dynamically partitioning it using hiveContext. If anyone wants the code, tell me and I can paste it here. I don't know whether it can be optimized any further. Quote
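Since the code itself wasn't pasted, here is a sketch of the approach this post describes, with assumed names (`df`, the output path, the `monthKey` helper). One caveat: the distinct values have to be collected back to the driver first, because a Spark write can't be launched from inside an executor-side `RDD.foreach`; the month loop itself runs as plain driver code:

```scala
// Hypothetical helper: reduce a processed_day value like "2017-03-15"
// to its month key "2017-03", so the loop runs in one-month increments.
def monthKey(day: String): String = day.take(7)

// Illustrative Spark side (assumes df with a processed_day string column
// and the imports org.apache.spark.sql.SaveMode etc.; not runnable here):
//   val months: Array[String] =
//     df.select("processed_day").distinct()
//       .rdd.map(r => monthKey(r.getString(0)))
//       .distinct().collect().sorted
//   for (m <- months) {
//     df.filter(df("processed_day").startsWith(m))
//       .write.mode(SaveMode.Append).format("orc")
//       .partitionBy("processed_day")
//       .save("/user/hive/warehouse/df_d_distributor_return_items/")
//   }
```

Each iteration appends one month's worth of partitions, so earlier months are not overwritten by later writes.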
perugu_vada Posted November 23, 2017 Report Posted November 23, 2017 Just now, mettastar said: Done. It takes almost 1.5 hours for one year of partitions; not bad. You can't iterate over a DataFrame directly; we have to go through an RDD. So I stored the distinct dates in an RDD, then used a foreach loop over it, reading one month of data at a time and dynamically partitioning it using hiveContext. If anyone wants the code, tell me and I can paste it here. I don't know whether it can be optimized any further. Working even on a holiday? Quote
mettastar Posted November 24, 2017 Author Report Posted November 24, 2017 51 minutes ago, perugu_vada said: Working even on a holiday? No plans, so working. Quote