SharkTank Posted July 18, 2020
28 minutes ago, Sarvapindi said: If you just google it, you won't learn the real-time side, man... is real-time work fully advanced, or can you manage with just the basics? That's my doubt.
For real-time scenarios, do you have any idea of Informatica or DataStage?
Sarvapindi (Author) Posted July 18, 2020
Just now, trent said: Solid planning, uncle.
If learning definitions were enough, they'd hand out jobs, bro... these guys keep developing technology nonstop. It would be nice to have 10 years with no new tech. Instead there's a new one every year, damn it.
Sarvapindi (Author) Posted July 18, 2020
2 minutes ago, SharkTank said: For real-time scenarios, do you have any idea of Informatica or DataStage?
Yes, I do.
SharkTank Posted July 18, 2020
13 minutes ago, Sarvapindi said: Yes, I do.
You develop an ETL application like Informatica or DataStage, either in Python or Scala, and deploy it on a Spark cluster for processing data. The data will be live streaming, unstructured, or semi-structured. You build ETL data pipelines that can process any type of data, huge volumes from several sources. Usually the source is an RDBMS, a live stream, or flat files; processing will be on Spark; and the target will be some NoSQL database. Programming (one person says Scala, another says Python) + the Hadoop stack + Spark + a reporting tool + shell scripting + test automation, and whatever else on top of that. Not discouraging you, but I'm fed up.
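The pipeline shape described above (flat-file or stream source, transform on Spark, NoSQL-style target) can be sketched in plain Python; this is only a hypothetical stand-in for what Spark does at cluster scale, and the field names and sample data are made up.

```python
import csv
import io
import json

# Hypothetical flat-file source, standing in for an RDBMS dump or live stream.
RAW = """id,name,amount
1,alice,100
2,bob,not_a_number
3,carol,250
"""

def extract(text):
    """Read raw CSV rows from the 'source'."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Keep only rows whose amount parses; cast types (the cleansing step)."""
    out = []
    for r in rows:
        try:
            out.append({"id": int(r["id"]),
                        "name": r["name"],
                        "amount": float(r["amount"])})
        except ValueError:
            continue  # drop malformed records
    return out

def load(rows):
    """Serialize to JSON documents, as a NoSQL target would store them."""
    return [json.dumps(r) for r in rows]

docs = load(transform(extract(RAW)))
print(len(docs))  # prints 2: bob's malformed row is dropped
```

In a real Spark job the same extract/transform/load stages would be DataFrame reads, transformations, and writes distributed over the cluster, but the shape of the pipeline is the same.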
SharkTank Posted July 18, 2020
These days they expect DevOps infrastructure setup as well.
kevinUsa Posted July 18, 2020
30 minutes ago, Sarvapindi said: Knowing SQL doesn't mean you know PySpark... PySpark is Python on Spark... you can use SQL in PySpark... knowing either Python or Scala is enough.
BTW, you can use SQL in Scala too: you create a SQL temp view, run the query, and print it.
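The "create a temp view, run the query, print it" pattern described above is, in Spark, roughly `df.createOrReplaceTempView("t")` followed by `spark.sql(...)`. Since no cluster is assumed here, a stdlib `sqlite3` in-memory table is used below as a stand-in to sketch the same flow; the table and data are hypothetical.

```python
import sqlite3

# In-memory table standing in for a Spark temp view (hypothetical data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100), ("west", 200), ("east", 50)])

# Run SQL against the registered table and print the result,
# much like spark.sql("SELECT ...").show() would against a temp view.
rows = con.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
for region, total in rows:
    print(region, total)  # east 150 / west 200
```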
kevinUsa Posted July 18, 2020
9 minutes ago, SharkTank said: You develop an ETL application like Informatica or DataStage, either in Python or Scala, and deploy it on a Spark cluster for processing data. You build ETL data pipelines that can process any type of data, huge volumes from several sources. Usually the source is an RDBMS, a live stream, or flat files; processing will be on Spark; and the target will be some NoSQL database.
Bro, do you even do cleansing in Scala? If so, how? Can you let me know, please? Last week I tried it on a dataset with 4 million rows, and it came down to 1,650 rows.
SharkTank Posted July 18, 2020
8 minutes ago, kevinUsa said: Bro, do you even do cleansing in Scala? If so, how? Last week I tried it on a dataset with 4 million rows, and it came down to 1,650 rows.
We use Spark + Scala and do transformation, cleansing, and analytics, all of it. A 4-million-row dataset coming down to 1,650 rows means your records are being dropped. It also depends on the shuffle partitions you set; usually we give 2000. If that is not set, try setting it and execute your query again.
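One common reason a row count collapses during cleansing, as discussed above, is dropping rows with nulls and then deduplicating on a coarse key. A small plain-Python sketch of that effect (hypothetical records, standing in for Spark's `df.na.drop()` and `df.dropDuplicates()`):

```python
# Hypothetical records; many are duplicates or have missing fields,
# mimicking a dataset that shrinks sharply under dropna + dropDuplicates.
records = [
    {"user": "u1", "email": "a@x.com"},
    {"user": "u1", "email": "a@x.com"},   # exact duplicate
    {"user": "u2", "email": None},        # null field
    {"user": "u3", "email": "c@x.com"},
] * 1000  # inflate to 4000 rows

# Step 1: drop rows with any null value (Spark: df.na.drop()).
no_nulls = [r for r in records if all(v is not None for v in r.values())]

# Step 2: deduplicate on all columns (Spark: df.dropDuplicates()).
seen, deduped = set(), []
for r in no_nulls:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(r)

print(len(records), "->", len(no_nulls), "->", len(deduped))
# prints: 4000 -> 3000 -> 2
```

If a count drops far more than expected, checking which of these steps removed the rows (count after each stage) usually locates the problem faster than tuning cluster settings.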
SharkTank Posted July 18, 2020
20 minutes ago, kevinUsa said: BTW, you can use SQL in Scala too: you create a SQL temp view, run the query, and print it.
Spark SQL isn't recommended for everything, since temp tables occupy space. SQL is maybe 10% of the work; knowing SQL alone won't be enough.
SharkTank Posted July 18, 2020
39 minutes ago, Sarvapindi said: Yes, I do.
Bro, posting here won't be of much use; you'll get suggestions from people with a quarter of the knowledge. You need to build an application yourself; then you'll get the idea. It won't come from being told.
Sarvapindi (Author) Posted July 18, 2020
3 minutes ago, SharkTank said: You need to build an application yourself; then you'll get the idea. It won't come from being told.
Then tell me how to practice, man... we don't get access to real-time data, right?
Sarvapindi (Author) Posted July 18, 2020
34 minutes ago, SharkTank said: You develop an ETL application like Informatica or DataStage, either in Python or Scala, and deploy it on a Spark cluster for processing data. You build ETL data pipelines that can process any type of data, huge volumes from several sources.
That's my question, bro... do you have to write a lot of code in Scala or Python, or are a few lines enough? Is it like the examples in the online documentation, or more than that?
SharkTank Posted July 18, 2020
10 minutes ago, Sarvapindi said: Do you have to write a lot of code in Scala or Python, or are a few lines enough? Is it like the examples in the online documentation, or more than that?
In real time it depends; minimum 500 lines of code. For simple practice, about 30 lines of code is enough. There are plenty of datasets: go to Kaggle and download one, install Scala, Spark, and IntelliJ, and start practicing.
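A first practice exercise of the ~30-line kind suggested above can start even before installing Spark: read a CSV (e.g., a Kaggle download; the columns and values here are made up) and compute a grouped aggregate with the stdlib, then later redo the same exercise as a Spark `groupBy().avg()`.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a downloaded Kaggle CSV (hypothetical columns and values).
data = io.StringIO("""city,price
Austin,300000
Austin,350000
Dallas,250000
""")

# Collect prices per city.
totals = defaultdict(list)
for row in csv.DictReader(data):
    totals[row["city"]].append(int(row["price"]))

# Average price per city, the kind of aggregate you'd later
# rewrite as df.groupBy("city").avg("price") once on Spark.
averages = {city: sum(p) / len(p) for city, p in totals.items()}
print(averages)  # {'Austin': 325000.0, 'Dallas': 250000.0}
```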
kevinUsa Posted July 18, 2020
31 minutes ago, SharkTank said: A 4-million-row dataset coming down to 1,650 rows means your records are being dropped. It also depends on the shuffle partitions you set; usually we give 2000.
I will post the code for what I have done.
kevinUsa Posted July 18, 2020
@dasari4kntr garu, please add a bit about Scala and these topics too.