SharkTank · Posted July 18, 2020
> Sarvapindi (28 minutes earlier): "Googling it won't tell you what real-time work is actually like, man... is real-time work fully advanced, or can you get by with just the basics? That's my doubt."
For real-time scenarios, do you have any exposure to Informatica or DataStage?
Sarvapindi (Author) · Posted July 18, 2020
> trent (just now): "Solid planning, uncle."
If it were up to you, people would get jobs just by memorizing definitions, bro... these guys keep churning out new technology. Ten years with no new tech would be nice. Instead there's a new one every year.
Sarvapindi (Author) · Posted July 18, 2020
> SharkTank (2 minutes earlier): "For real-time scenarios, do you have any exposure to Informatica or DataStage?"
Yes, I do.
SharkTank · Posted July 18, 2020
> Sarvapindi (13 minutes earlier): "Yes, I do."
You develop ETL applications like Informatica or DataStage, either in Python or Scala, and deploy them on a Spark cluster for processing. The data will be live streams, unstructured, or semi-structured; you build ETL data pipelines that process huge volumes of any type of data from several sources. Usually the source is an RDBMS, a live stream, or flat files; the processing runs on Spark; and the target is some NoSQL database. Add it all up: programming (one person says Scala, another says Python) + the Hadoop stack + Spark + a reporting tool + shell scripting + test automation, and everything else besides. Not discouraging you, but I'm fed up.
SharkTank · Posted July 18, 2020
These days they expect you to set up the DevOps infrastructure too.
kevinUsa · Posted July 18, 2020
> Sarvapindi (30 minutes earlier): "Knowing SQL doesn't mean you know PySpark... PySpark is Python on Spark... you can use SQL within PySpark... knowing either Python or Scala is enough."
You can use SQL in Scala too, by the way: you create a SQL temp view, run the query, and print the result.
kevinUsa · Posted July 18, 2020
> SharkTank (9 minutes earlier), in the ETL rundown above.
Bro, do you even do cleansing in Scala? If so, how? Can you let me know, please? Last week I tried it on a dataset with 4 million rows; it came down to 1,650 rows.
SharkTank · Posted July 18, 2020
> kevinUsa (8 minutes earlier), asking how cleansing is done in Scala.
We use Spark + Scala and do transformation, cleansing, and analytics, all of it. If a 4-million-row dataset came down to 1,650 rows, your records are being dropped. It depends on the shuffle partitions you set; we usually use 2000. If that isn't set, try setting it and rerun your query.
SharkTank · Posted July 18, 2020
> kevinUsa (20 minutes earlier): "You can use SQL in Scala too... you create a SQL temp view, run the query, and print it."
Spark SQL isn't what I'd recommend leaning on, since temp tables occupy space. SQL is maybe 10% of the work; knowing SQL alone won't be enough.
SharkTank · Posted July 18, 2020
> Sarvapindi (39 minutes earlier): "Yes, I do."
Bro, posting here won't help much; you'll get suggestions from people with quarter-baked knowledge. You need to build an application yourself, and then you'll get the picture. It won't come from someone telling you.
Sarvapindi (Author) · Posted July 18, 2020
> SharkTank (3 minutes earlier), on building an application yourself.
Then tell me how to practice. We can't get hold of real-time data, right?
Sarvapindi (Author) · Posted July 18, 2020
> SharkTank (34 minutes earlier), in the ETL rundown above.
That's exactly my question, bro... do I have to write really long code in Scala or Python, or are a few lines enough? Is it like the examples in the online documentation, or a lot more than that?
SharkTank · Posted July 18, 2020
> Sarvapindi (10 minutes earlier): "...do I have to write really long code, or are a few lines enough?"
In real time it varies; figure a minimum of 500 lines of code. For simple practice, 30 lines is plenty. There are lots of datasets out there: go to Kaggle and download one, install Scala, Spark, and IntelliJ, and start practicing.
kevinUsa · Posted July 18, 2020
> SharkTank (31 minutes earlier), on shuffle partitions and dropped records.
I will post the code for what I have done.
kevinUsa · Posted July 18, 2020
@dasari4kntr garu, please add a bit about Scala and these topics too.