Big data guys, come over here...


Sarvapindi

Recommended Posts

28 minutes ago, Sarvapindi said:

Googling won't tell you what real-time work is actually like, man... is real-time work fully advanced, or can you manage with just the basics? That's my doubt.

For real-time scenarios, do you have any idea of Informatica or DataStage?


Just now, trent said:

You're planning this seriously, aren't you, uncle?

Just learning definitions won't get you a job, bro... these jokers keep churning out new technology like crazy. It would be nice if there were no new tech for a good ten years. Instead, every year there's something new, damn it.


13 minutes ago, Sarvapindi said:

Yes, I have.

You develop an ETL application like Informatica or DataStage, either in Python or Scala, and deploy it on a Spark cluster to process the data. The data will be live-streaming, unstructured, and semi-structured.

You will build ETL data pipelines that process any type of data, huge volumes of it, from several sources. Usually the source is an RDBMS, a live stream, or flat files; the processing runs on Spark; and the target is some NoSQL database.

Programming (one guy says Scala, another says Python) + the Hadoop stack + Spark + a reporting tool + shell scripting + automation testing, and God knows what else on top. Not discouraging you, but I'm fed up.


30 minutes ago, Sarvapindi said:

Knowing SQL doesn't mean you know PySpark... PySpark is just Python on Spark... you can use SQL within PySpark... either Python or Scala, knowing one of the two is enough...

You can use SQL in Scala too.

BTW, you create a SQL temp view,

run the query, and print it.

 


9 minutes ago, SharkTank said:

You develop an ETL application like Informatica or DataStage, either in Python or Scala, and deploy it on a Spark cluster to process the data. The data will be live-streaming, unstructured, and semi-structured.

You will build ETL data pipelines that process any type of data, huge volumes of it, from several sources. Usually the source is an RDBMS, a live stream, or flat files; the processing runs on Spark; and the target is some NoSQL database.

Programming (one guy says Scala, another says Python) + the Hadoop stack + Spark + a reporting tool + shell scripting + automation testing, and God knows what else on top. Not discouraging you, but I'm fed up.

Bro, do you even do cleansing

in Scala?

If so, how?

Can you let me know, please? Last week I tried it on a dataset.

A dataset with 4M rows came down to 1,650 rows.

 

 


8 minutes ago, kevinUsa said:

Bro, do you even do cleansing

in Scala?

If so, how?

Can you let me know, please? Last week I tried it on a dataset.

A dataset with 4M rows came down to 1,650 rows.

We use Spark + Scala. We do transformation, cleansing, analytics, all of it. If a 4M-row dataset came down to 1,650 rows, your records are being dropped. It depends on the shuffle partitions you set; usually we give 2000. If that is not set, try setting it and execute your query.


20 minutes ago, kevinUsa said:

You can use SQL in Scala too.

BTW, you create a SQL temp view,

run the query, and print it.

Spark SQL is not recommended, since temp tables occupy space. SQL is only about 10% of the work; knowing SQL alone won't be enough.


39 minutes ago, Sarvapindi said:

Yes, I have.

Bro, posting here won't be of much use; you'll get suggestions from people with a quarter of the knowledge. You need to build an application yourself — then you'll get the idea. It won't come from being told.


3 minutes ago, SharkTank said:

Bro, posting here won't be of much use; you'll get suggestions from people with a quarter of the knowledge. You need to build an application yourself — then you'll get the idea. It won't come from being told.

Then tell me how to practice, man... we can't get hold of real-time data, can we?


34 minutes ago, SharkTank said:

You develop an ETL application like Informatica or DataStage, either in Python or Scala, and deploy it on a Spark cluster to process the data. The data will be live-streaming, unstructured, and semi-structured.

You will build ETL data pipelines that process any type of data, huge volumes of it, from several sources. Usually the source is an RDBMS, a live stream, or flat files; the processing runs on Spark; and the target is some NoSQL database.

Programming (one guy says Scala, another says Python) + the Hadoop stack + Spark + a reporting tool + shell scripting + automation testing, and God knows what else on top. Not discouraging you, but I'm fed up.

That's exactly it, bro... do you have to write really long code in Scala or Python, or are a few lines enough? Is real code like the examples in the online documentation, or a lot more than that?


10 minutes ago, Sarvapindi said:

That's exactly it, bro... do you have to write really long code in Scala or Python, or are a few lines enough? Is real code like the examples in the online documentation, or a lot more than that?

In real-time work it depends, but expect a minimum of 500 lines of code. For simple practice, 30 lines of code is enough. There are plenty of datasets: go to Kaggle and download one. Install Scala, Spark, and also IntelliJ, and start practicing.


31 minutes ago, SharkTank said:

We use Spark + Scala. We do transformation, cleansing, analytics, all of it. If a 4M-row dataset came down to 1,650 rows, your records are being dropped. It depends on the shuffle partitions you set; usually we give 2000. If that is not set, try setting it and execute your query.

I will post the code for what I have done.

