
What Is Big Data and What Is Hadoop?



Posted

It is a good question, and a lot of people are confused about it.

 

Big data refers to the problem companies face when they cannot handle the data pouring in from many sources (web logs, sensor data, etc.), and it affects small and large companies alike. Over the next 3 to 5 years, data volumes are expected to grow 50 to 70% more than over the previous 5 years, so companies that cannot analyze this data will miss important information. Hadoop is used to handle this kind of data.

 

Hadoop is not a single tool; it is an open-source ecosystem used to handle this big data and process it in a distributed fashion.

 

If you want to know why Oracle, Teradata, etc. cannot handle this kind of data, search for "RDBMS vs. Hadoop" on Google.

 

Batch processing: MapReduce, Pig, Apache Spark

Real-time processing: Storm, Spark Streaming

SQL interface: Hive, Impala, Spark SQL, Apache Drill

Sqoop: moves data from an RDBMS to HDFS and vice versa

Oozie: schedules jobs

NoSQL: HBase, MongoDB, Cassandra, etc.

Graph database: Neo4j

Dashboards to visualize the data: Tableau
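The batch-processing model at the top of that list (MapReduce) can be sketched in a few lines of plain Python. This is a hypothetical in-memory stand-in for illustration only; a real job would run distributed across a Hadoop or Spark cluster, not in one process.

```python
# Minimal sketch of the MapReduce programming model in plain Python.
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # Reducer: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["the quick brown fox", "the lazy dog"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
word_counts = reduce_phase(pairs)
print(word_counts["the"])  # 2
```

On a real cluster the map outputs are shuffled across the network so that all pairs with the same key land on the same reducer; here the grouping simply happens in one dictionary.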

 

Important technologies to learn: Core Java, Python, Scala, R.

Hadoop is completely different from Teradata:

Hadoop = unstructured data; TD = structured data.

There is no direct comparison between TD and Hadoop, as they serve different purposes. Teradata can handle large amounts of data; for example, eBay/PayPal hold around 1.2 petabytes in Teradata, which is proof that TD can handle data at scale.


Posted



Then why are organizations preferring Hadoop over Teradata, man?
Posted

Then why are organizations preferring Hadoop over Teradata, man?

If you understand the difference between structured and unstructured data, then you have the answer to your question.
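That structured-vs-unstructured distinction can be made concrete with a small, hypothetical Python sketch: a delimited row from an RDBMS export maps straight onto a fixed schema, while a raw web-log line has to be parsed before it can be queried. The log format and field names here are made up for illustration.

```python
# Structured row: fits a fixed schema, splits cleanly into columns.
# Web-log line: semi/unstructured, needs custom parsing first.
import re

structured_row = "101|2014-05-01|499.99"  # hypothetical order record
log_line = '66.249.73.135 - - [01/May/2014] "GET /index.html HTTP/1.1" 200'

order_id, order_date, amount = structured_row.split("|")

# Extract fields from the log with a regex before any SQL-style query.
m = re.search(r'^(\S+) .*"(\w+) (\S+).*" (\d{3})', log_line)
ip, method, path, status = m.groups()
print(ip, method, path, status)
```

An RDBMS is built around the first case; Hadoop-side tools let you run the second kind of parsing at scale and only impose a schema at read time.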

Posted

Batch processing: MapReduce, Pig, Apache Spark
Real-time processing: Storm, Spark Streaming
SQL interface: Hive, Impala, Spark SQL, Apache Drill
Sqoop: moves data from an RDBMS to HDFS and vice versa
Oozie: schedules jobs
NoSQL: HBase, MongoDB, Cassandra, etc.
Graph database: Neo4j
Dashboards to visualize the data: Tableau


Among these, which one is related to ETL?
Posted

Among these, which one is related to ETL?

Any database will require loading data, bro, so ETL is a completely different topic.

Teradata has its own built-in load utilities like FastLoad, MultiLoad, and BTEQ, or you can use regular ETL tools like Informatica as well, but with Teradata we mostly use ELT rather than ETL.

So when it comes to ETL, it is a completely different topic.
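As a rough illustration of the ELT pattern mentioned above (load the raw data first, then transform it inside the database with SQL), here is a sketch using Python's built-in sqlite3 as a stand-in for Teradata; the table and column names are invented for the example.

```python
# ELT sketch: load raw rows as-is, then transform with SQL in the database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (region TEXT, amount TEXT)")  # raw staging

# L: load the data untouched, no cleansing on the way in.
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)",
                 [("east", "10.5"), ("east", "4.5"), ("west", "7.0")])

# T: transform inside the database (cast text to numbers, aggregate).
rows = conn.execute(
    "SELECT region, SUM(CAST(amount AS REAL)) FROM stg_sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 15.0), ('west', 7.0)]
```

The contrast with classic ETL is where the transform runs: an ETL tool reshapes the data before loading, while ELT pushes that work down into the database engine itself.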

Posted

If you understand the difference between structured and unstructured data, then you have the answer to your question.

 

So you're saying Hadoop/big data won't replace Teradata and other big databases in the market after a few years? How many more years of life do you think Teradata has?

Posted


 

In Hadoop too, you can import structured data from various RDBMS systems using Sqoop, but that data will be stored in a semi-structured format (separated by delimiters). Of course it is easier to handle structured data in an RDBMS than in Hadoop, but Hadoop has these advantages:

Elastic: if a batch job has more data than usual for a particular month, we can easily add a few nodes to speed up the processing.

Fault tolerant: even if a couple of nodes fail, the entire job won't fail; RDBMSs rarely offer this.

Cheap: Teradata, Netezza, etc. nodes need expensive hardware, whereas commodity hardware is enough for Hadoop nodes.

TD and Netezza can handle large amounts of data, but Hadoop can process it faster for certain use cases.
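A back-of-envelope sketch of why replication gives that fault tolerance: HDFS keeps multiple copies of each block (three by default), so a block is lost only if every node holding a replica fails. Assuming independent failures and a made-up per-node failure probability, the loss probability falls off quickly with each extra replica.

```python
# Rough sketch: a block is lost only if all replicas' nodes fail.
# Assumes independent node failures, which is only an approximation.
def block_loss_probability(node_failure_prob, replication=3):
    return node_failure_prob ** replication

p = 0.01  # hypothetical chance a given node is down
print(block_loss_probability(p))                  # ~1e-06 with 3 replicas
print(block_loss_probability(p, replication=2))   # ~1e-04 with only 2
```

Real failures are not fully independent (a rack switch can take out several nodes at once), which is why HDFS also spreads replicas across racks.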

Posted

So you're saying Hadoop/big data won't replace Teradata and other big databases in the market after a few years? How many more years of life do you think Teradata has?

 

RDBMSs are not going to be replaced anytime soon; they are good at what they do.

RDBMS: good for OLTP, and for crunching the latest and aggregated data.

Hadoop: good as an active archive, i.e. old/all data at any granular level.

Posted


"Fault tolerant: even if a couple of nodes fail, the entire job won't fail; RDBMSs rarely offer this."

Teradata does support fault tolerance: there is no way all the nodes fail at the same time, and even if some do, TD always has backup nodes that become active once the others fail. I am not sure about other RDBMSs.

"Cheap: Teradata, Netezza, etc." — I agree on this, as Hadoop is open source and very cheap compared to Teradata. That said, clients won't go with Hadoop all the time; some clients don't care about money, they just need the performance.

Posted

If that guy comes back, he can just go into BE or BPM anyway.

BPM doesn't need Java at all.

Posted


 

 

Agreed on the replacement factor: RDBMSs are not going to be replaced anytime soon, as they are good at what they do. But for some otherwise unsolvable loads, Hadoop is a good answer.
