How to convert 80 GB XML files to CSV


kevinUsa

Jokes apart, use Python with pandas to convert the XML to CSV.

You may need to spin up a large virtual machine or a Databricks cluster to make it quick.


4 hours ago, kevinUsa said:

Will it work?

Yes, you can spin up 8 clusters and get the job done in maybe 10 minutes.

XML is semi-structured data and CSV is structured, so the conversion is essentially flattening the data, which means a lot of writes and processing.

An 80 GB XML file can easily produce 200 GB of CSV, which requires a lot of parallel processing. The quickest and easiest way is Databricks, since it is readily available and can get the job done for maybe $5.

Open a Python notebook and start fiddling with this code example:

https://stackoverflow.com/questions/49898661/xml-to-csv-python
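
As a rough starting point for the flattening step described above, here is a minimal sketch that streams records out of an XML file and writes one CSV row per record. Note that it uses the standard library (`xml.etree.ElementTree.iterparse` plus `csv`) rather than pandas, so the whole file never has to sit in memory. The `<orders>`/`<order>` layout, the field names, and the helper name `xml_to_csv` are all made up for illustration; your real file will need its own tag and column list.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_source, csv_out, record_tag, fields):
    """Stream <record_tag> elements from xml_source into csv_out, one row each."""
    writer = csv.writer(csv_out)
    writer.writerow(fields)
    rows = 0
    # iterparse hands back elements as they are completed, so the whole
    # document is never loaded at once.
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            writer.writerow([elem.findtext(f, default="") for f in fields])
            rows += 1
            root.clear()  # drop records already written to keep memory flat
    return rows


# Tiny in-memory stand-in for the real file (hypothetical layout).
sample = b"""<orders>
  <order><id>1</id><customer>Ann</customer><total>9.50</total></order>
  <order><id>2</id><customer>Bob</customer><total>12.00</total></order>
</orders>"""

out = io.StringIO()
count = xml_to_csv(io.BytesIO(sample), out, "order", ["id", "customer", "total"])
print(count, "rows written")
print(out.getvalue())
```

For a real file you would pass the XML path (or an open binary file) as `xml_source` and something like `open("out.csv", "w", newline="")` as `csv_out`. This runs on a single core, so for 80 GB it will be slow compared to a cluster, but it is an easy way to test the mapping on a small slice first.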

 


17 minutes ago, soodhilodaaram said:

Yes, you can spin up 8 clusters and get the job done in maybe 10 minutes.

XML is semi-structured data and CSV is structured, so the conversion is essentially flattening the data, which means a lot of writes and processing.

An 80 GB XML file can easily produce 200 GB of CSV, which requires a lot of parallel processing. The quickest and easiest way is Databricks, since it is readily available and can get the job done for maybe $5.

Open a Python notebook and start fiddling with this code example:

https://stackoverflow.com/questions/49898661/xml-to-csv-python

Why would 80 GB become 200 GB? You will remove all the XML tags, which consume a lot of space. The CSV has just the needed data. I am thinking it will be less than 2 GB in CSV.


14 minutes ago, FLraja said:

Why would 80 GB become 200 GB? You will remove all the XML tags, which consume a lot of space. The CSV has just the needed data. I am thinking it will be less than 2 GB in CSV.

Maybe you are right. I was thinking of hierarchical flattening of the data, which results in more records in the CSV.
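
To make both points concrete, here is a small sketch with a made-up nested layout (one `<order>` with several `<item>` children; all names are hypothetical). Flattening repeats the parent fields on every child row, so the record count goes up, while dropping the tags shrinks the bytes. Which effect wins on a real 80 GB file depends on how deep the nesting is and how wide the repeated parent fields are, so treat this only as a way to measure on a sample.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical nested layout: each <order> holds one or more <item> children.
xml_text = """<orders>
  <order id="1" customer="Ann">
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
    <item sku="C3" qty="5"/>
  </order>
  <order id="2" customer="Bob">
    <item sku="A1" qty="1"/>
  </order>
</orders>"""

root = ET.fromstring(xml_text)
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["order_id", "customer", "sku", "qty"])

orders = root.findall("order")
for order in orders:
    for item in order.findall("item"):
        # Parent attributes are repeated on every child row.
        writer.writerow([order.get("id"), order.get("customer"),
                         item.get("sku"), item.get("qty")])

csv_text = buf.getvalue()
data_rows = len(csv_text.splitlines()) - 1  # exclude the header line
print(len(orders), "orders ->", data_rows, "CSV rows")
print(len(xml_text.encode()), "XML bytes vs", len(csv_text.encode()), "CSV bytes")
```

In this toy sample the 2 orders turn into 4 CSV rows, yet the CSV is still smaller than the XML because the tags and attribute names dominate. Running the same comparison on the first few thousand records of the real file would give a better size estimate than guessing.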
