Re: large xml database - Mailing list pgsql-sql
From | Andreas Joseph Krogh |
---|---|
Subject | Re: large xml database |
Date | |
Msg-id | 4CCC96E1.9050806@officenet.no |
In response to | large xml database (Viktor Bojović <viktor.bojovic@gmail.com>) |
Responses | Re: large xml database |
List | pgsql-sql |
On 10/30/2010 11:49 PM, Viktor Bojović wrote:
> Hi,
> I have a very big XML document, larger than 50GB, that I want to import
> into a database and transform into a relational schema.
> When splitting this document into smaller independent XML documents I get
> ~11.1 million XML documents.
> I have spent lots of time trying to find the fastest way to transform all this
> data, but every time I give up because it takes too much time. Sometimes it
> would take more than a month if not stopped.
> I have tried to insert each line as varchar into the database and parse it
> using plperl regex.
> I have also tried to store every document as XML and parse it, but it is
> also too slow.
> I have tried to store every document as varchar, but it is also slow when
> using regex to get the data.
>
> Many attempts have failed because 8GB of RAM and 10GB of swap were not enough.
> Also, sometimes I get an error that more than 2^32 operations were performed,
> and the functions stopped working.
>
> I just wanted to ask if someone knows how to speed this up.
>
> Thanks in advance

Use a SAX parser and handle the endElement(String name) events to insert each
element's content into your db.

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / CTO
Public key: http://home.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Rosenholmveien 25       | know how to do a thing and to watch         |
1414 Trollåsen          | somebody else doing it wrong, without       |
NORWAY                  | comment.                                    |
                        |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909 56 963  |                                             |
------------------------+---------------------------------------------+
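
For illustration, here is a minimal sketch of the SAX approach suggested in the reply, written in Python with the standard `xml.sax` module. It is not taken from the thread. The element names (`entry`, `accession`, `sequence`), the table layout, the `batch_size` parameter and the helper `load_xml` are all hypothetical, and `sqlite3` stands in for PostgreSQL only so that the example runs without a server. The idea is that the parser streams the file, so memory use stays small regardless of document size, and rows are written in batches from the `endElement` callback.

```python
import io
import sqlite3
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collect the child fields of each record element and insert one row per record."""

    def __init__(self, conn, record_tag="entry", batch_size=1000):
        super().__init__()
        self.conn = conn
        self.record_tag = record_tag    # hypothetical name of the repeating element
        self.batch_size = batch_size    # rows buffered before each write
        self.text = []                  # character chunks of the element being read
        self.row = {}                   # fields of the current record
        self.pending = []               # rows waiting to be written

    def startElement(self, name, attrs):
        self.text = []
        if name == self.record_tag:
            self.row = {}

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        self.text.append(content)

    def endElement(self, name):
        if name == self.record_tag:
            # The record is complete: queue it and write when the batch is full.
            self.pending.append((self.row.get("accession"), self.row.get("sequence")))
            if len(self.pending) >= self.batch_size:
                self.flush()
        else:
            # A child element closed: keep its text under its tag name.
            self.row[name] = "".join(self.text).strip()
        self.text = []

    def endDocument(self):
        self.flush()

    def flush(self):
        # Hypothetical table; with PostgreSQL the placeholder syntax would differ.
        self.conn.executemany(
            "INSERT INTO entry (accession, sequence) VALUES (?, ?)", self.pending
        )
        self.conn.commit()
        self.pending = []


def load_xml(source, conn, batch_size=1000):
    """Stream `source` (a file name or binary file object) into the database."""
    xml.sax.parse(source, RecordHandler(conn, batch_size=batch_size))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (accession TEXT, sequence TEXT)")
    sample = (
        b"<entries>"
        b"<entry><accession>A1</accession><sequence>MKV</sequence></entry>"
        b"<entry><accession>B2</accession><sequence>GHT</sequence></entry>"
        b"</entries>"
    )
    load_xml(io.BytesIO(sample), conn)
    print(conn.execute("SELECT accession, sequence FROM entry").fetchall())
```

Batching the inserts rather than committing per element is a design choice of this sketch, not something the reply specifies. For a load of this size against PostgreSQL, writing the parsed rows out and bulk-loading them with `COPY` may be faster still.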