[personal profile] gsh
I'm creating a *huge* table in MySQL, at least 50 GB. I have a C program that reads a file from elsewhere, picks out the info I want/need, then stuffs it into MySQL with lots of INSERT INTO table (ra, dec, mag, ...) VALUES (...), where ra and dec are indexed.

I've noticed that the time the inserts take keeps increasing as the table grows.

Does anyone have any advice on whether it is faster to create my table with indices and stuff away, or to create my table without indices,
slurp in the 50-75 GB of data, then ALTER TABLE ADD INDEX?
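For reference, the second option would look roughly like this. The table name and column types here are guesses, not from the post; note that DEC is a reserved word in MySQL, so the column needs backticks:

```sql
-- Create the table with no secondary indexes, bulk-load, then index.
-- Table name ("stars") and column types are assumptions.
CREATE TABLE stars (
    ra    DOUBLE NOT NULL,
    `dec` DOUBLE NOT NULL,   -- DEC is a reserved word in MySQL
    mag   FLOAT
);

-- ... slurp in the 50-75 GB of rows here ...

ALTER TABLE stars
    ADD INDEX idx_ra (ra),
    ADD INDEX idx_dec (`dec`);
```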

Date: 2006-01-27 09:23 pm (UTC)
From: [identity profile] nminusone.livejournal.com
I haven't used MySQL, but all the other DBs I have used (Oracle, Sybase, SQL Server/MSDE and Access) tell you it's faster to drop the indices, insert the data, and recreate the indices.

All those DBs also have bulk-copy (bcp) mechanisms designed for this case, which are way faster than row-by-row inserts. In fact, now that you mention it, the file you were parsing the other day sounds exactly like it was created by such a tool reading out of a DB. If MySQL has something like that, I'd see whether you can use it. BCP is generally considered the fastest way to get data into a DB.
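MySQL's closest analogue of bcp is LOAD DATA INFILE (with a mysqlimport command-line wrapper). A sketch, assuming a tab-delimited flat file, the column names from the post, and a hypothetical table and file path:

```sql
-- Bulk-load a tab-delimited flat file; the path and table name
-- are hypothetical.
LOAD DATA INFILE '/tmp/stars.tsv'
INTO TABLE stars
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(ra, `dec`, mag);
```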

Date: 2006-01-27 10:29 pm (UTC)
From: [identity profile] dckermit.livejournal.com
My experience, also with other applications, is that it's far faster to have the extraction program create a flat file from the "elsewhere" data, then allow the database application to load that flat file in bulk. Sometimes days faster.

Powered by Dreamwidth Studios