SQL / C#: pushing to database REALLY slow
I'm making a folder/text-file reader in one. The data it collects should be pushed to SQL Server, but it's going really, really slowly.
How slow?
The data that needs to be inserted:
2.73 GB (2,938,952,122 bytes)
spread over
78,995 files, 5,908 folders
with a folder structure of:
folder (top level)
    - folder
        - folder
            - text files
            - ...
        - more folders -> maybe 100
            - folder
            - folder
            - folder
                - text files
                - ...
    - folder
    - ...
    - 60 folders in total
I've been reading them for about 3 days now. The files also contain a lot of duplicate values, which I think contributes to the slowness.
Reasons I think this is happening:
- Because I have a relational database, I need to keep opening a new connection inside nested foreach loops.
My questions:
- Is there a way to dramatically increase performance?
- Should I use SqlBulkCopy instead? Can I still use it with related tables? Every example I see fills a single table, ignoring the foreign keys that need to be inserted (mine are generated by auto-increment in the SQL database).
- Is there maybe a solution that makes this a lot easier?
Source code:

static void LeesTxt(string rapport, string treinnaam)
{
    foreach (string textFilePath in Directory.EnumerateFiles(rapport, "*.txt"))
    {
        string textName = Path.GetFileName(textFilePath);
        textName = textName.Substring(0, textName.Length - 4); // strip ".txt"
        List<string> variablen = new List<string>();
        using (StreamReader r = new StreamReader(textFilePath))
        {
            for (int x = 0; x <= 10; x++) r.ReadLine(); // skip the header lines
            string output;
            while (true)
            {
                output = r.ReadLine();
                if (output == null) break;
                if (Regex.IsMatch(output, @"^\d"))
                {
                    variablen.Clear();
                    string[] info = output.Split(' ');
                    int kolom = 6;
                    datum = info[0];
                    string[] datumTijdelijk = datum.Split(new[] { '/' });
                    try { datum = string.Format("{2}/{1}/{0}", datumTijdelijk[0], datumTijdelijk[1], datumTijdelijk[2]); }
                    catch { datum = "0002/02/02"; }
                    try { tijd = info[1]; }
                    catch { Debug.WriteLine(tijd); tijd = "00:00:00.000"; }
                    try
                    {
                        foutcode = info[2];
                        absentOfPresent = info[4];
                        teller = info[5];
                        omschrijving = info[6];
                    }
                    catch { }
                    while (kolom < info.Count() - 1)
                    {
                        kolom++;
                        omschrijving = omschrijving + " " + info[kolom];
                    }
                    PushFoutenToSqlDb(datum, tijd, foutcode, textName, omschrijving, teller, absentOfPresent, treinnaam);
                }
                if (output == string.Empty) output = " ";
                if (char.IsLetter(output[0]))
                {
                    if (variablen.Contains(output)) output = output + "*";
                    try { PushExtraInfoToSqlDb(output, datum, tijd, foutcode, textName, teller, absentOfPresent, omschrijving, treinnaam); }
                    catch (Exception ex) { }
                    variablen.Add(output);
                }
            }
        }
    }
}

static void PushExtraInfoToSqlDb(string waarde, string datum, string tijd, string foutcode, string module, string teller, string mnemo, string omschrijving, string treinnaam)
{
    myCommand = new SqlCommand(
        "INSERT [Events].[dbo].[ExtraInfo] (Value, FoutId) VALUES (@waarde, " +
        "(SELECT FoutId FROM [Events].[dbo].[Fouten] WHERE Datum = @datum AND Time = @tijd AND Foutcode = @foutcode " +
        "AND TreinId = (SELECT TreinId FROM [Events].[dbo].[Treinen] WHERE Name = @treinnaam)))",
        myConnection);
    myCommand.Parameters.AddWithValue("@waarde", waarde);
    myCommand.Parameters.AddWithValue("@datum", datum);
    myCommand.Parameters.AddWithValue("@tijd", tijd);
    myCommand.Parameters.AddWithValue("@foutcode", foutcode);
    myCommand.Parameters.AddWithValue("@module", module);
    myCommand.Parameters.AddWithValue("@teller", teller);
    myCommand.Parameters.AddWithValue("@mnemo", mnemo);
    myCommand.Parameters.AddWithValue("@omschrijving", omschrijving);
    myCommand.Parameters.AddWithValue("@treinnaam", treinnaam);
    try { myCommand.ExecuteNonQuery(); }
    catch (Exception ex) { }
}

static void PushFoutenToSqlDb(string datum, string tijd, string foutcode, string module, string omschrijving, string teller, string absentPresent, string treinnaam)
{
    myCommand = new SqlCommand(
        "INSERT [Events].[dbo].[Fouten] (Datum, Foutcode, Omschrijving, Module, Time, Teller, Mnemo, TreinId) " +
        "VALUES (@datum, @foutcode, @omschrijving, @module, @tijd, @teller, @absentpresent, " +
        "(SELECT TreinId FROM [Events].[dbo].[Treinen] WHERE Name = @treinnaam))",
        myConnection);
    myCommand.Parameters.AddWithValue("@datum", datum);
    myCommand.Parameters.AddWithValue("@tijd", tijd);
    myCommand.Parameters.AddWithValue("@foutcode", foutcode);
    myCommand.Parameters.AddWithValue("@module", module);
    myCommand.Parameters.AddWithValue("@teller", teller);
    myCommand.Parameters.AddWithValue("@omschrijving", omschrijving);
    myCommand.Parameters.AddWithValue("@absentpresent", absentPresent);
    myCommand.Parameters.AddWithValue("@treinnaam", treinnaam);
    try { myCommand.ExecuteNonQuery(); }
    catch (Exception ex) { }
}
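One quick win that keeps the row-by-row structure above is to reuse a single open connection, prepare the parameters once, and wrap each file's inserts in one SqlTransaction so there is one log flush per file rather than one per statement. A minimal sketch, assuming the same (hypothetical) table and column names as the code above and only a few of the columns for brevity:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void PushBatch(IEnumerable<(string Datum, string Tijd, string Foutcode)> rows, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (SqlTransaction transaction = connection.BeginTransaction())
        {
            using (var command = new SqlCommand(
                "INSERT [Events].[dbo].[Fouten] (Datum, Time, Foutcode) VALUES (@datum, @tijd, @foutcode)",
                connection, transaction))
            {
                // Declare the parameters once; only the values change per row.
                command.Parameters.Add("@datum", SqlDbType.VarChar, 10);
                command.Parameters.Add("@tijd", SqlDbType.VarChar, 12);
                command.Parameters.Add("@foutcode", SqlDbType.VarChar, 50);
                foreach (var row in rows)
                {
                    command.Parameters["@datum"].Value = row.Datum;
                    command.Parameters["@tijd"].Value = row.Tijd;
                    command.Parameters["@foutcode"].Value = row.Foutcode;
                    command.ExecuteNonQuery();
                }
            }
            transaction.Commit(); // one commit for the whole batch instead of one per row
        }
    }
}
```

Committing once per file (or per few thousand rows) removes most of the per-statement commit overhead; it is still far slower than a bulk load, but it needs almost no restructuring of the existing code.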
As for these inserts: as has been pointed out, doing line-by-line inserts is not the right approach. Check out the SqlBulkCopy class. It is designed to write batch/bulk inserts to the database directly from code, with performance in mind.
An excerpt from the documentation:
Microsoft SQL Server includes a popular command-prompt utility named bcp for moving data from one table to another, whether on a single server or between servers. The SqlBulkCopy class lets you write managed code solutions that provide similar functionality. There are other ways to load data into a SQL Server table (INSERT statements, for example), but SqlBulkCopy offers a significant performance advantage over them.
Your best approach (because the data is coming from text files) is to create an in-memory DataTable in which you define a schema that matches the database schema. Populate that table with the data you want to insert and call the WriteToServer method, passing in the table.
Since SqlBulkCopy supports writing to one table per instance, you will have to execute it twice: once for the Fouten table and once for the ExtraInfo table. You will also have to pull in beforehand the information you currently fetch with the SELECT statements inside your existing INSERT statements; that data can then be used to populate the DataTable before you pass it to the SqlBulkCopy instance. As SqlBulkCopy does not work with transactions, you will have to sanitize your data before the insert, since you have no easy way to roll back if there is a validation error on one or more records.
Pseudo code:
1. Create an in-memory Dictionary with key = Name, value = TreinId, and populate it with the existing data using a DataReader.
2. Create a DataTable Fouten that matches the Fouten table schema.
3. Populate that table with the data from the text files, using the dictionary created in step 1 to resolve the TreinId.
4. Call SqlBulkCopy to upload the data.
5. Create an in-memory Dictionary with key = date+time+foutcode+treinnaam, value = FoutId, and populate it with the existing (new) data using a DataReader. This seems convoluted; maybe there is a better way to define the lookup?
6. Populate an ExtraInfo DataTable with the data from the text files, using the dictionary created in step 5 for the FK value to Fouten.
7. Call SqlBulkCopy to upload the data.
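The first four steps above can be sketched roughly as follows. The table, column, and schema names are assumptions carried over from the question, not verified against the real database:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void BulkUploadFouten(string connectionString, IEnumerable<string[]> parsedLines)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // Step 1: cache Name -> TreinId once with a DataReader,
        // replacing the per-row SELECT subquery.
        var treinen = new Dictionary<string, int>();
        using (var cmd = new SqlCommand("SELECT Name, TreinId FROM [Events].[dbo].[Treinen]", connection))
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                treinen[reader.GetString(0)] = reader.GetInt32(1);
        }

        // Step 2: an in-memory DataTable whose schema matches dbo.Fouten (assumed columns).
        var fouten = new DataTable("Fouten");
        fouten.Columns.Add("Datum", typeof(string));
        fouten.Columns.Add("Time", typeof(string));
        fouten.Columns.Add("Foutcode", typeof(string));
        fouten.Columns.Add("TreinId", typeof(int));

        // Step 3: populate from the parsed text-file lines, resolving the FK locally.
        foreach (string[] info in parsedLines)
            fouten.Rows.Add(info[0], info[1], info[2], treinen[info[3]]);

        // Step 4: one bulk round trip per batch instead of one round trip per row.
        using (var bulk = new SqlBulkCopy(connection))
        {
            bulk.DestinationTableName = "[Events].[dbo].[Fouten]";
            bulk.BatchSize = 5000;
            bulk.WriteToServer(fouten);
        }
    }
}
```

Steps 5-7 repeat the same pattern for ExtraInfo, with a SELECT of the newly generated FoutId values feeding the second dictionary.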
Other considerations for the existing structure:
- Run SQL Server Profiler and see if there are other factors contributing to the slow inserts that you are not aware of (for example, a badly tuned SELECT statement inside an INSERT). If a single INSERT statement as you have defined it is already expensive, doing the above with SqlBulkCopy will have zero impact.
- Your tables are large, as indicated in the comments. In your INSERT statements you are inserting one row at a time and doing a SELECT for each statement. If indexes are not defined, you are executing a table scan for each SELECT in each INSERT, which leads to a huge performance loss. The best fixes here:
  - Remove the SELECT from the INSERT by caching, where possible, the list of values in memory in a Dictionary. Use a DataReader to accomplish this.
  - Tune your indexes to ensure they are being used in the SELECT part of the INSERT statements.
- Make sure the data defined in the INSERT command matches the data types in the schema. For example: if you have an NVARCHAR column but define the parameter type as VARCHAR, it will take longer; likewise, having a BIGINT in the database but passing an Int32 from code takes longer, because these have to be converted.
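AddWithValue, as used in the code above, is a common source of exactly this mismatch: it infers NVARCHAR for every C# string. A small sketch of declaring the type explicitly instead (the column type and length here are assumptions, not the real schema):

```csharp
using System.Data;
using System.Data.SqlClient;

// AddWithValue sends every C# string as NVARCHAR; if the column is VARCHAR,
// SQL Server must convert, which can turn an index seek into a scan.
var command = new SqlCommand(
    "SELECT TreinId FROM [Events].[dbo].[Treinen] WHERE Name = @treinnaam",
    myConnection);

// Inferred as NVARCHAR -- possible implicit conversion against a VARCHAR column:
// command.Parameters.AddWithValue("@treinnaam", treinnaam);

// Explicit type and length matching the (assumed) column definition:
command.Parameters.Add("@treinnaam", SqlDbType.VarChar, 100).Value = treinnaam;
```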
- Make sure your tables do not have indexes or statistics that are expensive to update, which every insert operation will have to maintain. Disable them and rebuild once you are done, if they are causing problems.
- Check the auto-growth settings on your database and the amount of space reserved for the log (and also the data file). The default used to be 10 MB, and for a large database growing in 10 MB increments whenever it runs out of space is expensive, and it will happen often with a large number of inserts. Here you should increase the size of the data files up front; this also reduces fragmentation on disk. See "Considerations for the 'autogrow' and 'autoshrink' settings in SQL Server" for more info.
- Check for triggers on your tables. If you have INSERT triggers, these can lead to significant performance degradation.