Memory error when looping through log files


I am attempting to anonymize a directory full of trajectory log files and want it to be flexible in terms of the number of files it can handle.

I am currently looping the list of files like this:

for file in log_files:
    log = load_log(file)
    log.anonymize()

This however causes memory errors after around 300 files (there are currently ~700 in the directory).

I also tried the anonymize module, but although it reported that it had completed, only around 200 anonymized files were actually created, hence trying to loop through the files myself.

I’ve tried adding steps (based on googling) such as:
del log
gc.collect()

Any ideas how I can get around the memory issue without having to manually wait to see how far it gets and then start again from that point?



Hey Matt, unfortunately there is a memory leak I have yet to find and squash related to this workflow. The number of files you can get through depends on the memory your system has. It’s not pretty, but my hack would be to separate the files into folders of smaller chunks (maybe 100 in your case?) and then run the script on each folder. You could also have a post-processing line that moves the original log file out of the folder when done. That way, if it errors out, you can run it again without losing your place. =/
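That chunk-and-move hack could look something like this. This is just a sketch, not code from the library: `process_file` is a hypothetical stand-in for the real per-file call (e.g. loading the log and anonymizing it), and moving each original out as it completes means a crashed run resumes where it left off.

```python
import os
import shutil

def process_in_chunks(src_dir, done_dir, process_file, chunk_size=100):
    """Run process_file on each file in src_dir in small batches, moving
    each original into done_dir once processed, so a crash loses no progress."""
    os.makedirs(done_dir, exist_ok=True)
    files = sorted(
        os.path.join(src_dir, name)
        for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )
    for start in range(0, len(files), chunk_size):
        for path in files[start:start + chunk_size]:
            process_file(path)  # e.g. lambda p: load_log(p).anonymize()
            # Move the finished original out so a re-run skips it.
            shutil.move(path, os.path.join(done_dir, os.path.basename(path)))
```

If the script dies mid-chunk, everything already moved into `done_dir` is skipped on the next run.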

Shout out to the community to see if anyone has managed to find the memory leak in the log file analysis?
Having to manually split files into various folders and restart each time we complete a set of files is ‘OK’ for research purposes, but for routine use it isn’t really desirable and limits what we want to do in the future.



Hi Matt:

Does the memory leak happen only when you try to anonymize the logs, or also when you simply loop over the log files?
I routinely perform “for loops” to extract information from logs, and I can process more than 5000 trajectory logs without problems.



Just tested and I get the same error even without anonymizing.
In the loop below, ‘log_files’ is just a list of the file locations:

num = 0
for file in log_files:
    log = load_log(file)
    # log.anonymize(inplace=False)  # this gets to approx 120 of 688 files before giving a memory error
    value = log.axis_data.mlc.get_RMS_avg()  # this gets a similar number of files before a memory error
    num = num + 1
    print(num, 'of', len(log_files), value)
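One general way to work around a leak like this without finding it (a sketch, not anything from the library itself) is to handle each file in a short-lived worker process: whatever memory leaks while one file is processed is reclaimed by the OS when that worker exits. `_work_one` below is a hypothetical stand-in for the real per-file call (e.g. `load_log(path)` then `get_RMS_avg()`).

```python
import multiprocessing as mp

def _work_one(path, queue):
    # Hypothetical stand-in for the real per-file work; it puts its
    # result on the queue before the worker process exits.
    queue.put(len(path))

def process_isolated(paths):
    """Run the per-file work in a fresh process for each file, so any
    memory leaked while handling one file is returned to the OS when
    that worker process exits."""
    results = []
    for path in paths:
        queue = mp.Queue()
        proc = mp.Process(target=_work_one, args=(path, queue))
        proc.start()
        results.append(queue.get())  # blocks until the worker produces its result
        proc.join()
    return results
```

Spawning one process per file is slow, but on a 4GB 32-bit machine it caps memory growth at one file's worth.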

[Screenshot of memory increase]


Note: the PC is Win 7, 4 GB RAM, 32-bit.