LDD does not use multi-core CPU?

Posted by Lego Otaku

I loaded up a monster 30,000-piece set, then used the explode feature while watching Performance Monitor. Total CPU usage on my desktop never went past 16%, even though it took LDD several seconds just to start and another 10 seconds of stuttering before it could un-explode. I also tried the same thing on my laptop and never got past 50%.

16% of an AMD 1090T happens to equal exactly one of its six cores, and 50% of an Intel Core 2 Duo is exactly one of its two. So I am pretty sure LDD is using only one core of a multi-core CPU.

Using LDD 4.17 BTW.

Any reason LEGO won't tweak LDD to use more cores? Multi-core CPUs have been around for more than five years now, and almost every mainstream computer sold today has one; only the really cheap models (some Walmart sub-$200 machines) may still have a single-core CPU. If LDD were optimized for multiple cores, it might help ambitious users build insanely large projects without their computers crawling along.

I've been wishing they would rewrite their code for multi-CPU support, since I really, really, really need it. But I'm guessing, and from what I've heard, it's not that easy: they would have to redo a lot of the LDD program to make it work on multiple cores.

While multi-CPU computers have been around for a while, the main customers for LDD rarely use it for such large projects, and its core purpose is the Design by Me aspect. Though they have done a lot for us fans :grin: so we'll see.

I've got a quad-core, and LDD runs at 25% max (if you open up Task Manager, it shows you per-process CPU usage).

To think...I could put my city together for once :laugh:

> To think...I could put my city together for once :laugh:

I could build my spaceships :wub:

Imagine: in a couple of years, two six-core CPUs on one motherboard will be affordable.

LDD goes from using 1 core to 12........ 'scuse me, I need clean pants........

> Imagine: in a couple of years, two six-core CPUs on one motherboard will be affordable.
>
> LDD goes from using 1 core to 12........ 'scuse me, I need clean pants........

I'm sure there are some new Mac Pro owners out there who wish LDD could do that right now...

(myself not included, sadly. Though a dozen cores is overkill for what I do)

Writing programs for multiple processors isn't always easy, or even beneficial. Reading and writing files is bound by your drive - the processor is already faster than you can read from and write to the disk, so multiprocessing doesn't help disk speed.

Rendering could be helped, but it depends on how it's already being done. For most people, using hardware acceleration instead of a software-based library would help speed the most, but then you have other issues. I don't know what LDD uses, but if it's Direct3D, you'll never see this app on a Mac or any other OS. A lot of people could benefit if they used OpenGL, though.

Multiprocessing is really only useful for heavy computation. Games can take advantage because they can process physics, sound effects, and music on different processors... programs like Photoshop benefit because you can usually apply algorithms simultaneously to many subdivisions of an image. I'm not seeing where you'd get a huge benefit with LDD.

I'm not suggesting LDD wouldn't be better with multiprocessor support (as far as a program is concerned, multi-core = multi-processor), but I don't think you'd see the benefits you seem to assume you'd get.

Moreover, you could keep doubling your cores and even general-purpose programs written for multiple processors wouldn't get much faster... there's only so much work you can distribute; most of it has to be done serially.
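
That last point is what Amdahl's law formalizes: if a fraction $p$ of the work can run in parallel on $N$ cores, the best possible speedup is

$$S(N) = \frac{1}{(1 - p) + p/N}$$

so even with half the work parallelizable ($p = 0.5$), infinitely many cores can never make the program more than twice as fast.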

3D design software like LDD is one of a class of applications that can take advantage of multiple CPU cores very efficiently - but only if the application was architected for this in the first place. LDD has to do a large number of geometry calculations every time you move or rotate the model. In theory it could offload much of this to the graphics card, but the capabilities of the graphics cards out there and the graphics library versions vary so much that it may be better to perform the geometry calculations on the main CPU (hopefully across multiple cores) and offload only the rendering to the graphics card.
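
(For concreteness, and as a simplification: much of that per-part geometry work boils down to applying a 4x4 homogeneous transform to every vertex, v' = M v. Each vertex is independent of the others, which is exactly why this kind of work splits cleanly across CPU cores, or across the GPU's many parallel units.)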

Disk I/O should not be a bottleneck for LDD, as the entire part library can easily be cached in RAM.

> I'm not suggesting LDD wouldn't be better with multiprocessor support (as far as a program is concerned, multi-core = multi-processor), but I don't think you'd see the benefits you seem to assume you'd get.

For really large designs (over 10,000 pieces), LDD could theoretically benefit from multi-core support rather than slowing to a crawl because it's already maxing out a single core.

> For really large designs (over 10,000 pieces), LDD could theoretically benefit from multi-core support rather than slowing to a crawl because it's already maxing out a single core.

It doesn't necessarily work like that: if operations need to be carried out sequentially, then multi-core won't help.

Well, I haven't had to deal with parallelizing code in over 15 years, and I haven't done 3D programming for almost as long.

The problem with multiprocessing geometric transforms is fighting over memory. Rendering is highly parallelizable, because each processor writes to its own sectioned-off chunk of memory while all of them can read the already-transformed geometry without penalty (or with little penalty, anyway). Transforms are better done with a vector processor - the kind you'd find on a GPU.

LDD already requires OpenGL... so if you've got hardware acceleration, LDD should be taking advantage of it.

Maybe one of the problems here is that you're seeing maxed-out CPUs but not getting the same information about your GPUs... maybe the GPUs are the ones maxing out.

> Maybe one of the problems here is that you're seeing maxed-out CPUs but not getting the same information about your GPUs... maybe the GPUs are the ones maxing out.

If only that were so. I've had to swap out graphics cards (my last one fried from LDD, coincidentally), and the gap in performance between the cards was large, but LDD's performance didn't change at all.

One of the main stress points most people see is that when moving a large number of bricks, LDD slows to a crawl. While the rendering aspect of transforming the visual geometry might be done on the GPU (I'm not really all that knowledgeable about 3D programming, so correct me if I'm wrong :tongue: ), the calculations needed to figure out connection points for the pieces are most likely done on the CPU.

In cases where I've torn apart a section only to put it back together later, the program figures out all the hundreds of connection points that were originally there incredibly well, and if I've moved the chunk to just the right position it fits together - but that takes a lot of processing power. When I select the entire model to move it, there's usually never much of a slowdown.

What might be the real issue is memory, on both the RAM and the CPU side. Once again my technical knowledge is limited, but my guess is that the giant calculations send data to the CPU to process, and it gets bottled up in the bus before the CPU can even handle everything - so even if you cranked up the processing speed of the CPU, too much data is being sent at once. LDD uses huge amounts of memory when models get big, but I've noticed that long before my memory is capped out (LDD maxes out at maybe 75% of total RAM, including other processes), LDD will slow/crash/hurt. So I don't think RAM is the main issue, but it's definitely indicative of what could be the problem: extremely large calculation sets.

Also, I've had instances where a move crashed LDD after a long succession of moves, yet after a restart LDD had no problem with the same move. LDD tends to build up its memory usage with further actions (another example: I've used the undo button and LDD has crashed).

What could be a benefit of multi-core functionality is that the code could send the large calculations to the multiple cores in a more efficient manner, as opposed to one big linear calculation dump to a single core. But again, I have no idea whether that's true or whether I'm completely misunderstanding how this stuff works.

Hope my experience helps give a better understanding of what's going on :sweet:

With typically several layers of cache memory, I don't think memory bandwidth would be the biggest issue - but as I mentioned before, things like that are hard to parallelize, because later calculations often depend on earlier ones; most of it has to be done serially.

Even if the data could be arranged so it parallelizes, it seems to me memory would then become an issue, because different processors would be trying to write to the same pages of memory at the same time.

It all comes down to the fact that multicore and multiprocessing aren't the panacea people think they are.

I can't tell you how the 3D software we use at work parallelizes the interactive work, although I'm sure it's multithreaded for certain operations to keep the interface responsive (in which case threads could run on other processors); the greatest advantage is realized during rendering (not graphics-adapter rendering, but generating the series of image files that will later be composited together). In fact, even with a farm of multicore render boxes, we have scripts that break up the rendering, send the portions to different boxes, and then put them back together at the end.

I'm not arguing that TLG couldn't do this better and faster, but it's unlikely you'd see a huge gain. Frankly, most good compilers should be able to enable optimizations for SMP at compile time without any changes to the code - just a recompile. Of course, the style of programming makes those optimizations easier or harder, too.
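
To give a rough idea of what "just a recompile" can look like in practice (OpenMP is used here purely as a stand-in; whether TLG's toolchain offers anything similar is an assumption):

```cpp
// Compile with e.g. g++ -fopenmp. One pragma asks the compiler to split
// the loop across cores; illustrative only, not LDD's actual code.
#include <vector>

void scaleAll(std::vector<float>& data, float factor) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(data.size()); ++i)
        data[i] *= factor;  // iterations are fully independent
}
```

Even here, the pragma only works because the loop iterations are independent of each other; true automatic parallelization without such hints is far less effective, as the next reply points out.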

Having been in software development for 15+ years myself, I've learned a lot about multithreading, 3D graphics, etc. along the way. First of all, porting a single-threaded program to multithreading is not easy at all - it requires a lot of additional work to synchronize the threads. Maybe only some parts could be multithreaded (like calculations, where most of the data is only read and not written): the main thread would split into multiple threads, each would calculate a part of the result data, and the results would be merged afterwards. A fully native multithreaded application would need to be rewritten from scratch, which TLG almost certainly won't do.
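
A minimal sketch of that read-mostly split/merge pattern might look like this (illustrative names only, not LDD's actual code): worker threads read shared input and each writes only its own slice of the output, so the "merge" is free because the slices are disjoint.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for a real per-element geometry calculation.
void transformSlice(const std::vector<float>& in, std::vector<float>& out,
                    std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        out[i] = in[i] * 2.0f;
}

void transformAll(const std::vector<float>& in, std::vector<float>& out) {
    out.resize(in.size());
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::size_t chunk = (in.size() + n - 1) / n;  // split work into n slices
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(in.size(), begin + chunk);
        if (begin >= end) break;
        workers.emplace_back(transformSlice, std::cref(in), std::ref(out),
                             begin, end);
    }
    for (auto& w : workers) w.join();  // the only synchronization needed
}
```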

As you correctly found out, moving lots of parts really slows things down, since the number of calculations goes up really fast.

For example (without any optimization): if you have 200 bricks with 1,000 free connection points and you move one brick with 10 connection points, that's about 10,000 checks (1,000 x 10). If instead you move 10 bricks with 100 connection points, that's 100,000 checks (10x more -> 10x slower). At a "normal" refresh rate of, say, 25 fps (and that's pretty low), that's 2.5 million checks per second (again, without optimization, and with made-up numbers).
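
In code, the brute-force version of that check might look something like this (a hypothetical sketch with made-up names and tolerances, not LDD's actual algorithm) - the cost is simply moved points times free points, every frame:

```cpp
#include <vector>

struct Point3 { float x, y, z; };

// Squared-distance test against an arbitrary snapping tolerance.
bool closeEnough(const Point3& a, const Point3& b, float tol = 0.1f) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz < tol * tol;
}

long countCandidateConnections(const std::vector<Point3>& movedPoints,
                               const std::vector<Point3>& freePoints) {
    long hits = 0;
    for (const auto& m : movedPoints)       // 10 or 100 moved points...
        for (const auto& f : freePoints)    // ...times 1,000 free points
            if (closeEnough(m, f)) ++hits;  // 10,000 or 100,000 tests
    return hits;
}
```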

And that's just the connection points. Then there are the geometry overlaps, of which each brick has A LOT (though probably a first pass checks the overlapping bounding boxes, which is quite fast).

One solution could be diving into OpenCL (which utilizes GPUs, CPUs, and all other compatible processing units in the computer), but I'm 99.9% sure they won't choose that path, since half of LDD's users don't even have computers capable of the "draw outlines" feature.

> Frankly, most good compilers should be able to enable optimizations for SMP at compile time without any changes to the code - just a recompile.

If only life were that easy. I started (but for various reasons never completed) a PhD on trying to do exactly that. It's a really hard problem, even for relatively simple programs. There are certain programming paradigms that lend themselves more to parallelization, but often they do so by effectively spending a lot of extra CPU power on work that might turn out to be unnecessary. We might reach a point where that's a worthwhile trade, but we're not there yet.

And, believe it or not, LDD is multithreaded to some extent at least: sitting idle on this machine here, Task Manager is showing 19 threads in LDD. I suspect the problem is just that parallelizing the calculations for connection points and brick positioning is non-trivial. And, let's not forget, the primary purpose of LDD is to make Design by Me sets, and it's already possible to build models bigger than the Design by Me service can accommodate, so it's not as if anybody within LEGO is likely to see supporting bigger models as a priority.

Hi,

I can only guess that LDD works in a way similar to SR 3D Builder (the LEGO application I'm developing), so:

When you select one or more parts to move, LDD (or any similar application) needs to copy the geometry to what I call "the selection buffer", which represents the current selection you can move, rotate, copy, delete, etc.

The bigger the selection, the longer the program takes to copy the geometry. Parallelism is a bit hard to do here, since all the geometry needs to be copied into the same buffer. At the same time, the geometry of the parts remaining in the model should be packed to avoid memory holes, and here parallelism could be implemented!
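
A hedged sketch of that selection-buffer idea (SR 3D Builder's internals aren't public, so all names here are illustrative):

```cpp
#include <algorithm>
#include <vector>

struct PartGeometry {
    bool selected = false;
    std::vector<float> vertices;  // flattened xyz triples
};

std::vector<PartGeometry> extractSelection(std::vector<PartGeometry>& model) {
    std::vector<PartGeometry> selection;
    // Copying the selection funnels everything into one destination
    // buffer - the step that is hard to parallelize.
    for (const auto& part : model)
        if (part.selected) selection.push_back(part);
    // Packing the remaining parts to close the "memory holes" - the step
    // where parallelism could help.
    model.erase(std::remove_if(model.begin(), model.end(),
                               [](const PartGeometry& p) { return p.selected; }),
                model.end());
    return selection;
}
```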

The UNDO function also takes a long time, because the geometry needs to be moved back into the original model - and again, since the geometry is read from a single buffer, parallelism is hard to implement.

With a 30,000-part model, it's probably the GPU that gets maxed out, since it can't process all the rendering requests.

While moving a selection, the CPU does very little work: it just changes the values of a couple of matrices and runs the rendering routines, which consist of sending chunks of model geometry and the required matrix transformations to the GPU.

I don't know whether LDD uses a chunk for every single part of the model. If it does, then sending 30,000 chunks to the GPU is a relatively long operation and may cause CPU load!

Anyway, all geometric transformations are based on the modified matrices that get sent over and should be performed by the GPU. If there are a lot of triangles, it takes a bit longer... or much longer, resulting in non-smooth movement.
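
To make that matrix-plus-chunk flow concrete, here is a hypothetical illustration in classic fixed-function OpenGL (an assumption - nothing confirms LDD renders this way): the CPU uploads only 16 floats per chunk and issues one draw call, while the GPU applies the transform to every vertex.

```cpp
#include <GL/gl.h>

struct Chunk {
    GLuint displayList;      // pre-uploaded geometry for one part (assumed)
    GLfloat modelMatrix[16]; // that part's current transform
};

void drawModel(const Chunk* chunks, int count) {
    for (int i = 0; i < count; ++i) {
        glPushMatrix();
        glMultMatrixf(chunks[i].modelMatrix); // a couple of matrix values...
        glCallList(chunks[i].displayList);    // ...and one draw per chunk
        glPopMatrix();
    }
    // With 30,000 chunks, this loop itself becomes the CPU-side cost
    // described above, even though each iteration is cheap.
}
```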

Hope this comment is not too technical :wink:

Sergio

> If only life were that easy.

No... I know it's not, but good compilers can do some wonderful optimizations; I realize it's not a "silver bullet" to just recompile with optimizations... if your code doesn't lend itself to being parallelized, then it doesn't lend itself to being parallelized.

The simplest use of multiple cores right now is simply letting the OS run separate processes on separate cores. If you're only doing one intensive thing, though, you don't see much improvement.

I'm loving this conversation! :wub: It's very nice to see so many programmers on EB.
