Thursday, August 12, 2010

Membership Determination Problems

So the majority of today has gone into getting my membership-probability program to work. It's now giving reasonable numbers, though I suspect I'm fundamentally misunderstanding some basic statistics and getting it completely wrong.

So I'm using distance from the cluster center and the RA/Dec proper motions given by ASCC-2.5. The probability of a data point is then just the value of the Gaussian at that point. But the membership probabilities I see in WEBDA seem much higher. How do they get a 98% membership probability when the peak of the Gaussian, 1/(sigma*sqrt(2*pi)), is less than that? And other places speak of a ~68% chance as the 1-sigma confidence interval, but being 1-sigma from the mean doesn't mean the value of the Gaussian there is 68%. Doesn't a PDF only acquire meaning when you integrate it and ask for the probability of landing between two values?
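For my own reference, the distinction seems to be between the density at a point (which is not a probability, and can even exceed 1 for small sigma) and the integrated probability over an interval. A quick sanity check with scipy.stats.norm (variable names are mine):

```python
from scipy.stats import norm

sigma = 1.0

# Density at the mean: 1/(sigma*sqrt(2*pi)) ~ 0.399. Not a probability.
peak = norm.pdf(0.0, loc=0.0, scale=sigma)

# Integrated probability of falling within +/- 1 sigma: ~0.6827.
# This is where the famous "~68%" comes from.
within_1sigma = norm.cdf(1.0) - norm.cdf(-1.0)

print(peak)           # ~0.3989
print(within_1sigma)  # ~0.6827
```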

So I was thinking along the integration lines, and started computing each star's probability as the fraction of the PDF it beats: a star at the center gets 1 and one at infinity gets 0. Though I guess a star at 1-sigma then gets ~32%, not ~68%. I don't know.
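Writing out what that scheme computes: for a 1-D Gaussian, the fraction of the distribution with lower density than a point x is just the two-sided tail probability 2*(1 - Phi(|z|)), which is indeed ~0.32 at 1-sigma. A sketch (function name is mine):

```python
from scipy.stats import norm

def tail_fraction(x, mu=0.0, sigma=1.0):
    """Fraction of a 1-D Gaussian with density below that at x,
    i.e. the two-sided tail probability 2*(1 - Phi(|z|))."""
    z = abs(x - mu) / sigma
    return 2.0 * norm.sf(z)  # sf(z) = 1 - cdf(z)

print(tail_fraction(0.0))  # 1.0 at the center
print(tail_fraction(1.0))  # ~0.317 at 1 sigma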

Also, how do I bring together three different probability sources? No matter how good a star is, multiplying the probabilities, which seems like the formulaic thing to do, will only lower its score. The new data confirm that it's a good point, yet the probability goes down.
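One resolution I've seen in the cluster-membership literature (the Vasilevskis/Sanders-style approach) is to multiply likelihoods under both a cluster model and a field model, then normalize: p = L_c / (L_c + L_f). Each new measurement that favors the cluster then raises p rather than shrinking it. A sketch with made-up numbers (the function and all values are mine, not my actual code):

```python
def membership(cluster_likelihoods, field_likelihoods):
    """Combine independent likelihoods under a cluster model and a
    field model, then normalize: p = L_c / (L_c + L_f)."""
    lc = 1.0
    lf = 1.0
    for c, f in zip(cluster_likelihoods, field_likelihoods):
        lc *= c
        lf *= f
    return lc / (lc + lf)

# A star that looks cluster-like in each of three dimensions
# (position, pm_RA, pm_Dec): adding dimensions *raises* p.
p1 = membership([0.3], [0.03])              # one dimension: ~0.91
p3 = membership([0.3] * 3, [0.03] * 3)      # three dimensions: ~0.999
print(p1, p3)
```

The raw product of the cluster likelihoods still shrinks, but the ratio against the field hypothesis grows, which is what a membership probability should track.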

I feel like I should understand this stuff.

On the bright side, I started a basic run of the code before I left Cahill, and will check it in the morning. I only need to decide on the metallicity grid to add that dimension, which should be finished tomorrow. I also want to get a book on the Cross-Entropy method from the library to research parameter tuning.

After I finish those details, I need to figure out bootstrapping and implement it. I don't understand bootstrapping yet, but it shouldn't take long to pick up. Then I'll test the program's performance: my plan is to create a synthetic cluster and see whether I can recover the input parameters. I also want to determine how the binary population, the use of weights, the IMF, the smoothing parameters, and the isochrone source (Padova, etc.) influence the output. I plan to finish all of this next week and get started on the mass-fitting protocol, which should be simple compared to this process.
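For later reference, the core bootstrap idea is just: resample the data with replacement, re-run the fit on each resample, and take the spread of the refit results as the error estimate. A minimal sketch, with the mean standing in for the real fitting procedure (all names are mine):

```python
import random

def bootstrap_error(data, estimator, n_resamples=1000, seed=42):
    """Estimate the uncertainty of `estimator` by refitting it on
    resamples drawn with replacement from `data`, and returning the
    sample standard deviation of the resulting estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]
        estimates.append(estimator(resample))
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return var ** 0.5

# Toy data; the estimator here is the mean, but it could be any fit.
data = [9.8, 10.1, 10.4, 9.9, 10.2, 10.0]
err = bootstrap_error(data, lambda s: sum(s) / len(s))
print(err)
```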

Once those are done, I only need to run clusters through the pipeline and get masses back. Clusters with previously unpublished values could also be computed and possibly included in the survey. I believe I'll be able to get the necessary stars.

Oh- and I'm writing my second progress report this weekend.
