A. The methods differ in whether the likelihood is based on the conditional distribution [y | x] or the joint distribution [y, x]. Methods 1, 3, and 4 use the latter approach, which is typical for SEM, whereas the other methods use the former approach, which is more generally applicable (the choice makes a difference for some models).
B. If you sharpen the convergence criterion, you will probably get 0.916 here as well.
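One way to write the distinction in point A: the joint log-likelihood always splits into a conditional part for [y | x] and a marginal part for [x]. Assuming, as an illustration, that the parameters of the two parts (written here as θ_{y|x} and θ_x) are functionally unrelated, maximizing the joint likelihood over θ_{y|x} gives the same estimate as maximizing the conditional part alone. The joint approach differs in that it also requires a distributional model for [x].

```latex
\log L(\theta_{y\mid x}, \theta_x;\, \mathbf{y}, \mathbf{x})
  = \underbrace{\sum_{i=1}^{n} \log f(y_i \mid x_i;\, \theta_{y\mid x})}_{\text{conditional part, } [y \mid x]}
  + \underbrace{\sum_{i=1}^{n} \log f(x_i;\, \theta_x)}_{\text{marginal part, } [x]}
```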
When all y's are continuous, normal-theory ML gives the same results whether the conditional or the joint approach is taken; this is shown in a JASA article by Joreskog & Goldberger (1975?). When y's are categorical, the conditional approach makes weaker assumptions, as I argued in my 1984 Psychometrika article. In general, the statistical literature tends to favor the conditional approach, because it makes distributional assumptions for the residuals rather than for the whole y distribution. Why make a distributional assumption for [x] when you don't have to?
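A small numerical sketch of the continuous-y case, assuming the simplest possible model: one continuous y regressed on one x, y = a + b*x + e with normal e. It is not the SEM setup from the thread, and the function names `conditional_ml` and `joint_ml` are hypothetical. The conditional approach fits [y | x] directly by ML. The joint approach fits a bivariate normal to [y, x] and then re-expresses the ML mean vector and covariance matrix as the regression of y on x. The x values are deliberately drawn from a non-normal distribution to show that the point estimates still coincide, because here the equality is algebraic.

```python
import numpy as np


def conditional_ml(x, y):
    """ML for [y | x] with normal residuals: y = a + b*x + e, e ~ N(0, s2)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ML = least squares here
    resid = y - X @ coef
    s2 = resid @ resid / len(y)  # ML residual variance uses divisor n, not n - 2
    return coef[0], coef[1], s2


def joint_ml(x, y):
    """ML for [y, x] assuming bivariate normality, then reparameterized
    as the regression of y on x."""
    data = np.column_stack([x, y])
    mu = data.mean(axis=0)                      # ML mean vector
    S = np.cov(data, rowvar=False, bias=True)   # ML covariance (divisor n)
    b = S[0, 1] / S[0, 0]                       # slope = cov(x, y) / var(x)
    a = mu[1] - b * mu[0]                       # intercept
    s2 = S[1, 1] - S[0, 1] ** 2 / S[0, 0]       # residual variance var(y | x)
    return a, b, s2


rng = np.random.default_rng(0)
n = 500
x = rng.exponential(scale=1.0, size=n)          # non-normal x on purpose
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

cond = conditional_ml(x, y)
joint = joint_ml(x, y)
print("conditional (a, b, s2):", np.round(cond, 4))
print("joint       (a, b, s2):", np.round(joint, 4))
print("same estimates:", np.allclose(cond, joint))
```

The agreement covers the point estimates of the regression parameters. The joint version still carries the extra assumption that [x] is normal, which is exactly the assumption the conditional approach avoids.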