[Disclaimer: I'm a "punk statistician", not a real one :-P]
You're saying that there may be an arbitrarily complex relationship between your variables and the target, and you want to model this relationship... that's generally a really difficult problem. Typically the way one goes about this is to assume the relationship follows some parametric model: e.g., linear, polynomial, exponential. Then you can try to fit the model to the data and see how good the fit is. In the simplest case you'd test for a linear relationship and look at correlation or R2 score, as you mentioned. As you pick more and more complex models, you need to start worrying about the bias variance dilemma [i.e., your model might get a perfect fit, but it'll be fitting noise in the training data; see Bias–variance dilemma].
The obvious question is: how do I pick a model to fit? No easy answer here, unfortunately. One way to do this is to just look at the data and get a feel for what the possible relationships may look like. Then pick the SIMPLEST model that you feel may capture the relationship (see again bias-variance). The ultimate test is to see how well your model can predict the target for new (or held out) data points.
Another option is to fit a non-parametric model. These models are great if your ultimate goal is to make predictions for novel data points (e.g., what will be the disease incidence if temperature rises to 45, and elevation is X?). You could try random forest regression [see 3.2.3.3.2. sklearn.ensemble.RandomForestRegressor], or radial basis functions [see Radial basis function]. The main disadvantage to these types of models is that while they may be able to make good predictions, they are not as interpretable. When relationships are really complex though, this is your best bet.
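The RBF idea can be sketched in a few lines of Python. This is a minimal kernel-smoother version I wrote for illustration (the function name and the toy data are made up, not taken from sklearn): each prediction is a Gaussian-weighted average of the observed targets.

```python
import math

def rbf_predict(x_new, xs, ys, gamma=1.0):
    """Predict y at x_new as a Gaussian-kernel-weighted average of the observed ys."""
    weights = [math.exp(-gamma * (x_new - x) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Toy data: y roughly follows x**2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.1, 3.9, 9.2, 15.8]

print(rbf_predict(2.5, xs, ys, gamma=2.0))
```

Note that `gamma` plays the role of a bandwidth: a large `gamma` makes the prediction hug the nearest observed point, a small one averages over everything. This is exactly the interpretability trade-off mentioned above: good predictions, but no tidy parametric story.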
If the one independent variable is categorical (information about group membership) and all of the outcome variables are quantitative, roughly normally distributed, and associations among them are linear, then Multivariate Analysis of Variance (MANOVA) could be used. (In SPSS this is obtained using GLM).
If all of the variables are quantitative, you could set up a regression using the scores for your “causal” independent variable as values of Y, and the scores for your outcome variables as values of X1, X2 etc. The regression would show you whether there is some weighted linear combination of scores on the quantitative X variables that is strongly related to the score on the single Y variable.
Y’ = b0 + b1X1 + b2X2 + … + bkXk
It’s conventional to think of the X’s as “causes” of Y in this regression situation, but the analysis is just a way of finding out whether scores on Y are related to some combination of scores on the X’s. Regression does not “know” anything about your assumptions regarding possible cause/effect.
Unless your research design was experimental, neither analysis tells you anything about cause and “effect”. Each analysis only tells you if the scores for your set of outcomes are statistically significantly related to the score on your single predictor variable.
I'll be unkind now. There is a high chance you will not get any more answers because you don't respect the people you're asking help from.
You could commit a bit more time to stating your question properly and to double-checking your spelling and grammar.
What is your hypothesis? What is your continuous variable?
What I understand from your brief description is that you want to compare if the guys choosing one color have a higher average in their continuous variable than the ones who are not choosing the color (same applies for color combinations).
Let's say that you have color blue, and the continuous variable is height. What is your hypothesis? Do you want for example to show that people who choose color blue (e.g. N1=15) are taller than those not choosing color blue (e.g N2=42)?
Then calculate the average for each group (the blues and the non-blues) and measure their difference. You can use permutation testing to check whether your result is statistically significant; a nice property of permutation testing is that it makes next to zero assumptions. It works by randomly permuting the members of the two groups while keeping the group sizes the same. So take the 15 + 42 = 57 people (your observations), assign them randomly into groups of 15 and 42 members, and calculate their mean difference again. Repeat this process many times, depending on the significance threshold you want. For example, repeat it 10,000 times and then sort the results (the mean differences). If the real difference you calculated is higher than, say, 9,900 of the permuted differences, your result is significant at a threshold of p < 0.01.
The conclusion of that test is that the groups are not defined randomly but rather there is a relationship between the color choice and the height, so that randomly divided groups would not have such a high difference as the one you observed from your data.
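The procedure above can be sketched in a few lines of Python (the group values here are made up for illustration):

```python
import random

def perm_test(group_a, group_b, n_perm=10000, seed=0):
    """One-tailed permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = group_a + group_b
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a = pooled[:len(group_a)]
        b = pooled[len(group_a):]
        if sum(a) / len(a) - sum(b) / len(b) >= observed:
            count += 1
    return count / n_perm  # fraction of permutations at least as extreme

# Hypothetical heights: the "blue" group vs everyone else
blue = [180, 182, 179, 185, 181]
other = [170, 172, 171, 169, 173, 168]
p = perm_test(blue, other)
print(p)
```

The returned value is the permutation p-value: if it falls below your chosen threshold (say 0.01), the observed difference is unlikely to be an accident of how the groups happened to split.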
Good luck with that
Environmental analysis involves examining external and internal factors to understand opportunities, challenges, and trends affecting an organization or system. Several models and frameworks are commonly used to analyze and interpret environmental factors:
1. PESTEL Analysis
Focuses on macro-environmental factors:
- Political: Government policies, regulations, and political stability.
- Economic: Market trends, inflation rates, economic growth, and exchange rates.
- Social: Cultural trends, demographics, and consumer behaviors.
- Technological: Innovations, R&D, and technological changes.
- Environmental: Sustainability issues, climate change, and environmental regulations.
- Legal: Laws affecting business, including labor laws, consumer protections, and intellectual property rights.
2. SWOT Analysis
Examines internal and external factors:
- Strengths: Internal advantages or capabilities.
- Weaknesses: Internal limitations or deficiencies.
- Opportunities: External chances for growth or improvement.
- Threats: External risks or challenges.
3. Porter’s Five Forces
Assesses industry-level competitiveness:
- Threat of new entrants: Barriers to entry for potential competitors.
- Bargaining power of buyers: Influence of customers on pricing and terms.
- Bargaining power of suppliers: Influence of suppliers on costs and inputs.
- Threat of substitutes: Availability of alternative products or services.
- Industry rivalry: Intensity of competition among existing players.
4. Value Chain Analysis
Breaks down internal operations to identify areas for value creation or improvement:
- Primary activities: Inbound logistics, operations, outbound logistics, marketing, and sales.
- Support activities: Procurement, technology development, HR management, and firm infrastructure.
5. Scenario Analysis
Explores multiple future possibilities by creating scenarios based on varying assumptions about key uncertainties (e.g., technological developments, regulatory changes, or market conditions).
6. Ansoff Matrix
Helps identify strategic growth opportunities:
- Market penetration: Existing products in existing markets.
- Product development: New products in existing markets.
- Market development: Existing products in new markets.
- Diversification: New products in new markets.
7. STEEP Analysis
Similar to PESTEL, but focuses on:
- Social
- Technological
- Economic
- Ecological
- Political factors.
8. McKinsey 7S Framework
Evaluates organizational effectiveness by focusing on seven interrelated factors:
- Strategy, Structure, Systems (hard elements).
- Shared values, Skills, Style, Staff (soft elements).
9. Benchmarking
Compares an organization’s performance, practices, or processes to industry standards or competitors to identify areas for improvement.
10. Environmental Scanning
A continuous process of collecting information on external trends and events using tools like:
- Surveys and reports.
- Media monitoring.
- Big data analytics.
11. BCG Matrix
Used to analyze a portfolio of business units or products:
- Stars: High growth, high market share.
- Cash cows: Low growth, high market share.
- Question marks: High growth, low market share.
- Dogs: Low growth, low market share.
12. Ecosystem Mapping
Identifies key stakeholders, relationships, and dynamics within a specific environment or system to assess opportunities for collaboration or intervention.
13. Force Field Analysis
Analyzes driving and restraining forces influencing change, helping decision-makers address barriers and leverage strengths.
14. Critical Success Factor (CSF) Analysis
Identifies areas essential for achieving organizational objectives and evaluates external factors impacting these areas.
These models provide diverse perspectives on analyzing the environment, allowing organizations to make informed strategic decisions and adapt to changing contexts effectively.
Multiple regression does not speak to the question of whether there is a correlation between two variables. To answer that, you must do a univariate regression. If the t-statistic on the slope coefficient is significantly different from zero, you can assert that the correlation between the dependent and independent variable is significantly different from zero with the same significance level.
One way to think about the slope coefficient for a variable in a multiple regression is it tells you whether the independent variable is correlated to the dependent variable after adjusting both for all the other independent variables. That’s not precisely true, but it gives you the general idea.
For example, you might find that owning a bicycle had a negative correlation to weight in a univariate analysis, but if you added amount of weekly exercise and age as independent variables, the effect of owning a bicycle on weight became insignificant. That would suggest that while bicycle ownership is correlated with weight, it operates as an indirect reflection of age and exercise, without statistically significant direct effect on weight.
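Here is a toy numerical illustration of that point in Python (all data fabricated): x1 is built to track x2, and y depends only on x2, so x1 looks predictive on its own but its slope collapses once x2 enters the model.

```python
def slope(x, y):
    """Univariate least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_var_slopes(x1, x2, y):
    """Solve the 2x2 normal equations for y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

# Fabricated data: y is driven by x2; x1 merely tracks x2
x2 = [1, 2, 3, 4, 5, 6]
x1 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]   # noisy copy of x2
y  = [2, 4, 6, 8, 10, 12]             # exactly 2 * x2

print(slope(x1, y))        # sizeable univariate slope: x1 "looks" predictive
b1, b2 = two_var_slopes(x1, x2, y)
print(b1, b2)              # b1 is essentially 0 once x2 is in the model
```

This is the bicycle/weight situation in miniature: the univariate slope of x1 is large, but its partial slope vanishes once the confounder is adjusted for.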
Statistical analysis does NOT "cause" anything!
However, depending on the nature of the data, it may reveal a causal connection.
Some programs like NVivo seek to do a similar thing with qualitative data, but this is NOT the same thing.
Because in humans the outcome is (almost) always affected by factors that you wonder if they matter.
In research I was closely involved in, two treatments for club foot were being compared to see which would be better able to achieve a good outcome without surgery.
Sounds pretty straight forward, doesn’t it? To make a long story short, one method required a lot of treatment of the feet at home, but had fewer trips to the hospital than the other, which was mostly hospital treatments of the foot. It turns out that the home based treatment was better, but required a lot of dedication to the treatment plan. Problem: dedication was hit and miss. When the parents did not stick to the treatment plan, the result was usually disappointing.
It turns out that the parents who stuck to the plan tended to come from certain socioeconomic groups, while the disappointing outcomes clustered among parents from other socioeconomic groups. If that information had not been recorded, the mixed results for the home treatment would have been hard to explain.
Hope that helps.
The wording of the question is a little hard to follow, but it sounds like a standard 8x2 factorial design in ANOVA, or a simple interaction in multiple regression, but split by the latter variable into two analyses (which is a standard practice to some, but not ideal since it loses power).
The simple solution, if you want to go the regression route (and it sounds like you do), is to use interaction terms in your regression. In linear regression, that’s as simple as multiplying each of your eight variables by your key “relatable conditions” variable, i.e. a second variable, such that your regression looks like this:
outcome = b0 + b1(var1) + b2(var2) + b3(var1)(var2) + … (repeat for the other vars) + e
In which each b (which is meant to be a beta, sorry) is a coefficient, and you basically do it 8 times against that second key variable. The interaction term (b3 in this case) represents the change in the effect of variable 1 in the presence of variable 2 (or per unit increase in variable 2); in linear regression it can simply be added to the main effects (in this case, b1 and b2). It's much better than splitting this into two regressions, since it gives you more statistical power: your sample includes both sides of that key variable (left/right, whatever).
Now, if you're really feeling savvy, technically you can compare the results of two multiple regression analyses, especially if you aren't as concerned about statistical power. In meta-analysis we do this all the time by converting to an effect-size metric, like Hedges' g. That said, if something isn't significant, that is sometimes treated as a "0" effect size, so it only really works with significant findings unless you're willing to "fake" it with a weight.
Suffice it to say, it’s easiest to just run everything at once using the interaction term.
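To make the interpretation concrete, here's a tiny Python sketch with made-up coefficients: with the interaction term in the model, the slope of var1 is b1 when var2 = 0 and b1 + b3 when var2 = 1.

```python
# Hypothetical fitted coefficients for: outcome = b0 + b1*v1 + b2*v2 + b3*v1*v2
b0, b1, b2, b3 = 1.0, 0.5, 2.0, -0.3

def predict(v1, v2):
    return b0 + b1 * v1 + b2 * v2 + b3 * v1 * v2

# Slope of v1 within each level of a binary second variable:
slope_when_v2_is_0 = predict(1, 0) - predict(0, 0)   # = b1
slope_when_v2_is_1 = predict(1, 1) - predict(0, 1)   # = b1 + b3
print(slope_when_v2_is_0, slope_when_v2_is_1)
```

So instead of running two separate regressions and eyeballing the difference, the single model hands you that difference directly as b3, with a standard error attached.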
You look at the current scientific literature on the subject, typically what’s been published about it within the past five years. You can go a little further back than that, but expect it to be considered outdated.
But even if a disorder has a genetic basis—guess what? You can still do something about it. You're not necessarily doomed. If you have a psychological disorder such as schizophrenia (which has a genetic component), you can still choose to take your medicine, get plenty of sleep, go to your doctor, do your therapy, get help from your treatment team, go to support groups, etc.
In other words, you have choices, despite your genetics.
Disclaimer:
This answer is for general informational purposes only and is not a substitute for professional medical, psychiatric, or psychotherapeutic advice. If you think you may have a medical emergency, call your doctor or (in the United States) 911 immediately. Always seek the advice of your doctor or therapist before starting or changing treatment. All case examples have had identities and other significant details changed in order to protect confidentiality. Quora users who provide responses to health-related questions are intended third-party beneficiaries with certain rights under Quora's Terms of Service (https://www.quora.com/about/tos).
- Without some knowledge of the process that you are examining it is impossible to give a good answer to this question
- I would suggest that you return to the theory underlying the process and specify some kind of model.
- When you have specified your model you should use your dataset to verify that the model and your data are consistent.
- Then you estimate the model and make deductions
- Doing multiple searches of correlations and models will lead to spurious results.
- You should talk to your teacher and determine what he wants you to do. Your lecture notes or recommended textbook may help. Perhaps a similar model and dataset have already been studied in your lectures and that is what the lecturer wants you to apply
Treat each combination of the two variables as a single variable.
Then calculate the correlation coefficient for the third variable against this combination variable. This would give you the impact that one variable has on different combinations of the other two variables.
Correlation Coefficient: Simple Definition, Formula, Easy Steps
Correlation coefficients are used in statistics to measure how strong a relationship is between two variables. There are several types of correlation coefficient: Pearson’s correlation (also called Pearson’s R) is a correlation coefficient commonly used in linear regression. If you’re starting out in statistics, you’ll probably learn about Pearson’s R first. In fact, when anyone refers to the correlation coefficient, they are usually talking about Pearson’s.
First you have to do a simple transformation: shift the average of the distances to an origin (a fixed point), so you have each point's distance relative to that origin, and likewise shift the average of the pH values to the origin.
To get a preliminary idea, plot the distances along the X axis and the soil pH along the Y axis to see how the values are scattered. A pattern will emerge that may suggest which statistical tool to use. Is there a correlation between the two, and if so, is it positive or negative? This assumes only two variables are being considered.
Fitting a simple linear regression line will also help predict the pH level at a given distance, after converting back to the original measures of location. This is the simplest way to analyze the data.
You might also consider other features of the data, such as whether other components in the soil affect the pH levels equally or differently. But that would need more analyses of the soil.
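If it helps, here's a minimal Python sketch of that simple-linear-regression step (the distance and pH numbers are invented for illustration):

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
            sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return intercept, slope

# Invented data: soil pH vs distance (m) from a fixed origin
distance = [0, 10, 20, 30, 40, 50]
ph       = [7.8, 7.5, 7.1, 6.9, 6.4, 6.2]

b0, b1 = fit_line(distance, ph)
print(b0, b1)           # a negative slope here means pH falls with distance
print(b0 + b1 * 25)     # predicted pH 25 m from the origin
```

Once the line is fitted in the shifted coordinates, converting a prediction back to the original location is just a matter of adding the means back in.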
By definition, dependent variables are affected by independent variables, so presumably you want to know which test to use when you have three categorical independent variables and one dependent variable?
The answer depends on (among other things) the level of measurement of the dependent variable (nominal, ordinal, interval, or ratio) and the design of the experiment (e.g., between-groups, within-groups, mixed).
For example, if your dependent variable is interval or ratio, an ANOVA might be appropriate, though, if your dependent variable is nominal, a chi square test might be.
Based on the information in your question, there are various ways your analysis might be approached, so providing a little more information (e.g., level of measure of dependent variable, experimental design, hypotheses) would enable respondents to provide a more precise answer.
Statistical analysis helps in analysing the data by using appropriate concepts, measures, formulae, and tests, mainly based on the mean and standard deviation.
I think the application of significance tests, confidence limits, errors, and probability may indicate that the results are approaching reality (as you mentioned).
Link between variables:
The relationship between the X and Y variables may be an association in numbers or in terms (quantitative or qualitative: an increase or a decrease).
In qualitative terms: good/bad, large/small.
It may be causal or empirical.
In addition to the logistic regression analysis, you might also try the decision trees or random forest algorithms. It might be argued that these are “data science” rather than “statistical analysis”, but that’s a definitional issue. Either of these methods will work for predicting a binary output. For that matter, a neural net type method will also work.
I have tried all three on the same dataset in the past (predicting success of graduate students in various university programs) and they came to broadly similar conclusions. Some notes about those comparisons (all done using SPSS, though I could have used R or Python, I suppose):
- Logistic regression was superior in understanding the importance of each variable, and happened to be slightly better at prediction. But it can be hard to interpret (e.g. no “easy to understand” R-square measure to quote to your employer/supervisor/client).
- Decision trees were similar to logistic regression, both in terms of understanding the importance of the different variables and in prediction. It could be argued that logistic regression gives a more “fine-grained” understanding of the importance of the different variables (I would probably support that position). However, decision trees are easy to understand and explain to non-statisticians, which is a definite advantage. You also don’t have to do a lot of variable transformations (e.g. for categorical data), which are necessary for logistic regression.
- A simple perceptron neural net (in SPSS) had a similar level of predictive power to the other methods, but it doesn’t score well on the understanding/explainability factors. However, if that isn’t an issue, it does have the advantage of potentially accepting many more variables in the model. Of course, putting a lot of variables into a model increases the problem of over-fitting, so there’s that. You could also step this up to much more sophisticated “deep-learning” algorithms, which could be advantageous, depending on the research problem.
So, you have choices. In practical terms, it probably depends on what method you know best (or are willing to commit the time to learn), what your data seems to call for, and who your audience is.
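For what it's worth, the logistic-regression option can be sketched in plain Python with gradient descent (the data here are invented; in practice you'd use SPSS, R, or a Python library as discussed above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, epochs=5000):
    """One-feature logistic regression fitted by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi   # prediction error for this point
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Invented data: admission test score vs pass (1) / fail (0)
score  = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0]
passed = [0,   0,   0,   0,   1,   1,   1,   1]

b0, b1 = fit_logistic(score, passed)
p_low = sigmoid(b0 + b1 * 1.5)
p_high = sigmoid(b0 + b1 * 5.5)
print(p_low, p_high)   # low vs high predicted probability of passing
```

The fitted coefficients play the same interpretive role as the SPSS output: b1 tells you how the log-odds of the binary outcome shift per unit of the predictor.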
Clearly, the easiest approach is to use multiple regression analysis. However, it depends on what you want to find out.
METHOD ONE
If you want to know the relationship between each IV and the DV separately, then you can run a series of simple regressions (one IV per regression) or just examine the correlations between each IV and the DV.
METHOD TWO
If you want to find out the relationship between each IV and the DV — but controlling for the effects of all other IV’s — then run a regression using all the IV’s simultaneously. This is usually the best approach.
METHOD THREE
If you want to organize the IV’s into new variables that measure some common characteristic, then run the IV’s through principal components analysis (PCA) and use the PCA variables instead of your original IV’s. This approach can work well if you have a large number of IV’s that are highly correlated. However, interpretation can be very difficult. Thus, this approach has to be used with care.
My advice is to use multiple regression analysis (Method 2). For most people, this approach gives sensible answers and is easy to interpret.
If you already know that, say, x, y, z are independent, then you can regress w on all of them to get partial regression coefficients for each:
[math]w=ax+by+cz+d[/math]
You can do this in R by loading your x,y,z,w data into a data frame:
MyData <- data.frame(x = x, y = y, z = z, w = w)
summary(lm(w ~ x + y + z, data = MyData))
What this will do is, assuming the relationships are linear, give you the independent contributions of each of x, y and z to w via a partial regression coefficient.
Broad Factors Analysis, commonly called the PEST Analysis, is a key component of external analysis. A Broad Factors Analysis assesses and summarizes the four macro-environmental factors — political, economic, socio-demographic (social), and technological. These factors have significant impacts on a business’s operating environment, posing opportunities and threats to the company and all of its competitors. Broad Factors Analysis is widely used in strategic analysis and planning because it helps companies determine the risks and opportunities in the marketplace. That, in turn, becomes an important consideration when companies are developing corporate and business strategies.
Me, the first thing I do is graph the data and then calculate a correlation coefficient. Does it form a line, or does it look like a shotgun blast? If it's a shotgun blast, you're done. If it forms a pattern other than a line but is some other identifiable pattern, such as an exponential or logarithmic curve, I fit an equation to it and re-calculate the correlation coefficient.
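That workflow can be sketched in Python. The data here are hypothetical, built with an exponential trend so the effect of linearizing is visible:

```python
import numpy as np

# Made-up data with an exponential relationship: y ~ exp(0.5 * x)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 50)
y = np.exp(0.5 * x) * rng.lognormal(sigma=0.05, size=x.size)

# Raw Pearson correlation: strong but not perfect, since the trend is curved
r_raw = np.corrcoef(x, y)[0, 1]

# Fit the identifiable pattern (exponential) by taking logs,
# then re-calculate the correlation on the linearized data
r_log = np.corrcoef(x, np.log(y))[0, 1]
```

The log-transformed correlation comes out higher than the raw one, which is the signal that the exponential fit captures the pattern better than a straight line.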
I agree with Quora User wholeheartedly - as someone who has quite a bit of expertise in time series.
If this is critical, hire someone with the chops to do it. You can learn how, but it would probably be cheaper to have someone do it for you. It would certainly take less time.
And for the love of <insert random religious/whatever figure here>, do NOT use Excel…
Pearson correlation coefficient - Wikipedia
If you meet the necessary assumptions, you can convert the r into a t and run a Student’s t-test. That’s what most software will spit out.
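The conversion is t = r·sqrt(n − 2)/sqrt(1 − r²), tested against a Student's t distribution with n − 2 degrees of freedom. A minimal sketch with made-up data:

```python
import math
import numpy as np

# Hypothetical sample of two moderately related variables
rng = np.random.default_rng(2)
n = 30
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)

# Pearson r, then the standard conversion to a t statistic with n - 2 df
r = np.corrcoef(x, y)[0, 1]
t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r**2)
# Compare |t| against a Student's t distribution with n - 2 degrees of
# freedom (e.g., scipy.stats.t.sf(abs(t), n - 2) * 2 for a two-sided p).
```

This is exactly the t that most software spits out next to the correlation.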
“Relationship between variables” is a very fuzzy notion. I suspect that what you are trying to ask, without actually putting it this way, is: “How can statistical methods determine if there is or isn’t a cause-and-effect relationship between two variables?”
Have a look at this link for the Framingham Heart Study, a data collection and analysis project that is now in its 76th year. They have been using statistical methods to winkle out the relationships between environmental, behavioral, and genetic variables and the likelihood that individuals will develop heart disease.
“When it launched in 1948 the original goal of the Framingham Heart Study (FHS) was to identify common factors or characteristics that contribute to cardiovascular disease.”
SPSS is a software tool not a form of analysis. You can do most of the analytics using SPSS if you wish.
As for which analysis you should use that depends on what you are trying to do and what data you have.
I strongly suggest you give some thought to what you are asking and provide the context.
I don’t know if there is a ‘proper’ correlation for any type of variable since there are many possible ways to evaluate pairwise dependence. The commonsense or conventional meaning of ‘correlation’ refers to the Pearson correlation, which is a test for linearity.
Spearman correlations use the same formula as the Pearson but first transform the variables into rank ordering. Hence, it is a test for monotonic dependence.
Probably the most general dependence metric is Szekely’s distance correlation.
This paper compares and contrasts many dependence metrics, but it is not exhaustive.
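The rank-transform identity for Spearman can be checked directly in Python. The data below are made up (and have no ties, so a simple rank helper suffices):

```python
import numpy as np

def ranks(a):
    """Return 1-based ranks of a 1-D array with distinct values."""
    order = np.argsort(a)
    r = np.empty_like(order)
    r[order] = np.arange(1, a.size + 1)
    return r

rng = np.random.default_rng(3)
x = rng.normal(size=40)
y = np.exp(x)  # monotonic but nonlinear transform of x

pearson = np.corrcoef(x, y)[0, 1]                  # < 1: penalizes curvature
spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]   # = 1: perfectly monotonic
```

Since y is a monotonic function of x, the ranks agree exactly, so the Spearman correlation is 1 even though the Pearson correlation is not. That's the sense in which Spearman tests monotonic rather than linear dependence.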
A test of the correlation coefficient [math]\rho[/math] would be the best choice to determine whether there is a significant relationship between two variables.
If you want more details, check:
If you already have a sample data you can try using my online step by step solution calculator to perform the test:
https://app.stat-solution.com/#!/hypothesis/test-of-correlation
Do visit the link below for such online step by step automated solutions:
https://www.stat-solution.com/
Commonly used statistical analysis methods in scientific studies include:
1. Descriptive Statistics;
2. Inferential Statistics;
3. Regression Analysis;
4. Hypothesis Testing;
5. Sample Size Determination; and
6. Correlation Analysis.
The ratio of genetic variance over total variance is called heritability. Its exact definition can vary quite a bit, but it is ultimately determined by how you decide to make your estimates.
Researchers estimate heritability by performing heritability studies.
Why is it challenging to study genetic contributions to behaviour?
The concept of heritability was adopted from its usage by breeders.
In humans, parent-offspring regression and twin studies are the bread-and-butter for heritability studies.
More sophisticated and extensive studies can make use of extended familial relationships (i.e. pedigrees) and even (“realized”) genetic relationships based on molecular markers.
It’s important to note that a lot of survey designs and estimate adjustments are done for confounding factors (particularly the environment). These are regularly insufficient, however. Human heritability studies, especially those dealing with behavior, psychology, and intelligence, are often criticized for this reason.
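As a toy illustration of the parent-offspring regression mentioned above (all numbers are made up): under the standard quantitative-genetics model, the slope of offspring trait values regressed on midparent values estimates narrow-sense heritability.

```python
import numpy as np

# Hypothetical trait data; the true heritability we try to recover is 0.6
rng = np.random.default_rng(4)
n = 500
h2_true = 0.6

midparent = rng.normal(size=n)  # midparent trait deviations
# offspring deviation = h^2 * midparent deviation + environmental noise
offspring = h2_true * midparent + rng.normal(scale=0.5, size=n)

# Regression slope of offspring on midparent = cov / var: the h^2 estimate
h2_hat = np.cov(midparent, offspring)[0, 1] / np.var(midparent, ddof=1)
```

In real studies the hard part is not this arithmetic but the confounding: shared environment inflates the parent-offspring resemblance, which is exactly the criticism raised above.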
It depends on the disorder and the existing research. Some disorders have a lot of research and even genetic tests, so you could probably find an answer and believe it with more certainty than in other cases.
To my knowledge, there are some disorders or illnesses that are directly caused by the environment. Examples include alcohol intoxication caused by excessive drinking; physical and psychological illness caused by high levels of mercury or toxins.
Many disorders and illnesses have a genetic basis, as evidenced by genetic tests for breast cancer and trends in mental illness within families. The nature-nurture paradigm might be relevant here, since lifestyle factors are so important in mental health and chronic disease.
Since gene research and medicine are constantly evolving, if you were to ask this question in ten or twenty years, you would have more data to draw from and more disorders to research.
That depends on how the ranked data for your DV works. Are you giving people several options and then asking them to rank them by preference? For example, if colors were ranked, a person might rank blue first, green second, red third, etc. In that case, it’s best to create several variables and treat each choice as its own variable. So, in this case, you would have a variable called “blue” and it would get a value of 1, while another variable called “green” would get a value of 2, and so on. That would give you as many variables as you had choices for ranking.
You have a few choices on how to analyze that kind of data but probably the most important question is what are you trying to accomplish? What the purpose or practical outcome of this analysis? If you can say more about your project and what the variables are like, then I might be able to give a more specific and useful suggestion. I’d love to hear from you about that.
Bart
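The recoding Bart describes can be sketched in a few lines of Python (the colors and the respondent's ordering are hypothetical):

```python
# Each ranked option becomes its own variable whose value is that
# option's rank in the respondent's preference order (1 = first choice).
options = ["blue", "green", "red"]
response = ["blue", "green", "red"]  # one respondent's preference order

recoded = {color: response.index(color) + 1 for color in options}
# → {"blue": 1, "green": 2, "red": 3}
```

This yields as many variables as there were choices to rank, one column per option, which is the layout most analyses of ranked data expect.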
OK, so first you need to understand your outcome and how it is measured. In your case, is your outcome measured at different time points, or does it have multiple measurements? If yes, then how is it measured: e.g., is it a proportion measured over time, or a continuous measurement like hemoglobin (Hb), diastolic BP, or systolic BP measured over time?
All of this obviously depends on your research question and your hypothesis.
If you need further help, do contact me at my email ID, where we can discuss further.
And, as always, choosing a statistical test is not what matters; the reasoning behind it, and how appropriate it is for your data, is what holds the power.
Yes and no - and the “statistics” part is irrelevant. First you need to define what “relationship” means. And then “variables.” Those definitions can be, and often are, so nebulous that any “relationship” can be inferred.
One logical postulate should be kept in mind: from a false premise, any conclusion follows.
For example, I’ve recently been amusing myself reading St. Augustine’s musings about the “relationship” between himself and “god.” First he defines “god” as “infinite” and “omnipotent”, and himself as “humble” and “unworthy.” Then he proceeds to define a relationship between those two variables that presumes his characterization of “god” is correct.