

Sampling, the procedure of selecting a sample from a population of interest, plays a key role in both quantitative and qualitative research. A representative sample provides the basis for making inferences about a population, and inferences based on a true sample generalize better to the world we live in. But obtaining a representative sample from a population of interest is a real struggle in the practical world.
Sampling biases: A sample fails to represent a target population when it suffers from biases such as coverage bias and nonresponse bias. “Coverage bias occurs when the members of the sampling frame are systematically different from the target population in a way that influences the result of the study” (Remler & Van Ryzin, 2011). A sample with good coverage may still have a very low response rate. The response rate is the product of the contact rate and the cooperation rate, i.e. response rate = contact rate × cooperation rate. Nonresponse bias occurs when a significant number of members of a target sample refuse to respond and their absence influences the results.
Non-probability sampling: Though probability sampling is considered gold standard for generalizability, for many practical reasons, researchers choose non-probability sampling for many of their studies. The major forms of non-probability sampling include voluntary sampling, convenience sampling, and purposive sampling.
Voluntary sampling refers to members participating in a study by responding voluntarily to an open call. The concern with this kind of sampling is a form of nonresponse bias called volunteer bias, because volunteers may differ from a more representative sample of the population. Convenience sampling refers to situations in which researchers recruit participants from a natural gathering or from people they have easy access to. Convenience sampling suffers from coverage bias because the people available to the researcher may not represent the target population of interest. Purposive sampling is the process of choosing participants with unique perspectives, or who hold important roles, to represent a theoretical category or consideration of a study. Snowball sampling is another type of sampling, used mostly in qualitative studies, where participants are asked to refer people they know for inclusion in the sample.
Probability sampling: Sampling technique that offers a chance or probability to each element of a population to be selected in a sample is called probability sampling. There are different types of probability sampling such as simple random sampling, systematic sampling, stratified sampling, and cluster sampling.
Simple random sampling refers to situations in which each unit has an equal chance, or probability, of selection. Systematic sampling is a technique of selecting every kth unit from a sampling frame, beginning at a random start point. Sometimes the researcher divides the population into groups called strata, where the strata are mutually exclusive and exhaust the entire population, and then draws a sample from each stratum; this is called stratified sampling. Cluster sampling is another technique, in which the population is divided into clusters, sometimes in multiple stages, and finally one or more clusters are chosen as the sample. The basic difference between stratified sampling and cluster sampling is that strata are internally homogeneous and externally heterogeneous, whereas clusters are internally heterogeneous and externally homogeneous. Random Digit Dialing (RDD) has been used as a probability sampling technique since the late 1960s; it gives both listed and unlisted telephone numbers an equal chance of being selected in a sample.
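As a quick illustration, the three simplest of these selection procedures can be sketched in a few lines of Python (the sampling frame of 100 numbered units and the two strata are purely hypothetical):

```python
import random

random.seed(1)
population = list(range(1, 101))  # hypothetical sampling frame of 100 units

# Simple random sampling: every unit has an equal chance of selection
srs = random.sample(population, 10)

# Systematic sampling: every kth unit, beginning at a random start point
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: mutually exclusive strata that exhaust the population,
# with a random sample drawn from each stratum
strata = {"low": population[:50], "high": population[50:]}
stratified = [unit for stratum in strata.values()
              for unit in random.sample(stratum, 5)]
```

Cluster sampling would instead pick whole groups at random (e.g. `random.sample(list_of_clusters, 2)`) and keep every unit inside the chosen clusters.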
Sampling distribution: The sampling distribution is an important concept in probability sampling; it refers to the distribution of estimates (means) from many samples. The central limit theorem predicts that the estimates from a large number of samples are distributed normally, and the curve takes a bell shape in which mean, median and mode are equal. The standard deviation of the sample means in this distribution is called the standard error: 68% of the sampling distribution falls within ±1 standard error, 95% within ±2 standard errors, and 99.7% within ±3 standard errors.
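These claims are easy to check by simulation; a minimal sketch (the skewed population and the sample sizes are arbitrary choices):

```python
import math
import random
import statistics

random.seed(42)

# A deliberately skewed (non-normal) population
population = [random.expovariate(1.0) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Draw many samples and record each sample mean
n = 50
means = [statistics.fmean(random.sample(population, n)) for _ in range(2_000)]

# The standard deviation of the sample means approximates sigma / sqrt(n)
se_observed = statistics.stdev(means)
se_expected = sigma / math.sqrt(n)

# Roughly 95% of sample means fall within +/-2 standard errors of the mean
share_within_2se = sum(abs(m - mu) <= 2 * se_observed for m in means) / len(means)
```

Even though the population is skewed, the distribution of sample means comes out approximately normal, which is exactly what the central limit theorem predicts.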

-Remler & Van Ryzin (2011). Sampling. In Research methods in practice: Strategies for description and causation.


For a Humorous Logic Model Example: The Friendship Algorithm

Albright & Thyer (2010) A Test of Validity of the LCSW Examination: Quis Custodiet Ipsos Custodes



This article describes a study conducted to evaluate the validity of the Licensed Clinical Social Worker (LCSW) exam, which is considered to be a reliable measure of social workers’ professional competency. The LCSW “pass/fail” exam is administered to social workers with varied levels of education and work experience (Associate, BSW, MSW, Advanced Generalist, and Clinical)[i] and is mandatory in most jurisdictions in the USA. Administered since 1983, it is “… prepared, marketed and administered…” by the Association of Social Work Boards (ASWB) with the goal to “promote consumer protection”, and it is seen as “… one of the most important assurances that a social worker possesses the competence to practice responsibly”. In their study, Albright & Thyer (2010) administered a modified version of the clinical-level practice exam to 59 MSW students in their first year of studies. The exam was modified so that none of the 50 questions was visible, leaving only the 4 multiple-choice answer options for each deleted question. Students were asked to guess the right answers. Whereas “on the basis of chance alone” 25% of the answers should have been guessed correctly, the students correctly guessed 52% of them, raising serious doubt about the ASWB’s claim that the exam is “a valid assessment of competence to practice social work”. The results of the study highlight several important points about the importance of using measurements that are valid and psychometrically sound.

– Current literature is lacking in the area of licensing for social workers and social work examinations.  Publishing the psychometric properties behind this measure in professional journals would generate a clearer understanding of the process undertaken and help reassure “…members of the profession who have historically been suspicious of standardized licensing tests.”

– The ASWB asserts that the LCSW is not “… created in a vacuum”, and that the individual items included in the 11 question categories have face and content validity. However, this is not enough to ensure validity, as face and content validity are subjective in nature and are not generally considered persuasive by social scientists (Singleton & Straits, 2010). Criterion validity would be more appropriate.[ii]

– The items on the LCSW exam stem from the responses of licensed social workers who are surveyed about the types of tasks they perform, the importance they attribute to these tasks, and the extent to which they consider each task a necessary skill. The ASWB ensures that each “…item undergoes a statistical and sensitivity analysis by a group of expert social work professionals”; however, no mention is made of the type of analysis, and no details are available publicly. In addition, the assurances made by the ASWB are not questioned by members of the social work profession. Rather, they are accepted “…as a matter of faith”. Given the enormous weight attributed to the results of this test, its impact on a social worker’s professional trajectory, and the consumer protection objective of the exam, “… anything less than the scientific transparency provided by both internal and independently conducted and published psychometric evaluations…” is not enough.

Other concerns about the LCSW exam:

– As aforementioned, the ASWB is involved in all fundamental and implementational aspects of the LCSW exam, raising the question Quis custodiet ipsos custodes? (“But who will guard the custodians themselves?”)[iii]

– There are tremendous financial gains associated with the ASWB exam, yet no financial disclosure on the part of the ASWB, and this raises questions. As indicated on the Association of Social Work Boards official website, the current rate for the Clinical exam (used in this study) is $260.00 per exam; a printed study guide can be purchased for $38.00 and a practice exam for $85.00, or $383.00 per fully equipped candidate. In 2013, 10,803 social workers took the Clinical exam; at $383.00 each, that is a total cost of $4,137,549.00. Of this group, approximately 2,377 social workers (22%) did not pass the exam and had to retake it at full cost (approximately $618,020.00). The potential earnings from this clinical cohort alone were $4,755,569.00.[iv]
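For what it’s worth, the arithmetic behind these figures checks out if one assumes every first-time candidate also bought the study guide and the practice exam:

```python
exam_fee = 260.00       # Clinical exam registration
study_guide = 38.00     # printed study guide
practice_exam = 85.00   # practice exam

first_time_candidates = 10_803
per_candidate = exam_fee + study_guide + practice_exam   # $383.00
initial_revenue = first_time_candidates * per_candidate  # $4,137,549.00

retakers = 2_377
retake_revenue = retakers * exam_fee                     # $618,020.00

potential_earnings = initial_revenue + retake_revenue    # $4,755,569.00
```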

For the curious and daring, here are 2 sample questions from the clinical-level exam: [v]

1. A six-year-old child lives with a foster family. His father is in prison and his mother is in residential treatment for alcohol dependence. The child is small for his age, often has temper outbursts, and has difficulty completing schoolwork. The social worker notes that his speech is immature. What should the social worker do FIRST?

(A) Work with the foster parents on a behavior modification plan

(B) Suggest that the child’s teacher refer him for special education placement

(C) Refer the child for assessment for fetal alcohol syndrome

(D) Work with the child’s biological mother toward reunification


2. A social worker is conducting a first interview with a client who attempts to dominate the interview from the beginning. The client complains that his telephone is tapped, and says that his house is watched by the police. How can the social worker BEST establish a beginning level of rapport with the client?

(A) Interrupt the client to ask factual questions about his background

(B) Ask the client about the ways in which the social worker can be helpful with these problems

(C) Question the client about when he first believed that his house was being watched

(D) Ask the client to describe the evidence he has that his phone is being tapped


Answers:  1, C; 2, B

How did you do? (Phew, I passed)




[i] Association of Social Work Boards official website.  Accessed on May 12, 2014 from:


[ii] Singleton Jr, R. A., & Straits, B. C. (2010). Measurement. In Approaches to social research. New York: Oxford University Press.

[iii] Yahoo Voices (2014). Ten Latin Proverbs Everyone Should Know.  Accessed on May 12, 2014 from:


[iv] Association of Social Work Boards official website.  Accessed on May 12, 2014 from:


[v] See iv


Dillman, 2007, Effective Survey Design

Dillman’s (2007) presentation slides and exercise related to effective survey design make an excellent point about the importance of survey design and wording of questions for gathering data and conducting useful research. The presentation offers some very tangible illustrations of what can go wrong, and how results can vary as a result of survey question design/wording issues. For example, in two slides on page 4, Dillman demonstrates the difference of opinions expressed about divorce when the question is presented in an open-ended structure and then when it is presented in a close-ended structure (both ordered and “out of order”). The rates of opinion vary so considerably that they greatly call into question the validity of the results.

The good news is that the presentation slides provide some points of consideration to help researchers avoid such pitfalls. Dillman suggests that researchers consider a number of factors or questions when designing a survey. Overall, one is advised to develop survey questions that are most likely to: a) help the researcher get at what they would like to know, b) motivate respondents to answer with honesty and accuracy, and c) reduce unintentional response or measurement error. Different question structures (open-ended, close-ended [with ordered & unordered response categories] and partially close-ended) are then reviewed with the strengths and (especially) the drawbacks of each design’s capacity to contribute to these goals reviewed. Emphasis is also placed on wording questions effectively. The take-home message is that it is a good idea to vary question structures in order to achieve the goals previously discussed.

While Dillman’s notes do not, and could not possibly, provide an exact roadmap to effective survey design, the point about the importance of effective survey design is well-stated and the tips offered are useful. The exercise/illustration at the end of the document, which shows the numerous ways in which a single question can be posed, provides an interesting piece for discussion while also demonstrating how a question can tell us more than simply what we think we might be asking. Which of these question designs would you choose? Why? What could one question structure help us to learn that another design might not be able to accomplish?

In support of Dillman’s emphasis on the importance of test design and wording, Albright and Thyer’s article about test validity and the ASWB clinical social worker exam offers a strong complement. Albright and Thyer’s experiment, which called into question the genuine challenge presented by (and therefore validity of) the social worker examination, reinforces Dillman’s assertion about the importance of wording and design. Their work also emphasizes what I see as an underlying message in both pieces, which is that test/survey questions express a lot more than the simple question we may think we are asking.

Week 6: Understanding and Interpreting Effect Size Measures (LeCroy & Krysik, 2007)



Effect size: an index of magnitude (strength of relationship) that is not directly affected by sample size.

-Used for a) power estimation, b) sample size determination, and c) interpreting findings.

No known social work journals require reporting effect sizes.

Different measures of effect size are reported differently.

Hence, this paper will help the reader understand:

a) What the effect size means

b) How they differ

c) How to present outcomes for easier interpretation (the paper’s special focus, because social work researchers are asking for this).

The Basics

E.S.: the magnitude of the effect (i.e., practical significance).

It is an error to rely on the p value alone (the likelihood that a finding is due to chance or sampling error).

We should use p values as “guidance rather than sanctification”.

Often, when the sample is small, an author falsely concludes a finding of “no difference” when a large effect size is in fact present, showing a meaningful difference.

Instead, one should replicate the study with a larger sample size.

Or: a significant result (with a large sample) yields a small effect size (little practical importance).

Unlike the p value (a significance test), effect size a) is independent of sample size, b) is expressed in standardized units, making it easier to compare across studies, and c) shows the magnitude of the difference.

Different Measures of Effect Size

1) Cohen’s d, the standardized mean difference: the most common measure. The difference between the two means expressed in terms of their common (pooled) standard deviation.

E.g.: d = .66 means two-thirds of a standard deviation separates the two means. Positive = improvement; negative = deterioration.

Most often used in meta-analyses.

It expresses the difference between the treatment and control groups in standard deviation units.

d CAN be above 1, and CAN be calculated from r: d = 2r / √(1 − r²).
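A small sketch of these two computations (the pooled-SD formula is the standard one; the sample scores below are made up):

```python
import math
import statistics

def cohens_d(treatment, control):
    """Standardized mean difference: (M1 - M2) / pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.fmean(treatment) - statistics.fmean(control)) / pooled_sd

def d_from_r(r):
    """Convert an effect size correlation r to Cohen's d: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r ** 2)

# Hypothetical outcome scores for a treatment and a control group
d = cohens_d([6, 7, 8, 9], [5, 6, 7, 8])
```

A positive d means the treatment group scored higher than the control group, in standard deviation units.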

2) Point-biserial r: the effect size correlation for intervention research.

Computed between a dichotomous IV (yes/no) and a continuous DV.

3) r²: the proportion of the variance in the dependent variable explained by the independent variable; the strength of the effect size correlation.

E.g.: r = .3 gives r² = .09, or 9% of variance explained.

Goodness of prediction: how much variation in the outcome is attributable to variation in the predictor scores.

-Omega squared (ω²): use with ANOVAs.

-Binomial Effect Size Display (BESD): a 2×2 contingency table showing meaningful effect sizes.

Note: 1) even small effect sizes can be important; 2) even when effect sizes are similar to those of other studies, this is often overlooked due to lack of knowledge about effect sizes.


To solve this problem,… (see below).

Improving Effect Size Interpretation: The Binomial Effect Size Display

BESD: helps to interpret effect size when r² is small.

It is a 2×2 contingency table.

Rows = dichotomous IV (treatment or control).

Columns = dichotomous outcome DV (improved vs. not improved).

Or, a continuous DV can be presented in dichotomous categories as well.

The r-based BESD illustrates the difference in treatment success if one-half of the population received one condition and one-half received the other condition. The BESD assumes a 50% base rate for both experimental and control groups.

The question the r-BESD answers: “What would the correlationally equivalent effect of the treatment be if 50% of the participants had the occurrence and 50% did not?” (equal group sizes assumed).

If there were no difference between the two groups, each cell would equal 50%.

The correlation r equals the difference in the rate of outcome between the experimental group and the control group (cell A minus cell B).

Rows (read left to right) = the dichotomous IV/predictor (belonging to the control group vs. the experimental group).

Columns (read top to bottom) = the dichotomous outcome DV (worked well vs. did not work well).

Row and column totals each equal 100.

(From elsewhere:) cell percentages are all standardized.

In their example with body image, they show that “What is useful about the BESD is that it provides the difference between success rates, whereas r as a measure of the strength of the effect size correlation is not very intuitive.” The BESD “increases understanding, interpretability, and comparability”.

Some final comments about interpreting effect size measures

Providing and interpreting effect size is important.

Problem: there is still no agreed-upon meter stick for determining what is “meaningful” and what is “not meaningful”.

Answer: become proficient with the studies in your chosen field, to understand how small an effect can be and still be meaningful.

Problem: What is “clinically significant” and “reliable change” in

E.g., the aspirin and heart attack study: the result was so statistically significant that the study was halted on ethical grounds and both groups received the drug; however, because the effect size was small, many scientists were uncertain.

So they reanalyzed using the BESD, which showed that the aspirin group had 3.4% fewer heart attacks (48.3% compared to 51.7% in the control group), demonstrating that the findings are indeed meaningful.
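That reanalysis is easy to reproduce: under the BESD’s 50% base-rate assumption, the two rates are simply 50% ± r/2, so their difference equals r. A sketch, using the aspirin study’s effect size correlation of about r = .034:

```python
def besd(r):
    """Binomial Effect Size Display rates implied by an effect size
    correlation r, assuming a 50% base rate in each group."""
    higher_rate = 0.50 + r / 2
    lower_rate = 0.50 - r / 2
    return higher_rate, lower_rate

# Heart-attack rates implied by r = .034: 51.7% (control) vs. 48.3% (aspirin)
control_rate, aspirin_rate = besd(0.034)
```

The difference between the two cells (3.4%) is the effect size correlation itself, which is what makes the display so interpretable.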

Conclusions and Recommendations

Providing effect sizes also allows researchers to conduct meta-analyses, provide outcome expectations for future studies, and make comparisons between studies.

So, provide both the significance test (p value) and an interpretation of meaning (effect size).

Always provide Cohen’s d, even if using other effect size methods, as it is the most well-known.

Calculate the BESD for even easier interpretation.

Go a step further, and compare your effect size to what other studies have found.

-A paper that helps readers better understand how to use and interpret BESD effect sizes can be found at


A Summary of DeVellis R. (1991). Scale Development: Theory and Applications


Construct Validity: the “relationship of one variable to a second variable.”

Is the measure “behaving” the way that the construct it purports to measure should behave with regard to established measures of other constructs? In sum, construct validity is a test, or measure, of how well a particular measure actually measures what it claims to measure.

This differs from criterion-related validity, which is really a measure of how well one or more variables predict some sort of outcome based on knowledge of other variables.

Both reach the same end, though I’m not convinced.

In sum, criterion-related validity is the extent to which the measure captures a criterion that exists in the real world.

Known-groups validation: can be either construct or criterion validity, “depending on investigator’s intent”. How well does a scale differentiate members of Group A from members of Group B, based on their scale scores?

Q: How strong should correlations be in order to demonstrate construct validity? There is no cutoff, but they should be substantially above what would otherwise be found due to shared method variance alone.

Multitrait-multimethod Matrix: 

Used to examine construct validity (how well a measure actually measures what it claims to measure) by measuring more than one construct with more than one method, creating a matrix of correlations between the measurements. Be careful that a strong correlation is not due to “method covariation” (how things are measured) rather than construct covariation (what is being measured). If measures of theoretically related constructs are strongly related, they show convergent validity: how well do two measures correlate, given their (supposedly) strong theoretical relatedness? The opposite, discriminant/divergent validity, holds when measures of theoretically unrelated constructs show little or no correlation: if two constructs are in fact unrelated, how unrelated are their measures?

Guidelines in Scale Development: How to create Measurement Scales

Step 1: Determine Clearly What it is You Want to Measure

-Theory as an aid to clarity: be well versed in the theory related to the construct you want to measure, so that the content of your scale does not “drift into unintended domains”. Always think about how theory relates to the construct you want to measure. If no theory exists, think about the construct conceptually, and about how it relates to other phenomena.

-Specificity as an aid to clarity: specificity varies along content domain, setting, and population. How specific is your scale, and how well does it relate to other scales similar to yours?

-Being clear about what to include in a measure: best to be as clear as possible. How distinct is the creator’s construct from others’ constructs? Avoid including items that do not fit within the construct you intend to measure.

Step 2: Generate an Item Pool

-Choose items that reflect the scale’s purpose, i.e. “the thing”: all items in the scale should reflect the latent variable underlying them (a variable that is not directly observable). How? The content of each item should reflect the construct you intend to measure. E.g.: what other ways can an item be worded to tap the construct?

Items are: overt manifestations of a common latent variable that is their cause.

-Redundancy: not a bad thing. We want to find as many items as possible that capture the phenomenon, so redundancy is expected.

-Number of items: generate more than you plan to include in your final scale. You want good internal consistency: how strongly the items correlate with one another. Aim for roughly 3-4× more items than in your final scale; the larger the pool, the better. You can then eliminate items that are unclear, irrelevant, or unnecessarily similar.

-Characteristics of good and bad items: clarity; lack of ambiguity. Avoid lengthy items (length leads to complexity). Consider the reading level of the audience. Avoid multiple negatives. Avoid double-barrelled items (those that convey two or more ideas). Avoid ambiguous pronoun references: who is the “their” referring to?

-Positively and negatively worded items: used to avoid acquiescence, affirmation, or agreement bias (agreeing regardless of the content of the item). The problem is that negatively worded items may confuse the reader.

-Conclusion: Read above for summary.

Step 3: Determine the format for Measurement: 

Decide on the format while choosing items, e.g. a checklist (items need not be declarative).

-Thurstone scaling: create items that are differentially responsive to specific levels of the attribute in question. The tuning (determining what level of the construct each item responds to) is done by a group of judges. It is difficult to find items that confidently correspond to a specific level of the phenomenon.

-Guttman scaling: items tapping progressively higher levels of an attribute. An item pitched too high should be left out (smoke 1 pack, smoke 10 packs, smoke 1000 packs: delete this last one). Its applicability is rather limited.

-Scales with equally weighted items: unlike Thurstone and Guttman scales, these are best when all the items aim to detect the same phenomenon equally.

-How many response categories? Can the respondent discriminate meaningfully? Avoid “somewhat” and “not very”. Odd or even numbers? An odd number implies a central/neutral option; an even number requires the respondent to make a commitment, even if a weak one. Neither is better.

-Specific types of response formats: there are many.

-Likert scale: a declarative sentence, with response options equally spaced in degree of agreement. Provides generally strong opinion data for surveys. Avoid overly mild statements, which invite too much agreement, and avoid offending participants. Best used to learn an opinion, attitude, belief, or other clearly defined construct.

-Semantic differential: a response on a continuum between opposing adjectives, typically with 7-8 lines between them; the respondent places a mark on the line that fits best.

-Visual analog: same as above, but a continuous line. Problem: different people give different meanings to different positions on the line. Great because it is potentially sensitive, helping to detect differences in a weakly experienced phenomenon. Also, respondents find it difficult to recall previous responses, which encourages true responses and avoids bias in post-manipulation studies.

-Binary options: e.g. agree/disagree. The shortcoming is minimal variability in responses; more items are needed to ensure adequate scale variability. Good because they are easy to answer.

-Item Time Frames: Choose time scale actively rather than passively.

Step 4: Have Initial Item Pool Reviewed by Experts

Get a large group of people who understand your content well to review your questions.

-Confirms or invalidates your definition of the phenomenon. “How relevant is each item to what you intend to measure?”

-Evaluate clarity and conciseness of scale items.

-Helps you tap phenomenon you have failed to include.

-Final decision to include or not is yours.

Step 5: Consider Inclusion of Validation Items

-Add a couple of types of items:

1) Items that detect flaws/problems, e.g. a social desirability scale.

2) Items that pertain to the construct validity of the scale.

Step 6: Administer Items to a Development Sample

-Administer to a large number of participants, e.g. ~300 participants for a pool of ~20 items.

-The sample must be sufficiently large to represent the population. You can run a G study to determine generalizability across different populations, so that items generalize to everyone.

Step 7: Evaluate the Items: The Heart of Scale Development

-Initial examination of items’ performance: good items correlate highly with the true score of the latent variable. We cannot directly assess the true score and thus cannot compute these correlations, but we can make inferences.

1) Highly intercorrelated items: inspect the correlation matrix.

-Reverse scoring: negative correlations between items? Try reverse-scoring them.

-Item-scale correlations: each item should correlate strongly with the group of other highly correlated items, excluding itself.

-Item variances: higher variance is good for a scale.

-Item means: a mean close to the middle of the response range (e.g. on a 1-7 scale) is good. A mean at an extreme end means the item may fail to detect certain values, and will also have low item variance.

-Coefficient alpha/covariance matrix: THE MOST IMPORTANT INDICATOR OF A SCALE’S QUALITY. It estimates the proportion of variance in the scale that is in fact attributable to the true score. It can be computed from the covariance matrix, or via the Spearman-Brown formula.

Cronbach’s alpha is a measure of internal consistency, i.e. of the correlations between items when all items are measuring the same construct. The alpha value is also influenced by the number of items in the scale.

Alpha ranges from 0 to 1; if it is negative, something is wrong (try reverse-scoring items). Anything higher than .70 is a good alpha; above .90, consider shortening the scale.
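A minimal computation of coefficient alpha from raw item scores (the formula is the standard one; the toy responses below are invented):

```python
import statistics

def cronbach_alpha(items):
    """Coefficient alpha for a scale.

    `items` is a list of per-item score lists, one list per scale item,
    with scores in the same respondent order for every item.
    """
    k = len(items)
    sum_item_variances = sum(statistics.pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale score
    total_variance = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Two toy items answered by four respondents; the items move together,
# so alpha comes out high
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 4, 6, 8]])
```

A negative alpha from real data usually signals items that need reverse scoring.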

Step 8: Optimize Scale Length

-Effect of scale length on reliability: since alpha is influenced by the correlations between items and by the number of items in the scale, one can trade these off. Longer scales are more reliable; shorter scales are easier for participants.
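This trade-off is captured by the Spearman-Brown prophecy formula mentioned earlier; a sketch, assuming the added (or dropped) items are parallel to the existing ones:

```python
def spearman_brown(alpha, length_factor):
    """Predicted reliability after changing scale length by `length_factor`
    (2.0 doubles the number of items, 0.5 halves it)."""
    return (length_factor * alpha) / (1 + (length_factor - 1) * alpha)

doubled = spearman_brown(0.70, 2.0)  # lengthening raises reliability
halved = spearman_brown(0.70, 0.5)   # shortening lowers it
```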

-Effects of dropping “bad” items: if an item has a lower-than-average correlation with the other items, removing it will increase alpha.

-Tinkering with scale length: eliminate the items with the lowest item-scale correlations first. But the more items remain, the higher the alpha value, and the better the estimate of reliability.

-Scale samples: split your development sample in two, tinker with the scale using one half, and cross-validate against the untouched half. Split in half, or unevenly if you have a small sample.

Summary of Paulos (2010) Stories vs. Statistics

Qualitative versus quantitative ways of thinking, speaking and doing research have been an ongoing and seemingly never-ending debate, especially in the social science disciplines. As discussed in last week’s readings, this is particularly relevant for the growing discipline of social work, which has been traditionally rooted in qualitative approaches and is even to this day reluctant to embrace the post-positivist/quantitative perspective. Paulos’ (2010, October 24) opinion piece in The New York Times on the dichotomy between storytelling and statistics illustrates that we in fact utilize language from both the literary and scientific cultures in our everyday lives, oftentimes unbeknownst to us. Even our everyday language contains notions of statistics, mathematics and quantitative philosophies that come to us quite naturally yet unconsciously. Words such as “usual” and “typical” convey notions of central tendency; words such as “likelihood” and “odds” convey notions of probability; words such as “instance” and “example” convey notions of sampling. Thus, even informal storytelling oftentimes requires the use of a quantitative dialect; and on the other side of the coin, communicating the results of statistical analyses requires the use of storytelling methods. In our everyday lives as individuals, as scholars and as researchers, our vocabulary is riddled with the language of both cultures despite our allegiance to one or the other; according to Paulos, it is an unavoidable phenomenon. “With regards to information statistics, we’re a bit like Moliere’s character, who was shocked to find that he’d been speaking prose his whole life” (p. 1 of 4).

Although parts of the opinion article read like the disjointed ramblings of a mad mathematician that I could not truly grasp, there are still some important points to draw from it. In Paulos’ opinion, one dialect cannot effectively operate without the other: both the quantitative and qualitative worlds assist us in responding to what we observe as researchers and in communicating those observations to others. The use of both should be encouraged. For example, administering close-ended surveys and questionnaires requires reflection on the ordering and phrasing of questions in order to obtain the information we are seeking. Communicating statistical results to social workers or other front-line workers requires a certain storytelling ability in order for the findings to make sense and be contextualized in everyday client-worker interactions. Although tensions between stories and statistics will always persist, one is just as important as the other, and both cultures need each other in order to make sense of the world.

Summary of Singleton and Straits, 2005, Chapter 2: The Nature of Science, in Approaches to Social Research

Singleton and Straits’ chapter, The Nature of Science (2005), deals with exactly that: the nature of science. It attempts to answer the questions of what science is, what it is not, how it develops and what its limitations are. It is interested particularly in social science, with heavier reference made to sociology and political science. The content of this chapter is foundational for developing an understanding of (and appreciation for?) scientific research.

“What unites science are its objectives, its presuppositions, its general methodology and its logic” (2005, pg. 14). Accordingly, the goal of science is to produce ideas about questions which are appropriate to ask of science (i.e. questions whose answers may be observed). Importantly this means that questions of “morality, existence or ultimate causality” (pg. 16) are outside the purview of science as they are arguably not verifiable through observation.

The product of science, ideas or knowledge, helps with our understanding of the world by offering a tentative description of, explanation for, prediction about and/or relationship between certain phenomena. Singleton and Straits state that “Scientists never achieve complete understanding, nor do they assume access to indubitable truths” (pg. 22). While I agree that this principle is inherently at the core of scientific inquiry, I am concerned that this is not a characteristic of science that is often emphasized when its products (ideas) are applied to “real world” problems.

While ideas and knowledge are the products of science, the process, principles and/or conditions under which they are produced are perhaps most characteristic of science. These principles/conditions are:

  • Logical reasoning: requiring that conclusions be based in evidence (whether inductive or deductive)
  • Empiricism: requiring that evidence be observable using at least one of the five senses (observation may be direct or indirect, using tools which extend the ability/capacity to observe)
  • Objectivity: requiring intersubjective testability – the notion that “it would be possible for two or more independent observers…to agree that they are observing the same thing or event” (pg. 31).
  • Control: requiring efforts to “eliminate, as far as possible, sources of bias and error that may distort their results” (pg. 32).

These authors are forthcoming in offering that their view of science is idealized and that in reality, there are a number of constraints and challenges which prevent science from being fully actualized as described. The strength of this chapter is that it offers a very clear definition and description of science, which can (and likely should) be applied when considering any research which claims to be based in scientific inquiry. While these authors acknowledge the limits of science and the imperfection of its general application (though I might argue that this is perhaps a bit underemphasized), they offer an extremely straightforward standard by which ideas may be characterized as scientific or not, and they provide students (such as myself) a tool for understanding the principles which should be apparent in the social scientific work that we read, absorb and utilize as a foundation for our own pursuits.

Really annoying discussion question:
If the scientific method is required to produce scientific ideas, does this mean that this chapter – which offered little verifiable evidence to suggest that the “nature of science” was determined using scientific methods – is unscientific? What is the nature of these authors’ knowledge about what science is or is not?

Secondary data sources on the web

In class today someone asked about some of the best resources on the web for secondary data. Here are a few that come to mind.

Self-teaching: Income coding edition

Some CRCF brownbaggers are fairly adept at self-teaching. I have to say, it’s not my forte, and I need a pretty specific guide to help me along the way. So, I was very excited when I stumbled upon this set of self-teaching guides from the Luxembourg Income Study (LIS). In SPSS, SAS, or Stata, you can learn how to code the Gini coefficient, equivalent household income, or a relative poverty line. The code, while tailored to LIS variables, is obviously transferable if you have equivalent variables in your dataset. I’ve been using this one today, to confirm/clarify some of my potentially fuzzy coding logic, and so far the guide is fantastic!
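For readers who work outside SPSS/SAS/Stata, here is a rough sketch of the three quantities the LIS guides cover, written in Python. This is not the LIS code itself: the function names are my own, and I’ve assumed two common conventions (the square-root equivalence scale, and a poverty line set at 50% of the median equivalent income), which a given guide may or may not use.

```python
import math
from statistics import median

def gini(incomes):
    """Gini coefficient of a list of incomes (0 = perfect equality, ~1 = maximal inequality).

    Uses the weighted-rank formula on sorted incomes:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    where i runs from 1 to n over incomes sorted in ascending order.
    """
    x = sorted(incomes)
    n = len(x)
    total = sum(x)
    return 2 * sum(i * xi for i, xi in enumerate(x, start=1)) / (n * total) - (n + 1) / n

def equivalent_income(household_income, household_size):
    """Equivalent (equivalized) income using the square-root scale:
    divide household income by the square root of household size."""
    return household_income / math.sqrt(household_size)

def relative_poverty_line(equiv_incomes, fraction=0.5):
    """Relative poverty line: a fraction (here 50%) of median equivalent income."""
    return fraction * median(equiv_incomes)
```

So, for example, a four-person household with $40,000 has an equivalent income of $20,000 under the square-root scale, and anyone whose equivalent income falls below half the sample median would be counted as poor.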
