R - Converting factor to numeric
I have a dataset with 3 million rows and 500 columns. Although the columns are numeric, they are treated as factors, not numeric, when I import the CSV file. I am trying to convert them to numeric with the command
wikifixedn<-as.numeric(as.character(wikifixed))
where wikifixed is a data frame.
It's taking forever: my MacBook Pro (16 GB RAM, 2.3 GHz Core i7) has been churning for more than an hour. Can I see somewhere how far along the process is, or whether it is moving along at all? Is there another, faster method to deal with this conversion problem?
By the way: when importing the CSV file, I tried to force the columns to be treated as numeric using
> wikifixed<-read.csv('~/onedrive/kredible/finaldata/wutao/wikipediausers.csv', header = TRUE, stringsAsFactors=F)
Yet when I check, I get
> is.numeric(wikifixed)
[1] FALSE
See here:
https://stat.ethz.ch/r-manual/r-devel/library/utils/html/read.table.html
You should create a vector for colClasses:
read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
stringsAsFactors
logical: should character vectors be converted to factors? Note that this is overridden by as.is and colClasses, both of which allow finer control.
colClasses
character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA.
Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class.
Note that colClasses is specified per column (not per variable) and so includes the column of row names (if any).
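As a minimal sketch of the colClasses approach described above (the temporary file, its three columns, and the variable names are made up for illustration, not taken from the question's data), the following reads a small CSV with every column declared numeric. It then checks the result column by column, since is.numeric() on a data frame itself returns FALSE regardless of the column types. The last part shows one way to convert a data frame that has already been loaded with factor columns: as.character() applied to a whole data frame operates on the list of columns rather than on the individual values, so the conversion is done per column instead.

```r
# Build a small CSV on disk so the sketch is self-contained
# (hypothetical file and column names).
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2.5,3", "4,5.5,6", "7,NA,9"), tmp)

# One class per column, declared up front.
classes <- rep("numeric", 3)
df <- read.csv(tmp, header = TRUE, colClasses = classes)

# Check each column rather than the data frame as a whole.
col_is_numeric <- sapply(df, is.numeric)
print(col_is_numeric)

# If the data are already loaded as factors, convert column by column.
# `df_f[] <-` keeps the data frame structure while replacing its columns.
df_f <- read.csv(tmp, header = TRUE, colClasses = rep("factor", 3))
df_f[] <- lapply(df_f, function(x) as.numeric(as.character(x)))
converted_is_numeric <- sapply(df_f, is.numeric)
print(converted_is_numeric)
```

Declaring the classes at read time avoids the separate conversion pass altogether, which is likely to matter at 3 million rows by 500 columns. If a column declared numeric contains non-numeric text, the read should stop with an error, which can help locate the entries that caused the factor conversion in the first place.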
Also see here in case you want to go with data.table, because you may run into more issues:
fread in R imports a large .csv file as a data frame with one row
require(data.table)
fread("pre2012_alldatapoints.csv", sep = ",", header = TRUE)
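A hedged sketch of the same idea with fread, assuming the data.table package is installed. The temporary file and its columns are hypothetical. fread does not convert strings to factors by default and detects column types on its own, so the explicit colClasses here is only meant to show how the types can be pinned down when that detection is not wanted.

```r
library(data.table)

# Small hypothetical CSV so the example runs on its own.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2.5,3", "4,5.5,6", "7,NA,9"), tmp)

# One class per column; without colClasses, columns a and c
# would be detected as integer rather than double.
dt <- fread(tmp, sep = ",", header = TRUE, colClasses = rep("numeric", 3))

# Inspect the resulting class of every column.
col_classes <- sapply(dt, class)
print(col_classes)
```

The result is a data.table, which also inherits from data.frame, so most code written for data frames should continue to work with it.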
And read
the data.table FAQ at