R - Converting factor to numeric


I have a dataset with 3 million rows and 500 columns. Although the columns are numeric, they were treated as factors, not numeric, when I imported the CSV file. I am trying to convert them to numeric with the command

wikifixedn<-as.numeric(as.character(wikifixed)) 

where wikifixed is a data frame.

It's taking forever... My MacBook Pro (16 GB RAM, 2.3 GHz Core i7) has been churning for more than an hour. Can I see somewhere how far along the process is, or whether it is moving along at all? Is there another, faster method to deal with this conversion problem?

BTW: when importing the CSV file, I tried to stop the columns from being read as factors using

> wikifixed <- read.csv('~/onedrive/kredible/finaldata/wutao/wikipediausers.csv', header = TRUE, stringsAsFactors = FALSE)

Yet when I check, I get

> is.numeric(wikifixed)
[1] FALSE

See the read.table documentation here:

https://stat.ethz.ch/r-manual/r-devel/library/utils/html/read.table.html

You should create a colClasses vector and pass it when reading the file.

read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

stringsAsFactors

logical: should character vectors be converted to factors? Note that this is overridden by as.is and colClasses, both of which allow finer control.

colClasses

character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA.

Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class.

Note that colClasses is specified per column (not per variable) and so includes the column of row names (if any).
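To make the colClasses suggestion concrete, here is a minimal sketch. It assumes every column in the file really is numeric; the temporary file and the column names a, b, c are made up for illustration and stand in for the real CSV.

```r
# Write a small CSV to a temporary file so the example is self-contained.
# (Hypothetical data; replace tmp with the path to the real file.)
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2.5,3", "4,5.5,6", "7,NA,9"), tmp)

# An unnamed, length-one colClasses is recycled across all columns.
df <- read.csv(tmp, header = TRUE, colClasses = "numeric")

# Check column by column; is.numeric() on the data frame itself is FALSE.
col_is_numeric <- sapply(df, is.numeric)
print(col_is_numeric)

unlink(tmp)
```

If a column contains an entry that cannot be parsed as a number, the read stops with an error instead of silently producing a factor, which at least points at the offending column.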

Also see this question in case you want to go with data.table, because you may run into more issues:

fread in R imports a large .csv file as a data frame with one row

require(data.table)
fread("pre2012_alldatapoints.csv", sep = ",", header = TRUE)
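fread also accepts a colClasses argument, so the same idea carries over. The sketch below assumes the data.table package is installed and uses a made-up two-column file; the named-vector form of colClasses is used here, and the column names a and b are hypothetical.

```r
# Only run if data.table is available (assumption: it may not be installed).
if (requireNamespace("data.table", quietly = TRUE)) {
  # Hypothetical small file standing in for the real CSV.
  tmp <- tempfile(fileext = ".csv")
  writeLines(c("a,b", "1,2.5", "3,4.5"), tmp)

  # Named colClasses: ask for both columns to be read as numeric (double).
  dt <- data.table::fread(tmp, sep = ",", header = TRUE,
                          colClasses = c(a = "numeric", b = "numeric"))

  print(sapply(dt, class))
  unlink(tmp)
}
```

fread detects column types on its own by default, so in many cases the colClasses argument is only needed when the detection guesses wrong.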

And read the data.table FAQ at

https://github.com/rdatatable/data.table/wiki
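If the data frame is already in memory, the as.numeric(as.character(...)) idiom from the question can be applied to each factor column rather than to the data frame as a whole. This is only a sketch: the small data frame below is hypothetical and stands in for wikifixed.

```r
# Hypothetical data frame with factor columns, standing in for wikifixed.
df <- data.frame(x = factor(c("10", "20", "30")),
                 y = factor(c("1.5", "2.5", "1.5")))

# is.numeric() on a data frame is FALSE whatever its columns contain,
# so this check says nothing about the individual columns.
print(is.numeric(df))

# Convert each factor column through its labels (as.character), not its
# internal integer codes, then back to numeric.
is_fac <- sapply(df, is.factor)
df[is_fac] <- lapply(df[is_fac], function(col) as.numeric(as.character(col)))

# Per-column check.
print(sapply(df, is.numeric))
```

On 500 columns this is a single pass over each column. Re-reading the file with colClasses is still likely to be the cheaper route, since it avoids building the factors in the first place.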

