Python: Parameters of wordcloud.WordCloud, explained / 四六文摘

Parameters of wordcloud.WordCloud, explained

class WordCloud found at: wordcloud.wordcloud

class WordCloud(object):
    """Word cloud object for generating and drawing.

    Parameters
    ----------
    font_path : string
        Font path to the font that will be used (OTF or TTF).
        Defaults to DroidSansMono path on a Linux machine. If you are on
        another OS or don't have this font, you need to adjust this path.

    width : int (default=400)
        Width of the canvas.

    height : int (default=200)
        Height of the canvas.

    prefer_horizontal : float (default=0.90)
        The ratio of times to try horizontal fitting as opposed to vertical.
        If prefer_horizontal < 1, the algorithm will try rotating the word
        if it doesn't fit. (There is currently no built-in way to get only
        vertical words.)

    mask : nd-array or None (default=None)
        If not None, gives a binary mask on where to draw words. If mask is
        not None, width and height will be ignored and the shape of mask
        will be used instead. All white (#FF or #FFFFFF) entries will be
        considered "masked out" while other entries will be free to draw on.
        [This changed in the most recent version!]

    scale : float (default=1)
        Scaling between computation and drawing. For large word-cloud
        images, using scale instead of larger canvas size is significantly
        faster, but might lead to a coarser fit for the words.

    min_font_size : int (default=4)
        Smallest font size to use. Will stop when there is no more room in
        this size.

    font_step : int (default=1)
        Step size for the font. font_step > 1 might speed up computation but
        give a worse fit.

    max_words : number (default=200)
        The maximum number of words.

    stopwords : set of strings or None
        The words that will be eliminated. If None, the built-in STOPWORDS
        list will be used.

    background_color : color value (default="black")
        Background color for the word cloud image.

    max_font_size : int or None (default=None)
        Maximum font size for the largest word. If None, height of the image
        is used.

    mode : string (default="RGB")
        Transparent background will be generated when mode is "RGBA" and
        background_color is None.

    relative_scaling : float (default=.5)
        Importance of relative word frequencies for font-size. With
        relative_scaling=0, only word-ranks are considered. With
        relative_scaling=1, a word that is twice as frequent will have twice
        the size. If you want to consider the word frequencies and not only
        their rank, relative_scaling around .5 often looks good.

        .. versionchanged: 2.0
            Default is now 0.5.

    color_func : callable, default=None
        Callable with parameters word, font_size, position, orientation,
        font_path, random_state that returns a PIL color for each word.
        Overwrites "colormap".
        See colormap for specifying a matplotlib colormap instead.

    regexp : string or None (optional)
        Regular expression to split the input text into tokens in
        process_text. If None is specified, ``r"\w[\w']+"`` is used.

    collocations : bool, default=True
        Whether to include collocations (bigrams) of two words.

        .. versionadded: 2.0

    colormap : string or matplotlib colormap, default="viridis"
        Matplotlib colormap to randomly draw colors from for each word.
        Ignored if "color_func" is specified.

        .. versionadded: 2.0

    normalize_plurals : bool, default=True
        Whether to remove trailing 's' from words. If True and a word
        appears with and without a trailing 's', the one with trailing 's'
        is removed and its counts are added to the version without
        trailing 's' -- unless the word ends with 'ss'.

    Attributes
    ----------
    ``words_`` : dict of string to float
        Word tokens with associated frequency.

        .. versionchanged: 2.0
            ``words_`` is now a dictionary

    ``layout_`` : list of tuples (string, int, (int, int), int, color))
        Encodes the fitted word cloud. Encodes for each word the string,
        font size, position, orientation and color.

    Notes
    -----
    Larger canvases will make the code significantly slower. If you need a
    large word cloud, try a lower canvas size, and set the scale parameter.

    The algorithm might give more weight to the ranking of the words
    than their actual frequencies, depending on the ``max_font_size`` and
    the scaling heuristic.
    """

    def __init__(self, font_path=None, width=400, height=200, margin=2,
                 ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,
                 color_func=None, max_words=200, min_font_size=4,
                 stopwords=None, random_state=None, background_color='black',
                 max_font_size=None, font_step=1, mode="RGB",
                 relative_scaling=.5, regexp=None, collocations=True,
                 colormap=None, normalize_plurals=True):
        if font_path is None:
            font_path = FONT_PATH
        if color_func is None and colormap is None:
            # we need a color map
            import matplotlib
            version = matplotlib.__version__
            if version[0] < "2" and version[2] < "5":
                colormap = "hsv"
            else:
                colormap = "viridis"
        self.colormap = colormap
        self.collocations = collocations
        self.font_path = font_path
        self.width = width
        self.height = height
        self.margin = margin
        self.prefer_horizontal = prefer_horizontal
        self.mask = mask
        self.scale = scale
        self.color_func = color_func or colormap_color_func(colormap)
        self.max_words = max_words
        self.stopwords = stopwords if stopwords is not None else STOPWORDS
        self.min_font_size = min_font_size
        self.font_step = font_step
        self.regexp = regexp
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self.random_state = random_state
        self.background_color = background_color
        self.max_font_size = max_font_size
        self.mode = mode
        if relative_scaling < 0 or relative_scaling > 1:
            raise ValueError("relative_scaling needs to be "
                             "between 0 and 1, got %f." % relative_scaling)
        self.relative_scaling = relative_scaling
        if ranks_only is not None:
            warnings.warn("ranks_only is deprecated and will be removed as"
                          " it had no effect. Look into relative_scaling.",
                          DeprecationWarning)
        self.normalize_plurals = normalize_plurals

    def fit_words(self, frequencies):
        """Create a word_cloud from words and frequencies.

        Alias to generate_from_frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            A contains words and associated frequency.

        Returns
        -------
        self
        """
        return self.generate_from_frequencies(frequencies)

    def generate_from_frequencies(self, frequencies, max_font_size=None):
        """Create a word_cloud from words and frequencies.

        Parameters
        ----------
        frequencies : dict from string to float
            A contains words and associated frequency.

        max_font_size : int
            Use this font-size instead of self.max_font_size

        Returns
        -------
        self
        """
        # make sure frequencies are sorted and normalized
        frequencies = sorted(frequencies.items(), key=itemgetter(1),
                             reverse=True)
        if len(frequencies) <= 0:
            raise ValueError("We need at least 1 word to plot a word cloud, "
                             "got %d." % len(frequencies))
        frequencies = frequencies[:self.max_words]

        # largest entry will be 1
        max_frequency = float(frequencies[0][1])

        frequencies = [(word, freq / max_frequency)
                       for word, freq in frequencies]

        if self.random_state is not None:
            random_state = self.random_state
        else:
            random_state = Random()

        if self.mask is not None:
            mask = self.mask
            width = mask.shape[1]
            height = mask.shape[0]
            if mask.dtype.kind == 'f':
                warnings.warn("mask image should be unsigned byte between 0"
                              " and 255. Got a float array")
            if mask.ndim == 2:
                boolean_mask = mask == 255
            elif mask.ndim == 3:
                # if all channels are white, mask out
                boolean_mask = np.all(mask[:, :, :3] == 255, axis=-1)
            else:
                raise ValueError("Got mask of invalid shape: %s"
                                 % str(mask.shape))
        else:
            boolean_mask = None
            height, width = self.height, self.width
        occupancy = IntegralOccupancyMap(height, width, boolean_mask)

        # create image
        img_grey = Image.new("L", (width, height))
        draw = ImageDraw.Draw(img_grey)
        img_array = np.asarray(img_grey)
        font_sizes, positions, orientations, colors = [], [], [], []

        last_freq = 1.

        if max_font_size is None:
            # if not provided use default font_size
            max_font_size = self.max_font_size

        if max_font_size is None:
            # figure out a good font size by trying to draw with
            # just the first two words
            if len(frequencies) == 1:
                # we only have one word. We make it big!
                font_size = self.height
            else:
                self.generate_from_frequencies(dict(frequencies[:2]),
                                               max_font_size=self.height)
                # find font sizes
                sizes = [x[1] for x in self.layout_]
                try:
                    font_size = int(2 * sizes[0] * sizes[1]
                                    / (sizes[0] + sizes[1]))
                # quick fix for if self.layout_ contains less than 2 values
                # on very small images it can be empty
                except IndexError:
                    try:
                        font_size = sizes[0]
                    except IndexError:
                        raise ValueError('canvas size is too small')
        else:
            font_size = max_font_size

        # we set self.words_ here because we called generate_from_frequencies
        # above... hurray for good design?
        self.words_ = dict(frequencies)

        # start drawing grey image
        for word, freq in frequencies:
            # select the font size
            rs = self.relative_scaling
            if rs != 0:
                font_size = int(round((rs * (freq / float(last_freq))
                                       + (1 - rs)) * font_size))
            if random_state.random() < self.prefer_horizontal:
                orientation = None
            else:
                orientation = Image.ROTATE_90
            tried_other_orientation = False
            while True:
                # try to find a position
                font = ImageFont.truetype(self.font_path, font_size)
                # transpose font optionally
                transposed_font = ImageFont.TransposedFont(
                    font, orientation=orientation)
                # get size of resulting text
                box_size = draw.textsize(word, font=transposed_font)
                # find possible places using integral image:
                result = occupancy.sample_position(box_size[1] + self.margin,
                                                   box_size[0] + self.margin,
                                                   random_state)
                if result is not None or font_size < self.min_font_size:
                    # either we found a place or font-size went too small
                    break
                # if we didn't find a place, make font smaller
                # but first try to rotate!
                if not tried_other_orientation and self.prefer_horizontal < 1:
                    orientation = (Image.ROTATE_90 if orientation is None
                                   else Image.ROTATE_90)
                    tried_other_orientation = True
                else:
                    font_size -= self.font_step
                    orientation = None

            if font_size < self.min_font_size:
                # we were unable to draw any more
                break

            x, y = np.array(result) + self.margin // 2
            # actually draw the text
            draw.text((y, x), word, fill="white", font=transposed_font)
            positions.append((x, y))
            orientations.append(orientation)
            font_sizes.append(font_size)
            colors.append(self.color_func(word, font_size=font_size,
                                          position=(x, y),
                                          orientation=orientation,
                                          random_state=random_state,
                                          font_path=self.font_path))
            # recompute integral image
            if self.mask is None:
                img_array = np.asarray(img_grey)
            else:
                img_array = np.asarray(img_grey) + boolean_mask
            # recompute bottom right
            # the order of the cumsum's is important for speed ?!
            occupancy.update(img_array, x, y)
            last_freq = freq

        self.layout_ = list(zip(frequencies, font_sizes, positions,
                                orientations, colors))
        return self

    def process_text(self, text):
        """Splits a long text into words, eliminates the stopwords.

        Parameters
        ----------
        text : string
            The text to be processed.

        Returns
        -------
        words : dict (string, int)
            Word tokens with associated frequency.

        ..versionchanged:: 1.2.2
            Changed return type from list of tuples to dict.

        Notes
        -----
        There are better ways to do word tokenization, but I don't want to
        include all those things.
        """
        stopwords = set([i.lower() for i in self.stopwords])

        flags = (re.UNICODE if sys.version < '3' and type(text) is unicode
                 else 0)
        regexp = self.regexp if self.regexp is not None else r"\w[\w']+"

        words = re.findall(regexp, text, flags)
        # remove stopwords
        words = [word for word in words if word.lower() not in stopwords]
        # remove 's
        words = [word[:-2] if word.lower().endswith("'s") else word
                 for word in words]
        # remove numbers
        words = [word for word in words if not word.isdigit()]

        if self.collocations:
            word_counts = unigrams_and_bigrams(words, self.normalize_plurals)
        else:
            word_counts, _ = process_tokens(words, self.normalize_plurals)

        return word_counts

    def generate_from_text(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a
        sorted list of words, words will appear in your output twice. To
        remove this duplication, set ``collocations=False``.

        Calls process_text and generate_from_frequencies.

        ..versionchanged:: 1.2.2
            Argument of generate_from_frequencies() is not return of
            process_text() any more.

        Returns
        -------
        self
        """
        words = self.process_text(text)
        self.generate_from_frequencies(words)
        return self

    def generate(self, text):
        """Generate wordcloud from text.

        The input "text" is expected to be a natural text. If you pass a
        sorted list of words, words will appear in your output twice. To
        remove this duplication, set ``collocations=False``.

        Alias to generate_from_text.

        Calls process_text and generate_from_frequencies.

        Returns
        -------
        self
        """
        return self.generate_from_text(text)

    def _check_generated(self):
        """Check if ``layout_`` was computed, otherwise raise error."""
        if not hasattr(self, "layout_"):
            raise ValueError("WordCloud has not been calculated, call generate"
                             " first.")

    def to_image(self):
        self._check_generated()
        if self.mask is not None:
            width = self.mask.shape[1]
            height = self.mask.shape[0]
        else:
            height, width = self.height, self.width

        img = Image.new(self.mode, (int(width * self.scale),
                                    int(height * self.scale)),
                        self.background_color)
        draw = ImageDraw.Draw(img)
        for (word, count), font_size, position, orientation, color in \
                self.layout_:
            font = ImageFont.truetype(self.font_path,
                                      int(font_size * self.scale))
            transposed_font = ImageFont.TransposedFont(
                font, orientation=orientation)
            pos = (int(position[1] * self.scale),
                   int(position[0] * self.scale))
            draw.text(pos, word, fill=color, font=transposed_font)
        return img

    def recolor(self, random_state=None, color_func=None, colormap=None):
        """Recolor existing layout.

        Applying a new coloring is much faster than generating the whole
        wordcloud.

        Parameters
        ----------
        random_state : RandomState, int, or None, default=None
            If not None, a fixed random state is used. If an int is given,
            this is used as seed for a random.Random state.

        color_func : function or None, default=None
            Function to generate new color from word count, font size,
            position and orientation. If None, self.color_func is used.

        colormap : string or matplotlib colormap, default=None
            Use this colormap to generate new colors. Ignored if color_func
            is specified. If None, self.color_func (or self.color_map) is
            used.

        Returns
        -------
        self
        """
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self._check_generated()

        if color_func is None:
            if colormap is None:
                color_func = self.color_func
            else:
                color_func = colormap_color_func(colormap)
        self.layout_ = [(word_freq, font_size, position, orientation,
                         color_func(word=word_freq[0], font_size=font_size,
                                    position=position,
                                    orientation=orientation,
                                    random_state=random_state,
                                    font_path=self.font_path))
                        for word_freq, font_size, position, orientation, _
                        in self.layout_]
        return self

    def to_file(self, filename):
        """Export to image file.

        Parameters
        ----------
        filename : string
            Location to write to.

        Returns
        -------
        self
        """
        img = self.to_image()
        img.save(filename, optimize=True)
        return self

    def to_array(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (width, height, 3)
            Word cloud image as numpy matrix.
        """
        return np.array(self.to_image())

    def __array__(self):
        """Convert to numpy array.

        Returns
        -------
        image : nd-array size (width, height, 3)
            Word cloud image as numpy matrix.
        """
        return self.to_array()

    def to_html(self):
        raise NotImplementedError("FIXME!!!")

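The effect of relative_scaling on font size follows from the update formula inside generate_from_frequencies. The sketch below isolates that formula under stated assumptions: scaled_font_sizes is a hypothetical helper, the frequencies are already sorted in descending order and normalized so the largest is 1, and the shrinking that the real layout loop applies when a word does not fit is ignored.

```python
def scaled_font_sizes(frequencies, start_size, relative_scaling=0.5):
    """Apply the relative_scaling update to a list of frequencies.

    frequencies: descending, normalized so that the first entry is 1.0.
    start_size: font size used for the most frequent word.
    """
    if relative_scaling < 0 or relative_scaling > 1:
        raise ValueError("relative_scaling needs to be between 0 and 1")
    sizes = []
    font_size = start_size
    last_freq = 1.0
    for freq in frequencies:
        if relative_scaling != 0:
            # Same expression as in generate_from_frequencies: blend the
            # frequency ratio to the previous word with a constant term.
            font_size = int(round(
                (relative_scaling * (freq / float(last_freq))
                 + (1 - relative_scaling)) * font_size))
        sizes.append(font_size)
        last_freq = freq
    return sizes


if __name__ == "__main__":
    freqs = [1.0, 0.5, 0.25]
    print(scaled_font_sizes(freqs, 100, relative_scaling=1))    # [100, 50, 25]
    print(scaled_font_sizes(freqs, 100, relative_scaling=0.5))  # [100, 75, 56]
    print(scaled_font_sizes(freqs, 100, relative_scaling=0))    # [100, 100, 100]
```

With relative_scaling=1 a word that is half as frequent gets half the size, matching the docstring. With relative_scaling=0 the formula leaves the size untouched, so in the real layout any size differences come only from shrinking the font until each word fits, which is why only rank matters in that mode.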