The VLDB Journal · DOI 10.1007/s00778-015-0385-2 · Regular Paper

Task assignment optimization in knowledge-intensive crowdsourcing

Senjuti Basu Roy · Ioanna Lykourentzou · Saravanan Thirumuruganathan · Sihem Amer-Yahia · Gautam Das

Received: 12 March 2014 / Revised: 1 December 2014 / Accepted: 23 March 2015
© Springer-Verlag Berlin Heidelberg 2015

Abstract  We present SmartCrowd, a framework for optimizing task assignment in knowledge-intensive crowdsourcing (KI-C). SmartCrowd distinguishes itself by formulating, for the first time, the problem of worker-to-task assignment in KI-C as an optimization problem, by proposing efficient adaptive algorithms to solve it, and by accounting for human factors, such as worker expertise, wage requirements, and availability, inside the optimization process. We present rigorous theoretical analyses of the task assignment optimization problem and propose optimal and approximation algorithms with guarantees, which rely on index pre-computation and adaptive maintenance. We perform extensive performance and quality experiments using real and synthetic data to demonstrate that the SmartCrowd approach is necessary to achieve efficient task assignments of high quality under a guaranteed cost budget.

Keywords  Collaborative crowdsourcing · Optimization · Knowledge-intensive crowdsourcing · Human factors

Electronic supplementary material  The online version of this article (DOI 10.1007/s00778-015-0385-2) contains supplementary material, which is available to authorized users.

Acknowledgments  The work of Saravanan Thirumuruganathan and Gautam Das is partially supported by NSF Grants 15, a NHARP grant from the Texas Higher Education Coordinating Board, and grants from Microsoft Research and Nokia Research.

Author contacts: Senjuti Basu Roy (senjutib@uw.edu), University of Washington Tacoma, Tacoma, WA, USA; Ioanna Lykourentzou (ioanna.lykourentzou@tudor., ioanna.lykourentzou@inria.fr), CRP Henri Tudor / INRIA Nancy Grand-Est, Villers-lès-Nancy, France; Saravanan Thirumuruganathan (saravanan.thirumuruganathan@mavs.uta.edu) and Gautam Das (gdas@uta.edu), UT Arlington, Arlington, TX, USA; Sihem Amer-Yahia (sihem.amer-yahia@imag.fr), LIG, CNRS, Grenoble, France.
1 Introduction

Knowledge-intensive crowdsourcing (KI-C) is acknowledged as one of the most promising areas of next-generation crowdsourcing [29], mostly for the critical role it can play in today's knowledge-savvy economy. KI-C refers to the collaborative creation of knowledge content (for example, Wikipedia articles or news articles) through crowdsourcing. Crowd workers, each having a certain degree of expertise, collaborate and "build" on each other's contributions to gradually increase the quality of each knowledge piece (hereby referred to as "task"). Despite its importance, no work or platform so far optimizes KI-C, a fact which often results in poor task quality and higher-than-expected costs, thus undermining the reliability of crowds for knowledge-intensive applications.

In this paper, we propose SmartCrowd, an optimization framework for knowledge-intensive collaborative crowdsourcing that aims at improving KI-C by optimizing one of its fundamental processes, i.e., worker-to-task assignment, while taking into account the dynamic and uncertain nature of a real crowdsourcing environment [43]. Consider the example of a KI-C application offering news articles on demand as a service to interested stakeholders, such as publication houses, blogs, and individuals. Several thousands of workers are potentially available to compose thousands of news articles collaboratively. It is easy to imagine that such an application needs to judiciously assign workers to tasks, so as to ensure the delivery of high-quality articles while being cost-effective. Two main challenges need to be investigated: (1) How to formalize the KI-C worker-to-task assignment problem? (2) How to solve the problem efficiently so as to warrant the desired quality/cost outcome of the KI-C platform, while taking into account the unpredictability of human behavior and the volatility of workers in a realistic crowdsourcing environment?

The paper is structured as follows. First, we formalize KI-C worker-to-task assignment as an optimization problem (Sect. 2). In our formulation, the resources are the worker profiles (knowledge skill per domain, requested wage) and the tasks are the news articles, each having a minimum quality and a maximum cost requirement, as well as the need for certain skills. (With the availability of historical information, worker profiles, i.e., knowledge skills and expected wage, can be learned by the platform; profile learning is an independent research problem in its own merit, orthogonal to this work.) The objective function is formalized so as to guarantee that each task surpasses its quality threshold, stays below its cost limit, and that workers are not over- or under-utilized. Given the innate uncertainty induced by human involvement, we also use probabilistic modeling to include a third human factor, i.e., the workers' acceptance ratio (the probability that a worker accepts a recommended task), in the problem formulation. Then, we argue that it may be prohibitively expensive to assign workers to tasks optimally in real time and reason about the necessity of pre-computation for efficiency reasons.

We propose index design as a means to efficiently address the KI-C optimization problem (Sect. 3). One of the novel contributions of this work is proposing the pre-computation of crowd indexes (C-dex) for KI-C tasks, to be used efficiently afterward during the actual worker-to-task assignment process. We show how KI-C tasks could benefit from these pre-computed crowd indexes to efficiently maximize the objective function.
Third, we examine the problem under dynamic conditions of the crowdsourcing environment, where new workers may subscribe, existing ones may leave, worker profiles may change over time, and workers may accept or decline recommended tasks. To tackle such unforeseen scenarios, SmartCrowd proposes the adaptive maintenance of the pre-computed indexes, while respecting worker non-preemption (non-preemption ensures that a worker cannot be interrupted after she is assigned to a task).

Fourth, we prove several theoretical properties of the C-dex design problem, such as NP-Completeness (using a reduction from the Multiple Knapsack Problem [13]), as well as submodularity and monotonicity under certain conditions. This in-depth theoretical analysis is critical to understand the problem complexity, as well as to design efficient principled solutions with theoretical guarantees.

Finally, we propose novel optimal and approximate solutions for the index design and maintenance, depending on the exact KI-C problem conditions. Our optimal solution uses an integer linear programming (ILP) approach (Sect. 4). For the case where the optimal index building or maintenance is too expensive, we propose two types of efficient approximate strategies: (1) a greedy approach (Sect. 5) consisting of one randomized and one deterministic approximation algorithm, which both need polynomial computation time and admit constant approximation factors under certain conditions (2/5 for the randomized algorithm and (1 − 1/e) for the deterministic one), and (2) a clustering-based approximation approach (Sect. 6), called C-dex+, which is based on the idea of building and maintaining the indexes using clusters of similar workers (a notion called "virtual worker") instead of the actual worker pool.

We design comprehensive experimental studies (Sect. 7), both with real users and simulations, to validate SmartCrowd qualitatively and efficiency-wise. With an appropriate adaptation of Amazon Mechanical Turk (AMT), we conduct extensive quality experiments involving real workers to compose news articles. Such an adaptation needs a careful design of the validation strategies, since AMT (like many other popular paid crowdsourcing platforms) does not yet support KI-C tasks. Extensive simulation studies are used to further investigate quality and efficiency. In these, we compare against several baseline algorithms, including one of the latest state-of-the-art techniques [18] for online task assignment in crowdsourcing. The obtained results demonstrate that the algorithms proposed by the SmartCrowd framework achieve a 3× improvement, both qualitatively and efficiency-wise, justifying the necessity of pre-computed indexes and their adaptive maintenance for the KI-C optimization problem.

Our main contributions are summarized as follows:

1. We initiate the study of task assignment optimization in knowledge-intensive crowdsourcing (KI-C), formalize the problem, and propose rigorous theoretical analyses.
2. We propose the necessity of index design and dynamic maintenance to address the KI-C task assignment optimization problem. We propose novel optimal and approximate solutions (C-dex, greedy C-dex, and C-dex+) for index creation and adaptive maintenance.
3. We conduct extensive experiments on real and simulated crowdsourcing settings to demonstrate the effectiveness of our proposed solution qualitatively and efficiency-wise.

The rest of this paper is organized as follows. Section 2 describes the KI-C task assignment optimization problem, including a description of its data model, its constraints, and its objective. Section 3 presents the SmartCrowd framework, which maps the KI-C task assignment problem to an index design problem (the C-dex design problem) and analyzes the latter theoretically. Section 4 describes the proposed optimal algorithm (C-dex-Optimal). Section 5 describes the two greedy approximation algorithms (C-dex-Randomized and C-dex-Deterministic), and Sect. 6 presents the clustering-based approximation strategy (C-dex+). Section 7 contains the experiments, Sect. 8 presents the related work, and Sect. 9 triggers a discussion on extensions and future perspectives. Finally, Sect. 10 concludes the paper.
2 KI-C problem settings

In this section, we describe a framework to formalize knowledge-intensive crowdsourcing (KI-C). KI-C is widely considered as a key component of the next generation of crowdsourcing. We refer the reader to Sect. 8 for its description and a brief survey of KI-C. For most of our paper, we use the application of collaborative document editing to illustrate our model and algorithms. We detail how our framework could be utilized for other popular KI-C applications, such as fan-subbing (sentence translation by fans), in Sect. 9.4.

2.1 Data model

The important notations are summarized in Table 1. We have a set of workers U = {u_1, u_2, ..., u_n}, a set of skills S = {s_1, s_2, ..., s_m}, and a set of tasks T = {t_1, t_2, ..., t_l}. In the context of collaborative document writing, skills represent topics such as Egyptian Politics, PlayStation games, or the NSA document leakage. Tasks represent the documents that are being edited collaboratively.

Table 1  Major notations used in the paper

  Notation                                  Interpretation
  U                                         The set of n workers
  S                                         The set of m skills
  T                                         The set of l tasks
  t = <Q_t1, Q_t2, ..., Q_tm, W_t>          A task vector with quality and cost thresholds
  u = <u_s1, u_s2, ..., u_sm, w_u, p_u>     Profile of worker u: m skills, wage, and acceptance ratio
  v_j                                       Value of task j
  i_t = (P_i^t, L_i^t)                      A C-dex for task t
  I, I_V                                    A set of C-dex, C-dex+
  C_u                                       The number of active tasks assigned to worker u
  C_1, C_2                                  Two constant weights associated with skill and cost, respectively
  X_l, X_h                                  Minimum and maximum (respectively) task load of a worker

Skills  A skill is the knowledge on a particular topic, quantified on a continuous scale in [0, 1]. It is associated with workers and tasks. When associated with a worker, it represents the worker's expertise on a topic. When associated with a task, a skill represents the minimum quality requirement for that task. A value of 0 for a skill reflects no expertise of a worker for that skill. For a task, 0 reflects no requirement for that skill.

Workers  Each worker u ∈ U has a profile, that is, a vector <u_s1, u_s2, ..., u_sm, w_u, p_u> of length m + 2 describing her m skills in S, her wage w_u, and her task acceptance ratio p_u.

– Skill u_si ∈ [0, 1] is the expertise level of worker u for skill s_i. Skill expertise reflects the quality that the worker's contribution will bring to a task accomplished by that worker.
– Wage w_u ∈ [0, 1] is the minimum amount of money for which worker u is willing to accept a given task.
– Acceptance ratio p_u ∈ [0, 1] is the probability with which worker u accepts a task. It reflects the worker's availability to complete tasks assigned to her. A value of 0 is used to model workers who are not available (i.e., workers who do not accept any task).

We refer to a worker's skill, wage expectation, and acceptance ratio as human factors that may vary over time.

Tasks  A task t ∈ T is characterized by a vector <Q_t1, Q_t2, ..., Q_tm, W_t> of length m + 1, which reflects the task's minimum quality requirement per skill and its maximum cost (or maximum allowed wage). A task t that is being executed has a set of contributors U_t ⊆ U so far, and it is characterized by a:

– Current quality q_ti = Σ_{u ∈ U_t} u_si ∈ [0, |U_t|] for skill s_i, with u_si being the expertise of worker u on skill s_i. The current quality q_ti is thus the aggregate of skill i over all workers who have contributed to task t so far.
– Current cost w_t = Σ_{u ∈ U_t} w_u ∈ [0, |U_t|], with w_u being the wage paid to worker u.
The current cost w_t aggregates the wages of all workers who have contributed to t so far.

Notice that in our paper, we have defined task quality as the sum of the skills of the workers who take part in it. This is commonly known as the additive skill aggregation model [2], which is commonly used in KI-C tasks such as document editing and fan-subbing (sentence translation by fans). In the additive skill aggregation model, the quality of the final task is proportional to the sum of worker skills. This simple, intuitive function for transforming individual contributions into a collective result has been adopted in many previous KI-C tasks [2,34], where workers build on each other's contributions by performing edits. Of course, other aggregation measures, such as maximum, minimum, or product [2], could also be used to compute the quality of a task. However, their use in KI-C tasks is less obvious.

We would also like to note that the acceptance ratio of a worker has an impact on how the current quality and cost of a task are computed. If all the workers assigned to a task are available, the current quality and cost of the task are simply the sum of worker skills and wages, respectively. However, it is possible that when a team is formed, some of the workers might not be available. For example, a worker u with an acceptance ratio of 0.5 is only available 50 % of the time. We can extend the definition to handle this uncertainty in a straightforward manner: the expected quality and cost of a task are computed as weighted sums of worker skills and wages, with the respective acceptance ratios acting as the weights.

Workload  We assume a static workload T that represents a set of active tasks over a given time period. Each task in the workload is associated with both a minimum quality requirement per skill and a maximum cost. In the spirit of database workloads, we assume that a crowdsourcing workload is representative of the type of tasks handled by the crowdsourcing platform. Existing techniques for capturing database workloads are equally applicable for estimating crowdsourcing workloads.

2.2 Constraints

The following constraints are considered:

– Minimum quality  For each task t ∈ T, the worker-to-task assignment has to be such that the aggregated skill of the assigned workers is at least as high as the minimum skill requirement of t for each skill (Q_tj is the threshold for skill j, and the requirement is q_tj ≥ Q_tj).
– Maximum cost  For each task t ∈ T, the aggregated workers' wage w_t cannot exceed the maximum cost that t can pay, i.e., w_t ≤ W_t.
– Non-preemption  Once a worker has been assigned to a task, she cannot be pulled out of that task until it is finished.
– Minimum/maximum tasks per worker  A worker must be assigned a minimum of X_l tasks and no more than X_h tasks.

Table 2  Worker profiles

  Worker   Skill   Wage   Acceptance ratio
  u1       0.1     0.05   0.8
  u2       0.3     0.25   0.7
  u3       0.2     0.3    0.8
  u4       0.6     0.7    0.5
  u5       0.4     0.3    0.6
  u6       0.5     0.4    0.9

Table 3  Task descriptions

  Task   Quality threshold   Cost threshold
  t1     0.7                 1.08
  t2     0.7                 1.1
  t3     0.9                 2.0

2.3 Objective

Given a set T of tasks and a set U of workers, the objective is to perform worker-to-task assignment for all tasks in T, such that the overall task quality is maximized and the cost is minimized, while the constraints of skill, cost, and tasks per worker are satisfied.
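To make the data model concrete, the following is a minimal sketch (in Python, which the paper itself does not use) of how a platform might represent worker profiles and task requirements and compute the expected quality and cost of a candidate team under the additive aggregation model and acceptance ratios described above. The values follow Tables 2 and 3; all identifiers are hypothetical and only illustrative.

```python
# Minimal sketch: worker profiles, task requirements, and expected
# quality/cost under the additive aggregation model (single skill, m = 1).
# Values follow Tables 2 and 3; all identifiers are illustrative only.

workers = {  # worker -> (skill, wage, acceptance ratio)
    "u1": (0.1, 0.05, 0.8), "u2": (0.3, 0.25, 0.7), "u3": (0.2, 0.30, 0.8),
    "u4": (0.6, 0.70, 0.5), "u5": (0.4, 0.30, 0.6), "u6": (0.5, 0.40, 0.9),
}
tasks = {  # task -> (minimum quality Q_t, maximum cost W_t)
    "t1": (0.7, 1.08), "t2": (0.7, 1.10), "t3": (0.9, 2.00),
}

def expected_quality_and_cost(team):
    """Weight each member's skill and wage by her acceptance ratio."""
    q = sum(workers[u][0] * workers[u][2] for u in team)
    w = sum(workers[u][1] * workers[u][2] for u in team)
    return q, w

def satisfies_constraints(task, team):
    """Check the minimum-quality and maximum-cost constraints of Sect. 2.2."""
    q, w = expected_quality_and_cost(team)
    q_min, w_max = tasks[task]
    return q >= q_min and w <= w_max

if __name__ == "__main__":
    q, w = expected_quality_and_cost({"u1", "u2", "u6"})
    print(round(q, 3), round(w, 3), satisfies_constraints("t1", {"u1", "u2", "u6"}))
```

Running the sketch on the team {u1, u2, u6} yields an expected quality of 0.74 and an expected cost of about 0.58, the same numbers used in the running example below.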
Example 1  We describe a running example consisting of a minuscule version of the news article composition task. Assume that the platform consists of six workers to compose three news articles (tasks) on "Egyptian Politics" (t1), "NSA leakage" (t2), and "US Health Care Law" (t3). For simplicity, we assume that all tasks belong to the same topic ("Politics") and therefore require only one skill (knowledge in politics). We also assume that X_l = 1 and X_h = 2. Worker profiles (skill, wage, acceptance ratio) and task requirements (minimum quality, maximum cost) are depicted numerically in Tables 2 and 3. This example will be used throughout the paper to illustrate our solution.

3 SmartCrowd

The SmartCrowd framework maps the KI-C optimization problem to a problem of index building and maintenance. In the following, we formalize the index design problem and analyze it in depth theoretically. Then we highlight the need for adaptive index maintenance. Finally, we close this section by presenting the unified approach that all SmartCrowd algorithms adopt toward solving the problem.

3.1 C-DEX

C-DEX design problem  We now formally describe the C-dex design problem. C-dex stands for crowd index, which is, intuitively, a pre-computation of task assignments for a given worker pool and workload. The pre-computed assignments facilitate faster worker-to-task assignment (akin to indexing in databases). Given a new task, C-dex could be used to make a fast "lookup" to find a team of workers best suited for the task.

We start with the KI-C objective described in Sect. 2.3. We define v_t to denote the value of each task t in T (in the beginning, v_t is 0 for every task). The task value is associated with the current quality and cost of the task. More specifically, the task value is calculated as a weighted linear combination of skills (higher is better) and cost (lower is better). Also recall that each task has a minimum skill requirement and a maximum cost budget. The index associated with task t should be created such that the aggregated skill of the assigned workers is at least as large as the minimum skill requirement of t for each skill (q_tj ≥ Q_tj, where Q_tj is the threshold for skill j) and such that the aggregated workers' wage does not exceed the maximum cost that t can pay (i.e., w_t ≤ W_t). Once these two hard constraints are satisfied, a positive value v_t is associated with task t. The objective is to design an index C-dex such that the sum of values V = Σ_{t ∈ T} v_t over all tasks T is maximized, while the problem constraints, as set in Sect. 2.2, are satisfied. For a task t, its individual value v_t and the global value V of the objective function are defined in Eq. 1.
    Maximize V = Σ_{t ∈ T} v_t
    subject to, ∀t ∈ T:
        v_t = C_1 × Σ_{1 ≤ j ≤ m} q_{tj} + C_2 × (1 − w_t / W_t)   if q_{tj} ≥ Q_{tj} ∀j and w_t ≤ W_t
        v_t = 0                                                     otherwise
                                                                    (1)

where C_1, C_2 ≥ 0 and C_1 + C_2 = 1. The above formulation is a flexible incorporation of the different skills and the cost, letting the application select the respective weights (C_1, C_2) as appropriate. Since C-dex is pre-computed for future use, the skills (or quality) and wages are computed in an expected sense, considering the workers' acceptance ratios instead of the actual aggregates, as follows:

    q_{tj} = Σ_{u ∈ U} u_t × p_u × u_{s_j}      ∀t ∈ T, ∀ 1 ≤ j ≤ m
    w_t   = Σ_{u ∈ U} u_t × p_u × w_u           ∀t ∈ T
    u_t ∈ {0, 1}                                ∀t ∈ T, ∀u ∈ U
    X_l ≤ Σ_{t ∈ T} u_t ≤ X_h                   ∀u ∈ U

Solution format  The solution to the above problem formulation is a set I of C-dex indexes, one index for each task t ∈ T. Each C-dex index i_t contains the estimated properties P_i^t of its respective task t (which include the task's value v_t, estimated quality per skill q_{t1}, ..., q_{tm}, and final estimated cost w_t) and the set L_i^t of workers who are assigned to the task. We define the crowd index C-dex as follows:

Definition 1 (C-dex)  A C-dex i_t = (P_i^t, L_i^t) is a pair that represents an assignment of a set of workers in U to a task t. Formally, it is described by a vector P_i^t of length m + 2 and a set of workers L_i^t. P_i^t = <v_t, q_{t1}, ..., q_{tm}, w_t> contains the value v_t of task t, its expected total expertise q_{ti} for each skill s_i, and its expected total cost w_t. L_i^t ⊆ U contains the workers assigned to index i_t.

Consider Example 1 with T = {t1, t2, t3}, for which three indexes are to be created offline. If workers {u1, u2, u6} are assigned to task t1 with C_1 = C_2 = 0.5, then the index for task t1 will be i_{t1} = (<0.6, 0.74, 0.58>, {u1, u2, u6}). The expected quality of t1 is computed by multiplying the skills of the assigned workers with their acceptance ratios. For this assignment, the expected quality is 0.1 × 0.8 + 0.3 × 0.7 + 0.5 × 0.9 = 0.74. Similarly, the expected cost is computed as 0.05 × 0.8 + 0.25 × 0.7 + 0.4 × 0.9 ≈ 0.58. Finally, the value of this particular assignment is 0.5 × 0.74 + 0.5 × (1 − 0.58/1.08) = 0.6.
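The value computation of Eq. 1 and the index entry of Definition 1 can be sketched directly in code. The snippet below is a hedged illustration (not the authors' implementation); it hard-codes the single-skill profiles of Tables 2 and 3 and reproduces the numbers of the i_{t1} example above, with all names being hypothetical.

```python
# Sketch of the task value v_t (Eq. 1) and of a C-dex entry (Definition 1)
# for the single-skill running example; C1 and C2 are the skill/cost weights.

workers = {"u1": (0.1, 0.05, 0.8), "u2": (0.3, 0.25, 0.7), "u3": (0.2, 0.30, 0.8),
           "u4": (0.6, 0.70, 0.5), "u5": (0.4, 0.30, 0.6), "u6": (0.5, 0.40, 0.9)}
tasks = {"t1": (0.7, 1.08), "t2": (0.7, 1.10), "t3": (0.9, 2.00)}

def expected_quality_and_cost(team):
    q = sum(workers[u][0] * workers[u][2] for u in team)   # expected quality
    w = sum(workers[u][1] * workers[u][2] for u in team)   # expected cost
    return q, w

def task_value(task, team, c1=0.5, c2=0.5):
    q, w = expected_quality_and_cost(team)
    q_min, w_max = tasks[task]
    if q >= q_min and w <= w_max:        # hard constraints of Eq. 1
        return c1 * q + c2 * (1 - w / w_max)
    return 0.0                           # value stays 0 if a threshold is violated

def build_cdex_entry(task, team, c1=0.5, c2=0.5):
    """Return (P_i^t, L_i^t): the estimated properties and the worker set."""
    q, w = expected_quality_and_cost(team)
    return (task_value(task, team, c1, c2), q, w), frozenset(team)

if __name__ == "__main__":
    # Reproduces i_t1 = (<0.6, 0.74, 0.58>, {u1, u2, u6}) up to rounding.
    print(build_cdex_entry("t1", {"u1", "u2", "u6"}))
```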
3.1.1 Theoretical analyses

Theorem 1  The decision version of the C-dex Design problem is NP-Complete.

Proof  Given a workload T of l tasks, a set of workers (and their profiles), constant values C_1, C_2, X_l, X_h, and V, the decision version of the problem asks whether a set of l indexes can be created (one for each task), where v_{t_i} is the value of index i_{t_i}, such that all constraints are satisfied and the aggregated global value Σ_{i=1}^{l} v_{t_i} is greater than or equal to V. It is easy to see that the problem is in NP. To prove NP-Completeness, we prove that the well-known Partition problem [13] is polynomial-time reducible to an instance of the C-dex Design problem, i.e., Partition ≤_p C-dex Design. The decision version of the Partition problem is as follows: given a finite multiset A of positive integers, can A be partitioned into two disjoint subsets A_1 and A_2 such that the sum of the numbers in A_1 [i.e., S(A_1)] equals the sum of the numbers in A_2 [i.e., S(A_2)]?

We reduce an instance of Partition to an instance of the C-dex Design problem as follows. Each number in the multiset represents the skill of an individual worker u (the number of skill domains is m = 1) and is scaled down to a rational number in [0, 1] by dividing it by the maximum integer I_max in the multiset. We assign the wage of each worker u to be 0 and its acceptance ratio to be 1, i.e., p_u = 1. The workload consists of two tasks (equal to the number of indexes). Both tasks have a minimum and equal skill requirement, i.e., for task t, Q_t = S(A_1)/I_max = S(A_2)/I_max = S(A)/(2 × I_max). Since no worker has an associated cost, the cost constraint W_t for task t can be set arbitrarily. Each partition represents a C-dex, and the aggregated skill of the workers inside the index must be equal to or more than S(A_1)/I_max or S(A_2)/I_max. The constant weights C_1 and C_2 are chosen arbitrarily, as long as C_1 + C_2 = 1. This creates the following instance of the C-dex Design problem, where v_j is the value of the j-th C-dex and V is the overall value:

    V = v_1 + v_2
    v_j = C_1 × q_j + C_2 × (1 − 0/W_j)
    q_j = Σ_{u ∈ U} u_j × p_u × u_s,  w_j = 0,  X_l = X_h = 1
    Q_j = S(A_1)/I_max = S(A_2)/I_max,  q_j ≥ Q_j,  ∀j ∈ {1, 2}

Given the above instance of the C-dex Design problem, the objective is to create two C-dex such that V = C_1 × (S(A_1) + S(A_2))/I_max + C_2, and a solution of the Partition problem exists if and only if a solution to our instance of the C-dex Design problem exists.

3.1.2 Effects of the constraints on the C-DEX Design problem

We investigate interesting theoretical properties of the optimization problem (Eq. 1) under different conditions and constraints. In particular, we investigate the submodularity and monotonicity properties [38] of the objective function, which are heavily used to design approximation algorithms with theoretical guarantees in Sect. 5. Specifically, next we prove that our value function v_t and our objective function V are neither submodular nor monotonic in the general case. These results prevent the design of approximation algorithms with theoretical guarantees for our problem in the general case. However, under special conditions of the constraints, the value function and the objective function become submodular. Finally, as we show next, monotonicity can be ensured for these functions when the weight value C_2 on cost becomes 0.

Submodular function  In general, if A is a set, a submodular function is a set function f : 2^A → R that satisfies the following condition: for every X, Y ⊆ A with X ⊆ Y and every x ∈ A \ Y, we have f(X ∪ {x}) − f(X) ≥ f(Y ∪ {x}) − f(Y). The value function v_t for task t has this form: it maps each subset S of the workers U to a real number v_t, denoting the value that we get if that subset of workers is assigned to task t. Similarly, the global optimization function V = Σ_{t ∈ T} v_t is defined over a set of sets, where each set maps an assignment of a subset of the workers from U to a task t ∈ T with value v_t.

Monotonic function  A function f defined on non-empty subsets is monotonic if for every X ⊆ Y, f(X) ≤ f(Y).

Theorem 2  The value function v_t is not submodular in the C-dex Design problem if Q_tj > 0, ∀j ∈ {1 ... m}.

Proof Sketch  Without loss of generality, we ignore the weights and the acceptance ratios of the workers for this proof. For simplicity of exposition, imagine m = 1. The value v_t is defined on both quality and cost, but remains 0 if q_t1 < Q_t1, or w_t > W_t, or both. Therefore, for the rest of our argument, we assume an infinite cost budget and only focus on quality. Under this assumption, v_t remains 0 until q_t1 becomes ≥ Q_t1. Consider a subset R ⊆ S, and imagine f(R) < Q_t1, leading to v_t = 0. If an element k is added to R and f(R ∪ {k}) < Q_t1, then v_t is still 0. However, if f(S) ≥ Q_t1, then v_t > 0 and therefore f(S ∪ {k}) − f(S) > 0. In such cases, it is easy to see that f(S ∪ {k}) − f(S) > f(R ∪ {k}) − f(R). This clearly violates the submodularity condition. We omit the details for brevity.

Theorem 3  The value function v_t in the C-dex Design problem is submodular but non-monotone when Q_tj = 0, ∀j ∈ {1 ... m}.

Proof Sketch  As long as Q_tj = 0, ∀j ∈ {1 ... m} (meaning no quality threshold is imposed), it can be proved that the increase in value from adding a worker k to S is less than or equal to that from adding k to R, where R ⊆ S, with the cost threshold w_t ≤ W_t. Therefore, the following "diminishing returns" condition of submodularity holds: f(S ∪ {k}) − f(S) ≤ f(R ∪ {k}) − f(R).
At the same time, vt could increase or decrease when a worker is added (depending on whether the skill increase is more than the cost decrease or vice versa). Hence, vt is non-monotone. Consider our running example (Example 1), and note that the value function for task t1 will be submodular but nonmonotone when quality threshold is changed to 0, i.e., Q t1 = 0, instead of 0.7. Theorem 4 The value function vt and the objective function V in the C-dex Design problem are submodular and monotonic, when Q t j = 0, ? j ∈ {1 . . . m } and C2 = 0.123 Task assignment optimization in knowledge-intensive crowdsourcingProof Sketch: Consider Theorem 3 that proves the submodularity property of vt when Q t j = 0. It is easy to see that when C2 = 0, vt will only strictly increase with the addition of workers. This ensures the monotonicity of vt . Next, consider our objective function V = Σt ∈T vt de?ned over a set of sets, where each set de?nes a subset of workers assigned to a task t with value vt . Adding a worker k to a set R (corresponds to task t ) will impact vt and therefore the overall V . Without the skill threshold, i.e., Q t j = 0, ? j ∈ {1 . . . m }, if k is added to S instead, where R ? S , the following condition of submodularity will hold: f (S ∪ k ) ? f (S ) & f (R ∪ k ) ? f (R). Furthermore, V strictly increases when C2 = 0 and ensures monotonicity. Consider our running example (Example 1) again, and note that the value function vt1 for task t1 will be both submodular and monotone when the quality threshold is changed to 0, and the cost function in the objective function (i.e., wt )) becomes 0 by setting C2 = 0. This means C 2 × (1 ? W t that the objective function only wishes to maximize the skill while satisfying only the cost threshold. Similarly, the global objective function V will become submodular as well as monotone when all three tasks have quality threshold as 0 and have C2 = 0. 3.2 Index maintenance Indexing workers in KI-C is more challenging than data indexing for query processing, due to the human factors involved in a dynamic crowdsourcing environment. In particular, a unique challenge that SmartCrowd faces is that even if the most appropriate index is selected for a task, one or more workers who were assigned to the task may not be available (for example, they are not online or they decline the task). The acceptance ratio only quanti?es an overall availability of a worker, but not for a particular task. Therefore, SmartCrowd needs to dynamically ?nd a replacement for unavailable workers. At the same time, SmartCrowd needs to strictly ensure non-preemption of the workers, since it is not desirable that the workers who have already accepted a task and are currently working on it to be forced to stop their current assignment in order to be reassigned to different tasks. Furthermore, SmartCrowd has to deal with scenarios where new workers may subscribe to the system any time or some existing ones may delete their accounts. Similarly, as existing workers complete more tasks, the system may update their pro?le (re?ne their skills for example). How to learn the pro?le of a new worker or an updated pro?le of an existing worker is orthogonal to this work. 
What we are interested in here is how SmartCrowdmakes use of these dynamic updates, by maintaining the indices incrementally.We will therefore need to investigate a principled solution toward incremental index maintenance for four scenarios: (1) worker replacement due to unavailability for the task, (2) worker addition, (3) worker deletion, and (4) worker pro?le update. 3.3 Sketching the solution Taking into the account the above-presented problem characteristics, as well as it theoretical analysis, we now proceed with sketching the solution. SmartCrowd adopts a uni?ed approach to solve the C-dex design and maintenance problem, and the overall functionality of its algorithms is as follows: C 1. Of?ine phase―index building A set of indexes I , referred to as C-dex are pre-computed based on a simple de?nition of past task workload. This step is referred to as the of?ine phase. C 2. Online phase―index use and maintenance The precomputed indexes are used to perform ef?cient workerto-task assignments once the actual tasks arrive. The pre-computed indexes are also maintained adaptively to account for worker replacements, additions, deletions, or pro?le updates, while respecting worker non-preemption. This step is referred to as the online phase. Of course, if the actual tasks are substantially different from the workload, SmartCrowd has to halt and redesign the indexes from scratch. The latter scenario is orthogonal to us. Taking into account the above, in the next sections we proceed as follows. First, we propose an optimal (i.e., exact) solution in Sect. 4. Next, in Sect. 5, we propose two approximate algorithms, each of which uses a greedy C-dex building and maintenance strategy and admits a provable approximation factor under certain conditions. Then, in Sect. 6, we propose an alternative index building and maintenance algorithm, namely C-dex+ , which is based on clustering.4 Optimal algorithmWe describe the optimal (i.e., exact) C-dex building solution in Sect. 4.1, and we discuss it maintenance in Sect. 4.2. 4.1 C-DEX-Optimal design (of?ine phase) Recall Theorem 1 and note that the C-dex Design problem is proved to be NP-Hard. SmartCrowd proposes an integer linear programming (ILP)-based solution that solves the optimization problem de?ned in Eq. 1 optimally satisfying the constraints. Our implementation uses the primalCdual barrier method [45] to solve the ILP.123 S. B. Roy et al.Algorithm 1 Optimal C-dex Design AlgorithmInput: Workload T 1: Solve the C-dex Design ILP to get an assignment of the u t ∈ {0, 1}, where u is a worker, and t ∈ T . t 2: using u t , for each t ∈ T , compute and output i t = Pit , Li 3: return Index set IWhile the optimization problem is a linear combination of weights and skills, unfortunately, the decision variables (i.e., u t ’s) are required to be integers. More speci?cally, Cdex sets are created by generating a total of n × |T | boolean decision variables, and the solution of this optimization problem assigns a 1/0 value to each variable, denoting that a worker is assigned to a particular task, or not. These integrality constraints make the above formulation an Integer Linear Programming (ILP) problem [15]. A solution to the ILP problem performs an assignment of a worker to a task in T . Once the optimization problem is solved, an index i t t is is designed for each task in the workload and Pit , Li calculated. Algorithm 1 summarizes the pseudo-code. 
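As a concrete illustration of the ILP of Sect. 4.1, the sketch below encodes the single-skill running example with the PuLP modeling library. This is an assumption on our part: the paper reports using a primal-dual barrier method for the ILP, not PuLP, and all identifiers here are hypothetical. One binary variable x[u, t] encodes whether worker u is assigned to task t, the thresholds of Tables 2 and 3 become hard constraints, and the objective follows Eq. 1 with C1 = C2 = 0.5. The text that follows reports the optimal allocation for this instance.

```python
# Hedged sketch of the C-dex Design ILP (Sect. 4.1) for the running example,
# using the PuLP modeling library (not part of the paper; illustration only).
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, value

workers = {"u1": (0.1, 0.05, 0.8), "u2": (0.3, 0.25, 0.7), "u3": (0.2, 0.30, 0.8),
           "u4": (0.6, 0.70, 0.5), "u5": (0.4, 0.30, 0.6), "u6": (0.5, 0.40, 0.9)}
tasks = {"t1": (0.7, 1.08), "t2": (0.7, 1.10), "t3": (0.9, 2.00)}
C1, C2, X_L, X_H = 0.5, 0.5, 1, 2

prob = LpProblem("cdex_design", LpMaximize)
x = {(u, t): LpVariable(f"x_{u}_{t}", cat=LpBinary) for u in workers for t in tasks}

def exp_quality(t):  # expected aggregated skill of the workers assigned to t
    return lpSum(x[u, t] * s * p for u, (s, w, p) in workers.items())

def exp_cost(t):     # expected aggregated wage of the workers assigned to t
    return lpSum(x[u, t] * w * p for u, (s, w, p) in workers.items())

# Objective: sum of task values (Eq. 1); the thresholds are hard constraints,
# so the piecewise "otherwise 0" branch never applies to a feasible solution.
prob += lpSum(C1 * exp_quality(t) + C2 - (C2 / W) * exp_cost(t)
              for t, (Q, W) in tasks.items())
for t, (Q, W) in tasks.items():
    prob += exp_quality(t) >= Q              # minimum quality
    prob += exp_cost(t) <= W                 # maximum cost
for u in workers:
    prob += lpSum(x[u, t] for t in tasks) >= X_L   # minimum tasks per worker
    prob += lpSum(x[u, t] for t in tasks) <= X_H   # maximum tasks per worker

prob.solve()
assignment = {t: sorted(u for u in workers if x[u, t].value() > 0.5) for t in tasks}
print(round(value(prob.objective), 2), assignment)
```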
Given Example 1, when C1 = C2 = 0.5, the best allocation gives rise to V = 1.94, with the following workerto-task allocation: u 1 = {t1 }, u 2 = {t1 , t2 }, u 3 = {t3 }, u 4 = {t2 , t3 }, u 5 = {t2 , t3 }, u 6 = {t1 , t3 }. This creates the following 3 indexes: i t1 = ( 0.6, 0.74, 0.58 , {u 1 , u 2 , u 6 }), i t2 = ( 0.55, 0.75, 0.71 , {u 2 , u 4 , u 5 }), and i t3 = ( 0.79, 1.15, 1.13 {u 3 , u 4 , u 5 , u 6 }). Unfortunately, ILP is also NP-Complete [13]. The commercial implementations of ILP use techniques such as Branch and Bound [15] with the objective to speed up the computations. Yet, computation time is mostly nonlinear to the number of associated variables and could become exponential in the worst case. 4.2 C-DEX-Optimal maintenance (online phase) We design index maintenance algorithms, which generate optimal solutions under the non-preemption constraint (constraint no. 3, Sect. 2.2). Non-preemption of workers enforces that the existing assignment of an available worker can not be disrupted, only new assignments can be made if the worker is not maxed out. Under this assumption, all four incremental maintenance strategies described below are optimal. 4.2.1 Replacing workers To dynamically ?nd a replacement for unavailable workers, without disrupting already made assignments, we formulate a marginal ILP and solve the problem optimally only with the available set of workers. We illustrate the scenario with an example. Suppose that after the most appropriate index i t is selected for taskt = Q t1 , Q t2 , . . . , Q tm , Wt using Eq. 1, a subset of workt is unavailable or declines to work on t . Imagine ers in Li that the quality of i t declines to qt j from qt j , for skill j , ? j ∈ [1, m ], and the cost declines to wt from wt , since some workers do not accept the task. Consequently, the value of i t also declines, let us say, to vt from vt . From the worker pool U , let us imagine that a subset of workers U is available and their current assignment has not maxed out (i.e., Cu & X h ). To ?nd the replacement of the unavailable workers, SmartCrowd works as follows: It formulates a marginal ILP problem with the same optimization objective for t , only with the workers in U . More formally, the task is formulated as: Maximize vt = vt + C1 ×?1≤ j ≤m u ∈Uqt j + C 2 × 1 ?wt Wtq t j = q t j + Σ u j × pu × u s j wt = wt + Σ u t × pu × wu , u t ∈ {0, 1}.u ∈U(2) Lemma 1 The marginal ILP in Eq. 2 involves only |U | variables. The above optimization problem is formulated only for |U | workers. It is a task t and considering only |U | incremental in nature, as it “builds” on the current solution (notice that it uses the declined cost, skills, and value in the formulation), involving a much smaller number of variables and leading to small latency. Moreover, this strategy is fully aligned with the optimization objective that SmartCrowd t is updated with proposes. After this formulation is solved, Li the new workers for which the above formulation has produced u t = 1. 4.2.2 Adding new workers Assume that a set A of new workers has subscribed to the platform. The task for SmartCrowd is to decide whether (or not) to assign those workers to any task in T , and if yes, what should be the assignment. Note that SmartCrowd already has assigned the existing worker set U to the tasks in T , and they cannot be preempted. 
The overall idea is to solve optimally a marginal ILP only with the new workers in A and tasks T , without making any modi?cations to the existing assignments of the U workers to the T tasks. Formally, the problem is formulated as follows: Maximize Σt ∈T vt vt = vt + C1 ×?1≤ j ≤mqti + C2 × 1 ?wt Wt123 Task assignment optimization in knowledge-intensive crowdsourcingq t j = q t j + Σ u t × pu × u s ju ∈Awt = wt + Σ u t × p u × wu u t ∈ {0, 1}, 0 ≤ Σ {u t ∈ A} ≤ X h .t ∈T u ∈A(3)Similar to the previous cases, the proposed solution is principled and well aligned with the optimization objective that SmartCrowd proposes. The solution involves only |A |×|T | variables, and our experimental results corroborate that it generates the output within reasonable latency.Lemma 2 The optimization problem in Eq. 3 involves only |A| × |T | variables. 4.2.3 Deleting workers In principle, the treatment of worker deletion is analogous to that of worker replacement strategies in Sect. 4.2.1. Basically, the idea is to determine the decreased quality, cost, and value of each of the tasks that are impacted by the deletion, and then re-formulate an optimization problem only with those tasks and the remaining workers who are not maxed out yet (i.e., Cu & X h ) on their assignment, using the current quality, cost, and value. Similar to Sect. 4.2.1, this formulation is also a marginal ILP that is incremental in nature and involves a smaller number of variables. We omit further discussion on this for brevity. 4.2.4 Updating worker pro?les5 Greedy approximation algorithmsThe optimal algorithm presented in Sect. 4 may be very expensive regarding index building and maintenance time, since the ILP-based solution has an exponential computation time in the worst case. To expedite these steps, in this section we propose two approximate solutions, one randomized and one deterministic, which are both guaranteed to run in polynomial time and they have provable approximation factors under certain conditions. The randomized algorithm admits a better approximation factor than the deterministic one, when the objective function is only submodular. The deterministic algorithm requires both submodularity and monotonicity to admit the provable approximation factor, but it is computationally more ef?cient than its randomized counterpart. The randomized algorithm is presented in Sect. 5.1 and the deterministic in Sect. 5.2. 5.1 Greedy C-DEX randomized algorithmInterestingly, the handling of worker pro?le updates is also incremental in SmartCrowd. If the skill, wage, or acceptance ratio of a subset A of workers get updated, SmartCrowd ?rst updates the respective value of the tasks (where these workers were assigned), by discounting the contribution of the workers in A . After that, a smaller optimization problem is formulated involving only A workers and T tasks. After discounting the contribution of the workers in A , if the latest value of a task t is vt ,6 current quality on skill j is qt j , and current cost is wt , then the optimization problem is formulated as follows: Maximize Σt ∈T vt where, vt = vt + C1 ×?1≤ j ≤m5.1.1 C-DEX randomized index building (of?ine phase) Offline-CDEX-Randomized This randomized approximation algorithm is an adaptation of the solutions proposed in [11]. Feige et al. [11] proposes its randomized algorithm for a single set (analogous to a single task in our case), whereas here we need to perform the assignment for a set of tasks. 
The intuitive idea behind the algorithm described in [11] is to perform an “adaptive local search”. It proceeds by locally optimizing a smoothed variant of the optimization function f ( S ), obtained by biased sampling depending on S . The approach of locally optimizing a modi?ed function has been referred to as “non-oblivious local search” [1] in the literature. For smoothing, the aforementioned work [11] uses a multi-linear relaxation [46] of the objective function. In particular, the algorithm in [11] starts with an empty set. For each element x , it computes the marginal gain of adding the element by computing multiple possible random sets with and without x . These steps are referred to as “smoothing”. Then the element that has a marginal gain greater than a threshold value is added to the set. Similarly, if any element in the current solution has a marginal value less than a threshold, it is dropped. This process is repeated until the objective function reaches a local optimum. In this randomized local search, the elements are sampled randomly with different probabilities.(4)qt j + C 2 × 1 ?wt Wt,q t j = q t j + Σ u t × pu × u s ju ∈Awt = wt + Σ u t × p u × wuu ∈Au t = {0, 1},6X l ≤ Σ {u t ∈ A } ≤ X ht ∈TIf none of the workers in A contributed to t , then vt = vt .123 S. B. Roy et al.Algorithm 2 Of?ine-CDEX-RandomizedInput: Workload T = {t1 , t2 , . . . , tl }, U = {u 1 , u 2 , . . . , u n }, X h 1: Fix parameters, δ, δ ∈ [?1, 1]. Start with A = { A1 = {}, A2 = {}, . . . , Al = {}}, no of elements X = n × X h . 2: Call Offline-CDex-ApproxDeterministic to get an estimate of value for optimal worker-to-task assignment, i.e., OPT. 3: For each element u and set At , de?ne w At ,δ (u ) = E [ f ( R ( At , δ) ∪ {u })] ? E [ f ( R ( At , δ)\{u })]. By repeated sampling, compute w ? At ,δ (u ), an estimate of w At ,δ (u ) of OPT. within a factor ± (n ×1 X )2 4: If there exists an u ∈ X \ At such that w ? At ,δ (u ) & (n ×2 O PT , X h )2 include u in At and go to step 3. 5: If there exists an u ∈ X \ At such that w ? At ,δ (u ) & ? (n ×2 O PT , X h )2 exclude u from At and go to step 3. 6: Return a random set R ( At , δ ). 7: return the index set I = { A1 , A2 , . . . , Al }.hAlgorithm 3 Of?ine-CDex-ApproxDeterministicInput: Workload T = {t1 , t2 , . . . , tl }, U = {u 1 , u 2 , . . . , u n }, A = { A1 = {}, A2 = {}, . . . , Al = {}}, X h , X = n × X h 1: For each At , choose a single distinct worker u from X who maximizes f ({u }) and remove u from X 2: If there exists an element u ∈ X \ At , such that f ( At ∪ {u }) & (1 + n 2 ) f ( At ) , then At = At ∪ {u }. Go back to step 1. 3: If there exists an element u ∈ At , such that f ( At \u ) & (1 + ) f ( At ), then At = At \u . Go back to step 1. n2 4: ?t ∈ T , vt = maximum of f ( At ) and f ( X \ At ) 5: return V = t ∈T vt as an estimate of OPT.In other words, given a current solution S , the idea is to do a biased sampling based on whether an element is present in the set. This algorithm [11] is adapted to our problem. It is easy to see that the items in [11] correspond to workers. Algorithm 2 contains the pseudo-code. Given the pool of tasks and workers, the workers are sampled randomly with different probabilities. We start with a set of null sets, i.e., A = { A1 = {}, A2 = {}, . . . , Al = {}}, where the number of sets equals the number of tasks. δ, δ are two ?xed parameters ∈ [?1, 1]. Theorem 3.6 in [11] suggests that we set δ as 1/3 and δ is chosen randomly to be 1/3 with probability 0.9 and ?1 with probability 0.1. 
We have adhered to this suggestion in our implementation. R ( At , δ) denotes a random set of workers for a task t , where the workers in A δ are sampled with probability p = 1+ 2 and workers outside δ A are sampled with probability q = 1? 2 . For each worker u , in an iteration, the weight of u is computed as w A ,δ,t (u ), which is the marginal gain of adding u to the random set R ( At , δ) in an expected sense (Step 3 of Algorithm 2). In other words, we construct a random set and see the marginal utility of adding this worker. We repeat this process numerous times to get an estimate of the worker’s real weight, w ? At ,δ (u ). After that, worker u is either included in At or excluded from At based on a certain probability check (Step 4 and 5 of Algorithm 2). Finally, a random set R ( At , δ ) is returned for a given δ , representing the assignment of a set of workers to task t (Step 6 of Algorithm 2). In order to decide whether to add a worker to the solution, the algorithm uses a probabilistic check that requires an estimate of the optimal solution. We use the same technique as [11] and use a deterministic local search algorithm (referred to as Offline-CDex-ApproxDeterministic as a subroutine for this purpose (Step 2 of Algorithm 2). Further, since a worker is allowed to be assigned to at most X h tasks, a total number of X h × n elements (each element is a worker) are created.The subroutine Offline-CDex-ApproxDeterministic is called inside Of?ine-CDEX-Randomized to estimate the V , i.e., OPT. This algorithm also runs in a greedy fashion, to increase the value V in each iteration, it either includes a new element u in At or discards it from At . Whether an element would be added or discarded is based on the check that is described in Step-4 of Subroutine 3. This algorithm has an approximation factor of 1/3, which could be proved by directly using the results of [11]. Theorem 5 Offline-CDEX-Randomized has an approximation factor of 2/5, when Q t j = 0, ? j ∈ {1 . . . m }. Proof Sketch: Section 3.1.1 proves that V becomes submodular under the above-mentioned conditions. After that, the approximation factor follows directly from [11]. Lemma 3 The run time of algorithm Offline-CDEX2 Randomized is polynomial, i.e., O ( X h ×δ |U |) × |T | . Proof Based on the previous result [11], the number of itera2 tions per task is at most O ( X h ×δ |U |) ; after that, our result trivially follows. 5.1.2 C-DEX randomized index maintenance (online phase) Akin to the of?ine scenario, we propose a randomized greedy approximation algorithm Online-CDEX-Randomized that is incremental and designed to ensure worker nonpreemption. Replacing workers After a task arrives, if one or more of the assigned workers to this task are not available, an ef?cient greedy solution is proposed by selecting replacement workers from the available pool. This strategy leads to a provable approximation factor of 2/5, when Q t j = 0, ? j ∈ {1 . . . m }. Online-CDEX-Randomized works akin to Offline-CDEX-Randomized, except that it needs to ?nd replacement workers for a single task t . Given t , SmartCrowd consida set of unavailable workers in Li ers the available set of workers U and repeats Algorithm 2123 Task assignment optimization in knowledge-intensive crowdsourcingconsidering only those workers for t and their available bandwidth. The rest of the algorithm is akin to the one described earlier. Theorem 6 Online-CDEX-Randomized admits an approximation factor of 2/5, when Q t j = 0, ? j ∈ {1 . . . m }. 
Proof Sketch: Our proof uses the submodularity property of vt as proved in Sect. 3.1.1 under these conditions. Of course, unless the above conditions are satis?ed, the above approximation factor does not theoretically hold. Lemma 4 The run time of Online-CDEX-Randomized 2 is polynomial, i.e., O ( ( X h ×δ |U |) ). Addition of new workers Our proposed randomized algorithm can be used to assign new workers to the tasks. This is similar in principle to the of?ine greedy randomized approximation algorithm described above. New workers must be assigned to the pre-computed indexes using Algorithm 2 (randomized solution) without disrupting the existing allocation of the current workers. However, in order to satisfy any theoretical guarantee, the objective function has to relax the quality threshold constraint (to satisfy submodularity). The run time complexity of the algorithm remains unaltered. Deletion of workers The randomized approximation algorithm is adapted to handle worker deletions, akin to the greedy worker replacement strategy described above. It admits the exact same set of theoretical claims under similar conditions as described above. Updates of worker pro?le If the skill, wage, or acceptance ratio of a subset A of workers get updated, SmartCrowd ?rst updates the respective value of the tasks (where these workers were assigned), by discounting the contribution of the workers in A . After that, the greedy randomized approximation algorithm Offline-CDEX-Randomized is adapted involving A workers and T tasks. It iteratively adds a worker in A to a task in T based on sampling as described in Algorithm 2, while satisfying the skill, cost, and number of workers per task constraint. Akin to Offline-CDEX-Randomized, this algorithm does not satisfy the 2/5 approximation factor unless Q t j = 0, ? j ∈ {1 . . . m }. 5.2 Greedy C-DEX deterministic algorithm 5.2.1 C-DEX deterministic index building (of?ine phase) Next we describe the second approximation algorithm Offline-CDEX-Deterministic, which has a provable approximation factor when the function is submodularand monotonic and is more computationally ef?cient compared to its randomized counterpart. Given the pool of tasks and workers, the algorithm iteratively adds a worker to a task such that the addition ensures the highest marginal gain in V in that iteration, while ensuring the quality, cost, and tasks-per-worker constraints. Imagine a particular instance of Offline-CDEX-Deterministic on Example 1 after the ?rst iteration. After a single worker assignment (?rst iteration will assign one worker to one of the indexes), if only u 1 is assigned to i t1 and nobody to i t2 and i t3 yet, then the algorithm may select u 6 to assign to i t3 in the second iteration to ensure the highest marginal gain in V. Theorem 7 Offline-CDEX-Deterministic has an approximation factor of (1 ? 1/e), when Q t j = 0, ? j ∈ {1 . . . m } and C2 = 0. Proof Sketch: The proof directly uses the results of Sect. 3.1.2 (Theorem 4) and on the fact that the optimization function V becomes submodular and monotonic under the abovementioned conditions. After that, the approximation factor follows directly from [38]. Lemma 5 The run time of algorithm Offline-CDEXDeterministic is polynomial, i.e., O ( X h × |U | × |T |). It is evident that Offline-CDEX-Deterministic is more ef?cient than its randomized counterpart OfflineCDEX-Randomized(referred to Lemma 3). 
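Before turning to maintenance, here is a compact sketch of the greedy strategy just described for Offline-CDEX-Deterministic: repeatedly add the (worker, task) pair with the highest marginal gain in the global value V, subject to the cost budget and the per-worker load limit. This is our own simplified illustration, not the authors' code; the quality thresholds are relaxed to 0 (the setting under which Theorem 7's guarantee holds), the minimum-load constraint X_l is omitted for brevity, and all names are hypothetical.

```python
# Hedged sketch of a marginal-gain greedy index builder (Sect. 5.2.1 style).
workers = {"u1": (0.1, 0.05, 0.8), "u2": (0.3, 0.25, 0.7), "u3": (0.2, 0.30, 0.8),
           "u4": (0.6, 0.70, 0.5), "u5": (0.4, 0.30, 0.6), "u6": (0.5, 0.40, 0.9)}
tasks = {"t1": (0.7, 1.08), "t2": (0.7, 1.10), "t3": (0.9, 2.00)}
C1, C2, X_H = 0.5, 0.5, 2

def task_value(task, team):
    # Quality thresholds relaxed to 0 here; an empty team has value 0,
    # matching the initialisation of v_t in Sect. 3.1.
    if not team:
        return 0.0
    q = sum(workers[u][0] * workers[u][2] for u in team)
    w = sum(workers[u][1] * workers[u][2] for u in team)
    return C1 * q + C2 * (1.0 - w / tasks[task][1])

def greedy_cdex():
    teams = {t: set() for t in tasks}
    load = {u: 0 for u in workers}
    while True:
        best, best_gain = None, 0.0
        for t in tasks:
            for u in workers:
                if u in teams[t] or load[u] >= X_H:
                    continue
                new_team = teams[t] | {u}
                w = sum(workers[v][1] * workers[v][2] for v in new_team)
                if w > tasks[t][1]:                 # keep the cost budget
                    continue
                gain = task_value(t, new_team) - task_value(t, teams[t])
                if gain > best_gain:                # strictly improving additions only
                    best, best_gain = (u, t), gain
        if best is None:                            # no improving addition left
            return teams
        u, t = best
        teams[t].add(u)
        load[u] += 1

print({t: sorted(team) for t, team in greedy_cdex().items()})
```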
5.2.2 C-DEX deterministic index maintenance (online phase) Replacing workers The worker replacement strategy in this case is a deterministic greedy algorithm Online-CDEXDeterministic that leads to a provable approximation factor when Q t j = 0, ? j ∈ {1 . . . m } C2 = 0. We describe its functionality next. Online-CDEX-Deterministic Given a set of t , SmartCrowd performs a simunavailable workers in Li ple iterative greedy replacement from the available pool of workers U . In a given iteration, the idea is to select the worker from the available pool, whose addition will give t . This iterathe highest marginal gain in vt and add her to Li tive process continues until the cost constraint exceeds. This greedy algorithm is approximate in nature but admits a provable approximation factor under certain conditions. Theorem 8 Online-CDEX-Deterministic admits an approximation factor of 1 ? 1/e, when Q t j = 0, ? j ∈ {1 . . . m } and C2 = 0. Proof Sketch: Our proof uses the monotonicity and submodularity property of vt as proved in Sect. 3.1.1 under these conditions.123 S. B. Roy et al.Of course, unless the above conditions are satis?ed, the above approximation factor does not theoretically hold. Lemma 6 The run time of Online-CDEX-Deterministic is polynomial, i.e., O (|U |). Addition of new workers Again the proposed deterministic algorithm is used to assign new workers to the tasks similarly to the of?ine greedy deterministic approximation algorithm described above. New workers are assigned to the pre-computed indexes using the highest marginal gain of the deterministic solution without disrupting the existing allocation of the current workers. In order to satisfy any theoretical guarantee, the objective function has to satisfy both submodularity and monotonicity. The run time complexity of the algorithms remains unaltered. Deletion of workers A deterministic approximation algorithm is proposed to handle worker deletions, similarly to the greedy worker replacement strategy described above. It admits the exact same set of theoretical claims under similar conditions. Updates of worker pro?le If the skill, wage, or acceptance ratio of a subset A of workers get updated, SmartCrowd ?rst updates the respective value of the tasks (where these workers were assigned) by discounting the contribution of the workers in A . After that, the greedy solution is proposed as follows. A deterministic approximation algorithm is constructed that adapts Offline-CDEX-Deterministic involving A workers and T task. It iteratively adds a worker in A to a task in T based on the highest marginal gain in value, while satisfying the skill, cost, and number of workers per task constraint. Akin to Offline-CDEX-Deterministic, this algorithm does not satisfy the (1 ? 1/e) approximation factor unless Q t j = 0, ? j ∈ {1 . . . m } and C2 = 0.Intuitively, a Virtual Worker represents a set of “indistinguishable” actual workers, who are similar in skills and cost. For the simplicity of exposition, if we assume that in a given worker pool, there are three workers who posses exactly the same skill s and cost w , then a single Virtual Worker V could be created replacing those three with skill s and cost w . Obviously, when there are variations in skills and costs of workers, the pro?le of V needs to be de?ned conservatively―by taking the maximum cost of the individual workers as V ’s cost, and the minimum of the individual worker’s expertise per skill, as V ’s skill. 
The formal de?nition of V is: De?nition 2 Virtual Worker V: V represents a set n of actual workers that are “indistinguishable”. V is an m + 2 dimensional vector, Vs1 , Vs2 , . . . , Vsm , Vw , |n | describing expected skill, expected wage, number of actual workers in V , where, Vsi = minu ∈n pu × u si , Vw = maxu ∈n pu × wu . Consider Example 1 again, if u 2 and u 5 are grouped together to form a Virtual Worker V , then V = 0.21, 0.18, 2 . It is apparent that the Virtual Workers help reduce the size of the optimization problem. The formal de?nition of C-dex+ is:tV ) is a pair De?nition 3 (C-dex+ ) A C-dex+ i tV = (PitV , Li that represents an assignment of a set of Virtual Workers in tV t , de?ned using are similar to Pit , Li N to a task t . PitV , Li the Virtual Worker set N .6.1 C-DEX+ design (of?ine phase) We work in two steps: (1) Creating the Virtual Workers and (2) Designing the C-dex+ . Creating virtual workers First, a set N of Virtual Workers is created, given U . Intuitively, a Virtual Worker V should represent a set of workers who are similar in their pro?le. In SmartCrowd, Virtual Workers are created by performing multi-dimensional clustering [17] on U and considering a threshold α that dictates the maximum distance between any worker-pairs inside the same cluster. The size of the Virtual Worker set N clearly depends on α , with a large value of α leading to smaller |N |, and vice versa. Interestingly, this allows ?exible design, as the appropriate trade-off between the quality and the runtime complexity could be chosen by the system, as needed. Formally, given U and α , the task is to design a set of Virtual Workers, such that the following condition is satis?ed: ?u , u : u ∈ V , u ∈ V , Dist u , u ≤ α Our implementation uses a variant of Connectivity-based Clustering [17] considering Euclidean distance to that end.6 Clustering-based approximation algorithm: C-DEX+In this section we present a different approximate solution, which is called C-dex+ and it is based on an alternative index building idea. In C-dex+ , the actual worker pool is intelligently replaced by a set of Virtual Workers, which are much smaller in count. SmartCrowd uses the Virtual Workers and the same workload to pre-compute a set of indexes, referred to as C-dex+ . The C-dex+ approach enables ef?cient precomputation, as well as faster assignments from workers to tasks. C-dex+ is an approximate solution, and the quality of its approximation is hinged on how the Virtual Workers are created.123 Task assignment optimization in knowledge-intensive crowdsourcingAlgorithm 4 C-dex+ Design AlgorithmInput: Workload T , U , α 1: Create N , using U and α . 2: Solve the C-dex+ Design ILP problem to get an assignment of Virtual Workers to the tasks. 3: return Index set IV .akin to its C-dex counterpart. Once the solution is achieved, individual worker assignments can be performed with a postprocessing algorithm, in a round robin fashion, by keeping track of individual L V ’s. Addition of new workers First, the existing Virtual Worker set N needs to get updated. Since α is pre-determined, the new workers can be accommodated with incremental clustering, just by forming new clusters (i.e., creating new Virtual Workers) involving those additions, without having to recreate the entire N from scratch. After that, a smaller ILP is formulated, involving only the Virtual Workers that are affected by the updates and considering existing partial assignments, akin to Eq. 3. We omit the details for brevity. 
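To complement the incremental clustering just mentioned, the sketch below shows one way Virtual Workers in the spirit of Definition 2 can be formed with a distance-threshold grouping and profiled conservatively (minimum expected skill, maximum expected wage over the members). It is an illustration under our own simplifying assumptions (single skill, greedy order-dependent grouping on Euclidean distance), not the paper's connectivity-based clustering, so the resulting groups need not coincide with the Sect. 6.1 example; all names are hypothetical.

```python
# Hedged sketch: grouping similar workers into Virtual Workers (Definition 2 style).
# Greedy threshold grouping (order-dependent) stands in for the connectivity-based
# clustering used in the paper; the virtual profile is conservative.
from math import dist

workers = {"u1": (0.1, 0.05, 0.8), "u2": (0.3, 0.25, 0.7), "u3": (0.2, 0.30, 0.8),
           "u4": (0.6, 0.70, 0.5), "u5": (0.4, 0.30, 0.6), "u6": (0.5, 0.40, 0.9)}

def virtual_workers(alpha):
    clusters = []                                # each cluster is a list of worker ids
    for u, (s, w, p) in workers.items():
        placed = False
        for c in clusters:                       # join the first close-enough cluster
            if any(dist((s, w), workers[v][:2]) <= alpha for v in c):
                c.append(u)
                placed = True
                break
        if not placed:
            clusters.append([u])
    profiles = []
    for c in clusters:
        v_skill = min(workers[v][0] * workers[v][2] for v in c)   # min p_u * u_s
        v_wage = max(workers[v][1] * workers[v][2] for v in c)    # max p_u * w_u
        profiles.append((round(v_skill, 2), round(v_wage, 2), len(c), sorted(c)))
    return profiles

# When new workers subscribe, the same routine can be rerun on only the affected
# workers, mirroring the incremental accommodation described above.
print(virtual_workers(0.25))
```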
Designing C-DEX+ For a Virtual Worker V with |n| actual workers, a counter C_V is created stating the maximum number of assignments of V, i.e., C_V = |n| × X_h. An ILP is designed analogously to Sect. 4, with the |N| Virtual Workers and all the tasks in T. Additionally, a total of 2|N| allocation constraints are added (two per Virtual Worker V), stating that the maximum and the minimum allocation of V are C_V and |n| × X_l, respectively.

Lemma 7 The optimization problem for C-DEX+ involves only |N| × |T| variables.

Using the above lemma, it is easy to see that the ILP is likely to be solved faster for C-dex+, as it involves a smaller number of variables. Example 1 gives two Virtual Workers V_1, V_2 when α = 0.25. Two additional maximum allocation constraints are added to the optimization problem, such that C_V1 = 4 and C_V2 = 2. Therefore, the index design problem with Virtual Workers can be solved for 3 tasks and 2 Virtual Workers, involving only 3 × 2 = 6 decision variables, instead of the 6 × 3 = 18 variables that C-dex has to deal with.

While this solution is much more efficient than C-dex, it may give rise to an approximation of the achieved quality (i.e., of the value of the objective function V), because the search space of the optimization problem is further restricted by the Virtual Workers, leading to a sub-optimal solution for V. Interestingly, our empirical results show that this alternative solution is efficient and that the decline in overall quality is negligible.

Algorithm 4 C-dex+ Design Algorithm
Input: Workload T, U, α
1: Create N, using U and α.
2: Solve the C-dex+ Design ILP problem to get an assignment of Virtual Workers to the tasks.
3: return Index set I^V.

The output of the C-dex+ Design Algorithm is the set of task indexes I^V using Virtual Workers. Considering Example 1, I^V = {i_t1^V, i_t2^V, i_t3^V}. For task t_1, the algorithm creates i_t1^V = (⟨0.38, 0.76, 1.08⟩, {V_1, V_1, V_2, V_2}), when C_1 = C_2 = 0.5.

6.2 C-DEX+ maintenance (online phase)

Recall that the maintenance strategies are designed for four different scenarios, enforcing the worker non-preemption constraint.

Replacing workers C-dex+ designs a marginal ILP, involving task t and all the Virtual Workers whose current C_V > 0, akin to its C-dex counterpart. Once the solution is obtained, individual worker assignments can be performed with a post-processing algorithm, in a round-robin fashion, by keeping track of the individual L_V's.

Addition of new workers First, the existing Virtual Worker set N needs to be updated. Since α is pre-determined, the new workers can be accommodated with incremental clustering, simply by forming new clusters (i.e., creating new Virtual Workers) for those additions, without having to recreate the entire N from scratch. After that, a smaller ILP is formulated, involving only the Virtual Workers that are affected by the updates and considering the existing partial assignments, akin to Eq. 3. We omit the details for brevity.

Deletion of workers The handling of worker deletion is akin to that of addition, in the sense that SmartCrowd propagates these updates incrementally to the Virtual Worker set N. To satisfy the pre-defined α, it accounts for the remaining actual workers of each Virtual Worker V that has at least one worker deleted, and reruns a smaller clustering solution involving only those remaining workers. As soon as N gets updated, the rest of the maintenance is exactly the same as what is discussed in Sect. 4.2 regarding deletion handling. We omit the details for brevity.

Updates of worker profile Similarly, if SmartCrowd obtains updated profiles for some of the workers, it first updates the Virtual Worker set by solving a smaller clustering problem, akin to deletion. With the updated Virtual Worker set, the rest of the maintenance amounts to solving a marginal ILP involving only the updated Virtual Workers, similarly to the profile update maintenance discussed in Sect. 4.2.
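To make the shape of these design and marginal ILPs concrete, the sketch below sets up a deliberately simplified model over Virtual Workers with the PuLP library. It has the |N| × |T| integer variables of Lemma 7, a per-task cost budget, and the 2|N| minimum/maximum allocation constraints (|n| × X_l and C_V = |n| × X_h). The objective is a crude linear surrogate (total expected skill contributed), since the actual objective and skill constraints of Sect. 4 are not reproduced here; treat this as a structural illustration rather than SmartCrowd's exact program.

from pulp import LpProblem, LpMaximize, LpVariable, LpInteger, lpSum, PULP_CBC_CMD

def cdex_plus_design_ilp(virtual_workers, task_budgets, X_l, X_h):
    """virtual_workers: list of (skill, wage, n) with one aggregated skill value per
    Virtual Worker; task_budgets: per-task cost budgets. Returns how many members
    of each Virtual Worker are assigned to each task."""
    prob = LpProblem("cdex_plus_design", LpMaximize)
    # |N| x |T| integer decision variables (Lemma 7).
    x = {(v, t): LpVariable(f"x_{v}_{t}", lowBound=0, cat=LpInteger)
         for v in range(len(virtual_workers)) for t in range(len(task_budgets))}
    # Surrogate objective: total expected skill contributed across all tasks.
    prob += lpSum(virtual_workers[v][0] * x[(v, t)] for (v, t) in x)
    # One cost constraint per task.
    for t, budget in enumerate(task_budgets):
        prob += lpSum(virtual_workers[v][1] * x[(v, t)]
                      for v in range(len(virtual_workers))) <= budget
    # 2|N| allocation constraints: between |n| * X_l and C_V = |n| * X_h per Virtual Worker.
    for v, (_, _, n) in enumerate(virtual_workers):
        total = lpSum(x[(v, t)] for t in range(len(task_budgets)))
        prob += total <= n * X_h
        prob += total >= n * X_l
    prob.solve(PULP_CBC_CMD(msg=False))
    return {(v, t): int(var.value() or 0) for (v, t), var in x.items()}

The marginal ILPs used for maintenance have the same structure, but restrict the variables to the affected task(s) and to the Virtual Workers that still have capacity (C_V > 0).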
7 Experimental evaluation

We perform two different types of experiments. Real-data experiments are conducted involving 250 workers hired from AMT (Amazon Mechanical Turk) in three different stages. Synthetic data experiments are conducted using an event-based crowd simulator.

7.1 Real-data experiments

The purpose of these experiments is to evaluate our approach in terms of feasibility and quality. We study feasibility because current paid crowdsourcing platforms (such as AMT) do not support KI-C task development, and this is thus one of the first studies that tries to optimize KI-C in such an environment. We study quality with the aim of measuring the key qualitative axes of the knowledge produced by the hired workers.

Fig. 1 Stages of the user study

Overall, the study is designed as an application of collaborative news document writing. Specifically, workers hired through AMT are asked to produce documents on five diverse topics (KI-C tasks) of current interest: (1) Political unrest in Egypt, (2) NSA document leakage, (3) PlayStation (PS) games, (4) All-electric cars, and (5) Global warming. For simplicity and ease of quantification, we consider that each task requires one skill (i.e., expertise on that topic). To perform the assignment of workers to tasks, we compare three strategies. The first one is the C-dex Optimal, proposed by SmartCrowd and presented in Sect. 4. The second, namely Benchmark, is a rival strategy according to which the workers self-appoint themselves to articles after a skill-based preselection process, akin
